The main difference from the Caffe flow is that in TensorFlow the model is contained in a single frozen graph file, and the quantization information must be retrieved from the GraphDef.
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
usage: vai_c_tensorflow.py [-h] [-f FROZEN_PB] [-a ARCH] [-o OUTPUT_DIR]
                           [-n NET_NAME] [-e OPTIONS] [-q]

optional arguments:
  -h, --help            show this help message and exit
  -f FROZEN_PB, --frozen_pb FROZEN_PB
                        prototxt
  -a ARCH, --arch ARCH  json file
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        output directory
  -n NET_NAME, --net_name NET_NAME
                        prefix-name for the outputs
  -e OPTIONS, --options OPTIONS
                        extra options
  -q, --quant_info      extract quant info
Now, the interface clearly explains how to specify the frozen graph. Assume that both the model and the quantization information are required:
vai_c_tensorflow.py --frozen_pb deploy.pb --net_name cmd --options "{'placeholdershape': {'input_tensor' : [1,224,224,3]}, 'quant_cfgfile': 'fix_info.txt'}" --arch arch.json --output_dir work/temp
As you can see, both the quantization information and the shape of the input placeholder are specified. It is common practice to have placeholder layers specifying the input of the model. It is good practice to specify all dimensions and to use a batch size of one. Optimize for latency; a batch size of 1-4 is accepted, but it does not improve latency, improves throughput only marginally, and has not been completely tested for all networks.
There are cases where calibration and fine-tuning produce a model that cannot be executed in native TensorFlow but that still contains the quantization information. Running this front end with the -q (--quant_info) option extracts that quantization information and writes it to a file.
The software repository should provide examples where the compiler is called twice: the first invocation creates a quantization information file (using a default name and location), which is then used as input for the second invocation that performs code generation.
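A minimal sketch of that two-pass flow is shown below. The file name work/temp/quant_info.txt is only an assumption for illustration; the actual default name and location are chosen by the tool.

# First pass (sketch): extract the quantization information with -q.
vai_c_tensorflow.py --frozen_pb deploy.pb --net_name cmd --options "{'placeholdershape': {'input_tensor' : [1,224,224,3]}}" --arch arch.json --output_dir work/temp -q

# Second pass (sketch): compile, pointing 'quant_cfgfile' at the file produced above (assumed path).
vai_c_tensorflow.py --frozen_pb deploy.pb --net_name cmd --options "{'placeholdershape': {'input_tensor' : [1,224,224,3]}, 'quant_cfgfile': 'work/temp/quant_info.txt'}" --arch arch.json --output_dir work/temp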