The main difference from the Caffe flow is that in TensorFlow the model is contained in a single frozen graph file, and the quantization information must be retrieved from the GraphDef.
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
usage: vai_c_tensorflow.py [-h] [-f FROZEN_PB] [-a ARCH] [-o OUTPUT_DIR]
                           [-n NET_NAME] [-e OPTIONS] [-q]

optional arguments:
  -h, --help            show this help message and exit
  -f FROZEN_PB, --frozen_pb FROZEN_PB
                        prototxt
  -a ARCH, --arch ARCH  json file
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        output directory
  -n NET_NAME, --net_name NET_NAME
                        prefix-name for the outputs
  -e OPTIONS, --options OPTIONS
                        extra options
  -q, --quant_info      extract quant info
Now, the interface clearly explains how to specify the frozen graph. Assume that both the model and the quantization information are required:
vai_c_tensorflow.py --frozen_pb deploy.pb --net_name cmd --options "{'placeholdershape': {'input_tensor' : [1,224,224,3]}, 'quant_cfgfile': 'fix_info.txt'}" --arch arch.json --output_dir work/temp
As you can see, both the quantization information and the shape of the input placeholder are specified. It is common practice to have placeholder layers specifying the input of the model. It is good practice to specify all dimensions and to use a batch size of one. Optimize for latency; a batch size of 1-4 is accepted, but it does not improve latency, improves throughput only marginally, and has not been completely tested for all networks.
There are cases where calibration and fine-tuning produce a model that cannot be executed in native TensorFlow but that still contains the quantization information. Running this front end with the -q (--quant_info) option extracts that quantization information and writes it to a file.
The software repository should provide examples where the compiler is called twice: the first invocation creates a quantization information file (using a default name and location), which is then used as input for the second invocation that performs code generation.
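A minimal sketch of that two-pass flow is shown below. The file name work/temp/quant_info.txt is only an assumption for illustration; the actual default name and location are chosen by the tool.

# First pass (sketch): extract the quantization information with -q.
vai_c_tensorflow.py --frozen_pb deploy.pb --net_name cmd --options "{'placeholdershape': {'input_tensor' : [1,224,224,3]}}" --arch arch.json --output_dir work/temp -q

# Second pass (sketch): compile, pointing 'quant_cfgfile' at the file produced above (assumed path).
vai_c_tensorflow.py --frozen_pb deploy.pb --net_name cmd --options "{'placeholdershape': {'input_tensor' : [1,224,224,3]}, 'quant_cfgfile': 'work/temp/quant_info.txt'}" --arch arch.json --output_dir work/temp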