Use the following steps to run vai_q_tensorflow.
- Prepare the Float Model: Before running vai_q_tensorflow, prepare the frozen inference TensorFlow
model in floating-point format and the calibration set, including the files listed
in the following table.
Table 1. Input Files for vai_q_tensorflow

| No. | Name | Description |
| --- | --- | --- |
| 1 | frozen_graph.pb | Floating-point frozen inference graph. Ensure that the graph is the inference graph rather than the training graph. |
| 2 | calibration dataset | A subset of the training dataset containing 100 to 1000 images. |
| 3 | input_fn | An input function to convert the calibration dataset to the input data of the frozen_graph during quantize calibration. Usually performs data preprocessing and augmentation (see the example sketch after this table). |

For more information, see Getting the Frozen Inference Graph, Getting the Calibration Dataset and Input Function, and Custom Input Function.
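As a rough illustration of the input_fn described above, the following is a minimal Python sketch. The function takes the calibration step number and returns a dict that maps an input node name to a numpy array. The file names, image size, batch size, and preprocessing below are assumptions for a hypothetical image-classification model, not part of the original description.

```python
# calib_input.py -- hypothetical input_fn for quantize calibration.
# Assumptions (not from the original text): the input node is named "input",
# the calibration images are 224x224 RGB JPEGs, and the batch size is 10.
import numpy as np
from PIL import Image

calib_list = ["calib_images/img_%04d.jpg" % i for i in range(1000)]
batch_size = 10

def input_fn(iter_num):
    """Return {input_node_name: numpy array} for calibration step iter_num."""
    images = []
    for path in calib_list[iter_num * batch_size:(iter_num + 1) * batch_size]:
        img = Image.open(path).convert("RGB").resize((224, 224))
        # Simple example preprocessing: scale pixel values to [0, 1].
        images.append(np.asarray(img, dtype=np.float32) / 255.0)
    return {"input": np.stack(images)}
```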
- Run vai_q_tensorflow: Use the following command to quantize the
model:
$vai_q_tensorflow quantize \
    --input_frozen_graph frozen_graph.pb \
    --input_nodes ${input_nodes} \
    --input_shapes ${input_shapes} \
    --output_nodes ${output_nodes} \
    --input_fn input_fn \
    [options]
For more information, see Setting the --input_nodes and --output_nodes and Setting the Options.
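If you are unsure which names to pass to --input_nodes and --output_nodes, one common approach (shown here only as a sketch, not part of vai_q_tensorflow itself) is to load the frozen GraphDef in Python and list its nodes; placeholders are usually the inputs, and the last nodes of a frozen inference graph are often the outputs.

```python
# inspect_graph.py -- minimal sketch for finding candidate input/output node names.
# Assumes TensorFlow 1.x, the version targeted by vai_q_tensorflow.
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Placeholders are typically the graph inputs.
for node in graph_def.node:
    if node.op == "Placeholder":
        print("candidate input node: %s" % node.name)

# The last few nodes of a frozen inference graph are often the outputs.
for node in graph_def.node[-5:]:
    print("candidate output node: %s (op: %s)" % (node.name, node.op))
```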
- After successful execution of the above command, two files are generated in ${output_dir}:
  - quantize_eval_model.pb is used to evaluate the quantized model on the CPU/GPU and can be used to simulate the results on hardware. You need to run import tensorflow.contrib.decent_q explicitly to register the custom quantize operation, because tensorflow.contrib is now lazily loaded (see the evaluation sketch below).
  - deploy_model.pb is used to compile and deploy the model on the DPU; it is the input file to the Vitis AI compiler.
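The snippet below is a minimal sketch of loading quantize_eval_model.pb for CPU/GPU evaluation. The input and output tensor names and the dummy data are assumptions for illustration; the essential point from the text above is importing tensorflow.contrib.decent_q before importing the graph.

```python
# eval_quantized.py -- sketch of evaluating quantize_eval_model.pb on the CPU/GPU.
# Assumed names (illustrative only): input tensor "input:0",
# output tensor "resnet_v1_50/predictions/Softmax:0".
import numpy as np
import tensorflow as tf
import tensorflow.contrib.decent_q  # registers the custom quantize operation

graph_def = tf.GraphDef()
with tf.gfile.GFile("quantize_results/quantize_eval_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

tf.import_graph_def(graph_def, name="")

with tf.Session() as sess:
    input_tensor = sess.graph.get_tensor_by_name("input:0")
    output_tensor = sess.graph.get_tensor_by_name("resnet_v1_50/predictions/Softmax:0")
    # Dummy batch for illustration; use your real validation pipeline here.
    batch = np.random.rand(1, 224, 224, 3).astype(np.float32)
    preds = sess.run(output_tensor, feed_dict={input_tensor: batch})
    print("top-1 class:", preds.argmax(axis=-1))
```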
Table 2. vai_q_tensorflow Output Files

| No. | Name | Description |
| --- | --- | --- |
| 1 | deploy_model.pb | Quantized model for the Vitis AI compiler (extended TensorFlow format) |
| 2 | quantize_eval_model.pb | Quantized model for evaluation |

- After deploying the quantized model, it is sometimes necessary to compare the simulation results on the CPU/GPU with the output values on the DPU.
vai_q_tensorflow supports dumping the simulation results with the
quantize_eval_model.pb generated by the quantize step above.
Run the following command to dump the quantize simulation results:
$vai_q_tensorflow dump \
    --input_frozen_graph quantize_results/quantize_eval_model.pb \
    --input_fn dump_input_fn \
    --max_dump_batches 1 \
    --dump_float 0 \
    --output_dir quantize_results
The input_fn for dumping is similar to the input_fn for quantize calibration, but the batch size is often set to 1 to be consistent with the DPU results.
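For illustration only, a dump_input_fn might look like the calibration input_fn sketched earlier, restricted to a single image per call so the batch size is 1. The file name, input node name, and preprocessing are assumptions.

```python
# dump_input.py -- hypothetical input_fn for dumping simulation results.
# Batch size is 1 so results can be compared one-to-one with the DPU output.
import numpy as np
from PIL import Image

def dump_input_fn(iter_num):
    # A single, fixed image is enough here because --max_dump_batches is 1.
    img = Image.open("calib_images/img_0000.jpg").convert("RGB").resize((224, 224))
    data = np.asarray(img, dtype=np.float32) / 255.0
    return {"input": np.expand_dims(data, axis=0)}  # shape: (1, 224, 224, 3)
```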
After successful execution of the above command, dump results are generated in ${output_dir}. It contains one folder per batch of input data, and within each folder the results for each node are saved separately. For each quantized node, results are saved in *_int8.bin and *_int8.txt format. If dump_float is set to 1, the results for unquantized nodes are also dumped. The / symbol in node names is replaced by _ in file names for simplicity. Examples of dump results are shown in the following table.

Table 3. Examples of Dump Results

| Batch No. | Quant | Node Name | Saved Files |
| --- | --- | --- | --- |
| 1 | Yes | resnet_v1_50/conv1/biases/wquant | {output_dir}/dump_results_1/resnet_v1_50_conv1_biases_wquant_int8.bin, {output_dir}/dump_results_1/resnet_v1_50_conv1_biases_wquant_int8.txt |
| 2 | No | resnet_v1_50/conv1/biases | {output_dir}/dump_results_2/resnet_v1_50_conv1_biases.bin, {output_dir}/dump_results_2/resnet_v1_50_conv1_biases.txt |
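As a simple follow-up, the dumped files can be compared against the corresponding DPU output. The sketch below assumes the *_int8.bin files hold raw int8 values and uses the node from Table 3; the DPU-side file path is purely hypothetical.

```python
# compare_dump.py -- sketch of comparing a dumped CPU/GPU result with a DPU result.
# Assumption: *_int8.bin holds raw int8 values; the DPU result path is hypothetical.
import numpy as np

cpu_dump = np.fromfile(
    "quantize_results/dump_results_1/resnet_v1_50_conv1_biases_wquant_int8.bin",
    dtype=np.int8)
dpu_dump = np.fromfile(
    "dpu_results/resnet_v1_50_conv1_biases_wquant_int8.bin",
    dtype=np.int8)

# With matching quantization, the integer values should be (near-)identical.
mismatches = np.count_nonzero(cpu_dump != dpu_dump)
print("elements:", cpu_dump.size, "mismatches:", mismatches)
```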