vai_q_tensorflow Usage - 1.4 English

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2021-07-22
Version: 1.4 English

The options supported by vai_q_tensorflow are shown in the following table.

Table 1. vai_q_tensorflow Options
Name Type Description
Common Configuration
--input_frozen_graph String TensorFlow frozen inference GraphDef file for the floating-point model, used for quantize calibration.
--input_nodes String The name list of input nodes of the quantize graph, used together with --output_nodes, comma separated. Input nodes and output nodes are the start and end points of quantization. The subgraph between them is quantized if it is quantizable.

It is recommended to set --input_nodes to the last nodes of the preprocessing part and --output_nodes to the last nodes before the postprocessing part. Some operations in the preprocessing and postprocessing parts are not quantizable and might cause errors when the quantized model is compiled by the Vitis AI compiler for deployment to the DPU. The input nodes might not be the same as the placeholder nodes of the graph.

--output_nodes String The name list of output nodes of the quantize graph, used together with --input_nodes, comma separated. Input nodes and output nodes are the start and end points of quantization. The subgraph between them is quantized if it is quantizable.

It is recommended to set --input_nodes to the last nodes of the preprocessing part and --output_nodes to the last nodes before the postprocessing part. Some operations in the preprocessing and postprocessing parts are not quantizable and might cause errors when the quantized model is compiled by the Vitis AI compiler for deployment to the DPU.

--input_shapes String The shape list of input_nodes. Each node must have a 4-dimension shape, comma separated, for example 1,224,224,3. An unknown batch size is supported, for example ?,224,224,3. For multiple input nodes, give the shape of each node separated by a colon (:), for example ?,224,224,3:?,300,300,1.
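
The shape-list format can be checked before it is passed to the quantizer. The following is a minimal sketch, not part of vai_q_tensorflow: the helper name parse_input_shapes is hypothetical, and mapping ? to None is an assumption made for illustration.

```python
from typing import List, Optional


def parse_input_shapes(spec: str) -> List[List[Optional[int]]]:
    """Parse a string such as '?,224,224,3:?,300,300,1' into per-node shapes.

    Hypothetical helper: '?' (unknown batch size) is mapped to None.
    """
    shapes = []
    for node_spec in spec.split(":"):          # one entry per input node
        dims = [None if d.strip() == "?" else int(d)
                for d in node_spec.split(",")]
        if len(dims) != 4:                     # each node needs 4 dimensions
            raise ValueError(
                f"expected 4 dimensions, got {len(dims)}: {node_spec!r}")
        shapes.append(dims)
    return shapes


print(parse_input_shapes("?,224,224,3:?,300,300,1"))
# [[None, 224, 224, 3], [None, 300, 300, 1]]
```
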
--input_fn String The function that provides input data for the graph from the calibration dataset. The function format is module_name.input_fn_name (for example, my_input_fn.input_fn). The input_fn should take an int object as input, which indicates the calibration step, and should return a dict of (placeholder_node_name, numpy.ndarray) pairs for each call, which is then fed into the placeholder operations of the model.

For example, assign --input_fn to my_input_fn.calib_input, and write the calib_input function in my_input_fn.py as:

def calib_input(iter):
    # iter is the calibration step (int)
    # read images and do some preprocessing to build input_1_nparray and input_2_nparray
    return {"placeholder_1": input_1_nparray, "placeholder_2": input_2_nparray}

Note: You do not need to do in-graph preprocessing again in input_fn, because the subgraph before --input_nodes remains during quantization.

The pre-defined input functions (including default and random) have been removed because they are not commonly used. Preprocessing that is not part of the graph file should be handled in the input_fn.
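
Because input_fn receives only the calibration step, it has to decide which samples belong to that step. The following sketch shows one way to map a step to a batch of files. The file list, batch size, and the helper name batch_for_step are all hypothetical, and the wrap-around behavior is a design choice made for this illustration, not something the quantizer requires.

```python
from typing import List

CALIB_BATCH_SIZE = 4  # hypothetical batch size
# Hypothetical calibration file list; replace with real image paths.
CALIB_IMAGES = [f"calib/img_{i:04d}.jpg" for i in range(10)]


def batch_for_step(step: int, paths: List[str], batch_size: int) -> List[str]:
    """Return the file paths that calibration step `step` should load.

    Indices wrap around so that every step yields a full batch even when
    len(paths) is not a multiple of batch_size.
    """
    if not paths:
        raise ValueError("calibration file list is empty")
    start = step * batch_size
    return [paths[(start + i) % len(paths)] for i in range(batch_size)]


print(batch_for_step(0, CALIB_IMAGES, CALIB_BATCH_SIZE))
print(batch_for_step(2, CALIB_IMAGES, CALIB_BATCH_SIZE))
```

Inside an input_fn, each returned path would then be read, preprocessed, and stacked into the array that is returned under the corresponding placeholder name.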

Quantize Configuration
--weight_bit Int32 Bit width for quantized weight and bias.

Default: 8

--activation_bit Int32 Bit width for quantized activation.

Default: 8

--method Int32 The method for quantization.

0: Non-overflow method. Makes sure that no values are saturated during quantization. Sensitive to outliers.

1: Min-diffs method. Allows saturation for quantization to get a lower quantization difference. Higher tolerance to outliers. Usually ends with narrower ranges than the non-overflow method.

Choices: [0, 1]

Default: 1
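
The difference between the two --method choices can be illustrated with a toy example. This sketch is not the quantizer's implementation. It assumes power-of-two scaling described by a fractional bit position, a squared-error measure for the quantization difference, and a fixed candidate range; all function names are hypothetical.

```python
import math
from typing import Iterable, Sequence


def fake_quantize(x: float, bits: int, pos: int) -> float:
    """Quantize x to signed fixed point with `pos` fractional bits."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = int(math.floor(x * 2.0 ** pos + 0.5))  # round half up
    q = max(qmin, min(qmax, q))                # saturate to the integer range
    return q / 2.0 ** pos


def non_overflow_pos(values: Sequence[float], bits: int,
                     candidates: Iterable[int] = range(-8, 17)) -> int:
    """Largest position at which no value saturates (assumed method 0)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    ok = [p for p in candidates
          if all(qmin <= v * 2.0 ** p <= qmax for v in values)]
    if not ok:
        raise ValueError("no candidate position avoids saturation")
    return max(ok)


def min_diffs_pos(values: Sequence[float], bits: int,
                  candidates: Iterable[int] = range(-8, 17)) -> int:
    """Position with the smallest squared error (assumed method 1)."""
    def err(p: int) -> float:
        return sum((v - fake_quantize(v, bits, p)) ** 2 for v in values)
    return min(candidates, key=err)


values = [1.0, 3 / 128, 5 / 128, 7 / 128]  # 1.0 acts as the outlier
print(non_overflow_pos(values, 8))  # 6
print(min_diffs_pos(values, 8))     # 7
```

In this toy data set, the non-overflow choice keeps 1.0 exact at the cost of coarser steps for the small values, while the min-diffs choice lets 1.0 saturate slightly to 127/128 and represents the small values exactly.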

--calib_iter Int32 The number of calibration iterations. Total number of images for calibration = calib_iter * batch_size.

Default: 100
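
The relationship between calib_iter, batch size, and the number of calibration images is simple arithmetic. A small sketch, with a hypothetical helper name and example numbers:

```python
def calib_iter_for(num_images: int, batch_size: int) -> int:
    """Smallest calib_iter whose batches cover num_images images."""
    if num_images <= 0 or batch_size <= 0:
        raise ValueError("num_images and batch_size must be positive")
    return (num_images + batch_size - 1) // batch_size  # ceiling division


calib_iter = calib_iter_for(1000, 32)
print(calib_iter, calib_iter * 32)  # 32 1024
```

With the default of 100 iterations and a batch size of 10, calibration would consume 100 * 10 = 1000 images.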

--ignore_nodes String The name list of nodes to be ignored during quantization. Ignored nodes are left unquantized during quantization.
--skip_check Int32 If set to 1, the check of the float model is skipped. Useful when only part of the input model is quantized.

Choices: [0, 1]

Default: 0

--align_concat Int32 The strategy for the alignment of the input quantize position for concat nodes. Set to 0 to align all concat nodes, 1 to align the output concat nodes, and 2 to disable alignment.

Choices: [0, 1, 2]

Default: 0

--simulate_dpu Int32 Set to 1 to enable the simulation of the DPU. The behavior of the DPU for some operations is different from TensorFlow. For example, the division in LeakyRelu and AvgPooling is replaced by bit-shifting, so there might be a slight difference between DPU outputs and CPU/GPU outputs. The vai_q_tensorflow quantizer simulates the DPU behavior for these operations if this flag is set to 1.

Choices: [0, 1]

Default: 1
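
To see why replacing a division with bit-shifting can change results slightly, consider this generic multiply-and-shift approximation. It is only an illustration of the general technique. The multiplier and shift values are assumptions and are not the constants used by the DPU.

```python
def shift_divide(x: int, divisor: int, shift: int = 8) -> int:
    """Approximate x // divisor with an integer multiply and a right shift."""
    multiplier = round((1 << shift) / divisor)  # fixed-point reciprocal
    return (x * multiplier) >> shift


# Sum of a hypothetical 3x3 pooling window, divided by 9.
window_sum = 900
print(window_sum // 9)              # 100 (exact integer division)
print(shift_divide(window_sum, 9))  # 98  (multiply-and-shift approximation)
```

When the divisor is a power of two the approximation is exact; for other divisors the rounded reciprocal introduces a small error of the kind described above.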

--output_dir String The directory in which to save the quantization results.

Default: "./quantize_results"

--max_dump_batches Int32 The maximum number of batches for dumping.

Default: 1

--dump_float Int32 If set to 1, the float weights and activations are also dumped.

Choices: [0, 1]

Default: 0

Session Configurations
--gpu String The ID of the GPU device used for quantization, comma separated.
--gpu_memory_fraction Float The GPU memory fraction used for quantization, between 0 and 1.

Default: 0.5

Others
--help Show all available options of vai_q_tensorflow.
--version Show vai_q_tensorflow version information.
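
The options in Table 1 can be combined into a single invocation. The sketch below only assembles an argument list and does not run anything. The quantize subcommand is assumed from typical usage of the tool and is not stated in the table; the graph file name, the node names, and the helper build_quantize_command are hypothetical.

```python
import shlex
from typing import List


def build_quantize_command(frozen_graph: str, input_nodes: List[str],
                           output_nodes: List[str], input_shapes: str,
                           input_fn: str, **options) -> List[str]:
    """Assemble a vai_q_tensorflow argument list from Table 1 options."""
    cmd = [
        "vai_q_tensorflow", "quantize",  # subcommand is an assumption
        "--input_frozen_graph", frozen_graph,
        "--input_nodes", ",".join(input_nodes),
        "--output_nodes", ",".join(output_nodes),
        "--input_shapes", input_shapes,
        "--input_fn", input_fn,
    ]
    for name, value in options.items():  # e.g. method=1 -> --method 1
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_quantize_command(
    "frozen_graph.pb",           # hypothetical file name
    ["input"], ["logits"],       # hypothetical node names
    "?,224,224,3",
    "my_input_fn.calib_input",
    method=1, calib_iter=100, output_dir="./quantize_results",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run, or printed and pasted into a shell.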