The following table shows the vai_q_tensorflow
options.
Name | Type | Description |
---|---|---|
Common Configuration | ||
--input_frozen_graph | String | TensorFlow frozen inference GraphDef file for the floating-point model. It is used for post-training quantization. |
--input_nodes | String | Specifies the name list of input nodes of the quantize graph, used together with --output_nodes, separated by commas. Input nodes and output nodes are the starting and ending points of quantization; the subgraph between them is quantized if it is quantizable. Recommended: Set --input_nodes to the last nodes of pre-processing and --output_nodes to the last nodes of post-processing, because some of the operations required for pre- and post-processing are not quantizable and might cause errors when the model is compiled by the Vitis AI compiler and deployed to the DPU. The input nodes might not be the same as the placeholder nodes of the graph. |
--output_nodes | String | Specifies the name list of output nodes of the quantize graph, used together with --input_nodes, separated by commas. Input nodes and output nodes are the starting and ending points of quantization; the subgraph between them is quantized if it is quantizable. Recommended: Set --input_nodes to the last nodes of pre-processing and --output_nodes to the last nodes of post-processing, because some of the operations required for pre- and post-processing are not quantizable and might cause errors when the model is compiled by the Vitis AI compiler and deployed to the DPU. |
--input_shapes | String | Specifies the shape list of input nodes. It must be a four-dimensional shape for each node, separated by commas, for example, 1,224,224,3. An unknown batch size is supported, for example, ?,224,224,3. In case of multiple input nodes, assign the shape list of each node separated by colons, for example, ?,224,224,3:?,300,300,1. |
--input_fn | String | Provides input data for the graph when used with the calibration dataset. The function format is module_name.input_fn_name (for example, my_input_fn.calib_input). The input_fn takes an int object as input, indicating the calibration step number, and returns a dict of (placeholder_name, numpy.array) pairs for each call, which is then fed into the placeholder nodes of the model (see the sketch after this table). Note: You do not need to perform in-graph pre-processing again in input_fn because the subgraph before --input_nodes remains during quantization. Remove the pre-defined input functions (including default and random) because they are not commonly used. The pre-processing part, which is not in the graph file, should be handled in input_fn. |
Quantize Configuration | ||
--weight_bit | Int32 | Specifies the bit width for quantized weight and bias. Default value: 8 |
--activation_bit | Int32 | Specifies the bit width for quantized activation. Default value: 8 |
--nodes_bit | String | Specifies the bit width of nodes. Node names and bit widths form a pair of parameters joined by a colon; the parameter pairs are comma separated. When the name of a conv op is specified, only the weights of that conv op are quantized using the specified bit width. For example, conv1/Relu:16,conv1/weights:8,conv1:16. |
--method | Int32 | Specifies the method for quantization. 0: non-overflow method, which ensures no values are saturated during quantization (sensitive to outliers). 1: min-diffs method, which allows saturation to minimize the quantization difference (higher tolerance to outliers). Default value: 1 |
--nodes_method | String | Specifies the method of nodes. Node names and methods form a pair of parameters joined by a colon; the parameter pairs are comma separated. When the name of a conv op is specified, only the weights of that conv op are quantized using the specified method, for example, 'conv1/Relu:1,depthwise_conv1/weights:2,conv1:1'. |
--calib_iter | Int32 | Specifies the number of calibration iterations. The total number of images used for calibration = calib_iter * batch_size. Default value: 100 |
--ignore_nodes | String | Specifies the list of nodes to be ignored during quantization. Ignored nodes are left unquantized during quantization. |
--skip_check | Int32 | If set to 1, the check of the floating-point model is skipped. Useful when only part of the input model is quantized. Range: [0, 1] Default value: 0 |
--align_concat | Int32 | Specifies the strategy for aligning the input quantize position for concat nodes. Set to 0 to align all concat nodes, 1 to align only the output concat nodes, and 2 to disable alignment. Default value: 0 |
--align_pool | Int32 | Specifies the strategy for aligning the input quantize position for maxpool/avgpool nodes. Set to 0 to align all maxpool/avgpool nodes, 1 to align only the output maxpool/avgpool nodes, and 2 to disable alignment. Default value: 0 |
--simulate_dpu | Int32 | Set to 1 to enable DPU simulation. The behavior of the DPU differs from TensorFlow for some operations. For example, the division in LeakyRelu and AvgPooling is replaced by bit-shifting, so there might be a slight difference between DPU outputs and CPU/GPU outputs. The vai_q_tensorflow quantizer simulates the behavior of these operations if this flag is set to 1. Range: [0, 1] Default value: 1 |
--adjust_shift_bias | Int32 | Specifies the strategy for the shift bias check and adjustment for the DPU compiler. Set to 0 to disable the check and adjustment, 1 to enable them with static constraints, and 2 to enable them with dynamic constraints. Default value: 1 |
--adjust_shift_cut | Int32 | Specifies the strategy for the shift cut check and adjustment for the DPU compiler. Set to 0 to disable the check and adjustment, and 1 to enable them with static constraints. Default value: 1 |
--arch_type | String | Specifies the arch type for the fixed neuron. DEFAULT means the quantization range of both weights and activations is [-128, 127]. 'DPUCADF8H' means the weight quantization range is [-128, 127] while the activation range is [-127, 127]. |
--output_dir | String | Specifies the directory to save the quantization results. Default value: "./quantize_results" |
--max_dump_batches | Int32 | Specifies the maximum number of batches for dumping. Default value: 1 |
--dump_float | Int32 | If set to 1, the float weights and activations are dumped. Range: [0, 1] Default value: 0 |
--dump_input_tensors | String | Specifies the graph's input tensor name when the graph entrance is not a placeholder. A placeholder is added before the specified input tensor so that input_fn can feed data. |
--scale_all_avgpool | Int32 | Set to 1 to enable scaling of the AvgPooling op output to simulate the DPU. Only kernel_size <= 64 is scaled. This operation does not affect special cases such as kernel_size=3,5,6,7,14. Default value: 1 |
--do_cle | Int32 | Set to 1 to enable cross layer equalization, which adjusts the weights distribution to improve quantization results. Range: [0, 1] Default value: 0 |
--replace_relu6 | Int32 | Set to 1 to replace ReLU6 with ReLU. Available only when do_cle=1. Default value: 1 |
--replace_sigmoid | Int32 | Set to 1 to enable replacing sigmoid with hard sigmoid. Default value: 0 |
--replace_softmax | Int32 | Set to 1 to enable replacing softmax with hard softmax. Default value: 0 |
--convert_datatype | Int32 | Default value: 0 |
--output_format | String | Specifies the format in which to save the quantized model: pb for saving a TensorFlow frozen pb, onnx for saving an ONNX model. Default value: 'pb' |
Session Configurations | ||
--gpu | String | Specifies GPU device IDs used for quantization, separated by commas. |
--gpu_memory_fraction | Float | Specifies the GPU memory fraction used for quantization, between 0 and 1. Default value: 0.5 |
Others | ||
--help | | Shows all available vai_q_tensorflow options. |
--version | | Shows the vai_q_tensorflow version information. |
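
Where the --input_fn contract above needs a concrete shape, the following minimal Python sketch shows one way to satisfy it. The module name my_input_fn.py, the calib_list.txt file, the batch size, and the placeholder name inputs are illustrative assumptions, not part of the tool:

```python
# my_input_fn.py -- minimal input_fn sketch; file, function, and node names
# are illustrative. vai_q_tensorflow calls the function once per calibration
# step and feeds the returned dict into the graph's placeholder nodes.
import numpy as np
from PIL import Image  # assumes Pillow is available for image loading

BATCH_SIZE = 32  # hypothetical calibration batch size
calib_images = open("calib_list.txt").read().splitlines()  # hypothetical image path list

def calib_input(iter):
    """Return {placeholder_name: numpy.array} for calibration step `iter`."""
    batch = []
    for i in range(BATCH_SIZE):
        path = calib_images[(iter * BATCH_SIZE + i) % len(calib_images)]
        img = Image.open(path).convert("RGB").resize((224, 224))
        # Out-of-graph pre-processing belongs here; this example scales to [0, 1].
        batch.append(np.asarray(img, dtype=np.float32) / 255.0)
    # The key must match the placeholder/--input_nodes name; the array shape
    # must be consistent with --input_shapes (here ?,224,224,3).
    return {"inputs": np.stack(batch)}
```

With --input_fn my_input_fn.calib_input and --calib_iter 100, the quantizer calls calib_input(0) through calib_input(99), so 100 * BATCH_SIZE images are used for calibration.
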
Examples

Show help:

```
vai_q_tensorflow --help
```

Quantize:

```
vai_q_tensorflow quantize --input_frozen_graph frozen_graph.pb \
                          --input_nodes inputs \
                          --output_nodes predictions \
                          --input_shapes ?,224,224,3 \
                          --input_fn my_input_fn.calib_input
```

Dump the quantized model:

```
vai_q_tensorflow dump --input_frozen_graph quantize_results/quantize_eval_model.pb \
                      --input_fn my_input_fn.dump_input
```
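
The dump command's --input_fn follows the same calling convention as the calibration one, so the my_input_fn.dump_input referenced above can, as a minimal sketch, reuse the feeder from the earlier example (again assuming the hypothetical my_input_fn.py module):

```python
# my_input_fn.py (continued) -- dump_input is illustrative; it follows the same
# (step) -> dict contract as calib_input. --max_dump_batches limits how many of
# these batches are dumped.
def dump_input(iter):
    return calib_input(iter)
```
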
Refer to AMD Model Zoo for more TensorFlow model quantization examples.