The options supported by vai_q_tensorflow are shown in the following table.
Name | Type | Description |
---|---|---|
Common Configuration | ||
--input_frozen_graph | String | TensorFlow frozen inference GraphDef file for the floating-point model. It is used for quantize calibration. |
--input_nodes | String | Specifies the name list of input nodes of the quantize graph, comma separated; used together with --output_nodes. Input nodes and output nodes are the start and end points of quantization. The subgraph between them is quantized if it is quantizable. Recommended: Set --input_nodes as the last nodes of pre-processing and --output_nodes as the last nodes before post-processing, because some of the operations required for pre- and post-processing are not quantizable and might cause errors when compiled by the Vitis AI compiler if you need to deploy the quantized model to the DPU. The input nodes might not be the same as the placeholder nodes of the graph. |
--output_nodes | String | Specifies the name list of output nodes of the quantize graph, comma separated; used together with --input_nodes. Input nodes and output nodes are the start and end points of quantization. The subgraph between them is quantized if it is quantizable. Recommended: Set --input_nodes as the last nodes of pre-processing and --output_nodes as the last nodes before post-processing, because some of the operations required for pre- and post-processing are not quantizable and might cause errors when compiled by the Vitis AI compiler if you need to deploy the quantized model to the DPU. |
--input_shapes | String | Specifies the shape list of input nodes. Must be a 4-dimensional shape for each node, comma separated, for example 1,224,224,3. An unknown batch size is supported, for example ?,224,224,3. For multiple input nodes, assign the shape list of each node separated by a colon, for example ?,224,224,3:?,300,300,1. |
--input_fn | String | Provides the input data for the graph, used with the calibration dataset. The function format is module_name.input_fn_name (for example, my_input_fn.input_fn). The --input_fn should take an int object as input, which indicates the calibration step, and should return a dict mapping placeholder node names to numpy.ndarray objects on each call, which is then fed into the placeholder operations of the model. Note: You do not need to do in-graph pre-processing again in input_fn, because the subgraph before --input_nodes remains during quantization. The pre-defined input functions (including default and random) have been removed because they are not commonly used. Any pre-processing that is not in the graph file should be handled in the input_fn. |
Quantize Configuration | ||
--weight_bit | Int32 | Specifies the bit width for quantized weights and biases. Default value: 8 |
--activation_bit | Int32 | Specifies the bit width for quantized activations. Default value: 8 |
--nodes_bit | String | Specifies the bit width of individual nodes. Node names and bit widths form parameter pairs joined by a colon; the pairs are comma separated. When only the conv op name is specified, vai_q_tensorflow quantizes the weights of that conv op using the specified bit width. For example, 'conv1/Relu:16,conv1/weights:8,conv1:16'. |
--method | Int32 | Specifies the method for quantization. Default value: 1 |
--nodes_method | String | Specifies the quantization method of individual nodes. Node names and methods form parameter pairs joined by a colon; the pairs are comma separated. When only the conv op name is specified, vai_q_tensorflow quantizes the weights of that conv op using the specified method. For example, 'conv1/Relu:1,depthwise_conv1/weights:2,conv1:1'. |
--calib_iter | Int32 | Specifies the number of iterations for calibration. Total number of images used for calibration = calib_iter * batch_size. Default value: 100 |
--ignore_nodes | String | Specifies the name list of nodes to be ignored during quantization; ignored nodes are left unquantized. |
--skip_check | Int32 | If set to 1, the check of the float model is skipped. Useful when only part of the input model is quantized. Range: [0, 1] Default value: 0 |
--align_concat | Int32 | Specifies the strategy for aligning the input quantize positions of concat nodes. Default value: 0 |
--simulate_dpu | Int32 | Set to 1 to enable simulation of the DPU. The behavior of the DPU differs from TensorFlow for some operations; for example, the division in LeakyRelu and AvgPooling is replaced by bit shifting, so there might be a slight difference between DPU outputs and CPU/GPU outputs. The vai_q_tensorflow quantizer simulates the behavior of these operations if this flag is set to 1. Range: [0, 1] Default value: 1 |
--adjust_shift_bias | Int32 | Specifies the strategy for shift bias check and adjustment for the DPU compiler. Default value: 1 |
--adjust_shift_cut | Int32 | Specifies the strategy for shift cut check and adjustment for the DPU compiler. Default value: 1 |
--arch_type | String | Specifies the arch type for the fix neuron. 'DEFAULT' means the quantization range of both weights and activations is [-128, 127]. 'DPUCADF8H' means the weights quantization range is [-128, 127] while the activation range is [-127, 127]. |
--output_dir | String | Specifies the directory in which to save the quantization results. Default value: "./quantize_results" |
--max_dump_batches | Int32 | Specifies the maximum number of batches for dumping. Default value: 1 |
--dump_float | Int32 | If set to 1, the float weights and activations are dumped. Range: [0, 1] Default value: 0 |
--dump_input_tensors | String | Specifies the input tensor name of the graph when the graph entrance is not a placeholder. A placeholder is added to the dump_input_tensor so that input_fn can feed data. |
--scale_all_avgpool | Int32 | Set to 1 to enable scaling the output of AvgPooling ops to simulate the DPU. Only kernel_size <= 64 is scaled. This operation does not affect special cases such as kernel_size = 3, 5, 6, 7, or 14. Default value: 1 |
--do_cle | Int32 | Set to 1 to enable cross layer equalization. Default value: 0 |
--replace_relu6 | Int32 | Set to 1 to replace ReLU6 with ReLU. Available only when do_cle=1. Default value: 1 |
Session Configurations | ||
--gpu | String | Specifies the IDs of the GPU devices used for quantization, separated by commas. |
--gpu_memory_fraction | Float | Specifies the GPU memory fraction used for quantization, between 0 and 1. Default value: 0.5 |
Others | ||
--help | | Shows all available options of vai_q_tensorflow. |
--version | | Shows the version information for vai_q_tensorflow. |
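The --input_fn contract described above can be sketched as a minimal Python module. The node name `inputs`, the batch size, and the input shape are assumptions for illustration; a real module would read and pre-process images from the calibration dataset instead of generating random data.

```python
# Hypothetical my_input_fn.py, passed as --input_fn my_input_fn.calib_input.
# vai_q_tensorflow calls calib_input once per calibration step with the step
# index and feeds the returned dict into the model's placeholder ops.
import numpy as np

BATCH_SIZE = 10
INPUT_NODE = "inputs"  # must match the name given to --input_nodes

def calib_input(iter_num):
    """Return one batch of pre-processed calibration data for step `iter_num`."""
    # A real implementation would load images
    # [iter_num * BATCH_SIZE : (iter_num + 1) * BATCH_SIZE] from the
    # calibration set and apply the same pre-processing the float model
    # expects. Random data keeps this sketch self-contained.
    batch = np.random.rand(BATCH_SIZE, 224, 224, 3).astype(np.float32)
    return {INPUT_NODE: batch}
```

Because the subgraph before --input_nodes is preserved during quantization, only pre-processing that is *not* in the graph file needs to happen here.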
Examples

Show help:

```shell
vai_q_tensorflow --help
```

Quantize:

```shell
vai_q_tensorflow quantize --input_frozen_graph frozen_graph.pb \
                          --input_nodes inputs \
                          --output_nodes predictions \
                          --input_shapes ?,224,224,3 \
                          --input_fn my_input_fn.calib_input
```

Dump quantized model:

```shell
vai_q_tensorflow dump --input_frozen_graph quantize_results/quantize_eval_model.pb \
                      --input_fn my_input_fn.dump_input
```

Refer to the Xilinx Model Zoo for more TensorFlow model quantization examples.
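The dump command's input_fn follows the same contract as the calibration input_fn. A minimal sketch (node name and shape are assumptions) that returns a deterministic batch, so that repeated dump runs produce comparable activations:

```python
# Hypothetical dump_input function, passed as --input_fn my_input_fn.dump_input.
import numpy as np

BATCH_SIZE = 1          # dumping typically uses few batches (see --max_dump_batches)
INPUT_NODE = "inputs"   # assumed placeholder name, matching --input_nodes

def dump_input(iter_num):
    """Return a deterministic batch so dumped activations are reproducible."""
    # Seeding per step makes each call with the same index return identical
    # data; a real implementation would instead load fixed sample images.
    rng = np.random.default_rng(seed=iter_num)
    batch = rng.random((BATCH_SIZE, 224, 224, 3), dtype=np.float32)
    return {INPUT_NODE: batch}
```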