The following table shows the vai_q_tensorflow
options.
Name | Type | Description |
---|---|---|
Common Configuration | ||
--input_frozen_graph | String | TensorFlow frozen inference GraphDef file for the floating-point model. It is used for post-training quantization. |
--input_nodes | String | Specifies the name list of input nodes of the quantize graph, used together with --output_nodes, separated by commas. Input nodes and output nodes are the starting and ending points of quantization; the subgraph between them is quantized if it is quantizable. Recommended: Set --input_nodes to the last nodes of pre-processing and --output_nodes to the last nodes of post-processing, because some of the operations required for pre- and post-processing are not quantizable and might cause errors when the model is compiled by the Vitis AI compiler and deployed to the DPU. The input nodes might not be the same as the placeholder nodes of the graph. |
--output_nodes | String | Specifies the name list of output nodes of the quantize graph, used together with --input_nodes, separated by commas. Input nodes and output nodes are the starting and ending points of quantization; the subgraph between them is quantized if it is quantizable. Recommended: Set --input_nodes to the last nodes of pre-processing and --output_nodes to the last nodes of post-processing, because some of the operations required for pre- and post-processing are not quantizable and might cause errors when the model is compiled by the Vitis AI compiler and deployed to the DPU. |
--input_shapes | String | Specifies the shape list of input nodes. It must be a four-dimensional shape for each node, separated by commas, for example, 1,224,224,3. An unknown batch size is supported, for example, ?,224,224,3. In case of multiple input nodes, assign the shape list of each node separated by colons, for example, ?,224,224,3:?,300,300,1. |
--input_fn | String | Provides input data for the graph when used with the calibration dataset. The function format is module_name.input_fn_name (for example, my_input_fn.calib_input). The input_fn takes an int object as input, indicating the calibration step number, and returns a dict of (placeholder_name, numpy.array) pairs for each call, which is then fed into the placeholder nodes of the model (see the sketch after this table). Note: You do not need to perform in-graph pre-processing again in input_fn because the subgraph before --input_nodes remains during quantization. Remove the pre-defined input functions (including default and random) because they are not commonly used. The pre-processing part, which is not in the graph file, should be handled in input_fn. |
Quantize Configuration | ||
--weight_bit | Int32 | Specifies the bit width for quantized weight and bias. Default value: 8 |
--activation_bit | Int32 | Specifies the bit width for quantized activation. Default value: 8 |
--nodes_bit | String | Specifies the bit width of nodes. Node names and bit widths form a pair of parameters joined by a colon; the parameter pairs are comma separated. When the name of a conv op is specified, only the weights of that conv op are quantized using the specified bit width. For example, conv1/Relu:16,conv1/weights:8,conv1:16. |
--method | Int32 | Specifies the method for quantization. 0: non-overflow method, which ensures no values are saturated during quantization (sensitive to outliers). 1: min-diffs method, which allows saturation to minimize the quantization difference (higher tolerance to outliers). Default value: 1 |
--nodes_method | String | Specifies the method of nodes. Node names and methods form a pair of parameters joined by a colon; the parameter pairs are comma separated. When the name of a conv op is specified, only the weights of that conv op are quantized using the specified method, for example, 'conv1/Relu:1,depthwise_conv1/weights:2,conv1:1'. |
--calib_iter | Int32 | Specifies the number of calibration iterations. The total number of images used for calibration = calib_iter * batch_size. Default value: 100 |
--ignore_nodes | String | Specifies the list of nodes to be ignored during quantization. Ignored nodes are left unquantized during quantization. |
--skip_check | Int32 | If set to 1, the check of the floating-point model is skipped. Useful when only part of the input model is quantized. Range: [0, 1] Default value: 0 |
--align_concat | Int32 | Specifies the strategy for aligning the input quantize position for concat nodes. Set to 0 to align all concat nodes, 1 to align only the output concat nodes, and 2 to disable alignment. Default value: 0 |
--align_pool | Int32 | Specifies the strategy for aligning the input quantize position for maxpool/avgpool nodes. Set to 0 to align all maxpool/avgpool nodes, 1 to align only the output maxpool/avgpool nodes, and 2 to disable alignment. Default value: 0 |
--simulate_dpu | Int32 | Set to 1 to enable DPU simulation. The behavior of the DPU differs from TensorFlow for some operations. For example, the division in LeakyRelu and AvgPooling is replaced by bit-shifting, so there might be a slight difference between DPU outputs and CPU/GPU outputs. The vai_q_tensorflow quantizer simulates the behavior of these operations if this flag is set to 1. Range: [0, 1] Default value: 1 |
--adjust_shift_bias | Int32 | Specifies the strategy for the shift bias check and adjustment for the DPU compiler. Set to 0 to disable the check and adjustment, 1 to enable them with static constraints, and 2 to enable them with dynamic constraints. Default value: 1 |
--adjust_shift_cut | Int32 | Specifies the strategy for the shift cut check and adjustment for the DPU compiler. Set to 0 to disable the check and adjustment, and 1 to enable them with static constraints. Default value: 1 |
--arch_type | String | Specifies the arch type for the fixed neuron. DEFAULT means the quantization range of both weights and activations is [-128, 127]. 'DPUCADF8H' means the weight quantization range is [-128, 127] while the activation range is [-127, 127]. |
--output_dir | String | Specifies the directory to save the quantization results. Default value: "./quantize_results" |
--max_dump_batches | Int32 | Specifies the maximum number of batches for dumping. Default value: 1 |
--dump_float | Int32 | If set to 1, the float weights and activations are dumped. Range: [0, 1] Default value: 0 |
--dump_input_tensors | String | Specifies the graph's input tensor name when the graph entrance is not a placeholder. A placeholder is added before the specified input tensor so that input_fn can feed data. |
--scale_all_avgpool | Int32 | Set to 1 to enable scaling of the AvgPooling op output to simulate the DPU. Only kernel_size <= 64 is scaled. This operation does not affect special cases such as kernel_size=3,5,6,7,14. Default value: 1 |
--do_cle | Int32 | Set to 1 to enable cross layer equalization, which adjusts the weights distribution to improve quantization results. Range: [0, 1] Default value: 0 |
--replace_relu6 | Int32 | Set to 1 to replace ReLU6 with ReLU. Available only when do_cle=1. Default value: 1 |
--replace_sigmoid | Int32 | Set to 1 to enable replacing sigmoid with hard sigmoid. Default value: 0 |
--replace_softmax | Int32 | Set to 1 to enable replacing softmax with hard softmax. Default value: 0 |
--convert_datatype | Int32 | Default value: 0 |
--output_format | String | Specifies the format in which to save the quantized model: pb for saving a TensorFlow frozen pb, onnx for saving an ONNX model. Default value: 'pb' |
Session Configurations | ||
--gpu | String | Specifies GPU device IDs used for quantization, separated by commas. |
--gpu_memory_fraction | Float | Specifies the GPU memory fraction used for quantization, between 0 and 1. Default value: 0.5 |
Others | ||
--help | | Shows all available vai_q_tensorflow options. |
--version | | Shows the vai_q_tensorflow version information. |
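
Where the --input_fn contract above needs a concrete shape, the following minimal Python sketch shows one way to satisfy it. The module name my_input_fn.py, the calib_list.txt file, the batch size, and the placeholder name inputs are illustrative assumptions, not part of the tool:

```python
# my_input_fn.py -- minimal input_fn sketch; file, function, and node names
# are illustrative. vai_q_tensorflow calls the function once per calibration
# step and feeds the returned dict into the graph's placeholder nodes.
import numpy as np
from PIL import Image  # assumes Pillow is available for image loading

BATCH_SIZE = 32  # hypothetical calibration batch size
calib_images = open("calib_list.txt").read().splitlines()  # hypothetical image path list

def calib_input(iter):
    """Return {placeholder_name: numpy.array} for calibration step `iter`."""
    batch = []
    for i in range(BATCH_SIZE):
        path = calib_images[(iter * BATCH_SIZE + i) % len(calib_images)]
        img = Image.open(path).convert("RGB").resize((224, 224))
        # Out-of-graph pre-processing belongs here; this example scales to [0, 1].
        batch.append(np.asarray(img, dtype=np.float32) / 255.0)
    # The key must match the placeholder/--input_nodes name; the array shape
    # must be consistent with --input_shapes (here ?,224,224,3).
    return {"inputs": np.stack(batch)}
```

With --input_fn my_input_fn.calib_input and --calib_iter 100, the quantizer calls calib_input(0) through calib_input(99), so 100 * BATCH_SIZE images are used for calibration.
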
Examples

Show help:

```
vai_q_tensorflow --help
```

Quantize:

```
vai_q_tensorflow quantize --input_frozen_graph frozen_graph.pb \
                          --input_nodes inputs \
                          --output_nodes predictions \
                          --input_shapes ?,224,224,3 \
                          --input_fn my_input_fn.calib_input
```

Dump the quantized model:

```
vai_q_tensorflow dump --input_frozen_graph quantize_results/quantize_eval_model.pb \
                      --input_fn my_input_fn.dump_input
```
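
The dump command's --input_fn follows the same calling convention as the calibration one, so the my_input_fn.dump_input referenced above can, as a minimal sketch, reuse the feeder from the earlier example (again assuming the hypothetical my_input_fn.py module):

```python
# my_input_fn.py (continued) -- dump_input is illustrative; it follows the same
# (step) -> dict contract as calib_input. --max_dump_batches limits how many of
# these batches are dumped.
def dump_input(iter):
    return calib_input(iter)
```
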
Refer to AMD Model Zoo for more TensorFlow model quantization examples.