Run the following commands to quantize the model:
$ vai_q_tensorflow quantize \
    --input_frozen_graph frozen_graph.pb \
    --input_nodes ${input_nodes} \
    --input_shapes ${input_shapes} \
    --output_nodes ${output_nodes} \
    --input_fn input_fn \
    [options]
The input_nodes and output_nodes arguments are the name lists of the input and output nodes of the graph to be quantized. They serve as the start and end points of quantization. The main graph between them is quantized if it is quantizable, as shown in the following figure.
It is recommended to set --input_nodes to the last nodes of the pre-processing part and --output_nodes to the last nodes of the main graph, because some operations in the pre- and post-processing parts are not quantizable and might cause errors when the model is compiled by the Vitis AI compiler and deployed to the DPU.
The input nodes might not be the same as the placeholder nodes of the graph. If the frozen graph does not contain in-graph pre-processing, the placeholder nodes should be set as the input nodes. The input_fn should be consistent with the placeholder nodes.
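As an illustration, a minimal input_fn might look like the following sketch. The placeholder name "input", the 224x224x3 shape, the batch size of 10, and the randomly generated calibration data are all assumptions for this example; replace them with your model's actual placeholder names and a real calibration dataset.

```python
import numpy as np

def input_fn(iter):
    """Hypothetical calibration input_fn.

    Called once per calibration iteration with the iteration index;
    must return a dict mapping each placeholder (input node) name to
    a batch of preprocessed data.
    """
    # Stand-in for a real calibration set (e.g., images loaded from disk
    # and preprocessed the same way as at inference time).
    calib_images = np.random.rand(100, 224, 224, 3).astype(np.float32)
    batch = calib_images[iter * 10 : (iter + 1) * 10]
    # Keys must match the placeholder node names passed via --input_nodes.
    return {"input": batch}
```

The function is referenced by name on the command line (--input_fn input_fn), so it must be importable by the quantizer when the command runs.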
[options] stands for optional parameters. The most commonly used options are:
- weight_bit: Bit width for quantized weights and biases (default: 8).
- activation_bit: Bit width for quantized activations (default: 8).
- method: Quantization method: 0 for non-overflow, 1 for min-diffs, and 2 for min-diffs with normalization. The non-overflow method ensures that no values saturate during quantization, but the results can be affected by outliers. The min-diffs method allows saturation in order to achieve a lower overall quantization error; it is more robust to outliers and usually results in a narrower range than the non-overflow method.
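The trade-off between the two methods can be seen with a toy example. The sketch below is not the quantizer's actual implementation; it only illustrates the idea with power-of-2 scales: non-overflow picks the largest scale that keeps every value in range, while a min-diffs-style search also tries scales that saturate outliers and keeps whichever gives the smallest total error.

```python
import numpy as np

def quantize(x, scale, bit=8):
    """Fake-quantize x with a given scale: round, clip to the signed
    integer range, then dequantize back to float."""
    qmax = 2 ** (bit - 1) - 1
    qmin = -(2 ** (bit - 1))
    return np.clip(np.round(x * scale), qmin, qmax) / scale

def non_overflow_scale(x, bit=8):
    """Largest power-of-2 scale with no saturation (no value clipped)."""
    qmax = 2 ** (bit - 1) - 1
    return 2.0 ** np.floor(np.log2(qmax / np.max(np.abs(x))))

def min_diffs_scale(x, bit=8, search=8):
    """Try larger scales that saturate outliers; keep the scale with the
    smallest total squared quantization error (min-diffs-style search)."""
    best = non_overflow_scale(x, bit)
    best_err = np.sum((x - quantize(x, best, bit)) ** 2)
    for k in range(1, search):
        s = best * (2.0 ** k) if k == 1 else non_overflow_scale(x, bit) * (2.0 ** k)
        err = np.sum((x - quantize(x, s, bit)) ** 2)
        if err < best_err:
            best, best_err = s, err
    return best

# Mostly small values plus one large outlier.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 0.1, 1000), [8.0]])
s0, s1 = non_overflow_scale(x), min_diffs_scale(x)
```

With the outlier present, the searched scale s1 comes out larger than the non-overflow scale s0, i.e., the representable range is narrower but the small values are quantized more finely, which is the behavior the text describes.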