vai_q_onnx Usage - 3.5 English

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2023-09-28
Version: 3.5 English

vai_q_onnx.quantize_static(
    model_input,
    model_output,
    calibration_data_reader,
    quant_format=vai_q_onnx.VitisQuantFormat.FixNeuron,
    calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
    input_nodes=[],
    output_nodes=[],
    op_types_to_quantize=None,
    per_channel=False,
    reduce_range=False,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
    nodes_to_quantize=None,
    nodes_to_exclude=None,
    optimize_model=True,
    use_external_data_format=False,
    extra_options=None)
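For example, the following is a minimal sketch of a DPU-targeted call. The file paths, the input tensor name, and the RandomDataReader helper are illustrative assumptions, not part of the API; CalibrationDataReader and QuantType come from onnxruntime.quantization.

import numpy as np
import vai_q_onnx
from onnxruntime.quantization import CalibrationDataReader, QuantType

class RandomDataReader(CalibrationDataReader):
    # Feeds a fixed number of random batches for a quick calibration test.
    # The input name and shape are placeholders for this sketch.
    def __init__(self, input_name="input", shape=(1, 3, 224, 224), num_batches=8):
        self._batches = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(num_batches)]
        )

    def get_next(self):
        # Return None when the data is exhausted, as the reader protocol requires.
        return next(self._batches, None)

vai_q_onnx.quantize_static(
    "float_model.onnx",        # model_input (placeholder path)
    "quantized_model.onnx",    # model_output (placeholder path)
    RandomDataReader(),
    quant_format=vai_q_onnx.VitisQuantFormat.FixNeuron,
    calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)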

Arguments

model_input
File path of the model to quantize.
model_output
File path of the quantized model.
calibration_data_reader
A calibration data reader. It enumerates calibration data and generates inputs for the original model. If you want to use random data for a quick test, you can set calibration_data_reader to None.
quant_format
  • QOperator: quantizes the model with quantized operators directly.
  • QDQ: quantizes the model by inserting QuantizeLinear/DeQuantizeLinear on the tensor. Supports only 8-bit quantization.
  • VitisQuantFormat.QDQ: quantizes the model by inserting VAIQuantizeLinear/VAIDeQuantizeLinear on the tensor. Supports a wider range of bit widths and configurations.
  • VitisQuantFormat.FixNeuron: quantizes the model by inserting FixNeuron (a composition of QuantizeLinear and DeQuantizeLinear) on the tensor.
calibrate_method
For DPU devices, set calibrate_method to vai_q_onnx.PowerOfTwoMethod.NonOverflow or vai_q_onnx.PowerOfTwoMethod.MinMSE to apply power-of-2 scale quantization. PowerOfTwoMethod currently supports two methods, MinMSE and NonOverflow; the default is MinMSE.
input_nodes
A list of strings. Names of the start nodes to be quantized. Nodes before these start nodes in the model are not optimized or quantized. For example, use this argument to skip pre-processing nodes or to leave the first node unquantized. The default value is [].
output_nodes
A list of strings. Names of the end nodes to be quantized. Nodes after these end nodes in the model are not optimized or quantized. For example, use this argument to skip post-processing nodes or to leave the last node unquantized. The default value is []. A usage sketch follows.
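For instance, a sketch with hypothetical node names ("conv1", "fc_out") and a data_reader like the one defined earlier; everything outside the subgraph between these nodes stays in floating point:

vai_q_onnx.quantize_static(
    "float_model.onnx",
    "quantized_model.onnx",
    data_reader,                  # any CalibrationDataReader, as above
    calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
    input_nodes=["conv1"],        # hypothetical first node to quantize
    output_nodes=["fc_out"],      # hypothetical last node to quantize
)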
op_types_to_quantize
Specifies the types of operators to quantize, such as ['Conv'] to quantize only Conv operators. By default, all supported operators are quantized.
per_channel
Quantizes weights per channel. For DPU, this must be set to False because the DPU does not currently support per-channel quantization.
reduce_range
Quantizes weights with 7 bits. For DPU, reduce_range is not supported, so this must be set to False.
activation_type
Quantization data type of activations. For DPU, this must be set to QuantType.QInt8.
weight_type
Quantization data type of weights. For DPU, this must be set to QuantType.QInt8. For more details on data type selection, refer to https://onnxruntime.ai/docs/performance/quantization.html.
nodes_to_quantize
List of node names to quantize. When this list is not None, only the nodes in it are quantized.
nodes_to_exclude
List of node names to exclude. When this list is not None, the nodes in it are excluded from quantization. A sketch combining these filters follows.
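These filters can be combined. In the following sketch the node name "conv_head" is hypothetical and data_reader is a CalibrationDataReader as before:

vai_q_onnx.quantize_static(
    "float_model.onnx",
    "quantized_model.onnx",
    data_reader,
    calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
    op_types_to_quantize=["Conv", "Gemm"],   # restrict quantization by operator type
    nodes_to_exclude=["conv_head"],          # hypothetical node kept in floating point
)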
optimize_model
Optimizes the model before quantization. This option will be deprecated soon. It is not recommended because optimization changes the computation graph, making it difficult to debug quantization loss.
use_external_data_format
Option used for large models (>2 GB). The default value is False.
extra_options
Key-value pair dictionary for various options in different cases. Currently used pairs:
ActivationSymmetric
Symmetrizes calibration data for activations. The default value is False. When calibrate_method is a PowerOfTwoMethod, ActivationSymmetric should always be set to True.
WeightSymmetric
Symmetrizes calibration data for weights. The default value is True. When calibrate_method is a PowerOfTwoMethod, WeightSymmetric should always be set to True.
ForceQuantizeNoInputCheck
By default, some latent operators, such as MaxPool and Transpose, are not quantized if their input is not already quantized. Setting this to True forces such operators to always quantize their input and generate quantized output. This behavior can still be disabled per node using nodes_to_exclude.
MatMulConstBOnly
The default value is False for static mode. If enabled, only MatMul operators with a constant B input are quantized.
AddQDQPairToWeight
The default value is False: the floating-point weight is quantized and fed to a single inserted DeQuantizeLinear node. If True, the weight remains in floating point and a QuantizeLinear/DeQuantizeLinear pair is inserted on it. When calibrate_method is a PowerOfTwoMethod, QuantizeLinear and DeQuantizeLinear must always appear as a pair, so a QDQ pair must be added to the weight and AddQDQPairToWeight should always be set to True.
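Putting the PowerOfTwoMethod requirements above together, a sketch of a typical extra_options setting for a DPU flow (data_reader as in the earlier examples):

vai_q_onnx.quantize_static(
    "float_model.onnx",
    "quantized_model.onnx",
    data_reader,
    calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
    extra_options={
        "ActivationSymmetric": True,   # required for power-of-two scales
        "WeightSymmetric": True,       # required for power-of-two scales
        "AddQDQPairToWeight": True,    # keep QuantizeLinear/DeQuantizeLinear paired
    },
)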