Native ONNX quantization operators support only INT8 quantization and half-even rounding. When converting models from the Vitis AI quantizer to ONNX format, other quantization bit widths and additional rounding methods, such as half-up or toward-zero, cannot be exported. To address this, vai::QuantizeLinear and vai::DequantizeLinear replace the corresponding native ONNX operators when exporting ONNX models. For DequantizeLinear, the native ONNX and Vitis AI interfaces are identical. For QuantizeLinear, however, they differ in the following way:
- ONNX has the input list (x, y_scale, y_zero_point); Vitis AI has the input list (x, valmin, valmax, scale, zero_point, method), where valmin and valmax define the quantization interval (for example, valmin=-128 and valmax=127 for INT8 symmetric quantization) and method is the rounding mode, which can be half-even, half-up, down, up, toward zero, away from zero, and so on (see the sketch after this list).
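To make the difference concrete, the following NumPy sketch contrasts the two input lists. The function names, the rounding-mode strings, and the assumption that scale divides the input (mirroring the native operator's convention) are illustrative only, not the actual Vitis AI implementation.

```python
import numpy as np

def native_quantize_linear(x, y_scale, y_zero_point):
    # Native ONNX QuantizeLinear: half-even rounding only, fixed INT8 range.
    q = np.rint(x / y_scale) + y_zero_point   # np.rint rounds ties to even
    return np.clip(q, -128, 127).astype(np.int8)

def vai_quantize_linear(x, valmin, valmax, scale, zero_point, method):
    # Extra inputs: valmin/valmax set the quantization interval, and method
    # selects the rounding mode. The scale convention (q = x / scale +
    # zero_point) is assumed here to mirror the native operator.
    rounders = {
        "half_even": np.rint,                    # ties to even
        "half_up": lambda v: np.floor(v + 0.5),  # ties toward +infinity
        "down": np.floor,
        "up": np.ceil,
        "toward_zero": np.trunc,
    }
    q = rounders[method](x / scale) + zero_point
    return np.clip(q, valmin, valmax)            # e.g., -128..127 for INT8
```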
Obtaining a native Quant-Dequant ONNX model is possible by setting native_onnx=True in the following definition. If it is set to False, the exported Quant-Dequant ONNX model uses the Vitis AI QuantizeLinear and DequantizeLinear operators. The default value is True.
The following function exports the quantized model in ONNX format:

```python
def export_onnx_model(self, output_dir="quantize_result", verbose=False,
                      dynamic_batch=False, opset_version=None, native_onnx=True,
                      dump_layers=False, check_model=False, opt_graph=False):
```
| Argument | Description |
|---|---|
| output_dir | Directory for the quantization result and intermediate files. The default value is quantize_result. |
| verbose | Flag to control verbose logging. The default value is False. |
| dynamic_batch | Flag that makes the batch dimension of the input shape dynamic. The default value is False. |
| opset_version | The version of the default (ai.onnx) opset to target. If not set, the latest version that is stable for the current version of PyTorch is used. |
| native_onnx | Export the ONNX model with native Quant-Dequant operators or custom ones. If set to True, a native Quant-Dequant ONNX model is produced; otherwise, the Vitis AI Quant-Dequant ONNX model is generated. The default value is True. |
| dump_layers | Dump the output of each layer in the ONNX model during runtime. The default value is False. |
| check_model | Check the difference in outputs between the XMODEL and ONNX models. The default value is False. |
| opt_graph | Optimize the ONNX graph. The default value is False. |
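As an illustration, a typical call might look like the sketch below; the quantizer object is assumed to be the Vitis AI PyTorch quantizer created earlier in the quantization flow.

```python
# Hypothetical usage: `quantizer` is assumed to already hold a calibrated
# Vitis AI PyTorch quantizer instance.
quantizer.export_onnx_model(
    output_dir="quantize_result",
    dynamic_batch=True,    # export with a dynamic batch dimension
    native_onnx=False,     # use vai::QuantizeLinear / vai::DequantizeLinear
    check_model=True,      # compare XMODEL and ONNX model outputs
)
```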