def export_onnx_model(self, output_dir, verbose)

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2023-09-28
Version: 3.5 English

Native ONNX models support only INT8 quantization and half-even rounding. When converting models from the Vitis AI quantizer to ONNX format, other quantization bit widths and additional rounding methods, such as half-up or toward-zero, cannot be exported. To address this, vai::QuantizeLinear and vai::DequantizeLinear replace the corresponding native ONNX operators when exporting ONNX models. For DequantizeLinear, the native ONNX and Vitis AI interfaces are identical. For QuantizeLinear, however, there are some differences between them, outlined in the following points:

  • ONNX has the input list (x, y_scale, y_zero_point); Vitis AI has the input list (x, valmin, valmax, scale, zero_point, method), where valmin and valmax define the quantization interval, for example, valmin=-128 and valmax=127 for INT8 symmetric quantization, and method is the rounding mode, which can be half-even, half-up, down, up, toward-zero, away-from-zero, and so on (see the rounding sketch at the end of this section).
  • You can obtain a native Quant-Dequant ONNX model by setting native_onnx=True in the following definition. If it is set to False, the exported Quant-Dequant ONNX model uses the Vitis AI QuantizeLinear and DequantizeLinear operators instead. The default value is True.

    The function exports the quantized model in ONNX format:

    def export_onnx_model(self, output_dir="quantize_result", verbose=False,
                          dynamic_batch=False, opset_version=None,
                          native_onnx=True, dump_layers=False,
                          check_model=False, opt_graph=False):
Table 1. Arguments

Argument        Description
output_dir      Directory for the quantization result and intermediate files. The default value is quantize_result.
verbose         Flag to control verbose logging. The default value is False.
dynamic_batch   Flag to make the batch dimension of the input shape dynamic or fixed. The default value is False.
opset_version   The version of the default (ai.onnx) opset to target. If not set, the latest version that is stable for the current version of PyTorch is used.
native_onnx     Export the ONNX model with native Quant-Dequant operators or with the custom Vitis AI ones. If set to True, a native Quant-Dequant ONNX model is produced; otherwise, the Vitis AI Quant-Dequant ONNX model is generated. The default value is True.
dump_layers     Dump the output of each layer in the ONNX model during runtime. The default value is False.
check_model     Check the difference in outputs between the XMODEL and ONNX models. The default value is False.
opt_graph       Optimize the ONNX graph. The default value is False.
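
For example, the following is a minimal sketch of exporting from the Vitis AI PyTorch quantizer workflow. The model, input shape, and argument choices are illustrative placeholders, and the calibration run that must precede deploy ("test") mode is omitted.

    import torch
    from pytorch_nndct.apis import torch_quantizer

    model = MyModel().eval()                   # hypothetical model
    dummy_input = torch.randn(1, 3, 224, 224)  # placeholder input shape

    # Deploy ("test") mode is assumed; calibration must already be done.
    quantizer = torch_quantizer("test", model, (dummy_input,))
    quant_model = quantizer.quant_model
    quant_model(dummy_input)                   # run a forward pass before export

    # Export with the Vitis AI Quant-Dequant operators so that non-native
    # rounding methods are preserved; set native_onnx=True (the default)
    # for native ONNX QuantizeLinear/DequantizeLinear operators instead.
    quantizer.export_onnx_model(
        output_dir="quantize_result",
        dynamic_batch=True,
        native_onnx=False,
    )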
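
To clarify the method input described earlier, the following NumPy sketch shows how the rounding mode changes the quantized result compared to the native half-even behavior. The quantize_linear function here is a hypothetical illustration of the semantics, not the actual vai::QuantizeLinear implementation.

    import numpy as np

    def quantize_linear(x, valmin, valmax, scale, zero_point, method):
        # Illustrative only: hypothetical model of vai::QuantizeLinear semantics.
        v = x / scale + zero_point
        if method == "half_even":        # native ONNX QuantizeLinear behavior
            q = np.rint(v)               # halves round to the nearest even integer
        elif method == "half_up":
            q = np.floor(v + 0.5)        # halves round up (toward +infinity)
        elif method == "toward_zero":
            q = np.trunc(v)              # drop the fractional part
        else:
            raise ValueError(f"unsupported rounding method: {method}")
        return np.clip(q, valmin, valmax).astype(np.int64)

    x = np.array([0.5, 1.5, 2.5, -2.5], dtype=np.float32)
    # INT8 symmetric quantization interval: valmin=-128, valmax=127
    print(quantize_linear(x, -128, 127, 1.0, 0, "half_even"))    # [ 0  2  2 -2]
    print(quantize_linear(x, -128, 127, 1.0, 0, "half_up"))      # [ 1  2  3 -2]
    print(quantize_linear(x, -128, 127, 1.0, 0, "toward_zero"))  # [ 0  1  2 -2]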