(Optional) Pre-Processing on the Float Model

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2023-09-28
Version: 3.5 English

Pre-processing transforms the float32 model and prepares it for quantization. It consists of the following three optional steps:

  • Symbolic shape inference: best suited for transformer models.
  • Model optimization: uses the ONNX Runtime native library to rewrite the computation graph, merging computation nodes and eliminating redundancies to improve runtime efficiency.
  • ONNX shape inference: infers tensor shapes using the standard ONNX shape-inference facility.

The primary objective of these steps is to enhance quantization quality. The ONNX Runtime quantization tool performs optimally when tensor shapes are known. Both symbolic shape inference and ONNX shape inference play a crucial role in determining tensor shapes. Symbolic shape inference is particularly effective for transformer-based models, whereas ONNX shape inference works well with other models.
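
For illustration, the two shape-inference passes can also be run individually, outside of quant_pre_process(). The following sketch (with hypothetical file names) uses the onnx.shape_inference module and the SymbolicShapeInference helper that ships with ONNX Runtime:

import onnx
from onnx.shape_inference import infer_shapes
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

model = onnx.load("model_fp32.onnx")  # hypothetical path

# Standard ONNX shape inference: works well for most non-transformer models.
inferred = infer_shapes(model)

# Symbolic shape inference: resolves symbolic dimensions (for example, batch
# size or sequence length) that transformer models typically carry.
symbolic = SymbolicShapeInference.infer_shapes(model, auto_merge=True)

onnx.save(symbolic, "model_fp32_shapes.onnx")  # hypothetical path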

Model optimization performs certain operator fusions, making the quantization tool’s job easier. For instance, a Convolution operator followed by BatchNormalization can be fused into a single node during optimization, which enables more effective quantization.
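
As a rough illustration of this fusion (outside of quant_pre_process(), and with hypothetical file names), ONNX Runtime can be asked to save its optimized graph directly; Conv + BatchNormalization folding is part of the basic optimization level:

import onnxruntime as ort

# Sketch: have ONNX Runtime apply graph optimizations (Conv +
# BatchNormalization fusion is included at the basic level) and write the
# optimized model to disk. Paths are placeholders.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
sess_options.optimized_model_filepath = "model_fp32_opt.onnx"

# Creating the session triggers optimization and saves the result.
_ = ort.InferenceSession("model_fp32.onnx", sess_options)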

ONNX Runtime has a known limitation: model optimization cannot produce an output model larger than 2 GB. As a result, the optimization step must be skipped for such large models.
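
For such models, a sketch like the following (paths are placeholders) uses the quant_pre_process() function described below to skip optimization and store the weights as external data, keeping the output under the 2 GB limit:

from onnxruntime.quantization import shape_inference

# Sketch for a model near or above 2 GB: skip the optimization step and
# store tensor data outside the .onnx file. Paths are placeholders.
shape_inference.quant_pre_process(
    "large_model_fp32.onnx",
    "large_model_preprocessed.onnx",
    skip_optimization=True,       # avoid the 2 GB output limitation
    save_as_external_data=True,   # keep large tensors in a separate file
    all_tensors_to_one_file=True,
)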

The pre-processing API is the quant_pre_process() function in the onnxruntime.quantization.shape_inference Python module:

from onnxruntime.quantization import shape_inference

shape_inference.quant_pre_process(
    input_model_path: str,
    output_model_path: str,
    skip_optimization: bool = False,
    skip_onnx_shape: bool = False,
    skip_symbolic_shape: bool = False,
    auto_merge: bool = False,
    int_max: int = 2**31 - 1,
    guess_output_rank: bool = False,
    verbose: int = 0,
    save_as_external_data: bool = False,
    all_tensors_to_one_file: bool = False,
    external_data_location: str = "./",
    external_data_size_threshold: int = 1024,
)
input_model_path
Path to the input model file.
output_model_path
Path to the output model file.
skip_optimization
Skip the model optimization step if set to True. This might result in ONNX shape inference failure for some models.
skip_onnx_shape
Skip ONNX shape inference. Skipping all shape inferences might reduce the effectiveness of quantization because a tensor with an unknown shape cannot be quantized.
skip_symbolic_shape
Skip symbolic shape inference. Symbolic shape inference is most effective with transformer-based models. Skipping all shape inferences might reduce the effectiveness of quantization because a tensor with an unknown shape cannot be quantized.
auto_merge
For symbolic shape inference: automatically merge symbolic dimensions when a conflict occurs.
int_max
For symbolic shape inference: the maximum integer value that is treated as boundless for operators such as Slice.
guess_output_rank
Guess the output rank to be the same as that of input 0 for unknown ops.
verbose
Logs detailed information about inference. Options: 0 (off), 1 (warnings), 3 (detailed).
save_as_external_data
Save the ONNX model with tensor data stored externally.
all_tensors_to_one_file
Save all the external data to a single file.
external_data_location
The location where the external data file is saved.
external_data_size_threshold
The size threshold above which tensor data is saved as external data.
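
Putting it together, a typical invocation (with hypothetical paths) accepts the defaults, which run all three pre-processing steps:

from onnxruntime.quantization import shape_inference

# Sketch of a typical invocation: symbolic shape inference, model
# optimization, and ONNX shape inference all run with default settings.
shape_inference.quant_pre_process(
    "model_fp32.onnx",
    "model_fp32_preprocessed.onnx",
)

The resulting pre-processed model is then passed to the quantizer in place of the original float model.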