Inference of Quantized Model - 3.5 English

Vitis AI User Guide (UG1414)

Inference of quantized model in TorchScript format

You can run the quantized model in TorchScript format (a .pt file) in the PyTorch framework. Before performing inference, import the pytorch_nndct module, because it sets up the quantized operators that the model uses. The following is example code:
import torch
import pytorch_nndct  # sets up the quantized operators used by the model

# prepare input data
input = ......

quantized_model = torch.jit.load(quantized_model_path)

# feed input data to quantized model and make inference
output = quantized_model(input)
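
The steps above can be wrapped in a small helper. This is a minimal sketch, not part of the guide: the function name run_torchscript_inference and its parameters are hypothetical, and the calls to eval() and torch.no_grad() reflect common PyTorch inference practice rather than a documented Vitis AI requirement. The imports are placed inside the function so that the definition can be loaded even where PyTorch is not installed.

```python
def run_torchscript_inference(model_path, input_data, map_location=None):
    """Load a quantized TorchScript model and run one forward pass.

    Hypothetical helper: the name and signature are illustrative only.
    """
    import torch
    # Importing pytorch_nndct sets up the quantized operators; it must
    # happen before the model is loaded and run.
    import pytorch_nndct  # noqa: F401

    model = torch.jit.load(model_path, map_location=map_location)
    model.eval()  # assumption: put the module in inference mode

    # Assumption: gradients are not needed for inference.
    with torch.no_grad():
        return model(input_data)
```

A call would look like run_torchscript_inference(quantized_model_path, input), with both arguments prepared as in the example above.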

Inference of quantized model in ONNX format

You can run the quantized model in ONNX format with the ONNX Runtime APIs.
For an ONNX model with native Quantize and DeQuantize operators, run the model directly with ONNX Runtime. The following is example code:
import onnxruntime as ort

# prepare input data
input_data = ......

ort_sess = ort.InferenceSession(quantized_model_path)
input_name = ort_sess.get_inputs()[0].name
ort_output = ort_sess.run(None, {input_name: input_data})
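
To illustrate what a Quantize and DeQuantize pair computes, the following is a minimal pure-Python sketch of linear quantization for a single value. It assumes the native operators follow the semantics of the standard ONNX QuantizeLinear and DequantizeLinear operators (round to nearest even, add the zero point, saturate to the integer range); the guide does not name the operators, so treat that mapping as an assumption. The function names, the int8 range, and the scale of 0.5 are hypothetical choices for illustration.

```python
def quantize_linear(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Map a float to an integer code: saturate(round(x / scale) + zero_point)."""
    # Python's round() rounds halves to the nearest even integer,
    # which matches the rounding mode of ONNX QuantizeLinear.
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize_linear(q, scale, zero_point=0):
    """Map an integer code back to a float: (q - zero_point) * scale."""
    return (q - zero_point) * scale


scale = 0.5  # hypothetical scale, chosen so the arithmetic is exact
for value in (1.3, 0.75, -0.25, 100.0):
    code = quantize_linear(value, scale)
    print(value, "->", code, "->", dequantize_linear(code, scale))
```

With this scale, 1.3 is stored as the code 3 and read back as 1.5, while 100.0 saturates at the code 127 and reads back as 63.5. The difference between the input and the dequantized value is the quantization error that the quantized model carries at each such operator pair.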

For an ONNX model with VAI Quantize and DeQuantize operators, you must first set up the custom operators and then run the model using onnxruntime_extensions. To set up the custom operators, call the load_vai_ops() function, imported from pytorch_nndct.apis. The following is example code:

from onnxruntime_extensions import PyOrtFunction
from pytorch_nndct.apis import load_vai_ops

# Custom ops must be set up before running the ONNX model.
load_vai_ops()

# prepare input data
input = ......

# run using onnxruntime_extensions API
run_ort = PyOrtFunction.from_model(quantized_model_path)
ort_outputs = run_ort(input)

Inference of XIR format quantized model

Currently, no tool can run a quantized model in XIR format.