Inference of quantized model in TorchScript format
You can run the quantized model in TorchScript format, a .pt file, in the
PyTorch framework. Before performing inference, import the pytorch_nndct module;
it sets up the quantized operators used by this model. The following is example
code:
import torch
import pytorch_nndct  # sets up the quantized operators used by the model
# prepare input data
input = ......
quantized_model = torch.jit.load(quantized_model_path)
# feed input data to quantized model and make inference
output = quantized_model(input)
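For reference, the following is a more complete sketch of the same flow. The model path and the 1 x 3 x 224 x 224 input are illustrative placeholders and should be replaced with your own quantized model file and preprocessed data.
import torch
import pytorch_nndct  # sets up the quantized operators used by the model

# Placeholder path to the TorchScript quantized model
quantized_model_path = "quantize_result/MyModel_int.pt"

# Random data standing in for a real preprocessed input
input = torch.randn(1, 3, 224, 224)

# Load the quantized model and run inference
quantized_model = torch.jit.load(quantized_model_path)
quantized_model.eval()
with torch.no_grad():
    output = quantized_model(input)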
Inference of quantized model in ONNX format
You can run the quantized model in ONNX format with ONNX Runtime APIs. For an ONNX model with native Quantize and
DeQuantize operators, you can run the model directly with ONNX Runtime. The following is
example code:
import onnxruntime as ort
# prepare input data
input_data = ......
ort_sess = ort.InferenceSession(quantized_model_path)
input_name = ort_sess.get_inputs()[0].name
ort_output = ort_sess.run(None, {input_name: input_data})
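As before, a fuller sketch of this flow follows. The model path, the input shape, and the random input data are placeholders standing in for your own quantized ONNX model and real preprocessed data.
import numpy as np
import onnxruntime as ort

# Placeholder path to the quantized ONNX model
quantized_model_path = "quantize_result/MyModel_int.onnx"

# Random float32 data standing in for a real preprocessed input
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Create an ONNX Runtime session and run inference
ort_sess = ort.InferenceSession(quantized_model_path)
input_name = ort_sess.get_inputs()[0].name
ort_output = ort_sess.run(None, {input_name: input_data})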
For an ONNX model with VAI Quantize and DeQuantize operators, you must first set up the custom operators and then run the model with onnxruntime_extensions. The setup is done by calling the load_vai_ops() function, imported from pytorch_nndct.apis. The following is example code:
from onnxruntime_extensions import PyOrtFunction
from pytorch_nndct.apis import load_vai_ops
# Before running the ONNX model, the custom operators must be set up.
load_vai_ops()
# prepare input data
input = ......
# run using onnxruntime_extensions API
run_ort = PyOrtFunction.from_model(quantized_model_path)
ort_outputs = run_ort(input)
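The following sketch fills in the same flow end to end. The model path and the random 1 x 3 x 224 x 224 float32 input are placeholders for your quantized ONNX model and real preprocessed data.
import numpy as np
from onnxruntime_extensions import PyOrtFunction
from pytorch_nndct.apis import load_vai_ops

# Register the VAI custom operators before loading the model
load_vai_ops()

# Placeholder path to the quantized ONNX model with VAI operators
quantized_model_path = "quantize_result/MyModel_int.onnx"

# Random float32 data standing in for a real preprocessed input
input = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Build a callable from the model and run inference
run_ort = PyOrtFunction.from_model(quantized_model_path)
ort_outputs = run_ort(input)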
Inference of XIR format quantized model
So far, the quantized model in XIR format cannot be run by any tool.