Quantization in ONNX Runtime refers to the linear quantization of an ONNX model. The vai_q_onnx tool is a plugin for ONNX Runtime that adds further post-training quantization (PTQ) functions for quantizing deep learning models.
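To make "linear quantization" concrete, here is a minimal sketch of the affine mapping between a float value and an 8-bit integer. The function names `quantize_linear` and `dequantize_linear` and the default signed int8 range are assumptions chosen for illustration; they are not part of the vai_q_onnx API.

```python
def quantize_linear(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an integer: q = clamp(round(x / scale) + zero_point)."""
    # Python's round() rounds halves to the nearest even integer.
    q = round(x / scale) + zero_point
    # Saturate to the representable integer range (int8 assumed by default).
    return max(qmin, min(qmax, q))


def dequantize_linear(q, scale, zero_point):
    """Recover an approximate float from the quantized integer."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    scale, zero_point = 0.5, 0          # illustrative parameters only
    q = quantize_linear(1.3, scale, zero_point)
    print(q, dequantize_linear(q, scale, zero_point))
```

The gap between the original value and the dequantized value is the quantization error, which grows with the chosen `scale`.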
Post-training quantization (PTQ) is a technique to convert a pretrained float model into a quantized model with little degradation in model accuracy.
A representative dataset is needed to run a few batches of inference on the float model and obtain the distribution of activations at each layer, a process called calibration.
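As a rough illustration of how activation statistics can be turned into quantization parameters, the sketch below tracks the minimum and maximum values seen across a few batches and derives a scale and zero point from them. It assumes a simple min-max strategy and an unsigned 8-bit range; the helper names `calibrate_min_max` and `compute_qparams` are hypothetical, plain lists of floats stand in for the activations of one layer, and vai_q_onnx may use different calibration methods.

```python
def calibrate_min_max(batches):
    """Track the smallest and largest activation seen across all batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        for value in batch:
            lo = min(lo, value)
            hi = max(hi, value)
    return lo, hi


def compute_qparams(lo, hi, qmin=0, qmax=255):
    """Derive scale and zero point from an observed range (uint8 assumed)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate range: fall back to a neutral scale
    zero_point = int(round(qmin - lo / scale))
    # Keep the zero point inside the integer range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


if __name__ == "__main__":
    # Stand-in activations; in practice these come from float-model inference.
    batches = [[-1.0, 0.5], [2.0, 1.0]]
    lo, hi = calibrate_min_max(batches)
    print(compute_qparams(lo, hi))
```

A min-max range is sensitive to outliers, which is one reason the calibration data should be representative of real inputs.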