Vitis AI Quantizer Flow - 3.5 English

Vitis AI User Guide (UG1414)

The following figure depicts the overall model quantization flow.

Figure 1. VAI Quantizer Workflow
Note: Caffe has been deprecated as of Vitis AI 2.5. For information on Caffe, see the Vitis AI 2.0 user guide.

The Vitis AI quantizer accepts a floating-point model as input and pre-processes it by folding batch-normalization layers and removing nodes that are not required for inference. It then quantizes the weights, biases, and activations to the given bit width.
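The two pre-processing and quantization steps described above can be illustrated with a minimal sketch. It is not the Vitis AI implementation: the function names fold_batch_norm and quantize are hypothetical, and the symmetric power-of-two scale is an assumption chosen to keep the example simple.

```python
import math

def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel batch-norm parameters into the preceding layer.

    weights holds one row of weights per output channel. The returned
    (folded_weights, folded_bias) give the same inference-time output as
    applying the layer followed by batch normalization.
    """
    folded_w, folded_b = [], []
    for row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in row])
        folded_b.append((b - m) * scale + bt)
    return folded_w, folded_b

def quantize(values, bit_width=8):
    """Symmetric quantization with a power-of-two scale (an assumption here).

    Returns (integers, fraction_bits), where each real value is
    approximately integer * 2**-fraction_bits.
    """
    qmax = 2 ** (bit_width - 1) - 1
    qmin = -(2 ** (bit_width - 1))
    max_abs = max(abs(v) for v in values)
    # Pick the largest number of fractional bits that still covers max_abs.
    fraction_bits = math.floor(math.log2(qmax / max_abs)) if max_abs > 0 else 0
    scale = 2.0 ** fraction_bits
    # Round to the nearest integer and clamp to the signed range.
    ints = [min(qmax, max(qmin, round(v * scale))) for v in values]
    return ints, fraction_bits

# One output channel with two inputs, followed by batch normalization.
w, b = fold_batch_norm([[1.0, 2.0]], [0.5], gamma=[2.0], beta=[0.1],
                       mean=[0.5], var=[4.0], eps=0.0)
q_weights, frac_bits = quantize(w[0], bit_width=8)
print(w, b, q_weights, frac_bits)
```

After folding, the batch-norm node no longer exists in the graph, so only the folded weights and biases need to be quantized.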

Before the model is quantized, a step known as the inspector examines it. The inspector outputs partition information that indicates which operators can run on which device (DPU or CPU). In general, the DPU is faster than the CPU, so the goal is to run as many operators as possible on the DPU. The partition results include messages that explain why certain operators cannot be executed on the DPU. These messages help you understand the capabilities of the DPU and adapt your model to it.
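The kind of partition report described above can be pictured with a toy sketch. Everything in it is hypothetical: the capability table, the kernel-size limits, and the partition_report function are invented for illustration and do not reflect the real DPU operator list or the inspector API.

```python
# Hypothetical capability table for illustration only; it is NOT the real
# DPU operator list, and the kernel-size limits are made-up numbers.
HYPOTHETICAL_DPU_OPS = {
    "conv2d": {"max_kernel": 16},
    "maxpool2d": {"max_kernel": 8},
    "relu": {},
}

def partition_report(ops):
    """Assign each operator to "DPU" or "CPU" and record the reason."""
    report = []
    for op in ops:
        limits = HYPOTHETICAL_DPU_OPS.get(op["type"])
        if limits is None:
            device = "CPU"
            reason = f"operator type '{op['type']}' is not supported"
        elif op.get("kernel", 0) > limits.get("max_kernel", float("inf")):
            device = "CPU"
            reason = f"kernel size {op['kernel']} exceeds limit {limits['max_kernel']}"
        else:
            device, reason = "DPU", ""
        report.append({"name": op["name"], "device": device, "reason": reason})
    return report

model_ops = [
    {"name": "conv1", "type": "conv2d", "kernel": 3},
    {"name": "act1", "type": "relu"},
    {"name": "pool1", "type": "maxpool2d", "kernel": 9},
    {"name": "out", "type": "softmax"},
]
for entry in partition_report(model_ops):
    print(entry["name"], entry["device"], entry["reason"])
```

In this toy report, the reason attached to each CPU operator plays the role of the inspector messages: it points to the layer that would need to change for more of the model to run on the DPU.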

The Vitis AI quantizer runs multiple iterations of inference to calibrate the activations. These iterations capture activation statistics, which improve the accuracy of the quantized model. Calibration therefore requires an image dataset as input. Typically, the quantizer works effectively with 100–1000 calibration images. Because backpropagation is unnecessary, an unlabeled dataset is sufficient.
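As a rough illustration of why calibration needs only forward passes over unlabeled data, the sketch below tracks one simple statistic, the largest absolute activation, across calibration batches. The ActivationCalibrator class, the run_calibration helper, and the max-abs statistic are assumptions made for this example; the calibration methods actually used by the quantizer may differ.

```python
import math

class ActivationCalibrator:
    """Tracks the largest absolute activation seen across calibration batches."""

    def __init__(self, bit_width=8):
        self.bit_width = bit_width
        self.max_abs = 0.0
        self.batches_seen = 0

    def observe(self, activations):
        # Only forward-pass outputs are used: no labels and no gradients.
        self.max_abs = max(self.max_abs, max(abs(a) for a in activations))
        self.batches_seen += 1

    def fraction_bits(self):
        # Largest power-of-two scale that keeps max_abs inside the signed range.
        qmax = 2 ** (self.bit_width - 1) - 1
        if self.max_abs == 0.0:
            return 0
        return math.floor(math.log2(qmax / self.max_abs))

def run_calibration(forward, batches, bit_width=8):
    """Run inference on every calibration batch and collect statistics."""
    calibrator = ActivationCalibrator(bit_width)
    for batch in batches:
        calibrator.observe(forward(batch))
    return calibrator

def toy_forward(batch):
    # Stand-in for a real layer: scale by 2, then apply ReLU.
    return [max(0.0, 2.0 * x) for x in batch]

calibration_batches = [[0.5, -1.0], [3.0, 0.25]]
calibrator = run_calibration(toy_forward, calibration_batches)
print(calibrator.max_abs, calibrator.fraction_bits())
```

More batches give the statistic a better chance of covering the activation range seen at deployment, which is why a few hundred representative images are usually preferred over a handful.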

After calibration, the quantized model is converted into a DPU-deployable format (deploy_model.pb for vai_q_tensorflow and model_name.xmodel for vai_q_pytorch) that follows the data format of the DPU. The Vitis AI compiler can then compile this model for deployment on the DPU. Note that the standard versions of the TensorFlow and PyTorch frameworks cannot load the quantized model directly.