The overall model quantization flow is detailed in the following figure.
The Vitis AI quantizer takes a floating-point model as input, performs pre-processing (folding batch normalization layers and removing nodes not required for inference), and then quantizes the weights, biases, and activations to the given bit width.
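As an illustration of the two steps just described, the following is a minimal sketch in plain Python, not the Vitis AI implementation. It assumes a single scalar channel, a symmetric uniform quantization scheme, and hypothetical helper names (`fold_batchnorm`, `quantize`) chosen only for this example; the actual quantizer may use a different scheme and operates on whole graphs.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm into the preceding layer (hypothetical helper).

    The original computation is
        y = gamma * ((weight * x + bias) - mean) / sqrt(var + eps) + beta
    which equals folded_weight * x + folded_bias.
    """
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


def quantize(values, bit_width=8):
    """Symmetric uniform quantization to signed integers (assumed scheme)."""
    qmax = 2 ** (bit_width - 1) - 1
    max_abs = max(abs(v) for v in values)
    step = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(-qmax - 1, min(qmax, round(v / step))) for v in values]
    return codes, step


# Fold a batch norm with illustrative parameters, then quantize some weights.
folded_w, folded_b = fold_batchnorm(2.0, 1.0, gamma=1.0, beta=0.0,
                                    mean=1.0, var=1.0, eps=0.0)
codes, step = quantize([-1.0, 0.25, 1.0], bit_width=8)
```

Multiplying an integer code by `step` gives back an approximation of the original floating-point value; a smaller bit width gives a coarser `step` and therefore a larger approximation error.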
To capture activation statistics and improve the accuracy of quantized models, the Vitis AI quantizer must run several iterations of inference to calibrate the activations. A calibration image dataset is therefore required as input. Generally, the quantizer works well with 100–1000 calibration images. Because calibration requires no backpropagation, an unlabeled dataset is sufficient.
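The calibration idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the quantizer's actual algorithm: it assumes the statistic collected is the largest activation magnitude, and `calibrate` and `toy_model` are hypothetical names standing in for the quantizer and a network layer. It shows why only forward passes over unlabeled inputs are needed.

```python
def calibrate(model_fn, calibration_batches, bit_width=8):
    """Derive a quantization step for activations from forward passes only."""
    max_abs = 0.0
    for batch in calibration_batches:      # inputs only, no labels
        activations = model_fn(batch)      # inference, no backpropagation
        max_abs = max(max_abs, max(abs(a) for a in activations))
    qmax = 2 ** (bit_width - 1) - 1
    return max_abs / qmax if max_abs > 0 else 1.0


def toy_model(batch):
    """Hypothetical stand-in for a layer: ReLU(2x - 1)."""
    return [max(0.0, 2.0 * x - 1.0) for x in batch]


# Three small unlabeled batches play the role of the calibration dataset.
batches = [[0.0, 0.5, 1.0], [2.0, 0.25], [1.5]]
activation_step = calibrate(toy_model, batches)
```

Feeding more representative inputs makes the observed range a better estimate of the range seen at deployment, which is the reason a calibration set of reasonable size is recommended.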
After calibration, the quantized model is transformed into a DPU-deployable model (named deploy_model.pb for vai_q_tensorflow, model_name.xmodel for vai_q_pytorch, and deploy.prototxt / deploy.caffemodel for vai_q_caffe), which follows the data format of a DPU. This model can then be compiled by the Vitis AI compiler and deployed to the DPU. The quantized model cannot be loaded by the standard versions of the TensorFlow, PyTorch, or Caffe frameworks.