Quantization generally introduces a slight accuracy loss, but certain networks, such as MobileNets, can experience a more significant drop. In such cases, first try fast fine-tuning to improve the accuracy of the quantized model. If fast fine-tuning still does not produce satisfactory results, consider Quantization Aware Training (QAT), which trains the model with quantization-aware optimizations to recover accuracy further.
The AdaQuant algorithm uses a small set of unlabeled data for activation calibration and weight fine-tuning. The Vitis AI quantizer incorporates this algorithm as "fast fine-tuning." Although slightly slower, fast fine-tuning can yield better accuracy than plain post-training quantization. As with Quantization Aware Training (QAT), each run of fast fine-tuning might produce a different result.
Fast fine-tuning does not train the model and requires only a limited number of iterations; for classification models on the ImageNet dataset, 5120 images are sufficient. The data used in the process does not need annotations. The primary requirement is modifying the model evaluation script; there is no need to set up an optimizer for training. Fast fine-tuning needs a function that performs model forwarding iteration, which is called during the process (see the sketch below). After fast fine-tuning, re-calibrating with the original inference code is recommended to ensure accuracy.
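As a rough illustration, such a forwarding function might look like the following minimal sketch. The signature mirrors the (quant_model, ft_loader, loss_fn) tuple passed to quantizer.fast_finetune later in this section; the body is an assumed, typical evaluation loop, not the exact code from the example script.

import torch

# Minimal sketch of a forwarding function for fast fine-tuning (assumed,
# illustrative). It only iterates the model over the data; no optimizer
# is involved.
def evaluate(model, data_loader, loss_fn):
    model.eval()
    total_loss, batches = 0.0, 0
    with torch.no_grad():
        for images, labels in data_loader:
            outputs = model(images)  # forward pass only
            total_loss += loss_fn(outputs, labels).item()
            batches += 1
    return total_loss / max(batches, 1)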
You can find a complete example in the open-source Vitis AI examples.
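The snippet below assumes a quantizer has already been created from the float model. A minimal setup sketch using the pytorch_nndct API might look like this (the dummy input shape is an assumption for an ImageNet-style model):

import torch
from pytorch_nndct.apis import torch_quantizer

# Create the quantizer from the float model; quant_mode is 'calib' or 'test'.
dummy_input = torch.randn(1, 3, 224, 224)  # assumed ImageNet-style input
quantizer = torch_quantizer(quant_mode, model, (dummy_input,))
quant_model = quantizer.quant_model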
# fast fine-tune the model, or load fine-tuned parameters before the test
if fast_finetune:
    ft_loader, _ = load_data(
        subset_len=5120,
        train=False,
        batch_size=batch_size,
        sample_method='random',
        data_dir=args.data_dir,
        model_name=model_name)
    if quant_mode == 'calib':
        # run fast fine-tuning with the forwarding function
        quantizer.fast_finetune(evaluate, (quant_model, ft_loader, loss_fn))
    elif quant_mode == 'test':
        # load the fine-tuned parameters saved by the 'calib' run
        quantizer.load_ft_param()
Fine-tune the quantized model during calibration:

python resnet18_quant.py --quant_mode calib --fast_finetune

Test the fine-tuned quantized model:

python resnet18_quant.py --quant_mode test --fast_finetune

Deploy the fine-tuned quantized model (a single batch of size 1 is forwarded to export the deployable model):

python resnet18_quant.py --quant_mode test --fast_finetune --subset_len 1 --batch_size 1 --deploy