Generally, there is a small accuracy loss after quantization, but for some networks, such as MobileNets, the accuracy loss can be large. In this situation, try fast finetuning first. If fast finetuning still does not yield satisfactory results, quantize finetuning can be used to further improve the accuracy of the quantized models.
The AdaQuant algorithm [1] uses a small set of unlabeled data. It not only calibrates the activations but also finetunes the weights. The Vitis AI quantizer implements this algorithm and calls it "fast finetuning" or "advanced calibration." Though slightly slower, fast finetuning can achieve better accuracy than quantize calibration. Similar to quantize finetuning, each run of fast finetuning produces a different result.
Fast finetuning does not train the model and needs only a limited number of iterations. For classification models on the ImageNet dataset, 1,000 images are enough. Fast finetuning requires only small modifications to the model evaluation script; there is no need to set up an optimizer for training. To use fast finetuning, a function for model forward iteration is needed, and it is called during fast finetuning. Re-calibration with the original inference code is recommended.
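Before fast finetuning, the quantizer and the quantized model it produces are created from the float model, just as in ordinary quantize calibration. The following is a minimal sketch of that setup, assuming the pytorch_nndct torch_quantizer API; the float model and input shape here are illustrative placeholders, not part of the snippet below.

# Sketch only: creating the quantizer and quant_model used in the snippet below.
# Assumes the pytorch_nndct torch_quantizer API; model and input shape are illustrative.
import torch
from pytorch_nndct.apis import torch_quantizer
from torchvision.models import resnet18

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = resnet18(pretrained=True).to(device)
dummy_input = torch.randn(1, 3, 224, 224)

quantizer = torch_quantizer(quant_mode, model, (dummy_input,), device=device)
quant_model = quantizer.quant_model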
You can find a complete example in the open-source example code; the snippet below shows the fast-finetune part of the flow.
# fast finetune model or load finetuned parameter before test
if fast_finetune:
    ft_loader, _ = load_data(
        subset_len=1024,
        train=False,
        batch_size=batch_size,
        sample_method=None,
        data_dir=args.data_dir,
        model_name=model_name)
    if quant_mode == 'calib':
        quantizer.fast_finetune(evaluate, (quant_model, ft_loader, loss_fn))
    elif quant_mode == 'test':
        quantizer.load_ft_param()
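The evaluate argument passed to fast_finetune above is the forward-iteration function mentioned earlier; it is not shown in the snippet. A minimal sketch, reusing the device from the setup sketch and assuming an ImageNet-style data loader and loss function, could look like this:

# Sketch only: a minimal forward-iteration function for fast finetuning.
# Its signature matches the (quant_model, ft_loader, loss_fn) tuple passed to
# fast_finetune above; the body simply re-runs the ordinary evaluation loop
# over the calibration data.
def evaluate(model, val_loader, loss_fn):
    model.eval()
    model = model.to(device)
    total_loss = 0.0
    for images, labels in val_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        total_loss += loss_fn(outputs, labels).item()
    return total_loss / len(val_loader)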
Run calibration with fast finetuning first, then test with the finetuned parameters:
python resnet18_quant.py --quant_mode calib --fast_finetune
python resnet18_quant.py --quant_mode test --fast_finetune
1. Itay Hubara et al., Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming, arXiv:2006.10518, 2020.