Quantization generally introduces a slight accuracy loss, but certain networks, such as MobileNets, can experience a more significant drop. In such cases, first try fast fine-tuning to improve the accuracy of the quantized model. If fast fine-tuning still does not produce satisfactory results, consider Quantization Aware Training (QAT), which trains the model with quantization-aware optimizations to recover accuracy further.
The AdaQuant algorithm uses a small set of unlabeled data for activation calibration and weight fine-tuning. The Vitis AI quantizer incorporates this algorithm as "fast fine-tuning." Although slightly slower, fast fine-tuning can yield better accuracy than plain post-training quantization. As with Quantization Aware Training (QAT), each run of fast fine-tuning might produce a different result.
Fast fine-tuning does not train the model and requires only a limited number of iterations; for classification models on the ImageNet dataset, 5120 images are sufficient. The data used in the process does not need annotations. The primary requirement is modifying the model evaluation script; there is no need to set up an optimizer for training. Fast fine-tuning needs a function that performs model forwarding iteration, which is called during the process (see the sketch below). After fast fine-tuning, re-calibrating with the original inference code is recommended to ensure accuracy.
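As a rough illustration, such a forwarding function might look like the following minimal sketch. The signature mirrors the (quant_model, ft_loader, loss_fn) tuple passed to quantizer.fast_finetune later in this section; the body is an assumed, typical evaluation loop, not the exact code from the example script.

import torch

# Minimal sketch of a forwarding function for fast fine-tuning (assumed,
# illustrative). It only iterates the model over the data; no optimizer
# is involved.
def evaluate(model, data_loader, loss_fn):
    model.eval()
    total_loss, batches = 0.0, 0
    with torch.no_grad():
        for images, labels in data_loader:
            outputs = model(images)  # forward pass only
            total_loss += loss_fn(outputs, labels).item()
            batches += 1
    return total_loss / max(batches, 1)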
You can find a complete example in the open-source Vitis AI examples.
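The snippet below assumes a quantizer has already been created from the float model. A minimal setup sketch using the pytorch_nndct API might look like this (the dummy input shape is an assumption for an ImageNet-style model):

import torch
from pytorch_nndct.apis import torch_quantizer

# Create the quantizer from the float model; quant_mode is 'calib' or 'test'.
dummy_input = torch.randn(1, 3, 224, 224)  # assumed ImageNet-style input
quantizer = torch_quantizer(quant_mode, model, (dummy_input,))
quant_model = quantizer.quant_model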
# fast fine-tune the model, or load fine-tuned parameters before the test
if fast_finetune:
    ft_loader, _ = load_data(
        subset_len=5120,
        train=False,
        batch_size=batch_size,
        sample_method='random',
        data_dir=args.data_dir,
        model_name=model_name)
    if quant_mode == 'calib':
        # run fast fine-tuning with the forwarding function
        quantizer.fast_finetune(evaluate, (quant_model, ft_loader, loss_fn))
    elif quant_mode == 'test':
        # load the fine-tuned parameters saved by the 'calib' run
        quantizer.load_ft_param()
Fine-tune the quantized model during calibration:

python resnet18_quant.py --quant_mode calib --fast_finetune

Test the fine-tuned quantized model:

python resnet18_quant.py --quant_mode test --fast_finetune

Deploy the fine-tuned quantized model (a single batch of size 1 is forwarded to export the deployable model):

python resnet18_quant.py --quant_mode test --fast_finetune --subset_len 1 --batch_size 1 --deploy