(Optional) vai_q_tensorflow2 Fast Finetuning - 3.5 English

Vitis AI User Guide (UG1414)

Quantization generally causes only a minor accuracy loss, but for some networks, such as MobileNet, the loss can be significant. To address this, fast fine-tuning uses the AdaQuant algorithm, which adjusts the weights and quantization parameters layer by layer using the unlabeled calibration dataset, to improve accuracy for these models.

Fast fine-tuning takes longer than normal PTQ, though still significantly less time than QAT because calib_dataset is smaller than the training dataset, so it is turned off by default. You can enable it to improve accuracy if you encounter accuracy issues. A recommended workflow is to first try PTQ without fast fine-tuning, then quantize again with fast fine-tuning enabled if the accuracy is unsatisfactory.

While QAT is another method to improve accuracy, it requires more time and relies on the training dataset. To activate fast fine-tuning during post-training quantization, set include_fast_ft=True.

quantized_model = quantizer.quantize_model(
    calib_dataset=calib_dataset,
    calib_steps=None,
    calib_batch_size=None,
    include_fast_ft=True,
    fast_ft_epochs=10)


  • include_fast_ft determines whether to perform fast fine-tuning.
  • fast_ft_epochs indicates the number of fine-tuning epochs for each layer.
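
The recommended workflow (plain PTQ first, fast fine-tuning only if accuracy is unsatisfactory) could be sketched roughly as follows. This is a minimal sketch, not part of the Vitis AI API: the helper name quantize_with_fallback, the evaluate callable (assumed to take a quantized model and return its accuracy as a float), and the min_accuracy threshold are hypothetical names introduced for illustration. Only quantize_model and its keyword arguments come from the call shown above.

```python
def quantize_with_fallback(quantizer, calib_dataset, evaluate,
                           min_accuracy, fast_ft_epochs=10):
    """Run PTQ, and rerun it with fast fine-tuning only if accuracy is too low.

    quantizer:     object exposing quantize_model(...) as in the call above.
    evaluate:      hypothetical callable, quantized model -> accuracy (float).
    min_accuracy:  hypothetical threshold below which accuracy is unsatisfactory.

    Returns (quantized_model, accuracy, used_fast_ft).
    """
    # Step 1: normal PTQ with fast fine-tuning explicitly turned off.
    model = quantizer.quantize_model(
        calib_dataset=calib_dataset,
        calib_steps=None,
        calib_batch_size=None,
        include_fast_ft=False)
    accuracy = evaluate(model)
    if accuracy >= min_accuracy:
        return model, accuracy, False

    # Step 2: accuracy is unsatisfactory, so quantize again with
    # fast fine-tuning enabled (slower than PTQ, faster than QAT).
    model = quantizer.quantize_model(
        calib_dataset=calib_dataset,
        calib_steps=None,
        calib_batch_size=None,
        include_fast_ft=True,
        fast_ft_epochs=fast_ft_epochs)
    return model, evaluate(model), True
```

The helper returns a flag indicating whether fast fine-tuning was used, so the caller can tell which path produced the model. It does not compare the two results; if fast fine-tuning does not reach the threshold either, QAT remains the next option.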