The TensorFlow2 quantizer supports two approaches to quantizing a deep learning model:
- Post-training quantization (PTQ)
- PTQ converts a pre-trained floating-point model into a quantized model with little loss of accuracy. PTQ requires a representative dataset: a few batches of inference are run on the floating-point model to capture the distributions of the activations. This process is also known as quantize calibration.
- Quantization aware training (QAT)
- QAT models the quantization error in both the forward and backward passes during training, so the network learns weights that tolerate quantization. When using QAT, it is recommended to begin with a floating-point pre-trained model that already exhibits good accuracy rather than training from scratch.
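To make the calibration step concrete, the following is a minimal NumPy sketch of min/max quantize calibration, not the quantizer's actual implementation: a few batches are run through the float model, the observed activation range is tracked, and an asymmetric int8 scale and zero-point are derived from it. The function names (`calibrate_minmax`, `int8_qparams`, `quantize`) are illustrative, not part of any API.

```python
import numpy as np

def calibrate_minmax(model_fn, batches):
    """Run calibration batches through the float model and track the
    observed activation range (a simple min/max observer)."""
    lo, hi = np.inf, -np.inf
    for x in batches:
        act = np.asarray(model_fn(x))
        lo = min(lo, float(act.min()))
        hi = max(hi, float(act.max()))
    return lo, hi

def int8_qparams(lo, hi):
    """Derive an asymmetric int8 scale/zero-point from the calibrated range."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    scale = (hi - lo) / 255.0
    zero_point = int(round(-128 - lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    """Map float activations onto the int8 grid defined by the qparams."""
    q = np.round(np.asarray(x) / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

# Hypothetical usage: calibrate an identity "model" on one batch.
lo, hi = calibrate_minmax(lambda x: x, [np.array([-2.0, 0.0, 6.0])])
scale, zp = int8_qparams(lo, hi)
```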
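The core mechanism behind QAT can be sketched as "fake quantization": the forward pass rounds values onto the integer grid (so the quantization error shows up in the loss), while the backward pass uses a straight-through estimator that passes gradients through unchanged inside the representable range. This is a hedged illustration in NumPy, assuming fixed quantization parameters; the quantizer's own training-time machinery handles this automatically.

```python
import numpy as np

def fake_quant(x, scale, zero_point, qmin=-128, qmax=127):
    """Quantize-dequantize: the forward pass sees the rounded, clipped
    values, so the quantization error is part of the training signal."""
    q = np.clip(np.round(np.asarray(x) / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

def fake_quant_grad(x, upstream, scale, zero_point, qmin=-128, qmax=127):
    """Straight-through estimator for the backward pass: rounding has zero
    gradient almost everywhere, so pass the gradient through unchanged
    wherever x lands inside the representable range, and zero it where
    the forward pass clipped."""
    q = np.round(np.asarray(x) / scale) + zero_point
    inside = (q >= qmin) & (q <= qmax)
    return np.asarray(upstream) * inside
```

With `scale=0.1` and `zero_point=0`, an in-range value such as `0.34` comes back as `0.3`, while `100.0` saturates at the top of the int8 grid (`12.7`), and the gradient is zeroed only for the saturated element.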