Quantizing with Float Scale - 3.5 English

Vitis AI User Guide (UG1414)

Quantization for the DPU uses power-of-2 scales, symmetric quantization, and per-tensor quantizers, and requires special processing to simulate DPU behavior. Devices that support floating-point scales, however, call for a different quantize strategy; for these targets, float-scale quantization is introduced.
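To make the distinction concrete, the following is a minimal pure-Python sketch (not the Vitis implementation) comparing symmetric per-tensor quantization with a free floating-point scale against a scale constrained to a power of two:

```python
import math

def quantize(x, scale, bits=8):
    """Symmetric quantization: round x/scale into the signed integer range,
    then dequantize back to a float for comparison."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax, min(qmax, round(x / scale)))
    return q * scale  # dequantized value

# Float scale: derived directly from the tensor's maximum magnitude.
max_abs = 0.7
float_scale = max_abs / (2 ** 7 - 1)

# Power-of-2 scale (DPU style): the scale is restricted to 2**n.
pow2_scale = 2.0 ** round(math.log2(float_scale))

x = 0.33
err_float = abs(x - quantize(x, float_scale))
err_pow2 = abs(x - quantize(x, pow2_scale))
```

Because the float scale can match the tensor range exactly while the power-of-2 scale must round to the nearest power of two, the float-scale round-trip error is typically smaller.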
The fs quantize strategy
Performs quantization for inputs and weights of Conv2D, DepthwiseConv2D, Conv2DTranspose, and Dense layers. Conv-BN folding is not performed by default.
The fsx quantize strategy
Performs quantization for more layer types than the fs quantize strategy, such as Add, MaxPooling2D, and AveragePooling2D. In addition, the quantization process extends to the biases and activations of Conv2D, DepthwiseConv2D, Conv2DTranspose, and Dense layers. By default, it includes Conv-BN folding.
Note: The fs and fsx strategies are designed for target devices with floating-point support. The DPU does not support floating point at present, so models quantized with these strategies cannot be deployed to the DPU.
You can switch to float-scale quantization by setting quantize_strategy to fs or fsx in the constructor of VitisQuantizer. The following is example code:
import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

model = tf.keras.models.load_model('float_model.h5')
quantizer = vitis_quantize.VitisQuantizer(model, quantize_strategy='fs')
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset,
                                           calib_steps=100,
                                           calib_batch_size=10,
                                           **kwargs)
calib_dataset is used as a representative calibration dataset. You can use all or part of the eval_dataset, train_dataset, or other datasets.
calib_steps is the total number of steps for calibration and defaults to None. If calib_dataset is a tf.data dataset, a generator, or a keras.utils.Sequence instance and calib_steps is None, calibration runs until the dataset is exhausted. Array inputs do not support this argument.
calib_batch_size is the number of samples per batch for calibration. If calib_dataset is a dataset, a generator, or a keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If calib_dataset is a numpy.array object, the default batch size is 32.
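The calib_steps semantics above can be illustrated with a small sketch (plain Python, not the Vitis API): a calibration loop observes batches to derive quantizer ranges, stopping early when a step count is given and otherwise running until the dataset is exhausted:

```python
def calibrate(batches, calib_steps=None):
    """Track the maximum absolute value seen across batches, mimicking
    how a calibration pass derives a quantizer range. With
    calib_steps=None, the loop runs until the dataset is exhausted."""
    max_abs = 0.0
    for step, batch in enumerate(batches):
        if calib_steps is not None and step >= calib_steps:
            break
        max_abs = max(max_abs, max(abs(v) for v in batch))
    return max_abs

data = [[0.1, -0.4], [0.9, 0.2], [-1.5, 0.3]]
full = calibrate(data)                     # exhausts all batches -> 1.5
partial = calibrate(data, calib_steps=1)   # only the first batch -> 0.4
```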
**kwargs is a dict of user-defined quantize-strategy configurations that lets you override the built-in defaults. For instance, setting bias_bit=16 makes the tool quantize all biases with 16-bit quantizers. For more information on user-defined configurations, see the vai_q_tensorflow2 Usage section.
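For intuition on why you might raise bias_bit, here is a pure-Python sketch (assuming values in [-1, 1]; not the Vitis implementation) of how the round-trip error of a symmetric float-scale quantizer shrinks when the bit width goes from 8 to 16:

```python
def quant_error(x, bits):
    """Round-trip error of symmetric quantization at a given bit width,
    with a float scale chosen for an assumed [-1, 1] value range."""
    qmax = 2 ** (bits - 1) - 1
    scale = 1.0 / qmax
    q = max(-qmax, min(qmax, round(x / scale)))
    return abs(x - q * scale)

bias = 0.123456
err8 = quant_error(bias, 8)     # default 8-bit quantizer
err16 = quant_error(bias, 16)   # effect of the bias_bit=16 override
```

The 16-bit grid is finer by a factor of roughly 256, so the bias round-trip error drops accordingly.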