Quantizing with Float Scale - 2.5 English

Vitis AI User Guide (UG1414)

Document ID
UG1414
Release Date
2022-06-15
Version
2.5 English
Quantization for the DPU uses power-of-2 scales with symmetric, per-tensor quantizers, and requires some special processing to simulate DPU behaviors. Devices that support floating-point scales need a different quantize strategy, so float scale quantization was introduced in this release.
The fs quantize strategy
Quantizes the inputs and weights of Conv2D, DepthwiseConv2D, Conv2DTranspose, and Dense layers. By default, it does not perform Conv-BN folding.
The fsx quantize strategy
Quantizes more layer types than the fs quantize strategy, such as Add, MaxPooling2D, and AveragePooling2D. It also quantizes the biases and activations of Conv2D, DepthwiseConv2D, Conv2DTranspose, and Dense layers. By default, it performs Conv-BN folding.
Note: The fs and fsx strategies are designed for target devices with floating-point support. The DPU does not currently support floating point, so models quantized with these strategies cannot be deployed to it.
You can switch to float scale quantization by setting quantize_strategy to fs or fsx in the constructor of VitisQuantizer, as shown in the following example:
import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

model = tf.keras.models.load_model('float_model.h5')
quantizer = vitis_quantize.VitisQuantizer(model, quantize_strategy='fs')
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset,
                                           calib_steps=100,
                                           calib_batch_size=10,
                                           **kwargs)
calib_dataset
calib_dataset is used as a representative calibration dataset for calibration. You can use all or part of the eval_dataset, the train_dataset, or another dataset.
calib_steps
calib_steps is the total number of steps for calibration. It has a default value of None. If calib_dataset is a tf.data dataset, generator, or keras.utils.Sequence instance and calib_steps is None, calibration runs until the dataset is exhausted. This argument is not supported with array inputs.
calib_batch_size
calib_batch_size is the number of samples per batch for calibration. If calib_dataset is a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If calib_dataset is a numpy.array object, the default batch size is 32.
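The interplay between calib_steps and calib_batch_size can be sketched with a plain-Python stand-in for the calibration loop. This is only an illustration of the arithmetic, not the Vitis AI implementation: batch_generator mimics a batched tf.data dataset, and consume_for_calibration mimics the calibration driver, which exhausts the data when calib_steps is None and otherwise stops after calib_steps batches.

```python
import itertools
import numpy as np

def batch_generator(data, batch_size):
    """Yield consecutive batches from an array (a stand-in for a
    batched tf.data dataset or keras.utils.Sequence)."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

def consume_for_calibration(batches, calib_steps=None):
    """Sketch of the calibration loop: with calib_steps=None the
    generator is exhausted; otherwise only calib_steps batches run.
    Returns the number of samples seen by calibration."""
    if calib_steps is not None:
        batches = itertools.islice(batches, calib_steps)
    return sum(len(b) for b in batches)

data = np.zeros((1500, 8))  # 1500 dummy calibration samples
print(consume_for_calibration(batch_generator(data, 10)))                   # all 1500 samples
print(consume_for_calibration(batch_generator(data, 10), calib_steps=100))  # 100 x 10 = 1000 samples
```

With calib_steps=100 and calib_batch_size=10, as in the quantize_model example above, calibration therefore sees at most 1000 samples from calib_dataset.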
**kwargs

A dict of user-defined configurations for the quantize strategy. These override the defaults of the built-in quantize strategy. For example, setting bias_bit=16 makes the tool quantize all biases with 16-bit quantizers. See the vai_q_tensorflow2 Usage section for more information about the user-defined configurations.
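Conceptually, the **kwargs act as a dict merge in which user values take precedence over the strategy's defaults. The sketch below is purely illustrative (the real merging is internal to VitisQuantizer, and the default_strategy_config keys and values here are hypothetical), but it shows the override behavior described above:

```python
# Hypothetical defaults for illustration only; the real built-in
# quantize strategy configuration lives inside VitisQuantizer.
default_strategy_config = {
    'weight_bit': 8,
    'bias_bit': 8,
    'activation_bit': 8,
}

def apply_user_overrides(defaults, **kwargs):
    """Return a config where user-supplied kwargs take precedence
    over the built-in defaults."""
    return {**defaults, **kwargs}

config = apply_user_overrides(default_strategy_config, bias_bit=16)
print(config['bias_bit'])    # 16 (user override wins)
print(config['weight_bit'])  # 8  (default kept)
```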