This function performs post-training quantization (PTQ) of the float model, including model optimization, weight quantization, and activation quantization calibration.
vitis_quantize.VitisQuantizer.quantize_model(
    calib_dataset=None,
    calib_batch_size=None,
    calib_steps=None,
    verbose=0,
    fold_conv_bn=True,
    fold_bn=True,
    replace_sigmoid=True,
    replace_relu6=True,
    include_cle=True,
    cle_steps=10,
    forced_cle=False,
    include_fast_ft=False,
    fast_ft_epochs=10)
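A minimal sketch of how the signature above might be called. The import path and the VitisQuantizer(float_model) constructor are assumptions about a typical Vitis AI installation and are not stated in this section; the helper names ptq_kwargs and quantize_float_model are hypothetical and exist only for illustration.

```python
def ptq_kwargs(calib_dataset, calib_steps=None, calib_batch_size=None,
               include_fast_ft=False, fast_ft_epochs=10):
    """Collect keyword arguments for quantize_model (hypothetical helper)."""
    kwargs = {
        "calib_dataset": calib_dataset,
        "calib_steps": calib_steps,
        "calib_batch_size": calib_batch_size,
        "include_fast_ft": include_fast_ft,
        "fast_ft_epochs": fast_ft_epochs,
    }
    # None means "use the documented default", so those keys are omitted.
    return {k: v for k, v in kwargs.items() if v is not None}


def quantize_float_model(float_model, calib_dataset, **overrides):
    """Run PTQ on a float Keras model (hypothetical wrapper)."""
    # Assumed import path and constructor; verify against your installation.
    # The import is deferred so this block loads without the library present.
    from tensorflow_model_optimization.quantization.keras import vitis_quantize

    quantizer = vitis_quantize.VitisQuantizer(float_model)
    return quantizer.quantize_model(**ptq_kwargs(calib_dataset, **overrides))
```

With this wrapper, a call such as quantize_float_model(model, calib_dataset, include_fast_ft=True) would enable fast fine-tuning while leaving every other argument at its default.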
Arguments
- calib_dataset
  A tf.data.Dataset, keras.utils.Sequence, or numpy.array object, the representative dataset for calibration. You can use all or part of eval_dataset, train_dataset, or another dataset as calib_dataset.
- calib_steps
  An int object, the total number of steps for calibration. Ignored with the default value of None. If calib_dataset is a tf.data dataset, generator, or keras.utils.Sequence instance and calib_steps is None, calibration runs until the dataset is exhausted. This argument is not supported with array inputs.
- calib_batch_size
  An int object, the number of samples per batch for calibration. If calib_dataset is a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If calib_dataset is a numpy.array object, the default batch size is 32.
- fold_conv_bn
  A bool object, whether to fold the batch norm layers into the preceding Conv2D/DepthwiseConv2D/TransposeConv2D/Dense layers.
- fold_bn
  A bool object, whether to convert standalone batch norm layers into DepthwiseConv2D layers.
- replace_sigmoid
  A bool object, whether to replace Activation(activation='sigmoid') layers with hard sigmoid layers and quantize them. If not, the sigmoid layers are left unquantized and are scheduled on the CPU.
- replace_relu6
  A bool object, whether to replace the ReLU6 layers with ReLU layers.
- include_cle
  A bool object, whether to do Cross-Layer Equalization before quantization.
- cle_steps
  An int object, the number of iteration steps for Cross-Layer Equalization.
- forced_cle
  A bool object, whether to do forced Cross-Layer Equalization for ReLU6 layers.
- include_fast_ft
  A bool object, whether to do fast fine-tuning. Fast fine-tuning adjusts the weights layer by layer using the calibration dataset and may improve accuracy for some models. It is disabled by default. It takes longer than normal PTQ, but still much less time than QAT because calib_dataset is much smaller than the training dataset. Turn it on if you encounter accuracy issues.
- fast_ft_epochs
  An int object, the number of epochs of fast fine-tuning for each layer.
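To illustrate what the fold_conv_bn option refers to, the following is a minimal sketch of the standard batch norm folding arithmetic for a single-output Dense layer, written in plain Python. It is not the quantizer's own implementation; the function names, the sample values, and the eps default are assumptions chosen for illustration.

```python
import math


def dense(x, w, b):
    """Single-output Dense layer: dot(w, x) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def dense_then_bn(x, w, b, gamma, beta, mean, var, eps=1e-3):
    """Unfolded reference: Dense followed by inference-mode batch norm."""
    y = dense(x, w, b)
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_bn_into_dense(w, b, gamma, beta, mean, var, eps=1e-3):
    """Return (w_folded, b_folded) so Dense alone matches Dense + batch norm."""
    scale = gamma / math.sqrt(var + eps)
    w_folded = [wi * scale for wi in w]
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded


# Illustrative values only; not taken from any real model.
x = [0.5, -1.0, 2.0]
w = [0.2, 0.4, -0.1]
b = 0.05
bn = dict(gamma=1.5, beta=-0.3, mean=0.1, var=0.25)

w_f, b_f = fold_bn_into_dense(w, b, **bn)
reference = dense_then_bn(x, w, b, **bn)
folded = dense(x, w_f, b_f)
print(abs(reference - folded) < 1e-9)
```

Because the folded layer produces the same output as the original pair, the batch norm layer can be removed before quantization without changing the float model's behavior, leaving one fewer layer to calibrate.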