Technically, quantize finetuning is similar to float model finetuning. The difference is that quantize finetuning uses the vai_q_tensorflow2 API to rewrite the float graph into a quantized model before training starts. The typical workflow is as follows:
- Preparation.
Before finetuning, please prepare the following files:
Table 1. Input Files for vai_q_tensorflow2 Quantize Finetuning

| No. | Name | Description |
| --- | --- | --- |
| 1 | Float model file | Floating-point model files to start from. Can be omitted if training from scratch. |
| 2 | Dataset | The training dataset with labels. |
| 3 | Training Scripts | The Python scripts to run float training/finetuning of the model. |

- (Optional) Evaluate the Float Model
It is suggested to evaluate the float model before quantize finetuning to check the correctness of the scripts and dataset. The accuracy and loss values of the float checkpoint can also serve as a baseline for the quantize finetuning.
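A minimal sketch of this baseline evaluation is shown below. The toy model and random data here are illustrative stand-ins for the real float model file and labeled dataset from Table 1; in practice, load your checkpoint with `tf.keras.models.load_model` and use your real eval dataset.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the real float model (normally loaded from
# 'float_model.h5'); a tiny classifier is used so the sketch is runnable.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="softmax", input_shape=(8,)),
])
model.compile(
    optimizer="rmsprop",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

# Hypothetical stand-in for the labeled eval dataset from Table 1.
x = np.random.rand(32, 8).astype("float32")
y = np.random.randint(0, 10, size=(32,))
eval_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

# Record the float baseline loss/accuracy to compare against the
# quantized model later.
baseline = model.evaluate(eval_dataset, return_dict=True)
print(baseline)
```

The returned dictionary (loss plus each compiled metric) is the baseline to compare against after quantize finetuning.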
- Modify the Training Scripts
Use the vai_q_tensorflow2 API, `VitisQuantizer.get_qat_model`, to do the quantization. The following is an example:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.optimizers import RMSprop
from tensorflow_model_optimization.quantization.keras import vitis_quantize

model = tf.keras.models.load_model('float_model.h5')

# Call the vai_q_tensorflow2 API to create the quantize training model
quantizer = vitis_quantize.VitisQuantizer(model)
model = quantizer.get_qat_model()

# Compile the model
model.compile(
    optimizer=RMSprop(learning_rate=lr_schedule),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=[keras.metrics.SparseTopKCategoricalAccuracy()])

# Start the training/finetuning
model.fit(train_dataset)
```
- Save the Model
Call `model.save()` to save the trained model, or use callbacks in `model.fit()` to save the model periodically. For example:

```python
# Save the model manually
model.save('trained_model.h5')

# Save the model periodically during fit using callbacks
model.fit(
    train_dataset,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            filepath='./quantize_train/',
            save_best_only=True,
            monitor="sparse_categorical_accuracy",
            verbose=1,
        )])
```
- (Optional) Evaluate the Quantized Model
Call `model.evaluate()` on the eval_dataset to evaluate the quantized model, just like evaluating the float model.

Note: Quantize finetuning works like float finetuning, so experience with float model training and finetuning is very helpful, for example in choosing hyper-parameters such as the optimizer type and learning rate.
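As one hyper-parameter example, the `lr_schedule` referenced in the `compile()` call above could be a standard Keras decay schedule. The initial rate and decay settings below are illustrative assumptions, not recommended values for any particular model:

```python
import tensorflow as tf

# Sketch of a decaying learning-rate schedule like the `lr_schedule`
# used when compiling the QAT model; all numbers are placeholders.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=1000,
    decay_rate=0.9,
)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=lr_schedule)

# The schedule starts at the initial rate and decays smoothly;
# after 1000 steps the rate has been multiplied by 0.9 once.
print(float(lr_schedule(0)))
print(float(lr_schedule(1000)))
```

For quantize finetuning, a smaller initial learning rate than the original float training is often a reasonable starting point, since the model starts from an already-trained checkpoint.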