Quantization-aware training (QAT) is
similar to floating-point model training or finetuning. The difference is that, in QAT, the
vai_q_tensorflow APIs convert the floating-point graph to a quantized graph before
training starts. The typical workflow is as follows:
- Preparation: Before QAT, prepare the following files:

  Table 1. Input Files for vai_q_tensorflow QAT

  | No. | Name | Description |
  |-----|------|-------------|
  | 1 | Checkpoint files | Floating-point checkpoint files from which to start. Ignore this if you are training the model from scratch. |
  | 2 | Dataset | The training dataset with labels. |
  | 3 | Train Scripts | The Python scripts for running the float training/finetuning of the model. |

- Evaluate the floating-point model (optional): Evaluate the float checkpoint files before performing quantize finetuning to check the accuracy of the scripts and dataset. The accuracy and loss values of the float checkpoint can also be a baseline for QAT.
- Modify the training scripts: To create the quantize training graph, modify
the training scripts to call vai_q_tensorflow.CreateQuantizeTrainingGraph after
the floating-point graph is built. The following is an example:

    # train.py
    # ...

    # Create the float training graph
    model = model_fn(is_training=True)

    # *Set the quantize configurations
    import vai_q_tensorflow
    q_config = vai_q_tensorflow.QuantizeConfig(input_nodes=['net_in'],
                                               output_nodes=['net_out'],
                                               input_shapes=[[-1, 224, 224, 3]])

    # *Call the vai_q_tensorflow API to create the quantize training graph
    vai_q_tensorflow.CreateQuantizeTrainingGraph(config=q_config)

    # Create the optimizer (learning_rate is a required argument; 0.01 is only a placeholder)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

    # Start the training/finetuning; you can use sess.run(), tf.train, tf.estimator, tf.slim, and so on
    # ...

Note: You can use import vai_q_tensorflow as decent_q
for compatibility with code written for older versions of vai_q_tensorflow, which used import tensorflow.contrib.decent_q.
The QuantizeConfig contains the configurations for quantization. Some basic configurations, such as input_nodes, output_nodes, and input_shapes, must be set according to your model structure. Other configurations, such as weight_bit, activation_bit, and method, have default values and can be modified as needed. See vai_q_tensorflow Usage for detailed information on all the configurations.

  - input_nodes/output_nodes: These are used together to determine the range of the subgraph you want to quantize. The pre-processing and post-processing components are usually not quantizable and should be outside this range. The input_nodes and output_nodes should be the same for the float training and evaluation graphs so that the quantization operations match between them.
    Note: Operations with multiple output tensors (such as FIFO) are currently unsupported. You can add a tf.identity node as an alias for the input tensor so that the input node has a single output.
  - input_shapes: The list of shapes for input_nodes. Each shape must be 4-dimensional. Shapes are comma separated, for example, [[1, 224, 224, 3], [1, 128, 128, 1]]. An unknown batch size is supported, for example, [[-1, 224, 224, 3]].
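
One way to keep input_nodes, output_nodes, and input_shapes identical between train.py and eval.py is to define them once in a small shared module. The following is a minimal sketch, assuming a hypothetical module named quant_settings.py. The helper names validate_quant_kwargs and make_quantize_config are illustrative and are not part of vai_q_tensorflow. The checks encode the 4-dimensional rule and the -1 batch size described above. They also assume one shape per input node and positive non-batch dimensions, which this section does not state explicitly.

```python
# quant_settings.py (hypothetical helper module shared by train.py and eval.py)

# Node names and shapes copied from the example in this section;
# replace them with the values that match your own model.
QUANT_KWARGS = {
    "input_nodes": ["net_in"],
    "output_nodes": ["net_out"],
    "input_shapes": [[-1, 224, 224, 3]],
}


def validate_quant_kwargs(kwargs):
    """Raise ValueError if the settings break the rules checked below."""
    input_nodes = kwargs["input_nodes"]
    output_nodes = kwargs["output_nodes"]
    input_shapes = kwargs["input_shapes"]

    if not input_nodes or not output_nodes:
        raise ValueError("input_nodes and output_nodes must not be empty")

    # Assumption: exactly one shape is given per input node.
    if len(input_shapes) != len(input_nodes):
        raise ValueError(
            "got {} shapes for {} input nodes".format(
                len(input_shapes), len(input_nodes)))

    for name, shape in zip(input_nodes, input_shapes):
        # Each shape must be 4-dimensional.
        if len(shape) != 4:
            raise ValueError(
                "shape {} for node '{}' is not 4-dimensional".format(shape, name))
        # Only the batch dimension may be unknown (-1).
        if shape[0] != -1 and shape[0] <= 0:
            raise ValueError(
                "batch size in {} must be positive or -1".format(shape))
        # Assumption: the remaining dimensions are fixed and positive.
        if any(dim <= 0 for dim in shape[1:]):
            raise ValueError(
                "non-batch dimensions in {} must be positive".format(shape))


def make_quantize_config():
    """Build the QuantizeConfig used by both train.py and eval.py."""
    validate_quant_kwargs(QUANT_KWARGS)
    # Imported lazily so the checks above can run without the quantizer installed.
    import vai_q_tensorflow
    return vai_q_tensorflow.QuantizeConfig(**QUANT_KWARGS)
```

With such a module in place, train.py and eval.py could each call q_config = make_quantize_config() instead of repeating the QuantizeConfig arguments inline.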
- Evaluate and generate the quantized model: After QAT, evaluate the quantized
graph with a checkpoint file and generate the frozen model. To do this, call
vai_q_tensorflow.CreateQuantizeEvaluationGraph and then vai_q_tensorflow.CreateQuantizeDeployGraph
after building the float evaluation graph. The freezing process depends on the
quantize evaluation graph, so the two functions are often called together.

  Note: The vai_q_tensorflow.CreateQuantizeTrainingGraph and vai_q_tensorflow.CreateQuantizeEvaluationGraph functions modify the default graph in TensorFlow, and each must be called on a different graph phase. vai_q_tensorflow.CreateQuantizeTrainingGraph must be called on the float training graph, while vai_q_tensorflow.CreateQuantizeEvaluationGraph must be called on the float evaluation graph. vai_q_tensorflow.CreateQuantizeEvaluationGraph cannot be called right after vai_q_tensorflow.CreateQuantizeTrainingGraph, because the default graph has already been converted to a quantize training graph. The correct approach is to call it after the function that creates the floating-point model.

    # eval.py
    # ...

    # Create the float evaluation graph
    model = model_fn(is_training=False)

    # *Set the quantize configurations
    import vai_q_tensorflow
    q_config = vai_q_tensorflow.QuantizeConfig(input_nodes=['net_in'],
                                               output_nodes=['net_out'],
                                               input_shapes=[[-1, 224, 224, 3]])

    # *Call the vai_q_tensorflow API to create the quantize evaluation graph
    vai_q_tensorflow.CreateQuantizeEvaluationGraph(config=q_config)

    # *Call the vai_q_tensorflow API to freeze the model and generate the deploy model
    vai_q_tensorflow.CreateQuantizeDeployGraph(checkpoint="path to checkpoint folder", config=q_config)

    # Start the evaluation; you can use sess.run(), tf.train, tf.estimator, tf.slim, and so on
    # ...
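
The float baseline from the optional evaluation step can be compared with the accuracy measured on the quantized evaluation graph. The following is a minimal sketch, assuming both accuracies are fractions between 0 and 1 produced by your own evaluation scripts. The function names and the max_drop tolerance are hypothetical and are not part of vai_q_tensorflow.

```python
def accuracy_drop(float_accuracy, quantized_accuracy):
    """Return the accuracy lost relative to the float baseline."""
    return float_accuracy - quantized_accuracy


def within_budget(float_accuracy, quantized_accuracy, max_drop=0.01):
    """Return True if the quantized model loses at most max_drop accuracy.

    max_drop is a project-specific tolerance; 0.01 is only a placeholder.
    """
    for value in (float_accuracy, quantized_accuracy):
        if not 0.0 <= value <= 1.0:
            raise ValueError("accuracies must be fractions between 0 and 1")
    return accuracy_drop(float_accuracy, quantized_accuracy) <= max_drop
```

A check such as within_budget(float_acc, quant_acc) at the end of eval.py gives a simple signal for whether further finetuning is needed before the deploy model is used.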