Quantize finetuning is almost the same as float model finetuning. The difference is that the vai_q_tensorflow APIs are used to rewrite the float graph into a quantized graph before the training starts. The typical workflow is as follows.
Step 0: Preparation
Before finetuning, please prepare the following files:
No. | Name | Description |
---|---|---|
1 | Checkpoint files | Floating-point checkpoint files to start from. Can be omitted when training from scratch. |
2 | Dataset | The training dataset with labels. |
3 | Train Scripts | The Python scripts used to run float training/finetuning of the model. |
Step 1 (Optional): Evaluate the Float Model
It is recommended to evaluate the float checkpoint files before starting quantize finetuning. This verifies the correctness of the scripts and dataset, and the accuracy and loss of the float checkpoint also serve as a baseline for the quantize finetuning.
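A minimal sketch of such a baseline evaluation is shown below, assuming a TF1-style model_fn and an accuracy tensor named accuracy_op (both placeholders for your own scripts and metrics):
# eval_float.py (sketch; 'model_fn' and 'accuracy_op' are placeholders)
import tensorflow as tf

# Build the float evaluation graph
model = model_fn(is_training=False)
saver = tf.train.Saver()
with tf.Session() as sess:
    # Restore the floating-point checkpoint prepared in Step 0
    saver.restore(sess, tf.train.latest_checkpoint("float_ckpt_dir"))
    # The resulting accuracy/loss serves as the baseline for quantize finetuning
    acc = sess.run(accuracy_op)
    print("float baseline accuracy:", acc)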
Step 2: Modify the Training Scripts
To create the quantize training graph, modify the training scripts to call decent_q.CreateQuantizeTrainingGraph after the float graph is built. The following is an example:
# train.py
# ...
# Create the float training graph
model = model_fn(is_training=True)
# *Set the quantize configurations
from tensorflow.contrib import decent_q
q_config = decent_q.QuantizeConfig(input_nodes=['net_in'],
output_nodes=['net_out'],
input_shapes=[[-1, 224, 224, 3]])
# *Call Vai_q_tensorflow api to create the quantize training graph
decent_q.CreateQuantizeTrainingGraph(config=q_config)
# Create the optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0001)  # learning_rate is an example value
# Start the training/finetuning; you can use sess.run(), tf.train, tf.estimator, tf.slim and so on
# ...
The QuantizeConfig contains the configurations for quantization. Some basic configurations like input_nodes, output_nodes, and input_shapes need to be set according to your model structure. Other configurations like weight_bit, activation_bit, and method have default values and can be modified as needed. See vai_q_tensorflow Usage for detailed information on all the configurations.
- input_nodes / output_nodes: These are used together to determine the subgraph range you want to quantize. The pre-processing and post-processing parts are usually not quantizable and should be outside this range. Note that input_nodes and output_nodes should be the same for the float training graph and the float evaluation graph so that the quantization operations can be matched correctly between them. Operations with multiple output tensors (such as FIFO) are currently not supported as input nodes; in that case, simply add a tf.identity node to create an alias for the input tensor, giving the graph a single-output input node (see the sketch after this list).
- input_shapes: The shape list of the input_nodes. Each shape must be 4-dimensional and the shapes are comma separated, e.g. [[1,224,224,3],[1,128,128,1]]; an unknown batch size is supported, e.g. [[-1,224,224,3]].
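The sketch below illustrates both points under stated assumptions: images is a tensor coming from an op with multiple output tensors (for example a FIFO queue), net_out is the name of your model's output node, and the optional weight_bit, activation_bit, and method arguments are shown with example values (check vai_q_tensorflow Usage for the exact defaults).
# config_example.py (sketch; 'images' and 'net_out' are placeholders)
import tensorflow as tf
from tensorflow.contrib import decent_q

# tf.identity gives the multi-output input op a single-output alias
# that can then be listed in input_nodes
net_in = tf.identity(images, name='net_in')

# Basic configurations plus the optional ones mentioned above
q_config = decent_q.QuantizeConfig(input_nodes=['net_in'],
                                   output_nodes=['net_out'],
                                   input_shapes=[[-1, 224, 224, 3]],
                                   weight_bit=8,
                                   activation_bit=8,
                                   method=1)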
Step 3: Run Quantize Finetuning
Run the modified training scripts to finetune the model. The checkpoint files saved during finetuning are used in the next step to evaluate the quantized model and to generate the deploy model. A minimal loop sketch is shown below.
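This sketch continues the modified train.py from Step 2 and only illustrates one possible sess.run based loop; loss, num_steps, and the checkpoint directory are placeholders, not part of the vai_q_tensorflow API, and tf.train, tf.estimator, or tf.slim loops work just as well.
# Continue train.py (sketch; 'loss' and 'num_steps' are placeholders)
train_op = optimizer.minimize(loss)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Optionally warm-start from the float checkpoint prepared in Step 0
    # saver.restore(sess, tf.train.latest_checkpoint("float_ckpt_dir"))
    for step in range(num_steps):
        _, loss_val = sess.run([train_op, loss])
    # Save the finetuned checkpoint; it is used by CreateQuantizeDeployGraph in Step 4
    saver.save(sess, "quantize_finetune_ckpt/model.ckpt", global_step=num_steps)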
Step 4: Evaluate the Quantized Model and Generate the Deploy Model
After quantize finetuning, generate the deploy model. Before that, evaluate the quantized graph with the finetuned checkpoint file. This is done by calling the functions below after building the float evaluation graph. Because the deploy process runs on the quantize evaluation graph, the two functions are often called together.
# eval.py
# ...
# Create the float evaluation graph
model = model_fn(is_training=False)
# *Set the quantize configurations
from tensorflow.contrib import decent_q
q_config = decent_q.QuantizeConfig(input_nodes=['net_in'],
output_nodes=['net_out'],
input_shapes=[[-1, 224, 224, 3]])
# *Call Vai_q_tensorflow api to create the quantize evaluation graph
decent_q.CreateQuantizeEvaluationGraph(config=q_config)
# *Call Vai_q_tensorflow api to freeze the model and generate the deploy model
decent_q.CreateQuantizeDeployGraph(checkpoint="path to checkpoint folder", config=q_config)
# Start the evaluation; you can use sess.run, tf.train, tf.estimator, tf.slim and so on
# ...
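After the two calls above, the evaluation itself runs as usual. Below is a minimal sketch assuming a sess.run based loop; accuracy_op and the checkpoint path are placeholders for your own metric tensor and finetuned checkpoint.
# Continue eval.py (sketch; 'accuracy_op' is a placeholder for your own metric)
saver = tf.train.Saver()
with tf.Session() as sess:
    # Restore the finetuned checkpoint saved during quantize finetuning
    saver.restore(sess, tf.train.latest_checkpoint("path to checkpoint folder"))
    acc = sess.run(accuracy_op)
    print("quantized model accuracy:", acc)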
Generated Files
After the above steps, the generated files are located in ${output_dir}, as listed below:
Name | TensorFlow Compatible | Usage | Description |
---|---|---|---|
quantize_train_graph.pb | Yes | Train | The quantize train graph. |
quantize_eval_graph_{suffix}.pb | Yes | Evaluation with checkpoint | The quantize evaluation graph, with the quantize information frozen inside. It contains no weights and should be used together with the checkpoint file during evaluation. |
quantize_eval_model_{suffix}.pb | Yes | 1. Evaluation; 2. Dump; 3. Input to VAI compiler (DPUCAHX8H) | The frozen quantize evaluation graph, with the weights from the checkpoint and the quantize information frozen inside. It can be used to evaluate the quantized model on the host or to dump the outputs of each layer for cross-checking with the DPU outputs. The XIR compiler uses it as input. |
deploy_model_{suffix}.pb | No | Input to VAI compiler (DPUCZDX8G) | The deploy model, in which operations and quantize information are fused. The DNNC compiler uses it as input. |
The suffix contains the iteration number from the checkpoint file and a timestamp, which makes it easy to match the generated files to their checkpoint files. For example, if the checkpoint file is "model.ckpt-2000.*" and the date is 20200611, the suffix is "2000_20200611000000".
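As a usage illustration, the frozen quantize_eval_model_{suffix}.pb can be loaded on the host with standard TensorFlow calls for evaluation or layer-output dumping. The file path, node names, and input_batch below are placeholders built from the examples above, not fixed by vai_q_tensorflow.
# load_frozen.py (sketch; file path and node names are placeholders)
import tensorflow as tf

graph_def = tf.GraphDef()
# The path corresponds to ${output_dir} and the example suffix above
with tf.gfile.GFile("output_dir/quantize_eval_model_2000_20200611000000.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    net_in = graph.get_tensor_by_name('net_in:0')
    net_out = graph.get_tensor_by_name('net_out:0')

with tf.Session(graph=graph) as sess:
    # 'input_batch' is a placeholder for your preprocessed input data
    outputs = sess.run(net_out, feed_dict={net_in: input_batch})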