TensorFlow 1.x - 3.5 English - UG1414

Vitis AI User Guide (UG1414)

Quantization API

def quantize(
    input_frozen_graph = "",
    input_nodes = "",
    input_shapes = "",
    output_nodes = "",
    input_fn = "",
    method = 1,
    calib_iter = 100,
    output_dir = "./quantize_results",
    **kargs)

This function invokes the vai_q_tensorflow command-line tool in WeGO TensorFlow r1.15 and converts the input floating-point model to a fixed-point model for accelerated deployment on the DPU. To stay fully compatible with the native vai_q_tensorflow quantizer, all parameters passed to this API are forwarded directly to the vai_q_tensorflow command-line tool. The function returns a quantized GraphDef object, or None on failure.
Note: Currently, only post-training quantization (PTQ) is supported for on-the-fly quantization in WeGO. For more information on fast fine-tuning and quantization-aware training (QAT), see vai_q_tensorflow Quantization Aware Training.
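A minimal sketch of how a call might look and how the None return value can be handled. This section does not give the import path for quantize, so the sketch takes the function as an argument instead of importing it. The run_ptq helper, the model path, and the node names are hypothetical placeholders, not values from this guide.

```python
def run_ptq(quantize_fn):
    """Call a WeGO-style quantize function and fail loudly if it returns None.

    quantize_fn is assumed to be the quantize API described in this section,
    obtained from the WeGO TensorFlow r1.15 package.
    """
    quantized_graph_def = quantize_fn(
        input_frozen_graph="./float/frozen_model.pb",  # hypothetical path
        input_nodes="input",                           # hypothetical node name
        input_shapes="?,224,224,3",
        output_nodes="logits",                         # hypothetical node name
        input_fn="my_input_fn.input_fn",
        method=1,
        calib_iter=100,
        output_dir="./quantize_results",
    )
    # The API returns None on failure, so check before using the result.
    if quantized_graph_def is None:
        raise RuntimeError("quantization failed: quantize() returned None")
    return quantized_graph_def
```

Raising an exception on None is one possible design choice. A caller could equally log the failure and fall back to the floating-point model.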


input_frozen_graph: string. Path to the input frozen graph (.pb) (default: "").
input_nodes: string. The comma-separated name list of the input nodes of the subgraph to be quantized, used together with output_nodes. Only the subgraph between input_nodes and output_nodes is included when generating the deployment model. Set it to the beginning of the main body of the model to quantize, such as the nodes after data pre-processing and augmentation (default: "").
input_shapes: string. The shape list of input_nodes. The shape of each node must be 4-dimensional, with dimensions separated by commas, for example, 1,224,224,3. An unknown batch size is supported, for example, ?,224,224,3. For multiple input_nodes, give the shape of each node, separated by colons (:), for example, ?,224,224,3:?,300,300,1 (default: "").
output_nodes: string. The comma-separated name list of the output nodes of the subgraph to be quantized, used together with input_nodes. Only the subgraph between input_nodes and output_nodes is included when generating the deployment model. Set it to the end of the main body of the model to quantize, such as the nodes before post-processing (default: "").
input_fn: string. The Python importable function that provides the input data. The format is module_name.input_fn_name, for example, my_input_fn.input_fn. The input_fn should take an int object indicating the calibration step and return a dict (placeholder_node_name : numpy.Array) object for each call, which is fed into the placeholder nodes of the model (default: "").
method: int32, one of {0,1,2}. The quantization method (default: 1). The options are:
  • 0: non-overflow method. Ensures that no values are saturated during quantization. Results might be less accurate, for example when the data contains outliers.
  • 1: min-diffs method. Allows saturation of large values during quantization to get smaller quantization errors. This method is slower than method 0 but is more robust to outliers.
  • 2: min-diffs method with a strategy for depthwise. Allows saturation of large values during quantization to get smaller quantization errors. It applies a special strategy to depthwise weights and uses method 1 for standard weights and activations. This method is slower than method 0 but is more robust to outliers.
calib_iter: int32. The number of calibration iterations. The total number of images used for calibration is calib_iter * batch_size (default: 100).
output_dir: string. The directory in which to save the quantization results (default: ./quantize_results).
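The input_shapes string format and the input_fn contract described above can be sketched as follows. This is an illustration under stated assumptions: the format_input_shapes helper, the batch size of 8, and the placeholder name "input" are hypothetical, and random data stands in for real pre-processed calibration images.

```python
import numpy as np

BATCH_SIZE = 8    # hypothetical batch size
CALIB_ITER = 100  # same value as the calib_iter default


def format_input_shapes(shapes):
    """Build the input_shapes string from a list of per-node shapes.

    Dimensions within one shape are joined with ',' and shapes with ':'.
    None stands for an unknown batch size and is written as '?'.
    """
    return ":".join(
        ",".join("?" if dim is None else str(dim) for dim in shape)
        for shape in shapes
    )


def input_fn(step):
    """Return one calibration batch for the given calibration step (an int)."""
    # Random values are a stand-in for real pre-processed images.
    rng = np.random.RandomState(step)
    images = rng.rand(BATCH_SIZE, 224, 224, 3).astype(np.float32)
    # The key must match the name of a placeholder node in the model;
    # "input" is a hypothetical name.
    return {"input": images}


# Total number of images consumed during calibration.
total_calibration_images = CALIB_ITER * BATCH_SIZE

shapes_arg = format_input_shapes([(None, 224, 224, 3), (None, 300, 300, 1)])
```

If this function were saved in a module named my_input_fn.py, the input_fn parameter would be given as my_input_fn.input_fn.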
Note: For information on the additional parameters that can be passed through **kargs, see vai_q_tensorflow Usage.
Note: For more information on the on-the-fly quantization examples for WeGO TensorFlow 1.x, see examples.
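To give some intuition for the difference between the non-overflow and min-diffs options of the method parameter, the following toy sketch compares the two ideas on a list of numbers. It is not the vai_q_tensorflow implementation. The 8-bit signed range, the power-of-two scale, the squared-error criterion, and all function names are assumptions made for illustration only.

```python
def quantize_value(x, fix_pos, bit_width=8):
    """Round x to a signed fixed-point grid with step 2**-fix_pos, saturating."""
    scale = 2.0 ** fix_pos
    q_min = -(2 ** (bit_width - 1))
    q_max = 2 ** (bit_width - 1) - 1
    q = max(q_min, min(q_max, round(x * scale)))
    return q / scale


def squared_error(values, fix_pos):
    """Total squared difference between the values and their quantized form."""
    return sum((v - quantize_value(v, fix_pos)) ** 2 for v in values)


def non_overflow_pos(values, bit_width=8, candidates=range(-8, 17)):
    """Largest candidate position at which no value saturates."""
    q_min = -(2 ** (bit_width - 1))
    q_max = 2 ** (bit_width - 1) - 1
    fitting = [
        p for p in candidates
        if all(q_min <= round(v * 2.0 ** p) <= q_max for v in values)
    ]
    return max(fitting)


def min_diffs_pos(values, candidates=range(-8, 17)):
    """Candidate position with the smallest total squared error.

    Saturation of large values is allowed if it lowers the overall error.
    """
    return min(candidates, key=lambda p: squared_error(values, p))


# Many small values plus one larger value.
values = [0.01 * i for i in range(-50, 51)] + [1.5]
p_non_overflow = non_overflow_pos(values)
p_min_diffs = min_diffs_pos(values)
print("non-overflow position:", p_non_overflow)
print("min-diffs position:", p_min_diffs)
```

The min-diffs search evaluates every candidate position, which is why it costs more than the single range check of the non-overflow approach. Because both searches use the same candidate set, the min-diffs error in this sketch is never larger than the non-overflow error, though the two may select the same position for some data.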