Quantization API
def quantize(
    input_frozen_graph = "",
    input_nodes = "",
    input_shapes = "",
    output_nodes = "",
    input_fn = "",
    method = 1,
    calib_iter = 100,
    output_dir = "./quantize_results",
    **kargs)
This function invokes the vai_q_tensorflow command-line tool in WeGO TensorFlow r1.15 and converts the input floating-point model into a fixed-point model for accelerated DPU deployment. To remain fully compatible with the native vai_q_tensorflow quantizer, all parameters received by this API are forwarded directly to the vai_q_tensorflow command-line tool. The function returns a quantized GraphDef object, or None on failure.
Note: Only post-training quantization (PTQ) is currently supported for on-the-fly quantization in WeGO. For more information on fast fine-tuning and quantization-aware training (QAT), see vai_q_tensorflow Quantization Aware Training.
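The following is a minimal usage sketch, assuming a frozen image classifier whose graph has a single placeholder named input and a calibration function calib_input defined in my_input_fn.py; the import path, file paths, and node names are illustrative assumptions, not part of the API.

# Hypothetical usage sketch. The import path, file paths, and node names
# below are assumptions for illustration; only quantize() and its
# parameters come from this API.
from wego_tensorflow import quantize   # import path assumed; adjust to your WeGO install

quantized_graph_def = quantize(
    input_frozen_graph="./float_model/resnet50.pb",   # frozen float model (.pb)
    input_nodes="input",                              # start of the subgraph to quantize
    input_shapes="?,224,224,3",                       # '?' leaves the batch size unknown
    output_nodes="resnet50/predictions/Softmax",      # end of the subgraph to quantize
    input_fn="my_input_fn.calib_input",               # module_name.input_fn_name
    method=1,                                         # min-diffs method (default)
    calib_iter=100,                                   # calibration steps
    output_dir="./quantize_results")

if quantized_graph_def is None:                       # quantize() returns None on failure
    raise RuntimeError("Quantization failed; see the vai_q_tensorflow output for details.")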
Parameters
- input_frozen_graph
- string. Path to the input frozen graph (.pb file). (default: "")
- input_nodes
- string. The comma-separated list of input node names of the subgraph to be quantized, used together with output_nodes. Only the subgraph between input_nodes and output_nodes is included in the generated deployment model. Set it to the beginning of the main body of the model to be quantized, such as the nodes after data pre-processing and augmentation. (default: "")
- input_shapes
- string. The comma-separated shape list of input_nodes. The shape must be a
4-dimension shape for each node, separated by commas, for example,
1,224,224,3
; Unknown size for batch size is supported, for example,?,224,224,3
; In case of multiple input_nodes, assign the shape list of each node, separated by:
, for example,?,224,224,3:?,300,300,1
(default: ) - output_nodes
- string: The comma-separated name list of output nodes of the subgraph to be quantized that is used together with input_nodes. Only the subgraph between input_nodes and output_nodes is included when generating the deployment model. Set it to the end of the main body of the model to quantize, such as the nodes, before post-processing. (default: )
- input_fn
- string. The importable Python function that provides the input data, in the format module_name.input_fn_name, for example, my_input_fn.input_fn. The input_fn should take an int object as input, indicating the calibration step, and should return a dict (placeholder_node_name : numpy.Array) object on each call; the arrays are fed into the model's placeholder nodes. A sketch of such a function appears after this list. (default: "")
- method
- int32: {0, 1, 2} (default: 1). The quantization method. Options are:
- 0: non-overflow method. Ensures no values are saturated during quantization, but might cause inaccurate results.
- 1: min-diffs method. Allows saturation of large values during quantization to reduce the quantization error. This method is slower than method 0 but is more tolerant of outliers.
- 2: min-diffs method with a special strategy for depthwise convolutions. Allows saturation of large values during quantization to reduce the quantization error, applying a special strategy to depthwise weights while using method 1 for the remaining weights and activations. This method is slower than method 0 but is more tolerant of outliers.
- calib_iter
- int32. The number of calibration iterations. The total number of images used for calibration is calib_iter * batch_size. (default: 100)
- output_dir
- string. The directory to save the quantization results (default: ./quantize_results).
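As referenced in the input_fn description above, the following is a minimal sketch of a calibration input function; the image directory, preprocessing, batch size, and the placeholder name "input" are assumptions for illustration, not requirements of the API.

# my_input_fn.py -- hypothetical calibration input function for quantize().
# The image directory, preprocessing, placeholder name "input", and batch
# size below are illustrative assumptions, not part of the WeGO API.
import os
import numpy as np
from PIL import Image

CALIB_DIR = "./calib_images"
BATCH_SIZE = 8
IMAGES = sorted(os.listdir(CALIB_DIR))

def calib_input(iter_num):
    """Called once per calibration step with the step index (an int);
    returns a dict mapping placeholder node names to numpy arrays."""
    batch = []
    for i in range(BATCH_SIZE):
        path = os.path.join(CALIB_DIR, IMAGES[(iter_num * BATCH_SIZE + i) % len(IMAGES)])
        img = Image.open(path).convert("RGB").resize((224, 224))
        batch.append(np.asarray(img, dtype=np.float32) / 255.0)  # simple [0,1] scaling
    return {"input": np.stack(batch)}  # shape (BATCH_SIZE, 224, 224, 3)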
Note: For on-the-fly quantization examples for WeGO TensorFlow 1.x, see the examples.