This section describes how to use the execution tools and APIs to quantize
a model and generate a model that can be deployed on the target hardware. The APIs in the
module pytorch_binding/pytorch_nndct/apis/quant_api.py
are:
def torch_quantizer(quant_mode,
                    module,
                    input_args,
                    state_dict_file,
                    output_dir,
                    bitwidth_w,
                    bitwidth_a)
The function torch_quantizer creates a quantizer.
Arguments:
- quant_mode: An integer that selects the quantization mode: 0 turns quantization off, 1 runs quantization calibration, and 2 evaluates the quantized model.
- module: The float module to be quantized.
- input_args: An input tensor with the same shape as the real input of the float module to be quantized. The values can be random numbers.
- state_dict_file: The file that holds the pretrained parameters of the float module. This argument does not need to be set if the float module has already loaded its parameters.
- output_dir: The directory for the quantization result and intermediate files. The default is “quantize_result”.
- bitwidth_w: The global quantization bit width for weights and biases. The default is 8.
- bitwidth_a: The global quantization bit width for activations. The default is 8.
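The argument list for torch_quantizer can be illustrated with a small sketch. This is not part of the tool itself: QuantMode, quantizer_kwargs, and create_quantizer are hypothetical names introduced here for illustration. The sketch assumes that torch_quantizer can be imported from pytorch_nndct.apis, that it accepts the parameter names of the signature as keyword arguments, and that state_dict_file may simply be left out when the module has already loaded its parameters.

```python
from enum import IntEnum


class QuantMode(IntEnum):
    """Hypothetical names for the integer quant_mode values listed above."""
    OFF = 0          # quantization turned off
    CALIBRATION = 1  # calibration of quantization
    EVALUATION = 2   # evaluation of the quantized model


def quantizer_kwargs(quant_mode, module, input_args, state_dict_file=None,
                     output_dir="quantize_result", bitwidth_w=8, bitwidth_a=8):
    """Hypothetical helper: check the arguments and apply the documented defaults."""
    quant_mode = int(QuantMode(quant_mode))  # raises ValueError unless 0, 1, or 2
    for name, bits in (("bitwidth_w", bitwidth_w), ("bitwidth_a", bitwidth_a)):
        if not isinstance(bits, int) or bits <= 0:
            raise ValueError(f"{name} must be a positive integer, got {bits!r}")
    kwargs = {
        "quant_mode": quant_mode,
        "module": module,
        "input_args": input_args,
        "output_dir": output_dir,
        "bitwidth_w": bitwidth_w,
        "bitwidth_a": bitwidth_a,
    }
    # Assumption: the parameter file is only passed when the float module
    # has not already loaded its parameters.
    if state_dict_file is not None:
        kwargs["state_dict_file"] = state_dict_file
    return kwargs


def create_quantizer(**kwargs):
    """Hypothetical wrapper around the documented torch_quantizer call."""
    # Imported lazily so the helper above can be used without the tool installed.
    # Assumption: torch_quantizer is importable from pytorch_nndct.apis.
    from pytorch_nndct.apis import torch_quantizer
    return torch_quantizer(**kwargs)
```

With the tool installed, a calibration run might then be started with create_quantizer(**quantizer_kwargs(QuantMode.CALIBRATION, model, dummy_input)), where model is the float module and dummy_input is a random tensor with the shape of the real input.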
def dump_xmodel(output_dir, deploy_check)
The function dump_xmodel creates the deployed model.
Arguments:
- output_dir: The directory for the quantization result and intermediate files. The default is “quantize_result”.
- deploy_check: A flag that controls whether data for the accuracy check is dumped. The default is False.
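A call to dump_xmodel with the documented defaults could be sketched as follows. The helper names dump_kwargs and dump_deployed_model are hypothetical, and the sketch assumes that dump_xmodel can be imported from pytorch_nndct.apis and accepts its two parameters as keyword arguments.

```python
def dump_kwargs(output_dir="quantize_result", deploy_check=False):
    """Hypothetical helper: apply the documented defaults for dump_xmodel."""
    if not isinstance(deploy_check, bool):
        raise TypeError(f"deploy_check must be a bool, got {deploy_check!r}")
    return {"output_dir": str(output_dir), "deploy_check": deploy_check}


def dump_deployed_model(**kwargs):
    """Hypothetical wrapper around the documented dump_xmodel call."""
    # Imported lazily so the helper above can be used without the tool installed.
    # Assumption: dump_xmodel is importable from pytorch_nndct.apis.
    from pytorch_nndct.apis import dump_xmodel
    return dump_xmodel(**kwargs)
```

Because both functions document the same default directory, passing the same output_dir to torch_quantizer and dump_xmodel is presumably the intended usage, although the text above does not state this explicitly.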