This section describes how to use the provided tools and APIs to quantize a model and generate a deployable model for the target hardware.
The APIs in the module pytorch_binding/pytorch_nndct/apis/quant_api.py
are as follows:
class torch_quantizer():
    def __init__(self,
                 quant_mode: str,  # ['calib', 'test']
                 module: torch.nn.Module,
                 input_args: Union[torch.Tensor, Sequence[Any]] = None,
                 state_dict_file: Optional[str] = None,
                 output_dir: str = "quantize_result",
                 bitwidth: int = 8,
                 device: torch.device = torch.device("cuda"),
                 qat_proc: bool = False):
The class torch_quantizer creates a quantizer object.
Arguments:
- quant_mode
- A string that indicates which quantization mode the process is using: "calib" for calibration of quantization, "test" for evaluation of the quantized model.
- module
- Float module to be quantized.
- input_args
- Input tensor with the same shape as the real input of the float module to be quantized, but the values can be random numbers.
- state_dict_file
- File of pretrained parameters for the float module. If the float module has already loaded its parameters, this argument does not need to be set.
- output_dir
- Directory for quantization results and intermediate files. Default is "quantize_result".
- bitwidth
- Global quantization bit width. Default is 8.
- device
- Device to run the model on: GPU or CPU.
- qat_proc
- Turn on quantization finetuning, also named quantization-aware training (QAT).
def export_quant_config(self):
This function exports the quantization steps information.
def export_xmodel(self, output_dir, deploy_check):
This function exports the xmodel and dumps the operators' output data for detailed data comparison.
Arguments:
- output_dir
- Directory for quantization results and intermediate files. Default is "quantize_result".
- deploy_check
- Flag that controls dumping of data for detailed data comparison. Default is False. If it is set to True, binary-format data is dumped to output_dir/deploy_check_data_int/.
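The two export calls above can be tied together in a small wrapper (a sketch; the wrapper name export_deployment_artifacts is our own, while the quantizer methods are those documented above):

```python
def export_deployment_artifacts(quantizer, output_dir="quantize_result"):
    """Export quantization info and the deployable xmodel (sketch).

    export_quant_config() is called after the calibration forward passes;
    export_xmodel() is called with a quantizer created in "test" mode.
    """
    # Save the quantization steps information gathered during calibration.
    quantizer.export_quant_config()
    # Export the xmodel; deploy_check=True additionally dumps each operator's
    # output in binary format to output_dir/deploy_check_data_int/ so results
    # can be compared against the target hardware in detail.
    quantizer.export_xmodel(output_dir=output_dir, deploy_check=True)
```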