This section describes how to use the provided tools and APIs to quantize a model and generate a deployable model for the target hardware.
The APIs in the module pytorch_binding/pytorch_nndct/apis/quant_api.py
are as follows:
class torch_quantizer():
    def __init__(self,
                 quant_mode: str,  # ['calib', 'test']
                 module: torch.nn.Module,
                 input_args: Union[torch.Tensor, Sequence[Any]] = None,
                 state_dict_file: Optional[str] = None,
                 output_dir: str = "quantize_result",
                 bitwidth: int = 8,
                 device: torch.device = torch.device("cuda"),
                 qat_proc: bool = False):
The class torch_quantizer creates a quantizer object.
Arguments:
- quant_mode
- A string that indicates which quantization mode the process is using: "calib" for calibration of quantization, "test" for evaluation of the quantized model.
- module
- Float module to be quantized.
- input_args
- Input tensor with the same shape as the real input of the float module to be quantized, but the values can be random numbers.
- state_dict_file
- File of pretrained parameters for the float module. If the float module has already loaded its parameters, this argument does not need to be set.
- output_dir
- Directory for quantization results and intermediate files. Default is "quantize_result".
- bitwidth
- Global quantization bit width. Default is 8.
- device
- Device to run the model on: GPU or CPU.
- qat_proc
- Turn on quantization finetuning, also named quantization-aware training (QAT).
def export_quant_config(self):
This function exports the quantization steps information.
def export_xmodel(self, output_dir, deploy_check):
This function exports the xmodel and dumps the operators' output data for detailed data comparison.
Arguments:
- output_dir
- Directory for quantization results and intermediate files. Default is "quantize_result".
- deploy_check
- Flag that controls dumping of data for detailed data comparison. Default is False. If it is set to True, binary-format data is dumped to output_dir/deploy_check_data_int/.
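The two export calls above can be tied together in a small wrapper (a sketch; the wrapper name export_deployment_artifacts is our own, while the quantizer methods are those documented above):

```python
def export_deployment_artifacts(quantizer, output_dir="quantize_result"):
    """Export quantization info and the deployable xmodel (sketch).

    export_quant_config() is called after the calibration forward passes;
    export_xmodel() is called with a quantizer created in "test" mode.
    """
    # Save the quantization steps information gathered during calibration.
    quantizer.export_quant_config()
    # Export the xmodel; deploy_check=True additionally dumps each operator's
    # output in binary format to output_dir/deploy_check_data_int/ so results
    # can be compared against the target hardware in detail.
    quantizer.export_xmodel(output_dir=output_dir, deploy_check=True)
```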