Class torch_quantizer creates a quantizer object.
class torch_quantizer():
    def __init__(self,
                 quant_mode: str, # ['calib', 'test']
                 module: torch.nn.Module,
                 input_args: Union[torch.Tensor, Sequence[Any]] = None,
                 state_dict_file: Optional[str] = None,
                 output_dir: str = "quantize_result",
                 bitwidth: int = 8,
                 device: torch.device = torch.device("cuda"),
                 qat_proc: bool = False):
Arguments
- quant_mode
- A string that indicates which quantization mode the process is using: "calib" for calibration of quantization, and "test" for evaluation of the quantized model.
- module
- Float module to be quantized.
- input_args
- Input tensor with the same shape as the real input of the float module to be quantized; the values can be random numbers.
- state_dict_file
- Path to the float module's pretrained parameters file. If the float module has already loaded its parameters, this argument does not need to be set.
- output_dir
- Directory for quantization results and intermediate files. Default is "quantize_result".
- bitwidth
- Global quantization bit width. Default is 8.
- device
- Device on which to run the model, GPU or CPU. Default is torch.device("cuda").
- qat_proc
- Turn on quantization finetuning, also known as quantization-aware training (QAT). Default is False.
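The argument contract above can be illustrated with a minimal pure-Python stand-in. This is a hypothetical sketch, not the real pytorch_nndct implementation: the torch types are replaced with plain values so it runs standalone, and the validation logic (rejecting modes other than "calib"/"test") is an assumption drawn from the documented value set, not from the library source.

```python
from typing import Any, Optional, Sequence, Union


class MockTorchQuantizer:
    """Hypothetical stand-in mirroring torch_quantizer's arguments and defaults."""

    VALID_MODES = ("calib", "test")

    def __init__(self,
                 quant_mode: str,
                 module: Any,                          # stands in for torch.nn.Module
                 input_args: Union[Any, Sequence[Any], None] = None,
                 state_dict_file: Optional[str] = None,
                 output_dir: str = "quantize_result",
                 bitwidth: int = 8,
                 device: str = "cuda",                 # stands in for torch.device
                 qat_proc: bool = False):
        # "calib" collects quantization statistics on sample inputs;
        # "test" evaluates the model quantized with those results.
        if quant_mode not in self.VALID_MODES:
            raise ValueError(f"quant_mode must be one of {self.VALID_MODES}")
        if bitwidth <= 0:
            raise ValueError("bitwidth must be a positive integer")
        self.quant_mode = quant_mode
        self.module = module
        self.input_args = input_args
        self.state_dict_file = state_dict_file
        self.output_dir = output_dir
        self.bitwidth = bitwidth
        self.device = device
        self.qat_proc = qat_proc


# Typical two-pass flow: calibrate first, then evaluate the quantized model.
calib_q = MockTorchQuantizer("calib", module=object())
test_q = MockTorchQuantizer("test", module=object())
print(calib_q.output_dir, calib_q.bitwidth)  # → quantize_result 8
```

The same float module is passed in both passes; only quant_mode changes between the calibration run and the evaluation run.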