The torch_quantizer class creates a quantizer object:
    class torch_quantizer():
        def __init__(self,
                     quant_mode: str, # ['calib', 'test']
                     module: torch.nn.Module,
                     input_args: Union[torch.Tensor, Sequence[Any]] = None,
                     state_dict_file: Optional[str] = None,
                     output_dir: str = "quantize_result",
                     bitwidth: int = 8,
                     device: torch.device = torch.device("cuda"),
                     quant_config_file: Optional[str] = None,
                     target: Optional[str] = None):
Arguments
- Quant_mode
- A string that indicates which quantization mode the process is using. The value 'calib' is used for calibration of quantization, while 'test' is used for evaluating the quantized model.
- Module
- Float module to be quantized.
- Input_args
- Input tensor with the same shape as the actual input of the floating-point module to be quantized, but the values can be random numbers.
- State_dict_file
- Pretrained parameters file of the float module. If the float module has already loaded its parameters, this argument does not need to be set.
- Output_dir
- Directory for quantization results and intermediate files. The default value is quantize_result.
- Bitwidth
- Global quantization bit width. The default value is 8.
- Device
- Device on which to run the model, GPU or CPU. The default value is torch.device("cuda").
- Quant_config_file
- Location of the JSON file with the quantization strategy configuration.
- Target
- If a target device is specified, hardware-aware quantization is enabled. The default value is None.
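A minimal sketch of the calibrate-then-test flow is shown below. The import path pytorch_nndct.apis, the quant_model attribute, and the export_quant_config() call follow common Vitis AI PyTorch examples; the model MyModel, the weights file model.pth, and the data loader calib_loader are hypothetical placeholders. Adapt all of these to your setup.

    import torch
    from pytorch_nndct.apis import torch_quantizer  # assumed import path

    device = torch.device("cuda")
    model = MyModel().to(device)                    # hypothetical float model
    model.load_state_dict(torch.load("model.pth"))  # hypothetical weights file

    # Dummy input: same shape as the real input; values may be random.
    dummy_input = torch.randn(1, 3, 224, 224)

    # Pass 1: calibration ('calib') collects quantization statistics.
    quantizer = torch_quantizer(quant_mode="calib",
                                module=model,
                                input_args=(dummy_input,),
                                output_dir="quantize_result",
                                bitwidth=8,
                                device=device)
    quant_model = quantizer.quant_model
    for images, _ in calib_loader:                  # hypothetical data loader
        quant_model(images.to(device))
    quantizer.export_quant_config()                 # save calibration results

    # Pass 2: evaluation ('test') of the quantized model.
    quantizer = torch_quantizer(quant_mode="test",
                                module=model,
                                input_args=(dummy_input,),
                                device=device)
    quant_model = quantizer.quant_model
    # ...run the normal evaluation loop on quant_model...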