Usage
config_file = "./pytorch_quantize_config.json"
quantizer = torch_quantizer(quant_mode=quant_mode,
module=model,
input_args=(input),
device=device,
quant_config_file=config_file)
config_xxx_config.json
command:
python resnet18_quant.py --quant_mode calib --config_file int_config.json
python resnet18_quant.py --quant_mode test --config_file int_config.json
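The two commands above differ only in the value of --quant_mode. As a minimal sketch of how a script could accept these two flags (this is not the actual resnet18_quant.py; the `parse_cli` helper, the default values, and the restriction to the two modes shown above are assumptions made for illustration):

```python
import argparse


def parse_cli(argv=None):
    """Parse the two flags used in the commands above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Quantization flags (sketch)")
    # Only the two modes that appear in the commands above are listed here.
    parser.add_argument("--quant_mode", choices=["calib", "test"], default="calib",
                        help="calib collects calibration data, test evaluates the quantized model")
    # Path to the JSON quantization configuration file; None means no file is given.
    parser.add_argument("--config_file", default=None,
                        help="path to the quantization configuration file")
    return parser.parse_args(argv)


# Passing an explicit argument list keeps the sketch independent of sys.argv.
args = parse_cli(["--quant_mode", "calib", "--config_file", "int_config.json"])
print(args.quant_mode, args.config_file)
```

The parsed values would then be forwarded as quant_mode and quant_config_file in the torch_quantizer call shown under Usage.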
"overall_quantize_config": {
...
"method": "entropy",
...
"per_channel": false,
...
},
"tensor_quantize_config": {
...
"weights": {
...
"method": "maxmin",
...
"per_channel": false,
...
}
"layer_quantize_config": [
{
"layer_type": "torch.nn.Conv2d",
...
"overall_quantize_config": {
...
"per_channel": false,
Configurations
- convert_relu6_to_relu
- (Global quantizer setting) Whether to convert ReLU6 to ReLU. Options: True or False.
- include_cle
- (Global quantizer setting) Whether to use cross-layer equalization. Options: True or False.
- include_bias_corr
- (Global quantizer setting) Whether to use bias correction. Options: True or False.
- target_device
- (Global quantizer setting) Target type for the quantized model. Options: DPU, CPU, GPU.
- quantizable_data_type
- (Global quantizer setting) Tensor types to be quantized in the model. Options: input, weights, bias, activation.
- datatype
- (Tensor quantization setting) Data type used in quantization. Options: int, bfloat16, float16, float32.
- bit_width
- (Tensor quantization setting) Bit width used in quantization. Only applicable when the data type is int.
- method
- (Tensor quantization setting) Method used to calibrate the quantization scale. Options: maxmin, percentile, entropy, mse, diffs. Only applicable when the data type is int.
- round_mode
- (Tensor quantization setting) Rounding method in the quantization process. Options: half_even, half_up, half_down, std_round. Only applicable when the data type is int.
- symmetry
- (Tensor quantization setting) Whether to use symmetric quantization. Options: True or False. Only applicable when the data type is int.
- per_channel
- (Tensor quantization setting) Whether to use per_channel quantization. Options: True or False. Only applicable when the data type is int.
- signed
- (Tensor quantization setting) Whether to use signed quantization. Options: True or False. Only applicable when the data type is int.
- narrow_range
- (Tensor quantization setting) Whether to use symmetric integer range for signed quantization. Options: True or False. Only applicable when the data type is int.
- scale_type
- (Tensor quantization setting) Scale type used in the quantization process. Options: float, poweroftwo. Only applicable when the data type is int.
- calib_statistic_method
- (Tensor quantization setting) Method for selecting the final quantization scale when different calibration batches produce different scales. Options: modal, max, mean, median. Only applicable when the data type is int.
- When no configuration file is passed to the torch_quantizer API, the default configuration is applied. It is tailored for the DPU device and uses the poweroftwo scale type.
- When a configuration file is provided, the model configuration, encompassing global quantizer settings and global tensor quantization settings, overrides default settings.
- When only the model configuration is specified in the file, all tensors within the model will adopt the same configuration.
- You can use the layer configuration to assign specific configuration parameters to specific layers.
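The precedence described in these notes can be pictured as a dictionary merge. The following is only a sketch of that idea, not the quantizer's internal logic: the default values are copied from the default configuration shown in this section, while the nesting under `overall_quantize_config`, the key-by-key override behaviour, and the helper names are assumptions.

```python
import copy

# Defaults copied from the default configuration in this section.
# Grouping the tensor settings under "overall_quantize_config" is an assumption
# that mirrors the layout of the configuration file examples.
DEFAULT_CONFIG = {
    "convert_relu6_to_relu": False,
    "include_cle": True,
    "include_bias_corr": True,
    "target_device": "DPU",
    "quantizable_data_type": ["input", "weights", "bias", "activation"],
    "overall_quantize_config": {
        "datatype": "int",
        "bit_width": 8,
        "method": "diffs",
        "round_mode": "std_round",
        "symmetry": True,
        "per_channel": False,
        "signed": True,
        "narrow_range": False,
        "scale_type": "poweroftwo",
        "calib_statistic_method": "modal",
    },
}


def resolve_model_config(file_config=None):
    """Return the defaults, overridden by the model configuration if one is given."""
    resolved = copy.deepcopy(DEFAULT_CONFIG)
    if file_config is None:
        return resolved  # no file: the default configuration applies
    for key, value in file_config.items():
        if key == "overall_quantize_config":
            resolved[key].update(value)  # assumed: override key by key
        else:
            resolved[key] = copy.deepcopy(value)
    return resolved


def per_tensor_settings(resolved):
    """With only a model configuration, every tensor type gets the same settings."""
    overall = resolved["overall_quantize_config"]
    return {tensor: dict(overall) for tensor in resolved["quantizable_data_type"]}


cfg = resolve_model_config({
    "target_device": "CPU",
    "overall_quantize_config": {"method": "maxmin", "scale_type": "float",
                                "calib_statistic_method": "max"},
})
print(per_tensor_settings(cfg)["weights"]["method"])
```

Layer configurations would be applied on top of this result for the layers they name.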
"convert_relu6_to_relu": false,
"include_cle": true,
"include_bias_corr": true,
"target_device": "DPU",
"quantizable_data_type": [
"input",
"weights",
"bias",
"activation"],
"datatype": "int",
"bit_width": 8,
"method": "diffs",
"round_mode": "std_round",
"symmetry": true,
"per_channel": false,
"signed": true,
"narrow_range": false,
"scale_type": "poweroftwo",
"calib_statistic_method": "modal"
The global tensor quantization settings are set under the overall_quantize_config keyword as follows:
"convert_relu6_to_relu": false,
"include_cle": false,
"keep_first_last_layer_accuracy": false,
"keep_add_layer_accuracy": false,
"include_bias_corr": false,
"target_device": "CPU",
"quantizable_data_type": [
"input",
"weights",
"bias",
"activation"],
"overall_quantize_config": {
"datatype": "int",
"bit_width": 8,
"method": "maxmin",
"round_mode": "half_even",
"symmetry": true,
"per_channel": false,
"signed": true,
"narrow_range": false,
"scale_type": "float",
"calib_statistic_method": "max"
}
"convert_relu6_to_relu": false,
"convert_silu_to_hswish": false,
"include_cle": false,
"keep_first_last_layer_accuracy": false,
"keep_add_layer_accuracy": false,
"include_bias_corr": false,
"target_device": "CPU",
"quantizable_data_type": [
"input",
"weights",
"bias",
"activation"
],
"overall_quantize_config": {
"datatype": "bfloat16"
}
"tensor_quantize_config": {
"bias": {
"datatype": "float16"
}
}
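One way to avoid hand-editing mistakes in such a file is to build the configuration as a Python dictionary and let the json module serialize it. The sketch below assumes the bfloat16 fragment and the float16 bias fragment above belong to the same file; the file name and the `write_config` helper are hypothetical.

```python
import json
import os
import tempfile

# Values copied from the fragments above; combining them in one file is an assumption.
mixed_precision_config = {
    "convert_relu6_to_relu": False,
    "convert_silu_to_hswish": False,
    "include_cle": False,
    "keep_first_last_layer_accuracy": False,
    "keep_add_layer_accuracy": False,
    "include_bias_corr": False,
    "target_device": "CPU",
    "quantizable_data_type": ["input", "weights", "bias", "activation"],
    "overall_quantize_config": {"datatype": "bfloat16"},
    "tensor_quantize_config": {"bias": {"datatype": "float16"}},
}


def write_config(config, path):
    """Serialize a configuration dict to a JSON file and return its path."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
    return path


# Hypothetical file name; a temporary directory keeps the sketch free of side effects.
config_path = write_config(
    mixed_precision_config,
    os.path.join(tempfile.mkdtemp(), "mixed_precision_config.json"),
)

# Reading the file back confirms that it is well-formed JSON.
with open(config_path, encoding="utf-8") as handle:
    reloaded = json.load(handle)
print(reloaded["tensor_quantize_config"]["bias"]["datatype"])
```

The resulting path can then be passed as quant_config_file in the torch_quantizer call shown under Usage.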
- Each layer configuration must be in dictionary format.
- In each layer configuration, the quantizable_data_type and overall_quantize_config parameters are required. In the overall_quantize_config parameter, all quantization parameters for this layer must be included.
- If the setting is based on layer type, the layer_name parameter should be null.
- If the setting is based on the layer name, first run calibration on the model, then pick the required layer name from the Python file generated in the quantize_result directory. In this case, the layer_type parameter must be null.
- Similar to the model configuration, the quantization configuration for different tensors within a layer can be customized separately. These individual tensor configurations should be specified using the tensor_quantize_config keyword.
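The rules above lend themselves to a quick pre-flight check before the file is handed to the quantizer. This is a hedged sketch, not a check performed by the tool itself: `validate_layer_config` is a hypothetical helper, and the list of required tensor keys is taken from the layer examples in this section.

```python
# Keys that appear in overall_quantize_config in the layer examples of this section
# (assumed here to be the full set that a layer configuration must provide).
REQUIRED_TENSOR_KEYS = (
    "bit_width", "method", "round_mode", "symmetry", "per_channel",
    "signed", "narrow_range", "scale_type", "calib_statistic_method",
)


def validate_layer_config(layer_cfg):
    """Return a list of rule violations for one layer configuration entry."""
    if not isinstance(layer_cfg, dict):
        return ["layer configuration must be a dictionary"]
    problems = []
    for key in ("quantizable_data_type", "overall_quantize_config"):
        if key not in layer_cfg:
            problems.append("missing required key: " + key)
    overall = layer_cfg.get("overall_quantize_config", {})
    for key in REQUIRED_TENSOR_KEYS:
        if key not in overall:
            problems.append("overall_quantize_config lacks: " + key)
    # Exactly one of layer_type and layer_name may be set; the other must be null.
    has_type = layer_cfg.get("layer_type") is not None
    has_name = layer_cfg.get("layer_name") is not None
    if has_type == has_name:
        problems.append("exactly one of layer_type and layer_name must be non-null")
    return problems


conv_by_type = {
    "layer_type": "torch.nn.Conv2d",
    "layer_name": None,
    "quantizable_data_type": ["weights", "bias", "activation"],
    "overall_quantize_config": {
        "bit_width": 8, "method": "maxmin", "round_mode": "half_even",
        "symmetry": True, "per_channel": False, "signed": True,
        "narrow_range": False, "scale_type": "float",
        "calib_statistic_method": "max",
    },
}
print(validate_layer_config(conv_by_type))
```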
{
"layer_type": "torch.nn.Conv2d",
"layer_name": null,
"quantizable_data_type": [
"weights",
"bias",
"activation"],
"overall_quantize_config": {
"bit_width": 8,
"method": "maxmin",
"round_mode": "half_even",
"symmetry": true,
"per_channel": false,
"signed": true,
"narrow_range": false,
"scale_type": "float",
"calib_statistic_method": "max"
},
"tensor_quantize_config": {
"weights": {
"per_channel": true
},
"activation": {
"method": "entropy"
}
}
}
In the layer configuration based on layer name, the layer named ResNet::ResNet/Conv2d[conv1]/input.2 is assigned specific quantization parameters. In this layer, the round_mode of activation is set to half_up:
{
"layer_type": null,
"layer_name": "ResNet::ResNet/Conv2d[conv1]/input.2",
"quantizable_data_type": [
"weights",
"bias",
"activation"],
"overall_quantize_config": {
"bit_width": 8,
"method": "maxmin",
"round_mode": "half_even",
"symmetry": true,
"per_channel": false,
"signed": true,
"narrow_range": false,
"scale_type": "float",
"calib_statistic_method": "max"
},
"tensor_quantize_config": {
"activation": {
"round_mode": "half_up"
}
}
}
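To see what each tensor in this layer ends up with, the per-tensor entries can be layered over the layer's overall settings. The sketch below assumes that tensor_quantize_config overrides overall_quantize_config key by key; `effective_tensor_settings` is a hypothetical helper, not part of the quantizer API.

```python
def effective_tensor_settings(layer_cfg):
    """Combine a layer's overall settings with its per-tensor overrides."""
    overall = layer_cfg["overall_quantize_config"]
    overrides = layer_cfg.get("tensor_quantize_config", {})
    # For each quantizable tensor type, start from the overall settings and
    # apply that tensor's overrides on top (assumed key-by-key precedence).
    return {
        tensor: {**overall, **overrides.get(tensor, {})}
        for tensor in layer_cfg["quantizable_data_type"]
    }


conv1_by_name = {
    "layer_type": None,
    "layer_name": "ResNet::ResNet/Conv2d[conv1]/input.2",
    "quantizable_data_type": ["weights", "bias", "activation"],
    "overall_quantize_config": {
        "bit_width": 8, "method": "maxmin", "round_mode": "half_even",
        "symmetry": True, "per_channel": False, "signed": True,
        "narrow_range": False, "scale_type": "float",
        "calib_statistic_method": "max",
    },
    "tensor_quantize_config": {"activation": {"round_mode": "half_up"}},
}

for tensor, settings in effective_tensor_settings(conv1_by_name).items():
    print(tensor, settings["round_mode"])
```

Under this reading, only the activation tensor uses half_up, while weights and bias keep the layer's half_even setting.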
The layer name ResNet::ResNet/Conv2d[conv1]/input.2 is picked from the generated file quantize_result/ResNet.py of the example code example/resnet18_quant.py:
- Run the example code with the python resnet18_quant.py --subset_len 100 command. The quantize_result/ResNet.py file is generated.
- In the file, the name of the first convolution layer is ResNet::ResNet/Conv2d[conv1]/input.2.
- Copy the layer name to the quantization configuration file if this layer needs a specific configuration.
import torch
import pytorch_nndct as py_nndct
class ResNet(torch.nn.Module):
    def __init__(self):
        super(ResNet, self).__init__()
        self.module_0 = py_nndct.nn.Input() #ResNet::input_0
        self.module_1 = py_nndct.nn.Conv2d(in_channels=3, out_channels=64, kernel_size=[7, 7], stride=[2, 2], padding=[3, 3], dilation=[1, 1], groups=1, bias=True) #ResNet::ResNet/Conv2d[conv1]/input.2
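Because each layer name appears as a trailing comment on its module line, the names can also be collected with a small script instead of by eye. The sketch below assumes the generated file keeps the `self.<attr> = py_nndct.nn.<Op>(...) #<layer name>` shape seen in the excerpt above; the regular expression and the `extract_layer_names` helper are illustrative, not part of the tool.

```python
import re

# Matches lines such as:
#   self.module_0 = py_nndct.nn.Input() #ResNet::input_0
# and captures the attribute name, the operator, and the trailing layer name.
LAYER_LINE = re.compile(r"self\.(\w+)\s*=\s*py_nndct\.nn\.(\w+)\(.*\)\s*#\s*(\S+)\s*$")


def extract_layer_names(source):
    """Return (attribute, operator, layer_name) tuples found in generated source text."""
    layers = []
    for line in source.splitlines():
        match = LAYER_LINE.search(line)
        if match:
            layers.append(match.groups())
    return layers


# Inline sample copied from the excerpt above; in practice the text would be read
# from the generated quantize_result/ResNet.py file.
sample = """
        self.module_0 = py_nndct.nn.Input() #ResNet::input_0
        self.module_1 = py_nndct.nn.Conv2d(in_channels=3, out_channels=64, kernel_size=[7, 7], stride=[2, 2], padding=[3, 3], dilation=[1, 1], groups=1, bias=True) #ResNet::ResNet/Conv2d[conv1]/input.2
"""

for attr, op, name in extract_layer_names(sample):
    print(op, name)
```

Filtering the result for the Conv2d operator gives the candidate values for the layer_name field.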
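Configuration Restrictions
When the target device is DPU, the tensor quantization settings are restricted to the following values: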
method: diffs or maxmin
round_mode: std_round for weights, bias, and input; half_up for activation.
symmetry: true
per_channel: false
signed: true
narrow_range: true
scale_type: poweroftwo
calib_statistic_method: modal
For CPU and GPU devices, no such restrictions apply. However, some settings conflict with each other. For instance, the calibration methods 'maxmin', 'percentile', 'mse', and 'entropy' do not support the calibration statistic method 'modal'. Furthermore, if symmetry is set to false (asymmetric quantization), the calibration methods 'mse' and 'entropy' are unsupported. If a configuration conflict occurs, the quantization tool reports it with an error message.
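The two conflicts named above can be checked ahead of time with a few lines of code. This sketch covers only those two rules, so the quantization tool may reject other combinations as well; `find_conflicts` is a hypothetical helper and its messages are not the tool's actual error text.

```python
# Calibration methods that do not support the 'modal' statistic method.
NO_MODAL_METHODS = {"maxmin", "percentile", "mse", "entropy"}
# Calibration methods that are unsupported when symmetry is false.
SYMMETRIC_ONLY_METHODS = {"mse", "entropy"}


def find_conflicts(tensor_cfg):
    """Return descriptions of the known conflicts in one tensor configuration."""
    conflicts = []
    method = tensor_cfg.get("method")
    if tensor_cfg.get("calib_statistic_method") == "modal" and method in NO_MODAL_METHODS:
        conflicts.append("method '%s' does not support calib_statistic_method 'modal'" % method)
    if tensor_cfg.get("symmetry") is False and method in SYMMETRIC_ONLY_METHODS:
        conflicts.append("method '%s' requires symmetry to be true" % method)
    return conflicts


candidate = {"method": "entropy", "symmetry": False, "calib_statistic_method": "modal"}
for message in find_conflicts(candidate):
    print(message)
```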