Configuration of Quantization Strategy

Vitis AI User Guide (UG1414)
Document ID: UG1414
Release Date: 2022-06-15
Version: 2.5 English
vai_q_pytorch supports multiple quantization strategies through a quantization configuration file in JSON format.
  1. Usage
    To make a customized configuration take effect, pass the configuration file to the torch_quantizer API:
    config_file = "./pytorch_quantize_config.json"
    quantizer = torch_quantizer(quant_mode=quant_mode, 
                                module=model, 
                                input_args=(input,),
                                device=device, 
                                quant_config_file=config_file)
    Example code is provided in example/resnet18_quant.py, which can use example/pytorch_quantize_config.json as its configuration file. Add "--config_file pytorch_quantize_config.json" to the command line to quantize the model with this configuration:
    python resnet18_quant.py --quant_mode calib --config_file pytorch_quantize_config.json
    python resnet18_quant.py --quant_mode test --config_file pytorch_quantize_config.json
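    Beyond the command-line example, the same flow can be written directly against the API. The following is a minimal sketch: model, dummy_input, device, evaluate(), and the data loaders are placeholders for your own code.
    from pytorch_nndct.apis import torch_quantizer

    config_file = "./pytorch_quantize_config.json"

    # Calibration pass: run forward passes to collect statistics
    # with the calibration method chosen in the configuration file.
    quantizer = torch_quantizer(quant_mode="calib",
                                module=model,
                                input_args=(dummy_input,),
                                device=device,
                                quant_config_file=config_file)
    evaluate(quantizer.quant_model, calib_loader)  # forward passes only
    quantizer.export_quant_config()                # save calibration results

    # Test pass: evaluate the quantized model and export it.
    quantizer = torch_quantizer(quant_mode="test",
                                module=model,
                                input_args=(dummy_input,),
                                device=device,
                                quant_config_file=config_file)
    evaluate(quantizer.quant_model, test_loader)
    quantizer.export_xmodel()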
    In the example configuration file, the model configuration under "overall_quantize_config" selects the entropy calibration method and per-tensor quantization.
    "overall_quantize_config": {
      ...
      "method": "entropy",
      ...
      "per_channel": false,
      ...
    },
    The configuration of the weights under "tensor_quantize_config" selects the maxmin calibration method and per-tensor quantization, which means the weights use a different calibration method from the model configuration.
    "tensor_quantize_config": {
      ...
      "weights": {
        ...
        "method": "maxmin",
        ...
        "per_channel": false,
        ...
        }
    In addition, there is one layer quantization configuration in the "layer_quantize_config" list. It selects layers by layer_type and sets torch.nn.Conv2d layers to per-channel quantization; the per-channel setting itself is applied through the weights entry of the layer's "tensor_quantize_config", shown in full in the Layer Configurations subsection.
    "layer_quantize_config": [
      {
        "layer_type": "torch.nn.Conv2d",
        ...
        "overall_quantize_config": {
          ...
          "per_channel": false,
  2. The configurations that can be set in the file:
    convert_relu6_to_relu
    (Global quantizer setting) Whether to convert ReLU6 to ReLU. Options: True or False.
    include_cle
    (Global quantizer setting) Whether to use cross layer equalization. Options: True or False.
    include_bias_corr
    (Global quantizer setting) Whether to use bias correction. Options: True or False.
    target_device
    (Global quantizer setting) Device on which the quantized model is deployed. Options: DPU, CPU, GPU.
    quantizable_data_type
    (Global quantizer setting) Tensor types to be quantized in the model.
    bit_width
    (Tensor quantization setting) Bit width used in quantization.
    method
    (Tensor quantization setting) Method used in the calibration process. Options: maxmin, percentile, entropy, mse, diffs.
    round_mode
    (Tensor quantization setting) Rounding method used in the quantization process. Options: half_even, half_up, half_down, std_round.
    symmetry
    (Tensor quantization setting) Whether to use symmetric quantization. Options: True or False.
    per_channel
    (Tensor quantization setting) Whether to use per-channel quantization. Options: True or False.
    signed
    (Tensor quantization setting) Whether to use signed quantization. Options: True or False.
    narrow_range
    (Tensor quantization setting) Whether to use a symmetric integer range for signed quantization. Options: True or False.
    scale_type
    (Tensor quantization setting) Scale type used in the quantization process. Options: float, power_of_two.
    calib_statistic_method
    (Tensor quantization setting) Statistic method applied to activation data during calibration. Options: modal, max, mean, median.
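    Since the file is plain JSON, it can also be generated programmatically. A minimal sketch follows; the file name and the particular parameter values are illustrative choices, not requirements:
    import json

    # Illustrative configuration: CPU target, float scales, percentile calibration.
    config = {
        "convert_relu6_to_relu": False,
        "include_cle": False,
        "include_bias_corr": False,
        "target_device": "CPU",
        "quantizable_data_type": ["input", "weights", "bias", "activation"],
        "overall_quantize_config": {
            "bit_width": 8,
            "method": "percentile",
            "round_mode": "half_even",
            "symmetry": True,
            "per_channel": False,
            "signed": True,
            "narrow_range": False,
            "scale_type": "float",
            "calib_statistic_method": "max",
        },
    }

    with open("pytorch_quantize_config.json", "w") as f:
        json.dump(config, f, indent=2)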
  3. Hierarchical Configuration
    The quantization configuration has a hierarchical structure.
    • If no configuration file is provided to the torch_quantizer API, the default configuration is used; it is adapted to the DPU device and uses the power_of_two scale type (see the sketch after this list).
    • If a configuration file is provided, the model configuration, including the global quantizer settings and the global tensor quantization settings, is required.
    • If only the model configuration is provided in the configuration file, all tensors in the model use that same configuration.
    • Layer configurations can be used to give specific layers their own configuration parameters.
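    For instance, omitting the quant_config_file argument selects the default configuration listed in the next subsection (a sketch; model, input, and device are placeholders):
    quantizer = torch_quantizer(quant_mode=quant_mode,
                                module=model,
                                input_args=(input,),
                                device=device)  # no quant_config_file: DPU defaults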
    1. Default Configurations
      Details of the default configuration are shown below.
      "convert_relu6_to_relu": false,
      "include_cle": true,
      "include_bias_corr": true,
      "target_device": "DPU",
      "quantizable_data_type": [
        "input", 
        "weights", 
        "bias", 
        "activation"],
      "bit_width": 8, 
      "method": "diffs", 
      "round_mode": "std_round", 
      "symmetry": true, 
      "per_channel": false, 
      "signed": true, 
      "narrow_range": false, 
      "scale_type": "power_of_two", 
      "calib_statistic_method": "modal"
    2. Model Configurations
      In the example configuration file example/pytorch_quantize_config.json, the global quantizer settings are set under their respective keywords, and the global quantization parameters must be set under the "overall_quantize_config" keyword, as shown below.
        "convert_relu6_to_relu": false,
        "include_cle": false,
        "keep_first_last_layer_accuracy": false,
        "keep_add_layer_accuracy": false,
        "include_bias_corr": false,
        "target_device": "CPU",
        "quantizable_data_type": [
          "input",
          "weights",
          "bias",
          "activation"],
      "overall_quantize_config": {
          "bit_width": 8, 
          "method": "maxmin", 
          "round_mode": "half_even", 
          "symmetry": true, 
          "per_channel": false, 
          "signed": true, 
          "narrow_range": false, 
          "scale_type": "float", 
          "calib_statistic_method": "max"
      }
      Optionally, the quantization configurations of the different tensor types in the model can be set separately; they must be placed under the "tensor_quantize_config" keyword. In the example configuration file, only the quantization method of the activations is changed, to "mse"; the remaining parameters are inherited from the global parameters.
      "tensor_quantize_config": {
      
          "activation": {
      
              "method": "mse", 
      
          } 
      }
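      Because only "method" is overridden, the effective configuration for activations is the global configuration with that one field replaced:
      "activation": {
          "bit_width": 8,
          "method": "mse",
          "round_mode": "half_even",
          "symmetry": true,
          "per_channel": false,
          "signed": true,
          "narrow_range": false,
          "scale_type": "float",
          "calib_statistic_method": "max"
      }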
    3. Layer Configurations
      Layer quantization configurations must be added to the "layer_quantize_config" list. Two ways of selecting a layer are supported: by layer type and by layer name. Note the following five points when writing a layer configuration.
      • Each individual layer configuration must be in dictionary format.
      • In each layer configuration, the "quantizable_data_type" and "overall_quantize_config" parameters are required, and the "overall_quantize_config" parameter must include all quantization parameters for the layer.
      • If the selection is based on layer type, the "layer_name" parameter must be null.
      • If the selection is based on layer name, run the calibration process first, then pick the required layer name from the Python file generated in the quantize_result directory. In this case, the "layer_type" parameter must be null.
      • As with the model configuration, the quantization configurations of the different tensor types in the layer can be set separately, under the "tensor_quantize_config" keyword.
      The example configuration file contains two layer configurations: one based on layer type and one based on layer name. In the configuration based on layer type, the torch.nn.Conv2d layers are given specific quantization parameters: the "per_channel" parameter of the weights is set to true, and the "method" parameter of the activations is set to "entropy".
      {
        "layer_type": "torch.nn.Conv2d",
        "layer_name": null,
        "quantizable_data_type": [
          "weights",
          "bias",
          "activation"],
        "overall_quantize_config": {
          "bit_width": 8,
          "method": "maxmin",
          "round_mode": "half_even",
          "symmetry": true,
          "per_channel": false,
          "signed": true,
          "narrow_range": false,
          "scale_type": "float",
          "calib_statistic_method": "max"
        },
        "tensor_quantize_config": {
          "weights": {
            "per_channel": true
          },
          "activation": {
            "method": "entropy"
          }
        }
      }
      In the layer configuration based on layer name, the layer named "ResNet::ResNet/Conv2d[conv1]/input.2" is given specific quantization parameters, and the round_mode of the activations in this layer is set to "half_up".
      {
        "layer_type": null,
        "layer_name": "ResNet::ResNet/Conv2d[conv1]/input.2",
        "quantizable_data_type": [
          "weights",
          "bias",
          "activation"],
        "overall_quantize_config": {
          "bit_width": 8,
          "method": "maxmin",
          "round_mode": "half_even",
          "symmetry": true,
          "per_channel": false,
          "signed": true,
          "narrow_range": false,
          "scale_type": "float",
          "calib_statistic_method": "max"
        },
        "tensor_quantize_config": {
          "activation": {
            "round_mode": "half_up"
          }
        }
      }
      The layer name "ResNet::ResNet/Conv2d[conv1]/input.2" is taken from the file "quantize_result/ResNet.py" generated by the example code "example/resnet18_quant.py".
      • Run the example code with the command "python resnet18_quant.py --subset_len 100". This generates the quantize_result/ResNet.py file.
      • In that file, the name of the first convolution layer is "ResNet::ResNet/Conv2d[conv1]/input.2".
      • Copy the layer name into the quantization configuration file to give this layer a specific configuration.
      import torch
      import pytorch_nndct as py_nndct
      class ResNet(torch.nn.Module):
        def __init__(self):
          super(ResNet, self).__init__()
          self.module_0 = py_nndct.nn.Input() #ResNet::input_0
        self.module_1 = py_nndct.nn.Conv2d(in_channels=3, out_channels=64, kernel_size=[7, 7], stride=[2, 2], padding=[3, 3], dilation=[1, 1], groups=1, bias=True) #ResNet::ResNet/Conv2d[conv1]/input.2
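      The layer names can also be collected from the generated file automatically. The following is a hypothetical helper, not part of vai_q_pytorch; it simply scans quantize_result/ResNet.py for the trailing "#" comments that carry the layer names:
      import re

      def list_layer_names(path="quantize_result/ResNet.py"):
          """Return the layer names found in the generated file's trailing comments."""
          names = []
          with open(path) as f:
              for line in f:
                  match = re.search(r"#\s*(\S+)\s*$", line)
                  if match:
                      names.append(match.group(1))
          return names

      for name in list_layer_names():
          print(name)  # e.g. ResNet::ResNet/Conv2d[conv1]/input.2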