Quantization Strategy Configuration - 3.5

vai_q_pytorch supports quantization configuration files in JSON format, which let you define multiple quantization strategy configurations.

Usage

To enable a custom configuration, pass the configuration file to the torch_quantizer API:

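from pytorch_nndct.apis import torch_quantizer

# quant_mode, model, input, and device are defined elsewhere in the script.
# Path to the custom quantization configuration file (JSON).
config_file = "./pytorch_quantize_config.json"
quantizer = torch_quantizer(quant_mode=quant_mode,
                            module=model,
                            input_args=(input),
                            device=device,
                            quant_config_file=config_file)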

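The ./example/ directory contains three example configuration files: int_config.json, bfloat16_config.json, and mix_precision_config.json. To quantize the model, run the example script with --config_file xxx_config.json, where xxx_config.json is one of these files: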

python resnet18_quant.py --quant_mode calib --config_file int_config.json
python resnet18_quant.py --quant_mode test --config_file int_config.json
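The source of the example script is not shown here. As a rough sketch of how such a script might pass the --config_file flag through to the quant_config_file argument, the following uses two hypothetical helpers, parse_args and quantizer_kwargs. The default values, and the reading of None as "use the default configuration", are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags used in the commands above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Quantization example (sketch)")
    # Flag names come from the commands above; the defaults are assumptions.
    parser.add_argument("--quant_mode", default="calib",
                        help="quantization mode, for example calib or test")
    parser.add_argument("--config_file", default=None,
                        help="path to a JSON quantization configuration file")
    return parser.parse_args(argv)


def quantizer_kwargs(args):
    """Collect the keyword arguments that would be passed to torch_quantizer."""
    # Assumption: quant_config_file=None selects the default configuration.
    return {"quant_mode": args.quant_mode, "quant_config_file": args.config_file}


args = parse_args(["--quant_mode", "calib", "--config_file", "int_config.json"])
print(quantizer_kwargs(args))
```

In a real script, the resulting dictionary would be combined with the module, input_args, and device arguments shown earlier.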

In the example configuration file, the model configuration under "overall_quantize_config" uses the entropy calibration method and per-tensor quantization:

"overall_quantize_config": {
  ...
  "method": "entropy",
  ...
  "per_channel": false,
  ...
},

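tensor_quantize_config sets the weights to the maxmin calibration method with per-tensor quantization, so the weights use a calibration method that differs from the model configuration: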

"tensor_quantize_config": {
  ...
  "weights": {
    ...
    "method": "maxmin",
    ...
    "per_channel": false,
    ...
  }
}

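The layer_quantize_config list contains a single layer quantization setting. The setting is selected by layer type and applies to torch.nn.Conv2d layers, whose weights are quantized per channel. The excerpt that follows is truncated: the layer-level "overall_quantize_config" keeps "per_channel" set to false, and per-channel quantization of the weights is enabled by a "tensor_quantize_config" override, as in the complete layer configuration example later in this section.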

"layer_quantize_config": [
  {
    "layer_type": "torch.nn.Conv2d",
    ...
    "overall_quantize_config": {
      ...
      "per_channel": false,

Configurations

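convert_relu6_to_relu: (global quantizer setting) Whether to convert ReLU6 to ReLU. Options: true or false.
include_cle: (global quantizer setting) Whether to use cross-layer equalization. Options: true or false.
include_bias_corr: (global quantizer setting) Whether to use bias correction. Options: true or false.
target_device: (global quantizer setting) Target device type for the quantized model. Options: DPU, CPU, and GPU.
quantizable_data_type: (global quantizer setting) Tensor types to be quantized in the model.
data_type: (tensor quantization setting) Data type used in quantization. Options: int, bfloat16, float16, and float32. The example configurations in this section write this key as "datatype".
bit_width: (tensor quantization setting) Bit width used in quantization. Applies only when the data type is int.
method: (tensor quantization setting) Method used to calibrate the quantization scale. Options: maxmin, percentile, entropy, mse, and diffs. Applies only when the data type is int.
round_mode: (tensor quantization setting) Rounding method used in the quantization process. Options: half_even, half_up, half_down, and std_round. Applies only when the data type is int.
symmetry: (tensor quantization setting) Whether to use symmetric quantization. Options: true or false. Applies only when the data type is int.
per_channel: (tensor quantization setting) Whether to use per-channel quantization. Options: true or false. Applies only when the data type is int.
signed: (tensor quantization setting) Whether to use signed quantization. Options: true or false. Applies only when the data type is int.
narrow_range: (tensor quantization setting) Whether to use a symmetric integer range for signed quantization. Options: true or false. Applies only when the data type is int.
scale_type: (tensor quantization setting) Scale type used in the quantization process. Options: float and poweroftwo (power of two). Applies only when the data type is int.
calib_statistic_method: (tensor quantization setting) Method used to select the optimal quantization scale when different batches of data yield different scales. Options: modal, max, mean, and median. Applies only when the data type is int.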
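To make bit_width, signed, narrow_range, and scale_type more concrete, here is a minimal sketch of symmetric integer quantization using the conventional textbook definitions. It is not the tool's implementation: the helper names int_range and quantize are hypothetical, the scale is taken to be the real-valued step size, and rounding uses Python's built-in round(), which rounds halves to even.

```python
def int_range(bit_width, signed=True, narrow_range=False):
    """Return (qmin, qmax) for an integer format (conventional definition)."""
    if signed:
        qmax = 2 ** (bit_width - 1) - 1
        # narrow_range drops the most negative code so the range is symmetric.
        qmin = -qmax if narrow_range else -qmax - 1
    else:
        qmin, qmax = 0, 2 ** bit_width - 1
    return qmin, qmax


def quantize(x, scale, bit_width=8, signed=True, narrow_range=False):
    """Symmetric quantization: divide by the scale, round, then clamp."""
    qmin, qmax = int_range(bit_width, signed, narrow_range)
    q = round(x / scale)  # Python's round() rounds halves to even
    return max(qmin, min(qmax, q))


print(int_range(8))                     # signed 8-bit range
print(int_range(8, narrow_range=True))  # symmetric signed 8-bit range
print(quantize(0.5, scale=2 ** -6))     # power-of-two scale of 1/64
```

With a power-of-two scale, the division reduces to a bit shift in fixed-point hardware, which is the usual motivation for that scale type.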

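Hierarchical configuration: Quantization configurations are applied in a hierarchy.

If no configuration file is provided to the torch_quantizer API, the default configuration is used. It is tailored to DPU devices and uses poweroftwo (power-of-two) quantization.
If a configuration file is provided, the model configuration (the global quantizer settings and the global tensor quantization settings) overrides the default settings.
If the configuration file specifies only the model configuration, all tensors in the model use the same configuration.
You can use layer configurations to assign specific configuration parameters to specific layers.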

Default configuration:

The default configuration is as follows:

"convert_relu6_to_relu": false,
"include_cle": true,
"include_bias_corr": true,
"target_device": "DPU",
"quantizable_data_type": [
  "input", 
  "weights", 
  "bias", 
  "activation"],
"datatype": "int",
"bit_width": 8, 
"method": "diffs", 
"round_mode": "std_round", 
"symmetry": true, 
"per_channel": false, 
"signed": true, 
"narrow_range": false, 
"scale_type": "poweroftwo", 
"calib_statistic_method": "modal"
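The hierarchy described above can be pictured as a merge from the least specific to the most specific source. The sketch below copies the tensor-level defaults from the listing above and uses a hypothetical resolve helper. It assumes overrides are applied key by key, which the text does not spell out, so treat it as an illustration rather than the tool's actual logic.

```python
import copy

# Tensor-level defaults copied from the default configuration above.
DEFAULT_TENSOR_CONFIG = {
    "datatype": "int", "bit_width": 8, "method": "diffs",
    "round_mode": "std_round", "symmetry": True, "per_channel": False,
    "signed": True, "narrow_range": False, "scale_type": "poweroftwo",
    "calib_statistic_method": "modal",
}


def resolve(model_overall=None, layer_overall=None, tensor_override=None):
    """Merge settings from least to most specific; later sources win."""
    resolved = copy.deepcopy(DEFAULT_TENSOR_CONFIG)
    for override in (model_overall, layer_overall, tensor_override):
        if override:
            resolved.update(override)  # assumption: key-by-key override
    return resolved


# No configuration file: the defaults apply unchanged.
print(resolve()["scale_type"])

# Model configuration plus a per-tensor override for one tensor type.
cfg = resolve(model_overall={"method": "maxmin", "scale_type": "float"},
              tensor_override={"per_channel": True})
print(cfg["method"], cfg["scale_type"], cfg["per_channel"])
```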

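Model configuration:

In the example configuration file int_config.json, all tensors in the model are assigned the same int8 quantization configuration. In this case, the global quantization parameters must be specified under the overall_quantize_config key, as shown below: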

  "convert_relu6_to_relu": false,
  "include_cle": false,
  "keep_first_last_layer_accuracy": false,
  "keep_add_layer_accuracy": false,
  "include_bias_corr": false,
  "target_device": "CPU",
  "quantizable_data_type": [
    "input",
    "weights",
    "bias",
    "activation"],
  "overall_quantize_config": {
    "datatype": "int",
    "bit_width": 8,
    "method": "maxmin",
    "round_mode": "half_even",
    "symmetry": true,
    "per_channel": false,
    "signed": true,
    "narrow_range": false,
    "scale_type": "float",
    "calib_statistic_method": "max"
  }

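As in int_config.json, bfloat16_config.json configures all tensors in the model with the same bfloat16 quantization setting. In this case, the data type is the only parameter specified among the global quantization parameters: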

  "convert_relu6_to_relu": false,
  "convert_silu_to_hswish": false,
  "include_cle": false,
  "keep_first_last_layer_accuracy": false,
  "keep_add_layer_accuracy": false,
  "include_bias_corr": false,
  "target_device": "CPU",
  "quantizable_data_type": [
    "input",
    "weights",
    "bias",
    "activation"
  ],
  "overall_quantize_config": {
    "datatype": "bfloat16"
  }

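Optionally, the quantization configuration of each tensor type in the model can be set individually. These settings must be placed under the tensor_quantize_config key. In the example configuration file mix_precision_config.json, the global quantization data type is bfloat16, and the data type of the bias is changed to float16. All other parameters follow the global settings: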

"tensor_quantize_config": {
    "bias": {
        "datatype": "float16"
    }
}
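To see what each tensor type ends up with under such an override, the global settings can be expanded per tensor. The sketch below uses a hypothetical per_tensor_configs helper and a reduced stand-in for mix_precision_config.json that contains only the keys described above. The real file is not reproduced here.

```python
def per_tensor_configs(config):
    """Map each quantizable tensor type to its effective settings."""
    overall = config.get("overall_quantize_config", {})
    overrides = config.get("tensor_quantize_config", {})
    return {
        # Per-tensor keys take precedence over the global settings.
        tensor: {**overall, **overrides.get(tensor, {})}
        for tensor in config.get("quantizable_data_type", [])
    }


# Reduced stand-in for mix_precision_config.json (assumption).
mix_precision = {
    "quantizable_data_type": ["input", "weights", "bias", "activation"],
    "overall_quantize_config": {"datatype": "bfloat16"},
    "tensor_quantize_config": {"bias": {"datatype": "float16"}},
}

for tensor, settings in per_tensor_configs(mix_precision).items():
    print(tensor, settings["datatype"])
```

Only the bias resolves to float16. The input, weights, and activation keep the global bfloat16 setting.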

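Layer configuration:

Layer quantization configurations must be added to the layer_quantize_config list. A layer is selected through one of two parameters: the layer type or the layer name. Keep the following five points in mind when configuring layers:

Each layer configuration must be in dictionary format.
In each layer configuration, the "quantizable_data_type" and "overall_quantize_config" parameters are required. The "overall_quantize_config" parameter must contain all quantization parameters for the layer.
If the setting is based on layer type, the "layer_name" parameter must be null.
If the setting is based on layer name, you must first run model calibration. After calibration, take the required layer name from the Python file generated in the quantize_result directory. Also make sure that the "layer_type" parameter is null.
As with the model configuration, the quantization configuration of individual tensors in a layer can be customized. Specify these per-tensor configurations under the tensor_quantize_config key.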
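The structural rules in this list can be checked before a configuration file is handed to the quantizer. The following is a minimal sketch with a hypothetical check_layer_config helper. It covers only the rules stated above and does not verify that "overall_quantize_config" is complete.

```python
REQUIRED_KEYS = ("quantizable_data_type", "overall_quantize_config")


def check_layer_config(entry):
    """Return a list of problems found in one layer_quantize_config entry."""
    if not isinstance(entry, dict):
        return ["layer configuration must be a dictionary"]
    problems = [f"missing required key: {key}"
                for key in REQUIRED_KEYS if key not in entry]
    by_type = entry.get("layer_type") is not None
    by_name = entry.get("layer_name") is not None
    # Exactly one selector may be set; the other must be null (None).
    if by_type == by_name:
        problems.append("set exactly one of layer_type and layer_name")
    return problems


entry = {
    "layer_type": "torch.nn.Conv2d",
    "layer_name": None,
    "quantizable_data_type": ["weights", "bias", "activation"],
    "overall_quantize_config": {"bit_width": 8, "method": "maxmin"},
}
print(check_layer_config(entry))
```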

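The example configuration file contains two layer configurations: one based on layer type and one based on layer name. In the layer configuration based on layer type, torch.nn.Conv2d layers are assigned specific quantization parameters: the "per_channel" parameter of the weights is set to true, and the "method" parameter of the activation is set to "entropy":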

{
  "layer_type": "torch.nn.Conv2d",
  "layer_name": null,
  "quantizable_data_type": [
    "weights",
    "bias",
    "activation"],
  "overall_quantize_config": {
    "bit_width": 8,
    "method": "maxmin",
    "round_mode": "half_even",
    "symmetry": true,
    "per_channel": false,
    "signed": true,
    "narrow_range": false,
    "scale_type": "float",
    "calib_statistic_method": "max"
  },
  "tensor_quantize_config": {
    "weights": {
      "per_channel": true
    },
    "activation": {
      "method": "entropy"
    }
  }
}

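In the layer configuration based on layer name, the layer named "ResNet::ResNet/Conv2d[conv1]/input.2" is assigned specific quantization parameters. The "round_mode" of the activation in this layer is set to "half_up":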

{
  "layer_type": null,
  "layer_name": "ResNet::ResNet/Conv2d[conv1]/input.2",
  "quantizable_data_type": [
    "weights",
    "bias",
    "activation"],
  "overall_quantize_config": {
    "bit_width": 8,
    "method": "maxmin",
    "round_mode": "half_even",
    "symmetry": true,
    "per_channel": false,
    "signed": true,
    "narrow_range": false,
    "scale_type": "float",
    "calib_statistic_method": "max"
  },
  "tensor_quantize_config": {
    "activation": {
      "round_mode": "half_up"
    }
  }
}

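The layer name ResNet::ResNet/Conv2d[conv1]/input.2 is taken from the quantize_result/ResNet.py file generated by the code example example/resnet18_quant.py:

Run the code example with the command python resnet18_quant.py --subset_len 100. This generates the quantize_result/ResNet.py file.
In this file, the first convolution layer is named "ResNet::ResNet/Conv2d[conv1]/input.2".
To assign a specific configuration to this layer, copy the layer name into the quantization configuration file.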

import torch
import pytorch_nndct as py_nndct
class ResNet(torch.nn.Module):
  def __init__(self):
    super(ResNet, self).__init__()
    self.module_0 = py_nndct.nn.Input() #ResNet::input_0
    self.module_1 = py_nndct.nn.Conv2d(in_channels=3, out_channels=64, kernel_size=[7, 7], stride=[2, 2], padding=[3, 3], dilation=[1, 1], groups= 1, bias=True) #ResNet::ResNet/Conv2d[conv1]/input.2
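Copying names by hand is error-prone when a model has many layers. As a sketch, the trailing comments in the generated file can be collected with a regular expression. The layer_names helper and its pattern are hypothetical and assume every layer line follows the "self.<attr> = py_nndct.nn.<Op>(...) #<name>" shape of the excerpt above.

```python
import re

# Matches lines of the form: self.<attr> = py_nndct.nn.<Op>(...) #<layer name>
LAYER_LINE = re.compile(
    r"self\.(\w+)\s*=\s*py_nndct\.nn\.(\w+)\(.*\)\s*#\s*(\S+)\s*$")


def layer_names(source):
    """Return (attribute, op type, layer name) tuples found in generated code."""
    found = []
    for line in source.splitlines():
        match = LAYER_LINE.search(line)
        if match:
            found.append(match.groups())
    return found


# Shortened lines in the style of the generated file shown above.
sample = """
    self.module_0 = py_nndct.nn.Input() #ResNet::input_0
    self.module_1 = py_nndct.nn.Conv2d(in_channels=3, out_channels=64) #ResNet::ResNet/Conv2d[conv1]/input.2
"""
for attr, op, name in layer_names(sample):
    print(op, name)
```

To apply this to a real run, read the contents of quantize_result/ResNet.py and pass the text to layer_names.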

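Configuration Restrictions

Because of DPU design constraints, the quantization configuration must satisfy the following restrictions when you use integer quantization and deploy the quantized model on the DPU: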

method: diffs or maxmin
round_mode: std_round for weights, bias, and input; half_up for activation
symmetry: true
per_channel: false
signed: true
narrow_range: true
scale_type: poweroftwo
calib_statistic_method: modal
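These restrictions can be written down as a small lookup table and checked per tensor type. The sketch below transcribes the list above into a hypothetical dpu_violations helper. It is an illustration only and does not reproduce any check performed by the quantization tool itself.

```python
# Allowed values per key, transcribed from the restriction list above.
DPU_ALLOWED = {
    "method": {"diffs", "maxmin"},
    "symmetry": {True},
    "per_channel": {False},
    "signed": {True},
    "narrow_range": {True},
    "scale_type": {"poweroftwo"},
    "calib_statistic_method": {"modal"},
}
# Required rounding mode depends on the tensor type.
DPU_ROUND_MODE = {"weights": "std_round", "bias": "std_round",
                  "input": "std_round", "activation": "half_up"}


def dpu_violations(tensor, settings):
    """List the settings of one tensor type that break the DPU restrictions."""
    problems = [
        f"{key}={settings[key]!r} is not allowed on DPU"
        for key, allowed in DPU_ALLOWED.items()
        if key in settings and settings[key] not in allowed
    ]
    expected = DPU_ROUND_MODE.get(tensor)
    if expected and settings.get("round_mode", expected) != expected:
        problems.append(f"round_mode for {tensor} must be {expected}")
    return problems


weights = {"method": "diffs", "round_mode": "std_round", "symmetry": True,
           "per_channel": False, "signed": True, "narrow_range": True,
           "scale_type": "float", "calib_statistic_method": "modal"}
print(dpu_violations("weights", weights))
```

Keys that are absent from the settings are skipped rather than reported, so the check only flags values that are explicitly set.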

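No comparable restrictions apply to CPU and GPU devices. However, some combinations of settings conflict. For example, the calibration methods "maxmin", "percentile", and "entropy" do not support the calibration statistic method "modal". In addition, the calibration methods "mse" and "entropy" are not supported when symmetry is set to false (asymmetric quantization). If a configuration conflict occurs, the quantization tool reports an error message to notify the user.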
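The two example conflicts named above for CPU and GPU targets can be checked ahead of time. The sketch below uses a hypothetical config_conflicts helper that encodes only those two rules. The tool may reject other combinations, and its actual error messages are not reproduced here.

```python
def config_conflicts(settings):
    """Return the example conflicts from the text that apply to these settings."""
    problems = []
    method = settings.get("method")
    # Rule 1: maxmin, percentile, and entropy do not support "modal".
    if (settings.get("calib_statistic_method") == "modal"
            and method in {"maxmin", "percentile", "entropy"}):
        problems.append(f"method {method!r} does not support "
                        "calib_statistic_method 'modal'")
    # Rule 2: mse and entropy require symmetric quantization.
    if settings.get("symmetry") is False and method in {"mse", "entropy"}:
        problems.append(f"method {method!r} requires symmetry to be true")
    return problems


print(config_conflicts({"method": "entropy", "symmetry": False,
                        "calib_statistic_method": "modal"}))
print(config_conflicts({"method": "maxmin", "symmetry": True,
                        "calib_statistic_method": "max"}))
```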