Generally, quantization can cause a slight loss of accuracy, but for certain networks like MobileNets, the accuracy loss can be more significant. In such cases, it is recommended to try fast fine-tuning first. If fast fine-tuning fails to produce satisfactory results, Quantization-Aware Training (QAT) can further enhance the accuracy of the quantized model.
However, the model must meet specific requirements to be trained with the QAT APIs. Every operation to be quantized must be an instance of `torch.nn.Module` rather than a Torch function or a Python operator. For instance, using `+` to add two tensors is common in PyTorch but is not supported in QAT; replace `+` with `pytorch_nndct.nn.modules.functional.Add`. The operations that require replacement are listed in the following table.
| Operation | Replacement |
|---|---|
| `+` | `pytorch_nndct.nn.modules.functional.Add` |
| `-` | `pytorch_nndct.nn.modules.functional.Sub` |
| `torch.add` | `pytorch_nndct.nn.modules.functional.Add` |
| `torch.sub` | `pytorch_nndct.nn.modules.functional.Sub` |
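The following is a minimal sketch of the replacement pattern for a skip connection. The class, layer, and variable names are illustrative only; the pattern of instantiating the op module in `__init__` and calling it in `forward` is assumed from the `torch.nn.Module` requirement described above.

```python
import torch
from torch import nn
from pytorch_nndct.nn.modules import functional


class ResidualBlock(nn.Module):  # hypothetical example module
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # Replace the Python '+' operator with a module instance
        # so the quantizer can observe and quantize the addition.
        self.skip_add = functional.Add()

    def forward(self, x):
        out = self.relu(self.conv(x))
        # Was: out = out + x   (not supported in QAT)
        out = self.skip_add(out, x)
        return out
```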
Insert `pytorch_nndct.nn.QuantStub` and `pytorch_nndct.nn.DeQuantStub` at the beginning and end of the network to be quantized. The network can be a complete network or a partial sub-network.
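The sketch below shows one way the stubs can be placed, assuming the common pattern of applying the quant stub to the inputs and the dequant stub to the outputs inside `forward`. The wrapper class and attribute names are hypothetical.

```python
from torch import nn
from pytorch_nndct.nn import QuantStub, DeQuantStub


class QuantizableModel(nn.Module):  # hypothetical wrapper module
    def __init__(self, backbone):
        super().__init__()
        # Mark the start and end of the region to be quantized.
        self.quant_stub = QuantStub()
        self.backbone = backbone          # complete network or sub-network
        self.dequant_stub = DeQuantStub()

    def forward(self, x):
        x = self.quant_stub(x)    # entering the quantized region
        x = self.backbone(x)
        x = self.dequant_stub(x)  # leaving the quantized region
        return x
```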