The following are some tips to improve training results:
- If possible, load pre-trained floating-point weights as initial values to start the quantization aware training, as sketched below. Training from scratch with random initial values is possible, but it makes training longer and harder to converge.
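  The following is a minimal sketch of loading pre-trained float weights before handing the model to the QAT flow; the model architecture and the checkpoint path (`resnet18_float.pth`) are assumptions for illustration only.

  ```python
  import torch
  from torchvision.models import resnet18

  # Illustrative only: use your own float model and checkpoint path.
  float_model = resnet18()
  state_dict = torch.load('resnet18_float.pth', map_location='cpu')
  float_model.load_state_dict(state_dict)
  # float_model is then passed to the QAT processor as usual.
  ```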
- If pre-trained floating-point weights are loaded, use different initial learning rates and learning-rate decrease strategies for the network parameters and the quantizer parameters. In general, the learning rate for the network parameters should be small, while the learning rate for the quantizer parameters should be larger, for example:
  ```python
  import torch

  # `qat_processor` is the QAT processor created earlier in the flow.
  model = qat_processor.trainable_model()

  # Separate parameter groups: a larger learning rate for the quantizer
  # parameters and a smaller one for the network (weight) parameters.
  param_groups = [{
      'params': model.quantizer_parameters(),
      'lr': 1e-2,
      'name': 'quantizer'
  }, {
      'params': model.non_quantizer_parameters(),
      'lr': 1e-5,
      'name': 'weight'
  }]
  optimizer = torch.optim.Adam(param_groups)
  ```
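  The learning-rate decrease strategy can likewise differ per group. Below is a minimal sketch using `torch.optim.lr_scheduler.LambdaLR` with one decay function per parameter group; the decay factors are illustrative assumptions, not values from this guide.

  ```python
  from torch.optim.lr_scheduler import LambdaLR

  # One lambda per param group, in the same order as param_groups.
  # The quantizer group decays faster than the weight group here
  # purely for illustration.
  scheduler = LambdaLR(optimizer, lr_lambda=[
      lambda epoch: 0.9 ** epoch,   # quantizer parameters
      lambda epoch: 0.95 ** epoch,  # network (weight) parameters
  ])

  # Call scheduler.step() once per epoch after the optimizer steps.
  ```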
- For the choice of optimizer, avoid torch.optim.SGD, as it can prevent the training from converging. AMD recommends torch.optim.Adam or torch.optim.RMSprop and their variants.
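  If Adam does not suit the training setup, RMSprop can be swapped in with the same parameter groups, for example:

  ```python
  # Alternative to Adam, reusing the param_groups defined above.
  optimizer = torch.optim.RMSprop(param_groups)
  ```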