Most neural networks are over-parameterized, carrying significant redundancy beyond what is needed to achieve a given accuracy. Pruning is the process of eliminating these redundant weights while keeping the accuracy loss as low as possible.
The simplest form, fine-grained pruning, removes individual weights and results in sparse weight matrices. The Vitis™ AI pruner instead employs coarse-grained pruning, which eliminates neurons that do not contribute significantly to the network's accuracy. For convolutional layers, the coarse-grained method prunes entire 3D kernels, so it is also called channel pruning. Because the pruned model remains dense, inference acceleration is achieved without specialized sparse-computation hardware. Pruning always reduces the accuracy of the original model, so retraining (fine-tuning) adjusts the remaining weights to recover it.
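For illustration, the following is a minimal PyTorch sketch of channel pruning for a single convolutional layer, using a simple L1-norm criterion to decide which 3D kernels to keep. The criterion and the standalone-layer setting are simplifying assumptions; the Vitis AI pruner applies its own analysis to select channels, and in a full network the input channels of downstream layers must be pruned to match.

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Keep the output channels whose entire 3D kernels have the largest
    L1 norms. Illustrative only; not the Vitis AI selection criterion."""
    # Score each output channel by the L1 norm of its 3D kernel
    # (weight shape: out_channels x in_channels x kH x kW).
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.topk(scores, n_keep).indices.sort().values

    # Build a thinner layer and copy over the surviving kernels.
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
print(prune_conv_channels(conv, keep_ratio=0.5))  # Conv2d(64, 64, ...)
```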
Coarse-grained pruning works well on large models built from standard convolutions, such as ResNet and VGGNet. For models based on depthwise convolutions, such as MobileNet-v2, however, the accuracy of the pruned model drops dramatically even at a small pruning rate.
The Vitis AI pruner introduces a one-shot neural architecture search (NAS) based approach to solve this problem. The method consists of four steps (a toy sketch of the underlying supernet idea follows the list):
- Training
- Searching
- Pruning
- Fine-tuning (optional)
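These steps revolve around a supernet whose weights are shared by every candidate subnetwork. As a rough illustration of that idea, the toy PyTorch layer below can execute with any prefix of its output channels, so a single weight tensor serves many candidate widths. This is only an assumed, width-only search space; the actual Vitis AI supernet and implementation are richer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableConv2d(nn.Conv2d):
    """A convolution that can run with any prefix of its output channels.

    Toy supernet layer: one weight tensor serves every candidate width,
    so training any candidate updates the shared supernet weights.
    """
    def __init__(self, in_channels, max_out_channels, kernel_size, **kwargs):
        super().__init__(in_channels, max_out_channels, kernel_size, **kwargs)
        self.active_out = max_out_channels  # width of the current candidate

    def forward(self, x):
        # Slice the shared weights down to the active candidate's width.
        w = self.weight[: self.active_out]
        b = self.bias[: self.active_out] if self.bias is not None else None
        return F.conv2d(x, w, b, self.stride, self.padding,
                        self.dilation, self.groups)

conv = SlimmableConv2d(3, 64, 3, padding=1)
x = torch.randn(1, 3, 32, 32)
conv.active_out = 32              # run a thinner candidate subnetwork
print(conv(x).shape)              # torch.Size([1, 32, 32, 32])
```

Prefix slicing of this kind is a common device in slimmable-network implementations: because every candidate reads from the same tensor, there is no separate model per candidate to store or train.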
Compared with coarse-grained pruning, one-shot NAS jointly optimizes all candidate subnetworks using supervised training. Because every candidate subnetwork is well trained, the potential of each candidate can be evaluated directly, and the subnetwork with the best FLOPs-accuracy trade-off can then be searched for. The one-shot NAS method is effective at compressing models with both depthwise convolutions and standard convolutions, but it requires a long training time and some training expertise.
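Continuing the toy example above, the following sketch shows joint supervised training over randomly sampled candidates, followed by a search for the best FLOPs-accuracy trade-off under a budget. The width list, the crude FLOPs proxy, and the stand-in data and loss are illustrative assumptions, not the Vitis AI implementation.

```python
import random
import torch

WIDTHS = [16, 32, 48, 64]                 # assumed candidate search space

def train_supernet(conv, steps=1000):
    # Joint optimization: each step trains a randomly sampled candidate,
    # so all subnetworks share and update the same supernet weights.
    opt = torch.optim.SGD(conv.parameters(), lr=0.01)
    for _ in range(steps):
        conv.active_out = random.choice(WIDTHS)
        x = torch.randn(8, 3, 32, 32)     # stand-in for real training data
        loss = conv(x).pow(2).mean()      # stand-in for a supervised loss
        opt.zero_grad()
        loss.backward()
        opt.step()

def search_best(conv, eval_fn, flops_budget=0.75):
    # All candidates are already trained, so each can be evaluated
    # directly; keep the most accurate one whose FLOPs fit the budget.
    best = None
    for width in WIDTHS:
        conv.active_out = width
        flops = width / max(WIDTHS)       # crude proxy: wider = costlier
        if flops <= flops_budget:
            acc = eval_fn(conv)           # user-supplied validation metric
            if best is None or acc > best[1]:
                best = (width, acc)
    return best                           # (width, accuracy) to prune toward
```

The winning width would then be extracted as a standalone model (the pruning step) and, optionally, fine-tuned briefly to recover any residual accuracy loss.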