Neural networks are typically over-parameterized, carrying significant redundancy in their weights. Pruning is the process of eliminating redundant weights while keeping the loss in accuracy as low as possible.
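As a concrete illustration of "eliminating redundant weights", here is a minimal sketch of one common approach, magnitude-based pruning: the smallest-magnitude weights are assumed to be the redundant ones and are zeroed out. The function name, the NumPy implementation, and the 50% sparsity target are illustrative choices, not a prescribed method.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = magnitude of the k-th smallest weight (flattened).
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune roughly 50% of a random weight matrix.
w = np.random.randn(4, 4)
w_pruned = magnitude_prune(w, sparsity=0.5)
print(f"sparsity achieved: {np.mean(w_pruned == 0):.2f}")
```

In practice the pruned network is usually fine-tuned afterwards to recover the accuracy lost by zeroing weights.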
Figure 1. Coarse Pruning and Fine Pruning
Industry research has produced several techniques that reduce the computational cost of neural network inference. These techniques include (the first two are sketched in code after the list):
- Fine-grained pruning
- Coarse-grained pruning
- Neural Architecture Search (NAS)
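The sketch below contrasts the two pruning granularities. Fine-grained (unstructured) pruning masks individual weights, while coarse-grained (structured) pruning removes whole units, here entire output channels of a 2-D weight matrix. The function names, the L2-norm channel criterion, and the 50% ratio are assumptions made for illustration, not a specific library's API.

```python
import numpy as np

def fine_grained_mask(w: np.ndarray, ratio: float) -> np.ndarray:
    """Unstructured: mask individual weights by magnitude."""
    k = int(w.size * ratio)
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return (np.abs(w) > threshold).astype(w.dtype)

def coarse_grained_mask(w: np.ndarray, ratio: float) -> np.ndarray:
    """Structured: mask whole output channels (rows) by their L2 norm."""
    norms = np.linalg.norm(w, axis=1)
    k = int(len(norms) * ratio)
    threshold = np.sort(norms)[k - 1]
    keep_rows = norms > threshold  # shape: (out_channels,)
    # Broadcast the per-channel decision across all weights in each row.
    return np.repeat(keep_rows[:, None], w.shape[1], axis=1).astype(w.dtype)

# Both masks remove ~50% of the weights, but the coarse-grained mask does so
# in hardware-friendly blocks (entire rows/channels).
w = np.random.randn(8, 16)
print("fine-grained zeros:  ", int((fine_grained_mask(w, 0.5) == 0).sum()))
print("coarse-grained zeros:", int((coarse_grained_mask(w, 0.5) == 0).sum()))
```

The trade-off this illustrates: fine-grained pruning can reach higher sparsity at the same accuracy, but the resulting irregular sparsity is harder to exploit on standard hardware, whereas coarse-grained pruning yields dense, smaller tensors that speed up inference directly.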