The pruner is designed to reduce the number of model parameters while minimizing the accuracy loss. This is done iteratively, as shown in the following figure. Pruning causes an accuracy loss, and retraining recovers it; one pass of pruning followed by retraining forms one iteration. In the first iteration, the input model is the baseline model, which is pruned and fine-tuned. In each subsequent iteration, the fine-tuned model from the previous iteration becomes the new baseline. This process is usually repeated several times until the desired sparse model is obtained. The iterative approach is necessary because a model cannot be pruned in a single pass without a substantial drop in accuracy: removing too many parameters at once causes an abrupt, step-like loss that is difficult to recover.
By pruning iteratively, higher pruning rates can be achieved without significant loss of model performance.
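To make the loop concrete, the following is a minimal sketch of iterative pruning written in generic PyTorch with `torch.nn.utils.prune`; it illustrates the general technique and is not the Vitis AI pruner's API. The `fine_tune` helper, the iteration count, and the per-step pruning fraction are all assumptions.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def iterative_prune(model, fine_tune, iterations=4, per_step=0.2):
    """Sketch of the prune/fine-tune loop.

    `fine_tune(model)` is an assumed helper that retrains the model for a
    few epochs; `per_step` is the fraction of the still-unpruned weights
    removed in each round.
    """
    # Collect the prunable (module, parameter) pairs once up front.
    prunable = [(m, "weight") for m in model.modules()
                if isinstance(m, (nn.Conv2d, nn.Linear))]
    for _ in range(iterations):
        # Prune a modest fraction of the remaining weights, ranked
        # globally by L1 magnitude.
        prune.global_unstructured(prunable,
                                  pruning_method=prune.L1Unstructured,
                                  amount=per_step)
        # Retrain so the accuracy lost in this pruning pass is recovered
        # before the next, more aggressive round.
        fine_tune(model)
    return model
```

Because each round removes only a small fraction of the remaining weights and is followed by retraining, the accuracy loss per iteration stays small enough to recover, which is exactly why the iterative approach outperforms a single aggressive pass.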
The four primary stages in iterative pruning are as follows:
- Analysis: Perform a sensitivity analysis on the model to determine the optimal pruning strategy.
- Pruning: Reduce the number of computations in the input model.
- Fine-tuning: Retrain the pruned model to recover accuracy.
- Transformation: Generate a dense model with reduced weights.
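The analysis stage can be illustrated with a simple per-layer sensitivity sweep: prune one layer at a time at increasing ratios and record the accuracy drop. The sketch below is again generic PyTorch rather than the Vitis AI implementation; `evaluate` is an assumed helper that returns validation accuracy, and the probe ratios are arbitrary.

```python
import copy
import torch.nn as nn
import torch.nn.utils.prune as prune

def sensitivity_analysis(model, evaluate, ratios=(0.1, 0.3, 0.5, 0.7)):
    """Measure the accuracy drop caused by pruning each layer in isolation.

    `evaluate(model) -> float` is an assumed helper returning validation
    accuracy; `ratios` are the sparsity levels probed per layer.
    """
    baseline = evaluate(model)
    results = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            results[name] = {}
            for ratio in ratios:
                trial = copy.deepcopy(model)  # leave the original intact
                layer = dict(trial.named_modules())[name]
                prune.l1_unstructured(layer, name="weight", amount=ratio)
                results[name][ratio] = baseline - evaluate(trial)
    return results  # layers with small drops tolerate aggressive pruning
```

Layers whose accuracy barely drops at high ratios are good candidates for heavy pruning, while sensitive layers should be pruned lightly or left alone; this per-layer budget is what the analysis stage feeds into the pruning strategy.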
Follow these steps to prune a model. The steps are also shown in the following figure.
1. Analyze the original baseline model.
2. Prune the model.
3. Fine-tune the pruned model.
4. Repeat steps 2 and 3 several times.
5. Transform the pruned sparse model into a final dense, encrypted model to be used with the Vitis AI Quantizer.
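As an analogue of the final transformation step, the sketch below shows how generic PyTorch folds the pruning masks back into ordinary dense weight tensors via `prune.remove`. This is only an illustration of the sparse-to-dense idea; the actual Vitis AI transform additionally produces the encrypted model that the Vitis AI Quantizer consumes.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def transform_to_dense(model):
    """Fold pruning masks into the weight tensors (a step 5 analogue).

    Afterward, the model holds plain dense tensors whose pruned entries
    are zero, with no pruning reparameterizations left attached.
    """
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)) and prune.is_pruned(module):
            prune.remove(module, "weight")
    return model
```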
Guidelines for Better Pruning Results
The following suggestions can help optimize pruning results. Developers who follow these guidelines typically achieve higher pruning ratios with less accuracy loss.
- Use as much data as possible to perform the model analysis. Ideally, use the entire validation dataset, although this can be time consuming. A subset of the validation data is acceptable, but make sure it contains at least half of the full dataset (a minimal subsampling sketch follows this list).
- During the fine-tuning stage, experiment with a few hyperparameters, such as the initial learning rate and the learning rate decay policy, and use the best result as the input to the next round of pruning.
- The data used in fine-tuning should be a subset of the original dataset used to train the baseline model.
- If the accuracy does not improve sufficiently after conducting several fine-tuning experiments, try reducing the pruning rate parameter and then re-run pruning and fine-tuning.
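As an example of the first guideline, here is one hypothetical way to build an analysis data loader over a random half of the validation set using standard PyTorch utilities; the function name and defaults are illustrative.

```python
import torch
from torch.utils.data import Subset, DataLoader

def half_validation_loader(val_dataset, batch_size=64, seed=0):
    """Build a loader over a random half of the validation set, the
    minimum recommended amount of data for the analysis step."""
    g = torch.Generator().manual_seed(seed)  # reproducible subsample
    idx = torch.randperm(len(val_dataset), generator=g)[: len(val_dataset) // 2]
    return DataLoader(Subset(val_dataset, idx.tolist()), batch_size=batch_size)
```

Fixing the random seed keeps the subsample identical across pruning iterations, so accuracy measurements from different rounds of analysis remain comparable.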