Iterative Pruning
The method includes two stages: model analysis and pruned model generation. After model analysis completes, the result is saved in a file named .vai/xxx.sens, which you can then use to prune the model iteratively. Iterative pruning brings the model to the target sparsity gradually: each iteration of the loop performs a pruning step followed by a fine-tuning step, using a modest pruning ratio per step. Setting the pruning ratio too high in a single step causes a steep loss in accuracy that may not be recoverable.
- Define an evaluation function. The function must take a model as its first argument and return a score.
import torch

# AverageMeter and accuracy are helper utilities that you supply
# (for example, helpers like those in the PyTorch ImageNet example).
def eval_fn(model, dataloader):
    top1 = AverageMeter('Acc@1', ':6.2f')
    model.eval()
    with torch.no_grad():
        for i, (images, targets) in enumerate(dataloader):
            images = images.cuda()
            targets = targets.cuda()
            outputs = model(images)
            acc1, _ = accuracy(outputs, targets, topk=(1, 5))
            top1.update(acc1[0], images.size(0))
    return top1.avg
- Run model analysis and get a pruned model.
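# Run model analysis; the result is saved for later pruning steps.
runner.ana(eval_fn, args=(val_loader,))
# Generate a pruned model for the requested pruning ratio.
model = runner.prune(removal_ratio=0.2)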
Model analysis only needs to be done once. You can prune the model iteratively without re-running the analysis because exactly one pruned model is generated for a given pruning ratio. However, this unique pruned model is derived from the analysis result by an approximate algorithm, so the resulting subnetwork may not be very good. The one-step pruning method can generate a better subnetwork.
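The iterative loop described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's own implementation: ratio_schedule and iterative_prune are hypothetical helpers, prune_fn stands in for a call such as runner.prune(removal_ratio=ratio), and finetune_fn and eval_fn are callables that you supply. The sketch also assumes that each ratio is measured against the original (unpruned) model and that prune_fn starts from the weights fine-tuned in the previous iteration.

```python
def ratio_schedule(target_ratio, step):
    """Return increasing pruning ratios that end exactly at target_ratio."""
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be between 0 and 1")
    if step <= 0.0:
        raise ValueError("step must be positive")
    ratios = []
    k = 1
    # Rounding avoids float artifacts such as 0.6000000000000001.
    while round(k * step, 6) < target_ratio:
        ratios.append(round(k * step, 6))
        k += 1
    ratios.append(target_ratio)
    return ratios


def iterative_prune(prune_fn, finetune_fn, eval_fn, target_ratio, step):
    """Alternate pruning and fine-tuning until target_ratio is reached.

    Returns the final model and a list of (ratio, score) pairs.
    """
    history = []
    model = None
    for ratio in ratio_schedule(target_ratio, step):
        # Hypothetical stand-in for runner.prune(removal_ratio=ratio);
        # assumed to reuse the weights fine-tuned in the previous iteration.
        model = prune_fn(ratio)
        # Recover accuracy before pruning further.
        finetune_fn(model)
        history.append((ratio, eval_fn(model)))
    return model, history
```

For example, ratio_schedule(0.7, 0.2) yields the ratios 0.2, 0.4, 0.6, and 0.7. Inspecting the returned history makes it easy to spot the step at which accuracy starts to fall too sharply, which is a signal to use a smaller step.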
One-step Pruning
The method also includes two stages: an adaptive-BN-based search for a pruning strategy and pruned model generation. After the search, a file named .vai/xxx.search is generated that stores the search result (the pruning strategies and their corresponding evaluation scores). You can then get the final pruned model in one step.
num_subnet provides the target number of candidate subnetworks satisfying the sparsity requirement to be identified. The best subnetwork can be selected from these candidates. The higher the value, the longer it takes to search, but the higher the probability of finding a better subnetwork.
# Adaptive-BN-based searching for pruning strategy. 'calibration_fn' is a function for calibrating BN layer's statistics.
runner.search(gpus=['0'], calibration_fn=calibration_fn, calib_args=(val_loader,), eval_fn=eval_fn, eval_args=(val_loader,), num_subnet=1000, removal_ratio=0.7)
model = runner.prune(removal_ratio=0.7, index=None)
The eval_fn is the same as in the iterative pruning method. A calibration_fn function that implements adaptive-BN is shown in the following example code. You should define your code similarly.
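def calibration_fn(model, dataloader, number_forward=100):
    # Training mode lets the BN layers update their running statistics.
    model.train()
    # No gradients are needed: only the BN statistics change.
    with torch.no_grad():
        for index, (images, target) in enumerate(dataloader):
            # Stop after exactly number_forward forward passes.
            if index >= number_forward:
                break
            images = images.cuda()
            model(images)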
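To see why calibration_fn runs forward passes in training mode with gradients disabled, it may help to look at what a BN layer does to its running statistics on each pass. The following is a toy sketch in plain Python, not part of the pruning tool: update_running_stats and calibrate are hypothetical names, a batch is just a list of activation values for one channel, and the update rule is assumed to match PyTorch's default BatchNorm behavior (momentum 0.1, unbiased batch variance for the running variance).

```python
import statistics


def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    """Apply one BN-style moving-average update from a batch of activations."""
    batch_mean = statistics.mean(batch)
    batch_var = statistics.variance(batch)  # unbiased sample variance
    new_mean = (1.0 - momentum) * running_mean + momentum * batch_mean
    new_var = (1.0 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var


def calibrate(batches, running_mean, running_var, number_forward=100):
    """Refresh stale running statistics using at most number_forward batches."""
    for index, batch in enumerate(batches):
        if index >= number_forward:
            break
        running_mean, running_var = update_running_stats(
            running_mean, running_var, batch
        )
    return running_mean, running_var
```

Starting from stale values such as a mean of 0.0 and a variance of 1.0, repeated updates pull the running statistics toward those of the calibration data. Each update retains 90% of the previous value, so after 100 passes the influence of the stale values has shrunk by a factor of roughly 0.9^100, which suggests why a limited number of forward passes can be enough.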
The one-step pruning method has several advantages over the iterative approach:
- The generated pruned models are typically more accurate, because all subnetworks that meet the requirements are evaluated.
- The workflow is simpler because you can obtain the final pruned model in one step without iterations.
- Retraining a slim model is faster than retraining a sparse model.
One-step pruning has two disadvantages. First, the random generation of pruning strategies is unpredictable. Second, the subnetwork search must be performed once for every pruning ratio.
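The trade-off behind num_subnet and the unpredictability of random strategy generation can be illustrated with a toy random search. This is only a sketch and not the tool's search algorithm: toy_search is a hypothetical function, and the random score is a placeholder for calibrating a candidate's BN statistics and evaluating it with eval_fn.

```python
import random


def toy_search(num_subnet, seed=0):
    """Score num_subnet random candidates and return (index, score) of the best.

    rng.random() is a placeholder for BN calibration followed by eval_fn;
    a real search would evaluate an actual pruned candidate here.
    """
    if num_subnet < 1:
        raise ValueError("num_subnet must be at least 1")
    rng = random.Random(seed)
    scores = [rng.random() for _ in range(num_subnet)]
    best_index = max(range(num_subnet), key=scores.__getitem__)
    return best_index, scores[best_index]
```

With the same seed, the first 10 candidates of a 1000-candidate run are identical to those of a 10-candidate run, so the larger search can only match or improve the best score, at the cost of evaluating more candidates. A different seed generally produces a different best candidate, which mirrors the unpredictability noted above.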