Pruning a Model - 2.5 English

Vitis AI Optimizer User Guide (UG1333)

Document ID: UG1333
Release Date: 2022-06-15
Version: 2.5 English

Iterative Pruning

The iterative method has two stages: model analysis and pruned model generation. After the model analysis completes, the analysis result is saved in a file named .vai/xxx.sens, and you can prune the model iteratively using this file. Prune gradually toward the target sparsity: if the pruning ratio is set too high in a single step, retraining may fail to recover the model's performance.
  1. Define an evaluation function. The function must take a model as its first argument and return a score.
    import torch

    def eval_fn(model, dataloader):
      # AverageMeter and accuracy are helpers assumed to be defined elsewhere.
      top1 = AverageMeter('Acc@1', ':6.2f')
      model.eval()
      with torch.no_grad():
        for i, (images, targets) in enumerate(dataloader):
          images = images.cuda()
          targets = targets.cuda()
          outputs = model(images)
          acc1, _ = accuracy(outputs, targets, topk=(1, 5))
          top1.update(acc1[0], images.size(0))
      # The average top-1 accuracy is the score returned to the runner.
      return top1.avg
  2. Run model analysis and get a pruned model.
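    # Analyze the model once; the result is saved to the .sens file.
    runner.ana(eval_fn, args=(val_loader,))

    # Generate a pruned model at the requested pruning ratio.
    model = runner.prune(removal_ratio=0.2)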
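The eval_fn in step 1 relies on two helpers that this page does not define, AverageMeter and accuracy. The following is a minimal sketch of what such helpers might look like, modeled on those commonly found in PyTorch image-classification training scripts. To stay self-contained it works on plain Python lists rather than tensors, and the name topk_accuracy is hypothetical; the helpers in your own project may differ.

```python
class AverageMeter:
    """Tracks a running, sample-weighted average of a metric (sketch)."""

    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        # 'val' is the mean metric over a batch of 'n' samples.
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {avg' + self.fmt + '}'
        return fmtstr.format(name=self.name, avg=self.avg)


def topk_accuracy(scores, targets, k=1):
    """Percentage of samples whose target is among the k highest scores.

    'scores' is a list of per-class score lists; 'targets' is a list of
    class indices. Hypothetical stand-in for a tensor-based accuracy helper.
    """
    correct = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:k]:
            correct += 1
    return 100.0 * correct / len(targets)
```

Weighting each update by the batch size keeps the average correct when the last batch is smaller than the others.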

Run the analysis only once for a given model. You can then prune the model iteratively without re-running the analysis, because a specific pruning ratio always yields the same pruned model. The resulting subnetwork may not be very good, because an approximate algorithm generates this single pruned model from the analysis result. The one-step pruning method can generate a better subnetwork.
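The gradual approach of the iterative method can be sketched as a loop over increasing pruning ratios. The pruning_schedule helper below is hypothetical and only computes the ratios; the commented-out lines indicate where the runner.prune call from step 2 and your own retraining routine (here called retrain, an assumed name) would go. How the retrained weights are carried from one iteration to the next depends on your workflow and is not shown.

```python
def pruning_schedule(target_ratio, step=0.1):
    """Return increasing pruning ratios that end exactly at target_ratio."""
    if not 0.0 < target_ratio < 1.0:
        raise ValueError('target_ratio must be between 0 and 1')
    if step <= 0.0:
        raise ValueError('step must be positive')
    ratios = []
    i = 1
    # Use integer multiples of 'step' to avoid accumulating float error.
    while round(i * step, 6) < target_ratio:
        ratios.append(round(i * step, 6))
        i += 1
    ratios.append(target_ratio)
    return ratios


for ratio in pruning_schedule(0.5, step=0.2):
    # model = runner.prune(removal_ratio=ratio)   # as in step 2 above
    # retrain(model)                               # hypothetical user routine
    print(f'prune to ratio {ratio}, then retrain')
```

A smaller step means more retraining rounds but a gentler accuracy drop per round; the step size itself is a tuning choice.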

One-step Pruning

The one-step method also has two stages: an adaptive-BN-based search for a pruning strategy, and pruned model generation. The search generates a file named .vai/xxx.search that stores the search result (the pruning strategies and their corresponding evaluation scores). You can then obtain the final pruned model in one step.

num_subnet sets the number of candidate subnetworks to search, each of which satisfies the sparsity requirement. The best subnetwork is selected from these candidates. A higher value lengthens the search but raises the probability of finding a better subnetwork.
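The trade-off controlled by num_subnet can be illustrated with a toy random search. This is not the Vitis AI implementation: the candidate generator and the scoring function below are hypothetical stand-ins (random per-layer ratios and a made-up score), and the sketch does not enforce a sparsity requirement. It only shows that the best score among the candidates cannot decrease as more candidates are examined, while the cost grows with their number.

```python
import random


def random_strategy(num_layers, rng):
    # Hypothetical candidate: one pruning ratio per layer.
    return [rng.uniform(0.0, 0.9) for _ in range(num_layers)]


def toy_score(strategy):
    # Made-up score standing in for eval_fn: prefers evenly spread ratios.
    mean = sum(strategy) / len(strategy)
    return -sum((r - mean) ** 2 for r in strategy)


def search(num_subnet, num_layers=8, seed=0):
    """Evaluate num_subnet random candidates; return the best (score, strategy)."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_subnet):
        candidate = random_strategy(num_layers, rng)
        score = toy_score(candidate)
        if best is None or score > best[0]:
            best = (score, candidate)
    return best


# With a fixed seed, the first 10 candidates are a prefix of the first 1000,
# so the larger search can only match or improve the best score.
small_best, _ = search(num_subnet=10)
large_best, _ = search(num_subnet=1000)
print(small_best <= large_best)
```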

# Adaptive-BN-based search for a pruning strategy.
# 'calibration_fn' is a function that calibrates the statistics of the BN layers.
runner.search(gpus=['0'], calibration_fn=calibration_fn, calib_args=(val_loader,),
              eval_fn=eval_fn, eval_args=(val_loader,),
              num_subnet=1000, removal_ratio=0.7)

model = runner.prune(removal_ratio=0.7, index=None)

The eval_fn is the same as the one used for the iterative pruning method. The following example shows a calibration_fn that implements adaptive-BN; define your own function along the same lines.

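def calibration_fn(model, dataloader, number_forward=100):
  # Train mode lets the BN layers update their running statistics.
  model.train()
  # No gradients are needed: only forward passes are run.
  with torch.no_grad():
    for index, (images, target) in enumerate(dataloader):
      # Stop after exactly number_forward forward passes.
      if index >= number_forward:
        break
      images = images.cuda()
      model(images)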
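Calling the model in train mode under torch.no_grad() leaves the weights untouched but lets each BN layer refresh its running mean and variance from the calibration batches. The sketch below mimics that update in plain Python for a single feature, assuming the exponential-moving-average rule and the default momentum of 0.1 used by PyTorch batch normalization layers. The function name update_running_stats and the sample batches are hypothetical.

```python
def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    """One BN-style running-statistics update for a single feature."""
    n = len(batch)
    batch_mean = sum(batch) / n
    # Unbiased variance (divide by n - 1) is used for the running estimate.
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / (n - 1)
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var


# Stale statistics (here the BN defaults of mean 0 and variance 1).
mean, var = 0.0, 1.0
calibration_batches = [[4.0, 6.0], [5.0, 5.5], [4.5, 6.5]]
for batch in calibration_batches:
    mean, var = update_running_stats(mean, var, batch)
print(mean, var)
```

Each batch moves the running statistics only a fraction of the way toward the batch statistics, which is why a number of forward passes is needed before they reflect the pruned network.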

The one-step pruning method has several advantages over the iterative approach.

  • The generated pruned models are more accurate, because every candidate subnetwork that meets the sparsity requirement is evaluated and the best one is selected.
  • The workflow is simpler because you can obtain the final pruned model in one step without iterations.
  • Retraining a slim model is faster than retraining a sparse model.
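The last point can be illustrated by counting the weights in a convolution layer. The sketch assumes that a sparse model keeps the original tensor shapes with pruned channels zeroed out, while a slim model physically removes those channels; the helper name and layer sizes are hypothetical. It also ignores that removing output channels shrinks the input of the following layer, which would reduce the slim model further.

```python
def conv_weight_count(in_channels, out_channels, kernel_size):
    """Number of weights in a 2D convolution (bias ignored)."""
    return in_channels * out_channels * kernel_size * kernel_size


in_ch, out_ch, k = 64, 128, 3
removal_ratio = 0.5
kept_out = int(out_ch * (1 - removal_ratio))

# Sparse: shape is unchanged, pruned channels are stored as zeros.
sparse_weights = conv_weight_count(in_ch, out_ch, k)
# Slim: pruned output channels are removed from the tensor.
slim_weights = conv_weight_count(in_ch, kept_out, k)

print(sparse_weights, slim_weights)
```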

One-step pruning has two disadvantages: the random generation of pruning strategies is unstable, and the subnetwork search must be performed once for every pruning ratio.