Pruning a Model - 3.5 English

Vitis AI User Guide (UG1414)

Document ID

UG1414

Release Date

2023-09-28

Version

3.5 English

Iterative Pruning

This method has two stages: model analysis and pruned model generation. After the model analysis completes, the result is saved in a file named .vai/xxx.sens, which you can use to prune the model iteratively. Iterative pruning brings the model to the target sparsity gradually, through a loop in which each iteration applies a modest pruning ratio and then fine-tunes the model. Setting the pruning ratio too high in a single step causes a steep loss in accuracy that cannot be recovered.
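The gradual approach can be sketched as a schedule of increasing pruning ratios driving a prune-then-fine-tune loop. This is a minimal sketch, not the library's implementation: pruning_schedule, iterative_prune, prune_step, and finetune_step are hypothetical names, and the sketch assumes each ratio is cumulative, that is, measured against the original model.

```python
def pruning_schedule(target_ratio, step):
    """Return cumulative pruning ratios that climb to target_ratio in increments of step."""
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be in (0, 1)")
    if step <= 0.0:
        raise ValueError("step must be positive")
    ratios = []
    k = 1
    # The small tolerance avoids a near-duplicate entry caused by float rounding.
    while k * step < target_ratio - 1e-9:
        ratios.append(round(k * step, 6))
        k += 1
    ratios.append(round(target_ratio, 6))
    return ratios


def iterative_prune(model, target_ratio, step, prune_step, finetune_step):
    """Alternate pruning and fine-tuning until target_ratio is reached.

    prune_step(model, ratio) and finetune_step(model) are hypothetical
    callables supplied by the caller; each returns the updated model.
    """
    for ratio in pruning_schedule(target_ratio, step):
        model = prune_step(model, ratio)
        model = finetune_step(model)
    return model


print(pruning_schedule(0.7, 0.2))  # [0.2, 0.4, 0.6, 0.7]
```

A smaller step means more iterations and more fine-tuning time, but each iteration removes less of the model, so accuracy is easier to recover.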

Define an evaluation function. The function must take a model as its first argument and return a score.

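import torch

# AverageMeter and accuracy are helper utilities that you must provide,
# for example the ones defined in the PyTorch ImageNet example.
def eval_fn(model, dataloader):
  top1 = AverageMeter('Acc@1', ':6.2f')
  model.eval()
  with torch.no_grad():
    for i, (images, targets) in enumerate(dataloader):
      images = images.cuda()
      targets = targets.cuda()
      outputs = model(images)
      # accuracy returns one value per entry in topk; only top-1 is tracked.
      acc1, _ = accuracy(outputs, targets, topk=(1, 5))
      top1.update(acc1[0], images.size(0))
  return top1.avg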
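The evaluation function above uses an AverageMeter helper that is not defined in this section. The following is a minimal sketch of such a helper, modeled on the class of the same name in the PyTorch ImageNet example. It assumes eval_fn only needs update() and the avg attribute; it is an illustration, not part of the pruning API.

```python
class AverageMeter:
    """Track the most recent value and a sample-weighted running average."""

    def __init__(self, name, fmt=":f"):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        # val is the mean score over a batch of n samples, so it is
        # weighted by n when accumulated into the running average.
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        template = "{name} {val" + self.fmt + "} ({avg" + self.fmt + "})"
        return template.format(name=self.name, val=self.val, avg=self.avg)


top1 = AverageMeter("Acc@1", ":6.2f")
top1.update(50.0, n=2)   # two samples at 50% accuracy
top1.update(100.0, n=2)  # two samples at 100% accuracy
print(top1.avg)          # 75.0
```

Weighting by the batch size keeps the average correct when the last batch of the dataloader is smaller than the others.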

Run model analysis and get a pruned model.

runner.ana(eval_fn, args=(val_loader,))

model = runner.prune(removal_ratio=0.2)

The model analysis only needs to be done once. You can then prune the model iteratively without re-running the analysis, because only one pruned model is generated for a given pruning ratio. The resulting subnetwork may not be the best one available, because this single pruned model is produced by an approximate algorithm based on the analysis result. The one-step pruning method can generate a better subnetwork.

One-Step Pruning

This method has two stages: an adaptive-BN-based search for a pruning strategy and pruned model generation. After the search, a file named .vai/xxx.search is generated to store the search result (the pruning strategies and their evaluation scores). You can then get the final pruned model in one step.

num_subnet sets the target number of candidate subnetworks that satisfy the sparsity requirement. The best subnetwork is selected from these candidates. A higher value lengthens the search but raises the probability of finding a better subnetwork.

# Adaptive-BN-based searching for pruning strategy. 'calibration_fn' is a function for calibrating BN layer's statistics.
runner.search(gpus=['0'], calibration_fn=calibration_fn, calib_args=(val_loader,), eval_fn=eval_fn, eval_args=(val_loader,), num_subnet=1000, removal_ratio=0.7)

model = runner.prune(removal_ratio=0.7, index=None)
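The trade-off controlled by num_subnet can be illustrated with a toy best-of-N random search. This is not the search that the pruning runner performs. The names best_of_n, sample_fn, and score_fn are hypothetical, and the candidate and scoring functions are stand-ins chosen only to show that evaluating more random candidates can never lower the best score found.

```python
import random


def best_of_n(score_fn, sample_fn, num_subnet, seed=0):
    """Draw num_subnet random candidates and keep the highest-scoring one."""
    if num_subnet < 1:
        raise ValueError("num_subnet must be at least 1")
    rng = random.Random(seed)
    best_candidate, best_score = None, float("-inf")
    for _ in range(num_subnet):
        candidate = sample_fn(rng)
        score = score_fn(candidate)
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate, best_score


def sample_fn(rng):
    # Toy candidate: one random pruning ratio for each of four layers.
    return [rng.random() for _ in range(4)]


def score_fn(candidate):
    # Toy score: higher when every per-layer ratio is close to 0.5.
    return -sum((ratio - 0.5) ** 2 for ratio in candidate)


_, score_small = best_of_n(score_fn, sample_fn, num_subnet=10)
_, score_large = best_of_n(score_fn, sample_fn, num_subnet=1000)
print(score_large >= score_small)  # True
```

With a fixed seed, the first 10 candidates of the larger search are the same as those of the smaller one, so the larger search can only match or improve the result. The cost is that every extra candidate requires one more evaluation.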

The eval_fn is the same as the one used in the iterative pruning method. The following example code shows a calibration_fn that implements adaptive-BN. Define your own function in a similar way.

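def calibration_fn(model, dataloader, number_forward=100):
  # Train mode makes the BN layers update their running statistics.
  model.train()
  # No gradients are needed because the weights are not updated.
  with torch.no_grad():
    for index, (images, target) in enumerate(dataloader):
      images = images.cuda()
      model(images)
      # Stop once enough batches have been forwarded.
      if index > number_forward:
        break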
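To see why forward passes in train mode recalibrate a pruned model, it helps to look at how a BN layer updates its running statistics. The following is a minimal pure-Python sketch for a single channel. It assumes the update rule documented for PyTorch BatchNorm layers (an exponential moving average with a default momentum of 0.1 and an unbiased batch variance); update_running_stats is a hypothetical helper, not a library function.

```python
def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    """Apply one train-mode BN statistics update for a single channel."""
    n = len(batch)
    if n < 2:
        raise ValueError("need at least two values for an unbiased variance")
    batch_mean = sum(batch) / n
    # Unbiased variance (divide by n - 1) is used for the running estimate.
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / (n - 1)
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var


# Start from stale statistics and feed the same batch repeatedly.
mean, var = 0.0, 1.0
for _ in range(100):
    mean, var = update_running_stats(mean, var, [1.0, 2.0, 3.0])
print(mean, var)  # mean moves from 0.0 toward the batch mean of 2.0
```

Each forward pass moves the running statistics a fraction of the way toward the statistics of the current batch, which is why calibration_fn forwards many batches without updating any weights.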

The one-step pruning method has several advantages over the iterative approach:

The generated pruned models are typically more accurate because all subnetworks that meet the requirements are evaluated.
The workflow is simpler because you can obtain the final pruned model in one step without iterations.
Retraining a slim model is faster than retraining a sparse model.

One-step pruning has two disadvantages. First, the pruning strategies are generated randomly, so the outcome is unpredictable. Second, the subnetwork search must be performed once for every pruning ratio.