To prune a model, follow these steps:
- Define a function to evaluate model performance. The function must satisfy
two requirements:
  - The first argument must be a keras.Model instance to be evaluated.
  - It returns a Python number that indicates the performance of the model.
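def evaluate(model):
    # Compile so that model.evaluate() reports accuracy alongside the loss.
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    # x_test and y_test are the test data, defined elsewhere.
    score = model.evaluate(x_test, y_test, verbose=0)
    # score is [loss, accuracy]; return the accuracy.
    return score[1]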
- Use this evaluation function to run model
analysis:
runner.ana(evaluate)
- Determine a pruning ratio. The ratio indicates the reduction in the amount of
floating-point computation of the model in the forward pass:
[MACs of pruned model] = (1 - ratio) * [MACs of original model]
The value of ratio must lie in the open interval (0, 1):
sparse_model = runner.prune(ratio=0.2)
Note: ratio is only an approximate target value; the actual pruning ratio may not be exactly equal to it.
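As a quick check of the MACs formula above, the following sketch computes the MAC budget that a given ratio targets. The function name target_macs and the 500 million MACs figure are illustrative assumptions, not part of the pruning API.

```python
def target_macs(original_macs: float, ratio: float) -> float:
    """Return the approximate MAC count targeted by a pruning ratio."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    return (1.0 - ratio) * original_macs


# Hypothetical model costing 500 million MACs per forward pass.
original = 500e6
pruned = target_macs(original, ratio=0.2)
print(f"{pruned / 1e6:.0f} MMACs")  # prints "400 MMACs"
```

Because ratio is only a target, the MAC count of the model actually returned by prune() may differ somewhat from this figure.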
The model returned by prune() is sparse: the pruned channels are set to zero
and the model size remains unchanged. The sparse model is used during the
iterative pruning process and is converted to a pruned dense model only after
pruning is completed.
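To illustrate the difference between the sparse and the dense form, here is a minimal pure-Python sketch. It does not use the pruning API; the helper names and the toy weights are assumptions made for illustration, with each inner list standing in for one output channel of a layer.

```python
from typing import List, Set

Channel = List[float]


def zero_channels(weights: List[Channel], pruned: Set[int]) -> List[Channel]:
    """Sparse form: pruned channels are zeroed, so the shape is unchanged."""
    return [
        [0.0] * len(ch) if i in pruned else list(ch)
        for i, ch in enumerate(weights)
    ]


def drop_channels(weights: List[Channel], pruned: Set[int]) -> List[Channel]:
    """Dense form: pruned channels are removed, so the layer shrinks."""
    return [list(ch) for i, ch in enumerate(weights) if i not in pruned]


def param_count(weights: List[Channel]) -> int:
    """Count the stored weights, including zeros."""
    return sum(len(ch) for ch in weights)


# Toy layer with 4 output channels of 3 weights each (made-up values).
layer = [[0.5, -0.1, 0.3], [0.0, 0.2, -0.4], [0.9, 0.8, -0.7], [0.1, 0.1, 0.1]]
pruned_idx = {1, 3}

sparse = zero_channels(layer, pruned_idx)
dense = drop_channels(layer, pruned_idx)

print(param_count(layer), param_count(sparse), param_count(dense))  # 12 12 6
```

The sparse form keeps every tensor shape intact, which is convenient while pruning is still being iterated; only the dense form reduces the stored parameter count.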
Besides returning a sparse model, the pruning runner generates a specification file in the .vai directory that describes how each layer will be pruned.