When iterative pruning completes, it produces a sparse model: a model with the same number of parameters as the original, but with many of those parameters now set to zero.
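You can confirm this by measuring the fraction of zero-valued parameters in the model's weight arrays. The following is a minimal NumPy sketch; the `sparsity` helper is illustrative and not part of the pruning API, and the toy array stands in for what `model.get_weights()` would return:

```python
import numpy as np

def sparsity(weights):
    """Fraction of parameters that are exactly zero across all weight arrays."""
    total = sum(w.size for w in weights)
    zeros = sum(int(np.count_nonzero(w == 0)) for w in weights)
    return zeros / total

# Toy stand-in for model.get_weights(): one kernel with half its values pruned.
w = np.arange(8, dtype=np.float32)
w[:4] = 0.0
print(sparsity([w]))  # 0.5
```

For a real Keras model you would pass `model.get_weights()` instead of the toy list.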
Call get_slim_model() to remove the zeroed parameters from the sparse model and generate the final pruned model:
# Load the sparse weights produced by iterative pruning.
model.load_weights("model_sparse_0.5")

# Describe a single input sample so the runner can trace the model.
input_shape = [28, 28, 1]
input_spec = tf.TensorSpec((1, *input_shape), tf.float32)

runner = IterativePruningRunner(model, input_spec)
slim_model = runner.get_slim_model()
By default, the runner uses the latest pruning specification to generate the slim model. You can check which specification file is the latest with the following command:
$ cat .vai/latest_spec
".vai/mnist_ratio_0.5.spec"
If this file does not match your sparse model, you can explicitly specify the file path to be used:
slim_model = runner.get_slim_model(".vai/mnist_ratio_0.5.spec")
You can use the Keras model-saving APIs to save the slim model and reload it later for inference or quantization. For example:
slim_model.save('/tmp/model')
loaded_model = tf.keras.models.load_model('/tmp/model')