Baseline Model
SSD (https://arxiv.org/abs/1512.02325) is a deep neural network for detecting objects in images. This example uses VGG16 as the backbone of the model.
Creating a Configuration File
Create a file named config.prototxt:
workspace: "examples/decent_p/ssd/"
model: "examples/decent_p/ssd/float.prototxt"
weights: "examples/decent_p/ssd/float.caffemodel"
solver: "examples/decent_p/ssd/solver.prototxt"
gpu: "0,1,2,3"
test_iter: 10
acc_name: "detection_eval"
ssd_ap_version: "11point"
rate: 0.15
pruner {
  method: REGULAR
  exclude {
    layer_top: "conv4_3_norm_mbox_loc"
    layer_top: "conv4_3_norm_mbox_conf"
    layer_top: "fc7_mbox_loc"
    layer_top: "fc7_mbox_conf"
    layer_top: "conv6_2_mbox_loc"
    layer_top: "conv6_2_mbox_conf"
    layer_top: "conv7_2_mbox_loc"
    layer_top: "conv7_2_mbox_conf"
    layer_top: "conv8_2_mbox_loc"
    layer_top: "conv8_2_mbox_conf"
    layer_top: "conv9_2_mbox_loc"
    layer_top: "conv9_2_mbox_conf"
  }
}
Due to the nature of the SSD network, the number of filters in some convolution layers must remain fixed, so these layers have to be excluded from pruning. In the sample above, the top names of the layers to be excluded are listed in the "exclude" section. In general, a convolution layer whose output is compared directly against the label cannot be pruned. For example, if the output of a convolution layer must be matched against the label to compute accuracy, that layer has to be excluded: because the number of label classes is fixed, the output dimensions of the layer must continue to match the label.
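The excluded layers are the SSD detection heads, whose filter counts are determined by the number of prior boxes and classes rather than chosen freely. A minimal Python sketch of that arithmetic (the per-layer prior counts are the standard SSD defaults and `NUM_CLASSES = 5` assumes four object classes plus background; both are assumptions, not values taken from this configuration):

```python
# Sketch: why the mbox head filter counts are fixed and must not be pruned.
NUM_CLASSES = 5  # assumption: 4 object classes + 1 background class

# Priors per spatial location for each source layer
# (typical SSD defaults; an assumption for illustration).
PRIORS = {
    "conv4_3_norm": 4,
    "fc7": 6,
    "conv6_2": 6,
    "conv7_2": 6,
    "conv8_2": 4,
    "conv9_2": 4,
}

def head_channels(num_priors, num_classes):
    """Output channels required by the loc and conf heads of one source layer."""
    loc = num_priors * 4             # 4 box-offset values per prior
    conf = num_priors * num_classes  # one score per class per prior
    return loc, conf

for layer, p in PRIORS.items():
    loc, conf = head_channels(p, NUM_CLASSES)
    print(f"{layer}_mbox_loc needs {loc} filters, "
          f"{layer}_mbox_conf needs {conf} filters")
```

Pruning any of these heads would change the output channel count and break the match between predictions and labels, which is why they appear in the "exclude" section.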
Performing Model Analysis
$ ./vai_p_caffe ana -config config.prototxt
Pruning the Model
$ ./vai_p_caffe prune -config config.prototxt
Fine-tuning the Pruned Model
The following solver settings can be used as initial parameters for fine-tuning:
net: "float.prototxt"
test_iter: 229
test_interval: 500
base_lr: 0.001
display: 10
max_iter: 120000
lr_policy: "multistep"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
snapshot: 500
snapshot_prefix: "SSD_"
solver_mode: GPU
device_id: 4
debug_info: false
snapshot_after_train: true
test_initialization: false
average_loss: 10
stepvalue: 80000
stepvalue: 100000
stepvalue: 120000
iter_size: 1
type: "SGD"
eval_type: "detection"
ap_version: "11point"
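With the "multistep" policy above, the learning rate starts at base_lr and is multiplied by gamma each time training passes one of the stepvalue iterations. A small Python sketch of the resulting schedule (illustration only; the function name is hypothetical):

```python
def multistep_lr(base_lr, gamma, stepvalues, iteration):
    """Learning rate under Caffe's "multistep" policy:
    lr = base_lr * gamma^k, where k is the number of stepvalues
    already reached at the given iteration."""
    k = sum(1 for s in stepvalues if iteration >= s)
    return base_lr * gamma ** k

# Schedule defined by the solver settings above.
steps = [80000, 100000, 120000]
for it in [0, 79999, 80000, 100000, 119999]:
    print(f"iter {it}: lr = {multistep_lr(0.001, 0.1, steps, it):g}")
```

So the rate stays at 0.001 until iteration 80,000, drops to 0.0001 there, and to 0.00001 at iteration 100,000, which is a common schedule for fine-tuning a pruned model without disturbing the recovered weights too much.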
$ ./vai_p_caffe finetune -config config.prototxt
Estimated time required: about 50 hours for 650 epochs on the Cityscapes training set (2,975 images) with 4 x NVIDIA Tesla V100 GPUs.
Getting Final Output
To get the finalized model, run the following:
$ ./vai_p_caffe transform -model baseline.prototxt -weights finetuned_model.caffemodel -output final.caffemodel
Pruning Results
- Dataset: Cityscapes (four classes)
- Input Size: 500 x 500
- GPU Platform: 4 x NVIDIA Tesla V100
- FLOPs: 173G
- #Parameters: 24M
Round | FLOPs | Parameters | mAP
---|---|---|---
0 | 100% | 100% | 0.571
1 | 50% | 29% | 0.587
2 | 9.7% | 9.7% | 0.559
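The FLOPs and Parameters columns are percentages of the baseline (173G FLOPs, 24M parameters). A quick sketch converting them to absolute figures (simple arithmetic on the numbers above; printed values are rounded):

```python
# Baseline figures from the Pruning Results list above.
BASELINE_FLOPS = 173e9   # 173G FLOPs
BASELINE_PARAMS = 24e6   # 24M parameters

# (FLOPs fraction, parameter fraction) per pruning round, from the table.
ROUNDS = {
    0: (1.00, 1.00),
    1: (0.50, 0.29),
    2: (0.097, 0.097),
}

for rnd, (flops_frac, params_frac) in ROUNDS.items():
    flops = BASELINE_FLOPS * flops_frac
    params = BASELINE_PARAMS * params_frac
    print(f"Round {rnd}: {flops / 1e9:.1f}G FLOPs, "
          f"{params / 1e6:.1f}M parameters")
```

After round 2 the model needs roughly a tenth of the original compute and storage while mAP stays within about 0.012 of the baseline.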