In conventional convolution, each input channel is convolved with its own kernel, and the per-channel results are then summed to produce the output.
In depthwise separable convolution, the operation is split into two steps: depthwise convolution and pointwise convolution. Depthwise convolution is applied to each feature map separately, as shown on the left side of the following figure. Pointwise convolution then follows; it is the same as conventional convolution with a 1x1 kernel. The parallelism of depthwise convolution is half of the pixel parallelism.
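
The two steps can be illustrated with a minimal pure-Python sketch. It is a functional illustration only and does not model how the hardware schedules the computation; the function names, the `[channel][row][column]` data layout, the stride of 1, and the absence of padding are all assumptions made for this example.

```python
def depthwise_conv(x, dw_kernels):
    """Depthwise step. x: [C][H][W], dw_kernels: [C][K][K].

    Assumes stride 1 and no padding (illustrative choices).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K = len(dw_kernels[0])
    out_h, out_w = H - K + 1, W - K + 1
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(C)]
    for c in range(C):  # each channel is filtered only by its own kernel
        for i in range(out_h):
            for j in range(out_w):
                out[c][i][j] = sum(
                    x[c][i + di][j + dj] * dw_kernels[c][di][dj]
                    for di in range(K)
                    for dj in range(K)
                )
    return out


def pointwise_conv(x, pw_weights):
    """Pointwise step: a 1x1 convolution. x: [C_in][H][W], pw_weights: [C_out][C_in]."""
    C_in, H, W = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(w[c] * x[c][i][j] for c in range(C_in)) for j in range(W)]
         for i in range(H)]
        for w in pw_weights  # one weight vector per output channel
    ]


def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise convolution followed by pointwise convolution."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


# Toy input: 2 channels of 3x3, one 2x2 kernel per channel, 1 output channel.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
dw_kernels = [
    [[1, 0], [0, 1]],  # channel 0: top-left plus bottom-right of each window
    [[1, 1], [1, 1]],  # channel 1: sum of each 2x2 window
]
pw_weights = [[1, -1]]  # output channel = channel 0 minus channel 1

y = depthwise_separable_conv(x, dw_kernels, pw_weights)
print(y)  # [[[4, 6], [10, 12]]]
```

In the depthwise step no values are mixed across channels; channels are combined only in the pointwise step.
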
In DPUCZDX8G, depthwise convolution is performed by the ALU engine, which also handles pooling. ALU parallel can be set to any value from 1 to PP (the pixel parallelism); the recommended setting is PP/2.
ALU Parallel | LUTs | FF | Block RAMs | DSPs |
---|---|---|---|---|
1 | 44212 | 88250 | 255 | 662 |
2 | 46599 | 92380 | 255 | 678 |
4 (recommended) | 51388 | 98525 | 255 | 710 |
8 | 60751 | 111329 | 255 | 774 |
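
To compare settings, the additional resources relative to ALU parallel = 1 can be computed directly from the table. The short sketch below does only that arithmetic; the numbers are copied from the rows above, and the variable names are illustrative.

```python
# Rows copied from the table: ALU parallel -> (LUTs, FFs, Block RAMs, DSPs)
resources = {
    1: (44212, 88250, 255, 662),
    2: (46599, 92380, 255, 678),
    4: (51388, 98525, 255, 710),  # recommended setting in the table
    8: (60751, 111329, 255, 774),
}
names = ("LUTs", "FFs", "Block RAMs", "DSPs")
baseline = resources[1]

# Extra resources of each setting compared with ALU parallel = 1.
extra = {
    parallel: dict(zip(names, (value - base for value, base in zip(row, baseline))))
    for parallel, row in resources.items()
}

for parallel, delta in extra.items():
    print(parallel, delta)
```

For the rows listed, block RAM usage does not change, and the DSP count grows by 16 for each additional unit of ALU parallel (16, 48, and 112 extra DSPs at settings 2, 4, and 8).
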