DepthwiseConv (ALU) - 4.0 English

DPUCZDX8G for Zynq UltraScale+ MPSoCs Product Guide (PG338)

Document ID: PG338
Release Date: 2022-06-24
Version: 4.0 English

In conventional convolution, each output channel is computed by convolving every input channel with its own kernel and then summing the results across all input channels.
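The pure-Python sketch below shows how each output channel combines all input channels. It makes three assumptions: stride 1, no padding, and an input laid out as [channel][row][col]. The function name `conv2d` is illustrative only and is not part of any DPU or Vitis AI API.

```python
def conv2d(x, w):
    """Conventional convolution (stride 1, no padding).

    x: input feature maps, shape [C_in][H][W]
    w: kernels, shape [C_out][C_in][K][K]
    Returns output of shape [C_out][H-K+1][W-K+1].
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out = []
    for kernel in w:  # one output channel per kernel
        plane = []
        for i in range(h - k + 1):
            row = []
            for j in range(wd - k + 1):
                acc = 0
                # Each input channel is convolved with its own kernel slice,
                # and the per-channel results are summed together.
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            acc += x[c][i + di][j + dj] * kernel[c][di][dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Two 3x3 input channels, one output channel with 2x2 all-ones kernels.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
w = [[[[1, 1], [1, 1]], [[1, 1], [1, 1]]]]
print(conv2d(x, w))  # [[[16, 20], [28, 32]]]
```

The nested loops keep the channel summation explicit rather than relying on a tensor library.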

In depthwise separable convolution, the operation is performed in two steps: depthwise convolution and pointwise convolution. Depthwise convolution is applied to each feature map (input channel) separately, as shown on the left side of Figure 1. Pointwise convolution then combines the channels. It is the same as conventional convolution with a 1x1 kernel. The parallelism of depthwise convolution is half the pixel parallelism.
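The two steps can be sketched in pure Python under three assumptions: stride 1, no padding, and a [channel][row][col] layout. The names `depthwise_conv2d` and `pointwise_conv2d` are illustrative, not DPU or Vitis AI APIs. This is only a functional reference and says nothing about how the ALU engine schedules the work in hardware.

```python
def depthwise_conv2d(x, w):
    """Depthwise convolution (stride 1, no padding).

    x: input feature maps, shape [C][H][W]
    w: one KxK kernel per channel, shape [C][K][K]
    Each channel is filtered on its own; channels are never summed.
    """
    out = []
    for plane, kernel in zip(x, w):
        k = len(kernel)
        h, wd = len(plane), len(plane[0])
        out.append([
            [
                sum(plane[i + di][j + dj] * kernel[di][dj]
                    for di in range(k) for dj in range(k))
                for j in range(wd - k + 1)
            ]
            for i in range(h - k + 1)
        ])
    return out


def pointwise_conv2d(x, w):
    """Pointwise convolution: conventional convolution with a 1x1 kernel.

    x: input feature maps, shape [C_in][H][W]
    w: weights, shape [C_out][C_in]
    Each output pixel is a weighted sum across channels at the same position.
    """
    h, wd = len(x[0]), len(x[0][0])
    return [
        [
            [sum(wc[c] * x[c][i][j] for c in range(len(x))) for j in range(wd)]
            for i in range(h)
        ]
        for wc in w
    ]


x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
# Step 1: depthwise, one 2x2 all-ones kernel per channel.
dw = depthwise_conv2d(x, [[[1, 1], [1, 1]], [[1, 1], [1, 1]]])
print(dw)  # [[[12, 16], [24, 28]], [[4, 4], [4, 4]]]
# Step 2: pointwise, two output channels (sum and difference of the inputs).
pw = pointwise_conv2d(dw, [[1, 1], [1, -1]])
print(pw)  # [[[16, 20], [28, 32]], [[8, 12], [20, 24]]]
```

The depthwise step keeps two separate channels. Only the pointwise step mixes them.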

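In DPUCZDX8G, depthwise convolution is performed by the ALU engine, which also performs pooling. The ALU parallelism can be set from 1 to the pixel parallelism (PP), and PP/2 is recommended. For example, B4096 has a PP of 8, so its recommended ALU parallelism is 4, as listed in Table 1.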
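The parallelism rule above can be captured in a small helper. This is a hypothetical sketch: `alu_parallel_options` is not a DPU TRD or Vitis AI parameter. Based only on the values listed in Table 1, it also assumes that valid settings are powers of two from 1 up to PP.

```python
def alu_parallel_options(pp):
    """Candidate ALU parallelism settings for a given pixel parallelism (PP).

    Assumption (hypothetical): valid settings are powers of two from 1 to PP,
    matching the values in Table 1. The recommended value is PP/2.
    Returns (options, recommended).
    """
    if pp < 2 or pp & (pp - 1):
        raise ValueError("expected PP to be a power of two >= 2")
    options = []
    p = 1
    while p <= pp:
        options.append(p)
        p *= 2
    return options, pp // 2


# B4096 has a pixel parallelism of 8.
print(alu_parallel_options(8))  # ([1, 2, 4, 8], 4)
```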

Figure 1. Depthwise Convolution and Pointwise Convolution

Table 1. Resource Utilization of DPUCZDX8G B4096 with Different ALU Parallelism
ALU Parallelism	LUTs	FFs	Block RAMs	DSPs
1	44212	88250	255	662
2	46599	92380	255	678
4 (recommended)	51388	98525	255	710
8	60751	111329	255	774
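To see what each step up in ALU parallelism costs, the figures in Table 1 can be compared against the smallest setting. The dictionary below is transcribed from the table. `overhead_vs_minimum` is an illustrative helper, not part of any Xilinx tool.

```python
# Resource usage of DPUCZDX8G B4096 per ALU parallelism, transcribed from Table 1.
RESOURCES = {
    1: {"LUTs": 44212, "FFs": 88250, "Block RAMs": 255, "DSPs": 662},
    2: {"LUTs": 46599, "FFs": 92380, "Block RAMs": 255, "DSPs": 678},
    4: {"LUTs": 51388, "FFs": 98525, "Block RAMs": 255, "DSPs": 710},
    8: {"LUTs": 60751, "FFs": 111329, "Block RAMs": 255, "DSPs": 774},
}


def overhead_vs_minimum(resources):
    """Extra resources of each setting relative to the smallest ALU parallelism."""
    base = resources[min(resources)]
    return {
        alu: {name: value - base[name] for name, value in usage.items()}
        for alu, usage in sorted(resources.items())
    }


for alu, delta in overhead_vs_minimum(RESOURCES).items():
    print(alu, delta)
```

The recommended setting of 4 adds 7176 LUTs, 10275 FFs, and 48 DSPs over ALU parallelism 1. Block RAM usage stays at 255 for every setting.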