An option is provided to determine the number of DPU engines instantiated in a single DPU IP core. The deep neural network features and the associated parameters supported by the DPU are shown in the following table.
| Features | Parameters | Description (channel_parallel=16) |
|---|---|---|
| conv2d | Kernel Sizes | kernel_w: [1, 16], kernel_h: [1, 16] |
| | Strides | stride_w: [1, 4], stride_h: [1, 4] |
| | Pad_left/Pad_right | [0, (kernel_w - 1) * dilation_w + 1] |
| | Pad_top/Pad_bottom | [0, (kernel_h - 1) * dilation_h + 1] |
| | In Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048 |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, LeakyReLU, ReLU6 |
| | Dilation | dilation * input_channel <= 256 * channel_parallel |
| depthwise-conv2d | Kernel Sizes | kernel_w: [1, 3], kernel_h: [1, 3] |
| | Strides | stride_w: [1, 2], stride_h: [1, 2] |
| | Pad_left/Pad_right | [1, (kernel_w - 1)] |
| | Pad_top/Pad_bottom | [1, (kernel_h - 1)] |
| | In Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048 |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| transposed-conv2d | Kernel Sizes | kernel_w: [1, 16], kernel_h: [1, 16] |
| | Strides | stride_w: [1, 16], stride_h: [1, 16] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, LeakyReLU, ReLU6 |
| depthwise-transposed-conv2d | Kernel Sizes | kernel_w: [3], kernel_h: [3] |
| | Strides | stride_w: [1], stride_h: [1] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| average-pooling | Kernel Sizes | kernel_w: [1, 8], kernel_h: [1, 8], kernel_w == kernel_h |
| | Strides | stride_w: [1, 8], stride_h: [1, 8] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| elementwise-sum | Input Channel | input_channel: [1, 256 * channel_parallel] |
| | Activation | ReLU |
| concat | | Network-specific limitation related to the size of feature maps, quantization results, and compiler optimizations. |
| fully-connected | Input Channel | input_channel <= 16 * 16 * 16 |
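The conv2d limits above can be checked programmatically before deploying a model. The following is a minimal sketch, not part of any DPU toolchain: the helper name `conv2d_fits_dpu` and its argument list are assumptions, and it encodes only the conv2d row of the table with channel_parallel fixed at 16.

```python
import math

# Hypothetical helper (not part of the DPU tool flow): check a conv2d
# layer against the table's conv2d limits for channel_parallel = 16.
CHANNEL_PARALLEL = 16

def conv2d_fits_dpu(kernel_w, kernel_h, stride_w, stride_h,
                    input_channel, output_channel, dilation=1):
    """Return True if the layer satisfies the conv2d limits in the table."""
    checks = [
        # Kernel Sizes: kernel_w, kernel_h in [1, 16]
        1 <= kernel_w <= 16 and 1 <= kernel_h <= 16,
        # Strides: stride_w, stride_h in [1, 4]
        1 <= stride_w <= 4 and 1 <= stride_h <= 4,
        # In Size: kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048
        kernel_w * kernel_h * math.ceil(input_channel / CHANNEL_PARALLEL) <= 2048,
        # Out Size: output_channel <= 256 * channel_parallel
        output_channel <= 256 * CHANNEL_PARALLEL,
        # Dilation: dilation * input_channel <= 256 * channel_parallel
        dilation * input_channel <= 256 * CHANNEL_PARALLEL,
    ]
    return all(checks)

# A 3x3 conv, stride 1, 512 -> 512 channels stays within every limit:
print(conv2d_fits_dpu(3, 3, 1, 1, 512, 512))   # True
# 8192 output channels exceed the Out Size limit (256 * 16 = 4096):
print(conv2d_fits_dpu(1, 1, 1, 1, 64, 8192))   # False
```

The same pattern extends naturally to the depthwise-conv2d and transposed-conv2d rows by substituting their kernel and stride ranges.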