The DPU IP provides several fixed configurations, each delivered as a separate XO file. A configuration specifies, among other things, the number of processing engines and the kernel/filter sizes supported for element-wise and pooling operations.
The deep neural network features and the associated parameters supported by the DPU are shown in the following table:
| Features | | Description (channel_parallel=64, bank_depth=256) |
|---|---|---|
| Convolution | Kernel Sizes | w: [1, 16], h: [1, 16] |
| | Strides | w: [1, 4], h: [1, 4] |
| | Pad_left/Pad_right | [0, (kernel_w - 1) * dilation_w] |
| | Pad_top/Pad_bottom | [0, (kernel_h - 1) * dilation_h] |
| | Input Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, LeakyReLU, ReLU6, H-swish, H-sigmoid |
| | Dilation | dilation * input_channel <= 256 * channel_parallel |
| depthwise-conv2d | Kernel Sizes | w, h: {3, 5}, w == h |
| | Strides | w: [1, 4], h: [1, 4] |
| | Pad_left/Pad_right | [0, (kernel_w - 1) * dilation_w + 1] |
| | Pad_top/Pad_bottom | [0, (kernel_h - 1) * dilation_h + 1] |
| | Input Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 4096 |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| | Dilation | dilation * input_channel <= 256 * channel_parallel |
| transposed-conv2d | Kernel Sizes | w: [1, 16], h: [1, 16] |
| | Strides | w: [1, 16], h: [1, 16] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, LeakyReLU, ReLU6, H-swish, H-sigmoid |
| depthwise-transposed-conv2d | Kernel Sizes | w, h: {6, 9, 10, 12, 15, 20} |
| | Strides | w: [2, 4], h: [2, 4] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| max-pooling / average-pooling | Kernel Sizes | 2/4/6PE: w, h: [1, 8], w == h; 8PE: w, h: {1, 2, 3, 7}, w == h |
| | Strides | w: [1, 8], h: [1, 8] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Activation | Not supported |
| elementwise-sum | Input Channel | input_channel <= 256 * channel_parallel |
| | Activation | ReLU |
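To make the convolution limits concrete, the following sketch checks a standard convolution layer against the ranges and size formulas in the table, assuming channel_parallel = 64 and bank_depth = 256. The function name and structure are illustrative only; they are not part of the DPU tool flow, which performs these checks itself during compilation.

```python
import math

# Assumed architecture parameters, taken from the table header above.
CHANNEL_PARALLEL = 64
BANK_DEPTH = 256

def conv2d_is_supported(kernel_w, kernel_h, stride_w, stride_h,
                        input_channel, output_channel,
                        dilation_w=1, dilation_h=1):
    """Hypothetical helper: check a standard convolution against the
    DPU constraints listed in the table (Convolution row group)."""
    # Kernel sizes: w, h in [1, 16]
    if not (1 <= kernel_w <= 16 and 1 <= kernel_h <= 16):
        return False
    # Strides: w, h in [1, 4]
    if not (1 <= stride_w <= 4 and 1 <= stride_h <= 4):
        return False
    # Input size: kernel_w * kernel_h * ceil(input_channel / channel_parallel)
    # must not exceed bank_depth.
    if kernel_w * kernel_h * math.ceil(input_channel / CHANNEL_PARALLEL) > BANK_DEPTH:
        return False
    # Output size: output_channel <= 256 * channel_parallel
    if output_channel > 256 * CHANNEL_PARALLEL:
        return False
    # Dilation: dilation * input_channel <= 256 * channel_parallel
    if dilation_w * input_channel > 256 * CHANNEL_PARALLEL:
        return False
    if dilation_h * input_channel > 256 * CHANNEL_PARALLEL:
        return False
    return True
```

For example, a 3x3 stride-1 convolution with 256 input channels passes (3 * 3 * ceil(256/64) = 36 <= 256), while the same kernel with 2048 input channels fails the input-size constraint (3 * 3 * 32 = 288 > 256).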