The DPU IP provides several fixed configurations, each delivered as a different XO file. A configuration determines the number of processing engines (PEs) and the kernel/filter sizes supported for element-wise and pooling operations.
The deep-neural-network features and the associated parameters supported by the DPU are shown in the following table:
| Features | Parameter | Description (channel_parallel=64, bank_depth=256) |
|---|---|---|
| Convolution | Kernel Sizes | w: [1, 16], h: [1, 16] |
| | Strides | w: [1, 4], h: [1, 4] |
| | Pad_left/Pad_right | [0, (kernel_w - 1) * dilation_w] |
| | Pad_top/Pad_bottom | [0, (kernel_h - 1) * dilation_h] |
| | Input Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, LeakyReLU, ReLU6, H-swish, H-sigmoid |
| | Dilation | dilation * input_channel <= 256 * channel_parallel |
| depthwise-conv2d ¹ | Kernel Sizes | w, h: {3, 5}, w == h |
| | Strides | w: [1, 4], h: [1, 4] |
| | Pad_left/Pad_right | [0, (kernel_w - 1) * dilation_w + 1] |
| | Pad_top/Pad_bottom | [0, (kernel_h - 1) * dilation_h + 1] |
| | Input Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 4096 |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| | Dilation | dilation * input_channel <= 256 * channel_parallel |
| transposed-conv2d | Kernel Sizes | w: [1, 16], h: [1, 16] |
| | Strides | w: [1, 16], h: [1, 16] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, LeakyReLU, ReLU6, H-swish, H-sigmoid |
| depthwise-transposed-conv2d ¹ | Kernel Sizes | w, h: {6, 9, 10, 12, 15, 20} |
| | Strides | w: [2, 4], h: [2, 4] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| max-pooling / average-pooling | Kernel Sizes | 2/4/6 PE: w, h: [1, 8], w == h; 8 PE: w, h: {1, 2, 3, 7}, w == h |
| | Strides | w: [1, 8], h: [1, 8] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Activation | Not supported |
| elementwise-sum | Input channel | input_channel <= 256 * channel_parallel |
| | Activation | ReLU |
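The convolution constraints in the table can be checked programmatically before deploying a model. The following is a minimal sketch; the function name `conv2d_supported` and its structure are illustrative (not part of the DPU toolchain), and it assumes the dilation limit applies to the larger of the two dilation factors:

```python
import math

# Device parameters used by the table above.
CHANNEL_PARALLEL = 64
BANK_DEPTH = 256

def conv2d_supported(kernel_w, kernel_h, stride_w, stride_h,
                     input_channel, output_channel,
                     dilation_w=1, dilation_h=1,
                     pad_left=0, pad_right=0, pad_top=0, pad_bottom=0):
    """Return True if a convolution layer satisfies the 'Convolution'
    row of the table above (illustrative helper, not the official API)."""
    checks = [
        # Kernel sizes: w, h in [1, 16]
        1 <= kernel_w <= 16 and 1 <= kernel_h <= 16,
        # Strides: w, h in [1, 4]
        1 <= stride_w <= 4 and 1 <= stride_h <= 4,
        # Padding: [0, (kernel - 1) * dilation] on each side
        0 <= pad_left <= (kernel_w - 1) * dilation_w,
        0 <= pad_right <= (kernel_w - 1) * dilation_w,
        0 <= pad_top <= (kernel_h - 1) * dilation_h,
        0 <= pad_bottom <= (kernel_h - 1) * dilation_h,
        # Input size: the weights must fit within one bank depth
        kernel_w * kernel_h * math.ceil(input_channel / CHANNEL_PARALLEL)
            <= BANK_DEPTH,
        # Output size limit
        output_channel <= 256 * CHANNEL_PARALLEL,
        # Dilation limit (assumed to use the larger dilation factor)
        max(dilation_w, dilation_h) * input_channel <= 256 * CHANNEL_PARALLEL,
    ]
    return all(checks)

# A typical 3x3 convolution fits comfortably:
print(conv2d_supported(3, 3, 1, 1, input_channel=256, output_channel=256))   # True
# A 17-wide kernel exceeds the [1, 16] kernel-size range:
print(conv2d_supported(17, 3, 1, 1, input_channel=256, output_channel=256))  # False
```

The same pattern extends to the other rows (depthwise, transposed, pooling) by swapping in the corresponding bounds from the table.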