A configuration option sets the number of DPU engines instantiated in a single DPU IP. The following table lists the deep neural network features and associated parameters that the DPU supports.
| Features | Parameter | Description (channel_parallel=16) |
|---|---|---|
| conv2d | Kernel Sizes | kernel_w: [1, 16], kernel_h: [1, 16] |
| | Strides | stride_w: [1, 4], stride_h: [1, 4] |
| | Pad_left/Pad_right | [0, (kernel_w - 1) * dilation_w + 1] |
| | Pad_top/Pad_bottom | [0, (kernel_h - 1) * dilation_h + 1] |
| | In Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048 |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, LeakyReLU, ReLU6 |
| | Dilation | dilation * input_channel <= 256 * channel_parallel |
| depthwise-conv2d | Kernel Sizes | kernel_w: [1, 3], kernel_h: [1, 3] |
| | Strides | stride_w: [1, 2], stride_h: [1, 2] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | In Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048 |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| transposed-conv2d | Kernel Sizes | kernel_w: [1, 16], kernel_h: [1, 16] |
| | Strides | stride_w: [1, 16], stride_h: [1, 16] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, LeakyReLU, ReLU6 |
| depthwise-transposed-conv2d | Kernel Sizes | kernel_w: [3], kernel_h: [3] |
| | Strides | stride_w: [1], stride_h: [1] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| average-pooling | Kernel Sizes | kernel_w: [1, 8], kernel_h: [1, 8], kernel_w == kernel_h |
| | Strides | stride_w: [1, 8], stride_h: [1, 8] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| elementwise-sum | Input Channel | input_channel <= 256 * channel_parallel [1, 8912] |
| | Activation | ReLU |
| Concat | | Network-specific limitation related to the size of feature maps, quantization results, and compiler optimizations. |
| Fully Connected | Input Channel | input_channel <= 16 * 16 * 16 |
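
The conv2d limits in the table can be read as a set of simple inequalities. The following is a minimal sketch, not part of any DPU toolchain, that checks a layer's shape against the kernel, stride, In Size, Out Size, and Dilation rows for conv2d. The function name `check_conv2d` and its argument names are hypothetical and chosen for illustration; treating the single `dilation` value as applying to both axes is an assumption; padding limits are not checked.

```python
from math import ceil


def check_conv2d(kernel_w, kernel_h, stride_w, stride_h,
                 input_channel, output_channel,
                 dilation=1, channel_parallel=16):
    """Return a list of violated conv2d limits (empty if none are violated).

    Hypothetical helper: the limits are copied from the conv2d rows of the
    table; it does not call or emulate any real DPU compiler API.
    """
    violations = []

    # Kernel Sizes: kernel_w and kernel_h must each lie in [1, 16].
    if not (1 <= kernel_w <= 16 and 1 <= kernel_h <= 16):
        violations.append("kernel size outside [1, 16]")

    # Strides: stride_w and stride_h must each lie in [1, 4].
    if not (1 <= stride_w <= 4 and 1 <= stride_h <= 4):
        violations.append("stride outside [1, 4]")

    # In Size: kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048.
    in_size = kernel_w * kernel_h * ceil(input_channel / channel_parallel)
    if in_size > 2048:
        violations.append("in size exceeds 2048")

    # Out Size: output_channel <= 256 * channel_parallel.
    if output_channel > 256 * channel_parallel:
        violations.append("output_channel exceeds 256 * channel_parallel")

    # Dilation: dilation * input_channel <= 256 * channel_parallel.
    if dilation * input_channel > 256 * channel_parallel:
        violations.append("dilation * input_channel exceeds 256 * channel_parallel")

    return violations


if __name__ == "__main__":
    # A 3x3, stride-1 convolution with 64 input and 128 output channels.
    print(check_conv2d(3, 3, 1, 1, 64, 128))
    # A 16x16 kernel with 256 input channels violates the In Size limit.
    print(check_conv2d(16, 16, 1, 1, 256, 128))
```

Returning a list of messages rather than a boolean makes it easier to see which row of the table a layer violates. The same pattern could be extended to the other features by swapping in their ranges.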