The DPU core provides user-configurable parameters to optimize resource usage and customize features. Different configurations can be selected for DSP slice, LUT, block RAM, and UltraRAM usage based on the amount of programmable logic resources available. There is also an option to set the frequency of the DPU cores implemented on the device. The deep neural network features and the associated parameters supported by the DPU are shown in the following table.
| Features | Parameters | Description (channel_parallel=32) |
|---|---|---|
| conv2d | Kernel Sizes | kernel_w: [1, 16]; kernel_h: [1, 16] |
| | Strides | stride_w: [1, 4]; stride_h: [1, 4] |
| | Pad_left/Pad_right | [0, (kernel_w - 1) * dilation_w + 1] |
| | Pad_top/Pad_bottom | [0, (kernel_h - 1) * dilation_h + 1] |
| | In Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048 |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| | Dilation | dilation * input_channel <= 256 * channel_parallel |
| depthwise-conv2d | Kernel Sizes | kernel_w: [1, 3]; kernel_h: [1, 3] |
| | Strides | stride_w: [1, 4]; stride_h: [1, 4] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | In Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048 |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| transposed-conv2d | Kernel Sizes | kernel_w: [1, 16]; kernel_h: [1, 16] |
| | Strides | stride_w: [1, 16]; stride_h: [1, 16] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| depthwise-transposed-conv2d | Kernel Sizes | kernel_w/stride_w, kernel_h/stride_h: {3} |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Out Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| average-pooling | Kernel Sizes | kernel_w, kernel_h: {2, 3, 5, 7, 8}; kernel_w == kernel_h |
| | Strides | stride_w: [1, 8]; stride_h: [1, 8] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| elementwise-sum | Input Channel | input_channel <= 256 * channel_parallel, i.e., [1, 8192] |
| | Activation | ReLU |
| | Type | Sum |
| Concat | | Network-specific limitation related to the size of feature maps, quantization results, and compiler optimizations. |
| Fully Connected | Input Channel | input_channel <= 16 * 16 * 32 |
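
To illustrate how the conv2d constraints in the table compose, the following is a minimal Python sketch that checks a layer's parameters against those limits, assuming channel_parallel=32. The function name and constants are hypothetical helpers for illustration only; they are not part of the Vitis AI toolchain.

```python
import math

# Constants taken from the table above, assuming channel_parallel = 32.
CHANNEL_PARALLEL = 32
IN_SIZE_LIMIT = 2048                   # bound on kernel_w * kernel_h * ceil(in_ch / channel_parallel)
MAX_CHANNELS = 256 * CHANNEL_PARALLEL  # 8192

def conv2d_is_supported(kernel_w, kernel_h, stride_w, stride_h,
                        in_ch, out_ch, dilation=1):
    """Hypothetical helper (not part of Vitis AI): check a conv2d layer
    against the DPU constraints listed in the table above."""
    return all([
        1 <= kernel_w <= 16 and 1 <= kernel_h <= 16,     # Kernel Sizes
        1 <= stride_w <= 4 and 1 <= stride_h <= 4,       # Strides
        kernel_w * kernel_h
            * math.ceil(in_ch / CHANNEL_PARALLEL)
            <= IN_SIZE_LIMIT,                            # In Size
        out_ch <= MAX_CHANNELS,                          # Out Size
        dilation * in_ch <= MAX_CHANNELS,                # Dilation
    ])

# Example: a 3x3, stride-1 conv2d with 512 input and 1024 output channels
# fits: 3 * 3 * ceil(512 / 32) = 144 <= 2048, and 1024 <= 8192.
print(conv2d_is_supported(3, 3, 1, 1, in_ch=512, out_ch=1024))  # True
```

The same pattern applies to the other operators, with the tighter bounds from their rows; for example, a depthwise-conv2d check would restrict kernel_w and kernel_h to [1, 3].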