DPU Configuration - 1.0 English

DPUCAHX8L for Convolutional Neural Networks Product Guide (PG366)

Document ID: PG366
Release Date: 2024-03-25
Version: 1.0 English

The DPU core provides several user-configurable parameters to optimize resource usage and customize features. Different configurations can be selected for DSP slice, LUT, block RAM, and UltraRAM usage based on the amount of available programmable logic resources. An option is also provided to set the operating frequency of the DPU cores implemented on the board. The deep neural network features and the associated parameters supported by the DPU are shown in the following table.

Table 1. Deep Neural Network Features and Parameters Supported by the DPU (channel_parallel = 32)
conv2d
    Kernel Sizes        kernel_w: [1, 16]; kernel_h: [1, 16]
    Strides             stride_w: [1, 4]; stride_h: [1, 4]
    Pad_left/Pad_right  [0, (kernel_w - 1) * dilation_w + 1]
    Pad_top/Pad_bottom  [0, (kernel_h - 1) * dilation_h + 1]
    In Size             kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048
    Out Size            output_channel <= 256 * channel_parallel
    Activation          ReLU, ReLU6
    Dilation            dilation * input_channel <= 256 * channel_parallel
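
As a worked example, the conv2d limits above can be checked before compilation. The following Python sketch is illustrative only; the function name and its signature are assumptions, not part of any Vitis AI API, and channel_parallel = 32 is taken from the table heading.

```python
import math

CHANNEL_PARALLEL = 32  # architecture parameter from Table 1


def conv2d_fits_dpu(kernel_w, kernel_h, stride_w, stride_h,
                    input_channel, output_channel, dilation=1):
    """Check a conv2d layer against the DPU limits in Table 1.

    Hypothetical helper for illustration; not a Vitis AI API.
    """
    checks = [
        1 <= kernel_w <= 16 and 1 <= kernel_h <= 16,           # Kernel Sizes
        1 <= stride_w <= 4 and 1 <= stride_h <= 4,             # Strides
        # In Size: weight footprint per output pixel must fit
        kernel_w * kernel_h
        * math.ceil(input_channel / CHANNEL_PARALLEL) <= 2048,
        output_channel <= 256 * CHANNEL_PARALLEL,              # Out Size
        dilation * input_channel <= 256 * CHANNEL_PARALLEL,    # Dilation
    ]
    return all(checks)


# A 3x3, stride-1 layer with 512 input and 512 output channels fits:
print(conv2d_fits_dpu(3, 3, 1, 1, 512, 512))  # True
```

Note that the In Size check rounds input_channel up to a multiple of channel_parallel before multiplying by the kernel area, matching the ceil() term in the table.
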

depthwise-conv2d
    Kernel Sizes        kernel_w: [1, 3]; kernel_h: [1, 3]
    Strides             stride_w: [1, 4]; stride_h: [1, 4]
    Pad_left/Pad_right  [1, kernel_w - 1]
    Pad_top/Pad_bottom  [1, kernel_h - 1]
    In Size             kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048
    Out Size            output_channel <= 256 * channel_parallel
    Activation          ReLU, ReLU6

transposed-conv2d
    Kernel Sizes        kernel_w: [1, 16]; kernel_h: [1, 16]
    Strides             stride_w: [1, 16]; stride_h: [1, 16]
    Pad_left/Pad_right  [1, kernel_w - 1]
    Pad_top/Pad_bottom  [1, kernel_h - 1]
    Out Size            output_channel <= 256 * channel_parallel
    Activation          ReLU, ReLU6

depthwise-transposed-conv2d
    Kernel Sizes / Strides  kernel_w/stride_w, kernel_h/stride_h: {3}
    Pad_left/Pad_right  [1, kernel_w - 1]
    Pad_top/Pad_bottom  [1, kernel_h - 1]
    Out Size            output_channel <= 256 * channel_parallel
    Activation          ReLU, ReLU6

average-pooling
    Kernel Sizes        kernel_w, kernel_h: {2, 3, 5, 7, 8}, with kernel_w == kernel_h
    Strides             stride_w: [1, 8]; stride_h: [1, 8]
    Pad_left/Pad_right  [1, kernel_w - 1]
    Pad_top/Pad_bottom  [1, kernel_h - 1]

elementwise-sum
    Input Channel       input_channel <= 256 * channel_parallel (that is, input_channel: [1, 8192])
    Activation          ReLU
    Type                Sum

Concat
    Limitation          Network-specific limitation related to the size of feature maps, quantization results, and compiler optimizations.

Fully Connected
    Input Channel       input_channel <= 16 * 16 * 32 (= 8192)
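
The fully connected limit above is a fixed product of three factors. A minimal Python sketch, with the constant and helper names assumed for illustration (not part of any Vitis AI API):

```python
# Maximum fully connected input channels, per Table 1: 16 * 16 * 32
FC_MAX_INPUT_CHANNEL = 16 * 16 * 32


def fc_fits_dpu(input_channel):
    """Hypothetical check: does a fully connected layer's input fit?"""
    return input_channel <= FC_MAX_INPUT_CHANNEL


print(FC_MAX_INPUT_CHANNEL)  # 8192
```

This is the same 8192-channel bound (256 * channel_parallel with channel_parallel = 32) that appears in the conv2d Out Size and elementwise-sum rows.
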