The DPU IP is delivered in a small number of fixed configurations, each packaged as a separate XO file. A configuration determines the number of processing engines and the kernel/filter sizes supported for element-wise and pooling operations.
The deep neural network features and the associated parameters supported by the DPU are shown in the following table:
| Features | Parameter | Description (channel_parallel=64, bank_depth=256) |
|---|---|---|
| Convolution | Kernel Sizes | W, H: [1, 16] |
| | Strides | W, H: [1, 4] |
| | Pad_left/Pad_right | [0, (kernel_w - 1) * dilation_w] |
| | Pad_top/Pad_bottom | [0, (kernel_h - 1) * dilation_h] |
| | Input Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid |
| | Dilation | dilation * input_channel <= 256 * channel_parallel |
| depthwise-conv2d 1 | Kernel Sizes | W, H: [1, 8] |
| | Strides | W, H: [1, 4] |
| | Pad_left/Pad_right | [0, (kernel_w - 1) * dilation_w + 1] |
| | Pad_top/Pad_bottom | [0, (kernel_h - 1) * dilation_h + 1] |
| | Input Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| | Dilation | dilation * input_channel <= 256 * channel_parallel |
| transposed-conv2d | Kernel Sizes, Strides | kernel_w/stride_w, kernel_h/stride_h: [1, 16] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid |
| depthwise-transposed-conv2d 1 | Kernel Sizes, Strides | kernel_w/stride_w, kernel_h/stride_h: [1, 8] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Output Size | output_channel <= 256 * channel_parallel |
| | Activation | ReLU, ReLU6 |
| max-pooling/average-pooling (MISC unit in PL) | Kernel Sizes | 2/4/6pe: W, H: [1, 8], W == H; 8pe_normal: W, H: {1, 2, 3, 7}, W == H |
| | Strides | W: [1, 8], H: [1, 8] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Activation | Not supported |
| elementwise-sum (MISC unit in PL) | Input channel | input_channel <= 256 * channel_parallel |
| | Activation | ReLU |
| max-pooling/average-pooling (MISC unit on AI Engine) | Kernel Sizes | 2/4/6/8pe: W, H: [1, 128] |
| | Strides | W: [1, 128], H: [1, 128] |
| | Pad_left/Pad_right | [1, kernel_w - 1] |
| | Pad_top/Pad_bottom | [1, kernel_h - 1] |
| | Activation | Not supported |
| elementwise-sum (MISC unit on AI Engine) | Input channel | input_channel <= 128 * channel_parallel |
| | Activation | ReLU, Hard-Sigmoid |
| elementwise-multi (MISC unit on AI Engine) | Input channel | input_channel <= 128 * channel_parallel |
| | Activation | ReLU, Hard-Sigmoid |
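
The convolution limits in the table can be checked mechanically before a model is compiled. The following Python sketch is illustrative only and is not part of any DPU tool: the names `Conv2dParams` and `conv2d_violations` are hypothetical, the defaults `channel_parallel=64` and `bank_depth=256` are taken from the table header, and the dilation rule is assumed to apply to both `dilation_w` and `dilation_h` because the table lists a single `dilation` term. Activation support is not checked.

```python
from dataclasses import dataclass
from math import ceil

# Values from the table header; other DPU configurations may differ.
CHANNEL_PARALLEL = 64
BANK_DEPTH = 256


@dataclass
class Conv2dParams:
    """Hypothetical container for the conv2d parameters named in the table."""
    kernel_w: int
    kernel_h: int
    stride_w: int
    stride_h: int
    input_channel: int
    output_channel: int
    dilation_w: int = 1
    dilation_h: int = 1
    pad_left: int = 0
    pad_right: int = 0
    pad_top: int = 0
    pad_bottom: int = 0


def conv2d_violations(p, channel_parallel=CHANNEL_PARALLEL, bank_depth=BANK_DEPTH):
    """Return a list of human-readable messages, one per violated limit."""
    problems = []
    channel_limit = 256 * channel_parallel

    if not (1 <= p.kernel_w <= 16 and 1 <= p.kernel_h <= 16):
        problems.append("kernel size outside [1, 16]")
    if not (1 <= p.stride_w <= 4 and 1 <= p.stride_h <= 4):
        problems.append("stride outside [1, 4]")

    # Padding upper bounds depend on kernel size and dilation.
    max_pad_w = (p.kernel_w - 1) * p.dilation_w
    max_pad_h = (p.kernel_h - 1) * p.dilation_h
    if not all(0 <= v <= max_pad_w for v in (p.pad_left, p.pad_right)):
        problems.append(f"pad_left/pad_right outside [0, {max_pad_w}]")
    if not all(0 <= v <= max_pad_h for v in (p.pad_top, p.pad_bottom)):
        problems.append(f"pad_top/pad_bottom outside [0, {max_pad_h}]")

    # Input size: kernel_w * kernel_h * ceil(input_channel / channel_parallel)
    depth_needed = p.kernel_w * p.kernel_h * ceil(p.input_channel / channel_parallel)
    if depth_needed > bank_depth:
        problems.append(f"input size {depth_needed} exceeds bank_depth {bank_depth}")

    if p.output_channel > channel_limit:
        problems.append(f"output_channel exceeds {channel_limit}")

    # Assumption: the single 'dilation' term is applied per axis.
    if max(p.dilation_w, p.dilation_h) * p.input_channel > channel_limit:
        problems.append(f"dilation * input_channel exceeds {channel_limit}")

    return problems


# A 3x3 convolution with 256 input channels needs 3 * 3 * ceil(256 / 64) = 36
# bank entries, which fits within bank_depth = 256.
layer = Conv2dParams(kernel_w=3, kernel_h=3, stride_w=1, stride_h=1,
                     input_channel=256, output_channel=512)
print(conv2d_violations(layer))  # []
```

Returning a list of messages rather than a single boolean makes it easy to report every limit a layer exceeds at once. For example, a 16x16 kernel with 128 input channels needs 16 * 16 * 2 = 512 bank entries and is therefore rejected by the input size rule.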
The available configurations are as follows: