The DPUCVDX8G provides user-configurable parameters to optimize resource usage and customize features. Different configurations can be selected for AI Engine, DSP slice, LUT, block RAM, and UltraRAM usage based on the amount of available programmable logic resources. There are also options for additional functions, such as channel augmentation, average pooling, and depthwise convolution, as well as an option to configure the number of batch handlers instantiated in a single DPUCVDX8G IP. The deep neural network features and the associated parameters supported by the DPUCVDX8G are shown in the following table.
A configuration file named arch.json is generated when the DPUCVDX8G is integrated using the Vitis™ accelerated flow. The arch.json file is used by the Vitis AI Compiler for model compilation. For more information on the Vitis AI Compiler, refer to the Vitis AI User Guide (UG1414). In the Vitis accelerated flow, the arch.json file is located at $TRD_HOME/vitis_prj/package_out/sd_card/arch.json.
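As an illustrative sketch (not from this guide), the snippet below reads the generated arch.json and passes it to the XIR-based Vitis AI compiler through its `--arch` option. The `quantized.xmodel` and `mynet` names are placeholders, and the exact arch.json schema (typically a fingerprint identifying the configured DPU) depends on the Vitis AI release.

```python
# Sketch: inspect the generated arch.json, then hand it to the Vitis AI
# compiler. Assumes an XIR flow (vai_c_xir); file and network names below
# are placeholders, not values from the product guide.
import json
import subprocess

arch = "vitis_prj/package_out/sd_card/arch.json"  # path from the Vitis flow
with open(arch) as f:
    print(json.load(f))  # typically a fingerprint for this DPU configuration

subprocess.run([
    "vai_c_xir",
    "--xmodel", "quantized.xmodel",  # quantizer output (placeholder name)
    "--arch", arch,
    "--output_dir", "compiled",
    "--net_name", "mynet",           # placeholder network name
], check=True)
```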
Features | Description | Range
---|---|---
Convolution 2D and 3D | Kernel Sizes | w, h, d: [1, 16]; w * h ≤ 64; w * h * d * input_channel ≤ 32768
 | Strides | w, h, d: [1, 8]
 | Padding | w: [0, kernel_w-1]; h: [0, kernel_h-1]; d: [0, kernel_d-1]
 | Input Size | Arbitrary
 | Input Channel | 1~256 * channel_parallel
 | Output Channel | 1~256 * channel_parallel
 | Activation | ReLU, ReLU6, LeakyReLU, PReLU, Hard Sigmoid, and Hard Swish
 | Dilation | dilation * input_channel ≤ 256 * channel_parallel && stride_w == 1 && stride_h == 1
 | Constraint* | kernel_w * kernel_h * kernel_d * ceil(input_channel / channel_parallel) ≤ bank_depth
Depthwise Convolution 2D and 3D | Kernel Sizes | w, h: [1, 256]; d: [1, 16]
 | Strides | w, h: [1, 256]; d = 1
 | Padding | w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]; d: [0, kernel_d-1]
 | Input Size | Arbitrary
 | Input Channel | 1~256 * channel_parallel
 | Output Channel | 1~256 * channel_parallel
 | Activation | ReLU, ReLU6, LeakyReLU, PReLU, Hard Sigmoid, and Hard Swish
 | Dilation | dilation * input_channel ≤ 256 * channel_parallel && stride_w == 1 && stride_h == 1
 | Constraint* | kernel_w * kernel_h * kernel_d * ceil(input_channel / channel_parallel) ≤ bank_depth
Transposed Convolution 2D and 3D | Kernel Sizes and Strides | kernel_w/stride_w: [1, 16]; kernel_h/stride_h: [1, 16]; kernel_d/stride_d: [1, 16]
 | Padding | w: [0, kernel_w-1]; h: [0, kernel_h-1]; d: [0, kernel_d-1]
 | Input Size | Arbitrary
 | Input Channel | 1~256 * channel_parallel
 | Output Channel | 1~256 * channel_parallel
 | Activation | ReLU, ReLU6, LeakyReLU, PReLU, Hard Sigmoid, and Hard Swish
Depthwise Transposed Convolution 2D and 3D | Kernel Sizes and Strides | kernel_w/stride_w: [1, 256]; kernel_h/stride_h: [1, 256]; kernel_d/stride_d: [1, 256]
 | Padding | w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]; d: [0, kernel_d-1]
 | Input Size | Arbitrary
 | Input Channel | 1~256 * channel_parallel
 | Output Channel | 1~256 * channel_parallel
 | Activation | ReLU, ReLU6, LeakyReLU, PReLU, Hard Sigmoid, and Hard Swish
Max Pooling | Kernel Sizes | w, h: [1, 256]
 | Strides | w, h: [1, 256]
 | Padding | w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]
Average Pooling | Kernel Sizes | w, h: [1, 256]
 | Strides | w, h: [1, 256]
 | Padding | w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]
Elementwise-Sum 2D and 3D | Input Channel | 1~256 * channel_parallel
 | Input Size | Arbitrary
 | Feature Map Number | 1~4
Elementwise-Multiply 2D and 3D | Input Channel | 1~256 * channel_parallel
 | Input Size | Arbitrary
 | Feature Map Number | 2
Concat | Output Channel | 1~256 * channel_parallel
Reorg | Strides | stride * stride * input_channel ≤ 256 * channel_parallel
Fully Connected (FC) | Input Channel | input_channel ≤ 2048 * channel_parallel
 | Output Channel | Arbitrary
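Several of these ranges scale with channel_parallel (for example, with channel_parallel = 32, the limit 256 * channel_parallel works out to 8192 channels), and the convolution rows add joint constraints on kernel size, dilation, and bank depth. The sketch below encodes the Convolution 2D and 3D rows of the table as a pre-compilation sanity check; the channel_parallel and bank_depth values are placeholders for configuration-dependent quantities, not defaults from this guide.

```python
# Sketch: check a 2D/3D convolution layer against the DPUCVDX8G ranges in the
# table above. channel_parallel and bank_depth depend on the selected
# configuration; the values below are illustrative placeholders only.
from math import ceil

CHANNEL_PARALLEL = 32  # assumption: read the real value from your configuration
BANK_DEPTH = 2048      # assumption: configuration dependent

def check_conv(kernel, stride, dilation, in_ch, out_ch,
               channel_parallel=CHANNEL_PARALLEL, bank_depth=BANK_DEPTH):
    """Return the list of violated Convolution 2D/3D constraints.

    kernel and stride are (w, h) or (w, h, d); a missing depth is treated as 1.
    """
    kw, kh, kd = (*kernel, 1)[:3]
    sw, sh, sd = (*stride, 1)[:3]
    errors = []
    if not all(1 <= k <= 16 for k in (kw, kh, kd)):
        errors.append("kernel w, h, d must be in [1, 16]")
    if kw * kh > 64:
        errors.append("kernel_w * kernel_h must be <= 64")
    if kw * kh * kd * in_ch > 32768:
        errors.append("kernel_w * kernel_h * kernel_d * input_channel must be <= 32768")
    if not all(1 <= s <= 8 for s in (sw, sh, sd)):
        errors.append("stride w, h, d must be in [1, 8]")
    ch_limit = 256 * channel_parallel
    if not 1 <= in_ch <= ch_limit:
        errors.append(f"input_channel must be in [1, {ch_limit}]")
    if not 1 <= out_ch <= ch_limit:
        errors.append(f"output_channel must be in [1, {ch_limit}]")
    if dilation > 1 and not (dilation * in_ch <= ch_limit and sw == 1 and sh == 1):
        errors.append("dilation requires dilation * input_channel <= "
                      "256 * channel_parallel and stride_w == stride_h == 1")
    if kw * kh * kd * ceil(in_ch / channel_parallel) > bank_depth:
        errors.append("kernel volume * ceil(input_channel / channel_parallel) "
                      "exceeds bank_depth")
    return errors

# Example: a 3x3 convolution, stride 2, 512 input / 256 output channels.
print(check_conv(kernel=(3, 3), stride=(2, 2), dilation=1, in_ch=512, out_ch=256))
```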