DPUCVDX8G Feature Support - 1.1 English

DPUCVDX8G for Versal ACAPs Product Guide (PG389)

Document ID: PG389
Release Date: 2022-01-20
Version: 1.1 English

The DPUCVDX8G provides user-configurable parameters to optimize resource usage and customize features. Different configurations can be selected for AI Engine, DSP slice, LUT, block RAM, and UltraRAM usage based on the amount of available programmable logic resources. There are also options for additional functions, such as channel augmentation, average pooling, and depthwise convolution. Furthermore, the number of batch handlers instantiated in a single DPUCVDX8G IP is configurable. The deep neural network features and the associated parameters supported by the DPUCVDX8G are shown in the following table.

A configuration file named arch.json is generated when the DPUCVDX8G is integrated using the Vitis™ accelerated flow. The arch.json file is used by the Vitis AI Compiler for model compilation. For more information on the Vitis AI Compiler, refer to the Vitis AI User Guide (UG1414). In the Vitis accelerated flow, the arch.json file is located at $TRD_HOME/vitis_prj/package_out/sd_card/arch.json.
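As a sketch of how arch.json feeds model compilation, the snippet below assembles (but does not execute) a vai_c_xir invocation. The flag names follow the Vitis AI compiler CLI as documented in UG1414; the model path, output directory, and network name are illustrative placeholders, and vai_c_xir itself ships with Vitis AI, not with this IP.

```python
# Sketch: build a Vitis AI compile command that points vai_c_xir at the
# arch.json produced for this DPUCVDX8G build. Paths are placeholders.

def build_compile_cmd(xmodel, arch_json, out_dir, net_name):
    """Return the argv list for a vai_c_xir compilation run."""
    return [
        "vai_c_xir",
        "--xmodel", xmodel,        # quantized XIR model to compile
        "--arch", arch_json,       # DPU arch description (arch.json)
        "--output_dir", out_dir,   # destination for the compiled .xmodel
        "--net_name", net_name,    # name of the compiled model
    ]

# Illustrative values only; substitute your own model and TRD locations.
cmd = build_compile_cmd(
    "quantized/model.xmodel",
    "vitis_prj/package_out/sd_card/arch.json",
    "compiled_out",
    "model_dpucvdx8g",
)
print(" ".join(cmd))
```

The command can then be run in the Vitis AI docker environment (for example via subprocess.run(cmd, check=True)).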

Table 1. Deep Neural Network Features and Parameters Supported by the DPUCVDX8G

Convolution (2D and 3D)
  Kernel Sizes: w, h, d: [1, 16]; w * h <= 64; w * h * d * input_channel <= 32768
  Strides: w, h, d: [1, 8]
  Padding: w: [0, kernel_w-1]; h: [0, kernel_h-1]; d: [0, kernel_d-1]
  Input Size: Arbitrary
  Input Channel: 1 ~ 256 * channel_parallel
  Output Channel: 1 ~ 256 * channel_parallel
  Activation: ReLU, ReLU6, LeakyReLU, PReLU, Hard Sigmoid, and Hard Swish
  Dilation: dilation * input_channel <= 256 * channel_parallel, with stride_w == 1 and stride_h == 1
  Constraint*: kernel_w * kernel_h * kernel_d * ceil(input_channel / channel_parallel) <= bank_depth

Depthwise Convolution (2D and 3D)
  Kernel Sizes: w, h: [1, 256]; d: [1, 16]
  Strides: w, h: [1, 256]; d = 1
  Padding: w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]; d: [0, kernel_d-1]
  Input Size: Arbitrary
  Input Channel: 1 ~ 256 * channel_parallel
  Output Channel: 1 ~ 256 * channel_parallel
  Activation: ReLU, ReLU6, LeakyReLU, PReLU, Hard Sigmoid, and Hard Swish
  Dilation: dilation * input_channel <= 256 * channel_parallel, with stride_w == 1 and stride_h == 1
  Constraint*: kernel_w * kernel_h * kernel_d * ceil(input_channel / channel_parallel) <= bank_depth

Transposed Convolution (2D and 3D)
  Kernel Sizes: kernel_w/stride_w: [1, 16]; kernel_h/stride_h: [1, 16]; kernel_d/stride_d: [1, 16]
  Strides: see Kernel Sizes (expressed as kernel/stride ratios)
  Padding: w: [0, kernel_w-1]; h: [0, kernel_h-1]; d: [0, kernel_d-1]
  Input Size: Arbitrary
  Input Channel: 1 ~ 256 * channel_parallel
  Output Channel: 1 ~ 256 * channel_parallel
  Activation: ReLU, ReLU6, LeakyReLU, PReLU, Hard Sigmoid, and Hard Swish

Depthwise Transposed Convolution (2D and 3D)
  Kernel Sizes: kernel_w/stride_w: [1, 256]; kernel_h/stride_h: [1, 256]; kernel_d/stride_d: [1, 256]
  Strides: see Kernel Sizes (expressed as kernel/stride ratios)
  Padding: w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]; d: [0, kernel_d-1]
  Input Size: Arbitrary
  Input Channel: 1 ~ 256 * channel_parallel
  Output Channel: 1 ~ 256 * channel_parallel
  Activation: ReLU, ReLU6, LeakyReLU, PReLU, Hard Sigmoid, and Hard Swish

Max Pooling
  Kernel Sizes: w, h: [1, 256]
  Strides: w, h: [1, 256]
  Padding: w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]

Average Pooling
  Kernel Sizes: w, h: [1, 256]
  Strides: w, h: [1, 256]
  Padding: w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]

Elementwise-Sum (2D and 3D)
  Input Channel: 1 ~ 256 * channel_parallel
  Input Size: Arbitrary
  Feature Map Number: 1 ~ 4

Elementwise-Multiply (2D and 3D)
  Input Channel: 1 ~ 256 * channel_parallel
  Input Size: Arbitrary
  Feature Map Number: 2

Concat
  Output Channel: 1 ~ 256 * channel_parallel

Reorg
  Strides: stride * stride * input_channel <= 256 * channel_parallel

Fully Connected (FC)
  Input Channel: input_channel <= 2048 * channel_parallel
  Output Channel: Arbitrary
  1. In the DPUCVDX8G, the channel_parallel parameter is 16.
  2. In some neural networks, the FC layer is connected to a Flatten layer. The Vitis AI compiler automatically combines Flatten+FC into a global CONV2D layer whose kernel size equals the input feature map size of the Flatten layer. In this case, the input feature map size cannot exceed the kernel size limits of convolution; otherwise, an error is generated during compilation.

    This limitation applies only in the Flatten+FC situation.

  3. The bank_depth is the on-chip weight buffer depth. In the DPUCVDX8G, the bank_depth is 16384.
  4. If Batch Normalization is quantized and can be transformed into an equivalent depthwise-conv2d, the compiler performs that transformation and searches for opportunities to map the Batch Normalization onto the DPU. Otherwise, the batch_norm is executed by the CPU.
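The convolution rows of Table 1, together with notes 1 through 3, can be captured as a quick pre-compilation sanity check. This is an illustrative sketch, not part of the Vitis AI toolchain: the function names are made up, channel_parallel = 16 and bank_depth = 16384 are taken from the notes, and a 2D convolution is modeled by setting kernel_d and stride_d to 1.

```python
import math

# Constants from Table 1 notes 1 and 3 (DPUCVDX8G values).
CHANNEL_PARALLEL = 16
BANK_DEPTH = 16384

def conv_is_supported(kw, kh, kd, sw, sh, sd, input_channel, dilation=1):
    """Check a standard (2D/3D) convolution against the Table 1 limits.

    For a 2D convolution, pass kd=1 and sd=1.
    """
    if not all(1 <= k <= 16 for k in (kw, kh, kd)):
        return False                                  # kernel range [1, 16]
    if kw * kh > 64:
        return False                                  # w * h <= 64
    if kw * kh * kd * input_channel > 32768:
        return False                                  # volume limit
    if not all(1 <= s <= 8 for s in (sw, sh, sd)):
        return False                                  # stride range [1, 8]
    if not 1 <= input_channel <= 256 * CHANNEL_PARALLEL:
        return False                                  # input channel range
    if dilation > 1:
        # Dilation requires unit w/h strides and a channel bound.
        if sw != 1 or sh != 1:
            return False
        if dilation * input_channel > 256 * CHANNEL_PARALLEL:
            return False
    # Weight-buffer constraint (note 3): bank_depth = 16384.
    if kw * kh * kd * math.ceil(input_channel / CHANNEL_PARALLEL) > BANK_DEPTH:
        return False
    return True

def flatten_fc_is_supported(fm_w, fm_h, input_channel):
    """Note 2: Flatten+FC compiles to a global CONV2D whose kernel size
    equals the Flatten layer's input feature map size, so that size must
    satisfy the convolution kernel limits."""
    return conv_is_supported(fm_w, fm_h, 1, 1, 1, 1, input_channel)
```

For example, a 7x7 global feature map feeding a Flatten+FC pair passes (7 * 7 <= 64), while a 14x14 map fails the w * h <= 64 kernel limit, which is exactly the compilation error case described in note 2.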