Currently Supported Operators
Xilinx is continuously improving the DPU IP and the compiler to support more operators with better performance. The following table lists typical operations and the configurations, such as kernel size and stride, that the DPU can support. If an operation's configuration exceeds these limitations, the operator is assigned to the CPU. Additionally, the operators that the DPU can support depend on the DPU type, ISA version, and configuration.
To make the DPU adaptable to a variety of FPGA devices, some DPUs are configurable. You can choose the necessary engines, adjust intrinsic parameters, and create your own DPU IP with the TRD projects, which means the limitations can differ significantly between configurations. You can find more information about how those options affect the limitations in PG338. Alternatively, it is recommended that you try compiling the model with your own DPU configuration; the compiler will report which operators are assigned to the CPU and why. The following table shows a specific configuration of each DPU architecture.
| Typical Operation Type in CNN | Parameters | DPUCZDX8G_ISA0_B4096_MAX_BG2 (ZCU102/104) | DPUCAHX8L_ISA0 (U280) | DPUCAHX8H_ISA2 (U50LV9E, U50LV10E, U280), DPUCAHX8H_ISA2_ELP2 (U50) | DPUCVDX8G_ISA0_B8192C32B3 (VCK190) | DPUCVDX8H_ISA0 (VCK5000) |
|---|---|---|---|---|---|---|
| Intrinsic Parameter | | channel_parallel: 16<br>bank_depth: 2048 | channel_parallel: 32<br>bank_depth: 4096 | channel_parallel: 16<br>bank_depth: 2048 | channel_parallel: 16<br>bank_depth: 16384 | channel_parallel: 64<br>bank_depth: 256 |
| conv2d | Kernel size | w, h: [1, 16] | w, h: [1, 16] | w, h: [1, 16] | w, h: [1, 16]<br>w * h <= 64 | w, h: [1, 16] |
| | Strides | w, h: [1, 8] | w, h: [1, 4] | w, h: [1, 4] | w, h: [1, 4] | w, h: [1, 4] |
| | Dilation | dilation * input_channel <= 256 * channel_parallel | | | | |
| | Paddings | pad_left, pad_right: [0, (kernel_w - 1) * dilation_w + 1]<br>pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h + 1] | | | | |
| | In Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth | | | | |
| | Out Size | output_channel <= 256 * channel_parallel | | | | |
| | Activation | ReLU, LeakyReLU, ReLU6 | ReLU, ReLU6 | ReLU, LeakyReLU, ReLU6 | ReLU, LeakyReLU, ReLU6 | ReLU, LeakyReLU |
| | Group* (Caffe) | group==1 | | | | |
| depthwise-conv2d | Kernel size | w, h: [1, 16] | w, h: [3] | Not supported | | |
| | Strides | w, h: [1, 8] | w, h: [1, 2] | | | |
| | Dilation | dilation * input_channel <= 256 * channel_parallel | | | | |
| | Paddings | pad_left, pad_right: [0, (kernel_w - 1) * dilation_w + 1]<br>pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h + 1] | | | | |
| | In Size | kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth | | | | |
| | Out Size | output_channel <= 256 * channel_parallel | | | | |
| | Activation | ReLU, ReLU6 | ReLU, ReLU6 | | | |
| | Group* (Caffe) | group==input_channel | | | | |
| transposed-conv2d | Kernel size | kernel_w/stride_w, kernel_h/stride_h: [1, 16] | | | | |
| | Strides | | | | | |
| | Paddings | pad_left, pad_right: [1, kernel_w-1]<br>pad_top, pad_bottom: [1, kernel_h-1] | | | | |
| | Out Size | output_channel <= 256 * channel_parallel | | | | |
| | Activation | ReLU, LeakyReLU, ReLU6 | ReLU, ReLU6 | ReLU, LeakyReLU, ReLU6 | ReLU, LeakyReLU, ReLU6 | ReLU, LeakyReLU |
| depthwise-transposed-conv2d | Kernel size | kernel_w/stride_w, kernel_h/stride_h: [1, 16] | kernel_w/stride_w, kernel_h/stride_h: [3] | Not supported | | |
| | Strides | | | | | |
| | Paddings | pad_left, pad_right: [1, kernel_w-1]<br>pad_top, pad_bottom: [1, kernel_h-1] | | | | |
| | Out Size | output_channel <= 256 * channel_parallel | | | | |
| | Activation | ReLU, ReLU6 | ReLU, ReLU6 | | | |
| max-pooling | Kernel size | w, h: [2, 8] | w, h: {2, 3, 5, 7, 8} | w, h: [1, 8] | w, h: [2, 8] | w, h: {1, 2, 3, 7} |
| | Strides | w, h: [1, 8] | w, h: [1, 8] | w, h: [1, 8] | w, h: [1, 4] | w, h: [1, 8] |
| | Paddings | pad_left, pad_right: [1, kernel_w-1]<br>pad_top, pad_bottom: [1, kernel_h-1] | | | | |
| | Activation | ReLU | Not supported | ReLU | ReLU | Not supported |
| average-pooling | Kernel size | w, h: [2, 8]<br>w==h | w, h: {2, 3, 5, 7, 8}<br>w==h | w, h: [1, 8]<br>w==h | w, h: [2, 8]<br>w==h | w, h: {1, 2, 3, 7}<br>w==h |
| | Strides | w, h: [1, 8] | w, h: [1, 8] | w, h: [1, 8] | w, h: [1, 4] | w, h: [1, 8] |
| | Paddings | pad_left, pad_right: [1, kernel_w-1]<br>pad_top, pad_bottom: [1, kernel_h-1] | | | | |
| | Activation | ReLU | Not supported | ReLU | ReLU | Not supported |
| eltwise-sum | Input Channel | input_channel <= 256 * channel_parallel | | | | |
| | Activation | ReLU | ReLU | ReLU | ReLU | ReLU |
| concat | | Network-specific limitation, which relates to the size of feature maps, quantization results, and compiler optimizations. | | | | |
| reorg | Strides | reverse==false: stride ^ 2 * input_channel <= 256 * channel_parallel<br>reverse==true: input_channel <= 256 * channel_parallel | | | | |
| pad | In Size | input_channel <= 256 * channel_parallel | | | | |
| | Mode | "SYMMETRIC" ("CONSTANT" padding is fused into adjacent operators during compiler optimization) | | | | |
| global pooling | | Global pooling is processed as general pooling with kernel size equal to the input tensor size. | | | | |
| InnerProduct, Fully Connected, Matmul | | These operators are transformed into conv2d operators with kernel size 1x1. | | | | |
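As a reading aid only (not part of the Vitis AI toolchain), the sketch below applies the conv2d limits from the DPUCZDX8G_ISA0_B4096_MAX_BG2 column above (channel_parallel = 16, bank_depth = 2048) to a hypothetical layer configuration. The function name and parameters are illustrative; the actual placement decision is always made by the compiler.

```python
import math

# Illustrative values taken from the DPUCZDX8G_ISA0_B4096_MAX_BG2 column above.
CHANNEL_PARALLEL = 16
BANK_DEPTH = 2048

def conv2d_fits_dpu(kernel_w, kernel_h, stride_w, stride_h,
                    dilation, input_channel, output_channel):
    """Return True if a conv2d layer satisfies the limits listed in the table
    for this particular DPU configuration."""
    checks = [
        1 <= kernel_w <= 16 and 1 <= kernel_h <= 16,            # Kernel size
        1 <= stride_w <= 8 and 1 <= stride_h <= 8,              # Strides
        dilation * input_channel <= 256 * CHANNEL_PARALLEL,     # Dilation
        kernel_w * kernel_h * math.ceil(input_channel / CHANNEL_PARALLEL)
            <= BANK_DEPTH,                                      # In Size
        output_channel <= 256 * CHANNEL_PARALLEL,               # Out Size
    ]
    return all(checks)

# A 3x3, stride-1 convolution with 512 input and 512 output channels fits;
# an oversized kernel would be assigned to the CPU instead.
print(conv2d_fits_dpu(3, 3, 1, 1, 1, 512, 512))    # True
print(conv2d_fits_dpu(17, 17, 1, 1, 1, 512, 512))  # False
```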
The following operators are defined as primitives in the different deep learning frameworks. The compiler can automatically parse these operators, transform them into the XIR format, and distribute them to the DPU or CPU. These operators are only partially supported by the tools; they are listed here for your reference.
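If you want to check the partitioning result of a compiled model yourself, the sketch below assumes the xir Python package shipped with Vitis AI and a hypothetical compiled file named resnet50.xmodel; it walks the subgraphs of the compiled graph and prints the device each one was assigned to. The exact API surface can differ between Vitis AI releases, so treat this as an illustration rather than a reference.

```python
import xir

# Hypothetical path to a compiled model; replace it with your own .xmodel file.
XMODEL = "resnet50.xmodel"

graph = xir.Graph.deserialize(XMODEL)
root = graph.get_root_subgraph()

# Walk the child subgraphs in topological order and report where each one was placed.
for subgraph in root.toposort_child_subgraph():
    device = subgraph.get_attr("device") if subgraph.has_attr("device") else "unknown"
    print(f"{subgraph.get_name():40s} -> {device}")
```

Subgraphs reported as CPU correspond to operators that fall outside the limits in the table above or that the DPU does not implement.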