Currently Supported Operators

Table 1. Currently Supported Operators
Typical Operation Type in CNN	Parameters	DPUCZDX8G_ISA1_B4096 ³(ZCU102, ZCU104)	DPUCAHX8L_ISA0 (U50, U50LV, U280)	DPUCVDX8G_ISA3_C32B3 ⁴(VCK190)	DPUCAHX8H_ISA2_DWC ¹(U50, U55C, U50LV, U280)	DPUCADF8H_ISA0 (U200, U250)	DPUCVDX8H_ISA1_F2W4_4PE² (VCK5000)	DPUCV2DX8G_ISA1_C20B1 ⁵(VEK280/V70)
Intrinsic Parameter		channel_parallel: 16 bank_depth: 2048 bank_num: 8	channel_parallel: 32 bank_depth: 4096	channel_parallel: 16 bank_depth: 8192 bank_num: 8	channel_parallel: 16 bank_depth: 2048	channel_parallel: 16 bank_depth: 8192	channel_parallel: 64 bank_depth: 2048	channel_parallel: 32 bank_depth: 65528 bank_num: 1
conv2d	Kernel size	w, h: [1, 16]	w, h: [1, 16]	w, h: [1, 16] w * h * ceil(input_channel/2048) <= 64	w, h: [1, 16]	w, h: [1, 16]	w, h: [1, 16]	w, h: [1, 16] 256 * h * w <= 13760
	Strides	w, h: [1, 8]	w, h: [1, 4]	w, h: [1, 8]	w, h: [1, 4]	w, h: [1, 8]	w, h: [1, 4]	w, h: [1, 8]
	Dilation	dilation * input_channel <= 256 * channel_parallel
	Paddings	pad_left, pad_right: [0, (kernel_w - 1) * dilation_w]
	Paddings	pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h]
	In Size	kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth						kernel_w * kernel_h * ceil(input_channel / channel_parallel) * ceil(channel_parallel / 4) + 4 <= bank_depth
	In Size	input_channel <= 256 * channel_parallel		input_channel <= 256 * channel_parallel				input_channel <= 256 * channel_parallel
	Out Size	output_channel <= 256 * channel_parallel
	Activation	ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid	ReLU, ReLU6	ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid	ReLU, LeakyReLU, ReLU6	ReLU, LeakyReLU	ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid	ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid
	Group* (Caffe)	group==1
depthwise-conv2d	Kernel size	w, h: [1, 256]	w, h: [3]	w, h: [1, 256]	w, h: {1, 2, 3, 5, 7}	Not supported	w, h: [1, 8]	w, h: [1, 256] h * w <= 431
	Strides	w, h: [1, 256]	w, h: [1, 2]	w, h: [1, 256]	w, h: [1, 4]		w, h : [1, 4]	w, h: [1, 256]
	dilation	dilation * input_channel <= 256 * channel_parallel					dilation * input_channel <= 256 * channel_parallel
	Paddings	pad_left, pad_right: [0, min((kernel_w - 1), 15) * dilation_w]	pad_left, pad_right: [0, (kernel_w - 1) * dilation_w]	pad_left, pad_right: [0, min((kernel_w-1), 15) * dilation_w]	pad_left, pad_right: [0, (kernel_w - 1) * dilation_w]		pad_left, pad_right: [0, (kernel_w - 1) * dilation_w]	pad_left, pad_right: [0, min((kernel_w-1), 15) * dilation_w]
	Paddings	pad_top, pad_bottom: [0, min((kernel_h - 1), 15) * dilation_h]	pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h]	pad_top, pad_bottom: [0, min((kernel_h-1), 15) * dilation_h]	pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h]		pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h]	pad_top, pad_bottom: [0, min((kernel_h-1), 15) * dilation_h]
	In Size	kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth					kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth	(6 * stride_w + kernel_w) * kernel_h + 4 <= 512
	Out Size	output_channel <= 256 * channel_parallel					output_channel <= 256 * channel_parallel
	Activation	ReLU, ReLU6, LeakyReLU⁶, Hard-Swish, Hard-Sigmoid	ReLU, ReLU6	ReLU, ReLU6, LeakyReLU⁷, Hard-Swish, Hard-Sigmoid	ReLU, ReLU6		ReLU, ReLU6	ReLU, ReLU6, LeakyReLU, Hard-Swish, Hard-Sigmoid
	Group* (Caffe)	group==input_channel					group==input_channel
transposed-conv2d	Kernel size	kernel_w/stride_w, kernel_h/stride_h: [1, 16]
	Strides	kernel_w/stride_w, kernel_h/stride_h: [1, 16]
	Paddings	pad_left, pad_right: [0, kernel_w-1]
	Paddings	pad_top, pad_bottom: [0, kernel_h-1]
	Out Size	output_channel <= 256 * channel_parallel
	Activation	ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid	ReLU, ReLU6	ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid	ReLU, LeakyReLU, ReLU6	ReLU, LeakyReLU	ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid	ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid
depthwise-transposed-conv2d	Kernel size	kernel_w/stride_w, kernel_h/stride_h: [1, 256]	kernel_w/stride_w, kernel_h/stride_h: [3]	kernel_w/stride_w, kernel_h/stride_h: [1, 256]	kernel_w/stride_w, kernel_h/stride_h: {1,2, 3, 5, 7}	Not supported	kernel_w/stride_w, kernel_h/stride_h: [1, 8]	kernel_w/stride_w, kernel_h/stride_h: [1, 256]
	Strides	kernel_w/stride_w, kernel_h/stride_h: [1, 256]	kernel_w/stride_w, kernel_h/stride_h: [3]	kernel_w/stride_w, kernel_h/stride_h: [1, 256]	kernel_w/stride_w, kernel_h/stride_h: {1,2, 3, 5, 7}		kernel_w/stride_w, kernel_h/stride_h: [1, 8]	kernel_w/stride_w, kernel_h/stride_h: [1, 256]
	Paddings	pad_left, pad_right: [0, min((kernel_w-1), 15)]	pad_left, pad_right: [1, kernel_w-1]	pad_left, pad_right: [0, min((kernel_w-1),15)]	pad_left, pad_right: [1, kernel_w-1]		pad_left, pad_right: [1, kernel_w-1]	pad_left, pad_right: [0, min((kernel_w-1),15)]
	Paddings	pad_top, pad_bottom: [0, min((kernel_h-1), 15)]	pad_top, pad_bottom: [1, kernel_h-1]	pad_top, pad_bottom: [0, min((kernel_h-1), 15)]	pad_top, pad_bottom: [1, kernel_h-1]		pad_top, pad_bottom: [1, kernel_h-1]	pad_top, pad_bottom: [0, min((kernel_h-1), 15)]
	Out Size	output_channel <= 256 * channel_parallel					output_channel <= 256 * channel_parallel
	Activation	ReLU, ReLU6, LeakyReLU⁶, Hard-Swish, Hard-Sigmoid	ReLU, ReLU6	ReLU, ReLU6, LeakyReLU⁷, Hard-Swish, Hard-Sigmoid	ReLU, ReLU6		ReLU, ReLU6	ReLU, ReLU6, LeakyReLU, Hard-Swish, Hard-Sigmoid
max-pooling	Kernel size	w, h: [1, 256] ceil(h/bank_num) * w <= bank_depth	w, h: {2, 3, 5, 7, 8}	w, h: [1, 256] ceil(h/bank_num) * w <= bank_depth	w, h: [1, 8]	w, h: [1, 16]	w, h: [1, 128]	w, h: [1, 256] h * w <= bank_depth
	Strides	w, h: [1, 256]	w, h: [1, 8]	w, h: [1, 256]	w, h: [1, 8]	w, h: [1, 8]	w, h: [1, 128]	w, h: [1, 256]
	Paddings	pad_left, pad_right: [0, min((kernel_w-1), 15)]	pad_left, pad_right: [1, kernel_w-1]	pad_left, pad_right: [0, min((kernel_w-1), 15)]	pad_left, pad_right: [1, kernel_w-1]			pad_left, pad_right: [0, min((kernel_w-1), 15)]
	Paddings	pad_top, pad_bottom: [0, min((kernel_h-1), 15)]	pad_top, pad_bottom: [1, kernel_h-1]	pad_top, pad_bottom: [0, min((kernel_h-1), 15)]	pad_top, pad_bottom: [1, kernel_h-1]			pad_top, pad_bottom: [0, min((kernel_h-1), 15)]
	Activation	ReLU, ReLU6	not supported	ReLU, ReLU6	not supported	ReLU	not supported	ReLU, ReLU6
average-pooling	Kernel size	w, h: [1, 256] ceil(h/bank_num) * w <= bank_depth	w, h: {2, 3, 5, 7, 8} w==h	w, h: [1, 256] ceil(h/bank_num) * w <= bank_depth	w, h: [1, 8] w==h	w, h: [1, 16]	w, h: [1, 128] w==h	w, h: [1, 256] h * w <= bank_depth
	Strides	w, h: [1, 256]	w, h: [1, 8]	w, h: [1, 256]	w, h: [1, 8]	w, h: [1, 8]	w, h: [1, 128]	w, h: [1, 256]
	Paddings	pad_left, pad_right: [0, min((kernel_w-1), 15)]	pad_left, pad_right: [1, kernel_w-1]	pad_left, pad_right: [0, min((kernel_w-1), 15)]	pad_left, pad_right: [1, kernel_w-1]			pad_left, pad_right: [0, min((kernel_w-1), 15)]
	Paddings	pad_top, pad_bottom: [0, min((kernel_h-1), 15)]	pad_top, pad_bottom: [1, kernel_h-1]	pad_top, pad_bottom: [0, min((kernel_h-1), 15)]	pad_top, pad_bottom: [1, kernel_h-1]			pad_top, pad_bottom: [0, min((kernel_h-1), 15)]
	Activation	ReLU, ReLU6	not supported	ReLU, ReLU6	not supported	ReLU	not supported	ReLU, ReLU6
eltwise	type	sum, prod	sum	sum, prod	sum	sum	sum, prod	2-input sum, prod
	Input Channel	input_channel <= 256 * channel_parallel
	Activation	ReLU	ReLU	ReLU	ReLU	ReLU	ReLU, Hard-Sigmoid	ReLU
concat	Network-specific limitation, which relates to the size of feature maps, quantization results and compiler optimizations.
reorg	Strides	reverse==false : stride ^ 2 * input_channel <= 256 * channel_parallel reverse==true : input_channel <= 256 * channel_parallel
pad	In Size	input_channel <= 256 * channel_parallel
pad	Mode	"SYMMETRIC" ("CONSTANT" pad(value=0) would be fused into adjacent operators during compiler optimization process)					"SYMMETRIC", "CONSTANT" (all padding value are identical)	"SYMMETRIC" ("CONSTANT" pad(value=0) would be fused into adjacent operators during compiler optimization process)
global pooling	Global pooling will be processed as general pooling with kernel size equal to input tensor size.
InnerProduct, Fully Connected, Matmul	These ops will be transformed into conv2d op
resize	scale	NEAREST: ceil(scale/bank_num) * scale * ceil(input_channel/channel_parallel) <= bank_depth BILINEAR: only for 4-D feature maps. This would be transformed into a pad and depthwise-transposed-conv2d. TRILINEAR: only for 5-D feature maps. This would be transformed into a pad and transposed-conv3d.
resize	mode	NEAREST, BILINEAR	NEAREST, BILINEAR	NEAREST, BILINEAR, TRILINEAR	NEAREST, BILINEAR	NEAREST, BILINEAR	NEAREST, BILINEAR	NEAREST, BILINEAR
conv3d	kernel size	Not supported	Not supported	w, h, d: [1, 16] w * h * ceil(ceil(input_channel/16) * 16 * d / 2048) <= 64	Not supported	Not supported	Not supported	Not supported
	strides			w, h, d: [1, 8]
	paddings			pad_left, pad_right: [0, kernel_w-1] pad_top, pad_bottom: [0, kernel_h-1] pad_front, pad_back: [0, kernel_d-1]
	In size			kernel_w * kernel_h * kernel_d * ceil(input_channel/channel_parallel) <= bank_depth, input_channel <= 256 * channel_parallel
	Out size			output_channel <= 256 * channel_parallel
	Activation			ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid
depthwise-conv3d	kernel size	Not supported	Not supported	w, h: [1, 256] d: [1, 16]	Not supported	Not supported	Not supported	Not supported
	strides			w, h: [1, 256] d=1
	paddings			pad_left, pad_right: [0, min((kernel_w-1), 15)] pad_top, pad_bottom: [0, min((kernel_h-1), 15)] pad_front, pad_back: [0, min((kernel_d-1), 15)]
	In size			kernel_w * kernel_h * kernel_d * ceil(input_channel/channel_parallel) <= bank_depth
	Out size			output_channel <= 256 * channel_parallel
	Activation			ReLU, ReLU6
transposed-conv3d	kernel size	Not supported	Not supported	kernel_w/stride_w, kernel_h/stride_h, kernel_d/stride_d: [1, 16]	Not supported	Not supported	Not supported	Not supported
	strides
	paddings			pad_left, pad_right: [0, kernel_w-1] pad_top, pad_bottom: [0, kernel_h-1] pad_front, pad_back: [0, kernel_d-1]
	Out size			output_channel <= 256 * channel_parallel
	Activation			ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid
depthwise-transposed-conv3d	kernel size	Not supported	Not supported	kernel_w/stride_w, kernel_h/stride_h, kernel_d/stride_d: [1, 16]	Not supported	Not supported	Not supported	Not supported
	strides
	paddings			pad_left, pad_right: [0, min((kernel_w-1), 15)] pad_top, pad_bottom: [0, min((kernel_h-1), 15)] pad_front, pad_back: [0, min((kernel_d-1), 15)]
	Out size			output_channel <= 256 * channel_parallel
	Activation			ReLU, ReLU6
Strided_slice	Stride	Stride_batch = 1 Stride_channel = 1
correlation1d_elemwise	input size	input_channel <= 256 * channel_parrallel	Not supported	input_channel <= 256 * channel_parrallel	Not supported	Not supported	Not supported	Not supported
correlation2d_elemwise	input size	input_channel <= 256 * channel_parrallel	Not supported	input_channel <= 256 * channel_parrallel	Not supported	Not supported	Not supported	Not supported
argmax	axis	axis = input_channel	Not supported	axis = input_channel	Not supported	Not supported	Not supported	axis = input_channel
argmax	input size	input_channel < =128	Not supported	input_channel < =128	Not supported	Not supported	Not supported	input_channel < =128
reduction max	axis	axis = input_channel	Not supported	axis = input_channel	Not supported	Not supported	Not supported	axis = input_channel
reduction max	input size	input_channel < 2 ^ 12	Not supported	input_channel < 2 ^ 12	Not supported	Not supported	Not supported	input_channel < 2 ^ 12
cost_volume	input size	input_channel <= 256 * channel_parallel	Not supported	input_channel <= 256 * channel_parallel	Not supported	Not supported	Not supported	Not supported
transpose
For DPUCAHX8H, only list DPUCAHX8H_ISA2_DWC here. For more IP configurations, see DPUCAHX8H for Convolutional Neural Networks Product Guide (PG367) For DPUCVDX8H, only list DPUCVDX8H_ISA1_F2W4_4PE here. For more IP configurations, see DPUCVDX8H for Convolutional Neural Networks LogiCORE IP (PG403) For DPUCZDX8G, only list DPUCZDX8G_ISA1_B4096 here. For more IP Configurations, see DPUCZDX8G for Zynq UltraScale+ MPSoCs (PG338) For DPUCVDX8G, only list DPUCVDX8G_ISA3_C32B3 here. For more IP Configurations, see DPUCVDX8G for Versal Adaptive SoCs Product Guide (PG389) For DPUCV2DX8G, only list DPUCV2DX8G_ISA1_C20B1 here. For more IP Configurations, see DPUCV2DX8G for Versal Adaptive SoCs Product Guide(PG425) For DPUCZDX8G, the activation LeakyReLU for depthwise-conv like operators is not enabled by default. About how to enable this activation, please refer to DPUCZDX8G for Zynq UltraScale+ MPSoCs (PG338) For DPUCVDX8G, the activation LeakyReLU for depthwise-conv like operators is not enabled by default. About how to enable this activation, please refer to DPUCVDX8G for Versal Adaptive SoCs Product Guide (PG389)

Currently Supported Operators - 3.5 English

Vitis AI User Guide (UG1414)