DPU Configuration - 1.0 English

DPUCAHX8L for Convolutional Neural Networks Product Guide (PG366)

Document ID: PG366
Release Date: 2024-03-25
Version: 1.0 English

The DPU core provides several user-configurable parameters to optimize resource usage and customize features. Different configurations can be selected for DSP slice, LUT, block RAM, and UltraRAM usage based on the amount of available programmable logic resources. An option is also provided to set the operating frequency of the DPU cores implemented on the board. The deep neural network features and the associated parameters supported by the DPU are shown in the following table.

Table 1. Deep Neural Network Features and Parameters Supported by the DPU (channel_parallel = 32)
conv2d
    Kernel Sizes        kernel_w: [1, 16]; kernel_h: [1, 16]
    Strides             stride_w: [1, 4]; stride_h: [1, 4]
    Pad_left/Pad_right  [0, (kernel_w - 1) * dilation_w + 1]
    Pad_top/Pad_bottom  [0, (kernel_h - 1) * dilation_h + 1]
    In Size             kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048
    Out Size            output_channel <= 256 * channel_parallel
    Activation          ReLU, ReLU6
    Dilation            dilation * input_channel <= 256 * channel_parallel
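
As a worked example, the conv2d limits above can be checked before compilation. The following Python sketch is illustrative only; the function name and its signature are assumptions, not part of any Vitis AI API, and channel_parallel = 32 is taken from the table heading.

```python
import math

CHANNEL_PARALLEL = 32  # architecture parameter from Table 1


def conv2d_fits_dpu(kernel_w, kernel_h, stride_w, stride_h,
                    input_channel, output_channel, dilation=1):
    """Check a conv2d layer against the DPU limits in Table 1.

    Hypothetical helper for illustration; not a Vitis AI API.
    """
    checks = [
        1 <= kernel_w <= 16 and 1 <= kernel_h <= 16,           # Kernel Sizes
        1 <= stride_w <= 4 and 1 <= stride_h <= 4,             # Strides
        # In Size: weight footprint per output pixel must fit
        kernel_w * kernel_h
        * math.ceil(input_channel / CHANNEL_PARALLEL) <= 2048,
        output_channel <= 256 * CHANNEL_PARALLEL,              # Out Size
        dilation * input_channel <= 256 * CHANNEL_PARALLEL,    # Dilation
    ]
    return all(checks)


# A 3x3, stride-1 layer with 512 input and 512 output channels fits:
print(conv2d_fits_dpu(3, 3, 1, 1, 512, 512))  # True
```

Note that the In Size check rounds input_channel up to a multiple of channel_parallel before multiplying by the kernel area, matching the ceil() term in the table.
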

depthwise-conv2d
    Kernel Sizes        kernel_w: [1, 3]; kernel_h: [1, 3]
    Strides             stride_w: [1, 4]; stride_h: [1, 4]
    Pad_left/Pad_right  [1, kernel_w - 1]
    Pad_top/Pad_bottom  [1, kernel_h - 1]
    In Size             kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048
    Out Size            output_channel <= 256 * channel_parallel
    Activation          ReLU, ReLU6

transposed-conv2d
    Kernel Sizes        kernel_w: [1, 16]; kernel_h: [1, 16]
    Strides             stride_w: [1, 16]; stride_h: [1, 16]
    Pad_left/Pad_right  [1, kernel_w - 1]
    Pad_top/Pad_bottom  [1, kernel_h - 1]
    Out Size            output_channel <= 256 * channel_parallel
    Activation          ReLU, ReLU6

depthwise-transposed-conv2d
    Kernel Sizes / Strides  kernel_w/stride_w, kernel_h/stride_h: {3}
    Pad_left/Pad_right  [1, kernel_w - 1]
    Pad_top/Pad_bottom  [1, kernel_h - 1]
    Out Size            output_channel <= 256 * channel_parallel
    Activation          ReLU, ReLU6

average-pooling
    Kernel Sizes        kernel_w, kernel_h: {2, 3, 5, 7, 8}, with kernel_w == kernel_h
    Strides             stride_w: [1, 8]; stride_h: [1, 8]
    Pad_left/Pad_right  [1, kernel_w - 1]
    Pad_top/Pad_bottom  [1, kernel_h - 1]

elementwise-sum
    Input Channel       input_channel <= 256 * channel_parallel (that is, input_channel: [1, 8192])
    Activation          ReLU
    Type                Sum

Concat
    Limitation          Network-specific limitation related to the size of feature maps, quantization results, and compiler optimizations.

Fully Connected
    Input Channel       input_channel <= 16 * 16 * 32 (= 8192)
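
The fully connected limit above is a fixed product of three factors. A minimal Python sketch, with the constant and helper names assumed for illustration (not part of any Vitis AI API):

```python
# Maximum fully connected input channels, per Table 1: 16 * 16 * 32
FC_MAX_INPUT_CHANNEL = 16 * 16 * 32


def fc_fits_dpu(input_channel):
    """Hypothetical check: does a fully connected layer's input fit?"""
    return input_channel <= FC_MAX_INPUT_CHANNEL


print(FC_MAX_INPUT_CHANNEL)  # 8192
```

This is the same 8192-channel bound (256 * channel_parallel with channel_parallel = 32) that appears in the conv2d Out Size and elementwise-sum rows.
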