Preprocessing for Deep Neural Networks - 2023.2 English

Vitis Libraries

Input images are typically preprocessed before being fed to a deep neural network (DNN) for inference. The preProcess function provides several modes, each performing a different preprocessing operation. The preprocessing function \(f(x)\) can be described using the equations below.


The preProcess function supports the operating modes listed in the following table:

Op Code   Operation   Description
0         image165    Mean subtraction
1         image166    Scale and clip
2         image167    Clipping
3         image168    Scale and bias
4         image169    Scale and bias with mean subtraction
5         image170    Complete operation
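
As an illustration of how these modes could compose, the following is a minimal host-side sketch in plain C++. The operation equations are not reproduced in this text, so the forms used here are assumptions: mean subtraction is taken as x - α, scale and bias as β·x + γ, and clipping as limiting the result to the range [th2, th1]. The struct ChannelParams and the helper names are hypothetical and are not part of the library.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-channel parameters. The fields mirror the alpha, beta and
// gamma values mentioned for `params`, but this struct is not a library type.
struct ChannelParams {
    float alpha; // assumed: mean value to subtract
    float beta;  // assumed: scale factor
    float gamma; // assumed: bias added after scaling
};

// Assumed building blocks (illustrative only, not the library's equations).
inline float meanSubtract(float x, float alpha) {
    return x - alpha;
}

inline float scaleAndBias(float x, float beta, float gamma) {
    return beta * x + gamma;
}

inline float clipTo(float x, float lower, float upper) {
    assert(lower <= upper);
    return std::min(std::max(x, lower), upper);
}

// Assumed "complete operation": mean subtraction, then scale and bias,
// then clipping to [th2, th1] (th1 = upper threshold, th2 = lower threshold).
inline float completeOp(float x, const ChannelParams& p, int th1, int th2) {
    const float y = scaleAndBias(meanSubtract(x, p.alpha), p.beta, p.gamma);
    return clipTo(y, static_cast<float>(th2), static_cast<float>(th1));
}
```

Under these assumptions, an input of 200 with α = 104, β = 0.5, γ = 2, th1 = 255 and th2 = 0 gives (200 - 104) * 0.5 + 2 = 50. A reference model of this kind can be useful for checking hardware results on the host, provided the equations are first confirmed against the library documentation.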

API Syntax

template <int INPUT_PTR_WIDTH_T, int OUTPUT_PTR_WIDTH_T, int T_CHANNELS_T, int CPW_T,
          int ROWS_T, int COLS_T, int NPC_T, bool PACK_MODE_T,
          int WX_T, int WA_T, int WB_T, int WY_T, int WO_T,
          int FX_T, int FA_T, int FB_T, int FY_T, int FO_T,
          bool SIGNED_IN_T, int OPMODE_T>
void preProcess(hls::stream<ap_uint<INPUT_PTR_WIDTH_T> >& srcStrm,
                ap_uint<OUTPUT_PTR_WIDTH_T>* out,
                float params[3 * T_CHANNELS_T],
                int rows, int cols, int th1, int th2)

The following table describes the template and the function parameters.

Table 592. preProcess Parameter Description
Parameter            Description
srcStrm              Input image stream
out                  Output pointer
params               Array containing α, β and γ values
rows                 Input image height
cols                 Input image width
th1                  Upper threshold
th2                  Lower threshold
INPUT_PTR_WIDTH_T    Width of input pointer
OUTPUT_PTR_WIDTH_T   Width of output pointer
T_CHANNELS_T         Total number of channels
CPW_T                Channels packed per DDR word
ROWS_T               Maximum height of image
COLS_T               Maximum width of image
NPC_T                Number of pixels processed per clock
PACK_MODE_T          Data format (pixel packed or channel packed)
WX_T                 x bit width
WA_T                 alpha bit width
WB_T                 beta bit width
WY_T                 gamma bit width
WO_T                 Output bit width
FX_T                 Number of integer bits for x
FA_T                 Number of integer bits for alpha
FB_T                 Number of integer bits for beta
FY_T                 Number of integer bits for gamma
FO_T                 Number of integer bits for output
SIGNED_IN_T          Signed input flag
OPMODE_T             Operating mode
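
The bit-width parameters come in pairs: a total width (for example, WX_T) and a number of integer bits (for example, FX_T). The sketch below shows, in plain C++, what such a pair implies for the representable values. It assumes that the pairs follow the signed ap_fixed<W, I> convention, where W is the total width and I counts the integer bits including the sign bit. The helper quantizeFixed is hypothetical, and its truncation and saturation behavior is a choice made for this sketch; the rounding and overflow modes used inside preProcess are not stated here.

```cpp
#include <cassert>
#include <cmath>

// Quantizes `value` to a signed fixed-point grid with W total bits, of which
// I are integer bits (sign bit included), so W - I bits are fractional.
// Sketch assumptions: truncation toward negative infinity and saturation
// on overflow. This is not a library function.
inline double quantizeFixed(double value, int W, int I) {
    assert(W > 0 && W <= 32 && I <= W);
    const int fracBits = W - I;
    const double scale = std::ldexp(1.0, fracBits);         // 2^fracBits
    const double maxRaw = std::ldexp(1.0, W - 1) - 1.0;     // largest raw code
    const double minRaw = -std::ldexp(1.0, W - 1);          // smallest raw code
    double raw = std::floor(value * scale);                 // truncate
    raw = std::fmin(std::fmax(raw, minRaw), maxRaw);        // saturate
    return raw / scale;
}
```

With W = 8 and I = 4, the step size is 1/16 and the representable range is -8 to 7.9375, so 0.3 becomes 0.25 and 100 saturates to 7.9375. Checking α, β and γ values against their chosen widths in this way can help reveal precision loss before synthesis.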

Resource Utilization

The following table summarizes the resource utilization of preProcess for NPC_T=8, CPW_T=3, and OPMODE_T=0, with a maximum input image size of 1280x720 pixels. The results were obtained after synthesis with Vitis 2019.2 for the Xilinx xcu200-fsgd2104-2-e FPGA at 300 MHz. The latency for this configuration is 0.7 ms.

Operating Mode   Operating Frequency (MHz)   Utilization Estimate
8 pixel          300                         0   2   7554   11127   2155