Input images are typically preprocessed before being fed to deep neural networks (DNNs) for inference. The preProcess function provides several modes that perform different preprocessing operations. The preprocessing function \(f(x)\) is parameterized by the values α, β and γ (passed in params) and by the thresholds th1 and th2.
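One formula consistent with the operating modes and the parameter descriptions can be written down if α is read as a mean, β as a scale, γ as a bias, and th1/th2 as the upper and lower clipping bounds. This reading is an assumption inferred from the mode names, not a definition taken from the library. Under it, the complete operation would be:

\[
f(x) = \min\Bigl(\max\bigl((x - \alpha)\,\beta + \gamma,\; \mathrm{th}_2\bigr),\; \mathrm{th}_1\Bigr)
\]

Each of the other modes would keep only part of this expression; for example, mean subtraction reduces to \(f(x) = x - \alpha\).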
The preProcess function supports the operating modes listed in the following table, selected through the OPMODE_T template parameter:
Op Code | Operation Description |
---|---|
0 | Mean subtraction |
1 | Scale and clip |
2 | Clipping |
3 | Scale and bias |
4 | Scale and bias with mean subtraction |
5 | Complete operation |
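A floating-point reference model can make the modes easier to check against hardware output. The sketch below is built on assumptions drawn only from the mode names and parameter descriptions: α acts as the mean, β as the scale and γ as the bias, clipping limits values to [th2, th1], and the fixed-point types the hardware uses (WX_T, FX_T and so on) are ignored. The function preprocess_ref is hypothetical and is not part of the library.

```cpp
#include <algorithm>

// Hypothetical float reference model for one channel value.
// Assumed roles: alpha = mean, beta = scale, gamma = bias,
// th1 = upper clipping threshold, th2 = lower clipping threshold.
float preprocess_ref(int opmode, float x, float alpha, float beta,
                     float gamma, float th1, float th2) {
    // Clamp v into [th2, th1].
    auto clip = [&](float v) { return std::min(std::max(v, th2), th1); };
    switch (opmode) {
        case 0: return x - alpha;                         // Mean subtraction
        case 1: return clip(x * beta);                    // Scale and clip
        case 2: return clip(x);                           // Clipping
        case 3: return x * beta + gamma;                  // Scale and bias
        case 4: return (x - alpha) * beta + gamma;        // Scale and bias with mean subtraction
        case 5: return clip((x - alpha) * beta + gamma);  // Complete operation
        default: return x;                                // Unknown mode: pass through (assumption)
    }
}
```

For example, preprocess_ref(5, 200.0f, 128.0f, 1.0f / 128, 0.0f, 1.0f, -1.0f) evaluates (200 − 128) / 128 = 0.5625, which already lies inside [−1, 1] and is returned unchanged, while mode 1 with the same inputs computes 200 / 128 = 1.5625 and clips it to 1.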
API Syntax
```cpp
template <int INPUT_PTR_WIDTH_T, int OUTPUT_PTR_WIDTH_T, int T_CHANNELS_T, int CPW_T, int ROWS_T, int COLS_T, int NPC_T, bool PACK_MODE_T,
          int WX_T, int WA_T, int WB_T, int WY_T, int WO_T, int FX_T, int FA_T, int FB_T, int FY_T, int FO_T, bool SIGNED_IN_T, int OPMODE_T>
void preProcess(hls::stream<ap_uint<INPUT_PTR_WIDTH_T> > &srcStrm, ap_uint<OUTPUT_PTR_WIDTH_T> *out, float params[3 * T_CHANNELS_T],
                int rows, int cols, int th1, int th2)
```
The following table describes the template and the function parameters.
Parameter | Description |
---|---|
srcStrm | Input image stream |
out | Output pointer |
params | Array of 3*T_CHANNELS_T values containing α, β and γ for each channel |
rows | Input image height |
cols | Input image width |
th1 | Upper threshold |
th2 | Lower threshold |
INPUT_PTR_WIDTH_T | Bit width of each input stream word |
OUTPUT_PTR_WIDTH_T | Bit width of each word written through the output pointer |
T_CHANNELS_T | Total number of channels |
CPW_T | Number of channels packed per DDR word |
ROWS_T | Maximum image height |
COLS_T | Maximum image width |
NPC_T | Number of pixels processed per clock cycle |
PACK_MODE_T | Data format (pixel packed or channel packed) |
WX_T | Total bit width of x |
WA_T | Total bit width of α |
WB_T | Total bit width of β |
WY_T | Total bit width of γ |
WO_T | Total bit width of the output |
FX_T | Number of integer bits of x |
FA_T | Number of integer bits of α |
FB_T | Number of integer bits of β |
FY_T | Number of integer bits of γ |
FO_T | Number of integer bits of the output |
SIGNED_IN_T | Signed input flag (true if the input values are signed) |
OPMODE_T | Operating mode (op code from the operating modes table) |
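The width/integer-bit pairs (WX_T/FX_T, WA_T/FA_T, WB_T/FB_T, WY_T/FY_T, WO_T/FO_T) read like the W and I arguments of an HLS ap_fixed&lt;W, I&gt; type. Assuming that convention, which the table above does not state outright, the sketch below emulates in plain C++ how a signed value is stored with W total bits and I integer bits under ap_fixed's default truncation (AP_TRN) and wrap-around (AP_WRAP) behavior. The helper to_fixed is hypothetical; unsigned inputs (SIGNED_IN_T = false) would use an unsigned range instead.

```cpp
#include <cmath>

// Hypothetical helper: value of v after conversion to a signed fixed-point
// type with W total bits and I integer bits (sign bit counted in I),
// mimicking ap_fixed<W, I> defaults: AP_TRN quantization, AP_WRAP overflow.
// Valid for 1 <= W <= 62.
double to_fixed(double v, int W, int I) {
    const int frac_bits = W - I;
    // AP_TRN: keep frac_bits fractional bits, rounding toward minus infinity.
    long long raw = static_cast<long long>(std::floor(std::ldexp(v, frac_bits)));
    // AP_WRAP: keep only the low W bits, then reinterpret them as two's complement.
    const long long mod = 1LL << W;
    raw = ((raw % mod) + mod) % mod;
    if (raw >= mod / 2) raw -= mod;
    return std::ldexp(static_cast<double>(raw), -frac_bits);
}
```

For example, with W = 8 and I = 4 the representable range is [−8, 7.9375] in steps of 0.0625, so to_fixed(1.3, 8, 4) gives 1.25 and to_fixed(9.0, 8, 4) wraps around to −7.0.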
Resource Utilization
The following table summarizes the resource utilization of preProcess for NPC_T = 8, CPW_T = 3 and OPMODE_T = 0, for a maximum input image size of 1280x720 pixels. The results were obtained after synthesis with Vitis 2019.2 for the Xilinx xcu200-fsgd2104-2-e FPGA at 300 MHz. The latency for this configuration is 0.7 ms.
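As a rough sanity check on latency figures like this one, an idealized lower bound follows from the pixel rate alone, assuming the kernel accepts NPC_T pixels on every clock with no pipeline fill, stalls or DDR transfer overhead. The helper names below are hypothetical.

```cpp
// Hypothetical helpers for an idealized latency floor. Assumes the kernel
// accepts npc pixels on every clock, with no pipeline fill, stalls, or
// DDR transfer overhead.
long long ideal_cycles(int rows, int cols, int npc) {
    const long long pixels = static_cast<long long>(rows) * cols;
    return (pixels + npc - 1) / npc;  // ceil(pixels / npc)
}

double ideal_latency_ms(int rows, int cols, int npc, double freq_mhz) {
    // freq_mhz * 1e3 clock cycles elapse per millisecond.
    return static_cast<double>(ideal_cycles(rows, cols, npc)) / (freq_mhz * 1e3);
}
```

For 1280x720 pixels at NPC_T = 8 and 300 MHz this gives 115,200 cycles, or about 0.384 ms. Because the bound ignores the overheads listed above, it is only a floor and is not expected to match the reported 0.7 ms.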
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | SLICE |
---|---|---|---|---|---|---|
8 pixel | 300 | 0 | 2 | 7554 | 11127 | 2155 |