This example shows how various xfOpenCV functions can be used to accelerate the pre-processing of input images before they are fed to a Deep Neural Network (DNN) accelerator.
This specific application shows how pre-processing for Googlenet_v1 can be accelerated. The pre-processing consists of resizing the input image to 224 x 224 followed by mean subtraction. The two main functions from the Vitis Vision library used to build this pipeline are xf::cv::resize() and xf::cv::preProcess(), which operate in dataflow.
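For orientation, the two stages can be sketched as a plain C++ software reference. This is only an illustrative sketch, not the Vitis Vision implementation: the function name `preprocess_reference`, the interleaved 3-channel layout, the floor-based nearest-neighbour sampling, and the per-channel `mean` array are all assumptions made for this example. The hardware kernel selects its interpolation mode through a template parameter and may behave differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical software reference (not part of the Vitis Vision API).
// Resizes an interleaved 3-channel 8-bit image using floor-based
// nearest-neighbour sampling, then subtracts a per-channel mean.
std::vector<float> preprocess_reference(const std::vector<std::uint8_t>& src,
                                        int rows_in, int cols_in,
                                        int rows_out, int cols_out,
                                        const float mean[3]) {
    constexpr int kChannels = 3;
    std::vector<float> dst(static_cast<std::size_t>(rows_out) * cols_out * kChannels);

    for (int r = 0; r < rows_out; ++r) {
        // Map the output row back to a source row.
        const int sr = static_cast<int>(static_cast<long long>(r) * rows_in / rows_out);
        for (int c = 0; c < cols_out; ++c) {
            // Map the output column back to a source column.
            const int sc = static_cast<int>(static_cast<long long>(c) * cols_in / cols_out);
            for (int ch = 0; ch < kChannels; ++ch) {
                const std::size_t si =
                    (static_cast<std::size_t>(sr) * cols_in + sc) * kChannels + ch;
                const std::size_t di =
                    (static_cast<std::size_t>(r) * cols_out + c) * kChannels + ch;
                // Mean subtraction, applied per channel.
                dst[di] = static_cast<float>(src[si]) - mean[ch];
            }
        }
    }
    return dst;
}
```

For a Googlenet_v1-style input, such a reference would be called with `rows_out = 224` and `cols_out = 224`. A reference like this can be useful for checking the accelerated output on a host, keeping in mind that differences in interpolation and fixed-point precision can cause small mismatches.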
The following code shows the top-level wrapper containing the xf::cv::resize() and xf::cv::preProcess() calls.
void pp_pipeline_accel(ap_uint<INPUT_PTR_WIDTH> *img_inp,
                       ap_uint<OUTPUT_PTR_WIDTH> *img_out,
                       int rows_in, int cols_in,
                       int rows_out, int cols_out,
                       float params[3*T_CHANNELS],
                       int th1, int th2)
{
    // HLS interface pragmas
    #pragma HLS INTERFACE m_axi port=img_inp offset=slave bundle=gmem1
    #pragma HLS INTERFACE m_axi port=img_out offset=slave bundle=gmem2
    #pragma HLS INTERFACE m_axi port=params offset=slave bundle=gmem3
    #pragma HLS INTERFACE s_axilite port=rows_in bundle=control
    #pragma HLS INTERFACE s_axilite port=cols_in bundle=control
    #pragma HLS INTERFACE s_axilite port=rows_out bundle=control
    #pragma HLS INTERFACE s_axilite port=cols_out bundle=control
    #pragma HLS INTERFACE s_axilite port=th1 bundle=control
    #pragma HLS INTERFACE s_axilite port=th2 bundle=control
    #pragma HLS INTERFACE s_axilite port=return bundle=control

    // Input image and resized image
    xf::cv::Mat<XF_8UC3, HEIGHT, WIDTH, NPC1> imgInput0(rows_in, cols_in);
    xf::cv::Mat<TYPE, NEWHEIGHT, NEWWIDTH, NPC_T> out_mat(rows_out, cols_out);

    // Stream carrying the resized image from resize() to preProcess()
    hls::stream<ap_uint<256> > resizeStrmout;

    // Output width rounded up to a multiple of NPC_T
    int srcMat_cols_align_npc = ((out_mat.cols + (NPC_T - 1)) >> XF_BITSHIFT(NPC_T)) << XF_BITSHIFT(NPC_T);

    #pragma HLS DATAFLOW

    // Read the input image from memory into an xf::cv::Mat
    xf::cv::Array2xfMat<INPUT_PTR_WIDTH,XF_8UC3,HEIGHT, WIDTH, NPC1> (img_inp, imgInput0);

    // Resize to the output dimensions
    xf::cv::resize<INTERPOLATION,TYPE,HEIGHT,WIDTH,NEWHEIGHT,NEWWIDTH,NPC_T,XF_USE_URAM,MAXDOWNSCALE> (imgInput0, out_mat);

    // Convert the resized Mat to an HLS stream
    xf::cv::accel_utils obj;
    obj.xfMat2hlsStrm<INPUT_PTR_WIDTH, TYPE, NEWHEIGHT, NEWWIDTH, NPC_T, (NEWWIDTH*NEWHEIGHT/8)>(out_mat, resizeStrmout, srcMat_cols_align_npc);

    // Pre-process the streamed image and write the result to img_out
    xf::cv::preProcess <INPUT_PTR_WIDTH, OUTPUT_PTR_WIDTH, T_CHANNELS, CPW, HEIGHT, WIDTH, NPC_TEST, PACK_MODE, X_WIDTH, ALPHA_WIDTH, BETA_WIDTH, GAMMA_WIDTH, OUT_WIDTH, X_IBITS, ALPHA_IBITS, BETA_IBITS, GAMMA_IBITS, OUT_IBITS, SIGNED_IN, OPMODE> (resizeStrmout, img_out, params, rows_out, cols_out, th1, th2);
}
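The `srcMat_cols_align_npc` expression in the wrapper rounds the output width up to a multiple of the pixels-per-clock count. The standalone sketch below reproduces that arithmetic in plain C++, assuming that `XF_BITSHIFT(NPC_T)` evaluates to the base-2 logarithm of a power-of-two pixels-per-clock value. The helper names `log2_pow2` and `align_cols_to_npc` are hypothetical and are not part of the Vitis Vision API.

```cpp
// Hypothetical stand-in for XF_BITSHIFT: log2 of a power-of-two value.
constexpr int log2_pow2(int npc) {
    return npc <= 1 ? 0 : 1 + log2_pow2(npc >> 1);
}

// Hypothetical helper: rounds cols up to the next multiple of npc,
// using the same add-shift-right-shift-left pattern as the wrapper.
// Assumes npc is a power of two.
constexpr int align_cols_to_npc(int cols, int npc) {
    return ((cols + (npc - 1)) >> log2_pow2(npc)) << log2_pow2(npc);
}
```

With 8 pixels per clock, a width of 224 is already aligned and stays 224, while a width of 225 is padded to 232. With 1 pixel per clock the shift is zero and the width is unchanged.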
This pipeline is integrated with the Deep Learning Processing Unit (DPU) as part of the Vitis-AI-Library and achieves a speedup of about 11% over software pre-processing.
- Overall performance (images/sec):
  - With software pre-processing: 125 images/sec
  - With hardware-accelerated pre-processing: 140 images/sec