The test bench now runs the case 10 times to calculate the average kernel execution time:
    # build and run JPEG Decoder using U200 platform
    make run TARGET=hw PLATFORM=xilinx_u200_gen3x16_xdma_2_202110_1.xpfm
Building the xclbin takes about 4 hours, so this is a good time for a coffee break.
Example output:
    Found Platform
    Platform Name: Xilinx
    INFO: Found Device=xilinx_u200_gen3x16_xdma_2_202110_1
    INFO: Importing kernelJpegDecoder.xclbin
    Loading: 'kernelJpegDecoder.xclbin'
    INFO: Kernel has been created
    INFO: Finish kernel setup
    ...
    INFO: Finish kernel execution
    INFO: Finish E2E execution
    INFO: Data transfer from host to device: 108 us
    INFO: Data transfer from device to host: 726 us
    INFO: Average kernel execution per run: 1515 us
    ...
    INFO: android.yuv will be generated from the jpeg decoder's output
    INFO: android.yuv is generated correctly
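So for this 1280x960 android.jpg file, the output throughput is about 1216 MB/s: each run produces 1280 x 960 x 3 / 2 = 1,843,200 bytes of output, and 1,843,200 bytes / 1515 us is about 1216 bytes/us, i.e. 1216 MB/s (with 1 MB = 10^6 bytes).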
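The same arithmetic can be sketched in a few lines of Python, which may help when repeating the measurement on other images. The function name and the `bytes_per_pixel` parameter are illustrative and not part of the test bench; the default of 1.5 bytes per pixel is simply the 3/2 factor from the formula above.

```python
def output_throughput_mb_s(width, height, kernel_us, bytes_per_pixel=1.5):
    """Output throughput in MB/s (1 MB = 10**6 bytes).

    bytes_per_pixel=1.5 mirrors the 3/2 factor used in the text;
    it is an assumption about the output format, not read from the file.
    """
    output_bytes = width * height * bytes_per_pixel
    # bytes per microsecond is numerically equal to MB per second
    return output_bytes / kernel_us


# Values taken from the example log: 1280x960 image, 1515 us per run
throughput = output_throughput_mb_s(1280, 960, 1515)
print(f"{throughput:.0f} MB/s")
```

Plugging in the average kernel time reported by your own run gives the figure for your image and platform.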
To check the output yuv file, download the player from https://sourceforge.net/projects/raw-yuvplayer/ . Then open the rebuild_image.yuv in it, set the right sample ratio and the custom size (1280x960 for android.jpg) in the software, and check the yuv file.
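Before opening the file in a viewer, a quick size check can catch a wrong custom size early. The following is a minimal sketch, assuming the file holds exactly one frame at 1.5 bytes per pixel (the 3/2 factor from the throughput formula); the helper names are hypothetical and not part of the test bench.

```python
import os


def expected_frame_bytes(width, height, bytes_per_pixel=1.5):
    """Size in bytes of one raw frame; 1.5 bytes/pixel is an assumption."""
    return int(width * height * bytes_per_pixel)


def yuv_size_matches(path, width, height):
    """Return True if the file size equals one frame of the expected size."""
    return os.path.getsize(path) == expected_frame_bytes(width, height)
```

For the 1280x960 example, `expected_frame_bytes(1280, 960)` is 1,843,200 bytes; if `yuv_size_matches` returns False for your output file, the size or sample ratio entered in the player is probably not the right one.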