To ensure the application passes Software Emulation with your changes, run the following command.
cd $LAB_WORK_DIR/makefile; make run STEP=single_buffer TARGET=sw_emu
Make sure that Software Emulation passes.
Next, to verify the functionality is intact, use the following command to run Hardware Emulation.
cd $LAB_WORK_DIR/makefile; make run STEP=single_buffer TARGET=hw_emu
Both commands should report that the simulation PASSED. This confirms that the generated hardware is functionally correct. However, you have not yet run the hardware on the FPGA.
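The two emulation runs above can be chained so that Hardware Emulation only starts once Software Emulation has passed. The following is a minimal sketch, not part of the tutorial: `run_emulation` and the `MAKE_CMD` override are hypothetical names, and it assumes that the `make run` target exits with a non-zero status when a run fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run sw_emu, then hw_emu, and stop at the first failure.
# Assumes LAB_WORK_DIR is set as in the tutorial and that `make run` returns
# a non-zero exit status on failure (an assumption, not verified here).
run_emulation() {
  local target
  for target in sw_emu hw_emu; do
    echo "== make run STEP=single_buffer TARGET=${target} =="
    # The subshell keeps the caller's working directory unchanged.
    # MAKE_CMD can be set to `echo` for a dry run.
    ( cd "${LAB_WORK_DIR}/makefile" && \
      ${MAKE_CMD:-make} run STEP=single_buffer TARGET="${target}" ) || {
      echo "emulation failed for TARGET=${target}" >&2
      return 1
    }
  done
}
```

Calling `run_emulation` runs both targets in order; `MAKE_CMD=echo run_emulation` only prints the commands that would be executed.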
NOTE: This tutorial is provided with xclbin files in the $LAB_WORK_DIR/xclbin_save directory. The SOLUTION=1 option can be added to the make target for using these xclbin files for hw runs. These xclbin files were generated for Alveo U200 cards only. You must generate new xclbin files for every platform used in this tutorial.
Run the following steps to execute the application on hardware.
You are computing 100,000 documents on the hardware.
cd $LAB_WORK_DIR/makefile; make run STEP=single_buffer ITER=1 PF=4 TARGET=hw
If you are using an xclbin provided as part of the solution in this tutorial, use the following command instead.
cd $LAB_WORK_DIR/makefile; make run STEP=single_buffer ITER=1 PF=4 TARGET=hw SOLUTION=1
To process four words in parallel, PF=4 sets the PARALLELIZATION macro to 4 in $LAB_WORK_DIR/reference_files/compute_score_fpga_kernel.cpp. ITER=1 indicates that the data is sent in a single iteration (using a single buffer).
The following output displays.
Loading runOnfpga_hw.xclbin
Processing 1398.903 MBytes of data
Running with a single buffer of 1398.903 MBytes for FPGA processing
--------------------------------------------------------------------
Executed FPGA accelerated version | 838.5898 ms ( FPGA 447.964 ms )
Executed Software-Only version | 3187.0354 ms
--------------------------------------------------------------------
Verification: PASS
The total FPGA time is about 448 ms. This includes the host-to-DDR transfer, the total compute on the FPGA, and the DDR-to-host transfer.
The total time to compute 100,000 documents is about 839 ms.
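The timing lines in the output can be turned into two derived figures: the time spent outside the FPGA-timed portion (total accelerated time minus FPGA time) and the speedup over the software-only version. The sketch below is not part of the tutorial; `summarize_run` is a hypothetical helper, and it assumes the log lines have exactly the layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive figures from the "Executed ..." lines of a run log.
# Assumes the log format shown in this section; reads files or standard input.
summarize_run() {
  awk '
    /Executed FPGA accelerated version/ {
      # Line layout: ... | <total> ms ( FPGA <fpga> ms )
      for (i = 2; i <= NF; i++) {
        if ($i == "|") total = $(i + 1)
        if ($i == "FPGA" && $(i - 1) == "(") fpga = $(i + 1)
      }
    }
    /Executed Software-Only version/ {
      # Line layout: ... | <software> ms
      for (i = 2; i <= NF; i++) if ($i == "|") sw = $(i + 1)
    }
    END {
      # Fail if any of the three timings was not found.
      if (total == "" || fpga == "" || sw == "") exit 1
      printf "non_fpga_ms=%.3f\n", total - fpga
      printf "speedup=%.2f\n", sw / total
    }
  ' "$@"
}
```

For example, if the run output is saved to a file (here a hypothetical `run.log`), `summarize_run run.log` prints `non_fpga_ms=390.626` and `speedup=3.80` for the values shown above.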
At this point, review the Profile reports and Timeline Trace to extract information, such as how much time it takes to transfer the data between host and kernel and how much time it takes to compute on the FPGA.