Run Software Emulation, Hardware Emulation and Hw - 2022.2 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID
XD099
Release Date
2022-12-01
Version
2022.2 English
  1. To ensure the application passes Software Emulation with your changes, run the following command.

    cd $LAB_WORK_DIR/makefile; make run STEP=single_buffer TARGET=sw_emu 
    

    Make sure that the Software Emulation is passing.

  2. Next, to verify the functionality is intact, use the following command to run Hardware Emulation.

    cd $LAB_WORK_DIR/makefile; make run STEP=single_buffer TARGET=hw_emu 
    
    • The commands show that the SIMULATION is PASSED. This ensures that the generated hardware is functionally correct. However, you have not run the hardware on the FPGA. .

    NOTE: This tutorial is provided with xclbin files in the $LAB_WORK_DIR/xclbin_save directory. The SOLUTION=1 option can be added to the make target for using these xclbin files for hw runs. These xclbin files were generated for Alveo U200 cards only. You must generate new xclbin files for every platform used in this tutorial.

  3. Run the following steps to execute the application on hardware.

    You are using 100,000 documents compute on the hardware.

    cd $LAB_WORK_DIR/makefile; make run STEP=single_buffer ITER=1 PF=4 TARGET=hw
    
    • If you are using an xclbin provided as part of solution in this tutorial, then use the following command.

      cd $LAB_WORK_DIR/makefile; make run STEP=single_buffer ITER=1 PF=4 TARGET=hw SOLUTION=1
      
    • To use four words in parallel, PF=4 will set the PARALLELIZATION macro to 4 in $LAB_WORK_DIR/reference_files/compute_score_fpga_kernel.cpp.

    • ITER=1 indicates buffer sent using single iteration (using a single buffer).

    The following output displays.

    Loading runOnfpga_hw.xclbin
    Processing 1398.903 MBytes of data
      Running with a single buffer of 1398.903 MBytes for FPGA processing
    --------------------------------------------------------------------
    Executed FPGA accelerated version  |   838.5898 ms   ( FPGA 447.964 ms )
    Executed Software-Only version     |   3187.0354 ms
    --------------------------------------------------------------------
    Verification: PASS
    
    • Total FPGA time is 447 ms. This includes the host to DDR transfer, Total Compute on FPGA and DDR to host transfer.

    • Total time of computing 100,000 documents is about 838 ms.

At this point, review the Profile reports and Timeline Trace to extract information, such as how much time it takes to transfer the data between host and kernel and how much time it takes to compute on the FPGA.