2022.2 English

Let’s run the hardware application on DDR with a 600MB buffer size, a sequential address pattern, and a single kernel enqueue. The host migrates 600MB each to DDR0 (buffer_in1) and DDR1 (buffer_in2). The kernel performs the vector addition and stores the results in buffer_output in DDR2.
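To make the data movement concrete, here is a minimal host-side model in Python. It is only an illustration under stated assumptions, not the tutorial's host code. The three DDR banks are modeled as a dictionary, the elements are assumed to be 32-bit integers, the buffer contents are arbitrary, and `run_vadd_model` is a hypothetical name. The model migrates two input buffers to DDR0 and DDR1, performs the element-wise addition the kernel does, writes the sums to DDR2, and then checks the result on the host side as match/mismatch counts.

```python
from array import array


def run_vadd_model(num_elems: int) -> tuple[int, int]:
    """Toy model of the DDR-based vadd flow (illustration only, not the real host code).

    Returns (match_count, mismatch_count).
    """
    # Host-side input data; 32-bit signed ints are an assumption of this sketch.
    host_in1 = array("i", range(num_elems))
    host_in2 = array("i", (2 * i for i in range(num_elems)))

    # "Migrate" the inputs: buffer_in1 -> DDR0, buffer_in2 -> DDR1.
    banks = {"DDR0": array("i", host_in1), "DDR1": array("i", host_in2)}

    # Kernel: read in1 from DDR0 and in2 from DDR1, write the sums to DDR2 (buffer_output).
    banks["DDR2"] = array("i", (a + b for a, b in zip(banks["DDR0"], banks["DDR1"])))

    # Migrate buffer_output back to the host and compare with a host-computed reference.
    host_out = array("i", banks["DDR2"])
    matches = sum(1 for i, v in enumerate(host_out) if v == host_in1[i] + host_in2[i])
    return matches, num_elems - matches


if __name__ == "__main__":
    print(run_vadd_model(1024))
```

Keeping each input in its own bank mirrors the DDR0/DDR1/DDR2 split described above, so the two reads and the write do not have to share one bank.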

Here is the Makefile command to run:

# make ddr_addSeq_build   (already executed in the first module)
make ddr_addSeq

The above run command essentially expands to the following.

make run TARGET=hw memtype=DDR dsize=600 addrndm=0 krnl_loop=1 buildxclbin=0
  • memtype selects the memory type, DDR or HBM.

  • dsize is the amount of data, in MB, that the host migrates to each input memory bank and that the kernel accesses through ports in1 and in2.

  • addrndm selects the address pattern; 0 means sequential addressing, as used in this run.

  • krnl_loop sets the number of times the kernel loop repeats.

  • buildxclbin=0 skips generating a new xclbin.

  • txSize is set to 64 by default. It is the size of the transactions issued by the kernel ports when accessing memory.
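The host command line printed in the run output further below (`./host vadd_hw.xclbin 600 0 1 64`) suggests how these variables reach the host program. The following Python sketch reproduces that mapping. `host_command` is a hypothetical helper, and the argument order (xclbin, dsize, addrndm, krnl_loop, txSize) is inferred from that logged command line, not taken from the tutorial's Makefile.

```python
def host_command(xclbin: str = "vadd_hw.xclbin", dsize: int = 600, addrndm: int = 0,
                 krnl_loop: int = 1, tx_size: int = 64) -> list[str]:
    """Hypothetical helper: build the host argv from the make variables.

    The positional order is an assumption inferred from the logged command
    "./host vadd_hw.xclbin 600 0 1 64".
    """
    return ["./host", xclbin, str(dsize), str(addrndm), str(krnl_loop), str(tx_size)]


if __name__ == "__main__":
    # Defaults correspond to: make run TARGET=hw memtype=DDR dsize=600 addrndm=0 krnl_loop=1
    print(" ".join(host_command()))
```

Changing a single make variable, for example `krnl_loop`, would then only change the matching positional argument passed to the host.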

The make command generates the build directory ../build/DDR_Banks_d512_txSize64.

TARGET=hw_emu can also be used to run hardware emulation, but emulating the application with a 600MB buffer takes a significant amount of time. For this reason, the application is run on hardware with TARGET=hw.

Running the application on hardware with the commands above produces the following output:

*** Running hw mode ***  Use Command Line to run application!
cd ./../build/DDR_Banks_d512_txSize64 &&  ./host vadd_hw.xclbin 600 0 1 64;

 Total Data of 600.000 Mbytes to be written to global memory from host

 Kernel is invoked 1 time and repeats itself 1 times

Found Platform
Platform Name: Xilinx
DEVICE xilinx_u200_gen3x16_xdma_2_202110_1
INFO: Reading vadd_hw.xclbin
Loading: 'vadd_hw.xclbin'
- host loop iteration #0 of 1 total iterations
kernel_time_in_sec = 0.0416315
Duration using events profiling: 41473086 ns
 match_count = 157286400 mismatch_count = 0 total_data_size = 157286400
Throughput Achieved = 15.17 GB/s
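The reported numbers can be cross-checked by hand. Assuming 32-bit elements and 1 MB = 2^20 bytes (both assumptions, not stated in the log), 600MB is 157,286,400 elements, which matches total_data_size. Dividing the bytes of one 600MB buffer by the event-profiled duration reproduces the 15.17 GB/s figure, which suggests (the source does not say so explicitly) that the throughput counts one buffer's worth of data per run rather than the combined read and write traffic.

```python
MB = 1024 * 1024          # assumption: "600 Mbytes" means 600 * 2**20 bytes
BYTES_PER_ELEM = 4        # assumption: 32-bit elements

dsize_bytes = 600 * MB                     # 629,145,600 bytes per input buffer
elements = dsize_bytes // BYTES_PER_ELEM   # element count, compared with total_data_size
duration_ns = 41_473_086                   # "Duration using events profiling" from the log

# bytes per nanosecond equals GB/s with GB = 10**9 bytes
throughput_gbps = dsize_bytes / duration_ns

print(f"elements={elements}, throughput={throughput_gbps:.2f} GB/s")
```

Using kernel_time_in_sec (0.0416315 s) instead gives a slightly lower value, about 15.11 GB/s, so the logged throughput appears to be based on the event-profiling duration.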

The host migrates 600MB of data to each of DDR0 and DDR1. The kernel reads this data from DDR0 and DDR1 through its in1 and in2 ports, respectively, performs the vector addition, and writes the results to DDR2. The host then migrates these results from DDR2 back to host memory. The next section goes over the steps required to migrate this DDR-based application to HBM.