Run application using HBM - 2022.2 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID
XD099
Release Date
2022-12-01
Version
2022.2 English

You will perform the following three experiments here.

  1. Kernel ports in1 and in2 read from two HBM PCs. The host sends 512 MB of data to HBM.

  2. Kernel ports in1 and in2 read from two HBM PCs. The host sends more than 512 MB of data. This configuration results in an application error because you are accessing more than 512 MB.

  3. Kernel ports in1 and in2 share the same HBM PC.

The contents of the example connectivity file, HBM_connectivity.cfg, are shown below. The Makefile target creates this file automatically based on the banks argument.

[connectivity]
sp=vadd_1.in1:HBM[0:1]
sp=vadd_1.in2:HBM[2:3]
sp=vadd_1.out:HBM[4:5]
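
For reference, this connectivity file is consumed at link time. A minimal sketch of the corresponding v++ link command is shown below; the tutorial's Makefile issues this for you, the platform name is taken from the run logs, and the vadd.xo object name is an assumption:

v++ --link --target hw --platform xilinx_u50_gen3x16_xdma_201920_3 \
    --config HBM_connectivity.cfg -o vadd_hw.xclbin vadd.xo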
  1. Run the following command to use the application with HBM memory of size 512 MB for the in1, in2, and out ports.

#make hbm_addSeq_2Banks_build  - executed already in first module.
make hbm_addSeq_2Banks

The above command is equivalent to:

make run TARGET=hw memtype=HBM banks=0_1 dsize=512 buildxclbin=0
  • dsize=512 sets the data size to be accessed from HBM by kernel ports in1 and in2.

  • banks=0_1 creates the HBM_connectivity.cfg file with the contents shown above in the appropriate build directory, ../build/HBM_addSeq_2Banks_d512_txSize64

cd ./../build/HBM_addSeq_2Banks_d512_txSize64 &&  ./host vadd_hw.xclbin 512 0 1 64;

 Total Data of 512.000 Mbytes to be written to global memory from host

 The kernel is invoked 1 time and repeats itself one time

Found Platform
Platform Name: Xilinx
DEVICE xilinx_u50_gen3x16_xdma_201920_3
INFO: Reading vadd_hw.xclbin
Loading: 'vadd_hw.xclbin'
- host loop iteration #0 of 1 total iterations
kernel_time_in_sec = 0.0413112
Duration using events profiling: 41136148 ns
 match_count = 134217728 mismatch_count = 0 total_data_size = 134217728
Throughput Achieved = 13.0511 GB/s
TEST PASSED
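
The reported throughput is consistent with the 512 MB moved per port divided by the event-profiled duration. A quick check, assuming 4-byte words and GB = 10^9 bytes:

134,217,728 words x 4 bytes = 536,870,912 bytes
536,870,912 bytes / 0.041136148 s ≈ 13.05 GB/s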
  2. If the host transfers more than 512 MB of data, the application fails with the following error.

Run the following command:

make run TARGET=hw memtype=HBM banks=0_1 dsize=600

The application run results in an error, as shown below.

cd ./../build/HBM_addSeq_2Banks_d512_txSize64 &&  ./host vadd_hw.xclbin 600 0 1 64;

 Total Data of 600.000 Mbytes to be written to global memory from host

 The kernel is invoked 1 time and repeats itself 1 times.

Found Platform
Platform Name: Xilinx
DEVICE xilinx_u50_gen3x16_xdma_201920_3
INFO: Reading vadd_hw.xclbin
Loading: 'vadd_hw.xclbin'
- host loop iteration #0 of 1 total iterations
XRT build version: 2.8.743
Build hash: 77d5484b5c4daa691a7f78235053fb036829b1e9
Build date: 2020-11-16 00:19:11
Git branch: 2020.2
PID: 17233
UID: 31781
[Mon Jan 11 19:28:15 2021 GMT]
HOST: xcodpeascoe40
EXE: /scratch/ravic/Vitis-In-Depth-Tutorial/Runtime_and_System_Optimization/Feature_Tutorials/04-using-hbm/build/HBM_addSeq_2Banks_d512_txSize64/host
[XRT] ERROR: std::bad_alloc
./../reference_files/host.cpp:162 Error calling err = krnl_vector_add.setArg(2, buffer_output[j]), error code is: -5
[XRT] WARNING: Profiling may contain incomplete information. Please ensure all OpenCL objects are released by your host code (e.g., clReleaseProgram()).
Makefile:102: recipe for target 'run' failed
make: *** [run] Error 1

As expected, the application fails because you are trying to create a 600 MB buffer in HBM[0:1]. XRT sees HBM[0:1] as a contiguous memory of 2 x 256 MB = 512 MB, and the host request exceeds this size limit, resulting in an application error.
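The tutorial's host.cpp hits this limit through the failing setArg call shown in the log. The same behavior can be illustrated with a minimal sketch using XRT's native C++ API; this is not the tutorial's host code, and the kernel name vadd and argument index 0 for in1 are assumptions:

// buffer_limit.cpp: request a buffer larger than the HBM PCs bound to a kernel port.
#include <xrt/xrt_device.h>
#include <xrt/xrt_kernel.h>
#include <xrt/xrt_bo.h>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    // Usage: ./buffer_limit vadd_hw.xclbin <size in MB>; 600 reproduces the failure.
    const size_t size_mb = (argc > 2) ? std::stoul(argv[2]) : 600;
    const size_t bytes   = size_mb * 1024 * 1024;

    auto device = xrt::device(0);
    auto uuid   = device.load_xclbin(argv[1]);
    auto vadd   = xrt::kernel(device, uuid, "vadd");       // kernel name assumed

    try {
        // group_id(0) resolves to the memory bank(s) the in1 port was linked to,
        // here HBM[0:1] = 2 x 256 MB. Asking for more than 512 MB cannot succeed.
        auto in1 = xrt::bo(device, bytes, vadd.group_id(0));
        std::cout << "Allocated " << size_mb << " MB for in1" << std::endl;
    } catch (const std::exception& e) {
        std::cout << "Allocation failed: " << e.what() << std::endl;  // std::bad_alloc
    }
    return 0;
}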

The provided Makefile adds the flexibility of creating your own custom connectivity file using the banks argument. The make target uses the functionality available in mem_connectivity.mk to create the memory connectivity file.

  3. If the application does not require the full memory banks, the Vitis flow also provides the capability of sharing memory banks across ports. Here is one example of connectivity that shares banks between the in1 and in2 ports.

[connectivity]
sp=vadd_1.in1:HBM[0:1]
sp=vadd_1.in2:HBM[1:2]
sp=vadd_1.out:HBM[3:4]

The ports in1 and in2 share bank 1 of HBM, so the application can create buffers of up to 384 MB for each kernel port.
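
The 384 MB limit follows from the overlap; a quick check, assuming 256 MB per HBM PC as above:

in1 window: HBM[0:1] = 2 x 256 MB = 512 MB
in2 window: HBM[1:2] = 2 x 256 MB = 512 MB
bank 1 is shared, so banks 0..2 hold 3 x 256 MB = 768 MB in total
768 MB / 2 input ports = 384 MB maximum per buffer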

Run the following command to use the application with HBM memory of size 384 MB for the in1, in2, and out ports.

#make hbm_addSeq_overlap_build  - executed already in first module.
make hbm_addSeq_overlap

The above command produces the following results.

*** Running hw mode ***  Use Command Line to run application!
cd ./../build/HBM_overlapBanks_d512_txSize64 &&  ./host vadd_hw.xclbin 384 0 1 64;

 Total Data of 384.000 Mbytes to be written to global memory from host

 Kernel is invoked 1 time and repeats itself 1 times

Found Platform
Platform Name: Xilinx
DEVICE xilinx_u50_gen3x16_xdma_201920_3
INFO: Reading vadd_hw.xclbin
Loading: 'vadd_hw.xclbin'
- host loop iteration #0 of 1 total iterations
kernel_time_in_sec = 0.0311151
Duration using events profiling: 30897093 ns
 match_count = 100663296 mismatch_count = 0 total_data_size = 100663296
Throughput Achieved = 13.0321 GB/s
TEST PASSED

When multiple ports share an overlapping bank and one (or more) of the buffers tries to utilize the overlapping portion, the order in which the host code assigns buffers to the corresponding kernel ports can become important. In this particular example, the buffers for ports in1 and in2 both try to utilize the overlapping bank 1 when each of them allocates 384 MB. Hence, the host application must assign the buffer for in1 first and then the buffer for in2. Reversing this sequence will result in the bad_alloc error. This is demonstrated in the following figure.

Figure: Buffer assignment for overlapping banks

In other words, there is no lazy allocation: the buffers are allocated upfront (and immediately), following the order in which the host code handles them.
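
A minimal sketch of this ordering, again using XRT's native C++ API rather than the tutorial's OpenCL host code (the kernel name vadd and the argument indices are assumptions):

// overlap_order.cpp: with in1 -> HBM[0:1] and in2 -> HBM[1:2], allocation order matters.
#include <xrt/xrt_device.h>
#include <xrt/xrt_kernel.h>
#include <xrt/xrt_bo.h>
#include <string>

int main(int argc, char** argv) {
    const size_t bytes = 384UL * 1024 * 1024;              // 384 MB per port

    auto device = xrt::device(0);
    auto uuid   = device.load_xclbin(argv[1]);             // e.g. vadd_hw.xclbin
    auto vadd   = xrt::kernel(device, uuid, "vadd");

    // in1 first: its 384 MB occupies bank 0 plus half of the shared bank 1.
    auto in1 = xrt::bo(device, bytes, vadd.group_id(0));
    // in2 second: its 384 MB fits in the remaining half of bank 1 plus bank 2.
    auto in2 = xrt::bo(device, bytes, vadd.group_id(1));

    // Swapping the two xrt::bo calls lets in2 fill the shared bank 1 first,
    // leaving only 256 MB (bank 0) for in1's 384 MB request, which then
    // fails with std::bad_alloc.
    return 0;
}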

Additionally, you can connect all 32 HBM banks to each of the kernel ports based on the application requirements. This way, the whole memory space is available to all the ports. The overall HBM efficiency will vary based on the access pattern and how many channels are being accessed, as described in the previous tutorial module.
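
For example, a connectivity file along the following lines (not generated by this tutorial's Makefile) would expose the full HBM space to every port:

[connectivity]
sp=vadd_1.in1:HBM[0:31]
sp=vadd_1.in2:HBM[0:31]
sp=vadd_1.out:HBM[0:31]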