Run Application Using HBM - 2023.1 English - XD099

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID
XD099
Release Date
2023-08-02
Version
2023.1 English

You will perform the following three experiments here.

  1. Kernel ports in1 and in2 read from two HBM PCs. The host sends 512 MB of data to HBM.

  2. Kernel ports in1 and in2 read from two HBM PCs. The host sends more than 512 MB of data. This configuration results in an application error because you are accessing more than 512 MB.

  3. Kernel ports in1 and in2 share the same HBM PC.

The contents of the example connectivity file, HBM_connectivity.cfg, are shown below. The Makefile target creates this file automatically based on the banks argument.

[connectivity]
sp=vadd_1.in1:HBM[0:1]
sp=vadd_1.in2:HBM[2:3]
sp=vadd_1.out:HBM[4:5]
  1. Run the following command to run the application with 512 MB of HBM memory for the in1, in2, and out ports.

    #make hbm_addSeq_2Banks_build  - executed already in first module.
    make hbm_addSeq_2Banks
    

    The above command is equivalent to:

    make run TARGET=hw memtype=HBM banks=0_1 dsize=512 buildxclbin=0
    
    • dsize=512: Sets the size of the data accessed from HBM by kernel ports in1 and in2.

    • banks=0_1: Creates the HBM_connectivity.cfg file with the contents shown above in the corresponding build directory, ../build/HBM_addSeq_2Banks_d512_txSize64.

    cd ./../build/HBM_addSeq_2Banks_d512_txSize64 &&  ./host vadd_hw.xclbin 512 0 1 64;
    
    Total Data of 512.000 Mbytes to be written to global memory from host
    
    The kernel is invoked 1 time and repeats itself one time
    
    Found Platform
    Platform Name: Xilinx
    DEVICE xilinx_u50_gen3x16_xdma_201920_3
    INFO: Reading vadd_hw.xclbin
    Loading: 'vadd_hw.xclbin'
    - host loop iteration #0 of 1 total iterations
    kernel_time_in_sec = 0.0413112
    Duration using events profiling: 41136148 ns
    match_count = 134217728 mismatch_count = 0 total_data_size = 134217728
    Throughput Achieved = 13.0511 GB/s
    TEST PASSED
    
  2. If the host transfers more than 512 MB of data, the application fails with the following error.

    Run the following command:

    make run TARGET=hw memtype=HBM banks=0_1 dsize=600
    

    The application run results in an error, as shown below.

    cd ./../build/HBM_addSeq_2Banks_d512_txSize64 &&  ./host vadd_hw.xclbin 600 0 1 64;
    
    Total Data of 600.000 Mbytes to be written to global memory from host
    
    The kernel is invoked 1 time and repeats itself 1 times.
    
    Found Platform
    Platform Name: Xilinx
    DEVICE xilinx_u50_gen3x16_xdma_201920_3
    INFO: Reading vadd_hw.xclbin
    Loading: 'vadd_hw.xclbin'
    - host loop iteration #0 of 1 total iterations
    XRT build version: 2.8.743
    Build hash: 77d5484b5c4daa691a7f78235053fb036829b1e9
    Build date: 2020-11-16 00:19:11
    Git branch: 2020.2
    PID: 17233
    UID: 31781
    [Mon Jan 11 19:28:15 2021 GMT]
    HOST: xcodpeascoe40
    EXE: /scratch/ravic/Vitis-In-Depth-Tutorial/Runtime_and_System_Optimization/Feature_Tutorials/04-using-hbm/build/HBM_addSeq_2Banks_d512_txSize64/host
    [XRT] ERROR: std::bad_alloc
    ./../reference_files/host.cpp:162 Error calling err = krnl_vector_add.setArg(2, buffer_output[j]), error code is: -5
    [XRT] WARNING: Profiling may contain incomplete information. Please ensure all OpenCL objects are released by your host code (e.g., clReleaseProgram()).
    Makefile:102: recipe for target 'run' failed
    make: *** [run] Error 1
    

As expected, the application results in an error because you are trying to create a 600 MB buffer in HBM[0:1]. XRT sees this range as contiguous memory of 2 x 256 MB = 512 MB, and the host exceeds this size limit, causing the application error.
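
For reference, here is a minimal host-side sketch of where this failure surfaces, assuming the OpenCL C++ wrapper API that the tutorial's host.cpp is built on. The function, buffer names, argument index, and error handling below are illustrative and are not the tutorial's exact code.

#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#include <CL/cl2.hpp>
#include <iostream>
#include <vector>

// Illustrative only: request a buffer larger than the 2 x 256 MB = 512 MB of
// contiguous HBM that HBM_connectivity.cfg assigns to this kernel port. Per the
// log above, XRT reports std::bad_alloc and error code -5 (CL_OUT_OF_RESOURCES)
// when the oversized buffer is bound to the kernel argument.
cl_int try_oversized_buffer(cl::Context& context, cl::CommandQueue& q,
                            cl::Kernel& krnl_vector_add) {
    const size_t size_bytes = 600UL * 1024 * 1024;            // 600 MB request
    std::vector<int> source_in1(size_bytes / sizeof(int), 1); // host-side data

    cl_int err = CL_SUCCESS;
    cl::Buffer buffer_in1(context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
                          size_bytes, source_in1.data(), &err);
    if (err != CL_SUCCESS) return err;

    // With in1 mapped to HBM[0:1], only 512 MB is available, so the 600 MB
    // request fails here (argument index 0 for in1 is assumed).
    err = krnl_vector_add.setArg(0, buffer_in1);
    if (err != CL_SUCCESS) {
        std::cout << "setArg failed, error code: " << err << std::endl;
        return err;
    }

    return q.enqueueMigrateMemObjects({buffer_in1}, 0 /* host -> device */);
}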

The provided Makefile adds the flexibility of creating a custom connectivity file using the banks argument. The make target uses the functionality available in mem_connectivity.mk to create the memory connectivity file.

  3. If the application does not require a full memory bank, the Vitis flow also provides the capability of sharing memory banks across ports. Here is one example of connectivity that shares a bank between the ports in1 and in2.

    [connectivity]
    sp=vadd_1.in1:HBM[0:1]
    sp=vadd_1.in2:HBM[1:2]
    sp=vadd_1.out:HBM[3:4]
    

The ports in1 and in2 share bank 1 of HBM. Each HBM PC is 256 MB, so the two input ports together span banks 0, 1, and 2 (768 MB in total), and the application can create buffers of up to 384 MB for each kernel port.

Run the following command to run the application with 384 MB of HBM memory for the in1, in2, and out ports.

#make hbm_addSeq_overlap_build  - executed already in first module.
make hbm_addSeq_overlap

The above command shows the following results.

*** Running hw mode ***  Use Command Line to run application!
cd ./../build/HBM_overlapBanks_d512_txSize64 &&  ./host vadd_hw.xclbin 384 0 1 64;

 Total Data of 384.000 Mbytes to be written to global memory from host

 Kernel is invoked 1 time and repeats itself 1 times

Found Platform
Platform Name: Xilinx
DEVICE xilinx_u50_gen3x16_xdma_201920_3
INFO: Reading vadd_hw.xclbin
Loading: 'vadd_hw.xclbin'
- host loop iteration #0 of 1 total iterations
kernel_time_in_sec = 0.0311151
Duration using events profiling: 30897093 ns
 match_count = 100663296 mismatch_count = 0 total_data_size = 100663296
Throughput Achieved = 13.0321 GB/s
TEST PASSED
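
As a sanity check on the figures above, the reported totals correspond to 384 MB of 4-byte elements. The following stand-alone snippet simply redoes that arithmetic with the values copied from the log; it is not part of the tutorial's host code.

#include <cstdio>

// Values copied from the run log above; this only verifies the reported figures.
int main() {
    const double elements = 100663296.0;           // total_data_size (int elements)
    const double bytes = elements * sizeof(int);   // 384 MB
    const double seconds = 30897093e-9;            // duration from event profiling
    std::printf("Data: %.3f MB, Throughput: %.4f GB/s\n",
                bytes / (1024.0 * 1024.0), bytes / seconds / 1e9);
    return 0;
}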

When multiple ports share an overlapping bank and one (or more) of the buffers needs to use the overlapping portion, the order in which the host code assigns buffers to the corresponding kernel ports can become important. In this particular example, both buffers for the ports in1 and in2 use the overlapping bank 1 when each of them allocates 384 MB. Hence, the host application must assign the buffer for in1 first, and then assign the buffer for in2. Reversing this sequence results in the bad_alloc error. This is demonstrated in the following figure.

Buffer Assignment for overlapping banks

In other words, there is no lazy allocation: the buffers are allocated upfront (and immediately), following the order in which the host code handles them.
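
The following is a minimal sketch of that ordering, again assuming the OpenCL C++ wrapper API used by host.cpp; buffer names, argument indices, and sizes are illustrative, and the comments reflect the overlapping-bank placement described above.

#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#include <CL/cl2.hpp>
#include <vector>

// Illustrative only: with in1 -> HBM[0:1] and in2 -> HBM[1:2] sharing bank 1,
// the buffer for in1 is created, bound, and migrated before the buffer for in2.
// Swapping the two blocks below reproduces the bad_alloc error described above.
void allocate_in_order(cl::Context& ctx, cl::CommandQueue& q, cl::Kernel& vadd,
                       std::vector<int>& in1_host, std::vector<int>& in2_host) {
    const size_t bytes = 384UL * 1024 * 1024;   // 384 MB per input port

    // Handle in1 first (bank 0 plus part of the shared bank 1).
    cl::Buffer buffer_in1(ctx, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
                          bytes, in1_host.data());
    vadd.setArg(0, buffer_in1);
    q.enqueueMigrateMemObjects({buffer_in1}, 0 /* host -> device */);

    // Handle in2 second; it fits in what is left of bank 1 plus bank 2.
    cl::Buffer buffer_in2(ctx, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
                          bytes, in2_host.data());
    vadd.setArg(1, buffer_in2);
    q.enqueueMigrateMemObjects({buffer_in2}, 0);

    q.finish();
}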

Additionally, you can connect all 32 HBM banks to each of the kernel ports based on the application requirements. This way, the whole memory space is available to all the ports. The overall HBM efficiency will vary based on the access pattern and how many channels are being accessed, as described in the previous tutorial module.
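
For example, a connectivity file that makes the entire HBM space visible to every port of the vadd kernel would look like the following (same syntax as the connectivity files above; the actual choice should follow your application's access pattern):

[connectivity]
sp=vadd_1.in1:HBM[0:31]
sp=vadd_1.in2:HBM[0:31]
sp=vadd_1.out:HBM[0:31]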