In this step, you will first build the xclbins that support transaction sizes of 64, 128, 256, 512, and 1024 bytes. Next, you will explore the achievable bandwidth when accessing a single pseudo channel of HBM (256 MB), two pseudo channels (512 MB), and four pseudo channels (1024 MB).
Here is an example build command for Master 0 accessing PC0. (Don’t run this command.)
make build TARGET=hw memtype=HBM banks=0_31 dsize=256 addrndm=0 txSize=64 buildxclbin=1
The project lets you configure the application through the following arguments:
dsize=256 accesses only a single pseudo channel, because the data size on the host side is 256 MB.
txSize=64 queues each command as 64 bytes from the kernel port. Since each transfer is 64 bytes, this is equivalent to a burst length of 1. txSize=128 is equivalent to a burst length of 2, and so on.
banks=0_31 configures the kernel’s AXI master ports to connect to all the banks. During the build, the Makefile creates the HBM_connectivity.cfg file in the respective build directory. Refer to mem_connectivity.mk for more information. You can also create custom connectivity by updating the in_M0, in_M1, and out_M2 variables.
addrndm=0 ensures that the addresses generated are sequential when the kernel is run. As seen previously, this is a kernel argument passed down from the host code.
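The relationship between these arguments and the hardware behavior is simple arithmetic, and can be sketched as follows. This is a minimal illustration, assuming only what the text states (64 bytes per transfer, 256 MB per pseudo channel); the helper names `burst_length` and `pseudo_channels_touched` are hypothetical and are not part of the project.

```python
# Assumptions taken from the text above: each transfer is 64 bytes,
# and one HBM pseudo channel covers 256 MB. Helper names are hypothetical.
BYTES_PER_TRANSFER = 64
MB_PER_PSEUDO_CHANNEL = 256


def burst_length(tx_size_bytes: int) -> int:
    """Return the burst length implied by a txSize value in bytes."""
    if tx_size_bytes <= 0 or tx_size_bytes % BYTES_PER_TRANSFER:
        raise ValueError("txSize must be a positive multiple of 64 bytes")
    return tx_size_bytes // BYTES_PER_TRANSFER


def pseudo_channels_touched(dsize_mb: int) -> int:
    """Return how many 256 MB pseudo channels a dsize value (in MB) spans."""
    if dsize_mb <= 0:
        raise ValueError("dsize must be positive")
    # Ceiling division: a partially used pseudo channel still counts.
    return -(-dsize_mb // MB_PER_PSEUDO_CHANNEL)


for tx in (64, 128, 256, 512, 1024):
    print(f"txSize={tx:>4} -> burst length {burst_length(tx)}")
for dsize in (256, 512, 1024):
    print(f"dsize={dsize:>4} -> {pseudo_channels_touched(dsize)} pseudo channel(s)")
```

The values swept in this step therefore correspond to burst lengths of 1 through 16 and to 1, 2, or 4 pseudo channels.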
The above build command will create the xclbin under
You can run the following command to generate the builds for txSize values of 64, 128, 256, 512, and 1024 bytes.
make build_without_rama
# This command is already executed in the first module
If the machine doesn’t have enough resources to launch six jobs in parallel, you can run the jobs one at a time, as shown below.
make noramajob-64 noramajob-128 noramajob-256 noramajob-512 noramajob-1024
To run the application with the builds created above for txSize values of 64, 128, 256, 512, and 1024 bytes, accessing 1, 2, and 4 pseudo channels (using the dsize argument), run:
make all_hbm_seq_run
The above target generates the output file <Project>/makefile/Run_SequentialAddress.perf with the following data:
Addr Pattern   Total Size(MB)       Transaction Size(B)   Throughput Achieved(GB/s)
Sequential     256 (M0->PC0)        64                    13.0996
Sequential     256 (M0->PC0)        128                   13.0704
Sequential     256 (M0->PC0)        256                   13.1032
Sequential     256 (M0->PC0)        512                   13.0747
Sequential     256 (M0->PC0)        1024                  13.0432
Sequential     512 (M0->PC0_1)      64                    13.1244
Sequential     512 (M0->PC0_1)      128                   13.1142
Sequential     512 (M0->PC0_1)      256                   13.1285
Sequential     512 (M0->PC0_1)      512                   13.1089
Sequential     512 (M0->PC0_1)      1024                  13.1097
Sequential     1024 (M0->PC0_3)     64                    13.148
Sequential     1024 (M0->PC0_3)     128                   13.1435
Sequential     1024 (M0->PC0_3)     256                   13.1506
Sequential     1024 (M0->PC0_3)     512                   13.1539
Sequential     1024 (M0->PC0_3)     1024                  13.1454
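If you want to post-process this data, a small parser can turn the rows into records. The sketch below assumes the .perf file uses exactly the whitespace-separated row layout shown above; the real file may differ, so treat the regular expression and the function names `parse_perf` and `mean_throughput_by_size` as hypothetical illustrations.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed row layout, copied from the listing above:
#   Sequential 256 (M0->PC0) 64 13.0996
ROW = re.compile(r"^(\w+)\s+(\d+)\s+\(([^)]+)\)\s+(\d+)\s+(\d+(?:\.\d+)?)$")


def parse_perf(text):
    """Parse perf rows into dictionaries, skipping the header and blank lines."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, blank line, or a layout this sketch does not know
        pattern, total_mb, route, tx_bytes, gbps = match.groups()
        rows.append({
            "pattern": pattern,
            "total_mb": int(total_mb),
            "route": route,
            "tx_bytes": int(tx_bytes),
            "gbps": float(gbps),
        })
    return rows


def mean_throughput_by_size(rows):
    """Average the throughput over all transaction sizes for each total size."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["total_mb"]].append(row["gbps"])
    return {size: mean(values) for size, values in grouped.items()}


# A three-row excerpt of the data above, used here in place of the real file.
SAMPLE = """\
Addr Pattern Total Size(MB) Transaction Size(B) Throughput Achieved(GB/s)
Sequential 256 (M0->PC0) 64 13.0996
Sequential 512 (M0->PC0_1) 64 13.1244
Sequential 1024 (M0->PC0_3) 64 13.148
"""

rows = parse_perf(SAMPLE)
for size, avg in sorted(mean_throughput_by_size(rows).items()):
    print(f"{size:>4} MB: {avg:.4f} GB/s")
```

To use it on a real run, read the contents of Run_SequentialAddress.perf into a string and pass that to `parse_perf` instead of `SAMPLE`.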
This use case shows the maximum results when using one kernel master, M0, to access HBM. The table above shows the measured bandwidth in GB/s.
The top five rows show point-to-point accesses, i.e., 256 MB accesses, with varying transaction size. The bandwidth is consistent at around 13 GB/s.
The next ten rows show groupings of two and four pseudo channels, i.e., 512 MB and 1024 MB, respectively; the bandwidth remains constant.
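One way to quantify "constant" is to compute the relative spread of the measurements. The sketch below uses the throughput values copied from the table in this section; the metric, (max - min) / min, and the name `relative_spread` are illustrative choices rather than something the tutorial defines.

```python
# Throughput values (GB/s) copied from the table above, grouped by total size (MB).
MEASURED = {
    256: [13.0996, 13.0704, 13.1032, 13.0747, 13.0432],
    512: [13.1244, 13.1142, 13.1285, 13.1089, 13.1097],
    1024: [13.148, 13.1435, 13.1506, 13.1539, 13.1454],
}


def relative_spread(values):
    """Return (max - min) / min for a non-empty list of measurements."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo


all_values = [v for group in MEASURED.values() for v in group]

for size, values in MEASURED.items():
    print(f"{size:>4} MB: spread {relative_spread(values):.2%}")
print(f"overall: spread {relative_spread(all_values):.2%}")
```

With these numbers, the overall spread across all 15 runs stays below 1%, which supports the observation that neither the transaction size nor the number of pseudo channels changes the achieved bandwidth for sequential accesses from a single master.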