Sequential Accesses - 2022.2 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID
XD099
Release Date
2022-12-01
Version
2022.2 English

In this step, you first build the xclbin files that support transaction sizes of 64, 128, 256, 512, and 1024 bytes. You then explore the achievable bandwidth when accessing a single pseudo channel of HBM (256 MB), two pseudo channels (512 MB), and four pseudo channels (1024 MB).

The following example builds the application for Master 0 accessing PC0. (Don’t run this command.)

make build TARGET=hw memtype=HBM banks=0_31 dsize=256 addrndm=0 txSize=64 buildxclbin=1

The project lets you control how the application is built and run through the following arguments.

  • dsize=256 will access only a single pseudo channel, because the data size on the host side is 256 MB.

  • txSize=64 will queue commands of 64 bytes each from the kernel port. Because each transfer is 64 bytes, this is equivalent to a burst length of 1. txSize=128 is equivalent to a burst length of 2, and so on.

  • banks=0_31 configures the kernel’s AXI master ports to connect to all the banks. During the build, the Makefile creates the HBM_connectivity.cfg file in the respective build directory. Refer to mem_connectivity.mk for more information. You can also create a custom connectivity by updating the in_M0, in_M1, and out_M2 variables.

  • addrndm=0 will ensure the address generated is sequential when the kernel is run. As seen previously, this is an argument to the kernel passed down from the host code.
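The relationship between the arguments above and the hardware terms can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated in the list (64 bytes per transfer, 256 MB per pseudo channel); the function and constant names are hypothetical and are not part of the tutorial's host code or Makefile.

```python
# Illustrative helpers only; names are hypothetical, not taken from the project.
BYTES_PER_TRANSFER = 64   # one transfer is 64 bytes, as stated for txSize=64
PC_SIZE_MB = 256          # one HBM pseudo channel covers 256 MB


def burst_length(tx_size_bytes: int) -> int:
    """Burst length implied by a txSize value (64 -> 1, 128 -> 2, ...)."""
    if tx_size_bytes <= 0 or tx_size_bytes % BYTES_PER_TRANSFER != 0:
        raise ValueError("txSize must be a positive multiple of 64 bytes")
    return tx_size_bytes // BYTES_PER_TRANSFER


def pseudo_channels(dsize_mb: int) -> int:
    """Number of pseudo channels spanned by a dsize value given in MB."""
    if dsize_mb <= 0 or dsize_mb % PC_SIZE_MB != 0:
        raise ValueError("dsize must be a positive multiple of 256 MB")
    return dsize_mb // PC_SIZE_MB


if __name__ == "__main__":
    for tx in (64, 128, 256, 512, 1024):
        print(f"txSize={tx:<5} burst length={burst_length(tx)}")
    for ds in (256, 512, 1024):
        print(f"dsize={ds:<5} pseudo channels={pseudo_channels(ds)}")
```

With these definitions, txSize=1024 maps to a burst length of 16 and dsize=1024 maps to four pseudo channels.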

The above build command will create the xclbin under /build/HBM_addSeq_allBanks_d512_txSize64

Run the following command to generate the builds for txSize values of 64, 128, 256, 512, and 1024 bytes.

make build_without_rama # This command is already executed in the first module

  • If the machine doesn’t have enough resources to launch six jobs in parallel, you can run the jobs one at a time, as shown below.

    make noramajob-64 noramajob-128 noramajob-256 noramajob-512 noramajob-1024

Use the following target to run the application with the builds created above, for txSize values of 64, 128, 256, 512, and 1024 bytes and for 1, 2, and 4 pseudo channels (selected with the dsize argument):

make all_hbm_seq_run
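To make the size of this sweep concrete, the sketch below enumerates every combination of dsize and txSize described above. It only lists the argument pairs; how the all_hbm_seq_run target actually iterates over them is not shown in this section, so the ordering here is an assumption, and the helper name sweep is hypothetical.

```python
from itertools import product

# Values taken from the text: three data sizes (MB) and five transaction sizes (bytes).
DSIZES_MB = (256, 512, 1024)
TX_SIZES_B = (64, 128, 256, 512, 1024)


def sweep():
    """Return one dict per run configuration, dsize-major (ordering is assumed)."""
    return [{"dsize": d, "txSize": t} for d, t in product(DSIZES_MB, TX_SIZES_B)]


if __name__ == "__main__":
    configs = sweep()
    for cfg in configs:
        print(f"dsize={cfg['dsize']} txSize={cfg['txSize']}")
    print(f"{len(configs)} runs in total")
```

Three data sizes times five transaction sizes gives 15 configurations, one per row of the results file.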

The above target generates the output file <Project>/makefile/Run_SequentialAddress.perf with the following data:

Addr Pattern   Total Size(MB) Transaction Size(B) Throughput Achieved(GB/s)

Sequential     256 (M0->PC0)             64                     13.0996
Sequential     256 (M0->PC0)             128                    13.0704
Sequential     256 (M0->PC0)             256                    13.1032
Sequential     256 (M0->PC0)             512                    13.0747
Sequential     256 (M0->PC0)             1024                   13.0432

Sequential     512 (M0->PC0_1)           64                     13.1244
Sequential     512 (M0->PC0_1)           128                    13.1142
Sequential     512 (M0->PC0_1)           256                    13.1285
Sequential     512 (M0->PC0_1)           512                    13.1089
Sequential     512 (M0->PC0_1)           1024                   13.1097

Sequential     1024 (M0->PC0_3)          64                     13.148
Sequential     1024 (M0->PC0_3)          128                    13.1435
Sequential     1024 (M0->PC0_3)          256                    13.1506
Sequential     1024 (M0->PC0_3)          512                    13.1539
Sequential     1024 (M0->PC0_3)          1024                   13.1454

This use case shows the maximum bandwidth achieved when a single kernel master, M0, accesses HBM. The table above lists the measured bandwidth in GB/s.

The top five rows show the point-to-point accesses, i.e., 256 MB accesses, with varying transaction size. The bandwidth is consistently around 13 GB/s.

The next ten rows show groupings of two and four pseudo channels, i.e., 512 MB and 1024 MB, respectively. The bandwidth remains essentially constant.
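One way to check the "constant bandwidth" observation is to group the rows by total size and compare the averages. The following is a minimal sketch that parses the rows exactly as printed in the table above; it assumes whitespace-separated columns in that order, and the function name parse_perf is hypothetical rather than part of the project.

```python
from collections import defaultdict
from statistics import mean

# Rows copied from the Run_SequentialAddress.perf listing above.
PERF_ROWS = """\
Sequential     256 (M0->PC0)             64                     13.0996
Sequential     256 (M0->PC0)             128                    13.0704
Sequential     256 (M0->PC0)             256                    13.1032
Sequential     256 (M0->PC0)             512                    13.0747
Sequential     256 (M0->PC0)             1024                   13.0432
Sequential     512 (M0->PC0_1)           64                     13.1244
Sequential     512 (M0->PC0_1)           128                    13.1142
Sequential     512 (M0->PC0_1)           256                    13.1285
Sequential     512 (M0->PC0_1)           512                    13.1089
Sequential     512 (M0->PC0_1)           1024                   13.1097
Sequential     1024 (M0->PC0_3)          64                     13.148
Sequential     1024 (M0->PC0_3)          128                    13.1435
Sequential     1024 (M0->PC0_3)          256                    13.1506
Sequential     1024 (M0->PC0_3)          512                    13.1539
Sequential     1024 (M0->PC0_3)          1024                   13.1454
"""


def parse_perf(text):
    """Return {total_size_mb: [throughput_gbps, ...]} for the data rows."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Expected fields: pattern, size (MB), mapping, txSize (B), throughput (GB/s)
        if len(fields) != 5 or fields[0] != "Sequential":
            continue
        groups[int(fields[1])].append(float(fields[4]))
    return dict(groups)


if __name__ == "__main__":
    for size_mb, values in sorted(parse_perf(PERF_ROWS).items()):
        spread = max(values) - min(values)
        print(f"{size_mb:>5} MB: mean {mean(values):.4f} GB/s, spread {spread:.4f} GB/s")
```

Across all fifteen rows the measured values differ by roughly 0.11 GB/s, which is consistent with the statement that neither the transaction size nor the number of pseudo channels changes the sequential bandwidth noticeably.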