Sequential Accesses - 2023.1 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID: XD099
Release Date: 2023-08-02
Version: 2023.1 English

In this step, you will first build xclbin files that support transaction sizes of 64, 128, 256, 512, and 1024 bytes. Next, you will explore the achievable bandwidth when accessing a single pseudo channel of HBM (256 MB), two pseudo channels (512 MB), and four pseudo channels (1024 MB).
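As a quick illustration of the sizes listed above, the following Python sketch maps a host data size to the number of 256 MB pseudo channels it spans. It assumes the buffer is contiguous and starts at the base of PC0. The helper name `pseudo_channels_needed` and the label format are illustrative assumptions, not part of the tutorial sources; the labels only mirror the naming used in the results table later in this section.

```python
PC_SIZE_MB = 256  # capacity of one HBM pseudo channel, as stated above


def pseudo_channels_needed(dsize_mb: int) -> int:
    """Number of pseudo channels spanned by a contiguous buffer (hypothetical helper)."""
    if dsize_mb <= 0:
        raise ValueError("dsize must be a positive number of megabytes")
    return -(-dsize_mb // PC_SIZE_MB)  # ceiling division


for dsize in (256, 512, 1024):
    n = pseudo_channels_needed(dsize)
    # Assumed label format: PC0 for one channel, PC0_<last> for a group.
    label = "PC0" if n == 1 else f"PC0_{n - 1}"
    print(f"dsize={dsize:>4} MB -> {n} pseudo channel(s), {label}")
```

With the three data sizes used in this step, the sketch reports one, two, and four pseudo channels.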

Here is an example of building the application with a target in which master 0 (M0) accesses PC0 (do not run this command):

make build TARGET=hw memtype=HBM banks=0_31 dsize=256 addrndm=0  txSize=64 buildxclbin=1

The project lets you configure the application through the following arguments:

  • dsize=256: Accesses only a single pseudo channel, because the data size on the host side is 256 MB.

  • txSize=64: Queues each command from the kernel port as a 64-byte transaction. Because each transfer is 64 bytes, this is equivalent to a burst length of 1; txSize=128 is equivalent to a burst length of 2, and so on.

  • banks=0_31: Configures the kernel’s AXI master ports to connect to all the banks. During the build, the Makefile creates the HBM_connectivity.cfg file in the respective build directory. Refer to mem_connectivity.mk for more information. You can also create custom connectivity by updating the in_M0, in_M1, and out_M2 variables.

  • addrndm=0: Ensures the address generated is sequential when the kernel is run. As seen previously, this is an argument to the kernel passed down from the host code.
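The relationship between txSize and burst length described in the txSize bullet can be sketched in Python as follows. It assumes 64 bytes per transfer, as stated above; the function name `burst_length` is hypothetical and does not appear in the project.

```python
BEAT_BYTES = 64  # bytes moved per transfer, per the txSize description above


def burst_length(tx_size: int) -> int:
    """Burst length implied by a transaction size in bytes (hypothetical helper)."""
    if tx_size <= 0 or tx_size % BEAT_BYTES != 0:
        raise ValueError(f"txSize must be a positive multiple of {BEAT_BYTES} bytes")
    return tx_size // BEAT_BYTES


for tx in (64, 128, 256, 512, 1024):
    print(f"txSize={tx:>4} B -> burst length {burst_length(tx)}")
```

For the five transaction sizes built in this step, the implied burst lengths are 1, 2, 4, 8, and 16.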

The above build command will create the xclbin under <Project>/build/HBM_addSeq_allBanks_d512_txSize64.

You can run the following command to generate the builds for txSize of 64, 128, 256, 512, and 1024 bytes.

make build_without_rama # This command is already executed in the first module

  • If the machine does not have enough resources to launch six jobs in parallel, you can run the builds one by one, as shown below.

    make noramajob-64 noramajob-128 noramajob-256 noramajob-512 noramajob-1024

Use the following command to run the application with the builds created above for txSize values of 64, 128, 256, 512, and 1024 bytes while accessing 1, 2, and 4 pseudo channels (using the dsize argument).

make all_hbm_seq_run

The above target generates the output file <Project>/makefile/Run_SequentialAddress.perf with the following data.

Addr Pattern   Total Size(MB) Transaction Size(B) Throughput Achieved(GB/s)

Sequential     256 (M0->PC0)             64                     13.0996
Sequential     256 (M0->PC0)             128                    13.0704
Sequential     256 (M0->PC0)             256                    13.1032
Sequential     256 (M0->PC0)             512                    13.0747
Sequential     256 (M0->PC0)             1024                   13.0432

Sequential     512 (M0->PC0_1)           64                     13.1244
Sequential     512 (M0->PC0_1)           128                    13.1142
Sequential     512 (M0->PC0_1)           256                    13.1285
Sequential     512 (M0->PC0_1)           512                    13.1089
Sequential     512 (M0->PC0_1)           1024                   13.1097

Sequential     1024 (M0->PC0_3)          64                     13.148
Sequential     1024 (M0->PC0_3)          128                    13.1435
Sequential     1024 (M0->PC0_3)          256                    13.1506
Sequential     1024 (M0->PC0_3)          512                    13.1539
Sequential     1024 (M0->PC0_3)          1024                   13.1454

This use case shows the maximum bandwidth achieved when a single kernel master, M0, accesses HBM. The table above lists the measured bandwidth in GB/s.

The top five rows show point-to-point accesses, i.e., 256 MB accesses to a single pseudo channel, across the range of transaction sizes. The bandwidth is consistently around 13 GB/s.

The next ten rows show groupings of two and four pseudo channels, i.e., 512 MB and 1024 MB, respectively; the bandwidth remains essentially constant at about 13.1 GB/s.
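To check these observations numerically, the rows of the table above can be parsed and summarized. This is a minimal Python sketch: the rows are pasted inline rather than read from Run_SequentialAddress.perf, the parsing assumes the five-column layout shown above, and `parse_rows` is a hypothetical helper, not part of the project.

```python
from statistics import mean

# Data rows copied from the table above.
PERF_ROWS = """\
Sequential     256 (M0->PC0)             64                     13.0996
Sequential     256 (M0->PC0)             128                    13.0704
Sequential     256 (M0->PC0)             256                    13.1032
Sequential     256 (M0->PC0)             512                    13.0747
Sequential     256 (M0->PC0)             1024                   13.0432
Sequential     512 (M0->PC0_1)           64                     13.1244
Sequential     512 (M0->PC0_1)           128                    13.1142
Sequential     512 (M0->PC0_1)           256                    13.1285
Sequential     512 (M0->PC0_1)           512                    13.1089
Sequential     512 (M0->PC0_1)           1024                   13.1097
Sequential     1024 (M0->PC0_3)          64                     13.148
Sequential     1024 (M0->PC0_3)          128                    13.1435
Sequential     1024 (M0->PC0_3)          256                    13.1506
Sequential     1024 (M0->PC0_3)          512                    13.1539
Sequential     1024 (M0->PC0_3)          1024                   13.1454
"""


def parse_rows(text):
    """Return (total_size_mb, tx_size_bytes, throughput_gbps) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only data rows: pattern, size, mapping, txSize, throughput.
        if len(parts) != 5 or parts[0] != "Sequential":
            continue
        rows.append((int(parts[1]), int(parts[3]), float(parts[4])))
    return rows


rows = parse_rows(PERF_ROWS)

# Mean throughput per total data size (256, 512, 1024 MB).
averages = {}
for size in sorted({r[0] for r in rows}):
    averages[size] = mean(r[2] for r in rows if r[0] == size)
    print(f"{size:>5} MB: mean {averages[size]:.4f} GB/s")

# Difference between the best and worst run across the whole table.
throughputs = [r[2] for r in rows]
spread = max(throughputs) - min(throughputs)
print(f"spread across all runs: {spread:.4f} GB/s")
```

The spread between the fastest and slowest run is small relative to the roughly 13 GB/s average, which matches the observation that bandwidth does not depend on transaction size or on the number of pseudo channels for sequential accesses.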