In this section, you will observe the bandwidth achievable with one HBM master port. You will also explore access to a single pseudo channel (PC) or to multiple pseudo channels as the kernel master port issues transactions of various sizes. The experiments vary the following:
The topology: for example, M0 to a single pseudo channel (PC0) directly across the switch, M0 to the group PC0-1, or M0 to the group PC0-3.
The number of bytes per transaction, which varies from 64 bytes to 1024 bytes.
The addressing used: sequential (linear) accesses or random accesses.
The use of the Random Access Memory Attachment (RAMA) IP to achieve better results. The RAMA IP is specifically designed to assist HBM-based designs with non-ideal traffic masters and use cases. For more information, refer to the RAMA LogiCORE IP Product Guide.
By measuring the configurations above, this section gathers enough data for developers to understand the trade-offs and make better decisions for their designs.
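The difference between the two addressing modes can be sketched on the host side as two offset generators. This is only an illustration, not the tutorial's kernel code: the function names, the `range_bytes` and `txn_bytes` parameters, and the choice of `std::mt19937_64` as the random source are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sequential (linear) accesses: each offset advances by one transaction
// and wraps around when the end of the address range is reached.
std::vector<std::uint64_t> sequential_offsets(std::uint64_t range_bytes,
                                              std::uint64_t txn_bytes,
                                              std::size_t count) {
    assert(txn_bytes > 0 && range_bytes >= txn_bytes);
    const std::uint64_t slots = range_bytes / txn_bytes;  // aligned slots in the range
    std::vector<std::uint64_t> offsets;
    offsets.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        offsets.push_back((i % slots) * txn_bytes);
    }
    return offsets;
}

// Random accesses: each offset is a uniformly chosen, transaction-aligned
// slot inside the same address range.
std::vector<std::uint64_t> random_offsets(std::uint64_t range_bytes,
                                          std::uint64_t txn_bytes,
                                          std::size_t count,
                                          std::uint64_t seed) {
    assert(txn_bytes > 0 && range_bytes >= txn_bytes);
    const std::uint64_t slots = range_bytes / txn_bytes;
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::uint64_t> pick(0, slots - 1);
    std::vector<std::uint64_t> offsets;
    offsets.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        offsets.push_back(pick(rng) * txn_bytes);
    }
    return offsets;
}
```

Widening `range_bytes` so that it covers one, two, or four pseudo channels would correspond to the topology variations listed above, while `txn_bytes` would range from 64 to 1024.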
If your application is memory bound, it is always beneficial to access 64 bytes (512 bits) of data at a time, whether the memory is DDR or HBM. For this project, the data width is set to 512 bits by default through the dwidth variable in the Makefile. You can experiment with smaller data widths by changing this variable.
Additionally, the performance measured here is the read performance of the M_AXI interface only; write performance is not measured in this section. Bandwidth is measured with C++ std::chrono by recording the time just before the kernel is enqueued and just after the queue finish command returns, and it is reported in GB/s.
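The measurement described above can be sketched roughly as follows. This is a minimal illustration rather than the tutorial's host code: the helper names `time_seconds` and `bandwidth_gbps` are hypothetical, the use of `std::chrono::steady_clock` is an assumption (the text only says std::chrono), and 1 GB is taken as 10^9 bytes.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Convert bytes moved and elapsed seconds into GB/s (assuming 1 GB = 1e9 bytes).
double bandwidth_gbps(std::uint64_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / seconds / 1e9;
}

// Time an arbitrary callable with a monotonic clock and return elapsed seconds.
// In a host program, the callable would enqueue the kernel and then wait for
// the command queue to finish.
template <typename Fn>
double time_seconds(Fn&& run) {
    const auto start = std::chrono::steady_clock::now();
    run();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Because the timed window includes the enqueue and the wait for completion, the reported figure reflects host-visible bandwidth, which can be somewhat lower than the raw rate seen at the memory interface for short runs.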
The kernel ports in1, in2, and out are connected to all the HBM channels, so each kernel port has access to every channel. An application should implement this connectivity only if it needs to access all the channels. The HBM memory subsystem will attempt to give the kernel the best access to all the connected memories by attaching, say, kernel port in1 to a middle master such as M11 or M12. Even so, the application will experience extra latency when accessing the pseudo channels at the extremes, say PC0 or PC31, from the middle master M12. Because of this, the application may require a larger number of outstanding transactions on the AXI interfaces connected to the kernel ports.
In this module, all the kernel ports are connected to all the pseudo channels for simplicity.
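The link between extra latency and outstanding transactions follows from Little's law: to sustain a given bandwidth, the bytes in flight must equal bandwidth multiplied by latency. The sketch below is a back-of-the-envelope estimate only; the function name is hypothetical, and any bandwidth or latency values passed to it are illustrative inputs, not measurements from this tutorial.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Estimate how many transactions must be in flight to sustain a target
// bandwidth at a given round-trip latency (Little's law).
// GB/s multiplied by ns gives bytes directly, since 1e9 * 1e-9 = 1.
std::uint64_t outstanding_needed(double target_gbps,
                                 double latency_ns,
                                 std::uint64_t txn_bytes) {
    assert(txn_bytes > 0);
    const double bytes_in_flight = target_gbps * latency_ns;
    return static_cast<std::uint64_t>(
        std::ceil(bytes_in_flight / static_cast<double>(txn_bytes)));
}
```

With the same target bandwidth and transaction size, doubling the latency doubles the estimate, which is why a port that reaches distant pseudo channels may need a deeper outstanding-transaction setting.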
Let’s start with the bandwidth experiments that use sequential accesses.