4. Top-level Linking Consideration - 2023.1 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID: XD099
Release Date: 2023-08-02
Version: 2023.1 English

For DDR-based Alveo cards (such as the U200 and U250), a single DDR bank provides approximately 19.2 GB/s of bandwidth. For the 25 Gbps x 4 lane Aurora IP, the unidirectional throughput is about 12.5 GB/s (100 Gbps). When implementing the 25 Gbps lane speed loopback example design, if both the strm_dump and strm_issue kernels access the same DDR bank, their combined traffic of about 25 GB/s exceeds what one bank can deliver, and performance is limited by DDR bandwidth. To achieve the highest performance, connect the AXI masters of the two HLS kernels to different DDR banks. You can control the kernel SLR placement (slr) and memory port mapping (sp) in the v++ linking configuration file. For example, for the U200, add the following lines to the end of the krnl_aurora_test.cfg file:

slr=strm_issue_0:SLR1
slr=strm_dump_0:SLR1
slr=krnl_aurora_0:SLR2
sp=strm_issue_0.m_axi_gmem:DDR[1]
sp=strm_dump_0.m_axi_gmem:DDR[2]
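
The reason for splitting the two kernels across DDR banks can be checked with simple arithmetic. The following is a minimal sketch, not part of the tutorial sources: it takes the approximate figures quoted above, read as gigabytes per second (25 Gbps x 4 lanes / 8 = 12.5 GB/s per direction, 19.2 GB/s per DDR bank), and ignores protocol and encoding overhead. All function and constant names are hypothetical.

```python
# Approximate figures quoted in the text; all names here are illustrative only.
DDR_BANK_BW_GBS = 19.2       # approximate bandwidth of one DDR bank, GB/s
AURORA_LANES = 4             # number of Aurora lanes
AURORA_LANE_RATE_GBPS = 25   # line rate per lane, Gbit/s


def aurora_throughput_gbs(lanes=AURORA_LANES, lane_rate_gbps=AURORA_LANE_RATE_GBPS):
    """Raw unidirectional Aurora throughput in GB/s (overhead ignored)."""
    return lanes * lane_rate_gbps / 8


def bank_demand_gbs(kernels_on_bank):
    """Traffic on one DDR bank when each kernel on it streams at the Aurora rate."""
    return kernels_on_bank * aurora_throughput_gbs()


def bank_is_bottleneck(kernels_on_bank):
    """True if the kernels sharing a bank ask for more than the bank can deliver."""
    return bank_demand_gbs(kernels_on_bank) > DDR_BANK_BW_GBS


if __name__ == "__main__":
    print(aurora_throughput_gbs())   # 12.5
    print(bank_is_bottleneck(2))     # True: 25.0 GB/s demanded of one 19.2 GB/s bank
    print(bank_is_bottleneck(1))     # False: 12.5 GB/s fits within one bank
```

With strm_issue and strm_dump on the same bank, the demand is roughly 25 GB/s against 19.2 GB/s available. With one kernel per bank, as in the configuration above, each bank only has to sustain about 12.5 GB/s.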

With the four adjustments described above applied, a test design implemented on a U200 card achieves about 11.5 GB/s throughput, as shown in the following run log:

$ ./host_krnl_aurora_test

------------------------ krnl_aurora loopback test ------------------------
Transfer size: 100 MB

Generate TX data block.
Program running in hardware mode
Load krnl_aurora_test_hw.xclbin
Create kernels
Create TX and RX device buffer
Transfer TX data into device buffer
Check whether startup status of Aurora kernel is ready...
Aurora kernel startup status is GOOD: 1000111111111
[12]channel_up [11]soft_err [10]hard_err [9]mmcm_not_locked_out [8]gt_pll_lock [7:4]line_up [3:0]gt_powergood
Begin data loopback transfer
Data loopback transfer finish
Transfer time = 8.723 ms
Fetch RX data from device buffer and verification
Data loopback transfer throughput = 11463.9 MB/s
Aurora Error Status:
SOFT_ERR: 0
HARD_ERR: 0

Data verification SUCCEED
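
To make the log easier to read, the sketch below decodes the status word using the bit layout printed in the log and recomputes the throughput from the reported transfer size and time. It is an illustration only, not the host program's actual code: the function names are hypothetical, and it assumes the status string is printed most significant bit first and that throughput is simply size divided by time.

```python
def decode_aurora_status(bits):
    """Decode the 13-bit status string from the log (assumed MSB first)."""
    value = int(bits, 2)
    return {
        "channel_up": (value >> 12) & 0x1,
        "soft_err": (value >> 11) & 0x1,
        "hard_err": (value >> 10) & 0x1,
        "mmcm_not_locked_out": (value >> 9) & 0x1,
        "gt_pll_lock": (value >> 8) & 0x1,
        "line_up": (value >> 4) & 0xF,       # one bit per lane, field name as in the log
        "gt_powergood": value & 0xF,         # one bit per lane
    }


def throughput_mb_per_s(size_mb, time_ms):
    """Throughput in MB/s from a transfer size in MB and a duration in ms."""
    return size_mb / (time_ms / 1000.0)


if __name__ == "__main__":
    status = decode_aurora_status("1000111111111")
    for name, field in status.items():
        print(f"{name}: {field:b}")
    # Values taken from the log above: 100 MB transferred in 8.723 ms.
    print(f"{throughput_mb_per_s(100, 8.723):.1f} MB/s")
```

For the status word in the log, this gives channel_up = 1, gt_pll_lock = 1, all four lane and power-good bits set, and the error and mmcm_not_locked_out bits clear. The recomputed throughput is roughly 11464 MB/s, in line with the 11463.9 MB/s reported; the small difference is consistent with rounding of the printed transfer time.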