Transferring data in bursts hides memory access latency and improves both bandwidth usage and the efficiency of the memory controller. Check the HLS report for burst information.
If burst data transfers occur, the detailed kernel trace will reflect the higher burst rate as a larger burst length number:
In the previous figure, you can also see that the memory data transfers after the AXI interconnect are implemented differently, with a shorter transaction time. If you hover over these transactions, you can see that the AXI interconnect has packed the 16 x 4 byte transaction into a single packed transaction of 1 x 64 bytes. This uses the AXI4 bandwidth more effectively. The next section focuses on this optimization technique in more detail.
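The arithmetic behind the packing described above can be modeled in plain software. The following is only a conceptual sketch: the `Beat64` type and the `pack_words` function are hypothetical names introduced for illustration, and the real packing is done in hardware by the AXI interconnect, not by user code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of one 64-byte (512-bit) data beat.
struct Beat64 {
    std::uint8_t bytes[64];
};

// Sixteen 4-byte words occupy exactly one 64-byte beat.
static_assert(16 * sizeof(std::uint32_t) == sizeof(Beat64),
              "16 x 4 bytes must equal 1 x 64 bytes");

// Software model only: copy sixteen 32-bit words into a single 64-byte beat,
// preserving their order in memory.
Beat64 pack_words(const std::array<std::uint32_t, 16>& words) {
    Beat64 beat;
    std::memcpy(beat.bytes, words.data(), sizeof(beat.bytes));
    return beat;
}
```

The point of the model is that the same 64 bytes move in one transfer instead of sixteen, which is why the packed transaction shows a shorter transaction time.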
Burst inference is heavily dependent on coding style and access pattern. However, you can ease burst detection and improve performance by isolating data transfer and computation, as shown in the following code snippet:
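void kernel(T in[1024], T out[1024]) {
    T tmpIn[1024];           // local copy of the input
    T tmpOut[1024];          // local buffer for the result
    read(in, tmpIn);         // AXI input -> local buffer
    process(tmpIn, tmpOut);  // computation on local buffers only
    write(tmpOut, out);      // local buffer -> AXI output
}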
In short, the function read is responsible for reading from the AXI input to an internal variable (tmpIn). The computation is implemented by the function process working on the internal variables tmpIn and tmpOut. The function write takes the produced output and writes to the AXI output. For more information on burst, see the AXI Burst Transfers in the Vitis HLS User Guide (UG1399).
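As a rough illustration of this structure, the sketch below fills in the three helper functions. Several details are assumptions that do not come from the original snippet: the element type `T` is taken to be `int`, the constant `N` stands in for the literal 1024, and the doubling performed in `process` is a placeholder computation. No HLS pragmas are shown; whether a burst is actually inferred still has to be confirmed in the HLS report.

```cpp
#include <cassert>
#include <cstddef>

typedef int T;                // assumption: the element type is not given in the source
const std::size_t N = 1024;   // matches the array size used in the snippet above

// Copy the AXI-mapped input into a local buffer. One simple loop with
// sequential accesses and no conditional control flow.
void read(const T in[N], T tmpIn[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        tmpIn[i] = in[i];
    }
}

// Placeholder computation (hypothetical): doubles every element.
// It touches only the local buffers, never the AXI interfaces.
void process(const T tmpIn[N], T tmpOut[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        tmpOut[i] = tmpIn[i] * 2;
    }
}

// Copy the local result buffer to the AXI-mapped output, again with one
// simple sequential loop.
void write(const T tmpOut[N], T out[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = tmpOut[i];
    }
}

// Top-level function: data transfer and computation are kept separate.
void kernel(const T in[N], T out[N]) {
    T tmpIn[N];
    T tmpOut[N];
    read(in, tmpIn);
    process(tmpIn, tmpOut);
    write(tmpOut, out);
}
```

Keeping the two copy loops free of data-dependent branches and index arithmetic is the design choice that matters here; any real computation would replace only the body of `process`.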
Isolating the read and write functions from the computation results in:
- Simple control structures (loops) in the read/write functions, which make burst detection simpler.
- Isolation of the computational function from the AXI interfaces, which simplifies potential kernel optimization.
- Internal variables that are mapped to on-chip memory, which allows faster access than AXI transactions. Acceleration platforms supported in the Vitis core development kit can have as much as 10 MB of on-chip memory that can be used as pipes, local memories, and private memories. Using these resources effectively can greatly improve the efficiency and performance of your applications.