Write code in such a way that bursting can be inferred. Ensure that none of the preconditions are violated. Bursting does not mean that you will get all your data in one shot – it is about merging the requests together into one request, but the data will arrive sequentially, one after another.
A burst length of 16 is ideal, but burst lengths of 8 are often enough. Longer bursts incur more latency, while shorter bursts can be pipelined. Do not confuse bursting with pipelining, but note that bursts can be pipelined with other bursts.
If your bursts are of fixed length, you can unroll the inner loop where bursts are inferred and pipeline the outer loop. This will achieve the same burst length, but also pipelining between the bursts to enable higher throughput.
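The fixed-length case can be sketched as follows. This is a minimal illustration, not a reference implementation: the kernel name burst_sum, the constants BURST_LEN and NUM_BURSTS, and the bundle names are assumptions made for the example. The inner loop over one burst is unrolled and the outer loop is pipelined, so that consecutive bursts can overlap. The initiation interval actually achieved depends on the port width and should be checked in the synthesis report.

```cpp
constexpr int BURST_LEN  = 16;  // assumed fixed burst length
constexpr int NUM_BURSTS = 8;   // assumed number of bursts

// Hypothetical kernel: sums each fixed-length burst of BURST_LEN words.
void burst_sum(const int *in, int *out) {
#pragma HLS INTERFACE m_axi port=in  bundle=gmem0
#pragma HLS INTERFACE m_axi port=out bundle=gmem1
outer:
    for (int i = 0; i < NUM_BURSTS; ++i) {
#pragma HLS PIPELINE
        int acc = 0;
    inner:
        for (int j = 0; j < BURST_LEN; ++j) {
#pragma HLS UNROLL
            // Consecutive addresses with a constant trip count: a burst candidate.
            acc += in[i * BURST_LEN + j];
        }
        out[i] = acc;
    }
}
```

A standard C++ compiler ignores the HLS pragmas, so the same function can be exercised in a C simulation test bench before synthesis.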
For greater throughput, focus on widening the interface up to 1024 bits rather than simply achieving longer bursts.
Bigger bursts have higher priority with the AXI interconnect. No dynamic arbitration is done inside the kernel.
You can have two m_axi ports connected to the same DDR memory to model mutually exclusive access inside the kernel, but the AXI interconnect outside the kernel will arbitrate competing requests.
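As a rough sketch of the two-port arrangement, the kernel below places its read and write pointers on separate m_axi bundles. The function name copy_scaled and the bundle names gmem0 and gmem1 are hypothetical. Whether both ports end up on the same DDR bank is decided when the kernel is linked to the platform, not in this source.

```cpp
// Hypothetical kernel: src and dst use separate m_axi ports (bundles),
// so the kernel issues reads and writes independently of each other.
void copy_scaled(const int *src, int *dst, int n, int scale) {
#pragma HLS INTERFACE m_axi port=src bundle=gmem0
#pragma HLS INTERFACE m_axi port=dst bundle=gmem1
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        dst[i] = src[i] * scale;
    }
}
```

If both bundles are mapped to the same memory, any contention between them is resolved by the interconnect outside the kernel rather than by logic inside it.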
One way to get around the out-of-order access restriction is to create your own buffer in block RAM, store the bursts in this buffer, and then perform the out-of-order accesses on the buffer instead of on global memory. This is typically called a line buffer and is a common optimization used in video processing.
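The buffering approach might look like the sketch below. The function name reverse_row, the constant ROW_LEN, and the bundle names are assumptions for illustration. The row is read sequentially into a local array, which allows a burst to be inferred. The out-of-order access (here, a reversal) is applied to the local array, and the result is written back sequentially.

```cpp
constexpr int ROW_LEN = 64;  // assumed row length

// Hypothetical kernel: reverses one row using a local buffer.
void reverse_row(const int *in, int *out) {
#pragma HLS INTERFACE m_axi port=in  bundle=gmem0
#pragma HLS INTERFACE m_axi port=out bundle=gmem1
    int line_buf[ROW_LEN];  // local array, typically mapped to block RAM

read_burst:
    for (int i = 0; i < ROW_LEN; ++i) {
#pragma HLS PIPELINE II=1
        line_buf[i] = in[i];                 // sequential global read
    }

write_burst:
    for (int i = 0; i < ROW_LEN; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = line_buf[ROW_LEN - 1 - i];  // out-of-order access hits the buffer only
    }
}
```

Global memory sees only increasing addresses in both loops. The reordering cost is paid in on-chip memory, where random access is cheap.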
As another alternative, you can consider adding the CACHE pragma or directive, as described in pragma HLS cache, to improve performance of the m_axi interface.
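A possible use of the cache is sketched below for a read pattern that touches each element twice. The function name neighbor_sum is hypothetical, and the lines and depth values are arbitrary placeholders. The option names follow the form given in the pragma HLS cache description, so verify them against the documentation for your tool version.

```cpp
// Hypothetical kernel: each input element is read twice (as in[i] and as in[i + 1]),
// so a small cache on the read port can serve the repeated access.
// The caller must provide at least n + 1 input elements.
void neighbor_sum(const int *in, int *out, int n) {
#pragma HLS INTERFACE m_axi port=in  bundle=gmem0 depth=1024
#pragma HLS INTERFACE m_axi port=out bundle=gmem1 depth=1024
#pragma HLS cache port=in lines=8 depth=128
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] + in[i + 1];
    }
}
```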
Review the Burst Optimization section of the Synthesis Summary report to learn more about burst optimizations in the design, and missed burst opportunities. If automatic burst is not occurring in your design, you might want to use the hls::burst_maxi data type for manual burst, as described in Using Manual Burst.