Interfaces

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2026-03-27
Version
2025.2 English

There are two types of interfaces: windows and streams. Memory access (used for windows stored in memory) provides much higher bandwidth than streams: 2x40 GB/s vs. 2x5 GB/s (@1.25 GHz). However, the high memory bandwidth available to the processor does not remove the need to fill that memory: the data must be written either by another AI Engine (bandwidth 40 GB/s) or by streams (2x5 GB/s). Somewhere in the kernel cascade, the data originates outside the AI Engine array (PL, DDR, etc.), which requires a stream source.
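The bandwidth figures above follow from simple arithmetic on bytes moved per clock cycle. The sketch below reproduces them. The port widths (256 bits for memory access, 32 bits for a stream) are not stated in the text; they are derived here from the quoted numbers (40 GB/s / 1.25 GHz = 32 bytes per cycle, 5 GB/s / 1.25 GHz = 4 bytes per cycle). The function name `peak_bandwidth_gbps` is hypothetical.

```cpp
#include <cassert>

// Peak bandwidth in GB/s for `ports` parallel ports, each moving
// `bits_per_cycle` bits on every clock cycle at `clock_ghz` GHz.
// Port widths passed in are assumptions derived from the quoted figures.
double peak_bandwidth_gbps(int ports, int bits_per_cycle, double clock_ghz) {
    assert(ports > 0 && bits_per_cycle > 0 && clock_ghz > 0.0);
    const double bytes_per_cycle = bits_per_cycle / 8.0;
    return ports * bytes_per_cycle * clock_ghz;
}
```

Under these assumptions, `peak_bandwidth_gbps(2, 256, 1.25)` gives the 2x40 GB/s memory figure and `peak_bandwidth_gbps(2, 32, 1.25)` gives the 2x5 GB/s stream figure.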

Window interfaces are used in a ‘ping-pong’ manner to allow continuous data transfer alongside continuous processing. When multiple kernels map to the same AI Engine and communicate through windows, these windows use a single buffer because the kernels do not run simultaneously. With ping-pong buffering, a kernel starts processing only once a buffer is completely full, so the minimum latency equals the time needed to fill the buffer. When an AI Engine kernel uses window interfaces, it must acquire a lock to gain ownership of the memory. Acquiring and releasing a lock take a minimum of seven cycles per lock, which reduces the time available for processing.
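To make the latency and lock overhead concrete, here is a minimal sketch of the cycle budget for one window. It assumes the seven-cycle minimum per lock quoted above and treats the number of locks per kernel iteration as a parameter. The struct and function names, and the example values in the usage note, are hypothetical.

```cpp
#include <cassert>

// Cycle budget for one kernel iteration on a ping-pong window.
struct WindowBudget {
    double fill_cycles;     // cycles needed to fill one buffer (= minimum latency)
    double lock_cycles;     // cycles spent on lock handling
    double compute_cycles;  // cycles left for actual processing
};

// window_samples   : samples per window
// sample_rate_msps : input sample rate in MSPS
// clock_mhz        : AI Engine clock in MHz
// num_locks        : locks handled per iteration (assumption, set by the caller)
// cycles_per_lock  : minimum cost per lock; 7 is the figure quoted in the text
WindowBudget window_budget(int window_samples, double sample_rate_msps,
                           double clock_mhz, int num_locks,
                           int cycles_per_lock = 7) {
    assert(window_samples > 0 && sample_rate_msps > 0.0 && clock_mhz > 0.0);
    assert(num_locks >= 0 && cycles_per_lock >= 0);
    WindowBudget b;
    b.fill_cycles = window_samples * clock_mhz / sample_rate_msps;
    b.lock_cycles = static_cast<double>(num_locks) * cycles_per_lock;
    b.compute_cycles = b.fill_cycles - b.lock_cycles;
    return b;
}
```

For example, a 256-sample window at 625 MSPS with a 1250 MHz clock fills in 512 cycles. With two locks (say, one input and one output window), at least 14 of those cycles go to lock handling, leaving at most 498 for processing.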

As a rule of thumb, 900 MSPS (@1.25 GHz) is the maximum sample rate for which window interfaces are a viable solution. When kernel processing takes only a fraction of the time needed to fill the input window, the utilization ratio falls below 1, which allows multiple kernels to be mapped onto a single AI Engine.
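The utilization ratio can be sketched as processing time divided by window fill time. The check below uses a simplifying assumption that is not stated in the text: a set of kernels fits on one AI Engine when their ratios sum to at most 1. It ignores lock overhead and scheduling details, and both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Fraction of the window fill time that the kernel spends processing.
double utilization_ratio(double kernel_cycles, double fill_cycles) {
    assert(kernel_cycles >= 0.0 && fill_cycles > 0.0);
    return kernel_cycles / fill_cycles;
}

// Simplified feasibility check (assumption): kernels sharing one AI Engine
// run one after another, so their utilization ratios must sum to at most 1.
bool fits_on_one_engine(const std::vector<double>& ratios) {
    double total = 0.0;
    for (double r : ratios) {
        total += r;
    }
    return total <= 1.0;
}
```

A kernel that needs 128 cycles per 512-cycle window has a ratio of 0.25, so under this model up to four such kernels could share one AI Engine.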

In this tutorial, the goal is the highest-performance filter implementation, which leads to a streaming interface at both the input and the output.