Design Considerations for Graphs Interacting with Programmable Logic - 2024.1 English

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-06-06
Version: 2024.1 English

The AI Engine-ML array consists of AI Engine-ML tiles and, on the last row of the array, AI Engine-ML array interface tiles. The interface tile types include AI Engine-ML-PL and AI Engine-ML-NoC.

To take full advantage of the bandwidth between the AI Engine-ML array and the PL, it is essential to understand the PL interface tile, which interfaces and adapts the signals between the two. The following figure shows an expanded view of a single PL interface tile.

Figure 1. AI Engine-ML-PL Interface Tile

Note: The interface tile supports two clock domains (the AI Engine-ML clock and the PL clock) and provides a predefined number of streaming channels for connecting an AI Engine-ML tile to a specific PL interface tile.

The following figure is a conceptual representation of the AI Engine-PL interface interacting with the PL and the AI Engine tiles:

Figure 2. Conceptual Representation of AIE-PL Interface

Notice the clock domain crossing (CDC) path between the PL and the AI Engine. The latency of this path can vary when the PL frequency or phase changes.

Generally, a higher PL frequency gives both a lower latency in absolute time and a higher throughput (sample rate) for the PL kernels. For low-latency applications and high-speed designs, it is important to plan the PL clocks based on AI Engine-to-PL rate matching and any other design requirements.
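The relationship between PL frequency, absolute latency, and throughput can be sketched with a little arithmetic. The following is a minimal illustration only: the helper names, the four-cycle latency, and the clock frequencies used in the example are assumptions chosen for the calculation, not figures taken from this guide.

```cpp
#include <cstdint>

// Hypothetical helper: convert a latency given in PL clock cycles to picoseconds.
// One cycle at freq_mhz MHz lasts 1,000,000 / freq_mhz picoseconds.
constexpr std::uint64_t cycles_to_ps(std::uint64_t cycles, std::uint64_t freq_mhz) {
    return cycles * 1'000'000ULL / freq_mhz;
}

// Hypothetical helper: peak throughput of a stream port in megabytes per second,
// assuming one beat of width_bits is transferred on every clock cycle.
constexpr std::uint64_t peak_mbytes_per_s(std::uint64_t width_bits, std::uint64_t freq_mhz) {
    return (width_bits / 8) * freq_mhz;
}
```

Under these assumptions, a path that takes four PL cycles costs cycles_to_ps(4, 250) = 16000 ps at 250 MHz but only cycles_to_ps(4, 500) = 8000 ps at 500 MHz. For rate matching, assume a 32-bit stream clocked at 1000 MHz on the AI Engine side: peak_mbytes_per_s(32, 1000), peak_mbytes_per_s(64, 500), and peak_mbytes_per_s(128, 250) all evaluate to 4000 MB/s, so a wider PL port can match the same data rate at a lower PL clock.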

When profiling with the event APIs, the probing points are inside the AXI4-Stream switch box of the AI Engine-PL interface. However, when using the --debug.aie.chipscope option of v++, the ILA probing points are on the PL wrapper logic. As a result, the latency of the AI Engine graph measured by the two methods differs by multiple cycles.

The path inside the AI Engine-PL interface, including the AXI4-Stream switch box, is also capable of buffering. The tready signal from the AI Engine is asserted once the device is booted, that is, even before the host code runs the AI Engine graph. So, if a PL kernel starts transferring data to the AI Engine before the graph is running, the data fills all the buffers inside the AI Engine-PL interface until back pressure occurs.
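The buffering behavior can be pictured with a toy model. The sketch below is not a description of the actual hardware: InterfaceFifo, its depth parameter, and beats_accepted_before_graph_run are hypothetical names, and the real buffer depth of the interface is not stated here. It only shows that a PL kernel sending data before the graph runs is accepted until the buffers are full, after which tready drops and the kernel stalls.

```cpp
#include <cstdint>

// Toy model of the buffering in the PL-to-AI Engine path (an assumption, not real hardware).
struct InterfaceFifo {
    std::uint32_t depth;      // hypothetical total buffer depth, in stream beats
    std::uint32_t occupancy;  // beats currently held in the buffers

    // tready stays high while at least one buffer slot is free.
    constexpr bool tready() const { return occupancy < depth; }

    // The PL kernel offers one beat; it is accepted only while tready is high.
    constexpr bool push() {
        if (!tready()) return false;
        ++occupancy;
        return true;
    }

    // The AI Engine graph drains one beat; this happens only once the graph is running.
    constexpr bool pop() {
        if (occupancy == 0) return false;
        --occupancy;
        return true;
    }
};

// Count how many beats are accepted if the PL kernel tries to send
// beats_to_send beats before the graph has started (so nothing is drained).
constexpr std::uint32_t beats_accepted_before_graph_run(std::uint32_t depth,
                                                        std::uint32_t beats_to_send) {
    InterfaceFifo fifo{depth, 0};
    std::uint32_t accepted = 0;
    for (std::uint32_t i = 0; i < beats_to_send; ++i) {
        if (!fifo.push()) break;  // back pressure: tready is low, the PL kernel stalls
        ++accepted;
    }
    return accepted;
}
```

With an assumed depth of 16 beats, a kernel that tries to send 100 beats before the graph runs has only 16 accepted. The remaining beats wait on the PL side until the graph starts consuming data.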