The AI Engine array is made up of AI Engine tiles and AI Engine array interface tiles on the last row of the array. The types of interface tiles include AI Engine-PL and AI Engine-NoC.
Knowledge of the PL interface tile, which interfaces and adapts the signals between the AI Engines and the PL region, is essential to take full advantage of the bandwidth between AI Engines and the PL. The following figure shows an expanded view of a single PL interface tile.
Following is a conceptual representation of the AI Engine-PL interface interacting with PL and AI Engine tiles:
Notice the clock domain crossing (CDC) path between the PL and the AI Engine. The latency of this path can vary when the PL frequency or phase changes.
Generally, the higher the PL frequency, the lower the latency in absolute time and the higher the throughput, or sample rate, of the PL kernels. For low-latency applications and high-speed designs, plan the PL clocks based on AI Engine-to-PL rate matching and any other requirements.
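The relationship between PL frequency, absolute latency, and sample rate can be sketched with simple arithmetic. The following is a minimal illustration only. It assumes that the number of PL clock cycles spent in a path stays fixed as the frequency changes, and the function names, cycle counts, and frequencies are hypothetical values chosen for the example, not figures reported by the tools.

```cpp
#include <cassert>

// Convert a latency given in PL clock cycles to nanoseconds.
// ASSUMPTION: pl_cycles is a fixed cycle count for the path of interest;
// the value is illustrative and does not come from any tool report.
double pl_latency_ns(double pl_cycles, double pl_freq_mhz) {
    assert(pl_freq_mhz > 0.0);
    // One cycle lasts (1000 / f_MHz) nanoseconds.
    return pl_cycles * 1000.0 / pl_freq_mhz;
}

// Peak sample rate, in mega-samples per second, of a PL kernel that
// transfers samples_per_cycle samples on every PL clock cycle.
// ASSUMPTION: the kernel never stalls, so this is an upper bound.
double pl_throughput_msps(double samples_per_cycle, double pl_freq_mhz) {
    assert(samples_per_cycle >= 0.0 && pl_freq_mhz > 0.0);
    return samples_per_cycle * pl_freq_mhz;
}
```

Under these assumptions, a path that takes 10 PL cycles costs 40 ns at 250 MHz and 20 ns at 500 MHz, while the peak sample rate of a kernel that moves one sample per cycle doubles from 250 Msps to 500 Msps.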
When you use event APIs for profiling, the probing points are inside the AXI4-Stream switch box of the AI Engine-PL interface. However, if you use the --debug.aie.chipscope option of v++, the ILA probing points are on the PL wrapper logic. As a result, latency measurements of the AI Engine graph taken with the two methods differ by multiple cycles.
The path inside the AI Engine-PL interface, including the AXI4-Stream switch box, is also capable of buffering. The tready signal from the AI Engine is asserted after the device is booted, that is, even before the host code runs the AI Engine graph. So, if a PL kernel starts transferring data to the AI Engine, the data fills all the buffers inside the AI Engine-PL interface until back pressure occurs.
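The buffering behavior described above can be pictured with a toy model. The following sketch is not a model of the actual hardware: the struct name, its members, and the buffer depth are all hypothetical, and the real depth of the interface path is not stated here. It only illustrates that tready stays high until the buffers are full, even though nothing is consuming data because the graph has not started.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the buffering in the AI Engine-PL interface path.
// ASSUMPTION: capacity_words is a made-up depth used for illustration only.
struct InterfaceBufferModel {
    std::size_t capacity_words;
    std::size_t occupancy;
    bool graph_running;  // false until the host code runs the graph

    explicit InterfaceBufferModel(std::size_t capacity)
        : capacity_words(capacity), occupancy(0), graph_running(false) {}

    // tready is high whenever there is free buffer space,
    // regardless of whether the graph is running.
    bool tready() const { return occupancy < capacity_words; }

    // One PL cycle with tvalid asserted; returns true if a word is accepted.
    bool pl_write() {
        if (!tready()) return false;  // back pressure: the PL kernel stalls
        ++occupancy;
        return true;
    }

    // The graph drains one word, but only once it is running.
    bool aie_read() {
        if (!graph_running || occupancy == 0) return false;
        --occupancy;
        return true;
    }
};

// Count how many words the PL kernel can push before it is stalled.
std::size_t words_accepted_before_stall(InterfaceBufferModel& model,
                                        std::size_t attempts) {
    std::size_t accepted = 0;
    for (std::size_t i = 0; i < attempts; ++i) {
        if (!model.pl_write()) break;
        ++accepted;
    }
    return accepted;
}
```

With a hypothetical depth of 8 words, a PL kernel that starts before the graph gets exactly 8 words accepted and then sees tready deasserted. Transfers resume only after graph_running is set and aie_read frees space. One practical consequence is that a latency measurement started at the first PL transfer may include time the data spent waiting in these buffers.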