The AI Engine-ML array interface consists of the PL and NoC interface tiles. These tiles manage the following two high-performance interfaces:
- AI Engine-ML to PL
- AI Engine-ML to NoC
The following image shows the AI Engine-ML array interface structure.
One AI Engine-ML to PL interface tile contains eight streams from the PL to the AI Engine-ML and six streams from the AI Engine-ML to the PL. The following table shows one AI Engine-ML to PL interface tile capacity.
Connection Type | Number of Connections | Data Width (bits) | Clock Domain | Bandwidth per Connection (GB/s) | Aggregate Bandwidth (GB/s) |
---|---|---|---|---|---|
PL to AI Engine-ML array interface | 8 | 64 | PL (500 MHz) | 4 | 32 |
AI Engine-ML array interface to PL | 6 | 64 | PL (500 MHz) | 4 | 24 |
The exact number of PL and NoC interface tiles is device-specific. For example, in the XCVE2802 device, there are 38 columns of AI Engine-ML array interface tiles. However, only 28 array interface tiles are available to the PL interface. Therefore, the aggregate bandwidth for the PL interface is approximately:
- 24 GB/s * 28 = 0.672 TB/s from AI Engine-ML to PL
- 32 GB/s * 28 = 0.896 TB/s from PL to AI Engine-ML
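The figures above follow from the stream width and the PL clock. The following is a minimal sketch of that arithmetic, not part of any AMD tool or API; the function names `pl_stream_gbps` and `pl_aggregate_gbps` are hypothetical, and 1 GB/s is taken as 1000 MB/s, as in the tables.

```cpp
#include <cassert>

// Bandwidth of one stream in GB/s:
// (width in bits / 8) bytes per cycle * clock in MHz / 1000.
inline double pl_stream_gbps(unsigned width_bits, double clock_mhz) {
    assert(width_bits % 8 == 0 && clock_mhz > 0.0);
    return (width_bits / 8.0) * clock_mhz / 1000.0;
}

// Aggregate bandwidth in one direction across several interface tiles.
// tiles and streams_per_tile are device-specific inputs (hypothetical
// parameter names, values taken from the text and table above).
inline double pl_aggregate_gbps(unsigned tiles, unsigned streams_per_tile,
                                unsigned width_bits, double clock_mhz) {
    return tiles * streams_per_tile * pl_stream_gbps(width_bits, clock_mhz);
}
```

With the XCVE2802 values quoted above, `pl_aggregate_gbps(28, 8, 64, 500.0)` gives 896 GB/s from PL to AI Engine-ML and `pl_aggregate_gbps(28, 6, 64, 500.0)` gives 672 GB/s in the opposite direction.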
The input_gmio/output_gmio attribute uses DMA in the AI Engine-ML to NoC interface tile. The DMA has two 32-bit incoming streams from the AI Engine-ML and two 32-bit streams to the AI Engine-ML. In addition, it has one 128-bit memory mapped AXI master interface to the NoC NMU. The performance of one AI Engine-ML to NoC interface tile is shown in the following table.
Connection Type | Number of connections | Bandwidth per connection (GB/s) | Aggregate Bandwidth (GB/s) |
---|---|---|---|
AI Engine-ML to DMA | 2 | 4 | 8 |
DMA to NoC | 1 | 16 | 16 |
DMA to AI Engine-ML | 2 | 4 | 8 |
NoC to DMA | 1 | 16 | 16 |
The exact number of AI Engine-ML to NoC interface tiles is device-specific. For example, in the XCVE2802 device, there are 12 AI Engine-ML to NoC interface tiles. So, the aggregate bandwidth for the NoC interface is approximately:
- 8 GB/s * 12 = 96 GB/s from AI Engine-ML to NoC
- 8 GB/s * 12 = 96 GB/s from NoC to AI Engine-ML
- 3733 Mb/s * 32 bit * 4 DDRMCs / 8 = 59.728 GB/s theoretical DDR memory bandwidth across four DDR memory controllers (DDRMCs)
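The arithmetic in the list above can be sketched as follows. This is an illustration only: the function names are hypothetical, the last list item is read here as the theoretical DDR bandwidth of four DDRMCs, and taking the minimum of the stream-side and DDR-side figures as an upper bound for GMIO traffic to DDR is an assumption rather than a statement from the product guide. Values are in MB/s (1 GB/s = 1000 MB/s) so the arithmetic stays exact.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Theoretical DDR bandwidth in MB/s:
// per-pin data rate (Mb/s) * bus width (bits) * controllers / 8 bits per byte.
inline std::uint64_t ddr_peak_mbytes_per_s(std::uint64_t rate_mbit_per_pin,
                                           std::uint64_t bus_bits,
                                           std::uint64_t controllers) {
    assert(bus_bits > 0 && controllers > 0);
    return rate_mbit_per_pin * bus_bits * controllers / 8;
}

// Aggregate stream-side bandwidth in MB/s over all NoC interface tiles,
// in one direction.
inline std::uint64_t noc_stream_mbytes_per_s(std::uint64_t tiles,
                                             std::uint64_t mbytes_per_tile) {
    return tiles * mbytes_per_tile;
}

// Assumed upper bound for GMIO traffic to or from DDR: the slower of the
// two stages. Real throughput is lower (NoC lanes, QoS, access pattern).
inline std::uint64_t gmio_ddr_upper_bound(std::uint64_t stream_side,
                                          std::uint64_t ddr_side) {
    return std::min(stream_side, ddr_side);
}
```

For the XCVE2802 numbers in the text, `noc_stream_mbytes_per_s(12, 8000)` is 96000 MB/s and `ddr_peak_mbytes_per_s(3733, 32, 4)` is 59728 MB/s, so under the stated assumption the DDR side is the tighter of the two limits.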
The performance of input_gmio/output_gmio accessing DDR memory through the NoC is further restricted by the NoC lane number in the horizontal and vertical NoC, inter-NoC configurations, and QoS. Note that DDR memory read and write efficiency is largely affected by the access pattern and other overheads. For more information about the NoC, memory controller use, and performance numbers, see the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
For a single connection from the AI Engine-ML or to the AI Engine-ML, both input_plio/output_plio and input_gmio/output_gmio have a hard bandwidth limit of 4 GB/s. Some advantages and disadvantages for choosing input_plio/output_plio or input_gmio/output_gmio are shown in the following table.
 | input_plio/output_plio | input_gmio/output_gmio |
---|---|---|
Advantages | | |
Disadvantages | | |
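Because each PLIO or GMIO connection is capped at 4 GB/s, a design that needs more bandwidth has to spread the data over several connections. The following is a minimal sizing sketch under that assumption; the names `kMaxMBytesPerConnection`, `connections_needed`, and `tiles_needed` are hypothetical, and the results are lower bounds that ignore NoC, DDR, and PL clock effects.

```cpp
#include <cassert>
#include <cstdint>

// Hard limit per connection from the text: 4 GB/s, expressed in MB/s.
constexpr std::uint64_t kMaxMBytesPerConnection = 4000;

// Minimum number of connections needed to carry a target bandwidth
// (ceiling division by the per-connection limit).
inline std::uint64_t connections_needed(std::uint64_t target_mbytes_per_s) {
    return (target_mbytes_per_s + kMaxMBytesPerConnection - 1)
           / kMaxMBytesPerConnection;
}

// Minimum number of interface tiles needed to host those connections,
// given how many streams one tile offers in that direction.
inline std::uint64_t tiles_needed(std::uint64_t connections,
                                  std::uint64_t streams_per_tile) {
    assert(streams_per_tile > 0);
    return (connections + streams_per_tile - 1) / streams_per_tile;
}
```

For example, a 10 GB/s input requirement needs at least `connections_needed(10000)`, that is three connections, which fit within the eight PL to AI Engine-ML streams of a single PL interface tile.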