The AI Engine-ML array interface consists of PL interface tiles and NoC interface tiles. These tiles manage the following two high-performance interfaces:
- AI Engine-ML to PL
- AI Engine-ML to NoC
The following image shows the AI Engine-ML array interface structure.
The following figure shows the AIE-ML v2 array interface structure.
One AI Engine-ML to PL interface tile contains eight streams from the PL to the AI Engine-ML and six streams from the AI Engine-ML to the PL. The following table shows one AI Engine-ML to PL interface tile capacity.
| Connection Type | Number of Connections | Data Width (bits) | Clock Domain | Bandwidth per Connection (GB/s) | Aggregate Bandwidth (GB/s) |
|---|---|---|---|---|---|
| PL to AI Engine-ML array interface | 8 | 64 | PL (500 MHz) | 4 | 32 |
| AI Engine-ML array interface to PL | 6 | 64 | PL (500 MHz) | 4 | 24 |
The exact number of PL and NoC interface tiles is device-specific. For example, in the XCVE2802 device, there are 38 columns of AI Engine-ML array interface tiles. However, only 28 array interface tiles are available to the PL interface. Therefore, the aggregate bandwidth for the PL interface is approximately:
- 24 GB/s * 28 = 0.672 TB/s from AI Engine-ML to PL
- 32 GB/s * 28 = 0.896 TB/s from PL to AI Engine-ML
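The PL interface figures above can be reproduced from the data width, the PL clock, and the tile count. The following is a minimal sketch rather than a vendor tool: the function and constant names are hypothetical, the 500 MHz clock and 64-bit width come from the table, and 1 GB is taken as 10^9 bytes.

```python
# Values taken from the AI Engine-ML to PL interface tile table.
PL_CLOCK_HZ = 500e6        # PL clock domain (500 MHz)
STREAM_WIDTH_BITS = 64     # data width of each PL stream
PL_TO_AIE_STREAMS = 8      # streams per tile, PL to AI Engine-ML
AIE_TO_PL_STREAMS = 6      # streams per tile, AI Engine-ML to PL


def stream_bandwidth_gbps(width_bits, clock_hz):
    """Peak bandwidth of one stream in GB/s (1 GB = 1e9 bytes)."""
    return width_bits / 8 * clock_hz / 1e9


def device_bandwidth_gbps(streams_per_tile, pl_tiles):
    """Peak one-direction bandwidth across all PL-capable interface tiles."""
    per_stream = stream_bandwidth_gbps(STREAM_WIDTH_BITS, PL_CLOCK_HZ)
    return per_stream * streams_per_tile * pl_tiles


if __name__ == "__main__":
    tiles = 28  # PL-capable interface tiles in the XCVE2802 example
    print("per stream (GB/s):", stream_bandwidth_gbps(STREAM_WIDTH_BITS, PL_CLOCK_HZ))
    print("PL to AI Engine-ML (GB/s):", device_bandwidth_gbps(PL_TO_AIE_STREAMS, tiles))
    print("AI Engine-ML to PL (GB/s):", device_bandwidth_gbps(AIE_TO_PL_STREAMS, tiles))
```

These are peak values. A PL clock lower than 500 MHz scales each result proportionally.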
The input_gmio/output_gmio attribute uses DMA in the AI Engine-ML to NoC interface tile. For AI Engine-ML, the DMA has two 32-bit incoming streams from the AI Engine-ML and two 32-bit streams to the AI Engine-ML. In addition, it has one 128-bit memory mapped AXI master interface to the NoC NMU.
For AI Engine-ML v2, the DMA has two 64-bit incoming streams from the AI Engine-ML and two 64-bit streams to the AI Engine-ML. In addition, it has two 128-bit memory mapped AXI master interfaces to the NoC NMU.
The performance of one AI Engine-ML to NoC interface tile is shown in the following table.
| Connection Type | Number of connections | Bandwidth per connection (GB/s) | Aggregate Bandwidth (GB/s) |
|---|---|---|---|
| AI Engine-ML to DMA | 2 | 4 | 8 |
| DMA to NoC | 1 | 16 | 16 |
| DMA to AI Engine-ML | 2 | 4 | 8 |
| NoC to DMA | 1 | 16 | 16 |
The exact number of AI Engine-ML to NoC interface tiles is device-specific. For example, in the XCVE2802 device, there are 12 AI Engine-ML to NoC interface tiles. So, the aggregate bandwidth for the NoC interface is approximately:
- 8 GB/s * 12 = 96 GB/s from AI Engine-ML to NoC
- 8 GB/s * 12 = 96 GB/s from NoC to AI Engine-ML
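The 96 GB/s figure above scales the stream side of one interface tile (two streams at 4 GB/s) by the number of tiles. The sketch below makes that explicit under one assumption not stated in the text: that the one-direction bandwidth of a tile is bounded by the slower of its stream side and its NoC side. All names are hypothetical.

```python
# Per-tile figures from the AI Engine-ML to NoC interface tile table (GB/s).
STREAM_SIDE_GBPS = 2 * 4   # two DMA streams at 4 GB/s each
NOC_SIDE_GBPS = 1 * 16     # one DMA-to-NoC connection at 16 GB/s


def noc_tile_bandwidth_gbps(stream_side=STREAM_SIDE_GBPS, noc_side=NOC_SIDE_GBPS):
    """One-direction bandwidth of a tile, assumed limited by its slower side."""
    return min(stream_side, noc_side)


def noc_device_bandwidth_gbps(noc_tiles):
    """Peak one-direction bandwidth across all AI Engine-ML to NoC tiles."""
    return noc_tile_bandwidth_gbps() * noc_tiles


if __name__ == "__main__":
    # 12 AI Engine-ML to NoC interface tiles in the XCVE2802 example.
    print("NoC interface aggregate (GB/s):", noc_device_bandwidth_gbps(12))
```

With the table values, the stream side (8 GB/s) is the smaller term, which matches the per-tile number used in the calculation above.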
The performance of one AI Engine-ML v2 to NoC interface tile is shown in the following table.
| Connection Type | Number of connections | Bandwidth per connection (GB/s) | Aggregate Bandwidth (GB/s) |
|---|---|---|---|
| AI Engine-ML to NoC | 2 | 16 | 32 |
| NoC to AI Engine-ML | 2 | 16 | 32 |
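When input_gmio/output_gmio accesses DDR memory, the DDR memory controllers (DDRMCs) also limit the achievable bandwidth. For example, four 32-bit DDRMCs running at 3733 Mb/s give a peak bandwidth of:
- 3733 Mb/s * 32 bit * 4 DDRMCs / 8 = 59.728 GB/s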
The performance of input_gmio/output_gmio accessing DDR memory through the NoC is further restricted by the NoC lane number in the horizontal and vertical NoC, inter-NoC configurations, and QoS.
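The DDR figure above is the per-pin data rate multiplied by the interface width and the number of DDRMCs, then converted from bits to bytes. A small sketch of that arithmetic follows; the function name and parameters are hypothetical, and the result is a theoretical peak that ignores the NoC lane, inter-NoC, and QoS restrictions just mentioned.

```python
def ddr_peak_bandwidth_gbps(data_rate_mbps, width_bits, num_ddrmc):
    """Theoretical peak DDR bandwidth in GB/s.

    data_rate_mbps: per-pin data rate in Mb/s (for example, 3733)
    width_bits:     data width of each DDRMC interface in bits
    num_ddrmc:      number of DDR memory controllers in use
    """
    total_mbps = data_rate_mbps * width_bits * num_ddrmc  # megabits per second
    return total_mbps / 8 / 1000                          # Mb/s -> MB/s -> GB/s


if __name__ == "__main__":
    # Values from the example: 3733 Mb/s, 32-bit interfaces, 4 DDRMCs.
    print("DDR peak (GB/s):", ddr_peak_bandwidth_gbps(3733, 32, 4))
```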
For a single connection from the AI Engine-ML or to the AI Engine-ML, both input_plio/output_plio and input_gmio/output_gmio have a hard bandwidth limit of 4 GB/s.
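Because each connection is capped at 4 GB/s, a design that needs more throughput has to spread its data over several ports. The following sketch estimates a lower bound on the port count for a target bandwidth. The helper name is hypothetical, and a real design may need more ports once clock frequency, NoC, and DDR limits are taken into account.

```python
import math

PER_CONNECTION_LIMIT_GBPS = 4.0  # hard limit per PLIO or GMIO connection


def min_ports(required_gbps, per_port_gbps=PER_CONNECTION_LIMIT_GBPS):
    """Smallest number of connections whose combined peak covers required_gbps."""
    if required_gbps <= 0:
        raise ValueError("required_gbps must be positive")
    return math.ceil(required_gbps / per_port_gbps)


if __name__ == "__main__":
    for target in (4.0, 10.0, 24.0):
        print(f"{target} GB/s needs at least {min_ports(target)} connection(s)")
```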
Some advantages and disadvantages for choosing input_plio/output_plio or input_gmio/output_gmio are shown in the following table.
| | input_plio/output_plio | input_gmio/output_gmio |
|---|---|---|
| Advantages | | |
| Disadvantages | | |