Profiling for Memory Modules - 2024.1 English

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2024-06-27
Version: 2024.1 English

The following tables list the predefined metric sets available for memory modules. In the xrt.ini file, specify each metric set name in lower case and assign it to one of the following metric selectors:

  • tile_based_aie_memory_metrics
  • graph_based_aie_memory_metrics
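
As a minimal sketch of the lower-case naming rule above, the following Python helper normalizes a metric set name and checks it against the metric sets described in this section. The names `KNOWN_MEMORY_METRIC_SETS` and `normalize_metric_set` are hypothetical and are not part of XRT; the set lists only the names that appear on this page.

```python
# Hypothetical helper, not part of XRT: normalizes a memory module metric
# set name before it is written into xrt.ini.

# Metric sets described in this section (an assumption: other releases or
# device families may define additional sets).
KNOWN_MEMORY_METRIC_SETS = {
    "conflicts",
    "dma_locks",
    "dma_stalls_s2mm",
    "dma_stalls_mm2s",
    "s2mm_throughputs",
    "mm2s_throughputs",
}


def normalize_metric_set(name: str) -> str:
    """Return the lower-case metric set name, or raise ValueError if unknown."""
    lowered = name.strip().lower()
    if lowered not in KNOWN_MEMORY_METRIC_SETS:
        raise ValueError(f"unknown memory module metric set: {name!r}")
    return lowered
```

For example, `normalize_metric_set("DMA_Locks")` returns `dma_locks`, which is the form expected in the xrt.ini file.
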

Table 1. conflicts

| Metric Name | Description |
|---|---|
| Memory Conflict | The time taken due to data memory conflicts on any of the 8 banks of the memory module. Note: The hardware view is 8 banks of 128-bit width. The software view is 4 banks of 256-bit width. |
| Cumulative Memory Errors | The time taken due to ECC errors in any of the Data Memory banks, as well as the 2x MM2S and the 2x S2MM DMAs. |

Memory conflicts happen when two memory chunks reside in the same memory bank and are accessed at the same time, either by the same AI Engine (using the two read ports) or by two different AI Engines. A potential solution is to constrain the locations of these memories to different banks. To find out which bank is causing the conflicts, analyze the events from an AI Engine simulation.

Table 2. dma_locks

| Metric Name | Description |
|---|---|
| Cumulative DMA Activity | The time taken due to stalled lock acquires on both the MM2S and S2MM channels of the DMA. |
| Cumulative DMA Lock Count | The lock stall count on the DMA channels. |

The four DMA channels (2x S2MM and 2x MM2S) are driven by buffer descriptors. Cumulative DMA Activity is the total time taken by stalled lock acquire events across all channels. Together, these DMA events help you understand why some connections through the device are slower than expected.

Table 3. dma_stalls_s2mm

| Metric Name | Description |
|---|---|
| S2MM Channel 0 Stalls | The time S2MM channel 0 is stalled on lock acquire. |
| S2MM Channel 1 Stalls | The time S2MM channel 1 is stalled on lock acquire. |

Table 4. dma_stalls_mm2s

| Metric Name | Description |
|---|---|
| MM2S Channel 0 Stalls | The time MM2S channel 0 is stalled on lock acquire. |
| MM2S Channel 1 Stalls | The time MM2S channel 1 is stalled on lock acquire. |
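
When comparing the per-channel stall metrics above, it can help to express each stall time as a share of the profiling window and to pick out the worst channel. The following is a minimal sketch under stated assumptions: the stall times have already been extracted from the profiling output, the stall time and the window length use the same unit, and the function names `stall_percent` and `most_stalled` are hypothetical rather than part of any tool.

```python
def stall_percent(stall_time: float, window_time: float) -> float:
    """Return stall time as a percentage of the observation window.

    Assumes both arguments are expressed in the same unit.
    """
    if window_time <= 0:
        raise ValueError("window_time must be positive")
    if stall_time < 0:
        raise ValueError("stall_time must be non-negative")
    return 100.0 * stall_time / window_time


def most_stalled(stalls: dict) -> str:
    """Return the channel label with the largest stall time."""
    if not stalls:
        raise ValueError("stalls must not be empty")
    return max(stalls, key=stalls.get)
```

A call such as `most_stalled({"S2MM 0": 10, "S2MM 1": 30, "MM2S 0": 5, "MM2S 1": 0})` picks `"S2MM 1"`, the channel that spent the most time waiting on a lock acquire.
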

Each AI Engine memory module contains two stream to memory map (S2MM) DMA channels and two memory map to stream (MM2S) DMA channels. The s2mm_throughputs and mm2s_throughputs metrics profile the throughput of the S2MM and MM2S DMA channels, respectively.
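
For reference when reading the throughput columns, throughput is the amount of data moved divided by the time taken. The sketch below shows that arithmetic only; it is not how the tools compute the reported values. It assumes that MB means 10^6 bytes (this section does not say whether the unit is decimal or binary), and the function name `throughput_mb_per_s` is hypothetical.

```python
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def throughput_mb_per_s(bytes_transferred: int, elapsed_seconds: float) -> float:
    """Return throughput in MB/s for a transfer of the given size and duration."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if bytes_transferred < 0:
        raise ValueError("bytes_transferred must be non-negative")
    return bytes_transferred / elapsed_seconds / BYTES_PER_MB
```

Under these assumptions, 4,000,000 bytes moved in 2 seconds corresponds to 2.0 MB/s.
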

Table 5. s2mm_throughputs of AI Engine

| Metric Name | Description |
|---|---|
| DMA S2MM Channel 0 BD Packet Count | The number of BD packets written over DMA S2MM channel 0. |
| DMA S2MM Channel 1 BD Packet Count | The number of BD packets written over DMA S2MM channel 1. |
| DMA S2MM Channel 0 Throughput (MB/s) | The throughput of DMA S2MM channel 0. |
| DMA S2MM Channel 1 Throughput (MB/s) | The throughput of DMA S2MM channel 1. |

The write_throughputs metric set is deprecated; use s2mm_throughputs instead.

Table 6. mm2s_throughputs of AI Engine

| Metric Name | Description |
|---|---|
| DMA MM2S Channel 0 BD Packet Count | The number of BD packets written over DMA MM2S channel 0. |
| DMA MM2S Channel 1 BD Packet Count | The number of BD packets written over DMA MM2S channel 1. |
| DMA MM2S Channel 0 Throughput (MB/s) | The throughput of DMA MM2S channel 0. |
| DMA MM2S Channel 1 Throughput (MB/s) | The throughput of DMA MM2S channel 1. |

The read_throughputs metric set is deprecated; use mm2s_throughputs instead.
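
When updating older configuration files, the two deprecated names can be mapped to their replacements mechanically. The following is a minimal sketch of that mapping; the dictionary `DEPRECATED_METRIC_SETS` and the function `replace_deprecated` are hypothetical names introduced for illustration and are not part of XRT.

```python
# Deprecated metric set names and their replacements, as stated in this section.
DEPRECATED_METRIC_SETS = {
    "write_throughputs": "s2mm_throughputs",
    "read_throughputs": "mm2s_throughputs",
}


def replace_deprecated(name: str) -> str:
    """Return the current metric set name for a possibly deprecated one.

    Names that are not deprecated are returned unchanged (in lower case).
    """
    lowered = name.strip().lower()
    return DEPRECATED_METRIC_SETS.get(lowered, lowered)
```

Names that are already current, such as `dma_locks`, pass through unchanged, so the helper can be applied to every metric set name in a file.
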

Table 7. s2mm_throughputs of AI Engine-ML

| Metric Name | Description |
|---|---|
| DMA S2MM Channel Lock Stall Time/% | The time DMA S2MM channel 0 is stalled on a lock. This must be used with s2mm_throughputs in an AI Engine module. |
| DMA S2MM Channel Lock Stall Time/% | The time DMA S2MM channel 1 is stalled on a lock. This must be used with s2mm_throughputs in an AI Engine module. |
| DMA S2MM Memory Backpressure Time/% | The time DMA S2MM channel 0 is inactive due to memory backpressure. This must be used with s2mm_throughputs in an AI Engine module. |
| DMA S2MM Memory Backpressure Time/% | The time DMA S2MM channel 1 is inactive due to memory backpressure. This must be used with s2mm_throughputs in an AI Engine module. |
| DMA S2MM Channel 0 Throughput (MB/s) | The throughput of DMA S2MM channel 0. This must be used with s2mm_throughputs in a memory module. |
| DMA S2MM Channel 1 Throughput (MB/s) | The throughput of DMA S2MM channel 1. This must be used with s2mm_throughputs in a memory module. |

Note: These metrics are only available on AI Engine-ML devices.

Table 8. mm2s_throughputs of AI Engine-ML

| Metric Name | Description |
|---|---|
| DMA MM2S Stream Backpressure Time/% | The time DMA MM2S channel 0 is inactive due to stream backpressure. This must be used with mm2s_throughputs in an AI Engine module. |
| DMA MM2S Stream Backpressure Time/% | The time DMA MM2S channel 1 is inactive due to stream backpressure. This must be used with mm2s_throughputs in an AI Engine module. |
| DMA MM2S Memory Starvation Time/% | The time DMA MM2S channel 0 is inactive due to memory starvation. This must be used with mm2s_throughputs in an AI Engine module. |
| DMA MM2S Memory Starvation Time/% | The time DMA MM2S channel 1 is inactive due to memory starvation. This must be used with mm2s_throughputs in an AI Engine module. |
| DMA MM2S Channel 0 Throughput (MB/s) | The throughput of DMA MM2S channel 0. This must be used with mm2s_throughputs in a memory module. |
| DMA MM2S Channel 1 Throughput (MB/s) | The throughput of DMA MM2S channel 1. This must be used with mm2s_throughputs in a memory module. |

Note: These metrics are only available on AI Engine-ML devices.