The following tables list the predefined metric set configurations available
for the AI Engine, in the order of priority in which they are assigned to the
available counters. In the xrt.ini file, all of these metric names must be in
lower case and assigned to the metric selector aie_profile_core_metrics.
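As a minimal sketch of the lower-case rule, the helper below builds the selector line for xrt.ini. The function name core_metrics_line and the set name "Example_Set" are placeholders for illustration, not names defined by the tools; the section of xrt.ini the line belongs in is not covered here.

```python
def core_metrics_line(metric_set: str) -> str:
    """Return an xrt.ini line assigning a metric set to the core selector.

    The metric set name is trimmed and forced to lower case, as the
    text requires. The selector name comes from the text; everything
    else here is an illustrative assumption.
    """
    name = metric_set.strip().lower()
    if not name:
        raise ValueError("metric set name must not be empty")
    return f"aie_profile_core_metrics = {name}"


if __name__ == "__main__":
    # "Example_Set" is a placeholder, not a real metric set name.
    print(core_metrics_line("Example_Set"))
```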
Metric Name | Event ID | Description |
---|---|---|
Active Time | 28 | Time AI Engine was active since it was enabled. |
Stall Time | 22 | Time AI Engine was stalled. This stall includes AI Engine memory, stream, cascade, and lock stalls. |
Vector Instruction Time | 37 | Time AI Engine spent executing instructions in the vector processor. |
Cumulative Instruction Time | 32 | Time AI Engine spent executing load/store, stream get/put, lock acquire/release instructions. |
Active Utilization | Derived | Time AI Engine is actively executing instructions and not stalling. The percentage is relative to active time. |
Vector Instruction Utilization | Derived | Time AI Engine is executing vector instructions. The percentage is relative to active utilization time (active - stalls). |
These indicators help you understand the efficiency of the kernels that are
implemented in the AI Engines. You can compare stall time with active time to
determine if there is a data communication issue for each AI Engine.
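The two derived rows in the table above can be reproduced by hand from the raw counters. The sketch below follows the table descriptions literally: active utilization is (active - stall) relative to active time, and vector utilization is vector time relative to (active - stall). The function name and the sample numbers are illustrative, and the profiling tool may round or guard edge cases differently.

```python
from typing import Tuple


def utilization(active: int, stall: int, vector: int) -> Tuple[float, float]:
    """Return (active utilization %, vector instruction utilization %).

    active -- Active Time counter value (event 28)
    stall  -- Stall Time counter value (event 22)
    vector -- Vector Instruction Time counter value (event 37)
    """
    if active <= 0:
        raise ValueError("active time must be positive")
    busy = active - stall  # time spent executing, not stalling
    active_util = 100.0 * busy / active
    # Avoid dividing by zero when the core was stalled the whole time.
    vector_util = 100.0 * vector / busy if busy > 0 else 0.0
    return active_util, vector_util


if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    print(utilization(active=1000, stall=250, vector=300))
```

A low active utilization together with a high stall time points to a data communication problem rather than a compute-bound kernel.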
Metric Name | Event ID | Description |
---|---|---|
Memory Stall Time | 23 | Time the AI Engine was not active due to a memory stall. |
Stream Stall Time | 24 | Time the AI Engine was not active due to a stream stall. |
Lock Stall Time | 26 | Time the AI Engine was in a lock stall. |
Cascade Stall Time | 25 | Time the AI Engine was in a cascade stall. |
A stall in an AI Engine can occur in various situations:
- A memory stall happens when multiple accesses to the same memory bank are requested from one core, multiple cores, and/or DMAs.
- Stream stalls occur when data production and consumption on a stream do not have the same rate, leading to input stream starvation or output stream overflow.
- A cascade stall is generated when the cascade writer does not have the same rate as the cascade reader.
- A lock stall happens if the window data producer does not have the same iteration rate as the window consumer.
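Once the four stall counters have been read, a quick way to decide where to look first is to find the largest contributor. The sketch below is one possible way to do that; the dictionary keys, the function name, and the sample values are assumptions made for illustration, and the hint strings simply paraphrase the list above.

```python
from typing import Dict, Tuple

# Short reminders paraphrased from the list of stall causes.
STALL_HINTS = {
    "memory": "concurrent accesses to the same memory bank",
    "stream": "producer and consumer rates differ on a stream",
    "cascade": "cascade writer and reader rates differ",
    "lock": "window producer and consumer iteration rates differ",
}


def dominant_stall(stall_times: Dict[str, int]) -> Tuple[str, float]:
    """Return the stall type with the largest counter and its share (%)
    of the sum of the stall counters that were passed in."""
    total = sum(stall_times.values())
    if total == 0:
        return "none", 0.0
    name = max(stall_times, key=stall_times.get)
    return name, 100.0 * stall_times[name] / total


if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    counters = {"memory": 100, "stream": 600, "lock": 200, "cascade": 100}
    kind, share = dominant_stall(counters)
    print(f"{kind} stall dominates ({share:.1f}%): {STALL_HINTS[kind]}")
```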
Metric Name | Event ID | Description |
---|---|---|
Vector Instruction Time | 37 | Time spent by the AI Engine on vector instructions: vector processor instruction and vector data load/store |
Load Instruction Time | 38 | Time spent by the AI Engine on load instructions (move data from memory to registers) |
Store Instruction Time | 39 | Time spent by the AI Engine on store instructions (move data from registers to memory) |
Cumulative Instruction Time | 32 | Time spent by the AI Engine on memory and stream accesses and lock acquire/release |
All these indicators allow you to estimate the efficiency of your kernel. To
increase efficiency, you should optimize data access, favor vector instructions over
scalar instructions, and use 128-bit access to streams whenever possible.
Metric Name | Event ID | Description |
---|---|---|
Floating-Point Overflow Exception | 50 | Number of floating-point overflow exceptions generated by AI Engine |
Floating-Point Underflow Exception | 51 | Number of floating-point underflow exceptions generated by AI Engine |
Floating-Point Invalid Exception | 52 | Number of floating-point invalid exceptions generated by AI Engine |
Floating-Point Divide by Zero Exception | 53 | Number of floating-point divide by zero exceptions generated by AI Engine |
Floating-point exceptions lead to erroneous results. You might have to recode
your floating-point algorithm if you get too many exceptions, or even a single
one in a critical area of the code.
Metric Name | Event ID | Description |
---|---|---|
Cascade Read Instruction Time | 42 | Time AI Engine spent executing read instructions on the cascade stream. |
Cascade Write Instruction Time | 43 | Time AI Engine spent executing write instructions on the cascade stream. |
Stream Read Instruction Time | 40 | Time AI Engine spent executing read instructions on data streams. |
Stream Write Instruction Time | 41 | Time AI Engine spent executing write instructions on data streams. |
Metric Name | Event ID | Description |
---|---|---|
AI Engine Trace Word Count | 75 | Amount of AI Engine trace produced. |
AI Engine Trace Stall Count | 76 | Amount of AI Engine trace back-pressure events produced. |
Memory Module Trace Word Count | 79 | Amount of Memory Module trace produced. |
Memory Module Trace Stall Count | 80 | Amount of Memory Module trace back-pressure events produced. |
These metrics, particularly the stall counts, help in defining the right
number of streams to transmit AI Engine and Memory Module events to the Programmable
Logic.
Metric Name | Event ID | Description |
---|---|---|
Active Time | 28 | Time AI Engine was active since it was enabled. |
Stream Write Instruction Time | 41 | Time AI Engine spent executing write instructions on data streams. |
Cascade Write Instruction Time | 43 | Time AI Engine spent executing write instructions on the cascade stream. |
Stall Time | 22 | Time AI Engine was stalled. This stall includes AI Engine memory, stream, cascade, and lock stalls. |
Stream Write Bandwidth (MB/s) | Derived | Write Bandwidth of Stream Ports in MB/s |
Cascade Write Bandwidth (MB/s) | Derived | Write Bandwidth of Cascade Ports in MB/s |
These metrics are useful to evaluate the overall output write bandwidth of
the system.
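This section does not state how the derived bandwidth values are computed. As a rough sketch only, one plausible estimate multiplies the write instruction time by the bytes moved per cycle and divides by the elapsed active time. Every parameter below is an assumption: the counters are taken to be in clock cycles, and bytes_per_cycle and clock_hz must come from your device and configuration. The function name is hypothetical and the result may differ from what the tool reports.

```python
def estimate_bandwidth_mb_s(instr_cycles: int,
                            active_cycles: int,
                            bytes_per_cycle: float,
                            clock_hz: float) -> float:
    """Estimate bandwidth in MB/s (1 MB = 1e6 bytes).

    instr_cycles    -- stream or cascade write instruction time, in cycles
    active_cycles   -- Active Time counter value, in cycles
    bytes_per_cycle -- assumed bytes transferred per instruction cycle
    clock_hz        -- assumed AI Engine clock frequency in Hz
    """
    if active_cycles <= 0 or clock_hz <= 0:
        raise ValueError("active_cycles and clock_hz must be positive")
    bytes_moved = instr_cycles * bytes_per_cycle
    # bytes / seconds, with seconds = active_cycles / clock_hz
    bytes_per_second = bytes_moved * clock_hz / active_cycles
    return bytes_per_second / 1e6


if __name__ == "__main__":
    # Made-up values: 4 bytes per cycle and a 1 GHz clock are assumptions.
    print(estimate_bandwidth_mb_s(250_000, 1_000_000, 4, 1e9))
```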
Metric Name | Event ID | Description |
---|---|---|
Active Time | 28 | Time AI Engine was active since it was enabled. |
Stream Read Instruction Time | 40 | Time AI Engine spent executing read instructions on data streams. |
Cascade Read Instruction Time | 42 | Time AI Engine spent executing read instructions on the cascade stream. |
Stall Time | 22 | Time AI Engine was stalled. This stall includes AI Engine memory, stream, cascade, and lock stalls. |
Stream Read Bandwidth (MB/s) | Derived | Read Bandwidth of Stream Ports in MB/s |
Cascade Read Bandwidth (MB/s) | Derived | Read Bandwidth of Cascade Ports in MB/s |
These metrics are useful to evaluate the overall input read bandwidth of the system.