Vitis Analyzer

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

Vitis Analyzer is an essential tool for accessing information on the compilation, simulation, and implementation of AI Engine graphs. You can use it to obtain a summary of profiling data and to display trace events graphically. You can launch the tool with the vitis_analyzer command or, for this example, by entering:

$ make analyze

The Graph view displays the connectivity of the AI Engine graph, shown in the following figure for the current example. This simple example consists of a softmax kernel with streaming data input and output. Also visible are two buffers in data memory that hold intermediate computations.

Figure 6 - Vitis Analyzer Graph View

The Array view displays how the AI Engine graph maps to the AI Engine array of the specified device. This example uses a VE2802 Versal AI Edge device, which contains 304 AI Engine tiles. As shown in the following figure, the example uses a single AI Engine tile. The tile contains an AI Engine for kernel processing along with work buffers in data memory. The amount of memory these buffers require depends on the number of classes in the softmax function. For this example with 2048 classes, the buffers use only a small portion of the 64 kB of data memory associated with the tile.
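To get a feel for why the work buffers occupy only a small part of the tile memory, the following minimal sketch estimates their footprint. The element width and the assumption that each buffer holds exactly one element per class are not stated in this section, so `bytes_per_element` and the helper `work_buffer_usage` are hypothetical and serve only as an illustration.

```python
# Rough estimate of the tile data memory used by the two work buffers.
# Only the 64 kB tile memory, the 2048 classes, and the two buffers come
# from the text; everything else is an assumption for illustration.
TILE_DATA_MEMORY_BYTES = 64 * 1024  # 64 kB per tile, taken as 64 * 1024 bytes
NUM_CLASSES = 2048
NUM_WORK_BUFFERS = 2  # the two intermediate buffers seen in the Graph view


def work_buffer_usage(bytes_per_element: int) -> tuple[int, float]:
    """Return (bytes used, fraction of tile data memory).

    Assumes each buffer stores exactly one element per class, which the
    tutorial text does not confirm.
    """
    used = NUM_WORK_BUFFERS * NUM_CLASSES * bytes_per_element
    return used, used / TILE_DATA_MEMORY_BYTES


if __name__ == "__main__":
    # Hypothetical element widths: 2 bytes (e.g. bfloat16), 4 bytes (e.g. float).
    for width in (2, 4):
        used, fraction = work_buffer_usage(width)
        print(f"{width} B/element: {used} bytes, {fraction:.1%} of tile memory")
```

Under either assumed element width, the buffers stay at or below a quarter of the tile data memory. The Array view remains the authoritative source for the actual allocation.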

Figure 7 - Vitis Analyzer Array View

The following figure shows information from the Profile view. The highlighted fields show that the softmax kernel takes up to 8,083 cycles to process 2048 classes. For the lowest speed grade Versal devices, this translates to a processing rate of ~123,716 softmax computations per second. Higher speed grade devices reach a peak rate of ~154,645 softmax computations per second.
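The conversion from profiled cycles to a processing rate can be sketched as follows. The clock frequencies of 1 GHz and 1.25 GHz are assumptions: they are not stated in this section and were chosen because they reproduce the quoted rates. The function name `softmax_rate_per_second` is likewise hypothetical.

```python
# Convert a profiled cycle count into softmax computations per second.
CYCLES_PER_SOFTMAX = 8083  # worst-case cycle count from the Profile view

# Assumed AI Engine clock frequencies (not given in the text).
LOW_SPEED_GRADE_HZ = 1.0e9
HIGH_SPEED_GRADE_HZ = 1.25e9


def softmax_rate_per_second(cycles_per_invocation: int, clock_hz: float) -> float:
    """Peak rate if kernel invocations run back to back with no overhead."""
    return clock_hz / cycles_per_invocation


if __name__ == "__main__":
    for label, clock_hz in (("lowest speed grade", LOW_SPEED_GRADE_HZ),
                            ("higher speed grade", HIGH_SPEED_GRADE_HZ)):
        rate = softmax_rate_per_second(CYCLES_PER_SOFTMAX, clock_hz)
        print(f"{label}: ~{int(rate):,} softmax computations per second")
```

These are peak figures because they ignore any time spent between kernel invocations.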

Figure 8 - Vitis Analyzer Profile View

The following figure shows part of the Vitis Analyzer Trace view. The cursors show that the time from the end of one kernel invocation to the end of the next is 10.164 µs. This additional overhead reduces the softmax computation rate to ~98,386 computations per second on higher speed grade devices. To improve the processing rate, investigate using buffer kernel inputs instead of streams, so that kernel data is loaded through the wider memory interfaces. You can also analyze the generated microcode to determine how to optimize the computation further.
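The sustained rate and the per-invocation overhead implied by the trace measurement can be sketched as follows. The 1.25 GHz clock is an assumption (it is not stated in this section), as is the premise that the 8,083-cycle figure from the Profile view applies to the traced invocations. Both function names are hypothetical.

```python
# Derive the sustained rate and the overhead from the trace measurement.
PERIOD_US = 10.164              # end-to-end invocation period from the Trace view
KERNEL_CYCLES = 8083            # cycle count from the Profile view
ASSUMED_CLOCK_HZ = 1.25e9       # assumed higher speed grade clock


def rate_from_period(period_us: float) -> float:
    """Sustained invocations per second for a given period in microseconds."""
    return 1.0e6 / period_us


def overhead_us(period_us: float, kernel_cycles: int, clock_hz: float) -> float:
    """Time per invocation that is not spent in the kernel, in microseconds."""
    kernel_time_us = kernel_cycles / clock_hz * 1.0e6
    return period_us - kernel_time_us


if __name__ == "__main__":
    rate = rate_from_period(PERIOD_US)
    overhead = overhead_us(PERIOD_US, KERNEL_CYCLES, ASSUMED_CLOCK_HZ)
    print(f"sustained rate: ~{int(rate):,} computations per second")
    print(f"overhead per invocation: ~{overhead:.2f} us "
          f"({overhead / PERIOD_US:.0%} of the period)")
```

Under these assumptions, roughly 3.7 µs of each period falls outside the kernel computation, which is the portion that buffer-based inputs would aim to reduce.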

Figure 9 - Vitis Analyzer Trace View