Vitis Analyzer is an essential tool for accessing information on compilation, simulation, and implementation of AI Engine graphs. It can be used to obtain a summary of profiling data and to graphically display trace events. The tool can be invoked with the vitis_analyzer command or, for this example, by entering:
$ make analyze
The Graph view displays the connectivity of the AI Engine graph, which for this example is shown in the following figure. This simple example shows a softmax kernel with streaming data input and output. Also visible are two buffers in data memory used for holding intermediate computations.
Figure 6 - Vitis Analyzer Graph View
The Array view displays how the AI Engine graph is mapped to the AI Engine array for the device specified. This example uses a VE2802 Versal AI Edge device which contains 304 AI Engine tiles. As shown in the following figure, this example utilizes a single AI Engine tile, which contains an AI Engine for kernel processing along with work buffers in data memory. The amount of memory required for these buffers depends on the number of classes in the softmax function. For this example with 2048 classes, only a small portion of the 64 kB associated with the tile is utilized.
Figure 7 - Vitis Analyzer Array View
The following figure contains information from the Profile view. The highlighted fields show that the softmax kernel takes up to 8,083 cycles to process 2048 classes. For the lowest speed grade Versal devices, this translates to a processing rate of ~123,716 softmax computations per second. Higher speed grade devices could reach a peak rate of ~154,645 softmax computations per second.
Figure 8 - Vitis Analyzer Profile View
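The conversion from profiled cycles to a processing rate can be reproduced with a short calculation. The following is a minimal sketch, not part of the tutorial sources: the function name `softmax_rate` is hypothetical, and the AI Engine clock rates of 1 GHz and 1.25 GHz are assumptions chosen because they are consistent with the rates quoted above. Check the data sheet for the actual clock of a given speed grade.

```python
# Sketch: convert a profiled cycle count into softmax computations per second.
# The clock rates below are assumptions inferred from the quoted rates,
# not values read from the tool or from a data sheet.

def softmax_rate(cycles_per_call: int, clock_hz: float) -> float:
    """Return kernel invocations per second for a given cycle count and clock."""
    return clock_hz / cycles_per_call

CYCLES_PER_CALL = 8083          # from the Profile view (2048 classes)
ASSUMED_CLOCKS_HZ = {
    "lowest speed grade (assumed 1 GHz)": 1.0e9,
    "higher speed grade (assumed 1.25 GHz)": 1.25e9,
}

rates = {
    label: softmax_rate(CYCLES_PER_CALL, clock_hz)
    for label, clock_hz in ASSUMED_CLOCKS_HZ.items()
}

for label, rate in rates.items():
    print(f"{label}: {rate:,.0f} softmax computations per second")
```

These figures are peak rates: they count only the cycles spent inside the kernel and ignore any time between invocations.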
The following figure shows part of the Vitis Analyzer Trace view. The cursors show that the time from the end of one kernel invocation to the end of the next is 10.164 $\mu$s. This additional overhead reduces the softmax computation rate to ~98,386 computations per second in higher speed grade devices. To improve the processing rate, one might investigate using buffer kernel inputs instead of streams, so that kernel data is loaded through the wider memory interfaces. The generated microcode could also be analyzed to determine how the computation could be further optimized.
Figure 9 - Vitis Analyzer Trace View
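The sustained rate follows directly from the invocation period measured in the trace. The sketch below reproduces that figure and estimates how much of each period is overhead. It assumes a 1.25 GHz AI Engine clock for the higher speed grade device (an assumption, not a value reported by the tool) and that the 8,083-cycle profile figure applies to the traced invocations. All variable names are illustrative.

```python
# Sketch: derive the sustained rate and per-invocation overhead from the trace.
# ASSUMED_CLOCK_HZ is an assumption; the period and cycle count come from the text.

PERIOD_S = 10.164e-6        # end-to-end invocation period from the Trace view
KERNEL_CYCLES = 8083        # kernel cycles from the Profile view
ASSUMED_CLOCK_HZ = 1.25e9   # assumed clock for a higher speed grade device

# Sustained throughput is the reciprocal of the invocation period.
sustained_rate = 1.0 / PERIOD_S

# Express the period in clock cycles to compare it with the kernel cycle count.
period_cycles = round(PERIOD_S * ASSUMED_CLOCK_HZ)
overhead_cycles = period_cycles - KERNEL_CYCLES
kernel_fraction = KERNEL_CYCLES / period_cycles

print(f"sustained rate : {sustained_rate:,.0f} computations per second")
print(f"period         : {period_cycles} cycles")
print(f"overhead       : {overhead_cycles} cycles per invocation")
print(f"kernel fraction: {kernel_fraction:.1%} of each period")
```

Under these assumptions, roughly a third of each period is spent outside the kernel, which is the portion that buffer-based inputs or other optimizations would aim to reduce.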