Performance analysis - 2024.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2024-12-06
Version
2024.2 English

After running the AIE simulation with 64x64 matrices, we can look at the profiling results with:

make OPT=0 aieviz

This opens Vitis Analyzer with the run summary displayed. To open the profile view, click the last section, Profile:

Open Profile information

There are 2 tiles which contain a kernel:

  • Column 10: the kernel output data type is int32

  • Column 20: the kernel output data type is int16

Two Tiles contain kernels

Let's start with the first one, which outputs int32 data. The Total Function Time tab shows the number of cycles needed to compute this matrix multiply:

Performance of int32 version of the kernel

We can see that the entire function takes 2092 cycles to run. To compute the usage efficiency of the vector processor, we need the following data:

  • number of multiplications to perform: 64 x 64 x 64

  • number of parallel int8 x int8 multiplications per cycle in the SIMD vector processor: 256

             64 x 64 x 64
Efficiency = ------------ = 0.49
              2092 x 256
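The same calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic above, not part of the tutorial sources: the function name `mac_efficiency` and its parameter names are hypothetical, and the value 256 is the per-cycle int8 x int8 multiplication count quoted in the list above.

```python
def mac_efficiency(m, k, n, cycles, macs_per_cycle=256):
    """Fraction of the vector processor's peak throughput actually used.

    m, k, n        -- matrix dimensions (an m x k matrix times a k x n matrix)
    cycles         -- measured kernel duration from the profiler
    macs_per_cycle -- parallel int8 x int8 multiplications per cycle
                      (256, as quoted in the text above)
    """
    total_macs = m * k * n                # multiplications the kernel must perform
    peak_macs = cycles * macs_per_cycle   # multiplications possible in that time
    return total_macs / peak_macs


# int32-output kernel: 64x64 matrices, 2092 cycles reported by the profiler
int32_efficiency = mac_efficiency(64, 64, 64, cycles=2092)
print(f"Efficiency = {int32_efficiency:.2f}")  # prints "Efficiency = 0.49"
```

Substituting the cycle count of another run into `cycles` gives the corresponding efficiency for that run.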

This efficiency is not very high, and we will see how to improve it in the next part of this tutorial. In the meantime, we can look at the assembly code to understand why performance is at this level. The Profile Details tab gives access to this code:

Assembly Code of the Inner Loop

The inner loop is run 360 times (4 Iterations), and we can count the VMUL and VMAC operations it contains: 8 of its 16 lines are VMUL/VMAC instructions, which is consistent with the roughly 50% efficiency computed above.

An equivalent efficiency can be computed for the 16-bit version of the kernel, as its duration is 2089 cycles.
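As a rough cross-check, the two measured durations can be compared against the ideal cycle count, that is, the duration the kernel would have if every cycle issued 256 multiplications. The sketch below is an assumption-laden illustration: the names `IDEAL_CYCLES`, `measured`, and `summarize` are hypothetical, and only the cycle counts (2092 and 2089) and the 256-per-cycle figure come from the text.

```python
MACS_PER_CYCLE = 256          # parallel int8 x int8 multiplications per cycle (from the text)
TOTAL_MACS = 64 * 64 * 64     # multiplications in a 64x64 by 64x64 matrix multiply

# Lower bound on the kernel duration if the vector unit were busy every cycle
IDEAL_CYCLES = TOTAL_MACS // MACS_PER_CYCLE

# Kernel durations reported by the profiler for the two tiles
measured = {"int32 output": 2092, "int16 output": 2089}


def summarize(measured_cycles):
    """Return efficiency and cycles above the ideal for each kernel."""
    return {
        name: {
            "efficiency": IDEAL_CYCLES / cycles,
            "extra_cycles": cycles - IDEAL_CYCLES,
        }
        for name, cycles in measured_cycles.items()
    }


report = summarize(measured)
for name, stats in report.items():
    print(f"{name}: efficiency {stats['efficiency']:.2f}, "
          f"{stats['extra_cycles']} cycles above the ideal {IDEAL_CYCLES}")
```

Both kernels land at about 0.49, so the output data type makes almost no difference to the cycle count in this configuration.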