Tutorial Example - 2025.1 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2025-08-25
Version: 2025.1 English

In this tutorial, the matrix sizes are unchanged, but the input data type is int8 for both the A and B matrices, while the output data type can be either int16 or int32.

  • The A sub-matrix is 4x16 with 8-bit elements, that is 512 bits: loading it takes 2 clock cycles.

  • The B sub-matrix is 16x8 with 8-bit elements, that is 1024 bits: loading it takes 4 clock cycles.

  • The C sub-matrix is 4x8 with 16-bit or 32-bit elements, that is 512 or 1024 bits: storing it takes 2 or 4 clock cycles, but it is stored only once every 4 sub-matrix multiply-accumulate operations.

  • Finally, each sub-matrix product requires 512 MACs (4 x 16 x 8), which take 2 clock cycles because 256 int8 x int8 multiply-accumulates can be performed per cycle.
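The cycle counts in the list above follow from dividing each matrix size by the per-cycle bandwidth. The sketch below reproduces them, assuming 256 bits moved per load or store per clock cycle and 256 int8 MACs per cycle. Both figures are inferred from the numbers in the list rather than taken from a hardware specification, and the helper name `transfer_cycles` is hypothetical.

```python
# Cycle counts for one 4x16 (A) by 16x8 (B) int8 sub-matrix product.
# Assumptions (inferred from the figures in the text, not from a hardware spec):
#   - each load or store moves 256 bits per clock cycle
#   - 256 int8 x int8 multiply-accumulates are performed per clock cycle
BITS_PER_CYCLE = 256
MACS_PER_CYCLE = 256

M, K, N = 4, 16, 8  # A is MxK, B is KxN, C is MxN


def transfer_cycles(rows, cols, bits_per_element):
    """Clock cycles needed to move a rows x cols matrix through one port."""
    total_bits = rows * cols * bits_per_element
    return -(-total_bits // BITS_PER_CYCLE)  # ceiling division


load_a = transfer_cycles(M, K, 8)                                 # 512 bits
load_b = transfer_cycles(K, N, 8)                                 # 1024 bits
store_c = {bits: transfer_cycles(M, N, bits) for bits in (16, 32)}
compute = -(-(M * K * N) // MACS_PER_CYCLE)                       # 512 MACs

print(f"load A : {load_a} cycles")
print(f"load B : {load_b} cycles")
print(f"store C: {store_c[16]} cycles (int16), {store_c[32]} cycles (int32)")
print(f"compute: {compute} cycles for {M * K * N} MACs")
```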

The overall maximum efficiency is therefore 50%: the limitation comes from loading the B sub-matrix, which takes 4 clock cycles while the computation takes only 2.
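As a worked equation, assuming that A and B are loaded in parallel on separate load ports that overlap with computation (and ignoring the C store, which occurs only once every 4 products), the efficiency is the compute time divided by the time taken by the slowest unit:

```latex
\eta = \frac{T_{\mathrm{MAC}}}{\max\left(T_{\mathrm{load}\,A},\, T_{\mathrm{load}\,B},\, T_{\mathrm{MAC}}\right)}
     = \frac{2}{\max(2,\, 4,\, 2)} = \frac{2}{4} = 50\%
```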

A simple way to balance the load, compute, and store operations is to load two A sub-matrices and one B sub-matrix, and perform two multiply-accumulate operations with each B.
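The balancing argument can be checked with a short sketch. It assumes that A sub-matrices arrive on one load port and B sub-matrices on another, with both ports running in parallel with the MAC unit. It also ignores the infrequent C store. The `schedule` function is a hypothetical illustration, not part of the tutorial code.

```python
# Compare the naive and balanced load schedules, per iteration.
# Assumptions (inferred from the text, not from a hardware spec):
#   - A sub-matrices come in on one load port and B on another,
#     and both ports run in parallel with the MAC unit
#   - one A x B sub-matrix product takes 2 cycles (512 MACs at 256 MACs/cycle)
#   - C stores happen only once every 4 products, so they are left out here
LOAD_A_CYCLES = 2   # 4x16 int8 = 512 bits
LOAD_B_CYCLES = 4   # 16x8 int8 = 1024 bits
PRODUCT_CYCLES = 2


def schedule(num_a, num_b):
    """Return (A-port cycles, B-port cycles, compute cycles, efficiency)."""
    port_a = num_a * LOAD_A_CYCLES
    port_b = num_b * LOAD_B_CYCLES
    compute = num_a * num_b * PRODUCT_CYCLES  # every loaded A is multiplied by every loaded B
    eff = compute / max(port_a, port_b, compute)
    return port_a, port_b, compute, eff


for label, (num_a, num_b) in {"naive (1 A, 1 B)": (1, 1),
                              "balanced (2 A, 1 B)": (2, 1)}.items():
    port_a, port_b, compute, eff = schedule(num_a, num_b)
    print(f"{label}: A port {port_a}, B port {port_b}, compute {compute} -> {eff:.0%}")
```

With two A sub-matrices per B, the A port, the B port, and the MAC unit are each busy for 4 cycles per iteration, so under these assumptions no unit has to wait on another.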