Preliminaries

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

In Part 2a, we examined the generated assembler code and found a NOP (no operation) between consecutive VFPMAC (vector floating-point multiply-accumulate) mnemonics. With a single accumulator, this NOP is unavoidable. A floating-point accumulation takes two cycles, so each VFPMAC must wait for the result of the previous one (see the figure "Pipeline Diagram of AI Engine Fixed-point Vector Unit Multiplication and Upshift Paths" in AM009).

We can split the matrix-vector multiplication into two independent multiply-accumulate operations, each with its own accumulator. This lets a floating-point accumulation issue on every cycle.
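The following C++ sketch is a toy issue model that makes the latency argument concrete. It is a simplification assumed for illustration, not the documented AI Engine pipeline. In the model, at most one MAC issues per cycle, MAC k writes accumulator `k % chains`, and that accumulator can be read again only `latency` cycles after MAC k issued. The function name `total_cycles` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy in-order issue model (an assumption for illustration, not the real
// AI Engine pipeline): at most one MAC issues per cycle, and MAC k writes
// accumulator k % chains, which the next MAC on that accumulator may read
// only `latency` cycles after MAC k issued.
// Returns the number of cycles from the first issue to the last, inclusive.
int total_cycles(int n_macs, int chains, int latency) {
    assert(n_macs >= 1 && chains >= 1 && latency >= 1);
    std::vector<int> ready(chains, 0);  // earliest issue cycle per accumulator
    int next_slot = 0;                  // next free issue slot
    int last_issue = 0;
    for (int k = 0; k < n_macs; ++k) {
        int c = k % chains;
        int issue = std::max(next_slot, ready[c]);  // stall until accumulator is ready
        ready[c] = issue + latency;
        next_slot = issue + 1;
        last_issue = issue;
    }
    return last_issue + 1;
}
```

With `latency = 2`, `total_cycles(8, 1, 2)` gives 15 cycles, which is 7 empty slots for 8 MACs. `total_cycles(8, 2, 2)` gives 8 cycles, one MAC per cycle. In this model, two accumulators are enough to hide a two-cycle latency.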

Note: The multiply-accumulate API is used to scale each matrix column by the corresponding vector element and accumulate the results, rather than to multiply each matrix row by the column vector.
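The two formulations in the note can be compared in plain C++. This sketch does not use the AI Engine API, and the function names and column-major layout are assumptions for illustration. In the column form, each outer step is one vector multiply-accumulate, `y += x[j] * A(:, j)`, whereas the row form computes one dot product per output element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: M x N matrix A stored column-major, A(i, j) = A[j * M + i].

// Row form: y[i] is the dot product of row i of A with x.
std::vector<float> matvec_rows(const std::vector<float>& A,
                               const std::vector<float>& x, std::size_t M) {
    const std::size_t N = x.size();
    assert(A.size() == M * N);
    std::vector<float> y(M, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            y[i] += A[j * M + i] * x[j];
    return y;
}

// Column form: each step j is one vector multiply-accumulate,
// y += x[j] * A(:, j), i.e. column j scaled by the scalar x[j].
std::vector<float> matvec_columns(const std::vector<float>& A,
                                  const std::vector<float>& x, std::size_t M) {
    const std::size_t N = x.size();
    assert(A.size() == M * N);
    std::vector<float> y(M, 0.0f);
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < M; ++i)
            y[i] += x[j] * A[j * M + i];
    return y;
}
```

For a 2 x 3 matrix `{1, 2, 3, 4, 5, 6}` (columns (1, 2), (3, 4), (5, 6)) and `x = {1, 1, 2}`, both functions return `{14, 18}`. The column form maps naturally onto a vector MAC because each step combines a whole column with one scalar.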

Fig. 1

Thus, splitting the vector additions into even and odd parts allows us to perform independent multiply-accumulate operations:

Fig. 2
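The even/odd split can be sketched in scalar C++ using the same hypothetical column-major layout. Even columns accumulate into `acc_even` and odd columns into `acc_odd`, and the two partial sums are added once at the end. On the AI Engine, each inner loop would correspond to one vector MAC; here they are ordinary loops, and the function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: M x N matrix A stored column-major, A(i, j) = A[j * M + i].
// Even columns accumulate into acc_even and odd columns into acc_odd, so two
// consecutive column MACs never write the same accumulator.
std::vector<float> matvec_even_odd(const std::vector<float>& A,
                                   const std::vector<float>& x, std::size_t M) {
    const std::size_t N = x.size();
    assert(A.size() == M * N);
    std::vector<float> acc_even(M, 0.0f), acc_odd(M, 0.0f);
    for (std::size_t j = 0; j < N; j += 2) {
        for (std::size_t i = 0; i < M; ++i)       // even column j
            acc_even[i] += x[j] * A[j * M + i];
        if (j + 1 < N)
            for (std::size_t i = 0; i < M; ++i)   // odd column j + 1
                acc_odd[i] += x[j + 1] * A[(j + 1) * M + i];
    }
    // Combine the two partial sums once, after the loop.
    std::vector<float> y(M);
    for (std::size_t i = 0; i < M; ++i) y[i] = acc_even[i] + acc_odd[i];
    return y;
}
```

Because floating-point addition is not associative, this ordering can differ from a single-accumulator result in the last bits for general inputs. That trade-off is inherent to splitting the accumulation.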

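In addition, the AI Engine has two load units, so the even and odd columns can be loaded in parallel. To take advantage of this, the Julia program aie_iir_2b.jl splits the matrix into even and odd columns and generates a separate header file for each.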
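aie_iir_2b.jl does this step in Julia. The following is only a C++ analogue of the idea. The struct, the function names, and the header declaration style it emits are assumptions for illustration, not the script's actual output.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Even and odd columns of a column-major matrix, each kept column-major.
struct EvenOddColumns {
    std::vector<float> even;  // columns 0, 2, 4, ...
    std::vector<float> odd;   // columns 1, 3, 5, ...
};

// Split an M x N column-major matrix (A(i, j) = A[j * M + i]) by column parity.
EvenOddColumns split_columns(const std::vector<float>& A,
                             std::size_t M, std::size_t N) {
    assert(A.size() == M * N);
    EvenOddColumns out;
    for (std::size_t j = 0; j < N; ++j) {
        std::vector<float>& dst = (j % 2 == 0) ? out.even : out.odd;
        for (std::size_t i = 0; i < M; ++i) dst.push_back(A[j * M + i]);
    }
    return out;
}

// Emit one array as a C/C++ header initializer. The declaration style and
// the array names passed in are illustrative, not aie_iir_2b.jl's output.
std::string to_header(const std::string& name, const std::vector<float>& v) {
    std::ostringstream os;
    os << "const float " << name << "[" << v.size() << "] = {";
    for (std::size_t k = 0; k < v.size(); ++k)
        os << (k ? ", " : "") << v[k];
    os << "};\n";
    return os.str();
}
```

For example, splitting the 2 x 3 matrix `{1, 2, 3, 4, 5, 6}` gives even columns `{1, 2, 5, 6}` and odd columns `{3, 4}`. Writing each half with `to_header` yields one initializer per file.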

We start by using the AI Engine APIs.