Design Details - 2025.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

The design in this tutorial starts with a base platform that contains the control, interfaces, and processing system (CIPS), the NoC, the AI Engine, and the interfaces among them. The Vitis compiler link step builds on top of this base platform by adding the AI Engine graphs and PL kernels. Which PL kernels are added depends on the application, so the specific PL kernels vary from design to design. The ADF graph connects to an extensible Vitis platform: its I/Os connect either to platform ports or to ports on Vitis kernels, as specified by Vitis compiler connectivity directives. The components added in the Vitis compiler -l step (refer to make XSA) include the following:

  • libadf.a

  • Data mover kernel (dma_hls.[hw|hw_emu].xo)

  • Connection interfaces defined in the system configuration file
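As a rough illustration of how these three components meet at link time, the following Python sketch assembles a v++ link command from them. It uses only the -l, -t, --platform, --config, and -o options of v++. The platform file name (base_platform.xpfm), the configuration file name (system.cfg), the output name (gemm.xsa), and the helper vpp_link_cmd are hypothetical placeholders; the tutorial's own Makefile may pass different names and additional options.

```python
import shlex
from typing import List


def vpp_link_cmd(target: str, platform: str,
                 cfg: str = "system.cfg", xsa: str = "gemm.xsa") -> List[str]:
    """Assemble an illustrative v++ link command (hypothetical helper)."""
    if target not in ("hw", "hw_emu"):
        raise ValueError("target must be 'hw' or 'hw_emu'")
    return [
        "v++", "-l",              # Vitis compiler link step
        "-t", target,             # build target: hardware or hardware emulation
        "--platform", platform,   # base platform (placeholder name)
        "--config", cfg,          # system configuration file with connectivity directives
        f"dma_hls.{target}.xo",   # data mover kernel
        "libadf.a",               # compiled AI Engine (ADF) graph
        "-o", xsa,                # extended platform written as an XSA (placeholder name)
    ]


cmd = vpp_link_cmd("hw_emu", "base_platform.xpfm")
print(shlex.join(cmd))
```

Returning the arguments as a list (rather than one string) keeps each path a separate argument, which is convenient if the command is later passed to subprocess.run.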

To view a schematic of the design with the extended platform, as shown in the following figure, open the following project in the Vivado IDE:

`build/gemm_$(MAT_DIMS)/x$(GEMM_INSTS)/[hw|hw_emu]/_x/link/vivado/vpl/prj/prj.xpr`
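The project path above is parameterized by the MAT_DIMS and GEMM_INSTS make variables and by the build target. A small helper such as the hypothetical vivado_project_path below expands it; the values 32x32x32 and 1 in the example call are illustrative choices, not necessarily the tutorial's defaults.

```python
from pathlib import PurePosixPath


def vivado_project_path(mat_dims: str, gemm_insts: int, target: str) -> PurePosixPath:
    """Expand build/gemm_$(MAT_DIMS)/x$(GEMM_INSTS)/[hw|hw_emu]/_x/link/vivado/vpl/prj/prj.xpr."""
    if target not in ("hw", "hw_emu"):
        raise ValueError("target must be 'hw' or 'hw_emu'")
    return PurePosixPath(
        "build", f"gemm_{mat_dims}", f"x{gemm_insts}", target,
        "_x", "link", "vivado", "vpl", "prj", "prj.xpr",
    )


print(vivado_project_path("32x32x32", 1, "hw"))
```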

Figure: GeMM AIE Vivado block design (GeMM 32x32x32)

In this design, the GeMM computation happens in multiple stages. The input is split and broadcast to multiple AI Engine cores: the rows of Mat A and the columns of Mat B are split into several blocks, with the number of blocks determined by the cascade length. Each block of Mat A is then multiplied with the corresponding block of Mat B, producing blocks of output that finally propagate to the final output matrix.
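The split, multiply, and accumulate pattern can be modelled in plain Python. This sketch assumes the blocks are taken along the shared (inner) dimension, which is how partial products are typically accumulated along an AI Engine cascade chain, so that the partial products of corresponding blocks sum to the full result. The names cascade_gemm, matmul, and casc_len are hypothetical; this is an illustrative model, not the tutorial's kernel code.

```python
def matmul(a, b):
    """Reference product of list-of-lists matrices: (n x k) @ (k x m)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def cascade_gemm(a, b, casc_len):
    """Split the shared dimension into casc_len blocks and accumulate the
    partial products, as each core in a cascade chain would add its partial
    result to the value received from the previous core."""
    k = len(b)
    if k % casc_len:
        raise ValueError("casc_len must divide the shared dimension")
    blk = k // casc_len
    n, m = len(a), len(b[0])
    acc = [[0] * m for _ in range(n)]           # running sum passed along the cascade
    for c in range(casc_len):                    # one iteration per core in the chain
        idx = range(c * blk, (c + 1) * blk)
        a_blk = [[row[p] for p in idx] for row in a]  # column block of Mat A
        b_blk = [b[p] for p in idx]                   # matching row block of Mat B
        part = matmul(a_blk, b_blk)
        acc = [[acc[i][j] + part[i][j] for j in range(m)] for i in range(n)]
    return acc


a = [[i + j for j in range(8)] for i in range(4)]
b = [[(i * j) % 5 for j in range(3)] for i in range(8)]
print(cascade_gemm(a, b, casc_len=4) == matmul(a, b))  # True
```

Comparing against a plain reference product is a cheap way to confirm that a chosen blocking is mathematically equivalent before mapping it onto cores.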

The data mover kernel provides the parallel inputs required by the GeMM AIE graph. The data coming out of the AI Engines is streamed to a PL kernel, which compares it against the expected constant pattern and counts any mismatches in the variable errCnt. The host application reads this variable to determine whether the test passed or failed.
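The checking scheme can be sketched as a behavioural model in Python. The expected value EXPECTED = 32 is a hypothetical example (if every element of Mat A and Mat B were 1, each element of a 32x32x32 product would equal 32), and the PASSED/FAILED strings are assumed; the real PL kernel is an HLS kernel, and its pattern depends on the inputs the data mover sends.

```python
from typing import Iterable

EXPECTED = 32  # hypothetical constant output value, not taken from the tutorial


def count_mismatches(stream: Iterable[int], expected: int = EXPECTED) -> int:
    """Model of the PL checker: increment the error count (errCnt in the
    design) for every output sample that differs from the expected pattern."""
    err_cnt = 0
    for sample in stream:
        if sample != expected:
            err_cnt += 1
    return err_cnt


def host_verdict(err_cnt: int) -> str:
    """Model of the host side: the test passes only if no mismatch was recorded."""
    return "PASSED" if err_cnt == 0 else "FAILED"


out = [EXPECTED] * 1024             # e.g. a 32x32 output matrix, flattened
print(host_verdict(count_mismatches(out)))  # PASSED
out[5] = 31                         # inject one corrupted sample
print(host_verdict(count_mismatches(out)))  # FAILED
```

Counting mismatches, rather than stopping at the first one, gives the host a measure of how badly a run failed from a single value read back.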

The system debugging and profiling IP (DPA) is added to the PL region of the device to capture AI Engine run-time trace data when the EN_TRACE option is enabled in the design. The dma_hls kernel and the AI Engine array interface both operate at 312.5 MHz.
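At 312.5 MHz the clock period is 3.2 ns, and the peak bandwidth of a stream that moves one word per cycle is the word width times the clock frequency. The sketch below tabulates this for PLIO widths of 32, 64, and 128 bits; which widths this design actually uses is not stated here, so those values are assumptions, and stream_bandwidth_gbps is a hypothetical helper.

```python
CLOCK_HZ = 312.5e6  # dma_hls kernel and AI Engine array interface clock


def stream_bandwidth_gbps(width_bits: int, clock_hz: float = CLOCK_HZ) -> float:
    """Peak bandwidth in GB/s (1e9 bytes/s) of a stream carrying width_bits per cycle."""
    return width_bits / 8 * clock_hz / 1e9


period_ns = 1e9 / CLOCK_HZ   # 3.2 ns per cycle
print(f"clock period: {period_ns:.1f} ns")
for w in (32, 64, 128):      # assumed stream widths
    print(f"{w:>3}-bit stream: {stream_bandwidth_gbps(w):.2f} GB/s")
```

These are per-stream upper bounds; the achieved rate also depends on how continuously the data mover and the AI Engine kernels keep the streams busy.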