- Data Movement
- The application uses the GMIO attribute to make external memory-mapped
connections to and from global memory. These connections are created between the
AI Engine kernel and the logical global memory port of the hardware platform design through an NoC.
In this design, the PS program programs the buffer descriptors in the AI Engine interface
tile DMAs to initiate read and write transactions between the AI Engine and DDR memory.
The burst length of the memory-mapped transaction is 64-bit, and the AI Engine interface
tile DMAs use physical memory addressing to read and write data in global
memory.
Figure 1. Data Movement
- Data Slicing
- To compute matrix multiplication on the AI Engine, matrix A is sliced horizontally and
distributed equally among all the cores used through the AI Engine
AXI4-Stream network. Matrix B is transposed and fed to the first
core in the design element by element. The first core shares the input matrix B
with the other AI Engine cores through
the AXI4-Stream connection. Because the output is produced in z-order, the
output matrix must be reordered.
Figure 2. Data Slicing
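The slicing scheme described above can be modeled on the host in plain C++. The following is a minimal sketch, not the code of this design: the `Matrix` type and the `slice_rows`, `transpose`, and `multiply_slice` helpers are hypothetical names introduced for illustration, and the sketch assumes the row count of A divides evenly by the number of cores. It does not model the AXI4-Stream transfers or the z-order layout of the output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major matrix stored in a flat vector (hypothetical helper type).
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<int32_t> data;  // rows * cols elements

    int32_t& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    int32_t at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Slice A horizontally into num_cores blocks of equal height.
// Assumption: a.rows is a multiple of num_cores.
std::vector<Matrix> slice_rows(const Matrix& a, std::size_t num_cores) {
    assert(num_cores > 0 && a.rows % num_cores == 0);
    const std::size_t rows_per_core = a.rows / num_cores;
    const std::size_t elems_per_core = rows_per_core * a.cols;
    std::vector<Matrix> slices;
    slices.reserve(num_cores);
    for (std::size_t k = 0; k < num_cores; ++k) {
        const auto first =
            a.data.begin() + static_cast<std::ptrdiff_t>(k * elems_per_core);
        const auto last = first + static_cast<std::ptrdiff_t>(elems_per_core);
        slices.push_back(
            Matrix{rows_per_core, a.cols, std::vector<int32_t>(first, last)});
    }
    return slices;
}

// Transpose B so that its columns become contiguous rows.
Matrix transpose(const Matrix& b) {
    Matrix t{b.cols, b.rows, std::vector<int32_t>(b.rows * b.cols)};
    for (std::size_t r = 0; r < b.rows; ++r) {
        for (std::size_t c = 0; c < b.cols; ++c) {
            t.at(c, r) = b.at(r, c);
        }
    }
    return t;
}

// Work of one core: multiply its slice of A by B, given B transposed.
// Each output element is a dot product of two contiguous rows.
Matrix multiply_slice(const Matrix& a_slice, const Matrix& b_t) {
    assert(a_slice.cols == b_t.cols);
    Matrix c{a_slice.rows, b_t.rows,
             std::vector<int32_t>(a_slice.rows * b_t.rows, 0)};
    for (std::size_t i = 0; i < a_slice.rows; ++i) {
        for (std::size_t j = 0; j < b_t.rows; ++j) {
            int32_t acc = 0;
            for (std::size_t k = 0; k < a_slice.cols; ++k) {
                acc += a_slice.at(i, k) * b_t.at(j, k);
            }
            c.at(i, j) = acc;
        }
    }
    return c;
}
```

Stacking the per-slice results in slice order yields the full product in row-major order. In the actual design the cores emit the result in z-order, so a separate reordering step is still needed there.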
- Build Flow
- The following graph explains the build flow for GMIO-based AI Engine designs.
- Runtime Execution
- At runtime, the Linux application binary calls the AI Engine userspace
driver and the runtime library, libadf_api_xrt.a. The AI Engine userspace
drivers abstract the kernel-space driver, which handles runtime configuration along with ELF loading.
Figure 4. Runtime Execution
- Sample Output
- Follow the Linux boot process to boot Linux on the target. At the Linux
login prompt, log in with the user name root and the password root. The AI Engine
XCLBIN and executables are pre-installed in the /usr/bin/ directory.
    root@xilinx-vck190-202x_x:~# aie-matrix-multiplication
    Initializing ADF API...
    [INFO] AIE GMIO Matrix Multiplication
    [INFO] Matrix size(int32): 1200x1200
    [ 68.288633] zocl-drm axi:zyxclmm_drm: zocl_create_client: created KDS client for pid(824), ret: 0
    [ 68.297535] zocl-drm axi:zyxclmm_drm: zocl_destroy_client: client exits pid(824)
    [ 68.304999] zocl-drm axi:zyxclmm_drm: zocl_create_client: created KDS client for pid(824), ret: 0
    [ 68.349096] [drm] found kind 29(AIE_RESOURCES)
    [80573.078]Loading PDI from DDR
    [80573.164]Monolithic/Master Device
    [80576.303]3.190 ms: PDI initialization time
    [80580.153]+++Loading Image#: 0x0, Name: aie_image, Id: 0x1C000000
    [80585.916]---Loading Partition#: 0x0, Id: 0x0
    [80645.089] 55.070 ms for Partition#: 0x0, Size: 19127040 Bytes
    [80647.906]Subsystem PDI Load: Done
    [ 68.349108] [drm] found kind 18(PDI)
    [ 68.448707] [drm] FPGA Manager load DONE
    [ 68.455278] [drm] Partition 1 already requested
    XAIEFAL: INFO: Resource group Avail is created.
    XAIEFAL: INFO: Resource group Static is created.
    XAIEFAL: INFO: Resource group Generic is created.
    [INFO] XCLBIN download complete
    [INFO] AIE cores are done executing
    [INFO] Running sanity check
    [INFO] XGeMM Success!
    [ 68.469075] [drm] zocl_xclbin_read_axlf fe3eeecf-1b48-4862-4723-5aba0732fe7b ret: 0
    [ 114.412806] zocl-drm axi:zyxclmm_drm: zocl_destroy_client: client exits pid(824)
    root@xilinx-vck190-202x_x:~#
- Customizing and Rebuilding
- As previously mentioned, you can change the number of AI Engine cores used for matrix multiplication. However,
because the data memory immediately available to each core is limited, reducing
the number of AI Engine cores reduces
the maximum matrix size supported by the application. To change the number of cores used, set the
NUM_HW_ROWS and NUM_HW_COLS macros in the config.h header file.
The maximum number of AI Engine cores available is 400.
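As an illustration of the setting described above, the following sketch shows how the two macros might be defined and checked at compile time. It is a hypothetical excerpt, not the contents of the shipped config.h: the values 4 and 10, the `kMaxAieCores` and `kNumCores` constants, and the assumption that the core count is the product of the two macros are all introduced here for illustration. Only the macro names and the 400-core limit come from the text.

```cpp
// Hypothetical excerpt in the style of config.h; the values are illustrative.
#ifndef NUM_HW_ROWS
#define NUM_HW_ROWS 4   // assumed: rows of AI Engine cores used
#endif
#ifndef NUM_HW_COLS
#define NUM_HW_COLS 10  // assumed: columns of AI Engine cores used
#endif

// Maximum number of AI Engine cores stated in the text.
constexpr int kMaxAieCores = 400;

// Assumption: the design uses a NUM_HW_ROWS x NUM_HW_COLS grid of cores.
constexpr int kNumCores = (NUM_HW_ROWS) * (NUM_HW_COLS);

static_assert((NUM_HW_ROWS) > 0 && (NUM_HW_COLS) > 0,
              "NUM_HW_ROWS and NUM_HW_COLS must be positive");
static_assert(kNumCores <= kMaxAieCores,
              "requested core count exceeds the 400 available AI Engine cores");
```

A compile-time check of this kind rejects an oversized configuration before the design is built. It does not capture the data-memory limit that ties the core count to the maximum supported matrix size.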