Opportunities for Optimization

Vitis Tutorials: AI Engine Development (XD100)

Document ID
XD100
Release Date
2026-03-27
Version
2025.2 English

In this tutorial, the 8-engine design was built by instantiating the single-engine design eight times. This leaves a few opportunities for optimization on the table. These options include the following:

  • Each engine uses its own iFFT instance to transform the radar pulse. This operation is common to all engines, so a single iFFT graph could perform it and broadcast its output to all eight engines. Because ifft2k_async() occupies six tiles, this approach saves seven instances of six tiles each, or 42 (~40) tiles. It also removes seven GMIOs from the design, which dramatically reduces the NoC bandwidth required to deliver the radar pulses from DDR to the AI Engine array.

  • Constructing an 8-engine design with a single iFFT requires some code restructuring because routing the iFFT graph output to all engines requires a new top-level graph. This complicates the “Stamp and Repeat” approach to placement but is manageable.

  • You can, in principle, remove the PL URAM portion of the design by placing the image buffers in DDR instead of the PL. In this case, the radar processing requires eight GMIO pairs, one pair for each engine. The input image streams from DDR over the NoC to each engine in the AIE array, each engine updates its image segment, and the output image streams back to DDR over the NoC. This removes all PL resources from the design, which is a significant saving and simplification. The DDR buffer layout must then be optimized to maximize the burst bandwidth available to each engine. AMD is currently exploring this variant of the design.
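
The resource arithmetic behind these options can be tallied with a short sketch. This is plain host-side C++, not AI Engine graph code. The engine count, the six tiles per iFFT instance, and the one GMIO pair per engine come from the text above. The struct and function names are hypothetical, and the one-pulse-GMIO-per-iFFT-instance relationship is an assumption inferred from the statement that sharing the iFFT removes seven GMIOs.

```cpp
#include <cassert>
#include <cstdio>

// Figures taken from the tutorial text.
constexpr int kEngines = 8;       // engines in the replicated design
constexpr int kTilesPerIfft = 6;  // tiles occupied by one ifft2k_async() instance

// Hypothetical helper type: resources consumed by the iFFT stage only.
struct IfftResources {
    int tiles;        // AI Engine tiles used by all iFFT instances
    int pulse_gmios;  // GMIOs delivering the radar pulse from DDR
};

// Assumption: each iFFT instance is fed by exactly one GMIO.
constexpr IfftResources ifft_resources(int ifft_instances) {
    return {ifft_instances * kTilesPerIfft, ifft_instances};
}

// Tiles freed by replacing one iFFT per engine with a single shared iFFT.
constexpr int tiles_saved() {
    return ifft_resources(kEngines).tiles - ifft_resources(1).tiles;
}

// Pulse GMIOs freed by the same change.
constexpr int gmios_saved() {
    return ifft_resources(kEngines).pulse_gmios - ifft_resources(1).pulse_gmios;
}

// DDR-buffer variant: one GMIO pair (image in, image out) per engine.
constexpr int image_gmio_ports(int engines) {
    return 2 * engines;
}

// Prints a side-by-side summary of the two iFFT arrangements.
void print_comparison() {
    const IfftResources replicated = ifft_resources(kEngines);
    const IfftResources shared = ifft_resources(1);
    std::printf("replicated iFFT: %d tiles, %d pulse GMIOs\n",
                replicated.tiles, replicated.pulse_gmios);
    std::printf("shared iFFT:     %d tiles, %d pulse GMIOs\n",
                shared.tiles, shared.pulse_gmios);
    std::printf("saved:           %d tiles, %d pulse GMIOs\n",
                tiles_saved(), gmios_saved());
    std::printf("DDR image buffers need %d GMIO ports (%d pairs)\n",
                image_gmio_ports(kEngines), kEngines);
}
```

Calling print_comparison() from a main() function prints the tally. The sketch counts only iFFT tiles and GMIO ports. It does not model NoC bandwidth or DDR burst behavior, which depend on details not given in the text.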

Copyright © 2025 Advanced Micro Devices, Inc
