AI Engine design and compilation - 2025.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2025-12-05
Version: 2025.2 English

The design replicates the same four-stage processing chain for each antenna:

  1. Passthrough.

  2. Filtering.

  3. Gain.

  4. Passthrough.

The filter and gain kernels each receive an asynchronous run-time parameter (RTP): the filter receives its coefficients and the gain kernel receives its gain value.
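The chain above can be pictured with a small host-side model. This is only a sketch in plain C++, not the tutorial's AI Engine kernels: the names `ChainParams`, `passthrough`, `fir`, `applyGain`, and `runChain` are hypothetical, and the filter is assumed to be a direct-form FIR, which the text does not specify. The two fields of `ChainParams` stand in for the values that the real design receives as RTPs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two run-time parameters of one chain.
struct ChainParams {
    std::vector<float> coeffs;  // filter taps (an RTP in the real design)
    float gain = 1.0f;          // gain value (an RTP in the real design)
};

// Stages 1 and 4: copy the samples unchanged.
std::vector<float> passthrough(const std::vector<float>& in) {
    return in;
}

// Stage 2 (assumed FIR): y[n] = sum_k coeffs[k] * x[n-k], with x[n] = 0 for n < 0.
std::vector<float> fir(const std::vector<float>& in,
                       const std::vector<float>& coeffs) {
    assert(!coeffs.empty());
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t n = 0; n < in.size(); ++n) {
        for (std::size_t k = 0; k < coeffs.size() && k <= n; ++k) {
            out[n] += coeffs[k] * in[n - k];
        }
    }
    return out;
}

// Stage 3: scale every sample by the gain value.
std::vector<float> applyGain(const std::vector<float>& in, float gain) {
    std::vector<float> out(in);
    for (float& sample : out) {
        sample *= gain;
    }
    return out;
}

// Full chain: passthrough -> filter -> gain -> passthrough.
std::vector<float> runChain(const std::vector<float>& in, const ChainParams& p) {
    return passthrough(applyGain(fir(passthrough(in), p.coeffs), p.gain));
}
```

Calling `runChain` once per antenna with its own `ChainParams` mirrors the way the design replicates the chain, with each copy receiving its own coefficients and gain.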

By default, four of these chains are implemented in the design. This number can be changed with the Makefile parameter NAntenna.

In graph.cpp, the graph is instantiated as follows: MyGraph<NAntenna,40> G("");. The value 40 specifies a utilization ratio of 40% for both the filter and the gain kernels. Because the two ratios total 80%, which fits within a single tile, the two kernels are co-located. To place them in different tiles, replace 40 with a value above 50.
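The arithmetic behind the 50% threshold can be written down as a short check. This is a minimal sketch, assuming the only constraint is that the utilization ratios of kernels sharing a tile must not add up to more than 100%. The function `canShareTile` is hypothetical and is not part of the AI Engine tools; the actual placement decision belongs to the compiler's mapper, which may take other constraints into account.

```cpp
#include <cassert>

// Hypothetical helper: true when two kernels with the given utilization
// ratios (in percent) could fit on the same tile, assuming the sum of the
// ratios on one tile may not exceed 100%.
bool canShareTile(int ratioA, int ratioB) {
    assert(ratioA > 0 && ratioA <= 100);
    assert(ratioB > 0 && ratioB <= 100);
    return ratioA + ratioB <= 100;
}
```

With the default value, `canShareTile(40, 40)` is true (80% in total). With any value above 50, for example `canShareTile(51, 51)`, the sum exceeds 100% and the two kernels need separate tiles.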

Here is the subgraph of the 4th antenna:

[Figure: subgraph of the 4th antenna (image not available)]

At compile time, you must declare that trace events will be extracted at runtime. The flags to set depend on how the events are extracted: through GMIO or through PLIO.

GMIO based event extraction

aiecompiler_trace_gmio_options.cfg

[aie]
event-trace=runtime
broadcast-enable-core=true
event-trace-port=gmio
xlopt=0

PLIO based event extraction

aiecompiler_trace_plio_options.cfg

[aie]
event-trace=runtime
broadcast-enable-core=true
num-trace-streams=16
event-trace-port=plio
trace-plio-width=128
xlopt=0

The options are defined as follows:

  • event-trace=runtime: the trace events are specified at runtime. This is the only setting available for hardware tracing.

  • broadcast-enable-core=true: ensures that the core enable signals are broadcast so that all kernels start within a few clock cycles of each other.

  • event-trace-port=gmio/plio: selects the port type used for event tracing. GMIO is generally used for designs with limited PL resources, while PLIO is preferred for designs with sufficient PL resources.

  • num-trace-streams=16: sets the number of trace streams used for event tracing within the AI Engine array. The default is 4 streams and the maximum is 16. Increasing the number of streams can reduce contention in the trace data path, especially in designs with a large number of active kernels. The drawback is that more streams consume more routing resources, which makes it harder for the router to route the AI Engine design itself.

  • trace-plio-width=128: specifies the width of the PLIO trace interface.

  • xlopt=0: disables extra optimizations that could interfere with event tracing. In particular, the compiler then avoids inlining the kernels into the main function, so each kernel appears as a separate entity in the trace and every iteration is clearly visible.
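The constraints stated in the option list above can be sanity-checked before a configuration file is handed to the compiler. The following is a hedged sketch of such a check in plain C++. The helpers `parseAieSection`, `traceOptionsLookValid`, and `selfCheck` are hypothetical and not part of any AMD tool. They verify only what the text states (event-trace must be runtime, the port must be gmio or plio, at most 16 trace streams), and the lower bound of 1 stream is an assumption.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: collect key=value pairs found in the [aie] section.
std::map<std::string, std::string> parseAieSection(const std::string& text) {
    std::map<std::string, std::string> opts;
    std::istringstream in(text);
    std::string line;
    bool inAie = false;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (line.front() == '[') {        // section header such as [aie]
            inAie = (line == "[aie]");
            continue;
        }
        const auto eq = line.find('=');
        if (inAie && eq != std::string::npos) {
            opts[line.substr(0, eq)] = line.substr(eq + 1);
        }
    }
    return opts;
}

// Checks only the constraints stated in the option list above.
// Note: std::stoi throws if num-trace-streams is not a number.
bool traceOptionsLookValid(const std::map<std::string, std::string>& o) {
    const auto get = [&o](const std::string& key) {
        const auto it = o.find(key);
        return it == o.end() ? std::string() : it->second;
    };
    if (get("event-trace") != "runtime") return false;
    const std::string port = get("event-trace-port");
    if (port != "gmio" && port != "plio") return false;
    const std::string streams = get("num-trace-streams");
    if (!streams.empty()) {
        const int n = std::stoi(streams);
        if (n < 1 || n > 16) return false;  // lower bound of 1 is an assumption
    }
    return true;
}

// The two configurations from this section, as string literals.
const char* const kGmioCfg =
    "[aie]\nevent-trace=runtime\nbroadcast-enable-core=true\n"
    "event-trace-port=gmio\nxlopt=0\n";
const char* const kPlioCfg =
    "[aie]\nevent-trace=runtime\nbroadcast-enable-core=true\n"
    "num-trace-streams=16\nevent-trace-port=plio\n"
    "trace-plio-width=128\nxlopt=0\n";

// Usage example: both configurations pass the check.
void selfCheck() {
    assert(traceOptionsLookValid(parseAieSection(kGmioCfg)));
    assert(traceOptionsLookValid(parseAieSection(kPlioCfg)));
}
```

A check like this catches only obvious mistakes, such as a misspelled port name or a stream count above the documented maximum. The compiler remains the authority on which option values are accepted.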