Using Traffic Generators for AI Engine Designs - 2025.1 English - UG1701

Embedded Design Development Using Vitis User Guide (UG1701)

Document ID: UG1701
Release Date: 2025-07-16
Version: 2025.1 English

Overview

This section describes how to provide input to, and capture output from, the AI Engine array in all simulation and emulation modes using AXI traffic generators. In the AI Engine simulator, the input data stimulus is provided through a PLIO object that specifies a text file containing the data:

input_plio plin = input_plio::create("DataIn", adf::plio_32_bits, "data/input.txt");

Although this is a quick way to get a first simulation running, its main limitation is that changing the input file name for another simulation requires recompiling the entire application. To avoid specifying a file name and instead rely on an independent external traffic generator to drive data onto the PLIO, declare the PLIO without a file:

input_plio plin = input_plio::create("DataIn", adf::plio_32_bits);

For hardware emulation, an equivalent feature emulates the behavior of this PLIO and its AXI4-Stream interface. Both Python and C++ APIs are provided to create external traffic generators, which connect seamlessly in any of these simulation or emulation modes.

The primary external data interfaces for the AI Engine array are AXI4-Stream interfaces. These are known as PLIOs and allow the AI Engine to receive data, operate on the data, and send data back on a separate AXI4-Stream interface. The input interface to the AI Engine is an AXI4-Stream consumer, and the output is an AXI4-Stream producer. To interact with these top-level interfaces during hardware emulation, complementary AXI4-Stream modules are provided. These complementary modules are referred to as the AXI traffic generators.

Tip: The width of a PLIO interface is an important system-level design decision. The wider the interface, the more data can be sent per PL clock cycle.
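To make the width trade-off concrete, the following Python sketch estimates the peak PL-side throughput of a PLIO as interface width multiplied by PL clock frequency. The function name `plio_bytes_per_second` and the 312.5 MHz clock used in the usage lines are illustrative assumptions, not values from this guide, and sustained throughput in a real design can be lower.

```python
def plio_bytes_per_second(width_bits, pl_clock_hz):
    """Peak bytes per second if one full PLIO word moves on every PL clock cycle.

    Hypothetical helper for illustration only; it is not part of any AMD API.
    """
    if width_bits not in (32, 64, 128):
        raise ValueError("expected a PLIO width of 32, 64, or 128 bits")
    return (width_bits // 8) * pl_clock_hz


if __name__ == "__main__":
    pl_clock_hz = 312_500_000  # assumed PL clock, chosen only for the example
    for width in (32, 64, 128):
        rate = plio_bytes_per_second(width, pl_clock_hz)
        print(f"plio_{width}_bits: {rate / 1e9:.2f} GB/s peak")
```

Under this simple model, doubling the interface width doubles the peak rate at a fixed PL clock.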

When developing an AI Engine application, you can test it standalone in simulation (x86simulator, aiesimulator), or as part of a system project in emulation (hw_emu). In either case, you need to send the input data from a predefined reference file and capture the output data in a separate file. Furthermore, if your AI Engine graph is intertwined with kernels located in the programmable logic (HLS C++ or RTL), you also have to handle these interruptions in the data flow. For example, a full system design might look like the following figure:

Figure 1. AI Engine + Programmable Logic Application

As a first step, replace all connections that are outside the AI Engine array with text files that provide input data or capture output data:

Figure 2. Initial Simulation Framework
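A stimulus text file for this first step can be produced with a few lines of Python. The sketch below assumes a layout in which each line of the file holds one PLIO word, with the samples of that word separated by spaces (for example, two 16-bit values per line on a 32-bit PLIO). That layout, and the helper names `format_plio_lines` and `write_plio_file`, are assumptions made for illustration; check the simulator documentation for the exact format your data type requires.

```python
from pathlib import Path


def format_plio_lines(samples, plio_bits=32, sample_bits=16):
    """Group samples into text lines, one PLIO word per line (assumed layout)."""
    if plio_bits % sample_bits != 0:
        raise ValueError("PLIO width must be a multiple of the sample width")
    per_line = plio_bits // sample_bits
    if len(samples) % per_line != 0:
        raise ValueError("sample count must fill a whole number of PLIO words")
    return [
        " ".join(str(value) for value in samples[i:i + per_line])
        for i in range(0, len(samples), per_line)
    ]


def write_plio_file(path, samples, plio_bits=32, sample_bits=16):
    """Write the formatted lines to a text file and return the line count."""
    lines = format_plio_lines(samples, plio_bits, sample_bits)
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)
```

For example, `write_plio_file("data/input.txt", list(range(64)))` would write 32 lines of two values each, provided the `data` directory already exists.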

For more flexibility in data generation and verification, you can replace the text files with external traffic generators, which enable dynamic simulated communication between the PL and the AI Engine array through AXI4-Stream TLM connected to Unix sockets. The strength of these external traffic generators is that they can be used in all simulation and emulation frameworks without modification:

  • x86 simulation
  • AI Engine simulation
  • Hardware emulation
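The socket-based exchange described above can be pictured with a small stand-alone Python sketch. It does not use the AMD traffic generator APIs. It only shows the general pattern of a test bench sending words over a Unix socket to a peer process or thread and reading results back. The framing (a 32-bit count followed by little-endian 32-bit words) and every function name here are assumptions invented for this illustration, and the "simulator" is a stand-in that simply doubles each word.

```python
import socket
import struct
import threading


def recv_exact(sock, nbytes):
    """Read exactly nbytes, because a single recv call may return fewer."""
    chunks = []
    remaining = nbytes
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before all data arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_words(sock, words):
    """Send a word count, then the words as little-endian signed 32-bit values."""
    sock.sendall(struct.pack("<I", len(words)))
    sock.sendall(struct.pack(f"<{len(words)}i", *words))


def recv_words(sock):
    """Receive one count-prefixed block of signed 32-bit words."""
    (count,) = struct.unpack("<I", recv_exact(sock, 4))
    return list(struct.unpack(f"<{count}i", recv_exact(sock, 4 * count)))


def stand_in_simulator(sock):
    """Hypothetical device under test: returns every received word doubled."""
    send_words(sock, [2 * word for word in recv_words(sock)])


def run_loopback(stimulus):
    """Play the traffic generator role against the stand-in simulator."""
    tg_side, sim_side = socket.socketpair()  # connected Unix socket pair on Linux
    worker = threading.Thread(target=stand_in_simulator, args=(sim_side,))
    worker.start()
    try:
        send_words(tg_side, stimulus)
        return recv_words(tg_side)
    finally:
        tg_side.close()
        worker.join()
        sim_side.close()
```

Because the test bench talks only to a socket, the same test bench code can in principle face different back ends, which is the property the list above relies on.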

The overall simulation framework is illustrated in the following figure:

Figure 3. External Traffic Generator-Based AI Engine Simulation Flow

Each AI Engine block can be validated using an external test bench written in Python, MATLAB®, or C++.

For system projects that incorporate AI Engine graph applications and PL kernels, the data movers are replaced with external traffic generator sources and sinks, and the PL processing kernel is a streaming kernel connected to the AI Engine kernels. See the following figure for details.

Figure 4. Full System Emulation

The traffic generators also feed data into, and drain data from, the full system that includes PL logic. You do not need to model the PS code writing data to DDR memory or the data movers transferring it to the AI Engine kernels. This approach replaces the data movers with external traffic generators that produce data dynamically. The following sections describe the step-by-step changes needed to interface external traffic generators with an AI Engine system design for the emulation flow.

AI Engine Graph Modifications

Nothing has to be changed within the graph concerning the kernel connections. The definition of the traffic generators as the source of data from the PLIO port is the only change required as shown below. The example below is based on the code in Design Flow Using RTL Programmable Logic in AI Engine Kernel and Graph Programming Guide (UG1079).

plin = input_plio::create("DataIn1",adf::plio_32_bits);
clip_in = output_plio::create("clip_in",adf::plio_32_bits);
clip_out = input_plio::create("clip_out",adf::plio_32_bits);
plout = output_plio::create("DataOut1",adf::plio_32_bits);

The first parameter of the input_plio or output_plio declaration is important because it is the name that the traffic generator uses to connect to the right socket.

Both x86 simulation and AI Engine simulation can run with the traffic generators. Launching a simulation requires running the aiesimulator or the x86simulator in parallel with the external traffic generator.
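Running two programs in parallel can be scripted in many ways. The following Python sketch starts every command first and only then waits for all of them, which is the pattern needed when a simulator and a traffic generator must both be alive to exchange data. The helper name `run_in_parallel` is an assumption, and the commands in the usage section are placeholders that only print a message. In a real flow they would be replaced by your simulator command line and your traffic generator script.

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command, then wait for all of them and return their exit codes."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Placeholder commands (assumptions): substitute the real simulator launch
    # and the real external traffic generator script for your project.
    simulator_stand_in = [sys.executable, "-c", "print('simulator stand-in')"]
    generator_stand_in = [sys.executable, "-c", "print('traffic generator stand-in')"]
    exit_codes = run_in_parallel([simulator_stand_in, generator_stand_in])
    print("exit codes:", exit_codes)
```

Starting all processes before waiting on any of them matters here: waiting for the simulator to finish before launching the traffic generator would leave the simulator without a peer to talk to.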

PL Kernels Change

When developing AI Engine applications for hardware emulation, you must model data transfers between the AI Engine and the programmable logic. However, during the initial development phase, the PL kernels are often unfinished and not ready to be used in the Vitis link stage. The solution is to insert hooks in the programmable logic interface that connect to external traffic generators. AMD provides a complete set of pre-compiled .xo files that can be used for this purpose:

  • $(XILINX_VITIS)/data/emulation/XO/sim_ipc_axis_slave_32.xo, $(XILINX_VITIS)/data/emulation/XO/sim_ipc_axis_master_32.xo
  • $(XILINX_VITIS)/data/emulation/XO/sim_ipc_axis_slave_64.xo, $(XILINX_VITIS)/data/emulation/XO/sim_ipc_axis_master_64.xo
  • $(XILINX_VITIS)/data/emulation/XO/sim_ipc_axis_slave_128.xo, $(XILINX_VITIS)/data/emulation/XO/sim_ipc_axis_master_128.xo

The .xo files must be copied to the right location in your project and specified in the configuration file during the Vitis link stage.

Preparing Connectivity to Link the Traffic Generators

During the Vitis link stage (v++ -l), the previously defined .xo files are used to connect the related kernel instances to the AI Engine graph. The hw_link.cfg configuration file is written so that the kernel instance names match the names you defined in the graph for the input_plio and the output_plio. For example, the following configuration names the traffic generator instances in_interpolator and out_classifier, the same names as the AI Engine ports they connect to:

[connectivity]

nk=sim_ipc_axis_master_32:1:in_interpolator
nk=sim_ipc_axis_slave_32:1:out_classifier
nk=polar_clip:1:polar_clip

sc=in_interpolator.M00_AXIS:ai_engine_0.in_interpolator
sc=ai_engine_0.out_interpolator:polar_clip.in_sample
sc=polar_clip.out_sample:ai_engine_0.in_classifier
sc=ai_engine_0.out_classifier:out_classifier.S00_AXIS

The --connectivity.nk command takes the kernel name (for example, sim_ipc_axis_master_32), the number of kernel instances to create, and the name of each kernel instance (for example, in_interpolator). Refer to --connectivity Options for more information on the command.

The --connectivity.sc command defines the streaming connections between PL kernels, or between PL kernels and the AI Engine graph. In the example above, the output port of the traffic generator, in_interpolator.M00_AXIS, is connected to the input port ai_engine_0.in_interpolator.
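Because the instance names follow the PLIO names, the traffic generator entries of such a file can be generated from a list of ports. The Python sketch below rebuilds the nk and sc lines for the two traffic generators in the example above. The function name `connectivity_section` and the tuple layout are assumptions for illustration; the kernel names and the M00_AXIS and S00_AXIS port names are taken from the example. Lines for real PL kernels such as polar_clip still have to be added by hand.

```python
def connectivity_section(plios):
    """Build traffic generator nk/sc lines for a list of PLIO ports.

    plios: list of (name, direction, width_bits) tuples, where direction is
    "in" for data entering the AI Engine and "out" for data leaving it.
    Hypothetical helper; it only formats text and does not call any tool.
    """
    nk_lines = []
    sc_lines = []
    for name, direction, width_bits in plios:
        if direction == "in":
            # A master traffic generator drives the AI Engine input port.
            nk_lines.append(f"nk=sim_ipc_axis_master_{width_bits}:1:{name}")
            sc_lines.append(f"sc={name}.M00_AXIS:ai_engine_0.{name}")
        elif direction == "out":
            # A slave traffic generator receives the AI Engine output port.
            nk_lines.append(f"nk=sim_ipc_axis_slave_{width_bits}:1:{name}")
            sc_lines.append(f"sc=ai_engine_0.{name}:{name}.S00_AXIS")
        else:
            raise ValueError(f"unknown direction for {name}: {direction!r}")
    return "\n".join(["[connectivity]", *nk_lines, *sc_lines])


if __name__ == "__main__":
    print(connectivity_section([
        ("in_interpolator", "in", 32),
        ("out_classifier", "out", 32),
    ]))
```

Generating these lines from one port list keeps the graph names, the instance names, and the stream connections from drifting apart as the design changes.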

With this naming approach, the same external traffic generator can be used for multiple simulation or emulation runs. In the case of hardware emulation (hw_emu), you can write the external traffic generator in C++, Python, MATLAB, or, if you are familiar with RTL coding, HDL.

Host Code

Creating the host code is relatively simple. Because there are no programmable logic kernels, you can skip the stages that find and run the PL kernels, as well as the stages that allocate memory for the buffer objects. The remaining stages are:

  • Open the device
  • Load the xclbin file
  • Register XRT to connect to the design
  • Run the AI Engine graph

After compiling the host code, you can package the entire project. Running the emulation consists of running the external traffic generator in parallel with the standard emulation launch.