AI Engine Simulator - 2024.1 English

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2024-06-27
Version: 2024.1 English

The AMD Versal™ adaptive SoC AI Engine multi-threaded simulator (aiesimulator) models the global memory (DDR memory) and the network on chip (NoC) in addition to the AI Engine array. After the application is compiled for the simulation target, invoke the AI Engine multi-threaded simulator as follows.

aiesimulator --pkg-dir=./Work

This runs the AI Engine simulator in multi-threaded mode, which uses multiple CPU threads.

Using Threads in the AI Engine Simulation

The default number of threads used during AI Engine simulation is calculated from the number of CPU cores available on the user machine and the number of active AI Engine core tiles used in the AI Engine design.

In any multi-threaded process, launching and synchronizing threads incurs overhead. The default number of threads used during AI Engine simulation is the number that the multi-threaded model calculates as optimal, so that the computation on each thread is substantial compared to that overhead.

By default, calling graph.run() with no argument runs the graph forever. The AI Engine compiler generates code that executes the data flow graph in a perpetual while loop, so the simulation also runs perpetually. To create terminating programs for debugging, specify graph.run(<number_of_iterations>) in your graph code to limit execution to the specified number of iterations, which can be any positive integer. graph.run(-1) also specifies a graph that runs forever.
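The rules above can be checked mechanically before launching a long simulation. The following is a minimal sketch, not part of the AI Engine tools: it scans graph source text for run() calls and labels each one as running forever, bounded, or unknown. The function name classify_run_calls and the purely textual matching are assumptions made for illustration; the regular expression also matches unrelated .run() calls and does not understand comments or macros.

```python
import re

# Matches "<name>.run(<arg>)" where <arg> contains no nested parentheses.
# Textual heuristic only (an assumption of this sketch, not a tool feature).
_RUN_CALL_RE = re.compile(r"\b(\w+)\.run\(\s*([^()]*?)\s*\)")


def classify_run_calls(source_text):
    """Return (object_name, argument, kind) for every .run(...) call found.

    kind is "forever" for run() and run(-1), "bounded" for a positive
    integer literal, and "unknown" for anything else (for example a variable).
    """
    results = []
    for match in _RUN_CALL_RE.finditer(source_text):
        name, arg = match.group(1), match.group(2)
        if arg == "" or arg == "-1":
            kind = "forever"
        elif arg.isdigit() and int(arg) > 0:
            kind = "bounded"
        else:
            kind = "unknown"
        results.append((name, arg, kind))
    return results
```

A call reported as "forever" means the simulation does not terminate on its own, so it needs either a bounded iteration count or a simulator-side cycle limit.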

The AI Engine simulator command first configures the simulator as specified in the compiler generated Work/config/scsim_config.json file. This includes loading PL IP blocks and their connections, configuring I/O data file drivers, and configuring the NoC and global memory (DDR memory) connections. It then executes the specified PS application and finally exits the simulator.

The AI Engine simulator has an optional --profile option, which makes printf output from kernel code appear on the console and also generates profile information. The --dump-vcd <filename> option generates a value change dump (VCD) for the duration of the simulation. The --simulation-cycle-timeout <number-of-cycles> option exits the simulation after the given number of clock cycles.
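When the simulator is launched from a script, the options described above can be assembled programmatically. The following is a minimal sketch that only builds the argument list; it uses only the options named in this section (--pkg-dir, --profile, --dump-vcd, --simulation-cycle-timeout). The helper name build_aiesim_command and its parameter names are hypothetical, and the VCD file name is a placeholder.

```python
import shlex


def build_aiesim_command(pkg_dir="./Work", profile=False,
                         vcd_file=None, cycle_timeout=None):
    """Assemble an aiesimulator argument list (hypothetical helper)."""
    cmd = ["aiesimulator", f"--pkg-dir={pkg_dir}"]
    if profile:
        # Enables kernel printf output on the console and profile data.
        cmd.append("--profile")
    if vcd_file is not None:
        # Value change dump for the duration of the simulation.
        cmd += ["--dump-vcd", vcd_file]
    if cycle_timeout is not None:
        if cycle_timeout <= 0:
            raise ValueError("cycle_timeout must be a positive cycle count")
        # Stops the simulator after the given number of clock cycles.
        cmd += ["--simulation-cycle-timeout", str(cycle_timeout)]
    return cmd


def as_shell_line(cmd):
    """Render the argument list as a single shell-quoted command line."""
    return shlex.join(cmd)
```

The resulting list can be passed to subprocess.run() on a machine where aiesimulator is on the PATH. Keeping the arguments as a list avoids shell quoting problems with the package directory path.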

Important: If you specify neither a clock cycle limit (--simulation-cycle-timeout) nor a number of iterations in graph.run(), the simulation runs forever. You need to press Ctrl+C twice to exit the simulator.
Tip: You might observe cycle count differences between simulation runs of the same design. This is because the simulator waits a few seconds for all pending transactions (such as DMA) to finish. During this wait, the simulator process is still ticking but can be context-switched by the OS, so the total cycle count can differ from run to run. To ensure that the total cycles are the same for each run, use the AI Engine simulator --simulation-cycle-timeout option to stop the simulator on an exact cycle. The total cycles that appear in the profiling report are the same on each run.
Important: Do not include <iostream> in the kernel code to enable printf output. Using #include <iostream> in the kernel code results in a compilation error for both the x86 simulator and the AI Engine simulator (aiesimulator).

The AI Engine simulator reports the Average Throughput of each PLIO at the end of the simulation.

The report is printed on the console and written to AIESimulator.log.

Enabling core(s) of graph G
Waiting for core(s) of graph G to finish execution ...
core(s) are done executing
Exiting!
Cores are done executing but the simulation will run for some more cycles to allow PLIO to be flushed
Stopping Simulator.

Info: /OSCI/SystemC: Simulation stopped by user.
----------------------------------------------------------------------------------------------
Port Name           | Type              | Average Throughput
----------------------------------------------------------------------------------------------
Input_0             | IN                | 1499.787536 MBps  
Input_1             | IN                | 1480.061002 MBps  
Input_2             | IN                | 1500.666709 MBps  
Input_3             | IN                | 1499.567904 MBps  
Output_0            | OUT               | 1380.798274 MBps  
Output_1            | OUT               | 1361.883229 MBps  
Output_2            | OUT               | 1380.798274 MBps  
Output_3            | OUT               | 1380.798274 MBps  
----------------------------------------------------------------------------------------------

JSON file generated successfully!
IP-INFO: deleting ip PSIP_ps_i24 
IP-INFO: deleting packet ip 
IP-INFO: deleting packet ip 
IP-INFO: deleting packet ip 
IP-INFO: deleting packet ip 
IP-INFO: deleting packet ip 
IP-INFO: deleting packet ip 
IP-INFO: deleting packet ip 
IP-INFO: deleting packet ip 
[INFO] : Simulation Finished, Sim result: 0 Total Simulation time 19696001 ps
AIEMLsim feature license is released.

The same report is generated in the summary table of the Analysis View of the AMD Vitis™ Unified IDE.

Figure 1. Summary Table in Analysis View

For designs using External Traffic Generators in AI Engine Simulation, throughput values might not be accurate because external traffic generators are not cycle accurate.

|INFO | in_interpolator | Total number of bytes sent : 2048
|INFO | out_interpolator | Total number of bytes received : 4096
|INFO | in_classifier | Total number of bytes sent : 4096
WARNING::[ XTLM_IPC::006 ] out_interpolator Closing Socket
|INFO | out_classifier | Total number of bytes received : 4096
WARNING::[ XTLM_IPC::006 ] in_interpolator Closing Socket
WARNING::[ XTLM_IPC::006 ] out_classifier Closing Socket
WARNING::[ XTLM_IPC::006 ] in_classifier Closing Socket
Stopping Simulator.

Info: /OSCI/SystemC: Simulation stopped by user.
INFO: The PLIO throughput may not be accurate when an External Traffic Generator is used.
----------------------------------------------------------------------------------------------
Port Name           | Type              | Average Throughput
----------------------------------------------------------------------------------------------
in_interpolator     | IN                | 315.640219 MBps   
in_classifier       | IN                | 516.259138 MBps   
out_classifier      | OUT               | 618.282816 MBps   
out_interpolator    | OUT               | 617.686090 MBps   
----------------------------------------------------------------------------------------------

JSON file generated successfully!
IP-INFO: deleting ip PSIP_ps_i6 
[INFO] : Simulation Finished, Sim result: 0 Total Simulation time 27676 ns
AIEsim feature license is released.
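The two sample logs above report the total simulation time in different units (ps in the first, ns in the second). When comparing runs, it can help to normalize that value. The following is a minimal sketch, assuming the line keeps the "Total Simulation time <value> <unit>" shape shown in the logs; the function name simulation_time_ps is hypothetical, and the us and ms entries are assumptions, because only ps and ns appear in the samples.

```python
import re

# Conversion factors to picoseconds. Only "ps" and "ns" appear in the sample
# logs; "us" and "ms" are included as an assumption.
_UNIT_TO_PS = {"ps": 1, "ns": 1_000, "us": 1_000_000, "ms": 1_000_000_000}

_SIM_TIME_RE = re.compile(r"Total Simulation time\s+(\d+)\s*(ps|ns|us|ms)\b")


def simulation_time_ps(log_text):
    """Return the total simulation time in picoseconds, or None if absent."""
    match = _SIM_TIME_RE.search(log_text)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2)
    return value * _UNIT_TO_PS[unit]
```

Returning None instead of raising lets a calling script distinguish a log that ended early (for example, a simulation interrupted with Ctrl+C) from one that finished normally.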

The Average Throughput is not reported if the graph runs indefinitely (graph.run(-1)) and the simulation is interrupted manually. In that case, the Average Throughput appears at the end of the simulation only if the AI Engine simulator is stopped using --simulation-cycle-timeout=<ns>.

For designs that stall or deadlock, the reported Average Throughput is 0.

For hardware emulation, the Average Throughput report is written to the simulate.log file.

Note: The AI Engine simulator reports Average Throughput for PLIOs only.
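The Average Throughput table printed to the console and to AIESimulator.log can be post-processed by a script, for example to flag ports that report 0 because of a stall or deadlock. The following is a minimal sketch that assumes the row layout shown in the sample logs above ("name | IN or OUT | value MBps"); the function names parse_throughput and stalled_ports are hypothetical, and the exact formatting of a zero value is an assumption.

```python
import re

# One table row: port name, direction, and throughput in MBps.
# Header and separator lines do not match this pattern.
_ROW_RE = re.compile(
    r"^\s*(\S+)\s*\|\s*(IN|OUT)\s*\|\s*([0-9]+(?:\.[0-9]+)?)\s*MBps\s*$"
)


def parse_throughput(log_text):
    """Return a list of (port_name, direction, mbps) tuples from log text."""
    rows = []
    for line in log_text.splitlines():
        match = _ROW_RE.match(line)
        if match:
            rows.append((match.group(1), match.group(2), float(match.group(3))))
    return rows


def stalled_ports(rows):
    """Return the names of ports whose reported throughput is 0."""
    return [name for name, _direction, mbps in rows if mbps == 0.0]
```

For example, reading AIESimulator.log with open(...).read() and passing the text to parse_throughput() yields one tuple per PLIO. Remember that values from runs using an External Traffic Generator might not be accurate.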

Graphs and Sub-Graphs Simulation Runtimes

For an AI Engine design in which a larger graph is composed of smaller sub-graphs, it is recommended to simulate the individual sub-graphs first. Once the sub-graphs are verified and meet the functional and performance requirements, simulate the larger graph. Simulating the larger graph typically takes longer, depending on the size of the sub-graphs.