Profiling the NMUs Connected to AI Engine - 2025.2 English - UG1076

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2025-11-20
Version: 2025.2 English
Use the following steps to profile the NoC:
Note: Generate the hardware boot image (sd_card.img) before proceeding further.
  1. Set up the Vitis tool environment by sourcing the settings64.sh script.
    <Vitis Installation Path>/Vitis/<202x.x>/settings64.sh
  2. Launch the ChipScoPy server using the following command.
    cs_server &
    Figure 1. ChipScoPy Server
  3. Run the hardware server (hw_server) on the computer that has the JTAG connection to the VCK190 board. Note that computer's name or IP address, as you did for the cs_server host.
  4. Boot up the board using the sd_card image.
  5. In the local Linux terminal, navigate to the directory containing the v++ build output.
  6. Configure the options to the vperf command as follows.
    vperf --link_summary <Design>.xsa.link_summary --hw_url <TCP:ComputerName>/<IP>:3121 --cs_url <TCP:ComputerName>/<IP>:3042 --verbose

    The --link_summary option is required; set it to the link_summary file generated by the v++ --link step. Set the --hw_url option to the hardware server location, and the --cs_url option to the ChipScoPy server location.
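    As a quick sketch, the vperf option plumbing can be assembled programmatically. This is an illustration only (vperf itself is invoked from the shell); the hostnames and file name are placeholders, and 3121 and 3042 are the default hw_server and cs_server ports from the command template above.

    ```python
    def vperf_cmd(link_summary, hw_host, cs_host):
        """Assemble a vperf argument list for illustration.

        link_summary: path to the <Design>.xsa.link_summary from v++ --link.
        hw_host, cs_host: machines running hw_server and cs_server.
        """
        return [
            "vperf",
            "--link_summary", link_summary,
            "--hw_url", f"TCP:{hw_host}:3121",  # default hw_server port
            "--cs_url", f"TCP:{cs_host}:3042",  # default cs_server port
            "--verbose",
        ]
    ```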

    Executing the command outputs the following.

    [INFO] Successfully connected to hw_server and cs_server
    [INFO] Debug IP layout found at : <Vitis Build Path>/_x/link/int/debug_ip_layout.rtd
    [INFO] HW Device : xilinx_vck190_base_<>
    1  NOC_NMU128_X0Y6     :  CIPS_0/FPD_CCI_NOC_0
    2  NOC_NMU128_X0Y7     :  CIPS_0/FPD_CCI_NOC_1
    3  NOC_NMU128_X0Y8     :  CIPS_0/FPD_CCI_NOC_2
    4  NOC_NMU128_X0Y9     :  CIPS_0/FPD_CCI_NOC_3
    5  NOC_NMU128_X0Y4     :  CIPS_0/FPD_AXI_NOC_0
    6  NOC_NMU128_X0Y5     :  CIPS_0/FPD_AXI_NOC_1
    7  NOC_NMU128_X0Y3     :  CIPS_0/LPD_AXI_NOC_0
    8  NOC_NMU128_X0Y2     :  CIPS_0/PMC_NOC_AXI_0
    9  NOC_NMU128_X7Y10    :  ai_engine_0/M03_AXI
    10 NOC_NMU128_X10Y10   :  ai_engine_0/M00_AXI
    11 NOC_NMU128_X12Y10   :  ai_engine_0/M01_AXI
    12 NOC_NMU128_X8Y10    :  ai_engine_0/M05_AXI
    13 NOC_NMU128_X9Y10    :  ai_engine_0/M04_AXI
    14 NOC_NMU128_X11Y10   :  ai_engine_0/M02_AXI
    15 DDRMC_X0Y0          :  DDRMC/DDRMC_X0Y0
    16 DDRMC_X3Y0          :  DDRMC/DDRMC_X3Y0
    17 DDRMC_X1Y0          :  DDRMC/DDRMC_X1Y0

    The vperf command reads the metadata from the link summary and populates all the nodes corresponding to the NoC NMUs used by the design. For example, the output above lists the NoC NMUs that correspond to the following:

    • The Control, Interfaces and Processing System (CIPS)
    • AI Engine GMIO interface ports
    • DDR memory controller channel ports

    You can use the vperf profiling control commands to filter out the nodes of interest and start profiling.

    Following is a list of the control commands available to start and end profiling of the nodes of interest.

    Table 1. vperf Control Commands

    s/start_profile
        Starts profiling.
    e/end_profile
        Ends profiling.
    q/quit
        Ends profiling and quits the vperf command.
    n/set_npi_sample_period
        Sets the sample period, that is, the period over which values are sampled. Issuing this command discovers the available debug cores and displays the supported sample periods, which you select by ID:
            ID  Sample Period
            0   56 ms
            1   112 ms
            2   224 ms
            3   447 ms
            4   895 ms
            5   1790 ms
            6   3579 ms
            7   7158 ms
    t/set_tslide
        Sets TSLIDE so that one count represents 2^TSLIDE clock cycles. This is useful when a counter overflows while profiling the NoC.
    f/set_filter
        Chooses the nodes to profile. Specify a comma-separated list of node IDs, ranges, or both:
            Filter String   Nodes Selected
            1,3,4,5,6,8     1, 3, 4, 5, 6, 8
            1-5,8           1, 2, 3, 4, 5, 8
            2,4,9-10        2, 4, 9, 10
        It is recommended that you profile up to four nodes at a time because of the slower sampling intervals over JTAG. Profiling more than four nodes can result in missed samples and, consequently, inaccurate results.
    c/clear_filter
        Clears all filters and profiles all nodes (the default).
    p/print
        Prints the current configuration for reference before you start profiling.
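    The filter-string grammar accepted by f/set_filter can be sketched as follows. This is a hypothetical helper for illustration, not part of vperf itself.

    ```python
    def expand_filter(filter_str):
        """Expand a vperf-style node filter (e.g. "1-5,8") into node IDs."""
        ids = []
        for part in filter_str.split(","):
            if "-" in part:
                # A range like "9-10" selects every ID in it, inclusive.
                lo, hi = part.split("-")
                ids.extend(range(int(lo), int(hi) + 1))
            else:
                # A bare number selects that single node ID.
                ids.append(int(part))
        return ids
    ```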

    Following is an example of setting the npi_sample_period and Tslide options.

    > n
    Discovering debug cores...Done!
    
    Enumerating NoC (warning this can take 60-80s on xcvc1902)...Done
    Discovered Nodes: 
    {'enabled': ['DDRMC_X0Y0', 'DDRMC_X1Y0', 'DDRMC_X3Y0', 'NOC_NMU128_X0Y3', 'NOC_NMU128_X0Y2', 'NOC_NMU128_X0Y4', 'NOC_NMU128_X0Y5', 'NOC_NMU128_X0Y6', 'NOC_NMU128_X0Y7', 'NOC_NMU128_X0Y8', 'NOC_NMU128_X0Y9', 'NOC_NMU128_X11Y10', 'NOC_NMU128_X12Y10', 'NOC_NMU128_X9Y10', 'NOC_NMU128_X10Y10', 'NOC_NMU128_X7Y10', 'NOC_NMU128_X8Y10', 'NOC_NSU128_X6Y6'], 'disabled': [], 'invalid': []}
    Select ID from the list:
    ID  Sample Period 
    0  :  56ms
    1  :  112ms
    2  :  224ms
    3  :  447ms
    4  :  895ms
    5  :  1790ms
    6  :  3579ms
    7  :  7158ms
    
    >> 0
    [INFO] Sample period changed to 56ms
    
    > t
    Select Tslide between 0 and 27:
    >> 1
    [INFO] Tslide changed to: 1.
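    The TSLIDE setting trades resolution for range: one counter increment stands for 2^TSLIDE clock cycles, so raising TSLIDE pushes out the point at which a counter overflows. The sketch below illustrates that scaling; the counter width is an assumption for illustration only.

    ```python
    COUNTER_BITS = 32  # assumed counter width, for illustration only

    def to_cycles(raw_count, tslide):
        # Each raw count stands for 2**tslide clock cycles.
        return raw_count * (1 << tslide)

    def max_cycles(tslide, bits=COUNTER_BITS):
        # Largest number of clock cycles representable before the
        # counter overflows: full-scale count times 2**tslide.
        return ((1 << bits) - 1) * (1 << tslide)
    ```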
    
  7. Start profiling with s/start_profile after all configurations are set.
  8. Navigate to the hardware Linux console and run the application. When the run is complete, end the profiling using e/end_profile.
  9. When you quit the vperf utility using the q/quit command, a new vperf.run_summary file is generated. You can open this file in the Vitis IDE.

    In the following example, the AI Engine application includes three GMIO input ports and six GMIO output ports. You can use the f command in the vperf utility to filter the nodes of interest for profiling. It displays all the available nodes, as shown below.

    > f
    List of detected nodes in the design:
    1  NOC_NMU128_X0Y6     :  CIPS_0/FPD_CCI_NOC_0
    2  NOC_NMU128_X0Y7     :  CIPS_0/FPD_CCI_NOC_1
    3  NOC_NMU128_X0Y8     :  CIPS_0/FPD_CCI_NOC_2
    4  NOC_NMU128_X0Y9     :  CIPS_0/FPD_CCI_NOC_3
    5  NOC_NMU128_X0Y4     :  CIPS_0/FPD_AXI_NOC_0
    6  NOC_NMU128_X0Y5     :  CIPS_0/FPD_AXI_NOC_1
    7  NOC_NMU128_X0Y3     :  CIPS_0/LPD_AXI_NOC_0
    8  NOC_NMU128_X0Y2     :  CIPS_0/PMC_NOC_AXI_0
    9  NOC_NMU128_X7Y10    :  ai_engine_0/M03_AXI
    10 NOC_NMU128_X10Y10   :  ai_engine_0/M00_AXI
    11 NOC_NMU128_X12Y10   :  ai_engine_0/M01_AXI
    12 NOC_NMU128_X8Y10    :  ai_engine_0/M05_AXI
    13 NOC_NMU128_X9Y10    :  ai_engine_0/M04_AXI
    14 NOC_NMU128_X11Y10   :  ai_engine_0/M02_AXI
    15 DDRMC_X0Y0          :  DDRMC/DDRMC_X0Y0
    16 DDRMC_X3Y0          :  DDRMC/DDRMC_X3Y0
    17 DDRMC_X1Y0          :  DDRMC/DDRMC_X1Y0
    

    An important observation is that the design has nine I/O ports in total, but only six NMU nodes (IDs 9-14) correspond to the AI Engine. This is because each AI Engine-to-NMU connection through the interface tile supports two MM2S and two S2MM channels.

    A read and a write can go through the same interface tile on different channels. In that case, the same NMU node carries both the read and the write transfers, and that NMU node appears in both NoC counter groups in the Vitis IDE. See the following figure for reference.

    You can find the interface tile and channel mapping for each GMIO port (input and output) in the graph compile summary report in the Vitis IDE.
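    Because each NMU offers two MM2S and two S2MM channels, and a read and a write can share one NMU on different channels, a lower bound on the NMU count can be sketched as below. This is only a back-of-envelope bound, not the aiecompiler's actual placement: the example design above spreads its nine GMIO ports across six NMUs, more than the bound allows for.

    ```python
    import math

    def min_nmus(n_inputs, n_outputs):
        """Lower bound on NMUs for GMIO ports, assuming full sharing.

        Each NMU carries up to two MM2S (input) and two S2MM (output)
        channels, and inputs and outputs can share the same NMU.
        """
        return max(math.ceil(n_inputs / 2), math.ceil(n_outputs / 2))
    ```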

    Figure 2. Interface Channels

    The following example sets the configuration to profile three NoC NMUs, 9-11 (considering the JTAG bandwidth limitations).

    Enter the filter to select Nodes from the above list:
    >> 9,10,11
    [INFO] Selected nodes according to the filter :
    1  NOC_NMU128_X7Y10    :  ai_engine_0/M03_AXI
    2  NOC_NMU128_X10Y10   :  ai_engine_0/M00_AXI
    3  NOC_NMU128_X12Y10   :  ai_engine_0/M01_AXI
    

    The following figure shows the NoC profile output in the Vitis IDE.

    Figure 3. NoC Counters Read
    1. NoC counters read group, which represents the traffic going through NMUs from DDR to AI Engine.
    2. NMU nodes corresponding to the GMIO input read.

      The NoC profile output in the Vitis IDE contains NMU write groups and the corresponding nodes.

      Figure 4. NoC Counters Write
    3. NoC Counter write group, which represents the traffic going through NMUs from AI Engine to DDR.
    4. NMU nodes corresponding to the GMIO output write.

      The following configuration profiles NoC NMUs, 12-14 (considering the JTAG bandwidth limitations).

      Enter the filter to select Nodes from the above list:
      >> 12,13,14
      [INFO] Selected nodes according to the filter :
      1  NOC_NMU128_X8Y10    :  ai_engine_0/M05_AXI
      2  NOC_NMU128_X9Y10    :  ai_engine_0/M04_AXI
      3  NOC_NMU128_X11Y10   :  ai_engine_0/M02_AXI

To profile other NMU nodes:

  1. Reconfigure the npi_sample_period and TSLIDE values, and filter the required nodes.
  2. Start the profiling.
  3. Re-run the application on hardware.
  4. End the profiling and quit to generate the new vperf.run_summary file.
Figure 5. Only NoC Counter Write

In the preceding figure, only the NoC counters write group appears in the profile summary; there is no read group. This is because these three nodes correspond to GMIO outputs, and no GMIO input maps to those interface tiles.

You can also open the Vivado project generated by the v++ --link step (_x/link/vivado/vpl/prj/prj.xpr). There you can view the NoC interconnect network and locate the NMUs connected to any design element with traffic flowing through the NoC.

Figure 6. NoC Vivado