Procedure to Profile the NMUs Connected to AI Engine - 2024.1 English

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2024-06-27
Version: 2024.1 English
The following steps describe how to profile the NoC. They assume that you have already generated the hardware boot image (sd_card.img).
  1. Set up the Vitis tool environment by sourcing the settings64.sh script.
    source <Vitis Installation Path>/Vitis/<202x.x>/settings64.sh
  2. Launch the Chipscopy server using the command below.
    cs_server &
    Figure 1. Chipscopy Server
  3. Launch the hardware server (hw_server) on the computer that has a JTAG connection to the VCK190 board. Note the computer name or IP address, as highlighted above for cs_server.
  4. Boot up the board using the sd_card image.
  5. In the local Linux terminal, navigate to the directory where the v++ build output is present.
  6. Configure the options to the vperf command as shown below.
    vperf --link_summary <Design>.xsa.link_summary --hw_url <TCP:ComputerName>/<IP>:3121 --cs_url <TCP:ComputerName>/<IP>:3042 --verbose

    You must provide the --link_summary option, which should be set to the link_summary output generated from the v++ --link step. Set the --hw_url option to the hardware server location, and the --cs_url option to the Chipscopy location.

    The output below is generated when the command is executed.

    [INFO] Successfully connected to hw_server and cs_server
    [INFO] Debug IP layout found at : <Vitis Build Path>/_x/link/int/debug_ip_layout.rtd
    [INFO] HW Device : xilinx_vck190_base_<>
    1  NOC_NMU128_X0Y6     :  CIPS_0/FPD_CCI_NOC_0
    2  NOC_NMU128_X0Y7     :  CIPS_0/FPD_CCI_NOC_1
    3  NOC_NMU128_X0Y8     :  CIPS_0/FPD_CCI_NOC_2
    4  NOC_NMU128_X0Y9     :  CIPS_0/FPD_CCI_NOC_3
    5  NOC_NMU128_X0Y4     :  CIPS_0/FPD_AXI_NOC_0
    6  NOC_NMU128_X0Y5     :  CIPS_0/FPD_AXI_NOC_1
    7  NOC_NMU128_X0Y3     :  CIPS_0/LPD_AXI_NOC_0
    8  NOC_NMU128_X0Y2     :  CIPS_0/PMC_NOC_AXI_0
    9  NOC_NMU128_X7Y10    :  ai_engine_0/M03_AXI
    10 NOC_NMU128_X10Y10   :  ai_engine_0/M00_AXI
    11 NOC_NMU128_X12Y10   :  ai_engine_0/M01_AXI
    12 NOC_NMU128_X8Y10    :  ai_engine_0/M05_AXI
    13 NOC_NMU128_X9Y10    :  ai_engine_0/M04_AXI
    14 NOC_NMU128_X11Y10   :  ai_engine_0/M02_AXI
    15 DDRMC_X0Y0          :  DDRMC/DDRMC_X0Y0
    16 DDRMC_X3Y0          :  DDRMC/DDRMC_X3Y0
    17 DDRMC_X1Y0          :  DDRMC/DDRMC_X1Y0

    The vperf command reads the metadata from the link summary and lists all the nodes that correspond to the NoC NMUs used by the design. For example, the output above lists the NoC NMUs that correspond to the Control, Interfaces, and Processing System (CIPS), the AI Engine GMIO interface ports, and the DDR memory controller (DDRMC) channel ports.

    The vperf profiling control commands can be used to filter out the nodes of interest and start profiling.
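    The node list printed by vperf has a regular "ID  node : port" layout, so picking the IDs to pass to the filter can be scripted. The sketch below is illustrative Python and not part of vperf; parse_nodes and ids_for_prefix are hypothetical helper names, and the sample text is copied from the output above.

```python
import re

# Node list copied from the vperf output shown above.
VPERF_NODE_LIST = """\
1  NOC_NMU128_X0Y6     :  CIPS_0/FPD_CCI_NOC_0
2  NOC_NMU128_X0Y7     :  CIPS_0/FPD_CCI_NOC_1
3  NOC_NMU128_X0Y8     :  CIPS_0/FPD_CCI_NOC_2
4  NOC_NMU128_X0Y9     :  CIPS_0/FPD_CCI_NOC_3
5  NOC_NMU128_X0Y4     :  CIPS_0/FPD_AXI_NOC_0
6  NOC_NMU128_X0Y5     :  CIPS_0/FPD_AXI_NOC_1
7  NOC_NMU128_X0Y3     :  CIPS_0/LPD_AXI_NOC_0
8  NOC_NMU128_X0Y2     :  CIPS_0/PMC_NOC_AXI_0
9  NOC_NMU128_X7Y10    :  ai_engine_0/M03_AXI
10 NOC_NMU128_X10Y10   :  ai_engine_0/M00_AXI
11 NOC_NMU128_X12Y10   :  ai_engine_0/M01_AXI
12 NOC_NMU128_X8Y10    :  ai_engine_0/M05_AXI
13 NOC_NMU128_X9Y10    :  ai_engine_0/M04_AXI
14 NOC_NMU128_X11Y10   :  ai_engine_0/M02_AXI
15 DDRMC_X0Y0          :  DDRMC/DDRMC_X0Y0
16 DDRMC_X3Y0          :  DDRMC/DDRMC_X3Y0
17 DDRMC_X1Y0          :  DDRMC/DDRMC_X1Y0
"""

# One node per line: "<id> <node name> : <port path>".
NODE_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s*:\s*(\S+)\s*$")


def parse_nodes(text):
    """Return (id, node_name, port_path) tuples for every node line in text."""
    nodes = []
    for line in text.splitlines():
        match = NODE_LINE.match(line)
        if match:
            nodes.append((int(match.group(1)), match.group(2), match.group(3)))
    return nodes


def ids_for_prefix(nodes, prefix):
    """Return the IDs of nodes whose port path starts with prefix."""
    return [node_id for node_id, _, port in nodes if port.startswith(prefix)]


nodes = parse_nodes(VPERF_NODE_LIST)
ai_engine_ids = ids_for_prefix(nodes, "ai_engine_0/")
print(",".join(str(node_id) for node_id in ai_engine_ids))  # 9,10,11,12,13,14
```

    The resulting IDs are the ones you would split into smaller groups before entering them at the filter prompt.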

    The following table lists the control commands that are available to configure, start, and end profiling of the nodes of interest.

    Table 1. vperf Control Commands
    Command                  Description
    s/start_profile          Starts profiling.
    e/end_profile            Ends profiling.
    q/quit                   Ends profiling and quits the vperf command.
    n/set_npi_sample_period  Sets the sample period, which is the period over which values are sampled. Issuing this command discovers the available debug cores and displays the supported sample periods. You choose a sample period by its ID.
                             ID  Sample Period
                             0   56 ms
                             1   112 ms
                             2   224 ms
                             3   447 ms
                             4   895 ms
                             5   1790 ms
                             6   3579 ms
                             7   7158 ms
    t/set_tslide             Represents 2^TSLIDE clock cycles as one count. This is useful when overflow occurs while profiling the NoC.
    f/set_filter             Selects the nodes to profile. You can specify a comma-separated list of nodes, a range, or a combination of both.
                             Filter String  Nodes Selected
                             1,3,4,5,6,8    1,3,4,5,6,8
                             1-5,8          1,2,3,4,5,8
                             2,4,9-10       2,4,9,10
                             It is recommended that you profile no more than four nodes at a time because sampling over JTAG is slow. Profiling more than four nodes can result in missed samples and, consequently, inaccurate results.
    c/clear_filter           Clears all the filters and profiles all nodes (default).
    p/print                  Prints the current configuration for easy reference before starting the profile.
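    The filter syntax in the table above (comma-separated IDs with optional ranges) can be illustrated with a short Python sketch. This is not vperf's own parser; expand_filter is a hypothetical helper that only reproduces the three rows of the filter table.

```python
def expand_filter(filter_string):
    """Expand a filter string such as "1-5,8" into a list of node IDs.

    Tokens are separated by commas; a token of the form "a-b" selects
    every ID from a to b inclusive. Duplicates are dropped and the order
    of first appearance is kept.
    """
    selected = []
    for token in filter_string.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            start, end = (int(part) for part in token.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {token}")
            ids = range(start, end + 1)
        else:
            ids = [int(token)]
        for node_id in ids:
            if node_id not in selected:
                selected.append(node_id)
    return selected


print(expand_filter("1-5,8"))     # [1, 2, 3, 4, 5, 8]
print(expand_filter("2,4,9-10"))  # [2, 4, 9, 10]
```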

    Below is an example of setting the npi_sample_period and Tslide options.

    > n
    Discovering debug cores...Done!
    
    Enumerating NoC (warning this can take 60-80s on xcvc1902)...Done
    Discovered Nodes: 
    {'enabled': ['DDRMC_X0Y0', 'DDRMC_X1Y0', 'DDRMC_X3Y0', 'NOC_NMU128_X0Y3', 'NOC_NMU128_X0Y2', 'NOC_NMU128_X0Y4', 'NOC_NMU128_X0Y5', 'NOC_NMU128_X0Y6', 'NOC_NMU128_X0Y7', 'NOC_NMU128_X0Y8', 'NOC_NMU128_X0Y9', 'NOC_NMU128_X11Y10', 'NOC_NMU128_X12Y10', 'NOC_NMU128_X9Y10', 'NOC_NMU128_X10Y10', 'NOC_NMU128_X7Y10', 'NOC_NMU128_X8Y10', 'NOC_NSU128_X6Y6'], 'disabled': [], 'invalid': []}
    Select ID from the list:
    ID  Sample Period 
    0  :  56ms
    1  :  112ms
    2  :  224ms
    3  :  447ms
    4  :  895ms
    5  :  1790ms
    6  :  3579ms
    7  :  7158ms
    
    >> 0
    [INFO] Sample period changed to 56ms
    
    > t
    Select Tslide between 0 and 27:
    >> 1
    [INFO] Tslide changed to: 1.
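    The arithmetic behind these two settings can be sketched in Python. The sample-period values and the 0 to 27 Tslide range come from the tables and prompts above; the sketch reads the Tslide description as 2^TSLIDE clock cycles per count. The function names are hypothetical and nothing here calls vperf.

```python
# Sample periods in milliseconds, keyed by the ID shown at the vperf prompt.
SAMPLE_PERIODS_MS = {0: 56, 1: 112, 2: 224, 3: 447, 4: 895, 5: 1790, 6: 3579, 7: 7158}


def smallest_period_id(min_period_ms):
    """Return the lowest ID whose sample period is at least min_period_ms."""
    for period_id in sorted(SAMPLE_PERIODS_MS):
        if SAMPLE_PERIODS_MS[period_id] >= min_period_ms:
            return period_id
    raise ValueError(f"no sample period is as long as {min_period_ms} ms")


def cycles_per_count(tslide):
    """Clock cycles represented by one counter increment (2**tslide)."""
    if not 0 <= tslide <= 27:
        raise ValueError("Tslide must be between 0 and 27")
    return 1 << tslide


def counts_to_cycles(count, tslide):
    """Convert a raw counter value back to clock cycles."""
    return count * cycles_per_count(tslide)


print(smallest_period_id(500))   # 4 (895 ms)
print(cycles_per_count(1))       # 2
print(counts_to_cycles(100, 3))  # 800
```

    A larger Tslide lets a counter cover more cycles before overflowing, at the cost of coarser resolution per count.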
    
  7. Once all configurations are set, you can start the profiling using s/start_profile.
  8. Navigate to the hardware Linux console and run the application. Once the run is complete, you can end the profiling using e/end_profile.
  9. When you quit the vperf utility using the q/quit command, a new vperf.run_summary file is generated, which can be opened in the Vitis IDE.

    In the example below, the AI Engine application includes three GMIO input ports and six GMIO output ports. When you use the f command in the vperf utility to filter the nodes of interest for profiling, it displays all the available nodes as shown below.

    > f
    List of detected nodes in the design:
    1  NOC_NMU128_X0Y6     :  CIPS_0/FPD_CCI_NOC_0
    2  NOC_NMU128_X0Y7     :  CIPS_0/FPD_CCI_NOC_1
    3  NOC_NMU128_X0Y8     :  CIPS_0/FPD_CCI_NOC_2
    4  NOC_NMU128_X0Y9     :  CIPS_0/FPD_CCI_NOC_3
    5  NOC_NMU128_X0Y4     :  CIPS_0/FPD_AXI_NOC_0
    6  NOC_NMU128_X0Y5     :  CIPS_0/FPD_AXI_NOC_1
    7  NOC_NMU128_X0Y3     :  CIPS_0/LPD_AXI_NOC_0
    8  NOC_NMU128_X0Y2     :  CIPS_0/PMC_NOC_AXI_0
    9  NOC_NMU128_X7Y10    :  ai_engine_0/M03_AXI
    10 NOC_NMU128_X10Y10   :  ai_engine_0/M00_AXI
    11 NOC_NMU128_X12Y10   :  ai_engine_0/M01_AXI
    12 NOC_NMU128_X8Y10    :  ai_engine_0/M05_AXI
    13 NOC_NMU128_X9Y10    :  ai_engine_0/M04_AXI
    14 NOC_NMU128_X11Y10   :  ai_engine_0/M02_AXI
    15 DDRMC_X0Y0          :  DDRMC/DDRMC_X0Y0
    16 DDRMC_X3Y0          :  DDRMC/DDRMC_X3Y0
    17 DDRMC_X1Y0          :  DDRMC/DDRMC_X1Y0
    

    Note that the design has a total of nine I/O ports, but only six NMU nodes (IDs 9-14) correspond to the AI Engine. This is because each AI Engine to NMU connection through the interface tile supports two MM2S and two S2MM channels, so several ports can share one NMU.

    Reads and writes are likely to go through the same interface tile on different channels. In that case, they use the same NMU node for both read and write transfers, and that NMU node appears in both the read and write NoC counter groups in the Vitis IDE, as shown in the following figure.

    Information on GMIO port (both input and output) interface tile and channel mapping can be found in the graph compile summary report in the Vitis IDE.

    Figure 2. Interface Channels
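    The sharing described above can be made concrete with a small Python sketch. The port-to-NMU assignment below is hypothetical: the port names in0..in2 and out0..out5 and their placement are assumptions chosen only to match the three-input, six-output example, and the real mapping must be read from the graph compile summary report.

```python
from collections import defaultdict

# HYPOTHETICAL mapping of GMIO ports to NMU ports, for illustration only.
# Inputs read from DDR (MM2S channels); outputs write to DDR (S2MM channels).
PORT_TO_NMU = {
    "in0": ("read", "ai_engine_0/M00_AXI"),
    "in1": ("read", "ai_engine_0/M01_AXI"),
    "in2": ("read", "ai_engine_0/M03_AXI"),
    "out0": ("write", "ai_engine_0/M00_AXI"),
    "out1": ("write", "ai_engine_0/M01_AXI"),
    "out2": ("write", "ai_engine_0/M02_AXI"),
    "out3": ("write", "ai_engine_0/M03_AXI"),
    "out4": ("write", "ai_engine_0/M04_AXI"),
    "out5": ("write", "ai_engine_0/M05_AXI"),
}


def group_by_direction(port_map):
    """Collect the set of NMUs that carry read traffic and write traffic."""
    groups = defaultdict(set)
    for direction, nmu in port_map.values():
        groups[direction].add(nmu)
    return groups


groups = group_by_direction(PORT_TO_NMU)
both = sorted(groups["read"] & groups["write"])
write_only = sorted(groups["write"] - groups["read"])

print(len(PORT_TO_NMU), "ports share", len(groups["read"] | groups["write"]), "NMUs")
print("in both counter groups:", both)
print("write group only:", write_only)
```

    Under this assumed mapping, nine ports use six NMUs, three of which would appear in both the read and write counter groups.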

    In the following example, the configuration is set to profile three NoC NMUs, 9-11 (considering the JTAG bandwidth limitations).

    Enter the filter to select Nodes from the above list:
    >> 9,10,11
    [INFO] Selected nodes according to the filter :
    1  NOC_NMU128_X7Y10    :  ai_engine_0/M03_AXI
    2  NOC_NMU128_X10Y10   :  ai_engine_0/M00_AXI
    3  NOC_NMU128_X12Y10   :  ai_engine_0/M01_AXI
    

    The NoC profile output in Vitis IDE is shown below.

    Figure 3. NoC Counters Read
    1. NoC counters read group, which represents the traffic going through NMUs from DDR to AI Engine.
    2. NMU nodes corresponding to the GMIO input read.

      The NoC profile output in the Vitis IDE contains NMU write groups and the corresponding nodes.

      Figure 4. NoC Counters Write
    3. NoC Counter write group, which represents the traffic going through NMUs from AI Engine to DDR.
    4. NMU nodes corresponding to the GMIO output write.

      The configuration is set to profile NoC NMUs, 12-14 (considering the JTAG bandwidth limitations).

      Enter the filter to select Nodes from the above list:
      >> 12,13,14
      [INFO] Selected nodes according to the filter :
      1  NOC_NMU128_X8Y10    :  ai_engine_0/M05_AXI
      2  NOC_NMU128_X9Y10    :  ai_engine_0/M04_AXI
      3  NOC_NMU128_X11Y10   :  ai_engine_0/M02_AXI

To get profiling for other NMU nodes:

  1. Reconfigure the npi_sample_period and TSLIDE values, and filter the required nodes.
  2. Start the profiling.
  3. Re-run the application on hardware.
  4. End the profiling and quit to generate the new vperf.run_summary file.
Figure 5. Only NoC Counter Write
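
Because each profiling pass covers only a few nodes, it can help to plan the filter strings for all passes up front. The Python sketch below is one way to do that; to_filter_string and plan_runs are hypothetical helpers, not vperf features, and the default batch size of four reflects the recommendation in Table 1.

```python
def to_filter_string(ids):
    """Render node IDs as a filter string, collapsing consecutive runs into ranges."""
    ids = sorted(set(ids))
    parts = []
    i = 0
    while i < len(ids):
        j = i
        # Extend j while the next ID continues the consecutive run.
        while j + 1 < len(ids) and ids[j + 1] == ids[j] + 1:
            j += 1
        parts.append(str(ids[i]) if i == j else f"{ids[i]}-{ids[j]}")
        i = j + 1
    return ",".join(parts)


def plan_runs(node_ids, batch_size=4):
    """Split node IDs into batches and return one filter string per profiling run."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(set(node_ids))
    return [
        to_filter_string(ordered[i:i + batch_size])
        for i in range(0, len(ordered), batch_size)
    ]


# Two passes of three nodes each, matching the 9-11 and 12-14 example above.
for run, filter_string in enumerate(plan_runs(range(9, 15), batch_size=3), start=1):
    print(f"run {run}: f >> {filter_string}")
```

Each planned string corresponds to one pass through the four steps above: set the filter, start profiling, re-run the application, then end profiling and quit.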