- Set up the Vitis tool environment by sourcing the settings64.sh script (use settings64.csh instead if you work in a C shell).
source <Vitis Installation Path>/Vitis/<202x.x>/settings64.sh
- Launch the Chipscopy server using the command below.
cs_server &
Figure 1. Chipscopy Server
- Run the hardware server (hw_server) on the computer that has the JTAG connection to the VCK190 board. Note the computer name or IP address of the machine running cs_server, as highlighted above; you need it for the vperf options.
- Boot up the board using the sd_card image.
- In the local Linux terminal, navigate to the directory that contains the v++ build output.
- Configure the options of the vperf command as shown below.
vperf --link_summary <Design>.xsa.link_summary --hw_url <TCP:ComputerName>/<IP>:3121 --cs_url <TCP:ComputerName>/<IP>:3042 --verbose
You must provide the --link_summary option, set to the link summary file generated by the v++ --link step. Set the --hw_url option to the hardware server location and the --cs_url option to the Chipscopy server location. The following output is generated when the command is executed.
[INFO] Successfully connected to hw_server and cs_server
[INFO] Debug IP layout found at : <Vitis Build Path>/_x/link/int/debug_ip_layout.rtd
[INFO] HW Device : xilinx_vck190_base_<>
1 NOC_NMU128_X0Y6 : CIPS_0/FPD_CCI_NOC_0
2 NOC_NMU128_X0Y7 : CIPS_0/FPD_CCI_NOC_1
3 NOC_NMU128_X0Y8 : CIPS_0/FPD_CCI_NOC_2
4 NOC_NMU128_X0Y9 : CIPS_0/FPD_CCI_NOC_3
5 NOC_NMU128_X0Y4 : CIPS_0/FPD_AXI_NOC_0
6 NOC_NMU128_X0Y5 : CIPS_0/FPD_AXI_NOC_1
7 NOC_NMU128_X0Y3 : CIPS_0/LPD_AXI_NOC_0
8 NOC_NMU128_X0Y2 : CIPS_0/PMC_NOC_AXI_0
9 NOC_NMU128_X7Y10 : ai_engine_0/M03_AXI
10 NOC_NMU128_X10Y10 : ai_engine_0/M00_AXI
11 NOC_NMU128_X12Y10 : ai_engine_0/M01_AXI
12 NOC_NMU128_X8Y10 : ai_engine_0/M05_AXI
13 NOC_NMU128_X9Y10 : ai_engine_0/M04_AXI
14 NOC_NMU128_X11Y10 : ai_engine_0/M02_AXI
15 DDRMC_X0Y0 : DDRMC/DDRMC_X0Y0
16 DDRMC_X3Y0 : DDRMC/DDRMC_X3Y0
17 DDRMC_X1Y0 : DDRMC/DDRMC_X1Y0
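The --hw_url and --cs_url values pair the host name or IP address of each server machine with its port (3121 for the hardware server and 3042 for the Chipscopy server in the command above). If you script your profiling sessions, the following Python sketch assembles the same argument list. It assumes the conventional TCP:<host>:<port> URL form; the helper name build_vperf_args, the design name, and the IP address are hypothetical, so check the URL syntax against your vperf version.

```python
import shlex

def build_vperf_args(link_summary: str, hw_host: str, cs_host: str,
                     hw_port: int = 3121, cs_port: int = 3042,
                     verbose: bool = True) -> list[str]:
    """Assemble a vperf command line (hypothetical helper, not part of Vitis).

    hw_host is the machine running hw_server (JTAG-connected to the board);
    cs_host is the machine running cs_server. Both may be the same host.
    """
    args = [
        "vperf",
        "--link_summary", link_summary,
        "--hw_url", f"TCP:{hw_host}:{hw_port}",
        "--cs_url", f"TCP:{cs_host}:{cs_port}",
    ]
    if verbose:
        args.append("--verbose")
    return args

if __name__ == "__main__":
    # Hypothetical design name and IP address.
    args = build_vperf_args("my_design.xsa.link_summary", "192.168.0.10", "192.168.0.10")
    print(shlex.join(args))
    # To launch it from Python instead: subprocess.run(args, check=True)
```

Keeping the arguments as a list rather than a single string avoids shell-quoting problems if a path contains spaces.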
The vperf command reads the metadata from the link summary and populates all the nodes corresponding to the NoC NMUs used by the design. For example, the output above lists the NoC nodes that correspond to the Control, Interfaces and Processing System (CIPS), the AI Engine GMIO interface ports, and the DDR memory controller channel ports. You can use the vperf profiling control commands to filter the nodes of interest and start profiling. The following table lists the control commands available to start and end profiling of the nodes of interest.
Table 1. vperf Control Commands
- s/start_profile: Starts profiling.
- e/end_profile: Ends profiling.
- q/quit: Ends profiling and quits the vperf command.
- n/set_npi_sample_period: Sets the sample period, which is the period over which values are sampled. Issuing this command discovers the available debug cores and displays the supported sample periods. Choose a sample period by its ID:
  ID 0: 56 ms
  ID 1: 112 ms
  ID 2: 224 ms
  ID 3: 447 ms
  ID 4: 895 ms
  ID 5: 1790 ms
  ID 6: 3579 ms
  ID 7: 7158 ms
- t/set_tslide: Counts 2^TSLIDE clock cycles as one count. This is useful when counters overflow while profiling the NoC.
- f/set_filter: Selects the nodes to profile. Specify the node IDs as a comma-separated list, as a range, or as a combination of both:
  Filter string 1,3,4,5,6,8 selects nodes 1,3,4,5,6,8
  Filter string 1-5,8 selects nodes 1,2,3,4,5,8
  Filter string 2,4,9-10 selects nodes 2,4,9,10
  Profiling up to four nodes at a time is recommended because sampling over JTAG is slow. Profiling more than four nodes can result in missed samples and, consequently, inaccurate results.
- c/clear_filter: Clears all filters so that all nodes are profiled (default).
- p/print: Prints the configuration for reference before you start profiling.
The following example sets the npi_sample_period and Tslide options.
> n
Discovering debug cores...Done!
Enumerating NoC (warning this can take 60-80s on xcvc1902)...Done
Discovered Nodes: {'enabled': ['DDRMC_X0Y0', 'DDRMC_X1Y0', 'DDRMC_X3Y0', 'NOC_NMU128_X0Y3', 'NOC_NMU128_X0Y2', 'NOC_NMU128_X0Y4', 'NOC_NMU128_X0Y5', 'NOC_NMU128_X0Y6', 'NOC_NMU128_X0Y7', 'NOC_NMU128_X0Y8', 'NOC_NMU128_X0Y9', 'NOC_NMU128_X11Y10', 'NOC_NMU128_X12Y10', 'NOC_NMU128_X9Y10', 'NOC_NMU128_X10Y10', 'NOC_NMU128_X7Y10', 'NOC_NMU128_X8Y10', 'NOC_NSU128_X6Y6'], 'disabled': [], 'invalid': []}
Select ID from the list:
ID Sample Period
0 : 56ms
1 : 112ms
2 : 224ms
3 : 447ms
4 : 895ms
5 : 1790ms
6 : 3579ms
7 : 7158ms
>> 0
[INFO] Sample period changed to 56ms
> t
Select Tslide between 0 and 27:
>> 1
[INFO] Tslide changed to: 1.
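Because one count represents 2^TSLIDE clock cycles, a raw count has to be multiplied by that factor to recover clock cycles, and each sample period ID roughly doubles the previous one. The Python sketch below encodes the sample period IDs from Table 1 and the Tslide range from the prompt above, and shows the scaling arithmetic. The function names are hypothetical, and whether vperf or the Vitis IDE already applies this scaling in its reports is an assumption not covered here, so treat it only as an illustration of the arithmetic.

```python
# Sample period IDs accepted by n/set_npi_sample_period (values from Table 1).
SAMPLE_PERIOD_MS = {0: 56, 1: 112, 2: 224, 3: 447, 4: 895, 5: 1790, 6: 3579, 7: 7158}

TSLIDE_MIN, TSLIDE_MAX = 0, 27  # range offered by the t/set_tslide prompt

def counts_to_cycles(count: int, tslide: int) -> int:
    """Convert a raw counter value to clock cycles, taking one count = 2**tslide cycles."""
    if not TSLIDE_MIN <= tslide <= TSLIDE_MAX:
        raise ValueError(f"tslide must be between {TSLIDE_MIN} and {TSLIDE_MAX}")
    return count << tslide  # same as count * 2**tslide

def sample_period_id(target_ms: float) -> int:
    """Pick the ID of the shortest sample period that is at least target_ms long."""
    for sid, period in sorted(SAMPLE_PERIOD_MS.items()):
        if period >= target_ms:
            return sid
    return max(SAMPLE_PERIOD_MS)  # fall back to the longest period

print(counts_to_cycles(1000, 1))  # 2000
print(sample_period_id(300))      # 3, i.e. 447 ms
```

A larger Tslide lowers the raw count for the same traffic, which is why it helps against overflow, at the cost of coarser resolution.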
- Once all configurations are set, start profiling using s/start_profile.
- Navigate to the hardware Linux console and run the application. When the run is complete, end profiling using e/end_profile.
- When you quit the vperf utility using the q/quit command, a new vperf.run_summary file is generated, which can be opened in the Vitis IDE.
In the following example, the AI Engine application includes three GMIO input ports and six GMIO output ports. When you use the f command in the vperf utility to filter the nodes of interest for profiling, it displays all the available nodes as shown below.
> f
List of detected nodes in the design:
1 NOC_NMU128_X0Y6 : CIPS_0/FPD_CCI_NOC_0
2 NOC_NMU128_X0Y7 : CIPS_0/FPD_CCI_NOC_1
3 NOC_NMU128_X0Y8 : CIPS_0/FPD_CCI_NOC_2
4 NOC_NMU128_X0Y9 : CIPS_0/FPD_CCI_NOC_3
5 NOC_NMU128_X0Y4 : CIPS_0/FPD_AXI_NOC_0
6 NOC_NMU128_X0Y5 : CIPS_0/FPD_AXI_NOC_1
7 NOC_NMU128_X0Y3 : CIPS_0/LPD_AXI_NOC_0
8 NOC_NMU128_X0Y2 : CIPS_0/PMC_NOC_AXI_0
9 NOC_NMU128_X7Y10 : ai_engine_0/M03_AXI
10 NOC_NMU128_X10Y10 : ai_engine_0/M00_AXI
11 NOC_NMU128_X12Y10 : ai_engine_0/M01_AXI
12 NOC_NMU128_X8Y10 : ai_engine_0/M05_AXI
13 NOC_NMU128_X9Y10 : ai_engine_0/M04_AXI
14 NOC_NMU128_X11Y10 : ai_engine_0/M02_AXI
15 DDRMC_X0Y0 : DDRMC/DDRMC_X0Y0
16 DDRMC_X3Y0 : DDRMC/DDRMC_X3Y0
17 DDRMC_X1Y0 : DDRMC/DDRMC_X1Y0
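Each entry in the node listing has the form ID NODE : HIERARCHY. If you want to pick node IDs out of a saved copy of this listing programmatically, for example all NMUs attached to ai_engine_0, a small parser is enough. This Python sketch assumes the one-entry-per-line layout shown above; parse_node_list and ids_under are hypothetical helpers, not part of vperf.

```python
import re

# One entry per line: "<id> <node name> : <hierarchical path>"
ENTRY = re.compile(r"^\s*(\d+)\s+(\S+)\s*:\s*(\S+)\s*$")

def parse_node_list(text: str) -> dict[int, tuple[str, str]]:
    """Map node ID -> (node name, hierarchical path); other lines are ignored."""
    nodes = {}
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            nodes[int(m.group(1))] = (m.group(2), m.group(3))
    return nodes

def ids_under(nodes: dict[int, tuple[str, str]], prefix: str) -> list[int]:
    """Sorted IDs whose hierarchical path starts with prefix, e.g. 'ai_engine_0/'."""
    return sorted(i for i, (_, path) in nodes.items() if path.startswith(prefix))

listing = """\
List of detected nodes in the design:
8 NOC_NMU128_X0Y2 : CIPS_0/PMC_NOC_AXI_0
9 NOC_NMU128_X7Y10 : ai_engine_0/M03_AXI
10 NOC_NMU128_X10Y10 : ai_engine_0/M00_AXI
15 DDRMC_X0Y0 : DDRMC/DDRMC_X0Y0
"""
nodes = parse_node_list(listing)
print(ids_under(nodes, "ai_engine_0/"))  # [9, 10]
```

Header and [INFO] lines do not start with a numeric ID, so the regular expression skips them.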
Note that the design has a total of nine I/O ports, but only six NMU nodes (IDs 9-14) correspond to the AI Engine. This is because each AI Engine to NMU connection through an interface tile supports two MM2S and two S2MM channels, so several GMIO ports can share one NMU.
Reads and writes often go through the same interface tile on different channels. In that case they use the same NMU node for both read and write transfers, and that node appears in both the read and the write NoC counter groups in the Vitis IDE, as shown in the following figure.
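To see why one NMU can carry both GMIO input and output traffic, group the ports by the interface tile they are mapped to. The Python sketch below assumes one NMU per interface tile connection, treats GMIO inputs (DDR to AI Engine) as MM2S and outputs (AI Engine to DDR) as S2MM, and uses a made-up port-to-tile mapping; the port names and tile columns are hypothetical, and real values come from the graph compile summary report.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class GmioPort:
    name: str       # GMIO port name in the graph (hypothetical)
    direction: str  # "in" (DDR -> AI Engine, MM2S) or "out" (AI Engine -> DDR, S2MM)
    tile_col: int   # interface tile column the port is mapped to

MAX_CHANNELS_PER_DIRECTION = 2  # two MM2S and two S2MM channels per interface tile

def group_by_tile(ports):
    """Return {tile_col: {"in": [...], "out": [...]}} and check the channel limit."""
    tiles = defaultdict(lambda: {"in": [], "out": []})
    for p in ports:
        tiles[p.tile_col][p.direction].append(p.name)
    for col, dirs in tiles.items():
        for d, names in dirs.items():
            if len(names) > MAX_CHANNELS_PER_DIRECTION:
                raise ValueError(f"tile {col}: {len(names)} '{d}' ports exceed the channel limit")
    return dict(tiles)

def shared_tiles(tiles):
    """Tiles (and so NMUs) that carry both read (in) and write (out) traffic."""
    return sorted(col for col, d in tiles.items() if d["in"] and d["out"])

# Hypothetical mapping: three inputs and six outputs, like the example design.
ports = [
    GmioPort("in0", "in", 10), GmioPort("out0", "out", 10),
    GmioPort("in1", "in", 11), GmioPort("out1", "out", 11),
    GmioPort("in2", "in", 12), GmioPort("out2", "out", 12),
    GmioPort("out3", "out", 13), GmioPort("out4", "out", 14),
    GmioPort("out5", "out", 15),
]
tiles = group_by_tile(ports)
print(len(tiles), shared_tiles(tiles))  # 6 [10, 11, 12]
```

With this mapping, nine ports land on six tiles, and the three tiles listed would show up in both the read and the write counter groups.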
Information on the interface tile and channel mapping of each GMIO port (input and output) can be found in the graph compile summary report in the Vitis IDE.
Figure 2. Interface Channels
In the following example, the configuration is set to profile three NoC NMUs, 9-11, to stay within the JTAG bandwidth limitations.
Enter the filter to select Nodes from the above list:
>> 9,10,11
[INFO] Selected nodes according to the filter :
1 NOC_NMU128_X7Y10 : ai_engine_0/M03_AXI
2 NOC_NMU128_X10Y10 : ai_engine_0/M00_AXI
3 NOC_NMU128_X12Y10 : ai_engine_0/M01_AXI
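The filter strings follow the rules in Table 1: comma-separated IDs, ranges written as start-end, or both. The Python sketch below expands a filter string the same way the Table 1 examples do and warns when more than the recommended four nodes are selected. It is an illustration of the documented format, not vperf's own parser; in particular, sorting and de-duplicating the IDs is an assumption.

```python
MAX_RECOMMENDED_NODES = 4  # Table 1 recommends profiling up to four nodes at a time

def parse_filter(spec: str) -> list[int]:
    """Expand a filter string such as '1-5,8' into a sorted list of node IDs."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad range: {part!r}")
            ids.update(range(lo, hi + 1))
        else:
            ids.add(int(part))
    return sorted(ids)

def check_filter(spec: str) -> list[int]:
    """Parse spec and warn if it selects more nodes than JTAG sampling handles well."""
    ids = parse_filter(spec)
    if len(ids) > MAX_RECOMMENDED_NODES:
        print(f"warning: {len(ids)} nodes selected; samples may be missed over JTAG")
    return ids

print(check_filter("9,10,11"))  # [9, 10, 11]
print(parse_filter("1-5,8"))    # [1, 2, 3, 4, 5, 8]
```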
The NoC profile output in the Vitis IDE is shown below.
Figure 3. NoC Counters Read
- NoC counters read group, which represents the traffic going through the NMUs from DDR to the AI Engine.
- NMU nodes corresponding to the GMIO input reads.
The NoC profile output in the Vitis IDE contains NMU write groups and the corresponding nodes.
Figure 4. NoC Counters Write
- NoC counters write group, which represents the traffic going through the NMUs from the AI Engine to DDR.
- NMU nodes corresponding to the GMIO output writes.
Next, the configuration is set to profile NoC NMUs 12-14, again limited to three nodes because of the JTAG bandwidth limitations.
Enter the filter to select Nodes from the above list:
>> 12,13,14
[INFO] Selected nodes according to the filter :
1 NOC_NMU128_X8Y10 : ai_engine_0/M05_AXI
2 NOC_NMU128_X9Y10 : ai_engine_0/M04_AXI
3 NOC_NMU128_X11Y10 : ai_engine_0/M02_AXI
To profile the other NMU nodes:
- Reconfigure the npi_sample_period and TSLIDE values and filter the required nodes.
- Start the profiling.
- Re-run the application on hardware.
- End the profiling and quit to generate the new vperf.run_summary file.
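Because each run should profile at most four nodes, covering all six AI Engine NMUs takes more than one run, as the 9-11 and 12-14 examples show. The Python sketch below splits a list of node IDs into batches and turns each batch into a filter string in the format Table 1 accepts. plan_runs and to_filter_string are hypothetical helpers; you still enter each string at the f prompt and repeat the start, run, end, and quit steps for every batch.

```python
def to_filter_string(ids) -> str:
    """Compress node IDs into vperf-style ranges, e.g. [2, 4, 9, 10] -> '2,4,9-10'."""
    ids = sorted(ids)
    if not ids:
        raise ValueError("no node IDs given")
    parts = []
    start = prev = ids[0]
    for i in ids[1:]:
        if i == prev + 1:
            prev = i  # still inside a consecutive run
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = i
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)

def plan_runs(ids, batch_size: int = 4) -> list[str]:
    """Split node IDs into batches of at most batch_size, one filter string per run."""
    ids = sorted(ids)
    return [to_filter_string(ids[k:k + batch_size]) for k in range(0, len(ids), batch_size)]

print(plan_runs(range(9, 15), batch_size=3))  # ['9-11', '12-14']
print(plan_runs(range(9, 15)))                # ['9-12', '13-14']
```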