- Set up the Vitis tool environment by sourcing the settings64.sh script (or settings64.csh for csh-family shells):
<Vitis Installation Path>/Vitis/<202x.x>/settings64.sh
- Launch the ChipScoPy server using the following command:
cs_server &
Figure 1. ChipScoPy Server
- Run the hardware server from the computer that connects to the target board. To do so, launch the hardware server from the computer that has the JTAG connection to the vck190 board. Note the computer name or IP address, as highlighted above for cs_server.
- Boot up the board using the sd_card image.
- In the local Linux terminal, navigate to the directory containing the v++ build output.
- Configure the options to the vperf command as follows:
vperf --link_summary <Design>.xsa.link_summary --hw_url <TCP:ComputerName>/<IP>:3121 --cs_url <TCP:ComputerName>/<IP>:3042 --verbose
You must provide the --link_summary option. Set this to the link_summary output generated from the v++ --link step. Set the --hw_url option to the hardware server location, and the --cs_url option to the ChipScoPy server location.
Executing the command outputs the following.
[INFO] Successfully connected to hw_server and cs_server
[INFO] Debug IP layout found at : <Vitis Build Path>/_x/link/int/debug_ip_layout.rtd
[INFO] HW Device : xilinx_vck190_base_<>
 1 NOC_NMU128_X0Y6   : CIPS_0/FPD_CCI_NOC_0
 2 NOC_NMU128_X0Y7   : CIPS_0/FPD_CCI_NOC_1
 3 NOC_NMU128_X0Y8   : CIPS_0/FPD_CCI_NOC_2
 4 NOC_NMU128_X0Y9   : CIPS_0/FPD_CCI_NOC_3
 5 NOC_NMU128_X0Y4   : CIPS_0/FPD_AXI_NOC_0
 6 NOC_NMU128_X0Y5   : CIPS_0/FPD_AXI_NOC_1
 7 NOC_NMU128_X0Y3   : CIPS_0/LPD_AXI_NOC_0
 8 NOC_NMU128_X0Y2   : CIPS_0/PMC_NOC_AXI_0
 9 NOC_NMU128_X7Y10  : ai_engine_0/M03_AXI
10 NOC_NMU128_X10Y10 : ai_engine_0/M00_AXI
11 NOC_NMU128_X12Y10 : ai_engine_0/M01_AXI
12 NOC_NMU128_X8Y10  : ai_engine_0/M05_AXI
13 NOC_NMU128_X9Y10  : ai_engine_0/M04_AXI
14 NOC_NMU128_X11Y10 : ai_engine_0/M02_AXI
15 DDRMC_X0Y0        : DDRMC/DDRMC_X0Y0
16 DDRMC_X3Y0        : DDRMC/DDRMC_X3Y0
17 DDRMC_X1Y0        : DDRMC/DDRMC_X1Y0
The vperf command reads the metadata from the link summary and populates all the nodes corresponding to the NoC NMUs used by the design. For example, the output above lists the NoC NMUs that correspond to the following:
- The Control, Interrupt, and Processing subsystem
- AI Engine GMIO interface ports
- DDR memory controller channel ports
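As a quick illustration (not part of vperf), the grouping above can be recovered mechanically from the instance paths in the listing. The following Python sketch classifies a few nodes copied from the output above; the category strings mirror the three bullets:

```python
# Illustrative only: classify entries from the vperf node listing by the
# instance path each NMU maps to. Node names are copied from the listing.
def classify_node(path: str) -> str:
    """Map a vperf instance path to a coarse traffic-source category."""
    if path.startswith("CIPS_0/"):
        return "control/interrupt/processing subsystem"
    if path.startswith("ai_engine_0/"):
        return "AI Engine GMIO interface port"
    if path.startswith("DDRMC/"):
        return "DDR memory controller channel port"
    return "other"

nodes = {
    "NOC_NMU128_X0Y6": "CIPS_0/FPD_CCI_NOC_0",
    "NOC_NMU128_X7Y10": "ai_engine_0/M03_AXI",
    "DDRMC_X0Y0": "DDRMC/DDRMC_X0Y0",
}
for name, path in nodes.items():
    print(f"{name}: {classify_node(path)}")
```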
You can use the vperf profiling control commands to filter the nodes of interest and start profiling. Following is a list of the control commands available to start and end profiling on the nodes of interest.
Table 1. vperf Control Commands
s/start_profile : Starts profiling.
e/end_profile : Ends profiling.
q/quit : Ends profiling and quits the vperf command.
n/set_npi_sample_period : Sets the sample period, that is, the period over which counter values are sampled. Issuing this command discovers the available debug cores and displays the supported sample periods; choose a sample period by its ID.
    ID 0: 56 ms, ID 1: 112 ms, ID 2: 224 ms, ID 3: 447 ms, ID 4: 895 ms, ID 5: 1790 ms, ID 6: 3579 ms, ID 7: 7158 ms
t/set_tslide : Represents 2^TSLIDE clock cycles as one count. This is useful when overflow occurs while profiling the NoC.
f/set_filter : Chooses the nodes to profile. Specify a comma-separated list of node IDs, ranges, or both. For example, "1,3,4,5,6,8" selects exactly those nodes, "1-5,8" selects 1,2,3,4,5,8, and "2,4,9-10" selects 2,4,9,10. It is recommended that you profile up to four nodes at a time because of the slower sampling intervals over JTAG; profiling more than four nodes can result in missed samples and, consequently, inaccurate results.
c/clear_filter : Clears all filters and profiles all nodes (the default).
p/print : Prints the configuration for easy reference before starting the profile.
Following is an example of setting the
npi_sample_period and Tslide options.
> n
Discovering debug cores...Done!
Enumerating NoC (warning this can take 60-80s on xcvc1902)...Done
Discovered Nodes: {'enabled': ['DDRMC_X0Y0', 'DDRMC_X1Y0', 'DDRMC_X3Y0', 'NOC_NMU128_X0Y3', 'NOC_NMU128_X0Y2', 'NOC_NMU128_X0Y4', 'NOC_NMU128_X0Y5', 'NOC_NMU128_X0Y6', 'NOC_NMU128_X0Y7', 'NOC_NMU128_X0Y8', 'NOC_NMU128_X0Y9', 'NOC_NMU128_X11Y10', 'NOC_NMU128_X12Y10', 'NOC_NMU128_X9Y10', 'NOC_NMU128_X10Y10', 'NOC_NMU128_X7Y10', 'NOC_NMU128_X8Y10', 'NOC_NSU128_X6Y6'], 'disabled': [], 'invalid': []}
Select ID from the list:
ID  Sample Period
0 : 56ms
1 : 112ms
2 : 224ms
3 : 447ms
4 : 895ms
5 : 1790ms
6 : 3579ms
7 : 7158ms
>> 0
[INFO] Sample period changed to 56ms
> t
Select Tslide between 0 and 27:
>> 1
[INFO] Tslide changed to: 1.
- Start profiling with
s/start_profile after all configurations are set.
- Navigate to the hardware Linux console and run the application. When the run is complete, end the profiling using e/end_profile.
- When you quit the vperf utility using the q/quit command, a new vperf.run_summary file is generated. You can open this file in the Vitis IDE.
In the following example, the AI Engine application includes three GMIO input ports and six GMIO output ports. You can use the
f (set_filter) command in the vperf utility to filter the nodes of interest for profiling. This displays all the available nodes, as shown below.
> f
List of detected nodes in the design:
 1 NOC_NMU128_X0Y6   : CIPS_0/FPD_CCI_NOC_0
 2 NOC_NMU128_X0Y7   : CIPS_0/FPD_CCI_NOC_1
 3 NOC_NMU128_X0Y8   : CIPS_0/FPD_CCI_NOC_2
 4 NOC_NMU128_X0Y9   : CIPS_0/FPD_CCI_NOC_3
 5 NOC_NMU128_X0Y4   : CIPS_0/FPD_AXI_NOC_0
 6 NOC_NMU128_X0Y5   : CIPS_0/FPD_AXI_NOC_1
 7 NOC_NMU128_X0Y3   : CIPS_0/LPD_AXI_NOC_0
 8 NOC_NMU128_X0Y2   : CIPS_0/PMC_NOC_AXI_0
 9 NOC_NMU128_X7Y10  : ai_engine_0/M03_AXI
10 NOC_NMU128_X10Y10 : ai_engine_0/M00_AXI
11 NOC_NMU128_X12Y10 : ai_engine_0/M01_AXI
12 NOC_NMU128_X8Y10  : ai_engine_0/M05_AXI
13 NOC_NMU128_X9Y10  : ai_engine_0/M04_AXI
14 NOC_NMU128_X11Y10 : ai_engine_0/M02_AXI
15 DDRMC_X0Y0        : DDRMC/DDRMC_X0Y0
16 DDRMC_X3Y0        : DDRMC/DDRMC_X3Y0
17 DDRMC_X1Y0        : DDRMC/DDRMC_X1Y0
An important observation is that the design has a total of nine I/O ports, but only six NMU nodes (IDs 9-14) correspond to the AI Engine. This is because each AI Engine to NMU connection through the interface tile supports two MM2S and two S2MM channels.
A read and a write can therefore go through the same interface tile but on different channels. In that case, they use the same NMU node for read and write transfers, and the same NMU node appears in both NoC counter groups in the Vitis IDE. See the following figure for reference.
You can find the interface tile and channel mapping for each GMIO port (input and output) in the graph compile summary report in the Vitis IDE.
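The sharing described above can be made concrete with a small, hypothetical Python sketch. The port-to-tile assignments below are invented for illustration (the real mapping comes from the graph compile summary report); the point is only that nine GMIO ports can collapse onto six NMU nodes when an input and an output share an interface tile:

```python
# Hypothetical GMIO-port -> interface-tile mapping, for illustration only.
# Each interface tile drives one NMU; a tile supports two MM2S and two
# S2MM channels, so an input (read) and an output (write) can share a tile.
ports = {
    "gmio_in0":  "tile_A", "gmio_in1":  "tile_B", "gmio_in2":  "tile_C",
    "gmio_out0": "tile_A", "gmio_out1": "tile_B", "gmio_out2": "tile_C",
    "gmio_out3": "tile_D", "gmio_out4": "tile_E", "gmio_out5": "tile_F",
}

# One NMU per tile: nine ports map onto only six NMU nodes.
nmus_used = sorted(set(ports.values()))
print(f"{len(ports)} ports -> {len(nmus_used)} NMU nodes")  # 9 ports -> 6 NMU nodes
```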
Figure 2. Interface Channels
The following example sets the configuration to profile three NoC NMUs, IDs 9-11 (considering the JTAG bandwidth limitations).
Enter the filter to select Nodes from the above list:
>> 9,10,11
[INFO] Selected nodes according to the filter :
1 NOC_NMU128_X7Y10  : ai_engine_0/M03_AXI
2 NOC_NMU128_X10Y10 : ai_engine_0/M00_AXI
3 NOC_NMU128_X12Y10 : ai_engine_0/M01_AXI
The following figure shows the NoC profile output in the Vitis IDE.
Figure 3. NoC Counters Read
- NoC counters read group, which represents the traffic going through the NMUs from DDR to the AI Engine.
- NMU nodes corresponding to the GMIO input read.
The NoC profile output in the Vitis IDE also contains the NMU write group and its corresponding nodes.
Figure 4. NoC Counters Write
- NoC counters write group, which represents the traffic going through the NMUs from the AI Engine to DDR.
- NMU nodes corresponding to the GMIO output write.
The following configuration profiles NoC NMUs, IDs 12-14 (considering the JTAG bandwidth limitations).
Enter the filter to select Nodes from the above list:
>> 12,13,14
[INFO] Selected nodes according to the filter :
1 NOC_NMU128_X8Y10  : ai_engine_0/M05_AXI
2 NOC_NMU128_X9Y10  : ai_engine_0/M04_AXI
3 NOC_NMU128_X11Y10 : ai_engine_0/M02_AXI
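The filter strings entered above ("9,10,11" and "12,13,14") follow the comma/range syntax described for f/set_filter in Table 1. As a rough Python sketch of that selection logic (an approximation for clarity, not vperf's actual implementation):

```python
def parse_filter(filter_str: str) -> list[int]:
    """Expand a vperf-style node filter such as "1-5,8" into node IDs."""
    ids: list[int] = []
    for part in filter_str.split(","):
        if "-" in part:
            # A range like "9-10" selects every ID from low to high inclusive.
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids

print(parse_filter("1-5,8"))     # [1, 2, 3, 4, 5, 8]
print(parse_filter("2,4,9-10"))  # [2, 4, 9, 10]
```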
To profile other NMU nodes:
- Reconfigure the npi_sample_period and Tslide values and filter the required nodes.
- Start the profiling.
- Re-run the application on hardware.
- End the profiling and quit to generate the new vperf.run_summary file.
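When a non-zero Tslide is in effect (as in the earlier example), each reported count represents 2^TSLIDE clock cycles, so raw counts must be scaled back when interpreting results. A minimal Python sketch of that scaling:

```python
def counts_to_cycles(count: int, tslide: int) -> int:
    """Each counter increment represents 2**tslide clock cycles."""
    return count << tslide  # equivalent to count * 2**tslide

# With Tslide = 1, as set in the earlier example, a raw count of 1000
# corresponds to 2000 clock cycles.
print(counts_to_cycles(1000, 1))  # 2000
```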
In the preceding figure, only the NoC counters write group appears in the profile summary; there is no read group. This is because the three nodes correspond to GMIO outputs, and no GMIO input maps to those interface tiles.
You can also open the Vivado project generated by the v++ link step (_x/link/vivado/vpl/prj/prj.xpr). There you can view the NoC interconnect network and locate the NMUs connected to any design element with traffic flowing through the NoC.