Profiling Using PL Profile Monitors - 2023.2 English

Vitis Tutorials: AI Engine (XD100)


In this section, you will walk through the process of inserting PL profile monitors to identify the specific PL kernels that cause a potential drop in performance.

This is a three-step process:

  • Add the PL profile monitors in the V++ link command, and generate the SD card image.

  • Prepare the xrt.ini file, and run the design on hardware.

  • Observe the output in the AMD Vitis™ Analyzer, and analyze the performance.

  1. Open the Makefile from the cmd_src/ directory.

  2. Locate the VPP_LINK_FLAGS, and add --profile.data all:all:all as follows:

    VPP_LINK_FLAGS := -l -t $(TARGET) --platform $(BASE_PLATFORM) $(KERNEL_XO) $(GRAPH_O) --profile.data all:all:all --save-temps -g --config $(CONFIG_FILE) -o $(PFM).xsa

    The --profile.data <arg> option enables monitoring of data ports through monitor IPs that are added to the design. In this example, <arg> is set to all:all:all, which assigns data profiling to all kernels (s2mm and mm2s), all CUs (listed in the system.cfg file as s2mm_1, s2mm_2, and mm2s*), and all of their interfaces.

  3. Run make all TARGET=hw. This generates the hardware image sd_card.img inside the sw/ directory.

  4. Flash the sd_card.img file to the SD card. You can follow step 3 in the Running the Design on Hardware section.

  5. Create an xrt.ini file with the following content:

    [Debug]
    device_trace = fine

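    • The device_trace key in the [Debug] section enables profiling of the application at run time.

    • The data=all:all:all setting under the [profile] section header monitors data on all kernels and CUs. This is a v++ configuration file setting that corresponds to the all:all:all link option from step 2; it is not part of xrt.ini.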

  6. In the console, run the application:

    cd /run/media/mmcblk0p1
    ./host.exe a.xclbin

    Confirm that the output reports TEST PASSED.

  7. Check that the files xrt.run_summary, summary.csv, and device_trace_*.csv were generated. Copy them back to the local workspace, and open the xrt.run_summary file in the Vitis Analyzer using the following command:

    vitis_analyzer xrt.run_summary

  8. Once the Vitis Analyzer opens, click Profile Summary in the left pane, and navigate to Compute Unit Utilization. Observe the compute units and kernels, and note the time and clock frequency. (Figure: CU Utilization)

  9. To see the data transfer for each compute unit and the total read/write volume in megabytes, navigate to Kernel Data Transfers -> Top Kernel Transfer. (Figure: Top kernel transfer)

  10. From the Kernel Data Transfers -> Kernel Transfer tab, you can get the transfer rate, throughput utilization (%), and latency details.
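
Steps 5 and 7 can be scripted on the board. The following is a minimal sketch, not part of the tutorial sources. It assumes the script runs in the directory that holds host.exe (for example /run/media/mmcblk0p1), and that device_trace sits under the [Debug] section header named in step 5. The fallback message is illustrative only.

```shell
#!/bin/sh
# Sketch: write the xrt.ini described in step 5 into the current directory.
# Assumption: the current directory is where host.exe is launched from.
cat > xrt.ini <<'EOF'
[Debug]
device_trace = fine
EOF

# After running ./host.exe a.xclbin (step 6), list the profiling outputs
# named in step 7. If they are not present yet, print a hint instead.
ls xrt.run_summary summary.csv device_trace_*.csv 2>/dev/null \
  || echo "profiling outputs not found yet; run ./host.exe a.xclbin first"
```

Run this once before launching the application, then again after the run to confirm that the summary and trace files are present before copying them to the local workspace.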