Interpreting the Profile Summary - 2024.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2024-07-03
Version: 2024.1 English

The profile summary includes a number of useful statistics for your host application and kernels. The report provides a general idea of the functional bottlenecks in your application.

Tip: When viewing a table in the Analysis view, you can hover your mouse over any field to get a definition of the field contents.

Settings

Displays the report and XRT configuration settings.

Summary

Displays summary statistics, including device execution time and device power.

Kernels & Compute Units

Displays the profile summary data for all kernel functions scheduled and executed.

Kernel Data Transfers

Displays the data transfers between kernels and the global memory, the top data transfers between kernels and the global memory, and the data transfer streams.

Host Data Transfers

Displays profile data for all write and read transfers between the host and device memory through the PCI Express® link, and the data transfers between the host and the global memory.

API Calls

Displays the profile data for all OpenCL host API function calls executed in the host application. The top of this section displays a bar graph of the API call time as a percentage of total time.
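As a rough illustration of what a "percentage of total time" figure means, the following Python sketch computes each call's share from a set of made-up timings. The timing values are hypothetical, and the sketch assumes that total time is the sum of the listed API call times; the tool's own definition of total time might differ.

```python
# Hypothetical API call times in milliseconds (illustrative values, not tool output).
api_time_ms = {
    "clEnqueueMigrateMemObjects": 12.0,
    "clEnqueueTask": 3.0,
    "clFinish": 45.0,
}


def percent_of_total(times):
    """Return each entry's share of the summed time, as a percentage."""
    total = sum(times.values())
    if total == 0:
        # Avoid division by zero when nothing was recorded.
        return {name: 0.0 for name in times}
    return {name: 100.0 * t / total for name, t in times.items()}


shares = percent_of_total(api_time_ms)

# Print the calls from largest to smallest share, similar to reading a bar graph.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32s} {pct:6.2f}%")
```

A call that dominates this ranking, such as a blocking wait, is a reasonable first place to look when investigating host-side bottlenecks.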

Device Power

Displays the profile data for device power.

Kernel Internals

Displays the running time for compute units in microseconds (µs) and reports stall time as a percentage of the running time. This section of the Profile Summary also displays the data transfers for specific ports on the compute unit and the functional port data transfers on the compute unit.

Tip: The Kernel Internals tab reports time in µs, while the rest of the Profile Summary reports time in milliseconds (ms).
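Because the two units differ by a factor of 1000, it is easy to misread values when comparing Kernel Internals with other sections. The following Python sketch shows the conversion and a stall percentage calculation. The function names and input values are hypothetical and are not part of any Vitis or XRT API.

```python
US_PER_MS = 1000.0  # 1 millisecond = 1000 microseconds


def us_to_ms(time_us):
    """Convert a time in microseconds to milliseconds."""
    return time_us / US_PER_MS


def stall_percent(stall_us, running_us):
    """Return stall time as a percentage of running time (same units for both)."""
    if running_us <= 0:
        raise ValueError("running time must be positive")
    return 100.0 * stall_us / running_us


# Hypothetical compute unit values as they might be read from Kernel Internals (µs).
running_us = 2500.0
stall_us = 500.0

running_ms = us_to_ms(running_us)                # comparable with ms values elsewhere
stall_pct = stall_percent(stall_us, running_us)  # share of running time spent stalled

print(f"running time: {running_ms} ms, stalled: {stall_pct}%")
```

Converting to a common unit before comparing a compute unit's running time against, for example, the device execution time in the Summary section avoids an accidental 1000x error.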

Shell Data Transfers

This table displays the DMA data transfers.

Tip: For DMA bypass and Global Memory to Global Memory data transfers, see the DMA Data Transfer table in Kernel Internals.

NoC Counters

Tip: This data is not displayed unless it has been specifically generated during implementation.

The NoC Counters section contains two sections: NoC Counters Read and NoC Counters Write. These sections are displayed only if there is non-zero NoC counter data.

Each section has a table containing summary data with line graphs for transfer rate and latency. The graphs can have multiple NoC counters, so you can toggle the counters ON/OFF through check boxes in the Chart column of the table.

Depending on the design, it might be possible to correlate NoC counters to CU ports. In this case, the CU port appears in the table, and selecting it cross-probes to the system diagram, profile summary, and any other views that include CU ports as selectable objects.

AI Engine Counters

AI Engine counters are displayed only if there is non-zero AI Engine counter data. If the AI Engine counters have an incompatible configuration, this section displays a message stating that the configuration does not support performance profiling. This section of the Profile Summary can contain three sub-sections:

  • AI Engine & Memory
  • Interface Channels
  • Memory Channels

Note: For more information, see AI Engine Tools and Flows User Guide (UG1076).