The Profile Summary includes statistics for your host application and kernels. It gives a general picture of the functional bottlenecks in your application.
Settings
This displays the report and XRT configuration settings.
Summary
This displays summary statistics, including device execution time and device power.
Kernels & Compute Units
Displays the profile summary data for all kernel functions scheduled and executed.
Kernel Data Transfers
Displays the data transfers from kernels to global memory, the top kernel data transfers to global memory, and the data transfer streams.
Host Data Transfers
Displays profile data for all write and read transfers between the host and device memory over the PCI Express® link, as well as the data transfers from the host to global memory.
API Calls
Displays the profile data for all OpenCL host API function calls executed in the host application. The top of this section displays a bar graph of each API call's time as a percentage of the total time.
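The percentages in the API Calls bar graph amount to a simple share-of-total calculation. The sketch below assumes the denominator is the sum of all listed API call times; the tool may instead divide by total application time. The function name `api_time_percentages` and the timing values are hypothetical and are not taken from a real report.

```python
from typing import Dict


def api_time_percentages(call_times_ms: Dict[str, float]) -> Dict[str, float]:
    """Return each API call's share of the total time, as a percentage.

    Assumption: "total time" is the sum of the listed API call times.
    """
    total = sum(call_times_ms.values())
    if total == 0:
        # Avoid division by zero when no time was recorded.
        return {name: 0.0 for name in call_times_ms}
    return {name: 100.0 * t / total for name, t in call_times_ms.items()}


# Hypothetical per-call timings in milliseconds (illustrative only).
times = {
    "clEnqueueMigrateMemObjects": 30.0,
    "clFinish": 60.0,
    "clCreateBuffer": 10.0,
}
pct = api_time_percentages(times)

# Print the calls from largest to smallest share, like a sorted bar graph.
for name, share in sorted(pct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} {share:5.1f}%")
```

Sorting by share mirrors how a bar graph is usually read: the longest bars point to the host API calls worth investigating first.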
Device Power
This displays the profile data for device power.
Kernel Internals
Displays the running time for compute units in microseconds (µs) and reports stall time as a percentage of the running time. This section of the Profile Summary also displays the data transfers for specific ports on the compute unit, the functional port data transfers on the compute unit, and the running time and stalls on the compute unit.
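Stall time reported as a percentage of running time is the ratio of the two durations, scaled by 100. The minimal sketch below shows that arithmetic; the function name `stall_percent` and the input values are hypothetical.

```python
def stall_percent(running_time_us: float, stall_time_us: float) -> float:
    """Return stall time as a percentage of compute unit running time.

    Both arguments are in microseconds (µs), matching the report's units.
    """
    if running_time_us <= 0:
        raise ValueError("running time must be positive")
    if stall_time_us < 0 or stall_time_us > running_time_us:
        raise ValueError("stall time must be between 0 and the running time")
    return 100.0 * stall_time_us / running_time_us


# Hypothetical compute unit: 250 µs running, 40 µs of which were stalls.
print(stall_percent(250.0, 40.0))  # 16.0
```

A high percentage means the compute unit spent much of its running time waiting rather than computing, which is the kind of bottleneck this section is meant to surface.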
Shell Data Transfers
This section displays a table of the DMA data transfers.
NoC Counters
This section contains the NoC Counters Read and NoC Counters Write sub-sections. They are displayed only if there is non-zero NoC counter data.
Each sub-section has a table containing summary data, with line graphs for transfer rate and latency. A graph can show multiple NoC counters, so you can toggle individual counters on or off using the check boxes in the Chart column of the table.
Depending on the design, it may be possible to correlate NoC counters with CU ports. In that case, the CU port appears in the table, and selecting it cross-probes to the system diagram, the Profile Summary, and any other views that include CU ports as selectable objects.
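A transfer-rate line graph can be derived from counter samples by dividing the bytes moved in each interval by the interval length. The sketch below assumes, hypothetically, that each sample is a pair of (timestamp in µs, cumulative bytes); the actual NoC counter data format is not described here, and the function name `transfer_rates_mbps` is illustrative.

```python
from typing import List, Tuple


def transfer_rates_mbps(samples: List[Tuple[float, int]]) -> List[float]:
    """Convert cumulative byte-counter samples into per-interval rates.

    Assumption: each sample is (timestamp_us, cumulative_bytes).
    Bytes per microsecond equals decimal megabytes (1e6 bytes) per second.
    """
    rates: List[float] = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        dt_us = t1 - t0
        if dt_us <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((b1 - b0) / dt_us)
    return rates


# Hypothetical samples: 5,000 bytes in the first 10 µs, 10,000 in the next 10 µs.
print(transfer_rates_mbps([(0.0, 0), (10.0, 5000), (20.0, 15000)]))  # [500.0, 1000.0]
```

Working from cumulative counts keeps each point on the graph independent of sampling jitter: a longer interval simply divides a larger byte delta by a larger time delta.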
AI Engine Counters
This section is displayed only if there is non-zero AI Engine counter data. If the AI Engine counters have an incompatible configuration, this section displays a message stating that the configuration does not support performance profiling. This section of the Profile Summary can contain three sub-sections:
- AI Engine & Memory
- Interface Channels
- Memory Channels