The profile summary reports statistics for your host application and kernels, and gives a general idea of the functional bottlenecks in your application. It contains the following sections:
Settings
This displays the report and XRT configuration settings.
Summary
This displays summary statistics including device execution time and device power.
Kernels & Compute Units
Displays the profile summary data for all kernel functions scheduled and executed.
Kernel Data Transfers
Displays the data transfers between kernels and global memory, the top data transfers between kernels and global memory, and the data transfers over streams.
Host Data Transfers
Displays profile data for all write and read transfers between the host and device memory over the PCI Express® link, as well as the data transfers between the host and global memory.
API Calls
Displays the profile data for all OpenCL host API function calls executed in the host application. The top of the section displays a bar graph of the API call time as a percentage of the total time.
Device Power
This displays the profile data for device power.
Kernel Internals
Displays the running time for compute units in microseconds (µs) and reports stall time as a percentage of the running time. This section also displays the data transfers for specific ports on the compute unit, including the functional port data transfers.
Shell Data Transfers
This section displays the DMA data transfers.
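The API Calls and Kernel Internals sections both report a time as a percentage of a larger total. The sketch below shows that arithmetic only. It is not the tool's implementation; the helper `percent_of_total` and all timing values are hypothetical and chosen for illustration.

```python
def percent_of_total(part_us: float, total_us: float) -> float:
    """Return part_us as a percentage of total_us (0.0 if total is not positive)."""
    if total_us <= 0:
        return 0.0
    return 100.0 * part_us / total_us


# Hypothetical compute unit timings in microseconds (not taken from a real report).
running_time_us = 2000.0
stall_time_us = 500.0
stall_percent = percent_of_total(stall_time_us, running_time_us)
print(f"Stall time: {stall_percent:.1f}% of running time")

# Hypothetical per-call host API times in microseconds (illustrative values only).
api_time_us = {
    "clCreateBuffer": 100.0,
    "clEnqueueTask": 300.0,
    "clFinish": 600.0,
}
total_api_us = sum(api_time_us.values())
api_percent = {
    name: percent_of_total(t, total_api_us) for name, t in api_time_us.items()
}
for name, pct in api_percent.items():
    print(f"{name}: {pct:.1f}% of total API time")
```

A compute unit whose stall percentage is high relative to its running time, or an API call that dominates the total, is a reasonable first place to look for a bottleneck.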