The Timeline Trace window displays host and device events on a common timeline.
This information helps you understand the details of application execution and
identify potential areas for improvement. The Timeline Trace report has two main
sections: Host and Device. The Host section shows the trace of all activity
originating from the host side. The Device section shows the activity of the
compute units (CUs) on the FPGA.
The report has the following structure:

Host
- OpenCL API Calls: All OpenCL API calls are traced here. The activity time
  is measured from the host's perspective.
  - General: All general OpenCL API calls, such as
    clCreateProgramWithBinary, clCreateContext, and clCreateCommandQueue,
    are traced here.
  - Queue: OpenCL API calls that are associated with a specific command
    queue are traced here. This includes commands such as
    clEnqueueMigrateMemObjects and clEnqueueNDRangeKernel. If the user
    application creates multiple command queues, this section shows all
    the queues and their activities.
- Data Transfer: This section traces the DMA transfers between the host
  and the device memory. The OpenCL runtime implements multiple DMA
  threads, and there is typically an equal number of DMA channels. The
  user application initiates a DMA transfer by calling OpenCL APIs such as
  clEnqueueMigrateMemObjects. These DMA requests are forwarded to the
  runtime, which delegates each one to one of the threads. Transfers from
  the host to the device appear under Write, because the host writes
  them, and transfers from the device to the host appear under Read.
- Kernel Enqueues: The kernels enqueued by the host program are shown
  here. Do not confuse them with the kernels/CUs on the device: here,
  "kernel" refers to the NDRange kernels and tasks created by the OpenCL
  commands clEnqueueNDRangeKernel and clEnqueueTask. They are plotted
  against time measured from the host's perspective. Multiple kernels can
  be scheduled to execute at the same time, and each one is traced from
  the point it is scheduled to run until the end of its execution, which
  is why there can be multiple entries. The number of rows depends on the
  number of overlapping kernel executions.
  Note: Overlapping kernels in this view should not be mistaken for
  actual parallel execution on the device, because a scheduled kernel
  might not be ready to execute right away.
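
The Kernel Enqueues row layout can be modeled as interval partitioning.
The sketch below assumes each enqueued kernel is available as a
hypothetical (start, end) pair in host time; `assign_rows` is an
illustrative helper, not part of Vitis or XRT. Placing each interval on a
row that is already free when it starts uses exactly as many rows as the
largest number of kernels outstanding at once, which is why the row count
tracks overlapping executions.

```python
import heapq


def assign_rows(intervals):
    """Hypothetical model: place (start, end) intervals on rows so that no
    two intervals on the same row overlap. Returns one row index per
    interval, in the original input order."""
    # Visit intervals in order of start time.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    free_at = []  # min-heap of (end_time, row) for rows currently in use
    rows = [None] * len(intervals)
    next_row = 0
    for i in order:
        start, end = intervals[i]
        if free_at and free_at[0][0] <= start:
            # Reuse the row whose last interval finished earliest.
            _, row = heapq.heappop(free_at)
        else:
            # Every row is still busy: open a new one.
            row = next_row
            next_row += 1
        rows[i] = row
        heapq.heappush(free_at, (end, row))
    return rows


# Three kernels enqueued by the host; the first two overlap in host time.
kernels = [(0.0, 5.0), (1.0, 3.0), (6.0, 8.0)]
rows = assign_rows(kernels)
print(rows)           # [0, 1, 1]
print(max(rows) + 1)  # 2 rows needed
```

Intervals that merely touch (one ends exactly when the next starts) share a
row in this model. The helper reuses whichever row became free earliest, so
the actual tool may order rows differently; only the row count is the point
of the sketch. Two kernels on separate rows were only outstanding together
from the host's view, not necessarily running in parallel on the device.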

Device "name"
- Binary Container "name": Binary container name.
  - Accelerator "name": Name of the compute unit (also known as an
    accelerator) on the FPGA.
    - User Functions: For Vitis HLS kernels, functions that are
      implemented as dataflow processes are traced here. The trace for
      these functions shows the number of active instances of each
      function that are currently executing in parallel. These names are
      generated in hardware emulation when the waveform is enabled.
      Note: Function-level activity is only possible in hardware
      emulation.
      - Function: "name a"
      - Function: "name b"
    - Read: A CU reads from DDR over AXI-MM ports. The trace of data read
      by a CU is shown here. The activity is shown as transactions, and
      the tool-tip for each transaction shows more details of the AXI
      transaction. These names are generated when --profile.data is
      specified for the CU.
    - Write: A CU writes to DDR over AXI-MM ports. The trace of data
      written by a CU is shown here. The activity is shown as
      transactions, and the tool-tip for each transaction shows more
      details of the AXI transaction. This is generated when
      --profile.data is specified for the CU.
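
The User Functions trace reports how many instances of a dataflow function
are active at a time. Assuming, hypothetically, that each instance's
activity is available as a (start, end) pair in emulation time, that count
is a step function that rises by one at each start and falls by one at each
end. `active_instances` and the sample values are illustrative only; the
actual trace is produced during hardware emulation, not by code like this.

```python
def active_instances(intervals):
    """Hypothetical model: return [(time, count)] where count is the number
    of active instances from that time until the next listed time."""
    events = []
    for start, end in intervals:
        events.append((start, +1))  # an instance becomes active
        events.append((end, -1))    # an instance finishes
    # At equal timestamps, apply ends (-1) before starts (+1) so that
    # back-to-back instances are not counted as running in parallel.
    events.sort(key=lambda e: (e[0], e[1]))
    steps = []
    count = 0
    for time, delta in events:
        count += delta
        if steps and steps[-1][0] == time:
            # Collapse several events at the same timestamp into one step.
            steps[-1] = (time, count)
        else:
            steps.append((time, count))
    return steps


# Two instances of one dataflow process; the second starts before the first ends.
steps = active_instances([(0, 4), (2, 6)])
print(steps)                     # [(0, 1), (2, 2), (4, 1), (6, 0)]
print(max(c for _, c in steps))  # 2 instances at peak
```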