Vitis Analyzer is a graphical tool that lets you browse many aspects of the design, from the whole system down to the details of the kernel.
Instructions for Vitis Analyzer:
Open a terminal and set up Vitis
Run:
vitis_analyzer &
File menu -> Open Summary…
Browse to
./build
Select cholesky_kernel_hw_emu_xclbin_run_summary (prefixed with the blue “play” pictogram)
Navigate around by yourself, or watch this 45-second looping GIF to see how to get around in Vitis Analyzer.
Make sure to check:
Profile summary
Guidance reports - indicate areas for improvement
Application timeline - more information just below
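If several emulation runs have been made, a short script can help locate the run summary to open. The following is a minimal sketch, assuming the summaries live under ./build and have `run_summary` in their file name, as the name in the steps above suggests. The helper name `find_run_summaries` is hypothetical and not part of Vitis.

```python
from pathlib import Path


def find_run_summaries(build_dir="./build"):
    """Return files under build_dir whose name contains 'run_summary', newest first.

    Assumption: run summaries are identified only by their file name.
    """
    root = Path(build_dir)
    if not root.is_dir():
        return []
    candidates = [p for p in root.rglob("*run_summary*") if p.is_file()]
    # Sort by modification time so the most recent run is listed first.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in find_run_summaries():
        print(path)
```

The printed path is the file to select in File menu -> Open Summary…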
The application timeline has the following structure:
Host
OpenCL API Calls: All OpenCL API calls are traced here. The activity time is measured from the host perspective.
General: All general OpenCL API calls, such as clCreateProgramWithBinary, clCreateContext, and clCreateCommandQueue, are traced here.
Queue: OpenCL API calls that are associated with a specific command queue are traced here. This includes commands such as clEnqueueMigrateMemObjects and clEnqueueNDRangeKernel. If the user application creates multiple command queues, this section shows all the queues and their activities.
Data Transfer: This section traces the DMA transfers between the host and the device memory. The OpenCL runtime implements multiple DMA threads, and there is typically an equal number of DMA channels. The user application initiates a DMA transfer by calling OpenCL APIs such as clEnqueueMigrateMemObjects. These DMA requests are forwarded to the runtime, which delegates them to one of the threads. Transfers from the host to the device appear under Write, as they are written by the host, and transfers from the device to the host appear under Read.
Kernel Enqueues: The kernels enqueued by the host program are shown here. These should not be confused with the kernels/CUs on the device. Here, kernel refers to the NDRangeKernels and tasks created by the OpenCL commands clEnqueueNDRangeKernel and clEnqueueTask. They are plotted against time as measured from the host's perspective. Multiple kernels can be scheduled to execute at the same time, and each is traced from the point it is scheduled to run until the end of its execution. Overlapping kernel executions are shown on separate rows, so the number of rows depends on how many executions overlap.
Device “name”
Binary Container “name”: Simply the binary container name.
Accelerator “name”: Name of the compute unit (also known as an accelerator) on the FPGA.
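As a quick reference, the Host rows described above can be summarized as a lookup from OpenCL API call to the timeline rows where its activity shows up. This is only an illustrative sketch built from the calls named in this section. The table `HOST_ROWS` and the function `rows_for` are hypothetical names, and listing clEnqueueTask under Queue is an assumption based on the fact that it takes a command queue argument.

```python
# Timeline rows (Host section) in which each OpenCL API call is traced.
# Only the calls named in the description are listed; this is not exhaustive.
GENERAL = "OpenCL API Calls / General"
QUEUE = "OpenCL API Calls / Queue"
DATA_TRANSFER = "Data Transfer"
KERNEL_ENQUEUES = "Kernel Enqueues"

HOST_ROWS = {
    "clCreateProgramWithBinary": [GENERAL],
    "clCreateContext": [GENERAL],
    "clCreateCommandQueue": [GENERAL],
    # Queue-bound call that also triggers a DMA transfer.
    "clEnqueueMigrateMemObjects": [QUEUE, DATA_TRANSFER],
    # Queue-bound calls that also create a kernel enqueue entry.
    "clEnqueueNDRangeKernel": [QUEUE, KERNEL_ENQUEUES],
    # Assumption: placed under Queue because it takes a command queue.
    "clEnqueueTask": [QUEUE, KERNEL_ENQUEUES],
}


def rows_for(api_call):
    """Return the Host timeline rows for an API call, or [] if it is not listed."""
    return HOST_ROWS.get(api_call, [])


if __name__ == "__main__":
    for name in HOST_ROWS:
        print(f"{name}: {', '.join(rows_for(name))}")
```

For example, `rows_for("clEnqueueMigrateMemObjects")` returns both the Queue row and the Data Transfer row, which is why a single host call can appear in two places in the timeline.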