Vitis Analyzer for Application End-to-end Timeline Analysis - 2022.2 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID
XD099
Release Date
2022-12-01
Version
2022.2 English

Vitis Analyzer is a graphical tool that lets you browse many aspects of the design, from the whole system down to the details of the kernel.

Instructions for Vitis Analyzer:
  1. Open a terminal and set up Vitis

  2. Run: vitis_analyzer &

  3. File menu -> Open Summary…

  4. Browse to ./build

  5. Select cholesky_kernel_hw_emu_xclbin_run_summary (prefixed with the blue “play” pictogram)

  6. Navigate around by yourself, or watch this 45-second looping GIF to see how to get around in Vitis Analyzer.

    Make sure to check:

    1. Profile summary

    2. Guidance reports - indicate areas for improvement

    3. Application timeline - more information just below

The application timeline has the following structure:

  • Host

    • OpenCL API Calls: All OpenCL API calls are traced here. The activity time is measured from the host perspective.

    • General: All general OpenCL API calls, such as clCreateProgramWithBinary, clCreateContext, and clCreateCommandQueue, are traced here.

    • Queue: OpenCL API calls that are associated with a specific command queue are traced here. This includes commands such as clEnqueueMigrateMemObjects and clEnqueueNDRangeKernel. If the user application creates multiple command queues, this section shows all the queues and their activities.

    • Data Transfer: This section traces the DMA transfers between the host and the device memory. The OpenCL runtime implements multiple DMA threads, and there is typically an equal number of DMA channels. The user application initiates a DMA transfer by calling OpenCL APIs such as clEnqueueMigrateMemObjects. These DMA requests are forwarded to the runtime, which delegates each request to one of the threads. Data transfers from the host to the device appear under Write, as they are written by the host, and transfers from the device to the host appear under Read.

    • Kernel Enqueues: The kernels enqueued by the host program are shown here. The kernels here should not be confused with the kernels/CUs on the device. Here, kernel refers to the NDRangeKernels and tasks created by the OpenCL commands clEnqueueNDRangeKernel and clEnqueueTask. These are plotted against time as measured from the host's perspective. Multiple kernels can be scheduled to execute at the same time, and each is traced from the point it is scheduled to run until the end of its execution. Overlapping kernel executions are shown in separate rows, so the number of rows depends on how many executions overlap.

  • Device “name”

    • Binary Container “name”: Simply the binary container name.

    • Accelerator “name”: Name of the compute unit (a.k.a., Accelerator) on the FPGA.
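
The row layout described under Kernel Enqueues can be illustrated with a small sketch. This is not how Vitis Analyzer is implemented; it is a generic greedy placement that assumes each enqueue is a (name, start, end) interval measured on the host, and it puts each interval in the first row that is already free. The function name assign_rows and the sample timestamps are hypothetical.

```python
from typing import Dict, List, Tuple

def assign_rows(intervals: List[Tuple[str, float, float]]) -> Dict[str, int]:
    """Place each (name, start, end) interval in the first row that is free at its start."""
    row_end_times: List[float] = []  # end time of the last interval placed in each row
    placement: Dict[str, int] = {}
    # Process enqueues in order of their start time.
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        for row, busy_until in enumerate(row_end_times):
            if busy_until <= start:
                # This row is idle again, so reuse it.
                row_end_times[row] = end
                placement[name] = row
                break
        else:
            # Every existing row is still busy: open a new row.
            row_end_times.append(end)
            placement[name] = len(row_end_times) - 1
    return placement

# Hypothetical host-side timestamps in milliseconds (illustrative data only).
enqueues = [("k0", 0.0, 4.0), ("k1", 1.0, 3.0), ("k2", 3.5, 6.0), ("k3", 4.0, 5.0)]
rows = assign_rows(enqueues)
row_count = max(rows.values()) + 1
print(rows, row_count)
```

With this sample data, at most two executions overlap at any time, so two rows are enough; a run with no overlapping kernel executions would collapse to a single row.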