Xilinx Runtime (XRT) and APIs - 2023.1 English

Vitis Tutorials: Hardware Acceleration (XD099)


Although this might seem obvious, any hardware acceleration system can be broadly divided into two parts: the hardware architecture and its implementation, and the software that interacts with that hardware. For Vitis, regardless of any higher-level software framework your application uses (FFmpeg, GStreamer, or others), the software library that fundamentally interacts with the Alveo hardware is the Xilinx Runtime (XRT).

While XRT consists of many components, its primary role can be boiled down to three simple things:

  • Programming the FPGA/Versal adaptive SoC kernels and managing the life cycle of the hardware

  • Allocating memory and migrating that memory between the host CPU and the card

  • Managing the operation of the hardware: sequencing execution of kernels, setting kernel arguments, etc.

These three roles are also listed in order from the most to the least “expensive” operation you can perform on an FPGA. Let’s examine that in more detail.

Programming the kernels into the hardware inherently takes some amount of time. Depending on the capacity of the FPGA, the PCIe bandwidth available to transfer the configuration image, and other factors, the time required is typically on the order of tens to hundreds of milliseconds. This is generally a “one-time deal” when you launch your application, so the configuration hit can be absorbed into general setup latency, but it is important to be aware of it. There are applications where Alveo is reprogrammed multiple times during operation to provide different large kernels. If you are planning to build such an architecture, incorporate this configuration time into your application as seamlessly as possible. It is also important to note that while many applications can use the hardware simultaneously, only one image can be programmed onto the device at any one time.
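To make the cost of reprogramming concrete, the following is a minimal back-of-the-envelope sketch. It does not call XRT. The function name and the numbers (a 200 ms image load and a 5 ms job) are hypothetical values chosen only for illustration. It estimates what fraction of wall time goes to configuration when an image is loaded once and then used for a given number of jobs.

```python
def config_overhead_fraction(config_ms: float, job_ms: float, jobs_per_config: int) -> float:
    """Fraction of total wall time spent programming the device.

    Assumes one image load followed by `jobs_per_config` back-to-back jobs.
    """
    if jobs_per_config <= 0:
        raise ValueError("jobs_per_config must be positive")
    work_ms = job_ms * jobs_per_config
    return config_ms / (config_ms + work_ms)


# Hypothetical numbers, not measurements from any real card.
CONFIG_MS = 200.0  # assumed time to program one image
JOB_MS = 5.0       # assumed time for one kernel job

for jobs in (1, 10, 1000):
    fraction = config_overhead_fraction(CONFIG_MS, JOB_MS, jobs)
    print(f"{jobs:>5} jobs per image load: {fraction:.1%} of time spent on configuration")
```

Under these assumptions, the configuration cost dominates when the image is swapped after every job and becomes small once many jobs share one load, which is why designs that reprogram during operation should batch work per image where they can.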

Allocating memory and moving it around is the real “meat” of XRT. Allocating and managing memory effectively is a critical skill for developing acceleration architectures. If you do not manage memory and memory migration efficiently, you will significantly impact your overall application performance, and not in the way you would hope! Fortunately, XRT provides many functions to interact with memory, and we will explore those specifics later on.
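One common way that memory migration hurts performance is by splitting the same data into many small transfers. The sketch below is a simplified model, not XRT code. It assumes each transfer pays a fixed setup overhead plus a bandwidth-limited copy time, and the function name, bandwidth, and overhead figures are hypothetical.

```python
def migration_time_ms(total_bytes: int, num_transfers: int,
                      bytes_per_second: float, overhead_s: float) -> float:
    """Estimated time to move `total_bytes` split across `num_transfers` transfers.

    Simplified model: fixed per-transfer overhead plus bandwidth-limited copy time.
    """
    if num_transfers <= 0:
        raise ValueError("num_transfers must be positive")
    seconds = num_transfers * overhead_s + total_bytes / bytes_per_second
    return seconds * 1e3


# Hypothetical figures, for illustration only.
TOTAL_BYTES = 64 * 1024 * 1024  # 64 MiB of data to move to the card
BANDWIDTH = 10e9                # assumed effective bandwidth, bytes per second
OVERHEAD_S = 20e-6              # assumed fixed cost per transfer, seconds

one_large = migration_time_ms(TOTAL_BYTES, 1, BANDWIDTH, OVERHEAD_S)
many_small = migration_time_ms(TOTAL_BYTES, TOTAL_BYTES // 4096, BANDWIDTH, OVERHEAD_S)

print(f"one 64 MiB transfer:   {one_large:.2f} ms")
print(f"16384 4 KiB transfers: {many_small:.2f} ms")
```

In this model the total payload is identical in both cases; only the number of transfers changes, so the difference comes entirely from the per-transfer overhead term.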

Finally, XRT manages the operation of the hardware by setting kernel arguments and managing the kernel execution flow. Kernels can run sequentially or in parallel, from one process or many, and in blocking or non-blocking ways. The exact way your software interacts with your kernels is under your control, and we will investigate some examples of this later on.
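The difference between blocking and non-blocking execution can be sketched on the host side without any hardware. The example below is an analogy only: `fake_kernel` is a hypothetical stand-in for a kernel run, and Python's standard thread pool stands in for the runtime's command queue. It is not how XRT or OpenCL is invoked.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(name: str, value: int) -> str:
    """Hypothetical stand-in for one kernel execution (no hardware involved)."""
    return f"{name}:{value * 2}"


# Blocking, sequential: each call returns only when its "kernel" has finished.
sequential = [fake_kernel("k0", 1), fake_kernel("k1", 2)]

# Non-blocking: submit both runs, keep the host busy, then wait for the results.
with ThreadPoolExecutor(max_workers=2) as pool:
    pending = [pool.submit(fake_kernel, "k0", 1), pool.submit(fake_kernel, "k1", 2)]
    host_side_work = sum(range(10))             # the host is free to do other work here
    overlapped = [p.result() for p in pending]  # result() blocks until that run is done

print(sequential == overlapped)
```

Both paths produce the same results; what changes is when the host waits. In the non-blocking path the host only stalls at the point where it actually needs a result.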

It is worth noting that XRT is a low-level API. For very advanced or unusual use models, you might wish to interact with it directly, but most designers choose to use a higher-level API such as OpenCL, the Xilinx Media Accelerator (XMA) framework, or others. The following figure shows a top-level view of the available APIs. For this document, we will focus primarily on the OpenCL API to make this introduction more approachable. If you have used OpenCL before, you will find it mostly familiar (although AMD does provide some extensions for FPGA-specific tasks).

Figure: XRT Stack

In the next section, you will learn the basic concepts of memory management you will need to understand to optimize your application.

Copyright © 2020–2023 Advanced Micro Devices, Inc
