User-Defined Accelerator Class - 2024.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2024-05-30
Version: 2024.1 English

Vitis enables VSC for hardware and host-code compilation when the source code contains a class derived from the VPP_ACC class. The class derived from VPP_ACC will be compiled as an accelerator.

A VSC accelerator interface is defined in the single-source C++ model as a class definition, which must be placed in a C++ header file. A single compilation unit (one v++ --compile command) can contain multiple accelerator class definitions, all of which are implemented in the same PL hardware. You can also provide multiple accelerators across multiple compilation units. However, each compilation unit must define the complete kernel (or HLS) code for that unit.

Every user-defined accelerator class:

  • Must be derived from the Vitis pre-defined VPP_ACC class
  • Must have a static method named compute()
  • Can define the arguments and functionality of compute() as needed
  • Must pass two arguments to the VPP_ACC class template:
    • The first argument must be the same as the user-defined class name
    • The second argument is the number of compute units to be replicated in the hardware
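
The template rule above, where the class passes its own name as the first template argument, has the shape of the C++ curiously recurring template pattern. The following is a minimal stand-alone sketch of that shape only. It assumes a hypothetical stand-in base named acc_base and a hypothetical class my_acc; neither is part of Vitis, and the real VPP_ACC from vpp_acc.hpp is not modeled here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for illustration only. The real base class is
// VPP_ACC from vpp_acc.hpp, and its internals are not reproduced here.
template <class Derived, int NCU>
struct acc_base {
    static_assert(NCU > 0, "at least one compute unit is required");
    static constexpr int num_cu = NCU;  // number of replicated compute units
};

// The derived class names itself as the first template argument and
// gives the compute-unit count as the second.
class my_acc : public acc_base<my_acc, /*NCU=*/4> {
public:
    // compute() is static; its arguments and body are user-defined.
    // This placeholder body adds 1 to each input element.
    static void compute(const int* in, int* out, std::size_t n) {
        assert(in != nullptr && out != nullptr);
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = in[i] + 1;
        }
    }
};
```

Because compute() is static, a caller invokes it as my_acc::compute(in, out, n) without creating an object of the class.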

An example accelerator class definition (xmmult) is provided below:

#include "vpp_acc.hpp"
class xmmult : VPP_ACC<xmmult, /*NCU=*/4>
{
public:
    // Platform port connections
    SYS_PORT(A, DDR[0]);
    SYS_PORT(B, DDR[1]);
    SYS_PORT(C, DDR[2]);
    SYS_PORT_PFM(u50, A, (HBM[0]:HBM[4]:HBM[8]:HBM[12]));
    SYS_PORT_PFM(u50, B, (HBM[1]:HBM[5]:HBM[9]:HBM[13]));
    SYS_PORT_PFM(u50, C, (HBM[2]:HBM[6]:HBM[10]:HBM[14]));
    // Data interfaces
    ACCESS_PATTERN(A, SEQUENTIAL);
    ACCESS_PATTERN(B, RANDOM);
    DATA_COPY(A, A[SZ]);  // move to local memory
    DATA_COPY(B, B[SZ]);
    ZERO_COPY(C);  // direct AXI-mm

    // define the SW entry point of the accelerator
    static void compute(data_t* A, data_t* B, data_t* C);
    // define the HW top-level of the accelerator (HLS top)
    static void mmult(data_t* A, data_t* B, data_t* C);
};

This class interface model captures all hardware system-related considerations in a single unified source and allows easy integration with the application layer through the compute() function as described in The compute() API. The compute() function is the host application's entry point to the hardware accelerator.
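
To make the relationship between the software entry point and the hardware top-level concrete, the following is a plain C++ software model, not VSC code. It assumes data_t is int, the matrices are square and row-major with a hypothetical dimension kDim, and compute() does nothing more than forward to mmult(). The struct name xmmult_model is also hypothetical, and no VPP_ACC base or guidance macros are involved.

```cpp
#include <cassert>
#include <cstddef>

using data_t = int;                 // assumption: element type
constexpr std::size_t kDim = 4;     // assumption: kDim x kDim row-major matrices

// Software-only model of the accelerator class (no VPP_ACC, no macros).
struct xmmult_model {
    // Stands in for the HW top-level (HLS top): C = A * B.
    static void mmult(const data_t* A, const data_t* B, data_t* C) {
        for (std::size_t i = 0; i < kDim; ++i) {
            for (std::size_t j = 0; j < kDim; ++j) {
                data_t acc = 0;
                for (std::size_t k = 0; k < kDim; ++k) {
                    // A is read along a row; B is read with a stride of kDim.
                    acc += A[i * kDim + k] * B[k * kDim + j];
                }
                C[i * kDim + j] = acc;
            }
        }
    }

    // Stands in for the SW entry point: the host calls compute(),
    // which in turn calls the kernel function.
    static void compute(const data_t* A, const data_t* B, data_t* C) {
        assert(A != nullptr && B != nullptr && C != nullptr);
        mmult(A, B, C);
    }
};
```

A host-side caller would only ever call xmmult_model::compute(A, B, C); mmult() stays an implementation detail behind that entry point.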

The class definition also contains guidance macros that refer to compute() arguments. By providing these, you will guide VSC to make specific hardware choices during implementation. Refer to Guidance Macros for more information.

The example shown above describes the accelerator class named xmmult, which VSC compiles to have four CUs in hardware (/*NCU=*/4). The accelerator has its PE (or kernel) code defined in the mmult() function, which is called within compute(). The compute() function takes three arguments: A, B, and C. For each of these arguments, two types of guidance macros are specified: platform port connections and data access.

  1. Platform port connections using SYS_PORT() and SYS_PORT_PFM() macros.
    1. These are typically global memory I/O connections or other AXI4 interface connections available in the hardware platform used during Vitis compilation. In this example, the first three SYS_PORT() macros connect the three A, B, and C arguments of compute() to different DDR banks (0, 1 and 2).
      Tip: The SYS_PORT() guidance macro connectivity for each argument applies to all the CU instances (NCU=4) in the xmmult accelerator, because each macro specifies only one memory bank.
    2. The three SYS_PORT_PFM() macros refine the global memory connections in two ways:
      1. They apply only when the target platform name contains u50, as in the Alveo U50 card.
      2. For each of the four CUs, each argument connects to a different HBM bank, because the SYS_PORT_PFM() syntax for each argument specifies four connections:
        SYS_PORT_PFM(u50, A, (HBM[0]:HBM[4]:HBM[8]:HBM[12]));
  2. Data access is specified using ACCESS_PATTERN(), DATA_COPY(), and ZERO_COPY() macros:
    1. The ACCESS_PATTERN() macros direct VSC to infer sequential or random access for data transfer. In the example, argument A is defined as sequentially accessed and can therefore be implemented with a stream connection. Argument B is accessed randomly and therefore requires a local (on-chip) memory buffer to support random data access from the PE mmult().
    2. For arguments A and B, the DATA_COPY() macros direct VSC to infer a local memory (RAM) next to the accelerator.
    3. The ZERO_COPY() macro directs VSC to create a memory-mapped AXI (M_AXI) connection for argument C.
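
The difference between a single-bank specification and a colon-separated list of four banks can be illustrated with a small software model. This sketch is not VSC's implementation: the helper names split_banks and bank_for_cu are hypothetical, and the positional rule (the i-th entry in the list goes to the i-th CU) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a bank specification such as "HBM[0]:HBM[4]:HBM[8]:HBM[12]" on ':'.
std::vector<std::string> split_banks(const std::string& spec) {
    std::vector<std::string> banks;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = spec.find(':', start);
        if (pos == std::string::npos) {
            banks.push_back(spec.substr(start));
            break;
        }
        banks.push_back(spec.substr(start, pos - start));
        start = pos + 1;
    }
    return banks;
}

// Return the bank assumed to serve compute unit `cu` for one argument.
// A single-bank spec is shared by every CU; a multi-bank spec is assumed
// to assign its i-th entry to the i-th CU.
std::string bank_for_cu(const std::string& spec, std::size_t cu) {
    const std::vector<std::string> banks = split_banks(spec);
    if (banks.size() == 1) {
        return banks[0];
    }
    assert(cu < banks.size());
    return banks[cu];
}
```

Under these assumptions, bank_for_cu("DDR[0]", cu) yields DDR[0] for every CU, while the four-entry HBM list yields a distinct bank per CU.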