Vitis enables VSC for hardware and host-code compilation when the source code
contains a class derived from the VPP_ACC class. The class derived from VPP_ACC
is compiled as an accelerator.
A VSC accelerator interface is defined in the single-source C++ model as a
class definition that must reside in a C++ header file. Multiple accelerator class
definitions can be placed in one compilation unit (that is, a single v++ --compile
command), and all of them are implemented in the same PL hardware. You can also
provide multiple accelerators across multiple compilation units. However, each
compilation unit must define the complete kernel (or HLS) code for that unit.
Every user-defined accelerator class:
- Must be derived from the Vitis pre-defined VPP_ACC class
- Must have a static method named compute()
  - The arguments and functionality of compute() can be user-defined
- The class template has two arguments:
  - The first argument must be the same as the user-defined class name
  - The second argument is the number of compute units to be replicated in the hardware
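The template rule above (the derived class passes its own name and a CU count to the base) has the shape of the C++ curiously recurring template pattern. The following is a minimal sketch of that shape only. `MockAcc`, `my_acc`, and `num_compute_units` are hypothetical names invented for illustration; they are not part of `vpp_acc.hpp`, and the real VPP_ACC class may be organized differently.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for VPP_ACC (assumption: not the real Vitis class).
template <typename Derived, std::size_t NCU>
class MockAcc {
public:
    static_assert(NCU > 0, "an accelerator needs at least one compute unit");
    // Number of compute units requested through the second template argument.
    static constexpr std::size_t num_compute_units = NCU;
};

// User-defined accelerator: the first template argument repeats the class
// name, the second argument is the number of compute units.
class my_acc : public MockAcc<my_acc, 4> {
public:
    // Static entry point; its arguments and body are user-defined.
    static int compute(int x) { return x + 1; }
};

// Compile-time checks that the two template arguments were applied as intended.
static_assert(std::is_base_of<MockAcc<my_acc, 4>, my_acc>::value,
              "my_acc derives from the mock base");
static_assert(my_acc::num_compute_units == 4, "my_acc requests four CUs");
```

Because the derived class name is a template argument, a base class of this shape can refer to the derived type at compile time, which is one plausible reason for the first-argument rule.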
An example accelerator class definition (xmmult) is provided below:
#include "vpp_acc.hpp"

class xmmult : VPP_ACC<xmmult, /*NCU=*/4>
{
public:
    // Platform port connections
    SYS_PORT(A, DDR[0]);
    SYS_PORT(B, DDR[1]);
    SYS_PORT(C, DDR[2]);
    SYS_PORT_PFM(u50, A, (HBM[0]:HBM[4]:HBM[8]:HBM[12]));
    SYS_PORT_PFM(u50, B, (HBM[1]:HBM[5]:HBM[9]:HBM[13]));
    SYS_PORT_PFM(u50, C, (HBM[2]:HBM[6]:HBM[10]:HBM[14]));

    // Data interfaces
    ACCESS_PATTERN(A, SEQUENTIAL);
    ACCESS_PATTERN(B, RANDOM);
    DATA_COPY(A, A[SZ]); // move to local memory
    DATA_COPY(B, B[SZ]);
    ZERO_COPY(C);        // direct AXI-mm

    // define the SW entry point of the accelerator
    static void compute(data_t* A, data_t* B, data_t* C);

    // define the HW top-level of the accelerator (HLS top)
    static void mmult(data_t* A, data_t* B, data_t* C);
};
This class interface model captures all hardware system-related
considerations in a single unified source and allows easy integration with the
application layer through the compute() function, as described in
The compute() API. The compute() function is the host application's entry point
to the hardware accelerator.
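To make the role of compute() as the host entry point concrete, here is a software-only sketch. It assumes an `int` element type and 2 x 2 matrices, and it uses a hypothetical class `xmmult_sw` with no VPP_ACC base and no guidance macros so that it builds with any C++ compiler. The helper `run_example()` is also hypothetical, and the body of mmult() is an assumed plain matrix multiply.

```cpp
#include <cassert>
#include <cstddef>

// Assumptions for illustration only: element type and matrix dimension.
using data_t = int;
constexpr std::size_t N = 2; // hypothetical N x N row-major matrices

// Software-only stand-in for the accelerator class.
class xmmult_sw {
public:
    // Host-facing entry point: forwards to the processing-element code.
    static void compute(data_t* A, data_t* B, data_t* C) {
        assert(A != nullptr && B != nullptr && C != nullptr);
        mmult(A, B, C);
    }

    // Stand-in for the hardware top-level: C = A * B.
    static void mmult(data_t* A, data_t* B, data_t* C) {
        for (std::size_t i = 0; i < N; ++i) {
            for (std::size_t j = 0; j < N; ++j) {
                data_t acc = 0;
                for (std::size_t k = 0; k < N; ++k) {
                    acc += A[i * N + k] * B[k * N + j];
                }
                C[i * N + j] = acc;
            }
        }
    }
};

// Host-side usage: call compute() and compare against a hand-computed product.
inline bool run_example() {
    data_t A[N * N] = {1, 2, 3, 4};
    data_t B[N * N] = {5, 6, 7, 8};
    data_t C[N * N] = {0, 0, 0, 0};
    xmmult_sw::compute(A, B, C);
    return C[0] == 19 && C[1] == 22 && C[2] == 43 && C[3] == 50;
}
```

The host code only touches compute(); everything about how mmult() is implemented stays behind that one call.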
The class definition also contains guidance macros that refer to compute()
arguments. By providing these, you guide VSC to make specific hardware choices
during implementation. Refer to Guidance Macros for more information.
The example shown above describes the accelerator class named xmmult, which
will be compiled by VSC to have four CUs in hardware (/*NCU=*/4). The accelerator
has its PE (or kernel) code defined in the mmult() function, which is called
within compute(). The compute() function takes three arguments: A, B, and C. For
each of these arguments, two types of guidance macros are specified: memory port
connections and data access.
- Platform port connections are specified using the SYS_PORT() and SYS_PORT_PFM() macros.
  - These are typically global memory I/O connections or other AXI4 interface connections available in the hardware platform used during Vitis compilation. In this example, the first three SYS_PORT() macros connect the three A, B, and C arguments of compute() to different DDR banks (0, 1, and 2).
    Tip: The SYS_PORT() guidance macro connectivity for each argument applies to all the CU instances (NCU=4) in the xmmult accelerator, because it only specifies one memory bank.
  - The three SYS_PORT_PFM() macros apply global memory connections in two ways:
    - They only apply when the target platform name contains u50, as in the U50 Alveo card.
    - For each of the four CUs, each argument connects to a different HBM bank. This is because the syntax used for the SYS_PORT_PFM() for each argument specifies connectivity four times: SYS_PORT_PFM(u50, A, (HBM[0]:HBM[4]:HBM[8]:HBM[12]));
- Data access is specified using the ACCESS_PATTERN(), DATA_COPY(), and ZERO_COPY() macros:
  - The ACCESS_PATTERN() macro directs VSC to infer sequential or random access for data transfer. In the example, argument A is defined as sequentially accessed and therefore can be implemented with a stream connection. Argument B is accessed randomly and therefore requires a local (on-chip) memory buffer to support randomized data access from the PE mmult().
  - For arguments A and B, the DATA_COPY() macros direct VSC to infer a local memory (RAM) next to the accelerator.
  - The ZERO_COPY() macro directs VSC to create a memory-mapped AXI (M_AXI) connection for argument C.
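The difference between a single-bank port specification and a colon-separated one can be modeled in a few lines. This sketch is an illustrative model only, not how VSC is implemented. The helpers `split_banks()` and `bank_for_cu()` are hypothetical, the bank list is passed as a plain string without the surrounding parentheses, and the rule that the i-th entry belongs to the i-th CU is an assumption about ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a colon-separated bank list such as "HBM[0]:HBM[4]" into its entries.
inline std::vector<std::string> split_banks(const std::string& spec) {
    std::vector<std::string> banks;
    std::size_t start = 0;
    while (true) {
        const std::size_t colon = spec.find(':', start);
        if (colon == std::string::npos) {
            banks.push_back(spec.substr(start));
            break;
        }
        banks.push_back(spec.substr(start, colon - start));
        start = colon + 1;
    }
    return banks;
}

// Illustrative model: a single entry is shared by every CU; otherwise the
// list must have one entry per CU and entry i is used by CU i (assumed order).
inline std::string bank_for_cu(const std::string& spec, std::size_t cu,
                               std::size_t ncu) {
    assert(cu < ncu);
    const std::vector<std::string> banks = split_banks(spec);
    if (banks.size() == 1) {
        return banks[0];
    }
    assert(banks.size() == ncu);
    return banks[cu];
}
```

Under these assumptions, `bank_for_cu("DDR[0]", 3, 4)` yields `DDR[0]` for every CU index, while `bank_for_cu("HBM[0]:HBM[4]:HBM[8]:HBM[12]", 2, 4)` yields `HBM[8]`, mirroring the one-bank-for-all and one-bank-per-CU cases described above.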