The compute() API - 2024.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2024-05-30
Version: 2024.1 English

The compute() method is a special user-defined method in the accelerator class definition that represents the compute unit (CU). The arguments of compute() provide the software interface of the CU to the host-side application, and the body of compute() specifies the hardware composition of the CU. The accelerator class must define one or more additional methods, each of which can be called only from within the body of compute(). Each method called this way is a processing element (PE) in the hardware. The body of compute() can therefore be used to create a structural composition of PEs.

The compute() body has the following features:

  • It is a structural network of processing elements that:
    • Uses AXI4 standard protocols in the hardware
    • Uses start/stop synchronization per compute() job
    • Can represent a synchronized hardware pipeline
  • It can contain only calls to PE functions and local variable declarations; no other C-language constructs are allowed
  • Each PE runs in parallel with the other PEs in hardware:
    • The code of each PE function is implemented as an FSM and datapath using Vitis HLS
    • PE functions can include Vitis HLS pragmas (#pragma HLS)
  • PEs can be connected to global memory or other platform AXI4 ports
  • PEs can be connected to each other through AXI4-Stream interfaces
  • A PE can be free-running, in which case it:
    • Is unaware of transaction start/stop and has no ap_ctrl signals
    • Is always executing (data-driven), without a reset/start state
    • Can use only AXI4-Stream interfaces
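
The PE-level points in the list above can be illustrated with a small host-compilable sketch. It is not taken from the Vitis libraries: SimStream is a hypothetical stand-in built on std::queue so that the code builds with any C++ compiler, and incr_pe and run_incr_pe are hypothetical names used only for illustration. The PE behavior (adding 10 to each element) is likewise an assumption. The #pragma HLS PIPELINE directive is a standard Vitis HLS pragma; an ordinary host compiler ignores it, possibly with a warning.

```cpp
#include <queue>
#include <vector>

// Hypothetical host-only stand-in for a hardware stream (not a Vitis type).
template <typename T>
class SimStream {
public:
    void write(const T& v) { fifo_.push(v); }
    T read() {
        T v = fifo_.front();
        fifo_.pop();
        return v;
    }
    bool empty() const { return fifo_.empty(); }

private:
    std::queue<T> fifo_;
};

// Hypothetical PE: reads n values from `in`, adds 10 to each, writes to `out`.
// In hardware, a function of this shape would become an FSM and datapath.
void incr_pe(SimStream<float>& in, SimStream<float>& out, int n) {
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out.write(in.read() + 10.0f);
    }
}

// Host-side helper: feed a vector through the PE and collect the output.
std::vector<float> run_incr_pe(const std::vector<float>& input) {
    SimStream<float> in;
    SimStream<float> out;
    for (float v : input) in.write(v);
    incr_pe(in, out, static_cast<int>(input.size()));
    std::vector<float> result;
    while (!out.empty()) result.push_back(out.read());
    return result;
}
```

Calling run_incr_pe with a short vector exercises the PE in isolation, which is a convenient way to check the function body before it is composed with other PEs.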

An example compute() function definition is given below:

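// Stream type used for the internal connections between PEs.
typedef vpp::stream<float, STREAM_DEPTH> InternalStream;

void pipelined_cu::compute(float* A, float* B, float* E, int M) {
    // Internal streams connecting the three PEs.
    static InternalStream STR_X("str_X");
    static InternalStream STR_Y("str_Y");

    mmult(A, B, STR_X, M);      // global memory (A, B) -> STR_X
    incr_10(STR_X, STR_Y, M);   // STR_X -> STR_Y
    incr_20(STR_Y, E, M);       // STR_Y -> global memory (E)
}
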
  • This example is a hardware pipeline of three PEs connected by two internal AXI4-Stream interfaces: STR_X and STR_Y.
  • The mmult PE reads matrices A and B from global memory using their associated pointers and writes to the stream STR_X.
  • The incr_10 PE reads from stream STR_X and writes to stream STR_Y.
  • The incr_20 PE reads from stream STR_Y and writes to global memory E.
  • The scalar argument M is the dimension of the matrices and is provided to all the PEs in the CU.
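
The data flow described above can be tried out as a plain C++ functional model. This is only a sketch under stated assumptions: it does not use the VSC base class or vpp::stream, SimStream is a hypothetical std::queue-based stand-in, and pipelined_cu_model and run_pipeline are hypothetical names. The PE bodies are inferred from the function names (a row-major M x M matrix multiply, then adding 10, then adding 20) and are not given in the source. The model runs the PEs one after another on an unbounded queue, whereas in hardware the PEs run concurrently on streams of finite depth.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical host-only stand-in for the internal stream type.
template <typename T>
class SimStream {
public:
    void write(const T& v) { fifo_.push(v); }
    T read() {
        T v = fifo_.front();
        fifo_.pop();
        return v;
    }

private:
    std::queue<T> fifo_;
};

// Host-only functional model of the accelerator class (no VSC base class).
struct pipelined_cu_model {
    // Assumed behavior: stream out A x B for row-major M x M matrices.
    static void mmult(const float* A, const float* B,
                      SimStream<float>& out, int M) {
        for (int i = 0; i < M; ++i) {
            for (int j = 0; j < M; ++j) {
                float acc = 0.0f;
                for (int k = 0; k < M; ++k) acc += A[i * M + k] * B[k * M + j];
                out.write(acc);
            }
        }
    }

    // Assumed behavior: add 10 to each of the M*M streamed elements.
    static void incr_10(SimStream<float>& in, SimStream<float>& out, int M) {
        for (int i = 0; i < M * M; ++i) out.write(in.read() + 10.0f);
    }

    // Assumed behavior: add 20 to each element and store it in E.
    static void incr_20(SimStream<float>& in, float* E, int M) {
        for (int i = 0; i < M * M; ++i) E[i] = in.read() + 20.0f;
    }

    // Same shape as the compute() body: local declarations and PE calls only.
    static void compute(const float* A, const float* B, float* E, int M) {
        SimStream<float> STR_X;
        SimStream<float> STR_Y;
        mmult(A, B, STR_X, M);
        incr_10(STR_X, STR_Y, M);
        incr_20(STR_Y, E, M);
    }
};

// Host-side helper: run the modeled CU once and return the output matrix.
std::vector<float> run_pipeline(const std::vector<float>& A,
                                const std::vector<float>& B, int M) {
    std::vector<float> E(static_cast<std::size_t>(M) * M, 0.0f);
    pipelined_cu_model::compute(A.data(), B.data(), E.data(), M);
    return E;
}
```

With A set to the 2 x 2 identity matrix, the model returns each element of B increased by 30, which makes the effect of the two increment stages easy to check by hand.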

This model can be used to define various types of accelerator system compositions.

The compute() interface is intended to succinctly capture a hardware system while providing a simple software application interface. The following section describes the native C++ data types allowed at the compute() interface. The entire accelerator class definition, including compute() and the PE functions, can be functionally validated together with the application code, as described in Debugging and Validation.