The compute() method is a special user-defined method in the accelerator class
definition that represents the compute unit (CU). The arguments of compute()
provide the software interface of the CU to the host-side application, and the
body of compute() specifies the hardware composition of the CU. The accelerator
class must define one or more additional methods, each of which may be called
only within the body of compute(). Each method called there is a processing
element (PE) in the hardware. The body of compute() can be used to create a
structural composition of PEs.
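The relationship between compute() and the PE methods can be illustrated with a minimal host-only sketch in plain C++. Everything named here is an assumption made for illustration: scale_cu, read_scale, write_back, and FloatStream are hypothetical, a std::queue stands in for the tool's stream type, and the base class and headers a real accelerator class needs are omitted.

```cpp
#include <cassert>
#include <queue>

// Hypothetical stand-in for a stream connecting two PEs
// (not the tool's own stream type).
using FloatStream = std::queue<float>;

// Plain C++ model of an accelerator class. A real accelerator class
// would also use the tool's base class and headers, omitted here.
class scale_cu {
public:
    // Software interface of the CU: the host calls compute() with these arguments.
    static void compute(const float* in, float* out, int n);

    // Processing elements: intended to be called only from compute().
    static void read_scale(const float* in, FloatStream& s, int n);
    static void write_back(FloatStream& s, float* out, int n);
};

// PE 1: reads n values from memory, doubles them, and pushes them to the stream.
void scale_cu::read_scale(const float* in, FloatStream& s, int n) {
    for (int i = 0; i < n; ++i) {
        s.push(2.0f * in[i]);
    }
}

// PE 2: pops n values from the stream and writes them to memory.
void scale_cu::write_back(FloatStream& s, float* out, int n) {
    for (int i = 0; i < n; ++i) {
        assert(!s.empty());  // the upstream PE must have produced this value
        out[i] = s.front();
        s.pop();
    }
}

// Body of compute(): local variable declarations and PE calls only.
void scale_cu::compute(const float* in, float* out, int n) {
    FloatStream s;
    read_scale(in, s, n);
    write_back(s, out, n);
}
```

In this model the two PEs run one after the other, so the queue must hold all n values; in hardware the PEs run in parallel and the stream between them has a bounded depth.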
The compute() body has the following features:
- Is a structural network of processing elements:
  - Using AXI4 standard protocols in the hardware
  - Using a start/stop synchronization per compute() job
  - Can represent a synchronized hardware pipeline
  - Only calls to PE functions and local variable declarations, and no other
    C-language constructs, are allowed in the compute() body
- Each PE is a processing element running in parallel in hardware:
  - The code of this function is implemented as an FSM and datapath using Vitis HLS
  - PE functions can include Vitis HLS pragmas (#pragma HLS)
  - PEs can be connected to global memory or other platform AXI4 ports
  - PEs can be connected to each other through AXI4-Stream interfaces
- A PE can be free-running:
  - Unaware of transaction start/stop with no ap_ctrl signals
  - Always executing (data-driven) without a reset/start state
  - Only AXI4-Stream interfaces are allowed
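The difference between a job-synchronized PE and a free-running PE can be sketched in a host-only model. This is an illustration under stated assumptions, not tool output: add_offset, negate_free_running, and FloatStream are hypothetical names, and a std::queue stands in for an AXI4-Stream channel. The #pragma HLS PIPELINE line is a Vitis HLS directive; a regular C++ compiler ignores it, possibly with a warning.

```cpp
#include <cassert>
#include <queue>

// Hypothetical stand-in for an AXI4-Stream channel in a host-only model.
using FloatStream = std::queue<float>;

// Job-synchronized PE model: processes exactly n items per compute() job.
void add_offset(FloatStream& in, FloatStream& out, int n) {
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        assert(!in.empty());  // this model expects the data to be present already
        out.push(in.front() + 1.0f);
        in.pop();
    }
}

// Free-running PE model: stream arguments only, no job size and no start/stop.
// In hardware such a PE executes continuously; in this model each call
// simply drains whatever data is currently available.
void negate_free_running(FloatStream& in, FloatStream& out) {
    while (!in.empty()) {
        out.push(-in.front());
        in.pop();
    }
}
```

The free-running model takes no scalar arguments because, per the list above, such a PE is data-driven and may use only stream interfaces.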
An example compute() function definition is given below:
typedef vpp::stream<float, STREAM_DEPTH> InternalStream;

void pipelined_cu::compute(float* A, float* B, float* E, int M) {
    // Internal streams connecting the PEs
    static InternalStream STR_X("str_X");
    static InternalStream STR_Y("str_Y");

    // Each call below is one PE in the hardware
    mmult(A, B, STR_X, M);
    incr_10(STR_X, STR_Y, M);
    incr_20(STR_Y, E, M);
}
- This example is a hardware pipeline of three PEs connected by two internal
  AXI4-Stream interfaces: STR_X and STR_Y.
- The mmult PE reads matrices A and B from global memory using their associated
  pointers and writes to the stream STR_X.
- The incr_10 PE reads from stream STR_X and writes to stream STR_Y.
- The incr_20 PE reads from stream STR_Y and writes to global memory E.
- The scalar argument M is the dimension of the matrices and is provided to all
  the PEs in the CU.
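The data flow described above can be traced with a host-only software model. The PE bodies are not given in this section, so the ones below are assumptions: mmult is taken to be a row-major M x M matrix product, incr_10 to add 10 to each element, and incr_20 to add 20. A std::queue stands in for vpp::stream, and compute_model is a hypothetical name for a sequential version of the compute() body.

```cpp
#include <cassert>
#include <queue>

// Hypothetical host-only stand-in for vpp::stream<float, STREAM_DEPTH>.
using InternalStream = std::queue<float>;

// Assumed behavior: row-major M x M product, one result pushed per output element.
void mmult(const float* A, const float* B, InternalStream& STR_X, int M) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < M; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < M; ++k) {
                acc += A[i * M + k] * B[k * M + j];
            }
            STR_X.push(acc);
        }
    }
}

// Assumed behavior: adds 10 to each of the M*M streamed values.
void incr_10(InternalStream& STR_X, InternalStream& STR_Y, int M) {
    for (int i = 0; i < M * M; ++i) {
        assert(!STR_X.empty());
        STR_Y.push(STR_X.front() + 10.0f);
        STR_X.pop();
    }
}

// Assumed behavior: adds 20 to each streamed value and writes it to E.
void incr_20(InternalStream& STR_Y, float* E, int M) {
    for (int i = 0; i < M * M; ++i) {
        assert(!STR_Y.empty());
        E[i] = STR_Y.front() + 20.0f;
        STR_Y.pop();
    }
}

// Sequential model of the compute() body: same call order, same connections.
void compute_model(const float* A, const float* B, float* E, int M) {
    InternalStream STR_X;
    InternalStream STR_Y;
    mmult(A, B, STR_X, M);
    incr_10(STR_X, STR_Y, M);
    incr_20(STR_Y, E, M);
}
```

Under these assumptions each element of E equals the corresponding element of the product of A and B plus 30. The model runs the PEs one after another with unbounded queues, whereas the hardware PEs run in parallel over streams of bounded depth.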
This model can be used to define various types of accelerator system compositions.
The compute() interface is intended to succinctly capture a hardware system while
providing a simple software application interface. The following section describes
the native C++ data types allowed at the compute() interface. The entire
accelerator class definition, with compute() and the other PEs, can be functionally
validated along with the application code as described in Debugging and Validation.