Single-path Synchronous Pipeline - 2024.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2024-07-03
Version: 2024.1 English

The following figure shows an example of a single-path pipeline with three PEs: AccLoad, AccMul, and AccStore. AccLoad and AccStore access data stored in global memory through M_AXI channels. The accelerator class header ties the inputData and outputData ports to DDR[0], and marks both with ZERO_COPY so that these PEs access global memory directly.

Figure 1. Single-path Pipeline

The following is the code example for the figure.

typedef vpp::stream<DT> STREAM;
class Acc : public VPP_ACC<Acc, NCU>
{
    ZERO_COPY(inputData);
    ZERO_COPY(outputData);
    SYS_PORT(inputData, DDR[0]);
    SYS_PORT(outputData, DDR[0]);
public:
    static void compute(DT* inputData, DT* outputData);

    static void AccLoad(DT* inputData, STREAM& aStr,
                        STREAM& bStr, STREAM& iStr);
    static void AccMul(STREAM& aStr, STREAM& bStr,
                       STREAM& cStr);
    static void AccStore(STREAM& iStr, STREAM& cStr, 
                         DT* outputData);
};
void Acc::compute(DT* inputData, DT* outputData)
{
  static STREAM aStr, bStr, cStr, iStr;

  AccLoad (inputData, aStr, bStr, iStr);
  AccMul  (aStr, bStr, cStr);
  AccStore(iStr, cStr, outputData);
}
void Acc::AccMul(STREAM& aStr, STREAM& bStr, STREAM& cStr)
{
  for (int i = 0 ; i < N_WORDS ; i ++) {
     // Multiply one word from each input stream
     int res = aStr.read() * bStr.read();
     cStr.write(res);
  }
}

The compute() function body represents a hardware pipeline. It declares four local stream variables and makes three function calls, one for each PE:
  1. AccLoad reads inputData and writes to three streams: aStr, bStr, and iStr.
  2. AccMul reads a fixed number of words (N_WORDS) from input streams aStr and bStr and writes the products to cStr.
  3. AccStore processes the incoming data in iStr and cStr and writes the results to outputData, which is connected to the DDR[0] port.

The PEs in this system execute synchronously, so data flows through them in a pipelined fashion. Every call to compute() loads inputData and triggers all PEs for a new transaction, and requires every PE to complete execution (start and stop) exactly once. This example is a three-stage pipeline: three PEs chained in a single path. A simple C++ coding style is therefore enough to create a hardware pipeline.
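
To make the data flow of one transaction concrete, the following is a minimal software-only sketch of the three PEs. It does not use the Vitis headers: Stream is a hypothetical std::queue-based stand-in for vpp::stream<DT>, DT is assumed to be int, and N_WORDS is assumed to be 4. The listing above does not show the bodies of AccLoad and AccStore, so the ones here are invented purely for illustration; only AccMul follows the listing. The model also runs the PEs one after another, whereas the hardware PEs run concurrently.

```cpp
#include <cassert>
#include <queue>

using DT = int;             // assumed data type
constexpr int N_WORDS = 4;  // assumed word count per transaction

// Hypothetical stand-in for vpp::stream<DT>: a FIFO with read()/write().
struct Stream {
    std::queue<DT> q;
    void write(DT v) { q.push(v); }
    DT read() {
        assert(!q.empty() && "read from empty stream");
        DT v = q.front();
        q.pop();
        return v;
    }
};

// Illustrative AccLoad body (not from the source): splits interleaved
// input pairs into aStr and bStr, and sends the word index on iStr.
void AccLoad(const DT* inputData, Stream& aStr, Stream& bStr, Stream& iStr) {
    for (int i = 0; i < N_WORDS; i++) {
        aStr.write(inputData[2 * i]);
        bStr.write(inputData[2 * i + 1]);
        iStr.write(i);
    }
}

// AccMul as in the listing: one product per pair of input words.
void AccMul(Stream& aStr, Stream& bStr, Stream& cStr) {
    for (int i = 0; i < N_WORDS; i++) {
        DT res = aStr.read() * bStr.read();
        cStr.write(res);
    }
}

// Illustrative AccStore body (not from the source): uses the index from
// iStr to place each product from cStr into outputData.
void AccStore(Stream& iStr, Stream& cStr, DT* outputData) {
    for (int i = 0; i < N_WORDS; i++) {
        DT idx = iStr.read();
        outputData[idx] = cStr.read();
    }
}

// One transaction: every PE starts and finishes exactly once.
void compute(const DT* inputData, DT* outputData) {
    Stream aStr, bStr, cStr, iStr;
    AccLoad(inputData, aStr, bStr, iStr);
    AccMul(aStr, bStr, cStr);
    AccStore(iStr, cStr, outputData);
}
```

Calling compute() with the interleaved input {1, 2, 3, 4, 5, 6, 7, 8} fills a four-word output buffer with the products of each pair. Every stream is drained by the end of the call, which mirrors the rule that each PE completes exactly once per transaction.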

The VPP_ACC class allows replication of such a pipeline through the NCU parameter. If NCU is greater than 1, the hardware contains that many replicated pipelines. Calls to compute() are automatically loaded into the next available pipeline slot. The application layer therefore remains simple and easy to maintain, while data is run through multiple pipelines in the hardware automatically.
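
The effect of NCU on a sequence of compute() calls can be illustrated with a toy scheduling model. This is a sketch under stated assumptions, not the runtime's actual policy, which this section does not describe: all calls are issued at time 0, every transaction takes the same fixed duration, and each call goes to whichever pipeline becomes free first. The names schedule_calls and Schedule are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Result of the toy model: which pipeline each call ran on, and when
// the last call finished.
struct Schedule {
    std::vector<int> cu_of_call;
    int finish_time;
};

// Assign num_calls transactions to ncu replicated pipelines, always
// picking the pipeline that becomes free earliest (ties go to the
// lowest index). Assumes a fixed duration per transaction.
Schedule schedule_calls(int ncu, int num_calls, int duration) {
    assert(ncu > 0);
    std::vector<int> free_at(ncu, 0);  // time at which each pipeline is free
    Schedule s{{}, 0};
    for (int c = 0; c < num_calls; c++) {
        auto it = std::min_element(free_at.begin(), free_at.end());
        s.cu_of_call.push_back(static_cast<int>(it - free_at.begin()));
        *it += duration;  // the chosen pipeline is busy for one transaction
    }
    s.finish_time = *std::max_element(free_at.begin(), free_at.end());
    return s;
}
```

In this model, five calls of 10 time units each finish at time 50 on one pipeline and at time 30 on two, with the calls alternating between the two pipelines. The caller's loop over compute() is the same in both cases, which is the point the paragraph above makes about the application layer.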