Streaming Network of PEs - 2022.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2022-05-25
Version: 2022.1 English

Using stream variables between functions in the compute() scope, you can design an arbitrary network of processing elements (PEs) that stream data to one another. The body of the compute() method describes a structural composition of PEs rather than a sequence of procedure calls, so its semantics differ from the procedural semantics of the C language. VSC supports software-emulation-based validation of the compute() body semantics. An example of such a network is a design developed for Ethereum hashing (Ethash), a popular algorithm used in cryptocurrency mining. The VSC code for this design is available in the Ethash example on GitHub.
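To make the structural reading of compute() more concrete, the following is a minimal sketch in plain C++ of two PEs connected by one stream. It does not use the VSC headers or API. The names Stream, producer, consumer, and compute_model are hypothetical, and Stream is only a std::queue wrapper standing in for a stream variable.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-in for a stream variable; NOT the VSC stream type.
template <typename T>
class Stream {
public:
    void write(const T& v) { q_.push(v); }
    T read() {
        assert(!q_.empty());  // reading an empty stream is a modeling error here
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// PE 1 (hypothetical): writes each input word, doubled, to its output stream.
void producer(const std::vector<uint32_t>& in, Stream<uint32_t>& out) {
    for (uint32_t v : in) out.write(v * 2u);
}

// PE 2 (hypothetical): drains its input stream and accumulates a sum.
void consumer(Stream<uint32_t>& in, uint32_t& sum) {
    sum = 0;
    while (!in.empty()) sum += in.read();
}

// Models a compute() body: the stream declared in this scope is the
// connection between the two function calls (PEs).
uint32_t compute_model(const std::vector<uint32_t>& data) {
    Stream<uint32_t> s;  // wire between producer and consumer
    uint32_t sum = 0;
    producer(data, s);
    consumer(s, sum);
    return sum;
}
```

In this sequential model the producer finishes before the consumer starts. Under the structural semantics described above, the two calls are better read as two PEs that exist side by side and exchange data through the stream.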

Figure 1. Streaming Network

The figure shows the system architecture of this design: a pipelined network of PEs connected by AXI4-Stream interfaces. Four PEs, nodeLookUp-1 to nodeLookUp-4, read from global memory, and each of them also reads from an input stream produced by the PE prefnv. The AXI4-Stream outputs of these four PEs are consumed by the PE postfnv.

Notice that there is an AXI4-Stream feedback loop from postfnv in fsk_passback and back to prefnv. This loop is expected to converge after iterating several times over data flowing through the AXI4-Stream. The entire system of PEs will deterministically start and stop execution for each compute() call.
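The topology described above (fan-out from prefnv, four lookup PEs reading global memory, fan-in to postfnv, and a feedback path) can be approximated with a small software model. This sketch is plain C++ and is not taken from the Ethash example: the PE names are borrowed from the figure, but their bodies are placeholder arithmetic rather than real Ethash logic. The Stream alias, pop, run_network, and the fixed round count are assumptions made for illustration. The fsk_passback stage is collapsed into a single feedback queue.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-in for an AXI4-Stream connection; NOT a VSC type.
using Stream = std::queue<uint32_t>;

constexpr std::size_t kLanes = 4;  // four nodeLookUp PEs, as in the figure

// Helper: read one word from a stream (hypothetical).
static uint32_t pop(Stream& s) {
    assert(!s.empty());
    uint32_t v = s.front();
    s.pop();
    return v;
}

// prefnv: reads the fed-back state and sends one memory index per lane.
// The index formula is a placeholder, not the Ethash computation.
void prefnv(Stream& feedback, std::array<Stream, kLanes>& to_lookup,
            std::size_t mem_size) {
    uint32_t state = pop(feedback);
    for (std::size_t lane = 0; lane < kLanes; ++lane)
        to_lookup[lane].push(static_cast<uint32_t>((state + lane) % mem_size));
}

// nodeLookUp: reads an index from its input stream, fetches the word from
// "global memory" (a vector here), and forwards it on its output stream.
void nodeLookUp(const std::vector<uint32_t>& mem, Stream& in, Stream& out) {
    out.push(mem[pop(in)]);
}

// postfnv: combines the four lookup results (placeholder: a sum) and
// writes the new state onto the feedback stream.
void postfnv(std::array<Stream, kLanes>& from_lookup, Stream& feedback) {
    uint32_t acc = 0;
    for (Stream& s : from_lookup) acc += pop(s);
    feedback.push(acc);
}

// Runs the network for a fixed number of feedback iterations and returns
// the final state. The fixed round count is an assumption of this model.
uint32_t run_network(const std::vector<uint32_t>& mem, uint32_t seed, int rounds) {
    assert(!mem.empty());
    Stream feedback;
    std::array<Stream, kLanes> idx, val;
    feedback.push(seed);  // prime the loop with the initial state
    for (int r = 0; r < rounds; ++r) {
        prefnv(feedback, idx, mem.size());
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            nodeLookUp(mem, idx[lane], val[lane]);
        postfnv(val, feedback);
    }
    return pop(feedback);
}
```

Each round here corresponds to one trip around the feedback loop. The model runs the PEs one after another, whereas in the described architecture they operate as a pipeline on data flowing through the streams.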

Such streaming architectures typically use FPGA resources efficiently and, in particular, need fewer routing resources than designs that rely on AXI4 memory-mapped (M_AXI) interfaces. As a result, they have the potential to achieve a higher clock frequency and better accelerator performance.

This VSC model is written entirely in C++ and it captures the network with function calls in the compute() scope. The C++ model can be functionally validated in VSC using software emulation, without needing to compile any hardware. This enables early validation of the original design intent in the Vitis tools.

Tip: A less efficient way to compose a system is to create multiple accelerators (different classes derived from VPP_ACC) and compose a pipeline in the application layer, as described in Multi-Accelerator Pipeline Composition.