Multi-Accelerator Pipeline Composition - 2024.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID
UG1393
Release Date
2024-07-03
Version
2024.1 English

In VSC, you can also create multiple accelerators with different functionality in a single .xclbin and runtime environment. With such composition, you can create a pipeline in which two or more VSC accelerators operate on different data sets in a pipelined fashion, as shown in the sysc_compose example in Supported Platforms and Startup Examples. The following sections describe the two possible use models.

ACC1-CPU-ACC2 Pipeline

This model defines a pipeline of tasks where the first accelerator computes an intermediate result which is then processed by a host application, and then further processed by a second accelerator. The output buffer of the first accelerator needs to be modified (modify while copying) and then passed to the second accelerator. Example code is provided below.

auto inBP    = my_acc1::create_bufpool(vpp::input);
auto tmpoBP  = my_acc1::create_bufpool(vpp::output);
auto tmpiBP  = my_acc2::create_bufpool(vpp::input);
auto outBP   = my_acc2::create_bufpool(vpp::output);
my_acc1::send_while(
    [=]()->bool {
        int* in = my_acc1::alloc_buf<int>(inBP, inSz);
        int* tmp = my_acc1::alloc_buf<int>(tmpoBP, tmpSz);
        my_acc1::compute(in, tmp, ...);
        ...;
        return ...;
    });
my_acc2::send_while(
    [=]()->bool {
        int* tmp2 = my_acc2::alloc_buf<int>(tmpiBP, tmpSz);
        bool cond = my_acc1::receive_one_in_order( // or receive_one_asap
            [=]() {
                int* tmp1 = my_acc1::get_buf<int>(tmpoBP);
                ...; // tmp1 -> copy and modify -> tmp2
            });
        if (!cond) return false;
        int* out = my_acc2::alloc_buf<int>(outBP, outSz);
        my_acc2::compute(tmp2, out, ...);
        ...;
        return true;
    });
my_acc2::receive_all_in_order(
    [=]() {
        int* out = my_acc2::get_buf<int>(outBP);
        ...;
    });
my_acc1::join();
my_acc2::join();

This code uses the dedicated receive_one_in_order API within the scope of the send_while loop of the second accelerator, my_acc2. The receive_one_in_order and receive_one_asap APIs require a user-defined lambda function body, as discussed in VPP_ACC Class API.

In this case, my_acc1::receive_one_in_order (or receive_one_asap) waits for the next job to finish, in order (or as soon as any job completes), and then executes the lambda function body. Because it is called from the send_while loop of the second accelerator, my_acc2 computes in lock-step with the results generated by the first accelerator, my_acc1. The API returns a boolean value. Consistent with the if (!cond) return false; check in the example, the value is false once the send_while loop of my_acc1 has exited and there are no more jobs to be received.
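
The receive-then-compute control flow described above can be pictured with a small host-only model. This is a sketch in plain C++, not the VSC API: the CompletedJobs class, its push, close, and receive_one members, and run_lockstep_model are hypothetical names introduced only for illustration. The model is single-threaded, so it does not reproduce the waiting behavior of the real runtime. It only shows the in-order delivery and the convention that the receive call reports false when nothing more will arrive.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical stand-in for the finished jobs of the first accelerator.
// It is NOT part of VSC; it only mimics the receive-one contract.
class CompletedJobs {
public:
    // Record one finished job result (plays the role of a completed compute()).
    void push(int result) { results_.push(result); }

    // Mark that the sender loop has exited and no further jobs will be added.
    void close() { closed_ = true; }

    // Run body(result) on the oldest finished job and return true.
    // Return false when the sender has exited and nothing is left.
    bool receive_one(const std::function<void(int)>& body) {
        if (results_.empty()) {
            // The real runtime would wait here while the sender is still active.
            // This single-threaded model only supports the "sender finished" case.
            assert(closed_);
            return false;
        }
        int result = results_.front();
        results_.pop();
        body(result);
        return true;
    }

private:
    std::queue<int> results_;
    bool closed_ = false;
};

// Model of the ACC1-CPU-ACC2 flow: stage 1 squares 0..n-1, the host step
// adds 1 while copying, and stage 2 doubles the modified value.
std::vector<int> run_lockstep_model(int n) {
    CompletedJobs acc1_done;
    for (int i = 0; i < n; ++i) acc1_done.push(i * i);  // stage 1 results
    acc1_done.close();                                   // stage 1 sender exited

    std::vector<int> out;
    for (;;) {
        int tmp2 = 0;
        // Host step: copy and modify tmp1 into tmp2 inside the receive body.
        bool cond = acc1_done.receive_one([&tmp2](int tmp1) { tmp2 = tmp1 + 1; });
        if (!cond) break;         // mirrors "if (!cond) return false;"
        out.push_back(tmp2 * 2);  // stage 2 compute on the modified buffer
    }
    return out;
}
```

In this model, run_lockstep_model(3) returns {2, 4, 10}: each stage 2 job consumes exactly one stage 1 result, in the order the results were produced.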

ACC1-ACC2 Pipeline

This model defines a pipeline of tasks where an output buffer of the first accelerator can be used directly (without any host CPU synchronization) by the second accelerator. As described in VPP_ACC Class API, the transfer_buf() API takes a buffer pool object and returns the buffer corresponding to the correct iteration. Using this API allows seamless transfer of the result buffer from the first accelerator to the second without copying data. The VSC runtime automatically manages the reuse of such transferred buffers across accelerators and job iterations.

The following is a code example:

auto inBP    = my_acc1::create_bufpool(vpp::input);
auto tmpBP   = my_acc1::create_bufpool(vpp::remote);
auto outBP   = my_acc2::create_bufpool(vpp::output);
my_acc1::send_while(
    [=]()->bool {
        int* in = my_acc1::alloc_buf<int>(inBP, inSz);
        int* tmp = my_acc1::alloc_buf<int>(tmpBP, tmpSz);
        my_acc1::compute(in, tmp, ...);
        ...;
        return ...;
    });
my_acc2::send_while(
    [=]()->bool {
        int* tmp;
        bool cond = my_acc1::receive_one_in_order( // or receive_one_asap
            [=, &tmp]() {
                tmp = my_acc1::transfer_buf<int>(tmpBP);
            });
        if (!cond) return false;
        int* out = my_acc2::alloc_buf<int>(outBP, outSz);
        my_acc2::compute(tmp, out, ...);
        ...;
        return true;
    });
my_acc2::receive_all_in_order(
    [=]() {
        int* out = my_acc2::get_buf<int>(outBP);
        ...;
    });
my_acc1::join();
my_acc2::join();
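
The idea of handing a buffer from one stage to the next without a copy, and reusing it for later jobs, can be sketched with a plain C++ model. This is only an illustration under stated assumptions and not how the VSC runtime is implemented: RemotePool, its alloc, transfer, and release members, ModelResult, and run_transfer_model are hypothetical names, and the buffers here live in host memory, whereas a vpp::remote pool is managed by the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <vector>

// Hypothetical buffer pool shared by two pipeline stages (not the VSC bufpool).
class RemotePool {
public:
    explicit RemotePool(std::size_t buf_len) : buf_len_(buf_len) {}

    // Stage 1: take a free buffer, or create one, and mark it as in flight.
    int* alloc() {
        int* buf = nullptr;
        if (!free_.empty()) {
            buf = free_.back();
            free_.pop_back();
        } else {
            storage_.push_back(std::make_unique<int[]>(buf_len_));
            buf = storage_.back().get();
        }
        in_flight_.push_back(buf);
        return buf;
    }

    // Stage 2: take over the oldest in-flight buffer.
    // Only the pointer is handed over; the data is not copied.
    int* transfer() {
        assert(!in_flight_.empty());
        int* buf = in_flight_.front();
        in_flight_.pop_front();
        return buf;
    }

    // Called when stage 2 is done with the buffer so a later job can reuse it.
    void release(int* buf) { free_.push_back(buf); }

    std::size_t buffers_created() const { return storage_.size(); }

private:
    std::size_t buf_len_;
    std::vector<std::unique_ptr<int[]>> storage_;  // owns every buffer
    std::deque<int*> in_flight_;                   // written by stage 1, not yet consumed
    std::vector<int*> free_;                       // available for reuse
};

struct ModelResult {
    bool same_pointer;            // stage 2 saw the exact buffer stage 1 wrote
    std::size_t buffers_created;  // how many buffers the pool had to create
    std::vector<int> outputs;     // stage 2 results
};

// Runs `jobs` iterations: stage 1 writes i * 10, stage 2 reads it and adds 1.
ModelResult run_transfer_model(int jobs) {
    RemotePool pool(4);
    ModelResult r{true, 0, {}};
    for (int i = 0; i < jobs; ++i) {
        int* tmp = pool.alloc();      // stage 1 output buffer
        tmp[0] = i * 10;
        int* seen = pool.transfer();  // stage 2 input: same memory, no copy
        r.same_pointer = r.same_pointer && (seen == tmp);
        r.outputs.push_back(seen[0] + 1);
        pool.release(seen);           // buffer becomes reusable
    }
    r.buffers_created = pool.buffers_created();
    return r;
}
```

With three jobs, this model produces the outputs {1, 11, 21} while creating a single buffer, because each released buffer is picked up again by the next job.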