In VSC, you can also create multiple accelerators with different functionality in
a single .xclbin and runtime environment. With such
composition, you can create a pipeline in which two or more VSC accelerators operate on
different data sets in a pipelined fashion, as shown in the sysc_compose
example in Supported Platforms and Startup Examples.
There are two possible use models described in the following sections.
ACC1-CPU-ACC2 Pipeline
This model defines a pipeline of tasks in which the first accelerator computes an intermediate result, the host application processes that result, and a second accelerator then processes it further. The host copies the output buffer of the first accelerator, modifying it during the copy, and passes the modified copy to the second accelerator. Example code is provided below.
auto inBP = my_acc1::create_bufpool(vpp::input);
auto tmpoBP = my_acc1::create_bufpool(vpp::output);
auto tmpiBP = my_acc2::create_bufpool(vpp::input);
auto outBP = my_acc2::create_bufpool(vpp::output);

// Send jobs to the first accelerator.
my_acc1::send_while(
    [=]()->bool {
        int* in = my_acc1::alloc_buf<int>(inBP, inSz);
        int* tmp = my_acc1::alloc_buf<int>(tmpoBP, tmpSz);
        my_acc1::compute(in, tmp, ...);
        ...;
        return ...;
    });

// Send jobs to the second accelerator, one per result received from my_acc1.
my_acc2::send_while(
    [=]()->bool {
        int* tmp2 = my_acc2::alloc_buf<int>(tmpiBP, tmpSz);
        bool cond = my_acc1::receive_one_in_order( // or receive_one_asap
            [=]() {
                int* tmp1 = my_acc1::get_buf<int>(tmpoBP);
                ...; // tmp1 -> copy and modify -> tmp2
            });
        if (!cond) return false; // no more results from my_acc1
        int* out = my_acc2::alloc_buf<int>(outBP, outSz);
        my_acc2::compute(tmp2, out, ...);
        ...;
        return true;
    });

// Receive the final results from the second accelerator.
my_acc2::receive_all_in_order(
    [=]() {
        int* out = my_acc2::get_buf<int>(outBP);
        ...;
    });

my_acc1::join();
my_acc2::join();
This code uses the dedicated receive_one_in_order API within the scope of the
send_while loop of the second accelerator, my_acc2. The receive_one_in_order and
receive_one_asap APIs require a user-defined lambda function body, as discussed
in VPP_ACC Class API. In this case, my_acc1::receive_one_in_order (or
receive_one_asap) waits for the next job in-order (or asap) to finish, and then
executes the lambda function body. Because it is called from the send_while loop
of the second accelerator, my_acc2 computes in lock-step with the results
generated by the first accelerator, my_acc1. This API returns a boolean value.
As the check if (!cond) return false; in the example implies, the value is false
once the send_while loop of my_acc1 has exited and there are no more jobs to be
received.
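The receive protocol described above can be illustrated with a small host-only model. This is a sketch under stated assumptions, not the VSC API: JobStream, Job, Order, and run_second_stage are hypothetical names, job completion is simulated with a finish_time field instead of real accelerator execution, and the blocking wait is omitted because every job is already finished. It assumes that in-order receipt follows submission order, that asap receipt follows completion order, and that the call returns false once no jobs remain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of the jobs finished by a first accelerator (not VSC API).
struct Job {
    int id;           // submission order
    int finish_time;  // simulated completion tick
    int result;       // intermediate result produced by the job
};

enum class Order { InOrder, Asap };

class JobStream {
public:
    // `jobs` must be given in submission order.
    JobStream(std::vector<Job> jobs, Order order) : jobs_(std::move(jobs)) {
        if (order == Order::Asap) {
            // Assumed asap semantics: deliver jobs in completion order.
            std::stable_sort(jobs_.begin(), jobs_.end(),
                             [](const Job& a, const Job& b) {
                                 return a.finish_time < b.finish_time;
                             });
        }
    }

    // Runs `body` on the next job and returns true, or returns false
    // when no jobs are left to receive.
    bool receive_one(const std::function<void(const Job&)>& body) {
        assert(body && "a lambda body is required");
        if (next_ == jobs_.size()) return false;
        body(jobs_[next_++]);
        return true;
    }

private:
    std::vector<Job> jobs_;
    std::size_t next_ = 0;
};

// Mirrors the shape of the second send loop: receive one intermediate
// result, modify it on the host (+1), then "compute" on it (*2).
std::vector<int> run_second_stage(std::vector<Job> jobs, Order order) {
    JobStream stream(std::move(jobs), order);
    std::vector<int> outputs;
    for (;;) {
        int tmp2 = 0;
        bool cond = stream.receive_one(
            [&](const Job& j) { tmp2 = j.result + 1; });  // copy and modify
        if (!cond) break;              // no more jobs from the first stage
        outputs.push_back(tmp2 * 2);   // stand-in for the second compute
    }
    return outputs;
}
```

With three jobs submitted in the order 0, 1, 2 that finish at ticks 30, 10, and 20, the in-order policy processes them as 0, 1, 2, while the asap policy processes them as 1, 2, 0. In both cases the second stage issues exactly one job per received result.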
ACC1-ACC2 Pipeline
This model defines a pipeline of tasks where an output buffer of the first
accelerator can be used directly (without any host CPU synchronization) by the
second accelerator. As described in VPP_ACC Class API, the transfer_buf()
API takes a buffer pool
object and returns the buffer corresponding to the correct iteration. Using this API
allows seamless transfer of the result buffer from the first to the second
accelerator without needing to copy data. The VSC runtime automatically manages the
re-use of such transferred buffers across accelerators and job iterations.
The following is a code example:
auto inBP = my_acc1::create_bufpool(vpp::input);
auto tmpBP = my_acc1::create_bufpool(vpp::remote);
auto outBP = my_acc2::create_bufpool(vpp::output);

// Send jobs to the first accelerator.
my_acc1::send_while(
    [=]()->bool {
        int* in = my_acc1::alloc_buf<int>(inBP, inSz);
        int* tmp = my_acc1::alloc_buf<int>(tmpBP, tmpSz);
        my_acc1::compute(in, tmp, ...);
        ...;
        return ...;
    });

// Send jobs to the second accelerator, reusing the buffers of my_acc1.
my_acc2::send_while(
    [=]()->bool {
        int* tmp;
        bool cond = my_acc1::receive_one_in_order( // or receive_one_asap
            [=, &tmp]() {
                // Take over the result buffer without copying it.
                tmp = my_acc1::transfer_buf<int>(tmpBP);
            });
        if (!cond) return false; // no more results from my_acc1
        int* out = my_acc2::alloc_buf<int>(outBP, outSz);
        my_acc2::compute(tmp, out, ...);
        ...;
        return true;
    });

// Receive the final results from the second accelerator.
my_acc2::receive_all_in_order(
    [=]() {
        int* out = my_acc2::get_buf<int>(outBP);
        ...;
    });

my_acc1::join();
my_acc2::join();
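The idea of handing a result buffer to the next stage without copying, and reusing it across iterations, can be sketched in plain C++. This is only an illustrative model under stated assumptions: SharedPool, PipelineStats, and run_zero_copy_pipeline are hypothetical names, the buffers live in host memory, and the release step is explicit here, whereas the text above states that the VSC runtime manages buffer reuse automatically.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical pool of fixed-size buffers shared by two stages (not VSC API).
class SharedPool {
public:
    explicit SharedPool(std::size_t len) : len_(len) {}

    // Producer side: get a buffer, reusing a released one when possible.
    int* alloc() {
        int* buf = nullptr;
        if (!free_.empty()) {
            buf = free_.back();
            free_.pop_back();
        } else {
            storage_.push_back(std::make_unique<int[]>(len_));
            buf = storage_.back().get();
        }
        in_flight_.push_back(buf);
        return buf;
    }

    // Consumer side: take over the oldest produced buffer. Only the
    // pointer changes hands; the data is not copied.
    int* transfer() {
        assert(!in_flight_.empty() && "no produced buffer to transfer");
        int* buf = in_flight_.front();
        in_flight_.pop_front();
        return buf;
    }

    // Consumer side: mark the buffer as reusable for a later iteration.
    void release(int* buf) { free_.push_back(buf); }

    std::size_t allocations() const { return storage_.size(); }

private:
    std::size_t len_;
    std::vector<std::unique_ptr<int[]>> storage_;  // owns every buffer
    std::deque<int*> in_flight_;                   // produced, not yet consumed
    std::vector<int*> free_;                       // ready for reuse
};

struct PipelineStats {
    std::vector<int> sums;          // one stage-2 result per job
    std::size_t buffers_allocated;  // distinct buffers created by the pool
    bool zero_copy;                 // stage 2 always saw the stage-1 pointer
};

// Stage 1 fills the buffer with the job index; stage 2 sums its elements.
PipelineStats run_zero_copy_pipeline(int jobs, std::size_t len) {
    SharedPool pool(len);
    PipelineStats stats{{}, 0, true};
    for (int i = 0; i < jobs; ++i) {
        int* tmp = pool.alloc();
        std::fill(tmp, tmp + len, i);        // stand-in for the first compute
        int* same = pool.transfer();         // hand over without copying
        stats.zero_copy = stats.zero_copy && (same == tmp);
        stats.sums.push_back(std::accumulate(same, same + len, 0));
        pool.release(same);
    }
    stats.buffers_allocated = pool.allocations();
    return stats;
}
```

Because each buffer is released before the next job starts, this model needs only one allocation no matter how many jobs run. A pipeline with overlapping jobs would keep several buffers in flight at once.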