In some situations, a kernel does not consume a window's worth of data on every invocation, or does not produce a window's worth of data on every invocation. In these cases, you can control the buffer synchronization yourself by declaring the kernel port to be async, as shown in the following.
connect< window<128, 32> > net1 (in, async(first.in[0]));
connect< window<128, 32> > net2 (async(first.out[0]), out);
This declaration tells the compiler to omit synchronization of the window buffer upon entry to the kernel. You must call the window synchronization APIs inside the kernel code before accessing the window with the read/write APIs, as shown in the following.
void super_kernel(input_window<int32> * data, output_window<int32> * result) {
    ...
    window_acquire(data);       // acquire input window unconditionally inside the kernel
    if (<somecondition>) {
        window_acquire(result); // acquire output window conditionally
    }
    ...                         // do some computation with "data" and "result"
    window_release(data);       // release input window inside the kernel
    if (<somecondition>) {
        window_release(result); // release output window conditionally
    }
    ...
}
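The pairing of conditional acquires and releases can be checked on a host machine with a small model. The following is a minimal sketch only: ModelWindow, model_acquire, model_release, DecimatorState, and model_kernel are hypothetical names introduced for illustration and are not part of the AI Engine API. The sketch assumes a kernel that consumes an input window on every invocation but produces an output window only on every Nth invocation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for a window port. It only tracks whether
// the window is currently held; it is NOT the AI Engine window type.
struct ModelWindow {
    std::vector<int32_t> data;
    bool held = false;
    int acquires = 0;
    int releases = 0;
};

// Hypothetical stand-ins for the acquire and release calls.
void model_acquire(ModelWindow& w) {
    assert(!w.held && "window acquired twice without a release");
    w.held = true;
    ++w.acquires;
}

void model_release(ModelWindow& w) {
    assert(w.held && "window released without a matching acquire");
    w.held = false;
    ++w.releases;
}

// State that the kernel keeps between invocations.
struct DecimatorState {
    int calls = 0;
    int64_t sum = 0;
};

// One kernel invocation. The input window is acquired unconditionally; the
// output window is acquired and released only on every `factor`-th call.
void model_kernel(ModelWindow& in, ModelWindow& out,
                  DecimatorState& st, int factor) {
    ++st.calls;
    const bool produce = (st.calls % factor == 0);

    model_acquire(in);
    if (produce) {
        model_acquire(out);
    }

    for (int32_t v : in.data) {
        st.sum += v;  // accumulate across invocations
    }
    if (produce) {
        out.data.assign(1, static_cast<int32_t>(st.sum));
        st.sum = 0;
    }

    model_release(in);
    if (produce) {
        model_release(out);  // same condition as the acquire
    }
}
```

Calling model_kernel eight times with a factor of 4 acquires and releases the input window eight times and the output window twice. Using the same condition value for the acquire and the release within one invocation is what keeps the two calls paired.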
The window_acquire API performs the appropriate synchronization and initialization to ensure that the window object is available for read or write. The API keeps track of the appropriate buffer pointers and locks to be acquired internally, even if the window is shared across AI Engine processors and can be double-buffered.
This API can be called unconditionally or conditionally under dynamic control, and is potentially a blocking operation. It is your responsibility to ensure that the corresponding window_release API is executed some time later (possibly even in a subsequent kernel call) to release the lock associated with that window object. Incorrect synchronization can lead to a deadlock in your code.
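To see why a missing release can stall the other side, it can help to model a double-buffered window as two buffers, each with its own lock state. The following is a rough sketch under stated assumptions, not a description of the AI Engine lock hardware or of how window_acquire is implemented. PingPongModel and its member functions are hypothetical names, and the acquire calls are written as non-blocking "try" operations that return -1 at the point where a blocking acquire would wait.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of a double-buffered (ping-pong) window shared by one
// producer and one consumer. Illustration only; not the AI Engine API.
class PingPongModel {
public:
    // Producer side: returns the index of the buffer acquired for writing,
    // or -1 where a blocking acquire would have to wait.
    int try_acquire_write() {
        if (state_[wr_] != State::Empty) {
            return -1;
        }
        state_[wr_] = State::Writing;
        return wr_;
    }

    // Producer side: hands the written buffer to the consumer and moves on
    // to the other buffer.
    void release_write() {
        assert(state_[wr_] == State::Writing);
        state_[wr_] = State::Full;
        wr_ ^= 1;
    }

    // Consumer side: returns the index of the buffer acquired for reading,
    // or -1 where a blocking acquire would have to wait.
    int try_acquire_read() {
        if (state_[rd_] != State::Full) {
            return -1;
        }
        state_[rd_] = State::Reading;
        return rd_;
    }

    // Consumer side: returns the buffer to the producer and moves on to the
    // other buffer.
    void release_read() {
        assert(state_[rd_] == State::Reading);
        state_[rd_] = State::Empty;
        rd_ ^= 1;
    }

private:
    enum class State { Empty, Writing, Full, Reading };
    std::array<State, 2> state_{{State::Empty, State::Empty}};
    int wr_ = 0;  // buffer the producer uses next
    int rd_ = 0;  // buffer the consumer uses next
};
```

In this model the producer can fill both buffers, after which its next acquire cannot succeed until the consumer has both acquired and released a buffer. If the consumer acquires a buffer and never releases it, the producer waits indefinitely, which is the deadlock that incorrect synchronization can cause.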