An AI Engine kernel can consume and produce blocks of data. It can also read and write data streams sample by sample. The data access APIs for both cases are described in the following sections.
Buffer-Based Access
From the kernel perspective, an incoming block of data is called an input buffer. Input buffers are defined by the type of data contained within that buffer. The following example shows a declaration of an input buffer carrying complex integers where the real and imaginary parts are both 16 bits wide.
input_buffer<cint16> myInputBuffer;
From the kernel perspective, an outgoing block of data is called an output buffer. Again, these are defined by type. The following example shows a declaration of an output buffer carrying 32-bit integers.
output_buffer<int32> myOutputBuffer;
A kernel reads from its input buffers and writes to its output buffers. By default, the synchronization required to wait for an input buffer of data or provide an empty output buffer is performed before entering the kernel. There is no synchronization required within the kernel to read or write the individual elements of data. In other words, the kernel does not execute unless there is a full buffer available.
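The firing rule described above can be sketched as a small host-side model. This is plain C++ and does not use the AI Engine headers; the function name run_per_full_buffer and the use of int samples are assumptions made only for illustration. The kernel body is invoked once per complete buffer and never waits on an individual sample.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side model, not the AI Engine API.
// Invokes `kernel` once for every complete buffer found in `samples`
// and returns the number of invocations.
std::size_t run_per_full_buffer(
    const std::vector<int>& samples,
    std::size_t buffer_size,
    const std::function<void(const std::vector<int>&)>& kernel) {
    assert(buffer_size > 0);
    std::size_t invocations = 0;
    // Advance one full buffer at a time; stop when fewer than
    // buffer_size samples remain.
    for (std::size_t start = 0; start + buffer_size <= samples.size();
         start += buffer_size) {
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = first + static_cast<std::ptrdiff_t>(buffer_size);
        std::vector<int> buffer(first, last);
        // The buffer is already full here, so the kernel body needs no
        // per-element synchronization.
        kernel(buffer);
        ++invocations;
    }
    // A trailing partial buffer does not trigger an invocation.
    return invocations;
}
```

In this model, 10 samples with a buffer size of 4 produce two invocations, and the last 2 samples stay pending until a full buffer is available.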
If a kernel does not consume a full buffer of data on every invocation, or does not produce a full buffer of data on every invocation, you can control the buffer synchronization by configuring the kernel port to be async in the Block Parameters dialog box of the kernel.
It is also possible for one block of input to overlap the next. Algorithms such as filters typically require this overlap, which is referred to as the buffer margin. If you specify a buffer margin, the kernel has access to a total of buffer_size + margin_size samples on every invocation.
The following example illustrates buffer margin behavior.
Here, the input is a vector of size 8. It feeds the kernel block, which has a buffer size of 6 and a buffer margin of 2, so the kernel must have access to a total of 8 samples at every invocation. During the first simulation cycle, two zeros are prepended to the first 6 new values from the input data. In each subsequent simulation cycle, the kernel receives 8 values: 6 new values and 2 values from the previous cycle.
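The margin behavior can be reproduced with a minimal sketch in plain C++. It is a model only and does not use the AI Engine headers; the function name buffers_with_margin is hypothetical, and the sketch assumes that the margin holds the most recent samples seen by the previous invocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not the AI Engine API.
// Returns, for each kernel invocation, the margin_size + buffer_size
// samples that the kernel can access.
std::vector<std::vector<int>> buffers_with_margin(
    const std::vector<int>& samples,
    std::size_t buffer_size,
    std::size_t margin_size) {
    assert(buffer_size > 0);
    std::vector<std::vector<int>> views;
    // Before the first invocation the margin is filled with zeros.
    std::vector<int> margin(margin_size, 0);
    for (std::size_t start = 0; start + buffer_size <= samples.size();
         start += buffer_size) {
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = first + static_cast<std::ptrdiff_t>(buffer_size);
        // The kernel sees the margin followed by the new samples.
        std::vector<int> view(margin);
        view.insert(view.end(), first, last);
        // Assumption: the next margin is the tail of the current view.
        margin.assign(view.end() - static_cast<std::ptrdiff_t>(margin_size),
                      view.end());
        views.push_back(view);
    }
    return views;
}
```

With the samples 1 through 12, a buffer size of 6, and a margin of 2, the first view is 0 0 1 2 3 4 5 6 and the second is 5 6 7 8 9 10 11 12.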
Stream-Based Access
Kernels can access data streams in a sample-by-sample fashion using data access APIs. With a stream-based access model, kernels receive an input stream or an output stream of typed data as an argument. Each access to these streams is synchronized: reads stall if no data is available in the stream, and writes stall if the stream is unable to accept new data.
The following example shows a declaration of input and output streams of type cint16.
input_stream<cint16> & myInputStream;
output_stream<cint16> & myOutputStream;
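The stall conditions for stream access can be illustrated with a single-threaded model in plain C++. The class StreamModel, its capacity parameter, and the try_read and try_write members are hypothetical and are not part of the AI Engine API. Where a real stream access would stall, the model returns false instead.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model of a synchronized stream, not the AI Engine API.
// A real stream access blocks; this model reports "would stall" by
// returning false so that it can run in a single thread.
template <typename T>
class StreamModel {
public:
    explicit StreamModel(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns false when the stream cannot accept new data
    // (the writer would stall).
    bool try_write(const T& value) {
        if (fifo_.size() >= capacity_) return false;
        fifo_.push_back(value);
        return true;
    }

    // Returns false when no data is available (the reader would stall).
    // On success, the oldest sample is stored in `out`.
    bool try_read(T& out) {
        if (fifo_.empty()) return false;
        out = fifo_.front();
        fifo_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<T> fifo_;
};
```

In this model, a read from an empty stream and a write to a full stream both fail, which corresponds to the two stall cases. Samples are delivered in the order in which they were written.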
There is also a direct stream communication channel between one AI Engine and the physically adjacent AI Engine, called a cascade. The cascade stream connects AI Engine processors in a snake-like linear pattern within the AI Engine array.
The following example shows a declaration of input and output cascade streams of type cacc48.
input_cascade<cacc48> & myInputCascade;
output_cascade<cacc48> & myOutputCascade;
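The snake-like cascade pattern can be pictured with a small model in plain C++. The Tile type and the function next_in_cascade are hypothetical. The direction rule is an assumption made only for illustration (even rows run left to right, odd rows run right to left, and the chain moves up one row at the end of each row); the actual direction depends on the device.

```cpp
#include <cassert>

// Hypothetical tile coordinate, not an AI Engine API type.
struct Tile {
    int row;
    int col;
};

// Hypothetical model of a snake-like cascade chain.
// Assumption: even rows run left to right, odd rows run right to left,
// and the chain moves up one row at the end of each row.
// Returns false if `t` is the last tile in the chain; otherwise stores
// the physically adjacent successor in `next`.
bool next_in_cascade(Tile t, int rows, int cols, Tile& next) {
    assert(rows > 0 && cols > 0);
    assert(t.row >= 0 && t.row < rows && t.col >= 0 && t.col < cols);
    const int step = (t.row % 2 == 0) ? 1 : -1;
    const int col = t.col + step;
    if (col >= 0 && col < cols) {
        next = Tile{t.row, col};  // neighbor in the same row
        return true;
    }
    if (t.row + 1 < rows) {
        next = Tile{t.row + 1, t.col};  // turn the corner into the next row
        return true;
    }
    return false;  // end of the chain
}
```

Under these assumptions, a 2 x 3 array is visited in the order (0,0), (0,1), (0,2), (1,2), (1,1), (1,0), so every hop connects two physically adjacent tiles.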