An AI Engine kernel can either consume and produce blocks of data, or read and write data streams sample by sample. The data access APIs for both cases are described in the following sections.
Buffer-Based Access
From the kernel perspective, an incoming block of data is called an input buffer. Input buffers are defined by the type of data contained within that buffer. The following example shows a declaration of an input buffer carrying complex integers where the real and imaginary parts are both 16 bits wide.
input_buffer<cint16> myInputBuffer;
From the kernel perspective, an outgoing block of data is called an output buffer. Again, these are defined by type. The following example shows a declaration of an output buffer carrying 32-bit integers.
output_buffer<int32> myOutputBuffer;
A kernel reads from its input buffers and writes to its output buffers. By default, the synchronization required to wait for an input buffer of data or provide an empty output buffer is performed before entering the kernel. There is no synchronization required within the kernel to read or write the individual elements of data. In other words, the kernel will not execute unless there is a full buffer available.
In some situations, if you are not consuming a buffer's worth of data on every invocation of a kernel, or if you are not producing a buffer's worth of data on every invocation, you can control the buffer synchronization by configuring the kernel port to be async in the Block Parameters dialog box of the kernel.
It is also possible to have overlap from one block of input to the next. This is generally required for algorithms such as filters. This overlap is referred to as the buffer margin. If a buffer margin is specified, the kernel has access to a total number of samples equal to buffer_size + margin_size.
The behavior of the buffer margin can be demonstrated using the following example.
Here, the input is a vector of size 8, which is fed to a kernel block configured with a buffer size of 6 and a buffer margin of 2, as shown in the previous figure. The kernel therefore has access to a total of 8 samples at every invocation. During the first simulation cycle, two zeros are prepended to the first 6 new values from the input data. In subsequent simulation cycles, the kernel receives 8 values: 6 new values and 2 values from the previous cycle.
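The margin behavior described above can be illustrated with a small host-side model in plain C++. This is only a sketch and does not use the AI Engine API: the function name `buffers_with_margin` and its parameters are hypothetical, and it assumes the margin carries the last margin_size samples of the previous view, with zeros before the first invocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of any AI Engine API): returns the sample
// views a kernel would see for a given buffer size and buffer margin.
std::vector<std::vector<int>> buffers_with_margin(const std::vector<int>& samples,
                                                  std::size_t buffer_size,
                                                  std::size_t margin_size) {
    assert(buffer_size > 0);
    std::vector<std::vector<int>> views;
    // Before the first invocation there is no history, so the margin is zeros.
    std::vector<int> margin(margin_size, 0);
    for (std::size_t start = 0; start + buffer_size <= samples.size();
         start += buffer_size) {
        // Each view is the carried-over margin followed by buffer_size new samples.
        std::vector<int> view(margin);
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(start);
        view.insert(view.end(), first,
                    first + static_cast<std::ptrdiff_t>(buffer_size));
        // The tail of this view becomes the margin of the next invocation.
        margin.assign(view.end() - static_cast<std::ptrdiff_t>(margin_size),
                      view.end());
        views.push_back(view);
    }
    return views;
}
```

With samples 1 through 12, a buffer size of 6, and a margin of 2, this model yields the views {0, 0, 1, 2, 3, 4, 5, 6} and {5, 6, 7, 8, 9, 10, 11, 12}, each holding buffer_size + margin_size = 8 samples.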
Stream-Based Access
Kernels can access data streams in a sample-by-sample fashion using data access APIs. With a stream-based access model, kernels receive an input stream or an output stream of typed data as an argument. Each access to these streams is synchronized (that is, reads stall if the data is not available in the stream and writes stall if the stream is unable to accept new data).
The following example shows a declaration of input and output streams of type cint16.
input_stream<cint16> & myInputStream;
output_stream<cint16> & myOutputStream;
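The per-access synchronization of streams can be pictured with a small bounded-FIFO model in plain C++. This is a sketch only, not the AI Engine stream API: the class `ModelStream` and its `try_write`/`try_read` methods are hypothetical, and a return value of false stands in for the point where a real stream access would stall.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model of a synchronized stream (not the AI Engine API).
template <typename T>
class ModelStream {
public:
    explicit ModelStream(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns false when the stream cannot accept new data;
    // a real write would stall at this point.
    bool try_write(const T& value) {
        if (fifo_.size() >= capacity_) return false;
        fifo_.push_back(value);
        return true;
    }

    // Returns false when no data is available;
    // a real read would stall at this point.
    bool try_read(T& out) {
        if (fifo_.empty()) return false;
        out = fifo_.front();
        fifo_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<T> fifo_;
};
```

In this model, reading from an empty `ModelStream<int>` fails until a producer writes a sample, and writes fail once the assumed capacity is reached. Samples come out in the order they were written.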
There is also a direct stream communication channel between one AI Engine and the physically adjacent AI Engine, called a cascade. The cascade stream is connected within the AI Engine array in a snake-like linear fashion from AI Engine processor to processor.
The following example shows a declaration of input and output cascade streams of type cacc48.
input_cascade<cacc48> & myInputCascade;
output_cascade<cacc48> & myOutputCascade;