The Signal Size field on the AI Engine import block masks applies only to kernels with stream or cascade outputs. It has no significance for implementation and is meaningful only for simulation in the Simulink environment. This section explains what Signal Size is and how to set it.
Start with a very simple kernel with buffer input and stream output. The kernel code is as follows:
void win_in_stream_out(input_buffer<int16> & in1, output_stream<int32> * out) {
    int16 val;
    auto pIn = aie::begin(in1);  // iterator over the 16-sample input buffer
    for (unsigned i = 0; i < 16; i++) {
        val = *pIn++;
        int32 squaring = val * val;
        writeincr(out, squaring);  // write one sample to the output stream
    }
}
This kernel expects a buffer of size 16. At every invocation it generates 16 output samples. Import this kernel into Simulink using the AIE Kernel block. The following figure shows the mask for the block.
The value you set for the signal size does not affect the numerical output. For this example, the natural choice is 16, because every invocation of the kernel produces 16 samples. In this case, the output of the block is a variable size signal of maximum size 16 (equal to the signal size), and each output contains 16 samples. If you instead set the signal size to 32, for example, the output of the block is a variable size signal with a maximum size of 32, but each output still contains only 16 samples.
What if you set the signal size to a number smaller than 16, for example 8? As in the previous cases, the output is a variable size signal, now of maximum size 8. The kernel still produces 16 samples at each invocation. The block outputs eight of these samples and stores the other eight in an internal buffer. If the kernel is invoked enough times, the internal buffer of the block eventually fills up and you receive a buffer overflow error. The following figure shows this error.
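The buffering described above can be sketched with a small host-side C++ model. This is only an illustration, not the block's actual implementation: the function name `simulate_stream_block`, the internal buffer capacity passed as `capacity`, and the choice to check for overflow before emitting are all assumptions made for the sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of a block wrapping a stream-output kernel.
struct StreamBlockResult {
    int first_emitted;  // samples in the first variable-size output
    int overflow_step;  // 1-based call at which the buffer overflowed, or -1
};

StreamBlockResult simulate_stream_block(int signal_size, int produced_per_call,
                                        int capacity, int steps) {
    assert(signal_size > 0 && produced_per_call > 0 && capacity > 0);
    StreamBlockResult result{0, -1};
    int buffered = 0;  // samples waiting in the (assumed) internal buffer
    for (int k = 1; k <= steps; ++k) {
        buffered += produced_per_call;  // one kernel invocation per call
        if (buffered > capacity) {      // assumed overflow condition
            result.overflow_step = k;
            break;
        }
        // The block emits at most signal_size samples per call.
        int emitted = std::min(buffered, signal_size);
        if (k == 1) result.first_emitted = emitted;
        buffered -= emitted;
    }
    return result;
}
```

With 16 samples produced per call and an assumed capacity of 64, a signal size of 16 or 32 emits 16 samples per call and never overflows in this model, while a signal size of 8 leaves eight extra samples behind on every call until the buffer overflows.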
This is a trivial example. You can argue that there is no reason to set the signal size to anything less than 16, and that is correct. Now examine a model with two AI Engine kernels. Connect the output of the kernel previously created to another AI Engine kernel with buffer input and buffer output. The code for this second kernel is as follows:
void win_in_win_out(input_buffer<int32> & inw, output_buffer<int32> & outw)
{
    int32 temp;
    auto pIn = aie::begin(inw);    // iterator over the input buffer
    auto pOut = aie::begin(outw);  // iterator over the output buffer
    for (unsigned i = 0; i < 8; i++) {
        temp = *pIn++;
        *pOut++ = temp;  // copy each sample unchanged
    }
}
This kernel requires an input buffer of size 8 and produces an output buffer of size 8. Now consider two scenarios.
First consider a case in which the first block has the signal size set to 16. As mentioned previously, with a signal size of 16, the buffer for the first block does not overflow. But now examine the second block more closely.
Upon receiving 16 samples, the second kernel is invoked twice. Each time, it produces eight samples, for a total of 16 samples. However, because the output buffer size is 8, the block outputs eight samples and stores the other eight in its internal buffer. As before, if you run this model for long enough, the buffer for the second block overflows and the simulation stops.
In the second scenario, to avoid an overflow in the second block, set the signal size for the first block to 8. The second block now receives only eight samples per call and no longer overflows. However, as mentioned previously, the buffer for the first block now overflows. So how can you avoid this situation?
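Both scenarios can be reproduced with a rough C++ model of the two-block chain. The `Block` type, the `run_chain` function, and the buffer capacity of 64 samples are hypothetical and chosen only for illustration; the real internal buffer size of the blocks is not stated here.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of one imported kernel block (not the real Simulink block).
struct Block {
    int in_per_call;       // samples the kernel consumes per invocation
    int out_per_call;      // samples the kernel produces per invocation
    int max_out_per_step;  // signal size, or output buffer size
    int capacity;          // ASSUMED internal buffer capacity
    int pending_in;        // input samples not yet consumed by the kernel
    int buffered;          // output samples not yet emitted

    Block(int in, int out, int max_out, int cap)
        : in_per_call(in), out_per_call(out), max_out_per_step(max_out),
          capacity(cap), pending_in(0), buffered(0) {
        assert(in > 0 && out > 0 && max_out > 0 && cap > 0);
    }

    // One Simulink call: returns the number of samples emitted, or -1 on overflow.
    int step(int n_in) {
        pending_in += n_in;
        while (pending_in >= in_per_call) {  // invoke the kernel while input suffices
            pending_in -= in_per_call;
            buffered += out_per_call;
        }
        if (buffered > capacity) return -1;  // assumed overflow condition
        int emitted = std::min(buffered, max_out_per_step);
        buffered -= emitted;
        return emitted;
    }
};

struct Overflow {
    int block;  // 0 = no overflow, 1 = first block, 2 = second block
    int step;   // 1-based call at which the overflow occurred, or -1
};

// Feeds 16 samples per call into the chain and reports the first overflow.
Overflow run_chain(int first_signal_size, int steps) {
    const int kCapacity = 64;                           // assumption
    Block first(16, 16, first_signal_size, kCapacity);  // buffer in, stream out
    Block second(8, 8, 8, kCapacity);                   // buffer in, buffer out
    for (int k = 1; k <= steps; ++k) {
        int from_first = first.step(16);
        if (from_first < 0) return Overflow{1, k};
        if (second.step(from_first) < 0) return Overflow{2, k};
    }
    return Overflow{0, -1};
}
```

In this model, `run_chain(16, 1000)` reports an overflow in the second block and `run_chain(8, 1000)` reports one in the first block, matching the two scenarios. The step at which the overflow occurs depends entirely on the assumed capacity.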
The buffers overflow because you are feeding more data to the blocks than the blocks can process. If you reduce the rate at which data arrives, the blocks can work through any excess data in their buffers, which prevents the overflow. Consider this more carefully.
Assume the simulation has been running for some time and the first block's buffer is not empty. If you stop feeding data to the first block, the kernel is not invoked when Simulink calls the block, because there is no input data. But because there are samples in the buffer, the block continues to produce samples (eight at a time). It does so until the buffer is empty, after which it produces an empty variable size signal.
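The draining behavior can be illustrated with a short C++ sketch. The function `drain` is hypothetical; it assumes that a block with no new input simply emits up to the signal size from its buffer on each call, and that a final partial remainder is emitted as a shorter output.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sizes of the variable-size outputs over `steps` calls with no new input.
std::vector<int> drain(int buffered, int signal_size, int steps) {
    assert(buffered >= 0 && signal_size > 0 && steps >= 0);
    std::vector<int> sizes;
    for (int k = 0; k < steps; ++k) {
        // No input arrives, so the kernel is not invoked; only the buffer drains.
        int emitted = std::min(buffered, signal_size);
        buffered -= emitted;
        sizes.push_back(emitted);  // 0 means an empty variable-size signal
    }
    return sizes;
}
```

For example, with 24 buffered samples and a signal size of 8, `drain(24, 8, 5)` yields outputs of sizes 8, 8, 8, 0, 0 in this model.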
This information helps you avoid buffer overflow. Instead of stopping the input as suggested above, simply reduce the flow of the data into the first block. One way of doing this is to use a To Variable Size block from AI Engine/Tools. Then set the Output size on the block mask to a number smaller than the size of the input. The following figure depicts the same design shown above but with a To Variable Size block at its input.
In this design, because fewer samples are being fed to the first block at any given call to the block, the buffers do not overflow.
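The effect of reducing the input rate can be checked with a simplified C++ model. It represents only the outcome described here, namely that fewer samples reach the first block per call. The To Variable Size block itself is not modeled, and the names `Block` and `run_throttled` as well as the capacity of 64 samples are assumptions made for the sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of one imported kernel block (not the real Simulink block).
struct Block {
    int in_per_call, out_per_call, max_out_per_step, capacity;
    int pending_in, buffered, peak;

    Block(int in, int out, int max_out, int cap)
        : in_per_call(in), out_per_call(out), max_out_per_step(max_out),
          capacity(cap), pending_in(0), buffered(0), peak(0) {
        assert(in > 0 && out > 0 && max_out > 0 && cap > 0);
    }

    // One Simulink call: returns the number of samples emitted, or -1 on overflow.
    int step(int n_in) {
        pending_in += n_in;
        while (pending_in >= in_per_call) {  // kernel runs only with a full input
            pending_in -= in_per_call;
            buffered += out_per_call;
        }
        peak = std::max(peak, buffered);     // track the highest buffer level
        if (buffered > capacity) return -1;  // assumed overflow condition
        int emitted = std::min(buffered, max_out_per_step);
        buffered -= emitted;
        return emitted;
    }
};

struct ThrottleResult {
    bool overflowed;
    int peak_first;   // highest buffer level seen in the first block
    int peak_second;  // highest buffer level seen in the second block
};

// First block: signal size 8. Second block: output buffer size 8.
ThrottleResult run_throttled(int fed_per_step, int steps) {
    const int kCapacity = 64;  // assumption
    Block first(16, 16, 8, kCapacity);
    Block second(8, 8, 8, kCapacity);
    bool overflowed = false;
    for (int k = 0; k < steps && !overflowed; ++k) {
        int from_first = first.step(fed_per_step);
        overflowed = from_first < 0 || second.step(from_first) < 0;
    }
    return ThrottleResult{overflowed, first.peak, second.peak};
}
```

Feeding 16 samples per call overflows the first block in this model, while feeding 8 samples per call keeps both buffers bounded: the first block never holds more than 16 samples and the second never more than 8.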