The Signal Size field on the AI Engine import block mask applies only to kernels with stream or cascade outputs. It has no effect on the implementation and is meaningful only for simulation in the Simulink environment. This section explains what Signal Size is and how to set it.
Start with a very simple kernel with buffer input and stream output. The kernel code is as follows:
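void win_in_stream_out(input_buffer<int16> & in1, output_stream<int32> * out) {
    int16 val;
    auto pIn = aie::begin(in1);          // iterator over the input buffer
    for (unsigned i = 0; i < 16; i++) {  // one invocation consumes 16 samples
        val = *pIn++;
        int32 squaring = val * val;      // square each input sample
        writeincr(out, squaring);        // write one sample to the output stream
    }
}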
This kernel expects an input buffer of 16 samples and generates 16 output samples at every invocation. Import this kernel into Simulink using the AIE Kernel block. The mask for the block is shown in the following figure.
Regardless of what value you set the signal size to, it does not affect the numerical output. For this example, you will generally set the signal size to 16 because every invocation of the kernel produces 16 samples. In this case, the output of this block will be a variable size signal of maximum size 16 (equal to the signal size) and each output will contain 16 samples. However, if for example you set the signal size to 32, the output of the block will be a variable size signal with a maximum size of 32, but each output will only contain 16 samples.
What if you set the signal size to a number smaller than 16, for example 8? As in the previous cases, the output will be a variable size signal, now with a maximum size of 8. At each invocation, the kernel still produces 16 samples. The block outputs eight of these samples and stores the other eight in an internal buffer. If you call the kernel enough times, the internal buffer of the block eventually fills up and you see a buffer overflow error, as shown in the following figure.
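The bookkeeping described above can be sketched in plain C++ outside of Simulink. This is only an illustrative model, not the block's actual implementation: the names `CallResult` and `simulate_stream_block` are hypothetical, and the sketch assumes that each call emits at most Signal Size samples and keeps the remainder in the internal buffer. The capacity of the internal buffer is not modeled because it is not stated here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping record for one simulated block call.
struct CallResult {
    std::size_t emitted;  // samples in the variable size output signal
    std::size_t backlog;  // samples left in the block's internal buffer
};

// Simulates `calls` invocations of a stream-output block whose kernel writes
// `produced_per_call` samples per invocation, for a given Signal Size.
std::vector<CallResult> simulate_stream_block(std::size_t signal_size,
                                              std::size_t produced_per_call,
                                              std::size_t calls) {
    std::vector<CallResult> trace;
    std::size_t backlog = 0;
    for (std::size_t i = 0; i < calls; ++i) {
        backlog += produced_per_call;                           // kernel output
        std::size_t emitted = std::min(signal_size, backlog);   // capped by Signal Size
        backlog -= emitted;                                     // remainder stays buffered
        trace.push_back({emitted, backlog});
    }
    return trace;
}
```

Under these assumptions, `simulate_stream_block(16, 16, n)` and `simulate_stream_block(32, 16, n)` both emit 16 samples per call with an empty backlog, while `simulate_stream_block(8, 16, n)` emits 8 samples per call and its backlog grows by 8 on every call, which is the condition that eventually triggers the overflow error.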
This is a trivial example. You might contend that there is no reason to set the signal size to anything less than 16, and that is correct. Now examine a model with two AI Engine kernels. Connect the output of the kernel previously created to another AI Engine kernel with buffer input and buffer output. The code for this second kernel is as follows:
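void win_in_win_out(input_buffer<int32> & inw, output_buffer<int32> & outw)
{
    int32 temp;
    auto pIn = aie::begin(inw);          // iterator over the input buffer
    auto pOut = aie::begin(outw);        // iterator over the output buffer
    for (unsigned i = 0; i < 8; i++) {   // one invocation copies 8 samples
        temp = *pIn++;
        *pOut++ = temp;
    }
}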
This kernel requires an input buffer of size 8 and produces an output buffer of size 8. Now consider two scenarios. First, consider the case in which the first block has the signal size set to 16. As mentioned previously, with a signal size of 16, the buffer for the first block will not overflow. But now examine the second block more closely. Upon receiving 16 samples, the second kernel is invoked twice. Each time, it produces eight samples, for a total of 16 samples. However, because the output size of the second block is 8, the block outputs eight samples and stores the other eight in its internal buffer. As before, if you run this model for long enough, the buffer for the second block will overflow and the simulation will stop.
In the second scenario, to avoid an overflow in the second block, you might set the signal size for the first block to 8. However, as mentioned previously, the buffer for the first block will now overflow. So how can you get out of this situation?
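The two scenarios can be compared with a small counting model. This is a sketch under stated assumptions rather than a description of the actual blocks: `ChainBacklog` and `simulate_chain` are hypothetical names, the model is fed 16 samples at every step, the first kernel writes 16 samples per invocation, and the second block consumes frames of 8 and outputs at most 8 samples per step.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical record of the samples left in each block's internal buffer.
struct ChainBacklog {
    std::size_t first;   // backlog of the stream-output block
    std::size_t second;  // backlog of the buffer-in/buffer-out block
};

// Counts buffered samples after `steps` calls when the first block uses
// `first_signal_size` as its Signal Size.
ChainBacklog simulate_chain(std::size_t first_signal_size, std::size_t steps) {
    const std::size_t kFirstProduced = 16;  // first kernel output per invocation
    const std::size_t kSecondFrame = 8;     // second kernel input and output size
    std::size_t first_backlog = 0;
    std::size_t second_input = 0;           // samples waiting to fill a frame
    std::size_t second_backlog = 0;
    for (std::size_t i = 0; i < steps; ++i) {
        // First block: one kernel invocation, output capped by Signal Size.
        first_backlog += kFirstProduced;
        std::size_t sent = std::min(first_signal_size, first_backlog);
        first_backlog -= sent;
        // Second block: invoke the kernel once per complete frame of 8.
        second_input += sent;
        std::size_t runs = second_input / kSecondFrame;
        second_input -= runs * kSecondFrame;
        second_backlog += runs * kSecondFrame;
        // Second block outputs at most one frame of 8 per step.
        second_backlog -= std::min(kSecondFrame, second_backlog);
    }
    return {first_backlog, second_backlog};
}
```

In this model, a Signal Size of 16 on the first block leaves its buffer empty but grows the second block's backlog by 8 per step, and a Signal Size of 8 does the opposite. Either way one buffer grows without bound as long as the input rate stays at 16 samples per step.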
The buffer overflows because you are feeding more data to the blocks than the blocks can process. If you reduce the rate, the kernels will be able to process any excess data in the buffers and as such prevent the overflow. Now look into this more carefully.
Assume the simulation has been running for a while and the first block's buffer is not empty. Suppose you then stop feeding data to the first block. Every time Simulink calls the first block, the kernel is not invoked because there is no input data. However, because there are samples in the buffer, the block continues to produce samples (eight at a time) until the buffer is empty. After that, the block produces an empty variable size signal.
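The draining behavior can be illustrated with the following sketch. The function name `drain_backlog` is hypothetical, and the sketch assumes that a call with no input emits up to Signal Size samples from the internal buffer and nothing else.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the number of samples emitted on each of `calls` consecutive calls
// when no new input arrives and `backlog` samples are already buffered.
std::vector<std::size_t> drain_backlog(std::size_t backlog,
                                       std::size_t signal_size,
                                       std::size_t calls) {
    std::vector<std::size_t> emitted_per_call;
    for (std::size_t i = 0; i < calls; ++i) {
        // The kernel is not invoked, so only buffered samples can be emitted.
        std::size_t emitted = std::min(signal_size, backlog);
        backlog -= emitted;
        emitted_per_call.push_back(emitted);  // 0 means an empty output signal
    }
    return emitted_per_call;
}
```

For example, with 24 buffered samples and a Signal Size of 8, five calls to this model emit 8, 8, 8, 0, and 0 samples.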
This information should help you avoid buffer overflow. Instead of stopping the input as suggested above, simply reduce the flow of the data into the first block. One way of doing this is to use a To Variable Size block from AI Engine/Tools and set the Output size on the block mask to a number smaller than the size of the input. The following figure depicts the same design shown above but with a To Variable Size block at its input.
In this design, because fewer samples are being fed to the first block at any given call to the block, the buffers will not overflow. Note that the output of this model will be different from the model without the buffer block because the buffer block produces zero samples at time step zero.
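The effect of a reduced input rate can be checked with the same kind of counting model. This sketch does not model the To Variable Size block itself. It only assumes that a fixed number of samples (`samples_per_call`, a hypothetical parameter) reaches the first block at each call and that the first kernel runs once for every 16 samples accumulated. The names `PeakBacklog` and `simulate_throttled_chain` are also hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical record of the largest backlog seen in each block,
// measured after the block has emitted its output for the step.
struct PeakBacklog {
    std::size_t first;
    std::size_t second;
};

PeakBacklog simulate_throttled_chain(std::size_t samples_per_call,
                                     std::size_t first_signal_size,
                                     std::size_t steps) {
    const std::size_t kFirstFrame = 16;  // first kernel input and output size
    const std::size_t kSecondFrame = 8;  // second kernel input and output size
    std::size_t in1 = 0, backlog1 = 0, in2 = 0, backlog2 = 0;
    PeakBacklog peak{0, 0};
    for (std::size_t i = 0; i < steps; ++i) {
        // First block: invoke the kernel once per complete frame of 16.
        in1 += samples_per_call;
        while (in1 >= kFirstFrame) { in1 -= kFirstFrame; backlog1 += kFirstFrame; }
        std::size_t sent = std::min(first_signal_size, backlog1);
        backlog1 -= sent;
        // Second block: invoke the kernel once per complete frame of 8.
        in2 += sent;
        while (in2 >= kSecondFrame) { in2 -= kSecondFrame; backlog2 += kSecondFrame; }
        backlog2 -= std::min(kSecondFrame, backlog2);
        peak.first = std::max(peak.first, backlog1);
        peak.second = std::max(peak.second, backlog2);
    }
    return peak;
}
```

With 8 samples per call and a Signal Size of 8 on the first block, neither backlog in this model exceeds 8 samples, however long it runs. With 16 samples per call and the same Signal Size, the first block's backlog grows by 8 at every step.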