While this kernel processes bfloat16 data, the function signature shows that the data type at the interface is int16 for both input and output:
void softmax_kernel::softmax(input_stream<int16>* in, output_stream<int16>* out)
Each of these int16 values carries the 16 bits of a bfloat16 value. The kernel reinterprets them as bfloat16 before processing. The reason for this is that AI Engine simulation reads input data from, and writes output data to, text files. Using int16 preserves every bit of the floating-point number when it is read from or written to a text file. It also allows test vectors to be matched at the bit level.
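The reinterpretation described above can be illustrated off-target with a minimal, portable C++ sketch. This is not the kernel's code, which would use the AI Engine vector types. The helper names `bf16_bits_to_float` and `float_to_bf16_bits` are hypothetical, and narrowing by truncation is an assumption made for simplicity. The sketch relies only on the fact that a bfloat16 value has the same layout as the upper 16 bits of an IEEE 754 single-precision float.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: treat the 16 bits held in an int16 as a bfloat16
// and widen them to a float. The bfloat16 bits become the upper half of
// the 32-bit float; the lower 16 bits are zero.
float bf16_bits_to_float(int16_t bits) {
    uint32_t wide = static_cast<uint32_t>(static_cast<uint16_t>(bits)) << 16;
    float value;
    std::memcpy(&value, &wide, sizeof value);  // bit copy, no numeric conversion
    return value;
}

// Hypothetical helper: narrow a float to bfloat16 bits by truncation
// (assumption: the lower 16 mantissa bits are dropped without rounding)
// and return them in an int16, as they would appear in a text file.
int16_t float_to_bf16_bits(float value) {
    uint32_t wide;
    std::memcpy(&wide, &value, sizeof wide);
    uint16_t narrow = static_cast<uint16_t>(wide >> 16);
    int16_t bits;
    std::memcpy(&bits, &narrow, sizeof bits);  // keep the bit pattern, sign bit included
    return bits;
}
```

With these helpers, the bfloat16 value 1.0 (bit pattern 0x3F80) is written to a text file as the integer 16256, and -2.0 (bit pattern 0xC000) as -16384. Reading those integers back and reinterpreting them recovers the original values exactly, which is what makes bit-level comparison of test vectors possible.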
Input and output use streaming interfaces, which reduces latency and eliminates the need for ping-pong buffers in data memory.