The cascade stream allows an AI Engine processor to transfer the contents of one of its accumulator registers (384 bits) to its neighbor (on the left or on the right, depending on the row):
It can transfer eight 48-bit words, as a v8acc48 or a v4cacc48 (four complex values with 48-bit real and imaginary parts), in a single cycle.
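48 bits is the accumulator width used for 16-bit x 16-bit multiplications: the product itself fits in 32 bits, which leaves 16 guard bits to absorb growth when many products are accumulated.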
Transferring a 768-bit accumulator register (for example, a v16acc48) takes 2 clock cycles, one per 384-bit half.
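The bit widths and cycle counts above follow from simple arithmetic. The following is a minimal back-of-the-envelope model, not AI Engine kernel code. It assumes a fixed 384 bits moved per cycle with no stalls. The helper names (register_bits, cascade_cycles, guard_bits) are hypothetical, and the v16acc48 entry is included only as an example of a 768-bit register.

```python
import math

# Assumption taken from the text: the cascade stream moves 384 bits per clock cycle.
CASCADE_BITS_PER_CYCLE = 384

# Accumulator vector types as (lanes, bits per lane).
# A complex lane (v4cacc48) holds a 48-bit real part and a 48-bit imaginary part.
ACC_TYPES = {
    "v8acc48": (8, 48),
    "v4cacc48": (4, 2 * 48),
    "v16acc48": (16, 48),  # example 768-bit register
}


def register_bits(type_name: str) -> int:
    """Total width of an accumulator register type, in bits."""
    lanes, bits_per_lane = ACC_TYPES[type_name]
    return lanes * bits_per_lane


def cascade_cycles(type_name: str) -> int:
    """Cycles needed to push one register through the cascade,
    assuming full 384-bit utilization every cycle and no stalls."""
    return math.ceil(register_bits(type_name) / CASCADE_BITS_PER_CYCLE)


def guard_bits(acc_bits: int = 48, in_bits: int = 16) -> int:
    """Headroom left in the accumulator after a single signed
    in_bits x in_bits product, which fits in 2 * in_bits bits."""
    return acc_bits - 2 * in_bits


for name in ACC_TYPES:
    print(f"{name}: {register_bits(name)} bits, {cascade_cycles(name)} cycle(s)")
print(f"guard bits for 16x16 products in a 48-bit accumulator: {guard_bits()}")
```

Running it reports 384 bits and 1 cycle for v8acc48 and v4cacc48, 768 bits and 2 cycles for v16acc48, and 16 guard bits for 16-bit x 16-bit products.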