The Bitonic Sort can be split across multiple kernels using the TP_CASC_LEN template parameter, which sets the number of kernels over which the sort is distributed. Despite the name, this feature does not use the cascade ports of the kernels to convey any data: IO buffers carry the data from one kernel to the next in the chain. The term cascade is used only in the sense that the function is split into a series of operations executed by a series of kernels, each on a separate tile. Because the Bitonic Sort is split at stage boundaries, TP_CASC_LEN cannot exceed the number of bitonic stages, log2(n) * (log2(n) + 1) / 2, where n is the number of elements to be sorted. For example, a 16-element sort has 4 * 5 / 2 = 10 stages, so TP_CASC_LEN can be at most 10.
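The following is a minimal Python model, not AIE kernel code, that illustrates what splitting at stage boundaries means and why the stage count bounds TP_CASC_LEN. It enumerates the compare-exchange stages of a standard bitonic network for a power-of-two n, groups consecutive stages into casc_len "kernels", and passes each kernel's output list to the next one, loosely mirroring the IO buffers between tiles. The helper names (num_bitonic_stages, split_stages, run_chain) are hypothetical, and the even split with the remainder going to the earliest kernels is an assumed policy, not necessarily how the library distributes stages.

```python
def num_bitonic_stages(n: int) -> int:
    """Stages in a bitonic network for n = 2**k elements: k * (k + 1) / 2."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    k = n.bit_length() - 1  # log2(n), exact because n is a power of two
    return k * (k + 1) // 2


def split_stages(n: int, casc_len: int) -> list:
    """Hypothetical split of the stages into casc_len groups, one per kernel.

    Each stage is labelled (block, dist): the size of the bitonic block being
    merged and the distance between compared elements. Groups only break
    between stages, so there can be at most num_bitonic_stages(n) groups.
    """
    total = num_bitonic_stages(n)
    if not 1 <= casc_len <= total:
        raise ValueError(f"casc_len must be in 1..{total} for n={n}")
    k = n.bit_length() - 1
    stages = [(2 ** level, 2 ** (level - 1 - step))
              for level in range(1, k + 1)
              for step in range(level)]
    # Assumed policy: spread stages evenly, earlier kernels take the remainder.
    base, extra = divmod(total, casc_len)
    groups, start = [], 0
    for i in range(casc_len):
        count = base + (1 if i < extra else 0)
        groups.append(stages[start:start + count])
        start += count
    return groups


def compare_exchange_stage(data: list, block: int, dist: int) -> list:
    """Apply one bitonic stage and return a new list (an 'output buffer')."""
    out = list(data)
    for i in range(len(out)):
        partner = i ^ dist
        if partner > i:
            ascending = (i & block) == 0  # sort direction for this sub-block
            if (out[i] > out[partner]) == ascending:
                out[i], out[partner] = out[partner], out[i]
    return out


def run_chain(data: list, casc_len: int) -> list:
    """Run the stage groups in sequence; each group consumes the previous output."""
    buf = list(data)
    for kernel_stages in split_stages(len(data), casc_len):
        for block, dist in kernel_stages:
            buf = compare_exchange_stage(buf, block, dist)
    return buf
```

With n = 16 there are 10 stages, so split_stages(16, 3) yields groups of 4, 3 and 3 stages, while split_stages(16, 11) raises ValueError because an 11th kernel would have no whole stage to execute.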