The Bitonic Sort input size, TP_DIM, must be a power of 2. In addition, TP_DIM * sizeof(TT_DATA) / TP_SSR must be at least 64 bytes (twice the size of a buffer on the AI Engine).
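
The two constraints can be checked at compile time before a graph is instantiated. The following is a minimal sketch, not part of the library: the helper names `isPowerOfTwo`, `bitonicSortParamsOk`, and `kMinBytesPerSsrPath` are hypothetical, and `std::int32_t` is used only as an illustrative 4-byte stand-in for TT_DATA.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers (not library API) mirroring the documented constraints.

// Minimum number of bytes per SSR path, as stated in the constraint above.
constexpr std::size_t kMinBytesPerSsrPath = 64;

// True when n is a non-zero power of 2 (exactly one bit set).
constexpr bool isPowerOfTwo(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// True when TP_DIM is a power of 2 and
// TP_DIM * sizeof(TT_DATA) / TP_SSR is at least 64 bytes.
template <typename TT_DATA, std::size_t TP_DIM, std::size_t TP_SSR>
constexpr bool bitonicSortParamsOk() {
    return TP_SSR != 0 && isPowerOfTwo(TP_DIM) &&
           (TP_DIM * sizeof(TT_DATA)) / TP_SSR >= kMinBytesPerSsrPath;
}

// 16 elements * 4 bytes / 1 = 64 bytes: accepted.
static_assert(bitonicSortParamsOk<std::int32_t, 16, 1>(),
              "16 x 4-byte elements with TP_SSR = 1 should be accepted");
// 16 elements * 4 bytes / 2 = 32 bytes: below the 64-byte minimum.
static_assert(!bitonicSortParamsOk<std::int32_t, 16, 2>(),
              "32 bytes per SSR path should be rejected");
// 24 is not a power of 2, even though 96 bytes exceeds the minimum.
static_assert(!bitonicSortParamsOk<std::int32_t, 24, 1>(),
              "a non-power-of-2 TP_DIM should be rejected");
```

For a 4-byte data type, the smallest accepted TP_DIM is therefore 16 when TP_SSR is 1, and it doubles each time TP_SSR doubles.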