Input matrix data must be stored in column-major order, and the matrix is assumed to be Hermitian positive-definite. Because the Cholesky operation is susceptible to catastrophic cancellation, ensure that the input matrix is well-conditioned.
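As an illustration of the column-major layout and the decomposition itself, the following is a plain host-side C++ reference sketch. It is not the library kernel. The names `cholesky_lower` and `cm` are hypothetical, `std::complex<double>` is an assumed element type, and this sketch writes zeros to the upper triangle of the output, whereas the kernel leaves those entries undefined. The subtraction that forms each diagonal term is where cancellation can destroy precision when the input is ill-conditioned.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using cd = std::complex<double>;

// Element (row, col) of an n x n column-major matrix is stored at index col * n + row.
inline std::size_t cm(std::size_t row, std::size_t col, std::size_t n) { return col * n + row; }

// Hypothetical reference: returns lower-triangular L (column-major) such that A = L * L^H.
// Only the diagonal and lower triangle of A are read.
std::vector<cd> cholesky_lower(const std::vector<cd>& a, std::size_t n) {
    assert(a.size() == n * n);
    std::vector<cd> l(n * n, cd(0.0, 0.0));  // upper triangle stays zero in this sketch
    for (std::size_t j = 0; j < n; ++j) {
        // Diagonal: L(j,j) = sqrt(A(j,j) - sum_k |L(j,k)|^2).
        // This subtraction is where catastrophic cancellation can occur.
        double diag = a[cm(j, j, n)].real();
        for (std::size_t k = 0; k < j; ++k) diag -= std::norm(l[cm(j, k, n)]);
        if (diag <= 0.0) throw std::runtime_error("matrix is not positive-definite (or too ill-conditioned)");
        const double ljj = std::sqrt(diag);
        l[cm(j, j, n)] = cd(ljj, 0.0);
        // Below the diagonal: L(i,j) = (A(i,j) - sum_k L(i,k) * conj(L(j,k))) / L(j,j).
        for (std::size_t i = j + 1; i < n; ++i) {
            cd s = a[cm(i, j, n)];
            for (std::size_t k = 0; k < j; ++k) s -= l[cm(i, k, n)] * std::conj(l[cm(j, k, n)]);
            l[cm(i, j, n)] = s / ljj;
        }
    }
    return l;
}
```

For example, the 2 x 2 Hermitian matrix with rows `[4, 2-2i]` and `[2+2i, 3]` is passed in column-major order as `{4, 2+2i, 2-2i, 3}` and yields `L = {2, 1+i, 0, 1}` in the same order.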
Data is processed in chunks of vecSampleNum * vecSampleNum elements, and only the chunks on or below the diagonal are processed. Data in the upper triangle of the input matrix is assumed to be zero, so the chunks in the upper triangle of the output are undefined.
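To make the chunk selection concrete, the sketch below enumerates the vecSampleNum * vecSampleNum chunks that lie on or below the diagonal of a `dim` x `dim` matrix. The function name `lower_chunks` and the column-by-column visiting order are assumptions for illustration only; the text above specifies which chunks are processed, not the order in which the kernel processes them.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns (chunkRow, chunkCol) for every vecSampleNum x vecSampleNum
// chunk on or below the diagonal, visiting chunk columns left to right (an assumed order).
std::vector<std::pair<std::size_t, std::size_t>> lower_chunks(std::size_t dim, std::size_t vecSampleNum) {
    assert(vecSampleNum > 0 && dim % vecSampleNum == 0);
    const std::size_t nChunks = dim / vecSampleNum;  // chunks per row and per column
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t c = 0; c < nChunks; ++c)
        for (std::size_t r = c; r < nChunks; ++r)  // r >= c: on or below the diagonal
            chunks.emplace_back(r, c);
    return chunks;
}
```

With `nChunks` chunks per side, this selects `nChunks * (nChunks + 1) / 2` chunks; for example, a 32 x 32 matrix with vecSampleNum = 8 has 4 chunks per side, so 10 of its 16 chunks are processed and the remaining 6 (the upper-triangular chunks) are skipped.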
In a single-tile implementation, TP_DIM must be a multiple of vecSampleNum. In multi-tile implementations, TP_DIM / TP_GRID_DIM must be a multiple of vecSampleNum.
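The dimension constraints above can be checked at compile time. The sketch below is a hypothetical `constexpr` helper, not part of the library. It assumes that a single-tile implementation corresponds to TP_GRID_DIM = 1 (in which case TP_DIM / TP_GRID_DIM equals TP_DIM, so one rule covers both cases), and it additionally assumes that TP_DIM divides evenly by TP_GRID_DIM. The vecSampleNum values used in the example are illustrative only, since the real value depends on the kernel configuration.

```cpp
#include <cstddef>

// Hypothetical check of the tiling constraint. Assumptions (not from the library):
//  - TP_GRID_DIM == 1 denotes a single-tile implementation;
//  - TP_DIM must divide evenly by TP_GRID_DIM.
constexpr bool dims_are_valid(std::size_t TP_DIM, std::size_t TP_GRID_DIM, std::size_t vecSampleNum) {
    if (vecSampleNum == 0 || TP_GRID_DIM == 0) return false;
    if (TP_DIM % TP_GRID_DIM != 0) return false;        // assumed even split across tiles
    return (TP_DIM / TP_GRID_DIM) % vecSampleNum == 0;  // the documented constraint
}

// Example with an illustrative vecSampleNum of 8:
static_assert(dims_are_valid(64, 1, 8), "single tile: 64 is a multiple of 8");
static_assert(dims_are_valid(64, 4, 8), "multi-tile: 64 / 4 = 16 is a multiple of 8");
static_assert(!dims_are_valid(48, 4, 8), "multi-tile: 48 / 4 = 12 is not a multiple of 8");
```

Placing the check in a `static_assert` surfaces an invalid combination as a build error rather than as undefined output at run time.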