Parallelism for the Cholesky function is configured using the TP_GRID_DIM template parameter. This parameter sets the dimension of the lower-triangular grid of tiles across which the input matrix is split. Only the lower-triangular tiles are used, because the upper-triangular part of the output can be assumed to resolve to 0.
TP_GRID_DIM must be a factor of TP_DIM, and the resulting sub-matrix dimension (TP_DIM / TP_GRID_DIM) must be a multiple of vecSampleNum.
The number of AIE tiles used in the design scales according to TP_GRID_DIM * (TP_GRID_DIM + 1) / 2.
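The two constraints and the tile-count formula above can be checked at compile time. The following is a minimal sketch, not part of the library: the helper names isValidGrid and numTiles are hypothetical, and only TP_DIM, TP_GRID_DIM and vecSampleNum come from the text.

```cpp
#include <cstddef>

// Hypothetical helper (not a library API): checks the two stated constraints.
//   1. TP_GRID_DIM must be a factor of TP_DIM.
//   2. The sub-matrix dimension TP_DIM / TP_GRID_DIM must be a multiple of vecSampleNum.
constexpr bool isValidGrid(std::size_t tpDim, std::size_t tpGridDim, std::size_t vecSampleNum) {
    return tpGridDim != 0 && vecSampleNum != 0
        && tpDim % tpGridDim == 0
        && (tpDim / tpGridDim) % vecSampleNum == 0;
}

// Hypothetical helper: number of tiles in a lower-triangular grid,
// TP_GRID_DIM * (TP_GRID_DIM + 1) / 2.
constexpr std::size_t numTiles(std::size_t tpGridDim) {
    return tpGridDim * (tpGridDim + 1) / 2;
}

// Values assumed for illustration: TP_DIM = 6, TP_GRID_DIM = 3, vecSampleNum = 2.
static_assert(isValidGrid(6, 3, 2), "2x2 sub-matrices satisfy both constraints");
static_assert(numTiles(3) == 6, "a 3x3 lower-triangular grid has 6 tiles");
```

With these assumed values, TP_GRID_DIM = 6 would be rejected: the sub-matrix dimension would be 1, which is not a multiple of vecSampleNum = 2.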
The following is an example of TP_DIM and TP_GRID_DIM being used:
vecSampleNum = 2
TP_DIM       = 6
TP_GRID_DIM  = 3

+-----+
|0  6 |12 18 24 30
|1  7 |13 19 25 31
+-----+-----+
|2  8 |14 20|26 32
|3  9 |15 21|27 33
+-----+-----+-----+
|4  10|16 22|28 34|
|5  11|17 23|29 35|
+-----+-----+-----+
Note: The numbers are sample indices into the global matrix, which must be stored in column-major order. The drawn boundaries enclose the samples processed by each sub-kernel. To minimize tile wastage, tiles whose samples are all assumed to resolve to 0 are not hooked up.
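The mapping from a global column-major sample index to its tile in the grid can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: TileCoord, tileOf and isHookedUp are hypothetical names, and the sketch assumes square sub-matrices of dimension TP_DIM / TP_GRID_DIM.

```cpp
#include <cstddef>

// Hypothetical type: position of a tile in the TP_GRID_DIM x TP_GRID_DIM grid.
struct TileCoord {
    std::size_t row;
    std::size_t col;
};

// Hypothetical helper: locate the tile holding a column-major sample index.
// For column-major storage, matrix row = index % dim and matrix column = index / dim;
// dividing each by the sub-matrix dimension (dim / gridDim) gives the tile position.
constexpr TileCoord tileOf(std::size_t index, std::size_t dim, std::size_t gridDim) {
    return TileCoord{(index % dim) / (dim / gridDim), (index / dim) / (dim / gridDim)};
}

// Only tiles on or below the diagonal of the grid are connected.
constexpr bool isHookedUp(TileCoord tile) {
    return tile.row >= tile.col;
}

// TP_DIM = 6, TP_GRID_DIM = 3, as in the example above.
static_assert(isHookedUp(tileOf(4, 6, 3)), "sample 4 lies in a lower-triangular tile");
static_assert(!isHookedUp(tileOf(12, 6, 3)), "sample 12 lies above the diagonal");
```

In the example, sample 4 falls in the bottom-left tile (grid row 2, grid column 0), which is hooked up, whereas sample 12 falls in grid row 0, grid column 1, which is above the diagonal and therefore has no tile.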