Code Modifications for the Cholesky Kernel - 2023.2 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID: XD099
Release Date: 2023-11-13
Version: 2023.2 English

In this module, the code for the algorithm is moved into the header file, cholesky_kernel.hpp.

The parallelization is now explicit. The number of parallel compute units is determined by NCU, a constant set in cholesky_kernel.cpp through #define NCU 16.

NCU is passed as a template parameter to the chol_col_wrapper function (see below). The DATAFLOW pragma applies to the loop that calls chol_col NCU (that is, 16) times:

template <typename T, int N, int NCU>
void chol_col_wrapper(int n, T dataA[NCU][(N + NCU - 1) / NCU][N], T dataj[NCU][N], T tmp1, int j)
{
#pragma HLS DATAFLOW

Loop_row:
    for (int num = 0; num < NCU; num++)
    {
#pragma HLS unroll factor = NCU
        chol_col<T, N, NCU>(n, dataA[num], dataj[num], tmp1, num, j);
    }
}

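To ensure that DATAFLOW can be applied, dataA is divided into NCU portions: its first dimension has size NCU, so each call to chol_col receives its own portion, dataA[num].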
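The division of dataA into NCU portions can be illustrated with a small host-side sketch. The array shape, including the ceiling division (N + NCU - 1) / NCU, comes from the chol_col_wrapper signature. The helper names rows_per_cu and partition_rows are hypothetical, and the cyclic mapping (row i goes to portion i % NCU) is an assumption made for illustration; the tutorial text does not state how rows are assigned to portions.

```cpp
#include <cassert>

// Rows held by each portion: the ceiling of N / NCU, matching the
// (N + NCU - 1) / NCU dimension in the chol_col_wrapper signature.
template <int N, int NCU>
constexpr int rows_per_cu()
{
    return (N + NCU - 1) / NCU;
}

// Hypothetical helper: copies the top-left n x n block of src into NCU portions.
// ASSUMPTION: rows are distributed cyclically, so row i is stored in
// portion i % NCU at local index i / NCU.
template <typename T, int N, int NCU>
void partition_rows(T src[N][N], T dataA[NCU][(N + NCU - 1) / NCU][N], int n)
{
    assert(n >= 0 && n <= N);
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            dataA[i % NCU][i / NCU][j] = src[i][j];
        }
    }
}
```

Under this assumed mapping, a call such as partition_rows<float, 8, 4>(src, dataA, 8) places rows 1 and 5 in portion 1, at local indices 0 and 1. Because the template parameters appear only in array bounds, they must be given explicitly at the call site.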

Finally, the loop is unrolled with a factor of NCU, which means that NCU (that is, 16) copies of chol_col are created, each working on its own chunk of the data.
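As a rough software model of what unrolling by NCU means, the sketch below compares a rolled loop with its hand-unrolled equivalent. It is not the tutorial's code: process_chunk is a hypothetical stand-in for chol_col, and kNcu and kChunk are illustrative constants (4 instead of 16, to keep the listing short). Plain C++ runs the calls one after another; creating concurrent hardware copies is the job of the HLS tool.

```cpp
constexpr int kNcu = 4;    // stand-in for NCU; the tutorial uses 16
constexpr int kChunk = 3;  // hypothetical chunk length, for illustration only

// Hypothetical stand-in for chol_col: doubles each element of one chunk.
void process_chunk(int chunk[kChunk])
{
    for (int i = 0; i < kChunk; i++)
    {
        chunk[i] *= 2;
    }
}

// Rolled form: a single call site that is visited kNcu times.
void run_rolled(int data[kNcu][kChunk])
{
    for (int num = 0; num < kNcu; num++)
    {
        process_chunk(data[num]);
    }
}

// Conceptual result of unrolling by kNcu: kNcu separate call sites,
// each bound to its own chunk of the data.
void run_unrolled(int data[kNcu][kChunk])
{
    process_chunk(data[0]);
    process_chunk(data[1]);
    process_chunk(data[2]);
    process_chunk(data[3]);
}
```

Both functions produce the same result. The unrolled form makes it visible that no two calls touch the same chunk, which is why each copy can be given its own portion of the data.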