Code modifications for the Cholesky kernel - 2022.2 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID: XD099
Release Date: 2022-12-01
Version: 2022.2 English

In module 4, the code for the algorithm is moved into the header file cholesky_kernel.hpp.

Parallelization is now explicit: the number of parallel compute units is determined by NCU, a constant set in cholesky_kernel.cpp through #define NCU 16.

NCU is passed as a template parameter to the chol_col_wrapper function shown below. The DATAFLOW pragma applies to the loop that calls chol_col NCU times (16 with this setting):

template <typename T, int N, int NCU>
void chol_col_wrapper(int n, T dataA[NCU][(N + NCU - 1) / NCU][N], T dataj[NCU][N], T tmp1, int j)
{
#pragma HLS DATAFLOW

Loop_row:
    for (int num = 0; num < NCU; num++)
    {
#pragma HLS unroll factor = NCU
        chol_col<T, N, NCU>(n, dataA[num], dataj[num], tmp1, num, j);
    }
}

To ensure DATAFLOW can be applied, dataA is divided into NCU portions: its first dimension has size NCU, and each chol_col call receives its own portion, dataA[num].
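The shape of the partitioned array can be illustrated with a small host-side sketch. The second dimension, (N + NCU - 1) / NCU, is a ceiling division taken from the chol_col_wrapper signature. The cyclic row-to-portion mapping (row i goes to portion i % NCU) is an assumption made for illustration only; this section does not state how rows are actually distributed. The names ROWS_PER_CU, cu_of, slot_of and partition_rows, and the small sizes, are hypothetical.

```cpp
#include <cassert>

// Hypothetical sizes for illustration; the tutorial sets NCU to 16.
constexpr int N = 10;
constexpr int NCU = 4;

// Second dimension of dataA in the chol_col_wrapper signature:
// a ceiling division, so every row has a slot even when N % NCU != 0.
constexpr int ROWS_PER_CU = (N + NCU - 1) / NCU;

// ASSUMPTION: rows are dealt out cyclically across the NCU portions.
// The tutorial text does not specify the mapping.
constexpr int cu_of(int row)   { return row % NCU; }  // which portion
constexpr int slot_of(int row) { return row / NCU; }  // index inside it

// Copy a full N x N matrix into the NCU-way partitioned layout.
template <typename T>
void partition_rows(const T full[N][N], T dataA[NCU][ROWS_PER_CU][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            dataA[cu_of(i)][slot_of(i)][j] = full[i][j];
}
```

With these sizes, each portion holds up to three rows, and the last slot of some portions stays unused because 10 is not a multiple of 4.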

Finally, the loop is unrolled with a factor of NCU, which means that NCU (i.e., 16) copies of chol_col are created, each working on its own chunk of the data.
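The plain C++ sketch below mirrors the wrapper structure to show why the replicated calls do not interfere with each other: iteration num touches only data[num]. It is a software-only illustration, not the tutorial's kernel. scale_chunk is a hypothetical stand-in for chol_col with a made-up body, and the HLS pragmas appear only as comments so that the code compiles with an ordinary C++ compiler.

```cpp
#include <cassert>

// Hypothetical stand-in for chol_col: each instance modifies only the
// chunk it is handed, never a neighbouring one.
template <typename T, int N, int NCU>
void scale_chunk(T chunk[(N + NCU - 1) / NCU][N], T factor)
{
    for (int r = 0; r < (N + NCU - 1) / NCU; r++)
        for (int c = 0; c < N; c++)
            chunk[r][c] *= factor;
}

// Same loop shape as chol_col_wrapper. In the HLS version, the DATAFLOW
// and unroll pragmas would go here; in software the calls simply run
// one after another.
template <typename T, int N, int NCU>
void scale_wrapper(T data[NCU][(N + NCU - 1) / NCU][N], T factor)
{
    for (int num = 0; num < NCU; num++)
    {
        // Iteration num reads and writes data[num] only, so the NCU
        // calls are independent of one another.
        scale_chunk<T, N, NCU>(data[num], factor);
    }
}
```

Because no iteration depends on another iteration's chunk, replacing the sequential loop with NCU side-by-side copies leaves the result unchanged.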