This tutorial is based on a C++ kernel that we’ll optimize for highest throughput.
The algorithm is a common linear algebra solver: the Cholesky decomposition, or Cholesky factorization (pronounced /ʃo-LESS-key/). It decomposes a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. This solver is useful for several numerical problems, in particular for Monte Carlo simulations.
This algorithm has a serial complexity of O(n³).
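To make the cubic cost concrete, here is a minimal reference sketch of the factorization. It is not the kernel used in this tutorial: the function name `cholesky_reference` and the row-major `std::vector<double>` layout are assumptions chosen for illustration, and it handles only the real symmetric case (a Hermitian version would use complex values and a conjugate in the inner products). The three nested loops over `j`, `i` and `k` are where the O(n³) work comes from.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the tutorial sources).
// Factors a real symmetric positive-definite n x n matrix A (row-major)
// into L * L^T, where L is lower triangular.
// Returns false if a non-positive pivot shows that A is not positive definite.
bool cholesky_reference(const std::vector<double>& A,
                        std::vector<double>& L,
                        std::size_t n) {
    L.assign(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        // Diagonal entry: L[j][j] = sqrt(A[j][j] - sum_k L[j][k]^2)
        double diag = A[j * n + j];
        for (std::size_t k = 0; k < j; ++k) {
            diag -= L[j * n + k] * L[j * n + k];
        }
        if (diag <= 0.0) {
            return false;
        }
        L[j * n + j] = std::sqrt(diag);

        // Entries below the diagonal in column j:
        // L[i][j] = (A[i][j] - sum_k L[i][k] * L[j][k]) / L[j][j]
        for (std::size_t i = j + 1; i < n; ++i) {
            double sum = A[i * n + j];
            for (std::size_t k = 0; k < j; ++k) {
                sum -= L[i * n + k] * L[j * n + k];
            }
            L[i * n + j] = sum / L[j * n + j];
        }
    }
    return true;
}
```

Called with a 3 x 3 symmetric positive-definite matrix stored row by row, the function fills `L` with the lower triangular factor and leaves the entries above the diagonal at zero. Each column depends on the columns computed before it, which is the dependency any accelerated version has to work around.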
More information on Wikipedia… Note that this solver is included as part of the official Vitis accelerated libraries; its documentation is available here: https://xilinx.github.io/Vitis_Libraries/solver/2022.1/guide_L2/L2_api.html#potrf
For our purpose, we will start with a simple description implemented in C++ and explain how to adapt it for acceleration with an Alveo U50 card.