This function computes the LU decomposition (without pivoting) of a matrix \(A\):
\[A = L U\]
where \(A\) is a dense matrix of size \(m \times m\), \(L\) is a lower triangular matrix with unit diagonal, and \(U\) is an upper triangular matrix. Because no row interchanges are performed, the factorization requires every pivot met during elimination to be nonzero. The maximum matrix size supported on the FPGA is set by the template parameter NMAX.
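
The factorization described above can be illustrated with a minimal software sketch. It is not this function's implementation or interface: the name `lu_nopivot`, its signature, and the in-place storage layout are assumptions made for illustration only. The sketch uses plain Doolittle elimination, takes the compile-time bound `NMAX` as a template parameter, and reports failure when \(m\) exceeds `NMAX` or a zero pivot is met.

```cpp
// Hypothetical helper (not the library API): in-place Doolittle LU
// factorization without pivoting.
// On success, the strict lower triangle of A holds L (its unit diagonal is
// implied, not stored) and the upper triangle, including the diagonal,
// holds U.
// Returns false if m is outside [0, NMAX] or a zero pivot is encountered.
template <typename T, int NMAX>
bool lu_nopivot(int m, T A[NMAX][NMAX]) {
    if (m < 0 || m > NMAX) {
        return false;
    }
    for (int k = 0; k < m; ++k) {
        // Without row interchanges a zero pivot cannot be recovered from.
        if (A[k][k] == T(0)) {
            return false;
        }
        for (int i = k + 1; i < m; ++i) {
            // Multiplier l_ik, stored in place of the eliminated entry.
            A[i][k] /= A[k][k];
            // Update the remaining entries of row i.
            for (int j = k + 1; j < m; ++j) {
                A[i][j] -= A[i][k] * A[k][j];
            }
        }
    }
    return true;
}
```

For example, with \(A = \begin{pmatrix} 4 & 3 \\ 6 & 3 \end{pmatrix}\) the sketch leaves \(L = \begin{pmatrix} 1 & 0 \\ 1.5 & 1 \end{pmatrix}\) and \(U = \begin{pmatrix} 4 & 3 \\ 0 & -1.5 \end{pmatrix}\) packed in the same array. A matrix such as \(\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\) is rejected because its first pivot is zero.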