This function solves a system of linear equations with a triangular coefficient matrix and multiple right-hand side vectors
\[Ax=B\]
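where \(A\) is a dense lower or upper triangular matrix of size \(m \times m\), \(x\) is the solution to be computed, and \(B\) is the input right-hand side. Each column of \(B\) is one right-hand side vector, and the matching column of \(x\) is its solution. The maximum matrix size supported on the FPGA is set by the template parameter NMAX.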
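As an illustration of the computation described above, the following is a minimal software sketch of forward substitution (lower triangular) and back substitution (upper triangular) applied to each right-hand side column. It is not the library's implementation or interface: the name `triangularSolveSketch`, its argument list, the row-major storage with leading dimension NMAX, and the in-place overwrite of \(B\) are all assumptions made for this example.

```cpp
#include <cassert>

// Hypothetical reference sketch, NOT the library API.
// Solves A * X = B, where A is m x m triangular and B is m x nrhs.
// Assumed layout: A is row-major with leading dimension NMAX,
// B is row-major with leading dimension nrhs and is overwritten with X.
// Returns false if a diagonal entry of A is zero (singular matrix).
template <typename T, int NMAX>
bool triangularSolveSketch(bool lower, int m, int nrhs, const T* A, T* B) {
    // NMAX bounds the matrix size that the fixed-size storage can hold.
    assert(m >= 0 && m <= NMAX && nrhs >= 0);

    // Reject singular matrices before modifying B.
    for (int i = 0; i < m; ++i) {
        if (A[i * NMAX + i] == T(0)) return false;
    }

    for (int c = 0; c < nrhs; ++c) {          // one right-hand side at a time
        for (int step = 0; step < m; ++step) {
            // Lower: rows top to bottom. Upper: rows bottom to top.
            const int i = lower ? step : m - 1 - step;
            const int jBegin = lower ? 0 : i + 1;
            const int jEnd = lower ? i : m;

            T sum = B[i * nrhs + c];
            for (int j = jBegin; j < jEnd; ++j) {
                // Subtract contributions of unknowns that are already solved.
                sum -= A[i * NMAX + j] * B[j * nrhs + c];
            }
            B[i * nrhs + c] = sum / A[i * NMAX + i];
        }
    }
    return true;
}
```

In this sketch only the relevant triangle of \(A\) is read, so the other half of the storage may hold arbitrary values. Each right-hand side column is solved independently, and in a hardware design those columns could in principle be processed in parallel.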