QR decomposition (QRD) is a factorization of a matrix \(A\) into a product \(A = QR\) of an orthogonal matrix \(Q\) (unitary when \(A\) is complex) and an upper triangular matrix \(R\).
This API demonstrates a high-performance QRD design for Versal devices. For a complex-float 1024*256 matrix, the design achieves 790+ GFLOPS on the VCK190.
In terms of DSP utilization, sustained performance is nearly 100% of peak.
The design structure is highly scalable: for a smaller 256*64 matrix, resource usage and performance scale linearly relative to the 1024*256 case.
QRD is often used to solve the linear least squares problem and is the basis for a particular eigenvalue algorithm, the QR algorithm.
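As a brief illustration of the least squares use case, the following is a sketch under stated assumptions rather than a description of this API: \(A\) is taken to be \(m \times n\) with \(m \ge n\) and full column rank, \(Q\) is the \(m \times n\) factor with orthonormal columns, and \(Q^{H}\) (notation introduced here) denotes the conjugate transpose of \(Q\).

\[
\min_{x} \|Ax - b\|_2^2
= \min_{x} \left( \|Rx - Q^{H}b\|_2^2 + \|(I - QQ^{H})\,b\|_2^2 \right)
\quad\Longrightarrow\quad
Rx = Q^{H}b
\]

The second term does not depend on \(x\), so the minimizer is obtained by solving the triangular system \(Rx = Q^{H}b\) with back substitution.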
Several methods exist for computing the QR decomposition, including the Gram-Schmidt process, Householder transformations, and Givens rotations. Each has its own advantages and disadvantages.
This design uses the Gram-Schmidt process.
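To make the Gram-Schmidt approach concrete, here is a minimal reference sketch in plain Python. It is not the Versal implementation: the function name `qr_mgs` is hypothetical, the matrix is passed as a list of rows, and the modified Gram-Schmidt variant is assumed for illustration because the text does not say which variant the design uses. It returns the "thin" factors, so \(Q\) is \(m \times n\) with orthonormal columns and \(R\) is \(n \times n\).

```python
import math


def qr_mgs(A):
    """Thin QR of an m x n matrix (list of rows) by modified Gram-Schmidt.

    Illustrative sketch only: assumes m >= n and full column rank.
    Returns (Q, R) as lists of rows with complex entries.
    """
    m, n = len(A), len(A[0])
    # Store the matrix column by column, since Gram-Schmidt works on columns.
    V = [[complex(A[i][j]) for i in range(m)] for j in range(n)]
    Q_cols = []
    R = [[0j] * n for _ in range(n)]

    for j in range(n):
        # Normalize the current (already orthogonalized) column.
        norm = math.sqrt(sum(abs(x) ** 2 for x in V[j]))
        if norm == 0.0:
            raise ValueError("matrix is rank deficient")
        R[j][j] = complex(norm)
        q = [x / norm for x in V[j]]
        Q_cols.append(q)

        # Remove the q_j component from every remaining column.
        for k in range(j + 1, n):
            # r_jk = q_j^H v_k (conjugate the first argument for complex data)
            r = sum(qi.conjugate() * vi for qi, vi in zip(q, V[k]))
            R[j][k] = r
            V[k] = [vi - r * qi for qi, vi in zip(q, V[k])]

    # Convert Q back from a list of columns to a list of rows.
    Q = [[Q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R
```

Calling `qr_mgs` on a small complex matrix and multiplying the returned factors should reproduce the input up to floating-point rounding, which is a convenient sanity check before comparing against hardware output.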