template <int M, int N>
void qrd_householder(input_stream<cfloat>* __restrict matAU_0,
                     input_stream<cfloat>* __restrict matAU_1,
                     output_stream<cfloat>* __restrict matRQ_0,
                     output_stream<cfloat>* __restrict matRQ_1,
                     const int column_id);
Note
- To make full use of the input/output stream bandwidth, the input matrix and the output result are transferred as follows: Elem[N*4] and Elem[N*4+1] are transferred with matAU_0/matRQ_0, Elem[N*4+2] and Elem[N*4+3] are transferred with matAU_1/matRQ_1. (In this note, N is the index of a group of 4 elements.)
- Input:
input_stream<cfloat>* matAU_0
stream of the input matrix; carries the lower two elements of each group of 4 elements.
input_stream<cfloat>* matAU_1
stream of the input matrix; carries the higher two elements of each group of 4 elements.
column_id
column id; the elements below the diagonal will be zeroed.
- Output:
output_stream<cfloat>* matRQ_0
stream of the output matrix; carries the lower two elements of each group of 4 elements.
output_stream<cfloat>* matRQ_1
stream of the output matrix; carries the higher two elements of each group of 4 elements.
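
The two-stream element layout described in the note can be illustrated with a small host-side sketch. It is not part of the library: `cfloat_t`, `split_for_streams` and `merge_from_streams` are hypothetical names, `std::complex<float>` stands in for the AIE `cfloat` type, and plain vectors stand in for the streams. The sketch assumes the elements are already in a flat sequence whose length is a multiple of 4; the traversal order of the matrix itself is not specified here.

```cpp
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side stand-in for the AIE cfloat type (assumption for illustration).
using cfloat_t = std::complex<float>;

// Hypothetical helper: split a flat element sequence into the payloads of
// the two streams. Within each group of 4 elements, elements 0 and 1 go to
// stream 0 (matAU_0 / matRQ_0) and elements 2 and 3 go to stream 1
// (matAU_1 / matRQ_1). Assumes elems.size() is a multiple of 4.
std::pair<std::vector<cfloat_t>, std::vector<cfloat_t>>
split_for_streams(const std::vector<cfloat_t>& elems) {
    std::vector<cfloat_t> s0, s1;
    s0.reserve(elems.size() / 2);
    s1.reserve(elems.size() / 2);
    for (std::size_t i = 0; i + 3 < elems.size(); i += 4) {
        s0.push_back(elems[i]);      // Elem[N*4]
        s0.push_back(elems[i + 1]);  // Elem[N*4+1]
        s1.push_back(elems[i + 2]);  // Elem[N*4+2]
        s1.push_back(elems[i + 3]);  // Elem[N*4+3]
    }
    return {std::move(s0), std::move(s1)};
}

// Hypothetical inverse: rebuild the flat sequence from the two payloads,
// taking two elements from stream 0, then two from stream 1, per group.
std::vector<cfloat_t> merge_from_streams(const std::vector<cfloat_t>& s0,
                                         const std::vector<cfloat_t>& s1) {
    std::vector<cfloat_t> out;
    out.reserve(s0.size() + s1.size());
    for (std::size_t i = 0; i + 1 < s0.size() && i + 1 < s1.size(); i += 2) {
        out.push_back(s0[i]);
        out.push_back(s0[i + 1]);
        out.push_back(s1[i]);
        out.push_back(s1[i + 1]);
    }
    return out;
}
```

With this layout each stream carries half of the data, so a test bench could prepare the two input files with a split like the one above and reassemble the result from the two outputs with the matching merge.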