The load-splitting algorithm is used to split the input matrix into sub-matrices. The algorithm aims to balance the number of projection operations across the kernels, whilst ensuring that each kernel can process its sub-matrix within its local memory constraints. The number of projection operations is defined as the number of times a column vector is projected onto another column vector during the modified Gram-Schmidt process.
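As an illustration of the balancing idea only, the following minimal sketch splits columns into contiguous groups so that each kernel receives a similar number of projection operations. It is not the logic used by qrd_col_dist.py. It assumes that column j (0-based) is projected onto the j columns before it, giving dim_cols * (dim_cols - 1) / 2 projections in total, and it ignores the local memory constraint, the data type, the AIE variant, and the number of rows. The function name split_columns and the greedy strategy are hypothetical.

```python
def split_columns(dim_cols, casc_len):
    """Return a list with the number of columns given to each kernel.

    Hypothetical greedy split: columns stay contiguous, and each kernel
    takes columns while doing so moves its projection count closer to an
    equal share of the work that is still unassigned.
    """
    if not 1 <= casc_len <= dim_cols:
        raise ValueError("casc_len must be between 1 and dim_cols")

    # Assumed cost model: column j is projected onto j earlier columns.
    costs = list(range(dim_cols))
    remaining_cost = sum(costs)

    counts = []
    start = 0
    for k in range(casc_len):
        kernels_left = casc_len - k
        if kernels_left == 1:
            # The last kernel takes every column that is left.
            end = dim_cols
        else:
            target = remaining_cost / kernels_left
            end = start + 1          # every kernel gets at least one column
            acc = costs[start]
            # Leave at least one column for each kernel still to come.
            max_end = dim_cols - (kernels_left - 1)
            while (end < max_end
                   and abs(acc + costs[end] - target) <= abs(acc - target)):
                acc += costs[end]
                end += 1
        counts.append(end - start)
        remaining_cost -= sum(costs[start:end])
        start = end
    return counts


if __name__ == "__main__":
    print(split_columns(8, 2))
```

Under this cost model, later columns are more expensive than earlier ones, so a balanced split tends to give the first kernels more columns than the last ones. The actual script may differ, since it also accounts for the memory available to each kernel.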
The load-splitting algorithm is implemented in the qrd_col_dist.py script, which is provided as part of the QRD library element. The script outputs the number of columns assigned to each kernel in the cascade, so users can run it with the appropriate parameters to see the load split in advance. The script takes the following parameters:
- aie_variant: The AIE variant being used (e.g., aie, aie-ml, aie-mlv2).
- data_type: The data type being used (e.g., float, cfloat).
- dim_rows: The number of rows in the input matrix.
- dim_cols: The number of columns in the input matrix.
- casc_len: The number of kernels in the cascade.
- num_frames: The number of frames being processed.