```bash
python qrd_col_dist.py -params aie_variant,data_type,dim_rows,dim_cols,casc_len,num_frames
```
Due to the nature of the algorithm, it is not always possible to perfectly balance the projection operations across the kernels. However, the algorithm aims to minimize the difference in projection operations between the kernels as much as possible.
The following example shows the load split for a 32x32 matrix with a cascade length of 4 (`casc_len=4`):

| Kernel | K0 | K1 | K2 | K3 |
|---|---|---|---|---|
| Columns | 16 | 6 | 5 | 5 |
| Projection operations | 120 | 111 | 120 | 145 |
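The projection counts in the table can be reproduced with a short calculation, assuming that column `j` (0-indexed) needs `j` projection operations, one against each preceding column, as in Gram-Schmidt-style QR. This assumption matches every entry above, and it shows why the kernels handling later columns receive fewer columns. The function `projections_per_kernel` is a hypothetical helper for illustration only, not part of `qrd_col_dist.py`.

```python
def projections_per_kernel(cols_per_kernel):
    """Count projection operations per kernel for a contiguous column split.

    Assumption (hypothetical cost model): column j (0-indexed) needs j
    projections, one against each preceding column.
    """
    counts = []
    start = 0
    for n_cols in cols_per_kernel:
        end = start + n_cols
        # Kernel owns columns [start, end); sum their projection counts.
        counts.append(sum(range(start, end)))
        start = end
    return counts


split = [16, 6, 5, 5]  # column split from the 32x32, casc_len=4 example
loads = projections_per_kernel(split)
print(loads)                    # [120, 111, 120, 145]
print(sum(loads))               # 496, i.e. 32 * 31 // 2
print(max(loads) - min(loads))  # 34, the imbalance between kernels
```

Under this model the total work is fixed at `n * (n - 1) / 2` projections for an `n`-column matrix, so a split only redistributes that total; because columns are indivisible and each one costs more than the last, an exactly equal distribution is generally not achievable.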