A configuration is defined in the kernel implementation cpp file:
#if (QRF_A_ROWS <= 1024 && QRF_A_ROWS > 512)
const int PowUnroll_t = 10;
const int PowNCU_t = 5;
#elif (QRF_A_ROWS <= 512 && QRF_A_ROWS > 256)
const int PowUnroll_t = 9;
const int PowNCU_t = 5;
#elif (QRF_A_ROWS <= 256 && QRF_A_ROWS > 128)
const int PowUnroll_t = 8;
const int PowNCU_t = 5;
#elif (QRF_A_ROWS <= 128 && QRF_A_ROWS > 64)
const int PowUnroll_t = 7;
const int PowNCU_t = 2;
#elif (QRF_A_ROWS <= 64 && QRF_A_ROWS > 32)
const int PowUnroll_t = 6;
const int PowNCU_t = 2;
#elif (QRF_A_ROWS <= 32 && QRF_A_ROWS > 16)
const int PowUnroll_t = 5;
const int PowNCU_t = 2;
#elif (QRF_A_ROWS <= 16 && QRF_A_ROWS > 8)
const int PowUnroll_t = 4;
const int PowNCU_t = 2;
#elif (QRF_A_ROWS <= 8 && QRF_A_ROWS > 4)
const int PowUnroll_t = 3;
const int PowNCU_t = 1;
#endif
const int POWFoldRow_t = 2;
const int NCU_t = 1 << PowNCU_t;
const int UnrollSize_t = 1 << (PowUnroll_t - POWFoldRow_t);
The kernel can therefore deduce the right configuration (PowUnroll_t and PowNCU_t) automatically once the input matrix rows and columns are set appropriately. The selection shown here depends on QRF_A_ROWS and covers row counts from 5 to 1024.
You can also supply your own parameter list for your use case, provided it preserves the relationships between these parameters.
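The preprocessor selection can also be expressed as constexpr functions, which makes it easy to inspect the deduced values for a given row count. This is a minimal sketch and not the kernel's own code: deducePowUnroll, deducePowNCU, DerivedConfig and deduce are hypothetical names introduced here, and the sketch only mirrors the table for row counts from 5 to 1024. It assumes a C++14 compiler.

```cpp
// Hypothetical helpers mirroring the #if/#elif table; not part of the kernel source.

// Smallest p with (1 << p) >= rows, matching the table for 4 < rows <= 1024.
// Returns -1 for row counts the table does not cover.
constexpr int deducePowUnroll(int rows) {
    if (rows <= 4 || rows > 1024) return -1;
    int p = 3;
    while ((1 << p) < rows) ++p;
    return p;
}

// PowNCU_t as chosen by the table: 5 above 128 rows, 2 above 8 rows, else 1.
constexpr int deducePowNCU(int rows) {
    if (rows <= 4 || rows > 1024) return -1;
    if (rows > 128) return 5;
    if (rows > 8) return 2;
    return 1;
}

constexpr int POWFoldRow = 2; // same value as POWFoldRow_t in the kernel file

struct DerivedConfig {
    int powUnroll;  // PowUnroll_t
    int powNCU;     // PowNCU_t
    int ncu;        // NCU_t = 1 << PowNCU_t
    int unrollSize; // UnrollSize_t = 1 << (PowUnroll_t - POWFoldRow_t)
};

constexpr DerivedConfig deduce(int rows) {
    const int pu = deducePowUnroll(rows);
    const int pn = deducePowNCU(rows);
    if (pu < 0) return DerivedConfig{-1, -1, 0, 0}; // outside the table
    return DerivedConfig{pu, pn, 1 << pn, 1 << (pu - POWFoldRow)};
}

// Compile-time checks against values read directly from the table.
static_assert(deduce(1024).powUnroll == 10, "1024 rows -> PowUnroll_t = 10");
static_assert(deduce(1024).ncu == 32, "1024 rows -> NCU_t = 32");
static_assert(deduce(1024).unrollSize == 256, "1024 rows -> UnrollSize_t = 256");
static_assert(deduce(100).powUnroll == 7, "100 rows -> PowUnroll_t = 7");
static_assert(deduce(100).ncu == 4, "100 rows -> NCU_t = 4");
```

Because every check is a static_assert, compiling the file is enough to confirm that the sketch agrees with the table for the sampled row counts.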
The base configuration class uses the following template arguments:
template <1024, 256, 10, 2, 32, 5, float>
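Read against the definitions above, the seven arguments appear to line up as rows, columns, PowUnroll_t, POWFoldRow_t, NCU_t, PowNCU_t and the element type. This mapping is inferred from the values and is not stated in the source. Under that assumption, a custom parameter list could be sanity-checked at compile time as sketched below. QrfParams and BaseParams are hypothetical stand-ins, not the library's configuration class, and the third check (PowUnroll not smaller than PowFoldRow) is an inferred requirement that keeps the shift amount non-negative.

```cpp
// Hypothetical stand-in for a configuration parameter list. The argument order
// is an assumption inferred from the base values 1024, 256, 10, 2, 32, 5, float.
template <int Rows, int Cols, int PowUnroll, int PowFoldRow, int NCU, int PowNCU, typename T>
struct QrfParams {
    // Relationships taken from the kernel file: the table picks PowUnroll so
    // that 1 << PowUnroll covers the row count, and NCU_t = 1 << PowNCU_t.
    static_assert((1 << PowUnroll) >= Rows, "1 << PowUnroll must cover all rows");
    static_assert(NCU == (1 << PowNCU), "NCU must equal 1 << PowNCU");
    // Inferred: the shift below must not be negative.
    static_assert(PowUnroll >= PowFoldRow, "PowUnroll must not be smaller than PowFoldRow");

    // UnrollSize_t = 1 << (PowUnroll_t - POWFoldRow_t)
    static constexpr int unrollSize = 1 << (PowUnroll - PowFoldRow);
    using value_type = T;
};

// The base values from the text pass all three checks.
using BaseParams = QrfParams<1024, 256, 10, 2, 32, 5, float>;
static_assert(BaseParams::unrollSize == 256, "base configuration unroll size");
```

An inconsistent list, for example one with NCU set to 16 while PowNCU stays at 5, would fail to compile with the corresponding message once the type is instantiated.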