This example resides in the L2/benchmarks/memKernel/gemm_4CU directory. The tutorial provides a step-by-step guide that covers the commands for building and running the kernel. The kernel performs the matrix-matrix multiplication A * B = C, where M is the number of rows of matrices A and C, K is the number of columns of matrix A (which equals the number of rows of matrix B), and N is the number of columns of matrices B and C.
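The dimension conventions above can be made concrete with a small host-side reference. The following is a minimal sketch, not the kernel's implementation: the function name `gemm` and the list-of-rows matrix layout are illustrative assumptions. It shows how an M x K matrix A and a K x N matrix B produce an M x N matrix C, and a reference of this kind could be used to check kernel output on small inputs.

```python
def gemm(A, B):
    """Hypothetical naive reference GEMM: returns C = A * B.

    A is M x K and B is K x N, both stored as lists of rows;
    the result C is M x N.
    """
    M = len(A)     # rows of A (and of C)
    K = len(A[0])  # columns of A
    if len(B) != K:
        # K must also be the number of rows of B
        raise ValueError("number of columns of A must equal number of rows of B")
    N = len(B[0])  # columns of B (and of C)

    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0
            # Dot product of row i of A with column j of B
            for k in range(K):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C


# Usage: M = 2, K = 3, N = 2
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
print(gemm(A, B))  # [[58, 64], [139, 154]]
```

The triple loop follows the textbook definition C[i][j] = sum over k of A[i][k] * B[k][j]. It is kept deliberately simple for readability; the accelerated kernel is expected to tile and parallelize this computation very differently.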