kernel m_mat_vec_mulKernels [TP_CASC_LEN *TP_SSR]
The kernels that will be created and mapped onto AIE tiles. TP_CASC_LEN determines the number of kernels connected in series via a cascade interface. TP_SSR cascaded kernel chains are computed in parallel, so there are TP_CASC_LEN * TP_SSR kernels in total.
port_conditional_array <input, (TP_USE_MATRIX_RELOAD==0), (TP_SSR*TP_CASC_LEN)> inA
Input to the function, Matrix A. This should be stored in column major format (TP_DIM_A_LEADING = 1), where the elements of each column are stored contiguously in memory. Row major format (TP_DIM_A_LEADING = 0) will be transposed using DMA buffer descriptors. However, this is only supported when TT_DATA_A is cint16, int32, or float and TP_NUM_FRAMES = 1. Configurations with other TT_DATA_A types or with TP_NUM_FRAMES > 1 are not supported by the DMA transpose feature and must be input in column major format (TP_DIM_A_LEADING = 1).
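For configurations where the DMA transpose is not available, the data can be reordered before it reaches the graph. The following is a minimal host-side sketch of that reordering, not part of the library: toColumnMajor is a hypothetical helper, and it treats the matrix as a single rows x cols block without accounting for how data is distributed across SSR or cascade kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (not a library function): reorders a
// row-major rows x cols matrix into column-major order, so that the
// elements of each column become contiguous in memory.
template <typename T>
std::vector<T> toColumnMajor(const std::vector<T>& rowMajor,
                             std::size_t rows,
                             std::size_t cols) {
    assert(rowMajor.size() == rows * cols);
    std::vector<T> colMajor(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Row-major index is r * cols + c;
            // column-major index is c * rows + r.
            colMajor[c * rows + r] = rowMajor[r * cols + c];
        }
    }
    return colMajor;
}
```

For a 2 x 3 matrix with rows {1, 2, 3} and {4, 5, 6}, the reordered buffer holds the columns back to back: 1, 4, 2, 5, 3, 6.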
The dimensions of the matrix are specified by template parameters TP_DIM_A (number of rows), and TP_DIM_B (number of columns).
TP_DIM_A must be a multiple of (256 / 8 / sizeof(TT_DATA_A)) * TP_SSR, and TP_DIM_B must be a multiple of (256 / 8 / sizeof(TT_DATA_B)) * TP_CASC_LEN.
The matrix data can be zero-padded to achieve this requirement.
The number of samples to each Matrix A iobuffer will be (TP_DIM_A / TP_SSR) * (TP_DIM_B / TP_CASC_LEN) * TP_NUM_FRAMES.
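The alignment rules and the per-iobuffer sample count above can be worked through with a small calculation. The sketch below is illustrative only: roundUp, samplesPer256Bits, MatrixAPlan, and planMatrixA are hypothetical names, and the element sizes are passed in as byte counts standing in for sizeof(TT_DATA_A) and sizeof(TT_DATA_B).

```cpp
#include <cassert>
#include <cstddef>

// Rounds value up to the next multiple of `multiple`.
inline std::size_t roundUp(std::size_t value, std::size_t multiple) {
    assert(multiple > 0);
    return ((value + multiple - 1) / multiple) * multiple;
}

// Number of samples of the given byte size that fit in 256 bits,
// i.e. the (256 / 8 / sizeof(T)) term used in the constraints.
inline std::size_t samplesPer256Bits(std::size_t sizeofType) {
    assert(sizeofType > 0 && sizeofType <= 32);
    return 256 / 8 / sizeofType;
}

// Hypothetical result type: padded dimensions and per-iobuffer sample count.
struct MatrixAPlan {
    std::size_t dimA;             // padded row count (TP_DIM_A)
    std::size_t dimB;             // padded column count (TP_DIM_B)
    std::size_t samplesPerBuffer; // samples per Matrix A iobuffer
};

// Pads the requested rows/cols up to the required multiples and applies
// (TP_DIM_A / TP_SSR) * (TP_DIM_B / TP_CASC_LEN) * TP_NUM_FRAMES.
inline MatrixAPlan planMatrixA(std::size_t rows, std::size_t cols,
                               std::size_t sizeofA, std::size_t sizeofB,
                               std::size_t ssr, std::size_t cascLen,
                               std::size_t numFrames) {
    const std::size_t dimA = roundUp(rows, samplesPer256Bits(sizeofA) * ssr);
    const std::size_t dimB = roundUp(cols, samplesPer256Bits(sizeofB) * cascLen);
    return {dimA, dimB, (dimA / ssr) * (dimB / cascLen) * numFrames};
}
```

As a worked case, a 100 x 50 matrix of 2-byte samples with TP_SSR = 2, TP_CASC_LEN = 2, and TP_NUM_FRAMES = 1 needs both dimensions to be multiples of 16 * 2 = 32, so it pads to 128 x 64, and each Matrix A iobuffer receives (128 / 2) * (64 / 2) * 1 = 2048 samples.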
port_conditional_array <input, (TP_USE_MATRIX_RELOAD==1), (TP_SSR*TP_CASC_LEN)> matrixA
RTP port input for Matrix A when TP_USE_MATRIX_RELOAD = 1.
port <input> inB [TP_SSR *TP_CASC_LEN *(TP_DUAL_IP+1)]
Input to the function, Vector B. The dimensions of the vector are specified by template parameter TP_DIM_B (equal to number of columns in Matrix A).
TP_DIM_B must be a multiple of (256 / 8 / sizeof(TT_DATA_B)) * TP_CASC_LEN.
The vector data can be zero-padded to achieve this requirement.
The number of samples to the Vector B iobuffer will be (TP_DIM_B / TP_CASC_LEN) * TP_NUM_FRAMES.
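Zero-padding Vector B to the required length can be done on the host before the data is passed to the graph. The sketch below is one possible way to do it and is not part of the library: padVectorB and samplesPerVectorBBuffer are hypothetical helpers, and the host element type T is assumed to have the same size in bytes as TT_DATA_B.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side helper: appends zeros so that the vector length
// becomes a multiple of (256 / 8 / sizeof(T)) * cascLen.
// Assumes sizeof(T) matches sizeof(TT_DATA_B).
template <typename T>
std::vector<T> padVectorB(std::vector<T> data, std::size_t cascLen) {
    const std::size_t multiple = (256 / 8 / sizeof(T)) * cascLen;
    assert(multiple > 0);
    const std::size_t padded =
        ((data.size() + multiple - 1) / multiple) * multiple;
    data.resize(padded, T{}); // new elements are value-initialized to zero
    return data;
}

// Samples per Vector B iobuffer: (TP_DIM_B / TP_CASC_LEN) * TP_NUM_FRAMES.
inline std::size_t samplesPerVectorBBuffer(std::size_t dimB,
                                           std::size_t cascLen,
                                           std::size_t numFrames) {
    assert(cascLen > 0 && dimB % cascLen == 0);
    return (dimB / cascLen) * numFrames;
}
```

For example, a 50-element vector of std::int16_t with TP_CASC_LEN = 2 is padded to 64 elements, and with TP_NUM_FRAMES = 3 each Vector B iobuffer then receives (64 / 2) * 3 = 96 samples.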
port <output> out [TP_SSR *(TP_NUM_OUTPUTS)]
The output data of the function. For cascaded designs, this is located at the end of the cascaded kernel chain. The number of output ports will be equal to the number of SSR ranks ( TP_SSR ).
The output type depends on the types of the matrix and vector (TT_DATA_A and TT_DATA_B). The vector result of the matrix-vector multiplication has TP_DIM_A elements. The number of samples to the Output iobuffer will be (TP_DIM_A / TP_CASC_LEN) * TP_NUM_FRAMES.