kernel m_mat_vec_mulKernels [TP_CASC_LEN *TP_SSR]
The kernels that will be created and mapped onto AIE tiles. TP_CASC_LEN determines the number of kernels connected to each other in series via a cascade interface. There will be (TP_SSR) cascaded kernel chains computed in parallel. Therefore, there will be (TP_CASC_LEN) * (TP_SSR) total kernels.
port <input> inA [TP_CASC_LEN *TP_SSR]
Input to the function, Matrix A. This should be stored in a column major format (TP_DIM_A_LEADING = 1), where each column of data is stored contiguously in memory. Row major format (TP_DIM_A_LEADING = 0) will be transposed using DMA buffer descriptors. However, this is only supported when TT_DATA_A is cint16, int32, or float and TP_NUM_FRAMES = 1. Configurations with other TT_DATA_A types or TP_NUM_FRAMES > 1 are not supported by the DMA transpose feature and must be input in column major format (TP_DIM_A_LEADING = 1).
The dimensions of the matrix are specified by template parameters TP_DIM_A (number of rows), and TP_DIM_B (number of columns).
TP_DIM_A must be a multiple of (256 / 8 / sizeof(TT_DATA_A)) * TP_SSR, and TP_DIM_B must be a multiple of (256 / 8 / sizeof(TT_DATA_B)) * TP_CASC_LEN.
The matrix data can be zero-padded to achieve this requirement.
The number of samples to each Matrix A iobuffer will be (TP_DIM_A / TP_SSR) * (TP_DIM_B / TP_CASC_LEN) * TP_NUM_FRAMES.
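The sizing and layout rules above can be checked on the host before the graph is built. The following is a minimal sketch, not part of the library: the helper names (requiredMultiple, padUp, packColumnMajor), the 16-bit element type, and the example dimensions are assumptions made for illustration. How the packed buffer is then distributed across the inA ports is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: elements per 256-bit vector, times the split factor
// (TP_SSR for TP_DIM_A, TP_CASC_LEN for TP_DIM_B).
constexpr std::size_t requiredMultiple(std::size_t elemBytes, std::size_t split) {
    return (256 / 8 / elemBytes) * split;
}

// Hypothetical helper: round n up to the next multiple of m.
constexpr std::size_t padUp(std::size_t n, std::size_t m) {
    return ((n + m - 1) / m) * m;
}

// Copy a row-major rows x cols matrix into a zero-padded, column-major
// buffer of padRows x padCols elements (each column stored contiguously).
template <typename T>
std::vector<T> packColumnMajor(const std::vector<T>& rowMajor,
                               std::size_t rows, std::size_t cols,
                               std::size_t padRows, std::size_t padCols) {
    assert(rowMajor.size() == rows * cols);
    assert(padRows >= rows && padCols >= cols);
    std::vector<T> out(padRows * padCols, T{});
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[c * padRows + r] = rowMajor[r * cols + c];
    return out;
}

// Example values (assumed): 16-bit elements for A and B,
// TP_SSR = 2, TP_CASC_LEN = 2, TP_NUM_FRAMES = 1.
constexpr std::size_t kSsr = 2, kCascLen = 2, kNumFrames = 1;
// A 50 x 40 matrix is padded to the next legal TP_DIM_A x TP_DIM_B.
constexpr std::size_t kDimA = padUp(50, requiredMultiple(sizeof(std::int16_t), kSsr));
constexpr std::size_t kDimB = padUp(40, requiredMultiple(sizeof(std::int16_t), kCascLen));
constexpr std::size_t kSamplesPerPortA = (kDimA / kSsr) * (kDimB / kCascLen) * kNumFrames;

static_assert(kDimA == 64 && kDimB == 64, "both dimensions pad up to 64");
static_assert(kSamplesPerPortA == 1024, "samples per inA iobuffer");
```

With these example values, each dimension must be a multiple of 32, so the 50 x 40 matrix is zero-padded to 64 x 64 and each inA iobuffer receives 1024 samples.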
port <input> inB [TP_CASC_LEN *TP_SSR]
Input to the function, Vector B. The dimensions of the vector are specified by template parameter TP_DIM_B (equal to number of columns in Matrix A).
TP_DIM_B must be a multiple of (256 / 8 / sizeof(TT_DATA_B)) * TP_CASC_LEN.
The vector data can be zero-padded to achieve this requirement.
The number of samples to the Vector B iobuffer will be (TP_DIM_B / TP_CASC_LEN) * TP_NUM_FRAMES.
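The same padding approach applies to Vector B. The sketch below is illustrative only: padVector and samplesPerPortB are hypothetical helper names, and the values in the static_assert are example values rather than library defaults.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: zero-pad vector b up to paddedLen elements.
template <typename T>
std::vector<T> padVector(const std::vector<T>& b, std::size_t paddedLen) {
    assert(paddedLen >= b.size());
    std::vector<T> out(b);
    out.resize(paddedLen, T{});
    return out;
}

// Samples per inB iobuffer: (TP_DIM_B / TP_CASC_LEN) * TP_NUM_FRAMES.
constexpr std::size_t samplesPerPortB(std::size_t dimB, std::size_t cascLen,
                                      std::size_t numFrames) {
    return (dimB / cascLen) * numFrames;
}

// Example (assumed values): TP_DIM_B = 64, TP_CASC_LEN = 2, TP_NUM_FRAMES = 1.
static_assert(samplesPerPortB(64, 2, 1) == 32, "samples per inB iobuffer");
```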
port <output> out [TP_SSR]
The output data of the function. For cascaded designs, this is located at the end of the cascaded kernel chain. The number of output ports will be equal to the number of SSR ranks (TP_SSR).
The output type will depend on the type of the matrix and vector (TT_DATA_A and TT_DATA_B). The vector result of the matrix-vector multiplication will be the size of TP_DIM_A. The number of samples to each Output iobuffer will be (TP_DIM_A / TP_SSR) * TP_NUM_FRAMES.
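For checking results on the host, a simple reference model of one frame can be useful. The sketch below is an assumption-laden illustration, not the library's implementation: refMatVecMul is a hypothetical name, and it uses a single element type T for A, B, and the result, whereas the actual output type depends on TT_DATA_A and TT_DATA_B. Matrix A is indexed in column major order, matching TP_DIM_A_LEADING = 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference y = A * b for one frame.
// a holds dimA rows x dimB columns in column-major order; b holds dimB elements.
template <typename T>
std::vector<T> refMatVecMul(const std::vector<T>& a, const std::vector<T>& b,
                            std::size_t dimA, std::size_t dimB) {
    assert(a.size() == dimA * dimB && b.size() == dimB);
    std::vector<T> y(dimA, T{});  // the result has TP_DIM_A elements
    for (std::size_t c = 0; c < dimB; ++c)       // walk each contiguous column
        for (std::size_t r = 0; r < dimA; ++r)
            y[r] += a[c * dimA + r] * b[c];
    return y;
}
```

Because zero-padded rows and columns contribute only zero products, the padded rows of the result are zero and the unpadded rows are unchanged.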