matrix_vector_mul performs General Matrix-Vector Multiplication (GEMV), multiplying an input matrix by an input vector, with configurable data types and dimensions.
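As a point of reference for what GEMV computes, the following is a minimal host-side sketch, not the library's implementation. The function name gemv_ref is hypothetical, only int16 inputs are modelled, and no shift, rounding or saturation is applied. It assumes Matrix A has TP_DIM_A rows and TP_DIM_B columns and is stored column-major, which is the order the kernels consume (see TP_DIM_A_LEADING).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model (not part of the library): y = A * x.
// A has dimA rows and dimB columns and is stored column-major,
// i.e. element (r, c) lives at A[c * dimA + r].
std::vector<int64_t> gemv_ref(const std::vector<int16_t>& A,
                              const std::vector<int16_t>& x,
                              unsigned dimA, unsigned dimB) {
    assert(A.size() == std::size_t(dimA) * dimB && x.size() == dimB);
    std::vector<int64_t> y(dimA, 0);  // wide accumulators, no shift or saturation
    for (unsigned c = 0; c < dimB; ++c) {
        for (unsigned r = 0; r < dimA; ++r) {
            // Accumulate column c of A scaled by element c of the vector.
            y[r] += static_cast<int64_t>(A[std::size_t(c) * dimA + r]) * x[c];
        }
    }
    return y;
}
```

For example, the 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored column by column is {1, 4, 2, 5, 3, 6}; with x = {1, 1, 2}, gemv_ref(A, x, 2, 3) returns {9, 21}.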
These are the templates to configure the matrix-vector multiplier.
Note: The rounding modes rnd_sym_floor and rnd_sym_ceil are only supported on AIE-ML devices.
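To illustrate how the two AIE-ML-only modes differ during the shift-down stage, here is a minimal sketch. It assumes that rnd_sym_floor rounds toward zero and rnd_sym_ceil rounds away from zero; the helper names are hypothetical and are not the library's API. It also assumes the magnitude of the accumulator is small enough that adding 2^shift - 1 cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not the library's API) modelling a power-of-2 shift
// down, ASSUMING rnd_sym_floor = round toward zero and
// rnd_sym_ceil = round away from zero. shift mirrors TP_SHIFT (0 to 61).

// Symmetric floor: drop the fractional bits of the magnitude (toward zero).
int64_t shift_sym_floor(int64_t acc, unsigned shift) {
    assert(shift < 62);
    const int64_t mag = acc < 0 ? -acc : acc;
    const int64_t q = mag >> shift;
    return acc < 0 ? -q : q;
}

// Symmetric ceil: round the magnitude up if any fractional bit is set
// (away from zero).
int64_t shift_sym_ceil(int64_t acc, unsigned shift) {
    assert(shift < 62);
    const int64_t mag = acc < 0 ? -acc : acc;
    const int64_t q = (mag + ((int64_t{1} << shift) - 1)) >> shift;
    return acc < 0 ? -q : q;
}
```

Under these assumptions, shifting -7 down by 1 gives -3 with shift_sym_floor and -4 with shift_sym_ceil, while 7 gives 3 and 4 respectively; the results are symmetric about zero, which is the property the "sym" in the names refers to.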
Parameters:
TT_DATA_A | describes the data type of the input samples of Matrix A. This is a typename and must be one of the following: int16, cint16, int32, cint32, float, cfloat. |
TT_DATA_B | describes the data type of the input samples of Vector B. This is a typename and must be one of the following: int16, cint16, int32, cint32, float, cfloat. |
TP_DIM_A | is an unsigned integer which describes the number of elements along the unique dimension (rows) of Matrix A. |
TP_DIM_B | is an unsigned integer which describes the number of elements in Vector B and the number of columns in Matrix A. |
TP_SHIFT | describes the power-of-2 shift down applied to the accumulated multiply terms before output. TP_SHIFT must be in the range 0 to 61. |
TP_RND | describes the selection of rounding to be applied during the shift down stage of processing. Although TP_RND accepts unsigned integer values, descriptive macros are recommended, where
|
TP_NUM_FRAMES | describes the number of batches of input data that will be processed per iteration. |
TP_CASC_LEN | describes the number of AIE kernels the matrix-vector multiplication will be divided into in series. Each kernel receives an equal-sized split (along the common dimension) of the matrix and vector, and passes its partial result for the output to the next kernel in the chain via the cascade stream. TP_CASC_LEN must be in the range 1 (default) to 16. |
TP_SAT | describes the selection of saturation to be applied during the shift down stage of processing. TP_SAT accepts unsigned integer values, where:
|
TP_SSR | describes the number of kernels (or cascaded kernel chains) that will compute the matrix-vector multiplication in parallel. Each SSR rank receives an equal-sized split (along the unique dimension) of the Matrix A data. Vector B is not split across SSR ranks; it is only split when TP_CASC_LEN > 1. The Vector B inputs to a chain of cascaded kernels are the same for every SSR rank. |
TP_DIM_A_LEADING | describes the leading dimension of the Matrix A data. If TP_DIM_A_LEADING=1, the columns of the matrix are contiguous in memory (column-major order). This is the only order in which Matrix A data is consumed during the computation. If TP_DIM_A_LEADING=0, the rows of the matrix input are contiguous in memory (row-major order), and the data is transposed at the input ports of each kernel using DMA buffer descriptors. This feature is currently only supported when TT_DATA_A is cint16, int32 or float, and TP_NUM_FRAMES=1. If TT_DATA_A is int16, cint32 or cfloat, or TP_NUM_FRAMES > 1, the input matrix data must be transposed to column-major order outside the graph port connection, and TP_DIM_A_LEADING must be set to 1. |
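When TP_DIM_A_LEADING=0 is not supported for a configuration, the matrix has to be reordered before it reaches the graph. The following is a minimal host-side sketch of that reordering; the function name to_column_major is hypothetical and not part of the library. For TP_NUM_FRAMES > 1 it would presumably be applied to each frame separately, which is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (not part of the library): converts a
// row-major dimA x dimB matrix (rows contiguous, as for TP_DIM_A_LEADING = 0)
// into column-major order (columns contiguous, as for TP_DIM_A_LEADING = 1).
template <typename T>
std::vector<T> to_column_major(const std::vector<T>& rowMajor,
                               std::size_t dimA, std::size_t dimB) {
    assert(rowMajor.size() == dimA * dimB);
    std::vector<T> colMajor(dimA * dimB);
    for (std::size_t r = 0; r < dimA; ++r) {
        for (std::size_t c = 0; c < dimB; ++c) {
            // Element (r, c): row-major index r*dimB + c, column-major index c*dimA + r.
            colMajor[c * dimA + r] = rowMajor[r * dimB + c];
        }
    }
    return colMajor;
}
```

For example, the 2x3 matrix [[1, 2, 3], [4, 5, 6]] is {1, 2, 3, 4, 5, 6} in row-major order and becomes {1, 4, 2, 5, 3, 6} in column-major order.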
template <typename TT_DATA_A, typename TT_DATA_B, unsigned int TP_DIM_A, unsigned int TP_DIM_B, unsigned int TP_SHIFT, unsigned int TP_RND, unsigned int TP_NUM_FRAMES, unsigned int TP_CASC_LEN, unsigned int TP_SAT, unsigned int TP_SSR, unsigned int TP_DIM_A_LEADING>
class matrix_vector_mul_graph : public graph

Fields:
kernel m_mat_vec_mulKernels[TP_CASC_LEN * TP_SSR]
port<input> inA[TP_CASC_LEN * TP_SSR]
port<input> inB[TP_CASC_LEN * TP_SSR]
port<output> out[TP_SSR]
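The port arrays above hold TP_CASC_LEN * TP_SSR matrix and vector inputs but only TP_SSR outputs. The following sketch models why, under stated assumptions: it is not the library's implementation, the function name gemv_tiled is hypothetical, only int16 data is modelled, and TP_DIM_A and TP_DIM_B are assumed to divide evenly by TP_SSR and TP_CASC_LEN. Each (rank, stage) tile corresponds to one kernel; the stages of a chain add their partial sums in turn, as the cascade stream would, so each SSR rank produces one block of output rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model (not the library's implementation) of the partitioning:
// SSR ranks split the rows of A (unique dimension); cascade stages split the
// columns of A and the elements of B (common dimension).
// A is dimA x dimB, column-major.
std::vector<int64_t> gemv_tiled(const std::vector<int16_t>& A,
                                const std::vector<int16_t>& x,
                                unsigned dimA, unsigned dimB,
                                unsigned casc, unsigned ssr) {
    assert(A.size() == std::size_t(dimA) * dimB && x.size() == dimB);
    assert(dimA % ssr == 0 && dimB % casc == 0);
    const unsigned rowsPerRank = dimA / ssr;    // rows handled by each SSR rank
    const unsigned colsPerStage = dimB / casc;  // columns handled by each cascade stage
    std::vector<int64_t> y(dimA, 0);
    for (unsigned s = 0; s < ssr; ++s) {        // one output port per rank
        for (unsigned k = 0; k < casc; ++k) {   // one kernel per cascade stage
            for (unsigned c = k * colsPerStage; c < (k + 1) * colsPerStage; ++c) {
                for (unsigned r = s * rowsPerRank; r < (s + 1) * rowsPerRank; ++r) {
                    // Stage k's partial product is added to the running sum of rank s.
                    y[r] += static_cast<int64_t>(A[std::size_t(c) * dimA + r]) * x[c];
                }
            }
        }
    }
    return y;
}
```

Because integer addition is associative, any TP_CASC_LEN and TP_SSR split of this kind yields the same result as an unsplit GEMV; in hardware, the split instead trades kernel count for per-kernel workload.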