Overview - 2024.1 English

Vitis Libraries

Release Date: 2024-08-06
Version: 2024.1 English

matrix_vector_mul performs General Matrix Vector Multiplication (GEMV), which multiplies a matrix input by a vector input of configurable data types and dimensions.

These are the template parameters used to configure the matrix vector multiplier.

Note: Rounding modes rnd_sym_floor and rnd_sym_ceil are only supported on AIE-ML devices.

Parameters:

TT_DATA_A

describes the data type of the input samples of Matrix A.

This is a typename and must be one of the following:

int16, cint16, int32, cint32, float, cfloat.

TT_DATA_B

describes the data type of the input samples of Vector B.

This is a typename and must be one of the following:

int16, cint16, int32, cint32, float, cfloat.

TP_DIM_A

is an unsigned integer which describes the number of elements along the unique dimension (rows) of Matrix A.

TP_DIM_B

is an unsigned integer which describes the number of elements in Vector B and the number of columns in Matrix A.

TP_SHIFT

describes the power-of-2 shift down applied to the accumulation of product terms before output.

TP_SHIFT must be in the range 0 to 61.

TP_RND

describes the selection of rounding to be applied during the shift down stage of processing.

Although TP_RND accepts unsigned integer values, descriptive macros are recommended, where:

  • rnd_floor = Truncate LSB, always round down (towards negative infinity).

  • rnd_ceil = Always round up (towards positive infinity).

  • rnd_sym_floor = Truncate LSB, always round towards 0.

  • rnd_sym_ceil = Always round up towards infinity (away from zero).

  • rnd_pos_inf = Round halfway towards positive infinity.

  • rnd_neg_inf = Round halfway towards negative infinity.

  • rnd_sym_inf = Round halfway towards infinity (away from zero).

  • rnd_sym_zero = Round halfway towards zero (away from infinity).

  • rnd_conv_even = Round halfway towards nearest even number.

  • rnd_conv_odd = Round halfway towards nearest odd number.

No rounding is performed on ceil or floor mode variants.

Other modes round to the nearest integer. They differ only in how they round values whose fractional part is exactly 0.5.
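The rounding descriptions above can be illustrated with a small scalar model. This is a minimal sketch in plain C++, not the library's implementation: the enum Round and the function shift_round are hypothetical names introduced here, the enum values are not the values of the library macros, and rnd_sym_ceil is assumed to mean rounding away from zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mode names for this sketch; they are NOT the library macros.
enum class Round { Floor, Ceil, SymFloor, SymCeil, PosInf, NegInf,
                   SymInf, SymZero, ConvEven, ConvOdd };

// Divide acc by 2^shift and round according to mode (scalar reference model).
inline std::int64_t shift_round(std::int64_t acc, unsigned shift, Round mode) {
    assert(shift <= 61);  // documented TP_SHIFT range is 0 to 61
    if (shift == 0) return acc;
    const std::int64_t one = std::int64_t{1} << shift;
    const std::int64_t half = one / 2;
    // Floor division: acc == q * one + rem with 0 <= rem < one.
    std::int64_t q = acc / one;
    std::int64_t rem = acc % one;
    if (rem < 0) { rem += one; --q; }
    const bool exact = (rem == 0);
    const bool negative = (acc < 0);
    const bool qOdd = (q % 2 != 0);
    switch (mode) {  // truncating variants: no round-to-nearest
        case Round::Floor:    return q;
        case Round::Ceil:     return exact ? q : q + 1;
        case Round::SymFloor: return (negative && !exact) ? q + 1 : q;
        case Round::SymCeil:  return (!negative && !exact) ? q + 1 : q;
        default: break;
    }
    // Remaining modes round to nearest and differ only on exact ties.
    if (rem != half) return rem > half ? q + 1 : q;
    switch (mode) {
        case Round::PosInf:   return q + 1;
        case Round::NegInf:   return q;
        case Round::SymInf:   return negative ? q : q + 1;
        case Round::SymZero:  return negative ? q + 1 : q;
        case Round::ConvEven: return qOdd ? q + 1 : q;
        case Round::ConvOdd:  return qOdd ? q : q + 1;
        default:              return q;
    }
}
```

For example, with a shift of 1 the accumulator value 5 represents 2.5: the model returns 2 for Round::Floor and Round::ConvEven, and 3 for Round::Ceil and Round::PosInf.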

TP_NUM_FRAMES

describes the number of batches of input data that will be processed per iteration.

TP_CASC_LEN

describes the number of AIE kernels the matrix-vector multiplication will be divided into in series.

Each kernel will receive an equal-sized split (along the common dimension) of the matrix and vector, and will pass the partial computation of the output to the next kernel in the chain via the cascade stream. TP_CASC_LEN must be in the range 1 (default) to 16.
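The cascade split can be pictured with a host-side reference model. The following is a minimal sketch in plain C++ under stated assumptions: gemv_cascade is a hypothetical function, Matrix A is assumed to be stored column-major, TP_DIM_B is assumed to divide evenly by the cascade length, and a running vector of partial sums stands in for the cascade stream. It does not reflect how the AIE kernels are actually written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference model of y = A * b computed by cascLen kernels in series.
// a is column-major: element (row, col) lives at a[col * dimA + row].
std::vector<std::int64_t> gemv_cascade(const std::vector<std::int32_t>& a,
                                       const std::vector<std::int32_t>& b,
                                       std::size_t dimA, std::size_t dimB,
                                       std::size_t cascLen) {
    assert(cascLen >= 1 && cascLen <= 16);  // documented TP_CASC_LEN range
    assert(dimB % cascLen == 0);            // assumption: equal-sized split
    assert(a.size() == dimA * dimB && b.size() == dimB);
    const std::size_t colsPerKernel = dimB / cascLen;
    // Partial sums passed from one kernel to the next (models the cascade).
    std::vector<std::int64_t> partial(dimA, 0);
    for (std::size_t k = 0; k < cascLen; ++k) {
        // Kernel k owns a slice of the common dimension: columns of A and
        // the matching elements of Vector B.
        for (std::size_t col = k * colsPerKernel; col < (k + 1) * colsPerKernel; ++col) {
            for (std::size_t row = 0; row < dimA; ++row) {
                partial[row] += static_cast<std::int64_t>(a[col * dimA + row]) * b[col];
            }
        }
    }
    return partial;  // the last kernel in the chain holds the full result
}
```

Because each kernel only adds its own slice of products to the running sums, the result is the same for any valid cascade length; only the division of work changes.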

TP_SAT

describes the selection of saturation to be applied during the shift down stage of processing.

TP_SAT accepts unsigned integer values, where:

  • 0: none = No saturation is performed and the value is truncated on the MSB side.
  • 1: saturate = Default. Saturation clamps an n-bit signed value to the range [-2^(n-1) : +2^(n-1) - 1].
  • 3: symmetric = Symmetric saturation clamps an n-bit signed value to the range [-(2^(n-1) - 1) : +2^(n-1) - 1].
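The three saturation settings can be compared with a scalar model. This is a minimal sketch in plain C++: the enum Sat, the function apply_sat and its nBits parameter are hypothetical names introduced for illustration (in the library the output width follows from the data type), and mode 0 is modelled as keeping the low n bits of the value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical enum mirroring the documented TP_SAT values 0, 1 and 3.
enum class Sat : unsigned { None = 0, Saturate = 1, Symmetric = 3 };

// Reduce a wide value v to an nBits-wide signed result (scalar model).
inline std::int64_t apply_sat(std::int64_t v, unsigned nBits, Sat mode) {
    assert(nBits >= 2 && nBits <= 32);
    const std::int64_t hi = (std::int64_t{1} << (nBits - 1)) - 1;  // +2^(n-1) - 1
    const std::int64_t lo = -hi - 1;                               // -2^(n-1)
    switch (mode) {
        case Sat::Saturate:  return std::clamp(v, lo, hi);
        case Sat::Symmetric: return std::clamp(v, -hi, hi);
        case Sat::None: break;
    }
    // No saturation: drop the upper bits and sign-extend what is left.
    const std::uint64_t mask = (std::uint64_t{1} << nBits) - 1;
    const std::uint64_t low = static_cast<std::uint64_t>(v) & mask;
    const std::int64_t wrapped = static_cast<std::int64_t>(low);
    return (wrapped > hi) ? wrapped - (std::int64_t{1} << nBits) : wrapped;
}
```

With a 16-bit output, the value 40000 wraps to -25536 without saturation and becomes 32767 with either saturation mode, while -40000 becomes -32768 with mode 1 and -32767 with mode 3.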
TP_SSR

describes the number of kernels (or cascaded kernel chains) that will compute the matrix-vector multiplication in parallel. Each SSR rank will receive an equal sized split (along the unique dimension) of Matrix A data.

There is no splitting of the vector data when TP_SSR > 1 (only split when TP_CASC_LEN > 1). The Vector B inputs across a chain of cascaded kernels will be the same across all SSR ranks.
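The SSR split can be sketched in the same host-side style. The following plain C++ is illustrative only: ssr_rows and gemv_ssr are hypothetical helpers, Matrix A is assumed to be column-major, and each rank is assumed to take a contiguous block of rows, which the text above does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RowRange { std::size_t begin; std::size_t end; };  // half-open [begin, end)

// Rows of Matrix A handled by one SSR rank (assumes contiguous blocks).
inline RowRange ssr_rows(std::size_t rank, std::size_t dimA, std::size_t ssr) {
    assert(ssr >= 1 && rank < ssr);
    assert(dimA % ssr == 0);  // assumption: equal-sized split
    const std::size_t rowsPerRank = dimA / ssr;
    return {rank * rowsPerRank, (rank + 1) * rowsPerRank};
}

// One output vector per rank; every rank reads the whole of Vector B.
// a is column-major: element (row, col) lives at a[col * dimA + row].
std::vector<std::vector<std::int64_t>> gemv_ssr(const std::vector<std::int32_t>& a,
                                                const std::vector<std::int32_t>& b,
                                                std::size_t dimA, std::size_t dimB,
                                                std::size_t ssr) {
    assert(a.size() == dimA * dimB && b.size() == dimB);
    std::vector<std::vector<std::int64_t>> out(ssr);
    for (std::size_t rank = 0; rank < ssr; ++rank) {
        const RowRange r = ssr_rows(rank, dimA, ssr);
        out[rank].assign(r.end - r.begin, 0);
        for (std::size_t col = 0; col < dimB; ++col) {
            for (std::size_t row = r.begin; row < r.end; ++row) {
                out[rank][row - r.begin] +=
                    static_cast<std::int64_t>(a[col * dimA + row]) * b[col];
            }
        }
    }
    return out;
}
```

Each rank produces its own slice of the output, which matches the out[TP_SSR] port array in the class declaration: one output per rank, with no combining step between ranks.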

TP_DIM_A_LEADING

describes the leading dimension of the Matrix A data. If TP_DIM_A_LEADING=1, the columns of the matrix are contiguous in memory. This is the only supported order of Matrix A input data when doing the computation.

However, if TP_DIM_A_LEADING=0, the rows of the matrix input are contiguous in memory and will be transposed at the input ports of each kernel using DMA Buffer Descriptors. This feature is currently only supported when TT_DATA_A is cint16, int32 or float, and TP_NUM_FRAMES=1.

If TT_DATA_A is int16, cint32 or cfloat, or TP_NUM_FRAMES > 1, the input matrix data must be transposed to column-major order outside the graph, before the port connection, and TP_DIM_A_LEADING must be set to 1.
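When the transposition has to be done outside the graph, a host-side reordering along the following lines could be used. This is a minimal sketch in plain C++ for a single frame; row_major_to_col_major is a hypothetical helper and is not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorder a dimA x dimB matrix from row-major (rows contiguous) to
// column-major (columns contiguous), as required when TP_DIM_A_LEADING=1.
template <typename T>
std::vector<T> row_major_to_col_major(const std::vector<T>& rowMajor,
                                      std::size_t dimA, std::size_t dimB) {
    assert(rowMajor.size() == dimA * dimB);
    std::vector<T> colMajor(rowMajor.size());
    for (std::size_t row = 0; row < dimA; ++row) {
        for (std::size_t col = 0; col < dimB; ++col) {
            // (row, col): index row * dimB + col in row-major,
            //             index col * dimA + row in column-major.
            colMajor[col * dimA + row] = rowMajor[row * dimB + col];
        }
    }
    return colMajor;
}
```

For a 2 x 3 matrix stored as 1 2 3 4 5 6 in row-major order, the helper returns 1 4 2 5 3 6, so that each column of Matrix A is contiguous in memory.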

template <
    typename TT_DATA_A,
    typename TT_DATA_B,
    unsigned int TP_DIM_A,
    unsigned int TP_DIM_B,
    unsigned int TP_SHIFT,
    unsigned int TP_RND,
    unsigned int TP_NUM_FRAMES,
    unsigned int TP_CASC_LEN,
    unsigned int TP_SAT,
    unsigned int TP_SSR,
    unsigned int TP_DIM_A_LEADING
    >
class matrix_vector_mul_graph: public graph

// fields

kernel m_mat_vec_mulKernels[TP_CASC_LEN * TP_SSR]
port <input> inA[TP_CASC_LEN * TP_SSR]
port <input> inB[TP_CASC_LEN * TP_SSR]
port <output> out[TP_SSR]
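
The array sizes in the fields above follow directly from TP_CASC_LEN and TP_SSR. The sketch below, in plain C++, collects the documented numeric constraints and port counts in one place so a configuration can be sanity-checked before it is passed to the graph template. MvmConfig and the helper functions are hypothetical names and not part of the library, and the data type restrictions on TP_DIM_A_LEADING=0 are not modelled.

```cpp
#include <cassert>

// Hypothetical plain-data mirror of the numeric template parameters.
struct MvmConfig {
    unsigned dimA;
    unsigned dimB;
    unsigned shift;
    unsigned numFrames;
    unsigned cascLen;
    unsigned ssr;
    unsigned dimALeading;
};

// Checks only the numeric constraints stated in the parameter descriptions.
constexpr bool is_valid(const MvmConfig& c) {
    return c.shift <= 61                              // TP_SHIFT range 0 to 61
        && c.cascLen >= 1 && c.cascLen <= 16          // TP_CASC_LEN range 1 to 16
        && c.ssr >= 1
        && c.dimALeading <= 1
        && (c.dimALeading == 1 || c.numFrames == 1);  // row-major input needs one frame
}

// Size of each of the inA and inB port arrays.
constexpr unsigned num_input_ports(const MvmConfig& c) { return c.cascLen * c.ssr; }

// Size of the out port array: one output per SSR rank.
constexpr unsigned num_output_ports(const MvmConfig& c) { return c.ssr; }

// Size of the kernel array, checked at run time against the constraints.
inline unsigned num_kernels(const MvmConfig& c) {
    assert(is_valid(c));
    return c.cascLen * c.ssr;
}
```

For example, a configuration with a cascade length of 4 and an SSR of 2 uses 8 kernels, 8 inA ports, 8 inB ports and 2 out ports.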