Sliding Multiplication - 2024.2 English

AI Engine-ML Kernel and Graph Programming Guide (UG1603)

Document ID: UG1603
Release Date: 2024-11-28
Version: 2024.2 English

The AI Engine API supports a form of multi-lane multiplication called sliding multiplication. Multiple lanes perform multiply-accumulate (MAC) operations simultaneously, and the products of each lane are summed into the corresponding accumulator lane.

The classes that implement these multiplications are named aie::sliding_mul*_ops. They accept coefficient and data inputs and include:

  • aie::sliding_mul_ops
  • aie::sliding_mul_x_ops
  • aie::sliding_mul_y_ops
  • aie::sliding_mul_xy_ops

For more information about these APIs and supported parameters, see the AI Engine API User Guide (UG1529).

For example, the aie::sliding_mul_ops class provides a parameterized multiplication that implements the following compute pattern.

DSX = DataStepX
DSY = DataStepY
CS = CoeffStep
P = Points
L = Lanes
c_s = coeff_start
d_s = data_start 
out[0] = coeff[c_s] * data[d_s + 0] + coeff[c_s + CS] * data[d_s + DSX] + ... + coeff[c_s + (P-1) * CS] * data[d_s + (P-1) * DSX]
out[1] = coeff[c_s] * data[d_s + DSY] + coeff[c_s + CS] * data[d_s + DSY + DSX] + ... + coeff[c_s + (P-1) * CS] * data[d_s + DSY + (P-1) * DSX]
...
out[L-1] = coeff[c_s] * data[d_s + (L-1) * DSY] + coeff[c_s + CS] * data[d_s + (L-1) * DSY + DSX] + ... + coeff[c_s + (P-1) * CS] * data[d_s + (L-1) * DSY + (P-1) * DSX]
Table 1. Template Parameters

Parameter    Description
Lanes        Number of output elements.
Points       Number of data elements used to compute each lane.
CoeffStep    Step used to select elements from the coeff register. This step is applied to element selection within a lane.
DataStepX    Step used to select elements from the data register. This step is applied to element selection within a lane.
DataStepY    Step used to select elements from the data register. This step is applied to element selection across lanes.
CoeffType    Coefficient element type.
DataType     Data element type.
AccumTag     Accumulator tag that specifies the required accumulation bits. The class must be compatible with the result of the multiplication of the coefficient and data types (real/complex).
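
The compute pattern and parameters above can be modeled in plain C++ to check expected results on a host. The following is a minimal sketch and not part of the AI Engine API: sliding_mul_ref and example_4tap are hypothetical names, registers are modeled as std::vector with wrap-around indexing (an assumption based on the circular-register behavior of sliding multiplication), and 64-bit integers stand in for the accumulator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference model of the sliding multiplication pattern.
// It is NOT an AI Engine API function; it only mirrors the out[l] formulas.
template <unsigned Lanes, unsigned Points, int CoeffStep = 1,
          int DataStepX = 1, int DataStepY = DataStepX>
std::array<std::int64_t, Lanes> sliding_mul_ref(
    const std::vector<std::int32_t>& coeff, unsigned coeff_start,
    const std::vector<std::int32_t>& data, unsigned data_start)
{
    assert(!coeff.empty() && !data.empty());
    const long long nc = static_cast<long long>(coeff.size());
    const long long nd = static_cast<long long>(data.size());

    // Assumption: indices wrap around the register (also for negative steps).
    auto wrap = [](long long i, long long n) {
        return static_cast<std::size_t>(((i % n) + n) % n);
    };

    std::array<std::int64_t, Lanes> out{};
    for (unsigned l = 0; l < Lanes; ++l) {        // DataStepY moves across lanes
        for (unsigned p = 0; p < Points; ++p) {   // CoeffStep/DataStepX move within a lane
            const long long ci = coeff_start + static_cast<long long>(p) * CoeffStep;
            const long long di = data_start + static_cast<long long>(l) * DataStepY
                               + static_cast<long long>(p) * DataStepX;
            out[l] += static_cast<std::int64_t>(coeff[wrap(ci, nc)]) * data[wrap(di, nd)];
        }
    }
    return out;
}

// Example: 4 lanes, 4 points, all steps equal to 1.
std::array<std::int64_t, 4> example_4tap()
{
    const std::vector<std::int32_t> coeff{1, 2, 3, 4};
    const std::vector<std::int32_t> data{0, 1, 2, 3, 4, 5, 6, 7};
    return sliding_mul_ref<4, 4>(coeff, 0, data, 0);  // lanes: 20, 30, 40, 50
}
```

With all steps set to 1, each lane computes the same coefficient window against a data window shifted by one element, which is the access pattern of a FIR filter.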

The following figure shows how to use the aie::sliding_mul_ops class and its member function, mul, to perform the sliding multiplication. It also shows how each parameter corresponds to the multiplication.

Figure 1. sliding_mul_ops Usage Example

Besides the aie::sliding_mul*_ops classes, the AI Engine API provides aie::sliding_mul* functions for sliding multiplication and aie::sliding_mac* functions for sliding multiplication with accumulation. These functions are convenience helpers that use the aie::sliding_mul*_ops classes internally. They include:

  • aie::sliding_mul
  • aie::sliding_mac
The following examples perform sliding multiplications (template prototypes are in comments for quick reference).
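aie::vector<int16,16> va;        // coefficients
aie::vector<int16,64> vb0, vb1;  // data
// 8 lanes, 8 points per lane: multiply coefficients starting at va[0] with data from vb0
aie::accum<acc48,8> acc = aie::sliding_mul_ops<8, 8, 1, 1, 1, int16, int16, acc48>::mul(va, 0, vb0, 0);
// accumulate the products of coefficients starting at va[8] with data from vb1
acc = aie::sliding_mul_ops<8, 8, 1, 1, 1, int16, int16, acc48>::mac(acc, va, 8, vb1, 0);
// shift down by 15 bits, convert to int32, and write to the output
*outIter++ = acc.to_vector<int32>(15);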

/*template<unsigned Lanes, unsigned Points, int CoeffStep = 1, int DataStepX = 1, int DataStepY = DataStepX, AccumElemBaseType AccumTag = accauto, VectorOrOp VecCoeff = void, VectorOrOp VecData = void>
auto sliding_mul (const VecCoeff &coeff, unsigned coeff_start, const VecData &data, unsigned data_start)
*/

aie::vector<cint16,32> data_buff; 
aie::vector<cint16,8> coeff_buff; 
aie::accum<cacc48,8> acc_buff = aie::sliding_mul<8, 8>(coeff_buff, 0, data_buff, 0);
Note: All registers used in sliding multiplication are treated as circular. Indexing wraps around to the start of the register after it reaches the end.
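
The wrap-around behavior described in the note can be illustrated in plain C++. This is a sketch under the assumption that circular indexing is equivalent to taking the index modulo the register size; circular_index and visited_indices are hypothetical helpers, not AI Engine API functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps start + offset onto a register of `size` elements,
// wrapping around at both ends (assumed model of a circular register).
std::size_t circular_index(long long start, long long offset, std::size_t size)
{
    assert(size > 0);
    const long long n = static_cast<long long>(size);
    return static_cast<std::size_t>((((start + offset) % n) + n) % n);
}

// Lists the element indices that one lane would read for a given start,
// number of points, and step.
std::vector<std::size_t> visited_indices(unsigned start, unsigned points,
                                         int step, std::size_t size)
{
    std::vector<std::size_t> idx;
    for (unsigned p = 0; p < points; ++p) {
        idx.push_back(circular_index(start, static_cast<long long>(p) * step, size));
    }
    return idx;
}
```

For an 8-element register, visited_indices(6, 4, 1, 8) yields 6, 7, 0, 1: the last two reads wrap to the start of the register instead of running past its end.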

Considerations When Using sliding_mul

The current restriction is:

  • Data width <= 1024 bits and coefficient width <= 512 bits
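
The width limits can be checked at compile time before a sliding multiplication is instantiated. The following is a minimal sketch in plain C++, assuming hypothetical helpers register_bits and fits_sliding_mul (not AI Engine API functions) and element sizes of 16 bits for int16 and 32 bits for cint16.

```cpp
#include <cstddef>

// Limits taken from the restriction above.
constexpr std::size_t kMaxDataBits  = 1024;
constexpr std::size_t kMaxCoeffBits = 512;

// Hypothetical helper: total width of a register in bits.
constexpr std::size_t register_bits(std::size_t elem_bits, std::size_t elems)
{
    return elem_bits * elems;
}

// Hypothetical helper: true when both registers respect the limits.
constexpr bool fits_sliding_mul(std::size_t data_bits, std::size_t coeff_bits)
{
    return data_bits <= kMaxDataBits && coeff_bits <= kMaxCoeffBits;
}

// 64 x int16 data (1024 bits) with 16 x int16 coefficients (256 bits).
static_assert(fits_sliding_mul(register_bits(16, 64), register_bits(16, 16)),
              "int16 example exceeds the sliding_mul width limits");

// 32 x cint16 data (1024 bits) with 8 x cint16 coefficients (256 bits).
static_assert(fits_sliding_mul(register_bits(32, 32), register_bits(32, 8)),
              "cint16 example exceeds the sliding_mul width limits");

// 64 x cint16 data would be 2048 bits and is rejected.
static_assert(!fits_sliding_mul(register_bits(32, 64), register_bits(32, 8)),
              "2048-bit data register should not fit");
```

Both vector configurations used in the examples on this page fall within the limits under these assumptions.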