The row-wise accumulator is implemented by the L1 primitive cscRow. This primitive multiplies the values of multiple NNZ entries by their corresponding dense column vector values and accumulates the products according to their row indices. The basic functions used by this primitive are xBarRow, rowMemAcc, and rowAgg. The xBarRow primitive includes formRowEntry logic, which multiplies the NNZ values by the corresponding input column vector entries, and split and merge logic, which distributes the products to the corresponding row banks. The rowMemAcc primitives accumulate the intermediate results in on-chip memories. Multiple on-chip memory buffers are provided to remove the pipeline bubbles that floating-point accumulation would otherwise cause. The rowAgg primitive collects the results from all accumulators and outputs them in sequence.
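The dataflow described above can be illustrated with a small behavioral model. This is a minimal Python sketch under stated assumptions, not the library's HLS implementation: the function names (form_row_entries, split_to_banks, row_mem_acc, row_agg, csc_row), the rule that rows are interleaved across banks by `row % num_banks`, and the round-robin choice of accumulation buffer are all hypothetical, chosen only to show the roles of formRowEntry, split/merge, rowMemAcc, and rowAgg.

```python
from dataclasses import dataclass


@dataclass
class RowEntry:
    row: int      # row index of the NNZ entry
    value: float  # NNZ value multiplied by its dense column vector value


def form_row_entries(nnz_vals, col_vec_vals, row_idxs):
    """Multiply each NNZ value by its column vector value (formRowEntry role)."""
    return [RowEntry(r, a * x) for a, x, r in zip(nnz_vals, col_vec_vals, row_idxs)]


def split_to_banks(entries, num_banks):
    """Route each product to the bank that owns its row (split/merge role).
    Assumption (hypothetical): rows are interleaved across banks by row % num_banks."""
    banks = [[] for _ in range(num_banks)]
    for e in entries:
        banks[e.row % num_banks].append(e)
    return banks


def row_mem_acc(bank_entries, rows_per_bank, num_banks, num_buffers):
    """Accumulate one bank's products into num_buffers copies of its memory.
    Successive products go to successive buffers (round-robin, an assumption),
    so in hardware no addition waits on the previous one into the same buffer."""
    buffers = [[0.0] * rows_per_bank for _ in range(num_buffers)]
    for i, e in enumerate(bank_entries):
        local_row = e.row // num_banks  # address of this row inside its bank
        buffers[i % num_buffers][local_row] += e.value
    return buffers


def row_agg(bank_buffers, num_rows, num_banks):
    """Sum the partial buffers of each bank and emit results in row order (rowAgg role)."""
    y = []
    for row in range(num_rows):
        buffers = bank_buffers[row % num_banks]
        y.append(sum(buf[row // num_banks] for buf in buffers))
    return y


def csc_row(nnz_vals, col_vec_vals, row_idxs, num_rows, num_banks=4, num_buffers=4):
    """End-to-end model of the row-wise accumulator (hypothetical parameters)."""
    entries = form_row_entries(nnz_vals, col_vec_vals, row_idxs)
    banks = split_to_banks(entries, num_banks)
    rows_per_bank = (num_rows + num_banks - 1) // num_banks  # ceil(num_rows / num_banks)
    bank_buffers = [row_mem_acc(b, rows_per_bank, num_banks, num_buffers) for b in banks]
    return row_agg(bank_buffers, num_rows, num_banks)


if __name__ == "__main__":
    # A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSC order, x = [1, 2, 3].
    # col_vec_vals repeats x[j] once for every NNZ entry of column j.
    nnz_vals = [1.0, 4.0, 3.0, 2.0, 5.0]
    row_idxs = [0, 2, 1, 0, 2]
    col_vec_vals = [1.0, 1.0, 2.0, 3.0, 3.0]
    print(csc_row(nnz_vals, col_vec_vals, row_idxs, num_rows=3))  # [7.0, 6.0, 19.0]
```

Splitting each bank's memory into several buffers trades on-chip storage for throughput: the final per-row sum is deferred to the aggregation step, where the partial buffers are added together once.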
For more information, see Row-wise Accumulator Implementation.