MAC intrinsics perform vector multiply-accumulate operations on data from two buffers, the X and Z buffers. The remaining parameters and options provide flexibility (data selection within the vectors, number of output lanes) and optional features (different input data sizes, pre-adding, etc.). An additional input buffer, the Y buffer, holds values that can be pre-added to those from the X buffer before the multiplication occurs. The result of the intrinsic is added to an accumulator.
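The arithmetic described above can be sketched as a scalar reference model. This is a minimal illustration, not the intrinsic itself: the name `mac_preadd_model`, the `Operands` alias, the assumption that operands have already been selected per lane and column, and the 64-bit accumulator type are all choices made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Operands already selected for each lane (outer index) and column (inner index).
// Hypothetical alias used only in this sketch.
using Operands = std::vector<std::vector<std::int32_t>>;

// Hypothetical scalar model: acc[lane] += sum over columns of (x + y) * z.
// Pass y filled with zeros to model a MAC without pre-adding.
std::vector<std::int64_t> mac_preadd_model(std::vector<std::int64_t> acc,
                                           const Operands& x,
                                           const Operands& y,
                                           const Operands& z) {
    // One row of operands per output lane.
    assert(x.size() == acc.size() && y.size() == acc.size() && z.size() == acc.size());
    for (std::size_t lane = 0; lane < acc.size(); ++lane) {
        // Every buffer contributes one operand per column.
        assert(y[lane].size() == x[lane].size() && z[lane].size() == x[lane].size());
        for (std::size_t col = 0; col < x[lane].size(); ++col) {
            // Pre-add X and Y, then multiply by Z and accumulate.
            std::int64_t pre = std::int64_t(x[lane][col]) + y[lane][col];
            acc[lane] += pre * z[lane][col];
        }
    }
    return acc;
}
```

Each output lane accumulates one product per column, so the number of columns sets how many multiplications contribute to each lane of the accumulator.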
The parameters of the intrinsics allow flexible data selection from the different input buffers for each lane and column, and all follow the same pattern. The (x/y/z) start parameter gives the starting point in the buffer, selecting the first element for the first row and the first column. To allow flexibility for each lane, the (x/y/z) offsets parameter provides an offset value for each lane that is added to the starting point. Finally, the (x/y/z) step parameter defines the step in data selection between consecutive columns, relative to the previous position. Note that when ystep is not specified in the intrinsic, it is the symmetric of xstep.
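The start/offsets/step pattern amounts to the rule index(lane, col) = start + offsets[lane] + col * step. The sketch below models that rule only. `Select`, `select_index`, and `selection_table` are hypothetical names. Offsets are held as plain integers rather than in whatever encoding the intrinsics use, and permute-granularity constraints and any buffer wrap-around are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle for one input buffer (not an intrinsic type).
struct Select {
    int start;                 // first element for lane 0, column 0
    std::vector<int> offsets;  // one offset per lane, added to start
    int step;                  // increment between consecutive columns
};

// Element index chosen for a given lane and column.
inline int select_index(const Select& s, std::size_t lane, int col) {
    assert(lane < s.offsets.size());
    return s.start + s.offsets[lane] + col * s.step;
}

// Table of selected indices: one row per lane, one entry per column.
std::vector<std::vector<int>> selection_table(const Select& s, int columns) {
    std::vector<std::vector<int>> table(s.offsets.size());
    for (std::size_t lane = 0; lane < s.offsets.size(); ++lane) {
        for (int col = 0; col < columns; ++col) {
            table[lane].push_back(select_index(s, lane, col));
        }
    }
    return table;
}
```

For example, with start = 2, offsets = {0, 1, 2, 3} and step = 4, lane 2 reads index 4 in column 0 and index 8 in column 1.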
Main permute granularity for x/y and z buffers is 32 bits and 16 bits,
respectively. Complex numbers are considered as one entity for the permute (for
example, cint16 is treated as 32 bits for permute). The zstart parameter
must be a compile-time constant. 8-bit and 16-bit permute
granularity in x/y and 8-bit permute granularity in z have certain limitations as
addressed towards the end of this section. The following sections cover the
different data widths and explains the result of the MAC intrinsic on these data
widths.