In all the subsequent intrinsics, the input vectors go through a data shuffling function. Two parameters control this shuffling:
- Start
- Offset
Take the `fpmul` function as an example:

    vector<float,8> fpmul(vector<float,32> xbuf, int xstart, unsigned int xoffs, vector<float,8> zbuf, int zstart, unsigned int zoffs)

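- `xbuf`, `xstart`, `xoffs`: First buffer and its shuffling parameters.
- `zbuf`, `zstart`, `zoffs`: Second buffer and its shuffling parameters.
- Start (`xstart`, `zstart`): Starting offset applied to all lanes of the buffer.
- Offset (`xoffs`, `zoffs`): Additional lane-dependent offset into the buffer. Each lane's offset is encoded in 4 bits, with lane 0 in the least significant 4 bits.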
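For example, the call

    vector<float,8> ret = fpmul(xbuf,2,0x210FEDCB,zbuf,7,0x76543210);

computes the following, where `xoffs[i]` and `zoffs[i]` denote the 4-bit offset field of lane `i`:

    for (i = 0; i < 8; i++)
        ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]];
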
All values in hexadecimal:
| ret lane | xbuf start | xbuf offset | Final xbuf index | zbuf start | zbuf offset | Final zbuf index |
|---|---|---|---|---|---|---|
| 0 | 2 | B | D | 7 | 0 | 7 |
| 1 | 2 | C | E | 7 | 1 | 8 |
| 2 | 2 | D | F | 7 | 2 | 9 |
| 3 | 2 | E | 10 | 7 | 3 | A |
| 4 | 2 | F | 11 | 7 | 4 | B |
| 5 | 2 | 0 | 2 | 7 | 5 | C |
| 6 | 2 | 1 | 3 | 7 | 6 | D |
| 7 | 2 | 2 | 4 | 7 | 7 | E |
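
The index arithmetic in the table can be checked with a small scalar model. The following is a minimal sketch in plain C++, not the intrinsic itself: the helper names `lane_offset`, `final_index`, and `fpmul_model` are hypothetical, `std::array` stands in for the vector types, and start values are assumed to be non-negative. The wrap-around of indices that run past the end of a buffer (as the zbuf indices 8 to E do for an 8-element buffer) is an assumption made only so the model stays in bounds; the text above does not state how the hardware handles that case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: extract the 4-bit offset field for one lane.
// Lane 0 sits in the least significant 4 bits, as the table shows.
inline unsigned lane_offset(std::uint32_t offs, unsigned lane) {
    assert(lane < 8);  // a 32-bit word holds eight 4-bit fields
    return (offs >> (4 * lane)) & 0xFu;
}

// Hypothetical helper: final buffer index selected for one lane.
// Assumes start >= 0.
inline unsigned final_index(int start, std::uint32_t offs, unsigned lane) {
    return static_cast<unsigned>(start) + lane_offset(offs, lane);
}

// Scalar model of the data selection followed by a lane-wise multiply.
// ASSUMPTION: indices beyond the buffer length wrap modulo the length.
template <std::size_t NX, std::size_t NZ>
std::array<float, 8> fpmul_model(const std::array<float, NX>& xbuf,
                                 int xstart, std::uint32_t xoffs,
                                 const std::array<float, NZ>& zbuf,
                                 int zstart, std::uint32_t zoffs) {
    std::array<float, 8> ret{};
    for (unsigned i = 0; i < 8; ++i) {
        const std::size_t xi = final_index(xstart, xoffs, i) % NX;
        const std::size_t zi = final_index(zstart, zoffs, i) % NZ;
        ret[i] = xbuf[xi] * zbuf[zi];
    }
    return ret;
}
```

With the parameters from the example, `final_index(2, 0x210FEDCB, 3)` evaluates to 0x10 and `final_index(7, 0x76543210, 7)` to 0xE, matching the corresponding rows of the table.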