This table summarizes the I/O and compute balance for each supported combination of input data types and matrix sizes. The columns are defined as follows:

- Mat A Type: bit width of the matrix A data type (`bf16` is 16 bits wide).
- Mat B Type: bit width of the matrix B data type (`bf16` is 16 bits wide).
- Compute (MAC/cyc): number of multiply-accumulate operations that the AI Engine-ML vector processor performs in parallel per clock cycle.
- M: number of rows of matrix A.
- K: number of columns of matrix A, which equals the number of rows of matrix B.
- N: number of columns of matrix B.
- Mat A Size (B): bytes required to store matrix A.
- Mat B Size (B): bytes required to store matrix B.
- Load Mat A (cyc): cycles required to load matrix A.
- Load Mat B (cyc): cycles required to load matrix B.
- Compute (cyc): cycles required to perform the multiplication, that is, the M × K × N MACs divided by the MAC/cyc rate.
- Compute (%): vector processor utilization, that is, Compute (cyc) as a percentage of the largest of Load Mat A (cyc), Load Mat B (cyc), and Compute (cyc).
- IO A (%): Load Mat A (cyc) as a percentage of that same maximum.
- IO B (%): Load Mat B (cyc) as a percentage of that same maximum.
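The derived columns can be reproduced from M, K, N, the MAC/cyc rate, and the operand bit widths $b_A$ and $b_B$ taken from the Mat A Type and Mat B Type columns. The relations below are a sketch inferred from the table's own values, not from a hardware specification. In particular, the load columns are consistent with each matrix being loaded at 32 bytes (256 bits) per cycle; treat that rate as an assumption. Sizes are in bytes and all other quantities are in cycles.

$$
\begin{aligned}
\text{Compute} &= \frac{M K N}{\text{MAC/cyc}} \\
\text{Size}_A &= \frac{M K \, b_A}{8}, \qquad \text{Size}_B = \frac{K N \, b_B}{8} \\
\text{Load}_A &= \frac{\text{Size}_A}{32}, \qquad \text{Load}_B = \frac{\text{Size}_B}{32} \\
T &= \max(\text{Load}_A,\ \text{Load}_B,\ \text{Compute}) \\
\text{Compute (\%)} &= 100 \cdot \frac{\text{Compute}}{T}, \quad
\text{IO A (\%)} = 100 \cdot \frac{\text{Load}_A}{T}, \quad
\text{IO B (\%)} = 100 \cdot \frac{\text{Load}_B}{T}
\end{aligned}
$$

For example, for the 16b × 8b row with M = K = N = 4: Compute = 64 / 128 = 0.5 cycles, Size_A = 32 B and Size_B = 16 B, so Load_A = 1 cycle and Load_B = 0.5 cycles. The bound is T = 1, which gives a compute ratio of 0.5 (50%), an IO A ratio of 1 (100%), and an IO B ratio of 0.5 (50%).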
| Mat A Type | Mat B Type | Compute (MAC/cyc) | M | K | N | Mat A Size (B) | Mat B Size (B) | Load Mat A (cyc) | Load Mat B (cyc) | Compute (cyc) | Compute (%) | IO A (%) | IO B (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8b | 4b | 512 | 4 | 16 | 8 | 64 | 64 | 2 | 2 | 1 | 50 | 100 | 100 |
| 8b | 4b | 512 | 8 | 16 | 8 | 128 | 64 | 4 | 2 | 2 | 50 | 100 | 50 |
| 8b | 4b | 512 | 4 | 32 | 8 | 128 | 128 | 4 | 4 | 2 | 50 | 100 | 100 |
| 8b | 8b | 256 | 4 | 8 | 4 | 32 | 32 | 1 | 1 | 0.5 | 50 | 100 | 100 |
| 8b | 8b | 256 | 4 | 16 | 4 | 64 | 64 | 2 | 2 | 1 | 50 | 100 | 100 |
| 8b | 8b | 256 | 8 | 8 | 4 | 64 | 32 | 2 | 1 | 1 | 50 | 100 | 50 |
| 8b | 8b | 256 | 2 | 8 | 8 | 16 | 64 | 0.5 | 2 | 0.5 | 25 | 25 | 100 |
| 8b | 8b | 256 | 4 | 8 | 8 | 32 | 64 | 1 | 2 | 1 | 50 | 50 | 100 |
| 8b | 8b | 256 | 2 | 16 | 8 | 32 | 128 | 1 | 4 | 1 | 25 | 25 | 100 |
| 8b | 8b | 256 | 4 | 16 | 8 | 64 | 128 | 2 | 4 | 2 | 50 | 50 | 100 |
| 16b | 8b | 128 | 4 | 4 | 4 | 32 | 16 | 1 | 0.5 | 0.5 | 50 | 100 | 50 |
| 16b | 8b | 128 | 8 | 4 | 4 | 64 | 16 | 2 | 0.5 | 1 | 50 | 100 | 25 |
| 16b | 8b | 128 | 4 | 8 | 4 | 64 | 32 | 2 | 1 | 1 | 50 | 100 | 50 |
| 16b | 8b | 128 | 4 | 4 | 8 | 32 | 32 | 1 | 1 | 1 | 100 | 100 | 100 |
| 8b | 16b | 128 | 4 | 4 | 8 | 16 | 64 | 0.5 | 2 | 1 | 50 | 25 | 100 |
| 8b | 16b | 128 | 4 | 4 | 4 | 16 | 32 | 0.5 | 1 | 0.5 | 50 | 50 | 100 |
| 16b | 16b | 64 | 4 | 4 | 4 | 32 | 32 | 1 | 1 | 1 | 100 | 100 | 100 |
| 16b | 16b | 64 | 2 | 4 | 8 | 16 | 64 | 0.5 | 2 | 1 | 50 | 25 | 100 |
| 16b | 16b | 64 | 4 | 4 | 8 | 32 | 64 | 1 | 2 | 2 | 100 | 50 | 100 |
| 16b | 16b | 64 | 4 | 2 | 8 | 16 | 32 | 0.5 | 1 | 1 | 100 | 50 | 100 |
| 32b | 16b | 32 | 2 | 4 | 8 | 32 | 64 | 1 | 2 | 2 | 100 | 50 | 100 |
| 32b | 16b | 32 | 4 | 4 | 4 | 64 | 32 | 2 | 1 | 2 | 100 | 100 | 50 |
| 32b | 16b | 32 | 4 | 2 | 4 | 32 | 16 | 1 | 0.5 | 1 | 100 | 100 | 50 |
| 16b | 32b | 32 | 2 | 4 | 8 | 16 | 128 | 0.5 | 4 | 2 | 50 | 12.5 | 100 |
| 16b | 32b | 32 | 4 | 4 | 4 | 32 | 64 | 1 | 2 | 2 | 100 | 50 | 100 |
| 32b | 32b | 16 | 4 | 2 | 4 | 32 | 32 | 1 | 1 | 2 | 100 | 50 | 50 |
| 32b | 32b | 16 | 4 | 4 | 4 | 64 | 64 | 2 | 2 | 4 | 100 | 50 | 50 |
| 32b | 32b | 16 | 8 | 2 | 4 | 64 | 32 | 2 | 1 | 4 | 100 | 50 | 25 |
| bf16 | bf16 | 128 | 4 | 8 | 4 | 64 | 64 | 2 | 2 | 1 | 50 | 100 | 100 |
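The following Python sketch applies the column definitions above to every row of the table. For each pair of data types, it keeps the first tile shape that has the highest compute utilization. It assumes a load bandwidth of 32 bytes per operand per cycle, a rate inferred from the table's values rather than taken from AMD documentation. The names `Tile`, `balance`, `best_per_type_pair`, and `LOAD_BYTES_PER_CYCLE` are hypothetical illustrations and are not part of any AMD API.

```python
from dataclasses import dataclass

# Assumption inferred from the table: each operand is loaded at 32 bytes (256 bits) per cycle.
LOAD_BYTES_PER_CYCLE = 32

# Bit widths of the data types named in the table; bf16 is 16 bits wide.
TYPE_BITS = {"4b": 4, "8b": 8, "16b": 16, "32b": 32, "bf16": 16}


@dataclass(frozen=True)
class Tile:
    a_type: str
    b_type: str
    macs_per_cycle: int
    m: int
    k: int
    n: int

    def balance(self) -> dict:
        """Derive the table's size, cycle, and efficiency columns for this tile."""
        size_a = self.m * self.k * TYPE_BITS[self.a_type] // 8  # bytes of A (M x K)
        size_b = self.k * self.n * TYPE_BITS[self.b_type] // 8  # bytes of B (K x N)
        load_a = size_a / LOAD_BYTES_PER_CYCLE
        load_b = size_b / LOAD_BYTES_PER_CYCLE
        compute = self.m * self.k * self.n / self.macs_per_cycle  # M*K*N MACs at the MAC/cyc rate
        bound = max(load_a, load_b, compute)  # cycles of the limiting resource
        return {
            "size_a": size_a, "size_b": size_b,
            "load_a": load_a, "load_b": load_b, "compute": compute,
            "compute_pct": 100 * compute / bound,
            "io_a_pct": 100 * load_a / bound,
            "io_b_pct": 100 * load_b / bound,
        }


# (Mat A type, Mat B type, MAC/cyc, [(M, K, N), ...]) in the same order as the table rows.
GROUPS = [
    ("8b", "4b", 512, [(4, 16, 8), (8, 16, 8), (4, 32, 8)]),
    ("8b", "8b", 256, [(4, 8, 4), (4, 16, 4), (8, 8, 4), (2, 8, 8),
                       (4, 8, 8), (2, 16, 8), (4, 16, 8)]),
    ("16b", "8b", 128, [(4, 4, 4), (8, 4, 4), (4, 8, 4), (4, 4, 8)]),
    ("8b", "16b", 128, [(4, 4, 8), (4, 4, 4)]),
    ("16b", "16b", 64, [(4, 4, 4), (2, 4, 8), (4, 4, 8), (4, 2, 8)]),
    ("32b", "16b", 32, [(2, 4, 8), (4, 4, 4), (4, 2, 4)]),
    ("16b", "32b", 32, [(2, 4, 8), (4, 4, 4)]),
    ("32b", "32b", 16, [(4, 2, 4), (4, 4, 4), (8, 2, 4)]),
    ("bf16", "bf16", 128, [(4, 8, 4)]),
]
ROWS = [Tile(a, b, macs, m, k, n) for a, b, macs, shapes in GROUPS for (m, k, n) in shapes]


def best_per_type_pair(rows):
    """Keep, per (A type, B type), the first tile with the highest compute utilization."""
    best = {}
    for tile in rows:
        key = (tile.a_type, tile.b_type)
        if key not in best or tile.balance()["compute_pct"] > best[key].balance()["compute_pct"]:
            best[key] = tile
    return best


if __name__ == "__main__":
    for (a, b), t in best_per_type_pair(ROWS).items():
        print(f"{a} x {b}: M={t.m} K={t.k} N={t.n} -> compute {t.balance()['compute_pct']:g}%")
```

Ties go to the row listed first because the comparison is strict. For example, several 8b × 8b shapes reach 50%, and the sketch reports the 4 × 8 × 4 tile. The selection reflects only this table's single-tile balance model.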