Matrix Multiplication Compute Performance of the AI Engine ML Tiles

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2026-03-27
Version: 2025.2 English

The following table summarizes the I/O and compute balance for each supported combination of input data types and matrix sizes. The columns are defined as follows:

  • Mat A Type: bit width of the matrix A data type (bf16 is 16 bits wide)

  • Mat B Type: bit width of the matrix B data type (bf16 is 16 bits wide)

  • Compute (MAC/cyc): parallel multiply‑accumulate operations per clock cycle on the AI Engine‑ML vector processor

  • M: number of rows in matrix A

  • K: number of columns in matrix A, which equals the number of rows in matrix B

  • N: number of columns in matrix B

  • Mat A Size (B): bytes required to encode matrix A

  • Mat B Size (B): bytes required to encode matrix B

  • Load Mat A (cyc): cycles required to load matrix A

  • Load Mat B (cyc): cycles required to load matrix B

  • Compute (cyc): cycles required to perform the multiplication

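  • Compute (%): vector processor utilization, that is, Compute (cyc) divided by the largest of Load Mat A (cyc), Load Mat B (cyc), and Compute (cyc)

  • IO A (%): matrix A load utilization, that is, Load Mat A (cyc) divided by that same maximum

  • IO B (%): matrix B load utilization, that is, Load Mat B (cyc) divided by that same maximum

The three utilization columns are given as fractions, where 1 corresponds to 100%.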
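The derived columns can be written out explicitly. The formulas below are reconstructed from the table values, not quoted from AMD documentation. They assume that each matrix is loaded at 32 bytes (256 bits) per cycle, which matches every Load column in the table. Here b_A and b_B are the operand widths in bits, and P is the Compute (MAC/cyc) rate.

```latex
\begin{aligned}
S_A &= \frac{M K b_A}{8}, \qquad S_B = \frac{K N b_B}{8} \\
L_A &= \frac{S_A}{32}, \qquad L_B = \frac{S_B}{32}, \qquad C = \frac{M K N}{P} \\
T &= \max(L_A, L_B, C) \\
\text{Compute} &= \frac{C}{T}, \qquad \text{IO A} = \frac{L_A}{T}, \qquad \text{IO B} = \frac{L_B}{T}
\end{aligned}
```

For example, the first 8b × 4b row (M = 4, K = 16, N = 8, P = 512) gives S_A = S_B = 64 bytes, L_A = L_B = 2 cycles, and C = 1 cycle. That makes T = 2, so Compute (%) = 0.5 and both IO columns equal 1.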

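| Mat A Type | Mat B Type | Compute (MAC/cyc) | M | K | N | Mat A Size (B) | Mat B Size (B) | Load Mat A (cyc) | Load Mat B (cyc) | Compute (cyc) | Compute (%) | IO A (%) | IO B (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8b | 4b | 512 | 4 | 16 | 8 | 64 | 64 | 2 | 2 | 1 | 0.5 | 1 | 1 |
| 8b | 4b | 512 | 8 | 16 | 8 | 128 | 64 | 4 | 2 | 2 | 0.5 | 1 | 0.5 |
| 8b | 4b | 512 | 4 | 32 | 8 | 128 | 128 | 4 | 4 | 2 | 0.5 | 1 | 1 |
| 8b | 8b | 256 | 4 | 8 | 4 | 32 | 32 | 1 | 1 | 0.5 | 0.5 | 1 | 1 |
| 8b | 8b | 256 | 4 | 16 | 4 | 64 | 64 | 2 | 2 | 1 | 0.5 | 1 | 1 |
| 8b | 8b | 256 | 8 | 8 | 4 | 64 | 32 | 2 | 1 | 1 | 0.5 | 1 | 0.5 |
| 8b | 8b | 256 | 2 | 8 | 8 | 16 | 64 | 0.5 | 2 | 0.5 | 0.25 | 0.25 | 1 |
| 8b | 8b | 256 | 4 | 8 | 8 | 32 | 64 | 1 | 2 | 1 | 0.5 | 0.5 | 1 |
| 8b | 8b | 256 | 2 | 16 | 8 | 32 | 128 | 1 | 4 | 1 | 0.25 | 0.25 | 1 |
| 8b | 8b | 256 | 4 | 16 | 8 | 64 | 128 | 2 | 4 | 2 | 0.5 | 0.5 | 1 |
| 16b | 8b | 128 | 4 | 4 | 4 | 32 | 16 | 1 | 0.5 | 0.5 | 0.5 | 1 | 0.5 |
| 16b | 8b | 128 | 8 | 4 | 4 | 64 | 16 | 2 | 0.5 | 1 | 0.5 | 1 | 0.25 |
| 16b | 8b | 128 | 4 | 8 | 4 | 64 | 32 | 2 | 1 | 1 | 0.5 | 1 | 0.5 |
| 16b | 8b | 128 | 4 | 4 | 8 | 32 | 32 | 1 | 1 | 1 | 1 | 1 | 1 |
| 8b | 16b | 128 | 4 | 4 | 8 | 16 | 64 | 0.5 | 2 | 1 | 0.5 | 0.25 | 1 |
| 8b | 16b | 128 | 4 | 4 | 4 | 16 | 32 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 1 |
| 16b | 16b | 64 | 4 | 4 | 4 | 32 | 32 | 1 | 1 | 1 | 1 | 1 | 1 |
| 16b | 16b | 64 | 2 | 4 | 8 | 16 | 64 | 0.5 | 2 | 1 | 0.5 | 0.25 | 1 |
| 16b | 16b | 64 | 4 | 4 | 8 | 32 | 64 | 1 | 2 | 2 | 1 | 0.5 | 1 |
| 16b | 16b | 64 | 4 | 2 | 8 | 16 | 32 | 0.5 | 1 | 1 | 1 | 0.5 | 1 |
| 32b | 16b | 32 | 2 | 4 | 8 | 32 | 64 | 1 | 2 | 2 | 1 | 0.5 | 1 |
| 32b | 16b | 32 | 4 | 4 | 4 | 64 | 32 | 2 | 1 | 2 | 1 | 1 | 0.5 |
| 32b | 16b | 32 | 4 | 2 | 4 | 32 | 16 | 1 | 0.5 | 1 | 1 | 1 | 0.5 |
| 16b | 32b | 32 | 2 | 4 | 8 | 16 | 128 | 0.5 | 4 | 2 | 0.5 | 0.125 | 1 |
| 16b | 32b | 32 | 4 | 4 | 4 | 32 | 64 | 1 | 2 | 2 | 1 | 0.5 | 1 |
| 32b | 32b | 16 | 4 | 2 | 4 | 32 | 32 | 1 | 1 | 2 | 1 | 0.5 | 0.5 |
| 32b | 32b | 16 | 4 | 4 | 4 | 64 | 64 | 2 | 2 | 4 | 1 | 0.5 | 0.5 |
| 32b | 32b | 16 | 8 | 2 | 4 | 64 | 32 | 2 | 1 | 4 | 1 | 0.5 | 0.25 |
| bf16 | bf16 | 128 | 4 | 8 | 4 | 64 | 64 | 2 | 2 | 1 | 0.5 | 1 | 1 |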
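The derived columns of the table can be recomputed from the six input columns (the two operand widths, MAC/cyc, M, K, and N). The Python sketch below does this. The 32-byte-per-cycle load rate is an assumption inferred from the table, because every Load column equals the matrix size divided by 32. It is not a documented AMD API. The names `matmul_balance`, `Balance`, and `LOAD_BYTES_PER_CYCLE` are hypothetical.

```python
"""Sketch: recompute the derived columns of the AI Engine-ML matmul balance table.

Assumption (inferred from the table values, not from AMD documentation): each
matrix is loaded at 32 bytes (256 bits) per cycle, and the utilization columns
divide by the slowest of the two loads and the compute step.
"""
from dataclasses import dataclass

LOAD_BYTES_PER_CYCLE = 32  # hypothetical constant: assumed per-matrix load bandwidth


@dataclass(frozen=True)
class Balance:
    size_a_bytes: float   # "Mat A Size (B)"
    size_b_bytes: float   # "Mat B Size (B)"
    load_a_cyc: float     # "Load Mat A (cyc)"
    load_b_cyc: float     # "Load Mat B (cyc)"
    compute_cyc: float    # "Compute (cyc)"
    compute_frac: float   # "Compute (%)", as a fraction of 1
    io_a_frac: float      # "IO A (%)", as a fraction of 1
    io_b_frac: float      # "IO B (%)", as a fraction of 1


def matmul_balance(bits_a: int, bits_b: int, macs_per_cycle: int,
                   m: int, k: int, n: int) -> Balance:
    """Return the I/O and compute balance for one (M x K) * (K x N) multiply."""
    size_a = m * k * bits_a / 8             # bytes in matrix A
    size_b = k * n * bits_b / 8             # bytes in matrix B
    load_a = size_a / LOAD_BYTES_PER_CYCLE  # cycles to load A
    load_b = size_b / LOAD_BYTES_PER_CYCLE  # cycles to load B
    compute = m * k * n / macs_per_cycle    # M*K*N MACs at the listed MAC/cyc rate
    bound = max(load_a, load_b, compute)    # the slowest of the three sets the budget
    return Balance(size_a, size_b, load_a, load_b, compute,
                   compute / bound, load_a / bound, load_b / bound)


if __name__ == "__main__":
    # (Mat A bits, Mat B bits, MAC/cyc, M, K, N) copied from three rows of the table
    for cfg in [(8, 4, 512, 4, 16, 8), (16, 16, 64, 4, 4, 8), (32, 32, 16, 8, 2, 4)]:
        b = matmul_balance(*cfg)
        print(cfg, b.compute_frac, b.io_a_frac, b.io_b_frac)
```

Running it for the 8b × 4b, 16b × 16b, and 32b × 32b configurations shown reproduces the Compute, IO A, and IO B values of the corresponding table rows. Because the denominator is the slowest of the three steps, a configuration reaches 1 in the Compute column only when neither matrix load takes longer than the multiply itself.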