Reducing Window Buffer Sizes for Very High Memory Density Designs

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2025-11-20
Version: 2025.2 English

When you choose window sizes for a design, the main consideration is balancing the number of cycles needed to load data against the number of compute cycles the kernel needs. When the two are balanced, loading data into the ping and pong buffers can be pipelined with kernel compute: while the kernel works on one buffer, the next window of data loads into the other. For very high memory density designs, use smaller window sizes that still balance the kernel compute. Larger window sizes consume more memory and can lead to mapper failure.
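The benefit of balancing can be illustrated with an idealized timing model. This is a sketch only: `serial_cycles`, `ping_pong_cycles`, and `load_is_hidden` are hypothetical helpers, not part of any AMD tool, and the model assumes fixed load and compute times per window while ignoring lock acquisition, synchronization, and stalls.

```python
# Idealized model of ping-pong (double) buffering for a windowed kernel.
# Assumption: load and compute cycles are fixed per window; lock/sync
# overhead and stalls are ignored. Not an AMD tool API.


def serial_cycles(load: int, compute: int, windows: int) -> int:
    """Cycles when each window is loaded, then computed, with no overlap."""
    return windows * (load + compute)


def ping_pong_cycles(load: int, compute: int, windows: int) -> int:
    """Cycles when loading window i+1 overlaps computing window i.

    The first load and the last compute cannot be hidden; every step in
    between costs the slower of the two stages.
    """
    if windows <= 0:
        return 0
    return load + (windows - 1) * max(load, compute) + compute


def load_is_hidden(load: int, compute: int) -> bool:
    """True when loading a window takes no longer than computing one."""
    return load <= compute


if __name__ == "__main__":
    print(serial_cycles(512, 512, 8))      # 8192
    print(ping_pong_cycles(512, 512, 8))   # 4608
    print(ping_pong_cycles(1024, 512, 8))  # 8704
```

With balanced 512-cycle load and compute over 8 windows, the overlapped schedule needs 4608 cycles instead of 8192. If loading grows to 1024 cycles while compute stays at 512, every steady-state step costs 1024 cycles and the kernel sits idle waiting for data, which is why the load time should not exceed the compute time.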

The following table shows the number of cycles required to multiply two matrices of 16-bit data. Example 1 and Example 2 use different matrix sizes, but in both, compute and data loading are balanced.

Note: The larger of the A and B matrix sizes determines the data loading time, while kernel compute time depends on both matrix sizes. The table shows that Example 1 uses smaller window sizes than Example 2, yet in both examples compute and data loading are balanced and can be pipelined.
Table 1. Matrix Multiplication Examples
| | Matrix A Size | Matrix B Size | # of Multiplication Operations (MultOps) | # Cycles for Compute (32 ops/cycle) | # Cycles for Data Loading (32 bits/cycle) |
| Example 1 | 16x64 | 64x16 | 16384 | 512 (16384/32) | 512 (64x16x16/32) |
| Example 2 | 16x64 | 64x32 | 32768 | 1024 (32768/32) | 1024 (64x32x16/32) |
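The arithmetic in Table 1 can be reproduced with a short Python sketch. It assumes C = A × B with A of size M×K and B of size K×N, and applies the rules from the note: compute cycles are M·K·N divided by 32 operations per cycle, and loading cycles come from the larger of the two matrices at 16 bits per element and 32 bits per cycle. The `window_bytes` helper is an illustrative addition, not from the guide, that shows how much memory each window occupies.

```python
# Reproduces the Table 1 cycle estimates for C = A (M x K) * B (K x N).
# Rates follow the table: 32 multiplications per cycle, 32 bits per cycle
# of data loading, and 16-bit elements.

OPS_PER_CYCLE = 32
LOAD_BITS_PER_CYCLE = 32
ELEMENT_BITS = 16


def compute_cycles(m: int, k: int, n: int) -> int:
    """Multiplications (M*K*N) divided by the per-cycle multiply rate."""
    return (m * k * n) // OPS_PER_CYCLE


def load_cycles(m: int, k: int, n: int) -> int:
    """Loading time set by the larger of the A (M*K) and B (K*N) matrices."""
    largest_elements = max(m * k, k * n)
    return (largest_elements * ELEMENT_BITS) // LOAD_BITS_PER_CYCLE


def window_bytes(rows: int, cols: int) -> int:
    """Size in bytes of one window of 16-bit elements (illustrative helper)."""
    return rows * cols * ELEMENT_BITS // 8


examples = {
    "Example 1": (16, 64, 16),
    "Example 2": (16, 64, 32),
}

for name, (m, k, n) in examples.items():
    comp = compute_cycles(m, k, n)
    load = load_cycles(m, k, n)
    print(
        f"{name}: compute={comp} load={load} balanced={comp == load} "
        f"A window={window_bytes(m, k)} B window={window_bytes(k, n)} bytes"
    )
# Example 1: compute=512 load=512 balanced=True A window=2048 B window=2048 bytes
# Example 2: compute=1024 load=1024 balanced=True A window=2048 B window=4096 bytes
```

Both examples are balanced, but Example 1 needs only half the memory for its B window, which is the kind of saving that matters in very high memory density designs.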