The AI Engine processes data in bursts, and these bursts are transferred between AI Engines using ping-pong buffers: one engine writes into one of two buffers, and when that buffer is full, the buffers are swapped and the data is read out by the downstream engine. The size of these bursts is referred to as the window size, and establishing the optimal window size is a balancing act between throughput and latency. Larger window sizes provide higher throughput because the per-burst overhead is amortized over more samples; however, latency increases proportionally with the window size.
Thus, the window size should be chosen just large enough that the desired throughput target is met.
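The ping-pong mechanism described above can be illustrated with a minimal sketch. This is a simplified sequential model, not the actual AI Engine hardware behavior: the upstream stage fills the active buffer while the downstream stage reads the previously filled one, and the roles swap after every burst. All names (`WINDOW_SIZE`, `producer`, `run`) are illustrative only.

```python
# Minimal sketch of ping-pong (double) buffering between two stages.
# Assumption: a simple sequential model where one burst is produced,
# the buffers swap, and the filled buffer is consumed downstream.

WINDOW_SIZE = 4  # samples per burst (the "window"); illustrative value

def producer(samples):
    """Yield the input split into bursts of WINDOW_SIZE samples."""
    for i in range(0, len(samples), WINDOW_SIZE):
        yield samples[i:i + WINDOW_SIZE]

def run(samples):
    buffers = [[], []]   # the "ping" and "pong" buffers
    active = 0           # index of the buffer currently being written
    consumed = []
    for burst in producer(samples):
        buffers[active] = burst       # upstream engine fills the active buffer
        active ^= 1                   # swap: downstream now owns the full one
        consumed.extend(buffers[active ^ 1])  # downstream reads it out
    return consumed

print(run(list(range(8))))  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

Note that the downstream stage can only begin once a full window has been written, which is why latency grows with the window size.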
The following table shows measured results for the single 64-tap FIR filter AI Engine example at various window sizes:
Impl | Filters | Taps | Window Size (samples) | Latency | Execution Time | Throughput
---|---|---|---|---|---|---
AIE | 1 | 64 | 64 | 0.4 us | 74.27 us | 220.59 MSPS |
AIE | 1 | 64 | 256 | 1.19 us | 58.86 us | 278.30 MSPS |
AIE | 1 | 64 | 1024 | 4.39 us | 53.23 us | 307.79 MSPS |
AIE | 1 | 64 | 2048 | 8.29 us | 47.59 us | 344.27 MSPS |
If, for example, our throughput requirement were 200 MSPS, a window size of 64 would satisfy that requirement with the least latency.
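The selection rule above can be expressed directly against the measured data. This is a hedged sketch: the numbers are copied from the table, and the helper name `smallest_window` is our own, not part of any AI Engine tool flow.

```python
# Pick the smallest window size (and hence lowest latency) that still
# meets a throughput target, using the measured results from the table.

MEASURED = [  # (window size in samples, throughput in MSPS, latency in us)
    (64,   220.59, 0.40),
    (256,  278.30, 1.19),
    (1024, 307.79, 4.39),
    (2048, 344.27, 8.29),
]

def smallest_window(target_msps):
    """Return the first (window, throughput, latency) row meeting the
    throughput target; rows are ordered by window size, so the first
    match also has the lowest latency. Returns None if no row qualifies."""
    for row in MEASURED:
        if row[1] >= target_msps:
            return row
    return None

print(smallest_window(200))  # → (64, 220.59, 0.4)
```

For a 200 MSPS target this selects the 64-sample window, matching the conclusion above; a 300 MSPS target would instead require the 1024-sample window at roughly ten times the latency.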