Implementation Details - 2024.2 English

Vitis Libraries

Document ID
XD160
Release Date
2024-11-29
Version
2024.2 English

The optimization happens in the first part. The full history of generated vectors is not stored, only the last \(n\) vectors. These values are kept in a circular queue implemented in BRAMs. The BRAM depth is set to the smallest power of two that is larger than \(n\), which simplifies the address calculation. By keeping and updating the address of the starting vector, the address of any vector that needs to be accessed can always be calculated.

Circular Queue of vectors
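The address calculation described above can be sketched in plain C++. This is only an illustration, not the library's code: the names `depthFor` and `CircularQueueAddr` are hypothetical, and the sketch assumes that the queue is indexed by an offset from the starting vector. Because the depth is a power of two, wrapping an address reduces to a bitwise AND with `depth - 1`.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two strictly larger than n (hypothetical helper).
// Written recursively so that it is a valid constexpr function in C++11.
constexpr std::uint32_t depthFor(std::uint32_t n, std::uint32_t d = 1) {
    return d > n ? d : depthFor(n, d << 1);
}

// Address bookkeeping for a circular queue that holds the last n vectors.
struct CircularQueueAddr {
    std::uint32_t mask;   // depth - 1; valid as a mask because depth is a power of two
    std::uint32_t start;  // physical address of the starting (oldest) vector

    explicit CircularQueueAddr(std::uint32_t n) : mask(depthFor(n) - 1), start(0) {
        assert(n >= 1);
    }

    // Physical address of the vector located `offset` entries after the start.
    std::uint32_t addrOf(std::uint32_t offset) const { return (start + offset) & mask; }

    // Move the starting vector forward by one entry, wrapping at the depth.
    void advance() { start = (start + 1) & mask; }
};
```

For example, with \(n = 624\) (used here only as an example value) the depth becomes 1024, so every address fits in 10 bits and no modulo or comparison logic is needed when the queue wraps.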

To generate the \(k\)-th vector, you need three read operations, for \(X_{k}\), \(X_{k + 1}\) and \(X_{k + m}\). The next iteration needs \(X_{k + 1}\), \(X_{k + 2}\) and \(X_{k + m + 1}\). Because \(X_{k + 1}\) can be saved in a register, only \(X_{k + 2}\) and \(X_{k + m + 1}\) have to be read from memory. Generating a new vector therefore takes two read accesses to different vectors and one write access. Since a BRAM allows only two read or write accesses in a single cycle, one BRAM alone cannot generate a new vector every clock cycle. The implementation therefore copies identical vectors to different BRAMs, so that together they provide enough read and write ports.

Duplicated vectors.
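The access pattern can be modelled in software as follows. This is a minimal sketch under stated assumptions, not the library's implementation: it assumes exactly two copies of the queue, each serving one read and one write per step, and it uses a placeholder `combine` function in place of the real recurrence. The names `combine` and `DualCopyGenerator` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Placeholder for the real recurrence, which this sketch does not model.
inline std::uint32_t combine(std::uint32_t xk, std::uint32_t xk1, std::uint32_t xkm) {
    return (xk * 2654435761u) ^ (xk1 >> 1) ^ xkm;
}

class DualCopyGenerator {
public:
    // init holds the first n vectors; m is the offset of the third operand.
    DualCopyGenerator(const std::vector<std::uint32_t>& init, std::uint32_t m)
        : n_(static_cast<std::uint32_t>(init.size())), m_(m) {
        assert(n_ >= 2 && m_ >= 1 && m_ < n_);
        std::uint32_t depth = 1;
        while (depth <= n_) depth <<= 1;  // smallest power of two larger than n
        mask_ = depth - 1;
        copyA_.assign(depth, 0);
        copyB_.assign(depth, 0);
        for (std::uint32_t i = 0; i < n_; ++i) copyA_[i] = copyB_[i] = init[i];
        regXk_ = init[0];  // X_k starts out in the register
    }

    // One step: each copy sees exactly one read and one write.
    std::uint32_t next() {
        const std::uint32_t xk1 = copyA_[(start_ + 1) & mask_];   // read from copy A
        const std::uint32_t xkm = copyB_[(start_ + m_) & mask_];  // read from copy B
        const std::uint32_t fresh = combine(regXk_, xk1, xkm);
        const std::uint32_t waddr = (start_ + n_) & mask_;
        copyA_[waddr] = fresh;  // the same value is written to both copies,
        copyB_[waddr] = fresh;  // which keeps them identical
        regXk_ = xk1;           // X_{k+1} becomes X_k for the next step
        start_ = (start_ + 1) & mask_;
        return fresh;
    }

private:
    std::uint32_t n_, m_, mask_ = 0, start_ = 0, regXk_ = 0;
    std::vector<std::uint32_t> copyA_, copyB_;
};
```

In this model the register removes one of the three reads, and the duplication splits the remaining two reads across separate memories, so neither copy needs more than two accesses per step.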