Design Approach - 2024.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2024-12-06
Version: 2024.2 English

We build this design from a combination of AI Engine tiles and PL resources. Of the five algorithm steps identified earlier, the row and column 256-pt transforms are mapped to the AI Engine array, which also performs the point-wise twiddle multiplications in the third step. The goal is to minimize compute resources by using the fewest tiles that still meet the 2 Gsps throughput requirement.

In a streaming design such as this, performing “row” and then “column” transforms requires a “memory transpose” operation: between the two sets of transforms, the samples are streamed into a storage buffer in row-major order and then read out in column-major order. This transfer runs over several parallel streams, and the number of streams is chosen to meet the overall throughput requirement.

The design therefore consists of three parts: a “front-end” AI Engine subgraph performing the “row” transforms, a “back-end” AI Engine subgraph performing the “column” transforms, and a “memory transpose” operation located in the PL between them.
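The data flow described above can be sketched as a small behavioral model in plain Python. This is not AI Engine or PL code, only an illustration of the row transform, twiddle multiplication, transpose, and column transform sequence. It makes several assumptions that do not come from the tutorial: the function names (`dft`, `transpose_via_buffer`, `dft_2d_decomposed`) are hypothetical, the forward-DFT sign convention is used, the input and output index ordering follows the common Cooley-Tukey decomposition, and the transform sizes are parameters kept small here rather than fixed at 256.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT (forward sign convention, an assumption of this sketch)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose_via_buffer(rows, n_rows, n_cols):
    """Model the memory transpose: write row-major, read column-major."""
    buf = [v for row in rows for v in row]  # row-major write into a flat buffer
    # Column-major read: result[c][r] == rows[r][c]
    return [[buf[r * n_cols + c] for r in range(n_rows)] for c in range(n_cols)]


def dft_2d_decomposed(x, n1, n2):
    """Compute an (n1*n2)-point DFT as row transforms, twiddles, column transforms."""
    n = n1 * n2
    assert len(x) == n
    # Arrange the input as n2 rows of n1 samples; row i2 holds x[i2], x[i2 + n2], ...
    matrix = [[x[n2 * i1 + i2] for i1 in range(n1)] for i2 in range(n2)]
    # "Front-end": n1-point transform along every row
    rows = [dft(row) for row in matrix]
    # Point-wise twiddle multiplication by exp(-2*pi*j * i2 * k1 / n)
    rows = [
        [rows[i2][k1] * cmath.exp(-2j * cmath.pi * i2 * k1 / n) for k1 in range(n1)]
        for i2 in range(n2)
    ]
    # Memory transpose between the two sets of transforms
    cols = transpose_via_buffer(rows, n2, n1)  # cols[k1][i2]
    # "Back-end": n2-point transform along every original column
    out = [dft(col) for col in cols]  # out[k1][k2]
    # Output sample k1 + n1*k2 is found at out[k1][k2]
    y = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            y[k1 + n1 * k2] = out[k1][k2]
    return y


if __name__ == "__main__":
    samples = [complex(i % 5, -(i % 3)) for i in range(16)]
    reference = dft(samples)
    result = dft_2d_decomposed(samples, 4, 4)
    max_err = max(abs(a - b) for a, b in zip(reference, result))
    print("matches direct DFT:", max_err < 1e-9)
```

Here the transpose is a single flat buffer. In the design described above, the equivalent operation is spread over several parallel streams to sustain the required throughput, which this model does not attempt to capture.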