When a fifo_depth() constraint specifies a depth greater than or equal to 40,
the AI Engine compiler automatically implements the buffer as a DMA FIFO.
By default, the AI Engine compiler places the buffer in core tile memory,
but you can relocate it to a memory tile by adding a standard location
constraint.
Semaphore Locks
Synchronization for a DMA FIFO between a producer and a consumer is managed using the AIE-ML architecture's semaphore locks. These locks are six bits wide, allowing them to represent values from 0 to 63. A critical detail is that these locks track the number of entries in the FIFO, not the individual 32-bit words. As a result of this hardware characteristic, any single DMA FIFO can hold at most 63 entries.
Words Per Entry (WPE): Packing for Deeper FIFOs
To overcome the 63-entry limit and achieve greater total depths, the DMA FIFO uses a feature called Words Per Entry (WPE). WPE defines how many 32-bit words are bundled together to form a single FIFO entry. The FIFO then produces and consumes data in these full, entry-sized chunks. This relationship can be expressed with a simple formula:
Total FIFO Depth (D) = Words Per Entry (X) * Number of Entries (Y)
There are two ways to specify fifo_depth on AIE-ML architectures:
- fifo_depth(net) = total_words
- fifo_depth(net) = {total_words, wordsPerEntry}
A DMA FIFO is inserted when total_words >= 40 and the following conditions are satisfied:
- If specified as fifo_depth(net) = total_words:
  - wordsPerEntry is 4 by default.
  - The maximum value for total_words is 252. Otherwise, the AI Engine compiler generates an error.
- If specified as fifo_depth(net) = {total_words, wordsPerEntry}:
  - total_words should be a multiple of wordsPerEntry.
  - The maximum value of (total_words / wordsPerEntry) is 63. Otherwise, the AI Engine compiler generates an error.
The key is to choose a WPE value (X) that keeps the number of entries (Y) at or below the 63 limit imposed by the locks.
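The rules above can be summarized in a small checking routine. The following is a minimal sketch in plain C++, not part of the ADF API: the names FifoCheck and check_fifo_depth are hypothetical, and because the text does not say how the compiler reacts when total_words is not a multiple of wordsPerEntry, the sketch only reports that case.

```cpp
#include <cassert>

// Hypothetical helper types and names; none of this is part of the ADF API.
enum class FifoCheck {
    NotDmaFifo,      // total_words < 40: below the DMA FIFO threshold
    Ok,              // satisfies the DMA FIFO rules listed in the text
    NotMultiple,     // total_words is not a multiple of wordsPerEntry
    TooManyEntries   // more than 63 entries would be required
};

const int kDmaFifoThreshold = 40;  // minimum total_words for a DMA FIFO
const int kMaxEntries = 63;        // largest value of the 6-bit lock
const int kDefaultWpe = 4;         // wordsPerEntry when none is given

// Models the form fifo_depth(net) = {total_words, wordsPerEntry}.
inline FifoCheck check_fifo_depth(int total_words, int words_per_entry) {
    assert(words_per_entry > 0);
    if (total_words < kDmaFifoThreshold) return FifoCheck::NotDmaFifo;
    if (total_words % words_per_entry != 0) return FifoCheck::NotMultiple;
    if (total_words / words_per_entry > kMaxEntries) return FifoCheck::TooManyEntries;
    return FifoCheck::Ok;
}

// Models the form fifo_depth(net) = total_words (default WPE of 4).
// The only limit stated for this form is total_words <= 252 (4 * 63).
inline FifoCheck check_fifo_depth(int total_words) {
    if (total_words < kDmaFifoThreshold) return FifoCheck::NotDmaFifo;
    if (total_words > kDefaultWpe * kMaxEntries) return FifoCheck::TooManyEntries;
    return FifoCheck::Ok;
}
```

Under these assumptions, check_fifo_depth(100, 2) reports a valid configuration (50 entries), while check_fifo_depth(128, 2) reports too many entries (64).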
Sizing and Configuration Examples
Choosing the right WPE is straightforward. If your desired depth (D) is 63 words or fewer, you can use a WPE of 1, making the number of entries equal to the depth. If the depth exceeds 63, you must select a WPE large enough to keep the entry count within the limit. Because the number of entries Y = ceil(D / X) must not exceed 63, the minimum WPE can be calculated as X = ceil(D / 63).
For example, to implement a FIFO with a depth of 100 words:
- Calculate the minimum WPE: X = ceil(100 / 63) = 2.
- Calculate the resulting number of entries: Y = ceil(100 / 2) = 50.
Because 50 is less than 63, this is a valid configuration.
In your source code, you specify both the total depth and the WPE. For instance, for a total depth of 100 words and a WPE of 2:
adf::fifo_depth(n2) = {100, 2};
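The sizing rule can also be sketched as a direct search. The helper below is a hypothetical illustration in plain C++, not part of the ADF API: it looks for the smallest WPE that both divides the total depth evenly and keeps the number of entries at or below 63, which are the two conditions stated for the {total_words, wordsPerEntry} form.

```cpp
#include <cassert>

const int kMaxLockEntries = 63;  // largest entry count a 6-bit lock can track

// Hypothetical helper: returns the smallest WPE for which total_words is a
// multiple of the WPE and total_words / WPE does not exceed 63 entries.
inline int smallest_valid_wpe(int total_words) {
    assert(total_words > 0);
    for (int wpe = 1; wpe < total_words; ++wpe) {
        const bool divides_evenly = (total_words % wpe == 0);
        if (divides_evenly && total_words / wpe <= kMaxLockEntries) {
            return wpe;
        }
    }
    // A WPE equal to the total depth always yields exactly one entry.
    return total_words;
}
```

For a depth of 100 words the search returns 2 (50 entries), and for a depth of 252 words it returns 4 (63 entries). A depth with no small divisors forces a large WPE, so choosing a depth with convenient factors keeps entries small.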
The Impact of WPE on System Behavior
The choice of WPE directly affects system responsiveness. A smaller WPE creates smaller entries that become available more frequently, allowing the consumer kernel to start processing sooner. Conversely, a larger WPE creates bigger entries that take longer to fill, which increases the initial wait time before the kernel can begin.
Consider this trade-off:
- WPE = 2: The consumer kernel can start processing as soon as the first two words are written to the FIFO.
- WPE = 64: The consumer must wait until a full 64 words are available before it can begin.
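The trade-off can be illustrated with a simplified software model of entry counting. This is a sketch under stated assumptions, not a description of the hardware interface: DmaFifoModel, words_before_first_read, and accepted_words are hypothetical names, and the model only assumes that the lock value counts complete entries and that the consumer reads whole entries.

```cpp
#include <cassert>

// Simplified, hypothetical model of a DMA FIFO whose lock counts entries.
struct DmaFifoModel {
    int words_per_entry;  // WPE: 32-bit words bundled into one entry
    int max_entries;      // entry capacity, at most 63
    int pending_words;    // words written into the entry being filled
    int lock_value;       // complete entries available to the consumer

    DmaFifoModel(int wpe, int entries)
        : words_per_entry(wpe), max_entries(entries),
          pending_words(0), lock_value(0) {
        assert(wpe > 0 && entries > 0 && entries <= 63);
    }

    // Producer writes one word; returns false when every entry is full.
    bool write_word() {
        if (lock_value == max_entries) return false;
        if (++pending_words == words_per_entry) {
            pending_words = 0;
            ++lock_value;  // a whole entry becomes visible to the consumer
        }
        return true;
    }

    // Consumer takes one complete entry; returns false if none is ready.
    bool read_entry() {
        if (lock_value == 0) return false;
        --lock_value;
        return true;
    }
};

// Words the producer must write before the consumer can read anything.
inline int words_before_first_read(int words_per_entry) {
    DmaFifoModel fifo(words_per_entry, 63);
    int written = 0;
    while (fifo.lock_value == 0) {
        fifo.write_word();
        ++written;
    }
    return written;
}

// Words accepted when the producer attempts `attempted` writes, no reads.
inline int accepted_words(int wpe, int entries, int attempted) {
    DmaFifoModel fifo(wpe, entries);
    int accepted = 0;
    for (int i = 0; i < attempted; ++i) {
        if (fifo.write_word()) ++accepted;
    }
    return accepted;
}
```

In this model the consumer's first read becomes possible after exactly WPE words have been written, which is why a smaller WPE shortens the initial wait.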
For optimal performance, select the smallest WPE that satisfies the 63-entry
limit while also aligning with your kernel's processing granularity. The following
is an example of a FIFO allocation for a fifo_depth request of 47 words,
which is allocated in core tile memory.
- DMA FIFO allocated in core tile memory
  adf::fifo_depth(n2) = 47;
  // Specify the WPE, default is 4
  adf::fifo_depth(n2) = {47,4};
- DMA FIFO allocated in memory tile memory using a location constraint
  adf::location<adf::fifo>(n4) = {adf::dma_fifo(adf::memory_tile, 0, 0, 0x1000, 72)};
  adf::fifo_depth(n2) = {47,4};
- Cascading a stream switch FIFO and a memory tile FIFO
  The AI Engine compiler allows you to manually cascade, or chain together, multiple FIFOs, such as stream switch FIFOs and DMA FIFOs, to provide greater FIFO depth.
  adf::fifo_depth(n2) = {44, 4};
  adf::fifo_depth(n4) = {120, 4};
  adf::location<adf::fifo>(n4) = {adf::ss_fifo(adf::aie_tile, 1, 0, 0),
                                  adf::dma_fifo(adf::memory_tile, 0, 0, 0x1000, 72),
                                  adf::dma_fifo(adf::aie_tile, 2, 0, 0x1000, 32)};