Data memory interfaces, stream interfaces, and cascade stream interfaces are the primary I/O interfaces through which the AI Engine reads the data it computes on and writes its results.
- The data memory interface sees the data memory modules in all four directions as one contiguous memory with a total capacity of 128 KB. The AI Engine has two 256-bit wide load units and one 256-bit wide store unit. Each load or store can access data at a width of 128 or 256 bits, using 128-bit alignment.
- The AI Engine has two 32-bit input AXI4-Stream interfaces and two 32-bit output AXI4-Stream interfaces. Each stream is connected to a FIFO on both the input and the output side, which lets the AI Engine make either one 128-bit access every four clock cycles or one 32-bit access per cycle on a stream.
- The 384-bit accumulator data from one AI Engine can be forwarded to another through the dedicated cascade stream interfaces to form a chain. A small, two-deep, 384-bit wide FIFO sits on both the input and the output stream, so up to four values can be stored between AI Engines. In each cycle, the chained AI Engines can receive and send 384 bits. The cascade stream chain provides relatively tight coupling between multiple kernels operating at the same throughput.
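The figures in the list above can be checked with a little arithmetic. The following is a minimal sketch in plain C++ (not AI Engine API code); the names `is_aligned_128`, `stream_cycles`, and the `k...` constants are hypothetical helpers introduced only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Figures taken from the interface descriptions above.
constexpr std::size_t kDataMemoryBytes   = 128 * 1024; // 128 KB contiguous view
constexpr std::size_t kAlignmentBits     = 128;        // load/store alignment
constexpr std::size_t kStreamWidthBits   = 32;         // one AXI4-Stream port
constexpr std::size_t kCascadeFifoDepth  = 2;          // per side (input, output)

// Hypothetical helper: a byte address is 128-bit aligned when it is a
// multiple of 16 bytes.
constexpr bool is_aligned_128(std::uintptr_t byte_addr) {
    return byte_addr % (kAlignmentBits / 8) == 0;
}

// Hypothetical helper: cycles needed to move `bits` over one 32-bit stream
// at 32 bits per cycle, rounded up.
constexpr std::size_t stream_cycles(std::size_t bits) {
    return (bits + kStreamWidthBits - 1) / kStreamWidthBits;
}

// A two-deep FIFO on the sending side plus a two-deep FIFO on the
// receiving side gives four values stored between two chained AI Engines.
constexpr std::size_t kCascadeValuesInFlight = 2 * kCascadeFifoDepth;
```

For example, `stream_cycles(128)` evaluates to 4, which matches the statement that a stream supports one 128-bit access every four clock cycles.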
When programming for the AI Engine, keep in mind that each AI Engine can access two 32-bit AXI4-Stream inputs, two 32-bit AXI4-Stream outputs, one 384-bit cascade stream input, one 384-bit cascade stream output, two 256-bit data loads, and one 256-bit data store. However, because of the length of the instruction, not all of these operations can be performed in the same cycle.
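To put the listed interface widths side by side, the sketch below tabulates them in plain C++ and sums them. The `Interface` struct, the `kInterfaces` table, and `total_bits` are hypothetical names used only for illustration. The sum is a nominal upper bound on the combined width, not an achievable per-cycle rate, since the text above states that these operations cannot all be issued in one cycle.

```cpp
#include <cstddef>

// Hypothetical record describing one kind of AI Engine I/O interface.
struct Interface {
    const char* name;
    std::size_t width_bits; // width of a single port or unit
    std::size_t count;      // number of such ports or units per AI Engine
};

// Widths and counts as listed in the paragraph above.
constexpr Interface kInterfaces[] = {
    {"AXI4-Stream input",     32,  2},
    {"AXI4-Stream output",    32,  2},
    {"cascade stream input",  384, 1},
    {"cascade stream output", 384, 1},
    {"data memory load",      256, 2},
    {"data memory store",     256, 1},
};

constexpr std::size_t kNumInterfaces =
    sizeof(kInterfaces) / sizeof(kInterfaces[0]);

// Sum of width * count over the table, written recursively so it can be
// evaluated at compile time.
constexpr std::size_t total_bits(std::size_t i = 0) {
    return i == kNumInterfaces
               ? 0
               : kInterfaces[i].width_bits * kInterfaces[i].count
                     + total_bits(i + 1);
}
```

Here `total_bits()` evaluates to 1664 bits (208 bytes). A kernel should be designed around the subset of these interfaces it actually needs in its inner loop rather than around this combined figure.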