AI Engines are an array of very long instruction word (VLIW) processors with single instruction, multiple data (SIMD) vector units, highly optimized for compute-intensive applications, specifically digital signal processing (DSP), 5G wireless applications, and artificial intelligence (AI) technology such as machine learning (ML). They provide up to five times higher compute density for vector-based algorithms.
- Instruction-level parallelism: each six-way VLIW instruction can issue two scalar instructions, two vector reads, a single vector write, and a single vector instruction per clock cycle.
- Data-level parallelism: vector-level operations process multiple data samples per clock cycle.
Each AI Engine contains both a vector processor and a scalar processor, dedicated program memory, 32 KB of local data memory, and access to the local memory of its neighbors in any of four directions (north, south, east, or west). It also has access to DMA engines and AXI4 interconnect switches that carry streams to other AI Engines, to the programmable logic (PL), or to the DMA.