AI Engine-ML processor array - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

The SIMD VLIW AI Engine-ML comes as an array of processors interconnected through AXI-Stream interconnect blocks, as shown in the following figure:


At this level, several differences are visible compared to the AI Engine found in the Versal™ AI Core devices:

  • At the bottom of the processor array there are one or two rows (depending on the device) of 512 KB memories. These memories can be accessed by the PL and by the AI Engine-ML processors through the AXI-Stream interconnect network. The DMA channels of a memory block also have access to the neighboring memory blocks. These memories are called ‘shared memories’.

  • AI Engine-ML tiles are all oriented the same way

    • The cascade stream always runs left-to-right (no alternation between rows), and also top-to-bottom

    • The neighborhood structure no longer depends on the row index

Because these devices are intended for machine learning inference, they have been optimized for this kind of application:

  • The supported datatypes are:

    • (u)int4, (u)int8, (u)int16, bfloat16

    • Number of 8-bit x 8-bit multipliers doubled

    • Support for 4-bit x 8-bit multiplication (4x more than in previous architecture)

    • bfloat16: 8-bit exponent, 7-bit mantissa. It keeps the dynamic range of standard float32 (SPFP) but with less mantissa precision.

  • The pipeline is optimized for tensor products

    • Permute blocks are no longer full crossbars; they are limited to the specific data selections needed for tensor products and convolutions.

    • AI Engine-ML processors now have access to their own DMA registers: they can program the DMAs of their local memories directly.

    • Local memory is now 64 KB, still organized as 8 banks that are each 128 bits wide.
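The bfloat16 trade-off mentioned above can be illustrated in plain Python: a bfloat16 value is just the upper 16 bits of a float32 bit pattern, so the 8-bit exponent (and therefore the dynamic range) survives while the mantissa is truncated to 7 bits. This is an illustrative host-side sketch, not AI Engine-ML code; the helper names are made up, and truncation (rather than round-to-nearest) is assumed for simplicity.

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Truncate a float32 to a bfloat16 bit pattern:
    sign + 8-bit exponent + the top 7 mantissa bits."""
    (u32,) = struct.unpack("<I", struct.pack("<f", x))
    return u32 >> 16

def bf16_bits_to_f32(b: int) -> float:
    """Re-expand a bfloat16 bit pattern to float32
    (the dropped low mantissa bits become zero)."""
    (f,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return f

def round_trip_bf16(x: float) -> float:
    return bf16_bits_to_f32(f32_to_bf16_bits(x))

# Dynamic range is kept: values near the float32 limit stay finite.
big = round_trip_bf16(3.0e38)
# Precision is not: anything below ~2^-7 relative is truncated away,
# so 1.000001 collapses back to exactly 1.0.
small = round_trip_bf16(1.000001)
```

With plain truncation the relative representation error is bounded by 2⁻⁷ (under 1%), which is what "less mantissa precision" amounts to in practice.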

Compute performance is doubled for 8-bit x 8-bit and for 16-bit x 16-bit multiplications, and quadrupled for 4-bit x 8-bit.
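Since int4 support is behind the quadrupled 4-bit x 8-bit throughput, a short sketch of how signed 4-bit values are typically stored two per byte may help. This is a generic host-side Python illustration; the low-nibble-first layout is an assumption made for the example, not a statement about the AI Engine-ML memory format.

```python
def pack_int4(vals):
    """Pack signed 4-bit integers (range -8..7) two per byte,
    low nibble first (an assumed layout for illustration)."""
    assert len(vals) % 2 == 0 and all(-8 <= v <= 7 for v in vals)
    out = bytearray()
    for lo, hi in zip(vals[::2], vals[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)

def unpack_int4(data):
    """Inverse of pack_int4: sign-extend each nibble back to an int."""
    vals = []
    for b in data:
        for nib in (b & 0xF, b >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals

weights = [-8, 7, 0, -1, 3, -4]
packed = pack_int4(weights)          # 3 bytes instead of 6
assert unpack_int4(packed) == weights
```

Halving the storage per weight is what lets twice as many operands per cycle reach the multipliers compared to int8, independent of the extra 4-bit x 8-bit hardware.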