| ▼Accumulator Data Types | |
| Complex Accumulator Types | |
| Floating-Point Accumulator Types | |
| ►Integer Accumulator Types | |
| 1024-bit accumulator types | |
| 256-bit accumulator types | |
| 512-bit accumulator types | |
| ▼Load/Store Operations | |
| Addressing intrinsics | |
| ►Compressed Load Operations | Compressed load operations load a compressed vector and expand it into an AIE-ML register |
| Compressed Load Reset Operations | |
| Compressed Load of Eight Vectors | |
| Compressed Load of Four Vectors | |
| Compressed Load of One Vector | |
| Compressed Load of Two Vectors | |
| ►Compressed Sparse Load Operations | Compressed sparse load operations load a compressed sparse vector and expand it into an AIE-ML register |
| Sparse Load Fill Operations | |
| Sparse Load Peek Operations | |
| Sparse Load Pop Operations | |
| Sparse Load Reset Operations | |
| Load 4x Operations | Load 4x intrinsics load four 64-bit values to a vector register from data memory |
| ►Streams | |
| Cascade read | |
| Cascade write | |
| Stream read | |
| Stream write | |
| Scalar Data Types | All the standard C scalar data-types are supported |
| ▼Scalar Operations | |
| ►Configuration | |
| ►Mode Settings | |
| Control registers | Intrinsics to set, get, and clear the control registers |
| Status registers | Intrinsics to set, get, and clear the status registers |
| Core ID | |
| Cycle Counter | |
| Events | |
| Initialization | |
| Integer Operations | Intrinsics allowing you to perform select, absolute and delay operations on integer scalars |
| Locks | Intrinsics to acquire and release locks |
| Scalar Conversions | |
| Scalar updates and extracts | |
| Stream access | These functions set up stream accesses in native mode |
| ▼Vector Conversions | Various forms of conversions between vector data-types |
| ►Broadcast | Broadcasts input value to all vector lanes |
| Broadcast from scalar | Broadcasts input value to all vector lanes (alternative syntax to broadcast to vector) |
| Broadcast to vector | Broadcasts input value to all vector lanes (alternative syntax to broadcast from scalar) |
| Updating all elements with element extracted from vector | Extracts element "idx" from vector "v" and broadcasts its value to all lanes of the destination vector |
| Updating all elements with one | Broadcasts value one (1) to all vector lanes |
| Updating all elements with zero | Broadcasts value zero (0) to all vector lanes |
| Casting | Casting intrinsics allow casting (bit-reinterpretation) between vector types of the same size |
| ►Concatenate vectors | Vector concat intrinsic functions allow concatenation of vector values to create a larger one |
| Concatenate four vectors | |
| Concatenate two vectors | |
| ►Extract vector | Extraction intrinsics enable lanes to be selected from vector and accumulator types |
| Extract element from vector | |
| Extract integer and float data | |
| Extract sparsity and data from sparse vector | |
| Update sparse vectors | |
| Extract/insert element | These intrinsics insert an individual element into a vector or extract one from it |
| Float to integer conversions | Conversion from bfloat16 vector to integer vector |
| ►Insert vector | Vector insert intrinsic functions allow substitution of the lanes within a vector value |
| Insert a vector into a vector | |
| Insert an element into a vector | |
| Pack/Unpack | |
| ►Set vector | Vector set intrinsic functions allow setting the lanes within a vector value |
| Set an element of a vector | |
| Set specific lanes of a vector | |
| ►Shift-Round-Saturate | Intrinsics for moving values from accumulator data-types to vector data-types |
| AIE interface | |
| Floating-point interface | |
| Size interface | |
| ►Upshift | Intrinsics for moving values from vector data-types to accumulator data-types |
| AIE interface | |
| Floating-point | |
| Size interface | |
| ▼Vector Data Types | |
| Complex Vector Types | |
| Compressed Complex Vector Types | |
| Compressed Floating-Point Vector Types | |
| ►Compressed Integer Vector Types | |
| Compressed 256-bit vector types | |
| Compressed 512-bit vector types | |
| ►Compressed Sparse Vector Types | |
| Compressed sparse floating-point vector types | |
| Compressed sparse integer vector types | |
| Floating-Point Vector Types | |
| ►Integer Vector Types | |
| 1024-bit vector types | |
| 128-bit vector types | |
| 16-bit vector types | |
| 256-bit vector types | |
| 32-bit vector types | |
| 512-bit vector types | |
| 64-bit vector types | |
| 8-bit vector types | |
| ►Sparse Vector Types | |
| Sparse floating-point vector types | |
| Sparse integer vector types | |
| ▼Vector Operations | |
| Add/Subtract | Intrinsics and operators that allow you to perform addition and subtraction operations on all types of vectors |
| Bitwise logical | Intrinsics and operators that allow you to perform bitwise logical operations on all types of vectors |
| Compare/Select | Intrinsics allowing you to perform compare and select operations on all types of vectors |
| Initialization | |
| ►Multiply Accumulate | Intrinsics allowing you to perform MUL/MAC operations and a few of their variants |
| Emulated Multiply-accumulate of 16b x 32b datatypes | Matrix multiplications in which matrix A has data elements of 16 bit and matrix B has data elements of 32 bit. These operations are emulated on top of Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance |
| Emulated Multiply-accumulate of 32b x 16b datatypes | Matrix multiplications in which matrix A has data elements of 32 bit and matrix B has data elements of 16 bit. These operations are emulated on top of Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance |
| Emulated Multiply-accumulate of 32b x 32b datatypes | Matrix multiplications in which matrix A has data elements of 32 bit and matrix B has data elements of 32 bit. These operations are emulated on top of Multiply-accumulate of 32b x 16b integer datatypes and Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance |
| Emulated Multiply-accumulate of Complex 32b x Complex 32b datatypes | Matrix multiplications in which matrix A has data elements of complex 32 bit and matrix B has data elements of complex 32 bit. These operations are emulated on top of Multiply-accumulate of 32b x 16b complex integer datatypes and might not have optimal performance |
| Emulated Multiply-accumulate of fp32 x fp32 datatypes | Elementwise multiplication and matrix multiplication using the bfloat16 datapath. Two options are available: with or without set_rnd(0) for truncation before using these intrinsics. Use the AIE_FP32_EMULATION_SET_RND_MODE flag to set the rounding mode to truncation. For an explanation of how these operations work, see Multiply Accumulate |
| Multiply-accumulate of 16b x 16b complex integer datatypes | Matrix multiplications in which matrix A and matrix B have complex data elements of 16 bit. For an explanation of how these operations work, see Multiply Accumulate |
| Multiply-accumulate of 16b x 16b integer datatypes | Matrix multiplications in which matrix A and matrix B have data elements of 16 bit |
| Multiply-accumulate of 16b x 8b integer datatypes | Matrix multiplications in which matrix A has data elements of 16 bit and matrix B has data elements of 8 bit |
| Multiply-accumulate of 32b x 16b complex integer datatypes | Matrix multiplications in which matrix A has complex data elements of 32 bit and matrix B has complex data elements of 16 bit |
| Multiply-accumulate of 32b x 16b integer datatypes | Matrix multiplications in which matrix A has data elements of 32 bit and matrix B has data elements of 16 bit |
| Multiply-accumulate of 8b x 4b datatypes | Matrix multiplications in which matrix A has data elements of 8 bit and matrix B has data elements of 4 bit. These operations are emulated on top of int8 x int8 |
| Multiply-accumulate of 8b x 8b integer datatypes | Matrix multiplications in which matrix A and matrix B have data elements of 8 bit |
| Multiply-accumulate of bfloat16 datatypes | Matrix multiplications in which matrix A and B have bfloat16 data elements |
| Multiply-accumulate with a sparse matrix | Matrix multiplications in which matrix B is a sparse matrix |
| Negation control in complex multiplication modes | In order to do complex multiplications, some terms need to be negated |
| Shift | These intrinsics allow shifting full vectors |
| Shift element | |
| ►Shuffle | Intrinsics allowing you to perform vector shuffles |
| Illustration of Shuffle Modes | |
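
Several Multiply Accumulate entries above are described as emulated on top of narrower multiplies, which is why they "might not have optimal performance". The following is a minimal scalar sketch of that idea in plain C++, assuming the usual split of one 32-bit operand into two 16-bit halves. It does not use any AIE-ML intrinsic, it does not claim to match how the library schedules its emulation, and the names `mul32x32_via_16` and `check_mul32x32_via_16` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar helper, not an AIE-ML intrinsic.
// Computes a * b using only products in which one operand fits in 16 bits,
// mirroring the idea of building a 32b x 32b multiply from 32b x 16b pieces.
std::int64_t mul32x32_via_16(std::int32_t a, std::int32_t b) {
    // Split b so that b == b_hi * 65536 + b_lo.
    const std::int32_t b_lo = b & 0xFFFF;          // unsigned low half, 0..65535
    const std::int32_t b_hi = (b - b_lo) / 65536;  // signed high half, -32768..32767

    // Two partial products, each a 32-bit value times a 16-bit value.
    const std::int64_t hi_part = static_cast<std::int64_t>(a) * b_hi;
    const std::int64_t lo_part = static_cast<std::int64_t>(a) * b_lo;

    // Recombine: the high partial product carries a weight of 2^16.
    return hi_part * 65536 + lo_part;
}

// Hypothetical self-check comparing the split product with a direct one.
void check_mul32x32_via_16() {
    const std::int32_t a = 123456789;
    const std::int32_t b = -987654321;
    assert(mul32x32_via_16(a, b) == static_cast<std::int64_t>(a) * b);
    assert(mul32x32_via_16(7, -3) == -21);
}
```

The sketch makes the cost visible: one wide product becomes two narrower products plus a weighted add, so an emulated mode needs more multiply steps per result than a natively supported one.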