AI Engine API User Guide (AIE) 2023.2

Vitis 2023.2
Documentation changes
 Integrate AIEML documentation
 Document rounding modes
 Expand accumulate documentation
 Clarify limitations on 8b parallel lookup
 Fix mmul member documentation
 Clarify requirement for linear_approx step bits
 Improve documentation of vector, accum, and mask
 Highlight architecture requirements of functions using C++ requires clauses
 Document FFT twiddle factor generation
 Clarify internal rounding mode for bfloat16 to integer conversion
 Clarify native and emulated modes for mmul
 Clarify native and emulated modes for sliding_mul
 Document sparse_vector_input_buffer_stream with memory layout and GEMM example
 Document tensor_buffer_stream with a GEMM example
Global AIE API changes
 Add cfloat support for AIEML
Changes to data types
 vector: Optimize grow_replicate on AIEML
 mmul: Support reinitialization from an accum
 DM resources: Add compound aie_dm_resource variants
 streams: Add sparse_vector_input_buffer_stream for loading sparse data on AIEML
 streams: Add tensor_buffer_stream to handle multidimensional addressing for AIEML
 bfloat16: Add specialization for std::numeric_limits on AIEML
Changes to operations
 abs: Fix for float input
 add_reduce: Optimize for 8b and 16b types on AIEML
 div: Implement vector-vector and vector-scalar division
 downshift: Implement logical_downshift for AIE
 fft: Add support for 32 bit twiddles on AIE
 fft: Fix for radix3 and radix5 FFTs on AIE
 fft: Fix radix5 performance for low vectorizations on AIE
 fft: Add stage-based FFT functions and deprecate iterator interface
 mul: Fix for vector * vector_elem_ref on AIE
 print_fixed: Support printing Q format data
 print_matrix: Added accumulator support
 sliding_mul: Add support for float
 sliding_mul: Add support for remaining 32b modes for AIEML
 sliding_mul: Add support for Points < Native Points
 sliding_mul_ch: Fix DataStepX == DataStepY requirement
 sincos: Optimize AIE implementation
 to_fixed: Fix for AIEML
 to_fixed/to_float: Add vectorized float conversions for AIE
 to_fixed/to_float: Add generic conversions ((int8, int16, int32) <-> (bfloat16, float)) for AIEML
ADF integration
 Add TLAST support for stream reads on AIEML
 Add support for input_cascade and output_cascade types
 Deprecate accum reads from input_stream and output_stream
Vitis 2023.1
Documentation changes
 Add explanation of FFT inputs
 Use block_size in FFT docs
 Clarify matrix data layout expectations
 Clarify downshift being arithmetic
 Correct description of bfloat16 linear_approx lookup table
Global AIE API changes
 Do not explicitly initialize inferred template arguments
 More aggressive inlining of internal functions
 Avoid using 128b vectors in stream helper functions for AIEML
Changes to data types
 iterator: Do not declare iterator data members as const
 mask: Optimized implementation for 64b masks on AIEML
 mask: New constructors added to initialize the mask from uint32 or uint64 values
 vector: Fix 1024b inserts
 vector: Use 128b concats in upd_all
 vector: Fix 8b unsigned to_vector for AIEML
Changes to operations
 add/sub: Support for dynamic accumulator zeroization
 begin_restrict_vector: Add implementation for io_buffer
 eq: Add support for complex numbers
 fft: Correctly set radix configuration in fft::begin_stage calls
 inv/invsqrt: Add implementation for AIEML
 linear_approx: Performance optimization for AIEML
 logical_downshift: New function that implements a logical downshift (as opposed to aie::downshift, which is arithmetic)
 max/min/maxdiff: Add support for dynamic sign
 mmul: Implement 16b 8x2x8 mode for AIEML
 mmul: Implement 8b 8x8x8 mode for AIEML
 mmul: Implement missing 16b x 8b and 8b x 4b sparse multiplication modes for AIEML
 neq: Add support for complex numbers
 parallel_lookup: Optimize implementation for signed truncation
 print_matrix: New function that prints vectors with the specified matrix shape
 shuffle_up/down: Minor optimization for 16b
 shuffle_up/down: Optimized implementation for AIEML
 sliding_mul: Support data_start/coeff_start values larger than vector size
 sliding_mul: Add support for 32b modes for AIEML
 sliding_mul: Add 2 point 16b 16 channel for AIEML
 sliding_mul_ch: New function for multichannel multiplication modes for AIEML
 sliding_mul_sym_uct: Fix for 16b two-buffer implementation
 store_unaligned_v: Optimized implementation for AIEML
 transpose: Add support for 64b and 32b types
 transpose: Enable transposition of 256 element 4b vectors (scalar implementation for now)
 to_fixed: Add bfloat16 to int32 conversion on AIEML
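The difference between the arithmetic aie::downshift and the new logical_downshift listed above can be sketched on a single scalar lane in plain C++. This is only an illustrative model, not the AIE API implementation: the helper names scalar_arithmetic_downshift and scalar_logical_downshift are hypothetical, and a 32-bit signed lane is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of an arithmetic downshift on one 32-bit lane:
// the sign bit is replicated into the vacated upper bits.
// Note: right-shifting a negative signed value is guaranteed to be arithmetic
// since C++20; before that it is implementation-defined (arithmetic on
// mainstream compilers).
inline std::int32_t scalar_arithmetic_downshift(std::int32_t v, unsigned shift) {
    assert(shift < 32);
    return v >> shift;
}

// Hypothetical scalar model of a logical downshift on one 32-bit lane:
// the vacated upper bits are filled with zeros, regardless of the sign.
inline std::int32_t scalar_logical_downshift(std::int32_t v, unsigned shift) {
    assert(shift < 32);
    const std::uint32_t bits = static_cast<std::uint32_t>(v);
    return static_cast<std::int32_t>(bits >> shift);
}
```

For example, shifting -16 down by 2 gives -4 in the arithmetic model and 0x3FFFFFFC (1073741820) in the logical model, while non-negative inputs give the same result in both.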
Vitis 2022.2
Documentation changes
 Add code samples for load_v/store_v and load_unaligned_v/store_unaligned_v
 Enhanced documentation for parallel_lookup and linear_approx
 Clarify coeff vector size limit on AIEML
Global AIE API changes
 Remove usage of the deprecated srs in compare functions to avoid compilation warnings
 Add support for stream ADF vector types on AIEML
Changes to data types
 mask: add shift operators
 saturation_mode: add saturate value. It was previously named truncate, which was incorrect. The old name is kept for now and will be deprecated.
Changes to operations
 add: support accumulator addition on AIEML
 add_reduce: add optimized implementation for cfloat on AIE
 add_reduce: add optimized implementation for bfloat16 on AIEML
 eq/neq: enhanced implementation on AIEML
 le: enhanced implementation on AIEML
 load_unaligned_v: leverage pointer truncation to 128b done by HW on AIE
 fft: add support for radix 3/5 on AIE
 mmul: add matrix x vector multiplication modes on AIE
 mmul: add support for dynamic accumulator zeroization
 to_fixed: added implementation for AIEML
 to_fixed: provide a default return type
 to_float: added implementation for AIEML
 reverse: optimized implementation for 32b and 64b on AIEML
 zeros: include fixes on AIE
Vitis 2022.1
Documentation changes
 Small documentation fixes for operators
 Fix documentation issues for msc_square and mmul
 Enhance documentation for sliding_mul operations
 Change logo in documentation
 Add documentation for ADF stream operators
Global AIE API changes
 Add support for emulated FP32 data types and operations on AIEML
Changes to data types
 unaligned_vector_iterator: add new type and helper functions
 random_circular_vector_iterator: add new type and helper functions
 iterator: add linear iterator type and helper functions for scalar values
 accum: add support for dynamic sign in to/from_vector on AIEML
 accum: add implicit conversion to float on AIEML
 vector: add support for dynamic sign in pack/unpack
 vector: optimization of initialization by value on AIEML
 vector: add constructor from 1024b native types on AIEML
 vector: fixes and optimizations for unaligned_load/store
Changes to operations
 adf::buffer_port: add many wrapper iterators
 adf::stream: annotate read/write functions with stream resource so they can be scheduled in parallel
 adf::stream: add stream operator overloading
 fft: performance fixes on AIEML
 max/min/maxdiff: add support for bfloat16 and float on AIEML
 mul/mmul: add support for bfloat16 and float on AIEML
 mul/mmul: add support for dynamic sign on AIEML
 parallel_lookup: expanded to int16->bfloat, performance optimisations, and softmax kernel
 print: add support to print accumulators
 add/max/min_reduce: add support for float on AIEML
 reverse: add optimized implementation on AIEML using matrix multiplications
 shuffle_down_replicate: add new function
 sliding_mul: add 32b for 8b * 8b and 16b * 16b on AIEML
 transpose: add new function and implementation for AIEML
 upshift/downshift: add implementation for AIEML
Vitis 2021.2
Documentation changes
 Fix description of sliding_mul_sym_uct
 Make return types explicit for better documentation
 Fix documentation for sin/cos so that it says that the input must be in radians
 Add support for concepts
 Add documentation for missing arguments and fix wrong argument names
 Fixes in documentation for int4/uint4 AIEML types
 Add documentation for the mmul class
 Update documentation about supported accumulator sizes
 Update the matrix multiplication example to use the new MxKxN scheme and size_A/size_B/size_C
Global AIE API changes
 Make all entry points always_inline
 Add declaration macros to aie_declaration.hpp so that they can be used in headers parsed by aiecompiler
Changes to data types
 Add support for bfloat16 data type on AIEML
 Add support for cint16/cint32 data types on AIEML
 Add an argument to vector::grow, to specify where the input vector will be located in the output vector
 Remove copy constructor so that the vector type becomes trivial
 Remove copy constructor so that the mask type becomes trivial
 Make all member functions in circular_index constexpr
 Add tiled_mdspan::begin_vector_dim functions that return vector iterators
 Initial support for sparse vectors on AIEML, including iterators to read from memory
 Make vector methods always_inline
 Make vector::push be applied to the object it is called on and return a reference
Changes to operations
 add: Implementation optimization on AIEML
 add_reduce: Implement on AIEML
 bit/or/xor: Implement scalar x vector variants of bit operations
 equal/not_equal: Fix a bug in which not all lanes were compared for certain vector sizes.
 fft: Interface change to enhance portability across AIE/AIEML
 fft: Add initial support on AIEML
 fft: Add alignment checks for x86sim in FFT iterators
 fft: Make FFT output interface uniform for radix 2 cint16 upscale version on AIE
 filter_even/filter_odd: Functional fixes
 filter_even/filter_odd: Performance improvement for 4b/8b/16b implementations
 filter_even/filter_odd: Performance optimization on AIEML
 filter_even/filter_odd: Do not require step argument to be a compile-time constant
 interleave_zip/interleave_unzip: Improve performance when configuration is a runtime value
 interleave_*: Do not require step argument to be a compile-time constant
 load_floor_v/load_floor_bytes_v: New functions that floor the pointer to a requested boundary before performing the load.
 load_unaligned_v/store_unaligned_v: Performance optimization on AIEML
 lut/parallel_lookup/linear_approx: First implementation of lookup based linear functions on AIEML.
 max_reduce/min_reduce: Add 8b implementation
 max_reduce/min_reduce: Implement on AIEML
 mmul: Implement new shapes for AIEML
 mmul: Initial support for 4b multiplication
 mmul: Add support for 80b accumulation for 16b x 32b / 32b x 16b cases
 mmul: Change dimension names from MxNxK to MxKxN
 mmul: Add size_A/size_B/size_C data members
 mul: Optimize mul+conj operations so that they are merged into a single intrinsic call on AIEML
 sin/cos/sincos: Fix to avoid int -> unsigned conversions that reduce the range
 sin/cos/sincos: Use a compile-time division to compute 1/PI
 sin/cos/sincos: Fix floating-point range
 sin/cos/sincos: Optimized implementation for float vector
 shuffle_up/shuffle_down: Elements don't wrap around anymore. Instead, new elements are undefined.
 shuffle_up_rotate/shuffle_down_rotate: New variants added for the cases in which elements need to wrap around
 shuffle_up_replicate: Variant added which replicates the first element.
 shuffle_up_fill: Variant added which fills new elements with elements from another vector.
 shuffle_*: Optimization in shuffle primitives on AIE, especially for 8b/16b cases
 sliding_mul: Fixes to handle larger Step values for cfloat variants
 sliding_mul: Initial implementation for 16b x 16b and cint16b x cint16b on AIEML
 sliding_mul: Optimize mul+conj operations so that they are merged into a single intrinsic call on AIEML
 sliding_mul_sym: Fixes in start computation for filters with DataStepX > 1
 sliding_mul_sym: Add missing int32 x int16 / int16 x int32 type combinations
 sliding_mul_sym: Fix two-buffer sliding_mul_sym acc80
 sliding_mul_sym: Add support for separate left/right start arguments
 store_v: Support pointers annotated with storage attributes
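The shuffle_up variants listed above for 2021.2 differ only in how the vacated low elements are produced. The following plain C++ sketch models that on an 8-element array. It is not the AIE API implementation: the model_* names and the Vec alias are hypothetical, and the choice of taking the fill elements from the top of the second vector is an assumption that should be checked against the API reference. Plain shuffle_up is not modelled because its new elements are undefined.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8;      // illustrative vector size
using Vec = std::array<int, N>;   // hypothetical stand-in for a vector type

// Rotate model: elements shifted out at the top wrap around to the bottom.
inline Vec model_shuffle_up_rotate(const Vec& v, std::size_t n) {
    assert(n <= N);
    Vec out{};
    for (std::size_t i = 0; i < N; ++i) out[(i + n) % N] = v[i];
    return out;
}

// Replicate model: the n vacated low elements are copies of the first element.
inline Vec model_shuffle_up_replicate(const Vec& v, std::size_t n) {
    assert(n <= N);
    Vec out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = (i < n) ? v[0] : v[i - n];
    return out;
}

// Fill model: the n vacated low elements come from another vector.
// Assumption: they are taken from the n highest elements of `fill`.
inline Vec model_shuffle_up_fill(const Vec& v, const Vec& fill, std::size_t n) {
    assert(n <= N);
    Vec out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = (i < n) ? fill[N - n + i] : v[i - n];
    return out;
}
```

With v = {0, 1, ..., 7} and n = 2, the rotate model yields {6, 7, 0, 1, 2, 3, 4, 5} and the replicate model yields {0, 0, 0, 1, 2, 3, 4, 5}. The shuffle_down variants mirror this behaviour in the opposite direction.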