These examples were used in earlier releases of the tutorials and are no longer in use. They are kept for comparison because they illustrate ideas for potential optimizations. The kernels are indexed from 0 to 3 as follows:
0. datamover_scalar.cc : Uses AIE scalar processing to move 32 bits of data (one cint16 sample) each clock cycle.
1. datamover_vector_reg.cc : Uses AIE vector processing to move 256-bit data (8 lanes of cint16) each clock cycle.
The vector registers are used for temporary storage as a circular buffer.
2. datamover_mul_one.cc : Similar to the vector data mover, except that the 8 lanes are passed through the DSP MUL
by multiplying with 1. This mimics the pipeline delay of vector MUL/MAC signal processing.
3. stream_datamover.cc : Based on direct access to the 32-bit AXI stream. As with the vector data mover,
the vector register is used as a circular buffer. To align with the 128-bit width of the
vector register, the 4:1 conversion from 32-bit to 128-bit in the stream API is used.
In practice this means we can read 128 bits of data, corresponding to 4 cint16 samples,
every 4th clock cycle. The same applies to the output: we write 128 bits of data every 4th
clock cycle, and the stream API performs the 1:4 conversion to the 32-bit stream.
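
The width conversion and the resulting throughput can be sketched in plain C++. This is only an illustrative model and not AIE kernel code: it does not use the AIE stream API, and the names `CInt16`, `Block128`, `samples_per_cycle`, `pack_4_to_1`, `unpack_1_to_4` and `stream_cycles` are hypothetical helpers introduced here. It assumes one cint16 sample fills one 32-bit stream word, as stated for kernel 0.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ model for illustration only; this is NOT AIE kernel code.
// One cint16 sample is a 16-bit real part plus a 16-bit imaginary part,
// so it occupies exactly one 32-bit stream word.
struct CInt16 {
    std::int16_t re;
    std::int16_t im;
};

constexpr std::size_t kWordsPerBlock = 4;  // 4 x 32 bits = 128 bits
using Block128 = std::array<CInt16, kWordsPerBlock>;

// Average number of cint16 samples moved per clock cycle when a transfer of
// `bits` bits completes once every `cycles` clock cycles.
constexpr std::size_t samples_per_cycle(std::size_t bits, std::size_t cycles) {
    return bits / 32 / cycles;
}

// Figures taken from the kernel descriptions above.
static_assert(samples_per_cycle(32, 1) == 1, "scalar: 32 bits per cycle");
static_assert(samples_per_cycle(256, 1) == 8, "vector: 256 bits per cycle");
static_assert(samples_per_cycle(128, 4) == 1, "stream: 128 bits per 4 cycles");

// Models the 4:1 conversion: four consecutive 32-bit words form one 128-bit block.
std::vector<Block128> pack_4_to_1(const std::vector<CInt16>& words) {
    assert(words.size() % kWordsPerBlock == 0);
    std::vector<Block128> blocks(words.size() / kWordsPerBlock);
    for (std::size_t i = 0; i < words.size(); ++i) {
        blocks[i / kWordsPerBlock][i % kWordsPerBlock] = words[i];
    }
    return blocks;
}

// Models the 1:4 conversion: each 128-bit block is emitted as four 32-bit words.
std::vector<CInt16> unpack_1_to_4(const std::vector<Block128>& blocks) {
    std::vector<CInt16> words;
    words.reserve(blocks.size() * kWordsPerBlock);
    for (const Block128& block : blocks) {
        for (const CInt16& sample : block) {
            words.push_back(sample);
        }
    }
    return words;
}

// Clock cycles the 32-bit stream needs to carry `num_blocks` 128-bit blocks,
// at one 32-bit word per cycle.
constexpr std::size_t stream_cycles(std::size_t num_blocks) {
    return num_blocks * kWordsPerBlock;
}
```

In this model the stream data mover averages one cint16 sample per clock cycle, the same as the scalar data mover, while the vector data mover reaches 8 samples per clock cycle.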