Stage 1 - 2024.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2024-12-06
Version: 2024.2 English

The figure below shows the processing performed by Stage 1. Here, two rounds of butterflies locally reorder each 4-tuple of consecutive samples. As with Stage 0, SIMD instructions perform a total of 8 parallel comparisons per cycle. Notice that the second round of butterfly processing is identical to the single round from Stage 0.

[Figure: Stage 1 butterfly processing]
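As a plain C++ illustration of the two rounds described above, the sketch below applies compare-exchange butterflies to each 4-tuple of a 16-sample array. It is a scalar reference model, not the tutorial's vectorized code. The names butterfly and stage1_reference are hypothetical, and the sketch assumes ascending order, a first round that pairs the outer samples (0,3) and the inner samples (1,2) of each 4-tuple, and a Stage 0 style second round that compares adjacent pairs (0,1) and (2,3).

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Compare-exchange butterfly: afterwards a <= b.
inline void butterfly(float& a, float& b)
{
    if (b < a) std::swap(a, b);
}

// Hypothetical scalar model of Stage 1 (assumed pairing and direction,
// see the lead-in): two butterfly rounds on every 4-tuple.
void stage1_reference(std::array<float, 16>& s)
{
    for (std::size_t b = 0; b < s.size(); b += 4) {
        // Round 1: outer pair (0,3) and inner pair (1,2).
        butterfly(s[b + 0], s[b + 3]);
        butterfly(s[b + 1], s[b + 2]);
        // Round 2: adjacent pairs, assumed to match the Stage 0 round.
        butterfly(s[b + 0], s[b + 1]);
        butterfly(s[b + 2], s[b + 3]);
    }
}
```

If each input pair is already in ascending order, as assumed here for the output of Stage 0, every 4-tuple leaves this routine fully sorted.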

The code block below implements the first round of Stage 1 using the AIE API. It uses the same 16-lane vector register along with the aie::max() and aie::min() routines for sample comparison, and the fpshuffle16() intrinsic to extract the “top” and “bottom” samples of each butterfly on input and to restore sample order on output. Note how the AI Engine coding style permits mixing the AIE API and intrinsics in the same code through a common set of AIE API register definitions. This makes it very convenient to “drop down” to intrinsics where necessary from within an AIE API coding framework. Profiling shows that this function requires 27 cycles per invocation.

void __attribute__((noinline)) bitonic_fp16::stage1a( aie::vector<float,16>& vec )
{
  // Lane-select constants for fpshuffle16(): input gather (_I) and output reorder (_O)
  static constexpr unsigned BFLY_STAGE1a_TOP_I = 0xDC985410;
  static constexpr unsigned BFLY_STAGE1a_BOT_I = 0xEFAB6723;
  static constexpr unsigned BFLY_STAGE1a_TOP_O = 0xAB328910;
  static constexpr unsigned BFLY_STAGE1a_BOT_O = 0xEF76CD54;
  // Gather butterfly "top" inputs into lanes 0-7 and "bottom" inputs into lanes 8-15
  vec = fpshuffle16(vec,0,BFLY_STAGE1a_TOP_I,BFLY_STAGE1a_BOT_I);
  aie::vector<float,8>  v_top = vec.extract<8>(0);
  aie::vector<float,8>  v_bot = vec.extract<8>(1);
  // 8 parallel comparisons: per-lane maximum and minimum of each butterfly
  aie::vector<float,8>  v_mx = aie::max(v_top, v_bot);
  aie::vector<float,8>  v_mn = aie::min(v_top, v_bot);
  // Minima in lanes 0-7, maxima in lanes 8-15, then shuffle back into 4-tuple order
  vec = aie::concat(v_mn,v_mx);
  vec = fpshuffle16(vec,0,BFLY_STAGE1a_TOP_O,BFLY_STAGE1a_BOT_O);
}
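To make the four shuffle constants in stage1a easier to read, the following is a minimal host-side sketch in portable C++ that emulates the data movement with plain arrays. It rests on an assumption about fpshuffle16() that should be checked against the intrinsics documentation: with a start of 0, hex digit k of the first offset word (least-significant digit first) selects the source lane for output lane k, and digit k of the second offset word selects the source lane for output lane 8 + k. The names Vec16, shuffle16_model, and stage1a_model are hypothetical and are not part of the AIE API.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using Vec16 = std::array<float, 16>;

// Hypothetical scalar stand-in for fpshuffle16(vec, 0, lo, hi), under the
// assumption stated in the lead-in: nibble k of `lo` picks the source lane
// for output lane k, nibble k of `hi` picks it for output lane 8 + k.
Vec16 shuffle16_model(const Vec16& in, std::uint32_t lo, std::uint32_t hi)
{
    Vec16 out{};
    for (int k = 0; k < 8; ++k) {
        out[k]     = in[(lo >> (4 * k)) & 0xF];
        out[8 + k] = in[(hi >> (4 * k)) & 0xF];
    }
    return out;
}

// Scalar model of stage1a built on the shuffle model above.
Vec16 stage1a_model(const Vec16& in)
{
    constexpr std::uint32_t TOP_I = 0xDC985410, BOT_I = 0xEFAB6723;
    constexpr std::uint32_t TOP_O = 0xAB328910, BOT_O = 0xEF76CD54;

    // Gather: lanes 0..7 hold the "top" inputs, lanes 8..15 the "bottom" inputs.
    const Vec16 g = shuffle16_model(in, TOP_I, BOT_I);

    // Eight compare-exchanges: minima to lanes 0..7, maxima to lanes 8..15.
    Vec16 mm{};
    for (int k = 0; k < 8; ++k) {
        mm[k]     = std::min(g[k], g[8 + k]);
        mm[8 + k] = std::max(g[k], g[8 + k]);
    }

    // Reorder the results back into 4-tuple order.
    return shuffle16_model(mm, TOP_O, BOT_O);
}
```

Under this model, the input constants pair sample 0 with sample 3 and sample 1 with sample 2 in every 4-tuple, and the output constants place the two minima in positions 0 and 1 and the two maxima in positions 2 and 3. For example, a descending ramp 15, 14, ..., 0 comes out as 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3.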