Row-Major Layout

AOCL User Guide (57404)

Document ID: 57404
Release Date: 2025-12-29
Version: 5.2 English

In a row-major layout, the elements of each transform are contiguous in memory (element stride 1), and consecutive batches are separated by a fixed batch stride.

This is a 2D representation of the layout:

B1 -> [ 0 ][ 1 ][ 2 ] ... [ N-1 ]
B2 -> [ 0 ][ 1 ][ 2 ] ... [ N-1 ]
...
Bk -> [ 0 ][ 1 ][ 2 ] ... [ N-1 ]

where Bi denotes the i-th batch, k is the number of batches, and N is the FFT size (the number of points in each transform).

The stride rules are as follows:

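  • In-place Problems: Because the same buffer holds both the input and the output, the batch stride on the real-valued side must leave room for the larger half-complex data.
    • For R2C, the input batch stride must account for the expanded half-complex output. Likewise, for C2R, the output batch stride must account for the half-complex input.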

    • For example, consider 4 batches of 50-point FFTs (written 4v50):
      • The R2C output has N/2 + 1 complex points per batch, i.e. 26 complex values for N = 50. Since each complex value occupies 2 real elements (real and imaginary parts), the input batch stride, counted in real elements, should be set to 52 (26 x 2), not 50.

      • The output batch stride, counted in complex elements, is 26, i.e. N/2 + 1.

    • Similarly, for C2R, the input batch stride should be 26 and the output batch stride 52.

    • These strides ensure that the batches do not overlap in the shared buffer and that the data matches the layout expected by in-place transforms.

    • Example:

      For a 4v50 problem with dims[0].in_stride = dims[0].out_stride = 1, the correct vector (batch) stride settings are:

      • R2C in-place: vecs[0].in_stride = 52, vecs[0].out_stride = 26

      • C2R in-place: vecs[0].in_stride = 26, vecs[0].out_stride = 52

  • Out-of-place Problems: Because the input and output buffers are separate, the batch strides can be set independently to match the actual data layout in each buffer.
    • For R2C, the input batch stride can be set to the actual spacing of the real-valued input data, while the output batch stride must account for the half-complex format (N/2 + 1 complex values per batch).

    • For C2R, the input batch stride must account for the half-complex format, while the output batch stride can be set to the actual spacing of the real-valued output data.

    • Example:

      For a 4v50 problem with dims[0].in_stride = dims[0].out_stride = 1, the correct vector (batch) stride settings are:

      • R2C out-of-place: vecs[0].in_stride = 50, vecs[0].out_stride = 26

      • C2R out-of-place: vecs[0].in_stride = 26, vecs[0].out_stride = 50