In a row-major layout, the elements of each batch are contiguous in memory, and consecutive batches are separated by a fixed batch stride.
This is a 2D representation of the layout:
B1 -> [ 0 ][ 1 ][ 2 ] ... [ N-1 ]
B2 -> [ 0 ][ 1 ][ 2 ] ... [ N-1 ]
...
Bk -> [ 0 ][ 1 ][ 2 ] ... [ N-1 ]
where B1 through Bk are the batches, k is the number of batches, and N is the FFT size.
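As a rough illustration of this layout, the sketch below computes the flat offset of an element from its batch number and its position within the batch. The helper name `element_offset` and its parameters are hypothetical, chosen for this example only; they are not part of any library API.

```python
def element_offset(batch, index, batch_stride, elem_stride=1):
    """Flat offset (in elements) of element `index` in batch `batch`.

    Hypothetical helper for illustration: `batch_stride` is the distance
    between the starts of consecutive batches, and `elem_stride` is the
    distance between consecutive elements within one batch.
    """
    return batch * batch_stride + index * elem_stride


# 4 batches of 50 points, densely packed: each batch starts 50 elements
# after the previous one.
k, n = 4, 50
offsets = [element_offset(b, 0, batch_stride=n) for b in range(k)]
print(offsets)  # [0, 50, 100, 150]
```

With an element stride of 1 and a batch stride equal to N, the batches are packed back to back, which matches the diagram above.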
The stride rules are as follows:
- In-place problems: Because the same buffer is used for both input and output, the batch strides must leave room for the larger of the two representations.
For R2C, the input batch stride must account for the expanded size of the half-complex output. The same applies to the output batch stride for C2R.
- For example, with 4 batches of 50 FFT points (4v50):
The R2C output has (N/2 + 1) complex points, i.e. 26 complex values for N = 50. Since each complex value consists of 2 real elements (real and imaginary parts), the input batch stride, counted in real elements, should be set to 52 (26 x 2), not 50.
The output batch stride, counted in complex elements, would be 26, i.e. (N/2 + 1).
Similarly, for C2R, the input batch stride should be 26 and the output batch stride 52.
This ensures that each batch’s data does not overlap in memory and matches the expected layout for in-place transforms.
Example:
For the problem 4v50 with dims[0].in_stride = dims[0].out_stride = 1, the correct vecs stride settings are:
R2C in-place: vecs[0].in_stride = 52, vecs[0].out_stride = 26
C2R in-place: vecs[0].in_stride = 26, vecs[0].out_stride = 52
- Out-of-place problems: Because the input and output buffers are separate, the batch strides can be set independently, based on the actual data layout in memory.
For R2C, the input batch stride can be set to the actual spacing of the real-valued input data, while the output batch stride should account for the half-complex format.
For C2R, the input batch stride should account for the half-complex format, while the output batch stride can be set to the actual spacing of the real-valued output data.
Example:
For the problem 4v50 with dims[0].in_stride = dims[0].out_stride = 1, the correct vecs stride settings are:
R2C out-of-place: vecs[0].in_stride = 50, vecs[0].out_stride = 26
C2R out-of-place: vecs[0].in_stride = 26, vecs[0].out_stride = 50
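The stride rules above can be summarized in a small sketch. The function `batch_strides` is a hypothetical helper written for this illustration, not a library call. It assumes densely packed batches with element stride 1, and it counts strides on real buffers in real elements and strides on half-complex buffers in complex elements, as the numbers in the examples imply.

```python
def batch_strides(n, kind, in_place):
    """Return (in_stride, out_stride) for densely packed batches.

    Hypothetical helper for illustration. Assumes element stride 1;
    real-side strides are in real elements, complex-side strides are
    in complex elements.
    """
    n_complex = n // 2 + 1  # size of the half-complex representation
    # In-place: the real side must be padded so it spans the same memory
    # as n_complex complex values. Out-of-place: it can stay at n.
    real_stride = 2 * n_complex if in_place else n
    if kind == "r2c":
        return real_stride, n_complex
    if kind == "c2r":
        return n_complex, real_stride
    raise ValueError("kind must be 'r2c' or 'c2r'")


# Reproduce the 4v50 examples (N = 50).
for kind in ("r2c", "c2r"):
    for in_place in (True, False):
        print(kind, "in-place" if in_place else "out-of-place",
              batch_strides(50, kind, in_place))
```

For out-of-place problems the returned values are the tightest packing. Larger strides are equally valid there if the data is actually spaced further apart in memory.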