As shown in Figure 1, several forward streaming modules running in parallel are connected via FIFOs
to compute several wavefield time steps simultaneously. The number of connected forward streaming
modules is configurable at compile time and determines how frequently device memory is accessed to retrieve data.
For example, if 10 forward streaming modules are connected, the data retrieved from each device memory access
sustains the computation of 10 wavefield time steps, so device memory is accessed once every 10 time steps
rather than once per time step. The C++ implementation of the 2D-RTM forward kernel can be found in
L2/include/hw/rtm2d/rtmforward.hpp.