Streaming interfaces are now supported by all FIRs. They are based on 32-bit AXI4-Stream and offer throughput of up to 32 Gbps per stream (based on a 1 GHz AIE clock).
When TP_API = 1, the FIR will have stream API input and output ports, allowing greater interoperability and flexibility in placement of the design.
Streaming interfaces may be configured with one or two stream inputs (driven by TP_DUAL_IP) and one or two stream outputs (driven by TP_NUM_OUTPUTS).
In general, stream-based filters require less or no data buffering and therefore have lower memory requirements and lower latency than window API filters.
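The per-stream figure quoted above follows from simple arithmetic: a 32-bit stream that moves one word per cycle at 1 GHz carries 32 Gbps. The sketch below works this through and converts it to a peak sample rate for a given sample width. The helper names are hypothetical and are not part of the library, and the 32-bit size used for cint16 assumes a 16-bit real part plus a 16-bit imaginary part.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical helpers for illustration only; not part of the FIR library.

// Peak raw bandwidth of one stream in bits per second, assuming one
// stream word is transferred on every clock cycle.
double streamBitsPerSecond(double clockHz, unsigned streamWidthBits = 32) {
    assert(clockHz > 0.0 && streamWidthBits > 0);
    return clockHz * streamWidthBits;
}

// Peak sample rate across numStreams streams for samples of sampleBits each.
double peakSamplesPerSecond(double clockHz, unsigned sampleBits,
                            unsigned numStreams = 1,
                            unsigned streamWidthBits = 32) {
    assert(sampleBits > 0 && numStreams > 0);
    return numStreams * streamBitsPerSecond(clockHz, streamWidthBits) / sampleBits;
}

// Prints the budget for the 1 GHz AIE clock used throughout the text.
void printStreamBudget() {
    const double clockHz = 1e9;
    std::cout << streamBitsPerSecond(clockHz) / 1e9 << " Gbps per stream\n";
    // cint16 is assumed to occupy 32 bits (16-bit real + 16-bit imaginary).
    std::cout << peakSamplesPerSecond(clockHz, 32) / 1e6
              << " MSa/s peak for cint16 on one stream\n";
}
```

Under these assumptions a single stream peaks at 1000 MSa/s for cint16 data, so the 998 MSa/s quoted for the asymmetric example below is about 99.8% of that peak.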
Asymmetric FIRs
Asymmetric FIRs (single-rate, as well as rate-changing FIRs) will use input and output streams directly.
As a result, no input/output buffering is needed, so Asymmetric FIRs offer very low latency and a very low memory footprint.
In addition, due to the lack of memory requirements, such designs may operate on a very large number of samples within each kernel iteration (TP_INPUT_WINDOW_VSIZE is limited to 2^31 - 1), achieving maximum performance and maximum throughput.
For example, a single-kernel (TP_CASC_LEN = 1), 16-tap single-rate asymmetric FIR, using cint16 data with a frame size of 25600 and int16 coefficients, offers throughput of 998 MSa/s (based on a 1 GHz AIE clock) and latency as low as tens of nanoseconds.
Hybrid Streaming interface for Symmetric and Half-band FIRs in non-SSR mode
Symmetric FIRs, including half-band FIRs, cannot take full advantage of input streams when operating in non-SSR mode, i.e. when TP_SSR, TP_PARA_INTERP_POLY and TP_PARA_DECI_POLY are all set to 1.
Instead, the input stream will be converted to a window buffer and the FIR kernels will operate in a window-based architecture.
Output data will be sent directly out through a stream port.
Such designs allow a more flexible connection and mapping onto AIE tiles.
Latency is reduced compared to a window-based equivalent, but is much greater than that of an asymmetric design. The lack of an output buffer also reduces the memory requirements.
For example, a 16-tap single-rate symmetric FIR with a 512-sample input/output window operating on cint16 data and int16 coefficients achieves throughput of 978 MSa/s (based on a 1 GHz AIE clock) and will need around 1.4 us before a full window of output samples is available for the consumer to read.
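To put the 1.4 us figure in context, the time needed just to receive one window over a single stream can be estimated from the window size and the stream rate. The following is a back-of-the-envelope sketch, assuming one cint16 sample (32 bits) arrives per cycle on a 1 GHz clock. The function names are hypothetical, and the estimate deliberately ignores compute time and any other overhead.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical helper for illustration only; not part of the FIR library.
// Time in nanoseconds to stream windowSamples samples, assuming
// samplesPerCycle samples arrive on every clock cycle.
double windowFillTimeNs(unsigned windowSamples, double clockHz,
                        double samplesPerCycle = 1.0) {
    assert(clockHz > 0.0 && samplesPerCycle > 0.0);
    const double cycles = windowSamples / samplesPerCycle;
    return cycles * 1e9 / clockHz;
}

// Prints the estimate for the 512-sample window used in the example.
void printWindowFillEstimate() {
    std::cout << windowFillTimeNs(512, 1e9)
              << " ns to receive a 512-sample window\n";
}
```

Under these assumptions the input window alone takes about 512 ns to arrive, so a sizeable share of the quoted 1.4 us is spent simply waiting for input data. The remainder would be filtering time and other overheads that this sketch does not model.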
Symmetric and Half-band FIRs in SSR mode
When operating in SSR mode, i.e. when any of TP_SSR, TP_PARA_INTERP_POLY or TP_PARA_DECI_POLY is greater than 1, all Symmetric and Half-band FIRs, like all Asymmetric FIRs, will operate on input and output streams directly, offering very low latency and a minimal memory footprint.
For example, a 32-tap, single-rate symmetric FIR with SSR set to 2 (TP_SSR = 2), using cint16 data with a frame size of 25600 and int16 coefficients, achieves throughput of 1998 MSa/s (based on a 1 GHz AIE clock) and latency as low as tens of nanoseconds.
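The rules in this section for which architecture a stream-API FIR ends up using can be summarized as a small decision function. This is only a sketch of the rules as stated in the text. The enum, struct and function names are hypothetical and do not correspond to library APIs, although the field comments indicate which template parameter each field is meant to mirror.

```cpp
#include <cassert>

// Hypothetical types for illustration only; not part of the FIR library.
enum class FirKind { Asymmetric, Symmetric, HalfBand };

enum class StreamArch {
    DirectStream,  // kernels read and write streams directly
    HybridWindow   // input stream converted to a window buffer, output streamed
};

struct StreamFirConfig {
    FirKind kind;
    unsigned ssr;             // mirrors TP_SSR
    unsigned paraInterpPoly;  // mirrors TP_PARA_INTERP_POLY
    unsigned paraDeciPoly;    // mirrors TP_PARA_DECI_POLY
};

// SSR mode: any of the three parallelism parameters is greater than 1.
bool isSsrMode(const StreamFirConfig& cfg) {
    assert(cfg.ssr >= 1 && cfg.paraInterpPoly >= 1 && cfg.paraDeciPoly >= 1);
    return cfg.ssr > 1 || cfg.paraInterpPoly > 1 || cfg.paraDeciPoly > 1;
}

// Architecture used when TP_API = 1 (stream API), following the text:
// asymmetric FIRs always stream directly; symmetric and half-band FIRs
// stream directly only in SSR mode and otherwise use the hybrid scheme.
StreamArch streamArchitecture(const StreamFirConfig& cfg) {
    if (cfg.kind == FirKind::Asymmetric || isSsrMode(cfg)) {
        return StreamArch::DirectStream;
    }
    return StreamArch::HybridWindow;
}
```

For instance, streamArchitecture(StreamFirConfig{FirKind::Symmetric, 2, 1, 1}) yields StreamArch::DirectStream, matching the TP_SSR = 2 example, while the same filter with all three parameters set to 1 yields StreamArch::HybridWindow.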