Streaming interface for Filters - 2023.2 English

Vitis Libraries

Release Date
2023-12-20
Version
2023.2 English

Streaming interfaces are now supported by all FIRs. They are based on 32-bit AXI4-Stream and offer a throughput of up to 32 Gbps per stream (based on a 1 GHz AIE clock).
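The 32 Gbps figure follows directly from the stream width and the clock rate. The sketch below only reproduces that arithmetic; the constant and function names are illustrative and are not part of the library. It assumes one 32-bit word is transferred per clock cycle.

```cpp
#include <cstdint>

// Figures taken from the text: 32-bit AXI4-Stream, 1 GHz AIE clock.
constexpr std::uint64_t kStreamWidthBits = 32;
constexpr std::uint64_t kAieClockHz = 1000000000ULL;

// Peak bits per second on one stream, assuming one 32-bit word per cycle.
constexpr std::uint64_t peakStreamBitsPerSec() {
    return kStreamWidthBits * kAieClockHz;
}

// Peak samples per second on one stream for a sample of the given bit width.
constexpr std::uint64_t peakSamplesPerSec(std::uint64_t sampleBits) {
    return peakStreamBitsPerSec() / sampleBits;
}

static_assert(peakStreamBitsPerSec() == 32000000000ULL, "32 Gbps per stream");
// cint16 is 16-bit real + 16-bit imaginary = 32 bits, so one sample per cycle.
static_assert(peakSamplesPerSec(32) == 1000000000ULL, "1000 MSa/s for cint16");
```

For cint16 data this gives a ceiling of 1000 MSa/s per stream, which is the reference point for the throughput figures quoted in the examples on this page.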

When TP_API = 1, the FIR has stream API input and output ports, which allows greater interoperability and flexibility in placement of the design. Streaming interfaces can be configured with one or two input streams (set by TP_DUAL_IP) and one or two output streams (set by TP_NUM_OUTPUTS).
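As a rough illustration of how these three parameters interact, the sketch below models them with a plain struct. The struct and helper functions are hypothetical and are not library types. The encodings TP_API = 0 for window ports and TP_DUAL_IP = 0/1 for single/dual input streams are assumptions, since the text above only states the TP_API = 1 case.

```cpp
// Hypothetical mirror of the stream-related template parameters.
struct FirStreamConfig {
    unsigned api;         // TP_API: 1 selects stream ports (per the text); 0 assumed to mean window
    unsigned dualIp;      // TP_DUAL_IP: assumed 0 = single input stream, 1 = dual input streams
    unsigned numOutputs;  // TP_NUM_OUTPUTS: 1 or 2 output streams
};

constexpr bool usesStreams(const FirStreamConfig& c) {
    return c.api == 1;
}

// Number of input stream ports implied by the configuration (0 if not streaming).
constexpr unsigned inputStreams(const FirStreamConfig& c) {
    return usesStreams(c) ? (c.dualIp == 1 ? 2u : 1u) : 0u;
}

// Number of output stream ports implied by the configuration (0 if not streaming).
constexpr unsigned outputStreams(const FirStreamConfig& c) {
    return usesStreams(c) ? c.numOutputs : 0u;
}

// Basic range check on the three values.
constexpr bool isValid(const FirStreamConfig& c) {
    return c.api <= 1 && c.dualIp <= 1 &&
           (c.numOutputs == 1 || c.numOutputs == 2);
}

static_assert(inputStreams(FirStreamConfig{1, 1, 2}) == 2, "dual input streams");
static_assert(outputStreams(FirStreamConfig{1, 0, 1}) == 1, "single output stream");
```

A check of this kind can be useful in a testbench to catch a port-count mismatch before the graph is built.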

In general, stream-based filters require little or no data buffering and therefore have lower memory requirements and lower latency than window API filters.

Note

AIE-ML devices support only a single input port and a single output port.

Note

AIE-ML devices cannot take advantage of the symmetry of FIRs; therefore, the FIR implementation is always based on an asymmetric design.

Asymmetric FIRs

Asymmetric FIRs (single-rate as well as rate-changing FIRs) use input and output streams directly. As a result, there is no need for input/output buffering, so asymmetric FIRs offer very low latency and a very low memory footprint. In addition, because no buffer memory is required, such designs can operate on a very large number of samples within each kernel iteration (TP_INPUT_WINDOW_VSIZE is limited to 2^31 - 1), achieving maximum performance and maximum throughput.

For example, a single-kernel (TP_CASC_LEN = 1), 16-tap single-rate asymmetric FIR implemented on AIE, using cint16 data with a frame size of 25600 and int16 coefficients, offers a throughput of 998 MSa/s (based on a 1 GHz AIE clock) and latency as low as tens of nanoseconds.

Hybrid Streaming interface for Symmetric and Half-band FIRs in non-SSR mode

Symmetric FIRs, including half-band FIRs, cannot take full advantage of input streams when operating in non-SSR mode, i.e. when TP_SSR, TP_PARA_INTERP_POLY and TP_PARA_DECI_POLY are all set to 1. Instead, the input stream is converted to a window buffer and the FIR kernels operate in a window-based architecture. Output data is sent directly out through a stream port. Such designs allow more flexible connection and mapping onto AIE tiles. Latency is lower than that of a window-based equivalent, but much greater than that of an asymmetric design. The lack of an output buffer also reduces the memory requirements.

For example, a 16-tap single-rate symmetric FIR implemented on AIE with a 512-sample input/output window, operating on cint16 data and int16 coefficients, achieves a throughput of 978 MSa/s (based on a 1 GHz AIE clock) and needs around 1.4 µs before a full window of samples is available for the consumer to read.

Symmetric and Half-band FIRs in SSR mode

When operating in SSR mode, i.e. when any of TP_SSR, TP_PARA_INTERP_POLY or TP_PARA_DECI_POLY is greater than 1, all Symmetric and Half-band FIRs, in addition to all Asymmetric FIRs, operate on input and output streams directly, offering very low latency and a minimal memory footprint.
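The rules on this page for how the input stream is consumed on AIE can be condensed into a small decision function. This is only a sketch of the behaviour described here: the enum, struct, and function names are hypothetical, and it does not cover AIE-ML, where the implementation is always asymmetric.

```cpp
enum class FirKind { Asymmetric, Symmetric, HalfBand };
enum class InputPath { DirectStream, StreamToWindowBuffer };

// Hypothetical holder for the three parallelism parameters named in the text.
struct Parallelism {
    unsigned ssr;             // TP_SSR
    unsigned paraInterpPoly;  // TP_PARA_INTERP_POLY
    unsigned paraDeciPoly;    // TP_PARA_DECI_POLY
};

// SSR mode: at least one of the three parameters is greater than 1.
constexpr bool isSsrMode(const Parallelism& p) {
    return p.ssr > 1 || p.paraInterpPoly > 1 || p.paraDeciPoly > 1;
}

// Asymmetric FIRs always read the stream directly; symmetric and half-band
// FIRs do so only in SSR mode, otherwise the stream is first converted to a
// window buffer (the hybrid case).
constexpr InputPath inputPath(FirKind kind, const Parallelism& p) {
    return (kind == FirKind::Asymmetric || isSsrMode(p))
               ? InputPath::DirectStream
               : InputPath::StreamToWindowBuffer;
}

static_assert(inputPath(FirKind::Symmetric, Parallelism{1, 1, 1}) ==
                  InputPath::StreamToWindowBuffer, "hybrid mode");
static_assert(inputPath(FirKind::Symmetric, Parallelism{2, 1, 1}) ==
                  InputPath::DirectStream, "SSR mode");
```

In every case the output leaves through a stream port, so only the input side needs this distinction.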

For example, a 32-tap single-rate symmetric FIR implemented on AIE with TP_SSR = 2, using cint16 data with a frame size of 25600 and int16 coefficients, achieves a throughput of 1998 MSa/s (based on a 1 GHz AIE clock) and latency as low as tens of nanoseconds.
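To put the quoted throughput figures in context, the sketch below compares them with the theoretical stream ceiling. It assumes cint16 samples (32 bits, one per cycle per stream at 1 GHz) and one stream per SSR path; the second assumption is an inference from the quoted 1998 MSa/s figure rather than something stated on this page. The function names are illustrative only.

```cpp
// Peak cint16 rate in MSa/s at a 1 GHz AIE clock,
// assuming one 32-bit stream per SSR path.
constexpr double peakMSaPerSec(unsigned ssr) {
    return 1000.0 * ssr;
}

// Fraction of the stream ceiling reached by a measured throughput.
constexpr double efficiency(double measuredMSa, unsigned ssr) {
    return measuredMSa / peakMSaPerSec(ssr);
}

// Time in microseconds to move one frame at a given throughput:
// samples / (MSa/s) yields microseconds.
constexpr double frameTimeUs(double samples, double throughputMSa) {
    return samples / throughputMSa;
}

// 998 MSa/s on one stream and 1998 MSa/s on two streams both sit
// within about 0.2% of the ceiling.
static_assert(efficiency(998.0, 1) > 0.997, "asymmetric example");
static_assert(efficiency(1998.0, 2) > 0.998, "SSR = 2 example");
```

Under these assumptions, a 25600-sample frame at 998 MSa/s takes roughly 25.65 µs to stream, while the quoted latency of tens of nanoseconds refers to the delay through the filter rather than to the frame duration.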