The performance of an AXI4-Stream Infrastructure IP core is limited only by the FPGA logic speed. Each core uses only block RAMs, LUTs, and registers and contains no I/O elements. The values presented in this section should be used as estimation guidelines; actual performance can vary.
Maximum Frequencies
Each core is designed to meet the maximum target frequency of 250 MHz on an AMD Kintex™ 7 FPGA (xc7k325tffg900-1). A -2 speed grade part can be expected to achieve a 5% higher maximum frequency, and a -3 speed grade part a 10% higher maximum frequency. For AXI4-Stream Switch configurations with more than approximately four masters or slaves, the achievable maximum frequency can be 20-25% lower.
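As a rough illustration, the frequency guidance above can be turned into a small Python estimator. This is a sketch under stated assumptions: the function name `estimate_fmax_mhz` and its parameters are hypothetical, and applying the Switch derating as a hard cutoff above four masters or slaves is an assumption, since the text only says "approximately four". The result is an estimate, not a timing-closure guarantee.

```python
# Hypothetical estimator built from the guidance above; not an AMD tool.
BASE_FMAX_MHZ = 250.0  # target on xc7k325tffg900-1 (-1 speed grade)
SPEED_GRADE_UPLIFT = {-1: 0.00, -2: 0.05, -3: 0.10}


def estimate_fmax_mhz(speed_grade: int, switch_ports: int = 0,
                      derate: float = 0.20) -> float:
    """Estimate the achievable maximum frequency in MHz.

    switch_ports: larger of the Switch's master and slave counts (0 if the
    design has no AXI4-Stream Switch). The cutoff of 4 is an assumption.
    derate: fractional Switch derating, within the documented 0.20-0.25.
    """
    if speed_grade not in SPEED_GRADE_UPLIFT:
        raise ValueError(f"unsupported speed grade: {speed_grade}")
    if not 0.20 <= derate <= 0.25:
        raise ValueError("derate must be within the documented 20-25% range")
    fmax = BASE_FMAX_MHZ * (1.0 + SPEED_GRADE_UPLIFT[speed_grade])
    if switch_ports > 4:
        fmax *= 1.0 - derate  # large Switch configurations run slower
    return fmax


print(estimate_fmax_mhz(-2))                               # about 262.5
print(estimate_fmax_mhz(-1, switch_ports=8, derate=0.25))  # 187.5
```

The derating is applied after the speed-grade uplift; the text does not say how the two interact, so treat the combined figure as a loose bound.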
Latency
The latency in the IP cores can vary from interface to interface, depending on how the cores are configured. Latency is measured in clock cycles, from the assertion of the slave interface TVALID signal to the first assertion of the master interface TVALID signal. The latency for each of the individual modules is listed in Table 1. To obtain the minimum latency for the system, add up the values shown in the following table for the modules in your system. The latency specifications assume that the master interface TREADY input is always asserted.
The back-to-back delay is the number of idle clock cycles the module inserts between consecutive accepted transfers. It can be observed by counting how many cycles the slave interface TREADY signal stays Low after a transfer is accepted on the interface.
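The two measurements described above can be reproduced from a simulation trace. The sketch below assumes a hypothetical trace format: one Python list per signal, holding the sampled value (0 or 1) at each rising edge of a single shared clock. The function names are illustrative, and multi-clock paths such as clock converters are out of scope, because their latency is counted in slave-clock cycles.

```python
from typing import Sequence


def latency_cycles(s_tvalid: Sequence[int], m_tvalid: Sequence[int]) -> int:
    """Cycles from the first slave TVALID assertion to the first master
    TVALID assertion at or after it (assumes master TREADY held High)."""
    start = list(s_tvalid).index(1)  # raises ValueError if never asserted
    for cycle in range(start, len(m_tvalid)):
        if m_tvalid[cycle]:
            return cycle - start
    raise ValueError("master TVALID never asserted after slave TVALID")


def back_to_back_delay(s_tready: Sequence[int], accept_cycle: int) -> int:
    """Count cycles slave TREADY stays Low after the transfer accepted in
    accept_cycle (the caller must know TVALID was also High then). If the
    trace ends while TREADY is Low, only the captured cycles are counted."""
    if not s_tready[accept_cycle]:
        raise ValueError("slave TREADY is Low, so no transfer was accepted")
    delay = 0
    for ready in s_tready[accept_cycle + 1:]:
        if ready:
            break
        delay += 1
    return delay


# Register-slice-like behavior: master TVALID follows one cycle later.
print(latency_cycles([0, 1, 1, 0], [0, 0, 1, 1]))  # 1
# 2:1 downsizer-like behavior: TREADY is Low for one cycle per acceptance.
print(back_to_back_delay([1, 0, 1, 0, 1], 0))      # 1
```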
| Module Type | Latency (Clocks) | Back-to-Back Delay (Clocks) | Description |
|---|---|---|---|
| AXI4-Stream Broadcaster | 0 | 0 | The datapath of the broadcaster is combinatorial. It exhibits no latency if all M_AXIS interfaces have TREADY asserted. |
| AXI4-Stream Clock Converter (synchronous, speed-up) | 1 | 0 | The synchronous clock converter latency is reported in units of the slave interface clock. |
| AXI4-Stream Clock Converter (synchronous, speed-down) | 1 | [clock ratio]-1 | The synchronous clock converter latency is reported in units of the slave interface clock. The back-to-back delay varies based on the clock ratio. Example: If using a synchronous 150 MHz-to-50 MHz 3:1 clock converter (clock ratio of 3), the back-to-back delay is 2 clock cycles. |
| AXI4-Stream Clock Converter (asynchronous) | Not Defined | 0 | The latency associated with an asynchronous clock converter can vary greatly depending on the clocks. Latencies of 5 clock cycles or more can be expected. |
| AXI4-Stream Combiner | 0 | 0 | The datapath of the Combiner module is combinatorial and thus has no latency if all ready/valid inputs are asserted. |
| AXI4-Stream Data FIFO (normal mode) | TBD | 0 | When configured in normal mode, the FIFO outputs data as soon as possible. |
| AXI4-Stream Data FIFO (packet mode) | Until TLAST is received or FIFO is full. | 0 | When configured in packet mode, the FIFO outputs data only when a TLAST is received or the FIFO has filled. |
| AXI4-Stream Data Width Converter (upsizer) | [data width ratio] | 0 | The latency varies based on the data width ratio. Example: If a 32 to 128-bit data converter is used (1:4 ratio), the latency of the module is 4 clock cycles. |
| AXI4-Stream Data Width Converter (downsizer) | 1 | [data width ratio]-1 | The back-to-back delay varies based on the data width ratio. Example: If a 32 to 16-bit data converter is used (2:1 ratio), then the module can only accept transfers every other cycle. |
| AXI4-Stream Register Slice (default or Fully-registered mode) | 1 | 0 | Adding a register slice adds one cycle of latency. There is no back-to-back delay. |
| AXI4-Stream Register Slice (Lightweight mode) | 1 | 1 | Adding a register slice adds one cycle of latency. Lightweight mode inserts one bubble cycle after each transfer. |
| AXI4-Stream Register Slice (SLR Crossing mode) | 3 | 0 | SLR Crossing mode incurs 3 cycles of latency and adds no back-to-back delay. |
| AXI4-Stream Register Slice (SLR TDM Crossing mode) | 3 | 0 | SLR TDM Crossing mode incurs 3 aclk cycles of latency and adds no back-to-back delay. |
| AXI4-Stream Register Slice (Bypass mode) | 0 | 0 | Bypass mode directly connects the SI to the MI. |
| AXI4-Stream Subset Converter | 0-1 | 0 | When the core has an m_axis_tready signal but no s_axis_tready signal, a register slice is inserted to avoid violating the AXI4-Stream protocol. In this configuration the latency is 1 cycle; otherwise it is 0. |
| AXI4-Stream Switch | 2 | 0-1 | The output latency of the switch is 2 clock cycles: 1 cycle for the TDEST decode and 1 cycle for the arbiter grant (if idle). The back-to-back delay for an already granted arbitration is 0. Back-to-back arbitration results in a 1-cycle delay between transactions. |
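Following the summing rule given before Table 1, the table can be encoded as data so that a path's minimum latency becomes a simple sum. This is a sketch: the dictionary keys and helper names are hypothetical, it assumes master TREADY is held High as the table does, and it adds every entry as if counted in one clock, which is only an approximation when a clock converter sits in the path.

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (latency, back_to_back_delay) in clock cycles

# Fixed-value rows transcribed from Table 1. Rows whose latency is not fixed
# (asynchronous clock converter, Data FIFO) are deliberately left out.
FIXED_ROWS = {
    "broadcaster": (0, 0),
    "combiner": (0, 0),
    "clock_converter_sync_up": (1, 0),
    "register_slice": (1, 0),              # default / fully-registered mode
    "register_slice_lightweight": (1, 1),
    "register_slice_slr": (3, 0),
    "register_slice_slr_tdm": (3, 0),
    "register_slice_bypass": (0, 0),
}


def clock_converter_sync_down(clock_ratio: int) -> Entry:
    return (1, clock_ratio - 1)


def width_upsizer(width_ratio: int) -> Entry:
    return (width_ratio, 0)


def width_downsizer(width_ratio: int) -> Entry:
    return (1, width_ratio - 1)


def subset_converter(has_m_tready: bool, has_s_tready: bool) -> Entry:
    # A register slice (1 cycle) is inserted only when m_axis_tready exists
    # but s_axis_tready does not.
    return (1 if has_m_tready and not has_s_tready else 0, 0)


def switch(rearbitrating: bool = False) -> Entry:
    # 1 cycle TDEST decode + 1 cycle arbiter grant; a new arbitration between
    # back-to-back transactions costs one extra cycle.
    return (2, 1 if rearbitrating else 0)


def minimum_latency(chain: List[Entry]) -> int:
    """Sum the per-module latencies along one path."""
    return sum(latency for latency, _ in chain)


path = [FIXED_ROWS["register_slice"], width_upsizer(4), switch()]
print(minimum_latency(path))  # 7
```

For the example path (register slice, 1:4 upsizer, Switch), the sum is 1 + 4 + 2 = 7 cycles.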
Throughput
The throughput of a datapath through each AXI4-Stream Infrastructure IP core is calculated as TDATA width × clock frequency for each of the paths determined by the SI and MI interfaces. The overall throughput of the datapath is set by the lowest throughput of any individual path that the transfer traverses in the system.
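The throughput rule can be written as a minimum over the interfaces a transfer crosses. In this sketch, the function names and the (TDATA width in bits, clock in Hz) tuple layout are hypothetical, and it assumes TVALID and TREADY stay asserted, so bubble cycles from back-to-back delay are not modeled.

```python
from typing import Iterable, Tuple


def interface_throughput_bps(tdata_width_bits: int, clock_hz: int) -> int:
    """Peak throughput of one SI or MI interface: TDATA width x clock."""
    return tdata_width_bits * clock_hz


def datapath_throughput_bps(interfaces: Iterable[Tuple[int, int]]) -> int:
    """Overall throughput is limited by the slowest interface traversed.
    Each item is (tdata_width_bits, clock_hz)."""
    return min(interface_throughput_bps(w, f) for w, f in interfaces)


path = [
    (64, 200_000_000),   # SI of the first core: 12.8 Gb/s
    (128, 100_000_000),  # intermediate interface: 12.8 Gb/s
    (32, 250_000_000),   # final MI: 8 Gb/s
]
print(datapath_throughput_bps(path))  # 8000000000
```

Here the final 32-bit interface at 250 MHz is the bottleneck, so the whole datapath is limited to 8 Gb/s even though the earlier interfaces could carry 12.8 Gb/s.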