Performance

Simplify Packet Processing Design with P4 and Vivado Tools (WP555)

Document ID: WP555
Release Date: 2024-01-24
Revision: 1.0 English

The pipelined nature of P4 designs allows VNP4 to process one packet every clock cycle. The only exception is HBM/DDR binary CAM (BCAM) table look-ups, where the DRAM bandwidth might become the limiting factor. All elements of the design can scale in complexity without reducing performance, including a complex header parsing tree, many different table look-ups and actions, and many packet editing operations. VNP4 does not set limits on these complexities. For example:

  • There is no fixed limit to the number of parsing states or header extracts
  • There is no fixed limit to the number of headers that can be modified, removed, or inserted
  • There is no fixed limit to the number of tables in the match-action block
  • There is no fixed limit on the size of the user metadata

Each of these elements can grow large without impacting performance. Ultimately, a design reaches a natural limit set by device resource utilization (for example, when all block RAMs and UltraRAMs are exhausted). Performance is also not affected by how deep into a packet the parsing goes.

The packet bus width and the clock frequency can be chosen to achieve the desired performance. The packet rate can also be configured to allow for further optimizations. Some common examples are shown in the following table.

Table 1. Examples of Parameter Settings for Different Throughputs

Throughput (Gb/s)   Packet Rate (Mp/s)   Data Bus Width (Bytes)   Clock Frequency (MHz)
200                 300                  128                      336
100                 150                  64                       300
50                  75                   32                       300
10                  15                   4                        312.5

Note: Higher clock frequencies and packet rates can also be achieved; a trade-off is then needed between function complexity and timing closure.
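
The relationship between the parameters in Table 1 can be checked with a little arithmetic. The following Python sketch is illustrative only and is not part of any VNP4 tooling; the function and constant names are hypothetical. It computes the raw bus bandwidth (bus width times clock frequency) for each row and confirms that the listed packet rate does not exceed one packet per clock cycle. It also assumes, which the source does not state, that the packet rates correspond to minimum-size Ethernet frames (64 bytes plus 20 bytes of preamble and inter-packet gap on the wire).

```python
# Illustrative sanity check of the Table 1 settings.
# All names here are hypothetical helpers, not a VNP4 API.

# Assumption (not stated in the source): worst-case packet rate is for
# minimum-size Ethernet frames, 64 bytes plus 20 bytes of preamble and
# inter-packet gap on the wire.
WIRE_BYTES_MIN_FRAME = 64 + 20

# Rows of Table 1:
# (throughput in Gb/s, packet rate in Mp/s, bus width in bytes, clock in MHz)
TABLE_1 = [
    (200, 300, 128, 336.0),
    (100, 150, 64, 300.0),
    (50, 75, 32, 300.0),
    (10, 15, 4, 312.5),
]


def raw_bus_gbps(width_bytes, clock_mhz):
    """Raw bus bandwidth in Gb/s: bytes per cycle * 8 bits * clock rate."""
    return width_bytes * 8 * clock_mhz / 1000.0


def min_frame_mpps(throughput_gbps):
    """Packet rate in Mp/s for back-to-back minimum-size frames (assumed)."""
    return throughput_gbps * 1000.0 / (WIRE_BYTES_MIN_FRAME * 8)


def check_row(throughput_gbps, rate_mpps, width_bytes, clock_mhz):
    """Return derived figures and consistency checks for one table row."""
    bus = raw_bus_gbps(width_bytes, clock_mhz)
    return {
        "raw_bus_gbps": bus,
        # The bus must carry at least the target throughput.
        "bus_covers_throughput": bus >= throughput_gbps,
        # One packet per clock cycle bounds the packet rate by the clock.
        "rate_within_one_per_cycle": rate_mpps <= clock_mhz,
        "min_frame_mpps": min_frame_mpps(throughput_gbps),
    }


if __name__ == "__main__":
    for row in TABLE_1:
        print(row, check_row(*row))
```

Under these assumptions, every row has a raw bus bandwidth at or above its target throughput, and a packet rate at or below its clock frequency. A configured packet rate lower than the clock frequency is consistent with the text's remark that the packet rate can be set to allow further optimizations.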