Because P4 designs are pipelined, VNP4 can process one packet every clock cycle. The only exception is HBM/DDR binary CAM (BCAM) table look-ups, where the DRAM bandwidth might become the limiting factor. All elements of the design can scale in complexity without reducing performance, including a complex header parsing tree, many different table look-ups and actions, and many packet editing operations. VNP4 imposes no fixed limits on this complexity. For example:
- There is no fixed limit to the number of parsing states or header extracts
- There is no fixed limit to the number of headers that can be modified, removed, or inserted
- There is no fixed limit to the number of tables in the match-action block
- There is no fixed limit on the size of the user metadata
Each of these elements can grow large without affecting performance. The practical limit is device resource utilization (for example, when all block RAMs and UltraRAMs are exhausted). Performance is also independent of how deep into a packet the parser goes.
The packet bus width and the clock frequency can be chosen to achieve the desired throughput. The packet rate can also be configured to allow for further optimizations. Some common configurations are shown in the following table.
Throughput (Gb/s) | Packet Rate (Mp/s) | Data Bus Width (Bytes) | Clock Frequency (MHz) |
---|---|---|---|
200 | 300 | 128 | 336 |
100 | 150 | 64 | 300 |
50 | 75 | 32 | 300 |
10 | 15 | 4 | 312.5 |
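
The relationship between the columns in the table can be checked with some simple arithmetic. The following Python sketch is illustrative only and is not part of VNP4: the function names are hypothetical, the rows are copied from the table, and the comparison against minimum-size Ethernet frames (64-byte frame plus 20 bytes of preamble and inter-frame gap) is an assumption about how the packet rates might have been chosen, not something the table states.

```python
# Rows copied from the table:
# (throughput Gb/s, packet rate Mp/s, data bus width bytes, clock frequency MHz)
ROWS = [
    (200, 300, 128, 336.0),
    (100, 150, 64, 300.0),
    (50, 75, 32, 300.0),
    (10, 15, 4, 312.5),
]


def raw_bus_gbps(width_bytes, clock_mhz):
    """Peak bus bandwidth in Gb/s if every byte lane carries data on every cycle."""
    return width_bytes * 8 * clock_mhz / 1000.0


def packets_per_cycle(rate_mpps, clock_mhz):
    """Average number of packets the pipeline must accept per clock cycle."""
    return rate_mpps / clock_mhz


def min_frame_rate_mpps(throughput_gbps, wire_bytes=84):
    """Packet rate in Mp/s for back-to-back minimum-size frames.

    Assumption: 64-byte Ethernet frame + 20 bytes of preamble and
    inter-frame gap = 84 bytes on the wire per packet.
    """
    return throughput_gbps * 1000.0 / (wire_bytes * 8)


def summarize(rows=ROWS):
    """Return one dictionary of derived figures per table row."""
    result = []
    for gbps, mpps, width, mhz in rows:
        result.append({
            "throughput_gbps": gbps,
            "raw_bus_gbps": raw_bus_gbps(width, mhz),
            "packets_per_cycle": packets_per_cycle(mpps, mhz),
            "min_frame_rate_mpps": min_frame_rate_mpps(gbps),
        })
    return result


if __name__ == "__main__":
    for row in summarize():
        print(row)
```

For every row, the raw bus bandwidth is at least the listed throughput and the required packet rate stays at or below one packet per clock cycle. For the 200 Gb/s row, for example, the bus can carry about 344 Gb/s and the pipeline must accept roughly 0.89 packets per cycle.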