Simplify Packet Processing Design with P4 and Vivado Tools (WP555)


A major advantage of designing with VNP4 is the savings in development time and effort. These savings apply to the generation of the RTL, and might be even more significant during RTL verification. To highlight where the savings exist, the following figure compares the phases of a typical RTL development flow against the approach using VNP4.

Figure 1. Conceptual Breakdown of Project Effort

The definition phase includes defining the scope of a project and capturing the important details in a requirements specification document. For packet processing, the requirements can be specified more efficiently and effectively in P4 code than in a requirements specification document. P4 code is concise and less ambiguous, which helps to avoid misinterpretation later in the project and consequently saves effort and time in those later stages. Many examples within the industry highlight the benefits of P4 as a specification language [4][5][6].

The definition phase also includes test planning, which involves decisions about the design of the test bench, the nature of the stimulus needed to test the design, and the nature of the checking mechanisms. VNP4 provides an example design that includes a SystemVerilog test bench with automated self-checking against the P4 behavioral model, allowing you to focus more on the stimulus side. Verification can also be run directly on the P4 behavioral model, which has much faster runtimes than RTL simulation. The higher level of abstraction of P4 also makes debug much easier: the model outputs a detailed log of each step through the P4 program as a packet is processed. RTL simulation is still recommended when integrating into a larger system design.
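The idea of self-checking against a model can be illustrated with a minimal Python sketch. This is not the VNP4 example test bench (which is SystemVerilog). The `Scoreboard` class and the `reference_model` function are hypothetical names introduced here, and the MAC-swap behavior is only a stand-in for whatever the P4 program would do.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scoreboard:
    """Compares packets from a design under test against model predictions."""

    mismatches: List[str] = field(default_factory=list)
    checked: int = 0

    def check(self, expected: List[bytes], actual: List[bytes]) -> bool:
        """Compare two packet streams in order and record every difference."""
        if len(expected) != len(actual):
            self.mismatches.append(
                f"packet count: expected {len(expected)}, got {len(actual)}"
            )
        for index, (exp, act) in enumerate(zip(expected, actual)):
            self.checked += 1
            if exp != act:
                self.mismatches.append(
                    f"packet {index}: expected {exp.hex()}, got {act.hex()}"
                )
        return not self.mismatches


def reference_model(packet: bytes) -> bytes:
    """Hypothetical stand-in for a behavioral model: swap the MAC addresses."""
    dst, src, rest = packet[0:6], packet[6:12], packet[12:]
    return src + dst + rest


if __name__ == "__main__":
    stimulus = [bytes(range(14)), bytes(range(14, 28))]
    expected = [reference_model(p) for p in stimulus]
    dut_output = list(expected)  # stand-in for packets captured from simulation
    scoreboard = Scoreboard()
    print("PASS" if scoreboard.check(expected, dut_output) else scoreboard.mismatches)
```

Because the expected packets come from the model rather than from hand-written checks, the only thing that needs to be written for a new test is the stimulus list.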

Figure 2. VNP4 Example Design Test Bench

The design phase, which would otherwise involve specifying the detailed inner workings and interface connections of the RTL modules, can be simplified to a few top-level VNP4 parameters and clocking decisions. The use of standard interfaces (AXI4-Stream and AXI4-Lite) simplifies the connection to other parts of the system. The user metadata structure can also be customized to carry any side-band signals that the user application needs for interconnection.
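To make the side-band idea concrete, the following Python sketch packs a set of metadata fields into a single bit vector and unpacks it again. The field names, the widths, and the packing order (first field in the most-significant bits) are all assumptions chosen for illustration. They are not taken from the VNP4 documentation, so check the tool documentation for the actual mapping.

```python
from typing import Dict, List, Tuple

# Hypothetical user metadata layout: (field name, width in bits).
USER_METADATA_FIELDS: List[Tuple[str, int]] = [
    ("ingress_port", 4),
    ("drop", 1),
    ("flow_id", 11),
]


def metadata_width(fields: List[Tuple[str, int]] = USER_METADATA_FIELDS) -> int:
    """Total width in bits of the packed side-band vector."""
    return sum(width for _, width in fields)


def pack_metadata(
    values: Dict[str, int],
    fields: List[Tuple[str, int]] = USER_METADATA_FIELDS,
) -> int:
    """Concatenate fields into one integer, first field in the top bits (assumed order)."""
    word = 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_metadata(
    word: int,
    fields: List[Tuple[str, int]] = USER_METADATA_FIELDS,
) -> Dict[str, int]:
    """Split a packed integer back into named fields, starting from the low bits."""
    values: Dict[str, int] = {}
    for name, width in reversed(fields):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

With this layout, `pack_metadata({"ingress_port": 3, "drop": 1, "flow_id": 0x2A})` yields the 16-bit value `0x382A`, and `unpack_metadata` recovers the original fields.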

One of the biggest savings when using VNP4 is the reduction in RTL and driver coding in the implementation phase. If the functionality can be described in P4 without user externs, then no RTL coding is required. The engineering effort saved in RTL coding is magnified for more complex P4 designs. These savings are further multiplied in cases of changing requirements, scope creep, and new features.

Similarly, the verification phase is much shorter because little or no RTL test bench coding is involved. The P4 code can be verified using the behavioral model, where runtimes and iteration cycles are faster than with an RTL test bench. The model provides detailed log information showing, step by step, how each packet is processed by the P4 code, which makes debug easier than reviewing RTL waveforms.
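The value of a step-by-step log can be seen in a toy Python model of a parse, match, and deparse pipeline. The sketch is an illustration only. The `process_packet` function, the `FORWARDING_TABLE` contents, and the trace message format are hypothetical and do not reproduce the output of the actual P4 behavioral model.

```python
from typing import Dict, List, Optional

# Hypothetical forwarding table: destination MAC address -> egress port.
FORWARDING_TABLE: Dict[bytes, int] = {
    bytes.fromhex("020000000001"): 1,
    bytes.fromhex("020000000002"): 2,
}


def process_packet(
    packet: bytes,
    table: Dict[bytes, int] = FORWARDING_TABLE,
    trace: Optional[List[str]] = None,
) -> Optional[int]:
    """Return the egress port, or None if dropped, appending one trace line per step."""
    if trace is None:
        trace = []
    if len(packet) < 14:
        trace.append("parser: packet shorter than Ethernet header, reject")
        return None
    dst = packet[0:6]
    ether_type = int.from_bytes(packet[12:14], "big")
    trace.append(f"parser: extracted ethernet dst={dst.hex()} type=0x{ether_type:04x}")
    port = table.get(dst)
    if port is None:
        trace.append("table forward: miss, action drop")
        return None
    trace.append(f"table forward: hit, action set_port({port})")
    trace.append(f"deparser: emit {len(packet)} bytes to port {port}")
    return port


if __name__ == "__main__":
    steps: List[str] = []
    frame = (
        bytes.fromhex("020000000001")    # destination MAC
        + bytes.fromhex("0200000000aa")  # source MAC
        + bytes.fromhex("0800")          # EtherType
        + b"payload"
    )
    process_packet(frame, trace=steps)
    print("\n".join(steps))
```

A trace like this points directly at the stage where a packet took an unexpected path, for example a table miss, without having to correlate signals across a waveform.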

Both flows require a system integration stage in either RTL or IP integrator. However, timing closure can be a significantly lower risk with the P4 flow. This advantage becomes even more pronounced in the context of a hardware debug iteration cycle. The P4 code can be quickly simplified (for example, by reducing table sizes) to generate test bitstreams with a quicker, more reliable turnaround time, before later switching back to the full P4 functionality.