Abstract Shell Design Flow

Solution Efficiencies for Dynamic Function eXchange Using Abstract Shells (WP533)

Document ID
WP533
Release Date
2023-12-15
Revision
1.0.1 English

This section compares the abstract shell flow with the default DFX flow and examines the underlying commands used to produce and then use abstract shells. In the abstract shell flow, the parent configuration is implemented exactly as in the standard DFX flow: you implement the static design and then lock down those results. The two flows do not diverge until the static-only design result is saved. At that point, the write_abstract_shell command black boxes the target partition, trims away unneeded static logic, locks the remaining design, and validates the result using pr_verify. This command must be called once for each RP that is to become an abstract shell, because each RP has a unique footprint and connection to the static design. Finally, the remaining RMs are implemented as in the standard flow, but each child run now focuses on a single RP. Instead of building new full-design configurations to implement (or reuse) multiple RMs per design, each RM is implemented in its own run in its own abstract shell, which leads to greater efficiency.
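The steps described above can be sketched in non-project Tcl as follows. This is a minimal illustration, not a reference script: the checkpoint paths, the RP cell names (inst_rp1, inst_rp2), and the RM name (rm_b) are hypothetical, and the sketch assumes that write_abstract_shell is run on the routed parent checkpoint. Check the exact command options against the Tcl command reference for your Vivado release.

```tcl
# Hypothetical names: adjust the checkpoint paths and RP cell names to your design.
set rp_cells {inst_rp1 inst_rp2}

# Create one abstract shell per RP, because each RP has its own footprint
# and its own connection to the static design.
# Assumption: the routed parent checkpoint is the starting point. It is
# reopened for every RP so that each shell starts from the same design state.
foreach rp $rp_cells {
    open_checkpoint ./checkpoints/parent_routed.dcp
    write_abstract_shell -cell $rp ./checkpoints/abstract_shell_${rp}.dcp
    close_design
}

# Child run: implement a new RM for a single RP inside its abstract shell.
open_checkpoint ./checkpoints/abstract_shell_inst_rp1.dcp
read_checkpoint -cell inst_rp1 ./synth/rm_b_synth.dcp
opt_design
place_design
route_design

# Save the routed result and generate the partial bitstream for this RM.
write_checkpoint -force ./checkpoints/inst_rp1_rm_b_routed.dcp
write_bitstream -force -cell inst_rp1 ./bitstreams/inst_rp1_rm_b_partial.bit
close_design
```

Because each child run loads only the abstract shell for its own RP, runs for different RPs are independent of one another and could be launched in parallel.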

Note: In the Vivado Design Suite, the abstract shell flow is supported in non-project Tcl mode.
Figure 1. Design Flow for the Abstract Shell Solution

The parent design configuration can be used to create a full device programming image and, with the initial RMs included in the RPs, a partial bitstream for each RP. Abstract shell runs can generate additional partial bitstreams for the functions they implement. In single-user environments, within a single company or design group, additional full design configurations can be created for bitstream generation by linking routed static and dynamic checkpoints. Any combination of static and reconfigurable images can be rebuilt to create any full or partial bitstreams needed for the target system. In this regard, there is no limitation compared to a standard DFX flow.
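Linking routed static and dynamic checkpoints into a new full configuration might look like the sketch below. All file and cell names are hypothetical. It assumes that the parent run saved a locked, static-only checkpoint with black-box RPs, and that each abstract shell run saved its routed RM as a cell-level checkpoint (for example, with write_checkpoint -cell). Treat it as an outline of the idea rather than a verified script.

```tcl
# Hypothetical checkpoint and cell names throughout.
# static_routed.dcp: locked static design with black-box RPs from the parent run.
open_checkpoint ./checkpoints/static_routed.dcp

# Fill each RP with a routed RM result.
# Assumption: these are cell-level checkpoints written in the abstract shell runs.
read_checkpoint -cell inst_rp1 ./checkpoints/inst_rp1_rm_b_cell_routed.dcp
read_checkpoint -cell inst_rp2 ./checkpoints/inst_rp2_rm_c_cell_routed.dcp

# Save the assembled configuration and generate its bitstreams.
write_checkpoint -force ./checkpoints/config_rm_b_rm_c_routed.dcp
write_bitstream -force ./bitstreams/config_rm_b_rm_c.bit
close_design
```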

Abstract shell compile times are faster in nearly every scenario. How much faster depends on the structure of the design. Designs with very large dynamic regions and minimal static logic produce modest gains, because very little static logic is removed to create the shell. For example, AMD Alveo™ platform designs, which contain very little static logic, can be compiled more than twice as fast using the abstract shell flow. This can add up to big savings when you consider that the static platform is rarely revisited; the primary usage is to build new RMs. For designs with smaller dynamic regions and larger static regions, the gains are much greater: across a varied design suite, compiles can be 5, 10, or more times faster than with the standard DFX flow. Although creating the abstract shell takes time, it is a step that is not done often. The following figure shows the compile-time savings (gray line) using the abstract shell across a variety of AMD Virtex™ UltraScale+™ device designs.

Figure 2. Compile-time Savings

The following figure is a closer look at a specific sample design. This design has a dynamic region that covers about a quarter of the device, with a static region that is not much more than a single super logic region (SLR) in the three-SLR Virtex UltraScale+ FPGA (VU9P). After the static design is implemented, new RMs can take 1.5 hours to compile, partly due to the 266 MB static design checkpoint. Using the abstract shell flow, the design checkpoint of the shell shrinks to only 10 MB, memory usage is cut by a third, and compile time is cut by two thirds.

Figure 3. Example of a Full Shell (Left) Compared to an Abstract Shell (Right)