Kernel compute unit (CU) instance and DDR memory resource floorplanning are keys to meeting quality of results of your design in terms of frequency and resources. Floorplanning involves explicitly allocating CUs (a kernel instance) to SLRs and mapping CUs to DDR memory resources. When floorplanning, both CU resource usage and DDR memory bandwidth requirements need to be considered.
The largest Xilinx FPGAs are made up
of multiple stacked silicon dies. Each stack is referred to as a super logic region
(SLR) and has a fixed amount of resources and memory including DDR interfaces. Available
device SLR resources which can be used for custom logic can be found in the Vitis Software Platform Release Notes, or can be displayed using the platforminfo
utility described in platforminfo Utility.
You can use the actual kernel resource utilization values to help distribute CUs across SLRs to reduce congestion in any one SLR. The system estimate report lists the number of resources (LUTs, Flip-Flops, BRAMs, etc.) used by the kernels early in the design cycle. The report can be generated during hardware emulation and compilation through the command line or GUI and is described in System Estimate Report.
Use this information along with the available SLR resources to help assign CUs to SLRs such that no one SLR is over-utilized. The less congestion in an SLR, the better the tools can map the design to the FPGA resources and meet your performance target. For mapping memory resources and CUs, see Mapping Kernel Ports to Memory and Assigning Compute Units to SLRs.
After allocating your CUs to SLRs, map any CU master AXI port(s) to DDR memory resources. Xilinx recommends connecting to a DDR memory resource in the same SLR as the CU. This reduces competition for the limited SLR-crossing connection resources. In addition, connections between SLRs use super long line (SLL) routing resources, which incurs a greater delay than a standard intra-SLR routing.
It might be necessary to cross an SLR region to connect to a DDR resource
in a different SLR. However, if both the connectivity.sp
and the connectivity.slr
directives are explicitly defined, the tools automatically add additional crossing logic
to minimize the effect of the SLL delay, and facilitates better timing closure.