Alveo Data Center accelerator cards use stacked silicon devices consisting of multiple Super Logic Regions (SLRs) to provide device resources, including global memory. Floorplanning kernel compute unit (CU) instances and DDR memory resources is key to meeting the quality of results of your design in terms of frequency and resources. Floorplanning involves explicitly allocating CUs (kernel instances) to SLRs. For best performance, assign kernels or CUs to specific SLRs to improve placement and timing results. SLR assignment is especially important when assigning kernel ports to specific memory banks as described in Mapping Kernel Ports to Memory.
The SLRs available on an Alveo accelerator card can be determined with the platforminfo command. For instance, the U250 card reports the following information with regard to SLRs:
Valid SLRs: SLR0, SLR1, SLR2, SLR3
You can use the actual kernel resource usage values to help distribute CUs across SLRs to reduce congestion in any one SLR. The system estimate report lists the number of resources (LUTs, Flip-Flops, Block RAMs, etc.) used by the kernels early in the design cycle. Use this information along with the available SLR resources to help assign CUs to SLRs such that no one SLR is over-utilized.
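The balancing step described above can be sketched as a small greedy heuristic. This is only an illustration, not part of the Vitis tools: the function name assign_cus_to_slrs, the resource keys, and every number in the example are hypothetical, and real values would come from the system estimate report and the platform's per-SLR resource totals.

```python
def assign_cus_to_slrs(cu_usage, slr_capacity):
    """Greedily place each CU in the SLR that ends up least utilized.

    cu_usage:     {cu_name: {resource: amount}}   (hypothetical report values)
    slr_capacity: {slr_id:  {resource: available}} (hypothetical platform values)
    Returns {cu_name: slr_id}; raises ValueError if a CU fits nowhere.
    """
    used = {slr: dict.fromkeys(cap, 0) for slr, cap in slr_capacity.items()}

    def peak_after(slr, need):
        # Highest utilization ratio in this SLR if `need` were added to it.
        cap = slr_capacity[slr]
        return max((used[slr][r] + need.get(r, 0)) / cap[r] for r in cap)

    def size(cu):
        # Evaluated before anything is placed, so this is the CU's own footprint.
        return max(peak_after(slr, cu_usage[cu]) for slr in slr_capacity)

    assignment = {}
    # Place the largest CUs first; small ones fill the remaining gaps.
    for cu in sorted(cu_usage, key=size, reverse=True):
        need = cu_usage[cu]
        best = min(slr_capacity, key=lambda slr: peak_after(slr, need))
        if peak_after(best, need) > 1.0:
            raise ValueError(f"{cu} does not fit in any SLR")
        for resource, amount in need.items():
            used[best][resource] = used[best].get(resource, 0) + amount
        assignment[cu] = best
    return assignment


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    capacity = {slr: {"LUT": 1000, "FF": 2000, "BRAM": 100}
                for slr in ("SLR0", "SLR1")}
    usage = {
        "vadd_1": {"LUT": 300, "FF": 400, "BRAM": 10},
        "vadd_2": {"LUT": 300, "FF": 400, "BRAM": 10},
        "vadd_3": {"LUT": 100, "FF": 100, "BRAM": 5},
    }
    print(assign_cus_to_slrs(usage, capacity))
```

The heuristic only balances resource counts. It does not account for which memory banks each CU connects to, so the result is a starting point to review rather than a final floorplan.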
A CU can be assigned to an SLR using the connectivity.slr option in a config file. The syntax of the connectivity.slr option in the config file is as follows:
[connectivity]
#slr=<compute_unit_name>:<slr_ID>
slr=vadd_1:SLR2
slr=vadd_2:SLR3
Where:
- <compute_unit_name> is an instance name of the CU as determined by the connectivity.nk option, described in Creating Multiple Instances of a Kernel, or is simply <kernel_name>_1 if multiple CUs are not specified.
- <slr_ID> is the SLR number to which the CU is assigned, in the form SLR0, SLR1,...
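When the CU-to-SLR mapping is produced by a script, the config section in the syntax above can be generated rather than typed by hand. The following is a minimal sketch; render_slr_section is a hypothetical helper, not a Vitis utility, and the check on the SLR ID only enforces the SLR0, SLR1,... form described above, not whether that SLR exists on the target card.

```python
import re

# SLR IDs take the form SLR0, SLR1, ... as described in the text.
_SLR_ID = re.compile(r"SLR\d+")


def render_slr_section(assignment):
    """Render {cu_name: slr_id} as a [connectivity] section of slr= lines."""
    lines = ["[connectivity]"]
    for cu_name, slr_id in sorted(assignment.items()):
        if not _SLR_ID.fullmatch(slr_id):
            raise ValueError(f"invalid SLR ID for {cu_name}: {slr_id!r}")
        lines.append(f"slr={cu_name}:{slr_id}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Same CU names and SLRs as the config example above.
    print(render_slr_section({"vadd_1": "SLR2", "vadd_2": "SLR3"}), end="")
```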
AMD recommends assigning a kernel to a DDR memory resource in the same SLR in which the kernel is placed. This reduces competition for the limited SLR-crossing connection resources and reduces the use of super long line (SLL) routing resources, which incur a greater delay than standard routing. It might be necessary to connect a kernel to a DDR resource in a different SLR. In that case, if both the connectivity.sp and the connectivity.slr directives are explicitly defined, the tool automatically adds additional crossing logic to minimize the effect of the SLL delay and facilitate better timing closure.
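To see which port mappings would cross SLRs before running a build, the sp and slr entries of a config file can be compared against a bank-to-SLR table. The sketch below is hypothetical throughout: find_slr_crossings is not a Vitis tool, the argument names in1 and out are invented, and the bank-to-SLR table is made up for illustration and must be taken from the actual platform. It assumes sp entries of the form sp=<compute_unit_name>.<argument>:<bank_name>. Lines are parsed by hand because the slr and sp keys repeat within one section.

```python
def find_slr_crossings(config_text, bank_to_slr):
    """Return (port, bank, cu_slr, bank_slr) for each sp entry whose bank
    sits in a different SLR than the CU's slr assignment."""
    cu_slr = {}
    ports = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if "=" not in line:
            continue                          # section headers, blank lines
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "slr":
            cu, slr = value.split(":", 1)
            cu_slr[cu] = slr
        elif key == "sp":
            port, bank = value.split(":", 1)  # split at the first colon only
            ports.append((port.split(".", 1)[0], port, bank))

    crossings = []
    for cu, port, bank in ports:
        placed, bank_slr = cu_slr.get(cu), bank_to_slr.get(bank)
        # Entries with no slr assignment or an unknown bank are skipped.
        if placed and bank_slr and placed != bank_slr:
            crossings.append((port, bank, placed, bank_slr))
    return crossings


if __name__ == "__main__":
    cfg = """
    [connectivity]
    slr=vadd_1:SLR2
    sp=vadd_1.in1:DDR[2]
    sp=vadd_1.out:DDR[0]
    """
    # Hypothetical bank locations; check the real platform before relying on them.
    banks = {"DDR[0]": "SLR0", "DDR[2]": "SLR2"}
    for port, bank, placed, bank_slr in find_slr_crossings(cfg, banks):
        print(f"{port} -> {bank}: CU in {placed}, bank in {bank_slr}")
```

A reported crossing is not necessarily an error, since the tool adds crossing logic when both directives are defined. The list simply shows where SLL resources will be consumed.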
--profile.trace_memory <memory>:<SLR>