Currently, Xilinx devices on Data Center accelerator cards use stacked silicon consisting of several Super Logic Regions (SLRs) to provide device resources, including global memory. When you assign kernel ports to global memory banks, as described in Mapping Kernel Ports to Memory, performance is best when the CU instance is placed in the same SLR as the global memory it connects to. In that case, manually assign the kernel instance, or CU, to that SLR. More generally, if your platform or device has multiple SLRs, assign CUs to specific SLRs to improve placement and timing results.
A CU can be assigned to an SLR using the connectivity.slr option in a config file. The syntax of the connectivity.slr option in the config file is as follows:
[connectivity]
#slr=<compute_unit_name>:<slr_ID>
slr=vadd_1:SLR2
slr=vadd_2:SLR3
Where:
- <compute_unit_name> is the instance name of the CU as determined by the connectivity.nk option, described in Creating Multiple Instances of a Kernel, or is simply <kernel_name>_1 if multiple CUs are not specified.
- <slr_ID> is the SLR number to which the CU is assigned, in the form SLR0, SLR1,...
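To make the relationship between CU names and SLR assignments concrete, the following is a minimal sketch that writes a config file from the shell. It assumes a hypothetical kernel named vadd with two CUs named through connectivity.nk; the file name config_slr.cfg and the SLR IDs are illustrative only, and the SLRs that actually exist depend on your platform.

```shell
# Write a config file in which connectivity.nk names two CUs of a
# hypothetical kernel "vadd", and connectivity.slr places each one.
cat > config_slr.cfg <<'EOF'
[connectivity]
# nk=<kernel_name>:<number_of_CUs>:<cu_name_1>.<cu_name_2>
nk=vadd:2:vadd_1.vadd_2
# One slr line per CU; the SLR IDs below are illustrative only.
slr=vadd_1:SLR2
slr=vadd_2:SLR3
EOF

# Show the resulting SLR assignments.
grep '^slr=' config_slr.cfg
```

The names on the left of each slr line must match the CU names given in the nk line. If the nk line were omitted, the single CU would be referred to as vadd_1.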
The assignment of a CU to an SLR must be specified for each CU separately, and is recommended when the platform contains multiple SLRs. If an assigned CU is connected to global memory located in another SLR, the tool automatically inserts SLR crossing registers to help with timing closure. In the absence of an SLR assignment, the v++ linker is free to assign the CU to any SLR.
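Because each CU needs its own slr line, it can be useful to check a config file for CUs that were left for the linker to place. The following is a rough sketch of such a check. The unplaced_cus function and the config_check.cfg file are hypothetical helpers and not part of v++. The sketch assumes that CU names are listed explicitly in the nk line, so it does not handle default names such as <kernel_name>_1.

```shell
# Hypothetical helper (not part of v++): print every CU that is named in an
# nk= line of the given config file but has no matching slr= line.
unplaced_cus() {
  cfg=$1
  # CU names are the third field of nk=<kernel>:<count>:<cu1>.<cu2>...
  cus=$(sed -n 's/^nk=[^:]*:[^:]*:\(.*\)$/\1/p' "$cfg" | tr '.' '\n')
  # CUs that already appear in an slr=<cu>:<slr_ID> line
  placed=$(sed -n 's/^slr=\([^:]*\):.*$/\1/p' "$cfg")
  for cu in $cus; do
    printf '%s\n' "$placed" | grep -qx "$cu" || echo "$cu"
  done
}

# Sample input: vadd_2 is deliberately left without an SLR assignment.
cat > config_check.cfg <<'EOF'
[connectivity]
nk=vadd:2:vadd_1.vadd_2
slr=vadd_1:SLR2
EOF

# Lists the CUs that the linker would be free to place in any SLR.
unplaced_cus config_check.cfg
```

For the sample file, the only CU reported is vadd_2, since vadd_1 already has an slr line.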
After editing the config file to include the SLR assignments, use it during the v++ linking process by specifying the config file with the --config option:
v++ -l --config config_slr.cfg ...