The guidance macros that VSC supports allow the function arguments (both PE and compute) in the accelerator class to use various types of hardware interfaces. The following list defines the guidance macros:
- SYS_CLOCK_ID(<PE-name>, <clock-ID>);
  - <PE-name> specifies a processing element function instantiated in the body of the compute() function.
  - <clock-ID> is an integer value referring to any clock that is supported by the platform and is available for connection in the user-logic partition. When this macro is specified, the kernel implementation uses the corresponding clock net for the kernel. The available clock IDs for a platform can be found using the platforminfo command.
  Tip: When two PEs are connected through AXI4-Stream connections, their clock-IDs can differ if those PEs are marked free-running. In this case, VSC automatically inserts a clock-domain crossing (CDC) connection for data transfer between the PEs. When a free-running PE is connected to a PE that is not free-running, both PEs must have the same clock-ID.
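The clock rule in the tip above can be sketched as a small check. This is only an illustration: `PeInfo`, `StreamLink`, and `classify_stream_link` are hypothetical names, not part of VSC, and treating every other clock mismatch as invalid is this sketch's reading of the tip.

```cpp
#include <string>

// Hypothetical model of a PE, for illustration only (not a VSC data structure).
struct PeInfo {
    std::string name;
    int clock_id;       // value passed to SYS_CLOCK_ID for this PE
    bool free_running;  // true if the PE is marked FREE_RUNNING
};

enum class StreamLink { SameClock, NeedsCdc, Invalid };

// Classifies a stream connection between two PEs according to the tip above.
inline StreamLink classify_stream_link(const PeInfo& a, const PeInfo& b) {
    if (a.clock_id == b.clock_id) {
        return StreamLink::SameClock;
    }
    // Different clocks are accepted only when both PEs are free-running;
    // in that case a CDC connection is expected between them.
    if (a.free_running && b.free_running) {
        return StreamLink::NeedsCdc;
    }
    return StreamLink::Invalid;
}
```

For instance, two free-running PEs on clock IDs 0 and 1 would classify as `NeedsCdc`, while a free-running PE on clock 0 feeding a regular PE on clock 1 would classify as `Invalid`.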
- SYS_PORT(<port>, <global_memory>);
  Specifies the platform interface to use for a given argument of the compute() function. The global memory is typically a memory bank that is used for data transfer to and from the FPGA.
  - <port> refers to the name of a specific compute() argument.
  - <global_memory> can be specified in one of the following forms:
    - <bank-ID>: A single bank ID that applies to all CU instances, for example DDR, DDR[1], or HBM[5]. The bank names for a platform can be found by using the platforminfo command.
      Important: Observe the following limitations:
      - Host memory (HOST[0]) is only supported for the X3 hybrid platforms (xilinx_x3522p*)
      - Specifying memory bank ranges (for example HBM[0:3]) is not supported
      - Specifying PLRAM is not supported
    - (<CU1-bank-ID>:<CU2-bank-ID>:...:<NCU-bank-ID>): Within parentheses, a list of bank-IDs, one for each CU, separated by colons. The bank-IDs are specified for the CUs in numerical order, and the list does not include the CU names, for example (HBM[0]:HBM[4]:HBM[8]:HBM[12]). The number of entries must match the number of CUs specified in the class (/*NCU=*/4).
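To make the two <global_memory> forms concrete, the following sketch expands a specification string into one bank ID per CU. It is a hypothetical helper written for this explanation (`expand_banks` is not a VSC function), and it models only the rules stated above: a single bank ID applies to every CU, a parenthesized list must have exactly one entry per CU, and bank ranges are rejected.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns one bank ID per CU for a <global_memory> string.
inline std::vector<std::string> expand_banks(const std::string& spec, std::size_t ncu) {
    if (spec.empty()) throw std::invalid_argument("empty memory specification");

    std::vector<std::string> banks;
    if (spec.front() != '(') {
        // Single bank ID: applies to all CU instances.
        banks.assign(ncu, spec);
    } else {
        if (spec.back() != ')') throw std::invalid_argument("missing ')'");
        const std::string body = spec.substr(1, spec.size() - 2);
        std::string current;
        int depth = 0;  // bracket depth, so a ':' inside [...] is not a separator
        for (char c : body) {
            if (c == '[') ++depth;
            if (c == ']') --depth;
            if (c == ':' && depth == 0) {
                banks.push_back(current);
                current.clear();
            } else {
                current += c;
            }
        }
        banks.push_back(current);
        if (banks.size() != ncu) {
            throw std::invalid_argument("number of entries must match NCU");
        }
    }
    // Bank ranges such as HBM[0:3] are listed as unsupported.
    for (const std::string& b : banks) {
        if (b.find(':') != std::string::npos) {
            throw std::invalid_argument("bank ranges are not supported");
        }
    }
    return banks;
}
```

With this sketch, `expand_banks("(HBM[0]:HBM[4]:HBM[8]:HBM[12])", 4)` yields four entries in CU order, while the same list with a CU count of 3 throws.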
- SYS_PORT_PFM(<substr>, <port>, <global_memory>);
  Configures accelerator port connections for specific platforms while keeping all definitions in a single class header. For example, in the code given below, port A is connected to HBM[0] on a u50 platform and to DDR[0] on all other platforms.
    SYS_PORT(A, DDR[0]);
    SYS_PORT_PFM(u50, A, HBM[0]);
  - <substr> refers to a substring of the platform name. For example, using u50 applies the SYS_PORT_PFM connections only when the platform name contains the specified string.
  - The <port> and <global_memory> arguments work as described above for the SYS_PORT macro.
  Important: When multiple SYS_PORT and SYS_PORT_PFM macros are provided for the same <port>, VSC applies the last suitable SYS_PORT or SYS_PORT_PFM guidance macro that is read.
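The "last suitable macro wins" rule can be modeled in a few lines. The sketch below is an assumption-laden illustration, not VSC's implementation: `PortRule` and `resolve_port` are hypothetical names, and a SYS_PORT macro is modeled as a rule with an empty platform substring.

```cpp
#include <string>
#include <vector>

// Hypothetical record of one guidance macro, stored in the order it is read.
struct PortRule {
    std::string platform_substr;  // empty for SYS_PORT, e.g. "u50" for SYS_PORT_PFM
    std::string port;
    std::string memory;
};

// Returns the memory selected for `port` on `platform`, or an empty string
// if no rule is suitable. Later suitable rules override earlier ones.
inline std::string resolve_port(const std::vector<PortRule>& rules,
                                const std::string& platform,
                                const std::string& port) {
    std::string chosen;
    for (const PortRule& r : rules) {
        const bool platform_ok =
            r.platform_substr.empty() ||
            platform.find(r.platform_substr) != std::string::npos;
        if (r.port == port && platform_ok) {
            chosen = r.memory;
        }
    }
    return chosen;
}
```

Under this model, order matters: if the SYS_PORT_PFM rule were listed before the SYS_PORT rule, the SYS_PORT rule would be the last suitable one on every platform, including u50.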
- ACCESS_PATTERN(<port>, <pattern>);
  Enables VSC to infer a data mover between the hardware accelerator interface and the global memory in the device.
  - <port> refers to the name of a specific compute() argument.
  - <pattern> defines one of two memory access patterns:
    - SEQUENTIAL: Data transfers occur through AXI4-Stream connections to the acceleration interface. The CU (or kernel) code must strictly follow a sequential access pattern on the corresponding argument; otherwise, the hardware behaves incorrectly. For example, pointer indices should be incremented sequentially, as in the coding styles pointer[i++] or *pointer++.
    - RANDOM: Data is transferred into an on-chip memory that acts as a cache for the accelerator. The CU code therefore does not need to follow a sequential access pattern.
      Important: On-chip memory resources are limited (for example, typically 32 Kbits per BRAM, which could be accessed as 1024 words of 32 bits). If a large payload size per compute job uses too many on-chip RAMs, it can lead to timing closure issues in the Vivado tools. It might be better to use a ZERO_COPY guidance macro, which connects the accelerator directly to global memory, as described below.
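The difference between the two patterns shows up in how the kernel code indexes its arguments. The plain C++ functions below are illustrative only (the names `scale_sequential` and `reverse_copy` are made up for this sketch, and no VSC headers are involved): the first touches every element once in increasing order, which is what SEQUENTIAL requires, while the second does not.

```cpp
#include <cstddef>

// Sequential-friendly: each element of `in` and `out` is accessed exactly
// once, in increasing order, using the *pointer++ coding style.
inline void scale_sequential(const int* in, int* out, std::size_t n, int factor) {
    for (std::size_t i = 0; i < n; ++i) {
        *out++ = *in++ * factor;
    }
}

// Not sequential on `in`: elements are read in reverse order, so `in`
// would not satisfy the SEQUENTIAL requirement described above.
inline void reverse_copy(const int* in, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[n - 1 - i];
    }
}
```

An argument accessed like `in` in `reverse_copy` is a candidate for RANDOM or ZERO_COPY rather than SEQUENTIAL.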
- DATA_COPY(<port>, <port>[<Num>]);
  Infers a data-mover IP between the global memory and the accelerator interface. At run time, on each compute() call, this data-mover IP copies the data for the specific compute() argument from (or to) a source memory specified by a SYS_PORT or SYS_PORT_PFM guidance macro, or from (or to) local on-chip memory.
  - <port> refers to the name of a specific compute() argument.
  - <port>[<Num>] specifies the number of array elements that the (array or pointer) argument refers to. Num can be an expression of C constants and/or scalar arguments of compute(). This allows the accelerator to have a dynamic payload size determined at run time, and enables automatic bursting on AXI4 connections, data width conversion, and padding for the user-defined argument data types.
  Important: When using DATA_COPY along with the RANDOM access pattern, the corresponding argument in the prototype of the compute API must be declared as an array with a fixed size. For example: compute(int A[10], ...).
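Because a fixed-size array used with DATA_COPY and RANDOM is buffered on chip, it can help to estimate how many block RAMs a payload occupies. The sketch below is a back-of-the-envelope estimate only: it assumes the figure of 32 Kbits per BRAM quoted for the RANDOM pattern, ignores how the tools actually map widths and depths, and uses the hypothetical names `kBitsPerBram` and `min_brams`.

```cpp
#include <cstddef>

// Assumption: one BRAM holds about 32 Kbits (1024 words of 32 bits).
constexpr std::size_t kBitsPerBram = 32 * 1024;

// Rough lower bound on the number of BRAMs needed to buffer `elements`
// items of `bits_per_element` bits each (rounded up to whole BRAMs).
constexpr std::size_t min_brams(std::size_t elements, std::size_t bits_per_element) {
    return (elements * bits_per_element + kBitsPerBram - 1) / kBitsPerBram;
}
```

For an argument declared as `int A[10]` with 32-bit `int`, this estimate is a single BRAM; a payload of 1,048,576 such elements would need at least 1024, which is the kind of size where ZERO_COPY is worth considering.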
- ZERO_COPY(<port>);
  Directs VSC not to infer a data-mover IP. Instead, the accelerator uses an AXI4 interface connected directly to the specified global memory for the specified argument of the compute() function.
  - <port> refers to the name of a specific compute() argument.
- ASSIGN_SLR(<PE>, <SLR-IDS>);
  VSC requests that Vivado place the logic of the named PE into the specified SLR(s). However, this is only a request, and the final determination is made during placement.
  - <PE>: Specifies the name of the processing element.
    Tip: If the compute() function is specified, the SLR assignment is applied to all PEs inside the compute() function.
  - <SLR-IDS>: Specifies the SLRs in which to place the PE. This can be specified in one of the following forms:
    - <SLR-ID>: Applies the specified SLR-ID to all CU instances of this PE.
    - (<CU1-SLR-ID> : ... : <NCU-SLR-ID>): Within parentheses, a list of SLR-IDs separated by colons. These SLR-IDs are assigned to the CU instances. The number of entries must match the number of CUs specified in the class (/*NCU=*/4).
- FREE_RUNNING(<PE>);
  Marks the named PE function as a free-running (always executing) kernel in hardware. Refer to Accelerator System Composition for more information.