Guidance Macros - 2024.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2024-05-30
Version: 2024.1 English

The guidance macros that VSC supports allow the function arguments (both PE and compute()) in the accelerator class to use various types of hardware interfaces. The different types of guidance macros are defined as follows:

SYS_CLOCK_ID(<PE-name>, <clock-ID>);
  • <PE-name> specifies a processing element function instantiated in the body of the compute() function.
  • <clock-ID> is an integer value that refers to any clock supported by the platform and available for connection in the user-logic partition. When this macro is specified, the kernel implementation uses the corresponding clock net for the kernel. The available clock IDs for a platform can be found using the platforminfo command.
Tip: When two PEs are connected through AXI4-Stream connections, their clock IDs can differ if both PEs are marked free-running. In this case, VSC automatically inserts a clock-domain crossing (CDC) connection for data transfer between the PEs. When a free-running PE is connected to a PE that is not free-running, both must have the same clock ID.
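
The clock rule in the tip can be modeled in a few lines of plain C++. This is only an illustrative sketch, not VSC code: the PeClock struct, the StreamLink enum, and classify() are hypothetical names, and the case where neither PE is free-running but the clock IDs differ is reported separately because the tip does not describe it.

```cpp
// Hypothetical model of one PE's guidance settings (not a VSC type).
struct PeClock {
    int  clock_id;      // value passed to SYS_CLOCK_ID for this PE
    bool free_running;  // true if the PE is marked FREE_RUNNING
};

enum class StreamLink {
    SameClock,    // both PEs share a clock ID, no crossing is needed
    CdcInserted,  // both free-running, different clock IDs: VSC adds a CDC
    Invalid,      // exactly one PE is free-running and the clock IDs differ
    NotCovered    // neither PE is free-running and the clock IDs differ;
                  // the tip above does not describe this case
};

// Classify a stream connection between two PEs according to the tip.
StreamLink classify(const PeClock& a, const PeClock& b) {
    if (a.clock_id == b.clock_id) return StreamLink::SameClock;
    if (a.free_running && b.free_running) return StreamLink::CdcInserted;
    if (a.free_running != b.free_running) return StreamLink::Invalid;
    return StreamLink::NotCovered;
}
```
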
SYS_PORT(<port>, <global_memory>);

Specifies the platform interface to use for a given argument of the compute() function. The global memory is typically a memory bank that is used for data transfer to and from the FPGA.

  • <port> refers to the name of a specific compute() argument.
  • <global_memory> can be specified in one of the following forms.
    • <bank-ID>: A single bank ID that applies to all CU instances. For example DDR, DDR[1], or HBM[5]. The bank names for a platform can be found by using the platforminfo command.
      Important: Observe the following limitations:
      • Host memory (HOST[0]) is only supported for the X3 hybrid platforms (xilinx_x3522p*)
      • Specifying memory bank ranges (for example HBM[0:3]) is not supported
      • Specifying PLRAM is not supported
    • (<CU1-bank-ID>:<CU2-bank-ID>:...:<NCU-bank-ID>): Within parentheses, a colon-separated list of bank IDs, one per CU. The bank IDs are listed for the CUs in numerical order and do not include the CU name, for example (HBM[0]:HBM[4]:HBM[8]:HBM[12]). The number of entries must match the number of CUs specified in the class (/*NCU=*/4).
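
The entry-count requirement for the per-CU form can be checked with a small standalone helper. The sketch below is illustrative only and is not part of VSC: split_banks() and matches_cu_count() are hypothetical names, and the parsing assumes bank IDs that contain no colon of their own.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split "(A:B:C)" into {"A", "B", "C"}.
// A bare bank ID such as "DDR[1]" yields a single entry.
std::vector<std::string> split_banks(const std::string& spec) {
    std::string body = spec;
    if (body.size() >= 2 && body.front() == '(' && body.back() == ')') {
        body = body.substr(1, body.size() - 2);  // drop the parentheses
    }
    std::vector<std::string> banks;
    std::size_t start = 0;
    while (true) {
        const std::size_t colon = body.find(':', start);
        if (colon == std::string::npos) {
            banks.push_back(body.substr(start));
            break;
        }
        banks.push_back(body.substr(start, colon - start));
        start = colon + 1;
    }
    return banks;
}

// A single bank ID applies to every CU; a parenthesized list must
// provide exactly one entry per CU.
bool matches_cu_count(const std::string& spec, std::size_t ncu) {
    const bool is_list = !spec.empty() && spec.front() == '(';
    const std::size_t entries = split_banks(spec).size();
    return is_list ? entries == ncu : entries == 1;
}
```

With /*NCU=*/4, the list (HBM[0]:HBM[4]:HBM[8]:HBM[12]) passes this check, while a two-entry list does not.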

SYS_PORT_PFM(<substr>, <port>, <global_memory>);

This macro configures accelerator port connections for specific platforms while keeping all of the definitions in a single class header. For example, in the code below, port A is connected to HBM[0] for a u50 platform, and to DDR[0] for all other platforms.

SYS_PORT(A, DDR[0]);
SYS_PORT_PFM(u50, A, HBM[0]);
  • The <substr> refers to a sub-string of the platform name. For example, using u50 would employ the SYS_PORT_PFM connections only when the platform name contains the specified string.
  • The <port> and <global_memory> arguments work as described above for SYS_PORT macros.
Important: When multiple SYS_PORT and SYS_PORT_PFM macros are provided for the same <port>, VSC will apply the last suitable SYS_PORT or SYS_PORT_PFM guidance macro that is read.
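
The "last suitable macro wins" rule can be sketched as a simple lookup. This is a model of the behavior described above, not VSC's implementation: PortRule and resolve_port() are hypothetical names, and an empty platform_substr is used here to stand for a plain SYS_PORT.

```cpp
#include <string>
#include <vector>

// Hypothetical record of one guidance macro for a port (not a VSC type).
// An empty platform_substr stands for a plain SYS_PORT.
struct PortRule {
    std::string platform_substr;
    std::string port;
    std::string global_memory;
};

// Return the memory chosen for `port` on `platform`: the last rule, in
// reading order, that names the port and whose substring occurs in the
// platform name. Returns an empty string if no rule applies.
std::string resolve_port(const std::vector<PortRule>& rules,
                         const std::string& platform,
                         const std::string& port) {
    std::string chosen;
    for (const PortRule& rule : rules) {
        if (rule.port != port) continue;
        const bool suitable =
            rule.platform_substr.empty() ||
            platform.find(rule.platform_substr) != std::string::npos;
        if (suitable) chosen = rule.global_memory;  // later rules override
    }
    return chosen;
}
```

In this model, placing SYS_PORT(A, DDR[0]) before SYS_PORT_PFM(u50, A, HBM[0]) selects HBM[0] on a platform whose name contains u50 and DDR[0] elsewhere. Reversing the two lines would select DDR[0] everywhere, because the generic SYS_PORT would then be the last suitable macro read.
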
ACCESS_PATTERN(<port>, <pattern>);

Enables VSC to infer a data mover between the hardware accelerator interface and the global memory in the device.

  • <port> refers to the name of a specific compute() argument.
  • <pattern> defines one of two different memory access patterns:
    • SEQUENTIAL: data transfers occur through AXI4-Stream connections to the acceleration interface. The CU (or kernel) code must strictly follow a sequential access pattern on the corresponding argument, otherwise it will lead to incorrect hardware behavior. For example, pointer indices should be sequentially incremented as with the coding style pointer[i++] or *pointer++.
    • RANDOM: data is transferred into an on-chip memory that acts as a cache for the accelerator. Therefore, the CU code does not need to follow a sequential access pattern.
      Important: On-chip memory resources are limited (for example, typically 32 Kbits per BRAM which could be accessed as 1024 words of 32 bits). If a large payload size per compute job uses too many on-chip RAMs, it can lead to timing closure issues in the Vivado tools. It might be better to use a ZERO_COPY guidance macro connecting the accelerator directly to global memory as described below.
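
The difference between the two patterns shows up in how the PE code indexes its arguments. The two functions below are hypothetical PE bodies written as plain C++ for illustration; the names sum_sequential and reverse_copy are not taken from the Vitis documentation.

```cpp
#include <cstddef>

// Hypothetical PE body: accumulates n values read strictly in order.
// The pointer only ever advances by one element, matching the
// *pointer++ coding style that the SEQUENTIAL pattern requires.
int sum_sequential(const int* in, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += *in++;
    }
    return total;
}

// Hypothetical PE body: reverses n values. `in` is read from the last
// element backwards, which is not a sequential access, so this argument
// would need the RANDOM pattern. `out` is written in increasing order.
void reverse_copy(const int* in, int* out, std::size_t n) {
    std::size_t o = 0;
    for (std::size_t i = n; i > 0; --i) {
        out[o++] = in[i - 1];
    }
}
```
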
DATA_COPY(<port>, <port>[<Num>]);

Infers a data-mover IP between the global memory and the accelerator interface. At run time, for each compute() call, this data-mover IP copies the data for the specified compute() argument from (or to) a source memory specified by a SYS_PORT or SYS_PORT_PFM guidance macro, or from (or to) local on-chip memory.

  • <port> refers to the name of a specific compute() argument.
  • <port>[<Num>] specifies the number of array elements that the (array or pointer) argument refers to. Num can be an expression of C-constants and/or scalar arguments of compute(). This allows the accelerator to have a dynamic payload size determined at run time, and enables automatic bursting on AXI4 connections, data width conversion, and padding for the user-defined argument data-types.
Important: When using DATA_COPY together with the RANDOM access pattern, the corresponding argument in the prototype of the compute() API must be declared as an array with a fixed size. For example: compute(int A[10], ...).
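
Because a RANDOM argument is buffered on chip, the fixed array size in the compute() prototype determines how much on-chip RAM the buffer needs. The sketch below is a back-of-the-envelope estimate only, using the 32 Kbits per BRAM figure quoted in the ACCESS_PATTERN description; brams_for_payload() is a hypothetical helper, and the actual mapping chosen by the tools can differ.

```cpp
#include <cstddef>

// Capacity quoted in this document: 32 Kbits per BRAM.
constexpr std::size_t kBitsPerBram = 32 * 1024;

// Rough number of BRAMs needed to hold num_elements items of
// bits_per_element bits each, rounded up to whole blocks.
// Ignores port-width, packing, and any buffering the tools may add.
constexpr std::size_t brams_for_payload(std::size_t num_elements,
                                        std::size_t bits_per_element) {
    return (num_elements * bits_per_element + kBitsPerBram - 1) / kBitsPerBram;
}
```

Under this estimate, 1024 32-bit words fit in one BRAM, while an array of 1,048,576 32-bit words would need about 1024 BRAMs, which is the kind of payload where ZERO_COPY is worth considering.
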
ZERO_COPY(<port>);

Directs VSC not to infer a data-mover IP. Instead, the accelerator uses an AXI4 interface connected directly to the specified global memory for the specified argument of the compute() function.

  • <port> refers to the name of a specific compute() argument.
ASSIGN_SLR(<PE>, <SLR-IDS>);

VSC requests that Vivado place the logic of the named PE into the specified SLR(s). However, this is only a request, and the final determination is made during placement.

  • <PE>: Specifies the name of the processing element.
    Tip: If the compute() function is specified, the SLR assignment is applied to all PEs inside the compute() function.
  • <SLR-IDS>: Specifies the SLRs in which to place the PE. This can be specified in one of the following forms.
    • <SLR-ID>: Applies the specified SLR-ID to all CU instances of this PE.
    • (<CU1-SLR-ID> : ... : <NCU-SLR-ID>): Within parentheses, a colon-separated list of SLR IDs. These SLR IDs are assigned to the CU instances in order. The number of entries must match the number of CUs specified in the class (/*NCU=*/4).

FREE_RUNNING(<PE>);

Marks the named PE function as a free-running (always-executing) kernel in hardware. Refer to Accelerator System Composition for more information.
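
To show how several guidance macros might sit together in one accelerator class, here is a minimal sketch. It is not taken from the Vitis documentation, and several parts are assumptions. The stand-in VPP_ACC base class and the no-op macro definitions exist only so that the snippet compiles without the Vitis tools; in a real project they come from the VSC header and must not be redefined. The names scale_acc, scale, in, out, and n are hypothetical, as are the bank choices, the clock ID 0, and the SLR names SLR0 and SLR1, which should be replaced with values reported for the target platform.

```cpp
// --- Stand-ins so this sketch compiles without the Vitis tools -----------
// Assumption: in a real VSC project the base class and the guidance
// macros are provided by the VSC header, and this section is removed.
template <class Derived, int NCU>
struct VPP_ACC {};
#define SYS_CLOCK_ID(...)   static_assert(true, "stand-in")
#define SYS_PORT(...)       static_assert(true, "stand-in")
#define SYS_PORT_PFM(...)   static_assert(true, "stand-in")
#define ACCESS_PATTERN(...) static_assert(true, "stand-in")
#define DATA_COPY(...)      static_assert(true, "stand-in")
#define ZERO_COPY(...)      static_assert(true, "stand-in")
#define ASSIGN_SLR(...)     static_assert(true, "stand-in")
// -------------------------------------------------------------------------

// Hypothetical accelerator with two CUs and a single PE named scale.
class scale_acc : public VPP_ACC<scale_acc, /*NCU=*/2> {
    // Input: streamed through a data mover, n elements per compute() call.
    ACCESS_PATTERN(in, SEQUENTIAL);
    DATA_COPY(in, in[n]);
    SYS_PORT(in, DDR[0]);                    // default for all platforms
    SYS_PORT_PFM(u50, in, (HBM[0]:HBM[4]));  // one bank per CU on u50

    // Output: no data mover, the PE accesses global memory directly.
    ZERO_COPY(out);
    SYS_PORT(out, DDR[0]);
    SYS_PORT_PFM(u50, out, (HBM[1]:HBM[5]));

    SYS_CLOCK_ID(scale, 0);                  // placeholder clock ID
    ASSIGN_SLR(scale, (SLR0:SLR1));          // placeholder SLR names

public:
    // Defined inline here for brevity only.
    static void compute(const int* in, int* out, int n) { scale(in, out, n); }

    // PE body: reads `in` strictly in order, as SEQUENTIAL requires.
    static void scale(const int* in, int* out, int n) {
        for (int i = 0; i < n; ++i) {
            out[i] = 2 * in[i];
        }
    }
};
```

The generic SYS_PORT lines are placed before the SYS_PORT_PFM lines so that, under the last-suitable-macro rule, the platform-specific connection takes effect on matching platforms. FREE_RUNNING is omitted because the PE in this sketch is not a free-running kernel.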