Alveo accelerator cards contain HBM DRAM and DDR DRAM memory resources. In some accelerator cards, an additional memory resource available is internal FPGA PLRAM (UltraRAM and block RAM). Supporting platforms typically contain instances of PLRAM in each SLR. The size and type of each PLRAM can be configured on the target platform before kernels or Compute Units are linked into the system.
To do this, supply a Tcl script that configures the PLRAM on the v++ command line as follows:
v++ -l --advanced.param compiler.userPreSysLinkOverlayTcl=<path_to>/user_tcl_file.tcl
Within the Tcl script, each PLRAM instance is configured with the following command:
sdx_memory_subsystem::update_plram_specification <memory_subsystem_bdcell> <plram_resource> <plram_specification>
The <plram_specification> is a Tcl dictionary consisting of the following entries (entries below are the default values for each instance in the platform):
{
SIZE 128K # Up to 4M
AXI_DATA_WIDTH 512 # Up to 512
SLR_ASSIGNMENT SLR0 # SLR0 / SLR1 / SLR2
READ_LATENCY 1 # To optimise timing path
MEMORY_PRIMITIVE BRAM # BRAM or URAM
}
In the example below, PLRAM_MEM00 is changed to be 2 MB in size and composed of UltraRAM; PLRAM_MEM01 is changed to be 4 MB in size and composed of UltraRAM. PLRAM_MEM00 and PLRAM_MEM01 correspond to the --connectivity.sp memory resources PLRAM[0] and PLRAM[1].
# Setup PLRAM
sdx_memory_subsystem::update_plram_specification \
    [get_bd_cells /memory_subsystem] PLRAM_MEM00 \
    { SIZE 2M AXI_DATA_WIDTH 512 SLR_ASSIGNMENT SLR0 READ_LATENCY 10 MEMORY_PRIMITIVE URAM }
sdx_memory_subsystem::update_plram_specification \
    [get_bd_cells /memory_subsystem] PLRAM_MEM01 \
    { SIZE 4M AXI_DATA_WIDTH 512 SLR_ASSIGNMENT SLR0 READ_LATENCY 10 MEMORY_PRIMITIVE URAM }
validate_bd_design -force
save_bd_design
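When several PLRAM instances need to be configured, the update commands can be generated rather than typed by hand. The following is a minimal Python sketch, not part of the tool flow: the helper names parse_size and plram_command are hypothetical, the limits are taken from the comments in the default specification above, and the list of SLR names is assumed to match the target platform.

```python
# Sketch: build update_plram_specification command lines for a user Tcl script.
# parse_size and plram_command are hypothetical helpers, not a Xilinx API.

_UNITS = {"K": 1024, "M": 1024 * 1024}


def parse_size(text):
    """Convert a size string such as '128K' or '2M' to a byte count."""
    return int(text[:-1]) * _UNITS[text[-1].upper()]


def plram_command(resource, size="128K", axi_data_width=512, slr="SLR0",
                  read_latency=1, primitive="BRAM",
                  bdcell="[get_bd_cells /memory_subsystem]"):
    """Return one Tcl command line; defaults mirror the default specification."""
    # Limits below come from the comments in the default specification.
    if parse_size(size) > parse_size("4M"):
        raise ValueError("SIZE is limited to 4M")
    if axi_data_width > 512:
        raise ValueError("AXI_DATA_WIDTH is limited to 512")
    if slr not in ("SLR0", "SLR1", "SLR2"):  # assumed to match the platform
        raise ValueError("unknown SLR: " + slr)
    if primitive not in ("BRAM", "URAM"):
        raise ValueError("MEMORY_PRIMITIVE must be BRAM or URAM")
    spec = (f"SIZE {size} AXI_DATA_WIDTH {axi_data_width} "
            f"SLR_ASSIGNMENT {slr} READ_LATENCY {read_latency} "
            f"MEMORY_PRIMITIVE {primitive}")
    return (f"sdx_memory_subsystem::update_plram_specification "
            f"{bdcell} {resource} {{ {spec} }}")


# Reproduce the two commands from the example above.
for name, size in (("PLRAM_MEM00", "2M"), ("PLRAM_MEM01", "4M")):
    print(plram_command(name, size=size, read_latency=10, primitive="URAM"))
```

The printed lines can be written into the user Tcl file, followed by the validate_bd_design and save_bd_design commands shown above.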
The READ_LATENCY is an important attribute, because it sets the number of pipeline stages between memories cascaded in depth. This varies by design, and affects the timing QoR of the platform and the eventual kernel clock rate. In the example above for PLRAM_MEM01:
- 4 MB of memory are required in total.
- Each UltraRAM is 32 KB (64 bits wide). 4 MB / 32 KB → 128 UltraRAMs in total.
- Each PLRAM instance is 512 bits wide → 8 UltraRAMs are required in width.
- 128 total UltraRAMs with 8 UltraRAMs in width → 16 UltraRAMs in depth.
- A good rule of thumb is to pick a read latency of depth/2 + 2 → in this case, READ_LATENCY = 16/2 + 2 = 10.
This allows a pipeline on every second UltraRAM, resulting in the following:
- Good timing performance between UltraRAMs.
- Placement flexibility; not all UltraRAMs need to be placed in the same UltraRAM column for cascade.
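The READ_LATENCY arithmetic above can be reproduced for other sizes with a short calculation. The following Python sketch is illustrative only: the function name uram_read_latency is hypothetical, it assumes the size is a whole multiple of one row of UltraRAMs, and it rounds depth/2 down for odd depths, which the rule of thumb does not specify.

```python
# Sketch of the rule of thumb: READ_LATENCY = depth/2 + 2 for UltraRAM PLRAM.
# uram_read_latency is a hypothetical helper, not a Xilinx API.

URAM_BYTES = 32 * 1024   # each UltraRAM holds 32 KB
URAM_WIDTH_BITS = 64     # each UltraRAM is 64 bits wide


def uram_read_latency(size_bytes, axi_data_width=512):
    """Return (total UltraRAMs, UltraRAMs in width, depth, suggested latency)."""
    total = size_bytes // URAM_BYTES            # UltraRAMs needed in total
    width = axi_data_width // URAM_WIDTH_BITS   # UltraRAMs side by side
    depth = total // width                      # UltraRAMs cascaded in depth
    latency = depth // 2 + 2                    # rule of thumb from the text
    return total, width, depth, latency


# The 4 MB, 512-bit PLRAM_MEM01 case worked through above.
print(uram_read_latency(4 * 1024 * 1024))  # (128, 8, 16, 10)
```

The result is a starting point only; as noted above, the appropriate value varies by design and should be confirmed against timing results.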