Working with Functional Model of the HLS Kernel - 2024.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2024-07-03
Version: 2024.1 English

Using a functional model of an AMD Vitis™ HLS kernel during hardware emulation is an advanced use case: the kernel is compiled in functional mode, which generates an XO that places a SystemC wrapper around the C code. An IP, whether or not it was generated by the HLS compiler, can have multiple types of simulation models, such as TLM and RTL, as listed in its allowed_sim_models property. The IP must also indicate which of these models is currently in use, through its selected_sim_model property. The method described here lets you specify which simulation model is selected when the HLS compiler generates the output IP or XO.

Hardware emulation is mainly intended for debugging hardware kernels with a detailed, cycle-accurate view of kernel activity. The functional (TLM) model speeds up emulation by compiling the kernel of interest in functional mode rather than as RTL code, and it can serve as an early model when the RTL is not yet available. Compilation is faster because the kernel does not need full C-to-RTL synthesis, and execution is faster because the C code is simulated instead of the RTL. You can also mix C and RTL kernels in hardware emulation to debug RTL blocks faster.

The functional model feature supports modeling AXI4-Stream interfaces (axis) and AXI4 memory-mapped interfaces (m_axi), in addition to register reads and writes through AXI4-Lite (s_axilite) interfaces. However, with this approach the kernel is purely functional and, unlike a cycle-accurate model, carries no latency information.
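As a rough illustration of these interface types, the following kernel sketch declares three m_axi ports and an s_axilite control interface with Vitis HLS INTERFACE pragmas. The kernel name vadd, the bundle name gmem, and the argument list are hypothetical. An axis port, typically declared as an hls::stream from hls_stream.h, is left out so that the sketch also compiles with a standard C++ compiler, which ignores the HLS pragmas.

```cpp
// Hypothetical kernel: the name vadd, the bundle gmem, and the arguments are illustrative.
// A standard C++ compiler ignores the HLS pragmas, so this builds as ordinary C code,
// which is the kind of code the functional model executes in place of RTL.
extern "C" void vadd(const int *a, const int *b, int *c, int n) {
#pragma HLS INTERFACE m_axi port=a offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=b offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=c offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=n
#pragma HLS INTERFACE s_axilite port=return
    // Element-wise sum: reads from a and b, writes to c, all through m_axi ports.
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

In functional mode, the reads and writes through a, b, and c would appear as TLM traffic to the memory model, and n and the control register as s_axilite register accesses, but without any latency information.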

The user HLS function is wrapped in a SystemC module with TLM interfaces, and an IP is created from the generated code. This IP is packaged into an hw_emu-compatible XO that the IP integrator can use when v++ --link stitches the design together in hardware emulation flows. The wrapper IP can also communicate with other RTL and SystemC models. As a result, HLS C/C++ kernels compiled in functional mode produce TLM transactions during simulation, and you can observe the traffic between the memory models (for example, DDR memory) and the TLM kernels.

Tip: The functional model runs a C simulation of the kernel's C code. Unlike a cycle-accurate model, it is purely functional and carries no latency information; however, you can still observe boundary transactions on the TLM interfaces during hardware emulation.

XO Generation with Functional Model

Important: The v++ -c --mode hls command does not support the functional simulation model described below. To generate an IP or kernel that uses the TLM model for functional simulation, you must use the v++ -c -t=hw_emu command.

During the v++ -c -t=hw_emu compile step that creates the hardware emulation (hw_emu) XO files, you can enable a functional simulation model for the PL kernel, which generates the XO with a SystemC wrapper around the C code. To do so, provide the --advanced.param compiler.emulationMode=func option during compilation, as described in --advanced Options.

The default setting is compiler.emulationMode=rtl. When building the XO, you can either specify the default explicitly with --advanced.param compiler.emulationMode=rtl, so that you can toggle a specific XO between the RTL and TLM models by changing only the value, or omit the --advanced.param option to restore the default and add it back when building for functional simulation. In either case, switching from the RTL model to the TLM model, or back, requires recompiling the XO with the v++ -c -t=hw_emu command.

The generated functional simulation XO is linked with the v++ --link command in the same way as a regular XO.

Limitations of the Functional Model

  1. The functional mode is not supported on Windows OS.
  2. HLS limitations apply as is. For example, HLS does not support double pointers, so the functional model does not recognize them either.
  3. HLS designs that process multiple data iterations from the host with a single kernel ap_start (for example, ap_ctrl_chain) might not work if the restart is triggered from the kernel code. Mailboxing works as expected.
  4. FPGA-specific Application Binary Interface (ABI) changes are not available in functional mode, which uses the x86 ABI. Most optimizations that rely on the FPGA ABI must be disabled in the functional compiler.
  5. DDR analysis is limited by casting and interprocedural uses:

    1. Typecasting DDR memory pointers from scalars will not work.
      #include <cstddef>
      // Not supported: scalar arguments cast to DDR memory pointers
      extern "C" void vadd(size_t a_s, size_t b_s, size_t c_s) {
       int* a = (int*)a_s;
       int* b = (int*)b_s;
       int* c = (int*)c_s;
       for (int i = 0; i < 64; i++) {
        c[i] = a[i] + b[i];
       }
      }
    2. Caching DDR memory pointers across procedural context will not work.
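      // Not supported: a DDR memory pointer cached in an object
      class Cache {
       public:
        explicit Cache(int *a) : local(a), idx(0) {}
        int read() { return local[idx++]; }      // DDR read through the cached pointer
        void write(int x) { local[idx++] = x; }  // DDR write through the cached pointer
       private:
        int* local;  // DDR pointer stored outside the kernel function's own context
        int idx;
      };
      extern "C" void vadd(int *a, int *b, int *c) {
       Cache ca(a);
       for (int i = 0; i < 64; i++) {
        c[i] = ca.read() + b[i];
       }
      }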
  6. HLS features that are implemented in binary form and access DDR memory are not supported and must be rewritten functionally.
  7. Burst transactions are not automatically detected in the functional model.
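One possible way to restructure the casting and caching examples in limitation 5 is sketched below, under the assumption that DDR accesses are safest when the m_axi pointer arguments are dereferenced directly in the top-level kernel function. Rather than storing the pointer in an object, the sketch caches the data in a local array. The function name vadd_local, the size N, and the bundle name gmem are hypothetical.

```cpp
// Hypothetical rewrite: vadd_local, N, and gmem are illustrative names.
constexpr int N = 64;

extern "C" void vadd_local(const int *a, const int *b, int *c) {
#pragma HLS INTERFACE m_axi port=a offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=b offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=c offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=return
    // Cache the data, not the pointer: copy from DDR into a local array
    // by indexing the pointer argument directly in the top-level function.
    int a_local[N];
    for (int i = 0; i < N; i++) {
        a_local[i] = a[i];
    }
    // Compute from the local copy; b and c are also accessed only
    // through the kernel's own pointer arguments, with no scalar casts.
    for (int i = 0; i < N; i++) {
        c[i] = a_local[i] + b[i];
    }
}
```

Every DDR access in this sketch goes through a, b, or c as declared in the kernel signature, which is exactly what the scalar-cast and Cache examples avoid.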

Coding guidelines for working with functional models: For kernel compute units that run multiple times and expect static variables to be reset to zero on each run, you must initialize all static variables at the entry of the kernel function. The following example shows code that returns an error, followed by the recommended approach:

// User code that errors out 
static int i = 0;
void hls_kernel_logic(...) {
 ...
}
// Recommended 
static int i = 0;
void hls_kernel_logic(...) {
 i = 0;
 ...
}
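The effect of this guideline can be sketched in plain C++, assuming that repeated runs of a compute unit in the functional model behave like repeated calls to the same C function within one process, so a static variable keeps its value between runs unless the code resets it. The counting functions and their names are hypothetical.

```cpp
// Hypothetical kernels that count the positive values in their input.
// Each call stands in for one run of the compute unit.

// Not recommended: the initializer runs only once, so stale_count
// carries over the result of the previous call.
static int stale_count = 0;
int count_stale(const int *in, int n) {
    for (int i = 0; i < n; i++) {
        if (in[i] > 0) stale_count++;
    }
    return stale_count;
}

// Recommended: reset the static variable at the entry of the function.
static int reset_count = 0;
int count_reset(const int *in, int n) {
    reset_count = 0;
    for (int i = 0; i < n; i++) {
        if (in[i] > 0) reset_count++;
    }
    return reset_count;
}
```

With the input {1, -2, 3}, the second call to count_stale returns 4 rather than 2 because stale_count still holds the result of the first run, while count_reset returns 2 on every call.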