Working with Functional Model of the HLS Kernel

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2023-07-17
Version: 2023.1 (English)

Using a functional model of an AMD Vitis™ HLS kernel during hardware emulation is an advanced use case. In this flow, the kernel is compiled in functional mode, which generates an XO file with a SystemC wrapper around the C code.

Hardware emulation is mainly targeted at hardware kernel debug, with a detailed, cycle-accurate view of kernel activity. The functional (TLM) model speeds up emulation by compiling the kernel of interest in functional mode rather than as RTL code. This gives a faster compile time for the kernel, because it does not need full C-to-RTL synthesis, and a faster execution time, because the C code is simulated instead of the RTL. You can also mix and match C and RTL kernels in hardware emulation for faster debug of RTL blocks.

The functional model feature supports modeling AXI4-Stream interfaces (axis) and AXI4 memory-mapped interfaces (m_axi), as well as register reads and writes on AXI4-Lite (s_axilite) interfaces. However, with this approach the kernel is purely functional and carries no latency information, unlike cycle-accurate models.
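As a concrete illustration of the interface types listed above, the following is a minimal sketch of a vector-add kernel that uses m_axi ports for its data buffers and s_axilite for its control registers. The kernel name vadd_func, the bundle name gmem, and the length argument n are illustrative assumptions, not taken from this guide. An axis port would additionally need hls::stream from the Vitis HLS headers; it is left out so that the sketch also builds with a standard C++ compiler, which ignores the #pragma HLS lines.

```cpp
// Hypothetical kernel: c[i] = a[i] + b[i] for n elements.
// The m_axi pragmas map the buffers to AXI4 memory-mapped ports;
// the s_axilite pragmas map n and the block-level control to AXI4-Lite registers.
extern "C" void vadd_func(const int* a, const int* b, int* c, int n) {
#pragma HLS INTERFACE m_axi port=a offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=b offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=c offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=n
#pragma HLS INTERFACE s_axilite port=return
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];  // plain C semantics: no cycles are modeled here
    }
}
```

In functional mode it is this C code that executes, so the results match C simulation. Based on the description above, the m_axi reads and writes appear as transactions at the kernel boundary, while the loop itself contributes no modeled latency.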

The user HLS function is wrapped in a SystemC module with TLM interfaces, and an IP is created from the generated code. This produces an HW_EMU-compatible XO that can be used in the IP integrator to stitch v++ link designs in hardware emulation flows. It also allows the wrapper IP to communicate with other RTL and SystemC models. As a result, HLS C/C++ kernels compiled in functional mode generate TLM transactions during simulation, and you can see the traffic between the memory models (for example, DDR) and the TLM kernels.

Tip: The functional model runs the kernel's C code as a C simulation. The kernel is purely functional, without any latency information, unlike cycle-accurate models. However, you can still see boundary transactions through the TLM interfaces during hardware emulation.
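The difference between "visible boundary transactions" and "no latency information" can be shown with a toy analogy in plain C++. This is not the generated SystemC/TLM wrapper, and its transaction granularity may differ from the real one. The CountingMemory class and the vadd_functional function are hypothetical stand-ins: a memory model that records every read and write crossing the kernel boundary but has no notion of time.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a memory model such as DDR: it stores data
// and counts every access, but models no clock cycles or latency.
class CountingMemory {
public:
    explicit CountingMemory(std::size_t words) : data_(words, 0) {}

    int read(std::size_t addr) {
        ++reads_;
        return data_.at(addr);
    }
    void write(std::size_t addr, int value) {
        ++writes_;
        data_.at(addr) = value;
    }
    std::size_t reads() const { return reads_; }
    std::size_t writes() const { return writes_; }

private:
    std::vector<int> data_;
    std::size_t reads_ = 0;
    std::size_t writes_ = 0;
};

// Functional "kernel": c = a + b over n words at the given base addresses.
// Each memory access is observable as a transaction, but takes zero time.
void vadd_functional(CountingMemory& mem, std::size_t a, std::size_t b,
                     std::size_t c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        mem.write(c + i, mem.read(a + i) + mem.read(b + i));
    }
}
```

For n elements the kernel issues 2n reads and n writes. You can count that traffic, but nothing in the model says how long it took, which mirrors the trade-off described in the tip.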

XO Generation with Functional Model

During the v++ compile step that creates the hardware emulation (hw_emu) XO files, you can request functional simulation, which generates an XO with the SystemC wrapper around the C code. To do this, add the compiler option --advanced.param compiler.emulationMode=func during compilation, as described in --advanced Options.

The generated XO is linked with the v++ --link command in the same way as a regular XO. For an example, refer to mm_stream_func_mode on GitHub.

Limitations of the Functional Model

  1. HLS limitations apply as-is. For example, HLS does not support double pointers, so the functional model does not handle them either.
  2. HLS designs that process multiple data iterations from the host with a single kernel ap_start (for example, ap_ctrl_chain) might not work if the restart is triggered from the kernel code. Mailboxing works fine.
  3. FPGA-specific Application Binary Interface (ABI) changes are not available in functional mode, which uses the x86 ABI. Most optimizations that rely on the FPGA ABI must be disabled in the functional compiler.
  4. DDR analysis is limited by casting and inter-procedural uses:

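    1. Creating DDR pointers by typecasting scalar arguments does not work. In the following example, the buffer addresses arrive as size_t values and are cast to int*:
      extern "C" void vadd(size_t a_s, size_t b_s, size_t c_s) {
       int* a = (int*)a_s;  // not supported: scalar cast to a DDR pointer
       int* b = (int*)b_s;
       int* c = (int*)c_s;
       for (int i = 0; i < 64; i++) {
        c[i] = a[i] + b[i];
       }
      }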
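    2. Caching DDR pointers across procedure contexts does not work. In the following example, the pointer a is stored in a Cache object and dereferenced later inside its member functions:
      class Cache {
      public:
       explicit Cache(int* a) : local(a), idx(0) {}
       int read() { return local[idx++]; }      // DDR access through the cached pointer
       void write(int x) { local[idx++] = x; }
      private:
       int* local;  // cached DDR pointer: not supported
       int idx;
      };
      extern "C" void vadd(int* a, int* b, int* c) {
       Cache ca(a);
       for (int i = 0; i < 64; i++) {
        c[i] = ca.read() + b[i];
       }
      }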
  5. HLS features that are implemented in binary form and consume DDR accesses are not supported; they require a functional rewrite.
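For comparison with the two failing cases in limitation 4, the following is a minimal sketch of a vadd kernel that avoids both patterns. The DDR buffers arrive as pointer arguments and are indexed directly in the kernel body, with no scalar-to-pointer casts and no pointers stored in objects. This guide only states which patterns fail. That this form is handled correctly is an assumption drawn from those two cases, not an explicit statement in the guide.

```cpp
// Assumed-safe form: DDR buffers are kernel pointer arguments and are
// dereferenced directly in the top-level function body.
extern "C" void vadd(const int* a, const int* b, int* c) {
#pragma HLS INTERFACE m_axi port=a offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=b offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=c offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=return
    for (int i = 0; i < 64; i++) {
        c[i] = a[i] + b[i];  // direct access: no casts, no cached pointers
    }
}
```

If helper logic is needed, one design option is to pass it values read from the buffers rather than the buffer pointers themselves, which keeps every DDR access inside the kernel function.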

Coding guidelines for working with functional models: For kernel compute units that run multiple times and expect static variables to be reset to zero on each invocation, you must initialize all static variables at the entry of the kernel function. The following example shows code that errors out, followed by the recommended approach:

// User code that errors out 
static int i = 0;
void hls_kernel_logic(...) {
 ...
}
// Recommended 
static int i = 0;
void hls_kernel_logic(...) {
 i = 0;
 ...
}
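The reason for the reset can be seen in plain C++. In a functional model the kernel runs as ordinary C code, so a file-scope static that is initialized only by its declaration keeps its value from one kernel invocation to the next. The following sketch uses hypothetical function names (count_above_stale, count_above_reset) and a made-up counting task. It illustrates the stale-state behavior, not the specific error message the tools report.

```cpp
// Initialized once at program start, like "static int i = 0;" above.
static int stale_count = 0;

// Relies only on the declaration: the count carries over between calls.
int count_above_stale(const int* in, int n, int threshold) {
    for (int i = 0; i < n; i++) {
        if (in[i] > threshold) stale_count++;
    }
    return stale_count;
}

static int reset_count = 0;

// Recommended: reset the static at the entry of the kernel function.
int count_above_reset(const int* in, int n, int threshold) {
    reset_count = 0;  // explicit reset on every invocation
    for (int i = 0; i < n; i++) {
        if (in[i] > threshold) reset_count++;
    }
    return reset_count;
}
```

Calling each function twice on the input {1, 5, 7, 2} with a threshold of 3 returns 2 and then 4 for the stale version, but 2 both times for the reset version.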