Continuously Running Mode

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2024-11-13
Version: 2024.2 English

Continuously running kernels are kernels that run forever, until they are reset. The notion of a never-ending kernel is typical of a hardware (HW) design context: such kernels are closely related to HW FSMs, which are modeled as processes in VHDL or Verilog. They differ from classical software (SW) acceleration kernels, which are meant to "be called" and to "return" when done, in a manner akin to function calls in SW. A never-ending kernel instead re-executes until reset without being explicitly started by the host; it behaves like a while (1) loop around the function.
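
The contrast can be sketched as a plain C++ software model. This is not synthesizable HLS code: the names kernel_body, run_once, and run_until_reset are hypothetical, and the reset_asserted callback merely stands in for the hardware reset signal.

```cpp
#include <functional>
#include <vector>

// Hypothetical kernel body: one "execution" doubles every input sample.
inline void kernel_body(const std::vector<int>& in, std::vector<int>& out) {
    for (int v : in) out.push_back(2 * v);
}

// Classical SW acceleration kernel: the host calls it and it returns when done.
inline void run_once(const std::vector<int>& in, std::vector<int>& out) {
    kernel_body(in, out);
}

// Never-ending kernel: behaves like while (1) around the body and
// re-executes without a host start until reset is asserted.
// Returns the number of executions completed before reset.
inline int run_until_reset(const std::vector<int>& in, std::vector<int>& out,
                           const std::function<bool()>& reset_asserted) {
    int executions = 0;
    while (!reset_asserted()) {
        kernel_body(in, out);
        ++executions;
    }
    return executions;
}
```

In this model the host decides how often run_once executes, whereas run_until_reset keeps producing output on its own until the reset condition becomes true.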

A typical use case arises in video, networking, and fintech applications, where the kernel receives data continuously, without interruption, from Ethernet ports, a DAC, and similar sources. Such applications need continuously running kernels because the host does not know in advance how many packets the kernel should process (the packets are counted from the first TVALID packet to the last TLAST packet). Modeling the kernel in continuously running mode solves this problem: the kernel runs forever and does not need to know the number of packets in advance.
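
A minimal software sketch of this idea follows, assuming that TLAST marks the final beat of a packet, as in AXI4-Stream. The Beat and PacketStats types and the consume function are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical software model of one stream beat.
struct Beat {
    int  data;
    bool last;  // models TLAST: set on the final beat of a packet
};

// State the kernel keeps across bursts of arriving data.
struct PacketStats {
    std::size_t packets = 0;           // packets completed so far
    std::size_t beats_in_current = 0;  // beats seen in the still-open packet
};

// Consume whatever beats have arrived. The packet count is discovered
// from the TLAST flags; it is never supplied by the host up front.
inline void consume(const std::vector<Beat>& arrived, PacketStats& stats) {
    for (const Beat& b : arrived) {
        ++stats.beats_in_current;
        if (b.last) {
            ++stats.packets;
            stats.beats_in_current = 0;
        }
    }
}
```

Because the state persists between calls, a packet that is split across two bursts is still counted exactly once.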

Some other examples are:

  • A simple rule-based “firewall”, with “rules” written by the host code at reset time, where the host code can read a set of dropped-packet counters at any time, but where all counter values must come from a single kernel execution.
  • A network router where the routing table must be updated in its entirety with respect to a kernel execution.
  • A load balancer that uses a hash map to send data to a server and must update the server list, server map, and corresponding IP addresses simultaneously.
  • A fintech application where the host loosely monitors the execution status of a kernel that runs on the accelerator card and processes requests coming in over the Ethernet ports.
  1. Can the system tolerate a stall at the initial state?

    There are two ways to model a continuously running kernel; the choice between Option 1 and Option 2 is up to the system designer.

    • Option 1: Data-driven Task-level Parallelism (DTLP).
    • Option 2: Control and Data-Driven Task-level Parallelism (CDTLP) in auto restart mode.

    Option 1: In Data-driven Task-level Parallelism, the kernel starts as soon as the xclbin is loaded onto the device; there is no overhead for starting the kernel. This choice is best suited for designs that cannot tolerate, or prefer to avoid, any stalls on the output side of the Ethernet/DAC.

    Option 2: In Control and Data-Driven Task-level Parallelism (CDTLP), the host must set the kernel in auto restart mode. This involves some startup overhead compared to Option 1, but the kernel designer needs to do this only once for the entire application run. Similarly, the kernel designer is responsible for stopping the kernel when the application has finished.

    In summary, the choice between Option 1 and Option 2 depends on how the kernel starts and on the pros and cons of each approach. Once the modeling style is decided, the kernel designer must determine whether the kernel has at least one non-stream I/O.
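
The difference in startup behavior between the two options can be illustrated with a small software model. This is only a sketch under simplifying assumptions: ContinuousKernelModel, StartMode, host_start, and host_stop are hypothetical names and do not correspond to any Vitis HLS or host runtime API.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

enum class StartMode { DataDriven, AutoRestart };

// Hypothetical model: a kernel that adds 1 to every input it is allowed to process.
class ContinuousKernelModel {
public:
    // A data-driven kernel is live immediately; an auto-restart kernel
    // stays idle until the host arms it once.
    explicit ContinuousKernelModel(StartMode mode)
        : mode_(mode), running_(mode == StartMode::DataDriven) {}

    // Host actions, only meaningful in AutoRestart mode.
    void host_start() { if (mode_ == StartMode::AutoRestart) running_ = true; }
    void host_stop()  { if (mode_ == StartMode::AutoRestart) running_ = false; }

    void push_input(int v) { pending_.push(v); }

    // One scheduling step: drain pending input if running,
    // otherwise the input stays queued (a stall).
    void step() {
        while (running_ && !pending_.empty()) {
            output_.push_back(pending_.front() + 1);
            pending_.pop();
        }
    }

    const std::vector<int>& output() const { return output_; }
    std::size_t stalled() const { return pending_.size(); }

private:
    StartMode mode_;
    bool running_;
    std::queue<int> pending_;
    std::vector<int> output_;
};
```

In this model, data pushed to an AutoRestart kernel before host_start is called remains queued, which corresponds to the initial stall that Option 1 avoids.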

  2. What is a non-stream I/O?

    Vitis HLS supports the memory, stream, and register paradigms, each of which follows a specific interface protocol to interact with the external world. The paradigms are described in Interfaces for Vitis Kernel Flow.

    • Non-Stream I/O:
      • Non-stream I/O falls under the following Interface paradigms:

        • Memory Paradigm
        • Register Paradigm
    • Stream I/O:
      • Stream I/O falls under the following Interface paradigms:

        • Stream Paradigm
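
As an illustration, the sketch below shows one function with arguments from all three paradigms. It is a software model: SimStream is a hypothetical stand-in for a stream port so that the code compiles without the HLS headers, scale_and_offset is a hypothetical kernel, and the INTERFACE pragmas appear only in comments as an assumption about how such ports might be mapped.

```cpp
#include <cstddef>
#include <queue>

// Hypothetical stand-in for a stream port (FIFO semantics only).
template <typename T>
class SimStream {
public:
    void write(const T& v) { q_.push(v); }
    T read() { T v = q_.front(); q_.pop(); return v; }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

// in, out : stream paradigm (stream I/O),
//           e.g. #pragma HLS INTERFACE mode=axis port=in
// mem     : memory paradigm (non-stream I/O),
//           e.g. #pragma HLS INTERFACE mode=m_axi port=mem
// scale   : register paradigm (non-stream I/O),
//           e.g. #pragma HLS INTERFACE mode=s_axilite port=scale
inline void scale_and_offset(SimStream<int>& in, SimStream<int>& out,
                             const int* mem, std::size_t n, int scale) {
    for (std::size_t i = 0; i < n; ++i) {
        out.write(in.read() * scale + mem[i]);
    }
}
```

Here mem and scale are the non-stream I/Os whose update behavior the following sections are concerned with.
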
  3. Need for Semi-synchronization for non-stream I/O

    Semi-synchronization exchanges I/O data to and from an auto restart kernel in an asynchronous, non-blocking way. In an SW acceleration kernel, the I/O is fully synchronized: the kernel waits for all of its inputs to be available before it starts. With semi-synchronized I/O, from the second function start onward, the kernel uses the old values and does not block execution while waiting for updated values.

    Semi-synchronized I/O is useful in a video application that streams continuously. A continuously streaming design might need to change its parameters asynchronously, without blocking. For example, in the middle of an application run, a video application might want to increase the resolution of the output image from HD to 4K. With semi-synchronization, the application can change the input parameters without blocking the kernel execution. In HLS, this feature is implemented as the Mailbox feature. Refer to Mailbox Semantics.
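
The non-blocking behavior can be sketched with a small software model. This is an illustration only and does not reproduce the actual Mailbox protocol: MailboxModel, host_write, sample_at_start, and run_executions are hypothetical names.

```cpp
#include <vector>

// Hypothetical model of one semi-synchronized scalar input.
class MailboxModel {
public:
    explicit MailboxModel(int initial) : active_(initial) {}

    // Host side: post a new value without waiting for the kernel.
    void host_write(int v) {
        staged_ = v;
        has_staged_ = true;
    }

    // Kernel side, called at each (re)start: adopt a staged value if one
    // exists, otherwise keep using the old value. Never blocks.
    int sample_at_start() {
        if (has_staged_) {
            active_ = staged_;
            has_staged_ = false;
        }
        return active_;
    }

private:
    int  active_;
    int  staged_ = 0;
    bool has_staged_ = false;
};

// Run back-to-back executions; the host posts new_value just before
// execution number update_at. Returns the value each execution used.
inline std::vector<int> run_executions(MailboxModel& mbox, int executions,
                                       int update_at, int new_value) {
    std::vector<int> seen;
    for (int i = 0; i < executions; ++i) {
        if (i == update_at) mbox.host_write(new_value);
        seen.push_back(mbox.sample_at_start());
    }
    return seen;
}
```

For instance, starting from a value of 1080 and posting 2160 before the third of four executions yields the sequence 1080, 1080, 2160, 2160 in this model: no execution waits for the update.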

    Once the kernel designer has decided on semi-synchronization, the designer needs to be aware of the behavior of the I/O ports.

  4. The behavior of I/O ports: Stable/Direct I/O

    Depending on the behavior of the input(s), there are two cases for when an input (non-stream) argument can change with respect to a kernel execution:

    • Case (a): The I/O never changes during the whole execution of the kernel.
    • Case (b): The I/O can change at any time, regardless of whether the kernel is executing or idle.

    Case (a): The I/O never changes during the whole execution of the kernel.

    • Inputs that change only while the kernel is idle are, by default, treated as stable by the compiler. This means the compiler assumes these inputs remain constant while the kernel is actively executing. In the example below, the argument "a" is sampled during the kernel's idle state and retains the same value for the entire duration of the kernel's execution.
      #include "hls_stream.h"

      void func(hls::stream<int> &in, int a, hls::stream<int> &out)
      {
      #pragma HLS INTERFACE mode=ap_ctrl_chain port=return
          // "a" is stable: it keeps the value sampled before this execution started.
          do {
              out.write(in.read() + a);
          } while (!in.empty()); // loop condition assumed; the original listing omits it
      }

    Case (b): The I/O can change at any time, regardless of whether the kernel is executing or idle.

    • Arguments modeled with the hls::direct I/O classes can change value at any point during the kernel execution. This style suits designs where the exact timing of the change does not matter, as long as the kernel sees the updated value at some later point. In the example below, the argument "reset_myCounter" is read whenever it carries a valid input during the kernel's execution.
    void krnl_stream_vdatamover(hls::stream<pkt> &in,
                                hls::stream<pkt> &out,
                                int mem[DATA],
                                hls::ap_vld<int> &reset_value,
                                hls::ap_vld<int> &reset_myCounter)
    {
        for (int i = 0; i < DATA; i++) {
            ...
            // Read the direct I/O port only when it carries a valid value.
            if (reset_myCounter.valid()) {
                int reset = reset_myCounter.read();
            }
            ...
        }
    }
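
The two cases can be contrasted in a small software model. This is a sketch under simplifying assumptions, not HLS code: DirectInput loosely imitates a port with a valid flag, and DirectInput, Update, and execute are hypothetical names. The by-value argument a plays the role of a stable input, while offset_port plays the role of a direct I/O input that the host may drive in the middle of an execution.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a direct I/O input with a valid flag.
struct DirectInput {
    int  value = 0;
    bool vld = false;
    bool valid() const { return vld; }
    int  read() { vld = false; return value; }  // reading consumes the value
};

// Hypothetical host event: drive `value` onto the direct port at `iteration`.
struct Update {
    std::size_t iteration;
    int value;
};

// One kernel execution over `in`.
//  - a           : stable input, fixed for the whole execution (case a)
//  - offset_port : direct input, may change mid-execution (case b)
inline std::vector<int> execute(const std::vector<int>& in, int a,
                                DirectInput& offset_port,
                                const std::vector<Update>& updates) {
    int offset = 0;
    std::vector<int> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Model the host driving the direct port while the kernel runs.
        for (const Update& u : updates) {
            if (u.iteration == i) {
                offset_port.value = u.value;
                offset_port.vld = true;
            }
        }
        // Pick up the new value only when the port reports it as valid.
        if (offset_port.valid()) offset = offset_port.read();
        out.push_back(in[i] * a + offset);
    }
    return out;
}
```

In this model, a new value for a can only take effect on the next call to execute, whereas a value driven onto offset_port takes effect from the iteration in which it becomes valid.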