HLS Fence Library - 2024.2 English - UG1399

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2024-11-13
Version: 2024.2 English

The Vitis HLS compiler can reorder unrelated accesses to improve performance. To keep the logic correct, however, it always preserves the sequential semantics of the program by honoring both control dependences and data dependences.

When you also need to force an order between unrelated accesses, use the Vitis HLS fence. It guarantees that two or more accesses to streams, arrays, and scalar function arguments passed by reference are scheduled in the sequential order implied by the source code.

Important: To use the fence, include the header file hls_fence.h in your source. An example of fence usage can be found in Vitis-HLS-Introductory-Examples/Modeling on GitHub.

Full and Half Fence

The library offers two different types of fences.

Full Fence

hls::fence(obj_1, obj_2, ..., obj_n);

The full fence constrains the scheduling of the accesses to obj_1, obj_2, ..., obj_n. Any access that comes before the fence in sequential order remains scheduled before it, and any access that comes after the fence in sequential order remains scheduled after it.

Half Fence

The half fence is less restrictive and is useful in pipelined loops.

hls::fence({obj_1, obj_2, ..., obj_n}, {obj'_1, obj'_2, ..., obj'_p});

The fence above constrains the access schedule as follows. Accesses to objects in the first set of curly braces can be scheduled earlier than the fence, but if they come before the fence in sequential order, they cannot be scheduled after it. Similarly, accesses to objects in the second set of curly braces can be scheduled later than the fence, but if they come after the fence in sequential order, they cannot be scheduled before it.

Note: The full fence hls::fence(a,b) is equivalent to the half fence hls::fence({a, b}, {a, b});.
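The two half-fence rules can be restated as a small checker in plain C++. This is only an illustrative software model, not HLS code and not part of hls_fence.h: the Access record, the cycle numbers, and the functions respects_half_fence and respects_full_fence are hypothetical names introduced here. The model assumes that an access scheduled in the same cycle as the fence satisfies both rules.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical record of one access (illustration only, not an HLS type).
struct Access {
    std::string object;  // name of the accessed stream, array, or scalar
    bool before_fence;   // true if the access precedes the fence in source order
    int cycle;           // cycle chosen by an imagined scheduler
};

// Returns true if the schedule respects a half fence placed at fence_cycle.
bool respects_half_fence(const std::vector<Access>& schedule,
                         const std::set<std::string>& first,
                         const std::set<std::string>& second,
                         int fence_cycle) {
    for (const Access& a : schedule) {
        // First set: an access that precedes the fence in source order
        // must not be scheduled after the fence.
        if (a.before_fence && first.count(a.object) && a.cycle > fence_cycle)
            return false;
        // Second set: an access that follows the fence in source order
        // must not be scheduled before the fence.
        if (!a.before_fence && second.count(a.object) && a.cycle < fence_cycle)
            return false;
    }
    return true;
}

// A full fence behaves like a half fence with the same set on both sides.
bool respects_full_fence(const std::vector<Access>& schedule,
                         const std::set<std::string>& objects,
                         int fence_cycle) {
    return respects_half_fence(schedule, objects, objects, fence_cycle);
}
```

In this model, an access to an object that appears in neither set is never rejected, which mirrors the fact that the fence constrains only the objects passed to it.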

Practical Examples

Here are two typical scenarios where hls::fence is useful.

Avoiding Deadlocks

Consider two communicating dataflow processes:

Producer(...) {
    int bound = ...;
    strm1.write(bound);
    for (int i=0; i<bound; i++) {
        strm2.write(...);
    }
}
 
Consumer(...) {
    int bound = strm1.read();
    for (int i=0; i<bound; i++) {
        ... = strm2.read();
    }   
}

The second process (Consumer) must read the stream strm1 to obtain the loop bound before it can start the loop that reads the stream strm2. The first process (Producer) sends this loop bound before writing to strm2.

In this arrangement, both processes can overlap and only a small FIFO is needed to hold the elements of strm2. You would not expect the compiler or the scheduler to move the write to strm1 after the loop, but because the accesses to strm1 and strm2 are unrelated, nothing prevents it. If that happens, the consumer cannot start reading until all elements have been written to strm2, and a stream of depth "bound" is needed to avoid a deadlock.

Adding a fence guarantees that the write to strm1 is scheduled before the writes to strm2, as follows:

Producer(...) {
    int bound = ...;
    strm1.write(bound);
    hls::fence({strm1}, {strm2});
    for (int i=0; i<bound; i++) {
        strm2.write(...);
    }
}
 
Consumer(...) {
    int bound = strm1.read();
    for (int i=0; i<bound; i++) {
        ... = strm2.read();
    }   
}
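The FIFO depth argument for this scenario can be checked with a small software model. The sketch below is plain C++, not HLS code: the Op type and the functions completes, producer_schedule, and consumer_schedule are hypothetical helpers introduced for illustration. It assumes blocking reads and writes on bounded FIFOs and lets each process attempt one operation per step.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: one blocking stream operation on strm1 or strm2.
struct Op { int stream; };  // 1 = strm1, 2 = strm2

// Returns true if both processes finish, false if they deadlock.
bool completes(const std::vector<Op>& producer_writes,
               const std::vector<Op>& consumer_reads,
               std::size_t depth1, std::size_t depth2) {
    std::size_t fill[3] = {0, 0, 0};                // FIFO occupancy; index 0 unused
    const std::size_t depth[3] = {0, depth1, depth2};
    std::size_t p = 0, c = 0;                       // next operation of each process
    while (p < producer_writes.size() || c < consumer_reads.size()) {
        bool progress = false;
        if (p < producer_writes.size()) {
            int s = producer_writes[p].stream;
            // A write blocks while the FIFO is full.
            if (fill[s] < depth[s]) { ++fill[s]; ++p; progress = true; }
        }
        if (c < consumer_reads.size()) {
            int s = consumer_reads[c].stream;
            // A read blocks while the FIFO is empty.
            if (fill[s] > 0) { --fill[s]; ++c; progress = true; }
        }
        if (!progress) return false;                // both processes blocked: deadlock
    }
    return true;
}

// Producer writes: the bound goes to strm1 either before or after the data loop.
std::vector<Op> producer_schedule(int bound, bool bound_first) {
    std::vector<Op> ops;
    if (bound_first) ops.push_back(Op{1});
    for (int i = 0; i < bound; ++i) ops.push_back(Op{2});
    if (!bound_first) ops.push_back(Op{1});
    return ops;
}

// Consumer reads: the bound from strm1, then `bound` elements from strm2.
std::vector<Op> consumer_schedule(int bound) {
    return producer_schedule(bound, true);
}
```

With a bound of 4 and a depth of 1 on both FIFOs, the model completes when the bound is written first and deadlocks when the write to strm1 is moved after the loop. In the reordered case the model completes only once the depth of strm2 reaches the bound.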

Configuring the Vivado LogiCORE FFT

The initialization for the configuration of the FFT and the loop to send the data can be written as follows:

void inputdatamover(... input_parameters, config_t *fft_config, cmpxDataIn in[FFT_LENGTH], cmpxDataIn xn[FFT_LENGTH]) {
   config_t fft_config_tmp;
   fft_config_tmp.setDir(...);
   fft_config_tmp.setSch(...);
   *fft_config = fft_config_tmp;
  
   for(int i=0; i<FFT_LENGTH; i++) {
   #pragma HLS pipeline rewind
      xn[i] = in[i];
   }
}

For correctness, it is imperative that the FFT configuration is captured before the data is processed.

As written above, this order is not guaranteed, so a fence might be needed:

void inputdatamover(... input_parameters, config_t *fft_config, cmpxDataIn in[FFT_LENGTH], cmpxDataIn xn[FFT_LENGTH]) {
   config_t fft_config_tmp;
   fft_config_tmp.setDir(...);
   fft_config_tmp.setSch(...);
   *fft_config = fft_config_tmp;
   hls::fence({fft_config}, {xn});
   for(int i=0; i<FFT_LENGTH; i++) {
   #pragma HLS pipeline rewind style=flp
      xn[i] = in[i];
   }
}

Known Limitations

The fence cannot be used as a barrier that implicitly covers all variables. Every variable that needs to be constrained must be passed explicitly as an argument to the fence.

Fences can be applied in a scheduling region (pipeline or sequential) but not in a dataflow region.