Control-Driven Task-Level Parallelism (CDTLP) follows standard C semantics; for example, function parameters must be available before the function is called. This means that non-stream I/O parameters are provided to the kernel at the start of its execution and remain constant until the kernel completes. Synchronization points, such as the function start and end, allow for predictable updates of these parameters.
In contrast, Data-Driven Task-Level Parallelism (DTLP) designs operate similarly to RTL designs. These kernels run continuously from the moment the FPGA is programmed until a reset occurs. There are no function starts or completion points where non-stream I/O parameters can be updated in the traditional sense because the kernel is always active.
Despite the lack of synchronization points in DTLP designs, non-stream I/O can still be supported with one key consideration: the kernel design must not depend on the precise timing of when these non-stream inputs change. What matters is that the kernel can access the updated data at some point during its continuous execution.
There are two types of behaviors for non-stream I/O:
- Case (a): The I/O can change at any time, regardless of whether the kernel is executing or idle. Kernel arguments modeled using the hls::direct I/O class can change value at any point during kernel execution. The kernel designer is not concerned with the exact timing of the change, provided that the kernel sees the updated value at some later point.
- Case (b): The I/O never changes while the kernel is executing. Inputs that change only when the kernel is idle are treated as stable by the compiler by default; that is, the compiler assumes they remain constant while the kernel is actively executing.
Stable memory I/O should be marked as stable in the KPN context, as shown in the following code example.
void write_process(hls::stream<int>& in,
                   hls::stream<int>& out,
                   int* mem)
{
#pragma HLS PIPELINE off
    int val;
    static int addr = 0;   // write position, kept across invocations
    in.read(val);
    if (addr >= 32)        // wrap around after 32 entries
        addr = 0;
    //hls::print("writing %d\n", addr);
    mem[addr] = val;
    addr++;
    val = mem[addr-1];     // read back the value just written
    out.write(val);
}
...
...
...
...
void stable_pointer(int* mem,
                    hls::stream<int>& in,
                    hls::stream<int>& out)
{
#pragma HLS DATAFLOW
#pragma HLS INTERFACE mode=s_axilite port=mem offset=30
#pragma HLS INTERFACE mode=m_axi bundle=gmem depth=256 max_read_burst_length=16 \
    max_widen_bitwidth=512 max_write_burst_length=16 num_read_outstanding=16 \
    num_write_outstanding=16 port=mem
    // The pointer mem changes only while the kernel is idle, so mark it stable
#pragma HLS stable variable=mem
    // Internal FIFOs connecting the tasks
    hls_thread_local hls::stream<int> int_fifo("int_fifo");
#pragma HLS STREAM depth=512 type=fifo variable=int_fifo
    hls_thread_local hls::stream<int> int_fifo2("int_fifo2");
#pragma HLS STREAM depth=512 type=fifo variable=int_fifo2
    // Continuously running tasks; t3 accesses memory through mem
    hls_thread_local hls::task t1(process_23, in, int_fifo);
    hls_thread_local hls::task t2(process_11, int_fifo, int_fifo2);
    hls_thread_local hls::task t3(write_process, int_fifo2, out, mem);
}
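The addressing behavior of write_process can be checked on a host machine without the HLS headers. The following is a minimal sketch under stated assumptions, not part of the Vitis HLS API: std::queue stands in for hls::stream, a std::vector stands in for the M_AXI memory, and the names Stream, WriteProcessModel, ModelResult, and run_model are hypothetical. The static address counter of the original is modeled as a struct member so that each model instance starts from address 0.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for hls::stream<int>; this is NOT the Vitis HLS type.
using Stream = std::queue<int>;

// Software model of write_process. One call to step() corresponds to one
// invocation of the task body: read a value, store it in a 32-entry
// circular region of mem, read it back, and forward it downstream.
struct WriteProcessModel {
    int addr = 0;  // models the 'static int addr' of the original function

    void step(Stream& in, Stream& out, int* mem) {
        int val = in.front();
        in.pop();
        if (addr >= 32)  // wrap around after 32 entries
            addr = 0;
        mem[addr] = val;
        addr++;
        out.push(mem[addr - 1]);  // read back the value just written
    }
};

// Final memory image and forwarded values of one model run.
struct ModelResult {
    std::vector<int> mem;
    std::vector<int> out;
};

// Feed the values 0..n-1 through the model and collect the results.
ModelResult run_model(int n) {
    ModelResult r;
    r.mem.assign(256, -1);  // 256 words, matching depth=256; -1 marks "never written"
    Stream in, out;
    for (int i = 0; i < n; ++i)
        in.push(i);
    WriteProcessModel wp;
    while (!in.empty())
        wp.step(in, out, r.mem.data());
    while (!out.empty()) {
        r.out.push_back(out.front());
        out.pop();
    }
    return r;
}
```

With 40 input values, the model overwrites the first eight memory entries on the second pass and never touches addresses 32 and above. The pointer mem itself is never modified by the task, which is the property the stable pragma relies on.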
For C/RTL Co-simulation, unsynchronized access must be enabled for hls::task and M_AXI interfaces via the cosim.enable_tasks_with_m_axi command, as described in Co-Simulation Configuration.
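As a rough illustration, the option might be set in an HLS configuration file as in the fragment below. This is a sketch under assumptions: the [hls] section layout, the `#` comment syntax, and the value true are not taken from this section, so Co-Simulation Configuration remains the authoritative reference for the exact form.

```ini
[hls]
# Assumed syntax: allow hls::task designs with M_AXI interfaces in co-simulation
cosim.enable_tasks_with_m_axi=true
```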