xcl_pipeline_loop - 2020.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID
Release Date
2020.2 English


You can pipeline a loop to improve latency and maximize kernel throughput and performance.

Although unrolling loops increases concurrency, it does not address the issue of keeping all elements in a kernel data path busy at all times. Even in an unrolled case, loop control dependencies can lead to sequential behavior. The sequential behavior of operations results in idle hardware and a loss of performance.

Xilinx addresses this issue by introducing a vendor extension on top of the OpenCL 2.0 API specification for loop pipelining using the XCL_PIPELINE_LOOP attribute.

By default, the v++ compiler automatically pipelines loops with a trip count more than 64, or unrolls loops with a trip count less than 64. This should provide good results. However, you can choose to pipeline loops (instead of the automatic unrolling) by explicitly specifying the NOUNROLL attribute and XCL_PIPELINE_LOOP attribute before the loop.


Place the attribute in the OpenCL source before the loop definition:



  • <II_number>: Specifies the desired initiation interval (II) for the pipeline. The Vitis HLS tool tries to meet this request; however, based on data dependencies, the loop might have a larger initiation interval. When the II is not specified, the default is 1.


The following example specifies an II target of 3 for the for loop in the specified function:

__kernel void f(__global int *a) {
  for (unsigned i = 0; i < 64; ++i)
    a[i] = i;