Description
Pipelines the work items of a kernel to improve latency and throughput. Work-item pipelining extends loop pipelining to the kernel work group, and is needed to maximize kernel throughput and performance.
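The throughput benefit can be estimated with the usual textbook pipeline model. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the per-work-item latency and initiation interval (II) are placeholders for values that would in practice come from the tool's reports.

```c
#include <assert.h>

/* Hypothetical helper: cycles needed when each work item must finish
 * before the next one starts (no pipelining). */
static unsigned long cycles_unpipelined(unsigned long n_items,
                                        unsigned long latency)
{
    return n_items * latency;
}

/* Hypothetical helper: cycles needed when a new work item can start
 * every `ii` cycles. The first item takes the full latency; each
 * remaining item adds one initiation interval. */
static unsigned long cycles_pipelined(unsigned long n_items,
                                      unsigned long latency,
                                      unsigned long ii)
{
    assert(ii >= 1);            /* an II below 1 is not meaningful */
    if (n_items == 0)
        return 0;
    return latency + (n_items - 1) * ii;
}
```

Under this model, three work items with an assumed latency of 10 cycles take 30 cycles without pipelining and 12 cycles with an II of 1.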
Syntax
Place the attribute in the OpenCL API source before the elements to pipeline:
__attribute__((xcl_pipeline_workitems))
Example 1
To handle the reqd_work_group_size attribute in the following example, Vitis technology automatically inserts a loop nest to handle the three-dimensional characteristics of the ND range (3,1,1). As a result of the added loop nest, the execution profile of this kernel is like an unpipelined loop. Adding the XCL_PIPELINE_WORKITEMS attribute adds concurrency and improves the throughput of the code.
kernel
__attribute__ ((reqd_work_group_size(3,1,1)))
void foo(...)
{
    ...
    __attribute__((xcl_pipeline_workitems)) {
        int tid = get_global_id(0);
        op_Read(tid);
        op_Compute(tid);
        op_Write(tid);
    }
    ...
}
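The loop nest described in Example 1 can be pictured with a plain C model. This is a conceptual sketch, not the code the tool generates: the constants, `work_item_body` (a stand-in for the op_Read, op_Compute, and op_Write calls), and `run_work_group` are hypothetical names, and a single work group with a global offset of zero is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed work-group shape, matching reqd_work_group_size(3,1,1). */
enum { WG_X = 3, WG_Y = 1, WG_Z = 1, WG_ITEMS = WG_X * WG_Y * WG_Z };

/* Hypothetical stand-in for the read/compute/write body of one work item. */
static int work_item_body(int value)
{
    return value * 2 + 1;
}

/* Conceptual model of one work group: every work item becomes one
 * iteration of a three-level loop nest. Without pipelining, each
 * iteration completes before the next one begins. */
static void run_work_group(const int *in, int *out, size_t n)
{
    assert(n >= (size_t)WG_ITEMS);   /* buffers must cover the work group */
    for (int z = 0; z < WG_Z; ++z) {
        for (int y = 0; y < WG_Y; ++y) {
            for (int x = 0; x < WG_X; ++x) {
                int tid = x;         /* models get_global_id(0) */
                out[tid] = work_item_body(in[tid]);
            }
        }
    }
}
```

In this picture, the xcl_pipeline_workitems attribute corresponds to pipelining the innermost body so that a new iteration can start before the previous one has finished.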
Example 2
The following example adds the work-item pipeline to the appropriate elements of the kernel:
__kernel __attribute__ ((reqd_work_group_size(8, 8, 1)))
void madd(__global int* a, __global int* b, __global int* output)
{
    int rank = get_local_size(0);
    __local unsigned int bufa[64];
    __local unsigned int bufb[64];
    // First pipelined region: copy the inputs into local memory.
    __attribute__((xcl_pipeline_workitems)) {
        int x = get_local_id(0);
        int y = get_local_id(1);
        bufa[x*rank + y] = a[x*rank + y];
        bufb[x*rank + y] = b[x*rank + y];
    }
    // All local writes must complete before any work item reads them.
    barrier(CLK_LOCAL_MEM_FENCE);
    // Second pipelined region: add the buffered values.
    __attribute__((xcl_pipeline_workitems)) {
        int index = get_local_id(1)*rank + get_local_id(0);
        output[index] = bufa[index] + bufb[index];
    }
}
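One way to read Example 2 is that the barrier splits the kernel into two separate work-item loop nests, each of which can be pipelined on its own. The C model below sketches that reading for a single 8x8 work group; `madd_model` and the constants are hypothetical names, and the code is an illustration of the data flow, not the tool's output.

```c
#include <assert.h>

/* Assumed work-group shape, matching reqd_work_group_size(8, 8, 1). */
enum { LOCAL_X = 8, LOCAL_Y = 8, LOCAL_ITEMS = LOCAL_X * LOCAL_Y };

/* Hypothetical host-side model of the madd kernel for one work group. */
static void madd_model(const int *a, const int *b, int *output)
{
    assert(a && b && output);
    const int rank = LOCAL_X;              /* models get_local_size(0) */
    unsigned int bufa[LOCAL_ITEMS];        /* models __local bufa */
    unsigned int bufb[LOCAL_ITEMS];        /* models __local bufb */

    /* Region 1: every work item copies one element of each input. */
    for (int y = 0; y < LOCAL_Y; ++y) {
        for (int x = 0; x < LOCAL_X; ++x) {
            bufa[x * rank + y] = (unsigned int)a[x * rank + y];
            bufb[x * rank + y] = (unsigned int)b[x * rank + y];
        }
    }

    /* The barrier is modeled by finishing region 1 for all work items
     * before region 2 starts. */

    /* Region 2: every work item adds one pair of buffered values. */
    for (int y = 0; y < LOCAL_Y; ++y) {
        for (int x = 0; x < LOCAL_X; ++x) {
            int index = y * rank + x;
            output[index] = (int)(bufa[index] + bufb[index]);
        }
    }
}
```

Because the second region reads elements that other work items wrote in the first region, the two loop nests cannot be merged, which is why the attribute appears once on each side of the barrier.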