Description
This rule checks for compute units (CUs) that are called so often that the per-call startup overhead becomes significant.
Explanation
Workgroups provide a powerful mechanism to split operations across multiple compute units (CUs). However, creating a kernel with too small a work size can incur an overhead since each CU call has a startup time. Xilinx® recommends finding the right balance of global and local work sizes to optimize overall performance.
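For example, the following kernel requires a small 4 x 4 x 1 workgroup, so covering a large global work size takes many CU calls:
__kernel __attribute__ ((reqd_work_group_size(4, 4, 1)))
void madd(__global int* a, __global int* b, __global int* c) {
    int i = get_global_id(1)*get_global_size(0) + get_global_id(0);
    c[i] = a[i] + b[i];
}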
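Increasing the required workgroup size, here to 256 x 256 x 1, lets each CU call process more work items and reduces the number of calls for the same global work size:
__kernel __attribute__ ((reqd_work_group_size(256, 256, 1)))
void madd(__global int* a, __global int* b, __global int* c) {
    int i = get_global_id(1)*get_global_size(0) + get_global_id(0);
    c[i] = a[i] + b[i];
}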
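To make the difference concrete, the number of workgroups for each configuration can be counted on the host side. The following is a minimal sketch in plain C, not part of any Xilinx API: the helper names `groups_per_dim` and `total_work_groups` are hypothetical, the 1024 x 1024 x 1 global work size mentioned below is an assumed example, and the sketch assumes that each workgroup corresponds to one CU call and that the global size is an exact multiple of the local size in every dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of workgroups along one dimension.
 * Assumes the global size is an exact multiple of the local size. */
static size_t groups_per_dim(size_t global, size_t local)
{
    assert(local != 0 && global % local == 0);
    return global / local;
}

/* Hypothetical helper: total number of workgroups for a 3-D NDRange.
 * Under the assumption that each workgroup maps to one CU call, this
 * is also the number of times the per-call startup cost is paid. */
static size_t total_work_groups(const size_t global[3], const size_t local[3])
{
    size_t total = 1;
    for (int d = 0; d < 3; ++d) {
        total *= groups_per_dim(global[d], local[d]);
    }
    return total;
}
```

With an assumed global work size of 1024 x 1024 x 1, `total_work_groups` returns 65536 for the 4 x 4 x 1 workgroup and 16 for the 256 x 256 x 1 workgroup, so under these assumptions the startup cost is paid 4096 times less often with the larger workgroup.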
Recommendation
Increase the workgroup size so that each CU call processes more work items. For more details on workgroup sizes and how to adjust them, see the "Interface Attributes" section under Optimizing the Performance in the Vitis Accelerated Software Development Flow Documentation, part of the Vitis Unified Software Platform Documentation (UG1416).