Compute Unit Calls - Maximum

Vitis Guidance Messaging (UG1315)

Document ID: UG1315
Release Date: 2019-10-30
Version: 2019.2 English

Description

This rule checks for compute units that are called too often, thus incurring an overhead.

Explanation

Workgroups provide a powerful mechanism for splitting operations across multiple compute units (CUs). However, a kernel with too small a workgroup size can incur overhead, because each CU call has a startup time. Find the right balance of global and local work sizes to optimize overall performance.

Consider a global work size of (1024, 1024, 1). Xilinx does not recommend the following kernel, whose (4, 4, 1) workgroup size splits the work into 65,536 workgroups:
__kernel __attribute__ ((reqd_work_group_size(4, 4, 1)))
void madd(__global int* a, __global int* b, __global int* c) {
  int i = get_global_id(1)*get_global_size(0) + get_global_id(0);
  c[i] = a[i] + b[i];
}
In contrast, Xilinx recommends:
__kernel __attribute__ ((reqd_work_group_size(256, 256, 1)))
void madd(__global int* a, __global int* b, __global int* c) {
  int i = get_global_id(1)*get_global_size(0) + get_global_id(0);
  c[i] = a[i] + b[i];
}
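
The difference between the two attributes can be made concrete by counting workgroups. The following is a minimal sketch in plain host-side C, not part of the OpenCL or Vitis APIs: the helper name count_work_groups is hypothetical, and it assumes that every global dimension is an exact multiple of the matching local dimension and that each workgroup corresponds to one CU call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL or Vitis function).
 * Counts the workgroups in a 3-D NDRange, assuming each global
 * dimension is an exact multiple of the matching local dimension. */
static size_t count_work_groups(const size_t global[3], const size_t local[3])
{
    size_t groups = 1;
    for (int d = 0; d < 3; ++d) {
        /* Reject a zero local size or a global size that does not divide evenly. */
        assert(local[d] > 0 && global[d] % local[d] == 0);
        groups *= global[d] / local[d];
    }
    return groups;
}
```

For the global work size (1024, 1024, 1), a local size of (4, 4, 1) gives 256 * 256 = 65,536 workgroups, while (256, 256, 1) gives 4 * 4 = 16. Under the assumption above, the smaller workgroup size pays the CU startup time 4,096 times as often.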

Recommendation

For more details on workgroup sizes and how to adjust them, refer to "Interface Attributes" in the "Interface Optimization" chapter of the SDAccel Environment Profiling and Optimization Guide (UG1207).