Compute Unit Calls - Maximum - 2021.2 English

Vitis Guidance Messaging (UG1315)

Document ID: UG1315
Release Date: 2021-10-27
Version: 2021.2 English

Description

This rule checks for compute units that are called too often, thus incurring an overhead.

Explanation

Workgroups provide a powerful mechanism to split operations across multiple compute units (CUs). However, creating a kernel with too small a work size can incur an overhead since each CU call has a startup time. Xilinx® recommends finding the right balance of global and local work sizes to optimize overall performance.

Consider a global work size of (1024, 1024, 1). Xilinx does not recommend:
__kernel __attribute__ ((reqd_work_group_size(4, 4, 1)))
void madd(__global int* a, __global int* b, __global int* c) {
  int i = get_global_id(1)*get_global_size(0) + get_global_id(0);
  c[i] = a[i] + b[i];
}
In contrast, Xilinx recommends:
__kernel __attribute__ ((reqd_work_group_size(256, 256, 1)))
void madd(__global int* a, __global int* b, __global int* c) {
  int i = get_global_id(1)*get_global_size(0) + get_global_id(0);
  c[i] = a[i] + b[i];
}
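
To make the difference between the two kernels concrete, the following host-side C sketch counts the workgroups that a given global and local work size produce. It assumes that each workgroup corresponds to one CU call, and that each global dimension is an exact multiple of the matching local dimension. The helper name count_workgroups is hypothetical and is not part of the OpenCL or Vitis APIs.

```c
#include <stddef.h>

/* Hypothetical helper: number of workgroups in a 3-D NDRange.
 * Assumes global[d] is an exact multiple of local[d] in every dimension;
 * returns 0 when that does not hold or a local dimension is zero. */
static size_t count_workgroups(const size_t global[3], const size_t local[3])
{
    size_t total = 1;
    for (int d = 0; d < 3; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0)
            return 0; /* sizes do not divide evenly */
        total *= global[d] / local[d]; /* workgroups along dimension d */
    }
    return total;
}
```

For the global work size (1024, 1024, 1), a local size of (4, 4, 1) gives 256 * 256 = 65536 workgroups, while (256, 256, 1) gives 4 * 4 = 16. Under the stated assumption, the smaller workgroup pays the CU startup time several thousand times more often.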

Recommendation

For more details on workgroup sizes and how to set them, see the "Interface Attributes" section under Optimizing the Performance in the Vitis Accelerated Software Development Flow Documentation, part of the Vitis Unified Software Platform Documentation (UG1416).