The host code does the following:
Creates 15 dedicated CU handles, one for each of the 15 CUs
Submits 15 CU execution requests
When a CU finishes, it is started again, so all 15 CUs are kept running.
This process continues for a fixed time, 20 seconds in this example.
After 20 seconds, the host code calculates the total number of completed CU executions.
A greater number of CU executions in a given time interval means more work done, that is, higher throughput.
As a side note, the host code is kept brief by using the same input data for every CU execution. Because it focuses solely on CU execution, it omits other typical host functionality, such as verifying the data returned from the CUs and error checking.
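
The submit, resubmit, and count loop described above can be sketched as a small software simulation. This is not the actual host code and does not use the accelerator runtime API: each CU is replaced by a hypothetical `fake_cu` function that just sleeps, and the names `run_cus`, `num_cus`, `duration_s`, and `cu_time_s` are assumptions made for illustration. Like the host code, it uses the same "input" for every execution and does no result verification.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def fake_cu(cu_time_s):
    """Hypothetical stand-in for one CU execution: sleeps for cu_time_s seconds."""
    time.sleep(cu_time_s)


def run_cus(num_cus=15, duration_s=20.0, cu_time_s=0.01):
    """Keep num_cus simulated CUs busy for duration_s; return per-CU completion counts."""
    counts = [0] * num_cus
    deadline = time.monotonic() + duration_s
    with ThreadPoolExecutor(max_workers=num_cus) as pool:
        # One outstanding request per CU; map each future back to its CU index.
        pending = {pool.submit(fake_cu, cu_time_s): cu for cu in range(num_cus)}
        while pending:
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for fut in done:
                cu = pending.pop(fut)
                fut.result()  # re-raise any exception from the worker
                counts[cu] += 1
                # Resubmit the same CU until the time budget is used up.
                # Requests still in flight at the deadline are drained and counted.
                if time.monotonic() < deadline:
                    pending[pool.submit(fake_cu, cu_time_s)] = cu
    return counts


if __name__ == "__main__":
    # Short run for demonstration; the example in the text runs for 20 seconds.
    counts = run_cus(num_cus=15, duration_s=1.0, cu_time_s=0.01)
    print("total completed CU executions:", sum(counts))
```

Dividing the total by the run time gives executions per second, which is the throughput figure the text compares. In this simulation the number only reflects the sleep time and thread scheduling, not real hardware performance.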