Group-By Aggregate Design - 2023.2 English

Caution

The Aggregate kernel and the L3 Aggregate API have not been updated. The gqePart-32bit and gqeAggr-32bit kernels released in 2020.2 are used here.

L3 Aggregation provides the following solutions:

solution 0: Hash Aggregate, only for testing small datasets.
solution 1: Horizontally Cut + Pipelined Hash Aggregation
solution 2: Hash Partition + Pipelined Hash Aggregation

In solution 1, the first input table is cut horizontally into many slices, each slice is aggregated, and the partial results are merged at the end. In solution 2, the first input table is hash partitioned into many partitions and each partition is aggregated separately; no final merge is needed, because every key falls into exactly one partition. Comparing the two, solution 1 adds CPU overhead for merging, while solution 2 adds the execution time of one more kernel (hash partition). In summary, when the input table has a high unique-key ratio, solution 2 is more beneficial than solution 1. The turning point was found by profiling performance on inputs with different unique-key ratios.
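The difference between the two data flows can be sketched in plain Python. This is only a CPU-side illustration under stated assumptions, not the gqePart/gqeAggr kernel code or the L3 API: the function names are hypothetical, rows are simple `(key, value)` pairs, and the aggregate is a SUM.

```python
from collections import defaultdict


def hash_aggregate(rows):
    """Sum values per key for one chunk of (key, value) rows."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return dict(acc)


def solution1_slice_and_merge(rows, num_slices):
    """Horizontal cut: aggregate each contiguous slice, then merge the partials."""
    slice_len = max(1, -(-len(rows) // num_slices))  # ceiling division
    partials = [hash_aggregate(rows[i:i + slice_len])
                for i in range(0, len(rows), slice_len)]
    merged = defaultdict(int)
    for partial in partials:
        # Merge step: the same key may appear in several slices.
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)


def solution2_partition(rows, num_partitions):
    """Hash partition: each key lands in exactly one partition, so no merge."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    result = {}
    for part in partitions:
        # Key sets of different partitions are disjoint, so update() never overwrites.
        result.update(hash_aggregate(part))
    return result


rows = [(k % 5, k) for k in range(20)]
```

Both functions return the same table as a single `hash_aggregate(rows)` call. The merge loop in `solution1_slice_and_merge` grows with the number of distinct keys per slice, which is the CPU overhead that solution 2 avoids at the price of the extra partitioning pass.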

The figure shows that when the number of unique keys exceeds 180K~240K, it is worth switching from solution 1 to solution 2.

Others:
1) Hash Partition supports at most 2 keys; when grouping by more keys, use solution 1.
2) In solution 1, make the scale of one slice close to TPC-H SF1.
3) In solution 2, make the scale of one partition close to TPC-H SF1.
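The selection rules can be collected into a small helper. This is a hedged sketch rather than part of the L3 API: `choose_solution` and `num_units` are hypothetical names, the 200,000 threshold is an assumed point inside the 180K~240K range, and the 6,000,000-row unit size assumes that "close to TPC-H SF1" refers to roughly the row count of the SF1 lineitem table. Grouping by more than 2 keys falls back to solution 1 because that is the solution that does not run the hash partition kernel.

```python
import math

MAX_PARTITION_KEYS = 2          # from the text: Hash Partition supports at most 2 keys
UNIQUE_KEY_THRESHOLD = 200_000  # assumption: a point inside the 180K~240K range
ROWS_PER_UNIT = 6_000_000       # assumption: approximate TPC-H SF1 lineitem row count


def choose_solution(num_group_keys, est_unique_keys,
                    threshold=UNIQUE_KEY_THRESHOLD):
    """Return 1 (slice + merge) or 2 (hash partition) for a group-by query."""
    if num_group_keys > MAX_PARTITION_KEYS:
        # The partition kernel cannot handle this many keys.
        return 1
    # A high unique-key count makes the CPU merge of solution 1 expensive.
    return 2 if est_unique_keys > threshold else 1


def num_units(total_rows, rows_per_unit=ROWS_PER_UNIT):
    """Number of slices (solution 1) or partitions (solution 2) to aim for."""
    return max(1, math.ceil(total_rows / rows_per_unit))
```

For example, `choose_solution(2, 1_000_000)` picks solution 2, while the same key count with 3 group-by keys picks solution 1. The threshold is best treated as a tunable value, since the turning point was measured as a range rather than a single number.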