Group-By Aggregate Design - 2024.2 English - XD160

Vitis Libraries

Document ID
XD160
Release Date
2024-11-29
Version
2024.2 English

Caution

The Aggregate kernel and the L3 Aggregate API have not been updated. The gqePart-32bit and gqeAggr-32bit kernels released in 2020.2 are used here.

L3 Aggregation provides the following solutions:

  1. solution 0: Hash Aggregate, only for testing small datasets
  2. solution 1: Horizontally Cut + Pipelined Hash Aggregation
  3. solution 2: Hash Partition + Pipelined Hash Aggregation

In solution 1, the first input table is cut horizontally into many slices, each slice is aggregated, and the per-slice results are merged at the end. In solution 2, the first input table is hash partitioned into many partitions, and each partition is aggregated; no final merge is needed because every key falls into exactly one partition. Comparing the two, solution 1 adds CPU overhead for the merge, while solution 2 adds the execution time of one more kernel (hash partition). As a result, solution 2 is more beneficial than solution 1 when the input table has a high unique-key ratio. Profiling performance with inputs of different unique-key ratios gives the turning point.
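The difference between the two strategies can be illustrated with a minimal host-side sketch in plain Python. It is not the L3 API: `aggregate`, `solution1_horizontal_cut`, and `solution2_hash_partition` are hypothetical names, and a group-by sum over `(key, value)` pairs stands in for the aggregate kernel.

```python
from collections import defaultdict


def aggregate(rows):
    """Group-by sum over (key, value) pairs; stand-in for the aggregate kernel."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return dict(acc)


def solution1_horizontal_cut(rows, num_slices):
    """Cut the table into contiguous slices, aggregate each, then merge."""
    slice_len = max(1, -(-len(rows) // num_slices))  # ceiling division
    partials = [
        aggregate(rows[start:start + slice_len])
        for start in range(0, len(rows), slice_len)
    ]
    # Merge step: the same key can appear in several slices, so the
    # partial results have to be combined key by key.
    merged = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)


def solution2_hash_partition(rows, num_partitions):
    """Hash-partition by key, then aggregate each partition; no merge."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    result = {}
    for part in partitions:
        # Partitions hold disjoint key sets, so results are simply collected.
        result.update(aggregate(part))
    return result


# Small example table: 20 rows, 5 distinct keys.
rows = [(k % 5, k) for k in range(20)]
expected = aggregate(rows)
```

Both functions return the same dictionary as `aggregate(rows)`. The merge loop in solution 1 touches every key of every partial result, which is where its extra CPU cost comes from, whereas solution 2 pays for the additional partitioning pass instead.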

Performance for different L3 strategies

The figure shows that when the number of unique keys exceeds roughly 180K to 240K, you can switch from solution 1 to solution 2.

Other notes:

  1. Hash Partition supports a maximum of two keys. When grouping by more keys, use solution 1.
  2. In solution 1, make the scale of one slice close to TPC-H SF1.
  3. In solution 2, make the scale of one partition close to TPC-H SF1.
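These rules can be combined into a small selection heuristic. The sketch below is illustrative only: `choose_solution` and `num_units` are hypothetical helpers, not part of the L3 API. The default switch point uses the lower end of the 180K~240K range and should be replaced by a value from your own profiling. The `rows_per_unit` value in the example (about 6 million rows, roughly the TPC-H lineitem table at SF1) is an assumption about which table the sizing note has in mind.

```python
import math

MAX_PARTITION_KEYS = 2        # Hash Partition supports at most two keys
SWITCH_UNIQUE_KEYS = 180_000  # assumption: lower end of the 180K~240K range


def choose_solution(num_group_keys, unique_keys,
                    switch_point=SWITCH_UNIQUE_KEYS):
    """Return 1 (horizontal cut) or 2 (hash partition) for a group-by query."""
    if num_group_keys > MAX_PARTITION_KEYS:
        # Hash Partition cannot handle more than two keys.
        return 1
    # Above the switch point the merge cost of solution 1 dominates.
    return 2 if unique_keys > switch_point else 1


def num_units(total_rows, rows_per_unit):
    """Number of slices or partitions so that each holds about rows_per_unit rows."""
    return max(1, math.ceil(total_rows / rows_per_unit))


# Example: two group-by keys, 300K unique keys, a 60M-row input table.
solution = choose_solution(num_group_keys=2, unique_keys=300_000)
units = num_units(total_rows=60_000_000, rows_per_unit=6_000_000)
```

Solution 0 is left out because it is intended only for testing small datasets.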