Internals of Hash-Join-v3 and Hash-Build-Probe-v3 - 2024.1 English

Vitis Libraries

Release Date
2024-08-06
Version
2024.1 English

This document describes the structure and execution of Hash-Join-V3 and Hash-Build-Probe-v3, implemented as hashJoinV3 and hashBuildProbeV3 functions.

Hash Join MPU Structure

The Hash-Join-v3 and Hash-Build-Probe-v3 are general primitives to accelerate the Hash-Join algorithm utilizing the advantage of high memory bandwidth in an AMD FPGA. Hash-Join-v3 performs Hash-Join in single-call mode which means the small table and the big table should be scanned one after another. Hash-Build-Probe-v3 provides a separative solution for build and probe in Hash-Join. In Hash-Build-Probe, an incremental build is supported, and the workflow can be scheduled as build0 -> build1 -> probe0 -> build2 -> probe2…

Workload is distributed based on the most significant bits (MSBs) of the hash value of join key to the processing unit (PU), so that each PU can work independently. The current design uses a maximum number of PUs, served by four input channels through each of which a pair of key and payload can be passed in each cycle. Overflow processing is provided in this design.

The number of PU is set to 8, as each PU requires two dedicated bank to temporarily store rows in the small table (one for base rows and another for overflow rows). Due to the DDR/HBM memory access delay, four channels can serve enough data to these PUs. Each PU performs HASH-JOIN in three phases.

  1. Build: With small table as input, the number of keys falls into each hash values are counted. The value of hash counters are stored in bit vector in URAM. Every hash value has a fixed depth of storage in HBM to store rows of small table. If the counter of hash value is larger than the fixed depth, it means that overflow happens. Another bit vector is used for counting overflow rows. Also, the overflow of small table will be stored in another area in the HBM.
Build
  1. Merge: Accumulating the overflow hash counter to get the offsets of each hash value. Then, the overflow rows will be read in form one HBM and write out to another HBM. The output address of overflow rows is according to the offset of its hash value. By operation of Merge, you can put the overflow rows into its right position and wait for Probe. To provide high throughput in this operation, two dedicated HBM ports are required for each PU, which provides read and write accessibilities at the same time.