A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not; in other words, a query returns either “possibly in set” or “definitely not in set”. (from Wikipedia)
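To make the "possibly in set" / "definitely not in set" behavior concrete, here is a minimal software sketch of a Bloom filter. It is for illustration only and is not the GQE kernel's implementation: the class name `SimpleBloomFilter`, the splitmix64-style mixing function, and the double-hashing scheme are all assumptions chosen for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative software Bloom filter (hypothetical; not the GQE hardware design).
// num_bits must be greater than zero.
class SimpleBloomFilter {
public:
    SimpleBloomFilter(std::size_t num_bits, unsigned num_hashes)
        : bits_(num_bits, false), num_hashes_(num_hashes) {}

    // Build: set every bit position selected by the key's hash values.
    void insert(std::uint64_t key) {
        for (unsigned i = 0; i < num_hashes_; ++i) {
            bits_[index(key, i)] = true;
        }
    }

    // Probe: a single cleared bit proves the key was never inserted.
    // If all bits are set, the key is only "possibly" in the set.
    bool possiblyContains(std::uint64_t key) const {
        for (unsigned i = 0; i < num_hashes_; ++i) {
            if (!bits_[index(key, i)]) {
                return false;
            }
        }
        return true;
    }

private:
    // splitmix64-style bit mixer, used here as a stand-in hash function.
    static std::uint64_t mix(std::uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    std::size_t index(std::uint64_t key, unsigned i) const {
        const std::uint64_t h1 = mix(key);
        const std::uint64_t h2 = mix(h1) | 1ULL;
        return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
    }

    std::vector<bool> bits_;
    unsigned num_hashes_;
};
```

Every inserted key always probes as present, because insertion sets exactly the bits that the probe later checks. A key that was never inserted can still probe as present if other keys happened to set all of its bit positions, which is the false positive case.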
The 3-in-1 GQE kernel supports a Bloom-filter probe flow. The implementation exploits the high bandwidth of HBM to accelerate both the build and the probe phases, while making the filter capacity as large as possible.
Because the Bloom filter shares the same framework as the 3-in-1 GQE kernel, the input key and payload must be 64 bits wide. The kernel accepts 1 or 2 key columns, 1 payload column, and 1 validation-bit column.
The input and output columns of the Bloom-filter flow are as follows:
| Direction | Column | Content |
|---|---|---|
| Input | Column 0 | key 0 |
| Input | Column 1 | key 1 (if dual-key is enabled) |
| Input | Column 2 | payload |
| Input | Validation Column | input validation bits |
| Output | Column 0 | validation bits |
| Output | Column 1 | unused |
| Output | Column 2 | unused |
| Output | Column 3 | unused |
In addition, set a suitable Bloom-filter size in bits by calling the setBloomfilterSize API in class KernelCommand, so that the false positive probability is reasonable for the given number of unique keys.
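As a rough guide for choosing that size, the textbook Bloom-filter formulas relate the number of bits m, the number of unique keys n, the number of hash functions k, and the false positive probability p. The sketch below uses them under stated assumptions: the helper names `bloomFilterBits` and `expectedFalsePositiveRate` are hypothetical, the sizing formula assumes an optimal number of independent hash functions, and the GQE kernel's actual hashing scheme and any size constraints it imposes are not described here, so treat the result as a starting estimate only.

```cpp
#include <cmath>
#include <cstdint>

// Textbook sizing for a target false positive probability p with n unique keys:
//   m = -n * ln(p) / (ln 2)^2   (assumes the optimal number of hash functions)
// Hypothetical helper; not part of the GQE API.
std::uint64_t bloomFilterBits(std::uint64_t num_unique_keys, double target_fpp) {
    const double ln2 = std::log(2.0);
    const double bits =
        -static_cast<double>(num_unique_keys) * std::log(target_fpp) / (ln2 * ln2);
    return static_cast<std::uint64_t>(std::ceil(bits));
}

// Textbook estimate of the false positive probability for m bits, n keys,
// and k hash functions:
//   p ~= (1 - e^(-k * n / m))^k
// Hypothetical helper; not part of the GQE API.
double expectedFalsePositiveRate(std::uint64_t num_bits,
                                 std::uint64_t num_unique_keys,
                                 unsigned num_hashes) {
    const double exponent =
        -static_cast<double>(num_hashes) * num_unique_keys / num_bits;
    return std::pow(1.0 - std::exp(exponent), num_hashes);
}
```

For example, one million unique keys at a 1% target give roughly 9.6 million bits (about 1.2 MB). A value computed this way could then be passed to setBloomfilterSize, after checking it against the argument type and any limits documented for KernelCommand.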