hashGroupAggregate - 2024.1 English

Vitis Libraries

Release Date
2024-08-06
Version
2024.1 English
#include "xf_database/hash_group_aggregate.hpp"
template <
    int _WKey,
    int _KeyNM,
    int _WPay,
    int _PayNM,
    int _HashMode,
    int _WHashHigh,
    int _WHashLow,
    int _CHNM,
    int _Wcnt,
    int _WBuffer,
    int _BurstLenW = 32,
    int _BurstLenR = 32
    >
void hashGroupAggregate (
    hls::stream <ap_uint <_WKey>> strm_key_in [_CHNM][_KeyNM],
    hls::stream <ap_uint <_WPay>> strm_pld_in [_CHNM][_PayNM],
    hls::stream <bool> strm_e_in [_CHNM],
    hls::stream <ap_uint <32>>& config,
    hls::stream <ap_uint <32>>& result_info,
    ap_uint <_WBuffer>* ping_buf0,
    ap_uint <_WBuffer>* ping_buf1,
    ap_uint <_WBuffer>* ping_buf2,
    ap_uint <_WBuffer>* ping_buf3,
    ap_uint <_WBuffer>* pong_buf0,
    ap_uint <_WBuffer>* pong_buf1,
    ap_uint <_WBuffer>* pong_buf2,
    ap_uint <_WBuffer>* pong_buf3,
    hls::stream <ap_uint <_WKey>> aggr_key_out [_KeyNM],
    hls::stream <ap_uint <_WPay>> aggr_pld_out [3][_PayNM],
    hls::stream <bool>& strm_e_out
    )

Generic hash group aggregate primitive.

With this primitive, the maximum number of rows in the aggregate table is bounded by the AXI buffer size.

The group aggregation values are updated inside the chip, and when a hash-bucket overflows, the overflowed rows are spilled into external buffers. The overflow buffer will be automatically re-scanned, and within each round, a number of distinct groups will be aggregated and emitted. This algorithm ends when the overflow buffer is empty and all groups are aggregated.
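The overflow-and-rescan loop described above can be modelled in plain software. The following is a minimal sketch under stated assumptions, not the HLS implementation: `Row`, `Group` and `multiPassSum` are hypothetical names, the aggregate is a simple SUM, and the on-chip table is modelled as a single table with a fixed number of groups rather than per-bucket overflow across several PUs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical software model of multi-pass group aggregation.
struct Row { uint32_t key; uint32_t pay; };
struct Group { uint32_t key; uint64_t sum; };

// Computes SUM(pay) GROUP BY key with a table that holds at most `capacity`
// distinct groups per pass. Rows whose group does not fit are spilled to an
// overflow buffer, which becomes the input of the next pass.
std::vector<Group> multiPassSum(std::vector<Row> input, std::size_t capacity,
                                int* passes = nullptr) {
    assert(capacity > 0);
    std::vector<Group> out;
    std::vector<Row> overflow;
    int rounds = 0;
    while (!input.empty()) {
        std::unordered_map<uint32_t, uint64_t> table;  // models the on-chip table
        std::vector<uint32_t> order;                   // first-seen order of groups
        overflow.clear();
        for (const Row& r : input) {
            auto it = table.find(r.key);
            if (it != table.end()) {
                it->second += r.pay;            // group already resident: update
            } else if (table.size() < capacity) {
                table.emplace(r.key, r.pay);    // room left: admit a new group
                order.push_back(r.key);
            } else {
                overflow.push_back(r);          // table full: spill the row
            }
        }
        for (uint32_t k : order) out.push_back({k, table[k]});  // emit this round
        input.swap(overflow);  // ping-pong: spilled rows are re-scanned next
        ++rounds;
    }
    if (passes) *passes = rounds;
    return out;
}
```

In this model a group is either fully aggregated in the pass where it is admitted or fully deferred, so each group is emitted exactly once and the loop ends when nothing is spilled.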

Attention

  1. This module can accept multiple input rows of key and payload pairs per cycle.
  2. The maximum number of distinct groups aggregated in one pass is 2 ^ (1 + _WHash).
  3. When the width of the input stream is not fully used, data should be aligned to the little end.
  4. It is highly recommended to assign the ping buffer and pong buffer to different HBM banks, and the input and output to different DDR banks, for better performance.
  5. The maximum number of rows in the aggregate table cannot exceed the maximum DDR/HBM size used in this design.
  6. When the bit-width of the group key is known to be small, say 10-bit, please consider the directAggregate primitive, which offers lower resource utilization and requires no external buffer access.
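As an illustration of the alignment note, the sketch below places a narrow value in a 32-bit stream word. It assumes that "little end" means the least significant bits of the word and that unused upper bits are zero; `alignToLittleEnd` is a hypothetical helper, not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keeps the low `valueBits` bits of `value` in the least
// significant bits of a 32-bit word and zeroes the unused upper bits.
// Assumption: "little end" in the note above refers to the LSB side.
uint32_t alignToLittleEnd(uint32_t value, unsigned valueBits) {
    assert(valueBits >= 1 && valueBits <= 32);
    const uint32_t mask =
        (valueBits == 32) ? 0xFFFFFFFFu : ((1u << valueBits) - 1u);
    return value & mask;
}
```

For a 10-bit key carried on a 32-bit stream, this leaves the key in bits 9..0 and clears bits 31..10.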

Parameters:

_WKey width of key, in bits.
_KeyNM maximum number of key columns; at most 8.
_WPay maximum payload width, in bits.
_PayNM maximum number of payload columns; at most 8.
_HashMode selects the hash algorithm, 0: radix, 1: lookup3.
_WHashHigh number of hash bits used for PU dispatch.
_WHashLow number of hash bits used for the hash table.
_CHNM number of input channels.
_Wcnt width of ‘number of keys’ per hash value, in bits.
_WBuffer width of the HBM/DDR buffers (ping_buf and pong_buf).
_BurstLenW burst length for writing unhandled data.
_BurstLenR burst length for reloading unhandled data.
strm_key_in input of key streams.
strm_pld_in input of payload streams.
strm_e_in input of end signal.
config information for initializing the primitive: the op for up to 8 columns, the key column number (less than 8), the pld column number (less than 8), and the initial aggregate cnt.
result_info result information at kernel end: op, key_column, pld_column, and aggregate result cnt.
ping_buf0 DDR/HBM ping buffer for unhandled data.
ping_buf1 DDR/HBM ping buffer for unhandled data.
ping_buf2 DDR/HBM ping buffer for unhandled data.
ping_buf3 DDR/HBM ping buffer for unhandled data.
pong_buf0 DDR/HBM pong buffer for unhandled data.
pong_buf1 DDR/HBM pong buffer for unhandled data.
pong_buf2 DDR/HBM pong buffer for unhandled data.
pong_buf3 DDR/HBM pong buffer for unhandled data.
aggr_key_out output of key columns.
aggr_pld_out output of pld columns. [0][*] is the result of min/max/cnt for pld columns, [1][*] is the low-bit value of sum/average, and [2][*] is the high-bit value of sum/average.
strm_e_out end signal of output.
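Two of the descriptions above lend themselves to a small host-side sketch: splitting a hash value into its _WHashHigh and _WHashLow fields, and rebuilding a full sum from the low and high parts in aggr_pld_out[1][*] and aggr_pld_out[2][*]. Both helpers are hypothetical and rest on assumptions: the dispatch bits sit directly above the hash-table bits, and the full sum is the high part concatenated above the low part, each _WPay bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of splitting one hash value.
struct HashSplit { uint32_t pu; uint32_t slot; };

// Assumed layout: the low `wLow` bits index the hash table and the next
// `wHigh` bits select the PU.
HashSplit splitHash(uint32_t hash, unsigned wHigh, unsigned wLow) {
    assert(wLow >= 1 && wHigh + wLow <= 31);
    const uint32_t slot = hash & ((1u << wLow) - 1u);
    const uint32_t pu = (hash >> wLow) & ((1u << wHigh) - 1u);
    return {pu, slot};
}

// Assumed layout: full sum = (high << wPay) | low, with `low` already
// confined to `wPay` bits.
uint64_t joinSum(uint32_t low, uint32_t high, unsigned wPay) {
    assert(wPay >= 1 && wPay <= 32);
    assert(wPay == 32 || (low >> wPay) == 0);
    return (static_cast<uint64_t>(high) << wPay) | low;
}
```

With wHigh = 2 and wLow = 5, for example, a hash value of 90 maps to PU 2 and table slot 26 in this model.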