#include "xf_database/hash_group_aggregate.hpp"
template <
    int _WKey,
    int _KeyNM,
    int _WPay,
    int _PayNM,
    int _HashMode,
    int _WHashHigh,
    int _WHashLow,
    int _CHNM,
    int _Wcnt,
    int _WBuffer,
    int _BurstLenW = 32,
    int _BurstLenR = 32
    >
void hashGroupAggregate (
    hls::stream <ap_uint <_WKey>> strm_key_in [_CHNM][_KeyNM],
    hls::stream <ap_uint <_WPay>> strm_pld_in [_CHNM][_PayNM],
    hls::stream <bool> strm_e_in [_CHNM],
    hls::stream <ap_uint <32>>& config,
    hls::stream <ap_uint <32>>& result_info,
    ap_uint <_WBuffer>* ping_buf0,
    ap_uint <_WBuffer>* ping_buf1,
    ap_uint <_WBuffer>* ping_buf2,
    ap_uint <_WBuffer>* ping_buf3,
    ap_uint <_WBuffer>* pong_buf0,
    ap_uint <_WBuffer>* pong_buf1,
    ap_uint <_WBuffer>* pong_buf2,
    ap_uint <_WBuffer>* pong_buf3,
    hls::stream <ap_uint <_WKey>> aggr_key_out [_KeyNM],
    hls::stream <ap_uint <_WPay>> aggr_pld_out [3][_PayNM],
    hls::stream <bool>& strm_e_out
    )
Generic hash group aggregate primitive.
With this primitive, the maximum number of lines in the aggregate table is bounded by the AXI buffer size.
The group aggregation values are updated inside the chip, and when a hash-bucket overflows, the overflowed rows are spilled into external buffers. The overflow buffer will be automatically re-scanned, and within each round, a number of distinct groups will be aggregated and emitted. This algorithm ends when the overflow buffer is empty and all groups are aggregated.
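The spill-and-rescan loop described above can be modelled in plain software. The following is a minimal sketch, not the primitive's implementation: it assumes a single SUM aggregate, uses an ordinary `std::map` in place of the on-chip hash table, and treats `capacity` (a hypothetical parameter) as the number of distinct groups that fit on chip in one pass. The names `Row`, `AggregateResult` and `multiPassAggregate` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One input row: a group key and a payload to be summed.
struct Row {
    std::uint32_t key;
    std::uint64_t payload;
};

// Aggregated sums per key, plus the number of passes that were needed.
struct AggregateResult {
    std::map<std::uint32_t, std::uint64_t> sums;
    std::size_t passes;
};

// Software model of the multi-pass scheme. `capacity` stands in for the
// number of distinct groups the on-chip table can hold in one pass
// (hypothetical; the real limit depends on the hash widths).
AggregateResult multiPassAggregate(const std::vector<Row>& input,
                                   std::size_t capacity) {
    assert(capacity > 0);
    AggregateResult result{{}, 0};
    std::vector<Row> pending = input;  // rows still waiting to be aggregated

    while (!pending.empty()) {
        std::map<std::uint32_t, std::uint64_t> table;  // "on-chip" table
        std::vector<Row> overflow;                     // "external" spill buffer

        for (const Row& r : pending) {
            auto it = table.find(r.key);
            if (it != table.end()) {
                it->second += r.payload;          // group is resident: update it
            } else if (table.size() < capacity) {
                table.emplace(r.key, r.payload);  // room for a new group
            } else {
                overflow.push_back(r);            // table full: spill the row
            }
        }

        // Emit the groups that were fully aggregated in this pass.
        for (const auto& kv : table) {
            result.sums[kv.first] = kv.second;
        }
        pending = std::move(overflow);            // re-scan the spilled rows
        ++result.passes;
    }
    return result;
}
```

Because every row of a resident group is absorbed in the pass where that group first enters the table, each group is emitted exactly once, and the loop stops as soon as a pass produces no spilled rows.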
Attention
- This module can accept multiple input rows of key and payload pairs per cycle.
- The max distinct groups aggregated in one pass is 2 ^ (1 + _WHash).
- When the width of the input stream is not fully used, data should be aligned to the little-end.
- It is highly recommended to assign the ping buffer and pong buffer to different HBM banks, and the input and output to different DDR banks, for better performance.
- The max number of lines of the aggregate table cannot exceed the max DDR/HBM size used in this design.
- When the bit-width of group key is known to be small, say 10-bit, please consider the directAggregate primitive, which offers smaller utilization, and requires no external buffer access.
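Two of the notes above reduce to simple bit arithmetic, sketched here in plain C++. This is only an illustration under stated assumptions: `hashWidth` stands in for the hash width used in the `2 ^ (1 + _WHash)` formula, and "aligned to the little-end" is read as "the used bits occupy the low-order end of the stream word, with the unused high-order bits zero". Both helper names are hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// Distinct groups per pass according to the 2 ^ (1 + W) formula.
// `hashWidth` is a stand-in for the hash width W in that formula.
std::uint64_t maxGroupsPerPass(unsigned hashWidth) {
    assert(hashWidth < 63);  // keep the shift within a 64-bit word
    return std::uint64_t{1} << (1 + hashWidth);
}

// Place a narrow value in the low-order bits of a wider stream word and
// clear the unused high-order bits (assumed meaning of "little-end" alignment).
std::uint64_t alignToLowBits(std::uint64_t value, unsigned usedBits) {
    assert(usedBits >= 1 && usedBits <= 64);
    const std::uint64_t mask =
        (usedBits == 64) ? ~std::uint64_t{0}
                         : ((std::uint64_t{1} << usedBits) - 1);
    return value & mask;
}
```

For example, with a 10-bit hash width the formula gives 2048 groups per pass, and a 10-bit key written into a wider key stream would occupy bits 9 down to 0 of the word.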
Parameters:
_WKey | width of key, in bit. |
_KeyNM | maximum number of key columns; at most 8. |
_WPay | width of max payload, in bit. |
_PayNM | maximum number of payload columns; at most 8. |
_HashMode | controls the hash algorithm, 0: radix 1: lookup3. |
_WHashHigh | number of hash bits used for PU dispatch. |
_WHashLow | number of hash bits used for hash-table. |
_CHNM | number of input channels. |
_Wcnt | width of ‘number of keys’ per hash value, in bits. |
_WBuffer | width of HBM/DDR buffer (ping_buf and pong_buf). |
_BurstLenW | burst length for writing unhandled data. |
_BurstLenR | burst length for reloading unhandled data. |
strm_key_in | input of key streams. |
strm_pld_in | input of payload streams. |
strm_e_in | input of end signal. |
config | information for initializing the primitive; contains op for a maximum of 8 columns, key column number (less than 8), pld column number (less than 8) and initial aggregate cnt. |
result_info | result information at kernel end; contains op, key_column, pld_column and aggregate result cnt. |
ping_buf0 | DDR/HBM ping buffer for unhandled data. |
ping_buf1 | DDR/HBM ping buffer for unhandled data. |
ping_buf2 | DDR/HBM ping buffer for unhandled data. |
ping_buf3 | DDR/HBM ping buffer for unhandled data. |
pong_buf0 | DDR/HBM pong buffer for unhandled data. |
pong_buf1 | DDR/HBM pong buffer for unhandled data. |
pong_buf2 | DDR/HBM pong buffer for unhandled data. |
pong_buf3 | DDR/HBM pong buffer for unhandled data. |
aggr_key_out | output of key columns. |
aggr_pld_out | output of pld columns. [0][*] is the result of min/max/cnt for pld columns, [1][*] is the low-bit value of sum/average, [2][*] is the high-bit value of sum/average. |
strm_e_out | end signal of output. |
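Since `aggr_pld_out` delivers a sum as a low part in `[1][*]` and a high part in `[2][*]`, a consumer has to recombine the two words. The sketch below shows one way to do that on the host side. It assumes `_WPay` is 32 and that the full value is the plain concatenation of the high word over the low word; neither assumption is stated in this page, and `combineSum` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Rebuild a 64-bit sum from the two 32-bit halves read from
// aggr_pld_out[1][c] (low) and aggr_pld_out[2][c] (high).
// Assumes _WPay == 32 and a simple high:low concatenation.
std::uint64_t combineSum(std::uint32_t low, std::uint32_t high) {
    return (static_cast<std::uint64_t>(high) << 32) |
           static_cast<std::uint64_t>(low);
}
```

For other payload widths the shift amount would follow `_WPay`, and the result type would need to be at least twice as wide as one payload word.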