The initial version of the accelerated application follows the structure of the original software version. The entire input buffer is transferred from the host to the FPGA in a single transaction. Then, the FPGA accelerator performs the computation. Finally, the results are read back from the FPGA to the host before being post-processed.
The following figure shows the sequential process implemented in this first step: the host writes data to the device, the accelerator on the FPGA performs the computation, and the flags are read back to the host. The profile score is calculated sequentially on the CPU after the host has received all the flags.
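The ordering of these steps can be sketched in plain C++. This is only an illustration of the sequential flow, not the tutorial's host code: `DeviceBuffers`, `write_to_device`, `run_accelerator`, `read_from_device`, and `compute_score` are hypothetical stand-ins, the "accelerator" simply flags even words as a placeholder, and the score is assumed to be a plain count of set flags.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for buffers that live in device (FPGA) memory.
struct DeviceBuffers {
    std::vector<uint32_t> words;
    std::vector<uint8_t> flags;
};

// Step 1: transfer the entire input buffer in a single transaction.
void write_to_device(const std::vector<uint32_t>& host_words, DeviceBuffers& dev) {
    dev.words = host_words;
}

// Step 2: compute on the device. Placeholder logic only (flags even words);
// the real accelerator hashes each word and tests the Bloom filter.
void run_accelerator(DeviceBuffers& dev) {
    dev.flags.assign(dev.words.size(), 0);
    for (std::size_t i = 0; i < dev.words.size(); ++i) {
        dev.flags[i] = (dev.words[i] % 2 == 0) ? 1 : 0;
    }
}

// Step 3: read all flags back to the host.
void read_from_device(const DeviceBuffers& dev, std::vector<uint8_t>& host_flags) {
    host_flags = dev.flags;
}

// Step 4: post-process on the CPU. Assumed scoring: count of set flags.
unsigned compute_score(const std::vector<uint8_t>& host_flags) {
    unsigned score = 0;
    for (uint8_t f : host_flags) {
        if (f != 0) ++score;
    }
    return score;
}

// Each step starts only after the previous one has finished, so no
// transfer overlaps with compute or with CPU post-processing.
unsigned run_sequential(const std::vector<uint32_t>& host_words) {
    assert(!host_words.empty());
    DeviceBuffers dev;
    std::vector<uint8_t> host_flags;
    write_to_device(host_words, dev);
    run_accelerator(dev);
    read_from_device(dev, host_flags);
    return compute_score(host_flags);
}
```

Because every stage waits for the previous one, the total run time in this arrangement is the sum of the transfer, compute, and post-processing times.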
The FPGA accelerator computes the hash values and flags for the provided input words.
The functionality of the different inputs passed to the accelerator kernel is as follows:
input_doc_words: Input array that contains the 32-bit words for all the documents.
bloom_filter: Bloom filter array that contains the inserted search array hash values.
total_size: Unsigned int that represents the total size processed by the FPGA when called.
load_weights: Boolean that allows the bloom_filter array to load only once to the FPGA in the case of multiple kernel invocations.
The output of the accelerator is as follows:
output_inh_flags: Output array of 8-bit values, where each bit indicates whether a word is present in the Bloom filter. The CPU then uses these flags to compute the score.
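To make the role of each argument concrete, the following is a minimal software model of a kernel with these inputs and this output. It is a sketch under stated assumptions, not the accelerator's actual implementation: `toy_hash`, `kBloomBits`, `insert_word`, and `KernelModel` are hypothetical, the filter is assumed to be a bit array packed into 32-bit words, two hash probes are assumed per word, and one flag byte is written per input word (the real flag packing may differ).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed Bloom filter size in bits (hypothetical value).
constexpr uint32_t kBloomBits = 1u << 14;

// Hypothetical stand-in hash; the accelerator's real hash is not shown here.
static uint32_t toy_hash(uint32_t word, uint32_t seed) {
    uint32_t h = word ^ seed;
    h *= 0x9E3779B1u;
    h ^= h >> 15;
    return h % kBloomBits;
}

static bool test_bit(const std::vector<uint32_t>& filter, uint32_t bit) {
    return ((filter[bit / 32] >> (bit % 32)) & 1u) != 0;
}

// Helper used to build a filter on the host side (hypothetical).
void insert_word(std::vector<uint32_t>& filter, uint32_t word) {
    for (uint32_t seed = 1; seed <= 2; ++seed) {
        uint32_t bit = toy_hash(word, seed);
        filter[bit / 32] |= (1u << (bit % 32));
    }
}

struct KernelModel {
    // Models the on-chip copy of bloom_filter that persists across calls.
    std::vector<uint32_t> local_filter;

    void run(const std::vector<uint32_t>& input_doc_words,
             const std::vector<uint32_t>& bloom_filter,
             std::vector<uint8_t>& output_inh_flags,
             unsigned int total_size,
             bool load_weights) {
        // Reload the filter only when requested; later calls reuse the copy.
        if (load_weights) {
            local_filter = bloom_filter;
        }
        assert(local_filter.size() == kBloomBits / 32);
        assert(total_size <= input_doc_words.size());

        output_inh_flags.assign(total_size, 0);
        for (unsigned int i = 0; i < total_size; ++i) {
            uint32_t w = input_doc_words[i];
            bool hit = test_bit(local_filter, toy_hash(w, 1)) &&
                       test_bit(local_filter, toy_hash(w, 2));
            output_inh_flags[i] = hit ? 1 : 0;
        }
    }
};
```

In this model, a first call with `load_weights` set to true copies the filter, and later calls can pass false to skip that copy, which mirrors the purpose of the flag across multiple kernel invocations. As with any Bloom filter, a set flag means the word is possibly present, since false positives can occur.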