In the previous step, you implemented a sequential flow: the host writes the words to the FPGA, the FPGA computes the hash functions, and the host reads back the flags.
Compute does not start until the entire input has been transferred to the FPGA, and the host does not start reading results back until the FPGA has finished computing.
In this lab, you will work with the following:
Overlap of host data transfer and compute on the FPGA with split buffers (two buffers)
Split the documents and send them to the FPGA in two iterations.
The kernel can start the compute as soon as the data for the corresponding iteration is transferred to the FPGA.
Overlap of host data transfer and compute with multiple buffers
Explore how application performance changes when the documents are split into 2, 4, 8, 16, 32, 64, and 128 chunks.
Overlap of data transfer from the host, compute on the FPGA, and profile scoring on the CPU
This enables the host to start computing profile scores as soon as the flags for a chunk are received.
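To build some intuition for why splitting the input helps, and why very fine splitting may stop helping, the following is a rough timing model rather than actual host code. It treats data transfer, FPGA compute, and CPU profile scoring as three pipeline stages and estimates the total time for a given number of chunks. The function name `pipeline_time`, the stage timings, and the fixed per-chunk overhead are all hypothetical values chosen for illustration; they are not measurements from this lab.

```python
def pipeline_time(stage_ms, chunks, per_chunk_overhead_ms=0.0):
    """Estimate the total time of a linear pipeline over equally sized chunks.

    stage_ms: time each stage needs for the whole input, in order
              (for example: host-to-FPGA transfer, FPGA compute, CPU scoring).
    per_chunk_overhead_ms: assumed fixed cost added to every stage for every
              chunk (for example: call and synchronization overhead).
    A chunk enters a stage only after it has left the previous stage and the
    stage has finished the previous chunk.
    """
    per_chunk = [t / chunks + per_chunk_overhead_ms for t in stage_ms]
    finish = [0.0] * len(stage_ms)  # when each stage finished its latest chunk
    for _ in range(chunks):
        ready = 0.0  # when the current chunk left the previous stage
        for s, cost in enumerate(per_chunk):
            finish[s] = max(finish[s], ready) + cost
            ready = finish[s]
    return finish[-1]


# Hypothetical whole-input timings in milliseconds:
# host-to-FPGA transfer, FPGA compute, CPU profile scoring.
stages = [40.0, 40.0, 4.0]

for n in (1, 2, 4, 8, 16, 32, 64, 128):
    total = pipeline_time(stages, n, per_chunk_overhead_ms=0.1)
    print(f"{n:3d} chunks: {total:.2f} ms")
```

With one chunk the model reduces to the sequential flow from the previous step, since every stage waits for the one before it. As the chunk count grows, the stages overlap more, but the assumed per-chunk overhead is paid more often, so the estimate improves at first and then degrades. Where that turning point falls on real hardware depends on the actual transfer and kernel timings, which is what this lab measures.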