The theoretical throughput depends on several factors. A single-precision (float) kernel will be faster than a double-precision kernel. A larger MAX_N will provide more accurate results but will decrease throughput. The kernel has been pipelined to increase throughput when a large number of inputs is processed.
The end-to-end processing time is made up of three stages: transferring data to the FPGA, running the computations, and transferring the results back from the FPGA. The demo contains options to measure these timings, as described in the README.md file.
As an example, processing a batch of 2048 call calculations with a floating point kernel with MAX_N = 100 breaks down as follows:
Time to transfer data = 0.207ms
Time for 2048 calculations = 0.969ms (equates to ~0.47µs per calculation)
Time to transfer results = 0.078ms
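
To make the arithmetic behind these figures explicit, the following is a minimal Python sketch that combines the three timings above into a total batch time and a throughput figure. The function name `throughput_breakdown` and its parameter names are illustrative assumptions and are not part of the demo; the sketch also assumes the three stages run back to back with no overlap.

```python
def throughput_breakdown(n_calcs, transfer_in_ms, compute_ms, transfer_out_ms):
    """Combine per-stage timings (in ms) into totals and throughput figures.

    Hypothetical helper for illustration only; assumes the three stages
    run sequentially, so their times simply add up.
    """
    total_ms = transfer_in_ms + compute_ms + transfer_out_ms
    return {
        # Total wall-clock time for the whole batch, in milliseconds.
        "total_ms": total_ms,
        # Compute-only time per calculation, in microseconds.
        "kernel_us_per_calc": compute_ms * 1000.0 / n_calcs,
        # Time per calculation including both transfers, in microseconds.
        "end_to_end_us_per_calc": total_ms * 1000.0 / n_calcs,
        # Calculations per second, compute stage only.
        "kernel_calcs_per_s": n_calcs / (compute_ms / 1000.0),
        # Calculations per second including both transfers.
        "end_to_end_calcs_per_s": n_calcs / (total_ms / 1000.0),
    }


# Figures from the example above: 2048 calculations, float kernel, MAX_N = 100.
result = throughput_breakdown(2048, 0.207, 0.969, 0.078)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With the figures above, this gives a total of about 1.254ms for the batch, which corresponds to roughly 2.11 million calculations per second for the compute stage alone and roughly 1.63 million calculations per second once both transfers are included.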