The theoretical throughput depends on several factors. A single-precision floating point (float) kernel will be faster than a double-precision kernel. A smaller dw and a larger Wmax give more accurate results but reduce throughput. The kernel is pipelined to increase throughput when a large number of inputs need to be processed.
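To make the accuracy/throughput trade-off concrete, here is a minimal sketch. It assumes, which the text implies but does not state, that dw is the step size and Wmax the upper limit of a numerical integration, so that the work per calculation grows roughly with Wmax / dw. The function name `integration_steps` is hypothetical and is not part of the demo.

```python
import math


def integration_steps(dw: float, w_max: float) -> int:
    """Number of steps of width dw needed to cover [0, w_max].

    Assumption: dw is an integration step size and w_max the upper
    integration limit; the kernel's real loop structure may differ.
    """
    if dw <= 0 or w_max <= 0:
        raise ValueError("dw and w_max must be positive")
    return math.ceil(w_max / dw)


baseline = integration_steps(dw=0.5, w_max=200.0)   # values used in this section's example
finer = integration_steps(dw=0.25, w_max=200.0)     # halve the step size
wider = integration_steps(dw=0.5, w_max=400.0)      # double the upper limit
print(baseline, finer, wider)  # prints: 400 800 800
```

Under this assumption, halving dw or doubling Wmax each doubles the number of steps, so either change would be expected to roughly halve throughput.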
The overall processing time is made up of three stages: transferring data to the FPGA, running the computations, and transferring the results back from the FPGA. The demo contains options to measure these timings, as described in the README.md file.
As an example, processing a batch of 1000 call calculations with a floating point kernel with dw = 0.5 and Wmax = 200 breaks down as follows:
Time to transfer data = 0.26 ms
Time for 1000 calculations = 14.4 ms (equates to 14.4 µs per calculation)
Time to transfer results = 0.18 ms
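The figures above can be combined into an end-to-end throughput estimate. The following is a small sketch of that arithmetic only; the function `summarize_batch` and its field names are illustrative and do not come from the demo, and it assumes the three stages run one after another with no overlap.

```python
def summarize_batch(n_calcs, transfer_in_ms, compute_ms, transfer_out_ms):
    """Combine per-stage timings (in ms) into summary figures.

    Assumption: the three stages are sequential, so the total time
    is simply their sum.
    """
    total_ms = transfer_in_ms + compute_ms + transfer_out_ms
    return {
        "total_ms": total_ms,
        # compute time per calculation, in microseconds
        "per_calc_us": compute_ms * 1000.0 / n_calcs,
        # calculations per second including both transfers
        "end_to_end_per_s": n_calcs / (total_ms / 1000.0),
        # fraction of the total time spent moving data
        "transfer_share": (transfer_in_ms + transfer_out_ms) / total_ms,
    }


# Timings quoted above for a batch of 1000 calculations.
summary = summarize_batch(1000, 0.26, 14.4, 0.18)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

With the quoted timings, the total is about 14.84 ms, which is roughly 67,400 calculations per second end to end. The two transfers account for about 3% of the total, so for this batch size the computation dominates.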