Total processing time has two components: transferring data to and from the FPGA, and running the computations. The demo provides options for measuring both timings, as described in the README.md file.
As an example, processing a batch of 19683 call calculations with a floating-point kernel breaks down as follows:
Total time (memory transfer time plus calculation time) = 1175us
Calculation time (kernel execution time) = 405us
Memory transfer time = 770us
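Throughput = 48.5663 Mega options/sec (this matches the calculation time alone, 19683 options / 405us; memory transfer time is not included)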
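
The figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the demo; the function and variable names are hypothetical. It assumes throughput is simply the number of options divided by elapsed time, and it also derives an end-to-end rate that includes memory transfers, which the figures above do not report. The small gap between the computed kernel-only rate and the reported 48.5663 is presumably because the displayed calculation time is rounded to whole microseconds.

```python
def throughput_mops(n_options, time_us):
    """Return throughput in mega-options per second.

    Options per microsecond is numerically equal to
    mega-options per second, so no unit conversion is needed.
    """
    return n_options / time_us


# Values taken from the example breakdown above.
n_options = 19683
total_us = 1175    # memory transfer time plus calculation time
kernel_us = 405    # kernel execution time only

# Memory transfer time is whatever remains of the total.
transfer_us = total_us - kernel_us

# Kernel-only rate (close to the reported throughput figure).
kernel_rate = throughput_mops(n_options, kernel_us)

# End-to-end rate including transfers (not reported above; derived here).
end_to_end_rate = throughput_mops(n_options, total_us)

print(f"Memory transfer time  = {transfer_us}us")
print(f"Kernel-only rate      = {kernel_rate:.4f} Mega options/sec")
print(f"End-to-end rate       = {end_to_end_rate:.4f} Mega options/sec")
```

In this example the memory transfers account for roughly two thirds of the total time, so the end-to-end rate is about a third of the kernel-only rate.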