The hardware resources are listed in Table 160. The arithmetic and geometric Asian engines require similar amounts of resources.
Engines | BRAM | DSP | Register | LUT | Latency | Clock period (ns) |
McAsianArithmeticAPEngine | 12 | 59 | 24664 | 26826 | 53276 | 3.423 |
McAsianArithmeticASEngine | 12 | 65 | 26683 | 29362 | 53196 | 3.423 |
McAsianGeometricAPEngine | 10 | 61 | 24626 | 26657 | 53222 | 3.423 |
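As a reading aid for the clock period column, the short sketch below converts the reported period into a clock frequency. The helper name `period_ns_to_mhz` is hypothetical and not part of the library; the only input taken from the table is the 3.423 ns period.

```python
# Clock period reported for all three engines in Table 160 (nanoseconds).
clock_period_ns = 3.423


def period_ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz.

    Hypothetical helper for illustration: f [MHz] = 1000 / T [ns].
    """
    return 1000.0 / period_ns


freq_mhz = period_ns_to_mhz(clock_period_ns)
print(f"Clock frequency: {freq_mhz:.1f} MHz")
```

This only restates the table value in different units; it says nothing about the frequency at which the kernel was actually run.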
Table 161 shows the performance improvement over the CPU-based QuantLib result (tolerance = 0.02).
Engines | McAsianAPEngine | McAsianASEngine | McAsianGPEngine |
SampNum | 25951 | 33642 | 46805 |
CPU result | 1.98441 | 3.10669 | 3.28924 |
CPU Execution time (us) | 224911 | 310856 | 601068 |
FPGA result | 1.89144 | 3.0866 | 3.26228 |
FPGA Kernel Execution time (us) | 563.05 | 734.42 | 830.130 |
FPGA SampNum | 28672 | 34816 | 49152 |
FPGA E2E Execution time (ms) | 1 | 1 | 1 |
Number of MCM | 2 | 2 | 4 |
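The improvement can be made explicit by dividing the CPU execution time by the FPGA kernel execution time for each engine. The sketch below does that with the values copied from Table 161; the dictionary and function names are illustrative only. It assumes the kernel time is the relevant FPGA figure, so it ignores the end-to-end time, and the CPU and FPGA runs use slightly different sample counts, so the ratios are indicative rather than strictly like-for-like.

```python
# Execution times copied from Table 161, in microseconds.
cpu_time_us = {
    "McAsianAPEngine": 224911,
    "McAsianASEngine": 310856,
    "McAsianGPEngine": 601068,
}
fpga_kernel_time_us = {
    "McAsianAPEngine": 563.05,
    "McAsianASEngine": 734.42,
    "McAsianGPEngine": 830.130,
}


def speedup(cpu_us, fpga_us):
    """Return how many times faster the FPGA kernel is than the CPU run."""
    return cpu_us / fpga_us


# Kernel-only speedup per engine (assumption: end-to-end overhead excluded).
speedups = {
    name: speedup(cpu_time_us[name], fpga_kernel_time_us[name])
    for name in cpu_time_us
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

Using the end-to-end time (reported as 1 ms for every engine) instead of the kernel time would give smaller ratios, since it includes overhead outside the kernel.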
Table 162 shows the maximum performance of the McAsian engines in one SLR of the Xilinx xcu250-figd2104-2L-e (Vivado report).
Engines | BRAM | DSP | Register | LUT | Latency | | Max Unroll Num |
McAsianArithmeticAPEngine | 192 | 749 | 246325 | 205626 | 54056 | 300 | 16 |
McAsianArithmeticASEngine | 192 | 755 | 247524 | 207476 | 54109 | 300 | 16 |
McAsianGeometricAPEngine | 160 | 781 | 242927 | 200106 | 54002 | 300 | 16 |