Profiling - 2023.2 English

Vitis Libraries

The hardware resources are listed in Table 160. The arithmetic and geometric Asian engines demand a similar amount of resources.

Table 160 Hardware resources for single MCU

| Engines | BRAM | DSP | Register | LUT | Latency | Clock period (ns) |
|---|---|---|---|---|---|---|
| McAsianArithmeticAPEngine | 12 | 59 | 24664 | 26826 | 53276 | 3.423 |
| McAsianArithmeticASEngine | 12 | 65 | 26683 | 29362 | 53196 | 3.423 |
| McAsianGeometricAPEngine | 10 | 61 | 24626 | 26657 | 53222 | 3.423 |

Table 161 compares the FPGA performance with the CPU-based QuantLib results (tolerance = 0.02).

Table 161 Comparison between CPU and FPGA

| Engines | McAsianAPEngine | McAsianASEngine | McAsianGPEngine |
|---|---|---|---|
| SampNum | 25951 | 33642 | 46805 |
| CPU result | 1.98441 | 3.10669 | 3.28924 |
| CPU Execution time (us) | 224911 | 310856 | 601068 |
| FPGA result | 1.89144 | 3.0866 | 3.26228 |
| FPGA Kernel Execution time (us) | 563.05 | 734.42 | 830.130 |
| FPGA SampNum | 28672 | 34816 | 49152 |
| FPGA E2E Execution time (ms) | 1 | 1 | 1 |
| Number of MCM | 2 | 2 | 4 |
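The speedup implied by Table 161 can be worked out by dividing the CPU execution time by the FPGA kernel execution time. The following Python sketch does only that arithmetic; the dictionary and function names are illustrative and not part of the library, and the figures are copied from the table. Note that the CPU and FPGA runs use slightly different sample counts, so the ratios are indicative rather than exact like-for-like comparisons.

```python
# Execution times copied from Table 161, in microseconds.
cpu_time_us = {
    "McAsianAPEngine": 224911,
    "McAsianASEngine": 310856,
    "McAsianGPEngine": 601068,
}
fpga_kernel_time_us = {
    "McAsianAPEngine": 563.05,
    "McAsianASEngine": 734.42,
    "McAsianGPEngine": 830.130,
}


def speedup(cpu_us, fpga_us):
    """Return how many times faster the FPGA kernel is than the CPU run."""
    return cpu_us / fpga_us


# Kernel-only speedup per engine (hypothetical helper names, table data).
speedups = {
    name: speedup(cpu_time_us[name], fpga_kernel_time_us[name])
    for name in cpu_time_us
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

On these figures the kernel-only speedup is roughly 400x, 420x, and 720x for the three engines. Using the 1 ms end-to-end time instead of the kernel time gives smaller ratios, since the end-to-end time exceeds the kernel execution time.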

Table 162 shows the maximum performance of the McAsian engines in one SLR of the Xilinx xcu250-figd2104-2L-e (Vivado report).

Table 162 Hardware resources for max performance

| Engines | BRAM | DSP | Register | LUT | Latency | | Max Unroll Num |
|---|---|---|---|---|---|---|---|
| McAsianArithmeticAPEngine | 192 | 749 | 246325 | 205626 | 54056 | 300 | 16 |
| McAsianArithmeticASEngine | 192 | 755 | 247524 | 207476 | 54109 | 300 | 16 |
| McAsianGeometricAPEngine | 160 | 781 | 242927 | 200106 | 54002 | 300 | 16 |
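To see how resource usage grows from the single-MCU configuration (Table 160) to the maximum-unroll configuration (Table 162), the per-resource ratios can be computed directly. This is a minimal sketch that assumes the two tables report the same resource categories for the same engines; the variable names are illustrative and the figures are copied from the tables.

```python
# Resource counts copied from Table 160 (single MCU).
single_mcu = {
    "McAsianArithmeticAPEngine": {"BRAM": 12, "DSP": 59, "Register": 24664, "LUT": 26826},
    "McAsianArithmeticASEngine": {"BRAM": 12, "DSP": 65, "Register": 26683, "LUT": 29362},
    "McAsianGeometricAPEngine": {"BRAM": 10, "DSP": 61, "Register": 24626, "LUT": 26657},
}

# Resource counts copied from Table 162 (Max Unroll Num = 16).
max_unroll = {
    "McAsianArithmeticAPEngine": {"BRAM": 192, "DSP": 749, "Register": 246325, "LUT": 205626},
    "McAsianArithmeticASEngine": {"BRAM": 192, "DSP": 755, "Register": 247524, "LUT": 207476},
    "McAsianGeometricAPEngine": {"BRAM": 160, "DSP": 781, "Register": 242927, "LUT": 200106},
}

# Ratio of max-unroll usage to single-MCU usage, per engine and resource.
ratios = {
    engine: {
        resource: max_unroll[engine][resource] / single_mcu[engine][resource]
        for resource in single_mcu[engine]
    }
    for engine in single_mcu
}

for engine, per_resource in ratios.items():
    summary = ", ".join(f"{res} {r:.2f}x" for res, r in per_resource.items())
    print(f"{engine}: {summary}")
```

On these figures BRAM usage scales by exactly 16x, matching the unroll number, while DSP, register, and LUT usage grow by less than 16x.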