Profiling - 2024.2 English

Vitis Libraries

Release Date
2024-11-29
Version
2024.2 English

The hardware resources are listed in Table 177. The Arithmetic and Geometric Asian Engines require similar amounts of resources.

Table 177 Hardware resources for single MCU
| Engines | BRAM | DSP | Register | LUT | Latency | Clock period (ns) |
|---|---|---|---|---|---|---|
| McAsianArithmeticAPEngine | 12 | 59 | 24664 | 26826 | 53276 | 3.423 |
| McAsianArithmeticASEngine | 12 | 65 | 26683 | 29362 | 53196 | 3.423 |
| McAsianGeometricAPEngine | 10 | 61 | 24626 | 26657 | 53222 | 3.423 |
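To put the clock period and latency columns of Table 177 on a common footing, the following minimal sketch converts the clock period to a frequency and the latency to wall-clock time. It assumes that the Latency column is a cycle count, which the table does not state. The helper names `period_to_mhz` and `latency_us` are hypothetical and not part of the library.

```python
# Values copied from Table 177.
CLOCK_PERIOD_NS = 3.423
LATENCY = {
    "McAsianArithmeticAPEngine": 53276,
    "McAsianArithmeticASEngine": 53196,
    "McAsianGeometricAPEngine": 53222,
}


def period_to_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def latency_us(cycles: int, period_ns: float) -> float:
    """Return latency in microseconds.

    Assumption: the Latency column is a cycle count at the given clock period.
    """
    return cycles * period_ns / 1000.0


freq_mhz = period_to_mhz(CLOCK_PERIOD_NS)
print(f"clock: {freq_mhz:.1f} MHz")
for engine, cycles in LATENCY.items():
    print(f"{engine}: {latency_us(cycles, CLOCK_PERIOD_NS):.1f} us")
```

Under that assumption, the three engines differ by well under one percent in latency, which is consistent with their similar resource usage.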

Table 178 shows the performance improvement in comparison with the CPU-based QuantLib result (tolerance = 0.02).

Table 178 Comparison between CPU and FPGA
| Engines | McAsianAPEngine | McAsianASEngine | McAsianGPEngine |
|---|---|---|---|
| SampNum | 25951 | 33642 | 46805 |
| CPU result | 1.98441 | 3.10669 | 3.28924 |
| CPU Execution time (us) | 224911 | 310856 | 601068 |
| FPGA result | 1.89144 | 3.0866 | 3.26228 |
| FPGA Kernel Execution time (us) | 563.05 | 734.42 | 830.130 |
| FPGA SampNum | 28672 | 34816 | 49152 |
| FPGA E2E Execution time (ms) | 1 | 1 | 1 |
| Number of MCM | 2 | 2 | 4 |
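The improvement in Table 178 can be expressed as a speedup factor. The sketch below divides the CPU execution time by the FPGA kernel execution time for each engine, using only figures from the table. The helper `speedup` is a hypothetical name introduced for illustration. The ratio uses kernel time only, so it does not reflect data transfer or other end-to-end overhead.

```python
# Execution times in microseconds, copied from Table 178.
CPU_TIME_US = {
    "McAsianAPEngine": 224911.0,
    "McAsianASEngine": 310856.0,
    "McAsianGPEngine": 601068.0,
}
FPGA_KERNEL_TIME_US = {
    "McAsianAPEngine": 563.05,
    "McAsianASEngine": 734.42,
    "McAsianGPEngine": 830.130,
}


def speedup(cpu_us: float, fpga_us: float) -> float:
    """Return how many times faster the FPGA kernel is than the CPU run."""
    return cpu_us / fpga_us


speedups = {
    engine: speedup(CPU_TIME_US[engine], FPGA_KERNEL_TIME_US[engine])
    for engine in CPU_TIME_US
}
for engine, factor in speedups.items():
    print(f"{engine}: {factor:.0f}x")
```

Because the end-to-end time is reported only as 1 ms per engine, a speedup computed from that row would be much coarser than the kernel-time figures.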

Table 179 shows the maximum performance of the McAsian engines in one SLR of an AMD xcu250-figd2104-2L-e (Vivado report).

Table 179 Hardware resources for max performance

| Engines | BRAM | DSP | Register | LUT | Latency | Frequency (MHz) | Max Unroll Num |
|---|---|---|---|---|---|---|---|
| McAsianArithmeticAPEngine | 192 | 749 | 246325 | 205626 | 54056 | 300 | 16 |
| McAsianArithmeticASEngine | 192 | 755 | 247524 | 207476 | 54109 | 300 | 16 |
| McAsianGeometricAPEngine | 160 | 781 | 242927 | 200106 | 54002 | 300 | 16 |
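As a rough way to read Table 179 against Table 177, the sketch below computes the ratio of each resource count between the two tables. It assumes the two tables describe the same engines at different unroll factors, which the text implies but does not state. The ratios are descriptive only, since the builds may differ in other settings. The dictionary names `SINGLE` and `MAX_PERF` are hypothetical.

```python
# Resource counts copied from Table 177 (single MCU).
SINGLE = {
    "McAsianArithmeticAPEngine": {"BRAM": 12, "DSP": 59, "Register": 24664, "LUT": 26826},
    "McAsianArithmeticASEngine": {"BRAM": 12, "DSP": 65, "Register": 26683, "LUT": 29362},
    "McAsianGeometricAPEngine": {"BRAM": 10, "DSP": 61, "Register": 24626, "LUT": 26657},
}
# Resource counts copied from Table 179 (Max Unroll Num = 16).
MAX_PERF = {
    "McAsianArithmeticAPEngine": {"BRAM": 192, "DSP": 749, "Register": 246325, "LUT": 205626},
    "McAsianArithmeticASEngine": {"BRAM": 192, "DSP": 755, "Register": 247524, "LUT": 207476},
    "McAsianGeometricAPEngine": {"BRAM": 160, "DSP": 781, "Register": 242927, "LUT": 200106},
}

# Ratio of the Table 179 count to the Table 177 count, per engine and resource.
ratios = {
    engine: {res: MAX_PERF[engine][res] / SINGLE[engine][res] for res in SINGLE[engine]}
    for engine in SINGLE
}
for engine, per_resource in ratios.items():
    summary = ", ".join(f"{res} x{value:.1f}" for res, value in per_resource.items())
    print(f"{engine}: {summary}")
```

With these figures, BRAM grows by exactly the unroll factor of 16 for all three engines, while DSP, register, and LUT counts grow by smaller factors.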