This section provides L2 performance benchmarks and Quality of Results (QoR) for the AI Engine solver library elements across various configurations. The results are extracted from hardware-emulation-based simulations.
The devices used for benchmarking are:
- AIE: xcvc1902-vsva2197-2MP-e-S
- AIE-ML: xcve2802-vsvh1760-2MP-e-S
- AIE-MLv2: xc2ve3858-ssva2112-2LP-e-S
The benchmark results are obtained with an AI Engine clock frequency of 1.25 GHz (AIE and AIE-ML devices) or 1.05 GHz (AIE-MLv2), and 64-bit PLIOs at 625 MHz.
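As a rough sanity check on the interface settings above, the theoretical peak bandwidth of a single PLIO port can be worked out from its width and clock. The sketch below is plain arithmetic, not library code, and the function name is hypothetical. It ignores any stalls or protocol overhead, so measured throughput may be lower.

```python
# Width and clock of one PLIO port, as stated in the benchmark setup.
PLIO_WIDTH_BITS = 64
PLIO_CLOCK_HZ = 625_000_000


def plio_peak_bytes_per_second(width_bits, clock_hz):
    """Theoretical peak for one port: bytes per clock cycle times cycles per second."""
    return width_bits / 8 * clock_hz


peak = plio_peak_bytes_per_second(PLIO_WIDTH_BITS, PLIO_CLOCK_HZ)
print(peak / 1e9)  # 5.0 (GB/s per PLIO port)
```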
The metrics reported for each case are:
- Latency: The time delay between the first input sample and the first output sample. For designs with multiple ports, the latency is measured between the first input port and the first output port.
- Throughput: Input throughput, calculated from the number of samples per iteration and the time between consecutive iterations.
- NUM_BANKS: Number of memory banks used by the design.
- NUM_AIE: Number of AI Engine tiles used by the design.
- DATA_MEMORY: Total data memory in bytes used by the design.
- PROGRAM_MEMORY: Program memory in bytes used by each kernel.
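The latency and throughput definitions above can be sketched as follows. This is a minimal illustration, not the script used to produce the benchmark results: the function names, the nanosecond timestamps, the averaging over several iterations, and the mega-samples-per-second unit are all assumptions made for the example.

```python
def latency_ns(first_input_ns, first_output_ns):
    """Delay between the first input sample and the first output sample."""
    return first_output_ns - first_input_ns


def throughput_msa_per_s(samples_per_iteration, iteration_start_ns):
    """Input throughput in mega-samples per second (assumed unit).

    iteration_start_ns holds the start time of each iteration in nanoseconds.
    The time between consecutive iterations is averaged over all pairs.
    """
    if len(iteration_start_ns) < 2:
        raise ValueError("need at least two iterations to measure an interval")
    intervals = [
        later - earlier
        for earlier, later in zip(iteration_start_ns, iteration_start_ns[1:])
    ]
    mean_interval_ns = sum(intervals) / len(intervals)
    # samples per nanosecond -> mega-samples per second
    return samples_per_iteration / mean_interval_ns * 1e3


# Illustrative timestamps only, not measured data.
print(latency_ns(100, 350))                          # 250
print(throughput_msa_per_s(1024, [0, 2048, 4096]))   # 500.0
```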
The AIE_VARIANT parameter indicates the AI Engine type used for each case in the benchmark results: AIE, AIE-ML, or AIE-MLv2.
The PROGRAM_MEMORY metric is collected for each kernel in the design. For example, a QR decomposition (QRD) configured to run on two tiles (CASC_LEN=2) has two sets of figures, one per kernel, displayed space-delimited in the following table.
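When post-processing the results, a space-delimited PROGRAM_MEMORY entry can be split into one value per kernel. The helper below is a hypothetical sketch, and the byte counts in the example are made up for illustration. They are not taken from the benchmark table.

```python
def parse_program_memory(field):
    """Split a space-delimited PROGRAM_MEMORY entry into per-kernel byte counts."""
    return [int(token) for token in field.split()]


# Made-up entry for a two-kernel (CASC_LEN=2) design.
per_kernel = parse_program_memory("2514 2498")
print(per_kernel)       # [2514, 2498]
print(len(per_kernel))  # 2, one figure per kernel
```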