This section provides the L2 performance benchmarks and Quality of Results (QoR) for the AI Engine digital signal processing (DSP) library elements in various configurations. The results are extracted from hardware-emulation-based simulations.
The devices used for benchmarking are:
- AIE: xcvc1902-vsva2197-2MP-e-S
- AIE-ML: xcve2802-vsvh1760-2MP-e-S
- AIE-MLv2: xc2ve3858-ssva2112-2LP-e-S
The benchmark results are obtained using these devices with an AI Engine clock frequency of 1.25 GHz (AIE and AIE-ML devices) or 1.05 GHz (AIE-MLv2 device) and 64-bit PLIOs at 625 MHz.
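As a rough sanity check on the throughput figures, the raw bandwidth of a single PLIO follows directly from the width and clock stated above. The sketch below is illustrative only: the helper name is hypothetical, and the 4-byte sample size (a cint16 sample) is an assumed example rather than a setting taken from the benchmarks.

```python
def plio_bandwidth_bytes_per_s(width_bits: int, freq_hz: float) -> float:
    """Raw bandwidth of one PLIO: bytes transferred per cycle times cycles per second."""
    return width_bits / 8 * freq_hz


# 64-bit PLIO at 625 MHz, as stated in the text above.
bandwidth = plio_bandwidth_bytes_per_s(64, 625e6)

# Assumed example: 4-byte samples (e.g. cint16) give an upper bound
# on the sample rate that a single port can carry.
ASSUMED_SAMPLE_BYTES = 4
max_samples_per_s = bandwidth / ASSUMED_SAMPLE_BYTES

print(f"{bandwidth / 1e9:.1f} GB/s per PLIO")
print(f"{max_samples_per_s / 1e6:.0f} MSa/s upper bound for 4-byte samples")
```

A measured throughput close to this bound suggests that the case may be limited by the PLIO rather than by the kernel itself.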
The metrics reported for each case are:
- Latency: The time delay between the first input sample and the first output sample. If there are multiple ports, the latency is measured between the first input port and the first output port.
- Throughput: Input throughput, calculated from the number of samples per iteration and the time between consecutive iterations.
- NUM_BANKS: Number of memory banks used by the design.
- NUM_AIE: Number of AI Engine tiles used by the design.
- DATA_MEMORY: Total data memory in bytes used by the design.
- PROGRAM_MEMORY: Program memory in bytes used by each kernel.
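The latency and throughput definitions above can be sketched as follows. This is a minimal illustration, not the library's measurement script: the function names, the nanosecond timestamps, the MSa/s output unit, and the averaging over iteration periods are all assumptions made for the example.

```python
from typing import Sequence


def latency_ns(first_input_ns: float, first_output_ns: float) -> float:
    """Delay between the first input sample and the first output sample."""
    return first_output_ns - first_input_ns


def throughput_msps(samples_per_iteration: int,
                    iteration_starts_ns: Sequence[float]) -> float:
    """Input throughput in MSa/s from per-iteration start timestamps (ns).

    Assumption: the time between consecutive iterations is averaged
    over all recorded iterations.
    """
    periods = [b - a for a, b in zip(iteration_starts_ns, iteration_starts_ns[1:])]
    if not periods:
        raise ValueError("need at least two iteration timestamps")
    mean_period_ns = sum(periods) / len(periods)
    # samples per ns -> samples per microsecond, which equals MSa/s
    return samples_per_iteration / mean_period_ns * 1e3


# Hypothetical timestamps, for illustration only.
print(latency_ns(100.0, 1380.0))
print(throughput_msps(512, [0.0, 1024.0, 2048.0, 3072.0]))
```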
The AIE_VARIANT parameter refers to the type of AI Engine used for each case in the benchmark results. It can be AIE, AIE-ML, or AIE-MLv2.
The PROGRAM_MEMORY metrics are collected for each kernel in the design. For example, a finite impulse response (FIR) filter configured to be implemented on two tiles (CASC_LEN=2) has two sets of figures displayed in the following table (space delimited).
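When post-processing the results, a space-delimited PROGRAM_MEMORY cell can be split into one value per kernel. The snippet below is a small sketch; the function name and the byte counts are hypothetical and do not come from the benchmark tables.

```python
from typing import List


def parse_program_memory(cell: str) -> List[int]:
    """Split a space-delimited PROGRAM_MEMORY cell into per-kernel byte counts."""
    return [int(value) for value in cell.split()]


# Hypothetical cell for a two-kernel design (CASC_LEN=2): one figure per kernel.
per_kernel = parse_program_memory("1234 1180")
print(per_kernel)
print(len(per_kernel))
```

The number of parsed values is expected to match the number of kernels in the design, so a CASC_LEN=2 case yields two entries.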