Performance Comparison: AMD Alveo U45N vs. NVIDIA CX6 - WP564

NVMe over TCP Storage Disaggregation Accelerated by AMD Alveo U45N SmartNIC (WP564)

Document ID: WP564
Release Date: 2025-07-24
Revision: 1.0.1 English

The two solutions are compared on NVMe over TCP workloads using identical server hardware, so that differences in the results come from the network adapter and its software stack rather than from the host. The comparison is between the AMD Alveo U45N FPGA paired with the MangoBoost NTI solution and a software NVMe over TCP stack on the NVIDIA ConnectX-6 Dx.

Figure 1. NVMe over TCP Test Setup

NVMe over TCP Initiator and Target Servers Hardware

  • Server type: Dell PowerEdge R7615
  • CPU: AMD EPYC 9174F, 16 cores (single socket, SMT off)
  • Memory: 192 GB (12 x 16 GB) DDR5-4800

NVMe over TCP Initiator Configuration

  • AMD Alveo U45N with MangoBoost solution
  • NVIDIA ConnectX-6 Dx

NVMe over TCP Target Configuration

  • NVIDIA ConnectX-6
  • NULL block device used in testing

Software Environment

  • OS: Ubuntu 22.04.5 LTS
  • Kernel: 5.15.0-94-generic
  • GRUB parameters: amd_iommu=on iommu=pt irqpoll
  • SPDK version: 24.01
  • MangoBoost solution: MangoBoost-nvme-tcp-2025-03-21
  • NVIDIA drivers and firmware
    • ConnectX™ driver version: 24.10-2.1.8
    • ConnectX-6 Dx firmware version: 22.39.1002

The comparison is performed using the flexible I/O tester (FIO), an industry-standard open-source benchmark tool for storage performance.
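As a rough illustration of how such FIO runs might be parameterized, the sketch below assembles FIO command lines for the read-only, write-only, and 70:30 mixed workloads with a 4 KB block size and 32 jobs. Only the block size, job count, and read/write mix come from this document. The device path, I/O engine, queue depth, runtime, the choice of random access patterns for every workload, and the helper name `build_fio_command` are assumptions made for the example and are not the tested configuration.

```python
import shlex


def build_fio_command(rw, block_size="4k", num_jobs=32, rwmixread=None,
                      device="/dev/nvme0n1", iodepth=32, runtime_s=60):
    """Return an fio argument list for one benchmark run.

    device, iodepth, runtime_s and the libaio engine are illustrative
    assumptions; the document does not state them.
    """
    args = [
        "fio",
        "--name=nvme_tcp_" + rw,
        "--filename=" + device,          # hypothetical NVMe over TCP namespace
        "--ioengine=libaio",             # assumption: engine is not stated
        "--direct=1",                    # bypass the page cache
        "--rw=" + rw,                    # randread, randwrite, or randrw
        "--bs=" + block_size,
        "--numjobs=" + str(num_jobs),
        "--iodepth=" + str(iodepth),     # assumption
        "--time_based",
        "--runtime=" + str(runtime_s),   # assumption
        "--group_reporting",             # one aggregated result for all jobs
        "--output-format=json",
    ]
    if rwmixread is not None:
        # Percentage of reads in a mixed workload (70 for a 70:30 mix).
        args.append("--rwmixread=" + str(rwmixread))
    return args


read_cmd = build_fio_command("randread")
write_cmd = build_fio_command("randwrite")
mixed_cmd = build_fio_command("randrw", rwmixread=70)

for cmd in (read_cmd, write_cmd, mixed_cmd):
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to pass directly to `subprocess.run` without shell quoting concerns.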

The following figure shows that the Alveo U45N with MangoBoost NTI delivers higher NVMe over TCP read throughput than the CX-6 Dx across all tested block sizes with 32 concurrent jobs. At a 4 KB block size, the U45N averaged 5.37 M IOPS (175.96 Gb/s) in the read-only test, about 2.8 times the 1.93 M IOPS (63.23 Gb/s) achieved by the CX-6 Dx. In the write-only test, the U45N averaged 5.34 M IOPS (174.96 Gb/s) compared to 3.4 M IOPS (111.43 Gb/s) for the CX-6 Dx, an improvement of approximately 57%.

Figure 2. NVMe over TCP Read-only and Write-only Throughput Benchmark Results
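The relationship between the IOPS and Gb/s figures quoted for the 4 KB results can be checked with a short calculation. The sketch below assumes that a 4 KB block is 4096 bytes and that 1 Gb/s is 10^9 bits per second; under those assumptions the computed values agree with the published ones to within a few hundredths of a Gb/s, which is consistent with the IOPS figures being rounded. The function names are illustrative.

```python
def iops_to_gbps(iops, block_bytes=4096):
    """Convert an I/O rate to line throughput in Gb/s (10^9 bits per second)."""
    return iops * block_bytes * 8 / 1e9


def speedup(candidate, baseline):
    """Ratio of candidate throughput to baseline throughput."""
    return candidate / baseline


# 4 KB results quoted in the text, in millions of IOPS: (U45N, CX-6 Dx).
results_miops = {
    "read-only": (5.37, 1.93),
    "write-only": (5.34, 3.4),
}

for workload, (u45n, cx6) in results_miops.items():
    u45n_gbps = iops_to_gbps(u45n * 1e6)
    cx6_gbps = iops_to_gbps(cx6 * 1e6)
    ratio = speedup(u45n, cx6)
    print(f"{workload}: U45N {u45n_gbps:.2f} Gb/s, "
          f"CX-6 Dx {cx6_gbps:.2f} Gb/s, "
          f"ratio {ratio:.2f}x ({(ratio - 1) * 100:.0f}% higher)")
```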

The following figure compares the Alveo U45N with the NTI and the NVIDIA CX-6 Dx in mixed NVMe over TCP workloads under a read/write (70:30) configuration with 32 concurrent jobs. At a 4 KB block size, NTI on the U45N delivers approximately 5.35 M IOPS (175.33 Gb/s) of total throughput (read and write combined), about 2.3 times the CX-6 Dx's 2.31 M IOPS (75.65 Gb/s). The U45N therefore holds its throughput advantage in mixed workloads, sustaining nearly the same aggregate rate as in the read-only and write-only tests.

Figure 3. NVMe over TCP Read/Write 70:30 Throughput Benchmark Results

The following figure presents the random read-only throughput of the AMD Alveo U45N with the NTI and the NVIDIA CX-6 Dx at a 4 KB block size across different numbers of concurrent FIO jobs. As the number of concurrent jobs increases, the Alveo U45N delivers up to approximately 3x the throughput of the NVIDIA CX-6 Dx, indicating that it scales better in environments with high I/O concurrency.

Figure 4. NVMe over TCP Read-only Throughput Benchmark Results Across Different FIO Job Configurations
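A job-count sweep like the one in Figure 4 could be summarized from FIO's JSON reports roughly as follows. This is a minimal sketch that assumes each run was saved with `--output-format=json` and reads the per-job `read` and `write` `iops` fields from the report's `jobs` array. The helper names are hypothetical, and the sample reports contain placeholder numbers, not measurements from this white paper.

```python
import json


def total_iops(fio_json_text):
    """Sum read and write IOPS over every job entry in one fio JSON report."""
    report = json.loads(fio_json_text)
    return sum(job["read"]["iops"] + job["write"]["iops"]
               for job in report["jobs"])


def scaling_table(reports_by_numjobs):
    """Map each job count to its total IOPS, ordered by job count."""
    return {n: total_iops(text)
            for n, text in sorted(reports_by_numjobs.items())}


def fake_report(read_iops):
    """Build a tiny stand-in for an fio JSON report (placeholder data only)."""
    return json.dumps({
        "jobs": [{
            "jobname": "placeholder",
            "read": {"iops": read_iops},
            "write": {"iops": 0.0},
        }]
    })


# Placeholder values chosen only to exercise the code path.
sample_reports = {
    2: fake_report(2000.0),
    1: fake_report(1000.0),
    4: fake_report(4000.0),
}

table = scaling_table(sample_reports)
for numjobs, iops in table.items():
    print(f"numjobs={numjobs}: {iops:.0f} IOPS")
```

With `--group_reporting` enabled, FIO emits a single aggregated entry in `jobs`; summing over the array also covers runs where each job is reported separately.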