To compare two solutions for NVMe over TCP workloads, both were tested on identical server hardware. This ensures a fair comparison between the AMD Alveo U45N FPGA paired with the MangoBoost NTI solution and a software NVMe over TCP stack running on the NVIDIA ConnectX-6 Dx.
NVMe over TCP Initiator and Target Server Hardware
- Server type: PowerEdge R7615
- CPU: AMD EPYC 9174F, 16 cores (single socket, SMT off)
- Memory: 192 GB (12 x 16 GB) DDR5-4800
NVMe over TCP Initiator Configuration
- AMD Alveo U45N with MangoBoost solution
- NVIDIA ConnectX-6 Dx
NVMe over TCP Target Configuration
- NVIDIA ConnectX-6
- NULL block device used in testing
Software Environment
- OS: Ubuntu 22.04.5 LTS
- Kernel: 5.15.0-94-generic
- GRUB parameters: amd_iommu=on iommu=pt irqpoll
- SPDK version: 24.01
- MangoBoost solution: MangoBoost-nvme-tcp-2025-03-21
- NVIDIA drivers and firmware:
  - ConnectX™ driver version: 24.10-2.1.8
  - ConnectX-6 Dx firmware version: 22.39.1002
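When reproducing this environment, it can help to confirm that the kernel was actually booted with the listed GRUB parameters. The following is a minimal sketch, assuming a Linux host that exposes the boot command line at /proc/cmdline; the helper names are hypothetical and not part of any vendor tooling.

```python
# Kernel parameters listed in the software environment above.
REQUIRED_PARAMS = ("amd_iommu=on", "iommu=pt", "irqpoll")


def missing_kernel_params(cmdline: str, required=REQUIRED_PARAMS) -> list[str]:
    """Return the required parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

On a test server, `missing_kernel_params(read_cmdline())` returns an empty list when all three parameters are active.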
The comparison is performed using the flexible I/O tester (FIO), an industry-standard open-source benchmark tool for storage performance.
The following figure shows that the Alveo U45N with MangoBoost NTI delivers higher NVMe over TCP read performance than the CX-6 Dx across all tested block sizes under a workload configuration with 32 concurrent jobs. Using a 4 KB block size in FIO benchmark testing, the U45N achieved an average read-only throughput of 5.37 M IOPS (175.96 Gb/s), about 2.8 times the 1.93 M IOPS (63.23 Gb/s) achieved by the CX-6 Dx. In a write-only benchmark, the U45N delivers an average write throughput of 5.34 M IOPS (174.96 Gb/s), compared to 3.4 M IOPS (111.43 Gb/s) for the CX-6 Dx, an improvement of approximately 57%.
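The IOPS and Gb/s figures above are related by the block size. The sketch below shows that arithmetic, assuming a 4 KB block is 4096 bytes and 1 Gb/s is 10^9 bits per second; under those assumptions it reproduces the quoted bandwidths to within the rounding of the IOPS values. The function names are illustrative.

```python
def iops_to_gbps(iops: float, block_size_bytes: int = 4096) -> float:
    """Convert an IOPS figure to bandwidth in Gb/s (10^9 bits per second)."""
    return iops * block_size_bytes * 8 / 1e9


def speedup(candidate: float, baseline: float) -> float:
    """Ratio of candidate throughput to baseline throughput."""
    return candidate / baseline


# 4 KB read-only and write-only results quoted in the text, in IOPS.
u45n_read, cx6_read = 5.37e6, 1.93e6
u45n_write, cx6_write = 5.34e6, 3.4e6

print(f"U45N read bandwidth: {iops_to_gbps(u45n_read):.2f} Gb/s")
print(f"Read speedup:  {speedup(u45n_read, cx6_read):.2f}x")
print(f"Write speedup: {speedup(u45n_write, cx6_write):.2f}x")
```

Working from IOPS rather than bandwidth keeps the comparison independent of the unit convention, since the ratio is the same either way at a fixed block size.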
The following figure compares the Alveo U45N with NTI and the NVIDIA CX-6 Dx in mixed NVMe over TCP workloads under a read/write (70:30) configuration with 32 concurrent jobs. At a 4 KB block size, NTI on the U45N delivers approximately 5.35 M IOPS (175.33 Gb/s) of total throughput (read and write combined), roughly 2.3 times the 2.31 M IOPS (75.65 Gb/s) of the CX-6 Dx. The higher combined read and write throughput indicates that the U45N sustains its bandwidth advantage under mixed workloads.
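The total throughput of a mixed workload is the sum of its read and write components. As a sketch of how that total might be extracted, the snippet below sums the per-direction IOPS from FIO's JSON output (produced with `--output-format=json`). The sample report is synthetic and heavily trimmed, not measured data, and `total_iops` is a hypothetical helper.

```python
import json

# Synthetic, trimmed fio JSON report for illustration only (not measured data).
SAMPLE_REPORT = """
{
  "jobs": [
    {"jobname": "mixed", "read": {"iops": 700.0}, "write": {"iops": 300.0}}
  ]
}
"""


def total_iops(fio_json_text: str) -> tuple[float, float, float]:
    """Return (read IOPS, write IOPS, combined IOPS) summed over all reported jobs."""
    report = json.loads(fio_json_text)
    read = sum(job["read"]["iops"] for job in report["jobs"])
    write = sum(job["write"]["iops"] for job in report["jobs"])
    return read, write, read + write


read, write, total = total_iops(SAMPLE_REPORT)
print(f"read={read:.0f} write={write:.0f} total={total:.0f} "
      f"read share={read / total:.0%}")
```

With group reporting enabled, the report typically contains a single aggregated entry; the sum also covers the case where each job is reported separately.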
The following figure presents the random read-only throughput of the AMD Alveo U45N with NTI and the NVIDIA CX-6 Dx at a 4 KB block size. The results indicate a substantial advantage for the Alveo U45N: as the number of concurrent jobs increases, it delivers up to approximately 3x the throughput of the CX-6 Dx. This scaling behavior makes the Alveo U45N well suited to environments with high I/O concurrency.