Aurora 64B/66B is a lightweight serial communication protocol for multi-gigabit links. On Alveo cards, the Aurora IP uses the GT transceivers (such as GTY) to implement high-speed data transfer. Each Alveo accelerator card has one or two QSFP28 ports, which connect to the GT transceivers of the FPGA. Users can integrate the Aurora IP into an Alveo card design to implement full-duplex communication between cards at up to 100 Gbps data throughput on each QSFP28 port. The Aurora IP provides standard AXI Stream ports to the user application for data transfer, and with the Vitis flow, users can easily integrate the Aurora IP into an acceleration design based on a Vitis target platform.
The following block diagram shows an Aurora 64B/66B communication channel.
For details on the Aurora 64B/66B protocol, refer to the Aurora 64B/66B Protocol Specification.
For details on the Aurora 64B/66B IP, refer to the Aurora 64B/66B IP Product Guide.
This tutorial provides an example design and step-by-step instructions for integrating the Aurora IP into Alveo accelerator cards with the Vitis flow. The example design integrates a four-lane Aurora kernel with a 10 Gbps lane rate (40 Gbps total throughput). The design steps covered in this tutorial include Aurora IP generation, a reference RTL top module for the Aurora IP, example test system integration, and an example x86 host program. The hardware block diagram of the example design is shown below.
There are three kernels in the hardware design:
krnl_aurora: an RTL kernel. It instantiates the Aurora core IP, an AXIS data FIFO for data transmit, an AXIS data FIFO for data receive, and an AXI control slave for reading back the Aurora IP status.
strm_issue: an HLS kernel that implements a simple AXI master to AXI stream bridge for data transmit. It reads data from the on-board global memory and sends it to the Aurora core.
strm_dump: an HLS kernel that implements a simple AXI stream to AXI master bridge for data receive. It receives data from the Aurora core and writes it to the on-board global memory.
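The data movement performed by the two bridge kernels can be sketched as a plain C++ software model. This is not the HLS source from the `hls` directory: `std::queue` stands in for the AXI stream, `std::vector` stands in for global memory, and the 64-bit word width and the function names are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Stand-in for an AXI stream channel (assumption: 64-bit words).
using Stream = std::queue<std::uint64_t>;

// Models the strm_issue direction: read up to `words` words from
// "global memory" and push them onto the transmit stream.
void strm_issue_model(const std::vector<std::uint64_t>& mem,
                      std::size_t words, Stream& tx) {
    for (std::size_t i = 0; i < words && i < mem.size(); ++i) {
        tx.push(mem[i]);
    }
}

// Models the strm_dump direction: pop up to `words` words from the
// receive stream and append them to "global memory".
// Returns the number of words actually written.
std::size_t strm_dump_model(Stream& rx, std::vector<std::uint64_t>& mem,
                            std::size_t words) {
    std::size_t written = 0;
    while (written < words && !rx.empty()) {
        mem.push_back(rx.front());
        rx.pop();
        ++written;
    }
    return written;
}
```

Passing the same `Stream` object to both functions imitates a loopback, so the destination vector should end up holding the words that were issued.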
In the example design, the host transfers a block of data into the on-board global memory, sends the data to the Aurora core, and then stores the looped-back data in the on-board global memory for an integrity check. To run the design on real hardware, you need a 40 Gbps QSFP+ (0 dB, 0 W) loopback module inserted in the QSFP port of the Alveo card. If your Alveo card has two QSFP ports, insert the module into QSFP 0. The loopback module is shown in the photo below.
The design supports Ubuntu 18.04/20.04 and Red Hat/CentOS 7/8 systems, and supports the following XRT and target platform versions:
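The integrity check on the looped-back data amounts to comparing the received buffer with the transmitted one. The sketch below shows one way to do that in plain C++. The pattern generator, the 32-bit word size, and both function names are hypothetical; the host program in this tutorial may generate and compare its data differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill a buffer with a deterministic pseudo-random pattern using a
// simple linear congruential generator (hypothetical test data).
std::vector<std::uint32_t> make_pattern(std::size_t words, std::uint32_t seed) {
    std::vector<std::uint32_t> buf(words);
    std::uint32_t x = seed;
    for (std::size_t i = 0; i < words; ++i) {
        x = x * 1664525u + 1013904223u;  // unsigned arithmetic wraps modulo 2^32
        buf[i] = x;
    }
    return buf;
}

// Count words that differ between the sent and received buffers.
// Words missing from, or extra in, the received buffer count as errors.
std::size_t count_mismatches(const std::vector<std::uint32_t>& sent,
                             const std::vector<std::uint32_t>& received) {
    const std::size_t common =
        sent.size() < received.size() ? sent.size() : received.size();
    const std::size_t longer =
        sent.size() < received.size() ? received.size() : sent.size();
    std::size_t errors = longer - common;
    for (std::size_t i = 0; i < common; ++i) {
        if (sent[i] != received[i]) {
            ++errors;
        }
    }
    return errors;
}
```

A result of zero from `count_mismatches` means the received data matches the transmitted data word for word.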
XRT 2.14.354
Alveo U200: xilinx_u200_gen3x16_xdma_2_202110_1
Alveo U250: xilinx_u250_gen3x16_xdma_4_1_202210_1
Alveo U280: xilinx_u280_gen3x16_xdma_1_202211_1
Alveo U50: xilinx_u50_gen3x16_xdma_5_202210_1
Alveo U55C: xilinx_u55c_gen3x16_xdma_3_202210_1
All flows in the example design are run from the command line, using a Makefile and Tcl scripts. Some steps in this tutorial also use GUI operations for illustration. The files in the design directory are described below.
├── hls
│ ├── strm_dump.cpp # HLS C source code for strm_dump kernel
│ └── strm_issue.cpp # HLS C source code for strm_issue kernel
├── host
│ └── host_krnl_aurora_test.cpp # x86 host program
├── krnl_aurora_test.cfg # Vitis link configuration file
├── Makefile # Makefile for full flow
├── README.md
├── rtl
│ ├── krnl_aurora_control_s_axi.v # Verilog source code for AXI control slave module
│ └── krnl_aurora.v # Verilog source code for top level of krnl_aurora
├── tcl
│ ├── gen_aurora_ip.tcl # Tcl script to generate Aurora IP
│ ├── gen_fifo_ip.tcl # Tcl script to generate AXI stream data FIFO
│ └── pack_kernel.tcl # Tcl script to package the RTL kernel krnl_aurora
└── xdc
    └── aurora_64b66b_0.xdc # additional XDC file for krnl_aurora
Note: If you are using Red Hat/CentOS 7, the default installed GCC version is 4.x.x. You must use the following commands to install and switch to GCC 7 before compiling the x86 host program.
sudo yum install centos-release-scl
sudo yum install devtoolset-7-gcc-c++
scl enable devtoolset-7 bash