Asynchronous Mode Support - 1.0 English - PG320

Advanced IO Wizard LogiCORE IP Product Guide (PG320)

Document ID
PG320
Release Date
2025-03-19
Version
1.0 English

In asynchronous (Beta) mode, there is no incoming clock or strobe associated with the data. A clock data recovery (CDR) module is provided in the wizard-generated wrapper to enable data capture. Data_Out is the parallel data from the receiver, and the additional Dataout_Valid signal indicates when Data_Out is valid. Dataout_Valid is useful for ensuring that data is not lost, or for when there is a PPM difference between the transmitter and receiver. The Advanced IO Wizard IP supports the following CDR modes:

  • CDR with PPM difference
  • CDR with zero PPM

If the application is set to asynchronous mode, the CDR with PPM difference module is used by default. To use the zero PPM CDR, enable the option Enable ZERO PPM CDR. This option is grayed out by default, but it is available when the application is set to asynchronous.

Figure 1. Asynchronous Mode Structure (Typical FIFO_MODE = ASYNC Setup)

CDR with PPM Difference

Note: In this mode, the I/O pins are required to be differential.

The purpose of the CDR is to ensure that unit interval (UI) sampling is always done at the center for asynchronous signals. Data is received on differential pairs in bitslices. The UI is sampled at the same frequency as the data rate. For example, if the SGMII data rate is 1250 Mb/s, the RX and TX PLL clock frequency should be 1250 MHz.

The CDR circuitry can stay aligned to the receive data as long as the reference clocks of the receiver and transmitter differ by no more than 100 clock cycles per million (100 PPM). For 1250 Mb/s, the CDR circuitry can stay aligned as long as the transmitter clock varies by 0.125 MHz or less.
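The 0.125 MHz figure follows directly from the 100 PPM tolerance. As a minimal sketch of that arithmetic (the function name is illustrative and not part of the wizard):

```python
def ppm_offset_mhz(clock_mhz: float, ppm: float) -> float:
    """Return the absolute frequency offset, in MHz, for a given PPM deviation."""
    return clock_mhz * ppm / 1_000_000


# 1250 MHz clock with a 100 PPM tolerance
print(ppm_offset_mhz(1250.0, 100.0))  # 0.125
```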

For asynchronous applications, the wizard configures the XPHY settings to:

  • SERIAL_MODE = TRUE
  • RX_DATA_WIDTH = 8

In this mode, the capture clock is driven by PLL_CLK, where REFCLK_FREQUENCY indicates the frequency. Because the deserializer is DDR, the frequency for FIFO_RD_CLK is F_FIFO_RD_CLK = REFCLK_FREQUENCY * 2 / RX_DATA_WIDTH = 1250 * 2 / 8 = 312.5 MHz.
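The FIFO_RD_CLK relationship can be checked with a few lines of Python. This is only a sketch of the formula as stated in this section, using RX_DATA_WIDTH = 8 as set by the wizard; the function and argument names are illustrative.

```python
def fifo_rd_clk_mhz(refclk_frequency_mhz: float, rx_data_width: int) -> float:
    """F_FIFO_RD_CLK = REFCLK_FREQUENCY * 2 / RX_DATA_WIDTH.

    The factor of 2 reflects the DDR deserializer, which captures
    two bits per PLL_CLK cycle.
    """
    return refclk_frequency_mhz * 2 / rx_data_width


# 1250 Mb/s interface with RX_DATA_WIDTH = 8
print(fifo_rd_clk_mhz(1250.0, 8))  # 312.5
```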

Receive data is converted to the Application Data Width using a gearbox which contains a FIFO. Setting the frequency of rx_app_clk is described in Datapath and Gearbox.

The sample coming from the center of the UI is valid data, and the other sample is used to keep the clock in the center of the data by updating the delay line.

When the clock is in the center of the UI of one of the bitslices, data can be captured without any other delay adjustments. When there is a PPM difference, the delay line increases or decreases to maintain alignment using the phase detector circuitry. The CDR algorithm also has to account for the data rate being faster or slower than the clock rate.

As shown in Figure 1, there are two separate clocks used for reading the receive data out of XPHY. The first clock is used for reading the receive data from XPHY into the gearbox. The CDR with PPM difference has been set up so that a 1250 Mb/s design using 8-bit data needs FIFO_RD_CLK = 312.5 MHz. This setup is related to the fact that the XPHY data uses the PLL_CLK as the capture clock where the parallel data is read out at 312.5 MHz.

The second clock is rx_app_clk from the instantiation template. For 8-bit data running at 1250 Mb/s, 156.25 MHz can be used when the transmitter and receiver are locked to the same clock source because the phase detector does not need to track the phase differences. The example design is set up using a clock divider and can be reviewed for one way to set up a design.

For applications where the transmitter varies between 0 and 100 PPM, connect rd_clk to fifo_rd_clk and read data out at 312.5 MHz to ensure that data is not lost. In this case, the application needs to run off the 312.5 MHz clock and use Dataout_Valid.

CDR with PPM blocks are as follows:

  • Phase detector
  • Delay line tracking
  • Overflow underflow filter
  • Datapath and gearbox

Phase Detector

The samples from the master and slave delay lines are fed into an Alexander Bang-Bang phase detector circuit to determine if the delay line value should be increased or decreased. For each UI, two samples are taken from each bitslice. Depending on whether the clock is early or late, the delay is incremented or decremented, respectively.

Figure 2. PCLK and NCLK Sampling Same Data

In the preceding figure, both pclk and nclk are sampling the same UI, so the delay value is incremented.

Figure 3. PCLK and NCLK Sampling Different Data

In the preceding figure, pclk and nclk are sampling different UIs, so the delay value is decremented.

The rules are therefore as follows: if X = D, increment the delay; if X != D, decrement the delay. For the master bitslice, P data is X and N data is D, so the master bitslice is automatically N centered. For the slave bitslice, N data is X and P data is D, so the slave bitslice is automatically P centered.
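The increment/decrement rule can be sketched in Python as follows. This models only the X/D comparison as stated in this section and is not the wizard's RTL; the function names and the +1/-1 encoding are assumptions made for illustration.

```python
def phase_detector_step(x: int, d: int) -> int:
    """Return +1 (increment delay) when X equals D, otherwise -1 (decrement)."""
    return 1 if x == d else -1


def master_decision(p_sample: int, n_sample: int) -> int:
    # Master bitslice: P data is X and N data is D (N centered).
    return phase_detector_step(x=p_sample, d=n_sample)


def slave_decision(p_sample: int, n_sample: int) -> int:
    # Slave bitslice: N data is X and P data is D (P centered).
    return phase_detector_step(x=n_sample, d=p_sample)


# Both clock phases see the same data bit: increment.
print(master_decision(p_sample=1, n_sample=1))  # 1
# The two phases see different data bits: decrement.
print(master_decision(p_sample=0, n_sample=1))  # -1
```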

Delay Line Tracking

The delay line tracking module tracks the delay value of each bitslice. Depending on the phase detector output, delay line values are updated after a certain number of cycles (the loop bandwidth, which allows the outcome of the previous decision to be seen). When the respective D samples received from the PHY are in the center of the UI, the delay line count value dithers, and the particular bitslice is considered locked until it reaches the delay line boundaries.

When the boundary of the delay line is reached, overflow or underflow signals are generated to inform the datapath that the delay line cannot track the particular UI anymore and must move to the next available UI. This is done by overriding the INC/DEC decision from the phase detector until the bitslice reaches the next available UI. During this period, the lock signal is pulled down until tracking of the next UI can start.
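The tracking behavior can be approximated with a small behavioral model. This is a sketch under stated assumptions, not the wizard's implementation: the class name, the tap range (max_tap), the update interval standing in for the loop bandwidth, and the returned status strings are all hypothetical, and the override that walks the bitslice to the next UI is not modeled.

```python
from dataclasses import dataclass


@dataclass
class DelayLineTracker:
    max_tap: int = 15         # hypothetical delay line length
    update_interval: int = 4  # hypothetical loop bandwidth, in cycles
    tap: int = 0
    _votes: int = 0
    _cycles: int = 0

    def step(self, decision: int) -> str:
        """Accumulate one phase detector decision (+1 or -1).

        Returns "hold" between updates, "inc"/"dec" when the tap moves,
        and "overflow"/"underflow" when the delay line boundary blocks
        the requested move.
        """
        self._votes += decision
        self._cycles += 1
        if self._cycles < self.update_interval:
            return "hold"
        votes = self._votes
        self._votes = 0
        self._cycles = 0
        if votes > 0:
            if self.tap == self.max_tap:
                return "overflow"
            self.tap += 1
            return "inc"
        if votes < 0:
            if self.tap == 0:
                return "underflow"
            self.tap -= 1
            return "dec"
        return "hold"  # evenly split votes: dithering around the UI center


tracker = DelayLineTracker(max_tap=1, update_interval=2)
print([tracker.step(1) for _ in range(4)])
```

In this model, a run of "hold" results with balanced votes corresponds to the dithering seen when a bitslice is locked.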

Overflow Underflow Filter

This module generates a single overflow or underflow pulse for the datapath from the multiple bursts of overflow or underflow signals produced by the delay line tracking logic. The delay line tracking logic can generate overflow and underflow signals for a long time without moving to the next UI. These signals toggle because of drift, that is, the PPM difference between the transmit clock and the receive clock. A filtering mechanism is therefore required to generate a single-pulse overflow or underflow.
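One way to picture the filter is as a pulse generator with a quiet-time requirement. The source does not describe the actual filtering mechanism, so the following is only an illustrative sketch: the function name and the quiet_cycles threshold are assumptions.

```python
def single_pulse_filter(raw, quiet_cycles=3):
    """Collapse each burst of raw overflow/underflow flags into one pulse.

    A pulse is emitted on the first asserted flag of a burst. Further
    flags are suppressed until the input has been idle for at least
    quiet_cycles consecutive cycles (a hypothetical threshold).
    """
    out = []
    quiet = quiet_cycles  # start armed
    for flag in raw:
        if flag:
            out.append(1 if quiet >= quiet_cycles else 0)
            quiet = 0
        else:
            out.append(0)
            quiet += 1
    return out


# A toggling burst produces one pulse; a later, separate event produces another.
print(single_pulse_filter([0, 1, 0, 1, 1, 0, 0, 0, 1]))
# [0, 1, 0, 0, 0, 0, 0, 0, 1]
```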

Datapath and Gearbox

This module is responsible for correctly selecting the data from the PHY and providing it to the output. When both delay lines are locked, they are naturally ½ UI apart because one bitslice is PCLK centered and the other is NCLK centered. When lock occurs, the bitslice whose delay line is consuming less delay becomes the active bitslice, and data is taken from that bitslice. The other bitslice is the monitor, and is always ½ UI ahead or behind depending on drift. When overflow or underflow happens for the active bitslice, the datapath switches to the monitor bitslice for data. If overflow or underflow happens for the monitor, its reference pointer (D_loc) is updated.

D_loc is used as a reference pointer for both the active and monitor bitslices to ensure that no data is lost when switching from the active bitslice to the monitor. The 4-bit D data from both the master and the slave is stored in a 12-bit shift register. D_loc can range from 0 to 8: 0 selects bits 0:3 as the output data, whereas 8 selects bits 8:11. This data is then stored in a buffer that is 2x8 in size. Whenever 8-bit data is available in the buffer, the data is put out on RX_DATA and DATA_VALID is asserted from the gearbox. The gearbox converts the data widths associated with the configuration of XPHY and XPLL.

When the pointer moves from 0 to 4 in the underflow condition, there is no data to put out, so there are two-cycle gaps in DATA_VALID. When the pointer moves from 8 to 4, there is extra data to put out, and continuous DATA_VALID can be seen. In all other scenarios, DATA_VALID is asserted on alternate cycles when rx_app_clk is connected to fifo_rd_clk. When rx_app_clk uses a clock divider, as in the example design for the core, DATA_VALID remains asserted. The clock divider is useful when the transmitter and receiver are using the same reference clock.
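The D_loc window and the alternating DATA_VALID pattern can be illustrated with a short behavioral sketch. It assumes the shift register is modeled as a 12-element list and that four bits arrive per fifo_rd_clk cycle; the function names are hypothetical and the model ignores the overflow and underflow cases.

```python
def select_window(shift_reg, d_loc):
    """Return the four entries of a 12-entry register starting at d_loc."""
    if len(shift_reg) != 12:
        raise ValueError("shift register must hold 12 bits")
    if not 0 <= d_loc <= 8:
        raise ValueError("D_loc must be in the range 0 to 8")
    return shift_reg[d_loc:d_loc + 4]


def pack_bytes(nibbles):
    """Buffer 4-bit groups and report (data, valid) once per input cycle.

    valid is True only on cycles where a full 8 bits are available.
    """
    buffer = []
    results = []
    for nibble in nibbles:
        buffer.extend(nibble)
        if len(buffer) >= 8:
            results.append((buffer[:8], True))
            buffer = buffer[8:]
        else:
            results.append((None, False))
    return results


reg = list(range(12))          # positions 0..11 stand in for bit values
print(select_window(reg, 0))   # [0, 1, 2, 3]
print(select_window(reg, 8))   # [8, 9, 10, 11]

cycles = pack_bytes([[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 1, 1, 1]])
print([valid for _, valid in cycles])  # [False, True, False, True]
```

With one 4-bit group per cycle, the valid flag alternates, which matches the alternate assertion of DATA_VALID described for the case where rx_app_clk is connected to fifo_rd_clk.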


Debug control signals are enabled when the ENABLE_CDR_DEBUG parameter is turned on. These signals provide bitwise access to the internal signals of the CDR.

Figure 4. Data Rate is Slower than Receive Clock

Figure 5. Data Rate is Faster than Receive Clock

Figure 6. Block Diagram for CDR

CDR with Zero PPM

In this mode, the TX and RX clocks must be generated by the same source. The I/O pins can be single ended or differential. The CDR algorithm differs between single-ended and differential I/O pins, as described in the following sections.

CDR for Single-Ended IO Design

The purpose of the CDR block is to ensure that UI sampling is always at the center for asynchronous signals. The UI is sampled at the same frequency as the data rate. For example, if the interface speed is 1250 Mb/s, the PLL clock frequency should be 1250 MHz. As with CDR with PPM, XPHY is configured with SERIAL_MODE = TRUE and RX_DATA_WIDTH = 8; therefore, the expected FIFO_RD_CLK frequency is F_FIFO_RD_CLK = REFCLK_FREQUENCY * 2 / RX_DATA_WIDTH = 1250 * 2 / 8 = 312.5 MHz. The block diagram of the CDR block is shown in the following figure.

Figure 7. Block Diagram of CDR Implementation for Single-Ended IO Designs

The samples from the delay line are fed into the phase detector circuit to determine if the delay line value should be increased or decreased. For each UI, two samples are taken from each bitslice. Depending on whether the clock is early or late, the delay is incremented or decremented.

Depending on the phase detector output, delay line values are updated after a certain number of cycles. When the respective D samples received from the PHY are in the center of the UI, the particular bitslice is considered locked. When the bitslice is locked, four of the eight bits given by the PHY are passed to the RX gearbox. The four bits are selected depending on whether the data is N centered or P centered.

CDR for Differential IO Design

In the differential case, two lane outputs (both P and N) from the PHY are fed to the CDR block. The UI is sampled at half the frequency of the data rate. For example, if the interface speed is 1250 Mb/s, the PLL clock frequency should be 625 MHz. As with CDR with PPM, XPHY is configured with SERIAL_MODE = TRUE and RX_DATA_WIDTH = 8; therefore, the expected FIFO_RD_CLK frequency is F_FIFO_RD_CLK = REFCLK_FREQUENCY * 2 / RX_DATA_WIDTH = 625 * 2 / 8 = 156.25 MHz. The block diagram of the CDR block for differential IOs is shown in the following figure.
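The two zero PPM clocking cases differ only in the PLL frequency chosen for a given data rate. The following sketch puts the two worked examples from this section side by side; the function name and the boolean flag are illustrative and do not correspond to wizard parameters.

```python
def zero_ppm_clocks(data_rate_mbps: float, differential: bool,
                    rx_data_width: int = 8):
    """Return (pll_clk_mhz, fifo_rd_clk_mhz) for the zero PPM CDR.

    Single-ended: the PLL clock equals the data rate.
    Differential: the PLL clock is half the data rate.
    FIFO_RD_CLK = PLL clock * 2 / RX_DATA_WIDTH in both cases.
    """
    pll_clk_mhz = data_rate_mbps / 2 if differential else data_rate_mbps
    fifo_rd_clk_mhz = pll_clk_mhz * 2 / rx_data_width
    return pll_clk_mhz, fifo_rd_clk_mhz


print(zero_ppm_clocks(1250.0, differential=False))  # (1250.0, 312.5)
print(zero_ppm_clocks(1250.0, differential=True))   # (625.0, 156.25)
```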

Figure 8. Block Diagram of CDR Implementation for Differential IO Designs

For differential IOs, the CDR algorithm is different from that of single-ended IOs. Here, two lanes are fed into the Alexander Bang-Bang detector. For the Alexander Bang-Bang detector to work, one lane should be edge aligned and the other lane should be center aligned. This is achieved according to the following flow chart.

Figure 9. Flow Chart for Centering the ‘N’ Lane

As described in the flow chart, the N lane is center aligned and the P lane is edge aligned. After this, both bitslices are considered locked and both lanes are fed to the Alexander Bang-Bang detector for VT tracking. Depending on whether the clock is early or late, the delay is incremented or decremented, respectively, for both lanes (P and N). When the bitslices are locked, the 8-bit N channel output is given to the gearbox.

Gearbox

The gearbox is used to convert data from one width to another; for example, to convert 8-bit data to 4-bit data. The gearbox read clock is connected to rx_app_clk. Receive data can be read out of the core using rx_data and rx_data_valid. The frequency of rx_app_clk is dependent on the application_data_width setting.