
1356 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 8, AUGUST 2004

A 3.125-Gb/s Clock and Data Recovery Circuit for the 10-Gbase-LX4 Ethernet

Rong-Jyi Yang, Student Member, IEEE, Shang-Ping Chen, and Shen-Iuan Liu, Senior Member, IEEE

Abstract: A 3.125-Gb/s clock and data recovery (CDR) circuit using a half-rate digital quadricorrelator frequency detector and a shifted-averaging voltage-controlled oscillator is presented for 10-Gbase-LX4 Ethernet. It can achieve low-jitter operation and improve the pull-in range without a reference clock. This CDR circuit has been fabricated in a standard 0.18-µm CMOS technology. It occupies an active area of $0.6 \times 0.8$ mm$^2$ and consumes 83 mW from a single 1.8-V supply. The measured bit-error rate is less than $10^{-12}$ for $2^{7}-1$ PRBS 3.125-Gb/s data. It can meet the jitter tolerance specifications for the 10-Gbase-LX4 Ethernet application.

Index Terms: Clock and data recovery (CDR), frequency detector, quadricorrelator.

I. INTRODUCTION

IN RECENT YEARS, the increase of data transmission over the internet has led to the demand for high-speed serial-data communication networks. Several optical communication standards have been applied to high-speed and long-distance communications. Considerable design effort has been focused on low-cost, low-power integrated fiber-optic transmitters and receivers. Clock and data recovery (CDR) circuits are used in receivers to generate clocks synchronized with the received data. For different applications, CDR circuits must satisfy the specifications defined by standards such as 10-Gbase Ethernet [1]. The loop bandwidth of CDRs [2]–[5] should be small to improve noise performance. However, a small loop bandwidth results in small capture and pull-in ranges. CDRs without frequency acquisition loops might need either an additional reference clock [3] or off-chip tuning [4].
Digital quadricorrelators [6], [7] have been widely used in frequency acquisition loops because they are reliable and tolerant to process, voltage, and temperature variations. However, the conventional digital quadricorrelator frequency detector (DQFD) [7] is only suitable for CDRs with full-rate clocks. To lower the power consumption, clock relaxing techniques [2]–[5] have been employed to achieve higher bit-rate transmission with a lower clock rate. Considering the power consumption, half-rate CDRs [3]–[5] may be a better choice. However, a half-rate frequency detector (FD) is then needed. In this paper, a half-rate DQFD is presented for a half-rate CDR and its operational principle is explained. This DQFD can enlarge the pull-in range of a CDR without disturbing the loop once the frequency is locked. Mismatch between the quadrature clocks affects the operation of a DQFD. To mitigate this issue, a shifted-averaging voltage-controlled oscillator (VCO) [8] is employed to improve the clock phase accuracy and the jitter performance.

Manuscript received December 16, 2003; revised April 13, 2004. The authors are with the Graduate Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617, R.O.C. (e-mail: lsi@cc.ee.ntu.edu.tw). Digital Object Identifier 10.1109/JSSC.2004.831809

II. CIRCUIT DESCRIPTION

The proposed half-rate CDR circuit consists of the proposed half-rate DQFD, a shifted-averaging VCO [8], a half-rate phase detector (PD) [5], and two charge pumps, as shown in Fig. 1. The proposed half-rate DQFD can be realized with eight DFFs, two XOR gates, and combinational logic, as shown in Fig. 2(a) and (b). The truth table for the combinational logic in the proposed DQFD is shown in Fig. 2(c). According to the results of sampling the 0°, 45°, 90°, and 135° clocks with the input data, each half of the clock period can be divided into four states, I, II, III, and IV, as shown in Fig. 2(d).
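The division of each half clock period into four states can be illustrated with a small behavioral sketch. It assumes ideal 50%-duty clocks, takes state I to begin at each edge of the 0° clock, and folds the two half-periods together by inverting the samples when the 0° sample is low. The function names `sample_phases` and `decode_state` are hypothetical, and the folding rule is an assumption for illustration, not the gate-level logic of Fig. 2.

```python
from typing import List

STATE_NAMES = ["I", "II", "III", "IV"]


def sample_phases(t: float, period: float) -> List[int]:
    """Sample ideal 0/45/90/135-degree clocks at time t (assumed 50% duty)."""
    bits = []
    for k in range(4):
        delay = k * period / 8.0  # 45 degrees corresponds to 1/8 of a period
        phase = ((t - delay) % period) / period
        bits.append(1 if phase < 0.5 else 0)
    return bits


def decode_state(bits: List[int]) -> str:
    """Map the four sampled clock values to a state name.

    Assumption: when the 0-degree sample is low (second half period),
    all samples are inverted so both halves give the same thermometer code.
    """
    q0 = bits[0]
    folded = [b if q0 == 1 else 1 - b for b in bits]
    return STATE_NAMES[sum(folded) - 1]


if __name__ == "__main__":
    period = 8.0
    for i in range(8):
        t = i + 0.5  # midpoint of each 1/8-period slot
        print(t, sample_phases(t, period), decode_state(sample_phases(t, period)))
```

In this model a data edge anywhere within a given eighth of the clock period always decodes to the same state, and the pattern I, II, III, IV repeats in the second half period.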
In the proposed DQFD, four DFFs triggered by the 0° clock store the sampled values and record the states. A rising edge of the 0° clock is required to ensure that a state has been recorded. In other words, all valid state transitions have to rotate counterclockwise and cross the arrow in Fig. 2(d). The arrow represents the rising edge of the 0° clock, which lies at the boundary between state IV and state I.

The operational principle of the proposed half-rate DQFD is discussed in the following. For slow periodic data, as shown in Fig. 3(a), suppose that the first rising edge of data appears at the boundary between state III and state IV. Assume that the bit time lies in the range given by (1), which is expressed in terms of the clock period. Then the second rising edge appears at the boundary between state IV and state I, and the state transition from state IV to state I is detected. This state transition indicates that the clock rate is faster than half the data rate; i.e., frequency DOWN should be active.

For fast periodic data in Fig. 3(a), the first rising edge appears at the boundary between state I and state II. Assume that the bit time lies in the range given by (2). The second rising edge appears at the boundary between state IV and state I, and the third one also appears at the boundary between state IV and state I. Note that the second rising edge should occur in state I. This state I is not recorded because no rising edge of the 0° clock occurs between the second and third rising edges of data. The new state IV sampled by the third rising edge of data replaces the previous state I, so the state transition rotates from state I to state IV. This state transition indicates that the clock rate is slower than half the data rate; i.e., frequency UP should be active. Combining (1) and (2), the range of the clock period can be expressed as (3).

0018-9200/04$20.00 © 2004 IEEE
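The direction of rotation described for slow and fast periodic data can be checked with a simplified timing sketch. It is only a behavioral model under stated assumptions: time is in arbitrary integer units, each state spans one eighth of the clock period, and the state is compared at consecutive data rising edges without modeling the retiming DFFs. The names `state_index` and `count_corrections` are hypothetical.

```python
from typing import Tuple


def state_index(t: int, period: int) -> int:
    """Return 0..3 for states I..IV; the pattern repeats every half period."""
    return (t % (period // 2)) // (period // 8)


def count_corrections(bit_time: int, period: int, n_edges: int = 200) -> Tuple[int, int]:
    """Count (UP, DOWN) events for 1010... data sampled against the clock.

    Assumption: only the I->IV and IV->I transitions between consecutive
    data rising edges are counted; period must be divisible by 8.
    """
    up = down = 0
    t = 0
    prev = state_index(t, period)
    for _ in range(n_edges):
        t += 2 * bit_time  # rising edges of periodic data are two bit times apart
        cur = state_index(t, period)
        if prev == 3 and cur == 0:
            down += 1  # IV -> I: clock faster than half the data rate
        elif prev == 0 and cur == 3:
            up += 1    # I -> IV: clock slower than half the data rate
        prev = cur
    return up, down


if __name__ == "__main__":
    period = 800
    for bit_time in (410, 390, 400):  # slow data, fast data, matched rate
        print(bit_time, count_corrections(bit_time, period))
```

With the clock period fixed, a longer bit time yields only DOWN events, a shorter bit time yields only UP events, and a matched rate yields none, which agrees with the qualitative behavior described above.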

Fig. 1. Half-rate CDR architecture.

Fig. 2. (a) Schematic of half-rate DQFD. (b) Combinational logics. (c) Truth table. (d) State representation.

Under this condition, the two state transitions mentioned above can be used to perform frequency acquisition for periodic data. For a random data sequence, there may be several bits between two consecutive rising edges of the nonreturn-to-zero (NRZ) data. The nominal number of such bits ranges from 2 to 13 for a $2^{7}-1$ pseudorandom bit sequence (PRBS). In practice, only some of these cases need to be considered, and one case occurs only once. As shown in Fig. 3(b), the time duration of these bits can be expressed as in (4), which involves two integers and two time offsets.

The two time offsets are defined as follows. If the first rising edge of data leads the 0° clock, the first offset is the time from the first rising edge of data to the rising edge of the 0° clock. Otherwise, it is the time from the first rising edge of data to the falling edge of the 0° clock. If the second rising edge of data leads the 0° clock, the second offset is the time from the falling edge of the 0° clock to the second rising edge of data. Otherwise, it is the time from the rising edge of the 0° clock to the second rising edge of data. Both offsets should be smaller than a fixed bound.

For the transition from state I to state IV shown in Fig. 3(b), the sum of the two offsets satisfies a bound specific to that transition. For all possible state transitions, the ranges of the sum of the two offsets are listed in Table I. Rearranging (4) gives (5).

Fig. 3. Timing diagram for (a) slow and fast periodic data, (b) the state transition from state I to state IV, and (c) the state transition from state IV to state I.

TABLE I. SUMMATION OF THE TWO TIME OFFSETS FOR ALL STATE TRANSITIONS

Suppose that this state transition is selected to speed up the clock; a corresponding inequality must then hold. Substituting the maximum value from (5) into (3) gives the range in (6). In one case, the integer in (6) is 0 and the inequality holds, which means that the frequency detection is correct. In another case, the possible values of the integer are 4 or 5, and one of them violates the inequality, yielding a malfunction of the DQFD. Generally speaking, as long as the probability of correct operation is greater than that of false operation, the DQFD will work properly. The state transition from state I to state IV can therefore be chosen for frequency UP with a random data sequence.

For the state transition from state IV to state I shown in Fig. 3(c), a corresponding bound on the sum of the two offsets is satisfied, and the resulting expression is given in (7). If this state transition is selected to slow down the clock, the opposite inequality must hold. Substituting the minimum value from (7) into (3) gives the range in (8). By the same method as before, the cases from 2 to 6 have been checked, and the frequency detection is correct for each of them. The state transition from state IV to state I can therefore be chosen for frequency DOWN with a random data sequence.
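The spacing between consecutive rising edges in a $2^{7}-1$ PRBS can be checked numerically. The sketch below assumes the common PRBS7 recurrence a[k] = a[k-6] XOR a[k-7]; the source does not state which generator polynomial was used, so this choice and the function names `prbs7_period` and `rising_edge_spacings` are assumptions.

```python
from collections import Counter
from typing import List


def prbs7_period() -> List[int]:
    """One 127-bit period of a PRBS7 (assumed recurrence a[k] = a[k-6] ^ a[k-7])."""
    bits = [1] * 7  # any nonzero seed works; all ones is used here
    for k in range(7, 127):
        bits.append(bits[k - 6] ^ bits[k - 7])
    return bits


def rising_edge_spacings(bits: List[int]) -> List[int]:
    """Bit counts between consecutive 0->1 transitions, treating bits as cyclic."""
    n = len(bits)
    edges = [i for i in range(n) if bits[i - 1] == 0 and bits[i] == 1]
    return [(edges[(j + 1) % len(edges)] - edges[j]) % n for j in range(len(edges))]


if __name__ == "__main__":
    spacings = rising_edge_spacings(prbs7_period())
    print("min:", min(spacings), "max:", max(spacings))
    print(sorted(Counter(spacings).items()))
```

For this generator the shortest spacing is 2 bits and the longest is 13 bits, and the 13-bit spacing appears once per period, since it needs the single run of seven ones followed by the single run of six zeros.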

Fig. 4. Transfer curve of the proposed DQFD.

When the clock rate lies near the upper or lower bound in (3), the density of frequency correction is very low. Two additional state transitions, from state I to state III and from state II to state IV, are therefore chosen to aid the speed-up process. Similarly, two additional state transitions, from state III to state I and from state IV to state II, are chosen for the slow-down process. The complete truth table of the half-rate DQFD, which works properly with random NRZ data, is shown in Fig. 2(c). The simulated transfer curve of the proposed DQFD is illustrated in Fig. 4.

Fig. 5. Die photo of this work.

TABLE II. PERFORMANCE SUMMARY

When the clock rate is close to half the data rate, their relation can be described by (9), which is written in terms of a small time difference. The state sampled by the input data remains the same, but the sampling point drifts by this time difference. When the sampling point crosses a state boundary, a state transition occurs. Assume a given time interval between two consecutive UP/DOWN pulses, with identical conditions throughout the interval. Within this interval, the accumulated phase difference can be expressed as in (10). Substituting (10) into (9), the normalized frequency offset can be calculated as (11). The frequency offset approaches zero as the interval between pulses approaches infinity. This implies that no UP/DOWN pulses are generated when the frequencies are the same, so the half-rate DQFD does not disturb the PD when locked.

The shifted-averaging VCO [8] with differential delay stages [9] is used to reduce the error caused by mismatches among the delay stages. Compared with a conventional VCO, the shifted-averaging VCO has better phase accuracy and jitter performance at the cost of doubling the gate count and power consumption [8].
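The state transitions assigned to the speed-up and slow-down processes of the half-rate DQFD can be tabulated as a lookup, as a minimal sketch. The six transitions are taken from the text; treating every other transition as "no pulse" is an assumption, since the full truth table of Fig. 2(c) is not reproduced here. The function name `dqfd_decision` is hypothetical.

```python
# Transitions (previous state, current state) taken from the text.
UP_TRANSITIONS = {("I", "IV"), ("I", "III"), ("II", "IV")}
DOWN_TRANSITIONS = {("IV", "I"), ("III", "I"), ("IV", "II")}
VALID_STATES = {"I", "II", "III", "IV"}


def dqfd_decision(prev_state: str, cur_state: str) -> str:
    """Return 'UP', 'DOWN', or 'NONE' for a recorded state transition.

    Assumption: transitions not listed in the text produce no pulse.
    """
    if prev_state not in VALID_STATES or cur_state not in VALID_STATES:
        raise ValueError("states must be one of I, II, III, IV")
    if (prev_state, cur_state) in UP_TRANSITIONS:
        return "UP"
    if (prev_state, cur_state) in DOWN_TRANSITIONS:
        return "DOWN"
    return "NONE"


if __name__ == "__main__":
    for prev in ("I", "II", "III", "IV"):
        print(prev, [dqfd_decision(prev, cur) for cur in ("I", "II", "III", "IV")])
```

The two sets are mirror images of each other, so reversing the direction of any listed transition flips the decision between UP and DOWN.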
A half-rate Hogge PD [5] is employed in this work to extract high-speed phase information. All logic components in the PD are implemented in current-mode logic (CML) [10]. Similarly, the charge pump is implemented with CML XOR gates [5].

III. EXPERIMENTAL RESULTS

The proposed CDR has been fabricated in a standard 0.18-µm CMOS technology. The two off-chip capacitors Ce and Cp are 1.7 nF and 60.3 nF, respectively. A 70-pF on-chip capacitor in parallel with Ce stabilizes the control voltage of the VCO and alleviates the disturbance caused by bond-wire inductance. This CDR consumes 83 mW from a single 1.8-V supply and occupies an active area of $0.6 \times 0.8$ mm$^2$, including the 70-pF on-chip capacitor. Fig. 5 shows the die photograph. Since the coding scheme in 10-Gbase-LX4 Ethernet is 8B/10B, the maximum run length of consecutive zeros or ones is 5. This CDR is measured using NRZ data with a $2^{7}-1$ PRBS. Fig. 6(a) illustrates the retimed data and clock. The output impedance of the open-drain buffer is not matched to 50 Ω, which causes an overshoot in the retimed data eye diagram. The measured jitter histogram and spectrum of the retimed clock when the CDR is locked to 3.125-Gb/s NRZ data are given in Fig. 6(b) and (c), respectively. The measured rms and peak-to-peak jitter are 2.2 and 16 ps, respectively. The measured bit-error rate (BER) is smaller than $10^{-12}$. Fig. 6(d) shows the measured jitter tolerance of this CDR, which meets the specifications of 10-Gbase-LX4 Ethernet. Table II gives the performance summary of this work.

Fig. 6. Measured (a) eye diagram of retimed data for $2^{7}-1$ PRBS, (b) jitter histogram of retimed clock, (c) spectrum of retimed clock, and (d) jitter tolerance.

IV. CONCLUSION

A 3.125-Gb/s CDR circuit incorporating the proposed shifted-averaging VCO and half-rate DQFD is realized in a standard 0.18-µm CMOS technology for 10-Gbase-LX4 Ethernet. The shifted-averaging VCO improves the phase accuracy and the jitter performance. The half-rate DQFD enlarges the pull-in range and does not interfere with the steady-state performance. According to the measurement results, the CDR circuit meets the 10-Gbase-LX4 Ethernet specifications and its BER is less than $10^{-12}$.

ACKNOWLEDGMENT

The authors would like to thank the Chip Implementation Center (CIC), Taiwan, for fabricating this chip.

REFERENCES

[1] Media Access Control (MAC) Parameters, Physical Layer, and Management Parameter for 10 Gb/s Operation, IEEE Draft P802.3ae/D3.3, 2000.
[2] S. J. Song, S. M. Park, and H. J. Yoo, "A 4-Gb/s CMOS clock and data recovery circuit using 1/8-rate clock technique," IEEE J. Solid-State Circuits, vol. 38, pp. 1213–1219, July 2003.
[3] S. H. Lee, M. S. Hwang, Y. Choi, S. Kim, Y. Moon, B. J. Lee, D. K. Jeong, W. Kim, Y. J. Park, and G. Ahn, "A 5 Gb/s 0.25 µm CMOS jitter-tolerant variable-interval oversampling clock/data recovery circuit," IEEE J. Solid-State Circuits, vol. 37, pp. 1822–1830, Dec. 2002.
[4] J. E. Rogers and J. R. Long, "A 10 Gb/s CDR/DEMUX with LC delay line VCO in 0.18-µm CMOS," IEEE J. Solid-State Circuits, vol. 37, pp. 1781–1789, May 2002.
[5] J. Savoj and B. Razavi, "A 10-Gb/s CMOS clock and data recovery circuit with a half-rate linear phase detector," IEEE J. Solid-State Circuits, vol. 36, pp. 761–767, May 2001.
[6] A. Pottbacker, U. Langmann, and H. Schreiber, "A Si bipolar phase and frequency detector IC for clock extraction up to 8 Gb/s," IEEE J. Solid-State Circuits, vol. 27, pp. 1747–1751, Dec. 1992.
[7] B. Stilling, "Bit rate and protocol independent clock and data recovery," Electron. Lett., vol. 36, pp. 824–825, Apr. 2000.
[8] H. H. Chang, S. P. Chen, and S. I. Liu, "A shifted-averaging VCO with precise multiphase outputs and low jitter operation," in Proc. 29th Eur. Solid-State Circuits Conf., Sept. 2003, pp. 647–650.
[9] W. S. T. Yan and H. C. Luong, "A 900-MHz CMOS low-phase-noise voltage-controlled ring oscillator," IEEE Trans. Circuits Syst. II, vol. 48, pp. 216–221, Feb. 2001.
[10] M. M. Green and U. Singh, "Design of CMOS CML circuits for high-speed broadband communications," in Proc. IEEE Int. Symp. Circuits and Systems, vol. II, May 2003, pp. 204–207.