Lecture 2. High-Speed I/O



Similar documents
ECEN689: Special Topics in High-Speed Links Circuits and Systems Spring 2010

ISSCC 2003 / SESSION 13 / 40Gb/s COMMUNICATION ICS / PAPER 13.7

11. High-Speed Differential Interfaces in Cyclone II Devices

ECEN474: (Analog) VLSI Circuit Design Fall 2010

Using Pre-Emphasis and Equalization with Stratix GX

Clocking. Figure by MIT OCW Spring /18/05 L06 Clocks 1

ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7

Clock Recovery in Serial-Data Systems Ransom Stephens, Ph.D.

Pericom PCI Express 1.0 & PCI Express 2.0 Advanced Clock Solutions

IDT80HSPS1616 PCB Design Application Note - 557

PL-277x Series SuperSpeed USB 3.0 SATA Bridge Controllers PCB Layout Guide

Alpha CPU and Clock Design Evolution

Equalization/Compensation of Transmission Media. Channel (copper or fiber)

6.976 High Speed Communication Circuits and Systems Lecture 1 Overview of Course

Jitter in PCIe application on embedded boards with PLL Zero delay Clock buffer

High-Speed Electronics

ICS379. Quad PLL with VCXO Quick Turn Clock. Description. Features. Block Diagram

A 3.2Gb/s Clock and Data Recovery Circuit Without Reference Clock for a High-Speed Serial Data Link

Fiber Optic Communications Educational Toolkit

DS2187 Receive Line Interface

Introduction to Digital Subscriber s Line (DSL)

TRIPLE PLL FIELD PROG. SPREAD SPECTRUM CLOCK SYNTHESIZER. Features

AVX EMI SOLUTIONS Ron Demcko, Fellow of AVX Corporation Chris Mello, Principal Engineer, AVX Corporation Brian Ward, Business Manager, AVX Corporation

PCB Design Conference - East Keynote Address EMC ASPECTS OF FUTURE HIGH SPEED DIGITAL DESIGNS

ZL40221 Precision 2:6 LVDS Fanout Buffer with Glitchfree Input Reference Switching and On-Chip Input Termination Data Sheet

A 1.62/2.7/5.4 Gbps Clock and Data Recovery Circuit for DisplayPort 1.2 with a single VCO

Introduction to CMOS VLSI Design (E158) Lecture 8: Clocking of VLSI Systems

Signal Integrity: Tips and Tricks

1+1 PROTECTION WITHOUT RELAYS USING IDT82V2044/48/48L & IDT82V2054/58/58L HITLESS PROTECTION SWITCHING

Equalization for High-Speed Serial Interfaces in Xilinx 7 Series FPGA Transceivers

Loop Bandwidth and Clock Data Recovery (CDR) in Oscilloscope Measurements. Application Note

Eye Doctor II Advanced Signal Integrity Tools

SPREAD SPECTRUM CLOCK GENERATOR. Features

Model-Based Synthesis of High- Speed Serial-Link Transmitter Designs

AN ESTIMATION APPROACH TO CLOCK AND DATA RECOVERY

Grounding Demystified

Lecture 11: Sequential Circuit Design

Figure 1 FPGA Growth and Usage Trends

BURST-MODE communication relies on very fast acquisition

LatticeECP3 High-Speed I/O Interface

Phase-Locked Loop Based Clock Generators

Topics of Chapter 5 Sequential Machines. Memory elements. Memory element terminology. Clock terminology

Signal Types and Terminations

8B/10B Coding 64B/66B Coding

Latch Timing Parameters. Flip-flop Timing Parameters. Typical Clock System. Clocking Overhead

Managing High-Speed Clocks

Designing the NEWCARD Connector Interface to Extend PCI Express Serial Architecture to the PC Card Modular Form Factor

Clocks Basics in 10 Minutes or Less. Edgar Pineda Field Applications Engineer Arrow Components Mexico

Chapter 6: From Digital-to-Analog and Back Again

Analysis and Design of Robust Multi-Gb/s Clock and Data Recovery Circuits

Chapter 6 PLL and Clock Generator

A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Linear Phase Detector

Application Note. PCIEC-85 PCI Express Jumper. High Speed Designs in PCI Express Applications Generation GT/s

Lecture 24. Inductance and Switching Power Supplies (how your solar charger voltage converter works)

USB 3.0 CDR Model White Paper Revision 0.5

Lecture 10: Sequential Circuits

The Future of Multi-Clock Systems

4 OUTPUT PCIE GEN1/2 SYNTHESIZER IDT5V41186

Lezione 6 Communications Blockset

ICS514 LOCO PLL CLOCK GENERATOR. Description. Features. Block Diagram DATASHEET

Title: Low EMI Spread Spectrum Clock Oscillators

Application Note AN:005. FPA Printed Circuit Board Layout Guidelines. Introduction Contents. The Importance of Board Layout

Lecture 7: Clocking of VLSI Systems

A 1.7 Gbps DLL-Based Clock Data Recovery for a Serial Display Interface in 0.35-μm CMOS

Application Note 58 Crystal Considerations with Dallas Real Time Clocks

The Bus (PCI and PCI-Express)

ICS SPREAD SPECTRUM CLOCK SYNTHESIZER. Description. Features. Block Diagram DATASHEET

Features. Modulation Frequency (khz) VDD. PLL Clock Synthesizer with Spread Spectrum Circuitry GND

Explore Efficient Test Approaches for PCIe at 16GT/s Kalev Sepp Principal Engineer Tektronix, Inc

TIMING-DRIVEN PHYSICAL DESIGN FOR DIGITAL SYNCHRONOUS VLSI CIRCUITS USING RESONANT CLOCKING

Transmission of High-Speed Serial Signals Over Common Cable Media

An Engineer s Guide to Full Compliance for CAT 6A Connecting Hardware

" PCB Layout for Switching Regulators "

ICS Pentium/Pro TM System Clock Chip. Integrated Circuit Systems, Inc. General Description. Pin Configuration.

Guidelines for Designing High-Speed FPGA PCBs

Timing Errors and Jitter

Jitter Transfer Functions in Minutes

W a d i a D i g i t a l

Fairchild Solutions for 133MHz Buffered Memory Modules

Eatman Associates 2014 Rockwall TX rev. October 1, Striplines and Microstrips (PCB Transmission Lines)

QAM Demodulation. Performance Conclusion. o o o o o. (Nyquist shaping, Clock & Carrier Recovery, AGC, Adaptive Equaliser) o o. Wireless Communications

Enhanced Category 5 Cabling System Engineering for Performance

T = 1 f. Phase. Measure of relative position in time within a single period of a signal For a periodic signal f(t), phase is fractional part t p

On Cables and Connections A discussion by Dr. J. Kramer

NTE2053 Integrated Circuit 8 Bit MPU Compatible A/D Converter

Lecture 3: Signaling and Clock Recovery. CSE 123: Computer Networks Stefan Savage

ANN Based Modeling of High Speed IC Interconnects. Q.J. Zhang, Carleton University

DS2186. Transmit Line Interface FEATURES PIN ASSIGNMENT

1. Memory technology & Hierarchy

How To Get A Better Signal From A Fiber To A Coax Cable

A 125-MHz Mixed-Signal Echo Canceller for Gigabit Ethernet on Copper Wire

INTEGRATED CIRCUITS. For a complete data sheet, please also download:

Computer buses and interfaces

MODULATION Systems (part 1)

KVPX CONNECTOR SERIES HIGH SPEED SIGNAL INTEGRITY REPORT

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Electronic Circuits Spring 2007

Wireless Security Camera

Transcription:

Lecture 2 High-Speed I/O Mark Horowitz Computer Systems Laboratory Stanford University horowitz@stanford.edu Copyright 2007 by Mark Horowitz, with material from Stefanos Sidiropoulos, and Vladimir Stojanovic 1

Readings Readings Techniques for High-speed Implementation of Nonlinear Cancellation, Sanjay Kasturia and Jack H. Winters Overview: Your project will be the design of a circuit that processes the input data from a high-speed I/O. This processing is generally done in a mixed signal manner today, but your job will be to build a digital implementation of the algorithm. This lecture will try to give you some background about why I/O rates are important, and what issues need to be resolved to achieve high performance. The next lecture will discuss the operation of the circuit you need to build. 2

Computers Today CPU DVI, HDMI AGP, PCI-E FSB, HT DDR, RDRAM FBDIMM Display >1GB/s Graphics Controller >4GB/s System Controller >4GB/s Memory PCI-X PCI-E Storage Network I/O >0.1GB/s I/O Controller PCI*, *ATA, USB.. 3

Speed of Light: The Difference Between I/O and On-Chip Wires First question: Why is I/O different from on-chip wires? Both send signals to each other Gates send data to each other all the time Don t generally worry about signals, or delay Model the connection between gates as a capacitor Sometimes a capacitor/resistor network Answer: On-chip, ignore the speed of light, assume c infinite For external wires can t make that assumption Wire connecting the pins is not an equipotential References are different 4

Finite Speed of Light Ramifications Signals must have delay in reaching destination Td = L/ν, bits arrive at a different time than when sent Thus must determine right time to sample them Wires store energy Current is set by the geometry of wire (what else?) Signal can t see termination resistor (causality) V/I for the line is called the impedance, Z < 300 Ω When signal is traveling on the wire Power goes into the wire before it hits load Since energy is conserved, wire must be storing energy Signal is ALWAYS a pair of currents 5

Link Issues Signaling: getting the bit to the receiver R TERM R TERM Tx Channel Rx Timing: Determining which bit is which 1 0 0 1 0 1 0 t bit /2 6

Transmission Lines Wire where you notice c is finite Current flows in one terminal And flows out the other Figure from John Poulton Energy is stored in E and B fields But can model with L, C 7

Problems : Material Loss Loss in GETEK : 1m, 8mil μstrip trace H(s) (transfer function) Frequency PCB Loss : skin & dielectric loss Skin Loss f Dielectric loss f : a bigger issue at higher f 8

Dealing With Current Return/References Wire Utilization: Single Ended shared signal return path Differential explicit signal return path - + Pseudo Differential ref - + 9

Transmission Lines Z2 Z1 ------------------- Z1+ Z2 2Z2 ------------------- Z1+ Z2 Z1 Z2 Two constraints govern behavior at any junction: Voltage are equal They are electrically connected Power is conserved Energy flow into junction is equal to transmitted and reflected 10

Z2 High-Speed Wires Are Point to Point Can t split a wire to go to two location You will get a reflection from the junction Z1 will see impedance discontinuity Z1 Z2 11

At High Speeds, Vias are Stubs Top layer signaling results in large via stub Signal energy splits at via If via is short can be modeled as a cap load Causes a reflection in signal Higher the frequency, the more sensitive you are to stubs 12

Backplane Environment Line card trace On-chip parasitic Package (termination resistance and device loading capacitance) Package via Back plane trace Back plane connector Line card via Backplane via Line attenuation Reflections from stubs (vias) 13

Backplane Channel Loss is variable Same backplane Different lengths Different stubs Top vs. Bot Attenuation is large >30dB @ 3GHz But is that bad? Attenuation [db] 0-10 -20-30 -40-50 -60 9" FR4, via stub 9" FR4 26" FR4, via stub 26" FR4 0 2 4 6 8 10 frequency [GHz] 14

Inter-Symbol Interference (ISI) Channel is low pass Our nice short pulse gets spread out pulse response 1 0.8 0.6 0.4 0.2 Tsymbol=160ps Dispersion short latency (skin-effect, dielectric loss) Reflections long latency (impedance mismatches connectors, via stubs, device parasitics, package) 0 0 1 2 3 ns 15

ISI 1 0.8 Error! Amplitude 0.6 0.4 0.2 0 0 2 4 6 8 10 12 14 16 18 Symbol time Middle sample is corrupted by 0.2 trailing ISI (from the previous symbol), 0.1 leading ISI (from the next symbol) resulting in 0.3 total ISI As a result middle symbol is detected in error 16

Equalization For Loss : Goal is to Flatten Response + = Channel is band-limited Equalization : boost high-frequencies; or attenuate low freq 17

Equalization Mechanisms 1 0.8 No equalization 0.6 0.4 Tx equalization Amplitude 0.6 0.4 Amplitude 0.2 0 0.2-0.2 0 0 2 4 6 8 10 12 14 16 18 Symbol time -0.4 0 2 4 6 8 10 12 14 16 18 Symbol time Tx equalization Pre-filter the pulse with the inverse of the channel Filters the low freq. to match attenuation of high freq. Rx feedback equalization Subtract the error from the signal 18

Removing ISI Linear transmit equalizer Tx Data Anticausal taps Sampled Data Deadband Feedback taps Channel Causal taps 50Ω d 50Ω outp outn d Tap Sel Logic Decision-feedback equalizer I eq0 Transmit and Receive Equalization Changes signal to correct for ISI Initial work was at transmitter J. Zerbe et al, "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal Solid-State Circuits, Dec. 2003. 19

Transmit Equalization Headroom Constraint Tx Data Anticausal taps Peak power constraint Channel Attenuation [db] 0-5 -10-15 unequalized equalized Causal taps -20 frequency [GHz] -25 0 0.5 1 1.5 2 2.5 Amplitude of equalized signal depends on the channel Transmit DAC has limited voltage headroom Unknown target signal levels Harder to make adaptive equalization work Need to tune the equalizer and receive comparator levels If you have multi-level signals 20

Removing Interference at Receiver Could also build a linear filter Could have gain in the filter But either it would need to be analog and have gain Or need high-speed A/D And real multiplication Sum (ai*xi) Increases channel noise too 21

High Frequency Channel Noise: Crosstalk Many sources On-chip Package PCB traces Inside connector Differential signaling can help Minimize xtalk generation & make effects common-mode Both NEXT & FEXT NEXT very destructive if RX and TX pairs are adjacent Full swing-tx coupling into attenuated RX signal Effect on SNR is multiplied by signal loss Simple solution : group RX/TX pairs in connector NEXT typically 3-6%, FEXT typically 1-3% 22

Subtract Out Residual Interference Called Decision feedback equalization (DFE) Subtracts error from input No attenuation 1 0.8 Feedback equalization Problem with DFE Need to know interfering bits ISI must be causal Problem - latency in the decision circuit Receive latency + DAC settling < bit time Can increase allowable time by loop unrolling Receive next bit before the previous is resolved Amplitude 0.6 0.4 0.2 0 0 2 4 6 8 10 12 14 16 18 Symbol time 23

Removing ISI Linear transmit equalizer Tx Data Anticausal taps Sampled Data Deadband Feedback taps Channel Causal taps 50Ω d 50Ω outp outn d Tap Sel Logic Decision-feedback equalizer I eq0 Transmit and Receive Equalization Changes signal to correct for ISI Initial work was at transmitter J. Zerbe et al, "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal Solid-State Circuits, Dec. 2003. 24

One Bit Loop Unrolling (for 2 level signal) 2PAM signal constellation 1 1 + αd 1 +α K.K. Parhi, "High-Speed architectures for algorithms with quantizer loops," IEEE International Symposium on Circuits and Systems, May 1990 1 +α +1 1 α +α +α 1 α + α 1 = 1 d n d n 0 α α x n dclk DQ d n 1 1 1 +α 1 α 1 +α 1 α α dclk 1 = 0 d n d n Instead of subtracting the error Move the slicer level to include the interference Slice for each possible level, since previous value unknown 25

More Bits/Hz Multi-level signaling (aka PAM) Convert extra voltage margin to more bits Works well when the noise is small Need even more signal processing 26

Internal Speed Limitation Links need good quality clocks with low jitter That means you want them to settle to both Vdd, and Gnd If you make the clock to fast, it will not rail And that means it will be prone to jitter So one limitation for links is internal clock rate For power efficiency want FO on clock to be around 4 Need pulse width 3-4 times the slowest gate Gives around 8 FO4 clock For higher speed bit rates Need to generate multiple bits/clock Use non-static CMOS clock circuits (CML & inductors) 27

Simple Demultiplexing Receiver Input Data_E in ref pre latch Data_O clk clk 2-1 demux at the input Preconditioning stage: filter/integrate, can be clocked to avoid ISI Reject CM Sometimes not used Latch makes decision (4-FO4) 28

Simple Multiplexing Transmitter DDR: send a bit per clock edge Critical issues: 50% duty cycle Tbit > 4-FO4 Data_O Data_E output pulse width closure (%) 30 20 10 0 1 2 3 4 5 bit time (normalized to FO4) 29

I/O Clocking Issues Remember the clocking issues: Long path constraint (setup time) Short path constraint (hold time) Need to worry about them for I/O as well For I/O need to worry about a number of delays Clock skew between chips Data delay between chips Can be larger than a clock cycle (speed of light) Clock skew between external clock and internal clock This can be very large if not compensated It is essentially the insertion delay of the clock tree 30

System Clocking: Simple Synchronous Systems d1 CK X CK X D I CK C1 CK C2 D I d2 on-chip logic CK C1 CK C2 Long bit times compared to on chip delays: Rely on buffer delays to achieve adequate timing margin 31

PLLs: Creating Zero Delay Buffers PLL/DLL CK X CK C CK X D I on-chip logic D I CK C On-chip clock might be a multiple of system clock: Synthesize on-chip clock frequency On-chip buffer delays do not match Cancel clock buffer delay 32

Used to Argue About PLLs vs DLLs VCO VCDL clk clk N ref clk PD ref clk PD Filter Filter Second/third order loop: Stability is an issue Frequency synthesis easy Ref. Clk jitter gets filtered Phase error accumulates First order loop: Stability guaranteed Frequency synthesis problematic Ref. Clk jitter propagates Phase error does not accumulate 33

After Many Years of Research And many papers and products One can mess up either a DLL or PLL Each has it own strengths and weaknesses If designed correctly, either will work well Jitter will be dominated by other sources Many good designs have been published It is now a building block that is often reused We all have our favorites, mine is the dual-loop design And yes, people use ring oscillators Still an open question about how much LC helps (in system) 34

Clocking Structures Synchronous: Same frequency and phase Conventional buses t t F 0 Mesochronous Same frequency, unknown phase Fast memories Internal system interfaces MAC/Packet interfaces t A t A t B F 0 t B Plesiochronous: Almost the same frequency Mostly everything else today F 1 F 2 F 1 F 2 35

Source Synchronous Systems CK SRC PLL/DLL CK RCV data rcvr logic ref CK SRC data D 0 D 1 D 2 D 3 CK RCV Position on-chip sampling clock at the optimal point i.e. maximize timing margin 36

Serial Link Circuit CK R rcvr logic D IN D 0 D 1 D IN CDR CK R Recover incoming data fundamental frequency Position sampling clock at the optimal point 37