PCI Express 4.0 Electrical Previews Rick Eads Keysight Technologies, EWG Member
Disclaimer The information in this presentation refers to specifications still in the development process. This presentation reflects the current thinking of various PCI-SIG workgroups, but all material is subject to change before the specifications are released. Copyright 2014, PCI-SIG, All Rights Reserved 2
Overview PCIe 4.0 motivations and assumptions Separate Reference Clocks with Independent SSC (SRIS) Transmitter Reference Clock Receiver Channel and CEM connector Copyright 2014, PCI-SIG, All Rights Reserved 3
PCIe 4.0 Motivations and Assumptions We continue to see a requirement to increase PCIe bandwidth Networking, Storage, High Performance Computing Motivations for PCIe 2.x->3.0 apply equally for 3.0->4.0 Eco-system impact of a new generation drives requirement for 2x increase in delivered bandwidth Desirable to extend PCIe 3.0 infrastructure and PHY architecture for another generation Moving to a new infrastructure such as electrical or optical waveguides likely breaks backwards compatibility Highly desirable to preserve current usage models With incremental improvements 3.0 PHY architecture is capable of higher data rates Copyright 2014, PCI-SIG, All Rights Reserved 4
PCIe 4.0 Overview Key attributes of PCIe 4.0 16 GT/s, using scrambling, same as 8GT/s Maintains backward compatibility with installed base of PCIe devices Limited channel reach: approx. 12 one connector Longer channels require retimers or lower loss channels New features Uniform spec methodology applied across all data rates (as possible) Support for independent Refclk clocking mode with SSC (SRIS) Integration of Retimer ECN This presentation focuses on 0.3 spec changes vs. the 3.0 spec and on items under discussion for potential changes in the 0.5 specification Transmitter Reference Clock Retimer Receiver Channel Copyright 2014, PCI-SIG, All Rights Reserved 5
Separate Reference Clocks with Independent SSC (SRIS) Copyright 2014, PCI-SIG, All Rights Reserved 6
Inexpensive Cabling = Independent Clock + Spread Spectrum Challenge: PCIe spec did not support independent clock with spread spectrum SATA cable does not include clock and is ~ $0.50 PCIe cables include reference clock, would increase cost > $1 for equivalent cable PCIe Base Spec 3.0 ECN approved 1) Requires use of larger elasticity buffer 2) Requires more frequent insertion of SKIP ordered set 3) Requires receiver changes (CDR). Does not change transmitter or reference clock requirements. 4) Second ECN updates Model CDRs Change will create a number of new form factor opportunities for PCIe SATA Express: Connector for PCIe SSD compatible with SATA Lower cost external cabled PCIe Introduce New Terms for Separate Refclk Modes of Operation 5600ppm (New - SRIS) and 600ppm (Existing - SRNS) Example of Possible PCIe x2 Cable Copyright 2014, PCI-SIG, All Rights Reserved 7
Model CDR Must Reject SSC Not Enough SSC Rejection 20 db/dec Copyright 2014, PCI-SIG, All Rights Reserved 8
4.0 Rev 0.3 SRIS/IR Model CDRs (First 3.x ECN CDR) 8 GT/s 16 GT/s ω n = ω 3dB = 2π * 2.09 MHz for 8 GT/s ω n = ω 3dB = 2π * 2.43 MHz for 16 GT/s ζ = 0.707 Kept 40 db/dec, critically damped model CDRs for 2.5/5 GT/s consistency There were concerns about TX and Ref Clock test Copyright 2014, PCI-SIG, All Rights Reserved 9
8GT/s Jitter Tolerance vs. CDR Model Gain Frequency (Hz) Copyright 2014, PCI-SIG, All Rights Reserved 10
H(s) in db New SRIS Model CDRs Frequency (Hz) Copyright 2014, PCI-SIG, All Rights Reserved 11
New Model CDR Xfer Functions The inverse of H(s) is lower than the SRIS JTOL spec for PCIe 3.0 and PCIe 4.0 Differentiable at all points Parameters: 8GT/s: 16GT/s: Copyright 2014, PCI-SIG, All Rights Reserved 12
Transmitter Copyright 2014, PCI-SIG, All Rights Reserved 13
Transmitter Specification Preset definition Retain P0-P10 with same definition as PCIe 3.0 at 8GT/s Package loss (ps21tx) Informative for root complex devices, normative for AIC devices Architecture Specific Post Processing Embedded vs. non-embedded, Common vs. Independent Refclk architectures Jitter parameters Applied uniformly for all four data rates Number of normative parameters reduced Informative parameters added Return Loss extended up to 8GHz Same limits as at 4.0 GHz T-coils likely required to meet limits Copyright 2014, PCI-SIG, All Rights Reserved 14
Return Loss (db) Return Loss (db) 50 MHz 1.25 GHz 2.5 GHz 4.0 GHz 8.0 GHz 50 MHz 1.25 GHz 2.5 GHz 4.0 GHz 8.0 GHz Tx/Rx RL Parameters Differential Return Loss Mask Common Mode Return Loss Mask A B C D A B C D A: 2.5, 5.0, 8.0 and 16 GT/s B: 5.0, 8.0 and 16 GT/s C: 8.0 and 16.0 GT/s D: 16 GT/s only A: 2.5, 5.0, 8.0 and 16 GT/s B: 5.0, 8.0 and 16 GT/s C: 8.0 and 16.0 GT/s D: 16 GT/s only Freq (GHz) Freq (GHz) Effective pad capacitance around 400 pf Copyright 2014, PCI-SIG, All Rights Reserved 15
CEM Spec Tx Path System Board System Board Tx Add-in Card TX AC Coupling Caps PCE Conne ctor RX RX TX Component Add-in Card Tx CEM Spec Defines Tx Requirements for Chip + Interconnect No Separate Tx Chip Or Interconnect Only Requirements. Copyright 2014, PCI-SIG, All Rights Reserved 16
Summary of Base vs. CEM Differences for Tx Testing CEM Tx Testing is at the end of the CEM reference channel Eye can already be closed at CEM connector with long channel motherboards Waveform-based test with reference equalizer application in postprocessing was chosen as the only option to asses overall Tx interoperability of channel plus silicon CEM Tx test is an eye test @ BER 10-12 after applying the reference equalizer No jitter decomposition beyond Rj/Dj due to end of channel reference point CEM Tx eye test only required to pass with best preset Copyright 2014, PCI-SIG, All Rights Reserved 17
Summary of Base vs. CEM Differences for Tx Testing Motherboard Tx test is done with real motherboard clock (not a clean/lab reference clock) Do not want to add cost/complexity and require a motherboard to provide method for external clock source Want a method that can test real, off-the-shelf motherboards Motherboard Tx test is done by sampling data lane under test and 100 MHz reference clock simultaneously (dual port methodology) Explore consistency between Base (4.0) and CEM (4.0) Specified/Recommended measurement/calibration method for eye after reference equalizer Study possible test/reference channel commonality Copyright 2014, PCI-SIG, All Rights Reserved 18
Reference Clock Copyright 2014, PCI-SIG, All Rights Reserved 19
Refclk Specifications Architecture independent parameters Architecture dependent parameters Common Clock (CC) and Independent Reference Clock (IR) filter functions IR with SSC (SRIS) defined in 3.0 ECN and 4.0 specification Explicit listing of all combinations of PLL and CDR limits that need to be evaluated Normative limits for CC Informative limits for IR (may be removed) A PHY may support one or more modes Copyright 2014, PCI-SIG, All Rights Reserved 20
Architecture Independent Parameters Note that T SSC_MAX_PHASE_SKEW is no longer defined There is now an explicit freq vs. amplitude mask for SSC profile phase jitter Symbol Description Limits Units Notes F REFCLK Refclk Frequency 99.97 (min), 100.03 (max) MHz F SSC SSC frequency range 30 (min), 33 (max) KHz T SSC-FREQ- DEVIATION T TRANSPORT-DELAY T SSC-MAX-FREQ- SLEW SSC deviation -0.5 (min), 0.0 (max) % Tx-Rx transport delay 12 (max) Nsec 1 Max SSC df/dt 1250 ppm/usec 2 Copyright 2014, PCI-SIG, All Rights Reserved 21
Jitter (ps pp) Low Frequency Reference Clock Jitter Limits (Mask) 30-33 KHz 25000 ps 100 KHz 1000 ps 500 KHz 25 ps Frequency (Hz) Copyright 2014, PCI-SIG, All Rights Reserved 22
Refclk Topologies Common Refclk Data In Tx Latch Tx PLL channel T 2 T = T 1 T 2 Ref clk RxEQ T 1 Rx Latch CDR Rx PLL Data Out H 3 (s) = H 1 (s) = H 2 (s) = Transfer Function: Jitter at Rx latch [H 1 (s)e -st = H 2 (s)]h 3 (s) X(s)[H 1 (s)e -st = H 2 (s)]h 3 (s) X(s)[H 2 (s)e -st = H 1 (s)]h 3 (s) Compute both and use larger of the two Data Clocked Tx latch Tx PLL channel RxEq Rx latch CDR Ref clk Refclk jitter = X(s) Transfer function: Jitter at Rx latch: H(s) = H 1 (s) * H 3 (s) J(s) = X(s) * H 1 (s) * H 3 (s) Copyright 2014, PCI-SIG, All Rights Reserved 23
IR Reference Clock Test (0.3) Tx latch channel RxEq Rx latch Refclk1 jitter = X 1 (s) Refclk2 jitter = X 2 (s) CDR Tx PLL Ref clk #1 Ref clk #2 Rx PLL Transfer Function: Jitter at Rx latch: H(s) = [H 1 (s) + H 2 (s)] * H 3 (s) J(s)= [X 1 (s)h 1 (s) + X 2 (s)h 2 (s)] * H 3 (s) Incorrect for Clock Test Options to fix X1(s)H1(s)*H3(s) <.7 ps (.5 ps in ECN) 8 GT/s (same as SRIS ECN) and 16GT/s Define reference worst case H 2 (s) Remove for IR/SRIS mode altogether Copyright 2014, PCI-SIG, All Rights Reserved 24
SRIS/IR Reference Clock Test Currently informative in 4.0 Rev 0.3 Pessimistic assumes worst case specification compliant model PLL transfer function Difficult to meet for current discrete clock chips even with improved model CDR Should 100 MHz frequency be required/implied for a SRIS/IR only implementation? Reference clock test not specified by other standards with similar PHY architectures USB 3.0, 3.1, SATA Current direction to remove altogether for SRIS/IR Mode Allow maximum implementation PLL/Transmitter trade-off Copyright 2014, PCI-SIG, All Rights Reserved 25
Receiver Copyright 2014, PCI-SIG, All Rights Reserved 26
Receiver Specification Stressed eye methodology applied to all data rates Stressed jitter and voltage as a single test Calibration channel defined by data rate dependent mask Tentative direction to make variable at 16GT/s Minimize Rj/Sj/DM variation across different set-ups Separate Root Complex and AIC behav pkg models Behavioral Rx equalization data rate dependent 2.5 and 5.0G: none 8.0 and 16.0G: CTLE and DFE (8G: 1 tap, 16G, 2 taps) Copyright 2014, PCI-SIG, All Rights Reserved 27
Calibrating Stressed Eye: Rev 0.3 Fixed TX EQ 8 GT/s PRBS Generator Combiner Calibration Channel Replica Channel Test Equipment TP1 TP2 Rj Source Sj Source Diff Interfer ence CM Interfer ence Post Processing Scripts: Rx pkg model Behaviorial CTLE/DFE Behavioral CDR EW Adjust EH Adjust TP2P 25 mv /.3 UI at E-12 BER Copyright 2014, PCI-SIG, All Rights Reserved 28
Calibrating Stressed Eye: Rev 0.5 Direction Fixed TX EQ Calibration Channel EH or EW Adjust 16 GT/s PRBS Generator Combiner Replica Channel Test Equipment TP1 CEM Connector TP2 Rj Source Sj Source Diff Interfer ence CM Interfer ence Post Processing Scripts: Rx pkg model Behaviorial CTLE/DFE Behavioral CDR Small EW Adjust Small EH Adjust TP2P 25 mv /.3 UI at E-12 BER Copyright 2014, PCI-SIG, All Rights Reserved 29
Channel and Connector Copyright 2014, PCI-SIG, All Rights Reserved 30
PCIe Connector/Card Enablers Mitigate conductor geometry that impairs performance in the PCIe connector at 16GT/s Any changes must preserve full compatibility among combinations of Gen 25, 5, 8, 16GT/s connectors and Add-in Card (AIC) Keep the standard thru-hole pinfield, for thru hole parts For surface mount, use a common footprint Plan to build and characterize a test board to validate and correlate models for the proposed performance enablers Copyright 2014, PCI-SIG, All Rights Reserved 31
1.30 Single Single Single Double Double 4.40 5.60 4.30 3.91 RESERVED GROUND TX0P TX0N GROUND PRESENT GROUND TX1P TX1N GROUND GROUND TX2P TX2N GROUND GROUND 7.85 2.00 2.25 Test Layout 0: Baseline The trace between the via and ground finger is not addressed in the CEM spec Ground traces are unconstrained in spec The length, width, and shape of the ground trace has been implementation specific Ground vias 10 mil Drill (0.254mm) 20 mil Pad FULL GROUND PLANE Ground traces are 10 mils wide after finger Connector body 2.2mm above point of contact, 7.85 mm above board edge The ground traces, above the ground finger, may be straight, like here or hockey-stick, etc. 0.70 1.00 For this baseline test, use these common PCIe 3.0 edge finger dimensions 2mm long, 0.508mm (20 mil) wide ground trace, as shown CHAMFER REGION Copyright 2014, PCI-SIG, All Rights Reserved 32
PCIe 4.0 Connector Experiments On the Add-in Card PCB: 0. Baseline Typical 8GT/s 1. Adjacent ground vias Raises resonant frequency 2. Join the ground edge fingers Reduces crosstalk and raises resonant freq 3. Narrow the ground fingers Improves overall insertion loss Ground finger resistive termination Baseboard & Connector changes: 1. Surface mount connector 2. Thru Hole with stub 3. Thru Hole with no stub 4. Thru Hole with via stub mitigation Reduces baseboard PCB via resonance 4. Ground finger resistive termination Suppresses all resonance 5. Place floating subsurface resonant structures beneath ground fingers Suppresses resonant insertion loss/crosstalk spikes Copyright 2014, PCI-SIG, All Rights Reserved 33
Combined Experiments - Simulation Dashed: Baseline Solid: Improved Improved Lane 0 1 Improved Lane 1 2 Improved Lane 0 2 Baseline 1 2 Tx0: 2x Single Grounds Tx1: 1x Double Ground 1x Single Ground Tx2: 2x Double Grounds Differential Insertion Loss Differential FEXT/NEXT Copyright 2014, PCI-SIG, All Rights Reserved 34
Combined Experiments - Simulation 8GHz Discontinuities Improved Lane 0 1 Improved Lane 1 2 Improved Lane 0 2 Baseline 1 2 Dashed: Baseline Solid: Improved Tx0: 2x Single Grounds Tx1: 1x Double Ground 1x Single Ground Tx2: 2x Double Grounds Differential Insertion Loss Differential FEXT/NEXT Copyright 2014, PCI-SIG, All Rights Reserved 35
Channel Simulation Random data Step response victim and aggressors P0/P1 Aggressor Center Alignment TxEQ Voltage step statistical convolution Time statistical convolution Locate and measure eye 2.5G, 5.0G T TX-CH-UPW-RJ T TX-CH-UPW-DJ T TX-CH-URJ T TX-CH-UDJDD Random data Step response victim and aggressors EQ search space and FoM PDA Equalizer Optimization and Aggressor Center Alignment A DC DFE TxEQ Apply CTLE Voltage step statistical convolution Time statistical convolution Locate and measure eye 8.0G, 16G T TX-CH-UPW-RJ T TX-CH-UPW-DJ T TX-CH-URJ T TX-CH-UDJDD Copyright 2014, PCI-SIG, All Rights Reserved 36
Channel Topologies Target max length PCIe 3.0 (8 Gb/s) server channel is ~20 with 1 or 2 connectors Pathfinding for 16GT/s shows ~12 with 1 connector Even with reduced channel length mitigation is required Improvements to the CEM connector/launch Clean up via transitions Minimize crosstalk Center channel impedance ~85Ω Retimer required for longer or 2 connector topologies Copyright 2014, PCI-SIG, All Rights Reserved 37
Channel Recommendations The CEM form factor is the most important usage model for PCI Express Can be extended by another generation with improvements to CEM connector launch Current CEM channels are electrically very complex beyond 8GT/s Discontinuities and crosstalk from packages, sockets, vias, etch, coupling capacitors, CEM connector Non-monotonic frequency domain behavior yields unpredictable data rate scaling To extend current infrastructure requires enabling SIG membership to design and build cleaner channels Tuning via launches, minimizing layer transitions, careful layer choices For longer reach channels, back-drilling, lower loss materials and repeaters will be required Copyright 2014, PCI-SIG, All Rights Reserved 38
Thank you for attending the. For more information please go to www.pcisig.com Copyright 2014, PCI-SIG, All Rights Reserved 39