PCI Express Impact on Storage Architectures and Future Data Centers
Ron Emerick, Oracle Corporation

Abstract
PCI Express Gen2 and Gen3, IO Virtualization, FCoE and SSD are here or coming soon. This session describes PCI Express, Single Root and Multi Root IOV, their implications for FCoE and SSD, and the impact of all these changes on storage connectivity and storage transfer rates. The potential implications for the storage industry and data center infrastructure will also be discussed. This tutorial will provide the attendee with:
- Knowledge of PCIe architecture, the PCIe roadmap, system root complexes and IO virtualization
- The expected industry roll-out of the latest IO technology and the required root complex capabilities
- Implications and impacts of FCoE, SSD and IO virtualization on storage connectivity
- IO virtualization connectivity possibilities in the data center (via PCIe)

Agenda
- IO Architectures: PCI Express is here to stay
- PCI Express Tutorial: new PCI Express based architectures; how PCI Express works
- IO Evolving Beyond the Motherboard: serial interfaces (InfiniBand, 10 GbE, 40 GbE, 100 GbE)
- PCIe IO Virtualization: review of PCI Express IO virtualization
- Impact of PCI Express on Storage

I/O Architecture
- PCI provides a solution to connect the processor to IO
  - Standard interface for peripherals (HBA, NIC, etc.)
  - Many man-years of code developed based on PCI; we would like to keep this software investment
- Performance keeps pushing IO interface speed
  - PCI/PCI-X: 33 MHz, 66 MHz to 133 MHz
  - PCI-X at 266 MHz released
  - Problems at PCI-X 533 MHz with load and trace length
- Parallel interfaces have almost all been replaced
  - ATA/PATA to SATA
  - SCSI to SAS (UWDIS may finally be gone)
- Like the other parallel interfaces, PCI has migrated to serial PCI Express

PCI Express Introduction
- The PCI Express architecture is a high performance IO interconnect for peripherals in computing/communication platforms
  - Evolved from the PCI and PCI-X architectures
  - Yet the PCI Express architecture is significantly different from its predecessors PCI and PCI-X
- PCI Express is a serial, point-to-point interconnect (4 pins per lane) between two devices
  - Implements a packet-based protocol for information transfer
  - Scalable performance based on the number of signal Lanes implemented on the interconnect

PCI Express Overview
- Uses PCI constructs
  - Same Memory, IO and Configuration model
  - Identified via BIOS, UEFI, OBP
  - Supports growth via speed increases
- Uses the PCI usage and Load/Store model
  - Protects the software investment
- Simple serial, point-to-point interconnect
  - Simplifies layout and reduces costs
  - Chip-to-chip and board-to-board: IO devices can exchange data, system boards can exchange data
- Separate receive and transmit Lanes
  - 50% of bandwidth in each direction

PCIe: What's a Lane?
- A point-to-point connection between two PCIe devices
- [Diagram: Device A's transmit pair (T+/T-) connects to Device B's receive pair (R+/R-), and Device B's transmit pair connects to Device A's receive pair]
- This represents a single Lane, using two pairs of traces: the TX of one device to the RX of the other

PCIe: Multiple Lanes
- Links, Lanes and Ports
- [Diagram: a 4-Lane (x4) connection; the Port on Device A connects to the Port on Device B, and the group of Lanes between them forms the Link]

PCI Express Terminology
[Diagram: PCI Express Device A (Root Complex Port) connected to PCI Express Device B (Card in a Slot); each wire of a differential pair (T+/T-, R+/R-) carries a signal, a transmit pair plus a receive pair forms a Lane, and the collection of Lanes forms the Link]

Transaction Types
Requests are translated to one of four types by the Transaction Layer:
- Memory Read or Memory Write: used to transfer data to or from a memory-mapped location. The protocol also supports a locked memory read transaction variant.
- IO Read or IO Write: used to transfer data to or from an IO location. These transactions are restricted to supporting legacy endpoint devices.

Transaction Types (cont.)
Requests can also be translated to:
- Configuration Read or Configuration Write: used to discover device capabilities, program features, and check status in the 4 KB PCI Express configuration space.
- Messages: handled like posted writes. Used for event signalling and general-purpose messaging.
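As an aside that is not part of the original tutorial, the short sketch below shows one way to observe configuration reads from user space on Linux: the kernel exposes each function's configuration space as a binary "config" file under sysfs, and reading it returns the standard header fields (vendor ID, device ID, command, status). The device address used here is hypothetical; substitute a real bus:device.function from lspci.

    # Minimal sketch, assuming a Linux host and a device at the hypothetical
    # address 0000:00:00.0; the sysfs "config" file exposes the function's
    # PCI configuration space (first 64 bytes = standard header).
    import struct

    BDF = "0000:00:00.0"  # hypothetical bus:device.function

    with open(f"/sys/bus/pci/devices/{BDF}/config", "rb") as f:
        header = f.read(64)

    vendor_id, device_id = struct.unpack_from("<HH", header, 0)
    command, status = struct.unpack_from("<HH", header, 4)
    print(f"{BDF}: vendor 0x{vendor_id:04x}, device 0x{device_id:04x}, "
          f"command 0x{command:04x}, status 0x{status:04x}")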

PCI Express Throughput
Aggregate bandwidth (GB/s) by link width:

  Link Width     x1    x2    x4    x8    x16   x32
  Gen1 (2004)    0.5   1     2     4     8     16
  Gen2 (2007)    1     N/A   4     8     16    32
  Gen3 (2010)    2     N/A   8     16    32    64

- Assumes 2.5 GT/s signalling for Gen1 and 5 GT/s for Gen2; 80% of the bandwidth is available due to 8b/10b encoding overhead
- Assumes 8 GT/s signalling for Gen3
- Aggregate bandwidth implies simultaneous traffic in both directions
- Peak bandwidth is higher than any bus available
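For readers who want to re-derive the figures, the small worked example below (added for this transcription, not part of the slides) reproduces the per-lane and aggregate numbers from each generation's signalling rate and line-encoding efficiency:

    # Reproduces (approximately) the throughput table above from signalling
    # rate and encoding overhead; "aggregate" counts both directions at once.
    GENERATIONS = {
        "Gen1": (2.5, 8 / 10),     # 2.5 GT/s, 8b/10b encoding
        "Gen2": (5.0, 8 / 10),     # 5.0 GT/s, 8b/10b encoding
        "Gen3": (8.0, 128 / 130),  # 8.0 GT/s, 128b/130b encoding
    }

    def lane_gbytes_per_s(gt_per_s, efficiency):
        """Usable GB/s per lane, per direction."""
        return gt_per_s * efficiency / 8  # 8 bits per byte

    for gen, (rate, eff) in GENERATIONS.items():
        per_lane = lane_gbytes_per_s(rate, eff)
        for width in (1, 4, 8, 16, 32):
            print(f"{gen} x{width}: {per_lane * width:.2f} GB/s per direction, "
                  f"{2 * per_lane * width:.2f} GB/s aggregate")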

PCI-X vs PCIe Throughput
How does PCI-X compare to PCI Express?
[Chart: maximum throughput (MB/s) of PCI, PCI-X, PCI-X (DDR), PCI-X (QDR) and PCIe Gen1/Gen2/Gen3 at x4, x8 and x16 widths]
- PCI-X QDR maxes out at 4263 MB/s per leaf
- PCIe x16 Gen1 maxes out at 4000 MB/s
- PCIe x16 Gen3 maxes out at 16000 MB/s

IO Bandwidth Needs
[Chart: maximum PCIe throughput (MB/s) of x1, x4, x8 and x16 links across Gen1, Gen2 and Gen3, plotted against the bandwidth required by 8 Gb FC, 10 Gb Ethernet, 16 Gb FC and QDR IB, and 40 Gb Ethernet, 40 Gb FCoE and EDR IB]
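As a rough back-of-the-envelope companion to the chart (added here for illustration; the encoding overheads of the individual link technologies are ignored), the sketch below estimates how wide a PCIe link of each generation is needed to feed a given port:

    # Approximate PCIe width needed to carry a port at full rate in one
    # direction; a crude bits-to-bytes conversion stands in for the real
    # protocol efficiencies, so treat the output as an estimate only.
    import math

    PCIE_LANE_GBYTES = {"Gen1": 0.25, "Gen2": 0.5, "Gen3": 0.985}  # per lane, per direction
    SUPPORTED_WIDTHS = (1, 2, 4, 8, 16, 32)
    LINKS_GBITS = {"8 Gb FC": 8, "10 GbE": 10, "16 Gb FC": 16, "40 GbE / FCoE": 40}

    def width_needed(link_gbits, lane_gbytes):
        """Smallest supported PCIe width that can carry the link one way."""
        lanes = math.ceil((link_gbits / 8) / lane_gbytes)
        return next(w for w in SUPPORTED_WIDTHS if w >= lanes)

    for link, gbits in LINKS_GBITS.items():
        widths = ", ".join(f"{gen} x{width_needed(gbits, g)}"
                           for gen, g in PCIE_LANE_GBYTES.items())
        print(f"{link}: {widths}")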

Sample PCI Express Topology
[Diagram: sample PCIe topology]

Benefits of PCI Express
- Lane expansion to match need
  - x1: low cost, simple connector
  - x4 or x8: PCIe adapter cards
  - x16: PCIe high performance graphics cards
- Point-to-point interconnect allows for:
  - Extending PCIe via signal conditioners and repeaters
  - Optical and copper cabling to remote chassis
  - External graphics solutions
  - External IO expansion
- Infrastructure is in place
  - PCIe switches and bridges
  - Signal conditioners

PCI Express at the SIG (Gen 1)
- PCIe Gen 1.1 approved 2004/2005
  - Frequency of 2.5 GT/s per Lane, full duplex (FD)
  - Uses 8b/10b encoding => 250 MB/s per lane (FD): 2.5 GT/s x (8/10 encoding) / (8 bits/byte) = 250 MB/s FD
  - PCIe overhead of 20% yields 200 MB/s per lane (FD)
  - Replaces PCI-X DDR and QDR
  - Roadmap defined
- Switches, bridges and devices
  - x16 high performance graphics @ 50W (then 75W)
  - x8, x4, x1 connectors (x8 is pronounced "by 8")
  - A connector supports its lane width and all lower lane widths
  - Defined the Express Module

PCI Express in Industry
- PCIe Gen 2.0 shipped in 2008 (approved 2007)
  - Frequency of 5.0 GT/s per Lane
  - Doubled the theoretical bandwidth to 500 MB/s per lane (4 GB/s per x8)
  - Still uses 8b/10b encoding
  - Support for Geneseo features added (details later)
  - Power for x16 increased to 225W
- Desktop systems started shipping Gen2 x16 slots in Q4 2007; servers started shipping slots in 2009
- Cards available
  - x4, x8 cards: single/dual 10 GbE, dual/quad GbE, single/dual 10 Gb CNA, single/dual 4/8 Gb FC, SAS 2.0, IB QDR, serial cards, other special cards
  - x16 high performance graphics @ 150 W and more
  - Old PCI technology behind a PCIe-to-PCI-X bridge

Current PCI Express Activities
- PCIe Gen 3.0 Base Spec currently at 1.0 (as of Dec 2010)
  - Frequency of 8.0 GT/s per Lane
  - Uses 128b/130b encoding/scrambling
  - Nearly doubles the theoretical bandwidth to 1000 MB/s per lane
  - Support for Geneseo features included: a standard for co-processors, accelerators, encryption, visualization, mathematical modeling, tunneling
  - Power for x16 increased to 300W (250 W via an additional connector)
- The Express Module spec is being upgraded to Gen2, then to Gen3 (Ron Emerick is the current chair of the EM working group)
- External expansion: the cable work group is active
- PCIe IO Virtualization (SR / MR IOV): the architecture allows shared bandwidth

PCI Express in Industry
- PCIe Gen 3 will ship in 2011
  - Desktop systems: x16 high performance graphics
  - Servers with multiple x4 and x8 connectors in 2012
  - Root complexes will provide multiple x8 links, so there is less need for switches
- First Gen3 cards available 2011/2012
  - Dual 10 Gb CNA/FC boards (multi-personality)
  - SAS 2.0 16 port, SAS 3.0 8/16 port
  - x16 high performance graphics @ 300 W and more
  - FDR/EDR InfiniBand
  - Dual/quad 10GBase-T and optical
  - Single/dual 40 Gb, FCoE, iSCSI, NAS

New IO Interfaces: High Speed / High Bandwidth
- Fibre Channel: 8 Gb, 16 Gb, 32 Gb, FCoE
  - Storage area network standard
- Ethernet: 10 Gb, 40 Gb, 100 Gb
  - Provides a network-based solution to SANs
- InfiniBand: QDR, FDR, EDR
  - Choice for high speed processor-to-processor links
  - Supports wide and fast data channels
- SAS 2.0, 3.0 (6 Gb, 12 Gb)
  - Serial version of SCSI; offers a low cost storage solution
- SSDs
  - Solid state disk drive form factor
  - Solid state PCIe cards
  - Solid state 1RU trays of flash

Evolving System Architectures
- Processor speed increases are slowing, replaced by multi-core processors
  - Quad-core here; 8 and 16 core coming
  - Requires new root complex architectures
- Requires a high speed interface for interconnect
  - Minimum 10 Gb data rates, moving higher
  - Must support backplane distances
- Bladed systems: single box clustered processors
  - Need backplane reach and a cost effective interface to IO
- Interface speeds are increasing
  - Ethernet moving from GbE to 10 GbE, FC from 8 Gb to 16 Gb; InfiniBand is now QDR with FDR and EDR coming
  - Single applications struggle to fill these links, which requires applications to share them

Drivers for New IO Architectures
- High availability is increasing in importance
  - Requires duplicated processors, IO modules and interconnect
  - Use of shared virtual IO simplifies designs and reduces cost and power
  - Shared IO supports N+1 redundancy for IO, power and cooling
- Remotely re-configurable solutions can help reduce operating cost
  - Hot plug of cards and cables provides ease of maintenance
  - PCI Express Modules with IOV enable this
- Growth in backplane connected blades and clusters
  - Blade centres from multiple vendors
  - Storage and server clusters
  - Storage Bridge Bay hot plug processor modules
  - PCI Express IOV allows commodity I/O to be used

Share the IO Components
- PCIe IOV provides this sharing
- Root complexes are PCIe, closer to the CPU than 10 GbE or IB
- Requires root complex SW modifications
- Based upon PCI-SIG standards
- Allows the sharing of high bandwidth, high speed IO devices

Single Root IOV Better IO Virtualization for Virtual Machines

System I/O with a Hypervisor
[Flow diagram: the path of an I/O request through the guest OS, hypervisor and PCI device]
- Application: issues a read, then accesses the data through its memory map when the I/O completes
- Guest operating system: translates the user address to a guest OS memory address and builds an I/O request to a virtual device; on completion, completes the I/O and maps the data for the application
- Hypervisor: translates the guest OS address to a PCI memory address and rebuilds the I/O request for the real device; when the real I/O completes, fakes the virtual device completion and interrupts the guest
- PCI device function: moves the data into main memory and interrupts the host
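To make the extra indirection concrete, here is a toy sketch (invented for this transcription; every name and address in it is hypothetical) of the double translation and request rebuilding that the slide describes:

    # Toy model of a hypervisor-mediated read: the request is translated from a
    # guest address to a host/PCI address and re-issued against the real device,
    # and the completion is relayed back as a faked virtual-device completion.
    GUEST_PAGE_MAP = {0x1000: 0x8000}    # guest-virtual -> guest "physical" (toy)
    HOST_PAGE_MAP = {0x8000: 0x200000}   # guest "physical" -> host PCI/DMA address (toy)

    def guest_os_issue_read(user_addr, length):
        """Guest OS: translate the user address, build a request to the virtual device."""
        return {"op": "read", "addr": GUEST_PAGE_MAP[user_addr],
                "len": length, "device": "virtual-hba"}

    def hypervisor_forward(request):
        """Hypervisor: translate again and rebuild the request for the real device."""
        return {**request, "addr": HOST_PAGE_MAP[request["addr"]], "device": "real-hba"}

    def real_device_complete(request):
        """Real PCI device: DMA data into main memory, then interrupt the host."""
        return f"interrupt: {request['len']} bytes DMA'd to 0x{request['addr']:x}"

    req = guest_os_issue_read(0x1000, 4096)
    print(real_device_complete(hypervisor_forward(req)))
    print("hypervisor fakes the virtual-device completion; guest OS completes the I/O")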

Single Root IOV
- Before Single Root IOV, the hypervisor was responsible for creating virtual IO adapters for a virtual machine
  - This can greatly impact performance, especially for Ethernet but also for storage (FC & SAS)
- Single Root IOV pushes much of the SW overhead into the IO adapter
  - Removes the hypervisor from the IO performance path
  - Leads to improved performance for guest OS applications

PCI-SIG Single Root IOV
[Diagram: VM1 and VM2 each attach directly to virtual functions (VF1, VF2) of SR-IOV capable Fibre Channel and Ethernet adapters below the CPU and root complex, while the hypervisor retains the physical function (PF) of each adapter]

Fibre Channel & SR Virtualization
[Diagram: guest OS with an SR/VF device driver, hypervisor with an SR-aware adapter-specific driver, and an SR adapter exposing a PF and VFs onto the FC fabric]
- The guest OS uses an SR/VF device driver; I/Os go directly to the adapter via the VF
- The hypervisor (fabric aware) configures the VF via the PF, using an SR adapter-specific driver
- The fabric is visible to the host; Fibre Channel LUNs are seen as LUNs by the guest OS device driver
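As a concrete, if simplified, illustration of the PF-side configuration step, the sketch below uses the standard Linux sysfs interface for SR-IOV. The PF address is hypothetical, root privileges and an SR-IOV capable adapter are assumed, and the exact workflow varies by adapter and driver:

    # Enable SR-IOV virtual functions on a physical function via sysfs, then
    # list the VFs that appear as independent PCI functions for guest assignment.
    from pathlib import Path

    PF_BDF = "0000:03:00.0"  # hypothetical physical function address
    pf_dir = Path(f"/sys/bus/pci/devices/{PF_BDF}")

    total_vfs = int((pf_dir / "sriov_totalvfs").read_text())
    print(f"{PF_BDF} supports up to {total_vfs} VFs")

    wanted = min(2, total_vfs)               # must not exceed sriov_totalvfs
    (pf_dir / "sriov_numvfs").write_text(str(wanted))

    for vf_link in sorted(pf_dir.glob("virtfn*")):
        print(vf_link.name, "->", vf_link.resolve().name)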

Roll Out of IOV
- Blade chassis are first to roll out SR IOV
  - Limited IO slots, space constraints
  - Discouraged by OS uniqueness
- Servers in 2010: SR IOV
  - Especially Ethernet, but also storage (FC & SAS)
- MR IOV: no offerings yet
  - Great for blades sharing high speed/bandwidth ports
  - Each OS must work with the IOV management layer

Impact / Benefit to Storage
- PCI Express provides
  - Full bandwidth for dual ported 8 & 16 Gb FC
  - Full bandwidth for QDR, FDR and EDR IB
  - Full bandwidth for SAS 2.0 & 3.0
  - Legacy support via PCI-X
- IOV takes it one step further
  - Ability for system images to share IO across OS images
  - Backplane for bladed environments
- Extension of PCIe
  - Possible PCIe attached storage devices

What Your Next Data Center Might Look Like

Data Center in 2013-2016
- Root complexes are PCIe 3.0
  - Integrated into the CPU
  - Multiple Gen3 x8 links from each socket
  - Multicast and tunneling
  - PCIe Gen4 in 2016 and beyond
- Networking
  - Dual ported optical 40 GbE (capable of FCoE, iSCSI, NAS); 100 GbE switch backbones by 2018
  - Quad 10GBase-T and quad MMF (single ASIC)
  - Dual/quad legacy GbE, copper and optical
  - Dual ported FDR & EDR InfiniBand for clusters and some storage
- Graphics
  - x16 single/dual ported graphics cards @ 300 W (when needed)

Data Center in 2013-2016 (2)
- Storage access
  - SAS 3.0 HBAs, 8 and 16 port IOC/ROC
  - 16/32 Gb FC HBAs with pluggable optics for single/dual port
  - Multi-function FC & CNA (converged network adapters) at 16/32 Gb FC and 40 Gb FCoE
- Storage will be:
  - Solid state storage (SSS): PCIe cards, 1RU trays of flash DIMMs
  - SSS in 2.5" and 3.5" drive form factors following all current disk drive support models
  - 2.5" and 3.5" 10K RPM SAS (capacities up to 1 to 2 TB)
  - 2.5" and 3.5" SATA 2.0 drives (capacities 500 GB to 4 TB)
  - SAS 3.0 disk array front ends with the above drives
  - 16/32 Gb FC disk arrays with the above drives
  - FDR/EDR IB storage heads with the above drives

Future Storage Attach Model (conceptual only at this time)
[Diagram: host system PCIe slot connected over a 100 meter PCIe optical cable to a PCIe slot in a disk subsystem, whose disk controller ASIC bridges PCIe to FC or SAS/SATA drives]
- SCSI command set on the host, encapsulated for PCIe
- PCIe from a host slot, over a card with a signal conditioner, across a 100 meter optical cable, into a PCIe slot in the disk subsystem
- At the disk controller PCIe ASIC, the PCIe encapsulation is stripped off and the RAID controller converts to the correct protocol (FC or SAS/SATA) to the disks

Glossary of Terms
- PCI: Peripheral Component Interconnect. An open, versatile IO technology. Speeds range from 33 MHz to 266 MHz, with payloads of 32 and 64 bits. Theoretical data transfer rates from 133 MB/s to 2131 MB/s.
- PCI-SIG: Peripheral Component Interconnect Special Interest Group, organized in 1992 as a body of key industry players united in the goal of developing and promoting the PCI specification.
- IB: InfiniBand, a specification defined by the InfiniBand Trade Association that describes a channel-based, switched fabric architecture.

Glossary of Terms
- Root complex: the head of the connection from the PCI Express IO system to the CPU and memory.
- HBA: Host Bus Adapter.
- IOV: IO Virtualization.
- Single root complex IOV: sharing an IO resource between multiple operating systems on a single HW domain.
- Multi root complex IOV: sharing an IO resource between multiple operating systems on multiple HW domains.
- VF: Virtual Function.
- PF: Physical Function.