PCI Express Impact on Storage Architectures. Ron Emerick, Sun Microsystems




PCI Express Impact on Storage Architectures Ron Emerick, Sun Microsystems

SNIA Legal Notice. The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use this material in presentations and literature under the following conditions: any slide or slides used must be reproduced in their entirety without modification; the SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney, and nothing in this presentation is intended to be, or should be construed as, legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2

Abstract: PCI Express Impact on Storage Architecture and Capabilities. PCI Express Gen2 and Gen3, IO virtualization, FCoE and SSD are here or coming soon. This session describes PCI Express and Single Root and Multi Root IO virtualization, and discusses the implications for FCoE and SSD and the impact of all these changes on storage connectivity and storage transfer rates. This tutorial will provide the attendee with: basic knowledge of the PCI Express architecture, the PCI Express roadmap, system root complexes and IO virtualization; the expected industry roll-out of the latest IO technologies and the required root complex capabilities; implications for FCoE, SSD and IO-to-storage connectivity; anticipated impacts of these technologies on storage environments; and IO virtualization connectivity possibilities in the data center (via PCI Express). 3

Agenda: IO architectures - PCI changing to PCI Express; PCI Express tutorial - new PCI Express based architectures, how PCI Express works; IO evolving beyond the motherboard - serial interfaces (InfiniBand, GbE & 10 GbE), PCIe; IO virtualization - review of PCI Express IO virtualization; impact of PCI Express on storage. 4

Typical PCI Implementation 5

Changing I/O Architecture. PCI provides a solution for connecting the processor to IO: a standard interface for peripherals (HBA, NIC, etc.). Many man-years of code have been developed based on PCI, and the industry would like to keep this software investment. Performance keeps pushing PCI speed: the bus moved from 32-bit/33 MHz to 64-bit/66 MHz, then PCI-X was introduced to reduce layout challenges. PCI-X 133 MHz is well established, but problems arise at PCI-X 266 MHz with loading and trace lengths. Parallel interfaces are gradually being replaced: ATA by SATA (PATA is going away), SCSI by SAS, and parallel PCI by serial PCI Express. 6

PCI Express Introduction. The PCI Express architecture is a high-performance IO interconnect for peripherals in computing/communication platforms. It evolved from the PCI and PCI-X architectures, yet PCI Express is significantly different from its predecessors PCI and PCI-X. PCI Express is a serial point-to-point interconnect between two devices (4 pins per lane). It implements a packet-based protocol for information transfer, with scalable performance based on the number of signal lanes implemented on the interconnect. 7

PCI Express Terminology (diagram): a Link between PCI Express Device A and PCI Express Device B, labelling the differential Signal pairs (T+/T-, R+/R-), the Wires, and a Lane. 8

PCIe - What's a Lane? A point-to-point connection between two PCIe devices (diagram: Device A's TX pair wired to Device B's RX pair, and Device B's TX pair wired to Device A's RX pair). This represents a single lane using two pairs of traces, the TX of one device to the RX of the other. 9

PCIe Multiple Lanes - Links, Lanes and Ports (diagram): a 4-lane (x4) connection, showing the individual Lanes, the Ports on Device A and Device B, and the Link between them. 10

PCI Express Overview. Uses PCI constructs: the same memory, IO and configuration model; supports growth via speed increases; uses the PCI usage and load/store model, which protects the software investment. A simple serial, point-to-point interconnect: simplifies layout and reduces costs; chip-to-chip and board-to-board IO can exchange data; system boards can exchange data. Separate receive and transmit lanes, with 50% of the bandwidth in each direction. 11

Express Module (EM). Developed by the PCI-SIG (initially as Server IO Modules). Fully compatible with the latest PCI Express specification and designed to support future generations of PCI Express. Adds the necessary hot-plug hardware and software. Commodity pricing model using standard PCI Express silicon and a half-size card. PCIe EM products are available today providing: SAS (internal/external), 4 Gb FC (external), GbE (external), 10 GbE (external) and IB (external). 12

Transaction Types. Requests are translated to one of four types by the Transaction Layer. Memory Read or Memory Write: used to transfer data to or from a memory-mapped location; the protocol also supports a locked memory read transaction variant. IO Read or IO Write: used to transfer data to or from an IO location; these transactions are restricted to supporting legacy endpoint devices. 13

Transaction Types (cont). Requests can also be translated to: Configuration Read or Configuration Write - used to discover device capabilities, program features, and check status in the 4 KB PCI Express configuration space. Messages - handled like posted writes; used for event signalling and general-purpose messaging; support Message Signaled Interrupts similar to PCI-X; encode legacy PCI interrupt signals (INT) within message transactions. 14
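To make the configuration transactions concrete, here is a small illustrative sketch (not part of the original tutorial) that reads the start of one device's configuration space through the Linux sysfs interface; the PCI address used is a made-up placeholder, and on most systems reading beyond the first 64 bytes requires root privileges.

    # Illustrative only: read the standard configuration header of one PCI/PCIe
    # device via Linux sysfs.  "0000:03:00.0" is a placeholder bus/device/function.
    import struct

    cfg_path = "/sys/bus/pci/devices/0000:03:00.0/config"

    with open(cfg_path, "rb") as f:
        header = f.read(64)                      # standard configuration header

    vendor_id, device_id = struct.unpack_from("<HH", header, 0)
    command, status = struct.unpack_from("<HH", header, 4)

    print(f"vendor 0x{vendor_id:04x}  device 0x{device_id:04x}")
    print(f"command 0x{command:04x}  status 0x{status:04x}")

Each read of this file is serviced by configuration read transactions to the device, which is how software discovers capabilities and checks status at enumeration time.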

PCI Express Throughput - aggregate bandwidth (GBytes/s) by link width:

Link Width     x1    x2    x4    x8    x16   x32
Gen1 (2004)    0.5   1     2     4     8     16
Gen2 (2007)    1     N/A   4     8     16    32
Gen3 (2010)    2     N/A   8     16    32    64

Assumes 2.5 GT/s signalling for Gen1 and 5 GT/s signalling for Gen2, with 80% of the bandwidth available due to 8/10-bit encoding overhead. Assumes 8 GT/s signalling for Gen3. Aggregate bandwidth implies simultaneous traffic in both directions. Peak bandwidth is higher than any bus available. 15
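The table values can be reproduced from the signalling rate, the encoding efficiency and the lane count. The short sketch below is added here as an illustration (it is not part of the original slides) and assumes 8b/10b encoding for Gen1/Gen2 and 128b/130b encoding for Gen3.

    # Derivation of the aggregate-bandwidth figures in the table above.
    def aggregate_gb_per_s(gt_per_s, encoding_efficiency, lanes):
        # Aggregate (both directions) usable bandwidth in GB/s for one PCIe link.
        per_lane_gbit = gt_per_s * encoding_efficiency   # usable Gbit/s per lane, one direction
        one_direction = per_lane_gbit * lanes / 8.0      # GB/s in one direction
        return 2 * one_direction                         # both directions

    print(aggregate_gb_per_s(2.5, 8 / 10, 8))     # Gen1 x8  -> 4.0 GB/s
    print(aggregate_gb_per_s(5.0, 8 / 10, 8))     # Gen2 x8  -> 8.0 GB/s
    print(aggregate_gb_per_s(8.0, 128 / 130, 8))  # Gen3 x8  -> ~15.75 GB/s (table rounds to 16)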

PCI-X vs PCI Express Throughput. How does PCI-X compare to PCI Express? (Chart: MB/s for PCI, PCI-X, PCI-X (DDR) and PCI-X (QDR) versus PCIe Gen1, Gen2 and Gen3 at x4, x8 and x16 widths.) PCI-X QDR maxes out at 4263 MB/s per leaf; PCIe x16 Gen1 maxes out at 4000 MB/s. 16

IO Bandwidth Needs (chart: maximum MB/s throughput of PCIe x1, x4, x8 and x16 links across Gen1, Gen2 and Gen3, plotted against the bandwidth required by 8 Gb FC; 10 Gb Ethernet; 16 Gb FC & QDR IB; and 40 Gb Ethernet, 40 Gb FCoE & EDR IB). 17
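As a rough illustration of what the chart conveys (added here, not from the original deck), the sketch below estimates which PCIe link is needed to feed a given interface at line rate; the usable-rate figures are approximate nominal values chosen only for the example.

    # Which PCIe link can feed a given interface at full rate (approximate).
    PCIE_GEN = {1: (2.5, 8 / 10), 2: (5.0, 8 / 10), 3: (8.0, 128 / 130)}   # GT/s, encoding

    def pcie_one_way_gb_s(gen, lanes):
        gt, eff = PCIE_GEN[gen]
        return gt * eff * lanes / 8.0

    interfaces = {                      # approximate usable GB/s, one direction
        "8 Gb FC": 0.8,
        "10 GbE": 1.25,
        "16 Gb FC": 1.6,
        "QDR IB 4x": 3.2,
        "40 GbE / FCoE": 5.0,
    }

    for name, need in interfaces.items():
        fits = [(g, w) for g in (1, 2, 3) for w in (1, 4, 8, 16)
                if pcie_one_way_gb_s(g, w) >= need]
        gen, width = min(fits, key=lambda gw: (gw[1], gw[0]))   # narrowest link, oldest gen
        print(f"{name:14s} ~{need} GB/s  ->  at least PCIe Gen{gen} x{width}")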

Sample PCI Express Topology (diagram)

Sample PCI Express Topology (diagram, including the CPU)

Benefits of PCI Express. Lane expansion to match need: x1 for a low-cost, simple connector; x4 or x8 for PCIe adapter cards; x16 for high-performance PCIe graphics cards. The point-to-point interconnect allows for: extending PCIe via signal conditioners and repeaters; optical and copper cabling to remote chassis; external graphics solutions; external IO expansion. The infrastructure is in place: PCIe Gen2 switches and Gen1-to-PCI-X bridges; signal conditioners; PCIe extended 5 m via copper, 30 m via optical. 20

PCI Express in Industry. Gen1: first release of slots in 2005. Gen2: shipped Q4 CY2007; the first (x16 Gen2) slots shipped in Q4 CY2007; adoption is slower than for Gen1. Systems currently shipping: desktops with multiple x16 connectors; servers with multiple x4 and x8 connectors. Cards available: Gen1 x4, x8 cards - 10 GbE, dual/quad GbE, 4/8 Gb FC, SAS, IB; Gen2 x4, x8 cards - dual 10 GbE, 8 Gb FC, QDR IB, SAS 2 (soon), QGE, multi-protocol cards, 2D graphics, FCoE (soon). All of the above are or will be available in the EM (Express Module) form factor. 21

Recent PCI Express Changes. Power increase for graphics cards to 300 watts. Lanes can be grouped: x1, x4, x8, x16 and x32 are supported, x2 is no longer supported, and a device must support all groupings lower than its width. Performance roadmap: Gen2.0 doubled to 5 Gbit/s (DDR) with 8/10-bit encoding; Gen3.0 doubles again to 8 Gbit/s (no 8/10-bit encoding). External expansion: copper cable and connector specified. Geneseo enhancements to PCIe 2.0: a standard for co-processors and accelerators (encryption, visualization, mathematical modelling). PCIe IO virtualization (SR/MR IOV): the architecture allows shared bandwidth. 22

Evolving System Architectures. Processor speed increases are slowing, replaced by multi-core processors: quad-core is here, 8 and 16 cores are coming, and this requires new root complex architectures. A high-speed interface is required for the interconnect: minimum 10 Gb data rates, and it must support backplane distances. Bladed systems and single-box clustered processors need backplane reach and a cost-effective interface to IO. Interface speeds are increasing: Ethernet is moving from GbE to 10G, FC from 4 Gb to 8 Gb, InfiniBand from DDR to QDR/EDR. Single applications struggle to fill these links, so applications must share them. 23

Drivers for New IO Architectures. High availability is increasing in importance: it requires duplicated processors, IO modules and interconnect; shared virtual IO simplifies designs and reduces cost and power; shared IO supports N+1 redundancy for IO, power and cooling; remotely re-configurable solutions can help reduce operating cost; hot plug of cards and cables provides ease of maintenance; PCI Express Modules with IOV enable this. Growth in backplane-connected blades and clusters: blade centres from multiple vendors; storage and server clusters; Storage Bridge Bay hot-plug processor modules; PCI Express IOV allows commodity I/O to be used. Options: use an existing IO interface like 10GbE/InfiniBand, enhance PCI Express, or find a new IO interface. 24

Existing Serial Interfaces. Established external transport mechanisms already exist. Fibre Channel (4 Gb, 8 Gb): the storage area network standard. Ethernet (10 Gb, 40 Gb): iSCSI/TCP/IP/Ethernet provides a network-based solution for SANs. InfiniBand (DDR, QDR, EDR): the choice for high-speed processor-to-processor links; supports wide and fast data channels. SAS 1.0, 2.0 (3 Gb, 6 Gb): the serial version of SCSI, offering a low-cost solution. There is no need to add yet another solution to these; PCI Express is not intended to replace them, but backplane IO must support these bandwidths. 25

10G Ethernet or InfiniBand? 10G Ethernet: (+) provides a network-based solution for SANs; (-) requires an Ethernet switch in the box; (-) QoS mechanisms required; (-) direct interface to root complex; (-) low-overhead stack; (+) Fibre Channel over Ethernet. InfiniBand: (+) higher speed capability (4X QDR 40 Gb/s, EDR 80 Gb/s); (+) established stack; (+) the choice for high-speed, low-latency processor-to-processor links; (-) requires an IB fabric; (-) requires all protocols to move to IB via TCAs. Check out the SNIA tutorials: FCoE: Fibre Channel over Ethernet; Fabric Consolidation with InfiniBand; Ethernet Enhancements for Storage: Deploying FCoE. 26

Backplane Serial Interfaces. The inside-the-box solution today is mid-plane or centre-plane communication. Possible solutions are 10 Gb Ethernet, InfiniBand and PCI Express. Will it be 10GbE, InfiniBand, or PCIe IOV? In these architectures the root complexes are all PCIe, and SR/MR IOV works with the root complex. 27

Or PCI Express? This requires IO virtualization: SR (Single Root) and MR (Multi Root), based upon PCI-SIG standards. 28

Single Root IOV: Better IO Virtualization for Virtual Machines

System I/O with a Hypervisor (diagram). Application: issues a read, then later accesses the data in memory. Guest operating system: memory-maps the data, translates the user address to a guest OS memory address, builds the I/O request to the virtual device, and completes the I/O. Hypervisor: memory-maps, translates the OS address to a PCI memory address, rebuilds the I/O request for the real device, completes the real I/O, and fakes the virtual device completion and interrupt to the guest. PCI device function: moves the data into main memory and raises an interrupt.

Single Root IOV. Before Single Root IOV, the hypervisor was responsible for creating virtual IO adapters for a virtual machine. This can greatly impact performance, especially for Ethernet but also for storage (FC & SAS). Single Root IOV pushes much of the software overhead into the IO adapter, removing the hypervisor from the IO performance path and leading to improved performance for guest OS applications.
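As a concrete, hedged illustration of SR-IOV in practice (not part of the original slides): on a Linux host, an SR-IOV-capable physical function exposes sriov_totalvfs and sriov_numvfs attributes in sysfs, and writing a count to sriov_numvfs creates the virtual functions that a hypervisor can assign to guests. The PCI address and VF count below are placeholders, and the sketch assumes root privileges and an adapter whose driver supports SR-IOV.

    # Enable SR-IOV virtual functions on a (hypothetical) adapter via sysfs.
    from pathlib import Path

    pf = Path("/sys/bus/pci/devices/0000:81:00.0")      # placeholder PF address

    total = int((pf / "sriov_totalvfs").read_text())     # max VFs the PF supports
    want = min(4, total)

    (pf / "sriov_numvfs").write_text("0")                 # clear any existing VFs first
    (pf / "sriov_numvfs").write_text(str(want))           # create the virtual functions

    # Each VF now appears as its own PCI function (virtfn0, virtfn1, ...) that
    # can be passed through to a guest OS, bypassing the hypervisor data path.
    for vf in sorted(pf.glob("virtfn*")):
        print(vf.name, "->", vf.resolve().name)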

PCI-SIG Single Root IOV (diagram): VM1 and VM2 each use virtual functions (VF1, VF2) while the hypervisor owns the physical functions (PF); the CPU(s) and root complex connect to SR-IOV-capable Fibre Channel and Ethernet adapters, each exposing one PF and multiple VFs.

Fibre Channel and SR Virtualization (diagram): the guest OS runs an SR/VF device driver; the hypervisor runs the SR-adapter-specific (fabric-aware) driver. I/Os go directly from the guest to the SR adapter via the VF, while the hypervisor configures the VF via the PF. The FC fabric is visible to Fibre Channel, and LUNs are seen as LUNs by the guest OS device driver.

Multi Root IOV: Virtualizing an IO adapter for multiple hosts

It Started With Pass-Thru (diagram): each blade's adaptor connects through the backplane and the chassis back panel directly to the fabric switch.

Chassis Switching to Aggregation (diagram): the blades' adaptors connect across the backplane to an aggregator / chassis switch, which in turn connects to the fabric switch.

Stretch the PCIe Bus Across the Backplane (diagram): PCIe is carried across the backplane to adaptors housed with the chassis switches, which connect to the fabric switches.

Merge the Adapter and Aggregator (diagram): the per-blade adaptors are replaced by a shared adapter behind the backplane, which connects to the fabric switch.

Insert a Shared PCIe Fabric and Add Multiprotocol (diagram): blades connect across the backplane to a shared PCIe fabric feeding multiple shared adapters, each attached to its own fabric switch (Fabric Switch 1, Fabric Switch 2).

In a Blade Chassis (diagram): the chassis management and switch bay house a shared PCIe fabric and shared adapters behind the backplane, feeding a Fibre Channel switch and an Ethernet switch.

Multi Root Virtualization in Blades (diagram): applications run above a virtualization & service provisioning layer spanning 1U & blade servers and storage systems; storage IO blades and shared network IO blades attach through a scalable interconnect switch to SAN, LAN, MAN and WAN, with hardware accelerators alongside. 41

In a Rack (diagram): multiple rack servers connect via PCIe cables to shared I/O modules, which attach to the Ethernet fabric and the FC fabric.

Impact / Benefit to Storage. PCI Express provides: full bandwidth for dual-ported 4 & 8 Gb FC; full bandwidth for dual-ported 10 GbE/iSCSI/FCoE; full bandwidth for QDR and EDR IB; full bandwidth for SAS 1.0 & 2.0; legacy support via PCI-X; and access to solid state storage (SSS) via PCIe. IOV takes it one step further: the ability for system images to share IO across OS images; a backplane for bladed environments; extension of PCIe; and possible PCIe-attached storage devices. 43

PCIe and SSS Device on Card (diagram): the system CPU and memory connect through a PCIe slot to a PCIe-to-SAS disk controller, carrying the SCSI command set to a JBOD or RAID behind a SAS disk controller; RDMA over the PCIe card moves data between system memory and the SSS device, where the PCIe encapsulation is stripped off. Available soon, from multiple vendors. Check out the SNIA tutorials: Solid State Storage in a Hard Disk Package; Solid State Storage Reliability and Data Integrity. 44

Future Storage Attach Model (conceptual only at this time; diagram): the SCSI command set is encapsulated for PCIe and carried from a PCIe slot in the system, over a card with a signal conditioner, across a 100-metre optical PCIe cable, into a PCIe slot in the disk subsystem; there a disk controller PCIe ASIC strips off the PCIe encapsulation and a RAID controller converts to the correct protocol for the drives (PCIe to FC or SAS/SATA, attaching FC and SATA/SAS drives). 45

Glossary of Terms. PCI - Peripheral Component Interconnect, an open, versatile IO technology; speeds range from 33 MHz to 266 MHz, with payloads of 32 and 64 bits, and theoretical data transfer rates from 133 MB/s to 2131 MB/s. PCI-SIG - Peripheral Component Interconnect Special Interest Group, organized in 1992 as a body of key industry players united in the goal of developing and promoting the PCI specification. IB - InfiniBand, a specification defined by the InfiniBand Trade Association that describes a channel-based, switched-fabric architecture. 46

Glossary of Terms. Root complex - the head of the connection from the PCI Express IO system to the CPU and memory. HBA - Host Bus Adapter. IOV - IO Virtualization. Single root complex IOV - sharing an IO resource between multiple operating systems on a single HW domain. Multi root complex IOV - sharing an IO resource between multiple operating systems on multiple HW domains. VF - Virtual Function. PF - Physical Function. 47

Q&A / Feedback Please send any questions or comments on this presentation to SNIA: tracknetworking@snia.org Many thanks to the following individuals for their contributions to this tutorial. SNIA Education Committee Howard Goldstein Alex Nicolson Rob Peglar Joe White 48