The Future of Cloud Networking
Idris T. Vasi
Cloud Computing and Cloud Networking
What is Cloud Computing?
An emerging computing paradigm where data and services reside in massively scalable data centers and can be ubiquitously accessed from any connected device over the internet.
- Businesses, from startups to enterprises
- 4+ billion phones by 2010
- Web 2.0-enabled PCs, TVs, etc.
Source: IBM
The platform shift means, simply, freedom:
- Freedom of Religion
- Freedom of Location
- Freedom from Vendor Lock-In
- Freedom of Motion
- Freedom of Speech
How Big is Cloud Computing?
$42B: estimated size of the cloud computing infrastructure market in 2012, up from $16B in 2008. (IDC, October 2008)
Projected Cloud Spending (IDC 2008)

                         2008     2012     Annual Growth
Cloud IT Spending        $16B     $42B     27%
Total IT Spending        $383B    $494B    7%
Non-Cloud IT Spending    $367B    $452B    4%
Cloud / Total Spend      4%       9%

Cloud spending is growing 6X faster than traditional IT spending.
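The 27% and 7% growth figures above are compound annual growth rates over the four years from 2008 to 2012. A minimal sketch checking them against the dollar figures in the table (the `cagr` helper is ours, not from the source):

```python
# Verify the IDC table's growth rates as compound annual growth over
# the 4-year span 2008 -> 2012.
def cagr(start, end, years):
    """Compound annual growth rate, as a percentage."""
    return ((end / start) ** (1 / years) - 1) * 100

cloud = cagr(16, 42, 4)       # cloud IT spending, $B
total = cagr(383, 494, 4)     # total IT spending, $B
print(round(cloud), round(total))  # 27 7 -- matches the table
```

The cloud/total ratio also checks out: 16/383 is about 4% in 2008 and 42/494 is about 9% in 2012.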
The Cloud Evolution

                      Classical Enterprise DC       Cloud Computing DC
Number of Servers     100s to 1,000s                10,000s to 100,000s
Network Throughput    Gigabits to Terabits/sec      Multi-Terabits to 100 Terabits/sec
Network Services      Expensive add-ons             Integrated/extensible
Network Management    Distributed, box-based        Orchestrated
What is Cloud Networking?
The networking infrastructure required for cloud computing: faster, simpler, and more cost effective than existing conventional enterprise network architectures.
- Scalability
- Self Healing
- Resilience
- Low Latency
- Guaranteed Delivery
- Extensible Management
Price/Performance
High-performance networking at attractive price points:
- Ultra-low latency architectures
- High-performance aggregation
- Wire rate is easier to manage
Cloud Datacenter Network Requirements
- 10,000s of servers
- Non-blocking network
- 1/10 GigE to server
- 10/40/100 GigE core
- L2 (edge), L3 (core)
- 24x7 availability
- Power efficiency
- Cost effectiveness
Network throughput drives overall server utilization.
Market Overview
The Changing Server Mix in Data Centers
[Chart: % of x86 servers used for Web, HPC, and Enterprise]
10 GbE Market Overview
[Chart: ports shipped (thousands) and average selling price ($); Source: Dell'Oro]
10 GbE Market Growth Drivers
10 GbE server attachment will grow dramatically in the next 24 months due to:
- Applications demanding lower latency and more throughput
- Growth in high-performance/technical computing
- Storage over Ethernet (NFS, iSCSI & FCoE)
- 10 GbE-optimized next-generation server CPUs
- 10 GbE on next-generation server motherboards
Historically, the inflection point for adopting the next-generation Ethernet speed has been 2-3X the price per port of the previous speed.
10 GbE Market Summary 2009
- 10 GbE becoming cost-effective: 2-3X the cost of 1 GbE, 1/10th the cost of today; 10 GbE NICs on server motherboards
- 10 GbE is a substitution technology: completely transparent to existing applications; no new protocols or network management
- Rapid market adoption in 2009: the major barrier has been price per port
- Driving force: latency and throughput; large websites, datacenters, and HPC labs have started to deploy 10 GbE
In 2009, 10 GbE will become the standard server interconnect.
Data Center Trends
Today's Data Center Challenges
- Space Management
- Power Consumption
- Cooling
- Maintenance Windows
- SLA Obligations
- Equipment Costs
- Traffic Delays
- Human Error
Rack-Server Aggregation Requirements
[Diagram: rack-top switches uplinked to modular 10 GbE core switches]
- Rack-top switch, 24-48 ports; typically 20-40 servers per rack
- 4 to 8 uplinks; copper cables in rack, optics to core switches
- High availability: redundant power/cooling, load-sharing redundant topology, reliable switch OS
- Scalability: high throughput, low latency
Rack Aggregation Requirements (Blade Servers)
[Diagram: blade chassis uplinked to modular 10 GbE core switches]
- Typically 8-16 blades per chassis, 2-3 blade chassis per rack
- 4 to 8 uplinks; optics to core switches
- High availability: redundant power/cooling, load-sharing redundant topology, reliable switch OS
- Scalability: high throughput, low latency
Scalable Network Designs
The New Design Shift to Private & Public Clouds

                   Enterprise Datacenter    Virtualized Computing    Cloud Computing
Driver             Consolidate servers      Increased server         Multi-core CPUs increase
                   & storage                utilization              server computing cycles
Oversubscription   40:1                     20:1                     Non-blocking, 1:1 to 5:1
Latency            100 microseconds         20-50 microseconds       1-5 microseconds
Oversubscription & Latency in Legacy Designs
Legacy design, 3 tiers: top-of-rack switch, L2/L3 aggregation layer, core layer. Oversubscription at every layer; optimized for simple file transfers.
- Typical oversubscription: 4:1 per tier, so total oversubscription is 4 x 4 x 4 = 64:1!
- Typical latency: 5-8 us (top-of-rack), 40 us (aggregation), 40 us (core)
- Intra-rack latency is 8 us; inter-rack latency is 56 to 136 us
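The arithmetic behind these figures: total oversubscription is the product of the per-tier ratios, and end-to-end latency is the sum of per-hop latencies along the path. A minimal sketch using the slide's numbers (the path model — turn around at aggregation in the best case, traverse the core in the worst — is our reading of the design):

```python
# Total oversubscription in a tiered network is the product of the
# per-tier ratios; latency is the sum of per-hop latencies.
from math import prod

tiers = [4, 4, 4]  # ToR, aggregation, core oversubscription ratios
print(f"total oversubscription {prod(tiers)}:1")  # 64:1

tor_us, agg_us, core_us = 8, 40, 40
# Best case: ToR -> agg -> ToR (turn around at the aggregation layer)
best = 2 * tor_us + agg_us
# Worst case: ToR -> agg -> core -> agg -> ToR
worst = 2 * tor_us + 2 * agg_us + core_us
print(f"inter-rack latency {best} to {worst} us")  # 56 to 136 us
```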
New Cloud Networking Architecture
The need for non-blocking, low-latency networks:
- 3-layer design, but most traffic stays in the leaf/spine layers
- Core layer only carries traffic going in/out of the datacenter
- Non-blocking interconnect from any node to any node
- Low latency between nodes to improve application performance
Oversubscription (or Lack of It!) & Latency
- Core: 1:1, 40 us; ports provisioned based on traffic in/out of the datacenter
- Spine: 1:1, 1.2 us
- Leaf: 1:1, 600 ns - 2.9 us
Non-blocking and ultra-low latency: within-rack latency is 600 ns - 2.9 us; inter-rack latency is 2.4 us - 7 us. The core is only touched when traffic goes in/out of the datacenter.
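Summing per-hop latencies shows why the leaf/spine figures above matter for applications. A small sketch comparing worst-case inter-rack latency in the legacy 3-tier design against the leaf/spine fabric, using the per-hop numbers from these two slides:

```python
# Worst-case inter-rack path latency: legacy 3-tier vs. leaf/spine,
# using the per-hop figures from the slides (microseconds).
def path_latency(hops_us):
    """Sum of per-hop latencies along a path."""
    return sum(hops_us)

# Legacy: ToR -> agg -> core -> agg -> ToR
legacy = path_latency([8, 40, 40, 40, 8])    # 136 us
# Leaf/spine: leaf -> spine -> leaf, worst case
leaf_spine = path_latency([2.9, 1.2, 2.9])   # ~7 us
print(f"speedup ~{legacy / leaf_spine:.0f}x")
```

The best case, 2 x 0.6 us + 1.2 us = 2.4 us, matches the slide's lower bound.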
Comparison at Various Scales
Operating System Model
Market Transition Demands Architectural Change
Physical -> Virtual -> Cloud: consistent networking.
- Simplify and mobilize configuration, compliance, and change management
- Integrate with best-of-breed suppliers and open standards
- Enable workloads to be global: follow the sun, the kilowatt, or the capacity
Extensible Operating System (EOS) Model
[Diagram: agents (XML/SOAP, LAG, SNMP, STP, CLI, OSPF, vswitch mgmt, ASIC drivers, 3rd-party SW) around a central Sysdb, on a protected OS kernel]
- Fully modular, multi-process, multi-threaded, stateful restart
- Core Sysdb for all session state and inter-process communication
- Extensible architecture enables 3rd-party applications
- Focused on making operations simpler
- One system image for all product families
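The key idea above is that agents never talk to each other directly: they publish state into the central Sysdb and subscribe to the keys they care about. A hypothetical sketch of that publish/subscribe pattern — the class, key names, and callbacks are all invented for illustration, not the actual EOS implementation:

```python
# Sketch of a central state database with publish/subscribe:
# agents publish state changes; subscribed agents get callbacks.
from collections import defaultdict

class SysDB:
    def __init__(self):
        self.state = {}                      # current state by key
        self.subscribers = defaultdict(list)  # key -> callbacks

    def publish(self, key, value):
        self.state[key] = value
        for callback in self.subscribers[key]:
            callback(key, value)

    def subscribe(self, key, callback):
        self.subscribers[key].append(callback)

db = SysDB()
events = []
# An STP-like agent reacts to link state published by the driver.
db.subscribe("interface/eth1/status", lambda k, v: events.append((k, v)))
db.publish("interface/eth1/status", "up")  # ASIC driver publishes
print(events)  # [('interface/eth1/status', 'up')]
```

Because all state lives in one place, a restarted agent can rebuild itself from the database, which is what enables the stateful restart described on the next slide.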
Fault Containment and Self Healing
[Diagram: agents, ProcMgr, Sysdb, switch ASIC]
1. An agent experiences a fault and exits, without affecting packet forwarding or other processes.
2. ProcMgr detects the process exit and starts a new agent instance; packet forwarding continues.
3. The new agent loads its state from Sysdb without data-path disruption.
4. The system resumes normal operation.
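The four steps above can be sketched as a supervisor loop: the crashed agent is simply replaced, and the replacement rebuilds its state from the central store. A hypothetical illustration — all names and the dict-as-Sysdb are invented; this is the pattern, not the product:

```python
# Sketch of the restart sequence: a supervisor replaces a crashed
# agent, which reloads checkpointed state from a central store.
class Agent:
    def __init__(self, name, sysdb):
        self.name = name
        # Step 3: the new instance loads its state from the store.
        self.state = dict(sysdb.get(name, {}))

    def checkpoint(self, sysdb):
        sysdb[self.name] = dict(self.state)

def supervise(name, sysdb):
    """Step 2: detect the exit and start a fresh agent instance."""
    return Agent(name, sysdb)

sysdb = {}
agent = Agent("ospf", sysdb)
agent.state["neighbors"] = 5
agent.checkpoint(sysdb)        # state lives in the store, not the agent
del agent                      # Step 1: the agent faults and exits
agent = supervise("ospf", sysdb)
print(agent.state)             # {'neighbors': 5} -- no state lost
```

The data path is untouched throughout because forwarding runs in the ASIC, not in the restarted process.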
EOS Open Extensibility
[Diagram: existing agents (OSPF, CLI, SNMP, STP, LAG, LED, ASIC driver) around Sysdb, on a protected OS kernel over switch hardware]
- New agents: PTP, Fping, PXE Boot
- New extensions: network services
- 3rd-party agents: provisioning
- 3rd-party management systems
Virtualized Extensible Operating System
- An implementation of EOS that manages VMware distributed switches
- Familiar network interface; integrates with the vCenter virtualization platform
- Separation of control and data plane enables hitless software upgrades
- Consistent policy and accounting for physical ports, virtual machines, and cloud deployments
- Bridges the physical, virtual, and cloud networks
The Cloud Networking Landscape
- Cloud Trading Apps
- Cloud Storage
- Cloud Services
New Application Models Require a New Architecture
Thank You! www.aristanetworks.com