Integrating Network. Management for Cloud



Similar documents
A Network-State Management Service. Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft

Integrating Network Management For. Cloud Computing Services

A Network-State Management Service

Network Virtualization

SDN Applications in Today s Data Center

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

CLOUD NETWORKING THE NEXT CHAPTER FLORIN BALUS

Introduction to Software Defined Networking (SDN) and how it will change the inside of your DataCentre

SDN PARTNER INTEGRATION: SANDVINE

Network Virtualization and Application Delivery Using Software Defined Networking

Software-Defined Networking for the Data Center. Dr. Peer Hasselmeyer NEC Laboratories Europe

Commercial Software Licensing

Relational Databases in the Cloud

U s i n g S D N - and NFV-based Servi c e s to M a x i m iz e C SP Reve n u e s a n d I n c r e ase

SOFTWARE DEFINED NETWORKING

STRATEGIC WHITE PAPER. The next step in server virtualization: How containers are changing the cloud and application landscape

What is SDN all about?

Cloud Computing Trends

How To Compare The Cost Of A Microsoft Private Cloud To A Vcloud With Vsphere And Vspheon

Using SDN-OpenFlow for High-level Services

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series

Software Defined Networking What is it, how does it work, and what is it good for?

The Internet: A Remarkable Story. Inside the Net: A Different Story. Networks are Hard to Manage. Software Defined Networking Concepts

Leveraging SDN and NFV in the WAN

How To Understand The Power Of The Internet

RIDE THE SDN AND CLOUD WAVE WITH CONTRAIL

SDN Software Defined Networks

The Next Frontier for SDN: SDN Transport

Software Defined Networks

SINGLE-TOUCH ORCHESTRATION FOR PROVISIONING, END-TO-END VISIBILITY AND MORE CONTROL IN THE DATA CENTER

Introduction to Cloud Computing

IaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures

Outlook. Corporate Research and Technologies, Munich, Germany. 20 th May 2010

Scalable Network Monitoring with SDN-Based Ethernet Fabrics

Part V Applications. What is cloud computing? SaaS has been around for awhile. Cloud Computing: General concepts

SDN AND SECURITY: Why Take Over the Hosts When You Can Take Over the Network

SDN Security Design Challenges

SDN FOR IP/OPTICAL TRANSPORT NETWORKS

Building Scalable Multi-Tenant Cloud Networks with OpenFlow and OpenStack

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

An Introduction to Software-Defined Networking (SDN) Zhang Fu

Fabrics that Fit Matching the Network to Today s Data Center Traffic Conditions

Migrating SaaS Applications to Windows Azure

Panopticon: Incremental SDN Deployment in Enterprise Networks

Radware ADC-VX Solution. The Agility of Virtual; The Predictability of Physical

Lecture 7: Data Center Networks"


Software-Defined Networking Architecture Framework for Multi-Tenant Enterprise Cloud Environments

White Paper on NETWORK VIRTUALIZATION

Building an Open, Adaptive & Responsive Data Center using OpenDaylight

3 Ways to build a SaaS Product. Asteor Software Inc Ram Kumar - Director Product Management

Scalable Network Monitoring with SDN-Based Ethernet Fabrics

SDN CENTRALIZED NETWORK COMMAND AND CONTROL

Lecture 02b Cloud Computing II

software networking Jithesh TJ, Santhosh Karipur QuEST Global

Cloud models and compliance requirements which is right for you?

A Look at the New Converged Data Center

Software-Defined Networks Powered by VellOS

Cisco Prime Network Services Controller. Sonali Kalje Sr. Product Manager Cloud and Virtualization, Cisco Systems

Data Center Use Cases and Trends

Cloud Infrastructure Operational Excellence & Reliability

Software Defined Networks Virtualized networks & SDN

Radware ADC-VX Solution. The Agility of Virtual; The Predictability of Physical

Network Virtualization and Data Center Networks Data Center Virtualization - Basics. Qin Yin Fall Semester 2013

Rapid IP redirection with SDN and NFV. Jeffrey Lai, Qiang Fu, Tim Moors December 9, 2015

Hybrid (Cloud) Computing

SDN and NFV in the WAN

Service Performance Management: Pragmatic Approach by Jim Lochran

Global Headquarters: 5 Speen Street Framingham, MA USA P F

OpenFlow/SDN for IaaS Providers

Cloud Platforms, Challenges & Hadoop. Aditee Rele Karpagam Venkataraman Janani Ravi

Securing the Virtualized Data Center With Next-Generation Firewalls

OpenFlow and Onix. OpenFlow: Enabling Innovation in Campus Networks. The Problem. We also want. How to run experiments in campus networks?

SDN. What's Software Defined Networking? Angelo Capossele

Cloud Courses Description

Software Defined Networking A quantum leap for Devops?

AppStack Technology Overview Model-Driven Application Management for the Cloud

SDN research directions

A Link Load Balancing Solution for Multi-Homed Networks

Bringing the Public Cloud to Your Data Center

Paolo Costa

Cloud Essentials for Architects using OpenStack

PLATFORM-AS-A-SERVICE: ADOPTION, STRATEGY, PLANNING AND IMPLEMENTATION

Open Source Networking for Cloud Data Centers

Pluribus Netvisor Solution Brief

BRINGING NETWORKS TO THE CLOUD ERA

A.Prof. Dr. Markus Hagenbuchner CSCI319 A Brief Introduction to Cloud Computing. CSCI319 Page: 1

How To Make A Vpc More Secure With A Cloud Network Overlay (Network) On A Vlan) On An Openstack Vlan On A Server On A Network On A 2D (Vlan) (Vpn) On Your Vlan

Control Plane architectures for Photonic Packet/Circuit Switching-based Large Scale Data Centres

Panel: Cloud/SDN/NFV 黃 仁 竑 教 授 國 立 中 正 大 學 資 工 系 2015/12/26

CoIP (Cloud over IP): The Future of Hybrid Networking

Outline. What is cloud computing? History Cloud service models Cloud deployment forms Advantages/disadvantages

Longer is Better? Exploiting Path Diversity in Data Centre Networks

Software Defined Networking & Openflow

Achieve Economic Synergies by Managing Your Human Capital In The Cloud

Software Defined Environments

Transcription:

Integrating Network Management for Cloud Computing Services Final Public Oral Exam Peng Sun

What is Cloud Computing Deliver applications as services over Internet Run applications in datacenters Lower capital and operational expenses 1

Cloud Services are Growing Public cloud for consumer service Amazon Web Services Microsoft Azure 2

Cloud Services are Growing Public cloud for consumer service Amazon Web Services Microsoft Azure Enterprises out-source IT service Storage: Box, Analytics: Salesforce, Productivity: Office365, 2

Quality of Cloud Service Depends on Network Quality Datacenter ISP App Routing OS Hardware ISP Enterprise Network Servers Network Devices ISP 3

Improve Network Quality The old way cannot keep up with the growth of cloud services Deploying more devices with higher bandwidth 4

Improve Network Quality The old way cannot keep up with the growth of cloud services Deploying more devices with higher bandwidth The key is proper management of network resources 4

Examples of Network Management Solutions 5

Examples of Network Management Solutions Traffic Engineering 5

Examples of Network Management Solutions Traffic Engineering Load Balancing 5

Examples of Network Management Solutions Traffic Engineering Load BalancingLink Corruption Mitigation 5

Examples of Network Management Solutions Traffic Engineering Load BalancingLink Corruption Network Mitigation Device Configuration 5

Examples of Network Management Solutions Traffic Engineering Load BalancingLink Corruption Network Mitigation Device Configuration Device Power Control 5

Examples of Network Management Solutions Traffic Engineering Load BalancingLink Corruption Network Mitigation Device Configuration Device Power Control 5

Problems of Current Network Management Disjoint management of network components Low-level interfaces with network devices 6

Disjoint Management Systems Division of labor in pre-cloud era App OS Servers Routing Hardware Network Devices Users ISP 7

Disjoint Management Systems Division of labor in pre-cloud era App OS Routing Hardware Users Servers Server Provisioning Network Devices ISP 7

Disjoint Management Systems Division of labor in pre-cloud era Traffic Engineering App OS Routing Hardware Users Servers Server Provisioning Network Devices ISP 7

Disjoint Management Systems Division of labor in pre-cloud era Traffic Engineering App OS Servers Server Provisioning Routing Hardware Infrastructure Network Devices ISP Users 7

Datacenter Breaks Balance Coordination of server & network: More applications are built as multitier distributed systems Intra-datacenter traffic is new majority 8

Datacenter Breaks Balance Coordination of server & network: More applications are built as multitier distributed systems Intra-datacenter traffic is new majority Infrastructure evolution speeds up: Network architecture changes (e.g., FatTree) 8

Disjoint Mgmt. is Bottleneck Because: Cloud service providers have stakes in all three management areas: Server, infrastructure, traffic 9

Disjoint Mgmt. is Bottleneck Because: Cloud service providers have stakes in all three management areas: Server, infrastructure, traffic Great opportunity exists for consolidation Google and Microsoft on SoftWAN 9

Yet Another Problem: Low-level Device Interaction Hardware vendors differentiate with specialized devices Network operation: intensively uses vendor-specific APIs heavily depends on experiences 10

Cloud Service is Different Much more devices in datacenters than traditional networks Automation is a must 11

Cloud Service is Different Much more devices in datacenters than traditional networks Automation is a must Commodity devices instead of specialized hardware Homogeneity is preferred 11

Cloud Service is Different Much more devices in datacenters than traditional networks Automation is a must Commodity devices instead of specialized hardware Homogeneity is preferred Lower vendor dependence pays off (MS & Amazon on SoftLB) 11

Promising Yet Limited SDN Great way of automating traffic management with high-level programming paradigms Yet literature focus on just traffic 12

Summary of Problems This dissertation solves: Disjoint management of server, infrastructure, and traffic Low-level device interaction for broader scope of network management 13

Practical Approach 14

Practical Approach Identify opportunity of integration 14

Practical Approach Identify opportunity of integration Simple and expressive programming abstraction 14

Practical Approach Identify opportunity of integration Simple and expressive programming abstraction Efficient and scalable system for execution 14

Practical Approach Identify opportunity of integration Simple and expressive programming abstraction Deployment in real world Efficient and scalable system for execution 14

Practical Approach Identify opportunity of integration Simple and expressive programming abstraction Deployment in real world Efficient and scalable system for execution 14

Contributions Project Publication Deployment Statesman SIGCOMM 14 Microsoft Azure HONE JNSM, Vol. 23, 2015 Overture/ Verizon Sprite SOSR 15 In process with OIT 15

Statesman: Safe Datacenter Traffic/Infrastructure Management Other ISPs Datacenter ISP ISP App Routing Enterprises OS Hardware Servers Network Devices 16

Hone: End-host/Network Cooperative Traffic Management Other ISPs Datacenter ISP ISP App Routing Enterprises OS Hardware Servers Network Devices 17

Sprite: Direct Control of Entrant ISP for Enterprise Traffic Other ISPs Datacenter ISP ISP App Routing Enterprises OS Hardware Servers Network Devices 18

What Follows Brief explanation of each project Open issues Related work Q&A 19

Statesman: Integrating Network Infrastructure Management 20

Problem for Cloud Providers Multiple mgmt. solutions coexist for traffic and infrastructure mgmt.

Problem for Cloud Providers Multiple mgmt. solutions coexist for traffic and infrastructure mgmt. Complexity is the main problem

Problem for Cloud Providers Multiple mgmt. solutions coexist for traffic and infrastructure mgmt. Complexity is the main problem Development Scale & heterogeneity of devices

Problem for Cloud Providers Multiple mgmt. solutions coexist for traffic and infrastructure mgmt. Complexity is the main problem Development Scale & heterogeneity of devices Coordination Conflicts and safety violations

Conflict Core1 2 Agg A Agg B ToRs 22

Conflict Core1 2 Agg A Agg B Link-corruptionmitigation adjusts traffic away from Core1 ToRs 22

Conflict Core1 2 Agg A Agg B Link-corruptionmitigation adjusts traffic away from Core1 ToRs TE tunes traffic among links to Core1, 2 22

Safety Violation Core1 2 Agg A Agg B ToRs 23

Safety Violation Core1 2 Agg A Agg B Link-corruptionmitigation shuts down faulty Agg A ToRs 23

Safety Violation Core1 2 Agg A Agg B Link-corruptionmitigation shuts down faulty Agg A ToRs Firmware-upgrade schedules Agg B to upgrade 23

The Statesman System Network operating system Common layer to consolidate traffic and infrastructure management Resolve conflicts & safety violations 24

The Statesman System Network operating system Common layer to consolidate traffic and infrastructure management Resolve conflicts & safety violations Core techniques: Network state & three-view workflow State dependency model Scalable & robust system 24

Three-view Workflow with Network State Observed State Proposed State Target State 25

Three-view Workflow with Network State Application Application Solutions Observed State Proposed State Target State 25

Three-view Workflow with Network State Application Application Solutions Observed State What we see from the network Proposed State Target State 25

Three-view Workflow with Network State Application Application Solutions Observed State What we see from the network Proposed State What we want the network to be Target State 25

Three-view Workflow with Network State Application Application Solutions Statesman Observed State Proposed State Target State What we see from the network What we want the network to be What can be actually done on the network 25

State Dependency for Conflict Detection Link Device 26

State Dependency for Conflict Detection Link PowerState Device 26

State Dependency for Conflict Detection FirmwareVersion Link PowerState Device 26

State Dependency for Conflict Detection ConfigurationState FirmwareVersion Link PowerState Device 26

State Dependency for Conflict Detection ConfigurationState ConfigurationState AdminState FirmwareVersion Link PowerState Device 26

State Dependency for Conflict Detection RoutingState ConfigurationState ConfigurationState FirmwareVersion AdminState Link PowerState Device 26

State Dependency for Conflict Detection PathState RoutingState ConfigurationState ConfigurationState FirmwareVersion AdminState Link PowerState Device 26

Statesman System Observed State Proposed State Target State 27

Statesman System Storage Service Observed State Proposed State Target State 27

Statesman System Storage Service Observed State Proposed State Target State Monitor 27

Statesman System Storage Service Checker Observed State Proposed State Target State Monitor 27

Statesman System Storage Service Checker Observed State Proposed State Target State Monitor Updater 27

Deployment Overview Operational in Microsoft Azure since October 2013 Cover 10 DCs of 20K devices 3 production management solutions built and running 28

Case: Resolve Conflict Inter-DC TE & Firmware-upgrade DC = Data Center BR = Border Router BR 3 DC 2 BR 4 BR 1 DC 1 BR 2 BR 8 DC 4 BR 7 BR 6 BR 5 DC 3 29

30

30

Firmware-upgrade acquires lock of BR1 30

TE fails to acquire lock, and moves traffic away 30

TE fails to acquire lock, and moves traffic away 30

BR1 firmware upgrade starts 30

BR1 firmware upgrade starts BR1 firmware upgrade ends. Lock released. 30

BR1 firmware upgrade starts TE re-acquires lock, and moves traffic back 30

BR1 firmware upgrade starts TE re-acquires lock, and moves traffic back 30

Statesman Summary Programming abstraction Three-view network state model Efficient and scalable system Automatic and safe infrastructure management system Deployment Operational in Microsoft Azure worldwide SIGCOMM 2014 31

Hone: Combining End-host and Network for Traffic Management 32

Problem Server Traffic Management Solutions App OS Network Traffic management is limited by the scope of network devices Coarse granularity & limited view 33

The Hone System Bring end hosts into traffic mgmt. 34

The Hone System Bring end hosts into traffic mgmt. Core techniques: Access of socket & transport layers 34

The Hone System Bring end hosts into traffic mgmt. Core techniques: Access of socket & transport layers Expressive three-stage programming framework 34

The Hone System Bring end hosts into traffic mgmt. Core techniques: Access of socket & transport layers Expressive three-stage programming framework Efficient system with parallel execution of partitioned program 34

Programming Model Measure Stage Analyze Stage Control Stage Framework for each stage Programmable body of each stage Focus on measurement and analysis 35

Hone System Host App Server OS Network 36

Hone System Host App Server OS Hone Agent Network 36

Hone System Controller Mgmt. Program Hone Runtime System Host App Server OS Hone Agent Network 36

Hone System Controller Mgmt. Hone Runtime Program System Host App Server OS Hone Agent Network 36

Hone System Controller Host Host Local Exe. Plan Global Hone Runtime System Net Local Exe. Plan Exe. Plan App Server OS Hone Agent Network 36

Hone System Controller Global Hone Runtime System Exe. Plan Host Host Local App Hone Exe. Plan Agent Server OS Net Local Exe. Plan Network 36

Evaluation Built multiple traffic management solutions on Hone Show distributed rate limiting here Limit total bandwidth across all instances of a tenant in a public cloud 37

DRL on HONE Host-side execution: Measure Calculate throughput Aggregate among connections Controller-side execution: Aggregate among hosts Generate new rate-limit policies 38

Distributed Rate Limiting 39

Distributed Rate Limiting 39

Distributed Rate Limiting 39

Distributed Rate Limiting 39

Distributed Rate Limiting 39

Hone Summary Programming abstraction Access to fine-grained data in servers Three-stage framework Efficient and scalable runtime Deployment Integrated into product of Overture for Verizon Business Cloud Springer JNSM Volume 23, 2015 40

Sprite: Bridging Enterprise and ISP for Inbound Traffic Control 41

Problem for Enterprises: Inbound Traffic Engineering Service 1 ISP 1 Internet ISP 2 Enterprise Network Service 2 ISP 3 42

Problem for Enterprises: Inbound Traffic Engineering ISP 1 Internet ISP 2 Enterprise Network Service 2 ISP 3 42

Problem for Enterprises: Inbound Traffic Engineering ISP 1 Internet ISP 2 Enterprise Network Service 2 ISP 3 42

Mechanism of Sprite Enterprise Network ISP 1 YouTube User Other ISPs ISP 2 VoIP 43

Mechanism of Sprite Enterprise Network 1.1.1.0/24 ISP 1 YouTube User Other ISPs ISP 2 VoIP 1.1.2.0/24 43

Mechanism of Sprite Enterprise Network 1.1.1.0/24 ISP 1 YouTube 10.1.1.2 User Other ISPs ISP 2 VoIP 1.1.2.0/24 43

Mechanism of Sprite 1.1.1.0/24 Enterprise Network Src:1.1.1.2 10.1.1.2 User ISP 1 Other ISPs YouTube ISP 2 VoIP 1.1.2.0/24 43

Mechanism of Sprite 1.1.1.0/24 Enterprise Network Src:1.1.1.2 10.1.1.2 User ISP 1 Other ISPs YouTube Src:1.1.2.2 ISP 2 VoIP 1.1.2.0/24 43

Challenges Naïve solution All done by the border router 44

Challenges Naïve solution All done by the border router Need a distributed solution 44

Challenges Naïve solution All done by the border router Need a distributed solution Control-plane scaling 44

Challenges Naïve solution All done by the border router Need a distributed solution Control-plane scaling Data-plane scaling 44

Challenges Naïve solution All done by the border router Need a distributed solution Control-plane scaling Data-plane scaling Need a simple management interface 44

Three Tiers of Abstractions Abstraction Example 45

Three Tiers of Abstractions Abstraction Example High-level Policy 45

Three Tiers of Abstractions Abstraction High-level Policy Example <BioDept, Analytic> BestLatency 45

Three Tiers of Abstractions Abstraction High-level Policy Example <BioDept, Analytic> BestLatency Networklevel Rule 45

Three Tiers of Abstractions Abstraction High-level Policy Example <BioDept, Analytic> BestLatency Networklevel Rule <10.1.0.0/16, 184.168.1.0/24> ISP1 45

Three Tiers of Abstractions Abstraction High-level Policy Example <BioDept, Analytic> BestLatency Networklevel Rule <10.1.0.0/16, 184.168.1.0/24> ISP1 45

Three Tiers of Abstractions Abstraction High-level Policy Example <BioDept, Analytic> BestLatency Networklevel Rule <10.1.0.0/16, 184.168.1.0/24> ISP1 Per-flow Rule 45

Three Tiers of Abstractions Abstraction High-level Policy Example <BioDept, Analytic> BestLatency Networklevel Rule <10.1.0.0/16, 184.168.1.0/24> ISP1 Per-flow Rule <10.1.1.42,184.168.1.17> SNAT to 1.1.1.2 45

Case: Dynamic Perf-driven Balancing Move users connections among ISPs for best per-user performance Live Internet experiment on AWS VPC and PEERING 10 users watch movies on YouTube 46

User YouTube Throughput 47

Sprite Summary Direct and fine-grained inbound traffic control Scalable solution Scaling control and data planes Evaluation In collaboration with OIT SOSR 15 48

Open Issue #1 Statesman + Hone Merge Hone into Statesman Joint server, traffic, infrastructure mgmt. Possible exploration State abstraction for server data Dependency of server and network states Safety invariants involving servers 49

Open Issue #2 Transactional Statesman Transactional semantics for conflict resolution Possible exploration Grouping semantics Condition semantics 50

Open Issue #3 Hone for Multi-tenant Cloud Loosen the assumption of having access to the hosts OS Possible exploration Infer stats in guest OS from data in hypervisor 51

Related work 52

Statesman Related Work Work What they do Statesman SDN works Onix Pyretic Corybantic Centralized control of flows Single repository of network stats Target at flow management Tight cross-application proposal evaluation Wider spectrum of management applications Three-view network state model Wider spectrum of applications Loose coupling FlowVisor Virtual topology slicing Network state model 53

Hone Related Work Work What they do Hone Network Exception Handler Use hosts only as software switches Go deeper into socket layer Gigascope Extend SQL Use functional language to construct Chimera Specific application (intrusion detection) Support various management solutions MapReduce Naturally parallelizable data Data inherently associated with collection point 54

Sprite Related Work Work What they do Sprite BGP AS-path prepending Works on Internet rearchitecture Tunnel-based works Tune BGP configurations to influence neighboring ASes Clean-slate design in routing or hosts Two ends cooperate to control traffic Direct control from inside enterprise networks Incrementally deployable Only need the client side to act 55

Thanks! Q&A 56