Redundancy Doesn't Always Mean "HA" or "Cluster"




Redundancy Doesn't Always Mean "HA" or "Cluster"
A cautionary tale against using hammers to solve all redundancy and resiliency problems...
OpenStack Design Summit, Oct 2012
Randy Bias (@randybias), CTO, Cloudscaling
Dan Sneddon (@dxs), Sr. Engineer, Cloudscaling
CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution*
* All unlicensed or borrowed works retain their original licenses

Our Journey Today
1. HA pairs are not the only type of redundancy
2. Alternative redundancy patterns for HA
3. Redundancy patterns in Open Cloud System*
* Cloudscaling's OpenStack-powered cloud operating system ("distribution")

What Do We Mean By HA?
We mean what most people mean...
Two servers or network devices that look like one

HA HA?
HA pairs come in a couple of flavors
Active / Passive

HA HA?
People like this flavor best, but it's not always possible...
Active / Active

HA HA HA HA HA??
Many people wish they could get it more like this...
HA cluster, aka massive operational nightmare

Cluster<bleep>!
Imagine this was 4 or 6 nodes in the cluster
4 network technologies
7 NICs per node
A million different ways to break

HA Pairs Are One Type of Redundancy
Herein lies the problem...

The Problem With HA-mmers
There are many, but these two matter most...
Catastrophic failures
No scale-out

HA Pairs Have Binary Failures
Either working or dead, nothing in between

What is Scale-out?
Scale-up: make boxes bigger (usually an HA pair)
Scale-out: make moar boxes

Scaling out is a mindset
Scaling up is like treating your servers as pets (bowzer.company.com)
Servers *are* cattle (web001.company.com)

HA Pair Failures* - 100% down
Hardware rarely fails, operators fail, software fails

Who          Type        Year  Why      Duration
Apple        Switch      2005  Bug      2 hrs
Flexiscale   SAN         2007  Ops Err  24 hrs
Vendio       NAS         2008  Ops Err  8 hrs
UOL Brazil   SAN         2011  Bug      72 hrs
Twitter      Datacenter  2012  Bug+Ops  2 hrs

* This is a handful of examples as a baseline; I'm sure you can find many more

HA Pairs Are an All-in Move
They better not fail...

Risk Reduction
Many small failure domains are usually better

Big failure domains vs. small
Would you rather have the whole cloud down or just a small bit for a short period of time?
Still a scale-up pattern... wouldn't you rather scale out?

Pair vs. Scale-out Load Balancing
HA pair: no scale-out, state sync (100% loss)
Scale-out: shared-nothing architecture (20% loss)
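
The 100% and 20% figures can be reproduced with a small back-of-the-envelope model. This is only a sketch: the function name, the five-node count, and the rule that synchronized state propagates a fault to every peer are assumptions chosen to match the slide, not something the talk specifies.

```python
def fraction_of_service_lost(nodes: int, failed: int, shared_state: bool) -> float:
    """Return the fraction of capacity lost when `failed` of `nodes` go down.

    Assumption (not from the slides): with synchronized state, a fault such as
    a software bug or operator error is replicated to every peer, so any
    failure takes out the whole group. Shared-nothing nodes fail independently.
    """
    if nodes <= 0 or not 0 <= failed <= nodes:
        raise ValueError("need nodes > 0 and 0 <= failed <= nodes")
    if failed == 0:
        return 0.0
    if shared_state:
        return 1.0
    return failed / nodes


# HA pair with state sync: one bad event is fatal for both members.
pair_loss = fraction_of_service_lost(nodes=2, failed=1, shared_state=True)
# Five independent proxies (the node count is an assumption): one failure.
scale_out_loss = fraction_of_service_lost(nodes=5, failed=1, shared_state=False)
print(f"pair: {pair_loss:.0%}, scale-out: {scale_out_loss:.0%}")
```

Under these assumptions, adding more shared-nothing nodes keeps shrinking the failure domain, while adding nodes to a state-synced group does not.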


What's Usually an HA Pair in OpenStack?
Everything...
Service Endpoints (APIs)
Messaging System (RPC)
Worker Threads (e.g. Scheduler, Networking)
Database (MySQL)

What needs to be an HA pair?
Not much needs state synchronization
Service Endpoints (APIs)
Messaging System (RPC)
Worker Threads (e.g. Scheduler, Networking)
Database (MySQL)

Fault Tolerance Methodologies

Fault Tolerance in OCS

Service Distribution
High Availability Without Compromise
Resilient
Stateless
Scale-out

Service Distribution
Combines Standard Networking Technologies

OSPF (/etc/quagga/ospfd.conf):

    ! Advertise the anycast service address into the backbone area
    router ospf
     ospf router-id 10.1.1.1
     network 10.1.255.1/32 area 0.0.0.0

Anycast (/etc/quagga/zebra.conf):

    ! The same /32 is configured on the loopback of every proxy node
    interface lo:2
     description Pound listening address
     ip address 10.1.255.1/32

Load-Balancing Proxy (/etc/pound/pound.conf):

    # Listen on the anycast address and spread requests over the back ends
    ListenHTTP
        Address 10.1.255.1
        Port 8774
        xHTTP 1
        Service
            BackEnd
                Address 10.1.1.1
                Port 8774
            End
            BackEnd
                Address 10.1.1.2
                Port 8774
            End
        End
    End

Resilient OpenStack
Horizontally Scalable, No Single Point Of Failure
Service Endpoints (APIs): Service Distribution
Messaging System (RPC): ZeroMQ
Worker Threads (e.g. Scheduler, Networking): Service Distribution
Database (MySQL): MMR + HA

Service Distribution Advantages
What Makes This a Superior Solution?
True horizontal scalability with no centralized controller
Services are always running, failover is nearly instant
Reduced complexity, fewer idle resources
No need for separate load balancers
[Diagram: Failover vs. Distributed Services]

Perfect For Site Resiliency
Service Distribution Works With Multiple Sites
Traditional HA pairs do not support cross-site resiliency
Service Distribution can fail over across sites without DNS redirections

Service Distribution in Action
Example: Distributed Load Balancing
1) OSPF
[Diagram: OSPF Router(s) receive OSPF advertisements from Quagga on each HTTP Proxy node]

Service Distribution in Action
Example: Distributed Load Balancing
1) OSPF
2) ECMP Per-flow Load Balancing
3) Load-balancing HTTP Proxy
[Diagram: OSPF Router(s) receive OSPF advertisements from Quagga on each HTTP Proxy node and apply per-flow load balancing across them]

Service Distribution in Action
Example: Distributed Load Balancing
1) OSPF
2) ECMP Per-flow Load Balancing
3) Load-balancing HTTP Proxy
4) Unlimited # of Back-End Servers
[Diagram: OSPF Router(s) receive OSPF advertisements from Quagga on each HTTP Proxy node and apply per-flow load balancing across them; the proxies forward to the back-end Servers]
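
Per-flow load balancing in step 2 can be illustrated in a few lines. This is a minimal sketch, not how any particular router does it: real ECMP hashing is vendor-specific and done in hardware, and the names `Flow` and `pick_next_hop`, the SHA-256 hash, and the client address 192.0.2.10 are all assumptions made for illustration. The proxy and anycast addresses are the ones from the configuration slide.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """The 5-tuple that identifies one connection (hypothetical helper type)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str = "tcp"


def pick_next_hop(flow: Flow, next_hops: Sequence[str]) -> str:
    """Map a flow to one of several equal-cost next hops.

    Hashing the 5-tuple keeps every packet of a connection on the same proxy,
    while different connections spread across all proxies.
    """
    if not next_hops:
        raise ValueError("at least one next hop is required")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Two proxies advertise the same anycast address 10.1.255.1 via OSPF.
proxies = ["10.1.1.1", "10.1.1.2"]
flow = Flow("192.0.2.10", 40000, "10.1.255.1", 8774)
print(pick_next_hop(flow, proxies))
```

When a proxy stops advertising the route, it simply drops out of `next_hops`. With this naive modulo scheme some existing flows would then move to a different proxy, which is acceptable only because the proxies in this design hold no shared connection state.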

Failure Resiliency
[Diagram: Clients 1-4, a tier of Load Balancer/Proxy nodes, ten Servers]
10% Load Each Server

Failure Resiliency
[Diagram: Clients 1-4, a tier of Load Balancer/Proxy nodes with one marked failed (X), ten Servers]
10% Load Each Server

Failure Resiliency
[Diagram: Clients 1-4, a tier of Load Balancer/Proxy nodes, ten Servers with one marked failed (X)]
10% Increased Server Load
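
The arithmetic behind the server-failure case can be checked directly. This is a sketch that assumes total load is constant and split evenly across healthy servers; the function name is hypothetical.

```python
def load_per_server(total_servers: int, failed_servers: int = 0) -> float:
    """Share of total load carried by each healthy server (even split assumed)."""
    if total_servers <= 0 or failed_servers < 0:
        raise ValueError("need total_servers > 0 and failed_servers >= 0")
    healthy = total_servers - failed_servers
    if healthy <= 0:
        raise ValueError("no healthy servers left")
    return 1.0 / healthy


before = load_per_server(10)                   # all ten servers healthy
after = load_per_server(10, failed_servers=1)  # one server down
relative_increase = after / before - 1.0
print(f"{before:.1%} -> {after:.1%} per server (+{relative_increase:.1%})")
```

Under this even-split assumption each surviving server goes from 10% to roughly 11% of the total load, in line with the slide's rounded "10% increased" figure.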

OCS NAT Service
Example: Scale-out Network Address Translation
[Diagram: BGP to multiple ISP providers, NAT Service Distribution, VMs]

Brokerless Messaging With ZeroMQ
Avoiding RabbitMQ's Single Point Of Failure
[Diagram: RabbitMQ (Brokered): Nova-API, Nova-Scheduler, and Nova-Compute all connect through the RabbitMQ Broker, a single point of failure]

Brokerless Messaging With ZeroMQ
Avoiding RabbitMQ's Single Point Of Failure
[Diagram: RabbitMQ (Brokered): Nova-API, Nova-Scheduler, and Nova-Compute all connect through the RabbitMQ Broker, a single point of failure]
vs.
[Diagram: ZeroMQ (Peer To Peer): Nova-API, Nova-Scheduler, and Nova-Compute connect to each other directly]

What did we learn today?
1. HA-mmers are for nails
2. Scale-out rules for redundancy
3. Design-for-failure is a mentality, not a pair
4. Resiliency over redundancy

Q&A
Randy Bias (@randybias), CTO, Cloudscaling
Dan Sneddon (@dxs), Sr. Engineer, Cloudscaling
OCS 2.0: Public Cloud Benefits, Private Cloud Control, Open Cloud Economics