Automated Deployment of an HA OpenStack Cloud with SUSE Cloud HO7695 Adam Spiers Senior Software Engineer aspiers@suse.com Vincent Untz Project Manager vuntz@suse.com
Introduction
Agenda Start building a cloud! Quick intro to SUSE Cloud architecture Learn about HA in OpenStack and SUSE Cloud Build an HA cluster Build an HA OpenStack cloud on the cluster Break things!
Workshop environment
Workshop environment Relax ;-) We have plenty of time. The whole build is automated and idempotent, and you can take home the entire environment afterwards (available online). It runs on any machine with at least 16GB RAM... or 8GB at a push (although that comes with limitations).
Workshop environment We'll build a miniature cloud on a single machine: VirtualBox hypervisor, 4 VMs (an Administration Server running Crowbar, 2 Control Nodes in an HA cluster, and 1 Compute Node), and Vagrant for rapid deployment.
What is Vagrant? "Creates and configures lightweight, reproducible, and portable development environments." Not just for development. https://www.vagrantup.com/ Perfect for "kicking the tyres", demoing, testing, etc. Cross-platform (Linux, Mac OS X, Windows), with providers for libvirt, VirtualBox, VMware, Hyper-V, Docker, OpenStack, ...
Vagrant inputs: 1 or more Vagrant "boxes" (pre-built virtual appliances), plus a Vagrantfile, a Ruby DSL file which defines: which box(es) to use, the virtual hardware required, the virtual network topology, network ports to forward, hypervisor-specific settings, files to inject into the appliance, and commands to run in the appliance.
Using Vagrant: crash course. vagrant box add suse/cloud4-admin https://vagrantcloud.com/suse/ (it is also possible to add local boxes); vagrant up admin; vagrant up controller1; vagrant halt controller2; vagrant destroy compute1. See https://docs.vagrantup.com/v2/getting-started/index.html
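The crash-course commands above cover the whole VM lifecycle; a lightly annotated sketch of the same session:

```shell
# fetch a pre-built box from Vagrant Cloud (local .box files also work)
vagrant box add suse/cloud4-admin https://vagrantcloud.com/suse/

vagrant up admin          # create and boot the 'admin' VM from the Vagrantfile
vagrant up controller1    # boot another VM defined in the same Vagrantfile
vagrant halt controller2  # graceful shutdown; 'vagrant up' brings it back
vagrant destroy compute1  # delete the VM and its disks (asks for confirmation)
```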
Workshop Vagrant environment: https://github.com/suse-cloud/suse-cloud-vagrant. demos/ha/vagrant/ contains the Vagrantfile and configs/2-controllers-1-compute.yaml. VirtualBox is pre-installed, along with 2 boxes: suse/cloud4-admin and suse/sles11sp3. 4 VMs: admin (SUSE Cloud 4 Administration Server), controller1 and controller2 (which will form an HA cluster), and compute1.
Exercise #1: start the build! Start the VirtualBox GUI, cd to your local copy of the git repository, cd vagrant/, then vagrant up. All 4 VMs will be booted in sequence: admin, controller1, controller2, compute1.
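Assuming the workshop repository has been cloned locally (the clone path here is an assumption), Exercise #1 boils down to:

```shell
# clone the workshop repository and launch the environment
git clone https://github.com/suse-cloud/suse-cloud-vagrant.git
cd suse-cloud-vagrant/demos/ha/vagrant/
vagrant up        # boots admin, controller1, controller2, compute1 in sequence
vagrant status    # confirm all four VMs are running
```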
SUSE Cloud Overview
SUSE Cloud: an enterprise OpenStack distribution that rapidly deploys and easily manages highly available, mixed-hypervisor IaaS clouds. Increase business agility, economically scale IT capabilities, easily deliver future innovations.
OpenStack SUSE Cloud Distribution 4 (architecture diagram). OpenStack Icehouse layer: Dashboard (Horizon), Cloud APIs (OpenStack and EC2), Orchestration (Heat), Telemetry (Ceilometer), Identity (Keystone), Images (Glance), Object (Swift), Network (Neutron), Block (Cinder), and Compute (Nova) on Xen, KVM, VMware, or Hyper-V hypervisors, with adapters for partner solutions. SUSE Cloud adds the install framework (Crowbar, Chef, TFTP, DNS, DHCP), required services (RabbitMQ, PostgreSQL), and highly available Ceph storage (RADOS, RadosGW, RBD). SUSE products: SUSE Linux Enterprise Server 11 SP3 as the operating system, plus management tooling (SUSE Manager, SUSE Studio, cloud portal, billing, VM image management, monitoring, security and performance). Everything runs on x86-64 physical infrastructure: servers, switches, storage.
Why an Install Framework? 1229 parameters, 11 components: deployment time goes from 1 week down to 1 hour.
Why an Install Framework? SCARY AS HELL!
Introduction to Crowbar
Crowbar
It could have been worse.
SUSE Cloud architecture
SUSE Cloud Administration Server
SUSE Cloud Control Node: PostgreSQL database; Image Service (Glance) for managing virtual images; Identity (Keystone), providing authentication and authorization for all SUSE Cloud services; Dashboard (Horizon), a web user interface for the SUSE Cloud services; Nova API and scheduler; message broker (RabbitMQ).
SUSE Cloud Compute Nodes: the pool of machines where instances run, equipped with RAM and CPU. The SUSE Cloud Compute (nova) service handles setting up, starting, stopping, and migrating VMs.
Sledgehammer
barclamp
Interlude
Exercise #2: assign aliases to nodes. Connect to the admin node: vagrant ssh admin, or ssh root@192.168.124.10, or use the VM console in VirtualBox. The root password is vagrant. Type q then y to accept the beta EULA. Run setup-node-aliases.sh. Point a browser at the Crowbar web UI at http://localhost:3000 and check that the 4 nodes are registered, named correctly, and in the Ready state (green).
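Exercise #2 as a terminal session (the script name and addresses come from the slide above):

```shell
# connect to the admin node (alternatively: ssh root@192.168.124.10, password vagrant)
vagrant ssh admin
# accept the beta EULA when prompted: q, then y

# assign the admin/controller1/controller2/compute1 aliases in Crowbar
setup-node-aliases.sh

# then browse to the Crowbar web UI at http://localhost:3000 and
# verify all 4 nodes are registered and in the green "Ready" state
```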
High Availability and Cloud
Why High Availability? "I can't have my systems go down. We lose $1,000,000 for every minute that we're down, and upper management gets really 'excited' when that happens."
High Availability for OpenStack: what might we want to protect? The admin server (core infrastructure: DNS, NTP, provisioning capabilities); the controller node (OpenStack services); the compute nodes (hypervisor); and the VM instances (i.e. the guests in the cloud).
Component failure impact. Admin server: new cloud nodes require manual addition and configuration, and there is currently no ability to rediscover existing nodes on restart, but there is no impact on the currently operating cloud. Control node: cannot start or stop guest instances, and no ability to rediscover existing nodes or guest VMs on restart, but no impact on currently deployed instances.
Pets vs. cattle metaphor. Pets are given names like mittens.mycompany.com; each one is unique, lovingly hand-raised and cared for, and when they get ill, you spend money nursing them back to health. Cattle are given names like vm0213.cloud.mycompany.com; they are almost identical to other cattle, and when one gets ill, you shoot it and get another one.
Component failure impact (continued). Compute node: loss of the VMs on that node; recovery is by restart and re-provisioning of the physical server; can be mitigated through application design. VM instances: loss of workload; recovery is by booting a replacement instance (cattle); can be mitigated through application design.
Component failure assessment. Control node: highest priority, since recovery realistically requires a complete cloud restart. Compute nodes and VM instances: application-level recovery is normal practice for existing clouds; this is not what existing enterprises expect, but a workaround exists for new workloads. Admin server: least impact on the deployed system; operation can continue with no impact on end users.
Status Quo of HA in OpenStack. The community has now mostly converged on a standard architecture for an HA control plane involving Pacemaker and HAProxy. SUSE was the first vendor to release a supported implementation of this, via an update to SUSE Cloud 3 (May 2014). No one has yet implemented a full solution for HA of compute nodes and VM guests, although community discussion in the last month has generated proposals which look quite promising. HA for storage and network nodes is also still ongoing work.
High Availability in SUSE Cloud
HA in SUSE Cloud (high level). Administration Server: no longer a SPoF (Single Point of Failure); you can have multiple DNS / NTP servers, and a backup / restore script supports cold or warm standby. Control plane: run services in a cluster to ensure availability of data and service. Some OpenStack services are stateless, and some can run active/active, e.g. the API endpoint services, but the load balancer itself still needs protecting, and the database and message queue need shared storage.
HA Control Plane in SUSE Cloud
HA Control Plane in SUSE Cloud: fully automated cluster setup through the Pacemaker barclamp, with a simple, intuitive web UI. It allows a choice of cluster size and quantity, and supports multiple strategies for STONITH and storage. It uses SUSE Linux Enterprise High Availability Extension (SLE HAE) components: Pacemaker, HAProxy, DRBD. The architecture is consistent with OpenStack community recommendations.
HA Control Plane in SUSE Cloud: active/passive for PostgreSQL and RabbitMQ, with a choice of replicated (DRBD) or shared storage; active/active for the other services via the HAProxy load balancer (HAProxy itself is active/passive); and an innovative approach to the Neutron L3 agent.
Setting Expectations: this is not fault tolerance. A small outage of services is tolerated, with automated recovery within a small number of minutes. That gives four nines availability (99.99% = ~53 mins downtime/year); five nines (99.999% = ~5.3 mins/year) may even be achievable depending on context. Some manual intervention may be necessary to repair a degraded (but still functioning) cluster.
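The downtime budgets above follow directly from the fraction of a year (525,600 minutes) that each availability level allows; a quick sanity check:

```shell
# minutes of allowed downtime per year at a given availability level
minutes_per_year=525600
awk -v m="$minutes_per_year" 'BEGIN { printf "four nines: %.1f min/year\n", m * (1 - 0.9999) }'
awk -v m="$minutes_per_year" 'BEGIN { printf "five nines: %.1f min/year\n", m * (1 - 0.99999) }'
# four nines: 52.6 min/year
# five nines: 5.3 min/year
```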
Simple cluster architecture (diagram): Control Nodes 1 and 2 form a single Pacemaker cluster running Orchestration, Telemetry, Dashboard, Nova, Glance, Neutron, Cinder, Keystone, RabbitMQ, and PostgreSQL, with DRBD replicating storage between the two nodes.
Recommended architecture (diagram): three separate Pacemaker clusters. A services cluster (nodes 1-3) runs Orchestration, Telemetry, Dashboard, Nova, Glance, Cinder, and Keystone; a network cluster (nodes 1-3) runs Neutron; and a database cluster (nodes 1-2) runs RabbitMQ and PostgreSQL on DRBD or shared storage.
Building a Pacemaker Cluster
Cluster fencing. Every HA cluster needs an out-of-band fencing mechanism. This is not optional! Simplistically, if cluster communications break down, consensus is lost, and multiple nodes may contend for the same data/service ("split brain" syndrome). Solution: STONITH (Shoot The Other Node In The Head). Popular fencing devices include IPMI, IBM RSA, HP iLO, and Dell DRAC. We'll use SBD (Storage-Based Death), which allows sending poison-pill messages via a shared block storage device.
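The sbd command-line tool can inspect the shared device and exercise the poison-pill mechanism; a sketch using the workshop's /dev/sdc device (the 'test' message is harmless, whereas 'reset' or 'off' would actually fence the node):

```shell
# inspect the SBD device (the workshop uses /dev/sdc as the shared disk)
sbd -d /dev/sdc dump      # show the on-disk header: timeouts, number of slots
sbd -d /dev/sdc list      # show per-node slots and any pending messages

# deliver a harmless test message to a node's slot
sbd -d /dev/sdc message controller1 test
```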
Exercise #3: build a Pacemaker cluster. In the VirtualBox GUI, view the settings for controller1 and controller2 and observe the locations of the extra disks (SBD and DRBD). Which disk is shared? Point a browser at the Crowbar web UI at http://localhost:3000 and follow the instructions for deploying a Pacemaker cluster; the SBD device is /dev/sdc. On the admin node, run tail -f /var/log/crowbar/chef_client/*
Exercise #4: check cluster health. Wait for chef-client to complete on both controller nodes; the Pacemaker proposal should finish applying and go green. Connect to the controller1 node: vagrant ssh controller1, or connect to the admin node and ssh controller1, or use the VM console in VirtualBox (root password vagrant and EULA as before). Run crm_mon and check the cluster has two nodes online. Visit the Hawk web UI at https://localhost:7630
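A quick health check from the command line (standard Pacemaker/crmsh tools, shown as a sketch):

```shell
# connect to the first controller
vagrant ssh controller1

# one-shot cluster status: expect 2 nodes configured, both online
crm_mon -1

# the same information via the crm shell
crm status

# the Hawk web UI is forwarded to the host at https://localhost:7630
```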
Exercise #5: build the HA cloud. Follow the remaining instructions for deploying the remaining barclamps. This will take quite a long time (at least 30 minutes). If you are feeling lazy, you can use a tool to do this automatically: crowbar batch --timeout 1200 build HA-cloud.yaml. Watch the Hawk web UI and crm_mon output as Crowbar/Chef automatically add resources to the cluster.
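The hands-off alternative mentioned above, spelled out (run on the admin node; HA-cloud.yaml is the batch file named on the slide):

```shell
# apply all remaining barclamps from a batch file, with a 20-minute timeout
crowbar batch --timeout 1200 build HA-cloud.yaml

# meanwhile, on a controller, watch resources appear in the cluster
crm_mon    # interactive view; refreshes as Crowbar/Chef adds resources
```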
Exercise #6: simulate failures Follow instructions for testing cluster failover
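The workshop's own failover instructions are authoritative; for reference, two common ways to provoke a failover on a Pacemaker cluster (treat this as a sketch, not the workshop's exact procedure):

```shell
# 1. kill a managed service and watch Pacemaker detect and recover it
pkill -9 postgres     # on the controller currently running PostgreSQL
crm_mon -1            # the resource's failcount rises, then it is restarted

# 2. put a node into standby so its resources migrate to the peer
crm node standby controller1
crm_mon -1                     # resources move to controller2
crm node online controller1    # bring the node back afterwards
```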
Exercise #7: recover degraded cluster Follow instructions for recovering a degraded cluster
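Again the workshop instructions are the reference; recovery typically comes down to clearing failcounts and bringing nodes back online, roughly (the resource name below is a placeholder):

```shell
# identify failed resources and offline nodes
crm_mon -1

# clear a resource's failcount so Pacemaker will try to start it again
crm resource cleanup <resource-name>    # placeholder: use the failed resource's name

# bring a standby node back into the cluster
crm node online controller1
```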
If you made it this far, well done!
Corporate Headquarters Maxfeldstrasse 5 90409 Nuremberg Germany +49 911 740 53 0 (Worldwide) www.suse.com Join us on: www.opensuse.org
Unpublished Work of SUSE LLC. All Rights Reserved. This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability. General Disclaimer This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.