My experience writing a DR service for CloudStack. Alena Prokharchyk Citrix @Lemonjet

What is a disaster for the cloud
- A disaster for the cloud is a hardware or software failure, a network or power outage, or physical damage to the data center (DC)
- A disaster can cause partial or complete DC failure
- As a result, VMs become unresponsive and need to be restored in another data center
- The goal of a DR product is to prepare VMs for failover and recover them in a short time frame

Existing DR solutions in CS
- Recurring snapshots feature
- No out-of-the-box cross-zone recovery solution

What the new DR service does
- Lets the admin configure the recovery service without adding extra scripts and config files
- Prepares for disaster and restores the VM and all its metadata (networks, networking rules)
- Recovers VMs across zones
- Updates the recovery VMs' metadata in real time, which helps keep MTTR (Mean Time to Repair) low
- Provides a tiered DR service: the most important apps/accounts can be recovered first
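The tiered recovery idea above can be illustrated with a short sketch. The `RecoveryCandidate` type, the numeric tiers, and the rule that a lower tier number is recovered first are all assumptions made for illustration; the slides do not say how the DR service actually orders failovers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryCandidate:
    """Hypothetical record for a VM covered by a DR offering."""
    vm_name: str
    account: str
    tier: int  # assumption: 1 is the most important tier


def failover_order(candidates):
    """Return candidates sorted so the most important tier comes first.

    sorted() is stable, so VMs within the same tier keep their input order.
    """
    return sorted(candidates, key=lambda c: c.tier)


vms = [
    RecoveryCandidate("batch-worker", "dev", tier=3),
    RecoveryCandidate("billing-db", "finance", tier=1),
    RecoveryCandidate("web-frontend", "shop", tier=2),
    RecoveryCandidate("billing-api", "finance", tier=1),
]
ordered = [c.vm_name for c in failover_order(vms)]
print(ordered)
```

A real service would likely derive the tier from the DR offering (SLA) attached to the VM or account rather than from a hard-coded field.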

Things the DR service doesn't cover
- The DR service does no storage replication, only metadata replication
- Storage replication is handled by the admin outside of CS (NetApp's SnapMirror)

Which version of CloudStack is supported by DR?
DR works with:
- CloudStack 4.5
- The next Citrix CloudPlatform release, based on ASF 4.4

Design principles followed while writing the DR service
- Develop it as a CS plugin in V1, with the ability to run as a separate service in future versions
- No changes to core/server CS code that are specific to DR alone
- No direct access to the CS DB; all data manipulation goes through CS APIs only
- The DR service doesn't have its own DB in Version 1; all DR data is stored in the CS DB in the form of resource metadata
- Rely on MTBF (Mean Time Between Failures) being high: never fail a VM in its original zone if its preparation fails; let the admin fix things and retry

DR Service deployment
[Diagram. DR service side: DR UI plugin, DR API plugin, DR Events listener, DR Server. CloudStack side: CS UI, CS API, CS Orchestration engine, CS Services/Plugins. Also labeled: DR Events, Event listener, message bus.]

DR process
- Configuration: configuring the DR service
- Preparation: preparing the VM for failover
- Failover: failing the VM over to the Recovery zone
- Failback: failing the VM back to its Original zone

Configuration
- DR setup: set up the Active zone with the Recovery zone
- Configure DR offerings (SLAs)
- Tag storage for placement of the DR VMs' volumes

Preparing VM for failover
- The DR service listens to events from CS and deploys/updates the recovery VM's metadata in the Recovery zone
- The recovery VM doesn't occupy physical resources on the CS side
- The recovery VM is invisible to the end user

Preparing VM for failover
[Diagram: a UserVm with Nic1 and Nic2 in the Active zone, the DR Service in the middle, and a matching UserVm with Nic1 and Nic2 in the Recovery zone.]
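The preparation step can be sketched as a small event handler that keeps a metadata-only copy of each user VM. This is an in-memory illustration, not the DR service's code: the event names (`vm_created`, `nic_added`, and so on), the payload keys, and both classes are hypothetical, and a real listener would receive CloudStack events from the message bus and call CS APIs instead of updating a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryVmMetadata:
    """Metadata-only shadow of a user VM; it holds no compute or storage."""
    source_vm_id: str
    nics: list = field(default_factory=list)


class DrEventListener:
    """Applies events from the active zone to recovery-zone metadata."""

    def __init__(self):
        self.recovery_vms = {}  # source VM id -> RecoveryVmMetadata

    def handle(self, event):
        # Event names and payload keys are hypothetical.
        kind, vm_id = event["type"], event["vm_id"]
        if kind == "vm_created":
            self.recovery_vms[vm_id] = RecoveryVmMetadata(vm_id)
        elif kind == "nic_added":
            self.recovery_vms[vm_id].nics.append(event["nic"])
        elif kind == "nic_removed":
            self.recovery_vms[vm_id].nics.remove(event["nic"])
        elif kind == "vm_destroyed":
            self.recovery_vms.pop(vm_id, None)


listener = DrEventListener()
for ev in [
    {"type": "vm_created", "vm_id": "vm-1"},
    {"type": "nic_added", "vm_id": "vm-1", "nic": "Nic1"},
    {"type": "nic_added", "vm_id": "vm-1", "nic": "Nic2"},
]:
    listener.handle(ev)
print(listener.recovery_vms["vm-1"].nics)
```

Because each change is applied as it arrives, the recovery copy is already current when a failover is requested, which is the point the slides make about keeping MTTR low.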

Failover process
- The process of restoring a failed VM in the Recovery zone
- DR does not automatically detect that a disaster has happened
- The DR admin triggers failover for the VM by calling the DR API
- The DR service performs the failover process

Failover process
[Diagram: UserVm UUID1 with Volume1 and Volume2 on CS storage1 in the Active zone; UserVm UUID1 with Volume1 and Volume2 on CS storage2 in the Recovery zone; the DR Service sits between the zones. Underneath, NetApp SnapMirror links Physical storage1 (Volume1, Volume2) and Physical storage2 (Volume1, Volume2).]
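A minimal sketch of the admin-triggered failover, assuming a simple state check before the VM is switched to its Recovery zone. The state names echo the ones that appear elsewhere in the slides (Ready to Failover, FailedOver, FAILED_TO_PREPARE_FOR_DR), but the `ProtectedVm` type and the `trigger_failover` function are hypothetical and do not correspond to the actual DR API.

```python
from dataclasses import dataclass
from enum import Enum


class DrState(Enum):
    READY_TO_FAILOVER = "ReadyToFailover"
    FAILED_OVER = "FailedOver"
    FAILED_TO_PREPARE_FOR_DR = "FailedToPrepareForDr"


@dataclass
class ProtectedVm:
    """Hypothetical view of a VM covered by the DR service."""
    name: str
    state: DrState
    running_zone: str
    recovery_zone: str


def trigger_failover(vm):
    """Admin-initiated failover; nothing here detects the disaster itself."""
    if vm.state is not DrState.READY_TO_FAILOVER:
        raise ValueError(f"{vm.name} is not ready to fail over: {vm.state.name}")
    vm.running_zone = vm.recovery_zone
    vm.state = DrState.FAILED_OVER
    return vm


vm = trigger_failover(
    ProtectedVm("VM-user1", DrState.READY_TO_FAILOVER, "zone-1", "zone-2")
)
print(vm.running_zone, vm.state.name)
```

Refusing to fail over a VM whose preparation failed leaves the decision with the admin, in line with the "let admin fix things and retry" principle.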

Failback process
- The process of moving a VM back to its original zone
- VM metadata is preserved in the original zone and reused when the VM is recovered
- The recovery VM's volumes are reintroduced to the original zone and attached to the original VM
- The VM in the recovery zone is disabled
- The VM in the original zone is enabled
- A UUID swap happens
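The failback steps listed above can be sketched on two in-memory records. The `VmRecord` type and the `fail_back` function are hypothetical, and the reading of "UUID swap" used here (the user-facing UUID follows whichever VM is enabled) is an assumption; the slides do not spell out the swap semantics.

```python
from dataclasses import dataclass, field


@dataclass
class VmRecord:
    """Hypothetical VM record; not a CloudStack data type."""
    uuid: str
    zone: str
    enabled: bool
    volumes: list = field(default_factory=list)


def fail_back(original, recovery):
    """Move the workload from the recovery VM back to the original VM."""
    # Reintroduce the recovery VM's volumes and attach them to the original VM.
    original.volumes = list(recovery.volumes)
    recovery.volumes = []
    # Disable the recovery VM and enable the original one.
    recovery.enabled = False
    original.enabled = True
    # Assumption: the user-facing UUID follows the enabled VM, so swap them.
    original.uuid, recovery.uuid = recovery.uuid, original.uuid


original = VmRecord("UUID2", "zone-1", enabled=False)
recovery = VmRecord("UUID1", "zone-2", enabled=True, volumes=["Volume1", "Volume2"])
fail_back(original, recovery)
print(original)
print(recovery)
```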

DR metadata in CS DB

user_vm
id | name     | zone_id
1  | VM-user1 | 1
2  | VM-user1 | 2

user_vm_details
vm_id | detail_name    | detail_value
1     | DR_RECOVERY_ID | 2
1     | DR_STATE       | FAILED_TO_PREPARE_FOR_DR
1     | DR_ALERT       | Failed to attach Nic to the Recovery vm
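To show how DR data stored as resource details could be read back, here is a small sketch over rows shaped like the user_vm_details example. It assumes the detail rows pair up as (vm_id, detail_name, detail_value), with DR_STATE holding FAILED_TO_PREPARE_FOR_DR and DR_ALERT holding the error text; the `details_for` helper is hypothetical, and the DR service itself reads this data through CS APIs, not from rows in memory.

```python
# Rows mirror the slide's example: (vm_id, detail_name, detail_value).
user_vm_details = [
    (1, "DR_RECOVERY_ID", "2"),
    (1, "DR_STATE", "FAILED_TO_PREPARE_FOR_DR"),
    (1, "DR_ALERT", "Failed to attach Nic to the Recovery vm"),
]


def details_for(vm_id, rows):
    """Collect one VM's detail rows into a name -> value dictionary."""
    return {name: value for row_vm_id, name, value in rows if row_vm_id == vm_id}


dr = details_for(1, user_vm_details)
recovery_vm_id = int(dr["DR_RECOVERY_ID"])  # points at the recovery VM's row
print(recovery_vm_id, dr["DR_STATE"], dr["DR_ALERT"])
```

Keeping DR data in generic key/value details is what lets Version 1 avoid a separate DR database, at the cost of storing every value as a string.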

Who controls the DR process
- The admin controls the recovery process on behalf of users' VMs
- The end user can monitor:
  - The DR state of their VMs: Ready to Failover / FailedOver
  - Recovery zone info: the zone to which the VM recovers in case of failure
  - Recovery public IP address(es) info, needed to reconfigure their public DNS

CS API enhancements
- Added some missing data to CS API responses
- Added missing resource_details tables for some CS resources
- Added support for CS services to publish Alerts via CS APIs
- Introduced External UUID management
- Implemented resource creation with delayed start for some objects (VPC)

Things yet to fix in CS
- Single sign-on is missing
- Resource creation in the DB and its actual implementation are not granular enough

Summary
If you are an API developer for an open source IaaS product:
- Always think from the end user/customer use-case perspective when adding or modifying end-user APIs
- Watch what plugins, services, and bug fixes people write for your software; this helps identify missing pieces and common problems in it