Planning for the Worst SAS Grid Manager and Disaster Recovery



Similar documents
Best Practices for Implementing High Availability for SAS 9.4

Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication Software

EMC VPLEX FAMILY. Continuous Availability and data Mobility Within and Across Data Centers

Avaya Aura Virtualized Environment

Using Live Sync to Support Disaster Recovery

Data Center Optimization. Disaster Recovery

Informix Dynamic Server May Availability Solutions with Informix Dynamic Server 11

Building your Server for High Availability and Disaster Recovery. Witt Mathot Danny Krouk

High Availability with Postgres Plus Advanced Server. An EnterpriseDB White Paper

CA ARCserve Replication and High Availability Deployment Options for Hyper-V

Availability Guide for Deploying SQL Server on VMware vsphere. August 2009

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES

Deployment Options for Microsoft Hyper-V Server

Table of contents. Matching server virtualization with advanced storage virtualization

High Availability and Clustering

Mastering Disaster Recovery: Business Continuity and Virtualization Best Practices W H I T E P A P E R

High Availability And Disaster Recovery

Rajesh Gupta Best Practices for SAP BusinessObjects Backup & Recovery Including High Availability and Disaster Recovery Session #2747

Neverfail Solutions for VMware: Continuous Availability for Mission-Critical Applications throughout the Virtual Lifecycle

Technical Paper. Leveraging VMware Software to Provide Failover Protection for the Platform for SAS Business Analytics April 2011

Building disaster-recovery solution using Azure Site Recovery (ASR) for Hyper-V (Part 1)

An Oracle White Paper November Oracle Real Application Clusters One Node: The Always On Single-Instance Database

Server and Storage Virtualization: A Complete Solution A SANRAD White Paper

BME CLEARING s Business Continuity Policy

Active-Active and High Availability

EMC VPLEX FAMILY. Continuous Availability and Data Mobility Within and Across Data Centers

Addendum 5 STATE OF LOUISIANA. Division of Administration Office of Technology Services RFP #:

Solution Brief Availability and Recovery Options: Microsoft Exchange Solutions on VMware

High Availability and Disaster Recovery for Exchange Servers Through a Mailbox Replication Approach

The Benefits of Virtualizing

Leveraging Virtualization for Disaster Recovery in Your Growing Business

High Availability with Windows Server 2012 Release Candidate

Disaster Recovery with EonStor DS Series &VMware Site Recovery Manager

Real-time Protection for Hyper-V

Enterprise Storage Solution for Hyper-V Private Cloud and VDI Deployments using Sanbolic s Melio Cloud Software Suite April 2011

be architected pool of servers reliability and

HIGHLY AVAILABLE MULTI-DATA CENTER WINDOWS SERVER SOLUTIONS USING EMC VPLEX METRO AND SANBOLIC MELIO 2010

How to Set Up Disaster Recovery for HP OO

Zerto Virtual Manager Administration Guide

Information Services hosted services and costs

Affordable Remote Data Replication

Microsoft SharePoint 2010 on VMware Availability and Recovery Options. Microsoft SharePoint 2010 on VMware Availability and Recovery Options

AUTOMATED DISASTER RECOVERY SOLUTION USING AZURE SITE RECOVERY FOR FILE SHARES HOSTED ON STORSIMPLE

Double-Take Replication in the VMware Environment: Building DR solutions using Double-Take and VMware Infrastructure and VMware Server

Getting Started with Endurance FTvirtual Server

Drobo How-To Guide. Topics Drobo and vcenter SRM Basics Configuring an SRM solution Testing and executing recovery plans

Eliminate SQL Server Downtime Even for maintenance

Unitt Zero Data Loss Service (ZDLS) The ultimate weapon against data loss

Experiences with Building Disaster Recovery for Enterprise-Class Clouds

Veritas InfoScale Availability

VMware Virtual SAN Design and Sizing Guide TECHNICAL MARKETING DOCUMENTATION V 1.0/MARCH 2014

Virtual Infrastructure Security

Connect Converge / Converged Infrastructure

Using Hitachi Protection Manager and Hitachi Storage Cluster software for Rapid Recovery and Disaster Recovery in Microsoft Environments

SAP HANA virtualized Technology Roadmap. Arne Arnold, SAP HANA Product Management September, 2014

COMLINK Cloud Technical Specification Guide VIRTUAL PRIVATE SERVERS

Federated Application Centric Infrastructure (ACI) Fabrics for Dual Data Center Deployments

ABSTRACT. February, 2014 EMC WHITE PAPER

TOP FIVE REASONS WHY CUSTOMERS USE EMC AND VMWARE TO VIRTUALIZE ORACLE ENVIRONMENTS

Ingres High Availability Option

Addendum 03. This is the Final Extension in response and due to the above received request:

SQL Server 2012/2014 AlwaysOn Availability Group

Virtualization & Covance Inc.

Deploying the BIG-IP System with VMware vcenter Site Recovery Manager

Veritas Storage Foundation High Availability for Windows by Symantec

IBM PROTECTIER: FROM BACKUP TO RECOVERY

ABSTRACT INTRODUCTION SOFTWARE DEPLOYMENT MODEL. Paper

Softverski definirani data centri - 2. dio

VMware Business Continuity & Disaster Recovery Solution VMware Inc. All rights reserved

How to use Cloud Solutions by Swisscom for Disaster Recovery. Whitepaper. Fabian Haldimann Stefan Lengacher Thomas Gfeller

Cookbook Disaster Recovery (DR)

Replication Overview

Optimize VMware and Hyper-V Protection with HP and Veeam

IBM Global Technology Services March Virtualization for disaster recovery: areas of focus and consideration.

SanDisk ION Accelerator High Availability

WINDOWS AZURE EXECUTION MODELS

Continuous Data Protection for any Point-in-Time Recovery: Product Options for Protecting Virtual Machines or Storage Array LUNs

The Promise of Virtualization for Availability, High Availability, and Disaster Recovery - Myth or Reality?

Oracle Hyperion Financial Management Virtualization Whitepaper

Deployment Topologies

Active-Active ImageNow Server

Server and Storage Virtualization with IP Storage. David Dale, NetApp

SteelFusion with AWS Hybrid Cloud Storage

Surviving the Worst: Disaster Recovery for OpenStack

Windows Server 2008 Hyper-V Backup and Replication on EMC CLARiiON Storage. Applied Technology

Mirror File System for Cloud Computing

Symantec Storage Foundation High Availability for Windows

Transcription:

Paper SAS1897-2015 Planning for the Worst SAS Grid Manager and Disaster Recovery ABSTRACT Glenn Horton and Doug Haigh, SAS Institute Inc. Many companies use geographically dispersed data centers running SAS Grid Manager to provide 24/7 SAS processing capability with the thought that if a disaster takes out one of the data centers, another data center can take over the SAS processing. To accomplish this, careful planning must take into consideration hardware, software, and communication infrastructure along with the SAS workload. This paper looks into some of the options available, focusing on using SAS Grid Manager to manage the disaster workload shift. INTRODUCTION We will consider several options for Disaster Recovery (DR) strategies. In making the best choice for your organization you must consider several factors such as: Cost, including hardware, software and administrative costs Complexity, including network and storage infrastructure Available technologies, including data replication and virtual machines (VMs) Recovery Time Objective (RTO) Recovery Point Objective (RPO) The following sections cover various strategies and compare the strategies in terms of the factors listed above. The strategies we will cover are: Active-active SAS Grid Manager instances, with and without VMs Active-passive SAS Grid Manager instances, with and without VMs A single SAS Grid Manager instance spread across two sites A common requirement in all of these strategies is data replication between sites. That topic is not covered in this presentation, because it is very specific to the selected storage technology, the vendor used, and the Recovery Point Objective. Also, the topic of licensing implications is not covered in great detail here. Licensing should be discussed with your SAS account team based on the strategies you choose. Another factor to consider is your Recovery Time Objective (RTO), or the amount of time it takes for the DR site to become fully functional as the various strategies may support different RTOs. ACTIVE-ACTIVE SAS GRID MANAGER INSTANCES An active-active configuration includes two SAS Grid Manager instances, which means two IBM Platform LSF (LSF) clusters, that are running in distinct data centers. For example, you may have your production SAS Grid Manager instance running in one data center and your development SAS Grid Manager instance running in another data center. In the event of a disaster at the production site the production SAS Grid Manager instance can be started at the DR site. The simplest SAS deployment strategy requires the DR site to have the capability of bringing up the operating system instances that were running in the failed site. Two strategies for bringing up the failed OS instances at the DR site are: VMs and the associated VM high availability (HA) technology network booting (in a configuration that does not use VMs) 1

These strategies are the most transparent, because whenever production fails over to the DR site, the OS instances are exactly what was running prior to the DR event. The advantages of this approach are: The SAS configuration can be replicated from the failed site to the DR site. No initial configuration is required at the DR site and no manual actions are required to keep the sites synchronized. The configuration can be replicated using a storage replication technology, because the machine names will not change during a DR event. Because machine names do not change, no client reconfiguration is required when a disaster takes place or when the DR system is tested. Obviously, this assumes the network can make the failover transparent to clients. Another approach would be to mount the replicated production file systems on the development OS instances. However, there are several complications that would need to be addressed for this to be an effective strategy. Some of those complications are: Both the production and the development machines need to be configured into SAS metadata for features such as SAS Workspace Server load balancing. This isrequired so that the configuration data in the production SAS Metadata Server can used on the development operating systems. Both the production and the development machines need to be configured into LSF. This is required so that SAS load balancing will work when using the production LSF on the development operating systems. Network accommodations need to be made (for example Virtual IP addresses put in place) to allow clients that typically connect to the production site to connect to the development site in a DR situation. This active-active strategy might make sense if, for example, it is acceptable to the business for the development environment to be down for a longer time than it is acceptable for the production environment to be down. ACTIVE-PASSIVE SAS GRID MANAGER INSTANCES The primary difference between the active-active approach and the active-passive approach is that there are no SAS license costs for the passive site. In the previous section, we made the assumption that it was acceptable for the development site to be unavailable in the case of a failure at the production site. If that is not the case, then a better choice might be to have more than one active site and a dedicated passive DR site. The DR site could be used as the DR site for either the production site or for the development site. This removes the limitation of having to operate for some period of time at reduced capacity. One obvious drawback to this approach is storage cost, because both data and SAS installation and configuration information need to be replicated from both the production site and the development site to the DR site. Another drawback is the requirement for three physical sites, because we need to plan for a disaster at either the production or development site. A SINGLE SAS GRID MANAGER INSTANCE SPREAD ACROSS TWO SITES You likely noticed that SAS Grid Manager isn t being used to help meet any of the DR requirements in the first two scenarios. That s going to change as we consider a single SAS Grid Manager instance that is spread across two physical sites. In this approach, although we have a single SAS Grid Manager instance across two physical sites (and therefore a single LSF instance across two sites), the two sites are used for different purposes. For example, one of the sites could be used for production workload and the other site used for development workload. In a configuration like this, the SAS data for a given site is typically local to that site for performance reasons. However, the production data must be replicated to the development site to support a case such as failover of the production system to the development site. Other considerations that are required for this HA configuration but are atypical for a single-site SAS Grid Manager environment include: 2

Although a single LSF instance is used, each site has its own identical copy of the LSF installation and configuration. This is required because if the site hosting the LSF file system goes down, that failure does not take down the other site due tothe LSF directory being unavailable. This consideration implies that when a LSF configuration change is made, that change must be replicated in the mirrored directory at the alternate site. IBM Platform Enterprise Grid Orchestrator (EGO) services for a given environment must be defined so that they fail over from their primary machine to other machines in their primary site, and to the DR site only if all of the primary site is down. This prevents services from failing over to the DR site in situations such as taking a machine down for maintenance in the primary site, or a single machine failure in the primary site. These services include components such as SAS Metadata Server, SAS Middle Tier and Platform Process Manager. An EGO service could be created so that when it fails over to the DR site, it mounts the replicated file systems that are required by the SAS environment that is failing over. Network components, such as Virtual IP Address switches, must be capable of handling failover from the primary to the DR site. This allows clients to find SAS components such as SAS Metadata Server and SAS Middle Tier without requiring the clients to be reconfigured. CONCLUSION You need to consider many criteria when you create a DR plan,including: What is your RTO? Will you use an active-active or an active-passive configuration? If active-active, will there be a single SAS Grid Manager instance or multiple SAS Grid Manager instances? If active-passive, will you use VMs, network boot or some manual approach to getting the system running at the DR site? Which systems need a DR plan? What is your RPO? What is your data replication strategy? What is your test plan and test frequency? What are your IT standards? Each of these decisions can have implications for cost, ease of implementation and ease of recovery in the case of a disaster. However, not making a decision can be more costly. Of businesses that experience a disaster and do not have a DR plan, 80% are out of business in one year or less. Can you afford to wait? REFERENCES Spectra. 2011. Three Key Requirements of a Sound Disaster Recovery Strategy. Availablehttps://www.spectralogic.com/index.cfm?fuseaction=home.displayFile&DocID=3664. Accessed January 12, 2015. HotLink. 2014. Easy VMware Disaster Recovery & Business Continuity in Amazon Web Services Available http://www.hotlink.com/resources/dr_exp_wp01.pdf. Accessed February 8, 2015. 3

CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Glenn Horton 100 SAS Campus Drive Cary NC 27513 SAS Institute Inc. Glenn.Horton@sas.com http://www.sas.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 4

5