Oracle EPM Disaster Recovery High Level Overview

Similar documents
Building your Server for High Availability and Disaster Recovery. Witt Mathot Danny Krouk

Protecting Microsoft SQL Server

Operational Continuity

DB2 9 for LUW Advanced Database Recovery CL492; 4 days, Instructor-led

Disaster Recovery Strategy for Microsoft Environment

PROTECTING MICROSOFT SQL SERVER TM

Table of Contents Chapter 1 - Getting Started with Oracle Data Relationship Management (DRM) 1

EMC Data Domain Boost for Oracle Recovery Manager (RMAN)

Neverfail for Windows Applications June 2010

Cloud Backup and Recovery

Designing, Optimizing and Maintaining a Database Administrative Solution for Microsoft SQL Server 2008

PROFESSIONAL SERVICES

Critical SQL Server Databases:

Automated Disaster Recovery With BMC Atrium Orchestrator

SQL Server for Database Administrators Course Syllabus

Core Solutions of Microsoft Lync Server 2013

INSIDE. Preventing Data Loss. > Disaster Recovery Types and Categories. > Disaster Recovery Site Types. > Disaster Recovery Procedure Lists

Perforce Backup Strategy & Disaster Recovery at National Instruments

MS Design, Optimize and Maintain Database for Microsoft SQL Server 2008

courtesy of F5 NETWORKS New Technologies For Disaster Recovery/Business Continuity overview f5 networks P

Consulting Solutions Disaster Recovery. Yucem Cagdar

Virtual Disaster Recovery

Administering a Microsoft SQL Server 2000 Database

Introduction to Enterprise Data Recovery. Rick Weaver Product Manager Recovery & Storage Management BMC Software

Windows Geo-Clustering: SQL Server

Data Protection. for Virtual Data Centers. Jason Buffington. Wiley Publishing, Inc. WILEY

SQL Server High Availability: After Virtualization SQL PASS Virtualization Virtual Chapter September 11, 2013

Buyer s Guide Checklist - What to Look For in Online Backup and Recovery Services

EMC Data Domain Boost for Oracle Recovery Manager (RMAN)

Disaster Recovery for Oracle Database

Database as a Service (DaaS) Version 1.02

Course 2788A: Designing High Availability Database Solutions Using Microsoft SQL Server 2005

Exchange Data Protection: To the DAG and Beyond. Whitepaper by Brien Posey

Missed Recovery Techniques for SQL Server By Rudy Panigas

Taking EPM to new levels with Oracle Hyperion Data Relationship Management WHITEPAPER

The Shift Cloud Computing Brings to Disaster Recovery

VMware vsphere 5.1 Advanced Administration

Running Successful Disaster Recovery Tests

Core Solutions of Microsoft Lync Server 2013

High Availability and Disaster Recovery for Exchange Servers Through a Mailbox Replication Approach

M6430a Planning and Administering Windows Server 2008 Servers

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES

Planning and Administering Windows Server 2008 Servers

Administering a Microsoft SQL Server 2000 Database

Cisco and EMC Solutions for Application Acceleration and Branch Office Infrastructure Consolidation

Maintaining a Microsoft SQL Server 2008 Database

Core Solutions of Microsoft Lync Server 2013

Contingency Planning and Disaster Recovery

Developing a Complete RTO/RPO Strategy for Your Virtualized Environment

Designing Database Solutions for Microsoft SQL Server 2012 MOC 20465

Lesson Plans Microsoft s Managing and Maintaining a Microsoft Windows Server 2003 Environment

TABLE OF CONTENTS THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY FOR SHAREPOINT DATA. Introduction. Examining Third-Party Replication Models

White Paper. Mimosa NearPoint for Microsoft Exchange Server. Next Generation Archiving for Exchange Server By Bob Spurzem and Martin Tuip

Deploying Exchange Server 2007 SP1 on Windows Server 2008

Website Disaster Recovery

Administering a Microsoft SQL Server 2000 Database

SQL Server Training Course Content

5054A: Designing a High Availability Messaging Solution Using Microsoft Exchange Server 2007

DISASTER RECOVERY BUSINESS CONTINUITY DISASTER AVOIDANCE STRATEGIES

Designing and Deploying Messaging Solutions with Microsoft Exchange Server 2010 Service Pack B; 5 days, Instructor-led

Technical Considerations in a Windows Server Environment

High Availability & Disaster Recovery Development Project. Concepts, Design and Implementation

How To Choose A Business Continuity Solution

MySQL Enterprise Backup

MaximumOnTM. Bringing High Availability to a New Level. Introducing the Comm100 Live Chat Patent Pending MaximumOn TM Technology

VMware vsphere 5.0 Boot Camp

EMC Disk Library with EMC Data Domain Deployment Scenario

Blackboard Managed Hosting SM Disaster Recovery Planning Document

High Availability and Disaster Recovery Solutions for Perforce

SQL Server AlwaysOn Deep Dive for SharePoint Administrators

Data Center Optimization. Disaster Recovery

Unitrends Backup & Recovery Solutions and Disaster Recovery Best Practices

Real-time Protection for Hyper-V

End-to-End Availability for Microsoft SQL Server

Selecting the Right Change Management Solution Key Factors to Consider When Evaluating Change Management Tools for Your Databases and Teams

SQL-BackTrack the Smart DBA s Power Tool for Backup and Recovery

10533: Deploying, Configuring, and Administering Microsoft Lync Server 2010 Duration: Five Days

Explain how to prepare the hardware and other resources necessary to install SQL Server. Install SQL Server. Manage and configure SQL Server.

Managing and Maintaining Windows Server 2008 Servers

High Availability for Microsoft SQL Server 7.0 Using Double-Take

With 57% of small to medium-sized businesses (SMBs) having no formal disaster

EMC SOLUTIONS TO OPTIMIZE EMR INFRASTRUCTURE FOR CERNER

Five Fundamentals for Modern Data Center Availability

MOC 5047B: Intro to Installing & Managing Microsoft Exchange Server 2007 SP1

Disaster Recovery with EonStor DS Series &VMware Site Recovery Manager

Transcription:

By: Damon Hannah, Managing Consultant Oracle EPM Disaster Recovery High Level Overview Abstract: Few Enterprise Performance Management (EPM) topics are more discussed and less understood than Disaster Recovery (DR). What does Disaster Recovery really mean? What are the Disaster Recovery options for Oracle EPM? And more importantly what are the nitty gritty details you absolutely have to get right? We will discuss two Disaster Recovery options for the most common EPM components. We will also go into detail on what a Disaster Recovery plan must include to be successful.

What is Disaster Recovery? What does Disaster Recovery really mean? Disaster Recovery is a nebulous term that can be used to describe or define any number of scenarios or situations. DR is defined by Dictionary.com as Planning and implementation of procedures and facilities for use when essential systems are not available for a period of time long enough to have a significant impact on the business That definition is moderately helpful, except that it still leaves a great deal open to interpretation. What constitutes a significant impact on the business and more importantly who decides for your organization? Oracle s documentation outlines the following purpose for a Disaster Recovery; Addresses service continuity so that in case of a disaster, service is maintained through a standby site. This begs the question; what constitutes a disaster? In the end each organization or business must decide for itself what constitutes a disaster, and who within the organization decides when one has occurred. Your organization may have a more relaxed view of Disaster Recovery and may intend to initiate a failover plan in the event of performance issues, server failures, network issues, or other unplanned outages. While Disaster Recovery plans can be customized to meet these needs, they can also be mitigated using Fault Tolerant (FT) and Highly Available (HA) solutions. For the purposes of this document, a disaster is defined as anything that results in a site-wide failure of the primary Data Center. Therefore, Disaster Recovery is the processes and procedures necessary to restore essential systems (Oracle EPM for our purposes) in the event of a site-wide failure of the primary Data Center. Disaster Recovery Options There are an unlimited number of options for setting up a Disaster Recovery plan for an EPM 11.1.2.3 environment. They span the field from a simple export/import of your primary EPM application to another environment, to a full replication of Production to a dedicated Disaster Recovery environment. Many of the EPM components have similar recovery requirements or options. However, there are some EPM components with unique recovery requirements. Understanding these is paramount to a successful DR plan. Putting Together a Disaster Recovery Solution The critical first step is Requirements Gathering, which identifies the specific requirements for Disaster Recovery. Without requirements, it is impossible to properly design a DR solution. Once requirements are agreed upon, the method used for recovery is identified; replicating the Production Servers in a DR environment, repurposing another environment as the DR environment, or another custom solution can be determined. A detailed plan must then be created to meet every requirement identified; backups, schedule and method of backups, retention plans for backups, storage and replication of backups to the DR site, and owner of each backup, etc. When the backup plan is complete, it must be implemented, independently verified, and documented. When the backups are in place and documented, a Recovery Plan should be thought out, documented, and any prerequisite pieces put in place (such as creating Essbase applications in the Target environment). A step-by-step recovery guide should then be created. The guide will include a High Level 2

Checklist - complete with task ownership, overseeing identification, and verification steps. A detailed process for each High Level Task should be included with step-by-step directions for completing each task. These steps should be executable by anyone familiar with Oracle EPM Systems but with zero environment specific knowledge. Finally, verification/validation testing should be documented, including testing owners and pass/fail qualifications. Once the proper documentation, backups, and processes are in place, a failover test should be executed. The initial test should focus on documentation accuracy and a successful failover from an Infrastructure Perspective. Application availability and 100% functionality should be verified; however data is not a critical part of this initial test. Following a successful infrastructure test, a failover test including data validation and end-to-end testing should be conducted. During all tests, the DR Failover documentation should be followed step-by-ste Any discrepancies should be noted and documentation updated. This will ensure subsequent tests are successful regardless of the parties involved. Dedicated Disaster Recovery Environment Now that we have looked at the different EPM components and their recovery requirements, let us look at a couple of ways to put them all together. First up is a dedicated DR environment solution. In this example, the DR Environment was installed and configured following the same steps that were used to build the Production EPM environment. The server and RDBMS entries were configured using the Production instance names; local host aliases were used to ensure all entries were resolved to the DR components. We will then look at a custom solution that repurposes a Quality Assurance (QA) environment as a DR solution. This solution makes less use of replication and tends to be more complicated. It does however have the benefit of being less expensive, usually! This first example is an EPM 11.1.2.3 implementation. The DR environment was built by repeating the steps used to install and configure the Production environment. Prior to configuring the DR environment, local host aliases were created on each of the servers, mapping the Production server names and DR server names to the DR server IP addresses. This ensured the EPM configurations could use the Production server names and eliminate a mismatch within the replicated Oracle schemas. The Production applications must be migrated to the DR environment to seed the environment. Once the DR environment is seeded, replication and other processes are put in place to ensure the RPO and RTO requirements can be met. For example, command scripts are used to take nightly LCM exports of Shared Services, Planning applications, and Native Essbase applications. Shell scripts are used to take Level 0 data exports of all Essbase applications. All exports are stored on either NAS or SAN, which is replicated to the DR environment. Repurposed Disaster Recovery Environment This example is of an EPM 11.1.2.3 implementation. The plan is based on repurposing the Quality Assurance (QA) environment as the Production environment in the event of a DR. The existing QA applications, security, and data are not critical. While efforts will be made to back up the QA objects as part of the failover, their survival is not paramount. The Recovery Point Objective (RPO) and Recovery Time Objective are (RPO: HFM = 0, Planning/Essbase < 24 hours)(rto < 8 hours). The Recovery Time 3

Objective is impacted by a number of items beyond the control of the Oracle EPM Team. Some of these include network services, database services, Domain Name Services, and LDAP/MSAD authentication services. The RTO assumes all of these required services are available and does not take into account the time required to restore those services. Limiting its dependencies on these other teams, especially DBAs can help make the DR plan or strategy more efficient when the time comes. The QA environment was architected to mirror the Production environment and it consists of the same number, size, and configuration of servers. The EPM installation and configuration also mirrors that of Production. However, the instance names, dbs, server names, etc. are all unique to the QA environment. The DR Solution is based on migrating the required objects from Production to QA (DR) in the event of a disaster. The key being that the export portion of the migration must be done prior to the Production environment being lost due to a disaster. Each EPM component is taken individually to ensure its export migration requirements are met. Those steps are then scripted through batch and shell scripts, and automated through third party tools. The import portion of the migrations is primarily a manual effort, although automating some of the pieces should be possible. The Life Cycle Management utility is the primary tool used for creating the required backups or migration exports needed for DR. LCM export migrations are created for Planning, HFM, as well as Essbase applications and EPM System Security from the Production environment. These are stored on a NAS that is replicated to the QA environment. A weekly process is run to take cold backups of the entire Essbase volume, including all Essbase application objects, configuration files, and the Essbase.sec (security) file. These Essbase backups and exports are stored on SAN which is replicated to the QA environment. In the event of a DR failover, the exported content is imported using the same method or tool used for the export. When all objects have been imported, base level functionality testing is completed. This includes a Health Check checklist that ensures basic functionality is working. HFM and Planning/Essbase financial reports are executed to validate security, data source connectivity, HFM and Planning/Essbase functionality, and Foundation and RA Services are working as expected. Following the base level testing, the application owners begin data validation testing. This includes a deeper dive into the full EPM functionality and data validation. This level of testing closely resembles Systems Integration Testing (SIT) to ensure data flows through the entire system properly. Critical Details There are many steps in designing and implementing an Oracle EPM Disaster Recovery solution; from choosing dedicated vs. shared environment, to using Life Cycle Management or RDBMS schema exports. None of these is more important and critical to the success of the plan than requirements gathering. Many times a disaster recovery solution is developed without ever having consulted the business or application owners to identify or understand the actual requirements. Are all Production applications required in a DR scenario? What level of resiliency is necessary in a DR environment? Will the upstream and downstream systems change? Will integrations change in DR? Will a full complement of users access the system in a DR scenario? What are the actual Recovery Point Objective (RPO) and Recovery Time Objective (RTO)? And what costs are you willing to accept to meet those requirements? Many of these questions are never asked or not fully answered or understood; by one or both sides. For example, an application owner that insists on an RPO of zero (zero data loss) for an Essbase application, likely doesn t understand the implications in terms of downtime to meet this requirement. 4

The requirements must be gathered and challenged to ensure 1) that they are truly requirements and 2) that the costs for meeting the requirements are understood. Another critical piece is the backup strategy employed to meet the Recovery Point Objective. Many EPM components can be properly and easily backed up using the Life Cycle Management s command line utility. LCM migrations can be scripted using the LCM GUI and automated using 3 rd party scheduling tools. Most LCM migrations can be run during normal business processes with little or no impact to the end user. They can be scheduled to run as often as necessary to meet defined RPOs. The use of replication goes hand in hand with a proper backup methodology. Configuring the EPM components to use NAS or SAN simplifies replicating the data to the Disaster Recovery environment. Lastly, once the DR plan is documented and implemented, it must be tested. Sometimes, the plan must be tested repeatedly to ensure the process, as documented is complete, accurate, and meets the DR requirements. The test should be executed by the individuals tasked with executing the plan in a true Disaster Recovery scenario. The plan should be followed step-by-step to ensure there are no actions or information assumed. This is to ensure the team can complete the test in the time allowed. Updates to the DR plan are typically required during the first few tests. In the end, the plan should be able to be fully executed by individuals outside of the design team, without assistance. 5