Highest Application Availability with Oracle Optimized Solution for Disaster Recovery
Oracle White Paper, June 2015

Introduction

Disaster recovery has come to the forefront of IT concerns as recent disasters continue to affect business operations well after the dust has settled. Disasters, whether natural or man-made, small or large, cause an average of 2.2 days of downtime costing $366,363 each year for the majority of businesses.[1] As companies increasingly rely on IT services for mission-critical operations, services, and even products, any downtime, however short, is costly and can negatively impact revenue streams, customer satisfaction, and the overall survival of the business. Statistics show that ninety-three percent of businesses that lose access to their data for ten or more days file for bankruptcy within a year, and half file immediately.[2] However, nearly forty percent of organizations do not have disaster recovery plans in place, and many of those that do are still underprepared.[1]

The events most people think of when considering disaster planning are large-scale natural disasters that affect widespread areas, such as earthquakes, hurricanes, and floods. However, these types of events do not cause most downtime incidents. In fact, only ten percent of business downtime incidents can be attributed to natural disasters. The remaining ninety percent are caused by smaller, man-made events. Equipment malfunction, random power outages, theft, computer viruses, and human error are far more likely causes of costly downtime, and are typically much less planned for.

Building a secure, geographically separate disaster recovery site can mitigate the effects of all types of disasters on business operations to keep companies up and running even when catastrophe strikes. Data, systems, and services can be replicated and operated offsite to prevent loss due to unforeseen events that render the primary data center unusable. The result is less downtime, improved business continuity, and increased competitiveness.

Featuring fully integrated hardware and software components, built-in replication and backup functionality, and simplified and automated management tools, Oracle Optimized Solution for Disaster Recovery provides an ideal solution for business continuity and disaster recovery for Oracle Engineered Systems such as Oracle SuperCluster.

[1] Acronis, The Acronis Global Disaster Recovery Index: 2012.
[2] National Archives and Records Administration.

Protecting Business with Remote Disaster Recovery Sites

In today's always-on world, downtime can make or break a business. Most businesses will experience at least one downtime incident throughout the year, and the frequency and length of downtime events can impact customer satisfaction, stakeholder and partner trust, company reputation, regulatory compliance, and ultimately the longevity of the company. With forty percent of businesses closing after a disaster, and an additional quarter failing within a year of the event, disaster recovery is more important than ever to business survival.[3] Remote disaster recovery sites provide the most complete protection when catastrophic events such as fire, flood, or even sabotage incapacitate a company's primary data center.

Business Continuity and Survival

Business continuity hinges on reducing the number and length of downtime incidents and minimizing their effects on business operations. Each hour of downtime can cost tens to hundreds of thousands of dollars and ties up valuable resources. A remote disaster recovery site allows business processes to fail over to a set of systems unaffected by the incident, whether it is a widespread natural disaster, interruption in telecommunications or power, or a malfunctioning sprinkler system. By transferring operations to a secondary site, productivity is maintained regardless of the situation at the primary site. Outages due to planned maintenance can also be avoided by transferring services to the remote disaster recovery site while work is performed on the main production site. Service outages can greatly affect customer satisfaction, and customers today expect a higher level of availability than ever before. Excessive downtime can cause customers to seek services from competitors, resulting in lost revenue and reduced competitiveness.
Migrating to a remote disaster recovery site during a downtime event not only prevents operating losses, but also protects revenue streams, allowing the business to survive and thrive.

Compliance, Regulatory, and Contractual Requirements

Implementing a remote disaster recovery site can ensure compliance with contractual and regulatory requirements. Nearly all businesses have some operational or data protection obligations, and noncompliance costs can be high. Contractual violations can lead to the loss of partners, customers, and revenue, while transgressions against increasing government regulations can lead to legal action and hefty fines. Minimizing downtime through the use of a disaster recovery site makes certain that agreements and regulations are met.

Brand, Image, and Reputation Protection

Downtime or security attacks, especially at critical purchasing times, can decrease customer, partner, and shareholder satisfaction, and negatively impact a business's reputation and brand. In the digital age, media coverage, online reviews, and word-of-mouth opinions via social media spread good news fast and bad news faster, especially when it comes to business reputation. One dissatisfied client can now share their poor experience with thousands of potential customers instantly with the simple click of a button. Having a disaster recovery plan and secondary site in place builds trust with stakeholders and provides a sense of security for employees, partners, and shareholders.

Disaster Mitigation

Even with a disaster recovery plan in place, lack of automated failover to a secondary site can increase the risk of errors and mistakes during an incident. A synchronized disaster recovery site with automated controls reduces the amount of decision-making required by responding personnel, decreasing the chance of mistakes.
By automating the backup and failover process, less responsibility is left in human hands, lowering the risk of human error under duress and minimizing the effect a disaster can have on business operations. For example, if backup drives are simply shipped to offsite storage at the end of each day, there's a risk that the employee responsible for packaging and delivering the drives may be out sick, on vacation or business travel, or may simply forget. In this case, a day's worth of data, or more, could be lost in a disaster event. Automating the backup process with a secondary site ensures that the requisite data, applications, and systems are replicated regularly and without fail, and minimizes the potential business effect of a disaster event.

[3] Federal Emergency Management Agency

Intimate Knowledge of Business Operations

Designing and implementing an effective disaster recovery plan and backup site requires an in-depth review and documentation of everyday business operations to identify risks and ensure all data, applications, systems, and processes are accounted for. Often, these reviews reveal functional inefficiencies in the computing environment. With more complete knowledge of operations, steps can be taken to streamline processes, improving efficiency and return on investment even before disaster hits.

Leveraging Disaster Recovery Sites for Increased ROI

Many are initially concerned about the apparent cost of replicating production systems at a secondary site. As stated, these costs can become instantly justifiable when and if a disaster recovery plan needs to be put into action. Additionally, remote disaster recovery sites do not have to simply sit idle until called upon for service recovery. To increase the return on investment of a disaster recovery site and improve productivity, the infrastructure can be leveraged for extraneous activities, such as development and test work along with analysis and reporting tasks.

Often, developers create and test applications on their own laptops or PCs using small sets of aged data.
While this approach has the advantage of ensuring that test code doesn't cause interruptions in the production environment, it also prevents developers from testing the full scalability of their application on complete sets of current data. Developers also must spend a great deal of time repetitively installing and configuring platforms and software for testing. Because disaster recovery sites are typically equipped with compute and storage capacities comparable to the production environment, they are ideal for development and testing. Offline and backup versions of production data can be used for testing new applications over a much larger environment, and pre-configured virtual machines can be easily and quickly deployed for analysis over a variety of platforms and configurations. The results are more complete validation of new applications in less time and better utilization of personnel and data center resources.

With automated backups of databases and applications, the disaster recovery site can keep a standby version of the production environment running for reporting and other read-only workloads such as ad-hoc queries and data extracts. Performing these nonessential tasks at the remote disaster recovery site allows the primary site to focus on transactional functions and improves the efficiency and productivity of the overall environment.

Utilizing the disaster recovery site for extraneous tasks also ensures that the site is fully operational at all times, and is ready to take over in case of an emergency. The best-equipped secondary site will not help during an incident if it is not functional, and unfortunately, only half of companies with secondary sites test them regularly. Offloading peripheral tasks to the secondary site improves productivity of both the production and backup sites, increases the return on investment of the secondary site, and ensures that the backup site is ready when disaster hits.

Oracle Optimized Solution for Disaster Recovery

Although Oracle Engineered Systems are designed for high availability, unforeseen natural and man-made disasters can disable an entire data center and affect operations. Oracle Optimized Solution for Disaster Recovery starts with the Oracle SuperCluster platform running Oracle Database with Oracle Data Guard. Oracle ZFS Storage Appliance and Oracle Exadata Storage Servers provide storage for the solution, while Oracle Solaris Cluster Geographic Edition and Oracle Enterprise Manager Ops Center offer advanced failover and management capabilities, respectively. Deploying the solution at both a local primary site and at a remote secondary site protects enterprise
implementations from service and data loss if the primary site becomes unavailable for any reason, allowing business operations to continue and preserving productivity and revenue streams.

Integrated Protection

In contrast with piecemeal multi-vendor database implementations, Oracle Optimized Solutions feature fully integrated hardware and software for exceptional performance, unprecedented efficiency and interoperability, and simplified deployment and operation. A disaster recovery system must take into account all operational aspects of the business and primary data center to be effective. For homegrown solutions, this process can be extremely complex, increasing the risk of malfunctions, errors, and omission of important aspects. Through integration, Oracle Optimized Solutions greatly simplify design and operation of a remote disaster recovery site, ensuring protection for all systems and data in an emergency.

» Built-in end-to-end security. Disaster recovery solutions require more security than typical data center environments because they often use rented or public networks to transfer data from site to site. Oracle Optimized Solutions provide functional security guidelines and best practices that protect your disaster recovery solution from end to end using built-in security technologies in every layer of the solution.
» Fully tested interoperability. Because Oracle Optimized Solutions are designed from the ground up for hardware and software integration, the risk of downtime due to interoperability issues is reduced to a negligible level. The extreme integration of Oracle Optimized Solutions also improves operational efficiency and return on investment. While third-party components such as software and storage are supported, they add an extra level of complexity to the overall solution, detracting from the efficiencies created through an integrated hardware and software stack.
» Increased automation. With integration comes increased opportunity for automation. Automation not only reduces the amount of time needed to fail over and restart services at a secondary site, it also decreases the risk of human errors in failover and recovery operations.
» Simplified management. From bare metal to applications, every architectural layer of Oracle Optimized Solutions can be managed through a single, centralized management interface, Oracle Enterprise Manager Ops Center, greatly simplifying management and increasing management efficiency.
» Streamlined support. Homegrown disaster recovery solutions can be very complex, resulting in many calls for support throughout the design, establishment, and operation of the environment. If something should go wrong, expert support for the entire Oracle Optimized Solution hardware and software is a single phone call away.
» Fast, easy deployment. With the entire hardware and software stack pre-integrated, Oracle Optimized Solutions can be deployed much more quickly and easily than custom multi-vendor solutions.

Full Environment Replication

Using Oracle Optimized Solution for Disaster Recovery ensures that all data, applications, and systems are replicated and ready for operation if the primary site becomes unavailable. Through full system integration and duplication of all system components at the remote disaster recovery site, the solution provides faster and more complete operations recovery than simple SAN mirroring, in which only data is replicated and failover systems must be brought up and configured manually during a disaster event. Because manual installation, configuration, and deployment of physical and virtual systems is typically very time consuming, SAN mirroring for disaster recovery can result in increased downtime. The only way to ensure uninterrupted business operations during a disaster is to have an independent copy of the operating environment up and running at a remote secondary site.
Figure 1 shows the basic configuration for Oracle Optimized Solution for Disaster Recovery in a local primary and remote secondary site for disaster recovery protection. Business operations are run on the Oracle SuperCluster platform using Oracle ZFS Storage Appliance for storage and Oracle's StorageTek SL3000 tape library for onsite backup and archiving. Oracle Data Guard and ZFS replication reproduce applications and data from the primary site at the secondary site continually. Tape backups can be run at both the local and remote sites for increased redundancy and protection of valuable data. If the primary site fails, operations can be quickly transferred to the
secondary site for uninterrupted service. The following sections describe the details of Oracle Optimized Solution for Disaster Recovery.

[Figure 1. Oracle Optimized Solution for Disaster Recovery. Diagram labels: Local Site and Remote Site, each with Oracle SuperCluster, Oracle ZFS Storage Appliance, and StorageTek SL3000 Tape Library; Oracle Active Data Guard Replication (Database Content); ZFS Replication (Applications & Unstructured Data); Oracle Solaris Cluster Geographic Edition; Oracle Enterprise Manager Management.]

Application and Data Failover

Within Oracle SuperCluster, applications and unstructured (non-database) data reside in shared file systems on the integral Oracle ZFS Storage Appliance. Disaster recovery for these components utilizes the remote replication features of the Oracle ZFS Storage Appliance. In maintaining a complete replica of both applications and data, recovery time is drastically reduced compared to traditional offline backup architectures such as SAN mirroring.

For increased disaster recovery site return on investment, the snapshot and clone functionality included in the Oracle ZFS Storage Appliance can be used to create database instances for test, development, and reporting operations. In addition to utilizing the secondary site for more than just disaster recovery, offloading read-only operations allows the primary production site to focus exclusively on transaction processing for performance and service gains.

Database Failover

Oracle Data Guard provides management, monitoring, and automation tools to create and maintain synchronized standby copies of production databases for protection against failures, corruption, and disasters at the primary site. Included with Oracle Database, Oracle Data Guard uses log shipping technology to replicate database content at the secondary recovery site.
Rather than sending complete sets of data to the secondary site, Oracle Data Guard minimizes bandwidth usage and requirements by instructing the secondary standby databases to update data changed in the primary database, resulting in faster and more reliable synchronization. Before replicating data at the remote secondary site, Oracle Data Guard automatically detects and corrects corrupted data, ensuring logical and physical consistency between the production and standby databases.

Oracle Data Guard improves return on investment of disaster recovery sites. Because the standby databases are fully operational, they can also be used to minimize downtime during planned maintenance on the primary production environment. Moreover, adding Oracle Active Data Guard allows read-only access to standby databases to offload queries, reporting, and backups from the primary database, improving overall performance of the environment.

For heterogeneous database instances on Oracle SuperCluster, Oracle GoldenGate is recommended for database disaster recovery. Oracle GoldenGate provides database protection through log-based replication with low network overhead for heterogeneous data environments.
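
The log shipping approach described under Database Failover can be illustrated with a small, self-contained Python sketch. This is a toy model only: it is not Oracle Data Guard's protocol, API, or file format, and every name in it (LogRecord, Primary, Standby, records_after, apply) is hypothetical. It assumes the primary records each change as a numbered, checksummed log record, and that the standby asks only for the records it has not yet applied and verifies each one before applying it.

```python
"""Toy model of log-shipping replication.

Illustrative assumption only: this does not reproduce Oracle Data Guard's
actual redo transport or apply services. All names are hypothetical.
"""
import hashlib
import json
from dataclasses import dataclass, field


def _digest(seq: int, key: str, value: str) -> str:
    """Checksum over the record contents, used to detect corruption in transit."""
    payload = json.dumps([seq, key, value]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class LogRecord:
    seq: int        # 1-based, contiguous sequence number
    key: str
    value: str
    checksum: str

    def is_valid(self) -> bool:
        return self.checksum == _digest(self.seq, self.key, self.value)


@dataclass
class Primary:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key: str, value: str) -> None:
        """Apply a change locally and append it to the change log."""
        seq = len(self.log) + 1
        self.data[key] = value
        self.log.append(LogRecord(seq, key, value, _digest(seq, key, value)))

    def records_after(self, seq: int) -> list:
        """Return only the changes newer than seq, not the full data set."""
        return self.log[seq:]


@dataclass
class Standby:
    data: dict = field(default_factory=dict)
    applied_seq: int = 0

    def apply(self, records: list) -> int:
        """Apply records in order; stop at the first gap or corrupted record."""
        applied = 0
        for rec in records:
            if rec.seq != self.applied_seq + 1 or not rec.is_valid():
                break  # leave the standby consistent; the record can be re-requested
            self.data[rec.key] = rec.value
            self.applied_seq = rec.seq
            applied += 1
        return applied


if __name__ == "__main__":
    primary, standby = Primary(), Standby()
    primary.write("order:1", "new")
    primary.write("order:1", "paid")
    shipped = primary.records_after(standby.applied_seq)
    print("applied", standby.apply(shipped), "records")
    print("in sync:", standby.data == primary.data)
```

The point of the sketch is the bandwidth argument made in the text: after the initial synchronization, each round transfers only the records written since the standby's last applied sequence number, and a record that fails its checksum is rejected rather than applied.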

Clustered Failover

Oracle SuperCluster features clustered compute and storage operations for increased resiliency and reliability. Oracle Solaris Cluster Geographic Edition extends this concept to multiple clusters separated by long distances, allowing the overall clustered environment to tolerate a disaster that disables the primary site. Through Oracle Solaris Cluster Geographic Edition, geographically separated clusters, such as a local primary and remote secondary Oracle SuperCluster configuration, can be configured and managed, and services can be migrated between clusters. Application failover procedures are automated to allow standby servers to efficiently and smoothly take over services from failing nodes with minimal interruption to services and business operations. Oracle Solaris Cluster Geographic Edition integrates with Oracle ZFS Storage Appliance and Oracle Data Guard to provide complete failover protection, data replication, and heartbeat monitoring between sites from the application layer to the storage layer.

Conclusion

Business operations increasingly depend on the quality and availability of IT services, and tolerance for outages and downtime is nearly nonexistent, even in times of catastrophe. The cost of outages is high, and nearly all businesses experience some unplanned downtime throughout the year. Disaster recovery planning is critical to business continuity and survival, company reputation, and contractual and regulatory compliance. Remote disaster recovery sites provide the most complete protection from data center failures due to natural and man-made incidents and can even boost primary site productivity and performance. Featuring superlative hardware and software integration, Oracle Optimized Solution for Disaster Recovery based on Oracle SuperCluster is ideal for simplified, automated, and comprehensive disaster recovery with minimal interruption to operations.
Contact your Oracle representative to find out more about Oracle SuperCluster and Oracle Optimized Solution for Disaster Recovery, and protect your business from unforeseen downtime.

Oracle Corporation, World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065, USA

Worldwide Inquiries
Phone: +1.650.506.7000
Fax: +1.650.506.7200

Connect with us
blogs.oracle.com/oracle
facebook.com/oracle
twitter.com/oracle
oracle.com

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

Highest Application Availability with Oracle Optimized Solution for Disaster Recovery
June 2015, Version 1.1
Author: Dean Halbeisen