Oracle Data Guard Fast Start Failover understood!



Dr. Martin Wunderli, Principal Consultant, Partner
Trivadis, http://www.trivadis.com
Basel, Baden, Bern, Lausanne, Zurich, Düsseldorf, Frankfurt/M., Freiburg i. Br., Hamburg, Munich, Stuttgart, Wien

Trivadis Facts & Figures
- 12 locations
  - D: Dusseldorf, Frankfurt, Freiburg, Hamburg, Munich, Stuttgart
  - A: Vienna
  - CH: Baden, Basle, Bern, Lausanne, Zurich
- Consolidated income: CHF 85 million / EUR 53 million
- Over 470 employees
- Over 450 clients
- Over 1,400 projects per year
- Over 110 Service Level Agreements
- About 4,000 training participants per year
Fast Start Failover understood, 2006

FSFO understood!
- Data Guard Concepts & History
- The startup issue
- Fast Start Failover
- Conclusion
Data is always part of the game.

Oracle Standby Databases and Data Guard: Overview
[Diagram labels: Primary Site, Standby Site, Primary Database, Online Log Files, Local Archiving, Standby Log Files, Standby Database, Archived Log Files]

Standby Databases: A short history
- Oracle 7.3: Creating and mounting a standby database
- Oracle 8i: Automated archived redo log transport and application, TAF, open read-only of standby database
- Oracle 9i: Data Guard and Data Guard Broker with switchover, closing of log gaps, delayed redo application, GUI and no-data-loss setups (sync transport)
- Oracle 10g: Simplified syntax, RAC support, partial failover cluster support, reuse of old primary as new standby database, automatic standby activation

Why Data Guard (and not e.g. a Failover Cluster)?
- In a disaster protection setup (data must be mirrored between at least two locations), bandwidth usage is smaller: even high-transaction systems typically need only approx. 70 Mbit/s of bandwidth
- No extra software layer or license is needed (if Oracle Enterprise Edition is already licensed on the primary and standby servers)
- No file system or instance recovery of the database is needed after a crash of the primary server (the standby is up to date in a no-data-loss setup with 10gR2)
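
To put the 70 Mbit/s figure into redo terms, the conversion can be sketched in Python. This is only arithmetic on the number quoted on the slide; the function names are made up for illustration, protocol overhead is ignored, and 1 GB is taken as 1000 MB.

```python
def mbit_per_s_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert a link rate in Mbit/s to MB/s (8 bits per byte, no overhead)."""
    return mbit_per_s / 8.0


def redo_gb_per_hour(mbit_per_s: float) -> float:
    """GB carried in one hour by a fully used link of this rate (1 GB = 1000 MB)."""
    return mbit_per_s_to_mbyte_per_s(mbit_per_s) * 3600 / 1000.0


if __name__ == "__main__":
    rate = 70  # Mbit/s, the figure quoted on the slide
    print(f"{rate} Mbit/s is {mbit_per_s_to_mbyte_per_s(rate):.2f} MB/s")
    print(f"{rate} Mbit/s is {redo_gb_per_hour(rate):.1f} GB per hour")
```

Under these assumptions, a link sized for the slide's figure carries 8.75 MB/s, or about 31.5 GB of redo per hour.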

Why a Failover Cluster (and not Data Guard)?
- File-system-based mirroring is needed because of non-database files
- IP address failover is needed, e.g. for an application server
- DBA knowledge is not available
- If instance recovery time and bandwidth between locations are both crucial, a combination of Failover Cluster and Data Guard between the same machines may be necessary

Failover Cluster vs. Data Guard
Remember: Fast Start Failover for Data Guard is not a failover cluster with
- Two connections between nodes (network and disk), where the loss of one connection results in node shutdown
- A single location of data files (from the point of view of the Oracle RDBMS)
These two points have positive and negative impacts:
- Nodes stay up and in their role longer after a partial loss of the inter-node connection
- Automatic failover may not be possible after a partial loss of the inter-node connection


Physical Standby: Startup Behavior 10g versus 9i
    startup nomount
    alter database mount;
    alter database open;                # Primary
    or
    recover managed standby database;   # Standby
[Diagram: startup sequence; remaining label: Data Guard Broker]

Physical Standby Startup Issue
- What is the biggest problem for data consistency in a cluster? Split brain!
- What is the biggest problem for data consistency in a Data Guard environment? More than one primary!
- How can this happen? Primary startup (hardware fixed etc.) after standby activation
[Diagram: two databases, both labelled PRIM DB]

Primary Startup after Failover: Network connected
OK: STARTUP MOUNT of former primary database
- Data Guard Broker takes over and handles the startup process
- The Broker knows about the failover and the resulting change of the primary database
- The former primary database is not started
    DGMGRL> show configuration;
    Error: ORA-16795: database resource guard detects that database re-creation is required
    Configuration details cannot be determined by DGMGRL
BAD: STARTUP of former primary database
- Results in two primary databases, since sqlplus does not know of the Data Guard Broker configuration

Primary Startup after Failover: Network interrupted
BAD: STARTUP of former primary database
- Results in two primary databases, since sqlplus does not know of the Data Guard Broker configuration
BAD: STARTUP MOUNT of former primary database
- Data Guard Broker tries to verify the Data Guard configuration
- After 5 unsuccessful requests, Data Guard Broker opens the former primary database
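
The four cases from the two "Primary Startup after Failover" slides can be collected into one lookup table. The following Python sketch only transcribes the slides; the names `STARTUP_OUTCOMES`, `former_primary_startup` and `is_safe` are invented for illustration and do not correspond to any Oracle interface.

```python
from typing import Dict, Tuple

# (network_connected, startup_command) -> (verdict, outcome), as stated on the slides.
STARTUP_OUTCOMES: Dict[Tuple[bool, str], Tuple[str, str]] = {
    (True, "STARTUP MOUNT"): ("OK", "Broker handles startup; former primary is not started"),
    (True, "STARTUP"): ("BAD", "two primary databases"),
    (False, "STARTUP"): ("BAD", "two primary databases"),
    (False, "STARTUP MOUNT"): ("BAD", "Broker opens former primary after 5 unsuccessful requests"),
}


def former_primary_startup(network_connected: bool, command: str) -> Tuple[str, str]:
    """Look up what the slides say happens when the former primary is started."""
    return STARTUP_OUTCOMES[(network_connected, command.strip().upper())]


def is_safe(network_connected: bool, command: str) -> bool:
    """True only for the single combination the slides mark as OK."""
    return former_primary_startup(network_connected, command)[0] == "OK"
```

Only one of the four combinations is safe, which is why the manual startup variants are unsatisfactory.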

Startup: Variants
1. Only mount Primary and Standby Database during system boot
2. Manual database startup after system boot
3. Adapt TNSNAMES or LDAP server so that the old Primary is not found anymore. But local jobs
4. Is there a better solution? Yes, see Fast Start Failover!


Fast-Start Failover
Main criticism of Oracle standby databases: too much manual interaction
1. Manual interaction is required for a failover
   - Administrative checks are needed beforehand to validate the status of the standby database, e.g. whether all redo has been applied
   - More downtime
2. Manual interaction is required to recreate a new standby database
   - No HA until the setup of the new standby is finished
3. Manual interaction is needed for startup if two primaries have to be avoided at all costs
Fast Start Failover addresses all three problems!

Fast Start Failover: Concept
1. Observed Data Guard environment: Primary, Standby
2. Fast-Start Failover (automatic): Primary, Primary
3. Reinstate (automatic): Standby, Primary

When is a Fast-Start Failover triggered?
- Primary site failure
  - Server crash or server shutdown (without database shutdown)
- Primary database failure
  - Instance failure (last running instance if RAC)
  - Shutdown abort (but not shutdown normal or immediate)
  - Data file is taken offline
- Network failure (special case)
The documentation of when automatic activation will and will not happen is quite large. Read and test carefully. We will show one case.

Network Failure (1)
    select fs_failover_status, fs_failover_observer_present from v$database; -- on primary site

    FS_FAILOVER_STATUS   FS_FAILOVER_OBSERVER_PRESENT
    -------------------- ----------------------------
    SYNCHRONIZED         NO

Network Failure (2)
    select fs_failover_status, fs_failover_observer_present from v$database; -- on primary site

    FS_FAILOVER_STATUS   FS_FAILOVER_OBSERVER_PRESENT
    -------------------- ----------------------------
    STALLED              NO

Network Failure (3)
    select fs_failover_status, fs_failover_observer_present from v$database; -- on new primary site

    FS_FAILOVER_STATUS   FS_FAILOVER_OBSERVER_PRESENT
    -------------------- ----------------------------
    REINSTATE REQUIRED   YES

Network Failure (4)
    select fs_failover_status, fs_failover_observer_present from v$database; -- on primary site and standby site

    FS_FAILOVER_STATUS   FS_FAILOVER_OBSERVER_PRESENT
    -------------------- ----------------------------
    SYNCHRONIZED         YES
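
The four query results of the network failure case can be written down as data and checked with a simple rule. The Python sketch below is illustrative only: the rows are transcribed from the slides, while the rule in `needs_attention` (everything except SYNCHRONIZED with the observer present is worth a look) is an assumption of this sketch and not an Oracle-defined health check.

```python
from typing import List, NamedTuple


class FsfoRow(NamedTuple):
    stage: str             # slide title
    site: str              # where the query was run, per the slide comment
    status: str            # FS_FAILOVER_STATUS
    observer_present: str  # FS_FAILOVER_OBSERVER_PRESENT


# Values as shown on the four "Network Failure" slides.
OBSERVED: List[FsfoRow] = [
    FsfoRow("Network Failure (1)", "primary", "SYNCHRONIZED", "NO"),
    FsfoRow("Network Failure (2)", "primary", "STALLED", "NO"),
    FsfoRow("Network Failure (3)", "new primary", "REINSTATE REQUIRED", "YES"),
    FsfoRow("Network Failure (4)", "primary and standby", "SYNCHRONIZED", "YES"),
]


def needs_attention(row: FsfoRow) -> bool:
    """Assumed rule: only SYNCHRONIZED with an observer present counts as settled."""
    return not (row.status == "SYNCHRONIZED" and row.observer_present == "YES")


if __name__ == "__main__":
    for row in OBSERVED:
        flag = "check" if needs_attention(row) else "ok"
        print(f"{row.stage:<20} {row.site:<20} {row.status:<20} {row.observer_present:<4} {flag}")
```

With this rule, only the last stage, after the automatic reinstate, is reported as settled.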

Observer location?
[Diagram not recoverable from the transcript]

Observer location
- Best is three locations:
  - One for the primary database
  - One for the standby database
  - One for the observer
- In many real-life situations (no three locations):
  - Observer on the primary site is the best choice if avoiding 'false' activations is most important
  - Observer on the standby site is the best choice if protection from the loss of a computing center is most important
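
The placement advice above is a small decision rule, which can be sketched in Python as follows. The enum and function names are hypothetical and the returned strings are just labels; the rule itself is taken from the slide.

```python
from enum import Enum


class Priority(Enum):
    AVOID_FALSE_ACTIVATION = "avoid false activations"
    SURVIVE_SITE_LOSS = "protect against loss of a computing center"


def observer_site(third_site_available: bool, priority: Priority) -> str:
    """Return where to run the observer, following the slide's recommendation."""
    if third_site_available:
        # Best case: primary, standby and observer each have their own location.
        return "third site"
    if priority is Priority.AVOID_FALSE_ACTIVATION:
        return "primary site"
    return "standby site"
```

For example, `observer_site(False, Priority.SURVIVE_SITE_LOSS)` returns `"standby site"`.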

Compromise to minimize false activations
[Diagram not recoverable from the transcript]

Observer Installation Requirements
- Observer machine with Oracle Net configuration
- Special entry in the Data Guard Broker configuration, which requires
  - MaxAvailability protection mode for the Primary Database
    - but: special startup behavior
    - but: primary stalls in certain situations
  - Flashback Database activated on Primary and Standby Database

Observer - Additional Data Guard Configuration
Not much to configure:
    edit database 'PHYS_LUCERNE' set property FastStartFailoverTarget = 'PHYS_TOKYO';
    edit database 'PHYS_TOKYO' set property FastStartFailoverTarget = 'PHYS_LUCERNE';
    edit configuration set property FastStartFailoverThreshold = 15;
    enable fast_start failover;
Fast-Start Failover is a feature of Oracle Data Guard and cannot run without a Data Guard Broker configuration!
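
Since the slides later recommend keeping scripts at hand, the four DGMGRL commands can be generated for any pair of databases. The following Python sketch only builds the command text shown on the slide; `fsfo_commands` is a hypothetical helper, it does not talk to DGMGRL, and it assumes the two databases are each other's failover target as in the example.

```python
from typing import List


def fsfo_commands(db_a: str, db_b: str, threshold_seconds: int) -> List[str]:
    """Build the DGMGRL commands from the slide for a symmetric two-database setup."""
    if threshold_seconds <= 0:
        raise ValueError("threshold must be a positive number of seconds")
    return [
        f"edit database '{db_a}' set property FastStartFailoverTarget = '{db_b}';",
        f"edit database '{db_b}' set property FastStartFailoverTarget = '{db_a}';",
        f"edit configuration set property FastStartFailoverThreshold = {threshold_seconds};",
        "enable fast_start failover;",
    ]


if __name__ == "__main__":
    # Reproduces the commands from the slide.
    print("\n".join(fsfo_commands("PHYS_LUCERNE", "PHYS_TOKYO", 15)))
```

The output can be saved to a file and reviewed before it is run in DGMGRL.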

FSFO: Does it work? (1)
- Usually it works
- An interruption (network, server crash etc.) during reinstate often results in problems:
  - The FSFO configuration hangs
  - The reinstating instance will not continue
  - The observer cannot be stopped (with stop observer)
- How to solve the problem:
    dgmgrl
    connect system@<new_primary>
    disable fast_start failover force;
    reinstate database '<old_primary>';
    enable fast_start failover;
- If "disable fast_start failover force" also hangs, kill and restart the observer and restart the new primary instance

FSFO: Does it work? (2)
- In rare cases, the whole broker configuration is corrupted
- Remove the configuration, on both nodes / instances:
    sql> shutdown immediate
    cd $ORACLE_BASE/admin/DG1/pfile/
    mv dr1dg1.dat dr1dg1.dat.bck
    mv dr2dg1.dat dr2dg1.dat.bck
    sql> startup mount
- Recreate the configuration (good to have scripts):
    dgmgrl> create configuration 'DG1'...
    dgmgrl> add database...
    dgmgrl> edit database... / edit configuration...
    dgmgrl> enable configuration;
    dgmgrl> enable fast_start failover;


Fast Start Failover understood: Core Messages
- FSFO addresses three major problems of 9i Data Guard
- Observer location is not easy to decide
- Things can become corrupt: be prepared to recreate the Data Guard configuration
Data is always part of the game.