managing planned downtime with RAC Björn Rost



Björn Rost
  founder, manager and DBA
  RAC SIG European Chair
  ACE Director

about us
  Software production company
    founded 2001
    mostly J2EE
    logistics, telecommunication, media and publishing
  customers demand full lifecycle support
    hardware resale
    datacenter operations
    3rd party software

(diagram) project lifecycle: consulting, J2EE, hardware, hosting, specification, PHP, SW licenses, monitoring, documentation, database, installation, patching, feasibility studies, benchmarking, backups, tuning, planning, design, integration, operation

TAF
  "Minimize downtime! Go implement this TAF thing. Just turn it on, it is completely transparent!"
  "let me check the docs and get right back"

TAF
  use OCI driver ("can do")
  delay or overhead? ("was expecting some cost")
  no DML! ("seriously?")
  "yup, only SELECT will fail over..."

expectation a clustered HA system should always be UP

the reality even with RAC implemented, there are still many (if not more) outages :(

limits
  a session can never move between nodes
  session creation (lb) decided on connection
  HA needs to be supported in apps
  some of this stuff can be confusing

12c app continuity

Agenda
  introduction
  walkthrough
  load balancing
  connection pools
  srvctl
  app continuity

reasons to use RAC
  You probably don't need RAC!
  http://www.my-idconcept.de/downloads/you_probably_dont_need_rac.pdf

reasons to use RAC
  scalability & performance
  high availability
    unplanned
    planned

RAC One Node
  RAC without scaling across multiple nodes
  migration to full RAC online possible
  seamless crash failover

unplanned downtime
  hardware fault
    servers come with redundant components: disks, power supplies, fans
    components are getting better, too
  software crash or hang
  DoS attacks / security issues
  human error

planned downtime
  hardware upgrade (RAM, CPU, ...)
  firmware upgrades
  OS updates
  Oracle software patches
  network re-patching
  SAN reconfiguration

downtime

failure types
  app not connected (only on demand)
  session open but idle/no tx
    app needs to reconnect
  tx in progress, SELECT only
    start over or display error
  tx in progress, DML
    rollback/replay/handle error
    important: don't commit twice
  (re)join of cluster node
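The "don't commit twice" case can be sketched in a few lines. This is only an illustration under assumptions, not the speaker's method: it uses an in-memory SQLite database as a stand-in for the cluster, a hypothetical `ConnectionLost` exception to simulate a session dying, and a client-chosen `request_id` so the application can check whether its commit already landed before it replays the work.

```python
import sqlite3


class ConnectionLost(Exception):
    """Hypothetical stand-in for a session that dies during a node shutdown."""


def place_order(conn, request_id, item, fail=None):
    """Insert one order under a client-chosen request_id and commit it."""
    conn.execute(
        "INSERT INTO orders (request_id, item) VALUES (?, ?)", (request_id, item)
    )
    if fail == "before_commit":
        conn.rollback()  # the work is lost together with the session
        raise ConnectionLost("session died before the commit")
    conn.commit()
    if fail == "after_commit":
        # The commit reached the database, but the reply never reached the app.
        raise ConnectionLost("session died after the commit")


def already_committed(conn, request_id):
    """Ask the database whether this request was committed earlier."""
    row = conn.execute(
        "SELECT 1 FROM orders WHERE request_id = ?", (request_id,)
    ).fetchone()
    return row is not None


def place_order_safely(conn, request_id, item, fail=None):
    """Replay the transaction after a failure only if it was not committed."""
    try:
        place_order(conn, request_id, item, fail=fail)
    except ConnectionLost:
        # A real application would reconnect here; the sketch reuses `conn`.
        if not already_committed(conn, request_id):
            place_order(conn, request_id, item)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (request_id TEXT PRIMARY KEY, item TEXT)")
place_order_safely(conn, "req-1", "widget", fail="after_commit")   # no replay
place_order_safely(conn, "req-2", "gadget", fail="before_commit")  # replayed once
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The primary key on `request_id` is a second line of defence in this sketch: a blind replay of an already committed request would fail instead of creating a duplicate order.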

maintenance rqmts
  remove nodes from cluster without user interruption
  don't break running sessions
  ok to kill idle sessions, let them reconnect
  don't lose data/transactions/new orders
  stay up or available

load balancing
  client side
    tnsnames.ora and/or SCAN
  server side
    on connection
      long goal: # of connections
      short goal: system load avg
    runtime
      advisory events sent to conn. pools
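A rough sketch of the two connection-time goals from the slide above: LONG places a new connection by connection count, SHORT by system load. The `InstanceStats` type, the `pick_instance` function and the numbers are made up for illustration; the real listener works from its own metrics and this is not its algorithm.

```python
from dataclasses import dataclass


@dataclass
class InstanceStats:
    """Hypothetical per-instance figures a listener might know about."""
    name: str
    connections: int
    load_avg: float


def pick_instance(stats, goal):
    """Choose the target instance for a new connection under the given goal."""
    if goal == "LONG":
        # long-lived sessions: even out the number of connections
        return min(stats, key=lambda s: s.connections).name
    if goal == "SHORT":
        # short-lived sessions: go where the system load is lowest
        return min(stats, key=lambda s: s.load_avg).name
    raise ValueError(f"unknown goal: {goal}")


stats = [
    InstanceStats("RAC1", connections=40, load_avg=0.5),
    InstanceStats("RAC2", connections=25, load_avg=3.0),
]
long_choice = pick_instance(stats, "LONG")
short_choice = pick_instance(stats, "SHORT")
```

With these sample figures the two goals disagree: RAC2 has fewer connections, RAC1 has less load.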

SCAN

RAC_OLTP =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = scan.db.portrix.net)(port = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = OLTP)
    )
  )

oracle@rac1:~$ host scan.db.portrix.net
scan.db.portrix.net has address 46.30.26.101
scan.db.portrix.net has address 46.30.26.102
scan.db.portrix.net has address 46.30.26.103

Services (animation: sessions 1 to 5 connect while services are moved between RAC1 and RAC2)
  start: OLTP: RAC1, RAC2; batch: RAC1; sessions 1, 2 and 3 connect
  services moved: OLTP: RAC2; batch: RAC2; new sessions 4 and 5 connect, sessions 2 and 1 finish
  end: OLTP: RAC1, RAC2; batch: RAC1; sessions 3, 4 and 5 remain
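The idea behind the animation can be played through in a small simulation. It is a sketch under assumptions, not a reproduction of the slides: the `connect` function and its "fewest sessions wins" rule are invented here, and the session placement will not match the original frames. It only shows that new sessions follow the service placement while existing sessions stay where they are until they finish.

```python
def connect(session_id, service, placement, sessions):
    """Place a new session on the least-used instance that offers `service`."""
    candidates = placement[service]
    counts = {inst: 0 for inst in candidates}
    for inst in sessions.values():
        if inst in counts:
            counts[inst] += 1
    target = min(candidates, key=lambda inst: counts[inst])
    sessions[session_id] = target
    return target


# Services as on the slides: OLTP on both instances, batch on RAC1 only.
placement = {"OLTP": ["RAC1", "RAC2"], "batch": ["RAC1"]}
sessions = {}
for sid in (1, 2, 3):
    connect(sid, "OLTP", placement, sessions)
before_move = dict(sessions)

# Move both services to RAC2 ahead of maintenance on RAC1.
placement = {"OLTP": ["RAC2"], "batch": ["RAC2"]}
connect(4, "OLTP", placement, sessions)
connect(5, "batch", placement, sessions)

# Existing sessions never move; RAC1 is empty only once they have finished.
for sid in [s for s, inst in sessions.items() if inst == "RAC1"]:
    del sessions[sid]
rac1_is_empty = "RAC1" not in sessions.values()
```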

app requirements
  reconnect regularly
  handle connection failures
  set max_sessions to the right value

connection pools
  pool will open and hold connections
  app borrows session for tx as needed
  when tx is done, app returns session
  pool can decide which connection to lend to app

connection pools
  save resources
    memory
    connection time
  reconnect
  help load balancing
  abstraction layer for errors
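The borrow/return cycle, together with "reconnect regularly", can be sketched as a toy pool that retires connections once they exceed a maximum age. Everything here is hypothetical (`FakeConnection`, `SimplePool`, the 300 second limit, the injected clock); it is not UCP and says nothing about how UCP implements this.

```python
import itertools
import time


class FakeConnection:
    """Hypothetical stand-in for a database session."""
    _ids = itertools.count(1)

    def __init__(self, now):
        self.id = next(FakeConnection._ids)
        self.created_at = now
        self.closed = False

    def close(self):
        self.closed = True


class SimplePool:
    """Holds idle connections and hands them out until they get too old."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock
        self.idle = []

    def borrow(self):
        now = self.clock()
        while self.idle:
            conn = self.idle.pop()
            if now - conn.created_at < self.max_age:
                return conn          # young enough: reuse it
            conn.close()             # too old: retire it, i.e. reconnect regularly
        return FakeConnection(now)   # nothing usable: open a new connection

    def give_back(self, conn):
        self.idle.append(conn)


fake_now = [0.0]                     # injected clock so the demo does not sleep
pool = SimplePool(max_age_seconds=300, clock=lambda: fake_now[0])
first = pool.borrow()
pool.give_back(first)
second = pool.borrow()               # the same connection is reused
pool.give_back(second)
fake_now[0] = 301.0
third = pool.borrow()                # the old connection is retired
```

Because old connections are replaced on the next borrow, sessions drift to wherever the services currently run without the application noticing.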

UCP and FAN
  Fast Connection Failover
    crash
    planned outage
    (re)join
  run-time load balancing
  session affinity
  transaction affinity
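How a pool might react to the three kinds of events listed above, as a simplified sketch. The event dictionaries, the `kind` values and the `FanAwarePool` class are invented for illustration; they are not the FAN event format and not UCP's API.

```python
class PooledConn:
    """Hypothetical pooled connection that knows which instance it is on."""

    def __init__(self, instance):
        self.instance = instance
        self.in_use = False
        self.valid = True


class FanAwarePool:
    def __init__(self, conns):
        self.conns = list(conns)
        self.draining = set()   # instances going down for planned maintenance

    def on_event(self, event):
        inst, kind = event["instance"], event["kind"]
        if kind == "crash":
            # unplanned: every connection to that instance is dead right away
            for c in self.conns:
                if c.instance == inst:
                    c.valid = False
        elif kind == "planned_down":
            # planned: drop idle connections now, let borrowed ones finish
            self.draining.add(inst)
            for c in self.conns:
                if c.instance == inst and not c.in_use:
                    c.valid = False
        elif kind == "join":
            # (re)join: the instance may receive connections again
            self.draining.discard(inst)
        self.conns = [c for c in self.conns if c.valid]

    def give_back(self, conn):
        conn.in_use = False
        if conn.valid and conn.instance in self.draining:
            conn.valid = False      # closed on return instead of being reused
            self.conns.remove(conn)


idle_rac1 = PooledConn("RAC1")
busy_rac1 = PooledConn("RAC1")
busy_rac1.in_use = True
idle_rac2 = PooledConn("RAC2")
pool = FanAwarePool([idle_rac1, busy_rac1, idle_rac2])

pool.on_event({"kind": "planned_down", "instance": "RAC1"})
busy_survived_event = busy_rac1.valid   # the running transaction is not broken
pool.give_back(busy_rac1)
remaining = [c.instance for c in pool.conns]
```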

services
  A service is an entity to which users connect
  configured with connection settings on client
  registered through clusterware
  each service has:
    a list of preferred and available instances
    load-balancing goal
    TAF and other parameters
  12c multitenant: each PDB has its own service

services
  default service is always active on all nodes
    ORA-01033: ORACLE initialization or shutdown in progress
  separation might improve performance
  helpful in other areas of administration
    resource management
    EM monitoring
    grouping

srvctl

grid@rac1:~$ srvctl config service -d PTXRAC -s OLTP
Service name: OLTP
Service is enabled
Server pool: PTXRAC_OLTP
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: SHORT
TAF policy specification: NONE
Edition:
Preferred instances: PTXRAC1,PTXRAC2
Available instances:

verify service cfg

grid@rac1:~$ lsnrctl status listener_scan1
LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 29-SEP-2011 11:35:58
Copyright (c) 1991, 2010, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                30-APR-2011 23:09:28
Uptime                    151 days 12 hr. 26 min. 30 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/11.2.0/grid/log/diag/tnslsnr/sun1os/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.42.155)(PORT=1521)))
Services Summary...
Service "BATCH.DB.PORTRIX.NET" has 1 instance(s).
  Instance "PTXRAC2", status READY, has 1 handler(s) for this service...
Service "OLTP.DB.PORTRIX.NET" has 2 instance(s).
  Instance "PTXRAC1", status READY, has 1 handler(s) for this service...
  Instance "PTXRAC2", status READY, has 1 handler(s) for this service...
Service "PTXRAC.DB.PORTRIX.NET" has 2 instance(s).
  Instance "PTXRAC1", status READY, has 1 handler(s) for this service...
  Instance "PTXRAC2", status READY, has 1 handler(s) for this service...
Service "PTXRACXDB.DB.PORTRIX.NET" has 2 instance(s).
  Instance "PTXRAC1", status READY, has 1 handler(s) for this service...
  Instance "PTXRAC2", status READY, has 1 handler(s) for this service...
The command completed successfully

srvctl

srvctl modify service
  Moves a service member from one instance to another. Additionally, this command changes which instances are to be the preferred and the available instances for a service.
  This command supports some online modifications to the service, such as: When there are available instances for the service, and the service configuration is modified so that a preferred or available instance is removed, the running state of the service may change unpredictably:
    The service is stopped and then removed on some instances according to the new service configuration.
    The service may be running on some instances that are being removed from the service configuration. These services will be relocated to the next free instance in the new service configuration.

srvctl relocate service -d db_unique_name -s service_name {-c source_node -n target_node | -i old_instance_name -t new_instance_name} [-f]

srvctl
  if service is only up on one node: relocate
  if service is up on multiple nodes: modify

shutdown
  srvctl stop instance -d db_unique_name {[-n node_name] [-i "instance_name_list"]} [-o stop_options] [-f]
  stops all services on the node (with -f)
  better relocate service yourself!

shutdown
  srvctl stop instance -d db_unique_name {[-n node_name] [-i "instance_name_list"]} -o transactional
  refuses new connections
  disconnects sessions after commit/rollback

steps (again)
  relocate services away (relocate/modify)
  wait until sessions are done with work
  shutdown (transactional)
  perform maintenance
  restart services
  relocate services back
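The steps above can be turned into a dry-run plan that only prints what to do and executes nothing. This is a sketch under assumptions: the `maintenance_plan` function and its arguments are hypothetical, the `relocate service` and `stop instance` flags follow the 11.2 syntax shown on the earlier slides, `srvctl start instance -d ... -i ...` is assumed to be available in the same form, and for services running on several instances the plan only names `srvctl modify service` without guessing its flags.

```python
def maintenance_plan(db, instance, other_instance, services):
    """Return the ordered steps for taking `instance` down and back up.

    `services` maps a service name to the instances it currently runs on.
    Lines starting with '#' are manual steps, everything else is a command.
    """
    steps = []
    for svc, running_on in sorted(services.items()):
        if running_on == [instance]:
            # service is only up on this node: relocate
            steps.append(
                f"srvctl relocate service -d {db} -s {svc} "
                f"-i {instance} -t {other_instance}"
            )
        elif instance in running_on:
            # service is up on multiple nodes: modify
            steps.append(f"# take {instance} out of {svc} with srvctl modify service")
    steps.append(f"# wait until sessions on {instance} are done with their work")
    steps.append(f"srvctl stop instance -d {db} -i {instance} -o transactional")
    steps.append("# perform maintenance")
    steps.append(f"srvctl start instance -d {db} -i {instance}")
    for svc, running_on in sorted(services.items()):
        if running_on == [instance]:
            steps.append(
                f"srvctl relocate service -d {db} -s {svc} "
                f"-i {other_instance} -t {instance}"
            )
        elif instance in running_on:
            steps.append(f"# add {instance} back to {svc} with srvctl modify service")
    return steps


plan = maintenance_plan(
    db="PTXRAC",
    instance="PTXRAC1",
    other_instance="PTXRAC2",
    services={"OLTP": ["PTXRAC1", "PTXRAC2"], "BATCH": ["PTXRAC1"]},
)
for step in plan:
    print(step)
```

Keeping the plan as plain strings makes it easy to review before anyone pastes a command into a production shell.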

rolling upgrades
  available in a lot of patches
  two RDBMS versions running simultaneously
  built-in support in OPatch

rolling upgrades

[oracle@rac1 tmp]$ opatch query -is_rolling_patch 10352368
Invoking OPatch 11.1.0.6.6
Oracle Interim Patch Installer version 11.1.0.6.6
Copyright (c) 2009, Oracle Corporation. All rights reserved.
Oracle Home       : /u01/app/oracle/product/11.2.0/db_1
Central Inventory : /u01/app/orainventory
   from           : /etc/orainst.loc
OPatch version    : 11.1.0.6.6
OUI version       : 11.2.0.1.0
OUI location      : /u01/app/oracle/product/11.2.0/db_1/oui
Log file location : /u01/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/opatch2011-09-15_11-28-05am.log
Patch history file: /u01/app/oracle/11.2.0/db_1/cfgtoollogs/opatch/opatch_history.txt
--------------------------------------------------------
Patch is a rolling patch: true
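When many patches have to be checked, the last line of that output is the part worth scripting. A minimal sketch, assuming the output has the shape shown above; `is_rolling_patch` is a hypothetical helper and the sample text is an excerpt of the slide's output, not a fresh run.

```python
def is_rolling_patch(opatch_output):
    """Return True/False from the 'Patch is a rolling patch' line, None if absent."""
    for line in opatch_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Patch is a rolling patch":
            return value.strip().lower() == "true"
    return None


sample = """Invoking OPatch 11.1.0.6.6
OPatch version    : 11.1.0.6.6
OUI version       : 11.2.0.1.0
--------------------------------------------------------
Patch is a rolling patch: true
"""
rolling = is_rolling_patch(sample)
```

Returning `None` when the line is missing keeps "not rolling" apart from "could not tell", which matters before scheduling a node-by-node patch.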

12c app continuity
  2 part system
  transaction guard
    reliably determine the state of commits
  app continuity (replay driver)
    driver records and caches requests and validation information
    reconnects and verifies commit state
    replays and validates requests

activate app continuity
  driver needs replay boundaries
    UCP and WebLogic add these automatically
    beginRequest/endRequest for 3rd party apps
  jdbc-thin only
  mutable calls (seq.nextval, sysdate)
  does not work with default service
  consider memory & CPU overhead

review
  TAF
  Load-Balancing
  services
  UCP
  FAN and FCF
  App Continuity

summary
  set up at least one extra service, possibly more
  make sure application reconnects regularly
  use UCP if possible
  try and use this
  make it part of app rqrmts
  patch regularly

what's next?
  RAC SIG elections are running right now!
  RAC SIG - www.oracleracsig.org

DOAG 2013 unconference: DEMO 12c RAC on a laptop, UCP and app continuity with a Java app

Thank you
  RAC Attack www.racattack.org
  RAC SIG - www.oracleracsig.org
  b.rost@portrix.net
  http://portrix-systems.de/blog/
  @brost