Internet Services. CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it




Monitoring best practices & tools for running highly available databases
Miguel Anjo & Dawid Wojcik
DM meeting, 20 May 2008

Oracle Real Application Clusters

Architecture [diagram showing RAC1, RAC2, RAC3, RAC4, RAC5, RAC6]

Highly available databases: Oracle services
- Resources are distributed among Oracle services
- Applications are assigned to a dedicated service
- On node failure, resources are redistributed

Service placement with four nodes (values in slide order):
- CMS_CONDCOND: Preferred, A1, A2, A3
- CMS_C2K: Preferred, A3, A1, A2
- CMS_DBS: A2, A3, A1, Preferred
- CMS_DBS_W: A3, A1, A2, Preferred
- CMS_SSTRACKER: Preferred, Preferred, Preferred, Preferred
- CMS_TRANSFERMGMT: A2, Preferred, Preferred, A1

Service placement with three nodes, after a node failure (values in slide order):
- CMS_CONDCOND: Preferred, A1, A2
- CMS_C2K: A2, Preferred, A1
- CMS_DBS: A2, A1, Preferred
- CMS_DBS_W: A1, A2, Preferred
- CMS_SSTRACKER: Preferred, Preferred, Preferred
- CMS_TRANSFERMGMT: Preferred, Preferred, A1
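The redistribution of services on node failure can be illustrated with a small model. This is only a sketch under assumptions: it reads A1, A2, A3 as fallback ("available") nodes in priority order, the node names are hypothetical, and the function is a simplified stand-in, not the relocation logic Oracle Clusterware actually runs.

```python
from typing import Iterable, List, Set


def place_service(preferred: Iterable[str],
                  available: Iterable[str],
                  alive: Set[str]) -> List[str]:
    """Return the nodes a service runs on, given the set of live nodes.

    preferred: nodes where the service normally runs.
    available: fallback nodes, in priority order (assumed meaning of A1, A2, ...).
    """
    preferred = list(preferred)
    # Keep the service on every preferred node that is still up.
    running = [node for node in preferred if node in alive]
    # Each lost preferred node is replaced by the next live fallback node.
    lost = len(preferred) - len(running)
    for node in available:
        if lost == 0:
            break
        if node in alive and node not in running:
            running.append(node)
            lost -= 1
    return running


# Hypothetical four-node cluster in which node "rac1" has failed.
alive_nodes = {"rac2", "rac3", "rac4"}
print(place_service(["rac1"], ["rac2", "rac3", "rac4"], alive_nodes))
```

In this model a service whose only preferred node fails moves to its first live fallback node, while a service that is preferred on every node simply keeps running on the survivors.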

Highly available databases: applications and DB release cycle
- Applications release cycle: development service, then validation service, then production service
- Database software release cycle: production service at version 10.2.0.n, validation service at version 10.2.0.(n+1), then production service at version 10.2.0.(n+1)

Why monitor?
- Monitor (n.), Computer Science: a program that observes, supervises, or controls the activities of other programs
- Diagnostics, performance, reporting
- Need to keep all components in a healthy state
- We are prepared for single failures and some double failures
- Commitment to give a 24/7 best-effort service
- Software misbehavior affecting performance
- Trends might indicate a need to grow the system
- Security breaches

Monitoring participants [diagram]

What we monitor
- 25 database clusters
- 124 servers, 450 cores, 150 disk arrays, 2000 disks at Tier0
- 10 Tier1 sites for Streams replication
- 150+ Oracle services / applications
- 2000+ user schemas
- 1M+ connections/day

PDB-BackupBackup
- 2-node cluster using Oracle Clusterware
- Running:
  - RACMon (monitoring agents)
  - StreamMon (monitoring agents)
  - Backups
  - Scripts repository
- Monitored by Lemon; set as Critical in operator procedures

Monitored components
- Servers
  - Accessibility
  - CDB state
  - Tools: Lemon + RACMon + OEM
- Disk arrays
  - Accessibility
  - State given by controller: firmware, disk state, disk size, disk speed
  - Tools: Lemon + RACMon
- Database software
  - Clusterware state
  - Service accessibility
  - Space available
  - Oracle Streams
  - Tools: RACMon + OEM + StreamMon
- Database usage
  - OS: CPU, I/O
  - User: sessions, CPU, I/O
  - User quotas, tablespace usage
  - Bad usage (short connections, missing bind variables)
  - Table fragmentation
  - Tools: RACMon, reports

Best practices (I)
- No overhead on the DB (the monitored object)
- Monitor as much as possible
- Keep the presentation layer simple and compact
- Possibility to drill down

Best practices (II)
- Hierarchy of alarms and notifications
- Simplicity leads to reliability
- Centralized version vs. deployed everywhere
- Independent blocks (monitoring, dashboard, reporting) for HA

Monitoring tools
- Lemon, SLS
- Basic monitoring (in-house development)
- SQL scripts (reactive monitoring)
- RACMon (in-house development, openlab)
- StreamMon (in-house development, openlab)
- OEM: Oracle Enterprise Manager (Grid Control), openlab
- Service-oriented monitoring tools
  - Experiment reports
  - DB availability and performance pages

Basic monitoring
- SSH, then SQL*Plus: select * from dual;
- Checks run every 5 minutes
- Each failure: e-mail with the error
- 3 consecutive failures: SMS
- Almost perfect for single-instance databases
- Limitations
  - On RAC, the system survives single HW failures
  - Users connect to a service, not to a database instance
  - No monitoring of other components (storage, clusterware)
  - Missing dashboard view
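The escalation rule of the basic monitoring (an e-mail for each failure, an SMS after 3 consecutive failures) can be sketched as follows. This is a minimal illustration, not the in-house scripts: the class name and the probe, e-mail and SMS callables are hypothetical placeholders, and the real probe would log in over SSH and run the SQL*Plus query.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EscalatingCheck:
    probe: Callable[[], None]          # hypothetical: raises an exception on failure
    send_email: Callable[[str], None]  # hypothetical notification hook
    send_sms: Callable[[str], None]    # hypothetical notification hook
    sms_threshold: int = 3             # consecutive failures before an SMS
    consecutive_failures: int = 0

    def run_once(self) -> bool:
        """Run one check; return True if the probe succeeded."""
        try:
            self.probe()
        except Exception as exc:
            self.consecutive_failures += 1
            # Every failure produces an e-mail carrying the error text.
            self.send_email(f"check failed: {exc}")
            # The SMS is sent once, when the threshold is first reached.
            if self.consecutive_failures == self.sms_threshold:
                self.send_sms(f"{self.sms_threshold} consecutive failures: {exc}")
            return False
        # A success resets the streak.
        self.consecutive_failures = 0
        return True
```

A scheduler (cron, for example) would call `run_once()` every 5 minutes. Sending the SMS only when the threshold is first reached is a design choice of this sketch that avoids repeated messages during a long outage.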

DBA monitoring
- SQL scripts: reactive (ad hoc) monitoring
- Pros:
  - Easy to use
  - Fast, real-time information
- Cons:
  - No global overview
  - Diagnoses a single problem at a time
  - Requires expert knowledge

RACMon requirements
- Reliable (24/7)
- Easy to use and configure
- Provides up-to-date information (frequent runs)
- Centralized: no configuration or deployment on the RAC side
- Web interface (RAC monitoring dashboard): one common place for the status of all RACs
- Monitoring of Oracle services (DB and user level) and Oracle Clusterware
- Monitoring of ASM instances (diskgroups and failgroups)
- Monitoring of other parts of the infrastructure: backups, storage (easy extensibility)
- Notifications sent via e-mail and SMS to DBAs
- Availability numbers (over extended periods of time)
- Disabling monitoring for specific machines or clusters (logbook of scheduled and unscheduled interventions)
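Two of these requirements, the intervention logbook and the availability numbers, can be combined in a short sketch. It is an illustration only: the `Intervention` record, the function names and the choice to exclude samples taken during an intervention from the availability figure are assumptions, not a description of how RACMon computes its numbers.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Intervention:
    target: str      # machine or cluster name (hypothetical)
    start: datetime
    end: datetime


def is_muted(target: str, when: datetime, logbook: List[Intervention]) -> bool:
    """True if monitoring of `target` is disabled at time `when`."""
    return any(i.target == target and i.start <= when < i.end for i in logbook)


def availability(samples: List[Tuple[datetime, bool]],
                 target: str,
                 logbook: List[Intervention]) -> Optional[float]:
    """Fraction of 'up' samples, ignoring samples taken during an intervention.

    Returns None when no sample falls outside the logged interventions.
    """
    counted = [up for when, up in samples if not is_muted(target, when, logbook)]
    if not counted:
        return None
    return sum(counted) / len(counted)
```

The same `is_muted` test can gate notifications, so that a planned intervention neither pages a DBA nor lowers the reported availability.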

RACMon Architecture

RACMon - examples


RACMon
- Pros/features:
  - Customized for our environment
  - Gives an overview of all our HW and RACs
  - Configurable alerts (via e-mail and SMS) and alert levels (production or non-production systems)
  - Drill-down details available via multiple links to other types of monitoring software (OEM, Lemon, StreamMon)
- Cons:
  - Requires manpower for development
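Configurable alert levels of this kind are often expressed as a routing table. The sketch below is hypothetical: the severity names, the table contents and the fallback to e-mail are assumptions made for illustration and are not taken from RACMon.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical routing table: (system class, severity) -> notification channels.
ROUTING: Dict[Tuple[str, str], List[str]] = {
    ("production", "critical"): ["email", "sms"],
    ("production", "warning"): ["email"],
    ("non-production", "critical"): ["email"],
    ("non-production", "warning"): [],
}


def channels_for(system_class: str,
                 severity: str,
                 routing: Optional[Dict[Tuple[str, str], List[str]]] = None) -> List[str]:
    """Return the channels to notify for an alert.

    Unknown combinations fall back to e-mail so that no alert is silently dropped.
    """
    table = ROUTING if routing is None else routing
    return list(table.get((system_class, severity), ["email"]))


print(channels_for("production", "critical"))
print(channels_for("non-production", "warning"))
```

Keeping the table as data rather than code means the alert policy can be changed without touching the monitoring logic.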

Oracle Streams
Oracle Streams enables the propagation and management of data, transactions and events in a data stream, either within a database or from one database to another.

StreamMon


StreamMon
- Streams availability and usage monitoring
- Built-in alerting in case of any error in the Streams stack
- Pros:
  - Monitoring of all T1 sites in one place (Streams monitoring is not available in any other tool, including OEM)
  - Convenient and easy-to-use web interface
  - Advanced plotting utilities
- Cons:
  - Required manpower for development (currently in maintenance only)
  - Uses non-standard libraries and requires a customized server

Oracle Enterprise Manager
- Architecture:
  - An agent running on each server uploads information to a central repository; if the repository is not available, the agent caches the data
  - The Management Service provides insight into the details of any monitored target
  - The Management Service sends e-mails (SMSes) based on the configured metrics and policies
- Proactive monitoring is possible (actions based on problem diagnostics)
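The agent behaviour described here (upload to the repository, cache when it is unreachable) is a store-and-forward pattern. The sketch below illustrates that pattern only; the class, the `upload` callable and the cache limit are hypothetical, and it does not reproduce the Oracle Enterprise Manager agent.

```python
from collections import deque
from typing import Callable, Deque, Dict


class BufferingAgent:
    """Forward metrics to a repository, caching them while it is unreachable."""

    def __init__(self, upload: Callable[[Dict], None], max_cached: int = 1000) -> None:
        self._upload = upload  # hypothetical: raises ConnectionError when down
        # Bounded cache: the oldest entries are dropped once max_cached is reached.
        self._cache: Deque[Dict] = deque(maxlen=max_cached)

    @property
    def pending(self) -> int:
        """Number of metrics waiting to be uploaded."""
        return len(self._cache)

    def report(self, metric: Dict) -> None:
        """Queue a metric and try to deliver everything queued so far."""
        self._cache.append(metric)
        self.flush()

    def flush(self) -> int:
        """Upload cached metrics in order; stop at the first failure."""
        sent = 0
        while self._cache:
            try:
                self._upload(self._cache[0])
            except ConnectionError:
                break  # repository unavailable: keep the data for later
            self._cache.popleft()
            sent += 1
        return sent
```

A metric is removed from the cache only after its upload succeeds, so an outage delays delivery rather than losing data, up to the cache limit.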

Oracle Enterprise Manager Grid Control features

Oracle Enterprise Manager
- Pros:
  - Highly configurable alerts, metrics and notification policies
  - Advanced and easy-to-use web interface
  - Easy drill-down
  - External product, fully supported
- Cons:
  - Universal, so it requires more navigation
  - No global overview (oriented per target)
  - Customization for many targets requires much work
  - Bugs may be intrusive (e.g. affecting Streams, excessive memory/CPU consumption, storage, DB instances)
  - Manpower required for maintenance and configuration
  - Not reliable enough for 24/7 monitoring

Weekly reports
- Targeted at experiment DBAs and coordinators
- Information about:
  - Bookkeeping: application names, contacts
  - Resource usage: sessions, CPU, logical and physical I/O
  - Security: connection errors, expiring passwords, unused schemas
  - Space: consumed, fragmentation, recycle bin
  - Bad usage: short connections, queries missing bind variables

Weekly reports
- PHP scripts
- Generate a report over the last 7 days
- Specific to one RAC cluster
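The 7-day aggregation behind such a report might look like the sketch below. It is written in Python rather than the PHP the reports use, and the record fields, the schema names and the one-second threshold for a "short connection" are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable


@dataclass(frozen=True)
class SessionRecord:
    schema: str          # user schema that opened the session (hypothetical field)
    logon: datetime
    logoff: datetime
    cpu_seconds: float


def weekly_report(records: Iterable[SessionRecord],
                  now: datetime,
                  days: int = 7,
                  short_connection_seconds: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Aggregate per-schema usage for sessions that ended in the last `days` days."""
    window_start = now - timedelta(days=days)
    report = defaultdict(
        lambda: {"sessions": 0, "cpu_seconds": 0.0, "short_connections": 0})
    for rec in records:
        # Only sessions that ended inside the reporting window are counted.
        if not (window_start <= rec.logoff <= now):
            continue
        row = report[rec.schema]
        row["sessions"] += 1
        row["cpu_seconds"] += rec.cpu_seconds
        # Flag very short sessions, one of the "bad usage" patterns.
        if (rec.logoff - rec.logon).total_seconds() < short_connection_seconds:
            row["short_connections"] += 1
    return dict(report)
```

Sorting the resulting rows by `cpu_seconds` or `sessions` would put the main users first in the report.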

Weekly reports

Weekly reports: current functionality
- Simple way to visualize whole DB usage
- Concentrates on the main users (dynamic)
- Easy to spot problems (color coded)
- Very good feedback from our users
- Now working on user-configurable reports

DB availability and performance page
- PHP, aggregation of other tools
- Requested by the experiments
- Dashboard of current DB activity
- Almost real-time monitoring (up to the last hour)
- Application resource usage
- No extra load: uses SLS, RACMon, StreamMon and the weekly reports
- Possibility to drill down

DB availability and performance page

Summary
- Many monitoring components developed for our environment
  - Out-of-the-box tools are not sufficient
  - Open frameworks: new features are easily added
- Feedback given to Oracle Enterprise Manager development (openlab)
- Very good feedback from T1s and experiments
- Components included in experiment dashboards, WLCG ServiceMaps and SLS