Affordable Enabling Technology 2. Giles Gamon of High-Availability.Com. Practical Approaches

May 2006. Affordable Enabling Technology 2. Giles Gamon of High-Availability.Com. Practical Approaches

Defining High-Availability
  Clustering is commonplace, but what does High-Availability clustering achieve?
  High-Availability IS
    The absence of interruptions to an end-to-end service
    More than making sure the db is running
  High-Availability IS NOT
    High-performance computing / clustering
    Scientific number crunching

Achieving High-Availability
  Identification of threats to service
    Systems failures, human errors, sabotage, software bugs, acts of God etc.
  Management of risk
    Building in redundancy, taking backups, training staff, testing systems, active management solutions

Causes of Downtime
  Source: IEEE

Causes - Disaster
  Planning to cope with disasters is an important component of a High-Availability strategy
    Flood, fire, power grid failure, terrorism etc.
  Most disasters are classified as environmental causes of downtime
  Collectively, environmental causes account for approximately 5% of downtime

Causes - Environmental
  Power cuts and brownouts
    UPS & generator: what do they power?
  Communication blackouts
    WiFi saturation
  Cooling system errors
  Humidification regulation errors can cause hardware failures

Southampton University 2005
  A small but very real threat
  Photo by Adrian Pickering

Causes - Hardware Failure
  Probably the most recognised cause of downtime
  Server failures
    Disk, CPU, internal cooling fans, memory faults
  Network failures
    DNS, DHCP, router, ISP, switches, cables cut
  Other
    Tape backup corruption, client hardware

Causes - Planned
  Hardware upgrades
  OS version upgrades
  Software version upgrades
  Data migration / transformation
  Backups
  Batch processing
  Preventative maintenance
  Testing

Causes - Human Factor
  Failure to maintain
    File systems full
    Database tables full
    Patches for known bugs not applied
  Accidents
    root# rm -rf / tmp/tempstuff
    Network misconfiguration
    Incorrect cable removed
  Inexperience
    root# reboot
    Cleaner knocks cables out
  Malice
    root# uadmin 1 5 or halt
    Physical sabotage

Causes - Software Error
  Code crashes
    Application suddenly stops with a core dump
  Memory leaks
    Slowly consumes all memory until system crash
  Runaway code
    Taking all CPU time in a loop
  Hanging code
    Code pauses waiting for a reply that never comes
  Resource shortfalls
    Overflowing logs, failure to allocate memory or process
  Buffer overflows
    Possibly exploited or just bad code

Managing Risks
  Identify critical services

Identify Critical Services
  How long can the web server be down?
    Think: internal, public, distance learning
  How about email?
    Can some emails be lost?
  How about SITS / Bb / SCT?
  How much downtime is acceptable?
  Who will be affected?
    Admin, students, lecturers
  What is the impact on the business?
    Reputation, income, disruption

Managing Risks
  Identify critical services
  Describe service level targets

Service Level Targets
  Email, Web (external): downtime < 2 hours per month, 8 a.m. to 2 a.m.
  Collaboration Server: downtime < 30 mins per month, 24x7
  Distance Learning: downtime < 5 mins per year, 24x7
  Statistical Server: fix when you can, not really required
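To compare targets like these, it can help to turn each downtime budget into an availability percentage. The sketch below does that arithmetic. The function name, the 30-day month, the 365-day year and the reading of the Email/Web window as 18 hours a day (8 a.m. to 2 a.m.) are all assumptions made for illustration; they are not stated on the slide.

```python
def availability_percent(downtime_minutes: float, window_minutes: float) -> float:
    """Availability (%) for a downtime budget within a service window."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return 100.0 * (1.0 - downtime_minutes / window_minutes)

# Service windows in minutes (assumed: 30-day month, 365-day year).
MONTH_24X7 = 30 * 24 * 60      # round-the-clock service
MONTH_8AM_2AM = 30 * 18 * 60   # 18 service hours per day
YEAR_24X7 = 365 * 24 * 60

targets = {
    "Email, Web (external)": availability_percent(120, MONTH_8AM_2AM),
    "Collaboration Server": availability_percent(30, MONTH_24X7),
    "Distance Learning": availability_percent(5, YEAR_24X7),
}

for service, pct in targets.items():
    print(f"{service}: {pct:.4f}% available")
```

Under these assumptions, the distance learning target works out to roughly "five nines", which is far stricter than the other two.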

Managing Risks
  Identify critical services
  Describe service level targets
  Map risks to services
  Quantify the level of threat
  Design and cost solutions
  Compromise in a rational way

Balancing Risk and Reward
  Unless you have an infinite budget you will have to make trade-offs
  Identify and remove SPoFs for critical services
    SPoF = Single Point of Failure
  Identify the least reliable components (lowest MTBF)
    Moving parts typically have the lowest MTBF
  Identify the most difficult components to repair/rebuild
    e.g. security server, database
  Identify what will have the biggest impact on failure
    Usually a core server
    Database, email, web, authentication server etc.
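The trade-off between MTBF and redundancy can be put into numbers with the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below applies it and shows how components combine in series (all must work) and in parallel (any one is enough). The function names and the sample figures are hypothetical, and the parallel case assumes failures are independent, which real systems only approximate.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def series(*parts: float) -> float:
    """Availability of a chain in which every part must be up."""
    result = 1.0
    for a in parts:
        result *= a
    return result

def parallel(*parts: float) -> float:
    """Availability of redundant parts where any one suffices (independent failures assumed)."""
    all_down = 1.0
    for a in parts:
        all_down *= (1.0 - a)
    return 1.0 - all_down

# Hypothetical figures, for illustration only.
server = availability(mtbf_hours=999, mttr_hours=1)    # a single server
switch = availability(mtbf_hours=9999, mttr_hours=1)   # a single switch

single_path = series(server, switch)
redundant_servers = series(parallel(server, server), switch)

print(f"single server + switch:   {single_path:.6f}")
print(f"two servers + one switch: {redundant_servers:.6f}")
```

Once the servers are duplicated, the single switch dominates the remaining risk, which is the usual argument for removing SPoFs in order of impact.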

Technical Approaches
  Clustering
  Replication
    Transaction / block level
  Emerging technologies
    iSCSI
    Multi-domain clusters
    Oracle RAC

Typical Multi-Tier Architecture
  View the service in a vertical fashion
  List all SPoFs
    Network
    Load balancers
    Switches
    Application server
    Database server
    Data disks
    Etc.
  Design in redundancy where possible
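Listing SPoFs tier by tier amounts to a simple inventory check: any tier with fewer than two independent instances is a single point of failure. A minimal sketch follows; the instance counts are invented for illustration and do not describe any site in this presentation.

```python
def single_points_of_failure(inventory: dict) -> list:
    """Return the tiers that have fewer than two instances, in inventory order."""
    return [tier for tier, count in inventory.items() if count < 2]

# Hypothetical inventory: tier name -> number of independent instances.
inventory = {
    "Network": 2,
    "Load balancers": 2,
    "Switches": 1,
    "Application server": 3,
    "Database server": 1,
    "Data disks": 2,
}

for tier in single_points_of_failure(inventory):
    print(f"SPoF: {tier}")
```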

Resilient Architecture
  Multi-site solution
    Replication to remote site
  Load balancers shown actually provide each other with redundant functionality
  Multiple switches used but not shown
  SPoFs reduced to near zero
    Multiple active blade centres
    Multiple active application servers
    Clustered database servers
  This architecture is resilient to almost every conceivable fault

High-Availability Clustering
  Intelligent management solution
    Software only
    Deployed on critical servers
    Can be active-active or active-passive
  Constant monitoring
    Application availability
    Server health
    Network availability
    Other defined components
  Automated restart / move in the event of a fault
  Notifications to administrative staff
    GUI, email, SMS
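The monitor, restart, move and notify cycle described above can be sketched as a small state machine. This is a toy illustration, not how RSF-1 or any other cluster product is implemented: the class name, the `max_restarts` policy and the in-memory `events` list (standing in for GUI, email or SMS notifications) are all assumptions.

```python
from typing import Callable, List

class ServiceMonitor:
    """Toy monitor: restart a failed service locally, then move it to the next node."""

    def __init__(self, name: str, check: Callable[[], bool],
                 nodes: List[str], max_restarts: int = 1):
        if not nodes:
            raise ValueError("at least one node is required")
        self.name = name
        self.check = check              # returns True while the service is healthy
        self.nodes = nodes              # first entry is the initially active node
        self.max_restarts = max_restarts
        self.active = nodes[0]
        self.restarts = 0
        self.events: List[str] = []     # stand-in for notifications to staff

    def poll(self) -> str:
        """Run one health check and return the action taken."""
        if self.check():
            self.restarts = 0
            return "ok"
        if self.restarts < self.max_restarts:
            self.restarts += 1
            self.events.append(f"{self.name}: restart on {self.active}")
            return "restart"
        # Local restarts exhausted: move the service to the next node.
        index = self.nodes.index(self.active)
        self.active = self.nodes[(index + 1) % len(self.nodes)]
        self.restarts = 0
        self.events.append(f"{self.name}: moved to {self.active}")
        return "move"

# Simulated health-check results: healthy, two failures, then healthy again.
simulated = iter([True, False, False, True])
monitor = ServiceMonitor("database", lambda: next(simulated), ["node-a", "node-b"])
actions = [monitor.poll() for _ in range(4)]
print(actions, monitor.active)
```

A real cluster would also need heartbeats between nodes and protection against both nodes taking the service at once; those are deliberately left out here.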

High-Availability Clustering: Active-Passive
  Simple setup
  Externalise shared data
    Use RAID and/or mirroring
  Low cost, fast and simple
  Very reliable

High-Availability Replication
  Traditional cluster locally
  Replicate to remote node
    Replication at transaction level
    Remote node probably included in cluster
  Automatic locally
  Manual remotely

High-Availability Replication
  Typically replication does a log scrape
    Although newer versions have closer integration
  Takes committed transactions and copies them across to the other node(s)
  Other nodes roll forward the transactions onto a read-only copy of the database
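The log-scrape idea can be illustrated with a toy transaction log: the replica applies only the changes belonging to transactions that have a commit record, in commit order. The record layout (`SET`, `COMMIT`, `ROLLBACK` tuples) and the function names are invented for this sketch and do not correspond to any particular database's log format.

```python
def committed_transactions(log):
    """Yield (txid, changes) for each committed transaction, in commit order."""
    pending = {}
    for record in log:
        kind, txid = record[0], record[1]
        if kind == "SET":
            pending.setdefault(txid, []).append((record[2], record[3]))
        elif kind == "COMMIT":
            yield txid, pending.pop(txid, [])
        elif kind == "ROLLBACK":
            pending.pop(txid, None)  # discard the uncommitted changes

def apply_to_replica(replica: dict, log) -> dict:
    """Apply committed changes from the primary's log to the replica copy."""
    for _txid, changes in committed_transactions(log):
        for key, value in changes:
            replica[key] = value
    return replica

# Hypothetical primary log: tx 2 is rolled back, tx 4 has not committed yet.
primary_log = [
    ("SET", 1, "balance", 100),
    ("SET", 2, "balance", 999),
    ("COMMIT", 1),
    ("ROLLBACK", 2),
    ("SET", 3, "owner", "alice"),
    ("COMMIT", 3),
    ("SET", 4, "balance", 0),
]

replica = apply_to_replica({}, primary_log)
print(replica)
```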

High-Availability Replication
  Block level replication
    Suitable for user files
    Not ideal for databases
      Many better approaches that understand db data
  Available in different guises
    Sun's SNDR (remote mirror): in kernel
      Sync / async
      Streams type module
    Rsync: user space
      Periodic checking and copy
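"Periodic checking and copy" at block level can be sketched as: split both copies into fixed-size blocks, compare a checksum per block, and copy only the blocks that differ. The code below is a deliberately simplified illustration on in-memory byte strings. It is not rsync's actual algorithm (which uses rolling checksums to cope with shifted data), and the tiny block size and function names are assumptions.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, to keep the example readable

def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return one SHA-256 digest per fixed-size block."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def sync_blocks(source: bytes, target: bytearray, block_size: int = BLOCK_SIZE) -> int:
    """Make target equal to source, copying only changed blocks. Returns blocks copied."""
    if len(target) > len(source):
        del target[len(source):]          # drop data beyond the end of the source
    target_digests = block_digests(bytes(target), block_size)
    copied = 0
    for index, offset in enumerate(range(0, len(source), block_size)):
        block = source[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(target_digests) or target_digests[index] != digest:
            target[offset:offset + block_size] = block
            copied += 1
    return copied

source = b"AAAABBBBCCCC"
target = bytearray(b"AAAAXXXXCCCC")   # only the middle block differs
copied = sync_blocks(source, target)
print(copied, bytes(target) == source)
```

Because the comparison knows nothing about what the blocks contain, a copy taken mid-transaction may be inconsistent, which is one reason block-level replication is described as not ideal for databases.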

High-Availability Replication
  Use db replication for databases when possible
  Use block level for other file types and legacy applications that have no replication option available

iSCSI Block Level Replication
  Presented as standard disk
  Over LAN instead of Fibre / SCSI
  Very clever but still emerging
  Can be combined with local attach

Multi-Domain Clusters
  Resilient hardware
  Good I/O architecture
  Probably not the cheapest solution
    Cheap second-hand

Practical Examples
  Tokyo Stock Exchange: dealer connections
  Surrey Ambulance: 999 call handling centre
  North Yorkshire Police: tasking & operational management
  Steria: SWIFT bureau service
  InSerTo: telco real-time services

Tokyo Stock Exchange
  Trading connections over public telecoms network
    Requiring FireWall-1
  Exchange secure but exposed
    Network faults
    Firewall system crashes
  The exchange needed to eliminate identified exposures
  Low tolerance to downtime

Tokyo Stock Exchange
  Multiple network connections
  Multiple firewalls installed at every location
  Clustering used to provide automated failover
  Transparent failover

Surrey Ambulance Service
  999 call centre
  24x7 live operations environment
    Handling calls from the public
    Live feeds from ambulance GPS devices
    Automatic escalation and logging

North Yorkshire Police
  24x7 live CAD system
    Command and control
    Custody management
    Crime management
    Duty rostering
    Imaging and biometrics
  Oracle backend to STORM application
  Highly integrated systems
    Mapping systems
    PNC links
    DVLA links
    Firearms database
    Neighbouring force systems

Bristol University
  Number of Oracle databases & other apps
  Desire for HA across campus
  Extensive pre-purchase consultancy with Sun and Oracle
  Elected not to use Oracle RAC
    Not suited to multiple smaller databases
    Didn't suit their consolidation desires
      More cost effective to build clusters of individual Oracle instances
    Expensive compared with standard clustering, and RAC requires clustering regardless
    Applications not built for RAC extended features
  Elected not to use block copy replication
    Despite having hardware in place capable of this
  Data Guard

Nottingham University
  Distributed cluster cross-campus
  Oracle, Bb, SITS, SCT, NFS, Web

Example Clusters in UK Education
  Salford: SCT
  UWE: Bb, Library, SunOne
  Sunderland: SITS
  Newcastle: 3 x SAP & Oracle
    Largest European SAP site in hefe
  Leeds: SAP & Oracle
  Manchester: WebCT
  Edinburgh: Firewall
  Sheffield Hallam: SITS & Bb

RSF-1 Environments
  Operating systems: Solaris, HP-UX, Linux, AIX, SCO, Mac OS X
  Hardware: SPARC, Intel, Opteron / AMD, IBM x, p, i & z-Series, PA-RISC

Typical Further Questions
  Will users notice a fail-over?
  How long will it take to get installed?
  Is it complicated?
  Can it work on Oracle 10g?
  Is Oracle RAC a good option?
  Can I use a WAN connection?
  What about SVM, VxFS, EMC?
  Can I use Solaris 10?
  What about Linux?

Contacting Us
  Giles Gamon
  High-Availability.Com
  sales@high-availability.com
  support@high-availability.com
  giles@high-availability.com
  01565 754 459