The Future of PostgreSQL High Availability Robert Hodges - Continuent, Inc. Simon Riggs - 2ndQuadrant

Similar documents

Not Your Grandpa s Replication

Module 14: Scalability and High Availability

Implementing the Future of PostgreSQL Clustering with Tungsten

How To Run A Standby On Postgres (Postgres) On A Slave Server On A Standby Server On Your Computer (Mysql) On Your Server (Myscientific) (Mysberry) (

Comparing MySQL and Postgres 9.0 Replication

High Availability Solutions for the MariaDB and MySQL Database

PostgreSQL 9.0 Streaming Replication under the hood Heikki Linnakangas

Architectures Haute-Dispo Joffrey MICHAÏE Consultant MySQL

Disaster Recovery for Oracle Database

Oracle Database 10g: Backup and Recovery 1-2

Be Very Afraid. Christophe Pettus PostgreSQL Experts Logical Decoding & Backup Conference Europe 2014

Top 10 Reasons why MySQL Experts Switch to SchoonerSQL - Solving the common problems users face with MySQL

High Availability Solutions for MySQL. Lenz Grimmer DrupalCon 2008, Szeged, Hungary

Backup and Recovery using PITR Mark Jones EnterpriseDB Corporation. All rights reserved. 1

High Availability with Postgres Plus Advanced Server. An EnterpriseDB White Paper

Appendix A Core Concepts in SQL Server High Availability and Replication

MySQL High Availability Solutions. Lenz Grimmer OpenSQL Camp St. Augustin Germany

MySQL Enterprise Backup

One Solution for Real-Time Data protection, Disaster Recovery & Migration

Portable Scale-Out Benchmarks for MySQL. MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc.

Tushar Joshi Turtle Networks Ltd

DISASTER RECOVERY STRATEGIES FOR ORACLE ON EMC STORAGE CUSTOMERS Oracle Data Guard and EMC RecoverPoint Comparison

PostgreSQL Backup Strategies

Database high availability

Contents. SnapComms Data Protection Recommendations

Preparing for the Big Oops! Disaster Recovery Sites for MySQL. Robert Hodges, CEO, Continuent MySQL Conference 2011

OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni

Designing, Optimizing and Maintaining a Database Administrative Solution for Microsoft SQL Server 2008

Linas Virbalas Continuent, Inc.

Postgres Plus xdb Replication Server with Multi-Master User s Guide

Database Replication with MySQL and PostgreSQL

MySQL Replication. openark.org

Library Recovery Center

Informix Dynamic Server May Availability Solutions with Informix Dynamic Server 11

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Recovery Principles in MySQL Cluster 5.1

Deploying BDR. Simon Riggs CTO, 2ndQuadrant & Major Developer, PostgreSQL. February 2015

MS Design, Optimize and Maintain Database for Microsoft SQL Server 2008

Techniques for implementing & running robust and reliable DB-centric Grid Applications

FIFTH EDITION. Oracle Essentials. Rick Greenwald, Robert Stackowiak, and. Jonathan Stern O'REILLY" Tokyo. Koln Sebastopol. Cambridge Farnham.

Maximum Availability Architecture

High Availability and Disaster Recovery for Exchange Servers Through a Mailbox Replication Approach

How To Use A Recoverypoint Server Appliance For A Disaster Recovery

Module 7. Backup and Recovery

PostgreSQL Concurrency Issues

Application Continuity with BMC Control-M Workload Automation: Disaster Recovery and High Availability Primer

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010

Availability Digest. MySQL Clusters Go Active/Active. December 2006

Monitoring PostgreSQL database with Verax NMS

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES

Connectivity. Alliance Access 7.0. Database Recovery. Information Paper

GoGrid Implement.com Configuring a SQL Server 2012 AlwaysOn Cluster

Connectivity. Alliance Access 7.0. Database Recovery. Information Paper

Stretching A Wolfpack Cluster Of Servers For Disaster Tolerance. Dick Wilkins Program Manager Hewlett-Packard Co. Redmond, WA dick_wilkins@hp.

High Availability & Disaster Recovery Development Project. Concepts, Design and Implementation

Cloud Based Application Architectures using Smart Computing

Near-Instant Oracle Cloning with Syncsort AdvancedClient Technologies White Paper

Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015

3. PGCluster. There are two formal PGCluster Web sites.

Redefining Microsoft SQL Server Data Management. PAS Specification

Database High Availability. Solutions 2010

High Availability Database Solutions. for PostgreSQL & Postgres Plus

Solving Large-Scale Database Administration with Tungsten

AVALANCHE MC 5.3 AND DATABASE MANAGEMENT SYSTEMS

SQL Server 2012/2014 AlwaysOn Availability Group

<Insert Picture Here> Oracle Cloud Storage. Morana Kobal Butković Principal Sales Consultant Oracle Hrvatska

Availability Guide for Deploying SQL Server on VMware vsphere. August 2009

<Insert Picture Here> Managing Storage in Private Clouds with Oracle Cloud File System OOW 2011 presentation

MySQL Administration and Management Essentials

A Shared-nothing cluster system: Postgres-XC

About the Author About the Technical Contributors About the Technical Reviewers Acknowledgments. How to Use This Book

Cisco Active Network Abstraction Gateway High Availability Solution

Oracle Database Backup & Recovery, Flashback* Whatever, & Data Guard

Eloquence Training What s new in Eloquence B.08.00

Using Oracle TimesTen to Deploy Low Latency VOIP Applications in Remote Sites

The Evolution of Database Management Systems

SQL Server 2014 New Features/In- Memory Store. Juergen Thomas Microsoft Corporation

<Insert Picture Here> Oracle In-Memory Database Cache Overview

SQL Server AlwaysOn (HADRON)

TECHNIQUES FOR DATA REPLICATION ON DISTRIBUTED DATABASES

Integrated Application and Data Protection. NEC ExpressCluster White Paper

Oracle Maximum Availability Architecture with Exadata Database Machine. Morana Kobal Butković Principal Sales Consultant Oracle Hrvatska

ORACLE DATABASE 12c FOR SAP: ROADMAP, BASE CERTIFICATION FEATURES AND OPTIONS

SQL Databases Course. by Applied Technology Research Center. This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases.

Availability Digest. Raima s High-Availability Embedded Database December 2011

ScaleArc for SQL Server

PROTECTING MICROSOFT SQL SERVER TM

Oracle Data Guard. Caleb Small Puget Sound Oracle Users Group Education Is Our Passion

Oracle Data Integration Solutions GoldenGate New Features Summary

Future-Proofing MySQL for the Worldwide Data Revolution

Configuring Apache Derby for Performance and Durability Olav Sandstå

Expert Oracle Exadata

Data Integration Overview

SCALABILITY AND AVAILABILITY

Transcription:

The Future of PostgreSQL High Availability Robert Hodges - Continuent, Inc. Simon Riggs - 2ndQuadrant

Agenda / Introductions / Framing the High Availability (HA) Problem / Hot Standby + Log Streaming / The PostgreSQL HA Manifesto / Questions

About Us / Simon Riggs -- Key PostgreSQL HA Contributor PITR, pg_standby, hot standby, etc. / Robert Hodges -- Architect of Tungsten Clustering Tungsten Replicator for MySQL & PostgreSQL, backups, distributed management, etc. / Continuent: Cross-platform database clustering Protect data Improve availability Scale performance / 2ndQuadrant: PostgreSQL services and core dev Services Education Support

Framing the Problem: Database High Availability

DBMS High Availability Made Simple Availability: Degree to which a system is up and running. 1. Minimize failures Keys to High Availability 2. Keep downtime including repairs as short as possible 3. Don t lose more data than you absolutely have to

What Are Key Causes of Downtime? / Crashes -- Hardware or software component fails / Scheduled maintenance Upgrade/service components / Migration Moving between DBMS versions and hardware architectures / Administrative errors Accidents that delete data or cause components not to work Thought exercise: which accounts for the most down-time?

Who Needs High Availability? / Small/medium business applications Idiot-proof installation and management / Embedded medical data processing Unattended operation Never lose a transaction / Hosted website intrusion reporting Burst updates to 100K INSERTs per second Massive data volumes / Hosted CRM (Customer Relationship Management) Fail-back options for system upgrades Creation of reporting databases / Internet Service Provider Shared DBMS instances Transparent migration of users between instances

Shared vs. Redundant Resources / Shared resources create single points of failover (SPOFs) / More redundancy == higher availability DBMS DBMS DBMS DBMS Data Data Data Redundant data approach covers more use cases with less capable hardware Shared disk approach requires internal redundancy; limited data protection and fewer use cases

Backups and Point-In-Time-Recovery / Backups are first line of defense for availability / Point-in-time-recovery restores database state to a particular: Point in calendar time, or Transaction ID / Provisioning copies directly from one database instance to another

Physical vs. Logical Replication / Databases can update either at disk or logical level, hence two replication approaches / Log records -- Databases apply them automatically during recovery / SQL statements -- Clients send SQL to make changes Physical Replication Replicate log records/events to create bit-for-bit copy Transparent, high performance, hard to cross architectures and versions, limitations on updates Logical Replication Replicate SQL to create equivalent data Flexible, fewer/different restrictions, allow schema differences, replicas allow reads

Asynchronous vs. Synchronous / Replicating is like buying a car--there are lots of ways to pay for it / $0 down - Pay later; hope nothing goes wrong / Down payment - Pay some so less goes wrong later / Cash - Pay up front and it s yours forever Asynchronous Replication Commit now, replicate later Lose data but robust against network failure Semi-Synchronous Replication Replicate to at least one other database Trade-off data loss vs. partition handling Synchronous Replication Replicate fully to all other databases Network fails --> you stop

Simple vs. Complex / Simple systems are almost always more available / Complexity is the #1 enemy of high availability DBMS DBMS Data Data Built-in database replication creates simple system with few/no additional ways to fail Complex combinations are hard to understand, test, and manage; create new failure modes

Hot Standby and Log Streaming

PostgreSQL 8.4 Warm Standby Master Standby PostgreSQL PostgreSQL WAL Files rsync to standby Archived WAL Files WAL Files Continuous recovery pg_xlogs Directory Archive Directory pg_xlogs Directory pg_standby

Advantages of Warm Standby / Simple / Completely transparent to applications / Very low performance overhead E.g. no extra writes from triggers / Supports point-in-time recovery / Works over WAN as well as LAN / Has reasonable recovery of master using rsync / Very reliable solution -- if recovery works warm standby works / Requires careful management to use effectively

Limitations of Warm Standby 1. Utilization -- Cannot open the standby To bring up the standby for queries you must end recovery Standby hardware is idle Difficult to track state of recovery since you cannot query log position 2. Data Loss -- Warm standby transfers only full WAL files Can bound loss using archive_timeout Low values create large numbers of WAL files; complicate pointin-time recovery Workarounds using DRBD, etc. are complex

Introducing Hot Standby / Allows users to connect to standby in read-only mode Allowed: SELECT, SET, LOAD, COMMIT/ROLLBACK Disallowed: INSERT, UPDATE, DELETE, CREATE, 2PC, SELECT FOR SHARE/UPDATE, nextval(), LISTEN, LOCK, No admin commands: ANALYZE, VACUUM, REINDEX, GRANT / Simple configuration through recovery.conf # Hot standby recovery_connections = 'on' / Performance Overhead Master: < 0.1% impact from additional WAL Standby: 2% CPU impact, but we're I/O bound anyway / Can come out of recovery while queries are running

Hot Standby Query Conflicts / Master: Connections can interfere and deadlock / Standby: Queries can conflict with recovery Recovery always wins / Causes of conflicts Cleanup records (HOT/VACUUM) Access exclusive locks DROP DATABASE DROP TABLESPACE Very long queries / Conflict resolution Wait, then Cancel - Controlled by max_standby_delay Avoid - Dblink

Introducing Log Streaming Master Standby PostgreSQL PostgreSQL WAL Sender Recovery WAL Receiver Archiving Continuous replication to standby Archived WAL Files Archive Directory

Configuration and Usage / Log streaming layers on top of existing warm standby log shipping / Configuration through postgresql.conf + recovery.conf # Recovery.conf log streaming options standby_mode = 'on' primary_conninfo = 'host=172.16.238.128 port=5432 user=postgres' trigger_file = '/path_to/trigger' / Multiple standby servers allowed / Failure of one standby does not affect others / Management is not simple - must coordinate provisioning & WAL shipping to set up/restart

What Does This Get Us? / Hot standby enhances utilization / Hot standby makes standby monitoring very simple / Hot standby heats up FS cache and shared buffers / Log streaming reduces the data loss window and shortens failover / Hot standby + log streaming will be the favored basic availability solution and will largely replace: Master/slave availability using SLONY/Londiste/PG Replicator Disk block replication Shared disk failover / So are we there yet??

The PostgreSQL HA Manifesto

Developing the PostgreSQL HA Roadmap / What can we learn from the neighbors? / Four features to round out PostgreSQL HA

MySQL Master Master Replication / Logical replication is built in -- no triggers / Covers all SQL including DDL / Handles maintenance very well (painless resync, application upgrades, cross architecture/version) Application Virtual IP MySQL DBMS DBMS Replication MySQL DBMS

Google Semi-Synchronous Replication / Quorum algorithm -- Commits block until at last one slave responds affirmatively / Protects data but avoids system freeze if a slave is unavailable / Released as patch to MySQL; not widely available yet MySQL DBMS Commit succeeds when > 0 slaves respond affirmatively MySQL DBMS MySQL DBMS MySQL DBMS MySQL DBMS

Oracle Data Guard / Oracle Data Guard moves transaction (redo) logs / Protection modes include async/sync replication / Physical standby is bit-for-bit copy, readonly / Logical standby allows readable, updatable copy / WAY better than RAC or Streams Application Application Read/Write Primary Instance Redo Log Replication Standby Instance Read/ Optional Write

Oracle Flash Back / Flash Back Query builds PITR into the DBMS / Select any SCN (System Commit #) for which logs are available / Flash back query to recover deleted data / Flash back instance to convert failed master to slave Sounds better than rsync, doesn t it?

Drizzle Pluggable Replication / Public replication protocol (Google Proto Buffers) / Pluggable replication -- enable new replication types / Sync/async replication / Support for all SQL operations, not just DML Master Slave Replicator Plugin Applier Plugin Applier Plugin Reader Plugin

PostgreSQL HA: Synchronous Replication / Flexible, synchronous replication Physical replication is the beginning / Selectable apply modes Submitted to replication Received by slave Applied by slave / Selectable quorum semantics Async Semi-sync Synchronous / Enables any application that values data to trade off durability vs. availability / Vendor solution jump off: configuration and management

PostgreSQL HA: Real-Time PITR / Implement Flash Back for PostgreSQL / Implementations range from straightforward to very hard / Use zoned snapshots to pick points in past where data remain visible to R/O transactions / Extra credit: Let PostgreSQL revert to a snapshot / Usage: Allow users to recover data from specific point in time--like built-in time delay replication. Snapshot reversion simplifies master recovery / Vendor jump-off point: Set up and manage snapshots

PostgreSQL HA: VLDB High Availability / Multiple simultaneous backups (only one now supported) Backup ref counts to allow more than one customer at a time / Incremental backup with WAL synchronization / Efficient recovery of large masters after failover / Vendor solution jump-off -- Management, fast backup/restore utilities, incremental backup solutions

PostgreSQL HA: Logical Replication / Supplement WAL to allow SQL generation Keys Schema definitions Recover DDL statements in actionable form (e.g., XML) / Extensible replication plug-ins a la Drizzle Intercept data as they are written to log Ability to hold commits to mark transactions (e.g., global IDs) and implement synchronous replication Handle two-phase commit issues Loadable through SQL without weird syntax extensions / Provide built-in reference implementation / Open source/vendor jump-off: Migration, multimaster, filtering, data consistency checking and repair

Summary and Questions

Summary / Hot standby + log streaming provide sound built-in simple HA / PostgreSQL HA manifesto = roadmap to a complete solution for high availability with jump-offs to vendor solutions / Tell us what features you need!

Information/Contact Continuent Web Site: http://www.continuent.com 2ndQuadrant Web Site: http://www.2ndquadrant.com