Data Distribution with SQL Server Replication




Introduction

Ensuring that data is in the right place at the right time is increasingly critical now that the database has become the linchpin of corporate technology infrastructure, driving customer interactions, revenues, and decision making. The customer-facing OLTP servers hold real-time data critical to the decision-making process supported by the OLAP servers; the challenge is how to make the OLTP data available where and when it is needed. Database replication is a solution for distributing data that has been widely studied for over 20 years. It provides a mechanism to increase performance and availability as well as to distribute data in support of business requirements. However, replication also creates a large and complex distributed system with intricate dependencies and hidden communications channels (Konstantinos Krikellas, 2010), requiring experienced DBAs for maintenance and support. The goal of this paper is to provide a foundation for understanding data replication as well as a discussion of the criteria for selecting an appropriate replication technology.

Background

Database replication is the process of maintaining multiple copies of data items in different locations called replicas (Bettina Kemme, 2010). Distributing data to disparate servers has been a research topic for more than two decades, during which time multiple approaches to maintaining transactional atomicity, consistency, isolation, and durability (ACID) have been investigated. Ten years ago, the theoretical basis for database replication revolved around the classic concepts of serializability and locking (Bettina Kemme, 2010). In these schemes ACID was ensured using a 2-phase-commit protocol where locks were obtained for read operations while write operations used distributed locks.
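The 2-phase-commit idea mentioned above can be illustrated with a minimal Python sketch. It is not taken from any of the cited papers or from SQL Server: the `Replica` class, the `can_prepare` flag (which stands in for a lock or validation failure), and the `two_phase_commit` function are hypothetical names used only for illustration, and the sketch omits locking, logging, and failure recovery.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica holding key/value data."""
    name: str
    can_prepare: bool = True  # assumption: simulates a lock or validation failure
    data: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, writes: dict) -> bool:
        # Phase 1: stage the writes and vote yes (True) or no (False).
        if not self.can_prepare:
            return False
        self.staged = dict(writes)
        return True

    def commit(self) -> None:
        # Phase 2, commit branch: make the staged writes visible.
        self.data.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        # Phase 2, abort branch: discard the staged writes.
        self.staged = {}


def two_phase_commit(replicas, writes) -> bool:
    """Return True if the writes committed on every replica, else False."""
    votes = [replica.prepare(writes) for replica in replicas]
    if all(votes):
        for replica in replicas:
            replica.commit()
        return True
    # A single "no" vote aborts the transaction on every replica.
    for replica in replicas:
        replica.abort()
    return False
```

In this sketch a write becomes visible on every replica or on none, which is the property that makes the approach expensive: every replica must take part in every update transaction before the transaction can finish.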
The overhead costs of locks led some researchers to suggest quorums as a more efficient approach for ensuring the consistency of distributed data; however, many operational issues were identified with these approaches. The seminal paper The Dangers of Replication and a Solution (Jim Gray, 1996) suggested that there was an exponential relationship between the number of replicas and the transaction response times, the conflict probability, and the deadlock rates (Bettina Kemme, 2010). Research conducted by Jim Gray led to new proposals for replication that eased the restrictions on consistency in order to increase responsiveness.

Early research in replication, as described above, is commonly referred to as eager or synchronous replication, which keeps the replicas synchronized within transaction boundaries (Christian Plattner, 2004). Eager systems conform to 1-copy-serializability: the resulting schedules are equivalent to a serial schedule on a single database (Christian Plattner, 2004). Confirming the research by Jim Gray, Plattner and Alonso determined that eager replication results in very high communication overhead and a probability of deadlocks proportional to the third power of the number of replicas (Christian Plattner, 2004). These issues led to a revised view in the database community that one could get either performance by sacrificing consistency (lazy replication approaches) or consistency at the cost of performance and scalability (eager replication approaches) (Bettina Kemme, 2010).

In contrast to eager replication, lazy or asynchronous replication propagates the updates of a transaction after it has committed (F. D. Muñoz-Escoí, 2009). Lazy replication improves performance by allowing faster transaction rates; however, this is achieved by sacrificing replica consistency, a choice that was not well received by some in the research community, who found it difficult to accept the fact that it was possible to implement (and commercialize!) a system with such ill-defined consistency guarantees (Bettina Kemme, 2010). It was, in fact, the need to commercialize a solution that led to the adoption of lazy replication as the preferred solution in the real world, notwithstanding the issues of potentially stale data at the replicas and the need to implement mechanisms to resolve update conflicts (Christian Plattner, 2004).

SQL Server Replication

Microsoft implements the Replication subsystem in SQL Server using an asynchronous (lazy) replication solution that provides a robust conflict resolution mechanism. As with any lazy replication implementation there is a possibility of stale data at the replicas; however, the consistency tradeoff is well worth the achievable performance of SQL Server Replication.
Additionally, the Replication subsystem includes tools to manage data consistency across the replicas, allowing for a high degree of confidence in the data. There are many excellent references available describing the architecture and components of the SQL Server Replication subsystem; this paper will not repeat that material. Instead, the remaining sections discuss the criteria for selecting a replication mechanism.

Replication Scenarios

Replication is used for two complementary purposes: performance improvement and high availability (F. D. Muñoz-Escoí, 2009). Performance can be increased with replication because each replicated server can respond to queries as a read-only replica, requiring no coordination among the servers; this approach may also provide server redundancy, where the failure of any one replica does not impact data accessibility. SQL Server supports this configuration with its Transactional replication mechanism.

Additionally, for replication configurations requiring replicas to update their local data, SQL Server provides a Merge replication mechanism with default and customizable conflict resolvers available to meet any business logic need. SQL Server also provides a Snapshot replication mechanism, which creates a point-in-time read-only copy of data on the replica. For clarity, the population of decision support systems is encompassed within the performance improvement feature as defined by Muñoz-Escoí.

A major consideration when selecting a replication mechanism is the data modifications and data-type limitations required to support the internal replication mechanisms. The Transactional and Snapshot replication mechanisms have no intrusive data modification requirements: the former manages data distribution via the transaction log, while the latter creates and applies a point-in-time copy of the replicated data. Transactional replication does require that all tables have a primary key defined; additionally, large object types require adjustments to replication configuration parameters. Merge replication, in contrast, requires data augmentation in order to synchronize data updates across replicas. The major Merge replication requirements and limitations are listed below:

- TEXT, NTEXT, or IMAGE data types are not replicated.
- Any Foreign Keys in the database must be labeled as NOT FOR REPLICATION to avoid conflicts.
- Any User Defined Triggers must be labeled as NOT FOR REPLICATION to avoid conflicts.
- IDENTITY columns must be labeled as NOT FOR REPLICATION.
- IDENTITY values must be managed on a site-by-site basis.
- There are limits associated with Merge Replication regarding the amount of data that may be replicated, as measured by row and column size.
- TIMESTAMP values are not replicated; they are regenerated.
- Every replicated table must have a UNIQUEIDENTIFIER column defined.
- Data inserted using BULK INSERT commands will not be replicated.
Table 1: Merge Replication Requirements and Limitations

In my experience there are several scenarios where data replication is commonly utilized: data synchronization to a reporting server; data synchronization to one or more replicas in support of business needs (e.g. off-loading data processing, stand-by server); and data synchronization of remote semi-connected replicas. The Replication subsystem defines specific criteria for the implementation of each replication mechanism, some of which require schema modifications; thus, many considerations must be weighed in designing a replication topology.

Criteria for Selecting a Replication Mechanism

Distributing data across SQL Server instances in any enterprise environment requires careful consideration of a myriad of issues. The Replication subsystem provides a range of options as well as customizable configurations to satisfy most data distribution needs; however, given the available options, the selection of a specific replication mechanism may be daunting for the uninitiated. In an attempt to provide a starting point in the selection process I have developed the decision flow charts in Figures 1, 2, and 3 below. Please note that these flow charts represent high-level considerations and should not be interpreted as definitive; they are a starting point for a more thorough analysis.

Figure 1: Merge Replication Selection Criteria (flow chart ending at Merge Replication; decision questions with their branch labels)
1) What is the data replication latency? (Real-time or near-time data)
2) What are the DBMS types? (Source and replicas are SQL Server)
3) Network connectivity? (Replicas constantly connected to source)
4) What is the data change frequency? (High volume of data)
5) How much data will be replicated per cycle? (Large quantity of data to be distributed)
6) Will replicas need to update data? (Data updates at replicas)
7) Can the data schema be augmented? (Schema possible)
8) Use of resolvers for conflict resolution? (Data conflicts acceptable)

Figure 2: Snapshot Replication Selection Criteria (flow chart ending at Snapshot Replication; decision questions with their branch labels)
1) What is the data replication latency? (Real-time or near-time data)
2) What are the DBMS types? (Source and replicas are SQL Server)
3) Network connectivity? (Replicas constantly connected to source)
4) What is the data change frequency? (High volume of data)
5) How much data will be replicated per cycle? (Large quantity of data to be distributed)
6) Will replicas need to update data? (Data updates at replicas)
7) Can the data schema be augmented? (Schema possible)
8) Use of resolvers for conflict resolution? (Data conflicts acceptable)

Figure 3: Transactional Replication Selection Criteria (flow chart ending at Transactional Replication or Peer-to-Peer Transactional Replication; decision questions with their branch labels)
1) What is the data replication latency? (Real-time or near-time data)
2) What are the DBMS types? (Source and replicas are SQL Server)
3) Network connectivity? (Replicas constantly connected to source)
4) What is the data change frequency? (High volume of data)
5) How much data will be replicated per cycle? (Large quantity of data to be distributed)
6) Will replicas need to update data? (Data updates at replicas)
7) Can the data schema be augmented? (Schema possible)
8) Use of resolvers for conflict resolution? (Data conflicts acceptable)
Additional branch label: Not discussed in this paper
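As a rough illustration of how the selection questions might be combined, the following Python sketch encodes only the high-level rules stated in the text of this paper: Merge replication when replicas update data or are intermittently connected, Transactional replication for real-time or near-time read-only copies, and Snapshot replication otherwise. The ordering of the checks, the `Requirements` fields, and the `suggest_mechanism` function are assumptions made for illustration; they do not reproduce the exact branches of Figures 1, 2, and 3, and Peer-to-Peer Transactional replication is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical answers to the selection questions (names are assumptions)."""
    all_sql_server: bool        # source and replicas are all SQL Server
    needs_near_real_time: bool  # real-time or near-time latency is required
    always_connected: bool      # replicas are constantly connected to the source
    replicas_update_data: bool  # replicas modify their local copy of the data
    schema_can_change: bool     # the schema may be augmented for replication
    conflicts_acceptable: bool  # update conflicts may be settled by resolvers


def suggest_mechanism(req: Requirements) -> str:
    """Return a starting-point suggestion, not a definitive answer."""
    if not req.all_sql_server:
        # Heterogeneous topologies are outside the scope of the paper.
        return "not discussed in this paper"
    if req.replicas_update_data or not req.always_connected:
        # Merge needs schema augmentation and tolerance for resolved conflicts.
        if req.schema_can_change and req.conflicts_acceptable:
            return "merge"
        return "needs further analysis"
    if req.needs_near_real_time:
        return "transactional"
    # Read-only, connected replicas that tolerate point-in-time data.
    return "snapshot"
```

For example, a constantly connected, read-only reporting server that needs near-time data yields "transactional" under these assumptions, while the same server refreshed only during off-peak hours yields "snapshot".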

Summary

Replication is a powerful mechanism supporting the distribution of data throughout an enterprise. Significant research has led the industry to adopt lazy or asynchronous replication implementations in order to support the high levels of performance necessary in the real world. The benefits of lazy replication come at the cost of data consistency; however, with replication properly configured and administered, the data consistency issues may be mitigated, thereby allowing high-performance data distribution.

Commercial replication systems have evolved from the research community to satisfy the requirements of business by providing abstractions from the complexities of the underlying replication mechanisms. Microsoft, in its mission to provide easy-to-use tools for data management, has incorporated the technologies necessary for distributing data within the SQL Server product line. The Replication subsystem, included with varying feature sets in every edition of SQL Server, provides the ability to natively distribute data throughout the enterprise with no additional licensing fees.

The Replication subsystem in SQL Server allows for implementations supporting a variety of data distribution needs while minimizing implementation and administrative requirements. Transactional replication supports real-time data synchronization with low overhead for offline processing, standby servers, or reporting servers. Merge replication supports data synchronization across intermittently connected replicas, while Snapshot replication is most useful for distributing static data representing a point in time from source to replicas during off-peak hours.

About the Author

Ron is a Senior DBA who specializes in performance optimization, replication, and security.

Trademark Acknowledgements

Microsoft and SQL Server are trademarks or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Bibliography

Bettina Kemme, G. A. (2010). Database Replication: a Tale of Research across Communities. Proceedings of the VLDB Endowment.

Christian Plattner, G. A. (2004). Ganymed: Scalable Replication for Transactional Web Applications. Middleware '04 Proceedings of the 5th ACM/IFIP/USENIX international conference on Middleware.

F. D. Muñoz-Escoí, H. D. (2009). A Survey of Approaches to Database Replication. In J. H. Viviana E. Ferraggine, Handbook of Research on Innovations in Database Technologies and Applications: Current and Future Trends (2 Volumes) (pp. 762-768). IGI Global.

Jim Gray, P. H. (1996). The Dangers of Replication and a Solution. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data.

Konstantinos Krikellas, S. E. (2010). Strongly consistent replication for a bargain. 26th IEEE International Conference on Data Engineering.