The Curious Case of Database Deduplication. PRESENTATION TITLE GOES HERE Gurmeet Goindi Oracle



Similar documents
Deduplication Demystified: How to determine the right approach for your business

Data Deduplication HTBackup

DeltaStor Data Deduplication: A Technical Review

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May Copyright 2014 Permabit Technology Corporation

Data Deduplication and Tivoli Storage Manager

Data Deduplication in Tivoli Storage Manager. Andrzej Bugowski Spała

Demystifying Deduplication for Backup with the Dell DR4000

Barracuda Backup Deduplication. White Paper

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures

Data Deduplication: An Essential Component of your Data Protection Strategy

A Forensic Investigation of PL/SQL Injection Attacks in Oracle 1 st July 2010 David Litchfield

June Blade.org 2009 ALL RIGHTS RESERVED

Backup Software Data Deduplication: What you need to know. Presented by W. Curtis Preston Executive Editor & Independent Backup Expert

Turnkey Deduplication Solution for the Enterprise

Protect Data... in the Cloud

3Gen Data Deduplication Technical

Data Deduplication and Tivoli Storage Manager

Cloud-integrated Enterprise Storage. Cloud-integrated Storage What & Why. Marc Farley

Avamar. Technology Overview

LDA, the new family of Lortu Data Appliances

STORAGE. Buying Guide: TARGET DATA DEDUPLICATION BACKUP SYSTEMS. inside

UNDERSTANDING DATA DEDUPLICATION. Tom Sas Hewlett-Packard

Cloud-integrated Storage What & Why

Availability Digest. Data Deduplication February 2011

Backup and Recovery 1

Data Compression and Deduplication. LOC Cisco Systems, Inc. All rights reserved.

EMC Data Domain Boost for Oracle Recovery Manager (RMAN)

Redefining Oracle Database Management

Understanding EMC Avamar with EMC Data Protection Advisor

Efficient Backup with Data Deduplication Which Strategy is Right for You?

Real-time Compression: Achieving storage efficiency throughout the data lifecycle

ACCELERATING SQL SERVER WITH XTREMIO

Symantec NetBackup PureDisk Optimizing Backups with Deduplication for Remote Offices, Data Center and Virtual Machines

Cost Effective Backup with Deduplication. Copyright 2009 EMC Corporation. All rights reserved.

Speeding Up Cloud/Server Applications Using Flash Memory

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression

Trends in Enterprise Backup Deduplication

89 Fifth Avenue, 7th Floor. New York, NY White Paper. HP 3PAR Thin Deduplication: A Competitive Comparison

ADVANCED DEDUPLICATION CONCEPTS. Larry Freeman, NetApp Inc Tom Pearce, Four-Colour IT Solutions

UNDERSTANDING DATA DEDUPLICATION. Jiří Král, ředitel pro technický rozvoj STORYFLEX a.s.

Data De-duplication Methodologies: Comparing ExaGrid s Byte-level Data De-duplication To Block Level Data De-duplication

STORAGE SOURCE DATA DEDUPLICATION PRODUCTS. Buying Guide: inside

Understanding EMC Avamar with EMC Data Protection Advisor

WHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper

FAST 11. Yongseok Oh University of Seoul. Mobile Embedded System Laboratory

Oracle Database 10g: Backup and Recovery 1-2

09'Linux Plumbers Conference

ClearPath Storage Update Data Domain on ClearPath MCP

WHITE PAPER. Permabit Albireo Data Optimization Software. Benefits of Albireo for Virtual Servers. January Permabit Technology Corporation

UNDERSTANDING DATA DEDUPLICATION. Thomas Rivera SEPATON

Reference Guide WindSpring Data Management Technology (DMT) Solving Today s Storage Optimization Challenges

An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space

DPAD Introduction. EMC Data Protection and Availability Division. Copyright 2011 EMC Corporation. All rights reserved.

DEDUPLICATION NOW AND WHERE IT S HEADING. Lauren Whitehouse Senior Analyst, Enterprise Strategy Group

Data Deduplication Background: A Technical White Paper

Tiered Data Protection Strategy Data Deduplication. Thomas Störr Sales Director Central Europe November 8, 2007

Evaluation Guide. Software vs. Appliance Deduplication

How To Make A Backup System More Efficient

Understanding Enterprise NAS

Future-Proofed Backup For A Virtualized World!

Deduplication and Beyond: Optimizing Performance for Backup and Recovery

WHITE PAPER Data Deduplication for Backup: Accelerating Efficiency and Driving Down IT Costs

All-Flash Arrays: Not Just for the Top Tier Anymore

Protect Microsoft Exchange databases, achieve long-term data retention

Deploying De-Duplication on Ext4 File System

WHITE PAPER. Storage Savings Analysis: Storage Savings with Deduplication and Acronis Backup & Recovery 10

Presentation Identifier Goes Here 1

EMC VNX2 Deduplication and Compression

ITCertMaster. Safe, simple and fast. 100% Pass guarantee! IT Certification Guaranteed, The Easy Way!

Redefining Microsoft SQL Server Data Management. PAS Specification

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

EMC Data Domain Boost for Oracle Recovery Manager (RMAN)

EMC XTREMIO EXECUTIVE OVERVIEW

Optimizing Backup and Data Protection in Virtualized Environments. January 2009

FLASH 15 MINUTE GUIDE DELIVER MORE VALUE AT LOWER COST WITH XTREMIO ALL- FLASH ARRAY Unparal eled performance with in- line data services al the time

Accelerating Applications and File Systems with Solid State Storage. Jacob Farmer, Cambridge Computer

Aspirus Enterprise Backup Assessment and Implementation of Avamar and NetWorker

Data Backup and Archiving with Enterprise Storage Systems

EMC BACKUP MEETS BIG DATA

Upgrade to Oracle E-Business Suite R12 While Controlling the Impact of Data Growth WHITE PAPER

Inline Deduplication

An Oracle White Paper January Advanced Compression with Oracle Database 11g

Backup and Restore Back to Basics with SQL LiteSpeed

Data Reduction Methodologies: Comparing ExaGrid s Byte-Level-Delta Data Reduction to Data De-duplication. February 2007

Data Domain Overview. Jason Schaaf Senior Account Executive. Troy Schuler Systems Engineer. Copyright 2009 EMC Corporation. All rights reserved.

Business Benefits of Data Footprint Reduction

CONSOLIDATING MICROSOFT SQL SERVER OLTP WORKLOADS ON THE EMC XtremIO ALL FLASH ARRAY

Data Reduction: Deduplication and Compression. Danny Harnik IBM Haifa Research Labs

Configuring Backup Settings. Copyright 2009, Oracle. All rights reserved.

Unitrends Recovery-Series: Addressing Enterprise-Class Data Protection

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Transcription:

The Curious Case of Database Deduplication PRESENTATION TITLE GOES HERE Gurmeet Goindi Oracle

Agenda Introduction Deduplication Databases and Deduplication All Flash Arrays and Deduplication 2

Quick Show of Hands How many Storage administrators? How many System administrators? How many Network administrators? How many Database administrators? Neither? Example, Managers? 3

Enterprise Data Growth Enterprise data is growing 10-20% per year Some industries upwards of 50%+ data growth / annually Increase in production data is magnified in the backup storage infrastructure More production data expands exponentially into the backup storage infrastructure Year over year data growth! 4

Exponential Growth in Backup Infrastructure Data Growth Multiples in Backup Copies: Production Data Size Last Year: This Year: One Month of Backups Backup retention periods remain constant based on business needs Often expanding due to regulatory requirements Storage budget doesn t scale with data growth Data center space and power constraints hamper storage scale-outs 5

Backup Storage Optimization Strategies Backup Compression Software (e.g. native utilities) Hardware (e.g. tape or disk drives) Deduplication Source Side (software) Purpose Built Backup Appliances Hybrid Approach 6

Agenda Introduction Deduplication Concepts Databases and Deduplication All Flash Arrays and Deduplication 7

Deduplication 101 Deduplication: Replace redundant backup data with pointers to shared copy Redundant data identified by unique algorithms Lowers storage costs by reducing capacity requirements Can be done at source, inline or post-process Mileage varies for deduplication of database blocks 8

Deduplication Internals Is the Fingerprint Unique? Yes Write to storage, update the metadata Backup Data Deduplication Engine Hash / Fingerprint No Update the metadata Matches Duplicate Found: Bit Compare with existing block Doesn t Match Write to storage, update the metadata 9

Deduplication & Compression Compression is about shrinking the dataset Encoding information using fewer bits than the original representation Examples: Lempel-Ziv (LZ) compression Deduplication is about finding unique datasets and reducing redundant dataset Deduplicated dataset can be further compressed Compressed dataset will yield poor and unpredictable deduplication ratios 10

Source Side Deduplication Deduplication software is installed on each host Software reads the data to be backed up and looks for duplicates Transmits the deduped backup to a storage appliance, for most cases Performance impact: Host : High Network: Low Usually the backups need to be reconstituted when copied to tape 11

Target Side Deduplication Deduplication occurs at backup/storage appliance Two flavors: Inline Backup is deduplicated prior to being written to disk Post-process Backup is written to a staging area, then deduplicated Performance Impact: Host : High Network: High Backups must be reconstituted when copied to tape 12

Agenda Introduction Deduplication Concepts Databases and Deduplication All Flash Arrays and Deduplication 13

Databases And Deduplication Leading vendor 30x times deduplication for File Systems 6x for Relational Databases Emerging all Flash array vendor 10x saving for file systems 2.1x for Relational Databases Most vendors recommend full backups to achieve better deduplication ratios What drives low deduplication ratios for databases? 14

Relational Databases Quick Intro Relational database: repository and management of data organized in a relational model * Database Management System (DBMS): the software infrastructure to maintain these key characteristics: Structures: Well-defined objects to store data Operations: Well-defined actions to access & update data & structure Integrity rules: Well-defined rules to govern operations Data organized in a table: a two-dimensional relation with rows (tuples) and columns (attributes) Each row in a table has the same set of columns E.g. employee, department, salary tables 15

Databases and Storage Database Objects Reads and Writes Meta data associated with these Objects Database Logging Mechanism to deliver transactional consistency Critical functionality for database operation Availability features 16

Database Block Format Database manages its logical storage in data blocks Minimum unit of I/O A data block has a welldefined structure Block header is kept consistent with payload, rows do not overlap, metadata in its place More than simple bits: can always verify logical consistency 17

Database Block Format.. Contd Even the row (user) data is very carefully formatted within a database data block Blocks have a fixed size (usually 8K) All blocks are unique Update to a block will result in metadata updates to adjacent blocks as well 18

Understanding Database Backup Options Image copy: Full Backups Fairly efficient for restore Inefficient use of Server, Network and Storage resources Might Provide Better deduplication ratios Application aware intelligent image copy Better storage and network resource utilization, but not for server workload Faster restore times compared to incremental copies Incremental Backups Transmit only changes on a periodic basis Efficient use to storage, network and server resources Time to recover depends on the number of incremental backups needed to be restored Faster than any full backup strategy, reduces backup window Most successful Database backup strategies include a combination of the two approaches to meet your Recovery Objectives 19

Incremental Backups And Deduplication Changed data 1 2 3 4 5 7 6 8 B 9 A C 9 1 2 5 6 B 3 7 4 A C 8 Full Backup Sunday Incremental Monday Incremental Tuesday Incremental Wednesday Full Backup Thursday Blocks Transmitted All 1 2 3 4 5 6 7 8 9 A B C All A database aware incremental backup only has unique blocks Deduplication ratios are minimal / unpredictable Limited to duplicates in the user data within the block Low impact on database server and network 20

Full Backups And Deduplication Changed data 1 2 3 4 5 7 6 8 B 9 A C 9 1 2 5 6 B 3 7 4 A C 8 Full Backup Sunday Full Backup Monday Full Backup Tuesday Full Backup Wednesday Full Backup Thursday Blocks Transmitted All All All All All Huge number of redundant blocks transmitted In the above example: almost 80% of the traffic is redundant With a typical 5%-10% churn rate nearly 90-95% traffic will be redundant Hence deduplication ratios are high Significant impact on database server and network 21

Database Compression and Deduplication Several database integrated compression techniques: OLTP data compression on user data Columnar compression Backup compression Most will yield unpredictable deduplication performances on the backup stream Similar implications with encryption as well 22

What About Database Logs? Fundamental structure of the database: transaction logs Most crucial structure for data recovery Contents: change data, undo data, schema / object management statements, etc. Well-defined data fields, relationships Almost zero scope for deduplication REDO RECORD - Thread:1 RBA: 0x00003e.0000d188.01b0 LEN: 0x07c8 VLD: 0x01 SCN: 0x0000.00d6ca18 SUBSCN: 1 04/30/2007 21:06:42 CHANGE #1 TYP:0 CLS:67 AFN:3 DBA:0x00c1fbb1 OBJ:4294967295 SCN:0x0000.00d6ca15 SEQ: 1 OP:5.2 ktudh redo: slt: 0x0007 sqn: 0x00000000 flg: 0x000a siz: 120 fbi: 0 uba: 0x00c05098.0004.01 pxid: 0x0000.000.00000000 23

To Summarize Is deduplication not the correct design choice for Database backups? Well, it depends Better deduplication ratios possible: With daily full backups May not be practical, based on database size By disabling database compression / encryption Consistent with your security / capacity policies? After customizing backup parameters Need to balance database / dedupe best practices For storage efficiency, do the testing + math Daily incremental backups + native compression VS. Daily full backups + 3rd party dedupe 24

Agenda Introduction Deduplication Concepts Databases and Deduplication All Flash Arrays and Deduplication 25

Deduplication And All Flash Arrays Solid State Drives are still 5-10x more expensive than Hard Disk Drives Economics of SSDs improve as the footprint of the dataset to be stored is reduced Some All Flash Array vendors are leveraging deduplication to increase the storage efficiency of SSDs Deduplication ratios are fairly modest compared to that seen in secondary storage or Purpose Built Backup Appliances 26

Deduplication in All Flash Arrays Technology Enablers Is the Fingerprint Unique? Yes Write to storage, update the metadata Dataset Deduplication Engine Hash / Fingerprint No SSDs accelerate read I/O by many folds thus accelerating bit comparisons and fingerprint lookups Duplicate Found: Bit Compare with existing block Matches Update the metadata Fewer writes result in higher flash endurance Less strong hashing allows the array to leverage in-built CPU hashing functions to generate the fingerprint Doesn t Match Write to storage, update the metadata 27

Databases, Deduplication and All Flash Arrays Most All Flash Arrays use fixed length chunk size By definition each database block update results in a unique block and changes to metadata of adjacent blocks Algorithms that are unaware of database block boundary will generate a unique hash for each update Thus defeating the purpose of deduplication 28

Questions? 29