Alternatives to Big Backup


Alternatives to Big Backup: Life Cycle Management, Object-Based Storage, and Self-Protecting Storage Systems
Presented by: Chris Robertson, Solution Architect, Cambridge Computer
Copyright 2010-2011, Cambridge Computer Services, Inc. All Rights Reserved. www.cambridgecomputer.com 781-250-3000

Who is Cambridge Computer?
Chris Robertson, Solution Architect at Cambridge Computer, crobertson@cambridgecomputer.com
Cambridge Computer:
- Expertise in storage networking, data protection, and data life cycle management
- Founded in 1991
- Based in Boston with regional teams spread around the country
- Unique business model with no costs or commitments to our clients (ask us how this is possible)
- Clients of all shapes and sizes: museums, K-12, defense contractors, banks, etc.
Everyone has data. No one wants to lose it!

The Futility of Traditional Backups
- Data accumulates over time. If your primary storage capacity doubles, then BOTH the CAPACITY and the SPEED of your backup system must double (a rough back-of-the-envelope calculation follows below).
- Storage devices become BRITTLE as they get bigger and bigger; the bigger they are, the harder they fall.
- As we move away from tape-based backup, we rely on increasingly larger target storage devices: targets have to hold backup data for multiple primary storage systems, and targets have to retain previous versions of data.
- Yes, deduplication is very helpful, but we then run into scalability and cost issues.
- Policies are usually ill-defined and religion gets in the way.
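The doubling claim above is simple arithmetic; here is a hedged back-of-the-envelope sketch with purely illustrative numbers (a hypothetical 100 TB primary store and an 8-hour nightly backup window; neither figure comes from the slides):

```python
# Back-of-the-envelope backup-window math (illustrative numbers only).

def required_throughput_gbps(capacity_tb: float, window_hours: float) -> float:
    """Throughput (GB/s) needed to copy capacity_tb within window_hours."""
    bytes_total = capacity_tb * 1e12
    return bytes_total / (window_hours * 3600) / 1e9

for capacity in (100, 200, 400):  # primary capacity doubling twice (TB)
    need = required_throughput_gbps(capacity, window_hours=8)
    print(f"{capacity:4d} TB primary -> full backup in 8 h needs {need:.2f} GB/s")

# 100 TB primary -> full backup in 8 h needs 3.47 GB/s
# 200 TB primary -> full backup in 8 h needs 6.94 GB/s
# 400 TB primary -> full backup in 8 h needs 13.89 GB/s
```

Holding the window fixed while capacity doubles doubles the sustained throughput the backup target, network, and media must deliver, which is the speaker's point.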

Doubling is Serious Business

Where Will the Solution Lie? Using the right tool for the right job:
- Smarter backup software that captures changes more frequently and more granularly (outside the scope of today's talk: incremental forever, synthetic fulls, CDP, snapshots, replication)
- Next-generation storage systems based on object-based algorithms
- Self-protecting storage systems
- More sophisticated and more scalable backup targets
- Our ability to separate active data from inactive data: store inactive data on self-protecting storage devices and free up resources to better manage active data

What is Wrong with RAID?

Bigger Hard Drives: Friend or Foe?
The Good News: as drives grow bigger we can achieve more capacity with fewer devices. Fewer devices = higher density, lower power consumption, fewer device failures.
The Bad News: MTBF is not growing as fast, and bandwidth into the device is not growing as fast.
Consequences:
- Unreliability (per bit) is growing
- Accessibility of data (per bit) is shrinking
- Drive rebuild times are longer, which increases the overall risk of data loss
- Rebuilding failed drives has a heavier impact on performance

RAID Rebuilds Take Too Long
- RAID 5 rebuilds take too long: on the order of 36 hours per TB, so a 4TB drive could take roughly a week to rebuild (rough arithmetic below)
- RAID 6 (double parity) offers some protection, but what happens when we have 8TB drives?
- The more stuff you have, the higher the chance of failures. If you have 1PB or more, something will always be broken
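A minimal sketch of that arithmetic, taking the slide's rough 36 hours/TB figure at face value (real rebuild rates vary widely with controller, array load, and drive generation):

```python
# Rough rebuild-time arithmetic using the slide's ~36 hours/TB figure.
HOURS_PER_TB = 36  # assumption taken from the slide; actual rates vary widely

for drive_tb in (1, 4, 8):
    hours = drive_tb * HOURS_PER_TB
    print(f"{drive_tb} TB drive -> ~{hours} h rebuild (~{hours / 24:.1f} days)")

# 1 TB drive -> ~36 h rebuild (~1.5 days)
# 4 TB drive -> ~144 h rebuild (~6.0 days)   (roughly a week, as the slide says)
# 8 TB drive -> ~288 h rebuild (~12.0 days)
```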

Redundancy Between Cabinets: Can You Have Too Much Redundancy?
- Is this really a good idea? How long will it take to re-mirror a 14TB RAID 6 stripe?
- Is there a better way to protect against a device failure? Replication? Backup? Mirroring at a different level of abstraction?

Big Storage is Fragile
- Storage systems become brittle as they scale up: the FRUs are too big and cumbersome, individual hard drives are too large, and RAID subsystems and disk cabinets are too large
- The more you have, the more likely one is to fail; we need new architectures. The bigger they are, the harder they fall.
- Backup is difficult. Restore is almost impossible.
- If recovery time is important, you have to replicate, and replication is expensive
- Big storage systems need to be self-protecting and self-healing

What Does it Mean to Be Self-Protecting?
- Snapshot and replicate: is it good enough? Can you fail over? Can you fail back?
- What if something breaks other than hardware? File system corruption? User error? Software bug? Sabotage?
- Do you still need a backup?

How Big is the Building Block?
What Are You Building?             What Size Building Block?
An outhouse?                       Brick
The foundation for a new house?    Cinder Block
A pyramid?                         Boulder
A parking garage?                  Grains of Sand (Concrete)

Object Storage More than Just the Cloud

Objects Represent a Different Way to Address Data
- Block: blocks are addressed by device ID and sequential block number.
- File: files are addressed by paths, e.g. UNC paths: \\MyServer\MyFolder\MyFile.doc
- Object: objects are addressed by an ID that is unique to the storage system:
  - A sequentially assigned number
  - A randomly assigned number
  - A hash derived as a function of the object's content
  - A combination of things
(A short sketch of the three addressing styles follows below.)
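For concreteness, a minimal sketch of the three addressing styles, using invented identifiers that stand in for no particular product:

```python
import hashlib

# Block addressing: device ID plus sequential block number.
block_address = ("disk-0", 1048576)  # (device ID, block number) - illustrative values

# File addressing: a path, e.g. the UNC path from the slide.
file_address = r"\\MyServer\MyFolder\MyFile.doc"

# Object addressing: an ID unique within the store. Here the ID is a hash of the
# object's content (content addressing); it could equally be a sequentially or
# randomly assigned number.
payload = b"the bytes that make up the object"
object_address = hashlib.sha256(payload).hexdigest()

print(block_address, file_address, object_address, sep="\n")
```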

What is an Object?
- An object is a chunk of data that can be individually addressed and manipulated.
- A file is a chunk of data; a zip file containing many files is a chunk of data; a file can also be made up of several chunks of data.
- A block is a chunk of data; a volume (a range of blocks) is made up of chunks of data; pages, extents, chunks, and chunklets are objects consisting of multiple blocks.
- Email? An email message is a chunk of data; an email attachment is a chunk of data; an email message along with its attachments could be treated as a single chunk of data.
- Objects often have associated metadata: descriptive information or tags, provenance.
(An illustrative data structure follows below.)
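As a loose illustration of "an addressable chunk of data plus optional metadata", here is a hypothetical in-memory representation; the field names are invented for this sketch and do not come from the talk:

```python
import hashlib
import time
from dataclasses import dataclass, field

@dataclass
class StorageObject:
    """A chunk of data plus optional descriptive metadata."""
    data: bytes                                   # the opaque payload (file, block, email, ...)
    metadata: dict = field(default_factory=dict)  # tags, provenance, etc.

    @property
    def object_id(self) -> str:
        # One possible ID scheme: a content-derived hash.
        return hashlib.sha256(self.data).hexdigest()

msg = StorageObject(
    data=b"Subject: hello\n\nMessage body plus attachment bytes...",
    metadata={"type": "email+attachments", "ingested": time.strftime("%Y-%m-%d")},
)
print(msg.object_id, msg.metadata)
```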

Object Granularity: Fine-Grained vs. Coarse-Grained Objects
Fine-Grained:
- The object is a portion of a file, akin to a block, but might be variable in size
- Objects are opaque: individually they are just blobs of data
- Very friendly to caching and distribution over a WAN
- Might be friendly to subfile-level deduplication
Coarse-Grained:
- The object is a whole file or some kind of container
- Changes made to the file might generate a whole new object
- Deltas between versions can be stored as objects that reference a parent object
- Coarse-grained objects often have additional properties (metadata) associated with them

Coarse-Grained Objects Can Contain Fine-Grained Objects

Content Addressing
Content addressing calculates a hash of the data that makes up the object and uses the hash as an address.
- Locality independence: an object can live in multiple locations for redundancy, parallelism, or local processing affinity.
- Data integrity: the object can be compared against its hash for integrity checking; if the hash test fails, simply retrieve another copy of the object and repair the corrupt one.
- Deduplication: two objects with the same name are actually the same object.
(A small sketch of these three properties follows below.)
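A minimal sketch of those properties, assuming a toy in-memory store keyed by SHA-256 (not any particular vendor's implementation):

```python
import hashlib

store: dict[str, bytes] = {}  # toy content-addressed store: hash -> object bytes

def put(data: bytes) -> str:
    """Store an object under its content hash; identical objects dedupe for free."""
    object_id = hashlib.sha256(data).hexdigest()
    store[object_id] = data           # writing the same content twice is a no-op
    return object_id

def verify(object_id: str) -> bool:
    """Integrity check: does the stored data still hash to its own address?"""
    return hashlib.sha256(store[object_id]).hexdigest() == object_id

a = put(b"quarterly report")
b = put(b"quarterly report")          # same content -> same ID -> deduplicated
assert a == b and len(store) == 1
assert verify(a)

store[a] = b"quarterly reporT"        # simulate silent corruption
assert not verify(a)                  # detected; repair by fetching another replica
```

Keying the store by the hash is what makes deduplication fall out automatically: writing identical content a second time resolves to the same address.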

Self-Healing and Data Protection in Object Stores

Basic Object-Level Redundancy: An Alternative to RAID and Mirroring

Redundant Objects Propagate on Device Failure
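These three slides are diagrams in the original deck: the store keeps a target number of copies of each object spread across devices, and when a device fails it re-creates the lost copies elsewhere from the surviving replicas. A hypothetical sketch of that repair loop, with all names invented for illustration:

```python
# Hypothetical sketch: restore the replica count after a device failure.
REPLICAS = 2  # target number of copies per object

placement = {                     # object ID -> devices currently holding a copy
    "obj-1": {"devA", "devB"},
    "obj-2": {"devB", "devC"},
}
healthy = {"devA", "devB", "devC", "devD"}

def handle_device_failure(failed: str) -> None:
    healthy.discard(failed)
    for obj, devices in placement.items():
        devices.discard(failed)
        while len(devices) < REPLICAS:
            target = next(d for d in sorted(healthy) if d not in devices)
            devices.add(target)   # in a real system: copy the object's bytes here
            print(f"re-replicated {obj} to {target}")

handle_device_failure("devB")     # obj-1 and obj-2 each regain a second copy
```

The point of the diagram is that repair works object by object from surviving copies, rather than reconstructing an entire failed drive the way a RAID rebuild does.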

Object Mirroring Across a WAN

Erasure-Coded Data Protection: An Alternative to Parity-Based RAID

You Can Lose X% of Your Storage Without Losing Data
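The erasure-coding slides are likewise diagrams in the deck. The arithmetic behind "you can lose X% of your storage" is straightforward: a k-of-n code cuts data into k fragments, expands them to n, and can rebuild from any k survivors, so up to (n - k) of the n fragments can be lost. A small illustrative calculation (the parameters are examples, not figures from the talk):

```python
# Overhead and loss tolerance of a k-of-n erasure code (illustrative parameters).
def erasure_summary(k: int, n: int) -> str:
    overhead = n / k              # raw capacity stored per unit of user data
    tolerable = (n - k) / n       # fraction of fragments that can be lost safely
    return (f"{k}-of-{n}: {overhead:.2f}x storage overhead, "
            f"survives losing {n - k} of {n} fragments ({tolerable:.0%})")

print(erasure_summary(10, 16))  # 1.60x overhead, survives 6 of 16 lost fragments (38%)
print(erasure_summary(1, 2))    # plain mirroring in the same terms: 2.00x, 50%
```

Compared with mirroring, a wide erasure code buys similar or better loss tolerance for much less raw capacity, which is why it suits dispersed storage across cabinets or sites.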

Dispersed Storage: Erasure Coded Storage Across the WAN

Some Real-World Examples of Object-Based Storage

A SAN Array Based on an Object Storage Model

Splitting SAN I/O into a Block Stream and an Object Stream

Object-Based File System with Erasure Coding and Global Dedupe

Shared File System Leveraging a Cloud-Based Object Store

Object-Based Archive File System: Automatic Backup to Tape

Object-Based Archival File System Stored Entirely on Tape