Alternatives to Big Backup: Life Cycle Management, Object-Based Storage, and Self-Protecting Storage Systems
Presented by: Chris Robertson, Solution Architect, Cambridge Computer
Copyright 2010-2011, Cambridge Computer Services, Inc. All Rights Reserved
www.cambridgecomputer.com | 781-250-3000
Who is Cambridge Computer?
- Solution Architect at Cambridge Computer: crobertson@cambridgecomputer.com
- Expertise in storage networking, data protection, and data life cycle management
- Founded in 1991
- Based in Boston with regional teams spread around the country
- Unique business model with no costs or commitments to our clients (ask us how this is possible)
- Clients of all shapes and sizes: museums, K-12, defense contractors, banks, etc.
- Everyone has data. No one wants to lose it!
The Futility of Traditional Backups
- Data accumulates over time.
- If your primary storage capacity doubles, then BOTH the CAPACITY and the SPEED of your backup system must double (a back-of-the-envelope sketch follows this list).
- Storage devices become BRITTLE as they get bigger and bigger: the bigger they are, the harder they fall.
- As we move away from tape-based backup, we rely on increasingly large target storage devices:
  - Targets have to hold backup data for multiple primary storage systems.
  - Targets have to retain previous versions of data.
- Yes, deduplication is very helpful, but then we run into scalability and cost issues.
- Policies are usually ill-defined, and religion gets in the way.
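A minimal arithmetic sketch of the doubling claim (Python; the 8-hour window and the capacities are assumptions chosen for illustration, not figures from the talk):

```python
# Back-of-the-envelope: to finish inside the same backup window,
# backup throughput must scale 1:1 with primary capacity.

def required_throughput_gbps(capacity_tb: float, window_hours: float) -> float:
    """Sustained GB/s needed to copy capacity_tb within window_hours."""
    return (capacity_tb * 1000) / (window_hours * 3600)

WINDOW = 8.0  # assumed 8-hour nightly backup window
for capacity_tb in (100, 200, 400):  # primary storage doubling twice
    rate = required_throughput_gbps(capacity_tb, WINDOW)
    print(f"{capacity_tb} TB -> {rate:.2f} GB/s sustained")
# 100 TB -> 3.47 GB/s; 200 TB -> 6.94 GB/s; 400 TB -> 13.89 GB/s.
# Every doubling of capacity doubles the required throughput.
```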
Doubling is Serious Business
Where Will the Solution Lie?
- Using the right tool for the right job.
- Smarter backup software that captures changes more frequently and more granularly (outside the scope of today's talk): incremental forever, synthetic fulls, CDP, snapshots, replication.
- Next-generation storage systems based on object-based algorithms:
  - Self-protecting storage systems
  - More sophisticated and more scalable backup targets
- Our ability to separate active data from inactive data:
  - Store inactive data on self-protecting storage devices.
  - Free up resources to better manage active data.
What is Wrong with RAID?
Bigger Hard Drives: Friend or Foe?
The Good News:
- As drives grow bigger, we can achieve more capacity with fewer devices.
- Fewer devices = higher density, lower power consumption, fewer device failures.
The Bad News:
- MTBF is not growing as fast.
- Bandwidth into the device is not growing as fast.
Consequences:
- Unreliability (per bit) is growing.
- Accessibility of data (per bit) is shrinking.
- Drive rebuild times are longer, which increases the overall risk of data loss.
- Rebuilding failed drives has a heavier impact on performance.
RAID Rebuilds Take Too Long
- RAID 5 rebuilds take too long: on the order of 36 hours per TB, so a 4TB drive could take a week to rebuild (see the arithmetic sketch below).
- RAID 6 (double parity) offers some protection, but what happens when we have 8TB drives?
- The more stuff you have, the higher the chance of failures: if you have 1PB or more, something will always be broken.
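A quick arithmetic sketch of the rebuild claim (the 36 hours/TB rate is the slide's figure, used here illustratively, not a vendor specification):

```python
# Rebuild-time arithmetic using the slide's ~36 hours/TB figure
# for a RAID 5 rebuild under production load (illustrative only).

HOURS_PER_TB = 36

for drive_tb in (1, 4, 8):
    hours = drive_tb * HOURS_PER_TB
    print(f"{drive_tb} TB drive: ~{hours} hours (~{hours / 24:.1f} days) to rebuild")
# 4 TB -> ~144 hours (~6 days): nearly a week of degraded, vulnerable
# operation, and the exposure roughly doubles again at 8 TB.
```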
Redundancy Between Cabinets: Can You Have Too Much Redundancy?
- Is this really a good idea? How long will it take to re-mirror a 14TB RAID 6 stripe?
- Is there a better way to protect against a device failure?
  - Replication?
  - Backup?
  - Mirroring at a different level of abstraction?
Big Storage is Fragile
- Storage systems become brittle as they scale up:
  - The FRUs are too big and cumbersome.
  - Individual hard drives are too large.
  - RAID subsystems and disk cabinets are too large.
  - The more you have, the more likely one is to fail. We need new architectures.
- The bigger they are, the harder they fall:
  - Backup is difficult. Restore is almost impossible.
  - If recovery time is important, you have to replicate, and replication is expensive.
- Big storage systems need to be self-protecting and self-healing.
What Does it Mean to Be Self-Protecting?
- Snapshot and replicate: is that good enough?
  - Can you fail over? Can you fail back?
- What if something breaks other than hardware?
  - File system corruption? User error? Software bug? Sabotage?
- Do you still need a backup?
How Big is the Building Block?
What Are You Building?             What Size Building Block?
An outhouse?                       Brick
The foundation for a new house?    Cinder Block
A pyramid?                         Boulder
A parking garage?                  Grains of Sand (Concrete)
Object Storage: More than Just the Cloud
Objects Represent a Different Way to Address Data
- Block: blocks are addressed by device ID and sequential block number.
- File: files are addressed by UNC paths, e.g. \\MyServer\MyFolder\MyFile.doc
- Object: objects are addressed by an ID that is unique to the storage system (a toy sketch of these ID schemes follows):
  - A sequentially assigned number
  - A randomly assigned number
  - A hash derived as a function of the object's content
  - A combination of things
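A minimal, hypothetical sketch of the three ID schemes named above (the function names are invented for illustration, not any vendor's API):

```python
import hashlib
import itertools
import uuid

# Three ways an object store might mint object IDs (illustrative only).
_counter = itertools.count(1)

def sequential_id() -> str:
    """Sequentially assigned ID, like an auto-incrementing key."""
    return f"obj-{next(_counter):08d}"

def random_id() -> str:
    """Randomly assigned ID; uniqueness comes from the size of the ID space."""
    return uuid.uuid4().hex

def content_id(data: bytes) -> str:
    """Content-derived ID: the same bytes always map to the same address."""
    return hashlib.sha256(data).hexdigest()

print(sequential_id())        # obj-00000001
print(random_id())            # e.g. 3f0a9c...
print(content_id(b"hello"))   # always 2cf24dba5fb0a30e...
```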
What is an Object?
- An object is a chunk of data that can be individually addressed and manipulated.
- A file is a chunk of data:
  - A zip file containing many files is a chunk of data.
  - A file can be made up of several chunks of data.
- A block is a chunk of data:
  - A volume (a range of blocks) is made up of chunks of data.
  - Pages, extents, chunks, and chunklets are objects consisting of multiple blocks.
- Email?
  - An email message is a chunk of data.
  - An email attachment is a chunk of data.
  - An email message along with its attachments could be treated as a single chunk of data.
- Objects often have associated metadata: descriptive information or tags, provenance.
Object Granularity: Fine-Grained vs. Coarse-Grained Objects
Fine-Grained:
- Object is a portion of a file, akin to a block, but might be variable in size.
- Objects are opaque: individually they are just blobs of data.
- Very friendly to caching and distribution over a WAN.
- Might be friendly to subfile-level deduplication.
Coarse-Grained:
- Object is a whole file or some kind of container.
- Changes made to the file might generate a whole new object.
- Deltas between versions can be stored as objects that reference a parent object.
- Often have additional properties (metadata) associated with them.
(The two granularities are contrasted in the sketch below.)
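A toy contrast of the two granularities (all names, the 4KB chunk size, and the metadata fields are hypothetical):

```python
import hashlib

def fine_grained(data: bytes, chunk_size: int = 4096) -> list[str]:
    """Split data into fixed-size chunks, each addressed by its own hash.
    Changing one byte only changes the ID of the chunk it lives in."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def coarse_grained(data: bytes, metadata: dict) -> tuple[str, dict]:
    """Treat the whole file as one object; any change yields a new object.
    Metadata (tags, provenance) rides along with the object."""
    return hashlib.sha256(data).hexdigest(), metadata

doc = b"x" * 10_000
print(len(fine_grained(doc)))                        # 3 chunk objects
print(coarse_grained(doc, {"owner": "crobertson"}))  # 1 object + metadata
```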
Coarse-Grained Objects Can Contain Fine-Grained Objects
Content Addressing
- Content addressing calculates a hash of the data that makes up the object and uses the hash as an address (see the sketch below).
- Locality independence: an object can live in multiple locations for:
  - Redundancy
  - Parallelism
  - Local processing affinity
- Data integrity: the object can be compared against its hash for integrity checking.
  - If the hash test fails, simply retrieve another copy of the object and repair the corrupt one.
- Deduplication: two objects with the same address are actually the same object.
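A minimal content-addressed store showing the integrity and deduplication properties (a toy in-memory model, not any product's API):

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: the SHA-256 of the data IS the address."""

    def __init__(self):
        self.objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        addr = hashlib.sha256(data).hexdigest()
        # Deduplication falls out for free: identical data -> identical address.
        self.objects.setdefault(addr, data)
        return addr

    def verify(self, addr: str) -> bool:
        # Integrity check: recompute the hash and compare it to the address.
        return hashlib.sha256(self.objects[addr]).hexdigest() == addr

store = ContentStore()
a = store.put(b"quarterly report")
b = store.put(b"quarterly report")          # duplicate write
assert a == b and len(store.objects) == 1   # deduplicated
assert store.verify(a)                      # passes until the bytes rot
```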
Self-Healing and Data Protection in Object Stores
Basic Object-Level Redundancy: An Alternative to RAID and Mirroring
Redundant Objects Propagate on Device Failure
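A hedged sketch of the idea behind these two slides: keep N copies of every object, and when a device dies, surviving copies propagate to healthy devices. The replication factor and random placement policy here are invented for illustration:

```python
import random

REPLICAS = 3  # assumed replication factor

class ObjectGrid:
    """Toy model: objects replicated across devices, re-replicated on failure."""

    def __init__(self, device_ids):
        self.devices = {d: set() for d in device_ids}  # device -> object IDs

    def put(self, obj_id: str):
        # Place copies on REPLICAS distinct devices (random placement here).
        for dev in random.sample(sorted(self.devices), REPLICAS):
            self.devices[dev].add(obj_id)

    def fail_device(self, dead: str):
        lost = self.devices.pop(dead)
        # Self-healing: each object that lost a copy gets a fresh copy
        # made from a surviving replica onto a device not yet holding it.
        for obj_id in lost:
            holders = [d for d, objs in self.devices.items() if obj_id in objs]
            spares = [d for d in self.devices if obj_id not in self.devices[d]]
            if holders and spares:
                self.devices[random.choice(spares)].add(obj_id)

grid = ObjectGrid(["dev-a", "dev-b", "dev-c", "dev-d"])
grid.put("obj-1")
grid.fail_device("dev-a")  # copies of obj-1 propagate back to full redundancy
```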
Object Mirroring Across a WAN
Erasure-Coded Data Protection: An Alternative to Parity-Based RAID
You Can Lose X% of Your Storage Without Losing Data
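A minimal illustration of the erasure-coding idea. This toy uses a single XOR parity slice, so it tolerates the loss of any one slice; real dispersed stores use Reed-Solomon-style k-of-n codes to tolerate a much larger fraction:

```python
from functools import reduce

def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal slices and append one XOR parity slice.
    Any single slice (data or parity) can then be lost and rebuilt."""
    size = -(-len(data) // k)  # ceiling division
    slices = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), slices)
    return slices + [parity]

def rebuild(slices: list[bytes], lost: int) -> bytes:
    """Recover the lost slice by XOR-ing all surviving slices together."""
    survivors = [s for i, s in enumerate(slices) if i != lost]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), survivors)

pieces = encode(b"object payload!!", k=4)    # 4 data slices + 1 parity slice
assert rebuild(pieces, lost=2) == pieces[2]  # a dead slice is recomputed
# With a k-of-n code (e.g. 10 data + 6 coded slices) you could lose any
# 6 of 16 devices -- 37.5% of your storage -- without losing data.
```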
Dispersed Storage: Erasure-Coded Storage Across the WAN
Some Real-World Examples of Object-Based Storage
A SAN Array Based on an Object Storage Model
Splitting SAN I/O into a Block Stream and an Object Stream
Object-Based File System with Erasure Coding and Global Dedupe
Shared File System Leveraging a Cloud-Based Object Store
Object-Based Archive File System: Automatic Backup to Tape
Object-Based Archival File System Stored Entirely on Tape