A Primer on Object Storage, Cloud Storage, and High Capacity File
Presented by: Chris Robertson, Sr. Solution Architect, Cambridge Computer
Copyright 2010-2011, Cambridge Computer Services, Inc. All Rights Reserved
www.cambridgecomputer.com 781-250-3000
About Your Lecturer:
- Chris Robertson
  - SA at Cambridge Computer
  - 25% of my time I do what industry analysts do
  - 75% of my time is client-facing, solving problems and reconciling to budgets
- Cambridge Computer
  - Expertise in storage networking, data protection, and data life cycle management
  - Founded in 1991
  - Based in Boston with regional teams spread around the country
  - Unique business model with no costs or commitments to our clients (ask us how this is possible)
  - Clients of all shapes and sizes
    - Museums, K12, Defense Contractors, Banks, etc.
  - Everyone has data. No one wants to lose it!
A Unique Business Model: Combining the Best of All Worlds...
What is Cloud Storage?
- The Cloud has the same challenges that any other enterprise has
  - The design challenges of cloud storage are relevant to private users with large private data collections.
- Cloud storage has three major incarnations
  - Enterprise storage for applications that are hosted in the cloud
    - Dynamic provisioning of storage with careful attention to balancing capacity and performance.
  - Hosted backups
    - Granular, efficient backups with data automatically stored off site
  - Redundant object storage
    - Geographically dispersed, redundant storage for data that does not change much.
A Typical Cloud Service: 3 Copies of Each Object Stored Somewhere
The Cloud is Accessed Through SOAP/REST Software Interface
[Diagram labels: File or block interface; On-Ramp (dedicated appliance and/or software app); SOAP/REST interface]
Doubling is Serious Business
Traditional Storage Models Don't Scale
- Data accumulates over time
  - If your primary storage capacity doubles, then BOTH the CAPACITY and the SPEED of your backup system must double.
  - Backups take too long. Restores take too long.
- Storage devices become BRITTLE as they get bigger and bigger
  - The bigger they are, the harder they fall
- Wholesale data migration between storage devices is impractical.
  - Massive storage systems must allow for in-place upgrades.
Moving a PB is Heavy Lifting
Data Rate    | Example                                                        | Total Time (Approximate)
140 MB/sec   | LTO-5 tape drive at full tilt without factoring in compression | 82.5 days
1 GB/sec     | A beefy Virtual Tape Library; a dedicated 10Gb Ethernet        | 11 days
1.5 Mb/sec   | A dedicated T-1                                                | 176 years
156 Mb/sec   | An OC3                                                         | 640 days
2488 Mb/sec  | An OC48                                                        | 40 days
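
The transfer times on this slide come from simple division: data size over sustained data rate. The sketch below redoes that arithmetic in Python. It assumes 1 PB = 10**15 bytes, decimal rate units, and a link running at its full nominal rate with no protocol overhead; none of these assumptions are stated on the slide, so the results land near the slide's approximate figures rather than matching them exactly. The function names are illustrative only.

```python
# Rough transfer-time arithmetic for one petabyte.
# Assumptions (not from the slide): 1 PB = 10**15 bytes, decimal rate units,
# and a link that sustains its full nominal rate with no overhead.

PETABYTE_BYTES = 10**15
SECONDS_PER_DAY = 86_400


def transfer_days(rate_bytes_per_sec: float, size_bytes: int = PETABYTE_BYTES) -> float:
    """Days needed to move size_bytes at a sustained byte rate."""
    return size_bytes / rate_bytes_per_sec / SECONDS_PER_DAY


def mbit(rate_megabits_per_sec: float) -> float:
    """Convert megabits per second to bytes per second."""
    return rate_megabits_per_sec * 1_000_000 / 8


rates = {
    "LTO-5 tape drive (140 MB/sec)": 140e6,
    "VTL or 10Gb Ethernet (1 GB/sec)": 1e9,
    "T-1 (1.5 Mb/sec)": mbit(1.5),
    "OC3 (156 Mb/sec)": mbit(156),
    "OC48 (2488 Mb/sec)": mbit(2488),
}

for name, rate in rates.items():
    days = transfer_days(rate)
    print(f"{name}: {days:,.0f} days ({days / 365:.1f} years)")
```

Whatever the exact unit conventions, the ordering is the point: a single tape drive needs months, and a T-1 needs well over a century.
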
What is Wrong with RAID?
Bigger Hard Drives: Friend or Foe?
- The Good News
  - As drives grow bigger we can achieve more capacity with fewer devices
  - Fewer devices = higher density, lower power consumption, fewer device failures
- The Bad News
  - MTBF not growing as fast
  - Bandwidth into device not growing as fast
- Consequences
  - Unreliability (per bit) growing
  - Accessibility of data (per bit) shrinking
  - Drive rebuild times are longer, which increases overall risk of data loss
  - Rebuilding failed drives has a heavier impact on performance
RAID Rebuilds Take Too Long
- RAID 5 rebuilds take too long
  - On the order of 36 hours per TB
  - A 4TB drive could take a week to rebuild
- RAID 6 (double parity) offers some protection
  - But what happens when we have 8TB drives?
- The more stuff you have, the higher the chance of failures.
  - If you have 1PB or more, something will always be broken
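
The "week to rebuild" estimate follows directly from the 36-hours-per-TB rule of thumb. A minimal sketch of that arithmetic, assuming rebuild time scales linearly with drive capacity (a simplification that the slide does not state; real rebuild times also depend on controller load and array activity):

```python
# Rebuild-time estimate from the slide's rule of thumb.
# Assumption: rebuild time grows linearly with drive capacity.

HOURS_PER_TB = 36  # "on the order of 36 hours per TB" (from the slide)


def rebuild_days(drive_tb: float, hours_per_tb: float = HOURS_PER_TB) -> float:
    """Estimated days to rebuild one failed drive of the given size."""
    return drive_tb * hours_per_tb / 24


for size_tb in (1, 2, 4, 8):
    print(f"{size_tb} TB drive: about {rebuild_days(size_tb):.1f} days to rebuild")
```

Under this assumption a 4TB drive takes about 6 days and an 8TB drive about 12 days, which is a long window in which a second failure can occur.
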
Redundancy Between Cabinets: Can You Have Too Much Redundancy?
- Is this really a good idea?
  - How long will it take to re-mirror a 14TB RAID 6 stripe?
- Is there a better way to protect against a device failure?
  - Replication?
  - Backup?
  - Mirroring at a different level of abstraction?
How Big is the Building Block?
What Are You Building?           | What Size Building Block?
An outhouse?                     | Brick
The foundation for a new house?  | Cinder Block
A pyramid?                       | Boulder
A parking garage?                | Grains of Sand (Concrete)
Object Storage: More than Just the Cloud
Objects Represent a Different Way to Address Data
- Block: Blocks are addressed by Device ID and sequential block number.
- File: Files are addressed by UNC paths: \\MyServer\MyFolder\MyFile.doc
- Object: Objects are addressed by an ID that is unique to the storage system.
  - Sequentially assigned number
  - Randomly assigned number
  - A hash derived as a function of the object's content
  - A combination of things
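
The four ways of assigning an object ID listed on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the 12-digit counter format, the use of UUIDs for random IDs, and SHA-256 as the hash are all assumptions, not details of any particular storage product.

```python
import hashlib
import itertools
import uuid

# Hypothetical ID generators, one per scheme named on the slide.
_counter = itertools.count(1)


def sequential_id() -> str:
    """Sequentially assigned number, zero-padded for stable sorting."""
    return f"{next(_counter):012d}"


def random_id() -> str:
    """Randomly assigned number (here a random UUID, as 32 hex characters)."""
    return uuid.uuid4().hex


def content_id(data: bytes) -> str:
    """Hash derived from the object's content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def combined_id(data: bytes) -> str:
    """A combination: a sequence number plus a short content-hash prefix."""
    return f"{sequential_id()}-{content_id(data)[:16]}"


payload = b"quarterly report"
print(sequential_id())
print(random_id())
print(content_id(payload))
print(combined_id(payload))
```

Only the content-derived ID is reproducible from the data alone; the sequential and random schemes need the storage system to remember which ID it handed out.
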
What is an Object?
- An object is a chunk of data that can be individually addressed and manipulated
  - A file is a chunk of data
  - A zip file containing many files is a chunk of data
  - A file can be made up of several chunks of data
  - A block is a chunk of data
  - A volume (a range of blocks) is made up of chunks of data
  - Pages, extents, chunks, chunklets are objects consisting of multiple blocks
- Email?
  - An email message is a chunk of data
  - An email attachment is a chunk of data
  - An email message along with its attachments could be treated as a single chunk of data.
- Often objects have associated metadata
  - Descriptive information or tags
  - Provenance
Content Addressing
- Content addressing calculates a hash of the data that makes up the object and uses the hash as an address
- Locality independence
  - An object can live in multiple locations for:
    - Redundancy
    - Parallelism
    - Local processing affinity
- Data integrity
  - The object can be compared against its hash for integrity checking
  - If the hash test fails, simply retrieve a copy of the object and repair the corrupt object
- Deduplication
  - Two objects with the same name (that is, the same hash) are actually the same object
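
The three properties on this slide (locality independence, integrity checking with repair, and deduplication) can be shown together in a toy content-addressed store. This is a sketch under stated assumptions, not how any specific product works: the `ContentStore` class, its in-memory "devices", the default of three replicas, and the choice of SHA-256 are all hypothetical.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store backed by several in-memory 'devices'."""

    def __init__(self, replicas: int = 3):
        # Each dict stands in for one storage device (an assumption).
        self.devices = [dict() for _ in range(replicas)]

    @staticmethod
    def address(data: bytes) -> str:
        """The object's address is the hash of its content."""
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        addr = self.address(data)
        for device in self.devices:
            # Same content gives the same address, so it is stored only once
            # per device no matter how many times it is written.
            device.setdefault(addr, data)
        return addr

    def get(self, addr: str) -> bytes:
        # Find any copy whose content still hashes to its address.
        good = None
        for device in self.devices:
            copy = device.get(addr)
            if copy is not None and self.address(copy) == addr:
                good = copy
                break
        if good is None:
            raise KeyError(f"no intact copy of {addr}")
        # Repair corrupt or missing copies from the intact one.
        for device in self.devices:
            if device.get(addr) != good:
                device[addr] = good
        return good


store = ContentStore()
addr = store.put(b"hello, object storage")
print(store.put(b"hello, object storage") == addr)  # same content, same address
store.devices[0][addr] = b"bit rot"                 # simulate silent corruption
print(store.get(addr))                              # served from an intact replica
print(store.devices[0][addr] == b"hello, object storage")  # corrupt copy repaired
```

Because the address is derived from the content, any device can verify any copy without consulting a central index, which is what makes the repair step so simple.
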
Self Healing and Data Protection in Object Stores
Basic Object-Level Redundancy: An Alternative to RAID and Mirroring
Redundant Objects Propagate on Device Failure
Object Mirroring Across a WAN
Erasure-Coded Data Protection: An Alternative to Parity-Based RAID
You Can Lose X% of Your Storage Without Losing Data
Some Real-World Examples of Object-Based Storage
Splitting SAN I/O into a Block Stream and an Object Stream
Object-Based File System with Erasure Coding and Global Dedupe
SharePoint with External Blob Storage
[Diagram label: Gateway of some sort]
Shared File System Leveraging a Cloud-based Object Store
Object-Based Archive File System: Automatic Backup to Tape
The Mwah Hah Hah Plan to Conquer the World
Summary of What We Have Today
- Application software that manages files on CIFS and NFS volumes for a single location
- Out of band; respects ACLs and UIDs/GUIDs
- Basic support for cloud stores (S3) and object stores
- Key-value metadata
- MySQL back end
- Support for 500K to 1B files
- Admin GUI
- User GUI
- REST-based API
- Multi-threaded crawler
- Policy-based multi-threaded data mover
- Backup copies with versioning
- Reporting with duplicate file detection