Object Oriented Storage and the End of File-Level Restores
Stacy Schwarz-Gardner
Spectra Logic
Agenda
- Data Management Challenges
- Data Protection
- Data Recovery
- Data Archive
- Why Object Based Storage?
- The Best of All Worlds
Data Management Challenges
- Data Chaos / Data Explosion: Unstructured Data, Big Data, Expansive Data
- Backup, Data Protection, Data Recovery
- Storage Heterogeneity / Utilization Efficiency
- Data Center Footprint and Power
- Data Preservation: Indefinite Retention
- Compliance: Managed Retention & Access, WORM, Audit
- Want to build a service or cloud for Data Management Backup
- Want to build a Global Name Space for data accessibility
- Want Standardization and Policies Flexibility
- Cost = Everything comes down to Cost / GB
What is Big Data?
Applications Requiring Intensive Data Mining and Analytics
- Financial Institutions: Tick Data Analysis, Trend Analysis, Risk Assessment, Cross Domain Correlations
- Health Care: Drug Efficacy, Disease Pattern recognition & progression, Fraud detection, Claim automation
- Broadcast, Media, & Entertainment: 4x resolution of HDTV, Deeper in Color, Online Streaming, Increase frame rate per second 2-4X
- Government (FBI, CIA, DOD, DOE, HLS, IRS): Internet threat detection, Pattern recognition, Image analysis, Fraud and waste mitigation
- Consumer / Commercial Space: Social Media and Product Sentiment correlation, Consumer Analysis, Advertising Affinity correlation, Telemetry and Quality analysis
- Mapping and Satellite, Mortgage Data, Check Images, Brokerage Data, Medical Records, Digital Imaging, Genomic, Video Surveillance, Biometrics, Research Data, Digital Media (Videos, Music, Books, Photos), Seismic/Oil & Gas
Data Protection
[Diagram labels: BIG DATA = EXPANSIVE DATA; BACKUP (Data Protection)]
Data Protection
Traditional Backup:
- Designed for Point In Time Recovery
- Designed for Disaster Recovery
- Full Backups
- Incrementals
- Differentials
- Tape as primary media
- Tape as primary offsite strategy
Data Protection
Backup Issues
- Dataset Size (GBs to TBs to PBs)
- Backup Duration (Minutes to Hours to Days)
- Backup Reliability
- % of Stale/Inactive Data
  - Full backups back up a higher % of the same data over and over again
- Windows of Exposure
  - Time to get tapes created and offsite
- Restore Complexity and Duration
- Proprietary
Data Protection
Backup Adaptability
- Enter VTL / Disk based backups
  - Somewhat Faster Backups
  - Virtual Tape Libraries / seamless integration
  - Introduced the concept of deduplication
  - Introduced the concept of backup image replication
- Limited Retention on Disk due to Cost and Scalability
- Longer retention still required tape; use of backup tapes for archive became the norm
- Introduction of WORM and Encryption on Tape
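Deduplication, and the rehydration step that later slides list as a restore overhead, can be illustrated with a minimal sketch. It assumes fixed-size chunks and SHA-256 fingerprints; the names `dedup_store` and `rehydrate` are hypothetical, and real VTL and disk backup products typically use more elaborate chunking and indexing.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def dedup_store(data: bytes, store: dict) -> list:
    """Split data into fixed-size chunks, keep each unique chunk once,
    and return the ordered list of chunk hashes (the "recipe")."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # stored only the first time it is seen
        recipe.append(digest)
    return recipe


def rehydrate(recipe: list, store: dict) -> bytes:
    """Rebuild the original byte stream from its recipe (the restore overhead)."""
    return b"".join(store[d] for d in recipe)


store = {}
monday = b"AAAABBBBCCCC"
tuesday = b"AAAABBBBDDDD"  # only the last chunk changed since Monday
monday_recipe = dedup_store(monday, store)
tuesday_recipe = dedup_store(tuesday, store)

logical_bytes = len(monday) + len(tuesday)          # what the backups represent
unique_bytes = sum(len(c) for c in store.values())  # what is actually kept
```

Two 12-byte backups occupy 16 bytes of chunk storage here, but every restore has to walk the recipe and reassemble the stream, which is the rehydration cost.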
Data Protection
It's All About Data Access:
- Faster Restores, but
  - Still Proprietary
  - Required Rehydration
  - Requires IT Intervention
- It's Still a File-Level Restore
Data Recovery
[Diagram labels: BIG DATA = EXPANSIVE DATA; BACKUP (Data Protection); MIRRORS, SNAPSHOTS, REPLICATION (Data Recovery)]
Data Recovery
- Enter Storage-based Mirrors, Snapshots, Replication
  - Block Based
  - One or More Times Per Day
  - Point in Time Recovery (more Aggressive)
  - Close to Instant Recovery
  - Offsite Protection
Data Recovery
Storage Based Data Recovery Issues
- Storage Vendor Dependent
- Dependent on Primary Volume Integrity
- Short Term Retention Only
- Still need Traditional Backup
- Double or Triple the Storage
- Cost of Storage
- Bandwidth Considerations
- Spinning Disk Durability: how many copies are enough?
Data Recovery
Storage Adaptability
- Introduce Storage Tiering
- Introduce Deduplication
- Introduce Off-Array based Snapshots
- Longer retention still requires backup tapes for use as a long term archive
Data Recovery
It's All About Data Access:
- Snapshots Eliminated Restores: Fast and Easy, but
  - Primarily short retention only
  - Storage vendor specific
- Mirrors and Access to Replicated Copies
  - Complicated and required IT intervention
- Deduplication still requires rehydration overhead
Data Archive
[Diagram labels: BIG DATA = EXPANSIVE DATA; BACKUP (Data Protection); MIRRORS, SNAPSHOTS, REPLICATION (Data Recovery); HSM, ILM, ACTIVE (Data Archive)]
Data Archive
- Let's try Hierarchical Storage Management (HSM) on Open Systems
- Let's call it Information Life Cycle Management (ILM)
- Let's address inactive/stale data and compliance challenges
- We'll move the data and manage the archive
- We'll include a combination of disk and tape to address everyone's needs
Data Archive
- We'll introduce immutability models
  - Write Once, Read Many (WORM)
  - Write Once, Read None (WORN)
  - Write Once, Read Seldom if Ever (WORSE)
- We'll introduce a Storage Platform designed for Compliance
  - Introduction of Content Addressable Storage
- And that's exactly what the industry tried to do
Data Archive
Data Archive Issues
- One ILM Archive Technology did not address all types of data and applications
- Most ILM Archive Platforms were oriented around disk; no native access to tape
- Cost and Complexity
- No Technology Refresh Strategies
- Long Term Preservation
- No Self-Healing
- Limited Scalability
- Archives were vendor proprietary
- Data Mover and Archive had to be the same vendor
- How do you protect the Archive?
Data Archive
It's All About Data Access:
- Somewhat Limited
- Redirection to files had to be via Pointers or Stubs left behind on the primary storage tier
- Access had to be via a Proprietary Archive Application GUI or Client/Application Plug-In
Active Archive Concept
- An Active Archive contains native file format data transparently accessible to end users through a file system interface (CIFS, NFS)
- Active Archiving is not a single product. It's a collaborative solution offered by software and multiple hardware vendors, and in the proposed scenario, also takes advantage of existing equipment
- Vendor Agnostic: consisting of data management software, disk, and tape options
Active Archive Out of Band Conceptual Design
[Diagram labels: Primary Storage; Z: DRIVE; NFS / CIFS; Data Mover, Copy Technologies; Data Management Software; Tape; Secondary Heterogeneous Storage; Data Management Framework; Remote Data Center and/or Cloud]
2012 Cloud Storage Developer Conference. Spectra Logic Inc. All Rights Reserved.
Data Archive
It's All About Data Access:
- An Active Archive provides native access to disk or tape without File-Level restores
- An Active Archive presents an NFS and/or CIFS Gateway for transparent Access
- An Active Archive process is heterogeneous, working with multiple data movers, data management software, and storage/tape vendors
- Designed for Long Term Data Preservation
Data Durability
[Diagram labels: BIG DATA = EXPANSIVE DATA; BACKUP (Data Protection); MIRRORS, SNAPSHOTS, REPLICATION (Data Recovery); HSM, ILM, ACTIVE ARCHIVE (Data Archive); OBJECT BASED STORAGE (Data Durability)]
Why Object Based Storage?
Looming Spinning Disk Challenges:
- RAID Limitations
  - Larger Disk Capacities
  - Larger RAID Sets
  - Increased number of RAID Sets per array due to higher capacity
  - Introduce Longer Rebuild Times (days vs. hours)
  - Higher Potential of Failures
  - Unrecoverable Bit Errors ("bit rot")
  - Data Loss or Corruption during rebuild process
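The "days vs. hours" rebuild claim follows from simple arithmetic: rebuild time grows linearly with drive capacity when the rebuild rate stays flat. The sketch below assumes decimal units and illustrative rebuild rates; the 50 MB/s and 20 MB/s figures are assumptions for the example, not measurements from the talk.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite one full drive at a sustained rebuild rate.

    Uses decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes) and ignores
    controller overhead, so real rebuilds are usually slower.
    """
    seconds = (capacity_tb * 1e12) / (rebuild_mb_per_s * 1e6)
    return seconds / 3600


# Same assumed rate, larger drive: the rebuild window scales with capacity.
small_drive = rebuild_hours(capacity_tb=1, rebuild_mb_per_s=50)
large_drive = rebuild_hours(capacity_tb=4, rebuild_mb_per_s=50)

# A busy array usually throttles rebuild traffic (assumed 20 MB/s here),
# which pushes a 4 TB rebuild past two days.
large_drive_under_load = rebuild_hours(capacity_tb=4, rebuild_mb_per_s=20)
```

The longer the rebuild window, the longer the RAID set runs with reduced redundancy, which is why the slide pairs rebuild time with a higher potential for failures.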
Why Object Based Storage?
- Replicated Copy Limitations
  - Size of Data Sets
  - Number of Copies needed for redundancy
  - Rebuild Time for Mirrors
  - Storage Capacity Required = Cost
- File System Limitations
  - #s of files, directories, volumes
  - Scalability
  - Metadata Management
  - Indexing, Search Ability, File Management
Why Object Based Storage?
Four Major Benefits
- Levels of Protection / Data Durability
- Separation of File Intelligence from Physical Data
- Scalability
- Accessibility
Why Object Based Storage?
Protection and Durability
- Replaces RAID: much higher availability and reliability (multiple 9s)
- Erasure Coding Methodologies (e.g., Reed-Solomon, Fountain Codes)
- Files are managed as data objects: separation of file intelligence from physical data
- Data objects are transformed into a series of equations which are redundant and distributed across a storage pool
- Self Healing Capabilities
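The idea of turning an object into redundant "equations" and healing a lost piece can be shown with the simplest possible erasure code: a single XOR parity fragment. This is a deliberate simplification, not Reed-Solomon or a fountain code; it tolerates only one lost fragment, whereas the codes named above tolerate several. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(fragments: list) -> bytes:
    """Compute one parity fragment as the XOR of all data fragments."""
    return reduce(xor_bytes, fragments)


def recover(survivors: list, parity: bytes) -> bytes:
    """Rebuild the single missing data fragment from the survivors and parity."""
    return reduce(xor_bytes, survivors, parity)


# Three equal-length data fragments, each imagined on a different drive.
data = [b"OBJE", b"CT-S", b"TORE"]
parity = encode(data)  # stored on a fourth drive

# Simulate losing the drive that held fragment 1, then self-heal it.
lost_index = 1
survivors = [frag for i, frag in enumerate(data) if i != lost_index]
rebuilt = recover(survivors, parity)
```

Because XOR is its own inverse, combining the parity with the surviving fragments yields exactly the missing one, with no full mirror copy required.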
Why Object Based Storage?
Protection and Durability
- Replicas and Reliability Policies
  - Define the Failure Tolerance Level of specific objects
  - Define how many disks the objects should be spread over
  - Define how many simultaneous failures should be tolerated
- Example: 16/4 indicates that data will be spread across 16 drives in a manner that can tolerate 4 simultaneous failures
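A worked version of the 16/4 example, as a sketch. It assumes that 16/4 means 16 fragments in total, any 12 of which are enough to read the object; the slide does not spell this out, so treat that reading as an assumption. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityPolicy:
    spread: int     # number of drives an object is spread across
    tolerance: int  # simultaneous drive failures that must be tolerated

    @property
    def fragments_needed(self) -> int:
        """Fragments required to read the object back (assumed: spread - tolerance)."""
        return self.spread - self.tolerance

    @property
    def raw_per_usable(self) -> float:
        """Raw capacity consumed per unit of usable data."""
        return self.spread / self.fragments_needed

    def survives(self, failed_drives: int) -> bool:
        """True if the object is still readable after this many drive failures."""
        return failed_drives <= self.tolerance


def replication_raw_per_usable(tolerance: int) -> int:
    """Full copies needed to tolerate the same number of failures by replication."""
    return tolerance + 1


policy = ReliabilityPolicy(spread=16, tolerance=4)
erasure_overhead = policy.raw_per_usable            # 16 / 12, about 1.33x raw capacity
replica_overhead = replication_raw_per_usable(4)    # 5 full copies
```

Under these assumptions the policy tolerates four failures for roughly a third of extra capacity, where plain replication would need five complete copies for the same tolerance.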
Why Object Based Storage?
Separation of Intelligence from Physical Data
- Scalable Metadata store
  - Enables search, mining, and analytics of billions of objects without touching physical media
- Standard Metadata
  - Object IDs, object size, creation dates, location
- Custom Metadata
  - User Definable
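A minimal sketch of why a separate metadata store lets searches run without touching physical media: the catalog below holds only the standard and custom metadata fields named on the slide, and the query never reads object data. The record layout, field names, and sample values are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ObjectRecord:
    # Standard metadata
    object_id: str
    size_bytes: int
    created: date
    location: str
    # Custom, user-definable metadata
    custom: dict = field(default_factory=dict)


def search(catalog: list, **criteria) -> list:
    """Return IDs of objects whose custom metadata matches every criterion.

    Only the in-memory catalog is consulted; no disk or tape holding the
    object payloads is read.
    """
    return [
        rec.object_id
        for rec in catalog
        if all(rec.custom.get(key) == value for key, value in criteria.items())
    ]


catalog = [
    ObjectRecord("obj-001", 2_000_000, date(2012, 1, 5), "pool-a",
                 {"project": "genomics", "retain_years": 7}),
    ObjectRecord("obj-002", 500_000, date(2012, 3, 9), "pool-b",
                 {"project": "video"}),
    ObjectRecord("obj-003", 9_000_000, date(2012, 6, 1), "tape-1",
                 {"project": "genomics"}),
]

hits = search(catalog, project="genomics")
total_size = sum(r.size_bytes for r in catalog if r.object_id in hits)
```

Note that `obj-003` sits on tape in this example, yet it is found and sized from metadata alone.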
Why Object-Based Storage?
Scalability
- 100 TB to 100s of PB
- Space is Allocated across Storage Pools
- On-the-fly capacity upgrades
- No file system or RAID rebuild limitations
- Automatic Restriping Capabilities
- Automated Replica Management
- No Volume Configuration
- Multi-Site Support
- Metadata and Data Store can scale with no impact on performance
Why Object-Based Storage?
Accessibility
- Object Stores are accessible via REST and HTTP Protocols
- Cloud and As-A-Service Enabling
- They require applications to be Object Storage Aware
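What "Object Storage Aware" means in practice is that the application speaks HTTP verbs against object URLs instead of opening files. The sketch below only builds the requests and sends nothing; the endpoint `objectstore.example.com`, the bucket/key URL layout, and the `X-Object-Meta-` header prefix are assumptions for illustration, since each object store defines its own REST API.

```python
from urllib.request import Request

BASE_URL = "https://objectstore.example.com"  # hypothetical endpoint


def put_object_request(bucket: str, key: str, body: bytes, metadata: dict) -> Request:
    """Build an HTTP PUT that stores an object and attaches custom metadata."""
    # Header naming is an assumption; real object stores define their own scheme.
    headers = {f"X-Object-Meta-{name}": str(value) for name, value in metadata.items()}
    headers["Content-Type"] = "application/octet-stream"
    return Request(f"{BASE_URL}/{bucket}/{key}", data=body, headers=headers, method="PUT")


def get_object_request(bucket: str, key: str) -> Request:
    """Build an HTTP GET that reads the object back by its key."""
    return Request(f"{BASE_URL}/{bucket}/{key}", method="GET")


put_req = put_object_request("archive", "scan-0001.tif", b"image-bytes",
                             {"project": "imaging"})
get_req = get_object_request("archive", "scan-0001.tif")
```

Against a real endpoint, each request would be passed to `urllib.request.urlopen`; reading the object back is an ordinary GET rather than a restore job.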
Why Object Based Storage?
It's All About Data Access:
- Data stored within the Object Store can be natively accessed by an Object Store aware application
- Eliminates the Need for File-Level restores
- Data can be stored or copied within an Object Store, thus eliminating the need to continually back it up
- Overcomes the limitations with spinning disk and capacity considerations
Best of All Worlds
[Diagram labels: BIG DATA = EXPANSIVE DATA; OBJECT BASED STORAGE (Data Durability); ACTIVE ARCHIVE (Data Protection, Data Recovery, Data Archive)]
Best of All Worlds
Combine Active Archive with Object Based Storage
- Remove Spinning Disk Concerns and Limitations
  - Durability (Replicas, Reliability Policies)
  - Capacity Management
  - Scalability
- Add Enhanced Metadata Management
- Add Content Policy Management
  - Retention, Immutability
  - Deletion/Purge
Best of All Worlds
- NFS/CIFS (NAS Gateway) Capabilities
  - Expanded Accessibility
  - Transparent Application/User Access
  - Ability to Leverage Tape and Disk transparently as a drive letter, share, mount point
  - Ability to replace Traditional Backup
  - Ability to Leverage Tape as NAS
  - Ability to Leverage Tape for Open Portability
- Why Tape?
  - Cost vs. Capacity (higher areal density)
  - Capacity vs. Footprint
Best of All Worlds
[Diagram labels: Primary Storage; Z: DRIVE; NFS / CIFS; Data Mover, Copy Technologies; Active Archive Management Software; REST/HTTP; Tape; Object Storage Aware Applications; Data Management Framework; Remote Data Center and/or Cloud]
Best of All Worlds
Ultimately, It's All About Data Access:
- Active Archive + Object Based Storage
- Cloud Ready, Scalable, Cost Effective, Long Term Data Storage
- And the End of File-Level Restores!
Questions?
stacys@spectralogic.com