PROTECTING DATA IN THE BIG DATA WORLD. Thomas Rivera, Hitachi Data Systems

Similar documents
Thomas Rivera / Hitachi Data Systems

Thomas Rivera / Hitachi Data Systems. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

ADVANCED DEDUPLICATION CONCEPTS. Larry Freeman, NetApp Inc Tom Pearce, Four-Colour IT Solutions

UNDERSTANDING DATA DEDUPLICATION. Tom Sas Hewlett-Packard

Deduplication s Role in Disaster Recovery. Gene Nagle, EXAR Thomas Rivera, SEPATON

Deduplication s Role in Disaster Recovery. Thomas Rivera, SEPATON

15 Best Practices For Comparing Data Protection With Cloud Media Server

Deduplication s Role in Disaster Recovery. Thomas Rivera, SEPATON

UNDERSTANDING DATA DEDUPLICATION. Thomas Rivera SEPATON

UNDERSTANDING DATA DEDUPLICATION. Jiří Král, ředitel pro technický rozvoj STORYFLEX a.s.

Restoration Technologies. Mike Fishman / EMC Corp.

The Business Value of Data Deduplication DDSR SIG

Introduction to Data Protection: Backup to Tape, Disk and Beyond

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

in Transition to the Cloud

Backup to Tape, Disk and Beyond. Jason Iehl, NetApp

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation (Author and Presenter)

Trends in Data Protection and Restoration Technologies. Jason Iehl, NetApp

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Today s Agile, Complex and Heterogeneous Data Centers

Understanding Enterprise NAS

How to Cost Effectively Retain Reference Data for Analytics and Big Data. Molly Rector, EVP Product Management & WW Marketing, Spectra Logic

Archive and Preservation in the Cloud - Business Case, Challenges and Best Practices. Chad Thibodeau, Cleversafe, Inc. Sebastian Zangaro, HP

Trends in Application Recovery. Andreas Schwegmann, HP

Protecting Information in a Smarter Data Center with the Performance of Flash

VDI Optimization Real World Learnings. Russ Fellows, Evaluator Group

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage

Introduction to NetApp Infinite Volume

Creating a Catalog for ILM Services. Bob Mister Rogers, Application Matrix Paul Field, Independent Consultant Terry Yoshii, Intel

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

How to Manage Critical Data Stored in Microsoft Exchange Server By Hitachi Data Systems

Cloud OS Vision. Modern platform for the world s apps

WAN Optimization and Thin Client: Complementary or Competitive Application Delivery Methods? Josh Tseng, Riverbed

How To Protect Data On Network Attached Storage (Nas) From Disaster

How To Migrate To A Network (Wan) From A Server To A Server (Wlan)

Rethinking Archiving: Exploring the path to improved IT efficiency and maximizing value of archiving solution investments

EMC BACKUP MEETS BIG DATA

in the Cloud - What To Do and What Not To Do Chad Thibodeau / Cleversafe Sebastian Zangaro / HP

Cloud Archive & Long Term Preservation Challenges and Best Practices

How it can benefit your enterprise. Dejan Kocic Hitachi Data Systems (HDS)

Business Life Insurance

Cloud-integrated Storage What & Why

High Performance Computing OpenStack Options. September 22, 2015

Server and Storage Virtualization with IP Storage. David Dale, NetApp

EMC Disk Library with EMC Data Domain Deployment Scenario

Technology Insight Series

Protect Data... in the Cloud

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

How it can benefit your enterprise. Dejan Kocic Netapp

Reducing the cost of Protecting and. Securing Data. Assets. Big data, small data, critical data, more data. NetApp

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS

Actifio Big Data Director. Virtual Data Pipeline for Unstructured Data

3Gen Data Deduplication Technical

The Modern Virtualized Data Center

Don t be duped by dedupe - Modern Data Deduplication with Arcserve UDP

Redefining Microsoft SQL Server Data Management. PAS Specification

Module: Business Continuity

ARCHIVING FOR DATA PROTECTION IN THE MODERN DATA CENTER. Tony Walker, Dell, Inc. Molly Rector, Spectra Logic

Backup and Recovery Solutions for Exadata. Cor Beumer Storage Sales Specialist Oracle Nederland

Die Herausforderung an Backup/Recovery durch das Datenwachstum Wie optimiere ich

OmniCube. SimpliVity OmniCube and Multi Federation ROBO Reference Architecture. White Paper. Authors: Bob Gropman

Redefining Microsoft SQL Server Data Management

Redefining Oracle Database Management

TOP CONSIDERATIONS FOR BUSINESS ONLINE BACKUP

Fundamental Approaches to WAN Optimization. Josh Tseng, Riverbed

EMC Backup and Recovery for Microsoft SQL Server 2008 Enabled by EMC Celerra Unified Storage

Big data Devices Apps

The Role of WAN Optimization in Cloud Infrastructures

Backup and Recovery 1

Backup and Recovery Solutions for Exadata. Ľubomír Vaňo Principal Sales Consultant

Energy Efficient Storage - Multi- Tier Strategies For Retaining Data

Cloud Archiving. Paul Field Consultant

Business-Centric Storage FUJITSU Storage ETERNUS CS800 Data Protection Appliance

Dell Data Protection. Marek Istok Ŋ Dell Slovakia

HP Store Once. Backup to Disk Lösungen. Architektur, Neuigkeiten. rené Loser, Senior Technology Consultant HP Storage Switzerland

Best Practice and Deployment of the Network for iscsi, NAS and DAS in the Data Center

Cloud-integrated Enterprise Storage. Cloud-integrated Storage What & Why. Marc Farley

Sales Tool. Summary DXi Sales Messages November NOVEMBER ST00431-v06

Long term retention and archiving the challenges and the solution

Business-centric Storage FUJITSU Storage ETERNUS CS800 Data Protection Appliance

Future-Proofed Backup For A Virtualized World!

Business Continuity with the. Concerto 7000 All Flash Array. Layers of Protection for Here, Near and Anywhere Data Availability

Reducing Backups with Data Deduplication

Cloud File Services: October 1, 2014

Protecting Big Data Data Protection Solutions for the Business Data Lake

REDUCE COSTS AND COMPLEXITY WITH BACKUP-FREE STORAGE NICK JARVIS, DIRECTOR, FILE, CONTENT AND CLOUD SOLUTIONS VERTICALS AMERICAS

Object Storage: Out of the Shadows and into the Spotlight

Eliminating Backup System Bottlenecks: Taking Your Existing Backup System to the Next Level. Jacob Farmer, CTO, Cambridge Computer

This presentation contains some information about future Veeam product releases, the timing and content of which are subject to change without

Understanding data deduplication ratios June 2008

WHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION?

Optimizing IT Data Services

Redefining Microsoft Exchange Data Management

Tiered Data Protection Strategy Data Deduplication. Thomas Störr Sales Director Central Europe November 8, 2007

ONE platform for ALL YOUR DATA Radim Petrzela February 26 th, 2013

Active Archive - Data Protection for the Modern Data Center. Molly Rector, Spectra Logic Dr. Rainer Pollak, DataGlobal

ETERNUS CS High End Unified Data Protection

Data Domain Overview. Jason Schaaf Senior Account Executive. Troy Schuler Systems Engineer. Copyright 2009 EMC Corporation. All rights reserved.

Real-time Compression: Achieving storage efficiency throughout the data lifecycle

Optimizing Data Protection Operations in VMware Environments

Transcription:

PROTECTING DATA IN THE BIG DATA WORLD Thomas Rivera, Hitachi Systems

SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced in their entirety without modification The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2

About the SNIA DPCO This tutorial has been developed, reviewed and approved by members of the Protection & Capacity Optimization (DPCO) Committee, which any SNIA member can join for free The mission of the DPCO is to foster the growth and success of the market for data protection and capacity optimization technologies Online DPCO Knowledge Base: www.snia.org/dpco/knowledge Online Product Selection Guide: http://sniadataprotectionguide.org 2012 goals include educating the vendor and user communities, market outreach, and advocacy and support of any technical work associated with data protection and capacity optimization Check out these SNIA Tutorials: Introduction to Protection Trends in Protection Advanced Reduction Concepts Managing Big Hands-On Lab HOL 3

Abstract growth is in an explosive state, and these "Big " repositories need to be protected. In addition, new regulations are mandating longer data retention, and the job of protecting these ever-growing data repositories is becoming even more daunting. This presentation will outline the challenges and the methods that can be used for protecting "Big " repositories. Topics will include: The unique challenges of managing and protecting "Big " The various technologies available for protecting "Big " The various data protection considerations for "Big, for various environments, including Disaster Recovery/Replication, Capacity Optimization, etc. 4

Terminology Big [Storage System] A characterization of datasets that are too large to be efficiently processed in their entirety by the most powerful standard computational platforms available. Deduplication [Storage System] The replacement of multiple copies of data at variable levels of granularity with references to a shared copy in order to save storage space and/or bandwidth. Compression [General] The process of encoding data to reduce its size. Lossy compression (i.e., compression using a technique in which a portion of the original information is lost) is acceptable for some forms of data (e.g., digital images) in some applications, but for most IT applications, lossless compression (i.e., compression using a technique that preserves the entire content of the original data, and from which the original data can be reconstructed exactly) is required. May add the SNIA definitions for structured and unstructured data 5

The 3 V s of Big Volume PB Petabytes EB Exabytes ZB Zettabytes YB Yottabytes Variety Unstructured Semi structured Rich media Velocity High velocity ingest Live feeds Real time decisions 6

The Challenge of Big : Volume Volume PB Petabytes EB Exabytes ZB Zettabytes YB Yottabytes 1PB = 10 15 bytes 1EB = 10 18 bytes 1ZB = 10 21 bytes 1YB = 10 24 bytes Jet engine produces ~10PB of data every 30 minutes of flight time processes ~20PB of data per day If one exabyte s worth of data were placed onto DVDs in slimline jewel cases, and then loaded into Boeing 747 aircraft, it would take 13,513 planes to transport this one exabyte 7

The Challenge of Big : Variety Variety Unstructured Semi structured Rich media Unstructured data: office files, Semi-structured data: integrated text/media files, Web files, XML Easy to generate but difficult to query, optimize, etc. Rich Media: Streaming media, Flash videos, etc. 8

The Challenge of Big : Velocity Velocity High velocity ingest Live feeds Real time decisions High velocity ingest: Easy to generate but difficult to query, optimize, etc. Live feeds: Description to go here Reel time decisions: Description to go here 9

Sources of Big Growth VIDEO Machine Complex SENSORS Human Enterprise content, External sources RECORDING M2M LOG FILES SATELLITE IMAGING Variety Volume Velocity WEB LOGS EMAIL Volume Business Process base data 1X 10X 100X More with More Complex Relationships in Real Time and At Scale (To manage, govern and analyze ) OLTP DOCUMENTS SOCIAL BIO- INFORMATICS 10

Putting Big in Context High Business Process Machine Human Volume & Velocity Low MPP Warehouse In Memory Columnar Traditional Warehouse & Marts Hadoop Ecosystem and emerging technologies.. Big of Tomorrow Mostly unstructured (variety) Massive scale (volume) Real time ingest (velocity) Traditional NAS File Servers Structured Semi-Structured Unstructured Variety 11

Protection: in years past Backup to tape Multi-stream to tape libraries to decrease backup windows Compression technology native on the tape drives Backup to Disk Multi-stream to disk arrays to further decrease backup windows Satisfy more stringent RPO / RTO requirements (faster recovery than tape) Backup to VTL Disk-based technology that emulates tape No backup application changes required Replication of disk-based backups Both VTL & non-vtl Has been a mainstream Protection practice for years (still is!) 12

Protection: Today Backup to tape still common Multi-stream to tape libraries to decrease backup windows Compression technology native on the tape drives Backup to Disk much more prevalent for quick restores Multi-stream to tape libraries to decrease backup windows Deduplication is also very common (for space saving, not speed) Satisfy more stringent RPO / RTO requirements Backup to VTL still a viable option for some environments Disk-based technology that emulates tape Where huge investment in Backup application & difficult to migrate from Replication of disk-based backups Now very common, often replacing local tape backups 13

Deduplication and Compression Dedupe and compression are similar Both are dependant on data patterns Results can vary from little/no optimization to high percentage Both consume system resources Both can optimize required storage capacity or bandwidth utilization Dedupe and compression are different Dedupe and compression can be complementary But some knowledge about the data pattern is helpful Some data is best optimized via dedupe Some data is best optimized via compression Some data can be optimized via dedupe and compression Sequence of optimization when encryption is used Typically dedupe is first, encryption is last 14

Considerations for Protecting Big : VM Environments Important to have tools to work with the APIs within the virtual environment VM Clone Copies APP APP DATA DATA OS OS APP DATA OS APP DATA OS APP DATA OS Note: Deduplication with snapshot/clone technology is common for VM environments APP DATA OS VMs After Deduplication Duplicate Is Eliminated 15

Considerations for Protecting Big : Remote / Dispersed Replication Remote Sites Primary Center Optimized Secondary Center Optimized Optimized Tertiary Center Can be one-way, bi-directional, multi-hop, cascade or n-way Utilizing data reduction techniques is mandatory for big data 16

Considerations for Protecting Big : Remote / Dispersed Replication (Cont.) Focus on your Service Level Agreements (SLAs) first Needs to meet window for Replication Needs to meet SLA for System Recovery or Restore Is DR site planned as failover site? If so, need to consider handling of data reduction re-hydration Is it Necessary to Optimize All? Mission-critical applications May have regulatory issues for some data Some data types not conducive to data reduction Replicate incremental changes only, without other optimization 17

Consideration Matrix for Reduction Location Application-specific protocol CIFS, NFS, FC, FCoE, iscsi, VTL WAN Network Gateway Backup Software Agent Virtual Gateway Primary Storage DR/Backup Storage Archival Storage Reduce Network Traffic Reduce Physical Capacity Reduce Backup Time Reduce Recovery Time Reduce Replication Time Reduce Media Latency 18

Capacity Optimization: Savings Every TB of disks you don t buy saves you CAPEX for the equipment CAPEX for the footprint CAPEX and OPEX for power conditioning OPEX for the power to spin the drives OPEX for cooling OPEX for storage management 19

Capacity Optimization: Savings (Cont.) - Capacity optimization technologies use less raw capacity to store and use the same data set 10 TB Archive 5 TB 1 TB Backup Snapshots Growth RAID10 Snapshots Growth RAID10 Archive Backup Snapshots Growth RAID DP Snapshots Growth RAIDDP Archive Backup Snapshots Growth RAID DP Snapshots Growth RAIDDP Archive Backup Snapshots Growth RAID DP Snapshots Growth RAIDDP Archive Backup Snapshots Growth RAID DP Snapshots Growth RAIDDP Archive Backup Snapshots Growth RAID DP Snapshots Growth RAIDDP RAID 10 RAID 5/6 Thin Provisioning Delta Snapshots Dedupe Dedupe & Compression 20

Factors Impacting Space Savings More Effective Deduplication created by users Low change rates Reference data and inactive data Applications with lower data transfer rates Use of full backups Longer retention of deduplicated data Continuous business process improvement Format awareness Temporal data deduplication Less Effective Deduplication captured from mother nature High change rates Active data, encrypted data, compressed data Applications with higher data transfer rates Use of incremental backups Shorter retention of deduplicated data Business as usual operational procedures No format awareness Spatial data deduplication Don t forget about compression! 21

Protection: Future Backup to tape still common, but rarely used for local backups More used for the off-site DR and/or for long-term requirements Starting to see other data reduction technologies appear on tape drives File system-like access technology Deduplication Backup to Disk and other fast media will be most prevalent Solid-state disk and various memory technologies Will be less of an issue of RPO/RTO than pure speed and adaptability Multi-site / multi-media backups Multi-site: N-way site replication Multi-media: not constrained to one media type for data protection 22

Protection of Future (Cont.) Backup to the cloud Failover to and recovery from the cloud Primary Center Capacity-Optimized CLOUD 23

Protection of Future (Cont.) Clustering ( RAIN ) technology Allows for data to be copied to one or more nodes within a cluster Sometimes negates the need for traditional RAID protection Will change the graphic (below) 24

Protection of Future (Cont.) Distributed / Parallel / Global file system technologies Allows for geographically dispersed data on a file system Can create policies to make copies data for multi-site data redundancy Client Client Client Client File Network Protocol File Server File Server File Server File Server File Server 25 Aggregation of Storage Servers RAIN + RAID (aka Network RAID) Global Namespace 25

Summary Many different technologies are being used to cope with big data Compression Deduplication WAN accelerators Faster storage technologies (Solid State Disks, Flash/Memory, etc.) Content-Aware and application-aware data protection schemes are becoming more prevalent Potential for greater data reduction with more knowledge of specific data structures and data types Potential for greater flexibility in how data is protected Greater adaptability in how data is protected 26

Summary (Cont.) Much higher performance data protection required for big data Near instantaneous backups Much faster restores More data protection automation is required No time can we wasted reviewing failed backup jobs Less people required to manage the data protection schemes Do more with less people Advanced data protection flexibility The variety / sources of data is exploding Need to support multi-vendor/multi-application environments 27

Summary (Cont.) Growing requirement for abiding by regulations SEC 17a-4, HIPPA, etc. High Reliability / Resiliency Relying on a standard backup is a thing of the past: a new paradigm is emerging Need to support protecting data to the cloud feeds coming from other sources via public, private & hybrid clouds Also need support for recovery to the cloud (not just backup to ) 28

Q&A / Feedback The SNIA Education Committee would like to thank the following individuals for their contributions to this Tutorial: Authorship History Original: DPCO Committee, 8/2012 Updates: ABDC Committee, 9/2011 Additional Contributors Kevin Dudak Mike Dutch Larry Freeman Tom McNeal Gene Nagle Ronald Pagani Thomas Rivera Tom Sas SW Worth It s easy to get involved with the DPCO! Please send any questions or comments regarding this SNIA Tutorial to tracktutorials@snia.org www.snia.org/dpco 29