Thomas Rivera / Hitachi Data Systems

Similar documents
PROTECTING DATA IN THE BIG DATA WORLD. Thomas Rivera, Hitachi Data Systems

Thomas Rivera / Hitachi Data Systems. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

15 Best Practices For Comparing Data Protection With Cloud Media Server

ADVANCED DEDUPLICATION CONCEPTS. Larry Freeman, NetApp Inc Tom Pearce, Four-Colour IT Solutions

UNDERSTANDING DATA DEDUPLICATION. Tom Sas Hewlett-Packard

Deduplication s Role in Disaster Recovery. Gene Nagle, EXAR Thomas Rivera, SEPATON

Deduplication s Role in Disaster Recovery. Thomas Rivera, SEPATON

Deduplication s Role in Disaster Recovery. Thomas Rivera, SEPATON

UNDERSTANDING DATA DEDUPLICATION. Thomas Rivera SEPATON

UNDERSTANDING DATA DEDUPLICATION. Jiří Král, ředitel pro technický rozvoj STORYFLEX a.s.

in Transition to the Cloud

The Business Value of Data Deduplication DDSR SIG

Restoration Technologies. Mike Fishman / EMC Corp.

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Backup to Tape, Disk and Beyond. Jason Iehl, NetApp

Introduction to Data Protection: Backup to Tape, Disk and Beyond

Today s Agile, Complex and Heterogeneous Data Centers

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation (Author and Presenter)

Trends in Data Protection and Restoration Technologies. Jason Iehl, NetApp

Understanding Enterprise NAS

How To Protect Data On Network Attached Storage (Nas) From Disaster

Protecting Information in a Smarter Data Center with the Performance of Flash

How to Cost Effectively Retain Reference Data for Analytics and Big Data. Molly Rector, EVP Product Management & WW Marketing, Spectra Logic

Archive and Preservation in the Cloud - Business Case, Challenges and Best Practices. Chad Thibodeau, Cleversafe, Inc. Sebastian Zangaro, HP

How it can benefit your enterprise. Dejan Kocic Hitachi Data Systems (HDS)

How to Manage Critical Data Stored in Microsoft Exchange Server By Hitachi Data Systems

Creating a Catalog for ILM Services. Bob Mister Rogers, Application Matrix Paul Field, Independent Consultant Terry Yoshii, Intel

EMC BACKUP MEETS BIG DATA

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

How it can benefit your enterprise. Dejan Kocic Netapp

WAN Optimization and Thin Client: Complementary or Competitive Application Delivery Methods? Josh Tseng, Riverbed

Introduction to NetApp Infinite Volume

Reducing the cost of Protecting and. Securing Data. Assets. Big data, small data, critical data, more data. NetApp

in the Cloud - What To Do and What Not To Do Chad Thibodeau / Cleversafe Sebastian Zangaro / HP

TOP CONSIDERATIONS FOR BUSINESS ONLINE BACKUP

Protect Data... in the Cloud

EMC Disk Library with EMC Data Domain Deployment Scenario

ntier Verde Simply Affordable File Storage

Technology Insight Series

Object Storage: Out of the Shadows and into the Spotlight

The Modern Virtualized Data Center

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

Cloud OS Vision. Modern platform for the world s apps

SOLUTION BRIEF KEY CONSIDERATIONS FOR BACKUP AND RECOVERY

Backup and Recovery Solutions for Exadata. Cor Beumer Storage Sales Specialist Oracle Nederland

Business Life Insurance

IBM PROTECTIER: FROM BACKUP TO RECOVERY

ARCHIVING FOR DATA PROTECTION IN THE MODERN DATA CENTER. Tony Walker, Dell, Inc. Molly Rector, Spectra Logic

Long term retention and archiving the challenges and the solution

Using HP StoreOnce Backup Systems for NDMP backups with Symantec NetBackup

REDUCE COSTS AND COMPLEXITY WITH BACKUP-FREE STORAGE NICK JARVIS, DIRECTOR, FILE, CONTENT AND CLOUD SOLUTIONS VERTICALS AMERICAS

Don t be duped by dedupe - Modern Data Deduplication with Arcserve UDP

Cloud Archive & Long Term Preservation Challenges and Best Practices

Cloud-integrated Storage What & Why

Actifio Big Data Director. Virtual Data Pipeline for Unstructured Data

Rethinking Archiving: Exploring the path to improved IT efficiency and maximizing value of archiving solution investments

High Performance Computing OpenStack Options. September 22, 2015

Protecting Big Data Data Protection Solutions for the Business Data Lake

Things You Need to Know About Cloud Backup

EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise

VDI Optimization Real World Learnings. Russ Fellows, Evaluator Group

Trends in Application Recovery. Andreas Schwegmann, HP

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage

Reducing Backups with Data Deduplication

3Gen Data Deduplication Technical

Business-Centric Storage FUJITSU Storage ETERNUS CS800 Data Protection Appliance

EMC Backup and Recovery for Microsoft SQL Server 2008 Enabled by EMC Celerra Unified Storage

Business-centric Storage FUJITSU Storage ETERNUS CS800 Data Protection Appliance

How To Migrate To A Network (Wan) From A Server To A Server (Wlan)

EMC DATA DOMAIN EXTENDED RETENTION SOFTWARE: MEETING NEEDS FOR LONG-TERM RETENTION OF BACKUP DATA ON EMC DATA DOMAIN SYSTEMS

Backup and Recovery Solutions for Exadata. Ľubomír Vaňo Principal Sales Consultant

Redefining Oracle Database Management

Sales Tool. Summary DXi Sales Messages November NOVEMBER ST00431-v06

Enable unified data protection

Backup and Recovery 1

Business white paper. environments. The top 5 challenges and solutions for backup and recovery

EMC Data Domain Boost for Oracle Recovery Manager (RMAN)

Future-Proofed Backup For A Virtualized World!

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS

Reduced Complexity with Next- Generation Deduplication Innovation

The Oracle StorageTek VSM6 Providing Unprecedented Storage Versatility

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010

ClearPath Storage Update Data Domain on ClearPath MCP

Understanding data deduplication ratios June 2008

Cloud-integrated Enterprise Storage. Cloud-integrated Storage What & Why. Marc Farley

ONE platform for ALL YOUR DATA Radim Petrzela February 26 th, 2013

Energy Efficient Storage - Multi- Tier Strategies For Retaining Data

Understanding EMC Avamar with EMC Data Protection Advisor

WHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION?

ETERNUS CS High End Unified Data Protection

EMC DATA DOMAIN PRODUCT OvERvIEW

Data Domain Overview. Jason Schaaf Senior Account Executive. Troy Schuler Systems Engineer. Copyright 2009 EMC Corporation. All rights reserved.

Redefining Microsoft SQL Server Data Management. PAS Specification

Riverbed Whitewater/Amazon Glacier ROI for Backup and Archiving

Active Archive - Data Protection for the Modern Data Center. Molly Rector, Spectra Logic Dr. Rainer Pollak, DataGlobal

Real-time Compression: Achieving storage efficiency throughout the data lifecycle

THE EMC ISILON STORY. Big Data In The Enterprise. Copyright 2012 EMC Corporation. All rights reserved.

Server and Storage Virtualization with IP Storage. David Dale, NetApp

Bringing the edge to the data center a data protection strategy for small and midsize companies with remote offices. Business white paper

Transcription:

Protecting PRESENTATION in TITLE the GOES Big HERE World Thomas Rivera / Hitachi Systems

SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced in their entirety without modification The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2

About the SNIA DPCO Committee This tutorial has been developed, reviewed and approved by members of the Protection and Capacity Optimization (DPCO) Committee which any SNIA member can join for free The mission of the DPCO is to foster the growth and success of the market for data protection and capacity optimization technologies Online DPCO Knowledge Base: www.snia.org/dpco/knowledge Online Product Selection Guide: http://sniadataprotectionguide.org 2013 goals include educating the vendor and user communities, market outreach, and advocacy and support of any technical work associated with data protection and capacity optimization Check out these SNIA Tutorials: Understanding Deduplication Deduplication s Role in Disaster Recovery The ABCs of Big - Analytics and Bandwidth for Your Content 3

Abstract growth is in an explosive state, and these "Big " repositories need to be protected. In addition, new regulations are mandating longer data retention, and the job of protecting these ever-growing data repositories is becoming even more daunting. This presentation will outline the challenges and the methods that can be used for protecting "Big " repositories. Topics will include: The unique challenges of managing and protecting "Big " The various technologies available for protecting "Big " The various data protection considerations for "Big, for various environments, including Disaster Recovery/Replication, Capacity Optimization, etc. 4

Terminology Big [Storage System] A characterization of datasets that are too large to be efficiently processed in their entirety by the most powerful standard computational platforms available. Compression [General] The process of encoding data to reduce its size. Lossy compression (i.e., compression using a technique in which a portion of the original information is lost) is acceptable for some forms of data (e.g., digital images) in some applications, but for most IT applications, lossless compression (i.e., compression using a technique that preserves the entire content of the original data, and from which the original data can be reconstructed exactly) is required. Deduplication [Storage System] The replacement of multiple copies of data at variable levels of granularity with references to a shared copy in order to save storage space and/or bandwidth. 5

Terminology (Cont.) Protection [ Management] Assurance that data is not corrupted, is accessible for authorized purposes only, and is in compliance with applicable requirements. Structured [ Management] that is organized and formatted in a known and fixed way. The format and organization are customarily defined in a schema. The term structured data is usually taken to mean data generated and maintained by databases and business applications. Unstructured [ Management] that cannot easily be described as structured data. 6

The 3 V s of Big Volume PB Petabytes EB Exabytes ZB Zettabytes YB Yottabytes Variety Unstructured Semi structured Rich media Velocity High velocity ingest Live feeds Real time decisions 7

The Challenge of Big : Volume Volume PB Petabytes EB Exabytes ZB Zettabytes YB Yottabytes 1PB = 10 15 bytes 1EB = 10 18 bytes 1ZB = 10 21 bytes 1YB = 10 24 bytes Jet engine produces ~10PB of data every 30 minutes of flight time processes ~20PB of data per day If one exabyte s worth of data were placed onto DVDs in slimline jewel cases, and then loaded into Boeing 747 aircraft, it would take 13,513 planes to transport this one exabyte 8

The Challenge of Big : Variety Variety Unstructured Semi structured Rich media Unstructured data: office files, video files, audio files, etc. Semi-structured data: integrated text/media files, Web/XML files, etc. Easy to generate but difficult to query, optimize, etc. Rich Media: Streaming media, Flash videos, etc. 9

The Challenge of Big : Velocity Velocity High velocity ingest Live feeds Real time decisions High velocity ingest: Easy to generate but difficult to query, optimize, etc. Live feeds: News, audio/video television programming, surveillance video, etc. Real-time decisions: Retail: discounts, suggest other items to pruchase, etc. Security Etc. 10

Sources of Big Growth Machine Complex VIDEO SENSORS Human Enterprise content, External sources RECORDING M2M LOG FILES SATELLITE IMAGING Variety Volume Velocity WEB LOGS EMAIL Volume Business Process base data 1X 10X 100X More with More Complex Relationships in Real Time and At Scale (To manage, govern and analyze ) OLTP DOCUMENTS SOCIAL BIO- INFORMATICS 11

Big : In Context High Business Process Machine Human Volume & Velocity Low MPP Warehouse In Memory Columnar Traditional Warehouse & Marts Hadoop Ecosystem and emerging technologies.. Big of Tomorrow Mostly unstructured (variety) Massive scale (volume) Real time ingest (velocity) Traditional NAS File Servers Structured Semi-Structured Unstructured Variety 12

All is not Equal Value of Mission Critical Every minute down impacts revenue Lost data means lost customers Complete protection is a high priority Important Needs to be recovered quickly to avoid impact loss is painful but not fatal Justifiable protection is required Non-Critical 24+ hour recovery time is acceptable Lost data has little or no impact on operations Cost-effective protection needed 13

What is Protection? 14

Protection: In Years Past Backup to tape Multi-stream to tape to decrease backup windows Compression technology native on the tape drives Backup to disk Multi-stream to disk arrays to further decrease backup windows Satisfy more stringent RPO / RTO requirements (faster recovery than tape) Backup to VTL Disk-based technology that emulates tape No backup application changes required Replication of disk-based backups Both VTL & non-vtl Has been a mainstream Protection practice for years (still is!) 15

Protection: Today Backup to tape still common Multi-stream to tape libraries to decrease backup windows Compression technology native on the tape drives Backup to disk much more prevalent for quick restores Multi-stream to tape libraries to decrease backup windows Deduplication is also very common (for space saving, not speed) Satisfy more stringent RPO / RTO requirements Backup to VTL still a viable option for some environments Disk-based technology that emulates tape Where huge investment in Backup application & difficult to migrate from Replication of disk-based backups Now very common, often replacing local tape backups 16

Deduplication and Compression Dedupe and compression are similar Both are dependant on data patterns Results can vary from little/no optimization to high percentage Both consume system resources Both can optimize required storage capacity or bandwidth utilization Dedupe and compression are different Dedupe and compression can be complementary But some knowledge about the data pattern is helpful Some data is best optimized via dedupe Some data is best optimized via compression Some data can be optimized via dedupe and compression Sequence of optimization when encryption is used Dedupe is first, encryption is last 17

Considerations for Protecting Big : Remote / Dispersed Replication Remote Sites Primary Center Optimized Secondary Center Optimized Optimized Tertiary Center Can be one-way, bi-directional, multi-hop, cascade or n-way Utilizing data reduction techniques is mandatory for big data 18

Considerations for Protecting Big : Remote / Dispersed Replication (Cont.) Focus on your Service Level Agreements (SLAs) first Needs to meet window for Replication Needs to meet SLA for System Recovery or Restore Is DR site planned as failover site? If so, need to consider handling of data reduction re-hydration Is it Necessary to Optimize All? Mission-critical applications May have regulatory issues for some data Some data types not conducive to data reduction Replicate incremental changes only, without other optimization 19

Consideration Matrix for Reduction Location Application-specific protocol CIFS, NFS, FC, FCoE, iscsi, VTL WAN Network Gateway Backup Software Agent Virtual Gateway Primary Storage DR/Backup Storage Archival Storage Reduce Network Traffic Reduce Physical Capacity Reduce Backup Time Reduce Recovery Time Reduce Replication Time Reduce Media Latency 20

Capacity Optimization: Savings Every TB of disks you don t buy saves you CAPEX for the equipment CAPEX for the footprint CAPEX and OPEX for power conditioning OPEX for the power to spin the drives OPEX for cooling OPEX for storage management 21

Capacity Optimization: Savings (Cont.) - Capacity optimization technologies use less raw capacity to store and use the same data set 10 TB Archive 5 TB 1 TB Backup Snapshots Growth RAID10 Snapshots Growth RAID10 Archive Backup Snapshots Growth RAID DP Snapshots Growth RAIDDP Archive Backup Snapshots Growth RAID DP Snapshots Growth RAIDDP Archive Backup Snapshots Growth RAID DP Snapshots Growth RAIDDP Archive Backup Snapshots Growth RAID DP Snapshots Growth RAIDDP Archive Backup Snapshots Growth RAID DP Snapshots Growth RAIDDP RAID 10 RAID 5/6 Thin Provisioning Delta Snapshots Dedupe Dedupe & Compression 22

Factors Impacting Space Savings More Effective Deduplication created by users Low change rates Reference data and inactive data Applications with lower data transfer rates Use of full backups Longer retention of deduplicated data Continuous business process improvement Format awareness Temporal data deduplication Less Effective Deduplication captured from mother nature High change rates Active data, encrypted data, compressed data Applications with higher data transfer rates Use of incremental backups Shorter retention of deduplicated data Business as usual operational procedures No format awareness Spatial data deduplication Don t forget about compression! 23

Protection: Future Backup to tape still common, but rarely used for local backups More used for the off-site DR and/or for long-term requirements Starting to see other data reduction technologies appear on tape drives File system-like access technology Deduplication Backup to Disk and other fast media will be most prevalent Solid-state disk and various memory technologies Will be less of an issue of RPO/RTO than pure speed and adaptability Multi-site / multi-media backups Multi-site: N-way site replication Multi-media: not constrained to one media type for data protection 24

Protection: Future (Cont.) Backup to the cloud Failover to and recovery from the cloud Some of the technologies currently used and being developed: Applications that write directly to the cloud Storage-based software migration On-ramps devices that stage data and move to the cloud over time Storage systems that make copies to cloud locations Primary Center Capacity-Optimized CLOUD 25

Protection: Future (Cont.) RAIN technology (Redundant Array of Independent Nodes) Allows for data to be copied to one or more nodes within a cluster Sometimes negates the need for traditional RAID protection Especially if the Nodes are in remote locations for DR purposes 26

Protection: Future (Cont.) Distributed / Parallel / Global file system technologies Allows for geographically dispersed data on a file system Can create policies to make copies data for multi-site data redundancy Client Client Client Client File Network Protocol File Server File Server File Server File Server File Server 27 Aggregation of Storage Servers RAIN + RAID (aka Network RAID) Global Namespace 27

Summary Many different technologies are being used to cope with big data Compression Deduplication WAN accelerators Faster storage technologies (Solid State Disks, Flash/Memory, etc.) But this is only one aspect of dealing with the challenge of protecting Big 28

Summary (Cont.) Content-aware and application-aware data protection schemes will become the mainstay Potential for greater data reduction with more knowledge of specific data structures within respective data types Potential for greater flexibility in how data is protected protection can become much more streamlined as we are more aware of data types, data change-rate models, data access models, etc. Greater adaptability in how data is protected New hybrid data types are being generated Advanced data protection schemes can be developed to better accommodate big data requirements 29

Summary (Cont.) Much higher performance data protection required for big data Near instantaneous backups Much faster restores (in some cases, near instantaneous) More data protection automation is required Less people will be required to manage Gone will be the days of reviewing failed backup jobs logs Advanced data protection flexibility Need to support multi-vendor, multi-application environments 30

Summary (Cont.) Growing requirement for abiding by regulations SEC 17a-4, HIPPA, etc. Automation will be a large part of meeting these requirements High Reliability / Resiliency SLAs becoming more stringent (cannot afford to lose any access to data) Relying on a standard backup is a thing of the past: a new paradigm is emerging Need to support protecting data to the cloud feeds coming from other sources via public, private & hybrid clouds Also need support for recovery to the cloud (not just backup to ) 31

Attribution & Feedback The SNIA Education Committee thanks the following individuals for their contributions to this Tutorial: Authorship History Original Author: DPCO Committee, 8/2012 Updates: DPCO Committee, 9/2012 DPCO Committee, 2/2013 Additional Contributors Kevin Dudak Mike Dutch Larry Freeman David Hill Tom McNeal Gene Nagle Ronald Pagani Thomas Rivera Tom Sas Gideon Senderov SW Worth Please send any questions or comments regarding this SNIA Tutorial to tracktutorials@snia.org 32