How to Cost Effectively Retain Reference Data for Analytics and Big Data. Molly Rector, EVP Product Management & WW Marketing, Spectra Logic

Similar documents
Object Oriented Storage and the End of File-Level Restores

UNDERSTANDING DATA DEDUPLICATION. Tom Sas Hewlett-Packard

UNDERSTANDING DATA DEDUPLICATION. Thomas Rivera SEPATON

Understanding Enterprise NAS

UNDERSTANDING DATA DEDUPLICATION. Jiří Král, ředitel pro technický rozvoj STORYFLEX a.s.

Cloud and Big Data initiatives. Mark O Connell, EMC

15 Best Practices For Comparing Data Protection With Cloud Media Server

Restoration Technologies. Mike Fishman / EMC Corp.

ADVANCED DEDUPLICATION CONCEPTS. Larry Freeman, NetApp Inc Tom Pearce, Four-Colour IT Solutions

Cloud Archiving. Paul Field Consultant

High Performance Computing OpenStack Options. September 22, 2015

Rethinking Archiving: Exploring the path to improved IT efficiency and maximizing value of archiving solution investments

Active Archive - Data Protection for the Modern Data Center. Molly Rector, Spectra Logic Dr. Rainer Pollak, DataGlobal

Trends in Application Recovery. Andreas Schwegmann, HP

Deduplication s Role in Disaster Recovery. Thomas Rivera, SEPATON

Creating a Catalog for ILM Services. Bob Mister Rogers, Application Matrix Paul Field, Independent Consultant Terry Yoshii, Intel

ARCHIVING FOR DATA PROTECTION IN THE MODERN DATA CENTER. Tony Walker, Dell, Inc. Molly Rector, Spectra Logic

Data Center Transformation. Russ Fellows, Managing Partner Evaluator Group Inc.

ILM: Tiered Services & The Need For Classification

Deduplication s Role in Disaster Recovery. Gene Nagle, EXAR Thomas Rivera, SEPATON

Deduplication s Role in Disaster Recovery. Thomas Rivera, SEPATON

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows

Cloud File Services: October 1, 2014

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage

OPTIMIZING PRIMARY STORAGE WHITE PAPER FILE ARCHIVING SOLUTIONS FROM QSTAR AND CLOUDIAN

Storage Clouds. Karthik Ramarao. Director of Strategy and Technology and CTO Asia Pacific, NetApp Board Director SNIA South Asia

Building the Business Case for the Cloud

An Introduction to Storage Management. Raymond A. Clarke, Oracle

Deploying Public, Private, and Hybrid Storage Clouds. Marty Stogsdill, Oracle

Storage Cloud Environments. Alex McDonald NetApp

Mobile World. Chris Winter SafeNet Inc.

Scale and Availability Considerations for Cluster File Systems. David Noy, Symantec Corporation

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

2011 FileTek, Inc. All rights reserved. 1 QUESTION

The Role of WAN Optimization in Cloud Infrastructures

Managing Data Storage in the Public Cloud. October 2009

Enterprise Architecture and the Cloud. Marty Stogsdill, Oracle

How it can benefit your enterprise. Dejan Kocic Hitachi Data Systems (HDS)

How it can benefit your enterprise. Dejan Kocic Netapp

Today s Agile, Complex and Heterogeneous Data Centers

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Storage Clouds. Enterprise Architecture and the Cloud. Author and Presenter: Marty Stogsdill, Oracle

Preparing for a Security Audit: Best Practices for Storage Professionals

VDI Optimization Real World Learnings. Russ Fellows, Evaluator Group

Object Storage: Out of the Shadows and into the Spotlight

Got Files? Get Cloud!

Hitachi Cloud Service for Content Archiving. Delivered by Hitachi Data Systems

Archive and Preservation in the Cloud - Business Case, Challenges and Best Practices. Chad Thibodeau, Cleversafe, Inc. Sebastian Zangaro, HP

in the Cloud - What To Do and What Not To Do Chad Thibodeau / Cleversafe Sebastian Zangaro / HP

How To Migrate To A Network (Wan) From A Server To A Server (Wlan)

REDUCE COSTS AND COMPLEXITY WITH BACKUP-FREE STORAGE NICK JARVIS, DIRECTOR, FILE, CONTENT AND CLOUD SOLUTIONS VERTICALS AMERICAS

Interoperable Cloud Storage with the CDMI Standard

Trends in Data Protection and Restoration Technologies. Jason Iehl, NetApp

in Transition to the Cloud

Cloud Storage Clients. Rich Ramos, Individual

NetApp Big Content Solutions: Agile Infrastructure for Big Data

Cloud Data Management Interface (CDMI) The Cloud Storage Standard. Mark Carlson, SNIA TC and Oracle Chair, SNIA Cloud Storage TWG

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation (Author and Presenter)

Best Practice and Deployment of the Network for iscsi, NAS and DAS in the Data Center

Enterprise Architecture and the Cloud. Marty Stogsdill, Oracle

Introduction to NetApp Infinite Volume

Save Time and Money with Quantum s Integrated Archiving Solution

Technology Insight Series

WAN Optimization and Cloud Computing. Josh Tseng, Riverbed

SSD and Deduplication The End of Disk?

LEVERAGING FLASH MEMORY in ENTERPRISE STORAGE. Matt Kixmoeller, Pure Storage

September 2009 Cloud Storage for Cloud Computing

Storage Networking Overview

How To Understand And Understand The Risks Of Configuration Drift

Unmasking Virtualization Security. Eric A. Hibbard, CISSP, CISA Hitachi Data Systems

How To Create A Cloud Backup Service

WAN Optimization and Thin Client: Complementary or Competitive Application Delivery Methods? Josh Tseng, Riverbed

Accelerating Applications and File Systems with Solid State Storage. Jacob Farmer, Cambridge Computer

Using Classification to manage File Servers. Nir Ben-Zvi, Microsoft Corporation

Diagram 1: Islands of storage across a digital broadcast workflow

Efficiently Protect, Manage and Access Big Data. Nigel Tozer Business Development Director EMEA

June Blade.org 2009 ALL RIGHTS RESERVED

NEXT GENERATION EMC: LEAD YOUR STORAGE TRANSFORMATION. Copyright 2013 EMC Corporation. All rights reserved.

Fundamental Approaches to WAN Optimization. Josh Tseng, Riverbed

Object Storage A Dell Point of View

Key Considerations for Managing Big Data in the Life Science Industry

IBM Infrastructure for Long Term Digital Archiving

Actifio Big Data Director. Virtual Data Pipeline for Unstructured Data

Archive Data Retention & Compliance. Solutions Integrated Storage Appliances. Management Optimized Storage & Migration

IBM Smart Business Storage Cloud

Autonomy Consolidated Archive

Three Virtualization Management Myths Busted. Rich Corley, NetApp

XenData Video Edition. Product Brief:

A Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data center cost savings exercise

Energy Efficient Storage - Multi- Tier Strategies For Retaining Data

Archiving with Enterprise Vault Bruno Ritter

Cloud Archive & Long Term Preservation Challenges and Best Practices

FAN An Architecture for Scalable, Service-Oriented Data Management

Data Sheet: Archiving Symantec Enterprise Vault Discovery Accelerator Accelerate e-discovery and simplify review

Big Data and the Urgency of College- Level Data Storage & Virtualization Curriculum

Riverbed Whitewater/Amazon Glacier ROI for Backup and Archiving

Data Center Convergence. Ahmad Zamer, Brocade

Evaluation of Enterprise Data Protection using SEP Software

EMC DATA DOMAIN EXTENDED RETENTION SOFTWARE: MEETING NEEDS FOR LONG-TERM RETENTION OF BACKUP DATA ON EMC DATA DOMAIN SYSTEMS

SERVER VIRTUALIZATION AND STORAGE DISASTER RECOVERY. Ray Lucchesi, Silverton Consulting

Transcription:

How to Cost Effectively Retain Reference Data for Analytics and Big Data Molly Rector, EVP Product Management & WW Marketing, Spectra Logic

SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced in their entirety without modification The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2

Abstract Check out SNIA Tutorial: Enter Tutorial Name when finalized How to Cost Effectively Retain Reference Data for Analytics and Big Data The need to store critical digital content at the 100's of terabytes and into the petabyte level is quickly moving outside the capabilities of traditional storage solutions. In addition, the complexity of many of today's storage solutions can prohibit companies from migrating to a solution that best fits their needs. Many companies are turning to active archives, an approach that is quickly gaining traction for companies that regularly manage high-volume reference data or face exponential data growth associated with Data Analytics and Big Data. Software, tape and disk vendors alike are combining the best of these technologies to provide more efficient and functional solutions. And, according to InterSect360, 35% of Big Data users are already using tape to help access and retain the data necessary for long-term analytics. 3

Two Primary Active Archive Principles There are exceptions to every rule Agentless except for.. There is no black or white There is only gray and lot s of overlap between technologies

Active Archive Out of Band Conceptual Design Primary Storage Z: DRIVE NFS / CIFS File Archive Mgmt Software Data Management Software NFS / CIFS Data Migrator Software Vaulting Software Data Mover &/or Archive Products Tape Secondary Heterogeneous Storage Remote Data Center Data Management Framework Remote Data Center/ Cloud Provider

Active Archive In Band Conceptual Design Distributed File Systems Application Specific Workflow Manager NFS / CIFS (NAS) FCP/iSCSI (SAN) Server Clients Typically Required Data Management Software Primary Storage Secondary Storage Tape Data Management Framework

Active Archive File Based In Band Conceptual Design NFS / CIFS Virtual Abstraction Global Name Space Policy Driven Primary Storage Secondary Heterogeneous Storage Tape Remote Data Center Data Management Framework Remote Data Center/ Cloud Provider

Active Archive File Based Hybrid Conceptual Design Primary Storage NFS / CIFS Virtual Abstraction Global Name Space Policy Driven Primary Storage Secondary Heterogeneous Storage Tape Remote Data Center Data Management Framework Remote Data Center/ Cloud Provider

Active Archive File Based Hybrid Conceptual Design Primary Storage NFS / CIFS Vol 1 NFS / CIFS Virtual Abstraction Global Name Space Policy Driven Primary Storage Vol 2 Vol 3 Secondary Heterogeneous Storage Tape Remote Data Center Data Management Framework Remote Data Center/Publi c Cloud

What do we mean by Data Mover? P Data Source Application Server S Data Mover/s

What is Big Data? Applications Requiring Intensive Data Mining and Analytics Financial Institutions Tic Data Analysis Mortgage Data Trend Analysis Check Images Risk Assessment Cross Domain Correlations Brokerage Data Health Care Drug Efficacy Disease Pattern recognition & progression Fraud detection Claim automation Medical Records Digital Imaging Genomic Government (FBI,CIA,DOD,DOE,HLS,IRS) Internet threat detection Pattern recognition Image analysis Fraud and waste mitigation Consumer / Commercial Space Social Media and Product Sentiment correlation Consumer Analysis Advertising Affinity correlation Telemetry and Quality analysis Video Surveillance Biometrics Research Data 0 Digital Media Videos, Music, Books, Photos Seismic/Oil & Gas

Data Storage Initiatives Big Data Storage: Focus is on Performance 1 st Accessibility 2 nd Data Collection Data Processing Data Storage Data Movement Data Sharing, Manipulation, Mining and Analytics Aligns with Workflow and Application Specific Active Archive Technologies Just Long Term Storage : Focus on Scalability 1 st Accessibility 2 nd Low Cost Repository Multi-Purpose Regulatory or Business Driver Requirements Little to NO manipulation of the data required Data must be accessible if ever needed Metadata is helpful for search Aligns with General Purpose Active Archive Technologies

Technology Categories General Purpose - Agentless Out of Band Approach General Purpose Agents/Clients Required Out of Band Approach Work Flow and Application Specific - Agents/Clients Required In Band Approach

General Purpose - Agentless Centralized Archive Repository - for General Unstructured Data Agentless approach - CIFS/NFS Gateway Out of Band approach (Primary Tier Not Managed) Not tied to specific data movers platform independent Use in Legacy environments with existing storage challenges Use in New environments that are diverse in nature not tied to performance intensive applications or workflows Use in multi-tenant / multi-need/application environments

General Purpose - Agentless Example: Use when the environment just wants a dumping ground to store inactive unstructured data for various periods of time and is looking for some confidence that it will be accessible Example: Use when the environment wants selfservice transparent access to the archived data repository Example: Use when servers and hosts are diverse and well established with an inability to install agents Example: Use when the environment is trying to support a multi-data mover

General Purpose Agents/Clients Required Centralized Archive Repository - Focus on Unstructured Data and also specific Agent Supported Applications (Structured Data) Agents must be installed on each host servers and/or workstations Out of Band approach (Primary Tier Not Managed) Data Mover is incorporated into the technology platform dependent Archive Access must be through STUB or GUI no direct CIFS/NFS access Use in Legacy environments with existing data protection and/ or storage challenges if you can install agents Use in New environments that will require data protection and/ or are diverse in nature and not tied to performance intensive applications or workflows Use in multi-tenant / multi-need/application environments

Workflow and Application Specific - Agents/Clients Required High Performance - File System Evolution SAN Focused Shareable File Systems which - Enable Shared Access to Block Based Storage LUNS by multiple application servers Similar to what Network Attached Storage provide natively Provide Shared Storage/Data Access for Applications supporting Block Based protocols (FCP, iscsi etc.) Minimized and/or eliminates native OS File System limitations Must own Primary Storage Tier apply proprietary file system Must own Secondary Storage Tiers/Tape apply proprietary file system Application Server agent/clients required to be able to access proprietary file system independent of OS Proprietary file system enables data movement between storage tiers and/or tape Agents/Clients - NAS Gateway capabilities now added to allow NAS protocol applications to also leverage proprietary file system enhanced functionality

Work Flow and Application Specific Agents/Clients Required Archive Repository Focus for Specialized Unstructured Data associated with a specific workflow or application In Band approach (Primary Tier Is Managed) Present a complete virtualized storage pool across ALL storage tiers Host Agents required to facilitate integration with all storage tiers and proprietary file system Installed on specific workstations and application servers Optimizes performance to and from virtualized storage pool Facilitates integration with specific applications and workflows Optimize data movement and copies between all storage tiers Proprietary Data Mover Removes the Need for Stubs or Pointers Not recommended for use with Legacy environments due to need to incorporate primary storage and agent requirements Use with NEW environments that are designed for a specific application or workflow and where performance is key

Work Flow and Application Specific Agents/Clients Required Examples: Use for New Applications when the Entire Workflow must be managed including: Data Collection Data Processing Data Storage Data Manipulation, Mining, Analytics Examples: Use for Performance/Data Flow Applications Broadcast, Cable & CCTV Movie and Television Production, Post-Production & Graphics Science and Engineering (Oil & Gas research, Satellite imaging) High Performance Computing (Life Sciences- genome sequencing) Audio and Video Surveillance and Mining Government Data Mining projects, Internet Streaming, Pre- Press, Digital Printing, CAD/CAM

Active Archive Check Box Differentiators Unstructured Structured Out of Band In Band Agents Required Agentless Data Mover Included Tape Supported LTFS Supported Copy Management

Active Archive Check Box Differentiators Version Management Varied Retention Across Tiers Spinning Disk Storage Tiering Multi-Site Support with Replication Cloud Integration Single Name Space H/A Strategy Multi-Tenancy Data Profiling Content Classification

Active Archive Check Box Differentiators Custom Metadata Search Legal Hold Single Instancing Deduplication Compression Auto Data Deletion (tape consolidation?) Data Shredding Scale Out Architecture

Active Archive Check Box Differentiators Digital Fingerprint/Self Healing Catalog Rebuild Capability Partial File Retrieval Reporting Capabilities 3 rd Party Application Integration

General Purpose Out of Band / Agentless Comparison Attribute File Manager Archive Manager Data Mover LTFS Appliance LTFS YES YES NO YES Active/Active H/A YES NO - Active /Standby NO - Active/Standby or Need MS Cluster NO - Active/Standby Data Mover Included NO NO NO NO Spinning Disk Storage Tiering YES Only for short retention YES Cache only Auto Data Deletion YES NO YES NO Copy/Version Mgmt. YES user plug-in s (optional) YES 3 Copies on any Tier Can t modify Files must save as new Files YES - Dual Copy on Tape Custom Metadata YES NO NO NO Search YES GUI NO NO NO Scale Out YES NO NO NO

Active Archive Landscape Just Long Term Storage General Purpose Self Service Focus Agentless Easier to Deploy Open Data Mover Platform New or Existing Environment Big Data Workflow and Application Specific Performance Focus Data Sharing Data Mining/Analysis New Application Just Long Term Storage General Purpose All In One Data Mover Included Agents Required Closed Platform Some WorkFlow/App Integration New Better Existing OK Shades of Gray

Approach to Selecting the Right Technology Identify the goals of the environment: What type of data will be archived? Is this a long term general purpose storage repository? Is this for a specific application or work flow? Is this for data mining and analytics? Is this for a specific business unit or department? Is it a new or legacy environment?

Data Storage Initiatives Big Data Storage: Focus is on Performance 1 st Accessibility 2 nd Data Collection Data Processing Data Storage Data Movement Data Sharing, Manipulation, Mining and Analytics Aligns with Workflow and Application Specific Active Archive Technologies Just Long Term Storage : Focus on Scalability 1 st Accessibility 2 nd Low Cost Repository Multi-Purpose Regulatory or Business Driver Requirements Little to NO manipulation of the data required Data must be accessible if ever needed Metadata is helpful for search Aligns with General Purpose Active Archive Technologies

Attribution & Feedback The SNIA Education Committee would like to thank the following individuals for their contributions to this Tutorial. Authorship History Name/Date of Original Author here: Molly Rector 08/15/2012 Additional Contributors Stacy Schwartz-Gardner Please send any questions or comments regarding this SNIA Tutorial to tracktutorials@snia.org 28

Refer to Other Tutorials Use this icon to refer to other SNIA Tutorials where appropriate Check out SNIA Tutorial: Enter Tutorial Title Here Use the Hands-On Lab icon only if you ve been notified that your tutorial is a cross-referenced match to the Hands-On Lab 29