Scalable Storage for Life Sciences



Similar documents
Storage Design for High Capacity and Long Term Storage. DLF Spring Forum, Raleigh, NC May 6, Balancing Cost, Complexity, and Fault Tolerance

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures

Eliminating Backup System Bottlenecks: Taking Your Existing Backup System to the Next Level. Jacob Farmer, CTO, Cambridge Computer

The safer, easier way to help you pass any IT exams. Exam : Storage Sales V2. Title : Version : Demo 1 / 5

Alternatives to Big Backup

Accelerating Applications and File Systems with Solid State Storage. Jacob Farmer, Cambridge Computer

Backup and Recovery 1

Optimizing Large Arrays with StoneFly Storage Concentrators

June Blade.org 2009 ALL RIGHTS RESERVED

Solution Brief: Creating Avid Project Archives

CIGRE 2014: Udaljena zaštita podataka

EMC BACKUP MEETS BIG DATA

A Deduplication File System & Course Review

Solutions for Encrypting Data on Tape: Considerations and Best Practices

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

Keys to Successfully Architecting your DSI9000 Virtual Tape Library. By Chris Johnson Dynamic Solutions International

Long term retention and archiving the challenges and the solution

Redefining Oracle Database Management

Real-time Compression: Achieving storage efficiency throughout the data lifecycle

Energy Efficient Storage - Multi- Tier Strategies For Retaining Data

3Gen Data Deduplication Technical

Optimizing IT Data Services

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Data Domain Overview. Jason Schaaf Senior Account Executive. Troy Schuler Systems Engineer. Copyright 2009 EMC Corporation. All rights reserved.

How To Protect Data On Network Attached Storage (Nas) From Disaster

EMC DATA DOMAIN OVERVIEW. Copyright 2011 EMC Corporation. All rights reserved.

Trends in Enterprise Backup Deduplication

Implementing a Digital Video Archive Using XenData Software and a Spectra Logic Archive

Archive Data Retention & Compliance. Solutions Integrated Storage Appliances. Management Optimized Storage & Migration

Data Protection: Understanding the Benefits of Various Data Backup and Recovery Techniques

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007

WHITE PAPER: customize. Best Practice for NDMP Backup Veritas NetBackup. Paul Cummings. January Confidence in a connected world.

PASIG May 12, Jacob Farmer, CTO Cambridge Computer

THE SUMMARY. ARKSERIES - pg. 3. ULTRASERIES - pg. 5. EXTREMESERIES - pg. 9

THE CASE FOR ACTIVE DATA ARCHIVING

Restoration Technologies. Mike Fishman / EMC Corp.

Business-Centric Storage FUJITSU Storage ETERNUS CS800 Data Protection Appliance

The Microsoft Large Mailbox Vision

E Number: E Passing Score: 800 Time Limit: 120 min

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

Understanding Disk Storage in Tivoli Storage Manager

Implementing a Digital Video Archive Based on XenData Software

Business-centric Storage FUJITSU Storage ETERNUS CS800 Data Protection Appliance

Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Library Recovery Center

TCO Case Study Enterprise Mass Storage: Less Than A Penny Per GB Per Year

STORAGE CENTER. The Industry s Only SAN with Automated Tiered Storage STORAGE CENTER

PARALLELS CLOUD STORAGE

White. Paper. Improving Backup Effectiveness and Cost-Efficiency with Deduplication. October, 2010

XenData Archive Series Software Technical Overview

Implementing an Automated Digital Video Archive Based on the Video Edition of XenData Software

How To Back Up A Computer To A Backup On A Hard Drive On A Microsoft Macbook (Or Ipad) With A Backup From A Flash Drive To A Flash Memory (Or A Flash) On A Flash (Or Macbook) On

Leveraging Virtualization for Disaster Recovery in Your Growing Business

POWER ALL GLOBAL FILE SYSTEM (PGFS)

Tiered Data Protection Strategy Data Deduplication. Thomas Störr Sales Director Central Europe November 8, 2007

(Scale Out NAS System)

Deduplication has been around for several

Backup Software Data Deduplication: What you need to know. Presented by W. Curtis Preston Executive Editor & Independent Backup Expert

Contingency Planning and Disaster Recovery

Storage Backup and Disaster Recovery: Using New Technology to Develop Best Practices

Overview of I/O Performance and RAID in an RDBMS Environment. By: Edward Whalen Performance Tuning Corporation

Tape s evolving data storage role Balancing Performance, Availability, Capacity, Energy for long-term data protection and retention

Introduction to Data Protection: Backup to Tape, Disk and Beyond

ClearPath Storage Update Data Domain on ClearPath MCP

Quantum DXi6500 Family of Network-Attached Disk Backup Appliances with Deduplication

Data Loss Prevention (DLP) & Recovery Methodologies

SOLUTION BRIEF KEY CONSIDERATIONS FOR BACKUP AND RECOVERY

A Virtual Tape Library Architecture & Its Benefits

A Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data center cost savings exercise

Network Attached Storage. Jinfeng Yang Oct/19/2015

Backup to Tape, Disk and Beyond. Jason Iehl, NetApp

IBM Global Technology Services September NAS systems scale out to meet growing storage demand.

XenData Product Brief: SX-550 Series Servers for LTO Archives

A Virtual Filer for VMware s Virtual SAN A Maginatics and VMware Joint Partner Brief

EMC DATA PROTECTION. Backup ed Archivio su cui fare affidamento

Traditionally, a typical SAN topology uses fibre channel switch wiring while a typical NAS topology uses TCP/IP protocol over common networking

Building Storage Service in a Private Cloud

Mosaic Technology s IT Director s Series: Exchange Data Management: Why Tape, Disk, and Archiving Fall Short

OPTIMIZING EXCHANGE SERVER IN A TIERED STORAGE ENVIRONMENT WHITE PAPER NOVEMBER 2006

Cost Effective Backup with Deduplication. Copyright 2009 EMC Corporation. All rights reserved.

Unitrends Recovery-Series: Addressing Enterprise-Class Data Protection

Ultra-Scalable Storage Provides Low Cost Virtualization Solutions

Symantec NetBackup Appliances

DATA PROGRESSION. The Industry s Only SAN with Automated. Tiered Storage STORAGE CENTER DATA PROGRESSION

Transcription:

Scalable Storage for Life Sciences Presented By: Jacob Farmer, CTO Cambridge Cputer Copyright 2009, Cambridge Cputer Services, Inc. All Rights Reserved 781-250-3000

About Your Lecturer Jacob Farmer, CTO, Cambridge Cputer 20+ years of experience with data protection, archiving, and storage management Hybrid of industry analyst and consultant to end-users. Spend 25% of my time working in the industry, going to conferences, meeting with vendors. 75% of my time custer-facing, helping the sales and services departments design solutions for end users. Lecturer at major trade shows and conferences Usenix, Interop, Storage Networking World, BioITWorld, others Travelling lecturer for Usenix (Usenix On-The-Road Lecture Series) Email: jfarmer@cambridgecputer.c 2

About My Cpany: A Different Way to Shop for Storage Solutions Manufacturers Support Resources VARS Multi-Vendor Solutions Consultants Expertise / Problem Solving 3

Agenda A Crash Course in Storage Networking Storage 101 Trends and New Developments in Storage Solid State Storage Devices File Systems Storage Tiers To Tier or Not to Tier Where to Insert Tiering Logic Backup and Archiving Various Topics on Backup Tape/Archival File Systems 4

Storage 101 A Crash Course in Storage Networking w w w. C a m b r i d g e C o m p u t e r. c o m 5

Files v. Blocks Blocks Least cmon deninator in conventional storage technologies. A block is a unit of data storage. Hard drives and RAID arrays serve requests for blocks. Files Objects consisting of multiple blocks. Blocks are organized into files by file systems, which are like databases of all of the files, their attributes and records of the blocks that make up the files. 6

Metadata Data that describes other data File system metadata consists of the names of the files and directories, the file attributes, permissions, etc. Backup system metadata are indexes (logs) of all of the files that were backed up, their location on tape, and perhaps copies of relevant file system metadata. 7

The Storage I/O Path A pplication Operating System File System Storage has a layered architecture, very much like a network stack. Disk drives store data in blocks. Each block has a unique numerical address. Disk devices (hard drives, RAID systems, etc.) are like block servers, meaning you ask them to perform operations on specific blocks. 8

File System Abstraction A pplication Operating System File System Abstraction layer or redirector is inserted above or next to the file system. Read and write cmands are handled or filtered by the file system redirector. Applications do not know or care where the file resides. As long as they get the data they were asking for! 9

Why Mess with Your File System? Network file services. Hierarchical and near-line file systems Allow files to be managed on removable media and jukeboxes. SAN file systems and Parallel file systems Allow file sharing at high speed and low latency over a SAN. Parallelize network file system I/O processing. Capacity Optimization Deduplication, Cpression Security / Regulatory Cpliance Encryption, Immutability (write-once) More robust or journaling file systems FAT FAT32 NTFS. Reiser, Ext3, VxFS, etc. 10

NAS - Network Attached Storage NAS = Fancy word for File Server Appliance Appliance = Fancy word for proprietary Proprietary = two different meanings Fancy word for you buy it all fr me at whatever price I dictate. Fancy word for sething better than you can build by yourself. Hardware and Software are bundled together Software is NOT a perpetual license If you replace the hardware, you might forfeit the software. If you want to change the software, you forfeit the hardware. NAS is a marketing term that allows storage vendors to play in the file server marketplace by packaging file servers as storage appliances. Storage vendor can cpete for revenue with the server vendor Storage vendor can cpete for revenue with the operating system vendor 11

Block-Level Abstraction Abstraction of the physical disk. File system asks for specific block addresses and volume manager fulfills request fr the disks. Software RAID Hard Drive Fault Tolerance Spindle Aggregation Host-based mirroring Volume-level snapshots Volume-level replication A pplication Operating System File System Volume Manager Disk Disk Disk 12

SAN Array = Centralized, External Abstraction Application Operating System File System Volume Manager Application Operating System File System Volume Manager Application Operating System File System Volume Manager SAN DISK CONTROLLER 13

Summary of the Storage I/O Path Application Operating System File System Volume Manager HBA/Drivers LAN/WAN Appliance External File System Out-Of-Band, Block-Level Appliance Server Virtualization SAN Switch Switch Enabled Virtualization Appliance SAN Appliance Disk Controllers Disk Spindles 14

Inserting Virtualization Logic into the Storage I/O Path LAN Operating System Volume Manager File System Application SAN 15

Trends and New Development w w w. C a m b r i d g e C o m p u t e r. c o m 16

Storage is a Vibrant Field Solid State Disks New Affordable Choices Performance characteristics that are 3x to 500x that of conventional hard drives The industry is looking for novel ways to insert SSD into storage system architectures File Systems Scalability without becing brittle Self-healing Data integrity assurance Ability to detect data corruption Ability to correct data corruption 17

Inserting SSD into the Mix 18

Tiered Storage, Workflow, and Life Cycle Management w w w. C a m b r i d g e C o m p u t e r. c o m 19

Definitions Tiered Storage Managing multiple types of storage devices to better batch costs, capabilities, and properties of storage devices to the data being stored on them Life cycle management Movement of data between tiers based on frequency of access or recentness of access Workflow management Movement of data autatically between stages of data processing 20

To Tier or not to Tier? Can all of your needs be met by a single tier solution? If you can meet all of your needs in a single tier, than do it. Example, SATA array with a heavily cached file system. Setimes the premium you pay for tiering technology outweighs the cost of growing the top tier. Tiers are typically thought of as defined by drive types: 15K, 10K, SATA, maybe SSD. But a tier can be any set of properties Data integrity assurance Addressing silent data corruption ces at a performance price. Fault tolerance Fault tolerance ces at a monetary price. Data Protection Policy Backups, snapshots, replication,etc. 21

Cmon Tiers Scratch space Funds are prioritized around performance instead of fault tolerance and data protection. Set expectations with users that if it breaks, they lose their data. As scratch systems get bigger, data loss might start being a serious problem. How much does it cost to re-run 250TB worth of data? User work space He directories and collaboration space. Needs to be backed up. Snapshots are nice. Archive A place to move data that is not in active use. Deep archive A place to bank your raw data in case you need to rerun experiments in the future. 22

Where Can You Insert Tiering Logic? In-band in the data network between clients and file servers Out-of-band in the data network In a file system or NAS But be careful to weigh the costs and consider the longterm cmitment to the specific product In a disk array Se disk arrays can manage tiered storage at the block-level 23

NAS Appliance with Integrated Virtual Namespace Files are autatically moved between tiers based on policies NAS Appliance with integrated virtual namespace and tiered storage Bulk Disk or CAS Tape Librar y 24

In-Band File System Virtualization In-band Virtualization Appliance or Switch File Servers / NAS 25

Out of Band File System Virtualization Virtual File System Server File Servers / NAS 26

General Recmendations on Tiered Storage Minimize the number of tiers you have The fewer tiers the easier to manage You can always add more tiers and management later Have at least two tiers: General purpose storage Secondary copy Maybe a scaled down version of your general purpose storage Maybe an archival/backup file system at a much lower cost Keep an eye on SATA-SSD cbinations NOTE: SSD might not be called out by name. Many caching file systems get the same benefit as SSD drives. 27

Backup and Data Protection w w w. C a m b r i d g e C o m p u t e r. c o m 28

The Importance of Backup Fault tolerance storage devices break all the time! Drive failures, controller failures, data corruptions File systems get brittle as they grow large The bigger they are the harder they fall Ideally, your backup system enables you to move a copy of the data off-site Your backup copy could eliminate doubt about the integrity of the data Backing up scratch space Many people don t, but there could be real lost productivity if scratch space were cprised. How much is sequencer time worth? 29

Backup Requirements / Policies What are your backup requirements? RPO Recovery Point Objective RTO Recovery Time Objective Retention Do you need to retain versions of files? How do you plan to access those files? Can the users help themselves? Can you do it through a file system snapshot? Are you worried about file system corruption and having to roll back the whole file system? Do all of your file systems or data types have the same backup requirements? Can you tier your storage around data protection requirements? 30

Backup / Offsite Storage Options Rsynch and its cmercial equivalents Replicate/copy your primary file system to a secondary file system Even better if target file system understands versions or can perform snapshots NAS with integrated snapshot and replication Conventional enterprise backup Mirrored or multi-site file system Setimes these are described as replicated opbjectbased storage systems 31

Limitation of Enterprise Backup Systems for Research Data Enterprise backup systems (Legato, NetBackup, etc.) perform repetitive full backups, even when data has not changed Enterprise users struggle with file systems as small as a few hundred Gigabytes Research data sets are too large for repetitive full backups The vast majority of research data does not typically change, so there is no benefit to repeated full backups. Note: TSM fr IBM is based on incremental backups. It is thus more suitable for research data, but still problematic. Data has to be restored before it can be accessed If you lose data, you have to restore it back to your disk before you can access it. This takes too long If you disk is broken, you have to first fix the disk before you can access data 32

Conventional Tape Backup/Restore is Too Slow An LTO-4 tape drive at full tilt, can theoretically back up 10TB per day One tape drive could theoretically back up a Petabyte in 100 days Yes, you can run multiple tape drives, but tape performance does not scale linearly and ultimately it beces hard to manage If it takes 100 days to backup a Petabyte to a single LTO-4 drive, how long is going to take to restore? We typically estimate restore time at 1.5x to 2x the backup time It is difficult to prioritize restore processes Generally, you restore everything before people go back to work Wouldn t it be nice if you could prioritize restore around the most recently accessed files? You can theoretically do this, but it would be a highly manual process. 33

Tape is Not Your Enemy Tape is not the enemy Conventional enterprise backup software is the enemy Tape is really inexpensive Incremental cost per TB with LTO-4 is as low as $50.00 It is practical to keep multiple copies of your data Yes, of course, there is a cost to tape libraries, management software, and labor, but the costs are very low cpared with disk systems Power and cooling Tape sits idle most of the time Idle tape libraries consume power akin to light bulbs Disk systems are akin to hair dryers 34

Tape File Systems Tape storage systems that emulate conventional file systems No need to restore files Access them directly fr tape Perhaps with the aid of a disk buffer Disk buffer can be large enough to accmodate likely working set of data Tape file systems can be used for backup as well as longterm redundant storage Tapes can be made redundant and a set can be stored off-site Guaranteed data integrity Data is broken up into shards and checksum is calculated If a file failed checksum test it is pulled fr alternative media Ingest rates of 3-5TB per day with just two tape drives 35

Archival File System with Both Disk and Tape Clients Archive Server Storage Connections: SAN, DAS, FC, iscsi Tape Disk Storage Disk cache for tape 36

Archival Tape File System Extended Between Two Sites 37

Multi-Tier Global File System Redundant Across Two Sites Global File System Scratch Clustered servers with local disk. Object-level redundancy across sites.. Working Enterprise NAS with snapshot and mirror. Archive Bulk SATA disk at site A. Tape cached by disk at site B. Object mirroring between sites. WAN, LAN, or MAN Site A Site B 38

Questions? w w w. C a m b r i d g e C o m p u t e r. c o m 39