Storage Design for High Capacity and Long Term Storage. DLF Spring Forum, Raleigh, NC May 6, 2009. Balancing Cost, Complexity, and Fault Tolerance



Similar documents
Scalable Storage for Life Sciences

Alternatives to Big Backup

Object Storage, Cloud Storage, and High Capacity File Systems

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

Optimizing Large Arrays with StoneFly Storage Concentrators

Overview of I/O Performance and RAID in an RDBMS Environment. By: Edward Whalen Performance Tuning Corporation

WHITEPAPER: Understanding Pillar Axiom Data Protection Options

List of Figures and Tables

June Blade.org 2009 ALL RIGHTS RESERVED

How To Protect Data On Network Attached Storage (Nas) From Disaster

Accelerating Applications and File Systems with Solid State Storage. Jacob Farmer, Cambridge Computer

technology brief RAID Levels March 1997 Introduction Characteristics of RAID Levels

Object Storage: Out of the Shadows and into the Spotlight

Get Success in Passing Your Certification Exam at first attempt!

The safer, easier way to help you pass any IT exams. Exam : Storage Sales V2. Title : Version : Demo 1 / 5

Eliminating Backup System Bottlenecks: Taking Your Existing Backup System to the Next Level. Jacob Farmer, CTO, Cambridge Computer

EMC Backup and Recovery for Microsoft SQL Server 2008 Enabled by EMC Celerra Unified Storage

How to choose the right RAID for your Dedicated Server

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

an introduction to networked storage

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

E Number: E Passing Score: 800 Time Limit: 120 min

EMC arhiviranje. Lilijana Pelko Primož Golob. Sarajevo, Copyright 2008 EMC Corporation. All rights reserved.

Real-time Compression: Achieving storage efficiency throughout the data lifecycle

EMC Invista: The Easy to Use Storage Manager

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Long term retention and archiving the challenges and the solution

PASIG May 12, Jacob Farmer, CTO Cambridge Computer

HP and Mimosa Systems A system for archiving, recovery, and storage optimization white paper

Building Storage Service in a Private Cloud

How To Virtualize A Storage Area Network (San) With Virtualization

A Deduplication File System & Course Review

Breaking the Storage Array Lifecycle with Cloud Storage

INCREASING EFFICIENCY WITH EASY AND COMPREHENSIVE STORAGE MANAGEMENT

Traditionally, a typical SAN topology uses fibre channel switch wiring while a typical NAS topology uses TCP/IP protocol over common networking

Achieving High Availability & Rapid Disaster Recovery in a Microsoft Exchange IP SAN April 2006

Snapshot Technology: Improving Data Availability and Redundancy

Efficient Backup with Data Deduplication Which Strategy is Right for You?

Making the Move to Desktop Virtualization No More Reasons to Delay

The Modern Virtualized Data Center

Server and Storage Virtualization with IP Storage. David Dale, NetApp

(Scale Out NAS System)

DAS (Direct Attached Storage)

HP ProLiant Storage Server family. Radically simple storage

NETWORK ATTACHED STORAGE DIFFERENT FROM TRADITIONAL FILE SERVERS & IMPLEMENTATION OF WINDOWS BASED NAS

Scalable NAS for Oracle: Gateway to the (NFS) future

Scalable Windows Server File Serving Clusters Using Sanbolic s Melio File System and DFS

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

VERITAS Volume Manager Delivering High Availability Storage Management Tools for Microsoft Windows

Distributed File System Choices: Red Hat Storage, GFS2 & pnfs

Best Practices for Managing Storage in the Most Challenging Environments

Contingency Planning and Disaster Recovery

High Availability with Windows Server 2012 Release Candidate

Introduction to Data Protection: Backup to Tape, Disk and Beyond

Brian LaGoe, Systems Administrator Benjamin Jellema, Systems Administrator Eastern Michigan University

Uncompromised business agility with Oracle, NetApp and VMware

Library Recovery Center

SAN & NAS Virtualization. Yakov Cohen Regional Technology Consultant, AMESA EMC

Symantec Enterprise Vault And NetApp Better Together

Cost Effective Backup with Deduplication. Copyright 2009 EMC Corporation. All rights reserved.

Dell PowerVault DL2200 & BE 2010 Power Suite. Owen Que. Channel Systems Consultant Dell

Network Attached Storage. Jinfeng Yang Oct/19/2015

nexsan NAS just got faster, easier and more affordable.

Technical Note. Dell PowerVault Solutions for Microsoft SQL Server 2005 Always On Technologies. Abstract

How To Back Up A Computer To A Backup On A Hard Drive On A Microsoft Macbook (Or Ipad) With A Backup From A Flash Drive To A Flash Memory (Or A Flash) On A Flash (Or Macbook) On

Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication Software

StoneFly SCVM TM for ESXi

Virtualization, Business Continuation Plan & Disaster Recovery for EMS -By Ramanj Pamidi San Diego Gas & Electric

Energy Efficient Storage - Multi- Tier Strategies For Retaining Data

The Microsoft Large Mailbox Vision

GPFS Storage Server. Concepts and Setup in Lemanicus BG/Q system" Christian Clémençon (EPFL-DIT)" " 4 April 2013"

PIONEER RESEARCH & DEVELOPMENT GROUP

WHITE PAPER. Dedupe-Centric Storage. Hugo Patterson, Chief Architect, Data Domain. Storage. Deduplication. September 2007

Quantum DXi6500 Family of Network-Attached Disk Backup Appliances with Deduplication

Enhancements of ETERNUS DX / SF

Storage Technologies for Video Surveillance

Storage Virtualization

How Does the ECASD Network Work?

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures

Technology Insight Series

EMC DATA DOMAIN EXTENDED RETENTION SOFTWARE: MEETING NEEDS FOR LONG-TERM RETENTION OF BACKUP DATA ON EMC DATA DOMAIN SYSTEMS

White. Paper. Improving Backup Effectiveness and Cost-Efficiency with Deduplication. October, 2010

Business Process Desktop: Acronis backup & Recovery 11.5 Deployment Guide

Cisco Small Business NAS Storage

Understanding Storage Virtualization of Infortrend ESVA

Reliability and Fault Tolerance in Storage

CASS COUNTY GOVERNMENT. Data Storage Project Request for Proposal

Created By: 2009 Windows Server Security Best Practices Committee. Revised By: 2014 Windows Server Security Best Practices Committee

SAN Conceptual and Design Basics

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows

Module: Business Continuity

EMC DATA DOMAIN OPERATING SYSTEM

UniFS A True Global File System

VERITAS Backup Exec 9.0 for Windows Servers

Transcription:

Storage Design for High Capacity and Long Term Storage Balancing Cost, Complexity, and Fault Tolerance DLF Spring Forum, Raleigh, NC May 6, 2009 Lecturer: Jacob Farmer, CTO Cambridge Computer Copyright 2009, Cambridge Computer Services, Inc. All Rights Reserved www.cambridgecomputer.com 781-250-3000

My Background 20+ years experience working with data protection, archiving, and storage technologies CTO of Cambridge Computer since 1991 Cambridge Computer provides consulting and a variety of services with a focus on data storage. 25% of my time is spent circulating in the industry, meeting with vendors, and researching trends 75% of my time is end-user facing My work is across all industries. About 50% of my business is higher education Often get involved in projects with the libraries Plenty of other industries struggle to store and manage digital objects Lecturer for Usenix www.cambridgecomputer.com 2

My Contact Info Jacob Farmer, CTO Cambridge Computer 271 Waverley Oaks Road Waltham, MA 02452 Jfarmer@CambrigdeComputer.com Twitter: JacobAFarmer www.cambridgecomputer.com 3

A Little Plug for Usenix I volunteer my time to lecture for the Usenix Association Usenix is a nonprofit whose mission is the advancement of information technology. Usenix on the Road A free program that brings Usenix storage technology tutorials to college campuses. Topics include: Tiered Storage and Archiving: Best Practices for Life Cycle Management and Digital Preservation Next Generation Storage Networking: Beyond Conventional SAN & NAS Eliminating Backup System Bottlenecks www.cambridgecomputer.com 4

Pain Points for Long Term Storage Conventional file storage technology does not scale up gracefully. Disk-based file systems are limited in capacity and as they grow they become more prone to failure and corruption. The bigger they are the harder they fall. Corruption is hard to detect and harder to fix once you detect it. Conventional backups do not scale. When your disk capacity doubles, you have to double both the capacity AND the performance of the backup system. Unconventional storage technology tends to be expensive. Not just acquisition cost, but total cost of ownership www.cambridgecomputer.com 5

Saving Graces Enterprise SAN and NAS systems are not necessarily the solution. These products are designed to handle the complexity and diversity of a large data center. They might have some useful properties, but you can find their essential value elsewhere. Save your money! Fixed content is easier to manage than dynamic content. Files that do not change are easier to back up and replicate. www.cambridgecomputer.com 6

Cost = Not Just Acquisition Cost Maintenance Is it affordable. What if you opt not to pay it? Upgrades How much do they cost relative to acquisition cost? Opportunity costs As storage gets cheaper and cheaper, does your system allow you to take advantage of the declining costs? Product life cycle Are you forced to replace your equipment at an inopportune time? What costs are associated with replacing it? Data Migration? Environmental Power, cooling, physical space Operational costs www.cambridgecomputer.com 7

What Do We Care About Other than Costs? Fault tolerance Uptime Data Protection / Redundancy Data Integrity Assurance Performance But maybe we have a high performance version of the file that is separate from the preservation format Operational Efficiency / Ease of Use Environmental Rack space, power, and cooling www.cambridgecomputer.com 8

Storage 101 w w w. C a m b r I d g e C o m p u t e r. c o m 9

Files v. Blocks Blocks Least common denominator in conventional storage technologies. A block is a unit of data storage. Hard drives and RAID arrays serve requests for blocks. Files Objects consisting of multiple blocks. Blocks are organized into files by file systems, which are like databases of all of the files, their attributes and records of the blocks that make up the files. www.cambridgecomputer.com 10

The Storage I/O Path Application Operating System File System Storage has a layered architecture, very much like a network stack. Disk drives store data in blocks. Each block has a unique numerical address. Disk devices (hard drives, RAID systems, etc.) are like block servers, meaning you ask them to perform operations on specific blocks. www.cambridgecomputer.com 11

Network File Services CIFS or NFS Client File Server Application LAN Server Other Apps Operating System LAN Operating System File System LAN Client File System Disk Disk www.cambridgecomputer.com 12

NAS = Fancy Word for File Server NAS = Network Attached Storage Marketing term to refer to a file server appliance. Appliance Marketing term to describe a turnkey product based on hardware and software. Typically, the software and the hardware are inextricable. When you refresh the hardware, you replace the software. When you want to replace the software, you dump the hardware. www.cambridgecomputer.com 13

File Systems and Their Limitations Conventional file systems do not scale very well. Microsoft NTFS can address up to 256TB of capacity in a single file system. But no one in their right mind would ever do this. Leading NAS products typically do not offer more than a few TB in a single volume. Large file systems can become corrupt and it is very difficult to identify and root out corruption. Backing up large file systems takes forever. Restore takes even longer! www.cambridgecomputer.com 14

Block-Level Abstraction Abstraction of the physical disk. File system asks for specific block addresses and volume manager fulfills request from the disks. Software RAID Hard Drive Fault Tolerance Spindle Aggregation Host-based mirroring Volume-level snapshots Volume-level replication Application Operating System File System Volume Manager Disk Disk Disk www.cambridgecomputer.com 15

RAID Arrays = Hardware-Enabled Block Abstraction External Array Application Operating System File System Volume Manager Internal Array Application Operating System File System Volume Manager DISK CONTROLLER RAID CARD www.cambridgecomputer.com 16

SAN Array = Centralized, External Abstraction Application Operating System File System Volume Manager Application Operating System File System Volume Manager Application Operating System File System Volume Manager SAN DISK CONTROLLER www.cambridgecomputer.com 17

LUNS Logical Hard Drives Carved Out of a Disk Array LUN = Logical Unit Number Subset of the SCSI ID SCSI ID = Street Address LUN = Apartment Number 6TB Stripe LUN 0 LUN 1 LUN 2 SPARE SPARE SPARE www.cambridgecomputer.com 18

The Layers Web Server Web Server Asset Management File System File System LUN LUN LUN LUN LUN LUN LUN www.cambridgecomputer.com 19

Another 3-Letter Acronym - CAS CAS Originally coined by EMC to describe their Centera Product CAS = Content Addressable Storage The file is addressed by a unique number that is derived by hashing the file. New meaning Content Aware Storage All this means is that the storage sub-system knows that it is holding a bunch of objects and the management logic is tailored to object management. Other than that, there is no official industry definition. www.cambridgecomputer.com 20

Reducing the Size of the Redundancy Unit Clustered Object Storage Normal Operation Drive Failture Small units are stored redundantly across the pool of drives and cabinets. Recovery from a failed drive is highly parallel. www.cambridgecomputer.com 21

Properties Associated with Content- Aware Storage Self healing Immutability (write once) Single instance storage a/k/a deduplication File integrity checking If redundant copies are stored, corrupt files might be automatically replaced with a non-corrupt file Many CAS systems offer a gateway that emulates a conventional file system. www.cambridgecomputer.com 22

File System Gateway in Front of Replicated CAS File System Gateway LAN WAN LAN www.cambridgecomputer.com 23

Storage Logic Can Live Pretty Much Anywhere LAN Application Operating System File System Volume Manager SAN www.cambridgecomputer.com 24

Three Ways to Grow Big Monoliths, Aggregates, and Federations w w w. C a m b r I d g e C o m p u t e r. c o m 25

Monoliths Monoliths are single systems that meet all of your foreseeable needs. Monoliths are relatively easy, very reliable, and some scale pretty big (1-2PB) As monoliths get really big, you encounter some problems. Might be expensive Replacement at end of life cycle can be painful Much harder to extract a wisdom tooth than an incisor If they break, it can be really messy The moment you outgrow your monolith, you lose the benefits of having a monolith. www.cambridgecomputer.com 26

A Simple Monolithic File Server Single File System SPARE SPARE SPARE www.cambridgecomputer.com 27

Replication and Backup of a Monolithic File Server Replication Replication can be as simple as a file copy. Once a day, copy all files that were added since yesterday. Alternatively, there are directory synchronization solutions on the market that are very inexpensive. Backups Since the files do not change, there is no need to do full backups once a week. Only back up unique files. www.cambridgecomputer.com 28

Using Archival File System (or CAS) as Backup Target for a Large File Server LAN High performance file system or NAS Tiered Archival File System www.cambridgecomputer.com 29

Aggregates Aggregates are systems built of individual components that together behave like a monolithic system. No one piece of the system can work by itself. The challenge with aggregates is that if any one component fails, the whole system could be compromised The more components you have, the higher the probability of failure Aggregates require some kind of redundancy model www.cambridgecomputer.com 30

What Happens When You Outgrow the Monolith SPARE SPARE SPARE SPARE SPARE SPARE SPARE SPARE SPARE The more pieces you have, the more likely you are to have a piece fail. www.cambridgecomputer.com 31

Aggregated File System with Parity Protection Data is striped across members of the aggregate with a parity scheme that allows one or more boxes to fail. Usable capacity is N-1. Some systems allows N-2, N-3, etc. www.cambridgecomputer.com 32

Aggregated System with 1:1 Redundancy You lose 50% of your capacity. Maybe you go 3- way redundancy and lose 66% www.cambridgecomputer.com 33

1:1 Redundancy Between Sites HA, Fault Tolerance, and DR in One LAN, MAN, WAN www.cambridgecomputer.com 34

N-Way Redundancy Mirroring Between Disk and Tape LAN/MAN LAN/MAN Meta Data and Device Management Meta Data and Device Management www.cambridgecomputer.com 35

Federations A logical grouping of independent systems that behave as one global file system. Different from aggregates in that each member of the federation is a free-standing element. If the federations were to dissolve or melt down, data is still intact and accessible from the individual members. Some federated solutions allow you to interact directly with the members. This is particularly useful when federating with high performance file systems. Members of a federation can have very different properties. www.cambridgecomputer.com 36

In-Band File System Federation In-band Virtualization Appliance or Switch File Servers / NAS www.cambridgecomputer.com 37

Out-of-Band File System Federation Virtual File System Server File Servers / NAS www.cambridgecomputer.com 38

Life Cycle Management with Federated Virtual File Systems Most virtual file systems offer some kind of integrated tiering logic Might work on the directory level or the file level. Policy engines will vary in capabilities Sophisticated features: Write staging new data is written into high performance pool. Transparent download to local disk for faster processing Management of files across a wide area Web portal Caching and management of local copies www.cambridgecomputer.com 39

Content Classification with Federated, Virtual File Systems Extensible metadata The central database can typically be customized to serve as a catalog of all files in the file system. Users can annotate their files and directories. Metadata can be associated with files as they are created and used. Tags from embedded file metadata can be ingested into the file system catalog TIFF images, DICOM, etc. Virtual directories can be created with a SQL query. www.cambridgecomputer.com 40

Other Useful Features of Federated File Systems Federated user directories Share files between different institutions with a single signon. File access auditing Backup Maintain backup copies locally or in other sites Self service delegation of file access privileges Users can easily grant access to their files to other users and groups. www.cambridgecomputer.com 41