Scale and Availability Considerations for Cluster File Systems. David Noy, Symantec Corporation




SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced in their entirety without modification The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2

Abstract Scale and Availability Considerations for Cluster File Systems This session will appeal to server administrators looking to improve the availability of their mission-critical applications using a heterogeneous, cross-platform, low-cost tool. You will learn how to use a cluster file system to improve performance and availability in your application environment. 3

Simultaneous Access to a File System Without a clustered file system (a separate file system per server): a traditional file system can only be mounted on one server at any given time, otherwise data corruption will occur. With a clustered file system (Veritas Cluster File System HA): all servers that are part of the cluster can safely access the file system simultaneously.

How is a CFS different from NAS? Network Attached Storage (NAS): uses TCP/IP to share a file system over the local area network, with higher latency and overhead; the file system is mounted via a network file system protocol (CIFS on Windows, NFS on Unix). Cluster File System: looks and feels like a local file system, but is shared across multiple nodes; uses a Storage Area Network to share the data and dedicated network interfaces to share locking information; tightly coupled with clustering to create redundancy. With a cluster file system, the application can run on the same node as the file system to get the best performance.

Typical Performance Decay at Scale (chart: cost versus performance, server capacity against available capacity). Bottlenecks arise as a result of legacy technology; performance per node can degrade with more nodes; linear scalability is generally infeasible (results vary based on the application). 6

Expected Scalability for a CFS (chart: total throughput in MB/sec to the CFS versus number of nodes, for 1 to 16 nodes, on a 0-2500 MB/sec scale). 7

Legacy Meta-Data Management File system transactions are performed primarily by one master node per file system, and that node becomes the bottleneck in transaction-intensive workloads. The number of transactions performed locally can be improved, but the master remains the bottleneck. This gives acceptable performance for some workloads, but not for others.

Distributed Meta-Data Management All nodes in the cluster can perform transactions, with one intent log per node; there is no need to manually balance the load, and the distribution is not directly visible to end users. Performance tests show linear scalability, though scale-out becomes complex beyond 64 nodes. (Diagram: servers accessing metadata and data over the SAN, with a log per node.)
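One common way to spread metadata work across nodes is to hash each object's identifier to a mastering node, so that no single server owns all transactions. This is an illustrative sketch of the general technique, not necessarily how Veritas CFS distributes mastership; the node names and hashing scheme are assumptions:

```python
import hashlib
from collections import Counter

def metadata_master(object_id: str, nodes: list) -> str:
    """Map a file-system object to the node that masters its metadata.

    Hashing spreads mastership evenly across the cluster, so no single
    node becomes the transaction bottleneck of a legacy single-master
    design. The mapping is deterministic: every node computes the same
    master for a given object without coordination.
    """
    digest = hashlib.sha256(object_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster

# Deterministic: the same inode always lands on the same master.
assert metadata_master("inode:1234", nodes) == metadata_master("inode:1234", nodes)

# Across many objects, the load is roughly balanced.
load = Counter(metadata_master("inode:%d" % i, nodes) for i in range(10_000))
assert max(load.values()) < 2 * min(load.values())
```

A real distributed lock manager adds failover on top of this: when a node dies, its share of the mastership map is redistributed to the survivors.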

Without Range Locking: Exclusive Locking Node 1: writer holding an exclusive lock on file foo.bar. Node 2: writer waiting for the lock to be released. For appending writes (e.g. a log file), the lock is held by node 1, and other nodes do not have access until the lock is released.

Range Locking Node 1 and node 2 can both write to file foo.bar at the same time (appending writes, e.g. data ingest): node 1 holds a lock only on its own byte range, while the rest of the file remains available for read or write by any node.
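Range locking can be illustrated on a single host with POSIX advisory record locks (`fcntl.lockf`); a cluster file system extends the same idea across nodes via its distributed lock manager. A minimal sketch, with the two writer "nodes" simulated as two processes on one machine:

```python
import fcntl
import os
import tempfile

# Create a 200-byte file that both "nodes" will write to concurrently.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.truncate(200)
tmp.close()
path = tmp.name

def write_region(start: int, data: bytes) -> None:
    """Take an exclusive lock on [start, start+len(data)) only, then write.

    The rest of the file stays available to other writers -- the essence
    of range locking, as opposed to one exclusive whole-file lock.
    """
    with open(path, "r+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX, len(data), start, os.SEEK_SET)
        f.seek(start)
        f.write(data)
        fcntl.lockf(f, fcntl.LOCK_UN, len(data), start, os.SEEK_SET)

pid = os.fork()
if pid == 0:                        # child process plays "node 2"
    write_region(100, b"B" * 100)
    os._exit(0)
write_region(0, b"A" * 100)         # parent process plays "node 1"
os.waitpid(pid, 0)

with open(path, "rb") as f:
    content = f.read()
assert content == b"A" * 100 + b"B" * 100   # both appends landed intact
os.unlink(path)
```

Because the two locks cover disjoint ranges, neither writer ever blocks on the other; with a single exclusive whole-file lock, one of them would have to wait.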

Other Scalability Considerations Distributed meta-data management; distributed lock management; minimizing intra-cluster messaging; cache optimizations; reducing or eliminating fragmentation.

Availability: I/O Fencing (diagram: Node A and Node B sharing coordinator disks and data disks). 13

I/O Fencing: Split Brain (diagram: Node A and Node B, each still able to reach the coordinator disks and data disks). 14

I/O Fencing: Split Brain Resolved (diagram: Node A (Y) retains access to the coordinator and data disks; Node B (N) is fenced off). 15

Cluster Fencing Why fencing? Without it, a split brain leads to data corruption: writes need to be restricted to current, verified cluster members. SCSI-3 based fencing: uses SCSI-3 Persistent Reservation (PR) disks for I/O fencing, giving maximum data protection. Proxy based fencing: IP-based proxy servers provide non-SCSI-3 fencing, suited to virtualized environments. 16
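The coordinator-disk race that resolves a split brain can be modeled in a few lines. In a real deployment the "claim" is a SCSI-3 Persistent Reservation key ejection on each of an odd number of coordinator disks; the following is only a toy in-memory simulation with illustrative node names:

```python
class CoordinatorDisk:
    """A coordinator disk remembers the first node whose claim reached it."""

    def __init__(self):
        self.winner = None

    def claim(self, node: str) -> bool:
        if self.winner is None:
            self.winner = node
        return self.winner == node

def resolve_split_brain(disks: list, race_order: list) -> str:
    """Replay claim attempts in arrival order; the node holding a majority
    of the (odd-numbered) coordinator disks survives, and the loser must
    panic before it can corrupt the shared data disks."""
    for node, disk_index in race_order:
        disks[disk_index].claim(node)
    nodes = {node for node, _ in race_order}
    wins = {n: sum(d.winner == n for d in disks) for n in nodes}
    return max(wins, key=wins.get)

disks = [CoordinatorDisk() for _ in range(3)]
# nodeA's ejections reach disks 0 and 1 first; nodeB only wins disk 2.
order = [("nodeA", 0), ("nodeB", 2), ("nodeA", 1),
         ("nodeB", 0), ("nodeB", 1), ("nodeA", 2)]
assert resolve_split_brain(disks, order) == "nodeA"
```

An odd disk count guarantees one side gets a strict majority, so exactly one subcluster survives every race.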

Other Availability Considerations A robust fencing mechanism; tight integration between application clustering and the storage layer (file system, volume manager, multi-pathing); quick recovery of CFS objects (lock recovery, meta-data redistribution).

Cluster File System Use Cases Faster failovers (databases, messaging applications, CRM applications): dramatically reduce application failover times and, when appropriate, eliminate the cost and complexity of parallel databases. Performance for parallel apps (ETL applications, grid applications, parallel databases): linear scalability and performance for parallel applications; ideal for grid compute. Improve storage utilization: pool storage between servers, eliminating storage islands and multiple copies of data.

Accelerate Application Recovery For applications that need maximum uptime (databases, messaging applications, CRM applications), classic clustering requires time. With classic clustering and a traditional file system, failover from the failed active server to the passive server involves these recovery steps: detect failure; unmount the file system; deport the disk group; import the disk group; mount the file system; start the application; clients reconnect.

Accelerate Application Recovery For applications that need maximum uptime (databases, messaging applications, CRM applications), failover becomes as fast as an application restart. With Veritas Cluster Server and Veritas Storage Foundation Cluster File System, the file system is already mounted on the standby server, so the unmount, deport, import, and mount steps drop out of the recovery sequence, leaving: detect failure; start the application; clients reconnect.

Case Study: Reduced Downtime Cost Before: a ground control system (SAP and Oracle with HA, using CRM and databases) under traditional high availability, with 6 failovers per year at 30 minutes each. Downtime cost = 1,000,000 per plane per day; total downtime per year = 3 hours; fleet = 240 planes; downtime cost = 1,000,000/plane/day x 3 hrs x 240 planes = 30,000,000. After: fast failover with CFS HA, with less than 1 minute of downtime per failover. Total downtime per year = 6 minutes; downtime cost = 1,000,000/plane/day x 6 min x 240 planes = 1,000,000.
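The cost figures above follow from pro-rating the per-plane daily cost over the yearly downtime; a quick check of the arithmetic:

```python
COST_PER_PLANE_PER_DAY = 1_000_000
PLANES = 240
FAILOVERS_PER_YEAR = 6

def downtime_cost(minutes_per_failover: float) -> float:
    """Yearly downtime cost, pro-rating the per-day figure by minutes down."""
    total_minutes = FAILOVERS_PER_YEAR * minutes_per_failover
    return COST_PER_PLANE_PER_DAY * PLANES * total_minutes / (24 * 60)

assert downtime_cost(30) == 30_000_000   # classic HA: 6 x 30 min = 3 hours/year
assert downtime_cost(1) == 1_000_000     # CFS fast failover: 6 x 1 min/year
```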

Case Study: Large Casino Messaging Services A casino used messaging services, running on a fault-tolerant cluster messaging server, to operate its entire gaming floor. When a server outage occurred, storage failover was very quick, but the failover server required 20 minutes to recover and rebuild the message queue, and the entire casino floor came to a complete stop. With CFS (messaging servers on Veritas Storage Foundation Cluster File System over a shared SAN), a hot-standby server could recover in seconds instead of 20 minutes.

Case Study: Reduce Billing Cycle Without CFS (OSS, Customer Care, ETL, and Billing each on its own storage), the drawbacks: the time required to process customer billing included 12 hours of copy time, and for billing systems, time is money; redundant copies of data at each server meant 2x the storage requirement. With CFS, the benefits: copy time is eliminated, so one process can start when another completes; a single copy of data is shared among the servers; a 12-hour reduction in the billing cycle.

Case Study: Storage Consolidation Traditional application architecture (file system islands): storage is zoned to each server (e.g. one PROD and three POST PROC servers, each with spare capacity), giving 4 x 500 GB islands = 2 TB. Storage is over-provisioned due to unknown growth needs, and new storage must be provisioned whenever an island fills. Shared storage architecture (shared cluster file system): storage is accessible to all nodes, reducing upfront over-provisioning; all nodes share common free space, minimizing idle server and storage resources. The same workload fits in a 1.5 TB pool, 25% less.

Availability and Scale for Clustered NAS High availability of NFS and CIFS service, with distributed NFS load balancing and lock coordination for performance. Scale servers and storage independently: more servers give near-linear performance, and storage growth stays flexible. Stretch your NFS/CIFS cluster up to 100 km active/active. Choose your platform (Solaris, Solaris x86, AIX, RHEL 5). Integrated with Dynamic Storage Tiering, Dynamic Multi-Pathing, and Thin Provisioning Reclamation. Increased price/performance compared to similar NAS appliances.

Q&A / Feedback Please send any questions or comments on this presentation to SNIA: trackfilemgmt@snia.org. Many thanks to the following individuals for their contributions to this tutorial: Karthik Ramamurthy, David Noy. - SNIA Education Committee 26