Distributed RAID Architectures for Cluster I/O Computing. Kai Hwang




Distributed RAID Architectures for Cluster I/O Computing
Kai Hwang
Internet and Cluster Computing Lab., University of Southern California

Presentation Outline:
- Scalable Cluster I/O
- The RAID-x Architecture
- Cooperative Disk Drivers
- Benchmark Experiments
- Security and Fault Tolerance
- Conclusions

Scalable clusters providing SSI services are gradually replacing SMP, CC-NUMA, and MPP systems in servers, web sites, and database centers. (K. Hwang, March 15, 2001, Beijing)

Issues in Cluster Design:
- Size scalability (physical and application)
- Enhanced availability (failure management)
- Single system image (middleware, OS extensions)
- Fast communication (networks and protocols)
- Load balancing (CPU, network, memory, disk)
- Security and encryption (clusters of clusters)
- Distributed environment (user-friendly)
- Manageability (jobs and resources)
- Programmability (simple API required)
- Applicability (cluster- and grid-awareness)

The USC Trojans Cluster Project (Internet and Cluster Computing Lab., EEB Rm. 104)
- Sixteen Pentium PCs are housed in two 9-ft computer racks.
- All PCs run RedHat Linux v6.0 (kernel v2.2.5).
- All nodes are connected by 100 Mbps Fast Ethernet.
- The cluster is ported with DQS, LSF, MPI, PVM, TreadMarks, Elias, the NAS benchmarks, etc.
- Scalable to a future system with 100s of PC nodes interconnected by Gigabit networks.
http://andy.usc.edu/trojan/

Trojans Linux Cluster with Middleware for Security and Checkpoint Recovery (diagram): programming environments (Java, EDI, HTML, XML), a web/Windows user interface, and other subsystems (database, OLTP, etc.) sit on a single-system-image and availability infrastructure with security and checkpointing middleware, running over Linux on Pentium PC nodes joined by a Gigabit network interconnect.

An I/O-centric cluster architecture (diagram): clients reach an entry partition over the Internet/intranet; service flow passes through a service partition and data flow through a database partition, all connected by Fast Ethernet.

Distributed RAID Embedded in Clusters or Storage-Area Networks:
- The I/O bottleneck in scalable cluster computing: the gap between CPU/memory speed and disk I/O widens as microprocessors double in speed every year, and cluster applications are often I/O-bound.
- Disks attached to hosts are often made unavailable by failures of the hosts themselves. A distributed RAID achieves much higher availability through fault isolation, rollback recovery, and automatic file migration.

Distributed RAID with a single I/O space embedded in a cluster (diagram): workstations or PCs on a cluster network (SAN or LAN), each contributing its local disks to a shared DS-RAID.

Research Projects on Parallel and Distributed RAID
(System | RAID architecture and environment | Enabling mechanism for SIOS | Data consistency checking | Reliability and fault tolerance)
- USC Trojans RAID-x | orthogonal striping and mirroring in a Linux cluster | cooperative device drivers in the Linux kernel | locks at the device-driver level | orthogonal striping and mirroring
- Princeton TickerTAIP | RAID-5 with multiple controllers | single RAID server implementation | sequencing of user requests | parity checks in RAID-5
- Digital Petal | chained declustering in a Unix cluster | Petal device drivers at user level | Lamport's Paxos algorithm | chained declustering
- Berkeley Tertiary Disk | RAID-5 built with a PC cluster | xFS storage servers at the file level | modified DASH protocol in the xFS file system | SCSI disks with parity in RAID-5
- HP AutoRAID | hierarchical with RAID-1 and RAID-5 | disk array within a single controller | marks used to update the parity disk | mirroring and parity checks

Four RAID architectures using different mirroring and parity-checking schemes (diagram; Bi are data blocks, Mi their mirror images, Pi parity blocks):
(a) Striped mirroring in RAID-10
(b) Parity checking in RAID-5
(c) Orthogonal striping and mirroring (OSM) in RAID-x
(d) Skewed striping in a chained-declustering RAID
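The essential difference between panels (a)/(c)/(d) and panel (b) is mirroring versus parity. A minimal Python sketch (the 4-disk example, block contents, and the parity-rotation rule are illustrative assumptions, not taken from the slides) shows how a RAID-5 stripe is laid out and how a lost block is rebuilt by XOR-ing the surviving blocks with the parity block:

```python
from functools import reduce

def raid5_place(stripe, n_disks):
    """Map one stripe of n_disks-1 data blocks onto n_disks disks.
    The parity disk rotates with the stripe number (the exact
    rotation order is an assumption for illustration)."""
    parity_disk = (n_disks - 1) - (stripe % n_disks)
    data_disks = [d for d in range(n_disks) if d != parity_disk]
    return data_disks, parity_disk

def xor_blocks(blocks):
    """Parity is the bytewise XOR of all blocks in a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Demo: recover a lost block from the survivors plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]       # 3 data blocks on a 4-disk RAID-5
parity = xor_blocks(data)
lost = 1                                 # pretend the disk holding data[1] failed
survivors = [blk for i, blk in enumerate(data) if i != lost]
recovered = xor_blocks(survivors + [parity])
assert recovered == data[lost]
```

RAID-10 and OSM avoid this read-reconstruct cycle entirely by keeping a full second copy, which is why their small-write behavior differs in the table that follows.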

Theoretical Peak Performance of Four RAID Architectures
(n disks; B = per-disk bandwidth; R/W = single-block read/write time; m = number of requests; columns: RAID-10, RAID-5, chained declustering, RAID-x)
Max. I/O bandwidth:
- Read: nB | nB | nB | nB
- Large write: nB | (n-1)B | nB | nB
- Small write: nB | nB/2 | nB | nB
Parallel read/write time:
- Large read: mR/n for all four
- Small read: R for all four
- Large write: 2mW/n | mW/(n-1) | 2mW/n | mW/n + mW/n(n-1)
- Small write: 2W | R+W | 2W | W
Max. fault coverage: n/2 disk failures | single disk failure | n/2 disk failures | single disk failure
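The parallel large-write rows can be sanity-checked numerically. A small sketch, assuming the symbol readings above (W = single-block write time, m requests spread over n disks) and example values of our own choosing:

```python
def large_write_time(arch, m, n, W):
    """Parallel large-write completion time per the table.
    m = number of block writes, n = number of disks,
    W = time to write one block on one disk."""
    if arch == "RAID-10":
        return 2 * m * W / n                 # data copies plus mirror copies
    if arch == "RAID-5":
        return m * W / (n - 1)               # one disk per stripe holds parity
    if arch == "ChainedDeclustering":
        return 2 * m * W / n                 # two skewed copies of every block
    if arch == "RAID-x":
        # foreground stripe writes plus background mirror writes
        return m * W / n + m * W / (n * (n - 1))
    raise ValueError(arch)

# Example: n = 8 disks, m = 64 block writes, W = 10 ms per block.
times = {a: large_write_time(a, 64, 8, 0.010)
         for a in ("RAID-10", "RAID-5", "ChainedDeclustering", "RAID-x")}
```

Plugging in these example numbers shows why RAID-x wins large writes in the benchmarks later: its mirror traffic is deferred to the background term mW/n(n-1), which shrinks quickly as n grows.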

Distributed RAID-x architecture (diagram): four cluster nodes, each a processor/memory pair (P/M) with a CDD, on the cluster network; each node attaches three disks (D0-D11, twelve in all). Data blocks B0-B35 are striped horizontally across all twelve disks, while their mirror images M0-M35 are clustered orthogonally on disks attached to other nodes.
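The key OSM invariant visible in the figure is that a data block and its mirror image never reside on the same node. A toy placement function, using a simple rotate-to-the-next-node rule of our own invention rather than the exact RAID-x mapping, illustrates the invariant:

```python
def osm_place(block, n_nodes, disks_per_node):
    """Toy orthogonal striping and mirroring placement.
    Data block b is striped across all n_nodes*disks_per_node disks;
    its mirror image goes to the same disk offset on the *next* node,
    so a whole node (with all of its disks) can fail without losing
    both a block and its image.  This rotation rule is an illustrative
    assumption, simpler than the actual RAID-x layout in the figure."""
    n_disks = n_nodes * disks_per_node
    disk = block % n_disks
    node = disk // disks_per_node
    mirror_node = (node + 1) % n_nodes
    mirror_disk = mirror_node * disks_per_node + disk % disks_per_node
    return (node, disk), (mirror_node, mirror_disk)

# The invariant holds for every block (requires n_nodes >= 2):
for blk in range(72):                     # two full rotations of 36 blocks
    (node, _), (mirror_node, _) = osm_place(blk, 4, 3)
    assert node != mirror_node
```

Clustering a stripe's images on one node also means the background mirror writes are long sequential runs, which is what keeps the foreground write cost near mW/n in the peak-performance table.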

Single I/O space in a distributed RAID, enabled by CDDs at the Linux kernel level (diagram):
(a) Separate disks driven by independent disk drivers (IDDs), with a central NFS server on the interconnection network.
(b) A global virtual disk with a SIOS formed by cooperative disk drivers (CDDs) on the cluster nodes.

Remote disk access using a central NFS server versus using cooperative disk drivers in the RAID-x cluster (diagram):
(a) Parallel disk I/O using NFS in a server/client cluster: a user application passes through the kernel-level NFS client, across the network to the NFS server, and down to a traditional device driver (six steps in all).
(b) Using CDDs to achieve a SIOS in a serverless cluster: the NFS server is bypassed; the client-side CDD talks directly to the server-side CDD in the kernel.

Architectural design of the cooperative device drivers (diagram):
(a) Device masquerading: each node's CDD makes the physical disks of other nodes appear as local virtual disks, communicating through the network.
(b) CDD architecture: a data consistency module, a storage manager for the physical disks, and a CDD client module that communicates with remote CDDs through the network.
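A rough model of how CDDs form the single I/O space: a block request is served from the local disk when this node owns the block, and forwarded to the owning peer's CDD otherwise. The in-process dictionary standing in for the cluster network and the round-robin ownership rule are illustrative assumptions, not the kernel implementation:

```python
class CooperativeDiskDriver:
    """Toy cooperative disk driver: one instance per node; together
    they present every node's local disk as one shared block space."""

    def __init__(self, node_id, cluster, blocks_per_disk=1024):
        self.node_id = node_id
        self.cluster = cluster        # node_id -> CDD, shared by all nodes
        self.blocks_per_disk = blocks_per_disk
        self.local = {}               # this node's physical disk (block -> data)
        cluster[node_id] = self       # "attach" to the cluster network

    def owner(self, block):
        """Round-robin mapping of block groups to nodes (an assumption)."""
        return (block // self.blocks_per_disk) % len(self.cluster)

    def write(self, block, data):
        target = self.cluster[self.owner(block)]
        if target is self:
            self.local[block] = data  # local disk access
        else:
            target.write(block, data) # forwarded over the cluster network

    def read(self, block):
        target = self.cluster[self.owner(block)]
        if target is self:
            return self.local.get(block)
        return target.read(block)

# Any node can address any block: the write below lands on node 2's disk,
# and node 3 can read it back without knowing where it lives.
cluster = {}
nodes = [CooperativeDiskDriver(i, cluster) for i in range(4)]
nodes[0].write(3000, b"hello")
assert nodes[3].read(3000) == b"hello"
```

This is the "serverless" point of the previous slide: no central NFS server sits on the data path, only peer-to-peer CDD traffic.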

Maintaining consistency of the global directory /sios across all CDDs in the distributed RAID-x (diagram): each of the four cluster nodes runs an application over an identical replica of the /sios tree (dir1 and dir2, holding file1, file2, file3), kept consistent by the CDDs over the cluster network.
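One way to picture the consistency requirement: every node holds a full replica of the /sios metadata tree, and each update is applied to all replicas under a driver-level lock, so every application sees the same directory. The broadcast-to-all-replicas rule below is a deliberate simplification of the CDDs' actual protocol, which the slides do not detail:

```python
import threading

class SiosReplica:
    """Toy replica of the /sios global directory held by one node's CDD."""

    def __init__(self, cluster):
        self.cluster = cluster        # list of all replicas ("the network")
        self.tree = {}                # path -> file metadata
        self.lock = threading.Lock()  # stands in for the driver-level lock
        cluster.append(self)

    def create(self, path, meta):
        with self.lock:               # serialize this node's metadata updates
            for replica in self.cluster:
                replica.tree[path] = meta   # broadcast to every replica

# A file created on one node is immediately visible on all the others.
cluster = []
replicas = [SiosReplica(cluster) for _ in range(4)]
replicas[0].create("/sios/dir1/file1", {"size": 42})
assert all(r.tree == replicas[0].tree for r in replicas)
```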

Elapsed time executing the Andrew benchmark on the Linux cluster at USC (charts): elapsed time (seconds) versus number of clients (1, 4, 8, 12, 16) for the benchmark phases (make dir, copy files, scan dir, read file, compile). With NFS, elapsed time grows to about 35 s at 16 clients; with RAID-x it stays under about 8 s.

Parallel write performance of the four RAID architectures against traffic rate (chart): aggregate bandwidth (MB/s, up to ~18) versus number of clients (1-16) for parallel writes of 20 MB per client, comparing RAID-x, chained declustering, RAID-10, RAID-5, and NFS.

Parallel write performance of the four RAID architectures versus disk array size (chart): aggregate bandwidth (MB/s, up to ~18) versus number of disks (2, 4, 8, 12, 16) for parallel writes, comparing RAID-x, chained declustering, RAID-10, and RAID-5.

Achievable I/O Bandwidth and Improvement Factor on the Trojans Cluster
(1 client → 16 clients, improvement factor)
- NFS: large read 2.58 → 2.30 MB/s, 0.89; large write 2.11 → 2.77 MB/s, 1.31; small write 2.47 → 2.81 MB/s, 1.34
- RAID-x: large read 2.59 → 15.63 MB/s, 6.03; large write 2.92 → 15.29 MB/s, 5.24; small write 2.35 → 15.10 MB/s, 6.43
- Chained declustering: large read 2.46 → 15.80 MB/s, 6.42; large write 2.62 → 12.63 MB/s, 4.82; small write 2.31 → 12.54 MB/s, 5.43
- RAID-10: large read 2.37 → 10.76 MB/s, 4.54; large write 2.31 → 9.96 MB/s, 4.31; small write 2.27 → 9.98 MB/s, 4.39

Effects of stripe unit size on the I/O bandwidth of the RAID architectures (chart): aggregate bandwidth (MB/s, up to ~20) versus stripe unit size (16, 32, 64, 128 KB) for large writes (320 MB across 16 clients), comparing RAID-x, chained declustering, RAID-10, and RAID-5.

Bonnie benchmark results on the Trojans cluster (chart): file-rewrite output rate (MB/s, up to ~3.5) versus number of disks (2, 4, 8, 12, 16), comparing RAID-x, chained declustering, RAID-10, and RAID-5.

Securing networks, intranets, clusters, and grid resources with intrusion control and automatic recovery from malicious attacks (diagram): along the axis of increasing reliability, systems range from no data protection up to full fault tolerance; along the axis of increasing scalability, from an SMP cluster to an intranet to a grid. A cluster with no security protection is hardened first with a gateway firewall screening traffic between networks, and ultimately into a highly secured intranet with intrusion detection and response, automatic recovery from malicious attacks, and fault tolerance backed by distributed storage for reliable I/O.

Distributed Micro-Firewalls (diagram; source: Murali and Hwang, USC)

Distributed checkpointing on the RAID-x in the Trojans cluster (diagram): twelve processes are divided into three stripe groups (Stripe 0: processes 0-3; Stripe 1: processes 4-7; Stripe 2: processes 8-11) that take their checkpoints in turn along the time axis; C marks checkpointing overhead and S synchronization overhead.

Security Component Technologies:
- Firewalls and cryptography
- Cluster middleware for security
- Anti-virus and immune systems
- Intrusion detection and response
- Distributed software RAIDs
- Security and assurance policies

Distributed Intrusion Detection and Responses
(Security threat → effectiveness of micro-firewalls)
- Insider attacks → protect hosts against attacks from insiders
- Denial-of-service attacks → protect against denial-of-service attacks from any source
- Trojan programs → protect hosts from trapdoors planted by any source
- IP address spoofing → can be reconfigured to prevent IP spoofing at the client-host level
- Probes and scans → used with an IDS to block probes and scans close to their sources
- Unauthorized external access → can prevent unauthorized access to external networks at the source
- Attacks on intranet infrastructure → resist both internal and external attacks and provide fine-grained access control

Checkpointing overhead on distributed RAIDs (chart): checkpoint overhead (seconds, up to ~3.5) versus checkpoint file size (roughly 1.2-8.1 MB), comparing the NFS, Vaidya, and striped schemes.

Advantages and Shortcomings of Distributed Checkpointing
- Simultaneous writing to central storage (the NFS scheme). Advantages: simple, no inconsistent state. Shortcomings: network and I/O contention; NFS is a single point of failure. Suitable applications: small checkpoint files, few nodes, low I/O activity.
- Staggered writing to central storage (Vaidya's scheme). Advantages: eliminates network and I/O contention. Shortcomings: network bandwidth is wasted; NFS is still a single point of failure. Suitable applications: small checkpoint files, few nodes, low I/O activity.
- Striped staggered checkpointing on any distributed RAID (our scheme). Advantages: eliminates network and I/O contention, low checkpoint overhead, fully utilizes network bandwidth, tolerates multiple failures across stripe groups. Shortcomings: cannot tolerate multiple node failures within a single stripe group. Suitable applications: large checkpoint files, many nodes, low communication, I/O-intensive applications.
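The trade-off among the three schemes can be made concrete with back-of-the-envelope completion times. The bandwidth figures and overlap assumptions below are ours, chosen only to make the contention effect visible, not measurements from the Trojans cluster:

```python
def checkpoint_time(scheme, n_procs, ckpt_mb, server_bw_mbs, disk_bw_mbs, stripe=4):
    """Rough checkpoint completion time (seconds) for the three schemes.
    server_bw_mbs: bandwidth of the central NFS server;
    disk_bw_mbs:   per-disk bandwidth of the distributed RAID;
    stripe:        processes per stripe group in the striped scheme."""
    if scheme == "simultaneous-NFS":
        # All processes write at once; the central server is the bottleneck.
        return n_procs * ckpt_mb / server_bw_mbs
    if scheme == "staggered-NFS":
        # Vaidya: one writer at a time; same total server traffic, but
        # contention overhead (not modeled here) is removed.
        return n_procs * ckpt_mb / server_bw_mbs
    if scheme == "striped-staggered":
        # Stripe groups write in turn; within a group, each process
        # writes to its own disk in parallel.
        n_groups = -(-n_procs // stripe)          # ceiling division
        return n_groups * ckpt_mb / disk_bw_mbs
    raise ValueError(scheme)

# 16 processes, 8 MB checkpoints, a 10 MB/s server, 5 MB/s per disk:
central = checkpoint_time("simultaneous-NFS", 16, 8, 10, 5)
striped = checkpoint_time("striped-staggered", 16, 8, 10, 5)
```

Even with each distributed disk half as fast as the central server, the striped scheme finishes in half the time here, because the n/stripe parallel disks of each group replace one shared server.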

Conclusions:
- Distributed storage-area networks demand hardware or software support for a single I/O space, not only in clusters but also in pervasive information grids.
- Hierarchical checkpointing with striping and staggered mirroring builds fault-tolerant clusters that provide continuous network services.
- Hacker-proof clusters are in great demand for securing e-business, distributed computing, and metacomputing grid applications.
- New applications are being explored in multi-server consolidation, collaborative design, and pervasive network services.

Call for Participation: IEEE Third International Conference on Cluster Computing (CLUSTER 2001), Sutton Place Hotel, Newport Beach, California, October 8-11, 2001.