1 Storage Devices Summary



1 Storage Devices Summary

Dependability is vital. Suitable measures:
- Latency: how long until the first bit arrives.
- Bandwidth/throughput: how fast data comes through after the latency period.
- Obvious corollary: bandwidth for small blocks is lower than for large blocks.
Manufacturers quote favourable numbers (for as long as they can get away with it).
Multimedia processing on the desktop may need excellent throughput. Desktop performance is dominated by response time; server performance by throughput.

Dependability:
- A fault is the failure of a component. Component failure does not necessarily lead to system failure; system failure leads to service interruption.
- Restoration is the return to service accomplishment.
- Whirlwind was said to provide 35 hours a week at 90% availability, which in 1951 was excellent.
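The corollary about block size can be made concrete: the fixed latency is amortised over the block, so small blocks see far less than the quoted peak rate. A minimal sketch, with illustrative figures that are assumptions rather than numbers from the notes (5 ms latency, 100 MB/s peak):

```python
# Effective bandwidth of a single transfer: latency is paid once per request,
# so the achieved rate depends strongly on block size.
def effective_bandwidth(block_bytes, latency_s=5e-3, peak_bps=100e6):
    transfer_s = block_bytes / peak_bps
    return block_bytes / (latency_s + transfer_s)

small = effective_bandwidth(4_000)        # 4 KB block
large = effective_bandwidth(100_000_000)  # 100 MB block
print(f"4 KB block:   {small / 1e6:.2f} MB/s")
print(f"100 MB block: {large / 1e6:.2f} MB/s")
```

With these figures the 4 KB block achieves under 1 MB/s while the 100 MB block approaches the 100 MB/s peak, which is why manufacturers prefer to quote large-block numbers.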

2 Storage Devices Calculating Dependability

Manufacturers quote mean time to failure (MTTF), which depends on the device. Service interruption is measured by mean time to repair (MTTR), which depends on your organisation.
- Mean time between failures: MTBF = MTTF + MTTR
- Availability = MTTF / (MTTF + MTTR)
Availability is something you can manage: if MTTR = 0, availability is 100% even with a small MTTF. Rapid replacement may trump high dependability, and be cheaper. Remember to include staff costs.
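The availability formula above is easy to exercise; the MTTF and MTTR figures below are illustrative assumptions, not vendor data:

```python
def availability(mttf_hours, mttr_hours):
    # Availability = MTTF / (MTTF + MTTR); note MTBF = MTTF + MTTR.
    return mttf_hours / (mttf_hours + mttr_hours)

# A disk with a quoted MTTF of 1,000,000 hours, repaired within 24 hours:
print(f"{availability(1_000_000, 24):.6f}")
# Shrinking MTTR matters as much as raising MTTF: even a modest MTTF
# gives high availability if repair is fast.
print(f"{availability(10_000, 1):.6f}")
```

This is the point the slide makes: MTTR is under your control in a way MTTF is not.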

3 Storage Devices Access

Sector layout: sector ID; data (a fixed number of bytes, e.g. 512); error-correcting code; synchronisation field and gaps.
- ECC is handled on the controller, so the system may never see errors: remapping of sectors, forward error correction.
- Need to monitor for errors so disks can be replaced before failure.

Access time components:
- Queuing delay, if other accesses are pending. On-board control may re-arrange accesses elevator-fashion; again there is a conflict between serving each request as fast as possible and serving as many as possible.
- Seek: head movement. All heads move together, hence cylinders; arrange data in cylinders for fast retrieval.
- Rotational latency.
- Data transfer.
- Controller overhead (error correction).
Actual seek time may be better than the average because of locality. Drives may include cache memory, which acts in a similar way to a CPU cache: fetching more data than required, pre-fetching subsequent blocks. Clever controllers make it hard to predict what will happen. How does CPU prefetching interact with the disk cache?
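The components listed above add directly. A minimal sketch, using assumed figures for a generic drive (7,200 rpm, 8 ms average seek, 100 MB/s transfer, 0.2 ms controller overhead):

```python
# Average access time = seek + rotational latency + transfer + controller
# overhead. Rotational latency averages half a revolution.
def access_time_ms(seek_ms, rpm, block_bytes, transfer_bps, overhead_ms):
    rotational_ms = 0.5 * (60_000 / rpm)          # half a revolution, in ms
    transfer_ms = block_bytes / transfer_bps * 1000
    return seek_ms + rotational_ms + transfer_ms + overhead_ms

# A 4 KB read: seek and rotation dominate, transfer is almost negligible.
print(f"{access_time_ms(8, 7200, 4096, 100e6, 0.2):.2f} ms")
```

Note how little the transfer term contributes for a small block, which is the same effect as the small-block bandwidth corollary on the first slide.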

4 I/O Commands

Devices are managed by special-purpose hardware that transfers data and synchronises with software.
- Command registers control the device.
- Status registers give information, including errors.
- Data registers transfer data to and from the device.
These registers can often only be accessed in kernel mode.

Memory-mapped I/O: the registers are in the same address space as memory, and an address decoder distinguishes between them. This simplifies programming; the hard work is delegated to the OS.

Interrupts: when a device is ready (or in error) the CPU is interrupted and must handle it. Devices can be priority-ordered, so one device may interrupt another.

Direct memory access (DMA) gives high-speed transfer and is very useful for large transfers: the OS provides a starting address and the controller, working autonomously, transfers the data, interrupting on completion.
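On real hardware, memory-mapped registers are accessed by the kernel at physical addresses, usually from C. As a sketch of the idea only, we can simulate a register bank with an ordinary memory-mapped file; the register offsets (CMD, STATUS) are invented for illustration:

```python
import mmap
import struct
import tempfile

# Invented register offsets within the simulated register bank.
CMD, STATUS = 0x00, 0x04

with tempfile.TemporaryFile() as f:
    f.truncate(4096)
    regs = mmap.mmap(f.fileno(), 4096)    # stand-in for mapped device memory
    # The "driver" writes a command register with an ordinary memory store...
    struct.pack_into("<I", regs, CMD, 0x1)
    # ...and the "device" would respond by setting a status register.
    struct.pack_into("<I", regs, STATUS, 0x1)
    status, = struct.unpack_from("<I", regs, STATUS)
    print(f"status = {status}")
    regs.close()
```

The point of the simulation: to the program, driving the device is just loads and stores at particular addresses, which is exactly what memory-mapped I/O provides.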

5 I/O Performance

Measuring performance: make sure you are measuring what you think you are.
- Disk controllers may re-order requests and pre-fetch results.
- The bus used has an influence, as do the OS and the application.
- DBMSs tend to do clever things with disks; the results for a database may not reflect a simpler application.
Response time and throughput are antagonistic. Choose benchmarks to match the application, and beware of tuning.

The Transaction Processing Performance Council (TPC) develops benchmarks to estimate database performance: small amounts of data transferred (DBMS queries). These measure I/O rate, not data rate, and may count overheads as part of the payload, giving a larger number than you will see. TPC benchmarks are published and used by industry; hardware and software suppliers may tune their systems to score well on TPC (true of all benchmarks). There are also benchmarks for file systems, which stress an NFS server with a workload based on actual measurements, and benchmarks for web servers.

6 Storage Fragments Why Arrays?

Users need space to store data for their work. The simplest approach is to provide each user with a disk; when they run out of space, provide a larger disk; when they have filled the largest disk, provide a second disk; repeat. This is a serious management problem:
- What happens to the smaller disks?
- How does the user keep track of how many disks they have and where their data is?
- How does the system manager keep track of the disks and make sure they are backed up?
- Buy three users a 1 TB disk each and they may only use 100 GB now, and may not reach 300 GB for six or seven months. Buy too early and you are wasting money. Recycle to smaller users?
Solution: introduce a unified disk system.

7 Disk Arrays Linux Mount Points

A Linux system has a root at /. From this root a number of directories can hang: /home, /usr, /var, /local. They can be directories on the root disk, but any directory can also be a mount point for another disk. So /home can be a completely separate disk; /home/kyberd might be a directory on this home disk, while /home/ellis may be a mount point for a completely separate disk. Linux allows a single directory structure.

Management issue: this starts to create a single uniform directory space, but it is not completely transparent. If I have 200 GB on /home/kyberd and I want an increase to 500 GB, I can add a bigger disk and copy files across, or make a mount point /home/kyberd/data, or mount the new disks at system mount points /mnt/disk1 and /mnt/disk2 with soft links such as /home/kyberd/teaching -> disk1. The user apparently has disk space but is unable to add files in some directories. The goal: make a number of disks appear as a single disk with a single size.
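The soft-link scheme above can be sketched in a few lines. Temporary directories stand in for /home/kyberd and /mnt/disk1; the names are purely illustrative:

```python
import os
import tempfile

# Simulated layout: a "home" tree and a separately "mounted" disk.
root = tempfile.mkdtemp()
home = os.path.join(root, "home", "kyberd")
disk1 = os.path.join(root, "mnt", "disk1")
os.makedirs(home)
os.makedirs(disk1)

# "teaching" looks like an ordinary directory inside the home tree...
os.symlink(disk1, os.path.join(home, "teaching"))

# ...but files created through it actually live on disk1.
with open(os.path.join(home, "teaching", "notes.txt"), "w") as f:
    f.write("slides")
print(os.listdir(disk1))   # ['notes.txt']
```

This shows both the benefit (one apparent directory tree) and the lack of transparency the slide mentions: the space available under /home/kyberd/teaching is whatever is free on disk1, not on the home disk.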

8 RAID Redundant Arrays of Inexpensive Disks

Management issue: large disks were expensive, smaller disks were cheap. RAID was originally about cost. But cheap disks are more likely to fail, and with more disks a failure is more likely. Suppose an expensive disk has an annual failure rate of 0.1% and cheap disks have an annual failure rate of 0.2%. If any one cheap disk failing causes the system to fail, then with six cheap disks standing in for the one expensive disk the joint failure rate is nearly 1.2%.

Expensive disks have a higher areal density (larger capacity) and higher spin speed, giving lower latency and faster transfer. Yet two low-spec disks can give better performance than one high-spec disk (for certain applications), though they may not be cheaper: if we do two or three simultaneous reads, and the reads come off different disks, the cheap option might be faster. The disk controller can choose which disk to read from.

If we replace one disk with a 0.1% failure rate by two disks with 0.2% failure rates, but keep the two disks identical (mirrored), the rate of losing the data is 0.2% x 0.2% = 0.0004% (at the price of higher power requirements and more space).
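The failure arithmetic above can be checked directly. This is a simplified model assuming independent failures over one year (the notes' own caveat about correlated failures applies):

```python
# Annual failure rates from the notes: 0.2% per cheap disk.
def array_failure_rate(per_disk, n):
    # Probability that at least one of n independent disks fails in a year.
    return 1 - (1 - per_disk) ** n

striped = array_failure_rate(0.002, 6)   # six cheap disks, no redundancy
mirrored = 0.002 * 0.002                 # a pair: both copies must fail

print(f"striped:  {striped:.4%}")        # close to 1.2%
print(f"mirrored: {mirrored:.4%}")       # 0.0004%
```

The striped figure is slightly under 6 x 0.2% because the simple sum double-counts years with two failures; for small rates the difference is negligible, which is why the slide's "nearly 1.2%" is right.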

9 RAID Levels Where Do You Want Performance?

An interesting case study: performance is not just a hardware issue; it is how the raw speed is utilised.

RAID 0: just a set of disks. No redundancy, no space overhead. May be striped for faster access.

RAID 1: also called mirroring; mirroring precedes RAID. One check disk for every data disk, so space efficiency is a low 50%, but there is no calculation overhead. Reads can come from either disk; writes go to both. May be optimised for disk reads, but this may cause writes to take longer. Recovers from a single disk failure.

RAID 2: based on error-correcting memory, with data split at the bit level across Hamming-code check disks. Not used, because of heavy overheads and the increasing use of on-board error correction. Recovers from a single disk failure.

RAID 3: bit-interleaved parity. Each bit of a byte is written to a separate disk; one extra disk for the eight data disks takes the parity bit for each byte. Can recover from any single disk failure, with very little space overhead for the check. Good performance for reading and writing large files; not so good for small or random I/O.
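Parity-based recovery, as used by RAID 3 (and RAID 4/5 below), rests on one property of XOR: the parity block is the XOR of the data blocks, so any single lost block is the XOR of all the survivors. A minimal sketch with toy 8-byte blocks:

```python
from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR across a list of equal-length blocks.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"disk0blk", b"disk1blk", b"disk2blk"]   # toy data blocks
parity = xor_blocks(data)

# Disk 1 fails: rebuild its block from the other data blocks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)   # b'disk1blk'
```

The same identity explains the cost of recovery: rebuilding one block requires reading the corresponding block from every surviving disk.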

10 RAID Levels Where Do You Want Performance? (continued)

These improvements complicate the electronics and increase cost.

RAID 4: block-interleaved parity, an enhancement to make RAID 3 better for small accesses. It again uses a single parity disk, but writes blocks to disks rather than bits, relying on on-board checking. This means disk reads are independent, but the parity disk can be a bottleneck. Disk writes take longer on naive RAID 4, because writing a block means reading the corresponding block from each of the other data disks, calculating the parity block, and then writing data and parity: with four data disks, three reads and two writes, and the cost grows with the size of the array. Improvement: read the existing data and the existing parity, calculate the difference between old and new data, and from that update the parity block: two reads and two writes, regardless of array size. Recovers from a single disk failure.

Figures: RAID3, RAID4.
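The small-write improvement follows from the XOR identity: new_parity = old_parity XOR old_data XOR new_data, so only the old data and old parity need to be read. A sketch with toy two-byte blocks:

```python
def update_parity(old_parity, old_data, new_data):
    # RAID 4/5 small-write optimisation: two reads and two writes,
    # however many disks are in the stripe.
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# Toy stripe: three data blocks and their parity.
d = [bytes([1, 2]), bytes([3, 4]), bytes([5, 6])]
parity = bytes(a ^ b ^ c for a, b, c in zip(*d))

# Overwrite block 1 and update the parity incrementally.
new_d1 = bytes([9, 9])
parity = update_parity(parity, d[1], new_d1)
d[1] = new_d1

# The incrementally updated parity matches a full recalculation.
assert parity == bytes(a ^ b ^ c for a, b, c in zip(*d))
print("parity updated consistently")
```

XOR-ing out the old data and XOR-ing in the new data leaves the contributions of the untouched disks intact, which is why the other data disks never need to be read.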

11 RAID Levels Where Do You Want Performance? (continued)

RAID 5: RAID 4, but with the check blocks spread evenly across all the disks, removing the parity-disk bottleneck. The benefits of both RAID 3 and RAID 4, but a more complicated controller. Widely used, although if you are dominated by large reads and writes RAID 3 may still be best. Recovers from a single disk failure.

RAID 6: P+Q, two parity blocks per stripe, allowing recovery from two failures.

RAID 01 versus RAID 10: with 2n disks, do you stripe (RAID 0) across n of them and mirror onto the other n, giving mirrored stripes (RAID 0+1), or mirror the disks in n pairs and stripe across the pairs, giving striped mirrors (RAID 1+0)?
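Spreading the check blocks evenly just means rotating the parity's home disk from stripe to stripe. A sketch of one common layout (left-symmetric placement, an assumption here; real controllers vary):

```python
def parity_disk(stripe, ndisks):
    # Left-symmetric RAID 5: parity starts on the last disk and rotates
    # backwards by one disk per stripe, wrapping around.
    return (ndisks - 1 - stripe % ndisks) % ndisks

ndisks = 5
for stripe in range(5):
    print(f"stripe {stripe}: parity on disk {parity_disk(stripe, ndisks)}")
```

Over any window of ndisks consecutive stripes, each disk holds parity exactly once, so parity writes (and the read-modify-write traffic they cause) are spread evenly instead of hammering one disk as in RAID 4.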

12 RAID Levels Multiple Disk Failures

Large disks increase the time for a RAID rebuild. Companies are introducing systems that allow hot replacement: remove the bad disk, insert a new disk, and the array rebuilds it from the information on the other disks in the set, without the array becoming unavailable. Clearly this can reduce the effective MTTR to zero (if we ignore dual failures).

Dual failures are more common than the square of the single failure rate: correlated failures arise from manufacturing defects and environmental problems. 1 TB disks can take days to rebuild, giving a real possibility of a second failure during the rebuild, especially since failures are not always independent. If reliability and performance are crucial, you need to benchmark during a rebuild. And because a single disk in a set may fail silently, you must have alerting.
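Even under the optimistic assumption of independent, exponentially distributed failures (which the notes warn understates the real risk), the chance of a second failure during a long rebuild is easy to estimate. The MTTF and rebuild figures below are illustrative assumptions:

```python
import math

def second_failure_prob(n_survivors, mttf_hours, rebuild_hours):
    # Probability that any of the surviving disks fails during the rebuild
    # window, assuming independent exponential lifetimes.
    rate = n_survivors / mttf_hours          # combined failure rate
    return 1 - math.exp(-rate * rebuild_hours)

# e.g. 7 surviving disks, 500,000-hour MTTF each, 48-hour rebuild.
print(f"{second_failure_prob(7, 500_000, 48):.4%}")
```

The result is small per rebuild, but it scales with array size and rebuild time, and correlated failures can make the true figure much worse: exactly why the notes say to benchmark during a rebuild and to alert on silent failures.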

13 Example I/O Performance

Given a Sun Fire x4150 system with:
- Workload: 64 KB disk reads; each I/O op requires 200,000 user-code instructions and 100,000 OS instructions.
- Each CPU: 10^9 instructions/sec, 8 cores in total.
- FSB: 10.6 GB/s peak.
- DRAM DDR2 667 MHz: 5.336 GB/s.
- PCI-E x8 bus: 8 x 250 MB/s = 2 GB/s.
- Disks: 15,000 rpm, 2.9 ms average seek time, 112 MB/s transfer rate.
What I/O rate can be sustained, for random reads and for sequential reads?

I/O rate for the CPUs: per core, 10^9 / (100,000 + 200,000) = 3,333 ops/sec; for 8 cores, 26,667 ops/sec.

Random reads, I/O rate for the disks: assume the actual seek time is a quarter of the average (locality).
Time per op = seek + rotational latency + transfer = 2.9 ms / 4 + 4 ms / 2 + 64 KB / (112 MB/s) = 3.3 ms,
giving 303 ops/sec per disk, or 2,424 ops/sec for 8 disks.

Sequential reads: 112 MB/s / 64 KB = 1,750 ops/sec per disk, or 14,000 ops/sec for 8 disks.
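The slide's calculation can be reproduced directly (decimal units throughout, so 64 KB = 64,000 bytes, as the slide's own arithmetic implies):

```python
# Figures from the example system.
INSTR_PER_OP = 200_000 + 100_000    # user code + OS per I/O op
CORE_RATE = 1e9                     # instructions/sec per core
BLOCK = 64_000                      # 64 KB read
DISK_BW = 112e6                     # 112 MB/s per disk

cpu_rate = 8 * CORE_RATE / INSTR_PER_OP

seek_s = 2.9e-3 / 4                 # actual seek assumed average/4
latency_s = 0.5 * 60 / 15_000       # half a rotation at 15,000 rpm = 2 ms
transfer_s = BLOCK / DISK_BW
random_per_disk = 1 / (seek_s + latency_s + transfer_s)
sequential_per_disk = DISK_BW / BLOCK

print(f"CPU:                {cpu_rate:,.0f} ops/sec")
print(f"random (8 disks):   {8 * random_per_disk:,.0f} ops/sec")
print(f"sequential (8):     {8 * sequential_per_disk:,.0f} ops/sec")
```

The random-read figure comes out a few ops/sec above the slide's 2,424 because the slide rounds the per-op time to 3.3 ms before dividing; the conclusion is unchanged.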

14 Example I/O Performance (continued)

PCI-E I/O rate: 2 GB/s / 64 KB = 31,250 ops/sec.
DRAM I/O rate: 5.336 GB/s / 64 KB = 83,375 ops/sec.
FSB I/O rate: assume we can sustain half the peak rate; 5.3 GB/s / 64 KB is about 82,800 ops/sec per FSB, or about 165,600 ops/sec for 2 FSBs.

Summary:
- CPU: 26,667 ops/sec
- Random disk reads: 2,424 ops/sec
- Sequential disk reads: 14,000 ops/sec
The disks are the bottleneck, by a factor of at least 2. As a service provider, how can I provide disk performance that fully utilises my CPUs? Or should I buy cheaper, slower CPUs, because my disks are limiting my performance? At present the CPUs can be at most about 50% utilised, measured against wall-clock time.
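The sustainable I/O rate of the whole system is the minimum over every component on the path, which makes the bottleneck analysis a one-liner (rates as computed in this example; the FSB figure is the approximate half-peak value):

```python
# Ops/sec each component of the example system can sustain.
rates = {
    "CPU (8 cores)":        26_667,
    "PCI-E x8":             31_250,
    "DRAM":                 83_375,
    "2 x FSB (half peak)": 165_600,
    "disks, sequential":    14_000,
}
bottleneck = min(rates, key=rates.get)
print(f"bottleneck: {bottleneck} at {rates[bottleneck]:,} ops/sec")
print(f"CPU utilisation: {rates[bottleneck] / rates['CPU (8 cores)']:.0%}")
```

Swapping in the random-read figure (2,424 ops/sec) makes the imbalance far worse, which is the slide's closing point: either speed up the disks or spend less on CPUs.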