Flexible Storage Allocation




Flexible Storage Allocation. A. L. Narasimha Reddy, Department of Electrical and Computer Engineering, Texas A&M University. Students: Sukwoo Kang (now at IBM Almaden), John Garrison

Outline: Big Picture. Part I: Flexible Storage Allocation (Introduction and Motivation; Design of Virtual Allocation; Evaluation). Part II: Data Distribution in Networked Storage Systems (Introduction and Motivation; Design of User-Optimal Data Migration; Evaluation). Part III: Storage Management across Diverse Devices. Conclusion. 2 Texas A&M University Narasimha Reddy

Storage Allocation: Allocate the entire storage space at the time of file system creation; storage space owned by one operating system cannot be used by another. [Figure: statically partitioned storage for Windows NT (NTFS), Linux (ext2), and AIX (JFS), with actual allocations (labeled 30 GB, 70 GB, 50 GB, 50 GB, 98 GB in the original figure) leaving one file system running out of space while others have room to spare.]

Big Picture: Memory systems employ virtual memory for several reasons; current storage systems lack such flexibility. Current file systems allocate storage statically at the time of their creation, so space on the disk is not allocated well across multiple file systems.

File Systems with Virtual Allocation: When a file system is created with X GB, virtual allocation allows it to be created with only Y GB, where Y << X; the remaining space is used as one common available pool, and as the file system grows, storage space is allocated on demand. [Figure: the same three file systems (NTFS, ext2, JFS) now hold only their actual allocations, leaving a 40 GB common storage pool.]

Our Approach to Design: Employ an allocate-on-write policy: storage space is allocated when the data is written. All data is written to disk sequentially, based on the time at which it is written to the device. Once data is written, it can be accessed from the same location, i.e., data is updated in place. [Diagram: physical block addresses on the physical disk.]

Allocate-on-write Policy: Storage space is allocated in units of extents when the data is written. An extent is a fixed-size group of file system blocks; fixing the size retains more spatial locality and reduces the mapping information that must be maintained. [Diagram: an extent written at t = t0 on the physical disk.]

Allocate-on-write Policy: Data is written to disk sequentially based on write time; further writes to the same data are updated in place. VA (Virtual Allocation) requires an additional data structure. [Diagram: Extent 0 written at t = t0 and Extent 1 written at t = t1, where t1 > t0, laid out sequentially on the physical disk.]

Block Map: The block map keeps a mapping between logical storage locations and real (physical) storage locations. [Diagram: Extents 0, 1, and 2 written at successive times and placed sequentially on the physical disk, with the block map tracking their locations.]
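The allocate-on-write block map described above can be sketched in a few lines of Python. This is a minimal illustration, not the VA prototype's code; the extent size and names are hypothetical. First writes to a logical extent allocate the next sequential physical extent, and later writes resolve to the same location, i.e., in-place updates.

```python
# Minimal sketch (not the authors' code) of VA's allocate-on-write block map:
# logical extents are bound to physical extents in write order; later writes
# to an already-mapped extent are updated in place.

EXTENT_BLOCKS = 64  # hypothetical extent size in file-system blocks

class BlockMap:
    def __init__(self):
        self.map = {}           # logical extent no. -> physical extent no.
        self.next_physical = 0  # next free extent at the end of the "log"

    def write(self, logical_block):
        ext = logical_block // EXTENT_BLOCKS
        if ext not in self.map:                 # first write: allocate
            self.map[ext] = self.next_physical  # sequential, time-ordered
            self.next_physical += 1
        # subsequent writes: in-place update at self.map[ext] (unlike LFS)
        return self.map[ext] * EXTENT_BLOCKS + logical_block % EXTENT_BLOCKS

bm = BlockMap()
a = bm.write(5)        # first write: allocates physical extent 0
b = bm.write(70 * 64)  # a distant logical address still lands on extent 1
c = bm.write(5)        # update in place: same physical block as before
```

Allocating whole extents rather than single blocks is what keeps the map small and preserves spatial locality within an extent, as the slide notes.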

VA Metadata: The block map is maintained in memory and regularly written (hardened) to disk to protect against system failures; VA metadata is the on-disk representation of the block map. [Diagram: a VA metadata region followed by Extents 0-2 on the physical disk.]

On-disk Layout & Storage Expansion: When capacity is exhausted or reaches the storage expansion threshold, a physical disk can be expanded onto other available storage resources; the file system is unaware of the actual space allocation and expansion. [Diagram: VA metadata, FS metadata, and Extents 0-4 on the physical disk; after expansion, Extents 5-7 continue on the enlarged virtual disk.]
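The threshold-triggered expansion can be sketched as follows. This is an assumed model, not the authors' implementation; the 90% threshold, extent counts, and the `expand` callback are illustrative stand-ins for attaching another storage resource.

```python
# Hypothetical sketch of VA's threshold-triggered expansion: when allocation
# crosses the expansion threshold, the virtual disk grows onto another device
# without the file system noticing.

class VirtualDisk:
    def __init__(self, capacity_extents, threshold=0.9):
        self.capacity = capacity_extents   # extents currently backing the disk
        self.threshold = threshold         # storage expansion threshold
        self.used = 0

    def allocate_extent(self, expand):
        if self.used >= self.threshold * self.capacity:
            self.capacity += expand()      # e.g. attach a spare disk's extents
        self.used += 1
        return self.used - 1               # newly allocated extent number

vd = VirtualDisk(10)
for _ in range(9):
    vd.allocate_extent(lambda: 10)
# the 10th allocation crosses the 90% threshold and grows the virtual disk
vd.allocate_extent(lambda: 10)
```

The file system above this layer only ever sees a disk that never runs out of space, which is the point of the slide.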

Write Operation [Diagram: a write request flows from the application through the file system and buffer/page cache layer to the block I/O layer (VA), which searches the VA block map, allocates a new extent and updates the mapping information if needed, hardens the VA metadata, acknowledges the write, and writes to disk (VA metadata, FS metadata, Extents 0-3).]

Read Operation [Diagram: a read request flows from the application through the file system and buffer/page cache layer to the block I/O layer (VA), which searches the VA block map and reads the data from its physical location on disk.]

Allocate-on-write vs. Other Work: The key difference from log-structured file systems (LFS) is that only allocation is done at the end of the log; updates are done in place after allocation. LVM, by contrast, still ties up storage at the time of file system creation.

Design Issues: VA Metadata Hardening (File System Integrity): must keep a certain update ordering between VA metadata and FS (meta)data. Extent-based policy example (with Ext2), using I (inode), B (data block), V (VA block map): A → B denotes that B is allocated to A. File system-based policy example (with Ext3 ordered mode).

Design Issues (cont.): Extent Size: a larger extent size reduces the block map size and retains more spatial locality, but causes data fragmentation. Reclaiming allocated storage space of deleted files: needed to continue providing the benefits of virtual allocation; without reclamation, virtual allocation can degenerate into static allocation. Interaction with RAID: RAID remaps blocks to physical devices to provide device characteristics, while VA remaps blocks for flexibility; the performance impact of VA's extent size versus RAID's chunk size must be resolved.

Spatial Locality Observations & Issues: metadata and data separation; data clustering to reduce seek distance; multiple file systems. Data placement policy: allocate hot data in a high-data-rate region of the disk, or allocate hot data in the middle of the partition.

Implementation & Experimental Setup: Virtual allocation prototype: a kernel module for Linux 2.4.22, employing an in-memory hash table to speed up VA lookups. Setup: a 3 GHz Pentium 4 processor with 1 GB main memory, Red Hat Linux 9 with a 2.4.22 kernel, and the Ext2 and Ext3 file systems. Workloads: Bonnie++ (large-file workload), Postmark (small-file workload), TPC-C (database workload).

VA Metadata Hardening: Compare Ext2 with VA-EXT2-EX, and Ext3 with VA-EXT3-EX and VA-EXT3-FS. [Chart: measured performance differences of -7.3, -3.3, -1.2, +4.9, +8.4, and +9.5 across configurations.]

Reclaiming Allocated Storage Space: a reclaim operation for deleted large files. How to keep track of deleted files? We employed a stackable file system that maintains a duplicated block bitmap; alternatively, the Life or Death at Block-Level work (OSDI '04) could be employed.
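The duplicated-bitmap idea can be sketched briefly. This is a hypothetical illustration of the approach, not the stackable file system's code: the layer keeps its own copy of the file system's block bitmap, and bits that flip from used to free between scans identify space VA can return to the common pool.

```python
# Hypothetical sketch of reclamation via a duplicated block bitmap: blocks
# that were allocated in the shadow copy but are free in the current bitmap
# belong to deleted files and can be reclaimed.

def reclaimable(fs_bitmap, shadow_bitmap):
    # indices allocated at the last scan (shadow) but now free (fs_bitmap)
    return [i for i, (old, new) in enumerate(zip(shadow_bitmap, fs_bitmap))
            if old and not new]

shadow = [1, 1, 0, 1]   # VA's copy from the last scan
current = [1, 0, 0, 1]  # a file was deleted: block 1 was freed
freed = reclaimable(current, shadow)
```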

VA with RAID-5: large-file, small-file, and large-file-with-NVRAM workloads, using Ext2 with software RAID-5 + VA. NVRAM-X% denotes NVRAM sized at X% of the total VA metadata size. [Charts compare VA-RAID-5 NO-HARDEN against VA-RAID-5 with NVRAM-17%, NVRAM-4%, and NVRAM-1%.]

Data Placement Policy (Postmark): VA NORMAL partition: same data rate across the partition. VA ZCAV partition: hot data is placed in the high-data-rate region of the partition. VA-NORMAL starts allocation from the outer cylinders; VA-MIDDLE starts allocation from the middle of the partition.

Multiple File Systems (Postmark): VA-7GB: 2 x 3.5 GB partitions, 30% utilization. VA-32GB: 2 x 16 GB partitions, 80% utilization. VA-HALF: the 2nd file system is created after 40% of the 1st file system is written; VA-FULL: after 80%.

Real-World Deployment of Virtual Allocation: Prototype built

VA in a Networked Storage Environment: The flexible allocation provided by VA leads to balancing locality vs. load-balance issues.

Part II: Data Distribution. Locality-based approach: use data migration (e.g., HP AutoRAID); employ hot-data migration from the slower device (remote disk) to the faster device (local disk). Load-balancing-based approach (striping): exploit multiple devices to support the required data rates (e.g., Slice, OSDI '00).

User-Optimal Data Migration: Locality is exploited first: data is migrated from Disk B to Disk A. Load balancing is also considered: if the load on Disk A is too high, data is migrated from Disk A to Disk B.

Migration Decision Issues: Where to migrate: use I/O request response time. When to migrate: a migration threshold; initiate migration from Disk A to Disk B only when the threshold is exceeded. How to migrate: limit the number of concurrent migrations (migration token). What data to migrate: active data.
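Those four decisions can be condensed into one predicate. This is a hedged sketch under assumed semantics (the function name, response-time units, and token accounting are hypothetical, not the paper's exact rule): migrate toward the device with the lower response time, only once the gap exceeds the migration threshold, and only while a migration token is available.

```python
# Illustrative migration decision (assumed model, not the authors' code):
# response times answer "where", the threshold answers "when", and tokens
# bound "how many" concurrent migrations.

def should_migrate(rt_local, rt_remote, threshold, tokens):
    """Migrate active data remote -> local only when the remote device's
    response time exceeds the local one by more than the migration
    threshold, and a migration token is available."""
    return (rt_remote - rt_local) > threshold and tokens > 0

ok = should_migrate(rt_local=5.0, rt_remote=12.0, threshold=4.0, tokens=1)
no_token = should_migrate(5.0, 12.0, 4.0, tokens=0)     # token exhausted
below = should_migrate(5.0, 8.0, 4.0, tokens=1)         # gap under threshold
```

The threshold keeps the system from thrashing on small fluctuations, and the token bounds the migration traffic competing with user I/O.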

Design Issues: Allocation policy: striping with user-optimal migration will improve data access locality; sequential allocation with user-optimal migration will improve load balancing. Multi-user environment: each user migrates data in a user-selfish manner, yet migrations tend to improve the performance of all users over longer periods of time.

Evaluation: Implemented as a kernel block device driver and evaluated using the SPECsfs benchmark, in single-user and multi-user configurations. [Charts: SPECsfs performance curves.]

Single-User Environment: Configurations are labeled (Allocation Policy)-(Migration Policy): STR (striping), SEQ (sequential allocation), NOMIG (no migration), MIG (user-optimal migration). [Charts: striping with user-optimal migration; sequential allocation with user-optimal migration.]

Single-User Environment (cont.): Comparison between migration systems. Migration based on locality: hot data (remote → local), cold data (local → remote).

Multi-User Environment - Striping: Server A: load from 100 to 700; Server B: load from 50 to 350

Multi-User Environment - Seq. Allocation: Server A: load from 100 to 1100; Server B: load from 30 to 480

Storage Management Across Diverse Devices: Flash storage is becoming widely available: more expensive than hard drives, faster random accesses, low power consumption; in laptops now, in hybrid storage systems soon. Manage data across different devices: match application needs to device characteristics, and optimize for performance and power consumption.

Motivation: The VFS allows many file systems underneath, but maintains a one-to-one mapping from namespace to storage. Can we provide different storage options for different files for a single user? For example, /user1/file1 → storage system 1, /user2/file2 → storage system 2.

Normal File System Architecture [Diagram: applications (Calc, Impress, Writer, WinAmp) in user space access /user1/file1, /user1/file2, /user2/file3, and /user2/file4 through the VFS in the kernel; /user1/* maps to Ext2 on a magnetic disk and /user2/* to FAT32 on a flash drive.]

Umbrella File System [Diagram: the same applications access /user1/file1 through /user1/file4 via the VFS and UmbrellaFS, which routes files under a single namespace to Ext3, Ext2, or FAT32 backed by a magnetic disk, an encrypted magnetic disk, or a flash drive (e.g., /user1/file3 → /FS1, /user1/file1 and /user1/file2 → /FS2, /user1/file4 → /FS3).]

Example Data Organization. User view: /usr, /usr/dir1, /usr/dir1/foo.jpg, /usr/dir1/foo.txt, /usr/dir1/foo.avi. Underlying data organization: /images/usr/dir1/foo.jpg, /text/usr/dir1/foo.txt, /media/usr/dir1/foo.avi.

Motivation: Policy-Based Storage: User or system administrator choice: allow different types of files on different devices, for reliability, performance, or power consumption. Layered architecture: leverage the benefits of underlying file systems, mapping applications to file systems and underlying storage. Policy decisions can depend on the namespace and metadata; example: files not touched in a week → slow storage system.

Rules Structure: Provided at mount time, user specified, based on inode values (metadata) and filenames (namespace); provides an array of branches.

Umbrella File System: Sits under the VFS to enforce policy. Policy is enforced at open and close times, and also periodically (less often). UmbrellaFS acts as a router for files, based not only on the namespace but also on metadata.

Inode Rules Structure:

Rule  Inode/Filename  Field               Match  Value                      Branch
1     Inode           file permissions    =      Read Only                  /fs1, /fs2
2     Filename        n/a                 n/a    n/a                        n/a
3     Inode           file creation time  >=     8:00 am, August 3rd, 2007  /fs2
4     Inode           file length         <      20 KB                      /fs3

Inode Rules: Provided in order of precedence; the first match wins. Compare the inode value to each rule; at file creation, some inode values are indeterminate, so those rules are passed over.
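First-match evaluation with pass-over can be sketched as follows. This is an illustrative model, not UmbrellaFS source; the rule encoding, field names, and branch labels are hypothetical, loosely echoing the table above.

```python
# Illustrative sketch of first-match inode-rule evaluation: rules are checked
# in precedence order, and rules whose inode field is still indeterminate at
# file-creation time are passed over.

INDETERMINATE = object()   # e.g. file length before any data is written

def pick_branch(rules, inode, default):
    for field, predicate, branch in rules:
        value = inode.get(field, INDETERMINATE)
        if value is INDETERMINATE:
            continue                      # pass over: cannot evaluate yet
        if predicate(value):
            return branch                 # first match wins
    return default

rules = [
    ("length", lambda v: v < 20 * 1024, "/fs3"),   # small files -> /fs3
    ("mode",   lambda v: v == "ro",     "/fs1"),   # read-only files -> /fs1
]
# at creation the length is unknown, so the read-only rule decides
at_create = pick_branch(rules, {"mode": "ro"}, "/fs2")
# later, a small writable file matches the length rule first
later = pick_branch(rules, {"length": 4096, "mode": "rw"}, "/fs2")
```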

Filename Rules Structure:

Rule  Match String       Branch
1     /*.avi             /fs2, /fs1
2     /home/*.txt        /fs1
3     /home/jgarrison/*  /fs3

Filename Rules: Once the first filename rule is triggered, all rules are checked, similar to longest-prefix matching, with a double index based on path matching and filename matching. Example: with rules /home/*/*.bar and /home/jgarrison/foo.bar, the file /home/jgarrison/foo.bar matches the second rule more closely (path length 3 and 7 matching characters of file name vs. path length 3 and 4 characters of file name).
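The "most specific rule wins" idea from the slide can be sketched with an assumed scoring function (my own reconstruction, not the UmbrellaFS index): compare matched path depth first, then the number of literal, non-wildcard characters in the final path component.

```python
# Sketch of specificity scoring for filename rules, assuming depth-then-
# literal-characters ordering as described in the slide's example.
from fnmatch import fnmatchcase

def specificity(pattern, path):
    """Score a matching pattern: deeper paths win, then more literal
    (non-wildcard) characters in the final path component."""
    if not fnmatchcase(path, pattern):
        return (-1, -1)                  # non-matching patterns sort last
    parts = pattern.strip("/").split("/")
    return (len(parts), len(parts[-1].replace("*", "")))

patterns = ["/home/*/*.bar", "/home/jgarrison/foo.bar"]
path = "/home/jgarrison/foo.bar"
best = max(patterns, key=lambda p: specificity(p, path))
# both patterns have path depth 3, but "foo.bar" has 7 literal
# characters versus 4 for "*.bar", so the second rule wins
```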

Evaluation: overhead; throughput (CPU limited, I/O limited); example improvement.

UmbrellaFS Overhead [Chart: Bonnie read throughput (MB/s) for plain Ext2 vs. UmbrellaFS with 1, 2, 4, 8, 16, and 32 inode or filename rules.]

CPU Limited Benchmarks

I/O Limited Benchmarks

Flash vs. RAID5 Read Performance

Flash vs. RAID5 Write Performance [Chart: write throughput (MB/s) vs. file size (KB) for RAID 5 and flash SSD.]

Flash and Disk Hybrid System

Disks with Encryption Hardware [Chart: encryption example, time (s) for partial encryption vs. full encryption.]

Conclusion Virtual allocation allows Flexibility Improve the flexibility of managing storage across multiple file systems/platforms Enabled user-optimal migration Balance disk access locality and load balance automatically and transparently Adapt to changes of workloads and loads in each storage device Policy-based storage: Umbrella File System Allows matching application characteristics to devices 55 Texas A&M University Narasimha Reddy