Integrating Flash-based SSDs into the Storage Stack



Integrating Flash-based SSDs into the Storage Stack
Raja Appuswamy, David C. van Moolenbroek, Andrew S. Tanenbaum
Vrije Universiteit, Amsterdam
April 19, 2012

Introduction: Hardware Landscape

- $/GB of flash SSDs is still much higher than that of HDDs
  - Flash-only installations are prohibitively expensive
  - Hybrid Storage Architectures (HSAs) are a viable alternative: use high-performance SSDs in concert with high-density HDDs
- Caching HSAs
  - Extend the two-level RAM/HDD memory hierarchy
  - SSDs used as intermediate caches
- Dynamic Storage Tiering (DST) HSAs
  - Establish tiers of devices based on performance
  - SSDs used for primary data storage

Introduction: Complex Workloads

- The duality of storage workloads
  - Isolated: well-defined, app-specific access patterns
  - Virtualized: disjoint I/O requests blended into a single stream
- Two fundamental questions need to be answered
  - How do DST/Caching systems fare under such workloads?
  - Is the one-architecture-per-installation approach correct?
- We need a modular, flexible hybrid storage framework
  - Perform side-by-side comparisons of various architectures
  - Understand the impact of design alternatives

The Loris Storage Stack - Layers and Interfaces

- File-based interface between layers
- Each file has a unique file identifier and a set of attributes
- File-oriented requests: create, delete, read, write, truncate, getattr, setattr, sync
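The file-oriented requests listed above can be sketched as a minimal in-memory layer. This is purely illustrative: the class, its signatures, and the attribute dictionary are assumptions, not the actual Loris interface definition.

```python
# Minimal in-memory sketch of a file-oriented layer interface: files are
# identified by a unique id and carry a set of attributes. All names and
# signatures here are illustrative assumptions, not Loris code.
class InMemoryLayer:
    def __init__(self):
        self.data = {}    # file_id -> bytearray of file contents
        self.attrs = {}   # file_id -> dict of attributes

    def create(self, file_id, attrs=None):
        self.data[file_id] = bytearray()
        self.attrs[file_id] = dict(attrs or {})

    def delete(self, file_id):
        del self.data[file_id]
        del self.attrs[file_id]

    def write(self, file_id, offset, buf):
        d = self.data[file_id]
        if len(d) < offset + len(buf):          # zero-fill any gap
            d.extend(b"\0" * (offset + len(buf) - len(d)))
        d[offset:offset + len(buf)] = buf

    def read(self, file_id, offset, length):
        return bytes(self.data[file_id][offset:offset + length])

    def truncate(self, file_id, length):
        self.data[file_id] = self.data[file_id][:length]

    def getattr(self, file_id):
        return self.attrs[file_id]

    def setattr(self, file_id, attrs):
        self.attrs[file_id].update(attrs)

    def sync(self, file_id):
        pass  # nothing is buffered in this in-memory sketch
```

Because every request names a file id, any layer below can attribute I/O to individual files, which is what makes the per-file statistics later in the talk possible.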

Loris - Division of Labor

- Naming: POSIX call processing, directory handling
- Cache: data caching
- Logical: file-level RAID, parental checksums
- Physical: metadata caching, on-disk layout

Loris: A Hybrid Storage Framework

- Functionalities required to support DST/Caching
  - Collecting access statistics to classify data
  - Transparent background migration
- We extended the Logical layer, exploiting the logical file abstraction
  - Access statistics reflect the real storage workload
- Flexible, modular, plugin-based realization
  - Data collection plugin gathers statistics
  - Migration plugin handles transparent migration

Loris: Data Collection Plugin

- Several access statistics proposed by prior research
  - Extent-level IOPS and bandwidth statistics (EDT-DST)
  - Block-level access frequency (Azor-Caching)
- Our current implementation is based on Inverse Bitmaps
- Compute a bitmap value for each read/write operation of N blocks:

    b = 2^(6 - log2(N))                                  (1)

- The bitmap value is added to a per-file, in-memory counter
- The counter value indicates the hotness of each file
- Prioritizes small, random I/Os over large, sequential ones
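The inverse-bitmap counter can be sketched in a few lines. This assumes the formula b = 2^(6 - log2(N)) with request sizes capped at 64 blocks so that b stays at least 1; the cap, the function name, and the file ids are illustrative assumptions.

```python
import math
from collections import defaultdict

# Illustrative sketch of an inverse-bitmap hotness counter (not Loris code).
MAX_BLOCKS = 64  # assumed cap: requests of 64+ blocks contribute the minimum

hotness = defaultdict(int)  # per-file, in-memory access counters

def record_io(file_id, num_blocks):
    """Add the inverse-bitmap value b = 2^(6 - log2(N)) to the file's counter."""
    n = min(max(num_blocks, 1), MAX_BLOCKS)
    b = 2 ** (6 - int(math.log2(n)))
    hotness[file_id] += b
    return b

# A single-block random read counts 64x more than one 64-block sequential read,
# so small random I/O dominates the counter, as the slide describes.
record_io("small.db", 1)    # b = 64
record_io("big.log", 64)    # b = 1
```

Because the value shrinks as the request grows, a file accessed by many small random reads accumulates hotness much faster than one streamed sequentially, even at equal total bandwidth.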

Loris-based Hybrid Systems

- All Loris-based hybrid systems are file based
  - Migration/caching at whole-file granularity
  - File granularity is only a limitation of the current prototype
- Inverse Bitmaps used for hot-data identification
  - Data collection techniques are architecture neutral
- All hybrid systems share the SSD cleaner implementation
  - Cleaning is triggered reactively, as a side effect of writes that encounter a lack of space
  - Both foreground and background writes trigger cleaning
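The reactive cleaning described above can be sketched as follows. The coldest-first eviction order, the data structures, and all names are assumptions made for illustration, not the shared cleaner implementation itself.

```python
# Illustrative sketch of reactive SSD cleaning: a write that finds the SSD
# lacking space evicts the coldest files first. Not the Loris cleaner.
def write_to_ssd(ssd, hotness, file_id, size, capacity):
    """Evict coldest files until `size` bytes fit, then place the file.

    ssd:     dict mapping file_id -> size currently on the SSD tier
    hotness: dict mapping file_id -> inverse-bitmap counter value
    Returns True if the file was placed on the SSD, False if it must go to HDD.
    """
    used = sum(ssd.values())
    victims = sorted((f for f in ssd if f != file_id),
                     key=lambda f: hotness.get(f, 0))
    while used + size > capacity and victims:
        victim = victims.pop(0)       # coldest file is demoted to the HDD tier
        used -= ssd.pop(victim)
    if used + size <= capacity:
        ssd[file_id] = size
        return True
    return False
```

The key property the slide highlights is that cleaning has no timer of its own: it runs only when a foreground or background write actually encounters a full SSD.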

Loris-based Conventional Hybrid Systems

Popular DST/Caching variants:

- Interval-driven Hot-DST
  - Migrates hot files to the SSD tier periodically
  - The migration interval is an important design parameter
- On-demand Write-through Caching
  - Caches hot files in the SSD tier as a side effect of read operations
  - Updates both copies on writes
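The candidate selection an interval-driven Hot-DST policy performs at each interval might look like the sketch below. The per-interval migration budget and all names are assumptions, not parameters from the paper.

```python
# Hypothetical sketch of interval-driven Hot-DST candidate selection: at each
# migration interval, the hottest HDD-resident files are promoted to the SSD.
def pick_migrations(hotness, on_ssd, budget):
    """Return up to `budget` hottest files not yet on the SSD tier.

    hotness: dict mapping file_id -> inverse-bitmap counter value
    on_ssd:  set of file ids already resident on the SSD tier
    """
    candidates = [f for f in hotness if f not in on_ssd]
    candidates.sort(key=lambda f: hotness[f], reverse=True)
    return candidates[:budget]
```

The interval length matters because counters only influence placement when this selection runs: too long an interval and the SSD lags the workload, too short and migration traffic competes with foreground I/O.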

Loris-based Unconventional Hybrid Systems

- On-demand Hot-DST
  - Migrates hot files as a side effect of read operations
  - No data copy, in contrast to Caching
- Interval-driven Write-through Caching
  - Periodically caches hot files in the SSD
  - Low-overhead SSD cleanup, in contrast to DST
- On-demand Cold-DST
  - Initially allocates all files in the SSD
  - The SSD cleaner evicts cold files to accommodate new files
  - Once-cold, but now-hot files are migrated back from the HDD
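The On-demand Cold-DST lifecycle above can be captured as a toy state machine. Counting SSD capacity in whole files and every name here are simplifying assumptions for illustration only.

```python
# Toy sketch of the On-demand Cold-DST lifecycle (not the Loris implementation):
# new files start on the SSD; the cleaner demotes cold files; hot files return.
class ColdDST:
    def __init__(self, ssd_capacity):
        self.ssd, self.hdd = set(), set()
        self.ssd_capacity = ssd_capacity  # measured in files, for simplicity

    def allocate(self, file_id, hotness):
        """All new files are initially allocated on the SSD tier."""
        if len(self.ssd) >= self.ssd_capacity:
            self.evict_coldest(hotness)   # reactive cleaning on lack of space
        self.ssd.add(file_id)

    def evict_coldest(self, hotness):
        """The SSD cleaner demotes the coldest SSD-resident file to the HDD."""
        victim = min(self.ssd, key=lambda f: hotness.get(f, 0))
        self.ssd.discard(victim)
        self.hdd.add(victim)

    def promote(self, file_id):
        """A once-cold, now-hot file is migrated back from the HDD."""
        self.hdd.discard(file_id)
        self.ssd.add(file_id)
```

Note that the HDD is only ever populated by eviction: data reaches it by being proven cold, rather than having to prove itself hot to reach the SSD.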

Benchmarks and Workload Generators: Quirks

- Used a variety of benchmarks/workload generators
  - File Server, Web Server, and Mail Server workload types
  - Parameters: file size, directory depth, read/write ratio, etc.
- Benchmarks lack locality by default
  - Uniform/random access patterns grossly underestimate the effectiveness of DST/Caching
  - Workload properties need to be extracted from file system traces
- Beware of transaction-bound benchmarks
  - PostMark, unlike FileBench, is transaction bound
  - Interval-based systems might fail to reach equilibrium before the run ends

Results: Caching vs Hot-DST

- Caching excels in read-heavy workloads (Web Server)
  - Interval-driven/on-demand Caching is faster than DST
  - Cheap SSD cleanup by cached-copy invalidation
- Hot-DST excels in write-heavy workloads (File Server)
  - Interval-driven/on-demand DST is faster than Caching
  - No expensive synchronization of the cached copy
- Is write-back caching worth the complexity?
  - It complicates consistency/availability maintenance
  - But it offers Cache-like read and DST-like write performance

Results: Caching vs Hot-DST (2)

- On-demand migration/caching systems outperform their interval-driven counterparts
  - Quick responsiveness explains this in read-heavy workloads
  - But why is this the case in write-heavy workloads?
- The append-read/Inverse Bitmap interaction
  - An append operation first reads the last file block
  - That single-block read results in a high increment to the access counter
  - The actual write is buffered in the OS cache
  - On-demand migration migrates/caches the file to/in the SSD
  - The SSD then services the write operation at a later time
- Need for semantic awareness
  - In the long run, append reads fill the SSD with write-only logs
  - Being file aware, Loris can isolate append reads
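The counter inflation from append reads can be worked through numerically. This sketch assumes the inverse-bitmap formula from the data collection slide; the function and the numbers are purely illustrative.

```python
import math

# Illustrative arithmetic for the append-read effect: each append first reads
# the last block (N = 1), so each append adds the maximum inverse-bitmap
# value, b = 2^(6 - log2(1)) = 64, to the file's counter.
def append_hotness(num_appends):
    """Counter growth from appends alone: one 1-block read per append."""
    b = 2 ** (6 - int(math.log2(1)))   # = 64 per append
    return num_appends * b
```

After only 16 appends a write-only log accumulates a counter of 1024, enough to out-rank files that are genuinely read-hot, which is why on-demand systems end up placing such logs on the SSD.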

Results: Cold-DST

- Cold-DST outperforms the rest under most workloads
  - Buffering allocation writes in the SSD tier boosts performance
  - Scan-resistant Inverse Bitmaps retain hot files in the SSD tier
  - Scales better, as it avoids unnecessary background migration
  - Configuration free, unlike interval-driven systems
- Workload patterns also favor the Cold-DST architecture
  - 90% of newly created files are opened fewer than five times
  - Proactive cold migration can exploit SSD parallelism to improve performance

Results: Cold-DST (2)

- Cold-DST systems share several advantages with write-back caching, without its disadvantages
  - No synchronization overhead for maintaining consistency
  - Efficient space utilization: admitting allocation writes while sieving out cold-data writes
- However, more research is required to address:
  - Accelerated SSD wear due to excessive writes: are recent SW/HW reliability techniques sufficient?
  - Performance deterioration caused by using all SSD capacity: can overprovisioning solve this problem?
  - Performance deterioration caused by high-cost random writes: is random-write performance still an issue with modern SSDs?

Conclusion

- Hybrid storage systems are effective and efficient
- No one architecture fits all workloads - not yet!
  - Can Cold-DST be the last word in hybrid architectures?
  - How does Cold-DST stack up against write-back caching?
- Pairing workloads with ideal architectures
  - Preliminary results under virtualized workloads are encouraging
- More results/details in the paper