BlobSeer: Towards efficient data storage management on large-scale, distributed systems

BlobSeer: Towards efficient data storage management on large-scale, distributed systems

Bogdan Nicolae
University of Rennes 1, France
KerData Team, INRIA Rennes Bretagne-Atlantique
PhD Advisors: Gabriel Antoniu and Luc Bougé
December 1, 2010

Outline
- Context
- Related work and its limitations
- Contribution:
  - Principles
  - High-level description
  - Zoom on metadata management
  - Synthetic benchmarks
- BlobSeer as a storage backend for Hadoop MapReduce
- BlobSeer providing virtual machine image storage for clouds
- BlobSeer as a QoS-enabled storage service for applications running on the cloud

Information overload
- We live in exponential times: every two years the amount of information doubles
- Making something useful out of it becomes increasingly difficult
- We depend more and more on large-scale computing infrastructure

Dealing with information overload: Enterprise datacenters
- Tens of thousands of machines in huge clusters
- Leveraged directly by the owner
- Commodity hardware: minimizes per-unit cost
- Easy to add, upgrade and replace
- Data-intensive applications

Dealing with information overload: Clouds
- Computing as a utility rather than a capital investment
- Driven by the pay-as-you-go model
- Several levels of abstraction: IaaS, PaaS, SaaS
- Several advantages: low entry cost, elasticity, rapid development

Dealing with information overload: HPC infrastructures
- Complex scientific and engineering applications
- High-end hardware
- Manipulate information at petabyte scale and beyond

Data storage and management is a key issue
Requirements:
- Easy manipulation of data
- High access throughput
- Scalability
(figure: data and metadata)

Current approaches: Parallel file systems
- Mostly used in HPC infrastructures
- POSIX access interface
- Data striping
- Advanced caching
Pros: distributed data; MPI optimizations
Cons: locking-based; too many small files

Current approaches: Data-intensive oriented file systems
- Huge files
- Writes at random offsets are rare
- Files grow by atomic appends
- Fine-grain concurrent reads
Pros: no locking; data-location aware
Cons: centralized metadata; expensive updates

Current approaches: Cloud data storage services
- Virtualize storage resources
- Pay for duration, size and traffic
- Flat naming scheme
- Simple access model
Pros: high data availability; versioning
Cons: limited object size; limited concurrency control

Limitations of existing approaches
- Issues considered: too many small files, centralized metadata, no versioning support, no fine-grain writes
- Each class of systems (parallel file systems, data-intensive file systems, cloud stores) addresses only a subset of these issues
- Open question (???): can a single system address all of them at once?

Contribution: Data Sharing at Large Scale

Principles
- BLOBs: eliminate the need to keep many small files; provide fine-grain R/W access
- Data striping: distributes the I/O workload; enables the user to configure the distribution strategy
- Decentralized metadata: distributed at fine granularity; brings scalability and high availability
- Versioning is a key design principle

Versioning as a key principle
- Clients mutate BLOBs by submitting diffs
- A BLOB is never overwritten: a new snapshot is generated
- Only diffs are stored
- Clients see whole, fully independent snapshots
- Fine-grain read access to any past snapshot is possible

Contribution: a versioning-oriented access interface
    id = CREATE()
    v = APPEND(id, size, buffer)
    v = WRITE(id, offset, size, buffer)
    (v, size) = GET_RECENT(id)
    READ(id, v, offset, size, buffer)
    new_id = CLONE(id, v)
    dv = MERGE(sid, sv, soffset, ssize, did, doffset)
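To make the snapshot semantics concrete, here is a minimal in-memory Python mock of this interface. It is not the BlobSeer implementation (which stores only diffs across distributed providers): the operation names follow the slide, but the signatures are simplified (sizes are taken from the buffers) and each snapshot is stored as a full copy.

```python
# Minimal in-memory mock of the versioning-oriented interface above.
# Illustrative only: the real system stores diffs, not full snapshots.

class MockVersioningStore:
    def __init__(self):
        self._blobs = {}      # blob id -> list of snapshots (index = version)
        self._next_id = 0

    def create(self):
        blob_id = self._next_id
        self._next_id += 1
        self._blobs[blob_id] = [bytearray()]          # version 0: empty BLOB
        return blob_id

    def append(self, blob_id, buffer):
        old = self._blobs[blob_id][-1]
        new = bytearray(old) + bytearray(buffer)      # new snapshot, old one untouched
        self._blobs[blob_id].append(new)
        return len(self._blobs[blob_id]) - 1          # version of the new snapshot

    def write(self, blob_id, offset, buffer):
        old = self._blobs[blob_id][-1]
        new = bytearray(old)
        end = offset + len(buffer)
        if end > len(new):
            new.extend(b"\x00" * (end - len(new)))
        new[offset:end] = buffer
        self._blobs[blob_id].append(new)
        return len(self._blobs[blob_id]) - 1

    def get_recent(self, blob_id):
        v = len(self._blobs[blob_id]) - 1
        return v, len(self._blobs[blob_id][v])

    def read(self, blob_id, version, offset, size):
        return bytes(self._blobs[blob_id][version][offset:offset + size])


store = MockVersioningStore()
bid = store.create()
v1 = store.append(bid, b"hello world")
v2 = store.write(bid, 6, b"there")
print(store.read(bid, v1, 0, 11))   # b'hello world' -- the old snapshot stays readable
print(store.read(bid, v2, 0, 11))   # b'hello there'
```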

Consistency semantics
Writes:
- Atomic and totally ordered
- No guarantee on when they become visible to clients
- Finish before becoming visible to clients
Reads:
- Require a version explicitly
- GET_RECENT does not guarantee the latest version
- Can read any version older than the one returned by GET_RECENT
- Only exposed writes can be read

Advantages of this proposal
- Access to historic data
- Revert to previous snapshots
- Track changes to data
- Exploit data parallelism better
- Avoid synchronization to achieve better throughput
- Build complex workflows

Architecture
- Data providers
- Metadata providers
- Provider manager: implements the allocation strategy
- Version manager: guarantees total ordering and atomicity

How does a read work?
1. Select a version (optionally ask the version manager for the most recently exposed version)
2. Fetch the corresponding metadata from the metadata providers
3. Contact the providers in parallel and fetch the chunks into the local buffer
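A self-contained sketch of these three steps, with plain Python dicts standing in for the version manager, the metadata providers and the data providers (the real system contacts remote processes over asynchronous RPC); the blob id, chunk keys and chunk sizes are made-up values.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in state: one BLOB (id 7), latest exposed version 2, two 4-byte chunks.
latest_exposed = {7: 2}                                           # version manager
metadata = {(7, 2): [(0, 4, "p1", "c17"), (4, 4, "p2", "c42")]}   # (offset, size, provider, chunk id)
providers = {"p1": {"c17": b"hell"}, "p2": {"c42": b"o!__"}}      # data providers

def read(blob_id, offset, size, version=None):
    # Step 1: select a version (optionally ask for the most recently exposed one).
    if version is None:
        version = latest_exposed[blob_id]
    # Step 2: fetch the chunk descriptors covering the requested range.
    descriptors = [(o, s, p, c) for (o, s, p, c) in metadata[(blob_id, version)]
                   if o < offset + size and o + s > offset]
    # Step 3: contact the providers in parallel and copy the chunks into the buffer.
    out = bytearray(size)

    def fetch(desc):
        o, s, p, c = desc
        chunk = providers[p][c]
        lo, hi = max(o, offset), min(o + s, offset + size)
        out[lo - offset:hi - offset] = chunk[lo - o:hi - o]

    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(fetch, descriptors))
    return bytes(out)

print(read(7, 0, 6))   # b'hello!'
```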

How does a write work?
1. Get a list of providers, one for each chunk
2. Contact the providers in parallel and write the chunks
3. Get a version number for the update
4. Add new metadata to consolidate the new version
5. Report that the new version is ready; the version manager will eventually expose it

How to guarantee total ordering and atomicity?
- Clients ask for a version number
- The version manager assigns version numbers
- Clients write metadata concurrently
- Clients confirm completion
- The version manager exposes versions in order of assignment
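The sketch below illustrates the ordering rule with a minimal, single-process stand-in for the version manager (not the actual component): versions are assigned on request, writers complete in any order, and snapshots become visible strictly in assignment order.

```python
class VersionManager:
    """Toy stand-in for the version manager's ordering logic."""
    def __init__(self):
        self._next = 1
        self._completed = set()
        self._exposed = 0            # highest version visible to readers

    def assign_version(self):
        v, self._next = self._next, self._next + 1
        return v

    def report_complete(self, v):
        self._completed.add(v)
        # Expose versions in order of assignment: v becomes visible only
        # once every lower version has also completed.
        while self._exposed + 1 in self._completed:
            self._exposed += 1

    def latest_exposed(self):
        return self._exposed


vm = VersionManager()
v1, v2 = vm.assign_version(), vm.assign_version()   # two concurrent writers
vm.report_complete(v2)          # the second writer finishes first...
print(vm.latest_exposed())      # 0 -- v2 stays hidden until v1 completes
vm.report_complete(v1)
print(vm.latest_exposed())      # 2 -- both exposed, in assignment order
```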

Zoom on metadata management: motivation
- Present fully independent snapshots in spite of writing only diffs
- Access performance should not degrade as the number of diffs grows
Proposed so far: B-trees, shadowing
- Difficult to maintain in a distributed fashion
- Expensive synchronization for concurrent updates
Our goals:
- Easy to manage in a distributed fashion
- Efficient concurrent updates

Contribution: versioning over distributed segment trees
- Deals with distributing the metadata while avoiding expensive management
- A binary tree is associated with each BLOB snapshot
- Reads descend towards the leaves; writes build new trees bottom-up
Key ideas:
- Keep metadata immutable
- Share whole subtrees between snapshots
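A compact, single-process sketch of this idea (not the distributed implementation; the chunk size and recursive layout are illustrative assumptions): nodes are immutable, a write produces a new root while sharing every untouched subtree of the previous snapshot, and reads simply descend towards the leaves of the chosen version.

```python
CHUNK = 4  # leaf granularity (bytes), an illustrative value

class Node:
    __slots__ = ("lo", "hi", "left", "right", "data")
    def __init__(self, lo, hi, left=None, right=None, data=None):
        self.lo, self.hi = lo, hi          # range [lo, hi) covered by this node
        self.left, self.right = left, right
        self.data = data                   # leaf payload (chunk bytes), None for inner nodes

def build(lo, hi, payload):
    """Build the tree of the initial snapshot for payload[lo:hi]."""
    if hi - lo <= CHUNK:
        return Node(lo, hi, data=bytes(payload[lo:hi]))
    mid = (lo + hi) // 2
    return Node(lo, hi, build(lo, mid, payload), build(mid, hi, payload))

def write(node, offset, buf):
    """Return the root of a NEW snapshot with buf written at offset.
    Subtrees outside [offset, offset+len(buf)) are shared, not copied."""
    if node.hi <= offset or node.lo >= offset + len(buf):
        return node                        # untouched: share the old subtree
    if node.data is not None:              # leaf: produce a new immutable leaf
        chunk = bytearray(node.data)
        for i in range(node.lo, node.hi):
            if offset <= i < offset + len(buf):
                chunk[i - node.lo] = buf[i - offset]
        return Node(node.lo, node.hi, data=bytes(chunk))
    return Node(node.lo, node.hi,
                write(node.left, offset, buf),
                write(node.right, offset, buf))

def read(node, lo, hi, out):
    """Descend towards the leaves intersecting [lo, hi) and collect the bytes."""
    if node.hi <= lo or node.lo >= hi:
        return
    if node.data is not None:
        a, b = max(lo, node.lo), min(hi, node.hi)
        out[a - lo:b - lo] = node.data[a - node.lo:b - node.lo]
        return
    read(node.left, lo, hi, out)
    read(node.right, lo, hi, out)

snapshots = [build(0, 16, b"0123456789abcdef")]       # version 0
snapshots.append(write(snapshots[-1], 4, b"WXYZ"))    # version 1: only the touched subtree is new
buf = bytearray(8)
read(snapshots[1], 0, 8, buf)
print(bytes(buf))                                     # b'0123WXYZ'
buf0 = bytearray(8)
read(snapshots[0], 0, 8, buf0)
print(bytes(buf0))                                    # b'01234567' (old snapshot intact)
```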

Metadata forward references
- Solve the problem of efficient concurrent updates to the metadata
- Key idea: precalculate the children belonging to lower versions instead of waiting for them
Example (the writers are drawn in black, red and blue on the slide's figure):
- Initial BLOB
- Three concurrent writers have finished writing their chunks
- Black is faster, but knows about blue and links to the not-yet-existing node
- After red and blue finish, the metadata is consistent
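A toy illustration of the same trick (the keys, structure and values are illustrative assumptions, not the BlobSeer metadata format): because node identifiers are deterministic, a fast writer can reference a slower concurrent writer's subtree before that subtree is actually published, so nobody has to wait.

```python
# Toy model of forward references in the metadata tree nodes.
node_store = {}   # (version, lo, hi) -> node dict; backed by a DHT in the real system

def publish(version, lo, hi, **fields):
    node_store[(version, lo, hi)] = dict(lo=lo, hi=hi, version=version, **fields)

# Initial snapshot (version 1) covers [0, 8) with two leaves.
publish(1, 0, 4, chunk="A")
publish(1, 4, 8, chunk="B")
publish(1, 0, 8, left=(1, 0, 4), right=(1, 4, 8))

# Two concurrent writers get versions in order: v2 writes [0, 4), v3 writes [4, 8).
# The v3 writer finishes first, but it already knows v2 exists, so its root
# links FORWARD to the not-yet-published (2, 0, 4) node.
publish(3, 4, 8, chunk="B'")
publish(3, 0, 8, left=(2, 0, 4), right=(3, 4, 8))   # forward reference

# Later, the slower v2 writer publishes its pieces; v3's metadata becomes
# fully resolvable without v3 ever having waited for v2.
publish(2, 0, 4, chunk="A'")
publish(2, 0, 8, left=(2, 0, 4), right=(1, 4, 8))

def resolve(key):
    """Collect the leaf chunks reachable from a snapshot root."""
    node = node_store[key]
    if "chunk" in node:
        return [node["chunk"]]
    return resolve(node["left"]) + resolve(node["right"])

print(resolve((2, 0, 8)))   # ["A'", 'B']
print(resolve((3, 0, 8)))   # ["A'", "B'"]  -- v3 builds on v2, as total ordering dictates
```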

Design considerations
- Event-driven, layered design: callbacks instead of blocking
- Asynchronous RPC for interprocess communication
- Metadata providers form a DHT (custom implementation)
- Plugin-able allocation strategy on the provider manager: round-robin load balancing by default, more adaptive solutions possible (see the sketch below)
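A minimal sketch of what a plugin-able allocation strategy can look like, assuming a simple allocate(num_chunks) interface (an illustration, not the actual BlobSeer plugin API): the default round-robin strategy and a slightly more adaptive least-loaded variant are interchangeable from the provider manager's point of view.

```python
import itertools

class RoundRobinStrategy:
    """Default: hand out data providers in cyclic order, one per chunk."""
    def __init__(self, providers):
        self._cycle = itertools.cycle(providers)

    def allocate(self, num_chunks):
        return [next(self._cycle) for _ in range(num_chunks)]

class LeastLoadedStrategy:
    """More adaptive variant: prefer the providers with the fewest chunks
    allocated so far (load is just a local counter in this sketch)."""
    def __init__(self, providers):
        self._load = {p: 0 for p in providers}

    def allocate(self, num_chunks):
        chosen = []
        for _ in range(num_chunks):
            p = min(self._load, key=self._load.get)
            self._load[p] += 1
            chosen.append(p)
        return chosen

class ProviderManager:
    """The strategy is a plug-in: swap it without touching the write path."""
    def __init__(self, strategy):
        self.strategy = strategy

    def providers_for_write(self, num_chunks):
        return self.strategy.allocate(num_chunks)

manager = ProviderManager(RoundRobinStrategy(["p1", "p2", "p3"]))
print(manager.providers_for_write(5))   # ['p1', 'p2', 'p3', 'p1', 'p2']
```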

Fault tolerance
- Clients die during a write: before asking for a version, no problem; after asking for a version, the task is delegated to a metadata provider
- Data and/or metadata providers die: chunks and metadata pieces are replicated; since data and metadata are immutable, no synchronization between replicas is needed
- Version manager and/or provider manager dies: distributed state machine using leader election

Experimental platform: Grid 5000
- Experimental testbed distributed over 9 sites around France
- A total of more than 5000 cores
- Reservation system grants exclusive access for experiments
- x86_64 CPUs, >2 GB RAM, locally attached disks
- Interconnect: Gigabit Ethernet, Myrinet, Infiniband

Results: synthetic benchmarks
(plots: data striping; metadata decentralization)

Results: synthetic benchmarks (2)
(plots: increasing write pressure while reading; increasing read pressure while writing)

- Storage backend for Hadoop MapReduce
- Efficient VM image deployment and snapshotting on clouds
- QoS-enabled storage for clouds

MapReduce
- Data-intensive oriented paradigm
- Covers a wide range of data-intensive application classes
- Users stick to a well-defined model
- Widely adopted: Google, Yahoo
- Popular open-source implementation: Hadoop

BlobSeer as a storage backend for Hadoop MapReduce
- Proposal: replace HDFS, the default storage backend
Design issues:
- Implement the Hadoop file system API
- Hierarchic namespace for BLOBs
- Data prefetching
- Affinity scheduling: expose data location to the scheduler
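Two of these design issues, the hierarchic namespace and the exposure of data location, are sketched below. The real backend implements Hadoop's Java FileSystem API; this Python sketch only illustrates the mapping from hierarchical paths to flat BLOB ids, with a stub store and a hypothetical chunk_locations call standing in for the actual client library.

```python
class Namespace:
    """Maps the hierarchical paths Hadoop expects onto flat BLOB ids."""
    def __init__(self, store):
        self._store = store          # BLOB store client (stubbed below)
        self._files = {}             # "/path/to/file" -> BLOB id
        self._dirs = {"/"}           # directories are pure namespace metadata

    def mkdirs(self, path):
        parts = path.strip("/").split("/")
        for i in range(1, len(parts) + 1):
            self._dirs.add("/" + "/".join(parts[:i]))

    def create(self, path):
        self.mkdirs(path.rsplit("/", 1)[0] or "/")
        self._files[path] = self._store.create()
        return self._files[path]

    def list_dir(self, path):
        prefix = path.rstrip("/") + "/"
        entries = set(self._files) | self._dirs
        return sorted(p for p in entries
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def block_locations(self, path, offset, size):
        """Which data providers hold the chunks of this range? The scheduler
        can use this for affinity scheduling (hypothetical chunk_locations call)."""
        return self._store.chunk_locations(self._files[path], offset, size)


class _StubStore:                    # minimal stand-in for the BLOB store client
    def __init__(self):
        self._next = 0
    def create(self):
        self._next += 1
        return self._next


ns = Namespace(_StubStore())
ns.create("/user/alice/input/part-00000")
ns.create("/user/alice/input/part-00001")
print(ns.list_dir("/user/alice/input"))
# ['/user/alice/input/part-00000', '/user/alice/input/part-00001']
```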

Results: synthetic benchmarks
(plots: single writer; concurrent readers)

Results: real MapReduce applications
(plots: write-intensive application; read-intensive application)

In short
- 11%-30% improvement over HDFS for real MapReduce applications
- Potential to leverage versioning in Hadoop: further performance improvements and new features

Virtual machine image storage for IaaS clouds
- On IaaS clouds, users rent resources as VMs; the VM image is customized with the user's application
- Two patterns: multi-deployment (instantiate many VMs from the same image) and multi-snapshotting (save the state of VMs into independent images)
- State of the art: full pre-propagation of images, then copying images back to the repository
- Our goal: reduce cost (execution time, storage space, network traffic)

Proposal
- Store the image in a striped fashion: leverage BlobSeer to store each image as a BLOB
- Mirror the image contents locally, with a lazy scheme: read from the BLOB only when needed, keep changes to the image local, access only the necessary parts
- Consolidate local changes into an independent image: CLONE the BLOB, then WRITE the local changes to it; this provides the illusion of independent images while only differences are stored (sketched below)
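A minimal sketch of this lazy, copy-on-write mirror, operating at chunk granularity for brevity. The store object is assumed to offer read/write/clone calls matching the versioning interface presented earlier; the chunk size and method names are illustrative assumptions, not the actual implementation.

```python
CHUNK = 64 * 1024   # illustrative chunk size

class LocalImageMirror:
    """Chunk-granularity local view of a VM image stored as a BLOB."""
    def __init__(self, store, blob_id, version):
        self._store = store
        self._blob_id, self._version = blob_id, version
        self._cache = {}   # chunk index -> bytes fetched lazily from the BLOB
        self._dirty = {}   # chunk index -> bytes written locally, not yet pushed

    def read_chunk(self, idx):
        if idx in self._dirty:
            return self._dirty[idx]
        if idx not in self._cache:     # lazy scheme: fetch only when needed
            self._cache[idx] = self._store.read(
                self._blob_id, self._version, idx * CHUNK, CHUNK)
        return self._cache[idx]

    def write_chunk(self, idx, data):
        self._dirty[idx] = data        # changes stay local to the VM host

    def snapshot(self):
        """Consolidate local changes into an independent image:
        CLONE the BLOB, then WRITE only the dirty chunks (the diff)."""
        new_id = self._store.clone(self._blob_id, self._version)
        version = self._version
        for idx, data in sorted(self._dirty.items()):
            version = self._store.write(new_id, idx * CHUNK, data)
        return new_id, version
```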

Results: scalability of multi-deployment under concurrency
(plots: VM boot, runtime speedup vs. pre-propagation; VM boot, total network traffic vs. pre-propagation)

Results: performance of multi-snapshotting
(plots: average time to take snapshots; total network traffic / storage space)

In short
- Large speedup and network-traffic savings over pre-propagation for multi-deployment
- Efficient multi-snapshotting
- Portable approach: does not depend on the hypervisor to manage diffs

Quality-of-service enabled storage for cloud applications
- Storage for data-intensive applications deployed on IaaS clouds
- QoS is impacted by several factors: multiple customers share the same storage service, hardware components are prone to failures, and the application access pattern varies
- We need: high aggregated throughput under concurrency, and stable throughput for individual data accesses

Proposal: QoS improvement methodology
1. Monitor the storage service
2. Collect application feedback
3. Identify and classify behavior patterns using GloBeM
4. Prevent undesired patterns
Input: MapReduce access patterns + faults; the methodology was applied to find bottlenecks
Output: an improved allocation strategy
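As a small companion to step 1, the snippet below shows one way the throughput stability targeted by this methodology could be quantified: the standard deviation of per-access read throughput around its mean (the metric reported in the next slide). The numbers are made up for illustration; they are not measurements.

```python
from statistics import mean, stdev

def throughput_stability(samples_mb_per_s):
    """Return (mean, standard deviation, relative std dev) of throughput samples."""
    m, s = mean(samples_mb_per_s), stdev(samples_mb_per_s)
    return m, s, s / m

# Made-up samples, e.g. before and after changing the allocation strategy.
before = [62, 90, 41, 78, 55, 95, 48, 70]
after = [68, 74, 66, 72, 70, 75, 64, 71]
for label, samples in (("before", before), ("after", after)):
    m, s, rel = throughput_stability(samples)
    print(f"{label}: mean={m:.1f} MB/s, stddev={s:.1f} ({rel:.0%})")
```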

Results: improvement of throughput stability
(plots: clients separated from providers; clients co-deployed with providers)

In short
- A general methodology to improve QoS for cloud storage
- Concretely: a reduction in the standard deviation of read throughput of up to 25%
- Promising results for cloud providers to improve SLAs at the same price