Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers




Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers
Anwitaman Datta
Joint work with Frédérique Oggier (SPMS)
School of Computer Engineering, Nanyang Technological University
Infocomm Professional Development Forum, 7th July 2011, Singapore

Self-* Aspects of Networked Distributed Systems
Me, myself & SANDS
http://sands.sce.ntu.edu.sg/
http://www.ntu.edu.sg/home/anwitaman/

Outline
o Data Centers - the heart of the Cloud
o Replication, RAID & Erasure Codes
o Erasure codes tailor-made for distributed networked storage: Self-repairing codes
o Wrap-up

What is the Cloud? - Cloud - Storage systems
o At least, we can all agree Cloud is something big and happening!
o It's all of these and some more: old wine, data center, SaaS, IaaS, PaaS, %*$aaS

NIST Definition for Cloud Computing
o Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

Two Sides of the Cloud Coin
o Outside view
  - A single/exclusive entity
  - Access through a demilitarized zone
  - API based
  - Agnostic to multi-tenancy
  - Infinite/elastic resources
  - Pay per use, on-demand, ...
  - Browser-based access (often)
  - Anytime, anywhere, any device

Two Sides of the Cloud Coin
o Inside view
  - Pool of resources
  - In flux: new compute units joining, old ones retiring
  - Self-*: load-balancing, fault-tolerance, autoconfiguration, ...
  - Multi-tenancy
    Virtualization, transparent migration, ...
  - Distributed file system and data management
    Google's GFS, Amazon's Dynamo, Yahoo!'s PNUTS
    MapReduce/Hadoop, Pig, Chubby, ...

The New Stack
[Figure: layered stack; labels in the order they appear on the slide]
o SQL implementations, e.g., Pig (relational algebra), Hive, ...
o Applications
o NoSQL, e.g., MapReduce, Hadoop, BigTable, HBase, Cassandra
o Distributed file system (e.g., key-value store)
o Reliable storage service
o Distributed physical infrastructure: storage/compute nodes
Disclaimer: This is a personal view, and may not be standard/universal

Data Center
o Essentially a networked distributed storage system
Source of topology: http://www.cisco.com/en/us/docs/solutions/enterprise/data_center/vmware/vmware.html

Data Center Design Evolution
[Figure: data center generations; labels recovered from the slide]
o Generations: Gen 1 (DC collocation), Gen 2, Gen 3, Gen 4 (future, modular data center)
o Deployment scale unit: server, rack, containers, pre-assembled components
o Other labels: Capacity, Density and Sustainability, Scalability, Thousands of Servers, Right Time to Market, Lower TCO (PUE), Scalable Data Centers
Slide courtesy Roger Barga (Microsoft), from his P2P 2009 keynote talk

Failure (of individual nodes) is inevitable
o But, failure of the system is not an option!
o Solution: Redundancy

Is the Danger Real? Yes
Cloud is NSFW

Is the Danger Real? Yes
o There is also the danger of data falling into the wrong hands, e.g., due to a security breach
o Security/privacy issues are out of the scope of this talk
  - At SANDS we also work on those issues
  - A*Star TSRP project pcloud: http://sands.sce.ntu.edu.sg/pcloud/

Data Center Fault-Tolerance - Existing approaches - Has EC a role?
o Faults are omnipresent
  - Hardware, network, software, human, misconfiguration, ...
o Cascade of failures in interdependent networks
  - Power failure => network switches stop working
  - Network failure => control system for power system ineffective

Redundancy Based Fault Tolerance
o Replicate data, e.g., 3 or more copies
  - In nodes on different racks
  - Can deal with switch failures
o Power back-up using battery between racks (Google)
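
The rack-aware replication idea on this slide can be sketched in a few lines. This is a minimal illustration only: the function `place_replicas`, the rack names, and the node names are hypothetical and do not describe any particular system's placement policy.

```python
import random

def place_replicas(nodes_by_rack, copies=3, seed=None):
    """Pick one node from each of `copies` distinct racks (hypothetical policy)."""
    if copies > len(nodes_by_rack):
        raise ValueError("need at least as many racks as copies")
    rng = random.Random(seed)
    # Choose distinct racks first, so that no two replicas share a rack switch.
    racks = rng.sample(sorted(nodes_by_rack), copies)
    return [(rack, rng.choice(nodes_by_rack[rack])) for rack in racks]

# Hypothetical cluster layout: rack name -> nodes in that rack.
cluster = {
    "rack-a": ["a1", "a2"],
    "rack-b": ["b1", "b2"],
    "rack-c": ["c1"],
    "rack-d": ["d1", "d2"],
}
placement = place_replicas(cluster, copies=3, seed=7)
print(placement)
```

Because every replica lands on a different rack, the loss of a single rack switch leaves at least two copies reachable.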

Redundancy Based Fault Tolerance
o Using independent physical infrastructure
  - Over different availability zones (Amazon AZ)
  - How independent are components in a complex network?
  - Over multiple geographical regions

Amazon's AWS: Availability Zones
Note: The recent (April 2011) AWS outage was the first region-wide failure

Five Levels of Redundancy
o Physical
o Virtual resource
o Availability zone
o Region
o Cloud
From: http://broadcast.oreilly.com/2011/04/the-aws-outage-the-clouds-shining-moment.html

At What Cost?
o Failure is not an option, but are the overheads acceptable?

Reducing the Overheads of Redundancy
o Erasure codes
  - Much lower storage overhead
  - High level of fault-tolerance
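
A small worked comparison makes the overhead claim concrete. The sketch below assumes a maximum distance separable (MDS) erasure code, for which any k of the n encoded blocks suffice; the parameters (9, 6) are illustrative and do not come from the slides.

```python
from fractions import Fraction

def replication(copies):
    """Storage overhead and tolerated failures for `copies` full replicas."""
    return Fraction(copies), copies - 1

def mds_erasure_code(n, k):
    """The same two figures for an (n, k) MDS code: any k of n blocks suffice."""
    return Fraction(n, k), n - k

rep_overhead, rep_failures = replication(3)
ec_overhead, ec_failures = mds_erasure_code(9, 6)  # illustrative parameters

print(f"3-way replication: {float(rep_overhead):.1f}x storage, survives {rep_failures} failures")
print(f"(9, 6) MDS code:   {float(ec_overhead):.1f}x storage, survives {ec_failures} failures")
```

With these numbers the erasure code stores half as much as 3-way replication while tolerating one more failure.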

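Erasure Codes for Networked Storage
[Figure: encoding an object into n blocks and decoding it from the retrieved blocks]
o Data = Object, split into k original blocks O1, O2, ..., Ok
o Encoding: n encoded blocks B1, B2, ..., Bn (stored in storage devices in a network)
o Some blocks Bl are lost
o Retrieve any k' (≥ k) blocks
o Decoding: reconstruct the data, i.e., the original k blocks O1, O2, ..., Ok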
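
The encode, lose, retrieve, decode cycle can be tried out with the simplest possible erasure code: k data blocks plus one XOR parity block (n = k + 1). This is a stand-in chosen for brevity, not the code family discussed in the talk, and the helper names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks):
    """k data blocks -> n = k + 1 encoded blocks (the data plus one XOR parity)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def decode(available, k):
    """Rebuild the k data blocks from any k of the n encoded blocks.

    `available` maps block index (0..k, where index k is the parity) to bytes.
    """
    if len(available) < k:
        raise ValueError("need at least k blocks")
    missing = [i for i in range(k) if i not in available]
    if not missing:
        return [available[i] for i in range(k)]
    # Exactly one data block is missing, so the k blocks we hold are the other
    # k - 1 data blocks plus the parity; their XOR is the missing block.
    rebuilt = xor_blocks(list(available.values()))
    return [available.get(i, rebuilt) for i in range(k)]

blocks = [b"abcd", b"efgh", b"ijkl"]   # k = 3
encoded = encode(blocks)               # n = 4
survivors = {i: b for i, b in enumerate(encoded) if i != 1}  # block 1 is lost
assert decode(survivors, k=3) == blocks
```

Codes with n - k > 1 need arithmetic over a larger finite field, but the interface (any k blocks in, original k blocks out) stays the same.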

Replenishing Lost Redundancy for ECs
[Figure: repair of lost encoded blocks]
o Repairs needed for long-term resilience
o Repairs are expensive!
  - Retrieve any k' (≥ k) of the n encoded blocks B1, B2, ..., Bn
  - Decoding: original k blocks O1, O2, ..., Ok
  - Encoding: recreate lost blocks Bl
  - Re-insert in (new) storage devices, so that there are (again) n encoded blocks

Can We Do Better?
o What is the best one can do (w.r.t. repairs)?
  - Minimize bandwidth usage per repair
  - Minimize number of live nodes used per repair
o Erasure codes have some other drawbacks
  - Coding/decoding is expensive
    In contrast to replication or RAID/XOR-based systems
  - Systematic codes can help (with decoding/access)!
    Not adequate when load-balancing is also an issue!!
  - More complex system design
  - We do not attempt to address these explicitly
    But some of the solutions we arrive at will be amenable!

Can We Do Better? - Pyramid - Regenerating Codes - Self-repairing
o Self-repairing Codes: Erasure codes tailor-made for distributed networked storage

Self-repairing Codes: Blackbox View
[Figure: repair of a lost block]
o n encoded blocks B1, B2, ..., Bn (stored in storage devices in a network); some blocks Bl are lost
o Retrieve some k'' (< k) blocks (e.g., k'' = 2) to recreate a lost block
o Re-insert in (new) storage devices, so that there are (again) n encoded blocks
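
The benefit of recreating a lost block from fewer than k blocks (two in the slide's example) can be put into numbers. The sketch below is back-of-the-envelope arithmetic under a stated assumption: every encoded block has size M / k for an object of size M. The object size and the value of k are hypothetical.

```python
def repair_traffic(object_size, k, blocks_fetched):
    """Bytes downloaded to recreate one block, assuming each block is object_size / k."""
    return blocks_fetched * object_size / k

M = 1_000_000  # hypothetical object size in bytes
k = 8          # hypothetical code parameter

# Classic repair: fetch k blocks, decode the object, re-encode the lost block.
classic = repair_traffic(M, k, blocks_fetched=k)
# Self-repair as in the blackbox view: fetch only two blocks.
self_repair = repair_traffic(M, k, blocks_fetched=2)

print(f"classic repair: {classic:.0f} bytes, self-repair: {self_repair:.0f} bytes")
```

Under this assumption the classic repair moves the equivalent of the whole object to restore a single block, while the two-block repair moves a quarter of it for k = 8.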

Self-repairing Codes
o There is at least one pair to repair a node, for up to (n-1)/2 simultaneous failures
  - Parallel & fast repair of multiple failures
o Example
  - Data object split in four parts: PSRC(n=5, k=3)

Toy Example: PSRC(5,3) repair
o Repair using two nodes, say N1 and N3
  - (o1 + o2 + o4) + (o1) => o2 + o4
  - (o3) + (o2 + o3) => o2
  - (o1) + (o2) => o1 + o2
  - Four pieces needed to regenerate two pieces
o Repair using three nodes, say N2, N3 and N4
  - (o2) + (o4) => o2 + o4
  - (o1 + o2 + o4) + (o4) => o1 + o2
  - Three pieces needed to regenerate two pieces
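
The XOR equations of this toy example can be checked mechanically. The node layout below is an assumption inferred from the equations on the slide, since the slide's figure is not part of the transcript; in particular, the second piece of N2 (o3 + o4) does not appear on the slide and is a guess. The example values for o1..o4 are arbitrary.

```python
from functools import reduce

# Assumed layout: each node stores two XOR combinations of the object parts
# o1..o4, written as tuples of part indices. N2's second piece is a guess.
NODES = {
    "N1": [(1,), (2, 3)],
    "N2": [(2,), (3, 4)],
    "N3": [(3,), (1, 2, 4)],
    "N4": [(4,), (1, 3)],
    "N5": [(1, 2), (2, 4)],
}

def xor(*values):
    """XOR of one or more integers."""
    return reduce(lambda a, b: a ^ b, values)

def store(parts):
    """Compute what every node stores for the given parts {1: o1, ..., 4: o4}."""
    return {node: {combo: xor(*(parts[i] for i in combo)) for combo in combos}
            for node, combos in NODES.items()}

parts = {1: 0x1F, 2: 0x2E, 3: 0x3D, 4: 0x4C}  # arbitrary example values
stored = store(parts)
lost = stored.pop("N5")  # node N5 fails

# Repair using two nodes (N1 and N3): four pieces are fetched.
n1, n3 = stored["N1"], stored["N3"]
o2 = xor(n3[(3,)], n1[(2, 3)])
two_node = {(2, 4): xor(n3[(1, 2, 4)], n1[(1,)]),
            (1, 2): xor(n1[(1,)], o2)}

# Repair using three nodes (N2, N3 and N4): three pieces are fetched.
n2, n4 = stored["N2"], stored["N4"]
three_node = {(2, 4): xor(n2[(2,)], n4[(4,)]),
              (1, 2): xor(n3[(1, 2, 4)], n4[(4,)])}

assert two_node == lost == three_node
```

Both repair routes regenerate the same two pieces, which illustrates the trade-off on the slide: contacting more nodes can reduce the number of pieces transferred.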

Toy Example: PSRC(5,3) reconstruction
o Reconstruction, say using N3, N4 and N5
  - o3
  - o4
  - (o3) + (o1 + o3) => o1
  - (o1) + (o4) + (o1 + o2 + o4) => o2
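
The reconstruction equations translate directly into code. The sketch below takes the four pieces that appear in the slide's equations as inputs; which node supplies which piece is not stated in the transcript, so the function is written in terms of pieces only. The function name and example values are hypothetical.

```python
def reconstruct(o3, o4, o1_xor_o3, o1_xor_o2_xor_o4):
    """Recover (o1, o2, o3, o4) from the four pieces used in the slide's equations."""
    o1 = o3 ^ o1_xor_o3              # (o3) + (o1 + o3) => o1
    o2 = o1 ^ o4 ^ o1_xor_o2_xor_o4  # (o1) + (o4) + (o1 + o2 + o4) => o2
    return o1, o2, o3, o4

original = (0x1F, 0x2E, 0x3D, 0x4C)  # arbitrary example values for o1..o4
o1, o2, o3, o4 = original
recovered = reconstruct(o3, o4, o1 ^ o3, o1 ^ o2 ^ o4)
assert recovered == original
```

For real blocks the same XORs would be applied bytewise; integers are used here only to keep the example short.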

Symmetry in SRCs
o All encoded blocks have a symmetric role
  - Equivalent importance of all blocks for both data reconstruction & repair
o Symmetry is good
  - Easy to analyze, understand and implement
  - Simpler algorithm and system design

Maximum Distance Separable (MDS)?
o SRC is not MDS (and cannot be!)
  - Does it matter? Not much
  - In practice, access will be planned
o PSRC needs less bandwidth than `optimal' RGC!
  - This is with random access
[Figure: PSRC(21,3)]

Practical properties
o (Current) SRCs are not systematic
  - PSRC behaves like a systematic code
    Need to contact more nodes (than k) to obtain systematic `pieces'
    Same total bandwidth usage
  - Parallel download for access can even be an `advantage'
  - `Mixed' strategies for access, i.e., get some systematic pieces, and some others
  - Power saving (by switching off nodes) strategies possible

Practical properties
o Self-repair implies somewhat locally decodable
  - If access to only part of the whole object is desired
o Coding/decoding in PSRC both use XOR operations only

Outlook
o 2020: Self-repairing codes in a data center near you?
o Ongoing: Concepts/Implementation
  - Prototype miniature data center
  - Template for preassembled component of a modular 4G+ data center
o Interested to
  - Follow: http://sands.sce.ntu.edu.sg/codingfornetworkedstorage/
  - Get involved: {anwitaman,frederique}@ntu.edu.sg