Network Coding for Distributed Storage


1 Network Coding for Distributed Storage. Alex Dimakis, USC

2 Overview. Motivation: data centers; mobile distributed storage for D2D. Specific storage problems: the fundamental tradeoff between repair communication and storage; systematic repair (open problem); distributed storage allocations (open problem).

3 Motivation: Data centers. Warehouse-sized computing and storage facilities, with costs in the hundreds of millions. Large-scale distributed storage: thousands of servers, petabytes of disk space. Internet data centers are the next computing platform: Web search, indexing, Gmail, Facebook, video storage.

4 Massive distributed data storage. Numerous disks fail every day, so redundancy must be introduced into the stored information. Replication or erasure coding? Coding can give orders of magnitude more reliability, but the problems of creating and maintaining an encoded data representation have to be addressed.

5 Distributed caching in mobiles. Infrastructure is slow to deploy and upgrade. Delivery with opportunistic contacts [7DS, Haggle, ...] extends coverage and capacity using free D2D bandwidth, and scales as the network gets dense [Grossglauser/Tse02]. 5/5/10

6 Distributed caching in mobiles. The video you want to watch is very likely to be downloaded by people nearby within the next day. Storage in phones is increasing faster than anything else. Cache the popular content and use D2D to share it.

7 MDS erasure codes. A file or data object is split into k=2 blocks, A and B. With n=3, the nodes store A, B, A+B: a (3,2) MDS code (single parity), used in RAID 5. With n=4, the nodes store A, B, A+B, A+2B: a (4,2) MDS code, which tolerates any 2 failures and is used in RAID 6.
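The (4,2) code on this slide can be checked with a few lines of Python. This is only a sketch: the slide does not say which field the arithmetic uses, so the prime modulus `P = 257` and the helper names `encode`, `decode` and `all_pairs_recover` are assumptions made for illustration. Any odd prime works, because every 2x2 submatrix of the generator then has a nonzero determinant.

```python
from itertools import combinations

P = 257  # assumed prime modulus, not from the slide

# Row i gives the coefficients (of A, of B) stored on node i: A, B, A+B, A+2B.
GENERATOR = [(1, 0), (0, 1), (1, 1), (1, 2)]


def encode(a, b):
    """Return the four coded symbols stored on the four nodes."""
    return [(ga * a + gb * b) % P for ga, gb in GENERATOR]


def decode(i, yi, j, yj):
    """Recover (A, B) from the symbols of two distinct nodes i and j."""
    (p, q), (r, s) = GENERATOR[i], GENERATOR[j]
    det = (p * s - q * r) % P
    det_inv = pow(det, -1, P)  # modular inverse (Python 3.8+)
    # Cramer's rule for the 2x2 system p*A + q*B = yi, r*A + s*B = yj.
    a = ((s * yi - q * yj) * det_inv) % P
    b = ((p * yj - r * yi) * det_inv) % P
    return a, b


def all_pairs_recover(a, b):
    """True if every pair of nodes reproduces the original (A, B)."""
    symbols = encode(a, b)
    return all(
        decode(i, symbols[i], j, symbols[j]) == (a, b)
        for i, j in combinations(range(4), 2)
    )
```

Calling `all_pairs_recover(42, 200)` exercises all six node pairs, which is exactly the "tolerates any 2 failures" property.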

8 Erasure codes are reliable. Replication (nodes store A, A, B, B) vs. a (4,2) MDS erasure code (nodes store A, B, A+B, A+2B; any 2 suffice to recover the file or data object). Erasure coding introduces redundancy in an optimal way. It is very useful in practice, e.g. Reed-Solomon codes and Fountain codes (LT and Raptor). Current storage architectures still use replication (Gmail makes 21 copies(!)). Can we improve storage efficiency? Replication: Pr[failure]=0.43. MDS erasure code: Pr[failure]=0.31.
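The slide does not state the node failure probability behind these two numbers. A minimal sketch, assuming each of the four nodes fails independently with probability q = 0.5 (an assumption, as are the function names), gives 0.4375 for replication and 0.3125 for the MDS code, which agrees with the slide's 0.43 and 0.31 up to truncation.

```python
from itertools import product


def failure_probability(can_recover, n, q):
    """Add up the probability of every up/down pattern that loses the file."""
    total = 0.0
    for alive in product([True, False], repeat=n):
        if not can_recover(alive):
            prob = 1.0
            for up in alive:
                prob *= (1 - q) if up else q
            total += prob
    return total


def replication_ok(alive):
    # Nodes 0 and 1 hold A, nodes 2 and 3 hold B: one live copy of each is needed.
    return (alive[0] or alive[1]) and (alive[2] or alive[3])


def mds_ok(alive):
    # (4,2) MDS code: any two live nodes suffice.
    return sum(alive) >= 2


q = 0.5  # assumed per-node failure probability, not given on the slide
print("replication:", failure_probability(replication_ok, 4, q))
print("MDS (4,2)  :", failure_probability(mds_ok, 4, q))
```

Both schemes use four nodes and the same total storage, so the gap comes purely from how the redundancy is arranged.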

9 New open problems. Network traffic. Issues: communication, update complexity, repair communication?

10 Code Repair: Problem statement. Assume we have a (4,2) MDS code (nodes a, b, c, d; each block is 1mb) and one node leaves the system. How much data does a newcomer (e) have to download to construct a new encoded packet? This is the problem of repairing the code in distributed environments.

11 Code Repair: first thoughts. The nodes store a, b, a+b and a+2b, 1mb each. Downloading 2mb definitely works, but then the newcomer (e) is downloading 2mb to store only 1mb! Q: Is it possible to download less data? It is possible to download 1.5mb! When coding is used, creating new fragments is not a trivial task: the problem is that to create a new fragment we must have access to the entire data object.

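12 Reducing repair bandwidth. Each 1mb block is split into two halves (a1, a2 and b1, b2). The surviving nodes store (b1, b2), (a1+b1, a2+b2) and (a1+2b1, a2+2b2). The newcomer downloads one half-size packet from each: b1+b2, a1+b1+2a2+2b2 and a1+2b1+3a2+6b2, and stores two packets built from them.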
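The three downloaded packets can be checked for usefulness with a rank computation. The sketch below works over the rationals for simplicity, and the mixing step (`e1 = p1 + p2`, `e2 = p1 + p3`) is a hypothetical choice made here for illustration. The slide does not show which combinations the newcomer stores. With this choice, the newcomer together with any one surviving node still spans all four symbols, so the repaired code keeps the any-2-of-4 property while only 3 x 0.5mb = 1.5mb is downloaded.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of vectors over the rationals (Gaussian elimination)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


# Vectors hold the coefficients of (a1, a2, b1, b2).
node_b = [(0, 0, 1, 0), (0, 0, 0, 1)]  # stores b1, b2
node_c = [(1, 0, 1, 0), (0, 1, 0, 1)]  # stores a1+b1, a2+b2
node_d = [(1, 0, 2, 0), (0, 1, 0, 2)]  # stores a1+2b1, a2+2b2

# Half-size packets sent to the newcomer, as listed on the slide.
p1 = (0, 0, 1, 1)  # b1 + b2
p2 = (1, 2, 1, 2)  # (a1+b1) + 2(a2+b2)
p3 = (1, 3, 2, 6)  # (a1+2b1) + 3(a2+2b2)


def add(u, v):
    return tuple(x + y for x, y in zip(u, v))


# Hypothetical mixing step (not from the slide).
newcomer = [add(p1, p2), add(p1, p3)]


def repaired_code_is_mds():
    """True if the newcomer plus any one surviving node spans all four symbols."""
    return all(rank(newcomer + node) == 4 for node in (node_b, node_c, node_d))
```

Storing `p2` and `p3` unmixed would not work: `p2` lies in the span of node c's packets, so the newcomer and node c together would only reach rank 3.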

13 Repair Bandwidth for MDS. Theorem 1: For (n,k)-MDS codes, if each node stores α bits and the newcomer downloads β bits from each existing node, repair is possible with $\alpha_{MDS} = \frac{M}{k}$, $\beta_{MDS} = \frac{M}{k} \cdot \frac{1}{n-k}$. Proof by reduction to a flow on an (infinite) graph. (D, Godfrey, Wu, Wainwright, Ramchandran, IT Transactions (to appear))
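As a quick sanity check of Theorem 1, the closed forms can be evaluated directly. The helper name `mds_repair_point` is made up for this sketch, and exact fractions are used so that the results can be compared without rounding. For the (4,2) code with a 2mb file it returns β = 1/2 mb and a total download of 1.5mb from the n-1 = 3 surviving nodes.

```python
from fractions import Fraction


def mds_repair_point(M, n, k):
    """Return (alpha, beta, total download) when repairing from the n-1 survivors."""
    alpha = Fraction(M, k)      # storage per node, M/k
    beta = alpha / (n - k)      # download per surviving node, (M/k) * 1/(n-k)
    return alpha, beta, (n - 1) * beta


print(mds_repair_point(2, 4, 2))     # the (4,2) example with a 2mb file
print(mds_repair_point(20, 25, 20))  # M=20mb, k=20, n=25
```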

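14 Proof sketch: Information flow graph. [Figure: source S, nodes a, b, c, d each storing α = 1mb, newcomer e downloading β from each of b, c and d, and a data collector.] The cut gives $1 + 2\beta \ge 2$, so $\beta \ge 1/2$ mb. Total download: 1.5mb.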

15 Proof sketch: reduction to multicasting. [Figure: the information flow graph with several data collectors.] Repairing a code = multicasting on the information flow graph. A download of β per node is sufficient iff the minimum of the min-cuts is at least the file size M. (Ahlswede et al.; Koetter & Medard; Ho et al.)

16 Overview. Motivation: distributed storage in data centers. The code repair problem: minimizing repair bandwidth; the fundamental tradeoff between repair bandwidth and storage. Systematic repair.

17 Regenerating codes. [Figure: storage nodes and newcomer e; labels M/k, β, d.] Repair bandwidth can be greatly reduced if we allow slightly more storage per node.

18 Minimizing repair bandwidth. [Figure: newcomer downloading β from each of d nodes.] $\min \beta d$ subject to $\mathrm{MinCut}(DC_i) \ge M$ for all $i$, with $d \in \{k, k+1, \ldots, n-1\}$. This problem can be solved analytically.

19 Ingredient 1: bounding the flow. Lemma: for any (potentially infinite) graph G(α, β, d), any data collector DC has flow at least $\mathrm{MinCut}(DC) \ge \sum_{i=0}^{k-1} \min\{(d-i)\beta, \alpha\}$. Proof: sort topologically, count. The bound is tight, since it is satisfied with equality for this graph.
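The bound in the lemma is easy to evaluate numerically. The following sketch (the function name `min_cut_bound` is an assumption) computes the sum of min{(d-i)β, α} for i = 0, ..., k-1 and applies it to the (4,2) example with α = 1mb and d = 3: β = 1/2 gives exactly the file size 2, while any smaller β falls short.

```python
from fractions import Fraction


def min_cut_bound(alpha, beta, d, k):
    """Lower bound on the flow to any data collector: sum of min{(d-i)*beta, alpha}."""
    return sum(min((d - i) * beta, alpha) for i in range(k))


alpha = Fraction(1)   # each node stores 1mb
M = 2                 # file size in mb

print(min_cut_bound(alpha, Fraction(1, 2), d=3, k=2) >= M)  # beta = 1/2 is enough
print(min_cut_bound(alpha, Fraction(2, 5), d=3, k=2) >= M)  # beta = 2/5 is not
```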

20 Ingredient 2: just relax. $\min \beta d$ subject to $\sum_{i=0}^{k-1} \min\{(d-i)\beta, \alpha\} \ge M$, with $d \in \{k, k+1, \ldots, n-1\}$. Relax the integer constraint. Show that the integer and relaxed problems attain their optimum at the same point.

21 Minimum repair bandwidth. Theorem 2: The minimum repair bandwidth optimization problem has a unique optimum point:

22 Numerical example. File size M=20mb, k=20, n=25. Reed-Solomon: store α=1mb, repair βd=20mb. Min-Storage RC: store α=1mb, repair βd=4.8mb. Min-Bandwidth RC: store α=1.65mb, repair βd=1.65mb. Fundamental tradeoff: what other points are achievable?
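These numbers can be reproduced from the two corner points of the tradeoff. The sketch below assumes that repair contacts all d = n-1 = 24 surviving nodes (the slide does not state d), and uses the corner formulas γ = Md/(k(d-k+1)) for minimum storage and γ = 2Md/(2kd - k² + k) for minimum bandwidth, which are f(0) and f(k-1) in the notation of Theorem 3. The function names are made up for illustration.

```python
def min_storage_point(M, k, d):
    """Minimum-storage corner: alpha = M/k, gamma = M*d / (k*(d-k+1))."""
    return M / k, M * d / (k * (d - k + 1))


def min_bandwidth_point(M, k, d):
    """Minimum-bandwidth corner: alpha = gamma = 2*M*d / (2*k*d - k*k + k)."""
    gamma = 2 * M * d / (2 * k * d - k * k + k)
    return gamma, gamma


M, k, n = 20, 20, 25
d = n - 1  # assumption: the newcomer contacts every surviving node

print("Reed-Solomon  : store", M / k, "repair", M)  # naive repair downloads the whole file
print("Min-Storage   : store %.2f repair %.2f" % min_storage_point(M, k, d))
print("Min-Bandwidth : store %.3f repair %.3f" % min_bandwidth_point(M, k, d))
```

The minimum-bandwidth value evaluates to about 1.655mb, which the slide shows as 1.65mb.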

23 Storage-Communication tradeoff. Theorem 3: for any (n,k) code in which each node stores α bits, repairs from d existing nodes and downloads dβ=γ bits, the feasible region is described by the following piecewise linear function:
$$\alpha_{\min}(\gamma) = \begin{cases} \dfrac{M}{k}, & \gamma \in [f(0), \infty), \\[4pt] \dfrac{M - g(i)\gamma}{k-i}, & \gamma \in [f(i), f(i-1)), \end{cases}$$
where
$$f(i) := \frac{2Md}{(2k-i-1)i + 2k(d-k+1)}, \qquad g(i) := \frac{(2d-2k+i+1)i}{2d}.$$
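The piecewise function in Theorem 3 can be written out as a short sketch. The index range i = 1, ..., k-1 for the linear segments is an assumption taken from the usual statement of this result; the slide itself does not show it. Repair bandwidths below f(k-1) are treated as infeasible and return `None`. Exact fractions avoid rounding trouble at the segment boundaries.

```python
from fractions import Fraction


def f(i, M, k, d):
    """Breakpoints of the tradeoff curve (repair bandwidth values)."""
    return Fraction(2 * M * d, (2 * k - i - 1) * i + 2 * k * (d - k + 1))


def g(i, k, d):
    """Slope coefficient of segment i."""
    return Fraction((2 * d - 2 * k + i + 1) * i, 2 * d)


def alpha_min(gamma, M, k, d):
    """Minimum storage per node for repair bandwidth gamma, or None if infeasible."""
    if gamma >= f(0, M, k, d):
        return Fraction(M, k)
    for i in range(1, k):  # assumed range of segments: i = 1, ..., k-1
        if f(i, M, k, d) <= gamma < f(i - 1, M, k, d):
            return (M - g(i, k, d) * gamma) / (k - i)
    return None  # gamma is below f(k-1)


M, k, d = 20, 20, 24
print(alpha_min(f(0, M, k, d), M, k, d))      # minimum-storage corner
print(alpha_min(f(k - 1, M, k, d), M, k, d))  # minimum-bandwidth corner
```

At γ = f(k-1) the function returns α = γ, the point where a node stores exactly what it downloads.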

24 Storage-Communication tradeoff. [Figure: tradeoff curve with the Min-Bandwidth Regenerating code and Min-Storage Regenerating code points marked; axis label βd.]

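25 Open Problem: Systematic repair. [Figure: nodes a and b (1mb each), c, d, and a newcomer e=a.] From Theorem 1, a (4,2) MDS code can be repaired by downloading $\beta_{MDS} = \frac{M}{k} \cdot \frac{1}{n-k}$ from each node, with $\alpha_{MDS} = \frac{M}{k}$. What if we require perfect reconstruction (e=a)?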

26 Repair vs Systematic Repair. [Figure: nodes x1, x2, ..., xn, a newcomer that must reproduce x1 with download βd, and data collectors.] Repair = multicasting. Systematic repair = multicasting with intermediate nodes having (overlapping) requests. Cut arguments might not be tight. Linear codes might not suffice (Dougherty et al.).

27 Systematic Repair: (4,2) example. [Figure: stored packets x1, x3, x1+x3, x1+2x3 and x2, x4, x2+x, x2+3x4; repair labels 3-1, x3+x4, x1+x2+x3+x4, 2-1 x x2+x3+x4; targets x1?, x2?] (Wu and D., ISIT 2009)

28 What is known about systematic repair. For (n,2), systematic repair can match the cutset bound [WD ISIT 09]. A (5,3) MSR systematic code exists (Cullina, D., Ho, Allerton 09). For k/n <= 1/2, systematic repair can match the cutset bound [Rashmi, Shah, Kumar, Ramchandran (2010)] [Suh, Ramchandran (2010)]. What can be done for high rates?

29 What is known about systematic repair. Given an error-correcting code, find the repair coefficients that reduce communication (over a field). Given some channel matrices, find the beamforming matrices that maximize the DoF (Cadambe and Jafar; Suh and Tse). (Papailiopoulos & D., working paper)

30 Distributed caching in mobiles. Network codes designed for distributed storage (regenerating codes) greatly reduce the communication required to maintain the desired redundancy. Nodes cache different content in a distributed way. Which content to cache? How much to store? How to find peers that have the desired content? Incentives for people to donate storage/bandwidth?

31 How much to store. Two files, each of size 1. Fix a total redundancy of 2. How to allocate storage?

32 How much to store. Coding helps, but finding the best allocation is nontrivial.

33 An easier problem

34 Allocations for one object

35 Allocations for one object

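36 Problem Description. $\max_{x} \Pr\left[\sum_{i=1}^{n} x_i \mathbf{1}_i \ge 1\right]$ subject to $\sum_{i=1}^{n} x_i \le T$, where $x_i$ is the amount stored on node i and $\mathbf{1}_i$ indicates that node i is accessed. Can be generalized to other models of node availability. Nonconvex problem. Harder than it looks.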

37 Distributed storage allocations. Symmetric allocations can be suboptimal: given n = 5 storage nodes, budget T = 12/5, and p = 0.9, a nonsymmetric allocation performs better than the optimal symmetric allocation. Finding the optimal symmetric allocation is also nontrivial. Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman; also see S. Jain, M. Demmer, R. Patra, and K. Fall, SIGCOMM 05.
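The claim for n = 5, T = 12/5, p = 0.9 can be explored by brute force. The sketch below assumes the i.i.d. access model (each node is reachable independently with probability p, and a unit-size file is recovered when the reachable nodes hold at least 1 unit in total). The nonsymmetric allocation (3/5, 3/5, 2/5, 2/5, 2/5) is a hypothetical example chosen here, not necessarily the one shown on the slide, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def recovery_probability(allocation, p):
    """Probability that the reachable nodes together hold at least one unit."""
    total = 0.0
    for up in product([0, 1], repeat=len(allocation)):
        if sum(x for x, u in zip(allocation, up) if u) >= 1:
            prob = 1.0
            for u in up:
                prob *= p if u else 1 - p
            total += prob
    return total


def symmetric(n, T, m):
    """Spread the budget T evenly over m of the n nodes; the rest store nothing."""
    return [Fraction(T) / m] * m + [Fraction(0)] * (n - m)


n, T, p = 5, Fraction(12, 5), 0.9
best_symmetric = max(recovery_probability(symmetric(n, T, m), p) for m in range(1, n + 1))

# Hypothetical nonsymmetric allocation with the same budget (not from the slide).
nonsymmetric = [Fraction(3, 5), Fraction(3, 5), Fraction(2, 5), Fraction(2, 5), Fraction(2, 5)]

print("best symmetric:", best_symmetric)
print("nonsymmetric  :", recovery_probability(nonsymmetric, p))
```

Under these assumptions the best symmetric allocation spreads the budget over four nodes and succeeds with probability 0.9963, while the nonsymmetric example reaches about 0.9971.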

38 Distributed storage allocations. Results can be obtained for different access models. For the i.i.d. model, maximal spreading (x = T/n) was shown to have asymptotically zero gap from optimality if Tp>1 (Leong, D., Ho, NetCod 2009; Globecom submitted).

39 Open Problems. Are cut-set bounds tight? Are linear codes sufficient? What is the limit of interference alignment techniques? Repairing codes in small fields? Existing codes used in storage (e.g. EvenOdd code, B-Code, etc.)? Dealing with bit-errors (security)? (Dikaliotis, Ho, D., ISIT 10) What is the role of (non-trivial) network topologies? Allocations for multiple objects?

40 Coding for Storage wiki

41 fin

42 Conclusions. We proposed a theoretical framework for analyzing encoded information representations. Repair reduces to network coding, and flow arguments completely characterize what is possible. We identified and characterized a tradeoff between repair bandwidth and storage for any storage system. There are numerous interesting questions in coding for data centers: repair, updates, disk IO vs. network bandwidth. Systematic, deterministic, small finite field constructions are very interesting for real applications.
