The Advantages and Disadvantages of Network Computing Nodes




Slide 1: Big Data & Scripting - storage networks and distributed file systems

Slide 2
- in the remainder we use networks of computing nodes to enable computations on even larger datasets
- for a computation, each node works on the part of the dataset that is locally available to it
- computing nodes hold a partial, local copy of the whole dataset
- an optimal scenario distributes the data in advance, using the nodes in parallel for storage and computations
- general setting:
  - nodes connected by a network
  - each node has external memory (e.g. a hard disk)
  - in addition: internal memory and computing capacity
- in this part we consider only the storage and distribution of data

Slide 3: design issues for storage networks
- space and access balance: even distribution of data to machines
- availability: implement redundancy and tolerance for data loss
- resource efficiency: use resources in a useful way (don't waste space)
- access efficiency: provide fast access to stored data
- heterogeneity: integrate different types of hardware
- adaptivity: storage of growing amounts of data
- locality: minimize the degree of communication for data access

Slide 4: storage networks model
- n nodes N_1, ..., N_n; node N_i has capacity C_i
- total capacity: S = Σ_{i=1}^{n} C_i, i.e. space for S blocks in total
- blocks stored on N_i: F_i (filling state)
- nodes are connected by a network: N_i can send data to N_j for arbitrary i, j
- data is accessed by users from outside:
  - retrieve a set of blocks (for now)
  - retrieve the result of an operation on a set of blocks (later)

Slide 5: balancing problem
- consider a simplified scenario with C_i constant, i.e. all nodes have the same capacity
- distribute m blocks to n nodes subject to:
  - minimize Σ_i |F_i - m/n| (close to equal distribution)
  - minimize max_i F_i (minimize the maximum load)

Slide 6: striping
- all objects are combined into a single stream of data
- divide the data into blocks B_i
- divide the blocks into striping units U_i of k blocks each
- store striping unit i on node N_{i mod n} at position i div n
- example (n = 4 disks D1..D4, striping unit of k = 2 blocks):

  stripe    D1      D2      D3      D4
  0         0,1     2,3     4,5     6,7
  1         8,9     10,11   12,13   14,15

- advantage: units in one stripe can be read in parallel
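The addressing scheme can be written out directly; the following is a minimal Python sketch of the unit-to-node mapping (the function names and parameters `k` and `n` are illustrative, not part of the slides):

```python
def unit_location(unit_index, n):
    """Map striping unit i to (node, position) as on the slide:
    node N_(i mod n), position i div n within that node."""
    return unit_index % n, unit_index // n

def block_location(block_index, k, n):
    """Map a block to its node, position, and the offset inside its
    striping unit, assuming units of k consecutive blocks."""
    unit_index, offset = divmod(block_index, k)
    node, position = unit_location(unit_index, n)
    return node, position, offset

# reproduces the 4-disk example with striping unit k = 2:
# block 9 -> unit 4 -> node 0 (D1), position 1, offset 1
print(block_location(9, k=2, n=4))
```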

Slide 7: striping: size of the striping unit k?
- assumptions:
  - operations tend to involve adjacent blocks, e.g. one big file (a large csv table) spanning several blocks
  - several data accesses happen in parallel, e.g. different users using different files
- small k: high bandwidth (access in parallel), but many parallel accesses block each other
- large k: low bandwidth (most files reside on a single node), but parallel accesses (to different files) are distributed among the nodes
- the choice of k depends only on the access structure and the average node performance [1]

[1] Chen, Patterson, Maximizing performance in a striped disk array, 1990

Slide 8: striping: advantages/disadvantages
- advantages:
  - perfectly balanced data distribution
  - simple addressing/storage scheme
- disadvantages:
  - modifying stored data (blocks): block deletion leaves holes (new data goes to the end or into the holes), causing fragmentation (requires additional indexing)
  - adding and removing nodes (machines): addition could be solved by re-striping, removal leads to (partial) redistribution
- solutions exist, but striping is best suited for static scenarios

Slide 9: balancing: centralized approach
- idea: one central address and positioning node (master)
  - coordinates all data access, knows the state of all nodes
  - stores new blocks on the nodes with the lowest filling state
  - adding/removing storage nodes is straightforward
- data access:
  - the client sends an operation to the server (read/write, add, delete)
  - the server answers with the address of the node to interact with
  - the operation is executed between client and node
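A minimal sketch of such a master, assuming an in-memory dictionary from block id to node and a least-filled placement rule (class and method names are illustrative):

```python
class Master:
    """Central address/positioning node: knows the filling state of every
    storage node and the location of every block."""

    def __init__(self, n_nodes):
        self.fill = [0] * n_nodes      # F_i: blocks stored on node i
        self.location = {}             # block id -> node id

    def add(self, block_id):
        # place the new block on a node with the lowest filling state
        node = min(range(len(self.fill)), key=lambda i: self.fill[i])
        self.fill[node] += 1
        self.location[block_id] = node
        return node                    # the client then talks to this node directly

    def read(self, block_id):
        return self.location[block_id]

    def delete(self, block_id):
        node = self.location.pop(block_id)
        self.fill[node] -= 1
        return node

master = Master(n_nodes=4)
print(master.add("b1"), master.add("b2"), master.read("b1"))
```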

Slide 10: centralized approach: advantages/disadvantages
- advantages:
  - optimal data distribution can be guaranteed
  - operations can be synchronized
- disadvantages:
  - the address and positioning node is a bottleneck
  - one centralized dictionary mapping block id to node
- we return to access schemes later

Slide 11: balancing: distribution by hashing
- treat the nodes as bins and use a hash function h() for the distribution
- write block B to node N_{h(B)}
- load factor α >> 1 (many blocks per node)
- the balls-into-bins model:
  - usual assumption in hashing: α < 1, avoid collisions
  - here: α >> 1, achieve a balanced distribution of blocks (balls) over nodes (bins)
- optimal distribution: m/n blocks (out of m) on each node (out of n)
- question: can we guarantee that the maximum number of elements in one bin is not too large?

Slide 12: balancing: distribution by hashing
- when using the distribution of a hash function directly, the fill states of the bins tend to be unbalanced
- experiment: distribute m = 10,000 blocks to n = 100 bins; expected fill state: 100 blocks per bin
- [figure: bin fill states for the 100 bins, plotted relative to m/n]
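A small simulation along these lines (a sketch, not the original experiment; the hash value is simulated by a uniform random bin per block):

```python
import random

def direct_placement(m, n, seed=0):
    """Distribute m blocks into n bins using the hash value directly
    (simulated here by a uniform random bin per block)."""
    rng = random.Random(seed)
    fill = [0] * n
    for _ in range(m):
        fill[rng.randrange(n)] += 1
    return fill

fill = direct_placement(m=10_000, n=100)
print("expected per bin:", 10_000 // 100)
print("min/max fill:", min(fill), max(fill))   # typically deviates noticeably from 100
```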

Slide 13: balancing: distribution by hashing
- the simple case: m elements, n bins, m > n log n, assumption: h(x) uniformly distributed
- then the expected number of elements in each bin B_i is m/n, and with high probability the fullest bin has no more than Θ(√(m ln(n)/n)) additional load compared to the optimum m/n
- definition: in a system with some parameter n, an event X appears "with high probability" if P(X) ≥ 1 - 1/n^α for some constant α > 0; similar cases are often denoted as P(X) = 1 - o(1)

Slide 14: balancing: greedy improvement
- the expected distribution of O(m/n) blocks per node is good, but bins with higher load can block computations and data access
- improvement: greedy(d)
  - for each block, choose d ≥ 2 candidate nodes N_{i_1}, ..., N_{i_d}
  - find b = arg min_{k ∈ {1,...,d}} F_{i_k} (break ties arbitrarily)
  - place the block on N_{i_b}
- example: consider bin h(B) and the bins to its left and right
- retrieval: recalculate the candidate addresses and query all of them (in parallel)
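A minimal sketch of greedy(d) with the neighbouring-bins candidate choice from the example (function names are illustrative; Python's built-in `hash` restricted to the bin range stands in for the hash function):

```python
def candidates(block_id, n, d=2):
    """Candidate bins for a block: h(B) and its neighbours on the bin range,
    e.g. h(B) - 1, h(B), h(B) + 1 for d = 3."""
    h = hash(block_id) % n
    return [(h + offset) % n for offset in range(-(d // 2), d - d // 2)]

def greedy_insert(block_id, fill, d=2):
    """Place the block in the least-filled of its d candidate bins."""
    best = min(candidates(block_id, len(fill), d), key=lambda i: fill[i])
    fill[best] += 1
    return best

def greedy_lookup(block_id, n, d=2):
    """Retrieval: recompute the candidate bins and query all of them."""
    return candidates(block_id, n, d)

fill = [0] * 100
for b in range(10_000):
    greedy_insert(f"block-{b}", fill, d=2)
print("max fill with greedy(2):", max(fill))
```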

Slide 15: balancing: greedy improvement
- experiment: comparing the direct choice and the greedy improvement; m = 10,000 blocks in n = 100 bins (2 alternatives in greedy)
- each greedy insert uses the bin from h(B) - 1, h(B), h(B) + 1 with the minimal fill state
- [figure: bin fill states relative to m/n for the 100 bins, direct vs. greedy]

Slide 16: analysis of greedy(d) [2]
- theorem (maximal load): insert m blocks into n nodes using greedy(d); then with high probability max_i F_i is ln(ln(n))/ln(d) + Θ(m/n)
- theorem (number of overloaded bins): let γ be a suitable constant; if m balls are distributed into n bins using strategy greedy(d), then with probability > 1 - 1/n at most n·exp(-d^i) bins have load > m/n + i + γ
- in words:
  1. the maximal load is not too extreme
  2. only few bins with much more than the optimal load exist

[2] cf. Berenbrink, Czumaj, Steger, Vöcking, Balanced allocations: The heavily loaded case, 2000

Slide 17: heterogeneity
- implicit assumption so far: C_i = C_j, all nodes have equal capacities; a useful assumption, but not realistic
- heterogeneity: arbitrary hardware for the nodes, in general C_i ≠ C_j (differing capacities)
- load balancing becomes more complicated
- but: more freedom in the choice of hardware, e.g. upgrades with ever larger nodes

Slide 18: heterogeneity: virtual buckets
- the hashing approach can be extended to heterogeneous settings by subdividing all node capacities into virtual buckets
- choose the largest common storage unit C as the size of a virtual bucket
- the real capacities C_i should be approximate multiples of C: C_i ≈ k_i·C with k_i ∈ ℕ
- every node N_i is split into k_i buckets, so that K = Σ_i k_i (K is the total number of buckets)
- the hash function maps blocks to {1, ..., K} (buckets)
- a second mapping m : {1, ..., K} → {1, ..., n} with |m^{-1}(i)| = k_i maps the K buckets to the n nodes
- the number of buckets per node corresponds to the node size
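A minimal sketch of the virtual-bucket mapping, assuming the capacities are given as multiples k_i of the common unit C (the helper names are illustrative):

```python
def bucket_to_node(k):
    """Build the second mapping m: bucket -> node from the bucket counts k_i,
    assigning k_i consecutive buckets to node i."""
    mapping = []
    for node, k_i in enumerate(k):
        mapping.extend([node] * k_i)
    return mapping          # len(mapping) == K == sum(k)

def place_block(block_id, mapping):
    """Hash the block to one of the K buckets, then map the bucket to its node."""
    bucket = hash(block_id) % len(mapping)
    return mapping[bucket]

# three nodes with capacities 2C, 1C and 3C -> k = [2, 1, 3], K = 6 buckets
mapping = bucket_to_node([2, 1, 3])
counts = [0, 0, 0]
for b in range(6_000):
    counts[place_block(f"block-{b}", mapping)] += 1
print(counts)   # roughly proportional to the capacities 2:1:3
```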

Slide 19: availability: prevent data loss
- avoid loss of data, i.e. ensure that stored data remains available
- motivational example:
  - storage network with N uniform nodes
  - the probability of a node failure within one month is p
  - P(node survives a month) = 1 - p
  - P(all N nodes survive k months) = (1 - p)^{N·k}
- the survival probability decays exponentially in the number of nodes and in time
- failures will happen eventually and cannot be avoided with fail-safe hardware
- use redundancy to handle failures
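A small worked example, with illustrative numbers not taken from the slides (p = 1% monthly failure probability, N = 100 nodes, k = 12 months):

```python
p, N, k = 0.01, 100, 12

survive_one_month = (1 - p) ** N        # all nodes survive a single month
survive_k_months = (1 - p) ** (N * k)   # all nodes survive the whole year

print(f"P(all nodes survive one month) = {survive_one_month:.3f}")   # ~0.366
print(f"P(all nodes survive {k} months) = {survive_k_months:.2e}")   # ~5.8e-06
```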

Slide 20: availability: implementing redundancy
- basic principle:
  - store additional information (more than only the given data)
  - use that information to recover in case of partial data loss
- two basic approaches:
  - mirroring: store data elements several times
  - parity codes: create additional information to recover missing bits

Slide 21: availability: redundancy by mirroring
- idea (simple version): for each block, store r duplicates on different nodes
- with failure rate p for one node, the probability of losing a block is p^r
- problem: need r·m space instead of m
- when a node fails: recreate copies of all blocks of the failed node from their duplicates
- on updates: update all duplicates

Slide 22: availability: parity codes
- assume a string of bits s = s_1 s_2 s_3 ... s_n, e.g. 0110101001001110
- parity: p(s) = (Σ_i s_i) mod 2, e.g. 0 for the string above
- if one bit of s is lost, e.g. s' = s_1 x s_3 ... s_n, was x = 1 or x = 0?
- use the parity of the available part: x = 0 if p(s) equals the parity of the remaining bits, else x = 1
- one additional bit allows recovering one arbitrary lost bit
- this can be extended to larger amounts of missing bits; one example: Hamming codes
- store additional bits instead of duplicates and restore them on data loss
- often implemented on the hardware level
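A minimal sketch of single-bit recovery with one parity bit (bit strings represented as Python lists of 0/1; RAID-style parity across blocks works the same way, bit position by bit position):

```python
def parity(bits):
    """Parity p(s) = (sum of bits) mod 2."""
    return sum(bits) % 2

def recover(available_bits, stored_parity):
    """Recover a single lost bit: it is 0 if the parity of the remaining
    bits already matches the stored parity, otherwise it is 1."""
    return 0 if parity(available_bits) == stored_parity else 1

s = [0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0]   # the example string
p = parity(s)                                           # 0, stored alongside s

lost_index = 1                                          # suppose s_2 is lost
remaining = s[:lost_index] + s[lost_index + 1:]
print(recover(remaining, p) == s[lost_index])           # True
```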

Slide 23: adaptivity
- the capacity is constantly extended by adding nodes
- problem: rehashing everything for every new node is too expensive
- idea: adaptive hash function, i.e. a hash function with an adaptive range
- a change of the range avoids a total reorganization and rearranges only a (small) portion of the input values
- when new nodes are added, only a few blocks have to be rearranged

Slide 24: adaptivity: adaptive hashing
- basic idea:
  - position the nodes in a space S
  - for each block, determine a position in S by a hash function
  - store the block on the nearest node
  - the nearest position for an arbitrary point can be found by binary search
- adapting to new/removed nodes: add/remove points in the space and reassign the neighboring blocks
- problem: when a node is removed, all its blocks go to its neighbor(s); when a node is added, it takes a huge load from its neighbors
- refinement: use multiple positions for each node

Slide 25: adaptivity: adaptive hashing
- use the one-dimensional ring [0, 1) as the space (distances computed modulo 1)
- assign k positions to each node i: P^i_1, ..., P^i_k
- every block is mapped to a position in [0, 1) by a hash function h
- block positioning:
  - determine the hash value h(B) for the block
  - assign block B to the nearest node by position: arg min_i min_j min{ |h(B) - P^i_j|, 1 - |h(B) - P^i_j| }
- adding a node: create the new positions for the node, reassign blocks from the neighboring positions
- removing a node: reassign its blocks, then remove its positions and the node
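A minimal sketch of this ring scheme (a simplified consistent-hashing variant; the helper `pos` derives the k node positions and the block positions from Python's built-in `hash`, standing in for the hash functions mentioned on the next slide):

```python
def pos(key):
    """Map an arbitrary key to a position on the ring [0, 1)."""
    return (hash(key) % 10**9) / 10**9

def ring_distance(a, b):
    """Distance on the ring [0, 1), i.e. min(|a - b|, 1 - |a - b|)."""
    d = abs(a - b)
    return min(d, 1 - d)

def node_positions(nodes, k=3):
    """Assign k positions P^i_1, ..., P^i_k to every node i."""
    return {node: [pos((node, j)) for j in range(k)] for node in nodes}

def locate(block_id, positions):
    """Assign a block to the node owning the nearest position on the ring."""
    h = pos(block_id)
    return min(positions,
               key=lambda node: min(ring_distance(h, p) for p in positions[node]))

positions = node_positions(["N1", "N2", "N3"], k=3)
print(locate("block-42", positions))

# adding a node only creates new positions; blocks near them are reassigned
positions["N4"] = [pos(("N4", j)) for j in range(3)]
```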

Slide 26: adaptivity: adaptive hashing
- the points P^i_j of node i can themselves be determined by hash functions
- for each insertion, a search for the nearest point has to be performed
- until now: homogeneous setting (C_i constant)
- heterogeneous settings: model different node sizes by additional points, i.e. reflect the capacity by a corresponding number of points
- using the virtual-buckets approach this leads to a large number of points