FAQs. Requirements in node Joins. CS535 Big Data Fall 2015 Colorado State University http://www.cs.colostate.edu/~cs535


CS535 BIG DATA
FAQs

Zookeeper Installation

[Step 1] Download the ZooKeeper package:
$ wget http://apache.arvixe.com/zookeeper/stable/zookeeper-3.4.6.tar.gz
$ tar xvf zookeeper-3.4.6.tar.gz
$ cd zookeeper-3.4.6/conf

[Step 2] Create a configuration file zookeeper-3.4.6/conf/zoo.cfg with the following lines:
tickTime=2000
dataDir=/s/*your_host_name*/a/tmp/zookeeper_*your_login_name*/data
clientPort=2181

[Step 3] Start your server:
$ bin/zkServer.sh start

Today's Topics
Document-oriented storage system - continued
Consistent hashing
Chord: Incremental architecture

Requirements in node Joins
In a dynamic network, nodes can join (and leave) at any time.
1. Each node's successor is correctly maintained
2. For every key k, node successor(k) is responsible for k

Tasks to perform a node join:
1. Initialize the predecessor and fingers of node n
2. Update the fingers and predecessors of existing nodes to reflect the addition of n
3. Notify the higher-layer software so that it can transfer state (e.g., values) associated with keys that node n is now responsible for
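Since consistent hashing is one of today's topics, a minimal sketch may help fix the idea before the Chord details: keys and nodes are hashed onto the same circular identifier space, and a key is stored on the first node clockwise from its hash (its successor). This is an illustration only; the 64-bit ring, the use of Python's hashlib.md5, and the node/key names are assumptions, not the lecture's definitions.

import bisect
import hashlib

RING_BITS = 64  # assumed identifier-space size for this example

def ring_hash(key: str) -> int:
    """Map a key (or node name) onto the identifier circle [0, 2^64)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % (1 << RING_BITS)

class ConsistentHashRing:
    def __init__(self, nodes):
        # Sort node positions once so lookups can binary-search the ring.
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def successor(self, key: str) -> str:
        """Return the first node clockwise from hash(key), i.e. successor(k)."""
        h = ring_hash(key)
        idx = bisect.bisect_right(self.positions, h)
        return self.ring[idx % len(self.ring)][1]  # wrap around the circle

if __name__ == "__main__":
    ring = ConsistentHashRing(["nodeA", "nodeB", "nodeC", "nodeD"])
    for key in ["Jim", "Carol", "Jonny", "Suzy"]:
        print(key, "->", ring.successor(key))

Adding or removing one node only moves the keys in the affected arc of the circle, which is the property Chord and Cassandra both build on.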

#define successor finger[1].node

// node n joins the network
// n' is an arbitrary node in the network
n.join(n')
  if (n')
    init_finger_table(n');
    update_others();
    // move keys in (predecessor, n] from successor
  else  // if n is going to be the only node in the network
    for i = 1 to m
      finger[i].node = n;
    predecessor = n;

n.find_successor(id)
  n' = find_predecessor(id);
  return n'.successor;

n.find_predecessor(id)
  n' = n;
  while (id is NOT in (n', n'.successor])
    n' = n'.closest_preceding_finger(id);
  return n';

n.closest_preceding_finger(id)
  for i = m downto 1
    if (finger[i].node is in (n, id))
      return finger[i].node;
  return n;

Step 1: Initializing fingers and predecessor (1/2)
The new node n learns its predecessor and fingers by asking any arbitrary node in the network to look them up.

n.init_finger_table(n')
  finger[1].node = n'.find_successor(finger[1].start);
  predecessor = successor.predecessor;
  successor.predecessor = n;
  for i = 1 to m-1
    if (finger[i+1].start is in [n, finger[i].node))
      finger[i+1].node = finger[i].node;
    else
      finger[i+1].node = n'.find_successor(finger[i+1].start);

Join 5 (After init_finger_table(n'))
[Figure: identifier circle showing the finger tables of all nodes right after node 5 runs init_finger_table(n'); the existing nodes' fingers do not yet point to node 5.]

Join 5 (After update_others())
[Figure: the same identifier circle after update_others(); the fingers of existing nodes that should now point to node 5 have been updated to 5.]

Step 1: Initializing fingers and predecessor (2/2)
A naïve run of find_successor takes O(log N); doing this for all m finger entries takes O(m log N).
How can we optimize this?
Check whether the i-th finger is also the correct (i+1)-th finger.
Ask an immediate neighbor for a copy of its complete finger table and its predecessor; the new node n can use these tables as hints to help it find the correct values.
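As a complement to the pseudocode above, here is a small, hedged sketch of the lookup path (find_successor, find_predecessor, closest_preceding_finger) on a tiny static ring. It precomputes finger tables by brute force instead of implementing join/init_finger_table, and the parameters (m = 3, node IDs 0, 1, 3) and helper names are assumptions made for the example, not part of the lecture.

M = 3                        # identifier bits; the ring has 2^M = 8 positions
RING = 1 << M

def in_open(x, a, b):
    """True if x lies in the circular open interval (a, b)."""
    a, b, x = a % RING, b % RING, x % RING
    if a < b:
        return a < x < b
    return x > a or x < b    # the interval wraps past 0

def in_open_closed(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return in_open(x, a, b) or x % RING == b % RING

class Node:
    def __init__(self, ident):
        self.id = ident
        self.finger = []     # finger[i] is the node succeeding (id + 2^i)

    @property
    def successor(self):
        return self.finger[0]

    def closest_preceding_finger(self, ident):
        for node in reversed(self.finger):       # i = m downto 1
            if in_open(node.id, self.id, ident):
                return node
        return self

    def find_predecessor(self, ident):
        n = self
        while not in_open_closed(ident, n.id, n.successor.id):
            n = n.closest_preceding_finger(ident)
        return n

    def find_successor(self, ident):
        return self.find_predecessor(ident).successor

def build_ring(ids):
    """Brute-force correct finger tables for a static set of node IDs."""
    nodes = {i: Node(i) for i in ids}
    ordered = sorted(ids)
    def succ_of(point):
        for i in ordered:
            if i >= point % RING:
                return nodes[i]
        return nodes[ordered[0]]                 # wrap around the circle
    for n in nodes.values():
        n.finger = [succ_of(n.id + (1 << k)) for k in range(M)]
    return nodes

if __name__ == "__main__":
    ring = build_ring([0, 1, 3])
    # Key 6 has no node at position 6, so its successor wraps to node 0.
    print(ring[3].find_successor(6).id)   # -> 0
    print(ring[0].find_successor(2).id)   # -> 3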

Updating fingers of existing nodes
Node n will be entered into the finger tables of some existing nodes.

n.update_others()
  for i = 1 to m
    p = find_predecessor(n - 2^(i-1));
    p.update_finger_table(n, i);

p.update_finger_table(s, i)
  if (s is in [p, finger[i].node))
    finger[i].node = s;
    p' = predecessor;   // get the first node preceding p
    p'.update_finger_table(s, i);

Node n will become the i-th finger of node p if and only if p precedes n by at least 2^(i-1) and the i-th finger of node p succeeds n.
The first node p that can meet these two conditions is the immediate predecessor of n - 2^(i-1).
For a given n, the algorithm starts with the i-th finger of node n and continues to walk in the counter-clockwise direction on the identifier circle.
The number of nodes that need to be updated is O(log N).

Transferring keys
Move responsibility for all the keys for which node n is now the successor.
This involves moving the data associated with each key to the new node.
Node n can become the successor only for keys that were previously the responsibility of the node immediately following n, so n only needs to contact that one node to transfer responsibility for all relevant keys.

Example
If you have the following data:

Name   Age  Car     Gender
Jim    36   Camaro  M
Carol  37   BMW     F
Jonny  1            M
Suzy   9            F

Cassandra assigns a hash value to each partition key:

Partition Key   Murmur3 hash value
Jim             -2245462676723223822
Carol           7723358927203680754
Jonny           -6723372854036780875
Suzy            1168604627387940318

Cassandra cluster with 4 nodes
[Figure: a ring of four nodes (Node A-D) in Data Center ABC. The Murmur3 token range from -9223372036854775808 to 9223372036854775807 is split into four contiguous sub-ranges, one per node, and each row above is stored on the node whose range contains its hash value.]

Partitioning
Partitioners
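To make the token-range picture concrete, the sketch below maps partition keys to owning nodes the way a Cassandra-style partitioner does: hash the key into the signed 64-bit token space, then find the node whose range contains that token. The hash used here (MD5 truncated to 64 bits) is only a stand-in for Murmur3, and the boundaries are an assumed even four-way split of the token space in the spirit of the figure above, so the tokens printed will not match Cassandra's.

import bisect
import hashlib

def token(partition_key: str) -> int:
    """Stand-in token function: map a key into the signed 64-bit token space."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# Assumed 4-node cluster: each node owns the contiguous range ending at its
# boundary (inclusive), mirroring the Node A-D picture above.
RANGE_END = [
    (-4611686018427387905, "Node A"),
    (-1,                   "Node B"),
    (4611686018427387903,  "Node C"),
    (9223372036854775807,  "Node D"),
]

def owning_node(partition_key: str) -> str:
    t = token(partition_key)
    idx = bisect.bisect_left([end for end, _ in RANGE_END], t)
    return RANGE_END[idx][1]

if __name__ == "__main__":
    for name in ["Jim", "Carol", "Jonny", "Suzy"]:
        print(f"{name:6s} token={token(name):>20d} -> {owning_node(name)}")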

Partitioning
A partitioner is a function for deriving a token representing a row from its partition key, typically by hashing.
Each row of data is then distributed across the cluster by the value of its token.
Read and write requests to the cluster are also evenly distributed, and each part of the hash range receives an equal number of rows on average.

Cassandra offers three partitioners:
Murmur3Partitioner (default): uniformly distributes data across the cluster based on MurmurHash hash values.
RandomPartitioner: uniformly distributes data across the cluster based on MD5 hash values.
ByteOrderedPartitioner: keeps an ordered distribution of data lexically by key bytes.

Murmur3Partitioner
Murmur hash is a non-cryptographic hash function created by Austin Appleby in 2008.
The name comes from its basic operations: MUltiply (MU) and Rotate (R).
The current version, Murmur3, yields a 32- or 128-bit hash value.
Murmur3 has a low bias of under 0.5% in the avalanche analysis.

Testing with 42 million keys

Measuring the quality of a hash function
Hash function quality:
[Hash-quality formula shown on the slide]
where b_j is the number of items in the j-th slot, n is the total number of items, and m is the number of slots.
A. V. Aho, M. S. Lam, R. Sethi and J. D. Ullman, Compilers: Principles, Techniques, and Tools, Pearson Education, Inc.

Comparison between hash functions
http://www.strchr.com/hash_functions

Avalanche analysis for hash functions
Indicates how well the hash function mixes the bits of the key to produce the bits of the hash: whether a small change in the input causes a significant change in the output.
Whether or not it achieves avalanche:
P(output bit i changes | input bit j changes) = 0.5 for all i, j
If we keep all of the input bits the same and flip exactly 1 bit, each of the hash function's output bits changes with probability 1/2.
The hash is biased if the probability of an input bit affecting an output bit is greater than or less than 50%.
Large amounts of bias indicate that keys differing only in the biased bits may tend to produce more hash collisions than expected.
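The slide's quality formula did not survive extraction; the ratio computed below, the sum over slots of b_j(b_j + 1)/2 divided by (n/(2m))(n + 2m - 1), is my reconstruction of the Aho et al. measure the slide cites (it is also the measure used on the strchr.com page referenced above). A value close to 1.0 indicates behaviour close to uniform random hashing; the key generator, slot count, and the deliberately poor len()-based hash are arbitrary choices for the demo.

import hashlib
import random
import string

def bucket_quality(keys, num_slots, hash_fn):
    """Aho et al.-style quality ratio: ~1.0 means close to uniform hashing."""
    counts = [0] * num_slots
    for key in keys:
        counts[hash_fn(key) % num_slots] += 1
    n, m = len(keys), num_slots
    observed = sum(b * (b + 1) / 2 for b in counts)
    expected = (n / (2 * m)) * (n + 2 * m - 1)
    return observed / expected

def md5_hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

if __name__ == "__main__":
    random.seed(0)
    keys = ["".join(random.choices(string.ascii_lowercase, k=12))
            for _ in range(100_000)]
    print("md5   quality:", round(bucket_quality(keys, 4096, md5_hash), 4))
    # len() maps every 12-character key to the same slot, so its ratio is huge.
    print("len() quality:", round(bucket_quality(keys, 4096, len), 4))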

RandomPartitioner
RandomPartitioner was the default partitioner prior to Cassandra 1.2.
Uses MD5 hash values; tokens range from 0 to 2^127 - 1.
http://research.neustar.biz/2012/02/02/choosing-a-good-hash-function-part-3/

ByteOrderedPartitioner
This partitioner orders rows lexically by key bytes.
The ordered partitioner allows ordered scans by primary key: if your application has user names as the partition key, you can scan rows for users whose names fall between Jake and Joe.
Disadvantages of this partitioner:
Difficult load balancing
Sequential writes can cause hot spots
Uneven load balancing for multiple tables

Replication

Replication
Provides high availability and durability.
For a replication factor (replication degree) of N, the coordinator replicates these keys at N-1 nodes.
The client can specify the replication scheme: rack-aware / rack-unaware / datacenter-aware.
There is no master or primary replica.

Two replication strategies are available:
SimpleStrategy: use for a single data center only.
NetworkTopologyStrategy: for multi-data center setups.

SimpleStrategy
Used only for a single data center.
Places the first replica on a node determined by the partitioner.
Places additional replicas on the next nodes clockwise in the ring, without considering topology.
Does not consider rack or data center location.
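The clockwise placement rule of SimpleStrategy can be sketched as a short walk over a sorted token ring. Only the rule itself (first replica on the node owning the token's range, additional replicas on the next distinct nodes clockwise, ignoring racks and data centers) comes from the description above; the toy tokens, node names, and function name are assumptions for illustration.

from bisect import bisect_left

def simple_strategy_replicas(ring, key_token, replication_factor):
    """ring: sorted (token, node) pairs; a node owns the range (prev_token, token].
    Returns the nodes holding replicas of key_token, walking clockwise."""
    tokens = [t for t, _ in ring]
    distinct_nodes = len(set(name for _, name in ring))
    start = bisect_left(tokens, key_token) % len(ring)   # owner of the key's range
    replicas = []
    i = start
    while len(replicas) < min(replication_factor, distinct_nodes):
        node = ring[i % len(ring)][1]
        if node not in replicas:      # with vnodes the same node may appear twice
            replicas.append(node)
        i += 1
    return replicas

if __name__ == "__main__":
    # Toy 4-node ring; tokens are small integers for readability.
    ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
    print(simple_strategy_replicas(ring, key_token=60, replication_factor=3))
    # -> ['D', 'A', 'B']: first replica on the owner, then the next nodes clockwise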

NetworkTopologyStrategy (1/3)
For data clusters deployed across multiple data centers.
This strategy specifies how many replicas you want in each data center.
Places replicas in the same data center by walking the ring clockwise until it reaches the first node in another rack.
Attempts to place replicas on distinct racks: nodes in the same rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues.

NetworkTopologyStrategy (2/3)
When deciding how many replicas to configure in each data center, you should consider:
being able to satisfy reads locally, without incurring cross-data-center latency
failure scenarios
The two most common ways to configure multiple data center clusters:
Two replicas in each data center: this configuration tolerates the failure of a single node per replication group and still allows local reads at a consistency level of ONE.
Three replicas in each data center: this configuration tolerates either the failure of one node per replication group at a strong consistency level of LOCAL_QUORUM, or multiple node failures per data center using consistency level ONE.

NetworkTopologyStrategy (3/3)
Asymmetrical replication groupings are also possible.
For example, you can maintain 4 replicas: three replicas in one data center to serve real-time application requests, and a single replica elsewhere for running analytics.

Virtual Node

What are Vnodes?
With consistent hashing, a node owns exactly one contiguous range in the ring-space.
Vnodes change this from one token (or range) per node to many per node.
Within a cluster these can be randomly selected and non-contiguous, giving us many smaller ranges that belong to each node.
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
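A brief sketch of the vnode idea discussed above: instead of a single token, each physical node draws many random tokens, so it ends up owning many small, non-contiguous slices of the ring whose total size is still roughly its fair share. The token count (256, similar in spirit to Cassandra's num_tokens setting), the ring size, and the seeding are assumptions made for the illustration.

import random

RING_SIZE = 2 ** 64
NUM_TOKENS = 256        # assumed number of vnodes per physical node

def assign_vnode_tokens(node_names, num_tokens=NUM_TOKENS, seed=42):
    """Give every physical node many random ring positions (virtual nodes)."""
    rng = random.Random(seed)
    ring = []
    for name in node_names:
        for _ in range(num_tokens):
            ring.append((rng.randrange(RING_SIZE), name))
    ring.sort()
    return ring

def ownership_fraction(ring, node):
    """Fraction of the ring owned by `node` across all of its small ranges."""
    owned = 0
    for i, (token, owner) in enumerate(ring):
        prev_token = ring[i - 1][0] if i > 0 else ring[-1][0] - RING_SIZE
        if owner == node:
            owned += token - prev_token   # node owns (prev_token, token]
    return owned / RING_SIZE

if __name__ == "__main__":
    ring = assign_vnode_tokens(["A", "B", "C", "D"])
    for node in "ABCD":
        print(node, round(ownership_fraction(ring, node), 3))  # each close to 0.25

Because each node's ownership is split into many small slices, rebuilding a failed node can stream those slices from many different peers instead of a handful, which is the advantage described in the next slides.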

Advantages of Vnodes
Example: 30 nodes and a replication factor of 3.
A node dies completely, and we need to bring up a replacement.
The replacement has to reconstitute replicas for 3 different ranges: 1 range for which the dead node was the first natural replica, and 2 ranges it held because of the replication factor of 3.
Since our RF is 3 and we lost a node, we logically only have 2 replicas left for each of those ranges, which for 3 ranges means there are up to 6 nodes we can stream from.
With this setup (RF 3), data will be streamed from just 3 other nodes in total.
If vnodes are spread throughout the entire cluster, data transfers will be distributed across more machines.

Restoring a new disk with vnodes
Process of restoring a disk:
Validating all the data and generating a Merkle tree (this might take an hour).
Streaming, when the actual data that is needed is sent (this phase takes a few minutes).
Advantage of using vnodes: since the ranges are smaller, data will be sent to the damaged node in a more incremental fashion, instead of waiting until the end of a large validation phase; the validation phase will also be parallelized across more machines, causing it to complete faster.

The use of heterogeneous machines with vnodes
Newer nodes might be able to bear more load immediately.
You just assign a proportional number of vnodes to the machines with more capacity.
E.g., if you started your older machines with 64 vnodes per node and the new machines are twice as powerful, give them 128 vnodes each and the cluster remains balanced, even during the transition.

Gossip (Internode communications)