What Is Datacenter (Warehouse) Computing. Distributed and Parallel Technology. Datacenter Computing Application Programming




Distributed and Parallel Technology
Datacenter and Warehouse Computing

Hans-Wolfgang Loidl
School of Mathematical and Computer Sciences
Heriot-Watt University, Edinburgh
Based on earlier versions by Greg Michaelson and Patrick Maier

What Is Datacenter (Warehouse) Computing?

A datacenter is a server farm: a room/floor/warehouse full of servers.

Datacenter requirements:
- Massive storage and compute power
- High availability: 24/7 operation, no downtimes acceptable
- High security against intrusion (both physical and electronic)
- Homogeneous hardware and software architecture
- Recently: low power consumption

Who uses datacenters? Every big name on the internet: Google, Amazon, Facebook, ... and lots of other companies, to provide services over the internet and/or to manage their business processes.

Datacenter Computing: Architecture

[Figure: datacenter architecture: ISPs, replicated web servers, firewall, application servers.]

Architecture stack:
- Replicated web servers, connected to multiple ISPs
- Firewall (often also replicated)
- 1000s of application servers, each with a large disk
- App servers run the application software
- Data is stored on the app servers (via a distributed FS/DB); there are no dedicated file servers or database servers

Datacenter Computing: Application Programming

Problems:
- Distributed resources: data is often distributed over 1000s of machines
- Performance: scalability of applications is paramount
- Availability: apps must cope with compute/network component downtimes, and must re-configure dynamically when the cluster is upgraded

Solution: a distributed execution engine
- Offers a skeleton-based programming model
- Hides most performance issues from the programmer: automatic parallelisation, controllable by a few parameters
- Hides all distribution and availability issues from the programmer

Some particular engines:
- MapReduce (Google; Apache Hadoop)
- Dryad (Microsoft)
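As a concrete preview of "the programmer writes only the map and reduce functions", here is a minimal word-count sketch in Haskell. The mapReduce signature in the comment is hypothetical (an assumption for illustration; Hadoop's real API is Java); only the two user-supplied functions are shown.

-- Hypothetical engine interface, roughly:
--   mapReduce :: (k1 -> v1 -> [(k2,v2)])   -- user's map function
--             -> (k2 -> [v2] -> [v2])      -- user's reduce function
--             -> Job
-- For word counting the user writes only these two functions; grouping by key,
-- distribution over 1000s of nodes, fault tolerance and load balancing are the
-- engine's job.

wcMap :: () -> String -> [(String, Int)]
wcMap _ line = [ (w, 1) | w <- words line ]

wcReduce :: String -> [Int] -> [Int]
wcReduce _word counts = [sum counts]

main :: IO ()
main = print (wcMap () "to be or not to be")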

MapReduce: Overview

Observation:
- Many applications are conceptually simple (e.g. trivially data-parallel).
- Reliably distributing even simple apps over 1000s of nodes is a nightmare.
- Skeletons could help!

MapReduce programming model (a MapReduce skeleton):
- The programmer writes only the map and reduce functions.
- The runtime system takes care of parallel execution and fault tolerance.

Implementation requires:
- A distributed file system (Google: GFS, Hadoop: HDFS)
- For DB apps: a distributed data base (Google: BigTable, Hadoop: HBase)

MapReduce For Functional Programmers

What functional programmers think when they hear "map/reduce":

-- map followed by reduce (= fold of associative m with identity e)
fp_map_reduce :: (a -> b) -> (b -> b -> b) -> b
              -> [a] -> b
fp_map_reduce f m e = foldr m e . map f

-- map followed by group followed by groupwise reduce
fp_map_group_reduce :: (a -> b) -> ([b] -> [[b]]) -> ([b] -> b)
                    -> [a] -> [b]
fp_map_group_reduce f g r = map r . g . map f

-- list-valued map, then group, then groupwise list-valued reduce
fp_map_group_reduce' :: (a -> [b]) -> ([b] -> [[b]]) -> ([b] -> [b])
                     -> [a] -> [[b]]
fp_map_group_reduce' f g r = map r . g . (concat . map f)

MapReduce For Functional Programmers: Google's MapReduce (sequential semantics)

-- list-valued map, then a fixed group stage, then list-valued reduce
map_reduce :: Ord c => ((a,b) -> [(c,d)]) -> (c -> [d] -> [d])
           -> [(a,b)] -> [(c,[d])]
map_reduce f r = map (\ (k,vs) -> (k, r k vs)) .
                 (group . sort) . (concat . map f)
  where
    sort :: Ord c => [(c,d)] -> [(c,d)]
    sort = sortBy (\ (k1,_) (k2,_) -> compare k1 k2)
    group :: Eq c => [(c,d)] -> [(c,[d])]
    group = map (\ ((k,v):kvs) -> (k, v : map snd kvs)) .
            groupBy (\ (k1,_) (k2,_) -> k1 == k2)

- Specialised for processing key/value pairs.
- Groups by keys.
- Reduction may depend on the key and the values.
- Not restricted to lists; applicable to any container data type.
- Reduction should be associative and commutative in its 2nd argument.

MapReduce Applications: TotientRange

-- Euler phi function
euler :: Int -> Int
euler n = length (filter (relprime n) [1 .. n-1])
  where relprime x y = hcf x y == 1
        hcf x 0 = x
        hcf x y = hcf y (rem x y)

-- Summing the phi function over the interval [lower .. upper]
sumTotient :: Int -> Int -> Int
sumTotient lower upper = head (snd (head (map_reduce f r input)))
  where
    input :: [((),Int)]
    input = zip (repeat ()) [lower, lower+1 .. upper]
    f :: ((),Int) -> [((),Int)]
    f (k,v) = [(k, euler v)]
    r :: () -> [Int] -> [Int]
    r _ vs = [sum vs]   -- reduction assoc+comm in 2nd argument

- Degenerate example: only a single key.
- Still exhibits useful parallelism, but would not perform well on Google's implementation.
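A minimal, self-contained sketch of the TotientRange example: it repeats the slide's definitions so the file compiles stand-alone, adds the missing import, and traces the sequential semantics on the tiny input [3..5]. The intermediate values in the comments are worked out by hand, not taken from the slides.

import Data.List (sortBy, groupBy)

map_reduce :: Ord c => ((a,b) -> [(c,d)]) -> (c -> [d] -> [d])
           -> [(a,b)] -> [(c,[d])]
map_reduce f r = map (\ (k,vs) -> (k, r k vs)) .
                 (group . sort) . (concat . map f)
  where
    sort  = sortBy  (\ (k1,_) (k2,_) -> compare k1 k2)
    group = map (\ ((k,v):kvs) -> (k, v : map snd kvs)) .
            groupBy (\ (k1,_) (k2,_) -> k1 == k2)

euler :: Int -> Int
euler n = length (filter (relprime n) [1 .. n-1])
  where relprime x y = hcf x y == 1
        hcf x 0 = x
        hcf x y = hcf y (rem x y)

sumTotient :: Int -> Int -> Int
sumTotient lower upper = head (snd (head (map_reduce f r input)))
  where input    = zip (repeat ()) [lower, lower+1 .. upper]
        f (k,v)  = [(k, euler v)]
        r _ vs   = [sum vs]

-- Stage by stage, map_reduce f r [((),3),((),4),((),5)] evaluates as
--   concat . map f    ==> [((),2),((),2),((),4)]   -- euler 3, 4, 5
--   group . sort      ==> [((),[2,2,4])]           -- single key ()
--   groupwise reduce  ==> [((),[8])]
main :: IO ()
main = print (sumTotient 3 5)   -- prints 8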

MapReduce Applications: URL Count

isURL :: String -> Bool
isURL word = "http://" `isPrefixOf` word

-- input:  lines of a log file
-- output: frequency of URLs in the input
countURL :: [String] -> [(String,[Int])]
countURL lines = map_reduce f r input
  where
    input :: [((),String)]
    input = zip (repeat ()) lines
    f :: ((),String) -> [(String,Int)]
    f (_,line) = zip (filter isURL (words line)) (repeat 1)
    r :: String -> [Int] -> [Int]
    r url ones = [length ones]

- Map breaks a line into words, filters the words that are URLs, and zips the URLs (which become keys) with the value 1.
- Group groups the URLs with their values (which all = 1).
- Reduction counts the number of values.

MapReduce: How To Parallelise

The sequential code

map_reduce f r = map (\ (k,vs) -> (k, r k vs)) .
                 (group . sort) . (concat . map f)

suggests a 3-stage pipeline:
1. map stage: data-parallel task farm
2. parallel sorting and grouping: parallel mergesort
3. groupwise reduce stage: data-parallel task farm

Note: this is not how Google do it. (A sketch of this pipeline follows after the execution overview below.)

Google MapReduce: Execution Overview

[Figure: execution overview, after Dean and Ghemawat. The user program forks a master, map workers and reduce workers; the master assigns map/reduce tasks; map workers read the input splits and write intermediate files to local disks; reduce workers read those files remotely (5) and write the output files (6).]

Execution steps:
1. The user program forks the master, M map workers and R reduce workers.
2. The master assigns map/reduce tasks to map/reduce workers. (A map task = a 16-64 MB chunk of input; a reduce task = a range of keys plus the names of the M intermediate files.)
3. A map worker reads its input from GFS and processes it.
4. The map worker writes its output to local disk; the output is partitioned into R files (grouped by key).
5. A reduce worker gathers files from the map workers and reduces them: merge the M intermediate files together, grouping by key, then reduce the values groupwise.
6. The reduce worker writes its output to GFS.
7. The master returns control to the user program after all tasks have completed.

J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 51(1):107-113, 2008.
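The 3-stage pipeline above can be sketched with GpH evaluation strategies from the parallel package. This is only an illustration of the pipeline structure under stated assumptions, not Google's or Hadoop's implementation; in particular the sorting/grouping stage is left sequential here rather than using a parallel mergesort.

import Control.Parallel.Strategies (parListChunk, rdeepseq, withStrategy)
import Control.DeepSeq (NFData)
import Data.List (sortBy, groupBy)

par_map_reduce :: (Ord c, NFData c, NFData d)
               => Int                          -- chunk size of the task farms
               -> ((a,b) -> [(c,d)])           -- map function
               -> (c -> [d] -> [d])            -- reduce function
               -> [(a,b)] -> [(c,[d])]
par_map_reduce chunk f r =
      withStrategy (parListChunk chunk rdeepseq)   -- task farm: groupwise reduce
    . map (\ (k,vs) -> (k, r k vs))
    . group . sort                                 -- sequential here (slide: parallel mergesort)
    . withStrategy (parListChunk chunk rdeepseq)   -- task farm: map stage
    . concat . map f
  where
    sort  = sortBy  (\ (k1,_) (k2,_) -> compare k1 k2)
    group = map (\ ((k,v):kvs) -> (k, v : map snd kvs)) .
            groupBy (\ (k1,_) (k2,_) -> k1 == k2)

-- Compile with  ghc -O2 -threaded  and run with  +RTS -N  to use several cores,
-- e.g.  par_map_reduce 100 f r input  with the TotientRange f and r above.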

Main Selling Points of MapReduce

- Easy to use for non-experts in parallel programming (the details are hidden in the MapReduce implementation).
- Fault tolerance is integrated in the implementation.
- Good modularity: many problems can be implemented as sequences of MapReduce jobs.
- Flexibility: many problems are instances of MapReduce.
- Good scalability: using 1000s of machines at the moment.
- Tuned for large data volumes: several TB of data.
- Highly tuned parallel implementation to achieve e.g. good load balance.

Similar Datacenter Frameworks

Apache Hadoop:
- MapReduce programming model
- Open-source Java implementation
- http://hadoop.apache.org/

Dryad (Microsoft):
- More general skeleton: the programmer specifies a directed acyclic dataflow graph (vertex = sequential program, edge = unidirectional communication channel).
- Generalises the Unix paradigm of piping apps together.
- High-level languages building on Dryad: the Nebula scripting language, and DryadLINQ (a database query compiler).
- Closed-source C++ implementation.
- M. Isard et al. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. EuroSys 2007.

How MapReduce/Hadoop Work

Components:
- A distributed file system.
  We'll have a look at HDFS (Apache Hadoop):
  D. Borthakur. HDFS Architecture. http://hadoop.apache.org/common/docs/current/hdfs_design.html
  GFS (Google) is quite similar (but pre-dates MapReduce):
  S. Ghemawat, H. Gobioff, S. Leung. The Google File System. SOSP 2003.
- A fault-tolerant task scheduling system, which can take advantage of the distributed file system.

Hadoop HDFS: Architecture

Design goals:
- Fault tolerance: HDFS runs on large clusters, where hardware failures are common.
- Large files: up to terabytes per file.
- Performance: exploiting network locality where possible.
- Portability across heterogeneous hardware and software platforms.

Some design choices:
- Emphasis on high throughput rather than low latency.
- Optimise for streaming access rather than random access.
- Simple coherency model: one streaming writer / many readers; lock-free reading and writing.
- Processing facilities near the data: moving computations is cheaper than moving large data sets.

Hadoop HDFS: Datacenter Network Topology

[Figure: network topology: nodes grouped into racks, racks grouped into a cluster.]

- A cluster consists of multiple racks.
- Inter-rack traffic needs to cross more switches than intra-rack traffic:
  inter-rack latency > intra-rack latency;
  aggregate intra-rack bandwidth >> inter-rack bandwidth.
- HDFS must optimise for parallel intra-rack access where possible.

Hadoop HDFS: Logical Architecture

[Figure: HDFS logical architecture (client/server, with a distributed server).]

Client/server architecture (with a distributed server). The HDFS server implements a master/slave pattern:
- One master: the NameNode (stores HDFS metadata on its local FS).
- Many slaves: the DataNodes (store HDFS data on their local FS).
- Client applications may run on the DataNodes (rather than on separate machines).

Hadoop HDFS: File Organisation

Metadata (on the NameNode):
- Hierarchical namespace (file/directory tree).
- Limited access control; file system access control is not important if clients are trusted.

Data (on the DataNodes):
- Files are stored in large blocks spread over the whole cluster.
- Block size up to 64 MB (configurable per file).
- Each block is stored as a file on a DataNode's local FS.
- Each block is replicated n times; the replication factor n is configurable per file (default n = 3).
- Replicas are spread over the cluster to achieve high availability and to increase performance.

Hadoop HDFS: Replication Strategies

The replication strategy is crucial for both objectives. Two simple replication strategies:

1. Spread all replicas evenly over the racks of the cluster.
   + Optimises availability.
   - Performance may suffer if clients have to fall back on off-rack replicas.

2. Retain one replica on the same rack as the primary replica, spread the remaining replicas over the other racks.
   + Optimises performance (even if the primary replica is unavailable).
   - Availability almost as high as with strategy 1: single node failures are more common than full rack failures.

Many more strategies are possible. Finding good (high availability + high performance) strategies is a hot research topic in distributed file systems and databases.
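An illustrative sketch of replication strategy 2, assuming a simple node/rack naming scheme; this is not HDFS's actual block placement code. It keeps the primary replica plus at most one further replica on the primary's rack, then places one replica per distinct remote rack until the replication factor is reached.

import Data.List (nub)

type Node = String
type Rack = String

placeReplicas :: Int               -- replication factor n (default 3)
              -> (Node -> Rack)    -- rack topology
              -> Node              -- node holding the primary replica
              -> [Node]            -- all candidate nodes
              -> [Node]
placeReplicas n rackOf primary nodes =
    take n (primary : take 1 sameRack ++ offRack)
  where
    -- further candidates on the primary's rack
    sameRack = [ d | d <- nodes, d /= primary, rackOf d == rackOf primary ]
    -- one candidate per distinct remote rack
    offRack  = [ head [ d | d <- remote, rackOf d == r ]
               | r <- nub (map rackOf remote) ]
    remote   = [ d | d <- nodes, rackOf d /= rackOf primary ]

-- e.g.  placeReplicas 3 (take 5) "rack1-node2"
--                     ["rack1-node1","rack1-node2","rack2-node1","rack3-node1"]
--       ==> ["rack1-node2","rack1-node1","rack2-node1"]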

Hadoop HDFS: Writing to a File

[Figure: client, NameNode and DataNodes holding replicas, with numbered arrows matching the steps below.]

1. Request to write to block i of file F.
2. The NameNode returns the block handle and the DataNode locations.
3. Data transfer/forwarding: the client sends the data to the nearest DataNode; the replicas form a forwarding pipeline.
4. Initiate the write on the primary replica.
5. Initiate replication.
6. Commit on the secondary replicas.
7. Commit the write.

Commit only if writing succeeded on all replicas; otherwise return an error message to the client.
Note: minimal interaction with the NameNode (to avoid a bottleneck).

Hadoop HDFS: Reading from a File

1. Request to read from block i of file F.
2. The NameNode returns the block handle and the DataNode locations. The client may cache handle and locations, but only for a short time: location data becomes invalid quickly.
3. Request the data: the client requests the data from the nearest DataNode.
4. Transfer the data.

Note: minimal interaction with the NameNode.

Hadoop HDFS: Reading from a Dead Node

1. Request to read from block i of file F.
2. The NameNode returns the block handle and the DataNode locations.
3. Request the data: the client requests the data from the nearest DataNode.
4. The request times out because the network is down, or the DataNode is dead or overloaded.
5. Re-request the data: the client requests the data from another DataNode.
6. Transfer the data.

Note: no additional interaction with the NameNode; the NameNode will detect and deal with the failure independently.

Hadoop HDFS: Reading from a Corrupt File

1. Request to read from block i of file F.
2. The NameNode returns the block handle and the DataNode locations.
3. Request the data from the nearest DataNode.
4. Corruption is signalled: both the client and the NameNode are informed.
5. Re-request the data: the client requests the data from another DataNode.
6. Transfer the data.

Note: no additional client/NameNode interaction; the NameNode will deal with the failure independently.
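The client-side fallback logic of the three read scenarios can be modelled by a small pure function. This is a sketch with hypothetical types, not the real HDFS client library: the replica locations come from a single NameNode lookup, and on a timeout or a corrupt block the client simply moves on to the next replica without contacting the NameNode again.

type BlockData = String

data ReadResult = Ok BlockData | TimedOut | Corrupt
  deriving Show

readBlock :: (node -> ReadResult)      -- one read attempt against one DataNode
          -> [node]                    -- replica locations, nearest first
          -> Either String BlockData
readBlock _       []     = Left "all replicas failed"
readBlock attempt (d:ds) =
  case attempt d of
    Ok bytes -> Right bytes
    TimedOut -> readBlock attempt ds   -- DataNode dead/overloaded: try next replica
    Corrupt  -> readBlock attempt ds   -- corruption signalled: try next replica

-- e.g.  readBlock (\d -> if d == "dn1" then TimedOut else Ok "bytes") ["dn1","dn2"]
--       ==> Right "bytes"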

Hadoop HDFS: Fault Tolerance

Fast recovery from NameNode failure:
- A snapshot of the NameNode's state is regularly dumped to disk and replicated off-rack.
- State updates between snapshots are journaled.
- Note: no automatic NameNode failure detection.

Recovery from DataNode failure or network partition:
- Failure detection: the NameNode listens to the DataNodes' heartbeats.
- Fast recovery: clients time out and fall back on other replicas.
- The NameNode initiates re-replication of the blocks residing on dead nodes.

Recovery from data corruption:
- Failure detection: via checksum comparison.
- Fast recovery: clients fall back on other replicas.
- The NameNode is notified and initiates re-replication of the corrupt block.

How MapReduce Piggybacks on HDFS

[Figure: the MapReduce execution overview again (user program, master, map workers reading input splits and writing intermediate files to local disks, reduce workers reading those files remotely and writing the output files), now mapped onto HDFS nodes.]

Fault tolerance:
- MapReduce heartbeat monitoring is done by HDFS.
- DataNodes run Map or Reduce workers as client applications; the NameNode runs the MapReduce master as a client application.
- If a node is dead, re-replicate its data and re-execute its Map or Reduce tasks somewhere else.

Performance:
- Select Map workers optimising for local reads: prefer nodes with blocks of the input file on the local disk; failing that, prefer nodes with blocks of the input file on the same rack.
- Have Reduce workers write the primary replica to the same node; the replication strategy distributes the file evenly over the other racks.
- Select Reduce workers optimising access to all Map workers: prefer grouping Reduce workers on the same rack as the Map workers.
(A small locality-ranking sketch follows after the reading list.)

Further Reading

- Tom White. Hadoop: The Definitive Guide. O'Reilly Media, Third edition, May 2012.
- Hadoop documentation: http://hadoop.apache.org/
  See also the links on Hadoop-level scripting languages such as Pig and Hive.
- D. Borthakur. HDFS Architecture. http://hadoop.apache.org/common/docs/current/hdfs_design.html
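To make the locality preferences on the piggybacking slide concrete, here is a small illustrative ranking function with hypothetical names; it is not Hadoop's scheduler API. Node-local placements are preferred over rack-local ones, which are preferred over remote ones.

import Data.List (minimumBy)
import Data.Ord (comparing)

data Locality = NodeLocal | RackLocal | Remote
  deriving (Show, Eq, Ord)             -- derived Ord: NodeLocal < RackLocal < Remote

locality :: (Eq node, Eq rack)
         => (node -> rack)             -- rack topology
         -> [node]                     -- nodes holding replicas of the input block
         -> node                       -- candidate worker node
         -> Locality
locality rackOf blockNodes worker
  | worker `elem` blockNodes                   = NodeLocal
  | rackOf worker `elem` map rackOf blockNodes = RackLocal
  | otherwise                                  = Remote

-- Pick the best worker for a map task over a given block.
bestWorker :: (Eq node, Eq rack)
           => (node -> rack) -> [node] -> [node] -> node
bestWorker rackOf blockNodes =
  minimumBy (comparing (locality rackOf blockNodes))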