Cloud Computing at Google: Architecture




Cloud Computing at Google: Google File System
Web Systems and Algorithms
Chris Brooks, Department of Computer Science, University of San Francisco

Google has developed a layered system to handle web-scale applications: the Google File System (GFS), BigTable, and MapReduce. What are the primary design issues surrounding GFS?

Design Issues
- Commodity hardware: failures are the rule, not the exception.
- Huge files.
- Files tend to be written once, then either appended to or streamed; random writes are rare. What sorts of applications would have this behavior?
- Multiple clients may simultaneously write to a file.
- API and application design should happen in tandem.
- Sustained bandwidth is more important than latency.

Architecture
A single master, many chunkservers, and many clients. Files are divided into 64 MB chunks, and each chunk is redundantly stored at many chunkservers.

What is the master's role?
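The fixed chunk size makes locating data a matter of simple arithmetic. A minimal sketch (the function name is illustrative, not part of GFS) of how a file byte offset maps to a chunk index and an offset within that chunk:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses fixed 64 MB chunks

def chunk_for_offset(byte_offset):
    """Translate a file byte offset into (chunk index, offset within chunk)."""
    return byte_offset // CHUNK_SIZE, byte_offset % CHUNK_SIZE

# A read at byte 200,000,000 falls in the file's third chunk (index 2).
index, within = chunk_for_offset(200_000_000)
```

Because the mapping is deterministic, a client can compute the chunk index itself and ask the master only for that chunk's handle and replica locations.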

The master maintains metadata: the namespace, access control information, the mapping of files to chunks, and chunk locations. It refers clients to chunkservers, and it controls lease management, garbage collection, and chunk migration.

What is the chunkserver's role? To serve up chunks to clients.

Control Flow
What is the typical order of operations for a client that wants to read a file?
- The client sends a filename and offset to the master.
- The master returns the chunk handle and replica locations.
- The client chooses a replica and requests a byte range within the chunk.
- The master is not needed for further data exchange.

What are some advantages of this approach?
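The read path above can be sketched with toy in-memory stand-ins for the master and a chunkserver (all class and method names here are illustrative, not the real GFS API):

```python
CHUNK_SIZE = 64 * 1024 * 1024

class Master:
    """Toy master: maps (filename, chunk index) -> (chunk handle, replicas)."""
    def __init__(self):
        self.chunks = {}  # (filename, chunk_index) -> (handle, [server ids])

    def lookup(self, filename, offset):
        index = offset // CHUNK_SIZE
        return self.chunks[(filename, index)]

class Chunkserver:
    """Toy chunkserver: stores chunk contents keyed by chunk handle."""
    def __init__(self):
        self.store = {}  # handle -> bytes

    def read(self, handle, start, length):
        return self.store[handle][start:start + length]

def client_read(master, servers, filename, offset, length):
    # 1. Ask the master which chunk holds this offset and where replicas live.
    handle, replicas = master.lookup(filename, offset)
    # 2. Pick a replica (here: the first) and fetch the bytes directly from it;
    #    the master is out of the loop for the actual data transfer.
    return servers[replicas[0]].read(handle, offset % CHUNK_SIZE, length)
```

The key property the sketch shows: the master hands out metadata once, and all bulk data flows client-to-chunkserver.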

Advantages
- Simplicity of the master: all metadata is held in memory.
- The master has no need to handle chunk access itself.
- Failure is easy to handle; the client just requests a new chunk.

Persistence
Is a single master a potential point of failure? How can a master recover from a crash? The master keeps all data structures in memory. Each file action is logged, and the master also periodically checkpoints its state. On failure, it reloads from the checkpoint and plays back the log.

Chunk Info
How does the master know what chunks are stored at each chunkserver? The master periodically sends a heartbeat to each chunkserver, and the chunkserver responds with a list of all its stored chunks and their status. Occasionally the master may have stale information; accepting this simplifies the master and reduces overhead.

Consistency
What does consistency mean? What does defined mean?
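The checkpoint-plus-log recovery scheme can be sketched as follows. This is a minimal illustration (class and operation names are made up), and unlike a real master, the checkpoint and log here live in memory rather than on stable storage:

```python
import copy

class ToyMaster:
    """Illustrative recovery: periodic checkpoint plus a replayable op log."""
    def __init__(self):
        self.namespace = {}   # filename -> list of chunk handles
        self.log = []         # file actions since the last checkpoint
        self.checkpoint = {}  # snapshot of the namespace

    def create(self, filename):
        self.namespace[filename] = []
        self.log.append(("create", filename))  # every action is logged

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.namespace)
        self.log = []  # entries before the checkpoint are no longer needed

    def recover(self):
        # Reload the checkpoint, then replay each logged action on top of it.
        self.namespace = copy.deepcopy(self.checkpoint)
        for op, filename in self.log:
            if op == "create":
                self.namespace[filename] = []
```

Checkpointing bounds recovery time: only the actions since the last checkpoint need to be replayed.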

What does consistency mean? All clients see the same data. What does defined mean? All clients see the complete results of a mutation. If a single mutation succeeds, the affected region is consistent and defined. Concurrent writes may leave a region consistent but not defined. Appends are handled more efficiently than random writes.

Implications for Applications
What implications does this model have for an application?
- Applications should append when possible.
- Applications need to keep track of the defined regions of a file.
- Applications will need to tolerate or filter occasional duplicate records.

Leases
What is a lease? How is it used? A lease is an object that grants permission to mutate a chunk. The master grants it to one chunkserver (the primary), which then coordinates writes with the other replicas.

Writing Replicated Data
What is the order of operations for writing replicated data?

- The client finds out which replica holds the lease (the primary).
- The client sends the data to all replicas, where it is cached.
- The client sends the write request to the primary.
- The primary forwards the write request to all replicas.
- All replicas apply writes to that chunk in the same order.

What if a replica fails during this operation?

Data Flow
Data is pushed between replicas in a linear fashion (a chain). This is an interesting choice; they could have used multicast or a tree. Why is this?

Bigtable
Bigtable is implemented on top of GFS. What are the goals of Bigtable? High availability, scalability, and high performance. What does it not provide? Complex relational queries and rich datatypes.

Data Model
What is Bigtable's data model? A multidimensional map: row name, column name, and timestamp map to a data cell (an uninterpreted string).
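The data model can be illustrated as a plain dictionary keyed by (row, column, timestamp). This is a sketch only; the row and column names below are invented, and the lookup is a linear scan rather than anything Bigtable actually does:

```python
# Bigtable's map from (row, column, timestamp) to an uninterpreted string,
# modeled as a Python dict.
table = {}

def put(row, column, value, timestamp):
    table[(row, column, timestamp)] = value

def get_latest(row, column):
    """Return the cell value with the greatest timestamp for (row, column)."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    return max(versions)[1] if versions else None

put("com.cnn.www", "contents:", "<html>v1</html>", timestamp=1)
put("com.cnn.www", "contents:", "<html>v2</html>", timestamp=2)
```

Keeping multiple timestamped versions per cell is what lets Bigtable serve both "latest value" reads and historical reads from the same map.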

Rows
Rows are arranged lexicographically and broken into ranges called tablets. What is the thinking behind this?

Column Families
Column keys are grouped into column families. What is the thinking behind this?

Data Storage
GFS is used to store the data, so Bigtable can coexist with other applications. Data files are written out using the SSTable file format. Chubby is used to provide locking and synchronization.

Architecture
The components are tablet servers, clients, the master, and Chubby. What do tablet servers do? They handle interactions with clients, reading and writing data. Tablets are not replicated.

Finding a Tablet
How does a client find a tablet? The root tablet is accessed via Chubby; it contains a map of tablets to tablet servers. This information is then cached by the client, and the client communicates directly with the tablet server.

What is the role of the master? Keeping track of tablet servers and placing unassigned tablets.

Discussion
How can the master tell that a tablet server has died? When a tablet server starts, it creates a lock in Chubby. The master queries the server for the status of its lock. If the server does not reply, the master attempts to acquire the lock itself; if successful, it redistributes that server's tablets.

How does Bigtable's architecture compare to GFS? What advantages does this structure have? How does it compare to architectures such as CAN or Chord that you might have learned about in 682?
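Because rows are kept in lexicographic order, finding the tablet responsible for a row key is a binary search over tablet end keys. A minimal sketch, assuming each tablet covers row keys up to and including its end key (the end keys and server names are made up):

```python
import bisect

# Tablets are defined by their end keys, kept in sorted (lexicographic) order.
tablet_end_keys = ["apple", "mango", "zzzzz"]
tablet_servers = ["ts-1", "ts-2", "ts-3"]  # server holding each tablet

def locate(row_key):
    """Return the server whose tablet's key range contains row_key."""
    i = bisect.bisect_left(tablet_end_keys, row_key)
    return tablet_servers[i]
```

This is why lexicographic tablet ranges matter: clients can cache the (small) list of end keys and route every request with a local binary search, and scans over adjacent keys tend to hit the same tablet server.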

MapReduce
What is the basic paradigm of MapReduce?
- Define a map operation that is applied to each record of the input to generate key/value pairs.
- Define a reduce operation that is applied to all values with the same key to aggregate results.

Example
The classic example, counting words:

    def map(document, contents):
        for word in contents.split():
            yield word, 1

    def reduce(key, counts):
        yield key, sum(counts)

Parallelizing
Structuring a problem in this way allows the map function to run simultaneously on many different machines, each on a subset of the data. Reduce can then run in parallel for each key.

Implementation
- Input data is split into a number of sets, and the keyspace is subdivided.
- A master assigns tasks to workers.
- Each map task is performed independently; its results are buffered, and their locations are returned to the master.
- The master forwards the mapped locations to reduce workers.
- Reduce workers collect all data associated with their keys, perform the reduce, and write the results to a file.

Failure
How is worker failure handled?
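The implementation steps above can be sketched end to end with a sequential stand-in for the distributed pipeline. The grouping dictionary plays the role of the shuffle that connects the two phases (function names are illustrative):

```python
from collections import defaultdict

def map_fn(document, contents):
    for word in contents.split():
        yield word, 1

def reduce_fn(key, counts):
    yield key, sum(counts)

def run_mapreduce(documents):
    """Sequential stand-in for the pipeline: map, shuffle, reduce."""
    # Map phase: every record produces intermediate key/value pairs.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for key, value in map_fn(name, contents):
            intermediate[key].append(value)  # the "shuffle": group by key
    # Reduce phase: aggregate each key's values independently.
    result = {}
    for key, values in intermediate.items():
        for k, v in reduce_fn(key, values):
            result[k] = v
    return result
```

In the real system the map calls run on different machines and the shuffle moves data over the network, but the data flow is exactly this.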

How is worker failure handled? Workers are pinged periodically. Active tasks belonging to non-responsive workers are reassigned. Completed map tasks must also be redone, since their output lives on the failed worker's local disk.

How is master failure handled? Checkpointing and restarting.

Refinements
The authors describe a number of refinements to MapReduce. What are they, and why are they useful?
- User-defined partitioning
- User-defined combining
- Specialized readers
- Skipping bad records

MR vs. DBMS
Stonebraker et al. identify the sorts of tasks that MapReduce (Hadoop) excels at, and those at which RDBMSs excel. MapReduce suits:
- Extract-transform-load (ETL) workloads
- Complex analytics that require multiple passes
- Semi-structured data (key-value pairs)
- Quick-and-dirty problems
- Limited budgets
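Of the refinements listed, user-defined combining is easy to illustrate: a reduce-like aggregation is run on the map side before the shuffle, so far less intermediate data crosses the network. A sketch for word count (the function name is invented):

```python
from collections import Counter

def map_with_combiner(contents):
    """Word-count map task with a local combiner: instead of emitting
    (word, 1) once per occurrence, pre-aggregate counts on the map side."""
    return list(Counter(contents.split()).items())
```

A combiner is only safe when the reduce operation is associative and commutative, as summation is.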

Parallel DBMSs suit:
- Grep
- Log mining with group-by
- Joins (e.g., combining a table of user visits to URLs with a PageRank table)

Stonebraker et al. suggest some reasons why a DBMS might do better even on tasks that seem to be in MapReduce's area of expertise:
- MapReduce repeatedly parses records, whereas a DBMS uses tuned compression.
- In a DBMS, intermediate data is streamed rather than written to disk.
- Scheduling: DBMSs construct a query plan.

Takeaway
- Hadoop could incorporate streaming and more job-aware scheduling.
- SQL is arguably easier to write than MapReduce code.
- DBMSs need to be more plug-and-play, and should work with filesystem data.