Cloud Programming: From Doom and Gloom to BOOM and Bloom



Similar documents
Dedalus: Datalog in Time and Space

Apache Hadoop. Alexandru Costan

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Snapshots in Hadoop Distributed File System

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

Design and Evolution of the Apache Hadoop File System(HDFS)

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Hadoop Parallel Data Processing

Hadoop Architecture. Part 1

Hadoop Distributed File System. Jordan Prosch, Matt Kipps

In Memory Accelerator for MongoDB

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. Big Data Management and Analytics

Hadoop Distributed File System. T Seminar On Multimedia Eero Kurkela

Distributed File Systems

HDFS Under the Hood. Sanjay Radia. Grid Computing, Hadoop Yahoo Inc.

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Hadoop Distributed File System (HDFS) Overview

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

BookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011

Benchmark Study on Distributed XML Filtering Using Hadoop Distribution Environment. Sanjay Kulhari, Jian Wen UC Riverside

Amr El Abbadi. Computer Science, UC Santa Barbara

Storage Architectures for Big Data in the Cloud

MASSIVE DATA PROCESSING (THE GOOGLE WAY ) 27/04/2015. Fundamentals of Distributed Systems. Inside Google circa 2015

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

MapReduce Online. Tyson Condie, Neil Conway, Peter Alvaro, Joseph Hellerstein, Khaled Elmeleegy, Russell Sears. Neeraj Ganapathy

Large scale processing using Hadoop. Ján Vaňo

Hadoop and Map-Reduce. Swati Gore

Sector vs. Hadoop. A Brief Comparison Between the Two Systems

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee June 3 rd, 2008

The Inside Scoop on Hadoop

Log Mining Based on Hadoop s Map and Reduce Technique

Survey on Scheduling Algorithm in MapReduce Framework

Distributed Storage Systems

Distributed Filesystems

How To Use Hadoop

Hadoop & its Usage at Facebook

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems

Introduction to HDFS. Prasanth Kothuri, CERN

DRAFT. 1 Proposed System. 1.1 Abstract

Can the Elephants Handle the NoSQL Onslaught?

Distributed Metadata Management Scheme in HDFS

Deploying Hadoop with Manager

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

HDFS. Hadoop Distributed File System

THE HADOOP DISTRIBUTED FILE SYSTEM

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Introduction to Cloud Computing

Apache Hadoop FileSystem and its Usage in Facebook

Fault Tolerance in Hadoop for Work Migration

Programming Without a Call Stack: Event-driven Architectures

HDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1

Data Management in the Cloud

BlobSeer: Towards efficient data storage management on large-scale, distributed systems

Using Data Mining and Machine Learning in Retail

Agenda. Some Examples from Yahoo! Hadoop. Some Examples from Yahoo! Crawling. Cloud (data) management Ahmed Ali-Eldin. First part: Second part:

Big Data With Hadoop

NoSQL Data Base Basics

IMPROVED FAIR SCHEDULING ALGORITHM FOR TASKTRACKER IN HADOOP MAP-REDUCE

Enhancing Map-Reduce Job Execution On Geo-Distributed Data Across Datacenters

Yuji Shirasaki (JVO NAOJ)

Windows Azure Storage Scaling Cloud Storage Andrew Edwards Microsoft

HDFS Users Guide. Table of contents

Hypertable Architecture Overview

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.

Hadoop Distributed File System. Dhruba Borthakur June, 2007

Big Data Technology Core Hadoop: HDFS-YARN Internals

HYBRID CLOUD SUPPORT FOR LARGE SCALE ANALYTICS AND WEB PROCESSING. Navraj Chohan, Anand Gupta, Chris Bunch, Kowshik Prakasam, and Chandra Krintz

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

5 HDFS - Hadoop Distributed System

An Oracle White Paper November Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

CSE-E5430 Scalable Cloud Computing Lecture 2

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Apache Hadoop FileSystem Internals

Hadoop 101. Lars George. NoSQL- Ma4ers, Cologne April 26, 2013

NoSQL and Hadoop Technologies On Oracle Cloud

Introduction to HDFS. Prasanth Kothuri, CERN

Performance Evaluation for BlobSeer and Hadoop using Machine Learning Algorithms

Hadoop IST 734 SS CHUNG

NoSQL for SQL Professionals William McKnight

Google File System. Web and scalability

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

Special Relativity and the Problem of Database Scalability

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Distributed File Systems

Hadoop Ecosystem B Y R A H I M A.

Chapter 7. Using Hadoop Cluster and MapReduce

Transcription:

Cloud Programming: From Doom and Gloom to BOOM and Bloom Neil Conway UC Berkeley Joint work with Peter Alvaro, Ras Bodik, Tyson Condie, Joseph M. Hellerstein, David Maier (PSU), William R. Marczak, and Russell Sears (Yahoo! Research) Datalog 2.0 Workshop

Cloud CompuQng: The Next Great CompuQng PlaSorm

The Problem WriQng reliable, scalable distributed souware remains extremely difficult.

Doom and Gloom!...when we start talking about parallelism and ease of use of truly parallel computers, we re talking about a problem that s as hard as any that computer science has faced... I would be panicked if I were in industry. - - John Hennessey, Stanford

A Ray of Light We understand data- parallel compuqng MapReduce, parallel DBs, etc. Can we take a hard problem and transform it into an easy one?

Everything is Data Distributed compuqng is all about state System state Session state Protocol state User and security- related state... and of course, the actual data CompuQng = CreaQng, updaqng, and communicaqng that state

Datalog to the Rescue! 1. Data- centric programming Explicit, uniform state representaqon: relaqons 2. High- level declaraqve queries Datalog + asynchrony + state update

Outline 1. The BOOM Project Cloud CompuQng stack built w/ distributed logic BOOM AnalyQcs: MapReduce and DFS in Overlog 2. Dedalus: Datalog in Time (and Space) 3. (Toward) The Bloom Language Distributed Logic for Joe the Programmer

The BOOM Project Berkeley Orders Of Magnitude OOM more scale, OOM less code Can we build Google in 10k LOC? Build Real Systems in distributed logic Begin w/ an exisqng variant of Datalog ( Overlog ) Inform the design of a new language for distributed compuqng (Bloom)

BOOM AnalyQcs Typical Big Data stack: MapReduce (Hadoop) + distributed file system (HDFS) We tried two approaches: HDFS: clean- slate rewrite Hadoop: replace job scheduling logic w/ Overlog Goals 1. Replicate exisqng funcqonality 2. Add Hard Stuff (and make it look easy!)

Overlog: Distributed Datalog Originally designed for rouqng protocols and overlay networks (Loo et al., SIGMOD 06) RouQng = recursive query over distributed DB Datalog w/ aggregaqon, negaqon, funcqons DistribuQon = horizontal parqqoning of tables Data placement induces communicaqon

Yet Another TransiQve Closure link(x, Y, C); path(x, Y, C) :- link(x, Y, C); path(x, Z, C1 + C2) :- link(x, Y, C1), path(y, Z, C2); mincost(x, Z, min<c>) :- path(x, Z, C);

Overlog Example link(@x, Y, C); path(@x, Y, C) :- link(@x, Y, C); path(@x, Z, C1 + C2) :- link(@x, Y, C1), path(@y, Z, C2); mincost(@x, Z, min<c>) :- path(@x, Z, C);

Overlog Timestep Model

Hadoop Distributed File System Based on the Google File System (SOSP 03) Large files, sequenqal workloads, append- only Used by Yahoo!, Facebook, etc. Chunks 3x replicated at data nodes for fault tolerance Data Protocol Client Data Node Metadata Protocol Heartbeat Protocol Data Node Master Node Control Protocol Data Node

BOOM- FS Hybrid system Client Complex logic: Overlog Metadata Protocol Overlog Performance- criqcal (but simple!): Java Data Protocol Heartbeat Protocol Master Node Control Protocol Clean separaqon between policy and mechanism Data Node Data Node Data Node

BOOM- FS Example: State Represent file system metadata with relaqons. Name Descrip9on A<ributes file Files fileid, parentid, name, isdir fqpath Fully- qualified path names fileid, path fchunk Chunks per file chunkid, fileid datanode DataNode heartbeats nodeaddr, lasthbtime hb_chunk Chunk heartbeats nodeaddr, chunkid, length

BOOM- FS Example: Query Represent file system metadata with relaqons. // Base case: root directory has null parent fqpath(fileid, Path) :- file(fileid, FParentId, FName, IsDir), IsDir = true, FParentId = null, Path = "/"; fqpath(fileid, Path) :- file(fileid, FParentId, FName, _), fqpath(fparentid, ParentPath), // Do not add extra slash if parent is root dir PathSep = (ParentPath = "/"? "" : "/"), Path = ParentPath + PathSep + FName;

BOOM- FS Example: Query Distributed protocols: join between event stream and local database // "ls" for extant path => return liscng for path response(@source, RequestId, true, DirLisQng) :- request(@master, RequestId, Source, "Ls", Path), fqpath(@master, FileId, Path), directory_lisqng(@master, FileId, DirLisQng); // "ls" for nonexistent path => error response(@source, RequestId, false, null) :- request(@master, RequestId, Source, "Ls", Path), noqn fqpath(@master, _, Path);

Comparison with Hadoop CompeQQve performance (~20%) Lines of Java Lines of Overlog HDFS ~21,700 0 BOOM- FS 1,431 469 New Features: 1. Hot Standby for FS master nodes using Paxos 2. ParQQoned FS master nodes for scalability ~1 day! 3. Monitoring, tracing, and invariant checking

Lessons from BOOM AnalyQcs Overall, Overlog was a good fit for the task Concise programs for real features, easy evoluqon Data- centric design: language- independent ReplicaQon, parqqoning, monitoring all involve data management Node- local invariants are cross- cuwng queries SpecificaQon is enforcement Policy vs. mechanism Datalog vs. Java

Challenges from BOOM AnalyQcs Poor perf, crypqc syntax, lixle/no tool support Easy to fix! Many bugs related to updaqng state Ambiguous semanqcs (in Overlog) We avoided distributed queries The global database is a lie! Hand- coding protocols vs. staqng distributed invariants

Outline 1. The BOOM Project Cloud CompuQng stack built w/ distributed logic BOOM AnalyQcs: MapReduce and DFS in Overlog 2. Dedalus: Datalog in Time (and Space) 3. (Toward) The Bloom Language Distributed Logic for Joe the Programmer

Dedalus Dedalus: a theoreqcal foundaqon for Bloom In Overlog, the Hard Stuff happens between Qme steps State update Asynchronous messaging Can we talk about the Hard Stuff with logic?

State Update Updates in Overlog: ugly, outside of logic Difficult to express common paxerns Queues, sequencing Order doesn t maxer... except when it does! counter(@a, Val + 1) :- counter(@a, Val), event(@a, _);

Asynchronous Messaging Overlog @ notaqon describes space Logical interpretaqon unclear: p(@a, B) :- q(@b, A); Upon reflecqon, "me is more fundamental Model failure with arbitrary delay Discard illusion of global DB

Dedalus: Datalog in Time (1) Deduc9ve rule: (Pure Datalog) p(a, B) :- q(a, B); (2) Induc9ve rule: (Constraint across next Qmestep) p(a, B)@next :- q(a, B); (3) Async rule: (Constraint across arbitrary Qmesteps) p(a, B)@async:- q(a, B);

Dedalus: Datalog in Time (1) Deduc9ve rule: (Pure Datalog) All terms in body have same Qme p(a, B, S) :- q(a, B, T), T = S; (2) Induc9ve rule: (Constraint across next Qmestep) p(a, B, S) :- q(a, B, T), successor(t, S); (3) Async rule: (Constraint across arbitrary Qmesteps) p(a, B, S) :- q(a, B, T), Qme(S), choose((a, B, T), (S));

State Update in Dedalus p (A, B)@next :- p (A, B), noqn p_neg(a, B); p (1, 2)@101; p (1, 3)@102; p_neg(1, 2)@300; Time p(1, 2) p(1, 3) p_neg(1, 2) 101 102... 300 301

Counters in Dedalus counter(a, Val + 1)@next :- counter(a, Val), event(a, _); counter(a, Val)@next :- counter(a, Val), noqn event(a, _);

Asynchrony in Dedalus Unreliable Broadcast: sbcast(#target, Sender, Message)@async :- new_message(#sender, Message), members(#sender, Target); More saqsfactory logical interpretaqon Can build Lamport clocks, reliable broadcast, etc. What about space? Space is the unit of atomic deducqon w/o parqal failure

Asynchrony in Dedalus Unreliable Broadcast: Project sender s local Qme sbcast(#target, Sender, N, Message)@async :- new_message(#sender, Message)@N, members(#sender, Target); More saqsfactory logical interpretaqon Can build Lamport clocks, reliable broadcast, etc. What about space? Space is the unit of atomic deducqon w/o parqal failure

Dedalus Summary Logical, model- theoreqc semanqcs for two key features of distributed systems 1. Mutable state 2. Asynchronous communicaqon All facts are transient Persistence and state update are explicit IniQal correctness checks 1. Temporal straqfiability ( modular straqficaqon in Qme ) 2. Temporal safety ( eventual quiescence )

DirecQons: Bloom 1. Bloom: Logic for Joe the Programmer Expose sets, map/reduce, and callbacks? TranslaQon to Dedalus 2. VerificaQon of Dedalus programs 3. Network- oriented opqmizaqon 4. Finding the right abstracqons for Distributed CompuQng Hand- coding protocols vs. staqng distributed invariants 5. Parallelism and monotonicity?

QuesQons? Thank you! hxp://declaraqvity.net IniQal PublicaQons: BOOM AnalyCcs: EuroSys 10, Alvaro et al. Paxos in Overlog: NetDB 09, Alvaro et al. Dedalus: UCB TR #2009-173, Alvaro et al.

Temporal StraQfiability ReducQon to Datalog not syntacqcally straqfiable: p(x)@next :- p(x), noqn p_neg(x); p_neg(x) :- event(x, _), p(x);