Big Data looks Tiny from the Stratosphere

Similar documents

Big Data Analytics. Chances and Challenges. Volker Markl

How To Get The Most Out Of Big Data

Apache Flink Next-gen data analysis. Kostas

The Stratosphere Big Data Analytics Platform

Apache Flink. Fast and Reliable Large-Scale Data Processing

Big Data Research in Berlin BBDC and Apache Flink

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi,

Big Data and Scripting Systems beyond Hadoop

The Stratosphere platform for big data analytics

Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context

Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. project.org. University of California, Berkeley UC BERKELEY

Big Data and Scripting Systems build on top of Hadoop

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS PART 4 BEYOND MAPREDUCE...385

Apache MRQL (incubating): Advanced Query Processing for Complex, Large-Scale Data Analysis

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Efficient Data Access and Data Integration Using Information Objects Mica J. Block

Architectures for Big Data Analytics A database perspective

Reference Architecture, Requirements, Gaps, Roles

How To Optimize Data Processing In A Distributed System

Overview on Graph Datastores and Graph Computing Systems. -- Litao Deng (Cloud Computing Group)

Spark. Fast, Interactive, Language- Integrated Cluster Computing

Big Data With Hadoop

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Scaling Out With Apache Spark. DTL Meeting Slides based on

Distributed Computing and Big Data: Hadoop and MapReduce

Machine Learning over Big Data

Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team

Big Data Course Highlights

Ramesh Bhashyam Teradata Fellow Teradata Corporation

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

A Brief Introduction to Apache Tez

Unified Batch & Stream Processing Platform

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

Apache Spark 11/10/15. Context. Reminder. Context. What is Spark? A GrowingStack

Common Patterns and Pitfalls for Implementing Algorithms in Spark. Hossein

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Systems Engineering II. Pramod Bhatotia TU Dresden dresden.de

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

Unified Big Data Analytics Pipeline. 连城

Fast Data in the Era of Big Data: Tiwtter s Real-Time Related Query Suggestion Architecture

SQL + NOSQL + NEWSQL + REALTIME FOR INVESTMENT BANKS

Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park

Lecture Data Warehouse Systems

Spark ΕΡΓΑΣΤΗΡΙΟ 10. Prepared by George Nikolaides 4/19/2015 1

HIGH PERFORMANCE BIG DATA ANALYTICS

Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems

Introduction to Spark

Big Data Can Drive the Business and IT to Evolve and Adapt

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015

Bayesian networks - Time-series models - Apache Spark & Scala

Domain driven design, NoSQL and multi-model databases

BSC vision on Big Data and extreme scale computing

Data Management in the Cloud

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Cloud Computing at Google. Architecture

Infrastructures for big data

Stateful Distributed Dataflow Graphs: Imperative Big Data Programming for the Masses

Report: Declarative Machine Learning on MapReduce (SystemML)

Hadoop Job Oriented Training Agenda

In Memory Accelerator for MongoDB

Apache Mahout's new DSL for Distributed Machine Learning. Sebastian Schelter GOTO Berlin 11/06/2014

Hadoop: The Definitive Guide

Big Systems, Big Data

Lecture 10 - Functional programming: Hadoop and MapReduce

Hadoop and Map-Reduce. Swati Gore

Analysis of Web Archives. Vinay Goel Senior Data Engineer

Database Application Developer Tools Using Static Analysis and Dynamic Profiling

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

Internals of Hadoop Application Framework and Distributed File System

An Approach to Implement Map Reduce with NoSQL Databases

In-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015

Big Data and Apache Hadoop s MapReduce

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader

Big Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island

Large Scale Graph Processing with Apache Giraph

Unique column combinations

Real-time Streaming Analysis for Hadoop and Flume. Aaron Kimball odiago, inc. OSCON Data 2011

Best Practices for Hadoop Data Analysis with Tableau

A brief introduction of IBM s work around Hadoop - BigInsights

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Big Data Technology Pig: Query Language atop Map-Reduce

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February ISSN

Hadoop s Entry into the Traditional Analytical DBMS Market. Daniel Abadi Yale University August 3 rd, 2010

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

Big Graph Analytics on Neo4j with Apache Spark. Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

The Classical Architecture. Storage 1 / 36

Transcription:

Volker Markl http://www.user.tu-berlin.de/marklv volker.markl@tu-berlin.de Big Data looks Tiny from the Stratosphere

Data and analyses are becoming increasingly complex! Size Freshness Format/Media Type Uncertainty/Quality etc. Data (volume) (velocity) (variability) (veracity) Selection/Grouping Relational Operators Extraction & Integration, ML, Optimization Predictive Models etc. Analysis (map/reduce) (Join/Correlation) (map/reduce or dataflow systems) (R, S+, Matlab) (R, S+, Matlab) Slide 3

Overview of Big Data Systems query plan scripting SQL-- XQuery? column store++ a query plan scalable parallel sort Slide 4

Outline Stratosphere architecture Layered and flexible stack for massively parallel data management The PACT programming model Using second-order functions for data parallelism Stratosphere optimizer Deeply embedding Map/Reduce Style UDFs into a query optimizer Iterative algorithms in Stratosphere Teaching a dataflow system to execute iterative algorithms with comparable performance to specialized systems Slide 5

The Stratosphere System Stack Layered approach several entry points to the system PACT Program Meteor Script Pact 4 Scala SOPREMO Compiler Scala-Compiler Plugin Stratosphere Optimizer Runtime Operators Nephele Parallel Dataflow Nephele Dataflow Engine Slide 6

The Stratosphere System Stack PACT Program Meteor Script Pact 4 Scala Nephele parallel dataflow engine - Resource allocation - Scheduling - Task communication - Fault tolerance - Execution monitoring SOPREMO Compiler Scala-Compiler Plugin Stratosphere Optimizer Runtime Operators Nephele Parallel Dataflow Nephele Dataflow Engine Slide 9

The Stratosphere System Stack PACT Program Meteor Script Pact 4 Scala Runtime engine - Memory management - Asynchronous IO - Query execution (sorting, hashing, ) SOPREMO Compiler Scala-Compiler Plugin Stratosphere Optimizer Runtime Operators Nephele Parallel Dataflow Nephele Dataflow Engine Slide 10

The Stratosphere System Stack PACT Program Meteor Script SOPREMO Compiler Pact 4 Scala Scala-Compiler Plugin Stratosphere optimizer picks: - Physical execution strategies - Partitioning strategies - Operator order Stratosphere Optimizer Runtime Operators Nephele Parallel Dataflow Nephele Dataflow Engine Slide 11

SOPREMO Compiler Scala- Compiler Stratosphere Optimizer Runtime Operators Nephele Dataflow Engine The PACT programming model Second-order functions for data parallelism

Parallelization Contracts (PACTs) Data Second-order function First-order function (user code) Data Map PACT Generalize Map/Reduce Describe how input is partitioned/grouped as second order function What is processed together? First-order UDF called once per input group Map PACT (record at a time, 1-dimensional) Each input record forms a group Each record is independently processed by UDF Reduce PACT (set at a time, 1-dimensional) One attribute is the designated key All records with same key value form a group Reduce PACT Slide 14

More Parallelization Contracts Cross PACT Each pair of input records forms a group Distributed Cartesian product CoGroup PACT All pairs with equal key values form a group 2D Reduce More PACTs currently under consideration For similarity operators, stream processing, etc Match PACT Each pair with equal key values forms a group Distributed equi-join Record-at-a-time, 2-dimensional Set-at-a-time, 2-dimensional Record-at-a-time, 2-dimensional Slide 15

PACT Programming Model Map C := max(a,b) Source 1 Extract (A,B) Sink 1 Reduce (on A) sum(b), avg(c) Match (A = D) if (A>3) emit Map if (D>4) emit Source 2 Extract (D,E) A PACT program is an arbitrary dataflow DAG consisting of operators An operator consists of A second-order function (SOF) signature (PACT) A user-defined first-order function (FOF) written in Java PACT programs serve as intermediate representation, but are also exposed to the user To implement UDFs for functionality not supported by Meteor Slide 16

SOPREMO Compiler Scala- Compiler Stratosphere Optimizer Runtime Operators Nephele Dataflow Engine The Stratosphere Optimizer Opening the Black Boxes (VLDB 2012)

Optimizer Design Cost-based optimizer produces physical execution plan given PACT program Annotates edges with distribution patters, e.g., broadcast, partition Chooses physical execution strategies (e.g., hash/sort) Reorders PACT functions Constructs Nephele job graph Challenge: Semantics of user-defined functions unknown How to derive correct transformations (this talk) How to cost functions (ongoing work) Mix and match UDFs and native operators (ongoing work) Slide 18

Optimization Overview Approach: Statically analyze user code in each PACT UDFs and extract properties Based on these properties, derive semantically correct transformations Enumerate semantically equivalent plans Contribution: How to deeply embed MapReduce functions into a query optimizer Parallelization and reordering Applies to data flows composed (in part) of functions written in arbitrary imperative code Exportable to Scope, SQL/MapReduce (e.g., Aster, Greenplum) Slide 19

via Static Code Analysis 1 void match (Record left, 2 Record right, 3 Collector col) { 4 Record out = copy (left); 5 if (left.get(0) > 3) { 6 double a = right.get(2); 7 out.set(2,1.0/a); 8 } 9 out.set(1, 42); 10 out.set(3,right.get(0)); 11 out.set(4,right.get(1)); 12 out.set(5,right.get(2)); 13 col.emit (out); 14 } Feasible: 1.Record data model, fixed API for 2.No control flow between operators Correct: Difficulty comes from different code paths Correctness guaranteed through conservatism Add to R,W when in doubt Slide 20

Opening the Black Boxes Analyze user code to discover: (O f,r f,w f,e f ) Output schema O f : Schema of output record given schema of input record(s) Read set R f : Attributes of the input record(s) that might influence output PACT f Write set W f : Attributes of the output record(s) that might have different values from respective input attributes Emit cardinality E f : Bounds on records emitted per call (1, >1, ) Slide 21

Code Analysis Algorithm 1 void match (Record left, 2 Record right, 3 Collector col) { 4 Record out = copy (left); 5 if (left.get(0) > 3) { 6 double a = right.get(2); 7 out.set(2,1.0/a); 8 } 9 out.set(1, 42); 10 out.set(3,right.get(0)); 11 out.set(4,right.get(1)); 12 out.set(5,right.get(2)); 13 col.emit (out); 14 } R f from get statements W f by backwards traversal of data flow graph starting from emit statement E f by traversing control flow graph Input 1 = [A,B,C] Input 2 = [D,E,F] Output = [A,B,C,D,E,F] R f = {A,B,C,D,E,F} W f = {B,C} E f = 1 Slide 22

Automatic Parallelization probeht (A) parti./sort. (A) Map C := max(a,b) Source 1 Extract (A,B) Sink 1 Reduce (on A) sum(b), avg(c) fifo Match (A = D) if (A>3) emit buildht (D) partition (D) Map if (D>4) emit Source 2 Extract (D,E) Optimizer can pick partitioning strategies From PACT signature E.g., for Match: broadcast, partition, SFR Partitioning strategies propagated top-down as interesting properties Can infer preserved partitioning via R/W sets Identifies pass-through UDFs A Reduce does not always imply a physical sort operator Slide 23

Operator Reordering buildht (A) Reduce (on A) sum(b), avg(c) buildht (A) partition (A) Map C := max(a,b) Source 1 Extract (A,B) Sink 1 Match (A = D) if (A>3) emit probeht (D) part./sort (D) Map if (D>4) emit Source 2 Extract (D,E) Reordering PACTs Reduce data volume Introduce new partitioning opportunities Reordering, partitioning, and physical operators in one stage Optimal execution plan Powerful transformations using read and write conflicts Can emulate most relational optimizations without knowing operator semantics Slide 24

Example Transformations Theorem 1: Two Map operators can be reordered if their UDFs have only read-read conflicts Theorem 2: For a Map and a Reduce, we need in addition the Reduce key groups to be preserved Enabled optimizations: Selection push-down (Bushy) join reordering Aggregation push-down In VLDB 12 paper: Formal proofs and conditions for safe reorderings for all possible PACT pairs based on R f,w f,e f Equivalent to invariant grouping transformation [Chaudhuri & Shim 1994] Reordering of non-relational Reduce functions Slide 25

SOPREMO Compiler Scala- Compiler Stratosphere Optimizer Runtime Operators Nephele Dataflow Engine Support for Iterative queries Spinning Fast Iterative Dataflows (VLDB 2012)

Bulk Iterations dynamic Reduce (on tid) (pid=tid, r= k) Match (on pid) (tid, k=r*p) p S (pid, r) A Sum up partial ranks Join P and A (pid, tid, p) constant Recompute state at each iteration Conceptual feedback edge in the dataflow lazy unrolling possible Distinguish dynamic data path (different data each iteration) and constant data path (same) Caching heuristics were constant and dynamic paths meet Cached data may be indexed Optimizer weighs costs for constant and dynamic data path differently Automatically favors plans that push work to the constant path 29

PageRank: Two Optimizer Plans Reduce (on tid) (pid=tid, r= k) buildhash- Table (pid) O Match (on pid) (tid, k=r*p) broadcast I fifo p fifo (pid, r) CACHE part./sort (tid) A Sum up partial ranks Join P and A probehashtable (pid) (pid, tid, p) Reduce (on tid) (pid=tid, r= k) probehash- Table (pid) O part./sort (tid) Match (on pid) I (tid, k=r*p) CACHE partition (pid) p (pid, r) partition (pid) A Sum up partial ranks Join P and A buildhashtable (pid) (pid, tid, p) Slide 30

Sparse Computational Dependencies Parts of the state change at each iteration, based on the parts that changed at the previous iteration Most graph algorithms and beyond Huge savings if we do not recompute the whole state Need in-place updates in persistent state while surfacing a pure functional model 4 4 2 1 cid 1 3 5 2 3 7 7 8 5 6 6 8 S 0 S 1 S 2 9 9 vid 4 4 2 1 1 3 5 5 1 7 7 1 7 6 8 5 7 9 1 4 2 1 1 3 5 5 1 7 7 1 7 6 8 5 7 9 31

Incremental Iterations Different programming abstraction based on workset algorithms Algorithm works on two sets: Solution Set and Workset A delta to the solution set is computed from the workset A new workset is recomputed at each iteration Solution set is efficiently merged with delta set 32

Pregel as an Extended PACT Plan Working Set has messages sent by the vertices Create Messages from new state N W i+1 D i+10 Match CoGroup. U Delta set has state of changed vertices Aggregate messages and derive new state Graph Topology In-place updates in persistent hash table W i S i Slide 33

Big Data looks Tiny from the Stratosphere Stratosphere is a declarative, massively-parallel Big Data Analytics System, funded by DFG and EIT, available open-source under the Apache license Data analysis program PACT Dataflow Program compiler Execution plan Stratosphere optimizer Automatic selection of parallelization, shipping and local strategies, operator order and placement Job graph Execution graph Runtime operators Hash- and sort-based out-of-core operator implementations, memory management Big Data Analytics Volker Markl BDOD Big Data Chances and Challenges Slide 34 Parallel execution engine Task scheduling, network data transfers, resource allocation, checkpointing http://www. stratosphere.eu