Databases 2 (VU) (707.030)
MapReduce (Part 3)
Mark Kröll, KTI, TU Graz
Nov. 14, 2016

Outline
1. Problems Suited for Map-Reduce
   - Matrix-Vector Multiplication
   - Relational-Algebra Operations
2. Hadoop Ecosystem
   - Big Data Storage Technologies
Slides are partially based on Mining Massive Datasets by Jure Leskovec


MapReduce: Applications
- MapReduce computation makes sense when files are large and rarely updated in place
- not suitable for managing online sales (Amazon)
  - the principal operations on Amazon data involve responding to searches for products, recording sales, and so on: processes that involve relatively little calculation and that change the database
  - we won't see MapReduce used for handling Web requests (even if we have millions of users)
- however, you want to use MapReduce for analytic queries on the data generated by, e.g., a Web application
  - finding users with similar buying patterns
  - ranking search results


MapReduce: Applications
- computations such as analytic queries typically involve matrix operations
- the original purpose of the MapReduce implementation was to execute large matrix-vector multiplications to calculate the PageRank
- matrix operations such as matrix-matrix and matrix-vector multiplications fit nicely into the MapReduce programming model
- another important class of operations that can use MapReduce effectively are relational-algebra operations

MapReduce: Applications
Matrix-Vector Multiplication
Suppose we have an $n \times n$ matrix $M$, whose element in row $i$ and column $j$ is denoted $m_{ij}$. Suppose we also have a vector $v$ of length $n$, whose $j$th element is $v_j$. Then the matrix-vector product is the vector $x$ of length $n$, whose $i$th element is given by

$$x_i = \sum_{j=1}^{n} m_{ij} v_j$$

Outline a Map-Reduce program that calculates the vector $x$.


Matrix-Vector Multiplication
- let us first assume that the vector v is large but still fits into memory
- the matrix M and the vector v are each stored in a file of the DFS
- assume that the row-column coordinates (indices) of a matrix element can be discovered
  - for example, each value is stored as a triple $(i, j, m_{ij})$
- the position of $v_j$ can be discovered analogously


Matrix-Vector Multiplication
Map Function:
- the map function applies to one element of the matrix M
- the vector v is first read in its entirety and is available to all Map tasks at that compute node
- from each matrix element $m_{ij}$ the map function produces the key-value pair $(i, m_{ij} v_j)$
- all terms of the sum that make up the component $x_i$ of the matrix-vector product get the same key i
Reduce Function:
- the reduce function sums all the values associated with a given key i
- the result is a pair $(i, x_i)$
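The Map and Reduce functions for the matrix-vector product can be sketched in plain Python. This is only a single-process illustration, not Hadoop code: the names `map_element`, `reduce_row` and `matrix_vector` are made up for this example, the dictionary stands in for the framework's grouping of values by key, and indices are 0-based.

```python
from collections import defaultdict

def map_element(i, j, m_ij, v):
    """Map: one matrix element (i, j, m_ij) yields the pair (i, m_ij * v_j)."""
    yield i, m_ij * v[j]

def reduce_row(i, values):
    """Reduce: sum all terms that share the key i, giving the pair (i, x_i)."""
    return i, sum(values)

def matrix_vector(triples, v):
    """Run Map on every triple, group by key, then run Reduce per key."""
    groups = defaultdict(list)  # stand-in for the grouping done by the framework
    for i, j, m_ij in triples:
        for key, value in map_element(i, j, m_ij, v):
            groups[key].append(value)
    return dict(reduce_row(i, values) for i, values in groups.items())

# M = [[1, 2], [3, 4]] stored as triples (i, j, m_ij), and v = [10, 20]
M = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
v = [10, 20]
x = matrix_vector(M, v)
print(x)  # {0: 50, 1: 110}
```

Here the whole vector `v` is passed to every Map call, which mirrors the assumption that v fits into the memory of a compute node.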


Matrix-Vector Multiplication
- however, the vector v might not fit into main memory
- it is not required that v fits into the memory at a compute node, but if it does not, there will be a very large number of disk accesses as we move pieces of the vector into main memory
- alternatively, we can divide the matrix M into vertical stripes of equal width and divide the vector into an equal number of horizontal stripes of the same height
- use enough stripes so that the portion of the vector in one stripe fits into main memory at a compute node

Figure: Divide matrix M and vector v into stripes

Matrix-Vector Multiplication
- the ith stripe of the matrix M multiplies only components from the ith stripe of the vector
- we can divide the matrix M into one file for each stripe, and do the same for the vector v
- each Map task is assigned a chunk from one of the stripes of the matrix and gets the entire corresponding stripe of the vector
- Map and Reduce tasks can then act exactly as before
- the results of the per-stripe multiplications need to be summed up once more
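The striped variant could look roughly as follows. Again this is a single-process sketch under assumptions: the function name `striped_matrix_vector` and the parameter `width` (number of columns per stripe) are invented for the example, and the loop over stripes stands in for separate Map tasks that each see only one stripe of the vector.

```python
from collections import defaultdict

def striped_matrix_vector(triples, v, width):
    """Multiply M (given as triples) by v, one vertical stripe at a time."""
    # Assign every matrix element to the stripe that contains its column j.
    matrix_stripes = defaultdict(list)
    for i, j, m_ij in triples:
        matrix_stripes[j // width].append((i, j, m_ij))

    x = defaultdict(int)
    for s, elements in matrix_stripes.items():
        # Only this piece of the vector has to be held in memory.
        v_stripe = v[s * width:(s + 1) * width]
        partial = defaultdict(list)
        for i, j, m_ij in elements:               # Map, exactly as before
            partial[i].append(m_ij * v_stripe[j - s * width])
        for i, values in partial.items():         # Reduce within the stripe
            x[i] += sum(values)                   # sum the stripe results once more
    return dict(x)

# Sparse 4 x 4 matrix as triples (i, j, m_ij), split into two stripes of width 2
M = [(0, 0, 1), (0, 3, 2), (1, 1, 3), (2, 2, 4), (3, 0, 5), (3, 3, 6)]
v = [1, 2, 3, 4]
x = striped_matrix_vector(M, v, width=2)
print(x)  # {0: 9, 3: 29, 1: 6, 2: 12}
```

The final accumulation into `x` corresponds to summing the partial results of the stripes, since each stripe contributes only part of every $x_i$.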


Relational-Algebra Operations
- many operations on data can be described easily in terms of the common database-query primitives
- the queries themselves need not be executed within a DBMS
- e.g. standard operations on relations such as selection
- a relation is a table with column headers called attributes
- the set of attributes of a relation R is called its schema: R(A_1, A_2, ..., A_n)

Relational-Algebra Operations: Relation Links

From    To
url1    url2
url1    url3
url2    url3
url2    url4
...     ...

Table: The relation consists of the set of pairs of URLs such that the first has one or more links to the second

Relational-Algebra Operations: Relation Links
- a tuple is a pair of URLs such that there is at least one link from the first to the second URL
- the first row (url1, url2) states that the Web page at url1 points to the Web page at url2
- a similar relation is typically stored by a search engine (with billions of tuples)

Relational-Algebra Operations: Relational Algebra
Standard operations on relations are:
1. Selection (σ): apply a condition C to each tuple and output only tuples that satisfy C
2. Projection (π): produce from each tuple only a subset S of attributes
3. Union, Intersection, Difference: set operations on tuples
4. Natural Join (⋈): given two relations, compare each pair of tuples and output those that agree on all common attributes
5. Grouping and Aggregation (γ, θ): partition the tuples in a relation according to their values in a set of attributes; for each group perform one of the operations such as Sum, Count, Avg, Min or Max

Relational-Algebra Operations: Example 1: Paths of length 2
- find paths of length 2 in the Web using the Links relation
- in other words, find triples of URLs (u, v, w) such that there is a link from u to v and a link from v to w
- we want to take the natural join of Links with itself
- let us describe this with two copies of Links: L1(U1, U2) and L2(U2, U3)

Relational-Algebra Operations: Example 1: Paths of length 2
- now we compute L1(U1, U2) ⋈ L2(U2, U3)
- for each tuple t1 of L1 and each tuple t2 of L2, we check whether their U2 components are the same
  - these components are the second component of t1 and the first component of t2
- if these two components agree, we produce (U1, U2, U3) as a result
- if we only want to check for the existence of a path of length two, we might project onto U1 and U3: π_{U1,U3}(L1 ⋈ L2)


Relational-Algebra Operations: Example 2: Number of friends
- imagine that a social-networking site has a relation Friends(User, Friend)
- suppose we want to calculate statistics about the number of friends of each user
- in terms of relational algebra we would perform grouping and aggregation: γ_{User, COUNT(Friend)}(Friends)
- this operation groups all tuples by the value of the first component and then counts the number of friends
- the result has one tuple for each group; a typical tuple would look like (Sally, 300) if user Sally has 300 friends


Relational-Algebra Operations: Selection by MapReduce
- given is a relation R; we want to compute σ_C(R)
- can be done most conveniently in the map part alone
Map Function:
- for each tuple t in R, test if it satisfies C
- if so, produce the key-value pair (t, t)
Reduce Function:
- the reduce function is the identity
- it simply passes each key-value pair to the output
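Selection can be tried out on the Links relation with a small single-process sketch. The helper `run_mapreduce` is a hypothetical stand-in for the framework (map, group by key, reduce), and `make_select_mapper` and `identity_reducer` are names invented for this example; the condition C used here is "From equals url2".

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Toy stand-in for the framework: map, group values by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def make_select_mapper(condition):
    """Build a Map function that emits (t, t) only if t satisfies the condition."""
    def mapper(t):
        if condition(t):
            yield t, t
    return mapper

def identity_reducer(key, values):
    """Reduce is the identity: pass every key-value pair to the output."""
    for value in values:
        yield key, value

links = [("url1", "url2"), ("url1", "url3"), ("url2", "url3"), ("url2", "url4")]
pairs = run_mapreduce(links, make_select_mapper(lambda t: t[0] == "url2"), identity_reducer)
selected = [t for t, _ in pairs]
print(selected)  # [('url2', 'url3'), ('url2', 'url4')]
```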


Relational-Algebra Operations: Projection by Map-Reduce
- given is a relation R; we want to compute π_S(R)
Map Function:
- for each tuple t in R construct a tuple t' by eliminating from t those components that are not in the projection S
- output the key-value pair (t', t')
Reduce Function:
- for each key t' there will be one or more key-value pairs (t', t')
- the reduce function turns (t', [t', t', ..., t']) into (t', t')
- the Reduce operation amounts to duplicate elimination
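A small sketch of projection with duplicate elimination, projecting the Links relation onto its From attribute. As before, `run_mapreduce` is a hypothetical single-process stand-in for the framework, and `make_project_mapper` and `dedup_reducer` are illustrative names, with attributes addressed by position.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Toy stand-in for the framework: map, group values by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def make_project_mapper(positions):
    """Build a Map function that keeps only the components at the given positions."""
    def mapper(t):
        t_prime = tuple(t[p] for p in positions)
        yield t_prime, t_prime
    return mapper

def dedup_reducer(t_prime, values):
    """Reduce: collapse (t', [t', t', ..., t']) into a single pair (t', t')."""
    yield t_prime, t_prime

links = [("url1", "url2"), ("url1", "url3"), ("url2", "url3"), ("url2", "url4")]
pairs = run_mapreduce(links, make_project_mapper([0]), dedup_reducer)
sources = [t for t, _ in pairs]
print(sources)  # [('url1',), ('url2',)]
```

Four input tuples yield only two output tuples, because the grouping by key brings identical projected tuples together and Reduce emits each of them once.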


Relational-Algebra Operations: Natural-Join by Map-Reduce
- given are relations R(A, B) and S(B, C); we want to compute R ⋈ S
- we must find tuples that agree on their B components
Map Function:
- for each tuple (a, b) of R produce the key-value pair (b, (R, a))
- for each tuple (b, c) of S produce the key-value pair (b, (S, c))
Reduce Function:
- each key b will be associated with a list of pairs that are either of the form (R, a) or (S, c)
- construct all triples (a, b, c) by combining each (R, a) with each (S, c)
- the challenge is to express one's task in a way that it can be processed by MapReduce, i.e. so that it adheres to its internal key/value structure
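As an illustration, the join can be used to find the paths of length 2 from Example 1 by joining Links with itself. This is a single-process sketch under assumptions: `run_mapreduce` is a hypothetical stand-in for the framework, input tuples are tagged with the name of their relation ("R" or "S") so the Map function knows where they come from, and `join_mapper` and `join_reducer` are illustrative names.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Toy stand-in for the framework: map, group values by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def join_mapper(record):
    """Map: (a, b) of R becomes (b, (R, a)); (b, c) of S becomes (b, (S, c))."""
    relation, t = record
    if relation == "R":
        a, b = t
        yield b, ("R", a)
    else:
        b, c = t
        yield b, ("S", c)

def join_reducer(b, values):
    """Reduce: combine every (R, a) with every (S, c) that shares the key b."""
    r_side = [x for tag, x in values if tag == "R"]
    s_side = [x for tag, x in values if tag == "S"]
    for a in r_side:
        for c in s_side:
            yield a, b, c

links = [("url1", "url2"), ("url1", "url3"), ("url2", "url3"), ("url2", "url4")]
records = [("R", t) for t in links] + [("S", t) for t in links]
paths = sorted(run_mapreduce(records, join_mapper, join_reducer))
print(paths)  # [('url1', 'url2', 'url3'), ('url1', 'url2', 'url4')]
```

Only the key url2 has values from both sides, so it is the only middle node of a path of length 2 in this small relation.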


Relational-Algebra Operations: Grouping and Aggregation by Map-Reduce
- given is a relation R(A, B, C); we want to compute γ_{A,θ(B)}(R)
Map Function:
- Map produces the grouping
- for each tuple (a, b, c) produce the key-value pair (a, b)
Reduce Function:
- the reduce function produces the aggregation
- each key a represents a group
- apply the aggregation operator θ to the list [b_1, b_2, ..., b_n] of the values associated with a
- the output is a pair (a, x), where x is the result of θ applied to the list
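Grouping and aggregation can be sketched in the same style. The relation `R` below is made-up sample data, `run_mapreduce` is a hypothetical single-process stand-in for the framework, and the aggregation operator θ is passed in as an ordinary Python function such as `sum` or `max`.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Toy stand-in for the framework: map, group values by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def group_mapper(t):
    """Map: for each tuple (a, b, c) emit (a, b); the C attribute is dropped."""
    a, b, _c = t
    yield a, b

def make_aggregate_reducer(theta):
    """Build a Reduce function that applies theta to the list of b values of a group."""
    def reducer(a, values):
        yield a, theta(values)
    return reducer

# Made-up sample relation R(A, B, C)
R = [("a1", 3, "x"), ("a1", 4, "y"), ("a2", 10, "z")]
sums = dict(run_mapreduce(R, group_mapper, make_aggregate_reducer(sum)))
maxima = dict(run_mapreduce(R, group_mapper, make_aggregate_reducer(max)))
counts = dict(run_mapreduce(R, group_mapper, make_aggregate_reducer(len)))
print(sums)    # {'a1': 7, 'a2': 10}
print(maxima)  # {'a1': 4, 'a2': 10}
print(counts)  # {'a1': 2, 'a2': 1}
```

With `len` as the operator this corresponds to a COUNT aggregation, as in the number-of-friends example.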


Hadoop Ecosystem: Hadoop Eco System (v1)
HBase
- open source, non-relational, distributed database modeled after Google's BigTable and written in Java
- provides a fault-tolerant way of storing large quantities of sparse data
Hive
- data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis
- a data warehouse is a system used for reporting and data analysis
Pig
- high-level platform for creating programs that run on Apache Hadoop (the language is called Pig Latin)
- abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level


Hadoop Ecosystem: Hadoop Eco System (v2)
Yarn (Yet Another Resource Negotiator)
- is a cluster management system for running Big Data applications on a cluster
- is not a data processing platform itself, but enables such platforms to run their code in a cluster environment
Spark, Flink, Storm
- are frameworks for cluster computing specializing in either batch processing, stream processing or both (see next slide)
Giraph
- utilizes Apache Hadoop's MapReduce implementation to process graphs
Impala
- is Cloudera's open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop

Hadoop Ecosystem: Data Processing

Hadoop Ecosystem: Java(-ish) is the Hadoop language

Hadoop Ecosystem: The good, the bad and the ugly
The Good
- easy to write parallel, highly scalable applications
- stream and batch processing
- seamless integration with other systems (e.g. RDBMS)
The Bad
- rapid development, hard to keep an overview
- 150 projects in (or near) the Hadoop Eco System
The Ugly
- Maven dependency hell if integrated with other systems
- Spark depends on > 50 libraries, each with a specific version!

Hadoop Ecosystem: History of Hadoop

Hadoop Ecosystem: System Design


Hadoop Ecosystem: Big Data Storage Technologies
File-based: HDFS
- distributed, permanent file storage
- tuned for large files
- no indexing
Key-Value based: HBase
- distributed Key/Value store
- fast look-ups
- based on HDFS
Message-based: Kafka
- distributed Producer/Consumer messaging system
- data partitioned in topics
- producer groups / consumer groups

Big Data Storage Technologies: HDFS

Big Data Storage Technologies: HBase

Big Data Storage Technologies: Kafka


To Sum Up:
- Part 1: handling big data
  - key elements: MapReduce, distributed file system
- Part 2: maximizing parallelism
  - input data skew
- Part 3: applications
  - Hadoop ecosystem

The End
Next: Graph databases, Nov. 28th