SCALABLE GRAPH ANALYTICS WITH GRADOOP AND BIIIG
1 SCALABLE GRAPH ANALYTICS WITH GRADOOP AND BIIIG
Martin Junghanns, André Petermann, Erhard Rahm
2 RESEARCH ON GRAPH ANALYTICS
- Graph Analytics on Hadoop (Gradoop)
  - Distributed graph data management
  - Rich graph data model with powerful operators
  - Domain independent
- Business Intelligence with Integrated Instance Graphs (BIIIG)
  - Graph-based data integration
  - Graph OLAP, mining and visualization
  - Improved scalability on Gradoop
3 GRAPHS ARE EVERYWHERE AND LARGE
- Social science
- Life science
- Engineering
- Information science
4 END-TO-END GRAPH ANALYTICS
- Data Integration: integrate data from one or more sources into a dedicated graph store with a common graph data model
- Graph Analytics: define analytical workflows using an operator algebra
- Representation: present the results in a meaningful way
5 An end-to-end framework and research platform for efficient, distributed and domain-independent graph data management and analytics.
6 HIGH LEVEL ARCHITECTURE (diagram, top to bottom; arrows distinguish data flow and control flow)
- Workflow Declaration: Visual, GrALa DSL
- Representation
- Workflow Execution
- Operator Implementations: Data Integration, Graph Analytics, Representation
- Extended Property Graph Model
- HBase Distributed Graph Store
- HDFS Cluster
7 EXTENDED PROPERTY GRAPH MODEL
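Slide 7 is a figure. As a rough illustration of the model it shows, the following Python sketch assumes the EPGM structure as it is generally described for Gradoop: vertices and edges carry a label and properties, logical graphs have a label and properties of their own, and a single vertex or edge may belong to several logical graphs. The class names (GraphHead, GraphDatabase) and the example data are hypothetical and do not reproduce Gradoop's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class GraphHead:
    """Label and properties of one logical graph; its content is defined by membership."""
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Vertex:
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)
    graphs: Set[int] = field(default_factory=set)  # ids of logical graphs containing this vertex


@dataclass
class Edge:
    id: int
    label: str
    source: int  # source vertex id
    target: int  # target vertex id
    properties: Dict[str, Any] = field(default_factory=dict)
    graphs: Set[int] = field(default_factory=set)  # ids of logical graphs containing this edge


@dataclass
class GraphDatabase:
    """One shared vertex/edge space; logical graphs are (possibly overlapping) subsets of it."""
    heads: Dict[int, GraphHead] = field(default_factory=dict)
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: Dict[int, Edge] = field(default_factory=dict)

    def vertices_of(self, graph_id: int) -> Set[int]:
        return {v.id for v in self.vertices.values() if graph_id in v.graphs}

    def edges_of(self, graph_id: int) -> Set[int]:
        return {e.id for e in self.edges.values() if graph_id in e.graphs}


# Hypothetical example: two communities (logical graphs 0 and 1) sharing one member.
db = GraphDatabase()
db.heads[0] = GraphHead(0, "Community", {"interest": "Hadoop"})
db.heads[1] = GraphHead(1, "Community", {"interest": "Graphs"})
db.vertices[10] = Vertex(10, "Person", {"name": "Alice", "city": "Leipzig"}, {0, 1})
db.vertices[11] = Vertex(11, "Person", {"name": "Bob", "city": "Dresden"}, {0})
db.edges[100] = Edge(100, "knows", 10, 11, {"since": 2013}, {0})
```

Storing graph membership on the elements, rather than copying vertices into each graph, is one simple way to let logical graphs overlap without duplicating data.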
8 GRALA OPERATORS
1. Collection operators: Select, Union, Intersect, Difference; Sort by, Top, Distinct
2. Graph operators: Pattern Matching; Combination, Overlap, Exclusion; Aggregation, Summarization, Projection
3. Auxiliary operators: Apply, Reduce, Call
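A minimal in-memory sketch of some of the listed operators, assuming a logical graph is just a pair of vertex-id and edge-id sets and a collection is a list of graphs compared by graph id. The semantics used for Combination (union), Overlap (intersection) and Exclusion (vertices only in the first graph, plus its edges whose endpoints both remain) are a plausible reading of the operator names, not a specification of GrALa; the edge table and ids are hypothetical.

```python
import itertools
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

_ids = itertools.count(1000)  # fresh ids for graphs produced by operators


class LogicalGraph:
    def __init__(self, graph_id: int, vertices: Iterable[str], edges: Iterable[str]):
        self.id = graph_id
        self.vertices: FrozenSet[str] = frozenset(vertices)
        self.edges: FrozenSet[str] = frozenset(edges)

    # --- binary graph operators ---
    def combine(self, other: "LogicalGraph") -> "LogicalGraph":
        """Combination: all vertices and edges of both graphs."""
        return LogicalGraph(next(_ids), self.vertices | other.vertices, self.edges | other.edges)

    def overlap(self, other: "LogicalGraph") -> "LogicalGraph":
        """Overlap: only the vertices and edges contained in both graphs."""
        return LogicalGraph(next(_ids), self.vertices & other.vertices, self.edges & other.edges)

    def exclude(self, other: "LogicalGraph", endpoints: Dict[str, Tuple[str, str]]) -> "LogicalGraph":
        """Exclusion: vertices only in self, plus self's edges whose endpoints both survive."""
        vertices = self.vertices - other.vertices
        edges = {e for e in self.edges
                 if endpoints[e][0] in vertices and endpoints[e][1] in vertices}
        return LogicalGraph(next(_ids), vertices, edges)


# --- collection operators (graphs compared by id) ---
Collection = List[LogicalGraph]


def select(coll: Collection, pred: Callable[[LogicalGraph], bool]) -> Collection:
    return [g for g in coll if pred(g)]


def union(a: Collection, b: Collection) -> Collection:
    seen = {g.id for g in a}
    return a + [g for g in b if g.id not in seen]


def intersect(a: Collection, b: Collection) -> Collection:
    ids = {g.id for g in b}
    return [g for g in a if g.id in ids]


def difference(a: Collection, b: Collection) -> Collection:
    ids = {g.id for g in b}
    return [g for g in a if g.id not in ids]


# Hypothetical usage: edge id -> (source, target)
endpoints = {"e1": ("a", "b"), "e2": ("b", "c")}
g1 = LogicalGraph(1, {"a", "b"}, {"e1"})
g2 = LogicalGraph(2, {"b", "c"}, {"e2"})
combined = g1.combine(g2)
```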
9 WORKFLOW EXAMPLE: SUMMARIZATION
    persongraph = db.g[0].combine(db.g[1]).combine(db.g[2])
    vertexgroupingkeys = {:type, "city"}
    edgegroupingkeys = {:type}
    vertexaggfunc = (Vertex vsum, Set vertices => vsum["count"] = |vertices|)
    edgeaggfunc = (Edge esum, Set edges => esum["count"] = |edges|)
    sumgraph = persongraph.summarize(vertexgroupingkeys, edgegroupingkeys, vertexaggfunc, edgeaggfunc)
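The summarization workflow groups vertices by type and city, groups edges by type, and stores the number of members in each group as "count". The Python sketch below shows this grouping on plain dictionaries. It assumes the combined person graph is already materialized, treats GrALa's :type as an ordinary "type" key, and is not Gradoop's distributed implementation; the example persons are invented.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence, Tuple


def summarize(
    vertices: Dict[int, Dict[str, Any]],
    edges: List[Tuple[int, int, Dict[str, Any]]],
    vertex_keys: Sequence[str],
    edge_keys: Sequence[str],
):
    """Group vertices by vertex_keys and edges by (source group, target group, edge_keys);
    the aggregate of every group is its member count."""
    # grouping key of each vertex, e.g. ("Person", "Leipzig")
    group_of = {vid: tuple(props.get(k) for k in vertex_keys) for vid, props in vertices.items()}
    # one summary vertex per distinct key; value = number of member vertices
    vertex_counts = Counter(group_of.values())
    # one summary edge per (source group, target group, edge key); value = number of member edges
    edge_counts = Counter(
        (group_of[src], group_of[tgt], tuple(props.get(k) for k in edge_keys))
        for src, tgt, props in edges
    )
    return dict(vertex_counts), dict(edge_counts)


# Hypothetical input, standing in for the combined person graph
persons = {
    1: {"type": "Person", "city": "Leipzig"},
    2: {"type": "Person", "city": "Leipzig"},
    3: {"type": "Person", "city": "Dresden"},
}
knows = [(1, 2, {"type": "knows"}), (1, 3, {"type": "knows"}), (2, 3, {"type": "knows"})]
sum_vertices, sum_edges = summarize(persons, knows, ["type", "city"], ["type"])
```

Here the three persons collapse into two summary vertices (two people in Leipzig, one in Dresden), and the three "knows" edges into two summary edges (one within Leipzig, two from Leipzig to Dresden).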
11 SUMMARY & ROADMAP: GRADOOP
Summary
- End-to-end framework for graph data management and analytics
- Extended property graph model (EPGM) with powerful operators
- Initial implementation running (HBase, MapReduce and Giraph)
Roadmap
- WIP: workflow execution layer (Flink, Spark, …)
- WIP: reference implementation for all operators
- Optimized graph partitioning approaches
- Graph-based data integration (Dedoop)
12 BIIIG ON GRADOOP
- Fitting data model
- Complex analytics composed of Gradoop operators
- Example: cluster-characteristic patterns in business process executions
  - Quantify clusters by a business measure (e.g., profitable and loss-making)
  - Characteristic = frequent within one cluster but not in the other clusters
13 BIIIG OVERVIEW
14 BUSINESS TRANSACTION GRAPH
15 CLUSTER-CHARACTERISTIC PATTERNS
20 CLUSTER-CHARACTERISTIC PATTERNS
    // generate base collection
    btgs = iig.callforcollection(:BusinessTransactionGraphs, {})
    // aggregate profit
    aggfunc = (Graph g => g.v.values("Revenue").sum() - g.v.values("Expense").sum())
    btgs = btgs.apply(Graph g => g.aggregate("Profit", aggfunc))
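To make the profit aggregation concrete, here is a small sketch in which a business transaction graph is reduced to a list of vertex property maps plus graph-level properties. The functions values(), aggregate() and apply() are hypothetical stand-ins for the GrALa calls above, and it is an assumption that vertices without a "Revenue" or "Expense" property are simply skipped. The two example graphs are invented.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List


@dataclass
class Graph:
    vertices: List[Dict[str, float]]                        # vertex property maps
    props: Dict[str, float] = field(default_factory=dict)   # graph-level properties


def values(g: Graph, key: str) -> List[float]:
    """Stand-in for g.v.values(key): the key's values on all vertices that carry it."""
    return [v[key] for v in g.vertices if key in v]


def profit(g: Graph) -> float:
    return sum(values(g, "Revenue")) - sum(values(g, "Expense"))


def aggregate(g: Graph, key: str, func: Callable[[Graph], float]) -> Graph:
    """Store func(g) as graph property `key`, returning a new graph."""
    return replace(g, props={**g.props, key: func(g)})


def apply(collection: List[Graph], func: Callable[[Graph], Graph]) -> List[Graph]:
    """Apply a graph-to-graph function to every graph of a collection."""
    return [func(g) for g in collection]


# Hypothetical business transaction graphs
btgs = [
    Graph([{"Revenue": 100.0}, {"Expense": 30.0}, {"Expense": 20.0}]),
    Graph([{"Revenue": 50.0}, {"Expense": 80.0}]),
]
btgs = apply(btgs, lambda g: aggregate(g, "Profit", profit))
```

The first graph ends up with a profit of 50.0 and the second with -30.0, so the later profit/loss split would put them into different clusters.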
23 CLUSTER-CHARACTERISTIC PATTERNS
    // specific projection
    vertexfunc = (Vertex v => new Vertex((v["IsMasterData"] ? v["SourceID"] : v[:type]), {"Result": v["Result"]}))
    edgefunc = (Edge e => new Edge(e[:type], {}))
    btgs = btgs.apply(Graph g => g.project(vertexfunc, edgefunc))
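The projection keeps master-data vertices distinguishable by their source identifier, collapses transactional vertices to their type, and strips all edge properties. A sketch under the assumption that vertices and edges are simple records; the labels "CUS-42" and "receivedFrom" and the extra properties are invented for illustration. Giving a vertex without a "Result" property the value None is also an assumption about how the GrALa expression behaves.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Vertex:
    id: int
    label: str                                  # corresponds to v[:type]
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    label: str                                  # corresponds to e[:type]
    props: Dict[str, Any] = field(default_factory=dict)


def vertex_func(v: Vertex) -> Vertex:
    # master data keeps its source id as label; transactional data keeps only its type
    label = v.props["SourceID"] if v.props.get("IsMasterData") else v.label
    # only "Result" survives (None if the vertex has none; an assumption)
    return Vertex(v.id, label, {"Result": v.props.get("Result")})


def edge_func(e: Edge) -> Edge:
    # keep the edge type, drop all edge properties
    return Edge(e.source, e.target, e.label, {})


def project(vertices: List[Vertex], edges: List[Edge]) -> Tuple[List[Vertex], List[Edge]]:
    return [vertex_func(v) for v in vertices], [edge_func(e) for e in edges]


# Hypothetical graph: one master-data customer, one transactional sales order
vs = [
    Vertex(1, "Customer", {"IsMasterData": True, "SourceID": "CUS-42"}),
    Vertex(2, "SalesOrder", {"IsMasterData": False, "Result": 120.0, "Date": "2014-05-01"}),
]
es = [Edge(2, 1, "receivedFrom", {"Channel": "web"})]
proj_v, proj_e = project(vs, es)
```

Presumably the point of this step is that two transactions with the same structure become identical after projection, which a frequent-subgraph step can then count, while master-data labels still let a pattern name a specific customer or employee.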
27 CLUSTER-CHARACTERISTIC PATTERNS
    // select profit and loss clusters
    profitbtgs = btgs.select(Graph g => g["Result"] >= 0)
    lossbtgs = btgs.difference(profitbtgs)
    profitfreqpats = profitbtgs.callforcollection(:FrequentSubgraphs, {"Threshold": 0.7})
    lossfreqpats = lossbtgs.callforcollection(:FrequentSubgraphs, {"Threshold": 0.7})
    // determine cluster characteristic patterns
    trivialpats = profitfreqpats.intersect(lossfreqpats)
    profitcharpatterns = profitfreqpats.difference(trivialpats)
    losscharpatterns = lossfreqpats.difference(trivialpats)
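The final steps can be sketched as follows. Real frequent subgraph mining (the :FrequentSubgraphs call) searches arbitrary connected subgraphs; to stay short, this sketch is restricted to single-edge patterns (source label, edge label, target label). It counts each pattern at most once per graph and keeps those whose support, the fraction of graphs in the cluster that contain the pattern, reaches the 0.7 threshold. The graph representation and the example data are hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Set, Tuple

Pattern = Tuple[str, str, str]   # (source label, edge label, target label)
Graph = Dict[str, Any]           # {"id": ..., "Result": ..., "edges": [Pattern, ...]}


def select(coll: List[Graph], pred: Callable[[Graph], bool]) -> List[Graph]:
    return [g for g in coll if pred(g)]


def difference(a: List[Graph], b: List[Graph]) -> List[Graph]:
    ids = {g["id"] for g in b}
    return [g for g in a if g["id"] not in ids]


def frequent_patterns(coll: List[Graph], threshold: float) -> Set[Pattern]:
    """Single-edge patterns contained in at least `threshold` of the graphs."""
    if not coll:
        return set()
    support = Counter()
    for g in coll:
        support.update(set(g["edges"]))  # count each pattern at most once per graph
    return {p for p, n in support.items() if n / len(coll) >= threshold}


def characteristic_patterns(btgs: List[Graph], threshold: float = 0.7):
    profit_btgs = select(btgs, lambda g: g["Result"] >= 0)
    loss_btgs = difference(btgs, profit_btgs)
    profit_freq = frequent_patterns(profit_btgs, threshold)
    loss_freq = frequent_patterns(loss_btgs, threshold)
    trivial = profit_freq & loss_freq          # frequent in both clusters
    return profit_freq - trivial, loss_freq - trivial


# Hypothetical projected business transaction graphs
quote = ("Quotation", "basedOn", "SalesOrder")
btgs = [
    {"id": 1, "Result": 50.0, "edges": [quote, ("SalesOrder", "processedBy", "Employee-7")]},
    {"id": 2, "Result": 20.0, "edges": [quote, ("SalesOrder", "processedBy", "Employee-7")]},
    {"id": 3, "Result": -10.0, "edges": [quote, ("Ticket", "concerns", "SalesOrder")]},
    {"id": 4, "Result": -5.0, "edges": [quote, ("Ticket", "concerns", "SalesOrder")]},
]
profit_chars, loss_chars = characteristic_patterns(btgs)
```

In this toy input the quotation pattern occurs in every graph and is therefore trivial, whereas the employee pattern characterizes the profitable cluster and the ticket pattern the loss cluster.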
28 SUMMARY & ROADMAP: BIIIG
Summary
- Graph-based business intelligence framework
- Graph transformations of business information systems
- Concept of Business Transaction Graphs
Roadmap
- WIP: distributed frequent pattern mining
- Summarization-based Graph OLAP
- Meaningful result representation
- Real-world evaluation
29 REFERENCES
- Junghanns, M., Petermann, A., Gomez, K., Peukert, E., Rahm, E.: GRADOOP - Scalable Graph Data Management and Analytics with Hadoop. Tech. report, Univ. of Leipzig, June 2015
- Kolb, L., Rahm, E.: Parallel Entity Resolution with Dedoop. Datenbank-Spektrum 13(1), 2013
- Kolb, L., Thor, A., Rahm, E.: Dedoop: Efficient Deduplication with Hadoop. PVLDB 5(12), 2012
- Kolb, L., Thor, A., Rahm, E.: Load Balancing for MapReduce-based Entity Resolution. ICDE 2012
- Kolb, L., Sehili, Z., Rahm, E.: Iterative Computation of Connected Graph Components with MapReduce. Datenbank-Spektrum 14(2), 2014
- Petermann, A., Junghanns, M., Müller, R., Rahm, E.: BIIIG: Enabling Business Intelligence with Integrated Instance Graphs. Proc. 5th Int. Workshop on Graph Data Management (GDM), 2014
- Petermann, A., Junghanns, M., Müller, R., Rahm, E.: Graph-based Data Integration and Business Intelligence with BIIIG. Proc. VLDB Conf., 2014
- Petermann, A., Junghanns, M., Müller, R., Rahm, E.: FoodBroker - Generating Synthetic Datasets for Graph-Based Business Analytics. Proc. 5th Int. Workshop on Big Data Benchmarking (WBDB), 2014
- Jindal, A. et al.: Vertexica: your relational friend for graph analytics! PVLDB 7(13), 2014
- Rudolf, M. et al.: The Graph Story of the SAP HANA Database. BTW
30 Thank you!
