Systems and Algorithms for Big Data Analytics

1 Systems and Algorithms for Big Data Analytics
YAN, Da

2 My Research
» Graph Data: Distributed Graph Processing
» Spatial Data: Spatial Query Processing
» Uncertain Data: Querying & Mining Uncertain Data

3 My Research: Graph Data (Distributed Graph Processing)
» Algorithm Design & Analysis
» Computation Model
» Communication Mechanism
» Fault Tolerance
» Out-of-core Support

4 My Research: Spatial Data (Spatial Query Processing)
» Spatial Settings: Road Networks, Terrain Meshes, Euclidean Space (Trajectories)
» Spatial Queries: Optimal Meeting Point, Distance-Preserving Subgraph, Facility Location Problem, Reverse Nearest Neighbors

5 My Research: Uncertain Data (Querying & Mining Uncertain Data)
» Top-k Queries (DASFAA 2011 Best Paper)
» Sequential Pattern Mining
» Spatial Queries

6 My Research
» Focus of this presentation: Graph Data (Distributed Graph Processing)
» Also covered in my research: Spatial Data (Spatial Query Processing); Uncertain Data (Querying & Mining Uncertain Data)

7 Google's Pregel
» Distributed framework for graph processing
» User-friendly: think like a vertex
» Message passing
» Iterative: bulk synchronous parallel, where each iteration is a superstep

8 Google's Pregel: Vertex Partitioning (figure: vertices partitioned across machines M0, M1, M2)

9 Google's Pregel: Programming Interfaces
» u.compute(msgs)
» u.send_msg(v, msg)
» get_superstep_number()
» u.vote_to_halt()
u.send_msg, get_superstep_number and u.vote_to_halt are called inside u.compute(msgs).

10 Google's Pregel
» Vertex state: active or inactive; an inactive vertex is reactivated when it receives messages
» Stop condition: all vertices are halted, and no messages are pending for the next superstep

11-13 Google's Pregel: Hash-Min for Connected Components (figures: supersteps 1, 2 and 3; in each superstep a vertex adopts the smallest vertex ID it has received and forwards it to its neighbors)
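The interfaces, vertex states, stop condition and Hash-Min procedure above can be tied together in a toy, single-process simulation. This is a sketch under our own assumptions: the class Vertex and the function run_pregel are hypothetical names, not Pregel's or any system's real API, and one loop iteration stands in for one superstep.

```python
from collections import defaultdict


class Vertex:
    """One vertex of a toy, single-process Pregel simulator (hypothetical API)."""

    def __init__(self, vid, neighbors):
        self.id = vid
        self.neighbors = list(neighbors)
        self.value = vid            # Hash-Min label: smallest vertex ID seen so far
        self.active = True

    def vote_to_halt(self):
        self.active = False

    def compute(self, msgs, superstep, send_msg):
        if superstep == 1:
            # Superstep 1: broadcast the vertex's own ID to all neighbors
            for u in self.neighbors:
                send_msg(u, self.value)
        elif msgs:
            smallest = min(msgs)
            if smallest < self.value:
                # A smaller ID arrived: adopt it and forward it
                self.value = smallest
                for u in self.neighbors:
                    send_msg(u, self.value)
        self.vote_to_halt()


def run_pregel(adj):
    """Run supersteps until all vertices are halted and no messages are pending."""
    vertices = {v: Vertex(v, nbrs) for v, nbrs in adj.items()}
    inbox, superstep = {}, 1
    while True:
        outbox = defaultdict(list)

        def send_msg(target, msg):
            outbox[target].append(msg)   # delivered in the next superstep

        for v in vertices.values():
            msgs = inbox.get(v.id, [])
            if msgs:
                v.active = True          # an incoming message reactivates a halted vertex
            if v.active:
                v.compute(msgs, superstep, send_msg)
        if not outbox and not any(v.active for v in vertices.values()):
            return {v.id: v.value for v in vertices.values()}
        inbox, superstep = outbox, superstep + 1
```

For example, run_pregel({1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}) labels vertices 1 to 3 with 1, vertices 4 and 5 with 4, and the isolated vertex 6 with 6.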

14 Outline
» Practical Pregel Algorithms
» Blogel: Block-Centric Computation
» Pregel+: Message Reduction
» Other Improvements to Pregel
» Future Directions

16 Practical Pregel Algorithms
Practical Pregel Algorithms (PPAs) [PVLDB 14]
» The first cost model for Pregel algorithm design
» PPAs for fundamental graph problems: breadth-first search, list ranking, spanning tree, Euler tour, pre/post-order traversal, connected components, biconnected components, strongly connected components, etc.

17 Practical Pregel Algorithms
Practical Pregel Algorithms (PPAs) [PVLDB 14]
» Linear cost per superstep: O(|V| + |E|) messages, O(|V| + |E|) computation time, O(|V| + |E|) RAM space
» Logarithmic number of supersteps: O(log |V|) supersteps, where O(log |V|) = O(log |E|)
How about load balancing?

18 Practical Pregel Algorithms
Balanced Practical Pregel Algorithms (BPPAs)
» d_in(v): in-degree of v; d_out(v): out-degree of v
» Linear cost per superstep for each vertex v: O(d_in(v) + d_out(v)) messages, O(d_in(v) + d_out(v)) computation time, O(d_in(v) + d_out(v)) RAM space
» Logarithmic number of supersteps

19 Practical Pregel Algorithms
Example: List Ranking
» A procedure used in computing biconnected components
» A linked list in which each element v has a value val(v) and a predecessor pred(v)
» The element at the head has pred(v) = NULL
Toy example (figure: a list headed by NULL, followed by v1, v2, v3, v4, ...): val(v) = 1 for all v

20 Practical Pregel Algorithms
Example: List Ranking
» Compute sum(v) for each element v by summing val(v) and the values of all its predecessors
» Why can't TeraSort be used here? (figure: the same list)

21-24 Practical Pregel Algorithms
Example: List Ranking
» Pointer jumping / path doubling: as long as pred(v) ≠ NULL, update
  sum(v) ← sum(v) + sum(pred(v))
  pred(v) ← pred(pred(v))
» Each round doubles the number of list positions that pred(v) spans, so the computation finishes in O(log |V|) supersteps (figures: successive rounds, with more elements pointing to NULL after each round)
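The pointer-jumping rule above can be checked with a short sequential simulation. This is a sketch, assuming the list is given as a dict from each element to its predecessor (None at the head); the name list_rank is hypothetical. Every round reads only the previous round's state, as a synchronous superstep would; in an actual Pregel program each round would need a request to pred(v) and a reply, i.e. a constant number of supersteps.

```python
def list_rank(pred, val):
    """Pointer jumping: pred maps element -> predecessor (None at the head).

    Returns (sum, rounds), where sum[v] = val(v) + values of all predecessors of v.
    """
    pred = dict(pred)
    s = dict(val)                          # sum(v) starts as val(v)
    rounds = 0
    while any(p is not None for p in pred.values()):
        new_s, new_pred = {}, {}
        for v, p in pred.items():
            if p is None:                  # already linked to NULL: nothing to do
                new_s[v], new_pred[v] = s[v], None
            else:
                new_s[v] = s[v] + s[p]     # sum(v) <- sum(v) + sum(pred(v))
                new_pred[v] = pred[p]      # pred(v) <- pred(pred(v))
        s, pred = new_s, new_pred
        rounds += 1
    return s, rounds
```

On the toy example with five elements and val(v) = 1, list_rank({1: None, 2: 1, 3: 2, 4: 3, 5: 4}, {v: 1 for v in range(1, 6)}) returns ranks 1 to 5 after 3 rounds, matching the ceiling of log2(5).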

25 Practical Pregel Algorithms
Example: Connected Components
» Pointer jumping / path doubling
» Each vertex u maintains a pointer D[u]
» Vertices are organized as a pseudo-forest in which D[u] is the parent link of u (figure: vertices v, w)

26 Practical Pregel Algorithms
Example: Connected Components
» Repeat two steps for O(log |V|) rounds
» Step 1: tree hooking (figure: vertices u, v, w, x; hooking condition D[v] < D[u])

27 Practical Pregel Algorithms
Example: Connected Components
» Repeat two steps for O(log |V|) rounds
» Step 2: shortcutting, i.e., point v to the parent of v's parent (figure: vertices u, w, x, y)

28 Practical Pregel Algorithms
Example: Connected Components
» Repeat two steps for O(log |V|) rounds
» Stop condition: D[u] converges for every vertex u
» Then every vertex belongs to a star, and every star corresponds to one connected component
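Hooking and shortcutting can be simulated sequentially to see why convergence yields one star per component. This is a simplified sketch, not the PPA itself: the name connected_components is hypothetical, vertices are assumed to be 0..n-1, and the extra rules of the full algorithm that guarantee the O(log |V|) round bound are omitted. Pointers only ever move to smaller IDs, so no cycles form, and at convergence adjacent vertices share a root.

```python
def connected_components(n, edges):
    """Simplified hooking + shortcutting over vertices 0..n-1 (undirected edges)."""
    D = list(range(n))                          # D[u]: parent pointer, initially a self-loop
    arcs = list(edges) + [(v, u) for u, v in edges]   # look at both directions
    while True:
        old = D[:]                              # snapshot: reads see last round's state
        # Step 1: tree hooking
        for u, v in arcs:
            pu, pv = old[u], old[v]
            if old[pu] == pu and pv < pu:       # u's parent is a root: hook it below pv
                D[pu] = min(D[pu], pv)
        # Step 2: shortcutting, point each vertex to its parent's parent
        hooked = D[:]
        D = [hooked[hooked[u]] for u in range(n)]
        if D == old:                            # D[u] has converged for every u
            return D
```

For instance, connected_components(6, [(0, 1), (1, 2), (3, 4)]) returns [0, 0, 0, 3, 3, 5]: each vertex points to the smallest ID in its component, so every component is a star.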

30 Block-Centric Computation
Blogel: Block-Centric Model [PVLDB 14]
» Orders-of-magnitude performance improvement, e.g., from one hour to 10 seconds

33 Block-Centric Computation
Motivation
» Graph characteristics adverse to Pregel: large graph diameter, skewed vertex degree distribution, high average vertex degree

Data         Type        |V|           |E|            AVG Deg   Max Deg
WebUK        directed    133,633,040   5,507,679,…    …         …,429
LiveJournal  directed    10,690,…      …,614,…        …         …,053,676
Twitter      directed    52,579,682    1,963,263,…    …         …,958
BTC          undirected  164,732,…     …,822,…        …         …,637,619

34 Block-Centric Computation
Idea of Block-Centric Computation
» A block is a connected subgraph of the graph
» Messages are exchanged only among blocks
» A serial in-memory algorithm runs within each block

35 Block-Centric Computation
Benefits of Block-Centric Computation
» High-degree vertices inside a block send no messages
» Far fewer supersteps
» Far fewer blocks than vertices

36 Block-Centric Computation
Example: Hash-Min
» Condense each block into a supervertex to obtain the block-level graph, i.e., construct an adjacency list for each block
» Run Hash-Min over the block-level graph to propagate the minimum block ID instead of the minimum vertex ID
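The two block-level Hash-Min steps above can be sketched in a few lines. Assumptions: the graph is an undirected adjacency dict, block_of maps each vertex to a comparable block ID, and blocks are connected subgraphs so all vertices of a block share a component; block_hash_min is a hypothetical name, and a synchronous loop stands in for Blogel's actual message passing.

```python
def block_hash_min(adj, block_of):
    """Condense blocks into supervertices, then propagate the min block ID."""
    # Build the block-level adjacency list (one entry per block)
    block_adj = {b: set() for b in set(block_of.values())}
    for v, nbrs in adj.items():
        for u in nbrs:
            bv, bu = block_of[v], block_of[u]
            if bv != bu:                     # only cross-block edges survive condensation
                block_adj[bv].add(bu)
    # Hash-Min over the block-level graph, one synchronous round per superstep
    label = {b: b for b in block_adj}
    changed = True
    while changed:
        changed = False
        new_label = dict(label)
        for b, nbrs in block_adj.items():
            for c in nbrs:
                if label[c] < new_label[b]:
                    new_label[b] = label[c]
                    changed = True
        label = new_label
    # Every vertex inherits the component label of its block
    return {v: label[block_of[v]] for v in adj}
```

With three blocks {1, 2}, {3, 4} and {5, 6} (IDs 10, 20, 30) and a single cross-block edge 2-3, only two supervertex labels propagate instead of six vertex labels.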

37 Block-Centric Computation
Effectiveness

Dataset     Model      Computing Time   Total Msg #     Superstep #
BTC         V-Centric  … s              1,188,832,…     …
BTC         B-Centric  0.94 s           1,747,653       6
Friendster  V-Centric  … s              7,226,963,…     …
Friendster  B-Centric  2.52 s           19,410,865      5
USA Road    V-Centric  … s              8,353,044,435   6,262
USA Road    B-Centric  1.94 s           270,…           …

38 Block-Centric Computation
Example: Single-Source Shortest Paths
» Source s ∈ V
» Each edge has a length
» Goal: compute the distance from s to each v ∈ V

39 Block-Centric Computation
Example: Single-Source Shortest Paths
» Vertices receive messages from remote neighbors to update their distances
» A block runs Dijkstra's algorithm starting from its updated vertices
» Remote neighbors are sent messages rather than being enqueued
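The block-centric SSSP steps above can be sketched as a block-local Dijkstra plus a superstep loop. The names block_dijkstra and block_sssp, the dict-based graph layout (v maps to a list of (neighbor, length) pairs) and keeping only the smallest candidate per target are our assumptions, not Blogel's API. A block only relaxes its own vertices; a shorter path to a remote neighbor becomes a message processed in the next superstep.

```python
import heapq

INF = float("inf")


def block_dijkstra(block, adj, dist, updated):
    """Dijkstra inside one block, started from the block's updated vertices.

    Local neighbors are relaxed and enqueued; remote neighbors are not enqueued
    but collected as outgoing messages {remote_vertex: candidate_distance}.
    """
    heap = [(dist[v], v) for v in updated]
    heapq.heapify(heap)
    outgoing = {}
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                                  # stale heap entry
        for u, length in adj[v]:
            nd = d + length
            if u in block:
                if nd < dist[u]:
                    dist[u] = nd
                    heapq.heappush(heap, (nd, u))
            elif nd < outgoing.get(u, INF):
                outgoing[u] = nd
    return outgoing


def block_sssp(blocks, adj, source):
    """Block-level supersteps: deliver messages, then run Dijkstra in each block that changed."""
    owner = {v: b for b, members in blocks.items() for v in members}
    dist = {v: INF for v in adj}
    inbox = {owner[source]: {source: 0}}             # initial message to the source
    supersteps = 0
    while inbox:
        supersteps += 1
        outbox = {}
        for b, msgs in inbox.items():
            updated = [v for v, d in msgs.items() if d < dist[v]]
            for v in updated:
                dist[v] = msgs[v]
            if updated:
                for u, d in block_dijkstra(blocks[b], adj, dist, updated).items():
                    box = outbox.setdefault(owner[u], {})
                    box[u] = min(d, box.get(u, INF))  # keep the smallest candidate per target
        inbox = outbox
    return dist, supersteps
```

On a 4-cycle split into blocks {0, 1} and {2, 3}, the whole computation takes three block-level supersteps regardless of how many vertices each block holds.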

40 Block-Centric Computation
Effectiveness

Dataset    Model      Time   Step #
Euro Road  V-Centric  … s    6210
Euro Road  B-Centric  … s    60
USA Road   V-Centric  … s    …
USA Road   B-Centric  … s    58

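41 Block-Centric Computation
Graph Partitioning
» Graph Voronoi Diagram (GVD) partitioning: each vertex joins the block of its nearest seed
» Example with three seeds: v is 2 hops from the red seed, 3 hops from the green seed and 5 hops from the blue seed, so v joins the red seed's block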

42 Block-Centric Computation
GVD Partitioning
» Sample seed vertices with probability p

44 Block-Centric Computation
GVD Partitioning
» Sample seed vertices with probability p
» Compute the GVD grouping by a vertex-centric multi-source BFS

45-48 Block-Centric Computation: Vertex-Centric Multi-Source BFS (figures: state after seed sampling, then supersteps 1, 2 and 3)
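The seed sampling and multi-source BFS illustrated above can be sketched as follows, where each loop iteration plays the role of one superstep and extends every seed's region by one hop. The names multi_source_bfs and sample_seeds are hypothetical, and breaking ties between equally distant seeds by the smaller seed ID is our assumption rather than a rule stated in the slides.

```python
import random


def multi_source_bfs(adj, seeds):
    """Color each vertex with the ID of its nearest seed, one hop per superstep.

    Ties between equally distant seeds go to the smaller seed ID (assumption).
    Vertices that no seed can reach stay None.
    """
    color = {v: (v if v in seeds else None) for v in adj}
    frontier = [v for v in adj if v in seeds]          # state after seed sampling
    while frontier:                                    # one iteration = one superstep
        msgs = {}
        for v in frontier:
            for u in adj[v]:
                if color[u] is None:                   # only uncolored vertices take a color
                    msgs[u] = min(msgs.get(u, color[v]), color[v])
        for u, c in msgs.items():
            color[u] = c
        frontier = list(msgs)
    return color


def sample_seeds(adj, p, rng_seed=None):
    """Each vertex independently becomes a seed with probability p."""
    rng = random.Random(rng_seed)
    return {v for v in adj if rng.random() < p}
```

On the path 0-1-2-3-4 with seeds {0, 4}, vertex 2 is equidistant from both seeds and joins seed 0 under the tie-breaking assumption.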

49 Block-Centric Computation
GVD Partitioning
» Sample seed vertices with probability p
» Compute the GVD grouping
» Repeat the GVD computation: erase the colors of large blocks, increase p and resample seeds, then compute the GVD over the unassigned vertices

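50 Block-Centric Computation
GVD Partitioning
» Sample seed vertices with probability p
» Compute the GVD grouping
» Repeat the GVD computation
» Run Hash-Min over the unassigned vertices
Why is this step necessary? Consider a graph with many small components: a small component may contain no sampled seed, so the BFS never reaches its vertices, and Hash-Min turns each such leftover component into a block.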
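The final Hash-Min step above can be sketched as min-label propagation restricted to the vertices that are still uncolored. The name label_unassigned is hypothetical, and we assume the GVD result is a dict with None for unassigned vertices. Each leftover component is labeled by its smallest vertex ID; that ID cannot collide with a seed's block ID, because seeds are always assigned.

```python
def label_unassigned(adj, color):
    """Give every still-uncolored vertex the min vertex ID of its uncolored component."""
    label = {v: v for v in adj if color[v] is None}
    changed = True
    while changed:                       # synchronous Hash-Min over unassigned vertices only
        changed = False
        new_label = dict(label)
        for v in label:
            for u in adj[v]:
                if u in label and label[u] < new_label[v]:
                    new_label[v] = label[u]
                    changed = True
        label = new_label
    result = dict(color)
    result.update(label)                 # assigned vertices keep their GVD block
    return result
```

For a graph with a seeded pair {0, 1}, an unseeded pair {2, 3} and an isolated vertex 4, the unseeded vertices end up in blocks 2 and 4.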

51 Block-Centric Computation: GVD Partitioning Performance (figure: loading, partitioning and dumping time on WebUK, Friendster, BTC, LiveJournal, USA Road and Euro Road)

53 Message Reduction
Message Reduction in Pregel+ [WWW 15]
» Two techniques to reduce the number of messages transmitted: vertex mirroring and the request-respond paradigm

54 Message Reduction
Vertex Mirroring
» Motivation: high-degree vertices send many messages, and a vertex often sends the same message to all its neighbors, e.g., min(v) in Hash-Min and PageRank(v) / out-degree(v) in PageRank

55-56 Message Reduction: Vertex Mirroring (figures: machines M1, M2 and M3 holding vertices u_i, v_j and w_k; in the second figure u_i also has mirrors on the other machines)

57-59 Message Reduction
Vertex Mirroring vs. Message Combining (figures: u1, u2, u3, u4 on M1; v1, v2, v3, v4 on M2)
» Should we create a mirror for u4? Consider the messages sent to v2
» Message combining without mirroring: the messages from u1, u2, u3 and u4 to v2 are combined into one message a(u1) + a(u2) + a(u3) + a(u4)
» Message combining with u4 mirrored: u1, u2 and u3 still send a combined message a(u1) + a(u2) + a(u3) to v2, while a(u4) is sent once to u4's mirror on M2, which delivers it locally to v1, v2, v3 and v4
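The trade-off above can be made concrete by counting the messages one machine sends to another in a superstep. This is a sketch: remote_message_count is a hypothetical name, and the adjacency in the usage note is only loosely modeled on the figures. Non-mirrored senders are combined per target; each mirrored sender crosses the network once.

```python
def remote_message_count(out_edges, mirrored):
    """Count messages one machine sends to one remote machine in a superstep.

    out_edges: local vertex -> list of its neighbors on the remote machine
    mirrored:  local vertices that have a mirror on the remote machine
    """
    # Without a mirror, messages to the same target are combined into one
    combined_targets = {t for u, targets in out_edges.items() if u not in mirrored for t in targets}
    # With a mirror, u's value crosses the network once; the mirror delivers it locally
    mirror_msgs = sum(1 for u in mirrored if out_edges.get(u))
    return len(combined_targets) + mirror_msgs
```

With the hypothetical layout u1, u2, u3 each linked to v1 and v2, and u4 linked to v1 to v4, combining alone needs 4 messages (one per target), while mirroring u4 needs 3 (two combined messages plus one for u4's mirror).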

60 Message Reduction
Vertex Mirroring vs. Message Combining
» Only mirror high-degree vertices
» Choice of the degree threshold τ: with M machines, n vertices and m edges, the average degree is deg_avg = m / n, and the optimal τ is M · exp(deg_avg / M)
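The threshold formula above is easy to evaluate directly. This sketch uses hypothetical names, and whether a vertex whose degree equals τ is mirrored is our assumption (here it is).

```python
import math


def mirroring_threshold(num_machines, num_vertices, num_edges):
    """tau = M * exp(deg_avg / M), where deg_avg = m / n."""
    deg_avg = num_edges / num_vertices
    return num_machines * math.exp(deg_avg / num_machines)


def vertices_to_mirror(degrees, tau):
    """Only vertices with degree at least tau get mirrors (>= is an assumption)."""
    return {v for v, d in degrees.items() if d >= tau}
```

For example, with M = 10 machines and an average degree of 5, τ = 10 · e^0.5, or about 16.5, so only vertices of degree 17 or more would be mirrored.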

61 Message Reduction: Effectiveness of Message Reduction (figure: number of messages sent by each worker in Pregel+; blue bars without mirroring, red bars with mirroring)

62 Message Reduction
Request-Respond Paradigm
» Motivation: as a pointer-jumping algorithm proceeds, fewer and fewer delegates communicate with more and more vertices
» E.g., in the PPA for computing connected components, small trees merge into large trees, and a vertex is the delegate of its children

63 Message Reduction
Request-Respond Paradigm
» Request-Respond API: retains all basic Pregel operations
» A vertex v can request attribute a(u) in superstep i, and a(u) becomes available in superstep i + 1
» Here, u can be a delegate, and a(u) may be requested by many vertices v

64-66 Message Reduction
Request-Respond Paradigm: Benefits
» Without request-respond (figures): v1, v2, v3 and v4 on M1 each send a request <v1>, <v2>, <v3>, <v4> to u on M2, and u replies with a(u) to each of them
» Using request-respond (figure): the requests for u from v1 to v4 are merged into a single request from M1 to M2, and a single response a[u] serves all of them
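The benefit illustrated above can be quantified with a message count. This sketch assumes, as in the figures, that without request-respond every requester sends its own request and receives its own reply, while with request-respond requests for the same u from one machine are merged and answered once; lookup_message_count is a hypothetical name.

```python
def lookup_message_count(requests, machine_of):
    """Cross-machine messages needed so that each requester v obtains a(u).

    requests:   list of (v, u) pairs, meaning vertex v needs attribute a(u)
    machine_of: vertex -> machine ID
    Returns (without_request_respond, with_request_respond).
    """
    remote = [(v, u) for v, u in requests if machine_of[v] != machine_of[u]]
    # Without request-respond: each v sends its own request <v> and u replies to each v
    without = 2 * len(remote)
    # With request-respond: one request per (machine, u) pair and one shared response
    with_rr = 2 * len({(machine_of[v], u) for v, u in remote})
    return without, with_rr
```

For the figure's scenario (v1 to v4 on M1 requesting a(u) from M2), this gives 8 messages without request-respond and 2 with it, and the gap grows with the number of requesters per machine.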

67 Message Reduction: Effectiveness of the Request-Respond Paradigm (figure: number of messages sent by each worker in Pregel+; blue bars without request-respond, red bars with request-respond)

69 Other Improvements
» Fault tolerance: checkpointing time reduced from 60 seconds to 2 seconds
» Querying workload: from over 100 seconds per query to 3 queries per second
» Out-of-core execution: performance comparable to the fastest in-memory Pregel-like system
» Survey on big graph systems

70 Open-Source Systems
» Ranked highly in Google search results and well indexed
» Used by industrial partners
» An ITF project funded with HK$1.4M

71 Open-Source Systems
» Many times faster than CMU's GraphLab (GraphLab is sold for US$6.7M)
» 10x faster than Giraph, which is used by Facebook; Facebook researchers closely follow our work
» Taobao replaced Spark with our system, which is faster on 4 machines than Spark on 100 machines

72 Future Directions
Beyond Pregel
» Graph problems not suitable for Pregel: output size beyond linear, or non-iterative computation
» Examples: graph matching, motif mining, frequent subgraph mining

73 Future Directions
Other Big Data Systems
» Urban computing: taxi trajectories, Octopus card records (bus, MTR, ferry, etc.)
» Machine learning: improving recommendation with the Semantic Web; systems for deep learning

74 Thanks
YAN, Da
Contact Info
Webpage:


More information

The Stratosphere Big Data Analytics Platform

The Stratosphere Big Data Analytics Platform The Stratosphere Big Data Analytics Platform Amir H. Payberah Swedish Institute of Computer Science amir@sics.se June 4, 2014 Amir H. Payberah (SICS) Stratosphere June 4, 2014 1 / 44 Big Data small data

More information

Complex Networks Analysis: Clustering Methods

Complex Networks Analysis: Clustering Methods Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich nefedov@isi.ee.ethz.ch 1 Outline Purpose to give an overview of modern graph-clustering methods and their applications

More information

V. Adamchik 1. Graph Theory. Victor Adamchik. Fall of 2005

V. Adamchik 1. Graph Theory. Victor Adamchik. Fall of 2005 V. Adamchik 1 Graph Theory Victor Adamchik Fall of 2005 Plan 1. Basic Vocabulary 2. Regular graph 3. Connectivity 4. Representing Graphs Introduction A.Aho and J.Ulman acknowledge that Fundamentally, computer

More information

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P)

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P) Distributed Computing over Communication Networks: Topology (with an excursion to P2P) Some administrative comments... There will be a Skript for this part of the lecture. (Same as slides, except for today...

More information

Big Graph Analytics on Neo4j with Apache Spark. Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage

Big Graph Analytics on Neo4j with Apache Spark. Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage Big Graph Analytics on Neo4j with Apache Spark Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage My background I only make it to the Open Stages :) Probably because Apache Neo4j

More information

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Shin Morishima 1 and Hiroki Matsutani 1,2,3 1Keio University, 3 14 1 Hiyoshi, Kohoku ku, Yokohama, Japan 2National Institute

More information

Accelerating In-Memory Graph Database traversal using GPGPUS

Accelerating In-Memory Graph Database traversal using GPGPUS Accelerating In-Memory Graph Database traversal using GPGPUS Ashwin Raghav Mohan Ganesh University of Virginia High Performance Computing Laboratory am2qa@virginia.edu Abstract The paper aims to provide

More information

Cloud Computing. Lectures 10 and 11 Map Reduce: System Perspective 2014-2015

Cloud Computing. Lectures 10 and 11 Map Reduce: System Perspective 2014-2015 Cloud Computing Lectures 10 and 11 Map Reduce: System Perspective 2014-2015 1 MapReduce in More Detail 2 Master (i) Execution is controlled by the master process: Input data are split into 64MB blocks.

More information

2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8]

2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8] Code No: R05220502 Set No. 1 1. (a) Describe the performance analysis in detail. (b) Show that f 1 (n)+f 2 (n) = 0(max(g 1 (n), g 2 (n)) where f 1 (n) = 0(g 1 (n)) and f 2 (n) = 0(g 2 (n)). [8+8] 2. (a)

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand

More information

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing

More information

SCAN: A Structural Clustering Algorithm for Networks

SCAN: A Structural Clustering Algorithm for Networks SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected

More information

SYSTAP / bigdata. Open Source High Performance Highly Available. 1 http://www.bigdata.com/blog. bigdata Presented to CSHALS 2/27/2014

SYSTAP / bigdata. Open Source High Performance Highly Available. 1 http://www.bigdata.com/blog. bigdata Presented to CSHALS 2/27/2014 SYSTAP / Open Source High Performance Highly Available 1 SYSTAP, LLC Small Business, Founded 2006 100% Employee Owned Customers OEMs and VARs Government TelecommunicaHons Health Care Network Storage Finance

More information

Graph theory and network analysis. Devika Subramanian Comp 140 Fall 2008

Graph theory and network analysis. Devika Subramanian Comp 140 Fall 2008 Graph theory and network analysis Devika Subramanian Comp 140 Fall 2008 1 The bridges of Konigsburg Source: Wikipedia The city of Königsberg in Prussia was set on both sides of the Pregel River, and included

More information

Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri

Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph

More information

8.1 Min Degree Spanning Tree

8.1 Min Degree Spanning Tree CS880: Approximations Algorithms Scribe: Siddharth Barman Lecturer: Shuchi Chawla Topic: Min Degree Spanning Tree Date: 02/15/07 In this lecture we give a local search based algorithm for the Min Degree

More information

A Lightweight Infrastructure for Graph Analytics

A Lightweight Infrastructure for Graph Analytics A Lightweight Infrastructure for Graph Analytics Donald Nguyen, Andrew Lenharth and Keshav Pingali The University of Texas at Austin, Texas, USA {ddn@cs, lenharth@ices, pingali@cs}.utexas.edu Abstract

More information

Diversity Coloring for Distributed Data Storage in Networks 1

Diversity Coloring for Distributed Data Storage in Networks 1 Diversity Coloring for Distributed Data Storage in Networks 1 Anxiao (Andrew) Jiang and Jehoshua Bruck California Institute of Technology Pasadena, CA 9115, U.S.A. {jax, bruck}@paradise.caltech.edu Abstract

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

arxiv:1405.1499v3 [cs.db] 30 Sep 2015

arxiv:1405.1499v3 [cs.db] 30 Sep 2015 Noname manuscript No. (will be inserted by the editor) NScale: Neighborhood-centric Large-Scale Graph Analytics in the Cloud Abdul Quamar Amol Deshpande Jimmy Lin arxiv:145.1499v3 [cs.db] 3 Sep 215 the

More information

Seminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina stefano.martina@stud.unifi.it

Seminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina stefano.martina@stud.unifi.it Seminar Path planning using Voronoi diagrams and B-Splines Stefano Martina stefano.martina@stud.unifi.it 23 may 2016 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International

More information

Significantly Speed up real world big data Applications using Apache Spark

Significantly Speed up real world big data Applications using Apache Spark Significantly Speed up real world big data Applications using Apache Spark Mingfei Shi(mingfei.shi@intel.com) Grace Huang ( jie.huang@intel.com) Intel/SSG/Big Data Technology 1 Agenda Who are we? Case

More information

Implementing Graph Pattern Mining for Big Data in the Cloud

Implementing Graph Pattern Mining for Big Data in the Cloud Implementing Graph Pattern Mining for Big Data in the Cloud Chandana Ojah M.Tech in Computer Science & Engineering Department of Computer Science & Engineering, PES College of Engineering, Mandya Ojah.chandana@gmail.com

More information

The Power of Relationships

The Power of Relationships The Power of Relationships Opportunities and Challenges in Big Data Intel Labs Cluster Computing Architecture Legal Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO

More information

Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory

Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory Graph Analytics in Big Data John Feo Pacific Northwest National Laboratory 1 A changing World The breadth of problems requiring graph analytics is growing rapidly Large Network Systems Social Networks

More information

Big Data looks Tiny from the Stratosphere

Big Data looks Tiny from the Stratosphere Volker Markl http://www.user.tu-berlin.de/marklv volker.markl@tu-berlin.de Big Data looks Tiny from the Stratosphere Data and analyses are becoming increasingly complex! Size Freshness Format/Media Type

More information

CMPSCI611: Approximating MAX-CUT Lecture 20

CMPSCI611: Approximating MAX-CUT Lecture 20 CMPSCI611: Approximating MAX-CUT Lecture 20 For the next two lectures we ll be seeing examples of approximation algorithms for interesting NP-hard problems. Today we consider MAX-CUT, which we proved to

More information

Persistent Data Structures and Planar Point Location

Persistent Data Structures and Planar Point Location Persistent Data Structures and Planar Point Location Inge Li Gørtz Persistent Data Structures Ephemeral Partial persistence Full persistence Confluent persistence V1 V1 V1 V1 V2 q ue V2 V2 V5 V2 V4 V4

More information

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits Outline NP-completeness Examples of Easy vs. Hard problems Euler circuit vs. Hamiltonian circuit Shortest Path vs. Longest Path 2-pairs sum vs. general Subset Sum Reducing one problem to another Clique

More information

B490 Mining the Big Data. 2 Clustering

B490 Mining the Big Data. 2 Clustering B490 Mining the Big Data 2 Clustering Qin Zhang 1-1 Motivations Group together similar documents/webpages/images/people/proteins/products One of the most important problems in machine learning, pattern

More information

Analyzing the Facebook graph?

Analyzing the Facebook graph? Logistics Big Data Algorithmic Introduction Prof. Yuval Shavitt Contact: shavitt@eng.tau.ac.il Final grade: 4 6 home assignments (will try to include programing assignments as well): 2% Exam 8% Big Data

More information

Analysis of MapReduce Algorithms

Analysis of MapReduce Algorithms Analysis of MapReduce Algorithms Harini Padmanaban Computer Science Department San Jose State University San Jose, CA 95192 408-924-1000 harini.gomadam@gmail.com ABSTRACT MapReduce is a programming model

More information

Trinity: A Distributed Graph Engine on a Memory Cloud

Trinity: A Distributed Graph Engine on a Memory Cloud Trinity: A Distributed Graph Engine on a Memory Cloud Bin Shao Microsoft Research Asia Beijing, China binshao@microsoft.com Haixun Wang Microsoft Research Asia Beijing, China haixunw@microsoft.com Yatao

More information

A SURVEY OF PERSISTENT GRAPH DATABASES

A SURVEY OF PERSISTENT GRAPH DATABASES A SURVEY OF PERSISTENT GRAPH DATABASES A thesis submitted to Kent State University in partial fulfillment of the requirements for the degree of Master of Science by Yufan Liu March 2014 Thesis written

More information

Data Structure [Question Bank]

Data Structure [Question Bank] Unit I (Analysis of Algorithms) 1. What are algorithms and how they are useful? 2. Describe the factor on best algorithms depends on? 3. Differentiate: Correct & Incorrect Algorithms? 4. Write short note:

More information

Graph Theory Algorithms for Mobile Ad Hoc Networks

Graph Theory Algorithms for Mobile Ad Hoc Networks Informatica 36 (2012) 185-200 185 Graph Theory Algorithms for Mobile Ad Hoc Networks Natarajan Meghanathan Department of Computer Science, Jackson State University Jackson, MS 39217, USA E-mail: natarajan.meghanathan@jsums.edu

More information

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE STEINER FOREST PROBLEM An initially given graph G. s 1 s 2 A sequence of demands (s i, t i ) arriving

More information

imgraph: A distributed in-memory graph database

imgraph: A distributed in-memory graph database imgraph: A distributed in-memory graph database Salim Jouili Eura Nova R&D 435 Mont-Saint-Guibert, Belgium Email: salim.jouili@euranova.eu Aldemar Reynaga Université Catholique de Louvain 348 Louvain-La-Neuve,

More information

Introduction to Graph Mining

Introduction to Graph Mining Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain

More information

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

More information