Systems and Algorithms for Big Data Analytics
1 Systems and Algorithms for Big Data Analytics YAN, Da
2 My Research» Graph Data: Distributed Graph Processing» Spatial Data: Spatial Query Processing» Uncertain Data: Querying & Mining Uncertain Data
3 My Research» Graph Data: Distributed Graph Processing (Algorithm Design & Analysis, Computation Model, Communication Mechanism, Fault Tolerance, Out-of-core Support)
4 My Research» Spatial Data: Spatial Query Processing» Spatial Settings: Road Networks, Terrain Meshes, Euclidean Space (Trajectories)» Spatial Queries: Optimal Meeting Point, Distance-Preserving Subgraph, Facility Location Problem, Reverse Nearest Neighbors
5 My Research» Uncertain Data: Querying & Mining Uncertain Data (Top-k Queries (DASFAA 2011 Best Paper), Sequential Pattern Mining, Spatial Queries)
6 My Research» Graph Data: Distributed Graph Processing (focus of this presentation)» Spatial Data: Spatial Query Processing» Uncertain Data: Querying & Mining Uncertain Data
7 Google's Pregel» Distributed framework for graph processing» User-friendly: think like a vertex» Message passing» Iterative: bulk synchronous parallel, organized in supersteps
8 Google's Pregel: Vertex Partitioning [figure: vertices partitioned among machines M0, M1, M2]
9 Google's Pregel: Programming Interfaces» u.compute(msgs)» u.send_msg(v, msg)» get_superstep_number()» u.vote_to_halt() Called inside u.compute(msgs)
10 Google's Pregel» Vertex state: active / inactive; reactivated by messages» Stop condition: all vertices are halted, and there are no pending messages for the next superstep
11 Google's Pregel: Hash-Min for Connected Components [figure: Superstep 1]
12 Google's Pregel: Hash-Min for Connected Components [figure: Superstep 2]
13 Google's Pregel: Illustration of Hash-Min [figure: Superstep 3]
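The Hash-Min supersteps are shown only as figures on the slides. As a rough illustration, the sketch below simulates the vertex-centric loop in a single Python process. It is not the Pregel API: the function name `hash_min`, the dictionary-based inbox and outbox, and the rule that a vertex broadcasts only when its label shrinks are assumptions made for this example.

```python
from collections import defaultdict

def hash_min(adj):
    """Simulate Hash-Min on an undirected graph {vertex: [neighbors]}.

    Returns (label, supersteps); label[v] is the smallest vertex ID
    in the connected component of v.
    """
    label = {v: v for v in adj}
    inbox = defaultdict(list)
    active = set(adj)              # every vertex is active in superstep 1
    superstep = 0
    while active:                  # stop: all halted and no pending messages
        superstep += 1
        outbox = defaultdict(list)
        for v in active:           # plays the role of u.compute(msgs)
            if superstep == 1:
                # Superstep 1: smallest ID among the vertex and its neighbors.
                candidate = min([v] + list(adj[v]))
                changed = True
            else:
                candidate = min(inbox[v])
                changed = candidate < label[v]
            if changed:
                label[v] = min(label[v], candidate)
                for u in adj[v]:   # plays the role of u.send_msg(v, msg)
                    outbox[u].append(label[v])
            # The vertex now votes to halt; only a message reactivates it.
        inbox = outbox
        active = set(outbox)
    return label, superstep
```

On a path 1-2-3 plus a separate edge 4-5 and an isolated vertex 6, this sketch finishes in three supersteps, and every vertex ends up labelled with the minimum ID of its component.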
14 Outline» Practical Pregel Algorithms» Blogel: Block-Centric Computation» Pregel+: Message Reduction» Other Improvements to Pregel» Future Directions
16 Practical Pregel Algorithms: Practical Pregel Algorithms (PPAs) [PVLDB'14]» The first cost model for Pregel algorithm design» PPAs for fundamental graph problems: breadth-first search, list ranking, spanning tree, Euler tour, pre/post-order traversal, connected components, biconnected components, strongly connected components, etc.
17 Practical Pregel Algorithms: Practical Pregel Algorithms (PPAs) [PVLDB'14]» Linear cost per superstep: O(|V| + |E|) messages, O(|V| + |E|) computation time, O(|V| + |E|) RAM space» Logarithmic number of supersteps: O(log |V|) supersteps, where O(log |V|) = O(log |E|)» How about load balancing?
18 Practical Pregel Algorithms: Balanced Practical Pregel Algorithms (BPPAs)» d_in(v): in-degree of v» d_out(v): out-degree of v» Linear cost per superstep at each vertex v: O(d_in(v) + d_out(v)) messages, O(d_in(v) + d_out(v)) computation time, O(d_in(v) + d_out(v)) RAM space» Logarithmic number of supersteps
19 Practical Pregel Algorithms: Example: List Ranking» A procedure in computing bi-connected components» Linked list where each element v has a value val(v) and a predecessor pred(v)» The element at the head has pred(v) = NULL [figure: linked list v1, v2, v3, v4, ... whose head points to NULL] Toy example: val(v) = 1 for all v
20 Practical Pregel Algorithms: Example: List Ranking» Compute sum(v) for each element v by summing val(v) and the values of all its predecessors» Why can TeraSort not work?
21 Practical Pregel Algorithms: Example: List Ranking» Pointer jumping / path doubling: sum(v) ← sum(v) + sum(pred(v)); pred(v) ← pred(pred(v)); repeat as long as pred(v) ≠ NULL
22-24 Practical Pregel Algorithms: Example: List Ranking» Pointer jumping / path doubling [figures: successive rounds, after which every pred(v) is NULL]» O(log |V|) supersteps
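The pointer-jumping rule above can be sketched as a short single-process loop. This is only an illustration under assumptions: the name `list_ranking` and the dictionary representation are made up for the example, and each loop iteration stands for one synchronous round (a real Pregel program would need message exchanges to read sum(pred(v)) and pred(pred(v))).

```python
def list_ranking(val, pred):
    """Pointer jumping / path doubling on a linked list.

    val:  {v: value of v}
    pred: {v: predecessor of v, or None for the head}
    Returns (total, rounds); total[v] is val(v) plus the values of
    all predecessors of v.
    """
    total = dict(val)
    pred = dict(pred)
    rounds = 0
    while any(p is not None for p in pred.values()):
        rounds += 1
        # Read only the previous round's state, as a synchronous round would.
        new_total, new_pred = {}, {}
        for v in total:
            p = pred[v]
            if p is None:
                new_total[v], new_pred[v] = total[v], None
            else:
                new_total[v] = total[v] + total[p]   # sum(v) <- sum(v) + sum(pred(v))
                new_pred[v] = pred[p]                # pred(v) <- pred(pred(v))
        total, pred = new_total, new_pred
    return total, rounds
```

With five elements and val(v) = 1 for all v, as in the toy example, the sums come out as 1 through 5 after three rounds, since the distance covered by each pointer doubles per round.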
25 Practical Pregel Algorithms: Example: Connected Components» Pointer jumping / path doubling» Each vertex u maintains a pointer D[u]: vertices are organized into a pseudo-forest, and D[u] is the parent link [figure]
26 Practical Pregel Algorithms: Example: Connected Components» Repeating two steps: O(log |V|) rounds» Step 1: tree hooking [figure: vertices w, x, u, v with hooking condition D[v] < D[u]]
27 Practical Pregel Algorithms: Example: Connected Components» Repeating two steps: O(log |V|) rounds» Step 2: shortcutting, i.e., pointing v to the parent of v's parent [figure]
28 Practical Pregel Algorithms: Example: Connected Components» Repeating two steps: O(log |V|) rounds» Stop condition: D[u] converges for every vertex u; every vertex then belongs to a star, and every star corresponds to a connected component (CC)
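A simplified, single-process sketch of the two repeated steps is given below. It is not the PPA from the slides: the exact hooking rule used here (a tree root hooks onto the smallest D value seen across any edge leaving its tree, provided that value is smaller than the root's ID) is an assumption chosen to match the D[v] < D[u] condition shown on the slide, and the names are hypothetical.

```python
def connected_components(n, edges):
    """Hooking + shortcutting on vertices 0..n-1 with undirected edges.

    Returns (D, rounds); at convergence D[u] is the root of the star
    that contains u, which is the smallest ID in u's component.
    """
    D = list(range(n))             # every vertex starts as its own root
    rounds = 0
    while True:
        rounds += 1
        old = D[:]                 # snapshot: rounds are synchronous
        # Step 1: tree hooking.
        hook = {}
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                w = old[a]
                if old[w] == w and old[b] < w:   # w is a root, D[b] < D[a]
                    hook[w] = min(hook.get(w, w), old[b])
        for w, target in hook.items():
            D[w] = target
        # Step 2: shortcutting (point each vertex to its parent's parent).
        D = [D[D[u]] for u in range(n)]
        if D == old:               # D[u] has converged for every vertex
            return D, rounds
```

Because pointers only ever move to smaller IDs, no cycle can form, and when nothing changes every edge joins two vertices with the same D value, so each star covers exactly one component.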
30 Block-Centric Computation: Blogel, a Block-Centric Model [PVLDB'14]» Orders-of-magnitude performance improvement, e.g., one hour → 10 seconds
33 Block-Centric Computation: Motivation» Graph characteristics adverse to Pregel: large graph diameter, skewed vertex degree distribution, high average vertex degree
Datasets (columns: Data | Type | |V|, |E|, AVG Deg, Max Deg)
WebUK | directed | 133,633,040 5,507,679, ,429
LiveJournal | directed | 10,690, ,614, ,053,676
Twitter | directed | 52,579,682 1,963,263, ,958
BTC | undirected | 164,732, ,822, ,637,619
34 Block-Centric Computation: Idea of Block-Centric Computation» A block refers to a connected subgraph of the graph» Message exchanges occur only among blocks» A serial in-memory algorithm is run within a block
35 Block-Centric Computation: Benefits of Block-Centric Computation» High-degree vertices inside a block send no messages» Far fewer supersteps» Far fewer blocks than vertices
36 Block-Centric Computation: Example: Hash-Min» Condense each block into a supervertex to get the block-level graph, i.e., construct an adjacency list for each block» Run Hash-Min over the block-level graph, propagating the minimum block ID instead of the minimum vertex ID
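The two steps of block-level Hash-Min can be sketched as follows. This is a plain-Python illustration, not Blogel code: `block_level_graph`, `min_block_ids`, and the `block_of` mapping from vertex to block ID are hypothetical names, and the propagation loop is a simple synchronous fixpoint rather than a distributed run.

```python
def block_level_graph(adj, block_of):
    """Condense each block into a supervertex.

    adj:      {vertex: [neighbors]} (undirected)
    block_of: {vertex: block ID}
    Returns {block ID: set of adjacent block IDs}.
    """
    badj = {b: set() for b in set(block_of.values())}
    for u, nbrs in adj.items():
        for v in nbrs:
            bu, bv = block_of[u], block_of[v]
            if bu != bv:           # edges inside a block vanish
                badj[bu].add(bv)
                badj[bv].add(bu)
    return badj

def min_block_ids(badj):
    """Propagate the minimum block ID over the block-level graph."""
    label = {b: b for b in badj}
    changed = True
    while changed:
        changed = False
        snapshot = dict(label)     # each pass reads the previous pass only
        for b, nbrs in badj.items():
            smallest = min([snapshot[b]] + [snapshot[n] for n in nbrs])
            if smallest < label[b]:
                label[b] = smallest
                changed = True
    return label
```

The component label of a vertex v is then `label[block_of[v]]`. The number of passes depends on the diameter of the block-level graph, which is what makes this cheaper than vertex-level propagation on large-diameter inputs.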
37 Block-Centric Computation: Effectiveness (columns: Graph | Model | Computing Time | Total Msg # | Superstep #)
BTC | V-Centric | … s | 1,188,832,… | …
BTC | B-Centric | 0.94 s | 1,747,653 | 6
Friendster | V-Centric | … s | 7,226,963,… | …
Friendster | B-Centric | 2.52 s | 19,410,865 | 5
USA Road | V-Centric | … s | 8,353,044,435 | 6,262
USA Road | B-Centric | 1.94 s | 270,… | …
38 Block-Centric Computation: Example: Single-Source Shortest Paths» Source s ∈ V» Each edge has a length» Goal: to compute the distance from s to each v ∈ V
39 Block-Centric Computation: Example: Single-Source Shortest Paths» Vertices receive messages from remote neighbors to update their distances» A block runs Dijkstra's algorithm from the updated vertices» Remote neighbors are sent messages, rather than enqueued
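The block-centric shortest-path steps above can be sketched in one process as follows. This is an illustrative simulation under assumptions, not Blogel's implementation: the function names, the shared `dist` dictionary, and the list of `(vertex, distance)` messages are invented for the example, and edge lengths are assumed non-negative.

```python
import heapq

def block_dijkstra(block, adj, dist, updated):
    """Run Dijkstra inside one block, starting from its updated vertices.

    adj: {u: [(v, length), ...]}; dist is updated in place for vertices
    of this block only. Returns messages for remote neighbors.
    """
    heap = [(dist[v], v) for v in updated]
    heapq.heapify(heap)
    messages = []
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                       # stale queue entry
        for v, length in adj[u]:
            nd = d + length
            if v in block:
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
            else:
                messages.append((v, nd))   # sent, rather than enqueued
    return messages

def block_sssp(adj, blocks, source):
    """Simulate supersteps over a list of blocks (sets of vertices)."""
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    owner = {v: i for i, b in enumerate(blocks) for v in b}
    updated = {owner[source]: {source}}
    supersteps = 0
    while updated:
        supersteps += 1
        inbox = []
        for i, verts in updated.items():
            inbox += block_dijkstra(blocks[i], adj, dist, verts)
        updated = {}
        for v, d in inbox:                 # vertices update from messages
            if d < dist[v]:
                dist[v] = d
                updated.setdefault(owner[v], set()).add(v)
    return dist, supersteps
```

Only blocks that received an improving message run in the next superstep, which mirrors how vertices are reactivated by messages in the vertex-centric model.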
40 Block-Centric Computation: Effectiveness (columns: Graph | Model | Time | Step #)
Euro Road | V-Centric | … s | 6210
Euro Road | B-Centric | … s | 60
USA Road | V-Centric | … s | …
USA Road | B-Centric | … s | 58
41 Block-Centric Computation: Graph Partitioning» Graph Voronoi Diagram (GVD) partitioning [figure: three seeds; v is 2 hops from the red seed, 3 hops from the green seed, and 5 hops from the blue seed]
42-44 Block-Centric Computation: GVD Partitioning» Sample seed vertices with probability p» Compute GVD grouping: vertex-centric multi-source BFS
45-48 Block-Centric Computation: Vertex-Centric Multi-Source BFS [figures: state after seed sampling, then Supersteps 1, 2, and 3]
49 Block-Centric Computation: GVD Partitioning» Sample seed vertices with probability p» Compute GVD grouping» Repeat GVD computation: erase colors of large blocks, increase p and resample seeds, compute GVD over unassigned vertices
50 Block-Centric Computation: GVD Partitioning» Sample seed vertices with probability p» Compute GVD grouping» Repeat GVD computation» Run Hash-Min over unassigned vertices. Why is this step necessary? Consider a graph with many small components
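The sampling and regrouping loop can be sketched sequentially as below. Several details are assumptions made for illustration and are not stated on the slides: the names, the `max_block_size` cutoff for what counts as a large block, doubling p on each round, and breaking ties by whichever seed reaches a vertex first. The final Hash-Min pass over leftover vertices is not included.

```python
import random
from collections import Counter

def multi_source_bfs(adj, seeds, unassigned):
    """Give each reachable unassigned vertex the seed that reaches it first."""
    block = {s: s for s in seeds}
    frontier = list(seeds)
    while frontier:                        # one BFS level per superstep
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v in unassigned and v not in block:
                    block[v] = block[u]
                    next_frontier.append(v)
        frontier = next_frontier
    return block

def gvd_partition(adj, p, max_block_size, max_rounds, rng):
    """Repeated GVD computation. Returns (assigned, unassigned)."""
    assigned = {}
    unassigned = set(adj)
    for _ in range(max_rounds):
        if not unassigned:
            break
        seeds = [v for v in sorted(unassigned) if rng.random() < p]
        block = multi_source_bfs(adj, seeds, unassigned)
        sizes = Counter(block.values())
        for v, seed in block.items():
            if sizes[seed] <= max_block_size:   # colors of large blocks are erased
                assigned[v] = seed
        unassigned -= set(assigned)
        p = min(1.0, 2 * p)                # assumed growth rule for p
    return assigned, unassigned
```

Vertices in components that never receive a seed stay in `unassigned`, which is the case the last bullet covers with Hash-Min: a graph with many small components can leave many of them unsampled.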
51 Block-Centric Computation: GVD Partitioning Performance [figure: Loading, Partitioning, and Dumping for WebUK, Friendster, BTC, LiveJournal, USA Road, and Euro Road]
53 Message Reduction: Message Reduction in Pregel+ [WWW'15]» Two techniques to reduce the number of messages transmitted: Vertex Mirroring and the Request-Respond Paradigm
54 Message Reduction: Vertex Mirroring» Motivation: high-degree vertices send a lot of messages, and a vertex sends the same message to all its neighbors, e.g., Hash-Min: min(v); PageRank: PageRank(v) / out-degree(v)
55-56 Message Reduction: Vertex Mirroring [figures: vertices v1, v2, ..., vj; u1, u2, ..., ui; and w1, w2, ..., wk spread over machines M2, M1, and M3; the second figure shows ui together with its mirror copies]
57 Message Reduction: Vertex Mirroring vs. Message Combining» Create a mirror for u4? Consider messages to v2 [figure: u1, u2, u3 (each with neighbors v1, v2) and u4 (neighbors v1, v2, v3, v4) on machine M1; v1, ..., v4 on machine M2]
58 Message Reduction: Vertex Mirroring vs. Message Combining» Create a mirror for u4? Message combining without mirroring u4 [figure: a(u1) + a(u2) + a(u3) + a(u4) is combined on M1 and sent to v2 on M2]
59 Message Reduction: Vertex Mirroring vs. Message Combining» Create a mirror for u4? Message combining with u4 mirrored [figure: a(u1) + a(u2) + a(u3) is combined for v2, while a(u4) is sent separately to the mirror of u4 on M2]
60 Message Reduction: Vertex Mirroring vs. Message Combining» Only mirror high-degree vertices» Choice of degree threshold τ: with M machines, n vertices, and m edges, the average degree is deg_avg = m / n, and the optimal τ is M · exp{deg_avg / M}
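The threshold formula is easy to evaluate directly. The snippet below is a small worked example; the function name and the input numbers (10 machines, 1,000 vertices, 20,000 edges) are hypothetical and are not taken from the datasets in the talk.

```python
import math

def mirroring_threshold(num_machines, num_vertices, num_edges):
    """Degree threshold tau = M * exp(deg_avg / M), with deg_avg = m / n."""
    deg_avg = num_edges / num_vertices
    return num_machines * math.exp(deg_avg / num_machines)

# Hypothetical inputs: deg_avg = 20000 / 1000 = 20, so tau = 10 * e^2.
tau = mirroring_threshold(num_machines=10, num_vertices=1000, num_edges=20000)
print(round(tau, 2))
```

Under this rule only vertices whose degree exceeds tau would be mirrored; lower-degree vertices keep relying on message combining alone.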
61 Message Reduction: Effectiveness of Message Reduction [figure: number of messages sent by each worker in Pregel+; blue bars without mirroring, red bars with mirroring]
62 Message Reduction: Request-Respond Paradigm» Motivation: as a pointer-jumping algorithm goes on, there are fewer and fewer delegates communicating with more and more vertices, e.g., in the PPA for computing connected components, small trees are merged into large trees, and a vertex is the delegate of its children
63 Message Reduction: Request-Respond Paradigm» Request-Respond API: retains all basic Pregel operations; a vertex v can request attribute a(u) in superstep i, and a(u) will be available in superstep (i + 1); here, u can be a delegate, and a(u) may be requested by many vertices v
64 Message Reduction: Request-Respond Paradigm» Benefits: without Request-Respond [figure: v1, v2, v3, v4 each send a request <v1>, <v2>, <v3>, <v4> to u, which holds a(u) on machine M2]
65 Message Reduction: Request-Respond Paradigm» Benefits: without Request-Respond [figure: u on machine M2 sends a(u) back to each of v1, v2, v3, v4]
66 Message Reduction: Request-Respond Paradigm» Benefits: using Request-Respond [figure: machine M1, holding v1, v2, v3, v4, sends a single request for u to machine M2 and receives a[u] once]
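The saving shown in these figures can be illustrated with a simple message count. This is a counting sketch under assumptions, not the Pregel+ API: the function name is hypothetical, every request is assumed to get exactly one response, and requests are merged per (requesting machine, target vertex) pair.

```python
def count_messages(requests, machine_of):
    """Count messages with and without request merging.

    requests:   list of (requester v, target u) pairs in one superstep
    machine_of: {vertex: machine ID}, used for the requesters
    Returns (without_rr, with_rr).
    """
    # Without Request-Respond: one request and one response per pair.
    without_rr = 2 * len(requests)
    # With Request-Respond: each machine asks for a given u only once
    # and receives a(u) only once.
    merged = {(machine_of[v], u) for v, u in requests}
    with_rr = 2 * len(merged)
    return without_rr, with_rr
```

For the situation in the figures, four vertices on M1 requesting the same u on M2, this gives 8 messages without merging and 2 with it.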
67 Message Reduction: Effectiveness of Request-Respond Paradigm [figure: number of messages sent by each worker using Pregel+; blue bars without req-resp, red bars with req-resp]
69 Other Improvements» Fault Tolerance: checkpointing time, 60 seconds → 2 seconds» Querying Workload: over 100 seconds per query → 3 queries per second» Out-of-core Execution: performance comparable to the fastest in-memory Pregel-like system» Survey on Big Graph Systems
70 Open-Source Systems» High ranking in Google, well indexed» Used by industrial partners» An ITF project funded with HK$ 1.4M
71 Open-Source Systems» Many times faster than CMU's GraphLab (GraphLab is sold for US$ 6.7M)» 10x faster than Giraph used by Facebook (Facebook researchers closely follow our work)» Taobao replaces Spark with our system (faster with 4 machines than Spark with 100 machines)
72 Future Directions: Beyond Pregel» Graph problems not suitable for Pregel: output size beyond linear; non-iterative» Examples: graph matching, motif mining, frequent subgraph mining
73 Future Directions: Other Big Data Systems» Urban Computing: taxi trajectories; Octopus card records (bus, MTR, ferry, ...)» Machine Learning: improving recommendation by Semantic Web; systems for deep learning
74 Thanks» YAN, Da» Contact Info» Webpage:
More informationParallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri
Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph
More information8.1 Min Degree Spanning Tree
CS880: Approximations Algorithms Scribe: Siddharth Barman Lecturer: Shuchi Chawla Topic: Min Degree Spanning Tree Date: 02/15/07 In this lecture we give a local search based algorithm for the Min Degree
More informationA Lightweight Infrastructure for Graph Analytics
A Lightweight Infrastructure for Graph Analytics Donald Nguyen, Andrew Lenharth and Keshav Pingali The University of Texas at Austin, Texas, USA {ddn@cs, lenharth@ices, pingali@cs}.utexas.edu Abstract
More informationDiversity Coloring for Distributed Data Storage in Networks 1
Diversity Coloring for Distributed Data Storage in Networks 1 Anxiao (Andrew) Jiang and Jehoshua Bruck California Institute of Technology Pasadena, CA 9115, U.S.A. {jax, bruck}@paradise.caltech.edu Abstract
More informationUSING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu
More informationarxiv:1405.1499v3 [cs.db] 30 Sep 2015
Noname manuscript No. (will be inserted by the editor) NScale: Neighborhood-centric Large-Scale Graph Analytics in the Cloud Abdul Quamar Amol Deshpande Jimmy Lin arxiv:145.1499v3 [cs.db] 3 Sep 215 the
More informationSeminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina stefano.martina@stud.unifi.it
Seminar Path planning using Voronoi diagrams and B-Splines Stefano Martina stefano.martina@stud.unifi.it 23 may 2016 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International
More informationSignificantly Speed up real world big data Applications using Apache Spark
Significantly Speed up real world big data Applications using Apache Spark Mingfei Shi(mingfei.shi@intel.com) Grace Huang ( jie.huang@intel.com) Intel/SSG/Big Data Technology 1 Agenda Who are we? Case
More informationImplementing Graph Pattern Mining for Big Data in the Cloud
Implementing Graph Pattern Mining for Big Data in the Cloud Chandana Ojah M.Tech in Computer Science & Engineering Department of Computer Science & Engineering, PES College of Engineering, Mandya Ojah.chandana@gmail.com
More informationThe Power of Relationships
The Power of Relationships Opportunities and Challenges in Big Data Intel Labs Cluster Computing Architecture Legal Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO
More informationGraph Analytics in Big Data. John Feo Pacific Northwest National Laboratory
Graph Analytics in Big Data John Feo Pacific Northwest National Laboratory 1 A changing World The breadth of problems requiring graph analytics is growing rapidly Large Network Systems Social Networks
More informationBig Data looks Tiny from the Stratosphere
Volker Markl http://www.user.tu-berlin.de/marklv volker.markl@tu-berlin.de Big Data looks Tiny from the Stratosphere Data and analyses are becoming increasingly complex! Size Freshness Format/Media Type
More informationCMPSCI611: Approximating MAX-CUT Lecture 20
CMPSCI611: Approximating MAX-CUT Lecture 20 For the next two lectures we ll be seeing examples of approximation algorithms for interesting NP-hard problems. Today we consider MAX-CUT, which we proved to
More informationPersistent Data Structures and Planar Point Location
Persistent Data Structures and Planar Point Location Inge Li Gørtz Persistent Data Structures Ephemeral Partial persistence Full persistence Confluent persistence V1 V1 V1 V1 V2 q ue V2 V2 V5 V2 V4 V4
More informationOutline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits
Outline NP-completeness Examples of Easy vs. Hard problems Euler circuit vs. Hamiltonian circuit Shortest Path vs. Longest Path 2-pairs sum vs. general Subset Sum Reducing one problem to another Clique
More informationB490 Mining the Big Data. 2 Clustering
B490 Mining the Big Data 2 Clustering Qin Zhang 1-1 Motivations Group together similar documents/webpages/images/people/proteins/products One of the most important problems in machine learning, pattern
More informationAnalyzing the Facebook graph?
Logistics Big Data Algorithmic Introduction Prof. Yuval Shavitt Contact: shavitt@eng.tau.ac.il Final grade: 4 6 home assignments (will try to include programing assignments as well): 2% Exam 8% Big Data
More informationAnalysis of MapReduce Algorithms
Analysis of MapReduce Algorithms Harini Padmanaban Computer Science Department San Jose State University San Jose, CA 95192 408-924-1000 harini.gomadam@gmail.com ABSTRACT MapReduce is a programming model
More informationTrinity: A Distributed Graph Engine on a Memory Cloud
Trinity: A Distributed Graph Engine on a Memory Cloud Bin Shao Microsoft Research Asia Beijing, China binshao@microsoft.com Haixun Wang Microsoft Research Asia Beijing, China haixunw@microsoft.com Yatao
More informationA SURVEY OF PERSISTENT GRAPH DATABASES
A SURVEY OF PERSISTENT GRAPH DATABASES A thesis submitted to Kent State University in partial fulfillment of the requirements for the degree of Master of Science by Yufan Liu March 2014 Thesis written
More informationData Structure [Question Bank]
Unit I (Analysis of Algorithms) 1. What are algorithms and how they are useful? 2. Describe the factor on best algorithms depends on? 3. Differentiate: Correct & Incorrect Algorithms? 4. Write short note:
More informationGraph Theory Algorithms for Mobile Ad Hoc Networks
Informatica 36 (2012) 185-200 185 Graph Theory Algorithms for Mobile Ad Hoc Networks Natarajan Meghanathan Department of Computer Science, Jackson State University Jackson, MS 39217, USA E-mail: natarajan.meghanathan@jsums.edu
More informationONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015
ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE STEINER FOREST PROBLEM An initially given graph G. s 1 s 2 A sequence of demands (s i, t i ) arriving
More informationimgraph: A distributed in-memory graph database
imgraph: A distributed in-memory graph database Salim Jouili Eura Nova R&D 435 Mont-Saint-Guibert, Belgium Email: salim.jouili@euranova.eu Aldemar Reynaga Université Catholique de Louvain 348 Louvain-La-Neuve,
More informationIntroduction to Graph Mining
Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain
More informationARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications
More information