Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014


 Justin Lyons
 2 years ago
 Views:
Transcription
1 Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate  R6 Technical Report February 3, 2014
2 300 years before Watson there was Euler! The first (Jeopardy!) graph question? A path crossing each of the Seven Bridges of Königsberg exactly once is not possible because of this type of graph.
3 300 years before Watson there was Euler! The first (Jeopardy!) graph question? A path crossing each of the Seven Bridges of Königsberg exactly once is not possible because of this type of graph. Answer: What is a graph with more than two, odd degree vertices?
4 Asking a question is a search for an answer... Can we find X in Y? This is a graph search problem! What is a graph? A network of pairwise relationships... vertices connected by edges Brain network of C. elegans Watts, Strogatz, Nature 393(6684), 1998
5 Search the Web Graph! Google it! In 1998 Google published the PageRank algorithm... Treat the Web as a Big Graph Web Pages are vertices and hyperlinks are edges... Initialize all pages with a starting rank Imitate web surfer randomly clicking on hyperlinks Random Walk (Markov Chain) on a graph! Rank pages by quantity and quality of links Important/Relevant websites rank higher and websites referenced by important websites rank higher
6 Search the Social Graph! Facebook Graph Search index a social graph of 1 trillion edges complex queries based on the social connections between users in Facebook What questions can it answer? The Facebook Graph Search can answer questions like: which restaurants did my friends like? did people like my comments about the latest movie?
7 Search the Knowledge Graph! Semantic Graphs The meaning of a graph is encoded in the graph nodes are linked by semantics, e.g. dog is a mammal semantics are machinereadable... RDF, OWL, Open Graph Google Knowledge Graph Added to Google search engine in 2012 Microsoft Satori Announced in 2013 for the Microsoft Bing! search engine
8 Graphs are great, so what s the problem? Locality of Reference Conventional implementations expect O(1) random access, but the realworld is different... Random Access Memory (RAM) Typically takes 100 nanoseconds (ns) to load a memory reference.
9 Back to Basics... BreadthFirst Search Search can be a walk on a graph... Traversal by breadthfirst expansion can answer the question: Starting from A can we find K? A B C D E F G H I J K L M N O
10 Back to Basics... BreadthFirst Search Search can be a walk on a graph... Traversal by breadthfirst expansion can answer the question: Starting from A can we find K? A B C D E F G H I J K L M N O
11 Back to Basics... BreadthFirst Search Search can be a walk on a graph... Traversal by breadthfirst expansion can answer the question: Starting from A can we find K? A B C D E F G H I J K L M N O
12 Back to Basics... BreadthFirst Search Search can be a walk on a graph... Traversal by breadthfirst expansion can answer the question: Starting from A can we find K? A B C D E F G H I J K L M N O
13 Back to Basics... BreadthFirst Search Search can be a walk on a graph... Traversal by breadthfirst expansion can answer the question: Starting from A can we find K? A B C D E F G H I J K L M N O
14 Realworld costs for simple BFS Preliminaries (U) For a simple, undirected graph G = (V, E) with n = V vertices and m = E edges where... N(v) = {u V (v, u) E} is the neighborhood of v, d v = N(v) is the degree of v A is the adjacency matrix of G where A T = A Cost of BreadthFirst Search { Θ(2n) Θ() Θ(n + 2m) algorithm storage memory references
15 Why locality matters! Example The cost of BFS on the 2002 Yahoo! Web Graph (n = , m = ): Θ() { Θ(2n) 8 = Θ(n + 2m) = bytes of storage memory references If each memory reference took 100 ns the minimum time to complete BFS would be more than 24 minutes!
16 What s next... Bigger? Big Data begets Big Graphs Big Data challenges conventional algorithms... can t store it all in memory but... disks are 1000x slower need a scalable BreadthFirst Search... An NSA Big Graph experiment Tech Report NSARD v1 1 Petabyte RMAT graph 19.5x more than cluster memory linear performance from 1 trillion to 70 trillion edges... Problem Class Problem Class Memory = 57.6 TB Traversed Edges (trillion) Time (h) Terabytes
17 ... Harder graph questions? Beyond BreadthFirst Search Hard questions aren t always linear time... Example How similar is each vertex to all other vertices in the graph? Realworld use case Find all duplicate or nearduplicate web pages...
18 Similarity on Graphs Vertex Similarity Similarity between a pair of vertices can be defined by the overlap of common neighbors; i.e. structural similarity depends only on adjacency information does not require transitivity, i.e. triangles
19 Jaccard Similarity Jaccard Coefficient Ratio of intersection to union cardinalities of two sets. Properties J ij = N(i) N(j) N(i) N(j) range in [0, 1]; 0 disjoint sets and 1 identical sets nonzero if (i, j) are endpoints of paths of length two, i.e. 2paths Jaccard Distance, 1 J ij, satisfies the triangle inequality
20 Computing exact, allpairs Jaccard Similarity is hard! Cubic upper bound! O(n 3 ) worst case complexity! Count (i, j) pairs where i, j N(v) Let γ ij denote count of (i, j) 2paths and δ ij = d i + d j then, γ ij J ij = δ ij γ ij Runtime complexity Neighbor pairing 1: for all v V do 2: for all {i, j} N(v) do 3: γ ij γ ij + 1 4: end for 5: end for ( ( ) dv ) ( O O 2 v v d 2 v ) ) O (d max d v O(md max ) O(mn) O(n 3 ) v
21 MapReduce Jaccard Similarity (memorybound) Round 1 Map: Identity u, (v, dv ) u, (v, d v ) Reduce: Create ( d v 2 ) ordered pairs of neighbors as compound keys u, {(v, dv ) v N(u)} (v w, w), d v + d w v, w N(u) Round 2 Map: Identity Reduce function: Calculate the Jaccard Coefficient (i, j), {δij, δ ij,...} (i, j), J ij = γ ij /(δ ij γ ij ) δ ij = d i + d j γ ij = {δ ij, δ ij,...}
22 Must loadbalance v ( d(v) ) 2 pair construction Parallel, pairwise combinations construct neighbor pairs for each adjacency set; for each v i N(u) { (u, } (u 1, v 1 ) (u 2, v 2 ) (u 3, v 3 ) (u 4, v 4 ) j), vi (u 1, v 2 ) (u 2, v 3 ) (u 3, v 4 ) j=1..i (u 1, v 3 ) (u 2, v 4 ) (u 1, v 4 ) pair first element in each column with the remaining elements to generate all ( ) d(v) 2 pairs Benefit Enables distributed pair construction in O(1) memory
23 Jaccard Similarity in O(1) rounds and memory Round 1 Map: Identity Reduce: Label edges with ordinal Du, (v i, d vi ) v i N(u) E n (u, i), (vi, d vi ) o i=1..d(u) Round 2 Map: Prepare edges for pairing (u, i), (v, dv ) n (u, j), (v, dv ) o j=1..i Reduce: Create ordered neighbor pairs as compound keys Du, (v, d v ) v N(u) E (v w, w), d v + d w v, w N(u) Round 3 Map: Identity Reduce: Calculate and output J ij (i, j), {δij, δ ij,...} (i, j), J ij = γ ij /(δ ij γ ij ) δ ij = d i + d j γ ij = {δ ij, δ ij,...}
24 Exact, AllPairs Jaccard Similarity benchmarks Experiments Verify scalability on synthetic datasets Graph500 RMAT graphs (A=.57,B=C=.19,D=0.05) n = 2 SCALE, m = 16n Cluster 1000 nodes 12 cores per node 64 GB RAM per node Constantmemory MapReduce Job parameters prestep round to annotate undirected edges with degree, e.g. u, (v, d v ) edge preparation is blockdistributed total of 5 rounds; one additional round inserted to randomize output from round 1
25 AllPairs Jaccard Similarity on Graph500 datasets Graph500 RMAT Graphs (undirected) n (10 6 ) m (10 6 ) 2paths (10 9 ) J ij (10 9 ) RMAT RMAT RMAT RMAT RMAT RMAT Time (minutes) Fast Total Edges Time (minutes) RMAT RMAT RMAT RMAT RMAT RMAT
26 Results Exact, allpairs Jaccard Similarity in O(1) memory and rounds neighbor pairing similar to Node Iterator for triangle listing loadbalance O( v d(v)2 ) pair generation scales well with increasing 2path count MapReduce AllPairs Jaccard Similarity performance 9 billion Jaccard coefficients per minute! (RMAT scale trillion J ij )
27 What s the next Big question? Beyond...?
An NSA Big Graph experiment. Paul Burkhardt, Chris Waring. May 20, 2013
U.S. National Security Agency Research Directorate  R6 Technical Report NSARD2013056002v1 May 20, 2013 Graphs are everywhere! A graph is a collection of binary relationships, i.e. networks of pairwise
More informationBig Graph Processing: Some Background
Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI580, Bo Wu Graphs
More informationSocial Media Mining. Graph Essentials
Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures
More informationPractical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
More informationDistance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
More informationA Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader
A Performance Evaluation of Open Source Graph Databases Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader Overview Motivation Options Evaluation Results Lessons Learned Moving Forward
More informationGraph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
More informationSCAN: A Structural Clustering Algorithm for Networks
SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationMapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12
MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:
More informationGraph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, steen@cs.vu.nl Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationSocial Media Mining. Network Measures
Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the likeminded users
More informationMining Social Network Graphs
Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand
More informationHadoop SNS. renren.com. Saturday, December 3, 11
Hadoop SNS renren.com Saturday, December 3, 11 2.2 190 40 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December
More informationGraph Algorithms using MapReduce
Graph Algorithms using MapReduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web 1/7 Graph Algorithms using MapReduce Graphs are ubiquitous in modern society.
More informationGraphalytics: A Big Data Benchmark for GraphProcessing Platforms
Graphalytics: A Big Data Benchmark for GraphProcessing Platforms Mihai Capotă, Tim Hegeman, Alexandru Iosup, Arnau PratPérez, Orri Erling, Peter Boncz Delft University of Technology Universitat Politècnica
More informationGraph Algorithms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar
Graph Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 3. Topic Overview Definitions and Representation Minimum
More informationStorage and Retrieval of Large RDF Graph Using Hadoop and MapReduce
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, and Bhavani Thuraisingham University of Texas at Dallas, Dallas TX 75080, USA Abstract.
More informationIntroduction to Flocking {Stochastic Matrices}
Supelec EECI Graduate School in Control Introduction to Flocking {Stochastic Matrices} A. S. Morse Yale University Gif sur  Yvette May 21, 2012 CRAIG REYNOLDS  1987 BOIDS The Lion King CRAIG REYNOLDS
More informationGraph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R.0, steen@cs.vu.nl Chapter 0: Version: April 8, 0 / Contents Chapter Description 0: Introduction
More informationMachine Learning over Big Data
Machine Learning over Big Presented by Fuhao Zou fuhao@hust.edu.cn Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed
More informationAnalysis of Algorithms, I
Analysis of Algorithms, I CSOR W4231.002 Eleni Drinea Computer Science Department Columbia University Thursday, February 26, 2015 Outline 1 Recap 2 Representing graphs 3 Breadthfirst search (BFS) 4 Applications
More informationDiscrete Mathematics & Mathematical Reasoning Chapter 10: Graphs
Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs Kousha Etessami U. of Edinburgh, UK Kousha Etessami (U. of Edinburgh, UK) Discrete Mathematics (Chapter 6) 1 / 13 Overview Graphs and Graph
More informationSIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs
SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large
More informationKEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS
ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,
More informationDepartment of Cognitive Sciences University of California, Irvine 1
Mark Steyvers Department of Cognitive Sciences University of California, Irvine 1 Network structure of word associations Decentralized search in information networks Analogy between Google and word retrieval
More informationSociology and CS. Small World. Sociology Problems. Degree of Separation. Milgram s Experiment. How close are people connected? (Problem Understanding)
Sociology Problems Sociology and CS Problem 1 How close are people connected? Small World Philip Chan Problem 2 Connector How close are people connected? (Problem Understanding) Small World Are people
More informationGeneral Network Analysis: Graphtheoretic. COMP572 Fall 2009
General Network Analysis: Graphtheoretic Techniques COMP572 Fall 2009 Networks (aka Graphs) A network is a set of vertices, or nodes, and edges that connect pairs of vertices Example: a network with 5
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany MapReduce II MapReduce II 1 / 33 Outline 1. Introduction
More informationNetwork (Tree) Topology Inference Based on Prüfer Sequence
Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,
More information2.3 Scheduling jobs on identical parallel machines
2.3 Scheduling jobs on identical parallel machines There are jobs to be processed, and there are identical machines (running in parallel) to which each job may be assigned Each job = 1,,, must be processed
More informationIE 680 Special Topics in Production Systems: Networks, Routing and Logistics*
IE 680 Special Topics in Production Systems: Networks, Routing and Logistics* Rakesh Nagi Department of Industrial Engineering University at Buffalo (SUNY) *Lecture notes from Network Flows by Ahuja, Magnanti
More informationFour Orders of Magnitude: Running Large Scale Accumulo Clusters. Aaron Cordova Accumulo Summit, June 2014
Four Orders of Magnitude: Running Large Scale Accumulo Clusters Aaron Cordova Accumulo Summit, June 2014 Scale, Security, Schema Scale to scale 1  (vt) to change the size of something let s scale the
More informationLecture 9. 1 Introduction. 2 Random Walks in Graphs. 1.1 How To Explore a Graph? CS621 Theory Gems October 17, 2012
CS62 Theory Gems October 7, 202 Lecture 9 Lecturer: Aleksander Mądry Scribes: Dorina Thanou, Xiaowen Dong Introduction Over the next couple of lectures, our focus will be on graphs. Graphs are one of
More informationSoftware tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia Antipolis SCALE (exoasis) Team
Software tools for Complex Networks Analysis Fabrice Huet, University of Nice Sophia Antipolis SCALE (exoasis) Team MOTIVATION Why do we need tools? Source : nature.com Visualization Properties extraction
More informationhttp://www.wordle.net/
Hadoop & MapReduce http://www.wordle.net/ http://www.wordle.net/ Hadoop is an opensource software framework (or platform) for Reliable + Scalable + Distributed Storage/Computational unit Failures completely
More informationApproximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NPCompleteness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
More informationFast Iterative Graph Computation with Resource Aware Graph Parallel Abstraction
Human connectome. Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011 2 NA = 6.022 1023 mol 1 Paul Burkhardt, Chris Waring An NSA Big Graph experiment Fast Iterative Graph Computation with Resource
More informationCMSC 451: Graph Properties, DFS, BFS, etc.
CMSC 451: Graph Properties, DFS, BFS, etc. Slides By: Carl Kingsford Department of Computer Science University of Maryland, College Park Based on Chapter 3 of Algorithm Design by Kleinberg & Tardos. Graphs
More informationIntroduction to Graph Mining
Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain
More informationInfiniteGraph: The Distributed Graph Database
A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086
More informationComplex Networks Analysis: Clustering Methods
Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich nefedov@isi.ee.ethz.ch 1 Outline Purpose to give an overview of modern graphclustering methods and their applications
More informationCSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.
Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure
More informationSystems and Algorithms for Big Data Analytics
Systems and Algorithms for Big Data Analytics YAN, Da Email: yanda@cse.cuhk.edu.hk My Research Graph Data Distributed Graph Processing Spatial Data Spatial Query Processing Uncertain Data Querying & Mining
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 33 Outline
More informationSearch engines: ranking algorithms
Search engines: ranking algorithms Gianna M. Del Corso Dipartimento di Informatica, Università di Pisa, Italy ESP, 25 Marzo 2015 1 Statistics 2 Search Engines Ranking Algorithms HITS Web Analytics Estimated
More information! E6893 Big Data Analytics Lecture 10:! Linked Big Data Graph Computing (II)
E6893 Big Data Analytics Lecture 10: Linked Big Data Graph Computing (II) ChingYung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and
More informationMining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
More informationPart 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection  Social networks 
More informationUSING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE FREE NETWORKS AND SMALLWORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE FREE NETWORKS AND SMALLWORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu
More informationOverview on Graph Datastores and Graph Computing Systems.  Litao Deng (Cloud Computing Group) 06082012
Overview on Graph Datastores and Graph Computing Systems  Litao Deng (Cloud Computing Group) 06082012 Graph  Everywhere 1: Friendship Graph 2: Food Graph 3: Internet Graph Most of the relationships
More informationChapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling
Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NPhard problem. What should I do? A. Theory says you're unlikely to find a polytime algorithm. Must sacrifice one
More informationGraph definition Degree, in, out degree, oriented graph. Complete, regular, bipartite graph. Graph representation, connectivity, adjacency.
Mária Markošová Graph definition Degree, in, out degree, oriented graph. Complete, regular, bipartite graph. Graph representation, connectivity, adjacency. Isomorphism of graphs. Paths, cycles, trials.
More informationSimilarity Search in a Very Large Scale Using Hadoop and HBase
Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE  Universite Paris Dauphine, France Internet Memory Foundation, Paris, France
More informationMinimum Spanning Trees
Minimum Spanning Trees Algorithms and 18.304 Presentation Outline 1 Graph Terminology Minimum Spanning Trees 2 3 Outline Graph Terminology Minimum Spanning Trees 1 Graph Terminology Minimum Spanning Trees
More informationSmall Maximal Independent Sets and Faster Exact Graph Coloring
Small Maximal Independent Sets and Faster Exact Graph Coloring David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science The Exact Graph Coloring Problem: Given an undirected
More informationFast Parallel Conversion of Edge List to Adjacency List for LargeScale Graphs
Fast arallel Conversion of Edge List to Adjacency List for LargeScale Graphs Shaikh Arifuzzaman and Maleq Khan Network Dynamics & Simulation Science Lab, Virginia Bioinformatics Institute Virginia Tech,
More informationMining SocialNetwork Graphs
342 Chapter 10 Mining SocialNetwork Graphs There is much information to be gained by analyzing the largescale data that is derived from social networks. The bestknown example of a social network is
More informationThe origins of graph theory are humble, even frivolous. Biggs, E. K. Lloyd, and R. J. Wilson)
Chapter 11 Graph Theory The origins of graph theory are humble, even frivolous. Biggs, E. K. Lloyd, and R. J. Wilson) (N. Let us start with a formal definition of what is a graph. Definition 72. A graph
More informationRecommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1
Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components
More informationEfficient Identification of Starters and Followers in Social Media
Efficient Identification of Starters and Followers in Social Media Michael Mathioudakis Department of Computer Science University of Toronto mathiou@cs.toronto.edu Nick Koudas Department of Computer Science
More informationGraph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 08: Computer networks Version: March 3, 2011 2 / 53 Contents
More informationPresto/Blockus: Towards Scalable R Data Analysis
/Blockus: Towards Scalable R Data Analysis Andrew A. Chien University of Chicago and Argonne ational Laboratory IRIAUIUCAL Joint Institute Potential Collaboration ovember 19, 2012 ovember 19, 2012 Andrew
More informationMizan: A System for Dynamic Load Balancing in Largescale Graph Processing
/35 Mizan: A System for Dynamic Load Balancing in Largescale Graph Processing Zuhair Khayyat 1 Karim Awara 1 Amani Alonazi 1 Hani Jamjoom 2 Dan Williams 2 Panos Kalnis 1 1 King Abdullah University of
More informationClustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering
More informationA Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
More informationLargeScale Data Processing
LargeScale Data Processing Eiko Yoneki eiko.yoneki@cl.cam.ac.uk http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase
More informationDistributed Computing over Communication Networks: Maximal Independent Set
Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.
More informationWhy graph clustering is useful?
Graph Clustering Why graph clustering is useful? Distance matrices are graphs as useful as any other clustering Identification of communities in social networks Webpage clustering for better data management
More informationHomework 15 Solutions
PROBLEM ONE (Trees) Homework 15 Solutions 1. Recall the definition of a tree: a tree is a connected, undirected graph which has no cycles. Which of the following definitions are equivalent to this definition
More informationEntropy based Graph Clustering: Application to Biological and Social Networks
Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley YoungRae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving
More informationGraph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
More informationEnergy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
More informationA Sublinear Bipartiteness Tester for Bounded Degree Graphs
A Sublinear Bipartiteness Tester for Bounded Degree Graphs Oded Goldreich Dana Ron February 5, 1998 Abstract We present a sublineartime algorithm for testing whether a bounded degree graph is bipartite
More informationIntroduction to Networks and Business Intelligence
Introduction to Networks and Business Intelligence Prof. Dr. Daning Hu Department of Informatics University of Zurich Sep 17th, 2015 Outline Network Science A Random History Network Analysis Network Topological
More information! Solve problem to optimality. ! Solve problem in polytime. ! Solve arbitrary instances of the problem. #approximation algorithm.
Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NPhard problem What should I do? A Theory says you're unlikely to find a polytime algorithm Must sacrifice one of three
More informationNotes on Matrix Multiplication and the Transitive Closure
ICS 6D Due: Wednesday, February 25, 2015 Instructor: Sandy Irani Notes on Matrix Multiplication and the Transitive Closure An n m matrix over a set S is an array of elements from S with n rows and m columns.
More informationB490 Mining the Big Data. 0 Introduction
B490 Mining the Big Data 0 Introduction Qin Zhang 11 Data Mining What is Data Mining? A definition : Discovery of useful, possibly unexpected, patterns in data. 21 Data Mining What is Data Mining? A
More informationV. Adamchik 1. Graph Theory. Victor Adamchik. Fall of 2005
V. Adamchik 1 Graph Theory Victor Adamchik Fall of 2005 Plan 1. Basic Vocabulary 2. Regular graph 3. Connectivity 4. Representing Graphs Introduction A.Aho and J.Ulman acknowledge that Fundamentally, computer
More informationLecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736bewi@tudelft.nl
Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736bewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the
More informationCSE 20: Discrete Mathematics for Computer Science. Prof. Miles Jones. Today s Topics: Graphs. The Internet graph
Today s Topics: CSE 0: Discrete Mathematics for Computer Science Prof. Miles Jones. Graphs. Some theorems on graphs. Eulerian graphs Graphs! Model relations between pairs of objects The Internet graph!
More informationFinding and counting given length cycles
Finding and counting given length cycles Noga Alon Raphael Yuster Uri Zwick Abstract We present an assortment of methods for finding and counting simple cycles of a given length in directed and undirected
More informationIn the following we will only consider undirected networks.
Roles in Networks Roles in Networks Motivation for work: Let topology define network roles. Work by Kleinberg on directed graphs, used topology to define two types of roles: authorities and hubs. (Each
More informationA1 and FARM scalable graph database on top of a transactional memory layer
A1 and FARM scalable graph database on top of a transactional memory layer Miguel Castro, Aleksandar Dragojević, Dushyanth Narayanan, Ed Nightingale, Alex Shamis Richie Khanna, Matt Renzelmann Chiranjeeb
More informationSPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH NATIONAL UNIVERSITY OF SINGAPORE
SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH NATIONAL UNIVERSITY OF SINGAPORE 2012 SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH (M.Sc., SFU, Russia) A THESIS
More informationSECTIONS 1.51.6 NOTES ON GRAPH THEORY NOTATION AND ITS USE IN THE STUDY OF SPARSE SYMMETRIC MATRICES
SECIONS.5.6 NOES ON GRPH HEORY NOION ND IS USE IN HE SUDY OF SPRSE SYMMERIC MRICES graph G ( X, E) consists of a finite set of nodes or vertices X and edges E. EXMPLE : road map of part of British Columbia
More informationHadoop and Mapreduce computing
Hadoop and Mapreduce computing 1 Introduction This activity contains a great deal of background information and detailed instructions so that you can refer to it later for further activities and homework.
More informationAnswers to some of the exercises.
Answers to some of the exercises. Chapter 2. Ex.2.1 (a) There are several ways to do this. Here is one possibility. The idea is to apply the kcenter algorithm first to D and then for each center in D
More informationRanking on Data Manifolds
Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname
More informationHomework MA 725 Spring, 2012 C. Huneke SELECTED ANSWERS
Homework MA 725 Spring, 2012 C. Huneke SELECTED ANSWERS 1.1.25 Prove that the Petersen graph has no cycle of length 7. Solution: There are 10 vertices in the Petersen graph G. Assume there is a cycle C
More informationNetwork File Storage with Graceful Performance Degradation
Network File Storage with Graceful Performance Degradation ANXIAO (ANDREW) JIANG California Institute of Technology and JEHOSHUA BRUCK California Institute of Technology A file storage scheme is proposed
More information1 Basic Definitions and Concepts in Graph Theory
CME 305: Discrete Mathematics and Algorithms 1 Basic Definitions and Concepts in Graph Theory A graph G(V, E) is a set V of vertices and a set E of edges. In an undirected graph, an edge is an unordered
More informationUsing MapReduce for Large Scale Analysis of GraphBased Data
Using MapReduce for Large Scale Analysis of GraphBased Data NAN GONG KTH Information and Communication Technology Master of Science Thesis Stockholm, Sweden 2011 TRITAICTEX2011:218 Using MapReduce
More informationFrans J.C.T. de Ruiter, Norman L. Biggs Applications of integer programming methods to cages
Frans J.C.T. de Ruiter, Norman L. Biggs Applications of integer programming methods to cages Article (Published version) (Refereed) Original citation: de Ruiter, Frans and Biggs, Norman (2015) Applications
More informationBig Data Technology MapReduce Motivation: Indexing in Search Engines
Big Data Technology MapReduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process
More informationStrong and Weak Ties
Strong and Weak Ties Web Science (VU) (707.000) Elisabeth Lex KTI, TU Graz April 11, 2016 Elisabeth Lex (KTI, TU Graz) Networks April 11, 2016 1 / 66 Outline 1 Repetition 2 Strong and Weak Ties 3 General
More informationINTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GSASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GSASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
More informationOn the independence number of graphs with maximum degree 3
On the independence number of graphs with maximum degree 3 Iyad A. Kanj Fenghui Zhang Abstract Let G be an undirected graph with maximum degree at most 3 such that G does not contain any of the three graphs
More informationExtreme Computing. Big Data. Stratis Viglas. School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk. Stratis Viglas Extreme Computing 1
Extreme Computing Big Data Stratis Viglas School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk Stratis Viglas Extreme Computing 1 Petabyte Age Big Data Challenges Stratis Viglas Extreme Computing
More information