Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014


1 Beyond Watson: Predictive Analytics and Big Data. U.S. National Security Agency, Research Directorate - R6. Technical Report, February 3, 2014
3 300 years before Watson there was Euler! The first (Jeopardy!) graph question? A path crossing each of the Seven Bridges of Königsberg exactly once is not possible because of this type of graph. Answer: What is a graph with more than two odd-degree vertices?
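Euler's degree condition can be checked mechanically. The following is a minimal sketch, assuming the usual multigraph model of Königsberg with four land masses and seven bridges; the labels A to D and the function names are illustrative and not from the slides. The check assumes the graph is connected.

```python
from collections import Counter

# Seven bridges as a multigraph on four land masses (labels are illustrative):
# A is the central island, B and C are the river banks, D is the eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def odd_degree_vertices(edges):
    """Return the sorted list of vertices with odd degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)


def has_euler_path(edges):
    """Degree test for a connected multigraph: 0 or 2 odd-degree vertices."""
    return len(odd_degree_vertices(edges)) in (0, 2)


print(odd_degree_vertices(BRIDGES))  # all four land masses have odd degree
print(has_euler_path(BRIDGES))       # False
```

All four vertices have odd degree (5, 3, 3, 3), which is more than two, so no such path exists.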
4 Asking a question is a search for an answer... Can we find X in Y? This is a graph search problem! What is a graph? A network of pairwise relationships... vertices connected by edges. Figure: brain network of C. elegans (Watts, Strogatz, Nature 393(6684), 1998).
5 Search the Web Graph! Google it! In 1998 Google published the PageRank algorithm... Treat the Web as a Big Graph: Web pages are vertices and hyperlinks are edges... Initialize all pages with a starting rank. Imitate a web surfer randomly clicking on hyperlinks: a Random Walk (Markov Chain) on a graph! Rank pages by quantity and quality of links: important/relevant websites rank higher, and websites referenced by important websites rank higher.
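The random-surfer idea can be sketched as a power iteration. This is an illustrative sketch rather than Google's implementation: the damping factor of 0.85, the uniform handling of pages without outgoing links, the fixed iteration count, and the toy `web` graph are all assumptions not stated on the slide.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration on a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # initialize all pages with a starting rank
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # The surfer follows one of the outgoing hyperlinks at random.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumed convention: from a page with no links, jump anywhere.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is referenced by every other page.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph the page referenced by the most pages ranks highest, and the page nobody links to ranks lowest.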
6 Search the Social Graph! Facebook Graph Search indexes a social graph of 1 trillion edges and supports complex queries based on the social connections between users in Facebook. What questions can it answer? Facebook Graph Search can answer questions like: Which restaurants did my friends like? Did people like my comments about the latest movie?
7 Search the Knowledge Graph! Semantic Graphs: The meaning of a graph is encoded in the graph; nodes are linked by semantics, e.g. dog is a mammal; semantics are machine-readable... RDF, OWL, Open Graph. Google Knowledge Graph: added to the Google search engine in 2012. Microsoft Satori: announced in 2013 for the Microsoft Bing! search engine.
8 Graphs are great, so what's the problem? Locality of Reference: Conventional implementations expect O(1) random access, but the real world is different... Random Access Memory (RAM) typically takes 100 nanoseconds (ns) to load a memory reference.
9 Back to Basics... Breadth-First Search. Search can be a walk on a graph... Traversal by breadth-first expansion can answer the question: Starting from A, can we find K? Figure: graph on the vertices A through O.
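A breadth-first traversal that answers "starting from A, can we find K?" might look like the sketch below. The edges of the slide's figure are not preserved in this transcription, so the graph used here is a hypothetical complete binary tree on the labels A through O; `bfs_find` is an illustrative name.

```python
from collections import deque


def bfs_find(adjacency, start, target):
    """Return the number of hops from start to target, or None if unreachable."""
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if vertex == target:
            return depth
        # Expand the frontier one level at a time.
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return None


# Hypothetical stand-in for the figure: a complete binary tree on A..O.
labels = "ABCDEFGHIJKLMNO"
tree = {c: [] for c in labels}
for i, parent in enumerate(labels):
    for j in (2 * i + 1, 2 * i + 2):
        if j < len(labels):
            tree[parent].append(labels[j])
            tree[labels[j]].append(parent)

print(bfs_find(tree, "A", "K"))  # 3 hops in this tree: A -> B -> E -> K
```

Each vertex is enqueued at most once and each adjacency list is scanned once, which is where the linear cost of BFS comes from.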
14 Real-world costs for simple BFS. Preliminaries: For a simple, undirected graph $G = (V, E)$ with $n = |V|$ vertices and $m = |E|$ edges, where $N(v) = \{u \in V \mid (v, u) \in E\}$ is the neighborhood of $v$, $d_v = |N(v)|$ is the degree of $v$, and $A$ is the adjacency matrix of $G$ with $A^T = A$. Cost of Breadth-First Search: $\Theta(n + m)$ algorithm, $\Theta(2n)$ storage, $\Theta(n + 2m)$ memory references.
15 Why locality matters! Example: The cost of BFS on the 2002 Yahoo! Web Graph (n = , m = ): $\Theta(n + m)$ algorithm, $\Theta(2n) \times 8$ bytes of storage, $\Theta(n + 2m)$ memory references. If each memory reference took 100 ns, the minimum time to complete BFS would be more than 24 minutes!
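The arithmetic behind such an estimate can be sketched as follows. The values of n and m for the Yahoo! graph are not preserved in this transcription, so the call below uses clearly hypothetical toy values; `bfs_cost_estimate` and its parameter names are illustrative.

```python
def bfs_cost_estimate(n, m, bytes_per_entry=8, ns_per_reference=100):
    """Estimate BFS storage and a lower bound on runtime from n and m."""
    storage_bytes = 2 * n * bytes_per_entry   # Theta(2n) entries of 8 bytes each
    memory_references = n + 2 * m             # Theta(n + 2m) references
    min_seconds = memory_references * ns_per_reference * 1e-9
    return storage_bytes, memory_references, min_seconds


# Hypothetical toy graph, NOT the Yahoo! Web Graph.
storage, references, seconds = bfs_cost_estimate(n=1_000_000, m=16_000_000)
print(storage, references, seconds)
```

Read in the other direction, 24 minutes is 1440 seconds, which at 100 ns per reference corresponds to 1.44 x 10^10 memory references; a graph whose n + 2m exceeds that count cannot finish sooner under this model.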
16 What's next... Bigger? Big Data begets Big Graphs. Big Data challenges conventional algorithms... can't store it all in memory, but... disks are 1000x slower; need a scalable Breadth-First Search... An NSA Big Graph experiment (Tech Report NSARD v1): 1 Petabyte RMAT graph; 19.5x more than cluster memory; linear performance from 1 trillion to 70 trillion edges... Figure: time (h) versus traversed edges (trillion) and terabytes, by problem class; cluster memory = 57.6 TB.
17 ... Harder graph questions? Beyond Breadth-First Search: Hard questions aren't always linear time... Example: How similar is each vertex to all other vertices in the graph? Real-world use case: Find all duplicate or near-duplicate web pages...
18 Similarity on Graphs. Vertex Similarity: Similarity between a pair of vertices can be defined by the overlap of common neighbors, i.e. structural similarity. It depends only on adjacency information and does not require transitivity, i.e. triangles.
19 Jaccard Similarity. Jaccard Coefficient: Ratio of intersection to union cardinalities of two sets, $J_{ij} = \frac{|N(i) \cap N(j)|}{|N(i) \cup N(j)|}$. Properties: range in [0, 1], with 0 for disjoint sets and 1 for identical sets; nonzero if (i, j) are endpoints of paths of length two, i.e. 2-paths; the Jaccard Distance, $1 - J_{ij}$, satisfies the triangle inequality.
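For a single pair of vertices the coefficient follows directly from the two neighbor sets. A minimal sketch; the function name and the convention of returning 0.0 when both sets are empty are assumptions made here.

```python
def jaccard(neighbors_i, neighbors_j):
    """Ratio of intersection to union cardinalities of two neighbor sets."""
    a, b = set(neighbors_i), set(neighbors_j)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two isolated vertices
    return len(a & b) / len(union)


print(jaccard({1, 2, 3}, {2, 3, 4}))  # 2 common neighbors out of 4 -> 0.5
print(jaccard({1, 2}, {3, 4}))        # disjoint sets -> 0.0
print(jaccard({1, 2}, {1, 2}))        # identical sets -> 1.0
```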
20 Computing exact, all-pairs Jaccard Similarity is hard! Cubic upper bound! $O(n^3)$ worst-case complexity! Count $(i, j)$ pairs where $i, j \in N(v)$. Let $\gamma_{ij}$ denote the count of $(i, j)$ 2-paths and $\delta_{ij} = d_i + d_j$; then $J_{ij} = \frac{\gamma_{ij}}{\delta_{ij} - \gamma_{ij}}$.
Neighbor pairing:
    for all $v \in V$ do
        for all $\{i, j\} \in N(v)$ do
            $\gamma_{ij} \leftarrow \gamma_{ij} + 1$
        end for
    end for
Runtime complexity: $O\left(\sum_v \binom{d_v}{2}\right) = O\left(\sum_v d_v^2\right) \subseteq O\left(d_{\max} \sum_v d_v\right) = O(m\, d_{\max}) \subseteq O(mn) \subseteq O(n^3)$
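The neighbor-pairing loop translates almost directly into a single-machine sketch. It assumes a simple undirected graph stored as a symmetric dict of neighbor sets with sortable vertex labels; `all_pairs_jaccard` and the toy graph are illustrative, not taken from the report.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs_jaccard(adjacency):
    """Exact Jaccard coefficient for every pair joined by at least one 2-path."""
    gamma = defaultdict(int)  # gamma[(i, j)] = number of 2-paths between i and j
    for v, neighbors in adjacency.items():
        # Every unordered pair of neighbors of v is a 2-path through v.
        for i, j in combinations(sorted(neighbors), 2):
            gamma[(i, j)] += 1
    degree = {v: len(neighbors) for v, neighbors in adjacency.items()}
    # J_ij = gamma_ij / (delta_ij - gamma_ij), with delta_ij = d_i + d_j.
    return {
        (i, j): g / (degree[i] + degree[j] - g)
        for (i, j), g in gamma.items()
    }


# Toy graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
similarity = all_pairs_jaccard(graph)
for pair in sorted(similarity):
    print(pair, similarity[pair])
```

The identity works because $|N(i) \cup N(j)| = d_i + d_j - |N(i) \cap N(j)|$ and the number of common neighbors equals the number of 2-paths between $i$ and $j$.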
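21 MapReduce Jaccard Similarity (memory-bound)
Round 1. Map: Identity, $\langle u, (v, d_v) \rangle \to \langle u, (v, d_v) \rangle$. Reduce: Create $\binom{d_v}{2}$ ordered pairs of neighbors as compound keys, $\langle u, \{(v, d_v) \mid v \in N(u)\} \rangle \to \langle (v, w),\, d_v + d_w \rangle$ for $v, w \in N(u)$.
Round 2. Map: Identity. Reduce: Calculate the Jaccard Coefficient, $\langle (i, j), \{\delta_{ij}, \delta_{ij}, \ldots\} \rangle \to \langle (i, j), J_{ij} \rangle$, where $J_{ij} = \gamma_{ij} / (\delta_{ij} - \gamma_{ij})$, $\delta_{ij} = d_i + d_j$, and $\gamma_{ij} = |\{\delta_{ij}, \delta_{ij}, \ldots\}|$.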
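The two rounds can be imitated in memory to see how the keys and values flow. This is a sketch, not Hadoop code: `shuffle` is a stand-in for the framework's group-by-key step, all function names are hypothetical, and the input is assumed to hold one record $\langle u, (v, d_v) \rangle$ per direction of every undirected edge.

```python
from collections import defaultdict
from itertools import combinations


def shuffle(pairs):
    """Stand-in for the MapReduce shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def round1_reduce(u, neighbors):
    """Emit ((v, w), d_v + d_w) for every ordered pair of neighbors of u."""
    # The whole neighbor list is held in memory, hence "memory-bound".
    for (v, dv), (w, dw) in combinations(sorted(neighbors), 2):
        yield (v, w), dv + dw


def round2_reduce(pair, deltas):
    """gamma is the number of values received; every value equals delta."""
    gamma = len(deltas)
    delta = deltas[0]
    return pair, gamma / (delta - gamma)


def mapreduce_jaccard(records):
    grouped = shuffle(records)  # round 1 map is the identity
    pairs = [kv for u, nbrs in grouped.items() for kv in round1_reduce(u, nbrs)]
    grouped_pairs = shuffle(pairs)  # round 2 map is the identity
    return dict(round2_reduce(p, d) for p, d in grouped_pairs.items())


# Toy graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
adjacency = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
records = [(u, (v, len(adjacency[v]))) for u in adjacency for v in adjacency[u]]
result = mapreduce_jaccard(records)
for pair in sorted(result):
    print(pair, result[pair])
```

The second reducer never needs the graph itself: the count of values it receives is $\gamma_{ij}$, and any one of the values supplies $\delta_{ij}$.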
22 Must load-balance $\sum_v \binom{d(v)}{2}$ pair construction. Parallel, pairwise combinations: construct neighbor pairs for each adjacency set; for each $v_i \in N(u)$, $\langle (u, i), v_i \rangle \to \{ \langle (u, j), v_i \rangle \}_{j=1..i}$
    (u_1, v_1)  (u_2, v_2)  (u_3, v_3)  (u_4, v_4)
    (u_1, v_2)  (u_2, v_3)  (u_3, v_4)
    (u_1, v_3)  (u_2, v_4)
    (u_1, v_4)
Pair the first element in each column with the remaining elements to generate all $\binom{d(v)}{2}$ pairs. Benefit: Enables distributed pair construction in O(1) memory.
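The column scheme can be sketched for one adjacency set as below. The function names are hypothetical, and the sketch assumes each reducer sees its values in ordinal order, which a real MapReduce job would have to arrange (for example with a secondary sort). The in-memory `columns` dict only imitates the shuffle; the reducer itself keeps just the first element while streaming over the rest.

```python
from collections import defaultdict


def label_with_ordinals(u, neighbors):
    """Emit ((u, i), v_i) for i = 1..d(u)."""
    for i, v in enumerate(neighbors, start=1):
        yield (u, i), v


def replicate(key, v):
    """Map step: send v_i to the keys (u, j) for j = 1..i."""
    u, i = key
    for j in range(1, i + 1):
        yield (u, j), v


def pair_column(values):
    """Reduce step for one key (u, j): pair the first element with the rest."""
    stream = iter(values)
    first = next(stream)  # only this element is retained
    for other in stream:
        yield first, other


def neighbor_pairs(u, neighbors):
    columns = defaultdict(list)  # imitates the shuffle, one list per key (u, j)
    for key, v in label_with_ordinals(u, neighbors):
        for new_key, value in replicate(key, v):
            columns[new_key].append(value)
    pairs = []
    for key in sorted(columns):
        pairs.extend(pair_column(columns[key]))
    return pairs


pairs = neighbor_pairs("u", ["v1", "v2", "v3", "v4"])
print(pairs)  # six pairs, one per unordered pair of the four neighbors
```

Column j starts with $v_j$ and continues with $v_{j+1}, \ldots, v_{d(u)}$, so each unordered pair of neighbors is produced exactly once.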
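23 Jaccard Similarity in O(1) rounds and memory
Round 1. Map: Identity. Reduce: Label edges with ordinal, $\langle u, \{(v_i, d_{v_i}) \mid v_i \in N(u)\} \rangle \to \{ \langle (u, i), (v_i, d_{v_i}) \rangle \}_{i=1..d(u)}$.
Round 2. Map: Prepare edges for pairing, $\langle (u, i), (v, d_v) \rangle \to \{ \langle (u, j), (v, d_v) \rangle \}_{j=1..i}$. Reduce: Create ordered neighbor pairs as compound keys, $\langle u, \{(v, d_v) \mid v \in N(u)\} \rangle \to \langle (v, w),\, d_v + d_w \rangle$ for $v, w \in N(u)$.
Round 3. Map: Identity. Reduce: Calculate and output $J_{ij}$, $\langle (i, j), \{\delta_{ij}, \delta_{ij}, \ldots\} \rangle \to \langle (i, j), J_{ij} \rangle$, where $J_{ij} = \gamma_{ij} / (\delta_{ij} - \gamma_{ij})$, $\delta_{ij} = d_i + d_j$, and $\gamma_{ij} = |\{\delta_{ij}, \delta_{ij}, \ldots\}|$.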
24 Exact, All-Pairs Jaccard Similarity benchmarks. Experiments: Verify scalability on synthetic datasets, Graph500 RMAT graphs (A = 0.57, B = C = 0.19, D = 0.05) with $n = 2^{\text{SCALE}}$, $m = 16n$. Cluster: 1000 nodes, 12 cores per node, 64 GB RAM per node. Constant-memory MapReduce job parameters: pre-step round to annotate undirected edges with degree, e.g. $\langle u, (v, d_v) \rangle$; edge preparation is block-distributed; total of 5 rounds, with one additional round inserted to randomize output from round 1.
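For readers unfamiliar with R-MAT, the recursive quadrant choice behind these parameters can be sketched as follows. This is a simplified illustration, not the Graph500 reference generator: it omits refinements a benchmark generator would typically apply, such as relabeling vertices, and it keeps self-loops and duplicate edges. The function names and the fixed seed are assumptions.

```python
import random


def rmat_edge(scale, a=0.57, b=0.19, c=0.19, rng=random):
    """Pick one edge by choosing a quadrant of the adjacency matrix per bit."""
    row = col = 0
    for _ in range(scale):
        r = rng.random()
        row <<= 1
        col <<= 1
        if r < a:
            pass                # top-left quadrant, probability A
        elif r < a + b:
            col |= 1            # top-right quadrant, probability B
        elif r < a + b + c:
            row |= 1            # bottom-left quadrant, probability C
        else:
            row |= 1            # bottom-right quadrant, probability D
            col |= 1
    return row, col


def rmat_graph(scale, edge_factor=16, seed=0):
    """Return n = 2**scale and a list of m = edge_factor * n generated edges."""
    rng = random.Random(seed)
    n = 2 ** scale
    return n, [rmat_edge(scale, rng=rng) for _ in range(edge_factor * n)]


n, edges = rmat_graph(scale=4)
print(n, len(edges))  # 16 vertices, 256 generated edges
```

Because A is much larger than D, edges concentrate on low-numbered vertices, which produces the skewed degree distribution that makes pair generation hard to load-balance.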
25 All-Pairs Jaccard Similarity on Graph500 datasets. Table: Graph500 RMAT graphs (undirected), with columns n ($10^6$), m ($10^6$), 2-paths ($10^9$), and $J_{ij}$ ($10^9$), one row per RMAT scale. Figure: time (minutes) versus total edges for the same RMAT graphs.
26 Results. Exact, all-pairs Jaccard Similarity in O(1) memory and rounds: neighbor pairing similar to Node Iterator for triangle listing; load-balance $O(\sum_v d(v)^2)$ pair generation; scales well with increasing 2-path count. MapReduce All-Pairs Jaccard Similarity performance: 9 billion Jaccard coefficients per minute! (RMAT scale trillion $J_{ij}$)
27 What's the next Big question? Beyond...?
Massive Streaming Data Analytics: A Case Study with Clustering Coefficients David Ediger, Karl Jiang, Jason Riedy and David A. Bader Overview Motivation A Framework for Massive Streaming hello Data Analytics
More informationNetwork File Storage with Graceful Performance Degradation
Network File Storage with Graceful Performance Degradation ANXIAO (ANDREW) JIANG California Institute of Technology and JEHOSHUA BRUCK California Institute of Technology A file storage scheme is proposed
More informationOn the independence number of graphs with maximum degree 3
On the independence number of graphs with maximum degree 3 Iyad A. Kanj Fenghui Zhang Abstract Let G be an undirected graph with maximum degree at most 3 such that G does not contain any of the three graphs
More informationThis exam contains 13 pages (including this cover page) and 18 questions. Check to see if any pages are missing.
Big Data Processing 20132014 Q2 April 7, 2014 (Resit) Lecturer: Claudia Hauff Time Limit: 180 Minutes Name: Answer the questions in the spaces provided on this exam. If you run out of room for an answer,
More informationBig Data Analytics of MultiRelationship Online Social Network Based on MultiSubnet Composited Complex Network
, pp.273284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of MultiRelationship Online Social Network Based on MultiSubnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
More informationDocument Similarity Measurement Using Ferret Algorithm and Map Reduce Programming Model
Document Similarity Measurement Using Ferret Algorithm and Map Reduce Programming Model Condro Wibawa, Irwan Bastian, Metty Mustikasari Department of Information Systems, Faculty of Computer Science and
More informationNetwork/Graph Theory. What is a Network? What is network theory? Graphbased representations. Friendship Network. What makes a problem graphlike?
What is a Network? Network/Graph Theory Network = graph Informally a graph is a set of nodes joined by a set of lines or arrows. 1 1 2 3 2 3 4 5 6 4 5 6 Graphbased representations Representing a problem
More information(67902) Topics in Theory and Complexity Nov 2, 2006. Lecture 7
(67902) Topics in Theory and Complexity Nov 2, 2006 Lecturer: Irit Dinur Lecture 7 Scribe: Rani Lekach 1 Lecture overview This Lecture consists of two parts In the first part we will refresh the definition
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More information