Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro
|
|
- Dinah Rich
- 4 years ago
- Views:
Transcription
1 Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro
2 Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free, small-world Community Detection What groups of nodes are related? Node Classification Importance and function of a certain node? Network Comparison What is the type of the network? 2
3 Building Blocks of Networks Subnetworks, or subgraphs, are the building blocks of networks They have the power to characterize and discriminate networks 3
4 Building Blocks of Networks 4
5 Some Graph Definitions Subgraph: subset of vertices and edges inside another graph Induced subgraph: You must consider all connections between selected nodes Induced Subgraph Non-Induced Subgraph 5
6 Some Graph Definitions Graph Isomorphism: Two graphs G and H are isomorphic (G ~ H) when there is a one to one mapping between the nodes of both graphs and there is an edge between two vertices of G if and only if the corresponding vertices in H also form an edge Informally we are seeing if the topology is the same 4 Isomorphic Graphs 6
7 Some Graph Definitions Graph Automorphisms: Set of isomorphisms of a graph into itself Equivalence of Nodes Two nodes are equivalent if there exists an automorphism that maps one into the other This relation partitions a graph into equivalence classes Informally, we are looking at the graph symmetry Example graph automorphisms and equivalence classes 7
8 Some Graph Definitions Graph Matches: A match of a graph H in a larger graph G is a set of nodes of G that induce the respective subgraph H. In other words, a subgraph of G that is isomorphic to H 8
9 Example Application of Subgraphs Consider all possible directed subgraphs of size 3 9
10 Example Application of Subgraphs Z-Score measures overrepresentation Different networks present similar fingerprints Image: Adapted from (Milo et al., 2004) 10
11 Network Motifs Milo et al. (2002) came up with the definition of network motifs: recurring, significant patterns of interconnections How to define: Pattern: induced subgraph Recurring: found many times, i.e., high frequency Significant: more frequent than it would be expected in similar networks (same degree sequence) Parameters P, U, D, N control the definition (Milo et al., 2002, used {0.01, 4, 0.1, 1000}) Image: Adapted from (Milo et al., 2004) 11
12 Network Motifs Random Networks Motif Original Network Image: Adapted from (Milo et al., 2002) 12
13 Network Motifs: Example Application Table: Adapted from (Milo et al., 2002) 13
14 Network Motifs: Example Application Image: taken from (Sporns and Kotter, 2004) 14
15 Graphlet Degree Distribution What about a node subgraph metric? The degree distribution is in a way measuring participation in subgraphs of size 2 Can we generalize this? Przulj (2006) came up with the definition of graphlet degree distribution: Where does the node appear in orbits of subgraphs? 15
16 Graphlet Degree Distribution 16
17 Computational Problem In its core, finding motifs and graphlets its all about finding and counting subgraphs. Just knowing if a certain subgraph exists is already an hard computational problem! Subgraph isomorphism is NP-complete Execution time grows exponentially as the size of the graph or the motif/graphlet increases Feasible motif size is usually small (3 to 8) and network size in the order of hundreds or thousands of nodes 17
18 Subgraph Discovery Research Part of my research is focused on improving efficiency in network motif detection. Scale up! How? Novel data structures for the graphs and subgraphs Novel faster algorithms Sampling techniques Parallel approaches (with different paradigms) 18
19 Previous Approaches Network-centric approaches: Enumerate all subgraphs and then compute isomorphisms (ex: ESU/Fanmod, Kavosh). Subgraph-centric approaches: Find one subgraph at a time (ex: Grochow and Kellis) 19
20 A set-centric approach Key insight: can we do better looking for a given set of subgraphs? Looking for all does not allow for discarding subgraphs we don't care about (which would save computation time) Looking for one at the time does not allow to reuse results (which would avoid redundant computation) Can we find what is common between subgraphs and use that? Set-centric approach: Find a custom set of subgraphs (maybe one, maybe all, maybe something in between) 20
21 G-Tries Motivation Sequences and Prefix Trees (tries) Can the concept be extended to graphs? 21
22 G-Tries Motivation Graphs have common substructure! G-Tries (etimology: Graph retrieval ) 22
23 The G-Trie data structure G-Tries: (customized) collections of subgraphs Common substructures are identified Information is compressed 23
24 The G-Trie data structure G-Tries: also valid for directed networks 24
25 The G-Trie data structure G-Tries: also valid for colored/labeled networks G-Trie 25
26 Creating a G-Trie Iterative insertion Start with an empty g-trie 26
27 The Need for a Canonical Form There are different node orderings representing the same subgraph Canonical form for a getting an unique g-tries Different canon will give origin to different g-tries 27
28 Impact of Canonical Form 28
29 Custom Canonical Form Connectivity Path induces connected subgraph Compressibility More common subtructre, less g-tries nodes Constraining As many connections as possible to ancestor nodes (limit possible matches) GTCanon 29
30 GTCanon Example 30
31 Searching with G-Tries Backtracking Procedure Searching at the same time for several subgraphs Candidates for node 1: {0, 1, 2, 3, 4, 5} Try 0: Match = {0}, Neighb. = {1,3,4} Try 1: Match = {0,1}, Neighb. = {2,3,4,5} Try 2: no edge from 2 to 0! FAIL Try 3: no edge from 3 to 1! FAIL Try 4: Match = {0, 1, 4} FOUND! Try 5: no edge from 5 to 1! FAIL 31
32 Searching with G-Tries The same subgraph could be found several times due to automorphisms (symmetries) We would not only find {0,1,4} but also: {0, 4, 1} {1, 0, 4} {1, 4, 0} {4, 0, 1} {4, 1, 0} 32
33 Symmetry Breaking Conditions Conditions on node labels Add conditions to respective g-trie path Match only when conditions of at least one descendant are respected Filter conditions to ensure minimum work Ex: transitive property (a<b,a<c,b<c leads to a<b,b<c); assured descendants, only store relevant to node, etc 33
34 Complete G-Trie Example All six undirected 4-subgraphs 34
35 Complete G-Trie Example All thirteen directed 3-subgraphs 35
36 Sequential version: some results Set of 12 representative real networks Network Group Directed dolphins social no circuit physical no neural biological yes 297 2, metabolic biological yes 453 2, links social yes 1,490 19, coauthors social no 1,589 2, ppi biological no 2,361 6, odlis semantic yes 2,909 18, power physical no 4,941 6, company social yes 8,497 6, foldoc Semantic yes 13, , internet Physical no 22,963 48, ,390 V(G) 36 E(G) Nr. Neighbours Average Max
37 Sequential version: some results On both directed and undirected graphs g-tries were from 1 to 2 orders of magnitude faster than existing state of the art From 10x to 200x Example results for full census of size k (speedup on a set of undirected networks) Network k ESU Kavosh Grochow dolphins circuit coauthors ppi power internet
38 Sequential version: some results On both directed and undirected graphs g-tries were from 1 to 2 orders of magnitude faster than existing state of the art From 10x to 200x Example results for full census of size k (speedup on a set of directed networks) Network K ESU Kavosh Grochow neural metabolic links odlis company foldoc
39 Sequential version: some results On both directed and undirected graphs g-tries were from 1 to 2 orders of magnitude faster than existing state of the art From 10x to 200x 39
40 Sampling approach Sample some subgraph occurrences Deduce approximate total counting Trade accuracy for speed 40
41 Sampling approach: search tree Backtracking produces search tree 41
42 Sampling approach Original: unbalanced search tree Goal: Uniform sampling Subgraph occurrences are in last tree depth 42
43 Sampling approach Original: unbalanced search tree Goal: Uniform sampling Associate a probability of traversing with each search tree depth 100% 100% 80% 80% 43
44 Sampling Approach Probabilities associated with each depth: {P0, P1,, Pmax} Sampling is uniform Probability of any occurrence is P0 x P1 x x Pmax We can produce unbiased estimator Estimate = (Nr sampled occurences) / (P0 x P1 x x Pmax) The probabilities control the search Regarding accuracy: avoid small values close to root Entire search branches disregarded more variance Regarding speed: avoid high values close to root More search branches higher execution time Some results We can obtain ~90% of accuracy using ~20% of time 44
45 Parallelism Subgraph Discovery is a tree shaped computation Search nodes are independent from each other {0,1,3} If we know where we are, we can continue from there Tree Nodes -> Work Units 45
46 Parallel Problem Input: set of work units G-Trie: (network, g-trie node, partial match) Goal: efficiently distribute work units among processors We need dynamic load balancing 46
47 Parallel approaches Environments we explored Clusters (distributed memory using MPI) [almost linear speedup up to 128 cores] Multicores (shared memory using threads) [almost linear speedup up to 64 cores] Some of the techniques Receiver initiated work stealing Diagonal splitting and efficient snapshots Random polling 47
48 Example Computation G-Trie Node Graph Vertex 48
49 Example Computation G-Trie Node Graph Vertex 49
50 Example Computation G-Trie Node Graph Vertex 50
51 Example Computation G-Trie Node Graph Vertex 51
52 Example Computation G-Trie Node Graph Vertex Current Work Unit Explored Work Units STOP 52
53 Example Computation Keep Give to requester Both Diagonal Splitting 53
54 Some publications Core sequential algorithms - G-Tries: a data structure for storing and finding subgraphs. Data Mining and Know. Discovery, Towards a faster network-centric subgraph census. ASONAM' Querying Subgraph Sets with G-Tries. DBSocial'2012 (best paper award) - Strategies for Network Motifs Discovery. E-Science Sampling approach - Efficient Subgraph Frequency Estimation with G-Tries. WABI' Rand-FaSE: fast approximate subgraph census. SNAM, 2015 Parallel approach - Parallel subgraph counting for multicore architectures. ISPA' A Scalable Parallel Approach for Subgraph Census Computation. MuCoCos' Parallel Discovery of Network Motifs. Journal of Parallel and Distributed Computing Efficient Parallel Subgraph Counting using G-Tries. IEEE Cluster'2010. Concept variation and applications - Discovering Colored Network Motifs. Complenet' Motif Mining in Weighted Networks. Damnet'2012 (and new related in paper ACM-SAC'2015) - Co-authorship network comparison across research fields using motifs. ASONAM'
55 Software Reference sequential implementation is available online and uses C++: More versions available by request and we are working on giving more options to practitioners (ex: GUI, plugin for Gephi/Cytoscape) Graphlets, large scale and adaptive sampling almost public 55
56 Complex Networks "Group Pedro Ribeiro Fernando Silva (leader) David Aparício (PhD Student) Multicores, Graphlets Pedro Paredes (BSc Student) Network-Centric Graph Theory (leader) Miguel Araújo S. Choobdar (PhD Student) (former PhD Stud) Community Detection Matrix Factorization Structural Roles Temporal, Weigths Miguel Silva (MSc Student) Random Networks Ahmad Naser (MSc Student) GUI and Visualization Some collaborations (ongoing and/or starting): - M. Kaiser (U Newcastle) [Brain Networks] - L. F. Costa (U São Paulo) [Null Models] - K. Pingali (UT Austin) [Temporal Patterns] - C. Faloutsos (CMU) [Large Scale Networks] - S. Parthasarathy (Ohio State) [Roles] 56
On Network Tools for Network Motif Finding: A Survey Study
On Network Tools for Network Motif Finding: A Survey Study Elisabeth A. Wong 1,2, Brittany Baur 1,3 1 2010 NSF Bio-Grid REU Research Fellows at Univ of Connecticut 2 Bowdoin College 3 Manhattanville College
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
Implementing Graph Pattern Mining for Big Data in the Cloud
Implementing Graph Pattern Mining for Big Data in the Cloud Chandana Ojah M.Tech in Computer Science & Engineering Department of Computer Science & Engineering, PES College of Engineering, Mandya Ojah.chandana@gmail.com
Clustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
Graph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
Load Balancing. Load Balancing 1 / 24
Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait
Distance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
9.1. Graph Mining, Social Network Analysis, and Multirelational Data Mining. Graph Mining
9 Graph Mining, Social Network Analysis, and Multirelational Data Mining We have studied frequent-itemset mining in Chapter 5 and sequential-pattern mining in Section 3 of Chapter 8. Many scientific and
2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8]
Code No: R05220502 Set No. 1 1. (a) Describe the performance analysis in detail. (b) Show that f 1 (n)+f 2 (n) = 0(max(g 1 (n), g 2 (n)) where f 1 (n) = 0(g 1 (n)) and f 2 (n) = 0(g 2 (n)). [8+8] 2. (a)
Bioinformatics: Network Analysis
Bioinformatics: Network Analysis Graph-theoretic Properties of Biological Networks COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Outline Architectural features Motifs, modules,
Unsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
A Fast Algorithm For Finding Hamilton Cycles
A Fast Algorithm For Finding Hamilton Cycles by Andrew Chalaturnyk A thesis presented to the University of Manitoba in partial fulfillment of the requirements for the degree of Masters of Science in Computer
Big Data Text Mining and Visualization. Anton Heijs
Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark
PPD: Scheduling and Load Balancing 2
PPD: Scheduling and Load Balancing 2 Fernando Silva Computer Science Department Center for Research in Advanced Computing Systems (CRACS) University of Porto, FCUP http://www.dcc.fc.up.pt/~fds 2 (Some
InfiniteGraph: The Distributed Graph Database
A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086
Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri
Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph
D A T A M I N I N G C L A S S I F I C A T I O N
D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.
Graph Pattern Analysis with PatternGravisto
Journal of Graph Algorithms and Applications http://jgaa.info/ vol. 9, no. 1, pp. 19 29 (2005) Graph Pattern Analysis with PatternGravisto Christian Klukas Dirk Koschützki Falk Schreiber Institute of Plant
CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.
Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure
Part 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
Praktikum Wissenschaftliches Rechnen (Performance-optimized optimized Programming)
Praktikum Wissenschaftliches Rechnen (Performance-optimized optimized Programming) Dynamic Load Balancing Dr. Ralf-Peter Mundani Center for Simulation Technology in Engineering Technische Universität München
Social Media Mining. Graph Essentials
Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures
Network (Tree) Topology Inference Based on Prüfer Sequence
Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,
Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits
Outline NP-completeness Examples of Easy vs. Hard problems Euler circuit vs. Hamiltonian circuit Shortest Path vs. Longest Path 2-pairs sum vs. general Subset Sum Reducing one problem to another Clique
Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, steen@cs.vu.nl Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter
Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis
Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of
BIOINFORMATICS. Biomolecular Network Motif Counting and Discovery by Color Coding
BIOINFORMATICS Vol. 00 no. 00 2008 Pages 9 Biomolecular Network Motif Counting and Discovery by Color Coding Noga Alon, Phuong Dao 2, Iman Hajirasouliha 2, Fereydoun Hormozdiari 2, and S. Cenk Sahinalp
Graph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a
Introduction to Graph Mining
Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain
MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)
Load balancing in a heterogeneous computer system by self-organizing Kohonen network
Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.
CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing
CSE / Notes : Task Scheduling & Load Balancing Task Scheduling A task is a (sequential) activity that uses a set of inputs to produce a set of outputs. A task (precedence) graph is an acyclic, directed
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
Binary Coded Web Access Pattern Tree in Education Domain
Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: kc.gomathi@gmail.com M. Moorthi
GRAPH THEORY LECTURE 4: TREES
GRAPH THEORY LECTURE 4: TREES Abstract. 3.1 presents some standard characterizations and properties of trees. 3.2 presents several different types of trees. 3.7 develops a counting method based on a bijection
Graph theoretic approach to analyze amino acid network
Int. J. Adv. Appl. Math. and Mech. 2(3) (2015) 31-37 (ISSN: 2347-2529) Journal homepage: www.ijaamm.com International Journal of Advances in Applied Mathematics and Mechanics Graph theoretic approach to
Comparison of Data Mining Techniques for Money Laundering Detection System
Comparison of Data Mining Techniques for Money Laundering Detection System Rafał Dreżewski, Grzegorz Dziuban, Łukasz Hernik, Michał Pączek AGH University of Science and Technology, Department of Computer
A New Marketing Channel Management Strategy Based on Frequent Subtree Mining
A New Marketing Channel Management Strategy Based on Frequent Subtree Mining Daoping Wang Peng Gao School of Economics and Management University of Science and Technology Beijing ABSTRACT For most manufacturers,
Statistical and computational challenges in networks and cybersecurity
Statistical and computational challenges in networks and cybersecurity Hugh Chipman Acadia University June 12, 2015 Statistical and computational challenges in networks and cybersecurity May 4-8, 2015,
Optimal Network Alignment with Graphlet Degree Vectors
Cancer Informatics Original Research Open Access Full open access to this and thousands of other papers at http://www.la-press.com. Optimal Network Alignment with Graphlet Degree Vectors Tijana Milenković,2,
BIG DATA VISUALIZATION. Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou
BIG DATA VISUALIZATION Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou Let s begin with a story Let s explore Yahoo s data! Dora the Data Explorer has a new
Systems and Algorithms for Big Data Analytics
Systems and Algorithms for Big Data Analytics YAN, Da Email: yanda@cse.cuhk.edu.hk My Research Graph Data Distributed Graph Processing Spatial Data Spatial Query Processing Uncertain Data Querying & Mining
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Performance Analysis and Optimization Tool
Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL andres.charif@uvsq.fr Performance Analysis Team, University of Versailles http://www.maqao.org Introduction Performance Analysis Develop
5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.
1. The advantage of.. is that they solve the problem if sequential storage representation. But disadvantage in that is they are sequential lists. [A] Lists [B] Linked Lists [A] Trees [A] Queues 2. The
Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R.0, steen@cs.vu.nl Chapter 0: Version: April 8, 0 / Contents Chapter Description 0: Introduction
In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University
In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps Yu Su, Yi Wang, Gagan Agrawal The Ohio State University Motivation HPC Trends Huge performance gap CPU: extremely fast for generating
Fast Matching of Binary Features
Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been
Clustering through Decision Tree Construction in Geology
Nonlinear Analysis: Modelling and Control, 2001, v. 6, No. 2, 29-41 Clustering through Decision Tree Construction in Geology Received: 22.10.2001 Accepted: 31.10.2001 A. Juozapavičius, V. Rapševičius Faculty
Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows
TECHNISCHE UNIVERSITEIT EINDHOVEN Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows Lloyd A. Fasting May 2014 Supervisors: dr. M. Firat dr.ir. M.A.A. Boon J. van Twist MSc. Contents
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
Random graphs with a given degree sequence
Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.
Graph Mining and Social Network Analysis. Data Mining; EECS 4412 Darren Rolfe + Vince Chu 11.06.14
Graph Mining and Social Network nalysis Data Mining; EES 4412 Darren Rolfe + Vince hu 11.06.14 genda Graph Mining Methods for Mining Frequent Subgraphs priori-based pproach: GM, FSG Pattern-Growth pproach:
Complex Network Analysis of Brain Connectivity: An Introduction LABREPORT 5
Complex Network Analysis of Brain Connectivity: An Introduction LABREPORT 5 Fernando Ferreira-Santos 2012 Title: Complex Network Analysis of Brain Connectivity: An Introduction Technical Report Authors:
Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu
Distributed Aggregation in Cloud Databases By: Aparna Tiwari tiwaria@umail.iu.edu ABSTRACT Data intensive applications rely heavily on aggregation functions for extraction of data according to user requirements.
Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
PARALLEL PROGRAMMING
PARALLEL PROGRAMMING TECHNIQUES AND APPLICATIONS USING NETWORKED WORKSTATIONS AND PARALLEL COMPUTERS 2nd Edition BARRY WILKINSON University of North Carolina at Charlotte Western Carolina University MICHAEL
How To Find Local Affinity Patterns In Big Data
Detection of local affinity patterns in big data Andrea Marinoni, Paolo Gamba Department of Electronics, University of Pavia, Italy Abstract Mining information in Big Data requires to design a new class
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
The University of Jordan
The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S
Massive Streaming Data Analytics: A Case Study with Clustering Coefficients. David Ediger, Karl Jiang, Jason Riedy and David A.
Massive Streaming Data Analytics: A Case Study with Clustering Coefficients David Ediger, Karl Jiang, Jason Riedy and David A. Bader Overview Motivation A Framework for Massive Streaming hello Data Analytics
CAD Algorithms. P and NP
CAD Algorithms The Classes P and NP Mohammad Tehranipoor ECE Department 6 September 2010 1 P and NP P and NP are two families of problems. P is a class which contains all of the problems we solve using
Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
Implementing Graph Pattern Queries on a Relational Database
LLNL-TR-400310 Implementing Graph Pattern Queries on a Relational Database Ian L. Kaplan, Ghaleb M. Abdulla, S Terry Brugger, Scott R. Kohn January 8, 2008 Disclaimer This document was prepared as an account
Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics
Motivation Visual Data Mining Visualization for Data Mining Huge amounts of information Limited display capacity of output devices Chidroop Madhavarapu CSE 591:Visual Analytics Visual Data Mining (VDM)
Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
Map-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers - dual core to 12+-core Shift to more concurrent programming paradigms and languages Erlang,
SCAN: A Structural Clustering Algorithm for Networks
SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected
Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social
A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader
A Performance Evaluation of Open Source Graph Databases Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader Overview Motivation Options Evaluation Results Lessons Learned Moving Forward
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Network Algorithms for Homeland Security
Network Algorithms for Homeland Security Mark Goldberg and Malik Magdon-Ismail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially
Practical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
Supervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
High Throughput Network Analysis
High Throughput Network Analysis Sumeet Agarwal 1,2, Gabriel Villar 1,2,3, and Nick S Jones 2,4,5 1 Systems Biology Doctoral Training Centre, University of Oxford, Oxford OX1 3QD, United Kingdom 2 Department
Network Metrics, Planar Graphs, and Software Tools. Based on materials by Lala Adamic, UMichigan
Network Metrics, Planar Graphs, and Software Tools Based on materials by Lala Adamic, UMichigan Network Metrics: Bowtie Model of the Web n The Web is a directed graph: n webpages link to other webpages
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, and Bhavani Thuraisingham University of Texas at Dallas, Dallas TX 75080, USA Abstract.
Decision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
Big Data Systems CS 5965/6965 FALL 2015
Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html
Data Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
A comparative study of social network analysis tools
Membre de Membre de A comparative study of social network analysis tools David Combe, Christine Largeron, Előd Egyed-Zsigmond and Mathias Géry International Workshop on Web Intelligence and Virtual Enterprises
Graph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
Analysis of MapReduce Algorithms
Analysis of MapReduce Algorithms Harini Padmanaban Computer Science Department San Jose State University San Jose, CA 95192 408-924-1000 harini.gomadam@gmail.com ABSTRACT MapReduce is a programming model
WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS
WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS Biswajit Biswal Oracle Corporation biswajit.biswal@oracle.com ABSTRACT With the World Wide Web (www) s ubiquity increase and the rapid development
Using In-Memory Computing to Simplify Big Data Analytics
SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed
Exponential time algorithms for graph coloring
Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A k-labeling of vertices of a graph G(V, E) is a function V [k].
Big Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 8. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic
OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION
OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION Sérgio Pequito, Stephen Kruzick, Soummya Kar, José M. F. Moura, A. Pedro Aguiar Department of Electrical and Computer Engineering
Data Warehousing und Data Mining
Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data
urika! Unlocking the Power of Big Data at PSC
urika! Unlocking the Power of Big Data at PSC Nick Nystrom Director, Strategic Applications Pittsburgh Supercomputing Center February 1, 2013 nystrom@psc.edu 2013 Pittsburgh Supercomputing Center Big Data
MapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12
MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail: