Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Save this PDF as:

Size: px
Start display at page:

Download "Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro"

Transcription

1 Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro

2 Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free, small-world Community Detection What groups of nodes are related? Node Classification Importance and function of a certain node? Network Comparison What is the type of the network? 2

3 Building Blocks of Networks Subnetworks, or subgraphs, are the building blocks of networks They have the power to characterize and discriminate networks 3

4 Building Blocks of Networks 4

5 Some Graph Definitions Subgraph: subset of vertices and edges inside another graph Induced subgraph: You must consider all connections between selected nodes Induced Subgraph Non-Induced Subgraph 5

6 Some Graph Definitions Graph Isomorphism: Two graphs G and H are isomorphic (G ~ H) when there is a one to one mapping between the nodes of both graphs and there is an edge between two vertices of G if and only if the corresponding vertices in H also form an edge Informally we are seeing if the topology is the same 4 Isomorphic Graphs 6

7 Some Graph Definitions Graph Automorphisms: Set of isomorphisms of a graph into itself Equivalence of Nodes Two nodes are equivalent if there exists an automorphism that maps one into the other This relation partitions a graph into equivalence classes Informally, we are looking at the graph symmetry Example graph automorphisms and equivalence classes 7

8 Some Graph Definitions Graph Matches: A match of a graph H in a larger graph G is a set of nodes of G that induce the respective subgraph H. In other words, a subgraph of G that is isomorphic to H 8

9 Example Application of Subgraphs Consider all possible directed subgraphs of size 3 9

10 Example Application of Subgraphs Z-Score measures overrepresentation Different networks present similar fingerprints Image: Adapted from (Milo et al., 2004) 10

11 Network Motifs Milo et al. (2002) came up with the definition of network motifs: recurring, significant patterns of interconnections How to define: Pattern: induced subgraph Recurring: found many times, i.e., high frequency Significant: more frequent than it would be expected in similar networks (same degree sequence) Parameters P, U, D, N control the definition (Milo et al., 2002, used {0.01, 4, 0.1, 1000}) Image: Adapted from (Milo et al., 2004) 11

12 Network Motifs Random Networks Motif Original Network Image: Adapted from (Milo et al., 2002) 12

13 Network Motifs: Example Application Table: Adapted from (Milo et al., 2002) 13

14 Network Motifs: Example Application Image: taken from (Sporns and Kotter, 2004) 14

15 Graphlet Degree Distribution What about a node subgraph metric? The degree distribution is in a way measuring participation in subgraphs of size 2 Can we generalize this? Przulj (2006) came up with the definition of graphlet degree distribution: Where does the node appear in orbits of subgraphs? 15

16 Graphlet Degree Distribution 16

17 Computational Problem In its core, finding motifs and graphlets its all about finding and counting subgraphs. Just knowing if a certain subgraph exists is already an hard computational problem! Subgraph isomorphism is NP-complete Execution time grows exponentially as the size of the graph or the motif/graphlet increases Feasible motif size is usually small (3 to 8) and network size in the order of hundreds or thousands of nodes 17

18 Subgraph Discovery Research Part of my research is focused on improving efficiency in network motif detection. Scale up! How? Novel data structures for the graphs and subgraphs Novel faster algorithms Sampling techniques Parallel approaches (with different paradigms) 18

19 Previous Approaches Network-centric approaches: Enumerate all subgraphs and then compute isomorphisms (ex: ESU/Fanmod, Kavosh). Subgraph-centric approaches: Find one subgraph at a time (ex: Grochow and Kellis) 19

20 A set-centric approach Key insight: can we do better looking for a given set of subgraphs? Looking for all does not allow for discarding subgraphs we don't care about (which would save computation time) Looking for one at the time does not allow to reuse results (which would avoid redundant computation) Can we find what is common between subgraphs and use that? Set-centric approach: Find a custom set of subgraphs (maybe one, maybe all, maybe something in between) 20

21 G-Tries Motivation Sequences and Prefix Trees (tries) Can the concept be extended to graphs? 21

22 G-Tries Motivation Graphs have common substructure! G-Tries (etimology: Graph retrieval ) 22

23 The G-Trie data structure G-Tries: (customized) collections of subgraphs Common substructures are identified Information is compressed 23

24 The G-Trie data structure G-Tries: also valid for directed networks 24

25 The G-Trie data structure G-Tries: also valid for colored/labeled networks G-Trie 25

26 Creating a G-Trie Iterative insertion Start with an empty g-trie 26

27 The Need for a Canonical Form There are different node orderings representing the same subgraph Canonical form for a getting an unique g-tries Different canon will give origin to different g-tries 27

28 Impact of Canonical Form 28

29 Custom Canonical Form Connectivity Path induces connected subgraph Compressibility More common subtructre, less g-tries nodes Constraining As many connections as possible to ancestor nodes (limit possible matches) GTCanon 29

30 GTCanon Example 30

31 Searching with G-Tries Backtracking Procedure Searching at the same time for several subgraphs Candidates for node 1: {0, 1, 2, 3, 4, 5} Try 0: Match = {0}, Neighb. = {1,3,4} Try 1: Match = {0,1}, Neighb. = {2,3,4,5} Try 2: no edge from 2 to 0! FAIL Try 3: no edge from 3 to 1! FAIL Try 4: Match = {0, 1, 4} FOUND! Try 5: no edge from 5 to 1! FAIL 31

32 Searching with G-Tries The same subgraph could be found several times due to automorphisms (symmetries) We would not only find {0,1,4} but also: {0, 4, 1} {1, 0, 4} {1, 4, 0} {4, 0, 1} {4, 1, 0} 32

33 Symmetry Breaking Conditions Conditions on node labels Add conditions to respective g-trie path Match only when conditions of at least one descendant are respected Filter conditions to ensure minimum work Ex: transitive property (a<b,a<c,b<c leads to a<b,b<c); assured descendants, only store relevant to node, etc 33

34 Complete G-Trie Example All six undirected 4-subgraphs 34

35 Complete G-Trie Example All thirteen directed 3-subgraphs 35

36 Sequential version: some results Set of 12 representative real networks Network Group Directed dolphins social no circuit physical no neural biological yes 297 2, metabolic biological yes 453 2, links social yes 1,490 19, coauthors social no 1,589 2, ppi biological no 2,361 6, odlis semantic yes 2,909 18, power physical no 4,941 6, company social yes 8,497 6, foldoc Semantic yes 13, , internet Physical no 22,963 48, ,390 V(G) 36 E(G) Nr. Neighbours Average Max

37 Sequential version: some results On both directed and undirected graphs g-tries were from 1 to 2 orders of magnitude faster than existing state of the art From 10x to 200x Example results for full census of size k (speedup on a set of undirected networks) Network k ESU Kavosh Grochow dolphins circuit coauthors ppi power internet

38 Sequential version: some results On both directed and undirected graphs g-tries were from 1 to 2 orders of magnitude faster than existing state of the art From 10x to 200x Example results for full census of size k (speedup on a set of directed networks) Network K ESU Kavosh Grochow neural metabolic links odlis company foldoc

39 Sequential version: some results On both directed and undirected graphs g-tries were from 1 to 2 orders of magnitude faster than existing state of the art From 10x to 200x 39

40 Sampling approach Sample some subgraph occurrences Deduce approximate total counting Trade accuracy for speed 40

41 Sampling approach: search tree Backtracking produces search tree 41

42 Sampling approach Original: unbalanced search tree Goal: Uniform sampling Subgraph occurrences are in last tree depth 42

43 Sampling approach Original: unbalanced search tree Goal: Uniform sampling Associate a probability of traversing with each search tree depth 100% 100% 80% 80% 43

44 Sampling Approach Probabilities associated with each depth: {P0, P1,, Pmax} Sampling is uniform Probability of any occurrence is P0 x P1 x x Pmax We can produce unbiased estimator Estimate = (Nr sampled occurences) / (P0 x P1 x x Pmax) The probabilities control the search Regarding accuracy: avoid small values close to root Entire search branches disregarded more variance Regarding speed: avoid high values close to root More search branches higher execution time Some results We can obtain ~90% of accuracy using ~20% of time 44

45 Parallelism Subgraph Discovery is a tree shaped computation Search nodes are independent from each other {0,1,3} If we know where we are, we can continue from there Tree Nodes -> Work Units 45

46 Parallel Problem Input: set of work units G-Trie: (network, g-trie node, partial match) Goal: efficiently distribute work units among processors We need dynamic load balancing 46

47 Parallel approaches Environments we explored Clusters (distributed memory using MPI) [almost linear speedup up to 128 cores] Multicores (shared memory using threads) [almost linear speedup up to 64 cores] Some of the techniques Receiver initiated work stealing Diagonal splitting and efficient snapshots Random polling 47

48 Example Computation G-Trie Node Graph Vertex 48

49 Example Computation G-Trie Node Graph Vertex 49

50 Example Computation G-Trie Node Graph Vertex 50

51 Example Computation G-Trie Node Graph Vertex 51

52 Example Computation G-Trie Node Graph Vertex Current Work Unit Explored Work Units STOP 52

53 Example Computation Keep Give to requester Both Diagonal Splitting 53

54 Some publications Core sequential algorithms - G-Tries: a data structure for storing and finding subgraphs. Data Mining and Know. Discovery, Towards a faster network-centric subgraph census. ASONAM' Querying Subgraph Sets with G-Tries. DBSocial'2012 (best paper award) - Strategies for Network Motifs Discovery. E-Science Sampling approach - Efficient Subgraph Frequency Estimation with G-Tries. WABI' Rand-FaSE: fast approximate subgraph census. SNAM, 2015 Parallel approach - Parallel subgraph counting for multicore architectures. ISPA' A Scalable Parallel Approach for Subgraph Census Computation. MuCoCos' Parallel Discovery of Network Motifs. Journal of Parallel and Distributed Computing Efficient Parallel Subgraph Counting using G-Tries. IEEE Cluster'2010. Concept variation and applications - Discovering Colored Network Motifs. Complenet' Motif Mining in Weighted Networks. Damnet'2012 (and new related in paper ACM-SAC'2015) - Co-authorship network comparison across research fields using motifs. ASONAM'

55 Software Reference sequential implementation is available online and uses C++: More versions available by request and we are working on giving more options to practitioners (ex: GUI, plugin for Gephi/Cytoscape) Graphlets, large scale and adaptive sampling almost public 55

56 Complex Networks "Group Pedro Ribeiro Fernando Silva (leader) David Aparício (PhD Student) Multicores, Graphlets Pedro Paredes (BSc Student) Network-Centric Graph Theory (leader) Miguel Araújo S. Choobdar (PhD Student) (former PhD Stud) Community Detection Matrix Factorization Structural Roles Temporal, Weigths Miguel Silva (MSc Student) Random Networks Ahmad Naser (MSc Student) GUI and Visualization Some collaborations (ongoing and/or starting): - M. Kaiser (U Newcastle) [Brain Networks] - L. F. Costa (U São Paulo) [Null Models] - K. Pingali (UT Austin) [Temporal Patterns] - C. Faloutsos (CMU) [Large Scale Networks] - S. Parthasarathy (Ohio State) [Roles] 56

On Network Tools for Network Motif Finding: A Survey Study

On Network Tools for Network Motif Finding: A Survey Study On Network Tools for Network Motif Finding: A Survey Study Elisabeth A. Wong 1,2, Brittany Baur 1,3 1 2010 NSF Bio-Grid REU Research Fellows at Univ of Connecticut 2 Bowdoin College 3 Manhattanville College

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Implementing Graph Pattern Mining for Big Data in the Cloud

Implementing Graph Pattern Mining for Big Data in the Cloud Implementing Graph Pattern Mining for Big Data in the Cloud Chandana Ojah M.Tech in Computer Science & Engineering Department of Computer Science & Engineering, PES College of Engineering, Mandya Ojah.chandana@gmail.com

More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Load Balancing. Load Balancing 1 / 24

Load Balancing. Load Balancing 1 / 24 Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait

More information

Distance Degree Sequences for Network Analysis

Distance Degree Sequences for Network Analysis Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

9.1. Graph Mining, Social Network Analysis, and Multirelational Data Mining. Graph Mining

9.1. Graph Mining, Social Network Analysis, and Multirelational Data Mining. Graph Mining 9 Graph Mining, Social Network Analysis, and Multirelational Data Mining We have studied frequent-itemset mining in Chapter 5 and sequential-pattern mining in Section 3 of Chapter 8. Many scientific and

More information

2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8]

2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8] Code No: R05220502 Set No. 1 1. (a) Describe the performance analysis in detail. (b) Show that f 1 (n)+f 2 (n) = 0(max(g 1 (n), g 2 (n)) where f 1 (n) = 0(g 1 (n)) and f 2 (n) = 0(g 2 (n)). [8+8] 2. (a)

More information

Bioinformatics: Network Analysis

Bioinformatics: Network Analysis Bioinformatics: Network Analysis Graph-theoretic Properties of Biological Networks COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Outline Architectural features Motifs, modules,

More information

Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

More information

A Fast Algorithm For Finding Hamilton Cycles

A Fast Algorithm For Finding Hamilton Cycles A Fast Algorithm For Finding Hamilton Cycles by Andrew Chalaturnyk A thesis presented to the University of Manitoba in partial fulfillment of the requirements for the degree of Masters of Science in Computer

More information

Big Data Text Mining and Visualization. Anton Heijs

Big Data Text Mining and Visualization. Anton Heijs Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark

More information

PPD: Scheduling and Load Balancing 2

PPD: Scheduling and Load Balancing 2 PPD: Scheduling and Load Balancing 2 Fernando Silva Computer Science Department Center for Research in Advanced Computing Systems (CRACS) University of Porto, FCUP http://www.dcc.fc.up.pt/~fds 2 (Some

More information

InfiniteGraph: The Distributed Graph Database

InfiniteGraph: The Distributed Graph Database A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

More information

Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri

Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph

More information

D A T A M I N I N G C L A S S I F I C A T I O N

D A T A M I N I N G C L A S S I F I C A T I O N D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.

More information

Graph Pattern Analysis with PatternGravisto

Graph Pattern Analysis with PatternGravisto Journal of Graph Algorithms and Applications http://jgaa.info/ vol. 9, no. 1, pp. 19 29 (2005) Graph Pattern Analysis with PatternGravisto Christian Klukas Dirk Koschützki Falk Schreiber Institute of Plant

More information

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92. Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

Praktikum Wissenschaftliches Rechnen (Performance-optimized optimized Programming)

Praktikum Wissenschaftliches Rechnen (Performance-optimized optimized Programming) Praktikum Wissenschaftliches Rechnen (Performance-optimized optimized Programming) Dynamic Load Balancing Dr. Ralf-Peter Mundani Center for Simulation Technology in Engineering Technische Universität München

More information

Social Media Mining. Graph Essentials

Social Media Mining. Graph Essentials Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures

More information

Network (Tree) Topology Inference Based on Prüfer Sequence

Network (Tree) Topology Inference Based on Prüfer Sequence Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,

More information

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits Outline NP-completeness Examples of Easy vs. Hard problems Euler circuit vs. Hamiltonian circuit Shortest Path vs. Longest Path 2-pairs sum vs. general Subset Sum Reducing one problem to another Clique

More information

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, steen@cs.vu.nl Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter

More information

Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis

Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of

More information

BIOINFORMATICS. Biomolecular Network Motif Counting and Discovery by Color Coding

BIOINFORMATICS. Biomolecular Network Motif Counting and Discovery by Color Coding BIOINFORMATICS Vol. 00 no. 00 2008 Pages 9 Biomolecular Network Motif Counting and Discovery by Color Coding Noga Alon, Phuong Dao 2, Iman Hajirasouliha 2, Fereydoun Hormozdiari 2, and S. Cenk Sahinalp

More information

Graph Processing and Social Networks

Graph Processing and Social Networks Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

Introduction to Graph Mining

Introduction to Graph Mining Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain

More information

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Load balancing in a heterogeneous computer system by self-organizing Kohonen network Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.

More information

CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing

CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing CSE / Notes : Task Scheduling & Load Balancing Task Scheduling A task is a (sequential) activity that uses a set of inputs to produce a set of outputs. A task (precedence) graph is an acyclic, directed

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

Binary Coded Web Access Pattern Tree in Education Domain

Binary Coded Web Access Pattern Tree in Education Domain Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: kc.gomathi@gmail.com M. Moorthi

More information

GRAPH THEORY LECTURE 4: TREES

GRAPH THEORY LECTURE 4: TREES GRAPH THEORY LECTURE 4: TREES Abstract. 3.1 presents some standard characterizations and properties of trees. 3.2 presents several different types of trees. 3.7 develops a counting method based on a bijection

More information

Graph theoretic approach to analyze amino acid network

Graph theoretic approach to analyze amino acid network Int. J. Adv. Appl. Math. and Mech. 2(3) (2015) 31-37 (ISSN: 2347-2529) Journal homepage: www.ijaamm.com International Journal of Advances in Applied Mathematics and Mechanics Graph theoretic approach to

More information

Comparison of Data Mining Techniques for Money Laundering Detection System

Comparison of Data Mining Techniques for Money Laundering Detection System Comparison of Data Mining Techniques for Money Laundering Detection System Rafał Dreżewski, Grzegorz Dziuban, Łukasz Hernik, Michał Pączek AGH University of Science and Technology, Department of Computer

More information

A New Marketing Channel Management Strategy Based on Frequent Subtree Mining

A New Marketing Channel Management Strategy Based on Frequent Subtree Mining A New Marketing Channel Management Strategy Based on Frequent Subtree Mining Daoping Wang Peng Gao School of Economics and Management University of Science and Technology Beijing ABSTRACT For most manufacturers,

More information

Statistical and computational challenges in networks and cybersecurity

Statistical and computational challenges in networks and cybersecurity Statistical and computational challenges in networks and cybersecurity Hugh Chipman Acadia University June 12, 2015 Statistical and computational challenges in networks and cybersecurity May 4-8, 2015,

More information

Optimal Network Alignment with Graphlet Degree Vectors

Optimal Network Alignment with Graphlet Degree Vectors Cancer Informatics Original Research Open Access Full open access to this and thousands of other papers at http://www.la-press.com. Optimal Network Alignment with Graphlet Degree Vectors Tijana Milenković,2,

More information

BIG DATA VISUALIZATION. Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou

BIG DATA VISUALIZATION. Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou BIG DATA VISUALIZATION Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou Let s begin with a story Let s explore Yahoo s data! Dora the Data Explorer has a new

More information

Systems and Algorithms for Big Data Analytics

Systems and Algorithms for Big Data Analytics Systems and Algorithms for Big Data Analytics YAN, Da Email: yanda@cse.cuhk.edu.hk My Research Graph Data Distributed Graph Processing Spatial Data Spatial Query Processing Uncertain Data Querying & Mining

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

Performance Analysis and Optimization Tool

Performance Analysis and Optimization Tool Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL andres.charif@uvsq.fr Performance Analysis Team, University of Versailles http://www.maqao.org Introduction Performance Analysis Develop

More information

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes. 1. The advantage of.. is that they solve the problem if sequential storage representation. But disadvantage in that is they are sequential lists. [A] Lists [B] Linked Lists [A] Trees [A] Queues 2. The

More information

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014 Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R.0, steen@cs.vu.nl Chapter 0: Version: April 8, 0 / Contents Chapter Description 0: Introduction

More information

In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University

In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps Yu Su, Yi Wang, Gagan Agrawal The Ohio State University Motivation HPC Trends Huge performance gap CPU: extremely fast for generating

More information

Fast Matching of Binary Features

Fast Matching of Binary Features Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been

More information

Clustering through Decision Tree Construction in Geology

Clustering through Decision Tree Construction in Geology Nonlinear Analysis: Modelling and Control, 2001, v. 6, No. 2, 29-41 Clustering through Decision Tree Construction in Geology Received: 22.10.2001 Accepted: 31.10.2001 A. Juozapavičius, V. Rapševičius Faculty

More information

Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows

Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows TECHNISCHE UNIVERSITEIT EINDHOVEN Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows Lloyd A. Fasting May 2014 Supervisors: dr. M. Firat dr.ir. M.A.A. Boon J. van Twist MSc. Contents

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

A Review of Data Mining Techniques

A Review of Data Mining Techniques Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Random graphs with a given degree sequence

Random graphs with a given degree sequence Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.

More information

Graph Mining and Social Network Analysis. Data Mining; EECS 4412 Darren Rolfe + Vince Chu 11.06.14

Graph Mining and Social Network Analysis. Data Mining; EECS 4412 Darren Rolfe + Vince Chu 11.06.14 Graph Mining and Social Network nalysis Data Mining; EES 4412 Darren Rolfe + Vince hu 11.06.14 genda Graph Mining Methods for Mining Frequent Subgraphs priori-based pproach: GM, FSG Pattern-Growth pproach:

More information

Complex Network Analysis of Brain Connectivity: An Introduction LABREPORT 5

Complex Network Analysis of Brain Connectivity: An Introduction LABREPORT 5 Complex Network Analysis of Brain Connectivity: An Introduction LABREPORT 5 Fernando Ferreira-Santos 2012 Title: Complex Network Analysis of Brain Connectivity: An Introduction Technical Report Authors:

More information

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu Distributed Aggregation in Cloud Databases By: Aparna Tiwari tiwaria@umail.iu.edu ABSTRACT Data intensive applications rely heavily on aggregation functions for extraction of data according to user requirements.

More information

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

More information

PARALLEL PROGRAMMING

PARALLEL PROGRAMMING PARALLEL PROGRAMMING TECHNIQUES AND APPLICATIONS USING NETWORKED WORKSTATIONS AND PARALLEL COMPUTERS 2nd Edition BARRY WILKINSON University of North Carolina at Charlotte Western Carolina University MICHAEL

More information

Detection of local affinity patterns in big data

Detection of local affinity patterns in big data Detection of local affinity patterns in big data Andrea Marinoni, Paolo Gamba Department of Electronics, University of Pavia, Italy Abstract Mining information in Big Data requires to design a new class

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

The University of Jordan

The University of Jordan The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S

More information

Massive Streaming Data Analytics: A Case Study with Clustering Coefficients. David Ediger, Karl Jiang, Jason Riedy and David A.

Massive Streaming Data Analytics: A Case Study with Clustering Coefficients. David Ediger, Karl Jiang, Jason Riedy and David A. Massive Streaming Data Analytics: A Case Study with Clustering Coefficients David Ediger, Karl Jiang, Jason Riedy and David A. Bader Overview Motivation A Framework for Massive Streaming hello Data Analytics

More information

CAD Algorithms. P and NP

CAD Algorithms. P and NP CAD Algorithms The Classes P and NP Mohammad Tehranipoor ECE Department 6 September 2010 1 P and NP P and NP are two families of problems. P is a class which contains all of the problems we solve using

More information

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014 Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)

More information

Implementing Graph Pattern Queries on a Relational Database

Implementing Graph Pattern Queries on a Relational Database LLNL-TR-400310 Implementing Graph Pattern Queries on a Relational Database Ian L. Kaplan, Ghaleb M. Abdulla, S Terry Brugger, Scott R. Kohn January 8, 2008 Disclaimer This document was prepared as an account

More information

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics Motivation Visual Data Mining Visualization for Data Mining Huge amounts of information Limited display capacity of output devices Chidroop Madhavarapu CSE 591:Visual Analytics Visual Data Mining (VDM)

More information

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,

More information

Map-Reduce for Machine Learning on Multicore

Map-Reduce for Machine Learning on Multicore Map-Reduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers - dual core to 12+-core Shift to more concurrent programming paradigms and languages Erlang,

More information

SCAN: A Structural Clustering Algorithm for Networks

SCAN: A Structural Clustering Algorithm for Networks SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected

More information

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social

More information

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader A Performance Evaluation of Open Source Graph Databases Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader Overview Motivation Options Evaluation Results Lessons Learned Moving Forward

More information

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

More information

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

More information

Network Algorithms for Homeland Security

Network Algorithms for Homeland Security Network Algorithms for Homeland Security Mark Goldberg and Malik Magdon-Ismail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially

More information

Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R. 5. Link Analysis Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

More information

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

More information

High Throughput Network Analysis

High Throughput Network Analysis High Throughput Network Analysis Sumeet Agarwal 1,2, Gabriel Villar 1,2,3, and Nick S Jones 2,4,5 1 Systems Biology Doctoral Training Centre, University of Oxford, Oxford OX1 3QD, United Kingdom 2 Department

More information

Network Metrics, Planar Graphs, and Software Tools. Based on materials by Lala Adamic, UMichigan

Network Metrics, Planar Graphs, and Software Tools. Based on materials by Lala Adamic, UMichigan Network Metrics, Planar Graphs, and Software Tools Based on materials by Lala Adamic, UMichigan Network Metrics: Bowtie Model of the Web n The Web is a directed graph: n webpages link to other webpages

More information

Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce

Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, and Bhavani Thuraisingham University of Texas at Dallas, Dallas TX 75080, USA Abstract.

More information

Decision Trees from large Databases: SLIQ

Decision Trees from large Databases: SLIQ Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values

More information

Big Data Systems CS 5965/6965 FALL 2015

Big Data Systems CS 5965/6965 FALL 2015 Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html

More information

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

A comparative study of social network analysis tools

A comparative study of social network analysis tools Membre de Membre de A comparative study of social network analysis tools David Combe, Christine Largeron, Előd Egyed-Zsigmond and Mathias Géry International Workshop on Web Intelligence and Virtual Enterprises

More information

Graph Database Proof of Concept Report

Graph Database Proof of Concept Report Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

More information

Analysis of MapReduce Algorithms

Analysis of MapReduce Algorithms Analysis of MapReduce Algorithms Harini Padmanaban Computer Science Department San Jose State University San Jose, CA 95192 408-924-1000 harini.gomadam@gmail.com ABSTRACT MapReduce is a programming model

More information

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS Biswajit Biswal Oracle Corporation biswajit.biswal@oracle.com ABSTRACT With the World Wide Web (www) s ubiquity increase and the rapid development

More information

Using In-Memory Computing to Simplify Big Data Analytics

Using In-Memory Computing to Simplify Big Data Analytics SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed

More information

Exponential time algorithms for graph coloring

Exponential time algorithms for graph coloring Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A k-labeling of vertices of a graph G(V, E) is a function V [k].

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 8. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION

OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION Sérgio Pequito, Stephen Kruzick, Soummya Kar, José M. F. Moura, A. Pedro Aguiar Department of Electrical and Computer Engineering

More information

Data Warehousing und Data Mining

Data Warehousing und Data Mining Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data

More information

urika! Unlocking the Power of Big Data at PSC

urika! Unlocking the Power of Big Data at PSC urika! Unlocking the Power of Big Data at PSC Nick Nystrom Director, Strategic Applications Pittsburgh Supercomputing Center February 1, 2013 nystrom@psc.edu 2013 Pittsburgh Supercomputing Center Big Data

More information

MapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12

MapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12 MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:

More information