Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro


 Dinah Rich
 3 years ago
 Views:
Transcription
1 Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro
2 Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scalefree, smallworld Community Detection What groups of nodes are related? Node Classification Importance and function of a certain node? Network Comparison What is the type of the network? 2
3 Building Blocks of Networks Subnetworks, or subgraphs, are the building blocks of networks They have the power to characterize and discriminate networks 3
4 Building Blocks of Networks 4
5 Some Graph Definitions Subgraph: subset of vertices and edges inside another graph Induced subgraph: You must consider all connections between selected nodes Induced Subgraph NonInduced Subgraph 5
6 Some Graph Definitions Graph Isomorphism: Two graphs G and H are isomorphic (G ~ H) when there is a one to one mapping between the nodes of both graphs and there is an edge between two vertices of G if and only if the corresponding vertices in H also form an edge Informally we are seeing if the topology is the same 4 Isomorphic Graphs 6
7 Some Graph Definitions Graph Automorphisms: Set of isomorphisms of a graph into itself Equivalence of Nodes Two nodes are equivalent if there exists an automorphism that maps one into the other This relation partitions a graph into equivalence classes Informally, we are looking at the graph symmetry Example graph automorphisms and equivalence classes 7
8 Some Graph Definitions Graph Matches: A match of a graph H in a larger graph G is a set of nodes of G that induce the respective subgraph H. In other words, a subgraph of G that is isomorphic to H 8
9 Example Application of Subgraphs Consider all possible directed subgraphs of size 3 9
10 Example Application of Subgraphs ZScore measures overrepresentation Different networks present similar fingerprints Image: Adapted from (Milo et al., 2004) 10
11 Network Motifs Milo et al. (2002) came up with the definition of network motifs: recurring, significant patterns of interconnections How to define: Pattern: induced subgraph Recurring: found many times, i.e., high frequency Significant: more frequent than it would be expected in similar networks (same degree sequence) Parameters P, U, D, N control the definition (Milo et al., 2002, used {0.01, 4, 0.1, 1000}) Image: Adapted from (Milo et al., 2004) 11
12 Network Motifs Random Networks Motif Original Network Image: Adapted from (Milo et al., 2002) 12
13 Network Motifs: Example Application Table: Adapted from (Milo et al., 2002) 13
14 Network Motifs: Example Application Image: taken from (Sporns and Kotter, 2004) 14
15 Graphlet Degree Distribution What about a node subgraph metric? The degree distribution is in a way measuring participation in subgraphs of size 2 Can we generalize this? Przulj (2006) came up with the definition of graphlet degree distribution: Where does the node appear in orbits of subgraphs? 15
16 Graphlet Degree Distribution 16
17 Computational Problem In its core, finding motifs and graphlets its all about finding and counting subgraphs. Just knowing if a certain subgraph exists is already an hard computational problem! Subgraph isomorphism is NPcomplete Execution time grows exponentially as the size of the graph or the motif/graphlet increases Feasible motif size is usually small (3 to 8) and network size in the order of hundreds or thousands of nodes 17
18 Subgraph Discovery Research Part of my research is focused on improving efficiency in network motif detection. Scale up! How? Novel data structures for the graphs and subgraphs Novel faster algorithms Sampling techniques Parallel approaches (with different paradigms) 18
19 Previous Approaches Networkcentric approaches: Enumerate all subgraphs and then compute isomorphisms (ex: ESU/Fanmod, Kavosh). Subgraphcentric approaches: Find one subgraph at a time (ex: Grochow and Kellis) 19
20 A setcentric approach Key insight: can we do better looking for a given set of subgraphs? Looking for all does not allow for discarding subgraphs we don't care about (which would save computation time) Looking for one at the time does not allow to reuse results (which would avoid redundant computation) Can we find what is common between subgraphs and use that? Setcentric approach: Find a custom set of subgraphs (maybe one, maybe all, maybe something in between) 20
21 GTries Motivation Sequences and Prefix Trees (tries) Can the concept be extended to graphs? 21
22 GTries Motivation Graphs have common substructure! GTries (etimology: Graph retrieval ) 22
23 The GTrie data structure GTries: (customized) collections of subgraphs Common substructures are identified Information is compressed 23
24 The GTrie data structure GTries: also valid for directed networks 24
25 The GTrie data structure GTries: also valid for colored/labeled networks GTrie 25
26 Creating a GTrie Iterative insertion Start with an empty gtrie 26
27 The Need for a Canonical Form There are different node orderings representing the same subgraph Canonical form for a getting an unique gtries Different canon will give origin to different gtries 27
28 Impact of Canonical Form 28
29 Custom Canonical Form Connectivity Path induces connected subgraph Compressibility More common subtructre, less gtries nodes Constraining As many connections as possible to ancestor nodes (limit possible matches) GTCanon 29
30 GTCanon Example 30
31 Searching with GTries Backtracking Procedure Searching at the same time for several subgraphs Candidates for node 1: {0, 1, 2, 3, 4, 5} Try 0: Match = {0}, Neighb. = {1,3,4} Try 1: Match = {0,1}, Neighb. = {2,3,4,5} Try 2: no edge from 2 to 0! FAIL Try 3: no edge from 3 to 1! FAIL Try 4: Match = {0, 1, 4} FOUND! Try 5: no edge from 5 to 1! FAIL 31
32 Searching with GTries The same subgraph could be found several times due to automorphisms (symmetries) We would not only find {0,1,4} but also: {0, 4, 1} {1, 0, 4} {1, 4, 0} {4, 0, 1} {4, 1, 0} 32
33 Symmetry Breaking Conditions Conditions on node labels Add conditions to respective gtrie path Match only when conditions of at least one descendant are respected Filter conditions to ensure minimum work Ex: transitive property (a<b,a<c,b<c leads to a<b,b<c); assured descendants, only store relevant to node, etc 33
34 Complete GTrie Example All six undirected 4subgraphs 34
35 Complete GTrie Example All thirteen directed 3subgraphs 35
36 Sequential version: some results Set of 12 representative real networks Network Group Directed dolphins social no circuit physical no neural biological yes 297 2, metabolic biological yes 453 2, links social yes 1,490 19, coauthors social no 1,589 2, ppi biological no 2,361 6, odlis semantic yes 2,909 18, power physical no 4,941 6, company social yes 8,497 6, foldoc Semantic yes 13, , internet Physical no 22,963 48, ,390 V(G) 36 E(G) Nr. Neighbours Average Max
37 Sequential version: some results On both directed and undirected graphs gtries were from 1 to 2 orders of magnitude faster than existing state of the art From 10x to 200x Example results for full census of size k (speedup on a set of undirected networks) Network k ESU Kavosh Grochow dolphins circuit coauthors ppi power internet
38 Sequential version: some results On both directed and undirected graphs gtries were from 1 to 2 orders of magnitude faster than existing state of the art From 10x to 200x Example results for full census of size k (speedup on a set of directed networks) Network K ESU Kavosh Grochow neural metabolic links odlis company foldoc
39 Sequential version: some results On both directed and undirected graphs gtries were from 1 to 2 orders of magnitude faster than existing state of the art From 10x to 200x 39
40 Sampling approach Sample some subgraph occurrences Deduce approximate total counting Trade accuracy for speed 40
41 Sampling approach: search tree Backtracking produces search tree 41
42 Sampling approach Original: unbalanced search tree Goal: Uniform sampling Subgraph occurrences are in last tree depth 42
43 Sampling approach Original: unbalanced search tree Goal: Uniform sampling Associate a probability of traversing with each search tree depth 100% 100% 80% 80% 43
44 Sampling Approach Probabilities associated with each depth: {P0, P1,, Pmax} Sampling is uniform Probability of any occurrence is P0 x P1 x x Pmax We can produce unbiased estimator Estimate = (Nr sampled occurences) / (P0 x P1 x x Pmax) The probabilities control the search Regarding accuracy: avoid small values close to root Entire search branches disregarded more variance Regarding speed: avoid high values close to root More search branches higher execution time Some results We can obtain ~90% of accuracy using ~20% of time 44
45 Parallelism Subgraph Discovery is a tree shaped computation Search nodes are independent from each other {0,1,3} If we know where we are, we can continue from there Tree Nodes > Work Units 45
46 Parallel Problem Input: set of work units GTrie: (network, gtrie node, partial match) Goal: efficiently distribute work units among processors We need dynamic load balancing 46
47 Parallel approaches Environments we explored Clusters (distributed memory using MPI) [almost linear speedup up to 128 cores] Multicores (shared memory using threads) [almost linear speedup up to 64 cores] Some of the techniques Receiver initiated work stealing Diagonal splitting and efficient snapshots Random polling 47
48 Example Computation GTrie Node Graph Vertex 48
49 Example Computation GTrie Node Graph Vertex 49
50 Example Computation GTrie Node Graph Vertex 50
51 Example Computation GTrie Node Graph Vertex 51
52 Example Computation GTrie Node Graph Vertex Current Work Unit Explored Work Units STOP 52
53 Example Computation Keep Give to requester Both Diagonal Splitting 53
54 Some publications Core sequential algorithms  GTries: a data structure for storing and finding subgraphs. Data Mining and Know. Discovery, Towards a faster networkcentric subgraph census. ASONAM' Querying Subgraph Sets with GTries. DBSocial'2012 (best paper award)  Strategies for Network Motifs Discovery. EScience Sampling approach  Efficient Subgraph Frequency Estimation with GTries. WABI' RandFaSE: fast approximate subgraph census. SNAM, 2015 Parallel approach  Parallel subgraph counting for multicore architectures. ISPA' A Scalable Parallel Approach for Subgraph Census Computation. MuCoCos' Parallel Discovery of Network Motifs. Journal of Parallel and Distributed Computing Efficient Parallel Subgraph Counting using GTries. IEEE Cluster'2010. Concept variation and applications  Discovering Colored Network Motifs. Complenet' Motif Mining in Weighted Networks. Damnet'2012 (and new related in paper ACMSAC'2015)  Coauthorship network comparison across research fields using motifs. ASONAM'
55 Software Reference sequential implementation is available online and uses C++: More versions available by request and we are working on giving more options to practitioners (ex: GUI, plugin for Gephi/Cytoscape) Graphlets, large scale and adaptive sampling almost public 55
56 Complex Networks "Group Pedro Ribeiro Fernando Silva (leader) David Aparício (PhD Student) Multicores, Graphlets Pedro Paredes (BSc Student) NetworkCentric Graph Theory (leader) Miguel Araújo S. Choobdar (PhD Student) (former PhD Stud) Community Detection Matrix Factorization Structural Roles Temporal, Weigths Miguel Silva (MSc Student) Random Networks Ahmad Naser (MSc Student) GUI and Visualization Some collaborations (ongoing and/or starting):  M. Kaiser (U Newcastle) [Brain Networks]  L. F. Costa (U São Paulo) [Null Models]  K. Pingali (UT Austin) [Temporal Patterns]  C. Faloutsos (CMU) [Large Scale Networks]  S. Parthasarathy (Ohio State) [Roles] 56
On Network Tools for Network Motif Finding: A Survey Study
On Network Tools for Network Motif Finding: A Survey Study Elisabeth A. Wong 1,2, Brittany Baur 1,3 1 2010 NSF BioGrid REU Research Fellows at Univ of Connecticut 2 Bowdoin College 3 Manhattanville College
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks YoungRae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationA Serial Partitioning Approach to Scaling GraphBased Knowledge Discovery
A Serial Partitioning Approach to Scaling GraphBased Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
More information1311. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 1311 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
More informationImplementing Graph Pattern Mining for Big Data in the Cloud
Implementing Graph Pattern Mining for Big Data in the Cloud Chandana Ojah M.Tech in Computer Science & Engineering Department of Computer Science & Engineering, PES College of Engineering, Mandya Ojah.chandana@gmail.com
More informationClustering & Visualization
Chapter 5 Clustering & Visualization Clustering in highdimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to highdimensional data.
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationLoad Balancing. Load Balancing 1 / 24
Load Balancing Backtracking, branch & bound and alphabeta pruning: how to assign work to idle processes without much communication? Additionally for alphabeta pruning: implementing the youngbrotherswait
More informationDistance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More information9.1. Graph Mining, Social Network Analysis, and Multirelational Data Mining. Graph Mining
9 Graph Mining, Social Network Analysis, and Multirelational Data Mining We have studied frequentitemset mining in Chapter 5 and sequentialpattern mining in Section 3 of Chapter 8. Many scientific and
More information2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8]
Code No: R05220502 Set No. 1 1. (a) Describe the performance analysis in detail. (b) Show that f 1 (n)+f 2 (n) = 0(max(g 1 (n), g 2 (n)) where f 1 (n) = 0(g 1 (n)) and f 2 (n) = 0(g 2 (n)). [8+8] 2. (a)
More informationBioinformatics: Network Analysis
Bioinformatics: Network Analysis Graphtheoretic Properties of Biological Networks COMP 572 (BIOS 572 / BIOE 564)  Fall 2013 Luay Nakhleh, Rice University 1 Outline Architectural features Motifs, modules,
More informationUnsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
More informationA Fast Algorithm For Finding Hamilton Cycles
A Fast Algorithm For Finding Hamilton Cycles by Andrew Chalaturnyk A thesis presented to the University of Manitoba in partial fulfillment of the requirements for the degree of Masters of Science in Computer
More informationBig Data Text Mining and Visualization. Anton Heijs
Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark
More informationPPD: Scheduling and Load Balancing 2
PPD: Scheduling and Load Balancing 2 Fernando Silva Computer Science Department Center for Research in Advanced Computing Systems (CRACS) University of Porto, FCUP http://www.dcc.fc.up.pt/~fds 2 (Some
More informationInfiniteGraph: The Distributed Graph Database
A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086
More informationParallel Algorithms for Smallworld Network. David A. Bader and Kamesh Madduri
Parallel Algorithms for Smallworld Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, smallworld topology Community Identification/Graph
More informationD A T A M I N I N G C L A S S I F I C A T I O N
D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.
More informationGraph Pattern Analysis with PatternGravisto
Journal of Graph Algorithms and Applications http://jgaa.info/ vol. 9, no. 1, pp. 19 29 (2005) Graph Pattern Analysis with PatternGravisto Christian Klukas Dirk Koschützki Falk Schreiber Institute of Plant
More informationCSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.
Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure
More informationPart 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection  Social networks 
More informationPraktikum Wissenschaftliches Rechnen (Performanceoptimized optimized Programming)
Praktikum Wissenschaftliches Rechnen (Performanceoptimized optimized Programming) Dynamic Load Balancing Dr. RalfPeter Mundani Center for Simulation Technology in Engineering Technische Universität München
More informationSocial Media Mining. Graph Essentials
Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures
More informationNetwork (Tree) Topology Inference Based on Prüfer Sequence
Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,
More informationOutline. NPcompleteness. When is a problem easy? When is a problem hard? Today. Euler Circuits
Outline NPcompleteness Examples of Easy vs. Hard problems Euler circuit vs. Hamiltonian circuit Shortest Path vs. Longest Path 2pairs sum vs. general Subset Sum Reducing one problem to another Clique
More informationGraph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, steen@cs.vu.nl Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter
More informationEchidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis
Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of
More informationBIOINFORMATICS. Biomolecular Network Motif Counting and Discovery by Color Coding
BIOINFORMATICS Vol. 00 no. 00 2008 Pages 9 Biomolecular Network Motif Counting and Discovery by Color Coding Noga Alon, Phuong Dao 2, Iman Hajirasouliha 2, Fereydoun Hormozdiari 2, and S. Cenk Sahinalp
More informationGraph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
More informationSanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a
More informationIntroduction to Graph Mining
Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)
More informationLoad balancing in a heterogeneous computer system by selforganizing Kohonen network
Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by selforganizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.
More informationCSE 4351/5351 Notes 7: Task Scheduling & Load Balancing
CSE / Notes : Task Scheduling & Load Balancing Task Scheduling A task is a (sequential) activity that uses a set of inputs to produce a set of outputs. A task (precedence) graph is an acyclic, directed
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
More informationBinary Coded Web Access Pattern Tree in Education Domain
Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode638107, Tamil Nadu, India Email: kc.gomathi@gmail.com M. Moorthi
More informationGRAPH THEORY LECTURE 4: TREES
GRAPH THEORY LECTURE 4: TREES Abstract. 3.1 presents some standard characterizations and properties of trees. 3.2 presents several different types of trees. 3.7 develops a counting method based on a bijection
More informationGraph theoretic approach to analyze amino acid network
Int. J. Adv. Appl. Math. and Mech. 2(3) (2015) 3137 (ISSN: 23472529) Journal homepage: www.ijaamm.com International Journal of Advances in Applied Mathematics and Mechanics Graph theoretic approach to
More informationComparison of Data Mining Techniques for Money Laundering Detection System
Comparison of Data Mining Techniques for Money Laundering Detection System Rafał Dreżewski, Grzegorz Dziuban, Łukasz Hernik, Michał Pączek AGH University of Science and Technology, Department of Computer
More informationA New Marketing Channel Management Strategy Based on Frequent Subtree Mining
A New Marketing Channel Management Strategy Based on Frequent Subtree Mining Daoping Wang Peng Gao School of Economics and Management University of Science and Technology Beijing ABSTRACT For most manufacturers,
More informationStatistical and computational challenges in networks and cybersecurity
Statistical and computational challenges in networks and cybersecurity Hugh Chipman Acadia University June 12, 2015 Statistical and computational challenges in networks and cybersecurity May 48, 2015,
More informationOptimal Network Alignment with Graphlet Degree Vectors
Cancer Informatics Original Research Open Access Full open access to this and thousands of other papers at http://www.lapress.com. Optimal Network Alignment with Graphlet Degree Vectors Tijana Milenković,2,
More informationBIG DATA VISUALIZATION. Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou
BIG DATA VISUALIZATION Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou Let s begin with a story Let s explore Yahoo s data! Dora the Data Explorer has a new
More informationSystems and Algorithms for Big Data Analytics
Systems and Algorithms for Big Data Analytics YAN, Da Email: yanda@cse.cuhk.edu.hk My Research Graph Data Distributed Graph Processing Spatial Data Spatial Query Processing Uncertain Data Querying & Mining
More informationMobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
More informationPerformance Analysis and Optimization Tool
Performance Analysis and Optimization Tool Andres S. CHARIFRUBIAL andres.charif@uvsq.fr Performance Analysis Team, University of Versailles http://www.maqao.org Introduction Performance Analysis Develop
More information5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.
1. The advantage of.. is that they solve the problem if sequential storage representation. But disadvantage in that is they are sequential lists. [A] Lists [B] Linked Lists [A] Trees [A] Queues 2. The
More informationGraph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R.0, steen@cs.vu.nl Chapter 0: Version: April 8, 0 / Contents Chapter Description 0: Introduction
More informationInSitu Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University
InSitu Bitmaps Generation and Efficient Data Analysis based on Bitmaps Yu Su, Yi Wang, Gagan Agrawal The Ohio State University Motivation HPC Trends Huge performance gap CPU: extremely fast for generating
More informationFast Matching of Binary Features
Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been
More informationClustering through Decision Tree Construction in Geology
Nonlinear Analysis: Modelling and Control, 2001, v. 6, No. 2, 2941 Clustering through Decision Tree Construction in Geology Received: 22.10.2001 Accepted: 31.10.2001 A. Juozapavičius, V. Rapševičius Faculty
More informationBranchandPrice Approach to the Vehicle Routing Problem with Time Windows
TECHNISCHE UNIVERSITEIT EINDHOVEN BranchandPrice Approach to the Vehicle Routing Problem with Time Windows Lloyd A. Fasting May 2014 Supervisors: dr. M. Firat dr.ir. M.A.A. Boon J. van Twist MSc. Contents
More informationUSING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE FREE NETWORKS AND SMALLWORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE FREE NETWORKS AND SMALLWORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu
More informationA Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
More informationRandom graphs with a given degree sequence
Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.
More informationGraph Mining and Social Network Analysis. Data Mining; EECS 4412 Darren Rolfe + Vince Chu 11.06.14
Graph Mining and Social Network nalysis Data Mining; EES 4412 Darren Rolfe + Vince hu 11.06.14 genda Graph Mining Methods for Mining Frequent Subgraphs prioribased pproach: GM, FSG PatternGrowth pproach:
More informationComplex Network Analysis of Brain Connectivity: An Introduction LABREPORT 5
Complex Network Analysis of Brain Connectivity: An Introduction LABREPORT 5 Fernando FerreiraSantos 2012 Title: Complex Network Analysis of Brain Connectivity: An Introduction Technical Report Authors:
More informationDistributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu
Distributed Aggregation in Cloud Databases By: Aparna Tiwari tiwaria@umail.iu.edu ABSTRACT Data intensive applications rely heavily on aggregation functions for extraction of data according to user requirements.
More informationCourse 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
More informationPARALLEL PROGRAMMING
PARALLEL PROGRAMMING TECHNIQUES AND APPLICATIONS USING NETWORKED WORKSTATIONS AND PARALLEL COMPUTERS 2nd Edition BARRY WILKINSON University of North Carolina at Charlotte Western Carolina University MICHAEL
More informationDetection of local affinity patterns in big data
Detection of local affinity patterns in big data Andrea Marinoni, Paolo Gamba Department of Electronics, University of Pavia, Italy Abstract Mining information in Big Data requires to design a new class
More informationSo today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
More informationThe University of Jordan
The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S
More informationMassive Streaming Data Analytics: A Case Study with Clustering Coefficients. David Ediger, Karl Jiang, Jason Riedy and David A.
Massive Streaming Data Analytics: A Case Study with Clustering Coefficients David Ediger, Karl Jiang, Jason Riedy and David A. Bader Overview Motivation A Framework for Massive Streaming hello Data Analytics
More informationCAD Algorithms. P and NP
CAD Algorithms The Classes P and NP Mohammad Tehranipoor ECE Department 6 September 2010 1 P and NP P and NP are two families of problems. P is a class which contains all of the problems we solve using
More informationAsking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate  R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
More informationImplementing Graph Pattern Queries on a Relational Database
LLNLTR400310 Implementing Graph Pattern Queries on a Relational Database Ian L. Kaplan, Ghaleb M. Abdulla, S Terry Brugger, Scott R. Kohn January 8, 2008 Disclaimer This document was prepared as an account
More informationVisual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics
Motivation Visual Data Mining Visualization for Data Mining Huge amounts of information Limited display capacity of output devices Chidroop Madhavarapu CSE 591:Visual Analytics Visual Data Mining (VDM)
More informationReal Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
More informationMapReduce for Machine Learning on Multicore
MapReduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers  dual core to 12+core Shift to more concurrent programming paradigms and languages Erlang,
More informationSCAN: A Structural Clustering Algorithm for Networks
SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected
More informationSearch and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social
More informationA Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader
A Performance Evaluation of Open Source Graph Databases Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader Overview Motivation Options Evaluation Results Lessons Learned Moving Forward
More informationClustering. Adrian Groza. Department of Computer Science Technical University of ClujNapoca
Clustering Adrian Groza Department of Computer Science Technical University of ClujNapoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 Kmeans 3 Hierarchical Clustering What is Datamining?
More informationChapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
More informationNetwork Algorithms for Homeland Security
Network Algorithms for Homeland Security Mark Goldberg and Malik MagdonIsmail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially
More informationPractical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
More informationSupervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
More informationHigh Throughput Network Analysis
High Throughput Network Analysis Sumeet Agarwal 1,2, Gabriel Villar 1,2,3, and Nick S Jones 2,4,5 1 Systems Biology Doctoral Training Centre, University of Oxford, Oxford OX1 3QD, United Kingdom 2 Department
More informationNetwork Metrics, Planar Graphs, and Software Tools. Based on materials by Lala Adamic, UMichigan
Network Metrics, Planar Graphs, and Software Tools Based on materials by Lala Adamic, UMichigan Network Metrics: Bowtie Model of the Web n The Web is a directed graph: n webpages link to other webpages
More informationStorage and Retrieval of Large RDF Graph Using Hadoop and MapReduce
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, and Bhavani Thuraisingham University of Texas at Dallas, Dallas TX 75080, USA Abstract.
More informationDecision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
More informationBig Data Systems CS 5965/6965 FALL 2015
Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html
More informationData Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototypebased clustering Densitybased clustering Graphbased
More informationModelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
More informationChapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining BecerraFernandez, et al.  Knowledge Management 1/e  2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
More informationA comparative study of social network analysis tools
Membre de Membre de A comparative study of social network analysis tools David Combe, Christine Largeron, Előd EgyedZsigmond and Mathias Géry International Workshop on Web Intelligence and Virtual Enterprises
More informationGraph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
More informationAnalysis of MapReduce Algorithms
Analysis of MapReduce Algorithms Harini Padmanaban Computer Science Department San Jose State University San Jose, CA 95192 4089241000 harini.gomadam@gmail.com ABSTRACT MapReduce is a programming model
More informationWEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS
WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS Biswajit Biswal Oracle Corporation biswajit.biswal@oracle.com ABSTRACT With the World Wide Web (www) s ubiquity increase and the rapid development
More informationUsing InMemory Computing to Simplify Big Data Analytics
SCALEOUT SOFTWARE Using InMemory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed
More informationExponential time algorithms for graph coloring
Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A klabeling of vertices of a graph G(V, E) is a function V [k].
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 8. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic
More informationOPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION
OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION Sérgio Pequito, Stephen Kruzick, Soummya Kar, José M. F. Moura, A. Pedro Aguiar Department of Electrical and Computer Engineering
More informationData Warehousing und Data Mining
Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing GridFiles Kdtrees Ulf Leser: Data
More informationurika! Unlocking the Power of Big Data at PSC
urika! Unlocking the Power of Big Data at PSC Nick Nystrom Director, Strategic Applications Pittsburgh Supercomputing Center February 1, 2013 nystrom@psc.edu 2013 Pittsburgh Supercomputing Center Big Data
More informationMapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12
MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:
More information