Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer



Content
- What is Community Detection?
- Motivation
- Defining a community
- Methods to find communities
- Overlapping communities
- Clique percolation method
- Finding a community with query nodes
- Conclusion

What is Community Detection?
- Different from traditional clustering
- The algorithms make use of the graph structure
- Graphs with a natural origin have a structure that is not random
- We try to find these structures by analyzing the graph
- A perfect solution has yet to be found

Motivation
- Communities can represent parts of a larger system (like organs in the human body)
- Communities can be considered a summary of the graph
- Communities make it easy to visualize and understand complex systems
- Communities on the web might represent pages on related topics
- Communities can reveal properties of a group without disclosing private information about individuals

Defining a Community
- There is no exact definition of a community in a graph
- It depends on the application
- A general definition:
- Separation between nodes in different communities
- Cohesion between nodes in the same community
- The differences between algorithms come down to the precise definition

Basics
- For a graph G = (V, E) and a subgraph C ⊆ G with |G| = |V| = n and |C| = n_c: φ_int(C) should be higher than the corresponding value for the whole graph, and φ_ext(C) should be much lower
- Local definitions see a community as an autonomous entity within a larger system
- Global definitions see communities as essential parts of a larger system
- Vertex similarity: compare individual nodes and group them based on a similarity measure
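The slide names φint and φext without defining them. The sketch below assumes they are the intra-cluster and inter-cluster edge densities (internal edges divided by the number of possible internal pairs, and boundary edges divided by the number of possible inside-outside pairs). The function name `densities` and the toy graph are illustrative, not taken from the slides.

```python
def densities(edges, community, n):
    """Return (phi_int, phi_ext) for a node set in an undirected graph.

    Assumed definitions (not spelled out in the slides):
      phi_int = internal edges / (n_c * (n_c - 1) / 2)
      phi_ext = boundary edges / (n_c * (n - n_c))
    """
    c = set(community)
    n_c = len(c)
    if n_c < 2 or n_c >= n:
        raise ValueError("community must have between 2 and n - 1 nodes")
    # Edges with both endpoints inside the community.
    internal = sum(1 for u, v in edges if u in c and v in c)
    # Edges with exactly one endpoint inside the community.
    boundary = sum(1 for u, v in edges if (u in c) != (v in c))
    phi_int = internal / (n_c * (n_c - 1) / 2)
    phi_ext = boundary / (n_c * (n - n_c))
    return phi_int, phi_ext


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
phi_int, phi_ext = densities(edges, {0, 1, 2}, n=6)
graph_density = len(edges) / (6 * 5 / 2)
```

For the triangle {0, 1, 2} the internal density exceeds the density of the whole graph, which in turn exceeds the external density. This is the ordering the slide asks for.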

Methods
- Finding overlapping communities
- Clique percolation method (CPM)
- Finding communities with query nodes

Clique Percolation Method
- CPM is based on the idea that communities are likely to consist of cliques
- Assumption: every node in a community is connected to nearly every other node in it
- A community is built up from a chain of adjacent k-cliques
- Two k-cliques are adjacent if they share k-1 nodes
- The largest possible chain is defined as a community
- This is a local definition
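The definition above can be sketched directly: enumerate all k-cliques, link two of them when they share k-1 nodes, and take the union of each linked group as a community. This is a brute-force illustration for toy graphs only, not how CFinder or any real implementation works. The names `build_adj` and `k_clique_communities` and the example graph are assumptions made for the sketch.

```python
from itertools import combinations


def build_adj(edges):
    """Build an adjacency dict (node -> set of neighbours) from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_clique_communities(adj, k):
    """Return communities as frozensets: unions of chains of adjacent k-cliques."""
    # Brute-force enumeration of all k-cliques (exponential; toy graphs only).
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two k-cliques are adjacent if they share exactly k - 1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return [frozenset(nodes) for nodes in groups.values()]


# Hypothetical graph: triangles {0,1,2} and {1,2,3} share an edge,
# while triangle {3,4,5} touches them only at node 3.
adj = build_adj([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
                 (3, 4), (3, 5), (4, 5)])
communities = k_clique_communities(adj, k=3)
```

With k = 3 node 3 ends up in both communities, which illustrates why CPM is listed under methods for overlapping communities.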

Implementation of CPM
- The number of possible k-cliques in a graph is quite high
- Implementations therefore search for maximal cliques (an NP-hard problem)
- We build a clique-clique overlap matrix O, whose entries count the nodes shared by two cliques
- All entries smaller than k-1 are removed
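A minimal sketch of the overlap-matrix approach follows. It makes two assumptions the slide does not state: maximal cliques are enumerated with a plain Bron-Kerbosch recursion (chosen here for brevity), and cliques with fewer than k nodes are discarded, which corresponds to removing diagonal entries smaller than k. The function names and the example graph are illustrative.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def cpm_from_overlap(adj, k):
    """Communities from the clique-clique overlap matrix."""
    cliques = maximal_cliques(adj)
    m = len(cliques)
    # overlap[i][j] = number of nodes shared by cliques i and j;
    # the diagonal entry overlap[i][i] is the size of clique i.
    overlap = [[len(a & b) for b in cliques] for a in cliques]
    keep = [overlap[i][i] >= k for i in range(m)]

    seen, communities = set(), []
    for start in range(m):
        if not keep[start] or start in seen:
            continue
        # Walk the components of the thresholded matrix:
        # cliques i and j stay linked when overlap[i][j] >= k - 1.
        stack, nodes = [start], set()
        seen.add(start)
        while stack:
            i = stack.pop()
            nodes |= cliques[i]
            for j in range(m):
                if keep[j] and j not in seen and overlap[i][j] >= k - 1:
                    seen.add(j)
                    stack.append(j)
        communities.append(frozenset(nodes))
    return communities


# Hypothetical graph: two triangles sharing an edge, plus a third
# triangle attached at a single node.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
result = cpm_from_overlap(adj, k=3)
```

Working on maximal cliques keeps the matrix small: every k-clique lies inside some maximal clique, so the components of the thresholded matrix give the same communities as chaining individual k-cliques.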

[Figure: results of processing the example graph with the CFinder software, for parameters k = 3 and k = 4]

Drawbacks
- Although the underlying problem is NP-hard, the algorithm is reasonably fast on large sparse graphs
- Some cases lead to useless results:
- It looks for cliques, not for dense subgraphs
- It requires a large number of cliques, but not too many

Finding a community with query nodes
- The goal is to find a subgraph H that contains a given set Q of query nodes and is densely connected
- Density is measured by a function f, which is maximized over all possible choices of H
- In this case we choose the minimum degree for f
- Additionally we add a distance constraint d

Without size restriction: Greedy algorithm
- Choose f(H) = minimum degree of a node in H
- Set G_0 = G, then repeat the following steps:
- Obtain G_{t+1} from G_t by removing a node that violates the distance constraint or has the minimum degree
- Terminate if one of the query nodes has the minimum degree or the query nodes are no longer connected
- Among the graphs G_t, choose the component containing the query nodes for which the minimum degree f(H) is maximized
- This can be implemented in O(n+m)
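The greedy steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the distance constraint is omitted, only the component containing the query nodes is tracked, ties in minimum degree are broken in favour of non-query nodes, and degrees are recomputed in every round, so it does not reach the O(n+m) bound. The names `component_of` and `greedy_community` and the example graph are hypothetical.

```python
def component_of(adj, alive, source):
    """Nodes reachable from `source` using only nodes in `alive`."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in alive and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def greedy_community(adj, query):
    """Return (nodes, min_degree) of the best subgraph containing `query`."""
    query = set(query)
    alive = set(adj)
    anchor = min(query)
    best, best_score = None, -1
    while True:
        comp = component_of(adj, alive, anchor)
        if not query <= comp:
            break  # the query nodes are no longer connected
        # Degrees inside the current component.
        deg = {u: sum(1 for v in adj[u] if v in comp) for u in comp}
        score = min(deg.values())
        if score > best_score:
            best, best_score = set(comp), score
        # Minimum-degree node; ties prefer non-query nodes (an assumption).
        u = min(comp, key=lambda x: (deg[x], x in query, x))
        if u in query:
            break  # a query node has the minimum degree
        alive = comp - {u}
    return best, best_score


# Hypothetical graph: a 4-clique {1,2,3,4} with a tail 4-5-6.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4},
       4: {1, 2, 3, 5}, 5: {4, 6}, 6: {5}}
community, min_degree = greedy_community(adj, {1, 2})
```

On this example the tail nodes 6 and 5 are peeled off first, which raises the minimum degree from 1 to 3 and leaves the 4-clique as the answer.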

[Figure: the greedy algorithm, without size constraint, applied to the example graph with Q = {1, 2, 3}]

Communities with size restriction
- A size constraint k makes the problem NP-hard (this can be shown via a reduction from the Steiner tree problem)
- But it can be assumed that the size of the result set is correlated with the distance constraint
- The paper proposes two heuristics:
- GreedyDist repeatedly executes Greedy and decreases d until the resulting graph has at most k nodes
- GreedyFast restricts the graph to the k nodes closest to the query nodes, then invokes Greedy
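The preprocessing step of GreedyFast can be sketched as a node selection by distance. The slide does not say how "closest to the query nodes" is measured. The sketch assumes the sum of squared shortest-path distances to all query nodes, which is only one plausible choice. The names `bfs_distances` and `k_closest_nodes` and the example graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in hops) from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def k_closest_nodes(adj, query, k):
    """Return the k nodes closest to the query set.

    Closeness is ASSUMED to be the sum of squared distances to all
    query nodes; the slides do not fix the measure.
    """
    per_query = [bfs_distances(adj, q) for q in query]
    # Only nodes reachable from every query node are candidates.
    reachable = [v for v in adj if all(v in d for d in per_query)]

    def score(v):
        return sum(d[v] ** 2 for d in per_query)

    ranked = sorted(reachable, key=lambda v: (score(v), v))
    return set(ranked[:k])


# Hypothetical graph: a 4-clique {1,2,3,4} with a tail 4-5-6.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4},
       4: {1, 2, 3, 5}, 5: {4, 6}, 6: {5}}
selected = k_closest_nodes(adj, [1, 2], k=4)
```

Greedy would then be run on the subgraph induced by `selected`, so the result can never exceed k nodes.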

Evaluation with the DBLP dataset
- The goal was to find a network of scientific collaboration around Christos Papadimitriou

Conclusion
- A really broad topic with lots of applications
- Each algorithm is built with different problems in mind
- Algorithms are difficult to compare because there is no standard way of testing them

Bibliography
[1] P. Erdos and A. Renyi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5:17–61.
[2] S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75–174.
[3] P. F. Jonsson and P. A. Bates. Global topological features of cancer proteins in the human interactome. Bioinformatics.
[4] T. H. J. S. J.-P. O. K. Kaski. Spectral and network methods in the analysis of correlation matrices of stock returns. Physica A 383.
[5] J. M. Kumpula, M. Kivelä, K. Kaski, and J. Saramäki. Sequential algorithm for fast clique percolation. Phys. Rev. E, 78:026109, Aug.
[6] G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435, June.
[7] M. E. Porter, K. Schwab, M. E. Porter, K. Schwab, F. Paua, E. T. Herrera, and M. Porter. Communities in networks. Notices of the American Mathematical Society.
[8] M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '10, New York, NY, USA. ACM.
[9] K.-F. W. Wei Gao. Information Retrieval Technology. Springer Berlin Heidelberg.

Graph Mining Techniques for Social Media Analysis

Graph Mining Techniques for Social Media Analysis Graph Mining Techniques for Social Media Analysis Mary McGlohon Christos Faloutsos 1 1-1 What is graph mining? Extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented

More information

Group CRM: a New Telecom CRM Framework from Social Network Perspective

Group CRM: a New Telecom CRM Framework from Social Network Perspective Group CRM: a New Telecom CRM Framework from Social Network Perspective Bin Wu Beijing University of Posts and Telecommunications Beijing, China wubin@bupt.edu.cn Qi Ye Beijing University of Posts and Telecommunications

More information

An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups

An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups Abstract Yan Shen 1, Bao Wu 2* 3 1 Hangzhou Normal University,

More information

Expansion Properties of Large Social Graphs

Expansion Properties of Large Social Graphs Expansion Properties of Large Social Graphs Fragkiskos D. Malliaros 1 and Vasileios Megalooikonomou 1,2 1 Computer Engineering and Informatics Department University of Patras, 26500 Rio, Greece 2 Data

More information

Complex Networks Analysis: Clustering Methods

Complex Networks Analysis: Clustering Methods Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich nefedov@isi.ee.ethz.ch 1 Outline Purpose to give an overview of modern graph-clustering methods and their applications

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will

More information

A scalable multilevel algorithm for graph clustering and community structure detection

A scalable multilevel algorithm for graph clustering and community structure detection A scalable multilevel algorithm for graph clustering and community structure detection Hristo N. Djidjev 1 Los Alamos National Laboratory, Los Alamos, NM 87545 Abstract. One of the most useful measures

More information

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

SEQUENCES OF MAXIMAL DEGREE VERTICES IN GRAPHS. Nickolay Khadzhiivanov, Nedyalko Nenov

SEQUENCES OF MAXIMAL DEGREE VERTICES IN GRAPHS. Nickolay Khadzhiivanov, Nedyalko Nenov Serdica Math. J. 30 (2004), 95 102 SEQUENCES OF MAXIMAL DEGREE VERTICES IN GRAPHS Nickolay Khadzhiivanov, Nedyalko Nenov Communicated by V. Drensky Abstract. Let Γ(M) where M V (G) be the set of all vertices

More information

Analysis of Internet Topologies

Analysis of Internet Topologies Analysis of Internet Topologies Ljiljana Trajković ljilja@cs.sfu.ca Communication Networks Laboratory http://www.ensc.sfu.ca/cnl School of Engineering Science Simon Fraser University, Vancouver, British

More information

Social Network Mining

Social Network Mining Social Network Mining Data Mining November 11, 2013 Frank Takes (ftakes@liacs.nl) LIACS, Universiteit Leiden Overview Social Network Analysis Graph Mining Online Social Networks Friendship Graph Semantics

More information

Graph Classification and Easy Reliability Polynomials

Graph Classification and Easy Reliability Polynomials Mathematical Assoc. of America American Mathematical Monthly 121:1 November 18, 2014 1:11 a.m. AMM.tex page 1 Graph Classification and Easy Reliability Polynomials Pablo Romero and Gerardo Rubino Abstract.

More information

EMPLOYMENT 2008 - Research associate, Statistical and Biological Physics Research

EMPLOYMENT 2008 - Research associate, Statistical and Biological Physics Research GERGELY PALLA - CURRICULUM VITAE CONTACT Statistical and Biological Physics Research Group of HAS, Eötvös University, Budapest, Pázmány P. stny. 1/A. H-1117 Hungary Phone: (36-1) 372-2768 Fax: (36-1) 372-2757

More information

Analysis of Internet Topologies: A Historical View

Analysis of Internet Topologies: A Historical View Analysis of Internet Topologies: A Historical View Mohamadreza Najiminaini, Laxmi Subedi, and Ljiljana Trajković Communication Networks Laboratory http://www.ensc.sfu.ca/cnl Simon Fraser University Vancouver,

More information

OPTIMIZED UTRAN TOPOLOGY PLANNING INCLUDING POINT-TO-MULTIPOINT EQUIPMENT

OPTIMIZED UTRAN TOPOLOGY PLANNING INCLUDING POINT-TO-MULTIPOINT EQUIPMENT 12th GI/ITG CONFERENCE ON MEASURING, MODELING AND EVALUATION OF COMPUTER AND COMMUNICATION SYSTEMS 3rd POLISH-GERMAN TELETRAFFIC SYMPOSIUM OPTIMIZED UTRAN TOPOLOGY PLANNING INCLUDING POINT-TO-MULTIPOINT

More information

Information flow in generalized hierarchical networks

Information flow in generalized hierarchical networks Information flow in generalized hierarchical networks Juan A. Almendral, Luis López and Miguel A. F. Sanjuán Grupo de Dinámica no Lineal y Teoría del Caos E.S.C.E.T., Universidad Rey Juan Carlos Tulipán

More information

The spectra of random graphs with given expected degrees

The spectra of random graphs with given expected degrees Classification: Physical Sciences, Mathematics The spectra of random graphs with given expected degrees by Fan Chung Linyuan Lu Van Vu Department of Mathematics University of California at San Diego La

More information

Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University Collective Behavior Prediction in Social Media Lei Tang Data Mining & Machine Learning Group Arizona State University Social Media Landscape Social Network Content Sharing Social Media Blogs Wiki Forum

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

A Performance Comparison of Five Algorithms for Graph Isomorphism

A Performance Comparison of Five Algorithms for Graph Isomorphism A Performance Comparison of Five Algorithms for Graph Isomorphism P. Foggia, C.Sansone, M. Vento Dipartimento di Informatica e Sistemistica Via Claudio, 21 - I 80125 - Napoli, Italy {foggiapa, carlosan,

More information

Travis Goodwin & Sanda Harabagiu

Travis Goodwin & Sanda Harabagiu Automatic Generation of a Qualified Medical Knowledge Graph and its Usage for Retrieving Patient Cohorts from Electronic Medical Records Travis Goodwin & Sanda Harabagiu Human Language Technology Research

More information

Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks

Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks Trend Motif: A Graph Mining Approach for Analysis of Dynamic Complex Networks Ruoming Jin, Scott McCallen Department of Computer Science,Kent State University, Kent, OH, 44241 {jin,smccalle}@cs.kent.edu

More information

Dmitri Krioukov CAIDA/UCSD

Dmitri Krioukov CAIDA/UCSD Hyperbolic geometry of complex networks Dmitri Krioukov CAIDA/UCSD dima@caida.org F. Papadopoulos, M. Boguñá, A. Vahdat, and kc claffy Complex networks Technological Internet Transportation Power grid

More information

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits Outline NP-completeness Examples of Easy vs. Hard problems Euler circuit vs. Hamiltonian circuit Shortest Path vs. Longest Path 2-pairs sum vs. general Subset Sum Reducing one problem to another Clique

More information

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network , pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

More information

CAD Algorithms. P and NP

CAD Algorithms. P and NP CAD Algorithms The Classes P and NP Mohammad Tehranipoor ECE Department 6 September 2010 1 P and NP P and NP are two families of problems. P is a class which contains all of the problems we solve using

More information

Graph theoretic approach to analyze amino acid network

Graph theoretic approach to analyze amino acid network Int. J. Adv. Appl. Math. and Mech. 2(3) (2015) 31-37 (ISSN: 2347-2529) Journal homepage: www.ijaamm.com International Journal of Advances in Applied Mathematics and Mechanics Graph theoretic approach to

More information

Big Data Graph Algorithms

Big Data Graph Algorithms Christian Schulz CompSE seminar, RWTH Aachen, Karlsruhe 1 Christian Schulz: Institute for Theoretical www.kit.edu Informatics Algorithm Engineering design analyze Algorithms implement experiment 1 Christian

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8]

2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8] Code No: R05220502 Set No. 1 1. (a) Describe the performance analysis in detail. (b) Show that f 1 (n)+f 2 (n) = 0(max(g 1 (n), g 2 (n)) where f 1 (n) = 0(g 1 (n)) and f 2 (n) = 0(g 2 (n)). [8+8] 2. (a)

More information

An Introduction to APGL

An Introduction to APGL An Introduction to APGL Charanpal Dhanjal February 2012 Abstract Another Python Graph Library (APGL) is a graph library written using pure Python, NumPy and SciPy. Users new to the library can gain an

More information

Discovering Overlapping Groups in Social Media

Discovering Overlapping Groups in Social Media Discovering Overlapping Groups in Social Media Xufei Wang Arizona State University Tempe, AZ 85287, USA Email:xufei.wang@asu.edu Lei Tang Yahoo! Labs Santa Clara, CA 9554, USA Email:ltang@yahoo-inc.com

More information

ALBERTA. Social Network Analysis for the Assessment of Learning UNIVERSITY OF. Osmar R. Zaïane Professor & Scientific Director of AICML

ALBERTA. Social Network Analysis for the Assessment of Learning UNIVERSITY OF. Osmar R. Zaïane Professor & Scientific Director of AICML UNIVERSITY OF ALBERTA Social Network Analysis for the Assessment of Learning Osmar R. Zaïane Professor & Scientific Director of AICML Educational Data Mining 2010 Pittsburgh, USA University of Alberta

More information

Structural and Relational Properties of Social Contact Networks with Applications to Public Health Informatics

Structural and Relational Properties of Social Contact Networks with Applications to Public Health Informatics NDSSL Technical Report 9-66 July 8, 29 Title: Structural and Relational Properties of Social Contact Networks with Applications to Public Health Informatics Authors: Maleq Khan V.S. Anil Kumar Madhav Marathe

More information

Introduction to Scheduling Theory

Introduction to Scheduling Theory Introduction to Scheduling Theory Arnaud Legrand Laboratoire Informatique et Distribution IMAG CNRS, France arnaud.legrand@imag.fr November 8, 2004 1/ 26 Outline 1 Task graphs from outer space 2 Scheduling

More information

Distributed Computing over Communication Networks: Maximal Independent Set

Distributed Computing over Communication Networks: Maximal Independent Set Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.

More information

Access control for data integration in presence of data dependencies. Mehdi Haddad, Mohand-Saïd Hacid

Access control for data integration in presence of data dependencies. Mehdi Haddad, Mohand-Saïd Hacid Access control for data integration in presence of data dependencies Mehdi Haddad, Mohand-Saïd Hacid 1 Outline Introduction Motivating example Related work Approach Detection phase (Re)configuration phase

More information

1 Basic Definitions and Concepts in Graph Theory

1 Basic Definitions and Concepts in Graph Theory CME 305: Discrete Mathematics and Algorithms 1 Basic Definitions and Concepts in Graph Theory A graph G(V, E) is a set V of vertices and a set E of edges. In an undirected graph, an edge is an unordered

More information

A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment

A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment Edmond H. Wu,MichaelK.Ng, Andy M. Yip,andTonyF.Chan Department of Mathematics, The University of Hong Kong Pokfulam Road,

More information

Inet-3.0: Internet Topology Generator

Inet-3.0: Internet Topology Generator Inet-3.: Internet Topology Generator Jared Winick Sugih Jamin {jwinick,jamin}@eecs.umich.edu CSE-TR-456-2 Abstract In this report we present version 3. of Inet, an Autonomous System (AS) level Internet

More information

Structural and functional analytics for community detection in large-scale complex networks

Structural and functional analytics for community detection in large-scale complex networks Chopade and Zhan Journal of Big Data DOI 10.1186/s40537-015-0019-y RESEARCH Open Access Structural and functional analytics for community detection in large-scale complex networks Pravin Chopade 1* and

More information

Discovering and Analyzing Deviant Communities: Methods and Experiments

Discovering and Analyzing Deviant Communities: Methods and Experiments Discovering and Analyzing Deviant Communities: Methods and Experiments Napoleon C. Paxton *, Dae-il Jang **, Ira S. Moskowitz *, Gail-Joon Ahn ** and Stephen Russell * * Information Technology Division,

More information

Entropy based Graph Clustering: Application to Biological and Social Networks

Entropy based Graph Clustering: Application to Biological and Social Networks Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving

More information

CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

More information

Finding and counting given length cycles

Finding and counting given length cycles Finding and counting given length cycles Noga Alon Raphael Yuster Uri Zwick Abstract We present an assortment of methods for finding and counting simple cycles of a given length in directed and undirected

More information

Complexity Theory. IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar

Complexity Theory. IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar Complexity Theory IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar Outline Goals Computation of Problems Concepts and Definitions Complexity Classes and Problems Polynomial Time Reductions Examples

More information

Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

More information

Joint Cluster Analysis of Attribute Data and Relationship Data: the Connected k-center Problem

Joint Cluster Analysis of Attribute Data and Relationship Data: the Connected k-center Problem Joint Cluster Analysis of Attribute Data and Relationship Data: the Connected k-center Problem Martin Ester, Rong Ge, Byron J. Gao, Zengjian Hu, Boaz Ben-Moshe School of Computing Science, Simon Fraser

More information

Keywords Big Graphs, Big graph databases, Triangulation method, k-mutual friend subgraph, Streaming.

Keywords Big Graphs, Big graph databases, Triangulation method, k-mutual friend subgraph, Streaming. Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Review on Big

More information

The Minimum Consistent Subset Cover Problem and its Applications in Data Mining

The Minimum Consistent Subset Cover Problem and its Applications in Data Mining The Minimum Consistent Subset Cover Problem and its Applications in Data Mining Byron J Gao 1,2, Martin Ester 1, Jin-Yi Cai 2, Oliver Schulte 1, and Hui Xiong 3 1 School of Computing Science, Simon Fraser

More information

Exponential time algorithms for graph coloring

Exponential time algorithms for graph coloring Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A k-labeling of vertices of a graph G(V, E) is a function V [k].

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Link Prediction in Social Networks

Link Prediction in Social Networks CS378 Data Mining Final Project Report Dustin Ho : dsh544 Eric Shrewsberry : eas2389 Link Prediction in Social Networks 1. Introduction Social networks are becoming increasingly more prevalent in the daily

More information

Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics

Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics Zhao Wenbin 1, Zhao Zhengxu 2 1 School of Instrument Science and Engineering, Southeast University, Nanjing, Jiangsu

More information

Beyond the Stars: Revisiting Virtual Cluster Embeddings

Beyond the Stars: Revisiting Virtual Cluster Embeddings Beyond the Stars: Revisiting Virtual Cluster Embeddings Matthias Rost Technische Universität Berlin September 7th, 2015, Télécom-ParisTech Joint work with Carlo Fuerst, Stefan Schmid Published in ACM SIGCOMM

More information

Small Maximal Independent Sets and Faster Exact Graph Coloring

Small Maximal Independent Sets and Faster Exact Graph Coloring Small Maximal Independent Sets and Faster Exact Graph Coloring David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science The Exact Graph Coloring Problem: Given an undirected

More information

Problem Set 7 Solutions

Problem Set 7 Solutions 8 8 Introduction to Algorithms May 7, 2004 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik Demaine and Shafi Goldwasser Handout 25 Problem Set 7 Solutions This problem set is due in

More information

Ant Colony Optimization and Constraint Programming

Ant Colony Optimization and Constraint Programming Ant Colony Optimization and Constraint Programming Christine Solnon Series Editor Narendra Jussien WILEY Table of Contents Foreword Acknowledgements xi xiii Chapter 1. Introduction 1 1.1. Overview of the

More information

Distance Degree Sequences for Network Analysis

Distance Degree Sequences for Network Analysis Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation

More information

Course Syllabus For Operations Management. Management Information Systems

Course Syllabus For Operations Management. Management Information Systems For Operations Management and Management Information Systems Department School Year First Year First Year First Year Second year Second year Second year Third year Third year Third year Third year Third

More information

NETZCOPE - a tool to analyze and display complex R&D collaboration networks

NETZCOPE - a tool to analyze and display complex R&D collaboration networks The Task Concepts from Spectral Graph Theory EU R&D Network Analysis Netzcope Screenshots NETZCOPE - a tool to analyze and display complex R&D collaboration networks L. Streit & O. Strogan BiBoS, Univ.

More information

ONLINE SOCIAL NETWORK MINING: CURRENT TRENDS AND RESEARCH ISSUES

ONLINE SOCIAL NETWORK MINING: CURRENT TRENDS AND RESEARCH ISSUES ONLINE SOCIAL NETWORK MINING: CURRENT TRENDS AND RESEARCH ISSUES G Nandi 1, A Das 1 & 2 1 Assam Don Bosco University Guwahati, Assam 781017, India 2 St. Anthony s College, Shillong, Meghalaya 793001, India

More information

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #18: Dimensionality Reduc7on Dimensionality Reduc=on Assump=on: Data lies on or near a low d- dimensional subspace Axes of this subspace

More information

Community-based Recommendations to Improve Intranet Users Productivity

Community-based Recommendations to Improve Intranet Users Productivity Aalto University School of Science Degree Programme of Computer Science and Engineering Roxana Ioana Roman Community-based Recommendations to Improve Intranet Users Productivity Universe Community Recommender

More information

Mining Maximal Cliques from a Large Graph using MapReduce: Tackling Highly Uneven Subproblem Sizes
