Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer

Contents
What is community detection?; Motivation; Defining a community; Methods to find communities (overlapping communities and the clique percolation method; finding a community with query nodes); Conclusion

What is Community Detection?
Community detection differs from traditional clustering: its algorithms exploit the properties of the graph itself. Graphs of natural origin have a structure that is not random, and we try to find these structures by analyzing the graph. A perfect solution has yet to be found.

Motivation
Communities can represent parts of a larger system (like organs in the human body). Communities can be considered a summary of the graph. Communities make it easier to visualize and understand complex systems. Communities on the web might represent pages on related topics. Communities can reveal the properties of a group without releasing private information about individuals.

Defining a Community
There is no exact definition of a community in a graph; it depends on the application. A general definition requires separation between nodes in different communities and cohesion between nodes in the same community. The differences between algorithms come down to the precise definition they use.

Basics
For a graph $G = (V, E)$ and a subgraph $C \subseteq G$ with $|G| = |V| = n$ and $|C| = n_c$, the internal density $\varphi_{int}(C)$ should be higher than the density of the whole graph, and the external density $\varphi_{ext}(C)$ should be much lower. Local definitions see communities as autonomous entities within a larger system. Global definitions see communities as essential parts of the system as a whole. Definitions based on vertex similarity compare individual nodes and group them according to a similarity measure.
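The density criterion can be checked numerically. This is a minimal sketch that assumes the intra- and inter-cluster densities from Fortunato's review [2]. Here $\varphi_{int}(C)$ is the number of edges inside C divided by $n_c(n_c-1)/2$, and $\varphi_{ext}(C)$ is the number of edges leaving C divided by $n_c(n-n_c)$. The density of the whole graph is $m/(n(n-1)/2)$ with $m = |E|$. The function name `densities` and the toy graph (two triangles joined by a single edge) are hypothetical. The graph is assumed to be simple and undirected, with $2 \le n_c < n$.

```python
def densities(edges, nodes, community):
    """Return (phi_int, phi_ext, phi_graph) for a simple undirected graph.

    Assumes Fortunato-style density definitions and 2 <= |C| < |V|.
    """
    C = set(community)
    n, nc, m = len(set(nodes)), len(C), len(edges)
    # Edges with both endpoints in C, and edges crossing the boundary of C
    internal = sum(1 for u, v in edges if u in C and v in C)
    external = sum(1 for u, v in edges if (u in C) != (v in C))
    phi_int = internal / (nc * (nc - 1) / 2)  # share of possible internal edges
    phi_ext = external / (nc * (n - nc))      # share of possible boundary edges
    phi_graph = m / (n * (n - 1) / 2)         # average link density of G
    return phi_int, phi_ext, phi_graph


# Hypothetical toy graph: triangles {1, 2, 3} and {4, 5, 6} joined by edge (3, 4)
EDGES = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
NODES = range(1, 7)

phi_int, phi_ext, phi_graph = densities(EDGES, NODES, {1, 2, 3})
print(phi_int, phi_ext, phi_graph)
```

For C = {1, 2, 3}, the internal density is 1.0 because the triangle is complete. The external density is 1/9 because one of nine possible boundary edges exists. The graph density is 7/15. The internal density is therefore well above the graph density, and the external density is well below it, as the definition requires.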

Methods
Finding overlapping communities: the clique percolation method (CPM). Finding communities with query nodes.

Clique Percolation Method
CPM is based on the idea that communities are likely to consist of cliques. The underlying assumption is that every node in a community is connected to nearly every other node of the same community. A community is built up from a chain of adjacent k-cliques, where two k-cliques are adjacent if they share k-1 nodes. The largest possible chain, that is, the union of all k-cliques that can be reached from one another through adjacent k-cliques, is defined as a community. This is a local definition.
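The definition above can be illustrated directly, although not efficiently, in a minimal Python sketch. It enumerates every k-clique by brute force and links two k-cliques when they share k-1 nodes. It then returns the union of the nodes in each connected chain. The helper names and the toy graph are hypothetical. Brute-force enumeration is only feasible for tiny graphs.

```python
from itertools import combinations


def build_adj(edges):
    """Adjacency sets for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_cliques(adj, k):
    """All k-cliques, found by brute force (only feasible for tiny graphs)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]


def k_clique_communities(adj, k):
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two k-cliques are adjacent if they share k-1 nodes: merge their chains
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of all k-cliques in one chain
    chains = {}
    for i, clique in enumerate(cliques):
        chains.setdefault(find(i), set()).update(clique)
    return [frozenset(c) for c in chains.values()]


# Hypothetical graph: triangles {1,2,3} and {2,3,4} share an edge,
# while {4,5,6} shares only node 4 with them
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
print(k_clique_communities(build_adj(EDGES), 3))
```

With k = 3, the sketch finds two communities, {1, 2, 3, 4} and {4, 5, 6}. Node 4 belongs to both, which shows how CPM produces overlapping communities. For real graphs, NetworkX provides `networkx.algorithms.community.k_clique_communities`.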

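Implementation of CPM
The number of possible k-cliques in a graph is very high, so implementations search for the maximal cliques instead (an NP-hard problem). From these, we build a clique-clique overlap matrix O. Its entry $O_{ij}$ is the number of nodes shared by cliques i and j, and its diagonal holds the clique sizes. All off-diagonal entries smaller than k-1 and all diagonal entries smaller than k are removed. The connected components of what remains are the k-clique communities.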
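The matrix-based procedure can be sketched in Python as follows. This sketch assumes that the maximal cliques are enumerated with the Bron-Kerbosch algorithm with pivoting. That is a standard choice, but not necessarily the one CFinder uses. Diagonal entries below k and off-diagonal entries below k-1 are treated as removed. A depth-first search over the remaining entries collects each community. The function names and the toy graph are hypothetical.

```python
def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting: returns every maximal clique once."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def cpm_overlap(adj, k):
    cliques = maximal_cliques(adj)
    n = len(cliques)
    # Clique-clique overlap matrix: O[i][j] = number of shared nodes,
    # so the diagonal O[i][i] holds the size of clique i
    O = [[len(cliques[i] & cliques[j]) for j in range(n)] for i in range(n)]
    keep = [i for i in range(n) if O[i][i] >= k]  # cliques of size >= k
    seen, communities = set(), []
    for start in keep:
        if start in seen:
            continue
        # Depth-first search over overlaps of at least k-1 nodes
        seen.add(start)
        stack, members = [start], set()
        while stack:
            i = stack.pop()
            members |= cliques[i]
            for j in keep:
                if j not in seen and O[i][j] >= k - 1:
                    seen.add(j)
                    stack.append(j)
        communities.append(frozenset(members))
    return communities


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Hypothetical graph: a 4-clique {1, 2, 3, 4} plus a triangle {4, 5, 6} at node 4
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
ADJ = build_adj(EDGES)
print(cpm_overlap(ADJ, 3))
print(cpm_overlap(ADJ, 4))
```

The example shows how the parameter k changes the result. With k = 3, both maximal cliques survive, but they overlap in only one node, so they form two separate communities. With k = 4, only the 4-clique remains.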

Figure: Results of processing the example graph with the CFinder software for the parameters k = 3 and k = 4.

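Drawbacks
Even though the underlying problem is NP-hard, the algorithm is reasonably fast on large sparse graphs. Some cases, however, lead to useless results. The method looks for cliques rather than dense subgraphs in general. It also requires a large number of cliques, but not too many. With too few cliques, many nodes are not assigned to any community; with too many, the communities can merge into one that covers most of the graph.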

Finding a community with query nodes
The goal is to find a subgraph H that contains a given set Q of query nodes and is densely connected. A goodness function f, which measures how densely H is connected, is maximized over all possible choices of H. Here we choose f to be the minimum degree of a node in H. Additionally, we add a distance constraint d that limits how far the nodes of H may be from the query nodes.
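To make the objective concrete, here is a minimal Python sketch that scores a candidate H. It returns f(H), the minimum degree inside the induced subgraph, or None if H is infeasible. The distance constraint is modeled here as "every node of H lies within d hops of every query node, measured inside H". This is an assumption, and the original paper may define the distance differently. The function names are hypothetical, and Q is assumed to be non-empty.

```python
from collections import deque


def hop_distances(adj, H, src):
    """BFS hop distances from src, moving only through nodes of H."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u] & H:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def score(adj, H, Q, d):
    """f(H) = minimum degree inside H, or None if H is not a feasible answer."""
    H = set(H)
    if not set(Q) <= H:
        return None  # H must contain every query node
    dists = [hop_distances(adj, H, q) for q in Q]
    if len(dists[0]) != len(H):
        return None  # H must be connected
    if any(dist[v] > d for dist in dists for v in H):
        return None  # assumed distance constraint is violated
    return min(len(adj[v] & H) for v in H)


# Hypothetical graph: a 4-clique {1,2,3,4}, a path 4-5-6 and a pendant node 7
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (1, 7)]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

print(score(ADJ, {1, 2, 3, 4}, {1, 2}, d=1))
print(score(ADJ, {1, 2, 3, 4, 5}, {1, 2}, d=1))
```

The 4-clique scores 3 because every node has three neighbors inside H. Adding node 5 makes H infeasible for d = 1, since node 5 is two hops from the query nodes.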

Without size restriction: greedy algorithm
Choose $f(H)$ to be the minimum degree of a node in H. Set $G_0 = G$, then repeat: obtain $G_{t+1}$ from $G_t$ by removing a node that violates the distance constraint or has the minimum degree. Terminate when either one of the query nodes has the minimum degree or the query nodes are no longer connected. Among all graphs $G_t$, choose the connected component containing Q for which the minimum degree $f(H)$ is maximized. This can be implemented in $O(n + m)$ time, where $m = |E|$.
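The greedy peeling loop can be sketched in Python as follows. The sketch is written for readability rather than for the $O(n + m)$ bound: each round recomputes degrees and BFS distances. It also makes several assumptions, all hypothetical. The graph is stored as a dict of adjacency sets. The distance constraint means "within d hops of every query node in the current graph". As a simplification, each round keeps only the connected component containing Q, so the minimum degree is taken within that component.

```python
from collections import deque


def bfs_dist(adj, alive, src):
    """Hop distances from src using only nodes in `alive`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u] & alive:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def greedy(adj, Q, d=float("inf")):
    """Return (best component H, f(H)) found while peeling G_0 = G, G_1, ..."""
    alive = set(adj)
    best, best_f = None, -1
    while True:
        dists = [bfs_dist(adj, alive, q) for q in Q]
        if any(q not in dists[0] for q in Q):
            break  # the query nodes are no longer connected
        alive = set(dists[0])  # component of G_t that contains Q
        deg = {v: len(adj[v] & alive) for v in alive}
        f = min(deg.values())
        # Assumed distance constraint: within d hops of every query node
        far = [v for v in alive
               if v not in Q and max(dist[v] for dist in dists) > d]
        if not far and f > best_f:
            best, best_f = set(alive), f  # feasible G_t with a larger f
        if far:
            alive.discard(far[0])  # remove a node violating the distance constraint
        elif any(deg[q] == f for q in Q):
            break  # a query node has the minimum degree
        else:
            alive.discard(min(alive, key=deg.get))  # remove a minimum-degree node
    return best, best_f


# Hypothetical graph: a 4-clique {1,2,3,4}, a path 4-5-6 and a pendant node 7
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (1, 7)]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

print(greedy(ADJ, {1, 2}))
```

With Q = {1, 2}, the loop first peels away the low-degree nodes 5, 6 and 7. It then stops at the 4-clique {1, 2, 3, 4}, where f = 3. With Q = {1, 6}, query node 6 already has the minimum degree in $G_0$, so the whole graph is returned with f = 1.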

Figure: The greedy algorithm without size constraint, applied to the example graph with query nodes Q = {1, 2, 3}.

Communities with size restriction
Adding a size constraint k makes the problem NP-hard (this can be shown via a reduction from the Steiner tree problem). However, it can be assumed that the size of the result is correlated with the distance constraint d. The paper proposes two heuristics. GreedyDist repeatedly executes Greedy, decreasing d until the resulting graph has at most k nodes. GreedyFast restricts the graph to the k nodes closest to the query nodes and then invokes Greedy.
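A minimal Python sketch of GreedyFast follows, under several stated assumptions. "Closest" is taken to mean the smallest total hop distance to all query nodes, and query nodes are always kept. The query nodes are assumed to be connected, with k at least |Q|. The Greedy step is a plain min-degree peeling without a distance constraint, included so the block is self-contained. All names are hypothetical, and the paper's exact ranking may differ.

```python
from collections import deque


def bfs_dist(adj, alive, src):
    """Hop distances from src using only nodes in `alive`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u] & alive:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def peel(adj, alive, Q):
    """Greedy min-degree peeling restricted to `alive` (no distance constraint)."""
    alive = set(alive)
    best, best_f = None, -1
    while True:
        comp = bfs_dist(adj, alive, next(iter(Q)))
        if any(q not in comp for q in Q):
            break  # query nodes are disconnected
        alive = set(comp)
        deg = {v: len(adj[v] & alive) for v in alive}
        f = min(deg.values())
        if f > best_f:
            best, best_f = set(alive), f
        if any(deg[q] == f for q in Q):
            break  # a query node has the minimum degree
        alive.discard(min(alive, key=deg.get))
    return best, best_f


def greedy_fast(adj, Q, k):
    """Keep the k nodes closest to Q (query nodes first), then run the peeling."""
    dists = [bfs_dist(adj, set(adj), q) for q in Q]
    reachable = [v for v in adj if all(v in dist for dist in dists)]
    # Assumed ranking: query nodes first, then by total hop distance to Q
    reachable.sort(key=lambda v: (v not in Q, sum(dist[v] for dist in dists)))
    return peel(adj, reachable[:k], Q)


# Hypothetical graph: a 4-clique {1,2,3,4}, a path 4-5-6 and a pendant node 7
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (1, 7)]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

print(greedy_fast(ADJ, {1, 2}, k=4))
```

The restriction step is what keeps GreedyFast cheap: Greedy only runs on at most k nodes. The trade-off is that a dense region just outside the k closest nodes is never considered.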

Evaluation with the DBLP dataset
The goal was to find a network of scientific collaboration around Christos Papadimitriou.

Conclusion
Community detection is a very broad topic with many applications. Each algorithm is built with different problems in mind. Algorithms are difficult to compare, since there is no standard way of testing them.

Bibliography
[1] P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 5:17-61.
[2] S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75-174.
[3] P. F. Jonsson and P. A. Bates. Global topological features of cancer proteins in the human interactome. Bioinformatics.
[4] T. Heimo, J. Saramäki, J.-P. Onnela, and K. Kaski. Spectral and network methods in the analysis of correlation matrices of stock returns. Physica A, 383.
[5] J. M. Kumpula, M. Kivelä, K. Kaski, and J. Saramäki. Sequential algorithm for fast clique percolation. Phys. Rev. E, 78:026109, Aug.
[6] G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435, June.
[7] M. A. Porter, J.-P. Onnela, and P. J. Mucha. Communities in networks. Notices of the American Mathematical Society.
[8] M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10, New York, NY, USA. ACM.
[9] K.-F. Wong and W. Gao. Information Retrieval Technology. Springer Berlin Heidelberg.


More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH NATIONAL UNIVERSITY OF SINGAPORE

SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH NATIONAL UNIVERSITY OF SINGAPORE SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH NATIONAL UNIVERSITY OF SINGAPORE 2012 SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH (M.Sc., SFU, Russia) A THESIS

More information

DATA MINING - SELECTED TOPICS

DATA MINING - SELECTED TOPICS DATA MINING - SELECTED TOPICS Peter Brezany Institute for Software Science University of Vienna E-mail : brezany@par.univie.ac.at 1 MINING SPATIAL DATABASES 2 Spatial Database Systems SDBSs offer spatial

More information

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms

More information

Differentially Private Analysis of

Differentially Private Analysis of Title: Name: Affil./Addr. Keywords: SumOriWork: Differentially Private Analysis of Graphs Sofya Raskhodnikova, Adam Smith Pennsylvania State University Graphs, privacy, subgraph counts, degree distribution

More information

A box-covering algorithm for fractal scaling in scale-free networks

A box-covering algorithm for fractal scaling in scale-free networks CHAOS 17, 026116 2007 A box-covering algorithm for fractal scaling in scale-free networks J. S. Kim CTP & FPRD, School of Physics and Astronomy, Seoul National University, NS50, Seoul 151-747, Korea K.-I.

More information

Nonorthogonal Decomposition of Binary Matrices for Bounded-Error Data Compression and Analysis

Nonorthogonal Decomposition of Binary Matrices for Bounded-Error Data Compression and Analysis Nonorthogonal Decomposition of Binary Matrices for Bounded-Error Data Compression and Analysis MEHMET KOYUTÜRK and ANANTH GRAMA Department of Computer Sciences, Purdue University and NAREN RAMAKRISHNAN

More information

Social Network Analysis

Social Network Analysis Social Network Analysis Challenges in Computer Science April 1, 2014 Frank Takes (ftakes@liacs.nl) LIACS, Leiden University Overview Context Social Network Analysis Online Social Networks Friendship Graph

More information

Structural constraints in complex networks

Structural constraints in complex networks Structural constraints in complex networks Dr. Shi Zhou Lecturer of University College London Royal Academy of Engineering / EPSRC Research Fellow Part 1. Complex networks and three key topological properties

More information

Visualization of textual data: unfolding the Kohonen maps.

Visualization of textual data: unfolding the Kohonen maps. Visualization of textual data: unfolding the Kohonen maps. CNRS - GET - ENST 46 rue Barrault, 75013, Paris, France (e-mail: ludovic.lebart@enst.fr) Ludovic Lebart Abstract. The Kohonen self organizing

More information

Definition 11.1. Given a graph G on n vertices, we define the following quantities:

Definition 11.1. Given a graph G on n vertices, we define the following quantities: Lecture 11 The Lovász ϑ Function 11.1 Perfect graphs We begin with some background on perfect graphs. graphs. First, we define some quantities on Definition 11.1. Given a graph G on n vertices, we define

More information

Performance Metrics for Graph Mining Tasks

Performance Metrics for Graph Mining Tasks Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical

More information

Thank you! NetMine Data mining on networks IIS -0209107 AWSOM. Outline. Proposed method. Goals

Thank you! NetMine Data mining on networks IIS -0209107 AWSOM. Outline. Proposed method. Goals NetMine Data mining on networks IIS -0209107 Christos Faloutsos (CMU) Michalis Faloutsos (UCR) Peggy Agouris George Kollios Fillia Makedon Betty Salzberg Anthony Stefanidis Thank you! NSF-IDM 04 C. Faloutsos

More information

CSC2420 Spring 2015: Lecture 3

CSC2420 Spring 2015: Lecture 3 CSC2420 Spring 2015: Lecture 3 Allan Borodin January 22, 2015 1 / 1 Announcements and todays agenda Assignment 1 due next Thursday. I may add one or two additional questions today or tomorrow. Todays agenda

More information

Data Mining Fundamentals

Data Mining Fundamentals Part I Data Mining Fundamentals Data Mining: A First View Chapter 1 1.11 Data Mining: A Definition Data Mining The process of employing one or more computer learning techniques to automatically analyze

More information

How To Understand The Network Of A Network

How To Understand The Network Of A Network Roles in Networks Roles in Networks Motivation for work: Let topology define network roles. Work by Kleinberg on directed graphs, used topology to define two types of roles: authorities and hubs. (Each

More information

Keywords Big Graphs, Big graph databases, Triangulation method, k-mutual friend subgraph, Streaming.

Keywords Big Graphs, Big graph databases, Triangulation method, k-mutual friend subgraph, Streaming. Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Review on Big

More information

Emergent properties of large linked networks have recently

Emergent properties of large linked networks have recently Colloquium Tracking evolving communities in large linked networks John Hopcroft*, Omar Khan, Brian Kulis, and Bart Selman* *Department of Computer Science, Cornell University, Ithaca, NY 14853; Google,

More information

Implementing Web-Based Computing Services To Improve Performance And Assist Telemedicine Database Management System

Implementing Web-Based Computing Services To Improve Performance And Assist Telemedicine Database Management System Implementing Web-Based Computing Services To Improve Performance And Assist Telemedicine Database Management System D. A. Vidhate 1, Ige Pranita 2, Kothari Pooja 3, Kshatriya Pooja 4 (Information Technology,

More information

A number of tasks executing serially or in parallel. Distribute tasks on processors so that minimal execution time is achieved. Optimal distribution

A number of tasks executing serially or in parallel. Distribute tasks on processors so that minimal execution time is achieved. Optimal distribution Scheduling MIMD parallel program A number of tasks executing serially or in parallel Lecture : Load Balancing The scheduling problem NP-complete problem (in general) Distribute tasks on processors so that

More information