# Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer

## Transcription

### Content

- What is Community Detection?
- Motivation
- Defining a community
- Methods to find communities
- Overlapping communities
- Clique percolation method
- Finding a community with query nodes
- Conclusion

### What is Community Detection?

- Different from traditional clustering
- Algorithms make use of the graph structure
- Graphs with a natural origin have a structure that is not random
- We try to find these structures by analyzing the graph
- A perfect solution has yet to be found

### Motivation

- Communities can represent parts of a larger system (like organs in the human body)
- Communities can be considered a summary of the graph
- Communities make it easy to visualize and understand complex systems
- Communities on the web might represent pages on related topics
- Communities can reveal properties of the graph without releasing private information about individuals

### Defining a Community

- There is no exact definition of a community in a graph
- It depends on the application
- A general definition:
  - Separation between nodes in different communities
  - Cohesion between nodes in the same community
- The differences between algorithms come down to the precise definition

### Basics

- For a graph $G = (V, E)$ and a subgraph $C \subseteq G$ with $|G| = |V| = n$ and $|C| = n_c$: $\varphi_{int}(C)$ should be higher than the corresponding value for the whole graph, and $\varphi_{ext}(C)$ should be much lower
- Local definitions see a community as an autonomous entity within a larger system
- Global definitions see the communities as essential parts of a larger system
- Vertex similarity: compare individual nodes and group them based on a similarity measure
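The slide does not say how φint and φext are computed. A minimal sketch, assuming they are the intra-cluster and inter-cluster densities used in Fortunato's survey [2] (internal edges divided by the number of possible internal pairs, and boundary edges divided by the number of possible pairs between C and the rest of the graph). The function name `densities` and the toy graph are illustrative only.

```python
def densities(edges, nodes, community):
    """Return (phi_int, phi_ext) for `community` in an undirected simple graph."""
    community = set(community)
    n, n_c = len(nodes), len(community)
    # Edges with both endpoints inside the community.
    internal = sum(1 for u, v in edges if u in community and v in community)
    # Edges with exactly one endpoint inside the community.
    boundary = sum(1 for u, v in edges if (u in community) != (v in community))
    phi_int = internal / (n_c * (n_c - 1) / 2) if n_c > 1 else 0.0
    phi_ext = boundary / (n_c * (n - n_c)) if 0 < n_c < n else 0.0
    return phi_int, phi_ext


# Toy graph (assumption): two triangles joined by the single edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
nodes = {1, 2, 3, 4, 5, 6}

phi_int, phi_ext = densities(edges, nodes, {1, 2, 3})
graph_density = len(edges) / (len(nodes) * (len(nodes) - 1) / 2)
print(phi_int, phi_ext, graph_density)
```

For the triangle {1, 2, 3} the internal density lies above the density of the whole graph and the external density lies below it, which is the pattern the slide asks for.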

### Methods

- Finding overlapping communities
  - Clique percolation method (CPM)
- Finding communities with query nodes

### Clique Percolation Method

- CPM is based on the idea that communities are likely to consist of cliques
- Assumption: every node in a community is connected to nearly every other node of that community
- A community is built up from a chain of adjacent k-cliques
- Two k-cliques are adjacent if they share k-1 nodes
- The largest possible chain is defined as a community
- This is a local definition
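The definition above can be tried out directly on a small graph. The following is a brute-force sketch, not how CPM is implemented in practice: it enumerates every k-clique, merges cliques that share k-1 nodes with a union-find structure, and reports the union of each chain. The function name and the toy graph are assumptions made for illustration.

```python
from itertools import combinations


def k_clique_chains(adj, k):
    """Brute-force CPM: merge k-cliques that share k-1 nodes into communities."""
    nodes = sorted(adj)
    # Every k-subset whose nodes are pairwise connected is a k-clique.
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:  # adjacent k-cliques
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Toy graph (assumption): triangles {1,2,3} and {2,3,4} share an edge,
# triangle {4,5,6} touches them only in node 4.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4},
       4: {2, 3, 5, 6}, 5: {4, 6}, 6: {4, 5}}

communities = k_clique_chains(adj, 3)
print(communities)
```

Node 4 ends up in both communities, which illustrates why CPM is listed as a method for overlapping communities.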

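### Implementation of CPM

- The number of possible k-cliques in a graph is quite high
- Implementations therefore search for maximal cliques (an NP-hard problem)
- We build a clique-clique overlap matrix O, whose entries count the nodes shared by two cliques
- All entries smaller than k-1 are removed; cliques that remain connected form a community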
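A sketch of the overlap-matrix approach, under stated assumptions: maximal cliques are enumerated with the basic Bron-Kerbosch recursion (no pivoting), cliques with fewer than k nodes are dropped, and two cliques are linked when they share at least k-1 nodes. The connected components of the linked cliques are then reported as communities. All names and the toy graph are illustrative and not taken from CFinder.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def cpm_via_overlap_matrix(adj, k):
    # Cliques smaller than k cannot contain a k-clique, so they are dropped.
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    m = len(cliques)
    # overlap[i][j] = number of nodes shared by clique i and clique j.
    overlap = [[len(cliques[i] & cliques[j]) for j in range(m)] for i in range(m)]
    # Thresholding: keep only off-diagonal entries of at least k-1.
    linked = [[i != j and overlap[i][j] >= k - 1 for j in range(m)]
              for i in range(m)]
    communities, seen = [], set()
    for start in range(m):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:  # depth-first search over linked cliques
            i = stack.pop()
            members |= cliques[i]
            for j in range(m):
                if linked[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        communities.append(sorted(members))
    return sorted(communities)


# Toy graph (assumption): three triangles, two of which share an edge.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4},
       4: {2, 3, 5, 6}, 5: {4, 6}, 6: {4, 5}}

result = cpm_via_overlap_matrix(adj, 3)
print(result)
```

Working on maximal cliques keeps the matrix small: a large clique appears once instead of contributing all of its k-subsets.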

### Parameter k

[Figure: the results of processing the example graph with the CFinder software, for k = 3 and k = 4]

### Drawbacks

- Although the underlying problem is NP-hard, the algorithm is reasonably fast on large sparse graphs
- Some cases lead to useless results:
  - It looks for cliques, not for dense subgraphs in general
  - It requires a large number of cliques, but not too many

### Finding a community with query nodes

- The goal is to find a subgraph H that contains a given set Q of query nodes and is densely connected
- A goodness function f is maximized over all possible choices of H
- In this case we choose the minimum degree for f
- Additionally, we add a distance constraint d

### Without size restriction - Greedy algorithm

- Choose f = f(H) = minimum degree of a node in H
- We set $G_0 = G$ and then repeat the following steps:
  - Obtain $G_{t+1}$ from $G_t$ by removing a node that violates the distance constraint or has the minimum degree
  - Terminate if one of the query nodes has minimum degree or the query nodes are no longer connected
- Among the graphs $G_t$, we choose the component containing the query nodes for which the minimum degree f(H) is maximized
- This can be implemented in O(n+m)
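The peeling steps above can be sketched as follows. This is a simplified illustration, not the implementation from the paper: the distance constraint is left out, only the component containing the query nodes is tracked, and degrees are recomputed in every round, so it does not reach the O(n+m) bound (that would need bucketed degree lists). The function names and the toy graph are assumptions.

```python
from collections import deque


def component_of(adj, alive, start):
    """Return the nodes of `alive` reachable from `start` (breadth-first search)."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in alive and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def greedy_community(adj, query):
    """Peel minimum-degree nodes; return (best node set, its minimum degree)."""
    query = set(query)
    alive = set(adj)
    best, best_score = None, -1
    anchor = next(iter(query))
    while True:
        comp = component_of(adj, alive, anchor)
        if not query <= comp:
            break  # the query nodes are no longer connected
        deg = {u: sum(1 for v in adj[u] if v in comp) for u in comp}
        score = min(deg.values())
        if score > best_score:
            best, best_score = set(comp), score
        if any(deg[q] == score for q in query):
            break  # a query node has minimum degree
        # Remove one non-query node of minimum degree (ties broken by label).
        victim = min(u for u in comp if deg[u] == score)
        alive = comp - {victim}
    return best, best_score


# Toy graph (assumption): a 4-clique {1,2,3,4}, a tail 4-5-6 and a pendant node 7.
adj = {1: {2, 3, 4, 7}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5},
       5: {4, 6}, 6: {5}, 7: {1}}

community, min_degree = greedy_community(adj, {1, 2})
print(community, min_degree)
```

On this graph the tail and the pendant node are peeled off first, and the loop stops once the query nodes themselves have minimum degree inside the remaining 4-clique.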

[Figure: the greedy algorithm, without size constraint, applied to the example graph with Q = {1, 2, 3}]

### Communities with size restriction

- A size constraint k makes the problem NP-hard (this can be shown via a reduction from the Steiner tree problem)
- But it can be assumed that the size of the result set is correlated with the distance constraint
- The paper proposes two heuristics:
  - GreedyDist repeatedly executes Greedy and decreases d until the size of the resulting graph is small enough (at most k)
  - GreedyFast restricts the graph to the k nodes closest to the query nodes and then invokes Greedy
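The preprocessing step of GreedyFast can be sketched as below. The slide does not state how closeness to the query nodes is measured, so this sketch assumes the sum of hop distances to all query nodes. Query nodes are always kept, and the remaining slots are filled with the closest other nodes. Greedy would then be run on the returned subgraph. All names and the toy graph are hypothetical.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search distances (in hops) from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closest_nodes_subgraph(adj, query, k):
    """Induced subgraph on the query nodes plus their closest neighbours (k nodes)."""
    per_query = [hop_distances(adj, q) for q in query]
    # Only nodes reachable from every query node are candidates.
    reachable = [u for u in adj if all(u in d for d in per_query)]
    # Assumed closeness measure: total hop distance to all query nodes.
    ranked = sorted(reachable, key=lambda u: (sum(d[u] for d in per_query), u))
    keep = set(query)
    for u in ranked:
        if len(keep) >= k:
            break
        keep.add(u)
    return {u: adj[u] & keep for u in keep}


# Toy graph (assumption): a 4-clique {1,2,3,4}, a tail 4-5-6 and a pendant node 7.
adj = {1: {2, 3, 4, 7}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5},
       5: {4, 6}, 6: {5}, 7: {1}}

restricted = closest_nodes_subgraph(adj, {1, 2}, 4)
print(sorted(restricted))
```

Because the restriction happens once, before any peeling, the expensive part of the search runs on at most k nodes.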

### Evaluation with the DBLP dataset

- The goal was to find a network of scientific collaboration around Christos Papadimitriou

### Conclusion

- A very broad topic with many applications
- Each algorithm is built with different problems in mind
- Algorithms are difficult to compare, since there is no standard way of testing them

### Bibliography

- [1] P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5:17–61.
- [2] S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75–174.
- [3] P. F. Jonsson and P. A. Bates. Global topological features of cancer proteins in the human interactome. Bioinformatics.
- [4] T. H. J. S. J.-P. O. K. Kaski. Spectral and network methods in the analysis of correlation matrices of stock returns. Physica A 383.
- [5] J. M. Kumpula, M. Kivelä, K. Kaski, and J. Saramäki. Sequential algorithm for fast clique percolation. Phys. Rev. E, 78:026109.
- [6] G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435.
- [7] M. E. Porter, K. Schwab, M. E. Porter, K. Schwab, F. Paua, E. T. Herrera, and M. Porter. Communities in networks. Notices of the American Mathematical Society.
- [8] M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '10, New York, NY, USA. ACM.
- [9] K.-F. W. Wei Gao. Information Retrieval Technology. Springer Berlin Heidelberg.
