# Graph Mining and Social Network Analysis

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC Politecnico di Milano)

2 References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann Series in Data Management Systems (Second Edition) Chapter 9

3 Graph Mining

4 Graph Mining Overview Graphs are becoming increasingly important to model many phenomena in a large class of domains (e.g., bioinformatics, computer vision, social analysis) To deal with these needs, many data mining approaches have been extended also to graphs and trees Major approaches Mining frequent subgraphs Indexing Similarity search Classification Clustering

5 Mining frequent subgraphs Given a labeled graph data set D = {G 1,G 2,,G n } We define support(g) as the percentage of graphs in D where g is a subgraph A frequent subgraph in D is a subgraph with a support greater than min_sup How to find frequent subgraph? Apriori-based approach Pattern-growth approach

6 AprioriGraph Apply a level-wise iterative algorithm 1. Search for similar size-k frequent subgraphs 2. Merge two similar subgraphs in a size-(k+1) subgraph 3. Check if the new subgraph is frequent 4. Restart from 2. until all similar subgraphs have been considered. Otherwise restart from 1. and move to k+1. What is subgraph size? Number of vertex Number of edges Number of edge-disjoint paths Two subgraphs of size-k are similar if they have the same size-(k-1) subgraph AprioriGraph has a big computational cost (due to the merging step)

7 PatternGrowthGraph Incrementally extend frequent subgraphs 1. Add to S each frequent subgraphs g E obtained by extending subgraph g 2. Until S is not empty, select a new subgraph g in S to extend and start from 1. How to extend a subgraph? Add a vertex Add an edge The same graph can be discovered many times! Get rid of duplicates once discovered Reduce the generation of duplicates

8 Mining closed, unlabeled, and constrained subgraphs Closed subgraphs G is closed iff there is no proper supergraph G with the same support of G Reduce the growth of subgraphs discovered Is a more compact representation of knowledge Unlabeled (or partially labeled) graphs Introduce a special label Φ Φ can match any label or not Constrained subgraphs Containment constraint (edges, vertex, subgraphs) Geometric constraint Value constraint

9 Graph Indexing Indexing is basilar for effective search and query processing How to index graphs? Path-based approach takes the path as indexing unit All the path up to maxl length are indexed Does not scale very well gindex approach takes frequent and discriminative subgraphs as indexing unit A subgraph is frequent if it has a support greater than a threshold A subgraph is discriminative if its support cannot be well approximated by the intersection of the graph sets that contain one of its subgraphs

10 Graph Classification and Clustering Mining of frequent subgraphs can be effectively used for classification and clustering purposes Classification Frequent and discriminative subgraphs are used as features to perform the classification task A subgraph is discriminative if it is frequent only in one class of graphs and infrequent in the others The threshold on frequency and discriminativeness should be tuned to obtain the desired classification results Clustering The mined frequent subgraphs are used to define similarity between graphs Two graphs that share a large set of patterns should be considered similar and grouped in the same cluster The threshold on frequency can be tuned to find the desired number of clusters As the mining step affects heavily the final outcome, this is an intertwined process rather tan a two-steps process

11 Social Network Analysis

12 Social Network A social network is an heterogeneous and multirelational dataset represented by a graph Vertexes represent the objects (entities) Edges represent the links (relationships or interaction) Both objects and links may have attributes Social networks are usually very large Social network can be used to represents many real-world phenomena (not necessarily social) Electrical power grids Phone calls Spread of computer virus WWW

13 Small World Networks (1) Are social networks random graphs? NO! Internet Map Science Coauthorship High degree of local clustering Protein Network Few degrees of separation

14 Small World Networks (2) Sarah Peter Jane Ralph Society: Six degrees S. Milgram 1967 F. Karinthy 1929 WWW: 19 degrees Albert et al. 1999

15 Small World Networks (3) Definitions Node s degree us the number of incident edges Network effective diameter is the max distance within 90% of the network Properties Densification power law Shrinking diameter Heavy-tailed degrees distribution n: number of nodes e: number of eges 1<α<2 Node degrees Preferential attachment model Node rank

17 Mining social networks (2) Link cardinality estimation Prediction of the number of links to an object Prediction of the number of objects reachable from a specific object Object reconciliation Discover if two objects are the same on the basis of their attributes and links E.g., predict if two websites are mirrors of each other Group detection Clustering of objects on the basis both of their attributes and their links Subgraph detection Discover characteristic subgraphs within network

18 Challenges Feature construction Not only the objects attributes need to be considered but also attributes of linked objects Feature selection and aggregation techniques must be applied to reduce the size of search space Collective classification and consolidation Unlabeled data cannot be classified independently New objects can be correlated and need to be considered collectively to consolidate the current model Link prediction The prior probability of link between two objects may be very low Community mining from multirelational networks Many approaches assume an homogenous relationship while social networks usually represent different communities and functionalities

19 Applications Link Prediction Viral Marketing Community Mining

20 Link prediction What edges will be added to the network? Given a snapshot of a network at time t, link prediction aims to predict the edges that will be added before a given future time t Link prediction is generally solved assigning to each pair of nodes a weight score(x,y) The higher the score the more likely that link will be added in the near future The score(x,y) can be computed in several way Shortest path: the shortest he path between X and Y the highest is their score Common neighbors: the greater the number of neighbors X and Y have in common, the highest is their score Ensemble of all paths: weighted sum of paths that connects X and Y (shorter paths have usually larger weights)

21 Viral Marketing Several marketing approaches Mass marketing is targeted on specific segment of customers Direct marketing is target on specific customers solely on the basis of their characteristics Viral marketing tries to exploit the social connections to maximize the output of marketing actions Each customer has a specific network value based on The number of connections Its role in the network (e.g., opinion leader, listener) Role of its connections Viral marketing aims to exploit the network value of customers to predict their influence and to maximize the outcome of marketing actions

22 Viral Marketing: Random Spreading 500 randomly chosen customers are given a product (from 5000).

23 Viral Marketing: Directed Spreading The 500 most connected consumers are given a product.

24 Community Mining In social networks there are usually several kinds of relationships between objects A social network usually contains several relation networks that plays an important role to identify different communities The relation that identify a community can be an hidden relation Relation extraction and selection techniques are generally used to discover communities in social networks Example:

### Graph Mining and Social Network Analysis. Data Mining; EECS 4412 Darren Rolfe + Vince Chu 11.06.14

Graph Mining and Social Network nalysis Data Mining; EES 4412 Darren Rolfe + Vince hu 11.06.14 genda Graph Mining Methods for Mining Frequent Subgraphs priori-based pproach: GM, FSG Pattern-Growth pproach:

### 9.1. Graph Mining, Social Network Analysis, and Multirelational Data Mining. Graph Mining

9 Graph Mining, Social Network Analysis, and Multirelational Data Mining We have studied frequent-itemset mining in Chapter 5 and sequential-pattern mining in Section 3 of Chapter 8. Many scientific and

### Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

### Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will

### SCAN: A Structural Clustering Algorithm for Networks

SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected

### Graph Algorithms using Map-Reduce

Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web 1/7 Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society.

### Tutorial, IEEE SERVICE 2014 Anchorage, Alaska

Tutorial, IEEE SERVICE 2014 Anchorage, Alaska Big Data Science: Fundamental, Techniques, and Challenges (Data Mining on Big Data) 2014. 6. 27. By Neil Y. Yen Presented by Incheon Paik University of Aizu

### DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

### A New Marketing Channel Management Strategy Based on Frequent Subtree Mining

A New Marketing Channel Management Strategy Based on Frequent Subtree Mining Daoping Wang Peng Gao School of Economics and Management University of Science and Technology Beijing ABSTRACT For most manufacturers,

### Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis

Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, steen@cs.vu.nl Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter

### 131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

### Protein Protein Interaction Networks

Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

### IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS

IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS V.Sudhakar 1 and G. Draksha 2 Abstract:- Collective behavior refers to the behaviors of individuals

### Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social

### Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

### A Statistical Text Mining Method for Patent Analysis

A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical

### Data Exploration and Preprocessing. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)

Data Exploration and Preprocessing Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

### International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET

DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand

### Distance Degree Sequences for Network Analysis

Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation

### Application of Machine Learning to Link Prediction

Application of Machine Learning to Link Prediction Kyle Julian (kjulian3), Wayne Lu (waynelu) December 6, 6 Introduction Real-world networks evolve over time as new nodes and links are added. Link prediction

### Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns

### Trees and Fundamental Circuits

Trees and Fundamental Circuits Tree A connected graph without any circuits. o must have at least one vertex. o definition implies that it must be a simple graph. o only finite trees are being considered

### Complex Networks Analysis: Clustering Methods

Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich nefedov@isi.ee.ethz.ch 1 Outline Purpose to give an overview of modern graph-clustering methods and their applications

### Introduction to Graph Mining

Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain

### Graph Processing and Social Networks

Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph

### Mining Large Datasets: Case of Mining Graph Data in the Cloud

Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large

### Scientific Collaboration Networks in China s System Engineering Subject

, pp.31-40 http://dx.doi.org/10.14257/ijunesst.2013.6.6.04 Scientific Collaboration Networks in China s System Engineering Subject Sen Wu 1, Jiaye Wang 1,*, Xiaodong Feng 1 and Dan Lu 1 1 Dongling School

### Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014

Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R.0, steen@cs.vu.nl Chapter 0: Version: April 8, 0 / Contents Chapter Description 0: Introduction

### KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,

### Fig. 1 A typical Knowledge Discovery process [2]

Volume 4, Issue 7, July 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Review on Clustering

### Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu

Building Data Cubes and Mining Them Jelena Jovanovic Email: jeljov@fon.bg.ac.yu KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the

### Information Management course

Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

### A Lightweight Solution to the Educational Data Mining Challenge

A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

### Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r

Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures ~ Spring~r Table of Contents 1. Introduction.. 1 1.1. What is the World Wide Web? 1 1.2. ABrief History of the Web

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Part 2: Community Detection

Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

### Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

### Social Media Mining. Graph Essentials

Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures

### 1 o Semestre 2007/2008

Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction

### A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington

### FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

### MapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12

MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:

### Sociology and CS. Small World. Sociology Problems. Degree of Separation. Milgram s Experiment. How close are people connected? (Problem Understanding)

Sociology Problems Sociology and CS Problem 1 How close are people connected? Small World Philip Chan Problem 2 Connector How close are people connected? (Problem Understanding) Small World Are people

### K-Means Cluster Analysis. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

K-Means Cluster Analsis Chapter 3 PPDM Class Tan,Steinbach, Kumar Introduction to Data Mining 4/18/4 1 What is Cluster Analsis? Finding groups of objects such that the objects in a group will be similar

### Mining Social-Network Graphs

342 Chapter 10 Mining Social-Network Graphs There is much information to be gained by analyzing the large-scale data that is derived from social networks. The best-known example of a social network is

### An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases

An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases Jinsoo Lee School of Computer Science and Engineering Kyungpook National University, Korea jslee@www-db.knu.ac.kr Romans Kasperovics

### DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

### Graph Algorithms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

Graph Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 3. Topic Overview Definitions and Representation Minimum

### CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

### PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software

### Viral Marketing in Social Network Using Data Mining

Viral Marketing in Social Network Using Data Mining Shalini Sharma*,Vishal Shrivastava** *M.Tech. Scholar, Arya College of Engg. & I.T, Jaipur (Raj.) **Associate Proffessor(Dept. of CSE), Arya College

### Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

### ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

### Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

### Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

### Network Algorithms for Homeland Security

Network Algorithms for Homeland Security Mark Goldberg and Malik Magdon-Ismail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially

### Data Mining Solutions for the Business Environment

Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

### Data Mining Cluster Analysis: Advanced Concepts and Algorithms. ref. Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms ref. Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar 1 Outline Prototype-based Fuzzy c-means Mixture Model Clustering Density-based

### Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

### Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 08: Computer networks Version: March 3, 2011 2 / 53 Contents

### Network Analysis and Visualization of Staphylococcus aureus. by Russ Gibson

Network Analysis and Visualization of Staphylococcus aureus by Russ Gibson Network analysis Based on graph theory Probabilistic models (random graphs) developed by Erdős and Rényi in 1959 Theory and tools

### Extracting Information from Social Networks

Extracting Information from Social Networks Aggregating site information to get trends 1 Not limited to social networks Examples Google search logs: flu outbreaks We Feel Fine Bullying 2 Bullying Xu, Jun,

### Investigating Clinical Care Pathways Correlated with Outcomes

Investigating Clinical Care Pathways Correlated with Outcomes Geetika T. Lakshmanan, Szabolcs Rozsnyai, Fei Wang IBM T. J. Watson Research Center, NY, USA August 2013 Outline Care Pathways Typical Challenges

### Entropy based Graph Clustering: Application to Biological and Social Networks

Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving

### Minimum Spanning Trees

Minimum Spanning Trees Algorithms and 18.304 Presentation Outline 1 Graph Terminology Minimum Spanning Trees 2 3 Outline Graph Terminology Minimum Spanning Trees 1 Graph Terminology Minimum Spanning Trees

### Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like?

What is a Network? Network/Graph Theory Network = graph Informally a graph is a set of nodes joined by a set of lines or arrows. 1 1 2 3 2 3 4 5 6 4 5 6 Graph-based representations Representing a problem

### Mining Social Network Graphs

Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand

### Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

### Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)

### Classification and Prediction

Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser

### COMP3420: Advanced Databases and Data Mining. Web data mining

COMP3420: Advanced Databases and Data Mining Web data mining Lecture outline The Web as a data source Challenges the Web poses to data mining Types of Web data mining Mining the Web page layout structure

### A signature of power law network dynamics

Classification: BIOLOGICAL SCIENCES: Computational Biology A signature of power law network dynamics Ashish Bhan* and Animesh Ray* Center for Network Studies Keck Graduate Institute 535 Watson Drive Claremont,

### INTERNATIONAL MASTER IN BUSINESS ANALYTICS AND BIG DATA

POLITECNICO DI MILANO GRADUATE SCHOOL OF BUSINESS BABD INTERNATIONAL MASTER IN BUSINESS ANALYTICS AND BIG DATA Courses Description A JOINT PROGRAM WITH POLITECNICO DI MILANO SCHOOL OF MANAGEMENT PRE-COURSES

### Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

### { Mining, Sets, of, Patterns }

{ Mining, Sets, of, Patterns } A tutorial at ECMLPKDD2010 September 20, 2010, Barcelona, Spain by B. Bringmann, S. Nijssen, N. Tatti, J. Vreeken, A. Zimmermann 1 Overview Tutorial 00:00 00:45 Introduction

### Social Network Analysis: Introduzione all'analisi di reti sociali

Social Network Analysis: Introduzione all'analisi di reti sociali Michele Coscia Dipartimento di Informatica Università di Pisa www.di.unipi.it/~coscia Piano Lezioni Introduzione Misure + Modelli di Social

### Knowledge Discovery from Data Bases Proposal for a MAP-I UC

Knowledge Discovery from Data Bases Proposal for a MAP-I UC João Gama (jgama@fep.up.pt) Universidade do Porto 1 Knowledge Discovery from Data Bases We are deluged by data: scientific data, medical data,

### 6.207/14.15: Networks Lecture 6: Growing Random Networks and Power Laws

6.207/14.15: Networks Lecture 6: Growing Random Networks and Power Laws Daron Acemoglu and Asu Ozdaglar MIT September 28, 2009 1 Outline Growing random networks Power-law degree distributions: Rich-Get-Richer

### Pattern Recognition: An Overview. Prof. Richard Zanibbi

Pattern Recognition: An Overview Prof. Richard Zanibbi Pattern Recognition (One) Definition The identification of implicit objects, types or relationships in raw data by an animal or machine i.e. recognizing

### A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION

A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS Kyoungjin Park Alper Yilmaz Photogrammetric and Computer Vision Lab Ohio State University park.764@osu.edu yilmaz.15@osu.edu ABSTRACT Depending

### Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

### Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Data Mining on Social Networks Dionysios Sotiropoulos Ph.D. 1 Contents What are Social Media? Mathematical Representation of Social Networks Fundamental Data Mining Concepts Data Mining Tasks on Digital

### Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

Collective Behavior Prediction in Social Media Lei Tang Data Mining & Machine Learning Group Arizona State University Social Media Landscape Social Network Content Sharing Social Media Blogs Wiki Forum

### MapReduce Approach to Collective Classification for Networks

MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty

### Data Mining and Clustering Techniques

DRTC Workshop on Semantic Web 8 th 10 th December, 2003 DRTC, Bangalore Paper: K Data Mining and Clustering Techniques I. K. Ravichandra Rao Professor and Head Documentation Research and Training Center

### Planar Tree Transformation: Results and Counterexample

Planar Tree Transformation: Results and Counterexample Selim G Akl, Kamrul Islam, and Henk Meijer School of Computing, Queen s University Kingston, Ontario, Canada K7L 3N6 Abstract We consider the problem

### Binary Coded Web Access Pattern Tree in Education Domain

Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: kc.gomathi@gmail.com M. Moorthi

### Web Usage Mining: Identification of Trends Followed by the user through Neural Network

International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 617-624 International Research Publications House http://www. irphouse.com /ijict.htm Web

### Evaluation of Different Task Scheduling Policies in Multi-Core Systems with Reconfigurable Hardware

Evaluation of Different Task Scheduling Policies in Multi-Core Systems with Reconfigurable Hardware Mahyar Shahsavari, Zaid Al-Ars, Koen Bertels,1, Computer Engineering Group, Software & Computer Technology

### Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,

### BIRCH: An Efficient Data Clustering Method For Very Large Databases

BIRCH: An Efficient Data Clustering Method For Very Large Databases Tian Zhang, Raghu Ramakrishnan, Miron Livny CPSC 504 Presenter: Discussion Leader: Sophia (Xueyao) Liang HelenJr, Birches. Online Image.

### 3. The Junction Tree Algorithms

A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )

### KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it

KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:

### 2.3 Scheduling jobs on identical parallel machines

2.3 Scheduling jobs on identical parallel machines There are jobs to be processed, and there are identical machines (running in parallel) to which each job may be assigned Each job = 1,,, must be processed

### An Introduction to the Use of Bayesian Network to Analyze Gene Expression Data

n Introduction to the Use of ayesian Network to nalyze Gene Expression Data Cristina Manfredotti Dipartimento di Informatica, Sistemistica e Comunicazione (D.I.S.Co. Università degli Studi Milano-icocca

### Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

### Chapter ML:XI (continued)

Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained

### EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One