Graph Mining Techniques for Social Media Analysis



Similar documents
School of Computer Science Carnegie Mellon Graph Mining, self-similarity and power laws

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

The Shape of the Network. The Shape of the Internet. Why study topology? Internet topologies. Early work. More on topologies..

CMU SCS Large Graph Mining Patterns, Tools and Cascade analysis

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003

Some questions... Graphs

Expansion Properties of Large Social Graphs

Introduction to Networks and Business Intelligence

The average distances in random graphs with given expected degrees

The ebay Graph: How Do Online Auction Users Interact?

A SURVEY OF PROPERTIES AND MODELS OF ON- LINE SOCIAL NETWORKS

Social Networks and Social Media

Towards Modelling The Internet Topology The Interactive Growth Model

Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like?

Distance Degree Sequences for Network Analysis

Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer

Looking at the Blogosphere Topology through Different Lenses

A dynamic model for on-line social networks

Towards Modeling Legitimate and Unsolicited Traffic Using Social Network Properties

Overview of the Stateof-the-Art. Networks. Evolution of social network studies

Tutorial, IEEE SERVICE 2014 Anchorage, Alaska

Complex Networks Analysis: Clustering Methods

Communication Dynamics of Blog Networks

Analysis of Internet Topologies: A Historical View

Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of Sina Microblogging

Efficient target control of complex networks based on preferential matching

Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics

Analysis of Internet Topologies

(B ) Empirical observation: Shrinking diameters: The effective diameter is, in many cases, actually decreasing as the network grows.

General Network Analysis: Graph-theoretic. COMP572 Fall 2009

Information Network or Social Network? The Structure of the Twitter Follow Graph

How To Find Out How A Graph Densifies

Social Network Mining

Applying Social Network Analysis to the Information in CVS Repositories

An Interest-Oriented Network Evolution Mechanism for Online Communities

A discussion of Statistical Mechanics of Complex Networks P. Part I

Statistical Mechanics of Complex Networks

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks

On Realistic Network Topologies for Simulation

Jure Leskovec Stanford University

AN INTRODUCTION TO SOCIAL NETWORK DATA ANALYTICS

Online Social Networks and Network Economics. Aris Anagnostopoulos, Online Social Networks and Network Economics

Graph Mining and Social Network Analysis

A Graph-Based Friend Recommendation System Using Genetic Algorithm

An Alternative Web Search Strategy? Abstract

Structure of a large social network

Zipf s law and the Internet

Healthcare Analytics. Aryya Gangopadhyay UMBC

Extracting Information from Social Networks

How To Understand The Network Of A Network

The mathematics of networks

Random graphs and complex networks

A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION

Q. Yan, X. Huang School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing, China,

arxiv:cs.dm/ v1 30 Mar 2002

Evolution of the Internet AS-Level Ecosystem

Physical Theories of the Evolution of Online Social Networks: A Discussion Impulse

Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions

CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms

WORKSHOP Analisi delle Reti Sociali per conoscere uno strumento uno strumento per conoscere

MINING COMMUNITIES OF BLOGGERS: A CASE STUDY

Social Media Network & Information Professionals

Inet-3.0: Internet Topology Generator

Mining Large Graphs. ECML/PKDD 2007 tutorial. Part 2: Diffusion and cascading behavior

Analyzing the Facebook graph?

Network Theory: 80/20 Rule and Small Worlds Theory

On the Bursty Evolution of Online Social Networks

Collapse by Cascading Failures in Hybrid Attacked Regional Internet

An Analysis of Verifications in Microblogging Social Networks - Sina Weibo

1 Six Degrees of Separation

Temporal Dynamics of Scale-Free Networks

A Network Science Perspective of a Distributed Reputation Mechanism

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

Understanding the evolution dynamics of internet topology

IC05 Introduction on Networks &Visualization Nov

An Incremental Super-Linear Preferential Internet Topology Model

Studying Graphs for Intelligence Monitoring and Analysis in the Absence of Semantic Information

Evolution of a large online social network

A MULTI-MODEL DOCKING EXPERIMENT OF DYNAMIC SOCIAL NETWORK SIMULATIONS ABSTRACT

Some Examples of Network Measurements

Computer Network Topologies: Models and Generation Tools

The Topology of Large-Scale Engineering Problem-Solving Networks

The Length of Bridge Ties: Structural and Geographic Properties of Online Social Interactions

Social Network Discovery based on Sensitivity Analysis

Effects of node buffer and capacity on network traffic

Crawling Facebook for Social Network Analysis Purposes

The Internet Is Like A Jellyfish

Combining Spatial and Network Analysis: A Case Study of the GoMore Network

How Placing Limitations on the Size of Personal Networks Changes the Structural Properties of Complex Networks

Graph/Network Visualization

Statistical mechanics of complex networks

Power Law of Predictive Graphs

A comparative study of social network analysis tools

Social Network Analysis

Degrees of Separation in the New Zealand Workforce: Evidence from linked employer-employee data

Community Mining from Multi-relational Networks

IJCSES Vol.7 No.4 October 2013 pp Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS

Network Analysis. BCH 5101: Analysis of -Omics Data 1/34

Graph Theory and Networks in Biology

Transcription:

Graph Mining Techniques for Social Media Analysis Mary McGlohon Christos Faloutsos 1 1-1

What is graph mining? Extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented as a graph. For our purposes, this is usually a social network.` Livejournal, via Lehman and Kottler Facebook graph, via Touchgraph 1-2

What is graph mining? Example: Social media host tries to look at certain online groups and predict whether the group will flourish or disband. Example: Phone provider looks at cell phone call records to determine whether an account is a result of identity theft. 1-3

Why graph mining? Thanks to the web and social media, for the first time we have easily accessible network data on a large-scale. Understand relationships (links) as well as content (text, images). Large amounts of data raise new questions. Massive amount of data Need for organization 1-4

Motivating questions Q1: How do networks form, evolve, collapse? Q2: What tools can we use to study networks? Q3: Who are the most influential/central members of a network? Q4: How do ideas diffuse through a network? Q5: How can we extract communities? Q6: What sort of anomaly detection can we perform on networks? 1-5

Outline Part 1: Q1: How do networks form, evolve, collapse? Introduction to networks Patterns, Laws Part 2: Q2: What tools can we use to study networks? Q3: Ranking: Who are the most important members of a network? Part 3: Case studies Q4: Diffusion: How do ideas diffuse through a network? Q5: How can we extract communities? Q6: What sort of anomaly detection can we perform? 1-6

Part 1 Outline Introduction to networks and 6 definitions Patterns Diameter Degree distribution Connected components Evolution over time 1-7

D1: Network A network is defined as a graph G=(V,E) V : set of vertices, or nodes. E : set of edges. Edges may have numerical weights. 1-8

D2: Adjacency matrix To represent graphs, use adjacency matrix Unweighted graphs: all entries are 0 or 1 Undirected graphs: matrix is symmetric to B1 B2 B3 B4 B1 0 1 0 0 fromb2 1 0 0 0 B3 0 0 1 0 B4 1 2 0 3 1-9

D3: Bipartite graphs In a bipartite graph, 2 sets of vertices edges occur between different sets. If graph is undirected, we can represent as a nonsquare adjacency matrix. n1 n2 n3 n4 m 1 m 2 m 3 m1 m2 m3 n1 1 1 0 n2 0 0 1 n3 0 0 0 n4 0 0 1 1-10

D4: Components Component: set of nodes with paths between each. n1 n2 n3 n4 m 1 m 2 m 3 1-11

D4: Components Component: set of nodes with paths between each. We will see later that often real graphs form a giant connected component. n1 n2 n3 n4 m 1 m 2 m 3 1-12

D5: Diameter Diameter of a graph is the longest shortest path. n1 n2 n3 n4 m 1 m 2 m 3 1-13

D5: Diameter Diameter of a graph is the longest shortest path. n1 n2 n3 n4 m 1 m 2 m 3 diameter=3 1-14

D5: Diameter Diameter of a graph is the longest shortest path. We can estimate this by sampling. Effective diameter is the distance at which 90% of nodes can be reached. n1 n2 n3 n4 m 1 m 2 m 3 diameter=3 1-15

D6: Degree distribution We can find the degree of any node by summing entries in the (unweighted) adjacency matrix. to B1 B2 B3 B4 B1 0 1 0 0 1 fromb2 1 0 0 0 1 B3 0 0 1 0 1 B4 1 1 0 1 3 2 2 1 1 in-degree out-degree 1-16

Some graphs Internet Map [lumeta.com] Food Web [Martinez 91] Research question: Are real graphs random? (no) Friendship Network [Moody 01] Protein Interactions [genomebiology.com] 1-17

Part 1 Outline Introduction to networks Patterns Diameter Degree distribution Connected components Evolution over time 1-18

Small-world effect Graphs usually display small diameter. First demonstrated by Travers & Milgram in 1960. Most of the time, distance was around 6. Similarly, real graphs we see have small diameter 1-19

[Leskovec & Horvitz 07] Distribution of shortest path lengths Microsoft Messenger network 180 million people 1.3 billion edges Number of nodes 7 Pick a random node, count how many nodes are at distance 1,2,3... hops Edge if two people exchanged at least one message in one month period Distance (Hops) 1-20

Part 1 Outline Introduction to networks Patterns Diameter: small world effect Degree distribution Connected components Evolution over time 1-21

Degree distribution Count vs. degree Suppose average degree is 3.3 If we pick a node at random, can we guess its degree? In real graph, mode is 1! avg: 3.3 1-22

Degree distribution Count vs. degree Suppose average degree is 3.3 If we pick a node at random, can we What pattern does guess its degree? degree of real graphs follow? In real graph, mode is 1! Therefore, mean is meaningless. avg: 3.3 1-23

Power law degree distribution Measure with rank exponent R [SIGCOMM99] internet domains log(degree) ibm.com att.com -0.82 log(rank) 1-24

Power laws - discussion Do they hold, over time? Do they hold on other graphs/domains? 1-25

Power laws - discussion Do they hold, over time? Yes! for multiple years [Siganos+] Do they hold on other graphs/domains? Yes! Web sites and links [Tomkins+], [Barabasi+] Peer-to-peer graphs (gnutella-style) Who-trusts-whom (epinions.com) 1-26

Time Evolution: rank R log(degree) ibm.com att.com -0.82 log(rank) Domain level The rank exponent has not changed! [Siganos+] 1-27

The Peer-to-Peer Topology count [Jovanovic+] degree Number of immediate peers (= degree), follows a power-law 1-28

epinions.com count who-trusts-whom [Richardson + Domingos, KDD 2001] (out) degree 1-29

Part 1 Outline Introduction to networks Patterns Diameter: small world effect Degree distribution: power law Connected components Evolution over time 1-30

Components Basic graph generator, Erdos-Renyi For n vertices, connect any two IID with probability p. Many provable properties, including emergence of a giant connected component. Real graphs do not have E-R degree distribution, but... 1-31

Giant connected component Nearly all real networks have a giant connected component (GCC) emerge! Often SM graphs have middle region See [Kumar, Novak, & Tomkins, KDD 2006] (Example: NIPS citation graph, visualization with GUESS [Adar 06] 1-32

Part 1 Outline Introduction to networks Patterns Diameter: small world effect Degree distribution: power law Connected components: giant CC Evolution over time 1-33

Motivating questions How do graphs evolve? Degree-exponent seems constant - any other consistent patterns? 1-34

Evolution of diameter? Prior analysis, on power-law-like graphs, hints diameter slowly increasing with time. diameter ~ O(log(N)) X or diameter ~ O( log(log(n))) Slowly increasing with network size What is happening, in reality? Diameter shrinks, toward a constant value! 1-35

Shrinking diameter [Leskovec, Faloutsos, Kleinberg KDD 2005] Citations among physics papers 11yrs; @ 2003: 29,555 papers 352,807 citations For each month M, create a graph of all citations up to month M diameter time 1-36

Shrinking diameter Authors & publications 1992 318 nodes 272 edges 2002 60,000 nodes 20,000 authors 38,000 papers 133,000 edges 1-37

Shrinking diameter Patents & citations 1975 334,000 nodes 676,000 edges 1999 2.9 million nodes 16.5 million edges Each year is a datapoint 1-38

Shrinking diameter Autonomous systems 1997 3,000 nodes 10,000 edges 2000 6,000 nodes 26,000 edges One graph per day diameter N 1-39

Temporal evolution N(t) nodes; E(t) edges at time t Suppose that N(t+1) = 2 * N(t) What is your guess for E(t+1) =? 2 * E(t) 1-40

Temporal evolution N(t) nodes; E(t) edges at time t Suppose that N(t+1) = 2 * N(t) What is your guess for E(t+1) =? X2 * E(t) Edges over-double! 1-41

Temporal evolution Growth of edges obeys power law with: where 1<a<2 E(t) ~ N(t) a for all t 1-42

Densification Power Law ArXiv: Physics papers and their citations E(t) 1.69 N(t) 1-43

Densification Power Law ArXiv: Physics papers and their citations E(t) 1.69 tree 1 N(t) 1-44

Densification Power Law ArXiv: Physics papers and their citations E(t) clique 2 1.69 N(t) 1-45

Densification Power Law U.S. Patents, citing each other E(t) 1.66 N(t) 1-46

Densification Power Law Autonomous Systems E(t) 1.18 N(t) 1-47

Densification Power Law ArXiv: authors & papers E(t) 1.15 N(t) 1-48

Part 1 Outline Introduction to networks Patterns Diameter: small world effect Degree distribution: power law Connected components: giant CC Evolution over time: Shrinking diameter, Densification power law 1-49

Another big question Q: How can we generate realistic networks? 1-50

Another big question Q: How can we generate realistic networks? A: Answering this question fully would require another tutorial. Some models are preferential attachment (Barabasi et. al.), copying model (Kleinberg et. al.), winners don t take all (Pennock et. al.), Kronecker multiplication (Leskovec et. al.). 1-51

Bibliography: Part 1 Power Laws and patterns Albert, R. & Barabasi, A. (2002), 'Statistical mechanics of complex networks', Reviews of Modern Physics 74, 47. Barabasi, A. L. & Albert, R. (1999), 'Emergence of scaling in random networks', Science 286(5439), 509--512. Domingos, P. & Richardson, M. (2001), Mining the network value of customers, in 'KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining', ACM Press, New York, NY, USA, pp. 57--66. Faloutsos, M.; Faloutsos, P. & Faloutsos, C. (1999), 'On Power-law Relationships of the Internet Topology', SIGCOMM, 251-262. 1-52

Bibliography: Part 1 Jovanovic, M. A. (2001), 'Modeling Large-scale Peer-to- Peer Networks and a Case Study of Gnutella ', Master's thesis, University of Cincinatti. Kossinets, G. & Watts, D. J. (2006), 'Empirical Analysis of an Evolving Social Network', Science 311(5757), 88--90. Kumar, R.; Novak, J.; Raghavan, P. & Tomkins, A. (2004), 'Structure and evolution of blogspace', Commun. ACM 47(12), 35--39. Kumar, R.; Novak, J.; Raghavan, P. & Tomkins, A. (2003), On the bursty evolution of blogspace, in 'WWW '03: Proceedings of the 12th international conference on World Wide Web', ACM Press, New York, NY, USA, pp. 568--576. Kumar, R.; Novak, J. & Tomkins, A. (2006), Structure and evolution of online social networks, in 'KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowedge Discover and Data Mining', pp. 611--617. 1-53

Bibliography: Part 1 Leland, W.; Taqqu, M.; Willinger, W. & Wilson, D. (1994), 'On the Self-Similar Nature of Ethernet Traffic', IEEE Transactions on Networking 2(1), 1-15. Leskovec, J.; Kleinberg, J. & Faloutsos, C. (2005), Graphs over time: densification laws, shrinking diameters and possible explanations, in 'KDD '05. Milgram, S. (1967), 'The small-world problem', Psychology Today 2, 60--67. Newman, M. E. J. (2005), 'Power laws, Pareto distributions and Zipf's law', Contemporary Physics 46, 323. Siganos, G.; Faloutsos, M.; Faloutsos, P. & Faloutsos, C. (2003), 'Power laws and the AS-level internet topology.', IEEE/ACM Trans. Netw. 11(4), 514-524. Zipf, G. (1949), Human Behavior and Principle of Least Effort: An Introduction to Human Ecology, Addison Wesley, Cambridge, Massachusetts. 1-54

Bibliography: Part 1 Models Barabasi, A. (2005), 'The origin of bursts and heavy tails in human dynamics', Nature 435, 207. Erdos, P. & Renyi, A. (1960), 'On the evolution of random graphs', Publ. Math. Inst. Hungary. Acad. Sci. 5, 17-61. Granovetter, M. (1978), 'Threshold Models of Collective Behavior', Am. Journal of Sociology 83(6), 1420--1443. Leskovec, J.; Faloutsos, C. Scalable modeling of real graphs using Kronecker Multiplication. ICML 2007. Vazquez, A.; Oliveira, J. G.; Dezso, Z.; Goh, K. I.; Kondor, I. & Barabasi, A. L. (2006), 'Modeling bursts and heavy tails in human dynamics', Physical Review E 73, 036127. Watts, D. J. & Strogatz, S. H. (1998), 'Collective dynamics of 'small-world' networks.', Nature 393(6684), 440--442. 1-55

Bibliography: Part 1 Visualization example Adar, Eytan, "GUESS: A Language and Interface for Graph Exploration," CHI 2006 Katharina Anna Lehmann, Stephan Kottler, Michael Kaufmann. Visualizing Large and Clustered Networks. Graph Drawing 2006 Touchgraph: http://www.touchgraph.com/tgfacebookbrowser.ht ml 1-56

1-57