Social Media Mining. Network Measures



Similar documents
Practical Graph Mining with R. 5. Link Analysis

SGL: Stata graph library for network analysis

DATA ANALYSIS II. Matrix Algorithms

Social Media Mining. Graph Essentials

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

General Network Analysis: Graph-theoretic. COMP572 Fall 2009

Part 2: Community Detection

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis

Linear Programming I

Mining Social-Network Graphs

Walk-Based Centrality and Communicability Measures for Network Analysis

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Identifying Leaders and Followers in Online Social Networks

Sociology and CS. Small World. Sociology Problems. Degree of Separation. Milgram s Experiment. How close are people connected? (Problem Understanding)

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

How To Understand The Network Of A Network

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

Predicting Influentials in Online Social Networks

Social network analysis with R sna package

Graph Processing and Social Networks

618 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS/SUPPLEMENT, VOL. 31, NO. 9, SEPTEMBER 2013

! E6893 Big Data Analytics Lecture 10:! Linked Big Data Graph Computing (II)

Social and Technological Network Analysis. Lecture 3: Centrality Measures. Dr. Cecilia Mascolo (some material from Lada Adamic s lectures)

Ranking on Data Manifolds

Healthcare Analytics. Aryya Gangopadhyay UMBC

Course on Social Network Analysis Graphs and Networks

Data Mining: Algorithms and Applications Matrix Math Review

Mining Social Network Graphs

Follow links for Class Use and other Permissions. For more information send to:

A Network Approach to Define Modularity of Components in Complex Products

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

The PageRank Citation Ranking: Bring Order to the Web

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics*

Complex Networks Analysis: Clustering Methods

V. Adamchik 1. Graph Theory. Victor Adamchik. Fall of 2005

The Big Picture. Correlation. Scatter Plots. Data

Why? A central concept in Computer Science. Algorithms are ubiquitous.

1.204 Final Project Network Analysis of Urban Street Patterns

Graph theoretic approach to analyze amino acid network

1. Write the number of the left-hand item next to the item on the right that corresponds to it.

3.1. RATIONAL EXPRESSIONS

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

Approximation Algorithms

Nodes, Ties and Influence

Social network and smoking: a pilotstudyamonghigh-schoolstudents

Network (Tree) Topology Inference Based on Prüfer Sequence

Social Networks and Social Media

Analysis of Algorithms, I

Multicast Group Management for Interactive Distributed Applications

Network-Based Tools for the Visualization and Analysis of Domain Models

Strong and Weak Ties

arxiv: v3 [math.na] 1 Oct 2012

The mathematics of networks

Option 1: empirical network analysis. Task: find data, analyze data (and visualize it), then interpret.

B490 Mining the Big Data. 2 Clustering

CMPSCI611: Approximating MAX-CUT Lecture 20

CSV886: Social, Economics and Business Networks. Lecture 2: Affiliation and Balance. R Ravi ravi+iitd@andrew.cmu.edu

Manifold Learning Examples PCA, LLE and ISOMAP

Network Analysis of a Large Scale Open Source Project

Introduction to Networks and Business Intelligence

Network Analysis Basics and applications to online data

Distance Degree Sequences for Network Analysis

The Characteristic Polynomial

Performance Metrics for Graph Mining Tasks

Applied Algorithm Design Lecture 5

COMBINATORIAL PROPERTIES OF THE HIGMAN-SIMS GRAPH. 1. Introduction

Distributed Computing over Communication Networks: Maximal Independent Set

Affinity Prediction in Online Social Networks

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

Part 1: Link Analysis & Page Rank

Statistical and computational challenges in networks and cybersecurity

Chapter ML:XI (continued)

Inet-3.0: Internet Topology Generator

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Math 1. Month Essential Questions Concepts/Skills/Standards Content Assessment Areas of Interaction

1 o Semestre 2007/2008

Reductions & NP-completeness as part of Foundations of Computer Science undergraduate course

Identification of Influencers - Measuring Influence in Customer Networks

Risk Mitigation Strategies for Critical Infrastructures Based on Graph Centrality Analysis

5 Directed acyclic graphs

Francesco Sorrentino Department of Mechanical Engineering

GENERATING AN ASSORTATIVE NETWORK WITH A GIVEN DEGREE DISTRIBUTION

NETZCOPE - a tool to analyze and display complex R&D collaboration networks

Digital Imaging and Multimedia. Filters. Ahmed Elgammal Dept. of Computer Science Rutgers University

Linear Algebra and TI 89

Multiple regression - Matrices

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks

Network Analysis For Sustainability Management

An Introduction to APGL

Probabilistic Risk Analysis in Business Continuity Management

Big Data Technology Motivating NoSQL Databases: Computing Page Importance Metrics at Crawl Time

Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds

My work provides a distinction between the national inputoutput model and three spatial models: regional, interregional y multiregional

NodeXL for Network analysis Demo/hands-on at NICAR 2012, St Louis, Feb 24. Peter Aldhous, San Francisco Bureau Chief.

Lecture 8 : Dynamic Stability

A SURVEY OF MODELS AND ALGORITHMS FOR SOCIAL INFLUENCE ANALYSIS

Search Heuristics for Load Balancing in IP-networks

Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like?

Transcription:

Klout Measures and Metrics 22

Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users and how can we find these similar individuals? To answer these and similar questions, one first needs to define measures for quantifying centrality, level of interactions, and similarity, among others. Measures and Metrics 33

Centrality Centrality defines how important a node is within a network. Measures and Metrics 44

Degree Centrality The degree centrality measure ranks nodes with more connections higher in terms of centrality d i is the degree (number of adjacent edges) for vertex v i In this graph degree centrality for vertex v 1 is d 1 = 8 and for all others is d j = 1, j 1 Measures and Metrics 55

Degree Centrality in Directed Graphs In directed graphs, we can either use the indegree, the out-degree, or the combination as the degree centrality value: d out i is the number of outgoing links for vertex v i Measures and Metrics 66

Normalized Degree Centrality Normalized by the maximum possible degree Normalized by the maximum degree Normalized by the degree sum Measures and Metrics 77

Eigenvector Centrality Having more friends does not by itself guarantee that someone is more important, but having more important friends provides a stronger signal Eigenvector centrality tries to generalize degree centrality by incorporating the importance of the neighbors (undirected) For directed graphs, we can use incoming or outgoing edges C e (v i ): the eigenvector centrality of node v i : some fixed constant Measures and Metrics 8

Eigenvector Centrality, cont. Let T This means that C e is an eigenvector of adjacency matrix A and is the corresponding eigenvalue Which eigenvalue-eigenvector pair should we choose? Measures and Metrics 99

Eigenvector Centrality, cont. Measures and Metrics 10

Eigenvector Centrality: Example = (2.68, -1.74, -1.27, 0.33, 0.00) Eigenvalues Vector max = 2.68 Measures and Metrics 11

Katz Centrality A major problem with eigenvector centrality arises when it deals with directed graphs Centrality only passes over outgoing edges and in special cases such as when a node is in a weakly connected component centrality becomes zero even though the node can have many edge connected to it To resolve this problem we add bias term to the centrality values for all nodes Measures and Metrics 12

Katz Centrality, cont. Controlling term Bias term Rewriting equation in a vector form Katz centrality: vector of all 1 s Measures and Metrics 13

Katz Centrality, cont. When α=0, the eigenvector centrality is removed and all nodes get the same centrality value ᵦ As α gets larger the effect of ᵦ is reduced For the matrix (I- αa T ) to be invertible, we must have det(i- αa T )!=0 By rearranging we get det(a T - α -1 I)=0 This is basically the characteristic equation, which first becomes zero when the largest eigenvalue equals α -1 In practice we select α < 1/λ, where λ is the largest eigenvalue of A T Measures and Metrics 14

Katz Centrality Example The Eigenvalues are -1.68, -1.0, 0.35, 3.32 We assume α=0.25 < 1/3.32 ᵦ=0.2 Measures and Metrics 15

PageRank Problem with Katz Centrality: in directed graphs, once a node becomes an authority (high centrality), it passes all its centrality along all of its out-links This is less desirable since not everyone known by a well-known person is well-known To mitigate this problem we can divide the value of passed centrality by the number of outgoing links, i.e., out-degree of that node such that each connected neighbor gets a fraction of the source node s centrality Measures and Metrics 16

PageRank, cont. What if the degree is zero? Measures and Metrics 17

PageRank Example We assume α=0.95 and ᵦ=0.1 Measures and Metrics 18

Betweenness Centrality Another way of looking at centrality is by considering how important nodes are in connecting other nodes the number of shortest paths from vertex s to t a.k.a. information pathways the number of shortest paths from s to t that pass through v i Measures and Metrics 19

Normalizing Betweenness Centrality In the best case, node v i is on all shortest paths from s to t, hence, Therefore, the maximum value is 2 Betweenness centrality: Measures Network and Measures Metrics 20

Betweenness Centrality Example Measures and Metrics 21

Closeness Centrality The intuition is that influential and central nodes can quickly reach other nodes These nodes should have a smaller average shortest path length to other nodes Closeness centrality: Measures Network and Measures Metrics 22

Compute Closeness Centrality Measures Network and Measures Metrics 23

Group Centrality All centrality measures defined so far measure centrality for a single node. These measures can be generalized for a group of nodes. A simple approach is to replace all nodes in a group with a super node The group structure is disregarded. Let S denote the set of nodes in the group and V- S the set of outsiders Measures Network and Measures Metrics 24

Group Centrality Group Degree Centrality We can normalize it by dividing it by V-S Group Betweeness Centrality We can normalize it by dividing it by 2 Measures and Metrics 25

Group Centrality Group Closeness Centrality It is the average distance from non-members to the group One can also utilize the maximum distance or the average distance Measures Network and Measures Metrics 26

Group Centrality Example Consider S={v2,v3} Group degree centrality=3 Group betweenness centrality = 3 Group closeness centrality = 1 Measures and Metrics 27

Transitivity and Reciprocity Measures Network and Measures Metrics 28

Transitivity Mathematic representation: For a transitive relation R: In a social network: Transitivity is when a friend of my friend is my friend Transitivity in a social network leads to a denser graph, which in turn is closer to a complete graph We can determine how close graphs are to the complete graph by measuring transitivity Measures Network and Measures Metrics 29

[Global] Clustering Coefficient Clustering coefficient analyzes transitivity in an undirected graph We measure it by counting paths of length two and check whether the third edge exists When counting triangles, since every triangle has 6 closed paths of length 2: In undirected networks: Measures Network and Measures Metrics 30

[Global] Clustering Coefficient: Example Measures and Metrics 31

Local Clustering Coefficient Local clustering coefficient measures transitivity at the node level Commonly employed for undirected graphs, it computes how strongly neighbors of a node v (nodes adjacent to v) are themselves connected In an undirected graph, the denominator can be rewritten as: Measures Network and Measures Metrics 32

Local Clustering Coefficient: Example Thin lines depict connections to neighbors Dashed lines are the missing connections among neighbors Solid lines indicate connected neighbors When none of neighbors are connected C=0 When all neighbors are connected C=1 Measures Network and Measures Metrics 33

Reciprocity If you become my friend, I ll be yours Reciprocity is a more simplified version of transitivity as it considers closed loops of length 2 If node v is connected to node u, u by connecting to v, exhibits reciprocity Measures Network and Measures Metrics 34

Reciprocity: Example Reciprocal nodes: v 1, v 2 Measures and Metrics 35

Balance and Status Measuring stability based on an observed network Measures Network and Measures Metrics 36

Social Balance Theory Social balance theory discusses consistency in friend/foe relationships among individuals. Informally, social balance theory says friend/foe relationships are consistent when In the network Positive edges demonstrate friendships (w ij =1) Negative edges demonstrate being enemies (w ij =-1) Triangle of nodes i, j, and k, is balanced, if and only if w ij denotes the value of the edge between nodes i and j Measures and Metrics 37

Social Balance Theory: Possible Combinations For any cycle if the multiplication of edge values become positive, then the cycle is socially balanced Measures Network and Measures Metrics 38

Social Status Theory Status defines how prestigious an individual is ranked within a society Social status theory measures how consistent individuals are in assigning status to their neighbors Measures Network and Measures Metrics 39

Social Status Theory: Example Unstable configuration Stable configuration A directed + edge from node X to node Y shows that Y has a higher status than X and a - one shows vice versa Measures Network and Measures Metrics 40

Similarity How similar are two nodes in a network? Measures and Metrics 41

Structural Equivalence In structural equivalence, we look at the neighborhood shared by two nodes; the size of this neighborhood defines how similar two nodes are. For instance, two brothers have in common sisters, mother, father, grandparents, etc. This shows that they are similar, whereas two random male or female individuals do not have much in common and are not similar. Measures Network and Measures Metrics 42

Structural Equivalence: Definitions Vertex similarity Jaccard Similarity: Cosine Similarity: In general, the definition of neighborhood N(v) excludes the node itself v. Nodes that are connected and do not share a neighbor will be assigned zero similarity This can be rectified by assuming nodes to be included in their neighborhoods Measures and Metrics 43

Similarity: Example Measures Network and Measures Metrics 44

Normalized Similarity Comparing the calculated similarity value with its expected value where vertices pick their neighbors at random For vertices i and j with degrees d i and d j this expectation is d i d j /n Measures and Metrics 45

Normalized Similarity, cont. Measures Network and Measures Metrics 46

Normalized Similarity, cont. Covariance between A i and A j Covariance can be normalized by the multiplication of Standard deviations: Pearson correlation coefficient: [-1,1] Measures and Metrics 47

Regular Equivalence In regular equivalence, we do not look at neighborhoods shared between individuals, but how neighborhoods themselves are similar For instance, athletes are similar not because they know each other in person, but since they know similar individuals, such as coaches, trainers, other players, etc. Measures Network and Measures Metrics 48

Regular Equivalence v i, v j are similar when their neighbors v k and v l are similar The equation (left figure) is hard to solve since it is self referential so we relax our definition using the right figure Measures Network and Measures Metrics 49

Regular Equivalence v i, v j are similar when v j is similar to v i s neighbors v k In vector format A vertex is highly similar to itself, we guarantee this by adding an identity matrix to the equation Measures Network and Measures Metrics 50

Regular Equivalence v i, v j are similar when v j is similar to v i s neighbors v k In vector format A vertex is highly similar to itself, we guarantee this by adding an identity matrix to the equation Measures and Metrics 51

Regular Equivalence: Example The largest eigenvalue of A is 2.43 Set α = 0.4 < 1/2.43 Any row/column of this matrix shows the similarity to other vertices We can see that vertex 1 is most similar (other than itself) to vertices 2 and 3 Nodes 2 and 3 have the highest similarity Measures and Metrics 52