Predicting Influentials in Online Social Networks



Similar documents
Dynamics of information spread on networks. Kristina Lerman USC Information Sciences Institute

Social Media Mining. Network Measures

Identification of Influencers - Measuring Influence in Customer Networks

Practical Graph Mining with R. 5. Link Analysis

Social and Technological Network Analysis. Lecture 3: Centrality Measures. Dr. Cecilia Mascolo (some material from Lada Adamic s lectures)

Social Network Mining

Healthcare Analytics. Aryya Gangopadhyay UMBC

Algorithms for representing network centrality, groups and density and clustered graph representation

Performance Metrics for Graph Mining Tasks

Information Flow and the Locus of Influence in Online User Networks: The Case of ios Jailbreak *

Risk Mitigation Strategies for Critical Infrastructures Based on Graph Centrality Analysis

Walk-Based Centrality and Communicability Measures for Network Analysis

Part 1: Link Analysis & Page Rank

PageRank Conveniention of a Single Web Pageboard

FIGHTING SPAM WEBSITES AND SOCIAL I.P. IN ON SOCIAL NEWS

SGL: Stata graph library for network analysis

arxiv: v1 [math.pr] 15 Aug 2014

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

! E6893 Big Data Analytics Lecture 10:! Linked Big Data Graph Computing (II)

1 o Semestre 2007/2008

Valuation of Network Effects in Software Markets

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Computing an Aggregate Edge-Weight Function for Clustering Graphs with Multiple Edge Types

How To Understand The Network Of A Network

Statistical Machine Learning

Social Media Mining. Data Mining Essentials

Department of Cognitive Sciences University of California, Irvine 1

Content-Based Discovery of Twitter Influencers

The PageRank Citation Ranking: Bring Order to the Web

Search engines: ranking algorithms

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

UCINET Visualization and Quantitative Analysis Tutorial

A Network Approach to Define Modularity of Components in Complex Products

Model Combination. 24 Novembre 2009

Prediction of Stock Performance Using Analytical Techniques

Social Networks and Social Media

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu

Machine Learning and Pattern Recognition Logistic Regression

A linear algebraic method for pricing temporary life annuities

General Network Analysis: Graph-theoretic. COMP572 Fall 2009

Identifying Leaders and Followers in Online Social Networks

arxiv: v3 [math.na] 1 Oct 2012

An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century

Network Architectures & Services

A SURVEY OF MODELS AND ALGORITHMS FOR SOCIAL INFLUENCE ANALYSIS

Leveraging Ensemble Models in SAS Enterprise Miner

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

3.2 Roulette and Markov Chains

A PARADIGM FOR DEVELOPING BETTER MEASURES OF MARKETING CONSTRUCTS

Statistics 100A Homework 4 Solutions

CHOOSING A COLLEGE. Teacher s Guide Getting Started. Nathan N. Alexander Charlotte, NC

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

Secretary Problems. October 21, José Soto SPAMS

Characterization and Modeling of Packet Loss of a VoIP Communication

Efficient Identification of Starters and Followers in Social Media

HT2015: SC4 Statistical Data Mining and Machine Learning

Roadmap to Data Analysis. Introduction to the Series, and I. Introduction to Statistical Thinking-A (Very) Short Introductory Course for Agencies

Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario jdu43@uwo.ca

Alabama Department of Postsecondary Education

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.

Data Mining Methods: Applications for Institutional Research

CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA

Social Media Mining. Graph Essentials

Top Online Activities (Jupiter Communications, 2000) CS276A Text Information Retrieval, Mining, and Exploitation

The Importance of Bug Reports and Triaging in Collaborational Design

Big Data Technology Motivating NoSQL Databases: Computing Page Importance Metrics at Crawl Time

Hidden Markov Models

New Metrics for Reputation Management in P2P Networks

Social Influence Analysis in Social Networking Big Data: Opportunities and Challenges. Presenter: Sancheng Peng Zhaoqing University

Risk Management for IT Security: When Theory Meets Practice

AMS 5 CHANCE VARIABILITY

Statistical Analysis of Complete Social Networks

Graph Algorithms and Graph Databases. Dr. Daisy Zhe Wang CISE Department University of Florida August 27th 2014

Making Sense of the Mayhem: Machine Learning and March Madness

Markov Chains. Chapter Stochastic Processes

A Study of Internet Routing Stability Using Link Weight

DATA ANALYSIS II. Matrix Algorithms

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Analytic Hierarchy Process, a Psychometric Approach. 1 Introduction and Presentation of the Experiment

A Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks

Predicting success based on social network analysis

OPTIMAL DESIGN OF A MULTITIER REWARD SCHEME. Amir Gandomi *, Saeed Zolfaghari **

STAT 35A HW2 Solutions

Predict Influencers in the Social Network

1 Nonzero sum games and Nash equilibria

Exam 3 Review/WIR 9 These problems will be started in class on April 7 and continued on April 8 at the WIR.

Security Optimization of Dynamic Networks with Probabilistic Graph Modeling and Linear Programming

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Beating the MLB Moneyline

NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES

Web Link Analysis: Estimating a Document s Importance from its Context

Network Analysis of Project Communication Based on Graph Theory

Designing Ranking Systems for Consumer Reviews: The Impact of Review Subjectivity on Product Sales and Review Quality

Gerry Hobbs, Department of Statistics, West Virginia University

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Overview of the Stateof-the-Art. Networks. Evolution of social network studies

Holistic Information Security: Human Factor and Behavior Prediction using Social Media

Mining Metrics to Predict Component Failures

Transcription:

Predicting Influentials in Online Social Networks Rumi Ghosh Kristina Lerman USC Information Sciences Institute

WHO is IMPORTANT? Characteristics Topology Dynamic Processes /Nature of flow What are the suitable METRICS that can be used to PREDICT influentials in social network? Characteristics Dynamic Process How do we EVALUATE predictive models of influence? Social Web

Contributions Prediction of Influence Classification of influence models : Conservative and Non-conservative The details of the underlying dynamic process on a network should match those of the influence model. Page Rank is not always the best! Evaluation of Influence Empirical Measure of Influence (Statistically Significant) Social News Aggregator Digg Dynamic Process-Information propagation First work evaluating predictive models of influence, using the actual dynamic process, occurring in a social network Mathematical formulation and analytical proofs Normalized -Centrality

Dynamic Processes Classification of Dynamic Processes on Networks Conservative Non-conservative

Conservative Process $ $ $ $ $ $ $ $ $ Blue Bags in the network=8 Social Web

Conservative Process Blue Bags in the network=8 $ $ $ $ $ $ Social Web $ $ $

Conservative Process Blue Bags in the network=8 $ $ $ $ $ $ $ Social Web $ $

Non-Conservative Process On Twitter Spain Win Posts in the network=1 Social Web

Non-Conservative Process On Twitter Spain Win On Twitter Spain Win Posts in the network=3 Social Web On Twitter Spain Win

Non-Conservative Process On Twitter Spain Win On Twitter Spain Win Posts in the network=5 On Twitter Spain Win On Twitter Spain Win Social Web On Twitter Spain Win

Quantifying Influence Exists empirical studies, structural models Two ways to quantify influence 1 Empirically measure online social behavior or dynamic processes to estimate influence [Lee,Cha] 2 Use influence models (centrality metrics) based on the structure of the underlying social network to predict influence. We evaluate predictive influence models using empirical measures of influence.

Predictive Influence Models Geodesic Path Based ranking measures Closeness centrality [Hakimi, Sabidussi, Wassermann et al, Lin] Graph centrality [Hage at al.] Betweenness centrality [Freeman] Topological ranking measures Markov Process Based Ranking Measures Page Rank[Brin et al.] Hubbel s Model Degree Centrality In-degree Out-degree Centrality Path-Based Ranking Measures -centrality [Bonacich] normalized -centrality Katz score [Katz] SenderRank [Kiss et al.] EigenVector centrality [Bonacich]

Classification of Influence Models Non-Conservative Conservative Topological Ranking Measures Degree Centrality In-degree Centrality Out-degree Centrality Path-Based Ranking Measures -centrality Normalized -centrality Katz Score SenderRank Eigenvector Centrality Geodesic Path-Based Ranking Measures Closeness Centrality Graph Centrality Betweenness Centrality Topological Ranking Measures Markov Process Based Ranking Measures Page Rank Hubbel s Model

Page Rank 1 4 5 2 3 C pr, (i) (1 ) 1 n j fan(i) C pr, (j) d j out Page Rank

Degree Centrality 1 4 5 2 3 Degree centrality C d in(i) d in (i) C d out(i) d out (i) C d in out(i) d in (i) d out (i)

-centrality (Bonacich) 1 4 5 2 3 C,k A A 2... n A n 1... all paths <=5 k i A i where 1 1 i 0 Parameter sets the length scale of interactions Mean path length=(1- ) -1

Normalized -centrality 1 4 5 C,k k i 0 i A i 2 3 NC,k i, j 1 (C,k ) i, j C,k

Normalized -centrality -centrality Matrix: C,k C k t 0 t A t -centrality: vc,k v(i A) 1 where 1 1 NC,k n i, j Normalized -centrality: vc,k (C,k ) ij Simple Algorithm Does not depend on eigenvalue computation(unlike -centrality) We give analytical proofs and conditions for Equivalence of ranking due to normalized -centrality and -centrality Equivalence of ranking due to eigenvector centrality and normalized -centrality Convergence of normalized -centrality Criteria for parametric independence of normalized -centrality Other analytical proofs for limiting conditions and boundary values.

Which model best predicts influentials? Evaluation?

Information Flow on Digg Post submitter

Information Flow on Digg Post submitter fan fan fan

Information Flow on Digg Post Post submitter Fan vote Post Fan vote

Information Flow on Digg Post Post submitter Post Non-Conservative Information Propagation on Digg Hypothesis: non-conservative influence model best predicts influentials

Data Collection-Digg Post Post submitter 3553 stories 587 distinct submitters 139,410 distinct voters Post Post Of 574 connected submitters belonging to the friendship network, 504 submitters received at least 1 fan vote in first 100 votes (in at least 1 story). Social network: voter connected to at one or more voters 69,524 connected voters 574 connected submitters 3489 stories

Estimation of Influence Probability of a fan vote Influence of Submitter Quality of the story Story Quality Random variable Average out by aggregating fan votes over all stories submitted by the same submitter 289 submitters at least 2 stories Estimate of Influence Average fan votes Rank Users

Statistical Significance of Fan Votes as a Measure of Influence fan Post submitter fan URN MODEL fan Users in OSN (N) Fans of submitter in OSN (K) No. of users who voted (n) No. of fans who voted (k) P(X k K,N,n) K N k k n k N n Balls in the urn (N) White balls in the urn (K) No. of balls picked (n) No of white balls picked (k) (Hypergeometric Dist.)

Statistical Significance (Results) r 2 0.75 k 65 (1 e ( 0.001 K 0.0005 ) 0.86 ) P(X k K 10,69524,100) 0.00038 Probability of <k> fan votes in first 100 votes, given the submitter has K fans, happening purely by chance is negligible

Evaluation of Influence Predictions Correlation with the empirical estimate of influence Avg. # of fan votes in first 100 votes =damping (attenuation factor) in PageRank, (normalized) -centrality, SenderRank

Evaluation of Influence Prediction Top 100 users 0.8 0.7 0.6 0.5 Recall 0.4 0.3 0.2 0.1 Recall 0 Normalized alpha In-degree emp(i) [1,100] pred(i) [169,524] R emp pred emp Page-Rank Betweenness

Results on Digg Results corroborate our hypothesis Since underlying non-conservative dynamic process of (normalized) -centrality most closely resembles the dynamic process of information propagation on Digg (normalized) -centrality is a better predictor of influential users on Digg than other influence models.

Conclusion How to choose Prediction Models? First work classifying influence models into conservative and nonconservative To get the best predictions choose that influence model whose the implicit dynamic process matches that on the network How to evaluate Influence Models? First work evaluating predictive models of influence using the empirical measurements obtained from the network itself Novel Method of evaluation Evaluate using influence score estimated empirically from the network Social News Aggregator Digg Dynamic Process-Information propagation Non-conservative influence models best predict influentials on Digg where the underlying dynamic process of information propagation is non-conservative in nature. Normalized -Centrality Mathematical formulation and analytical proofs