Nodes, Ties and Influence


Chapter 2, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool, September 2010.

IMPORTANCE OF NODES

Importance of Nodes
Not all nodes are equally important.
Centrality analysis: find the most important nodes in a network.
Commonly used measures: degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality.

Degree Centrality
The importance of a node is determined by the number of nodes adjacent to it. The larger the degree, the more important the node is.
In many real-life networks, only a small number of nodes have high degrees.
Degree centrality: $C_D(v_i) = d_i = \sum_j A_{ij}$
Normalized degree centrality: $C'_D(v_i) = d_i / (n - 1)$
For node 1, the degree centrality is 3; the normalized degree centrality is 3/(9-1) = 3/8.
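As a small illustration of the two quantities above, the following sketch computes the degree and the degree divided by n - 1 for every node. The adjacency dictionary `toy_graph` and the function name `degree_centrality` are hypothetical; the graph is not the 9-node figure from the slide.

```python
def degree_centrality(adj):
    """Map each node to (degree, degree / (n - 1)) for an undirected graph."""
    n = len(adj)
    return {v: (len(nbrs), len(nbrs) / (n - 1)) for v, nbrs in adj.items()}


# Hypothetical toy graph as an adjacency dict (not the figure from the slide).
toy_graph = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
centrality = degree_centrality(toy_graph)
for node, (degree, normalized) in sorted(centrality.items()):
    print(node, degree, round(normalized, 3))
```

In this toy graph node 1 is adjacent to every other node, so its normalized degree centrality reaches the maximum value of 1.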

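Closeness Centrality
Central nodes are important because they can reach the whole network more quickly than non-central nodes.
Importance is measured by how close a node is to the other nodes.
Average distance: $D_{avg}(v_i) = \frac{1}{n-1} \sum_{j \ne i} g(v_i, v_j)$, where $g(v_i, v_j)$ is the shortest-path (geodesic) distance between $v_i$ and $v_j$.
Closeness centrality: $C_C(v_i) = \left[ \frac{1}{n-1} \sum_{j \ne i} g(v_i, v_j) \right]^{-1} = \frac{n-1}{\sum_{j \ne i} g(v_i, v_j)}$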

Closeness Centrality Example
Node 4 is more central than node 3.
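Closeness can be computed with one breadth-first search per node. The sketch below assumes closeness is the reciprocal of the average shortest-path distance to the other n - 1 nodes, and that the graph is connected and unweighted. The path graph `path` and the function names are hypothetical, since the example figure is not available here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj, v):
    """(n - 1) divided by the sum of distances from v; assumes a connected graph."""
    dist = bfs_distances(adj, v)
    total = sum(d for node, d in dist.items() if node != v)
    return (len(adj) - 1) / total


# Hypothetical path graph 1 - 2 - 3 - 4 - 5.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(closeness_centrality(path, 3), closeness_centrality(path, 1))
```

The middle node of the path scores higher than an end node, matching the intuition that central nodes reach the rest of the network more quickly.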

Betweenness Centrality
Node betweenness counts the number of shortest paths that pass through a node.
Nodes with high betweenness are important in communication and information diffusion.
Betweenness centrality: $C_B(v_i) = \sum_{v_s \ne v_i \ne v_t \in V,\; s < t} \frac{\sigma_{st}(v_i)}{\sigma_{st}}$
$\sigma_{st}$: the number of shortest paths between $s$ and $t$.
$\sigma_{st}(v_i)$: the number of shortest paths between $s$ and $t$ that pass through $v_i$.

Betweenness Centrality Example
$C_B(4) = 15$
What is the betweenness centrality for node 5?
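A direct, unoptimized way to evaluate betweenness is to count shortest paths from every node with breadth-first search and then, for each pair (s, t), add the fraction of shortest paths that pass through each intermediate node. The sketch below assumes an undirected, unweighted graph with sortable node labels. The two toy graphs are hypothetical and are not the figure from the slide.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adj, source):
    """BFS from source; returns (dist, sigma), sigma[v] = number of shortest paths to v."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_centrality(adj):
    """Sum over unordered pairs (s, t) of sigma_st(v) / sigma_st."""
    info = {v: shortest_path_counts(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are disconnected
        dist_t, sigma_t = info[t]
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            # v lies on a shortest s-t path iff the distances add up.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


# Hypothetical toy graphs: a 5-node path and a 4-node cycle ("diamond").
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
diamond = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
print(betweenness_centrality(path))
print(betweenness_centrality(diamond))
```

For larger graphs, Brandes' algorithm computes the same scores more efficiently; the pairwise version is shown here only because it mirrors the definition term by term.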

Eigenvector Centrality
One's importance is determined by his friends: if one has many important friends, he should be important as well.
The centrality corresponds to the top eigenvector of the adjacency matrix A.
A variant of eigenvector centrality is the PageRank score.
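The top eigenvector can be approximated by power iteration. The sketch below is one possible approach under stated assumptions: the graph is connected and undirected, and each step multiplies by A + I rather than A. The shift is a choice made here, not something from the slides; it leaves the eigenvectors unchanged but avoids oscillation on bipartite graphs. The graph `toy_graph` is hypothetical.

```python
def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (A + I); scores are scaled so the largest equals 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Each node keeps its own score (the + I shift) and adds its neighbors' scores.
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = max(nxt.values())
        x = {v: value / norm for v, value in nxt.items()}
    return x


# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 1.
toy_graph = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
scores = eigenvector_centrality(toy_graph)
for node in sorted(scores):
    print(node, round(scores[node], 4))
```

Node 4 has only one neighbor, yet that neighbor is the most important node, so node 4 still receives a non-trivial score; this is the "important friends" effect that degree centrality misses.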

STRENGTHS OF TIES

Weak and Strong Ties
In practice, connections are not all of the same strength.
Interpersonal social networks are composed of strong ties (close friends) and weak ties (acquaintances).
Strong ties and weak ties play different roles in community formation and information diffusion.
Strength of Weak Ties (Granovetter, 1973): occasional encounters with distant acquaintances can provide important information about new opportunities for job search.

Connections in Social Media
Social media allows users to connect to each other more easily than ever. One user might have thousands of friends online.
Who are the most important ones among your 300 Facebook friends?
It is imperative to estimate the strengths of ties for advanced analysis. Three approaches:
Analyze network topology
Learn from user profiles and attributes
Learn from user activities

Learning from Network Topology
Bridges connecting two different communities are weak ties.
An edge is a bridge if its removal results in disconnection of its terminal nodes.
Figure captions: in one example network, e(2,5) is a bridge; in the other, e(2,5) is NOT a bridge.

Shortcut Bridge
Bridges are rare in real-life networks.
Alternatively, one can relax the definition by checking whether the distance between the two terminal nodes increases when the edge is removed. The larger the distance, the weaker the tie.
d(2,5) = 4 if e(2,5) is removed.
d(5,6) = 2 if e(5,6) is removed.
So e(5,6) is a stronger tie than e(2,5).
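The relaxed bridge test can be sketched as a breadth-first search that ignores the edge in question. The function name `distance_without_edge` and the graph `toy_graph` are hypothetical, and the graph is not the one in the slide's figure. An infinite result means the edge is a true bridge.

```python
import math
from collections import deque


def distance_without_edge(adj, u, v):
    """Shortest-path length from u to v once edge (u, v) is removed; inf for a bridge."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if {x, y} == {u, v}:
                continue  # pretend the edge (u, v) has been removed
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist.get(v, math.inf)


# Hypothetical toy graph: triangle 1-2-3 plus a pendant edge (3, 4).
toy_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(distance_without_edge(toy_graph, 1, 2))
print(distance_without_edge(toy_graph, 3, 4))
```

Under this measure the triangle edge (1, 2) is a strong tie, because a two-hop detour exists, while the pendant edge (3, 4) is the weakest possible tie.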

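Neighborhood Overlap
Tie strength can be measured based on neighborhood overlap; the larger the overlap, the stronger the tie.
$\text{overlap}(v_i, v_j) = \frac{|N_i \cap N_j|}{|N_i \cup N_j| - 2}$, where $N_i$ is the set of neighbors of $v_i$.
The -2 in the denominator excludes $v_i$ and $v_j$ themselves.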
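A minimal sketch of the overlap measure for an existing edge, taken as the number of shared neighbors divided by the size of the combined neighborhood minus 2. The graph `toy_graph` is hypothetical, and returning 0 when the two endpoints have no other neighbors is a convention chosen here to avoid division by zero.

```python
def neighborhood_overlap(adj, vi, vj):
    """Shared neighbors divided by (union of neighbors minus vi and vj) for edge (vi, vj)."""
    common = adj[vi] & adj[vj]
    union = adj[vi] | adj[vj]
    denominator = len(union) - 2  # the union contains vi and vj themselves
    return len(common) / denominator if denominator > 0 else 0.0


# Hypothetical toy graph: triangle 1-2-3 plus a pendant edge (3, 4).
toy_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
for edge in [(1, 2), (1, 3), (3, 4)]:
    print(edge, neighborhood_overlap(toy_graph, *edge))
```

The pendant edge (3, 4) has zero overlap, which agrees with the topology-based view that bridge-like edges are weak ties.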

Learning from Profiles and Interactions
Twitter: one can follow others without the followee's confirmation.
The real friendship network is determined by how frequently two users talk to each other, rather than by the follower-followee network.
The real friendship network is more influential in driving Twitter usage.
Strengths of ties can be predicted accurately from various information on Facebook: friend-initiated posts, messages exchanged in wall posts, number of mutual friends, etc.
Learning numeric link strength by maximum likelihood estimation:
User profile similarity determines the link strength.
Link strength in turn determines user interaction.
Maximize the likelihood based on observed profiles and interactions.

Learning from User Activities
One might learn how one influences his friends if the user activity log is accessible.
The approach depends on the adopted influence model: the independent cascade model or the linear threshold model.
Maximize the likelihood of user activity given an influence model.

INFLUENCE MODELING

Influence Modeling
Influence modeling is one of the fundamental questions in understanding information diffusion, the spread of new ideas, and word-of-mouth (viral) marketing.
Well-known influence modeling methods:
1. Linear threshold model (LTM)
2. Independent cascade model (ICM)

Common Properties of Influence Modeling Methods
A social network is represented as a directed graph, with each actor being one node.
Each node starts as either active or inactive.
A node, once activated, will activate its neighboring nodes.
Once a node is activated, it cannot be deactivated.

Linear Threshold Model
An actor takes an action if the number of his friends who have taken the action reaches a certain threshold.
Each node $v$ chooses a threshold $\theta_v$ uniformly at random from the interval [0, 1].
In each discrete step, all nodes that were active in the previous step remain active.
The nodes satisfying the following condition will be activated:
$\sum_{w \in N_v,\; w \text{ is active}} b_{w,v} \ge \theta_v$
where $N_v$ is the set of neighbors of $v$ and $b_{w,v}$ is the influence weight of $w$ on $v$, with $\sum_{w \in N_v} b_{w,v} \le 1$.

Linear Threshold Model: Diffusion Process (Threshold = 50%)
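The diffusion process can be simulated step by step. The sketch below assumes an undirected graph in which every neighbor of v carries the same weight 1/deg(v), so that a node activates once the fraction of its active neighbors reaches its threshold. The equal weights, the graph `toy_graph`, and the fixed 50% thresholds are assumptions for illustration; in the model itself each threshold is drawn uniformly from [0, 1].

```python
def linear_threshold(adj, seeds, thresholds):
    """Run the linear threshold process until no new node activates."""
    active = set(seeds)
    while True:
        newly_active = {
            v
            for v in adj
            if v not in active
            and adj[v]
            # Equal weights: active fraction of v's neighbors versus its threshold.
            and sum(1 for w in adj[v] if w in active) / len(adj[v]) >= thresholds[v]
        }
        if not newly_active:
            return active
        active |= newly_active  # previously active nodes stay active


# Hypothetical toy graph: chain 1 - 2 - 3, with node 3 also linked to 4, 5, and 6.
toy_graph = {1: {2}, 2: {1, 3}, 3: {2, 4, 5, 6}, 4: {3}, 5: {3}, 6: {3}}
half = {v: 0.5 for v in toy_graph}
print(sorted(linear_threshold(toy_graph, {1}, half)))
print(sorted(linear_threshold(toy_graph, {1, 4}, half)))
```

With a single seed the cascade stalls at node 3, whose four neighbors make the 50% threshold hard to reach; adding a second seed on the other side tips node 3 and the whole graph activates.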

Independent Cascade Model (ICM)
The independent cascade model focuses on the sender's rather than the receiver's view.
A node $w$, once activated at step $t$, has one chance to activate each of its neighbors.
For a neighboring node (say, $v$), the activation succeeds with probability $p_{w,v}$ (e.g., $p = 0.5$).
If the activation succeeds, then $v$ becomes active at step $t + 1$.
In subsequent rounds, $w$ will not attempt to activate $v$ anymore.
The diffusion process starts with an initial set of activated nodes and continues until no further activation is possible.

Independent Cascade Model: Diffusion Process
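One run of the independent cascade process can be sketched as a frontier expansion in which every newly activated node gets a single chance at each inactive neighbor. The sketch assumes a single probability `p` shared by all edges (the model allows a separate p for each pair). The graph `chain` and the function name are hypothetical.

```python
import random


def independent_cascade(adj, seeds, p, rng):
    """One random run of the ICM; returns the set of nodes active at the end."""
    active = set(seeds)
    frontier = list(seeds)  # nodes activated in the previous step
    while frontier:
        next_frontier = []
        for w in frontier:
            for v in adj[w]:
                # w gets exactly one attempt on each still-inactive neighbor v.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Hypothetical toy graph: chain 1 - 2 - 3 - 4.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
rng = random.Random(0)
sizes = [len(independent_cascade(chain, {1}, 0.5, rng)) for _ in range(1000)]
print(sum(sizes) / len(sizes))  # Monte Carlo estimate of the expected spread
```

Because a single run is random, quantities such as the expected number of active nodes are usually estimated by averaging many runs, as in the last lines.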

Influence Maximization
Given a network and a parameter $k$, which $k$ nodes should be selected for the activation set $B$ in order to maximize the influence in terms of active nodes at the end?
Let $\sigma(B)$ denote the expected number of nodes that can be influenced by $B$. The optimization problem can be formulated as follows:
$\max_{B \subseteq V} \sigma(B) \quad \text{subject to } |B| \le k$

Influence Maximization: A Greedy Approach
Maximizing the influence is an NP-hard problem, but it has been proved that the greedy approach gives a solution that is at least 63% (a factor of $1 - 1/e$) of the optimal.
A greedy approach:
Start with $B = \emptyset$.
Evaluate $\sigma(v)$ for each node, and pick the node with maximum $\sigma$ as the first node $v_1$ to form $B = \{v_1\}$.
Repeatedly select the node that increases $\sigma(B)$ most when it is included in $B$. Essentially, we greedily find a node $v \in V \setminus B$ such that
$v = \arg\max_{v \in V \setminus B} \sigma(B \cup \{v\})$
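The greedy procedure can be sketched as follows, assuming the independent cascade model with one shared edge probability `p` and a Monte Carlo estimate of the expected spread. The estimator, the parameter names (`runs`, `seed`), and the graph `two_groups` are illustrative assumptions, not the authors' implementation.

```python
import random


def simulate_icm(adj, seeds, p, rng):
    """One independent-cascade run; returns the set of nodes active at the end."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for w in frontier:
            for v in adj[w]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_sigma(adj, seeds, p, runs, rng):
    """Average number of active nodes over `runs` simulated cascades."""
    return sum(len(simulate_icm(adj, seeds, p, rng)) for _ in range(runs)) / runs


def greedy_seed_set(adj, k, p=0.5, runs=200, seed=0):
    """Add, k times, the node whose inclusion gives the largest estimated spread."""
    rng = random.Random(seed)
    chosen, chosen_sigma = [], 0.0
    for _ in range(min(k, len(adj))):
        best_node, best_sigma = None, -1.0
        for v in adj:
            if v in chosen:
                continue
            sigma = estimate_sigma(adj, chosen + [v], p, runs, rng)
            if sigma > best_sigma:
                best_node, best_sigma = v, sigma
        chosen.append(best_node)
        chosen_sigma = best_sigma
    return chosen, chosen_sigma


# Hypothetical toy graph with two separate groups: a 4-node chain and a 2-node pair.
two_groups = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: {6}, 6: {5}}
print(greedy_seed_set(two_groups, k=2))
```

Each greedy round re-estimates the spread for every remaining candidate, so the cost grows with k, the number of nodes, and `runs`; the Monte Carlo noise also means the chosen set can vary with `seed` when p is strictly between 0 and 1.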

DISTINGUISH BETWEEN INFLUENCE AND CORRELATION

Correlation
It has been widely observed that user attributes and behaviors tend to correlate with their social networks.
Suppose each node has a binary attribute (say, whether or not the actor is a smoker).
If the attribute is correlated with the network, we expect actors sharing the same attribute value to be positively correlated with social connections.
That is, smokers are more likely to interact with other smokers, and non-smokers with non-smokers.

Test for Correlation
If the fraction of edges linking nodes with different attribute values is significantly less than the expected probability, then there is evidence of correlation.
Example: suppose connections are independent of smoking behavior, a fraction $p$ of the actors are smokers, and a fraction $1 - p$ are non-smokers. Then one edge is expected to connect:
two smokers with probability $p \times p$
two non-smokers with probability $(1 - p) \times (1 - p)$
a smoker and a non-smoker with probability $2p(1 - p)$

Test for Correlation: An Example
Red nodes denote non-smokers, and green ones are smokers.
If there is no correlation, then the probability of one edge connecting a smoker and a non-smoker is 2 × 4/9 × 5/9 = 40/81 ≈ 49%.
In this example the fraction is 2/14 ≈ 14% < 49%, so this network demonstrates some degree of correlation with respect to smoking behavior.
A more formal way is to conduct a $\chi^2$ test for independence of social connections and attributes (La Fond and Neville, 2010).
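The comparison in this example is simple arithmetic, sketched below. The first part reuses the numbers quoted on the slide (4/9 and 2 of 14 edges); the labeled graph in the second part is hypothetical, since the figure itself is not available here, and the function names are illustrative.

```python
def expected_cross_fraction(p):
    """Probability that a random edge joins a smoker and a non-smoker if ties ignore the attribute."""
    return 2 * p * (1 - p)


def observed_cross_fraction(edges, label):
    """Fraction of edges whose two endpoints carry different attribute values."""
    cross = sum(1 for u, v in edges if label[u] != label[v])
    return cross / len(edges)


# Numbers quoted on the slide: 4/9 of the nodes in one group, 2 of the 14 edges cross groups.
expected = expected_cross_fraction(4 / 9)
observed = 2 / 14
print(round(expected, 3), round(observed, 3), observed < expected)

# Hypothetical labeled graph, only to show the edge-counting step.
label = {1: "smoker", 2: "smoker", 3: "non-smoker", 4: "non-smoker"}
edges = [(1, 2), (3, 4), (2, 3)]
print(observed_cross_fraction(edges, label))
```

A gap between the observed and expected fractions is only suggestive on a graph this small; the chi-square test mentioned above is the more formal check.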

Correlation in Social Networks
It is well known that there exist correlations between behaviors or attributes of adjacent actors in a social network.
Three major social processes explain correlation: homophily, confounding, and influence.

Correlation in Social Networks
Homophily: our tendency to link to others who share certain similarity with us.
Confounding: correlation between actors can also be forged by external influences from the environment. For example, two individuals living in the same city are more likely to become friends than two random individuals.
Influence: a process that causes behavioral correlations between adjacent actors. For example, if most of one's friends switch to a mobile company, he might be influenced by his friends and switch to that company as well.

Influence or Correlation
In many studies about influence modeling, influence is determined by timestamps.
The shuffle test is an approach to identify whether influence is a factor associated with a social system.
The probability of one node being active is a logistic function of the number of his active friends:
$p(a) = \frac{e^{\alpha \ln(a+1) + \beta}}{1 + e^{\alpha \ln(a+1) + \beta}}$
where $a$ is the number of active friends, $\alpha$ is the social correlation coefficient, and $\beta$ is a constant that explains the innate bias for activation.

Activation Likelihood
Suppose that at one time point $t$, $Y_{a,t}$ users with $a$ active friends become active, and $N_{a,t}$ users who also have $a$ active friends stay inactive at time $t$. The likelihood at time $t$ is
$\prod_a p(a)^{Y_{a,t}} (1 - p(a))^{N_{a,t}}$
Given the user activity log, we can compute a correlation coefficient $\alpha$ that maximizes the above likelihood.
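The maximization can be sketched numerically. The code below assumes the logistic form p(a) = 1 / (1 + exp(-(α ln(a + 1) + β))) for the activation probability, and assumes the counts have already been summed over all time points, so that `counts[a] = (Y_a, N_a)`. It fits α and β with Newton's method, which is one standard choice for logistic likelihoods rather than the method prescribed by the slides. The counts shown are hypothetical.

```python
import math


def activation_probability(a, alpha, beta):
    """Logistic activation probability for a user with `a` active friends."""
    z = alpha * math.log(a + 1) + beta
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(counts, alpha, beta):
    """Sum over a of Y_a * ln p(a) + N_a * ln(1 - p(a))."""
    total = 0.0
    for a, (y, n) in counts.items():
        p = activation_probability(a, alpha, beta)
        total += y * math.log(p) + n * math.log(1.0 - p)
    return total


def fit_alpha_beta(counts, iterations=25):
    """Newton-Raphson updates on (alpha, beta), starting from (0, 0)."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for a, (y, n) in counts.items():
            x = math.log(a + 1)
            p = activation_probability(a, alpha, beta)
            residual = y - (y + n) * p          # observed minus expected activations
            weight = (y + n) * p * (1.0 - p)    # curvature contribution
            g_a += residual * x
            g_b += residual
            h_aa += weight * x * x
            h_ab += weight * x
            h_bb += weight
        det = h_aa * h_bb - h_ab * h_ab
        if abs(det) < 1e-12:
            break  # degenerate data, e.g. only one distinct value of a
        alpha += (h_bb * g_a - h_ab * g_b) / det
        beta += (h_aa * g_b - h_ab * g_a) / det
    return alpha, beta


# Hypothetical aggregated counts: a -> (users who became active, users who stayed inactive).
counts = {0: (5, 95), 1: (12, 88), 2: (20, 80), 3: (30, 70)}
alpha_hat, beta_hat = fit_alpha_beta(counts)
print(round(alpha_hat, 3), round(beta_hat, 3))
```

A positive fitted α means users with more active friends are more likely to activate, which indicates social correlation but does not by itself separate influence from homophily or confounding.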

Shuffle Test
The key idea of the shuffle test is that if influence does not play a role, the timing of one actor's activation should be independent of the timing of other actors. Thus, even if we randomly shuffle the timestamps of user activities, we should obtain a similar $\alpha$ value.
Test for influence: after we shuffle the timestamps of user activities, if the new estimate of social correlation is significantly different from the estimate based on the user activity log, then there is evidence of influence.
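The data-preparation half of the shuffle test can be sketched as two helpers: one that turns an activity log into the counts of users who did or did not activate given `a` active friends, and one that permutes the activation timestamps among the users who activated. The log format (a dict from user to a discrete activation step), the graph `toy_graph`, and the function names are assumptions for illustration. Estimating α from each set of counts is a separate maximum-likelihood step that is not shown here.

```python
import random
from collections import Counter


def activation_counts(adj, activation_time, num_steps):
    """Return (Y, N): for each a, how many users activated / stayed inactive, summed over steps."""
    became_active, stayed_inactive = Counter(), Counter()
    for t in range(num_steps):
        for user in adj:
            own_time = activation_time.get(user)
            if own_time is not None and own_time < t:
                continue  # already active before step t
            # a = number of friends that were active strictly before step t.
            a = sum(
                1
                for friend in adj[user]
                if activation_time.get(friend) is not None and activation_time[friend] < t
            )
            if own_time == t:
                became_active[a] += 1
            else:
                stayed_inactive[a] += 1
    return became_active, stayed_inactive


def shuffle_timestamps(activation_time, rng):
    """Randomly reassign the observed timestamps among the users who activated."""
    users = list(activation_time)
    times = [activation_time[u] for u in users]
    rng.shuffle(times)
    return dict(zip(users, times))


# Hypothetical log on the chain 1 - 2 - 3: user 1 activates at step 0, user 2 at step 1.
toy_graph = {1: {2}, 2: {1, 3}, 3: {2}}
log = {1: 0, 2: 1}
original = activation_counts(toy_graph, log, num_steps=2)
shuffled = activation_counts(toy_graph, shuffle_timestamps(log, random.Random(0)), num_steps=2)
print(dict(original[0]), dict(original[1]))
print(dict(shuffled[0]), dict(shuffled[1]))
```

In a full test, α would be estimated from both the original and the shuffled counts, typically over many shuffles, and a clear difference between the two estimates would be taken as evidence of influence.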

Book available at Morgan & Claypool Publishers and Amazon.
If you have any comments, please feel free to contact:
Lei Tang, Yahoo! Labs, ltang@yahoo-inc.com
Huan Liu, ASU, huanliu@asu.edu