Statistical Analysis of Complete Social Networks


Statistical Analysis of Complete Social Networks
Introduction to networks
Christian Steglich, c.e.g.steglich@rug.nl
[Title figure: plot with axes labelled "median geodesic distance between groups", "transitivity", and "homophily", shown together with a log-odds model formula with parameters $\beta_k$, $k = 1, \dots, K$.]

Overview of topics
- Network research in the ...
  - Some characterising features
- Network data
  - Definitions
  - Data collection & related problems
  - Different types of network data
- A quick overview of concepts in network analysis
  - This will help in understanding later course topics

What are social networks?
- Social networking web services
  - Stay in contact with friends
  - Find new partners
- Social networking as strategy
  - Career building for individuals
  - Trust building among organisations
- Social networks as organisation form
  - Terrorist networks
  - Flash mobs
More systematically...

When is network research appropriate?
- Generally, whenever the detailed functioning of a social system is studied.
- The focus is not just on the actors of the system and their individual properties (the "composition" of the system), but on how they relate to each other (the "structure" of the system).
- Compared to other disciplines, network research has been labelled holistic, non-reductionist, organic, or explanatory.
- Its characteristic feature is the interdependence between the different actors in the system.

The meat grinder argument (Barton, 1968)
- Survey data cannot possibly reveal anything about how social systems work.
- The network approach can!
[Diagram: a social system is reduced, via independence assumptions, to a population of social units, from which a survey sample is drawn.]

The social network approach is quite universal
- Social actors and social relations can be many things.
- One can distinguish between the domains of constructed social order (government, organisations) and spontaneous social order (markets, primary socialisation).
- In all four domains, social networks can be studied:

Government
- International scale: political alliances, trade, all sorts of contracted agreements between countries.
- Domestic scale: networking of different government agencies (e.g., agencies involved in water management of river X).

Markets
- Trade networks: flow of goods, supply chains, auctions, buyer-seller (bipartite) networks.
- Labour markets: vacancy chains, getting jobs.

Organisations
- Intra-organisational: formal and informal relations at work.
- Inter-organisational: contracting, interlocking directorates.

Primary Social Order (PSO)
- Kin, friendship, confiding, and other private relations.

There are overlapping aspects: e.g., research on social capital addresses (in part) how PSO and informal organisational ties explain the market position of social actors.

Networks in general might be even more universal
Physicists' take on networks as universal entities:
- Gene networks (one gene activating another gene)
- Neural networks (brain cells activating each other)
- Food webs (see diagram)
- Transport networks (power grids, highway nets, telecommunication)
Are all these different things indeed similar, on some level?

Formal vs. Informal / Design vs. Emergence
Social systems in constructed social orders can be part of the construction (formal networks, designed interdependence) or not (informal networks, emergent interdependence).
- Department structure (formal group) vs. communities (informal)
- Job description (formal role) vs. emergent (informal) roles
- Organigram (formal power) vs. collegial power attribution
The informal, emergent phenomena are what network research typically focuses on.

Statistical inference for interdependent data?
- The statistical procedures that hold for random samples generally do NOT hold for a network data set. The interdependence of social actors precludes this.
- Procedures from classical statistics are not applicable (no regression, no ANOVA, no t-tests, ...).
- Special statistical procedures are required.

Qualitative or Quantitative?
- Social network research knows a huge toolbox of quantitative measures (centrality, power, geodesic distances, etc.); typically, however, just one social system is under study.
- SNA uses quantitative tools for qualitative case studies.
- Some form of external validation of SNA results is always desirable.
- Replication of results is essential.
- Better yet, work with a random sample of networks. Then you can also generalise results to a population (of networks).

Handling networks: Data issues
- Definition & Notation
- Data collection
- Data storage
- Bipartite networks, multiplexity, components
- Classical data sets

Definition & Notation
- Mathematically, a (binary) network is defined as a graph $G = (V, E)$, where $V = \{1, 2, \dots, n\}$ is a set of vertices (or "nodes") and $E \subseteq \{\langle i,j \rangle \mid i,j \in V\}$ is a set of edges (or "ties", "arcs"), one edge just being a pair of vertices.
- In social network analysis, nodes are typically called (social) actors (individuals, firms, countries, ...).
- We write $x_{ij} = 1$ if actors $i$ and $j$ are related to each other (i.e., if $\langle i,j \rangle \in E$), and $x_{ij} = 0$ otherwise.
- Networks can be symmetric ($x_{ij} = x_{ji}$ for all $i, j$), and the definition can be extended to valued networks ($x_{ij} \in \mathbb{R}$).

Network data collection
- Ways to collect network data
  - Design of network studies
  - Measurement of networks
- Methodological issues when collecting network data
- Ethical issues when collecting network data

Data collection design
- Socio-centric or complete network design: identify a group, assess ties within this group. (Default case.)
- Ego-centric design, can be part of a classical survey: ask a (random) sample of respondents about their contacts, and about the contact persons' interconnectedness.
- Respondent-driven design, used in "open system" networks: the design exploits network structure.
  - Necessary for the study of hidden populations (e.g., injection drug users); let early respondents recruit subsequent respondents.
  - In huge networks: a method to extract meaningful communities.

Measurement
- Observation: a professional observer codes visible interactions.
- Name generators in questionnaires or interviews:
  - Cued recall (from lists of available candidates) vs. free recall (candidates from own memory);
  - Limited vs. unlimited nominations;
  - Resource-based vs. role-based.
- Mining: exploit relational databases (internet-based, stock exchange, trade auctions, ...).

Methodological issues
- Boundary specification: where does the network end?
- Sensitivity to missing data:
  - More problematic than in random samples, as structure can be affected crucially by non-response.
  - For some data collection methods, it is difficult to distinguish missingness from non-response.
- Informant accuracy is lacking (Killworth, Bernard, et al.):
  - People systematically forget about lost relations.
  - Feld & Carter (2003): expansiveness and attractiveness bias.

Ethical issues
- General points:
  - Respect, beneficence, justice;
  - Informed consent, assessing risks & benefits;
  - Anonymity, confidentiality, privacy.
- Network-specific points:
  - Possibility to reconstruct identities even after anonymisation;
  - Passive consent vs. active consent and the double role of what participation in the study means.

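Data storage
Adjacency matrix:
\[
X = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]
Node list / edge list (each sender followed by its receivers):
a1: a3
a2: (none)
a3: a2, a4, a5
a4: a5
a5: a4
[Figure: the corresponding directed graph on the actors a1 to a5.]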
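The three storage formats carry the same information and can be converted into each other. The following is a minimal sketch in plain Python, using the five-actor example from the slide; the function names `to_edge_list` and `to_adjacency` are illustrative choices, not part of any particular software package.

```python
# Node list from the slide: each sender followed by its receivers.
# The integer labels 1..5 stand for the actors a1..a5.
node_list = {1: [3], 2: [], 3: [2, 4, 5], 4: [5], 5: [4]}


def to_edge_list(nodes):
    """Flatten a node list into (sender, receiver) pairs."""
    return [(i, j) for i in sorted(nodes) for j in nodes[i]]


def to_adjacency(nodes):
    """Build the matrix X with X[i-1][j-1] = 1 if actor i sends a tie to j."""
    n = len(nodes)
    X = [[0] * n for _ in range(n)]
    for i, receivers in nodes.items():
        for j in receivers:
            X[i - 1][j - 1] = 1
    return X


edges = to_edge_list(node_list)
X = to_adjacency(node_list)
for row in X:
    print(" ".join(map(str, row)))
```

The adjacency matrix needs n x n cells regardless of how many ties exist, whereas the list formats grow only with the number of ties, which is why list formats are usually preferred for large, sparse networks.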

Two-mode data
- Two-mode (a.k.a. bipartite) networks consist of ties between two qualitatively distinct sets of nodes, often events and the actors attending them.
- They can be projected to two one-mode networks: between actors (relation: sharing events) and between events (relation: sharing actors).
- On the right, you see "hangjongeren" (adolescents, blue circles) by police-registered incidents (red squares) in a village in the province of Drenthe/NL.
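The two projections can be sketched as follows. The attendance data are hypothetical (they are not the Drenthe data), and weighting each projected tie by the number of shared nodes is one common convention among several.

```python
from itertools import combinations

# Hypothetical two-mode data: the set of events each actor attended.
attendance = {
    "ann": {"e1", "e2"},
    "bob": {"e1"},
    "cat": {"e2", "e3"},
    "dan": {"e3"},
}


def project(two_mode):
    """One-mode projection; tie weight = number of shared second-mode nodes."""
    ties = {}
    for a, b in combinations(sorted(two_mode), 2):
        shared = len(two_mode[a] & two_mode[b])
        if shared:
            ties[(a, b)] = shared
    return ties


def transpose(two_mode):
    """Swap the two modes: map each event to the set of actors attending it."""
    swapped = {}
    for actor, events in two_mode.items():
        for event in events:
            swapped.setdefault(event, set()).add(actor)
    return swapped


actor_net = project(attendance)             # actors tied by shared events
event_net = project(transpose(attendance))  # events tied by shared actors
print(actor_net)
print(event_net)
```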

An almost bipartite network: Romantic and Sexual Relations at Jefferson High School
Characteristic feature: few cycles, in particular few 4-cycles.

4-cycle prohibition for kinship relations is partly coded into law
(Figure from Lin Freeman's 2004 book on the history of SNA.)

Multiplexity
There can be multiple social relations between the same actors; this is called multiplexity.
[Figure panels: formal job hierarchy; workflow.]

Components
- A network can split into mutually unconnected regions. These regions are called components of the network.
- On the right you see a network separating into seven components.
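Components can be found with a breadth-first search that ignores tie direction. The sketch below assumes nodes are numbered 0 to n-1 and runs on a small hypothetical network (a triangle, a pair, and an isolate), not on the network shown on the slide.

```python
from collections import deque


def components(n, edges):
    """Return the components of a network on nodes 0..n-1, ignoring direction."""
    neighbours = {v: set() for v in range(n)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            v = queue.popleft()
            members.append(v)
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        result.append(sorted(members))
    return result


# Hypothetical network: a triangle, a pair, and an isolate.
example = components(6, [(0, 1), (1, 2), (2, 0), (3, 4)])
print(example)
```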

Classical data sets
- Newcomb's Fraternity data (1961): liking rankings among 17 university students.
- Sampson's Monastery data (1968): trust & liking scales among 25 novice monks.
- Zachary's Karate Club data (1977): friendship among 34 members of a karate club.
- Padgett's Florentine Families data (1994): marriage ties among families in late medieval Florence.
- & a couple of others...
All classics are typically provided with software, e.g., Ucinet.

Some concepts in descriptive network analysis
- Dyad census and triad census
- Distances & geodesic distributions
- Degree distributions
- Centrality measures
- Status / power measures
- Positional measures
- Macro measures: segmentation, segregation, centralisation, hierarchisation, autocorrelation

Dyad census
- A dyad is a pair of actors $\langle i,j \rangle$ in the network, plus the configuration of tie variables $\langle x_{ij}, x_{ji} \rangle$ between them.
- In a directed, binary network, there are $n(n-1)$ tie variables located in $n(n-1)/2$ dyads.
- Dyads can be of three types: M (mutual), A (asymmetric), N (null).

relation (n=75)    M     A     N
friendship         214   313   2248
advice             45    217   2513

A simple count of types ("dyad census") gives information about the degree to which the network is symmetric.

Network indices based on the dyad census
- Density of the network: defined as the proportion of actually observed ties among the potentially observable ones. Actually observed are $2M + A$; potentially observable are $n(n-1)$.
\[
\text{density} = \frac{2M + A}{n(n-1)}
\]
- Reciprocity index of the network: can be defined as the proportion of actually reciprocated ties among the potentially reciprocable ones. Actually reciprocated are $2M$; potentially reciprocable are $2M + A$.
\[
\text{reciprocity} = \frac{2M}{2M + A}
\]
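As a sketch of how the census and the two indices might be computed from an adjacency matrix, the code below uses the five-actor matrix from the data storage slide; the function names are illustrative.

```python
def dyad_census(X):
    """Count mutual (M), asymmetric (A), and null (N) dyads of a binary matrix."""
    n = len(X)
    M = A = N = 0
    for i in range(n):
        for j in range(i + 1, n):
            ties = X[i][j] + X[j][i]
            if ties == 2:
                M += 1
            elif ties == 1:
                A += 1
            else:
                N += 1
    return M, A, N


def density(M, A, n):
    """Observed ties (2M + A) over possible ties n(n-1)."""
    return (2 * M + A) / (n * (n - 1))


def reciprocity(M, A):
    """Reciprocated ties (2M) over observed ties (2M + A)."""
    return 2 * M / (2 * M + A)


# Five-actor example network (a1..a5) from the data storage slide.
X = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
M, A, N = dyad_census(X)
print(M, A, N, density(M, A, len(X)), reciprocity(M, A))
```

The same two functions can be applied directly to the census counts of the friendship and advice networks (n = 75) without access to the full matrices.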

Indices as (conditional) tie probabilities
Assume two nodes $i, j$ are randomly sampled from the network. Then
- the (unconditional) probability that $i$ is tied to $j$ is $\Pr(x_{ij} = 1) = \text{density}$;
- the conditional probability that $i$ is tied to $j$, given that $j$ is tied to $i$, is $\Pr(x_{ij} = 1 \mid x_{ji} = 1) = \text{reciprocity}$.

relation (n=75)    M     A     N      density   logodds   reciprocity   logodds   effect
friendship         214   313   2248   13.3%     -1.87     57.6%         0.31      2.18
advice             45    217   2513   5.5%      -2.84     29.3%         -0.88     1.96

effect: effect of the incoming tie $x_{ji}$ on the logodds of the outgoing tie $x_{ij}$ (difference between the two logodds columns).
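The log-odds columns can be recomputed from the dyad census alone. A minimal sketch, assuming the natural logarithm and the density and reciprocity definitions based on the dyad census; printed percentages may differ from the table in the last digit because of rounding.

```python
import math


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1 - p))


n = 75
census = {"friendship": (214, 313), "advice": (45, 217)}  # (M, A) per relation

effects = {}
for relation, (M, A) in census.items():
    dens = (2 * M + A) / (n * (n - 1))   # Pr(x_ij = 1)
    recip = 2 * M / (2 * M + A)          # Pr(x_ij = 1 | x_ji = 1)
    # Effect of an incoming tie on the log-odds of the outgoing tie.
    effects[relation] = logit(recip) - logit(dens)
    print(relation, round(logit(dens), 2), round(logit(recip), 2),
          round(effects[relation], 2))
```

Read this way, an incoming friendship tie raises the log-odds of the outgoing tie by about 2.18, and an incoming advice tie by about 1.96.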

Triad census
- A triad is a set of three actors in the network, together with the configuration of ties between them.
- In a directed, binary network, there are sixteen triad types, typically indicated by their dyad census MAN plus (where necessary) a distinguishing letter: C = cyclical, U = up, D = down, T = transitive.

Network indices based on the triad census
Transitivity index of the network: defined as the proportion of actually observed transitively closed triples of nodes $\langle i,k,j \rangle$ among the observed potentially closed paths of length 2 from $i$ to $j$ via $k$.
[Figure: two-path from i via k to j, closed by the tie from i to j.]
- Actually closed are
  030T + 2 x 120D + 2 x 120U + 120C + 3 x 210 + 6 x 300
- Potentially closed are
  021C + 111D + 111U + 030T + 3 x 030C + 2 x 201 + 2 x 120D + 2 x 120U + 3 x 120C + 4 x 210 + 6 x 300

Transitivity index as conditional probability
The index can also be calculated directly from the matrix, summing over triples of pairwise distinct actors $i, j, k$:
\[
\text{transitivity} = \frac{\sum_{i,j,k} x_{ik}\, x_{kj}\, x_{ij}}{\sum_{i,j,k} x_{ik}\, x_{kj}}
\]

relation (n=75)    actually transitive   potentially transitive   transitivity index
friendship         4143                  9546                     0.43
advice             369                   1213                     0.30

In the network literature, other indices are also used!
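The matrix version of the index translates into a triple loop over ordered triples of distinct actors. The sketch below applies it to the five-actor matrix from the data storage slide; the function name is illustrative.

```python
def transitivity_counts(X):
    """Return (closed, potential) counts of two-paths i -> k -> j."""
    n = len(X)
    closed = potential = 0
    for i in range(n):
        for k in range(n):
            for j in range(n):
                if len({i, j, k}) < 3:
                    continue  # the three actors must be pairwise distinct
                if X[i][k] and X[k][j]:
                    potential += 1       # two-path i -> k -> j exists
                    closed += X[i][j]    # ... and is closed by the tie i -> j
    return closed, potential


# Five-actor example network (a1..a5) from the data storage slide.
X = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
closed, potential = transitivity_counts(X)
index = closed / potential if potential else float("nan")
print(closed, potential, index)
```

In this small example the two closed two-paths are a3 -> a4 -> a5 and a3 -> a5 -> a4; the three two-paths starting at a1 stay open.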

Distances in a network
The length of a shortest connecting path defines the (sociometric) distance between two actors in a network.
[Figure: example of a shortest path of length 5; the labelled nodes are 15, 33, 9, 1, 7, and 17.]

Calculating distances
Matrix calculus:
- The adjacency matrix $X$ indicates which row actor is directly connected to which column actor.
- So the squared adjacency matrix $X^2$ indicates which row actor can reach which column actor in two steps.
- etc.
This yields the sociometric shortest distances, a.k.a. geodesic distances.
\[
X^2 = X \cdot X =
\begin{pmatrix}
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix}
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix}
=
\begin{pmatrix}
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
[Figure: the directed graph on the actors a1 to a5.]
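The matrix-power idea can be sketched in plain Python: entry (i, j) of the k-th power counts the walks of length k from i to j, so the geodesic distance is the smallest k at which that entry becomes positive. The function names are illustrative, and unreachable pairs are marked `None` here as an assumption (they correspond to infinite distance).

```python
def matmul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def geodesic_distances(X):
    """Distance matrix from successive powers of X; None = unreachable."""
    n = len(X)
    dist = [[0 if i == j else None for j in range(n)] for i in range(n)]
    power = [row[:] for row in X]          # X^1
    for step in range(1, n):               # shortest paths have length <= n-1
        for i in range(n):
            for j in range(n):
                if power[i][j] > 0 and dist[i][j] is None:
                    dist[i][j] = step      # first power with a walk i -> j
        power = matmul(power, X)           # move on to X^(step+1)
    return dist


# Five-actor example network (a1..a5) from the slide.
X = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
X2 = matmul(X, X)
dist = geodesic_distances(X)
for row in dist:
    print(row)
```

Matrix powers are convenient for small networks; for larger ones, a breadth-first search per actor gives the same distances with far less computation.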

Geodesic distributions
- For each network, you can calculate how often each distance occurs. This generates the distribution of geodesic distances.
- Actors in separate components have infinite distance.
- On the right you see a series of distributions of geodesic distances in English/Welsh school-based friendship networks over 3 years.
[Figure: distributions of geodesic distances for waves T2, T3, and T4; horizontal axis: distance (0 to 24), vertical axis: relative frequency (0% to 17.5%).]
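A geodesic distribution can be tabulated by running a breadth-first search from every actor and counting the resulting distances over all ordered pairs. The sketch below uses the five-actor example network from the data storage slide (not the school data) and records unreachable pairs as infinite distance.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Geodesic distances from `source` to every actor it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def geodesic_distribution(adj):
    """Relative frequency of each distance over all ordered pairs of actors."""
    counts = Counter()
    for s in adj:
        reached = bfs_distances(adj, s)
        for t in adj:
            if t != s:
                counts[reached.get(t, float("inf"))] += 1
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}


# Five-actor example: each sender mapped to its receivers (a1..a5 as 1..5).
adj = {1: [3], 2: [], 3: [2, 4, 5], 4: [5], 5: [4]}
distribution = geodesic_distribution(adj)
print(distribution)
```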

Degree distributions
- For each network, you can calculate how often each in- or outdegree occurs. This generates the so-called degree distributions.
- Outliers on the right are called hubs.
- On the right you see a series of distributions of outdegrees (top) and indegrees (bottom) in a Scottish school-based friendship network over 3 years.
[Figure: outdegree distributions (top; degrees 0 to 6) and indegree distributions (bottom; degrees 0 to 12) for waves T1, T2, and T3; vertical axes: relative frequency (0% to 20%).]
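With an adjacency matrix at hand, outdegrees are row sums and indegrees are column sums. A minimal sketch on the five-actor example network from the data storage slide (not the Scottish school data):

```python
from collections import Counter

# Five-actor example network (a1..a5): rows are senders, columns receivers.
X = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]

outdegrees = [sum(row) for row in X]        # row sums: ties sent
indegrees = [sum(col) for col in zip(*X)]   # column sums: ties received

# Degree distributions: how often each degree value occurs.
out_distribution = Counter(outdegrees)
in_distribution = Counter(indegrees)
print(sorted(out_distribution.items()))
print(sorted(in_distribution.items()))
```

Here a3, with outdegree 3, would be the closest thing to a hub on the outdegree side.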

Scale-free networks
- Degree distributions are often depicted on a log-log scale (physics tradition).
- If the distribution is linear on this scale, the network is called scale-free (A.-L. Barabási).
- On the right, you see the indegree distribution of the previous slide's data set on a log-log scale.
[Figure: indegree distributions for waves T1, T2, and T3 on a log-log scale; both axes run from 1 to 32.]
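One informal way to check for linearity on the log-log scale is a least-squares line through the points (log degree, log frequency). The sketch below uses hypothetical frequencies that follow a power law exactly; a straight-looking fit on real data is only a rough indication, not a formal test of scale-freeness.

```python
import math


def loglog_slope(distribution):
    """Least-squares slope of log(frequency) against log(degree)."""
    points = [(math.log(k), math.log(f))
              for k, f in distribution.items() if k > 0 and f > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical frequencies following f(k) = 1024 / k^2 exactly.
frequencies = {k: 1024 / k ** 2 for k in (1, 2, 4, 8, 16, 32)}
slope = loglog_slope(frequencies)
print(slope)
```

Degree 0 has to be left out because its logarithm is undefined, which is one reason isolates do not appear in log-log plots.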

Centrality and Centralisation
- Basic idea behind these concepts: well-connected actors are in a structurally advantageous position.
- What makes an actor well-connected? Different notions of centrality try to capture such aspects.
- When actors in a network differ strongly in terms of their centrality, the network is called centralised.

Structurally advantageous positions
A star structure expresses the centrality concept most clearly:
- The central actor is connected to everyone and dominates access.
- Peripheral actors are only indirectly connected to each other; they rely on the central actor for access to each other.
There is some nuance to this.

Three centrality measures
- Degree centrality: number of people nominating you (indegree centrality) or nominated by you (outdegree centrality).
- Eigenvector centrality: your centrality is proportional to the sum of your nominees' / nominators' centralities.
- Betweenness centrality: sum of the fractions of shortest paths between any two nodes that pass through a given node.
[Figure: the same network (nodes S1, S2, S4, I1, I3, W1 to W9) drawn three times, once per centrality measure.]
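The three measures can be sketched for a small undirected network in plain Python. The example is a hypothetical star with centre 0 and four peripheral actors, not the network in the figure. Two assumptions are worth flagging: the eigenvector scores come from power iteration on A + I (the shift keeps the iteration from oscillating on bipartite graphs such as a star and leaves the leading eigenvector unchanged), and betweenness is computed over unordered pairs and left unnormalised.

```python
from collections import deque


def degree_centrality(adj):
    """Number of neighbours of each actor."""
    return {v: len(adj[v]) for v in adj}


def eigenvector_centrality(adj, iterations=200):
    """Power iteration on A + I, rescaled so the largest score is 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(new.values())
        x = {v: value / top for v, value in new.items()}
    return x


def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths to each actor."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
    return dist, sigma


def betweenness_centrality(adj):
    """Sum over pairs {s, t} of the share of shortest s-t paths through v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    nodes = sorted(adj)
    score = {v: 0.0 for v in adj}
    for a, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[a + 1:]:
            if t not in dist_s:
                continue  # s and t lie in different components
            for v in nodes:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Hypothetical star: actor 0 in the centre, actors 1..4 in the periphery.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
degree = degree_centrality(star)
eigen = eigenvector_centrality(star)
between = betweenness_centrality(star)
print(degree, eigen, between)
```

In the star, all three measures agree that the centre is the most central actor; in less extreme structures they can rank actors quite differently.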

Status / prestige measures
- Centrality measures work in the first place for symmetric (undirected) networks.
- When evaluated on directed networks, they tend to acquire the meaning of prestige or status measures.
- Katz's sociometric status (1953) is essentially directed eigenvector centrality.
- Other status measures: popularity / indegree; Hubbell's & Taylor's proposals, ...

Positional measures
- Positions relative to groups: group member, peripheral, bridge, isolate.
  - Problem: very algorithmic, various definitions of what a group is, etc.
- Structural holes / brokerage: actors can exploit other actors' disconnectedness.

Macro measures
- Segmentation: degree to which the network falls apart into components.
- Segregation: degree to which the network falls apart into components that are homogeneous on some actor property (e.g., sex, race).
- Centralisation: degree to which actors differ in centrality.

More macro measures
- Hierarchisation (Krackhardt, 1994): degree to which the network exhibits a tree structure.
- Autocorrelation (Moran, Geary): degree to which actors who are similar on a variable are also directly connected in the network.
- Density, reciprocity, transitivity, ...
- Median geodesic distance, average ego-network density, etc.
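As an illustration of network autocorrelation, Moran's I can be computed with the adjacency matrix as the weight matrix: it divides the tie-weighted cross-products of deviations from the mean by the overall variation, scaled by n over the total tie weight. The sketch below uses a hypothetical four-actor network and a hypothetical binary attribute; values near +1 mean that connected actors are similar, values near -1 that they are dissimilar.

```python
def morans_i(X, z):
    """Moran's I of attribute z, using adjacency matrix X as weights."""
    n = len(z)
    mean = sum(z) / n
    dev = [value - mean for value in z]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    total_weight = sum(X[i][j] for i, j in pairs)
    cross = sum(X[i][j] * dev[i] * dev[j] for i, j in pairs)
    variation = sum(d * d for d in dev)
    return (n / total_weight) * cross / variation


# Hypothetical symmetric network with two ties: 0-1 and 2-3.
X = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
similar = morans_i(X, [1, 1, 0, 0])     # tied actors share the attribute
dissimilar = morans_i(X, [1, 0, 1, 0])  # tied actors differ on the attribute
print(similar, dissimilar)
```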

Qualitative or quantitative?
- Social network research knows a huge toolbox of quantitative measures (centrality, power, geodesic distances, etc.); typically, however, just one social system is under study.
- SNA uses quantitative tools for qualitative case studies.
- Some form of external validation of SNA results is always desirable.
- Replication of results is essential.
- Better yet, work with a random sample of networks. Then you can also generalise results to a population (of networks).

Understanding the hype
Why is social network research hot now?
- Data collection is easier than in the past. Mining opens up a lot of possibilities, especially when coupled with new social media.
- Data analysis is possible only now. Statistical methods for hypothesis testing are computationally intensive and require high-end computing power.
- Mainstream awareness of networks has risen. Political events & information technology.