Statistical Analysis of Social Networks

Similar documents
Handling attrition and non-response in longitudinal data

Imputation of missing network data: Some simple procedures

Exponential Random Graph Models for Social Network Analysis. Danny Wyatt 590AI March 6, 2009

ONS Methodology Working Paper Series No 4. Non-probability Survey Sampling in Official Statistics

Equivalence Concepts for Social Networks

Statistical Models for Social Networks

Social Media Mining. Network Measures

Statistical and computational challenges in networks and cybersecurity

PROBABILISTIC NETWORK ANALYSIS

An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides:

HISTORICAL DEVELOPMENTS AND THEORETICAL APPROACHES IN SOCIOLOGY Vol. I - Social Network Analysis - Wouter de Nooy

Inferential Network Analysis with Exponential Random. Graph Models

Sensitivity Analysis in Multiple Imputation for Missing Data

Introduction to social network analysis

An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups

An extension of the factoring likelihood approach for non-monotone missing data

Statistics in Applications III. Distribution Theory and Inference

Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random

NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES

Statistics Graduate Courses

Missing data in randomized controlled trials (RCTs) can

Exploring Big Data in Social Networks

Week 3. Network Data; Introduction to Graph Theory and Sociometric Notation

ARTICLE IN PRESS Social Networks xxx (2013) xxx xxx

Protocols for Randomized Experiments to Identify Network Contagion

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Statistical Analysis of Complete Social Networks

Statistical Rules of Thumb

2. Making example missing-value datasets: MCAR, MAR, and MNAR

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

It is important to bear in mind that one of the first three subscripts is redundant since k = i -j +3.

Social network analysis with R sna package

MINFS544: Business Network Data Analytics and Applications

PEER REVIEW HISTORY ARTICLE DETAILS TITLE (PROVISIONAL)

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

SAS Software to Fit the Generalized Linear Model

CSV886: Social, Economics and Business Networks. Lecture 2: Affiliation and Balance. R Ravi ravi+iitd@andrew.cmu.edu

The Applied and Computational Mathematics (ACM) Program at The Johns Hopkins University (JHU) is

2006 RESEARCH GRANT FINAL PROJECT REPORT

ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA

Gaussian Processes to Speed up Hamiltonian Monte Carlo

Monitoring the Behaviour of Credit Card Holders with Graphical Chain Models

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

Social Networking Analytics

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

Multilevel Modeling of Complex Survey Data

What Is Probability?

IBM SPSS Modeler Social Network Analysis 15 User Guide

School of Public Health and Health Services Department of Epidemiology and Biostatistics

10. Analysis of Longitudinal Studies Repeat-measures analysis

Imputing Attendance Data in a Longitudinal Multilevel Panel Data Set

Towards running complex models on big data

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

Electronic Theses and Dissertations UC Riverside

Practical Graph Mining with R. 5. Link Analysis

Bayesian Statistics in One Hour. Patrick Lam

Categorical Data Analysis

Statistical Models in Data Mining

Postdocs: All Sectors (N=533)

Introduction to Markov Chain Monte Carlo

COLLEGE ALGEBRA IN CONTEXT: Redefining the College Algebra Experience

The Comparisons. Grade Levels Comparisons. Focal PSSM K-8. Points PSSM CCSS 9-12 PSSM CCSS. Color Coding Legend. Not Identified in the Grade Band

INTRODUCTION TO SURVEY DATA ANALYSIS THROUGH STATISTICAL PACKAGES

Missing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center

Secondary school students use of computers at home

THE ROLE OF SOCIOGRAMS IN SOCIAL NETWORK ANALYSIS. Maryann Durland Ph.D. EERS Conference 2012 Monday April 20, 10:30-12:00

RDS INCORPORATED. RDS Analysis Tool v5.3. User Manual

Lecture 6. Artificial Neural Networks

Power and sample size in multilevel modeling

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

AFTER HIGH SCHOOL: A FIRST LOOK AT THE POSTSCHOOL EXPERIENCES OF YOUTH WITH DISABILITIES

A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA

A Basic Introduction to Missing Data

A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models

Dealing with Missing Data

The Basics of Graphical Models

Transcription:

Statistical Analysis of Social Networks Krista J. Gile University of Massachusetts, Amherst Octover 24, 2013

Collaborators: Social Network Analysis [1] Isabelle Beaudry, UMass Amherst Elena Erosheva, University of Washington Ian Fellows, FellStat Karen Fredriksen-Goldsen, University of Washington Krista J. Gile, UMass, Amherst Mark S. Handcock, UCLA Lisa G. Johnston, Tulane University, UCSF Corinne M. Mar, University of Washington Miles Ott, Carleton College Matt Salganik, Princeton University Amber Tomas, Mathematica Policy Research For details, see: http://www.math.umass.edu/ gile

Career Path (how it felt) Social Network Analysis [2] M.S. Work people High School PhD, Postdoc Faculty College math

Career Path (employer version) Social Network Analysis [3] ratio: math/people High School College M.S. Work PhD Postdoc Faculty

Outline of Presentation Social Network Analysis [4] 1. What is a Social Network? 2. Why do we want to analyze a social network? 3. What can we know about a social network? 4. How do we analyze a social network? 5. Descriptive Analysis 6. Design-Based Inference 7. Model-based (likelihood) Inference 8. Examples 9. Discussion

Outline of Presentation Social Network Analysis [5] 1. What is a Social Network? 2. Why do we want to analyze a social network? 3. What can we know about a social network? 4. How do we analyze a social network? 5. Descriptive Analysis 6. Design-Based Inference 7. Model-based (likelihood) Inference 8. Examples 9. Discussion

(Cross-Sectional) Social Networks Social Network Analysis [6] Social Network: Tool to formally represent and quantify relational social structure. Relations can include: friendships, workplace collaborations, international trade Represent mathematically as a sociomatrix, Y, where Y ij = the value of the relationship from i to j 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 (c) Sociogram (d) Sociomatrix

Social Networks that are harder: Social Network Analysis [7] Networks that change over time (ties, nodes, discrete, continuous, models, data) (effectively) Unbounded networks (the internet, social contacts in the world) Multiple social networks (friendships, co-work, leisure) Multi-modal networks (people and places) Valued networks (amount of trade, signed networks) Event networks (e-mail, armed conflict)

Outline of Presentation Social Network Analysis [8] 1. What is a Social Network? 2. Why do we want to analyze a social network? 3. What can we know about a social network? 4. How do we analyze a social network? 5. Descriptive Analysis 6. Design-Based Inference 7. Model-based (likelihood) Inference 8. Examples 9. Discussion

Why do we want to analyze social networks? Social Network Analysis [9] Because it s cool and fun. Because we want information we can t get other ways. Learn about nodes Learn about relations Learn about processes Learn about the particular network Learn about the process the network came from

Outline of Presentation Social Network Analysis [10] 1. What is a Social Network? 2. Why do we want to analyze a social network? 3. What can we know about a social network? 4. How do we analyze a social network? 5. Descriptive Analysis 6. Design-Based Inference 7. Model-based (likelihood) Inference 8. Examples 9. Discussion

What can we know about a social network? Social Network Analysis [11] All nodes, all relations All nodes, some relations Some nodes, some relations Others we won t emphasize here: Attributes of nodes (where do your facebook contacts live?) Attributes of relations (who is your *best* friend?) Information on processes over the network (how much trade between countries?)

Partially-Observed Social Network Data Social Network Analysis [12] Some portion of the social network is often unobserved.

Partial Observation of Social Networks Sampling Design: Choose which part to observe: Ask 10% of employees about their collaborations Egocentric Adaptive Out-of-design Missing Data: Try to survey the whole company, but someone is out sick Boundary Specification Problem: Should a contractor be considered a part of the company? Social Network Analysis [13]

Partial Observation of Social Networks Sampling Design: Choose which part to observe: Ask 10% of employees about their collaborations Egocentric Adaptive Out-of-design Missing Data: Try to survey the whole company, but someone is out sick Boundary Specification Problem: Should a contractor be considered a part of the company? Social Network Analysis [14]

Partial Observation of Social Networks Sampling Design: Choose which part to observe: Ask 10% of employees about their collaborations Egocentric Adaptive Out-of-design Missing Data: Try to survey the whole company, but someone is out sick Boundary Specification Problem: Should a contractor be considered a part of the company? Social Network Analysis [15]

Partial Observation of Social Networks Sampling Design: Choose which part to observe: Ask 10% of employees about their collaborations Egocentric Adaptive Out-of-design Missing Data: Try to survey the whole company, but someone is out sick Boundary Specification Problem: Should a contractor be considered a part of the company? Social Network Analysis [16]

Partial Observation of Social Networks Sampling Design: Choose which part to observe: Ask 10% of employees about their collaborations Egocentric Adaptive Out-of-design Missing Data: Try to survey the whole company, but someone is out sick Boundary Specification Problem: Should a contractor be considered a part of the company? Social Network Analysis [17]

Partial Observation of Social Networks Sampling Design: Choose which part to observe: Ask 10% of employees about their collaborations Egocentric Adaptive Out-of-design Missing Data: Try to survey the whole company, but someone is out sick Boundary Specification Problem: Should a contractor be considered a part of the company? Social Network Analysis [18]

Partial Observation of Social Networks Sampling Design: Choose which part to observe: Ask 10% of employees about their collaborations Egocentric Adaptive Out-of-design Missing Data: Try to survey the whole company, but someone is out sick Boundary Specification Problem: Should a contractor be considered a part of the company? Social Network Analysis [19]

Partial Observation of Social Networks Sampling Design: Choose which part to observe: Ask 10% of employees about their collaborations Egocentric Adaptive Out-of-design Missing Data: Try to survey the whole company, but someone is out sick Boundary Specification Problem: Should a contractor be considered a part of the company? Social Network Analysis [20]

Partial Observation of Social Networks Sampling Design: Choose which part to observe: Ask 10% of employees about their collaborations Egocentric Adaptive Out-of-design Missing Data: Try to survey the whole company, but someone is out sick Boundary Specification Problem: Should a contractor be considered a part of the company? Social Network Analysis [21]

Partial Observation of Social Networks Sampling Design: Choose which part to observe: Ask 10% of employees about their collaborations Egocentric Adaptive Out-of-design Missing Data: Try to survey the whole company, but someone is out sick Boundary Specification Problem: Should a contractor be considered a part of the company? Social Network Analysis [22]???

Outline of Presentation Social Network Analysis [23] 1. What is a Social Network? 2. Why do we want to analyze a social network? 3. What can we know about a social network? 4. How do we analyze a social network? 5. Descriptive Analysis 6. Design-Based Inference 7. Model-based (likelihood) Inference 8. Examples 9. Discussion

Frameworks for Statistical Analysis Social Network Analysis [24] Describe Describe Structure Mechanism Fully Observed Description Modeling Data (Statistical) Partially Observed Design-Based Likelihood Data Inference Inference

Outline of Presentation Social Network Analysis [25] 1. What is a Social Network? 2. Why do we want to analyze a social network? 3. What can we know about a social network? 4. How do we analyze a social network? 5. Descriptive Analysis 6. Design-Based Inference 7. Model-based (likelihood) Inference 8. Examples 9. Discussion

Descriptive Analysis Social Network Analysis [26] Centrality Balance Reachability and path length Clustering (e.g. Facebook [image from Bernie Hogan, Oxford])

Outline of Presentation Social Network Analysis [27] 1. What is a Social Network? 2. Why do we want to analyze a social network? 3. What can we know about a social network? 4. How do we analyze a social network? 5. Descriptive Analysis 6. Design-Based Inference 7. Model-based (likelihood) Inference 8. Examples 9. Discussion

The Horvitz-Thompson Estimator of a Total Social Network Analysis [28] I have a sample of students in a classroom and I want to estimate the total amount of money spent on textbooks in the class. ˆT HT = π i s i Where y i is the amount spent by student i, and π i is the probability i sampled. E( ˆT HT ) = i P op V ar( ˆT HT ) = y i y i π i = π i i P op j P op i P op ij y i π i y j π j y i = T Where ij = π ij π i π j Where ˇ ij = π ij π i π j π ij ˆ V ar( ˆT HT ) = i s j s ˇ ij y i π i y j π j

Social Network Analysis [29] Horvitz-Thompson Estimator for Number of Edges How many friendships are in a classroom? I choose a simple random sample of n students, from a classroom of N students. I look at the web space of each student in the sample, and identify all other students in the class with whom they share friendships I use this data to form a Horvitz-Thompson estimator for the total number of friendships in the classroom ˆT HT = i s V ar( ˆT HT ) = i s j / s or j>i j / s or j>i π ij k s l / s or l>k ij,kl y ij π ij y kl π kl y ij Where ij,kl = π ij,kl π ij π kl

The π ij s Social Network Analysis [30] π ij = 1 ( N 2 n ( N n ) ) = n(2n n 1) N(N 1) for i j π ij,kl = p(exactly one end of each sampled) + p(two end of one, one of the other sampled) + p(both ends of both sampled) ) ) = 2 2 ( N 4 n 2 ( N n) + 2 2 ( N 4 n 3 ( N n) + ( N 4 n 4 ( N n) ) for i, j, k, l unique.

Link-Tracing Designs and HT Estimation Social Network Analysis [31] Consider: π i = 1 ( N mi n ( N n) ) Where m i is the number of initial students that would have landed i in the sample. 1-Wave link-tracing: m i = 1+ number of friends of i, observed! 2-Wave link-tracing: m i = 1+ number of friends and friends of friends of i, NOT observed!

Social Network Analysis [32] Limitation: Sampling probabilities often not obserable Observable sampling probabilities under various sampling schemes for directed and undirected networks. Sampling Nodal Probabilities π i Dyadic Probabilities π ij Scheme Undirected Directed Undirected Directed Ego-centric X X X X One-Wave X k Wave, 1 < k < Saturated X Nodal and dyadic sampling probabilities are considered separately. X indicates observable sampling probabilities, while a blank indicates unobservable sampling probabilities.

Frameworks for Statistical Analysis Social Network Analysis [33] Describe Describe Structure Mechanism Fully Observed Description Modeling Data (Statistical) Partially Observed Design-Based Likelihood Data Inference Inference

Outline of Presentation Social Network Analysis [34] 1. What is a Social Network? 2. Why do we want to analyze a social network? 3. What can we know about a social network? 4. How do we analyze a social network? 5. Descriptive Analysis 6. Design-Based Inference 7. Model-based (likelihood) Inference 8. Examples 9. Discussion

Social Network Analysis [35] Modeling Social Network Data - Independence Models logitp (Y ij = 1 X) = β 0 + β 1 (same sex) + β 2 (i older than j) +.. Logistic regression form. Each pair (dyad) independent of all others. Scientific questions answered by comparing estimated coefficients, β i, to 0.

Social Network Analysis [36] More complex models: Exponential-family Random Graph Models Sometimes, we are interested in more complex relational structures. Need to look at whole graph at once, can t separate dyads. Exponential-family Random Graph Model (ERGM): (Holland and Leinhardt (1981), Snijders et al, (2006),... ) P β (Y = y) = c(β)e β 1g 1 (y)+β 2 g 2 (y)+... g(y) represent components of the social process c(β) is the normalizing constant

Social Network Analysis [37] Fitting Models to Partially Observed Social Network Data Two types of data: Observed relations (Y obs ), and indicators of units sampled (D). P (Y obs, D β, δ) = P (Y, D β, δ) Unobserved = P (D Y, δ)p (Y β) Unobserved β is the model parameter δ is the sampling parameter If P (D Y, δ) = P (D Y obs, δ) (adaptive sampling or missing at random) Then P (Y obs, D β, δ) = P (D Y, δ) P (Y β) Unobserved Can find maximum likelihood estimates by summing over the possible values of unobserved, ignoring sampling Sample with Markov Chain Monte Carlo (MCMC)

Social Network Analysis [38] Fitting Models to Partially Observed Social Network Data Two types of data: Observed relations (Y obs ), and indicators of units sampled (D). P (Y obs, D β, δ) = P (Y, D β, δ) Unobserved = P (D Y, δ)p (Y β) Unobserved β is the model parameter δ is the sampling parameter If P (D Y, δ) = P (D Y obs, δ) (adaptive sampling or missing at random) Then P (Y obs, D β, δ) = P (D Y, δ) P (Y β) Unobserved Can find maximum likelihood estimates by summing over the possible values of unobserved, ignoring sampling Sample with Markov Chain Monte Carlo (MCMC)

Outline of Presentation Social Network Analysis [39] 1. What is a Social Network? 2. Why do we want to analyze a social network? 3. What can we know about a social network? 4. How do we analyze a social network? 5. Descriptive Analysis 6. Design-Based Inference 7. Model-based (likelihood) Inference 8. Examples 9. Discussion

Example: Friendships in a School Social Network Analysis [40] From the National Longitudinal Survey on Adolescent Health - Wave 1: Each student asked to nominate up to 5 male and 5 female friends Sex and Grade available for 89 students, 70 students reported friendships.

Example: Friendships in a School Social Network Analysis [41] From the National Longitudinal Survey on Adolescent Health - Wave 1: Each student asked to nominate up to 5 male and 5 female friends Sex and Grade available for 89 students, 70 students reported friendships.

Example: Friendships in a School Social Network Analysis [42] From the National Longitudinal Survey on Adolescent Health - Wave 1: Each student asked to nominate up to 5 male and 5 female friends Sex and Grade available for 89 students, 70 students reported friendships.

Example: Friendships in a School Social Network Analysis [43] Scientific Question: manner? Do friendships form in an egalitarian or an hierarchal Methodological Question: Can we fit a network model to a network with missing data? Is the fit different from that of just the observed data? P (D Y, δ) = P (D Y obs, δ) (missing at random) Does observed status depend on unobserved characteristics?

Structure of Data Social Network Analysis [44] Up to 5 female friends and up to 5 male friends 89 students in school 70 completed friendship nominations portion of survey

Example: Friendships in a School Social Network Analysis [45] Fit an ERGM to the partially observed data, get coefficients like in logistic regression. Terms in the model: Density: Overall rate of ties Reciprocity: Do students tend to reciprocate nominations? Popularity by Grade: Do students in different grades receive different rates of ties? Popularity by Sex: Do boys and girls receive different rates of ties? Age:Sex Mixing: Rates of ties between older and younger boys and girls Propensity for ties within sex and grade to be transitive (hierarchal) Propensity for ties within sex and grade to be cyclical (egalitarian) Isolation: Propensity for students to receive no nominations

Percent of Possible Relations Realised Social Network Analysis [46] Observed Respondents to Respondents 8.2 Respondents to Non-Respondents 6.2 Non-Respondents to Respondents - Non-Respondents to Non-Respondents - 8.2% 6.2%

Social Network Analysis [47] Goodness of Fit: Percent of Possible Relations Realised Observed Fit Respondents to Respondents 8.2 7.6 Respondents to Non-Respondents 6.2 8.0 Non-Respondents to Respondents - 7.2 Non-Respondents to Non-Respondents - 9.3 8.2% 6.2% 7.6% 8.0% 7.2% 9.3% (e) Observed (f) Fit

Social Network Analysis [48] Goodness of Fit: Percent of Possible Relations Realised Observed Original Diff. Popularity Respondents to Respondents 8.2 7.6 8.1 Respondents to Non-Respondents 6.2 8.0 6.2 Non-Respondents to Respondents - 7.2 7.4 Non-Respondents to Non-Respondents - 9.3 7.1 8.2% 6.2% 7.6% 8.0% 8.1% 6.2% 7.2% 9.3% 7.4% 7.1% (g) Observed (h) Original (i) Differential Popularity

coefficient s.e. Density 1.138 0.19 Sex and Grade Factors Grade 8 Popularity 0.178 0.14 Grade 9 Popularity 0.420 0.16 Grade10 Popularity 0.339 0.16 Grade 11 Popularity 0.256 0.19 Grade 12 Popularity 0.243 0.20 Male Popularity 0.779 0.17 Non-Resp Popularity 0.322 0.10 Sex and Grade Mixing Girl to Same Grade Boy 0.308 0.23 Boy to Same Grade Girl 0.453 0.23 Girl to Older Girl 1.406 0.16 Girl to Younger Girl 1.873 0.21 Girl to Older Boy 1.412 0.14 Girl to Younger Boy 2.129 0.24 Boy to Older Boy 1.444 0.16 Boy to Younger Boy 2.788 0.35 Boy to Older Girl 1.017 0.14 Boy to Younger Girl 1.660 0.18 Mutuality 3.290 0.22 Transitivity Transitive Same Sex and Grade 0.844 0.04 Cyclical Same Sex and Grade 1.965 0.16 Isolation 5.331 0.64 Social Network Analysis [49]

coefficient s.e. Density 1.138 0.19 Sex and Grade Factors Grade 8 Popularity 0.178 0.14 Grade 9 Popularity 0.420 0.16 Grade10 Popularity 0.339 0.16 Grade 11 Popularity 0.256 0.19 Grade 12 Popularity 0.243 0.20 Male Popularity 0.779 0.17 Non-Resp Popularity 0.322 0.10 Sex and Grade Mixing Girl to Same Grade Boy 0.308 0.23 Boy to Same Grade Girl 0.453 0.23 Girl to Older Girl 1.406 0.16 Girl to Younger Girl 1.873 0.21 Girl to Older Boy 1.412 0.14 Girl to Younger Boy 2.129 0.24 Boy to Older Boy 1.444 0.16 Boy to Younger Boy 2.788 0.35 Boy to Older Girl 1.017 0.14 Boy to Younger Girl 1.660 0.18 Mutuality 3.290 0.22 Transitivity Transitive Same Sex and Grade 0.844 0.04 Cyclical Same Sex and Grade 1.965 0.16 Isolation 5.331 0.64 Social Network Analysis [50]

coefficient s.e. Density 1.138 0.19 Sex and Grade Factors Grade 8 Popularity 0.178 0.14 Grade 9 Popularity 0.420 0.16 Grade10 Popularity 0.339 0.16 Grade 11 Popularity 0.256 0.19 Grade 12 Popularity 0.243 0.20 Male Popularity 0.779 0.17 Non-Resp Popularity 0.322 0.10 Sex and Grade Mixing Girl to Same Grade Boy 0.308 0.23 Boy to Same Grade Girl 0.453 0.23 Girl to Older Girl 1.406 0.16 Girl to Younger Girl 1.873 0.21 Girl to Older Boy 1.412 0.14 Girl to Younger Boy 2.129 0.24 Boy to Older Boy 1.444 0.16 Boy to Younger Boy 2.788 0.35 Boy to Older Girl 1.017 0.14 Boy to Younger Girl 1.660 0.18 Mutuality 3.290 0.22 Transitivity Transitive Same Sex and Grade 0.844 0.04 Cyclical Same Sex and Grade 1.965 0.16 Isolation 5.331 0.64 Social Network Analysis [51]

coefficient s.e. Density 1.138 0.19 Sex and Grade Factors Grade 8 Popularity 0.178 0.14 Grade 9 Popularity 0.420 0.16 Grade10 Popularity 0.339 0.16 Grade 11 Popularity 0.256 0.19 Grade 12 Popularity 0.243 0.20 Male Popularity 0.779 0.17 Non-Resp Popularity 0.322 0.10 Sex and Grade Mixing Girl to Same Grade Boy 0.308 0.23 Boy to Same Grade Girl 0.453 0.23 Girl to Older Girl 1.406 0.16 Girl to Younger Girl 1.873 0.21 Girl to Older Boy 1.412 0.14 Girl to Younger Boy 2.129 0.24 Boy to Older Boy 1.444 0.16 Boy to Younger Boy 2.788 0.35 Boy to Older Girl 1.017 0.14 Boy to Younger Girl 1.660 0.18 Mutuality 3.290 0.22 Transitivity Transitive Same Sex and Grade 0.844 0.04 Cyclical Same Sex and Grade 1.965 0.16 Isolation 5.331 0.64 Social Network Analysis [52]

Conclusions, School Friendships Example Social Network Analysis [53] Nominations are reciprocated at a higher rate than random Males receive nominations from other males at a higher rate than females from females Nominations within grade are more likely than outside grade Nominations of older students are more likely than younger students Nominations within sex and grade are more consistent with a hierarchal rather than egalitarian structure More students receive no nominations than we would expect at random.

Frameworks for Statistical Analysis Social Network Analysis [54] Describe Describe Structure Mechanism Fully Observed Description Modeling Data (Statistical) Partially Observed Design-Based Likelihood Data Inference Inference

Injection Drug User Example Social Network Analysis [55] Simulations corresponding to U.S. Center for Disease Control study of Injection Drug Users in New York City Injection Drug Users (IDU) asked how many IDUs they know, and HIV tested. Given 2 coupons to pass to other IDUs they know, to invite them to join study. Sampling continues until sample size 500 (from simulated population 715).

Social Network Analysis [56]

Social Network Analysis [57]

Social Network Analysis [58]

Social Network Analysis [59]

Social Network Analysis [60]

Social Network Analysis [61]

Social Network Analysis [62]

Social Network Analysis [63]

Injection Drug User Example Social Network Analysis [64] Scientific Question: What proportion of Injection Drug Users in New York City are HIV positive? Methodological Question: Can we estimate population proportions from samples starting at a convenience sample of seeds? P (D Y, δ) = P (D Y obs, δ) (adaptive sampling) P (D Y, δ) = P (seeds Y, δ)p (D Y, δ, seeds) = P (seeds)p (D Y obs, δ, seeds) but typically P (seeds Y, δ) P (seeds Y obs, δ)

Structure of Analysis Social Network Analysis [65] Sample: Link-tracing sampling variant - Respondent-Driven Sampling Ask number of contacts - but not who. Can t identify alters. No matrix. Network used as sampling tool Existing Approach: Assume inclusion probability proportional to number of contacts (Volz and Heckathorn, 2008) Assume many waves of sampling remove bias of seed selection Our work: Design-based (describe structure, not mechanism) Fit simple network model to observed data (model-assisted) Correct for biases due to network-based sampling, and observable irregularities

Generalized Horvitz-Thompson Estimator Goal: Estimate proportion infected : N Social Network Analysis [66] µ = 1 N j=1 z i where z i = { 1 node i infected 0 node i uninfected. Generalized Horvitz-Thompson Estimator: i:s ˆµ = i =1 π 1 z i i i:s i =1 π 1 i where S i = { 1 node i sampled 0 node i not sampled π i = P (S i = 1). Key Point: Requires π i i : S i = 1

Simulate Population Simulation Study 1000, 835, 715, 625, 555, or 525 nodes 20% Infected Social Network Analysis [67] Simulate Social Network (from ERGM, using statnet) Mean degree 7 Homophily on Infection: R = Differential Activity: w = Simulate Respondent-Driven Sample P (infected to infected tie) P (uninfected to infected tie) mean degree infected mean degree uninfected 500 total samples 10 seeds, chosen proportional to degree 2 coupons each Coupons at random to relations Sample without replacement Repeat 1000 times! = 1 (or other) Blue parameters varied in study. = 5 (or other)

Social Network Analysis [68] Volz-Heckathorn, w=1 Estimated Proportion Infected 0.10 0.15 0.20 0.25 0.30 50% 60% 70% 80% 90% 95% Sample %:

Social Network Analysis [69] Varying Sample Percentage, w=1.4 Estimated Proportion Infected 0.10 0.15 0.20 0.25 0.30 50% 60% 70% 80% 90% 95% Sample %:

Successive Sampling Mapping Social Network Analysis [70] Nodal Inclusion Probability 0.0 0.5 1.0 1.5 VH Probs. 50% Sample 60% Sample 70% Sample 80% Sample 90% Sample 95% Sample 100% Sample 0 5 10 15 20 Nodal Degree

Social Network Analysis [71] Volz-Heckathorn, w=1.4 Estimated Proportion Infected 0.10 0.15 0.20 0.25 0.30 50% 60% 70% 80% 90% 95% Sample %:

Social Network Analysis [72] SS, w=1.4 Estimated Proportion Infected 0.10 0.15 0.20 0.25 0.30 50% 60% 70% 80% 90% 95% Sample %:

Social Network Analysis [73] All Infected Seeds, varying Homophily, 50% Expected Prevalence Estimate (Truth = 0.20) 0.10 0.15 0.20 0.25 0.30 Mean VH SS R 1 3 5

Social Network Analysis [74] All Infected Seeds, varying number of seeds, 50% Expected Prevalence Estimate (Truth = 0.20) 0.10 0.15 0.20 0.25 0.30 Max Wave 4 5 6 Seeds 20 10 6 Mean VH SS

Network-Model Estimator Social Network Analysis [75] Key Points: Goal: Estimate inclusion probabilities to use to weight sampled nodes If we knew the network, we could simulate sampling to estimate inclusion probabilities Approach: 1. Use (weighted) information in the data to estimate network structure 2. Use simulated sampling over estimated network to estimate inclusion probabilities 3. Iterate steps (1) and (2)

Simulation Study Social Network Analysis [76] Set-up: High sample fraction (500 of 715) Infected have more relations than uninfected (twice as many) Initial sample biased (all infected) Truth: 20% Infected Comparison of Estimators: Mean : Naive Sample Mean SH: Salganik-Heckathorn: based on MME of number of cross-relations VH: Existing Volz-Heckathorn Estimator (2008) SS: Another new estimator, not discussed here Net: New Network-Based Estimator

Social Network Analysis [77] All Infected Seeds, varying Homophily Expected Prevalence Estimate (Truth = 0.20) 0.10 0.15 0.20 0.25 0.30 Mean VH SS R 1 3 5

Social Network Analysis [78] All Infected Seeds, varying Homophily Expected Prevalence Estimate (Truth = 0.20) 0.10 0.15 0.20 0.25 0.30 Mean SH VH SS R 1 3 5

Social Network Analysis [79] All Infected Seeds, varying Homophily Expected Prevalence Estimate (Truth = 0.20) 0.10 0.15 0.20 0.25 0.30 Mean SH VH SS MA R 1 3 5

Social Network Analysis [80] All Infected Seeds, varying number of seeds (waves) Expected Prevalence Estimate (Truth = 0.20) 0.10 0.15 0.20 0.25 0.30 Max Wave 4 5 6 Seeds 20 10 6 Mean VH SS

Social Network Analysis [81] All Infected Seeds, varying number of seeds (waves) Expected Prevalence Estimate (Truth = 0.20) 0.10 0.15 0.20 0.25 0.30 Max Wave 4 5 6 Seeds 20 10 6 Mean SH VH SS

Social Network Analysis [82] All Infected Seeds, varying number of seeds (waves) Expected Prevalence Estimate (Truth = 0.20) 0.10 0.15 0.20 0.25 0.30 Max Wave 4 5 6 Seeds 20 10 6 Mean SH VH SS MA

Social Network Analysis [83] HIV Prevalence among IDU in an Eastern European City Estimated Prevalence 0.76 0.78 0.80 0.82 0.84 0.86 0.88 Random Correct Seeds & Seeds Seeds Offspring Mean SH VH SS MA (+/ 1se)

Wave 10 9 8 7 6 5 4 3 2 1 0 Total Social Network Analysis [84] Recruitment Rates Uninfected Recruiter Avg Infected Recruiter Avg 7 0 24 0 8 2 1 3 0.93 17 11 5 0.75 4 2 2 1.25 15 21 8 1.08 2 1 1 3 1.71 10 2 4 4 1.1 4 1 1 1 0.86 9 5 2 4 1.05 1 2 1 1 9 1 2 6 1.28 2 2 0.5 11 4 2 4 0.95 6 2 7 1.67 1 0 8 1 1 4 1.07 1 0 7 1 1 4 1.15 1 2 3 2.33 30 8 6 9 0.89 119 2019 49 0.99 Legend: Number of Recruits: 0 1 2 3

Social Network Analysis [85] HIV Prevalence among IDU in an Eastern European City Estimated Prevalence 0.76 0.78 0.80 0.82 0.84 0.86 0.88 Random Correct Seeds & Seeds Seeds Offspring Mean SH VH SS MA (+/ 1se)

Social Network Analysis [86] HIV Prevalence among IDU in an Eastern European City Estimated Prevalence 0.76 0.78 0.80 0.82 0.84 0.86 0.88 Random Correct Seeds & Seeds Seeds Offspring Mean SH VH SS MA (+/ 1se)

Methodological Social Network Analysis [87] Network-Model estimator corrects for differential activity by infection status, unlike sample mean. Network-Model estimator uses appropriate sample weights for simulated high sample fraction, unlike sample mean or Volz-Heckathorn estimator. Network-Model estimator corrects for seed bias, unlike any existing method.

Outline of Presentation Social Network Analysis [88] 1. What is a Social Network? 2. Why do we want to analyze a social network? 3. What can we know about a social network? 4. How do we analyze a social network? 5. Descriptive Analysis 6. Design-Based Inference 7. Model-based (likelihood) Inference 8. Examples 9. Discussion

Discussion Social Network Analysis [89] Examples School Friendships: Describe process generating relations. Missing data Injecting Drug User: Describe nodes in actual population. Network for sampling Network data have many types of complexity (nested nodes/edges, time, attributes, boundaries, sampling No single dominant approach Can only correct for what we can measure - data and scientific question related Room for future research: deeper, also broader.

References Social Network Analysis [90] Missing Data and Sampling Little, R. J.A. and D. B. Rubin, Second Edition (2002). Statistical Analysis with Missing Data, John Wiley and Sons, Hoboken, NJ. Thompson, S.K., and G.A. Seber (1996). Adaptive Sampling John Wiley and Sons, Inc. New York. Modeling Social Network Data with Exponential-Family Random Graph Models Handcock, M.S., D.R. Hunter, C.T. Butts, S.M. Goodreau, and M. Morris (2003) statnet: An R package for the Statistical Modeling of Social Networks. URL: http://www.csde.washington.edu/statnet. Holland, P.W., and S. Leinhardt (1981), An exponential family of probability distributions for directed graphs, Journal of the American Statistical Association, 76: 33-50. Snijders, T.A.B., P.E. Pattison, G.L. Robins, and M.S. Handcock (2006). New specifications for exponential random graph models. Sociological Methodology, 99-153. Inference with Partially-Observed Network Data Frank, O. (1971). The Statistical Analysis of Networks Chapman and Hall, London. Frank, O., and T.A.B. Snijders (1994). Estimating the size of hidden populations using snowball sampling. Journal of Official Statistics, 10: 53-67. Gile, K.J. (2008). Inference from Partially-Observed Network Data. PhD. Dissertation. University of Washington, Seattle. Gile, K. and M.S. Handcock (2006). Model-based Assessment of the Impact of Missing Data on Inference for Networks. Working paper, Center for Statistics and the Social Sciences, University of Washington. Handcock, M.S., and K. Gile (2007). Modeling social networks with sampled data. Technical Report, Department of Statistics, University of Washington. Thompson, S.K. and O. Frank (2000). Model-Based Estimation With Link-Tracing Sampling Designs. Survey Methodology, 26: 87-98. Other Harris, K. M., F. Florey, J. Tabor, P. S. Bearman, J. Jones, and R. J. Udry (2003). The National Longitudinal Study of Adolescent Health: Research design. Technical Report, Carolina Population Center, University of North Carolina at Chapel Hill. E-mail: gile@math.umass.edu Thank you for your attention!