New Specifications for Exponential Random Graph Models

Similar documents
PROBABILISTIC NETWORK ANALYSIS

Exponential Random Graph Models for Social Network Analysis. Danny Wyatt 590AI March 6, 2009

Equivalence Concepts for Social Networks

Social Networks 31 (2009) Contents lists available at ScienceDirect. Social Networks

Statistical Analysis of Complete Social Networks

ARTICLE IN PRESS Social Networks xxx (2013) xxx xxx

Statistical Models for Social Networks

Statistical Analysis of Social Networks

Accounting for Degree Distributions in Empirical Analysis of Network Dynamics

Imputation of missing network data: Some simple procedures

Social Media Mining. Network Measures

IEMS 441 Social Network Analysis Term Paper Multiple Testing Multi-theoretical, Multi-level Hypotheses

StOCNET: Software for the statistical analysis of social networks 1

A case study of social network analysis of the discussion area of a virtual learning platform

How To Understand And Solve A Linear Programming Problem

Reliability Reliability Models. Chapter 26 Page 1

Social network analysis with R sna package

Inferential Network Analysis with Exponential Random. Graph Models

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

SOFTWARE FOR STATISTICAL ANALYSIS OF SOCIAL NETWORKS

SGL: Stata graph library for network analysis

Virtually There: Exploring Proximity and Homophily in a Virtual World

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

HISTORICAL DEVELOPMENTS AND THEORETICAL APPROACHES IN SOCIOLOGY Vol. I - Social Network Analysis - Wouter de Nooy

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Practical Graph Mining with R. 5. Link Analysis

10.2 Series and Convergence

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

E3: PROBABILITY AND STATISTICS lecture notes

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

Introduction to Stochastic Actor-Based Models for Network Dynamics

Follow links for Class Use and other Permissions. For more information send to:

Chapter 3 RANDOM VARIATE GENERATION

Quantitative Methods for Finance

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

Fairfield Public Schools

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Common Core State Standards for Mathematics Accelerated 7th Grade

SOCIAL NETWORK ANALYSIS EVALUATING THE CUSTOMER S INFLUENCE FACTOR OVER BUSINESS EVENTS

IBM SPSS Modeler Social Network Analysis 15 User Guide

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

THE ROLE OF SOCIOGRAMS IN SOCIAL NETWORK ANALYSIS. Maryann Durland Ph.D. EERS Conference 2012 Monday April 20, 10:30-12:00

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

1 The Brownian bridge construction

Enterprise Organization and Communication Network

Week 3. Network Data; Introduction to Graph Theory and Sociometric Notation

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

Simple linear regression

Introduction to social network analysis

Handout #Ch7 San Skulrattanakulchai Gustavus Adolphus College Dec 6, Chapter 7: Digraphs

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

2. Filling Data Gaps, Data validation & Descriptive Statistics

Common Core Unit Summary Grades 6 to 8

Math/Stats 425 Introduction to Probability. 1. Uncertainty and the axioms of probability

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

Model-based Synthesis. Tony O Hagan

Vote Dilution: Measuring Voting Patterns by Race/Ethnicity. Presented by Dr. Lisa Handley Frontier International Electoral Consulting

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Social and Technological Network Analysis. Lecture 3: Centrality Measures. Dr. Cecilia Mascolo (some material from Lada Adamic s lectures)

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

with functions, expressions and equations which follow in units 3 and 4.

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Chapter G08 Nonparametric Statistics

Groups and Positions in Complete Networks

Special Situations in the Simplex Algorithm

NAG C Library Chapter Introduction. g08 Nonparametric Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Section 1.3 P 1 = 1 2. = P n = 1 P 3 = Continuing in this fashion, it should seem reasonable that, for any n = 1, 2, 3,..., =

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

COMMON CORE STATE STANDARDS FOR

Java Modules for Time Series Analysis

Creating, Solving, and Graphing Systems of Linear Equations and Linear Inequalities

Overview. Essential Questions. Precalculus, Quarter 4, Unit 4.5 Build Arithmetic and Geometric Sequences and Series

Reinforcement Learning

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

1 o Semestre 2007/2008

Lecture Notes Module 1

Chapter 1 Introduction. 1.1 Introduction

Chapter 4. Probability and Probability Distributions

Circuits 1 M H Miller

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.

Strong and Weak Ties

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

An overview of Software Applications for Social Network Analysis

Social Media Mining. Graph Essentials

! E6893 Big Data Analytics Lecture 10:! Linked Big Data Graph Computing (II)

5 Directed acyclic graphs

Graph Theory and Networks in Biology

The global R&D network:

Statistics 2014 Scoring Guidelines

MATHEMATICAL METHODS OF STATISTICS

Lab 11. Simulations. The Concept

USE OF GRAPH THEORY AND NETWORKS IN BIOLOGY

Pennsylvania System of School Assessment

A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION

3 Some Integer Functions

RISK MITIGATION IN FAST TRACKING PROJECTS

Transcription:

New Specifications for Exponential Random Graph Models Garry Robins University of Melbourne, Australia Workshop on exponential random graph models Sunbelt XXVI April 2006 http://www.sna.unimelb.edu.au/ What counts as a good model for a social network. Goodness of fit New specifications for exponential random graph (p*) models Degree sequences; higher order triangulation; multiple connectivity Nondirected and directed networks An empirical example Social selection models example: exogenous attributes or endogenous network effects? Concluding remarks 1

What counts as a good statistical model for an observed social network? 1. Models must be estimable from data. 2. Model parameters should imply model statistics consistent with those of the observed graph. 3. A good model will imply graphs with other features that are consistent with the observed graph. - path lengths (geodesic distribution) - clustering (triangle formation) - degree distribution - denser regions (cohesive subsets of nodes) 4. An excellent model will also successfully predict the presence or absence of network ties. DATA MODEL MODEL COEFFICIENTS SIMULATED DATA (draws from the prob. dist.) GRAPH STATISTICS OF DATA GRAPH STATISTICS OF SIMULATED DATA GOODNESS OF FIT OF MODEL TO DATA 2

Markov random graph models Notation: Network variables: Y ij = 0, 1 Y = [Y ij ], the set of all variables y = [y ij ], the observed graph. Sufficient statistics for a homogenous Markov graph model are the numbers of: Edges Stars of different types Triangles in y A Markov random graph model: Nondirected networks Pr(Y = y) = (1/κ) exp{θ L + σ 2 S 2 + σ 3 S 3 + τ T} Edge parameter (θ) L number of edges Star parameters (σ k ) Propensities for individuals to have connections with multiple network partners Triangle parameter (τ) - represents clustering If θ is the only nonzero parameter, this is a Bernoulli random graph model. 3

Example: Florentine families business network (Padgett & Ansell, 1993) Markov model estimates: Florentine families business network Model containing edges, 2-stars, 3-stars, triangles Monte Carlo Max. Likelihood estimates (pnet) Edge = - 4.27 (1.13)* 2-star = 1.09 (0.65) 3-star = -0.67 (0.41) Triangle= 1.32 (0.65)* 4

Goodness of fit: Florentine Business network Comparing observed network to a sample from simulation of Markov model: t-statistics Model statistics Edges: t = -0.01 2-stars: t = -0.01 3-stars: t = 0.01 triangles: t = -0.03 Degree distribution std dev degree dist: t = 0.51 skew deg dist: t = 0.17 Clustering global clustering: t = 0.12 local clustering: t = 0.68 Geodesic distribution: None of the quartiles of the geodesic distribution for the observed graph are extreme in the distribution Observed graph No of edges in simulated graphs No of edges in simulated graphs against simulation number (id) 5

Degree distributions of simulated graphs Observed degree distribution Observed graph Standard deviation of degree distribution of simulated graphs But the same model is degenerate for the combined business and marriage networks: there are no coherent maximum likelihood parameter estimates simulating from pseudo-likelihood estimates produces complete (or almost complete) graphs 6

Papers on new specifications For advance copies, go to: http://www.sna.unimelb.edu.au/ Original paper: Snijders, Pattison, Robins, & Handock (2006). New specifications for exponential random graph models. Sociological Methodology. In press. Forthcoming in Social Networks: Goodreau (2006). Advances in Exponential Random Graph (p*) Models Applied to a Large Social Network. Hunter (2006). Curved exponential family models for social networks. Robins, Pattison, Kalish, & Lusher (2006). An introduction to exponential random graph (p*) models for social networks. Robins, Snijders, Wang, Handcock, & Pattison (2006). Recent developments in exponential random graph (p*) models for social networks. Also see Hunter & Handcock (2006). Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics. In press. New specifications for exponential random graph models Estimation, simulation and goodness of fit software: Statnet: http://csde.washington.edu/statnet More tomorrow morning SIENA: http://stat.gamma.rug.nl/siena.html pnet: http://www.sna.unimelb.edu.au/pnet/pnet.html 7

New specifications for exponential random graph models 1. Parameters for degree sequences: a. Alternating k-stars b. Geometrically weighted degree distributions 2. Parameters for higher order triangulation a. Alternating k-triangles b. Edge-wise shared partner distributions 3. Parameters for higher order connectivity a. Alternating 2paths b. Dyad-wise shared partner distributions Related model parameters Star configurations Parameters σ 2 σ 3 σ 4 Markov models with star effects only (ignore triangles) Pr(Y = y) = (1/κ) exp{θ L + σ 2 S 2 + σ 3 S 3 + σ 4 S 3 + } We can combine all the star effects into the one parameter by setting constraints among the star parameters: σ 2, σ 3, σ 4. We have made an hypothesis about constraints based on experience with model parameters (and also on mathematical convenience.) 8

Related model parameters Star configurations Parameters σ 2 σ 3 σ 4 Assume that σ k = -σ k-1 /λ, for k >1 and λ 1 a (fixed) constant alternating k-star hypothesis For this presentation I assume that λ = 2 Then we obtain a single star parameter (σ 2 ) with statistic: S [λ] (y) = k (-1) k S k (y)/λ k-2 alternating k-star statistic Note that if λ =1 and the edge parameter is included, the no. of isolated nodes is modeled separately 1. Parameters for degree sequences: a. alternating k-star parameters S S S z( y) = S +... + ( 1) n n n λ λ λ 3 4 2 1 2 2 3 S 2 S 3 Interpretation: Positive parameter indicates centralization through a small number of high degree nodes core-periphery based on popularity More dispersed degree distribution S 4 Negative parameter: truncated (less dispersed) degree distribution; nodes tend not to have particularly high degrees. 9

Simulated degree distributions: 20 nodes: fixed density = 0.2 Alt.kstar=-3 Alt.kstar=0 Alt.kstar=+3 Higher order models: increasingly dispersed degree distribution 20 nodes: Fix the density at 0.20 all parameters = 0 EXCEPT Vary alternating k-star parameter from 3.0 to +3.0 in steps of +0.5 The movie shows one representative graph from each simulated distribution 10

1. Parameters for degree sequences: b. Geometrically weighted degree distributions An equivalent characterisation: Consider statistics d k (y), where d k (y) is the number of nodes in y of degree k (with corresponding parameters θ k ) Assuming that θ k = e -αk γ for k = 1,2,, n-1 yields the statistic: D [α] (y)= k e -αk d k (y) weighted degree distribution Relationship with alternating k-star statistic: S [λ] (y) = λ 2 D [α] (y) + 2λL(y) n λ 2 See Hunter (2006) λ = e α /(e α -1) 11

2. Parameters for higher order triangulation Cohesive subsets of nodes Require realization dependent neighbourhoods: Alternating k-triangles: 3-triangle 1-triangle 2-triangle (T 1 ) (T 23 ) 2. Parameters for higher order triangulation Alternating k-triangles: T T T u( y) = T +... + ( 1) n n n λ λ λ 2 3 2 2 1 2 3 Interpretation: a. Positive parameter suggests triangles tend to clump together in denser regions of the network (cohesive subsets). b. Models the edgewise shared partner distribution: For each pair of tied nodes, how many partners do they share? (Hunter, 2006) etc 12

Interpretation: With only positive k- triangle effect, there is a coreperiphery type structure based on triangulation Higher order model Positive k-triangle parameter: Parameters: Edge = -4.5 Alt. k-triangle=1.3 65 nodes Interpretation: With positive k- triangle effect, and negative k- star, various regions of greater density distributed across the network. Higher order model Positive k-triangle parameter & negative k-star parameter: Parameters: Edge = -0.5, alt. k-star = -1.5, alt. k-triangle = 2.0 65 nodes 13

Higher order models: increasing triangulation 30 nodes: Fix the density at 0.10 all parameters = 0 EXCEPT Vary alternating k-triangle parameter from 0.0 to +3.0 in steps of +0.25 The movie shows one representative graph from each simulated distribution 14

Higher order models: From centralization to segmentation 60 nodes: Fix the density at 0.05 (i.e. 88 or 89 edges) alternating k-triangle parameter = +2.0 Vary alternating k-star parameter from 0.0 to 1.0 in steps of 0.1 The movie shows one representative graph from each simulated distribution 15

3. Parameters for multiple connectivity Alternating independent 2-paths: 2-indept.2-3-indept.2-2-path (U 1 ) paths (U 23 ) n 2 k 1 2U 2 1 = 1 + λ k = 3 λ z( x) U U k Models the dyadwise shared partner distribution: For each pair of nodes, how many partners do they share? (Hunter, 2006) Why might we want Alternating independent 2-paths? Indpt. 2-paths are lower order to k-triangles. Helps assess whether a clustering (ktriangle) effect relates to the formation of the base of the k-triangle 16

Directed networks: Alternating k-star parameters Snijders, Pattison, Robins & Handcock (2006) Alternating k-in-stars: Statistic: Weighted summation of k-in-stars analogous to the non-directed model Alternating k-out-stars: Statistic: Weighted summation of k-out-stars analogous to the non-directed model Directed network parameters Snijders, Pattison, Robins & Handcock (2006) Alternating directed k-triangles: Statistic: weighted sum of directed k-triangles with 2-paths as sides of the k-triangle 17

Directed network parameters Snijders, Pattison, Robins & Handcock (2006) Alternating directed k-2paths: Statistic: weighted sum of k-2paths NB: Robins, Pattison & Wang (2006) propose additional variations for directed network models Classes of exponential random graph models (Single binary networks without node attributes) Dyadic independence Markov random graphs Higher order dependence Non-directed networks Edges (Bernoulli graphs; simple random graphs) Edges, stars, triangles The above plus: Alternating k- stars, k-triangles, alt. 2-paths Directed networks Edges; Mutuality (Also p1 models) Edges, Mutuality, in-,out-,mixedstars,transitive and cyclic triads The above plus: Alternating k- stars, k-triangles, alt. 2-paths 18

Fitting Models: 20 network data sets from UCINET5 (Borgatti, Everett & Freeman, 1999) Non directed networks: Kapferer mine: kapfmm, kapfmu (16 nodes) Kapferer tailor shop: kapfts1, kapfts2 (39 nodes) Padgett Florentine families: padgb, padgm (16 nodes) Read Highland tribes: gamapos (16 nodes) Zachary karate club: Zache (34 nodes) Bank wiring room: rdpos, rdgam (14 nodes) Taro exchange: Taro (22 nodes) Thurman office: Thurm (15 nodes) Fitting Models: 20 network data sets from UCINET5 (Borgatti, Everett & Freeman, 1999) Directed networks: Kapferer tailor shop: kapfti1, kapfti2 (39 nodes) Wolf primates: wolfk (20 nodes) Krackhardt hi-tech managers: friend, advice (21 nodes) Bank wiring room: rdhlp (14 nodes) Knoke bureaucracies: knokm, knoki (10 nodes) 19

Non directed networks Data set Kapfmm Kapfmu Kapfts1 Kapfts2 Padgm Padgb Gamapos Zache Rdpos Rdgam Taro Thurm Markov TOTAL 7/12 Directed networks Data set Kapfti1 Kapfti2 Wolfk Krackhardt friend Krackhardt advice Rdhlp Knokm Knoki Markov TOTAL (directed) TOTAL (nondir.) 5/8 7/12 TOTAL 12/20 20

Non directed networks Data set Kapfmm Kapfmu Kapfts1 Kapfts2 Padgm Padgb Gamapos Zache Rdpos Rdgam Taro Thurm Markov Higher order TOTAL 7/12 12/12 Directed networks Data set Kapfti1 Kapfti2 Wolfk Krackhardt friend Krackhardt advice Rdhlp Knokm Knoki Markov Higher order TOTAL (directed) TOTAL (nondir.) 5/8 7/12 TOTAL 12/20 20/20 21

Example: After hours socialising network After hours network Parameter Estimate Standard error Convergence statistic Arc 1.27 0.63 0.06 * Reciprocity 2.42 0.34 0.06 * k instar (2) 0.86 0.32 0.06 * k outstar (2) 0.96 0.33 0.06 * Alt. k-triangles (2) 1.09 0.14 0.06 * 22

Goodness of fit: After hours network Model statistics Arcs: t = 0.03 Reciprocity: t = 0.03 k instar (2) : t = 0.03 k outstar (2) : t = 0.02 Alt. k-triangles (2): t=0.03 Other Markov statistics 2-in-stars: t = - 0.14 3-in-stars : t = - 0.37 2-out-stars: t = - 0.29 3-out-stars : t = -0.53 2-paths: t= - 0.34 Transitive triads: t= - 0.07 Cyclic triads: t= - 0.37 Other higher order statistics Alt. 2-paths (2): t= - 0.37 Goodness of fit: After hours network Degree distributions standard deviations indegrees: t = - 0.10 outdegrees : t = - 0.56 skew indegrees: t = - 1.05 outdegrees : t = - 1.96 Clustering coefficients Proportion of 2-stars in transitive triads: 2-in-stars: t= 0.40 2-out-stars: t= 1.18 2-paths: t= 1.57 Proportion of 2-paths in cyclic triads: t=-0.13 Geodesic distribution: None of the quartiles of the geodesic distribution for the observed graph are extreme in the distribution 23

Goodness of fit: After hours network Triad census 300: t = -0.5036 210: t = -0.1364 120C: t = -0.3707 120D: t = 1.4142 120U: t = 1.0437 201: t = -0.4520 111D: t = -0.1218 111U: t = -0.5027 030T: t = 2.2584 030C: t = -0.3287 102: t = 0.4598 021D: t = -0.4971 021C: t = -0.9884 021U: t = 0.1030 012: t = 0.3200 003: t = -0.1781 Observed graph 24

500 300 400 250 200 Frequency 300 arc 150 200 100 100 50 S 0 0 0 50 100 150 arc 200 250 300 0 500000 1000000 id 1500000 2000000 What we DON T want to see: the model exhibits two regions! This would indicate a bad model. indegree distribution outdegree distribution 25

Models with nodal attributes: Social selection models There are various ways to introduce actor attributes (binary or continuous) Robins, Elliott & Pattison (2001) e.g. dyad level effects Arc parameters Reciprocation parameters sender effects receiver effects interaction of sender/receiver effects (matching/absolute difference) AFL Concluding remarks The new specifications are a dramatic improvement over the previous models. For many data sets we now have models that are quite convincing in reproducing major features of the network. Larger networks are more difficult to fit; Directed networks are more difficult to fit. This is NOT to say all problems are solved Degeneracy may still be a problem for these models applied to certain data sets (indicates the model specification is not right for that data.) So ongoing work will be required on model specification, BUT Partial conditional dependence models give us a substantial way forward. Combining related parameters into the one function through weighted constraints is parsimonious and helps with model convergence MCMCMLE methods of parameter estimation, and model simulation techniques are a crucial part of these recent developments 26