1 The Social Statistics Discipline Area, School of Social Sciences
Mitchell Centre for Network Analysis
Session 5x: Bonus material
Johan Koskinen
Workshop: Advanced Meths SNA, Manchester, Mon-Fri, 7-11 July 2014

2 Session 5x: Bonus material
- Bayesian analysis
- Missing data
- Snowball sampled data
- Fitting ERGM to LARGE data sets
- Spatially embedded networks
- Multilevel ERGM
- Longitudinal ERGM

3 Bayesian analysis in MPNet

4 Bayesian inference (in MPNet): Fishermen
Bayesian estimation: go back into Select parameters, start afresh by clearing all, and then select edge, ASA and ATA.
BE PATIENT. Bayesian estimation can be slower (we are working on automation).

5 Quick MCMC settings for Bayesian
- We need a slightly larger multiplication factor than for non-Bayesian estimation.
- Maximum lag should be chosen to be roughly the lag where the SACF is close to zero (in order for the ESS to be correct), roughly 2-4.
- If the model is good we can use Pre-tuning only to get good initial values. The objective is to get a high acceptance rate, around .85. Run a number of small MCMC sample sizes and press Update.
- When pre-tuning is not too bad, check Nonconditional simulation and press Update (the latter to start in a better place and get the proposal covariance). The objective is to get an acceptance rate around .25, a small SACF around lag 2, and a large ESS.
- If the acceptance rate is too small (say smaller than .5), reduce Proposal scaling (e.g. divide by 2); if it is too large (say greater than .45), increase Proposal scaling (e.g. multiply by 2).
- Once the SACF at large lags (say or 2) is low (say, around .) you can improve the ESS by making the MCMC sample size bigger.
- If you have a good run and want the perfect run, read in the Covariance file.
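The halve-or-double rule for Proposal scaling can be written down as a small helper. This is only a sketch of the rule of thumb on the slide, not anything MPNet exposes: the function name, its arguments and the thresholds in the example call are hypothetical and chosen for illustration.

```python
def adjust_proposal_scaling(scaling, acceptance_rate, low, high, factor=2.0):
    """Return an updated proposal scaling given the observed acceptance rate.

    low/high are the acceptance-rate thresholds (assumptions supplied by the
    caller); factor is the amount to divide or multiply by.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must lie in [0, 1]")
    if low >= high:
        raise ValueError("low must be smaller than high")
    if acceptance_rate < low:
        # Too many proposals rejected: take shorter steps.
        return scaling / factor
    if acceptance_rate > high:
        # Nearly everything accepted: steps are too small, take longer ones.
        return scaling * factor
    # Acceptance rate is in the target band: leave the scaling alone.
    return scaling


# Illustrative thresholds only (assumed, not taken from MPNet).
print(adjust_proposal_scaling(0.5, 0.93, low=0.15, high=0.45))
```

The thresholds are left as required arguments so that the target band is an explicit choice rather than a hidden default.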

6 Bayesian analysis in MPNet
Settings: multiplication factor 8; Scale .5; MCMC sample size; Max lag 2; Scaled identity.
After the run, note that the Inverse D matrix is diagonal.
[Figure: trace plots of ts(output[, k + ]) against Time for EdgeA, ASA and ATA]
This run just got us close to where we want to be.
Acceptance rate: .42
Estimation results (Effects, Lambda, PostMean, Stddev): EdgeA *, ASA, ATA *
SACF and ESS(2) are reported for EdgeA, ASA and ATA.

7 Bayesian analysis in MPNet
Settings: multiplication factor 8; Scale .5; MCMC sample size; Max lag 2; Scaled identity.
After the run, note that the Inverse D matrix is diagonal.
Press Update. MCMC sample size 55; Parameter burnin 5; Proposal scaling; Nonconditional simulation.
This run just got us close to where we want to be. We want to draw values roughly here BUT more efficiently (by setting a better Proposal variance than the diagonal).

8 Bayesian analysis in MPNet
[Figure: trace plots of ts(output[, k + ]) against Time, and SACF plots, for EdgeA, ASA and ATA; ESS values shown: 4, 2 and 4]
-> It is moving around really well BUT it takes too small steps (acceptance rate: .93).
Re-run with longer moves: set Proposal scaling .5 and rerun.
Estimation results (Effects, Lambda, PostMean, Stddev): EdgeA *, ASA, ATA *
SACF and ESS(2) are reported for EdgeA, ASA and ATA.

9 Bayesian analysis in MPNet
[Figure: trace plots of ts(output[, k + ]) against Time, and SACF plots, for EdgeA, ASA and ATA; ESS values shown: 229, 22 and 83]
-> It is moving around really well AND it takes nice LONG strides (acceptance rate: .34).
Estimation results (Effects, Lambda, PostMean, Stddev): EdgeA *, ASA, ATA *
SACF and ESS(2) are reported for EdgeA, ASA and ATA.
Effective sample size: we want these to be larger than 5, else the Stddevs are misleading. Here, as the acceptance rate is GOOD, rerun the estimation with a larger MCMC sample size.
Here the SACF is almost zero already at lag 3!
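To make the link between SACF and ESS concrete, here is a minimal sketch that computes the sample autocorrelation of a chain and a truncated effective sample size. It uses the common formula n / (1 + 2 * sum of autocorrelations up to the maximum lag); whether MPNet uses exactly this formula is an assumption, and the function names are hypothetical.

```python
def sacf(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(chain) - 1")
    mean = sum(chain) / n
    dev = [x - mean for x in chain]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("chain is constant; autocorrelation is undefined")
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


def effective_sample_size(chain, max_lag):
    """ESS truncated at max_lag: n / (1 + 2 * sum_{k=1..max_lag} SACF(k))."""
    rho_sum = sum(sacf(chain, k) for k in range(1, max_lag + 1))
    return len(chain) / (1.0 + 2.0 * rho_sum)


# A toy chain, purely for illustration.
toy_chain = [1, 2, 3, 4, 5]
print(sacf(toy_chain, 1), effective_sample_size(toy_chain, 1))
```

This shows why the maximum lag matters: autocorrelations beyond the chosen lag are ignored, so truncating too early overstates the ESS.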

10 Bayesian analysis in MPNet
The output file, [session name]_posterior_bayesian.txt, contains the Bayesian posteriors.
[Figure: posterior plots for EdgeA, ASA and ATA; vertical axis: Frequency]

11 Missing data in MPNet

12 Session 4: More complex models
- The datafile miss2.txt is an 85 x 85 randomly simulated matrix with a density of .2. This will be a matrix equivalent to 2% missing data at random.
- The datafile fish_miss2.txt is the fishermen data except that all cells that are 1 in miss2.txt are set to zero. In other words, fish_miss2.txt can be regarded as the fishermen's network with 2% missing data (both 0s and 1s).
- Note that to use the missing data estimation in MPNet, you need to have an indicator matrix with 1 entered into every missing cell, and all missing cells in the original data have to be entered as 0s.
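The construction of a data file like fish_miss2.txt from a network and an indicator matrix can be sketched as follows. This assumes the indicator matrix uses 1 for a missing cell and 0 otherwise; the function name and the toy matrices are hypothetical, and reading or writing the actual MPNet files is left out.

```python
def mask_missing(adjacency, missing):
    """Set every cell flagged as missing (indicator == 1) to 0.

    Both arguments are square lists of lists of 0/1 with the same size.
    Returns a new matrix; the inputs are not modified.
    """
    n = len(adjacency)
    if len(missing) != n:
        raise ValueError("matrices must have the same number of rows")
    for row in list(adjacency) + list(missing):
        if len(row) != n:
            raise ValueError("matrices must be square and of equal size")
    return [
        [0 if missing[i][j] == 1 else adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy 3 x 3 example: the tie between actors 0 and 1 is flagged as missing.
network = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
indicator = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(mask_missing(network, indicator))
```

Note that after masking, a 0 in the data file can mean either an observed absent tie or a missing cell; only the indicator matrix tells the two apart.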

13 Session 4: More complex models
In MPNet, under the Bayesian estimation tab:
- Enter the fish_miss2 file to be estimated.
- Enter miss2 as the missing indicators file.
- Select parameters and clear any previous parameter values (i.e. start afresh).
- Conduct Bayesian estimation for an edge, AS and AT model. Use 3 as the MCMC sample size.
Our results (Effects, Lambda, PostMean, Stddev): EdgeA *, ASA, ATA *

14 Session 4: More complex models
- Make sure there is no ME: are all zeros really zeros?
- In principle valid for sampled data (admissible).
- MNAR is impossible to check (but robustness can be assessed): are missing data different from observed data?
- If attributes are missing we can use a similar technique of data augmentation (not in PNet yet).

15 Unobserved data: snowball sampling

21 Unobserved data: snowball sampling
[Figure labels: missing data, observed data]

22 Sampling in/on networks

28 Ignoring non-sampled

29 What about alter-alter ties across egos?

30 Unobserved data: snowball sampling
Making some (brave) assumptions (Handcock & Gile 2010) we can fit an ERGM (Wang et al. 2013) to snowball sampled networks:
- Importance sampling MCMCMLE (Handcock & Gile 2010)
- Stochastic approximation and the missing data principle (Orchard & Woodbury, 1972) (Koskinen & Snijders, 2013)
- Bayesian data augmentation (Koskinen, Robins & Pattison, 2010, 2013) (MPNet)
- Conditional MLE (Pattison, Robins, Snijders & Wang, 2013) (SnowPNet)

31 Unobserved data: snowball sampling
Bayesian data augmentation (Koskinen, Robins & Pattison, 2010, 2013) (MPNet):
- Need to know N
- Need to simulate unobserved ties
- Time-consuming
Conditional MLE (Pattison, Robins, Snijders & Wang, 2013) (SnowPNet):
- No need to know N
- No need to simulate unobserved data
- Properties of conditional MLE unclear
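For reference, the kind of snowball design these estimators work with can be sketched as a wave-by-wave expansion from a set of seeds. This is a generic illustration, not the sampling routine of MPNet or SnowPNet; the function name, the adjacency-list representation and the toy network are all hypothetical.

```python
def snowball_sample(neighbours, seeds, waves):
    """Return the zones of a snowball sample as a list of sorted node lists.

    neighbours maps each node to an iterable of its network neighbours.
    Zone 0 holds the seeds; zone k holds nodes first reached in wave k.
    """
    sampled = set(seeds)
    frontier = set(seeds)
    zones = [sorted(frontier)]
    for _ in range(waves):
        next_frontier = set()
        for node in frontier:
            for alter in neighbours.get(node, ()):
                if alter not in sampled:
                    next_frontier.add(alter)
        sampled |= next_frontier
        zones.append(sorted(next_frontier))
        frontier = next_frontier
    return zones


# Toy network: a path 3 - 1 - 2 - 4 - 5, sampled from seed 1 with two waves.
toy = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
print(snowball_sample(toy, seeds=[1], waves=2))
```

In the toy run node 5 is never reached, so it and its ties play the role of the unobserved part of the network.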

32 Estimating ERGM for LARGE networks

33 Stivala et al. (2014)
- Take many small snowball samples from your LARGE N network
- Estimate Conditional MLE for each (Pattison, Robins, Snijders & Wang, 2013)
- Pool estimates using meta-analysis techniques
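The pooling step can be illustrated with the simplest meta-analysis estimator, a fixed-effect inverse-variance weighted mean. This is a sketch under that assumption and may differ from the pooling actually used by Stivala et al.; the function name and the numbers in the example are hypothetical.

```python
import math


def pool_estimates(estimates, std_errors):
    """Inverse-variance weighted mean of per-sample estimates.

    Returns (pooled_estimate, pooled_standard_error).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need equally many estimates and standard errors")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / (se * se) for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Made-up estimates of one parameter from three snowball samples.
print(pool_estimates([-4.1, -3.8, -4.4], [0.5, 0.4, 0.6]))
```

Samples with smaller standard errors get more weight, so a few precise conditional estimates dominate many noisy ones.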

37 Spatial embedding

38 Spatial embedding (Book Ch. 8)
36 actors in Victoria, Australia

39 Spatial embedding (Book Ch. 8)
36 actors in Victoria, Australia, spatially embedded... all living within 4 kilometres of each other

41 Spatial embedding (Book Ch. 8)
36 actors in Victoria, Australia... all living within 4 kilometres of each other
[Figure: empirical tie probability; Bernoulli conditional on distance]

42 Spatial embedding (Book Ch. 8)
Spatial interaction function: tie probability as a function of distance.
E.g. the attenuated power-law:
$$\Pr(X_{ij} = 1 \mid d_{ij}) = \frac{p}{1 + \alpha d_{ij}^{\gamma}}$$

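43 Spatial embedding (Book Ch. 8)
Spatial interaction function: tie probability as a function of distance.
The attenuated power-law
$$\Pr(X_{ij} = 1 \mid d_{ij}) = \frac{p}{1 + \alpha d_{ij}^{\gamma}}$$
is equivalent to
$$\Pr(X = x \mid D = (d_{ij})) = \frac{\exp\{\theta_1 \sum_{i<j} x_{ij} + \theta_2 \sum_{i<j} x_{ij} \log(d_{ij})\}}{\sum_{u \in \mathcal{X}} \exp\{\theta_1 \sum_{i<j} u_{ij} + \theta_2 \sum_{i<j} u_{ij} \log(d_{ij})\}}$$
with $p = 1$, $\alpha = e^{-\theta_1}$ and $\gamma = -\theta_2$. The two statistics are $\sum_{i<j} x_{ij}$ and $\sum_{i<j} x_{ij} \log(d_{ij})$.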
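The equivalence can be checked numerically. Under the dyad-independent ERGM with an edge term and a log-distance term, each tie has probability logistic(theta1 + theta2 * log d), which equals 1 / (1 + alpha * d^gamma) when alpha = exp(-theta1) and gamma = -theta2. The sketch below assumes p = 1; the function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def attenuated_power_law(d, p, alpha, gamma):
    """Tie probability p / (1 + alpha * d**gamma) at distance d > 0."""
    return p / (1.0 + alpha * d ** gamma)


def logistic_log_distance(d, theta1, theta2):
    """Tie probability under an edge + log-distance dyad-independent ERGM."""
    eta = theta1 + theta2 * math.log(d)
    return 1.0 / (1.0 + math.exp(-eta))


# Illustrative parameter values (assumed, not estimates from the slides).
theta1, theta2 = -1.0, -0.5
alpha, gamma = math.exp(-theta1), -theta2

for d in (0.5, 1.0, 2.0, 10.0):
    lhs = attenuated_power_law(d, 1.0, alpha, gamma)
    rhs = logistic_log_distance(d, theta1, theta2)
    print(d, lhs, rhs)
```

A negative log-distance parameter theta2 corresponds to a positive gamma, i.e. a tie probability that decays with distance.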

47 Spatial embedding (Book Ch. 8)
ERGM: distance and endogenous dependence explain different things

Effect            Model 1       Model 2       Model 3        Model 4
Edges             -4.87* (.3)   .56* (.65)    -4.79* (.66)   -.2 (.87)
Alt. star                                     -.86* (.8)     -.86* (.2)
Alt. triangle                                 2.74* (.5)     2.69* (.4)
Log distance                    -.78* (.8)                   -.56* (.7)
Age heterophily   -.7* (.)      -.7* (.)      . (.7)         .2 (.6)
Gender homophily  -.3* (.6)     -.3 (.69)     .9 (.83)       .7 (.47)

48 Bipartite and Multilevel ERGM

49 Multilevel
[Figure: two-level network. Level B: the B-network, with ties B_rs. Between levels: the X-network, with ties X_ir. Level A: the A-network, with ties A_ij.]

50 Multilevel
Network statistics can be derived based on the same dependence assumptions. The interpretation is different, as we assume dependencies between tie-variables of different types.
Three network variables A, B and X:
$$\Pr(A = a, X = x, B = b) = \frac{1}{\kappa(\theta)} \exp\Big\{ \sum_{Q} \big[ \theta_Q z_Q(a) + \theta_Q z_Q(b) + \theta_Q z_Q(x) + \theta_Q z_Q(a, x) + \theta_Q z_Q(b, x) + \theta_Q z_Q(a, b, x) \big] \Big\}$$
- Within level effects: $z_Q(a)$, $z_Q(b)$
- Between level effects: $z_Q(x)$
- Interaction between within level and between level networks: $z_Q(a, x)$, $z_Q(b, x)$
- Cross level effects: $z_Q(a, b, x)$

51 Multilevel
- Bernoulli
- Markov
- Affiliation based activity
- Affiliation based closure or homophily
- Social circuit and three-path
- Affiliation assortativity
- Cross-level assortativity/entrainment

52 Multilevel

53 Multilevel: example, global fisheries governance (Hollway & Koskinen, 2014)

54 Multilevel: example, global fisheries governance (Hollway & Koskinen, 2014)
[Figure: the A-network, the X-network and the B-network, with nodes labelled by country code]

55 Multilevel: example, global fisheries governance (Hollway & Koskinen, 2014)
[Figure: the B-network]

56 global fish. (Hollway & Koskinen, 2014)
Estimation results (columns: Effects, Parameter, Stderr, t-ratio, SACF), with * as marked on the slide:
EdgeA, ASA *, ATA *, GDP_SumA, GDP_ProductA, species_suma *, distance_edgea *, XEdge *, IsolatesA *, XASA *, XASB *, XACA *, XACB, loggdpstatetreat_xedge *, Star2AX *, Star2BX *, TriangleXBX *, L3XBX *, ATXBX *, L3AXB

57 Multilevel: example, global fisheries governance (Hollway & Koskinen, 2014)
EdgeA Star2A Star3A Star4A Star5A TriangleA ASA ASA ATA A2PA AETA coast_suma coast_differencea coast_producta GDP_SumA GDP_DifferenceA GDP_ProductA species_suma species_differencea species_producta distance_edgea XEdge XStar2A XStar2B XStar3A XStar3B X3Path X4Cycle XECA XECB

58 Multilevel: example, global fisheries governance (Hollway & Koskinen, 2014)
IsolatesA IsolatesB XASA XASB XACA XACB XAECA XAECB loggdpstatetreat_xedge Star2AX StarAAX StarAXA StarAXAA TriangleXAX L3XAX ATXAX EXTA Star2BX StarABX StarAXB StarAXAB TriangleXBX L3XBX ATXBX EXTB L3AXB C4AXB ASAXASB

59 Longitudinal ERGM

60 LERGM FDI electricity market (Koskinen and Lomi, 2013)

