The Social Statistics Discipline Area, School of Social Sciences
Session 5x: Bonus material
Mitchell Centre for Network Analysis
Johan Koskinen
http://www.ccsr.ac.uk/staff/jk.htm
johan.koskinen@manchester.ac.uk
Workshop: Mon-Fri, 7- July 24
Advanced Meths SNA, Manchester
Session 5x: Bonus material
- Bayesian analysis
- Missing data
- Snowball sampled data
- Fitting ERGM to LARGE data sets
- Spatially embedded networks
- Multilevel ERGM
- Longitudinal ERGM
Bayesian analysis in MPNet
Bayesian inference (in MPNet): Fishermen
Bayesian estimation
- Go back into Select parameters, start afresh by clearing all, and then select edge, ASA and ATA.
- BE PATIENT. Bayesian estimation can be slower (we are working on automation).
Quick MCMC settings for Bayesian
- We need a slightly larger multiplication factor than for non-Bayesian estimation.
- Maximum lag should be chosen to be roughly the lag where the SACF is (in order for the ESS to be correct), roughly 2-4.
- If the model is good we can use Pre-tuning only, to get good initial values. The objective is to get a high acceptance rate, around .85. Run a number of small MCMC sample sizes and press Update.
- When pre-tuning is not too bad, check Nonconditional simulation and press Update (the latter to start in a better place and get the proposal covariance). The objective is to get an acceptance rate around .25, with the SACF around lag 2 small and the ESS large.
- If the acceptance rate is too small (say smaller than .5), reduce Proposal scaling (e.g. divide by 2); if it is too large (say greater than .45), increase Proposal scaling (e.g. multiply by 2).
- Once the SACF at large lags (say or 2) is low (say, around .), you can improve the ESS by making the MCMC sample size bigger.
- If you have a good run and want the perfect run, read in the Covariance file.
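The proposal-scaling rule in the settings above can be sketched in a few lines. This is a minimal illustration, not part of MPNet: the function name, its arguments and the example thresholds are hypothetical, and the lower and upper acceptance bounds are left as arguments because the values to use are a matter of judgement.

```python
def adjust_proposal_scaling(scaling, acceptance_rate, lower, upper, factor=2.0):
    """Return an updated proposal scaling given the last run's acceptance rate.

    Hypothetical helper: MPNet itself is driven through its interface, and
    this function only mirrors the halve/double rule of thumb.
    """
    if acceptance_rate < lower:
        # Too many proposals are rejected: take shorter steps.
        return scaling / factor
    if acceptance_rate > upper:
        # Almost everything is accepted: the steps are too short, lengthen them.
        return scaling * factor
    # Acceptance rate is within the target band: leave the scaling alone.
    return scaling


if __name__ == "__main__":
    # Illustrative numbers only (lower=0.15 and upper=0.45 are assumptions).
    print(adjust_proposal_scaling(0.5, 0.93, lower=0.15, upper=0.45))
```

A very high acceptance rate means the chain takes steps that are too small, so the scaling is increased; a very low rate means the opposite.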
Bayesian analysis in MPNet
Settings: Set multiplication factor to 8; Scale .5; MCMC sample size; Max lag 2; Scaled identity
After run:
[Figure: trace plots of EdgeA, ASA and ATA against Time]
Note that Inverse D matrix is diagonal
This run just got us close to where we want to be
Acceptance rate: .42
Estimation results
Effects  Lambda  PostMean  Stddev
EdgeA  2.  -3.2257  .237  *
ASA  2.  -.862  .94
ATA  2.  .7372  .39  *
SACF
Effect  3  9  ESS(2)
EdgeA  .973  .94  .265  4
ASA  .95  .78  .42  4
ATA  .82  .456  -.53  2
Bayesian analysis in MPNet
First run: Set multiplication factor to 8; Scale .5; MCMC sample size; Max lag 2; Scaled identity
After run: note that Inverse D matrix is diagonal. This run just got us close to where we want to be.
Next:
- Press Update
- MCMC sample size 55; Parameter burnin 5
- Proposal scaling.
- Nonconditional simulation
We want to draw values roughly here BUT more efficiently (by setting a better Proposal variance than the diagonal).
Bayesian analysis in MPNet
[Figure: trace plots against Time and SACF plots for EdgeA, ASA and ATA; ESS shown on the SACF panels: 4, 2, 4]
-> It is moving around really well BUT it takes too small steps (acceptance ratio: .93)
Inverse D matrix:
.5  -.5  .
-.5  .2  -.
.  -.  .
Re-run with longer moves: Set Proposal scaling .5, rerun
Acceptance rate: .93
Estimation results
Effects  Lambda  PostMean  Stddev
EdgeA  2.  -3.6399  .385  *
ASA  2.  -.596  .38
ATA  2.  .7484  .8  *
SACF
Effect  3  9  ESS(2)
EdgeA  .952  .87  .47  2
ASA  .959  .887  .499  2
ATA  .955  .87  .37  22
Bayesian analysis in MPNet
[Figure: trace plots against Time and SACF plots for EdgeA, ASA and ATA; ESS shown on the SACF panels: 229, 22, 83]
-> It is moving around really well AND it takes nice LONG strides (acceptance rate: .34)
Estimation results
Effects  Lambda  PostMean  Stddev
EdgeA  2.  -3.5364  .534  *
ASA  2.  -.223  .85
ATA  2.  .867  .22  *
SACF
Effect  9  ESS(2)
EdgeA  .442  -.7  35
ASA  .467  -.43  328
ATA  .5  .44  25
Effective sample size: we want these to be larger than 5, else the Stddevs are misleading.
Here, as the acceptance rate is GOOD, rerun the estimation with a larger MCMC sample size.
Here the SACF is almost zero already at lag 3!
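To see why a quickly decaying SACF goes together with a large ESS, the following sketch computes a sample autocorrelation and a textbook effective sample size, n / (1 + 2 * sum of autocorrelations), from a list of posterior draws. This is an assumption about the general idea only: how MPNet computes its ESS column is not stated here, the function names are hypothetical, and truncating the sum at the first non-positive autocorrelation is a simplification chosen for this sketch.

```python
def sample_autocorrelation(draws, lag):
    """Sample autocorrelation of a sequence of draws at the given lag."""
    n = len(draws)
    mean = sum(draws) / n
    variance_sum = sum((x - mean) ** 2 for x in draws)
    if variance_sum == 0:
        return 0.0
    cov_sum = sum((draws[t] - mean) * (draws[t + lag] - mean)
                  for t in range(n - lag))
    return cov_sum / variance_sum


def effective_sample_size(draws, max_lag):
    """Textbook ESS: n / (1 + 2 * sum of autocorrelations up to max_lag).

    The sum stops at the first non-positive autocorrelation (an assumption
    made here to keep the estimate between 0 and n).
    """
    n = len(draws)
    rho_sum = 0.0
    for lag in range(1, max_lag + 1):
        rho = sample_autocorrelation(draws, lag)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


if __name__ == "__main__":
    slowly_moving = list(range(100))   # strongly autocorrelated sequence
    print(effective_sample_size(slowly_moving, max_lag=20))
```

A chain that moves slowly has large autocorrelations at many lags, so its ESS is much smaller than the number of draws; this is the situation the larger MCMC sample size is meant to repair.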
Bayesian analysis in MPNet
The output file, [session name]_posterior_bayesian.txt, contains the Bayesian posteriors.
[Figure: posterior histograms (Frequency) for EdgeA, ASA and ATA, and pairwise plots of the three parameters]
Missing data in MPNet
Session 4: More complex models
- The datafile miss2.txt is an 85 x 85 randomly simulated matrix with a density of .2. This is a matrix equivalent to 2% missing data at random.
- The datafile fish_miss2.txt is the fishermen data except that all the cells that are 1s in miss2.txt are set to zero. In other words, fish_miss2.txt can be regarded as the fishermen's network with 2% missing data (both 0s and 1s).
- Note that to use the missing data estimation in MPNet, you need an indicator matrix with 1 entered into every missing cell, and all missing cells in the original data have to be entered as 0s.
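As an illustration of how such a pair of files could be prepared, the sketch below draws a random missing-indicator matrix and zeroes out the corresponding cells of a network. It assumes an undirected network stored as a list of lists; the function names, the seed and the toy data are hypothetical, and writing the matrices to the text files that MPNet reads is left out.

```python
import random


def make_missing_indicator(n, density, seed=0):
    """Symmetric 0/1 matrix with 1 marking a missing dyad (diagonal kept at 0)."""
    rng = random.Random(seed)
    miss = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < density:
                miss[i][j] = miss[j][i] = 1
    return miss


def mask_network(adj, miss):
    """Copy of adj in which every cell flagged as missing is entered as 0."""
    n = len(adj)
    return [[0 if miss[i][j] else adj[i][j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy 4-node undirected network, for illustration only.
    adj = [[0, 1, 1, 0],
           [1, 0, 1, 0],
           [1, 1, 0, 1],
           [0, 0, 1, 0]]
    miss = make_missing_indicator(4, density=0.5, seed=1)
    for row in mask_network(adj, miss):
        print(row)
```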
Session 4: More complex models
In MPNet, under the Bayesian estimation tab:
- Enter the fish_miss2 file to be estimated.
- Enter the miss2 file as the missing indicators file.
- Select parameters and clear any previous parameter values (i.e. start from 0).
- Conduct Bayesian estimation for an edge, AS and AT model. Use 3 as the MCMC sample size.
Our results
Effects  Lambda  PostMean  Stddev
EdgeA  2.  -4.4253  .552  *
ASA  2.  .45  .96
ATA  2.  .7759  .77  *
Session 4: More complex models
- Make sure no ME
- Are all zeros really zeros?
- In principle valid for sampled data (admissible)
- MNAR impossible to check (but robustness can be assessed)
- Are missing data different from observed?
- If attributes are missing we can use a similar technique of data augmentation (not in PNet yet)
Unobserved data: snowball sampling
[Figure: snowball sampled network; legend: missing data, observed data]
Sampling in/on networks
Ignoring non-sampled
What about alter-alter ties across egos?
Unobserved data: snowball sampling
Making some (brave) assumptions (Handcock & Gile 2) we can fit an ERGM (Wang et al. 23) to snowball sampled networks:
- Importance sampling MCMCMLE (Handcock & Gile 2)
- Stochastic approximation and the missing data principle (Orchard & Woodbury, 972) (Koskinen & Snijders, 23)
- Bayesian data augmentation (Koskinen, Robins & Pattison, 2, 23) (MPNet)
- Conditional MLE (Pattison, Robins, Snijders & Wang, 23) (SnowPNet)
Unobserved data: snowball sampling
Bayesian data augmentation (Koskinen, Robins & Pattison, 2, 23) (MPNet)
- Need to know N
- Need to simulate unobserved ties
- Time-consuming
Conditional MLE (Pattison, Robins, Snijders & Wang, 23) (SnowPNet)
- No need to know N
- No need to simulate unobserved data
- Properties of conditional MLE unclear
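For readers who want to see what a snowball design observes, here is a minimal sketch that grows a sample wave by wave from a set of seeds. It is an illustration of the sampling design only, not of either estimation method; the function names and the toy network are hypothetical, and it assumes that nodes reached in the last wave are nominated but not interviewed, so ties among them (and to everyone outside the sample) stay unobserved.

```python
def snowball_zones(neighbours, seeds, waves):
    """Return the list of zones [seeds, wave 1, ..., wave `waves`].

    neighbours: dict mapping each node to the set of its network neighbours.
    """
    zones = [set(seeds)]
    reached = set(seeds)
    for _ in range(waves):
        new_zone = set()
        for node in zones[-1]:
            # Nodes nominated by the previous zone that are not yet sampled.
            new_zone |= neighbours[node] - reached
        reached |= new_zone
        zones.append(new_zone)
    return zones


def dyad_is_observed(i, j, zones):
    """A tie variable is observed if at least one endpoint was interviewed.

    Assumption of this sketch: every zone except the last is interviewed.
    """
    interviewed = set().union(*zones[:-1])
    return i in interviewed or j in interviewed


if __name__ == "__main__":
    # Toy path network 0 - 1 - 2 - 3 - 4, for illustration only.
    neighbours = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    zones = snowball_zones(neighbours, seeds=[0], waves=2)
    print(zones)
    print(dyad_is_observed(2, 3, zones))
```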
Estimating ERGM for LARGE networks
Stivala et al. (24)
- Take many small snowball samples from your LARGE N network
- Estimate the conditional MLE for each (Pattison, Robins, Snijders & Wang, 23)
- Pool the estimates using meta-analysis techniques
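The pooling step can be illustrated with the simplest meta-analysis estimator, an inverse-variance weighted (fixed-effect) average of the per-sample estimates. This is a generic sketch under that assumption, not necessarily the pooling estimator used by Stivala et al.; the function name and the numbers are hypothetical.

```python
import math


def pool_estimates(estimates, std_errors):
    """Inverse-variance weighted mean of estimates and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical estimates of one parameter from three snowball samples.
    pooled, pooled_se = pool_estimates([-3.4, -3.1, -3.6], [0.5, 0.4, 0.6])
    print(round(pooled, 3), round(pooled_se, 3))
```

Samples with smaller standard errors get more weight, and the pooled standard error shrinks as more samples are added.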
Spatial embedding
Spatial embedding (Book Ch. 8)
36 actors in Victoria, Australia, spatially embedded, all living within 4 kilometres of each other
[Figure legend: Bernoulli conditional on distance; empirical probability]
Spatial embedding (Book Ch. 8)
Spatial interaction function: tie probability as a function of distance
E.g. the attenuated power law:
$$\Pr(X_{ij} = 1 \mid d_{ij}) = \frac{p}{(1 + \alpha d_{ij})^{\gamma}}$$
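Spatial embedding (Book Ch. 8)
Spatial interaction function: tie probability as a function of distance
The attenuated power law:
$$\Pr(X_{ij} = 1 \mid d_{ij}) = \frac{p}{(1 + \alpha d_{ij})^{\gamma}}$$
Is equivalent to:
$$\Pr(X = x \mid D = (d_{ij})) = \frac{\exp\{\theta_1 \sum_{i<j} x_{ij} + \theta_2 \sum_{i<j} x_{ij} \log(d_{ij})\}}{\sum_{u \in \mathcal{X}} \exp\{\theta_1 \sum_{i<j} u_{ij} + \theta_2 \sum_{i<j} u_{ij} \log(d_{ij})\}}$$
with: $p = \alpha = e^{\theta_1}$ AND: $\gamma = \theta_2$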
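To get a feel for how tie probability falls with distance, the two forms can be evaluated side by side. In the sketch below, the first function is the attenuated power law; the second is the tie probability implied by a model with only an edge and a log-distance statistic, where ties are independent and the log-odds of a tie are theta1 + theta2 * log(d). The function names and all parameter values are hypothetical and chosen only to produce a decreasing curve.

```python
import math


def attenuated_power_law(d, p, alpha, gamma):
    """Tie probability p / (1 + alpha * d) ** gamma at distance d >= 0."""
    return p / (1.0 + alpha * d) ** gamma


def logistic_log_distance(d, theta1, theta2):
    """Tie probability when the log-odds are theta1 + theta2 * log(d), d > 0."""
    log_odds = theta1 + theta2 * math.log(d)
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    for d in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(d,
              round(attenuated_power_law(d, p=0.8, alpha=2.0, gamma=1.5), 4),
              round(logistic_log_distance(d, theta1=-1.0, theta2=-1.5), 4))
```

With a negative log-distance parameter, both curves decrease as distance grows, which is the pattern the spatial interaction function is meant to capture.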
Spatial embedding (Book Ch. 8)
ERGM: distance and endogenous dependence explain different things
Edges             -4.87* (.3)   .56* (.65)    -4.79* (.66)   -.2 (.87)
Alt. star                                     -.86* (.8)     -.86* (.2)
Alt. triangle                                 2.74* (.5)     2.69* (.4)
Log distance                    -.78* (.8)                   -.56* (.7)
Age heterophily   -.7* (.)      -.7* (.)      . (.7)         .2 (.6)
Gender homophily  -.3* (.6)     -.3 (.69)     .9 (.83)       .7 (.47)
Bipartite and Multilevel ERGM
Multilevel
Level B: B_rs, the B-network
X_ir, the X-network
Level A: A_ij, the A-network
Multilevel
Network statistics can be derived based on the same dependence assumptions. The interpretation is different, as we assume dependencies between tie-variables of different types.
Three network variables A, B and X:
$$\Pr(A = a, X = x, B = b) = \frac{1}{\kappa(\theta)} \exp \sum_{Q} \left\{ \theta_Q z_Q(a) + \theta_Q z_Q(b) + \theta_Q z_Q(x) + \theta_Q z_Q(a, x) + \theta_Q z_Q(b, x) + \theta_Q z_Q(a, x, b) \right\}$$
- Within level effects: $z_Q(a)$, $z_Q(b)$
- Between level effects: $z_Q(x)$
- Interaction between within level and between level networks: $z_Q(a, x)$, $z_Q(b, x)$
- Cross level effects: $z_Q(a, x, b)$
Multilevel
- Bernoulli
- Markov
- Affiliation based activity
- Affiliation based closure or homophily
- Social circuit and three-path
- Affiliation assortativity
- Cross-level assortativity/entrainment
Multilevel: example, global fisheries governance (Hollway & Koskinen, 24)
[Figure: the A-network, the X-network and the B-network; nodes are labelled with three-letter country codes and numeric identifiers]
[Figure: the B-network]
Multilevel: example, global fisheries governance (Hollway & Koskinen, 24)
Effects  Parameter  Stderr  t-ratio  SACF
EdgeA  -2.222  9.526  -.2  -.5
ASA  .348  .36  -.5  -.  *
ATA  .3388  .3  .7  .  *
GDP_SumA  .68  .865  -.2  -.9
GDP_ProductA  -.6  .78  -.22  -.2
species_suma  .84  .2  -.  -.2  *
distance_edgea  -.64  .95  -.9  -.  *
XEdge  -9.8  .896  -.6  .6  *
IsolatesA  -6.824  .748  .8  -.4  *
XASA  4.3784  .32  -.63  .55  *
XASB  -.467  .396  -.6  .62  *
XACA  -.4665  .3  -.67  -.4  *
XACB  .2  -.45  .33
loggdpstatetreat_xedge  .467  .4  -.7  .63  *
Star2AX  .458  .  -.8  -.24  *
Star2BX  -.5834  .84  -.46  .45  *
TriangleXBX  2.8928  .97  -.3  .4  *
L3XBX  3.433  .267  .57  .7  *
ATXBX  -.35  .  -.3  -.24  *
L3AXB  -.36  .  -.  -.4
Multilevel: example, global fisheries governance (Hollway & Koskinen, 24)
EdgeA  59  59.53  4.54  -.35
Star2A  45  25.96  38.5  .38
Star3A  7525  6986.738  62.3  .867
Star4A  48877  44694.3  2383.26  .755
Star5A  26433  245549.822  7529.89  2.494
TriangleA  37  38.89  .6  -.8
ASA  326.9238  328.5935  42.39  -.39
ASA2  326.9238  328.5935  42.39  -.39
ATA  85.963  86.578  9.376  -.34
A2PA  62.2734  45.7699  .27  .5
AETA  83.82  97.295  62.753  -.225
coast_suma  8227.8  7848.968  698.574  .223
coast_differencea  574.2  628.7824  655.48  -.34
coast_producta  552268.66  5742.9956  5757.678  .786
GDP_SumA  358.3  3592.88  325.357  -.36
GDP_DifferenceA  248.9  248.4494  9.5  .24
GDP_ProductA  28.929  274.487  83.76  -.36
species_suma  426  468.939  38.74  -.3
species_differencea  5458  627.877  76.398  -.985
species_producta  229363  2267.25  43495.224  .2
distance_edgea  245.633  249.2669  2.7  -.32
XEdge  744  744.34  25.393  -.2
XStar2A  2297  666.35  29.44  2.875
XStar2B  4955  447.62  476.587  .7
XStar3A  8932  74966.58  9.676  5.464
XStar3B  36652  354528.85  5253.258  2.28
X3Path  94997  937576.735  2398.243  .9
X4Cycle  7539  69696.854  899.436  2.82
XECA  2265537  2239.446  72782.425  .995
XECB  38684  4999.  223227.68  .28
IsolatesA  6  6.8  .966  -.4
IsolatesB  .93  .44  -.438
XASA  2785.5996  2786.52  48.835  -.
XASB  347.4525  348.564  5.356  -.2
XACA  4344.3993  4345.9997  6.884  -.26
XACB  22625.6352  22624.8278  4.667  .8
XAECA  293489.2989  27696.6764  7695.3  2.26
XAECB  29868.776  278382.2566  762.24  2.67
loggdpstatetreat_xedge  8782.9  8786.8598  267.542  -.8
Star2AX  526  5272.267  529.59  -.2
StarAAX  6539.459  6834.3932  922.6  -.32
StarAXA  9277.9278  933.466  956.33  -.27
StarAXAA  34.2835  349.8379  78.967  .6
TriangleXAX  7  888.225  7.855  .695
L3XAX  27.8552  253.83  25.28  .78
ATXAX  47884  45349.46  556.9  .459
EXTA  2273  2638.83  853.83  -.428
Star2BX  23  23.26  26.2  -.
StarABX  85.375  824.7945  2.68  -.774
StarAXB  3833.3  3833.2944  52.83  -.5
StarAXAB  36.9437  36.825  5.42  -.7
TriangleXBX  396  396.673  6.435  -.4
L3XBX  54.3754  54.38  .832  -.6
ATXBX  37668  3783.366  88.2  -.75
EXTB  552  566.58  5.745  -2.525
L3AXB  5383  5398.32  538.28  -.28
C4AXB  73  875.858  8.474  .87
ASAXASB  8354.834  8659.877  923.27  -.33
Longitudinal ERGM
LERGM FDI electricity market (Koskinen and Lomi, 23)