Social network analysis with R sna package
George Zhang
iresearch Consulting Group (China)
bird@iresearch.com.cn, birdzhangxiang@gmail.com
Social network (graph) definition
G = (V, E)
Max edges: $N = \binom{|V|}{2}$
All possible graphs with $|E|$ edges: $\binom{N}{|E|}$
Linear graph (without parallel edges and slings)
[Figure: adjacency matrix of the 10-vertex example graph, vertices V1 to V10]
Erdős, P. and Rényi, A. (1960). On the Evolution of Random Graphs.
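For a simple undirected graph, every unordered pair of vertices can carry at most one edge, so the edge count is bounded by a binomial coefficient, and the number of labeled graphs with a given edge count is another binomial coefficient. A minimal base R sketch; the vertex and edge counts are illustrative values, not taken from the slides.

```r
# Counting simple undirected graphs (no parallel edges, no loops).
n_vertices <- 4                           # illustrative value
max_edges  <- choose(n_vertices, 2)       # N: one possible edge per unordered vertex pair
n_edges    <- 2                           # illustrative value
n_graphs   <- choose(max_edges, n_edges)  # labeled graphs with exactly n_edges edges
cat("max edges:", max_edges, "| graphs with", n_edges, "edges:", n_graphs, "\n")
```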
Different kinds of networks
Random graph: a graph generated by some random process
Scale-free network: a network whose degree distribution follows a power law
Small-world network: most nodes are not neighbors of one another, but most nodes can be reached from every other node in a small number of hops or steps
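The simplest random process is to draw each possible tie independently with a fixed probability. The sketch below does this in base R; the size, tie probability, and seed are arbitrary assumptions. Later slides use sna's rgraph(n, m, tprob) for the same purpose.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
n <- 10                     # number of vertices (illustrative)
p <- 0.3                    # tie probability (illustrative)
# Each cell of the adjacency matrix is an independent Bernoulli(p) draw.
g <- matrix(rbinom(n * n, size = 1, prob = p), n, n)
diag(g) <- 0                # drop self-ties
density <- sum(g) / (n * (n - 1))
cat("edges:", sum(g), "density:", round(density, 3), "\n")
```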
Networks differ by graph index
Degree distribution
Average node-to-node distance (average shortest path length)
Clustering coefficient (global, local)
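As a small illustration of these indices, the following base R sketch computes the degree distribution and the local and global clustering coefficients of an undirected toy graph. The graph is a made-up example, and the definitions used (closed neighbor pairs over all neighbor pairs) are the common textbook ones, assumed here rather than taken from the slides.

```r
# Undirected toy graph: triangle 1-2-3 plus a pendant vertex 4 attached to 3.
a <- matrix(0, 4, 4)
a[rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))] <- 1
a <- a + t(a)                               # symmetrize

deg <- rowSums(a)
degree_distribution <- table(deg)           # how many vertices have each degree

a3 <- a %*% a %*% a
triangles_at <- diag(a3) / 2                # triangles passing through each vertex
# Local coefficient: closed neighbor pairs / all neighbor pairs (undefined for degree < 2).
local_cc <- ifelse(deg >= 2, triangles_at / choose(deg, 2), NA)
# Global coefficient: closed triples / connected triples.
global_cc <- sum(triangles_at) / sum(choose(deg, 2))
print(degree_distribution)
print(local_cc)
print(global_cc)
```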
Network examples
Butts, C.T. (2006). Cycle Census Statistics for Exponential Random Graph Models.
GLI-Graph level index
Array statistics: mean, variance, standard deviation, Skr
Graph statistics: degree, density, reciprocity, centralization
Simple graph measurements
Degree: number of links to a vertex (indegree, outdegree)
Density: sum of tie values divided by the number of possible ties
Reciprocity: the proportion of dyads which are symmetric
Mutuality: the number of complete dyads
Transitivity: the total number of transitive triads
Example (sample graph g, 10 vertices)
Degree: sum(g)
Density: gden(g), the number of ties divided by the 90 possible ties
Reciprocity: grecip(g, measure="dyadic") and grecip(g, measure="edgewise")
Mutuality: mutuality(g)
Transitivity: gtrans(g)
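To make the definitions concrete, the sketch below computes degree, density, mutuality, and two reciprocity variants by hand in base R on a made-up 4-vertex digraph. It follows the definitions on the slide; the edgewise variant is taken here to be the share of ties that are reciprocated, which is an assumption about what measure="edgewise" reports.

```r
# Toy digraph: 1 <-> 2, 1 -> 3, 3 -> 4.
g <- matrix(0, 4, 4)
g[rbind(c(1, 2), c(2, 1), c(1, 3), c(3, 4))] <- 1

n <- nrow(g)
outdegree <- rowSums(g)                     # ties sent by each vertex
indegree  <- colSums(g)                     # ties received by each vertex
density   <- sum(g) / (n * (n - 1))         # ties present / ties possible
ut <- upper.tri(g)                          # one cell per dyad {i, j}
mutual <- sum(g[ut] == 1 & t(g)[ut] == 1)   # complete (two-way) dyads
recip_dyadic   <- mean(g[ut] == t(g)[ut])   # share of dyads that are symmetric (mutual or null)
recip_edgewise <- 2 * mutual / sum(g)       # share of ties that are reciprocated
```

On this graph the dyadic figure counts the three empty dyads as symmetric, which is why it is higher than the edgewise one.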
Path and cycle statistics
kpath.census
kcycle.census
dyad.census
triad.census
Butts, C.T. (2006). Cycle Census Statistics for Exponential Random Graph Models.
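A census simply counts how often each small configuration occurs. The base R sketch below counts mutual, asymmetric, and null dyads and directed 3-cycles on a made-up digraph. It illustrates the idea only and is not the algorithm behind dyad.census or kcycle.census.

```r
# Toy digraph: directed 3-cycle 1 -> 2 -> 3 -> 1 plus a mutual tie 1 <-> 4.
g <- matrix(0, 4, 4)
g[rbind(c(1, 2), c(2, 3), c(3, 1), c(1, 4), c(4, 1))] <- 1

# Dyad census: each unordered pair has 2, 1, or 0 ties.
ties  <- (g + t(g))[upper.tri(g)]
dyads <- c(Mut = sum(ties == 2), Asym = sum(ties == 1), Null = sum(ties == 0))

# Directed 3-cycles: closed walks of length 3, each cycle counted once per starting vertex.
g3 <- g %*% g %*% g
cycles3 <- sum(diag(g3)) / 3
print(dyads)
print(cycles3)
```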
Multi-graph measurements
Graph mean: in the dichotomous case, the graph mean corresponds to the graph's density
Graph covariance: gcov / gscov
Graph correlation: gcor / gscor
Structural covariance (gscov, gscor): for unlabeled graphs
Butts, C.T., and Carley, K.M. (2001). Multivariate Methods for Interstructural Analysis.
Example
gcov(g1,g2) and gscov(g1,g2,exchange.list=1:10) return the same value
gscov(g1,g2): unlabeled graph
gcor(g1,g2) and gscor(g1,g2,exchange.list=1:10) return the same value
gscor(g1,g2): unlabeled graph
[Figure: the two 10-vertex sample graphs]
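For labeled graphs, graph correlation can be read as the ordinary product-moment correlation between the off-diagonal cells of two adjacency matrices. The base R sketch below shows that reading on two made-up 4-vertex digraphs, together with the graph mean and density coinciding in the dichotomous case. It is an illustration of the definition under these assumptions, not sna's implementation.

```r
# Two toy digraphs on the same four labeled vertices.
g1 <- matrix(0, 4, 4); g1[rbind(c(1, 2), c(2, 3), c(3, 4))] <- 1
g2 <- matrix(0, 4, 4); g2[rbind(c(1, 2), c(2, 3), c(4, 1))] <- 1

off <- row(g1) != col(g1)            # ignore the diagonal (no self-ties)
x <- g1[off]
y <- g2[off]

graph_mean <- mean(x)                # mean tie value of g1
density    <- sum(g1) / (4 * 3)      # ties present / ties possible
graph_cor  <- cor(x, y)              # labeled graph correlation
cat("mean:", graph_mean, "density:", density, "correlation:", graph_cor, "\n")
```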
gcov
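Measures of structure
Connectedness: 0 for no edges, 1 for a connected graph
$\text{Connectedness} = 1 - \frac{V}{N(N-1)/2}$
Hierarchy: 0 for all two-way links, 1 for all one-way links
$\text{Hierarchy} = 1 - \frac{V}{\mathrm{Max}V}$
Efficiency: 0 for the maximum number of edges, 1 for N-1 edges
$\text{Efficiency} = 1 - \frac{V}{\mathrm{Max}V}$
Least upper boundedness (lubness): 0 when all vertices link into one, 1 for an outtree
$\text{Lubness} = 1 - \frac{V}{\mathrm{Max}V}$
In each formula, V counts the violations of the condition being measured and MaxV is the maximum possible number of violations.
Krackhardt, David. (1994). Graph Theoretical Dimensions of Informal Organizations.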
Example: outtree
Connectedness = 1, Hierarchy = 1, Efficiency = 1, Lubness = 1
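The outtree result can be checked by hand. The base R sketch below builds a small outtree and computes connectedness, hierarchy, and efficiency from simplified readings of Krackhardt's definitions (lubness is omitted for brevity). The helper reach() and the 7-vertex tree are hypothetical, and the efficiency line assumes a single weakly connected component; sna's connectedness, hierarchy, and efficiency remain the reference implementations.

```r
# Out-tree on 7 vertices: 1 -> {2, 3}, 2 -> {4, 5}, 3 -> {6, 7}.
n <- 7
g <- matrix(0, n, n)
g[rbind(c(1, 2), c(1, 3), c(2, 4), c(2, 5), c(3, 6), c(3, 7))] <- 1

# Warshall transitive closure: r[i, j] is TRUE when a path i -> j exists.
reach <- function(a) {
  r <- a > 0
  for (k in seq_len(nrow(r))) r <- r | outer(r[, k], r[k, ], "&")
  r
}

ut <- upper.tri(g)
# Connectedness: share of vertex pairs joined by a path when direction is ignored.
r_und <- reach(g + t(g))
connectedness <- mean(r_und[ut])
# Hierarchy: among pairs joined in at least one direction, share joined one way only.
r_dir <- reach(g)
hierarchy <- 1 - sum((r_dir & t(r_dir))[ut]) / sum((r_dir | t(r_dir))[ut])
# Efficiency (one weak component assumed): 1 - edges beyond the n - 1 a tree needs,
# relative to the largest possible excess.
efficiency <- 1 - (sum(g) - (n - 1)) / (n * (n - 1) - (n - 1))
cat(connectedness, hierarchy, efficiency, "\n")
```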
Graph centrality
Degree: number of links to a vertex (indegree, outdegree)
Betweenness: number of shortest paths that pass through a vertex
Closeness: based on the path lengths to all other vertices
Centralization, by any of the measures above: 0 when all vertices have an equal position (central score), 1 when one vertex is the center of the graph
See also evcent, bonpow, graphcent, infocent, prestige
Freeman, L.C. (1979). Centrality in Social Networks: Conceptual Clarification.
Example
> centralization(g,degree,mode="graph")
> centralization(g,betweenness,mode="graph")
> centralization(g,closeness,mode="graph")
mode="graph" means only indegree is considered
[Figure: sample graph with the degree center, the betweenness center, and both closeness centers marked]
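The two extremes of centralization are easy to reproduce by hand. The base R sketch below applies Freeman's degree centralization to an undirected star (one clear center) and an undirected ring (all vertices equal). The function name degree_centralization is hypothetical, and the normalizing constant (n - 1)(n - 2) is the theoretical maximum for simple undirected graphs, so the sketch does not cover directed data.

```r
# Freeman degree centralization for a simple undirected graph:
# summed gap between the most central vertex and every vertex,
# divided by the largest value that sum can take (attained by a star).
degree_centralization <- function(a) {
  d <- rowSums(a)
  n <- nrow(a)
  sum(max(d) - d) / ((n - 1) * (n - 2))
}

# Star: vertex 1 tied to everyone else.
star <- matrix(0, 5, 5)
star[1, -1] <- 1
star[-1, 1] <- 1

# Ring: every vertex tied to its two neighbors.
ring <- matrix(0, 5, 5)
for (i in 1:5) {
  j <- i %% 5 + 1
  ring[i, j] <- 1
  ring[j, i] <- 1
}

c_star <- degree_centralization(star)
c_ring <- degree_centralization(ring)
cat("star:", c_star, "ring:", c_ring, "\n")
```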
GLI relation
GLI map
[Figure: maps of mean GLI over graph size and density; panels: Connectedness, Efficiency, Hierarchy, Centralization(degree), Centralization(betweenness), Centralization(closeness)]
Anderson, B.S.; Butts, C.T.; and Carley, K.M. (1999). The Interaction of Size and Density with Graph-Level Indices.
[Figure: connectedness distribution by graph size and density]
Anderson, B.S.; Butts, C.T.; and Carley, K.M. (1999). The Interaction of Size and Density with Graph-Level Indices.
[Figure: efficiency distribution by graph size and density]
[Figure: hierarchy distribution by graph size and density]
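GLI map R code
library(sna)
# Digits were lost from several literals in this listing. The replicate count (100),
# the size grid (5:16), the density grid (0.05 to 0.5, 10 steps) and the 2 x 3 panel
# layout are assumed values.
compare <- function(size, den) {
  g <- rgraph(n = size, m = 100, tprob = den)    # stack of random graphs
  gli1 <- apply(g, 1, connectedness)
  gli2 <- apply(g, 1, efficiency)
  gli3 <- apply(g, 1, hierarchy)
  gli4 <- apply(g, 1, function(x) centralization(x, degree))
  gli5 <- apply(g, 1, function(x) centralization(x, betweenness))
  gli6 <- apply(g, 1, function(x) centralization(x, closeness))
  # mean of each index over the stack, ignoring undefined values
  x1 <- mean(gli1, na.rm = TRUE)
  x2 <- mean(gli2, na.rm = TRUE)
  x3 <- mean(gli3, na.rm = TRUE)
  x4 <- mean(gli4, na.rm = TRUE)
  x5 <- mean(gli5, na.rm = TRUE)
  x6 <- mean(gli6, na.rm = TRUE)
  return(c(x1, x2, x3, x4, x5, x6))
}
size <- 5:16
den <- seq(0.05, 0.5, length.out = 10)
nx <- length(size)
ny <- length(den)
res <- array(0, c(nx, ny, 6))
for (i in 1:nx) for (j in 1:ny) res[i, j, ] <- compare(size[i], den[j])
# one gray-scale map per index: rows are sizes, columns are densities
par(mfrow = c(2, 3))
image(res[, , 1], col = gray(1000:1/1000), main = "connectedness")
image(res[, , 2], col = gray(1000:1/1000), main = "efficiency")
image(res[, , 3], col = gray(1000:1/1000), main = "hierarchy")
image(res[, , 4], col = gray(1000:1/1000), main = "centralization(degree)")
image(res[, , 5], col = gray(1000:1/1000), main = "centralization(betweenness)")
image(res[, , 6], col = gray(1000:1/1000), main = "centralization(closeness)")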
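GLI distribution R code
library(sna)
# Digits were lost from this listing. The 3 x 3 panel layout, the loop ranges (1:3),
# the graph sizes (2*2^i), the replicate count (100), the density denominator where
# none survived (8) and the axis limits are assumed values.
par(mfrow = c(3, 3))
for (i in 1:3) for (j in 1:3)
  hist(centralization(rgraph(2*2^i, 100, tprob = j/8), betweenness), main = "", xlab = "", ylab = "", xlim = range(0:1), ylim = range(0:50))
# For the other indices, replace the hist() call in the loop with one of:
# hist(centralization(rgraph(2*2^i, 100, tprob = j/8), degree), main = "", xlab = "", ylab = "", xlim = range(0:1), ylim = range(0:50))
# hist(hierarchy(rgraph(2*2^i, 100, tprob = j/6)), main = "", xlab = "", ylab = "", xlim = range(0:1), ylim = range(0:50))
# hist(efficiency(rgraph(2*2^i, 100, tprob = j/6)), main = "", xlab = "", ylab = "", xlim = range(0:1), ylim = range(0:50))
# hist(connectedness(rgraph(2*2^i, 100, tprob = j/8)), main = "", xlab = "", ylab = "", xlim = range(0:1), ylim = range(0:100))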
Graph distance Clustering, MDS
Distance between graphs
Hamming (labeling) distance
$d_H(G_1, G_2) = \left|\{e : (e \in E(G_1),\ e \notin E(G_2)) \lor (e \notin E(G_1),\ e \in E(G_2))\}\right|$
Number of addition/deletion operations required to turn the edge set of $G_1$ into that of $G_2$
hdist for the typical Hamming distance matrix
Structure distance
$d_S(G, H \mid L_G, L_H) = \min_{L_G, L_H} d(\ell(G), \ell(H))$
structdist & sdmat for structure distance with exchange.list of vertices
Butts, C.T., and Carley, K.M. (2001). Multivariate Methods for Interstructural Analysis.
Example
[Figure: five sample graphs on 10 vertices]
Example
hdist(g): matrix of Hamming distances between the sample graphs
sdmat(g): matrix of structure distances between the sample graphs
structdist(g): structure distances between the sample graphs
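The difference between the labeled and the structural distance shows up on two graphs that are the same shape with different vertex labels. The base R sketch below computes the Hamming distance directly and the structural distance by brute force over all vertex relabelings. The brute-force search is a stand-in chosen for clarity on tiny graphs and is not how structdist searches; the helper names are hypothetical.

```r
# Two directed paths on four vertices with different labelings.
g1 <- matrix(0, 4, 4); g1[rbind(c(1, 2), c(2, 3), c(3, 4))] <- 1   # 1 -> 2 -> 3 -> 4
g2 <- matrix(0, 4, 4); g2[rbind(c(1, 2), c(2, 3), c(4, 1))] <- 1   # 4 -> 1 -> 2 -> 3

# Hamming distance: number of cells in which the adjacency matrices differ.
hamming <- function(a, b) sum(a != b)

# All permutations of a vector (feasible for tiny graphs only).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}

d_labeled <- hamming(g1, g2)
# Structural distance with all vertices exchangeable: best match over relabelings of g2.
d_struct <- min(sapply(perms(1:4), function(p) hamming(g1, g2[p, p])))
cat("labeled:", d_labeled, "structural:", d_struct, "\n")
```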
Inter-graph MDS
gdist.plotstats: plot by distances between graphs
Add graph-level indices as a third or fourth dimension
> g.h<-hdist(g) #sample graph used before
> gdist.plotdiff(g.h,gden(g),lm.line=TRUE)
> gdist.plotstats(g.h,cbind(gden(g),grecip(g)))
[Figures: measure distance against inter-graph distance; inter-graph MDS]
Graph clustering
Use Hamming distance
g.h=hdist(g)
g.c<-hclust(as.dist(g.h))
plot(g.c)    # draw the dendrogram before marking clusters
rect.hclust(g.c,2)
g.cg<-gclust.centralgraph(g.c,2,g)
gplot(g.cg[1,,])
gplot(g.cg[2,,])
gclust.boxstats(g.c,2,gden(g))
[Figure: cluster dendrogram, hclust (*, "complete") on as.dist(g.h)]
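The clustering step itself needs nothing beyond a distance matrix. The base R sketch below builds a made-up stack of four small digraphs, computes their pairwise Hamming distances by hand, and clusters them with hclust. The stack layout (graph index first) follows the convention used with sna elsewhere in these slides; the graphs are illustrative.

```r
# Stack of four 3-vertex digraphs, indexed graph x row x column.
gs <- array(0, c(4, 3, 3))
gs[1, 1, 2] <- 1                         # graph 1: 1 -> 2
gs[2, 1, 2] <- 1; gs[2, 2, 3] <- 1       # graph 2: 1 -> 2 -> 3
full <- 1 - diag(3)                      # complete digraph without self-ties
gs[3, , ] <- full                        # graph 3: complete
gs[4, , ] <- full; gs[4, 3, 1] <- 0      # graph 4: complete minus 3 -> 1

# Pairwise Hamming distances between the graphs.
m <- dim(gs)[1]
h <- matrix(0, m, m)
for (i in seq_len(m)) for (j in seq_len(m)) h[i, j] <- sum(gs[i, , ] != gs[j, , ])

hc <- hclust(as.dist(h))                 # complete linkage by default
clusters <- cutree(hc, k = 2)            # sparse graphs vs dense graphs
print(h)
print(clusters)
```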
Distance between vertices
Structural equivalence: sedist, with methods
correlation: the product-moment correlation
euclidean: the Euclidean distance
hamming: the Hamming distance
gamma: the gamma correlation
Path distance: geodist, with shortest path distance and the number of shortest paths
Breiger, R.L.; Boorman, S.A.; and Arabie, P. (1975). An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling.
Brandes, U. (2000). Faster Evaluation of Shortest-Path Based Centrality Indices.
sedist example
sedist(g) = sedist(g,mode="graph")
[Figure: sample graph and its 10 x 10 structural equivalence distance matrix]
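Structural equivalence compares the tie profiles of two vertices: which third parties each one sends ties to and receives ties from. The base R sketch below implements a Hamming version of that comparison. The function se_hamming is hypothetical, and its treatment of the ties between the two vertices being compared (they are ignored) is an assumption that may differ from sedist.

```r
# Hamming distance between the tie profiles of every pair of vertices.
se_hamming <- function(g) {
  n <- nrow(g)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i == j) next
    k <- setdiff(seq_len(n), c(i, j))          # third parties only
    d[i, j] <- sum(g[i, k] != g[j, k]) +       # differences in ties sent
               sum(g[k, i] != g[k, j])         # differences in ties received
  }
  d
}

# Vertices 1 and 2 both send to 3 and both receive from 4.
g <- matrix(0, 4, 4)
g[rbind(c(1, 3), c(2, 3), c(4, 1), c(4, 2))] <- 1
d <- se_hamming(g)
print(d)
```

A distance of 0 between vertices 1 and 2 marks them as structurally equivalent in this toy graph.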
geodist example
geodist(g)
$counts: 10 x 10 matrix with the number of shortest paths between each pair of vertices
$gdist: 10 x 10 matrix with the shortest path lengths, Inf where no path exists
[Figure: sample graph]
geodist reachability
gplot(reachability(g),label=1:10)
[Figure: sample graph and its reachability graph]
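Shortest path lengths and reachability can be sketched in a few lines of base R with the Floyd-Warshall recurrence. This is a stand-in for illustration, not the algorithm geodist uses, and it does not count the number of shortest paths; the function name geodesic and the toy graph are hypothetical.

```r
# All-pairs shortest path lengths by Floyd-Warshall.
geodesic <- function(g) {
  d <- ifelse(g > 0, 1, Inf)     # direct ties have length 1, everything else is unknown
  diag(d) <- 0
  for (k in seq_len(nrow(g)))    # allow vertex k as an intermediate stop
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  d
}

# Directed path 1 -> 2 -> 3 and an isolated vertex 4.
g <- matrix(0, 4, 4)
g[rbind(c(1, 2), c(2, 3))] <- 1
d <- geodesic(g)
reachable <- is.finite(d)        # TRUE where some directed path exists
print(d)
```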
Graph vertices clustering by sedist
General clustering methods
equiv.clust for vertices clustering by structural equivalence (sedist)
[Figure: cluster dendrogram, hclust (*, "complete") on as.dist(equiv.dist)]
Graph structure by geodist
structure.statistics
> ss<-structure.statistics(g)
> plot(0:9,ss,xlab="Mean Coverage",ylab="Distance")
[Figure: sample graph and its structure statistics plot]
Graph cov based function Regression, principal component, canonical correlation
Correlation statistic model
Canonical correlation: netcancor
Linear regression: netlm
Logistic regression: netlogit
Linear autocorrelation model: lnam
nacf
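The idea behind network regression can be sketched as ordinary least squares on the off-diagonal cells of the adjacency matrices. The base R example below uses made-up data in which the response is an exact linear function of the predictor, so the recovered coefficients are known in advance. It covers only the point estimates; any significance testing that netlm performs is outside this sketch.

```r
# Predictor graph: a few ties on five vertices (illustrative).
x <- matrix(0, 5, 5)
x[rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(5, 1), c(1, 3))] <- 1
# Response graph: valued ties built as 1 + 2 * x, so the true coefficients are known.
y <- 1 + 2 * x

off <- row(x) != col(x)                  # use off-diagonal cells only
fit <- lm(y[off] ~ x[off])               # OLS on the vectorized dyads
coefs <- unname(coef(fit))
print(coefs)
```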
Random graph models
Graph evolution Random Biased Phases
Biased net model
Graph generation: rgbn
Graph prediction: bn
[Figure: bn prediction plots; panels: Predicted Dyad Census (Mut, Asym, Null), Predicted Triad Census (count by triad type), Predicted Structure Statistics (proportion reached by distance)]
Graph statistic test cugtest qaptest
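A QAP-style test asks whether an observed graph statistic is unusual compared with what relabeling the vertices of one graph would produce. The base R sketch below does this for the graph correlation of two made-up graphs. It is a simplified stand-in for what qaptest automates; the graphs, seed, replicate count, and helper name stat are assumptions.

```r
set.seed(42)                                   # arbitrary seed
n <- 8
off <- row(diag(n)) != col(diag(n))            # off-diagonal cells
g1 <- matrix(rbinom(n * n, 1, 0.3), n, n)
diag(g1) <- 0
g2 <- g1
flip <- sample(which(off), 5)                  # perturb five off-diagonal cells
g2[flip] <- 1 - g2[flip]

# Test statistic: correlation between the off-diagonal cells of two graphs.
stat <- function(a, b) cor(a[off], b[off])
observed <- stat(g1, g2)

# Null distribution: relabel the vertices of g2 (rows and columns together).
reps <- 500
permuted <- replicate(reps, {
  p <- sample(n)
  stat(g1, g2[p, p])
})
p_value <- mean(permuted >= observed)          # one-sided permutation p-value
cat("observed:", round(observed, 3), "p:", p_value, "\n")
```

Permuting rows and columns together keeps the structure of g2 intact and only breaks its alignment with g1, which is what distinguishes this from shuffling individual cells.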
Thanks