Workshop in Applied Analysis Software MY591. Introduction to Social Network Analysis with UCINET


Instructor: Prof. Ahmet K. Suerdem (Istanbul Bilgi University and London School of Economics)
Course Convenor (MY591): Dr. Aude Bicquelet (LSE, Department of Methodology)

Background Information

UCINET is a social network analysis program developed by Steve Borgatti, Martin Everett and Lin Freeman. UCINET works in tandem with a freeware program called NetDraw for visualizing networks. UCINET and its manuals are distributed by Analytic Technologies: you can download a trial version for FREE (for 3 months), or purchase the program for $40.

This handout is based on the following references: Hanneman and Riddle (2005); Wasserman and Faust (1994); and Christian Stieglitz's PowerPoint slides "Statistical Analysis of Complete Networks: Introduction to Networks". Hanneman and Riddle (2005) is a handy introduction that is also available online. Wasserman and Faust (1994) is the classic comprehensive guide to social network analysis.

Part I: Introduction to Social Network Analysis (SNA): Major Concepts

Social Network Analysis (SNA) is the study of the pattern of interaction between actors. The units of analysis are relations, not entities. Nodes in a network are vertices, that is, just the connectors of edges (links, relations, arrows); their attributes are secondary. For example, whether Bob (a node) is male, 32 years old, etc. is of secondary importance for social network analysis. The pattern of Bob's relations, depicted by arrows (edges), is of primary importance. Social network data are therefore different from the traditional quantitative data collected through surveys and analysed with statistical software such as SPSS.

In traditional quantitative analysis, rows are cases and columns are attributes (variables); the data usually form a rectangular matrix. The numbers in the cells give the value a case takes on an attribute: Bob is a 32-year-old male.

For SNA, the data are entered as a square matrix depicting the relations between cases. Cells record the existence or value of a relation: Carol likes Bob = 1. Rows send relations and columns receive them, so the matrix can show that Carol likes Bob but Bob does not like Carol.
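The row-sends, column-receives convention can be illustrated with a small sketch. This is only an illustration in Python, not part of UCINET; the third actor ("Ted") and every tie except "Carol likes Bob" are hypothetical values made up for the example.

```python
# Hypothetical "likes" network. Only "Carol likes Bob, Bob does not
# like Carol" comes from the handout; Ted and his ties are invented.
actors = ["Bob", "Carol", "Ted"]

# Square matrix: row = sender, column = receiver, 1 = tie present.
likes = [
    [0, 0, 1],  # Bob likes Ted
    [1, 0, 0],  # Carol likes Bob
    [0, 1, 0],  # Ted likes Carol
]


def tie(sender, receiver):
    """Return the cell for the relation sender -> receiver."""
    return likes[actors.index(sender)][actors.index(receiver)]


def is_reciprocated(a, b):
    """True only if the tie exists in both directions."""
    return tie(a, b) == 1 and tie(b, a) == 1


print(tie("Carol", "Bob"), tie("Bob", "Carol"))
```

Reading the matrix row by row gives each actor's outgoing choices; reading it column by column gives the choices each actor receives.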

An important note on network data: network data cannot be treated as quantitative data in the conventional sense. The statistical procedures that hold for random samples generally do NOT hold for a network data set, because the social actors are dependent on one another. This violates the assumption that the sampling units are independent. Procedures from classical statistics therefore cannot be applied as they stand (no regression, no ANOVA, no t-tests...); special statistical procedures are required (bootstrapping, simulation).

So, is SNA quantitative or qualitative? Social network research involves a huge toolbox of quantitative measures. Typically, however, just one social system is under study. In that sense SNA is a form of case study, not a random sample of independent cases: it uses quantitative tools for qualitative case studies. The results of a case study cannot be generalised as a matter of course, so some form of external validation of SNA results is always desirable, and replication of results is essential. Better yet, work with a random sample of networks; then you can also generalise results to a population (of networks).

Data collection: strategies for sampling

a. Full network methods: require that we collect information about each actor's ties with all other actors. For example, we could examine the boards of directors of all public corporations for overlapping directors, or who likes whom in a classroom.
Advantages: Full network data are necessary to properly define and measure many of the structural concepts of network analysis.
Disadvantages: They can be very expensive, difficult and sometimes unrealistic to collect.

b. Snowball methods: begin with a focal actor or set of actors. Each of these actors is asked to name some or all of their ties to other actors. Then all the actors named (who were not part of the original list) are tracked down and asked for some or all of their ties. The process continues until no new actors are identified, or until we decide to stop.
Advantages: They can be particularly helpful for tracking down "special" populations such as business contact networks, community elites and deviant sub-cultures.
Disadvantages: First, this method may tend to overstate the "connectedness" and "solidarity" of the population. Second, there is no guaranteed way of finding all of the connected individuals in the population.

c. Ego-centric networks (with alter connections): In many cases it will not be possible (or necessary) to track down the full networks beginning with focal nodes (as in the snowball method). An alternative approach is to begin with a selection of focal nodes (egos) and identify the nodes to which they are connected. Then we determine which of the nodes identified in the first stage are connected to one another. This can be done by contacting each of the nodes; sometimes we can ask ego to report which of the nodes that it is tied to are tied to one another.
Advantages: They can be quite effective for collecting a form of relational data from very large populations, and can be combined with attribute-based approaches. Such data can be very useful in helping to understand the opportunities and constraints that an individual has as a result of the way they are embedded in their networks.
Disadvantages: Such data are, in fact, samples of local areas of larger networks. Many network properties (distance, centrality, and various kinds of positional equivalence) cannot be assessed with ego-centric data. Some properties, such as overall network density, can be reasonably estimated. Others, such as the prevalence of reciprocal ties, cliques, and the like, can be estimated rather directly.

d. Ego-centric networks (ego only): Ego-centric methods really focus on the individual, rather than on the network as a whole. By collecting information on the connections among the actors connected to each focal ego, we can

still get a pretty good picture of the "local" networks or "neighbourhoods" of individuals. Such information is useful for understanding how networks affect individuals, and it also gives an (incomplete) picture of the general texture of the network as a whole.
Advantages: We can understand something about the differences in the actors' places in the social structure, and make some predictions about how these locations constrain their behaviour.
Disadvantages: We can analyse only the networks around focal egos (individual nodes), not the whole network.

Multiple relations: actors may be connected to each other through different relations. Faculty members, for example, have students in common, serve on the same committees, interact as friends outside of the workplace, have one or more areas of expertise in common, and co-author papers. Usually our research question and theory indicate which kinds of relations among actors are the most relevant to our study, so we do not sample relations but rather select them. Methodologies for working with multirelational data are not as well developed as those for working with single relations.

Multimodality: individuals may form networks through their affiliations to different organisations or contexts. A data set that contains information about two types of social entities (say persons and organizations) is a two-mode network. Case (person, actor) by affiliation (organisation, context, event, etc.) matrices are called incidence matrices. (Figure: an affiliation matrix of six children and three birthday parties.) An affiliation network can also be represented by a bipartite graph.

The lines in the bipartite graph represent the relation "is affiliated with" (from the perspective of actors) or "has as a member" (from the perspective of events). Since actors are affiliated with events, and events have actors as members, all lines in the bipartite graph run between nodes representing actors and nodes representing events.

Part II: Starting UCINET and basic operations

When you first open UCINET, set the default directory to a directory of your choice by typing the directory name into the space at the bottom edge of the UCINET window. The original default directory is just the c:\ drive. Note that UCINET produces many types of files, and deleting any of them before you are entirely done with your analysis may make it difficult to use some of the others. If you do not set the default directory, it may be very difficult to manage your files.

File>Change Default Folder or Make New folder

How to enter data into UCINET? There are several ways of doing this. The most common are either entering the data into an Excel file or using the UCINET Matrix spreadsheet editor.

How to prepare data with Excel: (Figures: example sheets for a network matrix, an attribute matrix and an affiliation matrix.)

To import: Data>Import via spreadsheet>Full matrix w/ multiple sheets (the filename in the example is for demonstration purposes only; please import your own data, or the sample file provided).

Choose Node attribute or Network adjacency matrix depending on the type of your data. This is for one-mode networks. If your data are bi-modal (two-mode), import them through Network adjacency matrix and then create one-mode data sets:

Data>Affiliations

If you choose the Rows mode you will obtain an actor by actor matrix; if you choose Columns, an event by event (or organisation by organisation, etc.) matrix.

The Cross-Products method takes each entry of the row for actor A, multiplies it by the same entry for actor B, and then sums the results. This method is usually used for binary data, because the result is then a count of co-occurrences. The Minimums method examines the entries for the two actors at each event and selects the minimum value. This approach is commonly used when the original data are valued.

Bi-modal data are sometimes stored in a second way, called the "bipartite" matrix. The upper left actor (g) x actor (g) submatrix and the lower right event (h) x event (h) submatrix are filled with 0's, indicating no "affiliation" ties among the g actors (the first g rows and columns) or among the h events (the last h rows and columns). The upper right submatrix is the g x h affiliation matrix, A, indicating "is affiliated with" ties from row actors to column events. The lower left h x g submatrix is the transpose of A, denoted by A', indicating whether or not each row event includes the column actor. The Transform>Bipartite tool converts two-mode rectangular matrices to one-mode bipartite matrices.
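The two conversion methods and the bipartite layout described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not UCINET's own code. The function names and the small affiliation matrix are hypothetical, and summing the per-event minimums is an assumption made by analogy with the Cross-Products method.

```python
def transpose(matrix):
    """Swap rows and columns (actors by events -> events by actors)."""
    return [list(col) for col in zip(*matrix)]


def one_mode(aff, method="cross"):
    """Row-by-row one-mode projection of a two-mode matrix.

    method="cross": sum of products of matching entries.
    method="min":   sum of the smaller of the two entries per column
                    (summing is an assumption, see the lead-in).
    """
    n = len(aff)
    result = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            pairs = list(zip(aff[a], aff[b]))
            if method == "cross":
                result[a][b] = sum(x * y for x, y in pairs)
            else:
                result[a][b] = sum(min(x, y) for x, y in pairs)
    return result


def bipartite(aff):
    """Build the (g+h) x (g+h) bipartite matrix [[0, A], [A', 0]]."""
    g, h = len(aff), len(aff[0])
    top = [[0] * g + list(row) for row in aff]
    bottom = [list(col) + [0] * h for col in zip(*aff)]
    return top + bottom


# Hypothetical data: 3 actors (rows) by 2 events (columns).
aff = [[1, 1],
       [1, 0],
       [0, 1]]

actor_by_actor = one_mode(aff)             # "Rows" mode
event_by_event = one_mode(transpose(aff))  # "Columns" mode
```

With binary data, each off-diagonal cell of `actor_by_actor` counts the events two actors share, and each diagonal cell counts the events an actor attended.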

Without going through Excel, you can also enter the data directly through the UCINET Matrix spreadsheet editor:

Data>Spreadsheets>Matrix

Visualising the network data: NetDraw

You can access NetDraw from within UCINET by simply clicking "Visualize network with NetDraw". Once NetDraw is open, open a network data set to visualise with:

NetDraw>File>Open>UCINET dataset>Network

Select the 1-Mode Networks, Node Attributes or 2-Mode Networks option depending on the type of your data. (Figures: a 1-mode network; a 1-mode network with attributes.)

To visualise according to attributes: here sex is represented by colour, and age by size. To visualise sex: Nodes>Symbols>Colour, then Colour>Attribute based, and select sex.

For age: Nodes>Symbols>Size, then Size>Attribute based, and select age. Now females are blue and males are red; as age increases, node size also increases.

Relation properties: Suppose that you want to highlight certain "types" of relations in the graph. For example, a network in a classroom may contain both friendship and note exchange networks. To do this you need to enter the data for each relationship in a separate sheet in Excel. In our sample data, friendship data are in the first sheet and note exchange data are in the second. Choose Properties>Lines>Color from the menu, then select Relations. Caution: before doing this you need to select both of the sheets: select Rels in the right-hand window and select Sheet 1 and Sheet 2. Red lines depict friendship (Sheet 1), blue lines depict note exchange relations (Sheet 2), and invisible lines indicate that both relations exist for the pair.

You can also show the strength of the relations if the data are valued: Properties>Lines>Size, then select Tie-Strength (before doing this, select only Sheet 1 and, under Properties>Lines>Color, select General).

Visualizing two-mode data

NetDraw>File>Open>UCINET dataset>2-Mode Network

Location: Where a node or a relation is drawn in the space is essentially arbitrary; that is, the configuration in the two-dimensional space is random. Since the X and Y directions don't "mean" anything, the locations of the nodes and relations don't provide any particular insight. The distances between the nodes are arbitrary and can't be interpreted in any meaningful way as "closeness" of the actors. Likewise, the "directions" X and Y have no meaning: we could rotate any of the graphs by any amount, and it would not change a thing about our interpretation. Therefore, we can "drag and drop" to relocate the nodes so that actors who share the same combinations of attributes sit near each other (for example, males on one side and females on the other). To do this automatically, NetDraw has a built-in tool that allows the user to assign the X and Y dimensions of the graph to scores on attributes (either categorical or continuous): Layout>Attributes as Coordinates, and then select the attributes to be assigned to X, Y or both.

Now males are grouped on the right-hand side of the X axis, and as age increases nodes are placed higher on the Y axis.

To visualize which nodes are most highly connected: Layout>Circle, and select the optimization button. The nodes are located at equal distances around a circle, and nodes that are highly connected are very easy to locate quickly (Jim and Bob).

So far, we have stated that the "closeness" of the actors in the layout can't be interpreted in any meaningful way. However, there are several other commonly used graphic layouts that do try to make the distances and/or directions of locations among the actors somewhat more meaningful. One way of doing this is "Multi-Dimensional Scaling" (MDS) of the network space. MDS is a family of techniques that is used (in network analysis) to assign locations to nodes in multidimensional space (in the case of the drawing, a 2-dimensional space) such that nodes that are "more similar" are closer together. Similarity in this context refers to similarity in terms of embeddedness in a network of relations, that is, similarity in terms of structural positions: structural equivalence. A position is a subset of actors who are similarly connected to others in the network.

Layout>Graph Theoretic Layout>MDS

Carole and Alex are closer in terms of the configuration of their connections. Alice has a very different pattern of ties to the other nodes. This approach can be useful for revealing structural equivalences.

Part III: Basic Concepts in Descriptive Network Analysis

Visualisation gives us a rough picture of a network. Especially in the case of large networks, it can be very difficult to understand the properties of a network structure from a picture alone. Network indicators help us carry out a tidier and more systematic analysis.

The basic properties of networks are easier to learn and understand by example. For this workshop we will look at a single directed binary network that describes the flow of information among 10 formal organizations concerned with social welfare issues in one mid-western U.S. city (Knoke and Burke). Change your default folder to C:\Program files\Analytic technologies\ucinet\datafiles. KNOKBUR.##H is the working file.

Measures of Cohesion

Dyad census: A dyad is a pair of actors <i,j> in the network, plus the configuration of tie variables <xij, xji> between them. In a directed, binary network, there are n(n-1) tie variables located in n(n-1)/2 dyads. Each dyad is either mutual (M, ties in both directions), asymmetric (A, a tie in one direction only) or null (N, no tie). A simple count of these types (the "dyad census") gives information about the degree to which the network is symmetric.

Network indices based on the dyad census:

o Density of the network is defined as the proportion of actually observed ties among the potentially observable ones. With M mutual and A asymmetric dyads, 2M + A ties are actually observed and n(n-1) are potentially observable, so density = (2M + A) / (n(n-1)).

Network>Cohesion>Density

There are two ways of calculating density: overall or by groups.
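As a rough illustration of the dyad census and the density formula above, here is a minimal Python sketch. It is not UCINET's routine, and the 4-actor matrix is a hypothetical example rather than the Knoke data.

```python
def dyad_census(adj):
    """Count mutual (M), asymmetric (A) and null (N) dyads
    in a directed binary adjacency matrix."""
    n = len(adj)
    mutual = asymmetric = null = 0
    for i in range(n):
        for j in range(i + 1, n):      # each unordered pair once
            ties = adj[i][j] + adj[j][i]
            if ties == 2:
                mutual += 1
            elif ties == 1:
                asymmetric += 1
            else:
                null += 1
    return mutual, asymmetric, null


def density(adj):
    """Observed ties (2M + A) over possible ties n(n-1)."""
    n = len(adj)
    mutual, asymmetric, _ = dyad_census(adj)
    return (2 * mutual + asymmetric) / (n * (n - 1))


# Hypothetical 4-actor network (row sends, column receives).
adj = [[0, 1, 1, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]

print(dyad_census(adj), density(adj))
```

The three counts always add up to n(n-1)/2, which is a convenient sanity check on any census.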

In the output we can see that 54% of all the possible ties are present.

You can also calculate the density by groups. For this purpose you need an attribute file indicating the properties of each node. (For this we will use our previous "who likes whom" example, since KNOKBUR does not have an attribute file. Take the second sheet, since its cells are binary. Interpretation of valued data is more complicated: for a valued network, density is defined as the sum of the tie values divided by the number of possible ties, i.e. the ratio of all tie strength actually present to the number of possible ties.) Besides selecting your network dataset as usual, also select the attribute file (Dataset containing row and column partition) and select the attribute by which you want to partition your network data; in this case sex.

Density (prop of ties) / Average tie strength

Output: 0 is female and 1 is male. The density within the female group is 0.500 and from female to male 0.583; the male-to-female and male-to-male densities appear in the same output table. From this we can deduce that the liking relation is denser between sexes than within sexes (for this particular group).

o Reciprocity index of the network can be defined as the proportion of actually reciprocated ties among the potentially reciprocable ones. With M mutual and A asymmetric dyads, 2M ties are actually reciprocated and 2M + A are potentially reciprocable, so reciprocity = 2M / (2M + A).

Network>Cohesion>Reciprocity
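The reciprocity index and the density-by-groups calculation can be sketched as follows. This is an illustrative Python version under stated assumptions, not UCINET's implementation. The matrix and the 0/1 group codes are hypothetical, and the block density ignores the diagonal (no self-ties).

```python
def reciprocity(adj):
    """Tie-based reciprocity: 2M / (2M + A).

    Returns None when the network has no ties at all.
    """
    n = len(adj)
    mutual = asymmetric = 0
    for i in range(n):
        for j in range(i + 1, n):
            ties = adj[i][j] + adj[j][i]
            if ties == 2:
                mutual += 1
            elif ties == 1:
                asymmetric += 1
    observed = 2 * mutual + asymmetric
    return 2 * mutual / observed if observed else None


def block_density(adj, groups, sender_group, receiver_group):
    """Proportion of possible ties present from one group to another.

    groups[i] is the attribute code of actor i; self-ties are skipped.
    Returns None when the block has no possible ties.
    """
    cells = [adj[i][j]
             for i in range(len(adj))
             for j in range(len(adj))
             if i != j
             and groups[i] == sender_group
             and groups[j] == receiver_group]
    return sum(cells) / len(cells) if cells else None


# Hypothetical 4-actor network and attribute codes (0 and 1).
adj = [[0, 1, 1, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
groups = [0, 0, 1, 1]

for s in (0, 1):
    for r in (0, 1):
        print(s, r, block_density(adj, groups, s, r))
```

Comparing the within-group blocks (0 to 0, 1 to 1) with the between-group blocks (0 to 1, 1 to 0) is the same reading applied to the sex-partitioned output above.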

A network that has a predominance of null or reciprocated ties over asymmetric connections may be a more "equal" or "stable" network than one with a predominance of asymmetric connections (which might be more of a hierarchy). The overall reciprocity of the network is reported in the output. It is neither high nor low, suggesting a considerable degree of institutionalized horizontal connection within this organizational population. We can also examine the reciprocity of individual actors. In this example, actor 5 has the most reciprocal relations.

Triad census: A triad is a set of three actors in the network, together with the configuration of ties between them. In a directed, binary network there are sixteen triad types, typically indicated by their dyad census M-A-N plus (where necessary) a distinguishing letter. For example, 021U indicates 0 Mutual, 2 Asymmetric and 1 Null dyads, with the ties pointing Up. The triad census can give us important clues about hierarchy, equality, and the formation of exclusive groups (e.g. where two actors connect and exclude the third). However, UCINET does not have a routine for conducting triad censuses.

Network indices based on the triad census: In particular, we may be interested in the proportion of triads that are "transitive" (that is, display a type of balance where, if i directs a tie to j, and j directs a tie to k, then i also directs a tie to k). Such transitive or balanced triads are argued by some theorists to be the "equilibrium" or natural state toward which triadic relationships tend (not all theorists would agree!).

UCINET can calculate this.

Network>Cohesion>Transitivity

The output reports 146 transitive (directed) triples. That is, there are 146 cases where, if AB and BC are present, then AC is also present. The number of triples of all kinds is 720. One way of norming transitivity is the proportion of transitive triples among triples of all kinds, that is, 146/720 = 0.203. A better way of norming transitivity is to divide the number of transitive triples by the number of cases where a single link could complete the triad (the number of triples in which i->j and j->k: 217), that is, 146/217 = 0.673. For random graphs, the expected value of the transitivity index is close to the density of the graph; for actual social networks, values between 0.3 and 0.6 are quite usual. Remember that the density of this network was 0.54; thus the proportion of transitive triples is higher than would be expected in a random graph. The network can be considered a stable network.
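The three counts in the transitivity output can be reproduced in principle with a brute-force loop over ordered triples. The sketch below is a hedged Python illustration, not UCINET's algorithm; it assumes that "triples of all kinds" means all ordered triples of distinct actors, which matches 10 x 9 x 8 = 720 for the ten organizations. The small matrix is hypothetical.

```python
def transitivity_counts(adj):
    """Return (transitive triples, triples with i->j and j->k,
    all ordered triples of distinct actors)."""
    n = len(adj)
    two_paths = transitive = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:      # actors must be distinct
                    continue
                if adj[i][j] and adj[j][k]:
                    two_paths += 1          # one link could complete it
                    if adj[i][k]:
                        transitive += 1     # i->k closes the triple
    all_triples = n * (n - 1) * (n - 2)
    return transitive, two_paths, all_triples


# Hypothetical network: 0->1, 0->2, 1->2, 2->3.
adj = [[0, 1, 1, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]

transitive, two_paths, all_triples = transitivity_counts(adj)
print(transitive / all_triples, transitive / two_paths)
```

The second ratio is the stricter norm: it only counts triples that had the opportunity to be transitive in the first place.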

Indicators related to the degree of connectivity of a network:

o Reachability: shows whether the network could potentially be divided. In the output, if any cell other than the diagonal contains a zero, then the network breaks at the link between those two actors. Network>Cohesion>Reachability

o Connectivity: Network>Cohesion>Point Connectivity calculates the number of nodes that would have to be removed in order for one actor to no longer be able to reach another. If there are many different pathways that connect two actors, they have high "connectivity" in the sense that there are multiple ways for a signal to reach from one to the other.

Distance: The cohesion properties that we have examined so far primarily deal with the direct connections from one actor to the next. However, indirect connections may also be important for understanding the cohesiveness of a network. Distance basically indicates how many steps are required to reach one actor from another. How many actors are at various distances from each actor can be important for understanding the differences among actors in the constraints and opportunities they have as a result of their position. For example, where distances are great, it may take a long time for information to diffuse across a population. The variability across the actors in the distances that they have from other actors may be a basis for differentiation and even stratification. Those actors who are closer to more others may be able to exert more power than those who are more distant.

o Geodesic distance: the number of relations in the shortest possible walk from one actor to another, i.e. the "optimal" or most "efficient" connection between two actors. It is also possible to define the distance between two actors where the links are valued. Where we have measures of the strengths of ties, the "distance" between two actors is defined as the strength of the weakest link in the path between them. If A sends 6 units to B, and B sends 4 units to C, the "strength" of the path from A to C (assuming A to B to C is the shortest path) is 4. Where we have a measure of the cost of making a connection (as in an "opportunity cost" or "transaction cost" analysis), the "distance" between two actors is defined as the sum of the costs along the shortest pathway.

Network>Cohesion>Distance

Type of data may be Adjacency (default), Strength, Cost or Probabilities.

Nearness transformation:
multiplicative: divides the distance by the largest possible distance between two actors.
additive: subtracts the actual distance between two actors from the number of nodes.
linear: rescales distance by reversing the scale (i.e. the closest becomes the most distant, the most distant becomes the nearest) and re-scoring to make the scale range from zero (closest pair of nodes) to one (most distant pair of nodes).

exponential decay: turns distance into nearness by weighting the links in the pathway with decreasing values as they fall farther away from ego. With an attenuation factor of 0.5, for example, a path from A to B to C would result in a distance of 1.5.
frequency decay: 1 minus the proportion of other actors who are as close or closer to the target as ego is.

Output: Besides giving the average distance for the whole network, the output gives indicators of compactness and its inverse, fragmentation. Compactness is the harmonic mean of the entries in the distance matrix (that is, the normalized sum of the reciprocals of all the distances). For this network compactness is fairly close to 1 (0.759), so this is a rather connected network. We can see this from the individual geodesic distances among actors: most of the actors are only one step distant from the others (average 1.533).

Eccentricity and diameter: For each actor, that actor's largest geodesic distance is its eccentricity, a measure of how far an actor is from the furthest other. The diameter of a network is the largest geodesic distance in the network. Many researchers limit their explorations of the connections among actors to connections that are no longer than the diameter of the network.

Flow: The geodesic distance examines only a single connection between a pair of actors. Sometimes the sum of all connections between actors, rather than the shortest connection, may be relevant. If I start a rumour, for example, it will pass through a network by all pathways, not just the most efficient ones. How much credence another person gives my rumour may depend on how many times they hear it from different sources, and not on how soon they hear it. One notion of how totally connected two actors are (called maximum flow by UCINET) asks how many different actors in the neighbourhood of a source lead to pathways to a target. The "flow" approach suggests that the strength of my tie to you is no stronger than the weakest link in the chain of connections, where weakness means a lack of alternatives.

Network>Cohesion>Maximum Flow
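Returning to the geodesic-based measures described earlier in this section (geodesic distance, eccentricity, diameter and compactness), the following Python sketch shows one way they could be computed for a binary network. It is only an illustration: the function names are hypothetical, compactness is read here as the sum of reciprocal distances divided by n(n-1) with unreachable pairs contributing zero (an assumption based on the parenthetical definition above), and the three-actor cycle is made-up data.

```python
from collections import deque


def geodesics(adj):
    """All-pairs shortest path lengths by breadth-first search.

    dist[s][t] is the number of steps from s to t, or None if
    t cannot be reached from s.
    """
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def eccentricity(dist, actor):
    """Largest geodesic distance from the actor to a reachable other."""
    return max(d for d in dist[actor] if d is not None)


def diameter(dist):
    """Largest geodesic distance found anywhere in the network."""
    return max(eccentricity(dist, a) for a in range(len(dist)))


def compactness(dist):
    """Mean of reciprocal distances over all ordered pairs (assumed form)."""
    n = len(dist)
    total = sum(1 / dist[i][j]
                for i in range(n) for j in range(n)
                if i != j and dist[i][j])
    return total / (n * (n - 1))


# Hypothetical directed cycle: 0->1->2->0.
adj = [[0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]]
dist = geodesics(adj)
print(dist, diameter(dist), compactness(dist))
```

A network in which everyone is one step from everyone else would score exactly 1 on this compactness measure, which is why values near 1 indicate a well-connected network.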

Note that actors 6, 7, and 9 are relatively disadvantaged. In particular, actor 6 has only one way of obtaining information from all other actors (the column vector of flows to actor 6).

Hubbell and Katz cohesion: If we are interested in how much two actors may influence one another, or share a sense of common position, the full range of their connections should probably be considered. That is, the lengths of the connections should be taken into account: a path of length 10 is not the same as a path of length 1. The Hubbell and Katz approaches count the total connections between actors (ties for undirected data, both sending and receiving ties for directed data). Each connection, however, is given a weight according to its length. The greater the length, the weaker the connection. How much weaker the connection becomes with increasing length depends on an "attenuation" factor. In our example we have used an attenuation factor of 0.5. That is, a direct connection receives a weight of one, a walk of length two receives a weight of 0.5, a connection of length three receives a weight of 0.5 squared (0.25), etc. The output shows the pairwise solidarity between actors. Large negative distances indicate that the pair of actors are very close relative to the other pairs, or have high solidarity.

Measures of Centrality and Power

The network approach emphasizes that power is inherently relational. Actors do not have power in the abstract; they have power because they can dominate others. Ego's power is alter's dependence, and vice versa. Thus, power is a consequence of patterns of relations. Actors that face fewer constraints and have more opportunities than others are in favourable structural positions. Having a favoured position means that an actor may extract better bargains in exchanges, and that the actor will be a focus for deference and attention from those in less favoured positions. What do we mean by "having a favoured position", having "more opportunities" and "fewer constraints"? There are different aspects of power:

An actor may get its power from having many connections, or from being close to or in between all other actors. Now let's focus on how to handle these different aspects of power with UCINET.

Degree centrality: Freeman's approach

Actors who have more ties to other actors may be in advantaged positions. In undirected data, actors differ from one another only in how many connections they have. With directed data, however, it can be important to distinguish centrality based on in-degree from centrality based on out-degree. If an actor receives many ties, they are often said to be prominent, or to have high prestige. Actors who have unusually high out-degree are actors who are able to exchange with many others, or make many others aware of their views. Actors who display high out-degree centrality are often said to be influential actors (depending, of course, on the nature of the relation).

Network>Centrality>Degree

Actors #5 and #2 have the greatest out-degrees, and might be regarded as the most influential (though it might matter to whom they are sending information; this measure does not take that into account). Actors #5 and #2 are joined by #7 (the newspaper) when we examine in-degree. Actor #7 has the largest in-degree, so it may be prominent in the sense of receiving information. To compare across networks of different sizes or densities, it might be useful to "standardize" the measures of in-degree and out-degree (NrmOutDeg and NrmInDeg).

Turning to the Descriptive statistics panel, the mean degree is 4.9, which is quite high given that there are only nine other actors. We see that the range (minimum to maximum) of in-degree is slightly larger than that of out-degree, and that there is more variability across the actors in in-degree than in out-degree (standard deviations and variances). The range and variability of degree (and other network properties) can be quite important, because they describe whether the population is homogeneous or heterogeneous in terms of structural positions. One could examine whether the variability is high or low relative to the typical scores by calculating the coefficient of variation (standard deviation divided by mean, times 100) for in-degree and out-degree. By the rules of thumb that are often used to evaluate coefficients of variation, the current values (35 for out-degree and 53 for in-degree) are moderate. Clearly, however, the population is more homogeneous with regard to out-degree (influence) than with regard to in-degree (prominence).
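The degree measures and the coefficient of variation discussed above can be sketched in Python as follows. This is an illustration rather than UCINET's procedure. The matrix is hypothetical, the normalized degree is expressed as a proportion of the n-1 possible ties, and the population standard deviation is used; both of the latter are assumptions about how the output is computed.

```python
from statistics import mean, pstdev


def out_degrees(adj):
    """Ties sent by each actor (row sums)."""
    return [sum(row) for row in adj]


def in_degrees(adj):
    """Ties received by each actor (column sums)."""
    return [sum(col) for col in zip(*adj)]


def normalized(degrees):
    """Degree as a proportion of the n-1 possible ties (assumed norm)."""
    n = len(degrees)
    return [d / (n - 1) for d in degrees]


def coefficient_of_variation(values):
    """Standard deviation divided by mean, times 100.

    Uses the population standard deviation (an assumption).
    """
    return pstdev(values) / mean(values) * 100


# Hypothetical 4-actor directed network.
adj = [[0, 1, 1, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]

outd, ind = out_degrees(adj), in_degrees(adj)
print(outd, ind)
print(coefficient_of_variation(outd), coefficient_of_variation(ind))
```

In this toy network every actor receives exactly one tie, so in-degree has no variation at all, while out-degree varies considerably: the opposite of the pattern in the Knoke data.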

Network centralization: the degree of inequality or variance in our network, expressed as a percentage of that of a perfect star network of the same size. In the current case, the out-degree graph centralization is 51% and the in-degree graph centralization is 38% of these theoretical maximums. It is a measure of hierarchy: values approaching 100% imply more hierarchy.

Degree centrality: Bonacich's approach

Bonacich questioned the idea that more central actors are necessarily more powerful actors. If the actors that you are connected to are themselves well connected, they have other alternatives and do not depend on you: being connected to well-connected others makes an actor central, but not powerful. There are then two aspects to consider:

o Centrality: the more connections the actors in your neighbourhood have, the more central you are.
o Power: the fewer connections the actors in your neighbourhood have, the more powerful you are.

Network>Centrality>Power

Beta coefficient (attenuation factor):
o Positive values (between zero and one): your connections have many connections of their own; implies centrality.
o Negative values: your connections are more dependent and have fewer connections of their own; implies power.

Outputs: coefficient 0.5 (centrality) and coefficient -0.5 (power).

MAYR and COMM are clearly the most central for the positive coefficient. However, with a negative attenuation parameter, we have a quite different definition of power: having weak neighbours, rather than strong ones. Actors COMM and WRO are distinguished because their ties are mostly ties to actors with high degree, which makes COMM and WRO "weak" by having powerful neighbours. By this definition of power, COMM is not powerful although it is central.

Closeness centrality

Degree centrality measures might be criticized because they only take into account the immediate ties that an actor has, or the ties of the actor's neighbours, rather than indirect ties to all others. Closeness centrality approaches emphasize the distance of an actor to all others in the network by focusing on the distance from each actor to all others.

Network>Centrality>Closeness provides a number of alternative ways of calculating the "farness" of each actor from all others. Farness is the sum of the distances (by various approaches) from each ego to all others in the network. The most common approach is the geodesic path distance. Here, "farness" is the sum of the lengths of the shortest paths from ego to all other nodes (or to ego from all other nodes). "Farness" is then transformed into "nearness" by taking the reciprocal of farness. We see that actor 6 has the largest sum of geodesic distances from other actors (in-farness of 22) and to other actors (out-farness of 17). The farness figures can be re-expressed as nearness (the reciprocal of farness) and normed relative to the greatest nearness observed in the graph (here, the in-closeness of actor 7).
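A minimal sketch of the farness and nearness calculations, using geodesic path distances, might look like this in Python. It is an illustration under stated assumptions, not UCINET's routine: the network is hypothetical and strongly connected (every actor can reach every other), and the norming divides by the greatest nearness observed, as the text above describes.

```python
from collections import deque


def geodesics(adj):
    """All-pairs shortest path lengths by breadth-first search.
    Assumes every actor can reach every other actor."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def out_farness(dist):
    """Sum of distances from each ego to all others (row sums)."""
    return [sum(row) for row in dist]


def in_farness(dist):
    """Sum of distances to each ego from all others (column sums)."""
    return [sum(col) for col in zip(*dist)]


def normed_nearness(farness):
    """Reciprocal of farness, scaled so the nearest actor scores 1."""
    nearness = [1 / f for f in farness]
    top = max(nearness)
    return [x / top for x in nearness]


# Hypothetical network: 0->1, 0->2, 1->2, 2->0.
adj = [[0, 1, 1],
       [0, 0, 1],
       [1, 0, 0]]
dist = geodesics(adj)
print(out_farness(dist), in_farness(dist))
print(normed_nearness(out_farness(dist)))
```

With directed data the two farness vectors usually differ, which is why the output reports in-closeness and out-closeness separately.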

Summary statistics on the distribution of the nearness and farness measures are also calculated. We see, for example, that the distribution of out-closeness has less variability than that of in-closeness. This is also reflected in the graph in-centralization (71.5%) and out-centralization (54.1%) measures; that is, in-distances are more unequally distributed than out-distances.

Closeness centrality: Eigenvector of geodesic distances

Consider two actors, A and B. Actor A is quite close to a small and fairly closed group within a larger network, and rather distant from many of the members of the population. Actor B is at a moderate distance from all of the members of the population. The farness measures for actor A and actor B could be quite similar in magnitude. In a sense, however, actor B is really more "central" than actor A in this example, because B is able to reach more of the network with the same amount of effort. The eigenvector approach is an effort to find the most central actors (i.e. those with the smallest farness from others) in terms of the "global" or "overall" structure of the network, and to pay less attention to patterns that are more "local."

Network>Centrality>Eigenvector

Usually, the first dimension captures the "global" aspects of distances among actors; second and further dimensions capture more specific and local sub-structures. The first set of statistics, the eigenvalues, tells us how much of the overall pattern of distances among actors can be seen as reflecting the global pattern (the first eigenvalue) and how much reflects more local, or additional, patterns. We are interested in the percentage of the overall variation in distances that is accounted for by the first factor. Here, this percentage is 74.3%. This means that about three quarters of the variation in distances among actors reflects the main dimension or pattern.

If this share is not large (say, over 70%), great caution should be exercised in interpreting the further results, because the dominant pattern is not doing a very complete job of describing the data. The first eigenvalue should also be considerably larger than the second (here, the ratio of the first eigenvalue to the second is about 5.6 to 1). This means that the dominant pattern is, in a sense, 5.6 times as "important" as the secondary pattern.

Next, we turn our attention to the scores of each of the cases on the first eigenvector. Higher scores indicate that actors are "more central" to the main pattern of distances among all of the actors; lower values indicate that actors are more peripheral. The results are very similar to those for our earlier analysis of closeness centrality, with actors #7, #5, and #2 being most central, and actor #6 being most peripheral. Usually the eigenvalue approach will do what it is supposed to do: give us a "cleaned-up" version of the closeness centrality measures, as it does here. It is a good idea to examine both, and to compare them.

Betweenness centrality

Betweenness measures the extent to which an actor falls on the geodesic paths between other pairs of actors in the network. The more people depend on me to make connections with other people, the more power I have.

Network>Centrality>Freeman Betweenness>Node Betweenness

We can see that there is quite a bit of variation (std. dev. = 6.2 relative to a mean betweenness of 4.8). Despite this, the overall network centralization is relatively low (20.11%). This makes sense, because we know that fully one half of all connections can be made in this network without the aid of any intermediary. In the sense of structural constraint, there is not a lot of "power" in this network. Actors #2, #3, and #5 appear to be a good bit more powerful than others by this measure. Indeed, it would not be surprising if these three actors saw themselves as the movers-and-shakers, and the deal-makers that made things happen.

Another way to think about betweenness is to ask which relations are most central, rather than which actors.

Network>Centrality>Freeman Betweenness>Line (edge) Betweenness

Betweenness is zero if there is no tie, or if a tie that is present is not part of any geodesic path. There are some quite central relations in the graph, for example the tie from the board of education (actor 3) to the welfare rights organization (actor 6). This particularly high value arises because, without the tie to actor 3, actor 6 would be largely isolated.
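To make the idea of "falling on geodesic paths" concrete, node betweenness can be computed by counting, for every ordered pair of actors, the share of shortest paths that pass through each third actor. The sketch below uses the accumulation scheme usually attributed to Brandes rather than whatever UCINET does internally; the function name `node_betweenness` and the five-actor example are hypothetical, and the data are assumed to be binary and directed. When a pair has several geodesics, each intermediary receives a fractional share.

```python
from collections import deque


def node_betweenness(adj):
    """Betweenness for a binary directed network, summed over ordered pairs."""
    n = len(adj)
    between = [0.0] * n
    for s in range(n):
        dist = [-1] * n          # geodesic distance from s
        sigma = [0] * n          # number of geodesics from s
        preds = [[] for _ in range(n)]
        order = []               # nodes in order of discovery
        dist[s], sigma[s] = 0, 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in range(n):
                if not adj[v][w]:
                    continue
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing dependency to predecessors
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                between[w] += delta[w]
    return between


# Hypothetical directed network: 0->1, 0->2, 1->3, 2->3, 3->4
example = [
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
scores = node_betweenness(example)
```

Here actor 3 lies on every path into actor 4 and so has the highest score, while actors 1 and 2 split the credit for the two equally short routes from actor 0 to actor 3.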

Network>Centrality>Freeman Betweenness>Hierarchical Reduction is an algorithm that identifies which actors fall at which levels of a hierarchy (if there is one). Since there is little hierarchy in the KNOKE data (recall the low centralization percentage), we take another example: KAPMINE, data collected by Bruce Kapferer (1969) on men working on the surface in a mining operation in Zambia (then Northern Rhodesia).

The first portion of the output shows a partition (which can be saved as a file, and used as an attribute to colour a graph) of each node's level in the hierarchy. For example, the first node is at the top level (3), the second node is at the second level, and the third node is at the lowest level (1) of the hierarchy, while the fourth node is at the third level again. The second portion of the output re-arranges the nodes to show which actors are included at the lowest betweenness (level one, or everyone).

Ego networks

Up to now, we have focused on the properties of macro, whole networks. This section is dedicated to the properties of individual focal nodes.

"Ego" is an individual "focal" node. Egos can be persons, groups, organizations, or whole societies.

"Neighbourhood" is the collection of ego and all nodes to whom ego has a connection at some path length. In social network analysis, the "neighbourhood" is almost always one-step; that is, it includes only ego and actors that are directly adjacent. The neighbourhood also includes all of the ties among all of the actors to whom ego has a direct connection.

"N-step neighbourhood" expands the definition of the size of ego's neighbourhood by including all nodes to whom ego has a connection at a path length of N, and all the connections among all of these actors.

Ego network data:
o Surveys: ask the subjects to identify their connections and to report the ties among those connections; alternatively, use a two-stage snowball design, or ask about connections according to social roles. Data collected in this way cannot directly inform us about the overall network, but it can give us information on the prevalence of various kinds of ego networks in even very large populations. As the actors in each network are likely to be different people, the networks need to be treated as separate actor-by-actor matrices stored as different data sets.
o "Extracting" from regular complete network data: extract multiple, or even all, of the ego networks from a full network, to be stored as separate files.

Data>Extract>Egonet

In the example, the request is to extract a network that includes the 3rd and 5th rows/columns, and all the nodes that are connected to any of these actors.
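The extraction step can be pictured as selecting rows and columns from the full matrix. The sketch below is only an approximation of the idea, not a description of what Data>Extract>Egonet does internally: it assumes a binary matrix, treats an actor as "connected" to a focal actor if a tie exists in either direction, and uses 0-based indices (so the 3rd actor is index 2). The function name and the six-actor matrix are hypothetical.

```python
def extract_egonet(adj, egos):
    """Return (members, submatrix) for the one-step neighbourhood of the given egos.

    Assumption: a tie in either direction places an actor in the neighbourhood.
    """
    n = len(adj)
    members = set(egos)
    for ego in egos:
        for j in range(n):
            if adj[ego][j] or adj[j][ego]:
                members.add(j)
    members = sorted(members)
    # Keep every tie among the selected actors, not only ties involving ego
    submatrix = [[adj[i][j] for j in members] for i in members]
    return members, submatrix


# Hypothetical directed network: 0->1, 1->2, 2->0, 2->3, 4->5, 5->4
full = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
]

members, sub = extract_egonet(full, [2])          # ego network of the 3rd actor
both_members, _ = extract_egonet(full, [2, 4])    # 3rd and 5th actors together
```

The returned sub-matrix keeps the ties among the alters as well as the ties to ego, which is what the neighbourhood definition above requires.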

Some important calculations concerning Ego Networks:

Network>Ego networks>Density calculates a substantial number of indexes that describe aspects of the neighbourhood of each ego in a data set. Here, we have decided to examine "out-neighbourhoods". Each line of the output describes the one-step ego neighbourhood of a particular actor.

Size, Number of directed ties, Number of ordered pairs, Density, Average geodesic distance, Diameter and Betweenness have the same meaning as the corresponding indicators for the macro network. Besides them, there are some further interesting indicators:

o Number of weak components. A weak component is the largest set of actors who are connected, disregarding the direction of the ties (a strong component pays attention to the direction of the ties for directed data). If ego is connected to A and B (who are connected to one another), and ego is connected to C and D (who are connected to one another), but A and B are not connected in any way to C and D (except by way of everyone being connected to ego), then there are two "weak components" in ego's neighbourhood. In our example, there are no such cases: each ego is embedded in a single-component neighbourhood (all 1s).

o Two-step reach goes beyond ego's one-step neighbourhood to report the percentage of all actors in the whole network that are within two directed steps of ego. In our example, only node 7 cannot get a message to all other actors within "friend-of-a-friend" distance.

o Reach efficiency (two-step reach divided by size) norms the two-step reach by dividing it by size. If my neighbours, on average, have few contacts that I don't have, I have low efficiency.

o Brokerage (number of pairs not directly connected). The idea of brokerage (more on this below) is that ego is the "go-between" for pairs of other actors. If other actors are not connected directly to one another, ego may be a "broker", because ego falls on the paths between the others. One item of interest is simply how much potential for brokerage there is for each actor. In our example, actor number 5, who is connected to almost everyone, is in a position to broker many connections.

o Normalized brokerage (brokerage divided by number of pairs) assesses the extent to which ego's role is that of broker.
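Several of these indicators follow directly from the adjacency matrix, as the following sketch shows for out-neighbourhoods. It is an illustration under stated assumptions rather than a reproduction of UCINET's output: the data are binary and directed, brokerage is counted here as the number of ordered pairs of alters with no tie from the first to the second (UCINET's own counting convention may differ), and the function name and five-actor matrix are hypothetical.

```python
def ego_measures(adj, ego):
    """Basic indicators for the one-step out-neighbourhood of ego (binary data)."""
    n = len(adj)
    alters = [j for j in range(n) if j != ego and adj[ego][j]]
    size = len(alters)
    # Directed ties among the alters only (ties to and from ego are excluded)
    ties = sum(adj[a][b] for a in alters for b in alters if a != b)
    pairs = size * (size - 1)
    density = 100.0 * ties / pairs if pairs else 0.0
    # Assumption: brokerage counted over ordered pairs lacking a direct tie
    brokerage = pairs - ties
    # Actors within two directed steps of ego, as a share of all other actors
    reached = set(alters)
    for a in alters:
        reached.update(j for j in range(n) if adj[a][j])
    reached.discard(ego)
    two_step_reach = 100.0 * len(reached) / (n - 1)
    return {
        "size": size,
        "ties": ties,
        "pairs": pairs,
        "density": density,
        "two_step_reach": two_step_reach,
        "reach_efficiency": two_step_reach / size if size else 0.0,
        "brokerage": brokerage,
        "normalized_brokerage": brokerage / pairs if pairs else 0.0,
    }


# Hypothetical directed network: 0->1, 0->2, 0->3, 1->2, 2->1, 2->3, 3->4, 4->0
example = [
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]
ego0 = ego_measures(example, 0)
```

For actor 0 the three alters have three of the six possible directed ties among them, so density is 50% and half of the ordered pairs are open to brokerage.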

Structural holes

The concept of structural holes is important because it helps us to understand how and why the ways in which an actor is connected affect their constraints and opportunities, and hence their behaviour. These holes, and how and where they are distributed, can be a source of inequality among actors embedded in networks.

[Figure: two panels, "No structural holes" and "Structural hole between B and C"]

Network>Ego Networks>Structural Holes

Measures related to structural holes can be computed on both valued and binary data. The normal practice in sociological research has been to use binary data (a relation is present or not), since interpretation of the measures becomes quite difficult with valued data. As an alternative to losing the information that valued data may provide, the input data could be dichotomized (Transform>Dichotomize) at various levels of strength; or the Structural Holes procedure can dichotomise automatically (Dichotomize the data?). Select the option to dichotomise and the Whole network model.
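Dichotomizing "at various levels of strength" simply means choosing a cut-off and recoding each valued tie as present or absent. The sketch below assumes a strictly-greater-than rule and a hypothetical function name; UCINET's Transform>Dichotomize dialog lets the user choose the comparison rule, so this is one possible setting rather than the procedure's fixed behaviour. The valued matrix is invented for illustration.

```python
def dichotomize(valued, cutoff):
    """Recode a valued matrix to binary: 1 if the tie value exceeds cutoff, else 0."""
    return [[1 if value > cutoff else 0 for value in row] for row in valued]


# Hypothetical valued network (e.g. number of contacts per week)
valued = [
    [0, 5, 1],
    [2, 0, 0],
    [4, 3, 0],
]

any_tie = dichotomize(valued, 0)      # every non-zero tie counts as present
strong_tie = dichotomize(valued, 2)   # only ties stronger than 2 count
```

Running the structural holes measures on both versions shows how sensitive the results are to the chosen level of strength.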

Output:

o Dyadic redundancy: calculates, for each actor in ego's neighbourhood, how many of the other actors in the neighbourhood are also tied to that actor. Actor 1's (COUN) tie to actor 2 (COMM) is largely redundant, as 72% of ego's other neighbours also have ties with COMM. Actors that display high dyadic redundancy are actors who are embedded in local neighbourhoods where there are few structural holes.

o Dyadic constraint is a measure that indexes the extent to which the relationship between ego and each of the alters in ego's neighbourhood "constrains" ego. That is, A is constrained by its relationship with B to the extent that A does not have many alternatives (has few other ties except that to B), and A's other alternatives are also tied to B. In our example, constraint measures are not very large, as most actors have several ties. COMM and MAYR (columns show the actor exerting constraint; rows show the actor being constrained) are, however, exerting constraint over a number of others, and are not very constrained by them.

o Effective size of the network (EffSize) is the number of alters that ego has, minus the average number of ties that each alter has to other alters.

o Efficiency (Efficie) norms the effective size of ego's network by its actual size. That is, what proportion of ego's ties to its neighbourhood are "non-redundant"? The effective size of ego's network may tell us something about ego's total impact; efficiency tells us how much impact ego is getting for each unit invested in using ties. An actor can be effective without being efficient, and an actor can be efficient without being effective.

o Constraint (Constra) is a summary measure that taps the extent to which ego's connections are to others who are connected to one another. If ego's potential trading partners all have one another as potential trading partners, ego is highly constrained. If ego's partners do not have other alternatives in the neighbourhood, they cannot constrain ego's behaviour. The idea of constraint is an important one because it points out that actors who have many ties to others may actually lose freedom of action rather than gain it, depending on the relationships among the other actors.

o Hierarchy (Hierarc): if the total constraint on ego is concentrated in a single other actor, the hierarchy measure will have a higher value. If the constraint results more equally from multiple actors in ego's neighbourhood, hierarchy will be lower. It is an important measure of dependency.

Brokerage

Brokerage focuses on the roles that ego plays in connecting groups. It examines ego's relations with its neighbourhood from the perspective of ego acting as a broker in relations among groups. To examine the brokerage roles played by a given actor (B), we find every instance where that actor lies on the directed path between two others (from a source A to a destination C). There are five possible combinations:

o Coordinator: B and both the source and destination nodes (A and C) are all members of the same group.
o Consultant: B is brokering a relation between two members of the same group, but is not itself a member of that group.
o Gatekeeper: B is a member of a group who is at its boundary, and controls access of outsiders (A) to the group.
o Representative: B is in the same group as A, and acts as the contact point or representative of its own group (red in the diagram) to another group (blue).
o Liaison: B is brokering a relation between two groups, and is not part of either.
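The five roles depend only on which of A, B and C share a group, so they can be classified mechanically. The sketch below is an illustration, not the UCINET routine: it assumes binary directed data, counts a brokered relation only when A has no direct tie to C (the usual Gould and Fernandez condition, stated here as an assumption), and gives each broker full credit for every relation. The function names, the four-actor network and the group codes are hypothetical.

```python
ROLES = ("coordinator", "consultant", "gatekeeper", "representative", "liaison")


def classify_role(group_a, group_b, group_c):
    """Role of broker B on the path A -> B -> C, from the three group memberships."""
    if group_a == group_b == group_c:
        return "coordinator"
    if group_a == group_c:
        return "consultant"       # A and C share a group that B is outside
    if group_b == group_c:
        return "gatekeeper"       # outsider A reaches B's group through B
    if group_a == group_b:
        return "representative"   # B carries its own group's tie outward to C
    return "liaison"              # all three groups differ


def brokerage_roles(adj, groups):
    """Count, for each actor, how often it plays each role (full credit per relation)."""
    n = len(adj)
    counts = [dict.fromkeys(ROLES, 0) for _ in range(n)]
    for a in range(n):
        for b in range(n):
            for c in range(n):
                if len({a, b, c}) < 3:
                    continue
                # Assumption: brokerage requires A->B and B->C but no direct A->C
                if adj[a][b] and adj[b][c] and not adj[a][c]:
                    counts[b][classify_role(groups[a], groups[b], groups[c])] += 1
    return counts


# Hypothetical directed chain 0->1->2->3 with group codes per actor
example = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
groups = [1, 1, 2, 3]
role_counts = brokerage_roles(example, groups)
```

In this example actor 1 acts once as a representative (carrying a tie from its own group 1 to group 2) and actor 2 acts once as a liaison between groups 1 and 3.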

To examine brokerage, you need to create an attribute file that identifies which actor is part of which group.

Network>Ego Networks>GF Brokerage roles

The option "unweighted" needs a little explanation. Suppose that actor B was brokering a relation between actors A and C, and was acting as a "liaison." In the unweighted approach, this would count as one such relation for actor B. But suppose that there was some other actor D who was also acting as a liaison between A and C. In the "weighted" approach, both B and D would get 1/2 of the credit for this role; in the unweighted approach, both B and D would get full credit. Generally, if we are interested in ego's relations, the unweighted approach would be used. If we were more interested in group relations, a weighted approach might be a better choice.

Output: Unnormalized brokerage scores for the Knoke information network

The actors have been grouped together into "partitions" for presentation; actors 1, 3, and 5, for example, form the first type of organization (1: government; 2: private; 3: organisational specialist). Two actors (5 and 2) are the main sources of inter-connection among the three organizational populations. Organizations in the third population (6, 8, 9, 10), the welfare specialists, have overall low rates of brokerage. Organizations in the first population (1, 3, 5), the government organizations, seem to be more heavily involved in liaison than in other roles. Organizations in the second population (2, 4, 7), the non-governmental generalists, play more diverse roles.

Group-to-group brokerage map

We see that actor 1 (who is in group 1) plays no role in connections from group 1 to itself or to the other groups (i.e. the zero entries in the first row of the matrix). Actor 1 does, however, act as a "liaison" in making a connection from group 2 to group 3. Actor 1 also acts as a "consultant" in connecting a member of group 3 to another member of group 3.

Expected Values

However, in any population, partitioning will produce brokerage, even if the partitions are not meaningful, or even completely random. We can check the number of relations of each type that would be expected by pure random processes. We ask: what if actors were assigned to groups as we specify, and each actor has the same number of ties to other actors that we actually observe, but the ties are distributed at random across the available actors?

Relative Brokerage

If we examine the actual brokerage relative to this random expectation, we can get a better sense of which parts of which actors' roles are "significant", that is, occur much more frequently than we would expect in a world characterized by groups but random relations among them. Larger values tend to be significant; that is, the observed raw scores are higher than the expected ones.
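The difference between weighted and unweighted credit, and the ratio used for relative brokerage, can both be shown with a very small example. This is a sketch under stated assumptions, not UCINET's routine: group roles are ignored so that only total brokerage is counted, a relation is counted only when A has no direct tie to C, and the expected values are invented numbers standing in for the random-expectation table that the procedure produces. All names are hypothetical.

```python
def brokerage_credit(adj, weighted):
    """Total brokerage per actor; weighted=True splits credit among co-brokers."""
    n = len(adj)
    credit = [0.0] * n
    for a in range(n):
        for c in range(n):
            if a == c or adj[a][c]:
                continue  # assumption: no brokerage when A reaches C directly
            brokers = [b for b in range(n)
                       if b not in (a, c) and adj[a][b] and adj[b][c]]
            for b in brokers:
                credit[b] += 1.0 / len(brokers) if weighted else 1.0
    return credit


def relative_brokerage(observed, expected):
    """Observed divided by expected; None where nothing is expected."""
    return [o / e if e else None for o, e in zip(observed, expected)]


# Hypothetical network: A=0 reaches C=3 through either B=1 or D=2
example = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
unweighted = brokerage_credit(example, weighted=False)
weighted = brokerage_credit(example, weighted=True)

# Invented expected values, for illustration only
expected = [0.5, 0.5, 2.0, 0.0]
relative = relative_brokerage(unweighted, expected)
```

Actors 1 and 2 each receive full credit in the unweighted count and half credit in the weighted count. A relative value above 1 means the actor brokers more often than the random baseline would suggest, and a value below 1 means less often.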

REFERENCES:

Conceptual
Degenne, A. and M. Forse (1999) Introducing Social Networks. London: SAGE Publications Ltd.
Hanneman, R. A. and M. Riddle (2005). Introduction to social network methods. Riverside, CA, University of California, Riverside ( published in digital form at < ).
Scott, J. (1991) Social Network Analysis: A Handbook. London: SAGE Publications
Wasserman, S. and K. Faust. (1994). Social network analysis: methods and applications. Cambridge: Cambridge University Press.

Political Science Applications
Diani, M. and D. McAdam (2003) Social Movements and Networks: Relational Approaches to Collective Action. Oxford: Oxford University Press
Knoke, D. (1990) Political Networks: The Structural Perspective. Cambridge: Cambridge University Press
La Due Lake, R. and R. Huckfeldt (1998) Social Capital, Social Networks and Political Participation. Political Psychology 19(3):
McClurg, S. D. (2003) Social Networks and Political Participation: The Role of Social Interaction in Explaining Political Participation. Political Research Quarterly 56:

Organisational theory applications
Borgatti S.P. and P.C. Foster (2003) The Network Paradigm in Organisational Research: A Review and Typology. Journal of Management 29:
Kahler, M. (2009) Collective Action and Clandestine Networks: The Case of Al Qaeda. Pp in Networked Politics: Agency, Power and Governance, London: Cornell University Press. The paper can be found online:
Nohria, N. and Robert E. (1992) Networks and Organizations: Structure, Form, and Action. Harvard: Harvard Business School Press
Porter, K. A. and W.W. Powell (2006) Networks and Organisations Pp in Clegg, S. R., Hardy, C., Lawrence, T.B. and W.R. Nord (eds.) The SAGE Handbook of Organisation Studies, London: SAGE Publication Ltd.

International Relations applications
Maoz, Z. L., G. Terris, R.D. Kuperman & I. Talmud (2005) International Relations: A Network Approach, in Alex Mintz & Bruce Russett, eds, New Directions for International Relations. Lanham, MD: Lexington (35-64). The paper can be found online:
Talmud, I. and S. Mishal. (2000) The Network State: Triangular Relations in Middle Eastern Politics. International Journal of Contemporary Sociology 37(2): The paper can be found online:

Applications to text analysis
Carley, K. (1997). Network Text analysis: The Network Position of concepts. Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts. K. Carley. Mahwah, NJ, Lawrence Erlbaum Associates:
Diesner, J. and K. Carley (2004). Revealing Social Structure from Texts: Meta-Matrix Text Analysis as a novel method for Network Text Analysis. Causal Mapping for Information Systems and Technology Research: Approaches, Advances, and Illustrations. J. C. Diesner, Kathleen. Harrisburg, PA, Idea Group Publishing.
Leydesdorff, L. and I. Hellsten (2006). "Measuring the Meanings of Words in Contexts: Automated Analysis of Monarch Butterflies, Frankenfoods and Stem Cells." Scientometrics 67(2):
Suerdem, A. K. (2009). A Semiotic Network Comparison of Technocratic and Populist Discourses in Turkey. Do They Walk Like They Talk? Dissonance in Policy Processes Ed. L. M. Imbeau. NY, Springer.

Some classical papers
Burt, Ronald. (1993). The Social Structure of Competition. Pp in Explorations in Economic Sociology, edited by Richard Swedberg. New York: Sage.
Burt, Ronald Structural Holes versus Network Closure as Social Capital. Pp in Social Capital: Theory and Research, edited by Nan Lin, Karen Cook, and Ronald Burt. New York: Aldine De Gruyter.
Granovetter, Mark. (1973). The Strength of Weak Ties. American Journal of Sociology 78(6):
Lin, Nan Building a Network Theory of Social Capital. Pp in Social Capital: Theory and Research, edited by Nan Lin, Karen Cook, and Ronald Burt. New York: Aldine De Gruyter.