Workshop in Applied Analysis Software MY591. Introduction to Social Network Analysis with UCINET


Instructor: Prof. Ahmet K. Suerdem (Istanbul Bilgi University and London School of Economics)
Contact:

Course Convenor (MY591): Dr. Aude Bicquelet (LSE, Department of Methodology)
Contact:

Background Information

UCINET is a social network analysis program developed by Steve Borgatti, Martin Everett and Lin Freeman. UCINET works in tandem with a freeware program called NETDRAW for visualizing networks. To download UCINET and manuals: You can download the trial version of UCINET for free (for 3 months). You can also purchase it for $40 from Analytic Technologies:

This handout is based on the following references: Hanneman and Riddle (2005), Wasserman and Faust (1994), and Christian Stieglitz's "Statistical Analysis of Complete Networks: Introduction to Networks" PowerPoint slides. Hanneman and Riddle (2005) is a handy introduction, and an online version can be reached at:
Wasserman and Faust (1994) is the classic comprehensive guide to social network analysis.

Part I: Introduction to Social Network Analysis (SNA): Major Concepts

Social Network Analysis (SNA) is the study of the pattern of interaction between actors. The units of analysis are relations, not entities. Nodes in a network are vertices: they are just the connectors of edges (links, relations, arrows), and their attributes are secondary. For example, whether Bob (a node) is male, 32 years old, etc. is of secondary importance for social network analysis. The pattern of Bob's relations, depicted by arrows (edges), is of primary importance. Therefore, social network data differ from the traditional quantitative data collected through surveys and analysed with statistical software such as SPSS.

This is what data entry looks like for traditional quantitative analysis: rows are cases and columns are attributes (variables), usually forming a rectangular matrix. The numbers in the cells give the value a case takes on an attribute: Bob is a 32-year-old male.

For SNA, data entry is a square matrix depicting the relations between cases. Cells give the existence or value of a relation: Carol likes Bob = 1. Rows send relations and columns receive them: Carol likes Bob, but Bob does not like Carol.
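The difference between the two data layouts can be made concrete with a small sketch. The three-actor network below is hypothetical (only the Carol and Bob tie comes from the text; Alice and her ties are invented for illustration), and plain Python lists stand in for the UCINET spreadsheet.

```python
# Hypothetical three-actor "likes" network. Only "Carol likes Bob" and
# "Bob does not like Carol" come from the handout; the rest is illustrative.
actors = ["Bob", "Carol", "Alice"]

# Square adjacency matrix: rows SEND the tie, columns RECEIVE it.
likes = [
    [0, 0, 1],  # Bob likes Alice
    [1, 0, 0],  # Carol likes Bob
    [0, 1, 0],  # Alice likes Carol
]


def has_tie(matrix, names, sender, receiver):
    """Return True if `sender` directs a tie to `receiver`."""
    return matrix[names.index(sender)][names.index(receiver)] == 1


carol_likes_bob = has_tie(likes, actors, "Carol", "Bob")
bob_likes_carol = has_tie(likes, actors, "Bob", "Carol")
print(carol_likes_bob, bob_likes_carol)
```

Because the matrix is read row-to-column, the cell (Carol, Bob) and the cell (Bob, Carol) can hold different values, which is what makes the relation directed.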

An important note on network data: network data cannot be considered quantitative in the conventional sense. The statistical procedures that hold for random samples generally do NOT hold for a network data set, since the social actors are dependent on one another. This violates the assumption that the sampling units are independent. Therefore, procedures from classical statistics are not directly applicable (no regression, no ANOVA, no t-tests...). Special statistical procedures are required (bootstrapping, simulation).

Is SNA then quantitative or qualitative? Social network research involves a huge toolbox of quantitative measures. However, typically just one social system is under study. In fact, SNA is a form of case study, not a random sample of independent cases: SNA uses quantitative tools for qualitative case studies. We cannot routinely generalise the results of a case study. Therefore, some form of external validation of SNA results is always desirable, and replication of results is essential. Better yet, work with a random sample of networks; then you can also generalise results to a population (of networks).

Data collection: strategies for sampling

a. Full network methods: require that we collect information about each actor's ties with all other actors. For example, we could examine the boards of directors of all public corporations for overlapping directors, or who likes whom in a classroom.
Advantages: Full network data are necessary to properly define and measure many of the structural concepts of network analysis.
Disadvantages: They can be very expensive, difficult and sometimes unrealistic to collect.

b. Snowball methods: begin with a focal actor or set of actors. Each of these actors is asked to name some or all of their ties to other actors. Then, all the actors named (who were not part of the original list) are tracked down and asked for some or all of their ties. The process continues until no new actors are identified, or until we decide to stop.
Advantages: They can be particularly helpful for tracking down "special" populations such as business contact networks, community elites and deviant sub-cultures.
Disadvantages: First, this method may tend to overstate the "connectedness" and "solidarity" of the population. Second, there is no guaranteed way of finding all of the connected individuals in the population.

c. Ego-centric networks (with alter connections): In many cases it will not be possible (or necessary) to track down the full networks beginning with focal nodes (as in the snowball method). An alternative approach is to begin with a selection of focal nodes (egos), and identify the nodes to which they are connected. Then, we determine which of the nodes identified in the first stage are connected to one another. This can be done by contacting each of the nodes; sometimes we can ask ego to report which of the nodes that it is tied to are tied to one another.
Advantages: They can be quite effective for collecting a form of relational data from very large populations, and can be combined with attribute-based approaches. Such data can be very useful in helping to understand the opportunities and constraints that an individual has as a result of the way they are embedded in their networks.
Disadvantages: Such data are, in fact, samplings of local areas of larger networks. Many network properties (distance, centrality, and various kinds of positional equivalence) cannot be assessed with ego-centric data. Some properties, such as overall network density, can be reasonably estimated. Others, such as the prevalence of reciprocal ties, cliques, and the like, can be estimated rather directly.

d. Ego-centric networks (ego only): Ego-centric methods really focus on the individual, rather than on the network as a whole. By collecting information on the connections among the actors connected to each focal ego, we can still get a pretty good picture of the "local" networks or "neighbourhoods" of individuals. Such information is useful for understanding how networks affect individuals, and it also gives an (incomplete) picture of the general texture of the network as a whole.
Advantages: We can understand something about the differences in the actors' places in the social structure, and make some predictions about how these locations constrain their behaviour.
Disadvantages: We can analyse only the networks around focal egos (individual nodes), not the whole network.

Multiple relations: actors may be connected to each other through different relations: faculty have students in common, serve on the same committees, interact as friends outside of the workplace, have one or more areas of expertise in common, and co-author papers. Usually our research question and theory indicate which kinds of relations among actors are the most relevant to our study, and we do not sample relations but rather select them. Methodologies for working with multirelational data are not as well developed as those for working with single relations.

Multimodality: individuals may form networks through their affiliations to different organisations or contexts. A data set that contains information about two types of social entities (say persons and organizations) is a two-mode network. Case (person, actor) by affiliation (organisation, context, event, etc.) matrices are called incidence matrices.

Example: an affiliation matrix of six children and three birthday parties. An affiliation network can also be represented by a bipartite graph.

The lines in the bipartite graph represent the relation "is affiliated with" (from the perspective of actors) or "has as a member" (from the perspective of events). Since actors are affiliated with events, and events have actors as members, all lines in the bipartite graph run between nodes representing actors and nodes representing events.

Part II: Starting UCINET and basic operations

When you first open UCINET, set the default directory to a directory of your choice by typing the directory name into the space at the bottom edge of the UCINET window. The original default directory is just the c:\ drive. Note that UCINET produces many types of files, and deleting any of them before you are entirely done with your analysis may make it difficult to use some of the others. If you do not set the default directory, it may be very difficult to manage your files.
File>Change Default Folder (or Make New Folder)

How to enter data into UCINET? There are several ways of doing this. The most common are entering the data in an Excel file or using the UCINET Matrix spreadsheet editor.

How to prepare data with Excel:
[Figures: network matrix; attribute matrix; affiliation matrix]

To import: Data>Import via spreadsheet>Full matrix w/ multiple sheets (the filename in the example is for demonstration purposes only; please import your own data, or the sample file provided).

Choose Node attribute or Network adjacency matrix depending on the type of your data. This is for one-mode networks. If your data are bi-modal, import them through Network adjacency matrix and then create one-mode data sets:
Data>Affiliations

If you choose the Rows mode you will get an actor-by-actor matrix; if you choose Columns you will get an event-by-event (or organisation-by-organisation, etc.) matrix.

The Cross-Products method takes each entry of the row for actor A, multiplies it by the corresponding entry for actor B, and then sums the results. This method is usually used for binary data, because the result is then a count of co-occurrences. The Minimums method examines the entries for the two actors at each event and selects the minimum value. This approach is commonly used when the original data are valued.

Bi-modal data are sometimes stored in a second way, called the "bipartite" matrix. The upper left actor (g) x actor (g) submatrix and the lower right event (h) x event (h) submatrix are filled with zeros, indicating no "affiliation" ties among the g actors (the first g rows and columns) or among the h events (the last h rows and columns). The upper right submatrix is the g x h affiliation matrix, A, indicating "is affiliated with" ties from row actors to column events. The lower left h x g submatrix is the transpose of A, denoted by A', indicating whether or not each row event includes the column actor. The Transform>Bipartite tool converts two-mode rectangular matrices to one-mode bipartite matrices.
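The two conversion methods and the bipartite layout described above can be sketched in a few lines of plain Python. This is an illustration, not UCINET's own implementation: the affiliation matrix is hypothetical, and summing the per-event minimums is an assumption about how the Minimums method aggregates across events.

```python
# Hypothetical affiliation (incidence) matrix: 4 actors (rows) x 3 events (columns).
affiliation = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]


def transpose(matrix):
    """Swap rows and columns (events become the rows)."""
    return [list(col) for col in zip(*matrix)]


def cross_products(matrix):
    """Row-by-row matrix: multiply matching entries of two rows, then sum."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


def minimums(matrix):
    """Row-by-row matrix: take the minimum at each event, then sum (assumed)."""
    return [[sum(min(x, y) for x, y in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


def bipartite(matrix):
    """Build the (g+h) x (g+h) matrix [[0, A], [A', 0]]."""
    g, h = len(matrix), len(matrix[0])
    top = [[0] * g + row for row in matrix]
    bottom = [row + [0] * h for row in transpose(matrix)]
    return top + bottom


actor_by_actor = cross_products(affiliation)             # "Rows" mode
event_by_event = cross_products(transpose(affiliation))  # "Columns" mode
bip = bipartite(affiliation)

for row in actor_by_actor:
    print(row)
```

For binary data the two methods coincide, because the product and the minimum of two 0/1 values are the same; they differ only when the cells are valued.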

Without going through Excel, you can also enter the data directly through the UCINET Matrix spreadsheet editor:
Data>Spreadsheets>Matrix

Visualising the network data: NetDraw

You can access NetDraw from within UCINET by simply clicking "Visualize network with NetDraw". Once NetDraw is open, open a network data set to visualise:
NetDraw>File>Open>UCINET dataset>Network

Select the 1-Mode Networks, Node Attributes or 2-Mode Networks option, depending on the type of your data.

[Figures: 1-mode network; 1-mode network with attributes]

To visualise according to attributes, here sex is represented by colour and age by size. To visualise sex: Nodes>Symbols>Colour, then Colour>Attribute based, and select sex.

For age: Nodes>Symbols>Size, then Size>Attribute based, and select age.

Now females are blue and males are red; as age increases, node size also increases.

Relation properties: Suppose that you want to highlight certain "types" of relations in the graph. For example, a network in a classroom may contain both a friendship network and a note exchange network. To do this you need to enter the data for each relationship in a separate sheet in Excel. In our sample data, the friendship data are in the first sheet and the note exchange data are in the second.
Properties>Lines>Color from the menu, then select Relations.
Caution: before doing this you need to select both of the sheets: select Rels in the right-hand side window and select Sheet 1 and Sheet 2. Red lines depict friendship (Sheet 1), blue lines depict note exchange relations (Sheet 2), and the invisible lines depict if the relation exists for both.

You can also show the strength of the relations if the data are valued: Properties>Lines>Size, then select Tie-Strength (before doing this, select only Sheet 1, and under Properties>Lines>Color select General).

Visualizing two-mode data
NetDraw>File>Open>UCINET dataset>2-mode Network

Location: Where a node or a relation is drawn in the space is essentially arbitrary; i.e. the configuration in the two-dimensional space is random. Since the X and Y directions don't "mean" anything, the locations of the nodes and relations don't provide any particular insight. That is, the distances between the nodes are arbitrary, and can't be interpreted in any meaningful way as "closeness" of the actors. The "directions" X and Y also have no meaning: we could rotate any of the graphs by any amount, and it would not change a thing about our interpretation. Therefore, we can "drag and drop" to relocate the nodes so that actors that share the same combinations of attributes sit near each other (for example, males on one side and females on the other).

To do this automatically, NetDraw has a built-in tool that allows the user to assign the X and Y dimensions of the graph to scores on attributes (either categorical or continuous): Layout>Attributes as Coordinates, and then select the attributes to be assigned to X, to Y, or to both.

Now males are grouped on the right-hand side of the X axis, and as age increases nodes move higher on the Y axis.

To visualize which nodes are most highly connected: Layout>Circle, and select the optimization button. The nodes are located at equal distances around a circle, and nodes that are highly connected are very easy to locate quickly (Jim and Bob).

So far, we have stated that "closeness" of the actors in the layout can't be interpreted in any meaningful way. However, there are several other commonly used graphic layouts that do try to make the distances and/or directions of locations among the actors somewhat more meaningful. One way of doing this is "Multi-Dimensional Scaling" (MDS) of the network space. MDS is a family of techniques that is used (in network analysis) to assign locations to nodes in multidimensional space (in the case of the drawing, a 2-dimensional space) such that nodes that are "more similar" are closer together. Similarity in this context refers to similarity in terms of embeddedness in a network of relations, i.e. similarity in terms of structural positions: structural equivalence. A position is a subset of actors who are similarly connected to others in the network.
Layout>Graph Theoretic Layout>MDS

Carole and Alex are closer in terms of the configuration of their connections. Alice has a very different pattern of ties to the other nodes. This approach can be useful for revealing structural equivalences.

Part III: Basic Concepts in Descriptive Network Analysis

Visualisation gives us a rough picture of a network. Especially in the case of large networks, it can be very difficult to understand the properties of a network structure from a picture alone. Network indicators help us carry out a tidier and more systematic analysis. The basic properties of networks are easier to learn and understand by example. For this workshop we will look at a single directed binary network that describes the flow of information among 10 formal organizations concerned with social welfare issues in one mid-western U.S. city (Knoke and Burke). Change your default folder to: C:\Program files\Analytic technologies\ucinet\datafiles. KNOKBUR.##H is the working file.

Measures of Cohesion

Dyad census: A dyad is a pair of actors <i,j> in the network, plus the configuration of tie variables <xij, xji> between them. In a directed, binary network, there are n(n-1) tie variables located in n(n-1)/2 dyads. Each dyad is either mutual (M), asymmetric (A) or null (N). A simple count of these types (the "dyad census") gives information about the degree to which the network is symmetric.

Network indices based on the dyad census:

o Density of the network is defined as the proportion of actually observed ties among the potentially observable ones. The number of actually observed ties is 2M+A; the number of potentially observable ties is n(n-1). That is, density = (2M+A) / (n(n-1)).
Network>Cohesion>Density
There are two ways of calculating density: overall or by groups.
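As a minimal sketch of the dyad census and the density formula above, the following plain Python counts mutual, asymmetric and null dyads in a small directed binary network. The four-actor matrix is hypothetical and is not the KNOKBUR data.

```python
# Hypothetical directed binary network (rows send, columns receive).
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]


def dyad_census(adj):
    """Count mutual (M), asymmetric (A) and null (N) dyads."""
    n = len(adj)
    mutual = asymmetric = null = 0
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            ties = adj[i][j] + adj[j][i]
            if ties == 2:
                mutual += 1
            elif ties == 1:
                asymmetric += 1
            else:
                null += 1
    return mutual, asymmetric, null


def density(adj):
    """Observed ties (2M + A) over possible ties n(n-1)."""
    n = len(adj)
    mutual, asymmetric, _ = dyad_census(adj)
    return (2 * mutual + asymmetric) / (n * (n - 1))


census = dyad_census(adj)
d = density(adj)
print(census, round(d, 3))
```

The three counts always sum to n(n-1)/2, and 2M + A equals the number of 1s in the matrix, which is a quick way to check a census by hand.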

In the output we can see that 54% of all the possible ties are present.

You can also calculate the density by groups. For this purpose you need an attribute file indicating the properties of each node. For this we will use our previous "who likes whom" example, since KNOKBUR does not have an attribute file. Take the second sheet, since its cells are binary. Interpretation of valued data is more complicated: for a valued network, density is defined as the sum of the tie values divided by the number of possible ties (i.e. the ratio of all tie strength that is actually present to the number of possible ties). Besides selecting your network dataset as usual, also select the attribute file (Dataset containing row and column partition) and select the attribute by which you want to partition your network data; in this case, sex.
Density (prop of ties) / Average tie strength

Output: 0 is female and 1 is male. So density within the female group is 0.500; from female to male 0.583; male to female and male to male is From this we can deduce that the liking relation is denser between the sexes than within them (for this particular group).

o Reciprocity index of the network can be defined as the proportion of actually reciprocated ties among the potentially reciprocable ones. The number of actually reciprocated ties is 2M; the number of potentially reciprocable ties is 2M+A. That is, reciprocity = 2M / (2M+A).
Network>Cohesion>Reciprocity
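The reciprocity index and the density-by-groups table can be sketched as follows. The network and the 0/1 group codes are hypothetical (they are not the workshop's "who likes whom" data), and the block density simply divides the ties from one group to another by the number of ordered pairs available, excluding self-ties.

```python
# Hypothetical directed binary network and a 0/1 attribute (0 = female, 1 = male).
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
sex = [0, 0, 1, 1]


def reciprocity(adj):
    """Reciprocated ties (2M) over ties that could be reciprocated (2M + A)."""
    n = len(adj)
    mutual = asymmetric = 0
    for i in range(n):
        for j in range(i + 1, n):
            ties = adj[i][j] + adj[j][i]
            mutual += ties == 2
            asymmetric += ties == 1
    return 2 * mutual / (2 * mutual + asymmetric)


def block_density(adj, groups, sender_group, receiver_group):
    """Density of ties sent from one group to another (self-ties excluded)."""
    ties = possible = 0
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            if i != j and gi == sender_group and gj == receiver_group:
                possible += 1
                ties += adj[i][j]
    return ties / possible if possible else 0.0


r = reciprocity(adj)
within_female = block_density(adj, sex, 0, 0)
female_to_male = block_density(adj, sex, 0, 1)
male_to_female = block_density(adj, sex, 1, 0)
within_male = block_density(adj, sex, 1, 1)
print(r, within_female, female_to_male, male_to_female, within_male)
```

In this toy network the within-group densities are higher than the between-group ones, the opposite of the pattern reported for the workshop data.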

A network that has a predominance of null or reciprocated ties over asymmetric connections may be a more "equal" or "stable" network than one with a predominance of asymmetric connections (which might be more of a hierarchy). Overall reciprocity in the network is This is neither high nor low, suggesting a considerable degree of institutionalized horizontal connection within this organizational population. We can also examine the reciprocity of individual actors. In this example, actor 5 has the most reciprocal relations.

Triad census: A triad is a set of three actors in the network, together with the configuration of ties between them. In a directed, binary network, there are sixteen triad types, typically labelled by their dyad census M-A-N plus (where necessary) a distinguishing letter. For example, 021U indicates 0 Mutual, 2 Asymmetric and 1 Null ties, with the ties pointing Upwards. The triad census can give us important clues about hierarchy, equality, and the formation of exclusive groups (e.g. where two actors connect, and exclude the third). However, UCINET does not have a routine for conducting triad censuses.

Network indices based on the triad census: In particular, we may be interested in the proportion of triads that are "transitive" (that is, display a type of balance where, if i directs a tie to j, and j directs a tie to k, then i also directs a tie to k). Such transitive or balanced triads are argued by some theorists to be the "equilibrium" or natural state toward which triadic relationships tend (not all theorists would agree!).

UCINET can calculate this:
Network>Cohesion>Transitivity

The output reports 146 transitive (directed) triples. That is, there are 146 cases where, if AB and BC are present, then AC is also present. The number of triples of all kinds is 720. The proportion of transitive triples to triples of all kinds norms the transitivity; that is, 146/720 = 20.3%. A better way of norming transitivity is to divide the number of transitive triples by the number of cases where a single link could complete the triad (the number of triples in which i->j and j->k: 217); that is, 146/217 = 67.3%.

For random graphs, the expected value of the transitivity index is close to the density of the graph; for actual social networks, values between 0.3 and 0.6 are quite usual. Remember that the density of this network was about 0.54; thus the proportion of transitive triples (0.673) is higher than would be expected in a random graph. The network can be considered a stable network.
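The two ways of norming transitivity can be sketched by brute-force counting over ordered triples of distinct actors. The small network is hypothetical; the last two lines only redo the arithmetic on the counts quoted above (146, 217 and 720) and do not recompute them from the KNOKBUR data.

```python
from itertools import permutations

# Hypothetical directed binary network (rows send, columns receive).
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]


def transitivity_counts(adj):
    """Return (transitive triples, triples with i->j and j->k, all ordered triples)."""
    n = len(adj)
    transitive = two_paths = all_triples = 0
    for i, j, k in permutations(range(n), 3):  # ordered triples of distinct actors
        all_triples += 1
        if adj[i][j] and adj[j][k]:
            two_paths += 1          # a single tie i->k would complete the triple
            if adj[i][k]:
                transitive += 1     # ... and here that tie is present
    return transitive, two_paths, all_triples


transitive, two_paths, all_triples = transitivity_counts(adj)
print(transitive / all_triples, transitive / two_paths)

# Arithmetic on the counts reported in the text for the 10-actor network.
share_of_all = round(146 / 720, 3)
share_of_two_paths = round(146 / 217, 3)
print(share_of_all, share_of_two_paths)
```

With 10 actors there are 10 x 9 x 8 = 720 ordered triples, which matches the "triples of all kinds" figure in the output.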

Indicators related to the degree of connectivity of a network:

o Reachability: shows whether there is a potential division of the network. In the output, if any of the cells other than the diagonal ones contains a zero, then the network breaks at the link between those two actors.
Network>Cohesion>Reachability

o Connectivity: Network>Cohesion>Point Connectivity calculates the number of nodes that would have to be removed in order for one actor to no longer be able to reach another. If there are many different pathways that connect two actors, they have high "connectivity" in the sense that there are multiple ways for a signal to reach from one to the other.

Distance: The cohesion properties that we have examined so far primarily deal with the direct connections from one actor to the next. However, indirect connections may also be important for understanding the cohesiveness of a network. Distance basically indicates how many steps are required to reach one actor from another. How many actors are at various distances from each actor can be important for understanding the differences among actors in the constraints and opportunities they have as a result of their position. For example, where distances are great, it may take a long time for information to diffuse across a population. The variability across the actors in the distances that they have from other actors may be a basis for differentiation and even stratification. Those actors who are closer to more others may be able to exert more power than those who are more distant.

o Geodesic distance: the number of relations in the shortest possible walk from one actor to another, i.e. the "optimal" or most "efficient" connection between two actors. It is also possible to define the distance between two actors where the links are valued. Where we have measures of the strengths of ties, the "distance" between two actors is defined as the strength of the weakest path between them. If A sends 6 units to B, and B sends 4 units to C, the "strength" of the path from A to C (assuming A to B to C is the shortest path) is 4. Where we have a measure of the cost of making a connection (as in an "opportunity cost" or "transaction cost" analysis), the "distance" between two actors is defined as the sum of the costs along the shortest pathway.
Network>Cohesion>Distance

The type of data may be Adjacency (default), Strength, Cost or Probabilities.

Nearness transformation:
multiplicative: divides the distance by the largest possible distance between two actors.
additive: subtracts the actual distance between two actors from the number of nodes.
linear: rescales distance by reversing the scale (i.e. the closest becomes the most distant, the most distant becomes the nearest) and re-scoring to make the scale range from zero (closest pair of nodes) to one (most distant pair of nodes).

exponential decay: turns distance into nearness by weighting the links in the pathway with decreasing values as they fall farther away from ego. With an attenuation factor of 0.5, for example, a path from A to B to C would result in a distance of 1.5.
frequency decay: 1 minus the proportion of other actors who are as close or closer to the target as ego is.

Output: Besides giving the average distance for the whole network, the output gives indicators of compactness and its inverse, fragmentation. Compactness is the harmonic mean of the entries in the distance matrix (that is, the normalized sum of the reciprocals of all the distances). For this network, compactness is fairly close to 1 (0.759), so this is a rather well-connected network. We can see this from the individual geodesic distances among actors: most of the actors are only one step distant from the others (average, 1.533).

Eccentricity and diameter: For each actor, that actor's largest geodesic distance is its eccentricity, a measure of how far an actor is from the furthest other. The diameter of a network is the largest geodesic distance in the network. Many researchers limit their explorations of the connections among actors to connections that are no longer than the diameter of the network.

Flow: The geodesic distance examines only a single connection between a pair of actors. Sometimes the sum of all connections between actors, rather than the shortest connection, may be relevant. If I start a rumour, for example, it will pass through a network by all pathways, not just the most efficient ones. How much credence another person gives my rumour may depend on how many times they hear it from different sources, and not on how soon they hear it. One notion of how totally connected two actors are (called maximum flow by UCINET) asks how many different actors in the neighbourhood of a source lead to pathways to a target. The "flow" approach suggests that the strength of my tie to you is no stronger than the weakest link in the chain of connections, where weakness means a lack of alternatives.
Network>Cohesion>Maximum Flow
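The geodesic distance, eccentricity, diameter and compactness measures described above can be sketched with a breadth-first search. The five-tie network below is hypothetical, and treating unreachable pairs as contributing zero to compactness is an assumption made for this sketch rather than a statement about UCINET's routine.

```python
from collections import deque


def geodesic_distances(adj):
    """Shortest-path lengths for all ordered pairs (None if unreachable)."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:  # breadth-first search outward from the source
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[source][v] is None:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


# Hypothetical directed network with ties 0->1, 0->2, 1->2, 2->3, 3->0.
adj = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]

dist = geodesic_distances(adj)
n = len(adj)
pairs = [(i, j) for i in range(n) for j in range(n) if i != j]

# Eccentricity: each actor's largest geodesic distance to a reachable other.
eccentricity = [max(d for d in row if d is not None) for row in dist]
diameter = max(eccentricity)

reachable = [dist[i][j] for i, j in pairs if dist[i][j] is not None]
average_distance = sum(reachable) / len(reachable)

# Compactness: mean reciprocal distance (unreachable pairs count as 0 here).
compactness = sum(1 / dist[i][j] for i, j in pairs if dist[i][j] is not None) / len(pairs)

print(dist, eccentricity, diameter, average_distance, round(compactness, 3))
```

Breadth-first search is enough here because every tie counts as one step; valued "cost" data would need a weighted shortest-path algorithm instead.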

Note that actors 6, 7, and 9 are relatively disadvantaged. In particular, actor 6 has only one way of obtaining information from all other actors (the column vector of flows to actor 6).

Hubbell and Katz cohesion: If we are interested in how much two actors may influence one another, or share a sense of common position, the full range of their connections should probably be considered. That is, the lengths of the connections are taken into account: a path of length 10 is not the same as a path of length 1. The Hubbell and Katz approaches count the total connections between actors (ties for undirected data, both sending and receiving ties for directed data). Each connection, however, is given a weight according to its length. The greater the length, the weaker the connection. How much weaker the connection becomes with increasing length depends on an "attenuation" factor. In our example, we have used an attenuation factor of 0.5. That is, a direct connection receives a weight of one, a walk of length two receives a weight of 0.5, a connection of length three receives a weight of 0.5 squared (0.25), etc. The output shows the pairwise solidarity between actors. Large negative distances indicate that the pair of actors are very close relative to the other pairs, or have high solidarity.

Measures of Centrality and Power

The network approach emphasizes that power is inherently relational. Actors do not have power in the abstract; they have power because they can dominate others: ego's power is alter's dependence, and vice versa. Thus, power is a consequence of patterns of relations. Actors that face fewer constraints and have more opportunities than others are in favourable structural positions. Having a favoured position means that an actor may extract better bargains in exchanges, and that the actor will be a focus for deference and attention from those in less favoured positions. What do we mean by "having a favoured position" and having "more opportunities" and "fewer constraints"? There are different aspects of power:

An actor may get its power from having many connections, from being close to all other actors, or from being in between other actors. Now let's focus on how to handle these different aspects of power with UCINET.

Degree centrality: Freeman's approach

Actors who have more ties to other actors may be in advantaged positions. In undirected data, actors differ from one another only in how many connections they have. With directed data, however, it can be important to distinguish centrality based on in-degree from centrality based on out-degree. If an actor receives many ties, they are often said to be prominent, or to have high prestige. Actors who have unusually high out-degree are actors who are able to exchange with many others, or make many others aware of their views. Actors who display high out-degree centrality are often said to be influential actors (depending, of course, on the nature of the relation).
Network>Centrality>Degree

Actors #5 and #2 have the greatest out-degrees, and might be regarded as the most influential (though it might matter to whom they are sending information, and this measure does not take that into account). Actors #5 and #2 are joined by #7 (the newspaper) when we examine in-degree. Actor #7 has the largest in-degree, so it may be prominent in the sense of receiving information. To compare across networks of different sizes or densities, it might be useful to "standardize" the measures of in- and out-degree (NrmOutDeg and NrmInDeg).

Turning to the Descriptive statistics panel, the mean degree is 4.9, which is quite high, given that there are only nine other actors. We see that the range of in-degree (minimum and maximum) is slightly larger than that of out-degree, and that there is more variability across the actors in in-degree than in out-degree (standard deviations and variances). The range and variability of degree (and other network properties) can be quite important, because they describe whether the population is homogeneous or heterogeneous in terms of structural positions. One could examine whether the variability is high or low relative to the typical scores by calculating the coefficient of variation (standard deviation divided by mean, times 100) for in-degree and out-degree. By the rules of thumb that are often used to evaluate coefficients of variation, the current values (35 for out-degree and 53 for in-degree) are moderate. Clearly, however, the population is more homogeneous with regard to out-degree (influence) than with regard to in-degree (prominence).
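The degree measures and the coefficient of variation discussed above can be sketched as follows. The network is hypothetical; expressing the normalized degree as a percentage of n-1 and using the population standard deviation are assumptions of this sketch, not a description of UCINET's output.

```python
from statistics import mean, pstdev

# Hypothetical directed binary network (rows send, columns receive).
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
n = len(adj)

# Out-degree: row sums (ties sent). In-degree: column sums (ties received).
out_deg = [sum(row) for row in adj]
in_deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]

# Normalized degree: percentage of the n-1 other actors (assumed convention).
norm_out = [100 * d / (n - 1) for d in out_deg]
norm_in = [100 * d / (n - 1) for d in in_deg]


def coefficient_of_variation(values):
    """Standard deviation divided by the mean, times 100."""
    return 100 * pstdev(values) / mean(values)


cv_out = coefficient_of_variation(out_deg)
cv_in = coefficient_of_variation(in_deg)
print(out_deg, in_deg, round(cv_out, 1), round(cv_in, 1))
```

Mean in-degree and mean out-degree are always equal, since every tie is sent by one actor and received by another; only the spread around that mean can differ.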

Network centralization: the degree of inequality or variance in our network, expressed as a percentage of that of a perfect star network of the same size. In the current case, the out-degree graph centralization is 51% and the in-degree graph centralization is 38% of these theoretical maximums. It is a measure of hierarchy: values approaching 100% imply more hierarchy.

Degree centrality: Bonacich's approach

Bonacich questioned the idea that more central actors are necessarily more powerful actors. If the actors that you are connected to are themselves well connected, they have other alternatives and do not depend on you: being connected to connected others makes an actor central, but not powerful. There are then two aspects to distinguish:
o Centrality: the more connections the actors in your neighbourhood have, the more central you are.
o Power: the fewer connections the actors in your neighbourhood have, the more powerful you are.
Network>Centrality>Power

Beta coefficient (attenuation factor):
o Positive values (between zero and one): connections have more connections; implies centrality.
o Negative values: connections are more dependent; connections have fewer connections; implies power.

Outputs: Coefficient 0.5, centrality; Coefficient -0.5, power.
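The effect of the sign of beta can be illustrated with a small sketch. It assumes the commonly cited Bonacich formulation, c = alpha * (I - beta*R)^(-1) * R * 1, solved here by simple iteration; the five-actor chain, the beta values of +0.3 and -0.3, and alpha = 1 are illustrative choices, and UCINET's own normalization of the scores is not reproduced.

```python
# Hypothetical undirected chain of five actors: 0 - 1 - 2 - 3 - 4.
adj = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]


def bonacich_power(adj, beta, alpha=1.0, iterations=500):
    """Iterate c = alpha * degree + beta * R c (assumed Bonacich formulation).

    The iteration converges only if |beta| is smaller than the reciprocal of
    the largest eigenvalue of R (about 0.577 for this chain).
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    c = [0.0] * n
    for _ in range(iterations):
        c = [
            alpha * degree[i] + beta * sum(adj[i][j] * c[j] for j in range(n))
            for i in range(n)
        ]
    return c


centrality = bonacich_power(adj, beta=0.3)   # well-connected neighbours add to the score
power = bonacich_power(adj, beta=-0.3)       # well-connected neighbours subtract from it

print([round(x, 3) for x in centrality])
print([round(x, 3) for x in power])
```

With a positive beta the middle actor scores highest, while with a negative beta actors 1 and 3 come out on top, because each of them has a neighbour at the end of the chain with no alternative partner.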

MAYR and COMM are clearly the most central for the positive coefficient. However, with a negative attenuation parameter, we have a quite different definition of power: having weak neighbours, rather than strong ones. Actors COMM and WRO are distinguished because their ties are mostly ties to actors with high degree, making COMM and WRO "weak" by having powerful neighbours. By this definition of power, COMM is not powerful although it is central.

Closeness centrality

Degree centrality measures might be criticized because they only take into account the immediate ties that an actor has, or the ties of the actor's neighbours, rather than indirect ties to all others. Closeness centrality approaches emphasize the distance of an actor to all others in the network by focusing on the distance from each actor to all others. Network>Centrality>Closeness provides a number of alternative ways of calculating the "farness" of each actor from all others. Farness is the sum of the distances (by various approaches) from each ego to all others in the network. The most common is the geodesic path distance. Here, "farness" is the sum of the lengths of the shortest paths from ego to all other nodes (or to ego from all other nodes). Farness is then transformed into "nearness" by taking the reciprocal of farness.

We see that actor 6 has the largest sum of geodesic distances from other actors (in-farness of 22) and to other actors (out-farness of 17). The farness figures can be re-expressed as nearness (the reciprocal of farness) and normed relative to the greatest nearness observed in the graph (here, the in-closeness of actor 7).
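Farness and nearness can be sketched directly from a geodesic distance matrix. The network below is hypothetical and strongly connected (every actor can reach every other), which keeps the sums well defined; the norming step mentioned above is left out.

```python
from collections import deque


def geodesic_distances(adj):
    """Shortest-path lengths for all ordered pairs (None if unreachable)."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:  # breadth-first search outward from the source
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[source][v] is None:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


# Hypothetical directed network with ties 0->1, 0->2, 1->2, 2->3, 3->0.
adj = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
n = len(adj)
dist = geodesic_distances(adj)

# Out-farness: sum of distances FROM ego to all others (row sums).
out_far = [sum(dist[i][j] for j in range(n) if j != i) for i in range(n)]
# In-farness: sum of distances TO ego from all others (column sums).
in_far = [sum(dist[i][j] for i in range(n) if i != j) for j in range(n)]

# Nearness is the reciprocal of farness.
out_near = [1 / f for f in out_far]
in_near = [1 / f for f in in_far]

print(out_far, in_far)
```

Actor 0 has the smallest out-farness here, while actor 2 has the smallest in-farness, which shows why the two directions need to be reported separately for directed data.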

Summary statistics on the distribution of the nearness and farness measures are also calculated. We see, for example, that the distribution of out-closeness has less variability than that of in-closeness. This is also reflected in the graph in-centralization (71.5%) and out-centralization (54.1%) measures; that is, in-distances are more unequally distributed than out-distances.

Closeness centrality: Eigenvector of geodesic distances
Consider two actors, A and B. Actor A is quite close to a small and fairly closed group within a larger network, and rather distant from many of the members of the population. Actor B is at a moderate distance from all of the members of the population. The farness measures for actor A and actor B could be quite similar in magnitude. In a sense, however, actor B is really more "central" than actor A in this example, because B is able to reach more of the network with the same amount of effort. The eigenvector approach is an effort to find the most central actors (i.e. those with the smallest farness from others) in terms of the "global" or "overall" structure of the network, and to pay less attention to patterns that are more "local."
Network>Centrality>Eigenvector
Usually, the first dimension captures the "global" aspects of distances among actors; the second and further dimensions capture more specific and local sub-structures. The first set of statistics, the eigenvalues, tells us how much of the overall pattern of distances among actors can be seen as reflecting the global pattern (the first eigenvalue), and how much reflects more local, or additional, patterns. We are interested in the percentage of the overall variation in distances that is accounted for by the first factor. Here, this percentage is 74.3%. This means that about 3/4 of the variation in distances among actors reflects the main dimension or pattern.
If this amount is not large (say, over 70%), great caution should be exercised in interpreting the further results, because the dominant pattern is not doing a very complete job of describing the data. The first eigenvalue should also be considerably larger than the second (here, the ratio of the first eigenvalue to the second is about 5.6 to 1). This means that the dominant pattern is, in a sense, 5.6 times as "important" as the secondary pattern. Next, we turn our attention to the scores of each of the cases on the first eigenvector. Higher scores indicate that actors are "more central" to the main pattern of distances among all of the actors; lower values indicate that actors are more peripheral. The results are very similar to those for our earlier analysis of closeness centrality, with actors #7, #5, and #2 being most central, and actor #6 being most peripheral. Usually the eigenvalue approach will do what it is supposed to do: give us a "cleaned-up" version of the closeness centrality measures, as it does here. It is a good idea to examine both, and to compare them.

Betweenness centrality
Betweenness centrality captures the extent to which an actor falls on the geodesic paths between other pairs of actors in the network. The more people depend on me to make connections with other people, the more power I have.
Network>Centrality>Freeman Betweenness>Node Betweenness
We can see that there is quite a bit of variation (std. dev. = 6.2 relative to a mean betweenness of 4.8). Despite this, the overall network centralization is relatively low (20.11%). This makes sense, because we know that fully one half of all connections can be made in this network without the aid of any intermediary. In the sense of structural constraint, there is not a lot of "power" in this network. Actors #2, #3, and #5 appear to be a good bit more powerful than others by this measure. Indeed, it would not be surprising if these three actors saw themselves as the movers-and-shakers, and the deal-makers that made things happen.
Another way to think about betweenness is to ask which relations are most central, rather than which actors.
Network>Centrality>Freeman Betweenness>Line (edge) Betweenness
Betweenness is zero if there is no tie, or if a tie that is present is not part of any geodesic path. There are some quite central relations in the graph. One example is the tie from the board of education (actor 3) to the welfare rights organization (actor 6). This particularly high value arises because, without the tie to actor 3, actor 6 would be largely isolated.
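Node betweenness as described above can be illustrated with a short sketch. It is a plain counting approach rather than UCINET's implementation: for every ordered pair of actors it finds the share of geodesic paths that pass through each third actor. Directed binary ties are assumed, and the four-actor network `ties` is hypothetical.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source: distance to, and number of geodesics to, each reachable node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                count[u] = 0
                queue.append(u)
            if dist[u] == dist[v] + 1:
                count[u] += count[v]  # every geodesic to v extends to u
    return dist, count


def node_betweenness(adj):
    """Sum, over ordered pairs (s, t), of the share of s-t geodesics through each node."""
    info = {s: shortest_path_counts(adj, s) for s in adj}
    between = {v: 0.0 for v in adj}
    for s in adj:
        dist_s, count_s = info[s]
        for t in adj:
            if t == s or t not in dist_s:
                continue
            for v in adj:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, count_v = info[v]
                # v lies on a geodesic from s to t when the two legs add up exactly.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    between[v] += count_s[v] * count_v[t] / count_s[t]
    return between


# Hypothetical network: A reaches C either through B or through D.
ties = {"A": ["B", "D"], "B": ["C"], "C": [], "D": ["C"]}
betweenness = node_betweenness(ties)
print(betweenness)
```

Because A has two equally short routes to C, B and D each receive half of the credit for that pair. This is why betweenness scores are often fractional.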

Network>Centrality>Freeman Betweenness>Hierarchical Reduction is an algorithm that identifies which actors fall at which levels of a hierarchy (if there is one). Since there is little hierarchy in the KNOKE data (remember %), we take another example, KAPMINE: data collected by Bruce Kapferer (1969) on men working on the surface in a mining operation in Zambia (then Northern Rhodesia). The first portion of the output shows a partition (which can be saved as a file, and used as an attribute to colour a graph) of each node's level in the hierarchy. For example, the first node is at the top level (3), the second node is at the second level (2) and the third node is at the lowest level (1) of the hierarchy, while the fourth node is at the third level again. The second portion of the output re-arranges the nodes to show which actors are included at the lowest betweenness (level one, or everyone).

Ego networks
Up to now, we have focused on the properties of macro, whole networks. This section is dedicated to the properties of individual focal nodes.
"Ego" is an individual "focal" node. Egos can be persons, groups, organizations, or whole societies.
"Neighbourhood" is the collection of ego and all nodes to whom ego has a connection at some path length. In social network analysis, the neighbourhood is almost always one-step; that is, it includes only ego and the actors that are directly adjacent. The neighbourhood also includes all of the ties among all of the actors to whom ego has a direct connection.
"N-step neighbourhood" expands the definition of the size of ego's neighbourhood by including all nodes to whom ego has a connection at a path length of N, and all the connections among all of these actors.
Ego network data:
o Surveys: ask the subjects to identify their connections and to report the ties among those connections; alternatively, use a two-stage snowball design, or ask about connections according to social roles. Data collected in this way cannot directly inform us about the overall network, but it can give us information on the prevalence of various kinds of ego networks in even very large populations. As the actors in each network are likely to be different people, the networks need to be treated as separate actor-by-actor matrices stored as different data sets.
o "Extracting" from regular complete network data: extract multiple, or even all, of the ego networks from a full network, to be stored as separate files.
Data>Extract>Egonet
In this example, we extract a network that includes the 3rd and 5th rows/columns, and all the nodes that are connected to any of these actors.
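The extraction step can be sketched as follows. This is an illustration of the idea rather than of what Data>Extract>Egonet does internally: it assumes a directed binary network stored as a dictionary, and treats a node as "connected" to an ego if it sends a tie to, or receives a tie from, that ego. The six-node network is hypothetical.

```python
def ego_network(adj, egos):
    """Sub-network of the egos, their direct contacts, and all ties among those nodes.

    adj maps every node to the list of nodes it sends ties to.
    """
    egos = set(egos)
    members = set(egos)
    for sender, targets in adj.items():
        for receiver in targets:
            if sender in egos or receiver in egos:  # tie sent or received by an ego
                members.update((sender, receiver))
    # Keep only the ties whose two endpoints both belong to the neighbourhood.
    return {v: sorted(u for u in adj[v] if u in members) for v in sorted(members)}


# Hypothetical directed network with six nodes.
network = {1: [2], 2: [3], 3: [4], 4: [], 5: [3, 6], 6: [1]}

# Extract the joint one-step neighbourhood of nodes 3 and 5.
extracted = ego_network(network, egos=[3, 5])
print(extracted)
```

Node 1 is dropped because it has no direct tie to node 3 or node 5, and the tie from node 6 to node 1 disappears with it.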

Some important calculations concerning ego networks:
Network>Ego networks>Density calculates a substantial number of indexes that describe aspects of the neighbourhood of each ego in a data set. Here we have decided to examine "out-neighbourhoods". Each line of the output describes the one-step ego neighbourhood of a particular actor. Size, number of directed ties, number of ordered pairs, density, average geodesic distance, diameter and betweenness have the same meaning as the corresponding indicators for the whole network. Besides these, there are some further interesting indicators:
o Number of weak components. A weak component is the largest set of actors who are connected, disregarding the direction of the ties (a strong component pays attention to the direction of the ties for directed data). If ego is connected to A and B (who are connected to one another), and ego is connected to C and D (who are connected to one another), but A and B are not connected in any way to C and D (except by way of everyone being connected to ego), then there are two "weak components" in ego's neighbourhood. In our example, there are no such cases: each ego is embedded in a single-component neighbourhood (all 1s).
o Two-step reach goes beyond ego's one-step neighbourhood to report the percentage of all actors in the whole network that are within two directed steps of ego. In our example, only node 7 cannot get a message to all other actors within "friend-of-a-friend" distance.
o Reach efficiency (two-step reach divided by size) norms the two-step reach by dividing it by size. If my neighbours, on average, have few contacts that I don't have, I have low efficiency.
o Brokerage (number of pairs not directly connected). The idea of brokerage (more on this below) is that ego is the "go-between" for pairs of other actors. If other actors are not connected directly to one another, ego may be a "broker": ego falls on the paths between the others. One item of interest is simply how much potential for brokerage there is for each actor. In our example, actor number 5, who is connected to almost everyone, is in a position to broker many connections.
o Normalized brokerage (brokerage divided by number of pairs) assesses the extent to which ego's role is that of broker.
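A few of these indexes can be computed by hand for a small example. The sketch below makes several assumptions that may differ from UCINET's conventions: ego's neighbourhood is its out-neighbourhood, density is the number of directed ties among alters divided by the number of ordered pairs of alters, and brokerage counts unordered pairs of alters with no tie in either direction. The network is hypothetical.

```python
from itertools import combinations


def ego_measures(adj, ego):
    """Size, ties, density and brokerage for ego's one-step out-neighbourhood."""
    alters = sorted(set(adj[ego]) - {ego})
    size = len(alters)
    ordered_pairs = size * (size - 1)
    # Directed ties among the alters (ties involving ego itself are excluded).
    ties = sum(1 for a in alters for b in adj[a] if b in alters and b != a)
    density = ties / ordered_pairs if ordered_pairs else 0.0
    # Pairs of alters with no tie in either direction: ego could act as go-between.
    brokerage = sum(
        1 for a, b in combinations(alters, 2)
        if b not in adj[a] and a not in adj[b]
    )
    unordered_pairs = ordered_pairs // 2
    normalized = brokerage / unordered_pairs if unordered_pairs else 0.0
    return {
        "size": size,
        "ties": ties,
        "density": density,
        "brokerage": brokerage,
        "normalized_brokerage": normalized,
    }


# Hypothetical network: ego names A, B and C; A and B name each other.
network = {
    "ego": {"A", "B", "C"},
    "A": {"B"},
    "B": {"A"},
    "C": {"ego"},
}

measures = ego_measures(network, "ego")
print(measures)
```

Ego has three alters, two directed ties among them (a density of one third), and two of the three pairs of alters (A with C, and B with C) are not directly connected, so ego could broker two thirds of the possible pairs.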

Structural holes
The concept of structural holes is important since it helps us to understand how and why the ways in which an actor is connected affect their constraints and opportunities, and hence their behaviour. These holes, and how and where they are distributed, can be a source of inequality among actors embedded in networks.
[Figure: two example networks, "No structural holes" and "Structural hole between B and C"]
Network>Ego Networks>Structural Holes
Measures related to structural holes can be computed on both valued and binary data. The normal practice in sociological research has been to use binary data (a relation is present or not), since interpretation of the measures becomes quite difficult with valued data. As an alternative to losing the information that valued data may provide, the input data could be dichotomized (Transform>Dichotomize) at various levels of strength; or the Structural Holes procedure can dichotomise automatically (Dichotomize the data?). Select the option to dichotomise and the Whole network model.

Output:
o Dyadic redundancy: calculates, for each actor in ego's neighbourhood, how many of the other actors in the neighbourhood are also tied to that actor. Actor 1's (COUN) tie to actor 2 (COMM) is largely redundant, as 72% of ego's other neighbours also have ties with COMM. Actors that display high dyadic redundancy are embedded in local neighbourhoods where there are few structural holes.
o Dyadic constraint is a measure that indexes the extent to which the relationship between ego and each of the alters in ego's neighbourhood "constrains" ego. That is, A is constrained by its relationship with B to the extent that A does not have many alternatives (has few other ties except that to B), and A's other alternatives are also tied to B. In our example the constraint measures are not very large, as most actors have several ties. COMM and MAYR are, however, exerting constraint over a number of others, and are not very constrained by them (columns indicate the actor exerting constraint, rows the actor being constrained).
o Effective size of the network (EffSize) is the number of alters that ego has, minus the average number of ties that each alter has to other alters.
o Efficiency (Efficie) norms the effective size of ego's network by its actual size. That is, what proportion of ego's ties to its neighbourhood are "non-redundant"? The effective size of ego's network may tell us something about ego's total impact; efficiency tells us how much impact ego is getting for each unit invested in using ties. An actor can be effective without being efficient, and an actor can be efficient without being effective.
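Effective size and efficiency, as defined above, reduce to a simple calculation when the data are binary and undirected. The sketch below assumes exactly that case: with n alters and t ties among them, the average number of ties each alter has to other alters is 2t/n, so effective size is n - 2t/n and efficiency is that value divided by n. The two small networks are hypothetical, and valued or directed data would need the fuller formula.

```python
from itertools import combinations


def effective_size(adj, ego):
    """Return (effective size, efficiency) for ego in a binary, undirected network."""
    alters = sorted(set(adj[ego]) - {ego})
    n = len(alters)
    if n == 0:
        return 0.0, 0.0
    # Count each undirected tie among the alters once.
    ties_among_alters = sum(1 for a, b in combinations(alters, 2) if b in adj[a])
    eff = n - 2 * ties_among_alters / n
    return eff, eff / n


# Hypothetical network 1: ego's three contacts do not know one another.
open_net = {
    "ego": {"A", "B", "C"},
    "A": {"ego"},
    "B": {"ego"},
    "C": {"ego"},
}

# Hypothetical network 2: ego's three contacts all know one another.
closed_net = {
    "ego": {"A", "B", "C"},
    "A": {"ego", "B", "C"},
    "B": {"ego", "A", "C"},
    "C": {"ego", "A", "B"},
}

print(effective_size(open_net, "ego"))
print(effective_size(closed_net, "ego"))
```

In the open network every contact is non-redundant, so effective size equals actual size and efficiency is 1. In the closed network the three contacts are fully redundant with one another, so ego effectively has a single contact and an efficiency of one third.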

o Constraint (Constra) is a summary measure that taps the extent to which ego's connections are to others who are connected to one another. If ego's potential trading partners all have one another as potential trading partners, ego is highly constrained. If ego's partners do not have other alternatives in the neighbourhood, they cannot constrain ego's behaviour. The idea of constraint is an important one because it points out that actors who have many ties to others may actually lose freedom of action rather than gain it, depending on the relationships among the other actors.
o Hierarchy (Hierarc): if the total constraint on ego is concentrated in a single other actor, the hierarchy measure will have a higher value. If the constraint results more equally from multiple actors in ego's neighbourhood, hierarchy will be lower. It is an important measure of dependency.

Brokerage: focuses on the roles that ego plays in connecting groups. It examines ego's relations with its neighbourhood from the perspective of ego acting as a broker in relations among groups. To examine the brokerage roles played by a given actor, we find every instance where that actor lies on the directed path between two others. With B as the broker, A as the source and C as the destination, there are five possible combinations:
o Coordinator: B and both the source and destination nodes (A and C) are all members of the same group.
o Consultant: B is brokering a relation between two members of the same group, but is not itself a member of that group.
o Gatekeeper: B is a member of a group who is at its boundary, and controls access of outsiders (A) to the group.
o Representative: B is in the same group as A, and acts as the contact point or representative of its own group (the "red" group) to another (the "blue" group).
o Liaison: B is brokering a relation between two groups, and is not part of either.
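The five roles listed above depend only on which of A, B and C share a group, so they can be written down as a small decision rule. The sketch below is illustrative, not UCINET's routine: it counts a brokered path A to B to C only when A has no direct tie to C, which is an assumption about how the procedure treats redundant paths, and the network and group labels are hypothetical.

```python
ROLES = ("coordinator", "consultant", "gatekeeper", "representative", "liaison")


def brokerage_role(group, a, b, c):
    """Role played by broker b on the directed path a -> b -> c."""
    ga, gb, gc = group[a], group[b], group[c]
    if ga == gb == gc:
        return "coordinator"      # all three in the same group
    if ga == gc:
        return "consultant"       # a and c share a group, b is an outsider
    if gb == gc:
        return "gatekeeper"       # b controls an outsider's access to its own group
    if ga == gb:
        return "representative"   # b speaks for its own group to an outsider
    return "liaison"              # three different groups


def brokerage_counts(adj, group):
    """Count the roles each actor plays over all paths a -> b -> c without a tie a -> c."""
    counts = {b: dict.fromkeys(ROLES, 0) for b in adj}
    for a in adj:
        for b in adj[a]:
            for c in adj[b]:
                if len({a, b, c}) == 3 and c not in adj[a]:
                    counts[b][brokerage_role(group, a, b, c)] += 1
    return counts


# Hypothetical directed network and group membership (the "attribute file").
network = {"A": {"B"}, "B": {"C", "D"}, "C": set(), "D": set()}
groups = {"A": 1, "B": 1, "C": 1, "D": 2}

counts = brokerage_counts(network, groups)
print(counts["B"])
```

Actor B acts once as a coordinator (passing from A to C inside group 1) and once as a representative (passing from A to D in group 2). The group dictionary plays the part of the attribute file that identifies which actor belongs to which group.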

To examine brokerage, you need to create an attribute file that identifies which actor is part of which group.
Network>Ego Networks>GF Brokerage roles
The option "unweighted" needs a little explanation. Suppose that actor B was brokering a relation between actors A and C, and was acting as a "liaison." In the unweighted approach, this would count as one such relation for actor B. But suppose that there was some other actor D who was also acting as a liaison between A and C. In the "weighted" approach, both B and D would get 1/2 of the credit for this role; in the unweighted approach, both B and D would get full credit. Generally, if we are interested in ego's relations, the unweighted approach would be used. If we were more interested in group relations, a weighted approach might be a better choice.
Output: Unnormalized brokerage scores for the Knoke information network
The actors have been grouped together into "partitions" for presentation; actors 1, 3, and 5, for example, form the first type of organization (1: government; 2: private; 3: organisational specialist). Two actors (5 and 2) are the main sources of inter-connection among the three organizational populations. Organizations in the third population (6, 8, 9, 10), the welfare specialists, have low overall rates of brokerage. Organizations in the first population (1, 3, 5), the government organizations, seem to be more heavily involved in liaison than in other roles. Organizations in the second population (2, 4, 7), the non-governmental generalists, play more diverse roles.
Group-to-group brokerage map

We see that actor 1 (who is in group 1) plays no role in connections from group 1 to itself or to the other groups (i.e. the zero entries in the first row of the matrix). Actor 1 does, however, act as a "liaison" in making a connection from group 2 to group 3. Actor 1 also acts as a "consultant" in connecting a member of group 3 to another member of group 3.
Expected Values
However, in any population, partitioning will produce brokerage, even if the partitions are not meaningful, or even completely random. We can check the number of relations of each type that would be expected by pure random processes. We ask: what if actors were assigned to groups as we specify, and each actor had the same number of ties to other actors that we actually observe, but the ties were distributed at random across the available actors?
Relative Brokerage
If we examine the actual brokerage relative to this random expectation, we can get a better sense of which parts of which actors' roles are "significant", that is, which occur much more frequently than we would expect in a world characterized by groups but random relations among them. Larger values tend to be significant; that is, the observed raw scores are higher than the expected ones.

REFERENCES:

Conceptual
Degenne, A. and M. Forse (1999) Introducing Social Networks. London: SAGE Publications Ltd.
Hanneman, R. A. and M. Riddle (2005) Introduction to Social Network Methods. Riverside, CA: University of California, Riverside (published in digital form at < ).
Scott, J. (1991) Social Network Analysis: A Handbook. London: SAGE Publications.
Wasserman, S. and K. Faust (1994) Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.

Political Science Applications
Diani, M. and D. McAdam (2003) Social Movements and Networks: Relational Approaches to Collective Action. Oxford: Oxford University Press.
Knoke, D. (1990) Political Networks: The Structural Perspective. Cambridge: Cambridge University Press.
La Due Lake, R. and R. Huckfeldt (1998) Social Capital, Social Networks and Political Participation. Political Psychology 19(3):
McClurg, S. D. (2003) Social Networks and Political Participation: The Role of Social Interaction in Explaining Political Participation. Political Research Quarterly 56:

Organisational theory applications
Borgatti, S. P. and P. C. Foster (2003) The Network Paradigm in Organisational Research: A Review and Typology. Journal of Management 29:
Kahler, M. (2009) Collective Action and Clandestine Networks: The Case of Al Qaeda. Pp in Networked Politics: Agency, Power and Governance, London: Cornell University Press. The paper can be found online:
Nohria, N. and R. G. Eccles (eds.) (1992) Networks and Organizations: Structure, Form, and Action. Harvard: Harvard Business School Press.
Porter, K. A. and W. W. Powell (2006) Networks and Organisations. Pp in Clegg, S. R., Hardy, C., Lawrence, T. B. and W. R. Nord (eds.) The SAGE Handbook of Organisation Studies, London: SAGE Publications Ltd.

International Relations applications
Maoz, Z., L. G. Terris, R. D. Kuperman and I. Talmud (2005) International Relations: A Network Approach, in Alex Mintz and Bruce Russett (eds.) New Directions for International Relations. Lanham, MD: Lexington (35-64). The paper can be found online:
Talmud, I. and S. Mishal (2000) The Network State: Triangular Relations in Middle Eastern Politics. International Journal of Contemporary Sociology 37(2): The paper can be found online:

Applications to text analysis

Introduction to Social Network Methods

Introduction to Social Network Methods Introduction to Social Network Methods Table of Contents This page is the starting point for an on-line textbook supporting Sociology 157, an undergraduate introductory course on social network analysis.

More information

UCINET Visualization and Quantitative Analysis Tutorial

UCINET Visualization and Quantitative Analysis Tutorial UCINET Visualization and Quantitative Analysis Tutorial Session 1 Network Visualization Session 2 Quantitative Techniques Page 2 An Overview of UCINET (6.437) Page 3 Transferring Data from Excel (From

More information

Equivalence Concepts for Social Networks

Equivalence Concepts for Social Networks Equivalence Concepts for Social Networks Tom A.B. Snijders University of Oxford March 26, 2009 c Tom A.B. Snijders (University of Oxford) Equivalences in networks March 26, 2009 1 / 40 Outline Structural

More information

Examining graduate committee faculty compositions- A social network analysis example. Kathryn Shirley and Kelly D. Bradley. University of Kentucky

Examining graduate committee faculty compositions- A social network analysis example. Kathryn Shirley and Kelly D. Bradley. University of Kentucky Examining graduate committee faculty compositions- A social network analysis example Kathryn Shirley and Kelly D. Bradley University of Kentucky Graduate committee social network analysis 1 Abstract Social

More information

Statistical Analysis of Complete Social Networks

Statistical Analysis of Complete Social Networks Statistical Analysis of Complete Social Networks Introduction to networks Christian Steglich c.e.g.steglich@rug.nl median geodesic distance between groups 1.8 1.2 0.6 transitivity 0.0 0.0 0.5 1.0 1.5 2.0

More information

HISTORICAL DEVELOPMENTS AND THEORETICAL APPROACHES IN SOCIOLOGY Vol. I - Social Network Analysis - Wouter de Nooy

HISTORICAL DEVELOPMENTS AND THEORETICAL APPROACHES IN SOCIOLOGY Vol. I - Social Network Analysis - Wouter de Nooy SOCIAL NETWORK ANALYSIS University of Amsterdam, Netherlands Keywords: Social networks, structuralism, cohesion, brokerage, stratification, network analysis, methods, graph theory, statistical models Contents

More information

Introduction to social network analysis

Introduction to social network analysis Introduction to social network analysis Paola Tubaro University of Greenwich, London 26 March 2012 Introduction to social network analysis Introduction Introducing SNA Rise of online social networking

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Borgatti, Steven, Everett, Martin, Johnson, Jeffrey (2013) Analyzing Social Networks Sage

Borgatti, Steven, Everett, Martin, Johnson, Jeffrey (2013) Analyzing Social Networks Sage Social Network Analysis in Cultural Anthropology Instructor: Dr. Jeffrey C. Johnson and Dr. Christopher McCarty Email: johnsonje@ecu.edu and ufchris@ufl.edu Description and Objectives Social network analysis

More information

Social Media Mining. Network Measures

Social Media Mining. Network Measures Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users

More information

How To Analyze The Social Interaction Between Students Of Ou

How To Analyze The Social Interaction Between Students Of Ou Using Social Networking Analysis (SNA) to Analyze Collaboration between Students (Case Study: Students of Open University in Kupang) Bonie Empy Giri Faculty of Information Technology Satya Wacana Christian

More information

LAB 1 Intro to Ucinet & Netdraw

LAB 1 Intro to Ucinet & Netdraw LAB 1 Intro to Ucinet & Netdraw Virginie Kidwell Travis Grosser Doctoral Candidates in Management Links Center for Social Network Research in Business Gatton College of Business & Economics University

More information

UCINET Quick Start Guide

UCINET Quick Start Guide UCINET Quick Start Guide This guide provides a quick introduction to UCINET. It assumes that the software has been installed with the data in the folder C:\Program Files\Analytic Technologies\Ucinet 6\DataFiles

More information

Week 3. Network Data; Introduction to Graph Theory and Sociometric Notation

Week 3. Network Data; Introduction to Graph Theory and Sociometric Notation Wasserman, Stanley, and Katherine Faust. 2009. Social Network Analysis: Methods and Applications, Structural Analysis in the Social Sciences. New York, NY: Cambridge University Press. Chapter III: Notation

More information

Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R. 5. Link Analysis Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

More information

Groups and Positions in Complete Networks

Groups and Positions in Complete Networks 86 Groups and Positions in Complete Networks OBJECTIVES The objective of this chapter is to show how a complete network can be analyzed further by using different algorithms to identify its groups and

More information

SGL: Stata graph library for network analysis

SGL: Stata graph library for network analysis SGL: Stata graph library for network analysis Hirotaka Miura Federal Reserve Bank of San Francisco Stata Conference Chicago 2011 The views presented here are my own and do not necessarily represent the

More information

A case study of social network analysis of the discussion area of a virtual learning platform

A case study of social network analysis of the discussion area of a virtual learning platform World Transactions on Engineering and Technology Education Vol.12, No.3, 2014 2014 WIETE A case study of social network analysis of the discussion area of a virtual learning platform Meimei Wu & Xinmin

More information

Network Analysis Basics and applications to online data

Network Analysis Basics and applications to online data Network Analysis Basics and applications to online data Katherine Ognyanova University of Southern California Prepared for the Annenberg Program for Online Communities, 2010. Relational data Node (actor,

More information

Using social network analysis in evaluating community-based programs: Some experiences and thoughts.

Using social network analysis in evaluating community-based programs: Some experiences and thoughts. Using social network analysis in evaluating community-based programs: Some experiences and thoughts. Dr Gretchen Ennis Lecturer, Social Work & Community Studies School of Health Seminar Overview What is

More information

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the

More information

Social Media Mining. Graph Essentials

Social Media Mining. Graph Essentials Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

How to do a Business Network Analysis

How to do a Business Network Analysis How to do a Business Network Analysis by Graham Durant-Law Copyright HolisTech 2006-2007 Information and Knowledge Management Society 1 Format for the Evening Presentation (7:00 pm to 7:40 pm) Essential

More information

Manifold Learning Examples PCA, LLE and ISOMAP

Manifold Learning Examples PCA, LLE and ISOMAP Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition

More information

January 26, 2009 The Faculty Center for Teaching and Learning

January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i

More information

IBM SPSS Modeler Social Network Analysis 15 User Guide

IBM SPSS Modeler Social Network Analysis 15 User Guide IBM SPSS Modeler Social Network Analysis 15 User Guide Note: Before using this information and the product it supports, read the general information under Notices on p. 25. This edition applies to IBM

More information

An Interorganizational Social Network Analysis of the Michigan Diabetes Outreach Networks Measuring Relationships in Community Networks

An Interorganizational Social Network Analysis of the Michigan Diabetes Outreach Networks Measuring Relationships in Community Networks An Interorganizational Social Network Analysis of the Michigan Diabetes Outreach Networks Measuring Relationships in Community Networks Michigan Department of Community Health Authors: Lori Corteville,

More information

Inside Social Network Analysis

Inside Social Network Analysis Kate Ehrlich 1 and Inga Carboni 2 Introduction A management consulting firm hopes to win a lucrative contract with a large international financial institution. After weeks of intense preparation, the team

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

ANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL

ANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL Kardi Teknomo ANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL Revoledu.com Table of Contents Analytic Hierarchy Process (AHP) Tutorial... 1 Multi Criteria Decision Making... 1 Cross Tabulation... 2 Evaluation

More information

SOCIAL NETWORK ANALYSIS
