Analysis and visualization of 2-mode networks



Similar documents
Course on Social Network Analysis Graphs and Networks

HISTORICAL DEVELOPMENTS AND THEORETICAL APPROACHES IN SOCIOLOGY Vol. I - Social Network Analysis - Wouter de Nooy

Tools and Techniques for Social Network Analysis

DATA ANALYSIS II. Matrix Algorithms

Pajek. Program for Analysis and Visualization of Large Networks. Reference Manual List of commands with short explanation version 2.

Part 2: Community Detection

Equivalence Concepts for Social Networks

5.1 Bipartite Matching

Practical Graph Mining with R. 5. Link Analysis

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

Approximation Algorithms

Social Media Mining. Graph Essentials

The PageRank Citation Ranking: Bring Order to the Web

How to Analyze Company Using Social Network?

Complex Networks Analysis: Clustering Methods

Warshall s Algorithm: Transitive Closure

Social Media Mining. Network Measures

Mining Social-Network Graphs

SCAN: A Structural Clustering Algorithm for Networks

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Design of LDPC codes

Visualization of Social Networks in Stata by Multi-dimensional Scaling

Network Metrics, Planar Graphs, and Software Tools. Based on materials by Lala Adamic, UMichigan

A comparative study of social network analysis tools

SGL: Stata graph library for network analysis

Graph/Network Visualization

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

in R Binbin Lu, Martin Charlton National Centre for Geocomputation National University of Ireland Maynooth The R User Conference 2011

How To Understand The Network Of A Network

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Statistical and computational challenges in networks and cybersecurity

Data Mining: Algorithms and Applications Matrix Math Review

Scheduling Shop Scheduling. Tim Nieberg

NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES

Practical statistical network analysis (with R and igraph)

Evaluating Software Products - A Case Study

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics*

Rank one SVD: un algorithm pour la visualisation d une matrice non négative

3. The Junction Tree Algorithms

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff

NETZCOPE - a tool to analyze and display complex R&D collaboration networks

CIS 700: algorithms for Big Data

A Review And Evaluations Of Shortest Path Algorithms

Simplified External memory Algorithms for Planar DAGs. July 2004

Graph Mining and Social Network Analysis

Complexity Theory. IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar

Finding and counting given length cycles

Trusses: Cohesive Subgraphs for Social Network Analysis

Network File Storage with Graceful Performance Degradation

Lecture 7: NP-Complete Problems

Ph.D. Thesis. Judit Nagy-György. Supervisor: Péter Hajnal Associate Professor

Groups and Positions in Complete Networks

A Tutorial on Spectral Clustering

SECTIONS NOTES ON GRAPH THEORY NOTATION AND ITS USE IN THE STUDY OF SPARSE SYMMETRIC MATRICES

Algorithm Design and Analysis

An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups

Graph theoretic approach to analyze amino acid network

AN ALGORITHM FOR DETERMINING WHETHER A GIVEN BINARY MATROID IS GRAPHIC

Examining graduate committee faculty compositions- A social network analysis example. Kathryn Shirley and Kelly D. Bradley. University of Kentucky

Analysis of Algorithms I: Optimal Binary Search Trees

Solutions to Homework 6

International Journal of Software and Web Sciences (IJSWS)

Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like?

Zachary Monaco Georgia College Olympic Coloring: Go For The Gold

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs

! E6893 Big Data Analytics Lecture 10:! Linked Big Data Graph Computing (II)

Mean Ramsey-Turán numbers

On the k-path cover problem for cacti

1 o Semestre 2007/2008

Structural and functional analytics for community detection in large-scale complex networks

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

(67902) Topics in Theory and Complexity Nov 2, Lecture 7

An overview of Software Applications for Social Network Analysis

CSU, Fresno - Institutional Research, Assessment and Planning - Dmitri Rogulkin

Introduction to Scheduling Theory

Hadoop SNS. renren.com. Saturday, December 3, 11

Graphical degree sequences and realizations

1 Introduction. Dr. T. Srinivas Department of Mathematics Kakatiya University Warangal , AP, INDIA

High degree graphs contain large-star factors

Community Mining from Multi-relational Networks

Social Networks and Social Media

Minimizing Probing Cost and Achieving Identifiability in Probe Based Network Link Monitoring

What is SNA? A sociogram showing ties

Discrete Applied Mathematics. The firefighter problem with more than one firefighter on trees

A. Mrvar: Network Analysis using Pajek 1. Cluster

MapReduce Approach to Collective Classification for Networks

Weighted Sum Coloring in Batch Scheduling of Conflicting Jobs

Reductions & NP-completeness as part of Foundations of Computer Science undergraduate course

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

Clique coloring B 1 -EPG graphs

Transcription:

1 Analysis and visualization of 2-mode networks Matjaž Zaveršnik, Vladimir Batagelj and Andrej Mrvar University of Ljubljana Abstract: 2-mode data consist of a data matrix A over two sets U (rows) and V (columns). One or both sets can be (very) large. Such data can be viewed also as a (directed) bipartite network. We can analyze 2-mode network directly or transform it into an 1-mode network over U or V. In the paper we present different approaches to the analysis and visualization of 2-mode networks and illustrate them on real life examples. These approaches are supported in the program Pajek. Keywords: 2-mode networks, hubs and authorities, normalizations. 1 2-mode networks A 2-mode network is a structure (U, V, R, w), where U and V are disjoint sets of vertices (nodes, units), R U V is a relation and w : R IR is a weight. If no weight is defined we can assume a constant weight w(u, v) = 1 for all (u, v) R. A 2-mode network can be viewed also as an ordinary (1-mode) network on the vertex set U + V, divided into two sets U and V, where the arcs can only go from U to V (a bipartite directed graph). Some examples of 2-mode networks are: (authors, papers, cites the paper), (authors, papers, is cited in the paper), (authors, papers, is the (co)author of the paper), (people, events, was present at), (people, institutions, is member of), (customers, products/services, consumption), (articles, shopping lists, is on the list), (delegates, proposals, voting results). Most of the techniques to analyze ordinary (1-mode) networks cannot be directly applied to 2-mode networks without modifications or change of the meaning (see?). We have two possibilities how to analyze 2-mode network: using adapted or special techniques, which are usually (but not always) very complicated, changing 2-mode network into two 1-mode networks (U, RR T ) and (V, R T R), which can be analyzed separately with standard techniques. 2 Hubs and authorities As an example of adapted technique we present hubs and authorities. In large networks we often want to determine, which vertices are the most important. There are several ways, how to define importance. Usually we compute some kind of weights on vertices and say, that vertex with higher weight is more important.

2 In directed networks, we usually associate two weights with each vertex v: the authority weight x v and the hub weight y v. If a vertex points to many vertices with large authority weight, then it should receive a large hub weight. If a vertex is pointed to by many vertices with large hub weight, then it should receive a large authority weight. A vertex is a good hub, if it points to many good authorities, and it is a good authority, if it is pointed to by many good hubs. The algorithm for determining authority and hub weights consist of a loop, in which we iteratively compute hub/authority weights as normalized sums of authority/hub weights of the neighbors x = normalize(random starting vector) repeat y = normalize(ax) x = normalize(a T y) until the largest change in components of x and y is < ε where A is the network matrix: A uv = { w(u, v) (u, v) R 0 otherwise Let (U, V, R, w) be a 2-mode network. Because each arc can only go from U to V, vertices in U cannot be good authorities and vertices in V cannot be good hubs. So we associate only one weight (importance) with each vertex. Let x u be the importance of vertex u U, y v the importance of vertex v V and A the adjacency matrix, where rows and columns correspond to vertices from U and V respectively (a uv = 1, if urv and 0, otherwise). The algorithm for determining the importance for each vertex remains unchanged, the only difference is in the interpretation of matrix A and vectors x and y. See section 5 for the results of described algorithm on the network of bibliographic data. 3 Transforming 2-mode network into 1-mode networks Another way to analyze 2-mode network is to transform it into two ordinary (1-mode) networks (U, RR T, w 1 ) and (V, R T R, w 2 ), which can be analyzed separately using standard techniques. The problem is that, if the maximal degree on the set U (or V ) is large, the obtained ordinary network contains large cliques; but if the vertex set is small, this is not bothering (see Figure??). From 2-mode networks over large sets U and/or V we can get smaller ordinary networks by first partitioning U and/or V and shrinking the clusters. There are also different ways to define matrix multiplication we can select different semirings. Instead of usual operations (plus, times) we can also use (or, and) for w = 1, (max, min), (min, +)... In this way we get different ordinary networks.

3 4 Example: Davis Southern Club Women Davis Southern Club Women is a classical example of (small) 2-mode network. The network has 32 vertices (18 women and 14 events) and 93 arcs represented by a binary 18 14 matrix (see Figure??). Arcs represent observed attendance at 14 social events by 18 Southern women. We have an arc from woman u to event v, iff u attended the social event v. These data were collected by Davis et al in the 1930s (see? and?). Because we have a small network we can visualize it as a bipartite graph (see Figure??). We placed events to the left and women to the right side of the picture. We also tried to identify groups of women who attended similar events and groups of events attended by the same women. Figure?? presents the same network. To obtain this picture we computed both ordinary networks (network of women and network of events) and draw each of them using first two eigenvectors. The obtained coordinates were used to draw the original 2-mode network. Finally we manually slightly displaced some vertices in order to obtain a more readable picture. 5 Example: References In this example we have 2-mode network on 674 vertices (314 authors and 360 papers) with 613 arcs. It was produced from the bibliography of the book written by?. The result is an author-by-paper 2-mode network, where the author u is linked to the paper v iff u is the (co)author of the paper v. A network of this size can still be vizualized in the usual way, but because of larger number of vertices any additional data (labels) should be omitted (see Figure??) to make the picture readable. There exist special graphical formats and viewers, which enable us to zoom-in and zoom-out the picture. One of such formats is SVG (Scalable Vector Graphics). The latest graphical programs already contain viewers or even editors for SVG (the viewer is included also in Internet Explorer 6.0). In Figure?? a zoom-in into the network with labels is presented. It is easy to see that a pair of vertices u, v U (or V ) belong to the same (weakly) connected component of (U, RR T, w 1 ) (or (V, R T R, w 2 )) iff they belong to the same (weakly) connected component of (U, V, R, w). The ordinary network on authors is also called a collaboration network, since there is an edge between two authors iff they wrote a joint paper. In Figure?? the largest connected component of the collaboration network (86 vertices) is presented. We applied hubs and authorities algorithm to the biggest weak component of the original 2-mode network and got the following results. The most important authors, ordered descending by their importancy weights are Klavžar (0.6621643), Imrich (0.4118873), Mulder (0.1989839), Mohar (0.1730697) and Gutman (0.1446318), while the most important papers are imkl-99b (0.4073044), klgu-95 (0.3135055), klmu-98 (0.3018555), klmu-99 (0.2755221) and klgu-97 (0.2581323). For the the largest connected component of the collaboration network the results of hubs and authorities algorithm are a little bit different. The most important authors are

4 Klavžar (0.5883468), Imrich (0.4223956), Mohar (0.3885042), Gutman (0.3416242) and Pisanski (0.2379134). The author s degree in the original 2-mode network is the number of his papers, cited in the book, while the degree in the collaboration network is the number of his coauthors. Authors with many papers are Imrich (26), Klavžar (22), Bandelt (12), Mulder (11), Mohar (10), Lovasz (10) and Zh1u (10), while authors with many coauthors are Hell (12), Klavžar (11), Imrich(10), Bandelt (10), Lovasz (8) and Pisanski(7). Another measure used to sort out the most important vertices is betweenness centrality. Vertex is central, if it lies on several shortest paths among other pairs of vertices. Authors with high betweenness centrality are often some kind of middlemans between other authors: Imrich (0.4218487), Hell (0.4007937), Klavžar (0.3668067), Pultr (0.3528478), Bandelt (0.3003735) and Lovasz (0.2431373). 6 Example: Slovenian journals and magazines Over 100.000 people in Slovenia have been asked which Slovenian magazines and journals they read (survey conducted in 1999 and 2000, source CATI Center Ljubljana). They listed 124 different magazines and journals. The collected raw data can be represented as a 2-mode network: Delo Dnevnik Sl.novice... Reader1 x x... Reader2 x... Reader3 x.................. From this 2-mode reader/journal network we generated the corresponding ordinary network on journals using the (+, ) semiring: the value of an undirected edge linking two journals counts the number of readers of both journals. loop on selected journal counts the number of all readers that read this journal. The obtained matrix (A): Delo Dnevnik Sl.novice... Delo 20714 3219 4214... Dnevnik 15992 3642... Sl.novice 31997.................. The ordinary network on readers would be huge (more than 100.000 vertices) containing large cliques (readers of particular journal). Because of the huge differences in number of readers of different journals it is very hard to make any conclusions about journals relationships according to the raw data. First

5 we have to normalize the network all the normalized values are on the interval [0, 1]. There exist several types of normalizations: Geo ij = a ij aii a jj Input ij = a ij a jj Row ij = a ij j a ij Output ij = a ij Col ij = a ii a ij i a ij Min ij = a ij min(a ii, a jj ) Max ij = a ij max(a ii, a jj ) MinDir ij = { aij a ii a jj 0 otherwise a ii MaxDir ij = { aij a ii a jj 0 otherwise a jj In Slovenia we have some publishing houses (Delo, Dnevnik, Večer,... ), which publish several magazines and journals. For example, Delo publishes Delo, Nedelo, Slovenske novice, Delničar, Vikend magazin, Ona, Moj mikro, Jana, Lady,..., Dnevnik publishes Dnevnik, Nedeljski dnevnik, Hopla, Pilot,..., while Večer publishes Večer, Naš dom, 7D, TV večer,... There are also several different types of journals and magazines: daily papers (Delo, Dnevnik, Večer, Slovenske novice, Ekipa), regional papers (Dolenjski list, Novi tednik, Primorske novice, Ptujski tednik, Vestnik Murska Sobota), business (Delničar, Finance, Gospodarski vestnik, Kapital, Manager, Obrtnik, Podjetnik), agricultural (Kmečki glas, Slovenske brazde, Kmetovalec), juvenile magazines (Ciciban, Pil, Firbec, Smrklja, Cool), general interests (7D, Nedeljski dnevnik, Nedelo, Radar), computers (Moj Mikro, Monitor, Win.ini), women (Jana, Lady, Naša žena, Anja, Viva, Otrok in družina, Moj malček, Mama), religion (Družina, Ognjišče), actuality (Mag, Mladina), home (Naš dom, Jana Ambient, Moj mali svet), relaxation (Kih, Razvedrilo, Salomonov ugankar, Antena, Hopla, Lady, Stop), mysticism (Misteriji, Aura), supplements (Delo in dom, Pilot, Vikend magazin, Ona). 6.1 Geo normalization The Geo normalization of an element of the network matrix is obtained by dividing it with the geometric mean of corresponding diagonal elements. In our case this normalization is a measure of association (correlation) between journals (see Figure??). 6.2 MinDir normalization The MinDir normalization makes two things: converts an undirected network into a directed one - an arc points from the journal with less readers to the journal with higher number of readers;

6 the value on an arc corresponds to the percentage of readers of the first journal, who read also the second one. In our case this normalization measures the dependence between journals (see Figure??). On both pictures the size of vertices corresponds to the number of readers. The colors determine weakly connected components. The thickness of the lines represents the normalized value of the links. Only the lines with values above some threshold value are displayed.

7

8 Figure 2: Davis data matrix

9 OLIVIA E13 E12 E14 E10 E11 E9 E7 E8 E6 FLORA SYLVIA HELEN VERNE NORA MYRNA PEARL KATHERINE DOROTHY RUTH EVELYN THERESA E3 E2 E5 E1 E4 ELEANOR BRENDA FRANCES LAURA CHARLOTTE Figure 3: Davis Bipartite network

10 E11 OLIVIA FLORA E9 VERNE SYLVIA NORA HELEN THERESA RUTH EVELYN CHARLOTTE E2 E1 E3 E4 E5 E6 E8 E7 FRANCES ELEANOR LAURA BRENDA E14 E12 E13 E10 PEARL MYRNA DOROTHY KATHERINE Figure 4: Davis First two eigenvectors on U and V, combined, manual editing

Figure 5: References 2-mode network: papers (darker) and their authors (lighter) 11

12 Figure 6: References Zoom in, with labels

13 Nedela, R. Cyvin, S.J. Zmazek, B. Milutinovic, U. Slutzki, G. Dakic, T. Shawe-Taylor, J. Gutman, I. Kumar, R. vande V Pisanski, T. Klavzar, S. Jha, P.K. P Batagelj, V. White, A.T. Mohar, B. Agnihotri, N. Skrekovski, R. Chepoi, V. Formann, M. Hagauer, J. Zerovnik, J. Seifter, N. Skoviera, M. Izbicki, H. Mulder, H.M. Haddad, R.W. Idury, R.M. Aurenhammer, F. Bandelt Feigenbaum, J. Dorfler, W. Wilkeit, E. Schaffer, A.A. Wagner, F. Imrich, W. Haggkvist, R. F Hershberger, J., Burr, S.A. Greenwell, D. Gyori, E. Watkins, M.E. Shearer, J.B. Plummer, M.D. Nowitz, L.A. Miller, D.J. Erdos, P. Lovasz, L. Pultr, A. Neumann-Lara, V. G Groetschel, M. Poljak, S. Hell, P. West, D.B. Nesetril, J. Rodl, V. Hahn, G. Roberts, F.S. Saks, M.E. Graham, R.L. Eaton, N. Tardif, C. Neufeld, S. Rival, I. Winkler, P.M. Pollak, H.O. Larose, B. Laviolette, F. Rall, D.F. Nowako Chung, F. R.K. Roth, R.L. Hartnell, B.L. Brown, Figure 7: References Collaboration network

14 Finance 7d Tv vecer Gospodarski vestnik Ona Lady Vecer Mag Delo Nasa zena Mladina Vikend magazin Jana Moj dom (Dnevnik) Delo in dom Pilot Sobotna priloga(delo) Slovenske novice Dnevnik Moj mikro Nedeljski dnevnik Smrklja Misteriji Glamur Ognjisce Pil Moj malcek Monitor Cool Aura Win.ini Pepita Druzina Firbec Mama Figure 8: Geo normalization

15 Slov. brazde Cool Pilot Mag Jana Ambient Glamur Internet Kmecki glas Kmetovalec Zaslon Prim. novice Moj dom Podjetnik Monitor Sobotna pril. Smrklja Ned. dnev. Finance Svet in ljudje Nedelo Gosp. vestnik Kapital Val Moj mikro Manager Nasa zena Viva Ljubljana Jana Pepita Delo Glas gosp. Mladina Denar Gea Tajnica Marketing mag. Proteus Vip Oskar Modna Jana Delo in dom Dnevnik Kaj Panorama Zdravje Anja Lady Ptujski tednik Vecer Druzina 7d Tv vecer Ognjisce Kam Trac Ona Grafiti Vikend magazin Prestop Savinjcan Slov. nov. Antena Moja skrivnost Celjski ogl. Dol.i list Ljub. zgodbe Nas casopis Salomonov ogl. Grand prix Figure 9: MinDir normalization

16 References V. Batagelj and A. Mrvar. Pajek - program for large network analysis, 1996 2002. URL http://vlado.fmf.uni-lj.si/pub/networks/pajek/. S. P. Borgatti and M. Everett. Network analysis of 2-mode data. Social Networks, 19: 243 269, 1997. R. Breiger. The duality of persons and groups. Social Forces, 53:181 190, 1974. A. Davis et al. Deep South. University of Chicago Press, Chicago, 1941. W. Imrich and S. Klavžar. Product graphs: structure and recognition. John Wiley & Sons, New York, USA, 2000. J. Scott. Social Network Analysis: A Handbook. Sage Publications, London, second edition, 2000. S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, 1994.