Gene duplication and hierarchical modularity in intracellular interaction networks


BioSystems 74 (2004)

Jennifer Hallinan
ARC Centre for Bioinformatics, Institute for Molecular Biosciences, The University of Queensland, Brisbane 4072, Qld, Australia
Corresponding author. E-mail address: j.hallinan@imb.uq.edu.au (J. Hallinan).

Received 13 June 2003; received in revised form 29 January 2004; accepted 2 February 2004

Abstract

Networks of interactions evolve in many different domains. They tend to have topological characteristics in common, possibly due to common factors in the way the networks grow and develop. It has been recently suggested that one such common characteristic is the presence of a hierarchically modular organization. In this paper, we describe a new algorithm for the detection and quantification of hierarchical modularity, and demonstrate that the yeast protein-protein interaction network does have a hierarchically modular organization. We further show that such organization is evident in artificial networks produced by computational evolution using a gene duplication operator, but not in those developing via preferential attachment of new nodes to highly connected existing nodes.

© 2004 Elsevier Ireland Ltd. All rights reserved.

Keywords: Modularity; Hierarchical; Network; Intracellular; Yeast; Gene duplication

1. Introduction

Networks of interactions between agents arise in a wide variety of contexts, including social networks (Newman, 2001), the Internet (Albert et al., 1999; Huberman and Adamic, 1999) and the world wide web (Kleinberg and Lawrence, 2002; Flake et al., 2002), ecological networks (Williams and Martinez, 2000), and intracellular interaction networks (Bhalla and Iyengar, 1999; Uetz et al., 2000; Sole and Pastor-Santorros, 2002; Jeong et al., 2000). Analysis reveals that these diverse networks frequently have topological and dynamic features in common, and it has been suggested that these commonalities arise
from similar processes operating during the evolution and development of the networks.

Topological characteristics which are common to many naturally occurring networks include a scale-free pattern of connectivity. A scale-free network has no characteristic number of connections per node, as a randomly constructed network does; instead, the probability P(k) of finding a node with k connections follows a power law:

$$P(k) \sim k^{-\gamma}, \tag{1}$$

where the scaling exponent, γ, varies with the degree distribution of the network. When the degree, k, of the nodes of a scale-free network is plotted against the probability of occurrence of that degree, P(k), on a log-log scale, the data form a straight line, the slope of which is −γ.

Interaction networks may also exhibit small-world properties (Watts, 1999). A small-world network has a small but significant number of short-cut connections between otherwise widely separated nodes. This organization leads to characteristic topological features, including a small diameter, where diameter is defined as the longest of the shortest paths between every pair of nodes in the network. Small-world networks also have a large average cluster coefficient, C, compared with randomly connected networks with the same number of nodes and links. Cluster coefficient is a measure of the extent to which the neighbors of a node are linked to each other:

$$C = \frac{1}{n}\sum_{i=1}^{n}\frac{C_i}{N_i(N_i-1)/2}, \tag{2}$$

where n is the number of nodes in the network, C_i is the number of connections between neighbors of node i, and N_i is the number of neighbors of node i (Watts, 1999).

Many networks further appear to be organized into a number of modules. A module is generally defined as a subnetwork of a graph, the nodes of which have more connections to other nodes within the module than to external nodes (see, for example, Ancel and Fontana, 2002; Calabretta et al., 1998; Csete and Doyle, 2002; Rives and Galitski, 2003). The identification of modules within a network is an NP-complete problem (Flake et al., 2002).

In practice, a number of algorithms have been used for the identification of modules in networks. One approach involves the analysis of flux modes (the smallest subnetworks enabling the metabolic system to operate in steady state) within the network. Many polynomial time algorithms exist for finding the maximum flow that can be routed from a source node to a sink node while obeying all capacity constraints (Flake et al., 2002; Stelling et al., 2002). This approach requires a seed node with which to initialize the algorithm.

Another approach to module detection relies upon the identification of nodes or links which lie between modules. Snel et al.
(2002) define such linkers as orthologous groups with mutually exclusive associations, and split the network at the linkers to produce modules which appear to be biologically plausible. Similarly, Schuster et al. (2002) split the network at nodes which have more than a threshold number of links, on the contention that such highly connected hubs must be external to the modules. They used a threshold number of links of four, but point out that other values may be useful, depending upon the size of the subnets produced.

Module identification can be approached as a form of cluster analysis. Hierarchical clustering algorithms are widely used, even by researchers who are not interested in the cluster tree itself. The cluster tree is simply thresholded at an arbitrary depth in order to determine the final clusters (Ravasz and Barabasi, 2003). Girvan and Newman (2002) produced a cluster tree by identifying links with high betweenness (Freeman, 1977) and iteratively removing the links with the highest betweenness. An interesting approach was taken by Holme et al. (2002), who combined the node-removal approach with the betweenness measure to develop an algorithm in which nodes of high betweenness are iteratively removed to deconstruct the network.

It has recently been suggested that, in addition to a modular organization, biological networks tend to have a hierarchical structure, in which nodes are organized into small modules which are, in turn, organized into larger modules, and so on (Rives and Galitski, 2003). These authors propose a method for the identification of hierarchical modularity which does not require the identification of individual modules. They derive a scaling law for the connectivity of nodes in a hierarchically modular network:

$$C(k) \sim k^{-1},$$

where C(k) is the cluster coefficient, defined as in Eq. (2), of nodes with k connections. Networks whose C(k) distributions fit this curve are held to be hierarchically modular. Ravasz et al.
(2002) have identified hierarchical modularity in the metabolic networks of 43 different organisms. While the scaling law provides a simple means of identifying hierarchical modularity in a network, it offers no insights into the form of that modularity and, hence, does not contribute to a detailed analysis of network structure.

All of the algorithms discussed above rely upon user judgement, either to choose the threshold at which the network is fragmented or to validate the biological plausibility of the modules. Since the algorithms are topology-based, an objective, topology-based measure of the goodness of a module would be a valuable addition to the module detection algorithms.

In this paper we describe a new algorithm for the detection of modularity, in conjunction with an objective, topology-based measure of the coherence of the modules detected. These tools are combined to produce a coherence profile which can be used to visualize the extent of hierarchical modularity of a network, compare the modular structure of networks, and identify the threshold at which a network has maximum modular coherence.

The major evolutionary operators which have been implicated in the evolution of scale-free networks are the preferential attachment of new nodes to highly connected existing nodes (Albert and Barabasi, 2000) and the noisy duplication of existing nodes ("gene duplication"; Pastor-Satorras et al., 2002). The relative importance of these operators to the development of real networks is unclear, and probably differs from network to network. Although both of these operators have been demonstrated to produce a scale-free pattern of connectivity, their effect upon the modularity of the network topology has not previously been investigated. We use the coherence profile algorithm to examine networks evolved according to several different published algorithms and compare the modularity of the resulting networks with that of the best-characterized biological network, the protein-protein interaction network of the yeast Saccharomyces cerevisiae.

2. Network generation

2.1. The yeast protein-protein interaction network

Probably the best-characterized subcellular interaction network is the protein-protein interaction network of the baker's yeast, S. cerevisiae. High-throughput methods for the collection of yeast protein-protein interaction data have been developed over the last 5 years (Fields and Song, 1989), and large interaction databases exist on the Web. The data in these databases are known to be both noisy and incomplete (von Meering et al., 2002).
Both false negatives (interactions which exist in vivo, but have not been picked up by the screens) and false positives (interactions which occur under the particular conditions of a yeast two-hybrid screen, but not otherwise) will occur, to an unknown extent. Further, the network which can be constructed from the yeast two-hybrid data is a static snapshot of interactions, with none of the dynamic, temporal qualities of the network in the living cell. These problems mean that considerable care must be taken to choose the most reliable data with which to work, and not to over-interpret the results of work done using yeast two-hybrid interaction data.

In an effort to use only the most reliable data available, the dataset used for these experiments is the core set of S. cerevisiae protein-protein interactions identified by Deane et al. (2002) from the Database of Interacting Proteins (DIP database). This dataset is a subset of the entire DIP database consisting of those interactions which the authors verified using two forms of computational assessment. It is, therefore, less likely to contain false positive relationships than is the DIP database as a whole, although false negatives (missing interactions) undoubtedly occur.

The core dataset comprises 3003 interactions between 1788 proteins (an average connectivity of 1.7). It does not form a single connected component, however; there are 139 components, of which the largest has 1471 proteins and 2770 interactions (average connectivity 1.9). This largest connected component was used for all investigations (Fig. 1).

2.2. Network models

Since we are interested in the evolution of biological interaction networks, the yeast protein-protein interaction network was used as the gold standard network for these experiments. In order to compare the effects of different evolutionary operators, we used different operators to generate networks with the same general characteristics as the yeast network.
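The component statistics quoted for the core dataset can be reproduced from an edge list in a few lines. The sketch below is illustrative only: the function names and the toy edge list are hypothetical, and "average connectivity" is assumed to mean links per node, the convention that matches the figures 1.7 and 1.9 given in the text.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Connected components of an undirected edge list, largest first.

    Nodes that appear in no edge are not represented (an assumption of
    this sketch; isolated proteins would need to be added separately).
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)

def average_connectivity(n_links, n_nodes):
    """Links per node (assumed convention, see lead-in)."""
    return n_links / n_nodes

# Hypothetical toy network with two components.
toy = [("A", "B"), ("B", "C"), ("D", "E")]
giant = connected_components(toy)[0]
giant_links = [(a, b) for a, b in toy if a in giant and b in giant]
print(sorted(giant), average_connectivity(len(giant_links), len(giant)))

# Figures quoted in the text for the core yeast dataset.
print(round(average_connectivity(3003, 1788), 1))  # 1.7 (whole core set)
print(round(average_connectivity(2770, 1471), 1))  # 1.9 (largest component)
```

Counting links per node rather than mean degree (which would be twice as large) is a choice made here purely to match the reported values.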
The yeast network, and probably many other biological interaction networks, have three major characteristics:

1. A power law connectivity with a well-defined cutoff. The distribution of connectivity within the network follows a power law. A truly scale-free network obeys this distribution over a wide range of connectivities. Naturally occurring networks, however, tend to deviate from the power law at the extremes of the distribution, probably because of physical factors affecting nodes: people form relatively fewer new relationships as they age; proteins have physical limitations to the number of binding sites they can support (Amaral et al., 2000), and so on. The yeast network displays such a cutoff at the tail of the distribution.

2. Sparse average connectivity. Although the range of connectivities is wide, most naturally occurring networks have an average connectivity of around [...]. The average connectivity of the core yeast network is [...].

3. Small-world characteristics. Small-world networks are characterized by a small diameter relative to the number of nodes in the network, and a large cluster coefficient in comparison with that of a randomly connected network of the same size and average connectivity.

Fig. 1. The largest connected component of the curated yeast dataset. In this diagram the circles represent proteins and the lines represent interactions between proteins.

We generated networks with characteristics as close as possible to the size and average connectivity of the yeast protein-protein interaction network, using two published algorithms which have been demonstrated to produce scale-free networks: gene duplication (Pastor-Satorras et al., 2002) and preferential attachment (Ravasz et al., 2002). In addition, we generated randomly connected networks with approximately the same size and average connectivity as the yeast network. Five networks were generated using each algorithm. The size, average connectivity, average diameter, and average cluster coefficient of these networks are described in Table 1.

2.3. The preferential attachment model

Scale-free networks were generated using the algorithm described by Albert and Barabasi (2000).
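The core of a preferential attachment process is its growth rule: a new node links to an existing node i with probability proportional to k_i + 1, where k_i is the current connectivity of node i. The sketch below illustrates that rule only. It assumes one link per new node and two initial unconnected nodes, and it omits the separate link-addition and rewiring events governed by the parameters p and q, so it should not be read as the Pajek implementation used in this study. The function name and the `seed` parameter are hypothetical.

```python
import random

def preferential_attachment(n_nodes, m0=2, seed=0):
    """Grow a network to `n_nodes` nodes, one node and one link per step.

    Each new node attaches to existing node i with probability
    (k_i + 1) / sum_j (k_j + 1). Simplified sketch: no link addition
    or rewiring events.
    """
    rng = random.Random(seed)
    degree = [0] * m0  # m0 initial nodes, no links yet
    edges = []
    for new in range(m0, n_nodes):
        weights = [k + 1 for k in degree]
        target = rng.choices(range(len(degree)), weights=weights, k=1)[0]
        edges.append((new, target))
        degree[target] += 1
        degree.append(1)  # the new node starts with its single link
    return edges, degree

edges, degree = preferential_attachment(1000)
print(len(edges), max(degree))  # link count and the largest hub's degree
```

The `+ 1` in the weights lets nodes with no links be chosen at all; without it the two initial nodes could never receive their first connection.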
In this algorithm, a network grows by the addition of new nodes, each of which links to an existing node i with probability Π, which depends on the connectivity k_i of node i:

$$\Pi(k_i) = \frac{k_i + 1}{\sum_j (k_j + 1)}. \tag{3}$$

Albert and Barabasi's model produces scale-free networks only for a subset of possible values of the parameters p and q (see Albert and Barabasi, 2000 for a full analysis of the behavior of the algorithm). The network analysis program Pajek (Batagelj and Mrvar, 1998) incorporates an implementation of Albert and Barabasi's algorithm, with default parameter values which will produce a scale-free network. Starting from these defaults (m_0 (initial number of nodes) = 3, m (nodes added at each time step) = 2, p (probability of adding a link) = [...], q (probability of rewiring a link) = [...]), we iteratively modified the parameter values until we obtained scale-free networks which also had an average connectivity as close as possible to that of the yeast network. The final parameters used were m_0 = 2, m = 1, p = 0.333, q = [...]. Fig. 2 shows a typical example of a network grown using the preferential attachment algorithm.

Fig. 2. The scale-free network generated using the preferential attachment algorithm.

Table 1. Characteristics of the networks used in the project. Columns: nodes, edges, connectivity, diameter and cluster coefficient, each given as mean and S.D. Rows: yeast, random, preferential attachment, gene duplication. All values are averaged over five networks, except for the yeast network, which is the largest connected component of the core yeast protein-protein interaction network. The only entries recoverable here are those of the yeast row: mean connectivity 1.88, with S.D. not applicable (N/A) for every measure.

2.4. Gene duplication

Gene duplication has been an important factor in the evolution of many organisms (Lynch, 2002). We used a network generation algorithm based upon gene duplication, in which a gene is interpreted as a node in the network (Pastor-Satorras et al., 2002). At each time step a node is selected at random and duplicated, together with all of its links to other nodes. Links associated with the new node are then added with probability α, or deleted with probability δ.

The gene duplication model tends to generate networks with a large number of single, unconnected nodes. The average connectivity of the largest connected component of the network is, therefore, considerably higher than the value for the network as a whole. In addition, the highly stochastic nature of the algorithm means that the average connectivity of the network varies considerably from run to run, particularly when generating a relatively small network. Table 2 summarizes the results of 100 runs of the algorithm with the parameters described in Pastor-Satorras et al. (2002).

Table 2. Node and link statistics (mean and S.D. of nodes, links and average connectivity) for the whole network and for the largest connected component (CC) of the same network, averaged over 100 runs of the gene duplication algorithm with δ = 0.562, N = 2000 and k = 2.5.

The average connectivity of the largest connected component in a gene duplication network is dependent upon the link deletion parameter δ. It proved impossible to find a value of δ which would produce a giant component corresponding to that of the yeast network. A very high value of δ yields a highly fragmented network with no single large connected component, while lower values of δ produce larger components with connectivity somewhat higher than that of the yeast network. A value of δ (0.75) which would result in a giant component with average connectivity of around 2.5 (within the range of average connectivities reported for real networks) was selected empirically. In order to generate a largest connected component of a useful size, networks of 10,000 nodes were generated and the largest connected component extracted. Fig. 3 shows the largest connected component of a typical network generated using the gene duplication algorithm.

2.5. Random networks

Control networks were generated with the appropriate numbers of nodes and links, connected at random so that the resulting network has approximately the same average connectivity as the yeast network. These random networks do not have scale-free connectivity, and would not be expected to display any significant modularity. An example of a random network is shown in Fig. 4.

3. Quantifying hierarchical modularity

3.1. Iterative vector diffusion

The iterative vector diffusion algorithm operates in the context of a graph G consisting of a vertex set V(G) = {v_1 ... v_n} and an edge set E(G) = {e_1 ... e_m}, where each edge consists of two vertices. The algorithm is initialized by assigning to each vertex a binary vector of length n, initialized to

$$v_{i,j} = \begin{cases} 0, & i \neq j \\ 1, & i = j \end{cases}$$

where i is an index into the vector and j is the unique number assigned to a given node.
This generates an initial set of n orthogonal vectors. The algorithm proceeds iteratively. At each iteration an edge from the network is selected at random and the vectors associated with each of its nodes are moved towards each other by adding a small amount, δ, to each element of the vector. This vector diffusion process is iterated until a stopping criterion is met. We chose a maximum number of iterations as the stopping criterion. This number, n, is dependent upon both the number of connections in the network, c, and the size of δ, such that

$$n = \left(\frac{\alpha}{\delta}\right) c,$$

where α is the average amount by which a vector is changed in the course of the run. A value for α of 0.1 was selected empirically in trials on artificially generated networks.

Fig. 3. A network generated using the gene duplication algorithm. There are 943 nodes and 2500 links.

Fig. 4. A random network with 1432 nodes and 2770 edges.

At the end of the vector diffusion process the vectors, initially mutually orthogonal, are clustered in n-dimensional space. To reduce the dimensionality of the data set, the vectors are then subjected to hierarchical clustering using the hierarchical clustering algorithm implemented by Eisen et al. (1998). This algorithm uses the Pearson correlation coefficient as a distance metric. It calculates the distance matrix for all members of the input set of vectors and then uses an agglomerative hierarchical algorithm to create a hierarchical cluster tree, in which the two closest items in the set are joined by a node of the tree, and the two items replaced by a single item representing the new node. The process iterates until only one item remains.

3.2. Cluster thresholding

The output of the cluster algorithm is a binary tree, with a single root node giving rise to two offspring nodes, each of which gives rise to two child nodes of its own, and so on. The tree can, therefore, be thresholded at various levels (two parents, four parents, eight parents, etc.; see Fig. 5) and the modularity of the network at each level can be examined.

Fig. 5. Thresholding a cluster tree. (a) Tree thresholded at parent level 2 produces two clusters, (b) the same tree thresholded at parent level 3 has four clusters.

3.3. Modular coherence

The problem of identifying modules in a network is essentially an unsupervised cluster analysis task. Nodes are identified as belonging to a given module, or cluster, on the basis of their closeness to other nodes as assessed using an appropriate metric. There are many cluster analysis algorithms, several of which have been applied to the module detection task, as discussed above. Most clustering algorithms, however, will identify clusters in any dataset, whether or not these have any correspondence to real groupings in the dataset. In order to validate the output of a clustering algorithm, practitioners often examine measures such as inter- and intra-cluster variance. Such measures are not easily applied to nodes on a graph.

We propose a measure of modular coherence, which measures the relative proportions of inter- and intra-module links and assigns a value in the range −1 (no coherence) to +1 (a fully connected, stand-alone subgraph). The coherence, χ, of a previously identified module can be defined as

$$\chi = \frac{2k_i}{n(n-1)} - \frac{1}{n}\sum_{j=1}^{n}\frac{k_{ji}}{k_{jo}+k_{ji}}, \tag{4}$$

where k_i is the total number of edges between nodes in the module, n is the number of nodes in the module, k_ji is the number of edges between node j and other nodes within the module, and k_jo is the number of edges between node j and other nodes outside the module. The first term in this equation is simply the proportion of possible links between the nodes comprising the module which actually exist; a measure of the connectivity within the module. The second term is the average proportion of edges per node which are internal to the module. A highly connected node with few external edges will, therefore, have a lower value of χ than a highly connected node with many external edges. χ will have a value in the range (−1, +1).

The concept of modularity in a network leads naturally to the question of scale. At what scale should modularity be sought? It is important that any characteristic scale for modularity in the network arise from the data, rather than being imposed by the investigator, since the appropriate scale cannot be determined a priori. This consideration has led to the concept of hierarchical modularity: the idea that network modules can occur at a range of scales, with modules higher up the hierarchy divided into smaller modules, and so on. Holme et al. (2002) consider a fundamental question in biological network analysis to be what the hierarchical organization of subnetworks looks like.

Rather than making an a priori decision about the scale at which network modularity should be analyzed, we propose an approach which provides an overview of the degree of modularity present in a given network at every possible scale of modularity. The resulting graph facilitates visual inspection of the modularity of the network over all possible scales, and permits the selection of a specific characteristic scale of the network for further analysis, if required. We call this approach a coherence profile. At each level in the hierarchy the number of modules and the average modular coherence of the network were computed. Average coherence was then plotted against threshold level to produce the coherence profile summarizing the hierarchical modularity of the network.

4. Results

The coherence profile for the yeast network is shown in Fig. 6. It is immediately apparent that the yeast network has significant positive modular coherence over most of the range of thresholds. At low threshold values, corresponding to a partitioning of the network into a small number of relatively large modules, average coherence dips below 0, indicating that the modules have more external than internal connectivity. This is because a clustering algorithm is part of the module identification algorithm. Any clustering algorithm will identify clusters in whatever data it is given, whether or not they reflect real modules in the biological network. Only when the measured modular coherence is positive can confidence be placed in the biological reality of the modules. The spurious nature of the results at high threshold levels is also indicated by the sudden increase in standard deviation at the point where modular coherence drops below zero. These modules are illusory.
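The thresholding of a binary cluster tree, and the averaging of a per-module score at each level, can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the figures: the tree is encoded as nested pairs (a hypothetical encoding), all function names are hypothetical, and the score used in the demonstration is only the link-density term (the proportion of possible links inside a module that actually exist), standing in for the full coherence measure χ.

```python
def leaves(tree):
    """Flatten a nested-pair binary tree into a list of leaf labels."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def cut_tree(tree, level):
    """Clusters obtained by thresholding the tree at `level`.

    Level 1 is the root (one cluster); level 2 gives two clusters,
    level 3 up to four. A leaf reached early stays a singleton cluster.
    """
    frontier = [tree]
    for _ in range(level - 1):
        nxt = []
        for node in frontier:
            if isinstance(node, tuple):
                nxt.extend(node)   # split an internal node into its children
            else:
                nxt.append(node)   # leaves cannot be split further
        frontier = nxt
    return [set(leaves(node)) for node in frontier]

def link_density(module, edges):
    """Existing internal links divided by possible internal links."""
    n = len(module)
    if n < 2:
        return 0.0  # convention of this sketch for singleton modules
    internal = sum(1 for a, b in edges if a in module and b in module)
    return 2 * internal / (n * (n - 1))

def profile(tree, edges, score, max_level):
    """Rows of (level, number of modules, mean module score)."""
    rows = []
    for level in range(1, max_level + 1):
        modules = cut_tree(tree, level)
        mean = sum(score(m, edges) for m in modules) / len(modules)
        rows.append((level, len(modules), mean))
    return rows

# Hypothetical toy network: two triangles joined by a single link.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
tree = ((("a", "b"), "c"), (("d", "e"), "f"))
for row in profile(tree, edges, link_density, 3):
    print(row)
```

On this toy tree, level 2 yields the two triangles and level 3 yields four clusters, two of them singletons, which shows how unequal branch lengths produce very small clusters at the extremes of the tree. Any other module score with the same call signature can be passed in place of `link_density`.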
The yeast network has approximately equal coherence over most thresholds, indicating that the network has a strongly hierarchically modular organization. In contrast to the yeast network, the random network (Fig. 7) shows negative coherence over most of its range. Although the clustering algorithm is still identifying modules, as expected, they have no coherence, have more external than internal edges, and do not fit the definition of a module given earlier. At the higher threshold levels, coherence rises slightly above zero. Inspection of the clustered network reveals that the modules detected at the extreme of the graph are an artefact of the unequal lengths of the branches of the

9 J. Hallinan / BioSystems 74 (2004) Threshold Fig. 6. Coherence profile for the core S. cerevisiae protein protein interaction network. The data is the mean of 100 runs of the algorithm on the same network. Dashed lines represent ±1 standard deviation Threshold Fig. 7. Coherence profile for random networks with approximately the same number of nodes and edges as the yeast network. The data is the mean of 100 runs of the algorithm for each of five randomly generated networks. Dashed lines represent ±1 standard deviation. cluster tree. At the extremes of the tree, there tends to be one large cluster and a number of very small (one or two node) clusters. Within the large cluster most edges will lie between nodes in the same cluster. The number of tiny clusters, most of whose edges connect to nodes external to the cluster, is too small to drive the mean coherence below zero. The random network, therefore, shows no evidence of hierarchical modularity, or, indeed, of significant modularity at any level. The fact that the yeast network displays hierarchical modularity, while the random network does not, provides a benchmark against which to assess the biological plausibility of the network evolution algorithms discussed in the introduction. The coherence profiles for the preferential attachment and gene duplication networks are shown in Figs. 8 and 9, respectively. The preferential attachment algorithm has been shown to produce scale-free networks (Albert and Barabasi, 2000). Fig. 8 shows, however, that these networks exhibit no sign of modularity at any level of the hierarchy. In contrast, the networks generated by the gene duplication algorithm have a coherence profile very similar to that of the yeast protein protein interaction network, with significant modular coherence present at almost every level of the hierarchy. It appears that gene duplication is more likely to produce a

10 60 J. Hallinan / BioSystems 74 (2004) Threshold Fig. 8. Coherence profile for the preferential attachment algorithm. The data is the mean of 100 runs of the algorithm for each of five randomly generated networks. Dashed lines represent ±1 standard deviation Threshold Fig. 9. Coherence profile for the gene duplication algorithm. The data is the mean of 100 runs of the hierarchical modularity detection algorithm for each of five randomly generated networks. Dashed lines represent ±1 standard deviation. hierarchically modular network than is preferential attachment. Preferential attachment is a feasible mechanism for the evolution of some networks, such as social networks, in which an already popular individual is likely to be sought out by new members of the social group. In a biological context, however, preferential attachment appears less plausible. There is no particular reason why a newly-evolved protein should bind more readily to a protein which already binds to several other partners. Gene duplication, however, has been shown to be important in evolution. A newly duplicated gene already has a functional output, which is usually a protein. This protein, however, is free of the selection pressure which acts upon its parent, since the parent still exists and fills its original function. One copy of the gene is, thus, free to mutate and change its function. It is known that the yeast genome has undergone several episodes of complete duplication in the course of its evolutionary history (Wagner, 2001). Gene duplication would, therefore, appear to be a plausible mechanism by which biological networks may have evolved. Gene duplication has previously been shown to produce scale-free networks in silico (Pastor-Satorras et al., 2002), and we show here that it also produces hierarchically modular networks, very similar in profile to the yeast network. There are other aspects of the topology of the yeast protein protein interaction network which appear to

be consistent with an origin by gene duplication. In order to produce a network as sparsely connected as a typical intracellular interaction network, which tends to have an average connectivity in the range , a large proportion of the duplicated edges (0.75 in our study) must be deleted, while relatively few are added. The algorithm therefore tends to produce highly fragmented networks containing a large number of individual nodes unconnected to any other node. This pattern of connectivity is evident in yeast. The core yeast dataset contains only 1788 of the 6223 proteins encoded by the yeast genome. The number of false negatives in the dataset (genuine interactions which have not yet been detected) is currently unknown. However, several genome-wide scans for protein-protein interactions have been performed (Schwikowski et al., 2000; Legrain and Selig, 2000; Deane et al., 2002), and it is unlikely that the majority of interactions have been missed. The number of genuinely isolated proteins in the yeast network appears to be consistent with the gene duplication algorithm.

The major problem with the gene duplication algorithm used here is the difficulty of evolving a network with a single connected component of a size comparable with that of the largest connected component of the yeast network. Increasing the total number of nodes generated increases the number of isolated nodes much more rapidly than the size of the largest connected component. It can be seen from Table 1 that the gene duplication networks were, in general, smaller and more highly connected than the other networks. These results are consistent with the hypothesis that gene duplication, while important, is not the only factor in yeast evolution, a suggestion with which most biologists would heartily agree.

References

Albert, R., Barabasi, A.-L., Topology of evolving networks: local events and universality. Phys. Rev. Lett. 85.
Albert, R., Jeong, H., Barabasi, A.-L., Internet: diameter of the world-wide web. Nature 401.
Amaral, L.A.N., Scala, A., Barthelemy, M., Stanley, H.E., Classes of small-world networks. Proc. Natl. Acad. Sci. U.S.A. 97.
Ancel, L.W., Fontana, W., Evolutionary lock-in and the origin of modularity in RNA structure. In: Callebaut, W., Rasskin-Gutman, D. (Eds.), Modularity. Understanding the development and evolution of complex natural systems. Cambridge, MA, MIT Press.
Bhalla, U.S., Iyengar, R., Emergent properties of networks of biological signaling pathways. Science 283.
Calabretta, R., Nolfi, S., Parisi, D., Wagner, G.P., A case study of the evolution of modularity: Towards a bridge between evolutionary biology, artificial life, neuro- and cognitive science. In: Adami, C., Belew, R., Kitano, H., Taylor, C. (Eds.), Proceedings of the Sixth International Conference on Artificial Life. Cambridge, MA, MIT Press.
Csete, M.E., Doyle, J.C., Reverse engineering of biological complexity. Science 295.
Deane, C.M., Salwinski, L., Xenarios, I., Eisenberg, D., Protein interactions. Mol. Cell. Proteomics 1.
Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D., Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 95.
Fields, S., Song, O., A novel genetic system to detect protein-protein interactions. Nature 340.
Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.M., Self-organization and identification of web communities. IEEE Comput. 35.
Freeman, L.C., A set of measures of centrality based on betweenness. Sociometry 40 (1).
Girvan, M., Newman, M.E.J., Community structure in social and biological networks. Proc. Natl. Acad. Sci. U.S.A. 99.
Holme, P., Kim, B.J., Yoon, C.N., Han, S.K., Attack vulnerability of complex networks. Phys. Rev. E 65.
Huberman, B.A., Adamic, L.A., Internet: growth dynamics of the world-wide web. Nature 401, 131.
Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., Barabasi, A.-L., The large-scale organization of metabolic networks. Nature 407.
Kleinberg, J., Lawrence, S., The structure of the web. Science 294.
Legrain, P., Selig, L., Genome-wide protein interaction maps using two-hybrid systems. FEBS Lett. 480.
Lynch, M., Gene duplication and evolution. Science 297.
Newman, M.E., The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. U.S.A. 98.
Pastor-Satorras, R., Smith, E., Sole, R.V., Evolving protein interaction networks through gene duplication. Santa Fe Institute Working Paper.
Ravasz, E., Somera, A.L., Oltvai, Z.N., Barabasi, A.-L., Hierarchical organization of modularity in metabolic networks. Science 297.
Ravasz, E., Barabasi, A.-L., Hierarchical organization in complex networks. Phys. Rev. E 67.
Rives, A.W., Galitski, T., Modular organization of cellular networks. Proc. Natl. Acad. Sci. U.S.A. 100 (3).
Schuster, S., Pfeiffer, T., Moldenhauer, F., Koch, I., Dandekar, T., Exploring the pathway structure of metabolism: decomposition into subnetworks and application to Mycoplasma pneumoniae. Bioinformatics 18.
Schwikowski, B., Uetz, P., Fields, S., A network of interacting proteins in yeast. Nat. Biotechnol. 18.
Snel, B., Bork, P., Huynen, M.A., The identification of functional modules from the genomic association of genes. Proc. Natl. Acad. Sci. U.S.A. 99.
Sole, R., Pastor-Santorros, R., Complex networks in genomics and proteomics. Santa Fe Institute Working Paper.
Stelling, J., Klamt, S., Bettenbrock, K., Schuster, S., Gilles, E.D., Metabolic network structure determines key aspects of functionality and regulation. Nature 420.
Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., Qureshi-Emili, A., Li, Y., Godwin, B., Conover, D., Kalbfleisch, T., Vijayadamodar, G., Yang, M., Johnston, M., Fields, S., Rothberg, J.M., A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403.
von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., Bork, P., Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417.
Wagner, A., The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol. Biol. Evol. 18.
Watts, D.J., Small-Worlds: The Dynamics of Networks between Order and Randomness. Princeton University Press, Princeton, NJ.
Williams, R.J., Martinez, N.D., Simple rules yield complex food webs. Nature 409.
Batagelj, V., Mrvar, A., Pajek program for large network analysis. Connections 21.
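
The fragmentation described in the discussion (many isolated nodes and a comparatively small largest connected component) can be reproduced with a minimal Python sketch of duplication followed by divergence. This is not the code used in the paper. The function names, the two-node seed graph, and the edge addition probability `add_prob` are assumptions made for illustration. Only the edge deletion proportion of 0.75 comes from the text.

```python
import random
from collections import deque


def grow_by_duplication(n_nodes, delete_prob=0.75, add_prob=0.01, seed=None):
    """Grow an undirected graph by node duplication followed by divergence.

    delete_prob is the proportion of duplicated edges removed (0.75 in the
    text); add_prob and the seed graph are illustrative assumptions.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: two connected nodes
    while len(adj) < n_nodes:
        parent = rng.randrange(len(adj))  # nodes are labelled 0..len(adj)-1
        child = len(adj)
        # Duplication: the copy keeps each parent edge unless it is deleted.
        inherited = {v for v in adj[parent] if rng.random() >= delete_prob}
        # Divergence: occasionally gain an edge to another existing node.
        gained = {v for v in adj
                  if v not in inherited and rng.random() < add_prob}
        adj[child] = inherited | gained
        for v in adj[child]:
            adj[v].add(child)
    return adj


def component_sizes(adj):
    """Return connected component sizes, largest first (breadth-first search)."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        sizes.append(size)
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    graph = grow_by_duplication(2000, seed=1)
    sizes = component_sizes(graph)
    print("largest component:", sizes[0], "isolated nodes:", sizes.count(1))
```

Running the sketch for increasing `n_nodes` and comparing `sizes[0]` with `sizes.count(1)` is one way to explore how the isolated node count and the largest component scale under these assumed parameters.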

Bioinformatics: Network Analysis

Bioinformatics: Network Analysis Bioinformatics: Network Analysis Graph-theoretic Properties of Biological Networks COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Outline Architectural features Motifs, modules,

More information

Cluster detection algorithm in neural networks

Cluster detection algorithm in neural networks Cluster detection algorithm in neural networks David Meunier and Hélène Paugam-Moisy Institute for Cognitive Science, UMR CNRS 5015 67, boulevard Pinel F-69675 BRON - France E-mail: {dmeunier,hpaugam}@isc.cnrs.fr

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

Structure of a large social network

Structure of a large social network PHYSICAL REVIEW E 69, 036131 2004 Structure of a large social network Gábor Csányi 1, * and Balázs Szendrői 2, 1 TCM Group, Cavendish Laboratory, University of Cambridge, Madingley Road, Cambridge CB3

More information

Understanding the dynamics and function of cellular networks

Understanding the dynamics and function of cellular networks Understanding the dynamics and function of cellular networks Cells are complex systems functionally diverse elements diverse interactions that form networks signal transduction-, gene regulatory-, metabolic-

More information

Network Analysis. BCH 5101: Analysis of -Omics Data 1/34

Network Analysis. BCH 5101: Analysis of -Omics Data 1/34 Network Analysis BCH 5101: Analysis of -Omics Data 1/34 Network Analysis Graphs as a representation of networks Examples of genome-scale graphs Statistical properties of genome-scale graphs The search

More information

Complex Networks Analysis: Clustering Methods

Complex Networks Analysis: Clustering Methods Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich nefedov@isi.ee.ethz.ch 1 Outline Purpose to give an overview of modern graph-clustering methods and their applications

More information

Hierarchical organization in complex networks

Hierarchical organization in complex networks PHYSICAL REVIEW E 67, 026112 2003 Hierarchical organization in complex networks Erzsébet Ravasz and Albert-László Barabási Department of Physics, 225 Nieuwland Science Hall, University of Notre Dame, Notre

More information

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks Imre Varga Abstract In this paper I propose a novel method to model real online social networks where the growing

More information

Graph theoretic approach to analyze amino acid network

Graph theoretic approach to analyze amino acid network Int. J. Adv. Appl. Math. and Mech. 2(3) (2015) 31-37 (ISSN: 2347-2529) Journal homepage: www.ijaamm.com International Journal of Advances in Applied Mathematics and Mechanics Graph theoretic approach to

More information

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM)

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM) 1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM) 2. Gene regulation tools and methods Regulatory sequences and motif discovery TF binding sites, microrna target prediction

More information

A box-covering algorithm for fractal scaling in scale-free networks

A box-covering algorithm for fractal scaling in scale-free networks CHAOS 17, 026116 2007 A box-covering algorithm for fractal scaling in scale-free networks J. S. Kim CTP & FPRD, School of Physics and Astronomy, Seoul National University, NS50, Seoul 151-747, Korea K.-I.

More information

The Structure of Growing Social Networks

The Structure of Growing Social Networks The Structure of Growing Social Networks Emily M. Jin Michelle Girvan M. E. J. Newman SFI WORKING PAPER: 2001-06-032 SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily

More information

A discussion of Statistical Mechanics of Complex Networks P. Part I

A discussion of Statistical Mechanics of Complex Networks P. Part I A discussion of Statistical Mechanics of Complex Networks Part I Review of Modern Physics, Vol. 74, 2002 Small Word Networks Clustering Coefficient Scale-Free Networks Erdös-Rényi model cover only parts

More information

High Throughput Network Analysis

High Throughput Network Analysis High Throughput Network Analysis Sumeet Agarwal 1,2, Gabriel Villar 1,2,3, and Nick S Jones 2,4,5 1 Systems Biology Doctoral Training Centre, University of Oxford, Oxford OX1 3QD, United Kingdom 2 Department

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network , pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

More information

The Topology of Large-Scale Engineering Problem-Solving Networks

The Topology of Large-Scale Engineering Problem-Solving Networks The Topology of Large-Scale Engineering Problem-Solving Networks by Dan Braha 1, 2 and Yaneer Bar-Yam 2, 3 1 Faculty of Engineering Sciences Ben-Gurion University, P.O.Box 653 Beer-Sheva 84105, Israel

More information

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns

More information

Hierarchical organization of modularity in metabolic networks

Hierarchical organization of modularity in metabolic networks Hierarchical organization of modularity in metabolic networks E. Ravasz 1, A.L. Somera 2, D.A. Mongru 2, Z.N. Oltvai 2 & A.-L. Barabási 1 1 Department of Physics, University of Notre Dame, Notre Dame,

More information

Open Source Software Developer and Project Networks

Open Source Software Developer and Project Networks Open Source Software Developer and Project Networks Matthew Van Antwerp and Greg Madey University of Notre Dame {mvanantw,gmadey}@cse.nd.edu Abstract. This paper outlines complex network concepts and how

More information

Temporal Dynamics of Scale-Free Networks

Temporal Dynamics of Scale-Free Networks Temporal Dynamics of Scale-Free Networks Erez Shmueli, Yaniv Altshuler, and Alex Sandy Pentland MIT Media Lab {shmueli,yanival,sandy}@media.mit.edu Abstract. Many social, biological, and technological

More information

Graph Theory Approaches to Protein Interaction Data Analysis

Graph Theory Approaches to Protein Interaction Data Analysis Graph Theory Approaches to Protein Interaction Data Analysis Nataša Pržulj Technical Report 322/04 Department of Computer Science, University of Toronto Completed on September 8, 2003 Report issued on

More information

NOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS

NOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS NOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS Orly Alter (a) *, Gene H. Golub (b), Patrick O. Brown (c)

More information

Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics

Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics Zhao Wenbin 1, Zhao Zhengxu 2 1 School of Instrument Science and Engineering, Southeast University, Nanjing, Jiangsu

More information

Applying Social Network Analysis to the Information in CVS Repositories

Applying Social Network Analysis to the Information in CVS Repositories Applying Social Network Analysis to the Information in CVS Repositories Luis Lopez-Fernandez, Gregorio Robles, Jesus M. Gonzalez-Barahona GSyC, Universidad Rey Juan Carlos {llopez,grex,jgb}@gsyc.escet.urjc.es

More information

arxiv:cs.dm/0204001 v1 30 Mar 2002

arxiv:cs.dm/0204001 v1 30 Mar 2002 A Steady State Model for Graph Power Laws David Eppstein Joseph Wang arxiv:cs.dm/0000 v 0 Mar 00 Abstract Power law distribution seems to be an important characteristic of web graphs. Several existing

More information

Time-Dependent Complex Networks:

Time-Dependent Complex Networks: Time-Dependent Complex Networks: Dynamic Centrality, Dynamic Motifs, and Cycles of Social Interaction* Dan Braha 1, 2 and Yaneer Bar-Yam 2 1 University of Massachusetts Dartmouth, MA 02747, USA http://necsi.edu/affiliates/braha/dan_braha-description.htm

More information

Some questions... Graphs

Some questions... Graphs Uni Innsbruck Informatik - 1 Uni Innsbruck Informatik - 2 Some questions... Peer-to to-peer Systems Analysis of unstructured P2P systems How scalable is Gnutella? How robust is Gnutella? Why does FreeNet

More information

An Alternative Web Search Strategy? Abstract

An Alternative Web Search Strategy? Abstract An Alternative Web Search Strategy? V.-H. Winterer, Rechenzentrum Universität Freiburg (Dated: November 2007) Abstract We propose an alternative Web search strategy taking advantage of the knowledge on

More information

Generating Hierarchically Modular Networks via Link Switching

Generating Hierarchically Modular Networks via Link Switching Generating Hierarchically Modular Networks via Link Switching Susan Khor ABSTRACT This paper introduces a method to generate hierarchically modular networks with prescribed node degree list by link switching.

More information

Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures

Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures Dmitri Krioukov, kc claffy, and Kevin Fall CAIDA/UCSD, and Intel Research, Berkeley Problem High-level Routing is

More information

Strategies for Optimizing Public Train Transport Networks in China: Under a Viewpoint of Complex Networks

Strategies for Optimizing Public Train Transport Networks in China: Under a Viewpoint of Complex Networks Strategies for Optimizing Public Train Transport Networks in China: Under a Viewpoint of Complex Networks Zhichong ZHAO, Jie LIU Research Centre of Nonlinear Science, College of Science, Wuhan University

More information

PUBLIC TRANSPORT SYSTEMS IN POLAND: FROM BIAŁYSTOK TO ZIELONA GÓRA BY BUS AND TRAM USING UNIVERSAL STATISTICS OF COMPLEX NETWORKS

PUBLIC TRANSPORT SYSTEMS IN POLAND: FROM BIAŁYSTOK TO ZIELONA GÓRA BY BUS AND TRAM USING UNIVERSAL STATISTICS OF COMPLEX NETWORKS Vol. 36 (2005) ACTA PHYSICA POLONICA B No 5 PUBLIC TRANSPORT SYSTEMS IN POLAND: FROM BIAŁYSTOK TO ZIELONA GÓRA BY BUS AND TRAM USING UNIVERSAL STATISTICS OF COMPLEX NETWORKS Julian Sienkiewicz and Janusz

More information

General Network Analysis: Graph-theoretic. COMP572 Fall 2009

General Network Analysis: Graph-theoretic. COMP572 Fall 2009 General Network Analysis: Graph-theoretic Techniques COMP572 Fall 2009 Networks (aka Graphs) A network is a set of vertices, or nodes, and edges that connect pairs of vertices Example: a network with 5

More information

A General Framework for Weighted Gene Co-expression Network Analysis

A General Framework for Weighted Gene Co-expression Network Analysis Please cite: Statistical Applications in Genetics and Molecular Biology (2005). A General Framework for Weighted Gene Co-expression Network Analysis Bin Zhang and Steve Horvath Departments of Human Genetics

More information

Degree distribution in random Apollonian networks structures

Degree distribution in random Apollonian networks structures Degree distribution in random Apollonian networks structures Alexis Darrasse joint work with Michèle Soria ALÉA 2007 Plan 1 Introduction 2 Properties of real-life graphs Distinctive properties Existing

More information

ModelingandSimulationofthe OpenSourceSoftware Community

ModelingandSimulationofthe OpenSourceSoftware Community ModelingandSimulationofthe OpenSourceSoftware Community Yongqin Gao, GregMadey Departmentof ComputerScience and Engineering University ofnotre Dame ygao,gmadey@nd.edu Vince Freeh Department of ComputerScience

More information

Evolving Networks with Distance Preferences

Evolving Networks with Distance Preferences Evolving Networks with Distance Preferences Juergen Jost M. P. Joy SFI WORKING PAPER: 2002-07-030 SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent

More information

Roles of Statistics in Network Science

Roles of Statistics in Network Science Exploration, testing, and prediction: the many roles of statistics in Network Science Aaron Clauset inferred community inferred community 2 inferred community 3 Assistant Professor of Computer Science

More information

http://www.elsevier.com/copyright

http://www.elsevier.com/copyright This article was published in an Elsevier journal. The attached copy is furnished to the author for non-commercial research and education use, including for instruction at the author s institution, sharing

More information

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations

More information

USE OF GRAPH THEORY AND NETWORKS IN BIOLOGY

USE OF GRAPH THEORY AND NETWORKS IN BIOLOGY USE OF GRAPH THEORY AND NETWORKS IN BIOLOGY Ladislav Beránek, Václav Novák University of South Bohemia Abstract In this paper we will present some basic concepts of network analysis. We will present some

More information

Graph Theory and Networks in Biology

Graph Theory and Networks in Biology Graph Theory and Networks in Biology Oliver Mason and Mark Verwoerd March 14, 2006 Abstract In this paper, we present a survey of the use of graph theoretical techniques in Biology. In particular, we discuss

More information

Many systems take the form of networks, sets of nodes or

Many systems take the form of networks, sets of nodes or Community structure in social and biological networks M. Girvan* and M. E. J. Newman* *Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501; Department of Physics, Cornell University, Clark Hall,

More information

A Non-Linear Schema Theorem for Genetic Algorithms

A Non-Linear Schema Theorem for Genetic Algorithms A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland

More information

Network (Tree) Topology Inference Based on Prüfer Sequence

Network (Tree) Topology Inference Based on Prüfer Sequence Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,

More information

A scalable multilevel algorithm for graph clustering and community structure detection

A scalable multilevel algorithm for graph clustering and community structure detection A scalable multilevel algorithm for graph clustering and community structure detection Hristo N. Djidjev 1 Los Alamos National Laboratory, Los Alamos, NM 87545 Abstract. One of the most useful measures

More information

Graph Mining Techniques for Social Media Analysis

Graph Mining Techniques for Social Media Analysis Graph Mining Techniques for Social Media Analysis Mary McGlohon Christos Faloutsos 1 1-1 What is graph mining? Extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented

More information

Strong and Weak Ties

Strong and Weak Ties Strong and Weak Ties Web Science (VU) (707.000) Elisabeth Lex KTI, TU Graz April 11, 2016 Elisabeth Lex (KTI, TU Graz) Networks April 11, 2016 1 / 66 Outline 1 Repetition 2 Strong and Weak Ties 3 General

More information

Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003

Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003 Graph models for the Web and the Internet Elias Koutsoupias University of Athens and UCLA Crete, July 2003 Outline of the lecture Small world phenomenon The shape of the Web graph Searching and navigation

More information

Computer Network Topologies: Models and Generation Tools

Computer Network Topologies: Models and Generation Tools Consiglio Nazionale delle Ricerche Technical Report n. 5/200 Computer Network Topologies: Models and Generation Tools Giuseppe Di Fatta, Giuseppe Lo Presti 2, Giuseppe Lo Re CE.R.E. Researcher 2 CE.R.E.,

More information

The Network Structure of Hard Combinatorial Landscapes

The Network Structure of Hard Combinatorial Landscapes The Network Structure of Hard Combinatorial Landscapes Marco Tomassini 1, Sebastien Verel 2, Gabriela Ochoa 3 1 University of Lausanne, Lausanne, Switzerland 2 University of Nice Sophia-Antipolis, France

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

A Graph-Theoretic Analysis of the Human Protein-Interaction Network Using Multicore Parallel Algorithms

A Graph-Theoretic Analysis of the Human Protein-Interaction Network Using Multicore Parallel Algorithms A Graph-Theoretic Analysis of the Human Protein-Interaction Network Using Multicore Parallel Algorithms David A. Bader and Kamesh Madduri College of Computing Georgia Institute of Technology, Atlanta,

More information

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu ABSTRACT This

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

Chapter ML:XI (continued)

Chapter ML:XI (continued) Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained

More information

Temporal Visualization and Analysis of Social Networks

Temporal Visualization and Analysis of Social Networks Temporal Visualization and Analysis of Social Networks Peter A. Gloor*, Rob Laubacher MIT {pgloor,rjl}@mit.edu Yan Zhao, Scott B.C. Dynes *Dartmouth {yan.zhao,sdynes}@dartmouth.edu Abstract This paper

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 4, Issue 1 2005 Article 17 A General Framework for Weighted Gene Co-Expression Network Analysis Bin Zhang Steve Horvath Departments of

More information

Visualization and Modeling of Structural Features of a Large Organizational Email Network

Visualization and Modeling of Structural Features of a Large Organizational Email Network Visualization and Modeling of Structural Features of a Large Organizational Email Network Benjamin H. Sims Statistical Sciences (CCS-6) Email: bsims@lanl.gov Nikolai Sinitsyn Physics of Condensed Matter

More information

Application of Graph-based Data Mining to Metabolic Pathways

Application of Graph-based Data Mining to Metabolic Pathways Application of Graph-based Data Mining to Metabolic Pathways Chang Hun You, Lawrence B. Holder, Diane J. Cook School of Electrical Engineering and Computer Science Washington State University Pullman,

More information

! E6893 Big Data Analytics Lecture 10:! Linked Big Data Graph Computing (II)

! E6893 Big Data Analytics Lecture 10:! Linked Big Data Graph Computing (II) E6893 Big Data Analytics Lecture 10: Linked Big Data Graph Computing (II) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and

More information

Effects of node buffer and capacity on network traffic

Effects of node buffer and capacity on network traffic Chin. Phys. B Vol. 21, No. 9 (212) 9892 Effects of node buffer and capacity on network traffic Ling Xiang( 凌 翔 ) a), Hu Mao-Bin( 胡 茂 彬 ) b), and Ding Jian-Xun( 丁 建 勋 ) a) a) School of Transportation Engineering,

More information

Dmitri Krioukov CAIDA/UCSD

Dmitri Krioukov CAIDA/UCSD Hyperbolic geometry of complex networks Dmitri Krioukov CAIDA/UCSD dima@caida.org F. Papadopoulos, M. Boguñá, A. Vahdat, and kc claffy Complex networks Technological Internet Transportation Power grid

More information

Experimental Comparison of Symbolic Learning Programs for the Classification of Gene Network Topology Models

Experimental Comparison of Symbolic Learning Programs for the Classification of Gene Network Topology Models Experimental Comparison of Symbolic Learning Programs for the Classification of Gene Network Topology Models Andreas D. Lattner, Sohyoung Kim, Guido Cervone, John J. Grefenstette Center for Computing Technologies

More information

Morphological characterization of in vitro neuronal networks

Morphological characterization of in vitro neuronal networks Morphological characterization of in vitro neuronal networks Orit Shefi, 1,2 Ido Golding, 1, * Ronen Segev, 1 Eshel Ben-Jacob, 1 and Amir Ayali 2, 1 School of Physics and Astronomy, Raymond & Beverly Sackler

More information

Boolean Network Models

Boolean Network Models Boolean Network Models 2/5/03 History Kaufmann, 1970s Studied organization and dynamics properties of (N,k) Boolean Networks Found out that highly connected networks behave differently than lowly connected

More information

Enterprise Organization and Communication Network

Enterprise Organization and Communication Network Enterprise Organization and Communication Network Hideyuki Mizuta IBM Tokyo Research Laboratory 1623-14, Shimotsuruma, Yamato-shi Kanagawa-ken 242-8502, Japan E-mail: e28193@jp.ibm.com Fusashi Nakamura

More information

The architecture of complex weighted networks

The architecture of complex weighted networks The architecture of complex weighted networks A. Barrat*, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani* *Laboratoire de Physique Théorique (Unité Mixte de Recherche du Centre National de la Recherche

More information

How To Predict The Growth Of A Network

How To Predict The Growth Of A Network Physica A 272 (1999) 173 187 www.elsevier.com/locate/physa Mean-eld theory for scale-free random networks Albert-Laszlo Barabasi,Reka Albert, Hawoong Jeong Department of Physics, University of Notre-Dame,

More information

Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of Sina Microblogging

Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of Sina Microblogging Mathematical Problems in Engineering, Article ID 578713, 6 pages http://dx.doi.org/10.1155/2014/578713 Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of

More information

A MULTI-MODEL DOCKING EXPERIMENT OF DYNAMIC SOCIAL NETWORK SIMULATIONS ABSTRACT

A MULTI-MODEL DOCKING EXPERIMENT OF DYNAMIC SOCIAL NETWORK SIMULATIONS ABSTRACT A MULTI-MODEL DOCKING EXPERIMENT OF DYNAMIC SOCIAL NETWORK SIMULATIONS Jin Xu Yongqin Gao Jeffrey Goett Gregory Madey Dept. of Comp. Science University of Notre Dame Notre Dame, IN 46556 Email: {jxu, ygao,

More information

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov Data Integration Lectures 16 & 17 Lectures Outline Goals for Data Integration Homogeneous data integration time series data (Filkov et al. 2002) Heterogeneous data integration microarray + sequence microarray

More information

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

More information

GRAPH THEORY LECTURE 4: TREES

GRAPH THEORY LECTURE 4: TREES GRAPH THEORY LECTURE 4: TREES Abstract. 3.1 presents some standard characterizations and properties of trees. 3.2 presents several different types of trees. 3.7 develops a counting method based on a bijection

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

Small-World Characteristics of Internet Topologies and Implications on Multicast Scaling

Small-World Characteristics of Internet Topologies and Implications on Multicast Scaling Small-World Characteristics of Internet Topologies and Implications on Multicast Scaling Shudong Jin Department of Electrical Engineering and Computer Science, Case Western Reserve University Cleveland,

More information

Healthcare Analytics. Aryya Gangopadhyay UMBC

Healthcare Analytics. Aryya Gangopadhyay UMBC Healthcare Analytics Aryya Gangopadhyay UMBC Two of many projects Integrated network approach to personalized medicine Multidimensional and multimodal Dynamic Analyze interactions HealthMask Need for sharing

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Character Image Patterns as Big Data

Character Image Patterns as Big Data 22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,

More information

How To Cluster Of Complex Systems

How To Cluster Of Complex Systems Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

The Open University s repository of research publications and other research outputs

The Open University s repository of research publications and other research outputs Open Research Online The Open University s repository of research publications and other research outputs Online survey for collective clustering of computer generated architectural floor plans Conference

More information
