Disassortative Degree Mixing and Information Diffusion for Overlapping Community Detection in Social Networks (DMID)
|
|
|
- Kerrie Stevenson
- 10 years ago
- Views:
Transcription
1 Disassortative Degree Mixing and Information Diffusion for Overlapping Community Detection in Social Networks (DMID) Mohsen Shahriari Advanced Community Information Systems (ACIS), RWTH Aachen University, Ahornstr. 55, Aachen, Germany Sebastian Krott Advanced Community Information Systems (ACIS), RWTH Aachen University, Ahonrstr. 55, Aachen, Germany Ralf Klamma Advanced Community Information Systems (ACIS), RWTH Aachen University, Ahonrstr. 55, Aachen, Germany ABSTRACT In this paper we propose a new two-phase algorithm for overlapping community detection (OCD) in social networks. In the first phase, called disassortative degree mixing, we identify nodes with high degrees through a random walk process on the row-normalized disassortative matrix representation of the network. In the second phase, we calculate how closely each node of the network is bound to the leaders via a cascading process called network coordination game. We implemented the algorithm and four additional ones as a Web service on a federated peer-to-peer infrastructure. Comparative test results for small and big real world networks demonstrated the correct identification of leaders, high precision and good time complexity. The Web service is available as open source software. Categories and Subject Descriptors G.2.2 [Discrete Mathematics]: Graph Theory Graph Algorithms; H.2.8 [Database Management]: Database Applications Data Mining; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval Clustering General Terms Algorithms, theory and experimentation Keywords Overlapping community detection; expert identification; information diffusion; Web service Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author s site if the Material is used in electronic media. WWW 2015 Companion, May 18 22, 2015, Florence, Italy. ACM /15/ INTRODUCTION In recent years, researchers have been interested in identifying the more connected and dense parts of graphs, named communities. There is still no well-established definition of the term community but we consider both a traditional and a more modern one. In the classical understanding, communities are considered components in which internal cluster relationships are dense and the relationships among different components are sparse [8]. In a more modern definition online communities are defined as groups of people or nodes which interact with each other based on their needs and interests [22]. First research on community detection algorithms just intended to identify a set of disjoint communities. However, today s online networks, people (or nodes) tend to belong to more than one community. E.g. readers of online media are interested in more than one topic or are member of several news groups. Consequently the detection of overlapping communities has recently gained much attention [13, 4, 5, 6, 7, 11, 17, 29, 32]. There exists a broad variety of methods for identifying overlapping communities. The first category employs global properties of the graph such as e.g. the global clustering coefficient or modularity values to find covers. The modularity optimization by Girvan and Newman is a typical example for this category [18]. So-called local approaches circumvent problems related to global methods like resolution limit and high time complexity. They apply optimization techniques, identify cliques of fixed small sizes or make use of local dynamics through random walk processes and clustering coefficients [13, 24, 30, 31, 10, 9, 2, 32]. One subcategory of local methods for overlapping community detection is based on identifying leaders. At first most influential nodes are detected in the graph and each or a group of them are considered as communities. Afterwards, the other nodes membership are calculated based on the leaders. Different kinds of metrics and processes like node weights or degree [2, 13], node distances [17] and random walks [24] can be used to determine most influential vertices in the network. In this manuscript a new local approach for detecting overlapping communities is proposed. It is a two-phase method which utilizes two kinds of dynamics for identifying communities: disassortative degree mixing and information diffusion. Communities form around influential nodes, so iden- 1369
2 tifying these nodes is very important [24]. In networks, nodes can have two states regarding their similarity with their neighbours. One state is assortative, where nodes tend to communicate with similar nodes in terms of rank, degree or other aspects. For example assortative low-degree nodes tend to communicate with other low-degree nodes. Disassortativity on the other hand is a sign of dissimilarity. Disassortative high-rank/high-degree nodes tend to communicate with low-rank/low-degree nodes and vice versa, which is also a characteristic of real world networks [19]. However, for identifying communities, we should look for disassortative hubs. Most real world networks like the internet, citation networks, communication and co-authorship networks have disassortative degree mixing properties. For example, scientists like to cite reputable papers in their field and students like to co-author with well-known and high-rank professors [25]. Numerous methods can be used for detecting disassortative hubs. Here a simple random walk process is proposed. First, we compute a disassortative matrix whose elements are computed based on subtraction of the corresponding node degrees. By row-normalizing this matrix we then obtain a disassortative transition matrix. By performing a random walk process for each node, we compute the local disassortative-ness of each node in the network. Through the disassortative-ness value we identify nodes which largely differ from their neighbours in terms of degree. Among these disassortative nodes we are interested in those which have the highest degrees in the network. Details of disassortative dynamics are explained in the method section. In the second phase of the algorithm the membership degree of each node to the communities is computed. For this purpose a network coordination game based on the dynamics of information diffusion is used. The details of the underlying cascading process are explained in the method section as well. The proposed method and the four algorithms including SSK [24], CLiZZ [17], MONC [6] and Link Communities [30] are implemented on top of a java-based peer-2-peer infrastructure 1 and provided as open source software. Amongthe above mentioned baseline approaches, SSK and CLiZZ are leader-based local OCD algorithms. They are suitable for both directed and weighted networks. SSK uses a random walk process for the detection of leaders [24]. Besides CLiZZ finds leaders based on a distance matrix and membership of non-leader nodes are computed iteratively [17]. Moreover, MONCisalso alocal basedmethodwhichperformsbasedon community expansion. In other words, it starts with some initial communities(seeds) and it adds nodes to communities based on a fitness function [6]. Finally, Link Communities algorithm is an approach for OCD based on link partitioning. It aims to increase the link density inside communities by creating communities of edges as opposed to minimizing the communities external edges. This is achieved by detecting communities of edges rather than of nodes by using the so called similarity index [30]. We ran the algorithms on different offline and online networks to demonstrate the usefulness of our approach. It proves to be competitive in terms of both precision and time complexity. In summary the paper makes the following contribution: A novel two-phase method of overlapping community detection is proposed and experimented on small, large real 1 world and synthetic networks. Not only the algorithm is competitive in comparison to others but also it can identify most influential nodes correctly. 2. USE OF TERMS AND VARIABLES Overlapping community detection is a graph-theoretic problem. We are trying to identify communities in a network (or graph) G = (V,E) which is a tuple containing a set of nodes (or vertices) V and a set of edges (or links) E. The total number of nodes is denoted as n = V and the total number of edges as m = E. A node j is called a neighbour of node i if the two are directly connected by an edge. The set of all neighbours of i is called the (open) neighbourhood of i, denoted N(i). The degree deg(i) of node i is given by the amount of edges which are incident to i. In the case of the original (disjoint) community detection problem the goal is to identify a partition of V into disjoint communities, i.e. each node of the graph is member of exactly one community. In OCD, one single node can be part of different communities. Hence a community structure is given by a cover Γ = (C 1,C 2,...,C l ) which is a tuple containing the (generally not disjoint) identified communities. The number of communities in cover Γ is denoted Γ. Often we want to be more specific and define to what degree a node is member of which community. Therefore we assign each node i a membership vector M i of dimension l. The j-th value of this node determines to what extent i is part of C j. 3. PROPOSED TWO-PHASED METHOD In this section, the new method of overlapping community detection is explained. 3.1 Phase I: Disassortative Random Walk In the first phase, a random walk is used to identify disassortative hubs. A hub is a node which has a high degree. A network is disassortative if high-degree nodes are mainly connected to low-degree nodes, and vice versa. We consider a node to be disassortative if its degree strongly differs from the degree of most of its neighbours. So a disassortative hub is a central node, i.e. a node with a high degree, whose neighbours are mostly low-degree nodes. Consequently, such nodescanberegardedas local leaders, since theyhaveanextraordinarily high level of communication compared to their surrounding nodes. We begin by defining a disassortative matrix AS ij = deg(i) deg(j) (1) so that AS ij corresponds to the disassortativity for nodes i and j which are directly connected. For the execution of a random walk we row-normalize this matrix, resulting in the following transition matrix T ij = AS ij N k=1 AS ik Consequently, we can perform a random walk for calculating a disassortativity vector DA t at time t with (2) DA t+1 = DA t T (3) The probability of choosing each path in this process is commensurate with its level of disassortativity of the path. So if one edge connects two nodes that are highly disassortative, then the probability of choosing this edge in the random 1370
3 walk process will be high as well. For initializing the values of the disassortativity vector we use a uniform distribution DA 0 i = 1 N After a finite number of t iterations this process converges, giving us the local disassortativity value DA t i of each node i. Since we are searching for disassortative nodes with a high degree, we still have to consider additional information. Hence we continue by calculating a normalized degree vector DRN i = deg(i) max k V deg(k) using the infinity norm. Combining node degrees and disassortativity values, we define the leadership of node i as (4) (5) LS i = DA i DRN i (6) We name LS the leadership vector and call a node i a local leader if LS i > LS j j N(i) (7) Node j is said to be a follower of node i if it is connected to i and i is a local leader. In order to obtain the most influential leaders from LL, the set of local leaders, we compare the number of each leader s followers with the average number of followers. We define the average follower degree as i LL AFD = FL(i) (8) LL where FL(i) is the set of followers of local leader i. We finally consider a local leader i to be a global leader, i.e. a community leader, if its number of followers is above average such that holds FL(i) > AFD (9) 3.2 Phase II: Cascading Behaviour In order to compute the assignment of remaining nonleader nodes to communities a cascading behaviour is used. The cascading behaviour which is used here is a network coordination game. To make things more clear, if all nodes in the network are communicating with a behaviour B and suddenly one leader of a community adopts a new behaviour A, we are interested in identifying to what extent this behaviour is cascaded through the network. If we run this game with different initial sets, in our case each consisting of one leader, we can also identify overlapping nodes. Since we have L leaders in the network, we play this game L times. In each game some nodes are affected by the leader s new behaviour A. If we select a suitable threshold for each node then its membership degree can be computed based on the time after which it adopts behaviour A. More precisely, for each leader a cascade set is created. Cascade sets for different leaders can overlap. Hence nodes belonging to several different cascade sets are overlapping nodes and are part of all the corresponding communities. Moreover, nodes which change their behaviour in the first iterations will have a higher degree membership for the leader s community than nodes which resist this new behaviour and join the set A in the final iterations. To illustrate this, let us consider an example node i that currently has behaviour B and possesses three neighbours. If we assume that only one of its neighbours works with behaviour B, but two with behaviour A, then it would be more profitable for the node i to adopt behaviour A because this way it could collaborate more easily with a higher number of nodes. More precisely, for a node i and a behaviour A we define the pay-off for that node for adopting the behaviour as the percentage of its neighbours working with behaviour A p A(i) = {j N(i) : j has behavior A} N(i) (10) A node will adopt a new behaviour if the pay-off for that behaviour is higher than a given profitability threshold α. We now do this label propagation once for each leader L j. Initially all nodes apart from L j will have behaviour B, whereas the leader starts with behaviour A. Generally speaking, the sooner a node adopts the new behaviour, the more profitable this behaviour results for it. Hence we can determine the membership of a node i of the community represented by leader L j by the number of iterations t i it took for i to adopt the leader s behaviour M ij = 1 t 2 i (11) To reduce the dependency of nodes that accept a new behaviour at a later time we propose the function in equation (11) which was also used in the experiments. The game ends after an iteration in which no additional node adopts the new behaviour. For determining a meaningful profitability threshold a binary search is performed over all possible threshold values between the bounds 0 and 1. In each iteration of the binary search the game is played once for each leader using the average value of the two current bounds as the profitability threshold. If all nodes are assigned to at least one community, the upper bound is decreased, otherwise the the lower bound will increase. The iteration count for the binary search phase must be predefined, for the experiments we used 10 iterations. The final outcome is a membership matrix M where M ij defines the membership degree of node i of community j. 3.3 DMID Time Complexity and its Extension for Directed/Weighted Graphs This algorithm can be easily tuned for weighted and directed networks. In the first phase, the nodes weighted in-degrees will be used instead of the regular degrees to construct the disassortativity matrix and the leadership vector. In the second phase, the definition of the pay-off value changes to p A(i) = {j S(i) : j has behavior A} S(i) (12) where S(i) is the set of successors of node i. For analysing time complexity let us define c(l) as the size of the community generated by the l-th leader. If we assume that each node is member of only a constant amount of communities and further that the sizes of the communities are equal, it can be approximated that the worst case time complexity is about O(m n Γ ). 4. EXPERIMENTAL SET UP AND DATA SETS In this section, firtly we introduce the datasets which are used in the experiments and then we investigate and compare the results of DMID and other algorithms on synthetic and real world networks. 1371
4 Table 1: Type, number of nodes and edges of networks used in the experiments. Number of edges is shown after making the graph undirected meaning that a backward edge was added for any directed edge. Information regarding the small datasets Zachary, Sawmill, Sawmill Strike and Dolphins is not listed. Graph dblp [21] [23] facebook [12] hamsterster [1] internet [16] jazz [20] Power grid [28] Nodes Edges Type Citation Social Social Social Tech Social Tech Figure 2: This figure comprises both NMI measure and CPU time calculated for different algorithms on the LFR benchmark networks with different parameter settings (average degree 12, 24 and mixing parameter 0.1, 0.3 and percentage of overlapping nodes ranging from 0 to 100 percent. The y axis for NMI and CPU time respectively indicates the NMI value and CPU time and x axis shows the percentage of overlapping nodes. For each overlap value we generated 100 networks and averaged over the results. Because of the high time complexity of the Link Communities algorithm (LC), unlike the others it was only run on one network for each parameter setting. Moreover, running time of LC was too high to be plotted. Table 2: Results of extended modularity and running time on different real-world networks for various algorithms. The upper part of the table indicates the modularity values and the lower part shows the running times. NaN indicates that the algorithm could not finish in appropriate running time on the corresponding network. Graph dblp dolphins facebook hamsterster internet jazz powergrid sawmill sawmill strike zachary DMID CLiZZ MONC NaN SSK LC NaN DMID CLiZZ MONC NaN SSK LC NaN
5 Figure 1: Results of DMID on Zachary. 4.1 Data Sets Some information about the number of nodes, edges and types of networks are shown in table Results on Zachary Karate Club Zachary Karate Club [27] network is popular for testing community detection algorithms. This network consists of 34 nodes which are members of a karate club and 78 edges which represent friendship among these nodes. As can be observed in figure 1, for Zachary DMID detected two communities, node 1 and node 34 as the most influential nodes. In real setting, the author identified two communities, node 1 as an instructor and node 34 as the president of the club. As the author noted, this network was split into two communities as a result of a conflict between instructor and the president. Our algorithm was able to identify exact leaders. Also it identified overlapping nodes in the network (9, 31, 14, 3, 2 and 20). However, their memberships of the leaders are different. Nodes 3 and 20 have equal membership degrees for both leaders. 4.3 Results on Synthetic Networks For a better analysis of the precision and running time, we also used the LFR networks proposed by Lancichinetti [14]. We set the mixing parameter to µ {0.1, 0.3}, the average degree to k {12,24} and increased the percentage of overlapping nodes stepwise from 0% to 100%. For each parameter combination, we generated 100 networks and averaged over the results. Figure 2 indicates extended Normalized Mutual Information (NMI) measure and running time for these networks. The extended NMI is a variant of the original NMI that is a metric for determining the quality of partitions [3], i.e. for disjoint community detection. Later it was adapted by Lancichinetti for measuring covers as well [15]. The NMI is knowledge-driven, i.e. based on the comparison with ground truth data sets. As it can be observed in figure 2, DMID has higher NMI values in comparison to CLiZZ and SSK. As for LFR k = 12, µ = 01 and LFR k = 12, µ = 03, it is competitive with MONC and LC except in some cases where the NMI is a little bit lower. On the other hand CPU times indicate a better time complexity of DMID. LC is not illustrated because the high running times did not allow to plot it. In all parameter settings for the LFR networks, CPU time of DMID is better than that of both MONC and SSK. Only in LFR k = 24 and µ = 01, the MONC algorithm outperforms others. 4.4 Results on Real-World Networks We ran SSK, CLiZZ, Link Community (LC), DMID and MONC on the networks of table 1. Table 2 indicates the extended modularity measure and running time for these networks 2. Extendedmodularityisastatisticalmeasureforthe evaluation of covers without any given ground truth datasets [26]. Concerning extended modularity, DMID has the highest values on the four datasets Zachary (0.691), Sawmill Strike (0.752), Sawmill (0.685) and Internet (0.524) and also has the second rank among the algorithms for Jazz (0.346). One can observe that DMID is very competitive over different datasets regarding both the quality of the detected community structure and the time complexity. Although SSK sometimes is better regarding modularity, DMID beats it in terms of time complexity. Moreover, it seems that CLiZZ has good time complexity in comparison to other algorithms. However, DMID defeats CLiZZ with respect to modularity. Finally, it is worth mentioning that DMID has further advantageous since it can be used e.g. to detect global and local leaders as well as node hierarchies. 5. CONCLUSIONS AND FUTURE WORKS In this paper a two-phase overlapping community detection algorithm is proposed, which uses two social processes named disassortative degree mixing and information diffusion. The proposed algorithm has several advantages. Firstly, it detects leaders as disassortative hubs and based on these dynamics it seems that it can better identify communities in networks with disassortative degree mixing property. In future works, we would like to investigate the effects of network structure by using graphs with different assortativity values. Also, in this implementation only one leader started the cascade and all the nodes used the same threshold value to determine when to change their behaviour. In further steps we would like to change this algorithm to allow multiple leaders for each community in the first phase and different threshold values for the other non-leader nodes in the second phase. Secondly, this algorithm can uncover the hierarchical structure in the network. Thirdly, via this algorithm, overlapping nodes can be identified and due to the soft partitioning one can identify the membership degrees of each node for all communities. Moreover, in terms of time complexity, since the algorithm only considers local information, it can be implemented in a decentralized environment. Hence, another aspect of future work is implementing the algorithm with the Map-Reduce framework and running it on larger networks. 6. ACKNOWLEDGMENTS The work has received funding from the European Commission s FP7 IP Learning Layers under grant agreement no On the internet network MONC and on the facebook network LC did not finish because of high running times 1373
6 7. REFERENCES [1] Hamsterster full network dataset - konect, August, [2] D. Chen, Y. Fu, and M. Shang. An efficient algorithm for overlapping community detection in complex networks. In Intelligent Systems, GCIS 09. WRI Global Congress on, volume 1, pages , May [3] L. Danon, J. Duch, A. Diaz-Guilera, and A. Arenas. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, P09008, [4] T. S. Evans. Clique graphs and overlapping communities. Journal of Statistical Mechanics: Theory and Experiment, (12), [5] T. S. Evans and R. Lambiotte. Line graphs of weighted networks for overlapping communities. European Physical Journal B, 77(2): , [6] F. Havemann, M. Heinz, A. Struck, and J. Glaser. Identification of overlapping communities and their hierarchy by locally calculating community-changing resolution levels. Journal of Statistical Mechanics: Theory and Experiment, [7] M. Fan, K.-C. Wong, T. Ryu, T. Ravasi, and X. Gao. Secom: A novel hash seed and community detection based-approach for genome-scale protein domain identification. PLOS ONE, 7(6), [8] M. Girvan and M. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12): , [9] S. Gregory. A fast algorithm to find overlapping communities in networks. In Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I, ECML PKDD 08, pages , Berlin, Heidelberg, Springer-Verlag. [10] S. Gregory. Finding overlapping communities in networks by label propagation. New Journal of Physics, 12(10):1 21, [11] J. Huang, H. Sun, J. Han, and B. Feng. Density-based shrinkage for revealing hierarchical and overlapping community structure in networks. Physica A-Statistical Mechanics and its Applications, 390(11): , [12] J. McAuley and J. Leskovec. Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems 25, pages [13] D. Jin, B. Yang, C. Baquero, D. Liu, D. He, and J. Liu. A markov random walk under constraint for discovering overlapping communities in complex networks. Journal of Statistical Mechanics-Theory and Experiment, , [14] A. Lancichinetti and S. Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E, 80(1):016118, July [15] A. Lancichinetti, S. Fortunato, and J. Kertész. Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3):033015, [16] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. KDD 05, pages , New York, NY, USA, ACM. [17] H. Li, J. Zhang, Z. Liu, L. Chen, and X. Zhang. Identifying overlapping communities in social networks using multi-scale local information expansion. Eur. Phys. J. B, 85(6):190, [18] M. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(026113), [19] M. E. J. Newman. Mixing patterns in networks. Phys. Rev. E, (67), [20] P. Gleiser and L. Garrido. Advances in complex systems. 6(565), [21] M. Pham, D. Kovachev, Y. Cao, and R. Klamma. Enhancing academic event participation with context-aware and social recommendations. In ASONAM 2012, pages [22] J. Preece. Online Communities: Designing Usability and Supporting Sociability. John Wiley & Sons, Inc., New York, NY, USA, 1st edition, [23] R. Guimerà, L. Danon, A. Díaz-Guilera, and F. Giralt, and A. Arenas. Self-similar community structure in a network of human interactions. American Physical Society, 68(6):065103, [24] A. Stanoev, D. Smilkov, and L. Kocarev. Identifying communities by influence dynamics in social networks. Phys. Rev. E, 84:046102, Oct [25] L. Šubelj and M. Bajec. Model of complex networks based on citation dynamics. In Proceedings of the WWW Workshop on Large Scale Network Analysis 2013 (LSNA 13), pages [26] V. Nicosia, G. Mangioni, V. Carchiolo, and M. Malgeri. Extending the definition of modularity to directed graphs with overlapping communities. J. Stat. Mech. (2009) P [27] W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33: , [28] D. Watts and S. Strogatz. Collective dynamics of small-world networks. Nature, 393: , [29] Z. Wu, Y. Lin, H. Wan, S. Tian, and K. Hu. Efficient overlapping community detection in huge real-world networks. Physica A-Statistical Mechanics and its Applications, 391(7): , [30] J. Xie, S. Kelley, and B. K. Szymanski. Overlapping community detection in networks: the state of the art and comparative study. CoRR, abs/ , [31] J. Xie, B. K. Szymanski, and X. Liu. Slpa: Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process. In ICDMW 2012, pages [32] Y. Zhang, J. Wang, Y. Wang, and L. Zhou. Parallel community detection on large networks with propinquity dynamics. KDD-09, pages ,
Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network
, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA [email protected]
Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
Practical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
Efficient Crawling of Community Structures in Online Social Networks
Efficient Crawling of Community Structures in Online Social Networks Network Architectures and Services PVM 2011-071 Efficient Crawling of Community Structures in Online Social Networks For the degree
Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations
Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns
Effects of node buffer and capacity on network traffic
Chin. Phys. B Vol. 21, No. 9 (212) 9892 Effects of node buffer and capacity on network traffic Ling Xiang( 凌 翔 ) a), Hu Mao-Bin( 胡 茂 彬 ) b), and Ding Jian-Xun( 丁 建 勋 ) a) a) School of Transportation Engineering,
MapReduce Approach to Collective Classification for Networks
MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty
Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of Sina Microblogging
Mathematical Problems in Engineering, Article ID 578713, 6 pages http://dx.doi.org/10.1155/2014/578713 Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,
Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer
Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer 1 Content What is Community Detection? Motivation Defining a community Methods to find communities Overlapping communities
GENERATING AN ASSORTATIVE NETWORK WITH A GIVEN DEGREE DISTRIBUTION
International Journal of Bifurcation and Chaos, Vol. 18, o. 11 (2008) 3495 3502 c World Scientific Publishing Company GEERATIG A ASSORTATIVE ETWORK WITH A GIVE DEGREE DISTRIBUTIO JI ZHOU, XIAOKE XU, JIE
Random graphs with a given degree sequence
Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.
A comparative study of social network analysis tools
Membre de Membre de A comparative study of social network analysis tools David Combe, Christine Largeron, Előd Egyed-Zsigmond and Mathias Géry International Workshop on Web Intelligence and Virtual Enterprises
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks
Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks Imre Varga Abstract In this paper I propose a novel method to model real online social networks where the growing
Structural constraints in complex networks
Structural constraints in complex networks Dr. Shi Zhou Lecturer of University College London Royal Academy of Engineering / EPSRC Research Fellow Part 1. Complex networks and three key topological properties
Understanding Graph Sampling Algorithms for Social Network Analysis
Understanding Graph Sampling Algorithms for Social Network Analysis Tianyi Wang, Yang Chen 2, Zengbin Zhang 3, Tianyin Xu 2 Long Jin, Pan Hui 4, Beixing Deng, Xing Li Department of Electronic Engineering,
Graph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Cluster detection algorithm in neural networks
Cluster detection algorithm in neural networks David Meunier and Hélène Paugam-Moisy Institute for Cognitive Science, UMR CNRS 5015 67, boulevard Pinel F-69675 BRON - France E-mail: {dmeunier,hpaugam}@isc.cnrs.fr
Data Exploration with GIS Viewsheds and Social Network Analysis
Data Exploration with GIS Viewsheds and Social Network Analysis Giles Oatley 1, Tom Crick 1 and Ray Howell 2 1 Department of Computing, Cardiff Metropolitan University, UK 2 Faculty of Business and Society,
Fast Multipole Method for particle interactions: an open source parallel library component
Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,
Part 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
Graph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
Graph Mining Techniques for Social Media Analysis
Graph Mining Techniques for Social Media Analysis Mary McGlohon Christos Faloutsos 1 1-1 What is graph mining? Extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented
FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Mining Social Network Graphs
Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand
Network (Tree) Topology Inference Based on Prüfer Sequence
Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 [email protected],
Social Media Mining. Network Measures
Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users
GRAPH data are important data types in many scientific
Limited Random Walk Algorithm for Big Graph Data Clustering Honglei Zhang, Member, IEEE, Jenni Raitoharju, Member, IEEE, Serkan Kiranyaz, Senior Member, IEEE, and Moncef Gabbouj, Fellow, IEEE arxiv:66.645v
General Network Analysis: Graph-theoretic. COMP572 Fall 2009
General Network Analysis: Graph-theoretic Techniques COMP572 Fall 2009 Networks (aka Graphs) A network is a set of vertices, or nodes, and edges that connect pairs of vertices Example: a network with 5
How To Find Influence Between Two Concepts In A Network
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
Subordinating to the Majority: Factoid Question Answering over CQA Sites
Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei
A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION
A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS Kyoungjin Park Alper Yilmaz Photogrammetric and Computer Vision Lab Ohio State University [email protected] [email protected] ABSTRACT Depending
Introduction to Networks and Business Intelligence
Introduction to Networks and Business Intelligence Prof. Dr. Daning Hu Department of Informatics University of Zurich Sep 17th, 2015 Outline Network Science A Random History Network Analysis Network Topological
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
Community-Aware Prediction of Virality Timing Using Big Data of Social Cascades
1 Community-Aware Prediction of Virality Timing Using Big Data of Social Cascades Alvin Junus, Ming Cheung, James She and Zhanming Jie HKUST-NIE Social Media Lab, Hong Kong University of Science and Technology
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
A scalable multilevel algorithm for graph clustering and community structure detection
A scalable multilevel algorithm for graph clustering and community structure detection Hristo N. Djidjev 1 Los Alamos National Laboratory, Los Alamos, NM 87545 Abstract. One of the most useful measures
ALBERTA. Social Network Analysis for the Assessment of Learning UNIVERSITY OF. Osmar R. Zaïane Professor & Scientific Director of AICML
UNIVERSITY OF ALBERTA Social Network Analysis for the Assessment of Learning Osmar R. Zaïane Professor & Scientific Director of AICML Educational Data Mining 2010 Pittsburgh, USA University of Alberta
Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
A discussion of Statistical Mechanics of Complex Networks P. Part I
A discussion of Statistical Mechanics of Complex Networks Part I Review of Modern Physics, Vol. 74, 2002 Small Word Networks Clustering Coefficient Scale-Free Networks Erdös-Rényi model cover only parts
Introduction to Graph Mining
Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain
Complex Networks Analysis: Clustering Methods
Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich [email protected] 1 Outline Purpose to give an overview of modern graph-clustering methods and their applications
Open Access Research on Application of Neural Network in Computer Network Security Evaluation. Shujuan Jin *
Send Orders for Reprints to [email protected] 766 The Open Electrical & Electronic Engineering Journal, 2014, 8, 766-771 Open Access Research on Application of Neural Network in Computer Network
Robust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
Dynamical Clustering of Personalized Web Search Results
Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC [email protected] Hong Cheng CS Dept, UIUC [email protected] Abstract Most current search engines present the user a ranked
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
Temporal Dynamics of Scale-Free Networks
Temporal Dynamics of Scale-Free Networks Erez Shmueli, Yaniv Altshuler, and Alex Sandy Pentland MIT Media Lab {shmueli,yanival,sandy}@media.mit.edu Abstract. Many social, biological, and technological
How To Cluster Of Complex Systems
Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster
, pp.11-20 http://dx.doi.org/10.14257/ ijgdc.2014.7.2.02 A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster Kehe Wu 1, Long Chen 2, Shichao Ye 2 and Yi Li 2 1 Beijing
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
Fast Iterative Graph Computation with Resource Aware Graph Parallel Abstraction
Human connectome. Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011 2 NA = 6.022 1023 mol 1 Paul Burkhardt, Chris Waring An NSA Big Graph experiment Fast Iterative Graph Computation with Resource
BIG DATA VISUALIZATION. Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou
BIG DATA VISUALIZATION Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou Let s begin with a story Let s explore Yahoo s data! Dora the Data Explorer has a new
An Order-Invariant Time Series Distance Measure [Position on Recent Developments in Time Series Analysis]
An Order-Invariant Time Series Distance Measure [Position on Recent Developments in Time Series Analysis] Stephan Spiegel and Sahin Albayrak DAI-Lab, Technische Universität Berlin, Ernst-Reuter-Platz 7,
Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003
Graph models for the Web and the Internet Elias Koutsoupias University of Athens and UCLA Crete, July 2003 Outline of the lecture Small world phenomenon The shape of the Web graph Searching and navigation
An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups
An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups Abstract Yan Shen 1, Bao Wu 2* 3 1 Hangzhou Normal University,
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
Online Appendix to Social Network Formation and Strategic Interaction in Large Networks
Online Appendix to Social Network Formation and Strategic Interaction in Large Networks Euncheol Shin Recent Version: http://people.hss.caltech.edu/~eshin/pdf/dsnf-oa.pdf October 3, 25 Abstract In this
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
An Analysis of Verifications in Microblogging Social Networks - Sina Weibo
An Analysis of Verifications in Microblogging Social Networks - Sina Weibo Junting Chen and James She HKUST-NIE Social Media Lab Dept. of Electronic and Computer Engineering The Hong Kong University of
Cluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
Scientific Collaboration Networks in China s System Engineering Subject
, pp.31-40 http://dx.doi.org/10.14257/ijunesst.2013.6.6.04 Scientific Collaboration Networks in China s System Engineering Subject Sen Wu 1, Jiaye Wang 1,*, Xiaodong Feng 1 and Dan Lu 1 1 Dongling School
Algorithms for representing network centrality, groups and density and clustered graph representation
COSIN IST 2001 33555 COevolution and Self-organization In dynamical Networks Algorithms for representing network centrality, groups and density and clustered graph representation Deliverable Number: D06
Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, [email protected] Chapter 08: Computer networks Version: March 3, 2011 2 / 53 Contents
Mining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
Distance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
An Analysis of Social Network-Based Sybil Defenses
An Analysis of Social Network-Based Sybil Defenses ABSTRACT Bimal Viswanath MPI-SWS [email protected] Krishna P. Gummadi MPI-SWS [email protected] Recently, there has been much excitement in the research
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University
Collective Behavior Prediction in Social Media Lei Tang Data Mining & Machine Learning Group Arizona State University Social Media Landscape Social Network Content Sharing Social Media Blogs Wiki Forum
Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics
Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics Zhao Wenbin 1, Zhao Zhengxu 2 1 School of Instrument Science and Engineering, Southeast University, Nanjing, Jiangsu
Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri
Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph
npsolver A SAT Based Solver for Optimization Problems
npsolver A SAT Based Solver for Optimization Problems Norbert Manthey and Peter Steinke Knowledge Representation and Reasoning Group Technische Universität Dresden, 01062 Dresden, Germany [email protected]
Task Placement in a Cloud with Case-based Reasoning
Task Placement in a Cloud with Case-based Reasoning Eric Schulte-Zurhausen and Mirjam Minor Institute of Informatik, Goethe University, Robert-Mayer-Str.10, Frankfurt am Main, Germany {eschulte, minor}@informatik.uni-frankfurt.de
Strong and Weak Ties
Strong and Weak Ties Web Science (VU) (707.000) Elisabeth Lex KTI, TU Graz April 11, 2016 Elisabeth Lex (KTI, TU Graz) Networks April 11, 2016 1 / 66 Outline 1 Repetition 2 Strong and Weak Ties 3 General
Figure 1. The cloud scales: Amazon EC2 growth [2].
- Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 [email protected], [email protected] Abstract One of the most important issues
