Computational Discovery in Evolving Complex Networks
1 Computational Discovery in Evolving Complex Networks. Yongqin Gao. Advisor: Greg Madey. Dissertation Defense, December 2006
2 Outline Background Methodology for Computational Discovery Problem Domain OSS Research Process I: Data Mining Process II: Network Analysis Process III: Computer Simulation Process IV: Research Collaboratory Contributions Conclusion and Future Work
3 Background Network research is gaining attention: Internet, communication networks, social networks, software developer networks, biological networks. Understanding evolving complex networks: Goal I: Search; Goal II: Prediction. Computational scientific discovery
4 Computational Discovery Our Methodology Discovery Network Analysis Assessment Researcher Initialization Data Mining Feedback Revision Computer Simulation Research Collaboratory Contribution Reference Community Members
5 Problem Domain: Open Source Software Movement. What is OSS? Free to use, modify and distribute, with source code available and modifiable. Potential advantages over commercial software: potentially high quality; fast development; low cost. Why study OSS (Goal)? Software engineering: new development and coordination methods. Open content: a model for other forms of open, shared collaboration. Complexity: a successful example of self-organization/emergence.
6 Glory of OSS Number of Active Apache Hosts
7 Problem Domain: SourceForge.net community. One of the biggest OSS development communities: 134,751 registered projects, 1,439,773 registered users
8 Problem Domain: Our Data Set. 25 monthly dumps since January; 460 GB in total and growing at 25 GB/month. Every dump has about 100 tables; the largest table has up to 30 million records. Experiment environment: dual Xeon 3.06 GHz, 4 GB memory, 2 TB storage; Linux ELsmp with PostgreSQL 8.1
9 Related Research OSS research: W. Scacchi, Free/open source software development practices in the computer game community, IEEE Software; C. Kevin, A. Hala and H. James, Defining open source software project success, 24th International Conference on Information Systems, Seattle. Complex networks: L.A. Adamic and B.A. Huberman, Scaling behavior of the world wide web, Science; M.E.J. Newman, Clustering and preferential attachment in growing networks, Physical Review E, 2001.
10 Process I: Data Mining Related Research: S. Chawla, B. Arunasalam and J. Davis, Mining open source software (OSS) data using association rules network, PAKDD; D. Kempe, J. Kleinberg and E. Tardos, Maximizing the spread of influence through a social network, SIGKDD; C. Jensen and W. Scacchi, Data mining for software process discovery in open source software development communities, Workshop on Mining Software Repositories, 2004.
11 Process I: Data Mining Algorithm Application Feature Selection Relevant data Data Purging Database Data Preparation Raw data
12 Process I: Data Mining. Data Preparation: data discovery (locating the information); data characterization (activity features: user categorization; network features); data assembly. Data Purging: treating data inconsistency (unifying the date presentation by loading into a single repository); treating data pollution (removing inactive projects). Feature Selection: used to remove dependent or insignificant features; NMF (Non-negative Matrix Factorization).
13 Result I Process I: Data Mining Significant features By feature selection, we can identify the significant feature set describing the projects. Activity features: file_releases, followup_msg, support_assigned, feature_assigned and task related features Network features: degrees, betweenness and closeness
14 Process I: Data Mining Distribution-based clustering (Christley, 2005) Clustering according to the distribution of features instead of the values of individual features. We assume every entity (project) has an underlying distribution of the feature set (activity features). Using a statistical hypothesis test: a non-parametric test, Fisher's contingency-table test, is used. Joachim Krauth, Distribution-free statistics: an application-oriented approach, Elsevier Science Publisher, 1988.
15 Process I: Data Mining Procedure:
While (still unclustered entities)
    Put all unclustered entities into one cluster
    While (some entities not yet pairwise compared)
        A = Pick entity from cluster
        For each other entity, B, in cluster not yet compared to A
            Run statistical test on A and B
            If significant result
                Remove B from cluster
Worst case complexity: O(n^2)
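The procedure above can be sketched in Python. This is a minimal illustration, not the implementation used in the study: the `differs` function is a stand-in for the "significant result" check (the slides use Fisher's contingency-table test; here a total variation distance with a hypothetical threshold of 0.2 is used so the example stays dependency-free), and the project names and counts are made up.

```python
from typing import Callable, Dict, List, Sequence


def tv_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two count vectors (each with a positive sum)."""
    sp, sq = float(sum(p)), float(sum(q))
    return 0.5 * sum(abs(a / sp - b / sq) for a, b in zip(p, q))


def differs(p: Sequence[float], q: Sequence[float], threshold: float = 0.2) -> bool:
    # Stand-in for the statistical test: "significant" means the two
    # activity distributions are further apart than the assumed threshold.
    return tv_distance(p, q) > threshold


def distribution_cluster(
    entities: Dict[str, Sequence[float]],
    test: Callable[[Sequence[float], Sequence[float]], bool] = differs,
) -> List[List[str]]:
    unclustered = list(entities)
    clusters = []
    while unclustered:                       # still unclustered entities
        cluster, rejected = unclustered, []  # put them all into one cluster
        i = 0
        while i < len(cluster):              # pick A; compare with later entities
            a = cluster[i]
            kept = cluster[: i + 1]
            for b in cluster[i + 1:]:
                if test(entities[a], entities[b]):
                    rejected.append(b)       # significant: remove B from cluster
                else:
                    kept.append(b)
            cluster = kept
            i += 1
        clusters.append(cluster)
        unclustered = rejected               # removed entities are clustered next
    return clusters


# Hypothetical activity counts per project (three activity features each).
projects = {"a": [10, 10, 10], "b": [9, 11, 10], "c": [30, 0, 0], "d": [28, 1, 1]}
print(distribution_cluster(projects))
```

Each pass compares every surviving pair at most once, which matches the quadratic worst case stated on the slide.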
16 Process I: Data Mining Result II Unsupervised learning Distribution-based method used to cluster the project history using the activity distribution We named the clusters using ID and the results are shown in the table High support and confidence in evaluation Cluster ID Total Size
17 Process I: Data Mining Two sample distributions from different categories Unbalanced feature distribution could be unpopular Balanced feature distribution could be popular Cluster Activity Category Cluster Activity Category
18 Process I: Data Mining Discoveries in Process I Significant feature set selection Network features are important Further inspection in next process Distribution based predictor Based on the activity feature distribution Prediction of the popularity based on the balance of the activity feature distribution Benefit of these discoveries For collaboration based communities, these discoveries can help in resource allocation optimization.
19 Process II: Network Analysis Why network analysis Assess the importance of the network measures to the whole network and to individual entity in the network Inspect the developing patterns of these network measures Network analysis Structure analysis Centrality analysis Path analysis
20 Process II: Network Analysis Related research: P. Erdős and A. Rényi, On random graphs, Publicationes Mathematicae; D.J. Watts and S.H. Strogatz, Collective dynamics of small-world networks, Nature; R. Albert and A.L. Barabási, Emergence of scaling in random networks, Science; Y. Gao, Topology and evolution of the open source software community, Master Thesis, 2003.
21 Process II: Network Analysis Structure Analysis Understanding the influence of the network structure on individual entities in the network. Inspected measures:
Approximate diameter: $D = \frac{\log(N / z_1)}{\log(z_2 / z_1)}$
Approximate clustering coefficient: $C = \frac{1}{1 + \frac{(\mu_2 - \mu_1)(\nu_2 - \nu_1)^2}{\mu_1 \nu_1 (2\nu_1 - 3\nu_2 + \nu_3)}}$
Component distribution
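As a worked example of the approximate diameter, the sketch below evaluates D = log(N / z1) / log(z2 / z1). It assumes, following the random-graph literature, that N is the number of nodes and that z1 and z2 are the average numbers of first and second neighbors; the slide itself does not define the symbols, and the input values are invented.

```python
import math


def approx_diameter(n: float, z1: float, z2: float) -> float:
    """Approximate diameter D = log(N / z1) / log(z2 / z1)."""
    if not (n > z1 > 0 and z2 > z1):
        raise ValueError("expected n > z1 > 0 and z2 > z1")
    return math.log(n / z1) / math.log(z2 / z1)


# Hypothetical network: 1000 nodes, 10 first neighbors and 100 second
# neighbors on average.
print(approx_diameter(1000, 10, 100))
```

Because N enters only through a logarithm, the estimate grows slowly with network size.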
22 Process II: Network Analysis Conversion among C-NET, P-NET and D-NET
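The conversion can be illustrated with a short Python sketch. It assumes that C-NET is the bipartite developer-project network, that D-NET links developers who share a project, and that P-NET links projects that share a developer; this reading is inferred from the other slides, and the membership data is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_networks(memberships):
    """Derive D-NET and P-NET edge sets from C-NET (developer, project) pairs."""
    devs_of = defaultdict(set)   # project -> developers
    projs_of = defaultdict(set)  # developer -> projects
    for dev, proj in memberships:
        devs_of[proj].add(dev)
        projs_of[dev].add(proj)

    d_net = set()
    for devs in devs_of.values():
        # Developers on the same project become pairwise linked.
        d_net.update(combinations(sorted(devs), 2))

    p_net = set()
    for projs in projs_of.values():
        # Projects sharing a developer become pairwise linked.
        p_net.update(combinations(sorted(projs), 2))
    return d_net, p_net


c_net = [("d1", "p1"), ("d2", "p1"), ("d2", "p2"), ("d3", "p2")]
d_net, p_net = project_networks(c_net)
print(sorted(d_net), sorted(p_net))
```

Sorting each group before pairing keeps every undirected edge in one canonical orientation, so duplicates collapse in the sets.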
23 Process II: Network Analysis Result I Approximate Diameters D-NET: between (5,7) while network size ranged from 151,803 to 195,744. P-NET: between (6,8) while network size ranged from 123,192 to 161,798. Approximate Clustering Coefficient D-NET: between (0.85, 0.95) P-NET: between (0.65, 0.75)
24 Process II: Network Analysis Result I
25 Process II: Network Analysis Centrality Analysis Understanding the importance of individual entities to the global network structure. Inspected measures: Average Degrees; Degree Distributions;
Betweenness: $B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$
Closeness: $C(v) = \frac{1}{\sum_{t \in V} d_G(v, t)}$
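The two centrality measures can be computed on a small unweighted graph as sketched below. This is an illustration, not the code used in the study: betweenness uses Brandes' accumulation scheme and sums over ordered pairs (s, t), so values are twice the usual undirected convention, and closeness sums distances over reachable nodes only. The adjacency dictionary is a made-up example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """C(v) = 1 / sum of distances from v to the nodes it can reach."""
    total = sum(d for t, d in bfs_distances(adj, v).items() if t != v)
    return 1.0 / total if total else 0.0


def betweenness(adj):
    """B(v) = sum over ordered pairs (s, t) of sigma_st(v) / sigma_st."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                       # BFS that counts shortest paths
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):          # back-propagate pair dependencies
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(path), closeness(path, "b"))
```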
26 Process II: Network Analysis Result II Average Degrees Developer degree in C-NET: Project degree in C-NET: Developer degree in D-NET: Project degree in P-NET:
27 Process II: Network Analysis Result II (Degree distributions in C-NET)
28 Process II: Network Analysis Result II (Degree distributions in D-NET and P-NET)
29 Process II: Network Analysis Result II Average Betweenness P-NET: e-003 Average Closeness P-NET: e-005 Normally these two measures yield very small values in large networks (N > 10,000).
30 Process II: Network Analysis Path Analysis Understanding the developing patterns of the network structure and individual entities in the network Inspected measures: Active Developer Percentage Average Degrees Diameters Clustering coefficients Betweenness Closeness
31 Process II: Network Analysis Result III (Active entities)
32 Process II: Network Analysis Result III (Average degrees in C-NET)
33 Process II: Network Analysis Result III (Average degrees in D-NET and P-NET)
34 Process II: Network Analysis Result III (Diameters in D-NET and P-NET)
35 Process II: Network Analysis Result III (Clustering coefficients for D-NET and P-NET)
36 Process II: Network Analysis Result III (Average betweenness and closeness for P-NET)
37 Process II: Network Analysis
Measures                                D-NET  P-NET  C-NET
Average Degree                          Yes    Yes    Yes
Diameter                                Yes    Yes    N/A
Clustering Coefficient                  Yes    Yes    N/A
Degree Distribution                     Yes    Yes    Yes
Component Distribution                  N/A    Yes    N/A
Major Component                         N/A    Yes    N/A
Average Betweenness                     Yes    Yes    N/A
Average Closeness                       Yes    Yes    N/A
Active Entity Size Development          Yes    Yes    Yes
Average Degree Development              Yes    Yes    Yes
Diameter Development                    Yes    Yes    N/A
Clustering Coefficient Development      Yes    Yes    N/A
Average Betweenness Development         Yes    Yes    N/A
Average Closeness Development           Yes    Yes    N/A
38 Process II: Network Analysis Discoveries in Process II: Measures of structure analysis and centrality analysis all indicate very high connectivity of the network. Measures of path analysis reveal the developing patterns of these measures (life cycle behavior). Benefits of these discoveries: High connectivity in a network is an important feature for information propagation and failure tolerance. Understanding this discovery can help us improve our practices in collaboration networks and communication networks. Understanding the developing patterns of these network measures gives us a method to monitor network development and to improve the network if necessary.
39 Process III: Computer Simulation Related Research: P.J. Kiviat, Simulation, technology, and the decision process, ACM Transactions on Modeling and Computer Simulation, 1991; R. Albert and A.L. Barabási, Emergence of scaling in random networks, Science; J. Epstein, R. Axtell, R. Axelrod and M. Cohen, Aligning simulation models: A case study and results, Computational and Mathematical Organization Theory; Y. Gao, Topology and evolution of the open source software community, Master Thesis, 2003.
40 Process III: Computer Simulation Iterative simulation method Empirical dataset Model Simulation Characterization Description Model Adjustment Generation Verification and validation More measures Empirical Data Collection Verification Validation Simulation More methods
41 Process III: Computer Simulation Previous iterated models (master thesis): Adapted ER Model BA Model BA Model with fitness BA Model with dynamic fitness Iterated models in this study Improved Model Four (Model I) Constant user energy (Model II) Dynamic user energy (Model III)
42 Process III: Computer Simulation Model I Realistic stochastic procedures. New developer every time step based on Poisson distribution Initial fitness based on log-normal distribution Updated procedure for the weighted project pool (for preferential selection of projects).
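The stochastic ingredients of Model I can be sketched as follows. This is a toy illustration under stated assumptions, not the validated model: the arrival rate `lam`, the log-normal parameters, the pool weight (fitness times current member count) and the 10% chance that a newcomer founds a project are all hypothetical choices made for the example.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw a Poisson(lam) count with Knuth's method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(steps, lam=2.0, mu=0.0, sigma=1.0, seed=0):
    rng = random.Random(seed)
    # One seed project so the first developers have something to join.
    projects = [{"fitness": rng.lognormvariate(mu, sigma), "members": 1}]
    developers = 0
    for _ in range(steps):
        # Number of new developers this time step follows a Poisson law.
        for _ in range(poisson_sample(rng, lam)):
            developers += 1
            # Weighted project pool: preferential selection by
            # fitness * size (assumed form of the weight).
            weights = [p["fitness"] * p["members"] for p in projects]
            chosen = rng.choices(projects, weights=weights, k=1)[0]
            chosen["members"] += 1
            # Hypothetical rule: some newcomers also found a project whose
            # initial fitness is log-normally distributed.
            if rng.random() < 0.1:
                projects.append(
                    {"fitness": rng.lognormvariate(mu, sigma), "members": 1}
                )
    return developers, projects


developers, projects = simulate(steps=100)
print(developers, len(projects), max(p["members"] for p in projects))
```

Recomputing the weights on every join is the simplest way to keep the pool current; a real run at SourceForge scale would want an incremental structure instead.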
43 Process III: Computer Simulation Average degrees
44 Process III: Computer Simulation Diameter and CC
45 Process III: Computer Simulation Betweenness and Closeness
46 Process III: Computer Simulation Degree Distributions
47 Process III: Computer Simulation Deficit in the measures
48 Process III: Computer Simulation Model II New addition: user energy. User energy is the fitness parameter for the user. Every time a new user is created, an energy level is randomly generated for the user. The energy level is used to decide whether a user will take an action during each time step.
49 Process III: Computer Simulation Degree distributions for Model II
50 Process III: Computer Simulation Deficit in the measures
51 Process III: Computer Simulation Model III New addition: dynamic user energy. Dynamic user energy Decaying with respect to time Self-adjustable according to the roles the user is taking in various projects.
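A minimal sketch of user energy, covering both the constant case (Model II) and the dynamic case (Model III). The slides do not give the update rule, so the multiplicative `decay`, the additive `role_bonus` per held role and the clamp to 1.0 are assumptions made for illustration; setting both parameters to zero reduces to constant energy.

```python
import random


class User:
    """Toy user whose energy is the probability of acting in a time step."""

    def __init__(self, rng, decay=0.05, role_bonus=0.02):
        self.energy = rng.random()    # random initial energy level in [0, 1)
        self.decay = decay            # assumed: fraction of energy lost per step
        self.role_bonus = role_bonus  # assumed: energy regained per role held
        self.roles = 0                # number of project roles currently held

    def step(self, rng):
        """Decide whether the user acts this step, then update the energy."""
        acts = rng.random() < self.energy
        self.energy *= 1.0 - self.decay                  # decay over time
        self.energy += self.role_bonus * self.roles      # adjust for roles
        self.energy = min(1.0, self.energy)
        return acts


rng = random.Random(42)
user = User(rng)
user.roles = 1
actions = sum(user.step(rng) for _ in range(100))
print(actions, round(user.energy, 3))
```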
52 Process III: Computer Simulation Degree distributions (Model III)
53 Process III: Computer Simulation
Model I (more realistic distributions)
  Measure                  Patterns in Data         Simulated Patterns
  Developer Distribution   Power Law (large tail)   Power Law (small tail)
  Project Distribution     Power Law (small tail)   Power Law (large tail)
  Average Degrees          Increasing               Increasing
  Clustering Coefficient   Decreasing               Decreasing
  Diameter                 Decreasing               Decreasing
  Average Betweenness      Decreasing               Decreasing
  Average Closeness        Decreasing               Decreasing
Model II (constant user energy)
  Measure                  Patterns in Data         Simulated Patterns
  Developer Distribution   Power Law (large tail)   Power Law (large tail)
  Project Distribution     Power Law (small tail)   Power Law (reasonable tail)
  Average Degrees          Increasing               Increasing
  Clustering Coefficient   Decreasing               Decreasing
  Diameter                 Decreasing               Decreasing
  Average Betweenness      Decreasing               Decreasing
  Average Closeness        Decreasing               Decreasing
Model III (dynamic user energy)
  Measure                  Patterns in Data         Simulated Patterns
  Developer Distribution   Power Law (large tail)   Power Law (large tail)
  Project Distribution     Power Law (small tail)   Power Law (small tail)
  Average Degrees          Increasing               Increasing
  Clustering Coefficient   Decreasing               Decreasing
  Diameter                 Decreasing               Decreasing
  Average Betweenness      Decreasing               Decreasing
  Average Closeness        Decreasing               Decreasing
54 Process III: Computer Simulation Discoveries in Process III Expanding the network models for modeling evolving complex networks (more parameters) Providing a validated model to simulate the community network at SourceForge.net Benefits of these discoveries Expanded network models can benefit other researchers in complex networks. Validated model for SourceForge.net can be used to study other OSS communities or similar collaboration networks.
55 Process IV: Research Collaboratory Related Research: G. Chin Jr. and C. Lansing, The biological sciences collaboratory, Mathematics and Engineering Techniques in Medicine and Biological Sciences, L. Koukianakis, A system for hybrid learning and hybrid psychology, Cybernetics and Information Technologies, Systems and Applications, NCBI, FlyBase, Ensembl, VectorBase
56 Process IV: Research Collaboratory What is Collaboratory? An elaborate collection of data, information, analytical toolkits and communication technologies A new networked organizational form that also includes social processes, collaboration techniques and agreements on norms, principles, value, and rules
57 Process IV: Research Collaboratory
58 Process IV: Research Collaboratory Data tier - schema design. Every schema is a database dump from SourceForge.net. [Figure: timeline of schemas SF0103, SF0205, SF0305, SF0405, SF0505, SF0605, SF0705, SF0805]
59 Process IV: Research Collaboratory Data tier - connection pool. [Figure: Logic Tier, Connection Request, Connection Assigner, Connection Pool with Persistent Links, Timeline]
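The connection pool idea can be sketched generically in Python. This is not the collaboratory's implementation: the class name, the `factory` callable and the blocking behavior are assumptions, and plain objects stand in for the persistent database links.

```python
import queue


class ConnectionPool:
    """Hands out a fixed set of persistent links and takes them back."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())   # open every persistent link up front

    def acquire(self, timeout=None):
        # Blocks until a link is idle; raises queue.Empty once the timeout
        # expires (waits indefinitely when timeout is None).
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)            # return the link for reuse


# Stand-in factory; a real one would open a database connection.
pool = ConnectionPool(factory=lambda: object(), size=2)
link = pool.acquire()
pool.release(link)
```

Keeping the links open across requests avoids paying the connection setup cost for every web query.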
60 Process IV: Research Collaboratory Presentation Tier Various access methods Documentation and references Community support Wiki interface
61 Process IV: Research Collaboratory Logic Tier Interactive web query system: authorized users can submit queries to the back-end repository through the web query. Results are provided as files in various formats. Dynamic web schema browser: authorized users can access the dynamic schema of the repository through the schema browser.
62 Process IV: Research Collaboratory Utilization reports Monthly statistics (June 2006) Total queries submitted: 16,947 Total data files retrieved: 13,343 Total bytes of query data downloaded: 26,684,556,278 Programmable access method Programmable access method should be provided for complicated access Web services planned
63 Process IV: Research Collaboratory Results in Process IV: Designing, implementing and maintaining a research collaboratory for OSS-related research. Benefits of these results: OSS researchers can access one of the most complete data sets on the development of an OSS community. By providing this community service to OSS researchers, the collaboratory can help in sparking, improving and promoting research ideas about OSS.
64 Contributions Designed and demonstrated a computational discovery methodology to study evolving complex networks using research on OSS as a representative problem domain Understanding the OSS movement by applying the methods. Process I: data mining Identifying significant features to describe a project Using distribution based clustering to generate a distribution based predictor to predict the popularity of a project Process II: network analysis Introducing more complete analysis to inspect more complete data set from SourceForge.net. Discovering high connectivity and possible life cycle behaviors in both the network structure and individuals in the network Process III: computer simulation Introducing more parameters in modeling evolving complex networks Generating a fit model to replicate the evolution of the SourceForge.net community. Process IV: research collaboratory Designing, implementing and maintaining a research collaboratory to host the SourceForge.net data set and provide community support for OSS related researches.
65 Publications to date: Y. Gao, G. Madey and V. Freeh, Modeling and simulation of the open source software community, ADSC, San Diego; Y. Gao and G. Madey, Project development analysis of the oss community using st mining, NAACSOS, Notre Dame; S. Christley, Y. Gao, J. Xu and G. Madey, Public goods theory of the open source software development community, Agent, Chicago; Y. Gao, Y. Huang and G. Madey, Data Mining Project History in Open Source Software Communities, NAACSOS, Pittsburgh; J. Xu, Y. Gao, J. Goett and G. Madey, A Multi-model Docking Experiment of Dynamic Social Network Simulations, Agent, Chicago; Y. Gao, V. Freeh and G. Madey, Analysis and Modeling of the Open Source Software Community, NAACSOS, Pittsburgh; Y. Gao, V. Freeh and G. Madey, Conceptual Framework for Agent-based Modeling and Simulation, NAACSOS, Pittsburgh; G. Madey, V. Freeh, R. Tynan and Y. Gao, Agent-based modeling and simulation of collaborative social networks, AMCIS, Tampa; Y. Gao, V. Freeh and G. Madey, Topology and evolution of the open source software community, SwarmFest, Notre Dame, 2003.
66 Publication Plan Chapter III (data mining) Journal of Machine Learning Research Journal of Systems and Software Chapter IV (network analysis) Journal of Network and Systems Management Journal of Social Structure Chapter V (computer simulation) Spring Simulation Conference 2007 (under review) IEEE Computing in Science and Engineering Chapter VI (research collaboratory) CITSA 2007 Journal of Computer Science and Applications
67 Conclusion and Future Work Cyclic computational discovery method for studying evolving complex networks. Study of Open Source Software by applying this method. Future work: maintaining and expanding the collaboratory; verifying the discoveries against further accumulated database dumps from SourceForge.net; applying our simulation model to other software development communities; extending our methodology to other evolving complex networks such as the Internet, communication networks and various social networks.
68 Acknowledgement My advisor: Dr. Madey My committee members: Dr. Flynn Dr. Striegel Dr. Wood My Colleagues: Scott Christley, Yingping Huang, Tim Schoenharl, Matt Van Antwerp, Ryan Kennedy, Alec Pawling and Jin Xu SourceForge.net managers: Jeff Bates, VP of OSTG Inc. Jay Seirmarco, GM of SourceForge.net. US NSF CISE/IIS-Digital Society & Technology, under Grant No
69 Questions
70 Case Study II: OSS Developer Network (Part). Developers are nodes / Projects are links. 24 developers, 5 projects, 2 hub developers, 1 cluster. [Figure: network diagram of developers dev[45] to dev[99] linked through projects including 6882, 7028, 7597 and 9859]
71 Process I: Data Mining Characteristics of data set Massive Incomplete, noisy, redundant Complex structures, unstructured Classic analysis tools are often inadequate and inefficient for analyzing these data, especially in exploratory research What is DM (Data mining) Nontrivial extraction of implicit, previously unknown and potentially useful information from data.
72 Process I: Data Mining Feature Selection Given a non-negative n x m matrix V, find non-negative factors W (n x r) and H (r x m) such that $V \approx WH$. This is called the non-negative matrix factorization (NMF) of the matrix V. NMF can be used on multivariate data to reduce the dimension of the data set. By using NMF, we can reduce the dimension from m features to r features.
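A dependency-free sketch of NMF follows. The slides do not say which algorithm was used; this example uses the well-known multiplicative update rules for the Frobenius-norm objective, with random initialization, a fixed iteration count and a small `eps` guard that are choices made for the illustration. The rows of W give each entity's coordinates in the reduced r-dimensional feature space.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5


def nmf(v, r, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative n x m matrix V into W (n x r) and H (r x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(r)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(r)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(r)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(n)]
    return w, h


# Hypothetical 3 x 2 activity matrix of rank 1, reduced to r = 1 feature.
v = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
w, h = nmf(v, r=1)
print(frobenius_error(v, w, h))
```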
73 Why NMF? Feature extraction methods: linear methods are simpler and more completely understood; nonlinear methods are more general and more difficult to analyze. Linear methods: ICA (Independent Component Analysis); matrix decomposition (PCA, SVD, NMF). In practice, NMF is the most popular and simple. Dimensionality reduction is effective if the loss of information due to mapping to a lower-dimensional space is less than the gain due to simplifying the problem.
74 Process I: Data Mining Feature-based Clustering Grouping data into K clusters based on features. The distance metric used is Euclidean distance. Hierarchical K-Means is used. The result is a binary tree: the root is the whole data set and the leaf clusters are the fine-grained clusters, which are the resulting K clusters.
75 Process I: Data Mining Case Study Result II Unsupervised learning K-Means method used to cluster the project history using the features we selected We named the clusters using ID and the results are shown in the table The result is not acceptable by evaluation Cluster ID Total Size
76 Process I: Data Mining [Figure: user categorization flowchart. Tables shown: artifact table, Forum table, People_job table, Project_task table, Doc_data table, User_group table, other tables; UNION; User_project_act table. Decisions: Admin_flags? Yes: Administrator; No: Grantcvs? Yes: Core developer; No: Assigned? Yes: Co-developer; No: Activities? Yes: Active user; No: Lurker]
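The categorization flowchart reads as a chain of yes/no checks, which can be sketched as below. The order of the checks and the mapping of each "Yes" branch to a category are inferred from the flattened figure, so treat them as an assumption; the boolean parameters are hypothetical summaries of the corresponding table columns.

```python
def categorize(admin_flags: bool, grantcvs: bool, assigned: bool, activities: bool) -> str:
    """Assign a user category by testing the flags in flowchart order."""
    if admin_flags:
        return "Administrator"
    if grantcvs:
        return "Core developer"
    if assigned:
        return "Co-developer"
    if activities:
        return "Active user"
    return "Lurker"


print(categorize(admin_flags=False, grantcvs=False, assigned=True, activities=True))
```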
77 Process I: Data Mining
78 Clustering Result Evaluation Evaluation test set generation: popular/unpopular projects; stratified sampling to make 500 projects. Feature sets used: Popular feature set; Activity Feature set (Page 34, Table 3.2); Network Feature set (Page 35, Table 3.3). Generating rules for the test sets. Calculating the support and confidence values.
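Support and confidence for a rule "antecedent implies consequent" can be computed as in the sketch below, using the standard definitions: support is the fraction of records satisfying both sides, and confidence is that count divided by the number satisfying the antecedent. The record fields and the example rule (balanced distribution implies popular) are hypothetical.

```python
def support_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n = len(records)
    both = sum(1 for r in records if antecedent(r) and consequent(r))
    ante = sum(1 for r in records if antecedent(r))
    support = both / n if n else 0.0
    confidence = both / ante if ante else 0.0
    return support, confidence


# Hypothetical evaluation set of four projects.
test_set = [
    {"balanced": True, "popular": True},
    {"balanced": True, "popular": True},
    {"balanced": True, "popular": False},
    {"balanced": False, "popular": False},
]
print(support_confidence(test_set, lambda r: r["balanced"], lambda r: r["popular"]))
```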
79 Popularity Definition
Feature            Description
Developers         Number of core developers
Downloads          Number of downloads
Site_views         Number of views of the website
Subdomain_views    Number of views of the subdomain
Page_views         Number of views of the pages
80 Why K-Means? The algorithm has remained extremely popular because it converges extremely quickly in practice. In fact, many have observed that the number of iterations is typically much less than the number of points. K-Means is more successful on large data sets (size > 1000, dimension > 2) than GA and Evolution. CLIQUE is sensitive to noise. CURE is not scalable: O(n^2 log n). CLARANS and BIRCH are not good for high-dimensional data. D. Arthur, S. Vassilvitskii (2006): "How Slow is the k-means Method?," Proceedings of the 2006 Symposium on Computational Geometry (SoCG).
81 K-Means It maximizes inter-cluster (or minimizes intra-cluster) variance, but does not ensure that the result is a global minimum of variance, so multiple runs are needed. Elbow criterion
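Because a single K-Means run may stop in a local minimum, a common remedy is to restart from several random initializations and keep the run with the lowest intra-cluster sum of squares. The sketch below shows that idea with plain Lloyd iterations; the number of runs, the iteration cap and the sample points are arbitrary choices for the illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, k, rng, max_iter=100):
    """One K-Means run; returns (intra-cluster sum of squares, centers)."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return inertia, centers


def kmeans(points, k, runs=10, seed=0):
    """Repeat Lloyd's algorithm and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(runs)), key=lambda r: r[0])


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
inertia, centers = kmeans(points, k=2)
print(inertia, sorted(centers))
```

Plotting the best inertia against K and looking for the bend in the curve is the elbow criterion mentioned on the slide.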
82 Distribution Categories Category Feature File release New message Followup message Artifact request Todo request Support request Feature request Patch request Bug reports Bug assigned Patch assigned Feature assigned Support assigned Todo assigned Artifact assigned
83 Process III: Computer Simulation Start Simulation model procedure New Users User List User_Project Links Project List User Action Project Pool Update Idle Drop Create Join Weighted Project Pool End of Simu? No Yes Stop
84 Process III: Computer Simulation Poisson Process: It expresses the probability of a number of events occurring in a fixed period of time if these events occur with a known average rate and are independent of the time since the last event. Probability mass function: $F(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$
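As a quick numerical check of the Poisson formula, the sketch below evaluates the probability of k events for a given average rate and confirms that the probabilities sum to one. The rate of 2 arrivals per time step is an arbitrary example value.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the average rate is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 2.0  # hypothetical average number of new developers per time step
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 4))
print(sum(poisson_pmf(k, lam) for k in range(50)))
```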
85 Process III: Computer Simulation Log-normal distribution:
86 Process III: Computer Simulation Kolmogorov-Smirnov test Used to determine whether two underlying one-dimensional distributions differ. The two one-sided K-S test statistics are given by $D_n^+ = \max_x (F_n(x) - F(x))$ and $D_n^- = \max_x (F(x) - F_n(x))$
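The one-sided statistics can be computed directly from a sorted sample, as sketched below. Here F_n is the empirical distribution function of the sample and F is a reference distribution function supplied by the caller; because F_n is a step function, it is enough to check the values at the sample points. The uniform reference and the three sample values are made up for the example.

```python
def ks_one_sided(sample, cdf):
    """Return (D_plus, D_minus) for a sample against a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    # F_n jumps to (i + 1) / n at the i-th sorted point (0-based) ...
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # ... and equals i / n just before it.
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus, d_minus


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


print(ks_one_sided([0.25, 0.5, 0.75], uniform_cdf))
```

The usual two-sided statistic is the larger of the two values.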
87 Process III: Computer Simulation
88 Similar Publications Chapter III (data mining) JMLR: G. Hamerly, E. Perelman..Using machine learning to guide simulation (Feb. 2006) JSS: S. Kim, J. Yoon..Shape-based retrieval in time-series database (Feb. 2006) Chapter IV (network analysis) JNSM: Special Issue Self-Managing Systems and Networks JoSS: The Journal of Social Structure (JoSS) is an electronic journal of the International Network for Social Network Analysis (INSNA) Chapter V (computer simulation) SSC 2007: simulation co IEEE/CSE: E. Luijten..Fluid simulation with monte carlo algorithm (2006 Vol. 8, Issue 2) Chapter VI (research collaboratory) CITSA 2007: L. Koukianakis..A system for hybrid learning and hybrid psychology (2005) JCSA: S. Chen, K. Wen..An Integrated System for Cancer-Related Genes Mining from Biomedical Literatures (2006)
More informationClustering UE 141 Spring 2013
Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or
More informationA Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
More informationUsing Networks to Visualize and Understand Participation on SourceForge.net
Nathan Oostendorp; Mailbox #200 SI708 Networks Theory and Application Final Project Report Using Networks to Visualize and Understand Participation on SourceForge.net SourceForge.net is an online repository
More informationComparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
More informationMachine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.
Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,
More informationAnalytics on Big Data
Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationClustering Technique in Data Mining for Text Documents
Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor
More informationBisecting K-Means for Clustering Web Log data
Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining
More informationComplex Networks Analysis: Clustering Methods
Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich nefedov@isi.ee.ethz.ch 1 Outline Purpose to give an overview of modern graph-clustering methods and their applications
More informationRobust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
More informationAn Interest-Oriented Network Evolution Mechanism for Online Communities
An Interest-Oriented Network Evolution Mechanism for Online Communities Caihong Sun and Xiaoping Yang School of Information, Renmin University of China, Beijing 100872, P.R. China {chsun.vang> @ruc.edu.cn
More informationPrediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
More informationON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
More informationData Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier
Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.
More informationIC05 Introduction on Networks &Visualization Nov. 2009. <mathieu.bastian@gmail.com>
IC05 Introduction on Networks &Visualization Nov. 2009 Overview 1. Networks Introduction Networks across disciplines Properties Models 2. Visualization InfoVis Data exploration
More informationEchidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis
Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of
More informationSupporting Knowledge Collaboration Using Social Networks in a Large-Scale Online Community of Software Development Projects
Supporting Knowledge Collaboration Using Social Networks in a Large-Scale Online Community of Software Development Projects Masao Ohira Tetsuya Ohoka Takeshi Kakimoto Naoki Ohsugi Ken-ichi Matsumoto Graduate
More informationChapter 7. Cluster Analysis
Chapter 7. Cluster Analysis. What is Cluster Analysis?. A Categorization of Major Clustering Methods. Partitioning Methods. Hierarchical Methods 5. Density-Based Methods 6. Grid-Based Methods 7. Model-Based
More informationOperations Research and Knowledge Modeling in Data Mining
Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp
More informationNon-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
More informationClustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is
Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is
More informationKnowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
More informationResearch on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
More informationCourse 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
More informationStandardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
More informationGraphs over Time Densification Laws, Shrinking Diameters and Possible Explanations
Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationGraph Mining Techniques for Social Media Analysis
Graph Mining Techniques for Social Media Analysis Mary McGlohon Christos Faloutsos 1 1-1 What is graph mining? Extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented
More informationRule based Classification of BSE Stock Data with Data Mining
International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification
More informationInternational Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop
ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com
More informationData Mining in Web Search Engine Optimization and User Assisted Rank Results
Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management
More informationAPPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationHow To Understand The Theory Of Probability
Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL
More informationData Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
More informationPredictive Modeling Techniques in Insurance
Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics
More informationReinventing Business Intelligence through Big Data
Reinventing Business Intelligence through Big Data Dr. Flavio Villanustre VP, Technology and lead of the Open Source HPCC Systems initiative LexisNexis Risk Solutions Reed Elsevier LEXISNEXIS From RISK
More informationChapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationChapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
More informationText Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that
More informationKeywords: Mobility Prediction, Location Prediction, Data Mining etc
Volume 4, Issue 4, April 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Data Mining Approach
More informationComparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationA Hybrid Decision Tree Approach for Semiconductor. Manufacturing Data Mining and An Empirical Study
A Hybrid Decision Tree Approach for Semiconductor Manufacturing Data Mining and An Empirical Study 1 C. -F. Chien J. -C. Cheng Y. -S. Lin 1 Department of Industrial Engineering, National Tsing Hua University
More informationApplication of Predictive Analytics for Better Alignment of Business and IT
Application of Predictive Analytics for Better Alignment of Business and IT Boris Zibitsker, PhD bzibitsker@beznext.com July 25, 2014 Big Data Summit - Riga, Latvia About the Presenter Boris Zibitsker
More informationExample application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
More informationData Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
More informationSanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a
More informationAUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
More informationClustering Data Streams
Clustering Data Streams Mohamed Elasmar Prashant Thiruvengadachari Javier Salinas Martin gtg091e@mail.gatech.edu tprashant@gmail.com javisal1@gatech.edu Introduction: Data mining is the science of extracting
More informationKeywords : Data Warehouse, Data Warehouse Testing, Lifecycle based Testing
Volume 4, Issue 12, December 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Lifecycle
More informationExploring Big Data in Social Networks
Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about
More informationBuilding well-balanced CDN 1
Proceedings of the Federated Conference on Computer Science and Information Systems pp. 679 683 ISBN 978-83-60810-51-4 Building well-balanced CDN 1 Piotr Stapp, Piotr Zgadzaj Warsaw University of Technology
More informationPerformance Analysis of Book Recommendation System on Hadoop Platform
Performance Analysis of Book Recommendation System on Hadoop Platform Sugandha Bhatia #1, Surbhi Sehgal #2, Seema Sharma #3 Department of Computer Science & Engineering, Amity School of Engineering & Technology,
More informationUNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable
More informationModelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
More informationClustering Methods in Data Mining with its Applications in High Education
2012 International Conference on Education Technology and Computer (ICETC2012) IPCSIT vol.43 (2012) (2012) IACSIT Press, Singapore Clustering Methods in Data Mining with its Applications in High Education
More informationBioinformatics: Network Analysis
Bioinformatics: Network Analysis Graph-theoretic Properties of Biological Networks COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Outline Architectural features Motifs, modules,
More informationComplexity and Scalability in Semantic Graph Analysis Semantic Days 2013
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation
More informationGENERATING AN ASSORTATIVE NETWORK WITH A GIVEN DEGREE DISTRIBUTION
International Journal of Bifurcation and Chaos, Vol. 18, o. 11 (2008) 3495 3502 c World Scientific Publishing Company GEERATIG A ASSORTATIVE ETWORK WITH A GIVE DEGREE DISTRIBUTIO JI ZHOU, XIAOKE XU, JIE
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationCOMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
More informationContinuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information
Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Eric Hsueh-Chan Lu Chi-Wei Huang Vincent S. Tseng Institute of Computer Science and Information Engineering
More informationBuilding Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu
Building Data Cubes and Mining Them Jelena Jovanovic Email: jeljov@fon.bg.ac.yu KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the
More informationHow To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
More informationTowards applying Data Mining Techniques for Talent Mangement
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationAnalysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet
More information