K-Means algorithm with different distance metrics in spatial data mining with uses of NetBeans IDE 8.2
Ms. Kothariya Arzoo 1, Asst. Prof. Kirit Rathod 2
1 M. Tech student, Computer Engineering, C.U. Shah College of Engineering and Technology, Gujarat, India
2 Asst. Prof., Computer Engineering, C.U. Shah College of Engineering and Technology, Gujarat, India

Abstract - Data mining is the process of finding useful information in large databases. Clustering is the process of grouping elements with the same characteristics into one group (cluster) and elements with distinct characteristics into different groups (clusters). K-means is a very simple and very popular clustering technique. In this paper we carry out experiments with spatial data mining. Spatial data mining is an application of data mining in which a spatial or geographic dataset is used. Distance metrics play a very important role in clustering techniques. The experiments are done in NetBeans IDE 8.2 on spatial data taken from an Indian government website. The paper presents an implementation analysis of k-means clustering with different distance metrics on a spatial database.

Key Words: Clustering, Spatial Data Mining, K-Means, Distance Metrics

1. INTRODUCTION

Data mining [1] [2] is the process of finding useful information. It consists of the steps of a) data cleaning, b) data selection, c) data integration, d) data mining, e) pattern evaluation and finally f) knowledge representation [2]. Nowadays a large amount of data is generated day by day, but not all of this data is useful to us, so data mining is essential in daily life.

Data clustering [3] is the process of placing similar data in one cluster and distinct data in different clusters, so that the distance between clusters (inter-cluster) is high and the distance within a cluster (intra-cluster) is low. These two are the clustering properties. Clustering is a popular research topic nowadays. Clustering algorithms are classified as partitioning-based clustering, hierarchical-based clustering, grid-based clustering, etc. [4]. A partitional clustering is simply a division of the set of data objects into non-overlapping subsets such that each data object is in exactly one subset. A hierarchical clustering is a set of nested clusters that are structured as a tree. The clustering process involves feature selection, the clustering algorithm, cluster validation, result interpretation and knowledge [5]. K-means [6] [7] is a very simple clustering algorithm in data mining. Clustering is an unsupervised technique [3] [4]: no test data or training data is available by which we can predict the result.

There are many research and review papers available for k-means and its variants [6]. Many parameters of the k-means algorithm can be modified, such as centroid initialization, distance metrics and accuracy improvement. The k-means algorithm has many limitations [8], such as handling empty clusters, outlier detection, distance metrics, a predefined number of clusters and randomly chosen cluster centers.

Spatial data mining [9] [10] [11] requires specific resources to get the spatial database in a specific format. The applications covered by spatial data mining include geomarketing, environmental studies, risk analysis, and so on. The aim of clustering is to automatically find groups of instances that are similar to each other. For example, in a classroom there is a cluster of students who are similar in that their birth dates fall in the same month. Using GIS, the user can query spatial data and perform simple analytical tasks using programs or queries. However, GIS are not designed to perform complex data analysis or knowledge discovery [12]. Many distance metrics [13] [14] [15] are available, such as Euclidean, Manhattan (city block), Chebychev and Minkowski.

1.1 K-Means Algorithm

The K-means [6] [7] algorithm involves randomly selecting K initial centroids or means, where K is a user-defined number of desired clusters. For each object, the distance between the data point and every center point is calculated, and the data point joins the cluster whose center is at minimum distance. These data points are far from the other clusters or groups. The computation stops when the center points no longer move. The k-means algorithm works as follows:

1. Initialize by setting the initial centroids for a predefined k.
2. Cluster or group the data points into the given k clusters.
3. Assign data points or objects to the nearest cluster center according to the distance function.
4. When all objects are assigned, recalculate or update the positions of the k centroids.
5. Repeat steps 3 and 4 until the centroids no longer move.

A diagrammatic representation of the k-means algorithm is given in Fig-1 [17].

2017, IRJET Impact Factor value: ISO 9001:2008 Certified Journal Page 2363
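The five steps of Section 1.1 can be sketched in a few lines. The sketch below is written in Python for brevity rather than in the Java/NetBeans setup used for the paper's experiments, and it is only an illustration under stated assumptions: the names `k_means` and `squared_euclidean`, the `max_iter` and `seed` parameters, and the rule of keeping the old centroid when a cluster becomes empty are all hypothetical choices, not taken from the paper.

```python
import random


def squared_euclidean(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means: returns (centroids, labels) for a list of tuples."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Steps 2-3: assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        # Step 4: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Step 5: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

For two well-separated groups such as `[(0, 0), (1, 0), (2, 0)]` and `[(100, 0), (101, 0), (102, 0)]` with `k=2`, the loop settles on one centroid per group at the group mean, whichever two points are drawn as the starting centroids.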
Fig-1: Flowchart of k means algorithm

1.2 DISTANCE METRICS

Distance metrics [13] [14] [15] play a very important role in measuring similarities between objects. They are used to find how far the data points are from the center points, and from this distance the algorithm decides which data objects belong to one cluster and which to another. Many different distance metrics are available, such as Euclidean, Manhattan (City Block) [16] and Chebychev. In general, the squared Euclidean distance metric is used in the k-means algorithm, but here we experiment with different distance metrics in the k-means algorithm using a spatial database. An overview of the different distance metrics follows.

1) Euclidean Distance: The Euclidean distance [13] between two points, a and b, with k dimensions is calculated as:

$$ d(a, b) = \sqrt{\sum_{j=1}^{k} (a_j - b_j)^2} $$

The Euclidean distance or Euclidean metric is the ordinary distance between two points that one would measure with a ruler. It is the straight-line distance between two points.

2) Manhattan (City Block) Distance: Manhattan distance [16] is also called city block distance because it is the distance a car would drive in a city laid out in square blocks, like Manhattan. The formula for calculating the Manhattan distance is as follows:

$$ d(a, b) = \sum_{j=1}^{k} |a_j - b_j| $$

Manhattan distance is also known as L1 distance. The distance between two points is the sum of the absolute differences of their coordinates. Absolute-value distance gives more robust results, whereas Euclidean distance is influenced by unusual values.

3) Chebychev Distance: The Chebychev (Tchebychev) distance [15] between two vectors is the greatest of their differences along any coordinate dimension. It is named after Pafnuty Chebychev. The formula for calculating the Chebychev distance is as follows:

$$ d(a, b) = \max_{1 \le j \le k} |a_j - b_j| $$

Chebychev distance is also known as maximum value difference. It is also known as chessboard distance, as it can be illustrated on the real number plane as the number of moves needed by a chess king to travel from one point to another.

4) Minkowski Distance: Minkowski distance [14] is defined as a generalization of both the Euclidean and Manhattan distance metrics. The formula for calculating the Minkowski distance is defined as follows:

$$ d(a, b) = \left( \sum_{j=1}^{k} |a_j - b_j|^{\lambda} \right)^{1/\lambda} $$

Different names for the Minkowski distance or Minkowski metric arise from the order λ:
λ = 1 is the Manhattan distance. Synonyms are L1-norm, Taxicab or City-Block distance. For two vectors of ranked ordinal variables the Manhattan distance is sometimes called Footruler distance.
λ = 2 is the Euclidean distance. Synonyms are L2-norm or Ruler distance. For two vectors of ranked ordinal variables the Euclidean distance is sometimes called Spearman distance.
λ = ∞ is the Chebychev distance. Synonyms are Lmax-norm or Chessboard distance.

The Minkowski distance is often used when variables are measured on ratio scales with an absolute zero value.

1.3 SPATIAL DATA MINING

Spatial data mining [8] [9] is the process of discovering useful, interesting, non-trivial patterns from large spatial databases. Spatial data mining (SDM) consists of extracting knowledge, spatial relationships and any other properties which are not explicitly stored in the database. SDM is used to find implicit regularities and relations between spatial data and/or non-spatial data [11]. It is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. Spatial data mining has many applications, such as risk analysis, earthquake analysis, flood analysis, medical research and military use. For example, if an earthquake occurs in some place, spatial data mining is very useful for measuring the damage it caused. Spatial data mining is also very useful in astronomy. So spatial data mining is a data mining application that deals with geometric or spatial information and is used to mine spatial data. Spatial Data Mining (SDM) [8] [9] is a process of discovering trends or patterns from large spatial databases that hold geographical data.
Objects in space such as roads, rivers, forests, deserts, buildings and cities are stored in a spatial database. Geographic data collection devices connected to global positioning system (GPS) receivers allow field researchers to collect unprecedented amounts of data. Position-aware devices such as cell phones, in-vehicle navigation systems and wireless Internet clients allow tracking of individual movement behaviour in space and time [11].

2. PROPOSED ALGORITHM

Methodology used in simple k-means. The formal algorithm is:
a) Select cluster centers randomly.
b) Calculate the distance between data points and center points using the Euclidean distance formula.
c) Assign data points to the cluster with minimum distance.
d) Repeat steps b and c until the centroids do not change.

In the K-Means algorithm [6] [7], we calculate the distance from each point of the dataset to every initialized centroid. Based on the values found, points are assigned to the centroid with minimum distance. The main goal of the proposed algorithm is to analyze how different distance metrics perform in the k-means clustering algorithm. In the proposed algorithm we modify the distance function of the basic k-means algorithm and analyze the result. I merge the two equations of the Euclidean distance and the Chebychev distance and apply the resulting distance formula in the basic k-means algorithm.

1) Euclidean distance formula:

$$ d(a, b) = \sqrt{\sum_{j=1}^{k} (a_j - b_j)^2} $$

2) Chebychev distance formula:

$$ d(a, b) = \max_{1 \le j \le k} |a_j - b_j| $$

Proposed equation: [equation not recoverable from the source]

So, the algorithm is as follows:
1. Select c cluster centers randomly.
2. Calculate the distance between each data point and the cluster centers using the new distance metric (the proposed equation above).
3. Assign each data point to the cluster center whose distance from the data point is the minimum over all cluster centers.
4. Calculate the new cluster center using:

$$ v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j $$

where c_i denotes the number of data points in the i-th cluster and x_j are the data points assigned to that cluster.
5. Recalculate the distance between each data point and the newly obtained cluster centers.
6. If no data point was reassigned then stop; otherwise repeat steps 3 to 5.

Table -1: Comparison table of old algorithm and new algorithm

Old Algorithm | New Algorithm
In the old algorithm only the Euclidean distance measure is used to find the distance between center points and data points. | In the new algorithm the new distance measure is used. The new algorithm merges two distance measures, Euclidean distance and Chebychev distance, to develop a new distance measure. This new distance measure is used to find the distance between data points and center points.

3. EXPERIMENTS AND RESULT ANALYSIS

Software requirements: To do the experiments I used NetBeans IDE 8.2. Some features of NetBeans are listed here:
1) best support for the latest Java technologies
2) fast and smart code editing
3) easy and efficient project management
4) rapid user interface development
5) write bug-free code
For more information about NetBeans, please visit the NetBeans website.
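The metrics compared in these experiments, and the assignment step (step 3) that depends on them, can be sketched as small functions. This is an illustrative Python sketch, not the Java code run in NetBeans for the paper: the function names and the `nearest_center` helper are hypothetical, and the proposed merged Euclidean/Chebychev metric is deliberately left out because its equation is not available in the text.

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebychev(a, b):
    """Chessboard (L-max) distance: largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, lam):
    """Minkowski distance of order lam (lam=1 Manhattan, lam=2 Euclidean)."""
    return sum(abs(x - y) ** lam for x, y in zip(a, b)) ** (1.0 / lam)


def nearest_center(point, centers, metric):
    """Index of the center closest to `point` under the given metric."""
    return min(range(len(centers)), key=lambda i: metric(point, centers[i]))


if __name__ == "__main__":
    point = (0, 0)
    centers = [(3, 3), (0, 4)]
    # The chosen metric can change which center counts as "nearest".
    for name, metric in [
        ("euclidean", euclidean),
        ("manhattan", manhattan),
        ("chebychev", chebychev),
        ("minkowski, order 4", lambda a, b: minkowski(a, b, 4)),
    ]:
        print(name, "->", nearest_center(point, centers, metric))
```

In this small case the Euclidean and Manhattan metrics assign the point to the second center, while the Chebychev metric and the Minkowski metric of order 4 assign it to the first. That is why swapping the metric inside k-means can change the resulting clusters as well as the running time.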
Fig-2: Screenshot of NetBeans IDE

3.1 SPATIAL DATA

Spatial data [12] is also known as geographical data. It is data that identifies the geographical location of features and boundaries on the earth, the oceans, etc. Spatial data has a spatial component, meaning that the data is connected to a place on the earth. Any data that depends on its location is called spatial data; even a human's weight is spatial data because it depends on location. Non-spatial data is information that is independent of all geometric considerations. For example, a human's age and height are non-spatial data.

Examples of non-spatial data: names, phone numbers, addresses of people.
Examples of spatial data: census data; NASA satellite imagery (terabytes of data per day); weather and climate data; rivers, farms, ecological impact; medical imaging.

In my experiments I used the data of rainfall_areawt_india_. I downloaded this data from an Indian government website, which is India's most reputable website for geographic datasets. Data is freely available on this website. Another website for geographic datasets is Bhuvan; to download data from the Bhuvan site you first have to register on the website, and then you can log in and download the data.

3.2 EXPERIMENTS

We did all experiments in NetBeans IDE 8.2. The screenshots of the output of the algorithm are as follows:

Fig-3: Screenshot of NetBeans IDE, where k means performed with Euclidean distance metric
Fig-4: Screenshot of NetBeans IDE, where k means performed with Chebychev distance metric
Fig-5: Screenshot of NetBeans IDE, where k means performed with Minkowski distance metric with p=4
Fig-6: Screenshot of NetBeans IDE, where k means performed with Minkowski distance metric with p=5
Fig-7: Screenshot of NetBeans IDE, where k means performed with Manhattan distance metric
Fig-8: Screenshot of NetBeans IDE, where k means performed with new distance metric

3.3 COMPARISON OF DIFFERENT DISTANCE METRICS ON THE BASIS OF TIME

Chart -1: Comparison chart

4. CONCLUSIONS

From my experiments and literature review I conclude that, in general, the Euclidean distance metric is the one mostly used in k-means algorithms. We tried different distance metrics in the k-means algorithm using a spatial database and conclude that, on the basis of time, the new distance metric described in the proposed algorithm works better than the Euclidean distance and the Minkowski distance with two different values (p=4 and p=5), while the Chebychev and Manhattan distance metrics take less time than the new distance metric.

REFERENCES

[1] Venkatadri M., Dr. Lokanatha C. Reddy, A Review on Data Mining from Past to the Future, International Journal of Computer Applications ( ), Volume 15, No. 7, February 2011.
[2] J. Han, M. Kamber, Data Mining, Morgan Kaufmann Publishers.
[3] A.K. Jain, M. N. Murty, P. J. Flynn, Data Clustering: A Review, ACM Computing Surveys, vol. 31, pp , Sep
[4] P. Berkhin (2001), Survey of Clustering Data Mining Techniques [Online]. Available: ew.pdf.
[5] Rui Xu, Donald C. Wunsch II, Survey of Clustering Algorithms, IEEE Transactions on Neural Networks, vol. 16, pp , May 2005.
[6] Aakansha Chaudhry, Survey of K-Means and its Variants, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 4, Issue 1, 2016.
[7] Kapil Joshi, Himanshu Gupta, Prashant Chaudhry, Punit Sharma, Survey on Different Enhanced K-Means Clustering Algorithm, International Journal of Engineering Trends and Technology (IJETT).
[8] Aakunuri Manjula, Dr. G. Narsimha, A Review on Spatial Data Mining Methods and Applications, International Journal of Computer Engineering and Applications, Volume VII, Issue I, Part II, July 14.
[9] M. Hemalatha, N. Naga Saranya, A Recent Survey on Knowledge Discovery in Spatial Data Mining, IJCI International Journal of Computer Science, Vol. 8, Issue 3, No. 2, May 2011.
[10] Harvey J. Miller, Jiawei Han, Geographic Data Mining and Knowledge Discovery: An Overview, GKD chapter 1 v8.
[11] Kehar Singh, Dimple Malik, Naveen Sharma, Evolving Limitations in K-Means Algorithm in Data Mining and Their Removal, IJCEM International Journal of Computational Engineering & Management, Vol. 12, April 2011.
[12] Gangireddy Ravikumar, Mallireddy Sivareddy, An Effective Analysis of Spatial Data Mining Methods Using Range Queries, Journal of Global Research in Computer Science.
[13] Archana Singh, Avantika Yadav, Ajay Rana, K-Means with Three Different Distance Metrics, International Journal of Computer Applications ( ), Volume 67, No. 10, April 2013.
[14] B.S. Charulatha, Paul Rodrigues, T. Chitralekha, Arun Rajaraman (Member IEEE), A Comparative Study of Different Distance Metrics That Can Be Used in Fuzzy Clustering Algorithms, International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), Special Issue, ISSN
[15] Dibya Jyoti Bora, Dr. Anil Kumar Gupta, Effect of Different Distance Measures on the Performance of K-Means Algorithm: An Experimental Study in Matlab, (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (2), 2014.
[16] ple.revoledu.com/kardi/tutorial/Similiarity/CityBlockDistance.html
[17]
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
COURSE RECOMMENDER SYSTEM IN E-LEARNING
International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand
Information Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 10 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig
USING THE AGGLOMERATIVE METHOD OF HIERARCHICAL CLUSTERING AS A DATA MINING TOOL IN CAPITAL MARKET 1. Vera Marinova Boncheva
382 [7] Reznik, A, Kussul, N., Sokolov, A.: Identification of user activity using neural networks. Cybernetics and computer techniques, vol. 123 (1999) 70 79. (in Russian) [8] Kussul, N., et al. : Multi-Agent
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 [email protected] What is Learning? "Learning denotes changes in a system that enable
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Implementation of Data Mining Techniques to Perform Market Analysis
Implementation of Data Mining Techniques to Perform Market Analysis B.Sabitha 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, P.Balasubramanian 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Pattern Recognition Using Feature Based Die-Map Clusteringin the Semiconductor Manufacturing Process
Pattern Recognition Using Feature Based Die-Map Clusteringin the Semiconductor Manufacturing Process Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek Abstract Depending
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
Keywords Data Mining, Knowledge Discovery, Direct Marketing, Classification Techniques, Customer Relationship Management
Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Simplified Data
Web Usage Mining: Identification of Trends Followed by the user through Neural Network
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 617-624 International Research Publications House http://www. irphouse.com /ijict.htm Web
Map-Reduce Algorithm for Mining Outliers in the Large Data Sets using Twister Programming Model
Map-Reduce Algorithm for Mining Outliers in the Large Data Sets using Twister Programming Model Subramanyam. RBV, Sonam. Gupta Abstract An important problem that appears often when analyzing data involves
Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
Clustering Breast Cancer Dataset using Self- Organizing Map Method
International Journal of Advanced Studies in Computer Science and Engineering Clustering Breast Cancer Dataset using Self- Organizing Map Method M. Thenmozhi Assistant Professor Dept. of Computer Science
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
Study and Analysis of Data Mining Concepts
Study and Analysis of Data Mining Concepts M.Parvathi Head/Department of Computer Applications Senthamarai college of Arts and Science,Madurai,TamilNadu,India/ Dr. S.Thabasu Kannan Principal Pannai College
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE 1 K.Murugan, 2 P.Varalakshmi, 3 R.Nandha Kumar, 4 S.Boobalan 1 Teaching Fellow, Department of Computer Technology, Anna University 2 Assistant
A New Approach for Evaluation of Data Mining Techniques
181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Drug Store Sales Prediction
Drug Store Sales Prediction Chenghao Wang, Yang Li Abstract - In this paper we tried to apply machine learning algorithm into a real world problem drug store sales forecasting. Given store information,
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique
A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique Aida Parbaleh 1, Dr. Heirsh Soltanpanah 2* 1 Department of Computer Engineering, Islamic Azad University, Sanandaj
Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015
ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti
Mining for Web Engineering
Mining for Engineering A. Venkata Krishna Prasad 1, Prof. S.Ramakrishna 2 1 Associate Professor, Department of Computer Science, MIPGS, Hyderabad 2 Professor, Department of Computer Science, Sri Venkateswara
HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM
HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM Ms.Barkha Malay Joshi M.E. Computer Science and Engineering, Parul Institute Of Engineering & Technology, Waghodia. India Email:
Unsupervised learning: Clustering
Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What
User Behavior Analysis Based On Predictive Recommendation System for E-Learning Portal
Abstract ISSN: 2348 9510 User Behavior Analysis Based On Predictive Recommendation System for E-Learning Portal Toshi Sharma Department of CSE Truba College of Engineering & Technology Indore, India [email protected]
