International Journal of Advance Research in Computer Science and Management Studies
|
|
|
- Oswin Bates
- 9 years ago
- Views:
Transcription
1 Volume 2, Issue 12, December 2014 ISSN: (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: An Efficient Mechanism for Data Mining with Clustering and Classification Analysis as a Hybrid Approach M. Nageshu 1 Research Scholar, Department of Computer Science, Sri Venkateswara University, Tirupati, India Dr. E. Kesavulu Reddy 2 Assistant Professor, Department of Computer Science, Sri Venkateswara University, Tirupati, India Abstract: Appearance of modern techniques for scientific records collection has resulted in big scale accumulation of data pertaining to various fields. Predictable database querying methods are insufficient to extract functional information from enormous data banks. In this research, we are using clustering with classification and decision tree methods to mine the data by using hybrid algorithms like EM, K-MEANS and HAC algorithms from clustering, J48 and C4.5 algorithms from decision making and it can generate the improved outcome than the conventional algorithms. It also performs the proportional study of these algorithms to acquire elevated accuracy. This contrast is capable to discover clusters in huge high dimensional spaces proficiently. It is appropriate for clustering in the complete dimensional space as fit as in subspaces. Experiments on both synthetic data and real-life data demonstrate that the method is successful and also balance well for huge high dimensional datasets. Keywords: HAC, EM, J48, C4.5, K-Means, Data Mining, Clustering, Decision tree. I. INTRODUCTION Today Information Technology plays a very important function in every aspects of the human life. It is extremely necessary to collect data from dissimilar sources. This data can be stored and maintained to create information and knowledge. Data mining is the non trivial procedure of identifying suitable, original, potentially helpful and eventually understandable patterns in data. With the extensive use of databases and the volatile development in their sizes, organizations are faced with the problem of information burden. The problem of effectively utilize these huge volumes of data is becoming a foremost problem or all enterprise. We used special algorithms to mine the precious data. To extract the data we use these significant steps or tasks: Classification use to classify the data items into the predefined classes and discover the replica to analysis. Regression identifies real valued variables. Clustering use to illustrate the data and categories into comparable objects in groups, get the dependencies between variables, extract the data via tools. Cluster analysis groups objects based on the information found in the data relating the objects or their relationships. Clustering is a tool for data analysis, which solves classification problems. Its object is to allocate cases into groups, so that the quantity of association to be strong between members of the identical cluster and weak between members of dissimilar clusters. This way every cluster describes, in requisites of data collected, the class to which its members belong. Classification is an essential task in data mining. It belongs to intended for learning and the core methods include decision tree, neural network and genetic algorithm. Decision tree build its finest tree form by selecting essential relationship features. Special decision making methods will adopt different technologies to resolve these problems. II. BACK GROUND Data mining is the study and analysis of big data sets, in order to find out meaningful pattern and rules. The key scheme is to discover useful way to merge the computer s power to process the data with the human eye s ability to identify patterns. The idea of data mining is to plan and work professionally with large data sets. Data mining is the component of wider procedure 2014, IJARCSMS All Rights Reserved 405 P age
2 called knowledge discovery from database. Data Mining is the procedure of analyze data from dissimilar perspectives the results as positive information. It has been defined as "the nontrivial process of identifying suitable, original, potentially useful, and eventually reasonable patterns in data" The meaning of data mining is directly associated to one more usually used word knowledge discovery. Data mining is an interdisciplinary, incorporated database, artificial intelligence, machine learning, statistics, etc. Several areas of theory and technology in present era are databases, artificial intelligence, data mining and statistics is a study of three strong large expertise pillars. Data mining is a multi-step process; require accessing and preparing data for a mining the data, data mining algorithm, analyzing consequences and taking suitable action. The data, which is accessed can be stored in one or further operational databases. In data mining the information can be mined by passing different processes. A. Data Mining Application Areas Figure 1: Process of Knowledge Discovery Data mining is determined in part by latest applications which necessitate new capabilities that are not at present being supplied by today s technology. These new applications can be logically into the following categories.» Business and E-Commerce» Scientific, Engineering and Health Care Data» Telecommunication Industry» Biological Data Analysis» Intrusion Detection» Other Scientific Applications B. Data Mining Tasks Data mining tasks are principally classified into the following wide categories:» Classification/Prediction» Estimation» Association» Clustering» Visualization 2014, IJARCSMS All Rights Reserved ISSN: (Online) 406 P age
3 C. Issues In Data Mining Data mining has evolved into a significant and dynamic area of research because of the hypothetical challenges and realistic applications connected with the problem of discovering exciting and previously unknown knowledge from real-world databases. The major challenge to the data mining and the equivalent consideration in scheming the algorithms are as follows: 1. Huge datasets and high dimensionality. 2. Over fitting and assessing the statistical implication. 3. Understandability of patterns. 4. Non-standard unfinished data and data integration. 5. Diverse changing and redundant data. III. DECISION TREE ANALYSIS Decision tree/making is a popular classification method that results in a flow chart like tree arrangement where every node denotes a test on an attribute value and every branch represents an outcome of the test. Decision tree is a supervised data mining technique. It can be used to partition a big collection of data in to smaller sets by recursively applying two-way and /or multi way. Using the data, the decision tree technique generates a tree that consists of nodes that are rules. Every leaf node represents a classification or a decision. The training process that generates the tree is called induction. It has been shown in different studies that employing pruning methods can develop the overview performance of a decision tree, particularly in noisy domains according to this methodology; a loosely stopping criterion is used, letting the decision tree to over fit the training set. Then the over-fitted tree is cut back into a minor tree by removing sub branches that are not contributing to the simplification accuracy. IV. HYBRID APPROACH We are using Hybrid techniques of clustering and decision tree method for large dimensional dataset. Clustering analysis is an essential and popular data analysis technique that is great mixture of fields. Clustering and decision tree are the frequently used methods of data mining. Clustering can be used for describing and decision tree can be applied to analyzing. After combining these two methods successfully we evaluate the usefulness of clustering data mining algorithms HAC and EM with the conventional algorithms with using decision tree algorithms C4.5 and J48 by applying them to data sets. After using the hybridization, algorithms construct the greatest classification accuracy and also show the advanced robustness and generalization capability compared to the further algorithms. V. PROPOSED WORK We contrast the efficiency of two stage clustering and decision tree data mining algorithms by applying them to data sets. Testing results will show that like two stage algorithms create the greatest classification accuracy and also demonstrate the higher robustness and simplification capability compared to the other traditional algorithms. Our approach can eliminate the shortcoming of hybridization of algorithms (clustering and decision tree algorithms) and develop the results on applying them to data sets. Our approach gives us efficient results, improved performance and reduces the error rate than the traditional algorithms of clustering and classification in data mining. We consider the absolute value equation (AVE): Ax- x =b Where A R n n and b R n are given, and denotes absolute value. A slightly more common form of the AVE, Ax + B x = b was introduced for investigated algorithmically in a novel general context. Our current hybrid approach consists of an iterative process whereby we first solve a system of linear equations based on AVE. 2014, IJARCSMS All Rights Reserved ISSN: (Online) 407 P age
4 A. K- Means Method 1. Select k points as the initial centroids in a random way. 2. (Re) allocate all objects to the closest centroid. 3. Recalculate the centroid of every cluster. 4. Repeat steps 2 and 3 until a termination criterion is met. 5. Pass the clarification to the next stage. B. HAC (Hierarchical Agglomerative Clustering) 1. Calculate the proximity matrix containing the distance between every pair of patterns. Treat every pattern as a cluster. 2. Find the majority comparable pair of clusters using the proximity matrix. Combine these two clusters into one cluster. Modernize the proximity matrix to reflect this merge operation. 3. If all patterns are in one cluster, stop. Otherwise, go to step 2. C. EM (Expectation Maximization) Although the fact that EM can infrequently get stuck in a limited maximum as you guess the parameters by maximizing the log likelihood of the experimental data, in my mind there are three things that construct it magical:» The capability to concurrently optimize a huge number of variables.» The ability to discover good estimates for any absent information in your data at the same time.» In the situation of clustering multidimensional data that lends itself to modeling by a Gaussian mixture, the ability to generate both the traditional hard clusters and not-so- traditional soft clusters. D. J48 Classification is the procedure of build a model of classes from a set of records that hold class labels. Decision Tree Algorithm is to discover exposed the way the attributes-vector behaves for a number of instances. Furthermore on the bases of the training instances the classes for the recently generated instances is being originated. This algorithm generates the rules for the prediction of the objective variable. With facilitate of tree classification algorithm the significant distribution of the data is easily understandable. J48 creates a binary tree. The decision tree approach is most useful in classification problem. With this technique, a tree is constructed to model the classification process. Once the tree is built, it is useful to each tuple in the database and results in classification for that tuple. D. C4.5 C4.5 is a development of ID3, presented by the same author (Quinlan, 1993). It uses gain ratio as splitting criteria. The splitting ceases when the number of instances to be split is below a certain threshold. Error based pruning is performed after the mounting phase. C4.5 can hold numeric attributes. It can induce from a training set that incorporates missing values by using corrected gain ratio criteria. VI. PERFORMANCE EVALUATION This section has shown the evaluation of the dissimilar data mining algorithms applied on several data sets with data mining toolkit. The formula to estimate accuracy is: TP TN TP TN FP FN (1) 2014, IJARCSMS All Rights Reserved ISSN: (Online) 408 P age
5 TP FP TN FN FN TP FP TP (2) TOTAL TOTAL In the equation (1) TA represents Total Accuracy, TP is True Positive, TN is True Negative, FP is False Positive and FN is False Negative. In equation (2) RA represents Random Accuracy. Table 1: Prediction Performance Comparison of Algorithms ALGORITHM ACCURACY ERROR K-Means HAC EM J C VII. CONCLUSION This research work can improve the performance of traditional algorithms Like K-Means and presents a hybrid approach like algorithm HAC, EM, J48 and C4.5 and for mining large-scale high dimensional datasets. The frequently used algorithm is K-Means which can deal with small convex datasets preferably. It reduces the error rate and achieves accuracy. This research compares the efficiency of these clustering and classification data mining algorithms by applying them to data sets. Experiment results will show that the K-Means and J48 algorithms generate the best classification accuracy and also show the high robustness and generalization capacity compared to the other algorithms. References 1. Usama Fayyad, G. Paitetsky-Shapiro, and Padhrais Smith, knowledge discovery and data mining: Towards a unifying framework, proceedings of the International Conference on Knowledge Discovery and Data Mining, pp , ZhaoHui Tang, Jamie MacLennan, Data Mining with SQL server 2005, John Wiley & Sons, Yasogha, P; M. Kannan, Analysis of a Population of Diabetics Patients Databases in Weka Tool, International Journal of Science & Engineering Research, Vol. 2, Issue 5, May Jinxin Gao, David B. Hitchcock, James-Stein Shrinkage to Improve K-means Cluster Analysis, University of South Carolina,Department of Statistics November 30, A. P. Dempster; N. M. Laird; D. B. Rubin, Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1. Pp.1-38, M. and Heckerman, D., An experimental comparison of several clustering and initialization method, Technical Report MSRTR-98-06, Microsoft Research, Redmond, WA, February, Mrs. Bharati M. Ramageri, Data Mining Techniques and Applications, Indian Journal of Computer Science and Engineering, Vol. 1 No. 4, pp , Karimella Vikram and Niraj Upadhayaya, Data Mining Tools and Techniques: a review, Computer Engineering and Intelligent Systems, Vol 2, No.8, pp.31-39, Fayyad, U., Mining Databases: Towards Algorithms for Knowledge Discovery, IEEE Bulletin of the Technical Committee on Data Engineering, 21, 1, pp.41-48, Zhong, N.; Zhou, L., Methodologies for Knowledge Discovery and Data Mining, the Third Pacific-Asia Conference, Pakdd-99, Beijing, China, April 26-28, 1999; Proceedings, Springer Verlag,, Abdolreza Hatamlo and Salwani Abdullah A Two-Stage Algorithm for Data Clustering Int Conf. Data Mining DMIN, pp , Ji Dan, Qiu Jianlin, A Synthesized Data Mining Algorithm Based on Clustering and Decision Tree, 10th IEEE International Conference on Computer and Information Technology, CIT, Ng, R.T. and Han, J., Efficient and effective clustering methods for spatial data mining. In Proc. 20th Int. Conf. on Very Large Data Bases, Santiago, Chile, pp , Agrawal, R., Imielinski, T., and Swami, A., Database mining: A performance perspective, IEEE Transactions on Knowledge and Data Engineering, 5.6, pp , , IJARCSMS All Rights Reserved ISSN: (Online) 409 P age
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS
ANALYSIS OF VARIOUS CLUSTERING ALGORITHMS OF DATA MINING ON HEALTH INFORMATICS 1 PANKAJ SAXENA & 2 SUSHMA LEHRI 1 Deptt. Of Computer Applications, RBS Management Techanical Campus, Agra 2 Institute of
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
Clustering UE 141 Spring 2013
Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or
Rule based Classification of BSE Stock Data with Data Mining
International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
Performance Analysis of Decision Trees
Performance Analysis of Decision Trees Manpreet Singh Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India Sonam Sharma CBS Group of Institutions, New Delhi,India
Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool
Comparison and Analysis of Various Clustering Metho in Data mining On Education data set Using the weak tool Abstract:- Data mining is used to find the hidden information pattern and relationship between
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,
Static Data Mining Algorithm with Progressive Approach for Mining Knowledge
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
A Two-Step Method for Clustering Mixed Categroical and Numeric Data
Tamkang Journal of Science and Engineering, Vol. 13, No. 1, pp. 11 19 (2010) 11 A Two-Step Method for Clustering Mixed Categroical and Numeric Data Ming-Yi Shih*, Jar-Wen Jheng and Lien-Fu Lai Department
Comparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
A comparison of various clustering methods and algorithms in data mining
Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering
Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
Classification and Prediction
Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser
Introduction to Data Mining Techniques
Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and
Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Data Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
Binary Coded Web Access Pattern Tree in Education Domain
Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: [email protected] M. Moorthi
Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING
AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations
Data mining techniques: decision trees
Data mining techniques: decision trees 1/39 Agenda Rule systems Building rule systems vs rule systems Quick reference 2/39 1 Agenda Rule systems Building rule systems vs rule systems Quick reference 3/39
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE
DATA MINING USING INTEGRATION OF CLUSTERING AND DECISION TREE 1 K.Murugan, 2 P.Varalakshmi, 3 R.Nandha Kumar, 4 S.Boobalan 1 Teaching Fellow, Department of Computer Technology, Anna University 2 Assistant
Robust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
REVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
Keywords data mining, prediction techniques, decision making.
Volume 5, Issue 4, April 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analysis of Datamining
Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
Classification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate
Enhanced data mining analysis in higher educational system using rough set theory
African Journal of Mathematics and Computer Science Research Vol. 2(9), pp. 184-188, October, 2009 Available online at http://www.academicjournals.org/ajmcsr ISSN 2006-9731 2009 Academic Journals Review
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand
Data Mining in the Application of Criminal Cases Based on Decision Tree
8 Journal of Computer Science and Information Technology, Vol. 1 No. 2, December 2013 Data Mining in the Application of Criminal Cases Based on Decision Tree Ruijuan Hu 1 Abstract A briefing on data mining
Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is
Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA
PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,
Decision Tree Learning on Very Large Data Sets
Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa
Hadoop Operations Management for Big Data Clusters in Telecommunication Industry
Hadoop Operations Management for Big Data Clusters in Telecommunication Industry N. Kamalraj Asst. Prof., Department of Computer Technology Dr. SNS Rajalakshmi College of Arts and Science Coimbatore-49
SVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
Effective Data Mining Using Neural Networks
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 8, NO. 6, DECEMBER 1996 957 Effective Data Mining Using Neural Networks Hongjun Lu, Member, IEEE Computer Society, Rudy Setiono, and Huan Liu,
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
Clustering on Large Numeric Data Sets Using Hierarchical Approach Birch
Global Journal of Computer Science and Technology Software & Data Engineering Volume 12 Issue 12 Version 1.0 Year 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
Distributed Framework for Data Mining As a Service on Private Cloud
RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance
Consolidated Tree Classifier Learning in a Car Insurance Fraud Detection Domain with Class Imbalance Jesús M. Pérez, Javier Muguerza, Olatz Arbelaitz, Ibai Gurrutxaga, and José I. Martín Dept. of Computer
ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining
Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II Office 319, Omega, BCN EET, office 107, TR 2, Terrassa [email protected] skype, gtalk: avellido Tels.:
COURSE RECOMMENDER SYSTEM IN E-LEARNING
International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand
Building Data Cubes and Mining Them. Jelena Jovanovic Email: [email protected]
Building Data Cubes and Mining Them Jelena Jovanovic Email: [email protected] KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the
FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
Weather forecast prediction: a Data Mining application
Weather forecast prediction: a Data Mining application Ms. Ashwini Mandale, Mrs. Jadhawar B.A. Assistant professor, Dr.Daulatrao Aher College of engg,karad,[email protected],8407974457 Abstract
A Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
Data Mining Applications in Fund Raising
Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
NEURAL NETWORKS IN DATA MINING
NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,
Classification On The Clouds Using MapReduce
Classification On The Clouds Using MapReduce Simão Martins Instituto Superior Técnico Lisbon, Portugal [email protected] Cláudia Antunes Instituto Superior Técnico Lisbon, Portugal [email protected]
ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 2, Issue 10, October 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Optimization of C4.5 Decision Tree Algorithm for Data Mining Application
Optimization of C4.5 Decision Tree Algorithm for Data Mining Application Gaurav L. Agrawal 1, Prof. Hitesh Gupta 2 1 PG Student, Department of CSE, PCST, Bhopal, India 2 Head of Department CSE, PCST, Bhopal,
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean
Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. by Philip Kostov and Seamus McErlean Working Paper, Agricultural and Food Economics, Queen
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.
International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant
Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate
International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 11, November 2015 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
