Data Mining in Education: Data Classification and Decision Tree Approach
Sonali Agarwal, G. N. Pandey, and M. D. Tiwari

Abstract: Educational organizations are an important part of our society and play a vital role in the growth and development of any nation. Data mining is an emerging technique with which one can learn efficiently from historical data and use that knowledge to predict future behavior in areas of concern. The growth of the current education system can be enhanced if data mining is adopted as a forward-looking strategic management tool. Data mining can facilitate better resource utilization in terms of student performance, course development and, finally, the development of the nation's education-related standards. In this paper, student data from a community college database is taken, various classification approaches are applied, and a comparative analysis is carried out. In this work, the Support Vector Machine (SVM) is established as the best classifier, with maximum accuracy and minimum root mean square error (RMSE). The study also includes a comparative analysis of all SVM kernel types, in which the radial basis kernel is identified as the best choice for the SVM. A decision tree approach is proposed which may be taken as an important basis for selecting students for any course program. The paper aims to build confidence in data mining techniques so that present education and business systems may adopt them as a strategic management tool.

Index Terms: Data mining, education data mining, data classification, support vector machine, decision tree.

I. INTRODUCTION

As we grow in terms of population, technological advancement, literacy rate and globalization, education systems are also taking a new shape as business systems. Data mining has now become a universal tool for strategic management in business organizations as well as in social systems and organizations.
Manuscript received January 9, 2012; revised March 6. S. Agarwal is working as Assistant Professor at the Indian Institute of Information Technology, Allahabad, a Center of Excellence in IT established by the Human Resource Development Ministry, Government of India, New Delhi. G. N. Pandey is working as Adjunct Professor at the Indian Institute of Information Technology, Allahabad. M. D. Tiwari is the Director of the Indian Institute of Information Technology, Allahabad.

Today, of all the infrastructural support for the development of society, education is considered one of the key inputs for social development. This paper analyzes the data available in students' academic records. A student's likelihood of placement may be predicted on the basis of entrance examination marks, quantitative ability marks and verbal ability marks by using a decision tree approach and a decision rule approach. Considering global opportunities coupled with global competition, even in education, it is essential to admit the best students as far as possible, so that their academic performance and subsequent placements are the best in the world.

Data mining is useful whenever a system deals with large data sets. In any education system, student records, i.e., enrollment details, course eligibility criteria, course interest and academic performance, may be important for analyzing various trends. Since all such systems are now computer-based information systems, data availability, modification and updating are common processes. Data warehousing may be taken as a good choice for maintaining historical records. A data warehouse can be easily developed in any educational institute with the adoption of a common data standard.
Common data standards may eliminate the need for data clarification and modification before data is loaded into the data warehouse. An institute with an efficient data warehousing and data mining approach can find novel ways of improving students' behavior, success rate and course popularity. All these efforts may ultimately improve the quality of education, student intake, career counseling and the overall practices of the education system.

In data mining, classification, clustering and regression are the three key approaches. Classification is a supervised learning approach in which students are grouped into identified classes [1]. Classification rules may be identified from a part of the data known as training data and then tested on the rest of the data [2]. The effectiveness of a classification approach may be evaluated in terms of the reliability of the rules on the test data set. Clustering is based on unsupervised learning because there are no predefined classes. In this approach, data may be grouped together as clusters [2], [3]. The usability of the clusters for the relevant area may be interpreted by a data mining expert. Regression is a data mining approach that uses explanatory variables to predict an outcome variable. For example, performance appraisal of faculty members may be done by regression analysis. Here, faculty qualification, feedback rating and amount of content covered may be taken as explanatory variables, and faculty salary, increment, bonus and perks may be estimated as outcome variables. Regression may thus be a good way of setting a few important parameters based on existing variables [2].

II. RELATED WORK

As a part of this research work, more than 25 papers have been explored and thoroughly studied. A few papers related to education data mining are highlighted here. Salazar et al. suggested a clustering and decision rule based data mining approach to identify groups of clusters, which were qualitatively described [4]. The paper used 20,000 student records and WEKA as the data mining tool. M. Ramaswami et al. suggested a CHi-squared Automatic Interaction Detector (CHAID) based approach to analyze the performance of higher secondary students [5]. The study reveals that a few parameters, such as the student's school type, location, family background, educational organization and medium of teaching (Hindi or English), are key factors in predicting performance in higher education. Cortez et al. proposed a Bayesian network based approach for student data classification [6]. That work implements binary classification, five-level classification and a regression-based analysis to compute the predictive performance of classification algorithms, along with descriptive knowledge analysis for better quality management in educational organizations. Qasem et al. discussed a decision tree approach based on C4.5, ID3 and Naïve Bayes. That work proposed a Cross Industry Standard Process for Data Mining (CRISP-DM) based classification model for student performance evaluation [7].

A critical literature review suggests that the majority of the work is based on data classification using only one specific approach or classifier. The aim should be not only to find a solution to the specified problem but also to identify the best approach. In the present work, several classifiers are taken together and applied to the dataset in order to select the best classifier for the identified problem. A comparative analysis is essential to establish the optimum approach, and this is the ultimate goal of the present investigation.

III. PROPOSED MODEL

A typical data warehousing and data mining application includes data collection from heterogeneous data sources, cleaning and transformation of data, and periodic updating of data fields [8]. A data warehouse for education data mining may include student personal details, academic details, examination details and accounting details. In a student data cube, the dimensions are data attributes and their values are stored in the cells. Figure 1 shows a student data cube with name, verbal ability and MAT score as attributes. Data viewing operations, i.e., roll-up and drill-down, may be performed on the multidimensional data cube [9]. Aggregate details of the student are stored in the individual cells of the data cube. Any specific view of the data cube is considered a data mart. Department-wise, course-wise or subject-wise data marts represent a close view of the data cube. The paper is limited to a theoretical model of the student data cube because it is an essential part of any data warehouse [10].

Fig. 1. Student data cube.

For the implementation of data classification, a student dataset from a community college database has been taken. The dataset has 4 attributes and 2000 records of student performance details. The attributes are MAT score, verbal ability score, quantitative ability score and likelihood of placement.

Data processing is required to make the dataset appropriate for the various classification algorithms. Here, numerical data are converted to nominal data. Verbal ability score, quantitative ability score and MAT score have been converted to categories such as Low, Medium and High. The likelihood of placement is defined by two categories, Yes and No.

It is essential to have a suitable data mining tool for carrying out the analysis of the available data. The software WEKA is suitable because it is open source. WEKA can work efficiently with limited data [11]. WEKA also provides convenient data preprocessing, cleaning and handling of missing values. It takes data from an Excel file in Comma Separated Values (CSV) format. Excel is very common application software used in schools and colleges for the initial collection of data [12]. WEKA contains tools for a whole range of data mining tasks such as data pre-processing, classification, clustering, association and visualization.

IV. CLASSIFICATION METHODS

WEKA has approximately 40 classifiers divided into 4 groups. With the help of its Explorer tool, any classifier may be applied to the available data set. This work has chosen 8 different classifiers for a comparative analysis of classifier performance. As shown in Figure 2, LIBSVM classification accuracy is 97.3%, RBFNetwork accuracy is 96.05% and Multilayer Perceptron accuracy is 95.85%. Minimum accuracy is given by the Logistic, Simple Logistic and Voted Perceptron classifiers.

Fig. 2. Classification accuracy for WEKA classifiers.

Another observation is made in terms of root relative squared error, as shown in Figure 3. It indicates the accuracy of the predicted values [13]. Here again LIBSVM has the minimum root relative squared error. For LIBSVM, the study is further extended to observe the performance of various kernel types. It is observed that the radial basis kernel has the maximum accuracy and the sigmoid kernel the minimum accuracy [14].

TABLE I: THE WEKA CLASSIFIERS AND THEIR WORKING PRINCIPLES
Classifier | Details
Logistic | A multi-category logistic regression model is used for data classification [15]
Multilayer Perceptron | A back-propagation approach is used to classify the datasets [15]
LIBSVM | A wrapper class for the libsvm tools for implementing the support vector machine classifier [15], [16]
RBFNetwork | Uses normalized Gaussian radial basis functions for data classification [15]
Simple Logistic | A simple linear regression model based classifier [15]
SMO | An implementation of the sequential minimal optimization algorithm for training a support vector classifier [17]
Voted Perceptron | Based on the voted perceptron algorithm by Freund and Schapire [15]
Winnow | Implements the Winnow and Balanced Winnow algorithms by Littlestone [15]

The above study suggests that data with categorical attributes is classified very well when LIBSVM with a radial basis kernel is chosen for data classification [18]. A comparative analysis of LIBSVM with various kernel types is shown in Fig. 4.

Fig. 3. Root relative squared error for WEKA classifiers.

Fig. 4. LIBSVM classification accuracy for various kernel types.

V. MINING DATA WITH DECISION TREE

A typical decision tree approach is an effective way of generating interesting classification rules. In the decision tree approach, an attribute required for analysis is taken as the starting node. The attribute is first classified into groups, and then the next important attribute is taken and classified under certain considerations. For the student dataset, verbal ability, quantitative ability and MAT score are all important for predicting student placement. Taking two of these criteria, suppose a graph is plotted with MAT score on the X axis and verbal ability score on the Y axis. Figure 5 shows a scatter plot of verbal ability and MAT score. In the scatter plot, the two classes, i.e., students who were successful and unsuccessful in terms of placement, are represented by + and - signs. The X axis shows MAT score ranging from the 90th to the 100th percentile and the Y axis shows verbal ability score ranging from the 90th to the 100th percentile.

Fig. 5. Scatter plot of verbal ability and MAT score.

Fig. 6. Student data separated by verbal ability and MAT score.

From the observed range, it is clear that the cut-off for verbal ability is the 90th percentile for admission to the particular business school. The minimum MAT score taken for admission is the 90th percentile and the maximum score observed is 100. Figure 6 shows a graph plotted for the two dimensions. One can easily observe that successful and unsuccessful students cannot be separated by taking any one parameter alone. A high MAT score or a high verbal ability score alone may not predict anything about the success rate, but the combination of the two may yield rules in terms of success rates. The decision tree approach first takes a parameter that may be considered an important decision-making parameter. The rules are as follows; the number in parentheses follows each rule as given in the rule output.

1. class YES IF: MAT_SCORE in {HIGH} ^ Quant_SCORE in {LOW} ^ VA_SCORE in {LOW} (159)
2. class YES IF: MAT_SCORE in {LOW,MEDIUM,HIGH} ^ Quant_SCORE in {HIGH} ^ VA_SCORE in {HIGH} (691)
3. class YES IF: MAT_SCORE in {MEDIUM,HIGH} ^ Quant_SCORE in {LOW} ^ VA_SCORE in {HIGH} (268)
4. class NO IF: MAT_SCORE in {HIGH} ^ Quant_SCORE in {HIGH} ^ VA_SCORE in {LOW} (59)
5. class NO IF: MAT_SCORE in {LOW,MEDIUM} ^ Quant_SCORE in {LOW,HIGH} ^ VA_SCORE in {LOW} (644)
6. class NO IF: MAT_SCORE in {LOW} ^ Quant_SCORE in {LOW} ^ VA_SCORE in {HIGH} (65)

So under rules (1) to (3), the likelihood of placement is high. The rules suggest that a student with a high MAT score and a high verbal ability score may have better chances of placement. In any decision-making process where more than one parameter is important, a combination of the individual parameters can also be used. For example, both parameters can be scaled to an equal range and their sum taken as a new attribute. Here, the combination of MAT score and verbal ability score may be taken as an important consideration when generating rules. This is verified in Figure 7. Figure 8 shows the decision tree formed from the rules explored.

Fig. 7. Student data separated by verbal ability and MAT score.

In some cases the classification line is inclined at a certain angle. The inclination indicates that two parameters are taken and that one may have more predictive value than the other. This depends on the data available to a given institution.

The above example has been taken for calculating possible placements; other education-related processes can also be evaluated in the light of data mining approaches. Admission policies, retention rate, examination performance and career interest may also be estimated on the basis of students' past practices.

Fig. 8. Decision tree of student data.

VI. CONCLUSION

Data mining could be used to improve business intelligence processes, including the education system, to enhance efficacy and overall efficiency by optimally utilizing the available resources. The performance and success of students in examinations, as well as their overall personality development, could be substantially improved by thoroughly utilizing data mining techniques to evaluate their admission, academic performance and, finally, placement. The results indicate that placement, which is one of the critical issues in the education process, could be predicted from students' performance in the qualifying examination as well as their performance in the test. This is the main conclusion of the present investigation. The work also suggests that, for the given data set, LIBSVM with a radial basis kernel is the best choice for data classification.

REFERENCES

[1] A. Merceron and K. Yacef, "Educational Data Mining: a Case Study," in C. Looi, G. McCalla, B. Bredeweg, and J. Breuker (Eds.), Proceedings of the 12th International Conference on Artificial Intelligence in Education (AIED), Amsterdam: IOS Press, 2005.
[2] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann Series in Data Management Systems. San Francisco: Elsevier.
[3] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Series in Data Management Systems. San Diego: Academic Press.
[4] A. Salazar, J. Gosalbez, I. Bosch, Miralles, and R. Vergara, "A case study of knowledge discovery on academic achievement, student desertion and student retention," Information Technology: Research and Education (ITRE), International Conference, Issue 28.
[5] M. Ramaswami et al., "A CHAID Based Performance Prediction Model in Educational Data Mining," IJCSI International Journal of Computer Science Issues, vol. 7, issue 1, no. 1, January 2010.
[6] P. Cortez and A. Silva, "Using Data Mining To Predict Secondary School Student Performance," in EUROSIS, A. Brito and J. Teixeira (Eds.), pp. 5-12.
[7] Q. A. Al-Radaideh, E. M. Al-Shawakfa, and M. I. Al-Najjar, "Mining Student Data using Decision Trees," International Arab Conference on Information Technology (ACIT'2006), Yarmouk University, Jordan.
[8] T. Ohmori, M. Naruse, and M. Hoshi, "A New Data Cube for Integrating Data Mining and OLAP," Data Engineering Workshop, 2007 IEEE 23rd International Conference.
[9] C. Surjit and D. Umeshwar, "An Overview of Data Warehousing and OLAP Technology," retrieved February 2010 from MODrecord97.pdf.
[10] L. Jing, "Data Mining as Driven by Knowledge Management in Higher Education," keynote for SPSS Public Conference, UCSF.
[11] WEKA Data Mining Book (n.d.).
[12] WEKA 3: Data Mining Software in Java (n.d.). Retrieved March 2010.
[13] A. Chaudhuri, K. De, and D. Chatterjee, "A Comparative Study of Kernels for the Multi-class Support Vector Machine," ICNC, Fourth International Conference on Natural Computation, vol. 2, pp. 3-7.
[14] J. Platt, "Fast Training of Support Vector Machines Using Sequential Minimal Optimization," in Advances in Kernel Methods: Support Vector Learning, B. Schoelkopf, C. Burges, and A. Smola (Eds.).
[15] C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machines."
[16] T.-N. Do and J.-D. Fekete, "Large Scale Classification with Support Vector Machine Algorithms," in Proc. of ICMLA'07, 6th International Conference on Machine Learning and Applications, IEEE Press, Ohio, USA, pp. 7-12.
[17] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Academic Press.
[18] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20.

Prof. G. N. Pandey is Adjunct Professor, Indian Institute of Information Technology, Allahabad, India.
Dr. Sonali Agarwal is Assistant Professor, Indian Institute of Information Technology, Allahabad, India.
Dr. M. D. Tiwari is Director, Indian Institute of Information Technology, Allahabad, India.
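
As an illustration of the preprocessing in Section III and the rule set in Section V, the following Python sketch shows how numeric scores might be mapped to categories, how the six listed rules could be applied to a categorized record, and how two attributes could be scaled to an equal range and summed into a new attribute. It is not the WEKA workflow used in the study. The function names and the cut-off values 93 and 97 are hypothetical, since the paper does not state the thresholds used for Low, Medium and High.

```python
# Illustrative sketch only. Function names and cut-off values are assumptions
# made for this example; they are not taken from the paper.

def to_category(score, low_cut=93.0, high_cut=97.0):
    """Map a percentile score to LOW / MEDIUM / HIGH (hypothetical cut-offs)."""
    if score < low_cut:
        return "LOW"
    if score < high_cut:
        return "MEDIUM"
    return "HIGH"

# The six rules listed in Section V, each encoded as
# (allowed MAT categories, allowed quantitative categories,
#  allowed verbal ability categories, predicted class).
RULES = [
    ({"HIGH"}, {"LOW"}, {"LOW"}, "YES"),
    ({"LOW", "MEDIUM", "HIGH"}, {"HIGH"}, {"HIGH"}, "YES"),
    ({"MEDIUM", "HIGH"}, {"LOW"}, {"HIGH"}, "YES"),
    ({"HIGH"}, {"HIGH"}, {"LOW"}, "NO"),
    ({"LOW", "MEDIUM"}, {"LOW", "HIGH"}, {"LOW"}, "NO"),
    ({"LOW"}, {"LOW"}, {"HIGH"}, "NO"),
]

def predict_placement(mat, quant, verbal):
    """Return the class of the first matching rule, or None if no rule matches."""
    for mat_set, quant_set, verbal_set, label in RULES:
        if mat in mat_set and quant in quant_set and verbal in verbal_set:
            return label
    # The listed rules only mention LOW and HIGH for the quantitative and
    # verbal scores, so a MEDIUM value there matches nothing.
    return None

def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def combined_attribute(mat_scores, verbal_scores):
    """Scale both attributes to an equal range and sum them per student."""
    scaled_mat = min_max_scale(mat_scores)
    scaled_verbal = min_max_scale(verbal_scores)
    return [m + v for m, v in zip(scaled_mat, scaled_verbal)]

if __name__ == "__main__":
    record = (to_category(98.5), to_category(91.0), to_category(92.0))
    print(record, predict_placement(*record))
    print(combined_attribute([90, 95, 100], [90, 100, 95]))
```

With the assumed cut-offs, a student scoring 98.5 on MAT and below 93 on both other tests is categorized as (HIGH, LOW, LOW) and falls under rule 1. Encoding the rules as sets keeps each condition readable next to the printed rule text, at the cost of a linear scan over the rules.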
Predicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
Scalable Developments for Big Data Analytics in Remote Sensing
Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
Edifice an Educational Framework using Educational Data Mining and Visual Analytics
I.J. Education and Management Engineering, 2016, 2, 24-30 Published Online March 2016 in MECS (http://www.mecs-press.net) DOI: 10.5815/ijeme.2016.02.03 Available online at http://www.mecs-press.net/ijeme
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate
How To Predict Web Site Visits
Web Site Visit Forecasting Using Data Mining Techniques Chandana Napagoda Abstract: Data mining is a technique which is used for identifying relationships between various large amounts of data in many
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is
Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.
International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries Aida Mustapha *1, Farhana M. Fadzil #2 * Faculty of Computer Science and Information Technology, Universiti Tun Hussein
HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION
HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Support Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France [email protected] Massimiliano
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI
Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: [email protected] Data Mining a step in A KDD Process Data mining:
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
How To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
Keywords : Data Warehouse, Data Warehouse Testing, Lifecycle based Testing
Volume 4, Issue 12, December 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Lifecycle
REVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network
General Article International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347-5161 2014 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Impelling
BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, [email protected]) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
Keywords Data Mining, Knowledge Discovery, Direct Marketing, Classification Techniques, Customer Relationship Management
Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Simplified Data
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, [email protected] Abstract: Independent
KEITH LEHNERT AND ERIC FRIEDRICH
MACHINE LEARNING CLASSIFICATION OF MALICIOUS NETWORK TRAFFIC KEITH LEHNERT AND ERIC FRIEDRICH 1. Introduction 1.1. Intrusion Detection Systems. In our society, information systems are everywhere. They
SVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
Towards applying Data Mining Techniques for Talent Mangement
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Data Mining Analysis (breast-cancer data)
Data Mining Analysis (breast-cancer data) Jung-Ying Wang Register number: D9115007, May, 2003 Abstract In this AI term project, we compare some world renowned machine learning tools. Including WEKA data
Keywords data mining, prediction techniques, decision making.
Volume 5, Issue 4, April 2015 ISSN: 2277-128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analysis of Datamining
ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
Classification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
Predicting Students Final GPA Using Decision Trees: A Case Study
Predicting Students Final GPA Using Decision Trees: A Case Study Mashael A. Al-Barrak and Muna Al-Razgan Abstract Educational data mining is the process of applying data mining tools and techniques to
Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/
Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/ Office: Dipartimento di Ingegneria
Integrated Data Mining and Knowledge Discovery Techniques in ERP
Integrated Data Mining and Knowledge Discovery Techniques in ERP I Gandhimathi Amirthalingam, II Rabia Shaheen, III Mohammad Kousar, IV Syeda Meraj Bilfaqih I,III,IV Dept. of Computer Science, King Khalid
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania Over
Insurance Analytics: Data Analysis and Predictive Modelling in Insurance. Pavel Kříž. Actuarial Sciences Seminar, MFF, 4.
Insurance Analytics: Data Analysis and Predictive Modelling in Insurance Pavel Kříž Actuarial Sciences Seminar, MFF, 4 April 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
8. Machine Learning Applied Artificial Intelligence
8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction
Social Media Mining. Data Mining Essentials
Introduction The data production rate has increased dramatically (Big Data), and we are able to store much more data than before, e.g., purchase data, social media data, mobile phone data Businesses and customers
Introduction Predictive Analytics Tools: Weka
Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface
Example: For credit card default, we may be more interested in predicting the probability of a default than in classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Final Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm
Decision Support System For A Customer Relationship Management Case Study
Decision Support System For A Customer Relationship Management Case Study Ozge Kart 1, Alp Kut 1, and Vladimir Radevski 2 1 Dokuz Eylul University, Izmir, Turkey {ozge, alp}@cs.deu.edu.tr 2 SEE University,
Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations
Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets
CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS
CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS Subbarao Jasti #1, Dr.D.Vasumathi *2 1 Student & Department of CS & JNTU, AP, India 2 Professor & Department
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
New Approach of Computing Data Cubes in Data Warehousing
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 14 (2014), pp. 1411-1417 International Research Publications House http://www.irphouse.com New Approach of
Clustering on Large Numeric Data Sets Using Hierarchical Approach Birch
Global Journal of Computer Science and Technology Software & Data Engineering Volume 12 Issue 12 Version 1.0 Year 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global
Data Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
An Evaluation of Neural Networks Approaches used for Software Effort Estimation
Proc. of Int. Conf. on Multimedia Processing, Communication and Info. Tech., MPCIT An Evaluation of Neural Networks Approaches used for Software Effort Estimation B.V. Ajay Prakash 1, D.V.Ashoka 2, V.N.
Data Warehousing and OLAP Technology for Knowledge Discovery
Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical information: Teacher: Alberto Ceselli
Subject Description Form
Subject Description Form Subject Code Subject Title COMP417 Data Warehousing and Data Mining Techniques in Business and Commerce Credit Value 3 Level 4 Pre-requisite / Co-requisite/ Exclusion Objectives
from Larson Text By Susan Miertschin
Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing promoting the Mythic World line of figurines.
Optimization of ETL Work Flow in Data Warehouse
Optimization of ETL Work Flow in Data Warehouse Kommineni Sivaganesh M.Tech Student, CSE Department, Anil Neerukonda Institute of Technology & Science Visakhapatnam, India. P Srinivasu
Chapter 6. The stacking ensemble approach
This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging, etc. are also described
Course Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
The Prophecy-Prototype of Prediction modeling tool
The Prophecy-Prototype of Prediction modeling tool Ms. Ashwini Dalvi 1, Ms. Dhvni K.Shah 2, Ms. Rujul B.Desai 3, Ms. Shraddha M.Vora 4, Mr. Vaibhav G.Tailor 5 Department of Information Technology, Mumbai
A Data Mining view on Class Room Teaching Language
A Data Mining view on Class Room Teaching Language Umesh Kumar Pandey 1, S. Pal 2 1 Research Scholar, Singhania University, Jhunjhunu, Rajasthan, India 2 Dept. of MCA, VBS Purvanchal University Jaunpur
Role of Neural network in data mining
Role of Neural network in data mining Chitranjanjit kaur Associate Prof Guru Nanak College, Sukhchainana Phagwara,(GNDU) Punjab, India Pooja kapoor Associate Prof Swami Sarvanand Group Of Institutes Dinanagar(PTU)
Learning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
An Introduction to WEKA. As presented by PACE
An Introduction to WEKA As presented by PACE Download and Install WEKA Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html 2 Content Intro and background Exploring WEKA Data Preparation Creating Models/
Getting Even More Out of Ensemble Selection
Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand ABSTRACT Ensemble Selection uses forward stepwise
Data Mining & Data Stream Mining Open Source Tools
Data Mining & Data Stream Mining Open Source Tools Darshana Parikh, Priyanka Tirkha Student M.Tech, Dept. of CSE, Sri Balaji College Of Engg. & Tech, Jaipur, Rajasthan, India Assistant Professor, Dept.
SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH
SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH Y. HONG, N. GAUTAM, S. R. T. KUMARA, A. SURANA, H. GUPTA, S. LEE, V. NARAYANAN, H. THADAKAMALLA The Dept. of Industrial Engineering,
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information
First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms
First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms Azwa Abdul Aziz, Nor Hafieza Ismail and Fadhilah Ahmad Faculty Informatics & Computing
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Modeling Suspicious Email Detection Using Enhanced Feature Selection
Modeling Suspicious Email Detection Using Enhanced Feature Selection Sarwat Nizamani, Nasrullah Memon, Uffe Kock Wiil, and Panagiotis Karampelas Abstract The paper presents a suspicious email detection
Advanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
IDENTIFICATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION
http:// IDENTIFICATION OF SOFTWARE EROSION USING LOGISTIC REGRESSION Harinder Kaur 1, Raveen Bajwa 2 1 PG Student, CSE, Baba Banda Singh Bahadur Engg. College, Fatehgarh Sahib, (India) 2 Asstt. Prof.,
Prediction of Stock Performance Using Analytical Techniques
JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Data Mining is the process of knowledge discovery involving finding
using analytic services data mining framework for classification predicting the enrollment of students at a university a case study Data Mining is the process of knowledge discovery involving finding hidden
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Waffles: A Machine Learning Toolkit
Journal of Machine Learning Research 12 (2011) 2383-2387 Submitted 6/10; Revised 3/11; Published 7/11 Waffles: A Machine Learning Toolkit Mike Gashler Department of Computer Science Brigham Young University
University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task
University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task Graham McDonald, Romain Deveaud, Richard McCreadie, Timothy Gollins, Craig Macdonald and Iadh Ounis School
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
A New Approach for Evaluation of Data Mining Techniques
A New Approach for Evaluation of Data Mining Techniques Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries; Facebook users share almost 2.5 million pieces of content
