Empirical Study of Decision Tree and Artificial Neural Network Algorithm for Mining Educational Database




A.O. Osofisan¹, O.O. Adeyemo² & S.T. Oluwasusi³
Department of Computer Science, University of Ibadan, Ibadan, Oyo State, Nigeria.
E-mail: nikeosofisan@gmail.com, wumiglory@yahoo.com
¹ Corresponding author

ABSTRACT

The ability to predict students' performance is very important in educational environments because it plays an important role in producing high-quality graduates and postgraduates who will become the leaders of tomorrow and a source of manpower for the country. The performance of students in universities is therefore of utmost concern. One way to address it is to discover knowledge for prediction regarding the enrolment of students in a particular course, the prediction of students' performance, and so on. This knowledge is hidden in the educational data set and can be extracted through data mining techniques. Over the years, many students who enrolled in the University of Ibadan M.Sc. programme were unable to complete it because there were no supporting tools to help them make the best decision prior to enrolment. Some also finished with poor grades, because enrolment decisions are based only on the students' personal experience, and many students do not have enough experience to make such decisions. This is a waste of resources from the student's point of view as well as the department's. Such students have probably wasted their time on a course that they lack the ability or the interest to complete, and the department has spent resources on them that could have been applied elsewhere or used for applicants who deserved admission but were not admitted.

The aim of this research work is to use data mining techniques to study students' performance in order to discover appropriate knowledge and extract useful patterns from existing stored student data. The knowledge and patterns extracted would be used for decision making. The specific objectives are to discover knowledge for prediction regarding the enrolment of students in a particular course and so enhance decision making, to improve students' performance and overcome the problem of low grades among graduate students, and to discover an efficient algorithm that is sufficient for mining data in the educational sector. The work investigates the educational domain of data mining using a case study of M.Sc. students' data from the Computer Science department, University of Ibadan. The data comprised four hundred and eleven (411) student records. In this research, the classification task is used to evaluate students' performance, and of the many approaches available for data classification, the neural network and decision tree methods were used. The results of the two classification methods are compared to determine which gives the better classification results and prediction capability in EDM. For the modelling stage, the open-source software WEKA 3.7.9 was used. The data set was divided into two sets, training and testing: seventy percent (70%) was used for training and thirty percent (30%) for testing. In the experiments with the neural network, better results were obtained as the number of hidden layers increased. The results clearly demonstrated a superior performance of the neural network over the decision tree, not only in the number of correctly classified instances but also in root mean squared error (RMSE), mean absolute error (MAE) and relative absolute error (RAE). The neural network performed well in classification as well as in prediction but suffered from a lack of speed.

The decision tree was fast but classified less accurately. The rules it generates, however, make the decision tree model clearer and easier to understand. The neural network gives the best classification results as well as prediction capability in EDM.

Keywords: Data Mining (DM), Knowledge Discovery in Databases (KDD), Educational Data Mining (EDM), Classification, Prediction, Decision Trees, Neural Network.

Reference Format: A.O. Osofisan, O.O. Adeyemo & S.T. Oluwasusi (2014). Empirical Study of Decision Tree and Artificial Neural Network Algorithm for Mining Educational Database. Afr J. of Comp & ICTs. Vol 7, No. 2. Pp 187-196.

1. INTRODUCTION

Students are the major assets of a university. The ability to evaluate and predict students' performance is very important in educational environments because it plays an important role in producing high-quality graduates and postgraduates who will become the leaders of tomorrow and a source of manpower for the country. The performance of students in universities is therefore of utmost concern. Knowledge for prediction regarding the enrolment of students in a particular course, the detection of abnormal values in students' result sheets, and the prediction of students' performance is hidden within the educational data set. This hidden information can be extracted through data mining techniques. Data Mining (DM) focuses on methodologies for extracting useful knowledge from large amounts of data. There are several useful DM tools for extracting knowledge; such knowledge, if found in a students' database, may be used to increase the quality of education. The evolution of information technology has made the collection, processing, transfer and storage of huge amounts of data easier and cheaper, to meet the increasing demand for information. As huge amounts of data are collected and stored in various formats (records, files, documents, images, sound, video, scientific data), traditional statistical techniques and database management tools are no longer adequate for analysing them; hence there is a need for proper and efficient knowledge extraction tools such as data mining [1].

1.1 Data Mining

Data mining techniques operate on large volumes of data to discover hidden patterns and relationships that are helpful in decision making. Although data mining and Knowledge Discovery in Databases (KDD) are frequently treated as synonyms, data mining is actually one step in the KDD process.
The aim of this research work is to use data mining techniques to study students' performance in order to discover appropriate knowledge and extract useful patterns from existing stored student data. The knowledge and patterns extracted would be used for decision making. The main attribute of Data Mining (DM) is that it identifies valid, novel, potentially useful and understandable patterns in data repositories, thereby contributing to the prediction of outcome trends by profiling performance attributes that support effective decision making [2]. DM has been used successfully in different areas, including the educational environment. The application of DM in educational systems is referred to as Educational Data Mining (EDM). EDM uses many techniques, such as Decision Trees, Neural Networks, Naive Bayes, K-Nearest Neighbour, k-means, Support Vector Machines and Expectation Maximization; the methods used in this work are Decision Trees and Neural Networks.

1.2 The specific objectives are:

a. To discover knowledge for prediction regarding the enrolment of students in a particular course and so enhance decision making.
b. To improve students' performance and overcome the problem of low grades among graduate students.
c. To discover an efficient algorithm that is sufficient for mining data in the educational sector.

The Knowledge Discovery in Databases process comprises a few steps leading from raw data collections to some form of new knowledge. The iterative process consists of the following steps:

a. Data cleaning: also known as data cleansing, the phase in which noise and irrelevant data are removed from the collection.
b. Data integration: the stage at which multiple data sources, often heterogeneous, may be combined into a common source.
c. Data selection: the step at which the data relevant to the analysis is decided on and retrieved from the data collection.
d. Data transformation: also known as data consolidation, the phase in which the selected data is transformed into forms appropriate for the mining procedure.
e. Data mining: the crucial step in which suitable techniques are applied to extract potentially useful patterns.
f. Pattern evaluation: the step in which the truly interesting patterns representing knowledge are identified, based on given measures.
g. Knowledge representation: the final phase, in which the discovered knowledge is presented visually to the user. This step uses visualization techniques to help users understand and interpret the data mining results.

1.3 Decision Tree

A decision tree is a flow-chart-like tree structure in which internal nodes are drawn as rectangles and leaf nodes as ovals. Every internal node has two or more child nodes and contains a split, which tests the value of an expression of the attributes. Arcs from an internal node to its children are labelled with the distinct outcomes of the test. Each leaf node has a class label associated with it. Decision trees are powerful and popular for both classification and prediction. The attractiveness of tree-based methods is due largely to the fact that decision trees represent rules, and rules can readily be expressed in English so that humans can understand them. Decision trees are produced by algorithms that identify various ways of splitting a dataset into branch-like segments. These segments form an inverted decision tree that originates with a root node at the top of the tree. The object of analysis is reflected in this root node as a simple, one-dimensional display in the decision tree interface. The name of the data field that is the object of analysis is usually displayed, along with the spread or distribution of the values contained in that field.

1.4 Artificial Neural Network

An artificial neural network, often simply called a neural network, is a mathematical model inspired by biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation. In most cases a neural network is an adaptive system that changes its structure during a learning phase. Neural networks are used for modelling complex relationships between inputs and outputs. With their ability to derive meaning from complicated data, they can be used to extract patterns and detect trends that are too complex to be noticed by humans or by other computer techniques. A trained neural network can be thought of as an expert in the category of information it has been given to analyse. A neural network is usually structured into an input layer of neurons, one or more hidden layers and one output layer. Neurons belonging to adjacent layers are usually fully connected, and the various types and architectures are identified both by the topology adopted for the connections and by the choice of activation function. The values of the functions associated with the connections are called weights. For a neural network to yield appropriate outputs for given inputs, the weights must be set to suitable values. The way this is achieved allows a further distinction among modes of operation.

Figure 1: Neural Network
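The layered structure described in Section 1.4 can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: the sigmoid activation, the layer sizes and every weight and bias value below are hypothetical and are not taken from the paper or from WEKA's configuration.

```python
import math

def sigmoid(x):
    """Activation function assumed for this sketch."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: every neuron sees every input."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

def mlp_forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
HIDDEN = ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1])
OUTPUT = ([[1.0, -1.0]], [0.0])

result = mlp_forward([0.6, 0.7, 0.5], [HIDDEN, OUTPUT])
print(result)
```

Training consists of adjusting the numbers in `HIDDEN` and `OUTPUT` until the outputs match the known classes; the sketch shows only how fixed weights turn inputs into an output.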
2. RELATED WORKS

[3] presented a case study of mining students' data to analyse learning behaviour, using data on 151 students collected from a database management system course held at the Islamic University of Gaza in the first semester of 2007/2008, including their usage of the Moodle e-learning facility. Four data mining tasks, namely association rules, classification, clustering and outlier detection, were applied to the data, and it was found that the knowledge discovered by each of them can be used to improve students' performance.

[4] investigated the relationship between academic background and the performance of students in a computer science programme at a Nigerian university. The results indicate that the grade obtained in mathematics at the Senior Secondary Certificate Examination (SSCE) is the strongest determinant used by the C4.5 learning algorithm in building the model of students' performance. Another finding is that even if a student does not finish the programme in the normal four academic sessions, for whatever reason, the student would still graduate with at least a second class lower if he or she took further mathematics at SSCE. Students who spend more than four academic sessions in the programme and did not take further mathematics at SSCE are more likely to graduate with a class below second class lower.

[5] conducted a comparative study on predicting students' performance, selecting 48 students by sampling from the Master of Computer Applications (MCA) course of the computer applications department of VBS Purvanchal University, Jaunpur (Uttar Pradesh), India, from sessions 2008 to 2011. Three decision tree algorithms (ID3, C4.5 and CART) were compared to investigate their accuracy and determine the best of them. The results indicate that CART is the best algorithm for classifying the data.

[6] carried out research on mining educational data to predict student retention. In the study, machine learning algorithms (ID3, C4.5 and ADT) were applied to analyse and extract information from existing student data in order to establish predictive models. The study shows that machine learning algorithms such as the Alternating Decision Tree (ADT) can learn predictive models from student retention data accumulated from previous years.

[7] applied data classification and decision tree methods in order to improve student performance. The data set was obtained from the M.Sc. IT department of Information Technology, 2009 to 2012 batch. Extracurricular activities were also included. The information generated by the data mining techniques helps teachers to identify students with weaker performance and to give them special attention.

[8] conducted a study on the use of data mining technology to evaluate students' academic achievement across multiple channels of enrolment, such as joint recruitment enrolment, athletic enrolment and application enrolment. The decision tree method was used, and it showed that there are differences in the academic results of students from different enrolment channels.

It was found that students admitted through joint recruitment enrolment perform much better than students admitted through the other enrolment methods, and that the long-term performance of students from athletic enrolment shows a declining trend. It can therefore be seen that the enrolment method influences students' academic achievement.

[9] applied data mining techniques, particularly classification, association, clustering and outlier detection rules, to improve students' performance. They extracted useful knowledge from graduate students' data collected from the College of Science and Technology, Khanyounis, covering a fifteen-year period (1993-2007). Each of the tasks can be used to improve the performance of graduate students.

[10] applied the Bayesian classification method to a student database in order to make predictions for performance improvement. In the study, data was gathered from different degree colleges and institutions affiliated with Dr. R.M.L. Awadh University, Faizabad, India. The study aims to identify students who need special attention, so as to reduce the failure ratio and take appropriate action at the right time.

[11] conducted an empirical study of applications of data mining techniques for predicting student performance in higher education. Data on B.Tech second-year students (CS & IT branch) was collected from a database management system course held at the United College of Engineering and Research, Naini, Allahabad (affiliated to GBTU) in the fourth semester of 2011/2012, and a questionnaire was also used to collect real data describing the relationship between students' learning behaviour and their academic performance. Data mining techniques were applied to discover knowledge: association rules, classification rules, and k-means to cluster the students into groups. The study showed how data mining can be used in higher education, specifically to improve engineering students' performance.

3. METHODOLOGY

Before using data mining technology to carry out analysis, it is important to undergo some procedures to increase the accuracy of the analysis (Han and Kamber, 2001). This research therefore adopted the following steps before proceeding to analysis.

3.1 Data Collection

The data used for this research was postgraduate student data from sessions 2000 to 2011, collected from the Computer Science department, University of Ibadan.

3.2 Experimental Design

a. Data Cleaning
b. Data Integration
c. Data Selection
d. Data Transformation
e. Data Mining

3.2.1 Data Cleaning

This is the phase in which irrelevant data are removed from the collection, such as data errors (which can come either from a data entry clerk or from faulty data collection devices), irrelevant fields, non-variant fields, etc. In the original dataset, some fields, such as the serial number, Accumulated Course Units Passed (ACUP) and Overall Weighted Total (OWT), were not selected to be part of the mining process because they do not provide any knowledge for processing the data set. Duplicate data were also removed. Of the 511 instances in the source data, 411 remained after cleaning and were ready to be mined.

3.2.2 Data Integration

Data integration is the phase in which multiple data sources are combined into one data source. A number of separate tables can also be joined into one.

3.2.3 Data Selection

At this stage, the data relevant to the analysis is decided on and retrieved from the dataset. This step in the KDD process selects the data to be analysed from the set of all available data; it is unnecessary to analyse all the data in order to obtain meaningful patterns. The data is selected on the basis of its potential to yield knowledge, and the selected sets may represent a number of different aspects of the domain that are not directly related.

3.2.4 Data Transformation

This is the stage in which the selected data is transformed into forms acceptable to the data mining software. In this phase, a number of separate tables can be joined into one and vice versa. If the data is represented as text but the intended data mining technique requires numerical data, the data must be transformed accordingly. The data file was saved in Comma Separated Value (CSV) format and was later converted to Attribute-Relation File Format (ARFF) inside WEKA.
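The cleaning step in Section 3.2.1 and the 70/30 division of the data can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the field name `SerialNo`, the toy records, the shuffle and the seed are all hypothetical, and the paper does not say whether duplicates were identified before or after the unused fields were dropped.

```python
import random

# ACUP and OWT are named in the text; "SerialNo" is a hypothetical field name.
DROPPED_FIELDS = ("SerialNo", "ACUP", "OWT")

def clean(records):
    """Drop the unused fields, then remove records that become identical."""
    seen = set()
    cleaned = []
    for record in records:
        kept = {k: v for k, v in record.items() if k not in DROPPED_FIELDS}
        key = tuple(sorted(kept.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(kept)
    return cleaned

def split(records, train_fraction=0.7, seed=1):
    """Shuffle (an assumption) and cut into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy records, invented for illustration only.
toy = [
    {"SerialNo": 1, "ACUP": 30, "OWT": 61.2, "Eligibility": "P", "CSC 765": 58},
    {"SerialNo": 2, "ACUP": 30, "OWT": 61.2, "Eligibility": "P", "CSC 765": 58},
    {"SerialNo": 3, "ACUP": 24, "OWT": 48.0, "Eligibility": "NG", "CSC 765": 40},
]
cleaned = clean(toy)

# With 411 cleaned instances, a 70% cut gives the set sizes reported later.
train, test = split(list(range(411)))
print(len(cleaned), len(train), len(test))
```

Rounding 70% of 411 gives 288 training and 123 testing instances, which agrees with the instance counts in the result tables.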

4. RESULTS AND DISCUSSION

In this analysis, the data set used was postgraduate student data from sessions 2000 to 2011, collected from the Computer Science department, University of Ibadan, Nigeria.

Table 1: Results of modelling student data on MLP (training set)

Metric                             Value
Time taken                         2.7 seconds
Correctly Classified Instances     98.2639%
Incorrectly Classified Instances   1.7361%
Kappa Statistic                    0.9739
Mean Absolute Error                0.0115
Root Mean Squared Error            0.067
Relative Absolute Error            5.1556%
Root Relative Squared Error        20.1556%
Total Number of Instances          288

Table 2: Performance measures (MLP, training set)

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  Class
0.984    0.004    0.984      0.984   0.984      0.979  0.988     M.Phil/Ph.D
1.000    0.004    0.957      1.000   0.978      0.976  1.000     M.Phil
0.986    0.007    0.993      0.986   0.990      0.979  0.987     Ph.D
0.000    0.000    0.000      0.000   0.000      0.000  1.000     Withdraw
0.981    0.004    0.981      0.981   0.981      0.977  0.997     Fail
1.000    0.004    0.857      1.000   0.923      0.924  1.000     Terminal
0.983    0.006    0.980      0.983   0.981      0.974  0.990     Weighted Average

=== Confusion Matrix === (rows: actual class, columns: predicted class)

  a    b    c    d    e    f   <-- classified as
 60    0    1    0    0    0   a = M.Phil/Ph.D
  0   22    0    0    0    0   b = M.Phil
  1    0  142    0    0    1   c = Ph.D
  0    0    0    0    1    0   d = Withdraw
  0    1    0    0   53    0   e = Fail
  0    0    0    0    0    6   f = Terminal

4.1 Confusion Matrix

The confusion matrix is also commonly called a contingency table. The number of correctly classified instances is the sum of the diagonal entries of the matrix; all other entries count incorrectly classified instances.

Table 3: Results on test set (MLP)

Metric                             Value
Time taken                         5.93 seconds
Correctly Classified Instances     60.1626%
Incorrectly Classified Instances   39.8374%
Kappa Statistic                    0.5002
Mean Absolute Error                0.1306
Root Mean Squared Error            0.3381
Relative Absolute Error            48.2144%
Root Relative Squared Error        84.7807%
Total Number of Instances          123
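The accuracy and kappa values in Table 1 can be recomputed from the MLP training-set confusion matrix given with Table 2. The sketch below uses the standard definition of Cohen's kappa (observed agreement corrected for the agreement expected by chance from the row and column totals); the function names are illustrative.

```python
# Rows are actual classes, columns are predicted classes (MLP, training set).
MLP_TRAIN = [
    [60, 0, 1, 0, 0, 0],    # a = M.Phil/Ph.D
    [0, 22, 0, 0, 0, 0],    # b = M.Phil
    [1, 0, 142, 0, 0, 1],   # c = Ph.D
    [0, 0, 0, 0, 1, 0],     # d = Withdraw
    [0, 1, 0, 0, 53, 0],    # e = Fail
    [0, 0, 0, 0, 0, 6],     # f = Terminal
]

def accuracy(matrix):
    """Share of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def cohen_kappa(matrix):
    """Observed agreement corrected for chance agreement."""
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)

print(f"accuracy = {accuracy(MLP_TRAIN):.4%}")     # 98.2639%
print(f"kappa    = {cohen_kappa(MLP_TRAIN):.4f}")  # 0.9739
```

Here 283 of the 288 training instances lie on the diagonal, which gives the 98.2639% in Table 1.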

Table 4: Performance measures on test set (MLP)

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  Class
0.882    0.135    0.714      0.882   0.789      0.705  0.934     M.Phil/Ph.D
0.400    0.037    0.600      0.400   0.480      0.435  0.931     M.Phil
0.762    0.010    0.941      0.762   0.842      0.820  0.955     Ph.D
0.000    0.011    0.000      0.000   0.000      0.050  0.808     Withdraw
1.000    0.277    0.440      1.000   0.611      0.564  0.873     Fail
0.000    0.025    0.000      0.000   0.000      0.020  0.975     Terminal
0.602    0.096    0.510      0.602   0.530      0.477  0.897     Weighted Average

=== Confusion Matrix === (rows: actual class, columns: predicted class)

  a    b    c    d    e    f   <-- classified as
 30    1    1    1    0    1   a = M.Phil/Ph.D
  7    6    0    0    0    2   b = M.Phil
  5    0   16    0    0    0   c = Ph.D
  0    1    0    0   28    0   d = Withdraw
  0    0    0    0   22    0   e = Fail
  0    2    0    0    0    0   f = Terminal

Table 5: Results of modelling student data in J48 decision tree (training set)

Metric                             Value
Time taken                         0.25 seconds
Correctly Classified Instances     85.4167%
Incorrectly Classified Instances   14.5833%
Kappa Statistic                    0.7751
Mean Absolute Error                0.0656
Root Mean Squared Error            0.1811
Relative Absolute Error            29.4987%
Root Relative Squared Error        54.4513%
Total Number of Instances          288

Table 6: Performance measures (J48, training set)

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  Class
0.852    0.128    0.642      0.852   0.732      0.659  0.939     M.Phil/Ph.D
0.273    0.000    1.000      0.273   0.429      0.507  0.960     M.Phil
0.938    0.083    0.918      0.938   0.928      0.854  0.975     Ph.D
0.000    0.000    0.000      0.000   0.000      0.000  0.908     Withdraw
0.981    0.004    0.981      0.981   0.981      0.977  0.997     Fail
0.000    0.000    0.000      0.000   0.000      0.000  0.942     Terminal
0.854    0.070    0.856      0.854   0.836      0.789  0.970     Weighted Average

=== Confusion Matrix === (rows: actual class, columns: predicted class)

  a    b    c    d    e    f   <-- classified as
 52    0    9    0    0    0   a = M.Phil/Ph.D
 15    6    1    0    0    0   b = M.Phil
  9    0  135    0    0    0   c = Ph.D
  0    0    0    0    1    0   d = Withdraw
  0    0    1    0   53    0   e = Fail
  5    0    1    0    0    0   f = Terminal
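The per-class figures in Table 6 follow from the J48 training-set confusion matrix above. The sketch below uses the standard definitions of precision, recall and F-measure; reporting 0 when a class is never predicted is an assumption chosen here to match the 0.000 entries in the table.

```python
CLASSES = ["M.Phil/Ph.D", "M.Phil", "Ph.D", "Withdraw", "Fail", "Terminal"]

# Rows are actual classes, columns are predicted classes (J48, training set).
J48_TRAIN = [
    [52, 0, 9, 0, 0, 0],
    [15, 6, 1, 0, 0, 0],
    [9, 0, 135, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 53, 0],
    [5, 0, 1, 0, 0, 0],
]

def per_class_measures(matrix, index):
    """Return (precision, recall, F-measure) for one class."""
    true_positive = matrix[index][index]
    actual_total = sum(matrix[index])                    # row total
    predicted_total = sum(row[index] for row in matrix)  # column total
    precision = true_positive / predicted_total if predicted_total else 0.0
    recall = true_positive / actual_total if actual_total else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_measure

for i, name in enumerate(CLASSES):
    p, r, f = per_class_measures(J48_TRAIN, i)
    print(f"{name:12s} precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

For M.Phil/Ph.D, for example, 52 of the 81 instances predicted as that class are correct (precision 0.642) and 52 of its 61 actual instances are found (recall 0.852).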

Table 7: Results on test set (J48)

Metric                             Value
Time taken                         0.04 seconds
Correctly Classified Instances     52.8455%
Incorrectly Classified Instances   47.1545%
Kappa Statistic                    0.3977
Mean Absolute Error                0.1565
Root Mean Squared Error            0.3845
Relative Absolute Error            57.7893%
Root Relative Squared Error        96.4035%
Total Number of Instances          123

Table 8: Performance measures on test set (J48)

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  Class
0.853    0.247    0.569      0.853   0.682      0.550  0.858     M.Phil/Ph.D
0.000    0.019    0.000      0.000   0.000      0.048  0.523     M.Phil
0.667    0.059    0.700      0.667   0.683      0.620  0.815     Ph.D
0.000    0.000    0.000      0.000   0.000      0.000  0.866     Withdraw
1.000    0.277    0.440      1.000   0.611      0.564  0.861     Fail
0.000    0.000    0.000      0.000   0.000      0.000  0.492     Terminal
0.528    0.130    0.355      0.528   0.415      0.353  0.806     Weighted Average

=== Confusion Matrix === (rows: actual class, columns: predicted class)

  a    b    c    d    e    f   <-- classified as
 29    0    5    0    0    0   a = M.Phil/Ph.D
 14    0    1    0    0    0   b = M.Phil
  6    1   14    0    0    0   c = Ph.D
  1    0    0    0   28    0   d = Withdraw
  0    0    0    0   22    0   e = Fail
  1    1    0    0    0    0   f = Terminal
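The result tables report four error measures (MAE, RMSE, RAE and RRSE) without defining them. The sketch below gives their usual textbook definitions for plain numeric predictions, with the relative measures taken against a baseline that always predicts the mean of the actual values. How WEKA adapts these measures to nominal classes is not described in the paper and is not reproduced here; the sample numbers are invented for illustration.

```python
import math

def error_measures(predicted, actual):
    """MAE, RMSE, RAE and RRSE for numeric predictions.

    Assumes the actual values are not all identical; otherwise the
    baseline error is zero and the relative measures are undefined.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    sq_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    # Baseline: always predict the mean of the actual values.
    base_abs_error = sum(abs(a - mean_actual) for a in actual)
    base_sq_error = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": abs_error / n,
        "RMSE": math.sqrt(sq_error / n),
        "RAE": abs_error / base_abs_error,
        "RRSE": math.sqrt(sq_error / base_sq_error),
    }

# Invented sample values, for illustration only.
scores = error_measures([2.5, 0.0, 2.0, 8.0], [3.0, -0.5, 2.0, 7.0])
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, a relative error below 100% means the model does better than the baseline, so the closer RAE and RRSE are to 100%, the smaller the gain over simply predicting the average.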

Figure 2: Decision tree rules Above is the decision tree constructed by the J48 classifier. This indicates how the classifier uses the attributes to make a decision. The leaf nodes indicate the outcome of a test, and each leaf (terminal) node holds a class label and the topmost node is the root node (Eligibility). 26 Rules generated from the decision tree. It can be expressed in English so that we humans can understand them. 1. IF Eligibility = NG & YGSD > 2004 & CSC 755 > 60 THEN Class = Withdraw 2. IF Eligibility = NG & YGSD > 2004 & CSC 755 <= 60 THEN Class = Fail 3. IF Eligibility = NG & YGSD <= 2004 THEN Class = Withdraw 4. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 > 68 THEN Class = PhD 5. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 <= 68 & CSC 765 > 62 THEN Class = PhD 6. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 <= 68 & CSC 765 <= 62 & CSC 766 > 62 THEN Class = PhD 7. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 <= 68 & CSC 765 <= 62 & CSC 766 <= 62 & CSC 746 > 57 & CSC 751 > 53 THEN Class = PhD 8. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 <= 68 & CSC 765 <= 62 & CSC 766 <= 62 & CSC 746 > 57 & CSC 751 <= 53 & CSC 746 > 68 THEN Class = PhD 9. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 <= 68 & CSC 765 <= 62 & CSC 766 <= 62 & CSC 746 > 57 & CSC 751 <= 53 & CSC 746 <= 68 THEN Class = MPhil/PhD 10. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 <= 43 THEN Class = MPhil/PhD 11. IF Eligibility = P & CSC 765 > 57 & CSC 742 <=49 & CSC 751 > 52 & CSC 747 > 46 & CSC 775 > 22 THEN Class = PhD 12. IF Eligibility = P & CSC 765 > 57 & CSC 742 <=49 & CSC 751 <=52 & CSC 746 > 61 THEN Class = PhD 13. IF Eligibility = P & CSC 765 > 57 & CSC 742 <=49 & CSC 751 <=52 & CSC 746 <= 61 & Modeofentry = PT THEN Class = MPhil 194

14. IF Eligibility = P & CSC 765 > 57 & CSC 742 <= 49 & CSC 751 <= 52 & CSC 746 <= 61 & Modeofentry = FT & CSC 776 > 54 THEN Class = MPhil/PhD
15. IF Eligibility = P & CSC 765 > 57 & CSC 742 <= 49 & CSC 751 <= 52 & CSC 746 <= 61 & Modeofentry = FT & CSC 776 <= 54 THEN Class = MPhil
16. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 > 48 & CSC 741 > 57.08 & CSC 747 > 61 THEN Class = PhD
17. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 > 48 & CSC 741 > 57.08 & CSC 747 <= 61 THEN Class = MPhil/PhD
18. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 > 48 & CSC 741 <= 57.08 & CSC 745 > 53 THEN Class = MPhil/PhD
19. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 > 48 & CSC 741 <= 57.08 & CSC 745 <= 53 & CSC 753 > 51.63 THEN Class = MPhil/PhD
20. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 > 48 & CSC 741 <= 57.08 & CSC 745 <= 53 & CSC 753 <= 51.63 THEN Class = MPhil
21. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 <= 48 & CSC 741 > 44 & CSC 741 > 56 THEN Class = MPhil/PhD
22. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 <= 48 & CSC 741 > 44 & CSC 741 <= 56 THEN Class = MPhil
23. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 <= 48 & CSC 741 <= 44 THEN Class = Terminal
24. IF Eligibility = P & CSC 765 <= 57 & CSC 799 <= 54 & CSC 757 > 30 & CSC 741 > 62 THEN Class = MPhil/PhD
25. IF Eligibility = P & CSC 765 <= 57 & CSC 799 <= 54 & CSC 757 > 30 & CSC 741 <= 62 THEN Class = MPhil
26. IF Eligibility = P & CSC 765 <= 57 & CSC 799 <= 54 & CSC 757 <= 30 THEN Class = Terminal

4.2 Discussion of ANN and Decision Tree Models for Student Datasets

The artificial neural network modelling results in Tables 4.1a and 4.1b show that the MLP-ANN is more appropriate for the student data than the decision tree, given its higher accuracy.
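Because each rule listed above is one root-to-leaf path, the rules translate directly into nested conditionals. The sketch below encodes only rules 1 to 3 and 24 to 26 as an illustration; the function name `classify_subset`, the dictionary keys and the sample student records are hypothetical and do not come from the study's dataset. Inputs covered by the other twenty rules return `None`.

```python
def classify_subset(student):
    """Apply rules 1-3 and 24-26 only; return None when no encoded rule matches.

    `student` is a dict keyed by attribute name (an assumed representation).
    """
    if student["Eligibility"] == "NG":
        if student["YGSD"] > 2004:
            # Rule 1 (CSC 755 > 60) and rule 2 (CSC 755 <= 60).
            return "Withdraw" if student["CSC 755"] > 60 else "Fail"
        # Rule 3: YGSD <= 2004.
        return "Withdraw"

    if (
        student["Eligibility"] == "P"
        and student["CSC 765"] <= 57
        and student["CSC 799"] <= 54
    ):
        if student["CSC 757"] > 30:
            # Rule 24 (CSC 741 > 62) and rule 25 (CSC 741 <= 62).
            return "MPhil/PhD" if student["CSC 741"] > 62 else "MPhil"
        # Rule 26: CSC 757 <= 30.
        return "Terminal"

    # Paths handled by rules 4-23 are not encoded in this sketch.
    return None


# Made-up records used purely to exercise the encoded paths.
example_a = {"Eligibility": "NG", "YGSD": 2006, "CSC 755": 55}
example_b = {"Eligibility": "P", "CSC 765": 50, "CSC 799": 40,
             "CSC 757": 45, "CSC 741": 65}
print(classify_subset(example_a))
print(classify_subset(example_b))
```

Writing the rules this way keeps the one-to-one correspondence between a rule number and a branch, which is the interpretability property the decision tree is valued for.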
Also, the decision tree modelling results in Tables 4.3a and 4.3b and Figure 4.6 show that the decision tree is appropriate for deriving rules from the dataset and takes less time to build than the MLP-ANN.

Table 9: Comparative analysis on training set

Metrics                             Value (MLP)    Value (DT)
Time taken                          2.7 seconds    0.25 seconds
Correctly Classified Instances      98.2639%       85.4167%
Incorrectly Classified Instances    1.7361%        14.5833%
Kappa Statistic                     0.9739         0.7751
Mean Absolute Error                 0.0115         0.0656
Root Mean Squared Error             0.067          0.1811
Relative Absolute Error             5.1556%        29.4987%
Root Relative Squared Error         20.1556%       54.4513%
Total Number of Instances           288            288

Table 10: Comparative analysis on test set

Metrics                             Value (MLP)    Value (DT)
Time taken                          5.93 seconds   0.04 seconds
Correctly Classified Instances      60.1626%       52.8455%
Incorrectly Classified Instances    39.8374%       47.1545%
Kappa Statistic                     0.5002         0.3977
Mean Absolute Error                 0.1306         0.1565
Root Mean Squared Error             0.3381         0.3845
Relative Absolute Error             48.2144%       57.7893%
Root Relative Squared Error         84.7807%       96.4035%
Total Number of Instances           123            123
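The relative error rows in the test-set comparison can be related to the absolute ones. Assuming, as is the usual definition, that the relative absolute error is the model's MAE divided by the MAE of a simple baseline predictor (and likewise for the root relative squared error and RMSE), both models evaluated on the same 123 instances should imply the same baseline error. The sketch below checks this with the test-set figures; the helper name `implied_baseline` is hypothetical.

```python
# Test-set figures for the two models, copied from the comparative table.
TEST_SET = {
    "MLP": {"mae": 0.1306, "rae_pct": 48.2144, "rmse": 0.3381, "rrse_pct": 84.7807},
    "DT": {"mae": 0.1565, "rae_pct": 57.7893, "rmse": 0.3845, "rrse_pct": 96.4035},
}


def implied_baseline(error, relative_pct):
    """Baseline error implied by: relative = 100 * error / baseline (assumed definition)."""
    return error / (relative_pct / 100.0)


baseline_mae = {
    name: implied_baseline(m["mae"], m["rae_pct"]) for name, m in TEST_SET.items()
}
baseline_rmse = {
    name: implied_baseline(m["rmse"], m["rrse_pct"]) for name, m in TEST_SET.items()
}

for name in TEST_SET:
    print(f"{name}: implied baseline MAE = {baseline_mae[name]:.4f}, "
          f"implied baseline RMSE = {baseline_rmse[name]:.4f}")
```

Both models imply nearly identical baseline errors, so the relative and absolute error rows are mutually consistent, and the lower relative errors of the MLP reflect lower absolute errors on the same test data.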

The results obtained from the analysis show that the neural network outperformed the decision tree not only in the number of correctly classified instances but also in RMSE, MAE and RAE. The neural network performed well in classification as well as in prediction but was slower. The decision tree was fast but classified less accurately. However, the rules it generates make the decision tree clearer and easier to understand.

5. CONCLUSION

The data to be analyzed by data mining techniques may be incomplete, noisy and inconsistent. Thus, before the techniques are applied, the data must first be preprocessed. This preprocessing includes data cleaning, data selection and data transformation. The data used in this application was also preprocessed. We applied data mining techniques to discover knowledge; in particular, we discovered classification rules using a decision tree. These rules can help students make the right decisions about which courses to enrol in. With this information, students will have a supporting tool that helps them make the best decisions prior to their enrolment.