Empirical Study of Decision Tree and Artificial Neural Network Algorithm for Mining Educational Database

A.O. Osofisan 1, O.O. Adeyemo 2 & S.T. Oluwasusi 3
Department of Computer Science, University of Ibadan, Ibadan, Oyo State, Nigeria.
E-mail: nikeosofisan@gmail.com, wumiglory@yahoo.com
1 Corresponding author

ABSTRACT
The ability to predict students' performance is very important in educational environments because it plays an important role in producing the best quality graduates and postgraduates, who will become the great leaders of tomorrow and a source of manpower for the country. The performance of students in universities is therefore of utmost concern. One way to address it is to discover knowledge for prediction, for example regarding the enrolment of a student in a particular course or the prediction of students' performance. This knowledge is hidden within the educational data set and is extractable through data mining techniques. Over the years, many students who enrolled in the University of Ibadan M.Sc. programme were unable to complete it because there were no supporting tools to help them take the best decision prior to their enrolment. Others finish with poor grades, because enrolment decisions are based only on the students' personal experience, and many students do not have enough experience to take enrolment decisions well. This is a waste of resources from the student's point of view as well as from the department's. Such students have probably wasted their time on a programme they lacked the ability or the interest to complete, while the department has wasted resources on them that could have been applied elsewhere or used for students who were not admitted but deserved admission.
The aim of this research work is to use data mining techniques to study students' performance in order to discover appropriate knowledge and extract useful patterns from existing stored student data. The knowledge and patterns extracted would be used for decision making. The specific objectives are: to discover knowledge for prediction regarding the enrolment of a student in a particular course and so enhance decision making; to improve students' performance and overcome the problem of low grades among graduate students; and to discover an efficient algorithm sufficient for mining data in the educational sector. The work investigates the educational domain of data mining using a case study of M.Sc. student data from the Computer Science department, University of Ibadan. The data comprised four hundred and eleven (411) student records. In this research, the classification task is used to evaluate students' performance; of the many approaches available for data classification, the neural network and decision tree methods were used. The results of the two classification methods, decision trees and neural networks, are compared to determine which gives the better classification results and prediction capability in EDM. For the modelling stage, the open source software WEKA 3.7.9 was used. The data set was divided into two sets, training and testing: seventy percent (70%) was used for training while thirty percent (30%) was used for testing. The experimental output showed that, for the neural network, better results were obtained as the number of hidden layers increased. The results obtained from the analysis clearly demonstrated a superior performance of the neural network over the decision tree, not only in terms of the number of correctly classified instances but also in terms of RMSE, MAE and RAE. The neural network performed well in classification as well as in prediction but suffered from lack of speed.
The decision tree was fast but performed poorly at classification; on the other hand, the rules generated make the decision tree clearer and more understandable. The neural network gives the best classification results as well as prediction capability in EDM.

Keywords: Data Mining (DM), Knowledge Discovery in Databases (KDD), Educational Data Mining (EDM), Classification, Prediction, Decision Trees, Neural Network.

Reference Format: A.O. Osofisan, O.O. Adeyemo & S.T. Oluwasusi (2014). Empirical Study of Decision Tree and Artificial Neural Network Algorithm for Mining Educational Database. Afr. J. of Comp. & ICTs, Vol. 7, No. 2, pp. 187-196.
1. INTRODUCTION
Students are the major assets of a university. The ability to evaluate and predict students' performance is very important in educational environments because it plays an important role in producing the best quality graduates and postgraduates, who will become the great leaders of tomorrow and a source of manpower for the country. The performance of students in universities is therefore of utmost concern. Discovering knowledge for prediction regarding the enrolment of students in a particular course, the detection of abnormal values in students' result sheets, and the prediction of students' performance all rely on information hidden within the educational data set. This hidden information can be extracted through data mining techniques. Data Mining (DM) focuses on methodologies for extracting useful knowledge from large amounts of data. There are several useful DM tools for extracting knowledge; such knowledge, if found in a students database, may be used to increase the quality of education. The evolution of information technology has made the collection, processing, transfer and storage of huge amounts of data easier and cheaper, to meet the increasing demand for information. As huge amounts of data are collected and stored in various formats (records, files, documents, images, sound, video, scientific data), traditional statistical techniques and database management tools are no longer adequate for analysing them; hence the need for a proper and efficient knowledge extraction tool such as data mining [1].

1.1 Data Mining
Data mining techniques are used to operate on large volumes of data to discover hidden patterns and relationships helpful in decision making. While data mining and Knowledge Discovery in Databases (KDD) are frequently treated as synonyms, data mining is actually part of the knowledge discovery process: it is a step in the KDD process.
The aim of this research work is to use data mining techniques to study students' performance in order to discover appropriate knowledge and extract useful patterns from existing stored student data. The knowledge and patterns extracted would be used for decision making. The main attribute of DM is that it involves identifying valid, novel, potentially useful and understandable patterns in data repositories, thereby contributing to the prediction of outcome trends by profiling performance attributes that support effective decision making [2]. DM has been successfully used in different areas, including the educational environment. The application of DM in educational systems is referred to as Educational Data Mining (EDM). EDM uses many techniques such as decision trees, neural networks, Naive Bayes, k-nearest neighbour, k-means, support vector machines, expectation maximization, etc., but the methods used in this work are decision trees and neural networks.

1.2 The specific objectives are:
To discover knowledge for prediction regarding the enrolment of a student in a particular course and enhance decision making.
To improve students' performance and overcome the problem of low grades among graduate students.
To discover an efficient algorithm sufficient for mining data in the educational sector.

The Knowledge Discovery in Databases process comprises a few steps leading from raw data collections to some form of new knowledge. The iterative process consists of the following steps:
Data cleaning: also known as data cleansing, a phase in which noise and irrelevant data are removed from the collection.
Data integration: a stage at which multiple data sources, often heterogeneous, may be combined into a common source.
Data selection: a step at which the data relevant to the analysis is decided on and retrieved from the data collection.
Data transformation: also known as data consolidation, a phase in which the selected data is transformed into forms appropriate for the mining procedure.
Data mining: the crucial step in which clever techniques are applied to extract potentially useful patterns.
Pattern evaluation: a step in which strictly interesting patterns representing knowledge are identified based on given measures.
Knowledge representation: the final phase, in which the discovered knowledge is visually represented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results.

1.3 Decision Tree
A decision tree is a flow-chart-like tree structure in which each internal node is denoted by a rectangle and each leaf node by an oval. All internal nodes have two or more child nodes and contain splits, which test the value of an expression of the attributes. Arcs from an internal node to its children are labelled with the distinct outcomes of the test, and each leaf node has a class label associated with it. Decision trees are powerful and popular for both classification and prediction. The attractiveness of tree-based methods is due largely to the fact that decision trees represent rules, and rules can readily be expressed in English so that humans can understand them. Decision trees are produced by algorithms that identify
various ways of splitting a dataset into branch-like segments. These segments form an inverted decision tree that originates with a root node at the top of the tree. The object of analysis is reflected in this root node as a simple, one-dimensional display in the decision tree interface. The name of the field of data that is the object of analysis is usually displayed, along with the spread or distribution of the values contained in that field.

1.4 Artificial Neural Network
An artificial neural network, often simply called a neural network, is a mathematical model inspired by biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation. In most cases a neural network is an adaptive system that changes its structure during a learning phase. Neural networks are used for modelling complex relationships between inputs and outputs. With their remarkable ability to derive meaning from complicated data, neural networks can be used to extract patterns and detect trends that are too complex to be noticed by humans or other computer techniques. A trained neural network can be thought of as an expert in the category of information it has been given to analyse. A neural network is usually structured into an input layer of neurons, one or more hidden layers and one output layer. Neurons belonging to adjacent layers are usually fully connected, and the various types and architectures are identified both by the different topologies adopted for the connections and by the choice of the activation function. The values of the functions associated with the connections are called weights. For a neural network to yield appropriate outputs for given inputs, the weights must be set to suitable values; the way this is achieved allows a further distinction among modes of operation.

Figure 1: Neural Network
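To make the layered structure concrete, the following is a minimal sketch, not the network used in this study, of a forward pass through a fully connected network with one hidden layer and sigmoid activations; the layer sizes and weight values are arbitrary assumptions for illustration:

```python
import math

def sigmoid(x):
    # Logistic activation: squashes any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    # One fully connected hidden layer followed by one output layer.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in hidden_weights]
    outputs = [sigmoid(sum(w * h for w, h in zip(ws, hidden)))
               for ws in output_weights]
    return outputs

# Toy network: 3 inputs, 2 hidden neurons, 1 output neuron (weights arbitrary).
hidden_w = [[0.2, -0.5, 0.1], [0.4, 0.3, -0.2]]
output_w = [[0.7, -0.6]]
print(forward([1.0, 0.5, -1.0], hidden_w, output_w))
```

In a real network such as WEKA's multilayer perceptron, these weights are not fixed by hand but learned by backpropagation during the training phase.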
2. RELATED WORKS
[3] presented a case study of mining students' data to analyse learning behaviour, using data on 151 students collected from a database management systems course held at the Islamic University of Gaza in the first semester of 2007/2008, including their usage of the Moodle e-learning facility. Four data mining tasks, namely association rules, classification, clustering and outlier detection, were applied to the data, and it was found that each kind of knowledge discovered can be used to improve students' performance. [4] investigated students' academic background in relation to their performance in a computer science programme in a Nigerian university. The results indicate that the grade obtained in mathematics in the Senior Secondary Certificate Examination (SSCE) is the strongest determinant used by the C4.5 learning algorithm in building the model of the students' performance. Another finding is that even if a student does not finish the programme in the normal number of (four) academic sessions, for whatever reason, he would still graduate with a minimum of second class lower if he took further mathematics in the SSCE. Students who spend more than four academic sessions in the programme and did not take further mathematics in the SSCE are more likely to graduate with a class below second class lower. [5] conducted a comparative study on predicting students' performance, selecting by sampling 48 students of the MCA (Master of Computer Applications) course, sessions 2008 to 2011, in the computer applications department of VBS Purvanchal University, Jaunpur (Uttar Pradesh), India. Three different decision tree algorithms (ID3, C4.5 and CART) were used in order to investigate their accuracy and determine the best among them. The outcome of their results indicates that CART is the best algorithm for classification of the data. [6] carried out research on mining education data to predict students' retention.
In that study, machine learning algorithms (ID3, C4.5 and ADT) were applied to analyse and extract information from existing student data in order to establish predictive models; it shows that a machine learning algorithm such as the alternating decision tree (ADT) can learn predictive models from the student retention data accumulated in previous years. [7] applied data classification and decision tree methods in order to improve student performance. The data set used was obtained from the M.Sc. IT batches of the Department of Information Technology, 2009 to 2012; extracurricular activities were also included. The information generated after applying the data mining techniques will help teachers to predict those students with weaker performance and to develop them with special attention. [8] conducted a study on the use of data mining technology to evaluate students' academic achievement via multiple channels of enrolment, such as joint recruitment enrolment, athletic enrolment and application enrolment. The decision tree method was used, and it shows that there are differences in the academic results of students from different enrolment channels.
It was found that joint recruitment enrolment students perform much better than students admitted via other enrolment methods, and also that the long-term performance of students from athletic enrolment shows a declining trend. From this it can be seen that different enrolment methods influence students' academic achievement. [9] applied data mining techniques, particularly classification, association, clustering and outlier detection rules, to improve students' performance. They extracted useful knowledge from graduate student data collected from the College of Science and Technology, Khanyounis, covering a fifteen-year period (1993-2007). Each of these tasks can be used to improve the performance of graduate students. [10] applied the Bayesian classification method to a student database in order to predict performance improvement. In that study, data was gathered from different degree colleges and institutions affiliated with Dr. R.M.L. Awadh University, Faizabad, India. The study works to identify those students who need special attention, so as to reduce the failure ratio and to take appropriate action at the right time. [11] conducted an empirical study of applications of data mining techniques for predicting student performance in higher education. Student data for B.Tech second year (CS & IT branch) from a database management systems course held at the United College of Engineering and Research, Naini, Allahabad (affiliated to GBTU) in the fourth semester of 2011/2012 was collected, and a questionnaire was also used to collect real data describing the relationships between the learning behaviour of students and their academic performance. Data mining techniques were applied to discover knowledge: association rules, classification rules, and k-means clustering to group the students. The study showed how data mining can usefully be applied in higher education, specifically to improve engineering students' performance.
3. METHODOLOGY
Before using data mining technology to carry out analysis, it is important to undergo certain procedures to increase the accuracy of the analysis (Han and Kamber, 2001). Therefore, this research adopted the following steps before proceeding to the analysis.

3.1 Data Collection
The data used for this research was postgraduate student data from sessions 2000 to 2011, collected from the Computer Science department, University of Ibadan.

3.2 Experimental Design
a. Data Cleaning
b. Data Integration
c. Data Selection
d. Data Transformation
e. Data Mining

3.2.1 Data Cleaning
This is the phase in which irrelevant data are removed from the collection, such as data errors (which can come either from a data entry clerk or from a faulty data collection device), irrelevant fields, non-variant fields, etc. In the original dataset, some fields such as the serial number, Accumulated Course Units Passed (ACUP) and Overall Weighted Total (OWT) were not selected to be part of the mining process, because they do not provide any knowledge for processing the data set. Duplicate records were also removed. From the total of 511 instances in the data source, the data cleaning process yielded 411 instances ready to be mined.

3.2.2 Data Integration
Data integration is the phase in which multiple data sources are combined into one data source. Also, a number of separate tables can be joined into one.

3.2.3 Data Selection
At this stage, the data relevant to the analysis is decided on and retrieved from the dataset. This step of the KDD process selects the data to be analysed from the set of all available data. Attempting to analyse all of the data would be unhelpful if meaningful patterns are to be obtained. The selected data is chosen based on an evaluation of its potential to yield knowledge, and these sets of data may represent a number of different aspects of the domain that are not directly related.
3.2.4 Data Transformation
This is the stage in which the selected data is transformed into forms acceptable to the data mining software. In this phase, a number of separate tables can be joined into one, and vice versa. If the data is represented as text but a data mining technique requiring numerical data is to be used, the data must be transformed accordingly. The data file was saved in Comma-Separated Values (CSV) format and later converted to the Attribute-Relation File Format (ARFF) inside WEKA.
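The CSV-to-ARFF conversion was done inside WEKA itself, but the step can also be sketched programmatically. The following is a minimal illustration, not the procedure used in the study; the attribute names and values below are invented for the example:

```python
import csv
import io

def csv_to_arff(csv_text, relation, nominal_attrs):
    # Convert a CSV string to ARFF text. nominal_attrs maps an attribute name
    # to its list of allowed values; every other attribute is treated as numeric.
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for name in header:
        if name in nominal_attrs:
            lines.append("@attribute %s {%s}" % (name, ",".join(nominal_attrs[name])))
        else:
            lines.append("@attribute %s numeric" % name)
    lines += ["", "@data"]
    lines.extend(",".join(row) for row in data)
    return "\n".join(lines)

# Hypothetical two-attribute example.
csv_text = "Eligibility,CSC755\nNG,61\nP,45\n"
print(csv_to_arff(csv_text, "students", {"Eligibility": ["NG", "P"]}))
```

The output follows the ARFF layout WEKA expects: an `@relation` line, one `@attribute` line per field, then the rows under `@data`.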
4. RESULTS AND DISCUSSION
In this analysis, the data set used was postgraduate student data from sessions 2000 to 2011, collected from the Computer Science department, University of Ibadan, Nigeria.

Table 1: Results of modelling student data on MLP (training set)
Metrics                             Value
Time taken to build model           2.7 seconds
Correctly Classified Instances      98.2639%
Incorrectly Classified Instances    1.7361%
Kappa Statistic                     0.9739
Mean Absolute Error                 0.0115
Root Mean Squared Error             0.067
Relative Absolute Error             5.1556%
Root Relative Squared Error         20.1556%
Total Number of Instances           288

Table 2: The performance measures (MLP, training set)
TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  Class
0.984    0.004    0.984      0.984   0.984      0.979  0.988     M.Phil/Ph.D
1.000    0.004    0.957      1.000   0.978      0.976  1.000     M.Phil
0.986    0.007    0.993      0.986   0.990      0.979  0.987     Ph.D
0.000    0.000    0.000      0.000   0.000      0.000  1.000     Withdraw
0.981    0.004    0.981      0.981   0.981      0.977  0.997     Fail
1.000    0.004    0.857      1.000   0.923      0.924  1.000     Terminal
0.983    0.006    0.980      0.983   0.981      0.974  0.990     Weighted Average

=== Confusion Matrix === (rows: actual class; columns: predicted class)
  a   b    c  d   e  f   <-- classified as
 60   0    1  0   0  0 |  a = M.Phil/Ph.D
  0  22    0  0   0  0 |  b = M.Phil
  1   0  142  0   0  1 |  c = Ph.D
  0   0    0  0   1  0 |  d = Withdraw
  0   1    0  0  53  0 |  e = Fail
  0   0    0  0   0  6 |  f = Terminal

4.1 Confusion Matrix
The confusion matrix is also commonly called a contingency table. The number of correctly classified instances is the sum of the diagonal entries of the matrix; all other entries count incorrectly classified instances.

Table 3: Results on the test set (MLP)
Metrics                             Value
Time taken                          5.93 seconds
Correctly Classified Instances      60.1626%
Incorrectly Classified Instances    39.8374%
Kappa Statistic                     0.5002
Mean Absolute Error                 0.1306
Root Mean Squared Error             0.3381
Relative Absolute Error             48.2144%
Root Relative Squared Error         84.7807%
Total Number of Instances           123
Table 4: Performance measures on the test set (MLP)
TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  Class
0.882    0.135    0.714      0.882   0.789      0.705  0.934     M.Phil/Ph.D
0.400    0.037    0.600      0.400   0.480      0.435  0.931     M.Phil
0.762    0.010    0.941      0.762   0.842      0.820  0.955     Ph.D
0.000    0.011    0.000      0.000   0.000      0.050  0.808     Withdraw
1.000    0.277    0.440      1.000   0.611      0.564  0.873     Fail
0.000    0.025    0.000      0.000   0.000      0.020  0.975     Terminal
0.602    0.096    0.510      0.602   0.530      0.477  0.897     Weighted Average

=== Confusion Matrix === (rows: actual class; columns: predicted class)
  a  b   c  d   e  f   <-- classified as
 30  1   1  1   0  1 |  a = M.Phil/Ph.D
  7  6   0  0   0  2 |  b = M.Phil
  5  0  16  0   0  0 |  c = Ph.D
  0  1   0  0  28  0 |  d = Withdraw
  0  0   0  0  22  0 |  e = Fail
  0  2   0  0   0  0 |  f = Terminal

Table 5: Results of modelling student data with the J48 decision tree (training set)
Metrics                             Value
Time taken to build model           0.25 seconds
Correctly Classified Instances      85.4167%
Incorrectly Classified Instances    14.5833%
Kappa Statistic                     0.7751
Mean Absolute Error                 0.0656
Root Mean Squared Error             0.1811
Relative Absolute Error             29.4987%
Root Relative Squared Error         54.4513%
Total Number of Instances           288

Table 6: The performance measures (J48, training set)
TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  Class
0.852    0.128    0.642      0.852   0.732      0.659  0.939     M.Phil/Ph.D
0.273    0.000    1.000      0.273   0.429      0.507  0.960     M.Phil
0.938    0.083    0.918      0.938   0.928      0.854  0.975     Ph.D
0.000    0.000    0.000      0.000   0.000      0.000  0.908     Withdraw
0.981    0.004    0.981      0.981   0.981      0.977  0.997     Fail
0.000    0.000    0.000      0.000   0.000      0.000  0.942     Terminal
0.854    0.070    0.856      0.854   0.836      0.789  0.970     Weighted Average

=== Confusion Matrix === (rows: actual class; columns: predicted class)
  a  b    c  d   e  f   <-- classified as
 52  0    9  0   0  0 |  a = M.Phil/Ph.D
 15  6    1  0   0  0 |  b = M.Phil
  9  0  135  0   0  0 |  c = Ph.D
  0  0    0  0   1  0 |  d = Withdraw
  0  0    1  0  53  0 |  e = Fail
  5  0    1  0   0  0 |  f = Terminal
Table 7: Results on the test set (J48)
Metrics                             Value
Time taken                          0.04 seconds
Correctly Classified Instances      52.8455%
Incorrectly Classified Instances    47.1545%
Kappa Statistic                     0.3977
Mean Absolute Error                 0.1565
Root Mean Squared Error             0.3845
Relative Absolute Error             57.7893%
Root Relative Squared Error         96.4035%
Total Number of Instances           123

Table 8: The performance measures (J48, test set)
TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  Class
0.853    0.247    0.569      0.853   0.682      0.550  0.858     M.Phil/Ph.D
0.000    0.019    0.000      0.000   0.000      0.048  0.523     M.Phil
0.667    0.059    0.700      0.667   0.683      0.620  0.815     Ph.D
0.000    0.000    0.000      0.000   0.000      0.000  0.866     Withdraw
1.000    0.277    0.440      1.000   0.611      0.564  0.861     Fail
0.000    0.000    0.000      0.000   0.000      0.000  0.492     Terminal
0.528    0.130    0.355      0.528   0.415      0.353  0.806     Weighted Average

=== Confusion Matrix === (rows: actual class; columns: predicted class)
  a  b   c  d   e  f   <-- classified as
 29  0   5  0   0  0 |  a = M.Phil/Ph.D
 14  0   1  0   0  0 |  b = M.Phil
  6  1  14  0   0  0 |  c = Ph.D
  1  0   0  0  28  0 |  d = Withdraw
  0  0   0  0  22  0 |  e = Fail
  1  1   0  0   0  0 |  f = Terminal
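As noted above, the number of correctly classified instances is the sum of the diagonal of the confusion matrix; the Kappa statistic additionally corrects that raw agreement for chance agreement. The headline figures in Table 7 can be recomputed directly from the matrix. A quick verification sketch in Python (not part of the original WEKA analysis):

```python
def accuracy_and_kappa(matrix):
    # Observed accuracy: diagonal sum over all instances.
    n = sum(sum(row) for row in matrix)
    po = sum(matrix[i][i] for i in range(len(matrix))) / n
    # Expected chance agreement: products of matching row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(row[j] for row in matrix) for j in range(len(matrix))]
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return po, (po - pe) / (1 - pe)

# J48 confusion matrix on the test set (Table 7); rows = actual, cols = predicted.
j48_test = [
    [29, 0, 5, 0, 0, 0],   # M.Phil/Ph.D
    [14, 0, 1, 0, 0, 0],   # M.Phil
    [6, 1, 14, 0, 0, 0],   # Ph.D
    [1, 0, 0, 0, 28, 0],   # Withdraw
    [0, 0, 0, 0, 22, 0],   # Fail
    [1, 1, 0, 0, 0, 0],    # Terminal
]
acc, kappa = accuracy_and_kappa(j48_test)
print(round(acc * 100, 4), round(kappa, 4))  # 52.8455 0.3977
```

The recomputed values match the 52.8455% accuracy and 0.3977 Kappa statistic reported in Table 7.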
Figure 2: Decision tree rules

Above is the decision tree constructed by the J48 classifier, indicating how the classifier uses the attributes to make a decision. Branches represent the outcomes of tests on attributes, each leaf (terminal) node holds a class label, and the topmost node is the root node (Eligibility). Twenty-six rules were generated from the decision tree; they can be expressed in English so that humans can understand them:

1. IF Eligibility = NG & YGSD > 2004 & CSC 755 > 60 THEN Class = Withdraw
2. IF Eligibility = NG & YGSD > 2004 & CSC 755 <= 60 THEN Class = Fail
3. IF Eligibility = NG & YGSD <= 2004 THEN Class = Withdraw
4. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 > 68 THEN Class = PhD
5. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 <= 68 & CSC 765 > 62 THEN Class = PhD
6. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 <= 68 & CSC 765 <= 62 & CSC 766 > 62 THEN Class = PhD
7. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 <= 68 & CSC 765 <= 62 & CSC 766 <= 62 & CSC 746 > 57 & CSC 751 > 53 THEN Class = PhD
8. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 <= 68 & CSC 765 <= 62 & CSC 766 <= 62 & CSC 746 > 57 & CSC 751 <= 53 & CSC 746 > 68 THEN Class = PhD
9. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 > 43 & CSC 799 <= 68 & CSC 765 <= 62 & CSC 766 <= 62 & CSC 746 > 57 & CSC 751 <= 53 & CSC 746 <= 68 THEN Class = MPhil/PhD
10. IF Eligibility = P & CSC 765 > 57 & CSC 742 > 49 & CSC 746 <= 43 THEN Class = MPhil/PhD
11. IF Eligibility = P & CSC 765 > 57 & CSC 742 <= 49 & CSC 751 > 52 & CSC 747 > 46 & CSC 775 > 22 THEN Class = PhD
12. IF Eligibility = P & CSC 765 > 57 & CSC 742 <= 49 & CSC 751 <= 52 & CSC 746 > 61 THEN Class = PhD
13. IF Eligibility = P & CSC 765 > 57 & CSC 742 <= 49 & CSC 751 <= 52 & CSC 746 <= 61 & Modeofentry = PT THEN Class = MPhil
14. IF Eligibility = P & CSC 765 > 57 & CSC 742 <= 49 & CSC 751 <= 52 & CSC 746 <= 61 & Modeofentry = FT & CSC 776 > 54 THEN Class = MPhil/PhD
15. IF Eligibility = P & CSC 765 > 57 & CSC 742 <= 49 & CSC 751 <= 52 & CSC 746 <= 61 & Modeofentry = FT & CSC 776 <= 54 THEN Class = MPhil
16. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 > 48 & CSC 741 > 57.08 & CSC 747 > 61 THEN Class = PhD
17. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 > 48 & CSC 741 > 57.08 & CSC 747 <= 61 THEN Class = MPhil/PhD
18. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 > 48 & CSC 741 <= 57.08 & CSC 745 > 53 THEN Class = MPhil/PhD
19. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 > 48 & CSC 741 <= 57.08 & CSC 745 <= 53 & CSC 753 > 51.63 THEN Class = MPhil/PhD
20. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 > 48 & CSC 741 <= 57.08 & CSC 745 <= 53 & CSC 753 <= 51.63 THEN Class = MPhil
21. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 <= 48 & CSC 741 > 44 & CSC 741 > 56 THEN Class = MPhil/PhD
22. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 <= 48 & CSC 741 > 44 & CSC 741 <= 56 THEN Class = MPhil
23. IF Eligibility = P & CSC 765 <= 57 & CSC 799 > 54 & CSC 755 <= 48 & CSC 741 <= 44 THEN Class = Terminal
24. IF Eligibility = P & CSC 765 <= 57 & CSC 799 <= 54 & CSC 757 > 30 & CSC 741 > 62 THEN Class = MPhil/PhD
25. IF Eligibility = P & CSC 765 <= 57 & CSC 799 <= 54 & CSC 757 > 30 & CSC 741 <= 62 THEN Class = MPhil
26. IF Eligibility = P & CSC 765 <= 57 & CSC 799 <= 54 & CSC 757 <= 30 THEN Class = Terminal

4.2 Discussion of the ANN and Decision Tree Models for the Student Datasets
The artificial neural network modelling results (Tables 1-4) show that the MLP ANN is better suited to the student data than the decision tree, considering its higher level of accuracy.
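Because the rules listed above are explicit, they translate directly into executable logic. As an illustration, rules 1-3 (the Eligibility = NG branch) can be written as a small function; the function name, signature, and the None fallback are assumptions of this sketch:

```python
def classify_ng(eligibility, ygsd, csc755):
    # Transcription of rules 1-3 for students whose Eligibility is "NG".
    if eligibility == "NG":
        if ygsd > 2004:
            # Rule 1 (CSC 755 > 60) vs. rule 2 (CSC 755 <= 60).
            return "Withdraw" if csc755 > 60 else "Fail"
        # Rule 3: YGSD <= 2004.
        return "Withdraw"
    return None  # rules 4-26 cover the Eligibility = "P" branch

print(classify_ng("NG", 2006, 55))  # Fail
```

This directness is exactly the comprehensibility advantage of decision trees noted in the discussion: each path from root to leaf is a rule a human (or a program) can apply.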
Also, the decision tree modelling results (Tables 5-8 and Figure 2) show that the decision tree is appropriate for deriving rules from the dataset and takes far less time to build than the MLP ANN.

Table 9: Comparative analysis on the training set
Metrics                             Value (MLP)   Value (DT)
Time taken to build model           2.7 seconds   0.25 seconds
Correctly Classified Instances      98.2639%      85.4167%
Incorrectly Classified Instances    1.7361%       14.5833%
Kappa Statistic                     0.9739        0.7751
Mean Absolute Error                 0.0115        0.0656
Root Mean Squared Error             0.067         0.1811
Relative Absolute Error             5.1556%       29.4987%
Root Relative Squared Error         20.1556%      54.4513%
Total Number of Instances           288           288

Table 10: Comparative analysis on the test set
Metrics                             Value (MLP)   Value (DT)
Time taken                          5.93 seconds  0.04 seconds
Correctly Classified Instances      60.1626%      52.8455%
Incorrectly Classified Instances    39.8374%      47.1545%
Kappa Statistic                     0.5002        0.3977
Mean Absolute Error                 0.1306        0.1565
Root Mean Squared Error             0.3381        0.3845
Relative Absolute Error             48.2144%      57.7893%
Root Relative Squared Error         84.7807%      96.4035%
Total Number of Instances           123           123
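Both comparisons rest on the same 70/30 partition of the 411 records into 288 training and 123 test instances. The study performed this split in WEKA; purely as a sketch, such a split can be produced as follows (the helper name and seed are arbitrary assumptions):

```python
import random

def split_dataset(records, train_fraction=0.7, seed=42):
    # Shuffle a copy of the records, then cut at the training fraction
    # (70% training / 30% testing, as in the study).
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(411))  # stand-ins for the 411 student records
train, test = split_dataset(records)
print(len(train), len(test))  # 288 123
```

Shuffling before cutting matters: without it, any ordering in the stored records (for example by session year) would bias the training set.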
The results obtained from the analysis clearly demonstrated a superior performance of the neural network over the decision tree, not only in terms of the number of correctly classified instances but also in terms of RMSE, MAE and RAE. The neural network performed well in classification as well as in prediction but suffered from lack of speed. The decision tree was fast but performed poorly at classification; on the other hand, the rules generated make the decision tree clearer and more understandable.

5. CONCLUSION
The data to be analysed by data mining techniques may be incomplete, noisy and inconsistent. Thus, when starting the application, the data must first be preprocessed. This preprocessing includes data cleaning, data selection and data transformation, and the data used in this application was preprocessed accordingly. We applied data mining techniques to discover knowledge; in particular, we discovered classification rules using a decision tree. These rules can help students take the right decision about which courses to enrol in. Thus, with this information, students will have a supporting tool that helps them take the best decisions prior to their enrolment.

REFERENCES
[1] Kumar, V. and Chadha, A. (2011). An Empirical Study of the Applications of Data Mining Techniques in Higher Education. International Journal of Advanced Computer Science and Applications (IJACSA), 2(3), 80-84. Retrieved from http://ijacsa.thesai.org.
[2] Ogor, E. N. (2007). Student Academic Performance Monitoring and Evaluation Using Data Mining Techniques. Fourth Congress of Electronics, Robotics and Automotive Mechanics, IEEE Computer Society, pp. 354-359.
[3] El-Halees, A. (2009). Mining Students' Data to Analyze e-Learning Behavior: A Case Study. Department of Computer Science, Islamic University of Gaza, P.O. Box 108, Gaza, Palestine.
[4] Osofisan, A.O. and Olamiti, A.O. (2009). Academic Background of Students and Performance in a Computer Science Programme in a Nigerian University. European Journal of Social Sciences, 9(4), 564-572.
[5] Yadav, S. K. et al. (2012). Data Mining Applications: A Comparative Study for Predicting Students' Performance. International Journal of Innovative Technology & Creative Engineering (IJITCE), ISSN 2045-711, Vol. 1, No. 12, December 2012.
[6] Yadav, S. K. et al. (2012). Mining Education Data to Predict Students' Retention: A Comparative Study. International Journal of Computer Science and Information Security (IJCSIS), Vol. 10, No. 2.
[7] Shanmuga Priya, K. and Senthil Kumar, A.V. (2013). Improving the Students' Performance Using Educational Data Mining. International Journal of Advanced Networking and Applications, Vol. 4, Issue 4, pp. 1680-1685, ISSN 0975-0290.
[8] Hsu, C. C. and Huang, T. (2006). The Use of Data Mining Technology to Evaluate Students' Academic Achievement via Multiple Channels of Enrolment: An Empirical Analysis of St. John's University of Technology. The IABPAD Conference Proceedings, Orlando, Florida, January 3-6, 2006.
[9] Abu Tair, M. M. and El-Halees, A. M. (2012). Mining Educational Data to Improve Students' Performance: A Case Study. International Journal of Information and Communication Technology Research, Vol. 2, No. 2, February 2012, ISSN 2223-4985.
[10] Bhardwaj, B. K. and Pal, S. (2011). Data Mining: A Prediction for Performance Improvement Using Classification. International Journal of Computer Science and Information Security (IJCSIS), Vol. 9, No. 4, April 2011.
[11] Tiwari, M., Singh, R. and Vimal, N. (2013). An Empirical Study of Applications of Data Mining Techniques for Predicting Student Performance in Higher Education. International Journal of Computer Science and Mobile Computing (IJCSMC), Vol. 2, Issue 2, February 2013, pp. 53-57.
[12] WEKA software. http://www.cs.waikato.ac.nz/ml/weka/
[13] International Educational Data Mining Society. http://www.educationaldatamining.org