



(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011

Mining Educational Data to Analyze Students' Performance

Brijesh Kumar Baradwaj
Research Scholar, Singhaniya University, Rajasthan, India

Saurabh Pal
Sr. Lecturer, Dept. of MCA, VBS Purvanchal University, Jaunpur-222001, India

Abstract

The main objective of higher education institutions is to provide quality education to their students. One way to achieve the highest level of quality in a higher education system is to discover knowledge for prediction regarding the enrolment of students in a particular course, alienation from the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in the result sheets of the students, prediction of students' performance, and so on. This knowledge is hidden in the educational data set and is extractable through data mining techniques. The present paper is designed to justify the capabilities of data mining techniques in the context of higher education by offering a data mining model for the higher education system in the university. In this research, the classification task is used to evaluate students' performance and, as there are many approaches to data classification, the decision tree method is used here. By this task we extract knowledge that describes students' performance in the end semester examination. It helps in identifying, at an early stage, the dropouts and the students who need special attention, and it allows the teacher to provide appropriate advising/counseling.

Keywords: Educational Data Mining (EDM); Classification; Knowledge Discovery in Database (KDD); ID3 Algorithm.

I. INTRODUCTION

The advent of information technology in various fields has led to large volumes of data being stored in various formats such as records, files, documents, images, sound, videos, scientific data and many new data formats. The data collected from different applications require a proper method of extracting knowledge from large repositories for better decision making.
Knowledge discovery in databases (KDD), often called data mining, aims at the discovery of useful information from large collections of data [1]. The main functions of data mining are applying various methods and algorithms in order to discover and extract patterns from stored data [2]. Data mining and knowledge discovery applications have received considerable attention due to their significance in decision making, and they have become an essential component in various organizations. Data mining techniques have been introduced into new fields of Statistics, Databases, Machine Learning, Pattern Recognition, Artificial Intelligence, Computation capabilities, etc.

There is increasing research interest in using data mining in education. This new emerging field, called Educational Data Mining, is concerned with developing methods that discover knowledge from data originating from educational environments [3]. Educational Data Mining uses many techniques such as Decision Trees, Neural Networks, Naïve Bayes, K-Nearest Neighbor, and many others. Using these techniques, many kinds of knowledge can be discovered, such as association rules, classifications and clustering. The discovered knowledge can be used for prediction regarding the enrolment of students in a particular course, alienation from the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in the result sheets of the students, prediction of students' performance, and so on.

The main objective of this paper is to use data mining methodologies to study students' performance in the courses. Data mining provides many tasks that could be used to study student performance. In this research, the classification task is used to evaluate students' performance and, as there are many approaches to data classification, the decision tree method is used here. Information such as attendance, class test, seminar and assignment marks was collected from the student management system to predict the performance at the end of the semester. This paper investigates the accuracy of decision tree techniques for predicting student performance.
II. DATA MINING DEFINITION AND TECHNIQUES

Data mining, also popularly known as Knowledge Discovery in Database, refers to extracting or "mining" knowledge from large amounts of data. Data mining techniques are used to operate on large volumes of data to discover hidden patterns and relationships helpful in decision making. While data mining and knowledge discovery in database are frequently treated as synonyms, data mining is actually part of the knowledge discovery process. The sequence of steps identified in extracting knowledge from data is shown in Figure 1.

Figure 1: The steps of extracting knowledge from data

Various algorithms and techniques such as Classification, Clustering, Regression, Artificial Intelligence, Neural Networks, Association Rules, Decision Trees, Genetic Algorithm, Nearest Neighbor method, etc., are used for knowledge discovery from databases. These techniques and methods need a brief mention for better understanding.

A. Classification

Classification is the most commonly applied data mining technique, which employs a set of pre-classified examples to develop a model that can classify the population of records at large. This approach frequently employs decision tree or neural network-based classification algorithms. The data classification process involves learning and classification. In learning, the training data are analyzed by a classification algorithm. In classification, test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to new data tuples. The classifier-training algorithm uses these pre-classified examples to determine the set of parameters required for proper discrimination. The algorithm then encodes these parameters into a model called a classifier.

B. Clustering

Clustering can be described as the identification of similar classes of objects. By using clustering techniques we can further identify dense and sparse regions in object space and can discover the overall distribution pattern and correlations among data attributes. The classification approach can also be used as an effective means of distinguishing groups or classes of objects, but it becomes costly, so clustering can be used as a preprocessing approach for attribute subset selection and classification.

C. Prediction

Regression techniques can be adapted for prediction. Regression analysis can be used to model the relationship between one or more independent variables and dependent variables. In data mining, independent variables are attributes already known and response variables are what we want to predict. Unfortunately, many real-world problems are not simply prediction. Therefore, more complex techniques (e.g., logistic regression, decision trees, or neural nets) may be necessary to forecast future values. The same model types can often be used for both regression and classification. For example, the CART (Classification and Regression Trees) decision tree algorithm can be used to build both classification trees (to classify categorical response variables) and regression trees (to forecast continuous response variables). Neural networks too can create both classification and regression models.

D. Association rule

Association and correlation analysis is usually used to find frequent item sets among large data sets. This type of finding helps businesses to make certain decisions, such as catalogue design, cross marketing and customer shopping behavior analysis. Association rule algorithms need to be able to generate rules with confidence values less than one. However, the number of possible association rules for a given dataset is generally very large, and a high proportion of the rules are usually of little (if any) value.

E. Neural networks

A neural network is a set of connected input/output units in which each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class labels of the input tuples. Neural networks have the remarkable ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. They are well suited for continuous-valued inputs and outputs. Neural networks are best at identifying patterns or trends in data and are well suited for prediction or forecasting needs.

F. Decision Trees

A decision tree is a tree-shaped structure that represents sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).

G. Nearest Neighbor Method

A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k is greater than or equal to 1). Sometimes called the k-nearest neighbor technique.
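The nearest neighbor idea described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the use of Hamming distance (number of differing attribute values) as the similarity measure, and the five historical records are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Count the attributes on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))


def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among its k most similar records.

    `history` is a list of (attribute_tuple, class_label) pairs.
    """
    if k < 1:
        raise ValueError("k must be greater than or equal to 1")
    # sorted() is stable, so records at equal distance keep their input order.
    nearest = sorted(history, key=lambda rec: hamming(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical records: (class test grade, attendance) -> result.
history = [
    (("Good", "Good"), "First"),
    (("Good", "Average"), "First"),
    (("Average", "Good"), "Second"),
    (("Poor", "Poor"), "Fail"),
    (("Poor", "Average"), "Third"),
]

print(knn_classify(history, ("Good", "Good"), k=3))
print(knn_classify(history, ("Poor", "Poor"), k=1))
```

With categorical attributes such as these, any distance that counts mismatches would serve; Hamming distance is simply the most direct choice for a sketch.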

III. RELATED WORK

Data mining in higher education is a recent research field, and this area of research is gaining popularity because of its potential for educational institutes.

Data mining can be used in the educational field to enhance our understanding of the learning process by focusing on identifying, extracting and evaluating variables related to the learning process of students, as described by Alaa el-Halees [4]. Mining in an educational environment is called Educational Data Mining.

Han and Kamber [3] describe data mining software that allows users to analyze data from different dimensions, categorize it and summarize the relationships identified during the mining process.

Pandey and Pal [5] conducted a study on student performance by selecting 600 students from different colleges of Dr. R. M. L. Awadh University, Faizabad, India. By means of Bayes classification on category, language and background qualification, they determined whether newcomer students would be performers or not.

Hijazi and Naqvi [6] conducted a study on student performance by selecting a sample of 300 students (225 males, 75 females) from a group of colleges affiliated to Punjab University of Pakistan. The hypothesis was framed as "Student's attitude towards attendance in class, hours spent in study on daily basis after college, students' family income, students' mother's age and mother's education are significantly related with student performance". By means of simple linear regression analysis, it was found that factors such as mother's education and student's family income were highly correlated with student academic performance.

Khan [7] conducted a performance study on 400 students comprising 200 boys and 200 girls selected from the senior secondary school of Aligarh Muslim University, Aligarh, India, with the main objective of establishing the prognostic value of different measures of cognition, personality and demographic variables for success at higher secondary level in the science stream. The selection was based on the cluster sampling technique, in which the entire population of interest was divided into groups, or clusters, and a random sample of these clusters was selected for further analyses. It was found that girls with high socio-economic status had relatively higher academic achievement in the science stream and boys with low socio-economic status had relatively higher academic achievement in general.

Galit [8] gave a case study that uses students' data to analyze their learning behavior, to predict results and to warn students at risk before their final exams.

Al-Radaideh, et al. [9] applied a decision tree model to predict the final grade of students who studied the C++ course at Yarmouk University, Jordan, in the year 2005. Three different classification methods, namely ID3, C4.5 and Naïve Bayes, were used. Their results indicated that the decision tree model gave better predictions than the other models.

Pandey and Pal [10] conducted a study on student performance by selecting 60 students from a degree college of Dr. R. M. L. Awadh University, Faizabad, India. By means of association rules, they found the interestingness of students in opting for a class teaching language.

Ayesha, Mustafa, Sattar and Khan [11] describe the use of the k-means clustering algorithm to predict students' learning activities. The information generated after the implementation of the data mining technique may be helpful for instructors as well as for students.

Bray [12], in his study on private tutoring and its implications, observed that the percentage of students receiving private tutoring in India was relatively higher than in Malaysia, Singapore, Japan, China and Sri Lanka. It was also observed that academic performance improved with the intensity of private tutoring, and that this variation in intensity depends on a collective factor, namely socio-economic conditions.

Bhardwaj and Pal [13] conducted a study on student performance by selecting 300 students from 5 different degree colleges conducting the BCA (Bachelor of Computer Application) course of Dr. R. M. L. Awadh University, Faizabad, India. By means of the Bayesian classification method on 7 attributes, it was found that factors such as students' grade in the senior secondary exam, living location, medium of teaching, mother's qualification, students' other habits, family annual income and student's family status were highly correlated with student academic performance.

IV. DATA MINING PROCESS

In the present-day educational system, a student's performance is determined by the internal assessment and the end semester examination. The internal assessment is carried out by the teacher based upon the student's performance in educational activities such as class tests, seminars, assignments, general proficiency, attendance and lab work. The end semester examination mark is the one scored by the student in the semester examination. Each student has to obtain minimum marks in the internal assessment as well as in the end semester examination to pass a semester.

A. Data Preparations

The data set used in this study was obtained from VBS Purvanchal University, Jaunpur (Uttar Pradesh), by sampling from the MCA (Master of Computer Applications) course of the Computer Applications department, from session 2007 to 2010. The initial size of the data set is 50 records. In this step, data stored in different tables were joined into a single table, and errors were removed after the joining process.

B. Data selection and transformation

In this step only those fields required for data mining were selected. A few derived variables were selected, while some of the information for the variables was extracted from the database. All the predictor and response variables derived from the database are given in Table I for reference.

TABLE I. STUDENT RELATED VARIABLES

Variable  Description              Possible Values
PSM       Previous Semester Marks  {First > 60%, Second > 45% & < 60%, Third > 36% & < 45%, Fail < 36%}
CTG       Class Test Grade         {Poor, Average, Good}
SEM       Seminar Performance      {Poor, Average, Good}
ASS       Assignment               {Yes, No}
GP        General Proficiency      {Yes, No}
ATT       Attendance               {Poor, Average, Good}
LW        Lab Work                 {Yes, No}
ESM       End Semester Marks       {First > 60%, Second > 45% & < 60%, Third > 36% & < 45%, Fail < 36%}

The domain values for some of the variables were defined for the present investigation as follows:

PSM - Previous Semester Marks/Grade obtained in the MCA course. It is split into four class values: First > 60%, Second > 45% and < 60%, Third > 36% and < 45%, Fail < 36%.

CTG - Class test grade obtained. In each semester two class tests are conducted, and the average of the two class tests is used to calculate sessional marks. CTG is split into three classes: Poor < 40%, Average > 40% and < 60%, Good > 60%.

SEM - Seminar performance obtained. In each semester, seminars are organized to check the performance of students. Seminar performance is evaluated in three classes: Poor - presentation and communication skills are low; Average - either presentation or communication skill is fine; Good - both presentation and communication skills are fine.

ASS - Assignment performance. In each semester two assignments are given to students by each teacher. Assignment performance is divided into two classes: Yes - student submitted the assignment; No - student did not submit the assignment.

GP - General Proficiency performance. Like seminars, general proficiency tests are organized in each semester. The general proficiency test is divided into two classes: Yes - student participated in general proficiency; No - student did not participate in general proficiency.

ATT - Attendance of the student. A minimum of 70% attendance is compulsory to participate in the End Semester Examination. However, in special cases students with low attendance may also sit the End Semester Examination for genuine reasons. Attendance is divided into three classes: Poor < 60%, Average > 60% and < 80%, Good > 80%.

LW - Lab Work. Lab work is divided into two classes: Yes - student completed lab work; No - student did not complete lab work.

ESM - End Semester Marks obtained in the MCA semester; it is declared as the response variable. It is split into four class values: First > 60%, Second > 45% and < 60%, Third > 36% and < 45%, Fail < 36%.

C. Decision Tree

A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision. Decision trees are commonly used for gaining information for the purpose of decision making. A decision tree starts with a root node on which users take actions. From this node, users split each node recursively according to the decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible scenario of decision and its outcome. The three widely used decision tree learning algorithms are ID3, ASSISTANT and C4.5.

D. The ID3 Decision Tree

ID3 is a simple decision tree learning algorithm developed by Ross Quinlan [14]. The basic idea of the ID3 algorithm is to construct the decision tree by employing a top-down, greedy search through the given sets to test each attribute at every tree node. In order to select the attribute that is most useful for classifying a given set, we introduce a metric: information gain.

To find an optimal way to classify a learning set, we need to minimize the number of questions asked (i.e., minimize the depth of the tree). Thus, we need some function that can measure which questions provide the most balanced splitting. The information gain metric is such a function.

E. Measuring Impurity

Given a data table that contains attributes and the class of the attributes, we can measure the homogeneity (or heterogeneity) of the table based on the classes. We say a table is pure or homogeneous if it contains only a single class. If a data table contains several classes, then we say that the table is impure or heterogeneous.
There are several indices to measure the degree of impurity quantitatively. The best-known indices are entropy, Gini index, and classification error.

$$\text{Entropy} = -\sum_{j} p_j \log_2 p_j$$

Here $p_j$ is the proportion of records in the table that belong to class $j$. The entropy of a pure table (consisting of a single class) is zero because the probability is 1 and $\log(1) = 0$. Entropy reaches its maximum value when all classes in the table have equal probability.
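As a quick numerical check of the two entropy properties just stated, the following Python sketch evaluates the formula for a pure table, a uniform four-class table and a skewed four-class table. The function name and the example probabilities are illustrative assumptions, not values from the paper.

```python
from math import log2


def entropy(probabilities):
    """Entropy in bits; classes with probability 0 contribute nothing."""
    return sum(-p * log2(p) for p in probabilities if p > 0)


pure = entropy([1.0])                    # a single class
uniform = entropy([0.25] * 4)            # four equally likely classes
skewed = entropy([0.7, 0.1, 0.1, 0.1])   # same four classes, unequal shares

print(pure, uniform, skewed)
```

For four classes the uniform case gives log2(4) = 2 bits, and any unequal distribution over the same classes gives less.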

$$\text{Gini Index} = 1 - \sum_{j} p_j^2$$

The Gini index of a pure table (consisting of a single class) is zero because the probability is 1 and $1 - 1^2 = 0$. Similar to entropy, the Gini index also reaches its maximum value when all classes in the table have equal probability.

$$\text{Classification Error} = 1 - \max_j \{p_j\}$$

Similar to entropy and the Gini index, the classification error index of a pure table (consisting of a single class) is zero because the probability is 1 and $1 - \max(1) = 0$. The value of the classification error index is always between 0 and 1. In fact, the maximum Gini index for a given number of classes is always equal to the maximum of the classification error index: for $n$ classes, we set each probability equal to $p = 1/n$, so the maximum Gini index occurs at $1 - n(1/n)^2 = 1 - 1/n$, while the maximum classification error index also occurs at $1 - \max\{1/n\} = 1 - 1/n$.

F. Splitting Criteria

To determine the best attribute for a particular node in the tree we use the measure called Information Gain. The information gain, Gain(S, A), of an attribute A, relative to a collection of examples S, is defined as

$$\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$

where Values(A) is the set of all possible values for attribute A, and $S_v$ is the subset of S for which attribute A has value v (i.e., $S_v = \{s \in S \mid A(s) = v\}$). The first term in the equation for Gain is just the entropy of the original collection S, and the second term is the expected value of the entropy after S is partitioned using attribute A. The expected entropy described by this second term is simply the sum of the entropies of each subset $S_v$, weighted by the fraction of examples $|S_v|/|S|$ that belong to $S_v$. Gain(S, A) is therefore the expected reduction in entropy caused by knowing the value of attribute A.

$$\text{Split Information}(S, A) = -\sum_{i=1}^{n} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$$

and

$$\text{Gain Ratio}(S, A) = \frac{\text{Gain}(S, A)}{\text{Split Information}(S, A)}$$

The process of selecting a new attribute and partitioning the training examples is now repeated for each nonterminal descendant node.
Attributes that have been incorporated higher in the tree are excluded, so that any given attribute can appear at most once along any path through the tree. This process continues for each new leaf node until either of two conditions is met:

1. Every attribute has already been included along this path through the tree, or
2. The training examples associated with this leaf node all have the same target attribute value (i.e., their entropy is zero).

G. The ID3 Algorithm

ID3 (Examples, Target_Attribute, Attributes)
    Create a root node for the tree.
    If all examples are positive, return the single-node tree Root, with label = +.
    If all examples are negative, return the single-node tree Root, with label = -.
    If the set of predicting attributes is empty, then return the single-node tree Root, with label = most common value of the target attribute in the examples.
    Otherwise begin
        A = the attribute that best classifies examples.
        Decision tree attribute for Root = A.
        For each possible value, v_i, of A:
            Add a new tree branch below Root, corresponding to the test A = v_i.
            Let Examples(v_i) be the subset of examples that have the value v_i for A.
            If Examples(v_i) is empty,
                then below this new branch add a leaf node with label = most common target value in the examples;
                else below this new branch add the subtree ID3 (Examples(v_i), Target_Attribute, Attributes - {A}).
    End
    Return Root

V. RESULTS AND DISCUSSION

The data set of 50 students used in this study was obtained from the MCA (Master of Computer Applications) course of the Computer Applications department of VBS Purvanchal University, Jaunpur (Uttar Pradesh), from session 2007 to 2010.

TABLE II. DATA SET

S.No.  PSM     CTG      SEM      ASS  GP   ATT      LW   ESM
1.     First   Good     Good     Yes  Yes  Good     Yes  First
2.     First   Good     Average  Yes  No   Good     Yes  First
3.     First   Good     Average  No   No   Average  No   First
4.     First   Average  Good     No   No   Good     Yes  First
5.     First   Average  Average  No   Yes  Good     Yes  First
6.     First   Poor     Average  No   No   Average  Yes  First
7.     First   Poor     Average  No   No   Poor     Yes  Second
8.     First   Average  Poor     Yes  Yes  Average  No   First
9.     First   Poor     Poor     No   No   Poor     No   Third
10.    First   Average  Average  Yes  Yes  Good     No   First
11.    Second  Good     Good     Yes  Yes  Good     Yes  First
12.    Second  Good     Average  Yes  Yes  Good     Yes  First
13.    Second  Good     Average  Yes  No   Good     No   First
14.    Second  Average  Good     Yes  Yes  Good     No   First
15.    Second  Good     Average  Yes  Yes  Average  Yes  First
16.    Second  Good     Average  Yes  Yes  Poor     Yes  Second
17.    Second  Average  Average  Yes  Yes  Good     Yes  Second
18.    Second  Average  Average  Yes  Yes  Poor     Yes  Second
19.    Second  Poor     Average  No   Yes  Good     Yes  Second
20.    Second  Average  Poor     Yes  No   Average  Yes  Second
21.    Second  Poor     Average  No   Yes  Poor     No   Third
22.    Second  Poor     Poor     Yes  Yes  Average  Yes  Third
23.    Second  Poor     Poor     No   No   Average  Yes  Third
24.    Second  Poor     Poor     Yes  Yes  Good     Yes  Second
25.    Second  Poor     Poor     Yes  Yes  Poor     Yes  Third
26.    Second  Poor     Poor     No   No   Poor     Yes  Fail
27.    Third   Good     Good     Yes  Yes  Good     Yes  First
28.    Third   Average  Good     Yes  Yes  Good     Yes  Second
29.    Third   Good     Average  Yes  Yes  Good     Yes  Second
30.    Third   Good     Good     Yes  Yes  Average  Yes  Second
31.    Third   Good     Good     No   No   Good     Yes  Second
32.    Third   Average  Average  Yes  Yes  Good     Yes  Second
33.    Third   Average  Average  No   Yes  Average  Yes  Third
34.    Third   Average  Good     No   No   Good     Yes  Third
35.    Third   Good     Average  No   Yes  Average  Yes  Third
36.    Third   Average  Poor     No   No   Average  Yes  Third
37.    Third   Poor     Average  Yes  No   Average  Yes  Third
38.    Third   Poor     Average  No   Yes  Poor     Yes  Fail
39.    Third   Average  Average  No   Yes  Poor     Yes  Third
40.    Third   Poor     Poor     No   No   Good     No   Third
41.    Third   Poor     Poor     No   Yes  Poor     Yes  Fail
42.    Third   Poor     Poor     No   No   Poor     No   Fail
43.    Fail    Good     Good     Yes  Yes  Good     Yes  Second
44.    Fail    Good     Good     Yes  Yes  Average  Yes  Second
45.    Fail    Average  Good     Yes  Yes  Average  Yes  Third
46.    Fail    Poor     Poor     Yes  Yes  Average  No   Fail
47.    Fail    Good     Poor     No   Yes  Poor     Yes  Fail
48.    Fail    Poor     Poor     No   No   Poor     Yes  Fail
49.    Fail    Average  Average  Yes  Yes  Good     Yes  Second
50.    Fail    Poor     Good     No   No   Poor     No   Fail

To work out the information gain for A relative to S, we first need to calculate the entropy of S. Here S is a set of 50 examples, of which 14 are First, 15 Second, 13 Third and 8 Fail.

$$\begin{aligned}
\text{Entropy}(S) &= -p_{\text{First}} \log_2 p_{\text{First}} - p_{\text{Second}} \log_2 p_{\text{Second}} - p_{\text{Third}} \log_2 p_{\text{Third}} - p_{\text{Fail}} \log_2 p_{\text{Fail}} \\
&= -\frac{14}{50}\log_2\frac{14}{50} - \frac{15}{50}\log_2\frac{15}{50} - \frac{13}{50}\log_2\frac{13}{50} - \frac{8}{50}\log_2\frac{8}{50} \\
&= 1.964
\end{aligned}$$

To determine the best attribute for a particular node in the tree we use the measure called Information Gain. The information gain, Gain(S, A), of an attribute A relative to a collection of examples S is computed for each attribute. For PSM:

$$\text{Gain}(S, \text{PSM}) = \text{Entropy}(S) - \frac{|S_{\text{First}}|}{|S|}\text{Entropy}(S_{\text{First}}) - \frac{|S_{\text{Second}}|}{|S|}\text{Entropy}(S_{\text{Second}}) - \frac{|S_{\text{Third}}|}{|S|}\text{Entropy}(S_{\text{Third}}) - \frac{|S_{\text{Fail}}|}{|S|}\text{Entropy}(S_{\text{Fail}})$$

TABLE III. GAIN VALUES

Gain          Value
Gain(S, PSM)  0.577036
Gain(S, CTG)  0.515173
Gain(S, SEM)  0.365881
Gain(S, ASS)  0.218628
Gain(S, GP)   0.043936
Gain(S, ATT)  0.451942
Gain(S, LW)   0.453513

PSM has the highest gain, therefore it is used as the root node, as shown in Figure 2.

Figure 2. PSM as root node (branches: First, Second, Third, Fail)

Gain Ratio can also be used for attribute selection. Before calculating the gain ratio, the split information is shown in Table IV.

TABLE IV. SPLIT INFORMATION

Split Information  Value
Split(S, PSM)      1.386579
Split(S, CTG)      1.448442
Split(S, SEM)      1.597734
Split(S, ASS)      1.744987
Split(S, GP)       1.919682
Split(S, ATT)      1.511673
Split(S, LW)       1.510102

The gain ratio is shown in Table V.

TABLE V. GAIN RATIO

Gain Ratio          Value
Gain Ratio(S, PSM)  0.416158
Gain Ratio(S, CTG)  0.355674
Gain Ratio(S, SEM)  0.229
Gain Ratio(S, ASS)  0.125289
Gain Ratio(S, GP)   0.022887
Gain Ratio(S, ATT)  0.298968
Gain Ratio(S, LW)   0.30032

This process goes on until all data are classified perfectly or we run out of attributes. The knowledge represented by the decision tree can be extracted and represented in the form of IF-THEN rules.
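The attribute-selection measures and the ID3 recursion can be sketched together in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, logarithms are base 2, only attribute values that actually occur in the data get a branch (so the empty-subset case of the pseudocode never arises), and the PSM/ESM tallies were counted by hand from Table II. The printed gain and gain ratio are what these formulas yield for those tallies and are not claimed to reproduce Tables III to V digit for digit.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def partition(rows, attr):
    """Group rows (dicts) by the value they take for attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row)
    return groups


def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after the split."""
    n = len(rows)
    after = sum(len(g) / n * entropy([r[target] for r in g])
                for g in partition(rows, attr).values())
    return entropy([r[target] for r in rows]) - after


def gain_ratio(rows, attr, target):
    """Gain divided by split information (entropy of attr's own values)."""
    split = entropy([r[attr] for r in rows])
    return information_gain(rows, attr, target) / split if split else 0.0


def id3(rows, attrs, target):
    """Return a nested-dict tree whose leaves are class labels."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:            # pure node: entropy is zero
        return labels[0]
    if not attrs:                        # attributes exhausted: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {value: id3(group, rest, target)
                   for value, group in partition(rows, best).items()}}


# PSM -> ESM tallies counted by hand from Table II (assumed transcription).
TALLY = {
    "First":  {"First": 8, "Second": 1, "Third": 1},
    "Second": {"First": 5, "Second": 6, "Third": 4, "Fail": 1},
    "Third":  {"First": 1, "Second": 5, "Third": 7, "Fail": 3},
    "Fail":   {"Second": 3, "Third": 1, "Fail": 4},
}
rows = [{"PSM": psm, "ESM": esm}
        for psm, counts in TALLY.items()
        for esm, k in counts.items()
        for _ in range(k)]

print(round(entropy([r["ESM"] for r in rows]), 3))
print(round(information_gain(rows, "PSM", "ESM"), 3))
print(round(gain_ratio(rows, "PSM", "ESM"), 3))
print(id3(rows, ["PSM"], "ESM"))
```

Passing the full attribute list (CTG, SEM, ASS, GP, ATT, LW) together with complete rows from Table II would grow a deeper tree; the single-attribute call here only shows how the root split and its majority-class leaves come about.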

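The IF-THEN rule set extracted from the tree (Figure 3) can be applied to a new student record with a simple first-match lookup. The sketch below is one possible encoding in Python, not the authors' code. The names `RULES` and `predict_esm` are hypothetical, "Good OR Average" is read as set membership, every rule is assumed to predict ESM, and a record that matches no rule yields `None`.

```python
# Each rule is (conditions, predicted ESM). A condition maps an attribute
# name to the set of values it accepts (assumed reading of "Good OR Average").
RULES = [
    ({"PSM": {"First"}, "ATT": {"Good"}, "CTG": {"Good", "Average"}}, "First"),
    ({"PSM": {"First"}, "CTG": {"Good"}, "ATT": {"Good", "Average"}}, "First"),
    ({"PSM": {"Second"}, "ATT": {"Good"}, "ASS": {"Yes"}}, "First"),
    ({"PSM": {"Second"}, "CTG": {"Average"}, "LW": {"Yes"}}, "Second"),
    ({"PSM": {"Third"}, "CTG": {"Good", "Average"},
      "ATT": {"Good", "Average"}}, "Second"),
    ({"PSM": {"Third"}, "ASS": {"No"}, "ATT": {"Average"}}, "Third"),
    ({"PSM": {"Fail"}, "CTG": {"Poor"}, "ATT": {"Poor"}}, "Fail"),
]


def predict_esm(student, rules=RULES):
    """Return the ESM of the first rule whose conditions all hold, else None."""
    for conditions, esm in rules:
        if all(student.get(attr) in allowed
               for attr, allowed in conditions.items()):
            return esm
    return None


# Hypothetical student records used only to exercise the rules.
student_a = {"PSM": "First", "CTG": "Average", "ATT": "Good"}
student_b = {"PSM": "Third", "CTG": "Poor", "ASS": "No", "ATT": "Average"}
student_c = {"PSM": "Fail", "CTG": "Good", "ATT": "Good"}

print(predict_esm(student_a))  # First
print(predict_esm(student_b))  # Third
print(predict_esm(student_c))  # None
```

Keeping the rules as data rather than nested `if` statements makes it easy to add or drop rules when the tree is rebuilt or pruned differently.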
IF PSM = First AND ATT = Good AND CTG = Good OR Average THEN ESM = First
IF PSM = First AND CTG = Good AND ATT = Good OR Average THEN ESM = First
IF PSM = Second AND ATT = Good AND ASS = Yes THEN ESM = First
IF PSM = Second AND CTG = Average AND LW = Yes THEN ESM = Second
IF PSM = Third AND CTG = Good OR Average AND ATT = Good OR Average THEN ESM = Second
IF PSM = Third AND ASS = No AND ATT = Average THEN ESM = Third
IF PSM = Fail AND CTG = Poor AND ATT = Poor THEN ESM = Fail

Figure 3. Rule set generated by the decision tree

One classification rule can be generated for each path from the root node to a terminal node. The tree was pruned by removing nodes with fewer than the desired number of objects. The resulting IF-THEN rules, which may be easier to understand than the tree itself, are shown in Figure 3.

CONCLUSION

In this paper, the classification task is applied to a student database to predict the students' division on the basis of previous records. Of the many approaches used for data classification, the decision tree method is used here. Information such as attendance, class test, seminar and assignment marks was collected from the students' previous records to predict their performance at the end of the semester. This study will help students and teachers to improve the division of the student. It will also help to identify those students who need special attention, so that the fail ratio can be reduced and appropriate action taken before the next semester examination.
AUTHORS PROFILE

Brijesh Kumar Bhardwaj is Assistant Professor in the Department of Computer Applications, Dr. R. M. L. Avadh University, Faizabad, India. He obtained his M.C.A. degree from Dr. R. M. L. Avadh University, Faizabad (2003) and his M.Phil. in Computer Applications from Vinayaka Mission University, Tamilnadu. He is currently doing research in Data Mining and Knowledge Discovery. He has published one international paper.

Saurabh Pal received his M.Sc. (Computer Science) from Allahabad University, UP, India (1996) and obtained his Ph.D. degree from Dr. R. M. L. Awadh University, Faizabad (2002). He then joined the Dept. of Computer Applications, VBS Purvanchal University, Jaunpur, as Lecturer. At present, he is working as Head and Sr. Lecturer at the Department of Computer Applications. Saurabh Pal has authored a commendable number of research papers in international/national conferences/journals and also guides research scholars in Computer Science/Applications. He is an active member of IACSIT, CSI, and the Society of Statistics and Computer Applications, and works as Reviewer/Editorial Board Member for more than 5 international journals. His research interests include Image Processing, Data Mining, Grid Computing and Artificial Intelligence.