Index
Contents Page No.
1. Introduction 1
1.1 Related Research 2
1.2 Objective of Research Work 3
1.3 Why Data Mining is Important 3
1.4 Research Methodology 4
1.5 Research Hypothesis 4
1.6 Scope 5
2. Data Mining & Knowledge Discovery 7
2.1 Data Mining Concepts 11
2.2 Data Mining Process 16
2.3 Data Mining as a Part of the Knowledge Discovery Process 21
2.4 Models for Data Mining 22
2.5 Goals of Data Mining and Knowledge Discovery 24
2.5.1 Prediction 24
2.5.2 Identification 24
2.5.3 Classification 24
2.5.4 Optimization 25
2.6 Types of Knowledge Discovered during Data Mining 25
2.6.1 Association Rules 25
2.6.2 Classification Hierarchy 25
2.6.3 Sequential Patterns 26
2.6.4 Patterns within Time Series 26
2.6.5 Categorization and segmentation 26
2.7 Learning 27
2.7.1 Inductive Learning 27
2.7.2 Supervised learning 27
2.7.3 Unsupervised learning 27
2.8 Data Mining and Data Warehousing 28
2.8.1 Introduction to Data Warehousing 28
2.8.2 Characteristics of Data Warehouse 29
2.8.3 Benefits of Data Warehouse 30
2.8.4 How Does a Data Warehouse Work 30
3. Data Mining Tasks & Algorithm 31
3.1 Association Analysis 31
3.2 Algorithm APRIORI 33
3.3 Classification and Prediction 34
3.4 Bayesian Classification 36
3.5 Decision Trees 36
3.5.1 Viewing Decision Trees as Segmentation with a Purpose 41
3.5.2 Applying Decision Trees to Business 42
3.5.3 Where Can Decision Trees be Used 42
3.5.4 Using Decision Trees for Data Preprocessing 43
3.5.5 Decision Trees for Prediction 43
3.5.6 C4.5 Algorithm: Generating a Decision Tree 44
3.6 Neural Networks 46
3.6.1 Artificial Neural Network 48
3.6.2 The Mathematical Model 49
3.6.3 Neural Networks in Data Mining 50
3.6.4 Applying Neural Networks to Business 52
3.6.5 Neural Networks for Clustering 53
3.6.6 Neural Networks for Outlier Analysis 53
3.7 Rule Induction 54
3.7.1 Applying Rule Induction to Business 55
3.7.2 What is a rule 55
3.7.3 What to do with a Rule 56
3.7.4 Discovery 58
3.8 Deviation analysis 58
3.9 Clustering Analysis 58
3.9.1 K-means Partitional-Clustering Algorithm 62
3.9.2 K-Means Clustering 62
3.10 Outlier Analysis 63
4. Data Mining in Higher Education 64
4.1 Data Mining for Education 65
4.1.1 Prediction 65
4.1.2 Clustering 67
4.2 Relationship Mining 68
4.3 Distillation of Data for Human Judgment 69
4.4 Scenario of Higher Education in India 70
4.5 Data Mining: A Way to Improve Today's Higher Learning Institutions 71
4.6 Supervised and Unsupervised modeling 72
4.7 Application of Data Mining in Higher Education 73
4.8 Data Mining Areas 75
4.9 Major Data Mining Tasks that Can be Used to Find Certain Patterns 78
4.10 The Integration of Data Mining Processes in Higher Education Topics 79
5. Analysis of Data Mining Applications in Education System 83
5.1 Predicting Alumni Pledge 84
5.2 Uses of Data Mining in CRCT Scores 84
5.3 Creating Meaningful Learning Outcome Typologies 85
5.4 Use of Data Mining Techniques to Develop Institutional Typologies 86
5.5 Academic Planning and Interventions Transfer Prediction 86
5.6 Predicting and Clustering Persisters and Non-Persisters 86
5.7 Predicting a Student's Performance 87
5.8 Improving Quality of Graduate Students by Data Mining 88
5.9 Proposed Analysis Guideline (DM-HEDU) 88
5.10 Data Analysis and Investigation 93
5.10.1 Domain Understanding 93
5.10.2 Data Understanding 93
5.10.3 Data Preparation 94
5.11 Data Mining Modeling 96
5.11.1 Predictive Data Mining Models 96
Model A: Predicting Student Success Rate for Individual Student 96
Model A.1: Decision Tree and Neural Network Classification Technique 97
Model A.2: Neural Network and RBF Prediction Techniques 97
Model B: Predicting Student Success Rate for Individual Lecturer 98
Model B.1: Decision Tree and Neural Network Classification Technique 98
Model B.2: Neural Network Model 98
5.11.2 Descriptive Data Mining Modeling 99
Model C: Model of Student Course Enrollment 99
Model D: Model of Lecturer Course Assignment Policy Making 100
Model E: Model of Lecturer Typologies 100
Model E.1: Cluster Number 1 101
Model E.2: Cluster Number 2 101
Model F: Model of Course Time Planning 101
5.12 Analysis and Discussion 101
5.12.1 Creditability of the Results 102
5.13 Factors Affecting the Reliability of Model 102
5.13.1 Reduction in the total number of data 102
5.13.1.1 Handling missing value phase 102
5.13.1.2 Incompleteness and low quality of important attribute value 103
5.13.1.3 Data integration and database application 103
5.13.1.4 Inconsistency and value error among attributes 103
5.13.2 Reduction in the total number of attributes 103
5.13.2.1 Feature Selection in data preparation phase 104
5.13.2.2 Medium quality of attributes and their values 104
5.14 Suggestion to Improve the Model's Quality 104
5.14.1 Student's background knowledge (pre-university academic information) 105
5.14.2 Student's course knowledge 105
5.14.3 Student's demographics knowledge 105
5.14.4 Lecturer academic knowledge 105
5.14.5 Lecturer's demographic knowledge 105
5.14.6 Course knowledge 106
5.15 Summary 106
6. Testing & Result on Potential Application of Data Mining in Higher Education 107
6.1 Organization of Syllabus 107
6.1.1 Methodology 108
6.2 Predicting the Registration of Students in an Educational Program 109
6.2.1 Methodology 109
6.3 Predicting Student Performance 110
6.3.1 Methodology 110
6.4 Identifying Abnormal/Erroneous Values 111
6.5 Result 112
6.6 Applying Data Mining Techniques to a Management Institute 112
6.6.1 The Admission Process 115
6.6.2 Counseling 116
6.6.3 Data Mining Techniques in admission & Counseling 116
6.7 Expected Benefits 117
7. Data Mining & Decision Making Environment 119
7.1 The Decision Making Environment in Higher Education 120
7.1.1 Demands for Improved Decision Making Capabilities 120
7.1.2 Challenges for Improving Decision Making Capabilities 122
7.2 A Framework for Decision Making Capabilities 123
7.2.1 Five Guiding Principles for Developing Decision Making Capabilities 123
7.2.2 Program Lifecycle Management 124
7.2.3 Design Data & System Architecture 124
7.3 The Architecture of the Data Warehouse System 125
7.4 Dimensional Model of the HEIS Data Warehouse 126
7.5 Data Extraction, Transformation and Loading 128
7.6 Presenting Data 131
7.6.1 Predefined Queries 131
7.6.2 Detailed Ad hoc Queries 131
7.6.3 Summary Ad hoc Queries 132
7.7 Sustainable Approach for Data Warehousing at Institute 133
7.7.1 Decision Support Stages 133
7.8 Structure of Data Warehouse 136
7.9 Higher Education and Strategic Decision Support 138
7.9.1 Planning the Education Data Warehouse for Strategic Decision Making 138
7.9.2 Planning ETL Processes and Data Warehouse Creation 139
7.9.3 Detailed Analysis 140
7.9.4 Planning / Execution / Implementation 141
8. Case Studies 142
8.1 Case study one: Academic planning and interventions transfer prediction 142
8.1.1 Challenge 142
8.1.2 Solution 142
8.1.3 Results 143
8.2 Case study two: Predicting alumni pledges 143
8.2.1 Challenge 143
8.2.2 Solution 143
8.2.3 Results 144
8.3 The Research Approach 144
8.3.1 The Implementation 146
8.3.2 Data mining in Higher Education System 149
8.3.3 Proposed Model 149
8.3.4 Application 150
8.4 Results and Discussion 151
8.5 Case Study Three: Data Mining Process 154
8.5.1 Data Preparations 154
8.5.2 Data selection and transformation 154
8.5.3 Decision Tree 156
8.5.4 The ID3 Decision Tree 156
8.5.5 Measuring Impurity 156
8.5.6 Splitting Criteria 157
8.5.7 The ID3 Algorithm 157
8.5.8 Results and Discussion 158
8.6 Case study four: Course planning of higher education 158
8.6.1 Methodology 159
8.6.1.1 Factors Determining the Quality of Education 159
8.6.2 System Architecture 161
8.6.3 CHAID for Data Mining 161
8.6.4 Link Analysis for Data Mining 163
8.6.5 Decision Forest for Data Mining 164
8.6.6 Course completion rate of entire PG students 165
8.6.7 Conclusion 165
8.7 Application Example 166
8.7.1 Conclusion 170
9. Conclusion and Future Work 171
9.1 Conclusion
9.2 Future Work
Appendix A: An Introduction to Data Mining Software Tool WEKA & RapidMiner 174
WEKA 174
A.1 Description 174
A.2 Explorer 175
A.3 Regression 176
A.4 Building The Data Set for WEKA 176
A.5 Loading the Data into WEKA 177
A.6 Creating the Regression Model with WEKA 177
A.7 Interpreting the Regression Model 178
A.8 ARFF File 178
RapidMiner 179
A.1 Features 179
A.2 Purpose 180
A.3 Applications 180
A.4 Properties 180
A.5 GUI 181
Appendix B: Statistical Package for the Social Sciences (SPSS) 182
B.1 Statistics Included in the Base Software 182
B.2 The Most Popular IBM SPSS Products Include 183
Appendix C: Words & Acronyms 184
C.1 Words 184
C.2 Acronyms 188
Bibliography 190