



Contents

1. Introduction
  1.1 Related Research
  1.2 Objective of Research Work
  1.3 Why Data Mining is Important
  1.4 Research Methodology
  1.5 Research Hypothesis
  1.6 Scope
2. Data Mining & Knowledge Discovery
  2.1 Data Mining Concepts
  2.2 Data Mining Process
  2.3 Data Mining as a Part of the Knowledge Discovery Process
  2.4 Models for Data Mining
  2.5 Goals of Data Mining and Knowledge Discovery
    2.5.1 Prediction
    2.5.2 Identification
    2.5.3 Classification
    2.5.4 Optimization
  2.6 Types of Knowledge Discovered during Data Mining
    2.6.1 Association Rules
    2.6.2 Classification Hierarchy
    2.6.3 Sequential Patterns
    2.6.4 Patterns within Time Series
    2.6.5 Categorization and Segmentation
  2.7 Learning
    2.7.1 Inductive Learning
    2.7.2 Supervised Learning
    2.7.3 Unsupervised Learning
  2.8 Data Mining and Data Warehousing
    2.8.1 Introduction to Data Warehousing
    2.8.2 Characteristics of Data Warehouse
    2.8.3 Benefits of Data Warehouse
    2.8.4 How Does a Data Warehouse Work
3. Data Mining Tasks & Algorithm
  3.1 Association Analysis
  3.2 Algorithm APRIORI
  3.3 Classification and Prediction
  3.4 Bayesian Classification
  3.5 Decision Trees
    3.5.1 Viewing Decision Trees as Segmentation with a Purpose
    3.5.2 Applying Decision Trees to Business
    3.5.3 Where Can Decision Trees be Used
    3.5.4 Using Decision Trees for Data Preprocessing
    3.5.5 Decision Trees for Prediction
    3.5.6 C4.5 Algorithm: Generating a Decision Tree
  3.6 Neural Networks
    3.6.1 Artificial Neural Network
    3.6.2 The Mathematical Model
    3.6.3 Neural Networks in Data Mining
    3.6.4 Applying Neural Networks to Business
    3.6.5 Neural Networks for Clustering
    3.6.6 Neural Networks for Outlier Analysis
  3.7 Rule Induction
    3.7.1 Applying Rule Induction to Business
    3.7.2 What is a Rule
    3.7.3 What to do with a Rule
    3.7.4 Discovery
  3.8 Deviation Analysis
  3.9 Clustering Analysis
    3.9.1 K-means Partitional-Clustering Algorithm
    3.9.2 K-Means Clustering
  3.10 Outlier Analysis
4. Data Mining in Higher Education
  4.1 Data Mining for Education
    4.1.1 Prediction
    4.1.2 Clustering
  4.2 Relationship Mining
  4.3 Distillation of Data for Human Judgment
  4.4 Scenario of Higher Education in India
  4.5 Data Mining: A Way to Improve Today's Higher Learning Institutions
  4.6 Supervised and Unsupervised Modeling
  4.7 Application of Data Mining in Higher Education
  4.8 Data Mining Areas
  4.9 Major Data Mining Tasks that Can be Used to Find Certain Patterns
  4.10 The Integration of Data Mining Processes in Higher Education Topics
5. Analysis of Data Mining Applications in Education System
  5.1 Predicting Alumni Pledge
  5.2 Uses of Data Mining in CRCT Scores
  5.3 Creating Meaningful Learning Outcome Typologies
  5.4 Use of Data Mining Techniques to Develop Institutional Typologies
  5.5 Academic Planning and Interventions Transfer Prediction
  5.6 Predicting and Clustering Persisters and Non-Persisters
  5.7 Predicting a Student's Performance
  5.8 Improving Quality of Graduate Students by Data Mining
  5.9 Proposed Analysis Guideline (DM-HEDU)
  5.10 Data Analysis and Investigation
    5.10.1 Domain Understanding
    5.10.2 Data Understanding
    5.10.3 Data Preparation
  5.11 Data Mining Modeling
    5.11.1 Predictive Data Mining Models
      Model A: Predicting Student Success Rate for Individual Student
      Model A.1: Decision Tree and Neural Network Classification Technique
      Model A.2: Neural Network and RBF Prediction Techniques
      Model B: Predicting Student Success Rate for Individual Lecturer
      Model B.1: Decision Tree and Neural Network Classification Technique
      Model B.2: Neural Network Model
    5.11.2 Descriptive Data Mining Modeling
      Model C: Model of Student Course Enrollment
      Model D: Model of Lecturer Course Assignment Policy Making
      Model E: Model of Lecturer Typologies
      Model E.1: Cluster Number 1
      Model E.2: Cluster Number 2
      Model F: Model of Course Time Planning
  5.12 Analysis and Discussion
    5.12.1 Credibility of the Results
  5.13 Factors Affecting the Reliability of Model
    5.13.1 Reduction in the total number of data
      5.13.1.1 Handling missing value phase
      5.13.1.2 Incompleteness and low quality of important attribute value
      5.13.1.3 Data integration and database application
      5.13.1.4 Inconsistency and value error among attributes
    5.13.2 Reduction in the total number of attributes
      5.13.2.1 Feature Selection in data preparation phase
      5.13.2.2 Medium quality of attributes and their values
  5.14 Suggestion to Improve the Model's Quality
    5.14.1 Student's background knowledge (pre-university academic information)
    5.14.2 Student's course knowledge
    5.14.3 Student's demographics knowledge
    5.14.4 Lecturer academic knowledge
    5.14.5 Lecturer's demographic knowledge
    5.14.6 Course knowledge
  5.15 Summary
6. Testing & Result on Potential Application of Data Mining in Higher Education
  6.1 Organization of Syllabus
    6.1.1 Methodology
  6.2 Predicting the Registration of Students in an Educational Program
    6.2.1 Methodology
  6.3 Predicting Student Performance
    6.3.1 Methodology
  6.4 Identifying Abnormal/Erroneous Values
  6.5 Result
  6.6 Applying Data Mining Techniques to a Management Institute
    6.6.1 The Admission Process
    6.6.2 Counseling
    6.6.3 Data Mining Techniques in Admission & Counseling
  6.7 Expected Benefits
7. Data Mining & Decision Making Environment
  7.1 The Decision Making Environment in Higher Education
    7.1.1 Demands for Improved Decision Making Capabilities
    7.1.2 Challenges for Improving Decision Making Capabilities
  7.2 A Framework for Decision Making Capabilities
    7.2.1 Five Guiding Principles for Developing Decision Making Capabilities
    7.2.2 Program Lifecycle Management
    7.2.3 Design Data & System Architecture
  7.3 The Architecture of the Data Warehouse System
  7.4 Dimensional Model of the HEIS Data Warehouse
  7.5 Data Extraction, Transformation and Loading
  7.6 Presenting Data
    7.6.1 Predefined Queries
    7.6.2 Detailed Ad hoc Queries
    7.6.3 Summary Ad hoc Queries
  7.7 Sustainable Approach for Data Warehousing at Institute
    7.7.1 Decision Support Stages
  7.8 Structure of Data Warehouse
  7.9 Higher Education and Strategic Decision Support
    7.9.1 Planning the Education Data Warehouse for Strategic Decision Making
    7.9.2 Planning ETL Processes and Data Warehouse Creation
    7.9.3 Detailed Analysis
    7.9.4 Planning / Execution / Implementation
8. Case Studies
  8.1 Case Study One: Academic Planning and Interventions Transfer Prediction
    8.1.1 Challenge
    8.1.2 Solution
    8.1.3 Results
  8.2 Case Study Two: Predicting Alumni Pledges
    8.2.1 Challenge
    8.2.2 Solution
    8.2.3 Results
  8.3 The Research Approach
    8.3.1 The Implementation
    8.3.2 Data Mining in Higher Education System
    8.3.3 Proposed Model
    8.3.4 Application
  8.4 Results and Discussion
  8.5 Case Study Three: Data Mining Process
    8.5.1 Data Preparations
    8.5.2 Data Selection and Transformation
    8.5.3 Decision Tree
    8.5.4 The ID3 Decision Tree
    8.5.5 Measuring Impurity
    8.5.6 Splitting Criteria
    8.5.7 The ID3 Algorithm
    8.5.8 Results and Discussion
  8.6 Case Study Four: Course Planning of Higher Education
    8.6.1 Methodology
      8.6.1.1 Factors Determining the Quality of Education
    8.6.2 System Architecture
    8.6.3 CHAID for Data Mining
    8.6.4 Link Analysis for Data Mining
    8.6.5 Decision Forest for Data Mining
    8.6.6 Course Completion Rate of Entire PG Students
    8.6.7 Conclusion
  8.7 Application Example
    8.7.1 Conclusion
9. Conclusion and Future Work
  9.1 Conclusion
  9.2 Future Work
Appendix A: An Introduction to Data Mining Software Tool WEKA & RapidMiner
  WEKA
    A.1 Description
    A.2 Explorer
    A.3 Regression
    A.4 Building the Data Set for WEKA
    A.5 Loading the Data into WEKA
    A.6 Creating the Regression Model with WEKA
    A.7 Interpreting the Regression Model
    A.8 ARFF File
  RapidMiner
    A.1 Features
    A.2 Purpose
    A.3 Applications
    A.4 Properties
    A.5 GUI
Appendix B: Statistical Package for the Social Sciences (SPSS)
  B.1 Statistics Included in the Base Software
  B.2 The Most Popular IBM SPSS Products Include
Appendix C: Words & Acronyms
  C.1 Words
  C.2 Acronyms
Bibliography