Predicting Student Persistence Using Data Mining and Statistical Analysis Methods
|
|
|
- Darlene Dean
- 10 years ago
- Views:
Transcription
1 Predicting Student Persistence Using Data Mining and Statistical Analysis Methods Koji Fujiwara Office of Institutional Research and Effectiveness Bemidji State University & Northwest Technical College December 18, 2012 * The original version was presented at the AIRUM 2012 conference with Doug Olney.
2 NTC Enrollment Northwest Technical College enrollment grew very quickly between 2004 and 2009, but now it has been falling off. Fall 2004 Headcount = 794 Fall 2009 Headcount = 1,604 Fall 2012 Headcount = 1,168 2
3 Fall Semester Headcount and FTE Headcount (30th day) FTE (Final) 1,800 1,600 1,604 1,400 1,373 1,420 1,384 1,200 1,217 1,168 1,000 Projected Fall 07 Fall 08 Fall 09 Fall 10 Fall 11 Fall 12 FTE = Undergraduate Semester Hours / 15. 3
4 NTC Enrollment Report Headcount and FTE for Term by Campus - Northwest Technical College Spring 2013 Date: 12/17/2012 Days to beginning of term: Unduplicated Duplicated Campus Headcount Headcount FTE Credits FYE NTC NTC Distance Total Unadjusted numbers 4
5 Retention & Grad Rates IPEDS definition 90% 80% 78% 74% 81% 71% 76% 70% 60% 50% 48% 40% 40% 37% 35% 39% Fall to Spring Fall to Fall 150% Grad Rates 30% 32% 29% 20% 10% 0% Fall 2007 (166) Fall 2008 (114) Fall 2009 (170) Fall 2010 (141) Fall 2011 (104) Cohort (Size) 5
6 Retention/Success Study at NTC (& other colleges) A study was conducted last year to identify the factors that affected student persistence (success, to be defined later). Factors: Credit Completion Rate & GPA The first semester credit completion rate was analyzed using a 2 (success status: successful vs. unsuccessful) 2 (college ready status: ready vs. underprepared) 2 (late registration status: regular vs. late) ANOVA. 6
7 What did we learn? (AIRUM 2011 / AIR 2012) Credit Comp. Rate (%) 100 Unsuccessful Successful We focused on these students. We need to focus on these students as well. Early Reg. Late Reg. Early Reg. Late Reg. Underprepared College Ready 7
8 More Proactive Retention/Success Efforts How can an IR office contribute to this project? We looked at eight data mining and two statistical models to find the best method to develop a predictive model for student persistence (success). 8
9 Proposed Retention/Success Plan [Summer] Retention/Success Study & Develop a Model [Spring] Interventions [Fall] Run the model 9
10 Data and Variables 10
11 Target Variable Target Variable: Second Fall Success Define Second Fall Success = Success if At the beginning of second Fall Students came back to NTC (retained) or Students transferred out to another institution (transferred) or Students completed the program (graduated) Why did we use success and not retention? 11
12 Models Decision Tree Naive Bayes Classifier K Nearest Neighbor Classifier Neural Network Support Vector Machine Bagging Boosting Random Forest Probit Regression Logistic Regression 12
13 DATA Combined Fall first year, full time, degree seeking students data There was no difference in the success status and cohort year (Chisquare test, p = 0.72). In this study, we included first time regular students and transfer students whose attempted transferred credits 6. 13
14 Predictor Variables Age Gender (Female/Male) College Ready Status (Underprepared/Ready) if a student needed to take at least one college readiness course, then the student was categorized as Underprepared. Pell Status (No/Yes) Late Registration Status (before or on/after August 1st) Underrepresented Status (No/Yes) Admission Status (Regular/Transfer) 14
15 Predictor Variables # Attempted Credit Hours Enrolled Semester GPA Enrolled Semester Credit Completion Rate # complete observations =
16 How to Compare the Models? 16
17 How to Estimate a Model Performance? Simple Cross Validation Original Data N = 622 Training (for modeling) N1 = 311 Testing (for validation) N2 =
18 How to Compare the Results? We propose to use these three indicators: Overall Classification Rate Sensitivity Prob. that the model classified as Successful when given to a group of students who were actually Successful Specificity Prob. that the model classified as Unsuccessful when given to a group of students who were actually Unsuccessful 18
19 Results and Conclusion 19
20 Results Method Classification Sensitivity Specificity Logistic Regression 81.7% 89.8% 67.8% Probit Regression 81.7% 90.8% 66.1% Decision Tree 81.4% 89.3% 67.8% Naive Bayes Classifier 82.3% 90.3% 68.7% K Nearest Neighbor (5) 82.3% 90.3% 68.7% Neural Network 81.4% 86.7% 72.2% Support Vector Machine 82.6% 92.9% 65.2% Bagging 82.3% 89.8% 69.6% Boosting 78.9% 87.8% 63.5% Random Forest 83.0% 92.9% 65.2% 20
21 Conclusion and Future Research We are going to use a Neural Network model and a bagging model. December These models will be applied to the actual data. The list will be sent to the Student Services. January The Student Services will start contacting to the students. Interventions (what types?) Next Summer and Fall Summer Conduct Retention Study Fall Evaluate the models 21
22 Conclusion and Future Research (contd.) Current Model Need Students First Semester Academic Performance Data Interventions: Spring Semester Ideal Model Can run before the Fall semester begins Need other predictors: i.e.) FA data? Interventions: Fall Semester 22
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
Community College Transfer Students Persistence at University
Community College Transfer Students Persistence at University Alexandra List Denise Nadasen University of Maryland University College Presented at Northeast Association for Institutional Research, November
Higher Education State Fact Book Louisiana Board of Regents 2015 2016 NOTE: Data is subject to change Revised 05/13/2016
Higher Education State Fact Book Louisiana Board of Regents 2015 2016 NOTE: Data is subject to change Revised 05/13/2016 Table of Contents Methodology... 3 State... 4 Four Year and Specialized Universities...
Dawn Broschard, EdD Senior Research Analyst Office of Retention and Graduation Success [email protected]
Using Decision Trees to Analyze Students at Risk of Dropping Out in Their First Year of College Based on Data Gathered Prior to Attending Their First Semester Dawn Broschard, EdD Senior Research Analyst
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics
ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of
Co-Curricular Activities and Academic Performance -A Study of the Student Leadership Initiative Programs. Office of Institutional Research
Co-Curricular Activities and Academic Performance -A Study of the Student Leadership Initiative Programs Office of Institutional Research July 2014 Introduction The Leadership Initiative (LI) is a certificate
Predicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
EARLY VS. LATE ENROLLERS: DOES ENROLLMENT PROCRASTINATION AFFECT ACADEMIC SUCCESS? 2007-08
EARLY VS. LATE ENROLLERS: DOES ENROLLMENT PROCRASTINATION AFFECT ACADEMIC SUCCESS? 2007-08 PURPOSE Matthew Wetstein, Alyssa Nguyen & Brianna Hays The purpose of the present study was to identify specific
THE UNIVERSITY OF NORTH CAROLINA REMEDIAL/DEVELOPMENTAL ACTIVITIES REPORT 2011-12. The University of North Carolina General Administration
THE UNIVERSITY OF NORTH CAROLINA REMEDIAL/DEVELOPMENTAL ACTIVITIES REPORT 2011-12 The University of North Carolina General Administration February 2013 1 Remedial/Developmental Activities in UNC Institutions:
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Data Mining: STATISTICA
Data Mining: STATISTICA Outline Prepare the data Classification and regression 1 Prepare the Data Statistica can read from Excel,.txt and many other types of files Compared with WEKA, Statistica is much
National Center for Education Statistics
National Center for Education Statistics IPEDS Data Center Rensselaer Polytechnic Institute UnitID 194824 OPEID 00280300 Address 110 8th St, Troy, NY, 12180-3590 Web Address www.rpi.edu Institution Characteristics
Higher Education Data Warehouse Conference
Higher Education Data Warehouse Conference Collaboratively Delivering a Student Retention & Success Model with Agile Style Presenters Mary Brooks Project Manager Denise Krallman Director of Institutional
BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, [email protected]) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
FYE Accelerated English Courses El Camino College Fall 2010
FYE Accelerated English Courses El Camino College Fall 2010 In Fall 2010, the First Year Experience (FYE) program offered accelerated English courses where students enrolled in back to back 8 week reading
Attrition in Online and Campus Degree Programs
Attrition in Online and Campus Degree Programs Belinda Patterson East Carolina University [email protected] Cheryl McFadden East Carolina University [email protected] Abstract The purpose of this study
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is
Evaluation of College Possible Postsecondary Outcomes, 2007-2012
Evaluation of College Possible Postsecondary Outcomes, 2007-2012 Prepared by: Caitlin Howley, Ph.D. Kazuaki Uekawa, Ph.D. August 2013 Prepared for: College Possible 540 N Fairview Ave, Suite 304 Saint
An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century
An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
Application of Predictive Analytics to Higher Degree Research Course Completion Times
Application of Predictive Analytics to Higher Degree Research Course Completion Times Application of Decision Theory to PhD Course Completions (2006 2013) Rachna 1 I Dhand, Senior Strategic Information
Grants Work-Study Loans Less costly State or federal funds are leveraged with earnings students receive. jobs, usually on campus
Financial Aid What are the types of financial aid? Federal or state financial aid generally is one of three types: grants, work-study, or loans. The general characteristics of the three types of aid are
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Louisiana Tech University FIVE-YEAR STRATEGIC PLAN. FY 2017-2018 through FY 2021-2022
Louisiana Tech University FIVE-YEAR STRATEGIC PLAN FY 2017-2018 through FY 2021-2022 July 1, 2016 DEPARTMENT ID: 19A Higher Education AGENCY ID: 19A-625 Louisiana Tech University Louisiana Tech University
San Jose City College Strategic Planning Performance Indicators
Strategic Planning Goal 1: Student Success Regularly evaluate all academic programs and student services and determine how well they promote student success and specifically how they will be improved to
MHI3000 Big Data Analytics for Health Care Final Project Report
MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given
Student Profile -Statistics on enrollment at University of Florida
Page 26 AGENDA ITEM: XI Florida Polytechnic University Board of Trustees December 2, 2015 Subject: 2015 Student, Faculty and Staff Profile Proposed Board Action No Action Required- Information Only Background
COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
This presentation was made at the California Association for Institutional Research Conference on November 19, 2010.
This presentation was made at the California Association for Institutional Research Conference on November 19, 2010. 1 This presentation was made at the California Association for Institutional Research
MS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
Microsoft Azure Machine learning Algorithms
Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql [email protected] http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation
Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk [email protected] Tom Kelsey ID5059-19-B &
Climate Change: Transforming a Summer Bridge Program with Student Support
Climate Change: Transforming a Summer Bridge Program with Student Support Program Overview Session outcomes Climate change indicators in Higher Education Research indicators Institutional approaches to
CS 207 - Data Science and Visualization Spring 2016
CS 207 - Data Science and Visualization Spring 2016 Professor: Sorelle Friedler [email protected] An introduction to techniques for the automated and human-assisted analysis of data sets. These
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin
Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)
Machine Learning Algorithms and Predictive Models for Undergraduate Student Retention
, 225 October, 2013, San Francisco, USA Machine Learning Algorithms and Predictive Models for Undergraduate Student Retention Ji-Wu Jia, Member IAENG, Manohar Mareboyana Abstract---In this paper, we have
Distance Learning Report
Dramatic Increases in Distance Learning Enrollments Continue at Ohio s Public Colleges and Universities Distance learning enrollments at the University System of Ohio s colleges and universities continued
The Influence of a Summer Bridge Program on College Adjustment and Success: The Importance of Early Intervention and Creating a Sense of Community
The Influence of a Summer Bridge Program on College Adjustment and Success: The Importance of Early Intervention and Creating a Sense of Community Michele J. Hansen, Ph.D., Director of Assessment, University
ANNUAL PROGRAM LEARNING OUTCOMES ASSESSMENT SUMMARY REPORT
ASSESSMENT SCHEDULE - Due: December 16, 2009 1. Past Year: Indicate which program level learning outcomes assessment or any other assessment projects you completed in the 2008-2009 academic year. Learning
Institutionalizing Change to Improve Doctoral Completion
Institutionalizing Change to Improve Doctoral Completion Ph.D. Completion Project Interventions CGS July 2010 Dr. Judith Stoddart, Assistant Dean Dr. Karen Klomparens, Dean, Overall Goals Raising awareness
Industrial and Systems Engineering Master of Science Program Data Analytics and Optimization
Industrial and Systems Engineering Master of Science Program Data Analytics and Optimization Department of Integrated Systems Engineering The Ohio State University (Expected Duration: Semesters) Our society
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
Community Colleges Accountability Measures and Definitions
PARTICIPATION -- KEY MEASURES 1. Enrollment Fall Headcount Definition: Unduplicated fall enrollment by race/ethnicity, age, full-time/part-time, academic/technical and residency status (in-district, out-of-district,
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
CHANGE/DECLARE A MAJOR/MINOR
CHANGE/DECLARE A MAJOR/MINOR Id: Reg.003 Change or Declare a Major/Minor Contact: Office of Registrar Last Modified: May 15, 2013 CHANGE/ DECLARE A MAJOR/MINOR DECLARE A MAJOR POLICY Undergraduate Students
ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis
ElegantJ BI White Paper The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis Integrated Business Intelligence and Reporting for Performance Management, Operational
USING PREDICTIVE ANALYTICS TO UNDERSTAND HOUSING ENROLLMENTS
USING PREDICTIVE ANALYTICS TO UNDERSTAND HOUSING ENROLLMENTS Heather Kelly, Ed.D., University of Delaware Karen DeMonte, M.Ed., University of Delaware Darlena Jones, Ph.D., EBI MAP-Works Predictive Analytics:
Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions
SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0
Understanding the leaky STEM Pipeline by taking a close look at factors influencing STEM Retention and Graduation Rates
Understanding the leaky STEM Pipeline by taking a close look at factors influencing STEM Retention and Graduation Rates Di Chen Institutional Research Analyst Heather Kelly Director of Institutional Research
MASSACHUSETTS COLLEGE OF ART AND DESIGN 2014 PERFORMANCE REPORT / April 2015
MASSACHUSETTS COLLEGE OF ART AND DESIGN 2014 PERFORMANCE REPORT / April 2015 1 / MASSACHUSETTS COLLEGE OF ART AND DESIGN / PERFORMANCE REPORT 2014 Garvin Hill Jamie Lewis 15 Industrial Design 2 / MASSACHUSETTS
Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors
Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
National Community College Benchmark Project: Peer Institution Comparison
National Community College Project: Peer Institution Comparison IR Report # 239 Office of Institutional Research November Overview of the National Community College Project Responding to the needs for
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
Predictive Analytics for Student Success
Predictive Analytics for Student Success A Data Sharing Initiative Between Two Community Colleges and an Online University An Executive Report Presented by University of Maryland University College Funded
Excess Units in Pursuit of the Bachelor s Degree
Excess in Pursuit of the Bachelor s -- An Analysis on Native Freshmen Graduated during 2009-2010 Office of Institutional Research Sacramento State July-September, 2011 This report intends to examine the
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Predicting Student Academic Performance at Degree Level: A Case Study
I.J. Intelligent Systems and Applications, 2015, 01, 49-61 Published Online December 2014 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2015.01.05 Predicting Student Academic Performance at Degree
RANGER COLLEGE FACT BOOK
RANGER COLLEGE FACT BOOK Spring 2014 INSTITUTIONAL DEMOGRAPHICS TABLE OF CONTENTS History of College Location General Information Chart #1 Chart #2 Chart #3 Chart #4 Chart #5 Top 15 Fastest Growing Public
