PREDICTING STUDENT RETENTION & SUCCESS IN ONLINE PROGRAMS William Bloemer & Karen Swan UNIVERSITY OF ILLINOIS SPRINGFIELD
The rarely articulated implication of all of this data floating around is that un-augmented human cognition is no longer sufficient. Every day, every day, there is more to know, more ways to know it, and heightened expectations by students, faculty members, alumni, football coaches, trustees, regulators, elected officials that senior managers will do something efficacious with what they know.... The best tool in [their] battle against ignorance is advanced analytics. -- THORNTON MAY, 2011
The calls for more accountability in higher education, the shrinking budgets that often force larger class sizes, and the pressures to increase degree-completion rates are all raising the stakes for colleges and universities today, especially with respect to the instructional enterprise. As resources shrink, teaching and learning is becoming the key point of accountability. -- MALCOLM BROWN & VERONICA DIAZ, 2011
75% GRADUATE HIGH SCHOOL
67% ENROLL IN POST-SECONDARY PROGRAMS
- 37% ENROLL IN 2-YEAR PROGRAMS: 11.7% GRADUATE IN 3 YEARS
- 63% ENROLL IN 4-YEAR PROGRAMS: 51.8% GRADUATE IN 6 YEARS
WHICH MEANS THAT OF ALL THE STUDENTS WHO ENTER HIGH SCHOOL, ONLY 18.5% GRADUATE FROM A POST-SECONDARY INSTITUTION IN ONE AND A HALF TIMES THE SUGGESTED TIME TO DEGREE, AND FULLY 81.5% DO NOT OBTAIN A POST-SECONDARY DEGREE IN SOMETHING APPROACHING A REASONABLE AMOUNT OF TIME.
In 2008, adults with a bachelor's degree earned, on average, about 81% more than high school graduates. In 2009, the unemployment rate for high school dropouts was more than twice as high as the unemployment rate for college graduates. By 2018, 63% of all American job openings will require some sort of postsecondary education.
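The 18.5% figure can be checked by chaining the percentages quoted above. The sketch below assumes each rate is conditional on the previous stage (graduates, then enrollees, then completers); that reading is an assumption, not something the slide states. The variable names are illustrative.

```python
# Rates as quoted on the slide; treating them as a conditional chain is an assumption.
hs_grad = 0.75      # share of entering students who graduate high school
enroll = 0.67       # share of graduates who enroll in post-secondary programs
share_2yr = 0.37    # share of enrollees in 2-year programs
share_4yr = 0.63    # share of enrollees in 4-year programs
grad_2yr = 0.117    # 2-year students graduating within 3 years
grad_4yr = 0.518    # 4-year students graduating within 6 years


def completion_share():
    """Share of entering high school students who finish a degree in 1.5x nominal time."""
    completion_given_enrolled = share_2yr * grad_2yr + share_4yr * grad_4yr
    return hs_grad * enroll * completion_given_enrolled


overall = completion_share()
print(f"{overall:.1%} complete; {1 - overall:.1%} do not")
```

Under that reading the product comes out near 18.6%, within rounding of the 18.5% on the slide.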
Most college students today are non-traditional. Most attend non-selective institutions. Just 14% live on campus. One-third work full-time, and another 44% work part-time. 60% of students who earn degrees, earn them from different institutions than the ones in which they started.
PREDICTIVE ANALYTICS marries large data sets, statistical techniques, and predictive modeling. It could be thought of as the practice of mining institutional data to produce actionable intelligence. -- CAMPBELL, DEBLOIS, & OBLINGER, 2010
LEARNING ANALYTICS is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs. -- 1ST INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS AND KNOWLEDGE, 2011
LEVELS OF ANALYSIS
EXPLORATORY STATISTICS, INFERENTIAL STATISTICS, DESCRIPTIVE STATISTICS
- Comprehensive; higher confidence level for prediction; 1% of solutions
- Single system; low confidence levels for prediction; 9% of solutions
- Single system; subjective interpretation; 90% of solutions
LEARNING ANALYTICS
- explores existing situation
- whole population
- establishes variables of interest
- informs decision making
EXPERIMENTAL RESEARCH
- tests hypotheses
- sample of a larger population
- random assignment of subjects to experimental & control conditions
- establishes theories of causation
PROMISE
- the promise is to help, to identify problems so they can be addressed
- struggling students everywhere can benefit
- cost can be saved
- efficiencies created
- hard questions answered
PERILS
- framing problem
- Walmartization of OTL research
- Do we lean on the numbers too much?
- What is lost? (privacy, security, intellectual property)
- Data don't make decisions, people do
- instrumental reason
PREDICTIVE ANALYTICS REPORTING (PAR) PROGRAM
- WCET initiative funded by the Bill and Melinda Gates Foundation
- Federation of student and course data from multiple, very different institutions
- American Public University System, Colorado Community College System, University of Hawaii System, Rio Salado College, University of Phoenix, University of Illinois Springfield
PAR GOALS
- Demonstrate that this CAN be done
- Explore any patterns across institutions
- Develop future directions (variables) to be included in research
PAR: OPERATIONAL ISSUES
- ANALYTICAL/METHODOLOGICAL -- how progression, retention & completion are operationalized varies by institution.
- ORGANIZATIONAL -- academic semesters/periods vary greatly by institution, including whether or not multiple courses are pursued simultaneously.
PAR variables
- student level variables (n=661,705): ID; institution; date of birth; gender; race; ethnicity; non-res. alien?; military class; veteran?; degree type; CIP code (major); degree start date; multiple majors?; inst. course completes; transfer credits; program changes; prior deg completes; degree hrs attempted; degree hrs completed; dev ed attempted; dev ed completed
- course level variables (n>3,000,000): total course extensions; previous term GPA; prior term withdrawals; concurrent courses; academic level; dev ed course?; course size; course start date; course end date; course grade
- outcome variables: academic status; course grade
FINDINGS
- No apparent relationship existed between age, gender, or ethnicity and the student's risk profile;
- For students at risk, disenrollment was influenced by the number of concurrent courses in which the student was enrolled;
- For students not at risk of disenrollment, institution-specific factors predicted student success.
Who will get F, W?
Measures:
- Likelihood (OT MC)
- Classification Table
Get Best:
- Model
- Threshold
Higher is better
[Figure: plot with axes Predictor 1 and Predictor 2; labels: 2,985 and 31,551 total registrations]
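The slides do not define "OT MC". One plausible reading, and it is only an assumption, is the chi-square reported by an omnibus test of model coefficients in logistic regression: twice the improvement in log-likelihood over an intercept-only model, where higher means a better fit. A minimal sketch under that assumption, using made-up toy data:

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y (1 = F or W) under predicted probabilities p."""
    return sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))


def omnibus_chi_square(y, p_model):
    """Twice the log-likelihood gain of the model over an intercept-only baseline."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 2.0 * (ll_model - ll_null)


# Toy data (hypothetical): 1 = F/W, 0 = any other grade.
y = [1, 0, 0, 1, 0, 0, 0, 1]
p_uninformative = [sum(y) / len(y)] * len(y)   # predicts the base rate for everyone
p_better = [0.7, 0.2, 0.1, 0.6, 0.3, 0.2, 0.1, 0.8]

chi_null = omnibus_chi_square(y, p_uninformative)
chi_better = omnibus_chi_square(y, p_better)
print(chi_null, chi_better)
```

A model that only predicts the base rate scores zero; predictions that separate F/W cases from the rest score higher, which matches the "higher is better" note on the slide.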
Who will get F, W?
Predictors
- Major, Level
0 of 0 correct. OT MC = 747
Who will get F, W?
Predictors
- Major, Level
- Course (but how?)
0 of 0 correct. OT MC = 747
Who will get F, W?
Predictors
- Major, Level
- Course: Subject, Level
0 of 0 correct. OT MC = 1348
Who will get F, W?
Predictors
- Major, Level
- Course: Individual Courses
21 of 39 correct. OT MC = 3008
0.7% identified. Identified cases 53.8% correct.
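The two percentages on this slide follow from the classification counts. The sketch below assumes that the "2,985" label on the first "Who will get F, W?" slide is the total number of F/W registrations; the slide does not say so explicitly, but the reported percentages are consistent with it. Function and variable names are illustrative.

```python
TOTAL_FW = 2985  # assumption: total F/W registrations, read from the "2,985" label


def classification_summary(correct, flagged, total_fw=TOTAL_FW):
    """Return (% of all F/W cases identified, % of flagged cases that were correct)."""
    pct_identified = correct / total_fw
    pct_correct = correct / flagged if flagged else 0.0  # "0 of 0 correct" case
    return pct_identified, pct_correct


identified, accuracy = classification_summary(correct=21, flagged=39)
print(f"{identified:.1%} identified; identified cases {accuracy:.1%} correct")
# prints "0.7% identified; identified cases 53.8% correct"
```

The first number is what is usually called recall and the second precision; the slides call them "% identified" and "% correct" (or "% Accuracy").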
[Figure: Distribution of course coefficient values. x-axis: Coefficient value (-25 to 5); y-axis: Number of courses (0 to 200)]
Who will get F, W?
Predictors
- Major, Level
- Course: Individual Courses w/ FW rates > average
17 of 33 correct. OT MC = 2452
0.6% identified. Identified cases 51.5% correct.
Who will get F, W?
Predictors
- Major, Level
- Course: Course history (FW rates, Average GPA), prior 2 years
1 of 11 correct. OT MC = 1358
0.0% identified. Identified cases 9.1% correct.
Why 2 years?
[Figure: Quality of fit by years of history used. x-axis: Years of history; y-axis: OT MC]
- Missing values? Default by course level
- Most courses offered at least once every two years
- Older information not always useful
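One way to read "course history, prior 2 years" with a "default by course level" for missing values is sketched below. The record layout, course IDs, and function names are hypothetical; the slides do not show how the history variables were actually built.

```python
from collections import defaultdict


def course_fw_history(records, target_year, years=2):
    """Build a lookup of prior FW rates from (course_id, level, year, is_fw) tuples.

    Only records from the `years` years before `target_year` are used.
    Courses with no history in that window fall back to their level's rate.
    """
    by_course = defaultdict(list)
    by_level = defaultdict(list)
    for course_id, level, year, is_fw in records:
        if target_year - years <= year < target_year:
            by_course[course_id].append(is_fw)
            by_level[level].append(is_fw)

    level_rate = {lvl: sum(v) / len(v) for lvl, v in by_level.items()}

    def rate(course_id, level):
        history = by_course.get(course_id)
        if history:
            return sum(history) / len(history)
        return level_rate.get(level)  # None if the level has no history either

    return rate


# Hypothetical records: (course_id, course level, year, got F or W?)
records = [
    ("MAT101", 100, 2010, True),
    ("MAT101", 100, 2011, False),
    ("ENG105", 100, 2011, False),
    ("ENG105", 100, 2011, False),
    ("MAT101", 100, 2007, True),   # older than the 2-year window, ignored
]
rate = course_fw_history(records, target_year=2012)
print(rate("MAT101", 100))  # course has its own history
print(rate("HIS110", 100))  # no history, falls back to the 100-level rate
```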
How to deal with course differences?
Predictors                  OT MC    D.O.F.
Subject, Level              1348     45
All courses                 3008     899
Courses w/ high FW rates    2452     306
Course history              1358     2
(slide annotation: "for this")
Student's Prior GPA, but over how many years?
[Figure: Quality of fit by years of history used. x-axis: Years of history (0 to 6); y-axis: OT MC (3300 to 3500)]
Last annual GPA: OT MC = 3527
Who will get F, W?
Predictors
- Major, Level
- Individual Courses (w/ high FW rates)
- Last Annual Prior GPA
239 of 427 correct. OT MC = 3527
8.0% identified. Identified cases 56.0% correct.
Who will get F, W?
Predictors
- Major, Level
- Individual Courses (w/ high FW rates)
- Last Annual GPA
- Last Annual FW hours and FW rate
347 of 598 correct. OT MC = 3843
11.6% identified. Identified cases 58.0% correct.
What else?
Factor                             OT MC increase
Registration order                 103
Degree Seeking                     2
Is Major                           37
OGM_OLC                            113
Prior Hrs                          14
Double or changing majors          1
Course Load                        29
Course Size                        13
Changes in GPA                     24
Newby or previous issues online    62
All combined                       306
Who will get F, W?
Predictors
- Major, Level
- Individual Courses (w/ high FW rates)
- Last Annual GPA
- Last FW hours and FW rate
- Interactions: GPA x Course (~10%)
418 of 674 correct. OT MC = 4128
14.0% identified. Identified cases 62.0% correct.
Who will get F, W?
Predictors
- Major, Level
- Individual Courses (w/ high FW rates)
- Last Annual GPA
- Last FW hours and FW rate
- Interactions: GPA x Course (~10%), Hrs FW x Course (~10%), FW rate x Course (~10%)
532 of 788 correct. OT MC = 4548
17.8% identified. Identified cases 67.5% correct.
Who will get F, W?
Predictors
- Major, Level
- Individual Courses (w/ high FW rates)
- Last Annual GPA
- Last FW hours and FW rate
- Interactions: GPA x Course (~10%), Hrs FW x Course (~10%), FW rate x Course (~10%)
- Other factors: OGM_OLC, Registration Order, Is major, New online, etc.
573 of 850 correct. OT MC = 4958
19.2% identified. Identified cases 67.4% correct.
Effect of Threshold on prediction accuracy
Lower flags more cases, both right and not
[Figure: % Accuracy (y-axis, 0% to 100%) vs. % of Total FW's identified (x-axis, 0% to 100%), with points labeled by threshold from 0.6 down to 0.05]
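The trade-off can be reproduced on a small scale: flag every registration whose predicted F/W probability is at or above a threshold, then report the share of all FW's identified and the accuracy of the flagged cases. The outcomes and probabilities below are invented for illustration; only the threshold values come from the slide.

```python
def sweep_thresholds(y, p, thresholds):
    """For each threshold, return (threshold, share of FW's identified, accuracy of flags)."""
    total_fw = sum(y)
    rows = []
    for t in thresholds:
        flagged = [yi for yi, pi in zip(y, p) if pi >= t]
        correct = sum(flagged)
        pct_identified = correct / total_fw if total_fw else 0.0
        accuracy = correct / len(flagged) if flagged else None  # nothing flagged
        rows.append((t, pct_identified, accuracy))
    return rows


# Toy data (hypothetical): 1 = F/W outcome, with a model's predicted probabilities.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
p = [0.65, 0.35, 0.08, 0.45, 0.25, 0.12, 0.07, 0.04, 0.03, 0.02]

rows = sweep_thresholds(y, p, [0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])
for t, identified, accuracy in rows:
    shown = "n/a" if accuracy is None else f"{accuracy:.0%}"
    print(f"threshold {t:<4}: {identified:.0%} of FW's identified, {shown} accurate")
```

On this toy data the 0.6 threshold flags one case and gets it right, while 0.05 catches every FW but is right for only three of the seven cases it flags.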
At the very low 0.05 threshold
[Figure labels: 2% FW; 18% FW]
Overfitting? Random 75% calibration sample, 25% test sample.

Simple Model     % FW's identified    % Accuracy
Total            7.3%                 51.6%
75% split        6.9%                 52.0%
25% check        7.3%                 54.9%

Complex Model    % FW's identified    % Accuracy
Total            18.1%                67.5%
75% split        19.5%                69.2%
25% check        15.3%                53.2%
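The check on this slide is a holdout test: fit on a random 75% of registrations, then score the remaining 25%. The sketch below shows only the split and the accuracy comparison, using the accuracies reported on the slide. The seed, function names, and the use of row indices as stand-ins for registrations are assumptions.

```python
import random


def split_calibration_test(rows, calibration_share=0.75, seed=0):
    """Randomly split rows into a calibration sample and a held-out test sample."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * calibration_share))
    return shuffled[:cut], shuffled[cut:]


def accuracy_gap(calibration_accuracy, test_accuracy):
    """Positive gap: the model does worse on data it was not fitted to."""
    return calibration_accuracy - test_accuracy


registrations = list(range(31551))  # stand-in for the 31,551 registration rows
calibration, test = split_calibration_test(registrations)
print(len(calibration), len(test))

# Accuracies as reported on the slide (75% split vs. 25% check).
simple_gap = accuracy_gap(0.520, 0.549)
complex_gap = accuracy_gap(0.692, 0.532)
print(f"simple: {simple_gap:+.3f}, complex: {complex_gap:+.3f}")
```

The simple model holds up on the 25% check, while the complex model loses about 16 points of accuracy, which is the pattern the slide title asks about.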
Effect sizes
Factor                                   Odds ratio change for FW
Registration order                       Last in 82% more likely than first in
Prior FW online                          7%
Not Major                                36%
Newby online                             38%
OGM_OLC                                  40% vs OLM_OLC
OGM_OGC                                  31% less vs OLM_OLC
Last GPA +0.656 (1 σ)                    39% less
Last FW rate (+15%, 1 σ)                 4%
Last FW hours (+3, 1 course and 1 σ)     79%
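An odds-ratio change is not the same as a change in probability. The sketch below converts one into the other, reading "82% more likely" as an odds ratio of 1.82 (an assumption based on the column heading). The baseline of 2,985 FW's in 31,551 registrations is used purely for illustration; the slides do not give the FW rate for first-in registrants.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Convert a probability to odds, scale by the odds ratio, and convert back."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


baseline = 2985 / 31551                         # illustrative baseline FW rate (assumption)
last_in = apply_odds_ratio(baseline, 1.82)      # "82% more likely" read as odds ratio 1.82
higher_gpa = apply_odds_ratio(baseline, 0.61)   # "39% less" read as odds ratio 0.61

print(f"baseline {baseline:.1%}, last to register {last_in:.1%}, GPA +1 sigma {higher_gpa:.1%}")
```

With a baseline near 9.5%, an odds ratio of 1.82 gives a probability of about 16%, not 9.5% x 1.82; the two only coincide when the baseline is very small.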
Decision trees
Where does it fit?
- Many Institutions (PAR)
- Across Single Institution (Here)
- Parts of Institution: Level, Dept, Course
- Individual Student
Impact of findings
- Compromises with data: limit predictive power
- More & better data: better predictive power
bbloe1@uis.edu kswan4@uis.edu