Developing Data-Driven Predictive Models of Student Success
Kresge Data Mining Project
Phase Two Report
University of Maryland University College

Table of Contents

Executive Summary
Introduction
Research Goals
Section 1: General Grant Overview
Section 2: Key Findings and Conclusions from Phase 1
  Objectives and Milestones
Section 3: Relevant Literature
Section 4: Data Sources
Section 5: Overview of Research Design and Target Variables
Section 6: Key Findings from Data Mining
  Research Goal 1: Profile students based on community college course taking behaviors
    Figure 1. Change in GPA for students retained or not retained at UMUC
    Table 1. Community college grade distributions for students successful or not at UMUC
    Table 2. Community college grade distributions for students retained or not at UMUC
    Figure 2. Success quadrants
    Figure 3. Likelihood of community college course selections for Stars
    Figure 4. Likelihood of community college course selections for Strivers and Slackers
    Figure 5. Likelihood of community college course selections for Splitters
    Figure 6. Binned number of community college credits by community college GPA for each success profile
    Figure 7. Binned number of community college credits by UMUC GPA for each success profile
    Figure 8. Binned number of community college credits by delta GPA for each success profile
    Table 3. Community college credits and GPA and UMUC GPA by success profile
Section 7: Key Findings from Predictive Analyses
  Research Goal 2: Identify demographic profiles of MC and PGCC students transferring to UMUC
    Table 4. Description of demographic and community college course taking background data clusters for Montgomery College
    Table 5. Description of demographic and community college course taking background data clusters for Prince George's Community College
  Research Goal 3: Determine MC and PGCC transfer students' performance at UMUC
    Table 6. UMUC first term GPA
  Research Goal 4: Identify demographic and community college background factors predicting course success at UMUC
    Table 7. Summary of predictors for logistic regressions predicting overall GPA and success in specific courses
  Research Goal 4a: Examine demographic, community college background factors, and course efficiency as predicting course success at UMUC
    Table 8. Courses taken at community college by institution
    Table 9. Success at UMUC by coursework taken or not taken
    Table 10. Course efficiency rates differentiated by types of courses taken or not
    Table 11. Results of multivariate logistic regression analysis of success at UMUC
  Research Goal 4b: Examine demographic, community college background factors, and change in GPA as predicting retention at UMUC
    Table 12. Results of multivariate logistic regression analysis of retention at UMUC
  Research Goal 5: Investigate predictors of behaviors in WebTycho and success at UMUC
    Table 13. Description of WebTycho activity clusters for Montgomery College
    Table 14. Description of WebTycho activity clusters for Prince George's Community College
    Table 15. Summary of top ten predictors of success at UMUC and WebTycho cluster membership
Section 8: Summary of Results
Section 9: Research and Intervention Planning in Phase 3
References
Appendices

Executive Summary

This report documents analyses and findings completed in Phase 2 of the Kresge Data Mining Grant: Developing Data-Driven Predictive Models of Student Success. This grant was awarded to University of Maryland University College (UMUC) in collaboration with two community college partners: Montgomery College (MC) and Prince George's Community College (PGCC). The purpose of the grant was:

1. To build an integrated database tracking students across institutions, from community college to UMUC.
2. To use predictive statistical models and data mining techniques to track and model students' progress across institutions.
3. To identify factors predictive of students' success at UMUC that may inform the development of interventions aimed at improving outcomes for undergraduate students transferring from community colleges to UMUC or other four-year institutions.

In Phase 1 of the grant, UMUC, in collaboration with partner institutions, designed and developed a database, the Kresge Data Mart (KDM), with records of more than 250,000 students. This database includes information on student demographics, academic performance at UMUC and the community college, and student behaviors in courses hosted in WebTycho, UMUC's proprietary online learning management system.

Key results from Phase 1 included a literature review of publications on students' performance in online courses, successful course completion, re-enrollment, and retention. Further, literature on data mining techniques in higher education was examined. The literature review showed that factors such as the number of schools students attended, the number of credits students transferred, and students' community college GPA were associated with successful course completion and retention. Regression analyses determined that students' online classroom activities prior to the start of a class and during the early weeks of the course were predictive of successful course completion.

In Phase 1, three goals for the project were identified:

1. Validate the predictive models and data mining techniques explored in Phase 1 on an expanded dataset.
2. Build profiles of successful students and their online learning behaviors.
3. Develop interventions to improve the success of students transferring from community colleges to UMUC.

These three goals were accomplished in Phase 2, which involved examination of students' demographic profiles, course work from the community colleges, and performance at UMUC. A variety of methodologies were used to identify predictors of students' success and retention. These include:

1. Cluster analyses to determine profiles of students based on demographic factors and community college course-taking backgrounds.
2. Logistic regression to examine demographic factors and variables associated with students' community college course-taking histories to predict success at UMUC.
3. Cluster analyses to determine profiles of students' online behaviors in courses at UMUC.
4. Data mining techniques to identify profiles in the student population based on GPA and re-enrollment. Community college grade distributions and course-taking preferences for these different groups of students were examined.

In addition to predicting outcomes associated with success, analyses in Phase 2 determined a variety of trends characterizing the student population and developed student profiles based on demographics, prior academic work, and online classroom behavior. The primary outcome measures of interest in Phase 2 are students' success at UMUC, defined as earning a first-term GPA of 2.0 or above, and students' retention at UMUC within 12 months following their first academic term. Key findings are presented below.

1. Across studies, age and marital status were associated with success at UMUC. Older, married students are more likely to succeed, perhaps indicative of students' maturity or a stronger commitment to their educational goals.
2. Four success profiles of students at UMUC were identified based on students' GPA and re-enrollment. Profiles differed in terms of community college course-taking preferences and course load, and in the change in GPA when transferring to UMUC. These results suggest that the degree of student preparedness, particularly in specific target areas (e.g., accounting, economics), is predictive of success at UMUC.
3. Course efficiency, the ratio of credits earned to credits attempted, in the community college was determined to be a predictor of success at UMUC. The higher the course efficiency, the more likely a student is to succeed.
4. A new factor, delta GPA, was introduced in these analyses, corresponding to the difference between students' GPA at the community college and at UMUC. While most students experienced a decreased GPA when transferring to UMUC, the magnitude of this decrease was predictive of students' continued enrollment at UMUC beyond the first term (i.e., retention).
5. Similarly, students who took math or honors courses in community college were more likely to succeed at UMUC, suggesting that the rigor of community college courses may prepare students to succeed at a university.
6. Students' behaviors in the online classroom indicated high variability in the extent to which they engage in course content and course-related activities. A substantial percentage of students accessed course content and course materials to a limited extent, thus impacting successful course completion.

Based on findings in Phase 2, interventions aimed at promoting success of transfer students at UMUC are presented. These interventions differ in the audience targeted and in whether they provide social support (e.g., peer mentor) or academic support (e.g., checklist) to promote student success. Further, long-term initiatives to promote student success that have been developed collaboratively with partner institutions are introduced.

Introduction

The purpose of this report is to document work done by UMUC, MC, and PGCC on the Kresge Data Mining Grant: Developing Data-Driven Predictive Models of Student Success. This report has three primary purposes:

1. To review prior work completed on the Kresge Data Mining Grant in Phase 1.
2. To document work completed in Phase 2 of the grant, expanding on findings from Phase 1.
3. To introduce research-driven future directions and interventions aimed at promoting transfer students' success at UMUC; the evaluation of these interventions will be undertaken in Phase 3 of the Kresge grant.

The research in this report has been conducted by the UMUC Institutional Research Office. Research from Phase 2 has been documented in detail. This report presents the research in nine sections:

Section 1: General grant overview
Section 2: Key findings and conclusions from Phase 1
Section 3: Relevant literature
Section 4: Data sources
Section 5: Overview of research design and target variables
Section 6: Key findings from data mining
Section 7: Key findings from predictive analyses
Section 8: Summary of results
Section 9: Research and intervention planning in Phase 3

In Phase 2, five key research goals were accomplished. Specifically, researchers were able to:

1. Profile students at UMUC based on community college course-taking behaviors.
2. Identify demographic profiles of MC and PGCC students transferring to UMUC.
3. Determine MC and PGCC transfer students' performance at UMUC.
4. Identify demographic and community college background factors predicting course success at UMUC.
   a. Examine demographic, community college background factors, and course efficiency as predicting course success at UMUC.
   b. Examine demographic, community college background factors, and change in GPA as predicting retention at UMUC.
5. Investigate predictors of behaviors in WebTycho and success at UMUC.

Section 1: General Grant Overview

Grant Partnership

UMUC is a four-year public university that offers online degree programs to a diverse population of working adults. With support from this grant, UMUC established partnerships with two Maryland community colleges that also serve large and diverse student populations. Montgomery College (MC), established in 1946, enrolls over 60,000 students annually. Prince George's Community College (PGCC) enrolls more than 40,000 students from approximately 128 different countries. Both institutions serve the metro-D.C. area, but differ in that PGCC serves more low-income students. Both institutions have endorsed the goals of this project and are committed to working with UMUC to find ways to promote student success throughout their academic careers.

Financial Support

The Kresge Foundation awarded UMUC a $1.2 million grant to build an integrated database, explore data mining techniques, build predictive models of student success, implement and evaluate intervention strategies that are designed to improve student success, and disseminate the results of this research to national constituents.

In Phase 1 of the research study, approximately 41% of total grant funds were expended on purchasing hardware to house the data mining database, collecting data from partner institutions, and providing dedicated salaries for a data mining specialist and a graduate assistant. Additional staff resources were provided in kind by UMUC. In Phase 2, UMUC expended funds for additional data collection, data mining consulting, and conference presentations. (See Appendix A for the financial statement.)

In Phase 3, expenses are expected to total $400,000. These funds are intended to be spent on collecting additional data from the community colleges, additional data mining research, and implementing interventions, with a graduate student to coordinate the interventions.
In addition, funds will support a national convening to present and discuss research findings on educational data mining, predictive modeling, and learner analytics.

Section 2: Key Findings and Conclusions from Phase 1

In Phase 1, a Memorandum of Understanding (MOU) was negotiated and signed between UMUC and partner institutions in order to clarify data security and the parameters for use of these data in the research project. The MOU allows UMUC researchers to conduct research using individual student data while protecting student information and confidentiality.

UMUC, in collaboration with partner community colleges MC and PGCC, designed, developed, and implemented a database of over 250,000 student records. The Kresge Data Mart (KDM) contains information on student demographics, academic performance at the community colleges and at UMUC, and student behaviors in the online classroom at UMUC.

Key outcomes of Phase 1 included a literature review on students' success in online courses. Further, literature about the use of data mining techniques in higher education was identified and reviewed; this literature is described in Section 3.

Data mining determined that factors associated with successful outcomes included students' prior academic work, namely the number of schools students attended, the number of credits students transferred, and students' GPA in community college. These predictors were associated with both successful GPA and retention at UMUC. Additional findings from Phase 1 included that certain online course behaviors, such as opening and reading conference notes in the first four weeks of a course, were associated with course success, as was students' engagement in the online classroom prior to the start of a class.

The analyses in Phase 1 were focused on examining a large variety of factors to determine their value in predicting student success. These findings were used to develop initial predictive models of successful performance at UMUC. These predictive models were refined and validated in Phase 2.

At the conclusion of Phase 1, three goals for the completion of the grant were identified:

1. Validate the predictive models and data mining techniques explored in Phase 1 on an expanded dataset.
2. Build profiles of successful students and their online learning behaviors.
3. Develop interventions to improve the success of students transferring from community colleges to UMUC.

Objectives and Milestones

Specific objectives and milestones are presented below for each stage of the research project. Objectives from Phase 1 and Phase 2 of the project are abridged, with planned Phase 3 work further expanded. These objectives and milestones have been modified throughout the course of the project, but are consistent with grant requirements.

Objectives | Milestones | Status

Phase 1: April 2011 to October 2012
Develop a Project Action Plan | Develop a project action and collaboration plan with the partnering agencies. | Complete
Data Collection and Preparation | Prepare a data universe (integrated database system) on CC transfer students in the UMUC population (KDM). | Complete
 | Understand variables; define student characteristics and retention data; develop data dictionary. | Complete
Data Analysis | Conduct initial predictive analyses and employ data mining techniques to identify factors contributing to students' success. | Complete
Project Evaluation | Conduct ongoing project evaluation. Take action on identified areas for improvement. | Complete

Phase 2: November 2012 to October 2013
Develop and Validate Analytic Models of Student Success | Analyze data and identify factors that predict success/failure. | Complete
 | Validate predictive analyses and models developed through data mining techniques to predict students' success and retention at UMUC. | Complete
Disseminate Key Findings | Build student profiles based on analyses. | Complete
 | Discuss results with Kresge Workgroup and share with advisory board. | Complete
 | Discuss results with Project Partners and obtain feedback. | Complete
 | Present key findings at national conferences on higher education. | Ongoing
Develop Interventions | Work with stakeholders at UMUC and CC partners to develop a list of potential interventions. | Complete
Project Evaluation | Conduct ongoing project evaluation. Take action on identified areas for improvement. | Ongoing
Research Plan 3 | Design and develop KDM2 to update and | In progress
 | Plan Phase 3 analyses on expanded integrated data. | In progress

Phase 3: November 2013 to October 2014
Develop Interventions | Review relevant literature on interventions that promote student success in online learning. | In progress
 | Develop an implementation plan and timeline for piloting of interventions. | In progress
Implement Pilot Interventions | Implement and evaluate pilot interventions. | Not yet started
Disseminate Results on Interventions | Develop and disseminate report on the pilot interventions. | Not yet started
Phase 3 Analyses | Develop and execute Phase 3 research plan. | In progress
Report Findings | Present key findings from Phase 3 analyses at national conferences; publish research in journals. | Not yet started
 | Prepare written report of both Phase 3 analyses and full scope of Kresge grant work. | Not yet started
Dissemination of Results and Resources | Develop website and repository for educational data mining and student success. | Not yet started
 | Host a national convening on data mining and learner analytics. | Not yet started
Project Evaluation | Conduct final project evaluation. | Not yet started

Section 3: Relevant Literature

The literature review discussed below addresses factors contributing to students' success in online courses, the use of data mining techniques in educational research, and factors impacting the success and retention of non-traditional students. The review was undertaken to inform the development of interventions aimed at promoting students' success.

Online student success literature

Current literature on student success focuses on student outcomes such as course success, course withdrawal, re-enrollment, and retention. For example, student variables such as student characteristics, previous course work, grades, and time spent in course discussions and activities may be useful in predicting course success (Aragon & Johnson, 2008; Morris & Finnegan, 2009; Morris, Finnegan & Lee, 2009; Park & Choi, 2009). Course-level variables acquired from student login data from the learning management system may have predictive value in measuring course withdrawal (Willging & Johnson, 2008; Nistor & Neubauer, 2010). Student, course, program, and institution level variables such as student characteristics, number of transfer credits, final grade in any given course, experience in online environments, and course load may be useful in predicting re-enrollment and retention (Aragon & Johnson, 2008; Morris & Finnegan, 2009; Boston, Diaz, Gibson, Ice, Richardson & Swan, 2011).

Although these studies showcase a variety of findings related to student success, the majority of studies on retention in online learning environments use traditional statistical or qualitative methods.
Park and Choi (2009) point out that expansion of methods such as data mining may have utility when student, course, program, and institutional level variables are well defined and institutionally meaningful. Literature related to educational data mining focuses on exploratory research.

Educational data mining literature

Data mining is a method of discovering new and potentially useful information from large amounts of data (Baker & Yacef, 2009; Luan, 2001). Educational data mining is a subset of the field of data mining that draws on a wide variety of literatures such as statistics, psychometrics, and computational modeling to examine relationships that may predict student outcomes (Romano & Ventura, 2007; Baker & Yacef, 2009). In educational data mining, data mining algorithms are used to create and improve models of student behavior in order to better understand student learning (Luan, 2002). Data mining methods are most helpful for finding patterns already present in data, not necessarily for testing hypotheses (Luan, 2001).

Baker and Yacef (2009) suggest that research in higher education should use a variety of algorithms, such as classification, clustering, or association algorithms, in determining relationships between variables. Although many definitions of these techniques exist in the data mining literature, Han and Kamber (2001) offer the following definitions. Classification is the process of finding a set of models or functions that describe and distinguish data classes or concepts in order to predict the class of objects whose class label is unknown. Clustering analyzes data objects that are related to similar outcomes without consulting a class label. Association is the discovery of rules showing attribute value conditions that occur frequently together in a given set of data (Han & Kamber, 2001).

Recent research suggests that these data mining algorithms can be used to examine variables related to student success. Yu, DiGangi, Jannach-Pennell, Lo, and Kaprolet (2010) used a classification algorithm to explore potential predictors related to student retention in a traditional undergraduate institution. In this study, the authors used a decision tree to explore demographic, academic performance, and enrollment variables as they related to student retention. This study revealed a predictable relationship between earned hours and retention, but also found that at this institution, retention was closely related to state of residence (in-state/out-of-state) and living location (on campus/off campus). The authors speculate that this finding points to the potential utility of online courses in improving retention for out-of-state or off-campus students.

Despite these recent developments in exploring variables related to student success in traditional higher education settings, research using data mining techniques to uncover relationships among variables in online courses is limited in scope. This study is designed to fill this gap in the extant literature by utilizing data on online students who attended multiple institutions.

Retention in Non-Traditional Student Populations

Historically, research on student retention largely focused on the experiences of traditional students, until a seminal book by Tinto (1993) expanded on extant models of retention to consider which factors may impact the retention of non-traditional students. Across the literature, non-traditional students are considered to be those above age 26 or taking classes through non-traditional pathways, including distance and online learning.
For both traditional and non-traditional students, retention was thought to be a consequence of students' academic and social integration (Tinto, 1993). Other research has echoed the central role of social factors in predicting retention for non-traditional students and for online and distance learners (Boston, Diaz, Gibson, Ice, Richardson, & Swan, 2009). At the same time, the processes and policies that foster social integration in online environments are different from the factors that foster social connections in more traditional settings.

For students enrolled in online courses, feelings of social integration may stem from learners and instructors conveying a sense of themselves through the use of paralanguage (i.e., emoticons), self-disclosure, humor, or other verbal expressions of personal emotions and/or values (Boston et al., 2009). These behaviors are believed to result in open communication, trust, and group cohesion and are identified as necessary for successful collaboration (Boston et al., 2009).

Using social network analysis, Dawson (2010) found that visualizing classroom interaction patterns could provide insights into the nature of interactions for high- versus low-achieving students completing an online course. Dawson (2010) determined that high-performing students primarily interacted with other high-performing students, and likewise, low-performing students were more likely to have interactions with other low-performing students. More importantly, in examining instructor-student interactions, instructors networked with high-performing students (81.7%) at significantly higher rates than they did with low-performing students (34.61%).

Social connections in online learning may result in cognitive and learning gains as well. Rovai (2002) found a correlation between levels of engagement in the classroom community and increased levels of content learning and understanding; this was especially true for females.

Theories of student retention have considered both the contribution of student motivation and the challenges that external barriers may present for students' continued enrollment in college. Kember (1989) presents students' decisions to re-enroll as the result of a cost-benefit analysis, wherein students compare the price of attendance and the time commitment associated with college attendance to the anticipated benefits of receiving a degree. Examinations of student retention have focused on two complementary processes, those of persistence and attrition (e.g., Rovai, 2002), with positive academic variables associated with persistence and negative academic variables associated with attrition (Bean & Metzner, 1985). In predicting persistence, external factors, such as family and organizational support of students' academic efforts, played a major role in determining intent to persist, and course satisfaction and perceived relevance to students' daily lives were significant sources of motivation to persist in college course work (Park & Choi, 2009).

Predictive models of student retention have considered students' background factors, such as previous GPA and academic performance (Bean & Metzner, 1985). Further, students' use of web-based technologies positively impacted engagement and retention for online learners (Chen, Lambert, & Guidry, 2010).

Whereas the aforementioned studies focused on individual student factors predicting retention, Moore and Fetzner (2009) addressed the institutional characteristics that fostered commitment in non-traditional students. These factors included having a leadership culture that fosters commitment to student success and institutional policies and practices that incorporate student support services and technological support. For online learners, access to services and to support that meets their needs was found to be crucial (Moore & Fetzner, 2009).
Further, student satisfaction, defined as students being happy with their progress and with the support received for learning, and perceiving that the knowledge they were learning was valuable, was predictive of retention. Faculty satisfaction, stemming from involvement in curricular design and training in the use of online technologies supporting learning, was found to be key to engagement and a contributor to retention (Moore & Fetzner, 2009).

The findings from the published literature offer insights into (a) factors that may be modeled as predictive of students' success, (b) techniques that may be used to investigate and model student success, and (c) areas, specific to the needs of non-traditional learners, that may be targeted for intervention.

Section 4: Data Sources

One of the key achievements of Phase 1 of the Kresge research grant was the development of the KDM, an integrated multi-institutional database that chronicles the prior academic work of transfer students. Data for the KDM came from four data systems:

1. Banner - Montgomery College's student information system.
2. Datatel - Prince George's Community College's student information system.
3. PeopleSoft - UMUC's student information system.
4. WebTycho - UMUC's proprietary learning management system, which records students' activities in an online classroom.

Demographic, academic, and enrollment data were collected on each student from each institution. In addition, transfer data and online classroom behavior data were included from UMUC.

Demographic data included students' gender, age, marital status, and race/ethnicity. Enrollment data included course registration, program of study or major, and student status. Academic data included information about students' academic history prior to transferring to UMUC, such as course grades, repeated courses, and remedial coursework.

Transfer data included the number of courses transferred, transfer GPA, and prior degrees earned. There were two sources for these data: community college data provided through the Kresge project, and UMUC transcript data. The latter may be incomplete because UMUC records contain information only on courses that students chose to transfer to UMUC and that were equivalent to a UMUC course.

Classroom behavior data were specific to each course and each WebTycho session. Each session recorded a login time, access to various modules within the classroom, and posting of or responding to conference notes. Each action that students made in the classroom was recorded and totaled for each session, defining student activity.

The KDM served as the primary resource for all the analyses and findings for this research grant. Section 5 describes the research and methods for Phase 2.
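
As an illustration of the session-level classroom behavior data described in this section, the following Python sketch shows one way per-session action counts could be totaled into an activity measure for each student and course. It is a minimal sketch only: the record layout, the field names (`module_accesses`, `notes_posted`, `notes_responded`), and the choice to exclude the login itself from the action count are all assumptions made for illustration, not the actual WebTycho or KDM schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Session:
    """One hypothetical classroom session record (field names are assumed)."""
    student_id: str
    course_id: str
    login_time: datetime
    module_accesses: int   # times classroom modules were opened
    notes_posted: int      # conference notes posted
    notes_responded: int   # responses to conference notes

    @property
    def activity(self) -> int:
        # Assumption: activity is the total of recorded actions in the session,
        # not counting the login itself.
        return self.module_accesses + self.notes_posted + self.notes_responded


def activity_by_student_course(sessions):
    """Total session activity for each (student_id, course_id) pair."""
    totals = defaultdict(int)
    for s in sessions:
        totals[(s.student_id, s.course_id)] += s.activity
    return dict(totals)


# Made-up example records, for illustration only.
sessions = [
    Session("s1", "ACCT220", datetime(2012, 1, 23, 9, 0), 4, 1, 2),
    Session("s1", "ACCT220", datetime(2012, 1, 25, 20, 30), 2, 0, 1),
    Session("s2", "ACCT220", datetime(2012, 1, 24, 7, 15), 1, 0, 0),
]
totals = activity_by_student_course(sessions)
```

Keeping the per-session record separate from the aggregate makes it straightforward to restrict the totals to a time window, such as the early weeks of a course, before summing.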
Section 5: Overview of Research Design and Target Variables In Phase 2, research was developed to comprehensively answer and expand on questions introduced during Phase 1 of the project. The findings from these knowledge sheets are summarized in subsequent sections. Section 6 of this report presents findings from data mining analyses focused on exploratory analyses identifying potential predictors of students success and retention at UMUC. The following questions were considered. 1. Which profiles of students at UMUC can be identified? 2. To what extent does community college course taking differentiate each success profile at UMUC? Section 7 of this report presents findings from predictive analyses, including cluster analyses and logistic regression, modeling factors in students demographic and community college course taking backgrounds that predict success at UMUC and validating specific predictors of students success identified in Section 6. The following questions were considered. 3. What are the demographic profiles of community college students transferring from MC and PGCC to UMUC?
4. Which factors from students' demographic profiles and course-taking backgrounds in CC predict success at UMUC overall, and in specific courses?
5. What kinds of online learning behaviors do students transferring to UMUC engage in?

These questions encompassed examinations of students' performance in community college overall (Research Questions 1 and 2) as well as in specific courses (Research Questions 2 and 4). The questions examined not only UMUC GPA but also re-enrollment (Research Question 1) as a desired outcome variable, and considered not only performance but also process and learning behaviors at UMUC (Research Question 5). In addition, a number of possible predictors of success not previously considered were included, such as students' course efficiency in community college (the ratio of credits completed to credits attempted) and change in GPA (the difference between students' community college and UMUC GPA).

Student Population. The population of interest for the Phase 2 analyses was defined as first-term undergraduate students transferring to UMUC from MC or PGCC between Spring 2005 and Spring 2012. Subsets of this population were drawn for subsequent analyses.

Variables. In this report a number of outcomes are associated with student success:
- Course success: earning a final grade of A, B, or C in any course.
- Unsuccessful course completion: earning a grade of D, F, FN, or W in a course.
- Student success: a first-term GPA of 2.0 or above.
- Re-enrollment: enrollment in the immediate next semester after initial enrollment.
- Retention: re-enrollment at UMUC within 12 months after initial enrollment.

The first-term GPA cut-off point of 2.0 is based on current UMUC policies that define academic probation. On a 4-point scale, 2.0 corresponds to a C average.
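The derived predictors and outcome flags defined above can be sketched in a few lines. This is a minimal illustration only: the record layout and field names (for example `months_to_reenrollment`) are hypothetical, and treating a student with no attempted credits as having zero efficiency is an assumption, not something the report specifies.

```python
from dataclasses import dataclass
from typing import Optional

SUCCESS_GPA_CUTOFF = 2.0            # first-term GPA threshold from the report
SUCCESS_GRADES = {"A", "B", "C"}    # D, F, FN, and W count as unsuccessful


@dataclass
class TransferStudent:
    # Hypothetical record layout; field names are illustrative only.
    cc_credits_attempted: float
    cc_credits_completed: float
    cc_gpa: float
    umuc_first_term_gpa: float
    months_to_reenrollment: Optional[float]  # None if never re-enrolled


def course_efficiency(s: TransferStudent) -> float:
    """Ratio of credits completed to credits attempted at community college."""
    if s.cc_credits_attempted == 0:
        return 0.0  # assumption: no attempts is treated as zero efficiency
    return s.cc_credits_completed / s.cc_credits_attempted


def change_in_gpa(s: TransferStudent) -> float:
    """UMUC first-term GPA minus community college GPA (negative = drop)."""
    return s.umuc_first_term_gpa - s.cc_gpa


def is_successful(s: TransferStudent) -> bool:
    return s.umuc_first_term_gpa >= SUCCESS_GPA_CUTOFF


def is_retained(s: TransferStudent) -> bool:
    """Re-enrolled within 12 months of initial enrollment."""
    return s.months_to_reenrollment is not None and s.months_to_reenrollment <= 12


def course_success(grade: str) -> bool:
    return grade in SUCCESS_GRADES


example = TransferStudent(30, 24, 3.0, 2.5, 4)
```

For the example record, course efficiency is 24/30 and the change in GPA is a drop of half a grade point, while the student still counts as both successful and retained.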
Section 6: Key Findings from Data Mining

The findings presented in this section are a result of data mining efforts aimed at identifying factors contributing to students' success and retention at UMUC. Data mining is an exploratory technique that identifies factors emerging from large data sets and allows predictive models to be run iteratively, using a variety of algorithms and boosting techniques to improve prediction accuracy. In the data mining phase of the analyses, a large number and variety of models were run with the aim of predicting retention and student success at UMUC. The key models and factors identified through data mining are presented here. A summary of the models discussed in the results can be found in Appendix B, along with information about model fit.

Research Goal 1. Profile students at UMUC based on community college course taking behaviors

In these analyses two joint indicators of students' success at UMUC were used: achievement at UMUC of a first-semester GPA of 2.0 or above, and retention at UMUC. These indicators of success were used to create outcome profiles, and then a predictive model was built on the students' prior academic work and demographic variables.

Sample. The initial data set consisted of 14,218 students with a total of 187,697 course enrollments from Montgomery College, and 11,046 students with a total of 156,373 course enrollments from Prince George's Community College. The top 50 courses from each community college were determined and organized by course subject area. These top 50 courses were drawn from a total of 1,404 PGCC courses and 2,737 MC courses. As a result, the final dataset included 12,637 students and 108,237 enrollments. The number of students and a listing of all of the courses included in each data set are included in Appendix C.

Methods. Data exploration was performed using IBM Modeler, SPSS, SAS JMP 10 Pro, and Excel.
Data were transformed and new variables were created as needed. Transformations were performed in Modeler, JMP, and Excel. A variety of black box algorithms (neural nets, boosted trees, and Random Forests) were used to develop profiles of students' success. Random Forests is a recently developed algorithm that provides strong data modeling, but its findings may not be readily interpretable. It builds a large number of small trees and averages the results. JMP's Bootstrap Forest, used on a dataset of variables derived solely from the community college data, provided a way of differentiating the likelihood of retention among those students with low UMUC GPAs.

To evaluate effectiveness, these models were developed on a subset of the data and then applied to a different subset (a holdout dataset) that had not been used in the model building. The misclassification rate (the proportion of wrong predictions) was used to evaluate the effectiveness of the models. A number of other measurements of effectiveness were assessed, including lift, sensitivity, specificity, false positive rate, and false negative rate. However, the models that performed well on the original dataset did not yield equally good results on the holdout dataset, indicating that the models were overfitting the data (i.e., they would not generalize well to other data).
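The holdout measures named above can be computed from a simple tally of predictions against actual outcomes. The sketch below is illustrative only: the ten outcomes and predictions are made-up values, and lift is taken here as accuracy minus the accuracy of always predicting the majority class, which is an assumption about how the measure was operationalized.

```python
def classification_metrics(actual, predicted):
    """actual / predicted: equal-length sequences of booleans (True = successful)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    # Accuracy of predicting the majority class for everyone.
    majority_baseline = max(tp + fn, tn + fp) / n

    return {
        "misclassification_rate": (fp + fn) / n,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        "accuracy_lift": accuracy - majority_baseline,  # assumed definition
    }


# Hypothetical holdout outcomes and model predictions.
holdout_actual = [True, True, True, True, True, True, False, False, False, False]
holdout_predicted = [True, True, True, True, True, False, True, True, False, False]
holdout_metrics = classification_metrics(holdout_actual, holdout_predicted)
```

Running the same function on the model-building subset and on the holdout subset, and comparing the two misclassification rates, is one simple way to see the overfitting described above: a much lower error on the building subset than on the holdout subset suggests the model will not generalize.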
Four indices of model fit were used to compare and evaluate model quality. Performance indicators were calculated based on model fit for the validation data subset.
- Overall accuracy is the percentage of students correctly identified as successful or not successful.
- Accuracy improvement (lift) compares the accuracy of the model to the accuracy of predicting the majority case ("successful") for everyone. Negative lift means the accuracy is worse than simply predicting the majority case for everyone.
- False positive rate is the percentage of not successful students identified as successful.
- False negative rate is the percentage of successful students identified as not successful.

Results - Student Retention

Change in GPA. The first set of models used students' retention as an outcome. The strongest predictor of student retention was change in GPA, computed by subtracting students' community college GPA from their GPA in their first semester at UMUC. Values range from -4.0 to +4.0 and were binned in intervals of 0.25. Among students who were retained within a year, only 40% experienced a drop in their GPA in their first semester at UMUC. By contrast, among students who were not retained within a year, 70% had experienced a decrease in their GPA.

The main finding is that, regardless of whether their UMUC GPA was above or below 2.0, students whose first-semester GPA at UMUC was lower than it had been at community college were less likely to demonstrate persistence at the university. Model 1 summary information is presented in Appendix B. The distribution of delta GPA is presented in Figure 1 on the following page.
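The change-in-GPA computation and its 0.25-wide binning can be sketched as follows. The records are hypothetical, and labeling each bin by its lower edge is an assumption; the report does not state which bin edge is inclusive.

```python
import math

BIN_WIDTH = 0.25  # interval width stated in the report


def delta_gpa(cc_gpa, umuc_first_term_gpa):
    """UMUC first-semester GPA minus community college GPA."""
    return umuc_first_term_gpa - cc_gpa


def delta_gpa_bin(delta):
    """Lower edge of the 0.25-wide bin containing delta (assumed convention)."""
    return math.floor(delta / BIN_WIDTH) * BIN_WIDTH


def share_with_gpa_drop(records):
    """records: iterable of (cc_gpa, umuc_gpa, retained) tuples.

    Returns {retained_flag: fraction of that group whose GPA decreased}.
    """
    totals, drops = {}, {}
    for cc_gpa, umuc_gpa, retained in records:
        totals[retained] = totals.get(retained, 0) + 1
        if delta_gpa(cc_gpa, umuc_gpa) < 0:
            drops[retained] = drops.get(retained, 0) + 1
    return {flag: drops.get(flag, 0) / count for flag, count in totals.items()}


# Hypothetical students: (community college GPA, UMUC GPA, retained?)
sample_records = [
    (3.0, 2.5, True),
    (2.0, 3.0, True),
    (3.5, 1.0, False),
    (2.0, 2.0, False),
]
drop_shares = share_with_gpa_drop(sample_records)
```

Applied to the full data set, the two values returned by `share_with_gpa_drop` would correspond to the 40% and 70% figures quoted above.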
Figure 1. Change in GPA for Students Retained or Not Retained at UMUC
[Chart: "Change in GPA and Retention". X-axis: change in GPA from CC to UMUC (binned). Y-axis: % of students retained or not, 0% to 100%. Series: Retention YES, Retention NO.]
Results - Student Success

Demographic Factors. A variety of models, presented in Appendix B, were used to determine predictors of success. First, a model was developed to predict students' success using demographic factors. Independent models predicting success at UMUC based on student demographics were run separately for MC and PGCC students. Model information is summarized in Models 2, 3, and 4 in Appendix B.

Community College GPA. Community college GPA was binned as successful if greater than or equal to 2.0, or unsuccessful if less than 2.0. CC GPA was found to be a significant predictor of students' success at UMUC. Further, students' success at UMUC was predicted by the percentage of A, B, and C grades that students received at community college. See Appendix B, Models 5, 6, and 7 for summary information. Distributions of community college grades for students classified as successful or not successful at UMUC are displayed in Table 1.

The main finding is that students who earned a UMUC first-term GPA of 2.0 or above were more likely to have earned As at community college than students earning a UMUC first-term GPA below 2.0. Conversely, students who earned a UMUC first-term GPA below 2.0 were more likely to have earned Fs or Ws at community college than students earning a UMUC first-term GPA of 2.0 or above. See Appendix B, Model 10 for summary information. The importance of students' community college performance in predicting UMUC success was upheld through both data mining and predictive (Section 7) approaches.

Table 1. Community College Grade Distributions for Students Successful or Not at UMUC (N=15890)

CC grades (mean %)                   A grades  B grades  C grades  D grades  F grades  W grades
UMUC GPA ≥ 2.0 (10,871 students)     30%       27%       17%       6%        10%       11%
UMUC GPA < 2.0 (5,019 students)      16%       20%       17%       7%        22%       19%

Note: Grade distributions were computed based on the total number of course enrollments.

Similarly, distributions of community college grades for students classified as retained or not at UMUC are displayed in Table 2 on the following page. No substantial differences were found in community college grade distributions between students who were and were not retained at UMUC within a year.
Table 2. Community College Grade Distributions for Students Retained or Not at UMUC (N=15890)

CC grades (mean %)   A grades  B grades  C grades  D grades  F grades  W grades
Retention YES        26%       26%       17%       6%        12%       12%
Retention NO         22%       23%       16%       6%        17%       16%

In addition to independently considering these two outcomes of student success (UMUC GPA and retention at UMUC), researchers also examined the two outcomes jointly. Thus, profiles of student success at UMUC were determined that classified students based on successful GPA and retention. All combinations of the two attributes were examined. Four quadrants were formed, with students evidencing a high or low GPA and being retained or not. These four Success Quadrants were named Stars, Strivers, Slackers, and Splitters. Each quadrant is described in Figure 2 below.

Figure 2. Success Quadrants
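The quadrant assignment can be sketched as a two-way classification on GPA and retention. Figure 2 is not reproduced in this text, so part of the mapping below is an assumption: the report states that Stars and Splitters have a first-semester GPA of 2.0 or above and that Strivers and Slackers fall below 2.0, while the assignment of Stars and Strivers to "retained" and of Splitters and Slackers to "not retained" is inferred from the group names.

```python
from collections import Counter

GPA_CUTOFF = 2.0


def success_quadrant(first_term_gpa, retained):
    """Classify a student into one of the four success quadrants.

    Assumed mapping (retention side inferred, not confirmed by the text):
      GPA >= 2.0 and retained      -> Stars
      GPA >= 2.0 and not retained  -> Splitters
      GPA <  2.0 and retained      -> Strivers
      GPA <  2.0 and not retained  -> Slackers
    """
    high_gpa = first_term_gpa >= GPA_CUTOFF
    if high_gpa:
        return "Stars" if retained else "Splitters"
    return "Strivers" if retained else "Slackers"


# Hypothetical students: (first-term GPA, retained within 12 months?)
students = [(3.4, True), (2.0, False), (1.2, True), (0.0, False), (3.9, True)]
quadrant_counts = Counter(success_quadrant(gpa, kept) for gpa, kept in students)
```

Dividing each count in `quadrant_counts` by the number of students gives the kind of quadrant proportions reported for the full data set.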
The proportions of these groups are as follows:

Success Quadrant   Full dataset   Top Community College courses only
Stars              59%            62%
Strivers           17%            16%
Slackers           15%            13%
Splitters          9%             9%

Community College Course Selections. We examined the specific community college course selections of students belonging to each of these profiles. Distributions varied between the community colleges, but overall Stars represented about 59% of all students. As such, if courses of study in community college were representative of success profiles, we would expect 59% of students in any academic subject area to be Stars when transferring to UMUC. For example, of students who took accounting at community college, about 59% would be expected to be Stars. If the actual proportion of success profiles in any area of study differed significantly from the expected proportion, insight into these subject areas may prove informative for promoting student success at UMUC. In particular, comparing the proportions of Stars and Strivers in different areas of study could help identify attributes and course-taking behaviors that predict earning a successful GPA at UMUC. Similarly, comparing Strivers and Slackers could help identify attributes and course-taking behaviors that distinguish retention outcomes.

A likelihood formula was devised to compare the percentage of students in each success profile taking courses in various subject areas. This formula measured the difference between the actual proportion and the expected proportion of students in each success profile enrolled in a given subject area. In its simplest form, the formula can be expressed as the following ratio:

    (Actual proportion - Expected proportion) / Expected proportion

Actual proportion refers to the number of students in a particular success quadrant enrolled in a particular subject area, divided by the total number of students enrolled in that subject area.
The expected proportion equals the total number of students in a particular success quadrant divided by the total number of students. The likelihood percentages are charted in Figures 3, 4, and 5. For ease of comparison, the academic subject areas on the X-axis are listed in the same order on every graph. For students in each success quadrant, the percentages on the Y-axis represent how much more or less likely they were to have been enrolled in a particular subject at community college than would have been expected from their success quadrant alone (i.e., whether students from a given success quadrant were over- or under-represented in a given subject area). Data points that are close to 0% indicate proportions near the expected level.
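As a worked illustration of the likelihood ratio, the sketch below computes the relative likelihood for one quadrant and one subject area. All counts are invented for the example, and the function and parameter names are hypothetical.

```python
def relative_likelihood(quadrant_in_subject, total_in_subject,
                        quadrant_total, total_students):
    """(actual proportion - expected proportion) / expected proportion.

    actual   = share of the subject area's students who are in the quadrant
    expected = share of all students who are in the quadrant
    """
    actual = quadrant_in_subject / total_in_subject
    expected = quadrant_total / total_students
    return (actual - expected) / expected


# Hypothetical counts: 1,000 students overall, 590 of them Stars (59%).
# Of 200 students who took accounting, 130 are Stars (65%).
stars_accounting = relative_likelihood(130, 200, 590, 1000)

# If exactly 59% of accounting students were Stars, the result would be 0.
stars_at_expected_level = relative_likelihood(118, 200, 590, 1000)
```

In the first case the result is roughly +0.10, meaning Stars are about 10% over-represented in accounting relative to their share of the whole population; a negative value would indicate under-representation.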
Figure 3. Likelihood of Community College Course Selections for Stars
[Chart: "Relative Likelihood of CC Subject Choices: Stars". Y-axis: % more or less likely to take this subject, -25% to 25%. Series: Stars.]
Figure 4. Likelihood of Community College Course Selection for Strivers and Slackers
[Chart: "Relative Likelihood of CC Subject Choices: Groups with UMUC GPA < 2.0 (Strivers & Slackers)". Y-axis: % more or less likely to take this subject, -50% to 50%. Series: Strivers, Slackers.]
Figure 5. Likelihood of Community College Course Selection for Splitters
[Chart: "Relative Likelihood of CC Subject Choices: Splitters". Y-axis: % more or less likely to take this subject, -25% to 25%. Series: Splitters.]
As shown in Figures 3, 4, and 5 on the preceding pages:
- Transfer students who took accounting, economics, or higher-level math classes in community college were more likely to earn a first-semester GPA of 2.0 or above at UMUC (i.e., to be classified as Stars or Splitters).
- Students who took more classes in history, sociology, psychology, and similar social sciences were more likely to earn a GPA of less than 2.0 at UMUC.
- The two low-GPA groups, Strivers and Slackers, were less likely to take courses in the subject areas that Stars took.
- The Splitters, the smallest group, did not show a distinct pattern of course-taking behavior.

In the likelihood analyses described above, Strivers and Slackers showed nearly identical preferences (see Figure 4). The average numbers of classes students took and passed in each subject area were compared for Strivers and Slackers. On average, Slackers passed fewer classes than Strivers in all of the subject areas preferred by Stars. In addition, Slackers were noticeably less likely than Strivers to take courses in several areas:
- Developmental English
- Business/management
- Sociology
- Psychology

Quantitative Measures of Community College Course Taking. Beyond examining specific course enrollments, the role of the total extent of students' community college course-taking backgrounds as a predictor of success was examined. A number of measures of quantity of course completions at community college were used. First, the total number of community college credits earned was binned into five buckets:

Bin   CC credits   % of students
1     <12          20%
2     12 to <30    21%
3     30 to <45    13%
4     45 to <60    14%
5     ≥60          32%
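The five credit buckets can be expressed as a small lookup. This sketch assumes that the last bucket means 60 or more credits (the comparison symbol is missing from the extracted table) and that each lower edge is inclusive, as the "12 to <30" style labels suggest.

```python
from bisect import bisect_right

# Lower edges of bins 2 through 5; anything below 12 falls in bin 1.
BIN_EDGES = [12, 30, 45, 60]
BIN_LABELS = {1: "<12", 2: "12 to <30", 3: "30 to <45", 4: "45 to <60", 5: "60 or more"}


def credit_bin(cc_credits_earned):
    """Return the bin number (1-5) for a total of community college credits."""
    return bisect_right(BIN_EDGES, cc_credits_earned) + 1


# Hypothetical credit totals for a handful of students.
example_credits = [0, 11.5, 12, 29, 30, 44, 45, 59.5, 60, 90]
example_bins = [credit_bin(c) for c in example_credits]
```

Using `bisect_right` keeps the edge handling in one place: a student with exactly 12 credits lands in bin 2, and a student with exactly 60 lands in bin 5.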
As shown in Figure 6 below, the four success profiles show different patterns when the number of credits earned is compared to the average community college GPA. For all groups, more credits earned was generally associated with a higher GPA at community college; that is, students tend to earn a higher GPA as they accumulate more credits. However, students in the different success profiles have different starting and ending points for their GPAs.

Figure 6. Binned Number of Community College Credits by Community College GPA for each Success Profile
The number of community college credits earned was not related to the student's GPA at UMUC, as shown in Figure 7 below. However, students' GPA at UMUC was differentiated by the overall success profile to which they belonged.

Figure 7. Binned Number of Community College Credits by UMUC GPA for each Success Profile
We also compared the difference between the GPA at community college and at UMUC (the delta GPA) to the binned number of community college credits earned. As shown in Figure 8 below, Stars and Splitters had the smallest change in GPA across all credit bins. On the other hand, Slackers and Strivers tended to have a greater difference in GPA as more credits were earned. Slackers earned better grades than Strivers at community college and worse grades at UMUC. Slackers may have been less prepared for UMUC due to their course-taking behavior at community college. (See Model 9 in Appendix B for model fit information.)

Figure 8. Binned Number of Community College Credits by delta GPA for each Success Profile

In addition to community college courses completed, the average semester course load at the community college was considered. A student's community college course load was defined as the number of total credit hours completed divided by the number of terms during which the student was enrolled at the community college. The median number of credits per term was 8, and the median number of terms was 5. Given the wide range of enrollment histories, the median was considered a more meaningful average than the mean for these variables.
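The course load definition and the choice of the median over the mean can be illustrated with a few hypothetical enrollment histories. The numbers below are invented; they are chosen only to show how a single unusually long enrollment history pulls the mean away from the median.

```python
from statistics import mean, median


def course_load(total_credit_hours_completed, terms_enrolled):
    """Credits completed per community college term."""
    return total_credit_hours_completed / terms_enrolled


# Hypothetical histories: (total credit hours completed, terms enrolled)
histories = [(24, 3), (40, 5), (9, 1), (60, 30), (16, 2)]

loads = [course_load(credits, terms) for credits, terms in histories]
terms = [t for _, t in histories]

median_load = median(loads)    # robust to the one long, light history
mean_load = mean(loads)
median_terms = median(terms)
mean_terms = mean(terms)       # pulled upward by the 30-term history
```

Here the student who spread 60 credits over 30 terms shifts the mean number of terms to more than twice the median, which is the kind of distortion the use of medians avoids.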
When the data were sorted by the four UMUC success quadrants, intriguing differences emerged, as can be seen in Table 3. The low-GPA groups at UMUC averaged slightly more credits per term at the community college than the higher-GPA groups, but on average were enrolled for fewer terms.

Table 3. Community College Credits and Community College and UMUC GPA by Success Profiles

                               Stars   Strivers   Slackers   Splitters
Credits per CC term (median)   8       8.8        8.3        7.3
Number of CC terms (median)    5       4          3          4
Credits per CC term: range     0-25    0-21       0-19       0-18
Number of CC terms: range      1-30    1-31       1-29       1-20
CC GPA (mean)                  2.5     1.8        1.8        2.6
UMUC GPA (mean)                3.2     0.5        0.2        3.1

Note: Strivers and Slackers are the groups with a low GPA at UMUC.

In Phase 3 of the project, to be completed over the next year, we will derive and test a blended model considering both the number of courses students completed at community college and the percentage of Fs and Ws that students earned at community college as potentially predictive of students' success at UMUC. (See Model 10 in Appendix B.) Though preliminary work shows that this model has adequate fit, the blended models require further exploration and testing.

Data mining has been a fruitful technique for exploring the richness of the transfer student data available. However, because the majority of students at UMUC were successful, the algorithms maximized fit by placing all students into the largest grouping, thereby classifying all students as successful and limiting the predictive power of the models. For this reason, some models demonstrated only modest fit and may have produced inflated rates of false positive classifications. Future efforts at data mining will need to determine a standard for acceptable model fit that parallels the construct of statistical significance.
Section 7: Key Findings from Predictive Analyses

Key findings from Phase 2 of the analyses are presented below. These analyses were undertaken based on exploratory findings from the data mining described in Section 6. Those factors identified as potentially contributing to success at UMUC were examined more comprehensively using traditional statistical methods, such as logistic regression. These methods were considered to be more robust analytic approaches than data mining. A summary of the predictive models analyzed is presented in Appendix D.

Research Goal 2: Identify demographic profiles of MC and PGCC students transferring to UMUC

First, cluster analyses were used to determine the demographic and community college background profiles of students transferring to UMUC from MC and PGCC. Cluster analysis is a data mining technique that determines naturally occurring groups in a sample; separate cluster analyses were run for students transferring from Montgomery College and from Prince George's Community College.

Sample. The target sample in these analyses consisted of stateside UMUC undergraduate students who transferred from MC or PGCC between Spring 2011 and Summer 2012 and who enrolled in WebTycho courses. A total of 806 MC students and 566 PGCC students were included in the cluster analyses.

Variables Included. Demographic data included students' age, gender, race, and marital status, as well as the number of terms in which students had skipped enrollment. Community college course completion data included variables related to students' year of enrollment; the subject, title, and catalog number of courses completed; final grades in each course; total credit hours billed; total credit hours attempted; and the location where students completed courses. Each course was also identified as being online or face-to-face, and as honors or developmental as appropriate. Data were provided on whether students repeated courses.

Analysis.
K-means cluster analysis was chosen as the clustering method. K-means clustering is commonly used in such analyses and is computationally efficient. The K-means algorithm requires the desired number of clusters as an input; for both the MC and the PGCC samples, several cluster numbers (i.e., 4, 5, or 6 cluster groups) were input and inspected to determine the best cluster structure. Though no qualitative differences between clusters were identified, sensitivity analyses identified a five-cluster model as optimal for both the MC and the PGCC samples, based on the number of students populating the smallest cluster.

Results. Table 4 on the following page displays the five-cluster solution, based on demographic profiles and academic background in community college, for students at Montgomery College, while Table 5 presents the same groupings for students at Prince George's Community College. The tables show each cluster ID, the percentage of students classified into that cluster (size %), and the percentage of students within each cluster exhibiting the variable stated in the left column. For example, in the MC cluster analysis (Table 4), we can see that in Cluster 1, to which 39% of students were assigned, 96% of students received at least one A grade. The Importance column provides information about the relative importance of variables in determining cluster membership for a particular student. For example, the variable "received at least one A grade" was more important in determining students' cluster membership (importance = 1.00) than the variable "took an online course," which contributed less (importance = 0.19).

Table 4. Description of Demographic and Community College Course Taking Background Data Clusters for Montgomery College (N=806)

Cluster ID                                                     ALL   1     2     3     4     5     Importance
Size                                                           806   316   116   122   176   76
Size (%)                                                       100%  39%   15%   15%   22%   9%
Percent of students in each cluster that:
Received at Least One A                                        74%   96%   4%    91%   97%   8%    1.00
Have a Low GPA                                                 17%   0%    66%   0%    0%    82%   0.82
Have a High GPA                                                14%   2%    0%    0%    62%   0%    0.82
Received at Least One F                                        9%    8%    11%   9%    5%    14%   0.55
Have a Low Course Load                                         17%   0%    86%   5%    17%   4%    0.51
Have a High Course Load                                        15%   9%    1%    48%   10%   16%   0.51
Received at Least One W                                        54%   93%   9%    50%   16%   55%   0.46
Have a Low Course Efficiency                                   15%   0%    61%   2%    0%    62%   0.38
Percent of Students that Took Speech Course                    54%   93%   1%    47%   44%   11%   0.35
Percent of Students that Took an Online Course                 48%   73%   8%    15%   58%   36%   0.19
Percent of Students Under 26 Years Old                         32%   39%   12%   43%   24%   29%   0.06
Percent of Students Over 45 Years Old                          7%    5%    13%   3%    11%   3%    0.06
Percent Married                                                15%   10%   18%   15%   22%   14%   0.03
Percent Single                                                 45%   55%   30%   47%   31%   59%   0.03
Percent Female                                                 50%   51%   49%   41%   55%   55%   0.00
Percent Male                                                   49%   49%   51%   57%   44%   45%   0.00
Percent of Students Who Did Not Stop Out Between Institutions  83%   81%   82%   85%   84%   87%   0.00
Percent of Students Who Took an Honors Course at MC            2%    3%    0%    3%    3%    0%    0.00
Percent of Students Who Took an Online Course at MC            48%   73%   8%    15%   58%   37%   0.19
Percent of Students Who Repeated One Course at MC              0%    0%    0%    0%    0%    0%    0.00
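The model-selection step described for the cluster analysis (run K-means for 4, 5, and 6 clusters and compare the size of the smallest cluster) can be sketched as follows. This is a plain-Python illustration of Lloyd's K-means algorithm on randomly generated data, not the software or settings used in the study, which the report does not name for this step. The three features, the random seed, and the starting centers are all assumptions.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k rows picked as starting centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def smallest_cluster_size(labels, k):
    return min(labels.count(c) for c in range(k))


# Hypothetical data: 300 students, 3 features already scaled to [0, 1].
data_rng = random.Random(42)
students = [tuple(data_rng.random() for _ in range(3)) for _ in range(300)]

smallest_by_k = {
    k: smallest_cluster_size(kmeans(students, k), k) for k in (4, 5, 6)
}
```

Comparing the values in `smallest_by_k` mirrors the criterion described above: a solution whose smallest cluster contains very few students is harder to interpret and less stable than one with more evenly populated clusters.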
Table 5. Description of Demographic and Community College Course Taking Background Data Clusters for Prince George's Community College (N=566)

Cluster ID                                                     ALL   1     2     3     4     5     Importance
Size                                                           566   67    179   143   58    119
Size (%)                                                       100%  12%   32%   25%   10%   21%
Percent of students in each cluster that:
Received at Least One A                                        74%   6%    94%   100%  9%    83%   1.00
Have a Low GPA                                                 14%   37%   0%    0%    91%   3%    0.78
Have a High GPA                                                16%   0%    14%   45%   0%    3%    0.78
Received at Least One F                                        10%   9%    8%    3%    19%   16%   0.70
Have a Low Course Load                                         17%   43%   2%    31%   16%   6%    0.14
Have a High Course Load                                        17%   7%    20%   13%   33%   16%   0.14
Received at Least One W                                        61%   21%   87%   27%   59%   87%   0.00
Have a Low Course Efficiency                                   12%   28%   0%    0%    78%   3%    0.48
Percent of Students that Took Speech Course                    59%   6%    84%   48%   17%   82%   0.31
Percent of Students that Took an Online Course                 43%   13%   100%  27%   33%   0%    0.58
Percent of Students Under 26 Years Old                         23%   30%   30%   13%   38%   13%   0.06
Percent of Students Over 45 Years Old                          14%   7%    12%   22%   5%    13%   0.06
Percent Married                                                17%   15%   17%   23%   10%   16%   0.04
Percent Single                                                 56%   57%   63%   36%   78%   60%   0.04
Percent Female                                                 65%   48%   73%   68%   55%   66%   0.00
Percent Male                                                   35%   52%   28%   32%   45%   33%   0.00
Percent of Students Who Did Not Stop Out Between Institutions  79%   82%   79%   80%   67%   83%   0.00
Percent of Students Who Took an Honors Course at PGCC          4%    1%    5%    5%    3%    5%    0.00
Percent of Students Who Took an Online Course at PGCC          43%   13%   100%  27%   33%   0%    0.58
Percent of Students Who Repeated One Course at PGCC            39%   4%    66%   2%    26%   70%   0.00

In addition to considering students' cluster membership as a whole, Tables 4 and 5 allow us to consider demographic factors related to students' course performance at the community college level. For example, among students transferring to UMUC from PGCC, 21% were classified into Cluster 5. For this cluster of students, a salient demographic factor may be that these students tend to be older (i.e., only 13% of these students were under 26 years old). Salient factors in their
academic backgrounds may be that 87% of these students had withdrawn from at least one course and 70% of these students had repeated at least one course at PGCC.

Clusters were named to aid in interpretability. The first cluster, identified across both MC and PGCC, consisted of successful students. This cluster corresponded to Cluster 4 at MC (22%) and Cluster 3 at PGCC (25%). The successful cluster was distinguished by including students who had received at least one A (MC: 97%; PGCC: 100%) and who had a high GPA (MC: 62%; PGCC: 45%). Conversely, there was an unsuccessful student cluster: Cluster 5 in the MC sample (9%) and Cluster 4 in the PGCC sample (10%). Students classified into this cluster had received at least one F (MC: 14%; PGCC: 19%) and had a low GPA (MC: 82%; PGCC: 91%).

The remaining three clusters were distinguished less by their course performance and more by their course-taking load and course efficiency. In the overloaded cluster were students who both had a high course load (MC: 48%; PGCC: 20%) and had at least one W (MC: 50%; PGCC: 87%). This suggests that while these students were taking a significant number of courses and were capable of being successful (i.e., 91% of students at MC and 94% of students at PGCC in this cluster had at least one A), they were apparently struggling with course load, as indicated by the high percentage of students with withdrawals in this cluster. The overloaded cluster corresponds to Cluster 3 in the MC data (15%) and Cluster 2 in the PGCC data (32%).

The next cluster, cautious course takers, was also distinguished by a high percentage of students receiving at least one W (MC: 93%; PGCC: 87%), while otherwise being successful (96% of these students received at least one A at MC and 83% received at least one A at PGCC).
Unlike the overloaded cluster, students in this cluster were not distinguished by having a high course load, and may have been withdrawing from courses in which they were having specific difficulties. Interestingly, students in this cluster took speech courses at a high rate (MC: 93%; PGCC: 82%). This cluster of cautious course takers corresponded to Cluster 1 at MC (39%) and Cluster 5 in the PGCC sample (21%).

The final cluster, inefficient course takers, corresponded to Cluster 2 from MC (15%) and Cluster 1 from PGCC (12%). This cluster was distinguished by students having a low course efficiency (MC: 61%; PGCC: 28%); at the same time, these students also had a low course load (MC: 86%; PGCC: 43%). Students belonging to this cluster may have struggled with academic content or with balancing course work and competing out-of-school responsibilities.

Variables included in the cluster analysis and identified as important in determining a given student's cluster membership may not have predictive significance in determining students' success at UMUC. Subsequent predictive analyses were completed to determine which demographic and community college course performance variables predicted later success at UMUC.

The k-means algorithm is a clustering method that categorizes students into k clusters. Although four-, five-, and six-cluster solutions were examined, no theoretical rationale was developed for retaining a five-cluster solution. The five-cluster solution was selected based on the number of students classified into the smallest cluster. Clustering was performed independently on two different samples (i.e., students from MC and from PGCC), lending validity to the identified clusters. Hierarchical clustering methods are recommended to explore and confirm the number of clusters.
Research Goal 3: Determine MC and PGCC transfer students' performance at UMUC

The two community colleges in the partnership wish to understand how their students perform after transferring to UMUC.

Sample. There were 7,970 students from Montgomery College and 4,971 students from Prince George's Community College.

Results. Summary results show that a large proportion of students from each institution are successful in their first term at UMUC: 58% of MC students and 48% of PGCC students earned first-term GPAs of 3.0 or above. However, a significant proportion of students were not successful, earning a first-term GPA below 2.0 (23% of MC students and 31% of PGCC students).

Table 6. UMUC First Term GPA

Montgomery College Students (N=7970)
First Term GPA   Frequency   Percent   Cumulative Frequency   Percent of Total
Below 2.0        1,865       23.4      1,865                  23.4%
2.0 to 2.9       1,515       19.01     3,380                  19.0%
3.0 or Above     4,590       57.59     7,970                  57.6%

Prince George's Community College Students (N=4971)
First Term GPA   Frequency   Percent   Cumulative Frequency   Percent of Total
Below 2.0        1,558       31.34     1,558                  31.3%
2.0 to 2.9       1,051       21.14     2,609                  21.1%
3.0 or Above     2,362       47.52     4,971                  47.5%

Research Goal 4. Identify demographic and community college background factors predicting course success at UMUC

Building upon the findings in Table 6 above, the next research goal was to examine which demographic factors and variables in students' community college course-taking backgrounds were predictive of success at UMUC. To this end, three types of analyses were undertaken. These analyses used logistic regression to build predictive models for the dichotomous outcome variable of course success or student success. First, a "kitchen sink" model was fit that included all possible predictors of course success. Second, predictive analyses were run for students' successful performance in specific types of courses.
Finally, a model that added students' course efficiency (the ratio of credits earned to credits attempted) as an additional predictor was fit to predict students' course success. In building the full model predicting course success and examining students' performance in specific types of courses, the following data and methods were used:
Sample description. The data set included a total of 2,771 students; however, the number of students included in each logistic regression varies depending on whether or not students were enrolled in the specific courses under examination. See Appendix E for a list of which courses were included in analyses.

Variables included.

Independent Variables. Independent variables potentially predictive of students' success at UMUC included the number of each type of letter grade (A, B, C, D, F, W) students earned at community college, age, and race/ethnicity. Based on initial bivariate correlation and linear regression analyses for each of the independent variables, a number of variables (e.g., count of subjects, credits earned, credits billed, credits attempted) were excluded due to concerns about multicollinearity, that is, indicators being redundant with one another (explaining the same portion of variance in students' GPA). The Variance Inflation Factor (VIF) was used as the collinearity diagnostic indicator. Additional student-related variables, such as marital status and highest degree earned, were excluded from analyses because the data on these variables included a large number of missing values, leading to inflated VIF.

Dependent Variables. There were two types of outcome measures of interest in the study. The first was students' GPA at UMUC, with a GPA of 2.0 or above being indicative of student success. GPA was used as the dependent variable for models predicting students' success overall. The second dichotomous outcome variable of interest was students' course success. Students earning final grades of A, B, or C were defined as exhibiting successful course completion, whereas students earning Ds, Fs, or Ws were defined as unsuccessfully completing courses. Successful course completion was used as the dependent variable for models predicting students' success in specific courses, such as general education courses.

Analyses.
A series of logistic regressions was run to determine which variables predicted students' success at UMUC overall and success in specific courses of interest. Specifically, the sub-questions addressed in these logistic regressions include:

1. Which factors predict course success at UMUC for this population?
2. Do the variables predicting course success at UMUC differ for students who transferred from PGCC versus MC?
3. Which variables best predict gateway course success for UMUC students?
4. Which variables best predict success in the first Written Communication course completed at UMUC?
5. Which variables best predict success in the first General Education Math course completed at UMUC?

Gateway courses were identified as 14 classes commonly taken by transfer students new to UMUC. Written Communication courses are twenty courses that satisfy the communications general education requirement at UMUC. General Education Math courses are four courses satisfying the general education mathematics requirement at UMUC. There were no redundancies in the students enrolled in courses of each type (i.e., a student would be enrolled in only one of the four possible general education math courses). Course names and the number of students in each course are displayed in Appendix B.
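The two dichotomous outcomes described under Dependent Variables can be illustrated with a short sketch. The grade sets and the 2.0 cutoff come from the text; the function names are hypothetical and are not taken from the project's actual code.

```python
# Illustrative sketch only: function names are hypothetical.
# The grade sets and the 2.0 GPA cutoff follow the definitions in the text.

SUCCESSFUL_GRADES = {"A", "B", "C"}
UNSUCCESSFUL_GRADES = {"D", "F", "W"}


def course_success(final_grade: str) -> int:
    """Return 1 for a successful course completion (A, B, C), 0 for D, F, or W."""
    grade = final_grade.strip().upper()
    if grade in SUCCESSFUL_GRADES:
        return 1
    if grade in UNSUCCESSFUL_GRADES:
        return 0
    raise ValueError(f"Unrecognized final grade: {final_grade!r}")


def gpa_success(gpa: float, cutoff: float = 2.0) -> int:
    """Return 1 when the GPA is at or above the cutoff, otherwise 0."""
    return 1 if gpa >= cutoff else 0


# Example: code three course records and one term GPA.
coded_courses = [course_success(g) for g in ["A", "w", "C"]]
coded_gpa = gpa_success(2.4)
```

Either coded value can then serve as the 0/1 dependent variable in a logistic regression.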
Results. Rather than examining the analytical models individually, we were interested in looking at the analyses in conjunction with one another to identify the variables that are significant in predicting student success in terms of both overall GPA and success in target courses. The particular target courses were chosen because they are frequently taken by students new to UMUC. Further, the courses are foundational both for general education and skill building and for successful performance in subsequent higher-level courses.

Table 7. Summary of Predictors for Logistic Regressions Predicting Overall GPA at UMUC and Success in Specific Courses (Total N = 2771)

Column headings: Course Success; Transfer from PG; Transfer from MC; Gateway Courses*; Written Communication; General Education Math

Age in Years: + + + + (6) +
Number of Courses with Grade of A: + + + + (6) + +
Number of Courses with Grade of B: + + + + (3) +
Number of Courses with Grade of C: + + +(1); (1)
Number of Courses with Grade of D: - - - (2)
Number of Courses with Grade of F: - - - - (10) -
Number of Courses with Grade of W: - - - - (2) -
Gender: - (2)
Race: African-American: - - - (2)
Race: Not Specified: - (1)
English Course Taken: -
Math Course Taken: + + + (3)
Speech Course Taken: - (1)
Honors Course Taken:
Computers Course Taken: + + (2)
Repeated: -
R² Variance Explained**: 0.219 0.221 0.196 - - 0.115 0.154
Prediction Accuracy: 77.5% 73.1% 80.4% - - 79.2% 77.6%

* As summarized across the 14 gateway courses; the number in ( ) indicates the number of gateway courses for which that independent variable is a significant predictor of success.
Note: Ethnicity, RaceWhite, RaceAmericanIndian, RaceAsian, Foreign, and Remedial were not significant predictors in any of the logistic regression models and were therefore excluded from the final model.
+ Denotes an independent variable that is a positive predictor: as levels of the independent variable increase, so do levels of the dependent variable.
- Denotes an independent variable that is a negative predictor: as levels of the independent variable increase, the dependent variable decreases.
** R² is the variance in successful GPA explained by this model. It represents a measure of effect size and model fit.
As shown in Table 7, the independent variables that were determined to be important predictors of success across courses were student age and the counts of A grades and F grades that students earned in courses transferred from community college. Those indicators were repeatedly found to be statistically significant in the logistic regression models. Also important, but to a somewhat lesser extent, were the counts of Bs and Ds in courses transferred from community college, the count of Ws in transferred courses, Math Course Registration, Computer Course Registration, and African-American identification. Other statistically significant predictors appeared in fewer of the logistic regression models.

Research Goal 4a. Examine demographic, community college background factors, and course efficiency as predicting course success at UMUC.

Finally, a multivariate analysis was performed to predict UMUC success using independent variables identified as significant predictors in previous analyses and adding an additional computed variable, course efficiency. The dependent variable was success at UMUC, defined in terms of first-term GPA.

Sample. The sample consisted of 9,063 students who transferred to UMUC from MC and PGCC as new undergraduate students between Spring 2005 and Spring 2012. Of these, 60% (n=5,448) were transfers from Montgomery College and 40% (n=3,615) were transfers from Prince George's Community College.

Variables.

Independent Variables. The data set included a variety of demographic variables, including age, gender, race, marital status, cohort, community college of origin, and terms skipped. The community college coursework variables included in the data set were categorized by course subject, including English, math, speech, and computers, as well as by the classification of courses as honors, remedial, or online/face-to-face.
Only those independent variables found to be significant predictors of GPA were included in the final model. Further, a course efficiency variable was computed and included in the analyses. Course efficiency is defined as the total number of credits earned divided by the total number of credits attempted in community college. As such, course efficiency was an indicator of how consistently students persist in completing their coursework, rather than failing or withdrawing from classes. The course efficiency rate for all students was approximately 70%, indicating that, on average, students were completing their coursework, without receiving F or W grades, in 70% of courses taken. We considered course efficiency an important variable in understanding students' success. Course efficiency allows researchers to consider course completion relative to total course load. Further, low course efficiency indicates unnecessary costs to students, both financially and in terms of time invested.

Dependent Variables. The outcome variable of interest was UMUC first-term GPA, with a successful GPA being 2.0 or above and an unsuccessful GPA being below 2.0.
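The course efficiency definition above (credits earned divided by credits attempted) reduces to a single ratio. The following is a minimal sketch; the function name and the sample credit counts are hypothetical and chosen only for illustration.

```python
# Illustrative sketch: the function name and example values are hypothetical.
# The ratio itself (credits earned / credits attempted) is the definition in the text.

def course_efficiency(credits_earned: float, credits_attempted: float) -> float:
    """Return the share of attempted community college credits that were earned."""
    if credits_attempted <= 0:
        raise ValueError("credits_attempted must be positive")
    if credits_earned < 0 or credits_earned > credits_attempted:
        raise ValueError("credits_earned must be between 0 and credits_attempted")
    return credits_earned / credits_attempted


# A hypothetical student who attempted 30 credits and earned 21 of them
# has a course efficiency of 0.7, i.e. about 70%.
example_efficiency = course_efficiency(21, 30)
```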
Analyses. Prior to conducting a logistic regression, descriptive analyses were completed for each of the independent variables. Table 8 displays the percent of students from MC and from PGCC taking various types of courses.

Table 8. Percent of Students Taking Certain Courses at Community College by Institution (N=9063)

Course      MC     PGCC   Total
English     69%    75%    71%
Math        71%    40%    66%
Speech      53%    61%    57%
Computer    19%    60%    36%
Honors      4%     5%     4%
Online      45%    38%    37%
Remedial    48%    52%    49%

Table 9 displays the percent of students successful at UMUC by the types of courses taken at community college.

Table 9. Percent of Students Successful at UMUC by Community College Coursework Taken or Not Taken (N=7889)

Course      Taken    Not Taken   Diff.
English     75.7%    74.7%       1.0
Math        77.6%    70.9%       6.7*
Speech      77.7%    72.3%       5.4*
Computer    75.6%    75.3%       0.3
Online      74.6%    76.0%       -1.4
Honors      86.0%    74.9%       11.1*
Remedial    74.5%    76.3%       -1.8
* Statistically significant

Further, as the variable course efficiency was of particular interest in these analyses, Table 10 displays students' course efficiency relative to subject areas (e.g., math) and course classification (e.g., honors).

Table 10. Course Efficiency Rates Differentiated by Types of Courses Taken or Not (N=9063)

Course      Taken   Not Taken   Diff.
English     71%     66%         5*
Math        71%     68%         3*
Speech      74%     64%         10*
Computer    73%     68%         5*
Honors      80%     70%         10*
Online      70%     70%         0
Remedial    67%     73%         -6*
* Statistically significant
Discussion. Students at Montgomery College enrolled in math courses at a higher rate than did students at Prince George's Community College. At PGCC, students enrolled in computer classes at a higher rate than they did at MC. This may be because computer courses at PGCC are classified as general education core courses that fulfill math requirements. (See Table 8.)

Students' course-taking preferences in community college differentiated their success at UMUC. Students enrolled in math or honors courses at community college demonstrated greater success at UMUC. (See Table 9.)

Students also had differing rates of course efficiency depending on their course-taking preferences. (See Table 10.) In particular, students taking honors and speech courses had higher rates of course efficiency than students who did not take such courses. Likewise, students taking English, math, and computer courses had higher rates of course efficiency. Conversely, students taking courses classified as remedial had a lower rate of course efficiency than did students who were not enrolled in remedial courses.

Finally, a logistic regression was run to determine which independent variables might be predictors of success in terms of first-term GPA at UMUC. (See Table 11 below.) Demographic factors, primarily age, marital status, and race, were found to be significantly related to success at UMUC. Specifically, older or married students were found to have higher GPAs at UMUC. Compared to white students, students identifying as African American, Hispanic, or with an unspecified race/ethnicity tended to have a significantly lower GPA at UMUC.

Table 11. Results of Multivariate Logistic Regression Analysis of Success at UMUC (N=7615)

Variable                      B       S.E.    Sig.    Exp(B)
Age                           .268    .027    .000    1.308*
Gender                        -.083   .060    .164    .920
Asian Ethnicity               -.055   .119    .643    .946
African American Ethnicity    -.876   .081    .000    .417*
Hispanic Ethnicity            -.380   .113    .001    .684*
Unspecified Race              -.470   .104    .000    .625*
Married                       .422    .085    .000    1.525*
English Course Taken          -.187   .081    .021    .829*
Math Course Taken             .345    .072    .000    1.413*
Speech Course Taken           .078    .070    .269    1.081
Computer Course Taken         -.078   .063    .218    .925
Honors Course Taken           .467    .166    .005    1.594*
Remedial Course Taken         .029    .068    .674    1.029
Online Course Taken           -.175   .059    .003    .839*
Course Efficiency             .241    .012    .000    1.273*
Note: White was used as the reference category for the race/ethnicity variables and was thus not entered in the logistic regression model.
* Statistically significant

Table 11 also shows that prior coursework was related to success at UMUC. Math courses and honors courses were positively related to success at UMUC, while online courses at the community college level were inversely related to success. Finally, course efficiency at the community college was found to be a significant predictor of success at UMUC.
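The Exp(B) column in Table 11 is the odds ratio obtained by exponentiating the logistic regression coefficient B. The sketch below recomputes a few of those values from the reported coefficients as a reading aid. The helper names are hypothetical, and small differences from the table are expected because the published B values are rounded to three decimals.

```python
import math

# Illustrative sketch: helper names are hypothetical.
# Coefficients (B) and reported odds ratios (Exp(B)) are copied from Table 11.
TABLE_11 = {
    "Age": (0.268, 1.308),
    "African American Ethnicity": (-0.876, 0.417),
    "Married": (0.422, 1.525),
    "Math Course Taken": (0.345, 1.413),
    "Online Course Taken": (-0.175, 0.839),
    "Course Efficiency": (0.241, 1.273),
}


def odds_ratio(b: float) -> float:
    """Convert a logistic regression coefficient into an odds ratio."""
    return math.exp(b)


def percent_change_in_odds(b: float) -> float:
    """Percent change in the odds of success per one-unit increase in the predictor."""
    return (math.exp(b) - 1.0) * 100.0


for name, (b, reported) in TABLE_11.items():
    print(f"{name}: exp(B) = {odds_ratio(b):.3f} (reported {reported:.3f}), "
          f"change in odds = {percent_change_in_odds(b):+.1f}%")
```

An odds ratio above 1 (for example, Math Course Taken) corresponds to higher odds of a successful first-term GPA, and a ratio below 1 (for example, Online Course Taken) corresponds to lower odds, holding the other predictors constant.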
Research Goal 4b. Examine demographic, community college background factors, and change in GPA as predicting retention at UMUC

Given that the initial data mining had identified the change in GPA from community college to UMUC (i.e., delta GPA) as a significant factor, we were interested in examining the extent to which delta GPA predicted retention. To this end, we completed a logistic regression of the data.

Sample. The sample used in these analyses was the same as that used for the initial data mining explorations. Specifically, this data set included students enrolled in the top 50 most popular courses at each of the community colleges and transferring to UMUC. The total data set included 12,637 students and 108,237 enrollments.

Variables.

Independent Variables. A number of independent variables were used in these analyses: students' community college GPA, race/ethnicity, gender, and age were used as control variables. Then, the predictor of interest, delta GPA, was entered into the model.

Dependent Variables. The outcome of interest in these analyses was retention at UMUC, defined as students' enrollment in a course at UMUC within one year of the entering semester. A binary coding (0 or 1) was used depending on whether or not students were retained.

Analyses. A stepwise logistic regression was used to examine whether delta GPA was predictive of retention, controlling for background factors. In Step 1 of the model, background characteristics, including race/ethnicity (white was used as the referent group) and community college GPA, were entered into the model; at Step 2, delta GPA was entered.

Results. Overall, the majority of transfer students (76.35%) were retained at UMUC. Even after controlling for demographic factors and community college GPA, students' change in GPA upon transferring to UMUC was a significant predictor of retention. (See Table 12.)

Table 12.
Results of Multivariate Logistic Regression Analysis of Retention at UMUC (N=12637)

Variable                  B       S.E.    Sig.    Exp(B)
Age                       -0.13   .002    0.00    1.19*
Gender                    0.15    0.04    0.00    1.17*
Hispanic                  0.19    0.08    0.03    1.21*
African American          0.34    0.05    0.00    1.40*
Asian                     0.43    0.09    0.00    1.54*
Race/Ethnicity Unknown    0.17    0.07    0.02    1.19*
Community College GPA     0.65    0.02    0.00    1.91*
Delta GPA                 0.64    0.02    0.00    1.89*
* Statistically significant

Excluded from the model were students classified as Non-resident alien, American Indian, Hawaiian/Pacific Islander, or Two or more ethnicities, as these were not significant predictors in the model.
Research Goal 5. Investigate predictors of behaviors in WebTycho and success at UMUC

In addition to examining which factors are good predictors of students' success at UMUC, we were also interested in determining predictors of students' behaviors in online courses, specifically in the WebTycho (UMUC's proprietary Learning Management System) online classroom environment.

Sample. Participants in this set of analyses were undergraduate students who transferred to UMUC from MC or PGCC and were enrolled in their first semester at UMUC between Spring 2011 and Summer 2012. Further, these students were enrolled in at least one online WebTycho course. The sample included 806 students from Montgomery College with 2,579 course enrollments and 566 students from Prince George's Community College with 1,761 course enrollments.

Variables.

Independent Variables. Predictive modeling of students' course success and behaviors in the WebTycho environment was based on 23 independent variables, including demographic variables (such as marital status, gender, age, and race/ethnicity) and course-taking behavior in community college (such as courses taken in the subject areas of English, math, speech, or computers, or courses that were honors, remedial, or online). Further, students' course load, course efficiency, and GPA were included as potential predictors. See Appendix F for a full list of independent variables considered and their definitions.

Dependent Variables. Two dependent variables were the targets of these analyses: students' success and online classroom behaviors. The first was successful course completion, with final grades of A, B, and C considered successful course completions and grades of D, F, or W considered unsuccessful course completions. The second outcome variable of interest in this study was students' behavioral patterns in the WebTycho online classroom.
A cluster analysis was used to identify five clusters of students' behaviors in WebTycho, and the outcome variable for predictive analyses was this WebTycho Classroom Behavior Cluster Identifier. This identifier was a single-digit code that identified the cluster to which a given student was assigned based on classroom behaviors recorded during the first week of the course.

Cluster Analyses. Clusters of students' behaviors in the WebTycho environment were determined based on the frequency with which students engaged in 10 key behaviors in the WebTycho online classroom:
1) creating conference notes
2) reading conference notes
3) updating conference notes
4) creating response notes
5) opening the chat
6) opening course content
7) opening the class roster
8) opening the class
9) opening the reserved readings
10) opening the webliography

A five-cluster solution was selected, and separate cluster analyses were performed for students from Montgomery College and Prince George's Community College.

Predictive Modeling. Several algorithms were used to determine and validate predictive models of students' course success and behaviors in the WebTycho learning environment. These included CRT, CHAID, QUEST, and C5.0, which are algorithms that build decision trees. Decision trees were selected as the desired form of output because the results we were interested in were a series of questions regarding whether or not students had successfully completed courses and to which WebTycho behavioral cluster students belonged.

Binning, boosting, oversampling, and weighted costs were all techniques used to potentially maximize model accuracy. Binning reduces the range of values for a given variable (Witten, Frank, & Hall, 2011) by separating a continuous variable into categories. An example is classifying students into Age Bins instead of using a continuous range of Age in Years. Boosting is a technique whereby several predictive models are strung together in a series. Oversampling, as applied here, is a technique whereby certain cases (in this instance, cases are students) are excluded in order to rebalance the clusters. If one cluster has two-thirds of the cases, it is quite possible that the predictive models will determine that the most accurate method is to assign everyone to that cluster. Such a model provides no insight. By reducing the number of cases in the majority cluster and then running the algorithm, the resulting predictive model's decision tree might improve. An improved decision tree would have cases in all five clusters being predicted accurately. Note, however, that overall accuracy and the total number of cases predicted correctly might diminish. Weighting costs does not necessarily improve the overall accuracy of the predictive model, but it can avoid certain types of predictive mistakes.
Weighted costs assign values to incorrect predictions. Incorrect predictions that are weighted more heavily are avoided more often than incorrect predictions that are weighted less.

Partitioning was used to test accuracy. A subset of cases was set aside prior to running the algorithms that produced the predictive models. Random selection determined which cases were used in the training data set and which were in the testing data set. The held-out cases were then run through the predictive model and accuracy was assessed. Each time the algorithm was executed, the selection process was repeated, which means that the exact number of cases, and which cases were in the training set, was re-determined. This leads to different accuracy levels in each execution.

Because of the different combinations of algorithms, binning, boosting, oversampling, and weighted costs, a total of 64 models were produced: 32 each for MC and for PGCC. Each model was evaluated on its accuracy on test data. The test data sets were 25% of the sample. The other 75% of the data were used to train the predictive models.

Results. We first present results from cluster analyses, identifying five clusters of students' actions in the WebTycho classroom interface. Cluster analyses were performed separately for students who transferred from Montgomery College and Prince George's Community College.
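Three of the data preparation steps described under Predictive Modeling (binning, reducing the majority cluster, and the 75%/25% random partition) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the age bin edges, and the toy records are hypothetical, and the report does not say whether rebalancing was applied before or after partitioning. Note that the report calls the case-reduction step oversampling; removing majority cases is often called undersampling elsewhere.

```python
import random
from collections import defaultdict

# Illustrative sketch: names, bin edges, and toy data are hypothetical.


def bin_age(age_years: float, edges=(25, 35, 45)) -> int:
    """Binning: map a continuous age to a bin index (0 = youngest bin)."""
    for index, edge in enumerate(edges):
        if age_years < edge:
            return index
    return len(edges)


def reduce_majority(cases, label_of, rng, max_ratio=1.0):
    """Randomly drop cases from clusters larger than max_ratio times the smallest one."""
    by_label = defaultdict(list)
    for case in cases:
        by_label[label_of(case)].append(case)
    cap = int(min(len(group) for group in by_label.values()) * max_ratio)
    kept = []
    for group in by_label.values():
        kept.extend(group if len(group) <= cap else rng.sample(group, cap))
    return kept


def partition(cases, rng, test_fraction=0.25):
    """Randomly split cases into (training, testing) lists."""
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy example: 100 cases, 60 in cluster 1 and 40 in cluster 2.
rng = random.Random(0)  # a fixed seed makes one run repeatable
toy_cases = [{"id": i, "cluster": 1 if i < 60 else 2} for i in range(100)]
balanced = reduce_majority(toy_cases, lambda c: c["cluster"], rng)
train, test = partition(toy_cases, rng)
```

Re-running `partition` with a different seed yields a different training set, which is why accuracy varied from one execution to the next.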
Table 13 (Montgomery College) and Table 14 (Prince George's Community College) provide information about cluster identifiers, the percentage of students classified into each cluster, and, within each cluster, the percentage of students exhibiting the behavior stated in the left column. For example, Cluster 1, into which 12% of MC students were classified, included students 100% of whom had read a high number of conference notes and 24% of whom had created a high number of conference notes. The importance values listed indicate the relative importance of each independent variable in determining students' cluster membership.

Table 13. Description of WebTycho Activity Clusters for Students Transferred from Montgomery College

Cluster ID                                             ALL    1      2      3      4      5      Importance
Size                                                   2579   319    830    541    528    361
Size (%)                                               100%   12%    32%    21%    20%    14%
% of Students Who:
Read a High Number of Conference Notes                 34%    100%   1%     100%   0%     0%     1.00
Read Zero Conference Notes                             20%    0%     0%     0%     100%   0%     1.00
Created a High Number of Conference Notes              13%    24%    11%    19%    0%     16%    0.25
Created Zero Conference Notes                          66%    52%    63%    52%    100%   56%    0.25
Created a High Number of Response Notes                30%    80%    13%    56%    0%     28%    1.00
Created Zero Response Notes                            36%    2%     36%    6%     100%   21%    1.00
Updated a High Number of Conference Notes              4%     17%    1%     7%     0%     6%     0.16
Updated Zero Conference Notes                          90%    72%    96%    83%    100%   87%    0.16
Chatted in a Study Group a High Number of Times        <1%    3%     0%     0%     0%     0%     0.03
Never Chatted in a Study Group                         98%    93%    98%    98%    100%   95%    0.03
Opened Course Content a High Number of Times           34%    52%    25%    49%    8%     52%    0.46
Never Opened Course Content                            39%    18%    40%    21%    81%    21%    0.46
Opened the Class Roster a High Number of Times         11%    24%    7%     17%    4%     10%    0.12
Never Opened the Class Roster                          74%    52%    81%    65%    89%    68%    0.12
Entered the Class a High Number of Times               41%    91%    0%     69%    5%     100%   1.00
Never Entered the Class                                10%    0%     0%     0%     51%    0%     1.00
Opened the Reserved Readings a High Number of Times    2%     2%     0%     2%     1%     3%     0.01
Never Opened the Reserved Readings                     97%    96%    98%    95%    99%    96%    0.01
Opened the Webliography a High Number of Times         9%     17%    6%     14%    3%     9%     0.09
Never Opened the Webliography                          76%    63%    80%    66%    92%    71%    0.09
Table 14. Description of WebTycho Activity Clusters for Students Transferred from Prince George's Community College

Cluster ID                                             ALL    1      2      3      4      5      Importance
Size                                                   1761   434    284    320    418    305
Size (%)                                               100%   25%    16%    18%    24%    17%
% of Students Who:
Read a High Number of Conference Notes                 24%    8%     78%    0%     3%     52%    1.00
Read Zero Conference Notes                             21%    0%     0%     100%   11%    3%     1.00
Created a High Number of Conference Notes              15%    14%    21%    0%     13%    24%    0.15
Created Zero Conference Notes                          64%    58%    51%    100%   61%    51%    0.15
Created a High Number of Response Notes                26%    0%     80%    0%     25%    45%    1.00
Created Zero Response Notes                            39%    0%     5%     100%   75%    12%    1.00
Updated a High Number of Conference Notes              3%     2%     10%    0%     0%     7%     0.08
Updated Zero Conference Notes                          91%    93%    79%    100%   97%    81%    0.08
Chatted in a Study Group a High Number of Times        <1%    0%     0%     0%     2%     1%     0.00
Never Chatted in a Study Group                         98%    98%    97%    100%   97%    97%    0.00
Opened Course Content a High Number of Times           34%    36%    42%    6%     22%    67%    0.49
Never Opened Course Content                            37%    28%    29%    93%    32%    7%     0.49
Opened the Class Roster a High Number of Times         11%    9%     13%    1%     4%     29%    0.19
Never Opened the Class Roster                          72%    77%    65%    93%    83%    37%    0.19
Entered the Class a High Number of Times               36%    27%    86%    3%     6%     78%    1.00
Never Entered the Class                                11%    0%     0%     60%    0%     0%     1.00
Opened the Reserved Readings a High Number of Times    3%     2%     2%     0%     2%     5%     0.01
Never Opened the Reserved Readings                     96%    96%    95%    99%    95%    93%    0.01
Opened the Webliography a High Number of Times         10%    6%     4%     3%     5%     39%    0.72
Never Opened the Webliography                          74%    88%    96%    94%    83%    0%     0.72

As can be seen from the preceding tables, clusters differed in the extent to which students were active in the WebTycho online classroom.
For example, for PGCC students, 78% of students in Cluster 2 read a high number of conference notes, while 0% of students in Cluster 3 and only 3% of students in Cluster 4 did so. Likewise, while 86% of PGCC students in Cluster 2 entered the online class a high number of times, 60% of students in Cluster 3 never entered the class.
Clusters were named to aid in interpretability. In both the MC and PGCC analyses, two high engagement clusters emerged. The first, "high social and content engagement," refers to Cluster 1 in the MC sample (12% of students) and Cluster 2 in the PGCC sample (16%). In this cluster were students who had a high rate of engagement both in course content (e.g., entered the online class a high number of times, opened course content a high number of times) and in the social-learning aspects of the course (e.g., read a high number of conference notes, created a high number of response notes). The next cluster, "high content engagement," included students who entered the class a high number of times and accessed course content a high number of times, but were more limited in their participation in the social aspects of course taking (e.g., reading conference notes, creating response notes). This high content engagement cluster corresponded to Cluster 5 for both the MC sample (14%) and the PGCC sample (17%).

There was also a "disengaged" cluster identified for both MC (Cluster 4; 20%) and PGCC (Cluster 3; 18%) transfer students. This cluster was distinguished by a substantial portion of students who had never entered the class; this was true for 51% of students in this cluster who transferred from MC and 60% of students assigned to this cluster who transferred from PGCC. Further, 100% of students assigned to this cluster, from both MC and PGCC, had read zero conference notes.

The remaining two clusters included students who engaged in the WebTycho online classroom to a more limited extent. In the "moderate engagement" or "just showing up" cluster were students who entered the online class to a moderate extent, but did not exhibit high engagement in any other course-taking behaviors.
This moderate engagement cluster corresponded to Cluster 3 in the MC sample and to Cluster 1 in the PGCC sample: 21% of students at MC and 25% of students at PGCC were classified into this group. This cluster had a modest representation of students who entered the class a high number of times (MC: 69%; PGCC: 27%), but also low levels of students creating a high number of conference notes (MC: 19%; PGCC: 14%).

Finally, in the "low engagement" cluster were students who exhibited the same limited engagement pattern as the moderate engagement cluster, but to a lesser extent. This cluster corresponded to Cluster 2 in the MC sample (32%) and to Cluster 4 in the PGCC sample (24%). For example, substantial percentages of students in this cluster created zero response notes (MC: 36%; PGCC: 75%). However, no students in this cluster, from either MC or PGCC, had never entered the class, which distinguishes the low engagement cluster from the disengaged cluster.

In the next phase of analyses we used demographic profiles to predict students' WebTycho behavioral cluster membership. A total of 32 models were produced for each community college. Model accuracy and various algorithms and adjustments were compared. See Appendix G for technical details. Ultimately, the ten most predictive demographic variables for cluster membership were identified for students who transferred from each community college. The most accurate model for both Montgomery College and Prince George's Community College students was CHAID using Binned, Boosted, Weighted, and Oversampling techniques. All 23 independent variables were used in the model, but none had high importance in making the predictions.
Predicting Student Success. Predictive models were run with the 23 independent variables predicting students' success. Again, 32 models were constructed for each community college. Model accuracy information is presented in Appendix H. Again, the most accurate model for both Montgomery College and Prince George's Community College students was CHAID using Binned, Boosted, and Oversampling techniques. All of the independent variables were used in each of the models, but none had a high importance in making the predictions.

The top ten variables predicting WebTycho behavioral cluster membership and success at UMUC for students transferring from PGCC and from MC, and the importance of these variables, are listed in Table 15 below. Variables with extremely low importance are not listed in the table.

Table 15. Summary of Top Ten Predictors of Success at UMUC and WebTycho Cluster Membership (Total N = 1372)

Column headings: Variable; Importance in Predicting UMUC Success for MC Students; Importance in Predicting UMUC Success for PGCC Students; Importance in Predicting WebTycho Behavior for MC Students; Importance in Predicting WebTycho Behavior for PGCC Students

Age 0.06 0.14 0.10 0.13
Marital Status 0.22 0.09 0.23
Gender - 0.06
Race/Ethnicity - 0.06 0.11
Subject Code UMUC 0.09 0.04 0.07 0.03
Number of Courses 0.08
Credits Earned 0.05 0.04
Average Course Load - 0.06 0.09
Course Efficiency 0.08-0.07
Number of Terms Skipped Community College GPA Number of A Grades Earned at Community College Number of F Grades Earned at Community College 0.05-0.05 0.06 0.06 0.06 0.04 0.10-0.10 0.12 0.14
Number of Ws at Community College 0.05-0.03
Computer Course Taken 0.05 -
Math Course Taken 0.06
Honors Course Taken 0.03
Online Course Taken 0.13-0.05
Course Repeated - 0.11 0.05

A number of factors that had been identified as contributors to students' success in previous analyses were also among the top ten important variables in predictive models of WebTycho behavior and success at UMUC.
Specifically, for students from both MC and PGCC, age and marital status continued to be important demographic factors. The number of As and Fs students earned at community college and community college GPA further contributed to prediction accuracy.

While the accuracy of the models varied, the predictive models provided results that were better than chance. With five clusters to predict, a random guess of a particular student's cluster membership would be correct only 20% of the time. Guessing that every student will behave in a manner corresponding to the largest cluster would be correct 30% to 40% of the time. In contrast, the models we developed have predictive accuracy levels of 58%. Likewise, in terms of predicting course success, a random guess of a particular student's success as a dichotomous value (successful or not) would be correct 50% of the time. Alternatively, guessing that a random student would earn a passing grade would be correct 63% to 76% of the time. In contrast, the models we developed have predictive accuracy levels of 75% to 80%.

A key conclusion from these analyses is that data mining approaches do add value in determining which variables have the most effect in predicting students' performance and students' online classroom behaviors. The ultimate value will lie in being able to identify the students who are on a less successful trajectory while they are still in the community college and to apply interventions to enable them to be more successful.

There are two limitations associated with these analyses. First, further exploration is needed to justify the clustering solutions selected, through hierarchical clustering methods or a theoretical model. Second, the variety of algorithms and model-fit optimization techniques used may have improved the classification accuracy of particular models; however, these also limited the potential applicability or generalizability of these models to other data sets. Future analyses should validate the identified models using data mining techniques.
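The two reference points used above to judge model accuracy, a uniform random guess and always guessing the largest class, can be computed directly from class counts. The sketch below uses the Montgomery College cluster sizes from Table 13 (which count course enrollments); the function names are hypothetical.

```python
# Illustrative sketch: function names are hypothetical.
# Cluster sizes are the Montgomery College enrollment counts from Table 13.

def chance_accuracy(n_classes: int) -> float:
    """Expected accuracy of a uniform random guess among n_classes."""
    return 1.0 / n_classes


def majority_baseline(class_counts: dict) -> float:
    """Accuracy obtained by always predicting the largest class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


mc_cluster_sizes = {1: 319, 2: 830, 3: 541, 4: 528, 5: 361}

random_guess = chance_accuracy(len(mc_cluster_sizes))   # 0.20 for five clusters
largest_cluster = majority_baseline(mc_cluster_sizes)   # 830 / 2579, about 0.32

print(f"Random guess: {random_guess:.0%}; largest-cluster guess: {largest_cluster:.0%}")
```

A model adds predictive value only to the extent that its test-set accuracy exceeds the larger of these two baselines.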
Section 8: Summary of Results

Based on the comprehensive analyses described above, which used both predictive models and data mining techniques to understand predictors of student success at UMUC, a number of conclusions may be drawn. These findings emerge from looking across studies and across student sub-populations and through the use of varied statistical methods.

1) Student success. Overall, students transferring from MC and PGCC are successful at UMUC. Indeed, 60% of transfer students were classified as Stars, indicating that they earned a GPA of 2.0 or above in the first term at UMUC and re-enrolled in a subsequent term. Data indicate that earning high grades at the community college was an indicator of successful performance at UMUC.

2) Demographics. In various analyses, students' age and marital status were repeatedly found to be predictors of success at UMUC. Older, married students tended to earn higher GPAs and be retained. These findings may be indicative of students' greater maturity or dedication to their educational goals. At the same time, minority status (i.e., African American or Hispanic) was associated with lower performance at UMUC. More investigation is needed to determine how best to reach these underserved populations and improve their success.

3) Community College Courses. Course efficiency in community college, the ratio of credits earned to credits attempted, was determined to be a predictor of success at UMUC. The higher the ratio, the more likely the student will succeed. Similarly, students who took math or honors courses were more likely to succeed. These results point to the importance of considering not only quantitative measures of students' course work (e.g., course load) but also qualitative aspects of students' work (e.g., honors and math).

4) Online Classroom Behavior. Patterns in students' behaviors in the online classroom have some value in predicting success. In the analysis of online classroom data, students varied greatly in the extent to which they engaged in course content and course-related activities, with a substantial percentage of students not accessing the course or materials at all. Results from this research have indicated that online classroom activity is tied to course success. Though demographic factors and factors in students' community college course-taking backgrounds were predictive of success at UMUC and of students' behaviors in the online classroom, more robust data are needed to more fully understand the relationship between academic behaviors and student success.

5) Success Profiles. Data mining revealed four student success profiles: Stars, Strivers, Slackers, and Splitters. These profiles provided a useful framework for understanding students at UMUC and introduced a new outcome measure that combined performance (first-term GPA) and retention (retention within a 12-month window). Factors from the students' academic profile at the community college, such as course-taking behavior, course load, and change in GPA between the community college and UMUC, were predictive of which success profile the student would fall within. These results suggest that student preparedness, particularly in specific areas (e.g., accounting, economics), is important in attaining success at UMUC. More exploration of additional outcome variables, such as reenrollment and graduation, is planned for Phase 3 of this project.

6) Change in GPA. A new factor, the change between the community college GPA and the first-term GPA at UMUC, was introduced in these analyses. Many students experienced a decrease in GPA when transferring to UMUC; however, the magnitude of this decrease has predictive value in determining whether or not students are retained at UMUC. More research is needed to better understand the tradeoff between the difficulty of course work and a higher GPA to help determine what strategies community colleges may employ to better prepare students for their academic transition.

7) Transitional Period. Transferring from community college to a four-year institution is a particularly challenging transition for students. For one, students' GPAs tend to suffer during the first semester at the four-year institution. The magnitude of the change in GPA seems to have an effect on students' retention, differentiating the Strivers and Slackers. For another, indicators of students' preparation, such as course efficiency and subject areas, were predictors of success at UMUC. This suggests that students need to prepare for the rigor of UMUC course work. Finally, the number of credits students earned prior to transfer may serve as an indicator of students' preparedness to pursue their study at UMUC.
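The derived measures named in this summary (course efficiency, change in GPA, and the success profiles) can be written down compactly. The sketch below rests on stated assumptions: this section defines only Stars (first-term GPA of 2.0 or above and retained), so the mapping of the other three profiles to the remaining GPA/retention combinations is an assumption, as is the sign convention for the GPA change (UMUC minus community college). All function names are hypothetical.

```python
def course_efficiency(credits_earned: float, credits_attempted: float) -> float:
    """Ratio of credits earned to credits attempted at the community college."""
    return credits_earned / credits_attempted


def delta_gpa(cc_gpa: float, umuc_first_term_gpa: float) -> float:
    """Change in GPA after transfer.

    ASSUMPTION: computed as UMUC first-term GPA minus community college GPA,
    so a negative value means the GPA dropped after transfer.
    """
    return umuc_first_term_gpa - cc_gpa


def success_profile(umuc_first_term_gpa: float, retained: bool) -> str:
    """Combine performance and retention into one of four profiles.

    Stars follow the definition in the text (GPA >= 2.0 and retained).
    ASSUMPTION: the other three labels are assigned as shown below.
    """
    high_gpa = umuc_first_term_gpa >= 2.0
    if high_gpa and retained:
        return "Star"
    if not high_gpa and retained:
        return "Striver"    # assumed: low GPA, retained
    if high_gpa and not retained:
        return "Splitter"   # assumed: high GPA, not retained
    return "Slacker"        # assumed: low GPA, not retained


# Hypothetical student record (not project data).
print(course_efficiency(24, 30))
print(delta_gpa(cc_gpa=3.4, umuc_first_term_gpa=2.9))
print(success_profile(2.9, retained=True))
```

Under these assumptions the only difference between a Striver and a Slacker is retention, which is consistent with the observation above that the magnitude of the GPA change helps differentiate the two groups.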
Section 9: Research and Intervention Planning in Phase 3

Phase 3 of this project will include a data update, additional research, and the implementation of interventions both within the community colleges and at UMUC. A national convening and a number of long-term projects are also envisioned, as described below.

Data Update

UMUC will update the Kresge Data Mart with more recent community college data, along with new community college variables, including Accuplacer scores, a financial aid indicator, revised course grades and cumulative GPA, as well as indicators of online course experience. UMUC will provide more recent online course behavior for students at UMUC, as well as Accuplacer scores and a financial aid indicator. These new data open a number of potential avenues for further examination of student behavior at the community college and success at UMUC.

Research Plans

Several analytical studies are planned for the coming year:
- Examine community college variables to predict students' re-enrollment, retention, and completion.
- Expand upon the analyses of students' behaviors in the online classroom to predict reenrollment, retention, and completion at UMUC.
- Explore success in community college courses and determine its influence on success at UMUC.
- Explore student core competencies, particularly math ability, as a predictor of success at UMUC.
- Use data mining to explore community college course subject areas to determine if clusters emerge to predict success at UMUC.
- Study the successful completion of developmental math course sequences and its influence on student success in students' first college-level math course.

National Convening

Thanks to the generosity of the Kresge Foundation, UMUC received additional support to expand our efforts in the area of promoting student success by hosting a national convening on data mining and learner analytics.
This convening, being planned for the summer of 2014, will bring together individuals from universities, community colleges, national organizations, state and national departments of education, private businesses, and foundations with the goal of creating communities of practice around learning analytics and, specifically, around the design of interventions aimed at the critical issue of making sure the transfer process to bachelor's degrees is successful. This convening will promote the collaborative development of strategies and interventions by community colleges and four-year institutions.

Community College Student Interventions

During Phase 3 of the project, UMUC will work with Montgomery College and Prince George's County Community College to implement interventions aimed at improving the outcomes for students who are on trajectories that may not lead to later success, based on the predictive modeling described above. Four interventions are proposed and summarized in Appendix I. These interventions differ in the target populations and the type of support (academic, social, or course-specific). All are intended to generate positive learning behaviors. This comprehensive set of interventions will provide feedback on how best to help students transfer from community college to UMUC and become more successful.

New Student Checklist. This intervention is designed to promote a positive experience for students new to UMUC. Students will be provided with a checklist prescribing behaviors they should engage in prior to and during the beginning weeks of their first semester. Students will be selected from a set of courses taught. The courses will have at least two sections taught by the same faculty member in the same section. One section will receive a checklist, while the other will not. The checklist will contain information such as how to access the library, how to get information about financial aid, and how to contact an advisor. The use of the checklist will be expected to be part of the student's participation grade for the course; however, use of the checklist will not be evaluated. Outcome measures will include course completion and a student survey. Comparisons will be made between the treatment group and the control group to determine if the checklist had an impact on first-term success. The development of this intervention is underway, and a pilot is planned for the Spring or Summer of 2014.

Community College Specific Mentor. This intervention would identify students who have transferred to UMUC from MC or PGCC and have successfully completed their gateway courses. These students will be offered an opportunity to serve as a mentor to new incoming students from their community college; new incoming students will be paired with mentors who transferred from the same institution.
The goal for the mentoring relationship is to improve first-semester success for new transfer students from MC and PGCC and to build a socially integrated community. Effectiveness of this intervention will be assessed by comparing the first-term GPAs of students who received mentoring to those of students who did not receive a mentor. Reenrollment and retention will also be examined. A potential challenge may be in identifying an adequate number of mentors (i.e., students who transferred from community college, have successfully completed gateway courses, are still enrolled, and are interested in serving as mentors). Further, the specific nature of the mentoring process (i.e., the types of information and advice that mentors may be expected to provide) needs to be outlined, and mentors need to be trained. This intervention is currently being planned, and a pilot is scheduled for 2014.

The Predictive Analytics Reporting (PAR) Intervention. This intervention is planned in collaboration with the Gates Foundation. PAR has used a set of data from a cross-section of institutions across the country, including UMUC, to build models that predict student success in terms of course completion. PAR has identified specific high-risk courses that are targeted for intervention. UMUC has agreed to explore Accounting 220 because it is a gateway course with a low rate of successful completion when compared with other UMUC courses. ACCT 220 has an intervention that uses an interactive online tutoring program to improve course success. This intervention has been in place for a few semesters, and the data can be mined to examine the effects of this intervention on the success of students. Intervention effectiveness will be assessed through post-hoc analyses, by comparing the successful course completion rates of ACCT 220 students who used the online tutoring systems to course completion rates for students who did not. The tutoring intervention was offered to all students, and students self-selected into treatment. Thus, exogenous factors may influence why students chose to participate, threatening the validity and generalizability of the results. The data from this intervention are being collected. PAR researchers will conduct the analyses and report the results.

Coaching Undergraduates for Success and Persistence (CUSP). This intervention is designed to support students' development of a critical competency necessary for academic success: academic writing. In this program, students new to UMUC are administered a writing diagnostic test to determine whether they are in need of additional support for their academic writing. Based on the outcome of the diagnostic, students are then assigned a mentor; mentors and mentees are matched based on a variety of demographic factors. This program provides students with guidance on writing at the university level with support from the Effective Writing Center (EWC), the UMUC English department, and additional writing-related resources. This intervention was implemented in summer and fall 2013. Currently a total of 20 students have completed the program and reached the Exploring Your Writing milestone. The writing milestone requires submitting an end-of-semester writing sample to demonstrate improvement on the diagnostic assessment. While intervention implementation and data collection are ongoing, a number of assessments of intervention effectiveness are planned.
These include examining students' first-term GPA, successful course completions, and retention at UMUC as well as comparing students' performance on the diagnostic and end-of-term writing assessments.

Long Term Projects

Described below are potential interventions planned by the partner community colleges, MC and PGCC, for which UMUC will offer support.

Pearson Developmental Education Math Modules. Pearson publishing has recently modularized developmental education courses for both of the community college partners. The modular approach is designed to better meet students' developmental needs. Rather than being required to master the content of a full-term course, students can progressively complete smaller modules and focus on areas where they need the most help. This intervention is aimed at improving course success in the first college-level math course for students who started in developmental math education. As developmental math modules have only recently been implemented at both of the community colleges, the effectiveness of this intervention will be evaluated after students transfer to UMUC.

Girls to Women. Girls to Women is a project designed by Montgomery College to support students from high school to community college to a four-year university. The project is modeled after a similar project, Boys to Men, whereby African American male high school students have been mentored and advised to prepare them to navigate through college to obtain a four-year degree. The Girls to Women project will focus on African American women enrolled at Montgomery College. This intervention is expected to improve preparedness for college-level courses, success in college courses, and the completion of a college credential. While this intervention will be funded by the Kresge Foundation, it is a longitudinal study that both UMUC and Montgomery College are committed to completing and reporting on beyond the scope of the Kresge Research Grant. The results will help students whether or not they choose to transfer to UMUC. The students will be tracked to the four-year school of their choice.

Math Success. The math departments at each partner institution have begun to meet to discuss students' success in math courses and to align courses offered at the community college and at UMUC. While specific assessments of student measures associated with this initiative are not planned at this time, this work is intended to foster discussions between UMUC and partner institutions about the nature of students' math content learning and preparation. The research from this grant has shown that math is a significant predictor of success. This discussion provides an opportunity to explore possible activities that can improve student math success.
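Several of the evaluations described in this section (for example, the New Student Checklist pilot and the ACCT 220 tutoring analysis) come down to comparing completion rates between two groups. The report does not say which statistical test will be used; the sketch below assumes a pooled two-proportion z-test, and the counts are hypothetical rather than project data. Such a test does not address self-selection into treatment, which the text identifies as a threat to validity.

```python
import math


def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int):
    """Pooled two-proportion z-test (ASSUMED test; not specified in the report).

    Returns (difference in rates, z statistic, two-sided p-value).
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal CDF, written with erf.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return p_a - p_b, z, p_value


# Hypothetical counts: 80 of 100 completers in the treated section,
# 60 of 100 in the comparison section (NOT project data).
diff, z, p_value = two_proportion_z(80, 100, 60, 100)
print(f"difference={diff:.2f}, z={z:.2f}, p={p_value:.4f}")
```

A small p-value here would indicate only that the two completion rates differ by more than chance alone would suggest; it says nothing about why the groups differ.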
References

Aragon, S.R. & Johnson, E.S. (2008). Factors influencing completion and non-completion in online community college courses. American Journal of Online Education, 22(3), 146-158.

Baker, R.S. & Yacef, K. (2009). The state of educational data mining: A review and future visions. Journal of Educational Data Mining, 1(1), 3-17.

Bean, J.P. & Metzner, B.S. (1985). A conceptual model of nontraditional undergraduate student attrition. Review of Educational Research, 55(4), 485-540. JSTOR. Retrieved from http://www.jstor.org/stable/1170245

Boston, W., Diaz, S.R., Gibson, A.M., Ice, P., Richardson, J., & Swan, K. (2011). An exploration of the relationship between indicators of the community of inquiry framework and retention in online programs. Journal of Asynchronous Learning Networks, 13(3), 67-83.

Chen, P.S.D., Lambert, A.D. & Guidry, K.R. (2010). Engaging online learners: The impact of Web-based learning technology on college student engagement. Computers & Education, 54(4), 1222-1232. Elsevier. http://dx.doi.org/10.1016/j.compedu.2009.11.008

Dawson, S. (2010). Seeing the learning community: An exploration of the development of a resource for monitoring online student networking. British Journal of Educational Technology, 41(5), 736-752. Wiley Online Library. http://dx.doi.org/10.1111/j.1467-8535.2009.00970.x

Finnegan, C., Morris, L.V., & Lee, K. (2009). Differences by course discipline on student behavior, persistence, and achievement in online courses of undergraduate general education. Journal of College Student Retention, 10(1), 39-54.

Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. China Machine Press, 8, 3-6.

Herzog, S. (2006). Estimating student retention and degree completion time: Decision trees and neural networks vis a vis regression. New Directions for Institutional Research, 131, 17-33.

Ho Yu, C., DiGangi, S., Jannasch-Pennell, A., Lo, W., & Kaprolet, C. (2007, February). A data-mining approach to differentiate predictors of retention. Paper presented at the Educause Southwest Conference, Phoenix, AZ.

Kember, D. (1989). A longitudinal-process model of drop-out from distance education. The Journal of Higher Education, 278-301. JSTOR. Retrieved from http://www.jstor.org/stable/198225

Luan, J. & Zhao, C.-M. (2006). Data mining: Going beyond traditional statistics. New Directions for Institutional Research, 131, 7-16.

Luan, J. (2001). Data mining as driven by knowledge management and higher education: Persistence clustering and prediction. Paper presented at the SPSS Public Conference, University of California at San Francisco.

Luan, J. (2002). Data mining and its applications in higher education. New Directions for Institutional Research, 113, 7-36.

Moore, J.C. & Fetzner, M.J. (2009). The road to retention: A closer look at institutions that achieve high course completion rates. Journal of Asynchronous Learning Networks, 13(3), 3-22. Retrieved from http://sloanconsortium.org/jaln/v13n3/road-retention-closer-look-institutions-achieve-high-course-completion-rates

Morris, L.V. & Finnegan, C.L. (2009). Best practices in predicting and encouraging student persistence and achievement online. Journal of College Student Retention, 10(1), 5-34.

Nistor, N., & Neubauer, K. (2010). From participation to dropout: Quantitative participation patterns in online university courses. Computers in Education, 55, 663-672.

Park, J.-H., & Choi, H.-J. (2009). Factors influencing adult learners' decision to drop out or persist in online learning. Educational Technology and Society, 12(4), 207-217.

Rovai, A.P. (2002). Building sense of community at a distance. The International Review of Research in Open and Distance Learning, 3(1). Retrieved from http://www.irrodl.org/index.php/irrodl/article/view/79/152

Romero, C., Ventura, S., Espejo, P.G. & Hervas, C. (2008, June). Data mining algorithms to classify students. Paper presented at the 1st International Conference on Educational Data Mining, Montreal, Canada.

Tinto, V. (1993). Leaving college: Rethinking the causes and cures of student attrition (2nd ed.). Chicago: University of Chicago Press.

Willging, P.A. & Johnson, S.D. (2009). Factors influencing adult student decisions to dropout of online courses. Journal of Asynchronous Learning Networks, 13(3), 115-127.
Appendix A. Budget and Financial Statement

| Item | Total Budget | Expenses as of 10/31/13 | Match Expenses as of 10/31/13 | Remaining Budget |
|---|---|---|---|---|
| Personnel | 780,000.00 | 331,289.51 | 200,000.00 | 248,710.49 |
| Fringe Benefits | 206,700.00 | 47,220.55 | 90,100.00 | 69,379.45 |
| Insurance | - | - | - | - |
| Travel | 10,500.00 | 8,174.45 | - | 1,624.68 |
| Food Services (Hyatt Hotel) | | 700.87 | | |
| Equipment | 250,000.00 | 200,000.00 | 50,000.00 | - |
| Supplies | 50,000.00 | | 50,000.00 | - |
| Printing and Copying | 14,400.00 | 1,154.41 | - | 11,069.59 |
| Conference/Seminar | | 2,176.00 | | |
| Telephone and Fax | - | - | - | - |
| Postage and Delivery | - | - | - | - |
| Indirect Costs | 78,000.00 | 34,749.28 | 20,000.00 | 23,250.72 |
| Other Contractual Srvcs/Evaluation | 475,000.00 | 108,819.90 | 70,000.00 | 296,180.10 |
| Total Costs | 1,864,600.00 | 734,284.97 | 480,100.00 | 650,215.03 |
Financial report

University of Maryland University College
Report ID: G271008
Report Title: 271008 Project-Grant report
Business Unit/Name: UCUSA UMUC Stateside
Scope name: G_271008
Project: 271008 The Kresge Foundation Grant
Layout ID: 204_
Start/End dates: 11/01/2010 - 12/31/2014
Version: 12.1.01
Manager: Miyares,Javier
Run: November 26, 2013 at 09:55
As of: 11/01/2013 (2013-11-01)
Trees as of: 11/01/2013 (2013-11-01)

| Account | Description | Current Month Actual | YTD Actual | Inception To Date Actual | Total Budget | Remaining Budget |
|---|---|---|---|---|---|---|
| Revenue | | | | | | |
| | Tuition and fees | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 434000 | Cont Busn-Industry | (50,000.00) | 495,092.49 | 1,150,000.00 | 1,200,000.00 | 50,000.00 |
| | Gifts, Grants and Contracts | (50,000.00) | 495,092.49 | 1,150,000.00 | 1,200,000.00 | 50,000.00 |
| | Sales and service educational | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Sales and service auxiliary | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Miscellaneous income | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Total Net Revenues | (50,000.00) | 495,092.49 | 1,150,000.00 | 1,200,000.00 | 50,000.00 |
| Expenses | | | | | | |
| 600000 | Exempt Regular Salary | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 601030 | Exempt Contingent Salary | 4,850.00 | 41,251.78 | 66,075.00 | 106,430.94 | 40,355.94 |
| 601070 | Contractual Employee | 0.00 | 0.00 | 265,214.51 | 265,214.51 | 0.00 |
| | Salaries | 4,850.00 | 41,251.78 | 331,289.51 | 371,645.45 | 40,355.94 |
| 610000 | Tiaa Optional Ret | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 610010 | Health Insurance | 0.00 | 0.00 | 6,372.00 | 6,372.00 | 0.00 |
| 610040 | Employees Retirement | 0.00 | 0.00 | 14,681.24 | 14,681.24 | 0.00 |
| 610050 | Social Security Fica | 371.03 | 3,155.76 | 25,149.53 | 26,763.54 | 1,614.01 |
| 610170 | Unemploy Ins-Empl | 13.58 | 115.50 | 1,017.78 | 2,137.77 | 1,119.99 |
| 610190 | Workers' Comp-Empl | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Fringe benefits | 384.61 | 3,271.26 | 47,220.55 | 49,954.55 | 2,734.00 |
| | Sub Total Payroll | 5,234.61 | 44,523.04 | 378,510.06 | 421,600.00 | 43,089.94 |
| 821000 | In-State Travel | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 821010 | Out-Of-State Travel | 2,221.96 | 5,617.19 | 8,174.45 | 11,150.37 | 2,975.92 |
| 825040 | Printing/Reproduction | 0.00 | 0.00 | 0.00 | 12,184.59 | 12,184.59 |
| 825140 | Food Services | (2,221.96) | 0.00 | 700.87 | 700.87 | 0.00 |
| 825150 | Conference Services | 1,115.00 | 1,115.00 | 1,115.00 | 5,000.00 | 3,885.00 |
| 825170 | Other Contract Srvcs | 0.00 | 24,482.05 | 108,819.90 | 453,148.76 | 344,328.86 |
| 826030 | Office Supplies | 0.00 | 0.00 | 39.41 | 39.41 | 0.00 |
| 826050 | Conference Materials | 0.00 | 0.00 | 0.00 | (0.00) | (0.00) |
| 826400 | Capital Equipment | 0.00 | 0.00 | 200,000.00 | 200,000.00 | 0.00 |
| 830510 | Conference/Seminar | (1,115.00) | 0.00 | 2,176.00 | 2,176.00 | 0.00 |
| 886020 | Indirect Cost Exp | 0.00 | 3,640.18 | 34,749.28 | 44,000.00 | 9,250.72 |
| | Operating expense | 0.00 | 34,854.42 | 355,774.91 | 728,400.00 | 372,625.09 |
| | Total Expenses | 5,234.61 | 79,377.46 | 734,284.97 | 1,150,000.00 | 415,715.03 |
| | Net | (55,234.61) | 415,715.03 | 415,715.03 | 50,000.00 | (365,715.03) |
Appendix B. Summary Information for Models Considered in Data Mining Analyses

| Model No. | Description | Training subset (N) | Validation subset (N) | Predictor variable(s) | Response variable | Model/Algorithm | Overall accuracy | Accuracy improvement (lift) | False positive rate | False negative rate | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Effects of Delta GPA on Retention | 11340 | 3842 | Delta GPA | Retention | Neural Net | 78.7% | 0.0% | 77.6% | 6.1% | |
| 2 | Effects of Student Demographics and community college on UMUC Success | 11859 | 4031 | Age_Years; Race/Ethnicity; Gender; Marital Status; From_PG_and_MC; From_PG; From_MC | UMUC GPA ≥ 2.0 | Logistic regression | 69.0% | 0.2% | 97.5% | 0.9% | |
| 2 | (same as above) | | | | | Boosted tree | 69.1% | 0.4% | 94.2% | 2.2% | Most Accurate |
| 3 | Effects of MC Student Demographics on UMUC Success | 5262 | 1832 | Age_Years; Race/Ethnicity; Gender; Marital Status | UMUC GPA ≥ 2.0 | Boosted tree | 64.6% | 2.0% | 92.1% | 2.6% | |
| 4 | Effects of PGCC Student Demographics on UMUC Success | 4852 | 1686 | Age_Years; Race/Ethnicity; Gender; Marital Status | UMUC GPA ≥ 2.0 | Boosted tree | 65.2% | 1.9% | 92.4% | 2.41% | |
| 5 | CC GPA | 11899 | 3991 | Community College GPA | UMUC GPA ≥ 2.0 | Neural Net | 69.2% | 1.84% | 81.5% | 6.9% | Neural net probably overfit |
| 6 | CC GPA (above 2.0) | 11899 | 3991 | Community College GPA > 0 | UMUC GPA ≥ 2.0 | All Models | 67.9% | 0% | 100% | 0% | Predicts all are successful |
| 7 | Course Success at CC | 7604 | 3745 | %As, %Bs, %C's or %As, %Bs, C's | UMUC GPA ≥ 2.0 | Bootstrap Forest | 71.1% | 4.72% | 73.4% | 7.9% | |
| 8 | Community College Course Load | 10956 | 3636 | Community College Course Load (average number of credit hours per term) | UMUC GPA ≥ 2.0 | Logistic regression | 67.3% | 0% | 100% | 0% | |
| 8 | (same as above) | | | | | Boosted tree | 67.4% | 0.1% | 99.2% | 0.3% | |
| 9 | Number of CC credits (binned) | 11340 | 3842 | Binned Credit | UMUC GPA ≥ 2.0 | Boosted tree | 73.6% | 1.9% | 77.9% | 6.6% | |
| 10 | Effects of Courses Taken and % F and % W Grades at CC predictive of UMUC Success | 11340 | 3842 | # of CC Courses; % F grades; % W grades | UMUC GPA ≥ 2.0 | Neural Net | 71.3% | 4.8% | 70.3% | 9.2% | Neural net probably overfit |
| 10 | (same as above) | | | | | Logistic regression | 71.3% | 4.7% | 75.8% | 6.6% | |
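The performance indicators in Appendix B can all be derived from a confusion matrix. The sketch below shows one way to compute them and why a model that predicts every student to be successful (as in Model 6) has a 100% false positive rate, a 0% false negative rate, and no accuracy improvement. The report does not define "lift"; here it is assumed to be the model's accuracy minus the accuracy of always guessing the larger class. The function name and the counts are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy-style indicators from boolean labels (True = successful)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # successful, predicted successful
    tn = sum(1 for t, p in pairs if not t and not p)  # unsuccessful, predicted unsuccessful
    fp = sum(1 for t, p in pairs if not t and p)      # unsuccessful, predicted successful
    fn = sum(1 for t, p in pairs if t and not p)      # successful, predicted unsuccessful
    n = len(pairs)
    accuracy = (tp + tn) / n
    # ASSUMPTION: "lift" = accuracy minus the majority-class baseline.
    baseline = max(tp + fn, tn + fp) / n
    return {
        "accuracy": accuracy,
        "lift": accuracy - baseline,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical validation set: 679 successful and 321 unsuccessful students,
# scored by a model that predicts "successful" for everyone.
y_true = [True] * 679 + [False] * 321
y_pred = [True] * 1000
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Read this way, the high false positive rates throughout the table indicate that most models rarely predict a student to be unsuccessful, so their overall accuracy stays close to the share of successful students.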
Appendix C. Top 50 Courses at MC & PGCC Grouped by Subject Area (listed by number of enrollments within subject)

| Subject area | Montgomery College | Enrollments | Prince George's Community College | Enrollments |
|---|---|---|---|---|
| Accounting | AC 201 Accounting I | 3,485 | ACC 101 Principles of Accounting I | 1,589 |
| | AC 202 Accounting II | 2,206 | ACC 102 Principles of Accounting II | 942 |
| Business/Management | BA 101 Intro to Business | 2,495 | MGT 101 Introduction to Business | 2,102 |
| | BA 210 Statistics for Bus Admin | 1,521 | BUS 122 Business Law I | 984 |
| | MG 201 Business Law I | 1,261 | MGT 160 Principles of Management | 715 |
| | MG 101 Princ Management | 832 | MGT 162 Financial Planning & Investmnt | 664 |
| | | | MGT 272 Managing Workplace Diversity | 605 |
| | | | MGT 261 Human Resource Management | 560 |
| Biology | BI 101 General Biology | 2,272 | BIO 101 General Biology I | 2,559 |
| | BI 107 Prin of Biology I | 1,167 | BIO 205 Human Anatomy & Physiology I | 748 |
| | BI 105 Environmental Biology | 814 | | |
| Nutrition | FM 103 Intro to Nutrition | 1,610 | NTR 101 Introductory Nutrition | 1,368 |
| | | | BIO 115 Basic Nutrition | 834 |
| Chemistry | CH 101 Princ of College Chem I | 1,048 | CHM 101 General Chemistry I | 706 |
| | CH 100 Intro College Chem-A | 751 | | |
| Computer Science | CA 120 Intro to Computer Applics | 2,724 | CIS 101 Computer Literacy | 3,724 |
| | CS 110 Computer Concepts | 985 | CIS 111 Computer Programming I | 701 |
| | CS 140 Intro to Programming | 905 | | |
| | CA 106 Computer Use and Mgmt | 843 | | |
| Criminal Justice | CJ 110 Admin of Justice | 958 | CJT 251 Criminal Law | 578 |
| | | | CJT 151 Intro to Criminal Justice | 559 |
| | | | CJT 254 Criminal Evidence/Procedure | 519 |
| | | | CJT 153 Law Enforcement/Community | 475 |
| Economics | EC 201 Prin of Economics I | 3,126 | ECN 103 Principles of Macroeconomics | 1,595 |
| | EC 202 Prin of Economics II | 2,109 | ECN 104 Principles of Economics II | 795 |
| English (Developmental) | RD 99 College Reading Skills II | 864 | EGL 100 Introduction to Composition | 1,215 |
| | EN 2 Basic English II | 699 | DVR 6 College Reading & Study Skills | 1,106 |
| | EN 1 Basic English I | 665 | DVE 1 Developmental Composition | 694 |
| English Composition | EN 101 Tech of Rdng & Wrtg I | 5,288 | EGL 101 Comp I: Expository Writing | 4,425 |
| | EN 102 Tech of Rdng & Wrtg II | 4,655 | EGL 102 Comp II: Intro to Literature | 2,851 |
| | EN 109 Wrtg/Technology & Business | 1,222 | EGL 132 Comp II: Writing for Business | 698 |
| ESOL | EL 104 American English Lang IV | 1,018 | | |
| | RD 103 Read/Non-Native Speakers III | 843 | | |
| | EL 103 American Eng Language III | 793 | | |
| Health | HE 100 Prin Health Living | 2,710 | HLE 115 Personal and Community Health | 1,383 |
| | HE 101 Pers & Comm Health | 642 | | |
| History | HS 201 US Hist Colonial-1865 | 964 | HST 141 History of U.S. I | 1,124 |
| | HS 202 US Hist 1865-Pres | 962 | HST 245 African-American History | 857 |
| Math (Developmental) | MA 100 Intermediate Algebra (D) | 3,362 | MAT 104 Intermediate Algebra (D) | 2,158 |
| | MA 91 Elementary Algebra (D) | 2,366 | DVM 7 Introductory Algebra (D) | 1,834 |
| | MA 110 Survey of College Math | 1,824 | DVM 3 Prealgebra (D) | 1,647 |
| | MA 90 Prealgebra (D) | 1,175 | MAT 112 Mathematics General Education | 1,486 |
| | MA 103 Intermediate Algebra (D) | 961 | MAT 135 College Algebra | 1,403 |
| | | | CAP 103 Math Confidence Bldg | 986 |
| | | | DVM 5 Developmental Math (D) | 573 |
| Math: Statistics | MA 116 Elements of Statistics | 2,262 | MAT 114 Intro to Statistics | 807 |
| | | | MAT 221 Statistics | 473 |
| Math: Calculus | MA 160 Elem Applied Calculus I | 1,499 | | |
| | MA 181 Calculus I | 1,150 | | |
| Philosophy | PL 201 Intro to Philosophy | 867 | PHL 101 Intro Phil: Art of Questioning | 1,667 |
| Political Science | PS 101 American Government | 895 | POS 101 American National Government | 702 |
| Psychology | PY 102 General Psychology | 3,778 | PSY 101 General Psychology | 3,349 |
| | | | PSY 207 Human Growth and Development | 611 |
| Sociology | SO 101 Intro Sociology | 3,210 | SOC 101 Intro to Sociology | 2,177 |
| Speech | SP 108 Intro to Human Communication | 4,116 | SPH 101 Intro to Speech Communication | 3,124 |
| | | | SPH 109 Interpersonal Communication | 932 |
| Miscellaneous | SN 101 Elem Spanish I | 1,449 | PED 103 Circuit Weight Train/Aerobics | 1,567 |
| | MA 180 Precalculus | 1,408 | ART 101 Introduction to Art | 843 |
| | AR 101 Introduction to Drawing | 907 | SPN 101 Spanish for Beginners | 551 |
| | AN 101 Intro Soc/Cul Anthro | 739 | PSC 101 Intro to Astronomy | 485 |
| | AR 127 Art Appreciation | 651 | | |
Appendix D. Summary Information for Models Considered in Predictive Analyses

| Purpose | Sample | Predictor variable(s) | Response variable | Analyses | Model Descriptors |
|---|---|---|---|---|---|
| Describe students' demographic and CC course taking backgrounds | UMUC students transferring from CC, Spring 2011 - Summer 2012; MC: N = 806; PGCC: N = 566 | Age, Gender, Marital Status; Course Classification: Honors, Online; GPA; More than 1 A; More than 1 F; More than 1 W; More than 1 Course Repeat; Course Load, Course Efficiency; No Stop-out between Institutions | Cluster Membership (5 Cluster solution) | K-means cluster analyses | N/A |
| Examine transfer students' performance at UMUC | MC: N = 7970; PGCC: N = 4971 | N/A | First Term GPA @ UMUC | Descriptive | N/A |
| Determine whether demographic characteristics and CC course taking backgrounds predicted success at UMUC, both overall and in specific courses | N = 2771 students transferring to UMUC, Spring 2005 - Spring 2012 | Age, Ethnicity; Number of As, Bs, Cs, Ds, Fs, Ws earned at CC; More than 1 course(s) in English, Math, Speech, Computers; More than 1 courses classified as Honors, Remedial; More than 1 courses repeated | UMUC First-Term Success (i.e., GPA ≥ 2.0); UMUC Course Success (grade A, B, or C) in: a. Gateway Course, b. Written Communication, c. General Ed Math | Logistic Regression | R² variance explained range: 0.115 - 0.221; Prediction Accuracy Range: 79.2% - 73.1% |
| Determine whether demographic characteristics and CC course taking backgrounds, and course efficiency predicted success at UMUC | N = 9063 students who transferred to UMUC from CC, Spring 2005 - Spring 2012 | Age, Gender, Marital Status, Ethnicity; ≥ 1 course(s) in English, Math, Speech, Computers; ≥ 1 courses classified as Honors, Remedial, Online; Course Efficiency | UMUC First-Term Success (i.e., GPA ≥ 2.0) | Logistic Regression | Prediction Accuracy: 76.4% |
| Examine whether students' demographic characteristics and CC course taking behaviors were predictive of success at UMUC and membership in online classroom | UMUC students transferring from CC, Spring 2011 - Summer 2012; MC: N = 806; PGCC: N = 566 | 23 Independent Variables examined; Top 10 predictors for each analysis included: Age and Marital Status; Count of As, Fs, and Ws; CC GPA; UMUC Subject Area | Membership in online classroom behavioral cluster; UMUC First-Term Success (i.e., GPA ≥ 2.0) | K-Means Cluster analyses to determine clusters of WebTycho behaviors (5 Cluster Solution); Predictive modeling using CHAID algorithm w/binned, Boosted, Weighted, and Oversampling Techniques | Cluster analysis: Random Guessing: 20% Correct Classification; Model: 58% Correct Classification. UMUC Success: Random Guessing: 50%; Model: 75-80% Correct Classification |
Page 61
Appendix E. List of which courses were included in analyses and the number of students enrolled in each course.

General Education Courses

Course      Number of Students
ACCT 220    441
BMGT 110    520
CCJS 100    250
CMIS 102    478
EDCP 100    645
EDCP 103    120
GVPT 170    232
HIST 157    406
IFSM 201    2056
LIBS 150    6246
MATH 012    265
MATH 009    319
PSYC 100    351
WRTG 101    359

Writing Communication Courses

Course      Number of Students
COMM 390    94
COMM 393    70
COMM 394    116
ENGL 101    39
ENGL 291    60
ENGL 294    76
ENGL 303    117
ENGL 384    4
ENGL 391    46
ENGL 485    12
WRTG 101    331
WRTG 288    40
WRTG 289    9
WRTG 291    149
WRTG 293    49
WRTG 388    22
WRTG 390    84
Page 62

Course      Number of Students
WRTG 391    454
WRTG 393    340
WRTG 394    659

General Education Math Courses

Course      Number of Students
MATH 106    105
MATH 107    307
MATH 115    34
Page 63
Appendix F. Independent Variables Considered and their Definitions

Independent Variable: Description

MARRIED_STATUS_CD: a single letter code, 6 different values (D, E, M, S, U, and *)
GENDER_CD: a single letter code, 3 different values (F, M, and *)
AGE_YEARS: number of years calculated from Date of Birth to January 2013
RACE_LD: full word description, 6 different values
NUM_OF_TERMS_SKIPPED: the number of terms between the UMUC cohort semester and the student's most recent previous institution semester; this number was based on transcript data stored in the KDM, not the coursework files received directly from the partner institutions
ENGLISH_YN: Y or N flag indicating whether or not the student took an English course at either MC or PGCC
MATH_YN: Y or N flag indicating whether or not the student took a mathematics course at either MC or PGCC
SPEECH_YN: Y or N flag indicating whether or not the student took a speech course at either MC or PGCC
COMPUTER_YN: Y or N flag indicating whether or not the student took a computer science or related course at either MC or PGCC
HONORS_YN: Y or N flag indicating whether or not the student took an honors course at either MC or PGCC
REMEDIAL_YN: Y or N flag indicating whether or not the student took a remedial course at either MC or PGCC
REPEATED_YN: Y or N flag indicating whether or not the student repeated a course at either MC or PGCC
ONLINE_YN: Y or N flag indicating whether or not the student took an online course at either MC or PGCC
COUSE_LOAD: total number of hours billed by MC and PGCC divided by the number of terms the student was active at MC and PGCC
COURSE_EFFICIENCY: total number of hours earned at MC and PGCC divided by the number of terms the student was active at MC and PGCC
GRADE_POINT_AVERAGE: for each course at MC and PGCC, the number of credits attempted was multiplied by the grade; the sum of these products was then divided by the total number of credits attempted at MC and PGCC
CNT_GRADE_A: the number of times the student earned the grade of A at MC and PGCC
CNT_GRADE_F: the number of times the student earned the grade of F at MC and PGCC
CNT_GRADE_W: the number of times the student earned the grade of W at MC and PGCC
CNT_COURSES: the number of courses the student attempted at MC and PGCC
CREDITS_REG_ATT: the number of college level credits the student attempted at MC and PGCC
CREDITS_REG_ERND: the number of college level attempted credits the student passed at MC and PGCC
CREDITS_FOR_GPA: the number of college level attempted credits the student took at MC and PGCC
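The ratio-style variables defined above (course load, course efficiency, and grade point average) can be illustrated with a small Python sketch. Everything beyond the definitions themselves is an assumption made for illustration: the `CourseRecord` layout, the 4.0 grade-point scale, and the exclusion of W grades from the GPA are not specified in this appendix, and the sample records are invented.

```python
from dataclasses import dataclass

# Assumed 4.0 grade-point scale; W is assumed to carry no grade points and to
# be left out of the GPA. Neither detail is stated in the appendix.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


@dataclass
class CourseRecord:  # hypothetical record layout
    term: str
    credits_attempted: float  # hours billed
    credits_earned: float
    grade: str


def derive_variables(records):
    """Compute a few Appendix F style variables for one student."""
    if not records:
        raise ValueError("at least one course record is required")
    terms_active = len({r.term for r in records})
    graded = [r for r in records if r.grade in GRADE_POINTS]
    gpa_credits = sum(r.credits_attempted for r in graded)
    quality_points = sum(r.credits_attempted * GRADE_POINTS[r.grade] for r in graded)
    return {
        # Key spelled as in the variable list above.
        "COUSE_LOAD": sum(r.credits_attempted for r in records) / terms_active,
        "COURSE_EFFICIENCY": sum(r.credits_earned for r in records) / terms_active,
        "GRADE_POINT_AVERAGE": quality_points / gpa_credits if gpa_credits else None,
        "CNT_GRADE_A": sum(r.grade == "A" for r in records),
        "CNT_GRADE_F": sum(r.grade == "F" for r in records),
        "CNT_GRADE_W": sum(r.grade == "W" for r in records),
        "CNT_COURSES": len(records),
    }


# Invented example: two terms, four three-credit courses.
example = [
    CourseRecord("2011FA", 3, 3, "A"),
    CourseRecord("2011FA", 3, 0, "F"),
    CourseRecord("2012SP", 3, 3, "B"),
    CourseRecord("2012SP", 3, 0, "W"),
]
result = derive_variables(example)
print(result)
```

In this example the student is billed for 12 hours over two terms (a course load of 6) but earns only 6 (a course efficiency of 3), which shows how the two variables separate enrollment intensity from completed work.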
Page 64
Appendix G. Accuracy of Models Predicting Clusters of WebTycho Behaviors

Accuracy of Montgomery College Predictive Models
Model accuracy, by adjustment (rows) and algorithm (columns)

Adjustment                                    CRT    CHAID  QUEST  C5.0
No Adjustments                                53%    53%    53%    39%
Oversampled                                   34%    36%    38%    46%
Binned                                        53%    53%    53%    39%
Binned and Oversampled                        34%    34%    38%    57%
Binned and Boosted                            39%    42%    53%    53%
Binned, Boosted, and Oversampled              46%    58%    30%    59%
Binned, Boosted, and Weighted                 39%    42%    53%    53%
Binned, Boosted, Weighted, and Oversampled    48%    58%    36%    58%

Table 2. Accuracy of Prince George's Community College Predictive Models
Model accuracy, by adjustment (rows) and algorithm (columns)

Adjustment                                    CRT    CHAID  QUEST  C5.0
No Adjustments                                11%    48%    11%    11%
Oversampled                                   14%    38%    32%    15%
Binned                                        11%    50%    11%    11%
Binned and Oversampled                        36%    36%    33%    19%
Binned and Boosted                            35%    39%    52%    11%
Binned, Boosted, and Oversampled              45%    58%    30%    16%
Binned, Boosted, and Weighted                 35%    39%    52%    11%
Binned, Boosted, Weighted, and Oversampled    41%    58%    36%    21%
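The report does not describe how the "Oversampled" adjustment was carried out. One common form is random oversampling, in which rows from the smaller classes are duplicated at random until every class is as large as the largest one. The Python sketch below illustrates that general technique under this assumption; the function name, the row layout, and the sample data are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, label_of, seed=0):
    """Duplicate rows of smaller classes (with replacement) until classes are balanced."""
    if not rows:
        return []
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw the shortfall at random from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Invented example: 8 students in one cluster, 2 in another.
students = [(f"s{i}", "cluster_1") for i in range(8)] + [
    ("s8", "cluster_2"),
    ("s9", "cluster_2"),
]
balanced = random_oversample(students, label_of=lambda row: row[1])
print(Counter(label for _, label in balanced))
```

Oversampling of this kind is normally applied only to the data used to fit a model, so that accuracy is still measured against the original class proportions.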
Page 65
Appendix H. Accuracy of Models Predicting Success at UMUC

Accuracy of Montgomery College Predictive Models
Model accuracy, by adjustment (rows) and algorithm (columns)

Adjustment                                    CRT    CHAID  QUEST  C5.0
No Adjustments                                62%    65%    67%    63%
Oversampled                                   65%    72%    65%    69%
Binned                                        66%    65%    66%    67%
Binned and Oversampled                        68%    69%    68%    72%
Binned and Boosted                            63%    64%    66%    66%
Binned, Boosted, and Oversampled              75%    75%    71%    70%
Binned, Boosted, and Weighted                 63%    64%    66%    66%
Binned, Boosted, Weighted, and Oversampled    74%    76%    68%    73%

Accuracy of Prince George's Community College Predictive Models
Model accuracy, by adjustment (rows) and algorithm (columns)

Adjustment                                    CRT    CHAID  QUEST  C5.0
No Adjustments                                56%    67%    65%    11%
Oversampled                                   23%    67%    37%    17%
Binned                                        11%    64%    11%    11%
Binned and Oversampled                        70%    70%    31%    19%
Binned and Boosted                            62%    58%    65%    11%
Binned, Boosted, and Oversampled              75%    80%    67%    13%
Binned, Boosted, and Weighted                 62%    58%    65%    11%
Binned, Boosted, Weighted, and Oversampled    79%    73%    67%    14%
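As a reading aid for Appendix H, the short Python sketch below transcribes the Montgomery College accuracies and reports, for each algorithm, the adjustment with the highest accuracy. The variable and function names are illustrative; only the percentages come from the table.

```python
ADJUSTMENTS = [
    "No Adjustments",
    "Oversampled",
    "Binned",
    "Binned and Oversampled",
    "Binned and Boosted",
    "Binned, Boosted, and Oversampled",
    "Binned, Boosted, and Weighted",
    "Binned, Boosted, Weighted, and Oversampled",
]

# Accuracy (%) of models predicting UMUC success for Montgomery College
# transfers, listed in the same order as ADJUSTMENTS.
MC_SUCCESS_ACCURACY = {
    "CRT": [62, 65, 66, 68, 63, 75, 63, 74],
    "CHAID": [65, 72, 65, 69, 64, 75, 64, 76],
    "QUEST": [67, 65, 66, 68, 66, 71, 66, 68],
    "C5.0": [63, 69, 67, 72, 66, 70, 66, 73],
}


def best_adjustment(accuracies):
    """Return (adjustment name, accuracy) for the highest accuracy in the list."""
    best_index = max(range(len(accuracies)), key=accuracies.__getitem__)
    return ADJUSTMENTS[best_index], accuracies[best_index]


for algorithm, accuracies in MC_SUCCESS_ACCURACY.items():
    name, accuracy = best_adjustment(accuracies)
    print(f"{algorithm}: {accuracy}% with {name}")
```

For every algorithm in this table the best result comes from a configuration that includes oversampling, with CHAID reaching the highest single value.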
Page 66
Appendix I. Intervention Matrix

Intervention: Girls to women
Description: Holistic support with tutor, advisor, and mentor
Target audience: MC African American women
Timeframe: HS - CC - 4 yr
Measures: Retention; GPA; satisfaction
Outcome: Persistence
Source/data: Minorities (Kresge)

Intervention: Math analysis
Description: Share data on transfer student success and discuss challenges
Target audience: PGCC transfer students
Timeframe: CC - 4 yr
Measures: Success in first math
Outcome: Student success in math
Source/data: Math (Kresge)

Intervention: Dev ed
Description: Modularize courses
Target audience: Dev ed math students
Timeframe: CC - 4 yr
Measures: Success in first math
Outcome: Student success in math
Source/data: Math (Kresge)

Intervention: Mentor & writing
Description: Provide mentor and analyze writing skill
Target audience: New students at UMUC
Timeframe: 4 yr
Measures: Writing milestone
Outcome: Course success; GPA
Source/data: English (Kresge)

Intervention: ACCT 220
Description: Interactive online tutor
Target audience: Beginning course in ACCT
Timeframe: 4 yr
Measures: Course success
Outcome: Course success; Persistence
Source/data: Course success (PAR)

Intervention: Welcome checklist
Description: Provide new students with a checklist of how to navigate the university
Target audience: New students from community colleges
Timeframe: 4 yr
Measures: reenrollment, retention, GPA, satisfaction
Outcome: Course success; retention
Source/data: First term success (Kresge)

Intervention: CC Specific Learning Communities
Description: Identify recent successful students to serve as mentors
Target audience: New PG students
Timeframe: New to 4 yr
Measures: First term GPA; course success
Source/data: Collaborative discussions with CC