The Effects of Class Size in Online College Courses: Experimental Evidence

Eric Bettinger, Christopher Doss, Susanna Loeb, and Eric Taylor

May 12, 2014

VERY PRELIMINARY. PLEASE DO NOT QUOTE OR CITE.
1. Introduction

Class size is a perennial issue in the economics of education. It has implications for both the cost and the production of education. In K-12 education, the effects of class size have been vigorously debated (e.g., Mosteller 1995; Hanushek 2002; Krueger 2002; Hoxby 2000; Krueger and Whitmore 2000; Angrist and Lavy 1999; Leuven, Oosterbeek, and Rønning 2008; Gary-Bobo and Mahjoub 2009; Woessmann and West 2006; Dynarski, Hyman, and Schanzenbach 2011; Chetty, Friedman, Hilger, Saez, Schanzenbach, and Yagan 2010). Additionally, policymakers often view class size as a policy lever for improving K-12 education, and several states regulate class size (e.g., the Morgan-Hart Class Size Reduction Act in California). Class size in college has also received some attention (Bettinger and Long 2008). Our setting focuses on class size in virtual classrooms. Online college courses are becoming increasingly important in higher education. About one-third of US students take at least one course online during their college career, a proportion that has tripled over the past decade (Allen and Seaman 2013). More than 70 percent of public colleges now have programs that occur completely online. Class size online presents a different set of financial and educational production challenges. For example, the cost of adding an additional student is often negligible in online settings: no new desk or equipment is needed, and new costs are incurred only through additional staff time. Additionally, whereas class size might affect students through peers or congestion (e.g., Lazear 2001), interactions change substantially in an online setting where discussion boards are the primary forum in which peers interact. While online courses may
present an opportunity to reduce higher education costs, any adverse impact of class size could lead to a deterioration in the overall quality of college courses. Selection issues are also pervasive. Students may be strategic in their course-taking: they choose courses, class sizes, and professors, and these choices may be made in nonrandom ways that are related to outcomes. For instance, good instructors generally attract more students and thus have larger classes. Larger class sizes could therefore be associated with instructor reputation, again making it difficult to compare class sizes. To measure the effects of collegiate class size while addressing these empirical difficulties, we track nearly 118,000 students enrolled at DeVry University. While DeVry began primarily as a technical school in the 1930s, today 80 percent of the University's students are seeking a bachelor's degree, and most students major in business management, technology, health, or some combination. Two-thirds of undergraduate courses occur online; the other third occur at nearly 100 physical campuses throughout the United States. In 2010 DeVry enrolled over 130,000 undergraduates, or about 5 percent of the for-profit college market, placing it among the 8-10 largest for-profit institutions, which combined represent about 50 percent of the market. In 2013, DeVry conducted a unique set of class size experiments across most of its online course offerings. In each included course, they created both large and small classes, modifying class sizes by 2 to 5 students in each section. These differences represent changes as small as 2.9 percent or as large as 25 percent. DeVry conducted this experiment in an interesting way. Their registration system makes course capacity visible to students at the time of registration while not revealing the professor.
Rather than revealing the different class sizes to students, DeVry had the system list all sections as having the same size. At a certain point in
registration, DeVry took new registrants and assigned them to previously created sections. For example, the first two sections of a particular course typically fill up weeks before the term starts. In the couple of days prior to the term, the experiment started. Five new registrants after this point would be assigned to one of the first two sections. The changed class would be the large class while the unchanged class would be the small class. These changes occurred throughout all of the existing sections: every other section became a large class. Modifying class sizes with new registrants hides from students whether they are in a treatment or control class. However, it also complicates the estimation of class size effects. Students who register later in the registration process differ systematically from those who register early. Our prior work (Bettinger, Loeb, and Taylor 2013) suggests that weaker students are more likely to register closer to the start of the course. If this is the case, then the class size intervention created both a class size effect and a potentially negative peer effect. As we discuss below, we use a few techniques to try to separate these effects; we note, however, that these effects should, at least theoretically, reinforce each other: a negative class size effect would be made even more negative by peer effects. The results suggest that, after addressing issues of selection, small changes in class size generally have no effect on student learning in online courses. Large classes do not seem to adversely affect students. This result holds even across types of courses where one could expect a meaningful exception. For example, classes that require substantial faculty interaction, or courses where increased class size might crowd out meaningful interactions with faculty, theoretically could generate meaningful class size effects. We find, however, that even in these courses no class size effect is present.
Our finding can either be interpreted as class size
not having an effect in these online settings or as the effects of class size being roughly flat over the local changes that we examine. We discuss these possibilities at length. The paper is organized as follows. Section 2 includes the background on class size and online schooling. Section 3 presents our data and methodology. Section 4 presents our baseline results. Section 5 presents heterogeneity results. Section 6 presents robustness checks. Section 7 concludes.

2. Background on Class Size

Economists have long been interested in the effects of class size on student outcomes. However, much of this research suffers from selection bias, comparing the outcomes of students in small and large classes without taking into account confounding factors that may influence both class size and outcomes. Several studies, though, exploit sources of exogenous variation in class size, such as the Tennessee STAR experiment. This intervention randomly assigned students in kindergarten through third grade to small classes of roughly 13-17 students or regular classes of roughly 22-25 students. Exploiting the random assignment research design, Mosteller (1995) finds positive effects on early achievement. Krueger and Whitmore (2000) provide evidence of longer-term effects: students who had been placed in a smaller class were more likely to attend college later and performed better on the SAT. Dynarski, Hyman, and Schanzenbach (2011) extend this analysis to college completion, showing that college completion rates increased by 1.6 percentage points and that students assigned to small classes are more likely than students assigned to large classes to major in STEM, business, or economics. Chetty et al. (2010) show mixed results as to whether the small classes led to higher earnings for students later in life.
One of the criticisms of research based on STAR (e.g., Hoxby 2000; Lehrer and Ding 2011) is that teachers knew they were part of an experiment, and this knowledge could have led teachers to behave in ways that produced more desirable outcomes for them regardless of the impact on students. Therefore, subsequent papers use alternative sources of variation rather than policy experiments. Angrist and Lavy (1999) estimate the impact of class size on student outcomes in Israel. Following Maimonides' teachings centuries earlier, Israeli schools create an additional class once enrollment exceeds 40. In small schools, this creates dramatic variation: a school with 41 students should have two classes with an average of about 20 students per class, while a school with 40 students will have only one class. Using this exogenous variation, Angrist and Lavy (1999) estimate that smaller classes lead to better student test scores. Hoxby (2000) instead exploits year-to-year changes in the size of the student population. Due to randomness in the number of children ready for school each year, there is natural variation in class size over time. Hoxby uses an instrumental variables strategy based on this variation and does not find a class size effect on student achievement. There are several possible reasons for the different estimated effects of class size found in the K-12 literature. First, the respective authors look at different populations and contexts; class size may matter in some settings but not others. Second, the source of variation in each of these studies comes from different parts of the class size distribution, and the effects of class size could be very nonlinear: there could be positive effects over some parts of the distribution and no effects in others. While the debate continues over class size in primary and secondary settings, there are few studies that evaluate the effects of class size on student outcomes in college.
The small literature that exists focuses only on limited environments (i.e., only one institution or one subject), often does not address issues of bias, and has also produced conflicting results. For example, Dillon, Kokkelenberg, and Christy (2002) compare small and large classes at one institution and find that class size affects the grade distribution of a course. However, their paper does not fully address potential selection bias in how students sort into classes. Using a national database (TUCE III), Kennedy and Siegfried (1997) instead examine the average student outcomes in 69 economics classes representing 53 different universities. They find class size is not related to student achievement. Becker and Powers (2001), by contrast, find that class size is negatively related to student learning in economics courses. Bettinger and Long (2008) use differences in yields from year to year as an instrument for class size in introductory courses. They find that student outcomes are worse in large classes. They also demonstrate that student ability and professor characteristics differ between large and small collegiate classes. These selection issues can confound simple comparisons.

Theoretical Framework: Why Might Class Size Matter?

There could be a number of mechanisms by which assignment to a large class affects student outcomes. These include the direct peer effects of the number of students in a classroom as well as the indirect effects that stem from the impact of size on faculty and student behavior. The most cited model in class size papers is Lazear's disruptive peer model. Lazear (2001) presents a model where disruptive peers create a negative externality, which reduces other students' learning. Large classes could potentially have more disruptive students than smaller classes, suggesting more disruptions and lower achievement in larger courses. The likelihood of disruptions may also vary by the types of students in a specific class (e.g., honors sections), and this may explain why not all large classes appear to have negative effects on their students. College students may disturb classes by arriving late, using cell phones, playing computer games, or asking repetitive questions. Lazear's model holds more generally for any interaction between students or faculty that crowds out productive learning. For example, larger classes may also suffer from congestion effects. Not only can disruptions crowd out learning, but the pace of instruction can vary dramatically depending on congestion. Students who learn slowly may cause the class to move more slowly through topics. By contrast, students who learn quickly may inhibit slower-learning students from asking productive questions. These differences in the pace of learning can crowd out productive learning. In online settings, many of the discussions occur through asynchronous posting on discussion boards. Students may struggle to get personalized responses to their postings if congestion crowds out productive discussion. This could occur if students' postings are not complementary. Class size may also affect the relationship students have with their professors. For example, if a professor has limited time to devote to a class, then as the size of the course increases, each student will receive less personal time. In this way, class size could affect student engagement with the professor. Topp (1984) suggests that large class sizes early in a student's academic career may alienate students, leading to disassociation with the institution and consequently to student withdrawal. Class size may also affect the instructor's behavior. As class size increases, professors may change the way in which they teach or the teaching technologies they employ. For example, professors may rely less on classroom discussion in a large section of a course. If classroom discussion helps students learn (or helps a student feel integrated in the university), then changes in class size may affect student learning and dropout behavior.
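The intuition behind Lazear's congestion mechanism can be illustrated with a short numerical sketch. Below, each student is assumed to be non-disruptive at any given moment with probability p, and class time is productive only when no student disrupts, so the productive share of time is p^n for a class of n students. The values of p and the class sizes are hypothetical choices for illustration, not parameters estimated from our data.

```python
# Hypothetical illustration of Lazear-style congestion: learning happens
# only when no student is disruptive, so the share of productive class
# time is p ** n for a class of n students.
def productive_share(p: float, n: int) -> float:
    """Share of class time with zero simultaneous disruptions."""
    return p ** n

# Compare a 30-student section with a 33-student section (roughly the
# small/large contrast in the experiment), assuming each student is
# well behaved 99 percent of the time.
small = productive_share(0.99, 30)
large = productive_share(0.99, 33)
print(round(small, 3), round(large, 3))  # small exceeds large by about 2 percentage points
```

Because the loss compounds multiplicatively in n, the model predicts class size penalties that depend sharply on student behavior: as p approaches 1, the cost of a few additional students vanishes, which is one way to rationalize null effects in settings with well-behaved (or asynchronous) interaction.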
In the context of online courses, many classes have large projects with multiple submission deadlines. Students may depend on professor input to improve their assignments, but it may be more difficult for professors to provide quality feedback in larger classes. Additionally, a number of papers on class size have claimed that larger class size affects the acquisition of higher-order cognitive thought processes. In the mid-to-late 1990s, the collection of data on introductory economics students' test scores spurred a number of articles on the effects of class size on student outcomes (e.g., Kennedy and Siegfried 1997; Becker and Powers 2001). Some of the research during this time focused on pedagogy (e.g., McKeachie 1986) while other work focused on identifying scenarios where class size might matter. While these papers generally find that increased class size did not affect learning, the research showed that new cognitive skills were less likely to be assimilated by students in large classes. The small sample sizes and lack of exogenous variation make it difficult to interpret the conclusions from these papers as causal.

3. Data and Methodology

In this paper we capitalize on an experiment conducted by DeVry University to address the empirical question of whether class size affects student outcomes in the online college context. In particular, we ask whether increasing online class sizes affects student GPA, credits received in the next term, and persistence into the next term. In addition, we investigate heterogeneity in this effect by course type (Science/Mathematics courses, Social Science courses, etc.) and by assignment type (courses that include projects, laboratories, both, or neither). The treatment-control contrast in this study combines both a contrast in class size and a contrast in peer quality. Class size was directly manipulated, and then differences in peer quality
arose from the normal patterns of student registration at the university. Students in 111 online college courses were quasi-randomly assigned to either a small or a large class section. Table 1 presents the descriptive statistics for the sample. Panel A indicates that classes, on average, contained about 32 students, though enrollment ranged between 16 and 40 students. Panel B shows that large sections on average contained over 33 students, in contrast to small sections, which on average contained over 30 students. Each large section of a course therefore enrolled 3 more students, or 10 percent more students, on average, than the small sections of the same course. This increase in class size, however, ranged from 2.9 to 25 percent. Appendix Figure 1 shows the distribution of class size changes across students. Registration for online courses at DeVry follows a few simple rules and patterns, ignoring for a moment the experimental manipulation. Students register themselves for courses and sections. The enrollment window starts six months before the term begins and ends a few days into the eight-week term; if demand exceeds the University's projections, additional sections are added. During registration, online course sections have no differentiating characteristics: meeting time and location are irrelevant, class sizes are identical, and professors are not identified. These features generate a simple but strong pattern: section 1 fills up with students first, then section 2 fills, and so on. Few students deviate from this pattern. Additionally, there is a correlation between observable student characteristics and registration date. Notably, for example, students with higher prior GPAs register earlier (a correlation of about 0.30 in any given term), thus generating important between-section variation in mean prior GPA. During the four experimental terms, student registration began exactly as described in the previous paragraph.
All sections of the same course had the same class size. Then, two or six
weeks before the start of the term, DeVry changed the class sizes of all odd-numbered sections. This change day was six weeks out for the November and January terms (two-thirds of the sample) and two weeks out for the July and September terms. For nine out of ten courses, class sizes were increased in the odd-numbered sections: a university administrator simply increased the enrollment cap in those sections, and new registrants began filling those slots. In one out of ten courses, class sizes were decreased in the odd-numbered sections: the administrator removed students from those sections and enrolled them in new sections.[1] In the final weeks before the term, additional students registered, filling the now large and small sections. The change in enrollment caps created differences between sections in class size, but it also created differences between sections in peer quality. Consider the courses where class size was increased in the odd-numbered sections. Absent the experiment, students in section 1 and section 2 would have experienced the same class size and the same distribution of peer quality. During the experiment, section 1 not only had more students, but the added students were likely less academically prepared. The added students were drawn from students who registered in the final weeks before the term began, students who registered only after the enrollment caps had been raised. By contrast, students in the last sections created experienced similar peer quality even though the odd-numbered section had more students. Because of the importance of the assignment date, we distinguish in our analysis between those students who registered prior to the class size assignment date and those who registered subsequently. Among incumbent students (i.e., students assigned prior to the class size assignment date), our only identifying assumption is that there is randomness locally around the registration time.
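The mechanics of the cap change can be illustrated with a small simulation. Everything below is hypothetical (the caps, the number of sections, and the GPA-registration relationship are illustrative choices, not DeVry's actual registration data); the sketch only shows how raising the caps on every other section both enlarges those sections and dilutes their mean peer quality with weaker late registrants.

```python
import random

random.seed(0)

CAP, BUMP, N_SECTIONS = 30, 3, 4  # hypothetical caps; actual changes were 2-5 students

# Hypothetical registrants in registration order: earlier registrants have
# higher prior GPA on average (the paper reports a correlation of ~0.30).
n_total = N_SECTIONS * CAP + (N_SECTIONS // 2) * BUMP
gpa = [max(0.0, min(4.0, 3.5 - 0.01 * t + random.gauss(0, 0.5)))
       for t in range(n_total)]

# Phase 1: sections fill sequentially to a common cap.
sections = [gpa[i * CAP:(i + 1) * CAP] for i in range(N_SECTIONS)]

# Phase 2: the cap on every other section is raised by BUMP, and the
# remaining late registrants fill the new slots in turn.
late = gpa[N_SECTIONS * CAP:]
for i in range(0, N_SECTIONS, 2):
    before = sum(sections[i]) / len(sections[i])
    sections[i].extend(late[:BUMP])
    late = late[BUMP:]
    after = sum(sections[i]) / len(sections[i])
    # Enlarged sections gain students AND tend to see mean peer GPA fall,
    # since the added late registrants are weaker on average.
    print(len(sections[i]), round(after - before, 3))
```

This is why the treatment contrast combines a class size change with a peer quality change, motivating the registration-time fixed effects used in the analysis.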
For late registrants, many were quasi-randomly assigned in that the administrator reassigned them somewhat arbitrarily.[1] In the case that students registered late enough to see a choice between sections, the late registrant's choice of section size may be endogenous.

[1] The selection of students to be moved was arbitrary, but not, strictly speaking, random. The administrator who carried out this task had only one consideration when selecting whom to move: students with a hold on their account for financial or academic reasons could not be moved.

Identification Strategy

We employ a fixed effects strategy to estimate the effect of class size on a variety of student outcomes. The key insight is that student characteristics covary with the time at which they enroll in a course. Generally, students who register for a course earlier have stronger prior academic records. However, since assignment to a large classroom was quasi-random, there will be variation in class size among students who registered for the same course at similar times. Moreover, in a short enough time window, that class size variation should be spread among students with similar observed and unobserved characteristics. The following regression model summarizes the above:

Y_isct = β T_isct + X_isct'γ + δ_sct + ε_isct

Here Y_isct represents the outcome of interest for student i, enrolled in course c, during session s, who registered at time t. The primary outcomes of interest are the grade the student received in the course (measured on a 0-4 scale), the credits obtained in the next quarter, and enrollment in the next quarter. T_isct represents the treatment of interest, which is the intended class size of the section to which the student was assigned. We model this treatment in two ways: as a binary indicator for small or large section and as the log of class size. Our preferred specification is the log of class size since it takes into account both the increase in class size and the enrollment in the small section. That is, the log of class size lends itself to a percent-increase interpretation, which depends on both the enrollment in the small class and the size of the increase, whereas the binary indicator hides that heterogeneity in treatment. X_isct represents a vector of student characteristics that includes prior cumulative GPA at DeVry, an indicator for
being a new student, an indicator for missing prior GPA if not a new student, and an indicator for previously having failed a course. δ_sct represents a session-by-course-by-student-group fixed effect. There are many ways to conceptualize these fixed effects. At the core, we order students by the time they registered for a course in a particular session. Each grouping of students, in the order they registered, for a particular course in a particular session is its own fixed effect. In practice we did this in two ways: we either created groups of 15 students or divided the course into 20 quantiles.[2] Again, the identifying assumption is that within these groupings students are similar on all observable and unobservable characteristics and were randomly assigned to different class sizes based on the scheme explained above. As stated earlier, students who registered after the class caps were changed, and who therefore typically had weaker prior academic outcomes, were quasi-randomly assigned to previously full sections. By enlarging these previously full sections with later registrants, class sizes were not only increased, but previously lower-performing peers were mixed with previously higher-performing peers. There could therefore be a class size effect and a peer effect that would differentially affect a student based on his or her prior academic success. To test whether students who registered before and after the cap change were differentially affected by this treatment, we interacted a binary indicator for registering after the cap changed with the treatment.

[2] These groupings are arbitrary, and our results are robust to a variety of groupings and quantile divisions.

4. Main Results

Covariate Balance
The identifying assumptions can be partially tested by examining covariate balance across class size within these fixed effects. Table 2 provides those results for both the groups of 15 students and the 20 quantiles. Panel A looks at the entire sample, and Panels B and C disaggregate the sample into those who registered before the section caps were changed and those who registered after the cap change, respectively. When disaggregating, all covariates are balanced except one, which we would expect given the number of tests we are conducting. In the full sample there is a marginally significant imbalance in prior cumulative GPA (in some models). However, there are two things to note. First, these imbalances are quantitatively small given that prior cumulative GPA is on a 4-point scale: with the 20 quantile fixed effects, for example, students in small classrooms have a prior GPA that is only slightly greater. Second, this slight imbalance indicates that previously higher-achieving students are more likely to be in smaller classrooms. To the extent that past academic success is a powerful predictor of future academic success, this would potentially bias our estimates in the positive direction. Given that we find no effect of small classrooms, this bias seems to be negligible.

Baseline Results

Table 3 presents baseline results from four models and two different fixed effects strategies. Models 1 and 3 present the main effects of enrolling in a small class, as described in Section 3. It is immediately clear that in all cases the point estimates are quantitatively small and statistically insignificant. Small class sizes do not seem to affect student grades, the number of credits students attempt in the session, or the probability that they enroll in the next session. Models 2 and 4 present the results when the treatment, measured as a binary indicator for a small class and as the log of class size respectively, is interacted with an indicator for registering after the cap change.
Again, the results are insignificant across both
fixed effect models and across all student outcomes. Small classes do not seem to affect student grades or persistence regardless of when the student registered, and therefore, by proxy, regardless of prior academic success. This point is especially salient because it likely rules out both pure class size effects and peer effects. Students who registered before the cap changed had stronger previous academic records; we would expect both the increased class size and the introduction of lower-performing peers to negatively affect their outcomes. They would therefore be negatively affected by two factors. In contrast, later registrants would also potentially be negatively affected by larger class sizes, but could be positively affected by having classes with stronger-performing peers. The net effect, then, should differ by student registration date. Models 2 and 4 show that, while the point estimates on grades for those who registered before the cap change are generally negative and those for students who registered after the cap change are more positive, they again are small and insignificant. This may indicate that both the class size and the peer effects had negligible effects on the students. While it is evident that the point estimates in our results are insignificant, it is useful to estimate precisely how small an effect we would be able to detect. To do so we concentrate on Model 4 of Table 3, which can be used to estimate the effect size both for those who registered before the class size cap was changed and for those who registered after. Furthermore, we estimate the effect size for a 10 percent change in class size, since that was the average change in our sample. The exact point estimate varies to some extent, depending on the student group used in the fixed effects.
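The conversion from a coefficient on log class size to an effect size is mechanical: a 10 percent increase in class size corresponds to a change of log(1.1) ≈ 0.095 in the treatment variable, and dividing by the outcome's standard deviation expresses the result in standard deviation units. The sketch below performs this conversion with a hypothetical coefficient and standard error (the paper's actual estimates are not reproduced here); only the grade standard deviation of 1.16 for pre-cap registrants comes from the text.

```python
import math

# Hypothetical regression output, for illustration only; these are NOT
# the paper's estimates.
beta_hat, se = -0.05, 0.10   # coefficient on log class size and its SE
sd_grade = 1.16              # SD of class grade, pre-cap registrants (from text)

delta = math.log(1.10)       # a 10 percent increase in class size

# 95 percent confidence interval for the effect size of a 10% increase.
lo = (beta_hat - 1.96 * se) * delta / sd_grade
hi = (beta_hat + 1.96 * se) * delta / sd_grade
print(f"effect size 95% CI: [{lo:.4f}, {hi:.4f}]")
```

Because delta ≈ 0.095 and the grade SD exceeds 1, even a sizeable coefficient on log class size translates into a small effect size, which is why the confidence intervals reported below can rule out effects of the magnitude found elsewhere in the literature.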
The 20 quantile fixed effect model indicates that for those students who registered before the class size cap was changed, the 95 percent confidence interval of the effect size of a 10 percent increase in class size on class grade will range from to (the standard deviation of class grade for students who registered before the cap changed was 1.16).
This provides a more conservative estimate. A similar calculation with the 15 student group fixed effect model yields a point estimate range of to . Looking at the students who registered after the registration cap changed (the standard deviation of the class grade is 1.31), the 20 quantile fixed effect model yields a range of to and the 15 student fixed effect model yields a range of to . Table 4 presents the 95 percent confidence intervals of the effect sizes for all outcomes for both the 15 student group and 20 quantile fixed effects models. These are clearly small effects. To put these effect sizes in a clearer context, we can rule out effects typically found in the higher education literature. For example, Bandiera, Larcinese, and Rasul found that a 1 standard deviation increase in university class size had an effect size of on end-of-year test scores. Similarly, De Giorgi, Pellizzari, and Woolston found that a standard deviation increase in class size (approximately 20 students in classes that on average contained 131 students) had an effect size of on grades. Finally, in their 2009 paper Angrist, Lang, and Oreopoulos found that assigning college students a fellowship and support services had an effect size on fall grades of 0.227, most of which was concentrated among women, where the effect size was larger. Effect sizes of these magnitudes would have been easily detected given the power in our sample.

5. Heterogeneity of Results

The effect of class size on student outcomes need not be constant for all classes. To see if there is any heterogeneity in the class size effect, we divided the sample in two ways. First, we divided the sample by academic discipline and separated courses that could be described as Business/Marketing, Computer/Technology, General Education, Humanities,
Science/Mathematics, or Social Sciences. Second, we divided the sample by the types of assignments each class required of the students: projects only, laboratories only, both projects and laboratories, or neither projects nor laboratories. There are many reasons to think the effect of a class size increase could vary by discipline type and assignment type. First, the peer effect could change. Peers could have a greater effect in classes that require more interaction, such as those with projects or laboratories, and perhaps computer technology courses. This is especially relevant in this study, where the class size increase and the quality of one's peers were intimately related. Similarly, science and mathematics classes, which typically have problem sets, may be affected by peer quality if students informally form study groups. In larger classes there is also more competition for the professor's time (if students contact the professor electronically), and professors may change the way they organize and structure the class. The likelihood that students contact professors may depend on the discipline or assignment structure of the class. Similarly, the likelihood that the professor changes the structure of the class may depend on the assignment type and/or discipline. For example, in larger classes humanities professors may change the length of the written assignments due. The quantity and rigor of projects, laboratories, and problem sets may also change with increasing class sizes. Table 5 shows the covariate balance for each of these different samples. Though a few covariates are significantly different from zero at the 10 percent or 5 percent level, this is expected given the number of tests we are conducting. The most imbalanced sample contains the courses with both projects and laboratories; as we will see below, despite these imbalances we still fail to find a significant effect of class size on student outcomes.
Tables 6 and 7 show the effects of class size on courses in different disciplines and with different assignment types, respectively. These models are analogous to Model 4 of Table 3