The effects of value-added modeling decisions on estimates of teacher effectiveness
University of Iowa
Iowa Research Online
Theses and Dissertations
2014

The effects of value-added modeling decisions on estimates of teacher effectiveness

Paula Lynn Cunningham
University of Iowa

Copyright 2014 Paula Lynn Cunningham

This dissertation is available at Iowa Research Online.

Recommended Citation: Cunningham, Paula Lynn. "The effects of value-added modeling decisions on estimates of teacher effectiveness." PhD (Doctor of Philosophy) thesis, University of Iowa, 2014.
THE EFFECTS OF VALUE-ADDED MODELING DECISIONS ON ESTIMATES OF TEACHER EFFECTIVENESS

by

Paula Lynn Cunningham

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) in the Graduate College of The University of Iowa

December 2014

Thesis Supervisor: Professor Catherine J. Welch
Copyright by PAULA LYNN CUNNINGHAM 2014. All Rights Reserved.
Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL

PH.D. THESIS

This is to certify that the Ph.D. thesis of Paula Lynn Cunningham has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) at the December 2014 graduation.

Thesis Committee: Catherine J. Welch, Thesis Supervisor; Robert D. Ankenmann; Timothy N. Ansley; Stephen B. Dunbar; Marcus J. Haack; David B. Bills
To the memory of Bernice Rita Shanklin
Never despair, but if you do, work on in despair.

Fortune Cookie
Chuong Garden Restaurant
Grinnell, Iowa
ACKNOWLEDGMENTS

I wish to express my sincere gratitude to my advisor, Cathy Welch, for her guidance, support, and understanding, both during the dissertation research and through all my years as a graduate student. A knowledgeable and patient mentor, she has helped me to become an independent researcher. Most of all, I am thankful for her kindness and reassurance when I needed to pause my graduate study and her encouragement when I was able to resume it. I also wish to acknowledge the generous input of Steve Dunbar to the success of this research project from its beginning. In addition, I thank all the members of my dissertation committee for their insightful feedback in the form of suggestions aimed at strengthening this document, helping to make it an accomplishment of which I can truly be proud. I also wish to thank all the talented people of the Iowa Testing Programs, and Matt Whittaker in particular for his efforts in creating the matched longitudinal data sets and generating the state means results used in this study. It was as a graduate research assistant for Iowa Testing Programs that I learned how test development and psychometric research are accomplished. Through assignments that challenged me and the realization that my contributions mattered, I grew more confident as I gained understanding. I feel grateful for having had the privilege of working among these dedicated professionals. Not least of all I acknowledge the source of my strength to continue in this enterprise, the solid foundation keeping me upright: my family. I cannot praise too highly my husband Charles and son Evan for their encouragement at the outset of this journey and their support through its completion. From the shaky first semester to comprehensive examinations and the dissertation phase they have been there for me, sharing all the trials, successes, despair, and joy (in short, life) that happened along the way.
ABSTRACT

This study was undertaken to evaluate the impact of modeling decisions made by those charged with implementing teacher evaluation systems that incorporate student achievement data; such choices include how growth is to be modeled, whether student characteristics are to be controlled for, how many years of data are to be used, and which test subject is to be selected. Using a three-cohort longitudinal data set from a school district in which reading and mathematics test scores from a vertically-scaled assessment allowed determination of growth in grades three, four, and five, estimated teacher effects were derived from five value-added models, and the resulting rank orderings of the teachers were examined. The models compared were a covariate adjustment model that conditioned on prior achievement only, a covariate adjustment model that conditioned on certain student characteristics as well as prior achievement, a gain score model, the growth model underlying the vertically-scaled assessment, and student growth percentiles. Teacher rank orderings derived under the five models were highly consistent with one another using either one or three classroom years of test scores. Only when the movement of teachers between quartiles was examined did a difference in performance between some models emerge. The high degree of consistency between the two covariate adjustment models suggested that control for student-level characteristics was unnecessary. Using three years of test scores rather than one led to a small decrease in between-model correlations and a small increase in teacher movement between quartiles. Comparison of teacher value-added based on reading scores versus mathematics scores gave mixed results, with between-model correlations in mathematics being slightly higher than those for reading but with reading showing greater consistency in quartile movement between cohorts.
The year-to-year change in teacher rank orderings was very striking, as low, and even negative, correlations emerged between years. Movement of teachers between quartiles from one year to the next was far greater than that observed when comparing the modeling conditions. Using a teacher rating scheme in which groups of teachers were distinguished from average effectiveness if they appeared in the extremes of the rankings, nearly half of teachers changed ratings from one year to the next. Such low intertemporal stability of teacher value-added is a significant result that should be considered by all stakeholders in teacher evaluation.
PUBLIC ABSTRACT

This study examined the impact of modeling decisions made in implementing value-added teacher evaluation; such choices include the growth model itself, whether to control for student characteristics, how many years of scores to use, and the subject tested. Estimates of teacher effectiveness were derived from five models, which were a covariate adjustment model that conditioned on prior achievement only, a covariate adjustment model that conditioned on certain student characteristics as well as prior achievement, a gain score model, the growth model underlying the assessment, and student growth percentiles. The resulting rank orderings of the teachers were examined and found to be highly consistent with one another using scores for either one or three classroom years. When the movement of teachers between quartiles of the rank orderings was examined, a difference in performance between some models did emerge. The covariate adjustment models were highly consistent, suggesting that control for student-level characteristics was unnecessary. Using three years of data rather than one did not significantly change model performance, and comparison of rank orderings based on reading scores versus mathematics scores gave mixed results. The year-to-year inconsistency in rank orderings was striking. Movement of teachers between quartiles from one year to the next was far greater than that observed when comparing modeling conditions. Under a rating scheme in which teachers were distinguished from average effectiveness if they appeared in the extremes of the rankings, nearly half of teachers changed ratings from one year to the next.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER I INTRODUCTION
    An Approach to Teacher Evaluation
    Implementing VAM-based Teacher Evaluation
    Purpose of the Study and Research Questions

CHAPTER II LITERATURE REVIEW
    Status versus Growth
    Growth Models
    Growth Models versus Value-Added Models
    Four Widely Used Models
        Gain Score Model
        Residual Gain/Covariate Adjustment Model
        Student Growth Percentile Model
        Educational Value-Added Assessment System
    Research on Comparison of Models
    Ongoing Concerns about Value-Added Models
        Bias
        Precision
        Stability
    Practical Considerations

CHAPTER III METHODS
    Data
    Value-Added Models
        Covariate Adjustment Model 1 (CA1)
        Covariate Adjustment Model 2 (CA2)
        Gain Score Model (GAIN)
        Iowa Growth Model (IOWA)
        Student Growth Percentile Model (SGP)
    The Study and Research Questions
        Section 1: Question 1a
        Section 2: Question 1b
        Section 3: Question 2
        Section 4: Question 3

CHAPTER IV RESULTS
    Section 1: Effect of Model Choice with Single Cohorts
        Spearman Rank Order Correlations
        Quartile Analysis
    Section 2: Effect of Model Choice with Multiple Cohorts
        Spearman Rank Order Correlations
        Quartile Analysis
    Section 3: Stability between Cohorts
        Teacher Retention between Cohorts
        Between-cohort Spearman Rank Order Correlations
        Quartile Analysis
        Rating Consistency
    Section 4: Generalizability across Tests
        Effect of Model Choice with Single Cohorts
        Effect of Model Choice with Multiple Cohorts
        Stability between Cohorts
        Between-subject Spearman Rank Order Correlations
    Summary of Results

CHAPTER V DISCUSSION
    Summary of Findings
        Research Question 1
        Research Question 2
        Research Question 3
    Implications for Practice
    Limitations and Continuing Research
    Conclusion

APPENDIX: CATERPILLAR PLOTS OF TEACHER VALUE-ADDED

REFERENCES
LIST OF TABLES

Table 3.1 Group Means with Standard Deviations on the Reading Subtest for All Cohorts and Grades
Table 3.2 Group Means with Standard Deviations on the Mathematics Subtest for All Cohorts and Grades
Table 3.3 Percentages of Students with Positive Status on FRL, IEP, ELL, and Combinations Thereof
Table 3.4 Correlations between Reading Subtest Score, Mathematics Subtest Score, FRL, IEP, and ELL Variables
Table 3.5 R² Values for Best Predictive Models
Table 4.1 Pooled Spearman Rank Order Correlations between Models for Single-year Analysis
Table 4.2 Transition Matrices Showing Quartile Consistency between Models for Single-year Analysis
Table 4.3 Percent of Teachers who Changed Quartile by Model for Single-year Analysis
Table 4.4 Pooled Spearman Rank Order Correlations between Models for Multiple-year Analysis
Table 4.5 Transition Matrices Showing Quartile Consistency between Models for Multiple-year Analysis
Table 4.6 Percent of Teachers who Changed Quartile by Model for Multiple-year Analysis
Table 4.7 Percent Teacher Retention between Cohorts
Table 4.8 Pooled Spearman Rank Order Correlations between Cohorts
Table 4.9 Median Spearman Rank Order Correlations between Cohorts
Table 4.10 Transition Matrices Showing Year-to-year Consistency of Quartiles
Table 4.11 Percent of Teachers who Changed Quartile Year-to-year
Table 4.12 Percent of Teachers who Changed Rating Year-to-year
Table 4.13 Spearman Correlations between Models Pooled by Subject for Single-year Analysis
Table 4.14 Transition Matrices Showing Quartile Consistency between Models for Single-year Analysis for the Reading Subtest
Table 4.15 Transition Matrices Showing Quartile Consistency between Models for Single-year Analysis for the Mathematics Subtest
Table 4.16 Percent of Teachers who Changed Quartile Due to Model by Subtest for Single-year Analysis
Table 4.17 Spearman Correlations between Models Pooled by Subject for Multiple-year Analysis
Table 4.18 Transition Matrices Showing Quartile Consistency between Models for Multiple-year Analysis for the Reading Subtest
Table 4.19 Transition Matrices Showing Quartile Consistency between Models for Multiple-year Analysis for the Mathematics Subtest
Table 4.20 Percent of Teachers who Changed Quartile Due to Model by Subtest for Multiple-year Analysis
Table 4.21 Pooled Spearman Rank Order Correlations between Cohorts by Subtest
Table 4.22 Median Spearman Rank Order Correlations between Cohorts by Subtest
Table 4.23 Transition Matrices Showing Year-to-year Consistency in Quartiles for Reading Subtest
Table 4.24 Transition Matrices Showing Year-to-year Consistency in Quartiles for Mathematics Subtest
Table 4.25 Percent of Teachers who Changed Quartile Year-to-year by Subject
Table 4.26 Percent of Teachers who Changed Rating Year-to-year by Subject
Table 4.27 Between-subject Spearman Correlations Pooled over Methods
Table 5.1 Additional Test Items Answered Correctly by the Class of the Highest-ranked Teacher Compared to the Class of the Lowest-ranked Teacher
LIST OF FIGURES

Figure 2.1 Illustration of the Gain Score Model
Figure 2.2 Illustration of the Residual Gain Model
Figure 2.3 Illustration of a Linear Regression Line and a Median Quantile Regression Line
Figure 3.1 Structure of the Longitudinal Data Sets
Figure 3.2 Attribution of Growth Using Fall-to-fall Testing Schedule
Figure 3.3 The Iowa Growth Model: Plots Demonstrating the Relationship between Standard Score and Percentile Rank for Levels of the Reading Subtest of the Iowa Assessments
Figure 3.4 The Eighteen Rank Orderings Generated under Each VAM Condition with Single-year Data
Figure 3.5 The Six Rank Orderings Generated under Each VAM Condition with Multiple-year Data
Figure 4.1 Rank Ordering Change from Cohort 1 to Cohort 2 for Fourth Grade Mathematics Using the Gain Score Model
Figure 4.2 Rank Ordering Change from Cohort 1 to Cohort 2 for Fourth Grade Reading Using the Gain Score Model
Figure A1 Caterpillar Plots for Cohort 1 under the CA1 Model
Figure A2 Caterpillar Plots for Cohort 1 under the CA2 Model
Figure A3 Caterpillar Plots for Cohort 1 under the GAIN Model
Figure A4 Caterpillar Plots for Cohort 1 under the IOWA Model
Figure A5 Caterpillar Plots for Cohort 1 under the SGP Model
Figure A6 Caterpillar Plots for Cohort 2 under the CA1 Model
Figure A7 Caterpillar Plots for Cohort 2 under the CA2 Model
Figure A8 Caterpillar Plots for Cohort 2 under the GAIN Model
Figure A9 Caterpillar Plots for Cohort 2 under the IOWA Model
Figure A10 Caterpillar Plots for Cohort 2 under the SGP Model
Figure A11 Caterpillar Plots for Cohort 3 under the CA1 Model
Figure A12 Caterpillar Plots for Cohort 3 under the CA2 Model
Figure A13 Caterpillar Plots for Cohort 3 under the GAIN Model
Figure A14 Caterpillar Plots for Cohort 3 under the IOWA Model
Figure A15 Caterpillar Plots for Cohort 3 under the SGP Model
Figure A16 Caterpillar Plots for Combined Cohorts under the CA1 Model
Figure A17 Caterpillar Plots for Combined Cohorts under the CA2 Model
Figure A18 Caterpillar Plots for Combined Cohorts under the GAIN Model
Figure A19 Caterpillar Plots for Combined Cohorts under the IOWA Model
Figure A20 Caterpillar Plots for Combined Cohorts under the SGP Model
CHAPTER I INTRODUCTION

Accountability in K-12 education is an ongoing concern. The most recent reauthorization of the Elementary and Secondary Education Act (ESEA), the No Child Left Behind Act of 2001 (NCLB), mandated testing of students to hold schools and districts accountable for making Adequate Yearly Progress (AYP) toward 100 percent proficiency in reading and mathematics by 2014 or face sanctions. A few years later, the Secretary of Education announced the Growth Model Pilot Program (GMPP; Spellings, 2005); many states subsequently moved away from using the status measure of proficiency toward another measure, growth to a standard, in the belief that this measure could allow some schools to make AYP that would fail to do so under the status measure. Over time, growth models have become the preferred method of analyzing student achievement test data for the purpose of accountability (Betebenner & Linn, 2010). In 2009, as part of the American Recovery and Reinvestment Act, the Race to the Top (RTTT) initiative placed emphasis on teacher evaluation using student test scores (United States Department of Education, 2009). Value-added modeling, in which student achievement is attributed to various causes, such as teachers, schools, and sometimes background characteristics, is the most recent tool being brought to bear on the question of accountability. With many states choosing to emphasize teacher evaluation, and with their students' longitudinal data having been recorded over years of standardized testing, value-added modeling is now receiving a great deal of attention.

An Approach to Teacher Evaluation

Numerous states are implementing evaluation systems that incorporate students' standardized test scores to some degree in consequential decisions about teacher salaries, promotions, tenure, and even dismissal (Braun, 2005). Value-added models (VAMs) are
used to quantify deviations from expected student performance on a test after a year of instruction, based on characteristics such as the student's achievement on the previous year's test. Teachers in elementary grades whose students take standardized tests in subjects such as reading and mathematics can be held accountable for getting them to achieve their expected scores. The movement toward linking student performance on tests to teacher evaluations gained considerable momentum through the awarding of points in the Race to the Top initiative to states that did link them (Braun, 2012). Many proponents take the view that VAMs hold the promise of adding objectivity to teacher evaluation systems that have heretofore relied on seniority, attainment of credentials, and principal observations of classroom performance (Braun, 2012). They might suggest that the first two measures do not really reflect teacher effectiveness in the classroom and that principal observations occur too infrequently and result in satisfactory ratings for virtually all teachers, making them less useful as a measure to distinguish between teachers (Papay, 2012). In addition, some VAMs purport to control for student background characteristics; this has been interpreted as meaning that VAMs level the playing field, so that teachers are evaluated more fairly. Yet VAM-derived teacher effects are themselves known to contain considerable error, particularly when they result from fewer than three years of accumulated test data. They are also subject to unpredictable bias introduced either because they do or do not attempt to account for student background characteristics (McCaffrey, Lockwood, Koretz, & Hamilton, 2003). When such statistical controls are introduced, there is a further concern that they result in different achievement expectations for different groups of students (Ballou, Sanders, & Wright, 2004).
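To make the residual-based logic concrete, the sketch below (synthetic data and invented parameter values, not the models or data analyzed in this study) regresses current-year scores on prior-year scores and averages each teacher's students' deviations from expectation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 students taught by 10 teachers.
n_students, n_teachers = 200, 10
teacher = rng.integers(0, n_teachers, n_students)   # classroom assignment
teacher_effect = rng.normal(0, 3, n_teachers)       # hidden "true" effects
prior = rng.normal(200, 15, n_students)             # prior-year scale scores
current = 25 + 0.9 * prior + teacher_effect[teacher] + rng.normal(0, 8, n_students)

# Expected current score given prior achievement: simple linear regression.
slope, intercept = np.polyfit(prior, current, 1)
expected = intercept + slope * prior

# A student's deviation from expectation, averaged over a teacher's class,
# is that teacher's value-added estimate under this simple model.
value_added = {t: float((current - expected)[teacher == t].mean())
               for t in range(n_teachers)}
```

This is the skeleton of a covariate adjustment approach; operational VAMs layer additional covariates, multiple years, and shrinkage on top of it.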
Another consideration is striking a balance between complexity and transparency: VAMs applied in educational settings can be very complex and involve numerous factors, so that explaining to teachers how they work and how their rankings are generated is not simple (National Research Council & National Academy of Education, 2010).
Implementing VAM-based Teacher Evaluation

Despite the enthusiasm with which some state legislatures are mandating new teacher evaluation systems that incorporate student test scores, no clear set of best practices exists to guide those charged with implementing them. There are numerous requirements and consequential decisions facing state departments of education and individual school districts during the process of implementing teacher evaluation systems that rely on the use of VAMs. Adopting such models for teacher evaluation places many requirements on states and school districts for their proper use. The most obvious requirement is the existence of matched longitudinal test score data for students; depending on the model chosen, additional student-level demographic data may be required as well. Within these data, accurate links to classroom teachers must exist, or else the student data cannot be included in the analysis and will effectively be treated as missing. The problem of missing test scores must be handled either by deletion of cases or imputation of values, with consequences arising from either choice (Cunningham, Welch, & Dunbar, 2014). Experts are required both to conduct the analysis using VAMs and to produce reports and lead training sessions that support administrators and educators in making appropriate inferences from the analysis. Furthermore, an evaluation of the system must be established in order to monitor the effects of its implementation on students and teachers alike, with sensitivity to unintended consequences. Among the decisions state departments of education and school districts may have some input into are the uses to which these analyses may be put and whether the stakes for educators are high or low.
While researchers generally agree that the use of student achievement test data to evaluate teachers for low-stakes purposes, such as establishing which teachers may benefit most from improvement strategies through professional development, is a warranted use of VAMs, there is far less agreement about the extent to which VAMs should be relied upon in mandated evaluation for high-stakes
purposes, such as merit pay or tenure (National Research Council & National Academy of Education, 2010). The Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) make clear that there should be evidence of validity and reliability for every test use and that the greater the consequences of the test use, the stronger the evidence in support of that use should be. States and school districts need to consider that the researchers who understand and use VAMs the most do not agree that high-stakes teacher evaluation is an appropriate use of the technique. When the use of student achievement data for teacher evaluation has been mandated, a decision must be made about whether the VAM-derived teacher effects will replace or complement other measures of teacher effectiveness already in use. It should be considered whether the use of VAMs results in more useful, accurate, and fair outcomes than other measures. All such measures are imperfect, but as part of a teacher evaluation system using multiple measures, such as standardized principal evaluations that include classroom visits and video recordings, tests of teachers' content knowledge, surveys of students and parents, and teacher peer evaluations, some concerns expressed by researchers may be allayed (Kane & Staiger, 2012). States and school districts must still determine how to weigh the VAM-derived teacher effects against those other measures. Finally, numerous choices need to be made concerning the value-added modeling itself. There are many different types of VAMs discussed in the literature, yet at this time no method has emerged as dominant (National Research Council & National Academy of Education, 2010).
Some factors that are considered by VAM researchers include whether to specify teacher effects as fixed or random, whether to take a univariate or multivariate approach to modeling, how to disentangle school effects from teacher effects, and how to handle incomplete student records (McCaffrey et al., 2003).
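One of these choices, fixed versus random teacher effects, can be illustrated with a small sketch (invented variance components, not values estimated in this study): a fixed-effects style estimate is the raw class mean of residualized scores, while a random-effects style estimate shrinks that mean toward zero, more strongly for small classes:

```python
import numpy as np

rng = np.random.default_rng(1)

sigma2_e = 8.0 ** 2   # assumed within-classroom variance of residualized scores
tau2 = 3.0 ** 2       # assumed between-teacher variance

def shrunken_mean(scores):
    """Empirical-Bayes style estimate: the raw class mean multiplied by a
    reliability weight in [0, 1) that approaches 1 as class size grows."""
    n = len(scores)
    weight = tau2 / (tau2 + sigma2_e / n)
    return weight * float(np.mean(scores))

# Two teachers with the same underlying effect (+4) but different class sizes.
small_class = rng.normal(4.0, 8.0, 5)
large_class = rng.normal(4.0, 8.0, 30)

fixed_small, random_small = float(np.mean(small_class)), shrunken_mean(small_class)
fixed_large, random_large = float(np.mean(large_class)), shrunken_mean(large_class)
# The small class's estimate is pulled toward zero much more heavily.
```

The shrinkage weight is the standard random-effects reliability ratio; its dependence on class size is one reason the fixed-versus-random choice matters for teachers with few students.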
States and school districts, as users of VAM results, would likely have to depend upon their experts for advice about the impact of these decisions upon the analyses they conduct. However, these users can and arguably should have input on certain aspects of the modeling so that they have ownership of the process and remain accountable to their stakeholders. State departments of education and individual school districts could be involved in decisions about how to characterize student growth in achievement, how many years of data to use for evaluating teachers, and which, if any, student characteristics to control for in the analysis (Raudenbush, 2004). There are many metrics available to characterize the student growth modeled by VAMs, and the preferred growth metric will depend on factors such as the type of assessments available and the ease with which student growth can be understood by policymakers and practitioners. One metric that has seen much use in VAMs is residual gain, which is a measure of how much a student's score deviates from the regression of current scores on past scores; a VAM that uses this method to characterize growth is called a covariate adjustment model. Another metric used in VAMs is the gain score, which is literally the difference between one year's achievement and the prior year's achievement on the score scale. While there is no single preferred model for value-added analysis, these are among the more commonly used choices (McCaffrey et al., 2003). There are, however, additional growth metrics that could find application in VAMs. One consideration is that expected annual growth on an assessment, conditional on prior achievement, can be predicted by projecting forward a year through its vertical scale, which is established on a growth model (Furgol, Fina, & Welch, 2011). Another growth metric that could be utilized to calculate estimates of teacher value-added is the student growth percentile (SGP; Betebenner, 2009).
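A toy comparison of the gain score and residual gain metrics (three invented score pairs, not data from this study) shows that the two need not rank students the same way:

```python
import numpy as np

# Three students' scores on a vertical scale: prior year and current year.
prior = np.array([180.0, 200.0, 220.0])
current = np.array([195.0, 212.0, 228.0])

# Gain score: the literal difference on the score scale.
gain = current - prior                                 # [15, 12, 8]

# Residual gain: deviation from the regression of current scores on prior scores.
slope, intercept = np.polyfit(prior, current, 1)
residual_gain = current - (intercept + slope * prior)

# The largest raw gain belongs to the lowest-scoring student, but the largest
# residual gain belongs to the middle student, who most exceeded expectation.
```

Here the gain score rewards the student who moved the most scale points, while the residual gain rewards the student who did best relative to what prior achievement predicted.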
The SGP metric relies on quantile regression, conditioning on prior achievement to describe the current achievement of students. Because the first of these metrics depends on using a vertically-scaled assessment whereas the second does not, the types of assessments available to the
state or school district may dictate which of these is preferred. Furthermore, a particular growth metric may come to be seen as more acceptable by practitioners, particularly if its details can be communicated thoroughly enough to be accurate yet transparently enough to be understandable. State departments of education and individual school districts adopting VAM-based evaluation systems need to decide over how many years of instruction teachers will be evaluated. One of the major hurdles in applying VAMs to teacher evaluation is that teachers, especially in the elementary grades, often have very small classes. While more years teaching in the district will increase the amount of student data available to evaluate the teacher, and perhaps thereby lower the standard errors of teacher effects, the solution is not simply to use seven years of data and assume that the errors of the estimates will improve substantially. After all, not all teachers will have been teaching for that many years in a district, so there will always be many teachers who have few students and, as a result, estimated teacher effects with larger standard errors. Furthermore, there is the question of whether it is appropriate to use seven-year-old data for current teacher evaluations; that question would have to be taken up by those who set policy. The use of student and sometimes teacher characteristics to adjust expected student growth is controversial, with many value-added researchers embracing the idea because the practice may result in greater stability of the estimated teacher effects. It is also purported to correct for influences on student achievement from outside the school environment, so that teachers are evaluated fairly regardless of the composition of their classrooms.
However, it is not uncommon for those who make decisions for states and school districts to be more reluctant to include demographic covariates, in order to avoid the appearance of adopting different expectations for different groups of students. While research on the effect of including such covariates is somewhat mixed, it is clear that prior achievement is the single most important one, accounting for much more variance
in the estimates than demographic covariates do (Ballou et al., 2004; Lockwood et al., 2007). Statistical control for student-level characteristics is easily implemented as part of a covariate adjustment model.

Purpose of the Study and Research Questions

In order to provide guidance to policymakers and practitioners making decisions about teacher evaluation systems that incorporate student achievement data, a study was undertaken to evaluate the impact of choices concerning how student growth is to be modeled, how many years of data are to be used, and whether student characteristics are to be controlled for in the analysis. The study used a three-cohort longitudinal data set from a school district in which reading and mathematics test scores from a vertically-scaled assessment were available for four consecutive years in each cohort, such that growth could be assessed in the third, fourth, and fifth grades. Estimated teacher effects were derived from VAMs using five different metrics for growth, and the resulting rank orderings of the teachers were examined. Research questions for the study included:

1. How do the rank orderings derived from different metrics for growth compare with one another for both (a) single-year and (b) multiple-year analyses?
2. How do the rank orderings derived using the various growth metrics compare year-to-year between the cohorts?
3. How generalizable are the answers to questions 1 and 2 above from one test subject to another?

These three research questions address various aspects of the application of VAMs to a practical setting. The methods used to address each research question are described in Chapter III.
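The comparison machinery behind these questions can be sketched in miniature; the teacher value-added numbers below are hypothetical, and the functions are minimal stand-ins (no tie handling) for the Spearman correlations and quartile transition matrices reported later:

```python
import numpy as np

def ranks(x):
    """Rank values from 0 (lowest) to n-1 (highest); ties are not handled."""
    return np.argsort(np.argsort(np.asarray(x))).astype(float)

def spearman(a, b):
    """Spearman rank-order correlation: the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

def quartiles(x):
    """Assign each teacher to a quartile (0 = bottom, 3 = top) of the ranking."""
    return (4 * ranks(x) / len(x)).astype(int)

def transition_matrix(a, b):
    """4x4 counts of teachers by (quartile under model A, quartile under model B);
    off-diagonal entries are teachers whose quartile depends on the model."""
    m = np.zeros((4, 4), dtype=int)
    for qa, qb in zip(quartiles(a), quartiles(b)):
        m[qa, qb] += 1
    return m

# Hypothetical value-added estimates for 8 teachers under two models.
model_a = [2.1, -0.5, 0.3, 1.7, -1.2, 0.9, -0.1, 1.0]
model_b = [1.8, -0.7, 0.5, 1.9, -1.0, 0.4, 0.2, 1.1]
```

For these invented estimates the rankings correlate highly, yet two of the eight teachers still change quartile between models, which is exactly the kind of disagreement the quartile analyses are designed to surface.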
CHAPTER II LITERATURE REVIEW

This chapter discusses value-added modeling within the broader context of student growth in achievement, beginning with the distinction between status and growth and their use as accountability measures. This introduction is followed by the definition of a growth model and an explanation of the general types of growth models, as categorized by different researchers. The key distinction between growth models and VAMs is given, followed by a discussion of applications and considerations for several models. Ongoing concerns about bias, error, and stability in the estimates generated by VAMs are described next. Finally, considerations for those involved in the implementation of teacher evaluation systems that incorporate student achievement data are addressed.

Status versus Growth

As accountability systems in education have evolved over time due to changes in the guidance provided by government agencies, there has been a concomitant movement away from reliance on status measures and toward the adoption of growth measures (Briggs & Betebenner, 2009). The difference between a status measure and a growth measure is a distinction between single and multiple snapshots of student achievement. Castellano and Ho (2013a) define status as the academic performance of a student or group (a collection of students) at a single point in time, and they define growth as the academic performance of a student or group over two or more time points. Status measures, such as yearly average performance, were felt to be insufficient for the purpose of accountability, and student change over time was considered a better measure. With growth measures, each student's progress could be compared against that student's own achievement in the previous year rather than against a cohort average (Callender, 2004).
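A small numerical illustration (invented scores) shows why the two kinds of snapshots can tell different stories: a student can sit below the cohort average on a status measure while showing above-average growth:

```python
# Cohort means and one student's scores on the same scale (invented values).
cohort_prior, cohort_current = 200.0, 210.0
student_prior, student_current = 180.0, 196.0

status_gap = student_current - cohort_current     # -14: below average on status
student_growth = student_current - student_prior  # +16 points of growth
cohort_growth = cohort_current - cohort_prior     # +10: the student outgrew the cohort
```

On status alone this student looks weak; on growth the same student outpaces the cohort, which is the core argument for growth-based accountability.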
Growth Models

Castellano and Ho (2013a) define a growth model as a collection of definitions, calculations, or rules that summarizes student performance over two or more time points and supports interpretations about students, their classrooms, their educators, or their schools. The authors also classify growth models according to several criteria. One such classification is made according to the primary interpretations growth models support, which include growth description, growth prediction, and value-added. Another useful classification system is based on the statistical foundations underlying the growth model, in which three categories are proposed: gain-based models, conditional status models, and multivariate models.

The first of these statistical foundations supports models that use a gain score to quantify growth. A gain score is simply the difference between a test score at one point in time and a test score at another point in time. One essential feature of a test used in the context of a gain-based model is the existence of a vertical scale, which affords a developmental basis for interpretations of growth over successive grade levels. With test scores for all grade levels placed on the same scale, it is possible to compare a student's fall test score from the third grade level to that from the fourth grade level and interpret the difference as the growth the student made over the year in the subject being tested (Castellano & Ho, 2013a).

The second statistical foundation underlies growth models that allow one to interpret a student's current status in light of what that status is expected to be, based on the past scores of that student and others. These are called conditional status models because they refer to the current status conditional on the past status, meaning that they take past test scores into account.
This foundation differs from that of the gain-based models, in which growth is assessed as the difference between current status and past status at two points in time; here, current status is instead compared to an expected status derived from past performance and potentially other information. Castellano and Ho (2013a) give as examples of conditional status models the residual gain model, in which conditional status is defined by the difference between the current score and the score expected given past scores, and the student growth percentile model, in which the expectation is expressed through the percentile rank of the current score in the distribution of scores of students who had the same score at an earlier time.

The third statistical foundation described by Castellano and Ho (2013a) is the basis for multivariate models that are used primarily to estimate school and teacher effects in value-added applications, as it is not the ideal foundation for the purposes of growth description or prediction. Such models make use of large amounts of data and can be very complex. Perhaps the most widely implemented model of this type is the Educational Value-Added Assessment System, known as SAS EVAAS (Sanders & Horn, 1994); this model requires specialized proprietary software from the SAS Institute (SAS Institute, 2012).

The classification of growth models by their statistical foundations offered by Castellano and Ho (2013a) is not intended to be taken as the only correct interpretation; there are other systems for classifying growth models on this basis. For instance, Briggs and Betebenner (2009) assert that all statistical models for test score growth are essentially models of conditional achievement. They note that models can be distinguished from one another based on whether they model student achievement conditional on time or conditional on prior achievement. Models that conceptualize achievement conditional on time are referred to as absolute growth models, and those that conceptualize achievement conditional on prior achievement are referred to as relative growth models.
In their scheme, a gain score model is an absolute growth model that is constrained to use scores from only two longitudinal time points. They too note the requirement for this model that scores be placed on a vertical scale in order to make meaningful comparisons in an absolute sense (Briggs & Betebenner, 2009). These authors assert that the quantity of interest in a relative growth model is the residual, the difference between a student's observed achievement and the achievement that would be predicted given the student's prior achievement. Use of residuals provides a normative interpretation of growth: the residual shows the amount of growth above or below the statistical expectation. Models as different in complexity as simple linear regression models, such as the residual gain model, and multivariate models, such as SAS EVAAS, are relative growth models by this definition. The common foundation underpinning these models is the principle of relative growth, defined as the difference between observed and expected achievement (Briggs & Betebenner, 2009).

Growth Models versus Value-Added Models

Briggs and Betebenner (2009) state that the leap from a growth model to what can be called a value-added model is a short one. They also assert that all growth models can be turned into VAMs through three steps. First, one must define what constitutes expected achievement for a student. Second, one must calculate a deviation from the expected achievement that contrasts what has been observed with what would be expected for the student. Third, one must make the inference that this deviation from expectation is an expression of the value added to student achievement by the teacher. Making a similar argument, Castellano and Ho (2013a) state that they consider value-added to be an inference, not a model. Others take the view that growth models and VAMs are distinct because growth models do not generally control for student background or school factors (Baker et al., 2010).
They argue that one cannot attribute student growth in achievement to teachers without controlling for the effects of these factors. Yet Castellano and Ho (2013a) point out that without a rigorous experimental design in which, among other requirements, students are assigned randomly to classrooms, no model can support value-added inferences on its own. The reality is that in practice, as opposed to in research, most statistical models that have been used to support value-added inferences have tended not to include such predictor variables as race or socioeconomic status measures (National Research Council & National Academy of Education, 2010).

Four Widely Used Models

What follows is a brief description of four models frequently used to characterize student growth for accountability purposes, including teacher evaluation. These are the gain score model, the residual gain/covariate adjustment model, the student growth percentile model, and the SAS EVAAS model.

Gain Score Model

As noted earlier, a gain score is simply the difference between a test score at one point in time and a test score at another point in time. In the context of accountability, the two time points of interest occur at two grade levels, so the scores need to be placed on a common scale that is in turn representative of increasing competence in the domain being tested. The gain score model is an absolute growth model that describes a student's growth relative to his or her own previous score. As the following example (Castellano & Ho, 2013a) shows, the gain score is the difference between the test score at the current time point and the test score at the previous time point. This calculation is depicted graphically in Figure 2.1, where a student's scores in third and fourth grade on a hypothetical vertically-scaled test are shown. The student's scores are marked with black dots, and the gain score is shown by the vertical difference between them. In this case the third grade score, which is 350, is subtracted from the fourth grade score, which is 370, to yield a gain score of +20.
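The gain-score arithmetic is simple enough to sketch directly. The following Python fragment uses invented scores on a hypothetical vertical scale (none of these values come from the study) to compute individual gains, a classroom average, and the deviation from a hypothetical district average gain:

```python
# Gain-score model sketch on hypothetical data (values are illustrative,
# not from the study). Scores lie on a common vertical scale.
prior = [350, 340, 360, 330]     # hypothetical grade 3 scores for one classroom
current = [370, 355, 372, 349]   # the same students' grade 4 scores

# Individual gain score: current score minus prior score.
gains = [c - p for p, c in zip(prior, current)]

# Classroom-level summary: the average gain.
classroom_mean_gain = sum(gains) / len(gains)

# One simple value-added reading: the classroom's deviation from the
# (hypothetical) district average gain.
district_mean_gain = 14.0
value_added = classroom_mean_gain - district_mean_gain

print(gains)                # [20, 15, 12, 19]
print(classroom_mean_gain)  # 16.5
print(value_added)          # 2.5
```

A positive deviation would be read as the classroom gaining more, on average, than the district as a whole; the vertical-scale caveats discussed below apply to any such reading.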
Gain scores can be aggregated to the group level by averaging a set of students' gain scores in order to characterize the average change in performance for the group. Most often the average of students' individual gain scores serves as a group-level summary statistic for a subset of students, such as those in a particular classroom, school, or district. When the average gain score is positive, one can conclude that the students as a group made positive gains, whereas when the average gain score is negative, one can conclude that the group of students declined overall in their performance. Gain score models can be used for making value-added determinations of teacher effectiveness by considering the value added to be the deviation from the average gain in the district. However, some have expressed concern that gain-based models are not the best choice for making value-added inferences, due to the dependence of school effects upon the vertical scaling properties of tests (Briggs & Weeks, 2009). Since vertical scales are developed to enable student growth in achievement to be described, and not necessarily to support causal inferences about that growth, Briggs and Weeks (2009) argue that some properties of the vertical scale may be poorly suited for the purpose of accountability. For instance, some vertical scales reflect that higher-scoring students make greater gains than those who score lower (Castellano & Ho, 2013a). Such a vertical scale may correctly describe the observed pattern of growth with respect to initial status, but it does not make for the best accountability tool where growth expectations for all students are required to be equal. On the other hand, note Castellano and Ho (2013a), these differential, scale-based expectations for lower-scoring students may be precisely what the accountability model should reflect.

Residual Gain/Covariate Adjustment Model

Linear regression is a statistical method that allows the prediction of an outcome variable from one or more predictor variables. The residual gain model uses linear regression to predict students' expected scores from their prior scores.
The residual gain is then calculated as the observed current score minus the expected score determined by the model. The residual is the quantity that describes the amount by which students scored above or below the expected scores determined by their prior performance.

The following example, offered by Castellano and Ho (2013a), serves as an illustration of the residual gain model. Suppose there is a sample of eight students in fourth grade with test scores for both the third and fourth grades. Figure 2.2(a) shows a scatterplot of the students' third and fourth grade scores. The eight students are represented in the plot by solid black dots, and the black line in the figure is the prediction line for fourth grade scores given third grade scores, which is the output of the linear regression method. The prediction line is the least squares best fit of the average fourth grade score across all the third grade scores; thus the line represents the expected fourth grade score at every possible third grade score. For instance, for a student with a third grade score of 350, the model predicts an expected fourth grade score of 364. Determining the expected current score is only the first step in the residual gain model. Figure 2.2(b) illustrates the calculation of the residual gain score, which is the difference between the observed current score and the expected current score. For a particular student whose score in third grade was 350 and in fourth grade was 375, the expected fourth grade score predicted by the linear regression line is 364. In this case the expected fourth grade score, 364, is subtracted from the observed fourth grade score, 375, yielding a residual gain of +11.

The typical summary statistic for a group of students is the average residual gain for those students in the same classroom, school, or district. The mean residual gain score is expected to be zero across the data set used in the analysis; for any given classroom of the data set, however, the mean residual gain score is not necessarily expected to be zero. The magnitude and sign of the mean residual gain score reveal something about the achievement of the students in the classroom being examined, with respect to expectations for their achievement (Castellano & Ho, 2013a).
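The residual gain steps above can be sketched in a few lines of Python. The scores below are invented for illustration, and a closed-form least squares fit stands in for whatever regression software would be used in practice:

```python
# Residual gain sketch on hypothetical data (illustrative values only).
prior = [340, 340, 340, 350, 350, 350, 350, 350]    # grade 3 scores
current = [330, 345, 360, 350, 360, 370, 375, 380]  # grade 4 scores

n = len(prior)
mean_x = sum(prior) / n
mean_y = sum(current) / n

# Ordinary least squares fit of current score on prior score (closed form).
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(prior, current))
         / sum((x - mean_x) ** 2 for x in prior))
intercept = mean_y - slope * mean_x

def expected(x):
    """Expected current-year score given the prior-year score."""
    return intercept + slope * x

# Residual gain: observed current score minus expected current score.
residuals = [y - expected(x) for x, y in zip(prior, current)]

# The mean residual is zero across the full data set by construction...
print(abs(round(sum(residuals), 10)))  # 0.0

# ...but not necessarily within a classroom. A classroom's mean residual
# is the quantity the covariate adjustment model reads as value-added.
classroom = residuals[:4]  # hypothetical: first four students share a teacher
print(round(sum(classroom) / len(classroom), 4))  # -4.25
```

Here the hypothetical classroom's students scored about four points below expectation on average, which a covariate adjustment model would report as a negative teacher effect.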
When the assumption is made that the average residual gain is the value added to the average test scores in the group by a teacher or school, the model is a type of VAM called a covariate adjustment model. Like the residual gain model, the covariate adjustment model generates predicted expectations for outcome variables by using one or more predictor variables. The covariate adjustment model is one of the most commonly used models to support value-added interpretations (Castellano & Ho, 2013a).

Student Growth Percentile Model

The student growth percentile (SGP) model describes current student status by taking into account past performance and thus utilizes a conditional status statistical foundation. Since SGPs give the relative position of a student's current score within the conditional distribution of scores from students with similar past performance, the SGP model, like other relative growth models, provides a normative interpretation of growth (Betebenner, 2009). As shown in the previous section, the result of the covariate adjustment model is a single line representing the best prediction of the outcome variable using a predictor variable. The solid black line shown in Figure 2.3 is the linear regression line in the example of Castellano and Ho (2013a) from the previous section, where the predictor variable is the third grade score and the outcome variable is the fourth grade score. Using a technique called quantile regression, the SGP model fits not just one line, the conditional mean that is the result of linear regression, but rather 99 lines, one for each conditional percentile (1 through 99). Shown in Figure 2.3 by a dashed line is the line for the conditional median (the 50th line), which represents the best prediction for the median of the fourth grade scores given the third grade scores. Points lying along or closest to this line would be assigned SGPs of 50. Points lying above the conditional median line would be assigned SGPs higher than 50, depending on which conditional percentile they are closest to; likewise, points lying below the conditional median line would be assigned SGPs lower than 50.
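The conditional-percentile idea behind SGPs can be illustrated without quantile regression. The simplified sketch below (invented data; not the operational SGP procedure, which fits 99 quantile regression lines) computes the percentile rank of a student's current score within the empirical distribution of scores from students who had the same prior score:

```python
# Simplified SGP sketch: use the empirical conditional distribution --
# the percentile rank of a student's current score among students with
# the same prior score. All data are hypothetical.
from collections import defaultdict

records = [  # (prior score, current score), hypothetical
    (300, 310), (300, 320), (300, 330), (300, 340), (300, 350),
    (340, 350), (340, 360), (340, 370), (340, 380), (340, 390),
]

# Group current scores by prior score.
by_prior = defaultdict(list)
for p, c in records:
    by_prior[p].append(c)

def sgp(prior, current):
    """Percentile rank of `current` among peers with the same `prior` score."""
    peers = by_prior[prior]
    below = sum(c < current for c in peers)
    ties = sum(c == current for c in peers)
    # Mid-rank percentile: fraction strictly below plus half the ties.
    return round(100 * (below + 0.5 * ties) / len(peers))

print(sgp(300, 330))  # 50: the median score among peers who scored 300
print(sgp(340, 390))  # 90: near the top of the peer distribution
```

Operational SGP implementations use quantile regression precisely because real data rarely contain enough students with identical prior scores for this direct empirical approach to work.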
Median SGPs are the most commonly used aggregate SGP metric; the median was originally suggested because SGPs are percentile ranks, which lie on a scale not recommended for averaging (Betebenner, 2009). However, it has been shown recently that averages of percentile ranks can support more stable aggregate statistics for SGPs (Castellano & Ho, 2013b). Castellano (2011) showed that the mean function may in fact be preferable to the median function when aggregating SGPs, as mean SGPs were found to classify and rank groups more similarly to value-added effects than were median SGPs. SGPs support descriptive interpretations of growth of student groups when aggregated at the classroom, school, or district level. The aggregates summarize how the SGPs are distributed with either an average value or a typical value from the group. According to Betebenner (2009), SGPs are not intended to be used to support value-added interpretations, although it is reported that SGPs derived from quantile regression are strongly correlated with value-added estimates from the SAS EVAAS model (Briggs & Betebenner, 2009).

Educational Value-Added Assessment System

The SAS EVAAS model is an example of a multivariate model primarily designed to support value-added inferences for schools and teachers (Sanders & Horn, 1994). The model considers all available student scores for up to five years in order to create statistical expectations for performance by tracking students moving through their classrooms and schools over time. Greater or lesser than expected performance can be attributed to the students' teachers and schools, with a causal determination of how much each teacher or school contributes to average student performance (Castellano & Ho, 2013a). In this model, the effect of teachers on student performance is assumed to persist undiminished into the future. That is, the degree to which student performance in third grade is attributable to the third grade teacher persists into fourth grade, fifth grade, and beyond. Because of this feature, the SAS EVAAS model is termed a layered model, as successive teacher effects are layered onto students over time (Braun, 2005). Performance expectations are set for students in a particular classroom by considering all of these students' current test scores as well as their test scores from before they enter the classroom and after they leave it; the model also includes the average scores for the district and individual test scores in all other subjects, in addition to effects from other teachers over time. The SAS EVAAS model is complex, incorporates a large amount of information, and requires highly specialized proprietary software to run (SAS Institute, 2012).

Research on Comparison of Models

No VAM used in an educational setting is generally agreed upon as the best one for accountability decisions, and all VAMs have both favorable and unfavorable features depending upon the context in which they are applied. In any comparison between VAMs applied to either simulated or real data sets, there is no way to definitively assess which model produces the correct (or closer to correct) teacher effects, which are assumed to reflect teacher effectiveness in the classroom. In a study that compared a simple fixed effects model (SFEM) that was parameterized as a gain score model, a layered mixed effects model (LMEM) that has similarities to the SAS EVAAS model, and a hierarchical linear mixed model (HLMM), the researchers suggested that policymakers, school districts, and stakeholders would likely prefer the SFEM because of its transparency (Tekwe et al., 2004). Three cohorts of elementary school students with test score data in reading and mathematics were used to calculate school effects under these models. The researchers found high correlations between rankings from all these models, ranging from .91 to 1.00 in reading and from .96 to 1.00 in mathematics.
Since they believed that the SFEM was the more desirable model because it was more easily understandable, the authors concluded there was no benefit to using the other models in this context.
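Model-comparison studies of this kind typically quantify agreement between two models' teacher rankings with a rank correlation such as Spearman's. The sketch below uses invented teacher effect estimates (not values from any cited study) and computes the statistic from scratch so the fragment is self-contained:

```python
# Comparing two models' teacher rankings with a Spearman rank
# correlation, the kind of agreement statistic behind reported
# rank-correlation results. Effect estimates are hypothetical.
def ranks(values):
    """Rank values from 1 (smallest), averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of tied positions, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    den = (sum((x - ma) ** 2 for x in ra)
           * sum((y - mb) ** 2 for y in rb)) ** 0.5
    return num / den

# Hypothetical value-added estimates for six teachers under two models.
model_a = [0.21, -0.10, 0.05, 0.33, -0.25, 0.02]
model_b = [0.18, -0.12, 0.09, 0.30, -0.20, 0.11]
print(round(spearman(model_a, model_b), 3))  # 0.943
```

A value near 1.0 indicates that the two models order the teachers almost identically, which is the pattern the comparison studies above report even across models of very different complexity.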
More informationMaster Plan Evaluation Report for English Learner Programs
Master Plan Evaluation Report (2002-03) for English Learner Programs Page i Los Angeles Unified School District Master Plan Evaluation Report for English Learner Programs 2002-03 Prepared by Jesús José
More informationACT Research Explains New ACT Test Writing Scores and Their Relationship to Other Test Scores
ACT Research Explains New ACT Test Writing Scores and Their Relationship to Other Test Scores Wayne J. Camara, Dongmei Li, Deborah J. Harris, Benjamin Andrews, Qing Yi, and Yong He ACT Research Explains
More informationSecondly, this study was peer reviewed, as I have mentioned, by other top experts in the testing and measurement community before it was released.
HOME SCHOOLING WORKS Pass it on! Online Press Conference March 23, 1999, 12:00pm EST A transcript of the opening remarks by Michael Farris, Esq. & Lawrence M. Rudner, Ph.D. Michael Farris: Good morning.
More informationIMPLEMENTATION NOTE. Validating Risk Rating Systems at IRB Institutions
IMPLEMENTATION NOTE Subject: Category: Capital No: A-1 Date: January 2006 I. Introduction The term rating system comprises all of the methods, processes, controls, data collection and IT systems that support
More informationConstructing a TpB Questionnaire: Conceptual and Methodological Considerations
Constructing a TpB Questionnaire: Conceptual and Methodological Considerations September, 2002 (Revised January, 2006) Icek Ajzen Brief Description of the Theory of Planned Behavior According to the theory
More informationTeacher Performance Evaluation System
Chandler Unified School District Teacher Performance Evaluation System Revised 2015-16 Purpose The purpose of this guide is to outline Chandler Unified School District s teacher evaluation process. The
More informationEconomic inequality and educational attainment across a generation
Economic inequality and educational attainment across a generation Mary Campbell, Robert Haveman, Gary Sandefur, and Barbara Wolfe Mary Campbell is an assistant professor of sociology at the University
More informationCore Goal: Teacher and Leader Effectiveness
Teacher and Leader Effectiveness Board of Education Update January 2015 1 Assure that Tulsa Public Schools has an effective teacher in every classroom, an effective principal in every building and an effective
More informationTHE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service
THE SELECTION OF RETURNS FOR AUDIT BY THE IRS John P. Hiniker, Internal Revenue Service BACKGROUND The Internal Revenue Service, hereafter referred to as the IRS, is responsible for administering the Internal
More informationLocal outlier detection in data forensics: data mining approach to flag unusual schools
Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential
More informationThe Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces
The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces Or: How I Learned to Stop Worrying and Love the Ball Comment [DP1]: Titles, headings, and figure/table captions
More informationDescriptive Statistics and Measurement Scales
Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample
More informationAnalysis of academy school performance in GCSEs 2014
Analysis of academy school performance in GCSEs 2014 Final report Report Analysis of academy school performance in GCSEs 2013 1 Analysis of Academy School Performance in GCSEs 2014 Jack Worth Published
More informationAcademic Achievement of English Language Learners in Post Proposition 203 Arizona
Academic Achievement of English Language Learners in Post Proposition 203 Arizona by Wayne E. Wright Assistant Professor University of Texas, San Antonio Chang Pu Doctoral Student University of Texas,
More informationWORKING PAPEr 22. By Elias Walsh and Eric Isenberg. How Does a Value-Added Model Compare to the Colorado Growth Model?
WORKING PAPEr 22 By Elias Walsh and Eric Isenberg How Does a Value-Added Model Compare to the Colorado Growth Model? October 2013 Abstract We compare teacher evaluation scores from a typical value-added
More informationCanonical Correlation Analysis
Canonical Correlation Analysis LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the similarities and differences between multiple regression, factor analysis,
More informationclassroom Tool Part 3 of a 5 Part Series: How to Select The Right
How to Select The Right classroom Observation Tool This booklet outlines key questions that can guide observational tool selection. It is intended to provide guiding questions that will help users organize
More informationPartial Estimates of Reliability: Parallel Form Reliability in the Key Stage 2 Science Tests
Partial Estimates of Reliability: Parallel Form Reliability in the Key Stage 2 Science Tests Final Report Sarah Maughan Ben Styles Yin Lin Catherine Kirkup September 29 Partial Estimates of Reliability:
More informationChapter 5. Summary, Conclusions, and Recommendations. The overriding purpose of this study was to determine the relative
149 Chapter 5 Summary, Conclusions, and Recommendations Summary The overriding purpose of this study was to determine the relative importance of construction as a curriculum organizer when viewed from
More informationThe MetLife Survey of
The MetLife Survey of Challenges for School Leadership Challenges for School Leadership A Survey of Teachers and Principals Conducted for: MetLife, Inc. Survey Field Dates: Teachers: October 5 November
More informationExponential Growth and Modeling
Exponential Growth and Modeling Is it Really a Small World After All? I. ASSESSSMENT TASK OVERVIEW & PURPOSE: Students will apply their knowledge of functions and regressions to compare the U.S. population
More informationSouth Carolina College- and Career-Ready (SCCCR) Probability and Statistics
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)
More informationMapping State Proficiency Standards Onto the NAEP Scales:
Mapping State Proficiency Standards Onto the NAEP Scales: Variation and Change in State Standards for Reading and Mathematics, 2005 2009 NCES 2011-458 U.S. DEPARTMENT OF EDUCATION Contents 1 Executive
More informationTechnical Review Coversheet
Status: Submitted Last Updated: 8/6/1 4:17 PM Technical Review Coversheet Applicant: Seattle Public Schools -- Strategic Planning and Alliances, (S385A1135) Reader #1: ********** Questions Evaluation Criteria
More informationAppendix B Data Quality Dimensions
Appendix B Data Quality Dimensions Purpose Dimensions of data quality are fundamental to understanding how to improve data. This appendix summarizes, in chronological order of publication, three foundational
More informationThe Virginia Reading Assessment: A Case Study in Review
The Virginia Reading Assessment: A Case Study in Review Thomas A. Elliott When you attend a conference organized around the theme of alignment, you begin to realize how complex this seemingly simple concept
More informationA Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic
A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia
More informationValidity, Fairness, and Testing
Validity, Fairness, and Testing Michael Kane Educational Testing Service Conference on Conversations on Validity Around the World Teachers College, New York March 2012 Unpublished Work Copyright 2010 by
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationChapter 2 - Why RTI Plays An Important. Important Role in the Determination of Specific Learning Disabilities (SLD) under IDEA 2004
Chapter 2 - Why RTI Plays An Important Role in the Determination of Specific Learning Disabilities (SLD) under IDEA 2004 How Does IDEA 2004 Define a Specific Learning Disability? IDEA 2004 continues to
More informationValue-Added Measures of Educator Performance: Clearing Away the Smoke and Mirrors
Value-Added Measures of Educator Performance: Clearing Away the Smoke and Mirrors (Book forthcoming, Harvard Educ. Press, February, 2011) Douglas N. Harris Associate Professor of Educational Policy and
More information6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
More informationCompetitive Pay Policy
www.salary.com/hr Copyright 2002 Salary.com, Inc. Competitive Pay Policy Lena M. Bottos and Christopher J. Fusco, SPHR Salary.com, Inc. Abstract A competitive pay policy articulates an organization s strategy
More informationUsing the Leadership Pipeline transition focused concept as the vehicle in integrating your leadership development approach provides:
Building your Leadership Pipeline Leadership transition focused development - White Paper The Leadership Pipeline framework Business case reflections: 1. Integrated leadership development 2. Leadership
More informationMEMO TO: FROM: RE: Background
MEMO TO: FROM: RE: Amy McIntosh, Principal Deputy Assistant Secretary, delegated the authority of the Assistant Secretary, Office of Planning, Evaluation and Policy Development Dr. Erika Hunt and Ms. Alicia
More informationMissing data in randomized controlled trials (RCTs) can
EVALUATION TECHNICAL ASSISTANCE BRIEF for OAH & ACYF Teenage Pregnancy Prevention Grantees May 2013 Brief 3 Coping with Missing Data in Randomized Controlled Trials Missing data in randomized controlled
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationAssociation Between Variables
Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi
More informationA STATISTICS COURSE FOR ELEMENTARY AND MIDDLE SCHOOL TEACHERS. Gary Kader and Mike Perry Appalachian State University USA
A STATISTICS COURSE FOR ELEMENTARY AND MIDDLE SCHOOL TEACHERS Gary Kader and Mike Perry Appalachian State University USA This paper will describe a content-pedagogy course designed to prepare elementary
More informationMeta-Analytic Synthesis of Studies Conducted at Marzano Research Laboratory on Instructional Strategies
Meta-Analytic Synthesis of Studies Conducted at Marzano Research Laboratory on Instructional Strategies By Mark W. Haystead & Dr. Robert J. Marzano Marzano Research Laboratory Englewood, CO August, 2009
More informationRaw Score to Scaled Score Conversions
Jon S Twing, PhD Vice President, Psychometric Services NCS Pearson - Iowa City Slide 1 of 22 Personal Background Doctorate in Educational Measurement and Statistics, University of Iowa Responsible for
More informationALTERNATE ACHIEVEMENT STANDARDS FOR STUDENTS WITH THE MOST SIGNIFICANT COGNITIVE DISABILITIES. Non-Regulatory Guidance
ALTERNATE ACHIEVEMENT STANDARDS FOR STUDENTS WITH THE MOST SIGNIFICANT COGNITIVE DISABILITIES Non-Regulatory Guidance August 2005 Alternate Achievement Standards for Students with the Most Significant
More informationBasic Concepts in Research and Data Analysis
Basic Concepts in Research and Data Analysis Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...3 The Research Question... 3 The Hypothesis... 4 Defining the
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationInterpreting and Using SAT Scores
Interpreting and Using SAT Scores Evaluating Student Performance Use the tables in this section to compare a student s performance on SAT Program tests with the performance of groups of students. These
More informationProgram Rating Sheet - Athens State University Athens, Alabama
Program Rating Sheet - Athens State University Athens, Alabama Undergraduate Secondary Teacher Prep Program: Bachelor of Science in Secondary Education with Certification, Social Science 2013 Program Rating:
More informationProblem of the Month Through the Grapevine
The Problems of the Month (POM) are used in a variety of ways to promote problem solving and to foster the first standard of mathematical practice from the Common Core State Standards: Make sense of problems
More informationSchool Performance Framework: Technical Guide
School Performance Framework: Technical Guide Version 1.6 August 2010 This technical guide provides information about the following topics as they related to interpreting the school performance framework
More informationAn introduction to Value-at-Risk Learning Curve September 2003
An introduction to Value-at-Risk Learning Curve September 2003 Value-at-Risk The introduction of Value-at-Risk (VaR) as an accepted methodology for quantifying market risk is part of the evolution of risk
More information