The effects of value-added modeling decisions on estimates of teacher effectiveness

University of Iowa
Iowa Research Online
Theses and Dissertations
2014

The effects of value-added modeling decisions on estimates of teacher effectiveness
Paula Lynn Cunningham, University of Iowa

Copyright 2014 Paula Lynn Cunningham

This dissertation is available at Iowa Research Online.

Recommended Citation
Cunningham, Paula Lynn. "The effects of value-added modeling decisions on estimates of teacher effectiveness." PhD (Doctor of Philosophy) thesis, University of Iowa, 2014.

Part of the Educational Psychology Commons

THE EFFECTS OF VALUE-ADDED MODELING DECISIONS ON ESTIMATES OF TEACHER EFFECTIVENESS

by

Paula Lynn Cunningham

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) in the Graduate College of The University of Iowa

December 2014

Thesis Supervisor: Professor Catherine J. Welch

Copyright by
PAULA LYNN CUNNINGHAM
2014
All Rights Reserved

Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL

PH.D. THESIS

This is to certify that the Ph.D. thesis of Paula Lynn Cunningham has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) at the December 2014 graduation.

Thesis Committee:
Catherine J. Welch, Thesis Supervisor
Robert D. Ankenmann
Timothy N. Ansley
Stephen B. Dunbar
Marcus J. Haack
David B. Bills

To the memory of Bernice Rita Shanklin

"Never despair, but if you do, work on in despair."
Fortune Cookie, Chuong Garden Restaurant, Grinnell, Iowa

ACKNOWLEDGMENTS

I wish to express my sincere gratitude to my advisor, Cathy Welch, for her guidance, support, and understanding, both during the dissertation research and through all my years as a graduate student. A knowledgeable and patient mentor, she has helped me to become an independent researcher. Most of all, I am thankful for her kindness and reassurance when I needed to pause my graduate study and her encouragement when I was able to resume it. I also wish to acknowledge the generous input of Steve Dunbar to the success of this research project from its beginning. In addition, I thank all the members of my dissertation committee for their insightful feedback in the form of suggestions that aim to strengthen this document, helping to make it an accomplishment of which I can truly be proud.

I also wish to thank all the talented people of the Iowa Testing Programs, and Matt Whittaker in particular for his efforts in creating the matched longitudinal data sets and generating the state means results used in this study. It was as a graduate research assistant for Iowa Testing Programs that I learned how test development and psychometric research are accomplished. Through assignments that challenged me and the realization that my contributions mattered, I grew more confident as I gained understanding. I feel grateful for having had the privilege of working among these dedicated professionals.

Not least of all I acknowledge the source of my strength to continue in this enterprise, the solid foundation keeping me upright: my family. I cannot praise too highly my husband Charles and son Evan for their encouragement at the outset of this journey and their support through its completion. From the shaky first semester to comprehensive examinations and the dissertation phase, they have been there for me, sharing all the trials, successes, despair, and joy (in short, the life) that happened along the way.

ABSTRACT

This study was undertaken to evaluate the impact of modeling decisions made by those charged with implementing teacher evaluation systems that incorporate student achievement data; such choices include how growth is to be modeled, whether student characteristics are to be controlled for, how many years of data are to be used, and which test subject is to be selected. A three-cohort longitudinal data set was obtained from a school district in which reading and mathematics test scores from a vertically scaled assessment allowed growth to be determined in grades three, four, and five; estimated teacher effects were derived from five value-added models, and the resulting rank orderings of the teachers were examined. The models compared were a covariate adjustment model that conditioned on prior achievement only, a covariate adjustment model that conditioned on certain student characteristics as well as prior achievement, a gain score model, the growth model underlying the vertically scaled assessment, and student growth percentiles. Teacher rank orderings derived under the five models were highly consistent with one another whether one or three classroom years of test scores were used. Only when the movement of teachers between quartiles was examined did a difference in performance between some models emerge. The high degree of consistency between the two covariate adjustment models suggested that control for student-level characteristics was unnecessary. Using three years of test scores rather than one led to a small decrease in between-model correlations and a small increase in teacher movement between quartiles. Comparison of teacher value-added based on reading scores versus mathematics scores gave mixed results: between-model correlations in mathematics were slightly higher than those for reading, but reading showed greater consistency in quartile movement between cohorts.

The year-to-year change in teacher rank orderings was striking: low, and even negative, correlations emerged between years. Movement of teachers between quartiles from one year to the next was far greater than that observed when comparing the modeling conditions. Under a teacher rating scheme in which teachers were distinguished from average effectiveness if they appeared in the extremes of the rankings, nearly half of teachers changed ratings from one year to the next. Such low intertemporal stability of teacher value-added is a significant result that should be considered by all stakeholders in teacher evaluation.

PUBLIC ABSTRACT

This study examined the impact of modeling decisions made in implementing value-added teacher evaluation; such choices include the growth model itself, whether to control for student characteristics, how many years of scores to use, and the subject tested. Estimates of teacher effectiveness were derived from five models, which were a covariate adjustment model that conditioned on prior achievement only, a covariate adjustment model that conditioned on certain student characteristics as well as prior achievement, a gain score model, the growth model underlying the assessment, and student growth percentiles. The resulting rank orderings of the teachers were examined and found to be highly consistent with one another using scores for either one or three classroom years. When the movement of teachers between quartiles of the rank orderings was examined, a difference in performance between some models did emerge. The covariate adjustment models were highly consistent, suggesting that control for student-level characteristics was unnecessary. Using three years of data rather than one did not significantly change model performance, and comparison of rank orderings based on reading scores versus mathematics scores gave mixed results. The year-to-year inconsistency in rank orderings was striking. Movement of teachers between quartiles from one year to the next was far greater than that observed when comparing modeling conditions. Under a rating scheme in which teachers were distinguished from average effectiveness if they appeared in the extremes of the rankings, nearly half of teachers changed ratings from one year to the next.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER I INTRODUCTION
    An Approach to Teacher Evaluation
    Implementing VAM-based Teacher Evaluation
    Purpose of the Study and Research Questions
CHAPTER II LITERATURE REVIEW
    Status versus Growth
    Growth Models
    Growth Models versus Value-Added Models
    Four Widely Used Models
        Gain Score Model
        Residual Gain/Covariate Adjustment Model
        Student Growth Percentile Model
        Educational Value-Added Assessment System
    Research on Comparison of Models
    Ongoing Concerns about Value-Added Models
        Bias
        Precision
        Stability
    Practical Considerations
CHAPTER III METHODS
    Data
    Value-Added Models
        Covariate Adjustment Model 1 (CA1)
        Covariate Adjustment Model 2 (CA2)
        Gain Score Model (GAIN)
        Iowa Growth Model (IOWA)
        Student Growth Percentile Model (SGP)
    The Study and Research Questions
        Section 1: Question 1a
        Section 2: Question 1b
        Section 3: Question 2
        Section 4: Question 3
CHAPTER IV RESULTS
    Section 1: Effect of Model Choice with Single Cohorts
        Spearman Rank Order Correlations
        Quartile Analysis
    Section 2: Effect of Model Choice with Multiple Cohorts
        Spearman Rank Order Correlations
        Quartile Analysis
    Section 3: Stability between Cohorts
        Teacher Retention between Cohorts
        Between-cohort Spearman Rank Order Correlations
        Quartile Analysis
        Rating Consistency
    Section 4: Generalizability across Tests
        Effect of Model Choice with Single Cohorts
        Effect of Model Choice with Multiple Cohorts
        Stability between Cohorts
        Between-subject Spearman Rank Order Correlations
    Summary of Results
CHAPTER V DISCUSSION
    Summary of Findings
        Research Question 1
        Research Question 2
        Research Question 3
    Implications for Practice
    Limitations and Continuing Research
    Conclusion
APPENDIX CATERPILLAR PLOTS OF TEACHER VALUE-ADDED
REFERENCES

LIST OF TABLES

Table 3.1 Group Means with Standard Deviations on the Reading Subtest for All Cohorts and Grades
Table 3.2 Group Means with Standard Deviations on the Mathematics Subtest for All Cohorts and Grades
Table 3.3 Percentages of Students with Positive Status on FRL, IEP, ELL, and Combinations Thereof
Table 3.4 Correlations between Reading Subtest Score, Mathematics Subtest Score, FRL, IEP, and ELL Variables
Table 3.5 R² Values for Best Predictive Models
Table 4.1 Pooled Spearman Rank Order Correlations between Models for Single-year Analysis
Table 4.2 Transition Matrices Showing Quartile Consistency between Models for Single-year Analysis
Table 4.3 Percent of Teachers who Changed Quartile by Model for Single-year Analysis
Table 4.4 Pooled Spearman Rank Order Correlations between Models for Multiple-year Analysis
Table 4.5 Transition Matrices Showing Quartile Consistency between Models for Multiple-year Analysis
Table 4.6 Percent of Teachers who Changed Quartile by Model for Multiple-year Analysis
Table 4.7 Percent Teacher Retention between Cohorts
Table 4.8 Pooled Spearman Rank Order Correlations between Cohorts
Table 4.9 Median Spearman Rank Order Correlations between Cohorts
Table 4.10 Transition Matrices Showing Year-to-year Consistency of Quartiles
Table 4.11 Percent of Teachers who Changed Quartile Year-to-year
Table 4.12 Percent of Teachers who Changed Rating Year-to-year
Table 4.13 Spearman Correlations between Models Pooled by Subject for Single-year Analysis
Table 4.14 Transition Matrices Showing Quartile Consistency between Models for Single-year Analysis for the Reading Subtest
Table 4.15 Transition Matrices Showing Quartile Consistency between Models for Single-year Analysis for the Mathematics Subtest
Table 4.16 Percent of Teachers who Changed Quartile Due to Model by Subtest for Single-year Analysis
Table 4.17 Spearman Correlations between Models Pooled by Subject for Multiple-year Analysis
Table 4.18 Transition Matrices Showing Quartile Consistency between Models for Multiple-year Analysis for the Reading Subtest
Table 4.19 Transition Matrices Showing Quartile Consistency between Models for Multiple-year Analysis for the Mathematics Subtest
Table 4.20 Percent of Teachers who Changed Quartile Due to Model by Subtest for Multiple-year Analysis
Table 4.21 Pooled Spearman Rank Order Correlations between Cohorts by Subtest
Table 4.22 Median Spearman Rank Order Correlations between Cohorts by Subtest
Table 4.23 Transition Matrices Showing Year-to-year Consistency in Quartiles for Reading Subtest
Table 4.24 Transition Matrices Showing Year-to-year Consistency in Quartiles for Mathematics Subtest
Table 4.25 Percent of Teachers who Changed Quartile Year-to-year by Subject
Table 4.26 Percent of Teachers who Changed Rating Year-to-year by Subject
Table 4.27 Between-subject Spearman Correlations Pooled over Methods
Table 5.1 Additional Test Items Answered Correctly by the Class of the Highest-ranked Teacher Compared to the Class of the Lowest-ranked Teacher

LIST OF FIGURES

Figure 2.1 Illustration of the Gain Score Model
Figure 2.2 Illustration of the Residual Gain Model
Figure 2.3 Illustration of a Linear Regression Line and a Median Quantile Regression Line
Figure 3.1 Structure of the Longitudinal Data Sets
Figure 3.2 Attribution of Growth Using Fall-to-fall Testing Schedule
Figure 3.3 The Iowa Growth Model: Plots Demonstrating the Relationship between Standard Score and Percentile Rank for Levels of the Reading Subtest of the Iowa Assessments
Figure 3.4 The Eighteen Rank Orderings Generated under Each VAM Condition with Single-year Data
Figure 3.5 The Six Rank Orderings Generated under Each VAM Condition with Multiple-year Data
Figure 4.1 Rank Ordering Change from Cohort 1 to Cohort 2 for Fourth Grade Mathematics Using the Gain Score Model
Figure 4.2 Rank Ordering Change from Cohort 1 to Cohort 2 for Fourth Grade Reading Using the Gain Score Model
Figure A1 Caterpillar Plots for Cohort 1 under the CA1 Model
Figure A2 Caterpillar Plots for Cohort 1 under the CA2 Model
Figure A3 Caterpillar Plots for Cohort 1 under the GAIN Model
Figure A4 Caterpillar Plots for Cohort 1 under the IOWA Model
Figure A5 Caterpillar Plots for Cohort 1 under the SGP Model
Figure A6 Caterpillar Plots for Cohort 2 under the CA1 Model
Figure A7 Caterpillar Plots for Cohort 2 under the CA2 Model
Figure A8 Caterpillar Plots for Cohort 2 under the GAIN Model
Figure A9 Caterpillar Plots for Cohort 2 under the IOWA Model
Figure A10 Caterpillar Plots for Cohort 2 under the SGP Model
Figure A11 Caterpillar Plots for Cohort 3 under the CA1 Model
Figure A12 Caterpillar Plots for Cohort 3 under the CA2 Model
Figure A13 Caterpillar Plots for Cohort 3 under the GAIN Model
Figure A14 Caterpillar Plots for Cohort 3 under the IOWA Model
Figure A15 Caterpillar Plots for Cohort 3 under the SGP Model
Figure A16 Caterpillar Plots for Combined Cohorts under the CA1 Model
Figure A17 Caterpillar Plots for Combined Cohorts under the CA2 Model
Figure A18 Caterpillar Plots for Combined Cohorts under the GAIN Model
Figure A19 Caterpillar Plots for Combined Cohorts under the IOWA Model
Figure A20 Caterpillar Plots for Combined Cohorts under the SGP Model

CHAPTER I INTRODUCTION

Accountability in K-12 education is an ongoing concern. The most recent reauthorization of the Elementary and Secondary Education Act (ESEA), the No Child Left Behind Act of 2001 (NCLB), mandated testing of students to hold schools and districts accountable for making Adequate Yearly Progress (AYP) toward 100 percent proficiency in reading and mathematics by 2014 or face sanctions. A few years later, the Secretary of Education announced the Growth Model Pilot Program (GMPP; Spellings, 2005), and many states subsequently moved away from the status measure of proficiency toward another measure, growth to a standard, in the belief that this measure could allow some schools to make AYP that would fail to do so under the status measure. Over time, growth models have become the preferred method of analyzing student achievement test data for the purpose of accountability (Betebenner & Linn, 2010). In 2009, as part of the American Recovery and Reinvestment Act, the Race to the Top (RTTP) initiative placed emphasis on teacher evaluation using student test scores (United States Department of Education, 2009). Value-added modeling, in which student achievement is attributed to various causes such as teachers, schools, and sometimes background characteristics, is the most recent tool being brought to bear on the question of accountability. With many states choosing to emphasize teacher evaluation, and with their students' longitudinal data recorded over years of standardized testing, value-added modeling is now receiving considerable attention.

An Approach to Teacher Evaluation

Numerous states are implementing evaluation systems that incorporate students' standardized test scores to some degree in consequential decisions about teacher salaries, promotions, tenure, and even dismissal (Braun, 2005). Value-added models (VAMs) are used to quantify deviations from expected student performance on a test after a year of instruction, where the expectation is based on characteristics such as the student's achievement on the previous year's test. Teachers in elementary grades whose students take standardized tests in subjects such as reading and mathematics can be held accountable for getting those students to achieve their expected scores. The movement toward linking student test performance to teacher evaluations gained considerable momentum when the Race to the Top initiative awarded points to states that made this link (Braun, 2012).

Many proponents take the view that VAMs hold the promise of adding objectivity to teacher evaluation systems that have heretofore relied on seniority, attainment of credentials, and principal observations of classroom performance (Braun, 2012). They might argue that the first two measures do not really reflect teacher effectiveness in the classroom, and that principal observations occur too infrequently and result in satisfactory ratings for virtually all teachers, making them of little use in distinguishing between teachers (Papay, 2012). In addition, some VAMs purport to control for student background characteristics, which has been interpreted to mean that VAMs level the playing field so that teachers are evaluated more fairly. Yet VAM-derived teacher effects are themselves known to contain considerable error, particularly when they are based on fewer than three years of accumulated test data. They are also subject to unpredictable bias, introduced whether or not they attempt to account for student background characteristics (McCaffrey, Lockwood, Koretz, & Hamilton, 2003). When such statistical controls are introduced, there is a further concern that they result in different achievement expectations for different groups of students (Ballou, Sanders, & Wright, 2004).

Another consideration is striking a balance between complexity and transparency: VAMs applied in educational settings can be very complex and involve numerous factors, so explaining to teachers how the models work and how their rankings are generated is not simple (National Research Council & National Academy of Education, 2010).
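The point that VAM-derived teacher effects carry considerable error when few years of data are available can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes a teacher effect is estimated as the mean residual of the teacher's students, that those residuals are independent with a common standard deviation, and it uses hypothetical values (a residual SD of 10 scale-score points and 20 students per year) that are not taken from this study's data.

```python
import math


def se_of_teacher_mean(residual_sd: float, students_per_year: int, years: int) -> float:
    """Standard error of a teacher's mean student residual.

    Assumes independent residuals with a common SD. This is a simplification:
    students in the same classroom are rarely independent of one another.
    """
    n_students = students_per_year * years
    return residual_sd / math.sqrt(n_students)


# Hypothetical inputs, not estimates from the study.
RESIDUAL_SD = 10.0
STUDENTS_PER_YEAR = 20

for years in (1, 3, 7):
    se = se_of_teacher_mean(RESIDUAL_SD, STUDENTS_PER_YEAR, years)
    print(f"{years} year(s): SE = {se:.2f}")  # 2.24, 1.29, 0.85
```

Under these assumptions the standard error shrinks only with the square root of the number of students: moving from one to three years cuts it by about 42 percent, while moving from three to seven years buys a much smaller absolute improvement.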

Implementing VAM-based Teacher Evaluation

Despite the enthusiasm with which some state legislatures are mandating new teacher evaluation systems that incorporate student test scores, no clear set of best practices exists to guide those charged with implementing them. State departments of education and individual school districts face numerous requirements and consequential decisions when implementing teacher evaluation systems that rely on VAMs. The most obvious requirement is the existence of matched longitudinal test score data for students; depending on the model chosen, additional student-level demographic data may also be required. Within these data, accurate links to classroom teachers must exist; otherwise, the student data cannot be included in the analysis and are effectively treated as missing. Missing test scores must be handled either by deleting cases or by imputing values, and either choice has consequences (Cunningham, Welch, & Dunbar, 2014). Experts are required both to conduct the VAM analysis and to produce reports and lead training sessions that help administrators and educators draw appropriate inferences from it. Furthermore, an evaluation of the system must be established to monitor the effects of its implementation on students and teachers alike, with attention to any unintended consequences that may arise in response. Among the decisions into which state departments of education and school districts may have some input are the uses to which these analyses may be put and whether the stakes for educators are high or low.
While researchers generally agree that using student achievement test data to evaluate teachers for low-stakes purposes, such as identifying which teachers may benefit most from improvement strategies through professional development, is a warranted use of VAMs, there is far less agreement about the extent to which VAMs should be relied upon in mandated evaluation for high-stakes purposes, such as merit pay or tenure (National Research Council & National Academy of Education, 2010). The Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) make clear that there should be evidence of validity and reliability for every test use, and that the greater the consequences of a test use, the stronger the evidence in support of that use should be. States and school districts need to consider that the researchers who understand and use VAMs the most do not agree that high-stakes teacher evaluation is an appropriate use of this technique.

When the use of student achievement data for teacher evaluation has been mandated, a decision must be made about whether the VAM-derived teacher effects will replace or complement the other measures of teacher effectiveness in use. It should be considered whether the use of VAMs results in more useful, accurate, and fair outcomes than other measures. All such measures are imperfect, but some concerns expressed by researchers may be allayed when VAMs form part of a teacher evaluation system using multiple measures, such as standardized principal evaluations that include classroom visits and video recordings, tests of teachers' content knowledge, surveys of students and parents, and teacher peer evaluations (Kane & Staiger, 2012). States and school districts must still determine how to weight the VAM-derived teacher effects relative to those other measures.

Finally, numerous choices need to be made concerning the value-added modeling itself. Many different types of VAMs are discussed in the literature, yet at this time no method has emerged as dominant (National Research Council & National Academy of Education, 2010).
Some factors that are considered by VAM researchers include whether to specify teacher effects as fixed or random, whether to take a univariate or multivariate approach to modeling, how to disentangle school effects from teacher effects, and how to handle incomplete student records (McCaffrey et al., 2003).
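To illustrate the fixed-versus-random choice named above, the following sketch contrasts two ways of turning a teacher's student residuals into an effect estimate. The fixed-effect version is the plain mean residual; the random-effect version multiplies that mean by a reliability factor, the standard shrinkage (empirical Bayes) predictor for a teacher effect assumed to come from a distribution with mean zero and variance `tau2`. The variance components `tau2` and `sigma2`, and the residuals themselves, are hypothetical inputs; in practice the variance components are estimated from the data, and operational models are considerably more elaborate.

```python
def teacher_effect_estimates(residuals_by_teacher, tau2, sigma2):
    """Return {teacher: (fixed_estimate, shrunken_estimate)}.

    fixed_estimate    : mean residual of the teacher's students.
    shrunken_estimate : mean residual times the reliability
                        lambda = tau2 / (tau2 + sigma2 / n),
                        where n is the teacher's number of students.
    tau2   : assumed variance of true teacher effects (hypothetical).
    sigma2 : assumed student-level residual variance (hypothetical).
    """
    estimates = {}
    for teacher, residuals in residuals_by_teacher.items():
        n = len(residuals)
        mean_resid = sum(residuals) / n
        reliability = tau2 / (tau2 + sigma2 / n)
        estimates[teacher] = (mean_resid, reliability * mean_resid)
    return estimates


# Hypothetical residuals: both teachers have a mean residual of 5 points,
# but teacher "A" has 4 students and teacher "B" has 25.
data = {"A": [4.0, 6.0, 5.0, 5.0], "B": [5.0] * 25}
for teacher, (fixed, shrunk) in teacher_effect_estimates(data, tau2=4.0, sigma2=100.0).items():
    print(teacher, round(fixed, 2), round(shrunk, 2))  # A 5.0 0.69 / B 5.0 2.5
```

The two teachers share the same raw mean, but the estimate for the teacher with four students is pulled much closer to zero, which is one way the fixed-versus-random decision can reorder teachers with small classes.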

States and school districts, as users of VAM results, would likely have to depend on their experts for advice about the impact of these decisions on the analyses they conduct. However, these users can, and arguably should, have input on certain aspects of the modeling so that they have ownership of the process and remain accountable to their stakeholders. State departments of education and individual school districts could be involved in decisions about how to characterize student growth in achievement, how many years of data to use in evaluating teachers, and which student characteristics, if any, to control for in the analysis (Raudenbush, 2004). There are many metrics available to characterize the student growth modeled by VAMs, and the choice of growth metric will depend on factors such as the type of assessments available and how easily policymakers and practitioners can understand the resulting description of student growth. One metric that has seen much use in VAMs is residual gain, a measure of how much a student's score deviates from the regression of current scores on past scores; a VAM that uses this method to characterize growth is called a covariate adjustment model. Another metric used in VAMs is the gain score, which is simply the difference between one year's achievement and the prior year's achievement on the score scale. While there is no single preferred model for value-added analysis, these are among the more commonly used choices (McCaffrey et al., 2003). There are, however, additional growth metrics that could find application in VAMs. One is expected annual growth on an assessment conditional on prior achievement, which can be predicted by projecting forward a year through the assessment's vertical scale, itself established on a growth model (Furgol, Fina, & Welch, 2011). Another growth metric that could be used to calculate estimates of teacher value-added is the student growth percentile (SGP; Betebenner, 2009).
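The growth metrics just described can be contrasted on a toy data set. The sketch below assumes scores from two adjacent years on a common vertical scale and uses made-up values. The gain score and residual gain follow the definitions in the text, with the residual taken from an ordinary least-squares regression of current on prior scores. The third function is only a rough stand-in for an SGP, named here as a swapped-in simplification: it gives each student a mid-rank percentile of the current score among students in the same prior-score bin, whereas actual SGPs are estimated with quantile regression (Betebenner, 2009). The function names and the bin count are hypothetical choices for illustration.

```python
from statistics import mean


def gain_scores(prior, current):
    """Gain score: current minus prior score on a common (vertical) scale."""
    return [c - p for p, c in zip(prior, current)]


def residual_gains(prior, current):
    """Residual gain: deviation of each current score from the OLS line
    of current scores regressed on prior scores."""
    mx, my = mean(prior), mean(current)
    sxy = sum((p - mx) * (c - my) for p, c in zip(prior, current))
    sxx = sum((p - mx) ** 2 for p in prior)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [c - (intercept + slope * p) for p, c in zip(prior, current)]


def binned_growth_percentiles(prior, current, n_bins=2):
    """Simplified SGP stand-in (not Betebenner's quantile regression):
    mid-rank percentile of each current score among students whose
    prior scores fall in the same equal-size bin."""
    order = sorted(range(len(prior)), key=lambda i: prior[i])
    bin_size = -(-len(order) // n_bins)  # ceiling division
    percentiles = [0.0] * len(prior)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        for i in members:
            below = sum(current[j] < current[i] for j in members)
            ties = sum(current[j] == current[i] for j in members)  # includes i
            percentiles[i] = 100.0 * (below + 0.5 * ties) / len(members)
    return percentiles


# Made-up scores for six students in two adjacent years.
prior = [180, 185, 190, 200, 205, 210]
current = [195, 196, 208, 212, 222, 223]
print(gain_scores(prior, current))  # [15, 11, 18, 12, 17, 13]
print([round(r, 2) for r in residual_gains(prior, current)])
print([round(p, 1) for p in binned_growth_percentiles(prior, current)])
# [16.7, 50.0, 83.3, 16.7, 50.0, 83.3]
```

Note that the gain score requires the two years' scores to share a scale, while the residual and percentile metrics only need current scores to be comparable among students with similar prior scores.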
The SGP metric relies on quantile regression, conditioning on prior achievement to describe the current achievement of students. Because the first of these additional metrics depends on a vertically scaled assessment whereas the second does not, the types of assessments available to the state or school district may dictate which of them is preferred. Furthermore, a particular growth metric may come to be seen as more acceptable by practitioners, particularly if its details can be communicated thoroughly enough to be accurate yet transparently enough to be understandable.

State departments of education and individual school districts adopting VAM-based evaluation systems need to decide over how many years of instruction teachers will be evaluated. One of the major hurdles in applying VAMs to teacher evaluation is that teachers, especially in the elementary grades, often have very small classes. While more years of teaching in the district will increase the amount of student data available to evaluate a teacher, and may thereby lower the standard errors of the teacher effects, the solution is not simply to use seven years of data and assume that the errors of the estimates will improve substantially. Not all teachers will have taught in a district for that many years, so there will always be many teachers who have few students and, consequently, estimated teacher effects with larger standard errors. Furthermore, there is the question of whether it is appropriate to use seven-year-old data in current teacher evaluations; that question would have to be taken up by those who set policy.

The use of student, and sometimes teacher, characteristics to adjust expected student growth is controversial. Many value-added researchers embrace the idea because the practice may result in greater stability of the estimated teacher effects. It is also purported to correct for influences on student achievement from outside the school environment, so that teachers are fairly evaluated regardless of the composition of their classrooms.
However, those who make decisions for states and school districts are often more reluctant to include demographic covariates, in order to avoid the appearance of adopting different expectations for different groups of students. While research on the effect of including such covariates is somewhat mixed, it is clear that prior achievement is the single most important covariate, accounting for much more variance in the estimates than demographic covariates do (Ballou et al., 2004; Lockwood et al., 2007). Statistical control for student-level characteristics is easily implemented as part of a covariate adjustment model.

Purpose of the Study and Research Questions

In order to provide guidance to policymakers and practitioners making decisions about teacher evaluation systems that incorporate student achievement data, a study was undertaken to evaluate the impact of choices about how student growth is to be modeled, how many years of data are to be used, and whether student characteristics are to be controlled for in the analysis. The study used a three-cohort longitudinal data set from a school district in which reading and mathematics test scores from a vertically scaled assessment were available for four consecutive years in each cohort, so that growth could be assessed in the third, fourth, and fifth grades. Estimated teacher effects were derived from VAMs using five different metrics for growth, and the resulting rank orderings of the teachers were examined. The research questions for the study were:

1. How do the rank orderings derived from different metrics for growth compare with one another for both (a) single-year and (b) multiple-year analyses?
2. How do the rank orderings derived using the various growth metrics compare year-to-year between the cohorts?
3. How generalizable are the answers to questions 1 and 2 from one test subject to another?

These three research questions address various aspects of the application of VAMs in a practical setting. The methods used to address each research question are described in detail in Chapter III.

CHAPTER II LITERATURE REVIEW

This chapter discusses value-added modeling within the broader context of student growth in achievement, beginning with the distinction between status and growth and their use as accountability measures. This introduction is followed by the definition of a growth model and an explanation of the general types of growth models, as categorized by different researchers. The key distinction between growth models and VAMs is then given, followed by a discussion of applications and considerations for several models. Ongoing concerns about bias, error, and stability in the estimates generated by VAMs are described next. Finally, considerations for those involved in implementing teacher evaluation systems that incorporate student achievement data are addressed.

Status versus Growth

As accountability systems in education have evolved in response to changes in the guidance provided by government agencies, there has been a concomitant movement away from reliance on status measures toward the adoption of growth measures (Briggs & Betebenner, 2009). The difference between a status measure and a growth measure is the distinction between single and multiple snapshots of student achievement. Castellano and Ho (2013a) define status as the academic performance of a student or group (a collection of students) at a single point in time, and they define growth as the academic performance of a student or group (a collection of students) over two or more time points. It was felt that status measures, such as yearly average performance, were not sufficient for the purpose of accountability and that student change over time would be a better measure. With growth measures, each student's progress could be compared against that student's own achievement in the previous year rather than against a cohort average (Callender, 2004).

Growth Models

Castellano and Ho (2013a) define a growth model as a collection of definitions, calculations, or rules that summarizes student performance over two or more time points and supports interpretations about students, their classrooms, their educators, or their schools. The authors also classify growth models according to several criteria. One classification is based on the primary interpretations that growth models support: growth description, growth prediction, and value-added. Another useful classification is based on the statistical foundations underlying the growth model and proposes three categories: gain-based models, conditional status models, and multivariate models.

The first of these statistical foundations supports models that use a gain score to quantify growth. A gain score is simply the difference between a test score at one point in time and a test score at another point in time. One essential feature of a test used in a gain-based model is a vertical scale, which provides a developmental basis for interpreting growth over successive grade levels. With test scores for all grade levels placed on the same scale, it is possible to compare a student's fall test score at the third grade level with that at the fourth grade level and interpret the difference as the growth the student made over the year in the subject being tested (Castellano & Ho, 2013a).

The second statistical foundation underlies growth models that allow one to interpret a student's current status in light of what that student's status is expected to be, based on the past scores of that student and others. These are called conditional status models because they describe current status conditional on past status; that is, they take past test scores into account.
This foundation differs from the one underlying gain-based models. In a gain-based model, growth between two time points is the difference between current status and past status; in a conditional status model, current status is compared with an expected status derived from past performance and potentially other information. Castellano and Ho (2013a) give as examples of conditional status models the residual gain model, in which conditional status is the difference between the current score and the score expected given past scores, and the student growth percentile model, in which the expectation is expressed through the percentile rank of the current score in the distribution of scores of students who had the same score at an earlier time.

The third statistical foundation described by Castellano and Ho (2013a) is the basis for multivariate models, which are used primarily to estimate school and teacher effects in value-added applications; this foundation is not ideal for growth description or prediction. Such models use large amounts of data and can be very complex. Perhaps the most widely implemented model of this type is the Educational Value-Added Assessment System, known as SAS EVAAS (Sanders & Horn, 1994); this model requires specialized proprietary software from the SAS Institute (SAS Institute, 2012).

Castellano and Ho's (2013a) classification of growth models by statistical foundation is not intended as the only correct interpretation, and other classification systems exist. For instance, Briggs and Betebenner (2009) assert that all statistical models for test score growth are essentially models of conditional achievement. They note that models can be distinguished by whether they model student achievement conditional on time or conditional on prior achievement. Models that conceptualize achievement conditional on time are referred to as absolute growth models, and those that conceptualize achievement conditional on prior achievement are referred to as relative growth models.
In their scheme, a gain score model is an absolute growth model that is constrained to use scores from only two longitudinal time points. They too note that this model requires scores to be placed on a vertical scale in order to make meaningful comparisons in an absolute sense (Briggs & Betebenner, 2009). These authors assert that the quantity of interest in a relative growth model is the residual, the difference between a student's observed achievement and the achievement that would be predicted given the student's prior achievement. Use of residuals provides a normative interpretation of growth: the residual shows the amount of growth above or below the statistical expectation. Models as different in complexity as simple linear regression models, such as the residual gain model, and multivariate models, such as SAS EVAAS, are relative growth models by this definition. The common foundation underpinning these models is the principle of relative growth, defined as the difference between observed and expected achievement (Briggs & Betebenner, 2009).

Growth Models versus Value-Added Models

Briggs and Betebenner (2009) state that "the leap from a growth model to what can be called a value-added model is a short one." They also assert that any growth model can be turned into a VAM through three steps. First, one must define what constitutes expected achievement for a student. Second, one must calculate a deviation that contrasts the achievement observed for the student with the achievement expected. Third, one must infer that this deviation expresses the value added to student achievement by the teacher. Making a similar argument, Castellano and Ho (2013a) state, "we consider value-added to be an inference, not a model."

Others take the view that growth models and VAMs are distinct because growth models do not generally control for student background or school factors (Baker et al., 2010).
They argue that one cannot attribute student growth in achievement to teachers without controlling for the effects of these factors. Yet Castellano and Ho (2013a) point out that without a rigorous experimental design in which, among other requirements, students are assigned randomly to classrooms, no model can support value-added inferences on its own. In practice, as opposed to in research, most statistical models used to support value-added inferences have tended not to include predictor variables such as race or socioeconomic status (National Research Council & National Academy of Education, 2010).

Four Widely Used Models

What follows is a brief description of four models frequently used to characterize student growth for accountability purposes, including teacher evaluation: the gain score model, the residual gain/covariate adjustment model, the student growth percentile model, and the SAS EVAAS model.

Gain Score Model

As noted earlier, a gain score is simply the difference between a test score at one point in time and a test score at another point in time. In the context of accountability, the two time points of interest occur at two grade levels, so the scores need to be placed on a common scale that represents increasing competence in the domain being tested. The gain score model is an absolute growth model that describes a student's growth relative to his or her own previous score. As the following example (Castellano & Ho, 2013a) shows, the gain score is the test score at the current time point minus the test score at the previous time point. This calculation is depicted graphically in Figure 2.1, where a student's scores in third and fourth grade on a hypothetical vertically scaled test are shown. This student's scores are marked with black dots, and the gain score is shown by the vertical difference between them. In this case the third grade score, which is 3, is subtracted from the fourth grade score, which is 37, to yield a gain score of +2.
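The gain score calculation, together with one way of turning classroom gains into a value-added measure (the deviation of a classroom's average gain from the district average gain), can be sketched as follows. The records, function names, and the choice to compute the district average over all students rather than over classroom averages are assumptions made for illustration; the scores are presumed to be vertically scaled.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (student_id, classroom, grade-3 score, grade-4 score),
# assumed to be reported on a common vertical scale.
RECORDS = [
    ("s1", "Room 1", 410, 432),
    ("s2", "Room 1", 395, 409),
    ("s3", "Room 2", 420, 425),
    ("s4", "Room 2", 400, 412),
]


def gain(prior, current):
    """Gain score: current score minus prior score."""
    return current - prior


def classroom_mean_gains(rows):
    """Average the students' individual gain scores within each classroom."""
    by_room = defaultdict(list)
    for _, room, prior, current in rows:
        by_room[room].append(gain(prior, current))
    return {room: mean(gains) for room, gains in by_room.items()}


def gain_value_added(rows):
    """Deviation of each classroom's mean gain from the district mean of student gains."""
    district = mean(gain(p, c) for _, _, p, c in rows)
    return {room: g - district for room, g in classroom_mean_gains(rows).items()}
```

In this toy district the student gains are 22, 14, 5, and 12, so Room 1 averages 18 points, Room 2 averages 8.5, and the district averages 13.25; the gain-based value-added is therefore +4.75 for Room 1 and -4.75 for Room 2.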
Gain scores can be aggregated to the group level by averaging a set of students' gain scores, in order to characterize the average change in performance for the group. Most often, the average of students' individual gain scores serves as a group-level summary statistic for a subset of students, such as those in a particular classroom, school, or district. When the average gain score is positive, one can conclude that the students as a group made positive gains; when it is negative, one can conclude that the group's performance declined overall.

Gain score models can be used to make value-added determinations of teacher effectiveness by taking the value-added to be the deviation of a group's average gain from the average gain in the district. However, some have expressed concern that gain-based models are not the best choice for value-added inferences, because the resulting school effects depend on the vertical scaling properties of the tests (Briggs & Weeks, 2009). Since vertical scales are developed to describe student growth in achievement, and not necessarily to support causal inferences about that growth, Briggs and Weeks (2009) argue that some properties of a vertical scale may be poorly suited to accountability purposes. For instance, some vertical scales reflect that higher-scoring students make greater gains than lower-scoring students (Castellano & Ho, 2013a). Such a vertical scale may correctly describe the observed pattern of growth with respect to initial status, but it does not make for the best accountability tool where growth expectations for all students are required to be equal. On the other hand, note Castellano and Ho (2013a), these differential, scale-based expectations for lower-scoring students may be precisely what the accountability model should reflect.

Residual Gain/Covariate Adjustment Model

Linear regression is a statistical method for predicting an outcome variable from one or more predictor variables. The residual gain model uses linear regression to predict students' expected scores from their prior scores.
The residual gain is then calculated as the observed current score minus the expected score determined by the model. The residual is the quantity that describes the amount students scored above or below their expected scores, which were determined by their prior performance.
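In the single-predictor case, the residual gain model can be written out explicitly. In the notation below, which is introduced here for illustration, x_i is student i's prior score, y_i is the current score, \hat{y}_i is the expected score from the ordinary least squares line, and e_i is the residual gain:

```latex
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i,
\qquad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

e_i = y_i - \hat{y}_i = (y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x}),
\qquad
\sum_{i=1}^{n} e_i = 0
```

The last identity follows because both deviation sums, \sum (y_i - \bar{y}) and \sum (x_i - \bar{x}), are zero. Least-squares residuals from a model with an intercept therefore always sum to zero, which is why the mean residual gain across the entire analysis data set is zero even though a particular classroom's mean residual gain need not be.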

The following example, offered by Castellano and Ho (2013a), illustrates the residual gain model. Suppose there is a sample of eight fourth-grade students with test scores for both the third and fourth grades. Figure 2.2(a) shows a scatterplot of the students' third and fourth grade scores, which are: (34,33), (34,3), (34,36), (3,3), (3,36), (3,37), (3,37), and (3,38). The eight students are represented in the plot by solid black dots, and the black line in the figure is the prediction line for fourth grade scores given third grade scores, which is the output of the linear regression. The prediction line is the least squares line of best fit for fourth grade scores as a function of third grade scores; thus the line represents the expected fourth grade score at every possible third grade score. For instance, for a student with a third grade score of 3, the model predicts an expected fourth grade score of 364.

Determining the expected current score is only the first step in the residual gain model. Figure 2.2(b) illustrates the calculation of the residual gain score, which is the difference between the observed current score and the expected current score. For a particular student whose score in third grade was 3 and in fourth grade was 375, the expected fourth grade score predicted by the regression line is 364. In this case the expected fourth grade score, 364, is subtracted from the observed fourth grade score, 375, yielding a residual gain of +11.

The typical summary statistic for a group of students is the average residual gain for those students in the same classroom, school, or district. The mean residual gain score is zero across the data set used in the analysis; for any given classroom in the data set, however, the mean residual gain score is not necessarily zero. The magnitude and sign of the mean residual gain score reveal how the achievement of the students in the classroom compares with expectations for their achievement (Castellano & Ho, 2013a).
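The residual gain calculation and its classroom-level summary can be sketched in Python. This is a minimal sketch, assuming a single predictor (the prior score) fitted by ordinary least squares; the data, room labels, and function names are hypothetical. The comments map each part onto the three steps Briggs and Betebenner (2009) describe for turning a growth model into a VAM; treating a classroom's mean residual as the teacher's value-added is the covariate adjustment interpretation, an inference rather than something the arithmetic itself establishes.

```python
from collections import defaultdict
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for predicting y from one predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_gains(prior, current):
    intercept, slope = fit_simple_ols(prior, current)
    expected = [intercept + slope * p for p in prior]   # step 1: expected achievement
    return [c - e for c, e in zip(current, expected)]   # step 2: deviation from expectation


def classroom_mean_residuals(rooms, prior, current):
    """Step 3 (an inference): read each classroom's mean residual as value-added."""
    by_room = defaultdict(list)
    for room, r in zip(rooms, residual_gains(prior, current)):
        by_room[room].append(r)
    return {room: mean(rs) for room, rs in by_room.items()}


# Hypothetical grade-3 (prior) and grade-4 (current) scores in two classrooms.
prior = [400, 410, 420, 430, 400, 410, 420, 430]
current = [412, 418, 435, 440, 405, 425, 428, 446]
rooms = ["A"] * 4 + ["B"] * 4
effects = classroom_mean_residuals(rooms, prior, current)
```

A covariate adjustment model with additional predictors would replace the one-variable fit with a multiple regression; the single-predictor version is kept here so the arithmetic stays visible. Because the two classrooms are the same size, their mean residuals sum to zero, mirroring the point that residual gains average to zero over the whole data set but not within each classroom.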

When the assumption is made that a group's average residual gain is the value added to the group's average test score by a teacher or school, the model becomes a type of VAM called a covariate adjustment model. Like the residual gain model, the covariate adjustment model predicts expected values of an outcome variable from one or more predictor variables. The covariate adjustment model is one of the most commonly used models for supporting value-added interpretations (Castellano & Ho, 2013a).

Student Growth Percentile Model

The student growth percentile (SGP) model describes current student status by taking past performance into account and thus rests on a conditional status statistical foundation. Because SGPs give the relative position of a student's current score within the conditional distribution of scores from students with similar past performance, the SGP model, like other relative growth models, provides a normative interpretation of growth (Betebenner, 2009).

As shown in the previous section, the result of the covariate adjustment model is a single line representing the best prediction of the outcome variable from a predictor variable. The solid black line in Figure 2.3 is the linear regression line from the Castellano and Ho (2013a) example in the previous section, where the predictor variable is the third grade score and the outcome variable is the fourth grade score. Using a technique called quantile regression, the SGP model fits not just one line, the conditional mean produced by linear regression, but 99 lines, one for each conditional percentile (1 through 99). The dashed line in Figure 2.3 is the line for the conditional median (the 50th line), which represents the best prediction of the median fourth grade score given the third grade score. Points lying on or closest to this line would be assigned SGPs of 50. Points lying above the conditional median line would be assigned SGPs higher than 50, depending on which conditional percentile line they are closest to; likewise, points lying below the conditional median line would be assigned SGPs lower than 50.
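Quantile regression is needed in practice because students rarely share exactly the same prior score. The sketch below deliberately substitutes a simpler technique: it groups students with identical prior scores and assigns each student the percentile rank of the current score within that peer group. This matches the verbal description of an SGP given earlier in this chapter, but it is not Betebenner's (2009) quantile regression method. The sketch also computes both the median and the mean SGP per classroom, the two aggregates discussed in this section. Function names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def percentile_rank(score, peer_scores):
    """Midpoint percentile rank of score among peer_scores (which include the student).

    The result is rounded with Python's round() (halves go to the even integer)
    and clipped to 1..99, the range of reported SGPs.
    """
    below = sum(s < score for s in peer_scores)
    ties = sum(s == score for s in peer_scores)
    pr = 100 * (below + 0.5 * ties) / len(peer_scores)
    return min(99, max(1, round(pr)))


def simple_sgps(prior, current):
    """Approximate SGPs by grouping students with identical prior scores (no quantile regression)."""
    peers = defaultdict(list)
    for p, c in zip(prior, current):
        peers[p].append(c)
    return [percentile_rank(c, peers[p]) for p, c in zip(prior, current)]


def aggregate_sgps(rooms, sgps):
    """Median and mean SGP for each classroom."""
    by_room = defaultdict(list)
    for room, s in zip(rooms, sgps):
        by_room[room].append(s)
    return {room: {"median": median(v), "mean": mean(v)} for room, v in by_room.items()}


# Hypothetical: five students who all scored 400 in grade 3.
sgps = simple_sgps([400] * 5, [410, 420, 430, 440, 450])
summary = aggregate_sgps(["A", "A", "B", "B", "B"], sgps)
```

With five peers sharing the same prior score, the students receive SGPs of 10, 30, 50, 70, and 90, so the middle student lands at the conditional median of 50. The peer-group shortcut breaks down when groups are small or scores are continuous, which is the situation quantile regression is designed to handle.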

Median SGPs are the most commonly used aggregate SGP metric; the median was recommended because SGPs are percentile ranks, which lie on a scale that is not recommended for averaging (Betebenner, 2009). However, it has been shown recently that averages of percentile ranks can support more stable aggregate statistics for SGPs (Castellano & Ho, 2013b). Castellano (2011) showed that the mean may in fact be preferable to the median when aggregating SGPs, as mean SGPs were found to classify and rank groups more similarly to value-added effects than median SGPs did. When aggregated at the classroom, school, or district level, SGPs support descriptive interpretations of the growth of student groups. The aggregates summarize how the SGPs are distributed with either an average value or a typical value for the group. According to Betebenner (2009), SGPs are not intended to support value-added interpretations, although SGPs derived from quantile regression are reported to be strongly correlated with value-added estimates from the SAS EVAAS model (Briggs & Betebenner, 2009).

Educational Value-Added Assessment System

The SAS EVAAS model is an example of a multivariate model designed primarily to support value-added inferences for schools and teachers (Sanders & Horn, 1994). The model considers all available student scores for up to five years in order to create statistical expectations for performance by tracking students as they move through classrooms and schools over time. Greater or lesser than expected performance can be attributed to the students' teachers and schools, with a causal determination of how much each teacher or school contributes to average student performance (Castellano & Ho, 2013). In this model, the effect of teachers on student performance is assumed to persist into the future undiminished.
That is, the degree to which student performance in third grade is attributable to the third grade teacher persists into fourth grade, fifth grade, and so on. Because of this feature, the SAS EVAAS model is termed a layered model, as successive teacher effects are layered onto students over time (Braun, 2005). Performance expectations are set for students in a particular classroom by considering these students' current test scores and their test scores from before they enter the classroom and after they leave it; the model also includes the average scores for the district and individual test scores in all other subjects, in addition to the effects of other teachers over time. The SAS EVAAS model is complex, incorporates a large amount of information, and requires highly specialized proprietary software to run (SAS Institute, 2012).

Research on Comparison of Models

No VAM used in an educational setting is generally agreed upon as the best one for accountability decisions, and every VAM has both favorable and unfavorable features depending on the context in which it is applied. In any comparison of VAMs applied to either simulated or real data sets, there is no way to assess definitively which model produces the correct (or closer to correct) teacher effects, which are assumed to reflect teacher effectiveness in the classroom. In a study that compared a simple fixed effects model (SFEM) parameterized as a gain score model, a layered mixed effects model (LMEM) similar to the SAS EVAAS model, and a hierarchical linear mixed model (HLMM), the researchers suggested that policymakers, school districts, and stakeholders would likely prefer the SFEM because of its transparency (Tekwe et al., 2004). Three cohorts of elementary school students with test score data in reading and mathematics were used to calculate school effects under these models. The researchers found high correlations between the rankings from all these models, ranging from .91 to 1.00 in reading and from .96 to 1.00 in mathematics.
Since they believed that the SFEM was the more desirable model because it was more easily understandable, the authors concluded there was no benefit to using the other models in this context.


Teacher Prep Student Performance Models - Six Core Principles Teacher preparation program student performance models: Six core design principles Just as the evaluation of teachers is evolving into a multifaceted assessment, so too is the evaluation of teacher preparation

More information

Information and Employee Evaluation: Evidence from a Randomized Intervention in Public Schools. Jonah E. Rockoff 1 Columbia Business School

Information and Employee Evaluation: Evidence from a Randomized Intervention in Public Schools. Jonah E. Rockoff 1 Columbia Business School Preliminary Draft, Please do not cite or circulate without authors permission Information and Employee Evaluation: Evidence from a Randomized Intervention in Public Schools Jonah E. Rockoff 1 Columbia

More information

III. FREE APPROPRIATE PUBLIC EDUCATION (FAPE)

III. FREE APPROPRIATE PUBLIC EDUCATION (FAPE) III. FREE APPROPRIATE PUBLIC EDUCATION (FAPE) Understanding what the law requires in terms of providing a free appropriate public education to students with disabilities is central to understanding the

More information

Placement Stability and Number of Children in a Foster Home. Mark F. Testa. Martin Nieto. Tamara L. Fuller

Placement Stability and Number of Children in a Foster Home. Mark F. Testa. Martin Nieto. Tamara L. Fuller Placement Stability and Number of Children in a Foster Home Mark F. Testa Martin Nieto Tamara L. Fuller Children and Family Research Center School of Social Work University of Illinois at Urbana-Champaign

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Measurement with Ratios

Measurement with Ratios Grade 6 Mathematics, Quarter 2, Unit 2.1 Measurement with Ratios Overview Number of instructional days: 15 (1 day = 45 minutes) Content to be learned Use ratio reasoning to solve real-world and mathematical

More information

096 Professional Readiness Examination (Mathematics)

096 Professional Readiness Examination (Mathematics) 096 Professional Readiness Examination (Mathematics) Effective after October 1, 2013 MI-SG-FLD096M-02 TABLE OF CONTENTS PART 1: General Information About the MTTC Program and Test Preparation OVERVIEW

More information

Learning and Teaching

Learning and Teaching B E S T PRACTICES NEA RESEARCH BRIEF Learning and Teaching July 2006 This brief outlines nine leading research-based concepts that have served as a foundation for education reform. It compares existing

More information

TEST-DRIVEN accountability is now the

TEST-DRIVEN accountability is now the Ten Big Effects of the No Child Left Behind Act on Public Schools The Center on Education Policy has been carefully monitoring the implementation of NCLB for four years. Now Mr. Jennings and Ms. Rentner

More information

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88)

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Introduction The National Educational Longitudinal Survey (NELS:88) followed students from 8 th grade in 1988 to 10 th grade in

More information

Executive Summary April 2009

Executive Summary April 2009 Executive Summary April 2009 About the International Coach Federation The International Coach Federation (ICF) is the largest worldwide resource for business and personal coaches, and the source for those

More information

The test uses age norms (national) and grade norms (national) to calculate scores and compare students of the same age or grade.

The test uses age norms (national) and grade norms (national) to calculate scores and compare students of the same age or grade. Reading the CogAT Report for Parents The CogAT Test measures the level and pattern of cognitive development of a student compared to age mates and grade mates. These general reasoning abilities, which

More information

Master Plan Evaluation Report for English Learner Programs

Master Plan Evaluation Report for English Learner Programs Master Plan Evaluation Report (2002-03) for English Learner Programs Page i Los Angeles Unified School District Master Plan Evaluation Report for English Learner Programs 2002-03 Prepared by Jesús José

More information

ACT Research Explains New ACT Test Writing Scores and Their Relationship to Other Test Scores

ACT Research Explains New ACT Test Writing Scores and Their Relationship to Other Test Scores ACT Research Explains New ACT Test Writing Scores and Their Relationship to Other Test Scores Wayne J. Camara, Dongmei Li, Deborah J. Harris, Benjamin Andrews, Qing Yi, and Yong He ACT Research Explains

More information

Secondly, this study was peer reviewed, as I have mentioned, by other top experts in the testing and measurement community before it was released.

Secondly, this study was peer reviewed, as I have mentioned, by other top experts in the testing and measurement community before it was released. HOME SCHOOLING WORKS Pass it on! Online Press Conference March 23, 1999, 12:00pm EST A transcript of the opening remarks by Michael Farris, Esq. & Lawrence M. Rudner, Ph.D. Michael Farris: Good morning.

More information

IMPLEMENTATION NOTE. Validating Risk Rating Systems at IRB Institutions

IMPLEMENTATION NOTE. Validating Risk Rating Systems at IRB Institutions IMPLEMENTATION NOTE Subject: Category: Capital No: A-1 Date: January 2006 I. Introduction The term rating system comprises all of the methods, processes, controls, data collection and IT systems that support

More information

Constructing a TpB Questionnaire: Conceptual and Methodological Considerations

Constructing a TpB Questionnaire: Conceptual and Methodological Considerations Constructing a TpB Questionnaire: Conceptual and Methodological Considerations September, 2002 (Revised January, 2006) Icek Ajzen Brief Description of the Theory of Planned Behavior According to the theory

More information

Teacher Performance Evaluation System

Teacher Performance Evaluation System Chandler Unified School District Teacher Performance Evaluation System Revised 2015-16 Purpose The purpose of this guide is to outline Chandler Unified School District s teacher evaluation process. The

More information

Economic inequality and educational attainment across a generation

Economic inequality and educational attainment across a generation Economic inequality and educational attainment across a generation Mary Campbell, Robert Haveman, Gary Sandefur, and Barbara Wolfe Mary Campbell is an assistant professor of sociology at the University

More information

Core Goal: Teacher and Leader Effectiveness

Core Goal: Teacher and Leader Effectiveness Teacher and Leader Effectiveness Board of Education Update January 2015 1 Assure that Tulsa Public Schools has an effective teacher in every classroom, an effective principal in every building and an effective

More information

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service THE SELECTION OF RETURNS FOR AUDIT BY THE IRS John P. Hiniker, Internal Revenue Service BACKGROUND The Internal Revenue Service, hereafter referred to as the IRS, is responsible for administering the Internal

More information

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

More information

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces Or: How I Learned to Stop Worrying and Love the Ball Comment [DP1]: Titles, headings, and figure/table captions

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

Analysis of academy school performance in GCSEs 2014

Analysis of academy school performance in GCSEs 2014 Analysis of academy school performance in GCSEs 2014 Final report Report Analysis of academy school performance in GCSEs 2013 1 Analysis of Academy School Performance in GCSEs 2014 Jack Worth Published

More information

Academic Achievement of English Language Learners in Post Proposition 203 Arizona

Academic Achievement of English Language Learners in Post Proposition 203 Arizona Academic Achievement of English Language Learners in Post Proposition 203 Arizona by Wayne E. Wright Assistant Professor University of Texas, San Antonio Chang Pu Doctoral Student University of Texas,

More information

WORKING PAPEr 22. By Elias Walsh and Eric Isenberg. How Does a Value-Added Model Compare to the Colorado Growth Model?

WORKING PAPEr 22. By Elias Walsh and Eric Isenberg. How Does a Value-Added Model Compare to the Colorado Growth Model? WORKING PAPEr 22 By Elias Walsh and Eric Isenberg How Does a Value-Added Model Compare to the Colorado Growth Model? October 2013 Abstract We compare teacher evaluation scores from a typical value-added

More information

Canonical Correlation Analysis

Canonical Correlation Analysis Canonical Correlation Analysis LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the similarities and differences between multiple regression, factor analysis,

More information

classroom Tool Part 3 of a 5 Part Series: How to Select The Right

classroom Tool Part 3 of a 5 Part Series: How to Select The Right How to Select The Right classroom Observation Tool This booklet outlines key questions that can guide observational tool selection. It is intended to provide guiding questions that will help users organize

More information

Partial Estimates of Reliability: Parallel Form Reliability in the Key Stage 2 Science Tests

Partial Estimates of Reliability: Parallel Form Reliability in the Key Stage 2 Science Tests Partial Estimates of Reliability: Parallel Form Reliability in the Key Stage 2 Science Tests Final Report Sarah Maughan Ben Styles Yin Lin Catherine Kirkup September 29 Partial Estimates of Reliability:

More information

Chapter 5. Summary, Conclusions, and Recommendations. The overriding purpose of this study was to determine the relative

Chapter 5. Summary, Conclusions, and Recommendations. The overriding purpose of this study was to determine the relative 149 Chapter 5 Summary, Conclusions, and Recommendations Summary The overriding purpose of this study was to determine the relative importance of construction as a curriculum organizer when viewed from

More information

The MetLife Survey of

The MetLife Survey of The MetLife Survey of Challenges for School Leadership Challenges for School Leadership A Survey of Teachers and Principals Conducted for: MetLife, Inc. Survey Field Dates: Teachers: October 5 November

More information

Exponential Growth and Modeling

Exponential Growth and Modeling Exponential Growth and Modeling Is it Really a Small World After All? I. ASSESSSMENT TASK OVERVIEW & PURPOSE: Students will apply their knowledge of functions and regressions to compare the U.S. population

More information

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)

More information

Mapping State Proficiency Standards Onto the NAEP Scales:

Mapping State Proficiency Standards Onto the NAEP Scales: Mapping State Proficiency Standards Onto the NAEP Scales: Variation and Change in State Standards for Reading and Mathematics, 2005 2009 NCES 2011-458 U.S. DEPARTMENT OF EDUCATION Contents 1 Executive

More information

Technical Review Coversheet

Technical Review Coversheet Status: Submitted Last Updated: 8/6/1 4:17 PM Technical Review Coversheet Applicant: Seattle Public Schools -- Strategic Planning and Alliances, (S385A1135) Reader #1: ********** Questions Evaluation Criteria

More information

Appendix B Data Quality Dimensions

Appendix B Data Quality Dimensions Appendix B Data Quality Dimensions Purpose Dimensions of data quality are fundamental to understanding how to improve data. This appendix summarizes, in chronological order of publication, three foundational

More information

The Virginia Reading Assessment: A Case Study in Review

The Virginia Reading Assessment: A Case Study in Review The Virginia Reading Assessment: A Case Study in Review Thomas A. Elliott When you attend a conference organized around the theme of alignment, you begin to realize how complex this seemingly simple concept

More information

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia

More information

Validity, Fairness, and Testing

Validity, Fairness, and Testing Validity, Fairness, and Testing Michael Kane Educational Testing Service Conference on Conversations on Validity Around the World Teachers College, New York March 2012 Unpublished Work Copyright 2010 by

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Chapter 2 - Why RTI Plays An Important. Important Role in the Determination of Specific Learning Disabilities (SLD) under IDEA 2004

Chapter 2 - Why RTI Plays An Important. Important Role in the Determination of Specific Learning Disabilities (SLD) under IDEA 2004 Chapter 2 - Why RTI Plays An Important Role in the Determination of Specific Learning Disabilities (SLD) under IDEA 2004 How Does IDEA 2004 Define a Specific Learning Disability? IDEA 2004 continues to

More information

Value-Added Measures of Educator Performance: Clearing Away the Smoke and Mirrors

Value-Added Measures of Educator Performance: Clearing Away the Smoke and Mirrors Value-Added Measures of Educator Performance: Clearing Away the Smoke and Mirrors (Book forthcoming, Harvard Educ. Press, February, 2011) Douglas N. Harris Associate Professor of Educational Policy and

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Competitive Pay Policy

Competitive Pay Policy www.salary.com/hr Copyright 2002 Salary.com, Inc. Competitive Pay Policy Lena M. Bottos and Christopher J. Fusco, SPHR Salary.com, Inc. Abstract A competitive pay policy articulates an organization s strategy

More information

Using the Leadership Pipeline transition focused concept as the vehicle in integrating your leadership development approach provides:

Using the Leadership Pipeline transition focused concept as the vehicle in integrating your leadership development approach provides: Building your Leadership Pipeline Leadership transition focused development - White Paper The Leadership Pipeline framework Business case reflections: 1. Integrated leadership development 2. Leadership

More information

MEMO TO: FROM: RE: Background

MEMO TO: FROM: RE: Background MEMO TO: FROM: RE: Amy McIntosh, Principal Deputy Assistant Secretary, delegated the authority of the Assistant Secretary, Office of Planning, Evaluation and Policy Development Dr. Erika Hunt and Ms. Alicia

More information

Missing data in randomized controlled trials (RCTs) can

Missing data in randomized controlled trials (RCTs) can EVALUATION TECHNICAL ASSISTANCE BRIEF for OAH & ACYF Teenage Pregnancy Prevention Grantees May 2013 Brief 3 Coping with Missing Data in Randomized Controlled Trials Missing data in randomized controlled

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

A STATISTICS COURSE FOR ELEMENTARY AND MIDDLE SCHOOL TEACHERS. Gary Kader and Mike Perry Appalachian State University USA

A STATISTICS COURSE FOR ELEMENTARY AND MIDDLE SCHOOL TEACHERS. Gary Kader and Mike Perry Appalachian State University USA A STATISTICS COURSE FOR ELEMENTARY AND MIDDLE SCHOOL TEACHERS Gary Kader and Mike Perry Appalachian State University USA This paper will describe a content-pedagogy course designed to prepare elementary

More information

Meta-Analytic Synthesis of Studies Conducted at Marzano Research Laboratory on Instructional Strategies

Meta-Analytic Synthesis of Studies Conducted at Marzano Research Laboratory on Instructional Strategies Meta-Analytic Synthesis of Studies Conducted at Marzano Research Laboratory on Instructional Strategies By Mark W. Haystead & Dr. Robert J. Marzano Marzano Research Laboratory Englewood, CO August, 2009

More information

Raw Score to Scaled Score Conversions

Raw Score to Scaled Score Conversions Jon S Twing, PhD Vice President, Psychometric Services NCS Pearson - Iowa City Slide 1 of 22 Personal Background Doctorate in Educational Measurement and Statistics, University of Iowa Responsible for

More information

ALTERNATE ACHIEVEMENT STANDARDS FOR STUDENTS WITH THE MOST SIGNIFICANT COGNITIVE DISABILITIES. Non-Regulatory Guidance

ALTERNATE ACHIEVEMENT STANDARDS FOR STUDENTS WITH THE MOST SIGNIFICANT COGNITIVE DISABILITIES. Non-Regulatory Guidance ALTERNATE ACHIEVEMENT STANDARDS FOR STUDENTS WITH THE MOST SIGNIFICANT COGNITIVE DISABILITIES Non-Regulatory Guidance August 2005 Alternate Achievement Standards for Students with the Most Significant

More information

Basic Concepts in Research and Data Analysis

Basic Concepts in Research and Data Analysis Basic Concepts in Research and Data Analysis Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...3 The Research Question... 3 The Hypothesis... 4 Defining the

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Interpreting and Using SAT Scores

Interpreting and Using SAT Scores Interpreting and Using SAT Scores Evaluating Student Performance Use the tables in this section to compare a student s performance on SAT Program tests with the performance of groups of students. These

More information

Program Rating Sheet - Athens State University Athens, Alabama

Program Rating Sheet - Athens State University Athens, Alabama Program Rating Sheet - Athens State University Athens, Alabama Undergraduate Secondary Teacher Prep Program: Bachelor of Science in Secondary Education with Certification, Social Science 2013 Program Rating:

More information

Problem of the Month Through the Grapevine

Problem of the Month Through the Grapevine The Problems of the Month (POM) are used in a variety of ways to promote problem solving and to foster the first standard of mathematical practice from the Common Core State Standards: Make sense of problems

More information

School Performance Framework: Technical Guide

School Performance Framework: Technical Guide School Performance Framework: Technical Guide Version 1.6 August 2010 This technical guide provides information about the following topics as they related to interpreting the school performance framework

More information

An introduction to Value-at-Risk Learning Curve September 2003

An introduction to Value-at-Risk Learning Curve September 2003 An introduction to Value-at-Risk Learning Curve September 2003 Value-at-Risk The introduction of Value-at-Risk (VaR) as an accepted methodology for quantifying market risk is part of the evolution of risk

More information