The relationship between mathematics preparation and conceptual learning gains in physics: a possible hidden variable in diagnostic pretest scores


Addendum to: The relationship between mathematics preparation and conceptual learning gains in physics: a possible hidden variable in diagnostic pretest scores

David E. Meltzer
Department of Physics and Astronomy, Iowa State University, Ames, Iowa

NORMALIZED LEARNING GAIN: A KEY MEASURE OF STUDENT LEARNING

A single examination yields information about a student's knowledge state at one point in time. However, the primary interest of instructors is in learning, that is, the transition between states. In addition to being inadequate by itself for measuring that transition, performance on a single exam may be strongly correlated with a student's preinstruction preparation and knowledge. In order to assess learning per se, it is necessary to have a measure that reflects the transition between knowledge states and that has maximum dependence on instruction, with minimum dependence on students' preinstruction states. Therefore, it would be desirable to have a measure that is correlated with instructional activities but has minimum correlation with students' pretest scores. In addition, the ideal measure would be reliable, in the sense that minor differences in the test instrument should yield approximately the same value of learning gain. The question of how to measure learning gain is not a simple one, and it is subject to many methodological difficulties (Ref. 1). Here I will simply identify some of the most significant issues.
To the extent that pre- and posttesting make use of an instrument that has both a minimum and a maximum possible score (as most do), there are issues of floor and ceiling effects. That is, a student can get no less than 0% nor more than 100% correct on such an instrument. This immediately leads to a problem in quantifying learning gain: a student whose pretest score is, say, 20% may gain 80 percentage points, while a student who begins at 90% may gain no more than ten. If the first student gains ten percentage points and the second gains nine, one would be hard pressed to conclude that the first student has really learned more. Quite likely, a somewhat different assessment instrument would have led to a very different conclusion. (For instance, if the questions were significantly harder, so that the two students' pretest scores were 10% and 45%, respectively, the second student might very well have shown a much greater gain as measured simply in terms of percentage points.) As a consequence of the ceiling effect (and for other reasons), it is common to observe a strong negative correlation between students' absolute gain scores (posttest score minus pretest score) and their pretest scores: higher pretest scores tend to result in smaller absolute gains, all else being equal. If, therefore, one wants a measure of learning gain that is reliable, such that simply modifying the testing instrument would not lead to widely disparate results, the absolute gain score is unlikely to suffice. The fact that the gain score tends to be correlated (negatively) with pretest scores is also an obstacle to isolating a measure of learning from confounding effects of preinstruction state. One way of dealing with this problem is to derive a measure that normalizes the gain score in a manner that takes some account of the variance in pretest scores. As is well known, the use of such a measure in physics education research was introduced by Hake in Ref. 2; it is called the normalized gain g, and it is simply the absolute gain divided by the maximum possible gain:

g = (posttest score - pretest score) / (maximum possible score - pretest score)

In Hake's original study he found that, for a sample of 62 high school, college, and university physics courses enrolling over 6500 students, the mean normalized gain <g> for a given course on the Force Concept Inventory was almost completely uncorrelated (r = +0.02) with the mean pretest score of the students in that course. (In this case, the mean normalized gain <g> is found by using the mean pretest score and mean posttest score of all students in the class.) At the very least, then, <g> seems to have the merit of being relatively independent of pretest score, and one might therefore expect that the reliability of whatever measure of learning gain it yields would not be threatened simply because a particular class had unusually high or low pretest scores. That is, if a diverse set of classes has a wide range of pretest scores but all other learning conditions are essentially identical, the values of normalized learning gain measured in the different classes should not differ significantly. The pretest-independence of normalized gain also suggests that a measured difference in <g> between two classes having very different pretest scores would be reproduced even by a somewhat different test instrument that shifts the pretest scores. If one class had, say, <g> = 0.80 and the other had <g> = 0.20, another instrument that significantly alters their pretest scores might still be expected to yield more or less unchanged values of <g>.
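The definition of g above can be sketched as a short function. This is a minimal illustration, not code from the paper; the function name and the example scores (the two students from the ceiling-effect discussion) are my own:

```python
def normalized_gain(pre, post, max_score=100.0):
    """Hake's normalized gain g: the absolute gain (post - pre)
    divided by the maximum possible gain (max_score - pre)."""
    if pre >= max_score:
        raise ValueError("pretest at ceiling: g is undefined")
    return (post - pre) / (max_score - pre)

# The two students from the ceiling-effect example above: one starts
# at 20% and gains 10 points; the other starts at 90% and gains 9 of
# the 10 points still available to them.
print(normalized_gain(20, 30))  # 10/80 = 0.125
print(normalized_gain(90, 99))  # 9/10  = 0.9
```

On this reading, the second student, despite the smaller absolute gain, achieved nearly all of the gain that was available, which is exactly the distinction the absolute gain score obscures.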
By contrast, Hake found that absolute gain scores were significantly (negatively) correlated with pretest score (r = -0.49). Suppose then that Class A (pretest score = 20%) had an absolute learning gain of 16 points, while Class B (pretest score = 90%) had only an eight-point gain, i.e., half as much as A. We would suspect that there might actually have been a greater learning gain in Class B, but that it was disguised by the ceiling effect of the test instrument. Suppose an assessment instrument that tested the same concepts but with harder questions changed the pretest scores to 10% for A and 45% for B. We might now find A with a gain of 18 points and B with a gain of 44 points, and so now Class B is found to have the greater learning gain. Yet in this example, both classes would be found to have unchanged <g> values (<g>_A = 0.20; <g>_B = 0.80). Thus we can argue that <g> is the more reliable measure, because the relationship it yields is reproduced with the use of a test instrument that differs from the original only in an insignificant manner. Now, this is a mock example with made-up numbers and untested assumptions. However, it is consistent with the findings of Hake's survey, and I would argue that it is a plausible scenario.

Some empirical support might be drawn from the data reported at RPI (Ref. 3, Table II). For seven sections of the standard studio course, the mean pretest score and mean absolute gain score as measured on the FCI were 49.8% and 9.6% ± 2.2% (s.e.), respectively (s.e. = standard error). The mean pretest score for the same seven sections as measured on the FMCE was lower (35.4%) and, as we might expect from the discussion above, the mean absolute gain score was indeed higher: 13.8% ± 1.4% (s.e.). These mean gain scores are significantly different according to a one-tailed, paired two-sample t-test (t = 2.23, p = 0.03), and the higher FMCE gain might naively be interpreted as a substantially higher learning gain (13.8/9.6 = 144%). However, despite the significantly different pretest scores, the mean normalized gains as determined by the two instruments were statistically indistinguishable: <g>_FCI = 0.18 ± 0.04 (s.e.); <g>_FMCE = 0.21 ± 0.02 (s.e.); (t = 0.84, p = 0.22). The FMCE and the FCI do not measure exactly the same concept knowledge, and so one might argue that the measured learning gains really should be different. We have insufficient data to dispute that argument in this case, and so this example should be seen as primarily illustrative.

Probably the most persuasive empirical support for the use of normalized gain as a reliable measure lies in the fact that <g> has now been measured for literally tens of thousands of students in many hundreds of classes worldwide, with extremely consistent results. The values of <g> observed both for traditional courses and for those taught using interactive-engagement methods fall into relatively narrow bands which are reproduced with great regularity, for classes at a very broad range of institutions with widely varying student demographic characteristics (including pretest scores) (Ref. 4). This provides a strong argument that normalized gain <g> is a valid and reliable measure of student learning gain.

References

1. Walter R. Borg and Meredith D. Gall, Educational Research: An Introduction, 5th ed. (Longman, New York, 1989).
2. Richard R. Hake, "Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses," Am. J. Phys. 66 (1998).
3. Karen Cummings, Jeffrey Marx, Ronald Thornton, and Dennis Kuhl, "Evaluating innovations in studio physics," Phys. Educ. Res., Am. J. Phys. Suppl. 67, S38-S44 (1999).
4. Edward F. Redish and Richard N. Steinberg, "Teaching physics: figuring out what works," Phys. Today 52 (1999); Richard R. Hake, "Lessons from the physics education reform effort," Conservation Ecology 5, 28 (2002).
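The mock two-class example discussed above can be checked numerically. This is a minimal sketch using only the made-up numbers stated in the text; the function name is my own:

```python
def normalized_gain(pre, post, max_score=100.0):
    """g = (post - pre) / (max_score - pre), as defined in the text."""
    return (post - pre) / (max_score - pre)

# Original instrument: Class A (pretest 20%, +16 points),
# Class B (pretest 90%, +8 points).
g_a = normalized_gain(20, 36)       # 16/80 = 0.20
g_b = normalized_gain(90, 98)       # 8/10  = 0.80

# Harder instrument: pretest scores shift to 10% and 45%;
# absolute gains become +18 and +44 points.
g_a_hard = normalized_gain(10, 28)  # 18/90 = 0.20
g_b_hard = normalized_gain(45, 89)  # 44/55 = 0.80

# The absolute gains reverse ranking between the two instruments,
# yet g is unchanged for each class.
print(g_a, g_b, g_a_hard, g_b_hard)
```

The ranking by absolute gain flips when the instrument changes, while g stays fixed at 0.20 for A and 0.80 for B, which is the sense in which the text argues that <g> is the more reliable measure.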