Addendum to: The relationship between mathematics preparation and conceptual learning gains in physics: a possible hidden variable in diagnostic pretest scores

David E. Meltzer
Department of Physics and Astronomy, Iowa State University, Ames, Iowa 50011

NORMALIZED LEARNING GAIN: A KEY MEASURE OF STUDENT LEARNING

A single examination yields information about a student's knowledge state at one point in time. However, the primary interest of instructors is in learning, that is, the transition between states. In addition to being inadequate by itself for measuring that transition, performance on a single exam may be strongly correlated with a student's preinstruction preparation and knowledge. In order to assess learning per se, it is necessary to have a measure that reflects the transition between knowledge states and that has maximum dependence on instruction, with minimum dependence on students' preinstruction states. Therefore, it would be desirable to have a measure that is correlated with instructional activities but that has minimum correlation with students' pretest scores. In addition, the ideal measure would be reliable in the sense that minor differences in the test instrument should yield approximately the same value of learning gain. The question of how to measure learning gain is not a simple one, and it is subject to many methodological difficulties (Ref. 1). Here I will simply identify some of the most significant issues.

To the extent that pre- and post-testing makes use of an instrument that has both a minimum and maximum possible score (as most do), there are issues of floor and ceiling effects. That is, a student can get no less than 0% nor more than 100% correct on such an instrument. This immediately leads to a problem in quantifying learning gain: a student whose pretest score is, say, 20% may gain as many as 80 percentage points, while a student who begins at 90% may gain no more than ten. If the first student gains ten percentage points and the second gains nine, one would be hard pressed to conclude that the first student has really learned more. Quite likely, a somewhat different assessment instrument would have led to a very different conclusion. (For instance, if the questions were significantly harder, so that the two students' pretest scores were 10% and 45%, respectively, the second student might very well have shown a much greater gain as measured simply in terms of percentage points.) As a consequence of the ceiling effect (and for other reasons), it is common to observe a strong negative correlation between students' absolute gain scores (posttest score minus pretest score) and their pretest scores: higher pretest scores tend to result in smaller absolute gains, all else being equal. If, therefore, one wants a measure of learning gain that is reliable, one for which simply modifying the testing instrument would not lead to widely disparate results, the absolute gain score is unlikely to suffice. The fact that the gain score tends to be correlated (negatively) with pretest scores is also an obstacle to isolating a measure of learning from the confounding effects of preinstruction state. One way of dealing with this problem is to derive a measure that normalizes the gain score in a manner that takes some account of the variance in pretest scores.
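To make the ceiling effect concrete, here is a minimal Python sketch using the hypothetical scores from the paragraph above; none of these numbers come from real data.

    # Hypothetical (pretest, posttest) percentages for the two students described above
    students = {"student starting at 20%": (20, 30),   # gains 10 points
                "student starting at 90%": (90, 99)}   # gains 9 points

    for label, (pre, post) in students.items():
        absolute_gain = post - pre    # percentage points gained
        headroom = 100 - pre          # the most that could have been gained
        print(f"{label}: gained {absolute_gain} of a possible {headroom} points")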

As is well known, the use of such a measure in physics education research was introduced by Hake in Ref. 2; it is called the normalized gain g, and it is simply the absolute gain divided by the maximum possible gain:

    g = (posttest score − pretest score) / (maximum possible score − pretest score)

In Hake's original study, he found that, for a sample of 62 high-school, college, and university physics courses enrolling over 6500 students, the mean normalized gain <g> for a given course on the Force Concept Inventory was almost completely uncorrelated (r = +0.02) with the mean pretest score of the students in that course. (In this case, the mean normalized gain <g> is found by using the mean pretest score and mean posttest score of all students in the class.) At the very least, then, <g> seems to have the merit of being relatively independent of pretest score, and one might therefore expect that the reliability of whatever measure of learning gain it yields would not be threatened simply because a particular class had unusually high or low pretest scores. That is, if a diverse set of classes has a wide range of pretest scores but all other learning conditions are essentially identical, the values of normalized learning gain measured in the different classes should not differ significantly. The pretest-independence of normalized gain also suggests that a measurement of the difference in <g> between two classes having very different pretest scores would be reproduced even by a somewhat different test instrument that shifts the pretest scores. If one class had, say, <g> = 0.80 and the other had <g> = 0.20, using another instrument that significantly alters their pretest scores might still be expected to yield more or less unchanged values of <g>.
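As an illustration (the numbers below are hypothetical, chosen only to show the computation, and are not drawn from any of the studies cited), g reduces to a one-line Python function, assuming scores expressed as percentages of the maximum possible score:

    def normalized_gain(pretest, posttest, max_score=100.0):
        """Hake's normalized gain g: absolute gain divided by maximum possible gain."""
        return (posttest - pretest) / (max_score - pretest)

    # Class-level <g> uses the class mean pretest and posttest scores, as in Hake's survey.
    print(round(normalized_gain(pretest=48.0, posttest=69.0), 2))  # prints 0.4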

By contrast, Hake found that the absolute gain scores were significantly (negatively) correlated with pretest score (r = −0.49). Suppose, then, that Class A (pretest score = 20%) had an absolute learning gain of 16 points, while Class B (pretest score = 90%) had only an eight-point gain, i.e., half as much as A. We would suspect that there might actually have been a greater learning gain in Class B, but that it was disguised by the ceiling effect of the test instrument. Suppose an assessment instrument that tested for the same concepts but with harder questions changed the pretest scores to 10% for A and 45% for B. We might now find A with a gain of 18 points and B with a gain of 44 points, and so now Class B is found to have the greater learning gain. Yet in this example, both classes would be found to have unchanged <g> values (<g>_A = 0.20; <g>_B = 0.80). Thus we can argue that <g> is the more reliable measure, because the relationship it yields is reproduced with the use of a test instrument that differs from the original only in an insignificant manner. Now, this is a mock example with made-up numbers and untested assumptions; however, it is consistent with the findings of Hake's survey, and I would argue that it is a plausible scenario.
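The invariance claimed in the mock example can be checked directly with the normalized_gain function sketched earlier (these are the made-up numbers from the example, not data):

    # (pretest, posttest) percentages implied by the mock example above
    original_test = {"Class A": (20, 36), "Class B": (90, 98)}   # gains of 16 and 8 points
    harder_test   = {"Class A": (10, 28), "Class B": (45, 89)}   # gains of 18 and 44 points

    for label in original_test:
        g_orig = normalized_gain(*original_test[label])
        g_hard = normalized_gain(*harder_test[label])
        print(f"{label}: <g> = {g_orig:.2f} on the original test, {g_hard:.2f} on the harder test")
    # Both instruments give <g> = 0.20 for Class A and <g> = 0.80 for Class B.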

Some empirical support might be drawn from the data reported at RPI (Ref. 3, Table II). For seven sections of the standard studio course, the mean pretest score and mean absolute gain score as measured on the FCI were 49.8% and 9.6% ± 2.2% (s.e.), respectively (s.e. = standard error). The mean pretest score for the same seven sections as measured on the FMCE was lower (35.4%) and, as we might expect from the discussion above, the mean absolute gain score was indeed higher: 13.8% ± 1.4% (s.e.). These mean gain scores are significantly different according to a one-tailed, paired two-sample t-test (t = 2.23, p = 0.03), and the higher FMCE gain might naively be interpreted as a substantially higher learning gain (13.8/9.6 = 144%). However, despite the significantly different pretest scores, the mean normalized gains as determined by the two instruments were statistically indistinguishable: <g>_FCI = 0.18 ± 0.04 (s.e.); <g>_FMCE = 0.21 ± 0.02 (s.e.); (t = 0.84, p = 0.22). The FMCE and the FCI do not measure exactly the same concept knowledge, and so one might argue that the measured learning gains really should be different. We have insufficient data to dispute that argument in this case, and so this example should be seen as primarily illustrative.

Probably the most persuasive empirical support for use of normalized gain as a reliable measure lies in the fact that <g> has now been measured for literally tens of thousands of students in many hundreds of classes worldwide, with extremely consistent results. The values of <g> observed for both traditional courses and those taught using interactive-engagement methods fall into relatively narrow bands that are reproduced with great regularity, for classes at a very broad range of institutions with widely varying student demographic characteristics, including pretest scores (Ref. 4). This provides a strong argument that normalized gain <g> is a valid and reliable measure of student learning gain.
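The paired comparison discussed above can be reproduced, given per-section gain scores, with a standard one-tailed paired t-test. The sketch below uses SciPy; the per-section gains are hypothetical stand-ins for the RPI values of Ref. 3, which are not reproduced here.

    from scipy import stats

    # Hypothetical per-section absolute gains (percentage points) for seven sections;
    # NOT the actual values from Ref. 3.
    fci_gains  = [7.1, 12.4, 8.0, 11.2, 9.5, 6.8, 12.2]
    fmce_gains = [12.0, 15.9, 13.1, 14.8, 13.6, 11.5, 15.7]

    # One-tailed paired t-test: is the mean FMCE gain greater than the mean FCI gain?
    t_stat, p_value = stats.ttest_rel(fmce_gains, fci_gains, alternative="greater")
    print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4f}")

The same test applied to per-section <g> values would address the normalized-gain comparison.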

References

1. Walter R. Borg and Meredith D. Gall, Educational Research: An Introduction, 5th ed. (Longman, New York, 1989), pp. 728-733.
2. Richard R. Hake, "Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses," Am. J. Phys. 66, 64-74 (1998); http://www.physics.indiana.edu/~sdi/.
3. Karen Cummings, Jeffrey Marx, Ronald Thornton, and Dennis Kuhl, "Evaluating innovation in studio physics," Phys. Educ. Res., Am. J. Phys. Suppl. 67, S38-S44 (1999).
4. Edward F. Redish and Richard N. Steinberg, "Teaching physics: figuring out what works," Phys. Today 52, 24-30 (1999); Richard R. Hake, "Lessons from the physics education reform effort," Conservation Ecology 5, 28 (2002), http://www.consecol.org/vol5/iss2/art28.