Measurement: Reliability and Validity




Y520: Strategies for Educational Inquiry
Robert S. Michael
Reliability & Validity-1

Introduction: Reliability & Validity

All measurements, especially measurements of behaviors, opinions, and constructs, are subject to fluctuations (error) that can affect the measurement's reliability and validity.

Reliability refers to the consistency or stability of measurement. Can our measure or other form of observation be confirmed by further measurements or observations? If you measured the same thing again, would you get the same score?

Validity refers to the suitability or meaningfulness of the measurement. Does this instrument accurately describe the construct I am attempting to measure?

In statistical terms:
- Validity is analogous to unbiasedness (valid = unbiased).
- Reliability is analogous to variance (low reliability = high variance).

Reliability

The observed score on an instrument can be divided into two parts:

    Observed score = true score + error

An instrument is said to be reliable if it accurately reflects the true score and thus minimizes the error component. The reliability coefficient is the proportion of true variability to the total observed (or obtained) variability. For example, a reliability coefficient of .85 means that 85% of the variability in observed scores is presumed to represent true individual differences, and 15% of the variability is due to random error.

Reliability is a correlation computed between two events:
- Repeated use of the instrument (stability)
- Similarity of items (split-half / internal consistency)
- Equivalence of two instruments (equivalence)

Stability: Test-Retest

Test-retest reliability: the same group of respondents completes the instrument at two different points in time. How stable are the responses? The correlation coefficient between the two sets of scores describes the degree of reliability. For cognitive domain tests, correlations of .90 or higher are good; for personality tests, .50 or higher; for attitude scales, .70 or higher.

Problems with test-retest reliability procedures:
- Differences in performance on the second test may be due to the first test, i.e., responses actually change as a result of the first test.
- Many constructs of interest actually change over time, independent of the stability of the measure.
- The interval between the two administrations may be too long, and the construct you are attempting to measure may have changed.
- The interval may be too short, so reliability is inflated because respondents remember their earlier answers.
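The decomposition above can be simulated to show how the reliability coefficient connects to a test-retest correlation. A minimal sketch, with all numbers hypothetical: true scores with standard deviation 15 plus measurement error with standard deviation 6 give a theoretical reliability of 225/(225 + 36) ≈ .86, and the correlation between two simulated administrations lands near that value.

```python
import random
import statistics

random.seed(1)

# Classical test theory: observed score = true score + error.
n = 5000
true_scores = [random.gauss(100, 15) for _ in range(n)]  # true variability
test1 = [t + random.gauss(0, 6) for t in true_scores]    # administration 1
test2 = [t + random.gauss(0, 6) for t in true_scores]    # administration 2

# Theoretical reliability = true variance / (true variance + error variance).
theoretical = 15**2 / (15**2 + 6**2)

def pearson(x, y):
    """Pearson correlation between two score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx)**2 for a in x) * sum((b - my)**2 for b in y)) ** 0.5
    return num / den

# Test-retest estimate: correlate the two administrations.
estimate = pearson(test1, test2)
print(round(theoretical, 3))  # 0.862
print(round(estimate, 3))     # close to the theoretical value
```

With 5,000 simulated respondents the sampling error of the correlation is small, so the test-retest estimate tracks the theoretical true-score proportion closely.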

Alternate (aka Equivalent) Forms

Involves using differently worded questions to measure the same construct. Questions or items are reworded to produce two items that are similar but not identical:
- Items must focus on the exact same aspect of behavior, with the same vocabulary level and the same level of difficulty.
- Items must differ only in their wording.

Reliability is the correlation between the responses to the pairs of questions. Alternate forms reliability is said to avoid the practice effects that can inflate test-retest reliability (i.e., respondents can recall how they answered the identical item on the first test administration).

Alternate Forms: Example

From the Watson-Glaser Critical Thinking test:

Test A: "Terry, don't worry about it. You'll graduate someday. You're a college student, right? And all college students graduate sooner or later."

Test B: "Charlie, don't worry about it. You'll get a promotion someday. You're working for a good company, right? And everyone who works for a good company gets a promotion sooner or later."

Internal Consistency: Homogeneity

Internal consistency is a measure of how well related, but different, items all measure the same thing. It is applied to groups of items thought to measure different aspects of the same concept. A single item taps only one aspect of a concept; if several different items are used to gain information about a particular behavior or construct, the data set is richer and more reliable.

Internal Consistency: Example

The Rand 36-item Health Survey measures 8 dimensions of health. One of these dimensions is physical function. Instead of asking just one question ("How limited are you in your day-to-day activities?"), Rand found that asking 10 questions produced more reliable results and conveyed a better understanding of physical function.

Internal Consistency: Rand Example

The following questions are about activities you might do during a typical day. Does your health now limit you in these activities? If so, how much? (Response options: limited a lot, limited a little, not limited at all.)
- Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports.
- Moderate activities, such as moving a table, pushing a vacuum cleaner, bowling, or playing golf.
- Lifting or carrying groceries.
- Climbing several flights of stairs.
- Climbing one flight of stairs.
- Bending, kneeling, or stooping.
- Walking more than a mile.
- Walking several blocks.
- Walking one block.
- Bathing or dressing yourself.

Internal Consistency: Cronbach's Alpha

Cronbach's alpha indicates the degree of internal consistency. It:
- Is a function of the number of items in the scale and the degree of their intercorrelations.
- Ranges from 0 to 1 (in practice, never exactly 1).
- Measures the proportion of variability that is shared among items (covariance).

When the items all tend to measure the same thing, they are highly related and alpha is high; when they tend to measure different things, they have little correlation with each other and alpha is low. The major source of measurement error is the sampling of content.

Cronbach's Alpha

Conceptual formula:

    alpha = (k * r) / (1 + (k - 1) * r)

where k = number of items and r = the average correlation among all pairs of items. For example, take a 10-item scale where the average correlation between items is .50 (i.e., calculate the correlation between items 1 and 2, between items 1 and 3, between 1 and 4, etc., for all possible pairs; 10 taken 2 at a time = 45 pairs):

    alpha = 10(.5) / (1 + 9(.5)) = 0.91

Validity

Validity indicates how well an instrument measures the construct it purports to measure. Types of validity:
- Construct validity
- Content validity
- Criterion validity
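Both the conceptual formula and a version computed from raw item scores are easy to check in code. A minimal sketch: the second function uses the common variance form, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), and the five-respondent, three-item data set is made up purely for illustration.

```python
import statistics

def alpha_from_avg_corr(k, r_bar):
    """Standardized alpha from the average inter-item correlation."""
    return (k * r_bar) / (1 + (k - 1) * r_bar)

# The worked example above: 10 items, average inter-item correlation .50.
print(round(alpha_from_avg_corr(10, 0.5), 2))  # 0.91

def cronbach_alpha(items):
    """Alpha from raw scores (variance form): `items` is a list of k lists,
    one per item, each holding one score per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]       # total per respondent
    item_var = sum(statistics.pvariance(s) for s in items)  # sum of item variances
    return k / (k - 1) * (1 - item_var / statistics.pvariance(totals))

# Three hypothetical items answered by five respondents; the items track
# one another closely, so alpha is high.
example_alpha = cronbach_alpha([[1, 2, 3, 4, 5],
                                [1, 2, 3, 4, 5],
                                [2, 2, 3, 4, 4]])
print(round(example_alpha, 2))  # 0.97
```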

Content Validity

Content validity is a judgment of how appropriate the items seem to a panel of reviewers who have knowledge of the subject matter. Does the instrument include everything it should and nothing it should not? An instrument has content validity when its items are randomly chosen from the universe of all possible items. Example: constructing a vocabulary test using a sample of all vocabulary words studied in a semester. Content validity is a subjective judgment; there is no correlation coefficient. A table of specifications helps.

Criterion Validity

Criterion validity indicates how well, or poorly, an instrument compares to either another instrument or another predictor. It has two parts: concurrent validity and predictive validity.
- Concurrent validity requires that the scale be judged against some other method that is acknowledged as a gold standard (e.g., another well-accepted scale). If the correlation between the new scale and the old scale is strong, the new scale is said to have concurrent validity.
- Predictive validity is the ability of a scale to forecast future events, behaviors, attitudes, or outcomes. Do SAT scores predict first-semester GPA?

Validity coefficients are correlation coefficients between the new scale and the gold standard or some future outcome.

Construct Validity

Construct validity indicates how well the scale measures the construct it was designed to measure. Ways of establishing construct validity:
- Known group differences
- Correlation with measures of similar constructs
- Correlation with unrelated and dissimilar constructs
- Internal consistency
- Response to experimental manipulation
- Opinion of judges

Inferences in Measurement

The following slide contains a list of nine items. For each item:
- Describe how we measure,
- Identify the level of measurement,
- Describe the inference we make, and
- Describe the construct.

Inferences in Measurement

- Length of table
- Weight of person
- Speed of car
- Temperature
- Humidity
- Wind chill index
- Discomfort index (smog index / pollen index)
- Intelligence
- Anxiety

Inventing Constructs

How would you describe the relationship(s) among the following?
- Operational definitions
- Variables
- Measurement
- Scales of measurement
- Inferences from measurements
- Constructs

Discuss each for the items on the following slide.

Inventing Constructs

- Temperature
- Wind chill
- Discomfort index
- Intelligence
- Extroversion

Example: Intelligence and its Measurement

Intelligence is one construct that continues to generate discussion and divergent opinions. The Journal of Educational Psychology held a symposium in 1921 entitled "Intelligence and its Measurement," at which researchers offered definitions of the construct. As you read, consider: how would you operationalize each definition? What would items on a scale look like?

Definitions: Intelligence and its Measurement (a)

- "The ability to give responses that are true or factual." E. L. Thorndike
- "The ability to carry on abstract thinking." L. M. Terman
- "The ability to learn to adjust oneself to the environment." S. S. Colvin
- "The ability to adapt oneself to relatively new situations in life." R. Pintner

Definitions: Intelligence and its Measurement (b)

- "The capacity for knowledge and knowledge possessed." V. A. C. Henmon
- "A biological mechanism by which the effects of a complexity of stimuli are brought together and given a somewhat unified effect in behavior." J. Peterson

Definitions: Intelligence and its Measurement (c)

- "The ability to inhibit an instinctive adjustment, to redefine the inhibited instinctive adjustment in the light of imaginally experienced trial and error, and to realize the modified instinctive adjustment into overt behavior to the advantage of the individual as a social animal." L. L. Thurstone

Definitions: Intelligence and its Measurement (d)

- "The ability to acquire abilities." H. Woodrow
- "The ability to learn or profit by experience." W. F. Dearborn
- "Intelligence is what intelligence tests test." E. G. Boring (1923)