Measurement: Reliability and Validity Measures. Jonathan Weiner, DrPH Johns Hopkins University




This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2007, The Johns Hopkins University and Jonathan Weiner. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided "AS IS"; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.

Section A Definitions and Reliability

Measurement
Measurement is a systematic, replicable process by which objects or events are quantified and/or classified with respect to a particular dimension. This is usually achieved by the assignment of numerical values.

Levels of Measurement
1. Nominal measure
2. Ordinal measure
3. Interval measure
4. Ratio measure

Measurement Reliability
Reliability of a measure: the degree to which a measurement technique can be depended upon to secure consistent results upon repeated application
The "rubber ruler" issue

Measurement Validity
Validity of a measure: the degree to which any measurement approach or instrument succeeds in describing or quantifying what it is designed to measure
The "35-inch yardstick" issue

So, Variation in a Repeated Measure
Can be due to the following reasons:
1. Chance or unsystematic events
2. Systematic inconsistency
3. Actual change in the underlying event being measured

Measurement Reliability
Reliability is not just the property of an instrument. Rather, a measure or instrument has a certain degree of reliability when applied to certain populations under certain conditions.

Sources of Unsystematic Threats to Reliability
1. Subject reliability: factors due to the research subject (e.g., patient fatigue, mood)
2. Observer reliability: factors due to the observer/rater/interviewer (e.g., abilities of the interviewer, differing opinions)
3. Situational reliability: the conditions under which measurements are made (e.g., a busy day at the clinic, new management)
Source: Suchman

Unsystematic Threats to Reliability
4. Instrument reliability: the research instrument or measurement approach itself (e.g., poorly worded questions, a quirk in a mechanical device)
5. Data processing reliability: the manner in which data are handled (e.g., miscoding)

How Do We Evaluate Observer Measurement Reliability?
Inter-rater agreement: compare two or more of the observers/raters at a point in time (percentage of overall agreement, Kappa)
Test-retest: compare measurements made by the same observer/rater at two points in time; the timeframe should be short enough that the construct itself hasn't changed

Calculating Kappa
Kappa adjusts the observed agreement for the agreement expected due to chance:
kappa = (Oa - Ea) / (N - Ea)
where
Oa = sum of observed counts in the agreement cells A and D
Ea = sum of expected counts in cells A and D: [(N1 * N3)/N] + [(N2 * N4)/N] (rounded to the nearest whole number)
N = total number of respondent pairs

              Rater 2 Yes    Rater 2 No
Rater 1 Yes   A              B             N3 = A+B
Rater 1 No    C              D             N4 = C+D
              N1 = A+C       N2 = B+D      N
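As a minimal sketch, the kappa calculation above can be written in Python. The function name and example counts are illustrative, not from the lecture, and the code keeps Ea exact rather than rounding it to a whole number (the rounding on the slide is a hand-calculation convenience):

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 agreement table.

    a = both raters say Yes, d = both raters say No (the agreement cells);
    b and c are the disagreement cells.
    """
    n = a + b + c + d                       # N: total respondent pairs
    oa = a + d                              # Oa: observed agreement count
    # Ea: expected agreement by chance, (row total * column total)/N
    # summed over the two agreement cells A and D
    ea = ((a + b) * (a + c)) / n + ((c + d) * (b + d)) / n
    return (oa - ea) / (n - ea)
```

For example, cohens_kappa(20, 5, 10, 15) returns 0.4, which falls in the "moderate" band of the interpretation scale below.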

Criteria for Interpreting Kappa Statistics

Level of Agreement    Kappa Value
Almost Perfect        0.81-1.00
Substantial           0.61-0.80
Moderate              0.41-0.60
Fair                  0.21-0.40
Slight                0.00-0.20
Poor                  <0.00

Source: Landis, J.R., Koch, G.G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.

Section B Measurement: Reliability and Validity of Measures

How Do We Evaluate Instrument Reliability?
General "congruence" of the instrument/questionnaire (at the same point in time):
Item-total correlation
Internal consistency: the extent to which the items in a scale hang together (Cronbach's coefficient, or alpha statistic)
Approaches developers use to assess and improve congruence and efficiency:
Split half (split the questionnaire into two parts)
Short form-long form (two different lengths)
Test-retest (similar to observer retest)
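The split-half approach mentioned above can be sketched as follows. The odd/even split, the helper names, and the use of the Spearman-Brown step-up correction (the standard companion to a split-half correlation, though not named on the slide) are assumptions for illustration:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def split_half_reliability(items):
    """items: list of k item-score lists (one list per item, across respondents).

    Splits items into odd- and even-numbered halves, totals each half per
    respondent, correlates the half totals, then applies the Spearman-Brown
    correction to estimate full-scale reliability.
    """
    half1 = [sum(scores) for scores in zip(*items[0::2])]
    half2 = [sum(scores) for scores in zip(*items[1::2])]
    r = pearson(half1, half2)
    return 2 * r / (1 + r)  # Spearman-Brown step-up
```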

Calculating Cronbach's Alpha
A reliability measure for multi-item scales:
alpha = [k/(k-1)] * [1 - (sum of item variances / total scale variance)]
where k = number of items, and total scale variance = the sum of all item variances and all item covariances
Alpha ranges between 0 and 1
Criteria for assessment:
0.70 or higher = adequate reliability for group comparisons
0.90 or higher = adequate reliability for individual monitoring
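The alpha formula above translates directly into Python. This is a minimal sketch; the function name is illustrative, and sample (n-1) variances are assumed throughout:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    items: list of k item-score lists (one list per item, across respondents).
    Returns [k/(k-1)] * [1 - (sum of item variances / total scale variance)],
    where the total scale variance is the variance of each respondent's
    summed score (equivalently, item variances plus all item covariances).
    """
    k = len(items)
    item_vars = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent scale totals
    return (k / (k - 1)) * (1 - item_vars / variance(totals))
```

With two perfectly correlated items (all covariance, no unique error), alpha is 1.0; weaker inter-item covariance pulls it toward 0.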

Relationship between Reliability and Validity
They are closely interdependent:
There cannot be validity without reliability
There can be reliability without validity

Measurement Validity (Recap)
Validity of a measure: the degree to which any measurement approach or instrument succeeds in describing or quantifying what it is designed to measure
Validity reflects those errors in measurement that are systematic or constant

The Term Validity Has More than One Connotation
The general concept of validity is broader than just the validity of approaches to measurement. In general, measurement reliability and validity issues fall into Campbell and Stanley's "instrumentation" category.

How to Evaluate Measurement Validity
Face validity: the measurement is accepted by those concerned as being logical on the "face of it" (also called expert validity)
Content validity: do the items included in the measure adequately represent the universe of questions that could have been asked?

How to Evaluate Measurement Validity
Criterion-related validity: does the new measure agree with an external criterion, e.g., an accepted measure?
Predictive evidence: predictive of a future event or outcome of interest
Concurrent evidence: correlation with a gold standard at the same point in time (e.g., a shortened scale with the full scale)

How to Evaluate Measurement Validity
Construct validity: is the measure consistent with the theoretical concept being measured?
All tests of validity are ultimately designed to support or refute the instrument's construct validity
Construct validity is never fully established

Assessing Construct Validity
Convergent evidence: demonstrate that your measure correlates highly (0.5 to 0.7) with measures of the same construct, or that groups known to differ along the construct have significantly different scores on the measure
Discriminant evidence: low correlation with instruments measuring a different construct, or differences between known groups
Factorial evidence: the clustering of items supports the theory-based grouping of items

Some Advanced Measurement Terms/Topics
The field of psychometrics is advancing. Well-developed scales and composite indices are available for most measures, for example, the Short Form (SF)-36 and the EuroQol. There are many advantages to well-standardized, robust multi-item measures. These are supported by computer-generated questions in the field of Item Response Theory. IRT (also known as adaptive testing) is gaining popularity: if a patient can walk a mile, why ask if they can walk 100 yards?

Practical Application of Measurement Reliability and Validity
Factors to consider:
How much time and money do you have to carry out your own tests?
How small a difference in the measurement do you expect?
Can you use a previously validated measure?
Does the previous measure work within the context of your setting?