# X = T + E. Reliability. Reliability. Classical Test Theory 7/18/2012. Refers to the consistency or stability of scores

## Transcription

### Reliability

"It is the user who must take responsibility for determining whether or not scores are sufficiently trustworthy to justify anticipated uses and interpretations." (AERA et al., 1999)

- Reliability refers to the consistency or stability of scores.
- If a test taker were administered the test at a different time, would she receive the same score?
- If the test contained a different selection of items, would the test taker get the same score?

### Classical Test Theory

The theory of reliability can be demonstrated with mathematical proofs:

$X = T + E$

- $X$ = obtained or observed score (fallible)
- $T$ = true score (reflects stable characteristics)
- $E$ = measurement error (reflects random error)

### Random Measurement Error

- All measurement is susceptible to error.
- Any factor that introduces error into the measurement process affects reliability.
- To increase our confidence in scores, we try to detect, understand, and minimize random measurement error (RME).

### Sources of RME

- Many theorists believe the major source of random measurement error is content sampling.
- Since tests represent performance on only a sample of behavior, the adequacy of that sample is very important.
- The error that results from differences between the sample of items (the test) and the domain of items (all items) is the content sampling error.
- If the sample is representative and of sufficient size, then error due to content sampling will be relatively small.

### Sources of RME (continued)

- RME can also be the result of temporal instability, that is, of random and transient influences:
  - Situation-centered influences (e.g., lighting and noise)
  - Person-centered influences (e.g., fatigue, illness)

### Other Sources of Error

- Administration errors (e.g., incorrect instructions, inaccurate timing)
- Scoring errors (e.g., subjective scoring, clerical errors)
- All possible sources of error contribute to the lack of precise measurement.

### Classical Test Theory

$X = T + E$

- $X$ = obtained or observed score (fallible)
- $T$ = true score (reflects stable characteristics)
- $E$ = error score (reflects random error)

Using group data, this extends to:

$\sigma^2_X = \sigma^2_T + \sigma^2_E$

### Partitioning the Variance

$\sigma^2_X = \sigma^2_T + \sigma^2_E$

- $\sigma^2_X$ = observed score variance
- $\sigma^2_T$ = true score variance
- $\sigma^2_E$ = error score variance

- Proportion of observed score variance due to true score differences: $\sigma^2_T / \sigma^2_X$
- Proportion of observed score variance due to random error: $\sigma^2_E / \sigma^2_X$
- Reliability = proportion of observed score variance due to true score differences = $\sigma^2_T / \sigma^2_X$

So, how do we estimate reliability? How do we partition the variance?
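The variance partition can be checked on simulated data. The following is a minimal sketch, not part of the original slides: the function name, the normal distributions, and the chosen standard deviations (true score SD 10, error SD 5) are assumptions made only for illustration. With those values the theoretical reliability is 100 / (100 + 25) = 0.80.

```python
import random
import statistics


def simulate_reliability(n=20000, sd_true=10.0, sd_error=5.0, seed=0):
    """Simulate X = T + E and return (var_T, var_E, var_X, reliability).

    All distributions and parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # True scores: stable characteristics, centered at an arbitrary mean.
    true_scores = [rng.gauss(100.0, sd_true) for _ in range(n)]
    # Error scores: random, mean zero, independent of the true scores.
    errors = [rng.gauss(0.0, sd_error) for _ in range(n)]
    observed = [t + e for t, e in zip(true_scores, errors)]

    var_t = statistics.pvariance(true_scores)
    var_e = statistics.pvariance(errors)
    var_x = statistics.pvariance(observed)
    # Reliability is the share of observed variance due to true scores.
    return var_t, var_e, var_x, var_t / var_x


if __name__ == "__main__":
    var_t, var_e, var_x, r_xx = simulate_reliability()
    print(f"var_T={var_t:.1f} var_E={var_e:.1f} var_X={var_x:.1f} r_xx={r_xx:.3f}")
```

In practice true scores are never observed directly, which is why the reliability coefficient has to be estimated from observable correlations.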

### The Reliability Coefficient ($r_{xx}$)

- Proportion of observed score variance due to true score differences: $r_{xx} = \sigma^2_T / \sigma^2_X$
- Proportion of observed score variance due to random error: $1 - r_{xx} = \sigma^2_E / \sigma^2_X$

### Types of Reliability Coefficients

Test-retest reliability:

- Reflects the temporal stability of a measure.
- Most applicable with tests administered more than once and/or with constructs that are viewed as stable.
- It is important to consider the length of the interval between the two test administrations.

### Test-Retest Reliability

- Subject to carry-over effects.
- Appropriate for tests that are not appreciably impacted by carry-over effects.

### Alternate Form Reliability

- Involves the administration of two parallel forms; this can be done in two ways:
  - Delayed administration reflects error due to temporal instability and content sampling.
  - Simultaneous administration reflects only error due to content sampling.
- Limitations:
  - Reduces, but may not eliminate, carry-over effects.
  - Relatively few tests have alternate forms.
- Some tests with alternate forms: PPVT, EVT, CVLT.

### Internal Consistency

Estimates of reliability that are based on the relationship between items within a test and are derived from a single administration of a test.

### Split-Half Reliability

- Divide the test into two equivalent halves, usually an odd/even split.
- Reflects error due to content sampling.
- Since this really only provides an estimate of the reliability of a half-test, use the Spearman-Brown formula to estimate the reliability of the complete test.

[Table: Half-test coefficients and corresponding full-test coefficients corrected with the Spearman-Brown formula. Columns: Half-Test Correlation; Spearman-Brown Reliability.]
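The split-half procedure and the Spearman-Brown correction can be sketched as follows. The slides name the formula without writing it out; the standard form for doubling test length is $r_{full} = 2 r_{half} / (1 + r_{half})$. The function names and the tiny item-score matrix are hypothetical and serve only as an illustration.

```python
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown_full(r_half):
    """Estimate full-test reliability from a half-test correlation."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(item_scores):
    """Odd/even split: rows are test takers, columns are items."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson(odd_half, even_half)
    return r_half, spearman_brown_full(r_half)


if __name__ == "__main__":
    # Hypothetical 0/1 item scores for five test takers on four items.
    scores = [
        [1, 1, 1, 1],
        [1, 0, 1, 0],
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 0, 0],
    ]
    r_half, r_full = split_half_reliability(scores)
    print(f"half-test r = {r_half:.3f}, corrected = {r_full:.3f}")
```

For any positive half-test correlation below 1, the corrected coefficient is larger than the half-test correlation, which is why the uncorrected value understates the reliability of the full-length test.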

### Coefficient Alpha & KR-20

- Reflects error due to content sampling.
- Also sensitive to the heterogeneity of the test content (that is, to item homogeneity).
- Mathematically, it is the average of all possible split-half coefficients.
- Because coefficient alpha is the more general formula (KR-20 applies only to dichotomously scored items), it is more popular.

### Reliability of Speed Tests

- For speed tests, reliability estimates derived from a single administration of a test are inappropriate.
- Test-retest and alternate-form reliability are appropriate, but split-half, coefficient alpha, and KR-20 should be avoided.

### Inter-Rater Reliability

- Reflects differences due to the individuals scoring the test.
- Important when scoring requires subjective judgment by the scorer.
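Coefficient alpha and KR-20 can be sketched in a few lines. The slides do not give the formulas; the versions below are the standard ones, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \sigma^2_i}{\sigma^2_X}\right)$ and, for 0/1 items, KR-20 with $\sum p_i q_i$ in place of the item variances. Function names and the sample data are hypothetical.

```python
import statistics


def cronbach_alpha(item_scores):
    """Coefficient alpha; rows are test takers, columns are items."""
    k = len(item_scores[0])
    item_vars = [
        statistics.variance([row[i] for row in item_scores]) for i in range(k)
    ]
    total_var = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


def kr20(item_scores):
    """KR-20 for dichotomous (0/1) items, using p * q as each item variance."""
    n = len(item_scores)
    k = len(item_scores[0])
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    sum_pq = sum(pi * (1.0 - pi) for pi in p)
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1.0 - sum_pq / total_var)


if __name__ == "__main__":
    # Hypothetical 0/1 responses: five test takers, four items.
    scores = [
        [1, 1, 1, 1],
        [1, 0, 1, 0],
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 0, 0],
    ]
    print(f"alpha = {cronbach_alpha(scores):.3f}, KR-20 = {kr20(scores):.3f}")
```

On dichotomous data the two functions return the same value, which illustrates why alpha is described as the general formula.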

| Reliability Type | # Test Forms | # Testing Sessions | Summary |
| --- | --- | --- | --- |
| Test-Retest | One form | Two sessions | Administer the same test to the same group at two different sessions. |
| Alternate Forms: Simultaneous Administration | Two forms | One session | Administer two forms of the test to the same group in the same session. |
| Alternate Forms: Delayed Administration | Two forms | Two sessions | Administer two forms of the test to the same group at two different sessions. |
| Split-Half | One form | One session | Administer the test to a group one time. Split the test into two equivalent halves. |
| Coefficient Alpha or KR-20 | One form | One session | Administer the test to a group one time. Apply appropriate procedures. |
| Inter-Rater | One form | One session | Administer the test to a group one time. Two or more raters score the test independently. |

### Sources of Error Variance Associated with Major Types of Reliability

| Type of Reliability | Error Variance |
| --- | --- |
| Test-Retest Reliability | Time sampling |
| Alternate-Form Reliability: Simultaneous Administration | Content sampling |
| Alternate-Form Reliability: Delayed Administration | Time sampling and content sampling |
| Split-Half Reliability | Content sampling |
| Coefficient Alpha & KR-20 | Content sampling and item heterogeneity |
| Inter-Rater Reliability | Differences due to raters/scorers |

It is possible to partition the error variance into its components.

### Reliability of Difference Scores

- Difference scores are calculated when comparing performance on two tests.
- The reliability of the difference between two test scores is generally lower than the reliabilities of the two tests.
- This is because both tests contribute their own unique error variance to the difference score.

### Reliability of Composite Scores

- When there are multiple scores available for an individual, one can calculate composite scores (e.g., when assigning grades in a class).
- There are different approaches, but the important issue is that the reliability of a composite score is generally greater than the reliability of the individual scores.

### Standards for Reliability

- If a test score is used to make important decisions that will significantly impact individuals, the reliability should be very high: > .90 or > .95.
- If a test is interpreted independently but as part of a larger assessment process (e.g., a personality test), most set the standard at .80 or greater.
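The point about difference scores can be made concrete with the commonly used formula $r_{DD} = \frac{0.5\,(r_{AA} + r_{BB}) - r_{AB}}{1 - r_{AB}}$, which is not shown in the slides and assumes the two tests have equal score variances. The function name and the example coefficients are hypothetical.

```python
def difference_score_reliability(r_aa, r_bb, r_ab):
    """Reliability of the difference A - B between two test scores.

    r_aa, r_bb: reliabilities of tests A and B.
    r_ab: correlation between A and B.
    Assumes equal variances for A and B (an assumption of this sketch).
    """
    if r_ab >= 1.0:
        raise ValueError("r_ab must be below 1 for the formula to be defined")
    return (0.5 * (r_aa + r_bb) - r_ab) / (1.0 - r_ab)


if __name__ == "__main__":
    # Hypothetical values: two fairly reliable tests that correlate .50.
    r_dd = difference_score_reliability(0.90, 0.80, 0.50)
    print(f"reliability of the difference score: {r_dd:.2f}")  # 0.70
```

With these illustrative inputs the difference score is less reliable than either test, and the shortfall grows as the correlation between the two tests increases.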

### Standards for Reliability (continued)

- If a test is used only in group research or is used as part of a composite (e.g., classroom tests), lower reliability estimates may be acceptable, e.g., in the .70s.

### Improving Reliability

- Increase the number of items (better domain sampling).
- Use multiple measurements (composite scores).
- Use item analysis procedures to select the best items.
- Increase standardization of the test.

[Table: Reliability expected when increasing the number of items. Columns: Current Reliability; the reliability expected when the number of items is increased by × 1.25, × 1.50, × 2.0, × …]
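The effect of lengthening a test is usually projected with the general Spearman-Brown formula, $r_{new} = \frac{n\,r}{1 + (n - 1)\,r}$, where $n$ is the factor by which the number of items is multiplied. The sketch below is an illustration under that formula, not a reproduction of the slide's table; the function name and the starting reliability of .50 are assumptions.

```python
def projected_reliability(r_current, length_factor):
    """General Spearman-Brown projection for a test lengthened by length_factor.

    Assumes the added items are comparable in quality to the existing ones.
    """
    return (length_factor * r_current) / (1.0 + (length_factor - 1.0) * r_current)


if __name__ == "__main__":
    r_current = 0.50  # hypothetical current reliability
    for factor in (1.25, 1.50, 2.0):
        r_new = projected_reliability(r_current, factor)
        print(f"x {factor:<4}: {r_new:.3f}")
```

Setting `length_factor` to 2 reproduces the half-test correction used with split-half reliability, so the split-half case is a special instance of this projection.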

### Standard Error of Measurement

- When comparing the reliability of tests, the reliability coefficient is the statistic of choice.
- When interpreting individual scores, the standard error of measurement (SEM) generally proves to be the most useful statistic.
- It is an index of the average amount of error in test scores, or the standard deviation of error scores around the true score.
- The SEM is calculated using the reliability coefficient and the standard deviation of the test.
- Since the test's reliability coefficient is used in calculating the SEM, the SEM depends directly on $r_{xx}$, and the relationship is inverse: as the reliability of a test increases, the SEM decreases; as reliability decreases, the SEM increases.

### Confidence Intervals

- A confidence interval (CI) reflects a range of scores that will contain the test taker's true score with a prescribed probability.
- The SEM is used to calculate CIs.
- Like any SD, the SEM can be interpreted in terms of frequencies represented in a normal distribution.
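The SEM and the resulting confidence interval can be sketched as below. The standard formula, which the slides describe in words only, is $SEM = SD\sqrt{1 - r_{xx}}$, and the interval is the obtained score plus or minus a z-score times the SEM. Function names and the example values (SD 15, reliability .91, obtained score 100) are hypothetical.

```python
import math


def standard_error_of_measurement(sd, r_xx):
    """SEM from the test's standard deviation and reliability coefficient."""
    if not 0.0 <= r_xx <= 1.0:
        raise ValueError("r_xx must lie between 0 and 1")
    return sd * math.sqrt(1.0 - r_xx)


def confidence_interval(obtained, sem, z=1.96):
    """Interval of obtained score +/- z * SEM (z = 1.96 for roughly 95%)."""
    margin = z * sem
    return obtained - margin, obtained + margin


if __name__ == "__main__":
    sem = standard_error_of_measurement(sd=15.0, r_xx=0.91)
    low, high = confidence_interval(obtained=100.0, sem=sem)
    print(f"SEM = {sem:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Raising `r_xx` in this sketch shrinks the SEM and narrows the interval, which is the inverse relationship described above.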

### Confidence Intervals (continued)

Confidence interval = obtained score ± (z-score) × SEM

- If the z-score is 1, the result is a 68% confidence interval.
- If the z-score is 1.96, the result is a 95% confidence interval.
- CIs remind us that measurement error is present in all scores and that we should interpret scores cautiously.
- Confidence intervals are interpreted as the range within which a person's true score is expected to fall X% of the time.

### Generalizability Theory

- Generalizability Theory (GT) is an extension of Classical Test Theory (CT), but where CT says all error is random, GT recognizes sources of systematic error.
- For example, in scoring essays, some graders are consistently more rigorous and some graders are consistently more lenient.
- If you have a situation in which there is no opportunity for systematic error to enter the model, CT and GT are mathematically identical.

### Generalizability Theory (continued)

- CT is most useful when objective tests are administered under standardized conditions (e.g., SAT or GRE).
- If these conditions are not met, consideration of the principles raised by GT may be useful (e.g., with essay or projective tests).

### Validity

"The degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. Validity is, therefore, the most fundamental consideration in developing and evaluating tests." (AERA et al., 1999)

- Does the test measure what it is intended or designed to measure?
- Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.
- Validity can be threatened in several ways.

### Threats to Validity

- Construct underrepresentation: the test does not measure important aspects of the specified construct.
- Construct-irrelevant variance: the test measures characteristics, content, or skills that are unrelated to the test construct.

### Reliability & Validity

- Reliability is necessary for validity, but is not sufficient.
- A test can be reliable without being valid, but a test cannot be valid without being reliable.
- Validity analyses are often based on correlational analyses between tests and validation measures.
- The reliability of the test (and of the validation measures) sets an upper limit on the validity coefficient.
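The upper limit mentioned above is usually stated as $r_{xy} \le \sqrt{r_{xx}\, r_{yy}}$, where $r_{xx}$ and $r_{yy}$ are the reliabilities of the test and of the validation measure. The slides do not give this formula; the sketch below assumes it, and the function name and example reliabilities are hypothetical.

```python
import math


def max_validity_coefficient(r_xx, r_yy):
    """Ceiling on the test-criterion correlation given both reliabilities."""
    for r in (r_xx, r_yy):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(r_xx * r_yy)


if __name__ == "__main__":
    # Hypothetical reliabilities for a test (.81) and a criterion (.64).
    ceiling = max_validity_coefficient(0.81, 0.64)
    print(f"largest attainable validity coefficient: {ceiling:.2f}")  # 0.72
```

An unreliable criterion lowers the ceiling just as much as an unreliable test, so a modest validity coefficient can reflect a noisy validation measure rather than a poor test.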

### Traditional Types of Validity

- Content validity: does the test provide an accurate estimate of the test taker's mastery of the specified content domain?
- Criterion-related validity: can the test predict performance on a specified criterion?
- Construct validity: does the test accurately reflect the dimensions underlying the construct measured by the test?

### Validity Evidence vs. Types

- The common practice has been to refer to types of validity (e.g., content validity).
- However, validity is a unitary concept. It reflects the degree to which all the accumulated evidence supports the intended interpretation of test scores for proposed purposes.
- As a result, the current emphasis in the Standards and advanced texts is to refer to types of validity evidence rather than distinct types of validity.
- Instead of Content Validity, we have Evidence Based on Test Content.
- Instead of Criterion-Related Validity, we have Evidence Based on Relations to Other Variables.

### Classical Test Theory

$X = T + E$

- $X$ = obtained or observed score (fallible)
- $T$ = true score (reflects stable characteristics)
- $E$ = error score (reflects random error)

### Validity Theory

$T = R + I$

- $T$ = true score (reflects stable characteristics)
- $R$ = stable relevant characteristics
- $I$ = stable irrelevant characteristics

### Relevant & Irrelevant Stable Characteristics

Consider a T&M test:

- Relevant stable characteristics: knowledge of T&M content ($R$)
- Irrelevant stable characteristics: reading comprehension skill ($I$)

### Validity Theory (continued)

$X = R + I + E$

- $X$ = obtained or observed score (fallible)
- $R$ = relevant stable characteristics
- $I$ = irrelevant stable characteristics
- $E$ = error score (reflects random error)

$\sigma^2_X = \sigma^2_R + \sigma^2_I + \sigma^2_E$

- $\sigma^2_X$ = observed score variance
- $\sigma^2_R$ = variance due to relevant stable characteristics
- $\sigma^2_I$ = variance due to irrelevant stable characteristics (systematic error)
- $\sigma^2_E$ = error score variance

### Reliability & Validity

- We define reliability as $\sigma^2_T / \sigma^2_X$.
- We define validity as $\sigma^2_R / \sigma^2_X$.
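One consequence of these two definitions can be worked out directly. The short derivation below is a sketch that assumes, beyond what the slides state explicitly, that $R$ and $I$ are uncorrelated, so that their variances add up to the true score variance; $r_{xx}$ denotes reliability as in the earlier slides.

$$
\sigma^2_T = \sigma^2_R + \sigma^2_I
\quad\Longrightarrow\quad
\frac{\sigma^2_R}{\sigma^2_X}
= \frac{\sigma^2_T}{\sigma^2_X} - \frac{\sigma^2_I}{\sigma^2_X}
= r_{xx} - \frac{\sigma^2_I}{\sigma^2_X}
\le r_{xx}
$$

Under this assumption validity can never exceed reliability, and the gap between the two is exactly the share of observed variance due to stable but irrelevant characteristics.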

### Random & Systematic Error

- Reliability analysis: random measurement error is an error component of the observed score that is separate from the true score.
- Validity analysis: systematic measurement error (due to stable but irrelevant characteristics) is an error component of the true score.

### Validity Evidence Based on Test Content

- Focuses on how well the test items sample the behaviors or subject matter the test is designed to measure.
- Does the test provide an accurate estimate of the test taker's mastery of the specified content domain? In other words, does the test adequately measure the content, behaviors, or skills it is thought to measure?
- Most relevant, appropriate, and important for tests used to make inferences about the knowledge and/or skills assessed by a sample of items (e.g., achievement tests or employment tests).
- Traditionally, this has involved a qualitative process in which the test is compared to a detailed description of the test domain developed by respected experts.

### Face Validity

- Not technically a facet of validity; it refers to a test appearing to measure what it is designed to measure.
- Content-based evidence of validity is acquired through a systematic and technical analysis of the test content, whereas face validity involves only the superficial appearance of a test.
- Often desirable, as it makes the test more acceptable to those taking the test and to the general public.
- Can be undesirable if it makes the test so transparent that test takers can easily feign symptoms (e.g., in forensic and employment settings).

### Validity Evidence Based on Relations to Other Variables

- Test-criterion evidence: can the test predict performance on a specified criterion? How accurately do test scores predict criterion performance? (Traditionally considered Criterion-Related Validity.)
- Convergent and discriminant evidence: examines relationships with existing tests that measure similar or dissimilar constructs. (Traditionally incorporated under Construct Validity.)
- Contrasted group evidence: do different groups perform differently on the test? (Traditionally incorporated under Construct Validity.)

### Test-Criterion Evidence

- The criterion variable is a measure of some attribute or outcome that is of primary interest, as determined by the test users.
- E.g., the SAT is used to predict academic performance in college, so the criterion variable is college GPA.

### Approach #1: Predictive Studies

- Indicate how accurately test data can predict criterion scores that are obtained at a later time.
- When prediction is the intended application, predictive studies retain the temporal differences and other characteristics of the practical situation.
- Predictive studies are time-consuming and expensive.
- They involve the use of an experimental group, so people may be admitted to programs or given jobs without regard to their performance on the predictor, even if their scores suggest a low probability of success.

### Approach #2: Concurrent Studies

- Obtain predictor and criterion data at the same time.
- Most useful when prediction over time is not important.
- E.g., examining the relationship between the results of a brief paper-and-pencil measure and a clinical interview.
- Problems:
  - May be applied even when changes over time are important.
  - May result in a restricted range of scores on one or both measures (e.g., SAT administered only to college freshmen).
  - Range restriction reduces the size of correlation coefficients.

### Criterion Validity Coefficients

- The correlation between the predictor and the criterion ($r_{xy}$) is used to reflect the relationship between them.
- Linear regression is used to predict scores on the criterion variable, but the accuracy of prediction depends on the strength of the correlation coefficient.

### Linear Regression and $S_E$

- A regression equation yields a single predicted score, as if the relationship between predictor and criterion were perfect; on its own it does not take prediction error into account.
- To account for this, we use a statistic referred to as the Standard Error of Estimate ($S_E$).

### Standard Error of Estimate ($S_E$)

- Represents the average amount of prediction error: the average number of points by which predicted scores differ from actual criterion scores.
- A residual is the difference between the actual criterion score and its predicted value.
- $S_E$ is the standard deviation of the distribution of prediction errors.

### $S_E$ and Confidence Intervals

- When using linear regression to predict performance, $S_E$ is used to construct confidence intervals representing a range of scores within which the actual criterion score is predicted to fall.
- We use information about prediction accuracy ($S_E$) to convert a single predicted criterion score into a range within which we expect the actual criterion score to fall.
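Prediction and its error band can be sketched together. The slides do not give the formulas; the sketch below assumes the standard ones, namely the regression prediction $Y' = \bar{Y} + r_{xy}\frac{SD_Y}{SD_X}(X - \bar{X})$ and $S_E = SD_Y\sqrt{1 - r_{xy}^2}$. All function names and numeric values are hypothetical.

```python
import math


def predict_criterion(x, mean_x, sd_x, mean_y, sd_y, r_xy):
    """Predicted criterion score from a simple linear regression on x."""
    slope = r_xy * sd_y / sd_x
    return mean_y + slope * (x - mean_x)


def standard_error_of_estimate(sd_y, r_xy):
    """Standard deviation of prediction errors around the regression line."""
    return sd_y * math.sqrt(1.0 - r_xy ** 2)


def prediction_interval(predicted, s_e, z=1.96):
    """Range of predicted +/- z * S_E (z = 1.96 for roughly 95%)."""
    margin = z * s_e
    return predicted - margin, predicted + margin


if __name__ == "__main__":
    # Hypothetical predictor (mean 100, SD 15) and criterion (mean 50, SD 10).
    y_hat = predict_criterion(120.0, 100.0, 15.0, 50.0, 10.0, r_xy=0.60)
    s_e = standard_error_of_estimate(sd_y=10.0, r_xy=0.60)
    low, high = prediction_interval(y_hat, s_e)
    print(f"predicted = {y_hat:.1f}, S_E = {s_e:.1f}, range = ({low:.2f}, {high:.2f})")
```

Even with a validity coefficient of .60, the interval in this example spans most of the criterion's range, which shows why a point prediction alone can be misleading.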

### SEM and $S_E$

- The SEM indicates the margin of measurement error caused by the imperfect reliability of the test.
- The $S_E$ indicates the margin of prediction error caused by the imperfect validity of the test.

### Predicting Criterion Scores

- If you have large criterion validity coefficients, the $S_E$ will be relatively small.
- As a general rule, criterion validity coefficients will not be as large as the reliability coefficients we are accustomed to.

### Interpreting Validity Coefficients

- How large should validity coefficients be?
- There is no simple answer: the relationship should be large enough that information from the test helps predict performance on the criterion.
- If the test helps predict criterion performance better than any existing predictor, the test may be useful even if the coefficients are relatively small.

### Decision-Theory Models & Selection Efficiency Analysis

- Does not rely on the absolute value of the criterion validity coefficients, but examines the test's contribution in decision-making situations.
- If the test's use results in a high proportion of accurate decisions, it is useful; otherwise, get rid of it.

### Selection Efficiency Analysis

- Base rate is the overall proportion of people in the group under study who can be successful on the criterion.
- Selection rate is the ratio of people to be selected to people applying.
- If you have a low selection rate (a situation where you have many applicants to fill a small number of positions) and a low base rate (success on the criterion is fairly rare), even tests with relatively small criterion validity coefficients can significantly improve decision making.

### Convergent & Discriminant Evidence

- Convergent evidence is obtained when you have strong correlations with existing tests that measure similar constructs.
- Discriminant evidence is obtained when you have low correlations with existing tests that measure dissimilar constructs.

### Contrasted Group Studies

- Validity evidence can also be gathered by examining groups that are expected, based on theory, to differ on the construct being measured.
- E.g., compare depressed and control groups on the BDI.
- Examine developmental trends in performance.

### Internal Structure Evidence

- Examine the internal structure of the test to determine whether it is consistent with the hypothesized structure of the construct it is designed to measure.
- Methods include factor analysis and measures of item homogeneity.
- Traditionally incorporated under Construct Validity.

### Response Processes Evidence

- Are the responses invoked by the test consistent with the construct being assessed?
  - For example, does a test of math reasoning require actual analysis and reasoning, or simply rote calculations?
- Traditionally incorporated under Construct Validity.

### Consequential Validity Evidence

- Also referred to as Validity Evidence Based on Consequences of Testing.
  - If the test is thought to result in some benefit, are those benefits being achieved?
- A topic of controversy, since some suggest that the concept should incorporate social issues and values.

### Sources of Validity Evidence

| Source | Example | Major Applications |
| --- | --- | --- |
| Evidence Based on Test Content | Analysis of item relevance and content coverage | Achievement tests and tests used in the selection of employees |
| Evidence Based on Relations to Other Variables | Test-criterion; convergent and discriminant evidence; contrasted groups studies | Wide variety of tests |
| Evidence Based on Internal Structure | Factor analysis, analysis of test homogeneity | Wide variety of tests, but particularly useful with tests of constructs like personality or intelligence |
| Evidence Based on Response Processes | Analysis of the processes engaged in by the examinee or examiner | Any test that requires examinees to engage in a cognitive or behavioral activity |
| Evidence Based on Consequences of Testing | Analysis of the intended and unintended consequences of testing | Most applicable to tests designed for selection and promotion, but useful on a wide range of tests |

### RESEARCH METHODS IN I/O PSYCHOLOGY

RESEARCH METHODS IN I/O PSYCHOLOGY Objectives Understand Empirical Research Cycle Knowledge of Research Methods Conceptual Understanding of Basic Statistics PSYC 353 11A rsch methods 01/17/11 [Arthur]

### RESEARCH METHODS IN I/O PSYCHOLOGY

RESEARCH METHODS IN I/O PSYCHOLOGY Objectives Understand Empirical Research Cycle Knowledge of Research Methods Conceptual Understanding of Basic Statistics PSYC 353 11A rsch methods 09/01/11 [Arthur]

### Introduction to Reliability

Introduction to Reliability What is reliability? Reliability is an index that estimates dependability (consistency) of scores Why is it important? Prerequisite to validity because if you are not measuring

### Chapter 5 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 5 Multiple Choice Questions (The answers are provided after the last question.) 1. Which of the following is not an assumption underlying testing and measurement? a. Various approaches to measuring

### Chapter 3 Psychometrics: Reliability & Validity

Chapter 3 Psychometrics: Reliability & Validity 45 Chapter 3 Psychometrics: Reliability & Validity The purpose of classroom assessment in a physical, virtual, or blended classroom is to measure (i.e.,

### Glossary of Terms Ability Accommodation Adjusted validity/reliability coefficient Alternate forms Analysis of work Assessment Battery Bias

Glossary of Terms Ability A defined domain of cognitive, perceptual, psychomotor, or physical functioning. Accommodation A change in the content, format, and/or administration of a selection procedure

### Test Reliability Indicates More than Just Consistency

Assessment Brief 015.03 Test Indicates More than Just Consistency by Dr. Timothy Vansickle April 015 Introduction is the extent to which an experiment, test, or measuring procedure yields the same results

### Reliability and Validity. Measurement Error. Random vs. Systematic Error 10/7/10. Random error. Systematic error. Chance fluctuations in measurement

Reliability and Validity Measurement Error Chance fluctuations in measurement More precise measurement Fluctuations noted more easily Random vs. Systematic Error Random error Chance fluctuations in measurement

### Factors Affecting Performance. Job Analysis. Job Analysis Methods. Uses of Job Analysis. Some Job Analysis Procedures Worker Oriented

Factors Affecting Performance Motivation Technology Performance Environment Abilities Job Analysis A job analysis generates information about the job and the individuals performing the job. Job description:

### Reliability Overview

Calculating Reliability of Quantitative Measures Reliability Overview Reliability is defined as the consistency of results from a test. Theoretically, each test contains some error the portion of the score

### Sources of Validity Evidence

Sources of Validity Evidence Inferences made from the results of a selection procedure to the performance of subsequent work behavior or outcomes need to be based on evidence that supports those inferences.

### Guided Reading 9 th Edition. informed consent, protection from harm, deception, confidentiality, and anonymity.

Guided Reading Educational Research: Competencies for Analysis and Applications 9th Edition EDFS 635: Educational Research Chapter 1: Introduction to Educational Research 1. List and briefly describe the

### Measurement Types of Instruments

Measurement Types of Instruments Y520 Strategies for Educational Inquiry Robert S Michael Measurement - Types of Instruments - 1 First Questions What is the stated purpose of the instrument? Is it to:

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Procedures for Estimating Internal Consistency Reliability. Prepared by the Iowa Technical Adequacy Project (ITAP)

Procedures for Estimating Internal Consistency Reliability Prepared by the Iowa Technical Adequacy Project (ITAP) July, 003 Table of Contents: Part Page Introduction...1 Description of Coefficient Alpha...3

### Assessment, Case Conceptualization, Diagnosis, and Treatment Planning Overview

Assessment, Case Conceptualization, Diagnosis, and Treatment Planning Overview The abilities to gather and interpret information, apply counseling and developmental theories, understand diagnostic frameworks,

### DATA COLLECTION AND ANALYSIS

DATA COLLECTION AND ANALYSIS Quality Education for Minorities (QEM) Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. August 23, 2013 Objectives of the Discussion 2 Discuss

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### EAA492/6: FINAL YEAR PROJECT DEVELOPING QUESTIONNAIRES PART 2 A H M A D S H U K R I Y A H A Y A E N G I N E E R I N G C A M P U S U S M

EAA492/6: FINAL YEAR PROJECT DEVELOPING QUESTIONNAIRES PART 2 1 A H M A D S H U K R I Y A H A Y A E N G I N E E R I N G C A M P U S U S M CONTENTS Reliability And Validity Sample Size Determination Sampling

### Compliance with APA s Standards for Psychological and Educational Testing

Compliance with APA s Standards for Psychological and Educational Table of Contents INTRODUCTION... 5 TEST CONSTRUCTION, EVALUATION, AND DOCUMENTATION\L1... 6 1. VALIDITY...6 1.1 A rationale should be

### The Revised Dutch Rating System for Test Quality. Arne Evers. Work and Organizational Psychology. University of Amsterdam. and

Dutch Rating System 1 Running Head: DUTCH RATING SYSTEM The Revised Dutch Rating System for Test Quality Arne Evers Work and Organizational Psychology University of Amsterdam and Committee on Testing of

### A PARADIGM FOR DEVELOPING BETTER MEASURES OF MARKETING CONSTRUCTS

A PARADIGM FOR DEVELOPING BETTER MEASURES OF MARKETING CONSTRUCTS Gilber A. Churchill (1979) Introduced by Azra Dedic in the course of Measurement in Business Research Introduction 2 Measurements are rules

### Validity and Reliability in Social Science Research

Education Research and Perspectives, Vol.38, No.1 Validity and Reliability in Social Science Research Ellen A. Drost California State University, Los Angeles Concepts of reliability and validity in social

### An Instructor s Guide to Understanding Test Reliability. Craig S. Wells James A. Wollack

An Instructor s Guide to Understanding Test Reliability Craig S. Wells James A. Wollack Testing & Evaluation Services University of Wisconsin 1025 W. Johnson St., #373 Madison, WI 53706 November, 2003

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Doing Quantitative Research 26E02900, 6 ECTS Lecture 2: Measurement Scales. Olli-Pekka Kauppila Rilana Riikkinen

Doing Quantitative Research 26E02900, 6 ECTS Lecture 2: Measurement Scales Olli-Pekka Kauppila Rilana Riikkinen Learning Objectives 1. Develop the ability to assess a quality of measurement instruments

### Test Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but

Test Bias As we have seen, psychological tests can be well-conceived and well-constructed, but none are perfect. The reliability of test scores can be compromised by random measurement error (unsystematic

### Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

### Background and Technical Information for the GRE Comparison Tool for Business Schools

Background and Technical Information for the GRE Comparison Tool for Business Schools Background Business schools around the world accept Graduate Record Examinations (GRE ) General Test scores for admission

### Follow this and additional works at: Part of the Educational Methods Commons

Johnson & Wales University ScholarsArchive@JWU Research Methodology Center for Research and Evaluation 4-1-2013 Construct Validity: An Illustration of Examining Validity Evidence Based on Relationships

### Vineland Adaptive Behaviour Scales (VABS) II

Outcome Measure Sensitivity to Change Population Domain Type of Measure ICF-Code/s Description Vineland Adaptive Behaviour Scales (VABS) II Yes Paediatrics Behavioural Function Neuropsychological Impairment

### Research Psychometric Properties

Research Psychometric Properties There are two important psychometric properties reliability and validity. Reliability deals with the extent to which a measure is repeatable or stable. Essentially reliability

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### Comprehensive Receptive and Expressive Vocabulary Test Second Edition (CREVT-2)

Comprehensive Receptive and Expressive Vocabulary Test Second Edition (CREVT-2) The Comprehensive Receptive and Expressive Vocabulary Test Second Edition (Wallace & Hammill, 2002) is a norm-referenced

### Sample Size Determination

Sample Size Determination Population A: 10,000 Population B: 5,000 Sample 10% Sample 15% Sample size 1000 Sample size 750 The process of obtaining information from a subset (sample) of a larger group (population)

### Glossary of Testing, Measurement, and Statistical Terms

Glossary of Testing, Measurement, and Statistical Terms 425 Spring Lake Drive Itasca, IL 60143 www.riversidepublishing.com Glossary of Testing, Measurement, and Statistical Terms Ability - A characteristic

### Basic Concepts in Classical Test Theory: Relating Variance Partitioning in Substantive Analyses to the Same Process in Measurement Analyses ABSTRACT

Basic Concepts in Classical Test Theory: Relating Variance Partitioning in Substantive Analyses to the Same Process in Measurement Analyses Thomas E. Dawson Texas A&M University ABSTRACT The basic processes

### Psychometrics 101 Part 2: Essentials of Test Score Interpretation. Steve Saladin, Ph.D. University of Idaho

Psychometrics 101 Part 2: Essentials of Test Score Interpretation Steve Saladin, Ph.D. University of Idaho Standards for Educational and Psychological Testing 15.10 Those responsible for testing programs

### Class 6: Chapter 12. Key Ideas. Explanatory Design. Correlational Designs

Class 6: Chapter 12 Correlational Designs Key Ideas Explanatory and predictor designs Characteristics of correlational research Scatterplots and calculating associations Steps in conducting a correlational

### Statistics, Research, & SPSS: The Basics

Statistics, Research, & SPSS: The Basics SPSS (Statistical Package for the Social Sciences) is a software program that makes the calculation and presentation of statistics relatively easy. It is an incredibly

### Interpreting GRE Scores: Reliability and Validity

Interpreting GRE Scores: Reliability and Validity Dr. Lesa Hoffman Department of Psychology University of Nebraska Lincoln All data from: GRE Guide to the Use of Scores 2010-2011 http://www.ets.org/s/gre/pdf/gre_guide.pdf

### Early Childhood Measurement and Evaluation Tool Review

Early Childhood Measurement and Evaluation Tool Review Early Childhood Measurement and Evaluation (ECME), a portfolio within CUP, produces Early Childhood Measurement Tool Reviews as a resource for those

### WHAT IS A JOURNAL CLUB?

WHAT IS A JOURNAL CLUB? With its September 2002 issue, the American Journal of Critical Care debuts a new feature, the AJCC Journal Club. Each issue of the journal will now feature an AJCC Journal Club

### ACES. Report Requested: Study ID: R090xxx. Placement Validity Report for CLEP Sample ADMITTED CLASS EVALUATION SERVICE TM

ACES Report Requested: 08-01-2009 Study ID: R090xxx Placement Validity Report for CLEP Sample Your College Board Validity Report is designed to assist your institution in validating your placement decisions.

### Measurement: Reliability and Validity

Measurement: Reliability and Validity Y520 Strategies for Educational Inquiry Robert S Michael Reliability & Validity-1 Introduction: Reliability & Validity All measurements, especially measurements of

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### Technical Information

Technical Information Trials The questions for Progress Test in English (PTE) were developed by English subject experts at the National Foundation for Educational Research. For each test level of the paper

### Constructing Indices and Scales. Hsueh-Sheng Wu CFDR Workshop Series Summer 2012

Constructing Indices and Scales Hsueh-Sheng Wu CFDR Workshop Series Summer 2012 Outline What are scales and indices? Graphical presentation of relations between items and constructs for scales and indices

### Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship

### a. Will the measure employed repeatedly on the same individuals yield similar results? (stability)

INTRODUCTION Sociologist James A. Quinn states that the tasks of scientific method are related directly or indirectly to the study of similarities of various kinds of objects or events. One of the tasks

### DATA ANALYSIS AND INTERPRETATION OF EMPLOYEES PERSPECTIVES ON HIGH ATTRITION

DATA ANALYSIS AND INTERPRETATION OF EMPLOYEES PERSPECTIVES ON HIGH ATTRITION Analysis is the key element of any research as it is the reliable way to test the hypotheses framed by the investigator. This

### AP Statistics 2002 Scoring Guidelines

AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought

### Estimate a WAIS Full Scale IQ with a score on the International Contest 2009

Estimate a WAIS Full Scale IQ with a score on the International Contest 2009 Xavier Jouve The Cerebrals Society CognIQBlog Although the 2009 Edition of the Cerebrals Society Contest was not intended to

### Validity, Fairness, and Testing

Validity, Fairness, and Testing Michael Kane Educational Testing Service Conference on Conversations on Validity Around the World Teachers College, New York March 2012 Unpublished Work Copyright 2010 by

### Correlational Research

Correlational Research Chapter Fifteen Bring folder of readings The Nature of Correlational Research Correlational Research is also known as Associational Research.

### Research Methods & Experimental Design

Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

### Coefficient of Determination

Coefficient of Determination The coefficient of determination $R^2$ (or sometimes $r^2$) is another measure of how well the least squares equation $\hat{y} = b_0 + b_1 x$ performs as a predictor of $y$. $R^2$ is computed

### E205 Final: Version B

E205 Final: Version B Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The owner of a local nightclub has recently surveyed a random

### ACT Research Explains New ACT Test Writing Scores and Their Relationship to Other Test Scores

ACT Research Explains New ACT Test Writing Scores and Their Relationship to Other Test Scores Wayne J. Camara, Dongmei Li, Deborah J. Harris, Benjamin Andrews, Qing Yi, and Yong He ACT Research Explains

### Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

### High School Psychology and its Impact on University Psychology Performance: Some Early Data

High School Psychology and its Impact on University Psychology Performance: Some Early Data John Reece Discipline of Psychology School of Health Sciences Impetus for This Research Oh, can you study psychology

### Partial Estimates of Reliability: Parallel Form Reliability in the Key Stage 2 Science Tests

Partial Estimates of Reliability: Parallel Form Reliability in the Key Stage 2 Science Tests Final Report Sarah Maughan Ben Styles Yin Lin Catherine Kirkup September 29 Partial Estimates of Reliability:

### ACES. Report Requested: Study ID: R08xxxx. Placement Validity Report for ACCUPLACER Sample ADMITTED CLASS EVALUATION SERVICE TM

ACES Report Requested: 02-01-2008 Study ID: R08xxxx Placement Validity Report for ACCUPLACER Sample Your College Board Validity Report is designed to assist your institution in validating your placement

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### MATH 10: Elementary Statistics and Probability Chapter 9: Hypothesis Testing with One Sample

MATH 10: Elementary Statistics and Probability Chapter 9: Hypothesis Testing with One Sample Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of

### CRITICAL THINKING ASSESSMENT

CRITICAL THINKING ASSESSMENT REPORT Prepared by Byron Javier Assistant Dean of Research and Planning Critical Thinking Assessment at MXC As part of its assessment plan, the Assessment Committee

### Standard error vs. Standard error of measurement

Statistics Corner Questions and answers about language testing statistics: Standard error vs. Standard error of measurement James Dean Brown (University of Hawai'i at Manoa) QUESTION: One of the statistics

### How to Read a Research Article

RACHEL DUNIFON How to Read a Research Article The goal of this Research Brief is to provide information that will make reading a research article more illuminating. For example, you might want to learn

### The Theory of Cognitive Acuity: Extending Psychophysics to the Measurement of Situational Judgment

This presentation will introduce a theory of cognitive acuity (TCA) which provides a psychophysical method for estimating respondents' sensitivity to the correctness of situational judgment test (SJT) response

### Standardized Tests, Intelligence & IQ, and Standardized Scores

Standardized Tests, Intelligence & IQ, and Standardized Scores Alphabet Soup!! ACTs SATs ITBS GREs WISC-IV WAIS-IV WRAT MCAT LSAT IMA RAT Uses/Functions of Standardized Tests Selection and Placement

### Normal distribution

Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

### Chapter Eight: Quantitative Methods

Chapter Eight: Quantitative Methods RESEARCH DESIGN Qualitative, Quantitative, and Mixed Methods Approaches Third Edition John W. Creswell Chapter Outline Defining Surveys and Experiments Components of

### Experimental methods. Elisabeth Ahlsén Linguistic Methods Course

Experimental methods Elisabeth Ahlsén Linguistic Methods Course Experiment Method for empirical investigation of question or hypothesis 2 types a) Lab experiment b) Naturalistic experiment Question ->

### Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

### What is correlational research?

Key Ideas Purpose and use of correlational designs How correlational research developed Types of correlational designs Key characteristics of correlational designs Procedures used in correlational studies

### Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

### University of Oklahoma

A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute

### 17.0 Linear Regression

17.0 Linear Regression Answer Questions Lines Correlation Regression 17.1 Lines The algebraic equation for a line is $Y = \beta_0 + \beta_1 X$. The use of coordinate axes to show functional relationships was

### Lecture #2 Overview. Basic IRT Concepts, Models, and Assumptions. Lecture #2 ICPSR Item Response Theory Workshop

Basic IRT Concepts, Models, and Assumptions Lecture #2 ICPSR Item Response Theory Workshop Lecture #2: 1of 64 Lecture #2 Overview Background of IRT and how it differs from CFA Creating a scale An introduction

### Measurement Systems Correlation MSC for Suppliers

Measurement Systems Correlation MSC for Suppliers Copyright 2003-2007 Raytheon Company. All rights reserved. R6σ is a Raytheon trademark registered in the United States and Europe. Raytheon Six Sigma is

### Outline. Correlation & Regression, III. Review. Relationship between r and regression

Outline Correlation & Regression, III 9.07 4/6/2004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation

### Assessment and Placement: October 21, 2015 Edward A. Morante, Ed.D.

Assessment and Placement: October 21, 2015 Edward A. Morante, Ed.D. Assessment: Different Meanings 1. Assessment for basic skills - advising and placement into entering courses 2. Classroom Assessment

### DOCTOR OF PHILOSOPHY DEGREE. Educational Leadership Doctor of Philosophy Degree Major Course Requirements. EDU721 (3.

DOCTOR OF PHILOSOPHY DEGREE Educational Leadership Doctor of Philosophy Degree Major Course Requirements EDU710 (3.0 credit hours) Ethical and Legal Issues in Education/Leadership This course is an intensive

### Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7th, at 5pm.

Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7th, at 5pm. Political Science 15 Lecture 12: Hypothesis Testing Sampling

### Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference that is frequently used concerns tests of hypotheses.

### 6.2 Normal distribution. Standard Normal Distribution:

6.2 Normal distribution Heights of Adult Men and Women Mean = µ Standard Deviation = σ Notation: $X \sim N(\mu, \sigma^2)$ Standard Normal Distribution: a normal probability distribution

### Grade Level Year Total Points Core Points % At Standard 9 2003 10 5 7 %

Performance Assessment Task Number Towers Grade 9 The task challenges a student to demonstrate understanding of the concepts of algebraic properties and representations. A student must make sense of the

### MATH 140 HYBRID INTRODUCTORY STATISTICS COURSE SYLLABUS

MATH 140 HYBRID INTRODUCTORY STATISTICS COURSE SYLLABUS Instructor: Mark Schilling Email: mark.schilling@csun.edu (Note: If your CSUN email address is not one you use regularly, be sure to set up automatic

### EVT 2 Publication Summary Form

EVT 2 Publication Summary Form PRODUCT DESCRIPTION Product name Expressive Vocabulary Test, Second Edition Product acronym EVT 2 Author Kathleen T. Williams, PhD Copyright date 1997, 2007 Brief description

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

### Hypothesis testing - Steps

Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that $\beta_1 \neq 0$: 1. Set up the hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$. 2. Compute the test statistic: $t = \dfrac{b_1 - 0}{\text{Std. error of } b_1} =$

### Estimation of $\sigma^2$, the variance of $\varepsilon$

Estimation of $\sigma^2$, the variance of $\varepsilon$ The variance of the errors $\sigma^2$ indicates how much observations deviate from the fitted surface. If $\sigma^2$ is small, parameters $\beta_0, \beta_1, \ldots, \beta_k$ will be reliably estimated

### Reliability and validity, the topics of this and the next chapter, are twins and

Research Skills for Psychology Majors: Everything You Need to Know to Get Started Reliability Reliability and validity, the topics of this and the next chapter, are twins and cannot be completely separated.

### Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Version 1.0 May 2011 1. Power of a Test 2. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size 2. Test Statistics 3. Variation 4. Significance