Research Psychometric Properties

There are two important psychometric properties: reliability and validity.

Reliability deals with the extent to which a measure is repeatable or stable; essentially, reliability refers to the consistency of a measure. A reliable measure would return the same results from time to time, assuming the underlying phenomenon being measured has not changed. Reliability is of particular importance because it establishes a ceiling for validity: a measure can be no more valid than it is reliable. Methods for establishing reliability include:

Test-retest. Test-retest reliability consists of administering the measure at Time 1 and then, after some period of time, administering it again at Time 2. A reliable measure would produce similar results at both points in time. On a job satisfaction measure, those who scored high at Time 1 would also score high at Time 2, while those who scored low at Time 1 would also score low at Time 2. The problem with this method of examining reliability is that if insufficient time has elapsed between Time 1 and Time 2, individuals may simply remember their Time 1 responses, so that it is individual memory rather than reliability that is being examined. On the other hand, should too much time elapse between the two administrations, the measure may appear unreliable when in fact the phenomenon being measured has changed between Time 1 and Time 2.

Parallel forms. Similar to test-retest reliability, parallel forms reliability uses two versions of the measure: one version is administered at Time 1 and the second version at Time 2. The memory problem associated with the test-retest method is corrected, because individuals have not previously seen the items on the second version. One difficulty of parallel forms reliability is that it requires developing a considerable number of items that measure the same thing.
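In classical test theory, the ceiling described above is often expressed by noting that a validity coefficient cannot exceed the square root of the reliability coefficient. As a concrete illustration of the test-retest approach, the following Python sketch (using numpy, with entirely hypothetical job-satisfaction scores) estimates test-retest reliability as the Pearson correlation between the Time 1 and Time 2 administrations.

```python
import numpy as np

# Hypothetical job-satisfaction scores for the same 8 employees,
# measured at Time 1 and again at Time 2 (illustrative numbers only).
time1 = np.array([4.2, 3.1, 4.8, 2.5, 3.9, 4.5, 2.8, 3.3])
time2 = np.array([4.0, 3.3, 4.7, 2.7, 3.8, 4.6, 3.0, 3.1])

# Test-retest reliability is commonly estimated as the Pearson
# correlation between the two administrations: values near 1.0 mean
# people who scored high (or low) at Time 1 also did so at Time 2.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability estimate: {r_test_retest:.2f}")
```

The same correlation could be computed for parallel forms, with the scores from version 1 and version 2 taking the place of the Time 1 and Time 2 scores.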

Internal consistency. This method of establishing reliability relies on how well the several items within the measure hang together. Suppose that a measure of customer satisfaction consists of 10 statements regarding how satisfied the customer is with service, prices, and so on. The reliability of the measure is examined by estimating how well the items produce similar results; to establish internal consistency reliability we would hope to see similar relationships not just between Item 1 and Item 2 but between all possible pairs of items.

Inter-rater. This form of reliability is examined when there are multiple judges or raters evaluating the same phenomenon. Inter-rater reliability is essentially the extent of consensus among the raters: a high degree of consensus is evidence of inter-rater reliability.

Validity refers to the extent to which a study actually captures or measures what it purports to examine. Two broad categories of validity are external and internal validity. External validity refers to the generalizability of the research findings; for example, would our findings from a particular study also apply to other groups, other geographical areas, other types of jobs, and so on?
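One common way to quantify internal consistency is Cronbach's alpha, which compares the variance of the individual items with the variance of the total score. The sketch below is a minimal Python illustration with numpy; the function name and the matrix of customer-satisfaction responses are hypothetical.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 6 customers x 4 satisfaction items (1-5 scale).
responses = np.array([
    [5, 4, 5, 4],
    [2, 2, 1, 2],
    [4, 4, 4, 5],
    [3, 2, 3, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
])
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```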

Internal validity essentially refers to how confident the researcher is in the study results. This type of validity encompasses a range of issues, from how rigorous the study was in terms of research design, sample, measures, and so on, to whether the measures are measuring what they are supposed to measure. In this description of validity we will focus on the second area: the extent to which measures capture what they purport to capture. Although the following is focused on examining internal validity in an employment testing setting, the discussion is appropriate in a variety of settings. Techniques for examining internal validity include:

Face. The concern of face validity is to not alienate the test taker. Basically, face validity is the extent to which the test appears, to the naïve, casual test taker, to measure what it purports to measure. If an employee is completing a measure of job satisfaction but notices items that appear to be geared toward honesty or integrity, the validity of the job satisfaction measure may be diminished because of the lack of face validity.

Content. Content validity focuses on the extent to which an employment test samples the work domain. For example, a typing test would be content valid when testing applicants for the position of typist.

Construct. A construct is a mental definition or abstraction, such as intelligence, job satisfaction, customer satisfaction, or organizational commitment. Construct validity refers to how well a measure of a construct actually measures the underlying construct, for example, how well an IQ test actually measures intelligence.

Criterion-related. There are two types of criterion-related validity in employment testing: concurrent and predictive. In concurrent criterion-related validity, the test or measure that is thought to be predictive of performance is administered to current employees, and the results on the test are compared with some criterion of performance such as quantity produced, sales, or supervisory evaluations. Ideally we would like to see a strong positive relationship, so that higher performers score higher on the test while lower performers score lower. If that were the case, we could conclude that our measure predicts performance, that it is a valid test, and that we could use it for employee selection.
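In a concurrent criterion-related study, the validity coefficient is simply the correlation between employees' test scores and the performance criterion. The Python sketch below, again with made-up numbers for a hypothetical selection test and supervisor ratings, shows the calculation.

```python
import numpy as np

# Hypothetical concurrent validation data for 10 current employees:
# selection-test score and a performance criterion (supervisor rating),
# both collected at roughly the same time.
test_scores = np.array([72, 85, 60, 90, 55, 78, 82, 65, 70, 88])
performance = np.array([3.4, 4.1, 2.9, 4.5, 2.7, 3.8, 4.0, 3.1, 3.3, 4.3])

# A strong positive correlation suggests higher scorers tend to be
# the better performers.
validity_coefficient = np.corrcoef(test_scores, performance)[0, 1]
print(f"Concurrent validity coefficient: {validity_coefficient:.2f}")
```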

In practice, however, a concurrent study frequently shows very little relationship between how employees scored on the test and their job performance. A problem that sometimes occurs with concurrent criterion-related validity studies is known as restriction of range: the current employee group contains no poor or even below-average employees, because they have all been terminated or quit when they could not meet performance requirements, or they were never hired in the first place. Had those individuals been included in the assessment, there would likely have been a clear relationship between the test and performance, and we would have concluded that the test was valid had we had the full range of employees in our study.
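The attenuating effect of restriction of range can be shown directly: if a test genuinely predicts performance across the full applicant pool but we only observe the higher-scoring survivors, the observed correlation shrinks. The simulation below is a sketch of this effect using numpy with made-up parameters.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Simulate a full applicant pool in which test scores genuinely
# predict performance (the relationship is built in by construction).
n = 500
test = rng.normal(50, 10, n)
performance = 0.6 * test + rng.normal(0, 8, n)

full_range_r = np.corrcoef(test, performance)[0, 1]

# Restriction of range: in a concurrent study we only observe the
# surviving employees, e.g. those who scored in the top half.
kept = test > np.median(test)
restricted_r = np.corrcoef(test[kept], performance[kept])[0, 1]

print(f"Full applicant pool:   r = {full_range_r:.2f}")
print(f"Restricted (top half): r = {restricted_r:.2f}  (attenuated)")
```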

A second type of criterion-related validity assessment is known as predictive. In a predictive study, applicants, rather than current employees, are administered the measure that we think will be predictive of performance. The test results are not used as a basis on which to hire; the actual selection is based on some other factors, or all applicants are hired. After a suitable time (e.g., six months or a year), the performance of those individuals is determined and compared with their test scores on the measure we are seeking to validate. If there is a strong relationship between the two, we would conclude that the test is valid.
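A predictive study follows the same correlational logic, but the scores come from applicants at hiring time and the criterion only becomes available months later. The sketch below, with hypothetical applicant IDs and ratings, illustrates this two-stage workflow and the resulting predictive validity coefficient.

```python
import numpy as np

# Hypothetical predictive validation: test scores are recorded for all
# applicants at hiring time, but are NOT used to make hiring decisions.
applicant_scores = {"A01": 74, "A02": 58, "A03": 91, "A04": 66, "A05": 83}

# Six to twelve months later, performance data become available for
# the same individuals (e.g., supervisor ratings).
later_performance = {"A01": 3.6, "A02": 2.8, "A03": 4.4, "A04": 3.1, "A05": 4.0}

# Align the two data sets by applicant ID, then correlate.
ids = sorted(applicant_scores)
scores = np.array([applicant_scores[i] for i in ids])
perf = np.array([later_performance[i] for i in ids])

predictive_r = np.corrcoef(scores, perf)[0, 1]
print(f"Predictive validity coefficient: {predictive_r:.2f}")
```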