# Standardized Assessment for Clinical Practitioners: A Primer


Figure 1. Normal Curve

Standard scores are assigned to raw scores based on the standard deviation, but there are many different types of standard scores. Perhaps the most popular is a metric in which the mean is 100 and the standard deviation is 15. In the example of Jamal, the raw score mean was 40 points and the raw score standard deviation was 4 points. Jamal's obtained raw score of 32 is two standard deviations below the mean, at the 2nd percentile, so it would be assigned a standard score of 70. A standard score of 70 means the same thing for all tests normed using this 100/15 metric. In this way, standardized tests make it easier for you to interpret scores on different tests and to compare scores across tests. Another popular standard score metric is the T score. In this system the mean is always set to T = 50, and the standard deviation is always 10 T-score points. So, T = 40 and T = 60 are one standard deviation below and above the mean, respectively. If Jamal's raw score had been transformed to the T-score metric, it would be T = 30, which has the same meaning as a standard score of 70 (i.e., two standard deviations below the mean, at the 2nd percentile). For normally distributed constructs, percentiles and standard deviations line up as shown in Figure 1 (e.g., one standard deviation below the mean is the 16th percentile). However, keep in mind that not all constructs of clinical interest are normally distributed in the population. When a construct is distributed in a way that the scores pile up at one end of the scale and taper off gradually at the other, the distribution is called skewed. Distributions can be either positively or negatively skewed. A negatively skewed distribution might be obtained when measuring a construct that most subjects of a given age can easily perform and only very few cannot.
For example, a test of phonological awareness for 8-year-olds might be negatively skewed because most 8-year-olds can easily perform these tasks and only very few cannot. A positively skewed distribution may be obtained when measuring a construct that most individuals cannot perform and only a few can; for example, a test of phonological awareness for 3-year-olds may be positively skewed because most cannot perform these tasks, but a few can. In skewed distributions, the percentiles and standard deviation units do not line up the same way as they do on the normal curve; they diverge to the extent that the distribution is skewed.
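The raw-to-standard-score conversion described above is easy to sketch in code. The figures below are Jamal's from the text (raw mean 40, SD 4, obtained score 32); the percentile comes from the normal curve, so it holds only for normally distributed constructs:

```python
from statistics import NormalDist

def standard_score(raw, mean, sd, new_mean=100, new_sd=15):
    """Convert a raw score to a standard score on a chosen metric."""
    z = (raw - mean) / sd          # distance from the mean in SD units
    return new_mean + z * new_sd

raw_mean, raw_sd = 40, 4           # normative raw-score mean and SD
print(standard_score(32, raw_mean, raw_sd))            # 100/15 metric -> 70.0
print(standard_score(32, raw_mean, raw_sd, 50, 10))    # T-score metric -> 30.0

# Percentile rank implied by the normal curve at z = -2:
print(round(NormalDist().cdf(-2.0) * 100, 1))          # ~2.3rd percentile
```

Note that the exact normal-curve value at two standard deviations below the mean is about the 2.3rd percentile; test manuals round this to the 2nd.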

Figure 2. Positively and Negatively Skewed Distributions

Why do many tests have basal and ceiling rules?

Most tests that SLPs use are designed for assessing clients across a wide range of ages and abilities; therefore, not all test items are necessary or appropriate for every client. In most cases, the test items are ordered from easiest to hardest. Basal rules establish where to start the test so that you do not need to administer every item. For example, if you are testing a 6-year-old child's language development, the test developers might have you start with items appropriate for 5½-year-olds, just to be sure the child understands the task and to give him or her some practice with the items. You would not need to administer items intended for 3- or 4-year-olds unless the child had trouble responding to items designed for 5-year-olds. Typically, the start point in any test, subtest, or series of items is set at a level where 90% or more of children that age respond to the earlier items correctly. This reduces testing time and ensures that you administer only the items appropriate for each client. Ceiling rules tell you when to stop testing because you have exceeded the child's ability to respond correctly. Psychometricians analyze the standardization data to determine the point at which, if you administer another item, the child will very likely get it wrong. Usually, the discontinue rule is set so that after a certain number of consecutive items are answered incorrectly there is less than a 10% chance that the examinee will respond to any of the remaining items correctly. This reduces testing time but, equally importantly, it prevents frustrating the examinee by administering many items that he or she cannot answer correctly.

What are confidence intervals, and why should I use them?
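The logic of a ceiling (discontinue) rule can be sketched in a few lines. The specific rule here (stop after four consecutive errors) and the response pattern are hypothetical, not taken from any particular test:

```python
def administer(responses, start, discontinue_after=4):
    """Walk item responses (True = correct) from a basal start point and
    stop once `discontinue_after` consecutive items are missed."""
    consecutive_wrong = 0
    administered = []
    for i in range(start, len(responses)):
        administered.append(i)
        consecutive_wrong = 0 if responses[i] else consecutive_wrong + 1
        if consecutive_wrong == discontinue_after:
            break
    return administered

# Items ordered easiest -> hardest; the basal rule lets us start at item 5.
responses = [True] * 5 + [True, True, False, True,
                          False, False, False, False, True, True]
print(administer(responses, start=5))   # stops after four consecutive misses
```

Items below the basal are simply credited as correct, which is why they never appear in the administered list.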
The high degree of precision and the standard procedures used in administering and scoring standardized, norm-referenced tests may make you think that you can be 100% confident in the exact score obtained every time you administer the test. Unfortunately, because we are measuring human behavior and not a physical characteristic (e.g., height or weight), there is always some measurement error inherent in all clinical tests.

Sources of measurement error include fluctuations in human performance over time related to health or fatigue, lack of internal consistency within a set of questions, or even differences in rapport between the examiner and examinee. For all these reasons, the client's true score may be slightly higher or lower than the specific score obtained. It is best to think of a range of scores that most likely describes your client's performance, rather than a single point score. A confidence interval is a range of scores around a client's obtained score that is likely to include the client's true score, typically with 90% or 95% confidence. Many tests report critical values that may be used to build a confidence interval around each client's standard score, such as plus or minus 5 points. Confidence intervals are derived from the standard error of measurement.

What is the standard error of measurement, and why should I be concerned about it?

The standard error of measurement (SEM) is an estimate of the amount of measurement error in a test, and it is different for every test. Conceptually, the SEM is the inverse of reliability: the greater the reliability of a test, the smaller the standard error of measurement. You should be concerned about the SEM because you can have more confidence in the accuracy of a test score when the reliability is high and the standard error of measurement is small. Psychometricians use the SEM to create the confidence interval. The higher the reliability, the smaller the SEM and the narrower the confidence interval; a narrower confidence interval means a more precise score. We recommend that practitioners take measurement error into account when interpreting test scores by using confidence intervals. Some tests have confidence intervals built into the norms tables.

How do I determine if a test has good norms?
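The SEM and confidence-interval arithmetic can be sketched as follows. The 100/15 metric, the reliability of .91, and the obtained score of 85 are assumed example values, and the interval shown is the common form centered on the obtained score (some manuals instead center it on an estimated true score):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: the score-metric SD scaled by the
    square root of the unreliable portion of score variance."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """Score band likely to contain the true score (z=1.96 -> ~95%)."""
    error = z * sem(sd, reliability)
    return (score - error, score + error)

print(round(sem(15, 0.91), 1))            # 4.5 points on the 100/15 metric
lo, hi = confidence_interval(85, 15, 0.91)
print(round(lo), round(hi))               # roughly 76 to 94
```

Notice how the reliability drives the result: at reliability .99 the same score would carry a band of only about plus or minus 3 points.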
The accuracy of any standard score depends on the accuracy of the raw score mean and standard deviation obtained from the normative sample used to create the transformations to standard scores. The normative sample must be large enough to provide stable estimates of the population mean and standard deviation. Very small normative samples may not yield accurate raw score means and standard deviations because too much depends on the performance of the few subjects tested. The larger the sample, the more confidence you can have that a few errant subjects (referred to by psychometricians as outliers) did not have undue influence on the raw score mean and standard deviation. We can then say that the raw score means and standard deviations obtained from the normative data are stable. There is more to quality norms than the size of the sample, however. The subjects in the sample must be representative of the types of clients with whom you use the test. Representation, though, is sometimes misunderstood. A test of early cognitive, motor, and language development for infants and toddlers, for instance, does not have to include developmentally delayed or at-risk subjects in the normative sample; the performance of delayed or at-risk children might pull the mean lower, and the sample would no longer represent normal cognitive, motor, or language development. Other factors known from previous research to affect performance on the task of interest should also be represented in the normative sample. For example, it is known that mothers with less education tend to provide less-than-average language stimulation to their developing children, and that a lack of early language stimulation has a substantial impact on a child's cognitive and language development.
When creating a test of early development, it would therefore be important to ensure that children from different parent-education backgrounds are represented in approximately the same proportions as they are found in the general population. It is incumbent upon the test developer to understand what factors influence scores on the construct being measured and to ensure proper representation of those factors in the normative sample.
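A representativeness check of the kind described above can be sketched as a simple comparison of sample makeup against population targets. The maternal-education categories, target proportions, and sample counts below are invented for illustration; a real manual would compare the normative sample against census figures:

```python
# Hypothetical census targets vs. a normative sample's actual makeup,
# by maternal-education level.
population = {"no diploma": 0.12, "hs diploma": 0.28,
              "some college": 0.31, "degree": 0.29}
sample_counts = {"no diploma": 90, "hs diploma": 210,
                 "some college": 240, "degree": 260}

n = sum(sample_counts.values())
for group, target in population.items():
    actual = sample_counts[group] / n
    flag = "" if abs(actual - target) < 0.02 else "  <-- check stratum"
    print(f"{group:13s} target {target:.0%}  actual {actual:.0%}{flag}")
```

Test manuals typically report such breakdowns (by region, sex, parent education, and race/ethnicity) precisely so you can run this comparison yourself.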

Psychometricians must also be concerned with how long ago the normative sample was collected. Norms collected many years ago may no longer fairly represent today's children or the current generation of adults. In state-mandated achievement testing, there is a requirement to update norms every 7 years. In the area of cognitive assessment, researchers have shown that norms tend to shift approximately 3 to 4 points every 10 years. Cognitive ability test scores typically improve across generations because of societal improvements in neonatal care, well-baby checks, nutrition, education, and so on, so norms from 10 years ago may no longer apply. Less is known about changes in language development or behavioral standards across generations, but the older the norms, the more concern psychometricians have about their validity today.

Do all assessments require norms?

Not all tests require norms. When tests are keyed to specific external standards or criteria, they are called criterion-referenced tests. This is common in educational settings where students must meet curriculum standards set by the state board of education. In clinical practice, some constructs may be better referenced to an external standard of expected performance than to a sample of typical subjects: for example, tests of basic concepts, phonological awareness, or academic achievement.

What is reliability?

In general, reliability refers to the dependability of a test over time. There are actually several different types of reliability, and each type estimates a different source of possible measurement error. Measures of reliability all range between 0 and .99. The most common type is internal consistency reliability, which measures the extent to which all the items in a test measure the same construct. To calculate internal consistency reliability, psychometricians use various formulas, such as split-half reliability or coefficient alpha (also called Cronbach's alpha).
All of these formulas are based on some way of calculating the extent to which the items in a test correlate with each other. The higher the correlation between items, the more we can assume that all the items measure the same thing. So, this type of reliability estimates measurement error arising from inconsistency within the item set. For test batteries that include multiple subtests, it should be calculated separately for each subtest. Another popular type of reliability is test-retest reliability. To estimate it, the same test is administered twice to the same examinee, with a specific interval between the two administrations. Scores from the two administrations are compared to see how highly they correlate and how much change there is between the two testing sessions. This type of reliability estimates measurement error from changes in human performance over time and is sometimes referred to as the stability coefficient.

What is validity?

Although internal consistency reliability is a way to determine whether all of the items in a test measure the same thing, other information is collected to provide evidence that the items measure the right thing. In other words, does a test of verbal intelligence actually measure verbal intelligence, or is it really measuring language proficiency? To answer this question, one might design a study to show that a new verbal intelligence test correlates highly with other established tests of verbal intelligence, but not as highly with tests of language development. This type of evidence is called concurrent validity because the different tests are given at the same time and the relationships among their scores are compared. If a new verbal test correlated highly with another test of verbal intelligence, this would be evidence of convergent validity, because the new test's scores converge with scores from a known test of the same construct.
If the new verbal test did not correlate as highly with a test of language proficiency, this would be evidence of divergent validity, because the new test's scores diverge from scores on a test to which it is not supposed to relate as highly. This shows that the two tests measure somewhat different constructs.
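Coefficient alpha, named above as a standard internal-consistency formula, can be sketched directly from its definition: the ratio of summed item variances to total-score variance, scaled by the number of items. The item scores below are invented for illustration; real calculations use the full standardization sample:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Coefficient alpha. `item_scores` is a list of per-item score lists,
    one inner list per item, aligned across the same examinees."""
    k = len(item_scores)
    item_var = sum(pvariance(item) for item in item_scores)
    totals = [sum(vals) for vals in zip(*item_scores)]     # each examinee's total
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

# Hypothetical 4-item test taken by 5 examinees (rows = items).
items = [
    [3, 4, 3, 5, 2],
    [2, 4, 4, 5, 1],
    [3, 5, 4, 4, 2],
    [3, 4, 3, 5, 3],
]
print(round(cronbach_alpha(items), 2))   # 0.92
```

When items move together across examinees, total-score variance swamps the summed item variances and alpha approaches 1; uncorrelated items drive it toward 0.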

Many professionals ask us, "What is the validity coefficient for this test?" This is the wrong question, because validity is not a single number; it is a collection of evidence supporting the hypothesis that the test measures what it is supposed to measure. Some professionals ask, "Is this test valid?" Tests are not valid in general; they are valid for specific purposes. A test of language development may be valid for assessing language development, for example, but not for assessing specific language disorders (e.g., pragmatic language disorder). So, the question should be, "Is this test valid for the purpose for which I intend to use it?" It is important to be clear about how you intend to use a test and then look for evidence of validity supporting that use. Clinical validity refers to how the test performs in specific clinical populations. A test of working memory, for example, might be expected to show much lower mean scores in a clinical sample of subjects known to have a working memory disorder than in a nonclinical sample. (Nonclinical simply means typical subjects.) In these studies, it is important that the clinical and nonclinical samples are matched on other characteristics that may influence scores on the test, such as maternal education and age. In this way, you can be more certain that any differences observed between the clinical and nonclinical groups are truly due to the clinical disorder and not to other, uncontrolled factors that differ between the two groups. Another concept related to clinical validity is statistical significance. If a finding is statistically significant, it means that you would probably obtain the finding again if you repeated the study. It is important for the score difference between the clinical and nonclinical groups to be statistically significant. It is even more important that the size of the difference is large enough to be clinically meaningful.
Sometimes a difference of only a couple of points can be statistically significant (i.e., repeatable) but not large enough to be clinically useful. To determine how meaningful a difference is, divide the difference by the standard deviation; this gives a rough estimate of the effect size. Effect sizes (also called standard differences) are often reported in a table in the Examiner's or Technical Manual, comparing a particular clinical group with a typically developing matched sample. Effect sizes of .20 are considered small, but perhaps still meaningful, depending on the purpose; effect sizes of .50 and .80 are considered medium and large, respectively. Sometimes a test has a cut score (or cut-off score) used to determine whether the client is at risk and should be referred for more in-depth testing, or has a particular disorder. In a test of depression, for example, one might say that any client with a score more than two standard deviations below the mean (i.e., below 70) is classified as having depression, and any subject who scores 70 or above is classified as nonclinical. We want to see how well this cut score differentiates between the clinical and nonclinical samples. As shown in Figure 3, subjects in the known clinical sample with scores below 70 are true positives because they are correctly classified as having the disorder. Subjects in the nonclinical sample with scores of 70 or higher are true negatives because they are correctly classified as not having the disorder.

|             | Score below 70 | Score of 70 or higher |
|-------------|----------------|-----------------------|
| Clinical    | True positive  | False negative        |
| Nonclinical | False positive | True negative         |

Figure 3. Sensitivity and Specificity

Subjects in the known clinical sample with scores of 70 or higher are false negatives because they have been classified as not having the disorder when they do have it. Those in the nonclinical sample with scores below 70 are false positives because they have been incorrectly classified as having the disorder. False positives and false negatives are always in a delicate balance that depends on where the cut score is set, and the correct cut score depends on the purpose of testing. If the cut score is lowered, the percentage of false negatives increases and the percentage of false positives decreases; this may be appropriate when you want to be sure you do not incorrectly label someone as having the disorder. If the cut score is raised, the percentage of false positives increases and the percentage of false negatives decreases; this may be appropriate when it is important to identify everyone who might have the disorder and incorrectly identifying a person does not have harmful consequences. Some tests with well-developed cut scores do not require norms. This may be the case when the purpose of the test is to classify subjects as belonging to one group or another, but not to rate the severity of a disorder. Test developers and researchers sometimes conduct studies with subjects already identified as having or not having a disorder; these studies are designed to evaluate the performance of the test. In real practice, you do not know ahead of time whether the person you are testing has the disorder; after all, that is why you are testing. To estimate how likely you are to correctly classify someone as having the disorder in real practice, divide the number of true positive cases in the sensitivity/specificity table by the sum of the true positive and false positive cases. This gives you an estimate of the positive predictive power of the test in applied situations.
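The Figure 3 arithmetic can be sketched directly from the four cells. The counts below are invented validation-study figures (100 clinical and 100 nonclinical subjects), and `effect_size` is the divide-the-difference-by-the-standard-deviation estimate described earlier:

```python
def classification_stats(tp, fn, fp, tn):
    """Rates from the four cells of Figure 3 (counts, not percentages)."""
    return {
        "sensitivity": tp / (tp + fn),   # clinical cases correctly flagged
        "specificity": tn / (tn + fp),   # nonclinical cases correctly cleared
        "ppv": tp / (tp + fp),           # chance a flagged case truly has the disorder
    }

def effect_size(clinical_mean, nonclinical_mean, sd):
    """Standard difference: the group gap expressed in SD units."""
    return abs(clinical_mean - nonclinical_mean) / sd

stats = classification_stats(tp=85, fn=15, fp=10, tn=90)
print({k: round(v, 2) for k, v in stats.items()})
# {'sensitivity': 0.85, 'specificity': 0.9, 'ppv': 0.89}

print(effect_size(85, 100, 15))   # 1.0 -> a large effect by the .20/.50/.80 benchmarks
```

Note that the PPV computed here inherits the study's 50/50 base rate; as the text cautions, it will shift if the prevalence of the disorder in your practice differs from that of the study sample.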
Even this method may have problems, however, if the percentage of clients in your practice with the disorder is much higher than in the study.

Conclusion

Standardized clinical assessments are extremely useful scientific instruments that inform, but do not replace, professional judgment. They are created for use and interpretation by highly trained professionals who also take into account the client's history and other test scores. We hope that this brief paper gives you an appreciation for the science behind the tests you use in practice and, more importantly, the basic knowledge to evaluate the quality of the testing instruments you choose to use.

Copyright 2016 NCS Pearson, Inc. All rights reserved. CLINA /16


### DATA COLLECTION AND ANALYSIS

DATA COLLECTION AND ANALYSIS Quality Education for Minorities (QEM) Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. August 23, 2013 Objectives of the Discussion 2 Discuss

### EDUCATIONAL ASSESSMENT

EDUCATIONAL ASSESSMENT Aptitude tests - predict ability to profit from specialized training Achievement tests - to assess mastery of material presented in a course of instruction I. APTITUDE TESTS Aptitude

### Standardized Assessment

CHAPTER 3 Standardized Assessment Standardized and Norm-Referenced Tests Test Construction Reliability Validity Questionnaires and Developmental Inventories Strengths of Standardized Tests Limitations

### CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

### Preschool Language Scales-5: Assessing Language from 0-7

Preschool Language Scales-5: Assessing Language from 0-7 Nancy Castilleja, MA CCC-SLP Pearson Assessment Course Objectives identify three key differences between PLS-4 and PLS-5 describe two research studies

### Testing Research and Statistical Hypotheses

Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you

### UNDERSTANDING TERRA NOVA FOR TEACHERS AND PARENTS

TERRANOVA TESTING.. UNDERSTANDING TERRA NOVA FOR TEACHERS AND PARENTS WHAT IS TERRA NOVA? Terra Nova is a norm-reference nationally standardized Achievement test. Nationally standardized means that the

### How to Read a Research Article

RACHEL DUNIFON How to Read a Research Article The goal of this Research Brief is to provide information that will make reading a research article more illuminating. For example, you might want to learn

### Comprehensive Assessment of Spoken Language (CASL) The Comprehensive Assessment of Spoken Language (Carrow-Woolfolk, 1999a) is a

Comprehensive Assessment of Spoken Language (CASL) The Comprehensive Assessment of Spoken Language (Carrow-Woolfolk, 1999a) is a norm-referenced, individually administered test designed to measure expressive,

### ALBUQUERQUE PUBLIC SCHOOLS

ALBUQUERQUE PUBLIC SCHOOLS Speech and Language Initial Evaluation Name: Larry Language School: ABC Elementary Date of Birth: 8-15-1999 Student #: 123456 Age: 8-8 Grade:6 Gender: male Referral Date: 4-18-2008

### Early Childhood Measurement and Evaluation Tool Review

Early Childhood Measurement and Evaluation Tool Review Early Childhood Measurement and Evaluation (ECME), a portfolio within CUP, produces Early Childhood Measurement Tool Reviews as a resource for those

### INCREASE YOUR PRODUCTIVITY WITH CELF 4 SOFTWARE! SAMPLE REPORTS. To order, call 1-800-211-8378, or visit our Web site at www.pearsonassess.

INCREASE YOUR PRODUCTIVITY WITH CELF 4 SOFTWARE! Report Assistant SAMPLE REPORTS To order, call 1-800-211-8378, or visit our Web site at www.pearsonassess.com In Canada, call 1-800-387-7278 In United Kingdom,

### Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Readings: Ha and Ha Textbook - Chapters 1 8 Appendix D & E (online) Plous - Chapters 10, 11, 12 and 14 Chapter 10: The Representativeness Heuristic Chapter 11: The Availability Heuristic Chapter 12: Probability

### Summary of IVA-2 Research Studies

Summary of IVA-2 Research Studies The IVA+Plus and IVA-2 test procedures, structure and quotient scales are the same. Since the test itself, including the norms, have not changed, all of the original reliability

### Interpretive Report of WMS IV Testing

Interpretive Report of WMS IV Testing Examinee and Testing Information Examinee Name Date of Report 7/1/2009 Examinee ID 12345 Years of Education 11 Date of Birth 3/24/1988 Home Language English Gender

### Vineland Adaptive Behaviour Scales (VABS) II

Outcome Measure Sensitivity to Change Population Domain Type of Measure ICF-Code/s Description Vineland Adaptive Behaviour Scales (VABS) II Yes Paediatrics Behavioural Function Neuropsychological Impairment

### Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

### Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the

### Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

### Introduction to the Practice of Statistics Fifth Edition Moore, McCabe

Introduction to the Practice of Statistics Fifth Edition Moore, McCabe Section 1.3 Homework Answers 1.80 If you ask a computer to generate "random numbers between 0 and 1, you uniform will get observations

### Benchmark Assessment in Standards-Based Education:

Research Paper Benchmark Assessment in : The Galileo K-12 Online Educational Management System by John Richard Bergan, Ph.D. John Robert Bergan, Ph.D. and Christine Guerrera Burnham, Ph.D. Submitted by:

About Hypothesis Testing TABLE OF CONTENTS About Hypothesis Testing... 1 What is a HYPOTHESIS TEST?... 1 Hypothesis Testing... 1 Hypothesis Testing... 1 Steps in Hypothesis Testing... 2 Steps in Hypothesis

### Harrison, P.L., & Oakland, T. (2003), Adaptive Behavior Assessment System Second Edition, San Antonio, TX: The Psychological Corporation.

Journal of Psychoeducational Assessment 2004, 22, 367-373 TEST REVIEW Harrison, P.L., & Oakland, T. (2003), Adaptive Behavior Assessment System Second Edition, San Antonio, TX: The Psychological Corporation.

### Data Analysis: Describing Data - Descriptive Statistics

WHAT IT IS Return to Table of ontents Descriptive statistics include the numbers, tables, charts, and graphs used to describe, organize, summarize, and present raw data. Descriptive statistics are most

### How to Verify Performance Specifications

How to Verify Performance Specifications VERIFICATION OF PERFORMANCE SPECIFICATIONS In 2003, the Centers for Medicare and Medicaid Services (CMS) updated the CLIA 88 regulations. As a result of the updated

### standardized tests used to assess mental ability & development, in an educational setting.

Psychological Testing & Intelligence the most important aspect of knowledge about genetic variability is that it gives us respect for people s individual differences. We are not all balls of clay that

### Measuring reliability and agreement

Measuring reliability and agreement Madhukar Pai, MD, PhD Assistant Professor of Epidemiology, McGill University Montreal, Canada Professor Extraordinary, Stellenbosch University, S Africa Email: madhukar.pai@mcgill.ca

### CRITICAL THINKING ASSESSMENT

CRITICAL THINKING ASSESSMENT REPORT Prepared by Byron Javier Assistant Dean of Research and Planning 1 P a g e Critical Thinking Assessment at MXC As part of its assessment plan, the Assessment Committee

### Content Sheet 7-1: Overview of Quality Control for Quantitative Tests

Content Sheet 7-1: Overview of Quality Control for Quantitative Tests Role in quality management system Quality Control (QC) is a component of process control, and is a major element of the quality management

### Introduction to Reliability

Introduction to Reliability What is reliability? Reliability is an index that estimates dependability (consistency) of scores Why is it important? Prerequisite to validity because if you are not measuring

### Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

### 4. Describing Bivariate Data

4. Describing Bivariate Data A. Introduction to Bivariate Data B. Values of the Pearson Correlation C. Properties of Pearson's r D. Computing Pearson's r E. Variance Sum Law II F. Exercises A dataset with

### Validity of the Measure of Academic Proficiency and Progress (MAPP TM ) Test

TM Validity of the Measure of Academic Proficiency and Progress (MAPP TM ) Test Measurable Results. Immeasurable Value. A Summary from ETS Introduction Validity of the Measure of Academic Proficiency and

### Guidelines for Medical Documentation of ADHD

National Conference of Bar Examiners 302 South Bedford Street, Madison, WI 53703 Phone: (608) 316-3070; Fax: (608) 316-3119 Email: mpre.ada@ncbex.org Request for MPRE Test Accommodations Guidelines for

### Normal distribution. ) 2 /2σ. 2π σ

Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

### Lecture #2 Overview. Basic IRT Concepts, Models, and Assumptions. Lecture #2 ICPSR Item Response Theory Workshop

Basic IRT Concepts, Models, and Assumptions Lecture #2 ICPSR Item Response Theory Workshop Lecture #2: 1of 64 Lecture #2 Overview Background of IRT and how it differs from CFA Creating a scale An introduction

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### Session 1.6 Measures of Central Tendency

Session 1.6 Measures of Central Tendency Measures of location (Indices of central tendency) These indices locate the center of the frequency distribution curve. The mode, median, and mean are three indices

### Accountability Brief

Accountability Brief Public Schools of North Carolina State Board of Education North Carolina Department of Public Instruction Michael E. Ward, Superintendent March, 2003 Setting Annual Growth Standards:

### Chapter 8. Hypothesis Testing

Chapter 8 Hypothesis Testing Hypothesis In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing

### Measurement Types of Instruments

Measurement Types of Instruments Y520 Strategies for Educational Inquiry Robert S Michael Measurement - Types of Instruments - 1 First Questions What is the stated purpose of the instrument? Is it to:

### Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

### The California Critical Thinking Skills Test. David K. Knox Director of Institutional Assessment

The California Critical Thinking Skills Test David K. Knox Director of Institutional Assessment What is the California Critical Thinking Skills Test? The California Critical Thinking Skills Test (CCTST)

### Report of for Chapter 2 pretest

Report of for Chapter 2 pretest Exam: Chapter 2 pretest Category: Organizing and Graphing Data 1. "For our study of driving habits, we recorded the speed of every fifth vehicle on Drury Lane. Nearly every

### Behavior Rating Inventory of Executive Function BRIEF. Interpretive Report. Developed by. Peter K. Isquith, PhD, Gerard A. Gioia, PhD, and PAR Staff

Behavior Rating Inventory of Executive Function BRIEF Interpretive Report Developed by Peter K. Isquith, PhD, Gerard A. Gioia, PhD, and PAR Staff Client Information Client Name : Sample Client Client ID