# What are plausible values and why are they useful?

Save this PDF as:

Size: px
Start display at page:

Download "What are plausible values and why are they useful?"

## Transcription

1 What are plausible values and why are they useful? Matthias von Davier Educational Testing Service, Princeton, New Jersey, United States Eugenio Gonzalez Educational Testing Service, Princeton, New Jersey, United States Robert J. Mislevy University of Maryland, College Park, Maryland, United States This paper shows some of the adverse effects that can occur when plausible values are not used or are used incorrectly during analysis of large-scale survey data. To ensure that the uncertainty associated with measures of skills in largescale surveys is properly taken into account, researchers need to follow the procedures outlined in the article and in more detail in, for example, Little and Rubin (1987) and Schafer (1997). There is no need to rely on computational shortcuts by averaging plausible values before calculations: analytical shortcuts such as averaging plausible values produce biased estimates and should be discouraged. All analyses using survey assessment results need not only use the data as provided by the different assessment organizations but also adhere to the appropriate procedures described in this article and the included references, as well as in the guides provided for users of international survey assessment databases. 9

4 IERI MONOGRAPH SERIES: ISSUES AND METHODOLOGIES IN LARGE-SCALE ASSESSMENTS A Simple Example Assume we want to predict the score of a basketball free-throw contest, 1 and assume we have students from two different schools. Students from School A tend to succeed, on average, on 50% of the free-throw trials, and the success rates across students is normally distributed. Students from School B tend to succeed, on average, on 70% of the free-throw trials, and the success rate across students is also normally distributed. Let us now assume that within each school the standard deviation of a student s success rate in free throws is 10%. Thus, only a few students in School A are likely to have a true free-throw success rate as high as 70%, and only a few of the students from School B are likely to have a free-throw success rate as low as 50%. We want to come up with a good estimate of a student s true free-throw success rate based on a limited number of observations and our knowledge of the school she attends. Note that we can only observe the trials of a student selected for the tryouts, and we can observe the school the student belongs to (we can ask her). The true success rate of 50% cannot be seen, and a true rate of 50% means that the student does not have to succeed at a rate of 50% in all cases; it simply means that the student with this success rate should succeed in 50% of throws over the long run, and that with each single shot, the student stands a chance of succeeding or failing. Let us now assume that a student who wants to participate in the tryout learns that each applicant gets only three shots, so he tries to find out what results he can expect. Table 2 presents the results of this student s practice with three shots, which he repeats 10 times. Table 2: Results for 10 repetitions of three-throw tryouts of a student from School A with an average long-term success rate of 50% Trial Rate The first thing we notice is that, for three free-throws, we can observe only 0%, 33%, 67%, and 100% levels of success, and no other percentages in between. The average percentage across the 10 repetitions, though, lies between these numbers, and is 53% for the data given in Table 2. The standard deviation of our 10 averages is 36 when basing the try-out sample on three free-throws only, again using the data in Table 2 for calculations. From 1 Note that the estimates in this example are based on very small sample sizes. Statistics based on such small samples would not be reported in large-scale survey assessments because the errors associated with such estimates would be too large. The example presented in this section is for demonstration purposes only and intended to explain concepts rather than actual procedures. 12

5 What are plausible values and why are they useful? inspection of this table, we also see that this student would be quite successful in 6 out of 10 cases and produce a success rate of either 100% or 67%. However, there are also two cases with 0% success. Obviously, three throws is a small number from which to accurately estimate the success rate of a player. What happens when the number of throws is increased by a factor of four? Table 3 contains the data for a student with the same actual long-term success rate (50% success rate), who throws 10 repeats of 12 trials each. Table 3: Results for 10 repetitions of 12-throw tryouts of a student from School A with a long-term success rate of 50% Trial Rate The 10 averages of 12 trials appear in the last row of Table 3. Notice that there are no cases where all trials were successful (100%) or unsuccessful (0%). The largest success rate in this sample is 83% (repeated trial number 4); the smallest success rate is 42% in the repeated trials 2, 6, 7, 9, and 10. The average success rate of these 12 trials is 52%, and the standard deviation of these averages over 12 trials is Note that this is still not exactly 50%, even though this player threw 120 times. This outcome is not an error, but is due to the fact that the actual results of a limited number of trials generally vary somewhat from the long-term expected success rate. Some trials will be slightly below the expected value; some will be slightly above. We can determine the theoretical standard deviation of the averages mathematically in this simple experiment. The standard deviation of a single trial is 50 in this case, and this number needs to be divided by the square root of the number of trials, which gives us a theoretical standard deviation of the average percentage, that is, the standard error (s. e.) of the estimate of average percentage of success. For a threethrow tryout, the standard deviation of the percentage is 28.87; for the 12-throw trials, it is Intuitively, these differences in standard deviations of averages make 13

6 IERI MONOGRAPH SERIES: ISSUES AND METHODOLOGIES IN LARGE-SCALE ASSESSMENTS considerable sense in that more trials per tryout seem to lead to more consistent results. Unfortunately, this approach to obtaining higher accuracy and reducing uncertainty about the estimate seems to be a rather inefficient one. For a target of a standard error of 5%, we would need 100 free-throws per tryout; a standard error of 1% would require 2,500 throws. Using What We Know about the Student While it would be good to use other students tryout throws from the same school as evidence of what we could expect if a student from this school comes to a tryout, how much can we gain from this type of information? Given our knowledge about student averages, and their variability, relative to the long-term success rates from the different schools, we might be able to stabilize or improve what we know if we could see a very small number of trial free-throws. What, then, would be our best guess of a student s success rates on the free throws if we had not seen this student perform any free throws but knew which school he was attending? Our guess would be that this student would have a success rate of 70% if he came from School B. And if he were from School A, his success rate would be 50%, right? So what would we guess if this student threw once and succeeded, and is from School A? Still 50%, or would we guess 100% based on a single trial, or somewhere in between? When we have only a very limited number of observations, it is hard to judge how a player will perform in the long run. In such instances, including information on additional variables, for example, the school the student comes from, can help. Note that we may not do the right thing for an individual student because we can get lucky in terms of finding a student from School A who has an actual long-term success performance of 70%. We can also find students from School B who have an actual long-term success rate of only 50%. However, this situation changes drastically if we compare groups of students or selections involving multiple students (as in teams, for example). Group-level Considerations The person putting together a team and the individual player chosen (or not) as part of that team operate from different perspectives. The person who selects multiple players for a team is interested mostly in how that set of players will perform on average, and how much variability this team will show in terms of performance. The individual player, however, wants his or her actual performance reflected in the most accurate terms. As a continuation of the tryout example, let us assume that we are in the process of selecting a bunch of new players from the two Schools A and B. How should we proceed in that case? Let us say we have observed the actual performance on 10 trials for 20 players from each of the two schools. Figure 1 shows a distribution of observed scores generated based on the information we know about the schools and 14

7 What are plausible values and why are they useful? the number of successful free-throws for the two groups. Also shown in the graph in parentheses are the numbers we are interested in but cannot directly see: the actual long-term success rates of each of the students next to their actual score based on the 10 throws. As an example of how to read the graph, Score 4 is observed in two cases in School A and two cases in School B. The students from A who scored 4 have an actual long-run success rate of 39% and 53%, whereas the students from School B with Score 4 have success rates in the long-run of 71% and 43%. Also note that no student from School A scored higher than 8, and no student from School B scored lower than 4. Figure 1: Distribution of 20 students from each school scoring on 10 free-throws in a tryout for a new team School A: E=50%, S=10% Score School B: E=70%, S=10% -/- 0 -/- -/- 1 -/- (43%) (33%) 2 -/- (39%) (58%) 3 -/- (39%) (53%) 4 (71%) (43%) (56%) (47%) (43%) (54%) 5 (58%) (55%) (62%) (67%) (39%) (64%) (54%) (56%) (49%) 6 (70%) (64%) (62%) (70%) (51%) (49%) (67%) 7 (66%) (67%) (65%) (85%) (53%) (69%) 8 (65%) (68%) (71%) -/- 9 (72%) (84%) -/- 10 (73%) What is particularly interesting about Figure 1 is the fact that students with rather different true success rates get the same observed score on the 10 trials, and that the average true success rate by school is also different for the same score level. For example, the group of students with Score 6 has an average true success rate of 52.4% ( 1 /5*( )) for School A, and an average of 66.5% (¼*( )) for students from School B. While readers might think these results are fabricated to make a point, 2 these numbers are indeed based on data generated using random draws calculated with a spreadsheet fed with only the basic long-term success rates determined for Schools A and B. We later show how the same phenomenon is observed when we use simulated student response data from an assessment. Let us now look at the same data from a different perspective. Table 4 gives the means of the true success rates of the (admittedly small) groups of students in the percentage metric. It also gives the observed score expressed as a percentage, as well 2 We did not fabricate the values in Figure 1 but obtained them by using a spreadsheet and available statistical functions that allowed us to conduct random draws from a uniform distribution, a function that allows calculation of inverse normal function values, and a Boolean logic function. 15

8 IERI MONOGRAPH SERIES: ISSUES AND METHODOLOGIES IN LARGE-SCALE ASSESSMENTS as the difference between the two. In Table 4, we observe, as we move away from the school average true success rate, larger differences between the observed score and the average true score of those who obtained it. Table 4: Observed scores expressed as percentages on 10 trials, average true success rates in the score groups by school, and differences between observed scores and averages Score Mean A Difference A Mean B Difference B /- -/ /- -/ /- -/ /- -/ These differences seem to be centered on the average skill we identified for the two schools. The smallest difference between the observed score and the average true skill of students from School A is found for score 50 (which is also the average percentage of success for School A). The smallest difference between the observed scores and the average true skill of students from School B is found at score level 70 (which is the average success rate overall for students in School B). Given these observations, we would substantially misjudge the situation if we were to state that students from School A with a score of 60 (6/10 successful throws x 100) have the same average success rate compared to students from School B with a score of 60. In our example, the average success rate based on the long-term rates for students with this score is 52.4% for students from School A and 66.5% for students from School B. (We will show this effect again later with our simulated assessment data from a much larger sample.) For the person forming a new team, the best decision would be to select students from School B at a higher rate than students from School A, even if those students have the same observed score, because the long-term success rate of students seems to be higher for students in School B than for those in School A. There is a way of taking into account the fact that we know, from previous trials, how students from these different school teams perform, on average, and how their performance varies. Table 5 provides an example of how this might look. Note that, for only one throw, we would remain very close to the school-based long-term averages as our guess for the student s expected long-term success rate. Because we 16

9 What are plausible values and why are they useful? do not know much after seeing just one successful throw, all we can do is estimate that a student who throws once and succeeds will have a expected long-term success rate of 51.92% if she is from School A and an estimated long-term success rate of 71.36% if she is from School B. Table 5: Average long-term success rate used to derive expected success rates given the successes on a number of throws, Schools A and B A B A B A B A B Throws /- -/ /- -/ /- -/ /- -/ /- -/ /- -/ /- -/ /- -/ /- -/ Note: Values given are for one throw, 10 throws, 100 throws, 500 throws. This situation changes markedly when we increase the number of throws for each of the candidates. After completing 100 throws, a student from School A who succeeded on 70% of the throws would have an estimated success rate of 66%, quite close to the 70% he has shown. This estimate is 16 percentage points away from what we would expect if we only knew which school the student came from. A student from School B who succeeded in 50 out of 100 cases would get an estimated success rate of 53.47%, much closer to the 50% he has shown than to the 70% we would expect if all we knew was which school he came from. The values for our expectations after 500 throws would be even closer to the observed percentages. In our comparison between the schools, we had each student throw 10 times, which obviously presents us with a better picture than if that student had thrown just once, but is still not as informative as seeing 100 throws. We have tabulated the result for 20 students each from School A and School B based on 10 throws. Table 6 shows the relationship between the expected values, the observed values, and the actual values obtained from our sample of students from each school. 17

10 IERI MONOGRAPH SERIES: ISSUES AND METHODOLOGIES IN LARGE-SCALE ASSESSMENTS Table 6: Observed scores based on 10 throws, expected values based on the observed score and prior information about school-players average long-term performance and variability of players within schools Observed Expected Mean A Diff A Observed Expected Mean B Diff B /- -/ /- -/ /- -/ /- -/ /- -/ /- -/ /- -/ /- -/ In order to derive these expected scores based on observed data (free throws) and prior knowledge (long-term school average and within-school variability), we used Bayesian methods as the tool of choice. Without going into the mathematical details and formal introductions of the concepts involved, we endeavor, through the text box below, to give a little more detail for interested readers. By using some simplifying assumptions that allow us to approximate the discrete observed success rate variable with a continuous normally distributed variable, we can derive the expected long-term proficiency for each observed score, given the known school-team membership of each student. Likelihoods, Prior Information, and Posteriors Figures 2 and 3 illustrate how we obtained the numbers in Table 6. The two figures show an abscissa (x-coordinate) axis that represents the long-term success rate expressed as a percentage. The ordinate (y-coordinate) axis represents the function value of the continuous probability densities in the figures. The figures include a dashed line that represents the probability density based on the assumption that the long-term success rate is normally distributed, with an expectation of 70% and a standard deviation of 10% in School B. The two dotted plots in the figures represent different amounts of information gathered (the likelihood) from observing 10 throws and four successes (represented in Figure 2), or from observing 100 throws and 40 successes (represented in Figure 3). 18

11 What are plausible values and why are they useful? Figure 2: Relationship between a priori distribution of a measure in School B, the likelihood of the observed data (10 throws), and the derived posterior distribution of the given observations Key: Long-term proficiency in School B Likelihood for 10 throws Posterior distribution Figure 3: Relationship between a priori distribution of a measure in School B, the likelihood of the observed data (100 throws), and the derived posterior distribution of the given observations Key: Long-term proficiency in School B Likelihood for 10 throws Posterior distribution The product of the ordinate (y-coordinate) values of the dashed and the dotted lines are calculated, normalized, and plotted as a solid line. The solid lines depict the resulting posterior distribution, which integrates our observations and prior knowledge about the long-term success rates in a given school. Note how in both figures the solid line is between the prior distribution given for the school and the curve depicting the likelihood of success for the 10-throw trial as well as for the 100-throw trial. The two figures show that the best guess we have might be located between what we observe in the student try-out and what we know about the school in terms of long-term success rate. For only 10 trials and 4 successes, the posterior expectation for a student is much closer to the school-based distribution, which means our best prediction for the student s long-term success would be about 60%. If 100 throws are observed, our best guess is much closer to the observed value of 40 out of 100 successful trials; for a student from School B, our best guess after 100 throws would be about 45%. 19

12 IERI MONOGRAPH SERIES: ISSUES AND METHODOLOGIES IN LARGE-SCALE ASSESSMENTS Note that the differences between the expected columns and the averages of the actual long-term performance from our sample in Table 6 are much smaller than the differences between the observed score and the average long-term success in Table 4. Note also that this method gives us expected values even for extreme observed scores such as 0 or 10 successes, which would not qualify as a reasonable estimate of the long-term success rate. For 10/10 successful throws, we get an expected long-term success rate of 64.3% for students from School A and 78.6% for School B, which in both is 64.3%-50% = 14.3 and 78.6%-70% = 8.6 percentage points higher than the respective average school performances, but is still not quite at 100%. The same occurs for 0/10 successful throws. Here, we would expect that students who show this performance and are from School A would have, on average, an actual long-term success rate of only 35%, while students from School B would still have a long-term success rate of 50%, even if they had not scored once during 10 trials. These are 14.3 and 20 percentage points lower than the respective school averages, but not quite 0%. We could argue that these estimates make more sense because an estimate of 0% seems too extreme given that we observed only 10 throws, and given that we know how the students from each school tend to perform in the long run. Recall that the goal is not to assign a mark, grade, or score to individual students, but to describe a group of students in terms of their long-term expected success rate and the variability of the success rate based on the data we have. The expected values presented in Table 6 do seem to perform well in terms of tracking the expected longterm success rates given school membership and the observed number of successes during 10 throws. Note, however, that the expected values do not represent the fact that we have variability of long-term success rates even for the same given number of observed successes. Figure 4 gives an impression of the variability of scores if they were based on the expected values (rounded to save space in the figure) compared to the distribution of observed score-based percentages. The values in the figure are based on School A only; the results for School B would be similar. Figure 4: Distribution of observed score-based percentages and expected percentages for the 20 students from School A Observed score percentages Score Expected score percentages -/- 0 -/- -/- 1 -/- (20%) (20%) 2 (41%) (41%) (30%) (30%) 3 (44%) (44%) (40%) (40%) 4 (47%) (47%) (50%) (50%) (50%) (50%) 5 (50%) (50%) (50%) (50%) (60%) (60%) (60%) (60%) (60%) 6 (53%) (53%) (53%) (53%) (53%) (70%) (70%) (70%) 7 (56%) (56%) (56%) (80%) (80%) 8 (59%) (59%) -/- 9 -/- -/- 10 -/- 20

13 What are plausible values and why are they useful? In Figure 4, the score obtained in the 10-throw tryout for students from School A is associated with either the observed score-based percentage (on the left-hand side) or the expected values given in Table 5 on the right-hand side. When we calculate the variability of these measures based on the 20 values obtained from the left-hand side of Figure 4, we arrive at an estimate of for the standard deviation of the observed score-based percentages, and an estimate of 5.01 for the standard deviation of the expected value-based measures on the right-hand side. The actual standard deviation of the sample, as given in Figure 1, is 9.5; the observation-only-based values over-estimate this value, while the expectation-based values under-estimate the actual variability of the long-term success rate in our sample. Uncertainty Where Uncertainty Is Due The expected values obtained from the free-throw score and prior knowledge based on school performance are useful in establishing that a result from a short tryout may not be the best predictor of long-term performance. This is especially the case when groups are concerned: we need to describe them in terms of statistical characteristics that are free from undesirable effects introduced by observing only a very small selection of the behavior of interest (as with the few trials in our example). The expected values presented in Table 5 are surprisingly close to the actual average of the players given their observed performance. However, we found that the expected values did not reflect the reality of only a few trials; the true long-term success rate of students with the same number of successes on a short trial was still quite variable. Unfortunately, the actual long-term success rates are not something we would have available. So how can we make up for the fact that the observed values do not serve our purpose, while the expected values do not vary enough? Note that Figure 4 reveals something peculiar about our use of the expectations we generated. We always replace the same observed score for a given school with the same expected percentage. In contrast, the actual long-term performances of students vary, even if they have the same observed number of successful throws. There is a way to reintroduce the knowledge that the long-term rate estimates are not certain even for the same observed score. This possibility also involves an application of Bayesian statistical methods. We thus derived, in similar fashion to how we produced the expected values, a measure of variability around these expected values. Figures 3 and 4 showed that while we were able to derive an expected value for each set of numbers of throws, for successful throws, and for a given school, we still witnessed considerable variability around this expected value. A more appropriate representation of this remaining uncertainty can be gained by generating values that follow these posterior distributions. The values, randomly drawn from the posterior distribution show that, after 10 throws and knowing which school a student comes from, a considerable amount of uncertainty remains about this student s probable long-term success rate. Even after 100 throws, we still see some uncertainty. 21

### Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the

### Sampling Distributions and the Central Limit Theorem

135 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics Chapter 10 Sampling Distributions and the Central Limit Theorem In the previous chapter we explained

### Performance Assessment Task Fair Game? Grade 7. Common Core State Standards Math - Content Standards

Performance Assessment Task Fair Game? Grade 7 This task challenges a student to use understanding of probabilities to represent the sample space for simple and compound events. A student must use information

### 2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

### South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### COMMON CORE STATE STANDARDS FOR

COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### THE STATISTICAL TREATMENT OF EXPERIMENTAL DATA 1

THE STATISTICAL TREATMET OF EXPERIMETAL DATA Introduction The subject of statistical data analysis is regarded as crucial by most scientists, since error-free measurement is impossible in virtually all

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### Benchmark Assessment in Standards-Based Education:

Research Paper Benchmark Assessment in : The Galileo K-12 Online Educational Management System by John Richard Bergan, Ph.D. John Robert Bergan, Ph.D. and Christine Guerrera Burnham, Ph.D. Submitted by:

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Distributions: Population, Sample and Sampling Distributions

119 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics Chapter 9 Distributions: Population, Sample and Sampling Distributions In the three preceding chapters

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### Fixed-Effect Versus Random-Effects Models

CHAPTER 13 Fixed-Effect Versus Random-Effects Models Introduction Definition of a summary effect Estimating the summary effect Extreme effect size in a large study or a small study Confidence interval

### Mathematics Cognitive Domains Framework: TIMSS 2003 Developmental Project Fourth and Eighth Grades

Appendix A Mathematics Cognitive Domains Framework: TIMSS 2003 Developmental Project Fourth and Eighth Grades To respond correctly to TIMSS test items, students need to be familiar with the mathematics

### Hypothesis Testing. Chapter Introduction

Contents 9 Hypothesis Testing 553 9.1 Introduction............................ 553 9.2 Hypothesis Test for a Mean................... 557 9.2.1 Steps in Hypothesis Testing............... 557 9.2.2 Diagrammatic

### POLI 300 Handout #2 N. R. Miller RANDOM SAMPLING. Key Definitions Pertaining to Sampling

POLI 300 Handout #2 N. R. Miller Key Definitions Pertaining to Sampling RANDOM SAMPLING 1. Population: the set of units (in survey research, usually individuals or households), N in number, that are to

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

### Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional

### Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88)

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Introduction The National Educational Longitudinal Survey (NELS:88) followed students from 8 th grade in 1988 to 10 th grade in

### What Does the Correlation Coefficient Really Tell Us About the Individual?

What Does the Correlation Coefficient Really Tell Us About the Individual? R. C. Gardner and R. W. J. Neufeld Department of Psychology University of Western Ontario ABSTRACT The Pearson product moment

### SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

### December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

### Problem of the Month Through the Grapevine

The Problems of the Month (POM) are used in a variety of ways to promote problem solving and to foster the first standard of mathematical practice from the Common Core State Standards: Make sense of problems

### AP Statistics 2011 Scoring Guidelines

AP Statistics 2011 Scoring Guidelines The College Board The College Board is a not-for-profit membership association whose mission is to connect students to college success and opportunity. Founded in

### Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

### MEASURES OF DISPERSION

MEASURES OF DISPERSION Measures of Dispersion While measures of central tendency indicate what value of a variable is (in one sense or other) average or central or typical in a set of data, measures of

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

### 1.7 Graphs of Functions

64 Relations and Functions 1.7 Graphs of Functions In Section 1.4 we defined a function as a special type of relation; one in which each x-coordinate was matched with only one y-coordinate. We spent most

### OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS

OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS CLARKE, Stephen R. Swinburne University of Technology Australia One way of examining forecasting methods via assignments

### Asymmetry and the Cost of Capital

Asymmetry and the Cost of Capital Javier García Sánchez, IAE Business School Lorenzo Preve, IAE Business School Virginia Sarria Allende, IAE Business School Abstract The expected cost of capital is a crucial

### SAMPLING DISTRIBUTIONS

0009T_c07_308-352.qd 06/03/03 20:44 Page 308 7Chapter SAMPLING DISTRIBUTIONS 7.1 Population and Sampling Distributions 7.2 Sampling and Nonsampling Errors 7.3 Mean and Standard Deviation of 7.4 Shape of

### PROBABILITIES AND PROBABILITY DISTRIBUTIONS

Published in "Random Walks in Biology", 1983, Princeton University Press PROBABILITIES AND PROBABILITY DISTRIBUTIONS Howard C. Berg Table of Contents PROBABILITIES PROBABILITY DISTRIBUTIONS THE BINOMIAL

### 11.2 POINT ESTIMATES AND CONFIDENCE INTERVALS

11.2 POINT ESTIMATES AND CONFIDENCE INTERVALS Point Estimates Suppose we want to estimate the proportion of Americans who approve of the president. In the previous section we took a random sample of size

### Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

### Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 13 and Accuracy Under the Compound Multinomial Model Won-Chan Lee November 2005 Revised April 2007 Revised April 2008

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Grade 7 Math Statistics and Probability Grade 7 Math Grade 7 Math Start Date: March 18, 2013 End Date : May 24, 2013

Unit Overview Students will be able to: use random samples to show that they produce representative samples and support valid inferences in statistics use random sample results to draw inferences about

### Intro to Data Analysis, Economic Statistics and Econometrics

Intro to Data Analysis, Economic Statistics and Econometrics Statistics deals with the techniques for collecting and analyzing data that arise in many different contexts. Econometrics involves the development

### Gerry Hobbs, Department of Statistics, West Virginia University

Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

### Stat 20: Intro to Probability and Statistics

Stat 20: Intro to Probability and Statistics Lecture 16: More Box Models Tessa L. Childers-Day UC Berkeley 22 July 2014 By the end of this lecture... You will be able to: Determine what we expect the sum

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Glossary of Terms Ability Accommodation Adjusted validity/reliability coefficient Alternate forms Analysis of work Assessment Battery Bias

Glossary of Terms Ability A defined domain of cognitive, perceptual, psychomotor, or physical functioning. Accommodation A change in the content, format, and/or administration of a selection procedure

### Chi Square Tests. Chapter 10. 10.1 Introduction

Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### The Effect of Admissions Test Preparation: Evidence from NELS:88

The Effect of Admissions Test Preparation: Evidence from NELS:88 Introduction For students planning to apply to a four year college, scores on standardized admissions tests--the SAT I or ACT--take on a

### 3. Data Analysis, Statistics, and Probability

3. Data Analysis, Statistics, and Probability Data and probability sense provides students with tools to understand information and uncertainty. Students ask questions and gather and use data to answer

### APPENDIX N. Data Validation Using Data Descriptors

APPENDIX N Data Validation Using Data Descriptors Data validation is often defined by six data descriptors: 1) reports to decision maker 2) documentation 3) data sources 4) analytical method and detection

### The Correlation Heuristic: Interpretations of the Pearson Coefficient of Correlation are Optimistically Biased

The Correlation Heuristic: Interpretations of the Pearson Coefficient of Correlation are Optimistically Biased Eyal Gamliel, Behavioral Sciences Department, Ruppin Academic Center. E-mail: eyalg@ruppin.ac.il

### Need for Sampling. Very large populations Destructive testing Continuous production process

Chapter 4 Sampling and Estimation Need for Sampling Very large populations Destructive testing Continuous production process The objective of sampling is to draw a valid inference about a population. 4-

### AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques

### Problem Solving and Data Analysis

Chapter 20 Problem Solving and Data Analysis The Problem Solving and Data Analysis section of the SAT Math Test assesses your ability to use your math understanding and skills to solve problems set in

### Common Core State Standards. Standards for Mathematical Practices Progression through Grade Levels

Standard for Mathematical Practice 1: Make sense of problems and persevere in solving them. Mathematically proficient students start by explaining to themselves the meaning of a problem and looking for

### Investigation in Learning and Teaching of Graphs of Linear Equations in Two Unknowns

Investigation in Learning and Teaching of Graphs of Linear Equations in Two Unknowns Arthur Man Sang Lee The University of Hong Kong Anthony Chi Ming Or Education Infrastructure Division, Education Bureau

### Credit Card Market Study Interim Report: Annex 4 Switching Analysis

MS14/6.2: Annex 4 Market Study Interim Report: Annex 4 November 2015 This annex describes data analysis we carried out to improve our understanding of switching and shopping around behaviour in the UK

### EXCEL EXERCISE AND ACCELERATION DUE TO GRAVITY

EXCEL EXERCISE AND ACCELERATION DUE TO GRAVITY Objective: To learn how to use the Excel spreadsheet to record your data, calculate values and make graphs. To analyze the data from the Acceleration Due

### STATISTICS 151 SECTION 1 FINAL EXAM MAY

STATISTICS 151 SECTION 1 FINAL EXAM MAY 2 2009 This is an open book exam. Course text, personal notes and calculator are permitted. You have 3 hours to complete the test. Personal computers and cellphones

### Dealing with Missing Data

Res. Lett. Inf. Math. Sci. (2002) 3, 153-160 Available online at http://www.massey.ac.nz/~wwiims/research/letters/ Dealing with Missing Data Judi Scheffer I.I.M.S. Quad A, Massey University, P.O. Box 102904

### What is the purpose of this document? What is in the document? How do I send Feedback?

This document is designed to help North Carolina educators teach the Common Core (Standard Course of Study). NCDPI staff are continually updating and improving these tools to better serve teachers. Statistics

### ELEMENTARY STATISTICS IN CRIMINAL JUSTICE RESEARCH: THE ESSENTIALS

ELEMENTARY STATISTICS IN CRIMINAL JUSTICE RESEARCH: THE ESSENTIALS 2005 James A. Fox Jack Levin ISBN 0-205-42053-2 (Please use above number to order your exam copy.) Visit www.ablongman.com/replocator

### We discuss 2 resampling methods in this chapter - cross-validation - the bootstrap

Statistical Learning: Chapter 5 Resampling methods (Cross-validation and bootstrap) (Note: prior to these notes, we'll discuss a modification of an earlier train/test experiment from Ch 2) We discuss 2

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

### PA Common Core Standards Standards for Mathematical Practice Grade Level Emphasis*

Habits of Mind of a Productive Thinker Make sense of problems and persevere in solving them. Attend to precision. PA Common Core Standards The Pennsylvania Common Core Standards cannot be viewed and addressed

### Stability of School Building Accountability Scores and Gains. CSE Technical Report 561. Robert L. Linn CRESST/University of Colorado at Boulder

Stability of School Building Accountability Scores and Gains CSE Technical Report 561 Robert L. Linn CRESST/University of Colorado at Boulder Carolyn Haug University of Colorado at Boulder April 2002 Center

### Algebra 1 Course Information

Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through

### 5.4 The Quadratic Formula

Section 5.4 The Quadratic Formula 481 5.4 The Quadratic Formula Consider the general quadratic function f(x) = ax + bx + c. In the previous section, we learned that we can find the zeros of this function

### High School Functions Building Functions Build a function that models a relationship between two quantities.

Performance Assessment Task Coffee Grade 10 This task challenges a student to represent a context by constructing two equations from a table. A student must be able to solve two equations with two unknowns

### 99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm

Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the

### Descriptive Statistics and Measurement Scales

Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

### Age to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA

Age to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA Introduction A common question faced by many actuaries when selecting loss development factors is whether to base the selected

### PRACTICE BOOK MATHEMATICS TEST (RESCALED) Graduate Record Examinations. This practice book contains. Become familiar with

This book is provided FREE with test registration by the Graduate Record Examinations Board. Graduate Record Examinations This practice book contains one actual full-length GRE Mathematics Test (Rescaled)

### Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

### Comparing Alternate Designs For A Multi-Domain Cluster Sample

Comparing Alternate Designs For A Multi-Domain Cluster Sample Pedro J. Saavedra, Mareena McKinley Wright and Joseph P. Riley Mareena McKinley Wright, ORC Macro, 11785 Beltsville Dr., Calverton, MD 20705

### Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th

Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th Standard 3: Data Analysis, Statistics, and Probability 6 th Prepared Graduates: 1. Solve problems and make decisions that depend on un

Quadratic Modeling Business 10 Profits In this activity, we are going to look at modeling business profits. We will allow q to represent the number of items manufactured and assume that all items that

### Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

### Gaming the Law of Large Numbers

Gaming the Law of Large Numbers Thomas Hoffman and Bart Snapp July 3, 2012 Many of us view mathematics as a rich and wonderfully elaborate game. In turn, games can be used to illustrate mathematical ideas.

### Unit 16 Normal Distributions

Unit 16 Normal Distributions Objectives: To obtain relative frequencies (probabilities) and percentiles with a population having a normal distribution While there are many different types of distributions

### A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

### Research Variables. Measurement. Scales of Measurement. Chapter 4: Data & the Nature of Measurement

Chapter 4: Data & the Nature of Graziano, Raulin. Research Methods, a Process of Inquiry Presented by Dustin Adams Research Variables Variable Any characteristic that can take more than one form or value.

### Lecture #2 Overview. Basic IRT Concepts, Models, and Assumptions. Lecture #2 ICPSR Item Response Theory Workshop

Basic IRT Concepts, Models, and Assumptions Lecture #2 ICPSR Item Response Theory Workshop Lecture #2: 1of 64 Lecture #2 Overview Background of IRT and how it differs from CFA Creating a scale An introduction

### Measurement with Ratios

Grade 6 Mathematics, Quarter 2, Unit 2.1 Measurement with Ratios Overview Number of instructional days: 15 (1 day = 45 minutes) Content to be learned Use ratio reasoning to solve real-world and mathematical

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,

### Anti-Gaming in the OnePipe Optimal Liquidity Network

1 Anti-Gaming in the OnePipe Optimal Liquidity Network Pragma Financial Systems I. INTRODUCTION With careful estimates 1 suggesting that trading in so-called dark pools now constitutes more than 7% of

### 9. Sampling Distributions

9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling

### Chapter 8: Interval Estimates and Hypothesis Testing

Chapter 8: Interval Estimates and Hypothesis Testing Chapter 8 Outline Clint s Assignment: Taking Stock Estimate Reliability: Interval Estimate Question o Normal Distribution versus the Student t-distribution:

### Statistics and Probability Investigate patterns of association in bivariate data.

Performance Assessment Task Scatter Diagram Grade 9 task aligns in part to CCSSM grade 8 This task challenges a student to select and use appropriate statistical methods to analyze data. A student must