Center, Spread, and Shape in Inference: Claims, Caveats, and Insights



Similar documents
Hypothesis testing. Null and alternative hypotheses

1. C. The formula for the confidence interval for a population mean is: x t, which was

Confidence Intervals for One Mean

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

I. Chi-squared Distributions

5: Introduction to Estimation

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the.

STATISTICAL METHODS FOR BUSINESS

Sampling Distribution And Central Limit Theorem

Practice Problems for Test 3

The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles

CHAPTER 7: Central Limit Theorem: CLT for Averages (Means)

Math C067 Sampling Distributions

One-sample test of proportions

Confidence Intervals

Inference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval

Chapter 7: Confidence Interval and Sample Size

PSYCHOLOGICAL STATISTICS

Determining the sample size

Statistical inference: example 1. Inferential Statistics

Overview. Learning Objectives. Point Estimate. Estimation. Estimating the Value of a Parameter Using Confidence Intervals

OMG! Excessive Texting Tied to Risky Teen Behaviors

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution

STA 2023 Practice Questions Exam 2 Chapter 7- sec 9.2. Case parameter estimator standard error Estimate of standard error

Measures of Spread and Boxplots Discrete Math, Section 9.4

Lesson 17 Pearson s Correlation Coefficient

Overview of some probability distributions.


Output Analysis (2, Chapters 10 &11 Law)

Exam 3. Instructor: Cynthia Rudin TA: Dimitrios Bisias. November 22, 2011

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

1 Computing the Standard Deviation of Sample Means

Properties of MLE: consistency, asymptotic normality. Fisher information.

LECTURE 13: Cross-validation

Normal Distribution.

Now here is the important step

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:

Research Method (I) --Knowledge on Sampling (Simple Random Sampling)

Multi-server Optimal Bandwidth Monitoring for QoS based Multimedia Delivery Anup Basu, Irene Cheng and Yinzhe Yu

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Hypergeometric Distributions

Quadrat Sampling in Population Ecology

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean

3 Basic Definitions of Probability Theory

hp calculators HP 12C Statistics - average and standard deviation Average and standard deviation concepts HP12C average and standard deviation

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

Mann-Whitney U 2 Sample Test (a.k.a. Wilcoxon Rank Sum Test)

Chapter 14 Nonparametric Statistics

Topic 5: Confidence Intervals (Chapter 9)

Soving Recurrence Relations

Confidence Intervals for Linear Regression Slope

Confidence intervals and hypothesis tests

GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number.

Central Limit Theorem and Its Applications to Baseball

CHAPTER 3 THE TIME VALUE OF MONEY

Descriptive Statistics

1 Correlation and Regression Analysis

% 60% 70% 80% 90% 95% 96% 98% 99% 99.5% 99.8% 99.9%

Maximum Likelihood Estimators.

Lesson 15 ANOVA (analysis of variance)

Professional Networking

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

Chapter 7 Methods of Finding Estimators

Unit 8: Inference for Proportions. Chapters 8 & 9 in IPS

Hypothesis testing using complex survey data

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

A Mathematical Perspective on Gambling

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS

MEI Structured Mathematics. Module Summary Sheets. Statistics 2 (Version B: reference to new book)

FM4 CREDIT AND BORROWING

Asymptotic Growth of Functions

Present Values, Investment Returns and Discount Rates

CONTROL CHART BASED ON A MULTIPLICATIVE-BINOMIAL DISTRIBUTION

Rainbow options. A rainbow is an option on a basket that pays in its most common form, a nonequally

This document contains a collection of formulas and constants useful for SPC chart construction. It assumes you are already familiar with SPC.

Page 1. Real Options for Engineering Systems. What are we up to? Today s agenda. J1: Real Options for Engineering Systems. Richard de Neufville

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

Incremental calculation of weighted mean and variance

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations

THE TWO-VARIABLE LINEAR REGRESSION MODEL

Subject CT5 Contingencies Core Technical Syllabus

Building Blocks Problem Related to Harmonic Series

TIAA-CREF Wealth Management. Personalized, objective financial advice for every stage of life

INVESTMENT PERFORMANCE COUNCIL (IPC)

Theorems About Power Series

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Best of security and convenience

CHAPTER 3 DIGITAL CODING OF SIGNALS

Section 11.3: The Integral Test

, a Wishart distribution with n -1 degrees of freedom and scale matrix.

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

National Institute on Aging. What Is A Nursing Home?

A Review and Comparison of Methods for Detecting Outliers in Univariate Data Sets

The Stable Marriage Problem

Department of Computer Science, University of Otago

Transcription:

Ceter, Spread, ad Shape i Iferece: Claims, Caveats, ad Isights Dr. Nacy Pfeig (Uiversity of Pittsburgh) AMATYC November 2008 Prelimiary Activities 1. I would like to produce a iterval estimate for the mea years of educatio (startig with first grade) of Math/Stats teachers at two-year colleges. I record years of educatio for each teacher, calculate the sample mea ad stadard deviatio, ad use those to set up a 95% cofidece iterval for the populatio mea: x ± 2 s. (The sample is large eough that the t multiplier for 95% cofidece is approximately 2.) What do I claim about this iterval? I claim to be 95% cofidet that the mea years of educatio of all Math/Stats teachers at two-year colleges is i the iterval. 2. A set of 6 umbered cards represet how may phoe umbers each perso i a family of 6 kows by heart. I would like to produce a iterval estimate for the mea umber kow by all 6, based o iformatio from of the 6 family members who happe to be available. Assume the populatio stadard deviatio is 3. We pick cards at radom ad set up the iterval x ± 2 3 = x ± 3. What do I claim about this iterval? I claim to be 95% cofidet that the ukow populatio mea is i that iterval. 3. A set of 16 umbered cards represet the umber of filligs for a group of 16 studets. Suppose a detist visitig the school has time to examie 2 studets. Based o their mea umber of filligs, he is to test the hypothesis that the mea for the whole group is agaist the alterative that it is less tha. The test is to be carried out at the 0.05 level. We ll assume the stadard deviatio for all 16 studets is 3. Two cards are picked at radom from those 16, ad we calculate their mea x. The test statistic is x 3/. Oce we fid the test statistic is ot less tha 1.65, we do ot 2 reject the ull hypothesis at the 0.05 level. I fact, the mea of all 16 cards is, so the ull hypothesis is true. I the log ru, how ofte will it be rejected? I claim it will be rejected 5% of the time. Itroductio Our key iferece results i itroductory statistics cofidece itervals ad hypothesis tests about meas or proportios are costructed from what we claim to kow about the ceter, spread, ad shape of the samplig distributios. I fact, such claims must be made cautiously, because if certai criteria cocerig the backgroud of our data are ot met, the ceter, spread, ad shape of the samplig distributios are affected. We ca be specific about what aspects of our iterval or test are udermied, depedig o how the three key 1

coditios are violated. Furthermore, we ca carry out activities with studets to give them a cocrete feel for how improper samplig desig ca affect three specific aspects of our iferece claims. Because of limited time, we ll focus oly o quatitative variables. Claims The key results obtaied are summarized i the followig claims about the samplig distributio of sample mea: If the populatio for a quatitative variable has mea µ ad stadard deviatio σ, the (uder the right circumstaces) the distributio of sample mea for a sample of size has ceter: mea µ spread: stadard deviatio σ shape: approximately ormal These pave the way for us to make the followig iferece claims: If sampled values of a quatitative variable for a sample of size have mea x, ad the variable s stadard deviatio is kow to be σ, the a 95% cofidece iterval for µ is x±2 σ. To test the hypothesis H 0 : µ = µ 0, we stadardize x to z = x µ 0 σ/ ad decide whether to reject H 0 based o the probability of z this low, high, or differet from 0, depedig o the form of the alterative hypothesis. Caveats Studets ad, admittedly, we teachers are iclied to take this iformatio ad ru with it, although we do occasioally metio the followig caveats: The sample must be ubiased. Selectios must be idepedet; the populatio should be at least 10 times the sample size. The sample size must be large eough. However, to avoid the discomfort of assigig studets problems that reach a dead-ed, we ted to give them carte blache: assume the sample is radom, ad large eough to guaratee ormality through the Cetral Limit Theorem, ad that our selectios are idepedet. Ofte, because these assuraces start to become repetitive, we let studets assume that, by default, all the ecessary coditios are i place. Thus, i the iterest of guarateeig them situatios where they ca plug ito a hady formula with impuity, we deprive them of two valuable lessos: 2

1. There are direct cosequeces that lik each caveat with oe of the claims about ceter, spread, ad shape, respectively. Thus, there are meaigful coectios betwee what is leared about producig ad summarizig data, ad about samplig distributios, ad what is applied to perform iferece. These coectios are threads that tie together all the most importat material i a itroductory statistics course. 2. I may real-life situatios, the coditios eeded to justify the claims are ot met. Ujustified claims about ceter I geeral, bias ca arise i two ways, each of which ca udermie our claim that the mea of the distributio of sample meas equals the populatio mea. 1. Bias arisig from the samplig desig: the sample does ot truly represet the larger populatio of iterest. Do our teachers truly represet the etire populatio of math/stat teachers at two-year colleges? 2. Bias arisig from the study desig: the sampled values are assessed improperly, resultig i biased summaries. Note that this type of bias ca arise eve if the sampled idividuals are truly represetative of the larger populatio. Is askig teachers to report their years of educatio out loud a good way to obtai completely accurate data values? Are we really 95% cofidet that the mea years of educatio for the etire populatio of math/stat teachers at two-year colleges falls i the iterval we produced earlier? Ujustified claims about spread Whe we preset studets with the cocept of radom variables, we ofte metio rules for meas ad variaces: the mea of the sum of two radom variables equals the sum of the meas, ad the variace of the sum equals the sum of the variaces. But the latter oly holds if the two variables are idepedet. If there is depedece, the σ 2 X 1 +X 2 ca be substatially greater or less tha σ 2 X 1 + σ 2 X 2. The additivity of variaces lets us assert that the variace of X = 1 (X 1 +X 2 + +X ) is 1 + + σ 2 ) = 1 ) = σ2 2(σ2 2(σ2 ad that the stadard deviatio of X is the square root of this, σ. However, these claims break dow if the X i are depedet, a circumstace that ca arise i real-life samplig situatios, icludig whe we sample without replacemet from a populatio that is ot sufficietly large relative to the sample size. Thus, if we fail to satisfy the requiremet that the populatio be at least 10 times the sample size, the claimed stadard deviatio σ i our cofidece iterval formula is icorrect. Are we really 95% cofidet that the iterval x ± 2 3 = x ± 3 based o a sample of cards cotais the mea for all 6 cards? 3

Ujustified claims about shape Thaks to the Cetral Limit Theorem, we ca say that the distributio of sample mea for radom samples of size is approximately ormal if the uderlyig distributio itself is approximately ormal, or if the sample size is large eough to offset o-ormality i the uderlyig distributio. If the sample is too small ad the shape of the samplig distributio is ot ormal, the the multiplier 2 i our cofidece iterval is icorrect, because it is based o a ormal probability distributio. Likewise, our P-value i a hypothesis test would be icorrect. Earlier, we picked 2 cards at radom from 16 cards whose mea was ad stadard deviatio was 3, ad tested the true ull hypothesis H 0 : µ = agaist the alterative H a : µ < at the 0.05 level. Ca we really claim that H 0 will be rejected 5% of the time i the log ru? Isights through activities 1. A biased sample mea: Our activity today had to do with teachers years of educatio. You ca survey studets about aother variable that may produce biased resposes: ask them to report out loud how may classes they missed the week before. Note that bias ca occur here o two frots. First, the studets who are missig class that day are ot represeted i the survey! Secod, studets may ot feel comfortable tellig their istructor how may classes they missed. I both cases, the bias leads to our sample mea providig a uderestimate of the true populatio mea umber of classes missed. Alteratively, you ca ask studets to pick a umber at radom (off the top of their heads) from 1 to 20. We use the sample mea umber picked, x, ad the stadard deviatio of the umbers 1 to 20 (5.77) to costruct a 95% cofidece iterval for the ukow populatio mea, 10.5. Presumably we are 95% cofidet that the populatio mea falls i this iterval. However, such selectios which are actually haphazard, ot radom ted to be biased towards higher umbers. Our sample mea is a biased estimator for µ, ad our cofidece iterval is wrog. 2. A icorrect stadard deviatio: We sample cards without replacemet from a populatio with mea 5.5, stadard deviatio approximately 3. I the first case, we will sample from a populatio of 0 cards (10 times the sample size. I the secod case, we sample from 6 (the populatio is oly 50% larger tha the sample). I the third case, the populatio is the same size as the sample. To give our activity cotext, we ca say the values represet how may phoe umbers people have memorized. (a) Sample cards from 0: a ordiary deck mius all the face cards, so it cosists of each of the umbers 1 through 10. (b) Sample cards from 6: the umbers 1, 3, 5, 6, 8, 10. (c) Sample cards from : the umbers 2, 3, 8, 9.

I each case, we sample cards without replacemet at least 20 times from the give populatio, recordig the sample mea value each time. Assumig selectios are idepedet, the distributio of sample mea selectio has a mea of 5.5 ad a stadard 3 deviatio of = 1.5. We calculate the mea ad stadard deviatio of each group of sample meas. I all three cases, the mea of sample meas will i fact be approximately 5.5. I the first case, the stadard deviatio should coform well to the claimed value, 1.5. I the secod case, the stadard deviatio will be less. I the third case, it is of course zero! If we sample idividuals without replacemet from a populatio of 6 idividuals whose stadard deviatio is 3, ad use their sample mea to set up a cofidece iterval for the populatio mea, the iterval will be wider tha it should be (margi of error 2 3 = 3 istead of about 2(.8) = 1.6). We would claim that we are oly 95% cofidet that the populatio mea falls i that iterval, but because these sample meas do t stray as far from 5.5, our probability of capturig 5.5 i our cofidece iterval will be cosiderably higher tha 95%. I the extreme, if we sample from cards, our 95% cofidece iterval actually has a 100% success rate, because our sample mea is always 5.5 ad there is o margi of error at all. As far as a hypothesis test is cocered, we would calculate z to be x 5.5 3/, whereas the actual deomiator for cards sampled without replacemet from 6 is less tha 3/. Our calculated z is less extreme tha it should be, makig us vulerable to Type II Errors, failig to reject a false ull hypothesis. 3. A o-ormal samplig distributio: A right-skewed distributio of umbers is created from a deck of cards: use all of each of the umbers 1 ad 2, ad oly 1 each of the umbers 3 through 10. (This distributio models the umber of filligs i a group of people where most have just oe or two, but a few have more.) The populatio mea is ad the stadard deviatio is 3. Each studet picks 2 cards at radom from these 16 cards ad reports his or her sample mea. Each uses the sample mea to test H 0 : µ = vs. H a : µ < at the 0.05 level. The log-ru probability of rejectig, theoretically, is 0.05. However, the lowest possible sample mea is 1, which stadardizes to 1 3/ 2 = 1.1. A sample mea low eough to produce a P-value of 0.05 or less must fall below (to the left of) 1.65, but this is impossible. We ca take hudreds such samples, ad ever maage to reject H 0 at the 5% level. I geeral, our P-value based o z probabilities is icorrect if we are stadardizig a sample mea or sample proportio for a small sample take from a skewed populatio. Summarizig claims ad caveats i cofidece itervals A 95% cofidece iterval for µ is costructed as x ± 2 σ 5

If there is bias arisig from a poor samplig or study desig, the the correct iterval is ot cetered at x. If there is depedece i our sampled idividuals (such as whe we sample without replacemet from a relatively small populatio), the the stadard deviatio σ is icorrect. If the sample size is ot large eough to offset o-ormality i the shape of the populatio, the the multiplier 2 is icorrect. [Note 1: the same problems arise if we stadardize with s ad base our cofidece iterval o a t distributio. Note 2: Aother source of samplig depedece would be if the idividuals are aware of each other s resposes, ad ted to adjust their ow resposes accordigly. Yet aother is the use of a two-sample procedure for paired data. I particular, the claimed formula for the stadard deviatio of the differce betwee sample meas is udermied if the variables X 1 ad X 2 are depedet via a paired desig.] 6