Hypothesis Tests Applied to Means

The Sampling Distribution of the Mean

Recall that the sampling distribution of the mean is the distribution of sample means that would be obtained from a particular population (with mean $\mu$ and standard deviation $\sigma$) if an infinite number of random samples were drawn from the population and the mean of each sample was calculated. By the Central Limit Theorem (CLT), for any given population (with mean $\mu$ and standard deviation $\sigma$), the distribution of sample means for a particular sample size, $n$, approaches a normal distribution with a mean identical to that of the population (i.e. $\mu$) and a standard deviation of $\sigma/\sqrt{n}$ as the sample size gets larger and approaches infinity. The CLT tells us what the mean and standard deviation will be for any given sample size, and it also tells us that the shape of the sampling distribution of the mean approaches normality as the sample size increases, regardless of the shape of the population that the sample mean is assumed to come from.

Hypothesis Testing when the Standard Deviation of the Population ($\sigma$) is Known

In the example provided earlier (i.e. the high school counselor who wanted to determine if the course he developed actually increased SAT scores) the standard deviation of the population was known. This is often the case when using standardized tests as a dependent variable in a research study, because such tests are given to a representative random sample of the population to determine norms. Conceptually, testing whether the mean of a sample (that was likely provided some sort of treatment) differs from the mean of the population is no different than testing how likely it is to have obtained a particular observation from a population. However, we need to use the standard deviation of the sampling distribution of the mean, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, known as the standard error of the mean or simply the standard error.

Example

Dr. Frost is a researcher in a large school district that annually evaluates its students' academic progress in mathematics using the Iowa Test of Basic Skills (ITBS). In the testing manual for the ITBS she discovers that the average performance of sixth grade students in the nation on this test (i.e. the norm) is 85 and that the standard deviation of scores in the nation is 0. She wants to know if students in her school district are performing at the same level. How could she go about testing this? What if the standard deviation of scores in the nation was only 30? What if it was only 0? Technically, or mathematically, one could always use the standard error of the mean because when $n = 1$, it simply represents the standard deviation of the population.
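The z-test described above can be sketched in a few lines of Python. This is only an illustration: the function name `one_sample_z` and all of the numbers are hypothetical and are not taken from Dr. Frost's example.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return (standard error, z, two-tailed p) for a one-sample z-test."""
    se = pop_sd / sqrt(n)                    # standard error of the mean
    z = (sample_mean - pop_mean) / se        # distance from mu in standard errors
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed probability under H0
    return se, z, p

# Hypothetical values chosen only for illustration.
se, z, p = one_sample_z(sample_mean=104.0, pop_mean=100.0, pop_sd=15.0, n=36)
print(f"SE = {se:.2f}, z = {z:.2f}, p = {p:.4f}")

# With n = 1 the standard error equals the population standard deviation.
se_single, _, _ = one_sample_z(100.0, 100.0, 15.0, 1)
print(se_single)
```

With a single observation the function reduces to an ordinary z-score, which is the point made at the end of the example.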

Confidence intervals are another way to convey the results of a research study. The mean obtained from a sample is an unbiased estimate of the mean in the population and can be considered a point estimate of the mean of the population from which the sample was drawn. Point estimates use a single number to estimate an unknown quantity. Confidence intervals use a range of values to estimate an unknown quantity. In the case of estimating the mean of the population from which a particular sample was drawn, a confidence interval represents a range of possible population means that may correspond to the given sample.

In general, a confidence interval is obtained by using the sample statistic, its associated standard error, and the critical values associated with the Type I error rate you are willing to risk. The critical values are the values of the test statistic that must be exceeded to reject the null hypothesis. For example, if we are willing to risk making a Type I error 5% of the time (i.e. $\alpha = 0.05$) then the critical value for conducting a z-test (used when $\sigma$ is known) is $\pm 1.96$. Note that confidence intervals always use the critical value associated with a two-tailed test. We use these values to solve for the population parameter we are trying to estimate. In the case of a z-test:

$$z_{.05} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \quad\Rightarrow\quad \pm 1.96 = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \quad\Rightarrow\quad \mu = \bar{X} \pm 1.96\,\sigma_{\bar{X}}$$

So a 95% confidence interval would be expressed by $\bar{X} \pm 1.96(\sigma_{\bar{X}})$. What would a 95% confidence interval be for the true mathematics achievement test scores for the sixth grade students in Dr. Frost's school district?

In general, a $(1-\alpha)$ confidence interval for a z-test is obtained by:

$$\bar{X} \pm z_{\alpha/2}(\sigma_{\bar{X}})$$

that is, sample mean $\pm\ z_{\alpha/2}$ (standard error of the mean).

What would a 90% confidence interval be for the true mathematics achievement test scores for the sixth grade students in Dr. Frost's school district? What would a 99% confidence interval be?
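As a rough sketch of the interval formula above, the following Python computes 90%, 95% and 99% intervals for the same sample. The function name `z_confidence_interval` and the input values are assumptions made for illustration, not figures from the example.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, pop_sd, n, confidence=0.95):
    """Two-sided confidence interval for a mean when sigma is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    margin = z_crit * pop_sd / sqrt(n)            # critical value times standard error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 104, sigma 15, n = 36.
for level in (0.90, 0.95, 0.99):
    low, high = z_confidence_interval(104.0, 15.0, 36, confidence=level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

The interval widens as the confidence level rises, because a smaller $\alpha$ pushes the two-tailed critical value further from zero.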
Even more generally, a $(1-\alpha)$ confidence interval for any statistical test is obtained by:

sample estimate $\pm$ critical value$_{\alpha/2}$ (standard error of the estimate)

Hypothesis Testing when the Standard Deviation of the Population ($\sigma$) is NOT Known

Hypothesis Testing with One Sample: The One-Sample t Test

In most practical research situations that do not utilize a standardized test as a dependent variable, the standard deviation of the dependent variable in the population is unknown. Therefore, it must be estimated from the sample. Just as the mean obtained from any one sample is one of an infinite number of sample means that could have been obtained had another sample been drawn, the standard deviation obtained from any one sample is one of an infinite number of possible standard deviations that could have been obtained had another sample been drawn. The sampling distribution of the standard deviation is conceptually the same as the sampling distribution of the mean. However, this distribution is not normal, as the sampling distribution of the mean is. Rather, it is positively skewed, and the smaller the sample size the more positively skewed it is. Therefore, the standard deviation obtained from any one sample is more likely to underestimate the true standard deviation in the population than to overestimate it. This is especially true for smaller samples.

As we saw earlier, the magnitude of the standard deviation of the population has a direct effect on the magnitude of the test statistic that will be obtained. The larger the standard deviation is in the population, the more difficult it is to detect differences between a sample mean and the population mean. This makes sense conceptually, because if a distribution is more variable then there is a larger range of likely values that can be obtained from any one sample. On the other hand, the smaller the standard deviation is, the easier it is to detect differences between the sample mean and the population mean. This also makes sense conceptually, because if a distribution is less variable then there is a smaller range of likely values that can be obtained from any one sample.

Therefore, when using an estimate of the population standard deviation obtained from a sample, we must account for the fact that we are more likely to reject a null hypothesis that is true (i.e. commit a Type I error). This is true because we know that smaller standard deviations lead to larger test statistics, which are less likely to occur under the null hypothesis, AND we know that the standard deviation we obtain from any one sample is likely to underestimate the true standard deviation in the population.

The t-distribution can be used instead of the standard normal distribution (i.e. z-distribution) when we do not know the value of the population standard deviation and we want to test if a sample mean differs from some hypothesized population mean, provided the sample was drawn from a normal population OR the sample size is large enough to assume that the sampling distribution of the mean is normal. In general, compared with the standard normal distribution, the t-distribution has fatter tails and fewer values near the mean. As sample size increases, the t-distribution more closely resembles the standard normal distribution. Similar to the normal distribution, this distribution represents a family of distributions. However, whereas the normal distribution differs depending on the mean and standard deviation in the population, all t-distributions are centered on zero and differ only depending on sample size; their standard deviation is slightly larger than one and approaches one as the sample size increases.

The t-distribution that should be used depends on the degrees of freedom. For the t-distribution the degrees of freedom are the number of observations whose values could be changed if the mean must remain constant, which is $n - 1$ because the mean is used to estimate the standard deviation in the population. Technically, the t-statistic is almost identical to the z-statistic, except that the standard deviation estimated from the sample replaces the population standard deviation that was previously assumed to be known.
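To make the substitution of the sample standard deviation for $\sigma$ concrete, here is a minimal sketch of a one-sample t statistic. The data, the function name `one_sample_t`, and the hypothesized mean are all hypothetical; the critical value 2.365 is the usual two-tailed table value for $\alpha = .05$ with 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, pop_mean):
    """Return (t, df) for a one-sample t-test."""
    n = len(data)
    s = stdev(data)                    # sample SD, computed with n - 1 in the denominator
    se = s / sqrt(n)                   # estimated standard error of the mean
    t = (mean(data) - pop_mean) / se
    return t, n - 1                    # one degree of freedom is lost to the mean

scores = [12, 15, 11, 14, 13, 16, 10, 13]   # hypothetical sample
t, df = one_sample_t(scores, pop_mean=12)

T_CRIT_DF7 = 2.365   # two-tailed, alpha = .05, df = 7 (standard t table)
print(f"t({df}) = {t:.3f}, reject H0: {abs(t) > T_CRIT_DF7}")
```

Note that the critical value is larger than the 1.96 used for the z-test, which reflects the fatter tails of the t-distribution at small sample sizes.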

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

A 95% confidence interval for the true population mean associated with the sample drawn is also calculated similarly and can be expressed by:

$$\bar{X} \pm t_{.05}(s_{\bar{X}})$$

Example

Professor Dyett, the Health Director at Marquette University, believes that students at Marquette are very health conscious and therefore consume less sugar than most people in the U.S. She knows that the average person in the U.S. consumes about 00 lbs. of raw sugar per year, mostly in the form of soft drinks, candies, and pastries. Although she believes students at Marquette consume less sugar than normal, she also wants to know if they are consuming more sugar than normal. To test her belief she randomly samples 5 students and asks them to log their sugar consumption for the next year. She finds that the average sugar consumption in her sample is 80. lbs with a standard deviation of 35.5 lbs. What can she conclude? Use both a point estimate to test her hypothesis and an interval estimate.

Hypothesis Testing with Two Independent Samples: The Independent Samples t-test

Up until this point we have only considered single-sample techniques. However, the more interesting research questions typically involve two or more samples. For example, one might want to compare whether students taught using one method score higher on a test than those taught using another method. As another example, one might want to compare left-handed people and right-handed people to see if one group is more creative than the other. Whenever two independent samples are compared it is likely that the means obtained from the two groups will differ by some amount due to chance alone. Therefore, we need to determine if the difference is large enough to conclude that the two samples are from two different populations. In general, the common form of the test statistics that deal with hypotheses about means can be thought of as a ratio of (1) the difference between information obtained from the sample and that assumed to be true in the population under the null hypothesis and (2) the difference that would be expected to occur by chance alone.

Because we have two different groups we need symbolism to differentiate between the population parameters and sample statistics obtained from each of the groups, so we use $\mu_1$ to represent the population mean from one of the groups and $\mu_2$ to represent the population mean from the other group. Similarly, we use $\bar{X}_1$ to represent the sample mean from one of the groups and $\bar{X}_2$ to represent the sample mean from the other group. In addition, it is possible that the two samples are comprised of a different number of observations, so we use $n_1$ to represent the sample size of one sample and $n_2$ to represent the sample size of the other sample. It is also possible that the two samples come from populations that differ in their variability, so we use $\sigma_1^2$ to represent the variance for the population that one of the groups is from and $\sigma_1$ to represent the standard deviation for that population. We use $\sigma_2^2$ to represent the variance for the population that the other group is from and $\sigma_2$ to represent the standard deviation for that population. Similarly, we use $s_1^2$ to represent the estimate of variance from one of the samples and $s_1$ to represent the estimate of the standard deviation from that sample. We use $s_2^2$ to represent the estimate of variance from the other sample and $s_2$ to represent the estimate of the standard deviation from the other sample.

The null hypothesis that we are testing in this case is $H_0: \mu_1 = \mu_2$ or, equivalently, $H_0: \mu_1 - \mu_2 = 0$. To test this hypothesis we have to consider the sampling distribution of the difference between means. This distribution is approximately normally distributed with a mean of $\mu_1 - \mu_2$. The standard deviation of this distribution is the standard error of the difference between means and can be determined by utilizing the law of variances, which states that the variance of a sum or difference of two independent variables is simply the sum of the two variances. It should be noted that this is only true if the two variables are independent. Therefore, since the variance of the first sampling distribution of means is $\sigma_1^2/n_1$ and the variance of the second sampling distribution of means is $\sigma_2^2/n_2$, the variance of the sampling distribution of the difference between the two means is simply $\sigma_1^2/n_1 + \sigma_2^2/n_2$, and the standard deviation of this distribution is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. Note that this is not mathematically equivalent to $\sigma_1/\sqrt{n_1} + \sigma_2/\sqrt{n_2}$.

If we knew the two population variances (recall the variance is simply the square of the standard deviation, which is typically reported rather than the variance) we could simply calculate a z-score using the observed difference between the means in our sample, the hypothesized difference between the means in the population (typically assumed to be zero) and the standard error of the difference between means. Specifically,

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

However, we typically do not know the two population variances, so we have to estimate them from our samples. The problem with simply replacing $\sigma_1^2$ with $s_1^2$ and $\sigma_2^2$ with $s_2^2$ is that we don't know the exact sampling distribution of a t-statistic that is calculated in this way. In other words, we don't know the degrees of freedom to use if we simply replace $\sigma_1^2$ with $s_1^2$ and $\sigma_2^2$ with $s_2^2$ to calculate a test statistic.

One way to get around this problem is to assume that the two populations that the two samples are drawn from have equal variances (i.e. $\sigma_1^2 = \sigma_2^2$). This assumption is known as the homogeneity of variance assumption, and if it is met our problem is solved because we do know the sampling distribution of a test statistic that is calculated under this assumption. Specifically, we know that it follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom. If this assumption is met then the sample variances should be similar. Typically, for small samples ( < 0), if one of the sample variances is more than 4 times larger than the other then the assumption has been violated. For larger samples, one should be concerned if one variance is more than twice as large as the other. One can also conduct a statistical test, known as Levene's test, to see if this assumption has been met. The test computes the absolute deviation of each observed score from the mean of its group (i.e. $|X_{ij} - \bar{X}_j|$, where $i$ denotes the particular observation and $j$ denotes the group) and conducts a t-test for independent samples on these deviation scores for the two groups.

However, if we make this assumption then we have a slightly different problem to solve, in that we have two estimates of the population variance, $s_1^2$ and $s_2^2$. Which one should we use? Intuitively, you might think that we should simply average the two estimates, but this can only be done if both groups have the same number of observations (i.e. equal sample sizes, $n_1 = n_2$), because if the sample sizes are not equal then the two estimates will not be equally representative of the variability in the population.
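The z-score for a difference between two independent means, given at the start of this passage for the case where both population variances are known, can be sketched as follows. The function names and all numeric values are hypothetical and serve only to illustrate the law of variances.

```python
from math import sqrt

def se_difference(var1, n1, var2, n2):
    """Standard error of the difference between two independent means."""
    # The variances of the two sampling distributions add; the standard deviations do not.
    return sqrt(var1 / n1 + var2 / n2)

def two_sample_z(mean1, mean2, var1, n1, var2, n2, hypothesized_diff=0.0):
    """z-score for an observed difference, assuming known population variances."""
    return ((mean1 - mean2) - hypothesized_diff) / se_difference(var1, n1, var2, n2)

# Hypothetical values: each sampling distribution has variance 4.
se = se_difference(var1=64.0, n1=16, var2=36.0, n2=9)
naive = sqrt(64.0 / 16) + sqrt(36.0 / 9)   # adding standard errors overstates the spread
z = two_sample_z(mean1=52.0, mean2=49.0, var1=64.0, n1=16, var2=36.0, n2=9)
print(f"SE = {se:.3f}, naive sum = {naive:.3f}, z = {z:.3f}")
```

Comparing `se` with `naive` shows why the standard error must be computed from the summed variances rather than from the summed standard errors.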
Specifically, we know that a larger sample should produce a more stable estimate of the population variance than a smaller sample, so we use a weighted average of the two estimates to obtain a pooled variance estimate. This estimate gives more weight to the estimate of the variance that is obtained from the larger sample and is expressed by:

$$s_P^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

It is this pooled estimate that should be used when calculating a test statistic to determine if the means obtained from two independent samples are similar enough to conclude that they may have been drawn from the same population. This test statistic follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom. Two degrees of freedom are lost because two sample means are utilized when estimating the two sample standard deviations. Specifically:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_P^2}{n_1} + \dfrac{s_P^2}{n_2}}} = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$

where $s_{\bar{X}_1 - \bar{X}_2}$ denotes the estimated standard error of the difference between means.

A 95% confidence interval for the difference between two means obtained from independent samples is expressed by:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{.05}(s_{\bar{X}_1 - \bar{X}_2})$$

Example

A researcher was interested in determining if reading comprehension of dyslexic children was the same under normal and reduced visual contrast. So she randomly assigned 4 children to one of two groups. One group was given a reading comprehension test using normal text and another group was given a reading comprehension test using text that was covered by a plastic sheet designed to reduce the visual contrast of the text. However, five children in the second group were unable to participate. The average reading comprehension test score for the first group was 75.0 ( 00) while the average reading comprehension test score for the second group was 56.7 ( s s 33.3). What should she conclude? Use both point and interval estimates.

The fact that means obtained from two groups are found to statistically differ from each other does not mean that they differ from each other in a meaningful way. The magnitude of the difference is known as the effect size. Cohen's d is an effect size measure that expresses the difference between two means in terms of standard deviation units. Specifically, it can be estimated by:

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_P}$$

What is the effect size measure from the study described above?

If the homogeneity of variance assumption has been violated, the Welch-Satterthwaite solution can be used, which tries to estimate the appropriate degrees of freedom (df) for the test statistic. It is based on the idea that the true df for the t-statistic must fall somewhere between the smaller of $n_1 - 1$ and $n_2 - 1$, and $n_1 + n_2 - 2$. Since the magnitude of the critical t-value is larger with a smaller number of df (making it more difficult to reject the null), we can use the smallest number of degrees of freedom to conduct the statistical test. If this test is significant then it would surely be significant if a larger number of df were used. If it is not, then the correct number of df can be calculated using the formula in your book.

The standard t-test for independent samples (i.e. assuming homogeneity of variance and that the sample is drawn from a population that is normally distributed, which implies the sampling distribution of the difference between the means will be normally distributed) is said to be robust to violations of the assumptions, especially for equal sample sizes. In other words, moderate violations of the assumptions will not drastically affect the magnitude of the test statistic obtained. However, if one believes that the assumptions have been drastically violated, one should consider alternate statistical tests.

Dependent or Paired Sample t test

Sometimes we may want to determine whether or not there has been some change in our sample over time, in which case we would collect data from respondents on more than one occasion, known as a repeated measures design. When this design is used it is important to ensure that a subject's second response is not affected by their first response and that the passage of time alone is not influencing subjects' performance. Other times we want to determine if two related samples that can be matched up in some way, such as data obtained from spouses or twins, differ. This is known as a matched sample design. This type of design can also be used to match subjects on some extraneous variable a researcher wishes to control for, such as IQ. This helps to ensure that differences found in the dependent variable for the two groups are not caused by differences in the extraneous variable. In either of these designs a paired sample t-test is the appropriate statistical test to conduct. The advantage of using either of these research designs is that it minimizes the possibility that the subjects in one group are substantially different from the other to begin with (before treatment), which would bias the results.
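Returning briefly to the independent-samples test described before the paired designs, the pooled variance, the t statistic, Cohen's d, and the conservative degrees of freedom can be sketched as below. All function names and summary statistics are hypothetical and do not reproduce the reading comprehension example; the sketch assumes homogeneity of variance.

```python
from math import sqrt

def pooled_variance(var1, n1, var2, n2):
    """Weighted average of two variance estimates (weights are the df of each sample)."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

def independent_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df, Cohen's d) for a pooled-variance independent samples t-test."""
    sp2 = pooled_variance(var1, n1, var2, n2)
    se = sqrt(sp2 / n1 + sp2 / n2)          # standard error of the difference
    t = (mean1 - mean2) / se
    d = (mean1 - mean2) / sqrt(sp2)         # difference in pooled SD units
    return t, n1 + n2 - 2, d

def conservative_df(n1, n2):
    """Smallest df the test statistic could have when variances are unequal."""
    return min(n1 - 1, n2 - 1)

# Hypothetical summary statistics with unequal sample sizes.
sp2 = pooled_variance(18.0, 9, 12.0, 5)
t, df, d = independent_t(mean1=30.0, var1=18.0, n1=9, mean2=24.0, var2=12.0, n2=5)
print(f"pooled variance = {sp2}, t({df}) = {t:.3f}, d = {d:.2f}")
print(f"conservative df = {conservative_df(9, 5)}")
```

Here the pooled variance (16) sits closer to the variance of the larger sample than the simple average (15) would, which is the weighting described in the text.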
Similar to the t-test for independent samples, the null hypothesis that we are testing in this case is $H_0: \mu_1 = \mu_2$ or, equivalently, $H_0: \mu_1 - \mu_2 = 0$. However, the t-test for dependent samples is conducted on difference scores, which simply represent the difference between scores obtained at the two different points in time or between the two matched respondents. If there is truly no difference between the two dependent samples then the average of the difference scores would be expected to be zero. This reduces the complexity of our test because our data are reduced to one observation per person or matched pair, so we are really testing a hypothesis using one sample of data. The null hypothesis that we are testing can be reduced to $H_0: \mu_D = \mu_1 - \mu_2 = 0$. The test statistic in this case can be expressed by:

$$t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}} = \frac{\bar{D} - 0}{s_D/\sqrt{n}}$$

and a confidence interval is expressed by:

$$\bar{D} \pm t_{.05}(s_{\bar{D}})$$

The degrees of freedom for our test statistic are the same as they were for the one-sample t-test and represent the number of pairs of observations minus one, or $n - 1$.

Example

Dr. Frederick believes that the environment is more important than genetics in influencing intelligence. He locates pairs of identical twins that have been reared apart, where one twin has been raised in an enriched environment and the other twin has been raised in an impoverished environment, and administers a standardized IQ test to each twin. He obtains the following data (columns: Pair, Enriched Environment, Impoverished Environment, Difference):

00 0-95 93 3 6 6 4 07 0-3 5 85 75 0 6 96 00-4 7 35 4 8 0 0 0 9 08 00 8 0 90 88 00 90 0 3 08 5

The mean of the observed difference scores is 3.75 and the standard deviation of these scores is 5.345. What is the null hypothesis and what does this mean from a substantive perspective? How does this relate to his research hypothesis? Conduct a dependent sample t-test. Can Dr. Frederick conclude that the environment is more important than genetics from the results of his study? Why or why not?
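A dependent-samples t-test reduces to a one-sample test on the difference scores, which can be sketched as follows. The five pairs of scores below are hypothetical and are not Dr. Frederick's data; the function name `paired_t` is likewise an assumption, and 2.776 is the usual two-tailed table value for $\alpha = .05$ with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t, df, mean difference, standard error) for paired scores."""
    diffs = [a - b for a, b in zip(first, second)]   # one difference score per pair
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(n)                      # s_D / sqrt(n)
    return d_bar / se, n - 1, d_bar, se              # H0: mu_D = 0

# Hypothetical matched pairs.
enriched = [104, 98, 110, 101, 95]
impoverished = [100, 99, 104, 97, 93]
t, df, d_bar, se = paired_t(enriched, impoverished)

T_CRIT_DF4 = 2.776   # two-tailed, alpha = .05, df = 4 (standard t table)
low, high = d_bar - T_CRIT_DF4 * se, d_bar + T_CRIT_DF4 * se
print(f"t({df}) = {t:.3f}, 95% CI for mean difference: ({low:.2f}, {high:.2f})")
```

In this made-up sample the interval includes zero and the t statistic falls short of the critical value, so the point and interval estimates lead to the same conclusion.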