Hypothesis testing. Null and alternative hypotheses




Another important use of sampling distributions is to test hypotheses about population parameters, e.g. means, proportions, regression coefficients, etc. For example, it is possible to stipulate that the population mean is equal to some specified value and then use sample information to decide whether or not the hypothetical value can be rejected in the light of the sample evidence. The decision will depend on (1) the size of the difference between the hypothetical population mean and the sample mean, (2) the size of the sampling error associated with the sample mean, and (3) the degree of certainty the decision-maker requires before rejecting the initial hypothesis.

Null and alternative hypotheses

First we set up what is known as the null hypothesis, $H_0$, about the population parameter. For example, we may claim that the population mean $\mu$ is equal to some value $\mu_0$, say. This is usually written as $H_0: \mu = \mu_0$. We then stipulate an alternative hypothesis, $H_1$, which may state, e.g., that the population mean is not equal to $\mu_0$: $H_1: \mu \neq \mu_0$. The purpose of hypothesis testing is to see whether we have sufficient evidence to reject the null hypothesis.

Typically, the null hypothesis says that there is nothing unusual or important about the data we are considering. For example, if we were looking at the average test scores of children who have received a particular teaching method, the null hypothesis would be that the mean is equal to the national average. If we are testing a new drug, and are looking at the proportion of people taking the drug whose condition improves, we would take as our null the proportion who improve with a placebo, or with a previous drug. If we are looking for a relationship between two variables, the null hypothesis is usually that there is no relationship, that is, that the regression coefficient between them is 0.
The alternative hypothesis is thus that there is something interesting or different about the population: for example, that the average test score from the new teaching method is not equal to the national average, or that the proportion who improve with the new drug is not equal to the previous rate, or that there is a relationship between the two variables, so that the regression coefficient is not equal to 0.

We treat $H_0$ as our default position, and we usually require quite strong evidence to reject it: typically 90%, 95% or 99% confidence, depending on the context.

Test statistic

Having set up our null and alternative hypotheses, we look for a suitable test statistic that will give us evidence for or against the two hypotheses. For example, if we are looking for evidence about the population mean ($H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$), we will most likely use a statistic based on the sample mean, $\bar{X}$. From our work in section 4, a suitable statistic (assuming we know the standard deviation $\sigma$ of the population) is

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}},$$

that is, we measure $\bar{X} - \mu_0$ in terms of the standard error of $\bar{X}$ as an estimator for $\mu$, which is equal to $\sigma/\sqrt{n}$. For large samples, $n \geq 30$, the Central Limit Theorem tells us that the distribution of $\bar{X}$ is approximately normal, so that, if $H_0$ is true, $Z$ will be (approximately) a standard normal variable, that is, $Z \sim N(0,1)$. The larger $|\bar{X} - \mu_0|$ is, the larger $|Z|$ is, and the less credible it is that $H_0$ is correct. So essentially what we are trying to do is to measure whether the sample mean, $\bar{X}$, is significantly different from $\mu_0$.

Decision rule

We now have to decide how large $|Z|$ must be for us to reject $H_0$. This is related to the risk we are prepared to take of an incorrect decision. In deciding whether to accept or reject a null hypothesis, there are two types of error we may make. A Type 1 error is to reject the null hypothesis when it is correct. A Type 2 error is to accept the null hypothesis when it is incorrect. We usually specify our decision rule in terms of the probability of a Type 1 error we are prepared to accept, denoted $\alpha$. Depending on $\alpha$, we can calculate critical values of the test statistic $Z$, so that if $Z$ lies beyond the critical values, we reject $H_0$, while if $Z$ lies within the critical values, we accept $H_0$. Thus, in the case of the population mean, if our acceptable level of Type 1 error is $\alpha = 0.05$, then the critical values of the test statistic will be

$Z = \pm 1.96$, since we know from section 4 that, if $H_0$ is true and $\mu = \mu_0$, then $P(-1.96 < Z < 1.96) = 0.95$. Hence we know that, if $\mu = \mu_0$, there would be only a 5% probability of obtaining a value of $Z$ greater than 1.96 or less than −1.96, so that the probability of a Type 1 error in rejecting $H_0$ is 5%. If we obtain a value of $Z$ between the critical values, we conclude that we do not have sufficient evidence to reject $H_0$, so we accept it.

The acceptable probability of Type 1 error is also called the significance level of the test. If, say, $\alpha = 5\%$ and we reject $H_0$, we say that we reject $H_0$ at the 5% level of significance, or that $\bar{X}$ is significantly different from $\mu_0$ at the 5% level of significance, etc. Thus, we set up our decision rule to give $H_0$ the benefit of the doubt: we require 95% confidence to reject it. Note again that if we reject the null hypothesis, we are not saying there is a 95% probability that $\mu \neq \mu_0$. $\mu$ is a constant which either is equal to $\mu_0$ or it isn't. What we are saying is that, if $\mu$ were equal to $\mu_0$, there would be a 95% chance of obtaining a test statistic between the critical values; only 5% of the time would we obtain a value of $Z$ that would lead us to reject $H_0$. Hence $P(\text{reject } H_0 \mid H_0 \text{ true}) = 0.05$. Note that if we were prepared to accept a Type 1 error probability of 10%, we would set our critical values at $Z = \pm 1.645$, while if we were only prepared to accept a 1% Type 1 error, we would set critical values of $Z = \pm 2.58$.

Power of a test

The probability of a Type 2 error is denoted $\beta$, and the power of a hypothesis test is $1 - \beta$: the probability of rejecting $H_0$ when it is false. Given two tests of a hypothesis $H_0$, we say that one test is more powerful than the other if, for a specified level of Type 1 error, it has a lower probability of Type 2 error.

Example

Suppose we know that average household income in the population is 300 p.w., with standard deviation 50 p.w. We are trying to see whether households in a particular town have a higher or lower average income. We take a random sample of 100 households in the town, and find an average income of 285 p.w.
We wish to test the hypothesis that

average household income in the town is equal to the national average, with a 5% level of significance. Here $H_0$ is $\mu = 300$, and $H_1$ is $\mu \neq 300$. Our test statistic is $Z = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$, with $\mu_0 = 300$, $\sigma = 50$ and $n = 100$. From the sample, $\bar{X} = 285$. Hence

$$Z = \frac{285 - 300}{50/\sqrt{100}} = \frac{-15}{5} = -3.$$

Given a 5% significance level, the critical values of the $Z$ statistic are ±1.96. Our decision rule is to accept $H_0$ if $-1.96 < Z < 1.96$, and reject $H_0$ otherwise. Hence, we reject $H_0$, and conclude that $\mu \neq 300$. In fact, we may conclude that the average household income in this town is significantly less than the national average, at the 5% (or indeed at the 1%) level of significance.

Two-tailed and one-tailed tests

The example above involved a two-tailed test of significance; that is, we were trying to see if $\bar{X}$ was significantly higher or significantly lower than $\mu_0$, so $H_1$ was specified as $\mu \neq \mu_0$. In a one-tailed test, the alternative hypothesis is $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$. This would be appropriate if we had some a priori reason to believe that we were likely to find a difference in a particular direction. For example, if we were trying to see if graduates have the same income as the rest of the population, we might use a one-tailed test, as we would naturally assume that graduates tend to enjoy a higher income, so $H_1$ would be that $\mu > \mu_0$, where $\mu$ is graduate average income and $\mu_0$ is the average for the whole population.

When we use a one-tailed test, the critical value of $Z$ is different. For example, at the 5% level of significance, we would use a critical value for $Z$ of 1.645 instead of ±1.96, since $P(Z > 1.645 \mid H_0) = 5\%$. (Hence ±1.645 is the 10% critical value for a two-tailed test, since $P(Z < -1.645 \mid H_0)$ is also 5%, so we have 5% in each tail.) If our alternative hypothesis were $\mu < \mu_0$, then our critical value would be $Z = -1.645$, and we would reject $H_0$ if $Z$ falls below this.
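The decision rules above, the household income example and the power of the test can be sketched in a few lines of Python using only the standard library's `statistics.NormalDist` (Python 3.8+). The helper names `critical_value`, `z_test_mean` and `power_two_tailed` are illustrative, not part of any library, and the true mean of 290 used for the power calculation is a hypothetical value chosen only to show how $1 - \beta$ is obtained.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def critical_value(alpha: float, tails: int = 2) -> float:
    """Positive critical value of Z for significance level alpha.

    A two-tailed test splits alpha equally between the two tails;
    a one-tailed test puts all of alpha in a single tail.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return STD_NORMAL.inv_cdf(1 - alpha / tails)


def z_test_mean(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Z = (xbar - mu0) / (sigma / sqrt(n)), assuming sigma is known."""
    return (xbar - mu0) / (sigma / sqrt(n))


def power_two_tailed(mu_true: float, mu0: float, sigma: float, n: int,
                     alpha: float = 0.05) -> float:
    """Power (1 - beta) of the two-tailed test when the true mean is mu_true."""
    c = critical_value(alpha, tails=2)
    shift = (mu_true - mu0) / (sigma / sqrt(n))  # mean of Z when mu = mu_true
    # Under mu_true, Z ~ N(shift, 1); H0 is rejected when Z < -c or Z > c.
    return STD_NORMAL.cdf(-c - shift) + (1 - STD_NORMAL.cdf(c - shift))


# Household income example: mu0 = 300, sigma = 50, n = 100, sample mean 285.
z = z_test_mean(xbar=285, mu0=300, sigma=50, n=100)
for alpha in (0.10, 0.05, 0.01):
    c = critical_value(alpha)
    decision = "reject H0" if abs(z) > c else "accept H0"
    print(f"alpha = {alpha:.2f}: Z = {z:.2f}, critical values ±{c:.3f} -> {decision}")

print(f"one-tailed 5% critical value: {critical_value(0.05, tails=1):.3f}")

# Hypothetical: power if the town's true mean were 290 rather than 300.
print(f"power at mu = 290: {power_two_tailed(290, 300, 50, 100):.3f}")
```

Since $|Z| = 3$ exceeds the 10%, 5% and 1% critical values, $H_0$ is rejected at all three levels, in line with the example. Setting `mu_true` equal to `mu0` makes the "power" collapse to $\alpha$, which is a quick sanity check of the formula.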

[Figure: 1-tailed vs. two-tailed test. Standard normal density $f(z)$ with critical values −1.96, 1.645 and 1.96 marked on the $Z$ axis.]

Proportions

The procedure and rationale for testing hypotheses about population proportions are similar to those used for means. They are based on the normal distribution and apply to large samples, $n \geq 30$. The null hypothesis is specified in terms of the population proportion $P$, and the sample proportion, $p$, and the standard error, $SE(p) = \sqrt{P(1-P)/n}$, are used in the test statistic. For example, suppose we wish to test the null hypothesis that the proportion of households in a certain town with at least one wage-earner is 0.85. We have a random sample of 100 households, and the proportion of the sample with at least one wage-earner is $p = 0.81$. We have $H_0: P = P_0 = 0.85$ and $H_1: P \neq 0.85$, and

$$Z = \frac{p - P_0}{\sqrt{P_0(1 - P_0)/n}} = \frac{0.81 - 0.85}{\sqrt{0.85 \times 0.15/100}} = \frac{-0.04}{0.0357} = -1.12.$$

Note that we use the standard error calculated from the population proportion based on the null hypothesis. This is because we are trying to say: if the null hypothesis were true, how likely would it be to get this

much difference between the sample proportion and the population proportion? So we consider the probability distribution of the test statistic that would apply if the null hypothesis were true. As $|Z| < 1.96$, the 5% two-tailed critical value of the $Z$ statistic, we cannot reject $H_0$; in other words, the sample proportion is not significantly different from 0.85 (at the 5% level). We therefore accept $H_0$.

Difference between two sample means

So far we have made inferences on a single sample. Now we shall make inferences from two samples. Typically we shall have two random samples from two populations, and we shall be making inferences about the difference between the means of the two populations using the difference between the two sample means. For example, we may be interested in testing whether boys are achieving significantly different results in school than girls. To be able to answer such a question, we first need to study the sampling distribution of the difference between two sample means.

If a random sample of size $n_1$ is taken from one population with mean $\mu_1$ and variance $\sigma_1^2$, and another random sample of size $n_2$ is taken from another population with mean $\mu_2$ and variance $\sigma_2^2$, the difference between the two sample means is defined as $d = \bar{X}_1 - \bar{X}_2$. Here $\bar{X}_1$ and $\bar{X}_2$ are random variables, because they vary from one pair of samples to another, and they are independent, because the samples are drawn independently, so changes in $\bar{X}_1$ are not influenced by changes in $\bar{X}_2$ and vice versa. Then

$$E(d) = E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2 = D,$$

i.e. the sample difference $d$ is an unbiased estimator of the population difference $D$. Since $\bar{X}_1$ and $\bar{X}_2$ are independent,

$$\operatorname{Var}(d) = \operatorname{Var}(\bar{X}_1 - \bar{X}_2) = \operatorname{Var}(\bar{X}_1) + \operatorname{Var}(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
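The one-proportion test and the variance formula for $d$ can both be checked numerically. The Python sketch below (standard library only) first recomputes the wage-earner $Z$ statistic, then simulates many pairs of independent samples to compare the empirical mean and variance of $d$ with $\mu_1 - \mu_2$ and $\sigma_1^2/n_1 + \sigma_2^2/n_2$. The normal populations, their parameters and the sample sizes in the simulation are hypothetical values chosen for illustration, and the function names are illustrative only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, pvariance


def z_test_proportion(p_hat: float, p0: float, n: int) -> float:
    """Z = (p - P0) / sqrt(P0 (1 - P0) / n); the SE uses P0 from H0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Wage-earner example: P0 = 0.85, p = 0.81, n = 100.
z = z_test_proportion(0.81, 0.85, 100)
crit = NormalDist().inv_cdf(0.975)  # two-tailed 5% critical value (about 1.96)
print(f"Z = {z:.2f}; reject H0 at 5%? {abs(z) > crit}")


def simulate_d(mu1, sigma1, n1, mu2, sigma2, n2, reps=5_000, seed=1):
    """Return `reps` simulated values of d = xbar1 - xbar2 from independent normal samples."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    ds = []
    for _ in range(reps):
        xbar1 = fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        ds.append(xbar1 - xbar2)
    return ds


# Hypothetical populations: theory gives E(d) = 50 - 45 = 5 and
# Var(d) = 10**2/40 + 6**2/30 = 2.5 + 1.2 = 3.7.
ds = simulate_d(mu1=50, sigma1=10, n1=40, mu2=45, sigma2=6, n2=30)
print(f"simulated E(d) = {fmean(ds):.2f}, simulated Var(d) = {pvariance(ds):.2f}")
```

The simulated mean and variance should land close to 5 and 3.7; the small remaining gap is Monte Carlo noise, which shrinks as `reps` grows.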

The standard error of $d$ is given by

$$SE(d) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},$$

which shows that the larger the two variances and the smaller the sample sizes, the larger will be the sampling error of $d$. If the two populations are normally distributed, then $\bar{X}_1$ and $\bar{X}_2$ are also normally distributed. Also, if both samples are large ($n_1, n_2 \geq 30$), then even if the populations are not normally distributed, the Central Limit Theorem ensures that $\bar{X}_1$ and $\bar{X}_2$ will be approximately normally distributed. If either of these is true, then $d$ will also be (approximately) normally distributed, as the difference between two independent normal variables. Thus,

$$d = \bar{X}_1 - \bar{X}_2 \sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$

The confidence interval for the difference between the population means can now be easily calculated. The 95% confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{X}_1 - \bar{X}_2) \pm 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

An interval calculated in this way will contain the true population difference in 95% of samples. The hypothesis test for the population difference can also be performed in the usual manner. Let $H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 \neq 0$. The test statistic is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}},$$

and the decision rule, for a 5% significance level, will be to reject $H_0$ if $|Z| > 1.96$, and otherwise accept $H_0$.

Example

A school wants to find out if there is a difference in test performance between boys and girls. A sample of test scores of 60 boys and 50 girls is

examined. It is found that the boys have sample mean $\bar{X}_1 = 54$ with standard deviation 14, and the girls have sample mean $\bar{X}_2 = 60$ with standard deviation 9. NB: we shall ignore for now the problem of estimating the population standard deviations, and assume these figures are correct. We set up $H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 \neq 0$. Our test statistic is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} = \frac{54 - 60}{\sqrt{14^2/60 + 9^2/50}} = \frac{-6}{\sqrt{4.887}} = \frac{-6}{2.211} = -2.71.$$

As usual, for a 5% level of significance on a two-tailed test, our critical values for $Z$ are ±1.96. Since $|Z| = 2.71 > 1.96$, we reject the null hypothesis: girls are doing significantly better than boys at the 5% level.

Difference between two sample proportions

This can be tested in a similar manner.

Exercise

Two different teaching methods are tried with different groups of students on the same course. In the first group, 47 out of 63 students pass. In the second group, 66 out of 78 pass. The department wants to work out whether one teaching method is significantly better than the other. Formulate suitable null and alternative hypotheses, and calculate a suitable test statistic, to test this.
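The two-sample $Z$ test and the 95% confidence interval for $\mu_1 - \mu_2$ from the boys/girls example can be sketched in Python as follows. Like the example, it treats the sample standard deviations as known population values. `two_sample_z` is a hypothetical helper name, not a library function.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, sigma1, n1, xbar2, sigma2, n2, alpha=0.05):
    """Two-sided Z test of H0: mu1 - mu2 = 0, treating sigma1 and sigma2 as known.

    Returns (z, (ci_low, ci_high), reject), where the interval is the
    (1 - alpha) confidence interval for mu1 - mu2.
    """
    d = xbar1 - xbar2
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # SE(d): square root of Var(d)
    z = (d - 0) / se
    c = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z, (d - c * se, d + c * se), abs(z) > c


# Boys/girls example: boys mean 54, sd 14, n = 60; girls mean 60, sd 9, n = 50.
z, ci, reject = two_sample_z(54, 14, 60, 60, 9, 50)
print(f"Z = {z:.2f}, 95% CI for mu1 - mu2 = ({ci[0]:.2f}, {ci[1]:.2f}), reject H0: {reject}")
```

For these figures the 95% interval lies entirely below 0, which agrees with rejecting $H_0$ at the 5% level; in general the two-sided test at level $\alpha$ rejects exactly when the $(1 - \alpha)$ interval excludes 0.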