Introduction to Hypothesis Testing

Decision Examples

                      TRUE STATE
DECISION              Innocent            Guilty
Declare innocent      correct decision    ERROR
Declare guilty        ERROR               correct decision

How can the jury avoid convicting an innocent person? Freeing a guilty person?
Is one kind of error worse than another?
What does the instruction "Innocent unless the evidence proves guilt beyond a reasonable doubt" suggest about how our system, in theory, balances the two?

Decision Examples

                              TRUE STATE
DECISION                      No prostate cancer    Prostate cancer
Decide no prostate cancer     correct decision      ERROR
Biopsy                        ERROR                 correct decision

The evidence comes from a painless blood test, the PSA.
A PSA of 4.0 or less is considered normal, cancer-free.
A PSA above 4.0 can be caused by infection or by cancer of the prostate.
Urologists disagree on how much above 4.0, or for how long above 4.0, the PSA should be to call for a biopsy, and on how the age of the patient should influence the decision.
Note that the disagreements are about the decision criterion and the relative costs of the two types of errors.

What would you do if you wanted to determine whether a two-sided coin is fair? You'd probably flip it a bunch of times to see if about 1/2 the time it's heads and 1/2 the time it's tails. You might also set a criterion by which it would be considered unfair. For example, you might suggest that out of flips if there are 9 or more heads or tails the coin is unfair. This scenario is a simple hypothesis test. Using what is known about probabilities and sampling distributions, even more precise tests may be developed.

As researchers, we need to decide at what point we believe the coin is unfair. A typical guideline is to call anything within the middle 95% of the distribution fair, while the upper and lower 2.5% would be unfair.

[Figure: distribution with a central "fair" region (area = 95%) and an "unfair" critical region in each tail (area = 2.5% each).]

Number of Heads    Probability
12                 .0002
11                 .0029
10                 .0161
 9                 .0537
 8                 .1208
 7                 .1934
 6                 .2256
 5                 .1934
 4                 .1208
 3                 .0537
 2                 .0161
 1                 .0029
 0                 .0002

[Figure: histogram of the probability p of each number of heads (# heads).]

Using the addition rule of probability, notice that the probability of 0, 1, or 2 heads out of 12 is < .025, or 2.5%. The same is true for 10, 11, or 12 heads.

Hypothesis Testing

Definition: An inferential procedure that uses sample data to evaluate a hypothesis about a population.
Hypothesis testing involves a standardized set of procedures so a researcher can objectively evaluate a hypothesis.
The process starts with a research question: how will the population mean change after a treatment (independent variable) is administered?
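The coin-flip probabilities can be checked with a few lines of Python. This is a minimal sketch, assuming 12 flips of a fair coin (the case the probability table describes). The function name binomial_pmf and the variable names are illustrative choices, not part of the original slides.

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_flips = 12  # assumption: the table covers 12 flips of a fair coin
pmf = [binomial_pmf(k, n_flips) for k in range(n_flips + 1)]

# Addition rule: the probability of 0, 1, or 2 heads is the sum of the three.
lower_tail = sum(pmf[:3])    # 0, 1, or 2 heads
upper_tail = sum(pmf[-3:])   # 10, 11, or 12 heads

for k in range(n_flips, -1, -1):
    print(f"{k:2d} heads: {pmf[k]:.4f}")
print(f"P(0, 1, or 2 heads)    = {lower_tail:.4f}")
print(f"P(10, 11, or 12 heads) = {upper_tail:.4f}")
```

Each tail comes out below .025, so under this sketch the two tails together hold less than 5% of the distribution.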

Hypothesis Testing: The Steps

1. State the hypotheses: null & alternative
2. Set the criterion
3. Obtain sample data
4. Calculate the test statistic
5. Decide to reject or fail to reject the null hypothesis and interpret your decision

1. State the hypotheses

The null hypothesis, H0, is always the hypothesis that states that there is no treatment effect, no change, no difference, etc.
The alternative hypothesis, H1, states that there was a treatment effect, usually in terms of the independent variable (I.V.) having an effect on the dependent variable (D.V.).
Hypotheses are always stated in terms of populations. Remember, even though samples are used, the goal of inferential statistics is to make statements about the population of interest.

1. State the hypotheses (cont.)

Null Hypothesis H0
For example, suppose a researcher wanted to know what effect smoking marijuana has on reaction time. Knowing the population mean on this particular reaction time instrument is . seconds, the hypothesis can be set up as
H0: µ = . sec

[Figure: Control Group]

1. State the hypotheses (cont.)

Alternative Hypothesis H1
When the direction of the effect is not known, the alternative hypothesis is stated in terms of inequality:
H1: µ ≠ . sec
There are instances, based on theory or previous research, when the alternative hypothesis is stated in terms of direction. For example, based on previous research, it is known that smoking marijuana increases the amount of time it takes to react:
H1: µ > . sec

1. State the hypotheses (cont.)

Notice in the previous example that the null hypothesis, H0, still maintains equality. This should always be the case. Therefore,
H0: µ = . sec
H1: µ > . sec

2. Set the criterion

Referring back to the example of flipping the coin, setting the criterion, α, is the statistical equivalent of deciding at what point the coin is unfair. As was already mentioned, the middle 95% is usually considered fair. In this example, the remaining 5% would be considered error; therefore the criterion is α = .05.

[Figure: distribution with area = 95% in the middle and area = 2.5% in each tail.]

2. Set the criterion (cont.)

The criterion, α, is also known as the probability of a Type I error. Type I error is defined as the probability of rejecting a true null hypothesis. That is to say, if the null hypothesis is true, there is a predetermined chance (usually 5%) that we will wrongly reject it. Errors will be discussed in detail later on.

2. Set the criterion (cont.)

The criterion delimits what is called the critical region. The critical region is defined as the extreme scores in a distribution where the probability of obtaining them is < α when the null hypothesis is true.

[Figure: Two-Tailed Test, with a critical region in each tail; One-Tailed Test, with a critical region in the upper tail.]
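One way to see what "the probability of rejecting a true null hypothesis" means is to simulate many experiments in which the null hypothesis really is true and count how often a two-tailed test at α = .05 rejects it. The sketch below is illustrative only. The sample size, population mean, population standard deviation, and the use of σ/√n as the standard error of the mean are assumptions made for the simulation, not values from the slides.

```python
import random
from statistics import NormalDist

def type_i_error_rate(n_experiments=20000, n=25, mu0=0.0, sigma=1.0,
                      alpha=0.05, seed=0):
    """Fraction of simulated experiments that reject a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    standard_error = sigma / n ** 0.5             # assumed: sigma / sqrt(n)
    rejections = 0
    for _ in range(n_experiments):
        # Draw a sample from a population whose mean really is mu0.
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / standard_error
        if abs(z) > z_crit:  # test statistic falls in the critical region
            rejections += 1
    return rejections / n_experiments

rate = type_i_error_rate()
print(f"Proportion of true nulls rejected: {rate:.3f}")
```

The printed proportion should land close to .05, because the critical region was built to hold 5% of the distribution when the null hypothesis is true.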

2. Set the criterion (cont.)

As was previously mentioned, the unit normal table can be used to calculate area proportions above or below a score or scores in a distribution corresponding to a given percentage.
Example: Find the z-scores associated with the upper and lower scores when considering 95% of a normal distribution.
Upper and lower scores mean a two-tailed test, so α should be divided by 2 before looking up the z-score:
α/2 = .05/2 = .025

2. Set the criterion (cont.)

[Figure: normal distribution with p = .025 in each tail.]

In Appendix D: Table A, look for p = .025 in the "area beyond z" column. The z-score is 1.96. Since it's a two-tailed test, z = +/-1.96.
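The table lookup can also be done in software. This is a minimal sketch using Python's standard library, assuming α = .05 and a two-tailed test. The variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05                 # assumed criterion
tail_area = alpha / 2        # two-tailed test: split alpha between the tails

# inv_cdf returns the z-score with the given area BELOW it, so the upper
# cutoff is the z-score with 1 - tail_area below it.
z_upper = NormalDist().inv_cdf(1 - tail_area)
z_lower = NormalDist().inv_cdf(tail_area)

print(f"area in each tail = {tail_area}")
print(f"critical z-scores = {z_lower:.2f} and +{z_upper:.2f}")
```

The result agrees with the unit normal table to two decimal places.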

3. Obtain sample data

After manipulating the independent variable as per your hypothesis, collect sample data. Use descriptive statistics to see what your data look like.

4. Calculate the test statistic

One of the challenges you will face is deciding which test statistic to use. You will learn what each one is used for as the class progresses.

5. Decide to reject or fail to reject

If the test statistic falls in the critical region, the null hypothesis is rejected. If the test statistic does not fall in the critical region, the null hypothesis is NOT rejected.

[Figure: two distributions, one with the test statistic inside the critical region and one with the test statistic outside it.]

Notice that no statements are made about the alternative hypothesis.

Caveat: hypothesis testing does not prove anything. This is particularly true of the alternative hypothesis. The reason probability statements are not made about the alternative hypothesis is that there still might be other alternative hypotheses. Phrases such as "supports the theory" and "provides evidence to suggest" are common ways of describing research findings.

Example: Suppose I am interested in determining whether or not review sessions have any effect on exam performance. I will administer the independent variable, a review session, to a sample of students in an attempt to determine if this has an effect on the dependent variable, exam performance. Based on information gathered in previous semesters, I know that the population mean for a given exam is 4.

Step 1: State the hypotheses

A researcher always states two opposing hypotheses.
NULL HYPOTHESIS: States that the treatment has no effect (there is no change, no difference, nothing happened). The null hypothesis is always written as H0.
Example: H0: µ = 4 (even with the review session, the mean exam score is 4)
µ represents the hypothesized population mean for students having review sessions.

Step 1: State the hypotheses (cont)

ALTERNATIVE HYPOTHESIS: Predicts that the independent variable will have an effect on the dependent variable (this is the hypothesis the researcher roots for). The alternative hypothesis is written as H1 or HA. We'll use H1.
Example: H1: µ ≠ 4
µ represents the hypothesized population mean for students having review sessions. The true population mean for these students may be higher or lower than 4.

Step 1: State the hypotheses (cont)

Hypotheses:
H0: µ = 4
H1: µ ≠ 4
The task is to choose between these two hypotheses. The null hypothesis is the hypothesis that is actually tested (we can only test one distribution at a time). The null hypothesis states that the mean for the review population will be 4, the same as the untreated, previous population.

Step 2: Setting the criterion

Our decision is going to be based on a comparison of our sample mean and the hypothesized population mean: X̄ compared to µ.
Small discrepancy: fail to reject the null hypothesis
Large discrepancy: reject the null hypothesis
How far away does our sample mean need to be from the hypothesized mean in order to tell if the effect is due to our manipulation or just sampling error? The process of answering this question involves establishing an alpha level.

Step 2: Setting the criterion (cont)

ALPHA LEVEL (LEVEL OF SIGNIFICANCE): An area under the curve that we use to define "very unlikely" or "very extreme" sample values.
Alpha is symbolized as α. By convention, α is usually set at .05, .01, or .001.
The alpha level is used to split the distribution into two sections:
Sample means that are compatible with the null hypothesis (the center of the distribution)
Sample means that are significantly different from the null hypothesis (the very unlikely values that fall in the tails of the distribution)

[Figure: distribution with a central region labeled "Compatible H0" and tails labeled "Incompatible H0".]

Step 2: Setting the criterion (cont)

If alpha is set at α = .05, then the extreme 5% of scores in the sampling distribution would represent those "extreme" or "unlikely" sample values. This extreme region of the distribution that we define with α is called the critical region.
If we set α to .05 for our example, this would mean that if our sample mean falls in the critical region, we would believe that the mean of the population of the review group is not 4, the mean of the non-review group. It is something larger or smaller, depending on which tail it falls in.

[Figure: distribution with a 2.5% critical region in each tail.]

Step 2: Setting the criterion (cont)

Directional vs. Non-directional Hypotheses (One-Tailed vs. Two-Tailed)
TWO-TAILED HYPOTHESIS TEST (NON-DIRECTIONAL): The alternative hypothesis does not specify the direction of change in the mean; all that is predicted is that some change will occur.
Example: Do review sessions have any effect on exam performance?
H0: µ = 4
H1: µ ≠ 4
Sample values that are substantially different (either larger or smaller) from the hypothesized population mean would lead to a rejection of the null hypothesis.

Step 2: Setting the criterion (cont)

Directional vs. Non-directional Hypotheses (One-Tailed vs. Two-Tailed)
ONE-TAILED HYPOTHESIS TEST (DIRECTIONAL): The alternative hypothesis specifies either an increase or a decrease in the mean due to treatment; a specific prediction about the direction of change is made.
Example: Do review sessions improve exam performance?
H0: µ ≤ 4
H1: µ > 4
Only sample values substantially larger than 4 would lead to a rejection of the null hypothesis.

Step 2: Setting the criterion (cont)

Effects on Alpha: By convention, alpha is most often set at .05.
For a two-tailed test, alpha must be divided between the two tails (.025 in each tail of the distribution).
For a one-tailed test, all of the alpha amount is found in one tail (.05).

[Figure: Two-Tailed Test with .025 in each tail; One-Tailed Test with .05 in the upper tail.]
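Because a one-tailed test puts all of alpha in one tail, its cutoff sits closer to the center than the two-tailed cutoff. The sketch below shows how the same test statistic can fall inside the critical region for one kind of test and outside it for the other. The function name in_critical_region, its tail labels, and the value z = 1.80 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def in_critical_region(z, alpha=0.05, tail="two"):
    """Return True if z falls in the critical region of a z test.

    tail: "two"   -> alpha/2 in each tail
          "upper" -> all of alpha in the upper tail
          "lower" -> all of alpha in the lower tail
    """
    std_normal = NormalDist()
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "upper":
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        return z < std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

z = 1.80  # hypothetical test statistic
two_tailed_reject = in_critical_region(z, tail="two")
one_tailed_reject = in_critical_region(z, tail="upper")
print(f"two-tailed: reject H0? {two_tailed_reject}")
print(f"one-tailed (upper): reject H0? {one_tailed_reject}")
```

With α = .05 the two-tailed cutoffs are about +/-1.96 and the upper one-tailed cutoff is about 1.645, so z = 1.80 is rejected only by the one-tailed test.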

Step 3: Obtain sample data

In order to ensure that the researcher makes an objective decision, the data are collected after the researcher has stated the hypotheses and set the alpha level.
Our hypothesis is that the review session will improve test scores. Thus, we should use a one-tailed test, α = .05.
EXAMPLE A: X̄ = 8, σ_X̄ = 2.29
EXAMPLE B: X̄ = 8, σ_X̄ = 2.67

Step 4: Calculate the test statistic

z = (X̄ - µ) / σ_X̄
EXAMPLE A: z = (8 - 4) / 2.29 = 1.75
EXAMPLE B: z = (8 - 4) / 2.67 = 1.50
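The z formula is simple enough to sketch as a small function. The inputs below are assumptions made for illustration: a sample mean 4 points above the hypothesized mean, with standard errors of 2.29 (Example A) and 2.67 (Example B). The function name z_statistic is likewise illustrative.

```python
def z_statistic(sample_mean, hypothesized_mean, standard_error):
    """z = (sample mean - hypothesized population mean) / standard error."""
    return (sample_mean - hypothesized_mean) / standard_error

# Assumed inputs: the same 4-point difference, two different standard errors.
z_a = z_statistic(sample_mean=8, hypothesized_mean=4, standard_error=2.29)
z_b = z_statistic(sample_mean=8, hypothesized_mean=4, standard_error=2.67)

print(f"Example A: z = {z_a:.2f}")
print(f"Example B: z = {z_b:.2f}")
```

The same difference between the sample mean and the hypothesized mean gives a smaller z when the standard error is larger, which is why the two examples can lead to different decisions.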

Step 5: Evaluate the null hypothesis

In the final step, you compare your sample data to the null hypothesis and make a decision. There are 2 possible decisions:
1. Reject the null hypothesis: if our sample mean is substantially different from what the null hypothesis predicts (if the sample mean falls in the critical region)
2. Fail to reject the null hypothesis: if our sample mean is not substantially different from what the null hypothesis predicts (does not fall in the critical region)

Step 5: Evaluate the null hypothesis (cont)

1) Reject the null hypothesis: The sample mean provides evidence that the treatment had an effect. Findings are considered statistically significant when the null hypothesis is rejected.
EXAMPLE A: In Appendix D: Table A, look up the p value for z = 1.75. Which column should you look at, B or C? Is the p value less than or greater than alpha? Did the treatment have an effect? Was it statistically significant?
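As a check on the table lookup, the upper-tail area beyond a z-score can be computed directly. This is a hedged sketch, assuming a one-tailed (upper) test at α = .05 and z-scores of 1.75 and 1.50 for Examples A and B. The helper name upper_tail_p is illustrative.

```python
from statistics import NormalDist

ALPHA = 0.05  # assumed one-tailed criterion

def upper_tail_p(z):
    """Area under the standard normal curve beyond z (upper tail)."""
    return 1.0 - NormalDist().cdf(z)

results = {}
for label, z in {"A": 1.75, "B": 1.50}.items():  # assumed z-scores
    p = upper_tail_p(z)
    results[label] = (p, p < ALPHA)
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"Example {label}: z = {z:.2f}, p = {p:.4f} -> {decision}")
```

Under these assumptions the tail area for z = 1.75 is smaller than alpha and the tail area for z = 1.50 is larger, which matches the two possible decisions described in Step 5.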

Step 5: Evaluate the null hypothesis (cont)

2) Fail to reject the null hypothesis: Findings are considered statistically nonsignificant when we fail to reject the null hypothesis.
EXAMPLE B: In Appendix D: Table A, look up the p value for z = 1.50. Which column should you look at, B or C? Is the p value less than or greater than alpha? Did the treatment have an effect? Was it statistically significant?

Type I & Type II error

The fifth step of hypothesis testing is deciding to reject or fail to reject the null hypothesis. When this decision is made, one of two things is possible: either you are right or you are wrong.

                      TRUE STATE
DECISION              H0                            H1
Do not reject H0      correct decision, p = 1-α     Type II error, p = β
Reject H0             Type I error, p = α           correct decision, p = 1-β

Type I & Type II error

Type I error, α (alpha), is defined as the probability of rejecting a true null hypothesis.
Type II error, β (beta), is defined as the probability of failing to reject a false null hypothesis.

                      TRUE STATE
DECISION              H0                            H1
Do not reject H0      correct decision, p = 1-α     Type II error, p = β
Reject H0             Type I error, p = α           correct decision, p = 1-β

Type I & Type II error analogy

Consider a court case:
H0: not guilty
H1: guilty

                      TRUE STATE
DECISION              not guilty          guilty
not guilty            correct decision    Type II error
guilty                Type I error        correct decision

A Type I error would occur if a jury convicted an innocent person.
A Type II error would occur if a jury let a guilty man walk.
Our justice system sets the probability of a Type I error to "beyond a reasonable doubt," just as researchers set it to .05, .01, etc.

Type I & Type II error

Example of a Type I error: A researcher concludes that a certain drug treatment significantly decreases the possibility of heart disease when, in fact, it doesn't.
Example of a Type II error: A researcher concludes that a certain drug does not significantly decrease overactive behavior in children when, in fact, it does.

                                  TRUE STATE
DECISION                          NO decrease in heart disease    decrease in heart disease
NO decrease in heart disease      correct decision                Type II error
decrease in heart disease         Type I error                    correct decision
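The two error probabilities trade off against each other, much as the jury example suggests. The sketch below computes β for an upper-tailed z test. It assumes the true population mean sits a known number of standard errors above the hypothesized mean. That shift, the function name type_ii_error, and the example values are all hypothetical, chosen only to illustrate the trade-off.

```python
from statistics import NormalDist

def type_ii_error(true_shift_in_se, alpha=0.05):
    """Beta for an upper-tailed z test.

    true_shift_in_se: (true mean - hypothesized mean) / standard error,
    i.e. how far the real sampling distribution sits above the null one.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # cutoff set under H0
    # We fail to reject whenever z < z_crit. When the effect is real,
    # z is centered on true_shift_in_se rather than on 0.
    return std_normal.cdf(z_crit - true_shift_in_se)

for shift in (1.0, 2.0, 3.0):  # hypothetical effect sizes
    beta = type_ii_error(shift)
    print(f"shift = {shift:.1f} SE: beta = {beta:.3f}, 1 - beta = {1 - beta:.3f}")

# A stricter alpha makes Type I errors rarer but Type II errors more likely.
beta_05 = type_ii_error(2.0, alpha=0.05)
beta_01 = type_ii_error(2.0, alpha=0.01)
print(f"shift = 2.0 SE: beta at alpha .05 = {beta_05:.3f}, at alpha .01 = {beta_01:.3f}")
```

In this sketch, larger true effects give smaller β, and lowering α from .05 to .01 raises β for the same effect.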