Introduction to Hypothesis Testing

Hypothesis Testing (slide 2)

A hypothesis test is a statistical procedure that uses sample data to evaluate a hypothesis about a population.
- The hypothesis is stated in terms of the population.
- Predict sample statistics based on population parameters (e.g. µ).
- Select a random sample from the population.
- Compare the observed sample data with the predicted values.

Step 1: State the Hypotheses (slide 3)

- The null hypothesis, H₀, states that in the population there is no change, no difference, or no relationship.
- $H_0: \mu_{\text{treatment}} = \text{constant}$ (e.g. µ), for example $H_0: \mu_{\text{treatment}} = 100$.
- This is read as: "The null hypothesis is that the population mean of people receiving the treatment equals 100."
- H₀ is that the treatment had no effect.

H₀ (slide 4)

- The null hypothesis must contain an equal sign of some sort (=, ≤, ≥).
- Statistical tests are designed to reject H₀, never to accept it.

H₁: The Alternative Hypothesis (slide 5)

- The alternative hypothesis usually takes the following form: $H_1: \mu_{\text{treatment}} \neq \text{constant}$ (e.g. µ), for example $H_1: \mu_{\text{treatment}} \neq 100$.
- This is read as: "The alternative hypothesis states that the population mean of people receiving the treatment does not equal 100."
- H₁ is that the treatment had an effect.

H₀ and H₁ (slide 6)

- Together, the null and alternative hypotheses must be mutually exclusive and exhaustive.
- Mutually exclusive means that H₀ and H₁ cannot both be true at the same time.
- Exhaustive means that every possible outcome of the experiment must make either H₀ or H₁ true.

Step 2: Set the Decision Criteria (slide 7)

- Which sample means are consistent with H₀, and which are consistent with H₁?
- Separate the distribution of sample means into two sets of regions: one whose means are consistent with H₀ and one whose means are consistent with H₁.

[Figure: distribution of sample means for n = 25, µ = 100, σ = 15, plotted from 90 to 110. Sample means close to the H₀ value are high-probability values if H₀ is true; both tails hold extreme, low-probability values if H₀ is true.]

α Level (slide 8)

- The α level (alpha level; level of significance) is a probability value used to define the very unlikely sample outcomes if H₀ is true.
- Psychologists usually adopt α = 0.05, although α = 0.01 and α = 0.001 are sometimes used.
- The critical region is composed of the extreme sample values that are very unlikely (as specified by the α level) to be obtained if H₀ is true.

Critical Regions (slide 9)

- Since we can reject H₀ in two ways (extremely small or extremely large sample means), the α level is divided across the two tails of the distribution.
- Find the z-score whose area above equals α / 2: z = 1.96 for α = 0.05.
- Find the raw scores that correspond to that z-score, using the standard error σ/√n = 15/√25 = 3:
  $X = 100 + 1.96 \times 3 = 105.9$
  $X = 100 - 1.96 \times 3 = 94.1$

[Figure: the same distribution of sample means, plotted from 90 to 110, with the critical regions marked below z = -1.96 and above z = 1.96.]
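The critical-region arithmetic on slide 9 can be checked with a short Python sketch. It assumes the slide 7 values (µ = 100, σ = 15, n = 25) and uses the standard library's `statistics.NormalDist` in place of a z table. The function name `critical_region` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def critical_region(mu0, sigma, n, alpha=0.05):
    """Return (z_crit, lower, upper) for a two-tailed z test.

    Sample means below `lower` or above `upper` fall in the critical region.
    """
    standard_error = sigma / sqrt(n)
    # alpha is split across both tails, so look up the z with alpha / 2 above it.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mu0 - z_crit * standard_error
    upper = mu0 + z_crit * standard_error
    return z_crit, lower, upper


z_crit, lower, upper = critical_region(mu0=100, sigma=15, n=25)
print(f"z = +/-{z_crit:.2f}; critical region: below {lower:.1f} or above {upper:.1f}")
```

With these inputs the cutoffs round to 94.1 and 105.9, matching the slide.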

Step 3: Collect Data & Compute Sample Statistics (slide 10)

- Randomly sample from the population. In this example, n = 25.
- Give the sample the treatment.
- Measure the dependent variable.
- Calculate the z-score of the sample mean in the sampling distribution.
- In this example the sample statistics are: sample mean = 107, s = 14. The population parameters (IQ scores) are those from slide 7.

Step 4: Make a Decision (slide 11)

- If the sample mean's z-score is in the extreme tails of the sampling distribution (i.e. in the critical region), reject H₀; otherwise, fail to reject H₀.
- The critical region is z > 1.96 or z < -1.96 for α = 0.05.
- The example z is (107 - 100) / 3 = 2.33. It is in the critical region. Therefore, reject H₀.
- It is likely the case that the treatment had an effect.

[Figure: distribution of sample means, plotted from 90 to 110, with critical regions below z = -1.96 and above z = 1.96. The sample mean of 107 (z = 2.33) falls in the upper critical region.]

Reject H₀ or Fail to Reject H₀ (slide 12)

- The only decisions you ever make in hypothesis testing are "reject H₀" or "fail to reject H₀". No other decisions are possible.
- Never reject H₁.
- Never accept H₁.
- Never accept H₀.
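Steps 3 and 4 can be sketched in Python as follows, assuming the example's values (sample mean 107, µ = 100, σ = 15, n = 25). The function name `z_test` is hypothetical. The slide's decision rule needs only the critical region; the two-tailed p-value is an extra reported here.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z test; returns (z, p_value, decision)."""
    std_normal = NormalDist()
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of a z at least this extreme, in either tail, if H0 is true.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
    return z, p_value, decision


z, p_value, decision = z_test(sample_mean=107, mu0=100, sigma=15, n=25)
print(f"z = {z:.2f}, p = {p_value:.3f}: {decision}")
```

The only two strings the function can return are the two decisions that slide 12 allows.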

Type I (α) Error (slide 13)

- A Type I (or α) error occurs when a researcher rejects H₀ when H₀ is really true.
- The researcher concludes that the treatment had an effect when it did not.
- This should happen with a probability equal to α.

Type II (β) Errors (slide 14)

- A Type II (or β) error occurs when a researcher fails to reject H₀ when H₀ is really false.
- The researcher concludes that there is insufficient evidence to suggest that the treatment had an effect when in fact it does have an effect.
- This should happen with a probability equal to β.

β (slide 15)

- Unlike α, β is not directly set by the researcher.
- β depends on the sample size (n).
- β depends on how much the treatment affects the dependent variable.
- β depends on the variability of the data.
- β depends on α.
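A small simulation can illustrate the claim that Type I errors occur with probability α. The sketch below assumes normally distributed scores with µ = 100 and σ = 15, draws many samples of n = 25 for which H₀ is true, and counts how often a two-tailed z test rejects. The function name, trial count, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(mu0=100, sigma=15, n=25, alpha=0.05, trials=4000, seed=0):
    """Proportion of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # The "treatment" has no effect: scores come straight from the H0 population.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(f"Proportion of true-H0 experiments that rejected H0: {rate:.3f}")
```

The printed proportion should land close to 0.05, drifting slightly from run to run if the seed is changed.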

Type-I and Type-II Errors (slides 16-17)

- Ideally, we would like to minimize both Type-I and Type-II errors.
- This is not possible for a given sample size.
- When we lower the α level to minimize the probability of making a Type-I error, the β level will rise.
- When we lower the β level to minimize the probability of making a Type-II error, the α level will rise.

Factors that Influence a Hypothesis Test (slide 18)

- The size of the mean difference: the larger the mean difference is, the more likely you are to reject H₀.
- The variability of the scores: the more variable the scores are, the less likely you are to reject H₀.
- The number of scores in the sample: the larger the sample size, the more likely you are to reject H₀.
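The trade-off between α and β at a fixed sample size can be made concrete with a short calculation. The sketch below assumes a two-tailed z test and a true treatment effect of 3 points with σ = 15 and n = 25 (numbers borrowed from the deck's IQ examples purely for illustration). The function name `beta_two_tailed` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def beta_two_tailed(effect, sigma, n, alpha):
    """Probability of failing to reject H0 when the true mean is shifted by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How far the treatment distribution sits from the H0 mean, in standard errors.
    shift = effect / (sigma / sqrt(n))
    # A Type II error happens when z lands between -z_crit and +z_crit.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


betas = {
    alpha: beta_two_tailed(effect=3, sigma=15, n=25, alpha=alpha)
    for alpha in (0.05, 0.01, 0.001)
}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:<5}  beta = {beta:.3f}")
```

Each stricter α produces a larger β, which is the pattern the slide describes.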

Assumptions of the z-score Hypothesis Test (slide 19)

- Random sampling: if the sample is not selected randomly from the population, it probably will not represent the population.
- Independent observations.
- σ does not change as a result of the treatment.
- The distribution of sample means is normal.

Directional vs Non-Directional Hypotheses (slide 20)

- The hypotheses we have been talking about are called non-directional hypotheses because they do not specify how the population mean should differ from the constant.
- That is, they do not say that the population mean should be larger than the constant. They only state that the population mean should differ from the constant.
- Tests of non-directional hypotheses are sometimes called two-tailed tests.

Directional vs Non-Directional Hypotheses (slide 21)

- Directional hypotheses include an ordinal relation between the population mean and the constant.
- That is, they state that the population mean should be larger (or smaller) than the constant.
- For directional hypotheses predicting an increase, H₀ and H₁ are written as:
  $H_0: \mu_{\text{treatment}} \leq \text{constant}$
  $H_1: \mu_{\text{treatment}} > \text{constant}$
- Tests of directional hypotheses are sometimes called one-tailed tests.

1 Tailed (slide 22)

- When performing a one-tailed test, all of the critical region is in one tail of the distribution of sample means.
- Do not divide α by two when finding the z-score for the critical region.
- This increases statistical power, the probability of correctly rejecting a false H₀.

1 Tailed vs. 2 Tailed (slide 23)

[Figure: two standard normal curves with z running from -3 to 3. 1 Tailed: α = .05, critical region in one tail, beyond z = 1.65. 2 Tailed: α = .05, critical region in two tails, beyond z = -1.96 and z = 1.96.]

Concerns about Hypothesis Testing (slide 24)

- Hypothesis testing focuses on the data, and not the hypothesis. When we reject H₀, we should really say: "This specific sample mean is very unlikely (p < .05) if the null hypothesis is true."
- Statistical significance ≠ practical significance. The effect size can be small but still be statistically significant if the sample size is sufficiently large.
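The two critical values on slide 23 can be reproduced without a table. This sketch uses the standard library's `statistics.NormalDist`, and the helper name `critical_z` is made up here. The exact one-tailed value is about 1.645, which the slides round to 1.65.

```python
from statistics import NormalDist


def critical_z(alpha, tails):
    """Upper critical z for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Two-tailed tests split alpha across both tails; one-tailed tests do not.
    return NormalDist().inv_cdf(1 - alpha / tails)


one_tailed = critical_z(0.05, tails=1)
two_tailed = critical_z(0.05, tails=2)
print(f"one-tailed: z = {one_tailed:.3f}")
print(f"two-tailed: z = +/-{two_tailed:.3f}")
```

Because the one-tailed cutoff sits closer to the H₀ mean, a sample mean in the predicted direction needs to be less extreme to reach the critical region.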

Effect Size (slide 25)

- A measure of effect size is intended to provide a measurement of the absolute magnitude of a treatment effect, independent of the size of the sample(s) being used.
- Cohen's d is a measure of effect size.

Effect Size (slide 26)

What is the effect size for the example on slide 5?

Magnitude of d    Evaluation of effect size
d = 0.2           Small effect
d = 0.5           Medium effect
d = 0.8           Large effect

This is a small effect.

Statistical Power (slide 27)

- Statistical power is the probability that a statistical test will correctly reject a false H₀.
- It is the probability that a statistical test will identify a treatment effect if one really exists.
- $\text{Power} = 1 - \beta = 1 - P(\text{Type II error})$
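As a rough sketch, Cohen's d for a z-test setting is the mean difference divided by the population standard deviation. The slides do not show the formula or say which numbers the worked example uses, so the inputs below (a 3-point difference with σ = 15) are illustrative only. Treating the three benchmarks as lower cutoffs is also an assumption of this sketch, not something the slides state.

```python
def cohens_d(mean_difference, sigma):
    """Cohen's d: mean difference expressed in standard-deviation units."""
    return mean_difference / sigma


def describe_effect(d):
    """Label d using the 0.2 / 0.5 / 0.8 benchmarks as lower cutoffs (an assumption)."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "below the small benchmark"


d = cohens_d(mean_difference=3, sigma=15)
print(f"d = {d:.2f} ({describe_effect(d)} effect)")
```

Note that d does not involve n, which is what makes it independent of sample size.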

Statistical Power (slide 28)

- Calculate power before performing the study.
- Need to know or estimate:
  - How much the treatment changes the dependent variable
  - Sample size
  - α
  - σ, µ

Statistical Power Example (slide 29)

- How much the treatment changes the dependent variable: researchers hypothesize that having proper nutrition during the first two years will increase IQ by 3 points (notice this is one-tailed).
- µ = 100, σ = 15
- Sample size: n = 25
- α = .05

Distribution of Sample Means (slide 30)

- If the treatment has no effect, by the central limit theorem, the distribution of sample means will have:
  - a mean = population mean = 100
  - a standard deviation = $\sigma / \sqrt{n} = 15 / \sqrt{25} = 3$
- If the treatment has the hypothesized effect, the distribution of sample means will have:
  - a mean = population mean + effect of treatment = 100 + 3 = 103
  - a standard deviation = $\sigma / \sqrt{n} = 15 / \sqrt{25} = 3$
- Adding a constant to all scores does not change the standard deviation.
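The two distributions of sample means on slide 30 can be written down directly, as in the sketch below using the example's values. The list of five scores is hypothetical and is there only to show that adding a constant leaves the standard deviation unchanged.

```python
from math import sqrt
from statistics import NormalDist, pstdev

mu, sigma, n, effect = 100, 15, 25, 3

# Standard deviation of the distribution of sample means (the standard error).
standard_error = sigma / sqrt(n)

# Distribution of sample means if the treatment has no effect ...
no_treatment = NormalDist(mu, standard_error)
# ... and if the treatment adds the hypothesized 3 points to every score.
treatment = NormalDist(mu + effect, standard_error)

print(f"no treatment: mean = {no_treatment.mean}, sd = {no_treatment.stdev}")
print(f"treatment:    mean = {treatment.mean}, sd = {treatment.stdev}")

# Hypothetical scores: shifting every score by a constant moves the mean only.
scores = [85, 92, 100, 104, 119]
shifted = [score + effect for score in scores]
print(f"sd before shift = {pstdev(scores):.3f}, sd after shift = {pstdev(shifted):.3f}")
```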

z Score of Critical Region (slide 31)

- This is a one-tailed test with α = .05.
- Consult a table to find the z with an area above equal to .05: z = 1.65.

Statistical Power Example (slide 32)

[Figure: the no-treatment and treatment distributions of sample means on an axis from 91 to 115, with a z axis marked at 0, 1, 1.65, and 2.]

Statistical Power Example (slide 33)

- Power equals the area to the right of the z-score for the critical region under the treatment distribution of sample means.
- Areas to the right of the z-score for the critical region correspond to rejecting H₀.
- Areas under the treatment distribution of sample means correspond to a false H₀.
- Both combined correspond to rejecting a false H₀, which is power.

Statistical Power Example (slide 34)

- Find the z-score in the treatment distribution of sample means that is at the same location as the z-score for the critical region in the no-treatment distribution of sample means:
  $z_{\text{treatment}} = z_{\text{critical region}} - z_{\text{mean of treatment}}$
  $z_{\text{mean of treatment}} = (103 - 100) / 3 = 1$
  $z_{\text{treatment}} = 1.65 - 1 = 0.65$
- Power = area above z = 0.65
- Power = .26
- There is only about a 1 in 4 chance of observing this effect.

Factors that Influence Power (slide 35)

- Sample size: as sample size increases, power increases.
- α level: as α decreases (fewer Type I errors), β increases (more Type II errors), and 1 - β (power) decreases.
- Number of tails (directional vs non-directional): one-tailed tests have more statistical power than two-tailed tests. Can you explain why?
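The power calculation on slide 34, and the three factors on slide 35, can be explored with one small function. This is a sketch that assumes a z test with known σ and a treatment that raises the mean; the function name `power` and its parameters are chosen here for illustration. It uses the exact critical value (about 1.645) rather than the table value 1.65, so results can differ from the slide in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def power(effect, sigma, n, alpha=0.05, tails=1):
    """Probability that a z test rejects H0 when the mean truly rises by `effect`."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / tails)
    # Location of the treatment mean in the no-treatment distribution, in z units.
    z_mean_of_treatment = effect / (sigma / sqrt(n))
    # Area of the treatment distribution above the upper critical value.
    upper_area = 1 - std_normal.cdf(z_crit - z_mean_of_treatment)
    if tails == 1:
        return upper_area
    # A two-tailed test can also reject in the lower tail (a tiny area here).
    lower_area = std_normal.cdf(-z_crit - z_mean_of_treatment)
    return upper_area + lower_area


baseline = power(effect=3, sigma=15, n=25)
print(f"slide example (n = 25, alpha = .05, one-tailed): {baseline:.2f}")
print(f"larger sample (n = 100):                         {power(3, 15, 100):.2f}")
print(f"stricter alpha (alpha = .01):                    {power(3, 15, 25, alpha=0.01):.2f}")
print(f"two-tailed test:                                 {power(3, 15, 25, tails=2):.2f}")
```

The first line reproduces the slide's power of about .26. The other three change one factor at a time, so each can be compared against that baseline.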