Chapter 08. Introduction


Hypothesis testing may best be summarized as a decision-making process in which one attempts to arrive at a particular conclusion based upon "statistical" evidence. A typical hypothesis test contains two contradicting statements about the value of a population parameter of interest. These statements are called the null hypothesis, denoted by H0, and the alternative hypothesis (also known as the research hypothesis), denoted by Ha. Since these two hypotheses are contradicting, at least one of them must be false. Hypothesis testing is the statistical process used to decide which statement (or hypothesis) appears to be true and which appears to be false. The evidence we use to determine which hypothesis is correct arrives in the form of randomly sampled data from the population (or populations) of interest.

The first step of any hypothesis test is to establish the null and alternative hypotheses, which in turn will help us to determine exactly what we are testing. In practice, the researcher is responsible for setting up the null and alternative hypotheses based on the type of research conducted. For us, setting up the null and alternative hypotheses will stem from the careful interpretation of a "claim" found in the description of a given research statement or question.

Typically, a claim is associated with the alternative hypothesis, but it is occasionally associated with the null hypothesis. If the wording of the claim suggests equality of any kind, then it is associated with the null hypothesis. For example, if the claim states a parameter is "equal to", "greater than or equal to" (at least), or "less than or equal to" (at most) a given value, then it is associated with the null hypothesis. Alternatively, if the claim specifically lacks equality, that is, states a parameter is less than, greater than, or unequal to a given value, then the claim is associated with the alternative hypothesis (i.e., it will only contain <, >, or ≠).
For instance, if we claim "the average surface temperature of the water in the North Atlantic in September is greater than 38 °F" (i.e., µ > 38), then this claim addresses the alternative hypothesis because "greater than" does not imply equality. For this claim, the alternative hypothesis will look like Ha: µ > 38. When the claim is associated with the alternative hypothesis, a null hypothesis needs to be devised so that it contradicts the alternative hypothesis. An easy way to accomplish this is to simply set the parameter equal to the value already specified in the alternative hypothesis. For example, a null hypothesis for the current alternative hypothesis could simply be stated as H0: µ = 38. Another option for the null hypothesis would be to use H0: µ ≤ 38, which would also contradict the alternative hypothesis, and it too includes equality. However, in an attempt to simplify this process, we will use the equal sign (=) instead of the less than or equal to sign (≤) or the greater than or equal to sign (≥).

Let's look at another example. Suppose a researcher makes the following claim: "The mean age of a person diagnosed with type II diabetes is less than 29 years of age." In this instance, the claim made by the researcher is again the alternative hypothesis because "less than" does not imply equality, and the alternative hypothesis will look like Ha: µ < 29. Additionally, if a researcher makes the claim "The mean age of a person diagnosed with type II diabetes is greater than 29 years of age," then the alternative hypothesis becomes Ha: µ > 29. If a claim indicates a parameter (or parameters) is not equal to some value, as in the statement "The mean age of a person diagnosed with type II diabetes is not 29 years of age," the alternative hypothesis would be listed as Ha: µ ≠ 29.

We now need to fit a null hypothesis to each of these three alternative hypotheses. Fortunately, by utilizing straight equality in the null hypothesis, we can create a single null hypothesis that can be used with any of the three previous alternative hypotheses, that being H0: µ = 29. Regardless of which of the previous three alternative hypotheses we want to test, this one null hypothesis contradicts all of them. We can get away with a single null hypothesis because the remaining steps involved in a hypothesis test are determined by the type of inequality used in the alternative hypothesis, and not by the relation used in the null hypothesis. This is precisely why we are able to use the equals sign (=) exclusively in the null hypothesis and let the alternative hypothesis contain either <, >, or ≠.

Occasionally, the null hypothesis is specified in the "claim" and the alternative hypothesis has to be created to fit the research scenario.
For example, consider the claim "the mean age of men getting married for the first time is at least 25 years old". The phrase "at least 25" implies "greater than or equal to 25". Thus, the alternative hypothesis needs to contradict this statement and therefore should be written such that the parameter is "less than 25". Again, it is the alternative hypothesis that plays a role in the completion of the hypothesis test, so whether the null hypothesis utilizes the "greater than or equal to" sign or just the "equal to" sign, the test will remain the same. As a result, either of the following sets of hypotheses would be correct for this claim: H0: µ ≥ 25 with Ha: µ < 25, or H0: µ = 25 with Ha: µ < 25. However, as mentioned before, for the sake of simplicity, we will utilize the null hypothesis that contains only the equal sign.

Additional examples of setting up the null and alternative hypotheses through interpretation of different claims about various parameters are given here.

Claim: The mean body temperature of a healthy adult is not 98.6 degrees F. The claim "is not 98.6" implies this statement is associated with the alternative hypothesis, which will contain the not equal to symbol. As a consequence, the null hypothesis will contain the "equal to" symbol. The only correct set of hypotheses is H0: µ = 98.6 and Ha: µ ≠ 98.6.

Claim: The mean monthly student loan payment of graduates from the University of Oklahoma is thought to be more than $340. Since the claim states "more than," with no mention of equality, it must be the alternative hypothesis. Thus, appropriate null and alternative hypotheses are H0: µ = 340 and Ha: µ > 340.

Claim: The population proportion of Democrats who will vote against their own party in the upcoming election is less than [value missing in the transcription]. Since the statement contains the phrase "less than", with no mention of equality, it is referencing the alternative hypothesis. Also notice the claim is about a proportion, not a mean. Appropriate null and alternative hypotheses for this scenario are H0: p = p0 and Ha: p < p0, where p0 stands for the value stated in the claim.

Claim: The mean temperature in Nashville during the month of July is at least 84 degrees. In this case, the claim that the temperature is "at least" 84 degrees contains equality because "at least" means "greater than or equal to". Therefore, this claim addresses the null hypothesis, and thus the alternative hypothesis must consist of "less than 84". The null and alternative hypotheses can be written as H0: µ = 84 and Ha: µ < 84.

Claim: The standard deviation associated with the number of text messages sent by teenagers per day in the US equals 16. This claim states that the standard deviation associated with US teenagers and their texting habits equals 16, suggesting the statement is associated with the null hypothesis. Note that there was no mention of "greater than", "less than", "at most", or "at least" anywhere in the statement. This means the only way the alternative hypothesis can contradict the claim is if it contains a "not equal to" sign. Consequently, the null and alternative hypotheses are H0: σ = 16 and Ha: σ ≠ 16.

We can also make claims about two (or more) population parameters. An example might be "in September, the mean surface temperature of the North Atlantic will not equal the mean surface temperature of the North Pacific", which represents the alternative hypothesis and is written as Ha: µ1 ≠ µ2, where µ1 and µ2 denote the two population means. The null hypothesis that contradicts this alternative hypothesis is H0: µ1 = µ2. Hypothesis tests like these, involving two or more parameters, will be the focus of future chapters.

To recap, for each of the hypothesis tests conducted in this chapter (and in the following chapters), three simple rules can be referenced to help us construct the null and alternative hypotheses.

1. The null hypothesis is always associated with an equals sign.
2. The alternative hypothesis never contains an equals sign, but contains either <, >, or ≠.
3. The null and alternative hypotheses always contradict each other.
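The three rules above can be sketched as a small lookup. This is only an illustration under our own assumptions: the function name set_up_hypotheses, the relation codes ("<", ">=", "!=" and so on), and the default parameter label "mu" are hypothetical and do not come from the text.

```python
# Hypothetical helper illustrating the three rules for building H0 and Ha.
# Relations that contain equality belong to the null hypothesis; each one maps
# to the strict relation that contradicts it.
NULL_CLAIMS = {"=": "!=", ">=": "<", "<=": ">"}
ALT_CLAIMS = {"<", ">", "!="}


def set_up_hypotheses(relation, value, parameter="mu"):
    """Return (H0, Ha, which hypothesis the claim belongs to)."""
    if relation in NULL_CLAIMS:
        alt_relation, claim_is = NULL_CLAIMS[relation], "H0"
    elif relation in ALT_CLAIMS:
        alt_relation, claim_is = relation, "Ha"
    else:
        raise ValueError(f"unknown relation: {relation!r}")
    h0 = f"H0: {parameter} = {value}"               # rule 1: plain equality
    ha = f"Ha: {parameter} {alt_relation} {value}"  # rule 2: <, > or !=
    return h0, ha, claim_is


# Claims taken from the examples in this section.
print(set_up_hypotheses(">", 38))                     # North Atlantic temperature
print(set_up_hypotheses(">=", 84))                    # Nashville, "at least 84"
print(set_up_hypotheses("=", 16, parameter="sigma"))  # text messages
```

In every case the null hypothesis is written with a plain equals sign, which mirrors the simplification adopted in this chapter.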

Chapter 8.2 The Origin of Hypothesis Testing

Regardless of whether the claim coincides with the null or alternative hypothesis, when conducting a hypothesis test we always assume the null hypothesis is true and test the reliability of the null hypothesis with sample data. We test the null hypothesis because, by assuming it is true, we are able to utilize the pre-established properties of sampling distributions. The basic idea of testing the null hypothesis involves using sample data to calculate a statistic that estimates the parameter of interest. Then, based on how close the statistic is to the hypothesized value of the parameter, we decide whether or not there is sufficient evidence to conclude the null hypothesis is false. These underlying ideas regarding hypothesis testing might best be encapsulated by considering an example.

The University President Example: Suppose the president of a university hypothesizes that the average age of students attending her university is 20.5 years. The president's claim (or hypothesis) implies equality, so the null and alternative hypotheses are H0: µ = 20.5 and Ha: µ ≠ 20.5, respectively. Note that because no specific indication of testing for less than, greater than, at most, or at least was given, the alternative hypothesis must be Ha: µ ≠ 20.5.

One way to investigate our null hypothesis is to take a random sample of students from the university and calculate their average age. Recall that, due to sampling error, values of the sample mean, x̄, vary from sample to sample and serve only as a "good guess" of the value of the population mean. We do not expect the sample mean, x̄, to equal the population mean, µ, but if the null hypothesis is true, we do expect a vast majority of x̄'s to be reasonably close to µ = 20.5. Therefore, if the value of the sample mean age is close to 20.5, we will have little evidence to suggest that µ = 20.5 is not a viable statement.
However, if the value of x̄ begins to deviate substantially from 20.5, then we start to question the legitimacy of the null hypothesis. This raises the question of just how far x̄ needs to deviate from the value of µ stated in the null hypothesis before we begin to suspect the null hypothesis is incorrect. The answer to this question depends greatly upon the spread of the distribution of x̄'s. Fortunately, by making use of results provided by the central limit theorem, the spread of the distribution of x̄'s can be estimated. Recall that if n is at least 30, the sampling distribution of the x̄'s will be approximately normally distributed with a mean of µ and a standard deviation of σ/√n.

For instance, suppose a random sample of 30 university students was selected and their average age was found to be x̄ = 20.8. Additionally, assume it was known that the value of σ is 2.4 years. Thus, the standard deviation of the sampling distribution is 2.4/√30, or 0.438 years. More importantly, if the president's claim that the average age of the students at her university is 20.5 years is true, we would expect about 95% of the sample means to be within 1.96(0.438) of 20.5, or between 19.64 and 21.36 years. This is shown in Figure 8.1, with 95% of the sample means falling between the red vertical lines.
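The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are our own, and the population standard deviation is treated as known, exactly as the example assumes.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 20.5   # mean stated in the null hypothesis
sigma = 2.4   # population standard deviation, assumed known in this example
n = 30        # sample size

# Standard deviation of the sampling distribution of the sample mean.
standard_error = sigma / sqrt(n)

# z value that leaves 2.5% in each tail (approximately 1.96).
z_crit = NormalDist().inv_cdf(0.975)

lower = mu_0 - z_crit * standard_error
upper = mu_0 + z_crit * standard_error

print(f"standard error = {standard_error:.3f}")                    # 0.438
print(f"central 95% of sample means: {lower:.2f} to {upper:.2f}")  # 19.64 to 21.36
```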

If the value of our sample mean, x̄, falls within this central 95% of the sampling distribution, we fail to reject the null hypothesis because we assume the difference between x̄ and µ is the result of sampling error (recall that sampling error is the reason different samples provide different sample means). That is, when the value of our sample mean is between the values which define the central 95% of the sampling distribution, we lack sufficient evidence to suggest the true mean is not 20.5. This is equivalent to saying that, based on the evidence provided by our sample mean, we do not have enough evidence to reject the null hypothesis. In this instance, our sample mean of 20.8 clearly falls among the commonly expected values of x̄, as indicated by the green line in Figure 8.2. Therefore, a sample mean of 20.8 provides very little evidence against the null hypothesis, meaning there is insufficient evidence to suggest the average age of students at the university is not 20.5 years.

Notice the last statement said nothing about evidence in support of the null hypothesis; we just don't have enough evidence to say that it is false. That is, the difference between the sample mean and the hypothesized population mean was not large enough to "really" convince us otherwise.

On the other hand, if the value of our sample mean, x̄, was much further away from 20.5, like 21.5, we should be less inclined to believe that the null hypothesis is true. Namely, the veracity of the null hypothesis would be in question because the value of the sample mean (21.5) is so extreme compared to the stated value of 20.5 that there appears to be more than just sampling error present. This would indicate that the true average age is not 20.5, but probably some value larger than 20.5. In fact, we can see from Figure 8.3 that the sample mean of 21.5 is not located within the middle 95% of the distribution. Instead, it deviates greatly from the hypothesized center of 20.5. Although it is possible that sampling error is the only cause of this sample mean being so extreme, the probability is very small. Therefore, instead of assuming the large distance between the sample mean and hypothesized population mean is the rare case of extreme sampling error, we adopt the more believable idea that the population mean is really larger than 20.5 and there is just a little sampling error. When the value of a statistic (our sample mean in this case) is beyond what would be expected due to sampling error alone, we say that our result is statistically significant.
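The decision reached for the two sample means discussed here (20.8 and 21.5) can be reproduced with a short sketch. The function name decide and its default arguments are our own; it assumes the setting of this section, namely a known σ of 2.4, n = 30, a two-tailed test, and the central 95% of the sampling distribution.

```python
from math import sqrt
from statistics import NormalDist


def decide(sample_mean, mu_0=20.5, sigma=2.4, n=30, alpha=0.05):
    """Two-tailed decision based on where the sample mean falls."""
    standard_error = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Distance of the sample mean from mu_0, measured in standard errors.
    z = (sample_mean - mu_0) / standard_error
    return "reject H0" if abs(z) > z_crit else "fail to reject H0"


for x_bar in (20.8, 21.5):
    print(x_bar, decide(x_bar))
```

A sample mean of 20.8 lies less than one standard error from 20.5, while 21.5 lies more than two standard errors away, which is why only the second is called statistically significant.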


Chapter 8.3 Setting up a Hypothesis Test

Once the null and alternative hypotheses are established, the next step is to determine whether we are conducting a one-tailed test or a two-tailed test, that is, whether there are one or two rejection regions. The number of tails in our hypothesis test is determined by the alternative hypothesis because it reveals where the test statistic needs to fall in order to reject the null hypothesis.

For instance, say we hypothesize that the average age of persons diagnosed with type II diabetes is not 29 years old, giving us null and alternative hypotheses of H0: µ = 29 and Ha: µ ≠ 29. In this case, there are two different scenarios which allow us to reject the null hypothesis. We could reject the idea that µ = 29 if the sample mean is very small compared to the hypothesized population mean or if the sample mean is very large compared to the hypothesized population mean. Either way, we expect more than just sampling error to be causing the large difference between the hypothesized and sample means (i.e., a statistically significant difference). As a result, we need to keep both tails of the sampling distribution labeled as potential "rejection regions," where a rejection region is defined as any area of the distribution typically not attributed to sampling error alone (see Figure 8.4). This is fittingly called a two-tailed test because both tails are potential "rejection regions."

However, what if the claim was along the lines of "the mean age of a person who is diagnosed with type II diabetes is less than 29 years"? In keeping with the claim, the appropriate null and alternative hypotheses are H0: µ = 29 and Ha: µ < 29. From inspection of the alternative hypothesis, in order to really convince anyone that the alternative hypothesis is true, we will need a sample mean that is much smaller than 29, such that its value is beyond the range of sample means accounted for by sampling error. This type of situation would lead us to suspect that the difference between the sample and hypothesized mean is due to more than just sampling error. Recognize that sample means greater than 29 will surely not convince anyone that the alternative hypothesis is true. We call this a one-tailed hypothesis test (or more formally, a left-tailed hypothesis test), as the only rejection region falls in the left tail. Therefore, we only reject the null hypothesis if the value of our sample mean falls in the left tail of the distribution, as shown in Figure 8.5.

In a similar fashion, if the null and alternative hypotheses were stated as H0: µ = 29 and Ha: µ > 29, then the rejection region would fall in the right tail of the distribution because the only way to reject the null hypothesis is to obtain a sample mean larger than 29 such that the value of the sample mean is beyond what would be considered sampling error (see Figure 8.6).

Since the rejection region is found in the right tail, this too is a one-tailed test, but more specifically, we call it a right-tailed hypothesis test.
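To make the three kinds of rejection region concrete, the sketch below computes the cutoff(s) for the sample mean under each alternative hypothesis. The text gives no standard deviation or sample size for the diabetes example, so the values sigma = 10 and n = 40 are purely hypothetical, as are the function name and the 5% level of significance.

```python
from math import sqrt
from statistics import NormalDist


def rejection_region(alternative, mu_0, sigma, n, alpha=0.05):
    """Return (lower_cutoff, upper_cutoff) for the sample mean.

    None means there is no rejection region on that side.
    """
    sampling_dist = NormalDist(mu_0, sigma / sqrt(n))
    if alternative == "<":    # left-tailed: all of alpha in the left tail
        return sampling_dist.inv_cdf(alpha), None
    if alternative == ">":    # right-tailed: all of alpha in the right tail
        return None, sampling_dist.inv_cdf(1 - alpha)
    if alternative == "!=":   # two-tailed: alpha split evenly between tails
        return (sampling_dist.inv_cdf(alpha / 2),
                sampling_dist.inv_cdf(1 - alpha / 2))
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical sigma and n; only mu_0 = 29 comes from the text.
for alt in ("!=", "<", ">"):
    print(alt, rejection_region(alt, mu_0=29, sigma=10, n=40))
```

Note that the one-tailed cutoff sits closer to 29 than the corresponding two-tailed cutoff, because the whole level of significance is placed in a single tail.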

Chapter 8.4 Making an Error

When conducting a hypothesis test, we must remember that we can never be absolutely certain which hypothesis is the correct one. When we complete a hypothesis test, we select what we think is the correct hypothesis, but we may be wrong, because we are using a sample to make an inference about an entire population. For instance, in the example regarding the mean age of students at a university, we rejected the null hypothesis when the sample mean (x̄ = 21.5) fell outside of the middle 95% of the distribution. However, even when the null hypothesis is true, there is still a chance, although small (2.5% for each tail for a total of 5%), of an estimate (our sample mean in this case) falling outside the central portion of the distribution (the area due to sampling error). When one of these rare yet possible estimates occurs, we reject the null hypothesis when in fact it should be retained.

If we reject the null hypothesis but the null hypothesis is correct, we have made an error called a type I error. The probability of a type I error is denoted by α (the Greek letter alpha), where α is also called the "level of significance" or the "type I error rate". The value of α is easily obtained, as it is the researcher (you) who decides what this value is. The value selected by the researcher always corresponds to the area in the rejection region(s). Thus, if we decide to use the middle 95% of the sampling distribution to account for sampling error, then that leaves 5% in the tails, or α = 0.05. Just as the level of confidence for a confidence interval should never be lower than 90% and should rarely be greater than 99%, the value of α should never rise above 0.10 and should rarely fall below 0.01. Regardless of the value we select for α, we need to determine this value before collecting our data and conducting our hypothesis test.
If we let the results of the hypothesis test influence which alpha level we choose, what will keep us from selecting an alpha level that supports the result "we" desire, instead of the result given by our test? Thus, the level of alpha is always chosen a priori ("before the fact") and never ex post facto ("after the fact").

A second type of error arises whenever we fail to reject the null hypothesis, but the null hypothesis is actually false. This type of error is called a type II error. The probability of a type II error is denoted by β (the Greek letter beta). For instance, in reference to the mean age of a college student example, a type II error can occur when the sample mean falls within the middle 95% of the sampling distribution, such as x̄ = 20.8, but the population mean is not 20.5 as specified in the null hypothesis. Because the deviation of this sample mean from the hypothesized population mean could be attributed entirely to sampling error, we would have no substantial reason to reject the null hypothesis. Therefore, even if the true population mean is a value other than 20.5, the null hypothesis will not be rejected and a type II error will be committed. The table below summarizes the errors (and the non-errors) one can make when conducting a hypothesis test.

Decision | H0 is true | H0 is false
Reject H0 | Type I error (probability α) | Correct decision
Fail to reject H0 | Correct decision | Type II error (probability β)

Unfortunately, we can never be sure of the exact value of β because calculating it requires us to know the true value of µ when the null hypothesis is wrong. If we knew the real value of µ, we would not bother conducting a hypothesis test. Although we will not be able to directly calculate the probability of a type II error, we will discuss ways in which the chance of committing a type II error can be reduced.
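Although β cannot be calculated in practice, it can be computed for a pretend value of the true mean, which helps show why the true mean is needed. The sketch below reuses the university example (µ = 20.5 under the null hypothesis, σ = 2.4, n = 30, α = 0.05); the candidate true means and the function name type_ii_error are hypothetical choices of ours.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(true_mean, mu_0=20.5, sigma=2.4, n=30, alpha=0.05):
    """Probability that the sample mean lands in the 'fail to reject' zone
    of a two-tailed test when the population mean is really `true_mean`."""
    standard_error = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mu_0 - z_crit * standard_error
    upper = mu_0 + z_crit * standard_error
    # Sampling distribution of the sample mean under the pretend true mean.
    actual = NormalDist(true_mean, standard_error)
    return actual.cdf(upper) - actual.cdf(lower)


# Hypothetical true means; in a real study none of these would be known.
for true_mean in (20.7, 21.0, 21.5, 22.0):
    print(f"true mean {true_mean}: beta = {type_ii_error(true_mean):.3f}")
```

Each pretend true mean gives a different β, and β shrinks as the true mean moves further from 20.5.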

Chapter 8.5 Power

The complement of the probability of a type II error is called power. Power is the probability of rejecting the null hypothesis when the null hypothesis is indeed false. Thus, power represents the probability of making a good decision. Although power will not be discussed in detail in this text, the concept of power is important. If our hypothesis test has high power, then we will be more likely to make the correct decision of rejecting a false null hypothesis. Also, when making comparisons between different statistical tests designed to accomplish the same goal, the one with the highest power is generally preferred. Mathematically, power is denoted as 1 − β, where β is the probability of a type II error.

Chapter 8.6 Relationships Between Type I Error, Type II Error, and Power

In this section we will discuss how the probability of a type I error, the probability of a type II error, and power are related to one another. To do so, turn your attention to Figure 8.7, where the blue curve (on the left) represents the distribution with respect to the null hypothesis, which in this case is centered at zero (i.e., H0: µ = 0). Likewise, the red curve (on the right) represents the distribution specified in the alternative hypothesis, Ha: µ ≠ 0 (or more specifically, µ = 2). In reality we would never know the specific value of the parameter under the alternative hypothesis (µ = 2 in this case), but it is easier to discuss the relationship between power, the probability of a type I error, and the probability of a type II error when this value is known.

To better understand the interconnections between type I error, type II error, and power, we need to adhere to the following rules. When referencing type I error, we are assuming the null hypothesis is the correct hypothesis. Therefore, when discussing type I error, we will consider only the blue curve in Figure 8.7. When referencing type II error or power, we are assuming the alternative hypothesis is the correct hypothesis; thus, we will consider only the red curve in Figure 8.7.

When the value of a sample mean, x̄, falls between the vertical green lines, we will not reject the null hypothesis. In this case, if the null hypothesis is true, then no error has been made. However, if the sample mean falls to the left of the lower green line or to the right of the upper green line and the null hypothesis is true, we will incorrectly reject the null hypothesis and a type I error will be committed. The probability of a type I error (denoted α) is represented by the area under the blue curve outside the green lines. In Figure 8.7 this area is labeled "Type I Error" and corresponds to the rejection regions.

On the other hand, if the true value of the population mean is two (i.e., the alternative hypothesis is correct) and the sample mean falls between the green lines, then we fail to reject the null hypothesis and a type II error is committed. The type II error rate, β (i.e., the probability of committing a type II error), is represented by the area under the red curve that falls between the green lines. In Figure 8.7 this area is labeled "Type II Error".

Finally, if the alternative hypothesis is correct and the sample mean falls to the left of the lower green line or to the right of the upper green line, then no error has been committed. In fact, this outcome is desirable, as we are rejecting a false null hypothesis in support of a true alternative hypothesis. As stated earlier, the probability of correctly rejecting the null hypothesis is power, and it is represented by the area under the red curve to the left of the lower green line and to the right of the upper green line. We can see from Figure 8.7 that almost all of the power falls to the right of the upper green line, which makes sense, as the value of the mean under the alternative hypothesis is greater than the mean under the null hypothesis (two is greater than zero). Conversely, if the value of the mean under the alternative hypothesis were smaller than the value of the mean given by the null hypothesis, then most of the power would be found under the red curve and to the left of the lower green line.

In our continued quest to see how type I error, type II error, and power are all related, consider Figure 8.8, which is similar to Figure 8.7, except the type I error rate has been reduced (i.e., the vertical green lines have been moved further apart). It is a common misconception to think that reducing the probability of a type I error is always beneficial. How could it be a bad thing to lower your chance of committing an error?
The problem with reducing α, the type I error rate, is that you simultaneously increase the probability of a type II error. As displayed in Figure 8.8, when the type I error rate decreases, the area under the blue curve and between the green lines increases. When this happens, the area between the green lines and under the red curve also increases, consequently increasing β. In addition, when β increases, power decreases. This diminishes our ability to correctly reject the null hypothesis when it is false. It is also worth noting how much the probability of a type II error increased from a very small decrease in the probability of a type I error. By comparing Figure 8.7 and Figure 8.8, notice that it was not an equal exchange. For this example, when α was decreased only a little, β increased substantially.

Additionally, as displayed in Figure 8.9, if we reduce the type II error rate to increase power, we in turn increase the type I error rate. This is yet another example of "you can't get something for nothing." If you reduce the type I error rate, your type II error rate increases and power decreases. Similarly, if your type II error rate is decreased and power increased, you do so at the cost of increasing the type I error rate. This is why it is common to use a type I error rate that strikes a "happy middle ground", like say 0.05. A type I error rate of 0.05 is small enough to keep the chance of rejecting the null hypothesis incorrectly low, but large enough to ensure a relatively manageable type II error rate, along with hopefully providing a decent amount of power. In general, it is recommended that we select a type I error rate (a.k.a. level of significance) between 0.1 and 0.01.

When a hypothesis test is a one-tailed test instead of a two-tailed test, the areas representing the type I error rate, the type II error rate, and power are all on one side of the distribution. Recall that a one-tailed test places the entire type I error rate (or rejection region) into either the left or right tail of the distribution. For example, the locations of the type I error rate, type II error rate, and power for a right-tailed test are displayed in Figure 8.10.
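The trade-off described here can be explored numerically. The sketch below uses the setup of Figure 8.7 (a null mean of 0 and a true mean of 2); the figure's spread is not stated in the text, so the standard error of 1 is an assumption of ours, as is the function name beta_and_power.

```python
from statistics import NormalDist


def beta_and_power(alpha, mu_0=0.0, mu_a=2.0, standard_error=1.0):
    """Two-tailed test: return (beta, power) for a known true mean mu_a."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mu_0 - z_crit * standard_error   # lower "green line"
    upper = mu_0 + z_crit * standard_error   # upper "green line"
    red_curve = NormalDist(mu_a, standard_error)
    # beta: area under the red curve between the green lines.
    beta = red_curve.cdf(upper) - red_curve.cdf(lower)
    return beta, 1 - beta


# Standard error of 1 is a hypothetical value chosen for illustration.
for alpha in (0.10, 0.05, 0.01):
    beta, power = beta_and_power(alpha)
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}  power = {power:.3f}")
```

Under these assumptions, each reduction in α widens the gap between the green lines, so β grows and power shrinks, which is the pattern shown in Figures 8.7 and 8.8.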

To see firsthand the relationship between type I error, type II error, and power, use the Alpha-Beta interactive tool. With this tool, you can control not only the type I error rate but also the sample size, the standard deviation of the distributions, and the distance between the means hypothesized in the null and alternative hypotheses. Once all of these values are determined, the interactive tool displays the probability of a type II error and the power. This tool can be used to investigate how altering the values of these variables influences the probability of a type II error and power.

However, when using this tool, keep in mind that the only variables the researcher would have control over in the "real world" are the sample size, the level of α, and possibly whether a one-tailed or two-tailed test is conducted. The standard deviation would only be estimated after the sample is taken, while the distance between the means, the probability of a type II error, and power are never known. In the interactive tool, the variables that a researcher would have control over are coded in blue, while the variables that would generally be unknown to the researcher are coded in red.

Upon activating the interactive tool, it may be helpful to increase your understanding of the relationship between type I error, type II error, and power by answering the following questions.

For a set standard deviation and distance between the means:

1. Which has higher power, a one-tailed or a two-tailed test?
2. What happens to the probability of a type II error as the probability of a type I error is decreased?
3. What happens to the power as the probability of a type I error is decreased?
4. What happens to the probability of a type II error as the sample size is increased?
5. What happens to the power as the sample size increases?

For a set sample size and level of alpha:

1. What happens to the probability of a type II error as the standard deviation increases?
2. What happens to the power as the standard deviation increases?
3. What happens to the probability of a type II error as the distance between the means increases?
4. What happens to the power as the distance between the means increases?

Answers (first set):

1. The one-tailed test has higher power.
2. The probability of a type II error increases.
3. The power decreases.
4. The probability of a type II error decreases.
5. The power increases.

Answers (second set):

1. The probability of a type II error increases.
2. The power decreases.
3. The probability of a type II error decreases.
4. The power increases.
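For readers without access to the interactive tool, the answers above can be checked with a rough stand-in. The sketch below is not the tool itself: the function name power, its arguments, and the baseline values are our own assumptions, it treats σ as known, and the one-tailed case assumes a right-tailed test with the true mean above the hypothesized mean.

```python
from math import sqrt
from statistics import NormalDist


def power(alpha, n, sigma, distance, tails=2):
    """Power of a z-test when the true mean lies `distance` above the
    hypothesized mean. tails=1 means a right-tailed test."""
    std = NormalDist()
    shift = distance / (sigma / sqrt(n))   # distance in standard errors
    if tails == 1:
        return 1 - std.cdf(std.inv_cdf(1 - alpha) - shift)
    z_crit = std.inv_cdf(1 - alpha / 2)
    # Area of the shifted curve in the upper tail plus the lower tail.
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)


# Hypothetical baseline values chosen only for illustration.
base = dict(alpha=0.05, n=30, sigma=2.4, distance=1.0)

print("two-tailed :", round(power(**base), 3))
print("one-tailed :", round(power(**base, tails=1), 3))
print("alpha 0.01 :", round(power(**{**base, "alpha": 0.01}), 3))
print("n = 60     :", round(power(**{**base, "n": 60}), 3))
print("sigma = 4  :", round(power(**{**base, "sigma": 4.0}), 3))
print("distance 2 :", round(power(**{**base, "distance": 2.0}), 3))
```

Since the probability of a type II error is one minus the power, every change that raises the power in this sketch lowers β by the same amount.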

Chapter 8.7 Hypothesis Tests about a Population Mean, the Right Way!

In section 8.2, we discussed one method of determining which hypothesis appears to be true. Recall that in section 8.2 the mean age of students at a university was thought to be 20.5 years. To test this theory, a random sample of 30 students was selected and their mean age was determined. If the sample mean age fell in the middle 95% of the sampling distribution, we failed to reject the null hypothesis, whereas if the sample mean fell in either tail (the rejection regions), the null hypothesis was rejected and the alternative hypothesis was thought to be correct. While this method accomplishes the goal, it does not reflect the process used by researchers.

First, in section 8.2, we assumed we knew the population standard deviation, σ. In more authentic situations, σ will almost never be known and must be estimated with the sample standard deviation, s. Similar to constructing confidence intervals (see section 7.14), when s is used to estimate σ, we base our calculations on the t-distribution instead of the z-distribution.

In addition, we do not usually utilize the sampling distribution alone to determine if our sample mean is "extreme enough" to reject the null hypothesis, as was done in section 8.2. Although this method suffices, it is missing a key aspect of hypothesis testing: a value that measures the "strength of the evidence against the null hypothesis". For instance, in the example from section 8.2, the null hypothesis, H0: µ = 20.5, was rejected when the sample mean was 21.5 because it fell into one of the rejection regions. But this process never really indicates just how "far out" the sample mean was compared to the hypothesized population mean, or more importantly, how uncommon it would be to get a sample mean as extreme or even more extreme than 21.5 if the null hypothesis was true.
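When σ is replaced by the sample standard deviation s, the quantity compared against the t-distribution is the usual one-sample t statistic, t = (x̄ − µ0) / (s / √n), with n − 1 degrees of freedom. The sketch below computes it with the standard library. The five ages are made-up values used only to exercise the formula, and the function name t_statistic is our own; looking up the matching t-distribution probability would need a t table or a statistics package, which is not shown here.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu_0):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                 # sample standard deviation (n - 1 divisor)
    standard_error = s / sqrt(n)      # estimated with s because sigma is unknown
    return (x_bar - mu_0) / standard_error, n - 1


# Hypothetical ages, invented for illustration only (not data from the text).
ages = [19, 20, 21, 22, 23]
t, df = t_statistic(ages, mu_0=20.5)
print(f"t = {t:.3f} with {df} degrees of freedom")
```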
Obviously the evidence was strong enough to reject the null hypothesis, but think about how much stronger the evidence would have been if the sample mean had turned out to be, say, 25.8 (if 21.5 is extreme, then 25.8 is really extreme).

Introducing the P-Value

In order to state mathematically just how strong, or weak, the evidence is against the null hypothesis, we calculate the probability of getting a sample mean (or any other estimate, for that matter) that is at least as extreme as the one obtained, assuming the null hypothesis is true. This probability is called the p-value, and it is used to determine the degree to which the evidence supports rejecting or retaining the null hypothesis. Graphically, the p-value is the area in the tail(s) beyond the sample mean. Regardless of where the sample mean falls in the sampling distribution, if the test is a left-tailed test, then the p-value is the area to the left of the sample mean, as shown in the accompanying figure.

Similarly, if the test is a right-tailed test, the p-value can be graphically represented by the area to the right of the sample mean, as shown in the accompanying figure.

Two-tailed tests are approached differently because the definition of the p-value states "...at least as extreme as the one obtained, in the direction of the alternative hypothesis...". Thus, when the alternative hypothesis indicates the rejection region is in both tails ($\neq$), the p-value is the area that extends outward towards both tails, beyond not just the sample mean, but also beyond the location of its complement (its mirror image for a symmetrical distribution), called the "pseudo" sample mean. An example of a two-tailed test is illustrated in Figure 8.13, where the solid green line represents the actual sample mean and the dashed green line represents the corresponding pseudo sample mean. The p-value is then found by combining the area to the left of the pseudo sample mean with the area to the right of the actual sample mean. Of course, if the actual mean is positioned in the left tail instead of the right, then the p-value is found by combining the area to the left of the sample mean with the area to the right of the pseudo sample mean.

Note that if the sample mean is not in the rejection region, then the area beyond the sample mean and the pseudo sample mean will be larger than the area defined by the rejection region (which is equal to the level of significance, or $\alpha$). Therefore, if the p-value is larger than the level of significance, we will fail to reject the null hypothesis. However, if the sample mean falls in a rejection region, then the area beyond the sample mean and pseudo sample mean will be less than the level of significance, causing us to reject the null hypothesis. Note that, due to the inclusion of the pseudo sample mean, the p-value for a two-tailed test on a symmetric sampling distribution is twice the p-value of the one-tailed test whose tail lies on the same side as the sample mean.

In general, smaller p-values (smaller probabilities) indicate stronger evidence against the null hypothesis, while larger p-values indicate weaker evidence against the null hypothesis. Thus, if your p-value is very small (smaller than, say, 0.001), then you have very strong evidence in support of rejecting the null hypothesis. Keep in mind, however, that the decision rule does not depend on how small the p-value is: we always reject the null hypothesis if the p-value is smaller than the stated level of significance. The p-value simply gives us an "idea" of how strong our evidence is against the null hypothesis.
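The tail areas described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the sampling distribution is taken to be normal, the function name `tail_area` is hypothetical, and the population standard deviation of 2.5 is an assumed value that does not come from the text (only the hypothesized mean of 20.5, the sample mean of 21.5, and the sample size of 30 do).

```python
from math import sqrt
from statistics import NormalDist


def tail_area(sample_mean, mu0, se, tail):
    """P-value as an area under a normal sampling distribution centered at mu0.

    tail is "left", "right", or "two". For "two", the area beyond the
    sample mean is combined with the area beyond its mirror image
    (the "pseudo" sample mean) on the other side of mu0.
    """
    dist = NormalDist(mu0, se)
    if tail == "left":
        return dist.cdf(sample_mean)
    if tail == "right":
        return 1 - dist.cdf(sample_mean)
    if tail == "two":
        distance = abs(sample_mean - mu0)
        lower, upper = mu0 - distance, mu0 + distance
        return dist.cdf(lower) + (1 - dist.cdf(upper))
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Student-age setting: hypothesized mean 20.5, sample mean 21.5, n = 30.
# sigma = 2.5 is an assumed value used only for this illustration.
se = 2.5 / sqrt(30)
for tail in ("left", "right", "two"):
    print(tail, round(tail_area(21.5, 20.5, se, tail), 4))
```

With the sample mean above the hypothesized mean, the two-tailed area is twice the right-tailed area, and the pseudo sample mean of 19.5 gives the same two-tailed area as the actual sample mean.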
Determining the P-Value

To determine the p-value (i.e., the strength of the evidence against the null hypothesis), we must first convert the sample mean into a "test statistic" by standardizing it. This standardized value is called a test statistic because it is the value that is used to test the null hypothesis. A sample mean can be standardized (turned into a test statistic) by using Equation 8.1.

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \qquad \text{(Equation 8.1)}$$

Here $\bar{x}$ is the sample mean, $\mu_0$ is the population mean stated in the null hypothesis, s is the sample standard deviation, and n is the sample size.

Once the test statistic is found, we can then find the p-value using an appropriate interactive tool (or table). This process is similar to the method used in chapter 6, where we standardized x-scores by transforming them into z-scores, and then found the probabilities associated with our z-scores. An example problem (or two) will hopefully make this process clearer, but first, let's consider the situations in which Equation 8.1 provides us with reliable p-values.

The assumptions for conducting a hypothesis test on a population mean (i.e., what we need to assume if we are to use Equation 8.1 appropriately) are:
1. The data was collected via a simple random sample.
2. The sample size must be large enough to ensure an approximately normal sampling distribution. According to the central limit theorem, the sampling distribution of the mean becomes approximately normal once the sample is large enough. Generally a sample size of at least 30 will suffice.

Oxygen Intake Example: A research scientist claims that the mean oxygen intake per breath for smokers is less than 40.6 ml/kg. Based on a sample of 35 smokers, the mean oxygen intake was found to be 39.2 ml/kg with a sample standard deviation of 3 ml/kg. Obviously the sample mean, 39.2 ml/kg, is less than the stated 40.6 ml/kg. But is this difference statistically significant, i.e., large enough to statistically convince us that the mean oxygen intake is less than 40.6 ml/kg? Stated differently, is there enough evidence to support the claim based on a level of significance of $\alpha$ = 0.05?

Step 1: Determine the Null and Alternative Hypotheses

Because the researcher believes the mean oxygen intake is less than 40.6 ml/kg, with no equality implied, the claim is the alternative hypothesis. Therefore the null and alternative hypotheses can be written as $H_0: \mu = 40.6$ and $H_a: \mu < 40.6$ respectively.
The alternative hypothesis indicates a left-tailed test is to be conducted, meaning the entire rejection region falls on the left side of the distribution. Since the level of significance is set at 0.05, the area in the rejection region will be 0.05. If the p-value turns out to be less than or equal to 0.05, the null hypothesis should be rejected; otherwise the null hypothesis should be retained.

Step 2: Calculate the Test Statistic

Using Equation 8.1, determine the test statistic (the t-score) by standardizing the sample mean of 39.2 ml/kg with a sample standard deviation of 3 ml/kg.

$$t = \frac{39.2 - 40.6}{3 / \sqrt{35}} \approx -2.76$$

Step 3: Find the p-value

Next, determine the correct degrees of freedom for which the t-distribution is applicable, and then find the area to the left of the test statistic. This is equivalent to finding the probability of obtaining a sample mean as extreme or more extreme than 39.2 when the true mean is assumed to be 40.6 ml/kg. The correct degrees of freedom are n - 1 = 34. The probability of being to the left of 39.2 (or being to the left of a t-score of -2.76 based on 34 degrees of freedom) can be found using the interactive p-value calculator for a t-distribution. This application (along with others) is available in the floating menu at the right of the screen.

To use the interactive p-value calculator, we first set the degrees of freedom to the correct value (34 for the current example). Once the degrees of freedom are set, we determine whether we are conducting a left-tailed test, a right-tailed test, or a two-tailed test. Since we are currently conducting a left-tailed test, we will use the box marked "Left Tailed". From here, to find the p-value associated with our test statistic, we adjust the slider (corresponding to the p-value) until the value in the box labeled "Left Tailed" matches the value of the test statistic (-2.76); if we cannot get exactly -2.76, then we get as close as possible. Once the box labeled "Left Tailed" contains the value of the test statistic, the corresponding p-value can be read from the "Corresponding p-value" box. The p-value for this example turns out to be about 0.005.

Step 4: Decide which Hypothesis Appears to be True

Since the p-value is 0.005, the probability of getting a sample mean at least as extreme as 39.2, assuming the population mean is 40.6, is about 0.005 (not very likely). Since this probability is so small, we are led to believe that the null hypothesis is false. Therefore we reject the null hypothesis and favor the alternative hypothesis instead. In summary, since the p-value is smaller than the stated alpha level of 0.05, we reject the null hypothesis in favor of the alternative hypothesis.

Step 5: Write a Statement(s) Explaining Your Conclusion

An example of a good concluding statement would be: "Based on a sample of 35 smokers, there is sufficient evidence (p-value = 0.005) to reject the notion that the mean oxygen intake for a smoker is at least 40.6 ml/kg (the null hypothesis), and therefore, we conclude that the mean oxygen intake for smokers is lower than 40.6 ml/kg (the alternative hypothesis)."

Professor Salary Example: A study conducted by the American Chemical Society claims that the average annual salary of tenured chemistry professors is $70,000. To test this claim, the Association of American Chemistry Professors randomly selected 52 tenured chemistry professors and found their mean salary to be $70,150 along with a standard deviation of $900. Investigate the claim using an alpha level of 0.05 (the level of significance).

Step 1: Determine the Null and Alternative Hypotheses

Since the claim states equality and gives no indication of less than or greater than, the claim is the null hypothesis and the alternative hypothesis must contain the not-equal sign. Therefore, the null and alternative hypotheses are $H_0: \mu = 70{,}000$ and $H_a: \mu \neq 70{,}000$ respectively. As a result, this is a two-tailed test.

Step 2: Calculate the Test Statistic

The sample mean was $70,150. Using the stated standard deviation of $900 and the sample size of 52, we calculate the test statistic using Equation 8.1.

$$t = \frac{70{,}150 - 70{,}000}{900 / \sqrt{52}} \approx 1.20$$

Step 3: Find the P-Value

After determining the degrees of freedom, we can find the p-value using the interactive p-value calculator for a t-distribution. Since we are conducting a two-tailed test, and the test statistic is positive, we find the p-value based on the "Upper two-tailed" critical value (if the test statistic were negative, we would use the "Lower two-tailed" critical value). Thus, based on 51 degrees of freedom, the p-value that corresponds to a test statistic of t = 1.2 is about 0.24.

Step 4: Decide which Hypothesis Appears to be True

Because the p-value, 0.24, is larger than the stated alpha level of 0.05, we fail to reject the null hypothesis. The data does not provide us with sufficient evidence to suggest the null hypothesis is false, as the difference between $70,150 and the hypothesized $70,000 was not statistically significant. The probability of getting a sample mean at least as extreme as $70,150 when the population mean is $70,000 is 0.24, meaning such a result will happen in about one out of every four samples, which is quite common.

Step 5: Write a Statement(s) Explaining Your Conclusion

An example of a good concluding statement would be: "Based on a sample of 52 tenured chemistry professors and a 0.05 level of significance, we do not have enough evidence (p-value = 0.24) to reject the claim that the mean salary of tenured chemistry professors is $70,000".

The same results can be obtained by using a statistical software package such as Minitab.
For instance, below is the output that corresponds to our Professor Salary Example.

Notice that, among other things, Minitab provides us with the value of the test statistic and the corresponding p-value. Regardless of whether we use Minitab or conduct the test "by hand", our decision remains the same, i.e., since the p-value is greater than 0.05 we fail to reject the null hypothesis.
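For readers without access to Minitab or the interactive calculator, the two worked examples in this section can be reproduced with a short script. This is a sketch under stated assumptions rather than the procedure either tool uses: the helper names `t_pdf`, `t_cdf`, and `one_sample_t` are hypothetical, and the t-distribution area is approximated by numerical integration (Simpson's rule) so that only the standard library is needed.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_t(sample_mean, mu0, s, n, tail):
    """Return (t, df, p_value) for a test on a population mean (Equation 8.1)."""
    t = (sample_mean - mu0) / (s / sqrt(n))
    df = n - 1
    if tail == "left":
        p = t_cdf(t, df)
    elif tail == "right":
        p = 1 - t_cdf(t, df)
    elif tail == "two":
        p = 2 * (1 - t_cdf(abs(t), df))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return t, df, p


alpha = 0.05
examples = {
    "oxygen intake (left-tailed)": (39.2, 40.6, 3, 35, "left"),
    "professor salary (two-tailed)": (70150, 70000, 900, 52, "two"),
}
for name, args in examples.items():
    t, df, p = one_sample_t(*args)
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"{name}: t = {t:.2f}, df = {df}, p = {p:.4f} -> {decision}")
```

The p-values computed this way should land close to the values read from the interactive calculator above (about 0.005 and about 0.24); small differences are expected because the calculator is matched to a rounded test statistic.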

Chapter 8.8 Hypothesis Testing about a Population Proportion

Not only can we test values of hypothesized population means, but we can also test the values of hypothesized population proportions. Recall, proportions specify the part or percent of a population that has a specific characteristic or trait. For instance, we might hypothesize about the population proportion of registered voters who will participate in an upcoming election, or hypothesize about the proportion of grizzly bears in Denali National Park that are male.

There are very few differences between conducting a hypothesis test for a population proportion and conducting a hypothesis test for a population mean (see section 8.7): the basic steps are essentially the same. Additionally, all previously discussed terms, such as type I error, p-value, and power, retain their exact meaning. The only difference is that we are concentrating on proportions instead of means, so the null and alternative hypotheses will be statements about a population proportion instead of a population mean. It is also the case that our test statistic will be calculated based on proportions and follows a z-distribution (similar to when confidence intervals about a population proportion were created). Thus, our test statistic will take the form of a z-score, where the equation for the test statistic is given in Equation 8.2.

$$z = \frac{\hat{p} - P}{\sqrt{P(1 - P)/n}} \qquad \text{(Equation 8.2)}$$

In Equation 8.2, P is the value of the population proportion (our hypothesized value), and $\hat{p}$ is the sample proportion, which is defined as the number of observations with the trait/characteristic of interest, x, out of the number in our sample, n, i.e., $\hat{p} = x/n$. However, before working on some examples, we need to first consider the requirements necessary for Equation 8.2 to provide reliable results.
The assumptions for conducting a hypothesis test on a population proportion are:
1. The data was collected via a simple random sample.
2. The sample size must be large enough to ensure an approximately normal distribution. An easy rule of thumb is that if both $nP > 15$ and $n(1 - P) > 15$ are true, then the sample size is adequate. Since we are testing the null hypothesis, we must use the proportion stated in the null hypothesis and not the sample proportion when checking this assumption (other textbooks claim that 15 can be replaced by 10 or as little as 5, although research hints otherwise).

Voter Example: A political analyst states that fewer than 30% of the voting population will vote in an upcoming city election. To support his claim, he randomly samples 400 registered voters in the city and determines that 98 plan to vote in the upcoming election. Conduct an appropriate hypothesis test for the political analyst at the specified level of significance $\alpha$.

Step 1: Determine the null and alternative hypotheses

Because the political analyst stated that less than 30% will vote (no equality implied), the claim must be the alternative hypothesis. Therefore the null and alternative hypotheses can be written as $H_0: P = 0.30$ and $H_a: P < 0.30$ respectively. Note, it is the political analyst's goal to disprove the null hypothesis, thus verifying his statement.

Step 2: Check the assumptions

Since P = 0.30, then (400)(0.3) = 120 and (400)(1 - 0.3) = 280. Since both are greater than 15, the sample size is sufficient to use the z-distribution.

Step 3: Calculate the test statistic and p-value

The value of the sample proportion is $\hat{p} = 98/400 = 0.245$ and the test statistic is:

$$z = \frac{0.245 - 0.30}{\sqrt{0.30(1 - 0.30)/400}} \approx -2.40$$

Because the alternative hypothesis states "less than," this is a left-tailed test and the rejection region is contained in the left tail only. Thus, to calculate the p-value that corresponds to the test statistic, we must first activate the p-value calculator for the z-distribution interactive tool (found in the floating menu to the right of the screen). Once activated, the interactive tool should look similar to the accompanying figure.