How to Conduct a Hypothesis Test

The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask: is the event due to chance alone, or is there some cause that we should be looking for? We need a way to differentiate between events that easily occur by chance and those that are highly unlikely to occur randomly. Such a method should be streamlined and well defined so that others can replicate our statistical experiments.

There are a few different methods used to conduct hypothesis tests. One of these is known as the traditional method, and another involves what is known as a p-value. The steps of these two most common methods are identical up to a point, then diverge slightly. Both the traditional method for hypothesis testing and the p-value method are outlined below.

The Traditional Method

The traditional method is as follows:

1. Begin by stating the claim or hypothesis that is being tested. Also form a statement for the case that the hypothesis is false.
2. Express both of the statements from the first step in mathematical symbols. These statements will use symbols such as inequalities and equals signs.
3. Identify which of the two symbolic statements does not have equality in it. This could simply be a "not equals" sign, but could also be an "is less than" sign (<) or an "is greater than" sign (>). The statement containing inequality is called the alternative hypothesis, and is denoted H1 or Ha.
4. The statement from the first step asserting that a parameter equals a particular value is called the null hypothesis, denoted H0.
5. Choose the significance level that we want. A significance level is typically denoted by the Greek letter alpha. Here we should consider Type I errors. A Type I error occurs when we reject a null hypothesis that is actually true. If we are very concerned about this possibility, then our value for alpha should be small. There is a bit of a trade-off here: the smaller the alpha, the more costly the experiment. The values 0.05 and 0.01 are common values used for alpha, but any positive number between 0 and 0.50 could be used for a significance level.
6. Determine which statistic and distribution we should use. The type of distribution is dictated by features of the data. Common choices include the z-score, the t-score and chi-squared.
7. Find the test statistic and the critical value for this statistic. Here we will have to consider whether we are conducting a two-tailed test (typically when the alternative hypothesis contains an "is not equal to" symbol) or a one-tailed test (typically when the alternative hypothesis contains a strict inequality such as < or >).
8. From the type of distribution, significance level, critical value and test statistic we sketch a graph.
9. If the test statistic is in our critical region, then we must reject the null hypothesis. The alternative hypothesis stands. If the test statistic is not in our critical region, then we fail to reject the null hypothesis. This does not prove that the null hypothesis is true. It only means that the sample does not provide enough evidence against it at the chosen significance level.
10. We now state the results of the hypothesis test in such a way that the original claim is addressed.
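Steps 7 through 9 of the traditional method can be sketched in code for the simplest case. The following is a minimal illustration, assuming that a z test statistic has already been computed and that the standard normal distribution is the appropriate one. The function name `critical_value_test` and the `tail` labels are hypothetical, chosen only for this sketch.

```python
from statistics import NormalDist


def critical_value_test(z_stat, alpha=0.05, tail="two-sided"):
    """Return (critical_value, reject) using the critical-region rule.

    Assumes z_stat follows a standard normal distribution when the
    null hypothesis is true. The tail labels are illustrative only.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        # Alpha is split evenly between the two tails.
        crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z_stat) >= crit
    elif tail == "right":
        # Alternative hypothesis of the form "parameter > value".
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z_stat >= crit
    elif tail == "left":
        # Alternative hypothesis of the form "parameter < value".
        crit = std_normal.inv_cdf(alpha)
        reject = z_stat <= crit
    else:
        raise ValueError("tail must be 'two-sided', 'right' or 'left'")
    return crit, reject


if __name__ == "__main__":
    crit, reject = critical_value_test(2.1, alpha=0.05, tail="two-sided")
    print(f"critical value = {crit:.3f}, reject H0: {reject}")
```

A two-tailed test splits alpha between both tails, which is why its critical value is larger in absolute value than the one-tailed critical value at the same significance level.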

The p-value Method

The p-value method is nearly identical to the traditional method. The first six steps are the same. For step seven we find the test statistic and the p-value. We then reject the null hypothesis if the p-value is less than or equal to alpha. We fail to reject the null hypothesis if the p-value is greater than alpha. We then wrap up the test as before, by clearly stating the results.

An Example of a Hypothesis Test

Mathematics and statistics are not for spectators. To truly understand what is going on, we should read through and work through several examples. Once we know the ideas behind hypothesis testing and have seen an overview of the method, the next step is to see an example. The following shows an example of both the traditional method of a hypothesis test and the p-value method.

A Statement of the Problem

Suppose that a doctor claims that 17 year olds have an average body temperature that is higher than the commonly accepted average human temperature of 98.6 degrees Fahrenheit. A simple random statistical sample of 25 people, each of age 17, is selected. The average temperature of the 17 year olds is found to be 98.9 degrees, with a standard deviation of 0.6 degrees.

The Null and Alternative Hypotheses

The claim being investigated is that the average body temperature of 17 year olds is greater than 98.6 degrees. This corresponds to the statement x > 98.6 (here x stands for the population average temperature). The negation of this is that the population average is not greater than 98.6 degrees. In other words, the average temperature is less than or equal to 98.6 degrees. In symbols this is x ≤ 98.6.

One of these statements must become the null hypothesis, and the other should be the alternative hypothesis. The null hypothesis contains equality. So for the above, the null hypothesis is H0 : x = 98.6. It is common practice to state the null hypothesis only in terms of an equals sign, and not a "greater than or equal to" or "less than or equal to" sign. The statement that does not contain equality is the alternative hypothesis, or H1 : x > 98.6.
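For the body temperature example, the numbers given in the problem are enough to sketch the remaining steps. This is only an illustration under stated assumptions: the 0.6 degrees is taken to be the sample standard deviation, so a t statistic with 25 - 1 = 24 degrees of freedom applies, and a significance level of 0.05 is assumed because the problem statement does not specify one. The helper names are hypothetical, the critical value 1.711 is read from a standard t table, and the p-value is approximated numerically rather than taken from a statistics library.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 with Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The distribution is symmetric, so P(T >= 0) = 0.5.
    return 0.5 - total * h / 3


# Values from the problem statement.
n, sample_mean, sample_sd, mu_0 = 25, 98.9, 0.6, 98.6
alpha = 0.05          # assumed; not given in the problem
t_critical = 1.711    # one-tailed, 24 degrees of freedom, from a t table

t_stat = (sample_mean - mu_0) / (sample_sd / math.sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("traditional method rejects H0:", t_stat >= t_critical)
print("p-value method rejects H0:", p_value <= alpha)
```

Under these assumptions the test statistic is 2.5, which lies in the critical region, and the p-value comes out just under 0.01, so both methods reject the null hypothesis at the 0.05 level.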

What is the Difference Between Alpha and P-Values

In conducting a test of significance or hypothesis test there are two numbers that are easy to get confused. One number is called the p-value of the test statistic. The other number of interest is the level of significance, or alpha. These numbers are easily confused because they are both numbers between zero and one, and are in fact probabilities.

Alpha: The Level of Significance

The number alpha is the threshold value that we measure p-values against. It tells us how extreme observed results must be in order to reject the null hypothesis of a significance test.

The value of alpha is associated with the confidence level of our test. The following lists some levels of confidence with their related values of alpha:

For results with a 90% level of confidence, the value of alpha is 1 - 0.90 = 0.10.
For results with a 95% level of confidence, the value of alpha is 1 - 0.95 = 0.05.
For results with a 99% level of confidence, the value of alpha is 1 - 0.99 = 0.01.
And in general, for results with a C% level of confidence, the value of alpha is 1 - C/100.

Although in theory and practice many numbers can be used for alpha, the most commonly used is 0.05. The reason for this is both that consensus holds this level to be appropriate and that historically it has been accepted as the standard.
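The relationship between a confidence level and alpha is simple arithmetic, and a short sketch makes the conversion explicit. The function name `alpha_from_confidence` is hypothetical and used only for illustration.

```python
def alpha_from_confidence(confidence_percent):
    """Convert a confidence level given in percent to a significance level."""
    if not 0 < confidence_percent < 100:
        raise ValueError("confidence level must be strictly between 0 and 100")
    return 1 - confidence_percent / 100


if __name__ == "__main__":
    for c in (90, 95, 99):
        # Rounded only for display; floating-point subtraction is inexact.
        print(f"{c}% confidence -> alpha = {alpha_from_confidence(c):.2f}")
```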

The alpha value gives us the probability of a Type I error. Type I errors occur when we reject a null hypothesis that is actually true. Thus, in the long run, for a test with level of significance of 0.05 = 1/20, a true null hypothesis will be rejected about one out of every 20 times.

P-Values (more on p-values below)

The other number that is part of a test of significance is a p-value. A p-value is also a probability, but it comes from a different source than alpha. Every test statistic has a corresponding probability or p-value. This value is the probability, computed under the assumption that the null hypothesis is true, of observing a statistic at least as extreme as the one actually observed.

Since there are a number of different test statistics, there are a number of different ways to find a p-value. For some cases we need to know the probability distribution of the population.

The p-value of the test statistic is a way of saying how extreme that statistic is for our sample data. The smaller the p-value, the more unlikely the observed sample would be if the null hypothesis were true.

Statistical Significance

To determine if an observed outcome is statistically significant, we compare the values of alpha and the p-value. There are two possibilities that emerge:

The p-value is less than or equal to alpha. In this case we reject the null hypothesis. When this happens we say that the result is statistically significant. In other words, we are reasonably sure that there is something besides chance alone that gave us the observed sample.

The p-value is greater than alpha. In this case we fail to reject the null hypothesis. When this happens we say that the result is not statistically significant. In other words, our observed data are consistent with what chance alone could produce.

The implication of the above is that the smaller the value of alpha is, the more difficult it is to claim that a result is statistically significant. On the other hand, the larger the value of alpha is, the easier it is to claim that a result is statistically significant. Coupled with this, however, is the higher probability that what we observed can be attributed to chance.

What Level of Alpha Determines Statistical Significance

Not all results of hypothesis tests are equal. A hypothesis test or test of statistical significance typically has a level of significance attached to it. This level of significance is a number that is typically denoted with the Greek letter alpha. One question that comes up in statistics class is, "What value of alpha should be used for our hypothesis tests?"

The answer to this question, as with many other questions in statistics, is, "It depends on the situation." We will explore what we mean by this. Many journals across different disciplines define statistically significant results as those obtained with alpha equal to 0.05, or 5%. But the main point to note is that there is not a universal value of alpha that should be used for all statistical tests.

Commonly Used Levels of Significance

The number represented by alpha is a probability, so it can take a value of any nonnegative real number less than one. Although in theory any number between 0 and 1 can be used for alpha, when it comes to statistical practice this is not the case. Of all levels of significance, the values of 0.10, 0.05 and 0.01 are the ones most commonly used for alpha. As we will see, there could be reasons for using values of alpha other than the most commonly used numbers.

Level of Significance and Type I Errors

One consideration against a "one size fits all" value for alpha has to do with what this number is the probability of. The level of significance of a hypothesis test is exactly equal to the probability of a Type I error. A Type I error consists of incorrectly rejecting the null hypothesis when the null hypothesis is actually true. The smaller the value of alpha, the less likely it is that we reject a true null hypothesis.

There are instances where it is more acceptable to have a Type I error. A larger value of alpha, even one greater than 0.10, may be appropriate when a smaller value of alpha results in a less desirable outcome.

In medical screening for a disease, compare the consequences of a test that falsely tests positive for a disease with one that falsely tests negative. A false positive will result in anxiety for our patient, but will lead to other tests that will determine that the verdict of our test was indeed incorrect. A false negative will give our patient the incorrect assumption that he does not have a disease when he in fact does. The result is that the disease will not be treated. Given the choice, we would rather have conditions that result in a false positive than a false negative.

In this situation we would gladly accept a greater value for alpha if it resulted in a tradeoff of a lower likelihood of a false negative.

Level of Significance and P-Values

A level of significance is a value that we set to determine statistical significance. This ends up being the standard by which we measure the calculated p-value of our test statistic. To say that a result is statistically significant at the level alpha just means that the p-value is less than or equal to alpha. For instance, for a value of alpha = 0.05, if the p-value is greater than 0.05, then we fail to reject the null hypothesis.

There are some instances in which we would need a very small p-value to reject a null hypothesis. If our null hypothesis concerns something that is widely accepted as true, then there must be a high degree of evidence in favor of rejecting the null hypothesis. This is provided by a p-value that is much smaller than the commonly used values for alpha.

Conclusion

There is not one value of alpha that determines statistical significance. Although numbers such as 0.10, 0.05 and 0.01 are values commonly used for alpha, there is no overriding mathematical theorem that says these are the only levels of significance that we can use. As with many things in statistics, we must think before we calculate and above all use common sense.

What is a P-Value?

Hypothesis tests or tests of significance involve the calculation of a number known as a p-value. This number is very important to the conclusion of our test. P-values are related to the test statistic and give us a measurement of evidence against the null hypothesis.

Null and Alternative Hypotheses

Tests of statistical significance all begin with a null and an alternative hypothesis. The null hypothesis is the statement of no effect or a statement of the commonly accepted state of affairs. The alternative hypothesis is what we are attempting to prove. The working assumption in a hypothesis test is that the null hypothesis is true.

Test Statistic

We will assume that the conditions are met for the particular test that we are working with. A simple random sample gives us sample data. From this data we can calculate a test statistic. Test statistics vary greatly depending upon what parameters our hypothesis test concerns. Some common test statistics include:

z-statistic for hypothesis tests concerning the population mean, when we know the population standard deviation.
t-statistic for hypothesis tests concerning the population mean, when we do not know the population standard deviation.
t-statistic for hypothesis tests concerning the difference of two independent population means, when we do not know the standard deviation of either of the two populations.
z-statistic for hypothesis tests concerning a population proportion.
Chi-square statistic for hypothesis tests concerning the difference between an expected and actual count for categorical data.

Calculation of P-Values

Test statistics are helpful, but it can be more helpful to assign a p-value to these statistics. A p-value is the probability that, if the null hypothesis were true, we would observe a statistic at least as extreme as the one observed. To calculate a p-value we use the appropriate software or statistical table that corresponds with our test statistic.
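As a rough sketch of how software turns a test statistic into a p-value, the following handles only the z-statistic case, using the standard normal distribution from Python's standard library. The function name `z_p_value` and the `tail` labels are hypothetical, and the significance level of 0.05 is an assumption made for the illustration.

```python
from statistics import NormalDist


def z_p_value(z_stat, tail="two-sided"):
    """p-value for a z test statistic under a standard normal null distribution."""
    cdf = NormalDist().cdf  # standard normal cumulative distribution function
    if tail == "two-sided":
        # Both tails count as "at least as extreme".
        return 2 * (1 - cdf(abs(z_stat)))
    if tail == "right":
        return 1 - cdf(z_stat)
    if tail == "left":
        return cdf(z_stat)
    raise ValueError("tail must be 'two-sided', 'right' or 'left'")


if __name__ == "__main__":
    alpha = 0.05  # assumed significance level
    for z in (0.5, 1.96, 2.5):
        p = z_p_value(z)
        decision = "reject H0" if p <= alpha else "fail to reject H0"
        print(f"z = {z:4.2f}  p-value = {p:.4f}  {decision}")
```

Statistics far from zero produce small p-values and statistics near zero produce large ones, and the decision follows from comparing the p-value with alpha.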

For example, we would use a standard normal distribution when calculating the p-value of a z test statistic. Values of z with large absolute values (such as those over 2.5) are not very common and would give a small p-value. Values of z that are closer to zero are more common, and would give much larger p-values.

Interpretation of the P-Value

As we have noted, a p-value is a probability. This means that it is a real number between 0 and 1. While a test statistic is one way to measure how extreme a statistic is for a particular sample, p-values are another way of measuring this.

When we obtain a given statistical sample, the question that we should always ask is, "Is this sample the way it is by chance alone with a true null hypothesis, or is the null hypothesis false?" If our p-value is small, then this could mean one of two things:

The null hypothesis is true, but we happened to obtain a very unusual sample.
Our sample is the way it is due to the fact that the null hypothesis is false.

In general, the smaller the p-value, the more evidence that we have against our null hypothesis.

How Small Is Small Enough?

How small of a p-value do we need in order to reject the null hypothesis? The answer to this is, "It depends." A common rule of thumb is that the p-value must be less than or equal to 0.05, but there is nothing universal about this value.

Typically, before we conduct a hypothesis test, we choose a threshold value. If we have any p-value that is less than or equal to this threshold, then we reject the null hypothesis. Otherwise we fail to reject the null hypothesis. This threshold is called the level of significance of our hypothesis test, and is denoted by the Greek letter alpha. There is no value of alpha that always defines statistical significance.

How to Construct a Confidence Interval for the Population Variance

One of the goals of inferential statistics is to estimate an unknown population parameter from a statistical sample. The estimate that we obtain is an interval of potential values, and is called a confidence interval. Attached to the interval is a level of confidence, indicating the reliability of our estimate.

One parameter that we may want to estimate is the variance. The variance is a measurement of variability, or in other words, how spread out a data set is. We will see the steps and the theory behind the construction of a confidence interval for a population variance.

Assumptions

It is always a good idea to clearly state what assumptions we need to make in order to move forward. We assume that we are working with a simple random sample of size $n$ from a normal distribution. Or we assume that our sample size is large enough that we can invoke the central limit theorem.

Chi-Square Random Variable

A variance is always nonnegative, and it is positive whenever there is any variability at all in a random variable. Due to this fact, the sample variance is not distributed normally. Using some theory from mathematical statistics, given our assumptions the following is a chi-square random variable with $n - 1$ degrees of freedom:

$$\frac{(n - 1)s^2}{\sigma^2}$$

Here $s^2$ is the sample variance and $\sigma^2$ is the population variance.

Confidence Interval

For a two-sided $1 - \alpha$ confidence interval, we locate the row of a chi-square table that corresponds with our number of degrees of freedom. Next we read two numbers from this row. The first, denoted by $A$, is the table value with probability $\alpha/2$ to the left. The second, denoted by $B$, is the table value with probability $\alpha/2$ to the right. This means that $1 - \alpha$ of the area under our chi-square distribution lies between these two numbers. This gives us:

$$A < \frac{(n - 1)s^2}{\sigma^2} < B$$

Since we want an interval for $\sigma^2$, we rearrange our inequality by dividing through by $(n - 1)s^2$:

$$\frac{A}{(n - 1)s^2} < \frac{1}{\sigma^2} < \frac{B}{(n - 1)s^2}$$

Taking reciprocals reverses the inequalities and gives us the following confidence interval:

$$\frac{(n - 1)s^2}{B} < \sigma^2 < \frac{(n - 1)s^2}{A}$$

Note on Symmetry

Many other confidence intervals are of the form estimate +/- margin of error. These confidence intervals, such as those for a population mean, are symmetric about the estimate that is used. Confidence intervals for the variance do not have this property. Variances are always nonnegative, and a chi-square random variable takes only nonnegative values as well. Furthermore, a chi-square distribution is not symmetric.
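The interval formula can be tried on a small numerical case. The following sketch uses a purely hypothetical sample (n = 10 with sample variance 4.0) and assumes a 95% confidence level. The two chi-square values for 9 degrees of freedom, 2.700 and 19.023, are read from a standard chi-square table rather than computed, and the function name is illustrative only.

```python
def variance_confidence_interval(n, sample_variance, chi2_left, chi2_right):
    """Two-sided confidence interval for a population variance.

    chi2_left  is the table value A with alpha/2 of the area to its left.
    chi2_right is the table value B with alpha/2 of the area to its right.
    Both must come from the chi-square row for n - 1 degrees of freedom.
    """
    numerator = (n - 1) * sample_variance
    lower = numerator / chi2_right  # dividing by the larger value B
    upper = numerator / chi2_left   # dividing by the smaller value A
    return lower, upper


# Hypothetical sample, for illustration only.
n = 10
s2 = 4.0
A, B = 2.700, 19.023  # chi-square table values, 9 df, alpha = 0.05

lower, upper = variance_confidence_interval(n, s2, A, B)
print(f"95% confidence interval for the variance: ({lower:.3f}, {upper:.3f})")
print(f"distance below s^2: {s2 - lower:.3f}, distance above s^2: {upper - s2:.3f}")
```

The sample variance sits much closer to the lower endpoint than to the upper one, which illustrates the lack of symmetry noted above.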