Introduction. Statistics Toolbox


Introduction

A hypothesis test is a procedure for determining whether an assertion about a characteristic of a population is reasonable. For example, suppose that someone says that the average price of a gallon of regular unleaded gas in Massachusetts is $1.15. How would you decide whether this statement is true? You could try to find out what every gas station in the state was charging and how many gallons they were selling at that price. That approach might be definitive, but it could end up costing more than the information is worth. A simpler approach is to find out the price of gas at a small number of randomly chosen stations around the state and compare the average price to $1.15.

Of course, the average price you get will probably not be exactly $1.15 due to variability in price from one station to the next. Suppose your average price was $1.18. Is this three-cent difference a result of chance variability, or is the original assertion incorrect? A hypothesis test can provide an answer.

The following sections provide an overview of hypothesis testing with the Statistics Toolbox:

Hypothesis Test Terminology
Hypothesis Test Assumptions
Example: Hypothesis Testing
Available Hypothesis Tests

Hypothesis Test Terminology

To get started, there are some terms to define and assumptions to make:

The null hypothesis is the original assertion. In this case the null hypothesis is that the average price of a gallon of gas is $1.15. The notation is H0: µ = 1.15.

There are three possibilities for the alternative hypothesis. You might only be interested in the result if gas prices were actually higher. In this case, the alternative hypothesis is H1: µ > 1.15. The other possibilities are H1: µ < 1.15 and H1: µ ≠ 1.15.

The significance level is related to the degree of certainty you require in order to reject the null hypothesis in favor of the alternative. By taking a small sample you cannot be certain about your conclusion, so you decide in advance to reject the null hypothesis if the probability of observing your sampled result is less than the significance level. For a typical significance level of 5%, the notation is α = 0.05. For this significance level, the probability of incorrectly rejecting the null hypothesis when it is actually true is 5%. If you need more protection from this error, then choose a lower value of α.

The p-value is the probability of observing the given sample result under the assumption that the null hypothesis is true. If the p-value is less than α, then you reject the null hypothesis. For example, if α = 0.05 and the p-value is 0.03, then you reject the null hypothesis. The converse is not true: if the p-value is greater than α, you have insufficient evidence to reject the null hypothesis.

The outputs of many hypothesis test functions also include confidence intervals. Loosely speaking, a confidence interval is a range of values that has a chosen probability of containing the true hypothesized quantity. Suppose, in the example, 1.15 is inside a 95% confidence interval for the mean µ. That is equivalent to being unable to reject the null hypothesis at a significance level of 0.05. Conversely, if the 100(1 − α)% confidence interval does not contain 1.15, then you reject the null hypothesis at the α level of significance.
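As a minimal sketch of this decision rule (not part of the original example; the sample below is simulated with made-up numbers, and ztest is called with the positional alpha argument documented later on the ztest reference page):

alpha = 0.05;                               % chosen significance level
x = normrnd(1.18, 0.04, 20, 1);             % hypothetical sample of 20 prices
[h, p, ci] = ztest(x, 1.15, 0.04, alpha);   % H0: mu = 1.15, with known sigma = 0.04
if p < alpha
    disp('Reject the null hypothesis')      % p below alpha: reject H0
else
    disp('Insufficient evidence to reject the null hypothesis')
end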

Hypothesis Test Assumptions

The difference between hypothesis test procedures often arises from differences in the assumptions that the researcher is willing to make about the data sample. For example, the Z-test assumes that the data represent independent samples from the same normal distribution and that you know the standard deviation, σ. The t-test has the same assumptions, except that you estimate the standard deviation using the data instead of specifying it as a known quantity.

Both tests have an associated signal-to-noise ratio:

Z = (x̄ − µ0) / (σ/√n)        T = (x̄ − µ0) / (s/√n)

where x̄ is the sample average, µ0 is the hypothesized mean, n is the number of observations, σ is the known standard deviation (Z-test), and s is the standard deviation estimated from the data (t-test). The signal is the difference between the average and the hypothesized mean; the noise is the standard deviation, posited or estimated.

If the null hypothesis is true, then Z has a standard normal distribution, N(0,1), and T has a Student's t distribution with degrees of freedom, ν, equal to one less than the number of data values. Given the observed result for Z or T, and knowing their distributions assuming the null hypothesis is true, it is possible to compute the probability (p-value) of observing this result. A very small p-value casts doubt on the truth of the null hypothesis. For example, suppose that the p-value was 0.001, meaning that the probability of observing the given Z or T was one in a thousand. That should make you skeptical enough about the null hypothesis that you reject it rather than believe that your result was just a lucky 999-to-1 shot.

There are also nonparametric tests that do not even require the assumption that the data come from a normal distribution. In addition, there are functions for testing whether the normal assumption is reasonable.
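To make the signal-to-noise idea concrete, the following sketch computes Z and T directly from a sample and recovers the corresponding two-sided p-values. The sample here is simulated, not the gas-price data, and mu0 and sigma are assumed known only for the Z statistic:

x     = normrnd(1.18, 0.04, 20, 1);           % simulated sample of 20 values
mu0   = 1.15;                                 % hypothesized mean
sigma = 0.04;                                 % known standard deviation (Z-test only)
n     = length(x);
Z  = (mean(x) - mu0) / (sigma  / sqrt(n));    % signal over known noise
T  = (mean(x) - mu0) / (std(x) / sqrt(n));    % signal over estimated noise
pZ = 2 * (1 - normcdf(abs(Z)));               % two-sided p-value from N(0,1)
pT = 2 * (1 - tcdf(abs(T), n - 1));           % two-sided p-value from t with n-1 df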

Example: Hypothesis Testing

This example uses the gasoline price data in gas.mat. There are two samples of 20 observed gas prices for the months of January and February.

load gas

As a first step, you may want to test whether the samples from each month follow a normal distribution. As each sample is relatively small, you might choose to perform a Lilliefors test (rather than a Jarque-Bera test).

lillietest(price1)
ans =
     0

lillietest(price2)
ans =
     0

The result of each hypothesis test is a Boolean value that is 0 when you do not reject the null hypothesis, and 1 when you do reject that hypothesis. In each case, there is no need to reject the null hypothesis that the samples have a normal distribution.

Suppose it is historically true that the standard deviation of gas prices at gas stations around Massachusetts is four cents a gallon. The Z-test is a procedure for testing the null hypothesis that the average price of a gallon of gas in January (price1) is $1.15.

[h,pvalue,ci] = ztest(price1/100,1.15,0.04)
h =
     0
pvalue =
ci =

The Boolean output is 0, so you do not reject the null hypothesis. The result suggests that $1.15 is reasonable, and the 95% confidence interval neatly brackets $1.15.

What about February? Try a t-test with price2. Now you are not assuming that you know the standard deviation in price.

[h,pvalue,ci] = ttest(price2/100,1.15)
h =
     1
pvalue =
ci =

With the Boolean result h = 1 and a p-value on the order of 1e-04, you can reject the null hypothesis at the default significance level, 0.05. It looks like $1.15 is not a reasonable estimate of the gasoline price in February: the low end of the 95% confidence interval is greater than 1.15.

The function ttest2 allows you to compare the means of the two data samples.

[h,sig,ci] = ttest2(price1,price2)
h =
sig =
ci =

The confidence interval (ci above) indicates that gasoline prices were between one and six cents lower in January than in February.

If the two samples were not normally distributed but had similar shape, it would have been more appropriate to use the nonparametric rank sum test in place of the t-test. You can still use the rank sum test with normally distributed data, but it is less powerful than the t-test.

[p,h,stats] = ranksum(price1,price2)
p =
stats =
       zval:
    ranksum: 314

As might be expected, the rank sum test leads to the same conclusion, but it is less sensitive to the difference between the samples (higher p-value).

The box plot below gives less conclusive results. On a notched box plot, two groups have overlapping notches if their medians are not significantly different. Here the notches just barely overlap, indicating that the difference in medians is of borderline significance. (The results for a box plot are not always the same as for a t-test, which is based on means rather than medians.) Refer to Statistical Plots for more information about box plots.

boxplot(prices,1)
set(gca,'xticklabel',str2mat('january','february'))
xlabel('month')
ylabel('prices ($0.01)')

Available Hypothesis Tests

The Statistics Toolbox has functions for performing the following tests.

Function      What it Tests
chi2gof       Chi-square test of distribution of one normal sample
dwtest        Durbin-Watson test
jbtest        Normal distribution for one sample
kstest        Any specified distribution for one sample
kstest2       Equal distributions for two samples
lillietest    Normal distribution for one sample
ranksum       Median of two unpaired samples
runstest      Randomness of the sequence of observations
signrank      Median of two paired samples
signtest      Median of two paired samples
ttest         Mean of one normal sample
ttest2        Mean of two normal samples
vartest       Variance of one normal sample
vartest2      Variance of two normal samples
vartestn      Variance of N normal samples
ztest         Mean of normal sample with known standard deviation
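For example, one of the variance tests in the table can be used to check the equal-variance assumption behind the two-sample t-test used earlier. This is a sketch only, not part of the original example:

load gas
[h, p] = vartest2(price1, price2)   % H0: the two normal samples have equal variances
% h = 0 supports the pooled-variance form of ttest2 used above;
% h = 1 would suggest calling ttest2 with the 'unequal' option instead.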

ztest

Hypothesis testing for the mean of one sample with known variance

Syntax

ztest(x,m,sigma)
ztest(x,m,sigma,alpha)
[h,sig,ci] = ztest(x,m,sigma,alpha,tail)
ztest(...,alpha,tail,dim)

Description

ztest(x,m,sigma) performs a Z test at significance level 0.05 to determine whether a sample x from a normal distribution with standard deviation sigma could have mean m. x can also be a matrix or an n-dimensional array. For matrices, ztest performs separate Z tests along each column of x and returns a vector of results. For n-dimensional arrays, ztest works along the first nonsingleton dimension of x.

ztest(x,m,sigma,alpha) gives control of the significance level alpha. For example, if alpha = 0.01 and the result is 1, you can reject the null hypothesis at the 0.01 significance level. If the result is 0, you cannot reject the null hypothesis at the alpha level of significance.

[h,sig,ci] = ztest(x,m,sigma,alpha,tail) allows specification of one- or two-tailed tests, where tail is a flag that specifies one of three alternative hypotheses:

tail = 'both' specifies the alternative that the mean is not m (default).
tail = 'right' specifies the alternative that the mean is greater than m.
tail = 'left' specifies the alternative that the mean is less than m.

The value of the Z statistic is

z = (x̄ − m) / (sigma/√n)

where n is the number of observations in the sample. sig is the probability that the observed value of Z could be as large or larger by chance under the null hypothesis that the mean of x is equal to m. ci is a 1 − alpha confidence interval for the true mean.

ztest(...,alpha,tail,dim) performs the test along dimension dim of the input array x. For a matrix x, dim = 1 computes the Z test for each column (along the first dimension), and dim = 2 computes the Z test for each row. By default, ztest works along the first nonsingleton dimension, so it treats a single-row input as a row vector.

Example

This example generates 100 normal random numbers with theoretical mean 0 and

standard deviation 1. The observed mean and standard deviation are different from their theoretical values, of course. You test the hypothesis that there is no true difference.

m = mean(x)
m =

[h,sig,ci] = ztest(x,0,1)
h =
     0
sig =
ci =

The result h = 0 means that you cannot reject the null hypothesis. The significance level is roughly 0.47, which means that by chance you would have observed values of Z more extreme than the one in this example in 47 of 100 similar experiments. The 95% confidence interval on the mean includes the theoretical (and hypothesized) mean of zero.
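The tail argument is not exercised in this example. The following is a right-tailed sketch using the gas-price sample from the earlier example, with the positional syntax documented above (the specific outputs are not reproduced here):

load gas
[h, sig, ci] = ztest(price1/100, 1.15, 0.04, 0.05, 'right')
% Alternative H1: the mean January price is greater than $1.15.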

ttest

Hypothesis testing for a single sample mean

Syntax

ttest(x)
ttest(x,m)
ttest(x,y)
ttest(...,alpha)
ttest(...,alpha,tail)
[h,p,ci,stats] = ttest(...)
ttest(...,alpha,tail,dim)

Description

ttest(x) performs a t-test of the hypothesis that the data in the vector x come from a distribution with mean zero, and returns the result of the test in h. h = 0 indicates that the null hypothesis (mean is zero) cannot be rejected at the 5% significance level. h = 1 indicates that the null hypothesis can be rejected at the 5% level. The data are assumed to come from a normal distribution with unknown variance.

x can also be a matrix or an n-dimensional array. For matrices, ttest performs separate t-tests along each column of x and returns a vector of results. For n-dimensional arrays, ttest works along the first nonsingleton dimension of x.

ttest(x,m) performs a t-test of the hypothesis that the data in the vector x come from a distribution with mean m.

ttest(x,y) performs a paired t-test of the hypothesis that two matched (or paired) samples in the vectors x and y come from distributions with equal means. The difference x−y is assumed to come from a normal distribution with unknown variance. x and y must be vectors of the same length, or arrays of the same size.

ttest(...,alpha) performs the test at the significance level (100*alpha)%. For example, if alpha = 0.01 and the result h is 1, you can reject the null hypothesis at the 0.01 significance level. If h is 0, you cannot reject the null hypothesis at the alpha level of significance.

ttest(...,alpha,tail) performs the test against the alternative hypothesis specified by tail. There are three options for tail:

'both' specifies the alternative that the mean is not m (two-tailed test). This is the default.
'right' specifies the alternative that the mean is greater than m (right-tailed test).
'left' specifies the alternative that the mean is less than m (left-tailed test).

[h,p,ci,stats] = ttest(...) returns a structure stats with the following fields: 'tstat', the value of the test statistic; 'df', the degrees of freedom of the test; and 'sd', the sample standard deviation (for a paired test, the standard deviation of x−y).

The output p is the p-value associated with the t-statistic

t = (x̄ − m) / (s/√n)

where s is the sample standard deviation and n is the number of observations in the sample. p is the probability that the value of the t-statistic is equal to or more extreme than the observed value by chance, under the null hypothesis that the mean of x is equal to m. ci is a 1 − alpha confidence interval for the true mean.

ttest(...,alpha,tail,dim) performs the test along dimension dim of the input array x. For a matrix x, dim = 1 computes the t-test for each column (along the first dimension), and dim = 2 computes the t-test for each row. By default, ttest works along the first nonsingleton dimension, so it treats a single-row input as a row vector.

Example

This example generates 100 normal random numbers with theoretical mean 0 and standard deviation 1. The observed mean and standard deviation are different from their theoretical values, of course, so you test the hypothesis that there is no true difference.

[h,p,ci] = ttest(x,0)
h =
     0
p =
ci =

The result h = 0 means that you cannot reject the null hypothesis. The significance level is roughly 0.45, which means that by chance you would have observed values of T more extreme than the one in this example in 45 of 100 similar experiments. The 95% confidence interval on the mean includes the theoretical (and hypothesized) mean of zero.
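The paired form ttest(x,y) is not shown in the example above. The following is a sketch with the gas-price samples, under the assumption (not stated in the original data description) that the January and February prices were recorded at the same 20 stations, which is what a paired test requires:

load gas
[h, p, ci, stats] = ttest(price1, price2)
% H0: the mean of the station-by-station differences price1 - price2 is zero.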

ttest2

Hypothesis testing for the difference in means of two samples

Syntax

ttest2(x,y)
[h,significance,ci] = ttest2(x,y,alpha)
[h,significance,ci,stats] = ttest2(x,y,alpha)
[...] = ttest2(x,y,alpha,tail)
ttest2(x,y,alpha,tail,'unequal')
ttest2(...,dim)

Description

ttest2(x,y) performs a t-test to determine whether two samples from a normal distribution (x and y) could have the same mean when the standard deviations are unknown but assumed equal. The vectors x and y can have different lengths.

x and y can also be matrices or n-dimensional arrays. For matrices, ttest2 performs separate t-tests along each column and returns a vector of results; x and y must have the same number of columns. For n-dimensional arrays, ttest2 works along the first nonsingleton dimension; x and y must have the same size along all the remaining dimensions.

The result, h, is 1 if you can reject the null hypothesis that the means are equal at the 0.05 significance level, and 0 otherwise. significance is the p-value associated with the t-statistic

t = (x̄ − ȳ) / (s √(1/n + 1/m))

where s is the pooled sample standard deviation and n and m are the numbers of observations in the x and y samples. significance is the probability that the observed value of T could be as large or larger by chance under the null hypothesis that the mean of x is equal to the mean of y. ci is a 95% confidence interval for the true difference in means.

[h,significance,ci] = ttest2(x,y,alpha) gives control of the significance level alpha. For example, if alpha = 0.01 and the result, h, is 1, you can reject the null hypothesis at the 0.01 significance level. ci in this case is a 100(1 − alpha)% confidence interval for the true difference in means.

[h,significance,ci,stats] = ttest2(x,y,alpha) returns a structure stats with the following three fields: 'tstat', the value of the test statistic; 'df', the degrees of freedom of the test; and 'sd', the pooled estimate of the population standard deviation in the equal variance case, or a vector containing the unpooled estimates of the population standard deviations in the unequal variance case.

[...] = ttest2(x,y,alpha,tail) allows specification of one- or two-tailed tests, where tail is a flag that specifies one of three alternative hypotheses:

tail = 'both' specifies the alternative that the means are not equal (default).
tail = 'right' specifies the alternative that the mean of x is greater than the mean of y.
tail = 'left' specifies the alternative that the mean of x is less than the mean of y.

[...] = ttest2(x,y,alpha,tail,'unequal') performs the test assuming that the two samples come from normal distributions with unknown and possibly unequal variances. This is known as the Behrens-Fisher problem. ttest2 uses Satterthwaite's approximation for the effective degrees of freedom.

[...] = ttest2(...,dim) performs the test along dimension dim of the input x and y arrays. For matrix inputs, dim = 1 computes the t-test for each column (along the first dimension), and dim = 2 computes the t-test for each row. By default, ttest2 works along the first nonsingleton dimension, so it treats single-row inputs as row vectors.

Examples

This example generates 100 normal random numbers with theoretical mean 0 and standard deviation 1. You then generate 100 more normal random numbers with theoretical mean 1/2 and standard deviation 1. The observed means and standard deviations are different from their theoretical values, of course. You test the hypothesis that there is no true difference between the two means. Notice that the true difference is only one-half of the standard deviation of the individual observations, so you are trying to detect a signal that is only one-half the size of the inherent noise in the process.

[h,significance,ci] = ttest2(x,y)
h =
     1
significance =
ci =

The result h = 1 means that you can reject the null hypothesis. The significance is roughly 0.0017, which means that by chance you would have observed values of t more extreme than the one in this example in only 17 of 10,000 similar experiments. The 95% confidence interval on the difference between the means includes the theoretical difference of −0.5.
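The 'unequal' option is described above but not demonstrated. The following is a sketch with simulated samples whose spreads clearly differ; the values are illustrative only:

x = normrnd(0,   1, 100, 1);                             % mean 0, standard deviation 1
y = normrnd(0.5, 2, 100, 1);                             % mean 1/2, standard deviation 2
[h, significance, ci] = ttest2(x, y, 0.05, 'both', 'unequal')
% Satterthwaite's approximation supplies the effective degrees of freedom.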
