Tutorial 4: Two-sample tests, power analysis, and tabular data

Goal: To provide a more in-depth look at univariate statistics and to explore quantitative and graphical methods necessary for determining the normality of sample data. We will also examine how to analyze grouped data.

Note: All text in the Arial font is instruction or explanation. All text in Courier font is input or output from R.

Step-1: Comparing Variances

Before comparing two sample means, one must customarily test whether the variances are significantly different. The simplest form of this test that we learned in lecture is Fisher's F-max test, where one divides the larger variance by the smaller variance. If the variances are the same, then F = 1. If the variances differ, F will be greater than 1. As in all of biostatistics, we must ask: how big does the value have to be before it is significantly greater than 1?

Let's look at some data provided by Crawley on ozone levels (in pphm) taken on 10 days from two market gardens, B and C:

> gardenb <- c(5,5,6,7,4,4,3,5,6,5)
> gardenc <- c(3,3,2,1,10,4,3,11,3,10)

Since R contains built-in tables of all the major statistics, we can query it at the outset to find the critical value we need to exceed at 9 df (10 - 1, for both gardens) and an alpha of 0.05. Note that since we are not presupposing which garden has the larger variance, this is a two-tailed test, so we must split the alpha into 0.025 per tail; the upper-tail quantile is therefore 0.975:

> qf(0.975,9,9)
[1] 4.025994

Now calculate the variance for each sample (as we have done in previous univariate tutorials) and run the F-test:

> var(gardenb)
[1] 1.333333
> var(gardenc)
[1] 14.22222
> F.ratio <- var(gardenc)/var(gardenb)
> F.ratio
[1] 10.66667

Since the calculated value of F is greater than the table value of F, we reject the null hypothesis and conclude that the variances are significantly different. This result then guides the choice of a specific two-sample test.
NOTE: I assume here that you have already done all of the univariate analysis of these data and confirmed that the samples are normally distributed. Intro to R Page 1
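The long-hand F test in Step-1 can also be written so that it does not depend on which garden is typed first, and so that it returns a p-value rather than only a comparison with the critical value. The following is a minimal sketch and not part of the original tutorial: it puts the larger variance in the numerator and doubles the upper-tail probability from pf to get a two-tailed p-value. The names v, n, F.max, crit, and p.two are illustrative.

```r
gardenb <- c(5,5,6,7,4,4,3,5,6,5)
gardenc <- c(3,3,2,1,10,4,3,11,3,10)

# Sample variances and sample sizes, indexed by garden
v <- c(b = var(gardenb), c = var(gardenc))
n <- c(b = length(gardenb), c = length(gardenc))

# F-max: larger variance over smaller, so the ratio is always >= 1
big   <- which.max(v)
small <- which.min(v)
F.max <- unname(v[big] / v[small])

# Degrees of freedom follow the numerator and denominator samples
df.num <- unname(n[big]) - 1
df.den <- unname(n[small]) - 1

# Critical value for a two-tailed test at alpha = 0.05
crit <- qf(0.975, df.num, df.den)

# Two-tailed p-value: double the upper-tail area, capped at 1
p.two <- min(1, 2 * pf(F.max, df.num, df.den, lower.tail = FALSE))

F.max > crit   # TRUE means the variances differ significantly
p.two
```

Because the ratio is always formed as larger over smaller, the reported F is at least 1 regardless of the order in which the samples are supplied.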

This procedure, while instructive, can be sped up by directly applying the built-in function var.test:

> var.test(gardenb,gardenc)

        F test to compare two variances

data:  gardenb and gardenc
F = , num df = 9, denom df = 9, p-value =
alternative hypothesis: true ratio of variances is not equal to 1
ratio of variances

What's different about the two results? When we did it manually, the F-ratio was approximately 10; when we did it automatically, the F-ratio was approximately 1/10. The difference is that var.test divides the variance of the first sample by the variance of the second, in the order in which they were entered; it does not put the larger variance in the numerator. The good news is that, because the test is two-tailed, the p-value is still correct and you arrive at the same conclusion to reject the null hypothesis. You just need to be aware of this if you are reporting the F-value.

Step-2: Two sample t-test with equal variances

Let's add another garden to the mix and continue with an example:

> gardena <- c(3,4,4,3,2,3,1,3,5,2)

We can start by calculating the critical value of t for two samples of N = 10 each, which gives df = 20 - 2 = 18, again assuming alpha = 0.05 split into two tails:

> qt(0.975, 18)
[1] 2.100922

We can then proceed as above and use R to calculate the t-test long-hand (as we would on a calculator), or take the more direct approach of testing the variances first and then doing the equal-variance t-test:

> var.test(gardena, gardenb)

        F test to compare two variances

data:  gardena and gardenb
F = 1, num df = 9, denom df = 9, p-value = 1
alternative hypothesis: true ratio of variances is not equal to 1

ratio of variances 1

So, we are unable to reject the null hypothesis of no difference between variances and conclude that the variances are homogeneous. Since the data are already normal (not shown), we can proceed with a two-sample t-test. Note that t.test runs the Welch test by default, which does not assume equal variances; adding var.equal=TRUE requests the classical pooled-variance test. With equal sample sizes and equal variances, as here, both versions give the same t and the same df = 18:

> t.test(gardena, gardenb)

        Welch Two Sample t-test

data:  gardena and gardenb
t = , df = 18, p-value =
alternative hypothesis: true difference in means is not equal to 0
mean of x mean of y
        3         5

The conclusion is that the null hypothesis should be rejected: the two gardens differ significantly in mean ozone concentration.

Your stats will then likely be followed by some sort of graphics (in a thesis or manuscript). In the case of two-sample tests, side-by-side boxplots are usually the choice. A nice refinement is the notched boxplot, which draws a notch (an approximate 95% confidence interval) around each median. As a rule of thumb, if the two notches do not overlap, the medians are significantly different. Text can also be added to the plot. Try the following:

> ozone <- c(gardena,gardenb)
> label <- factor(c(rep("a",10),rep("b",10)))
> boxplot(ozone~label,notch=TRUE,xlab="garden",ylab="ozone")
> text(2,2,"t= ")
> text(2,1.5,"p=0.001")
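As a cross-check on the t.test call earlier in this step, the equal-variance t statistic can be worked out long-hand, as one would on a calculator. This is a minimal sketch that assumes the classical pooled-variance formula; it is not taken from the original tutorial, and the names s2.pooled, se.diff, t.stat, and p.two are illustrative.

```r
gardena <- c(3,4,4,3,2,3,1,3,5,2)
gardenb <- c(5,5,6,7,4,4,3,5,6,5)

na <- length(gardena)
nb <- length(gardenb)

# Pooled variance: average of the two sample variances, weighted by df
s2.pooled <- ((na - 1) * var(gardena) + (nb - 1) * var(gardenb)) / (na + nb - 2)

# Standard error of the difference between the two means
se.diff <- sqrt(s2.pooled * (1/na + 1/nb))

# t statistic and its degrees of freedom
t.stat <- (mean(gardena) - mean(gardenb)) / se.diff
df <- na + nb - 2

# Two-tailed p-value from the t distribution
p.two <- 2 * pt(abs(t.stat), df, lower.tail = FALSE)

# Built-in equivalent, with the pooled-variance option switched on
builtin <- t.test(gardena, gardenb, var.equal = TRUE)

abs(t.stat) > qt(0.975, df)   # TRUE means reject the null hypothesis
```

Comparing t.stat and p.two with builtin$statistic and builtin$p.value is a quick way to confirm that the long-hand arithmetic matches what R reports.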

[Figure: notched boxplots of Ozone (y-axis) by Garden (x-axis; A and B), annotated with the t statistic and P = 0.001]

If the data were not normally distributed, there is a nonparametric alternative to the two-sample t-test: the Wilcoxon Rank Sum Test. The automatic procedure in R for doing so is wilcox.test:

> wilcox.test(gardena,gardenb)

        Wilcoxon rank sum test with continuity correction

data:  gardena and gardenb
W = 11, p-value =
alternative hypothesis: true location shift is not equal to 0

Warning message:
cannot compute exact p-value with ties in: wilcox.test.default(gardena, gardenb)

Here the function uses a normal approximation (a z-value) for purposes of computation and hypothesis testing. We reject the null hypothesis because the p-value is far below 0.05. The warning message at the end of the printout is not of particular concern. It simply draws attention to the fact that there are ties in the data, so an approximate value of p has been provided. This is not a real problem for most applications.

Step-3: Tests on paired samples

Recall that there are many instances in biological situations where the two samples are not independent in space or time. These are referred to as paired sample analyses. Consider some data from Crawley (2006) in which one measurement (number of invertebrate species) was taken upstream from a sewage outfall and the other was taken downstream (thus, two paired measurements for each stream):

> streams
   down up
(data rows not shown)

To run a paired t-test, simply specify the paired=TRUE option:

> t.test(down,up,paired=TRUE)

        Paired t-test

data:  down and up
t = , df = 15, p-value =
alternative hypothesis: true difference in means is not equal to 0
mean of the differences

Given the t and p-value reported at df = 15, we reject the null hypothesis and conclude that there is a significant difference in species diversity above vs. below the outfall on each stream.

Note that there is another approach to this type of problem, which involves an analysis of the paired differences (d). This method is more consistent with your text and what we discussed in lecture. It yields identical results, apart from the sign of t, because d is computed here as up - down whereas the paired call above tests down - up:

> d <- up-down
> t.test(d)

        One Sample t-test

data:  d
t = , df = 15, p-value =
alternative hypothesis: true mean is not equal to 0
mean of x

Step-4: The Sign Test

Last in our discussion of two-sample tests, and an extension of the paired t-test that we just did, is the sign test. This test is the nonparametric equivalent of the paired t-test. It is useful when the assumptions of the t-test cannot be met, or when you are working with something that can be scored rather than measured explicitly. These tests are often useful in behavior studies where the investigator scores something as a positive or negative response to a stimulus.

For example, suppose a dive team of 9 divers is evaluated for a new training regimen. Each is asked to dive once and the dive is scored by observers. The divers then go on a 4-week training regimen and are brought back to dive again and be scored again. After the numbers are tallied, each diver is rated as either better or worse relative to the pre-training dive. Suppose 8 were judged to be better and 1 worse. How likely is it that 8 of 9 would be better? This is best modeled with a binomial distribution, asking how likely the observed number of failures (1) is relative to the total (9):

> binom.test(1,9)

        Exact binomial test

data:  1 and 9
number of successes = 1, number of trials = 9, p-value = 0.03906
alternative hypothesis: true probability of success is not equal to 0.5

probability of success

Thus, this is a significant result (p = 0.039) and unlikely to occur by chance. We reject the null hypothesis of no effect of the training regimen and conclude that it improves diving.

A closely related test can be used to compare two proportions. Suppose in a company you observe that 196 men are promoted within a given year and only 4 women. Is this the blatant sexism that it appears to be? To test the question, we must examine how many men and women there are in the company in total and then compare the two proportions. If there are 40 women and 3270 men, we can use the built-in binomial proportions test in R, which relies on the function prop.test:

> prop.test(c(4,196),c(40,3270))

        2-sample test for equality of proportions with continuity correction

data:  c(4, 196) out of c(40, 3270)
X-squared = , df = 1, p-value =
alternative hypothesis: two.sided
prop 1 prop 2

Warning message:
Chi-squared approximation may be incorrect in: prop.test(c(4, 196), c(40, 3270))

There is no statistical evidence of discrimination here: the p-value is about 0.47, so a result like this will happen up to 47% of the time by chance alone.

Step-5: Power and sample size determination

A statistical test will not be able to detect a true difference if the sample size is too small compared to the magnitude of the difference. R has various methods for doing power and sample size calculations for one- and two-sample t-tests and for comparing two proportions. Without going into an extensive review of the theory (see lecture notes and Zar for formulae) and manual calculations in R, let's look directly at the built-in functions for power analysis in R.

Let's consider an example where two groups are given different diets and their growth is measured. We wish to compute the sample size needed for a power of 90%, using a two-sided test at the 1% level, to find a difference of 0.5 cm in a distribution with an SD of 2 cm. The R code for this is:

> power.t.test(delta=0.5, sd=2, sig.level=0.01, power=0.9)

     Two-sample t test power calculation

              n =
          delta = 0.5
             sd = 2
      sig.level = 0.01
          power = 0.9
    alternative = two.sided

NOTE: n is number in *each* group

Note that delta stands for the true difference and sd is the standard deviation. This suggests that a sample size of 478 in each group would be needed for this level of precision.

Alternatively, one could start by substituting sample size guesses into the power function and observing what happens to the calculated level of power as a result. For example (using 250 as a starting guess):

> power.t.test(n=250, delta=0.5, sd=2, sig.level=0.01)

     Two-sample t test power calculation

              n = 250
          delta = 0.5
             sd = 2
      sig.level = 0.01
          power =
    alternative = two.sided

NOTE: n is number in *each* group

Important: note that there are 5 different parameters to the power test (n, delta, sd, sig.level, and power). They are all inter-related: with any 4 you can calculate the 5th. This is a wonderful way to explore the inter-relationships for any experimental design, and a very important set of analyses to do with pilot data prior to starting the full-size experiment.

One-sample problems are handled simply by adding type="one.sample" to the call to the power function. For paired tests, simply specify type="paired". By way of example, for a paired t-test:

> power.t.test(delta=10, sd=10, power=0.8, type="paired")

     Paired t test power calculation

              n =
          delta = 10
             sd = 10

      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs

Notice that a significance level of 0.05 was accepted automatically as the default.

Lastly, power.prop.test is available to compare proportions and is analogous to what we just did for t-tests. The main difference is that delta and sd are replaced by the hypothesized probabilities in the two groups, p1 and p2. Suppose there are two groups of people, one of which is given nicotine gum and the other nothing. The binary outcome is cessation of smoking. The stipulated values are p1 = 0.15 and p2 = 0.30. We desire a power of at least 80% and a traditional 5% significance level. How many people do we need in each group to run this experiment within these parameters?

> power.prop.test(power=0.8,p1=.15,p2=.30)

     Two-sample comparison of proportions power calculation

              n =
             p1 = 0.15
             p2 = 0.3
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Step-7: Working with tabular data

Let's take a look at two forms of tabular data and the approach to their analysis. The first involves chi-square contingency tables, which rely on count data. Contingency tables examine whether one variable is contingent on another. A typical example might involve trying to see if there is a relationship between hair color and eye color. Suppose you sample 114 people walking down the street and score them based on these two criteria. The results can be displayed as:

             Blue eyes   Brown eyes
Fair hair        38          11
Dark hair        14          51

These are our observed frequencies (or counts). The next step is to create a model which predicts the expected frequencies. There are a variety of ways to do this, but usually one uses the marginal (row and column) totals to derive these values. The observed values are then compared with the expected values, and a chi-square statistic is generated and evaluated for significance.

To solve this problem in R, the procedure is straightforward. First create a 2x2 matrix and then call the chi-square procedure:

> count <- matrix(c(38,14,11,51),nrow=2)
> count
     [,1] [,2]
[1,]   38   11
[2,]   14   51

Note that you entered the data into the matrix column-wise, not row-wise. Next, run the test:

> chisq.test(count)

        Pearson's Chi-squared test with Yates' continuity correction

data:  count
X-squared = , df = 1, p-value = 8.7e-09

Note the use of scientific notation and that the p-value is exceptionally small. Conclusion: there is a highly significant relationship between hair color and eye color for this group of people.

A variant of RxC contingency table analysis is Fisher's Exact Test. This is used when one or more of the expected frequencies is less than 5. Consider a small data set where there are 8 ants' nests over 10 trees each of two species (A & B):

            Tree-A   Tree-B
w/ ants        6        2
w/o ants       4        8

> x <- as.matrix(c(6,4,2,8))
> dim(x) <- c(2,2)
> x
     [,1] [,2]
[1,]    6    2
[2,]    4    8
> fisher.test(x)

        Fisher's Exact Test for Count Data

data:  x
p-value =
alternative hypothesis: true odds ratio is not equal to 1

odds ratio

The Fisher test can be used with matrices much bigger than 2 x 2. Alternatively, the function may be given two vectors containing factor levels, instead of a matrix. This saves the trouble of having to do all the tallying. Each observation is just listed on a separate line:

> table
   tree nests
1     A  ants
2     B  ants
3     A  none
4     A  ants
5     B  none
6     A  none
7     A  ants
8     B  ants
9     B  none
10    A  none
11    A  none
12    B  none
13    B  none
14    A  ants
15    A  ants
16    B  none
17    A  ants
18    B  none
19    B  none
20    B  none
> attach(table)
> fisher.test(tree,nests)

        Fisher's Exact Test for Count Data

data:  tree and nests
p-value =
alternative hypothesis: true odds ratio is not equal to 1
odds ratio

This is the same answer we arrived at above.
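Returning to the hair and eye color table from the start of this step, the expected frequencies that the chi-square test derives from the marginal totals can be computed directly. The following is a minimal sketch rather than part of the original tutorial; the dimnames and the names expected, X2, and p are illustrative. It computes the plain Pearson statistic without Yates' continuity correction, so X2 will differ from the value printed by chisq.test(count), which applies the correction to 2x2 tables by default.

```r
count <- matrix(c(38,14,11,51), nrow = 2,
                dimnames = list(hair = c("fair", "dark"),
                                eyes = c("blue", "brown")))

# Expected count in each cell = row total * column total / grand total
expected <- outer(rowSums(count), colSums(count)) / sum(count)

# Pearson chi-square statistic (no continuity correction)
X2 <- sum((count - expected)^2 / expected)

# Degrees of freedom for an R x C table: (R - 1) * (C - 1)
df <- (nrow(count) - 1) * (ncol(count) - 1)

# Upper-tail p-value from the chi-square distribution
p <- pchisq(X2, df, lower.tail = FALSE)

# Built-in equivalent with the continuity correction turned off
builtin <- chisq.test(count, correct = FALSE)

expected
min(expected) < 5   # TRUE would suggest using Fisher's Exact Test instead
```

Inspecting expected before running the test is also a convenient way to decide whether the chi-square approximation is appropriate or whether Fisher's Exact Test is the safer choice.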

Problem: Practice Problem 10 (p. 309, W&S)
Using R, solve problem 10 in your textbook. Provide explicit tests of variance and normality prior to doing the test. Provide a publication-grade paired, notched boxplot summarizing the differences.

Problem: Practice Problem 16 (p. 311, W&S)
Using R, solve problem 16 in your textbook. Provide a publication-grade figure to summarize your results.

Problem: Practice Problem 24 (p. 314, W&S)
Using R, solve problem 24. Provide a publication-grade figure to summarize your results.