Statistics with Statview - 18

Paired t-test

A paired t-test compares two groups of measurements when the data in the two groups are paired in some way (e.g., before-and-after measurements on the same individual or plot, or measurements taken for each group at the same site or on the same day). When the variation among individuals, sites, or days is large compared to the variation between treatments, you should use a paired t-test. If your data do not have a paired structure, or if the variation among the data points within groups is not large, use a two-sample t-test (it is more powerful). The paired t-test essentially checks whether the mean of the differences between paired measurements is significantly different from zero.

To run the paired t-test:

Have your data in separate columns such that each row contains the paired observations from the two groups. Also construct a site or subject variable that is essentially the row numbers. For example, enter the following data in three COLUMNS in a new spreadsheet:

day   kangaroo rats   packrats

Create a new View window, and in the Analysis browser open Bivariate plots. Then double-click on Scattergram and a dialog box will appear; click OK. In the Variables browser click on day and then click X variable. Then drag through both rat variables and click Y variable. A graph will appear like the one below. Do the data look as though there is a consistent pattern within each pair but a lot of variance among pairs?

[Bivariate scattergram: kangaroo rats and packrats (Y variables) plotted against day]

Now double-click on Paired Comparisons in the Analysis browser. All the defaults are fine, so click OK. If you still had the graph selected, three comparisons will come up; remove day. If the analysis didn't already appear, Add each species in the Variables browser. The output will show you the P-value in a brief table.
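StatView computes this for you, but the arithmetic behind the paired test can be sketched in a few lines of Python (the plot counts below are made-up stand-ins, not the handout's data):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t-test: t statistic and df for H0: mean difference = 0."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    d_bar = mean(diffs)            # mean of the paired differences
    se = stdev(diffs) / sqrt(n)    # standard error of the mean difference
    return d_bar / se, n - 1       # t statistic and degrees of freedom

# Hypothetical counts per plot (not the handout's data)
kangaroo = [4, 6, 3, 7, 5]
packrat = [6, 7, 5, 8, 6]
t, df = paired_t(kangaroo, packrat)
```

Note that only the differences enter the calculation, which is why large plot-to-plot variation does not hurt the paired test.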
In our example, the difference between kangaroo rat and packrat numbers is highly significant; there are on average 1.3 more packrats than kangaroo rats per plot.
Paired t-test
Hypothesized Difference = 0
                          Mean Diff.   DF   t-value   P-Value
kangaroo rats, packrats                                <.0001

It is always a good idea to try a two-sample t-test on the data as well; that way you can be sure whether the use of the paired t-test was necessary. The means for the rat data don't differ significantly when tested with a two-sample t-test, so the paired t-test was necessary. If you had run only the two-sample t-test, you would have concluded that there wasn't a significant difference in numbers per plot between the two species.

Unpaired t-test for density
Grouping Variable: species
Hypothesized Difference = 0
                        Mean Diff.   DF   t-value   P-Value
kangaroo rat, packrat

Group Info for density
Grouping Variable: species
                Count   Mean   Variance   Std. Dev.   Std. Err.
kangaroo rat
packrat

Mann-Whitney U test (compares two groups when assumptions aren't met)

A Mann-Whitney U test doesn't require normality or homogeneous variances, but it is somewhat less powerful than the t-test (which means the Mann-Whitney U test is less likely to find a significant difference between the two populations). So, if you have approximately normal data, you should use a two-sample t-test. The following example assumes your data for the two iguana populations are grouped under a compact variable named Density.

To run a Mann-Whitney U test:

Pull down the Analyze menu and choose New View. In the Analysis browser, double-click on Nonparametrics. A dialog box will appear. Click the button for Unpaired two group / Mann-Whitney, then click OK. In the Variables browser, select the continuous variable (e.g., Density) and click Add. Then click on the arrow in front of the continuous variable to reveal the compact variable name (Isl...). Click on the compact variable name and then click Add. A table showing the results of the Mann-Whitney U test will be displayed.
The output consists of a table showing the parameters used in the calculation of the test, but the main entry to examine is the Tied P-value. If P<0.05, then the medians of the two groups differ. In this example P<0.05 so the two medians are significantly different. The next table shows the Rank information for the test, which you can just ignore, unless you are a real stats nerd and want to know how it calculated the result.
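For the curious, the U statistic StatView reports comes from rank sums; here is a minimal pure-Python sketch using midranks for ties (as in the "Tied" output), with made-up island densities rather than the handout's data:

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U via rank sums, with midranks for tied values."""
    pooled = sorted(a + b)
    # Midrank of a value: average of the 1-based positions it occupies
    def rank(v):
        first = pooled.index(v) + 1
        last = first + pooled.count(v) - 1
        return (first + last) / 2
    r_a = sum(rank(v) for v in a)      # rank sum for the first group
    n_a, n_b = len(a), len(b)
    u = r_a - n_a * (n_a + 1) / 2      # U for the first group
    return u, n_a * n_b - u            # U and U Prime

# Hypothetical iguana densities for two islands (not the handout's data)
island_a = [3, 5, 4, 6, 2]
island_b = [8, 9, 7, 10, 6]
u, u_prime = mann_whitney_u(island_a, island_b)
```

U and U Prime always sum to the product of the two sample sizes; StatView converts the smaller of the two to the Z-value and P-value shown in its table.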
Mann-Whitney U for Density
Grouping Variable: Islands
     U   U Prime   Z-Value   P-Value   Tied Z-Value   Tied P-Value   # Ties

Mann-Whitney Rank Info for Density
Grouping Variable: Islands
            Count   Sum Ranks   Mean Rank
Island A
Island B

So again, this statistical test provides strong support for your original hypothesis that Island B has more iguanas than Island A.

One-way ANOVA and Post-hoc paired comparisons (Fisher PLSD test)

The one-way analysis of variance (ANOVA) is used when you are comparing the means of three or more groups with only one independent variable. It tests the significance of the difference among the means of the groups, but not which means differ from each other. To find out which means are significantly different from each other, you have to conduct Post-hoc paired comparisons. They are called Post-hoc because you conduct the tests after you have completed an ANOVA and it shows that there is a significant difference among the means. One of the Post-hoc tests is the Fisher PLSD (Protected Least Sig. Difference) test, which gives you a test of all pairwise combinations.

To conduct these two tests your data can be arranged in either of two ways: 1) data for each group can be in a separate column under a compact variable heading, or 2) data can be in a single column with a second column containing codes to show which data values belong to each group. In our example, you would have a single column of density data and a column containing A, B, or C for each population. The following example assumes that you are using the log-transformed data from the iguana population density example above for Islands A, B, and C.

To run the ANOVA test:

Pull down the Analyze menu and choose New View. In the Analysis browser, click on the arrow in front of ANOVA to reveal the choices below. Choose the ANOVA table and Means table, and click on Create Analysis. A dialog box will appear showing you options for the analysis.
Pull down the bar at the bottom of the box that says No error bars and choose 95% confidence interval, which will give the 95% confidence intervals for the data. Then click OK. In the Variables browser, select the continuous variable (e.g., log Density), and then click dependent.
Then click on the arrow in front of the continuous variable to reveal the compact variable name (Island). Click on the compact variable name and then click independent. An ANOVA table will be displayed along with a summary table of counts, means, std. deviations, and std. errors.

ANOVA Table for log Density
           DF   Sum of Squares   Mean Square   F-Value   P-Value   Lambda   Power
Islands
Residual

Means Table for log Density
Effect: Islands
           Count   Mean   Std. Dev.   Std. Err.
Island A
Island B
Island C

The next thing to do is interpret the results of the ANOVA. The ANOVA tested whether there were any differences in mean population densities among the three populations. Look at the P-value. If this P-value is greater than 0.05, then there are no significant differences among any of the means. If the P-value is less than 0.05, then at least one mean is different from the others. In this example, P<0.05 in the ANOVA table, so the mean densities are significantly different. The second summary table contains the counts, means, std. deviations, and std. errors. Examine this information to make sure that it corresponds with the information in the attribute pane of your original data set window.

Now that you know the means are different, you want to find out which pairs of means differ from each other (e.g., is the density on Island A greater than on B or C?). To examine all pairwise comparisons of means, click on the arrow in front of Post-hoc tests in the Analysis browser to reveal the choices below. Then double-click on Fisher PLSD (Protected Least Sig. Difference) to get a table of all pairwise comparisons among the means. A table will appear in the window on the right showing the mean difference (the difference between the means), the critical difference (the mean difference required for P<0.05), and the P-value.
Fisher's PLSD for log Density
Effect: Islands
Significance Level: 5%
                      Mean Diff.   Crit. Diff.   P-Value
Island A, Island B                                         S
Island A, Island C
Island B, Island C                                         S

The table shows that the mean population density of iguanas on Island A is significantly different from the mean density on Island B, but not Island C. Also, the density on Island B is significantly different from the density on Island C. Or more simply, Island B has more iguanas than either Island A or Island C, which don't differ. Plotting a graph (cell point chart) of these results might be helpful.
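StatView's ANOVA table condenses the underlying arithmetic; the F statistic can be sketched in pure Python (the log-density samples below are made up, not the handout's data):

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA: F statistic and its two degrees of freedom."""
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    k, n = len(groups), len(all_vals)
    # Between-group (treatment) and within-group (residual) sums of squares
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b, df_w = k - 1, n - k
    f = (ss_between / df_b) / (ss_within / df_w)
    return f, df_b, df_w

# Hypothetical log densities for Islands A, B, C (not the handout's data)
islands = [[1.0, 1.2, 1.1], [1.8, 2.0, 1.9], [1.1, 1.0, 1.2]]
f, df_b, df_w = one_way_anova(islands)
```

A large F means the variation between island means is large relative to the variation within islands, which is exactly what the P-value in the ANOVA table evaluates.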
Two-way ANOVA

In many studies the data collected are associated with groups defined by more than one independent variable. For example, say you want to know whether the bill size of finches differs between species and between males and females of the same species. These two factors (Species and Sex) can be examined simultaneously in a two-way ANOVA. You can also find out whether the independent variables have joint effects on the dependent variable (bill size), or whether they act independently of each other (i.e., does bill size depend on sex in one species but not in the other?).

Again, this test requires that the data are normally distributed and all of the groups have homogeneous variances, so you need to check these assumptions first. If you want to compare means from two (or more) grouping variables simultaneously, as ANOVA does, there is no satisfactory non-parametric alternative, so you may need to transform your data.

Your data may be set up with the groups entered in separate columns, and then compacted (see p. 8). Alternatively, you can use the stacked format. You will choose the same variables in the Variables browser, but they will be presented somewhat differently.

Before we run a two-way ANOVA, first do a t-test on bill size just between species, then do a t-test on bill size just between sexes. Note the results. Do you think these results accurately represent the data? This exercise will show you how useful a two-way ANOVA can be in telling you more about the patterns in your data.

Now run a two-way ANOVA on the same data. The procedure is much the same as for a one-way ANOVA, with one added step to include the second variable in the analysis. Pull down the Analyze menu and choose New View. In the Analysis browser, click on the arrow in front of ANOVA to reveal the choices below. Choose the ANOVA table and Means table, and click on Create Analysis.
A dialog box will appear showing you options for the analysis. Click the All effects button under Means tables and plots, then pull down the bar where it says No error bars and choose 95% confidence interval, which will give the 95% confidence intervals for the data. Then click OK. In the Variables browser, select the continuous variable (e.g., Bill size), and then click dependent. Then click on the arrow in front of the continuous variable to reveal the compact variable names. Click on the compact variable name Species and then click independent. Do the same for the variable Sex. An ANOVA table will be displayed along with a summary table of counts, means, std. deviations, and std. errors.
ANOVA Table for Bill size
                 DF   Sum of Squares   Mean Square   F-Value   P-Value   Lambda   Power
Species
Sex
Species * Sex
Residual

Means Table for Bill size
Effect: Species
         Count   Mean   Std. Dev.   Std. Err.
Sp. A
Sp. B

Means Table for Bill size
Effect: Sex
         Count   Mean   Std. Dev.   Std. Err.
female
male

Means Table for Bill size
Effect: Species * Sex
                Count   Mean   Std. Dev.   Std. Err.
Sp. A, female
Sp. A, male
Sp. B, female
Sp. B, male

The output consists of an ANOVA table that shows the significance of each of the independent variables and the interaction. The P-value associated with each independent variable tells you the probability that the means of the different levels of that variable are the same. So, again, if P<0.05, the levels of that variable are different. The P-value of the interaction term tells you the probability that the two variables act independently of each other. In our bill size example, there is a significant difference in bill size between species but not between sexes. There is also a significant species-sex interaction, which indicates that the effect of sex differs between the species.

To get a better impression of what the interaction term means, make a graph showing the interaction. In the Analysis browser, double-click on interaction line under the ANOVA heading to make an interaction line graph. Three graphs should appear: the first two showing the effects of each variable separately, and the last combining the effects of both variables into one graph. If a dialog box appears instead of a graph, first make sure that one of the tables in the view is selected before you double-click on interaction line.
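For a balanced design, the sums of squares in the two-way table partition as sketched below in pure Python. The bill sizes are made up to mimic the example: a strong interaction with no overall sex effect.

```python
from statistics import mean

def two_way_anova_ss(table):
    """Sums of squares for a balanced two-way design.

    table[i][j] holds the replicate values for level i of factor A
    (species) and level j of factor B (sex), equal replication throughout.
    """
    a, b = len(table), len(table[0])
    r = len(table[0][0])
    grand = mean([v for row in table for cell in row for v in cell])
    row_means = [mean([v for cell in row for v in cell]) for row in table]
    col_means = [mean([v for row in table for v in row[j]]) for j in range(b)]
    ss_a = b * r * sum((m - grand) ** 2 for m in row_means)
    ss_b = a * r * sum((m - grand) ** 2 for m in col_means)
    ss_ab = r * sum((mean(table[i][j]) - row_means[i] - col_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_err = sum((v - mean(table[i][j])) ** 2
                 for i in range(a) for j in range(b) for v in table[i][j])
    return ss_a, ss_b, ss_ab, ss_err

# Hypothetical bill sizes: rows = Sp. A, Sp. B; columns = female, male
bills = [[[12, 13], [16, 17]],   # Sp. A: males larger
         [[15, 16], [11, 12]]]   # Sp. B: females larger
ss_species, ss_sex, ss_inter, ss_err = two_way_anova_ss(bills)
```

With these numbers the sex sum of squares is zero while the interaction term dominates, which is exactly the pattern the separate t-tests above would miss.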
[Interaction line plots for Bill size, error bars: 95% confidence interval. Effect: Species (Sp. A, Sp. B); Effect: Sex (female, male); Effect: Species * Sex]

To change symbols, click on the symbol in the key, then pull down the Draw menu, choose Point, and pick the new symbol.

If you look at the data, the interaction is apparent: in species A the males have larger bills, but in species B the females have larger bills. So simply looking at sex doesn't tell us anything (as you saw when you did the t-test), and neither sex has a consistently larger bill. When you did the t-test, you found that species was not significant, but the two-way ANOVA found that species was significant, over and above the interaction. This means that species A has larger bills overall.

INDEPENDENT VARIABLE(S) ALL CONTINUOUS: CORRELATION & REGRESSION

Correlation

If the values of two CONTINUOUS variables appear to be related to one another, but neither causes the other, they are considered to be correlated. For example, the number of leaves falling in autumn and the number of geese flying south are generally correlated, but neither variable causes the other. The correlation coefficient, r, provides a quantitative measure of how closely two variables are related. It ranges from 0 (no correlation) to 1 or -1 (the two variables are perfectly related, positively or negatively).

Let's examine the correlation between bird weight and bill length, using the data displayed below.

Bird #
bird weight (g)   bill length (mm)

Enter the data above in two COLUMNS in a new spreadsheet and name the columns Weight and Length. To visualize what the correlation represents, make a plot of the data:

Create a new View window, and in the Analysis browser click on the arrow in front of Bivariate plots. Then double-click on Scattergram and a dialog box will appear. Click OK. In the Variables browser click on the variable Weight and then click X variable. Then click on the variable Length and click Y variable. A graph will appear as below.

[Bivariate scattergram of Length against Weight]

From this plot you can see that as weight increases, bill length also increases. Thus, these two variables appear to be correlated. To quantify the extent of the correlation and see whether it is statistically significant:

Go to the View window and double-click on Correlation/Covar... in the Analysis browser. This will open a dialog box titled Correlation/Covariance. Click the button for Fisher's r to z (p-values) and then click OK. A box will appear in the view window. In the Variables browser click on the variable Weight and then click Add (not necessary if you still have the graph selected). Then click on the variable Length and click Add. A table will appear as shown on the next page.
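Before looking at StatView's table, the correlation coefficient and Fisher's r-to-z transform can be sketched in Python (the weights and lengths below are made up, not the handout's data):

```python
from math import sqrt, log
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fisher_z(r, n):
    """Fisher's r-to-z transform, scaled to an approximate z statistic."""
    z = 0.5 * log((1 + r) / (1 - r))
    return z * sqrt(n - 3)   # compare against the standard normal

# Hypothetical weights (g) and bill lengths (mm), not the handout's data
weight = [12, 15, 14, 18, 20, 17]
length = [10, 12, 11, 14, 16, 13]
r = pearson_r(weight, length)
```

The Fisher transform is what turns r into the P-value StatView reports: r itself has an awkward distribution, but z is approximately normal.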
Correlation Matrix
          Weight   Length
Weight
Length
observations were used in this computation.

Fisher's r to z
                 Correlation   P-Value
Weight, Length
observations were used in this computation.

The value in the Correlation Matrix table is the Pearson correlation coefficient (0.66), which shows there is a positive correlation between Weight and Length. (A negative correlation would have a negative Pearson correlation coefficient.) The result of the Fisher's test shows that it is a statistically significant correlation (i.e., P<0.05).

Regression

Regressions and correlations are both used to test whether two variables are related to each other and, if so, how closely they are related. However, whereas correlation demands NO causal link, regression does. With regression we can determine whether a change in one variable can be predicted from the change in another variable. A simple example is a one-to-one relationship between two variables, such as the relationship between the age and the number of growth rings of a tree. Another example is the relationship between the age and length of a fish.

Age (years)   Length (cm)

The data consist of a value for the independent variable (x) and the associated value for the dependent variable (y). Think of these as on an x-axis and a y-axis. In our example, given x (the age of the fish) one can predict y (length). Generally, the independent variable (x) is controlled or standardized by the investigator, and the y variable is dependent on the value of x. A regression calculates the equation of the best-fitting straight line through the (x, y) points that the data pairs define. In the equation of a line (y = a + bx), a is the y-intercept (the value of y where x = 0) and b is the slope. The output of a regression will give you estimates for both of these values. If we wanted to predict the length of a fish at a given age, we could do so using the regression equation that best fits these data.
Enter the data above into a new spreadsheet and name the two data columns Age and Length. To visualize the relationship between these two variables, make a plot of the data: Create a new View window, then in the Analysis browser click on the arrow in front of Regression.
Then double-click on Regression Plot and a dialog box will appear. Click OK, and a new box will appear in the window on the right. In the Variables browser, click on the variable Age and then click Independent variable. Then click on the variable Length and click Dependent variable. A graph will appear as below.

[Bivariate scattergram with regression line: Length = * Age; R^2 = .969]

The graph shows that there is a strong positive relationship between fish Age and Length. The equation below the graph is the equation for the regression line that best describes the relationship between the two variables. The R^2 value is the coefficient of determination, and can be interpreted as the proportion of the variation in the data that is explained by the factors you have used in the analysis. R^2 ranges from 0 to 1. If it is close to 1, your independent variable has explained almost all of why the value of your dependent variable differs from observation to observation. If R^2 is close to 0, it has explained almost none of the variation in your dependent variable. In this example it appears that 97% of the variation in Length is explained by variation in Age. Now you want to determine whether the relationship is statistically significant.

To run a regression analysis:

In the Analysis browser click on the arrow in front of Regression. Then select Regression sum..., ANOVA table, and Regression coef..., and click Create Analysis. A summary table should appear along with an ANOVA table and a coefficient table. If they do not appear, select your variables for the analysis in the Variables browser. The regression summary provides the basic data for the analysis along with the R^2 values. The ANOVA table indicates whether the relationship between the two variables is significant (i.e., P<0.05). The Regression Coefficients table consists of the estimates (the coefficients) of the y-intercept and the slope (labeled with the name of your independent variable). Each of these has an associated P-value shown at the far right. If the P-value for the y-intercept is less than 0.05, then it is significantly different from zero. If the P-value for the slope is less than 0.05, then the slope is significantly different from zero. Examine the output for the regression analysis of the example data on the next page.
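The coefficients StatView reports come from the standard least-squares formulas; a pure-Python sketch with made-up fish data (not the handout's numbers):

```python
from statistics import mean

def linreg(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (w - my) for v, w in zip(x, y))
    b = sxy / sxx                      # slope
    a = my - b * mx                    # y-intercept
    ss_tot = sum((w - my) ** 2 for w in y)
    ss_res = sum((w - (a + b * v)) ** 2 for v, w in zip(x, y))
    return a, b, 1 - ss_res / ss_tot   # R^2 = explained fraction of variation

# Hypothetical fish ages (years) and lengths (cm), not the handout's data
age = [1, 2, 3, 4, 5]
length = [5, 9, 12, 16, 19]
a, b, r2 = linreg(age, length)
```

With these numbers the fitted line is Length = 1.7 + 3.5 * Age, and R^2 is the "proportion of variation explained" described above: one minus the residual sum of squares over the total sum of squares.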
Regression Summary
Length vs. Age
Count   Num. Missing   R   R Squared   Adjusted R Squared   RMS Residual

ANOVA Table
Length vs. Age
             DF   Sum of Squares   Mean Square   F-Value   P-Value
Regression                                                  <.0001
Residual
Total

Regression Coefficients
Length vs. Age
            Coefficient   Std. Error   Std. Coeff.   t-Value   P-Value
Intercept                                                      <.0001
Age                                                            <.0001

From the output, we can see that the very high R^2 value reveals that 97% of the variation in length (the dependent variable) can be explained by variation in age (the independent variable). The low P-value (<0.05) in the ANOVA table indicates that the relationship is highly significant, and thus very unlikely to occur by chance alone. The output also indicates that the y-intercept and the slope are significantly different from zero.

Analysis of Covariance (ANCOVA)

ANCOVAs are hybrids between regressions and ANOVAs because they combine both continuous ("covariate") and categorical INDEPENDENT variables. The most common case in ecology is when you want to compare the slopes and/or intercepts of regression lines. In this case, your data will consist of a categorical variable that separates two or more sets of (x, y) points that define separate regression lines. For ANCOVA analysis, your data must be "stacked" (one column for the continuous independent variable, one column for the categorical independent variable, and one column for the response or dependent variable).

For example, assume we want to compare the growth rates of fish in warm and cool tanks. We will compare the regressions of length on age from the two temperatures, where age is our continuous independent variable (our covariate), temperature is our categorical independent variable, and length is our dependent or response variable. Go back to the data set you had previously on fish age and length, and add seven more rows and one column as shown on the next page.

Age (years)   Temperature   Length (cm)
[Data table: seven warm rows and seven cool rows of age, temperature, and length]

First, graph these data to understand the relationships among the variables. Create a new View window, and in the Analysis browser click on the arrow in front of Bivariate plots. Double-click on Scattergram. In the dialog box, click the button for Regression with a... (don't worry about "means" or "slopes"), and under "When split, show lines for," choose "each group separately." Click OK. In the Variables browser, click on Age and then click independent variable. Then click on Length and click dependent variable. A graph of both data sets will appear with the combined regression line shown. Finally, click on Temperature and click split by. The graph will sort into the two separate data sets and show each regression line individually. The individual regression equations are shown below the graph.

[Bivariate scattergram with regression, split by Temperature: Length = * Age; R^2 = .969 (cool); Length = * Age; R^2 = .996 (warm)]

To run the ANCOVA analysis to see whether the slopes or the intercepts of these two lines differ:
In the Analysis browser, click on the arrow in front of ANOVA, double-click on ANOVA Table, and click OK in the dialog box to accept the default parameters. An analysis will appear for each regression line separately, but you want to compare the two lines in the same analysis. In the Variables browser, click on Temperature and then click Remove (to recombine the data from the separate regressions) and then click independent (to include the variable in the analysis as an independent variable). The new analysis table will have a row for Age (the covariate or x-axis variable), Temperature (the categorical variable), the interaction between these two, and the residual or error term.

ANOVA Table for Length
                    DF   Sum of Squares   Mean Square   F-Value   P-Value   Lambda   Power
Age                                                               <.0001
Temperature                                                        .0310
Age * Temperature                                                  .0051
Residual

There are four possible types of outcomes in an ANCOVA analysis (see Figure 1, next page). To interpret the ANCOVA output, look at the interaction term first. If the interaction is significant (P<0.05), then the two slopes are different. If the two slopes are different, then the significance of the main effects for the categorical and continuous variables may not be very useful to interpret. In this case the continuous-variable P-value combines the data for the two regression lines and checks whether the slope of that composite line is significant; it may or may not be significant, depending on whether the slopes of the separate lines are in a similar direction. The interpretation of the categorical variable is even harder: if the slopes are different, the categorical P-value tells you whether the mean Y-value of one line is different from the mean Y-value of the other, but since the slopes differ, this is pretty meaningless. If the interaction term is not significant (P>0.05), then the slopes are not different and both of the main effects of the continuous and categorical variables are easy to interpret.
The continuous-variable P-value tells you if the composite slope is different from zero, and the categorical-variable P-value tells you whether the two y-intercepts differ from each other (i.e., whether the two lines are coincident or parallel). In our example, the interaction term (Age * Temperature) is significant (P=0.0051), so the slopes of the two lines differ. That makes it more difficult to directly assess what else is going on. To sort this out, look at the graph you constructed. Given the slopes and intercepts we saw in the regressions, the outcome is most similar to Case 4. Both lines have positive slopes (which is why the composite slope tested by the Age main effect is significant, P<0.0001). The two lines also apparently have different y-values at the mean value of x (Temperature main effect P=0.0310), but of course, how different the y-values are depends on where you look along the x-axis.
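One way to see what the interaction term is testing is to fit the two regressions separately and compare their slopes; a pure-Python sketch with made-up warm/cool data (not the handout's numbers):

```python
from statistics import mean

def slope_intercept(x, y):
    """Least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    b = sum((v - mx) * (w - my) for v, w in zip(x, y)) / \
        sum((v - mx) ** 2 for v in x)
    return b, my - b * mx

# Hypothetical age/length pairs for the two tanks (not the handout's data)
age = [1, 2, 3, 4, 5, 6, 7]
length_cool = [4.0, 7.1, 10.2, 12.9, 16.1, 19.0, 22.1]   # slope near 3
length_warm = [4.1, 9.0, 14.2, 19.1, 23.8, 29.0, 34.1]   # slope near 5

b_cool, a_cool = slope_intercept(age, length_cool)
b_warm, a_warm = slope_intercept(age, length_warm)
# A significant Age * Temperature interaction corresponds to unequal slopes:
# the warm-tank fish gain length faster per year than the cool-tank fish.
```

If the two fitted slopes were essentially equal, the interaction term would be non-significant and you could interpret the main effects directly, as described above.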
Figure 1. Possible outcomes of an ANCOVA.

[Four panels plotting y against the x variable (covariate): Case 1, Case 2, Case 3, Case 4]

In Case 1, both the slopes and the intercepts are equal. This is the outcome you might expect if your treatment variable (temperature) had no effect on either the relationship between the X and Y variables (the effect of age on length) or the value of Y when X is 0 (the length of newly hatched fish).

In Case 2, the slopes are the same but the intercepts differ. This is the outcome you might expect if the relationship between X and Y (the effect of age on length) was the same for both treatments, but the values of Y were uniformly larger under one treatment than the other (fish are generally bigger at a given age in warm water).

In Case 3, the intercepts are the same but the slopes differ. This is the outcome you might expect if the treatments start out the same but diverge as the value of the covariate increases (fish hatch at the same size, but grow faster in warm water than in cool water).

In Case 4, both the intercepts and the slopes differ. This is the outcome you might expect if the treatment causes changes in both the relationship between the X and Y variables (the effect of age on length) and the value of Y when X is 0 (the length of newly hatched fish).