Using SPSS version 14
Joel Elliott, Jennifer Burnaford, Stacey Weiss

SPSS is a program that is very easy to learn and is also very powerful. This manual is designed to introduce you to the program; it is not intended to cover every single aspect of SPSS. There will be situations in which you need to use the SPSS Help menu or Tutorial to learn how to perform tasks that are not detailed here. You should turn to those resources any time you have questions.

The following document provides some examples of common statistical tests used in ecology. To decide which test to use, consult your class notes, your Statistical Roadmap, or the Statistics Coach (under the Help menu in SPSS).

Contents
  Data entry
  Descriptive statistics
  Examining assumptions of parametric statistics
    Test for normality
    Test for homogeneity of variances
    Transformations
  Comparative statistics 1: Comparing means among groups
    Comparing two groups using parametric statistics
      Two-sample t-test
      Paired t-test
    Comparing two groups using non-parametric statistics
      Mann-Whitney U test
    Comparing three or more groups using parametric statistics
      One-way ANOVA and post-hoc tests
    Comparing three or more groups using non-parametric statistics
      Kruskal-Wallis test
    For studies with two independent variables
      Two-way ANOVA
      ANCOVA
  Comparative statistics 2: Comparing frequencies of events
    Chi-square goodness of fit
    Chi-square test of independence
  Comparative statistics 3: Relationships among continuous variables
    Correlation (no causation implied)
    Regression (causation implied)
  Graphing your data
    Simple bar graph
    Clustered bar graph
    Box plot
    Scatter plot
  Printing from SPSS
DATA ENTRY

Start SPSS. When the first dialog box appears and asks What would you like to do?, click the button for Type in data. A spreadsheet will appear. The set-up here is similar to Excel, but at the bottom of the window you will notice two tabs. One is Data View; the other is Variable View. To enter your data, you will switch back and forth between these pages by clicking on the tabs.

Suppose you are part of a biodiversity survey group working in the Galapagos Islands and you are studying marine iguanas. After visiting a couple of islands, you think that there may be higher densities of iguanas on Island A than on Island B. To examine this hypothesis, you decide to quantify the population densities of the iguanas on each island. You take 20 transects (100 m2) on each island (A and B), counting the number of iguanas in each transect. Your data form two columns of counts, one for each island.

First define the variables to be used. Go to the Variable View of the SPSS Data Editor window. The first column (Name) is where you name your variables. For example, you might name one Location (you have 2 locations in your data set, Island A and Island B). You might name the other one Density (this is your response variable, number of iguanas). Other important columns are Type, Label, Values, and Measure.
o For now, we will keep Type as Numeric, but look to see what your options are. At some point in the future, you may need to use one of these options.
o The Label column is very helpful. Here, you can expand the description of your variable name. In the Name column you are restricted in the number and type of characters you can use; in the Label column, there are no such restrictions. Type in labels for your iguana data.
o In the Values column, you can assign numbers to represent the different locations (so Island A will be 1 and Island B will be 2). To do this, you need to assign Values to your categorical explanatory variable.
Click on the cell in the Values column, and click on the small button that shows up. A dialog box will appear. Type 1 in the Value cell and A in the Value Label cell, then hit Add. Type 2 in the Value cell and B in the Value Label cell. Hit Add again. Then hit OK.
o In the Measure column, you can tell the computer what type of variable each is. In this example, island is a categorical variable, so in the Location row, go to the Measure column (the far right) and click on the cell. There are 3 choices of variable type; you want to pick Nominal. Iguana density is a continuous variable, and since Scale (meaning continuous) is the default condition, you don't need to change anything.

Now switch to the Data View. You will see that your columns are now titled Location and Density. To make the value labels appear in the spreadsheet, pull down the View menu and choose Value Labels. The labels will appear as you start to enter data. You can now enter your data in the columns. Each row is a single observation. Since you have chosen View > Value Labels and entered your Location value labels in the Variable View window, when you type 1 in the Location column, the letter A will appear. After you've entered all the values for Island A, enter the ones from Island B below them, so the top of your data table shows the Island A observations, one row per transect, with its Location label and Density count.
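If you ever need to reproduce this layout outside SPSS, the same "long" format (one coded grouping column, one continuous response column) can be sketched in Python with pandas. The counts below are hypothetical stand-ins for the transect data, not the values from this exercise.

```python
import pandas as pd

# Hypothetical counts standing in for a few of the 20 transects per island;
# the real data are typed into SPSS as described above.
density_a = [8, 9, 10, 11, 12]
density_b = [13, 14, 15, 16, 17]

# Long format, mirroring the SPSS sheet: one row per observation, a coded
# Location column (1 = Island A, 2 = Island B) and a continuous Density column.
df = pd.DataFrame({
    "Location": [1] * len(density_a) + [2] * len(density_b),
    "Density": density_a + density_b,
})
df["Location"] = df["Location"].map({1: "A", 2: "B"})  # apply the value labels
print(df.head())
```

The `map` call plays the role of the SPSS Values column: the data are stored as codes, displayed as labels.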
DESCRIPTIVE STATISTICS

Once you have the data entered, you want to summarize the trends in the data. There are a variety of statistical measures for summarizing your data, and you will want to explore your data by making tables and graphs. To help you do this you can use the Statistics Coach found under the Help menu in SPSS, or you can go directly to the Analyze menu and choose the appropriate tests.

To get a quick view of what your data look like: Pull down the Analyze menu and choose Descriptive Statistics, then Frequencies. A new window will appear. Put the Density variable in the box, then choose the statistics that you want to use to explore your data by clicking on the Statistics and Charts buttons at the bottom of the box (e.g., mean, median, mode, standard deviation, skewness, kurtosis). This will produce summary statistics for the whole data set. Your results will show up in a new window.

SPSS can also produce statistics and plots for each of the islands separately. To do this, you need to split the file. Pull down the Data menu and choose Split File. Click on Organize output by groups and then select the Island [Location] variable. Click OK. Now, if you repeat the Analyze > Descriptive Statistics > Frequencies steps and hit OK again, your output will contain one Statistics table per island. Each table reports, for Density, the N (Valid 20, Missing 0), mean, median, mode, standard deviation, variance, skewness and its standard error (.512), kurtosis and its standard error (.992), range, minimum, and maximum. For Island B, the mode carries the footnote "Multiple modes exist. The smallest value is shown."
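The split-file-then-Frequencies workflow has a rough equivalent in Python: group the long-format data by island and summarize each group. The densities below are hypothetical stand-ins, not the transect counts from this exercise.

```python
import pandas as pd
from scipy import stats

# Hypothetical iguana counts standing in for the two islands' transects.
df = pd.DataFrame({
    "Location": ["A"] * 5 + ["B"] * 5,
    "Density": [8, 9, 10, 10, 12, 12, 14, 15, 15, 19],
})

# Per-group summaries: the rough equivalent of SPSS's split-file
# Analyze > Descriptive Statistics > Frequencies output.
for island, grp in df.groupby("Location"):
    d = grp["Density"]
    print(island, d.mean(), d.median(), d.std(), stats.skew(d), stats.kurtosis(d))
```

Grouping replaces the split/unsplit dance: there is no global state to remember to undo afterwards.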
The output also includes a histogram of Density for each island (N = 20 per island), with a fitted normal curve overlaid.

From these summary statistics you can see that the mean density of iguanas on Island A is smaller than that on Island B. Also, the variation patterns of the data are different on the two islands, as shown by the frequency distributions of the data and their different dispersion parameters. In each histogram, the normal curve indicates the expected frequency curve for a normal distribution with the same mean and standard deviation as your data. The range of data values for Island A is lower, with a lower variance and kurtosis. Also, the distribution for Island A is skewed to the left, whereas the data for Island B are skewed to the right. You could explore your data more by making box plots, stem-and-leaf plots, and error bar charts. Use the functions under the Analyze and Graphs menus to do this.

After getting an impression of what your data look like, you can now move on to determine whether there is a significant difference between the mean densities of iguanas on the two islands. To do this we have to use comparative statistics.

NOTE: Once you are done looking at your data for the two islands separately, you need to unsplit the data. Go to Data > Split File and select Analyze all cases, do not create groups.

EXAMINING THE ASSUMPTIONS OF PARAMETRIC STATISTICS

As you know, parametric tests have two main assumptions: 1) approximately normally distributed data, and 2) homogeneous variances among groups. Let's examine each of these assumptions.

Test for Normality

Before you conduct any parametric tests, you need to check that the data values come from an approximately normal distribution. To do this, you can compare the frequency distribution of your data values with that of a normalized version of these values (see the Descriptive Statistics section above). If the data are approximately normal, then the distributions should be similar.
From your initial descriptive data analysis you know that the distributions of the data for Islands A and B did not appear to fit an expected normal distribution perfectly. However, to objectively determine whether a distribution varies significantly from a normal distribution, you have to conduct a normality test. This test will provide you with a statistic that determines whether your data are
significantly different from normal. The null hypothesis is that the distribution of your data is NOT different from a normal distribution.

For the marine iguana example, you want to know whether the data from the Island A population are normally distributed and whether the data from Island B are normally distributed. Thus, your data must be split (Data > Split File > Organize output by groups, split by Location). Don't forget to unsplit when you are done!

To conduct a statistical test for normality on your split data, go to Analyze > Nonparametric Tests > 1-Sample K-S. In the window that appears, put the response variable (in this case, Density) into the box on the right. Check Normal in the Test Distribution box below. Then click OK.

The output shows a Kolmogorov-Smirnov (K-S) table for the data from each island. Your p-value is the last line of the table: Asymp. Sig. (2-tailed). If p > 0.05 (i.e., there is a greater than 5% chance that your null hypothesis is true), you should conclude that the distribution of your data is not significantly different from a normal distribution. If p < 0.05 (i.e., there is a less than 5% chance that your null hypothesis is true), you should conclude that the distribution of your data is significantly different from normal. Note: always look at the p-value. Don't rely on the "Test distribution is Normal" footnote below the table; it describes the distribution being tested against, not your result.

If your data are not normal, you should inspect them for outliers, which can have a strong effect on this test. Remove the extreme outliers and try again. If this does not work, then you must either transform your data so that they are normally distributed, or use a nonparametric test. Both of these options are discussed later.

One-Sample Kolmogorov-Smirnov Test (Island A)
  N                         20
  Most Extreme Differences  Absolute .218, Positive .132
  Kolmogorov-Smirnov Z      .975
  Asymp. Sig. (2-tailed)    .298
  a Test distribution is Normal.  b Calculated from data.
One-Sample Kolmogorov-Smirnov Test (Island B)
  N                         20
  Most Extreme Differences  Absolute .166, Positive .166
  Kolmogorov-Smirnov Z      .740
  Asymp. Sig. (2-tailed)    .644
  a Test distribution is Normal.  b Calculated from data.

For the iguana example, you should find that the data for both populations are not significantly different from normal (p > 0.05). With a sample size of only N = 20, the data would have to be skewed much more, or have some large outliers, to vary significantly from normal. If your data are not normally distributed, you should try to transform the data to meet this important assumption (see below).

Test for Homogeneity of Variances

Another assumption of parametric tests is that the groups that you are comparing have relatively similar variances. Most of the comparative tests in SPSS will do this test
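The same one-sample K-S check can be sketched with scipy; the counts below are hypothetical stand-ins for one island's transects. Note that estimating the mean and standard deviation from the same sample makes the classical K-S p-value only approximate (the Lilliefors correction addresses this), which is one reason to treat borderline results with caution.

```python
import numpy as np
from scipy import stats

# Hypothetical densities for one island.
density = np.array([8, 9, 10, 10, 11, 12, 12, 13, 14, 15], dtype=float)

# K-S test against a normal distribution fitted to the sample,
# mirroring SPSS's 1-Sample K-S with Test Distribution = Normal.
z = (density - density.mean()) / density.std(ddof=1)
ks_stat, ks_p = stats.kstest(z, "norm")
print(ks_stat, ks_p)  # p > 0.05 -> no significant departure from normality
```

For small samples, `stats.shapiro(density)` is a common alternative normality test.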
for you as part of the analysis. For example, when you run a t-test, the output will include columns labeled Levene's Test for Equality of Variances. The p-value is labeled Sig. and will tell you whether or not your data meet this assumption of parametric statistics. If the variances are not homogeneous, then you must either transform your data (e.g., using a log transformation) to see if you can equalize the variances, or use a nonparametric comparison test that does not require this assumption.

Transformations

If your data do not meet one or both of the above assumptions of parametric statistics, you may be able to transform the data so that they do. You can use a variety of transformations to try to make the variances of the different groups equal or to normalize the data. If the transformed data meet the assumptions of parametric statistics, you may proceed by running the appropriate test on the transformed data. If, after a number of attempts, the transformed data still do not meet the assumptions, you must run a non-parametric test.

If the variances were not homogeneous, look at how the variances change with the mean. The usual case is that larger means have larger variances. If this is the case, a transformation such as the common log, natural log, or square root often makes the variances homogeneous. Whenever your data are percents (e.g., % cover), they will generally not be normally distributed. To make percent data normal, you should do an arcsine-square root transformation of the proportions (percents/100).

To transform your data: Go to Transform > Compute. You will get the Compute Variable window. In the Target Variable box, name your new transformed variable (for example, Log_Density). There are 3 ways you can transform your data: 1) using the calculator, 2) choosing functions from the lists on the right, or 3) typing the transformation directly into the Numeric Expression box.
For this example: In the Function Group box on the right, highlight Arithmetic by clicking on it once. Various functions will show up in the Functions and Special Variables box below. Choose the LG10 function and double-click on it. In the Numeric Expression box, it will now say LG10(?). Double-click on the name of the variable you want to transform (e.g., Density) in the box on the lower left to make Density replace the ?. Click OK. SPSS will create a new column in your data sheet that has the log values of the iguana densities.

NOTE: You might want to do a transformation such as LN(x + 1). Follow the directions as above but choose LN instead of LG10 from the Functions and Special Variables box. Move your variable into the parentheses to replace the ?, then type +1 after your variable so it reads, for example, LN(Density + 1).

NOTE: For the arcsine-square root transformation, the composite function to be put into the Numeric Expression box would look like: arcsin(sqrt(percent data/100)).
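The three transformations named above can be sketched in a few lines of numpy; the raw values here are hypothetical placeholders for a Density column and a percent-cover column.

```python
import numpy as np

# Hypothetical raw values standing in for the SPSS columns.
density = np.array([8.0, 12.0, 15.0, 22.0, 30.0])
percent_cover = np.array([5.0, 20.0, 55.0, 80.0, 95.0])

log_density = np.log10(density)                 # LG10(Density)
ln_density_p1 = np.log(density + 1)             # LN(Density + 1)
arcsine_sqrt = np.arcsin(np.sqrt(percent_cover / 100.0))  # for percent data

print(log_density, ln_density_p1, arcsine_sqrt)
```

As in SPSS, each transformation produces a new column; the original data are left untouched so you can always go back to them.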
After you transform your data, redo the tests of normality and homogeneity of variances to see if the transformed data now meet the assumptions of parametric statistics. If they do, conduct the parametric statistical test using the transformed data. If the transformed data still do not meet the assumptions, you can do a nonparametric test instead, such as a Mann-Whitney U test on the original data. This test is described later in this handout.

COMPARATIVE STATISTICS 1: COMPARING MEANS AMONG GROUPS

Comparing Two Groups Using Parametric Statistics: Two-Sample t-test

This test compares the means from two groups, such as the density data for the two different iguana populations. To run a two-sample t-test on the data: First, be sure that your data are unsplit (Data > Split File > Analyze all cases, do not create groups). Then, go to Analyze > Compare Means > Independent-Samples T Test. Put the Density variable in the Test Variable(s) box and the Location variable in the Grouping Variable box. Now, click on the Define Groups button and enter the values you assigned to the groups (1 and 2), one in each box. Then click Continue and OK.
The output consists of two tables. The first, Group Statistics, shows the N, mean, standard deviation, and standard error of the mean for each island. The second, Independent Samples Test, shows the results of Levene's Test for Equality of Variances, followed by the t-value, the degrees of freedom, the p-value (labeled Sig. (2-tailed)), the mean difference, its standard error, and the 95% confidence interval of the difference, with one row for equal variances assumed and one for equal variances not assumed.

Before you look at the results of the t-test, you need to make sure your data fit the assumption of homogeneity of variances. Look at the columns labeled Levene's Test for Equality of Variances; the p-value is labeled Sig. In this example the data fail Levene's test, so the data will have to be transformed to see if we can get them to meet this assumption of the t-test.

If you log-transformed the data and re-ran the test, you'd get the same two tables for Log_Density. Now the variances of the two groups are not significantly different from each other (p = 0.112), and you can focus on the results of the t-test. For the t-test, p = 0.015 (which is < 0.05), so you can conclude that the two means are significantly different from each other. Thus, this statistical test provides strong support for your original hypothesis that the iguana densities differ between Island A and Island B.
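The Levene-then-t-test sequence can be sketched with scipy; the densities below are hypothetical stand-ins for the two islands' transects, and here Levene's result is used to pick between the "equal variances assumed" and "not assumed" rows that SPSS prints.

```python
import numpy as np
from scipy import stats

# Hypothetical densities for the two islands.
island_a = np.array([8, 9, 10, 10, 11, 12, 12, 13, 9, 11], dtype=float)
island_b = np.array([12, 15, 14, 18, 16, 15, 17, 19, 14, 16], dtype=float)

# Levene's test for equality of variances (SPSS reports this automatically).
lev_stat, lev_p = stats.levene(island_a, island_b)

# Two-sample t-test; equal_var selects the "equal variances assumed" row
# when Levene's test is non-significant, the Welch row otherwise.
t_stat, t_p = stats.ttest_ind(island_a, island_b, equal_var=(lev_p > 0.05))
print(lev_p, t_stat, t_p)
```

Note that, unlike in the SPSS workflow described above, switching to the Welch (unequal-variance) t-test is a common alternative to transforming the data when Levene's test fails.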
WHAT TO REPORT: Following a statement that describes the patterns in the data, you should parenthetically report the t-value, df, and p. For example: Iguanas are significantly more dense on Island B than on Island A (t=2.5, df=38, p<0.05).

Paired t-test

You should analyze your data with a paired t-test only if you paired your samples during data collection. This analysis tests whether the mean difference between the samples in a pair equals 0; the null hypothesis is that the difference is not different from zero. For example, you may have done a study in which you investigated the effect of light intensity on the growth of the plant Plantus speciesus. You took cuttings from source plants, and for each source plant you grew one cutting in a high-light environment and one cutting in a low-light environment. The other conditions were kept constant between the groups. You measured growth by counting the number of new leaves grown over the course of your experiment. Your data have one row per source plant, with one column for the low-light count and one for the high-light count.

Enter your data in 2 columns named Low and High. Each row in the spreadsheet should hold one pair of data. In Variable View, leave the Measure column on Scale and leave Values as None. Go to Analyze > Compare Means > Paired-Samples T Test. Highlight both of your variables and hit the arrow to put them in the Paired Variables box; they will show up as Low-High. Hit OK.

The output consists of 3 tables: Paired Samples Statistics (mean, N, standard deviation, and standard error for each group), Paired Samples Correlations, and the Paired Samples Test table, which reports the mean of the paired differences, its standard deviation and standard error, the 95% confidence interval of the difference, and the t, df, and Sig. (2-tailed) values.
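The paired design can be sketched with scipy's paired t-test; the leaf counts below are hypothetical, with one row per source plant as in the layout described above.

```python
import numpy as np
from scipy import stats

# Hypothetical new-leaf counts for 8 source plants, one cutting per light level.
low = np.array([4, 5, 3, 6, 5, 4, 6, 5], dtype=float)
high = np.array([7, 9, 6, 9, 8, 7, 10, 8], dtype=float)

# Paired t-test on the within-pair differences
# (SPSS: Analyze > Compare Means > Paired-Samples T Test).
t_stat, p_val = stats.ttest_rel(low, high)
print(t_stat, p_val)  # t is negative here because the high-light counts are larger
```

Pairing matters: running `stats.ttest_ind` on the same numbers would ignore the plant-to-plant variation that the paired design controls for.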
The first table shows the summary statistics for the 2 groups. The second table shows information that you can ignore. The third table, the Paired Samples Test table, is the one you want. It shows the mean difference between samples in a pair, the variation of the differences around that mean, your t-value, your df, and your p-value (labeled Sig. (2-tailed)). In this case, the p-value reads 0.000, which means that it is very low: smaller than the program will show in the default 3 decimal places. You can express this in your results section as p < 0.001.

WHAT TO REPORT: Following a statement that describes the patterns in the data, you should parenthetically report the t-value, df, and p. For example: Plants in the high-light treatment added significantly more leaves than their counterpart plants in the low-light treatment (t=6.3, df=9, p<0.001).

Comparing Two Groups Using Non-parametric Statistics: Mann-Whitney U Test

The t-test is a parametric test, meaning that it assumes that the sample mean is a valid measure of center. While the mean is valid when the distance between all scale values is equal, it is a problem when your test variable is ordinal, because in ordinal scales the distances between the values are arbitrary. Furthermore, because the variance is calculated using squared deviations from the mean, it too is invalid if those distances are arbitrary. Finally, even if the mean is a valid measure of center, the distribution of the test variable may be so non-normal that it makes you suspicious of any test that assumes normality. If any of these circumstances is true for your analysis, you should consider using the nonparametric procedures designed to test for the significance of the difference between two groups. They are called nonparametric because they make no assumptions about the parameters of a distribution, nor do they assume that any particular distribution is being used.
A Mann-Whitney U test doesn't require normality or homogeneous variances, but it is slightly less powerful than the t-test (which means the Mann-Whitney U test is less likely to show a significant difference between your two groups). So, if you have approximately normal data, you should use a t-test.

To run a Mann-Whitney U test: Go to Analyze > Nonparametric Tests > 2 Independent Samples and a dialog box will appear. Put the variables in the appropriate boxes, define your groups, and confirm that the Mann-Whitney U test type is checked. Then click OK.
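The same test can be sketched with scipy; the densities below are hypothetical stand-ins for the two islands' transects.

```python
import numpy as np
from scipy import stats

# Hypothetical densities for the two islands.
island_a = np.array([8, 9, 10, 10, 11, 12, 12, 13, 9, 11], dtype=float)
island_b = np.array([12, 15, 14, 18, 16, 15, 17, 19, 14, 16], dtype=float)

# Two-sided Mann-Whitney U test
# (SPSS: Analyze > Nonparametric Tests > 2 Independent Samples).
u_stat, p_val = stats.mannwhitneyu(island_a, island_b, alternative="two-sided")
print(u_stat, p_val)
```

The test compares ranks rather than raw values, which is why it tolerates non-normal data and unequal variances.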
The output consists of two tables. The first table (Ranks) shows the parameters used in the calculation of the test: the N, mean rank, and sum of ranks for each island. The second table (Test Statistics) shows the statistical significance of the test. The value of the U statistic is given in the first row (Mann-Whitney U), and the p-value is labeled Asymp. Sig. (2-tailed). The table footnotes read "a Not corrected for ties" and "b Grouping Variable: Island."

For the marine iguana data, the p-value = 0.003, which means that the densities of iguanas on the two islands are significantly different from each other (p < 0.05). So, again this statistical test provides strong support for your original hypothesis that the iguana densities are significantly different between the islands.

WHAT TO REPORT: Following a statement that describes the patterns in the data, you should parenthetically report the U-value, df, and p. For example: Iguanas are significantly more dense on Island B than on Island A (U=91.5, df=39, p<0.01).

COMPARING THREE OR MORE GROUPS USING PARAMETRIC STATISTICS: ONE-WAY ANOVA AND POST-HOC TESTS

Let's now consider parametric statistics that compare three or more groups of data. To continue the example using iguana population density data, let's add data from a series of 16 transects (100 m2 each) on a third island, Island C. Enter these densities into your spreadsheet at the bottom of the Density column. To enter the Location for Island C, you must first edit the value labels by going to Variable View: add a third Value (3) and Value Label (C). Then, back in Data View, type a 3 into the last cell of the Location column, and copy the C and paste it into the rest of the cells below.

The appropriate parametric statistical test for continuous data with one independent variable and more than two groups is the one-way analysis of variance (ANOVA). It tests whether there is a
significant difference among the means of the groups, but it does not tell you which means differ from each other. In order to find out which means are significantly different from each other, you have to conduct post-hoc paired comparisons. They are called post-hoc because you conduct the tests after you have completed an ANOVA and it has shown that significant differences exist among the groups. One of the post-hoc tests is the Fisher PLSD (Protected Least Significant Difference) test, which gives you a test of all pairwise combinations.

To run the ANOVA: Go to Analyze > Compare Means > One-Way ANOVA. In the dialog box, put the Density variable in the Dependent List box and the Location variable in the Factor box. Click on the Post Hoc button, click on the LSD check box, and then click Continue. Click on the Options button and check 2 boxes: Descriptive and Homogeneity of variance test. Then click Continue and OK.

The output will include four tables: descriptive statistics, the results of the Levene test, the results of the ANOVA, and the results of the post-hoc tests. The first table (Descriptives) gives you basic descriptive statistics for the three islands: for each island and for the total, the N, mean, standard deviation, standard error, 95% confidence interval for the mean, minimum, and maximum.

The second table gives you the results of the Levene test (which examines the assumption of homogeneity of variances): the Levene statistic, df1, df2, and Sig. You must assess the results of this test before looking at the results of your ANOVA.
In this case, your variances are not homogeneous (p < 0.05): the data do not meet one of the assumptions of the test, and you cannot proceed to the ANOVA comparisons of means. You have two main choices. You can either transform your data to attempt to make the variances homogeneous, or you may run a test that does not require homogeneity of variances (such as Welch's ANOVA, or a non-parametric test for three or more groups). First, try transforming the data for each population (try a log transformation), and then run the test again.

For the log-transformed data, the Descriptives and Test of Homogeneity of Variances tables are produced as before, but now for Log_Density. This time the Levene test is not significant: your variances are homogeneous (p > 0.05), and you can continue with the assessment of the ANOVA.

The third table gives you the results of the ANOVA itself, which examined whether there were any significant differences in mean density among the three island populations of marine iguanas. It reports the between-groups and within-groups sums of squares, df, mean squares, the F statistic, and Sig.

Look at the p-value in the ANOVA table (Sig.). If this p-value is > 0.05, then there are no significant differences among any of the means. If the p-value is < 0.05, then at least one mean is significantly different from the others. In this example, p = 0.01 in the ANOVA table, and thus p < 0.05, so the mean densities are significantly different. Now that you know the means differ, you want to find out which pairs of means differ from each other: e.g., is the density on Island A greater than on B? Is it greater than on C? How do B and C compare with each other? The post-hoc Fisher LSD (Least Significant Difference) tests allow you to examine all pairwise comparisons of means. The results are listed in the fourth table.
Which groups are and are not significantly different from each other? Look at the Sig. column for each comparison: B is different from both A and C, but A and C are not different from each other.
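The check-Levene-then-ANOVA workflow can be sketched with scipy; the three arrays below are hypothetical stand-ins for the island transect data, and the log transformation is applied only if Levene's test fails, mirroring the steps above.

```python
import numpy as np
from scipy import stats

# Hypothetical densities for three islands.
a = np.array([8, 9, 10, 10, 11, 12, 12, 13], dtype=float)
b = np.array([14, 15, 17, 18, 16, 19, 15, 20], dtype=float)
c = np.array([9, 10, 11, 12, 10, 13, 11, 12], dtype=float)

# Check homogeneity of variances first, as in the SPSS workflow.
lev_stat, lev_p = stats.levene(a, b, c)

# If Levene's test fails, log-transform and rerun the assumption checks
# before trusting the ANOVA.
if lev_p < 0.05:
    a, b, c = np.log10(a), np.log10(b), np.log10(c)

f_stat, p_val = stats.f_oneway(a, b, c)
print(f_stat, p_val)
```

scipy has no built-in Fisher LSD; pairwise follow-up comparisons would be run separately once the overall ANOVA is significant.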
The fourth table (Multiple Comparisons, for Log_Density, LSD) lists each pairwise comparison of islands (I vs. J), with the mean difference (I-J), its standard error, Sig., and the 95% confidence interval; significant mean differences are flagged with an asterisk (* The mean difference is significant at the .05 level).

WHAT TO REPORT: Following a statement that describes the general patterns in the data, you should parenthetically report the F-value, df, and p from the ANOVA. Following statements that describe the differences between specific groups, you should report the p-value from the post-hoc test only. (NOTE: there is no F-value or df associated with the post-hoc tests, only a p-value!) For example: Iguana density varies significantly across the three islands (F=5.0, df=2,53, p=0.01). Iguana populations on Island B are significantly more dense than on Island A (p<0.01) and on Island C (p=0.01), but populations on Islands A and C have similar densities (p>0.90).

COMPARING THREE OR MORE GROUPS USING NON-PARAMETRIC STATISTICS: KRUSKAL-WALLIS TEST

Just as the Mann-Whitney U test is the non-parametric version of the t-test, the Kruskal-Wallis test is the non-parametric version of the ANOVA. The test is used when you want to compare three or more groups of data and those data do not fit the assumptions of parametric statistics, even after attempting standard transformations. Remind yourself of the assumptions of parametric statistics, and of the downside of using non-parametric statistics, by reviewing the Mann-Whitney U test section above.

To run the Kruskal-Wallis test: Go to Analyze > Nonparametric Tests > K Independent Samples. (Note: for the Mann-Whitney U test, you went to Nonparametric Tests > 2 Independent Samples. Now you have more than 2 groups, so you go to K Independent Samples instead, where K simply stands for any number of groups greater than 2.) Put your variables in the appropriate boxes, define your groups, and be sure the Kruskal-Wallis box is checked in the Test Type box. Click OK.
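The Kruskal-Wallis test, followed by the pairwise Mann-Whitney follow-ups recommended below, can be sketched with scipy; the three arrays are hypothetical stand-ins for the island data.

```python
import numpy as np
from scipy import stats

# Hypothetical densities for three islands.
a = np.array([8, 9, 10, 10, 11, 12, 12, 13], dtype=float)
b = np.array([14, 15, 17, 18, 16, 19, 15, 20], dtype=float)
c = np.array([9, 10, 11, 12, 10, 13, 11, 12], dtype=float)

# Kruskal-Wallis H test (SPSS: Nonparametric Tests > K Independent Samples).
h_stat, p_val = stats.kruskal(a, b, c)
print(h_stat, p_val)

# No built-in post hoc: follow a significant overall result with pairwise
# Mann-Whitney U tests, as the text describes.
if p_val < 0.05:
    for name, (x, y) in {"A-B": (a, b), "B-C": (b, c), "A-C": (a, c)}.items():
        u, p = stats.mannwhitneyu(x, y, alternative="two-sided")
        print(name, u, p)
```

With several follow-up comparisons, it is worth considering a multiple-comparison correction (e.g., Bonferroni) when interpreting the pairwise p-values.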
The output consists of two tables. The first table (Ranks) shows the parameters used in the calculation of the test: the N and mean rank for each island. The second table (Test Statistics) shows the statistical results of the test. As you will see, the test statistic that gets calculated is a chi-square value, reported in the first row of the second table (here with df = 2); the p-value is labeled Asymp. Sig. The footnotes read "a Kruskal Wallis Test" and "b Grouping Variable: Location."

In this example, the p-value = 0.004, which means that the densities on the three islands are significantly different from each other (p < 0.01). So, this test also supports the hypothesis that iguana densities differ among islands. We do not yet know, however, which islands differ from which others. Unlike an ANOVA, a Kruskal-Wallis test does not have an easy way to do post-hoc analyses. So, if you have a significant effect for the overall Kruskal-Wallis, you can follow it up with a series of two-group comparisons using Mann-Whitney U tests. In this case, we would follow up the Kruskal-Wallis with three Mann-Whitney U tests: Island A vs. Island B, Island B vs. Island C, and Island C vs. Island A.

WHAT TO REPORT: Following a statement that describes the general patterns in the data, you should parenthetically report the chi-square value, df, and p. For example: Iguana density varied significantly across the three islands (χ2=11.3, df=2, p=0.004).

FOR STUDIES WITH TWO INDEPENDENT VARIABLES: TWO-WAY ANOVA AND ANCOVA

In many studies, researchers are interested in examining the effect of more than one independent variable (i.e., factor) on a given dependent variable. For example, say you want to know whether the bill size of finches differs between males and females of two different species. In this example, you
have two factors (Species and Sex), and both are categorical. They can be examined simultaneously in a two-way ANOVA, a parametric statistical test. The two-way ANOVA will also tell you whether the two factors have joint effects on the dependent variable (bill size) or whether they act independently of each other (i.e., does bill size depend on sex in one species but not in the other?).

What if we wanted to know, for a single species, how sex and body size affect bill size? We still have two factors, but now one of the factors is categorical (Sex) and one is continuous (Body Size). In this case, we need to use an ANCOVA, an analysis of covariance.

Both tests require that the data are normally distributed and that all of the groups have homogeneous variances, so you need to check these assumptions first. If you want to compare means from two (or more) grouping variables simultaneously, as ANOVA and ANCOVA do, there is no satisfactory non-parametric alternative. So you may need to transform your data.

Two-Way ANOVA

Enter the data in three columns: the two factors (Species and Sex) go in two separate columns, and the dependent variable (Bill length) is entered in a third column.

Before you run a two-way ANOVA, you might want to first run a t-test on bill size just between species, then a t-test on bill size just between sexes. Note the results. Do you think these results accurately represent the data? This exercise will show you how useful a two-way ANOVA can be in telling you more about the patterns in your data.

Now run a two-way ANOVA on the same data. The procedure is much the same as for a one-way ANOVA, with one added step to include the second variable in the analysis. Go to Analyze > General Linear Model > Univariate. A dialog box appears. Your dependent variable goes in the Dependent Variable box; your explanatory variables are Fixed Factors. Now click Options. A new window will appear. Click on the check boxes for Descriptive
Statistics and Homogeneity tests, then click Continue. Click OK. The output will consist of three tables, which show descriptive statistics, the results of Levene's test, and the results of the 2-way ANOVA. From the descriptive statistics, it appears that the means may be different between the sexes and also different between species.

Descriptive Statistics
Dependent Variable: Bill size
Sex      Species     Mean    Std. Deviation    N
Female   Species A
         Species B
         Total
Male     Species A
         Species B
         Total
Total    Species A
         Species B
         Total

From the second table, you know that your data meet the assumption of homogeneity of variance. So, you are all clear to interpret the results of your 2-way ANOVA.

Levene's Test of Equality of Error Variances(a)
Dependent Variable: Bill size
F      df1      df2      Sig.
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a  Design: Intercept+Sex+Species+Sex * Species
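SPSS runs Levene's test for you when you check Homogeneity tests, but it can be useful to verify the assumption outside SPSS as well. Below is a minimal sketch in Python using scipy; the four groups of bill sizes are invented for illustration, since the manual's data table is not reproduced here.

```python
# Levene's test for homogeneity of variances, sketched with scipy.
# The four bill-size samples are hypothetical, one per sex-by-species
# group in the finch example.
from scipy import stats

female_a = [10.1, 10.4, 9.8, 10.2, 10.0]
female_b = [12.3, 12.0, 12.5, 12.1, 12.4]
male_a = [13.0, 13.4, 12.9, 13.2, 13.1]
male_b = [10.9, 11.2, 10.8, 11.0, 11.1]

# Null hypothesis: all groups have equal variances.
stat, p = stats.levene(female_a, female_b, male_a, male_b)
print(f"Levene W = {stat:.3f}, p = {p:.3f}")
if p > 0.05:
    print("Variances are homogeneous; OK to proceed with the ANOVA.")
```

A non-significant result (p > 0.05), as with these made-up samples, is what you want to see before trusting the ANOVA.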
The ANOVA table shows the statistical significance of the differences among the means for each of the independent variables (i.e., factors or main effects; here, they are Sex and Species) and the interaction between the two factors (i.e., Sex * Species). Let's walk through how to interpret this information.

Tests of Between-Subjects Effects
Dependent Variable: Bill size
Source               Type III Sum of Squares    df    Mean Square    F    Sig.
Corrected Model(a)
Intercept
Sex
Species
Sex * Species
Error
Total
Corrected Total
a  R Squared = .870 (Adjusted R Squared = .845)

Always look at the interaction term FIRST. The p-value of the interaction term tells you the probability that the two factors act independently of each other; if it is small, different combinations of the variables have different effects. In this bill-size example, the interaction term shows a significant sex*species interaction (p < 0.001). This means that the effect of sex on bill size differs between the two species. Simply looking at sex or species on their own won't tell you anything. To get a better idea of what the interaction term means, make a Bar Chart with error bars. See the graphing section of the manual for instructions on how to do this. If you look at the data, the interaction should become apparent. In Species A, bills are larger in males than in females, but in Species B, bills are larger in females than in males. So simply looking at sex doesn't tell us anything (as you saw when you did the t-test), and neither sex has a consistently larger bill when considered across both species. The main effects terms in a 2-way ANOVA basically ignore the interaction term and give similar results to the t-tests you may have performed earlier. So, the p-value associated with each independent variable (i.e., factor or main effect) tells you the probability that the means of the different groups of that variable are the same. So, if p < 0.05, the groups of that variable are significantly different from each other.
In this case, it tests whether males and females are different from each other, disregarding the fact that we have males and females from two different species in our data set. And it tests whether the two species are different from each other, disregarding the fact that we have males and females from each species in our data set.
The two-way ANOVA found that species was significant if you ignore the interaction. This suggests that Species A has larger bills overall, mainly because of the large size of the males of Species A, but it does not always have larger bills, because bill size also depends on gender.

WHAT TO REPORT: If there is a significant interaction term, the significance of the main effects cannot be fully accepted because of differences in the trends among different combinations of the variables. Thus, you only need to tell your reader about the interaction term of the ANOVA table. Describe the pattern and parenthetically report the appropriate F-value, df, and p. For example: The way that sex affected bill size was different for the two different species (F=95.6, df=1,16, p<0.001). (Often, a result like this would be followed up with two separate t-tests.) If the interaction term is not significant, then the statistical results for the main effects can be fully recognized. In this case, you need to tell your reader about the interaction term and about each main effect term of the ANOVA table. Following a statement that describes the general patterns for each of these terms, you should parenthetically report the appropriate F-value, df, and p. For example: Growth rates of both the invasive and native grass species were significantly higher at low population densities than at high population densities (F=107.1, df=1,36, p<0.001). However, the invasive grass grew significantly faster than the native grass at both population densities (F=89.7, df=1,36, p<0.001). There was no interaction between grass species and population density on growth rate (F=1.2, df=1,36, p>0.20).

ANCOVA

Remember, ANCOVA is used when you have 2 or more independent variables that are a mixture of categorical and continuous variables. Our example here is a study investigating the effect of gender (categorical) and body size (continuous) on bill size in a species of bird.
Your data must be normally distributed and have homogeneous variances to use this parametric statistical test. Enter the data as shown to the right: The two factors (Sex and Body Size) are put in two separate columns. The dependent variable (Bill size) is entered in a third column. To run the ANCOVA: Go to Analyze → General Linear Model → Univariate as you did for the two-way ANOVA. Put your dependent variable in the Dependent Variable box. Put your categorical explanatory variable in the Fixed Factor(s) box. Put your continuous explanatory variable in the Covariate(s) box. Click on Options. A new window will appear. Click on the check boxes for Descriptive Statistics and Homogeneity tests, then click Continue. Click on Model. A new window will appear. At the top middle of the pop-up window, specify the model as Custom instead of Full factorial. Highlight one of the factors shown on the left side of the pop-up window
(under Factors & Covariates) and click the arrow button. That variable should now show up on the right side (under Model). Do the same with the second factor. Now, highlight the two factors on the right simultaneously and click the arrow, making sure the option is set to Interaction. In the end, your Model pop-up window should look something like the image below: Click Continue and then click OK. The output will consist of four tables, which show the categorical ("between-subjects") variable groupings, some descriptive statistics, the results of Levene's test, and the results of the ANCOVA. From the first and second tables, it appears that males and females have similarly sized bills.

Between-Subjects Factors
            Value Label    N
sex  1.00   male           8
     2.00   female         8

Descriptive Statistics
Dependent Variable: bill_size
sex        Mean    Std. Deviation    N
male
female
Total

From the third table, you know that the data meet the assumption of homogeneity of variance. So, you are clear to interpret the results of the ANCOVA (assuming your data are normal).

Levene's Test of Equality of Error Variances(a)
Dependent Variable: bill_size
F      df1      df2      Sig.
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a  Design: Intercept+sex+body_size+sex * body_size

The ANCOVA results are shown in an ANOVA table which is interpreted similarly to the table from the two-way ANOVA. You can see the statistical results regarding the two independent
variables (factors) and the interaction between the two factors (i.e., Sex * Body_size) are shown on three separate rows of the table below.

Tests of Between-Subjects Effects
Dependent Variable: bill_size
Source               Type III Sum of Squares    df    Mean Square    F    Sig.
Corrected Model(a)
Intercept
sex
body_size
sex * body_size
Error
Total
Corrected Total
a  R Squared = .862 (Adjusted R Squared = .827)

As with the 2-way ANOVA, you must interpret the interaction term FIRST. In this example, the interaction term shows up on the ANOVA table as a row labeled sex*body_size, and it tells you whether or not the way that body size affects bill size is the same for males as it is for females. The null hypothesis is that body size affects bill size the same way for each of the two sexes. In other words, the null hypothesis is that the two factors (body size and sex) do not interact in the way they affect bill size. Here, you can see that the interaction term is not significant (p=0.649). Therefore, you can go on to interpret the two factors independently. You can see that there is no effect of Sex on bill size (p=0.525). And, you can see that there is an effect of Body Size on bill size (p<0.001). Let's see how this looks graphically. Make a scatterplot with the dependent variable (Bill Size) on the y-axis and the continuous independent variable (Body Size) on the x-axis. To make the Male and Female data show up as different shaped symbols on your graph, move the categorical independent variable (Sex) into the box labeled Style as shown below:

[Scatterplot: bill_size (y-axis) vs. body_size (x-axis), with male and female points plotted as different symbols]
From the figure you can see 1) that the way that body size affects bill size is the same for males as it is for females (i.e., there is no interaction between the two factors), 2) that males and females do not differ in their mean bill size (there is clear overlap in the distributions of male and female bill sizes), and 3) that body size and bill size are related to each other (as body size increases, bill size also increases).

WHAT TO REPORT: If there is a significant interaction term, the significance of the main effects cannot be fully accepted because of differences in the trends among different combinations of the variables. Thus, you only need to tell your reader about the interaction term from the ANOVA table. Describe the pattern and parenthetically report the appropriate F-value, df, and p. For example: The way that prey size affected energy intake rate was different for large and small fish (F=95.6, df=1,16, p<0.001). (Typically, a result like this would be followed up with two separate regressions (see pg. 27 below), one for large fish and one for small fish.) If the interaction term is not significant, then the statistical results for the main effects can be fully recognized. In this case, you need to tell your reader about the interaction term and about each main effect term of the ANOVA table. Following a statement that describes the general patterns for each of these terms, you should parenthetically report the appropriate F-value, df, and p. For example: Males and females have similar mean bill sizes (F=0.4, df=1,12, p>0.50), and for both sexes, bill size increases as body size increases (F=68.3, df=1,12, p<0.001). There is no interaction between gender and body size on bill size (F=0.2, df=1,12, p>0.60).

Comparative Statistics 2: Comparing frequencies of events

Chi Square Goodness of Fit

This test allows you to compare observed to expected values within a single group of test subjects. For example: Are guppies more likely to be found in predator or non-predator areas?
You are interested in whether predators influence guppy behavior. So you put guppies in a tank that is divided into a predator-free refuge and an area with predators. The guppies can move between the two sides, but the predators cannot. You count how many guppies were in the predator area and in the refuge after 5 minutes. Here are your data:

                      in predator area    in refuge
number of guppies            4               16

Your null hypothesis for this test is that guppies are evenly distributed between the 2 areas. To perform the Chi Square Goodness of Fit test:
Open a new data file in SPSS. In Variable View, name the first variable Location. In the Measure column, choose Ordinal. Assign 2 values: one for Predator Area and one for Refuge. Then create a second variable called Guppies. In the Measure column, choose Scale. In Data View, enter the observed number of guppies in the 2 areas. Go to Data → Weight Cases. In the window that pops up, click on Weight Cases by and select Guppies. Hit OK. Go to Analyze → Nonparametric Tests → Chi-square. Your test variable is Location. Under Expected Values, click on Values. Enter the expected value for the refuge area first and hit Add; then enter the expected value for the predator area and hit Add. Hit OK. In the Location table, check the values to make sure the test did what you thought it was going to do. Are the observed and expected numbers for the 2 categories correct? Your Chi-Square value, df, and p-value are displayed in the Test Statistics table. NOTE: Once you are done with this analysis, you will likely want to stop weighting cases. Go to Data → Weight Cases and select Do not weight cases.

WHAT TO REPORT: You want to report the χ2 value, df, and p, parenthetically, following a statement that describes the patterns in the data.

Chi Square Test of Independence

If you have 2 different test subject groups, you can compare their responses to the independent variable. For example, you could ask the question: Do female guppies have the same response to predators as male guppies? The chi-square test of independence allows you to determine whether the response of your 2 groups (in this case, female and male guppies) is the same or is different. You are interested in whether male and female guppies have different responses to predators. So you test 10 male and 10 female guppies in tanks that are divided into a predator-free refuge and an area with predators. Guppies can move between the areas; predators cannot. You count how many guppies were in the predator area and in the refuge after 5 minutes.
Here are the data:

                    in predator area    in refuge
male guppies               1                9
female guppies             3                7

Your null hypothesis is that guppy gender does not affect response to predators; in other words, there will be no difference in the response of male and female guppies to predators. Put another way, you predict that the effect of predators will not depend on guppy gender. To perform the test in SPSS: In Variable View, set up two variables: Gender and Location. Both are categorical, so they must be Nominal, and you need to set up Values.
Enter your data in 2 columns. Each row is a single fish. Go to Analyze → Descriptive Statistics → Crosstabs. In the pop-up window, move one of your variables into the Rows window and the other one into the Column window. Click on the Statistics button on the bottom of the Crosstabs window, then click Chi-square in the new pop-up window. Click Continue, then OK. Your output should look like this:

Case Processing Summary
                                    Cases
                       Valid           Missing          Total
                    N    Percent     N    Percent    N    Percent
Gender * Location   20   100.0%      0     0.0%      20   100.0%

Gender * Location Crosstabulation
                      Location
                   predators    refuge    Total
Gender   male          1           9        10
         female        3           7        10
Total                  4          16        20

Chi-Square Tests
                               Value      df    Asymp. Sig.    Exact Sig.    Exact Sig.
                                                 (2-sided)      (2-sided)     (1-sided)
Pearson Chi-Square             1.250(b)    1       .264
Continuity Correction(a)
Likelihood Ratio
Fisher's Exact Test
Linear-by-Linear Association
N of Valid Cases                 20
a  Computed only for a 2x2 table
b  2 cells (50.0%) have expected count less than 5. The minimum expected count is 2.00.

How to interpret your output: Ignore the 1st table. The second table (Gender * Location Crosstabulation) has your observed values for each category. You should check this table to make sure your data were entered correctly. In this example, the table correctly reflects that there were 10 of each type of fish, and that 1 male and 3 females were in the predator side of their respective tanks. In the 3rd table, look at the Pearson Chi-Square line. Your Chi-square value is χ2 = 1.25 and your p-value is p = 0.264. This suggests that the response to predators was not different between male and female guppies.

WHAT TO REPORT: You want to report the χ2 value, df, and p, parenthetically, following a statement that describes the patterns in the data. For example: Male and female guppies did not differ in their response to predators (chi-square test of independence, χ2=1.25, df=1, p>0.20).
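As a cross-check on SPSS, both chi-square tests in this section can be reproduced in a few lines of Python with scipy, using the guppy counts from the two examples above.

```python
# Chi-square tests for the guppy examples, sketched with scipy.
from scipy import stats

# Goodness of fit: 4 guppies in the predator area vs. 16 in the refuge,
# against the null hypothesis of an even 10/10 split.
chi2, p = stats.chisquare(f_obs=[4, 16], f_exp=[10, 10])
print(f"goodness of fit: chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 7.20

# Test of independence for the 2x2 male/female table.
# correction=False reproduces the "Pearson Chi-Square" row in SPSS.
observed = [[1, 9],   # male guppies: predator area, refuge
            [3, 7]]   # female guppies: predator area, refuge
chi2, p, df, expected = stats.chi2_contingency(observed, correction=False)
print(f"independence: chi2 = {chi2:.2f}, df = {df}, p = {p:.3f}")  # chi2 = 1.25
```

Note that `chi2_contingency` also returns the expected counts; their minimum here is 2.0, which is why SPSS footnotes that some expected counts are below 5.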