TIEE Teaching Issues and Experiments in Ecology - Volume 1, January 2004

EXPERIMENTS

Environmental Correlates of Leaf Stomata Density

Bruce W. Grant and Itzick Vatnick
Biology, Widener University, Chester PA, 19013
grant@pop1.science.widener.edu
vatnick@pop1.science.widener.edu

[Figure: stomata viewed at 400x in nail polish impression from leaf underside. Marc Brodkin, 2000]

Appendix 1. Guidelines for Statistical Analysis

Modern biological research emphasizes the collection of quantitative data on a variety of biological topics. Many of these data are highly variable. As a result, techniques of statistical analysis are very valuable in helping the biologist describe the variation within sets of data, express the degree of confidence that can be placed in average values, and objectively test hypotheses about data collected from different groups of subjects. This handout describes a number of techniques that biologists commonly use for these purposes and that you will use in the analysis of your stomata data.

Experiments in Ecology, TIEE Volume 1 2004 - Ecological Society of America. (www.tiee.ecoed.net)

A. Descriptive Statistics.

After a set of data is collected, it can then be analyzed statistically in order to better determine whether the data support or reject a given hypothesis. The first procedure that is usually done is to calculate a set of parameters that describe two aspects of the data: (1) central tendency and (2) dispersion.

[Figure labels: dispersion, central tendency]

(1) Measures of Central Tendency. One type of statistic determines the central tendency of the data. The central tendency provides information on how the values of the data you collected cluster around some single middle value. There are three measures of central tendency that are used in the analysis of data, which are described below:

MODE = the most frequently observed value of the data

MEDIAN = the middle value when the data set is ordered in sequential rank (i.e. highest to lowest, or lowest to highest)

MEAN = average value. The mean is the most commonly used measure of central tendency. It is estimated using the sum of all the individual values (x_i) divided by the total number of individuals in the sample (n):

$$\text{MEAN} = \bar{X} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + x_3 + x_4 + \dots + x_n}{n}$$
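The three measures can be checked quickly in code. The following is a minimal sketch using Python's standard `statistics` module (Python is our choice here; the handout itself works in a spreadsheet), applied to the quiz-score example data that appears later in this handout. The variable names are illustrative only.

```python
from statistics import mean, median, mode

# Quiz-score example data from this handout.
scores = [3, 3, 4, 5, 6, 6, 6, 6, 7, 8, 10]

central = {
    "mode": mode(scores),      # most frequently observed value
    "median": median(scores),  # middle value of the ranked data
    "mean": mean(scores),      # sum of the values divided by n
}

for name, value in central.items():
    print(f"{name}: {value:.2f}")
```

The mean here is simply 64 / 11, the sum of the eleven scores divided by the sample size.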

(2) Measures of Dispersion. Another set of statistics describes how spread out the data are.

RANGE = the highest value minus the lowest value.

VARIANCE. The variance is the sum of the squared differences, or deviations, between the individual values and the mean value, divided by the number of individuals in the sample minus one.

$$\text{VARIANCE} = \sigma^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{X}^2}{n - 1}$$

STANDARD DEVIATION. The square root of the variance.

$$\text{STANDARD DEVIATION} = S = \sqrt{\sigma^2} = \sqrt{\frac{\sum x_i^2 - n\bar{X}^2}{n - 1}}$$

STANDARD ERROR. The standard error is the standard deviation divided by the square root of the sample size.

$$\text{STANDARD ERROR} = \frac{S}{\sqrt{n}}$$

e.g. for the data { 3, 3, 4, 5, 6, 6, 6, 6, 7, 8, 10 }, which could represent a set of quiz scores:

MODE = 6
MEDIAN = 6
MEAN = (3 + 3 + 4 + 5 + 6 + 6 + 6 + 6 + 7 + 8 + 10)/11 = 5.82
SAMPLE VARIANCE = 4.363636
STANDARD DEVIATION = 2.088932
STANDARD ERROR = 0.629837
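The dispersion values in the quiz-score example can be reproduced by hand or in code. The sketch below, written in Python as an illustration (the handout itself uses a spreadsheet), computes the sample variance in both its definitional and shortcut forms, then the standard deviation and standard error. All variable names are our own.

```python
from math import isclose, sqrt
from statistics import mean, stdev, variance

scores = [3, 3, 4, 5, 6, 6, 6, 6, 7, 8, 10]
n = len(scores)
x_bar = mean(scores)

# Definitional form: sum of squared deviations divided by n - 1.
var_definition = sum((x - x_bar) ** 2 for x in scores) / (n - 1)

# Shortcut form: (sum of squares - n * mean^2) / (n - 1).
var_shortcut = (sum(x * x for x in scores) - n * x_bar ** 2) / (n - 1)

# Both hand calculations should agree with the library's sample variance.
assert isclose(var_definition, var_shortcut)
assert isclose(var_definition, variance(scores))

data_range = max(scores) - min(scores)  # highest value minus lowest value
std_dev = stdev(scores)                 # square root of the sample variance
std_err = std_dev / sqrt(n)             # standard deviation / sqrt(n)

print(f"range              = {data_range}")
print(f"sample variance    = {var_definition:.6f}")
print(f"standard deviation = {std_dev:.6f}")
print(f"standard error     = {std_err:.6f}")
```

Dividing by n - 1 rather than n is what makes this the sample variance, which is the quantity reported in the example above.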

FINDING DESCRIPTIVE STATISTICS USING MICROSOFT'S EXCEL

The computer makes data analysis easy. All you need is to enter your data into a spreadsheet and follow the simple steps below:

1. Under Tools click on Add-Ins, then click on Analysis ToolPak and OK.
2. Look again under the Tools menu and a new option, Data Analysis, will appear at the bottom of the menu. Click on Data Analysis, and click on Descriptive Statistics.
3. Highlight your column of data. Hit the Summary Statistics box so that an X appears. Next, specify the Output Range, i.e. where you want to put the analysis output table, and finally hit OK.
4. The program will produce a table of statistics that will look something like this:

Variable 1
Mean                        5.818182
Standard Error              0.629837
Median                      6
Mode                        6
Standard Deviation          2.088932
Sample Variance             4.363636
Kurtosis                    0.338976
Skewness                    0.454113
Range                       7
Minimum                     3
Maximum                     10
Sum                         64
Count                       11
Confidence Level(95.000%)   1.234455
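Most rows of this output follow directly from the formulas for central tendency and dispersion, but the last row is not explained in the handout. As a hedged observation only: the reported Confidence Level value is numerically consistent with a normal-approximation half-width, z × (standard error), with z about 1.96 for 95%. This is our inference from the numbers, not something the handout states. A minimal Python sketch of that check, with illustrative names:

```python
from math import sqrt
from statistics import NormalDist, stdev

scores = [3, 3, 4, 5, 6, 6, 6, 6, 7, 8, 10]

std_err = stdev(scores) / sqrt(len(scores))

# ASSUMPTION: the table's "Confidence Level(95.000%)" row is a
# normal-based half-width, z * SE. This is inferred from the numbers.
z = NormalDist().inv_cdf(0.975)  # two-sided 95% point, about 1.96
half_width = z * std_err

print(f"standard error = {std_err:.6f}")
print(f"z * SE         = {half_width:.6f}")
```

Under that reading, the mean plus or minus this half-width gives an approximate 95% confidence interval for the mean. Other software may use a t-based multiplier instead, which gives a wider interval for small samples.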

B. Statistical Testing: Comparisons of Means Using a STUDENT'S T-TEST

For this lab activity, we are going to carry the analysis of the data one step further and determine whether the hypothesis you proposed for the distribution of stomata in the two groups of leaves you collected should be accepted or rejected. To do this you should compare the means of your two experimental groups using a statistical test called Student's t-test.

The t-test is a statistical test used to determine if the means of two data sets are significantly different. In statistical terms, the t-test is used to determine if the two data sets you collected come from the same or different distributions. The t-test can only be used when comparing the means of two samples. More than two requires a different test.

To perform a t-test, one calculates a t-value from the two data sets you wish to compare. The t-value is a measure of the ratio of signal to noise in your data. The "signal" in the numerator represents the difference between the means. In other words, if the means of your two samples are very different, then the signal is large.

$$t = \frac{\text{signal}}{\text{noise}}$$

The "noise" in the denominator represents the total amount of variation in both samples (i.e. the pooled variation). It is found by dividing the variance of each data set by that data set's sample size, summing the two results, and taking the square root. Admittedly, this is kind of a tricky idea; however, it makes sense when you think about it, because if there is a great deal of variation in either or both of your data sets, then it should be more difficult to tell their means apart. This is the whole idea behind the t-test (as well as behind a large class of statistical tests called parametric tests).

The equation for the t-test (assuming unequal variances) is thus

$$t = \frac{\text{mean}_A - \text{mean}_B}{\text{pooled variation}} = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\dfrac{\sigma_A^2}{n_A} + \dfrac{\sigma_B^2}{n_B}}}$$

Calculation of the t-value can be done by hand, on a calculator, or on a computer (such as the computer program MS-EXCEL).
To calculate the t-value by hand, all that is required beforehand is that one know the sample sizes, means, and variances, σ², for each group. As one can see from the equation above, as the difference between the means of your groups gets bigger, the t-value gets bigger. Also, as the pooled variation gets smaller, the t-value will get bigger.
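The hand calculation can be sketched in a few lines of Python (our choice of language; the function name `t_from_summary` and its parameter names are hypothetical). It takes only the sample sizes, means, and variances, and uses the rounded summary values for the two example data sets given at the end of this handout. It then varies one input at a time to show the two trends just described.

```python
from math import sqrt


def t_from_summary(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """t = (mean_A - mean_B) / sqrt(var_A / n_A + var_B / n_B)."""
    signal = mean_a - mean_b                 # difference between the means
    noise = sqrt(var_a / n_a + var_b / n_b)  # pooled variation
    return signal / noise


# Summary values for example data sets A and B in this handout.
base = t_from_summary(5.8181, 4.3636, 11, 3.8181, 4.3636, 11)

# Same variation, but the means are twice as far apart: t gets bigger.
wider_gap = t_from_summary(7.8181, 4.3636, 11, 3.8181, 4.3636, 11)

# Same means, but half the variance in each group: t gets bigger.
less_noise = t_from_summary(5.8181, 2.1818, 11, 3.8181, 2.1818, 11)

print(f"base t       = {base:.4f}")
print(f"wider gap    = {wider_gap:.4f}")
print(f"less noise   = {less_noise:.4f}")
```

Doubling the difference between the means doubles t, while halving both variances raises t by a factor of the square root of 2.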

Now, the next question to ask is: how big does the t-value have to be in order for one to conclude that the means are significantly different? This is a really important question, and the answer is at the heart of all statistical analyses. The answer depends on two things: how large is your sample, and how confident do you want to be that the averages are in fact different?

The effect of sample size can be easily seen in the equation for the t-value above. Note that the sample sizes (n_A and n_B) appear in the denominator of the denominator. Thus, as the n's get larger, the pooled variation gets smaller and, as you recall, the effect of this is that the t-value gets larger.

The other way to look at the sample size is more formal and involves a term called "degrees of freedom". The number of degrees of freedom (df) for a t-test equals the combined sample size minus 2, i.e. n_A + n_B - 2 (however, if the variances are not equal, a more complicated approximation method is used). The df tells you, in effect, how well you can resolve your averages given the t-value you calculate; the higher the df, the greater the resolution.

The second issue mentioned above in determining how big a t-value you need depends on how confident you want to be. If you want to be really confident that these means differ, then you had better look for a very large t-value. But if you are satisfied with a rather marginal level of confidence (say, one chance in 20 that you're wrong when you say they differ), then you would be happy with a smaller t-value. The confidence level is denoted in the test by the P value, which stands for probability. Probability is expressed as a decimal: P = 0.05 is the same as P = 5%. If you happened to do a stats test and get a P value of exactly 0.05, then there is only a 5% chance of seeing a difference this large if the averages were in fact the same. In other words, if you conclude that the averages differ, you accept a 5% risk of being wrong when there is no real difference.
The P = 0.05 cutoff (95% confidence) really is the minimum criterion for significance; however, you can be 99% confident if you hold out for a larger t-value and use P = 0.01 as your criterion for significance. The acceptable probability level is always determined BEFORE performing the t-test (most experimenters use P = 0.05).

Now it is time to consider an example. The computer makes data analysis easy. All you need is to enter the data below in a spreadsheet (which you already did to perform the descriptive statistical analyses above) and perform a t-test (following the directions on whatever spreadsheet or stats package is available to you).

data set A   data set B
3            1
3            1
4            2
5            3
6            4
6            4
6            4
6            4
7            5
8            6
10           8

t-test: Two-Sample Assuming Unequal Variances

                        data set A   data set B
Mean                    5.8181       3.8181
Variance                4.3636       4.3636
Observations            11           11
Ho Mean Difference      0
df                      20
t Stat                  2.2453
P(T<=t) one-tail        0.0181
t Critical one-tail     1.7247
P(T<=t) two-tail        0.0362
t Critical two-tail     2.0859

Q: Are the means for data sets A and B significantly different, and exactly what information in the table above tells you this?
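For readers without a spreadsheet, the t Stat, df, and P rows of the table can be approximated with the Python sketch below. Several things here are our assumptions rather than the handout's method: the df uses the Welch-Satterthwaite formula, which we take to be the "more complicated approximation" for unequal variances; the P value comes from integrating the t-distribution density with Simpson's rule, a simple numerical stand-in for what a stats package does internally; and all function names are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

data_a = [3, 3, 4, 5, 6, 6, 6, 6, 7, 8, 10]
data_b = [1, 1, 2, 3, 4, 4, 4, 4, 5, 6, 8]


def welch_t_and_df(a, b):
    """Unequal-variance t-value and Welch-Satterthwaite df (assumed method)."""
    va = variance(a) / len(a)  # variance of A divided by its sample size
    vb = variance(b) / len(b)  # variance of B divided by its sample size
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Probability density of Student's t-distribution."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tail_p(t, df, steps=2000):
    """Two-tailed P via Simpson's rule on [0, |t|]; steps must be even."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * area   # what remains in the two tails


t_stat, df = welch_t_and_df(data_a, data_b)
p_two = two_tail_p(t_stat, df)
p_one = p_two / 2  # one-tailed P, by symmetry of the t-distribution

alpha = 0.05  # criterion chosen BEFORE running the test
print(f"t Stat = {t_stat:.4f}, df = {df:.1f}")
print(f"P one-tail = {p_one:.4f}, P two-tail = {p_two:.4f}")
print(f"two-tail P below alpha = {alpha}? {p_two < alpha}")
```

Because both samples here have the same size and the same variance, the Welch-Satterthwaite df works out to the same value as n_A + n_B - 2. With unequal variances it would generally be smaller and not a whole number.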