Biodiversity Data Analysis: Testing Statistical Hypotheses By Joanna Weremijewicz, Simeon Yurek, Steven Green, Ph. D. and Dana Krempels, Ph. D.


In biological science, investigators often collect observations that can be tabulated as numerical facts, also known as data (singular = datum). Biological research can yield several different types of data. Important measurements include counts (frequencies) and those that describe characteristics (length, mass, etc.). Data from a sample are often used to calculate estimates of the central values of the population of interest (mean, median, and mode) and of the dispersion around those values (range, variance, and standard deviation).

I. Data, Parameters, and Statistics: A Review

Recall that data can be of three basic types:

1. Attribute data. These are descriptive, "either-or" measurements, and usually describe the presence or absence of a particular attribute. The presence or absence of a genetic trait ("freckles" or "no freckles") or the type of genetic trait (type A, B, AB or O blood) are examples. Because such data have no specific sequence, they are considered unordered.

2. Discrete numerical data. These correspond to biological observations counted as integers (whole numbers). The number of leaves on each member of a group of plants, the number of breaths per minute in a group of newborns, and the number of beetles per square meter of forest floor are all examples of discrete numerical data. These data are ordered, but do not describe physical attributes of the things being counted.

3. Continuous numerical data. These are data that fall along a numerical continuum. The limit of resolution of such data is set by the accuracy of the methods and instruments used to collect them. Examples are tail length, brain volume, percent body fat: anything that varies on a continuous scale. Rates (such as decomposition of hydrogen peroxide per minute or uptake of oxygen during respiration over the course of an hour) are also continuous numerical data (Figure 1). Continuous numerical data often fall along a normal (Gaussian) distribution. This distribution is a function indicating the probability that a data point will fall between any two real numbers.

When an investigator collects numerical data from a group of subjects, s/he must determine how and with what frequency the data vary. For example, if one wished to study the distribution of shoe size in the human population, one might measure the shoe size of a sample of the human population (say, 50 individuals) and graph the numbers with "shoe size" on the x-axis and "number of individuals" on the y-axis. The resulting figure shows the frequency distribution of the data, a representation of how often each measurement occurs in the sample.
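The summary measures named above can be computed directly from a list of measurements. The following is a minimal sketch using Python's standard library; the shoe sizes are invented for illustration and are not data from this manual.

```python
from collections import Counter
import statistics

# Hypothetical shoe sizes for a small sample (illustrative values only).
shoe_sizes = [7, 8, 8, 9, 9, 9, 10, 10, 11, 12]

# Frequency distribution: how many individuals wear each shoe size.
frequency = Counter(shoe_sizes)

# Measures of central tendency.
mean = statistics.mean(shoe_sizes)
median = statistics.median(shoe_sizes)
mode = statistics.mode(shoe_sizes)

# Measures of dispersion (sample statistics, using the n - 1 denominator).
data_range = max(shoe_sizes) - min(shoe_sizes)
variance = statistics.variance(shoe_sizes)
std_dev = statistics.stdev(shoe_sizes)

# A text version of the frequency graph: one asterisk per individual.
for size in sorted(frequency):
    print(f"size {size:>2}: {'*' * frequency[size]}")

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", data_range, "variance:", variance, "std dev:", std_dev)
```

Because these values come from a sample rather than from every member of the population, they are estimates of the corresponding population values.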

Usually, data measurements are distributed over a range of values. Measures of the tendency of measurements to occur near the center of the range include the mean (the average measurement), the median (the measurement located at the exact center of the ordered measurements) and the mode (the most common measurement in the range).

It is also important to understand how much variation a group of subjects exhibits around the mean. For example, if the average human shoe size is "9," we must determine whether shoe size forms a very wide distribution (with a relatively small number of individuals wearing each size from 1 to 15) or one which hovers near the mean (with a relatively large number of individuals wearing sizes 7 through 10, and many fewer wearing sizes 1-6 and 11-15). Measurements of dispersion around the mean include the range, variance and standard deviation.

Parameters and Statistics

If you were able to measure the height of every adult male Homo sapiens who ever existed, and then calculate a mean, median, mode, range, variance and standard deviation from your measurements, those values would be known as parameters. They represent the actual values as calculated from measuring every member of a population of interest. Obviously, it is very difficult to obtain data from every member of a population of interest, and impossible if that population is theoretically infinite in size. However, one can estimate parameters by randomly sampling members of the population. Such an estimate, calculated from measurements of a subset of the entire population, is known as a statistic.

In general, parameters are written as Greek symbols equivalent to the Roman symbols used to represent statistics. For example, the standard deviation for a subset of an entire population is written as "s", whereas the true population parameter is written as σ.

II. From Raw Data to Index of Biodiversity

Now that you've had a chance to review a bit of statistical information, it's time to apply it to your own project. In this section, you will be guided through the process of calculating indices from your raw data collected over the past two weeks, and then using those indices to compare the two habitat types you chose.

A. Ordinal Data Points: Menhinick's Index (D)

When you collected and counted organisms in your samples, you were taking a survey of the number of different species present in each of your two habitat types. You counted the number of individuals of various species in each of the samples collected from your two selected habitat types. From these counts, you can calculate a Menhinick's Index (D) for each counted sample. At the end of your preliminary calculations, you should have one D value per sample (at least ten) for each of the two habitats you are comparing. You will use these D values in the Mann-Whitney U test to determine whether your two habitats differ significantly in the measure of biodiversity you have chosen (species richness).

Recall the formula for Menhinick's index, which represents the number of species in the sample divided by the square root of the number of individuals in the sample:

$D = \dfrac{s}{\sqrt{N}}$

where
s = the number of different species in your sample
N = the total number of individual organisms in the sample.
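As a worked illustration of the definition above (species count divided by the square root of the number of individuals), here is a minimal Python sketch. The function name `menhinick_index` and the sample counts are hypothetical and chosen only for the example.

```python
import math


def menhinick_index(counts):
    """Return Menhinick's index D = s / sqrt(N) for one sample.

    counts: the number of individuals counted for each species in the sample.
    Species with a count of zero are treated as absent.
    """
    present = [c for c in counts if c > 0]
    s = len(present)   # number of different species in the sample
    n = sum(present)   # total number of individual organisms in the sample
    if n == 0:
        raise ValueError("sample contains no individuals")
    return s / math.sqrt(n)


# Hypothetical sample: three species with 4, 3 and 2 individuals.
sample_counts = [4, 3, 2]
d_value = menhinick_index(sample_counts)
print(d_value)  # 3 species / sqrt(9 individuals)
```

Applying the function once per sample gives the list of D values for each habitat.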

Your team should have counted at least 10 samples from each of your two habitats, and can now calculate one Menhinick's index (D value) for each sample. Tabulate your D values here:

Sample #    D (habitat 1)    D (habitat 2)

So what do we do with these indices? You may have an intuitive sense that they will allow you to determine whether your two sampled habitats overlap in their degrees of biodiversity. But science isn't about intuition. Statistics and statistical tests are used to test whether the results of an experiment are significantly different from the null hypothesis prediction. What is meant by "significant?" For that matter, what is meant by "expected" results? To answer these questions, we must consider the matter of probability.

B. Probability

The significance level (also known as alpha (α)) for a given study is set by the investigator before the analysis is begun. Alpha is defined as the probability of mistakenly rejecting a null hypothesis that is true (Type I error). By convention, α is usually set at 0.05 (5%).

The P value is the probability, calculated under the assumption that the null hypothesis is true, of obtaining a result at least as extreme as the one actually observed. The result of a statistical test is a statistic. For example, Student's t test yields a t statistic, the Chi-square test yields a χ² statistic, and the Mann-Whitney U test yields a U statistic. Every value of a particular statistic is associated with a particular P value.

If the P value associated with a calculated statistic (e.g., the U statistic you will calculate with the Mann-Whitney test, to be described below) is 0.05, this means that a result at least this extreme would be expected only 5% of the time if the null hypothesis were true, so rejecting the null hypothesis at this level carries a 5% risk of a Type I error. A P value of less than 0.05 means that the observed result is even less likely under the null hypothesis. (For example, a P value of 0.01 means that results this extreme would occur by chance alone only 1% of the time if the factor you are examining had no effect.)
In essence, α is a cutoff value that defines the area(s) in a probability distribution where a particular value is unlikely to fall. In some studies, a more rigorous α of 0.01 (1%) is required to reject the null hypothesis, and in some others, a more lenient α of 0.1 (10%) is allowed for rejection of the null hypothesis. For our study of biodiversity, you will use an α level of 0.05.

The term "significant" as used in everyday conversation is not the same as the statistical meaning of the word. In scientific endeavors, significance has a highly specific and important definition. Every time you read the word "significant" in this lab manual, know that we refer to the following scientifically accepted standard:

The difference between an observed and expected result is said to be statistically significant if and only if:

Under the assumption that there is no true difference, the probability that the observed difference would be at least as large as that actually seen is less than or equal to α (5%; 0.05).

Conversely, under the assumption that there is no true difference, the probability that the observed difference would be smaller than that actually seen is at least 95% (0.95).

Once an investigator has calculated a statistic from collected data, s/he must be able to draw conclusions from it. How does one determine whether deviations from the expected (null hypothesis) are significant? There is a specific probability value linked to every possible value of any statistic. A probability distribution assigns a relative probability to every possible outcome (e.g., of Menhinick's Index).

The species richness calculations you performed for each sample, while expressed as a number, are not distributed along a normal curve. They are treated as ordinal, rather than continuous, data. For this reason, a non-parametric statistical test, the Mann-Whitney U test, will be employed for your analysis.

C. Statistical Hypotheses

A non-parametric test is used to test the significance of data that cannot be assumed to follow a normal distribution, such as the ordinal index values you have been calculating for this research project. In the following sections, you will learn how to apply a statistical test to your data.

Your team should already have devised two statistical hypotheses stated in terms of opposing statements, the null hypothesis (H0) and the alternative hypothesis (Ha). The null hypothesis states that there is no significant difference between two populations being compared. The alternative hypothesis may be either directional (one-tailed), stating the precise way in which the two populations will differ ("Pond A will have greater species richness than Pond B"), or nondirectional (two-tailed), not specifying the way in which two populations will differ ("Pond A and Pond B will differ in species richness").

To determine whether or not there is a difference in biodiversity between your two sample sites, you must now perform statistical tests on your data, the series of Menhinick's Indices (D) that you calculated from your individual survey samples.

III. Applying a Statistical Test to Your Menhinick's Indices

Once your team has calculated a Menhinick's index (D) for each of your samples from each of the two habitats, you are ready to employ a statistical test to determine how much the two sets of calculated indices overlap. If there is a great deal of overlap, it means that there is not a significant difference between them, and you will fail to reject your null hypothesis. However, if there is very little overlap (so little that it would be expected 5% of the time or less if the habitats did not differ), you can conclude that the two habitats do differ significantly in their species richness, and reject your null hypothesis.

A. Non-parametric test for two samples: Mann-Whitney U

The Mann-Whitney test allows the investigator (you) to compare your two habitat types without assuming that your D values are normally distributed. The Mann-Whitney U does have its rules. For this test to be appropriate:

- You must be comparing two random, independent samples (your two sites).
- The measurements (Menhinick's Indices, in our case) should be ordinal.
- No two measurements should have exactly the same value (though we can deal with ties in a way that will be explained shortly).

The Mann-Whitney U test allows the investigator to determine whether there is a significant difference between two sets of ordered/ranked data, such as those your team has collected in its biodiversity study. Here is a stepwise explanation and example of how to apply this test to your data.

1. State your null and alternative hypotheses. (You already have done this, right?)

H0:
Ha:

Example:
H0: There is no difference in the ranks of species richness between a silted pond and a clear pond.
Ha: There is a difference in the ranks of species richness between a silted pond and a clear pond.

2. State the significance level (alpha, α) necessary to reject H0. This is typically P < 0.05.

3. Rank your Menhinick's Indices from smallest to largest in a table, noting which index came from which habitat.

Example: Table 1 shows 18 (imaginary) values for Menhinick's Indices from the two ponds mentioned before, silted (S) and clear (C). Table 2 shows the values ranked and labeled by pond type.

Table 1. Menhinick's Indices for silted and clear ponds (columns: D silted, D clear).

Table 2. Ranked Menhinick's Indices for silted and clear ponds (columns: Rank, Ranked D values, Habitat). Habitats in rank order, smallest D first: S S S S S S C S S C C C S C C C C C

Notice in the ranked table that if two values are the same, then the rank each one receives is the average of the two ranks. For example, the value 9 appears twice, at ranks 6 and 7. Add the two ranks and divide by two to get their mean: 13/2 = 6.5. Whenever there is a tie, each tied value is assigned the same mean rank.
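The ranking step, including the mean-rank rule for ties, can be sketched in a few lines of Python. The function name `mean_ranks` and the D values below are hypothetical; they are not the values from Table 1.

```python
def mean_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    # Indices of the values, ordered from the smallest value to the largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over the run of values tied with the one at 'start'.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the 1-based ranks (start + 1) through (end + 1).
        shared_rank = (start + end + 2) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


# Hypothetical D values and the habitat each came from (S = silted, C = clear).
d_values = [2, 9, 5, 9, 12]
habitats = ["S", "S", "C", "C", "C"]

ranks = mean_ranks(d_values)
for rank, d, habitat in sorted(zip(ranks, d_values, habitats)):
    print(rank, d, habitat)
```

In this made-up sample the two values of 9 would occupy ranks 3 and 4, so each receives the mean rank 3.5, just as the tied values at ranks 6 and 7 in the example above each receive 6.5.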

4. Assign points to each ranked value. Each silted value gets one point for every clear value that appears below it (that is, further down the ranked table). Each clear value gets one point for every silted value that appears below it. For example, the first-ranked value, 2 (S), has 9 clear values below it, so it gets 9 points. The value 9 (C) has 3 silted values below it, so it gets 3 points.

Table 3. Points assigned to ranked D values in silted and clear ponds.

Rank order   Habitat   Points
 1           S         9
 2           S         9
 3           S         9
 4           S         9
 5           S         9
 6           S         9
 7           C         3
 8           S         8
 9           S         8
10           C         1
11           C         1
12           C         1
13           S         5
14           C         0
15           C         0
16           C         0
17           C         0
18           C         0

5. Calculate a U statistic for each category by adding the points for each habitat.

U silted = 9 + 9 + 9 + 9 + 9 + 9 + 8 + 8 + 5 = 75
U clear = 3 + 1 + 1 + 1 + 0 + 0 + 0 + 0 + 0 = 6

Your final U value is the smaller of these two values. In this example our U value is 6. In general, the lower the U value, the greater the difference between the two groups being tested. (For example, if none of the D values overlapped, the U value would be zero. That means there is a large difference between the two groups: they do not overlap at all.)

6. You are now ready to move to the final step, determining whether to reject or fail to reject your null hypothesis. (Proceed to Section IV.)

A video explanation of the Mann-Whitney U test procedure can be viewed here:

B. Non-parametric test for multiple samples: Kruskal-Wallis test

We told you not to. But some teams just have to go that extra mile.

If your team is comparing more than two non-parametric data sets, a useful test, analogous to the ANOVA (Analysis Of Variance), is the Kruskal-Wallis test. This is well explained here: Kruskal Wallis: But you're on your own. We warned you.

IV. Critical values for non-parametric statistics

As you already know, a specific probability value is linked to every possible value of any statistic, including the Mann-Whitney U statistic you just calculated.

A. Critical values for the Mann-Whitney U statistic

Remember that we have defined our significance level (α) as 0.05. This implies that a correct null hypothesis will be mistakenly rejected only 5% of the time, and correctly retained 95% of the time.

A critical value of a statistic (e.g., your Mann-Whitney U statistic) is the value associated with the chosen significance level (here, 0.05). The critical values for the Mann-Whitney U statistic are listed in Table 4.

Compare your U value to those shown in the Table of Critical Values for the Mann-Whitney U (Table 4). Find the sample size (i.e., the number of Menhinick's Indices (D) you calculated) for each of your two habitats, and use the matrix to find the critical value for U at those two sample sizes. (For example, if you calculated 19 D values for one habitat and 17 for the other, then the critical value of the U statistic would be 99. This means that a U value of 99 or lower indicates rejection of the null hypothesis.)

In our example calculation, there were nine samples from each of two different habitats. In the Mann-Whitney U table, that corresponds to a critical value of 17. Our U statistic was 6, which is quite a bit lower than 17. This means that, if these were real data, we would reject the null hypothesis in favor of the alternative hypothesis: there is a significant difference in species richness between the clear and silted ponds.

If your U value is equal to or lower than the critical value at the appropriate spot in the table, reject your null hypothesis.
If your U value is greater than that in the table, fail to reject.

B. Critical values for the Kruskal-Wallis statistic

If your team went crazy and decided to sample more than two different habitats, then your data analysis will be more complex. You will still use a non-parametric test, but it will be analogous to the parametric ANOVA, not the t-test. In this case, you will use the Kruskal-Wallis test, as shown in the video linked above.

Kruskal-Wallis critical values are more complex, as they involve more than two data sets. Fortunately for us, J. Patrick Meyer (University of Virginia) and Michael A. Seaman (University of South Carolina) have made available a limited portion of a table of critical values they have calculated. These can be found here if your project involves either three or four data sets:

The tables are not complete, but they do provide critical values for α levels of 0.1, 0.05, and 0.01. You are unlikely to need other values; these will tell you whether to reject or fail to reject your null hypothesis.
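For teams that do go this route, the Kruskal-Wallis statistic (usually written H) can be computed from pooled ranks. The sketch below uses the standard textbook form of H without any correction for ties; it is offered as an assumption-laden illustration and is not taken from the video or tables referenced above. The function name `kruskal_wallis_h` and the data are hypothetical.

```python
def kruskal_wallis_h(*groups):
    """Return the Kruskal-Wallis H statistic (no tie correction).

    Each argument is a list of values (e.g., D values) from one habitat.
    H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1),
    where R_i is the rank sum of group i, n_i its size, and N the total count.
    """
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")

    # Pool all values and assign each distinct value its mean rank.
    pooled = sorted(v for g in groups for v in g)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j + 2) / 2  # mean of 1-based ranks i+1..j+1
        i = j + 1

    n_total = len(pooled)
    weighted = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical D values from three habitats (illustrative only).
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 3))
```

Note that the comparison runs in the opposite direction from the Mann-Whitney U: a larger H indicates a greater difference among groups, so the null hypothesis is rejected when H is at least as large as the tabulated critical value.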

Table 4. Critical values for the Mann-Whitney U statistic. Find the value that corresponds to the sample sizes of your two habitats. If your U value is equal to or smaller than that shown in the table, then a difference this large between your two habitats would arise by chance alone no more than 5% of the time if the habitats did not truly differ, and you should reject your null hypothesis. If your U value is larger than that shown in the table, fail to reject your null hypothesis. (From The Open Door Web Site.)
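The point-counting procedure and the comparison with a critical value can be sketched in Python as follows. The habitat sequence is the rank order from the silted/clear pond example, and the critical value of 17 is the one quoted in the text for two samples of nine. The function names are hypothetical, ties are scored exactly as in the manual's counting rule, and the exact probability is computed by brute-force enumeration that assumes no ties.

```python
from itertools import combinations


def u_by_points(habitats):
    """habitats: labels in rank order, smallest D first. Returns {label: U}."""
    labels = sorted(set(habitats))
    if len(labels) != 2:
        raise ValueError("exactly two habitat labels are required")
    first, second = labels
    points = {first: 0, second: 0}
    for i, label in enumerate(habitats):
        other = second if label == first else first
        # One point for every value from the other habitat further down the table.
        points[label] += sum(1 for later in habitats[i + 1:] if later == other)
    return points


def two_tailed_p(u_observed, n1, n2):
    """Exact chance, if the habitats do not differ, that the smaller U <= u_observed."""
    extreme = 0
    total = 0
    # Every way of placing the n1 values of one group among n1 + n2 rank positions.
    for positions in combinations(range(n1 + n2), n1):
        u_one = sum(p - i for i, p in enumerate(positions))
        u_small = min(u_one, n1 * n2 - u_one)
        total += 1
        if u_small <= u_observed:
            extreme += 1
    return extreme / total


ranked_habitats = list("SSSSSSCSSCCCSCCCCC")  # rank order from the pond example
u_values = u_by_points(ranked_habitats)
u = min(u_values.values())

critical_value = 17  # quoted in the text for sample sizes 9 and 9
reject_null = u <= critical_value
p_value = two_tailed_p(u, 9, 9)

print(u_values, "U =", u, "reject null:", reject_null)
print("exact two-tailed probability:", p_value)
```

The enumeration visits every possible arrangement, so it is practical only for small samples like those in this project; for larger samples the table of critical values is the simpler route.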

V. Project Completed. Is This the End?

The study you are now completing is only the beginning of what could be a long-term research project to discover the various factors that affect biodiversity. The only thing you are determining now is whether or not there is a statistically significant difference between your two sample habitats.

In other words, the research project you are now completing is a pilot study. It establishes an observable fact (i.e., that there is or is not a difference in biodiversity between your two sample habitats). That fact should be subject to further investigation beyond what you have accomplished here.

Although you may have established that there is or is not a difference in biodiversity between your two sample habitats, you still cannot definitively state why there is or is not a difference. To do that, you must move to the next step, which is to list as many competing hypotheses as possible as to why there is a difference (or, if your team has obtained negative results, why there is not a difference, despite obvious differences in your two sample habitats).

Each of these multiple hypotheses could form the basis for a research project that would take your team one step further towards discovering the reasons for your pilot study's observed result. You should be able to give a brief description of an experiment that could be designed to test each of your competing hypotheses.

In your presentation, be sure to include a list of hypotheses that could explain your observed results. What factors differed between the two habitats that might cause differences in biodiversity? Would these factors affect the physiology of any organisms that lived there? Or would they simply be more hospitable to certain species and not others?

When you consider your results, consider every aspect of your findings, and report anything you find intriguing enough to warrant further study. Science is not a one-project endeavor. Every finding of every research project can be seen as opening a new doorway to discovery of the most intimate mechanisms of life.

Appendix 2 Statistical Hypothesis Testing 1

Appendix 2 Statistical Hypothesis Testing 1 BIL 151 Data Analysis, Statistics, and Probability By Dana Krempels, Ph.D. and Steven Green, Ph.D. Most biological measurements vary among members of a study population. These variations may occur for

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses Introduction to Hypothesis Testing 1 Hypothesis Testing A hypothesis test is a statistical procedure that uses sample data to evaluate a hypothesis about a population Hypothesis is stated in terms of the

More information

Statistical tests for SPSS

Statistical tests for SPSS Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

Difference tests (2): nonparametric

Difference tests (2): nonparametric NST 1B Experimental Psychology Statistics practical 3 Difference tests (): nonparametric Rudolf Cardinal & Mike Aitken 10 / 11 February 005; Department of Experimental Psychology University of Cambridge

More information

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric

More information

CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U

CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U Previous chapters of this text have explained the procedures used to test hypotheses using interval data (t-tests and ANOVA s) and nominal

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

More information

Testing Research and Statistical Hypotheses

Testing Research and Statistical Hypotheses Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you

More information

1 Nonparametric Statistics

1 Nonparametric Statistics 1 Nonparametric Statistics When finding confidence intervals or conducting tests so far, we always described the population with a model, which includes a set of parameters. Then we could make decisions

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

The Wilcoxon Rank-Sum Test

The Wilcoxon Rank-Sum Test 1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We

More information

Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

More information

Rank-Based Non-Parametric Tests

Rank-Based Non-Parametric Tests Rank-Based Non-Parametric Tests Reminder: Student Instructional Rating Surveys You have until May 8 th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs

More information

Statistics 2014 Scoring Guidelines

Statistics 2014 Scoring Guidelines AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES

SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

SPSS Explore procedure

SPSS Explore procedure SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

More information

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics
