Introduction to Statistics with GraphPad Prism (5.01) Version 1.1


Babraham Bioinformatics

Licence

This manual is , Anne Segonds-Pichon. This manual is distributed under the creative commons Attribution-Non-Commercial-Share Alike 2.0 licence. This means that you are free:
- to copy, distribute, display, and perform the work
- to make derivative works

Under the following conditions:
- Attribution. You must give the original author credit.
- Non-Commercial. You may not use this work for commercial purposes.
- Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a licence identical to this one.

Please note that:
- For any reuse or distribution, you must make clear to others the licence terms of this work.
- Any of these conditions can be waived if you get permission from the copyright holder.
- Nothing in this licence impairs or restricts the author's moral rights.

Full details of this licence can be found at

Table of contents

Introduction to Statistics with GraphPad Prism (5.01)
Introduction
Chapter 1: Basic structure of a GraphPad Prism project
Chapter 2: Qualitative data
- Example
- The χ² test
- The null hypothesis and the error types
Chapter 3: Quantitative data
- Descriptive stats
- The mean
- The median
- The variance
- The Standard Deviation (SD)
- Standard Deviation vs. Standard Error
- Confidence interval
- Assumptions of parametric data
- How can you check that your data are parametric/normal?
- Example
- Quantitative data representation
- The t-test
- Independent t-test
- Paired t-test
- Example
- Comparison of more than 2 means: Analysis of variance
- A bit of theory
- Example
- Correlation
- Example
- Correlation coefficient

Introduction

Prism is the officially supported graphical package at Babraham. It is a straightforward package with a friendly environment. There is a lot of easy-to-access documentation and the tutorials are very good.

Graphical representation of data is pivotal when one wants to present scientific results, in particular in publications. GraphPad allows you to build top quality graphs, much better than Excel for example, and in a much more intuitive way.

In this manual, however, we are going to focus on the statistical menu of GraphPad. The data analysis approach is a bit friendlier than with SPSS (the statistical package officially supported by the institute). SPSS does not hold your hand all the way through your analysis whereas GraphPad does. On the down side, you cannot run as many different analyses with GraphPad as you can with SPSS. If you need to run, say, a 3-way ANOVA then you will need to use SPSS.

So the 2 packages work quite differently but, whichever one you choose, you need some basic statistical knowledge, if only to design your experiments correctly, so there is no way out of it! And don't forget: you use stats to present your data in a comprehensible way and to make your point; this is just a tool, so don't hate it, use it!

"To consult a statistician after an experiment is finished is often merely to ask him to conduct a postmortem examination. He can perhaps say what the experiment died of." R.A. Fisher, 1938.

Chapter 1: Basic structure of a GraphPad Prism project

Click on the GraphPad Prism icon and the window below will appear.

Before you do anything with GraphPad you need to have in mind the type of graph/analysis you want to do, as this will determine the type of table you are going to choose. Then there are 2 scenarios. Either you enter your data directly into GraphPad, in which case, depending on the type of table you are choosing, you may need to know exactly how many data points you are going to deal with. Or your data are already in Excel, in which case it seems that you cannot import from its latest version; even with the previous version it is not easy, and it does not work on Mac. So whenever possible, as the Prism Help suggests, transfer data from Excel using copy and paste.

As mentioned previously, unlike in other software, you need to choose a type of table before doing anything else, and that choice depends on the type of graph/analysis you want to do. Unlike in Excel for instance, the worksheets don't all have the same structure. You can choose from 5 different types:
- XY table, in which each point is defined by both an X and a Y value, though for one X you can have several Y values, like replicates, which will be used to calculate error bars. Replicates are in side-by-side sub columns. This type of table allows you to run linear regression and correlation, and to calculate the area under the curve.
- Column table, in which each column defines a treatment group. From this type of table, one can run a t-test and a one-way ANOVA or one of the non-parametric equivalent tests.
- Grouped table, in which you can have 2 grouping variables, hence running 2-way ANOVAs.
- Contingency table, in which one can enter categorical data suitable for Fisher's exact test or Chi-square.
- Survival table, for survival analysis!

In this manual we will cover only XY, column and contingency tables.

[Screenshot: the Prism welcome window, annotated "You choose a table", "You choose a graph" and "The type of analysis you can run".]

Whatever the type of table you have chosen, each project contains the same 5 folders:
- Data Tables, in which are the worksheets containing the data,
- Info section, in which you can enter information about the technical aspects of the experiment, like the protocol or who the experimenter was,
- Results, in which are the outputs of the statistical analysis,
- Graphs, in which are the graphs! They are automatically generated from your data but you can make them pretty afterwards,
- Layouts, in which you can present your graphs and analysis.

Chapter 2: Qualitative data

Let's talk about the important stuff: your data. The first thing you need to do good stats is to know your data inside out. They are generally organised into variables, which can be divided into 2 categories: qualitative and quantitative.

Qualitative data are non-numerical data and the values taken are usually names (they are also called nominal data), e.g. the variable sex: male or female. The values can be numbers without being numerical (e.g. an experiment number is a numerical label but not a unit of measurement). A qualitative variable with an intrinsic order in its categories is ordinal. Finally, there is the particular case of a qualitative variable with only 2 categories; it is then said to be binary or dichotomous (e.g. alive/dead or male/female).

We are going to use an example to go through the analysis and the plotting of categorical data.

Example (File: cats and dogs.xlsx)

A researcher is interested in whether animals could be trained to line dance. He takes some cats and dogs (animal) and tries to train them to dance by giving them either food or affection as a reward (training) for dance-like behaviour. At the end of the week a note is made of which animals could line dance and which could not (dance). All the variables are dummy variables (categorical).

The pivotal (!) question is: is there an effect of training on dogs' and cats' ability to learn to line dance?

It is quite intuitive that, after having run such an experiment, you are going to end up with a contingency table showing the number of animals who danced or not according to the type of training they received. Those contingency tables are presented below.

[Tables: counts of animals that danced (yes/no) by type of training (Food/Affection), with row and column totals; one table for cats and one for dogs.]

The first thing to do is enter the data into GraphPad. While for some software it is OK or even easier to prepare your data in Excel and then import them, it is not such a good idea with GraphPad because, as we said before, the structure of the worksheets varies with the type of graph you want to do.

So, first, you need to open a New Project, which means that you have to choose among the different types of tables mentioned earlier. In our case we want to build a contingency table, so we choose Contingency and we click on OK. The next step is to enter the data after having named the columns and the rows.

When you want to insert another sheet you have 2 choices. If the second sheet has the same structure and variable names as the first one, you can right-click on the first sheet name (here Dog), choose Duplicate Current Sheet, and all you have to do is change the values. If the second sheet has a different structure, you click on New>New data table in the Sheet Menu.

The first thing you want to do is look at a graphical representation of the data. GraphPad will have done it for you and if you go into Graphs you will see the results. You can change pretty much everything on a graph in GraphPad and it is very easy to make it look like this, for instance:

[Figure: bar charts of counts of animals (Dance Yes, Dance No) by reward (Food, Affection); one panel for cats and one for dogs.]

I will not go into much detail in this manual about all the graphical possibilities of GraphPad because it is not its purpose, but it is very intuitive and basically, once you have entered the data in the correct way, you are OK. After that all you have to do is click on the bit you want to change and, usually, a window will pop up.

To analyse such data you need to use a Fisher's exact test or a χ² test. Both tests will give you the same-ish p-value for big samples, but for small samples the difference can be a bit more important and the p-value given by Fisher's exact test is more accurate. Having said that, the calculation of the Fisher's exact test is quite complex whereas the one for χ² is quite easy so

only the calculation of the latter is going to be presented here. Also, the Fisher's test is often only available for 2x2 tables, as in GraphPad for example, so in a way the χ² is more general.

For both tests, the idea is the same: how different are the observed data from what you would have expected to see by chance, i.e. if there were no association between the 2 variables? Or, looking at the table, you may also ask: knowing that 32 of the 68 cats did dance and that 36 of the 68 received affection, what is the probability that those 32 dancers would be so unevenly distributed between the 2 types of reward?

A bit of theory: the χ² test

It could be either:
- a one-way χ² test, which is basically a test that compares the observed frequency of a variable in a single group with what would be expected by chance.
- a two-way χ² test, the most widely used, in which the observed frequencies for two or more groups are compared with the frequencies expected by chance. In other words, in this case, the χ² tells you whether or not there is an association between 2 categorical variables.

An important thing to know about the χ², and about the Fisher's exact test for that matter, is that it does not tell you anything about causality; it only tells you about the association between 2 variables, and it is your knowledge of the biological system you are studying which will help you to interpret the result. Hence, you generally have an idea of which variable is acting on the other.

The χ² value is calculated using the formula below:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

The observed frequencies are the ones you measured, the values that are in your table. Now, the expected ones are calculated this way:

Expected frequency = (row total)*(column total)/grand total

So, for the cats, for example, the expected frequency of cats that would line dance after having received food as a reward is obtained from:
- probability of line dancing: 32/68
- probability of receiving food: 32/68

If the 2 variables are not associated, the probability of both is the product of the two, so the expected frequency is 68*(32/68)*(32/68) = (32*32)/68 = 15.1
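The expected-frequency rule and the χ² sum can be checked with a few lines of code. The sketch below is only an illustration in Python (the manual itself works in GraphPad); the four observed counts are an assumption, inferred from the totals (32 dancers, 32 fed, 68 cats) and the expected values quoted in this chapter, and no continuity correction is applied.

```python
# Observed counts for the cats (ASSUMED: inferred from the totals and the
# expected values quoted in the text, not copied from the original table).
# Rows: danced yes / no. Columns: food / affection.
observed = [[26, 6],
            [6, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = (row total) * (column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square = sum over all cells of (observed - expected)^2 / expected
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print([[round(e, 1) for e in row] for row in expected])
print(round(chi2, 1))
```

Turning the χ² value into a p-value requires the χ² distribution with 1 degree of freedom for a 2x2 table, which is what the statistical package does for you.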

Intuitively, one can see that we are kind of averaging things here: we try to find out the values we should have got by chance. If you work out the values for all the cells, you get:

[Table: "Did they dance? * Type of Training * Animal" crosstabulation, giving the count and the expected count in each cell (Food as Reward, Affection as Reward, Total) for cats and for dogs.]

So for the cats, the χ² value is:

$$\chi^2 = \frac{(26-15.1)^2}{15.1} + \frac{(6-16.9)^2}{16.9} + \frac{(6-16.9)^2}{16.9} + \frac{(30-19.1)^2}{19.1} = 28.4$$

Let's do it with GraphPad. To calculate either of the tests, you click on =Analyze in the tool bar menu, then the window below will appear. GraphPad will offer you by default the type of analysis which goes with the type of data you have entered. So to the question "Which analysis?" for Contingency tables, the answer is Chi-square and Fisher's exact test.

If you are happy with it, and after having checked that the data sets to be analysed are the ones you want, you can click on OK. The complete analysis will then appear in the Results section. Below are presented the results for the χ² and the Fisher's exact test.

Let's start with the χ²: there is only one assumption that you have to be careful about when you run it. With 2x2 contingency tables you should not have cells with an expected count below 5, as in that case the test is likely not to be accurate (for larger tables, all expected counts should be greater than 1 and no more than 20% of expected counts should be less than 5). If you have a high proportion of cells with a small value in them, then you should use a Fisher's exact test. However, as I said before, many software packages, including GraphPad, only offer the calculation of the Fisher's exact test for 2x2 tables. So when you have more than 2 categories and a small sample you are in trouble. You have 2 solutions to solve the problem: either you collect more data or you group the categories to boost the proportions.

If you remember the χ² formula, the calculation gives you an estimate of the difference between your data and what you would have obtained if there was no association between your variables. Clearly, the bigger the value of the χ², the bigger the difference between observed and expected frequencies and the more likely the difference is to be significant.

As you can see here, the p-values vary slightly between the 2 tests ( vs ) though the conclusion remains the same: the type of reward has no effect whatsoever on the ability of dogs to line dance (p=0.9). Though the samples are not very big here, the assumptions for the χ² are met so you can choose either test.

As for the cats, you are more than 99% confident (p< ) when you say that cats are more likely to line dance when they receive food as a reward than when they receive affection.
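The rule of thumb on expected counts given at the start of this section can be written down as a small check. This is a minimal Python sketch, not something GraphPad exposes; the function name is hypothetical and the thresholds are exactly the ones stated in the text.

```python
def chi2_assumptions_met(expected):
    """Apply the expected-count rule of thumb to a table of expected counts.

    2x2 tables: no expected count below 5.
    Larger tables: every expected count above 1, and at most 20% below 5.
    """
    cells = [e for row in expected for e in row]
    if len(expected) == 2 and len(expected[0]) == 2:
        return all(e >= 5 for e in cells)
    share_small = sum(1 for e in cells if e < 5) / len(cells)
    return all(e > 1 for e in cells) and share_small <= 0.20


# Expected counts for the cats, as quoted in the text.
cat_expected = [[15.1, 16.9], [16.9, 19.1]]
print(chi2_assumptions_met(cat_expected))

# Hypothetical small 2x2 table: one expected count is below 5.
print(chi2_assumptions_met([[3.0, 7.0], [7.0, 13.0]]))
```

When the check fails on a 2x2 table, the text's advice is to use the Fisher's exact test instead.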

A bit of theory: the null hypothesis and the error types

The null hypothesis (H0) corresponds to the absence of effect (e.g. the animals rewarded by food are as likely to line dance as the ones rewarded by affection) and the aim of a statistical test is to accept or to reject H0. Traditionally, a test or a difference is said to be significant if the probability of type I error is α ≤ 0.05 (max α = 1). It means that the level of uncertainty usually accepted for a test is 5%. It also means that, when there is in fact no effect, there is a probability of 5% that you will be wrong in saying that your 2 means are different, for instance. Or you can say that when you see an effect you want to be at least 95% sure that something is really happening.

                        True state of H0
Statistical decision    H0 True                          H0 False
Reject H0               Type I error (False Positive)    Correct (True Positive)
Do not reject H0        Correct (True Negative)          Type II error (False Negative)

Tip: if your p-value is between 5% and 10% (0.05 and 0.10), I would not dismiss the result too fast if I were you. It is often worth putting this result into perspective and asking yourself a few questions like:
- what does the literature say about what I am looking at?
- what if I had a bigger sample?
- have I run other tests on similar data and were they significant or not?

The interpretation of a borderline result can be difficult as it could be important in the whole picture.

The specificity and the sensitivity of a test are closely related to Type I and Type II errors.

Specificity = Number of True Negatives / (Number of False Positives + Number of True Negatives)
A test with a high specificity has a low type I error rate.

Sensitivity = Number of True Positives / (Number of False Negatives + Number of True Positives)
A test with a high sensitivity has a low type II error rate.
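The two ratios above translate directly into code. The sketch below is in Python and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def specificity(true_negatives, false_positives):
    # Share of the truly negative cases that the test correctly leaves alone.
    return true_negatives / (false_positives + true_negatives)


def sensitivity(true_positives, false_negatives):
    # Share of the truly positive cases that the test correctly detects.
    return true_positives / (false_negatives + true_positives)


# Hypothetical counts, for illustration only.
tn, fp = 90, 10   # H0 true: 90 correctly not rejected, 10 type I errors
tp, fn = 40, 10   # H0 false: 40 correctly rejected, 10 type II errors

spec = specificity(tn, fp)
sens = sensitivity(tp, fn)
print(spec, 1 - spec)   # specificity and type I error rate
print(sens, 1 - sens)   # sensitivity and type II error rate
```

In each pair, the error rate is simply one minus the corresponding ratio, which is why a high specificity goes with a low type I error rate and a high sensitivity with a low type II error rate.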

Chapter 3: Quantitative data

When it comes to quantitative data, more tests are available but assumptions must be met before applying them. There are 2 types of stats tests: parametric and non-parametric ones. Parametric tests have 4 assumptions that must be met for the test to be accurate. Non-parametric tests are designed to be used with nominal or ordinal data (e.g. χ² test) and they make few or no assumptions about population parameters like normality (e.g. Mann-Whitney test).

3-1 A bit of theory: descriptive stats

The mean (or average)

µ = average of all values in a column

It can be considered as a model because it summarises the data.
- Example: number of friends of each member of a group of 5 lecturers: 1, 2, 3, 3 and 4
Mean: (1+2+3+3+4)/5 = 2.6 friends per lecturer: clearly a hypothetical value!

But if the values were 0, 0, 1, 5 and 7, the mean would also be 2.6, yet clearly it would not give an accurate picture of the data. So, how can you know that it is an accurate model? You look at the difference between the real data and your model. To do so, you calculate the difference between each real data point and the model, and you sum these differences to get the total error (or sum of differences).

$$\sum (x_i - \mu) = (-1.6) + (-0.6) + (0.4) + (0.4) + (1.4) = 0$$

And you get no errors! Of course: positive and negative differences cancel each other out. So to avoid the problem of the direction of the error, you can square the differences and instead of the sum of errors, you get the Sum of Squared errors (SS).
- In our example:

$$SS = (-1.6)^2 + (-0.6)^2 + (0.4)^2 + (0.4)^2 + (1.4)^2 = 5.20$$

The median

The median is the value exactly in the middle of an ordered set of numbers.
Example 1: , Median = 68
Example 2: , Median = 60

The variance

The SS gives a good measure of the accuracy of the model but it is dependent upon the amount of data: the more data, the higher the SS. The solution is to divide the SS by the number of observations (N). As we are interested in measuring the error in the sample to estimate the one in the population, we divide the SS by N-1 instead of N and we get the variance:

$$S^2 = \frac{SS}{N-1}$$

- In our example: Variance (S²) = 5.20 / 4 = 1.3
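The chain from mean to sum of squares to variance can be reproduced step by step. This is a small Python sketch using the lecturers' example from the text; the variable names are just illustrative.

```python
friends = [1, 2, 3, 3, 4]          # number of friends of the 5 lecturers
n = len(friends)

mean = sum(friends) / n            # the "model" of the data
deviations = [x - mean for x in friends]

total_error = sum(deviations)      # positive and negative errors cancel out
ss = sum(d ** 2 for d in deviations)   # Sum of Squared errors
variance = ss / (n - 1)            # divide by N-1, not N

print(round(mean, 2), round(total_error, 10), round(ss, 2), round(variance, 2))

# Same mean, very different spread: the second data set from the text.
other = [0, 0, 1, 5, 7]
other_mean = sum(other) / len(other)
other_ss = sum((x - other_mean) ** 2 for x in other)
print(round(other_mean, 2), round(other_ss, 2))
```

The second data set has the same mean but a much larger SS, which is exactly why the mean alone cannot tell you how well it represents the data.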

Why N-1 instead of N?

If we take a sample of 4 scores in a population they are free to vary, but if we use this sample to calculate the variance, we have to use the mean of the sample as an estimate of the mean of the population. To do that we have to hold one parameter constant.
- Example: the mean of the sample is 10. We assume that the mean of the population from which the sample has been collected is also 10. If we want to calculate the variance, we must keep this value constant, which means that the 4 scores cannot vary freely.
- If the values are 9, 8, 11 and 12 (mean = 10) and if we change 3 of these values to 7, 15 and 8, then the final value must be 10 to keep the mean constant.
- If we hold 1 parameter constant, we have to use N-1 instead of N.
- It is the idea behind the degrees of freedom: one less than the sample size.

The Standard Deviation (SD)

The problem with the variance is that it is measured in squared units, which is not very nice to manipulate. So for more convenience, the square root of the variance is taken to obtain a measure in the same unit as the original measure: the standard deviation.

$$S.D. = \sqrt{\frac{SS}{N-1}} = \sqrt{S^2}$$

- In our example: S.D. = √1.3 = 1.14
- So you would present your mean as follows: µ = 2.6 +/- 1.14 friends

The standard deviation is a measure of how well the mean represents the data, or how much your data are scattered around the mean:
- small S.D.: data close to the mean: the mean is a good fit of the data (graph on the left)
- large S.D.: data distant from the mean: the mean is not an accurate representation (graph on the right)

Standard Deviation vs. Standard Error

Many scientists are confused about the difference between the standard deviation (S.D.) and the standard error of the mean (S.E.M. = S.D. / √N).
- The S.D. (graph on the left) quantifies the scatter of the data, and increasing the size of the sample does not increase the scatter (above a certain threshold).
- The S.E.M. (graph on the right) quantifies how accurately you know the true population mean; it's a measure of how much you expect sample means to vary. So the S.E.M. gets smaller as your samples get larger: the mean of a large sample is likely to be closer to the true mean than is the mean of a small sample.
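The difference between the two quantities is easy to see numerically. The Python sketch below uses the lecturers' example for the S.D. and S.E.M., then holds the S.D. fixed and varies the sample size; the sample sizes 5, 50 and 500 are arbitrary illustrative values.

```python
import math
import statistics

friends = [1, 2, 3, 3, 4]

sd = statistics.stdev(friends)          # square root of SS / (N - 1)
sem = sd / math.sqrt(len(friends))      # S.E.M. = S.D. / sqrt(N)
print(round(sd, 2), round(sem, 2))

# With the same scatter (same S.D.), the S.E.M. shrinks as N grows.
sem_by_n = {n: sd / math.sqrt(n) for n in (5, 50, 500)}
for n, value in sem_by_n.items():
    print(n, round(value, 3))
```

Multiplying the sample size by 100 only divides the S.E.M. by 10, because N enters through its square root.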

A big S.E.M. means that there is a lot of variability between the means of different samples and that your sample might not be representative of the population. A small S.E.M. means that most sample means are similar to the population mean and so your sample is likely to be an accurate representation of the population.

Which one to choose?
- If the scatter is caused by biological variability, it is important to show the variation. So it is more appropriate to report the S.D. rather than the S.E.M. Even better, you can show all data points in a graph, or perhaps report the largest and smallest values.
- If you are using an in vitro system with no biological variability, the scatter can only result from experimental imprecision (no biological meaning). It is more sensible then to report the S.E.M. since the S.D. is less useful here. The S.E.M. gives your readers a sense of how well you have determined the mean.

Choosing between SD and SEM also depends on what you want to show. If you just want to present your data for a descriptive purpose then you go for the SD or the SEM. If you want the reader to be able to infer an idea of significance then you should go for the SEM or the Confidence Interval (see below). We will go into a bit more detail later.

Confidence interval

The confidence interval quantifies the uncertainty in measurement. The mean you calculate from your sample of data points depends on which values you happened to sample. Therefore, the mean you calculate is unlikely to equal the true population mean exactly. The size of the likely discrepancy depends on the variability of the values (expressed as the S.D. or the S.E.M.) and the sample size. If you combine those together, you can calculate a 95% confidence interval (95% CI), which is a range of values. If the population is normal (or nearly so), you can be 95% sure that this interval contains the true population mean.

In a normal distribution, 95% of observations lie within +/- 1.96 SD of the mean; in the same way, 95% of sample means lie within +/- 1.96 SE of the true mean.
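The 1.96 multiplier above gives a quick way to sketch a 95% CI of the mean. The Python function below is a rough illustration under the normal approximation; the function name is hypothetical, and for a sample as small as the lecturers' one a statistics package would use a larger multiplier taken from the t distribution, so the interval shown here is too narrow.

```python
import math
import statistics


def ci95_normal(values):
    """Approximate 95% CI of the mean: mean +/- 1.96 * S.E.M.

    Normal approximation only; small samples call for a t-based multiplier.
    """
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    half_width = 1.96 * sem
    return mean - half_width, mean + half_width


friends = [1, 2, 3, 3, 4]
low, high = ci95_normal(friends)
print(round(low, 2), round(high, 2))
```

The interval is centred on the sample mean, and its width is driven by both the scatter (S.D.) and the sample size, as described in the text.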

One other way to look at error bars:

Error bars                                   Type          Description
Standard deviation (SD)                      Descriptive   Typical or average difference between the data points and their mean.
Standard error (SEM)                         Inferential   A measure of how variable the mean will be, if you repeat the whole study many times.
Confidence interval (CI), usually 95% CI     Inferential   A range of values you can be 95% confident contains the true mean.

From Geoff Cumming et al.

If you want to compare experimental results, it could be more appropriate to show inferential error bars such as SE or CI rather than SD. However, if n is very small (for example n=3), rather than showing error bars and statistics, it is better to simply plot the individual data points.

You can estimate statistical significance using the overlap rule for SE bars. In the same way, you can estimate statistical significance using the overlap rule for 95% CI bars.

A bit of theory: Assumptions of parametric data

When you are dealing with quantitative data, the first thing you should look at is how they are distributed, what they look like. The distribution of your data will tell you if there is something wrong in the way you collected them or entered them, and it will also tell you what kind of test you can apply to make them say something.

T-test, analysis of variance and correlation tests belong to the family of parametric tests, and to be able to use them your data must comply with 4 assumptions.

1) The data have to be normally distributed (normal shape, bell shape, Gaussian shape).

Example of normally distributed data:

There are 2 main types of departure from normality:
- Skewness: lack of symmetry of a distribution
- Kurtosis: measure of the degree of peakedness in the distribution

The two distributions below have the same variance and approximately the same skew, but differ markedly in kurtosis.

2) Homogeneity in variance: the variance should not change systematically throughout the data.

3) Interval data: the distance between points of the scale should be equal at all parts along the scale.

4) Independence: data from different subjects are independent, so that values corresponding to one subject do not influence the values corresponding to another subject. There are specific designs for repeated measures experiments.

How can you check that your data are parametric/normal?

GraphPad can test the normality of the distribution of your sample(s). To do so, you go: =Analyze>Column Analyses>Column statistics. You are given the choice between 3 tests for normality: D'Agostino and Pearson, Kolmogorov-Smirnov and Shapiro-Wilk. These tests require n>=7 and the D'Agostino and Pearson test is the one to go for. As GraphPad puts it: It first computes the skewness and kurtosis to quantify how far from Gaussian the distribution is in terms of asymmetry and shape. It then calculates how far each of these values differs from the value expected with a Gaussian distribution, and computes a single p-value from the sum of these discrepancies. The Kolmogorov-Smirnov test is not recommended, and the Shapiro-Wilk test is only accurate when no two values have the same value.

Let's try it through an example.

Example (File: coyote.xlsx)

In this case, the normality test tells us that our data are normally distributed. Actually, the test does not tell you that your data are normally distributed; it tells you that they are not significantly different from normality ( p= and p=0.7757).
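The skewness and kurtosis that the D'Agostino and Pearson test starts from can be sketched with simple moment formulas. The Python function below is an illustration only: it uses the plain moment-based estimators (an assumption, not necessarily the exact estimators GraphPad uses), it does not compute the test's p-value, and both samples are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Hypothetical samples, for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```

A symmetric sample gives a skewness of 0, while a long right tail gives a positive skewness; the normality test then asks whether such departures are larger than expected by chance for a Gaussian sample of that size.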

However, the best way to get a really good idea of what is going on is to plot your data. When it comes to normality, there are 2 ways to plot your data: the histogram and the box plot. We are going to do both with GraphPad.

Let's start with the histogram. To draw such a graph with GraphPad, you first need to calculate the frequency distribution. To do so, you go: =Analyze>Column Analyses>Frequency distribution. GraphPad will automatically draw a histogram from the frequency distribution. The slightly delicate thing here is to determine the size of the bin: too small, the distribution may look anything but normal; too big, you will not see a thing. The best way is to try 2 or 3 bin sizes and see how it goes.

Something else to be careful about: by default GraphPad will plot the counts (in Tabulate> Number of Data Points). It is OK when you plot just one group or one data set, but when you want to plot several (or just 2 like here) and the groups are not of the same size, then you should plot percentages (in Tabulate> Relative frequencies as percent) if you want to be able to compare them graphically.

[Figure: "Histogram of Coyote: Freq. dist. (histogram)", three panels with different bin sizes; x axis: Bin Center, y axis: Percentage; series: Female and Male.]

As you can see, depending on the choice of the bin size, the histograms look quite different. And even though they don't look too normal, the data still passed the test. This is why I don't like histograms that much, especially with data sets that are not very big.
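The effect of the bin size, and the reason for plotting percentages rather than counts, can be seen with a small frequency-distribution function. This Python sketch is not GraphPad's implementation; the function name, the starting value and the body lengths are all hypothetical.

```python
def frequency_distribution(values, bin_width, start):
    """Return {bin center: percent of values in that bin}, ordered by bin."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)        # which bin v falls in
        center = start + (index + 0.5) * bin_width   # middle of that bin
        counts[center] = counts.get(center, 0) + 1
    # Relative frequencies as percent, so groups of unequal size are comparable.
    return {c: 100 * k / len(values) for c, k in sorted(counts.items())}


# Hypothetical body lengths in cm, for illustration only.
females = [85, 88, 90, 92, 93, 95, 97, 101]

narrow = frequency_distribution(females, bin_width=5, start=80)
wide = frequency_distribution(females, bin_width=10, start=80)
print(narrow)
print(wide)
```

The same 8 values give 4 bars with the narrow bins and only 3 with the wide ones, so the apparent shape of the distribution depends on a choice that has nothing to do with the data.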

My preference goes to the box plot as it tells you in one go everything you need to know and you don't need to play with the bin size! To draw a box plot you choose it from the gallery of graphs in Column and you choose Tukey for Whiskers. Tukey was the guy who invented the box plot and this particular representation allows you to identify outliers (which we will talk about later).

It is very important that you know how a box plot is built. It is rather simple and it will allow you to get a pretty good idea about the distribution of your data at a glance. Below you can see the relationship between box plot and histogram. If your distribution is normal-ish then the box plot should be symmetrical.

Regarding the outliers, there is no right or wrong attitude. If there is a technical issue or an experimental problem, you should remove the outlier of course, but if there is nothing obvious, it is up to you. I would always recommend keeping outliers if you can; you can run the analysis with and without them, for instance, and see what effect it has on the p-value. If the outcome is still consistent with your hypothesis, then you should keep them. If not, then it is between you and your conscience!
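How a Tukey box plot flags outliers can be sketched in a few lines. The Python code below assumes the usual Tukey rule (points beyond 1.5 times the interquartile range from the box) and one particular quartile convention from the standard library; other software may compute quartiles slightly differently, and the data are hypothetical.

```python
import statistics


def tukey_outliers(values):
    """Return (q1, median, q3, outliers) using the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                       # height of the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return q1, median, q3, outliers


# Hypothetical body lengths in cm; 120 is deliberately extreme.
lengths = [86, 88, 90, 91, 92, 93, 94, 96, 120]
q1, median, q3, outliers = tukey_outliers(lengths)
print(q1, median, q3, outliers)
```

The box runs from the first to the third quartile with the median inside it, and any point beyond the fences is drawn individually, which is what lets you spot outliers at a glance.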

Finally, you can check the second assumption (homogeneity of variances). In GraphPad the second assumption is tested by default. When you ask for a t-test, GraphPad will calculate an F test to tell you if variances were different or not.

Don't be too quick to switch to using the nonparametric Kruskal-Wallis ANOVA (or the Mann-Whitney test when comparing two groups). While nonparametric tests do not assume Gaussian distributions, the Kruskal-Wallis and Mann-Whitney tests do assume that the shape of the data distribution is the same in each group. So if your groups have very different standard deviations and so are not appropriate for one-way ANOVA, they should not be analyzed by the Kruskal-Wallis or Mann-Whitney tests either. However, ANOVA and t-tests are rather robust, especially when the samples are not too small, so you can get away with small departures from normality and small differences in variances. Often the best approach is to transform the data: transforming to logarithms or reciprocals often does the trick, restoring equal variance.

Going back to the box plots, the symmetry tells you about the distribution of the data, and if both boxes (like in our case) are of the same size-ish, then you know that the variances are about the same.

Quantitative data representation

Let's go back to our coyotes. What you want from your graph is to see if there is a difference between males and females and, possibly, to have an idea of the significance of the difference. The best way to do it is to plot the error bars as confidence intervals (CI).

[Figure: mean body length, Length (cm), of Male and Female coyotes with 95% CI error bars.]

There is about 40% of overlap between the error bars. Significance can still be reached with up to 50% of overlap, depending on sample size and variability.

This is a very informative graph as you can spot the 2 means together with the confidence intervals. We saw before that the 95% CI of the mean gives you the boundaries between which you are 95% sure to find the true population mean. When you want to compare 2 or more groups visually, it is always better to use the CI than the SD or, to some extent, the SEM. It gives you a good idea of the dispersion of your sample and, as we saw before, it easily allows you to have an idea, before doing any stats, of the likelihood of a significant difference between your groups. Since your true group means have a 95% chance of lying within their respective CI, such a big overlap between the CIs tells you that the difference is probably not significant.

In our particular example, from the graph we can say that the average body length of female coyotes, for instance, is a little bit more than 92 cm and that 95 out of 100 samples from the same population would have means between about 90 and 94 cm. We can also say that, despite the fact that the females appear smaller than the males, this difference is probably not significant as the error bars overlap a lot.

To check that, we are going to run a t-test.

3-3 A bit of theory: the t-test

The t-test assesses whether the means of two groups are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups.

[Figure: idealized distributions of the treated and control groups]

The figure above shows the distributions for the treated (blue) and control (green) groups in a study. Actually, the figure shows the idealized distribution. The figure indicates where the control and treatment group means are located. The question the t-test addresses is whether the means are statistically different.

What does it mean to say that the averages for two groups are statistically different? Consider the three situations shown in the figure below.

[Figure: three situations with the same difference between the means: medium, high and low variability]

The first thing to notice about the three situations is that the difference between the means is the same in all three. But you should also notice that the three situations don't look the same: they tell very different stories. The top example shows a case with moderate variability of scores within each group. The second situation shows the high variability case. The third shows the case with low variability. Clearly, we would conclude that the two groups appear most different or distinct in the bottom or low-variability case. Why? Because there is relatively little overlap between the two bell-shaped curves. In the high variability case, the group difference appears least striking because the two bell-shaped distributions overlap so much.

This leads us to a very important conclusion: when we are looking at the differences between scores for two groups, we have to judge the difference between their means relative to the spread or variability of their scores. The t-test does just this.

The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages. The bottom part is a measure of the variability or dispersion of the scores. Figure 3 shows the formula for the t-test and how the numerator and denominator are related to the distributions.

[Figure 3: formula for the t-test, t = (difference between the group means) / (variability of the groups)]

The t-value will be positive if the first mean is larger than the second and negative if it is smaller.

To run a t-test in GraphPad, you go to Analysis > Column analyses > t-tests and then you have to choose between 2 types of t-tests: Unpaired and Paired. The choice between the 2 is very intuitive. If you measure a variable in 2 different populations, you choose the unpaired (independent) t-test as the 2 populations are independent from each other. If you measure a variable 2 times in the same population, you go for the paired t-test.

So say you want to compare the weights of 2 breeds of sheep. To do so, you take a sample of each breed (the 2 samples have to be comparable) and you weigh each animal. You then run an independent-samples t-test on your data to find out if the difference is significant.

You may also want to compare 2 types of sheep food (A and B): to do so you define 2 samples of sheep comparable in every other way and you weigh them at day 1 and, say, at day 30. This time you apply a paired-samples t-test as you are interested in each individual difference in weight between day 1 and day 30.

Independent t-test

Let's go back to our coyotes. You go to Analysis > Column analyses > t-tests. The default setting here is good as you want to run an Unpaired t-test.
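The idea that the t-test judges the difference between 2 means relative to the variability can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `unpaired_t` and the sheep weights are hypothetical, and the sketch uses the classical pooled-variance form of the unpaired t-test (which assumes equal variances in the 2 groups). It returns the t-value and the degrees of freedom, not the p-value.

```python
import math
import statistics


def unpaired_t(group1, group2):
    """Return (t, df) for an unpaired t-test with pooled variance."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the 2 sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference between the 2 means
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


# Hypothetical weights (kg) for 2 breeds of sheep: same difference
# between the means (5 kg) in both cases, but different variability.
low_var_a, low_var_b = [49, 50, 51], [54, 55, 56]
high_var_a, high_var_b = [40, 50, 60], [45, 55, 65]

t_low, df_low = unpaired_t(low_var_a, low_var_b)
t_high, df_high = unpaired_t(high_var_a, high_var_b)
print(f"low variability:  t = {t_low:.2f} (df = {df_low})")
print(f"high variability: t = {t_high:.2f} (df = {df_high})")
```

Both t-values are negative because the first mean is smaller than the second, and the low-variability case gives the t-value furthest from 0, which is exactly what the three bell-curve situations described above illustrate.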

Though the males are bigger than the females, the difference between the 2 genders does not reach significance (p=0.1045). The variances of the 2 groups are not significantly different (p=0.8870), hence the second assumption for parametric tests is met.

Paired t-test

Now let's try a Paired t-test. As we mentioned before, the idea behind the paired t-test is to look at a difference between 2 paired individuals or 2 measures for the same individual. For the test to be significant, the difference must be different from 0.

Example (File: height husband wife.xlsx)

[Graph: height (cm) of husbands and wives]

From the graph above, we can conclude that although husbands are taller than wives, this difference does not seem significant. Before running the paired t-test to get a p-value, we are going to check that the assumptions for parametric stats are met. The box plots below seem to indicate that there is no significant departure from normality and this is confirmed by the D'Agostino & Pearson test.

[Box plots: height of husbands and wives]

[Table: normality test results]

Husbands are significantly taller than their wives (p<0.0001). On average, husbands are taller than their wives, and the confidence interval of the mean difference does not include 0, hence the significance.

The paired t-test turns out to be highly significant (see Table above). So, how come the graph and the test tell us different things? The problem is that we don't really want to compare the mean height of the wives to the mean height of the husbands: we want to look at the difference pair-wise. In other words, we want to know if, on average, a given wife is taller or smaller than her husband. So we are interested in the mean difference between husband and wife.

Unfortunately, one of the down sides of GraphPad is that you cannot manipulate the data: for instance, there is no equivalent of Excel's functions, thanks to which one can apply formulas to combine several values. In our case, we want to calculate and plot the difference in height between a husband and his wife. So no choice: we have to do it in Excel, and then we can copy and paste it back into GraphPad after having created a new data table.

The graph representing the difference is displayed below, and one can see that the confidence interval does not include 0, meaning that the difference is likely to be significantly different from 0, which we already know from the paired t-test.
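The pair-wise logic described above can be sketched in Python: a paired t-test is nothing more than a one-sample t-test run on the column of differences. The heights below are hypothetical (they do not come from the husband and wife file) and the function names are made up for illustration. Only the t-value and the degrees of freedom are computed, not the p-value.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t, df) for a one-sample t-test against the value mu0."""
    n = len(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / sem, n - 1


def paired_t(first, second):
    """Paired t-test: a one-sample t-test on the pair-wise differences."""
    differences = [a - b for a, b in zip(first, second)]
    return one_sample_t(differences, mu0=0.0)


# Hypothetical heights (cm) for 5 couples; NOT the data from the file.
husbands = [180, 175, 185, 170, 178]
wives = [168, 165, 172, 160, 166]

differences = [h - w for h, w in zip(husbands, wives)]
t_paired, df_paired = paired_t(husbands, wives)
print("differences:", differences)
print(f"paired t = {t_paired:.2f} (df = {df_paired})")
```

Because every husband in this made-up sample is about 10 to 13 cm taller than his own wife, the differences vary very little and the t-value is large, even though the 2 columns of heights overlap when plotted side by side.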

[Graph: mean difference in height between husbands and wives, with its confidence interval]

Now try to run a One Sample t-test, which you will find under Column Analysis > Column Statistics. You will get the same values as with the paired t-test.

You will have noticed that GraphPad does not run a test for the equality of variances in the paired t-test; this is because it is actually looking at only one sample: the difference between the husbands and the wives.

3-4 Comparison of more than 2 means: Analysis of variance

A bit of theory

When we want to compare more than 2 means (i.e. more than 2 groups), we cannot run several t-tests because it increases the familywise error rate, which is the error rate across tests conducted on the same experimental data.

Example: if you want to compare 3 groups (1, 2 and 3) and you carry out 3 t-tests (groups 1-2, 1-3 and 2-3), each with an arbitrary 5% level of significance, the probability of not making a type I error in any one test is 95% (= 1 - 0.05). Treating the 3 tests as independent, you can multiply the probabilities, so the overall probability of no type I errors is: 0.95 * 0.95 * 0.95 = 0.857. This means that the probability of making at least one type I error (to say that there is a difference whereas there is not) is 1 - 0.857 = 0.143 or 14.3%. So the probability has increased from 5% to 14.3%. If you compare 5 groups instead of 3, there are 10 pairwise comparisons and the familywise error rate is 40% (= 1 - (0.95)^n, with n = 10 comparisons).

To overcome the problem of multiple comparisons, you need to run an Analysis of variance (ANOVA), which is an extension of the 2-group comparison of a t-test but with a slightly different logic. If you want to compare 5 means, for example, you can compare each mean with another, which gives you 10 possible 2-group comparisons, which is quite complicated! So the logic of the t-test cannot be directly transferred to the analysis of variance. Instead the ANOVA compares variances: if the variance amongst the 5 means is greater than the random error variance (due to individual variability for instance), then the means must be more spread out than would be expected by chance.

The statistic for ANOVA is the F ratio:

$$F = \frac{\text{variance among sample means}}{\text{variance within samples (= random, individual variability)}}$$

also:

$$F = \frac{\text{variation explained by the model (systematic)}}{\text{variation explained by unsystematic factors}}$$

If the variance amongst sample means is greater than the error variance, then F>1. In an ANOVA, you test whether F is significantly higher than 1 or not.

Imagine you have a dataset of 78 data points and you make the hypothesis that these points in fact belong to 5 different groups (this is your hypothetical model). So you arrange your data into 5 groups and you run an ANOVA. You get the table below.

Source of variation    Sum of Squares    df    Mean Square    F       p-value
Between Groups         2.665             4     0.6663         8.42    <0.0001
Within Groups          5.775             73    0.0791
Total                  8.440             77

Typical example of an analysis of variance table

Let's go through the figures in the table. First the bottom row of the table:

$$\text{Total sum of squares} = \sum_i (x_i - \text{Grand mean})^2$$

In our case, Total SS = 8.440. If you were to plot your data to represent the total SS, you would produce the graph below.

[Graph: all the data points plotted around the grand mean]

So the total SS is the sum of the squared differences between each data point and the grand mean. This is a quantification of the overall variability in your data. The next step is to partition this variability: how much variability between groups (explained by the model) and how much variability within groups (random/individual variability)?
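Before moving on to the partition of the variability, the familywise error rate arithmetic from the start of this section can be reproduced with a short Python sketch. The function names are made up for illustration, and the calculation makes the same simplifying assumption as the text: the tests are treated as independent.

```python
from math import comb


def pairwise_comparisons(n_groups):
    """Number of possible 2-group comparisons among n_groups groups."""
    return comb(n_groups, 2)


def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one type I error across n_tests tests,
    assuming the tests are independent (a simplification)."""
    return 1 - (1 - alpha) ** n_tests


for n_groups in (3, 5):
    n_tests = pairwise_comparisons(n_groups)
    rate = familywise_error(n_tests)
    print(f"{n_groups} groups: {n_tests} t-tests, "
          f"familywise error rate = {rate:.3f}")
```

With 3 groups this gives the 14.3% worked out above, and with 5 groups (10 comparisons) it gives about 40%.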

According to your hypothesis, your data can be split into 5 groups because, for instance, the data come from 5 cell types, like in the graph below.

[Graph: the data points split into 5 cell types, each with its group mean]

So you work out the mean for each cell type and you work out the squared differences between each of the means and the grand mean, weighted by the group sizes:

$$\text{Between groups sum of squares} = \sum_i n_i (\text{Mean}_i - \text{Grand mean})^2$$

In our example (second row of the table): Between groups SS = 2.665 and, since we have 5 groups, there are 5 - 1 = 4 df, so the mean SS = 2.665/4 = 0.6663. If you remember the formula of the variance (= SS/(N - 1), with df = N - 1), you can see that this value quantifies the variability between the group means: it is the between-group variance.

[Graphs: between-group variability and within-group variability]

There is one row left in the table, the within-groups variability. It is the variability within each of the five groups, so it corresponds to the difference between each data point and its respective group mean:

$$\text{Within groups sum of squares} = \sum_i (x_i - \text{Mean}_i)^2$$

which in our case is equal to 5.775. This value can also be obtained by doing 8.440 - 2.665 = 5.775, which is logical since it is the amount of variability left from the total variability after the variability explained by your model has been removed.

In our example, the 5 groups are not all of the same size, so the within-groups df is the sum of (n_i - 1) over the 5 groups, which is N - 5 = 78 - 5 = 73. So the mean within-groups SS = 5.775/73 = 0.0791. This quantifies the remaining variability, the one not explained by the model: the individual variability between each value and the mean of the group to which it belongs according to your hypothesis.

At this point, you can see that the amount of variability explained by your model (0.6663) is far higher than the remaining one (0.0791).
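The partition of the total variability described above can be checked on a toy dataset with a small Python sketch. The 3 groups below are hypothetical (they are not the 78 data points of the example) and the function name `anova_partition` is made up for illustration. The sketch only computes the sums of squares, the mean squares and the F ratio, not the p-value.

```python
import statistics


def anova_partition(groups):
    """Partition the total sum of squares of a one-way design."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)

    # Total SS: each data point against the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Between SS: each group mean against the grand mean, weighted by n_i
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Within SS: each data point against its own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total, "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical data: 3 groups of 3 values each.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = anova_partition(toy_groups)
for name, value in result.items():
    print(f"{name:>10}: {value:.4f}")
```

Whatever data you feed in, the between-groups and within-groups sums of squares add up to the total, which is why the within-groups SS in the table can be obtained by subtraction.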

So, you can work out the F-ratio: F = 0.6663 / 0.0791 = 8.42 (approximately).

The level of significance of the test is calculated by taking into account the F ratio and the number of df (degrees of freedom) for the numerator and the denominator. In our example, p<0.0001, so the test is highly significant and you are more than 99% confident when you say that there is a difference between the group means.

Let's do it in more detail. We want to find out if there is a significant difference in terms of protein expression between 5 cell types.

Example (File: protein expression.xlsx):

[Graph: box plots of protein expression for cell groups A to E]

First we need to see whether the data meet the assumptions for a parametric approach. Well, it does not look good: 2 out of 5 groups (C and D) show a significant departure from normality (see Table below). As for the homogeneity of variance, even before testing it, a look at the box plots (see Graph above) tells us that there is no way the second assumption is met. The data from groups C and D are quite skewed and a look at the raw data shows more than a 10-fold jump between values of the same group (e.g. in group A, the value on line 4 is 0.17 and the value on line 10 is 2.09).

[Table: normality test results for cell groups A to E]

A good idea would be to log-transform the data so that the spread is more balanced, and to check the assumptions again. To do so, you go to Analyze > Transform > Transform and you choose Y=Log(Y); you then re-run the analysis.
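To see why a log transformation can restore equal variances, here is a minimal Python sketch. The 2 groups are hypothetical and deliberately built so that the spread grows with the mean (the second group is simply 10 times the first); the base-10 logarithm is used here as an assumption about what Y=Log(Y) does.

```python
import math
import statistics

# Hypothetical groups: the second is 10 times the first, so its
# standard deviation is also 10 times larger on the raw scale.
raw_a = [1.0, 2.0, 4.0]
raw_b = [10.0, 20.0, 40.0]

# Base-10 log transformation of every value
log_a = [math.log10(y) for y in raw_a]
log_b = [math.log10(y) for y in raw_b]

print(f"raw SDs: {statistics.stdev(raw_a):.3f} vs {statistics.stdev(raw_b):.3f}")
print(f"log SDs: {statistics.stdev(log_a):.3f} vs {statistics.stdev(log_b):.3f}")
```

On the log scale, multiplying by 10 becomes adding 1, so the 2 groups end up with the same spread. Real data will not be this tidy, but the same mechanism explains why the box plots look more balanced after the transformation.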

OK, the situation is getting better: the first assumption is met and, from what we see when we plot the transformed data (box plots and scatter plots below), the homogeneity of variance has improved a great deal.

[Graph: box plots of Protein expression (Log) for cell groups A to E]

[Graph: scatter plots of Protein expression (Log) for cell groups A to E]

Now that we have sorted out the data, we can run the ANOVA: to do so you go to Analyze > One-way ANOVA.

The next thing you need to do is to choose a post-hoc test. These post-hoc tests should only be used when the ANOVA finds a significant effect. GraphPad is not very powerful when it comes to post-hoc tests as it offers only 2 tests: the Bonferroni test, which is quite conservative so you should only choose it when you are comparing no more than 5 groups, and the Tukey test, which is more liberal.

[Table: one-way ANOVA results, showing the overall difference between the groups and the test for homogeneity of variance]
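To give an idea of why the Bonferroni test is described as conservative, here is a minimal Python sketch of the general Bonferroni correction: the significance threshold is simply divided by the number of comparisons. This illustrates the principle only, not GraphPad's own implementation; the p-values and the function name are hypothetical.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical uncorrected p-values for 3 of the pairwise comparisons.
p_values = {"A vs B": 0.020, "A vs C": 0.0004, "A vs D": 0.004}

# With 5 groups there are 10 possible pairwise comparisons.
n_comparisons = 10
threshold = bonferroni_threshold(0.05, n_comparisons)

significant = {pair: p < threshold for pair, p in p_values.items()}
print(f"threshold per comparison: {threshold}")
for pair, is_significant in significant.items():
    print(f"{pair}: p = {p_values[pair]} -> "
          f"{'significant' if is_significant else 'not significant'}")
```

In this made-up example, a p-value of 0.02 would pass the usual 5% threshold but fails the corrected one, and the threshold gets stricter with every extra group you compare.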

There is an overall significant difference between the means but, even if you have an indication from the graph, you cannot tell which mean is significantly different from which. This is because the ANOVA is an omnibus test: it tells you that there is (or not) an overall difference between your means but not exactly which means are significantly different from which other ones. This is why you apply post-hoc tests. Post-hoc tests can be thought of as t-tests but with a more stringent approach: a lower significance threshold to correct for the familywise error rate. From the table above you can find out which pairwise comparisons reach significance and which do not.

One of the problems with GraphPad is that, for post-hoc tests, it does not report the exact p-values, which journals ask for more and more often. And even for you, it is important to know the exact p-values: for example A vs. D is significant, but only just, judging by the 95% CI, as 0 lies very close to one of its limits. Same thing for A vs. B: this time the test does not reach significance but again it must be quite close, judging by the CI.

You can report the significance as in the graph below.

[Graph: Log(Protein expression) for cell groups A to E, with significance stars (* and **) above the significant comparisons]

3-5 Correlation

If you want to find out about the relationship between 2 variables, you can run a correlation.

Example (File: roe deer.xlsx)

When you want to plot data from 2 quantitative variables between which you suspect (hope?) that there is a relationship, the best choice for a first look at your data is the scatter plot. So in GraphPad, you choose an XY table. In our case we want to know if there is a relationship between the body mass and the parasite burden.

[Graph: scatter plot of roe deer body mass against parasite burden, for males and females]

You have to choose between the x- and the y-axis for your 2 variables. It is usually considered that x predicts y (y=f(x)), so when looking at the relationship between 2 variables, you must have an idea of which one is likely to predict the other one. In our particular case, we want to know how an increase in parasite burden affects the body mass of the host.

By looking at the graph, one can think that something is happening here. Now, if you want to know if the relationship between your 2 variables is significant, you need to run a correlation test.

A bit of theory: Correlation coefficient

A correlation is a measure of a linear relationship (one that can be expressed as a straight-line graph) between variables. The simplest way to find out whether 2 variables are associated is to look at whether they covary. To do so, you combine the deviations from the mean of one variable with those of the other: this is the covariance. A positive covariance indicates that as one variable deviates from the mean, the other one deviates in the same direction; in other words, if one variable goes up the other one goes up as well.

The problem with the covariance is that its value depends upon the scale of measurement used, so you won't be able to compare covariances between datasets unless both datasets are measured in the same units. To standardise the covariance, it is divided by the product of the SDs of the 2 variables. It gives you the most widely-used correlation coefficient: the Pearson product-moment correlation coefficient r.

$$r = \frac{\text{cov}(x, y)}{s_x s_y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$

Of course, you don't need to remember that formula but it is important that you understand what the correlation coefficient does: it measures the magnitude and the direction of the relationship between two variables. It is designed to range in value between -1.0 and +1.0.
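The step from covariance to correlation coefficient can be sketched in Python. The data are hypothetical (they are not the roe deer data) and the function names are made up for illustration. The sketch shows that the covariance changes when you change the units of measurement whereas r does not.

```python
import statistics


def covariance(xs, ys):
    """Sample covariance of 2 variables measured on the same individuals."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (len(xs) - 1)


def pearson_r(xs, ys):
    """Covariance standardised by the product of the 2 SDs."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical data: parasite burden and body mass of 5 animals.
burden = [1, 2, 3, 4, 5]
mass_kg = [28, 26, 25, 22, 20]
mass_g = [m * 1000 for m in mass_kg]  # same animals, different units

cov_kg, cov_g = covariance(burden, mass_kg), covariance(burden, mass_g)
r_kg, r_g = pearson_r(burden, mass_kg), pearson_r(burden, mass_g)
print(f"covariance: {cov_kg:.1f} (kg) vs {cov_g:.1f} (g)")
print(f"r:          {r_kg:.4f} (kg) vs {r_g:.4f} (g)")
```

The negative sign of r tells you the direction of the relationship (body mass goes down when the burden goes up) and its absolute value tells you the strength.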

The 2 variables do not have to be measured in the same units but they have to be proportional (meaning linearly related).

One last thing before we go back to our example: the coefficient of determination r2. It gives you the proportion of variance in Y that can be explained by X, as a percentage.

One way to run a correlation with GraphPad is simply to click on the little icon that represents a regression line in the Analysis window, but before that don't forget that you need to check the normality of your data. In our case, we are good: the D'Agostino and Pearson tests show no significant departure from normality for the males or for the females (p=0.5084 for the females).

If you look into the results section, you will find that there is a strong negative relationship (for the males) and a weak one (for the females) between the 2 variables, the body mass decreasing when the parasite burden increases (negative slopes).

For the males the equation has the form: Body Mass = 30.2 + slope * Parasite Burden, where the slope is negative. It tells you that each time the parasite burden increases by 1 unit, the body mass decreases by the value of the slope, and that a male roe deer with a parasite burden of 0 would be predicted to weigh 30.2 kg (the intercept). A coefficient of determination r2 = 0.56 means that 56% of the variability observed in the body mass can be explained by the parasite burden alone.

The relationship between body mass and parasite burden is significant for males (p=0.0049) but not for females (p=0.2940).

You may want to test whether there is a significant difference in the strength of the correlation between males and females. Some packages like SPSS allow you to run an ANCOVA, which is a cross between the correlation and the ANOVA. It tests together the difference in body mass between males and females, the strength of the relationship between the body mass and the parasite burden, and finally the interaction between parasite burden and gender, i.e. the difference between males and females in the relationship between body mass and parasite burden. You cannot run this analysis with GraphPad Prism.

However, you can test whether the 2 slopes are significantly different. When you click on the regression line, you can choose to compare the slopes and the intercepts. GraphPad then answers the question "Are the slopes equal?" with an F test (here with DFn=1 and DFd=22) and its p-value.

A key thing to remember when working with correlations is never to assume that a correlation means that a change in one variable causes a change in another. Sales of personal computers and athletic shoes have both risen strongly in the last several years and there is a high correlation between them, but you cannot assume that buying computers causes people to buy athletic shoes (or vice versa).
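To make the regression output discussed for the roe deer a bit more concrete, here is a minimal Python sketch of how the slope, the intercept and the coefficient of determination r2 of a straight line can be obtained by least squares. The data are hypothetical (they are not the roe deer data) and the function name `linear_fit` is made up for illustration; no p-value is computed.

```python
import statistics


def linear_fit(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x      # predicted y when x = 0
    r_squared = s_xy ** 2 / (s_xx * s_yy)    # share of variance in y explained by x
    return slope, intercept, r_squared


# Hypothetical data: parasite burden (x) and body mass in kg (y).
burden = [0, 1, 2, 3, 4]
body_mass = [30, 27, 26, 22, 20]

slope, intercept, r_squared = linear_fit(burden, body_mass)
print(f"Body Mass = {intercept:.1f} + ({slope:.1f}) * Parasite Burden")
print(f"r2 = {r_squared:.3f}")
```

The slope is the change in body mass for each extra unit of parasite burden, the intercept is the predicted body mass at a burden of 0, and r2 is the proportion of the variability in body mass that the straight line accounts for. As stressed above, a good fit says nothing about causation.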
