

Department of Biochemistry and Microbiology
BMS 617
Lecture 7: Non-normality and outliers

Normally distributed data
- Many of the statistical tests we will study rely on the assumption that the data were sampled from a normal distribution
- How reasonable is this assumption?
- The normal distribution is an ideal distribution that likely never exists in reality
  - Includes arbitrarily large values and arbitrarily small (negative) values
- However, simulations show that most tests that rely on the assumption of normality are robust to deviations from the normal distribution

The ideal normal distribution

Samples from a normal distribution
- Image shows data sampled from a theoretical normal distribution
- Uses a very large sample size
- Close approximation to theoretical distribution

Tests for normality
- It is possible to perform tests to see if the sample data are consistent with the assumption that they were sampled from a normal distribution
- Unfortunately, this is not what we really want to know
- Would really like to know if the distribution is close enough to normal for the test we use to be useful

Tests for normality
- A test for normality is a statistical test for which the null hypothesis is "The data were sampled from a normal distribution"
- Common normality tests include
  - D'Agostino-Pearson omnibus K2 normality test
  - Shapiro-Wilk test
  - Kolmogorov-Smirnov test
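The three normality tests listed above are all available in common statistics software. As a rough sketch only, here is how they might be run in Python with SciPy. The lecture itself uses GraphPad, so the choice of Python, the helper name `normality_pvalues`, the seed, and the simulated samples are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def normality_pvalues(sample):
    """Return p-values from three common normality tests for a 1-D sample."""
    sample = np.asarray(sample, dtype=float)
    # D'Agostino-Pearson omnibus K2 test (combines skewness and kurtosis).
    _, p_k2 = stats.normaltest(sample)
    # Shapiro-Wilk test.
    _, p_sw = stats.shapiro(sample)
    # Kolmogorov-Smirnov test against a normal distribution whose mean and SD
    # are estimated from the sample (this makes the p-value only approximate).
    _, p_ks = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
    return {
        "dagostino_pearson": p_k2,
        "shapiro_wilk": p_sw,
        "kolmogorov_smirnov": p_ks,
    }


# Simulated data (not from the lecture); the seed is arbitrary.
rng = np.random.default_rng(617)
normal_sample = rng.normal(loc=10.0, scale=2.0, size=200)
skewed_sample = rng.exponential(scale=2.0, size=200)

print("normal sample:", normality_pvalues(normal_sample))
print("skewed sample:", normality_pvalues(skewed_sample))
```

In each case the null hypothesis is that the data were sampled from a normal distribution, so a small p-value is evidence against normality and a large one is only an absence of such evidence.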

D'Agostino-Pearson omnibus K2 normality test
- The D'Agostino-Pearson omnibus K2 normality test works by computing two values for the data set:
  - The skewness, which measures how far the data are from being symmetric
  - The kurtosis, which measures how sharply peaked the data are
- The test then combines these into a single value that describes how far from normal the data appear to lie
- Computes a p-value for this combined value

Problem with normality tests
- If the p-value for a normality test is small, the interpretation is:
  - If the data were sampled from an ideal normal distribution, it is unlikely the sample would be this skewed and/or kurtotic
- If the p-value for a normality test is large, then the data are not inconsistent with being sampled from a normal distribution
- However
  - If the sample size is large, it is possible to get a small p-value even for small deviations from the normal distribution
    - Data are likely sampled from a distribution that is close to, but not exactly, normal
  - If the sample size is small, it is possible to get a large p-value even if the underlying distribution is far from normal
    - Data do not provide sufficient evidence to reject the null hypothesis
- Useful to examine the values for skewness and kurtosis as well as the p-value

Skewness and kurtosis

Interpreting skewness and kurtosis
- The real question we would like to answer is "How much skewness and kurtosis are acceptable?"
  - Difficult to answer
- In general, interpret a skewness between -0.5 and 0.5 as being approximately symmetric
  - Between -1.0 and -0.5, or between 0.5 and 1.0, is moderately skewed
  - Less than -1.0 or more than 1.0 is highly skewed
- For kurtosis, values between -2 and 2 are generally accepted as being within limits
  - Outside this range is evidence the distribution is far from normal

What to do if the data fail a test for normality
- If the data fail a test for normality, the following options are available
  - Can the data be transformed to data that come from a normal distribution?
    - For example, if the data are positively skewed, transforming to logs may give normally distributed data
  - Are there a small number of outliers that are causing the data to fail a normality test?
    - Next section discusses outliers
  - Is the departure from normality small?
    - I.e., are the skewness and kurtosis small? If so, your statistical tests may still be accurate enough
  - Use a test that does not assume a normal distribution (a nonparametric test)

Nonparametric tests
- The most common statistical tests assume the data are sampled from a normal distribution
  - T-tests, ANOVA, Pearson correlation, etc.
- Some other tests do not make this assumption
  - Mann-Whitney test, Kruskal-Wallis test, Spearman correlation, etc.
- However, these tests have (much) lower statistical power than their parametric equivalents when the data are normally distributed
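The rules of thumb for skewness and kurtosis, and the idea of log-transforming skewed data, can be illustrated with a short Python sketch. It assumes the kurtosis on the slides is excess kurtosis (a normal distribution scores 0), which the slides do not state. The function names and the simulated lognormal data are made up for illustration.

```python
import numpy as np
from scipy import stats


def describe_skewness(skew):
    """Classify a skewness value using the rule of thumb from the slides."""
    if abs(skew) <= 0.5:
        return "approximately symmetric"
    if abs(skew) <= 1.0:
        return "moderately skewed"
    return "highly skewed"


def kurtosis_within_limits(excess_kurtosis):
    """Apply the rule that kurtosis between -2 and 2 is within limits.

    Assumes excess kurtosis, i.e. a normal distribution has kurtosis 0.
    """
    return -2.0 <= excess_kurtosis <= 2.0


# Simulated, positively skewed data (not from the lecture); arbitrary seed.
rng = np.random.default_rng(7)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=500)
logged = np.log(raw)  # the log of lognormal data is normally distributed

for name, data in [("raw", raw), ("log-transformed", logged)]:
    skew = stats.skew(data)
    kurt = stats.kurtosis(data)  # Fisher definition: normal distribution gives 0
    print(
        f"{name}: skewness={skew:.2f} ({describe_skewness(skew)}), "
        f"kurtosis={kurt:.2f}, within limits={kurtosis_within_limits(kurt)}"
    )
```

The raw values should come out as highly skewed, while the log-transformed values should look approximately symmetric, which is the situation in which a transformation is worth considering.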

Choosing nonparametric tests
- When running a series of similar experiments, all data should be analyzed the same way
  - Use normality tests to choose the statistical test for all experiments together
  - Following common practice is acceptable
  - Ideally, run one experiment just to determine whether the data look like they come from a normal distribution
- For small data sets
  - A test for normality does not tell you much
    - Not likely to get a small p-value anyway
  - Violations of the normality assumption are more egregious
  - Nonparametric tests have very low statistical power

The Mann-Whitney Test
- The Mann-Whitney test is the nonparametric equivalent of the unpaired t-test
- Use when you want to compare a variable between two groups, but you have reason to believe the data are not sampled from a normally distributed population

How the Mann-Whitney Test works
- The Mann-Whitney test works as follows:
  - Compute the rank for all values, regardless of which group they come from
    - Smallest value has a rank of 1, next smallest has a rank of 2, etc.
  - Choose one group: for each data point in that group, count the number of data points in the other group which are smaller
  - Sum these values, and call the sum U_1
  - Similarly compute U_2, or use the fact that U_1 + U_2 = n_1 n_2
  - Let U = min(U_1, U_2)
  - The distribution of U under the null hypothesis is known, so software can compute a p-value

Pros and cons of nonparametric tests
- Pros of nonparametric tests:
  - Since nonparametric tests do not rely on the assumption of normally distributed populations, they can be used when that assumption fails, or cannot be verified
- Cons of nonparametric tests:
  - If the data really do come from normally distributed populations, the nonparametric tests are less powerful than their parametric counterparts
    - i.e. they will give higher p-values
    - For small sample sizes, they are much less powerful: two-sided Mann-Whitney p-values are always greater than 0.05 if the total sample size (both groups combined) is 7 or fewer
  - Nonparametric tests typically do not compute confidence intervals
    - Can sometimes be computed, but often require additional assumptions
  - Nonparametric tests are not related to regression models
    - Cannot be extended to account for confounding variables using multiple regression techniques

Choosing between parametric and nonparametric tests
- The choice between parametric and nonparametric tests is not straightforward
- A common, but invalid, approach is to use normality tests to automate the choice
  - The choice is most important for small data sets, for which normality tests are of limited use
  - Using the data set to determine the statistical analysis will underestimate p-values
  - If data fail normality tests, a transformation may be appropriate
- The most "honest" approach is to perform an independent experiment with a large sample to test for normality, and then design the experiment in hand based on the results
  - This is almost always impractical
- For well-used experimental designs, an almost-equivalent approach is to follow customary procedure
  - Essentially assuming this has been carried out in some way already

How much difference does it make?
- The central limit theorem ensures that parametric tests work well with non-normal distributions if the sample is large enough
  - How large is large enough?
  - Depends on the distribution!
  - For most distributions, sample sizes in the range of dozens will remove any issues with normality
  - You will still increase your statistical power by using a transformation if appropriate
- Conversely, if the data really come from a normally distributed population and you choose a nonparametric test, you will lose statistical power
  - For large samples, however, the difference is minimal
- Small samples present problems:
  - Nonparametric tests have very little power for small samples
  - Parametric tests can give misleading results for small samples if the population data are non-normal
  - Tests for normality are not helpful for small samples
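The counting procedure for the Mann-Whitney test described earlier in this part can be written out directly. The Python sketch below is illustrative only: the measurements are invented, counting ties as one half is a common convention the slides do not mention, and the helper names are hypothetical. It also checks the small-sample limit: with complete separation and no ties, the smallest exact two-sided p-value is 2 / C(n_1 + n_2, n_1).

```python
from math import comb

from scipy import stats


def mann_whitney_u(group_a, group_b):
    """Compute U_1, U_2 and U = min(U_1, U_2) by direct counting.

    For each value in group_a, count the values in group_b that are smaller.
    Ties count as one half (an assumed convention, not from the slides).
    """
    u1 = sum(
        1.0 if b < a else (0.5 if b == a else 0.0)
        for a in group_a
        for b in group_b
    )
    u2 = len(group_a) * len(group_b) - u1  # uses U_1 + U_2 = n_1 * n_2
    return u1, u2, min(u1, u2)


def smallest_two_sided_p(n1, n2):
    """Smallest exact two-sided p-value possible for group sizes n1 and n2."""
    return min(1.0, 2.0 / comb(n1 + n2, n1))


# Hypothetical measurements, made up for illustration only.
control = [4.1, 5.0, 5.6, 6.2]
treated = [6.8, 7.4, 7.9, 8.3]

u1, u2, u = mann_whitney_u(control, treated)
result = stats.mannwhitneyu(control, treated, alternative="two-sided", method="exact")
print("U_1, U_2, U:", u1, u2, u)
print("exact two-sided p-value:", result.pvalue)

# Total sample size 7 (3 + 4) versus 8 (4 + 4).
print(smallest_two_sided_p(3, 4), smallest_two_sided_p(4, 4))
```

With 3 and 4 values the smallest achievable p-value is 2/35, just above 0.05, whereas with 4 and 4 it is 2/70, which is why very small experiments cannot reach significance with this test.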

Conclusions
- The bottom-line conclusion is that large samples are better than small samples
  - In general, the larger the better
  - Of course, it can be time-consuming and/or expensive to analyze large samples
- If your experimental design is going to use a small sample, you need to be able to justify that the data come from a normally distributed population
  - If this is a common experimental design that is conventionally analyzed this way, that may be good enough
  - For a new methodology, you should really perform an independent experiment with a large sample to test for normality first
    - Use the results of this to guide the data analysis for future experiments

Computationally-intensive nonparametric methods
- The nonparametric methods we examined worked by analyzing the ranks of the data
- Another class of nonparametric tests is the class of computationally-intensive methods
- There are two subclasses:
  - Permutation or randomization tests:
    - Simulate the null distribution by repeatedly randomly reassigning group labels
    - Compare the "real" data to the generated null distribution
  - Bootstrapping techniques:
    - Effectively generate many samples from the population by resampling from the original sample
    - Look at the distribution of summary data from the generated samples
- These techniques still require a reasonable sample size to begin with
  - Big enough to generate enough distinct permutations or bootstraps

Outliers
- Outliers are values in the data that are far from the other values
- Occur for several reasons:
  - Invalid data entry
  - Experimental mistakes
  - Random chance
    - In any distribution, some values are far from the others
    - In a normal distribution, these values are rarer, but still exist
  - Biological diversity
    - If your samples are from patient or animal samples, the outlier may be correct and due to biological diversity
    - May be an interesting finding!
  - Wrong assumptions
    - For example, in a lognormal distribution, some values are far from the others

Why test for outliers
- Presence of erroneous outliers, or assuming the wrong distribution, can introduce spurious results or mask real results
- Trying to detect outliers without a test can be problematic
  - We tend to want to observe patterns in data
  - Anything that appears to be counter to these patterns seems to be an outlier
  - We tend to see too many outliers

Before testing for outliers
- Before testing for outliers:
  - Check the data entry
    - Errors here can often be fixed
  - Were there problems with the experiment?
    - If errors were observed during the experiment, remove data associated with those errors
    - Many experimental protocols have quality control measures
  - Is it possible your data are not normally distributed?
    - Most outlier tests assume the (non-outlier) data are normally distributed
  - Was there anything different about any of the samples?
    - Was one of the mice phenotypically different, etc.?

Outlier tests
- After addressing the concerns on the previous slide, if you still suspect an outlier you can run an outlier test
- Outlier tests answer the following question:
  - If the data were sampled from a normal distribution, what is the chance of observing one value as far from the others as is in the observed data?
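The two computationally-intensive approaches described earlier in this part, permutation tests and bootstrapping, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended analysis. The difference in means as test statistic, the percentile interval, the function names, and the data are all choices made here rather than taken from the lecture.

```python
import numpy as np


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # randomly reassign group labels
        diff = abs(shuffled[: a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (count + 1) / (n_permutations + 1)


def bootstrap_mean_ci(sample, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample, dtype=float)
    # Resample with replacement; each row is one bootstrap sample.
    resamples = rng.choice(x, size=(n_resamples, x.size), replace=True)
    means = resamples.mean(axis=1)
    alpha = 1.0 - level
    low, high = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return low, high


# Hypothetical measurements, made up for illustration only.
group_a = [1, 2, 3, 4, 5, 6, 7, 8]
group_b = [11, 12, 13, 14, 15, 16, 17, 18]

print("permutation p-value:", permutation_test(group_a, group_b))
print("bootstrap 95% CI for mean of group_a:", bootstrap_mean_ci(group_a))
```

Both functions depend on the original sample being reasonably large: with only a handful of values there are too few distinct relabelings or resamples for the simulated distribution to be informative.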

Results of an outlier test
- If an outlier test results in a small p-value, then the conclusion is that the outlying value is (probably) not from the same distribution as the other values
  - Justifies excluding it from the analysis
- If the outlier test results in a high p-value, there is no evidence the value came from a different distribution
  - Doesn't prove it did come from the same distribution, just that there is no strong evidence to the contrary

Guidelines on removing outliers
- If you address all the previous concerns, and an outlier test gives strong evidence of an outlier, then it is legitimate to remove it from the analysis
- The rules for eliminating outliers should be established before you generate the data
- You should report the number of outliers removed and the rationale for doing so in any publication using the data

How outlier tests work
- Outlier tests work by computing the difference between the extreme value and some measure of central tendency
- That value is typically divided by a measure of the variability
- Resulting ratio is compared with a table or expected distribution of those values

Grubbs' outlier test
- Grubbs' outlier test calculates the difference between the extreme value and the mean of all values (including the extreme value), and divides by the standard deviation
- Resulting value is then compared to a table of critical values
  - Critical value depends on the sample size
- If the value is larger than the critical value, then the extreme value can be considered an outlier

Demo
- We'll experiment with the GRHL2 Basal-A and Basal-B data sets in GraphPad, checking for outliers and testing for normality.
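For readers without GraphPad, the Grubbs' calculation described above can be sketched in Python. The slides refer to a table of critical values. The sketch instead uses the standard closed-form expression based on the t distribution, which is not given in the lecture. The function name, the 0.05 significance level, and the data are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def grubbs_test(values, alpha=0.05):
    """Two-sided Grubbs' test for a single outlier.

    Returns the most extreme value, the Grubbs statistic G, and the
    critical value for the given significance level.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("Grubbs' test needs at least 3 values")
    mean = x.mean()
    sd = x.std(ddof=1)  # sample SD, computed with the extreme value included
    idx = np.argmax(np.abs(x - mean))
    g = abs(x[idx] - mean) / sd
    # Critical value from the t distribution with n - 2 degrees of freedom
    # (standard formula; the slides use a lookup table instead).
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return x[idx], g, g_crit


# Hypothetical measurements, made up for illustration only.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 15.0]
suspect, g, g_crit = grubbs_test(data)
print(f"suspect={suspect}, G={g:.3f}, critical value={g_crit:.3f}")
print("outlier at the 0.05 level:", g > g_crit)
```

As the guidelines above stress, a result like this only justifies removal if the rule was fixed in advance and the earlier checks (data entry, experimental problems, distributional assumptions) have been made.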