# Chapter 5: Nonparametric statistics and model selection


In Chapter 2, we learned about the t-test and its variations. These were designed to compare sample means, and relied heavily on assumptions of normality. We were able to apply them to non-Gaussian populations by using the central limit theorem, but that only really works for the mean (since the central limit theorem holds for averages of samples). Sometimes we're interested in computing other sample statistics and evaluating their distributions (remember that all statistics computed from samples are random variables, since they're functions of the random samples) so that we can obtain confidence intervals for them. In other situations, we may not be able to use the central limit theorem due to small sample sizes and/or unusual distributions. In this chapter, we'll focus on techniques that don't require these assumptions. Such methods are usually called nonparametric or distribution-free. We'll first look at some statistical tests, then move to methods outside the testing framework.

## 5.1 Estimating distributions and distribution-free tests

So far, we've only used eyeballing and visual inspection to see if distributions are similar. In this section, we'll look at more quantitative approaches to this problem. Despite this, don't forget that visual inspection is usually an excellent place to start! We've seen that it's important to pay attention to the assumptions inherent in any test. The methods in this section make fewer assumptions and will help us test whether our assumptions are accurate.

### 5.1.1 Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test tests whether two arbitrary distributions are the same. It can be used to compare two empirical data distributions, or to compare one empirical data distribution to any reference distribution. It's based on comparing two cumulative distribution functions (CDFs). Remember that the CDF of a random variable x is the probability that the random variable

is less than or equal to some value. To be a bit more precise, it's a function $F$ such that $F(a) = P(x \le a)$. When talking about data, it's often useful to look at empirical CDFs: the empirical CDF of $n$ observed data points is

$$F_n(a) = \frac{1}{n} \sum_i I(x_i \le a),$$

where $I$ is a function that returns 1 when its argument is true and 0 when its argument is false.

Figure 5.1: Two Kolmogorov-Smirnov test plots (right column) with histograms of the data being tested (left column). On the top row, the empirical CDF (green) matches the test CDF (blue) closely, and the largest difference (dotted vertical red line, near 0.5) is very small. On the bottom, the empirical CDF is quite different from the test CDF, and the largest difference is much larger.

Now suppose we want to compare two CDFs, $F^1$ and $F^2$. They might be empirical CDFs (to compare two different datasets and see whether they're significantly different), or one might be a reference CDF (to see whether a particular distribution is an appropriate choice for a dataset). The Kolmogorov-Smirnov test computes the statistic $D_n$:

$$D_n = \max_x \left| F^1_n(x) - F^2_n(x) \right|$$

This compares the two CDFs and looks at the point of maximum discrepancy; see Figure 5.1 for an example. We can theoretically show that if $F^1$ is the empirical distribution of $x$ and $F^2$ is the true distribution $x$ was drawn from, then $\lim_{n \to \infty} D_n = 0$. Similarly, if the two distributions have no overlap at all, the maximum difference will be 1 (when one CDF is 1 and the other is 0). Therefore, we can test distribution equality by comparing the statistic $D_n$ to 0: if $D_n$ is significantly larger than 0 and close to 1, then we might conclude that the distributions are not equal. Notice that this method is only defined for one-dimensional random variables: although there

are extensions to multiple random variables, they are more complex than simply comparing joint CDFs. Also notice that this test is sensitive to any differences at all between two distributions: two distributions with the same mean but significantly different shapes will produce a large value of $D_n$.

### 5.1.2 Shapiro-Wilk test

The Shapiro-Wilk test tests whether a distribution is Gaussian using quantiles. It's a bit more specific than the Kolmogorov-Smirnov test, and as a result tends to be more powerful. We won't go into much detail on it in this class, but if you're interested, the Wikipedia page has more detail.

### 5.1.3 Wilcoxon's signed-rank test

The two-sample t-test we discussed in Chapter 2 requires us to use the central limit theorem to approximate the distribution of the sample mean as Gaussian. When we can't make this assumption (i.e., when the number of samples is small and/or the distributions are very skewed or have very high variance), we can use Wilcoxon's signed-rank test for matched pairs. This is a paired test that compares the medians of two distributions. For example, when comparing the incomes of two different groups (especially groups that span the socioeconomic spectrum), the distributions will likely be highly variable and highly skewed. In such a case, it might be better to use a nonparametric test like Wilcoxon's signed-rank test. In contrast to the Kolmogorov-Smirnov test earlier, this test (like its unpaired cousin the Mann-Whitney U) is only sensitive to changes in the median, and not to changes in the shape.

The null hypothesis in this test states that the median difference between the pairs is zero. We'll compute a test statistic and a corresponding p-value, which give us a sense for how likely our data are under the null hypothesis. We compute this test statistic as follows:

(1) For each pair $i$, compute the difference, and keep its absolute value $d_i$ and its sign $S_i$ (where $S_i \in \{-1, 0, +1\}$). We'll exclude pairs with $S_i = 0$.
(2) Sort the absolute values $d_i$ from smallest to largest, and rank them accordingly. Let $R_i$ be the rank of pair $i$ (for example, if the fifth pair had the third smallest absolute difference, then $R_5 = 3$).

(3) Compute the test statistic:

$$W = \sum_i S_i R_i$$

(For unmatched pairs, we can use the Mann-Whitney U test, described in the next section.)
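As a quick sketch, steps (1)-(3) can be written in a few lines of Python (illustrative only: ties in $|d_i|$ are ranked in order of appearance here, whereas real implementations such as `scipy.stats.wilcoxon` assign averaged ranks to ties; the example data are made up):

```python
import numpy as np

def signed_rank_statistic(before, after):
    """Compute Wilcoxon's signed-rank statistic W = sum_i S_i * R_i."""
    diffs = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    diffs = diffs[diffs != 0]              # (1) exclude pairs with S_i = 0
    signs = np.sign(diffs)                 # S_i in {-1, +1}
    order = np.argsort(np.abs(diffs))      # (2) rank |d_i| from smallest to largest
    ranks = np.empty(len(diffs))
    ranks[order] = np.arange(1, len(diffs) + 1)
    return float(np.sum(signs * ranks))    # (3) W

before = [10, 12, 9, 11, 14]
after  = [12, 15, 9, 13, 20]
print(signed_rank_statistic(before, after))   # 10.0: every nonzero difference is positive
```

Since all four nonzero differences are positive, $W$ is the maximum possible value $1 + 2 + 3 + 4 = 10$, hinting that the medians differ.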

$W$ has a known distribution. In fact, if $N$ is greater than about 20, it's approximately normally distributed (if not, it still has a known form). So, we can evaluate the probability of observing it under a null hypothesis and thereby obtain a significance level. Intuitively, if the median difference is 0, then half the signs should be positive and half should be negative, and the signs shouldn't be related to the ranks. If the median difference is nonzero, $|W|$ will be large (the sum will produce a large negative value or a large positive value). Notice that once we constructed the rankings and defined $R_i$, we never used the actual differences!

### 5.1.4 Mann-Whitney U test

The Mann-Whitney U test is similar to the Wilcoxon test, but can be used to compare two samples that aren't necessarily paired. The null hypothesis for this test is that the two groups have the same distribution, while the alternative hypothesis is that one group has larger (or smaller) values than the other.³ Computing the test statistic is simple:

(1) Combine all data points and rank them (largest to smallest or smallest to largest).

(2) Add up the ranks for data points in the first group; call this $R_1$. Find the number of points in the group; call it $n_1$. Compute $U_1 = R_1 - n_1(n_1 + 1)/2$. Compute $U_2$ similarly for the second group.

(3) The test statistic is defined as $U = \min(U_1, U_2)$.

As with $W$ from the Wilcoxon test, $U$ has a known distribution. If $n_1$ and $n_2$ are reasonably large, it's approximately normally distributed with mean $n_1 n_2 / 2$ under the null hypothesis. If the two medians are very different, $U$ will be close to 0, and if they're similar, $U$ will be close to $n_1 n_2 / 2$. Intuitively, here's why:

- If the values in the first sample were all bigger than the values in the second sample, then $R_1 = n_1(n_1 + 1)/2$: this is the smallest possible value for $R_1$.⁴ $U_1$ would then be 0.

- If the ranks between the two groups aren't very different, then $U_1$ will be close to $U_2$.
With a little algebra, you can show that the sum $U_1 + U_2$ will always be $n_1 n_2$. So if they're both about the same, then they'll both be near half this value, or $n_1 n_2 / 2$.

³ Under some reasonable assumptions about the distributions of the data (see the Mann-Whitney U article on Wikipedia for more details), this test can be used with a null hypothesis of equal medians and a corresponding alternative hypothesis of a significant difference in medians.

⁴ $R_1 = n_1(n_1 + 1)/2$ because in this case, the ranks for all the values from the first dataset would be 1 through $n_1$, and the sum of these values is $n_1(n_1 + 1)/2$.
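The three Mann-Whitney steps above can be sketched directly (a minimal illustration with made-up data; it ignores the tie corrections that real implementations such as `scipy.stats.mannwhitneyu` apply):

```python
import numpy as np

def mann_whitney_u(group1, group2):
    """Compute U = min(U1, U2) by ranking the pooled data."""
    pooled = np.concatenate([group1, group2])
    # (1) Rank all points, smallest to largest (rank 1 = smallest value).
    ranks = np.empty(len(pooled))
    ranks[np.argsort(pooled)] = np.arange(1, len(pooled) + 1)
    n1, n2 = len(group1), len(group2)
    # (2) Sum the ranks in each group and subtract the smallest possible sum.
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    u2 = ranks[n1:].sum() - n2 * (n2 + 1) / 2
    assert u1 + u2 == n1 * n2     # U1 + U2 always equals n1 * n2
    # (3) The test statistic is the smaller of the two.
    return min(u1, u2)

# Completely separated groups give U = 0; heavily overlapping groups
# give U near n1 * n2 / 2.
print(mann_whitney_u([1, 2, 3], [10, 11, 12]))   # 0.0
```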

## 5.2 Resampling-based methods

All the approaches we've described involve computing a test statistic from data and measuring how unlikely our data are based on the distribution of that statistic. If we don't know enough about the distribution of our test statistic, we can use the data itself to tell us about that distribution: this is exactly what resampling-based methods do. Permutation tests sample different relabelings of the data in order to give us a sense for how significant the true labeling's result is. The bootstrap creates new datasets by resampling several times from the data itself, and treats those as separate samples. The next example illustrates a real-world setting where these methods are useful.

**Example: Chicago teaching scandal**

In 2002, economists Steven Levitt and Brian Jacob investigated cheating in Chicago public schools, but not in the way you might think: they decided to investigate cheating by teachers, usually done by changing student answers after the students had taken standardized tests.ᵃ

So, how'd they do it? Using statistics! They went through test scores from thousands of classrooms in Chicago schools, and for each classroom, computed two measures:

(1) How unexpected is that classroom's performance? This was computed by looking at every student's performance the year before and the year after. If many students had an unusually high score one year that wasn't sustained the following year, then cheating was likely.

(2) How suspicious are the answer sheets? This was computed by looking at how similar the A-B-C-D patterns on different students' answer sheets were.

Unfortunately, computing measures like performance and answer sheet similarity is tricky, and results in quantities that don't have well-defined distributions! As a result, it isn't easy to determine a null distribution for these quantities, but we still want to evaluate how unexpected or suspicious they are.
To solve this problem, Levitt and Jacob used two nonparametric methods to determine appropriate null distributions as a way of justifying these measures. In particular:

- They assumed (reasonably) that most classrooms have teachers who don't cheat, so by looking at the 25th to 75th percentiles of both measures above, they could obtain a null distribution for the correlation between the two.

- In order to test whether the effects they observed were because of cheating teachers, they randomly re-assigned all the students to new, hypothetical classrooms and repeated their analysis. As a type of permutation test, this allowed them to establish a baseline level for these measures against which they could evaluate the values they actually observed.

While neither of these methods is exactly like what we'll discuss here, they're both examples of a key idea in nonparametric statistics: using the data to generate a null distribution rather than assuming any particular kind of distribution.

What'd they find? 3.4% of classrooms had teachers who cheated on at least one standardized test when the two measures above were thresholded at the 95th percentile. They also used regression with a variety of classroom demographics to determine that academically poorer classrooms were more likely to have cheating teachers, and that policies that put more weight on test scores correlated with increased teacher cheating.

ᵃ See Jacob and Levitt, "Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating." For more economic statistics, see Steven Levitt's book with Stephen Dubner, *Freakonomics*.

### 5.2.1 Permutation tests

Suppose we want to see if a particular complex statistic is significantly different between two groups. If we don't know the distribution of the statistic, then we can't assign probabilities to particular values, and so we can't say anything interesting after computing the statistic from data. The idea behind permutation tests is to use the data to generate the null distribution. The key idea is that if there weren't a significant difference in this statistic between the two groups in our data, then the statistic should look about the same for any other two arbitrarily selected groups of the same sizes. We'll make up a bunch of fake groups and see if this is the case. We'll demonstrate the idea by comparing sample means, but note that in practice sample means could be compared using a t-test, and that permutation tests are generally used for more complicated statistics whose distributions can't be deduced or approximated.

Let's call the two groups in our data A and B. Our test statistic of interest will be $w = \bar{x}_A - \bar{x}_B$: the difference in the means of the two groups. Our null hypothesis will be $H_0: \bar{x}_A = \bar{x}_B$. As before, we want to see whether this difference is significantly large or small. In order to get a sense for the null distribution of $w$, we can try relabeling the points many different times and recomputing the statistic. Suppose the original set A has $n$ data points, and B has $m$ points. We'll randomly pick $n$ points out of all $n + m$, assign those to our new group $A_i$, and assign the others to our new group $B_i$ (where the subscript $i$ indicates the iteration). We'll recompute $w_i$ based on these new groups, and repeat the process. We can then see whether our value $w$ is unusual based on the collection of $w_i$, perhaps by looking at what fraction of them are more extreme than $w$.
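Here's a minimal sketch of this relabeling procedure (the data values are made up, and in practice you'd use this for a statistic whose distribution you can't derive, not the mean):

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_pvalue(a, b, n_iter=10_000):
    """Estimate how often a random relabeling produces a difference in
    means at least as extreme as the observed one."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)                    # random relabeling of the points
        w_i = pooled[:len(a)].mean() - pooled[len(a):].mean()
        if abs(w_i) >= abs(observed):          # at least as extreme as observed
            count += 1
    return count / n_iter

a = [11.2, 12.1, 13.5, 12.8, 11.9]
b = [10.1, 9.8, 10.5, 9.4, 10.2]
print(permutation_pvalue(a, b))   # small: random relabelings rarely look this extreme
```

Since every value in `a` exceeds every value in `b`, almost no random relabeling reproduces a gap this large, so the estimated p-value is tiny.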
We could either repeat this for every possible labeling (usually called an exact test), or randomly choose a subset of labelings and evaluate only on those (usually called a Monte Carlo approximation). So, this entire procedure requires only a way of computing the statistic of interest (in this case, $w$): the rest is determined by the data.

We can apply permutations elsewhere, too! For example, if we suspect there is some kind of trend in time series data, we can permute the time indices: we fit a linear model to the observed ordering and to many permuted orderings, and then evaluate how unlikely the observed fit is relative to the permuted ones.

### 5.2.2 Bootstrap

Suppose we have some complicated statistic $y$ that we computed from our data $x$. If we want to provide a confidence interval for this statistic, we need to know its variance. When our statistic was simply $\bar{x}$, we could compute the statistic's standard deviation (i.e., the standard error of that statistic) from our estimated standard deviation using $s_x / \sqrt{n}$. But for more complicated statistics, where we don't know the distributions, how do we provide a confidence interval?

Figure 5.2: An illustration of bootstrap sampling. The top figure shows the true distribution that our data points are drawn from, and the second figure shows a histogram of the particular $N$ data points we observed. The bottom row shows various bootstrap resamplings of our data (each with $n = N$ points). Even though they were obtained from our data, they can be thought of as samples from the true distribution (top).

One approach is a method called the bootstrap. The key idea here is that we can resample points from our data, compute a statistic, and repeat several times to look at the variance across different resamplings. Recall that our original data ($N$ points) are randomly generated from some true distribution. If we randomly sample $n$ points ($n \le N$, and often $n = N$) from our data with replacement⁵, these points will also be random samples from our true distribution, as shown in Figure 5.2. So, we can compute our statistic over this smaller random sample and repeat many times, measuring the variance of the statistic across the different sample runs.

Everything we've talked about so far has been based on the idea of approximating the true distribution of observed data with samples. The bootstrap takes this a step further and samples from the samples to generate more data. A similar method, known as the jackknife, applies a similar process, but looks at $N - 1$ points taken without replacement each time instead of $n$ with replacement. Put more simply, we remove one point at a time and recompute the statistic. Notice that we've seen a similar idea before: our initial definition of Cook's distance was based on the idea of removing one point at a time. In practice, the bootstrap is more widely used than the jackknife; the jackknife also has very different theoretical properties.

⁵ This means that a single data point can be sampled more than once.
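As a sketch, here's how one might bootstrap a confidence interval for a statistic with no easy closed-form standard error, such as the median (the data are simulated; the percentile method shown here is only one of several ways to turn bootstrap resamples into an interval):

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_ci(data, statistic, n_boot=5000, alpha=0.05):
    """Percentile-method bootstrap confidence interval for `statistic`."""
    data = np.asarray(data, float)
    n = len(data)
    # Resample n points *with replacement* and recompute the statistic each time.
    stats = np.array([statistic(rng.choice(data, size=n, replace=True))
                      for _ in range(n_boot)])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

data = rng.normal(loc=10.0, scale=2.0, size=200)
lo, hi = bootstrap_ci(data, np.median)
print(f"95% bootstrap CI for the median: ({lo:.2f}, {hi:.2f})")
```

The spread of the resampled medians stands in for the unknown sampling variability of the median, which is exactly the quantity we couldn't get from a $s_x/\sqrt{n}$-style formula.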

## 5.3 Model selection

When fitting models (such as regression) to real data, we'll often have a choice to make about model complexity: a more complex model might fit the data better but be harder to interpret, while a simpler model might be more interpretable but produce a larger error. We've seen this before when looking at polynomial regression models and LASSO in Chapters 3 and 4. In this section, we'll learn how to pick the best model for some data from among several choices.

### 5.3.1 Minimum generalization error

First, we'll define "best" by how well our model generalizes to new data that we'll see in the future. In particular, we'll define it this way to avoid overfitting. How can we figure out how well our model generalizes? The most common way is to divide our data into three parts:

- The training set is what we'll use to fit the model for each possible value of the manually-set parameters.
- The validation set is what we'll use to determine those parameters that require manual settings.
- The test set is what we use to evaluate our results for reporting, and to get a sense for how well our model will do on new data in the real world.

It's critically important to properly separate the test set from the training and validation sets! At this point you may be wondering: why do we need separate test and validation sets? The answer is that we choose a model based on its validation set performance: if we really want to measure generalization error, we need to see how the model does on some new data, not data that we used to pick it. A good analogy is to think of model fitting and parameter determination as a student learning and taking a practice exam respectively, and model evaluation as that student taking an actual exam. Using the test data in any way during the training or validation process is like giving the student an early copy of the exam: it's cheating!

Figure 5.3 illustrates a general trend we usually see in this setup: as we increase model complexity, the training error (i.e.,
the error of the model on the training set) will go down, while the validation error will hit a sweet spot and then start increasing due to overfitting.

For example, if we're using LASSO (linear regression with a sum-of-absolute-values penalty) as described in Chapter 4, we need to choose our regularization parameter $\lambda$. Recall that $\lambda$ controls model sparsity/complexity: small values of $\lambda$ lead to complex models, while large values lead to simpler models. One approach is:

(a) Choose several possibilities for $\lambda$ and, for each one, compute coefficients using the training set.

Figure 5.3: Training and validation error (vertical axis) from fitting polynomials of increasing degree (horizontal axis) to data. The data were generated from a fourth-order polynomial. The validation error is smallest at this degree, while the training error continues to decrease as more complex models overfit the training data.

(b) Then, look at how well each one does on the validation set. Separating training and validation helps guard against overfitting: if a model is overfit to the training data, then it probably won't do very well on the validation data.

(c) Once we've determined the best value for $\lambda$ (i.e., the one that achieves minimum error in step (b)), we can fit the model on all the training and validation data, and then see how well it does on the test data. The test data, which the model/parameters have never seen before, should give a measure of how well the model will do on arbitrary new data that it sees.

The procedure described above completely separates our fitting and evaluation processes, but it does so at the cost of preventing us from using much of the data. Recall from last week that using more data for training typically decreases the variance of our estimates, and helps us get more accurate results. We also need to have enough data for validation, since using too little will leave us vulnerable to overfitting. One widely used solution that lets us use more data for training is cross-validation. Here's how it works:

(1) First, divide the non-test data into $K$ uniformly sized blocks, often referred to as folds. This gives us $K$ training-validation pairs: in each pair, the training set consists of $K - 1$ blocks, and the validation set is the remaining block.

(2) For each training-validation pair, repeat steps (a) and (b) above: this gives us $K$ different errors for each value of $\lambda$. We can average these together to get an average error for each $\lambda$, which we'll then use to select a model.
(3) Repeat step (c) above to obtain the test error as an evaluation of the selected model.

Although these examples were described with respect to LASSO and the parameter $\lambda$, the procedures are much more general: we could easily have replaced "value for $\lambda$" above with any other measure of model complexity.
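To make the K-fold procedure concrete, here's a sketch that selects a polynomial degree (standing in for $\lambda$ as the complexity knob, matching Figure 5.3) by averaging validation errors across folds; the data are simulated from a made-up quadratic:

```python
import numpy as np

rng = np.random.default_rng(2)

def kfold_cv_error(x, y, degree, K=5):
    """Average validation MSE for a degree-`degree` polynomial fit,
    using K folds of the (non-test) data."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, K)           # (1) K uniformly sized blocks
    errors = []
    for k in range(K):
        val = folds[k]                       # one block held out for validation
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        coeffs = np.polyfit(x[train], y[train], degree)   # step (a): fit on K-1 blocks
        pred = np.polyval(coeffs, x[val])
        errors.append(np.mean((pred - y[val]) ** 2))      # step (b): validation error
    return np.mean(errors)                   # (2) average across the K folds

# Simulated data from a quadratic with noise.
x = np.linspace(-3, 3, 60)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.3, size=len(x))

cv_errors = {d: kfold_cv_error(x, y, d) for d in range(1, 8)}
best_degree = min(cv_errors, key=cv_errors.get)
print(best_degree)
```

Underfitting (degree 1) shows up as a large average validation error, and very high degrees start creeping back up, so the minimum lands near the true complexity.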

Also note that we could use a bootstrap-like approach here too: instead of deterministically dividing our dataset into $K$ parts, we could randomly subsample the non-test data $K$ different times and apply the same procedure.

### 5.3.2 Penalizing complexity

Another way to address the tradeoff between model complexity and fit error is to directly penalize model complexity. That is, we can express the "badness" of a model as the sum of an error term (recall the sum-of-squared-errors cost that we used in linear regression) and some penalty term that increases as the model becomes more complex. Here, the penalty term effectively serves as a proxy for generalization error. Two common penalized criteria are AIC, or Akaike Information Criterion, which adds a penalty term equal to $2p$ (where $p$ is the number of parameters), and BIC, or Bayesian Information Criterion, which adds a penalty term equal to $p \ln n$ (where $n$ is the number of data points).

BIC is closely related to another criterion known as MDL, or minimum description length. This criterion tries to compress the model and the error: if the model and the error are both small and simple, they'll be highly compressible. A complex model that produces low error will be hard to compress, and an overly simplistic model will produce high error that will be hard to compress. This criterion tries to achieve a balance between the two.

**Example: Bayesian Occam's Razor**

Another commonly used idea is the Bayesian Occam's razor: in Bayesian models, simpler models tend to place higher probability on the data they can explain. Suppose we observe a number and want to decide which of two models generated it. The simpler model, $H_1$, says that it's equally likely to be 1 or 2, while the more complex model, $H_2$, says that it's equally likely to be 1, 2, 3, or 4. (A figure here shows the two distributions: $H_1$ puts probability 0.5 on each of its two values, while $H_2$ puts probability 0.25 on each of its four.) If we observe the number 2, the likelihood of this observation under $H_1$ is 0.5, while the likelihood under $H_2$ is 0.25.
Therefore, by choosing the model that places the highest probability on our observation, we're also choosing the simpler model. Intuitively, the more values a model allows for, the more thinly it has to spread out its probability across those values.
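To make the AIC and BIC penalties above concrete, here's a tiny sketch comparing two hypothetical models (the fit values and parameter counts are made up; for least-squares fits, the fit term $-2\ln L$ is often computed as $n \ln(\mathrm{RSS}/n)$ up to a constant):

```python
import math

def aic(neg2_log_lik, p):
    """AIC = fit term + 2p, where p is the number of parameters."""
    return neg2_log_lik + 2 * p

def bic(neg2_log_lik, p, n):
    """BIC = fit term + p ln(n), where n is the number of data points."""
    return neg2_log_lik + p * math.log(n)

n = 100
# Hypothetical fits: the complex model fits a bit better (smaller fit term)
# but uses many more parameters.
print(aic(210.0, p=3), aic(205.0, p=10))            # 216.0 225.0: AIC prefers simple
print(bic(210.0, p=3, n=n), bic(205.0, p=10, n=n))  # BIC penalizes the extra parameters even more
```

With $n = 100$, BIC's per-parameter penalty $\ln 100 \approx 4.6$ exceeds AIC's 2, so BIC leans even harder toward the simpler model, mirroring the Occam's-razor intuition above.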


### Statistics 641 - EXAM II - 1999 through 2003

Statistics 641 - EXAM II - 1999 through 2003 December 1, 1999 I. (40 points ) Place the letter of the best answer in the blank to the left of each question. (1) In testing H 0 : µ 5 vs H 1 : µ > 5, the

### Nonparametric tests, Bootstrapping

Nonparametric tests, Bootstrapping http://www.isrec.isb-sib.ch/~darlene/embnet/ Hypothesis testing review 2 competing theories regarding a population parameter: NULL hypothesis H ( straw man ) ALTERNATIVEhypothesis

### Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

PROBLEM SET 1 For the first three answer true or false and explain your answer. A picture is often helpful. 1. Suppose the significance level of a hypothesis test is α=0.05. If the p-value of the test

### Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

### The Wilcoxon Rank-Sum Test

1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We

### Hypothesis tests, confidence intervals, and bootstrapping

Hypothesis tests, confidence intervals, and bootstrapping Business Statistics 41000 Fall 2015 1 Topics 1. Hypothesis tests Testing a mean: H0 : µ = µ 0 Testing a proportion: H0 : p = p 0 Testing a difference

### Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

### The Assumption(s) of Normality

The Assumption(s) of Normality Copyright 2000, 2011, J. Toby Mordkoff This is very complicated, so I ll provide two versions. At a minimum, you should know the short one. It would be great if you knew

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Data Analysis. Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) SS Analysis of Experiments - Introduction

Data Analysis Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) Prof. Dr. Dr. h.c. Dieter Rombach Dr. Andreas Jedlitschka SS 2014 Analysis of Experiments - Introduction

### , then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

### How to choose a statistical test. Francisco J. Candido dos Reis DGO-FMRP University of São Paulo

How to choose a statistical test Francisco J. Candido dos Reis DGO-FMRP University of São Paulo Choosing the right test One of the most common queries in stats support is Which analysis should I use There

### Chapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data.

Chapter 15 Mixed Models A flexible approach to correlated data. 15.1 Overview Correlated data arise frequently in statistical analyses. This may be due to grouping of subjects, e.g., students within classrooms,

### Business Statistics. Lecture 8: More Hypothesis Testing

Business Statistics Lecture 8: More Hypothesis Testing 1 Goals for this Lecture Review of t-tests Additional hypothesis tests Two-sample tests Paired tests 2 The Basic Idea of Hypothesis Testing Start

### A Logic of Prediction and Evaluation

5 - Hypothesis Testing in the Linear Model Page 1 A Logic of Prediction and Evaluation 5:12 PM One goal of science: determine whether current ways of thinking about the world are adequate for predicting

### NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem)

NONPARAMETRIC STATISTICS 1 PREVIOUSLY parametric statistics in estimation and hypothesis testing... construction of confidence intervals computing of p-values classical significance testing depend on assumptions

### Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

### Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 1 Joint Probability Distributions 1 1.1 Two Discrete

### 496 STATISTICAL ANALYSIS OF CAUSE AND EFFECT

496 STATISTICAL ANALYSIS OF CAUSE AND EFFECT * Use a non-parametric technique. There are statistical methods, called non-parametric methods, that don t make any assumptions about the underlying distribution

### Testing: is my coin fair?

Testing: is my coin fair? Formally: we want to make some inference about P(head) Try it: toss coin several times (say 7 times) Assume that it is fair ( P(head)= ), and see if this assumption is compatible

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### T adult = 96 T child = 114.

Homework Solutions Do all tests at the 5% level and quote p-values when possible. When answering each question uses sentences and include the relevant JMP output and plots (do not include the data in your

### Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

### Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

### Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

### Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

### Chapter 7 Part 2. Hypothesis testing Power

Chapter 7 Part 2 Hypothesis testing Power November 6, 2008 All of the normal curves in this handout are sampling distributions Goal: To understand the process of hypothesis testing and the relationship

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Sociology 6Z03 Topic 15: Statistical Inference for Means

Sociology 6Z03 Topic 15: Statistical Inference for Means John Fox McMaster University Fall 2016 John Fox (McMaster University) Soc 6Z03: Statistical Inference for Means Fall 2016 1 / 41 Outline: Statistical

### Statistiek I. t-tests. John Nerbonne. CLCG, Rijksuniversiteit Groningen. John Nerbonne 1/35

Statistiek I t-tests John Nerbonne CLCG, Rijksuniversiteit Groningen http://wwwletrugnl/nerbonne/teach/statistiek-i/ John Nerbonne 1/35 t-tests To test an average or pair of averages when σ is known, we

### 1/22/2016. What are paired data? Tests of Differences: two related samples. What are paired data? Paired Example. Paired Data.

Tests of Differences: two related samples What are paired data? Frequently data from ecological work take the form of paired (matched, related) samples Before and after samples at a specific site (or individual)

### Checklists and Examples for Registering Statistical Analyses

Checklists and Examples for Registering Statistical Analyses For well-designed confirmatory research, all analysis decisions that could affect the confirmatory results should be planned and registered

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7 th, at 5pm.

Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7 th, at 5pm. Political Science 15 Lecture 12: Hypothesis Testing Sampling

### AP Statistics 2002 Scoring Guidelines

AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought

### Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

### Non-Inferiority Tests for Two Means using Differences

Chapter 450 on-inferiority Tests for Two Means using Differences Introduction This procedure computes power and sample size for non-inferiority tests in two-sample designs in which the outcome is a continuous

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### where b is the slope of the line and a is the intercept i.e. where the line cuts the y axis.

Least Squares Introduction We have mentioned that one should not always conclude that because two variables are correlated that one variable is causing the other to behave a certain way. However, sometimes

### Statistical Significance and Bivariate Tests

Statistical Significance and Bivariate Tests BUS 735: Business Decision Making and Research 1 1.1 Goals Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions,

### Statistical tests for SPSS

Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

### Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

### StatCrunch and Nonparametric Statistics

StatCrunch and Nonparametric Statistics You can use StatCrunch to calculate the values of nonparametric statistics. It may not be obvious how to enter the data in StatCrunch for various data sets that

### Nonparametric tests these test hypotheses that are not statements about population parameters (e.g.,

CHAPTER 13 Nonparametric and Distribution-Free Statistics Nonparametric tests these test hypotheses that are not statements about population parameters (e.g., 2 tests for goodness of fit and independence).

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Math 4310 Handout - Quotient Vector Spaces

Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable

### Absolute Value of Reasoning

About Illustrations: Illustrations of the Standards for Mathematical Practice (SMP) consist of several pieces, including a mathematics task, student dialogue, mathematical overview, teacher reflection

### TRANSCRIPT: In this lecture, we will talk about both theoretical and applied concepts related to hypothesis testing.

This is Dr. Chumney. The focus of this lecture is hypothesis testing both what it is, how hypothesis tests are used, and how to conduct hypothesis tests. 1 In this lecture, we will talk about both theoretical

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Descriptive Statistics: Measures of Central Tendency and Crosstabulation. 789mct_dispersion_asmp.pdf

789mct_dispersion_asmp.pdf Michael Hallstone, Ph.D. hallston@hawaii.edu Lectures 7-9: Measures of Central Tendency, Dispersion, and Assumptions Lecture 7: Descriptive Statistics: Measures of Central Tendency

### 12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Understand when to use multiple Understand the multiple equation and what the coefficients represent Understand different methods

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written