Data Analysis. Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden), SS 2014. Analysis of Experiments - Introduction

1 Data Analysis
Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden)
Prof. Dr. Dr. h.c. Dieter Rombach, Dr. Andreas Jedlitschka, SS 2014
Analysis of Experiments - Introduction
Some parts of this lecture are adapted with permission from lectures given by Sira Vegas and Oscar Dieste at UPM.

2 Outline
Descriptive statistics
Statistical analysis
Parametric tests: Student's t-test, paired t-test, one-way ANOVA
Non-parametric tests: Mann-Whitney, Wilcoxon, sign test
Jedlitschka, Vegas, Dieste 2014 Slide 2

3 DESCRIPTIVE STATISTICS

4 Important notice
In inferential statistics, population parameters are clearly distinguished from estimators (parameters calculated from samples).
Population parameters are designated by Greek letters: μ, σ², σ
Estimators are designated by Latin letters: m, s², s
In most cases, symbols carry a subscript denoting the associated sample (usually a treatment): μ_a, s_b

5 Important notice
The notation matters because some estimators are calculated differently from the corresponding population parameters, specifically the variance.
Sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - m)^2$
This also affects the standard deviation, which is the square root of the variance.
(n-1) is the number of degrees of freedom of the sample. This will be important soon.

6 Measures of central tendency
Dataset: {1, 2, 2, 2, 3, 14}
Arithmetic mean: (1 + 2 + 2 + 2 + 3 + 14) / 6 = 4
Median: the middle value of the ordered values: 2
Mode: the value that appears most often: 2
The measures differ in their response to outliers.
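The three measures can be checked with Python's standard `statistics` module. This is only a small sketch on the slide's dataset; the second half, which drops the outlier 14, is an added illustration of how differently the measures react.

```python
import statistics

data = [1, 2, 2, 2, 3, 14]

mean = statistics.mean(data)      # arithmetic mean
median = statistics.median(data)  # middle of the ordered values
mode = statistics.mode(data)      # most frequent value
print(mean, median, mode)

# Dropping the outlier 14 moves the mean, but not the median or the mode.
trimmed = [x for x in data if x != 14]
print(statistics.mean(trimmed), statistics.median(trimmed), statistics.mode(trimmed))
```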

7 Mean, Median, Mode

8 Dispersion (1/2)
Dataset: {1, 2, 2, 2, 3, 14}
Range {min, max}: {1, 14}
Standard deviation (SD): informs about the variation around the average; it is the square root of the variance.
σ if the data are the whole population (divide by N, use μ)
s if the data are a sample (divide by N-1, use m)
For this dataset, treated as a population: σ = 4.51
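The difference between the population form and the sample form of the standard deviation can be seen directly on this dataset. A minimal sketch with the standard `statistics` module: `pstdev` divides by N, `stdev` divides by N-1.

```python
import statistics

data = [1, 2, 2, 2, 3, 14]

value_range = (min(data), max(data))
sigma = statistics.pstdev(data)  # population SD: divides by N
s = statistics.stdev(data)       # sample SD: divides by N - 1

print(value_range)
print(round(sigma, 2), round(s, 2))
```

The population form reproduces the 4.51 given on the slide; the sample form is somewhat larger because of the smaller divisor.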

9 Dispersion (2/2)
Interquartile range

10 Shape
Variance σ²: the average of the squared differences from the mean
Skewness
Kurtosis

11 Dependency
Linear regression
Correlation coefficient (Pearson): requires interval or ratio scale and normally distributed data
More than two variables: multivariate analysis, e.g. principal component analysis

12 STATISTICAL ANALYSIS
Motivation

13 A simple experiment
Experiments don't have to be complicated. They can be as simple as comparing a technology to something else: 1 factor.

14 Distribution and Probability
Find out whether this is a fair die! What could be the idea?

15 Solution Approach
Either you have a trustworthy expectation,
or:
Pick one of the dice at random
Throw it one hundred times
Note down each single event
Derive the distribution
Now take the die in question and check whether it fulfils the expectation
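The approach can be sketched as a small simulation. The slide does not say how to "check whether it fulfils the expectation"; the chi-square goodness-of-fit statistic used here is one common choice and is an assumption of this sketch, as are the seed and the simulated die. The constant 11.07 is the tabulated chi-square critical value for 5 degrees of freedom at α = 0.05.

```python
import random
from collections import Counter

def chi_square_uniform(counts, faces=6):
    """Chi-square statistic of observed face counts against a fair die."""
    total = sum(counts.values())
    expected = total / faces
    return sum((counts.get(face, 0) - expected) ** 2 / expected
               for face in range(1, faces + 1))

CRITICAL_DF5_ALPHA05 = 11.07  # tabulated chi-square value, df = 5, alpha = 0.05

rng = random.Random(42)  # arbitrary seed, only to make the run repeatable
throws = [rng.randint(1, 6) for _ in range(100)]  # throw the die 100 times
counts = Counter(throws)                          # derive the distribution
statistic = chi_square_uniform(counts)

print(dict(sorted(counts.items())))
print("looks unfair" if statistic > CRITICAL_DF5_ALPHA05 else "no evidence against fairness")
```

A strongly loaded die, for example 75 sixes in 100 throws, gives a statistic far above the critical value.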

16 A simple experiment
Experiments don't have to be complicated. They can be as simple as comparing a pair of techniques: 1 factor with 2 levels.
In cases like these, we don't need expensive tools (SPSS, STATA, etc.) to analyze the experimental results.
A scholar wants to know if technique A (say, functional testing) is better than B (say, inspection).
He performs an experiment with some students and gets the following data (metric: a higher value means "better"):
Technique: A A B B A B B B A A B
Measure: (values listed per technique on slide 17)

17 Question
How can we decide which technique (A, B) is better?
The most obvious option is looking at the data (e.g. with SPSS):
Descriptive statistics: medians, means, quartiles, variances, standard deviations
Suitable plots: box plots

               A       B
            29.9    26.6
            11.4    23.7
            25.3    28.5
            16.5    14.2
            21.1    17.9
                    24.3
N              5       6
mean       20.84   22.53
variance   52.50   29.51
std. dev.   7.25    5.43
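The summary rows of the table can be reproduced without a statistics package. A short sketch using the standard `statistics` module, with the slide's measures written with decimal points; `describe` is a helper name introduced here.

```python
import statistics

a = [29.9, 11.4, 25.3, 16.5, 21.1]
b = [26.6, 23.7, 28.5, 14.2, 17.9, 24.3]

def describe(sample):
    """Sample size, mean, sample variance (N - 1) and sample std. dev."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "variance": statistics.variance(sample),
        "std_dev": statistics.stdev(sample),
    }

for name, sample in (("A", a), ("B", b)):
    summary = describe(sample)
    print(name, {key: round(value, 2) for key, value in summary.items()})
```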

18 Box plot
(Figure: box plots for A and B, each marked with min, Q1, Q3, max)

19 Preliminary answer
B looks better, but the results are quite similar. We cannot be sure!
It is quite possible that the difference arises from random chance.
Don't believe it? Remember what we found out with the dice. Or think about tossing a coin four times (What do you expect? What do you get?).
As these examples show, many processes have an associated probability distribution.
How can we make a decision in this case?

20 Key question
Idea: if we knew the probability distribution, we could calculate the probability that B > A
Formally speaking: μ_b > μ_a
Problem: what happens if we do not know the probability distribution?

21 Reference distribution
Fisher claims that it is possible to relate the experimental results to a reference distribution that is built from the same experimental data.
Using this reference distribution, we can estimate how likely a given result is under the assumption that A and B do not differ (that is, supposing that μ_b = μ_a).
Does the difference between the two groups represent a real difference, or was it due to chance?

22 Standard distributions
Building the reference distribution, even for a small example, requires a lot of effort.
Under some assumptions, reference distributions are close to known probability distributions, such as the normal (Gauss) distribution or, in our particular case, Student's t.
t is used instead of the normal distribution when the sample sizes involved are small.
The good thing is that standard distributions are tabulated. Significance levels can be obtained immediately from the tables.

23 Use the standard distribution
Calculate the actual difference between the means, say d = m_b - m_a
Locate d in the histogram
Calculate the area of the histogram that falls to the right of d
That area is the probability that, by chance alone, we could obtain a difference between means of m_b - m_a or higher. We call it the p-value.
If the p-value is below a cutoff value α (the significance level), we can affirm that techniques A and B are not alike. We say that we have obtained a significant result.
α is set arbitrarily, by convention, at 0.05
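For a dataset as small as the A/B example, the reference distribution can be built exhaustively instead of being approximated by a tabulated one. The following sketch is an assumption about how this might be done, not the lecture's own procedure: it relabels the 11 measures in every possible way (5 values as A, 6 as B), computes the difference of means for each relabelling, and takes the share of differences at least as large as the observed one as a one-sided p-value.

```python
from itertools import combinations
from statistics import mean

a = [29.9, 11.4, 25.3, 16.5, 21.1]
b = [26.6, 23.7, 28.5, 14.2, 17.9, 24.3]

def randomization_p_value(sample_a, sample_b):
    """One-sided p-value for mean(B) - mean(A) over all relabellings."""
    pooled = sample_a + sample_b
    n_a, n_b = len(sample_a), len(sample_b)
    observed = mean(sample_b) - mean(sample_a)
    total = sum(pooled)
    n_splits = 0
    at_least_as_large = 0
    for a_idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in a_idx)
        diff = (total - sum_a) / n_b - sum_a / n_a
        n_splits += 1
        if diff >= observed - 1e-12:  # tolerance for float rounding
            at_least_as_large += 1
    return at_least_as_large / n_splits, n_splits

p_value, n_splits = randomization_p_value(a, b)
print(n_splits, round(p_value, 3))
print("significant" if p_value < 0.05 else "not significant at alpha = 0.05")
```

With 11 observations there are only 462 relabellings, so full enumeration is cheap; for larger samples one would typically sample relabellings at random or fall back on the tabulated t distribution.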

24 Back to the Example
(Figure: reference distribution with the observed difference marked)
The null hypothesis is not rejected.

25 Parametric Test / Independent Sample
T-TEST

26 T-Test
One-factor experiments with one level: one-sample t-test
Compares the mean response of a group against a specific value.
The formula shows the general concept used by the following tests:
$t_0 = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
x̄ = mean of the group
µ0 = specified value (e.g., population mean)
n = number of subjects in the group
s = standard deviation of the group
df = n-1
Look up t in Student's t-distribution table to obtain the p-value.
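As a sketch of the one-sample case, the statistic is the distance between the sample mean and the specified value, measured in standard errors. The sample is technique A from the earlier example; the reference value `mu0 = 20.0` is purely hypothetical and chosen only for illustration.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t0, df) for a one-sample t-test against mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t0 = (statistics.mean(sample) - mu0) / standard_error
    return t0, n - 1

a = [29.9, 11.4, 25.3, 16.5, 21.1]
t0, df = one_sample_t(a, mu0=20.0)  # mu0 is a hypothetical reference value
print(round(t0, 3), df)

# Two-sided check against the tabulated t value for df = 4, alpha = 0.05
T_CRITICAL_DF4 = 2.776
print("reject H0" if abs(t0) > T_CRITICAL_DF4 else "do not reject H0")
```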

27 T-Test
One-factor experiments with two levels: two-sample t-test
Checks the statistical significance of the difference between the mean responses of two levels of a factor.
Checks the null hypothesis that the samples belong to two subpopulations with the same mean, H0: μ1 = μ2
Prerequisites: the two sample sizes (that is, the number n of participants in each group) are equal; it can be assumed that the two distributions have the same variance.
$t_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{(s_1^2 + s_2^2)/n}}$
x̄1, x̄2 = means of groups 1 and 2
n = number of subjects in each group (equal!)
s1, s2 = standard deviations of groups 1 and 2
s1², s2² = unbiased estimators of the variances
df = 2n-2

28 T-Test
One-factor experiments with two levels: special cases
Unequal sample sizes, equal variance: $t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$ with $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$, df = n1 + n2 - 2
Equal or unequal sample sizes, unequal variances (also called Welch's t-test): $t_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
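A sketch of the unequal-variance case (Welch's t-test) on the technique A/B measures from the earlier example. The degrees of freedom use the Welch-Satterthwaite approximation, which the slide does not spell out and is added here as the standard companion to Welch's statistic.

```python
import math
import statistics

def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1 = statistics.variance(x) / n1  # squared standard error of mean(x)
    v2 = statistics.variance(y) / n2  # squared standard error of mean(y)
    t0 = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (usually not an integer)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, df

a = [29.9, 11.4, 25.3, 16.5, 21.1]
b = [26.6, 23.7, 28.5, 14.2, 17.9, 24.3]

t0, df = welch_t(b, a)
print(round(t0, 3), round(df, 2))
```

The resulting t value is well below any common critical value, in line with the earlier conclusion that the A/B difference is not significant.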

29 T-Test
Program defect density per project (data taken from Wohlin et al.):

Project A   Project B
   3.42        3.44
   2.71        4.97
   2.84        4.76
   1.85        4.96
   3.22        4.10
   3.48        3.05
   2.68        4.09
   4.30        3.69
   2.49        4.21
   1.54        4.40
               3.49

1. Calculate the means
2. Calculate the difference of the means
3. Use the formula (unequal N)
4. Look up the critical t value for the respective df in the t-distribution table
5. Reject H0 if |t0| > t α/2,df (two-sided) or if t0 > t α,df (one-sided)
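The steps can be followed on the defect density data with the pooled-variance form for unequal sample sizes. This is a sketch; the critical value 2.093 is the tabulated two-sided t value for df = 19 at α = 0.05.

```python
import math
import statistics

project_a = [3.42, 2.71, 2.84, 1.85, 3.22, 3.48, 2.68, 4.30, 2.49, 1.54]
project_b = [3.44, 4.97, 4.76, 4.96, 4.10, 3.05, 4.09, 3.69, 4.21, 4.40, 3.49]

def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances, unequal n allowed."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * statistics.variance(x)
                       + (n2 - 1) * statistics.variance(y)) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t0 = (statistics.mean(x) - statistics.mean(y)) / standard_error
    return t0, df

t0, df = pooled_t(project_a, project_b)
T_CRITICAL_DF19 = 2.093  # tabulated t value, alpha/2 = 0.025, df = 19

print(round(t0, 2), df)
print("reject H0" if abs(t0) > T_CRITICAL_DF19 else "do not reject H0")
```

The sign of t0 only reflects the order of the two groups, so the two-sided decision compares its absolute value with the critical value.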

30 t-distribution requirements
There are three requirements:
1. Samples must be independent and identically distributed (i.i.d.). In practice, this means that levels (A's and B's) have to be assigned to experimental units (subjects) at random. i.i.d. implies homoscedasticity and non-interaction.
2. The mean estimator should be normally distributed (or close to normality).
3. Response variables are measured on ratio scales. Ordinal metrics cannot be used.
Condition #1 is probably more important than conditions #2 and #3.

31 Non-parametric tests
If condition #2 does not hold (there are several tests to check normality) or condition #3 does not hold (ordinal metrics can be used), a non-parametric test can be applied.
Condition #1 must still hold.
The Wilcoxon rank-sum or Mann-Whitney test is one of the most popular tests. It is quite easy, but requires a minimum sample size and has some technical problems (power calculation).

32 Parametric vs. non-parametric
The t-test is an instance of a parametric test.
The main difference between the two types of tests is the assumption about the distribution of the sample: non-parametric tests do not assume a particular distribution.
Non-parametric tests can be applied in situations where parametric tests cannot, but in turn they are more conservative (less power).

33 Non-Parametric Test / Independent Sample
MANN WHITNEY U TEST

34 Mann Whitney U test
Non-parametric test for independent groups
It has greater efficiency than the t-test on non-normal distributions
Prerequisites:
The responses are at least ordinal
The distributions of both groups are equal under the null hypothesis

35 Mann Whitney U test
Method 1: For small samples a direct method is recommended. It is very quick and gives an insight into the meaning of the U statistic.
Choose the sample for which the ranks seem to be smaller (the only reason to do this is to make computation easier). Call this "sample 1," and call the other sample "sample 2."
For each observation in sample 1, count the number of observations in sample 2 that have a smaller rank (count a half for any that are equal to it).
The sum of these counts is U.
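The direct method amounts to comparing every observation of sample 1 with every observation of sample 2. A sketch on the defect density data from the t-test example, with project A as sample 1 because its values, and therefore its ranks, tend to be smaller; `u_direct` is a name introduced here.

```python
def u_direct(sample1, sample2):
    """Count, for each value in sample1, the values in sample2 below it."""
    u = 0.0
    for x in sample1:
        for y in sample2:
            if y < x:
                u += 1.0
            elif y == x:
                u += 0.5  # ties count a half
    return u

project_a = [3.42, 2.71, 2.84, 1.85, 3.22, 3.48, 2.68, 4.30, 2.49, 1.54]
project_b = [3.44, 4.97, 4.76, 4.96, 4.10, 3.05, 4.09, 3.69, 4.21, 4.40, 3.49]

u_a = u_direct(project_a, project_b)
u_b = u_direct(project_b, project_a)
print(u_a, u_b)
```

The two counts always add up to n1 * n2, here 10 * 11 = 110, so it is enough to compute the smaller one by hand.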

36 Mann Whitney U test
Method 2: For larger samples, a formula can be used:
Add up the ranks for the observations that came from sample 1. Where there are tied groups, take the rank to be equal to the midpoint of the group.
The sum of ranks in sample 2 is then determined, because the sum of all the ranks equals N(N + 1)/2, where N is the total number of observations.
U is then given by $U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$ and $U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2$, where n = sample size and R = sum of ranks for the respective group.
Reject H0 if min(U1, U2) is <= the critical value from the Mann-Whitney table.

37 Mann Whitney U test
Program defect density per project with ranks (data taken from Wohlin et al.):

Project A   Rank   Project B   Rank
   3.42       9       3.44      10
   2.71       5       4.97      21
   2.84       6       4.76      19
   1.85       2       4.96      20
   3.22       8       4.10      15
   3.48      11       3.05       7
   2.68       4       4.09      14
   4.30      17       3.69      13
   2.49       3       4.21      16
   1.54       1       4.40      18
                      3.49      12
Sum of ranks 66                165

U1 = 99 (use formula)
U2 = 11 (use formula)
Look up the critical value in the table for n of the smaller sample (10) and n of the larger sample (11): 26
min(U1, U2) = 11 <= 26: reject H0
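The rank-based computation for this example can be sketched as follows. The rank-sum form used here, U = n1*n2 + n(n+1)/2 - R for each group, is an assumption chosen because it reproduces the slide's U1 = 99 and U2 = 11; the critical value 26 is the one quoted on the slide.

```python
def midranks(values):
    """Map each distinct value to its rank; ties share the midpoint rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return rank_of

project_a = [3.42, 2.71, 2.84, 1.85, 3.22, 3.48, 2.68, 4.30, 2.49, 1.54]
project_b = [3.44, 4.97, 4.76, 4.96, 4.10, 3.05, 4.09, 3.69, 4.21, 4.40, 3.49]

rank_of = midranks(project_a + project_b)
n1, n2 = len(project_a), len(project_b)
r1 = sum(rank_of[v] for v in project_a)  # sum of ranks, project A
r2 = sum(rank_of[v] for v in project_b)  # sum of ranks, project B

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2

CRITICAL_VALUE = 26  # value quoted on the slide for samples of 10 and 11
print(r1, r2, u1, u2)
print("reject H0" if min(u1, u2) <= CRITICAL_VALUE else "do not reject H0")
```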

38 Parametric Test / Dependent Sample
PAIRED T-TEST

39 Paired T-Test
Parametric test for dependent samples, e.g., repeated measures or matched pairs
The differences between all pairs must be calculated.
$t_0 = \frac{\bar{d} - \mu_0}{s_D / \sqrt{n}}$
d̄ = mean of the differences between pairs
µ0 = (optional) specified value (e.g., population mean)
n = number of subjects
s_D = standard deviation of the differences
df = n-1

40 Example Paired T-Test
1. Calculate the differences (P1 - P2)
2. Calculate the mean of the differences
3. Calculate the std. dev. of the differences
4. Use the formula
5. Look up the critical t value for the respective df in the table
6. Reject H0 if |t0| > t α/2,df (two-sided) or if t0 > t α,df (one-sided)

(Table: Programmer, P1, P2, P1 - P2; the individual values are not recoverable)

                P1        P2     P1 - P2
mean          131.10    127.73     3.37
variance      627.…     ….60     748.46
std. dev.      25.04     39.54    27.36
df (N - 1)                          9
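Since only the summary rows of the example survive, the test statistic can still be computed from them. A sketch using the mean (3.37) and standard deviation (27.36) of the differences with N = 10; the critical value 2.262 is the tabulated two-sided t value for df = 9 at α = 0.05, and the function name is introduced here.

```python
import math

def paired_t_from_summary(mean_diff, sd_diff, n, mu0=0.0):
    """Paired t statistic from the mean and std. dev. of the differences."""
    t0 = (mean_diff - mu0) / (sd_diff / math.sqrt(n))
    return t0, n - 1

t0, df = paired_t_from_summary(mean_diff=3.37, sd_diff=27.36, n=10)
T_CRITICAL_DF9 = 2.262  # tabulated t value, alpha/2 = 0.025, df = 9

print(round(t0, 3), df)
print("reject H0" if abs(t0) > T_CRITICAL_DF9 else "do not reject H0")
```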

41 T-Test Table
Reject H0 if |t0| > t α/2,df (two-sided)
=> do not reject H0!

42 Table for T-Test
SPSS outputs

43 Non-Parametric Test / Dependent Sample
WILCOXON SIGNED-RANK TEST

44 Wilcoxon
Non-parametric test for dependent samples; alternative to the paired t-test
Prerequisites: it must be possible to determine which value of a pair is larger and to rank the differences
d = P1 - P2; the absolute differences are ranked
T1 = 23 (sum of ranks of negative d)
T2 = 32 (sum of ranks of positive d)

(Table: Programmer, P1, P2, P1 - P2, ranks of d; the individual values are not recoverable)

                P1        P2     P1 - P2
N               10        10
mean          131.10    127.73     3.37
variance      627.…     ….60     748.46
std. dev.      25.04     39.54    27.36

Check min(T1, T2) in the table: 23 is not <= 8: do not reject H0
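The two rank sums can be computed as sketched below. Because the per-programmer values are not available, the paired samples `first` and `second` are small hypothetical numbers used only to show the mechanics; zero differences are dropped and tied absolute differences share the midpoint rank, which is a common convention assumed here.

```python
def signed_rank_sums(first, second):
    """Return (T_minus, T_plus): rank sums of negative and positive d."""
    diffs = [x - y for x, y in zip(first, second) if x != y]  # drop zeros
    ordered = sorted(abs(d) for d in diffs)

    def midrank(value):
        positions = [i + 1 for i, v in enumerate(ordered) if v == value]
        return sum(positions) / len(positions)

    t_minus = sum(midrank(abs(d)) for d in diffs if d < 0)
    t_plus = sum(midrank(abs(d)) for d in diffs if d > 0)
    return t_minus, t_plus

# Hypothetical paired measurements, not the lecture's data
first = [10, 12, 9, 15, 11]
second = [8, 13, 9, 11, 14]

t_minus, t_plus = signed_rank_sums(first, second)
print(t_minus, t_plus, min(t_minus, t_plus))
```

As a consistency check on the slide's figures, the two rank sums must add up to n(n + 1)/2; with n = 10 that is 55 = 23 + 32.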

45 Sign Test
Non-parametric test for dependent samples; alternative to the paired t-test
Used if it is not possible to rank the differences, but the scale is still at least ordinal
Based on the signs of the differences
Formula: $p = \frac{1}{2^N} \sum_{i=0}^{n} \binom{N}{i}$, with N = number of pairs and n = min(T1, T2)

(Table: Programmer, P1, P2, P1 - P2, Sign; the individual values are not recoverable)

                P1        P2     P1 - P2    Sign
N               10        10
mean          131.10    127.73     3.37
variance      627.…     ….60     748.46
std. dev.      25.04     39.54    27.36
Count                                        + 6

T1 = 6 (# negative d)
T2 = 4 (# positive d)
n = min(T1, T2)
=> do not reject H0!
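A sketch of the sign test for these counts, assuming the usual binomial computation: under H0 each sign is equally likely, so the probability of seeing at most n = min(T1, T2) of the rarer sign among N pairs is a binomial tail with p = 0.5. The two-sided value is the doubled tail, capped at 1; both conventions are assumptions of this sketch.

```python
from math import comb

def sign_test_p(count_a, count_b):
    """One-sided and two-sided sign test p-values from the two sign counts."""
    total = count_a + count_b
    rarest = min(count_a, count_b)
    one_sided = sum(comb(total, i) for i in range(rarest + 1)) / 2 ** total
    two_sided = min(1.0, 2 * one_sided)
    return one_sided, two_sided

one_sided, two_sided = sign_test_p(6, 4)  # counts of the two signs
print(round(one_sided, 3), round(two_sided, 3))
print("reject H0" if two_sided < 0.05 else "do not reject H0")
```

Only the smaller of the two counts enters the computation, so it does not matter which sign is the more frequent one.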

46 Parametric Methods / Independent Sample
ONE FACTOR ANOVA

47 ONE-FACTOR ANOVA
One-factor experiments with more than two levels
Checks the statistical significance of the difference between the mean responses of one factor with several levels
Model: $Y_{ij} = \mu + \alpha_j + e_{ij}$, with the effect of level j estimated as $\hat{\alpha}_j = \bar{Y}_j - \bar{Y}$
Steps:
1. Identify the mathematical model
2. Validate the basic model that relates the experimental variables
3. Calculate the factor-induced variation in the response variable
4. Calculate the statistical significance of the factor-induced variation
5. Establish consequences or recommendations on the alternative that provides the best response variable values
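Steps 3 and 4 come down to comparing the variation between the level means with the variation within the levels. The sketch below computes the F statistic for a one-way layout; the three groups are hypothetical toy numbers, not the lecture's data, and the resulting F would be compared with the tabulated F value for the two degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Factor-induced variation: level means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation: observations around their own level mean
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical responses for three levels of one factor
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = one_way_anova_f(groups)
print(round(f_value, 2), df_between, df_within)
```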

48 Example: ANOVA
Factor = programming language, levels = {ADA, C, C++, JAVA}
Response variable = number of errors detected during three months after development ("Quality")
Number of subjects = 24
H0 = the programming language has no effect on the quality of the program
(Table: N and mean per programming language ADA, C, C++, JAVA; grand mean = 64)

49 Example: ANOVA
Results: Descriptives
ADA leads to a quality of 61±1.83

50 Example: ANOVA
Results:
Homogeneity of variances: do not reject H0. There are no significant differences between the variances of the groups => variances are equal
There is a statistically significant difference between groups as determined by one-way ANOVA (F = …, p = .021).
What do we know now?

51 Example: ANOVA
Post hoc tests: Scheffé because of different N; otherwise Tukey is preferred.
There is a statistically significant difference between ADA and C (p = 0.032), ADA and C++ (p = 0.002), JAVA and C (p = 0.009), and JAVA and C++ (p < 0.001, reported as 0.000).
There is no significant difference between ADA and JAVA, nor between C and C++.

52 Example: ANOVA
Homogeneous Subsets

53 Example: ANOVA
Means Plot

54 Further Analysis
Two-way ANOVA
MANOVA
rANOVA
Multitude of other tests

55 DECISION TREE

56 References
Wohlin, Runeson, Höst, Ohlsson, Regnell, Wesslén (2012). Experimentation in Software Engineering. Springer.
J. Bortz and N. Döring (2006). Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler (4. Auflage). Berlin: Springer Verlag.
N. Juristo and A. Moreno (2001). Basics of Software Engineering Experimentation. Kluwer Academic Publishers.


More information

One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

More information

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

UNIVERSITY OF NAIROBI

UNIVERSITY OF NAIROBI UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

More information

The Wilcoxon Rank-Sum Test

The Wilcoxon Rank-Sum Test 1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

T-test & factor analysis

T-test & factor analysis Parametric tests T-test & factor analysis Better than non parametric tests Stringent assumptions More strings attached Assumes population distribution of sample is normal Major problem Alternatives Continue

More information

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217 Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

More information

Two-sample inference: Continuous data

Two-sample inference: Continuous data Two-sample inference: Continuous data Patrick Breheny April 5 Patrick Breheny STA 580: Biostatistics I 1/32 Introduction Our next two lectures will deal with two-sample inference for continuous data As

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

individualdifferences

individualdifferences 1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,

More information

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Description. Textbook. Grading. Objective

Description. Textbook. Grading. Objective EC151.02 Statistics for Business and Economics (MWF 8:00-8:50) Instructor: Chiu Yu Ko Office: 462D, 21 Campenalla Way Phone: 2-6093 Email: kocb@bc.edu Office Hours: by appointment Description This course

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

THE KRUSKAL WALLLIS TEST

THE KRUSKAL WALLLIS TEST THE KRUSKAL WALLLIS TEST TEODORA H. MEHOTCHEVA Wednesday, 23 rd April 08 THE KRUSKAL-WALLIS TEST: The non-parametric alternative to ANOVA: testing for difference between several independent groups 2 NON

More information

Statistical tests for SPSS

Statistical tests for SPSS Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

More information

NAG C Library Chapter Introduction. g08 Nonparametric Statistics

NAG C Library Chapter Introduction. g08 Nonparametric Statistics g08 Nonparametric Statistics Introduction g08 NAG C Library Chapter Introduction g08 Nonparametric Statistics Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric

More information

13: Additional ANOVA Topics. Post hoc Comparisons

13: Additional ANOVA Topics. Post hoc Comparisons 13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Kruskal-Wallis Test Post hoc Comparisons In the prior

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Statistics. One-two sided test, Parametric and non-parametric test statistics: one group, two groups, and more than two groups samples

Statistics. One-two sided test, Parametric and non-parametric test statistics: one group, two groups, and more than two groups samples Statistics One-two sided test, Parametric and non-parametric test statistics: one group, two groups, and more than two groups samples February 3, 00 Jobayer Hossain, Ph.D. & Tim Bunnell, Ph.D. Nemours

More information

How To Test For Significance On A Data Set

How To Test For Significance On A Data Set Non-Parametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A non-parametric equivalent of the 1 SAMPLE T-TEST. ASSUMPTIONS: Data is non-normally distributed, even after log transforming.

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

Non-Parametric Tests (I)

Non-Parametric Tests (I) Lecture 5: Non-Parametric Tests (I) KimHuat LIM lim@stats.ox.ac.uk http://www.stats.ox.ac.uk/~lim/teaching.html Slide 1 5.1 Outline (i) Overview of Distribution-Free Tests (ii) Median Test for Two Independent

More information

NCSS Statistical Software. One-Sample T-Test

NCSS Statistical Software. One-Sample T-Test Chapter 205 Introduction This procedure provides several reports for making inference about a population mean based on a single sample. These reports include confidence intervals of the mean or median,

More information

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

Stat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015

Stat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015 Stat 411/511 THE RANDOMIZATION TEST Oct 16 2015 Charlotte Wickham stat511.cwick.co.nz Today Review randomization model Conduct randomization test What about CIs? Using a t-distribution as an approximation

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

An introduction to using Microsoft Excel for quantitative data analysis

An introduction to using Microsoft Excel for quantitative data analysis Contents An introduction to using Microsoft Excel for quantitative data analysis 1 Introduction... 1 2 Why use Excel?... 2 3 Quantitative data analysis tools in Excel... 3 4 Entering your data... 6 5 Preparing

More information

Testing for differences I exercises with SPSS

Testing for differences I exercises with SPSS Testing for differences I exercises with SPSS Introduction The exercises presented here are all about the t-test and its non-parametric equivalents in their various forms. In SPSS, all these tests can

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Measures of Central Tendency and Variability: Summarizing your Data for Others

Measures of Central Tendency and Variability: Summarizing your Data for Others Measures of Central Tendency and Variability: Summarizing your Data for Others 1 I. Measures of Central Tendency: -Allow us to summarize an entire data set with a single value (the midpoint). 1. Mode :

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl

Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl Dept of Information Science j.nerbonne@rug.nl October 1, 2010 Course outline 1 One-way ANOVA. 2 Factorial ANOVA. 3 Repeated measures ANOVA. 4 Correlation and regression. 5 Multiple regression. 6 Logistic

More information

NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem)

NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem) NONPARAMETRIC STATISTICS 1 PREVIOUSLY parametric statistics in estimation and hypothesis testing... construction of confidence intervals computing of p-values classical significance testing depend on assumptions

More information

Come scegliere un test statistico

Come scegliere un test statistico Come scegliere un test statistico Estratto dal Capitolo 37 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc. (disponibile in Iinternet) Table

More information

Nonparametric statistics and model selection

Nonparametric statistics and model selection Chapter 5 Nonparametric statistics and model selection In Chapter, we learned about the t-test and its variations. These were designed to compare sample means, and relied heavily on assumptions of normality.

More information

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability.

A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability. Glossary Brase: Understandable Statistics, 10e A B This is the notation used to represent the conditional probability of A given B. A and B This represents the probability that both events A and B occur.

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

Nonparametric Statistics

Nonparametric Statistics Nonparametric Statistics References Some good references for the topics in this course are 1. Higgins, James (2004), Introduction to Nonparametric Statistics 2. Hollander and Wolfe, (1999), Nonparametric

More information