# Data Analysis. Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) SS Analysis of Experiments - Introduction

## Transcription

1 Data Analysis Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) Prof. Dr. Dr. h.c. Dieter Rombach Dr. Andreas Jedlitschka SS 2014 Analysis of Experiments - Introduction Some parts of this lecture are adopted with permission from lectures given by Sira Vegas and Oscar Dieste at UPM

2 Outline

- Descriptive statistics
- Statistical analysis
- Parametric tests: Student's t-test, paired t-test, one-way ANOVA
- Non-parametric tests: Mann-Whitney, Wilcoxon, sign test

Jedlitschka, Vegas, Dieste 2014 Slide 2

3 DESCRIPTIVE STATISTICS

4 Important notice

- In inferential statistics, population parameters are clearly distinguished from estimators (values calculated from samples).
- Population parameters are designated by Greek letters: μ, σ², σ.
- Estimators are designated by Latin letters: m, s², s.
- In most cases, symbols carry a subscript denoting the associated sample (usually a treatment): μ_a, s_b.

5 Important notice

- The notation matters because estimators are calculated slightly differently from population parameters, specifically in the case of the variance.
- Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - m)^2$
- This also affects the standard deviation, since it is the square root of the variance.
- (n - 1) is the number of degrees of freedom of the sample. This will be important soon.

6 Measures of central tendency

Dataset: {1, 2, 2, 2, 3, 14}

- Arithmetic mean: (1 + 2 + 2 + 2 + 3 + 14) / 6 = 4
- Median: the middle value of the ordered values: 2
- Mode: the value that appears most often: 2

The measures differ in their response to outliers.

7 Mean, Median, Mode

8 Dispersion (1/2)

Dataset: {1, 2, 2, 2, 3, 14}

- Range {min, max}: {1, 14}
- Standard deviation (SD):
  - σ if the data are the whole population (divide by N, use μ)
  - s if the data are a sample (divide by n - 1, use m)
  - informs about the variation around the average
  - is the square root of the variance: σ ≈ 4.51 for this dataset (treated as a sample, s ≈ 4.94)
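The values on the two preceding slides can be recomputed with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not part of the lecture.

```python
import statistics

data = [1, 2, 2, 2, 3, 14]

mean = statistics.mean(data)      # arithmetic mean
median = statistics.median(data)  # middle of the ordered values
mode = statistics.mode(data)      # most frequent value

# Population formulas divide by N, sample formulas divide by n - 1.
sigma = statistics.pstdev(data)   # population standard deviation
s = statistics.stdev(data)        # sample standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("sigma:", round(sigma, 2), "s:", round(s, 2))
```

The single outlier (14) pulls the mean up to 4 while median and mode stay at 2, which is the sensitivity to outliers the slide refers to.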

9 Dispersion (2/2) Interquartile Range

10 Shape

- Variance σ²: the average of the squared differences from the mean
- Skewness
- Kurtosis

11 Dependency

- Linear regression
- Correlation coefficient (Pearson): requires interval or ratio scale and normally distributed data
- More than two variables: multivariate analysis, e.g. principal component analysis

12 Motivation STATISTICAL ANALYSIS

13 A simple experiment

- Experiments don't have to be complicated.
- They can be as simple as comparing a technology to something else: 1 factor.

14 Distribution and Probability

Find out whether this is a fair die! What could be the idea?

15 Solution Approach

- Either you have a trustworthy expectation,
- or you establish one empirically:
  - Pick one of the dice at random.
  - Throw it one hundred times.
  - Note down each single event.
  - Derive the distribution.
- Now take the die in question and check whether it fulfils the expectation.
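The empirical approach above can be sketched as a small simulation. The simulated die, the fixed seed, and the function name `empirical_distribution` are assumptions made for illustration; a real check would record physical throws instead.

```python
import random
from collections import Counter


def empirical_distribution(throws):
    """Relative frequency of each face (1..6) in a list of throws."""
    counts = Counter(throws)
    n = len(throws)
    return {face: counts.get(face, 0) / n for face in range(1, 7)}


rng = random.Random(42)  # fixed seed, only to make the sketch reproducible
throws = [rng.randint(1, 6) for _ in range(100)]  # "throw it one hundred times"

observed = empirical_distribution(throws)
expected = 1 / 6  # expectation for a fair die

for face, freq in observed.items():
    print(face, round(freq, 2), "deviation:", round(freq - expected, 2))
```

Even for a fair simulated die the observed frequencies deviate from 1/6, which is the point the following slides build on: differences can arise by chance alone.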

16 A simple experiment

- Experiments don't have to be complicated. They can be as simple as comparing a pair of techniques: 1 factor with 2 levels.
- In cases like these, we don't need expensive tools (SPSS, STATA, etc.) to analyze the experimental results.
- A scholar wants to know whether technique A (say, functional testing) is better than B (say, inspection).
- He performs an experiment with some students and obtains one measurement per subject (metric: a higher value means "better").
- Technique per subject: A, A, B, B, A, B, B, B, A, A, B

17 Question

How can we decide which technique (A, B) is better?

SPSS

- The most obvious option is looking at the data.
- Descriptive statistics: medians, means, quartiles, variances, standard deviations
- Suitable plots: box plots

| | A | B |
|---|---|---|
| | 29.9 | 26.6 |
| | 11.4 | 23.7 |
| | 25.3 | 28.5 |
| | 16.5 | 14.2 |
| | 21.1 | 17.9 |
| | | 24.3 |
| N | 5 | 6 |
| mean | 20.84 | 22.53 |
| variance | 52.50 | 29.51 |
| std. dev. | 7.25 | 5.43 |
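The summary rows of this slide can be reproduced without a statistics package. The sketch below assumes the column split that matches the reported N, means, and variances (five A values, six B values); `describe` is an illustrative helper name.

```python
import statistics

# Column split chosen so that N, mean, and variance match the slide.
a = [29.9, 11.4, 25.3, 16.5, 21.1]
b = [26.6, 23.7, 28.5, 14.2, 17.9, 24.3]


def describe(sample):
    """Sample size, mean, sample variance (n - 1), and sample std. dev."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "variance": statistics.variance(sample),
        "std_dev": statistics.stdev(sample),
    }


summary_a = describe(a)
summary_b = describe(b)
for name, summary in (("A", summary_a), ("B", summary_b)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```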

18 Box plot (figure: one box per technique, marked with min, Q1, Q3, max)

19 Preliminary answer

- B looks better, but the results are quite similar. We cannot be sure!
- It is likely that the differences arise due to random chance.
- Don't believe it? Remember what we found out with the dice. Or think about tossing a coin four times (What do you expect? What do you get?).
- As we can see from this example, many processes have an associated probability distribution.
- How can we make a decision in this case?

20 Key question

- Idea: if we knew the probability distribution, we could calculate the probability that B > A. Formally speaking: μ_b > μ_a.
- Problem: what happens if we do not know the probability distribution?

21 Reference distribution

- Fisher claims that it is possible to relate the experimental results to a reference distribution that is built from the same experimental data.
- Using this reference distribution, we can estimate how likely a given result is under the assumption that A and B do not differ (that is, supposing that μ_b = μ_a).
- Does the difference between the two groups represent a real difference, or was it due to chance?
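One common way to build such a reference distribution from the data themselves is a randomization (permutation) test: relabel the eleven measurements in every possible way and recompute the difference of means each time. The sketch below assumes that reading of the slide and uses the A/B values from the running example; it is an illustration, not necessarily the exact construction used in the lecture.

```python
from itertools import combinations

a = [29.9, 11.4, 25.3, 16.5, 21.1]
b = [26.6, 23.7, 28.5, 14.2, 17.9, 24.3]


def mean(values):
    return sum(values) / len(values)


pooled = a + b
observed = mean(b) - mean(a)  # observed difference of means (B minus A)

# Under H0 (no difference) every assignment of 5 "A" labels to the
# 11 measurements is equally likely; enumerate all of them.
diffs = []
for idx_a in combinations(range(len(pooled)), len(a)):
    chosen = set(idx_a)
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diffs.append(mean(group_b) - mean(group_a))

# Share of relabelings with a difference at least as large as observed
# (small tolerance guards against floating-point noise).
p_value = sum(d >= observed - 1e-9 for d in diffs) / len(diffs)
print(len(diffs), round(observed, 2), round(p_value, 3))
```

With only 462 relabelings the full enumeration is cheap here; for larger samples the number of combinations grows quickly, which is why tabulated standard distributions are attractive.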

22 Standard distributions

- Building the reference distribution, even for a small example, requires a lot of effort.
- Under some assumptions, reference distributions are close to known probability distributions, such as the normal (Gaussian) distribution or, in our particular case, Student's t.
- t is used instead of the normal distribution when the sample sizes involved are small.
- The good thing is that standard distributions are tabulated. Significance levels can be obtained immediately from the tables.

23 Use the standard distribution

- Calculate the actual difference between means, say d = (m_b - m_a).
- Locate d in the histogram.
- Calculate the area of the histogram that falls to the right of d.
- That area is the probability that, purely by chance, we could obtain a difference between means of (m_b - m_a) or higher. We call it the p-value.
- If the p-value is below a cutoff value α (the significance level), we can affirm that techniques A and B are not alike. α is conventionally set at 0.05.
- We then say that we have obtained a significant result.

24 Back to the Example

- Observed difference (marked in the figure)
- Null hypothesis is not rejected
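The decision for the running example can be cross-checked with a pooled two-sample t statistic and a tabulated critical value. This sketch assumes equal variances and a one-sided test at α = 0.05; the critical value 1.833 is the tabulated t for 9 degrees of freedom, and `pooled_t` is an illustrative helper name.

```python
import math
import statistics

a = [29.9, 11.4, 25.3, 16.5, 21.1]
b = [26.6, 23.7, 28.5, 14.2, 17.9, 24.3]


def pooled_t(x, y):
    """t statistic for mean(y) - mean(x), assuming equal variances."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    std_error = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(y) - statistics.mean(x)) / std_error, nx + ny - 2


t0, df = pooled_t(a, b)
T_CRIT_ONE_SIDED = 1.833  # tabulated t for alpha = 0.05, df = 9

reject_h0 = t0 > T_CRIT_ONE_SIDED
print(round(t0, 2), df, "reject H0:", reject_h0)
```

The statistic stays well below the critical value, in line with the slide's conclusion that the null hypothesis is not rejected.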

25 Parametric Test / Independent Sample T-TEST

26 T-Test: one-factor experiments with one level

One-sample t-test

- Compares the mean response of a group against a specific value.
- The formula shows the general concept used by the following tests:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

- $\bar{x}$ = mean (of groups 1 and 2)
- µ0 = specified value (e.g., population mean)
- n = number of subjects in groups (1 and 2) (equal!)
- s = standard deviation of group (1 and 2)
- df = n - 1
- Look up t in Student's t-distribution table to obtain the p-value.

27 T-Test: one-factor experiments with two levels

Two-sample t-test

- Checks the statistical significance of the difference between the mean responses of two levels of a factor.
- Checks the null hypothesis that the samples belong to two subpopulations with the same mean.
- Prerequisites: the two sample sizes (that is, the number n of participants in each group) are equal, and the two distributions can be assumed to have the same variance.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{2/n}}, \qquad s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}}$$

- $\bar{x}_1, \bar{x}_2$ = means of groups 1 and 2
- n = number of subjects in each group (equal!)
- $s_1, s_2$ = standard deviations of groups 1 and 2
- $s_1^2, s_2^2$ = unbiased estimators of the variances
- df = 2n - 2
- H0: μ1 = μ2

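28 T-Test: one-factor experiments with two levels, special cases

- Unequal sample sizes, equal variance:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad df = n_1 + n_2 - 2$$

- Equal or unequal sample sizes, unequal variances (also known as Welch's t-test):

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$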

29 T-Test

Program defect density per project (data taken from Wohlin et al.):

| Project A | Project B |
|---|---|
| 3.42 | 3.44 |
| 2.71 | 4.97 |
| 2.84 | 4.76 |
| 1.85 | 4.96 |
| 3.22 | 4.10 |
| 3.48 | 3.05 |
| 2.68 | 4.09 |
| 4.30 | 3.69 |
| 2.49 | 4.21 |
| 1.54 | 4.40 |
| | 3.49 |

1. Calculate the means.
2. Calculate the difference of the means.
3. Use the formula (unequal N).
4. Check the obtained t value for the respective df in the t-distribution table.
5. Reject H0 if |t0| > t_{α/2,df} (two-sided) or if t0 > t_{α,df} (one-sided).
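The five steps can be traced in code for the defect-density data. This is a sketch under the slide's assumptions (equal variances, unequal N); the Welch statistic is added only for comparison. The critical value 2.093 is the tabulated two-sided t for α = 0.05 and df = 19, and all function names are illustrative.

```python
import math
import statistics

project_a = [3.42, 2.71, 2.84, 1.85, 3.22, 3.48, 2.68, 4.30, 2.49, 1.54]
project_b = [3.44, 4.97, 4.76, 4.96, 4.10, 3.05, 4.09, 3.69, 4.21, 4.40, 3.49]


def pooled_t(x, y):
    """Two-sample t statistic for unequal N, assuming equal variances."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    std_error = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / std_error, nx + ny - 2


def welch_t(x, y):
    """Welch's t statistic, which does not assume equal variances."""
    std_error = math.sqrt(statistics.variance(x) / len(x)
                          + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / std_error


# Steps 1-3: means, difference of means, formula for unequal N.
t0, df = pooled_t(project_a, project_b)
t_welch = welch_t(project_a, project_b)

# Steps 4-5: compare with the tabulated two-sided critical value.
T_CRIT_TWO_SIDED = 2.093  # alpha = 0.05, df = 19
reject_h0 = abs(t0) > T_CRIT_TWO_SIDED
print(round(t0, 2), df, round(t_welch, 2), "reject H0:", reject_h0)
```

The sign of t0 only reflects the order of subtraction (A minus B here); the two-sided decision uses its absolute value.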

30 t-distribution requirements

There are three requirements:

1. Samples must be independent and identically distributed (i.i.d.). In practice, this means that the assignment of levels (A's and B's) to experimental units (subjects) has to be randomized. i.i.d. implies homoscedasticity and non-interaction.
2. Accordingly, the mean estimator should be normally distributed (or close to normality).
3. Response variables are measured on ratio scales. Ordinal metrics cannot be used.

Condition #1 is probably more important than conditions #2 and #3.

31 Non-parametric tests

- If condition #2 does not hold (there are several tests to check normality) or condition #3 does not hold (only ordinal metrics are available), a non-parametric test can be applied.
- Condition #1 must still hold.
- The Wilcoxon rank-sum or Mann-Whitney test is one of the most popular tests. It is quite easy, but it requires a minimum sample size and has some technical problems (power calculation).

32 Parametric vs. non-parametric

- Obviously, the t-test is an instance of a parametric test.
- The main difference between both types of tests is the assumption about the distribution of the sample: non-parametric tests do not assume a particular distribution.
- Non-parametric tests can be applied in situations where parametric tests cannot, but in turn they are more conservative (less power).

33 Non-Parametric Test / Independent Sample MANN WHITNEY U TEST

34 Mann Whitney U test

- Non-parametric test for independent groups
- It has greater efficiency than the t-test on non-normal distributions.
- Prerequisites:
  - The responses are at least ordinal.
  - The distributions of both groups are equal under the null hypothesis.

35 Mann Whitney U test

Method 1: For small samples a direct method is recommended. It is very quick and gives an insight into the meaning of the U statistic.

- Choose the sample for which the ranks seem to be smaller (the only reason to do this is to make computation easier). Call this "sample 1", and call the other sample "sample 2".
- For each observation in sample 1, count the number of observations in sample 2 that have a smaller rank (count a half for any that are equal to it).
- The sum of these counts is U.

36 Mann Whitney U test

Method 2: For larger samples, a formula can be used.

- Add up the ranks for the observations that came from sample 1. Where there are tied groups, take the rank to be equal to the midpoint of the group.
- The sum of ranks in sample 2 is then determined, because the sum of all the ranks equals N(N + 1)/2, where N is the total number of observations.
- U is then given by:

$$U_i = n_1 n_2 + \frac{n_i (n_i + 1)}{2} - R_i, \qquad i = 1, 2$$

- $n_i$ = size of the respective group, $R_i$ = sum of ranks for the respective group; the two values satisfy $U_1 + U_2 = n_1 n_2$.
- Reject H0 if min(U1, U2) is <= the critical value from the Mann-Whitney table.

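37 Mann Whitney U test

Program defect density per project with ranks (data taken from Wohlin et al.):

| Project A | Rank | Project B | Rank |
|---|---|---|---|
| 3.42 | 9 | 3.44 | 10 |
| 2.71 | 5 | 4.97 | 21 |
| 2.84 | 6 | 4.76 | 19 |
| 1.85 | 2 | 4.96 | 20 |
| 3.22 | 8 | 4.10 | 15 |
| 3.48 | 11 | 3.05 | 7 |
| 2.68 | 4 | 4.09 | 14 |
| 4.30 | 17 | 3.69 | 13 |
| 2.49 | 3 | 4.21 | 16 |
| 1.54 | 1 | 4.40 | 18 |
| | | 3.49 | 12 |
| Sum of ranks | 66 | | 165 |

- U1 = 99 (use formula)
- U2 = 11 (use formula)
- Check min(U1, U2) in the table, using n of the smaller sample (10) and n of the larger sample (11).
- 11 <= 26: reject H0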
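Both methods can be checked against the numbers on this slide. The sketch below assumes the rank-sum form U_i = n1·n2 + n_i(n_i + 1)/2 − R_i, which reproduces U1 = 99 and U2 = 11; the helper names are illustrative. Note that the direct count for sample A yields 11, i.e. the smaller of the two U values, so the decision based on min(U1, U2) is the same either way.

```python
project_a = [3.42, 2.71, 2.84, 1.85, 3.22, 3.48, 2.68, 4.30, 2.49, 1.54]
project_b = [3.44, 4.97, 4.76, 4.96, 4.10, 3.05, 4.09, 3.69, 4.21, 4.40, 3.49]


def average_ranks(values):
    """Rank values from 1..N; tied values share the midpoint rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return ranks


def u_direct(sample1, sample2):
    """Method 1: count smaller sample-2 values, half a count for ties."""
    return sum((y < x) + 0.5 * (y == x) for x in sample1 for y in sample2)


def u_from_ranks(sample1, sample2):
    """Method 2: rank sums and the resulting U values for both samples."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(sample1 + sample2)
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, u1, u2


r1, r2, u1, u2 = u_from_ranks(project_a, project_b)
u_counted = u_direct(project_a, project_b)

U_CRIT = 26  # table value from the slide for sample sizes 10 and 11
reject_h0 = min(u1, u2) <= U_CRIT
print(r1, r2, u1, u2, u_counted, "reject H0:", reject_h0)
```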

38 Parametric Test / Dependent Sample PAIRED T-TEST

39 Paired T-Test

- Parametric test for dependent samples, e.g., repeated measures or matched pairs
- The differences between all pairs must be calculated.

$$t = \frac{\bar{d} - \mu_0}{s_D / \sqrt{n}}$$

- $\bar{d}$ = mean of the differences between pairs
- µ0 = (optional) specified value (e.g., population mean)
- n = number of subjects
- $s_D$ = standard deviation of the differences (1 and 2)
- df = n - 1

40 Example Paired T-Test

1. Calculate the differences (P1 - P2).
2. Calculate the mean of the differences.
3. Calculate the std. dev. of the differences.
4. Use the formula.
5. Check the t value for the respective df in the table.
6. Reject H0 if |t0| > t_{α/2,df} (two-sided) or if t0 > t_{α,df} (one-sided).

Summary of the per-programmer table (columns P1, P2, P1 - P2):

| | P1 | P2 | P1 - P2 |
|---|---|---|---|
| N | 10 | 10 | 10 |
| mean | 131.10 | 127.73 | 3.37 |
| variance | 627.… | ….60 | 748.46 |
| std. dev. | 25.04 | 39.54 | 27.36 |
| df (N - 1) | | | 9 |

41 T-Test Table

- Reject H0 if |t0| > t_{α/2,df} (two-sided).
- t0 = 3.37 / (27.36 / √10) ≈ 0.39 < t_{0.025,9} = 2.262 ⇒ do not reject H0!
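Because the per-programmer values are not available here, the paired t statistic can only be recomputed from the summary values of the example (mean and standard deviation of the differences, N = 10). A minimal sketch; the function name is illustrative, and 2.262 is the tabulated two-sided critical value for α = 0.05 and df = 9.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n, mu0=0.0):
    """Paired t statistic from the mean and std. dev. of the differences."""
    return (mean_diff - mu0) / (sd_diff / math.sqrt(n)), n - 1


t0, df = paired_t_from_summary(mean_diff=3.37, sd_diff=27.36, n=10)

T_CRIT_TWO_SIDED = 2.262  # alpha = 0.05, df = 9
reject_h0 = abs(t0) > T_CRIT_TWO_SIDED
print(round(t0, 2), df, "reject H0:", reject_h0)
```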

42 Table for T-Test SPSS Outputs

43 Non Parametric Test / Dependent Sample WILCOXON SIGN TEST

44 Wilcoxon

- Non-parametric alternative to the paired t-test for dependent samples
- Prerequisite: it must be possible to determine which value of a pair is larger and to rank the differences.
- d = P1 - P2; the absolute differences are ranked, and the ranks are summed separately for negative and positive d.
- Summary of the per-programmer table: N = 10; means 131.10 (P1), 127.73 (P2), 3.37 (P1 - P2); std. dev. 25.04, 39.54, 27.36
- T1 = 23 (rank sum of negative d)
- T2 = 32 (rank sum of positive d)
- Check min(T1, T2) in the table: 23 is not <= 8 ⇒ do not reject H0
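The rank sums T1 and T2 follow mechanically from the differences. Since the individual differences of the example are not available here, the sketch below uses made-up differences (clearly hypothetical, chosen only to show the mechanics); the helper name `signed_rank_sums` is also an assumption.

```python
def signed_rank_sums(differences):
    """Return (rank sum of negative d, rank sum of positive d).

    Zero differences are dropped; tied absolute values share the
    midpoint rank.
    """
    nonzero = [d for d in differences if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]])):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    t_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    t_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    return t_minus, t_plus


# Hypothetical differences P1 - P2 for ten programmers (NOT the slide data).
hypothetical_d = [4, -2, 7, -1, 3, -6, 5, 8, -9, 10]
t_minus, t_plus = signed_rank_sums(hypothetical_d)

T_CRIT = 8  # table value used on the slide for n = 10
reject_h0 = min(t_minus, t_plus) <= T_CRIT
print(t_minus, t_plus, "reject H0:", reject_h0)
```

As a sanity check, the two rank sums always add up to n(n + 1)/2 for n nonzero differences, which also holds for the slide's values (23 + 32 = 55).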

45 Sign Test

- Non-parametric alternative to the paired t-test for dependent samples
- Used if it is not possible to rank the differences, but the scale is still at least ordinal
- Based on the signs of the differences (d = P1 - P2)
- Formula: $p = \frac{1}{2^N} \sum_{i=0}^{n} \binom{N}{i}$, where N is the number of pairs and n = min(T1, T2)
- Summary of the per-programmer table: N = 10; means 131.10 (P1), 127.73 (P2), 3.37 (P1 - P2); std. dev. 25.04, 39.54, 27.36
- T1 = 6 (# negative d), T2 = 4 (# positive d), n = min(T1, T2)
- Result: do not reject H0!
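The sign test reduces to a binomial tail probability. The sketch below assumes the usual form p = (1/2^N) Σ_{i=0..n} C(N, i) with N pairs and n = min(T1, T2), applied to the counts on the slide (N = 10, n = 4); `sign_test_p` is an illustrative name.

```python
from math import comb


def sign_test_p(n_pairs, n_min):
    """One-sided sign-test probability of n_min or fewer of the rarer sign."""
    return sum(comb(n_pairs, i) for i in range(n_min + 1)) / 2 ** n_pairs


t1, t2 = 6, 4  # numbers of negative and positive differences on the slide
p = sign_test_p(t1 + t2, min(t1, t2))

ALPHA = 0.05
reject_h0 = p < ALPHA / 2  # two-sided decision
print(round(p, 3), "reject H0:", reject_h0)
```

A split of 6 versus 4 signs is very plausible under H0, so the test does not reject, consistent with the slide.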

46 Parametric Methods / Independent Sample ONE FACTOR ANOVA

47 ONE-FACTOR ANOVA

- One-factor experiments with more than two levels
- Checks the statistical significance of the difference between the mean responses of one factor with several levels

$$Y_{ij} = \mu + \alpha_j + e_{ij}, \qquad \hat{\mu} = \bar{Y}, \qquad \hat{\alpha}_j = \bar{Y}_j - \bar{Y}$$

Steps:

1. Identify the mathematical model.
2. Validate the basic model that relates the experimental variables.
3. Calculate the factor-induced variation in the response variable.
4. Calculate the statistical significance of the factor-induced variation.
5. Establish consequences or recommendations on the alternative that provides the best response variable values.
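Steps 3 and 4 amount to splitting the total variation into a between-group (factor-induced) and a within-group (error) part and forming their ratio F. The sketch below uses small made-up groups (hypothetical, not data from the lecture) and an illustrative function name; obtaining a p-value would additionally require an F-distribution table or a statistics library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Factor-induced variation: group means around the grand mean.
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    # Error variation: observations around their own group mean.
    ss_within = sum((value - m) ** 2
                    for group, m in zip(groups, group_means)
                    for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical responses for three levels of one factor.
levels = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = one_way_anova_f(levels)
print(round(f_value, 2), df_between, df_within)
```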

48 Example: ANOVA

- Factor = programming language; levels = {ADA, C, C++, JAVA}
- Response variable = number of errors detected during the three months after development ("Quality")
- Number of subjects = 24
- H0: the programming language has no effect on the quality of the program
- Table: N and mean per programming language (ADA, C, C++, JAVA); grand mean = 64

49 Example: ANOVA

Results: Descriptives

- ADA leads to a quality of 61 ± 1.83.

50 Example: ANOVA

Results:

- Test of the variances: do not reject H0. There are no significant differences between the variances of the groups, so the variances are treated as equal.
- There is a statistically significant difference between groups as determined by one-way ANOVA (F = …, p = .021).
- What do we know now?

51 Example: ANOVA

Post hoc tests

- Scheffé is used because the group sizes (N) differ; otherwise Tukey is preferred.
- There is a statistically significant difference between ADA and C (C++), p = 0.032 (p = 0.002), and between JAVA and C (C++), p = 0.009 (p = 0.000).
- There is no difference between ADA and JAVA, nor between C and C++.

52 Example: ANOVA Homogeneous Subsets

53 Example: ANOVA Means Plot

54 Further Analysis

- Two-way ANOVA
- MANOVA
- rANOVA
- Multitude of other tests

55 DECISION TREE

56 References

- Wohlin, Runeson, Höst, Ohlsson, Regnell, Wesslén (2012). Experimentation in Software Engineering. Springer.
- J. Bortz and N. Döring (2006). Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler (4. Auflage). Berlin: Springer Verlag.
- N. Juristo and A. Moreno (2001). Basics of Software Engineering Experimentation. Kluwer Academic Publishers.

### How to choose a statistical test. Francisco J. Candido dos Reis DGO-FMRP University of São Paulo

How to choose a statistical test Francisco J. Candido dos Reis DGO-FMRP University of São Paulo Choosing the right test One of the most common queries in stats support is Which analysis should I use There

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### CHAPTER 3 COMMONLY USED STATISTICAL TERMS

CHAPTER 3 COMMONLY USED STATISTICAL TERMS There are many statistics used in social science research and evaluation. The two main areas of statistics are descriptive and inferential. The third class of

### SPSS Tests for Versions 9 to 13

SPSS Tests for Versions 9 to 13 Chapter 2 Descriptive Statistic (including median) Choose Analyze Descriptive statistics Frequencies... Click on variable(s) then press to move to into Variable(s): list

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Statistics and research

Statistics and research Usaneya Perngparn Chitlada Areesantichai Drug Dependence Research Center (WHOCC for Research and Training in Drug Dependence) College of Public Health Sciences Chulolongkorn University,

### Analysis of numerical data S4

Basic medical statistics for clinical and experimental research Analysis of numerical data S4 Katarzyna Jóźwiak k.jozwiak@nki.nl 3rd November 2015 1/42 Hypothesis tests: numerical and ordinal data 1 group:

### Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

### INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

### An example ANOVA situation. 1-Way ANOVA. Some notation for ANOVA. Are these differences significant? Example (Treating Blisters)

An example ANOVA situation Example (Treating Blisters) 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

### Supplement on the Kruskal-Wallis test. So what do you do if you don t meet the assumptions of an ANOVA?

Supplement on the Kruskal-Wallis test So what do you do if you don t meet the assumptions of an ANOVA? {There are other ways of dealing with things like unequal variances and non-normal data, but we won

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### Seminar paper Statistics

Seminar paper Statistics The seminar paper must contain: - the title page - the characterization of the data (origin, reason why you have chosen this analysis,...) - the list of the data (in the table)

### Module 9: Nonparametric Tests. The Applied Research Center

Module 9: Nonparametric Tests The Applied Research Center Module 9 Overview } Nonparametric Tests } Parametric vs. Nonparametric Tests } Restrictions of Nonparametric Tests } One-Sample Chi-Square Test

### Lecture - 32 Regression Modelling Using SPSS

Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 32 Regression Modelling Using SPSS (Refer

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

### Nonparametric tests, Bootstrapping

Nonparametric tests, Bootstrapping http://www.isrec.isb-sib.ch/~darlene/embnet/ Hypothesis testing review 2 competing theories regarding a population parameter: NULL hypothesis H ( straw man ) ALTERNATIVEhypothesis

### Using SPSS version 14 Joel Elliott, Jennifer Burnaford, Stacey Weiss

Using SPSS version 14 Joel Elliott, Jennifer Burnaford, Stacey Weiss SPSS is a program that is very easy to learn and is also very powerful. This manual is designed to introduce you to the program however,

### L.8: Analysing continuous data

L.8: Analysing continuous data - Types of variables - Comparing two means: - independent samples - Comparing two means: - dependent samples - Checking the assumptions - Nonparametric test Types of variables

### Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

### 1. Why the hell do we need statistics?

1. Why the hell do we need statistics? There are three kind of lies: lies, damned lies, and statistics, British Prime Minister Benjamin Disraeli (as credited by Mark Twain): It is easy to lie with statistics,

### Box plots & t-tests. Example

Box plots & t-tests Box Plots Box plots are a graphical representation of your sample (easy to visualize descriptive statistics); they are also known as box-and-whisker diagrams. Any data that you can

### Lecture 7: Binomial Test, Chisquare

Lecture 7: Binomial Test, Chisquare Test, and ANOVA May, 01 GENOME 560, Spring 01 Goals ANOVA Binomial test Chi square test Fisher s exact test Su In Lee, CSE & GS suinlee@uw.edu 1 Whirlwind Tour of One/Two

### UNDERSTANDING THE ONE-WAY ANOVA

UNDERSTANDING The One-way Analysis of Variance (ANOVA) is a procedure for testing the hypothesis that K population means are equal, where K >. The One-way ANOVA compares the means of the samples or groups

### Inferences About Differences Between Means Edpsy 580

Inferences About Differences Between Means Edpsy 580 Carolyn J. Anderson Department of Educational Psychology University of Illinois at Urbana-Champaign Inferences About Differences Between Means Slide

### Data and Regression Analysis. Lecturer: Prof. Duane S. Boning. Rev 10

Data and Regression Analysis Lecturer: Prof. Duane S. Boning Rev 10 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance (ANOVA) 2. Multivariate Analysis of Variance Model forms 3.

### Chapter 7. One-way ANOVA

Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

### Inferential Statistics. Probability. From Samples to Populations. Katie Rommel-Esham Education 504

Inferential Statistics Katie Rommel-Esham Education 504 Probability Probability is the scientific way of stating the degree of confidence we have in predicting something Tossing coins and rolling dice

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

### 1.5 Oneway Analysis of Variance

Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

### c. The factor is the type of TV program that was watched. The treatment is the embedded commercials in the TV programs.

STAT E-150 - Statistical Methods Assignment 9 Solutions Exercises 12.8, 12.13, 12.75 For each test: Include appropriate graphs to see that the conditions are met. Use Tukey's Honestly Significant Difference

### Outline of Topics. Statistical Methods I. Types of Data. Descriptive Statistics

Statistical Methods I Tamekia L. Jones, Ph.D. (tjones@cog.ufl.edu) Research Assistant Professor Children s Oncology Group Statistics & Data Center Department of Biostatistics Colleges of Medicine and Public

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### T adult = 96 T child = 114.

Homework Solutions Do all tests at the 5% level and quote p-values when possible. When answering each question uses sentences and include the relevant JMP output and plots (do not include the data in your

### Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

### Difference tests (2): nonparametric

NST 1B Experimental Psychology Statistics practical 3 Difference tests (): nonparametric Rudolf Cardinal & Mike Aitken 10 / 11 February 005; Department of Experimental Psychology University of Cambridge

### Introduction to Stata

Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the mid-range of how easy it is to use. Other options include SPSS,

### Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### ANSWERS TO EXERCISES AND REVIEW QUESTIONS

ANSWERS TO EXERCISES AND REVIEW QUESTIONS PART FIVE: STATISTICAL TECHNIQUES TO COMPARE GROUPS Before attempting these questions read through the introduction to Part Five and Chapters 16-21 of the SPSS

### Hypothesis Testing hypothesis testing approach formulation of the test statistic

Hypothesis Testing For the next few lectures, we re going to look at various test statistics that are formulated to allow us to test hypotheses in a variety of contexts: In all cases, the hypothesis testing

### Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

### Statistiek I. t-tests. John Nerbonne. CLCG, Rijksuniversiteit Groningen. John Nerbonne 1/35

Statistiek I t-tests John Nerbonne CLCG, Rijksuniversiteit Groningen http://wwwletrugnl/nerbonne/teach/statistiek-i/ John Nerbonne 1/35 t-tests To test an average or pair of averages when σ is known, we

### Intro to Parametric Statistics,

Descriptive Statistics vs. Inferential Statistics vs. Population Parameters Intro to Parametric Statistics, Assumptions & Degrees of Freedom Some terms we will need Normal Distributions Degrees of freedom

### Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

### On Importance of Normality Assumption in Using a T-Test: One Sample and Two Sample Cases

On Importance of Normality Assumption in Using a T-Test: One Sample and Two Sample Cases Srilakshminarayana Gali, SDM Institute for Management Development, Mysore, India. E-mail: lakshminarayana@sdmimd.ac.in

### For example, enter the following data in three COLUMNS in a new View window.

Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

### Rank-Based Non-Parametric Tests

Rank-Based Non-Parametric Tests Reminder: Student Instructional Rating Surveys You have until May 8 th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs

### QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

### MEASURES OF LOCATION AND SPREAD

Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

### 3. Nonparametric methods

3. Nonparametric methods If the probability distributions of the statistical variables are unknown or are not as required (e.g. normality assumption violated), then we may still apply nonparametric tests

### ACTM State Exam-Statistics

ACTM State Exam-Statistics For the 25 multiple-choice questions, make your answer choice and record it on the answer sheet provided. Once you have completed that section of the test, proceed to the tie-breaker

### Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page 1: NON-PARAMETRIC TESTS: What are non-parametric tests? Statistical tests fall into two kinds: parametric tests assume that

### Comparing two groups (t tests...)

Page 1 of 33 Comparing two groups (t tests...) You've measured a variable in two groups, and the means (and medians) are distinct. Is that due to chance? Or does it tell you the two groups are really different?

### 1 Measures for location and dispersion of a sample

Statistical Geophysics WS 2008/09 7..2008 Christian Heumann und Helmut Küchenhoff Measures for location and dispersion of a sample Measures for location and dispersion of a sample In the following: Variable

### Statistics: revision

NST 1B Experimental Psychology Statistics practical 5 Statistics: revision Rudolf Cardinal & Mike Aitken 3 / 4 May 2005 Department of Experimental Psychology University of Cambridge Slides at pobox.com/~rudolf/psychology

### The Statistics Tutor s

statstutor community project encouraging academics to share statistics support resources All stcp resources are released under a Creative Commons licence Stcp-marshallowen-7 The Statistics Tutor s www.statstutor.ac.uk

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business Statistics, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Chapter 16 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 16 Multiple Choice Questions (The answers are provided after the last question.) 1. Which of the following symbols represents a population parameter? a. SD b. σ c. r d. 0 2. If you drew all possible

### UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA We have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

### Review Statistics review 9: One-way analysis of variance Viv Bewick 1, Liz Cheek 1 and Jonathan Ball 2

Review Statistics review 9: One-way analysis of variance Viv Bewick 1, Liz Cheek 1 and Jonathan Ball 1 Senior Lecturer, School of Computing, Mathematical and Information Sciences, University of Brighton,

### Hypothesis Testing. Chapter 7

Hypothesis Testing Chapter 7 Hypothesis Testing Time to make the educated guess after answering: What the population is, how to extract the sample, what characteristics to measure in the sample, After

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction 2. Hypothesis Testing 3. Hypothesis Tests Concerning the Mean 4. Hypothesis Tests

### Analysis of Variance ANOVA

Analysis of Variance ANOVA Overview We've used the t-test to compare the means from two independent groups. Now we've come to the final topic of the course: how to compare means from more than two populations.

### Statistics Review PSY379

Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

### Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Readings: Ha and Ha Textbook - Chapters 1 8 Appendix D & E (online) Plous - Chapters 10, 11, 12 and 14 Chapter 10: The Representativeness Heuristic Chapter 11: The Availability Heuristic Chapter 12: Probability

### Biostatistics Lab Notes

Biostatistics Lab Notes Page 1 Lab 1: Measurement and Sampling Biostatistics Lab Notes Because we used a chance mechanism to select our sample, each sample will differ. My data set (GerstmanB.sav), looks

### Week 7 Lecture: Two-way Analysis of Variance (Chapter 12) Two-way ANOVA with Equal Replication (see Zar's section 12.1)

Week 7 Lecture: Two-way Analysis of Variance (Chapter 12) We can extend the idea of a one-way ANOVA, which tests the effects of one factor on a response variable, to a two-way ANOVA which tests the effects

### One Way ANOVA. A method for comparing several means along a single variable

Analysis of Variance (ANOVA) One Way ANOVA A method for comparing several means along a single variable. It is the same as an independent-samples t test, but for 3 or more samples. Called one way when

### Contents 1. Contents

Contents: 3 K-sample Methods; 3.1 Setup; 3.2 Classic Method Based on Normality Assumption; 3.3 Permutation F-test; 3.4 Kruskal-Wallis

### One-Way Analysis of Variance

Spring, 000 - - Administrative Items One-Way Analysis of Variance Midterm Grades. Make-up exams, in general. Getting help See me today -:0 or Wednesday from -:0. Send an e-mail to stine@wharton. Visit

### Unit 21 Student's t Distribution in Hypotheses Testing

Unit 21 Student's t Distribution in Hypotheses Testing Objectives: To understand the difference between the standard normal distribution and the Student's t distributions To understand the difference between

### Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### Statistics for Management II-STAT 362-Final Review

Statistics for Management II-STAT 362-Final Review Multiple Choice Identify the letter of the choice that best completes the statement or answers the question. 1. The ability of an interval estimate to

### Suggested solution for exam in MSA830: Statistical Analysis and Experimental Design October 2009

Petter Mostad Matematisk Statistik Chalmers Suggested solution for exam in MSA830: Statistical Analysis and Experimental Design October 2009 1. (a) To use a t-test, one must assume that both groups of

### Hypothesis Testing. Dr. Bob Gee Dean Scott Bonney Professor William G. Journigan American Meridian University

Hypothesis Testing Dr. Bob Gee Dean Scott Bonney Professor William G. Journigan American Meridian University 1 AMU / Bon-Tech, LLC, Journi-Tech Corporation Copyright 2015 Learning Objectives Upon successful

### CREIGHTON UNIVERSITY GRADUATE COLLEGE Fall Semester 2014. Biostatistics & Analysis of Clinical Data for Evidence-based Practice

CREIGHTON UNIVERSITY GRADUATE COLLEGE Fall Semester 2014 Course Number: Course Title: Credit Allocation: Placement: CTS 601 Biostatistics & Analysis of Clinical Data for Evidence-based Practice 3 semester

### Non-Parametric Two-Sample Analysis: The Mann-Whitney U Test

Non-Parametric Two-Sample Analysis: The Mann-Whitney U Test When samples do not meet the assumption of normality parametric tests should not be used. To overcome this problem, non-parametric tests can

### ANOVA Analysis of Variance

ANOVA Analysis of Variance What is ANOVA and why do we use it? Can test hypotheses about mean differences between more than 2 samples. Can also make inferences about the effects of several different IVs,

### Data analysis process

Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis

### Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2

Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2 Note: Whether we calculate confidence intervals or perform hypothesis tests we need the distribution of the statistic we will use.

### SPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011

SPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011 Statistical techniques to be covered Explore relationships among variables Correlation Regression/Multiple regression Logistic regression Factor analysis

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction Principal Components Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Business Statistics. Lecture 8: More Hypothesis Testing

Business Statistics Lecture 8: More Hypothesis Testing 1 Goals for this Lecture Review of t-tests Additional hypothesis tests Two-sample tests Paired tests 2 The Basic Idea of Hypothesis Testing Start

### CHAPTER 14 NONPARAMETRIC TESTS

CHAPTER 14 NONPARAMETRIC TESTS Everything that we have done up until now in statistics has relied heavily on one major fact: that our data is normally distributed. We have been able to make inferences

### Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### Chapter 3: Nonparametric Tests

B. Weaver (15-Feb-00) Nonparametric Tests... 1 Chapter 3: Nonparametric Tests 3.1 Introduction Nonparametric, or distribution free tests are so-called because the assumptions underlying their use are fewer

### Factorial Analysis of Variance

Chapter 560 Factorial Analysis of Variance Introduction A common task in research is to compare the average response across levels of one or more factor variables. Examples of factor variables are income

### Hypothesis Testing & Data Analysis. Statistics. Descriptive Statistics. What is the difference between descriptive and inferential statistics?

2 Hypothesis Testing & Data Analysis 5 What is the difference between descriptive and inferential statistics? Statistics 8 Tools to help us understand our data. Makes a complicated mess simple to understand.

### One-sample normal hypothesis Testing, paired t-test, two-sample normal inference, normal probability plots

1 / 27 One-sample normal hypothesis Testing, paired t-test, two-sample normal inference, normal probability plots Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis

### Unit 24 Hypothesis Tests about Means

Unit 24 Hypothesis Tests about Means Objectives: To recognize the difference between a paired t test and a two-sample t test To perform a paired t test To perform a two-sample t test A measure of the amount

### Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

### Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

### Comparing Means in Two Populations

Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we