Hypothesis Testing

For the next few lectures, we're going to look at various test statistics that are formulated to allow us to test hypotheses in a variety of contexts. In all cases, the hypothesis testing approach uses the same multi-step process:

1. State the null hypothesis ($H_0$)
2. State the alternative hypothesis ($H_A$)
3. Choose α, our significance level
4. Select a statistical test, and calculate the test statistic
5. Determine the critical value where $H_0$ will be rejected
6. Compare the test statistic with the critical value

What differs is the formulation of the test statistic (which distribution, which formula).
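Steps 5 and 6 come down to a single comparison once the test statistic and critical value are in hand. The following is a minimal Python sketch of that comparison; the function name `reject_null` and the `tails` parameter are illustrative choices rather than anything from the lecture, and the one-tailed branch assumes an upper-tail test.

```python
def reject_null(test_stat: float, critical_value: float, tails: int = 2) -> bool:
    """Return True when the test statistic falls in the rejection region.

    Assumes critical_value is the positive table value for the chosen alpha.
    A two-tailed test compares |test_stat| to it; a one-tailed test is
    treated here as an upper-tail test (an assumption of this sketch).
    """
    if tails == 2:
        return abs(test_stat) > critical_value
    if tails == 1:
        return test_stat > critical_value
    raise ValueError("tails must be 1 or 2")


# Hypothetical numbers chosen only to exercise the rule.
print(reject_null(2.4, 2.0))   # statistic lies beyond the critical value
print(reject_null(-1.1, 2.0))  # statistic lies inside the critical value
```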

Hypothesis Testing - t-distributions
t-distributions are very similar in shape to the normal distribution, but their tails contain a larger proportion of the total area than those of a normal distribution, and the crown of the distribution is slightly flatter than in the normal case.

This allows t-distributions to account for the higher variability we expect to find when using smaller sample sizes (i.e. this reflects the uncertainty in estimating σ from s, which grows as n gets quite small; as n increases, s becomes a more reliable estimate of σ and the t-distribution approaches the normal distribution).

Hypothesis Testing - t-distributions
[Figure: a normal distribution curve compared with a t-distribution curve.]
Source: Earickson, RJ, and Harlin, JM. 1994. Geographic Measurement and Quantitative Analysis. USA: Macmillan College Publishing Co., p. 1066.
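The heavier tails show up directly in the critical values: for the same α, a t critical value is larger than the normal one, and the gap shrinks as the degrees of freedom grow. The sketch below illustrates this with Python's standard library. The t values are copied from a standard two-tailed t-table for α = 0.05 (they are not computed here), and the choice of degrees of freedom is only for illustration.

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed critical value of the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Two-tailed critical t values for alpha = 0.05, taken from a standard t-table.
t_table = {5: 2.571, 10: 2.228, 30: 2.042}

print(f"z critical: {z_crit:.3f}")
for df, t_crit in t_table.items():
    print(f"df={df:>2}: t critical = {t_crit:.3f}, excess over z = {t_crit - z_crit:.3f}")
```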

Hypothesis Testing - t-tests
We can formulate t-tests that can be used to address a range of different kinds of situations:

1. The basic one-sample t-test is used in much the same way as the basic z-test, but it applies in instances where the sample size is less than or equal to 30
2. We can use a properly formulated two-sample t-test for the mean to compare the means of a pair of samples, to assess whether the samples are drawn from the same population and/or whether they are significantly different from one another

Hypothesis Testing - t-tests
3. In a case where we are looking at paired observations (i.e. two samples that are collected in such a way that each observation in one sample matches up with its counterpart in the other sample), the observations are less independent, and this needs to be taken into account in the formulation of the t-test, which in this case is known as a paired comparison or matched pairs t-test (e.g. measuring soil moisture at a set of 10 locations before and after a rainfall event)

Hypothesis Testing - Matched Pairs t-tests
Matched pairs t-tests are used to compare one sample mean with another sample mean when the two samples consist of observations that were not sampled in a totally random or independent fashion, but instead represent repeated samples of the same members of a population, usually for purposes of assessing change over time (e.g. growth rates for the same cities in two decades).

When inherent pairing exists between the elements in two samples, they are not independent, and we cannot treat them as such and use a two-sample t-test to compare them. Instead, we use the matched pairs t-test, which calculates the difference between the values of the members of each pair.

Hypothesis Testing - Matched Pairs t-tests
The form of the test statistic is based upon the calculated differences between the two samples:

$$t_{test} = \frac{\bar{d}}{s_d / \sqrt{n}}$$

where $\bar{d}$ is the average of the differences, $n$ is the number of pairs, and

$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$$

We use this test statistic:

1. To compare the sample means of paired samples
2. When the size of the samples is somewhat small, i.e. n ≤ 30
3. When the two samples contain members that were not sampled at random but represent observations of the same entities, usually at different times or after some treatment has been applied

Hypothesis Testing - Matched Pairs t-test Example
Data: Suppose we are interested in comparing the population growth rates (in %) for 6 cities in 2 decades (1971-1980 and 1981-1990):

City   1971-1980   1981-1990   Difference
1      10          8           2
2      8           7           1
3      7           7           0
4      6           6           0
5      6           5           1
6      5           3           2

These are matched pairs, so we must first calculate the differences between the sample values.

Hypothesis Testing - Matched Pairs t-test Example
Research question: Are the growth rates for the two decades different for the set of 6 cities we have sampled?

1. $H_0$: $\bar{d} = 0$ (No significant difference in growth rate)
2. $H_A$: $\bar{d} \neq 0$ (Growth rates are significantly different)
3. Select α = 0.05, two-tailed because of how the alternative hypothesis is formulated
4. In order to compute the t-test statistic, we need to generate some descriptive statistics of the differences between the growth rates in the 6 cities over the two decades: we need the mean difference in rates, along with the statistical distances between the individual differences and the mean difference

Hypothesis Testing - Matched Pairs t-test Example
4. (cont.) We calculate the mean difference using:

$$\bar{d} = \frac{\sum d_i}{n} = \frac{2+1+0+0+1+2}{6} = \frac{6}{6} = 1$$

We calculate the standard deviation of the differences by summing the squared statistical distances of the differences from their mean, dividing by the sample size minus one, and taking the square root:

$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{(2-1)^2 + (1-1)^2 + (0-1)^2 + (0-1)^2 + (1-1)^2 + (2-1)^2}{6-1}} = \sqrt{\frac{(1)^2 + (0)^2 + (-1)^2 + (-1)^2 + (0)^2 + (1)^2}{5}} = \sqrt{\frac{4}{5}} = 0.89$$

Hypothesis Testing - Matched Pairs t-test Example
4. (cont.) We now have the quantities that we need to plug into the test statistic:

$$t_{test} = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{1}{0.89 / \sqrt{6}} = \frac{1}{0.365} = 2.738$$

5. We now need to find the critical t-score, first calculating the degrees of freedom: df = (n - 1) = (6 - 1) = 5. We can now look up the $t_{crit}$ value for our α (0.025 in each tail) and df = 5: $t_{crit}$ = 2.571

Hypothesis Testing - Matched Pairs t-test Example
6. $t_{test} > t_{crit}$, therefore we reject $H_0$ and accept $H_A$, finding that there is a significant difference in the growth rates of these six cities when the two samples of decadal growth rates are compared to one another
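The whole worked example can be checked in a few lines. The sketch below recomputes the six-city matched pairs test with Python's standard library; the data and the critical value 2.571 come from the example above, while the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Growth rates (%) for the six cities, from the example table.
growth_1971_1980 = [10, 8, 7, 6, 6, 5]
growth_1981_1990 = [8, 7, 7, 6, 5, 3]

# Matched pairs: work with the per-city differences.
diffs = [a - b for a, b in zip(growth_1971_1980, growth_1981_1990)]
n = len(diffs)

d_bar = mean(diffs)   # mean difference
s_d = stdev(diffs)    # sample standard deviation (divides by n - 1)
t_test = d_bar / (s_d / math.sqrt(n))
df = n - 1

t_crit = 2.571        # two-tailed, alpha = 0.05, df = 5 (value from the example)
reject_h0 = abs(t_test) > t_crit

print(f"d_bar = {d_bar}, s_d = {s_d:.3f}, t = {t_test:.3f}, df = {df}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Carrying full precision gives t of about 2.739, which agrees with the hand calculation to within rounding.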

Using Excel to Conduct t-tests
Through its Analysis ToolPak, Microsoft Excel has the ability to calculate many of the test statistics that you have already been introduced to in recent lectures, including the 3 types of t-tests that we have examined.

Excel will calculate the test statistic, and in many cases it will also report the critical value for the test (based on an α value that you specify), and sometimes it will even calculate and report the significance probability associated with the calculated test statistic.

What Excel will not do is tell you which test to use in a particular situation, nor does it give clear indications of how to use the tests or interpret the results.

Starting the Analysis ToolPak
In the Tools menu, you should see an item entitled Data Analysis. If Data Analysis is absent from this menu, we will need to activate it before proceeding.

Starting the Analysis ToolPak
In the Tools menu, click on Add-ins, and this should bring up the Add-ins menu. Check off the Analysis ToolPak and/or Analysis ToolPak - VBA, and then click OK. Data Analysis should now be present in the Tools menu.

Selecting an Analysis Tool
You should now see the Data Analysis window, which lets you choose the analysis tool you wish to use. For example, you'll find the variance ratio test (F-Test) listed near the top of the menu, and the t-tests are below it.

Selecting an Analysis Tool
Excel provides tools to calculate the last four test statistics we've examined in this course:

1. The equal variances assumption can be tested using the F-Test Two-Sample for Variances
2. For situations where variances are found to be equal (homoscedastic), we can use the t-test Two-Sample Assuming Equal Variances
3. For situations where variances are unequal (heteroscedastic), we can use the t-test Two-Sample Assuming Unequal Variances
4. For situations where we have paired samples we wish to test, we can use the t-test Paired Two-Sample for Means tool

Using Analysis Tools (Filling in the Fields)
The procedure for using an Analysis Tool is more or less the same for any of them. Select the desired Analysis Tool from the menu by highlighting it in the Data Analysis window and clicking OK.

Here, I have selected the F-Test Two-Sample for Variances to compare two samples' variances. Once I click OK, this will bring up a pop-up window for the particular Analysis Tool.

Using Analysis Tools (Filling in the Fields)
Here's the F-Test Two-Sample for Variances Analysis Tool window. To use the tool, I need to fill in the required fields to specify the data to be tested, the significance level (α), and where I want the output written.

Some other tests require more parameters; this is a very simple tool that has few fields and required parameters.

Using Analysis Tools (Filling in the Fields)
Before we fill in the fields, it's useful to have the data set up in the worksheet in a fashion that makes it easy to select the cells that contain the samples that we want to compare.

For example, suppose I want to compare variances of soil moisture samples taken at Pond Branch and Glyndon on the same day. I should set each sample up in its own column, with a label that describes what is in the column.

Using Analysis Tools (Filling in the Fields)
The Range fields are filled in by highlighting those cells that contain each of the samples. I have checked off Labels because I have included the label cells (XXTHETA) in the Ranges.

Using Analysis Tools (Filling in the Fields)
All that remains is to specify our α value, which is set to 0.05 by default, and to specify the Output options.

By default, Excel will write the output of the test to a new worksheet ply. Alternatively, click on the Output Range radio button, and specify a cell where the upper-left corner of the output should be located.

Interpreting Analysis Tool Output
Once you click OK to run the tool, Excel will calculate the test statistic and write the output where you specified. The annotated output shows the degrees of freedom, the test statistic, the P-value associated with the test statistic, and the critical value.

The output varies depending on the tool used, but there are certain items that will appear for each of the tests we will be using.

F-Test Two-Sample for Variances
Reading the results of an F-Test in Excel requires us to look at the last three lines of the output:

In this case, the $F_{test}$ value (denoted simply by F) ~ 1.73.

The $F_{crit}$ value (denoted here by F Critical one-tail) for our selected α of 0.05 and df = (18, 23) ~ 2.08.

Since $F_{test} < F_{crit}$, we cannot reject $H_0$, which for an F-Test means that the variances can be assumed to be equal.

P(F<=f) one-tail provides the minimum α at which F can successfully reject the null hypothesis.

F-Test Two-Sample for Variances
A warning about using the F-Test in Excel: this analysis tool does not seem to produce the correct values unless the sample with the larger variance is placed in the numerator position, i.e. the sample with the larger variance must be in the Variable 1 Range and not vice versa.

If you place the sample with the larger variance in the Variable 2 Range field, you will not get the right results. The routines that were written for this analysis tool seem to be unable to deal with that situation and produce erroneous results.
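One way to sidestep the ordering problem is to compute the variance ratio yourself and always put the larger variance on top. The sketch below does this with Python's standard library. The function name `variance_ratio` and the soil moisture readings are hypothetical, and the critical value would still have to be looked up for the returned degrees of freedom.

```python
from statistics import variance


def variance_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top.

    Assumes both samples have at least two values and non-zero variance.
    """
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical soil moisture readings, for illustration only.
site_1 = [0.21, 0.25, 0.19, 0.30, 0.27]
site_2 = [0.22, 0.23, 0.24, 0.22, 0.23, 0.24]

# The argument order does not matter: the larger variance ends up in the numerator.
f_stat, df_num, df_den = variance_ratio(site_2, site_1)
print(f"F = {f_stat:.2f} with df = ({df_num}, {df_den})")
```

Because the ratio is always at least 1, it can be compared directly against an upper-tail critical value for (df_num, df_den).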

t-test Two-Sample Assuming Equal Variances
Since our F-Test has shown the equal variance assumption to be acceptable for these two samples, let's use the appropriate t-test to compare their mean values. The Hypothesized Mean Difference is 0 by default, and that field can be left empty.

t-test Two-Sample Assuming Equal Variances
Here is the output that Excel produces for this t-test. The analysis tool provides the critical values and the minimum α values that can reject the null hypothesis for both one-tailed and two-tailed scenarios.

Here, t Stat > t Critical one-tail and t Critical two-tail, so we can reject $H_0$ in either case at α = 0.05.

The tiny P values (scientific notation with negative exponents) tell us we could reject $H_0$ at very small α.
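For reference, the textbook form of the equal-variance two-sample statistic pools the two sample variances before computing the standard error. The sketch below implements that textbook formula in Python; it is assumed, not confirmed here, that Excel's tool follows the same form. The function name `pooled_t` and the readings are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(sample_1, sample_2):
    """Return (t, df) for a two-sample t-test assuming equal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    # Pooled variance: a weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / std_err, df


# Hypothetical soil moisture readings, for illustration only.
pond_branch = [0.18, 0.20, 0.22, 0.19, 0.21]
glyndon = [0.26, 0.28, 0.27, 0.29, 0.25, 0.27]

t_stat, df = pooled_t(glyndon, pond_branch)
print(f"t = {t_stat:.3f}, df = {df}")
```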

t-test Two-Sample Assuming Unequal Variances
On another date (Feb. 21/22, 2002) the variances of the PB and GL samples were unequal, requiring us to use the t-test Two-Sample Assuming Unequal Variances. The fields in the analysis tool window for this test are essentially identical to the other t-test's.

t-test Two-Sample Assuming Unequal Variances
Here is the output that Excel produces for this t-test. The analysis tool again provides the critical values and the minimum α values that can reject the null hypothesis for both one-tailed and two-tailed scenarios.

Here, t Stat > t Critical one-tail and t Critical two-tail, so we can reject $H_0$ in either case at α = 0.05.

The tiny P values (scientific notation with negative exponents) tell us we could reject $H_0$ at very small α.
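When the variances are unequal, the usual textbook approach (Welch's t-test) keeps the two variances separate and adjusts the degrees of freedom with the Welch-Satterthwaite approximation. The sketch below shows that textbook calculation in Python; whether Excel's tool matches it in every detail is an assumption, and the function name `welch_t` and the readings are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_1, sample_2):
    """Return (t, df) for a two-sample t-test allowing unequal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    # Each sample contributes its own variance-of-the-mean term.
    v1 = variance(sample_1) / n1
    v2 = variance(sample_2) / n2
    t = (mean(sample_1) - mean(sample_2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; the result is usually not an integer.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical soil moisture readings with visibly different spreads.
pond_branch = [0.10, 0.30, 0.18, 0.25, 0.05, 0.22]
glyndon = [0.27, 0.28, 0.27, 0.29, 0.28]

t_stat, df = welch_t(glyndon, pond_branch)
print(f"t = {t_stat:.3f}, approximate df = {df:.2f}")
```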

t-test Paired Two-Sample for Means
Suppose we want to compare the mean soil moisture in Glyndon sampled before and after a rainstorm. These are paired samples since they are taken at the same locations, so we use the t-test Paired Two-Sample for Means analysis tool.

t-test Paired Two-Sample for Means
As a condition, our paired samples need to be of the same sample size (obviously) AND they must also have the samples in the same order, i.e. Excel is pairing them up based on their order in the column, so they must be in the same order!

The output once again is the same as the other t-tests, with the addition of a correlation statistic between the two samples (we'll learn about this statistic in the coming weeks).

Again, t Stat is large, so $H_A$ is accepted at α = 0.05 for both the one-tailed and two-tailed cases, and the P values (the minimum α values) are very small.
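The ordering requirement is easy to demonstrate. The sketch below reuses the six-city growth rates from the matched pairs example and computes the paired statistic twice, once with the pairs aligned and once with the second column reversed. The function name `paired_t` is illustrative, and the reversed column is a deliberately wrong ordering made up for the demonstration.

```python
import math
from statistics import mean, stdev


def paired_t(sample_1, sample_2):
    """Return the matched pairs t statistic, pairing values by position."""
    if len(sample_1) != len(sample_2):
        raise ValueError("paired samples must be the same size")
    diffs = [a - b for a, b in zip(sample_1, sample_2)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


growth_1971_1980 = [10, 8, 7, 6, 6, 5]
growth_1981_1990 = [8, 7, 7, 6, 5, 3]

t_aligned = paired_t(growth_1971_1980, growth_1981_1990)
# Same values, wrong order: the pairs no longer refer to the same city.
t_misaligned = paired_t(growth_1971_1980, list(reversed(growth_1981_1990)))

print(f"aligned pairs:    t = {t_aligned:.3f}")
print(f"misaligned pairs: t = {t_misaligned:.3f}")
```

The mean difference is identical in both runs, but the spread of the differences is not, so the misaligned statistic falls well below the aligned one.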

Hypothesized Mean Difference
Each of the t-test formulations in Excel gives you the chance to enter a hypothesized mean difference.

What this means is that instead of testing whether or not the two sample means are equal as the null hypothesis, you could instead test whether their difference departs from (or, in a one-tailed test, exceeds) a certain amount.

This might be well applied in the examples I've shown you today: my sampling showed that Glyndon was always wetter than Pond Branch, on average 0.07 wetter on all comparable sampling days. I might use this value as a hypothesized mean difference in my t-tests if I believe that is the null condition to which I wish to compare samples.
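In terms of the formula, a hypothesized mean difference is simply subtracted from the observed mean difference before dividing by the standard error. The sketch below shows this for the matched pairs case in Python, reusing the six-city growth rates from the earlier example. The parameter name `hypothesized_diff` and the value 0.5 are hypothetical choices for illustration.

```python
import math
from statistics import mean, stdev


def paired_t(sample_1, sample_2, hypothesized_diff=0.0):
    """Matched pairs t statistic for H0: mean difference = hypothesized_diff."""
    diffs = [a - b for a, b in zip(sample_1, sample_2)]
    std_err = stdev(diffs) / math.sqrt(len(diffs))
    return (mean(diffs) - hypothesized_diff) / std_err


growth_1971_1980 = [10, 8, 7, 6, 6, 5]
growth_1981_1990 = [8, 7, 7, 6, 5, 3]

t_zero = paired_t(growth_1971_1980, growth_1981_1990)        # H0: difference = 0
t_half = paired_t(growth_1971_1980, growth_1981_1990, 0.5)   # H0: difference = 0.5

print(f"H0 difference 0.0: t = {t_zero:.3f}")
print(f"H0 difference 0.5: t = {t_half:.3f}")
```

Moving the null value closer to the observed mean difference shrinks the statistic, so a result that is significant against a difference of 0 may not be significant against a non-zero hypothesized difference.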

Using Excel to Calculate Test Statistics
In summary: Excel will do all the calculations for you when it comes to computing a test statistic. It will calculate degrees of freedom and provide the appropriate critical values (using an α value that you specify), which you must use to test the hypotheses, and it will indicate the minimum α required to reject the null hypothesis.

But Excel will not formulate your hypotheses for you, it will not determine whether a one-tailed or two-tailed test is appropriate, and it will not compare the test and critical values to select a hypothesis. Most critically, it WILL NOT warn you if you are using the wrong test statistic or have made some small error in filling in the fields. YOU must still understand what you are doing!