Significance, Meaning and Confidence Intervals


Paul Cohen, ISTA 370, April 2012

Significance vs. Meaning: Significance Isn't Importance

- You can usually get a significant result with a big sample.
- Saying a result is statistically significant only matters if the result is also important, meaningful, or interesting.
- p values measure significance. What measures importance or meaning?

Significance vs. Meaning: Importance and Effect Size

- Only you can decide whether a result is important or meaningful. Effect size can help.
- Recall that our test statistic almost always has the form:

$$\frac{\text{SampleStatistic} - \text{PopulationParameterUnderH}_0}{\text{SampleStandardDeviation} / \sqrt{N}}$$

- Effect size is just:

$$\frac{\text{SampleStatistic} - \text{PopulationParameterUnderH}_0}{\text{SampleStandardDeviation}}$$

- Effect size is the effect expressed in standard deviation units, so that effects across experiments are comparable.
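The contrast between the two formulas can be sketched in R. The summary numbers below (a sample mean of 100.5 against a null value of 100, standard deviation 5) are hypothetical and chosen only for illustration; the point is that the effect size stays fixed while the test statistic grows with the square root of N.

```r
# Hypothetical summary numbers (not from the slides).
sample_mean <- 100.5
h0_mean     <- 100
sample_sd   <- 5

# Effect size: the difference in standard-deviation units (independent of N).
effect_size <- (sample_mean - h0_mean) / sample_sd

# Test statistic: the same difference in standard-error units (grows with sqrt(N)).
test_statistic <- function(n) (sample_mean - h0_mean) / (sample_sd / sqrt(n))

sample_sizes <- c(25, 100, 400, 10000)
t_values <- test_statistic(sample_sizes)

# Two-sided p values from the t distribution with N - 1 degrees of freedom.
p_values <- 2 * pt(-abs(t_values), df = sample_sizes - 1)

print(data.frame(N = sample_sizes, effect_size = effect_size,
                 t = round(t_values, 2), p = signif(p_values, 3)))
```

With these assumed numbers the same small effect is far from significant at N = 25 and overwhelmingly significant at N = 10000, which is the sense in which a big sample can usually produce a significant result.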

Significance vs. Meaning: Significance Tells You What a Parameter Is Not

- Significance says H0 is probably false.
- Significance tells you that a sample comes from a population that does not have the H0 parameter value.
- Significance tells you what the parameter probably isn't. What tells you what it probably is?

Confidence Intervals

- Wouldn't it be nice to say: "I drew a sample of size N, and the statistic value for that sample is f, so I can infer that in the population the corresponding parameter, φ, is bounded by an interval $g_1(f) \le \phi \le g_2(f)$ with high probability."
- The expression $g_1(f) \le \phi \le g_2(f)$ is a confidence interval.
- Confidence intervals put probabilities on estimates of population parameters, given sample statistics.

[Figure: intervals around the sample statistic that may contain the population parameter, shown at 70%, 80% and 95% confidence.]

Examples of Confidence Intervals

- The average midterm grade in ISTA 370 was 15.54 with a standard deviation of 3.709. The 95% confidence interval around this mean grade is [14.25, 16.84].
- The mean difference between ISTA100 scores in 2010 and 2011 was 9.4 points. The 95% confidence interval around this difference was [-0.58, 19.36]. With 95% confidence, the true difference between the classes lies between about -0.6 and 19 points, so zero is not ruled out.
- The slope of the line relating body mass index of Miss America to year is -0.02: each contestant (on average) has 98% of the BMI of her predecessor. The 95% confidence interval around this slope is [-0.036, -0.015]. We can be confident that BMI is decreasing, and we have some uncertainty about the rate.

Confidence Intervals and Accepting H0

- Two samples each have N = 100, with means 99.79 and 100.07 and standard deviations 5.55 and 4.879, respectively.
- The 95% confidence interval around the difference is [-1.18, 1.73].
- This interval is narrow and contains zero, so with high confidence the true difference between the population means is nearly zero.
- This is as close to accepting H0 as we ever get.
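The interval on this slide can be approximately reproduced from the summary statistics alone. The sketch below assumes a Welch (unequal variance) interval and takes the difference as second mean minus first; neither choice is stated on the slide, so treat both as assumptions.

```r
# Summary statistics from the slide.
n1 <- 100; m1 <- 99.79;  s1 <- 5.55
n2 <- 100; m2 <- 100.07; s2 <- 4.879

# Assumption: the difference is taken as (second sample) - (first sample).
diff_means <- m2 - m1

# Squared standard errors of each mean, and the standard error of the difference.
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se_diff <- sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (assumed; the slide does not say).
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# 95% confidence interval around the difference.
t_crit <- qt(0.975, df_welch)
ci <- diff_means + c(-1, 1) * t_crit * se_diff
print(round(ci, 3))
```

The result should agree with the slide's interval to within rounding in the second decimal place.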

How to Get Confidence Intervals

    > t.test(Scores2010, Scores2011)

        Welch Two Sample t-test

    data:  Scores2010 and Scores2011
    t = -1.8898, df = 51.806, p-value = 0.06438
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -19.362186   0.581351
    sample estimates:
    mean of x mean of y
     71.36765  80.75806

Better answer: understand what a CI is, then ask R or run Monte Carlo or the bootstrap.

How to Get Confidence Intervals

You have a statistic f and you want to infer the corresponding parameter φ:

- Get the sampling distribution of f.
- The confidence interval around φ is bounded by particular quantiles of the sampling distribution.
- You just have to know which quantiles and how to use them.

How to Get Confidence Intervals

    > MT370<-c(18,15.5,16.5,19.5,17,14.5,12.5,6.5,17,22,11.5,15,1
    > Mean370<-mean(MT370)
    > sd370<-sd(MT370)
    > df370<-length(MT370)-1

The confidence interval is bounded by the 0.025 and 0.975 quantiles (dotted lines). But why?

[Figure: Sampling Distribution of Mean Midterm Score; x axis from 12 to 20, density from 0.0 to 0.4, dotted lines at the 0.025 and 0.975 quantiles.]

How to Get Confidence Intervals: Intuition

- If the true mean were the upper CI bound, then we'd see a sample mean this low or lower only 2.5% of the time.
- If the true mean were the lower CI bound, then we'd see a sample mean this high or higher only 2.5% of the time.
- If the true mean were between the upper and lower CI bounds, then we'd see a sample mean at least this far from it at least 5% of the time.
- So with 95% confidence, the CI around the sample mean captures the true mean.

[Figure: Sampling Distribution of Mean Midterm Score.]
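One way to check this intuition is a small simulation: draw many samples from a population whose mean is known, build a 95% t interval from each, and count how often the interval captures the true mean. The population parameters below (normal, mean 15, standard deviation 4, samples of 34) are hypothetical, chosen only to resemble the midterm example.

```r
set.seed(370)

# Hypothetical population and design (assumptions, not from the slides).
true_mean     <- 15
true_sd       <- 4
n             <- 34
n_experiments <- 5000

# For each simulated sample, build the 95% t interval and record
# whether it contains the true mean.
covers <- replicate(n_experiments, {
  x     <- rnorm(n, mean = true_mean, sd = true_sd)
  se    <- sd(x) / sqrt(n)
  lower <- mean(x) + qt(0.025, n - 1) * se
  upper <- mean(x) + qt(0.975, n - 1) * se
  lower <= true_mean && true_mean <= upper
})

# Fraction of intervals that captured the true mean.
coverage <- mean(covers)
print(coverage)
```

Under these assumptions the printed fraction should be close to 0.95, which is what "95% confidence" refers to: the procedure captures the true mean in about 95% of repeated samples.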

How to Get Confidence Intervals: Math

For an α/2 critical value k:

$$P(\bar{x} \ge \mu + k) \le \alpha/2$$

Rearrange terms:

$$P(\mu \le \bar{x} - k) \le \alpha/2$$

Similarly:

$$P(\bar{x} \le \mu - k) = P(\mu \ge \bar{x} + k) \le \alpha/2$$

Combining these:

$$P(\mu \le \bar{x} - k \ \text{or}\ \mu \ge \bar{x} + k) \le \alpha$$

$$P(\bar{x} - k \le \mu \le \bar{x} + k) \ge 1 - \alpha$$

So if $\bar{x} - k$ and $\bar{x} + k$ each have a p value of less than α/2 = 0.025, then $\bar{x} \pm k$ is the 1 - α = 95% confidence interval.

How to Get Confidence Intervals - By hand

    > MT370<-c(18,15.5,16.5,19.5,17,14.5,12.5,6.5,17,22,11.5,15,1
    > sd370<-sd(MT370) ; N370<-length(MT370) ; Mean370<- mean(MT3
    > # Standard error of the sampling distribution:
    > se370<- sd370/sqrt(N370)
    > # Critical values of t dist with N370-1 df
    > lc<-qt(.025,N370-1) ; uc<-qt(.975,N370-1)
    > # Confidence interval:
    > Mean370 + (lc * se370)
    [1] 14.24968
    > Mean370 + (uc * se370)
    [1] 16.83855

[Figure: Sampling Distribution of Mean Midterm Score.]

How to Get Confidence Intervals - By hand

For the mean, the confidence interval is read from a t distribution:

$$\bar{x} + t_{crit,0.025} \cdot s.e. \le \mu \le \bar{x} + t_{crit,0.975} \cdot s.e.$$

So for $\bar{x} = 15.54$, $t_{crit,0.025} = -2.034$, $t_{crit,0.975} = 2.034$ and $s.e. = 3.709/\sqrt{34} = 0.636$:

$$15.54 + (-2.034 \times 0.636) \le \mu \le 15.54 + (2.034 \times 0.636)$$

$$14.249 \le \mu \le 16.838$$

[Figure: Sampling Distribution of Mean Midterm Score.]

How to Get Confidence Intervals - Ask R

    > t.test(MT370)

        One Sample t-test

    data:  MT370
    t = 24.4313, df = 33, p-value < 2.2e-16
    alternative hypothesis: true mean is not equal to 0
    95 percent confidence interval:
     14.24968 16.83855
    sample estimates:
    mean of x
     15.54412

[Figure: Sampling Distribution of Mean Midterm Score.]

How to Get Confidence Intervals - Quantiles

Note that

$$\bar{x} + t_{crit,0.025} \cdot s.e. \le \mu \le \bar{x} + t_{crit,0.975} \cdot s.e.$$

is just another way of asking for the 2.5% and 97.5% quantiles of the t distribution. If we got the sampling distribution by bootstrapping, then we'd just read off these quantiles as the confidence interval.

- Why, in general, wouldn't we get the sampling distribution by Monte Carlo?
- What is a confidence interval telling you about?
- What do you need to get the sampling distribution by Monte Carlo?

How to Get Confidence Intervals - Bootstrap

The bootstrap is frequently used to estimate the standard error of the sampling distribution, in which case the confidence interval is obtained from:

$$\bar{x} + t_{crit,0.025} \cdot s.e. \le \mu \le \bar{x} + t_{crit,0.975} \cdot s.e.$$

Alternatively, use the bootstrap sampling distribution directly and read off its quantiles to get the confidence interval.

    > BootMT370<-replicate(1000,mean(sample(MT370,replace=TRUE)))
    > quantile(BootMT370,.025)
        2.5%
    14.29338
    > quantile(BootMT370,.975)
       97.5%
    16.64743

The interval based on the t distribution was [14.249, 16.838].
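Because the full MT370 vector is cut off in the slides, the two bootstrap routes can only be sketched on stand-in data. The `scores` vector below is simulated (34 draws from a normal with mean 15.5 and standard deviation 3.7) and is an assumption, not the class data; the variable names are also illustrative.

```r
set.seed(2012)

# Hypothetical stand-in for the midterm scores (simulated, not the real data).
scores <- rnorm(34, mean = 15.5, sd = 3.7)
n <- length(scores)

# Route 0: the conventional t interval, for comparison.
se <- sd(scores) / sqrt(n)
t_interval <- mean(scores) + qt(c(0.025, 0.975), n - 1) * se

# Bootstrap sampling distribution of the mean: resample with replacement.
boot_means <- replicate(2000, mean(sample(scores, replace = TRUE)))

# Route 1: use the bootstrap only to estimate the standard error,
# then plug it into the t-based formula.
boot_se <- sd(boot_means)
boot_t_interval <- mean(scores) + qt(c(0.025, 0.975), n - 1) * boot_se

# Route 2: read the 2.5% and 97.5% quantiles off the bootstrap distribution.
boot_interval <- quantile(boot_means, c(0.025, 0.975))

print(rbind(t = t_interval, boot_se = boot_t_interval, percentile = boot_interval))
```

For a well-behaved statistic such as the mean, the three intervals should land close to one another, as the slide's own comparison of [14.29, 16.65] with [14.249, 16.838] suggests.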

How to Get Confidence Intervals - Bootstrap

The real advantage of the bootstrap is that you can get confidence intervals for unconventional statistics.

- In a sample of N stockbrokers, you don't know how many stocks each holds, so no Monte Carlo.
- Each reports the proportion of their stocks that are up.
- Bootstrap a confidence interval around the maximum proportion up over all N stockbrokers.

    > N<-827 ; pStockUp<-.5 # For N brokers and pStockUp
    > BrokerSample<-replicate(N,GetOneStockbrokerProportionUp(pStockUp))
    > BootMax<-replicate(10000,max(sample(BrokerSample,N,replace=TRUE)))
    > quantile(BootMax,.025) ; quantile(BootMax,.975)
         2.5%
    0.7115385
    97.5%
     0.75

[Figure: histogram of BootMax; x axis from 0.70 to 0.75, Frequency from 0 to 5000.]
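The slides never define `GetOneStockbrokerProportionUp`, so the session above cannot be run as shown. The sketch below supplies a hypothetical version (each broker holds between 20 and 60 stocks, each up independently with probability `pStockUp`) purely so that the bootstrap step can be executed; the portfolio sizes and the reduced number of replicates are assumptions.

```r
set.seed(18)

# Hypothetical stand-in for the slide's undefined helper: a broker holds a
# random number of stocks (assumed 20 to 60) and reports the proportion up.
GetOneStockbrokerProportionUp <- function(pStockUp) {
  n_stocks <- sample(20:60, 1)
  rbinom(1, n_stocks, pStockUp) / n_stocks
}

N <- 827
pStockUp <- 0.5

# The observed sample: one reported proportion per broker.
BrokerSample <- replicate(N, GetOneStockbrokerProportionUp(pStockUp))

# Bootstrap the maximum: resample the N proportions with replacement
# and record the largest value in each resample.
BootMax <- replicate(2000, max(sample(BrokerSample, N, replace = TRUE)))

# Percentile interval for the maximum proportion up.
ci <- quantile(BootMax, c(0.025, 0.975))
print(ci)
```

Note that a resample can never contain a value larger than the observed maximum, so the bootstrap distribution of the maximum piles up at the top observed values. That is consistent with the slide's upper quantile sitting exactly at 0.75, and it is a reason to read bootstrap intervals for extreme statistics with some caution.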