Significance, Meaning and Confidence Intervals

1 Significance, Meaning and Confidence Intervals Paul Cohen ISTA 370 April, 2012 Paul Cohen ISTA 370 () Significance, Meaning and Confidence Intervals April, / 18

2 Significance vs. Meaning: Significance isn't Importance
You can usually get a significant result with a big sample. Saying a result is statistically significant only matters if it is also important, meaningful, or interesting. p values measure significance; what measures importance or meaning?

3 Significance vs. Meaning: Importance and Effect Size
Only you can decide whether a result is important or meaningful. Effect size can help. Recall that our test statistic almost always has the form:
$$\frac{\text{SampleStatistic} - \text{PopulationParameterUnderH}_0}{\text{SampleStandardDeviation}/\sqrt{N}}$$
Effect size is just:
$$\frac{\text{SampleStatistic} - \text{PopulationParameterUnderH}_0}{\text{SampleStandardDeviation}}$$
Effect size is the effect expressed in standard deviation units, so that effects across experiments are comparable.
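The contrast between the two formulas can be sketched in R. This is a minimal illustration, not part of the original slides: the population (normal, true mean 100.5, standard deviation 10), the null value `mu0 = 100`, and the helper names `test_stat` and `effect_size` are all assumptions chosen for the example.

```r
set.seed(370)  # arbitrary seed so the sketch is repeatable

mu0 <- 100     # HYPOTHETICAL parameter value under H0

# Test statistic: effect divided by the standard error
test_stat <- function(x, mu0) (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

# Effect size: the same effect in standard deviation units (no sqrt(N))
effect_size <- function(x, mu0) (mean(x) - mu0) / sd(x)

# HYPOTHETICAL population: true mean is only 0.05 sd above mu0
small <- rnorm(50,    mean = 100.5, sd = 10)
big   <- rnorm(50000, mean = 100.5, sd = 10)

for (x in list(small = small, big = big)) {
  cat(sprintf("N = %5d  t = %6.2f  effect size = %6.3f  p = %.3g\n",
              length(x), test_stat(x, mu0), effect_size(x, mu0),
              t.test(x, mu = mu0)$p.value))
}
```

The test statistic equals the effect size multiplied by sqrt(N), so under these assumptions the large sample reaches significance while its effect size stays small.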

4 Significance vs. Meaning: Significance Tells You What a Parameter is Not
Significance says $H_0$ is probably false. Significance tells you that a sample comes from a population that does not have the $H_0$ parameter value. Significance tells you what the parameter probably isn't; what tells you what it probably is?

5 Confidence Intervals
Wouldn't it be nice to say, "I drew a sample of size N, and the statistic value for that sample is f, so I can infer that in the population the corresponding parameter, φ, is bounded by an interval $g_{\text{lower}}(f) \le \phi \le g_{\text{upper}}(f)$ with high probability." The expression $g_{\text{lower}}(f) \le \phi \le g_{\text{upper}}(f)$ is a confidence interval. Confidence intervals put probabilities on estimates of population parameters, given sample statistics.
[Figure: intervals around the sample statistic that may contain the population parameter, at 70%, 80% and 95% confidence.]

6 Examples of Confidence Intervals
The average midterm grade in ISTA 370 was about 15.54 with a standard deviation of 3.71. The 95% confidence interval around this mean grade is [14.25,16.83].
The mean difference between ISTA100 scores in 2010 and 2011 was 9.4 points. The 95% confidence interval around this difference was [0.58,19.36]. With 95% confidence, the true difference between the classes lies between about 0.6 and 19.4 points.
The slope of the line relating body mass index of Miss America to year is -0.02: each contestant (on average) has 98% of the BMI of her predecessor. The 95% confidence interval around this slope is [-0.036,-0.015]. We can be confident that BMI is decreasing, and we have some uncertainty about the rate.

7 Confidence Intervals and Accepting $H_0$
Two samples each have N = 100 and have means ... and ..., and standard deviations 5.55 and ..., respectively. The 95% confidence interval around the difference is [-1.18,1.73]. This interval is narrow and contains zero, so with high confidence the true difference between the population means is nearly zero. This is as close to accepting $H_0$ as we ever get.

8 How to Get Confidence Intervals
> t.test(Scores2010,Scores2011)
Welch Two Sample t-test
data: Scores2010 and Scores2011
t = ..., df = ..., p-value = ...
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: ...
sample estimates: mean of x ..., mean of y ...
Better answer: Understand what a CI is, then ask R or run Monte Carlo or Bootstrap.

9 How to Get Confidence Intervals
You have a statistic f and you want to infer the corresponding parameter φ. Get the sampling distribution of f. The confidence interval around φ is bounded by particular quantiles of the sampling distribution. You just have to know which quantiles and how to use them.

10 How to Get Confidence Intervals
> MT370<-c(18,15.5,16.5,19.5,17,14.5,12.5,6.5,17,22,11.5,15,1
> Mean370<-mean(MT370)
> sd370<-sd(MT370)
> df370<-length(MT370)-1
The confidence interval is bounded by the 0.025 and 0.975 quantiles (dotted lines). But why?
[Figure: Sampling Distribution of Mean Midterm Score, with dotted lines at the interval bounds.]

11 How to Get Confidence Intervals: Intuition
If the true mean were the upper CI bound, then we'd see a sample mean as low as ours, or lower, 2.5% of the time. If the true mean were the lower CI bound, then we'd see a sample mean as high as ours, or higher, 2.5% of the time. If the true mean were between the upper and lower CI bounds, then we'd see a sample mean at least as extreme as ours at least 5% of the time. So with 95% confidence, the CI around the sample mean captures the true mean.
[Figure: Sampling Distribution of Mean Midterm Score.]
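One way to check this intuition is by simulation. The sketch below is an illustration rather than part of the slides: it assumes a hypothetical normal population (`true_mean = 15`, `true_sd = 4`), draws many samples of size 34, and counts how often the 95% t interval from `t.test` captures the true mean.

```r
set.seed(2012)  # arbitrary seed so the sketch is repeatable

# HYPOTHETICAL population; only the sample size (34) is taken from the slides
true_mean <- 15
true_sd   <- 4
N         <- 34

# For each simulated sample, record whether its 95% CI contains the true mean
captures <- replicate(5000, {
  x  <- rnorm(N, mean = true_mean, sd = true_sd)
  ci <- t.test(x)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

coverage <- mean(captures)
coverage  # fraction of intervals that captured the true mean
```

Under these assumptions the coverage should come out close to 0.95, which is what "95% confidence" refers to: the procedure captures the true mean in about 95% of samples.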

12 How to Get Confidence Intervals: Math
For an α/2 critical value k: $P(\bar{x} \ge \mu + k) \le \alpha/2$
Rearrange terms: $P(\mu \le \bar{x} - k) \le \alpha/2$
Similarly: $P(\bar{x} \le \mu - k) = P(\mu \ge \bar{x} + k) \le \alpha/2$
Combining these: $P(\mu \le \bar{x} - k \ \text{or} \ \mu \ge \bar{x} + k) \le \alpha$, so $P(\bar{x} - k \le \mu \le \bar{x} + k) \ge 1 - \alpha$
So if $\bar{x} - k$ and $\bar{x} + k$ each have a p value of at most α/2 (with α = 0.05), then $\bar{x} \pm k$ is the $1 - \alpha$ = 95% confidence interval.

13 How to Get Confidence Intervals - By hand
> MT370<-c(18,15.5,16.5,19.5,17,14.5,12.5,6.5,17,22,11.5,15,1
> sd370<-sd(MT370) ; N370<-length(MT370) ; Mean370<- mean(MT3
> # Standard error of the sampling distribution:
> se370<- sd370/sqrt(N370)
> # Critical values of t dist with N370-1 df
> lc<-qt(.025,N370-1) ; uc<-qt(.975,N370-1)
> # Confidence interval:
> Mean370 + (lc * se370)
[1] ...
> Mean370 + (uc * se370)
[1] ...
[Figure: Sampling Distribution of Mean Midterm Score.]

14 How to Get Confidence Intervals - By hand
For the mean, the confidence interval is read from a t distribution:
$$\bar{x} + t_{\text{crit},0.025} \cdot \text{s.e.} \le \mu \le \bar{x} + t_{\text{crit},0.975} \cdot \text{s.e.}$$
So for $\bar{x} \approx 15.54$, $t_{\text{crit},0.025} \approx -2.035$, $t_{\text{crit},0.975} \approx 2.035$ (33 degrees of freedom) and s.e. = $3.709/\sqrt{34}$ = 0.636:
$$(15.54 - 2.035 \times 0.636) \le \mu \le (15.54 + 2.035 \times 0.636)$$
$$14.249 \le \mu \le 16.838$$
[Figure: Sampling Distribution of Mean Midterm Score.]
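The arithmetic on this slide can be reproduced from summary statistics alone. In this sketch the standard deviation (3.709) and sample size (34) come from the slide; the sample mean `xbar` is an assumption, taken as the midpoint of the t-based interval [14.249, 16.838] reported later in the deck, and the variable names are illustrative.

```r
xbar <- 15.5435  # ASSUMPTION: midpoint of the reported interval [14.249, 16.838]
s    <- 3.709    # sample standard deviation, from the slide
N    <- 34       # sample size, from the slide

# Standard error of the sampling distribution of the mean
se <- s / sqrt(N)

# Lower and upper critical values of the t distribution with N - 1 df
tcrit <- qt(c(0.025, 0.975), df = N - 1)

# 95% confidence interval: xbar + t_crit * s.e. at each end
ci <- xbar + tcrit * se
round(ci, 3)
```

Because the t distribution is symmetric, the two critical values differ only in sign, so the interval is simply `xbar` plus or minus one margin of error.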

15 How to Get Confidence Intervals - Ask R
> t.test(MT370)
One Sample t-test
data: MT370
t = ..., df = 33, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval: ...
sample estimates: mean of x ...
[Figure: Sampling Distribution of Mean Midterm Score.]

16 How to Get Confidence Intervals - Quantiles
Note that $\bar{x} + t_{\text{crit},0.025} \cdot \text{s.e.} \le \mu \le \bar{x} + t_{\text{crit},0.975} \cdot \text{s.e.}$ is just another way of asking for the 2.5% and 97.5% quantiles of the t distribution. If we got the sampling distribution by bootstrapping, then we'd just read off these quantiles as the confidence interval. Why in general wouldn't we get the sampling distribution by Monte Carlo? What is a confidence interval telling you about? What do you need to get the sampling distribution by Monte Carlo?

17 How to Get Confidence Intervals - Bootstrap
The bootstrap is frequently used to estimate the standard error of the sampling distribution, in which case the confidence interval is obtained from:
$$\bar{x} + t_{\text{crit},0.025} \cdot \text{s.e.} \le \mu \le \bar{x} + t_{\text{crit},0.975} \cdot \text{s.e.}$$
Alternatively, use the bootstrap sampling distribution directly and read off its quantiles to get the confidence interval.
> BootMT370<-replicate(1000,mean(sample(MT370,replace=TRUE)))
> quantile(BootMT370,.025)
2.5%
...
> quantile(BootMT370,.975)
97.5%
...
Interval based on t distribution was [14.249,16.838]
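The two bootstrap routes described on this slide can be compared side by side. Since the MT370 vector is cut off in the slides, this sketch uses `scores`, a hypothetical stand-in of 34 simulated half-point grades; the names `se_boot`, `ci_se`, `ci_pct` and `ci_t` are illustrative.

```r
set.seed(370)  # arbitrary seed so the sketch is repeatable

# HYPOTHETICAL stand-in for MT370: 34 grades rounded to the nearest half point
scores <- round(rnorm(34, mean = 15.5, sd = 3.7) * 2) / 2

# Bootstrap sampling distribution of the mean (resample with replacement)
boot_means <- replicate(1000, mean(sample(scores, replace = TRUE)))

# Route 1: bootstrap standard error plugged into the t-based formula
se_boot <- sd(boot_means)
ci_se   <- mean(scores) + qt(c(0.025, 0.975), df = length(scores) - 1) * se_boot

# Route 2: read the 2.5% and 97.5% quantiles off the bootstrap distribution
ci_pct <- quantile(boot_means, c(0.025, 0.975))

# Reference: the classical t interval
ci_t <- t.test(scores)$conf.int

rbind(bootstrap_se = ci_se, percentile = as.numeric(ci_pct), t_test = as.numeric(ci_t))
```

For a well-behaved statistic like the mean, the three intervals are expected to agree closely, which is the point the slide makes by quoting the t-based interval next to the bootstrap quantiles.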

18 How to Get Confidence Intervals - Bootstrap
The real advantage of the bootstrap is that you can get confidence intervals for unconventional statistics. In a sample of N stockbrokers, you don't know how many stocks each holds, so no Monte Carlo. Each reports the proportion of their stocks that are up. Bootstrap confidence intervals around the MAXIMUM proportion up among all N stockbrokers.
> N<-827 ; pStockUp<-.5 # For N brokers and pStockUp
> BrokerSample<-replicate(N,GetOneStockbrokerProportionUp(pStockUp))
> BootMax<-replicate(10000,max(sample(BrokerSample,N,replace=T)))
> quantile(BootMax,.025) ; quantile(BootMax,.975)
2.5%
...
97.5%
0.75
[Figure: histogram of BootMax; x axis BootMax, y axis Frequency.]
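The slides call `GetOneStockbrokerProportionUp` without defining it, so the session above cannot be run as shown. The sketch below supplies a hypothetical version of that helper (each broker holds between 5 and 50 stocks, each stock up with probability `pStockUp`) purely so the bootstrap of the maximum can be executed; the holding range and the reduced number of resamples (2000) are assumptions.

```r
set.seed(827)  # arbitrary seed so the sketch is repeatable

# HYPOTHETICAL helper: the real function is not shown in the slides.
# A broker holds an unknown number of stocks (here 5 to 50) and reports
# only the proportion that are up.
GetOneStockbrokerProportionUp <- function(pStockUp) {
  nStocks <- sample(5:50, 1)
  rbinom(1, size = nStocks, prob = pStockUp) / nStocks
}

N <- 827
pStockUp <- 0.5

# The observed sample: one reported proportion per broker
BrokerSample <- replicate(N, GetOneStockbrokerProportionUp(pStockUp))

# Bootstrap the maximum: resample the N proportions with replacement
BootMax <- replicate(2000, max(sample(BrokerSample, N, replace = TRUE)))

quantile(BootMax, c(0.025, 0.975))
table(round(BootMax, 3))  # the bootstrap distribution takes only a few values
```

A resampled maximum can never exceed the largest value in the observed sample, so the bootstrap distribution of the maximum is discrete and capped at the observed maximum.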
