Chapter 4 Sampling and Estimation
Need for Sampling: very large populations; destructive testing; continuous production processes. The objective of sampling is to draw a valid inference about a population.
Sample Design. Sampling plan: a description of the approach that will be used to obtain samples from a population. A plan covers the objectives, target population, population frame, method of sampling, operational procedures for data collection, and statistical tools for analysis.
Sampling Methods. Subjective: judgment sampling; convenience sampling. Probabilistic: simple random sampling, in which every subset of a given size has an equal chance of being selected.
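A minimal sketch of simple random sampling, assuming the population frame is available as a Python list. The frame of 100 unit IDs and the helper name `simple_random_sample` are illustrative assumptions, not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every size-n subset is equally likely."""
    rng = random.Random(seed)          # seeded generator so the draw can be repeated
    return rng.sample(list(population), n)

frame = list(range(1, 101))            # hypothetical population frame of 100 unit IDs
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Fixing the seed makes the draw repeatable, which helps when a sample has to be audited later.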
Excel Data Analysis Tool: Sampling. Excel menu > Tools > Data Analysis > Sampling. Specify the input range of data, choose the sampling method, and select an output option.
Other Sampling Methods: systematic sampling; stratified sampling; cluster sampling; sampling from a continuous process.
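Two of these methods can be sketched briefly. The version below assumes systematic sampling means "random start, then every k-th unit" and stratified sampling means "the same sampling fraction in every stratum". Both are common textbook readings rather than definitions taken from the slides, and the function names and strata are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th unit, with k = len(frame) // n.

    Assumes n <= len(frame).
    """
    k = len(frame) // n
    start = random.Random(seed).randrange(k)
    return frame[start::k][:n]

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the same fraction from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, max(1, round(fraction * len(units))))
            for name, units in strata.items()}

frame = list(range(100))                                    # hypothetical frame
sys_sample = systematic_sample(frame, 10, seed=1)

strata = {"east": list(range(50)), "west": list(range(50, 80))}   # hypothetical strata
strat_sample = stratified_sample(strata, 0.1, seed=1)
print(sys_sample, strat_sample)
```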
Errors in Sampling. Nonsampling error: poor sample design. Sampling (statistical) error: depends on sample size. There is a tradeoff between the cost of sampling and the accuracy of the estimates obtained by sampling.
Estimation. Estimation: assessing the value of a population parameter using sample data. Point estimate: a single number used to estimate a population parameter. Confidence interval: a range of values between which a population parameter is believed to lie, along with the probability that the interval correctly estimates the true population parameter.
Common Point Estimates
Example
Theoretical Issues. Unbiased estimator: one for which the expected value equals the population parameter it is intended to estimate. The sample variance is an unbiased estimator for the population variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where the population variance is $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$.
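Unbiasedness can be checked exactly on a tiny case. The sketch below assumes sampling with replacement from a three-value population (an illustrative choice) and enumerates every possible sample of size 2. It then averages the sample variance with divisor n - 1 and, for contrast, with divisor n.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [1.0, 2.0, 3.0]      # tiny illustrative population (an assumption)
n = 2                             # sample size

# Every ordered sample of size n drawn with replacement (3**2 = 9 samples).
samples = list(product(population, repeat=n))

sigma2 = pvariance(population)                       # population variance, divisor N
avg_s2 = mean(variance(s) for s in samples)          # divisor n - 1
avg_biased = mean(pvariance(s) for s in samples)     # divisor n

print(sigma2, avg_s2, avg_biased)
```

The average of the n - 1 version matches the population variance, while the divisor-n version comes out smaller.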
Interval Estimates: a range within which we believe the true population parameter falls. Example: a Gallup poll reports that the percentage of voters favoring a candidate is 56% with a 3% margin of error. The interval estimate is [53%, 59%].
Confidence Intervals. Confidence interval (CI): an interval estimate that specifies the likelihood that the interval contains the true population parameter. Level of confidence $(1 - \alpha)$: the probability that the CI contains the true population parameter, usually expressed as a percentage (90%, 95%, and 99% are most common).
Sampling Distribution of the Mean Theory
Interval Estimate Containing the True Population Mean
Interval Estimate Not Containing the True Population Mean
Confidence Interval for the Mean, $\sigma$ Known. A $100(1 - \alpha)\%$ CI is: $\bar{x} \pm z_{\alpha/2} (\sigma / \sqrt{n})$. The value $z_{\alpha/2}$ may be found from Table A.1 or using the Excel function NORMSINV(1 - $\alpha$/2).
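A minimal sketch of this interval in Python, using the standard library's `NormalDist().inv_cdf` in the role the slide gives to NORMSINV. The inputs (mean 50, sigma 10, n = 100) are made-up numbers for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """CI for the mean when the population standard deviation is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical inputs, not taken from the slides.
lo, hi = z_interval(xbar=50, sigma=10, n=100)
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")

# Width grows with the confidence level and shrinks with the sample size.
width = lambda interval: interval[1] - interval[0]
print(width(z_interval(50, 10, 100, 0.99)) > width(z_interval(50, 10, 100, 0.95)))
print(width(z_interval(50, 10, 400, 0.95)) < width(z_interval(50, 10, 100, 0.95)))
```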
Sampling From Finite Populations. When $n > 0.05N$, use a correction factor in computing the standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$.
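The rule can be sketched as a small helper that applies the correction only when the sample exceeds 5% of the population. The function name and the numbers in the usage lines are assumptions for illustration.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean, with the finite population correction
    applied when the population size N is given and n > 0.05 * N."""
    se = sigma / sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= sqrt((N - n) / (N - 1))
    return se

# Hypothetical inputs: sigma = 10, n = 100.
print(standard_error(10, 100, N=1000))    # n is 10% of N, correction applied
print(standard_error(10, 100, N=10000))   # n is 1% of N, no correction
```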
Key Observations. For the interval $\bar{x} \pm z_{\alpha/2} (\sigma / \sqrt{n})$: as the confidence level $(1 - \alpha)$ increases, the width of the confidence interval also increases; as the sample size increases, the width of the confidence interval decreases.
Confidence Interval for the Mean, $\sigma$ Unknown. A $100(1 - \alpha)\%$ CI is: $\bar{x} \pm t_{\alpha/2,\,n-1} (s / \sqrt{n})$, where $t_{\alpha/2,\,n-1}$ is the value from a t-distribution with $n - 1$ degrees of freedom, from Table A.2 or the Excel function TINV($\alpha$, n-1).
Relationship Between Normal Distribution and t-Distribution: the t-distribution yields wider confidence intervals for smaller sample sizes.
Data Analysis, Fall 2014. Confidence Intervals for a Proportion. Sample proportion: $p = x/n$, where $x$ = number in the sample having the desired characteristic and $n$ = sample size. The sampling distribution of $p$ has mean $\pi$ and variance $\pi(1 - \pi)/n$. When $n\pi$ and $n(1 - \pi)$ are at least 5, the sampling distribution of $p$ approaches a normal distribution.
Confidence Intervals for Proportions. A $100(1 - \alpha)\%$ CI is: $p \pm z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}$. A PHStat tool is available under the Confidence Intervals option.
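A short sketch of the proportion interval using only the Python standard library. The counts (560 successes in 1,000 observations) are hypothetical and chosen only to give a margin of roughly three percentage points.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Normal-approximation CI for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical counts, not from the slides.
lo, hi = proportion_interval(x=560, n=1000)
print(f"95% CI: [{lo:.4f}, {hi:.4f}]")
```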
Sampling Distribution of s. The sample standard deviation, $s$, is a point estimate for the population standard deviation, $\sigma$. The sampling distribution of $s$ is characterized by the chi-square ($\chi^2$) distribution with $n - 1$ degrees of freedom: for a normal population, $(n - 1)s^2/\sigma^2$ follows this distribution. See Table A.3. In Excel, CHIDIST(x, deg_freedom) returns the probability to the right of x, and CHIINV(probability, deg_freedom) returns the value of x for a specified right-tail probability.
Confidence Intervals for the Variance. A $100(1 - \alpha)\%$ CI is: $\left[ \frac{(n - 1)s^2}{\chi^2_{n-1,\,\alpha/2}},\ \frac{(n - 1)s^2}{\chi^2_{n-1,\,1-\alpha/2}} \right]$. Note the difference in the denominators!
Confidence Intervals for Population Total. A $100(1 - \alpha)\%$ CI is: $N\bar{x} \pm t_{\alpha/2,\,n-1}\, N \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$.
Confidence Intervals and Decision Making. The required weight for a soap product is 64 ounces. A sample of 30 boxes found a mean of 63.82 and a standard deviation of 1.05. A 95% CI is [63.43, 64.21]. What conclusion can you reach? What if the standard deviation were 0.46 and the CI were [63.65, 63.99]?
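The two scenarios can be checked with a few lines of Python. This sketch assumes a sample mean of 63.82, which reproduces the first interval. It also assumes a critical value of 2.045 for a t-distribution with 29 degrees of freedom, taken from a t-table rather than computed. The helper name `t_interval` is illustrative.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """CI for the mean with sigma unknown; t_crit is looked up by the caller."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

T_CRIT_29 = 2.045     # t_{0.025, 29}, assumed from a t-table
TARGET = 64           # required weight in ounces

for s in (1.05, 0.46):
    lo, hi = t_interval(63.82, s, 30, T_CRIT_29)
    print(f"s = {s}: [{lo:.2f}, {hi:.2f}], contains {TARGET}: {lo <= TARGET <= hi}")
```

With the larger standard deviation the interval contains 64, so the sample gives no grounds to conclude that the mean weight differs from the requirement. With the smaller one the whole interval lies below 64, which suggests the boxes are underfilled on average.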
Confidence Intervals and Sample Size. CI for the mean, $\sigma$ known: the sample size needed for a half-width of at most $E$ is $n \ge \frac{(z_{\alpha/2})^2 \sigma^2}{E^2}$. CI for a proportion: the sample size needed for a half-width of at most $E$ is $n \ge \frac{(z_{\alpha/2})^2\, \pi(1 - \pi)}{E^2}$. Use $p$ as an estimate of $\pi$, or 0.5 for the most conservative estimate.
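Both sample-size formulas can be sketched as follows, rounding up to the next whole observation. The function names and example inputs are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def _z(confidence):
    """Upper alpha/2 critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def n_for_mean(sigma, E, confidence=0.95):
    """Smallest n giving a half-width of at most E for the mean, sigma known."""
    return ceil((_z(confidence) * sigma / E) ** 2)

def n_for_proportion(E, p=0.5, confidence=0.95):
    """Smallest n giving a half-width of at most E for a proportion.
    p = 0.5 is the most conservative choice."""
    return ceil(_z(confidence) ** 2 * p * (1 - p) / E ** 2)

print(n_for_mean(sigma=10, E=2))      # hypothetical sigma and half-width
print(n_for_proportion(E=0.03))       # 3-point margin, conservative p = 0.5
```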
Additional Types of Confidence Intervals. Difference in means: independent samples with unequal variances; independent samples with equal variances; paired samples. Difference in proportions.
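Confidence Intervals for Differences Between Means. For population 1 and population 2 respectively: means $\mu_1$, $\mu_2$; standard deviations $\sigma_1$, $\sigma_2$; point estimates $\bar{x}_1$, $\bar{x}_2$; sample sizes $n_1$, $n_2$. The point estimate for the difference in means, $\mu_1 - \mu_2$, is given by $\bar{x}_1 - \bar{x}_2$.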
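Independent Samples With Unequal Variances. A $100(1 - \alpha)\%$ CI is: $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,df^*} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $df^* = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$, with fractional values rounded down.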
Example: In the Accounting Professionals Excel file, find a 95 percent confidence interval for the difference in years of service between males and females.
Calculations. $s_1 = 4.39$ and $n_1 = 14$ (females); $s_2 = 8.39$ and $n_2 = 13$ (males). $df^* = 17.81$, so use 17 as the degrees of freedom. The CI is $(10.07 - 19.69) \pm 2.1098 \sqrt{\frac{19.2721}{14} + \frac{70.3921}{13}} = -9.62 \pm 5.498$, or $[-15.118, -4.122]$.
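The arithmetic can be reproduced with a short sketch. It assumes that 10.07 and 19.69 are the two sample means and that 2.1098 is the t critical value for 17 degrees of freedom, as the calculation suggests. The critical value is passed in rather than computed, since the Python standard library has no t-distribution.

```python
from math import floor, sqrt

def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for unequal variances (before rounding down)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def unequal_var_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for the difference in means, independent samples, unequal variances."""
    diff = x1 - x2
    half_width = t_crit * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return diff - half_width, diff + half_width

df_star = welch_df(4.39, 14, 8.39, 13)
lo, hi = unequal_var_interval(10.07, 4.39, 14, 19.69, 8.39, 13, t_crit=2.1098)
print(f"df* = {df_star:.2f}, use {floor(df_star)}")
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")
```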
Independent Samples With Equal Variances. A $100(1 - \alpha)\%$ CI is: $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ is a common pooled standard deviation. We must assume that the variances of the two populations are equal.
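Example: Accounting Professionals Data. $s_p = \sqrt{\frac{(14 - 1)(4.39)^2 + (13 - 1)(8.39)^2}{14 + 13 - 2}} = 6.62$. The CI is $(10.07 - 19.69) \pm 2.05956\,(6.62) \sqrt{\frac{1}{14} + \frac{1}{13}} = -9.62 \pm 5.25$, or $[-14.87, -4.37]$.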
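A sketch that reproduces the pooled-variance calculation. It assumes the same sample means as before (10.07 and 19.69) and takes the critical value 2.05956 for 25 degrees of freedom as quoted in the example rather than computing it.

```python
from math import sqrt

def pooled_sd(s1, n1, s2, n2):
    """Common pooled standard deviation for two samples."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def pooled_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for the difference in means, assuming equal population variances."""
    diff = x1 - x2
    half_width = t_crit * pooled_sd(s1, n1, s2, n2) * sqrt(1 / n1 + 1 / n2)
    return diff - half_width, diff + half_width

sp = pooled_sd(4.39, 14, 8.39, 13)
lo, hi = pooled_interval(10.07, 4.39, 14, 19.69, 8.39, 13, t_crit=2.05956)
print(f"s_p = {sp:.2f}, 95% CI: [{lo:.2f}, {hi:.2f}]")
```

The interval is slightly narrower than the unequal-variance one because the pooled approach uses 25 degrees of freedom instead of 17.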
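Paired Samples. A $100(1 - \alpha)\%$ CI is: $\bar{D} \pm t_{\alpha/2,\,n-1} (s_D / \sqrt{n})$, where $D_i$ = difference for each pair of observations, $\bar{D}$ = average of the differences, and $s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$. A PHStat tool is available in the Confidence Intervals menu.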
Example: Pile Foundation. Find a 95% CI for the average difference between the actual and estimated pile lengths.
Calculations. $\bar{D} = 6.38$ and $s_D = 10.31$. The 95% CI is $6.38 \pm 1.96\,(10.31 / \sqrt{311}) = 6.38 \pm 1.146$, or $[5.234, 7.526]$.
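The same calculation in Python, as a sketch. The function `paired_interval` works from raw paired differences, and its three-value input is a toy example. The last lines redo the slide's arithmetic from the summary statistics, using 1.96 as the critical value as the slide does.

```python
from math import sqrt
from statistics import mean, stdev

def paired_interval(differences, crit):
    """CI for the mean paired difference; crit is the t (or z) critical value."""
    n = len(differences)
    d_bar = mean(differences)
    s_d = stdev(differences)              # sample standard deviation, divisor n - 1
    half_width = crit * s_d / sqrt(n)
    return d_bar - half_width, d_bar + half_width

# Toy differences (hypothetical), just to show the call.
print(paired_interval([1.0, 2.0, 3.0], crit=1.0))

# Slide arithmetic from summary statistics: D-bar = 6.38, s_D = 10.31, n = 311.
half = 1.96 * 10.31 / sqrt(311)
lo, hi = 6.38 - half, 6.38 + half
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")
```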
Differences Between Proportions. A $100(1 - \alpha)\%$ CI is: $p_1 - p_2 \pm z_{\alpha/2} \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$. This applies when $n_i p_i$ and $n_i(1 - p_i)$ are greater than 5.
Example: In the Accounting Professionals Excel file, the proportion of females having a CPA is 8/14 = 0.57, while the proportion of males having a CPA is 6/13 = 0.46. A 95 percent confidence interval for the difference in proportions between females and males follows from the formula for differences between proportions.
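A sketch of this calculation from the stated counts (8 of 14 females, 6 of 13 males). It uses unrounded proportions, so the result may differ in the last digits from a hand calculation based on 0.57 and 0.46. The function name is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_interval(x1, n1, x2, n2, confidence=0.95):
    """Normal-approximation CI for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - half_width, (p1 - p2) + half_width

lo, hi = diff_proportions_interval(8, 14, 6, 13)
print(f"95% CI: [{lo:.3f}, {hi:.3f}], contains 0: {lo <= 0 <= hi}")
```

With samples this small the interval is wide and includes zero, so these data alone do not show a difference between the two proportions.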
Probability Intervals. A $100(1 - \alpha)\%$ probability interval for a random variable $X$ is any interval $[a, b]$ such that $P(a \le X \le b) = 1 - \alpha$. Do not confuse a confidence interval with a probability interval; confidence intervals are probability intervals for sampling distributions, not for the distribution of the random variable.