Sampling Distributions


Sampling Distributions Quiz

Parameters and statistics
The average fuel tank capacity of all cars made by Ford is 14.7 gallons. This value represents a
a) Parameter because it is an average from all possible cars.
b) Parameter because it is an average from all Ford cars.
c) Statistic because it is an average from a sample of all cars.
d) Statistic because it is an average from a sample of American cars.

Parameters and statistics
The fraction of all American adults who received at least one speeding ticket last year can be represented by
a)  b)  c)  d)

Sample size
We wish to estimate the mean price, $\mu$, of all hotel rooms in Las Vegas. The Convention Bureau of Las Vegas did this in 1999 and used a sample of $n = 112$ rooms. In order to get a better estimate of $\mu$ than the 1999 survey, we should
a) Take a larger sample because the sample mean will be closer to $\mu$.
b) Take a smaller sample since we will be less likely to get outliers.
c) Take a different sample of the same size since it does not matter what $n$ is.

Sampling distribution
If the first graph shows the population, which plot could be the sampling distribution of $\bar{x}$ if all samples of size $n = 50$ are drawn?
a) Plot A
b) Plot B
c) Plot C

Sampling distribution
Which of the following is true?
a) The shape of the sampling distribution of $\bar{x}$ is always bell-shaped.
b) The shape of the sampling distribution of $\bar{x}$ gets closer to the shape of the population distribution as $n$ gets large.
c) The shape of the sampling distribution of $\bar{x}$ gets approximately normal as $n$ gets large.
d) The mean of the sampling distribution of $\bar{x}$ gets closer to $\mu$ as $n$ gets large.

Sampling distribution
True or false: The shape of the sampling distribution of $\bar{x}$ becomes more normal the larger your sample size is.
a) True
b) False

Sampling distribution
True or false: The standard deviation of the sampling distribution of $\bar{x}$ is always less than the standard deviation of the population when the sample size is at least 2.
a) True
b) False

Sampling distribution
The theoretical sampling distribution of $\bar{x}$
a) Gives the values of $\bar{x}$ from all possible samples of size $n$ from the same population.
b) Provides information about the shape, center, and spread of the values in a single sample.
c) Can only be constructed from the results of a single random sample of size $n$.
d) Is another name for the histogram of values in a random sample of size $n$.

Standard error
What does $\sigma/\sqrt{n}$ measure?
a) The spread of the population.
b) The spread of the $\bar{x}$'s.
c) Different values could be.

Sampling distributions
The following density curve represents waiting times at a customer service counter at a national department store. The mean waiting time is 5 minutes with standard deviation 5 minutes. If we took all possible samples of size $n = 100$, how would you describe the sampling distribution of the $\bar{x}$'s?
a) Shape = right skewed, center = 5, spread = 5
b) Shape = less right skewed, center = 5, spread = 0.5
c) Shape = approx. normal, center = 5, spread = 5
d) Shape = approx. normal, center = 5, spread = 0.5

Central Limit Theorem
Which is a true statement about the Central Limit Theorem?
a) We need to take repeated samples in order to estimate $\mu$.
b) It only applies to populations that are Normally distributed.
c) It says that the distribution of $\bar{x}$'s will have the same shape as the population.
d) It requires the condition that the sample size, $n$, is large and that the samples were drawn randomly.

Central Limit Theorem
True or false: The Central Limit Theorem allows us to compute probabilities on $\bar{x}$ when the conditions are met.
a) True
b) False

Sampling distributions
The following density curve shows a sampling distribution of $\bar{x}$ created by taking all possible samples of size $n = 6$ from a population that was very left-skewed. Which of the following would result in a decrease of the area to the left of 0.5 (denoted by the vertical line)?
a) Increasing the number of samples taken.
b) Taking a different sample.
c) Decreasing $n$.
d) Increasing the number of observations in each sample.

Sampling distributions
What effect does increasing the sample size, $n$, have on the center of the sampling distribution of $\bar{x}$?
a) The mean of the sampling distribution gets closer to the mean of the population.
b) The mean of the sampling distribution gets closer to 0.
c) The variability of the population mean is decreased.
d) It has no effect. The mean of the sampling distribution always equals the mean of the population.

Sampling distributions
What effect does increasing the sample size, $n$, have on the spread of the sampling distribution of $\bar{x}$?
a) The spread of the sampling distribution gets closer to the spread of the population.
b) The spread of the sampling distribution gets larger.
c) The spread of the sampling distribution gets smaller.
d) It has no effect. The spread of the sampling distribution always equals the spread of the population.

Sampling distributions
What effect does increasing the sample size, $n$, have on the shape of the sampling distribution of $\bar{x}$?
a) The shape of the sampling distribution gets closer to the shape of the population.
b) The shape of the sampling distribution gets more bell-shaped.
c) It has no effect. The shape of the sampling distribution always equals the shape of the population.

Sampling distributions
Which of the following would result in a decrease in the spread of the approximate sampling distribution of $\bar{x}$?
a) Increasing the sample size.
b) Increasing the number of samples taken.
c) Increasing the population standard deviation.
d) Decreasing the value of the population mean.
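The center, spread, and shape questions above can be explored with a small simulation. This is a minimal sketch. It assumes an exponential population with mean 5, which is right-skewed and has a standard deviation equal to its mean. That population stands in for the waiting-time curve, whose exact shape is not given. The helper names `standard_error` and `simulate_sample_means` are illustrative only.

```python
import math
import random
import statistics


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulate_sample_means(n: int, reps: int, mean: float = 5.0, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n from an exponential population with the
    given mean (assumed stand-in for a right-skewed population) and return
    the sample means."""
    rng = random.Random(seed)
    rate = 1.0 / mean  # random.expovariate takes the rate lambda = 1 / mean
    return [statistics.fmean(rng.expovariate(rate) for _ in range(n)) for _ in range(reps)]


means = simulate_sample_means(n=100, reps=5000)
print(round(statistics.fmean(means), 2))  # center: close to mu = 5
print(round(statistics.stdev(means), 2))  # spread: close to 5 / sqrt(100) = 0.5
print(standard_error(5, 100))             # 0.5
```

A histogram of `means` would also look far more bell-shaped than the skewed population. The same `standard_error` function gives the spread asked about in the gym question that follows: $6/\sqrt{24} \approx 1.22$ minutes.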

Sampling distributions
Time spent working out at a local gym is normally distributed with mean $\mu = 43$ minutes and standard deviation $\sigma = 6$ minutes. The gym took a sample of size $n = 24$ from its patrons. What is the distribution of $\bar{x}$?
a) Normal with mean $\mu = 43$ minutes and standard deviation $\sigma = 6$ minutes.
b) Normal with mean $\mu = 43$ minutes and standard deviation $6/\sqrt{24}$ minutes.
c) Cannot be determined because the sample size is too small.

Chapter 7: Confidence Intervals

Inference
We are in the fourth and final part of the course: statistical inference. Here we draw conclusions about the population based on the data obtained from a sample chosen from it.

Confidence Intervals (CI)
The goal is to give a range of plausible values for the unknown population parameter (the population mean, $\mu$, or the population proportion, $p$).
We start with our best guess, the sample statistic (the sample mean $\bar{x}$, or the sample proportion $\hat{p}$).
Sample statistic = point estimate

Point estimate
A point estimate uses a single statistic based on sample data to estimate a population parameter. It is the simplest approach, but it is not always very precise, because of variation in the sampling distribution.

Confidence Intervals (CI)
CI = point estimate ± margin of error

Margin of error
The margin of error shows how accurate we believe our estimate is. The smaller the margin of error, the more precise our estimate of the true parameter.
Formula: $E = (\text{critical value}) \times (\text{standard deviation of the statistic})$

Confidence Intervals (CI) for a Mean
Suppose a random sample of size $n$ is taken from a normal population of values for a quantitative variable whose mean $\mu$ is unknown, when the population's standard deviation $\sigma$ is known. A confidence interval (CI) for $\mu$ is:
CI = point estimate ± margin of error
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$
Here $\bar{x}$ is the point estimate, $z^*$ is the critical value, and $\sigma/\sqrt{n}$ is the standard deviation of the statistic. Their product $z^* \sigma/\sqrt{n}$ is the margin of error ($m$ or $E$).

Confidence level
A confidence interval is associated with a confidence level. The confidence level gives us the success rate of the procedure used to construct the CI. We will say: "the 95% confidence interval for the population mean is (lower, upper)." The most common choices for a confidence level are 90% ($z^* = 1.645$), 95% ($z^* = 1.96$), and 99% ($z^* = 2.576$).

Statement (memorize!!)
We are ___% confident that the true mean [context] lies within the interval ___ and ___.

Using the calculator
Calculator: STAT → TESTS → 7:ZInterval
Inpt: Data (use this when you have data in one of your lists) or Stats (use this when you know $\sigma$, $\bar{x}$, and $n$)
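The critical values listed above, and the idea that the confidence level is the success rate of the procedure, can both be checked numerically. This is a minimal sketch using the standard-library `statistics.NormalDist`. The coverage check simulates normal samples with an assumed $\mu = 100$, $\sigma = 15$, and $n = 20$. Those numbers are illustrative and do not come from the text. `z_star`, `z_interval`, and `coverage` are hypothetical helper names.

```python
import math
import random
from statistics import NormalDist, fmean


def z_star(confidence: float) -> float:
    """Critical value z* leaving (1 - confidence) / 2 in each tail of N(0, 1)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def z_interval(xbar: float, sigma: float, n: int, confidence: float) -> tuple[float, float]:
    """x-bar +/- z* sigma / sqrt(n); assumes sigma known and a normal population or n >= 30."""
    e = z_star(confidence) * sigma / math.sqrt(n)
    return xbar - e, xbar + e


def coverage(mu: float, sigma: float, n: int, confidence: float,
             reps: int = 4000, seed: int = 7) -> float:
    """Fraction of simulated intervals (from normal samples) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        lo, hi = z_interval(xbar, sigma, n, confidence)
        if lo <= mu <= hi:
            hits += 1
    return hits / reps


for c in (0.90, 0.95, 0.99):
    print(c, round(z_star(c), 3))  # z* values 1.645, 1.96, 2.576
print(coverage(mu=100, sigma=15, n=20, confidence=0.95))  # roughly 0.95
```

Each simulated interval either contains $\mu$ or misses it. About 95% of them hit, which is what "95% confidence" refers to.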

The Trade-off
There is a trade-off between the level of confidence and the precision with which the parameter is estimated.
higher level of confidence → wider confidence interval
lower level of confidence → narrower confidence interval

"95% confident" means: in 95% of all possible samples of this size $n$, the interval constructed from the sample will indeed contain $\mu$. In only 5% of samples would the interval miss $\mu$.

The Margin of Error
The width (or length) of the CI is exactly twice the margin of error $E$, because the interval runs from $\bar{x} - E$ to $\bar{x} + E$.

Comment
The margin of error is
$$E = z^* \frac{\sigma}{\sqrt{n}}$$
The margin of error is therefore "in charge" of the width of the confidence interval. Since $n$, the sample size, appears in the denominator, increasing $n$ will reduce the margin of error for fixed $z^*$ and $\sigma$.

How can you make the margin of error smaller?
$z^*$ smaller (lower confidence level)
$\sigma$ smaller (less variation in the population); this one you really cannot change!
$n$ larger (to cut the margin of error in half, $n$ must be 4 times as big)

Margin of Error and the Sample Size
In situations where a researcher has some flexibility as to the sample size, the researcher can calculate in advance the sample size he/she needs in order to report a confidence interval with a certain level of confidence and a certain margin of error.
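The formula $E = z^* \sigma/\sqrt{n}$ can be inverted to get the sample size formula $n = (z^* \sigma / E)^2$, rounded up. This is a minimal sketch of both directions. It uses the table values of $z^*$, as the text does. The function names are illustrative. The checks reproduce the three sample-size questions worked in this chapter: the IQ example, Example 1, and Example 2(d).

```python
import math


def margin_of_error(z: float, sigma: float, n: int) -> float:
    """E = z* sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)


def sample_size_for_mean(z: float, sigma: float, e: float) -> int:
    """Smallest integer n with z* sigma / sqrt(n) <= E: (z* sigma / E)^2, rounded up."""
    return math.ceil((z * sigma / e) ** 2)


# Quadrupling n halves E (sigma = 15 and z* = 1.96 chosen only for illustration).
print(margin_of_error(1.96, 15, 25), margin_of_error(1.96, 15, 100))  # about 5.88 and 2.94

print(sample_size_for_mean(2.576, 15, 2))     # IQ example: 374
print(sample_size_for_mean(2.576, 1.5, 0.5))  # Example 1 (mean age): 60
print(sample_size_for_mean(1.96, 2, 0.5))     # Example 2(d) (bacteria): 62
```

`math.ceil` implements the "always round up" rule. Rounding to the nearest integer could leave the margin of error slightly above the target.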

Calculating the Sample Size
$$E = z^* \frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{z^* \sigma}{E}\right)^2$$
Clearly, the sample size must be an integer, but the calculation may give a non-integer result. In these cases, we should always round up to the next highest integer.

Example
IQ scores are known to vary normally with standard deviation 15. How many students should be sampled if we want to estimate the population mean IQ at 99% confidence with a margin of error equal to 2?
$$n = \left(\frac{z^* \sigma}{E}\right)^2 = \left(\frac{2.576 \times 15}{2}\right)^2 = 373.26 \;\to\; 374$$
They should take a sample of 374 students.

Assumptions for the validity of $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
The sample must be random.
The standard deviation, $\sigma$, is known, and either the sample size must be large ($n \ge 30$) or, for smaller samples, the variable of interest must be normally distributed in the population.

Steps to follow
1. Check conditions: SRS, $\sigma$ is known, and either $n \ge 30$ or the population distribution is normal.
2. Calculate the CI for the given confidence level.
3. Interpret the CI.

Example 1
A college admissions director wishes to estimate the mean age of all students currently enrolled. In a random sample of 20 students, the mean age is found to be 22.9 years. From past studies, the standard deviation is known to be 1.5 years and the population is normally distributed. Construct a 90% confidence interval of the population mean age.

Step 1: Check conditions
SRS: yes. $\sigma$ is known. The population is normally distributed.

Step 2: Calculate the 90% CI using the formula
$\bar{x} = 22.9$, $\sigma = 1.5$, $n = 20$, $z^* = 1.645$
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 22.9 \pm 1.645 \frac{1.5}{\sqrt{20}} = 22.9 \pm 0.552 \approx (22.3,\ 23.5)$$

Step 2: Calculate the 90% CI using the calculator
Calculator: STAT → TESTS → 7:ZInterval
Inpt: Stats, $\sigma$ = 1.5, $\bar{x}$ = 22.9, $n$ = 20, C-Level: .90, Calculate
ZInterval: (22.3, 23.5)

Step 3: Interpretation
We are 90% confident that the mean age of all students at that college is between 22.3 and 23.5 years.

Example 1 (continued)
How many students should he ask if he wants the margin of error to be no more than 0.5 years with 99% confidence?
$$n = \left(\frac{z^* \sigma}{E}\right)^2 = \left(\frac{2.576 \times 1.5}{0.5}\right)^2 = 59.72$$
Thus, he needs to have at least 60 students in his sample.

Example 2
A scientist wants to know the density of bacteria in a certain solution. He makes measurements of 10 randomly selected samples: 24, 31, 29, 25, 27, 27, 32, 25, 26, 29 ($\times 10^6$ bacteria/ml). From past studies the scientist knows that the bacteria level is normally distributed and the population standard deviation is $2 \times 10^6$ bacteria/ml.
a. What is the point estimate of $\mu$?
$\bar{x} = 27.5 \times 10^6$ bacteria/ml.
b. Find the 95% confidence interval for the mean level of bacteria in the solution.
Step 1: Check conditions: SRS, normal distribution, $\sigma$ is known. All satisfied.
Step 2: CI:
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 27.5 \pm 1.96 \frac{2}{\sqrt{10}} = 27.5 \pm 1.24 = (26.26,\ 28.74)$$
Step 3: Interpret: We are 95% confident that the mean bacteria level in the whole solution is between 26.26 and 28.74 ($\times 10^6$ bacteria/ml).
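The hand calculations for Example 1 and Example 2(b) can be reproduced in a few lines. This is a minimal sketch that plays the role of the calculator's ZInterval. It takes $z^*$ from the table (1.645 and 1.96), as the text does. `z_interval` is an illustrative helper, not a calculator or library function.

```python
import math
import statistics


def z_interval(xbar: float, sigma: float, n: int, z: float) -> tuple[float, float]:
    """x-bar +/- z* sigma / sqrt(n), with the critical value z* supplied directly."""
    e = z * sigma / math.sqrt(n)
    return xbar - e, xbar + e


# Example 1: mean age, sigma = 1.5 known, n = 20, 90% confidence (z* = 1.645)
lo1, hi1 = z_interval(22.9, 1.5, 20, 1.645)
print(round(lo1, 1), round(hi1, 1))  # 22.3 23.5

# Example 2(b): bacteria levels in units of 10^6 per ml, sigma = 2, 95% (z* = 1.96)
counts = [24, 31, 29, 25, 27, 27, 32, 25, 26, 29]
xbar2 = statistics.fmean(counts)
lo2, hi2 = z_interval(xbar2, 2, len(counts), 1.96)
print(xbar2, round(lo2, 2), round(hi2, 2))  # 27.5 26.26 28.74
```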

Example 2 (continued)
Using the calculator: Enter the numbers into one of the lists, say L1.
STAT → TESTS → 7:ZInterval
Inpt: Data, $\sigma$: 2, List: L1, Freq: 1 (each value in L1 is counted once), C-Level: .95, Calculate
(26.26, 28.74)

c. What is the margin of error?
From part b:
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 27.5 \pm 1.96 \frac{2}{\sqrt{10}} = 27.5 \pm 1.24 = (26.26,\ 28.74)$$
Thus, the margin of error is $E = 1.24 \times 10^6$ bacteria/ml.

d. How many measurements should he make to obtain a margin of error of at most $0.5 \times 10^6$ bacteria/ml with a confidence level of 95%?
$$n = \left(\frac{z^* \sigma}{E}\right)^2 = \left(\frac{1.96 \times 2 \times 10^6}{0.5 \times 10^6}\right)^2 = 61.4656$$
Thus, he needs to take 62 measurements.

Assumptions for the validity of $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ (continued)
The only situation when we cannot use this confidence interval, then, is when the sample size is small and the variable of interest is not known to have a normal distribution. In that case, other methods, called nonparametric methods, need to be used.

Example 3
In a randomized comparative experiment on the effects of calcium on blood pressure, researchers divided 54 healthy, white males at random into two groups: one takes calcium, the other a placebo. The paper reports a mean seated systolic blood pressure of 114.9 with standard deviation of 9.3 for the placebo group. Assume systolic blood pressure is normally distributed. Can you find a z-interval for this problem? Why or why not?

BUT what if $n < 30$ and $\sigma$ is unknown?
Well, there is some good news and some bad news! The good news is that we can easily replace the population standard deviation, $\sigma$, with the sample standard deviation $s$.

And the bad news is that once $\sigma$ has been replaced by $s$, the standardized statistic $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ no longer follows the standard normal distribution. Therefore the confidence multipliers $z^*$ for the different levels of confidence are (generally) not accurate any more. The new multipliers come from a different distribution called the "t distribution" and are therefore denoted by $t^*$ (instead of $z^*$).

CI for the population mean when $n < 30$ and $\sigma$ is unknown
The confidence interval for the population mean $\mu$ when $n < 30$ and $\sigma$ is unknown is therefore:
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

z* vs. t*
There is an important difference between the confidence multipliers we have used so far ($z^*$) and those needed for the case when $\sigma$ is unknown ($t^*$). $z^*$ depends only on the level of confidence, while $t^*$ depends on both the level of confidence and the sample size. For example, the $t^*$ used in a 95% confidence interval when $n = 10$ is different from the $t^*$ used when $n = 40$.

t-distribution
There is a different t distribution for each sample size. We specify a particular t distribution by giving its degrees of freedom. The degrees of freedom for the one-sample t statistic come from the sample standard deviation $s$ in the denominator of $t$. Since $s$ has $n - 1$ degrees of freedom, the t-distribution has $n - 1$ degrees of freedom.

t-distribution
The t-distribution is bell-shaped and symmetric about the mean.
The total area under the t-curve is 1.
The mean (for df > 1), median, and mode of the t-distribution are equal to zero.
The tails in the t-distribution are thicker than those in the standard normal distribution.
As the df (sample size) increases, the t-distribution approaches the normal distribution. After 29 df the t-distribution is very close to the standard normal z-distribution.

Historical Reference
William Gosset (1876-1937) developed the t-distribution while employed by the Guinness Brewing Company in Dublin, Ireland. Gosset published his findings using the name "Student". The t-distribution is, therefore, sometimes referred to as Student's t-distribution.
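The claim that $t^*$ depends on the sample size can be checked without a printed table. This is a minimal sketch using only the standard library. It integrates the t density numerically with Simpson's rule and inverts that integral by bisection. This is a generic numerical technique, not how calculators or statistics packages necessarily compute $t^*$. It is meant for moderate df only, because `math.gamma` overflows for very large arguments. `t_pdf`, `t_area_from_zero`, and `t_star` are illustrative names.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_area_from_zero(x: float, df: int, steps: int = 2000) -> float:
    """P(0 <= T <= x) for x >= 0, by composite Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    return total * h / 3


def t_star(confidence: float, df: int) -> float:
    """Two-sided critical value: by symmetry, the area from 0 to t* is confidence / 2."""
    target = confidence / 2
    lo, hi = 0.0, 1000.0
    for _ in range(60):  # bisection: halve the bracket 60 times
        mid = (lo + hi) / 2
        if t_area_from_zero(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)
for n in (5, 10, 30, 40):
    print(f"n = {n:2d}, df = {n - 1:2d}: t* = {t_star(0.95, n - 1):.3f}")
print(f"z* = {z:.3f}")  # t* shrinks toward this value as df grows
```

For df = 4 the result matches the table value $t^* = 2.776$ used in the cockroach example.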

Figure: Density of the t-distribution (red and green) for 1, 2, 3, 5, 10, and 30 df compared to the normal distribution (blue).

Calculator
Calculator: STAT → TESTS → 8:TInterval
Inpt: Data (use this when you have data in one of your lists) or Stats (use this when you know $\bar{x}$, $s$, and $n$)

Example
To study the metabolism of insects, researchers fed cockroaches measured amounts of a sugar solution. After 2, 5, and 10 hours, they dissected some of the cockroaches and measured the amount of sugar in various tissues. Five roaches fed the sugar solution and dissected after 10 hours had the following amounts of sugar in their hindguts:
55.95, 68.24, 52.73, 21.50, 23.78
Find the 95% CI for the mean amount of sugar in cockroach hindguts.
$\bar{x} = 44.44$, $s = 20.741$
The degrees of freedom are df $= n - 1 = 4$, and from the table we find that for 95% confidence, $t^* = 2.776$. Then
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 44.44 \pm 2.776 \frac{20.741}{\sqrt{5}} = (18.69,\ 70.19)$$
The large margin of error is due to the small sample size and the rather large variation among the cockroaches.
Calculator: Put the data in L1. STAT → TESTS → 8:TInterval, Inpt: Data, List: L1, Freq: 1, C-Level: .95

Which interval should you use?
Is $n \ge 30$?
Yes: use the normal distribution with $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$; if $\sigma$ is unknown, use $s$ instead.
No: Is the population normally, or approximately normally, distributed?
    No: you cannot use the normal distribution or the t-distribution.
    Yes: Is $\sigma$ known?
        Yes: use the normal distribution with $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$.
        No: use the t-distribution with $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ and $n - 1$ degrees of freedom.
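Both the cockroach calculation and the decision chart above can be expressed in a few lines. This is a minimal sketch. It uses the table value $t^* = 2.776$, as the text does, rather than computing it. `choose_interval` is a hypothetical helper that encodes the chart, and its return strings are illustrative labels only.

```python
import math
import statistics

# Cockroach example: t* = 2.776 taken from the t table (df = 4, 95% confidence).
sugar = [55.95, 68.24, 52.73, 21.50, 23.78]
n = len(sugar)
xbar = statistics.fmean(sugar)
s = statistics.stdev(sugar)  # sample standard deviation, divisor n - 1
t_crit = 2.776
e = t_crit * s / math.sqrt(n)
print(round(xbar, 2), round(s, 3), (round(xbar - e, 2), round(xbar + e, 2)))
# 44.44 20.741 (18.69, 70.19)


def choose_interval(n: int, population_normal: bool, sigma_known: bool) -> str:
    """Hypothetical helper following the decision chart for a population mean."""
    if n >= 30:
        return "z with sigma" if sigma_known else "z with s"
    if not population_normal:
        return "neither z nor t"
    return "z with sigma" if sigma_known else "t with s, df = n - 1"


for case in [(24, True, True), (14, True, False), (34, False, False), (12, False, False)]:
    print(case, choose_interval(*case))
```

The four cases in the loop are the ones listed in the examples that follow the chart.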

Examples
You take:
$n = 24$, the data are normally distributed, $\sigma$ is known: normal distribution with $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
$n = 14$, the data are normally distributed, $\sigma$ is unknown: t-distribution with $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$
$n = 34$, the data are not normally distributed, $\sigma$ is unknown: normal distribution with $\bar{x} \pm z^* \frac{s}{\sqrt{n}}$
$n = 12$, the data are not normally distributed, $\sigma$ is unknown: cannot use the normal distribution or the t-distribution

Some Cautions
The data MUST be an SRS from the population.
The formula is not correct for more complex sampling designs, e.g., stratified sampling.
There is no way to correct for bias in the data.
Outliers can have a large effect on the confidence interval.
You must know $\sigma$ to do a z-interval, which is unrealistic in practice.

Estimating a Population Proportion
When the variable of interest is categorical, the population parameter that we will infer about is a population proportion ($p$) associated with that variable. For example, suppose we are interested in studying opinions about the death penalty among U.S. adults, so that our variable of interest is "death penalty (in favor/against)." We'll choose a sample of U.S. adults and use the collected data to make inference about $p$, the proportion of U.S. adults who support the death penalty.

Example 2
Suppose that we are interested in the opinions of U.S. adults regarding legalizing the use of marijuana. In particular, we are interested in the parameter $p$, the proportion of U.S. adults who believe marijuana should be legalized. Suppose a poll of 1000 U.S. adults finds that 560 of them believe marijuana should be legalized.

Example 2 (continued)
Suppose we want to estimate $p$, the population proportion, by a single number based on the sample. It makes intuitive sense to use the corresponding quantity in the sample, the sample proportion $\hat{p} = 560/1000 = 0.56$. We say in this case that 0.56 is the point estimate for $p$, and that in general, we'll always use $\hat{p}$ as the point estimator for $p$. Note, again, that when we talk about the specific value (0.56), we use the term estimate, and when we talk in general about the statistic we use the term estimator.
Figure: Visual summary of Example 2.

Back to Example 2
Suppose a poll of 1000 U.S. adults finds that 560 of them believe marijuana should be legalized.

The CI for p
Thus, the confidence interval for $p$ is
$$\hat{p} \pm E = \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
For a 95% CI use $z^* = 1.96$.
For a 90% CI use $z^* = 1.645$.
For a 99% CI use $z^* = 2.576$.

Calculator: STAT → TESTS → A:1-PropZInt
x is the number of successes: $x = n\hat{p}$

Conditions
The CI is reasonably accurate when three conditions are met:
The sample was a simple random sample (SRS) from a binomial population.
Both $n\hat{p} \ge 10$ and $n(1 - \hat{p}) \ge 10$.
The size of the population is at least 10 times the size of the sample.

Example
Suppose you have a random sample of 40 buses from a large city and find that 24 buses have a safety violation. Find the 90% CI for the proportion of all buses that have a safety violation.
Conditions:
SRS.
Both $n\hat{p} = 40\left(\frac{24}{40}\right) = 24 \ge 10$ and $n(1 - \hat{p}) = 40\left(1 - \frac{24}{40}\right) = 16 \ge 10$.
The size of the population (all the buses) is at least 10 times the size of the sample (40).
90% CI: $\hat{p} = \frac{24}{40} = 0.6$, and for a 90% CI, $z^* = 1.645$:
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.6 \pm 1.645 \sqrt{\frac{0.6(1 - 0.6)}{40}} = 0.6 \pm 0.13 = (0.47,\ 0.73)$$
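The bus calculation can be sketched as a small function that also enforces the success/failure condition. This is a minimal sketch. `proportion_interval` is an illustrative name, not the calculator's 1-PropZInt. The SRS and "population at least 10 times the sample" conditions cannot be checked from the counts alone, so the function only checks $n\hat{p} \ge 10$ and $n(1 - \hat{p}) \ge 10$.

```python
import math


def proportion_interval(successes: int, n: int, z: float) -> tuple[float, float]:
    """p-hat +/- z* sqrt(p-hat (1 - p-hat) / n).

    Checks only the success/failure condition (n p-hat = successes >= 10 and
    n (1 - p-hat) = failures >= 10); randomness and population size must be
    checked separately."""
    if successes < 10 or n - successes < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = successes / n
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e


lo, hi = proportion_interval(24, 40, 1.645)  # bus example, 90% confidence
print(round(lo, 2), round(hi, 2))            # 0.47 0.73
```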

Interpretation
1. What is it that you are 90% sure is in the confidence interval?
The proportion of all of the buses in this population that have safety violations, if we could check them all.
2. What is the meaning (or interpretation) of the confidence interval of 0.47 to 0.73?
We are 90% confident that if we could check all of the buses in this population, between 47% and 73% of them would have safety violations.
3. What is the meaning of 90% confidence?
If we took 100 random samples of buses from this population and computed the 90% confidence interval from each sample, then we would expect about 90 of these intervals to contain the proportion of all buses in this population that have safety violations. In other words, we are using a method that captures the true population proportion 90% of the time.

Margin of Error and Sample Size
When we have some flexibility in determining the sample size, we can set a desired margin of error for estimating the population proportion and find the sample size that will achieve it. For example, a final poll on the day before an election would want the margin of error to be quite small (with a high level of confidence) in order to predict the election results as precisely as possible. This is particularly relevant when it is a close race between the candidates. The polling company needs to figure out how many eligible voters it must include in its sample to achieve that. Let's see how we do that.

Margin of Error and Sample Size
The confidence interval for $p$ is
$$\hat{p} \pm E = \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
Thus, the margin of error is
$$E = z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
Using some algebra we have
$$n = \left(\frac{z^*}{E}\right)^2 \hat{p}(1 - \hat{p})$$
If you have a good estimate $\hat{p}$ of $p$, use it in this formula. Otherwise take the conservative approach by setting $\hat{p} = \frac{1}{2}$. Since $\hat{p}(1 - \hat{p})$ is largest at $\hat{p} = \frac{1}{2}$, this gives the largest, and therefore safest, $n$. You have to decide on a level of confidence so you know what value of $z^*$ to use (the most common one is the 95% level). Also, obviously, you have to set the margin of error (the most common one is 3%).
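The sample-size formula for a proportion can be sketched the same way as the one for a mean. This is a minimal sketch. `sample_size_for_proportion` is an illustrative name, and the default `p_guess = 0.5` is the conservative choice described above. The loop at the end checks that no other guess of $p$ requires a larger sample.

```python
import math


def sample_size_for_proportion(z: float, e: float, p_guess: float = 0.5) -> int:
    """(z* / E)^2 p (1 - p), rounded up; p_guess = 0.5 gives the conservative n."""
    return math.ceil((z / e) ** 2 * p_guess * (1 - p_guess))


print(sample_size_for_proportion(1.96, 0.03))       # 95% confidence, E = 3%: 1068
print(sample_size_for_proportion(1.96, 0.03, 0.2))  # a prior guess of p = 0.2 needs fewer

# No guess of p requires more than the conservative p = 1/2.
worst = sample_size_for_proportion(1.96, 0.03)
print(all(sample_size_for_proportion(1.96, 0.03, k / 100) <= worst for k in range(1, 100)))
```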
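What sample size should we use for a survey if we want a margin of error of at most 3%? Let's use 95% confidence here, so $z^* = 1.96$. Also, since we don't have an estimate of $p$, we will use $\hat{p} = 0.5$. Then
$$n = \left(\frac{z^*}{E}\right)^2 \hat{p}(1 - \hat{p}) = \left(\frac{1.96}{0.03}\right)^2 (0.5)(1 - 0.5) = 1067.111$$
Because you must have a sample size of at least 1067.111, round up to 1068. So $n$ should be at least 1068.

Summary
CI for a population proportion: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
CI for a population mean, $n \ge 30$: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ if $\sigma$ is known; $\bar{x} \pm z^* \frac{s}{\sqrt{n}}$ if $\sigma$ is unknown
CI for a population mean, $\sigma$ unknown and $n < 30$ (population approximately normal): $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$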