Objectives. 6.1, 7.1 Estimating with confidence (CIS: Chapter 10)


1 Objectives 6.1, 7.1 Estimating with confidence (CIS: Chapter 10). Statistical confidence (CIS gives a good explanation of a 95% CI). Confidence intervals. Choosing the sample size. t distributions. One-sample t confidence interval for a population mean. How confidence intervals behave.

2 Overview of Inference Sample → population, and sample mean $\bar{x}$ → population mean µ. But we do not know the value of µ, and if we want to draw any conclusions about µ then we have to use $\bar{x}$ to do so. Methods for drawing conclusions about a population from sample data are called statistical inference. There are two main types of inference: Confidence Intervals - estimating the value of a population parameter, and Tests of Significance - assessing evidence for a claim (hypothesis) about a population. Inference is appropriate when data are produced by either a random sample or a randomized experiment.

3 Introducing confidence intervals It is very unlikely that the sample mean based on a sample will ever exactly equal the true mean. Our aim is to construct an interval around the sample mean which is 'likely' to contain the mean. This is called a confidence interval. In the first lecture we considered a Gallup poll for the proportion of the electorate that would vote for Obama. Gallup predicted that the Obama vote would be in the interval [45%,51%] with 95% confidence. The Obama vote turned out to be 50.5%, so the interval did capture the true proportion. You may be asking yourself how to understand the 95%: since 50.5% lies in this interval, there does not appear to be any uncertainty in it. In the next few slides, our objective is to understand how a confidence interval is constructed and how to interpret it.

4 Review: properties of the sample mean The sample mean $\bar{x}$ is a unique number for any particular sample. If you had obtained a different sample (by chance) you almost certainly would have had a different value for your sample mean. In fact, you could get many different values for the sample mean, and virtually none of them would actually equal the true population mean, µ.

5 In Chapter 4, we learnt that if a random variable is normally distributed with mean µ and standard deviation σ, then with 95% probability it will lie in the interval $[\mu - 1.96\,\sigma,\ \mu + 1.96\,\sigma]$. Now our focus is on the sample mean. It has mean µ and standard error σ/√n (Chapter 5), thus there is a 95% probability that it lies in the interval $[\mu - 1.96\,\sigma/\sqrt{n},\ \mu + 1.96\,\sigma/\sqrt{n}]$. But the mean is unknown; our objective is to locate the true mean based on the sample mean. To do this we turn the story around: saying that the sample mean lies in the interval $[\mu - 1.96\,\sigma/\sqrt{n},\ \mu + 1.96\,\sigma/\sqrt{n}]$ is the same as saying that the mean µ lies in the interval [sample mean − 1.96 σ/√n, sample mean + 1.96 σ/√n]. Thus 95% of the time, the true mean (that we want to estimate) will be in the interval (this is called a confidence interval): $[\text{sample mean (average)} - 1.96\,\sigma/\sqrt{n},\ \text{sample mean (average)} + 1.96\,\sigma/\sqrt{n}]$.
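
The "turning the story around" step can be written as a chain of equivalent probability statements. This is a sketch under the slide's assumptions (a normally distributed sample mean with known σ); the symbol $\bar{X}$ for the sample mean is notation introduced here for brevity.

```latex
\begin{align*}
0.95 &= P\left(\mu - 1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \\
     &= P\left(-1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le 1.96\,\frac{\sigma}{\sqrt{n}}\right) \\
     &= P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)
\end{align*}
```

The randomness sits in $\bar{X}$, not in µ: the interval moves from sample to sample while µ stays fixed.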

6 Case 1: Normal data sample size one Human heights are approximately normally distributed. The standard deviation of human heights is 3.8 inches. Our objective is to construct a confidence interval for the mean height. We start with the less than ideal situation that we only have a sample of size one (just one observation!). In this case the standard error is 3.8/√1 = 3.8 (the regular standard deviation). We know that the observation is normally distributed, so it is straightforward to construct the 95% confidence interval for the mean height using just one randomly selected height: [height − 1.96×3.8, height + 1.96×3.8] = [height − 7.44, height + 7.44]. Construct an interval using your height. A large amount of data on heights has been collected and it is known that the mean height of a person is about 67 inches. Does your interval contain the mean? Most of your intervals will contain the mean, 67 inches. Those of you whose height is in the extremes (very tall or very short, more than 1.96 standard deviations from 67) will have an interval that won't contain 67 inches.

7 Because the sampling distribution of $\bar{x}$ is narrower than the population distribution, by a factor of √n, the estimates $\bar{x}$ tend to be closer to the population parameter µ than individual observations are. [Figure: distribution of sample means of n subjects (spread σ/√n) compared with the population distribution of individual subjects (spread σ), both centered at µ.] If the population is normally distributed N(µ,σ), the sampling distribution is N(µ,σ/√n).

8 Case 1: Normal data sample size three Again we estimate the mean human height, but this time from a random sample of three people. Recall, the standard deviation of a human height is 3.8 inches. If the sample size is 3, the standard error of an average based on three is 3.8/√3 = 2.19. As each randomly selected height is normally distributed, so is the average based on three (recall Chapter 5): $\bar{X} \sim N(\mu,\ 3.8/\sqrt{3})$, where µ is unknown. The 95% confidence interval is $[\bar{X} - 1.96 \times 3.8/\sqrt{3},\ \bar{X} + 1.96 \times 3.8/\sqrt{3}]$. Given any random sample of size three we take its average and plug it in.

9 Here we illustrate the height example. In the shot on the right we draw a sample of size three from the population of all heights. The average (sample mean) is evaluated. This average corresponds to one of the green dots on the lower right plot. The green line is the confidence interval centered about the average. We did this 100 times and 96 of the intervals contain the true mean 67. If the sample mean is normally distributed, 100 samples are drawn and a 95% CI is evaluated for each sample, about 95 of the intervals would contain the true mean of 67. In reality we only have one CI; we are 95% confident it contains the mean.
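
The repeated-sampling experiment on this slide can be imitated with a short simulation. This is a minimal sketch, not the software used for the slides: the function name `coverage`, the seed and the number of repetitions are choices made here, and the population is assumed to be exactly Normal(67, 3.8).

```python
import random
import statistics

def coverage(n, reps=20000, mu=67.0, sigma=3.8, z=1.96, seed=1):
    """Fraction of simulated z-intervals (known sigma) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5  # margin of error, same for every sample
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

# Samples of size three, as on the slide; the result should be near 0.95.
print(coverage(3))
```

With many repetitions the proportion settles close to 95%; with only 100 repetitions, counts such as 93 or 96 are ordinary sampling noise.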

10 Observations We see that the length of the confidence interval when using just one person in the sample is 2×7.44 = 14.88. This is quite long, and does not really allow us to pinpoint the mean. The length of the confidence interval using three people is only 2×7.44/√3 = 14.88/√3 = 8.59. If ten people were used to calculate the sample mean the corresponding interval length would be 14.88/√10 = 4.7. We see that for any given interval either the mean is in this interval or not. The 95% comes into play when we look at the proportion of intervals that contain the mean. In reality: We do not know the true mean µ, so we will never know whether the interval contains the mean or not. We only observe one sample of size n, and thus have one CI. This is why we say that with 95% confidence the mean lies in it.
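
The shrinking interval length can be tabulated directly. A small sketch, assuming the known standard deviation 3.8 from the height example; `ci_length` is a helper name made up for this illustration. It uses the unrounded margin 1.96×3.8 = 7.448, so the first value comes out slightly above the slide's 14.88.

```python
import math

def ci_length(sigma, n, z=1.96):
    """Full length of the z-interval: 2 * z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)

for n in (1, 3, 10):
    print(n, round(ci_length(3.8, n), 2))
```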

11 Case 2: Skewed data sample size 3 In the previous example we looked at height data, which tends to be normal. In this example we consider right skewed data, which is NOT normal; examples include house prices, salaries etc. We randomly draw a sample of size 3 from a right skewed distribution with mean 14 and standard deviation 10.7. The sample mean (average) has a mean which is 14 and a standard deviation which is 10.7/√3 = 6.18. We construct a 95% confidence interval to locate the mean, $[\bar{X} - 1.96 \times 10.7/\sqrt{3},\ \bar{X} + 1.96 \times 10.7/\sqrt{3}]$. The confidence interval is constructed under the assumption that the sample mean is normal. In the next slide we investigate how this influences the 'quality' of the confidence interval.

12 We draw a sample of size three from this skewed distribution and take the average. The average corresponds to one of the green dots on the plot below. We construct a 95% interval. We see that only 93 of the 100 intervals contain the mean. The reason for the difference between the 95% and 93 (though not much) can be found in the green plot of the sample mean. It is slightly skewed and clearly not normal. The sample size is not large enough for the CLT to work. We do not have 95% confidence in this 95% confidence interval.

13 Case 2: Skewed data sample size 50 In the previous example, it was clear that we did not have the full 95% confidence in the 95% confidence interval we had constructed. This was because the sample mean was not normal. We need to be careful when constructing confidence intervals using small sample sizes because the normality assumption may not hold; this means our interval is not as reliable as we think it is. If the sample size is sufficiently large then we recall from Chapter 5 that the corresponding sample mean will be close to normal. This means that a 95% confidence interval will actually be a 95% confidence interval. In the next slide we look at the reliability of the 95% CI (where the data is sampled from a skewed distribution): $[\bar{X} - 1.96\,\sigma/\sqrt{50},\ \bar{X} + 1.96\,\sigma/\sqrt{50}]$.

14 We observe that the sample mean based on a sample of 50 appears close to normal (though this needs to be checked with a QQ plot). The 'coverage' of the confidence interval (at least over these 100 realizations) is about 95%. We can safely say that we have 95% confidence in the 95% confidence interval. To summarize, a 95% confidence interval is an interval where we are 95% confident it contains the mean (note that for any given interval the mean is either there or not, so there is no probability).

15 Implications We do not need to (and cannot, anyway) take a lot of random samples to rebuild the sampling distribution and find µ at its center. [Figure: a single sample of size n drawn from a population with mean µ.] All we need is one SRS of size n and we can rely on the properties of the sampling distribution to infer reasonable values for the population mean µ.

16 Multiple samples revisited With 95% confidence, we can say that µ should be within 1.96 standard deviations (1.96 σ/√n) of our sample mean $\bar{x}$. In 95% of all possible samples of this size n, µ will indeed fall in our confidence interval. In only 5% of samples will $\bar{x}$ be farther from µ. Confidence = the proportion of possible samples that give us a correct conclusion.

17 Calculation practice 1 You want to rent an unfurnished one-bedroom apartment in Dallas. The mean monthly rent for 10 randomly sampled apartments is 980 dollars. Assume that monthly rents follow a normal distribution with standard deviation 280 dollars. Question: Construct a 95% confidence interval for the mean monthly rent of a one-bedroom apartment. Answer: The standard error for the sample mean is 280/√10 = 88.5. The 95% CI is [980 ± 1.96×88.5] = [806,1153]. With 95% confidence we believe the mean price of one-bedroom apartments in Dallas lies in this interval.
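
The arithmetic in this answer can be checked with a few lines of code. A minimal sketch: `z_interval` is a helper name invented for this illustration, and it assumes, as the slide does, that the population standard deviation is known.

```python
import math

def z_interval(xbar, sigma, n, z=1.96):
    """Return (lower, upper) for a mean with known population sd sigma."""
    margin = z * sigma / math.sqrt(n)  # z times the standard error
    return xbar - margin, xbar + margin

# Dallas rents: sample mean 980, sigma 280, n = 10.
low, high = z_interval(980, 280, 10)
print(f"{low:.1f} to {high:.1f}")
```

The unrounded limits sit just above the whole-dollar values quoted on the slide.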

18 Question Does the above confidence interval mean that 95% of all rents should lie in this interval? Answer: No, this is a confidence interval for the mean, not for the apartment price. An interval where 95% of apartment prices will lie is [980 ±1.96( )] = [257,1720]. You do not have to understand this calculation, but you will notice this interval is much wider. The reason is that it must capture 95% of all rents, which are extremely varied. This interval will not get narrower as the sample size grows. The CI for the mean is supposed to capture the mean rent; this interval is far narrower and will get narrower as the sample size grows. Question A realtor wants to know if the mean price of one-bedroom apartments in Dallas is more than 1100 dollars a month. Based on the confidence interval for the mean, what can you say? Answer We showed that the 95% confidence interval for the mean is [806,1153] dollars. As this interval contains values both above and below 1100 dollars, we do not know. We do not have enough data to answer her question.

19 Calculation practice 2 Hypokalemia is diagnosed when the blood potassium level is below 3.5mEq/dl. The potassium in a blood sample varies from sample to sample and follows a normal distribution with unknown mean but standard deviation 0.2. A patient's potassium is measured over 4 days. The sample over 4 days is 3, 3.5, 3.9, 4.4; its sample mean is 3.7. Question: Construct a 95% confidence interval for the mean potassium and discuss whether the patient is likely to be diagnosed with Hypokalemia. Answer: The standard error for the sample mean is 0.2/√4 = 0.1. Thus the 95% confidence interval for the mean potassium level is [3.7 ± 1.96×0.1] = [3.504,3.896]. This means with 95% confidence we believe the mean lies in this interval. Since no value of 3.5 or less lies in this interval, it suggests that the patient does not have low potassium. There is a precise way of answering this specific problem which we discuss in Chapter 7 (called statistical testing).

20 Confidence interval misunderstandings Suppose 400 alumni were asked to rate the University of Okoboji counseling services on a scale from 1 to 10. The sample mean was found to be 8.6 and it is known that the standard deviation is σ=2. Ima Bitlost has done the analysis, but has made some mistakes. Ima computes the 95% CI for the mean satisfaction score as [8.6 ± 1.96×2]. What is her mistake? Ima has not taken into account that the sample mean has a much smaller standard deviation (standard error) than the population. The standard error is 2/√400 = 0.1. Thus the true CI is [8.6 ± 1.96×0.1] = [8.4,8.796]. After correcting her mistake, she states that "I am 95% confident that the sample mean lies in the interval [8.4,8.796]". What is wrong with her statement? This is a meaningless statement: for sure the sample mean lies in this interval! It is the population mean that we are 95% confident lies there.

21 She quickly realizes her mistake and instead states "the probability that the mean lies in the interval [8.4,8.796] is 95%". What misinterpretation is she making now? By 95%, we mean that if we repeated the experiment many times over, about 95% of the time the intervals will contain the mean. For any given interval the mean is either in there or not. There is no probability attached to it. To overcome this issue, we say that we have 95% confidence that the mean lies in this interval. Finally, in her defense for using the normal distribution to determine the confidence coefficient (1.96) she says "Because the sample size is quite large, the population of alumni ratings will be close to normal". Explain to Ima her misunderstanding. The distribution of the population always stays the same, regardless of the sample size (in this case, it is clear that variables that take integer values between 1 and 10 cannot be normal). However, the sample mean does get closer to normal as the sample size grows. With a sample size of 400, the distribution of the sample mean will be very close to normal.

22 Different levels of confidence There is no need to restrict ourselves to 95% confidence intervals. The level of confidence we use really depends on how much confidence we want. For example, you would expect that a 99% confidence interval is more likely to contain the mean than a 95% confidence interval. To construct a 99% confidence interval we use exactly the same prescription as used to construct a 95% confidence interval; the only thing that changes is that 1.96 goes to 2.57 (if you look it up in the z-tables you will see this corresponds to 0.5%, so 99% of the time the sample mean will lie within 2.57 standard errors of the mean). A 99% CI for the mean one-bedroom apartment price is [980 ± 2.57×88.5]. Length of interval is 2×2.57×88.5 = 455. A 90% CI for the mean one-bedroom apartment price is [980 ± 1.645×88.5]. Length of interval is 2×1.645×88.5 = 291. What does a 100% confidence interval look like? In a 100% CI we are sure to find the mean, but this interval is so wide it is not informative.
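
The multiplier for any confidence level can be read off the standard normal distribution rather than a printed table. A minimal sketch using Python's standard library: `z_star` is a name chosen here, and the standard error 280/√10 is taken from the apartment example.

```python
from statistics import NormalDist
import math

def z_star(confidence):
    """Two-sided critical value, e.g. confidence=0.95 gives about 1.96."""
    tail = (1 - confidence) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

se = 280 / math.sqrt(10)  # standard error from the apartment example
for level in (0.90, 0.95, 0.99):
    z = z_star(level)
    # confidence level, multiplier, full interval length
    print(level, round(z, 3), round(2 * z * se, 1))
```

The 99% multiplier computed this way is about 2.576, which the slides quote as 2.57; the interval lengths grow with the confidence level.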

23 Sample size and length of the CI Let us return to the apartment example. We recall that the 95% confidence interval for the mean price is [980 ± 1.96×88.5] = [806,1153]. The length of this interval is 1153 − 806 = 347. Question: Suppose I take an SRS of 100 apartments in Dallas and the sample mean based on this sample is 1000. What will the CI be? Answer: The standard error is 280/√100 = 28 (much smaller than when the sample size is 10), and the CI is [1000 ± 1.96×28]. The length of this interval is 2×1.96×28 = 109.76. What we observe is: The length of the interval does not depend on the sample mean; this is just the centralizing factor. It only depends on (i) 1.96, (ii) the standard deviation and (iii) the sample size. The length of the interval gets smaller as the sample size increases. If we want the interval to have a certain length, we can choose the sample size accordingly.

24 How large an interval You read in a newspaper that "The proportion of the public that supports gay marriage is now 55%±15%". This means a survey was done, the proportion in the survey who supported gay marriage was 55%, and the confidence interval for the population proportion is [55-15,55+15]% = [40%,70%]. This is an extremely large interval; it is so wide that it is really not that informative about the opinion of the public. As we will see on the next slide, the reason it is too wide is that the sample size is too small. This experiment was not designed well. Typically, before data is collected, we need to decide how large a sample to collect. This is usually done by deciding how much 'above and below' the estimator seems reasonable. For example, [55-3,55+3]% = [52,58]% is more informative. The 3% is known as a margin of error. Given a certain margin of error we can then determine the sample size (see formula on next page).

25 Margin of Error Margin of error is the lingo used for the plus and minus part in the confidence interval. That is, if the confidence interval is [sample mean ± 1.96 σ/√n], the margin of error is 1.96 σ/√n. For example, in the previous example the margin of error for the CI based on 10 apartments is 1.96×280/√10 = 173.5. The margin of error for the CI based on 100 apartments is 1.96×280/√100 = 54.9. The margin of error is, in some sense, a measure of reliability. For a given confidence level, the smaller the margin of error the more precisely we can pinpoint the true mean. Suppose we want the margin of error to be equal to some value; then we can find the sample size such that we obtain that margin of error. Solve for n the equation MoE = 1.96 σ/√n (the margin of error and the standard deviation σ are given): $n = (1.96\,\sigma/\text{MoE})^2$. See the next few slides for examples.
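
The sample-size formula translates directly into code. A minimal sketch: `sample_size` is a helper name invented here, the result is rounded up because n must be a whole number, and the 50 dollar target in the example is a hypothetical value chosen only for illustration.

```python
import math

def sample_size(sigma, moe, z=1.96):
    """Smallest whole n with z * sigma / sqrt(n) <= moe."""
    return math.ceil((z * sigma / moe) ** 2)

# Apartment rents (sigma = 280) with a hypothetical target margin of 50 dollars.
print(sample_size(280, 50))
```

Rounding up rather than to the nearest integer guarantees the achieved margin of error does not exceed the target.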

26 Calculation practice In a study of bone turnover in young women with a medical condition, serum TRAP was measured in 31 subjects. The sample mean was 13.2 units per liter. Assume the standard deviation is known to be 6.5U/l. Question: Find the 80% CI for the mean serum level. Answer: We look up 10% in the z-tables; this gives 1.28. The standard error for the sample mean is 6.5/√31 = 1.17. Altogether this gives the CI [13.2 ± 1.28×1.17] = [11.7,14.7]. This means we believe with 80% confidence that the mean level of serum for women with this medical condition lies in this interval. By choosing such a low level of confidence our interval is quite narrow, but our confidence in this interval is relatively low. Question: How large a sample size should we choose such that the 80% CI for the mean has a margin of error of 1U/l? Answer: Solve 1.28×6.5/√n = 1, which gives n = (1.28×6.5/1)² = 69.2, rounded up to 70.

27 When the standard deviation is unknown? In the previous example we assumed the standard deviation was known. In general, before we collect the data, we will not have much information about the standard deviation. However, we will have some idea of bounds for it, e.g. the standard deviation for human heights is probably between 2 and 5 inches. Based on this information we can find the sample size whose margin of error is at most a certain length. Question How large a sample size do we require such that the margin of error for a 95% confidence interval for the mean of human heights is at most 0.25 inch, given that σ lies somewhere between 2 and 5 inches? Answer We know that the formula is n = (1.96 σ/0.25)². We need to choose the standard deviation to place in the formula. If we use σ=2, then the sample size is n = (1.96×2/0.25)² = 246. If we use σ=5, then the sample size is n = (1.96×5/0.25)² = 1537. For standard deviations between 2 and 5, the sample size will be between 246 and 1537. In the next slide we see the MoE for these different sample sizes and σ between 2 and 5.

28 Using the smaller standard deviation gives a smaller sample size, which is easier to collect. However, if the standard deviation is greater than 2, then the MoE will be larger than the desired maximum: If σ=5, and we use the minimum sample size n=246, then putting these numbers into the formula we see that the MoE = 1.96×5/√246 = 0.62, which is larger than the required 0.25. This is not what we want, as we want to ensure that the MoE is at most 0.25. If σ=2, and we use the maximum sample size n=1537, then putting these numbers into the formula we see that the MoE = 1.96×2/√1537 = 0.1, which is less than the required 0.25. This is exactly what we want, as we want the MoE to be at most 0.25. To be sure that the MoE is at most 0.25, we need to use a sample size of n=1537. This means always using what we believe is the maximum standard deviation in the calculation of the margin of error, i.e. $n = (1.96\,\sigma_{\max}/\text{MoE})^2$.

29 Calculation practice (tricky) Question: A confidence interval for the length of parrots' beaks is [4,10] inches. It is based on a sample of size n. By what factor should the sample size increase such that the margin of error is 1? Answer: This looks like an impossible question because we don't have any obvious information. But we can break the problem into steps: Confidence intervals are centered about the sample mean, so the average of the observed data is 7. The margin of error is half the length of the CI, which is [10-4]/2 = 3 = 1.96 σ/√n. We want to decrease the MoE such that MoE = 1, so it decreases to a third of its value. Now some basic maths: suppose we increase the sample size by factor 9 (9 times the original data): $1.96\,\frac{\sigma}{\sqrt{9n}} = 1.96\,\frac{\sigma}{3\sqrt{n}} = \frac{1}{3}\left(1.96\,\frac{\sigma}{\sqrt{n}}\right) = \frac{3}{3} = 1$. Thus increasing the sample size by factor 9 results in the margin of error reducing to 1. Observe we need a huge increase in sample size to get a moderate decrease in the MoE!

30 Calculation (continued) Example If a sample size of 20 gave a confidence interval [4,10], how large a sample size is required to reduce the margin of error to ½? Solution If the confidence interval is [4,10], from the previous slide we know that the MoE is 3. This means that $1.96\,\sigma/\sqrt{20} = 3$. If I increase the sample size by factor 36, i.e. from n=20 to n=20×36=720, then I see that the margin of error is $1.96\,\frac{\sigma}{\sqrt{36 \times 20}} = 1.96\,\frac{\sigma}{6\sqrt{20}} = \frac{1}{6}\left(1.96\,\frac{\sigma}{\sqrt{20}}\right) = \frac{3}{6} = \frac{1}{2}$. We see that to decrease the margin of error from 3 to ½ (to a sixth of its value) we need to increase the sample size by factor 36!
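
Both parrot calculations follow from the fact that the margin of error is proportional to 1/√n, so σ never needs to be known. A minimal sketch; the function name `sample_size_factor` is made up for this illustration.

```python
def sample_size_factor(current_moe, target_moe):
    """Factor by which n must grow, since MoE is proportional to 1/sqrt(n)."""
    return (current_moe / target_moe) ** 2

print(sample_size_factor(3, 1))         # MoE 3 -> 1
print(sample_size_factor(3, 0.5))       # MoE 3 -> 1/2
print(20 * sample_size_factor(3, 0.5))  # new sample size when starting from n = 20
```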

31 Analysis with unknown standard deviation So far we have assumed that the standard deviation is known, even though the mean is unknown. In some situations, this is realistic. For example, in the potassium level example, it seems reasonable to suppose that the amount of variation for everyone is about the same, but everyone has their own personal mean level, which is unknown. In most situations, however, the standard deviation is also unknown. Given the data 68, 68.5, 68.9 and 69.4, the sample mean is 68.7; how do we 'get' the standard deviation to construct a confidence interval? We do not know the standard deviation, but we know that we can estimate it using the formula $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$. For our example it is $s = \sqrt{\frac{1}{3}\left([-0.7]^2 + [-0.2]^2 + [0.2]^2 + [0.7]^2\right)} = 0.59$.
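
The sample standard deviation can be computed by hand from the formula or with the standard library. A minimal sketch: the fourth observation is taken to be 69.4, which is an assumption made here because it is the value implied by the stated mean of 68.7 and the listed deviations −0.7, −0.2, 0.2 and 0.7.

```python
import math
import statistics

# Fourth value assumed to be 69.4 (implied by the mean 68.7 and the deviations).
data = [68, 68.5, 68.9, 69.4]

n = len(data)
xbar = sum(data) / n
# Divide by n - 1, not n, as in the formula for s.
s_manual = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
s_library = statistics.stdev(data)  # same n - 1 definition

print(round(xbar, 2), round(s_manual, 2), round(s_library, 2))
```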

32 Using the z-transform with estimated standard deviation Once we have estimated the standard deviation we replace the unknown true standard deviation in the z-transform with the estimated standard deviation: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \Rightarrow \frac{\bar{X} - \mu}{s/\sqrt{n}}$, and correspondingly $\bar{X} \pm 1.96\,\sigma/\sqrt{n} \rightarrow \bar{X} \pm 1.96\,s/\sqrt{n}$. After this we could conduct the analysis just as before. However, we will show in the next few slides (with the aid of Statcrunch) that this strategy leads to unreliable confidence intervals (when the sample size is small). We consider two examples. The data is normal (we 'draw' samples from a distribution with mean 3.8 and standard deviation 3.8, however the confidence interval used does not know these specifications) and the sample size is n = 3. The data is normal (as above), but the sample size is n = 50.

33 Case 1: Normal data sample size 3. In this example we draw samples of size 3. The 95% CI using the above data and the normal distribution is $[\bar{X} - 1.96 \times 1.73/\sqrt{3},\ \bar{X} + 1.96 \times 1.73/\sqrt{3}]$. We see from this example that the estimated standard deviation (1.73) underestimates the true standard deviation (3.8). This in general tends to be true for small sample sizes. This means the 95% CI is too narrow. We see from the plot on the left that only 84% of the '95%' CIs contain the mean. This means it is not a 95% CI. Something has gone wrong.

34 Case 1: Normal data sample size 50 In the previous example the sample size was 3; now we consider the case that the sample size is 50. For the example given on the right the 95% CI is $[\bar{X} - 1.96 \times 4.07/\sqrt{50},\ \bar{X} + 1.96 \times 4.07/\sqrt{50}]$. For this example, the estimated standard deviation 4.07 is far closer to the true 3.8. This in general is true for large sample sizes. Looking at the number of times the mean is contained within the 95% confidence interval (on the right) we see that it is close to the prescribed level of 95%.

35 Observations from the experiments Simply replacing the true standard deviation with the estimated standard deviation seems to have severe consequences on the confidence interval. When the sample size is small there tends to be an underestimation of the standard error, resulting in the 95% CI not really being a 95% CI. To see why, consider the z-transforms of the sample mean with known and estimated standard deviations: (sample mean − µ)/(σ/√n) and (sample mean − µ)/(s/√n). In the first case, the z-transform will be a standard normal. In the second case the estimated standard deviation adds extra variability into the 'system'. In particular, because s can be smaller than σ, the z-transform can be larger and take higher values than we would expect for a standard normal. In the next few slides we show that when we estimate the standard deviation the z-transform is no longer a standard normal, but follows the so-called t-distribution.
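
The loss of coverage when s is plugged in with the multiplier 1.96 can be reproduced by simulation. This is a sketch, not the Statcrunch experiment from the slides: the function name, seed and repetition count are choices made here, and the population is assumed to be Normal with mean 67 and standard deviation 3.8 (the coverage does not depend on these two values).

```python
import math
import random

def coverage_with_s(n, crit=1.96, reps=20000, mu=67.0, sigma=3.8, seed=2):
    """Share of intervals xbar +/- crit * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with the n - 1 divisor.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        if abs(xbar - mu) <= crit * s / math.sqrt(n):
            hits += 1
    return hits / reps

for n in (3, 50):
    print(n, coverage_with_s(n))
```

For n = 3 the proportion falls well short of 95%, while for n = 50 it is close to the nominal level, in line with the two experiments on the preceding slides.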

36 Review: σ is unknown When σ is unknown we can estimate the standard deviation from the data. The sample standard deviation s provides an estimate of the population standard deviation σ. When the sample size is large, the sample is likely to contain elements representative of the whole population. Then s is a good estimate of σ. But when the sample size is small, the sample contains only a few individuals. Then s is a mediocre estimate of σ. The data is unlikely to contain values in the tails, and s is likely to underestimate σ. [Figure: population distribution with a large sample and a small sample drawn from it.]

37 Sample means and standard deviations Just as the sample mean is random with a distribution, so is the sample standard deviation. Here we take a sample of size 10 from a normal distribution and calculate its sample mean and variance.

38 Estimating the standard deviation The sampling distribution of the sample standard deviation (n=5). The sampling distribution of the sample standard deviation (n=25). Observe that as the sample size increases the sample standard deviation becomes less variable (1.70 reduces to 0.65). A large amount of variability in the sample standard deviation influences the confidence interval.

39 That nice Mr. Gosset Just over 100 years ago, W.S. Gosset was a biometrician who worked for the Guinness Brewery in Dublin, Ireland. His hobby was statistics. Gosset realized that his inferences with small sample data seemed to be incorrect too often: his true confidence level was less than it was stated to be. We just observed this in the previous simulations. He worked out the proper method that took into account substituting s for σ. But he had to publish under a pseudonym: Student (probably because Gosset was a sweet and modest person). Gosset's theory is based on the distribution of the quantity $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$. This looks like the z-score for $\bar{x}$, except that s replaces σ in the denominator.

40 Formal: Student's t distributions Suppose that an SRS of size n is drawn from a Normal(µ,σ) population. When σ is known, the sampling distribution for $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is Normal(0,1). When σ is estimated by the sample standard deviation s, the sampling distribution for $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ will be very close to normal if the sample size n is large. This is because for large n, s will be a very reliable estimator of σ. However, in the case that n is not so large, the variability in s will have an impact on the distribution. It is clear that the impact it has depends on the sample size.

41 Student's t distributions When σ is estimated by the sample standard deviation s, the sampling distribution for $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ will depend on the sample size. The sampling distribution of t is a t distribution with n − 1 degrees of freedom. The degrees of freedom (df) is a measure of how well s estimates σ. The larger the degrees of freedom, the better σ is estimated. This means we need a new set of tables!

42 When n is very large, s is a very good estimate of σ, and the corresponding t distributions are very close to the normal distribution. The t distributions become wider (thicker tailed) for smaller sample sizes, reflecting that s can be smaller than σ, so the corresponding t-transform is more likely to take extreme values than the z-transform.

43 Impact on confidence intervals Suppose we want to construct the C% confidence interval for the mean. The standard deviation is unknown, so as well as estimating the mean we also estimate the standard deviation from the sample. The C% confidence interval is $[\bar{X} - t^*\,s/\sqrt{n},\ \bar{X} + t^*\,s/\sqrt{n}]$, where t* is the value of the t distribution with n − 1 degrees of freedom that leaves an area of (100 − C)/2 % in each tail. Examples: 95%, sample size n=3: $[\bar{X} - 4.3\,s/\sqrt{3},\ \bar{X} + 4.3\,s/\sqrt{3}]$. 95%, sample size n=10: $[\bar{X} - 2.26\,s/\sqrt{10},\ \bar{X} + 2.26\,s/\sqrt{10}]$. Example: For a 95% confidence level C, 95% of the area under Student's t curve is contained in the interval [−t*, t*].
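
The t-interval has the same shape as the z-interval, with t* in place of 1.96 and s in place of σ. A minimal sketch: the dictionary below holds a few standard two-sided 95% t-table entries typed in by hand (the standard library has no t quantile function), `t_interval` is a name chosen here, and the summary statistics in the example (mean 10, sd 2) are hypothetical.

```python
import math

# Two-sided 95% critical values t* by degrees of freedom (standard t-table entries).
T_STAR_95 = {2: 4.303, 8: 2.306, 9: 2.262, 14: 2.145}

def t_interval(xbar, s, n, t_table=T_STAR_95):
    """95% t-interval for the mean using the sample sd s and n - 1 df."""
    t_star = t_table[n - 1]
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical summary statistics: sample mean 10, sample sd 2.
for n in (3, 10):
    low, high = t_interval(10.0, 2.0, n)
    print(n, round(low, 2), round(high, 2))
```

With the same mean and s, the n = 3 interval is much wider than the n = 10 interval, both because t* is larger and because √n is smaller.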

44 Confidence level and the margin of error The confidence level C determines the value of t* (in Table D). The margin of error also depends on t*: $m = t^*\,s/\sqrt{n}$. Higher confidence C implies a larger margin of error m (thus less precision in our estimates). A lower confidence level C produces a smaller margin of error m (thus better precision in our estimates). We find t* in the line of Table D for df = n − 1 and confidence level C.

45 Table D When σ is unknown, we use a t distribution with n − 1 degrees of freedom (df) for $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$. Table D shows the z-values and t-values corresponding to landmark P-values/confidence levels. When the sample is very large, we use the normal distribution and the standardized z-value.

46 Focus first on 2.5%. For each n, the 2.5% corresponds to the area in each of the left and right tails of the t-distribution with n degrees of freedom. Remember a distribution gives the chance/likelihood of certain outcomes. Recall that for a normal distribution, the point where we get 2.5% in each of the left and right tails of the distribution is 1.96 (which is the very last row of the table). If we go down the table, we see that as the sample size, n, increases the value corresponding to 2.5% goes from 12.71 (for n=1) to a number that is very close to 1.96 for extremely large n. This means that for small n the variability in the standard deviation s makes the chance of the t-transform being extreme relatively large. However, as n grows, the estimator of the standard deviation improves, and the t-transform gets closer to a normal distribution. You observe the same is true for other percentages. 90% means looking up 5%. 99% means looking up 0.5%. DO NOT MIX CONFIDENCE LEVEL WITH SIGNIFICANCE LEVEL.

47 Case 1: Normal data sample size 3, using t-dist In this example we draw samples of size 3. The 95% CI using the above data and the t-distribution is $[\bar{X} - 4.3\,s/\sqrt{3},\ \bar{X} + 4.3\,s/\sqrt{3}]$. This is the same example as considered previously, but now the t-distribution has been used: 1.96 has been replaced with 4.3. From the plot on the right we see that when the t-distribution is used to construct the CI, about 95% of the 95% confidence intervals really do contain the population mean. By using the t-distribution we have corrected for the underestimation of the standard deviation by the sample sd.
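
The effect of swapping 1.96 for the t multiplier can be checked with a simulation for samples of size three. This is a sketch under assumptions made here: normal data with mean 67 and standard deviation 3.8, a fixed seed, and the table value 4.303 for 2 degrees of freedom.

```python
import math
import random

def coverage(crit, n=3, reps=20000, mu=67.0, sigma=3.8, seed=3):
    """Share of intervals xbar +/- crit * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        if abs(xbar - mu) <= crit * s / math.sqrt(n):
            hits += 1
    return hits / reps

print("normal multiplier 1.96:", coverage(1.96))
print("t multiplier 4.303:   ", coverage(4.303))
```

Only the second line should come out near 0.95, which is the correction this slide describes.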

48 REMEMBER we only use the t-distribution because we have estimated the standard deviation from the data. Non-normal data: A misconception Using a t-distribution rather than a normal distribution when constructing a confidence interval does not correct for the lack of normality in the data. In the example on the left, we use the t-distribution to construct the CI. But we observe that only 88% of the 95% confidence intervals contain the mean. Fundamentally, if the data is not normal and the sample size is small, neither the normal nor the t will give the correct 95% confidence interval.

49 Calculation practice (red wine 1) It has been suggested that drinking red wine in moderation may protect against heart attacks. This is because red wine contains polyphenols, which act on blood cholesterol. To see if moderate red wine consumption increases the average blood level of polyphenols, a group of nine randomly selected healthy men were assigned to drink half a bottle of red wine daily for two weeks. The percent changes in their blood polyphenol levels give the following summary: sample average x̄ = 5.50, sample standard deviation s, degrees of freedom df = n − 1 = 8. We will encounter two problems when doing the analysis. The first is that the sample size is not huge, so we have to hope that the distribution of the sample mean is close to normal. The second is that the standard deviation is unknown and has to be estimated from the data.

50 What is the 95% confidence interval for the average percent change? First, we determine what t* is. The degrees of freedom are df = n − 1 = 8 and C = 95%. From Table D we get t* = 2.306. The margin of error m is: m = t* × s/√n = 2.306 × s/√9 = 1.93. So the 95% confidence interval is 5.50 ± 1.93, or 3.57 to 7.43. We can say: "With 95% confidence, the mean percent increase is between 3.57% and 7.43%." What if we want a 99% confidence interval instead? For C = 99% and df = 8, we find t* = 3.355. Thus m = 3.355 × s/√9 = 2.81. Now, with 99% confidence, we can only conclude that the mean is between 2.69 and 8.31. (A big price to pay for the extra confidence.)
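The price of the extra confidence can be worked out directly from the 95% margin of error, because only the multiplier t* changes while s/√n stays the same. A small sketch, assuming the standard table values t* = 2.306 (95%) and t* = 3.355 (99%) for 8 degrees of freedom; the variable names are illustrative.

```python
xbar = 5.50      # sample average from the slide
m95 = 1.93       # 95% margin of error quoted on the slide
t95 = 2.306      # t* for df = 8, C = 95% (standard table value)
t99 = 3.355      # t* for df = 8, C = 99% (standard table value)

# Standard error s / sqrt(n) recovered from the 95% margin
std_err = m95 / t95

# Only the multiplier changes when the confidence level changes
m99 = t99 * std_err
lower99, upper99 = xbar - m99, xbar + m99

print("99% margin of error:", round(m99, 2))
print("99% interval:", round(lower99, 2), "to", round(upper99, 2))
```

The margin grows by the factor t99 / t95, about 1.45, which is why the 99% interval is so much wider.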

51 Calculation practice (red wine 2) Let us return to the same study, but this time we increase the sample size to 15 men. The data are now: 0.7, 3.5, 4, 4.9, 5.5, 7, 7.4, 8.1, 8.4, 3.2, 0.8, 4.3, -0.2, -0.6, 7.5. The sample mean in this case is 4.3 and the sample standard deviation is 3.07. Since the sample size has increased, it is likely that the sample standard deviation is a more reliable estimator of the true standard deviation. The number of degrees of freedom is 14. Just as in the previous example we can construct a 95% confidence interval, but now we use 14 df instead of 8 df. Solution: Using the t-tables (14 df, 2.5%), the 95% CI is 4.3 ± 2.145 × 3.07/√15 = [2.6, 6].
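Since the 15 observations are listed, the whole calculation can be reproduced from the raw data. A minimal sketch, assuming the multiplier 2.145 read from the t-tables at 14 degrees of freedom and 2.5%; the variable names are illustrative.

```python
import math
import statistics

# Percent changes in blood polyphenol levels for the 15 men
data = [0.7, 3.5, 4, 4.9, 5.5, 7, 7.4, 8.1, 8.4,
        3.2, 0.8, 4.3, -0.2, -0.6, 7.5]

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)        # sample sd (n - 1 in the denominator)
std_err = s / math.sqrt(n)

t_star = 2.145                    # t-tables: 14 df, 2.5% in each tail
margin = t_star * std_err
lower, upper = xbar - margin, xbar + margin

print("mean:", round(xbar, 2), "sd:", round(s, 2), "df:", n - 1)
print("95% CI:", round(lower, 1), "to", round(upper, 1))
```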

52 Confidence intervals using software Usually software will construct the confidence interval for you. Therefore it is important to connect the calculations with the statistical output. The box on the right is the output (it is superimposed on the window used to generate the output). Observe that L. Limit and U. Limit give the confidence interval [2.6, 6] calculated on the previous slide. DF = 14 matches the degrees of freedom.

53 Calculation practice 3 Let us return to the example of prices of apartments in Dallas. 10 apartments are randomly sampled. The sample mean and the sample standard deviation based on this sample are 980 dollars and 250 dollars (both are estimators based on a sample of size ten). Construct a 95% confidence interval for the mean: The standard error is 250/√10 = 79. Looking up the t-tables at 2.5% and 9 degrees of freedom gives 2.262. The 95% confidence interval for the mean is [980 ± 2.262 × 79] = [801, 1159]. Suppose we want to know whether the price of apartments has increased since last year, when the mean price was 850 dollars. Based on this interval we see that 850 dollars, as well as values greater than it, is contained in the interval. This means the mean could be 850 dollars or higher. Therefore, given the sample, it is unclear whether the mean price of apartments has increased since last year or not.
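When only summary statistics are available, the interval follows from three numbers and the table value. A short sketch; the helper name t_interval is invented for the illustration, and 2.262 is the standard t value for 9 degrees of freedom at 2.5%.

```python
import math

def t_interval(mean, sd, n, t_star):
    """Confidence interval mean +/- t* * sd / sqrt(n) from summary statistics."""
    margin = t_star * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Dallas apartments: n = 10, sample mean 980, sample sd 250
lower, upper = t_interval(mean=980, sd=250, n=10, t_star=2.262)
print("95% CI:", round(lower), "to", round(upper))

# Last year's mean price lies inside the interval, so the data
# do not settle whether the mean price has increased.
last_year = 850
print("850 inside interval:", lower <= last_year <= upper)
```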

54 Calculation practice 4 Let us return to the M&M data. Suppose we want to calculate a 99% confidence interval for the mean number of M&Ms in plain, peanut butter and peanut M&Ms. These can be calculated using the summary statistics output: Summary statistics for Total, grouped by Type, with columns Type, n, Mean, Variance, Std. Dev., Std. Err., Median, Range, Min, Max, Q1, Q3 and one row for each type (M, P, PB). Using this output we can calculate the confidence intervals for the mean number of M&Ms in each type.

55 Using software to obtain confidence intervals Go to Stats -> t-statistics -> one-sample -> with data -> select the column you want to analyse (choose the Group by option if you want it grouped); on the next page select confidence interval and the level you want it at. The output reports Sample mean, Std. err, DF, L. Limit and U. Limit. Looking at the intervals, do you think that the mean number of M&Ms in a plain bag and in a peanut bag could be the same? What about the mean number in peanut and peanut butter? Later on we shall make a formal test of these questions.

56 Calculation practice: coffee shop sales A marketing firm randomly samples 45 coffee shops and determines their annual sales. The sample has an average of $2.67 million and a standard deviation of $1.03 million. What can we say with 90% confidence about the mean annual sales for the population of all coffee shops? We use x̄ ± t* s/√n. The degrees of freedom is 45 − 1 = 44. For 90% confidence, we find t* ≈ 1.68. The margin of error is 1.68 × 1.03/√45 = 0.26. So the interval for the true mean is 2.67 ± 0.26. We conclude that the mean annual sales of all coffee shops is between $2.41 million and $2.93 million, with 90% confidence.
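With 44 degrees of freedom the t multiplier is already close to the normal one. The sketch below recomputes the interval and, for comparison, the margin that a normal multiplier would give; t* = 1.68 is the approximate table value for 90% confidence, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

xbar, s, n = 2.67, 1.03, 45       # sales in millions of dollars
std_err = s / math.sqrt(n)

t_star = 1.68                      # approximate t* for 90% confidence, df = 44
margin_t = t_star * std_err
lower, upper = xbar - margin_t, xbar + margin_t

# Normal multiplier for 90% confidence (5% in each tail)
z_star = NormalDist().inv_cdf(0.95)
margin_z = z_star * std_err

print("90% CI:", round(lower, 2), "to", round(upper, 2))
print("t margin:", round(margin_t, 3), "normal margin:", round(margin_z, 3))
```

The two margins differ only slightly, which reflects how close the t-distribution is to the normal distribution at this sample size.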

57 Summary of confidence interval for µ The confidence interval for a population mean µ is x̄ ± t* s/√n. t* is obtained from Student's t distribution using n − 1 degrees of freedom (Table D in the textbook). t* is the value such that the confidence level C is the area between −t* and t*. Confidence is the proportion of samples that lead to a correct conclusion (for a specific method of inference). The investigator chooses the confidence level C. Tradeoff: more confidence means bigger margin of error, wider intervals. The degrees of freedom is associated with s, the estimate for σ. The margin of error, t* s/√n, also depends on the sample size: larger samples are better.

58 Interpretation of confidence, again The confidence level C is the proportion of all possible random samples (of size n) that will give results leading to a correct conclusion, for a specific method. In other words, if many random samples were obtained and confidence intervals were constructed from their data with C = 95%, then 95% of the intervals would contain the true parameter value. In the same way, if an investigator always uses C = 95%, then 95% of the confidence intervals he constructs will contain the parameter value being estimated. But he never knows which ones do! Changing the method (such as changing the value of t*) will change the confidence level. Once computed, any individual confidence interval either will or will not contain the true population parameter value. It is not random. It is not correct to say C is the probability that the true value falls in the particular interval you have computed.

59 Cautions about using x̄ ± t* s/√n This formula is only for inference about µ, the population mean. Different formulas are used for inference about other parameters. The data must be a simple random sample from the population. The formula is not quite correct for other sampling designs. (But see a statistician to get the right inference method.) Confidence intervals based on t* are not resistant to outliers. If n is small and the population is not normal, the true confidence level could be smaller than C. (Usually n ≥ 30 suffices unless the data are highly skewed.) This inference cannot rescue sampling bias, badly produced data or computational errors.

60 Accompanying problems associated with this Chapter Quiz 7 Quiz 8 Homework 4 (part of it)

5.1 Identifying the Target Parameter

5.1 Identifying the Target Parameter University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying

More information

Point and Interval Estimates

Point and Interval Estimates Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

Estimation and Confidence Intervals

Estimation and Confidence Intervals Estimation and Confidence Intervals Fall 2001 Professor Paul Glasserman B6014: Managerial Statistics 403 Uris Hall Properties of Point Estimates 1 We have already encountered two point estimators: th e

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Week 4: Standard Error and Confidence Intervals

Week 4: Standard Error and Confidence Intervals Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Week 3&4: Z tables and the Sampling Distribution of X

Week 3&4: Z tables and the Sampling Distribution of X Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal

More information

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

More information

Stat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015

Stat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015 Stat 411/511 THE RANDOMIZATION TEST Oct 16 2015 Charlotte Wickham stat511.cwick.co.nz Today Review randomization model Conduct randomization test What about CIs? Using a t-distribution as an approximation

More information

Confidence intervals

Confidence intervals Confidence intervals Today, we re going to start talking about confidence intervals. We use confidence intervals as a tool in inferential statistics. What this means is that given some sample statistics,

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Simple Inventory Management

Simple Inventory Management Jon Bennett Consulting http://www.jondbennett.com Simple Inventory Management Free Up Cash While Satisfying Your Customers Part of the Business Philosophy White Papers Series Author: Jon Bennett September

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Two-sample inference: Continuous data

Two-sample inference: Continuous data Two-sample inference: Continuous data Patrick Breheny April 5 Patrick Breheny STA 580: Biostatistics I 1/32 Introduction Our next two lectures will deal with two-sample inference for continuous data As

More information

Statistics 2014 Scoring Guidelines

Statistics 2014 Scoring Guidelines AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

1. How different is the t distribution from the normal?

1. How different is the t distribution from the normal? Statistics 101 106 Lecture 7 (20 October 98) c David Pollard Page 1 Read M&M 7.1 and 7.2, ignoring starred parts. Reread M&M 3.2. The effects of estimated variances on normal approximations. t-distributions.

More information

Lesson 17: Margin of Error When Estimating a Population Proportion

Lesson 17: Margin of Error When Estimating a Population Proportion Margin of Error When Estimating a Population Proportion Classwork In this lesson, you will find and interpret the standard deviation of a simulated distribution for a sample proportion and use this information

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples Comparing Two Groups Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test. The

More information

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

More information

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as... HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Suppose following data have been collected (heights of 99 five-year-old boys) 117.9 11.2 112.9 115.9 18. 14.6 17.1 117.9 111.8 16.3 111. 1.4 112.1 19.2 11. 15.4 99.4 11.1 13.3 16.9

More information

CHAPTER 14 NONPARAMETRIC TESTS

CHAPTER 14 NONPARAMETRIC TESTS CHAPTER 14 NONPARAMETRIC TESTS Everything that we have done up until now in statistics has relied heavily on one major fact: that our data is normally distributed. We have been able to make inferences

More information

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Math 251, Review Questions for Test 3 Rough Answers

Math 251, Review Questions for Test 3 Rough Answers Math 251, Review Questions for Test 3 Rough Answers 1. (Review of some terminology from Section 7.1) In a state with 459,341 voters, a poll of 2300 voters finds that 45 percent support the Republican candidate,

More information

3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples 3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

Confidence Intervals for the Difference Between Two Means

Confidence Intervals for the Difference Between Two Means Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck!

Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck! Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck! Name: 1. The basic idea behind hypothesis testing: A. is important only if you want to compare two populations. B. depends on

More information

Chapter 7 Section 1 Homework Set A

Chapter 7 Section 1 Homework Set A Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t *. What critical value t * from Table D (use software, go to the web and type t distribution applet) should be used to calculate the

More information

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as... HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

More information

Unit 26 Estimation with Confidence Intervals

Unit 26 Estimation with Confidence Intervals Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference

More information

BA 275 Review Problems - Week 5 (10/23/06-10/27/06) CD Lessons: 48, 49, 50, 51, 52 Textbook: pp. 380-394

BA 275 Review Problems - Week 5 (10/23/06-10/27/06) CD Lessons: 48, 49, 50, 51, 52 Textbook: pp. 380-394 BA 275 Review Problems - Week 5 (10/23/06-10/27/06) CD Lessons: 48, 49, 50, 51, 52 Textbook: pp. 380-394 1. Does vigorous exercise affect concentration? In general, the time needed for people to complete

More information

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

Lesson 7 Z-Scores and Probability

Lesson 7 Z-Scores and Probability Lesson 7 Z-Scores and Probability Outline Introduction Areas Under the Normal Curve Using the Z-table Converting Z-score to area -area less than z/area greater than z/area between two z-values Converting

More information

Name: Date: Use the following to answer questions 3-4:

Name: Date: Use the following to answer questions 3-4: Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin

More information

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7. THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

SAMPLING DISTRIBUTIONS

SAMPLING DISTRIBUTIONS 0009T_c07_308-352.qd 06/03/03 20:44 Page 308 7Chapter SAMPLING DISTRIBUTIONS 7.1 Population and Sampling Distributions 7.2 Sampling and Nonsampling Errors 7.3 Mean and Standard Deviation of 7.4 Shape of

More information

Statistics 100 Sample Final Questions (Note: These are mostly multiple choice, for extra practice. Your Final Exam will NOT have any multiple choice!

Statistics 100 Sample Final Questions (Note: These are mostly multiple choice, for extra practice. Your Final Exam will NOT have any multiple choice! Statistics 100 Sample Final Questions (Note: These are mostly multiple choice, for extra practice. Your Final Exam will NOT have any multiple choice!) Part A - Multiple Choice Indicate the best choice

More information

Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

More information

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

Chapter Study Guide. Chapter 11 Confidence Intervals and Hypothesis Testing for Means

Chapter Study Guide. Chapter 11 Confidence Intervals and Hypothesis Testing for Means OPRE504 Chapter Study Guide Chapter 11 Confidence Intervals and Hypothesis Testing for Means I. Calculate Probability for A Sample Mean When Population σ Is Known 1. First of all, we need to find out the

More information

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

= 2.0702 N(280, 2.0702)

= 2.0702 N(280, 2.0702) Name Test 10 Confidence Intervals Homework (Chpt 10.1, 11.1, 12.1) Period For 1 & 2, determine the point estimator you would use and calculate its value. 1. How many pairs of shoes, on average, do female

More information

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1. Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 1. Which of the following will increase the value of the power in a statistical test

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

Confidence Intervals for Cp

Confidence Intervals for Cp Chapter 296 Confidence Intervals for Cp Introduction This routine calculates the sample size needed to obtain a specified width of a Cp confidence interval at a stated confidence level. Cp is a process

More information

STAT 350 Practice Final Exam Solution (Spring 2015)

STAT 350 Practice Final Exam Solution (Spring 2015) PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

More information

Hypothesis Testing: Two Means, Paired Data, Two Proportions

Hypothesis Testing: Two Means, Paired Data, Two Proportions Chapter 10 Hypothesis Testing: Two Means, Paired Data, Two Proportions 10.1 Hypothesis Testing: Two Population Means and Two Population Proportions 1 10.1.1 Student Learning Objectives By the end of this

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck! STP 231 EXAM #1 (Example) Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.

More information

Principles of Hypothesis Testing for Public Health

Principles of Hypothesis Testing for Public Health Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. I. More Precise Definition of Simple Random Sample. Connection with independent random variables. Problems with small populations. II. Why Random Sampling is Important. A myth,

Variables Control Charts

MINITAB ASSISTANT WHITE PAPER. This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Variables

Chapter 23 Inferences About Means

Chapter 23 Solutions to Class Examples. 1. See Class Example 1. 2. We want to know if the mean battery lifespan exceeds the 300-minute

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

STT315 Practice Ch 5-7. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Solve the problem. 1) The length of time a traffic signal stays green (nicknamed

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

2 Making Connections: The Two-Sample t-test, Regression, and ANOVA. "In theory, there's no difference between theory and practice. In practice, there is." (Yogi Berra) Statistics courses often teach the two-sample

Association Between Variables

Contents: 11 Association Between Variables; 11.1 Introduction; 11.1.1 Measure of Association; 11.1.2 Chapter Summary; 11.2 Chi

Section 13, Part 1 ANOVA. Analysis Of Variance

Course Overview. So far in this course we've covered: Descriptive statistics; Summary statistics; Tables and Graphs; Probability; Probability Rules; Probability

Inference for two Population Means

Bret Hanlon and Bret Larget, Department of Statistics, University of Wisconsin Madison. October 27 to November 1, 2011. Case Study Example

Two-sample hypothesis testing, II 9.07 3/16/2004

Small sample tests for the difference between two independent means. For two-sample tests of the difference in mean, things get a little confusing, here,

Constructing and Interpreting Confidence Intervals

Confidence Intervals. In this PowerPoint, you will learn: why confidence intervals are important in evaluation research; how to interpret a confidence

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Commands in JMP and Statcrunch. Below is a set of commands in JMP and Statcrunch that facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

Normal distribution

The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

Statistics Review PSY379

Basic concepts; Measurement scales; Populations vs. samples; Continuous vs. discrete variable; Independent vs. dependent variable; Descriptive vs. inferential stats; Common analyses

Confidence Intervals for One Standard Deviation Using Standard Deviation

Chapter 640. Introduction: This routine calculates the sample size necessary to achieve a specified interval width or distance from

Economics of Strategy (ECON 4550) Maymester 2015 Applications of Regression Analysis

Reading: ACME Clinic (ECON 4550 Coursepak, Page 47) and Big Suzy's Snack Cakes (ECON 4550 Coursepak, Page 51). Definitions

Independent samples t-test. Dr. Tom Pierce Radford University

The logic behind drawing causal conclusions from experiments. The sampling distribution of the difference between means. The standard error of

NCSS Statistical Software

Chapter 06. Introduction: This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

Chapter 2. Hypothesis testing in one population

Contents: Introduction, the null and alternative hypotheses; Hypothesis testing process; Type I and Type II errors, power; Test statistic, level of significance

C. The null hypothesis is not rejected when the alternative hypothesis is true. A. population parameters.

Sample Multiple Choice Questions for the material since Midterm 2. Sample questions from Midterms 1 and 2 are also representative of questions that may appear on the final exam. A randomly selected sample

Statistics 151 Practice Midterm 1 Mike Kowalski

Multiple Choice (50 minutes). Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and

Non-random/non-probability sampling designs in quantitative research

Non-probability sampling designs do not follow the theory of probability in the choice of elements from the

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Jonathan Marchini. Course Information: There is a website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

Lesson 4 Measures of Central Tendency

Outline: Measures of a distribution's shape (modality and skewness; the normal distribution). Measures of central tendency (mean, median, and mode). Skewness and Central Tendency. Lesson 4 Measures of Central

Statistics 104: Section 6!

TF: Deirdre (say: Dear-dra) Bloome. Email: dbloome@fas.harvard.edu. Section Times: Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705. Office Hours: Thursday 6pm-7pm SC

Chi-square tests and nonparametric statistics. Summary sheet from last time: Hypothesis testing. Summary sheet from last time: Confidence intervals

Summary sheet from last time: Confidence intervals. Confidence intervals take on the usual form: parameter = statistic ± t_crit × SE(statistic). Parameter and SE: a: s_e sqrt(1/n + m_x^2/ss_xx); b: s_e/sqrt(ss

Statistical estimation using confidence intervals

In Chapter 2, the concept of the central nature and variability of data and the methods by which these two phenomena

1 Sufficient statistics

A statistic is a function $T = r(X_1, X_2, \ldots, X_n)$ of the random sample $X_1, X_2, \ldots, X_n$. Examples are $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, the sample mean; $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$, the sample variance; $T_1 =$