Statistics EXAM II through 2003


December 1, 1999

I. (40 points) Place the letter of the best answer in the blank to the left of each question.

(1) In testing H_0: µ ≤ 5 vs H_1: µ > 5, the P-value of the data was [value missing]. If α = .01 and the true value of µ was µ = 7, then the decision based on the data
A. was a Type I error.
B. was a Type II error.
C. was correct.
D. cannot be determined.
E. all of the above

(2) The P-value of the computed value of a test statistic is
A. the weight of evidence in favor of H_1.
B. the largest value of α for which the observed data will reject H_0.
C. the smallest value of α for which the observed data will reject H_0.
D. the probability of observing a less extreme value of the test statistic.

(3) A 95% confidence interval for µ was calculated to be (15, 27). Then 95% represents:
A. the probability of a Type I error.
B. the probability that µ is between 15 and 27.
C. the probability that we obtain a sample which will yield an interval containing µ.
D. the probability that µ [symbol missing] 10.

(4) The most crucial of the conditions imposed on the sampled data and the populations in order for the pooled t-test to be valid is
A. normality.
B. equal variance.
C. independence.
D. all three conditions are equally important.
E. none of the conditions are crucial.

(5) In a hypothesis test of H_0: µ ≥ 5 vs H_1: µ < 5, with σ known, if the sample size remains constant but the level α is increased from .01 to .05, then the probability of a Type II error at µ = 4
A. increases.
B. decreases.
C. remains the same.
D. may increase or decrease depending on the sample size.
E. cannot be determined with the given information.

(6) The reason that experimental units are paired in a study to compare the average responses of two treatments
A. is to reduce the degrees of freedom of the t-test.
B. is to reduce the variance of the difference in the two sample means.
C. is to increase the degrees of freedom of the t-test.
D. is to make the difference in the two sample means normally distributed.

(7) The Wilcoxon rank sum statistic is preferred to the pooled t-test
A. if the population distributions are normally distributed.
B. for all continuous distributions.
C. if the population distributions are symmetric.
D. if the population distributions have equal variance.
E. if the population distributions have very heavy tails.

(8) A 95/99 tolerance interval for a normal population
A. has a higher level of confidence than a 95% confidence interval on the population mean.
B. is a 95% estimate of µ and a 99% estimate of σ.
C. is an estimate of a region of values which will contain between 95% and 99% of the population values.
D. is a region of values for which we are 99% confident that the region contains 95% of the population values.
E. is based on the central limit theorem.

(9) An experimenter wants to test H_0: F = F_0, where F is a process cdf. Which one of the following statements is TRUE?
A. The Chi-squared GOF test is the preferred test statistic.
B. The most powerful test statistic depends on the shape of F_0.
C. The Anderson-Darling test has greater power than any other test.
D. The Shapiro-Wilk test has greater power than the Chi-squared test.
E. The Chi-squared GOF test can only be used when F_0 is discrete.

(10) If f(y; θ) is a pdf which is symmetric about θ, then, amongst the three test statistics discussed in class, the test statistic having greatest power
A. is the Wilcoxon signed rank test.
B. is the sign test.
C. is the t-test.
D. depends on the form of f(y; θ).

II. (28 points) In the following problems, (A) state the null and alternative hypotheses, (B) give the formula for the test statistic but do not compute its value, (C) set up the rejection region by selecting the proper value from the appropriate table.

(1) Two different types of fabrics (A and B) are to be compared on a Martingale wear tester. The wear tester is known to be quite variable from run to run. Thus, on each run of the tester, the operator evaluates a sample of Fabric A and a sample of Fabric B. The weight losses (in milligrams) from seven runs are as follows:

[Data table missing: weight loss by run for Fabric A and Fabric B, with the sample mean and standard deviation S of each fabric]

Is there significant evidence (α = 0.01) that Fabric A has a smaller average weight loss than Fabric B?
A. H_0:        H_a:
B. Test statistic:
C. Rejection Region:

(2) An experiment is run to study the effects of PCB, an industrial contaminant, on the reproductive ability of owls. The shell thickness of eggs produced by owls exposed to PCB is compared to the shell thickness of eggs produced by owls which did not have PCB exposure. From previous studies, the shell thickness of eggs has been normally distributed.

[Data table missing: shell thickness by owl for the PCB-Exposed and UnExposed groups, with the sample mean and standard deviation S of each group]

Is there significant (α = .05) evidence that the PCB-exposed owls have thinner egg shells than those of the unexposed owls?
A. H_0:        H_a:
B. Test statistic:
C. Rejection Region:

(3) Scientists think that robots will play a crucial role in factories in the next 20 years. Suppose that in an experiment to determine whether the use of robots to weave computer cables is feasible, a robot was used to assemble 10 cables and an experienced worker also assembled 10 cables. The cables were examined and the number of defectives on each cable was recorded. The 10 data points for each method were plotted in a normal probability plot and the plot indicated a very heavy-tailed distribution. The data is given here:

[Data table missing: number of defectives by cable for Robot and Human, with the mean and standard deviation of each method]

Does this data support the assertion that the average number of defectives per assembled cable is less for robots than for humans? Use α = [value missing].
A. H_0:        H_a:
B. Test statistic:
C. Rejection Region:

III. (32 points) A company is developing a new cooling fan for diesel engines. From previous studies involving the old fan, the exponential distribution provided an adequate model for the time until failure of the fan. The existing fan has an average time to failure of 25,000 hours. The company developing the new cooling fan wants to determine if the new fan has a longer average time to failure than the old fan.

(1) Suppose 20 of the new fans are evaluated in an accelerated life testing experiment and their times to failure T_1, ..., T_20 are determined. Describe a graphical technique to evaluate whether the failure times from the accelerated life testing follow an exponential distribution. Be sure to define all your terms and explicitly label both axes on your graphs.

(2) Suppose we have determined that the times to failure from the accelerated life testing procedures satisfy an exponential distribution and the average time to failure of the 20 fans is $\bar{T}$ = 32,381 hours using the accelerated life testing procedures. Construct a 95% confidence interval for the average time to failure for the new fans. Hint: If T_1, ..., T_n are iid exponential random variables with mean λ, then $2n\bar{T}/\lambda$ has a Chi-square distribution with d.f. = 2n.

(3) Is there sufficient evidence (α = .05), using the data from part (2) of this problem, that the new fan has a longer average time to failure than the fan currently in use? Compute the p-value of your test.

(4) If the average time to failure of the new fan is 30,000 hours, compute the probability that the test in part (3) of this problem will detect that the average time to failure is greater than 25,000 hours based on testing 20 new fans.
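The hint in Part III (2) of the December 1, 1999 exam can be turned into numbers with a short sketch. The code below is one possible way to carry out the calculations, not an official solution. It assumes the accelerated-test hours are directly comparable to the 25,000-hour figure, uses an equal-tailed interval, and relies on the fact that a chi-square variable with even degrees of freedom has a closed-form tail probability (a Poisson sum). The function names are illustrative.

```python
import math

def chi2_sf_even(x, df):
    """P(chi-square with even df exceeds x), using the exact Poisson-sum identity."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = math.exp(-half)          # k = 0 term of the Poisson sum
    total = term
    for k in range(1, df // 2):     # k = 1, ..., df/2 - 1
        term *= half / k
        total += term
    return total

def chi2_upper_quantile(alpha, df):
    """Value q with P(chi-square_df > q) = alpha, found by bisection."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > alpha:
            lo = mid                # tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

n, t_bar, mean0 = 20, 32381.0, 25000.0
df = 2 * n
pivot_total = 2 * n * t_bar         # numerator of the pivot 2n*T_bar/lambda

# Part (2): equal-tailed 95% confidence interval for the mean lambda
ci_lower = pivot_total / chi2_upper_quantile(0.025, df)
ci_upper = pivot_total / chi2_upper_quantile(0.975, df)

# Part (3): p-value for H0: lambda <= 25,000 vs Ha: lambda > 25,000
p_value = chi2_sf_even(pivot_total / mean0, df)

# Part (4): power of the alpha = .05 test when the true mean is 30,000
crit = chi2_upper_quantile(0.05, df)
power_at_30000 = chi2_sf_even(crit * mean0 / 30000.0, df)

print(ci_lower, ci_upper, p_value, power_at_30000)
```

The bisection avoids any dependence on a statistics library; with a library available, its chi-square quantile function could replace `chi2_upper_quantile`.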

EXAM II, December 1, 2000

I. (40 points) One of the major sources of sulfur dioxide air pollution is coal-powered utility plants. Currently only 10% of such power plants meet federal EPA air pollution standards. Several major changes have been made in the power plants, such as using low-sulfur coal, burning the coal at a higher temperature, and installing new scrubbers on the air stacks of the plants. One year after imposing the changes, the EPA randomly selected 25 power plants for a lengthy examination. The investigators found that 5 of the 25 power plants met the EPA air pollution standards, whereas 20 power plants did not meet the standards.

1. Place a 95% confidence interval on the proportion of power plants currently meeting the EPA air pollution standards.
2. Is there significant evidence at the α = .05 level that the proportion of power plants meeting the EPA air pollution standards has increased since the major changes were made in the power plants?
3. Compute the P-value for the test statistic obtained in part 2.
4. Evaluate the power function of the test constructed in part 2 for the proportions .05, .1, .2, .3, .4, .5, and .6. Use these values to sketch a rough plot of the power function.
5. The researchers are interested in a much larger study. How large must the sample size n be in order that the researchers can be 95% confident that the sample estimator of the proportion of power plants meeting the standard is within .05 of the true value?

II. (30 points) A company has designed a new type of braking system for sports utility vehicles. To evaluate the effectiveness of the new system, they place n of the braking systems on a test device and record the times to failure of the braking systems: Y_1, ..., Y_n. The pdf of the random variables, f(y; θ), depends on an unknown parameter θ. For each of the following situations, state whether the given statement is TRUE or FALSE. If the statement is FALSE, explain VERY BRIEFLY why the statement is FALSE.

1. In order to evaluate the reliability of the braking systems, a 95/95 tolerance interval for the population of times to failure for the brakes is to be constructed. In order to construct the interval, it is necessary that the functional form of f(y; θ) be known.
2. The researchers obtain a point estimator of θ. They state that the sampling distribution of the estimator can be adequately approximated by a normal distribution if the sample size, n, is large enough.
3. Under appropriate weather conditions, the pdf f(y; θ) is symmetric about θ. In this type of situation, the sample mean would have a smaller mean squared error (MSE) than the sample median as an estimator of θ.

III. (30 points) Place the letter of the BEST answer in the BLANK to the LEFT of each question.

(1) In testing H_0: π ≥ .4 vs H_a: π < .4, the P-value of the data was [value missing]. If α = .05 and the true value of π was π = .6, then the decision based on the data
A) was a Type I error.
B) was a Type II error.
C) was correct.
D) cannot be determined.

(2) The P-value of the computed value of a test statistic is
A) the probability of observing a less extreme value of the test statistic.
B) the largest value of α for which the observed data will reject H_0.
C) the weight of evidence in favor of H_a.
D) the smallest value of α for which the observed data will reject H_0.

(3) An industrial process produces piston rings having a nominal diameter of 9 cm. A 95% confidence interval for the mean diameter of the piston rings produced during July was calculated and a 95%/95% tolerance interval was calculated for the diameters of the piston rings produced during July.
A) The probability is .95 that the mean diameter will fall within the confidence interval.
B) The width of the 95% confidence interval is generally narrower than the width of the 95%/95% tolerance interval and hence is a more precise estimator.
C) If the engineer wanted to set limits such that 95% of the output was within these limits, then the tolerance interval would be more informative than the confidence interval.
D) The tolerance interval for the piston diameters could be used to determine if the mean diameter for July's output was equal to 9 cm or not.

(4) Suppose we have two normal populations and we want to compare the means of the populations. Random samples of size 9 are selected from each population. The researcher is certain that the standard deviation of the first population is at least 8 times larger than the standard deviation of the second population. The most appropriate procedure for testing if the two population means differ is
A. the pooled variance t-test.
B. the Wilcoxon rank sum test.
C. to transform the data and use a pooled variance t-test.
D. the separate variance t-test using the Satterthwaite correction for the degrees of freedom.
E. none of the procedures are very useful in this situation since the sample sizes are too small.

(5) A biochemist is attempting to estimate the typical length of time it takes a drug to reach the kidney of a mature rat. There are several possible estimators of this parameter. In attempting to select the best estimator the biochemist should
A. select the estimator with the smallest average squared distance from the parameter.
B. select the estimator with the smallest variance since it would be the most consistent estimator.
C. select the estimator with the smallest bias or smallest variance.
D. always select the unbiased estimator since on the average it would equal the parameter.
E. ask a statistician for advice.

(6) In a study to compare the average responses of two drugs for the treatment of heart worms, the researchers were concerned that the available dogs for the study ranged from 1 year to 12 years of age and weighed from 10 kilograms to 55 kilograms. How can the effect of the difference in experimental units be reduced so as to not mask any significant difference in the treatments?
A. A procedure based on the ranks of the responses should be used as a test statistic.
B. A transformation of the data prior to using a pooled t-test would be effective in reducing variability.
C. The separate variance t-test would adjust the test statistic for the unequal variances.
D. The dogs should be grouped into similar pairs of dogs prior to assigning the two treatments.
E. The effect of the differences in the dogs is not a problem if the treatments are randomly assigned to the dogs.

(7) The Wilcoxon rank sum test statistic is called a distribution-free test statistic since
A. its sampling distribution under H_0 does not depend on the shape of the population distributions.
B. it can be used even if the population distributions are non-normal.
C. its sampling distribution is non-normal for small sample sizes.
D. its sampling distribution does not require the variances to be equal.
E. it has greater power than the pooled t-test when the population distributions are non-normal.

(8) The random assignment of treatments to experimental units is crucial in designed experiments
A. since it eliminates the effects of nontreatment factors on the experimental responses.
B. since the effects of nontreatment factors are averaged over all experimental units, no matter the treatment.
C. since it allows us to estimate the amount that nontreatment factors affect the average response.
D. since the randomization prevents the experimenter from knowing which experimental units are assigned to which treatments.
E. [text missing] reasons are correct.

(9) An experimenter wants to test H_0: F = F_0, where F is a process cdf. Which ONE of the following statements is FALSE?
A. The Chi-squared GOF test statistic can be used even when F is an absolutely continuous cdf.
B. The Cramer-von Mises test statistic is generally a better test than the Kolmogorov-Smirnov test statistic.
C. The Anderson-Darling test has greater power than the Chi-squared GOF test for all choices of F_0.
D. The Shapiro-Wilk test has greater power than the Anderson-Darling test if F has a normal cdf.
E. The Kolmogorov-Smirnov test is not an appropriate test statistic when F_0 is discrete.

(10) If f(y; θ) is a pdf which is symmetric about θ, then, amongst the three test statistics discussed in class, the test statistic having greatest power
A. is the Wilcoxon signed rank test.
B. is the sign test.
C. is the t-test.
D. depends on the form of f(y; θ).
E. the tests have essentially the same power.
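Part I of the December 1, 2000 exam (5 of 25 power plants meeting the standard, against a baseline of 10%) lends itself to a short numerical sketch. The exam does not say which interval or test the course intended, so the choices here are assumptions: a large-sample (Wald) interval, an exact binomial upper-tail test, and the conservative p = 0.5 bound for the sample size. The variable and function names are illustrative.

```python
import math
from statistics import NormalDist

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n, x, p0, alpha = 25, 5, 0.10, 0.05
p_hat = x / n
z = NormalDist().inv_cdf(1 - alpha / 2)

# Question 1: large-sample (Wald) 95% interval for the proportion
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
wald_ci = (p_hat - half_width, p_hat + half_width)

# Questions 2-3: exact p-value for H0: pi <= 0.10 vs Ha: pi > 0.10
p_value = binom_upper_tail(x, n, p0)

# Smallest cutoff c such that rejecting when X >= c has size at most alpha
cutoff = next(c for c in range(n + 2) if binom_upper_tail(c, n, p0) <= alpha)

# Question 4: power of the test "reject when X >= cutoff" on the requested grid
power = {p: binom_upper_tail(cutoff, n, p) for p in (0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6)}

# Question 5: conservative sample size for a margin of .05 (worst case p = 0.5)
margin = 0.05
n_needed = math.ceil((z / (2 * margin)) ** 2)

print(wald_ci, p_value, cutoff, n_needed)
for p, pw in power.items():
    print(p, round(pw, 4))
```

Because the binomial is discrete, the exact test's actual size is below the nominal .05; a normal-approximation test would give slightly different numbers.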

November 21, 2001

I. (35 points) A company that manufactures silicon wafers for computer chips is concerned with both the mean thickness of the chips and the fluctuation in the thickness of the chips. In order to monitor the thickness, a random sample of 20 chips is selected every hour and the thickness is measured on each of the chips. The process is considered to be in control provided the process mean, µ, is 200 mm and the process standard deviation, σ, is less than or equal to 2.5 mm. The company's statistician develops a test to evaluate whether the process standard deviation is greater than 2.5 mm. She plots the power curve of the test in order to evaluate its performance. The curve is given here:

[Figure: Power Curve for Standard Deviation; axes: Power, Process Standard Deviation]

Use the above graph to answer questions a.-d.
a. What is the level of significance of the test whose power curve is depicted above?
b. What is the probability of a Type I error if σ = 2.4?
c. What is the probability of a Type II error if σ = 2.4?
d. What is the probability of a Type II error if σ = 3?
e. What is the practical consequence to the company if the test commits a Type I error?
f. What is the practical consequence to the company if the test commits a Type II error?

g. The random sample of 20 chips yields a sample standard deviation of [value missing]. Compute the p-value of the test statistic and determine if the process standard deviation is outside of its specification.

II. (25 points) The company also needs to meet the specification that the mean thickness of the wafers must be at least 200 mm. The company wants to develop a test which has a level of significance of 5% to determine if the mean thickness is less than 200 mm.
a. Suppose the sample size is 20 units and we develop a level 0.05 test based on n = 20. What is the chance the test will detect that the wafer mean thickness is less than 200 mm when the process mean is 199 mm and [value missing] mm? (Assume that σ = 2.5 mm.)
b. Sketch the power curve for the test.
c. What sample size is needed so that the test in part a. will have an 80% chance to detect that the wafer mean thickness is less than 200 mm when the true mean thickness is at most [value missing] mm?

III. (4 points each) Place the letter of the best answer in the blank to the left of each question.

(1) Suppose that the company's process engineer informed you that in most samples of 20 wafers the distribution of wafer thicknesses is highly skewed to the right. In using the Chi-square test of whether σ is greater than 2.5 mm with α = 0.05,
A. the actual level of significance will be greater than [value missing].
B. the actual level of significance will be less than [value missing].
C. the actual level of significance will be very close to [value missing].
D. it is completely unknown what the effect will be.

(2) The p-value of the observed value of a test statistic is
A. the probability of making the correct decision.
B. the weight of evidence that the alternative hypothesis is true.
C. the smallest level of significance for which the observed data will reject the null hypothesis.
D. the probability of making a Type I error.

(3) The sample estimator θ̂ of a population parameter θ is unbiased. Unbiased means that the estimator θ̂
A. is 95% certain of being close to θ.
B. has a sampling distribution which is symmetric about θ.
C. has a smaller mean squared error than a biased estimator.
D. has a sampling distribution with mean value θ.
E. all the above

(4) A 95/95 tolerance interval is to be constructed for a population having pdf f( ).
A. The tolerance interval is always wider than a 95% confidence interval for the population parameter.
B. It is necessary to specify the family for f( ) in order to construct the tolerance interval.
C. The distribution-free tolerance interval will generally be wider than the tolerance interval based on a specified family for f( ).
D. The normal based tolerance interval will have approximately the correct probabilities provided the sample size is large enough for the central limit theorem to be valid.
E. All of the above.

(5) In an α = 0.05 test of H_0: µ ≤ 12 versus H_1: µ > 12, the probability of a Type II error is
A. greater at µ = 13 than at µ = 14.
B. smaller at µ = 13 than at µ = 14.
C. the same at µ = 13 as at µ = 14.
D. always less than [value missing].

(6) In testing H_0: µ ≥ µ_0 versus H_1: µ < µ_0, where µ is the median of a population having a symmetric pdf, f( ),
A. the power of the t-test is greater than the power of the Wilcoxon signed-rank test.
B. the power of the t-test is greater than the power of the sign test.
C. the power of the t-test is less than the power of the Wilcoxon signed-rank test.
D. the power of the t-test is less than the power of the sign test.
E. None of the above

(7) The observations X_1, ..., X_n are positively correlated with correlation greater than 0.8. An α = 0.05 t-test, $t = (\bar{X} - \mu_0)/(s/\sqrt{n})$, of H_0: µ ≤ µ_0 versus H_1: µ > µ_0 will have
A. maximum probability of Type I error equal to [value missing].
B. maximum probability of Type I error less than [value missing].
C. maximum probability of Type I error greater than [value missing].
D. maximum probability of Type I error cannot be computed.

(8) A study is to be conducted to estimate the mean conductivity (in ohms) of a new alloy. What sample size is needed to ensure that the sample mean will estimate the average conductivity to within 5 ohms with a reliability of 99%? Conductivity has a normal distribution with a standard deviation of approximately 15.
A. 538
B. 30
C. 60
D. 49
E. cannot be determined with the given information

(9) A 95/99 tolerance interval for a normal population is
A. a 95% C.I. for µ and a 99% C.I. for σ.
B. an estimate of the population mean in which we are 99% confident that the sample mean is within 95% of σ from the population mean.
C. an estimate of a region of values which will contain between 95% and 99% of the population values.
D. a region of values for which we are 99% confident that the region will contain at least 95% of the population values.

(10) In testing H_0: σ ≥ 2.3 versus H_1: σ < 2.3, the p-value of the test statistic was computed to be [value missing]. If the level of significance was α = 0.05 and the true value of σ = 1.8, then the decision based on the data
A. was a Type I error.
B. was a Type II error.
C. was a correct decision.
D. depends on the power of the test.
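Two normal-theory calculations recur in the November 21, 2001 exam: the power of a one-sided test for the wafer mean (Part II) and the sample size needed to estimate a mean to within a given margin (Part III, question 8). The sketch below treats σ as known, as Part II instructs, and so uses the z-test rather than a noncentral t calculation. The shift value missing from Part II c. is not filled in; the `delta=1.0` in the usage line is purely hypothetical. All function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lower_tail_z_power(mu_true, mu0, sigma, n, alpha):
    """Power of the level-alpha z-test of H0: mu >= mu0 vs H1: mu < mu0 (sigma known)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    shift = (mu0 - mu_true) / (sigma / math.sqrt(n))
    return STD_NORMAL.cdf(-z_alpha + shift)

def n_for_power(delta, sigma, alpha, power):
    """Smallest n with at least `power` when the true mean is delta below mu0."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

def n_for_margin(margin, sigma, confidence):
    """Smallest n so the sample mean is within `margin` of mu with the given confidence."""
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Part II a.: n = 20, sigma = 2.5, true mean 199 against mu0 = 200
power_199 = lower_tail_z_power(199.0, 200.0, 2.5, 20, 0.05)

# Part II c. with a HYPOTHETICAL shift of 1 mm (the exam's value is missing)
n_hypothetical = n_for_power(delta=1.0, sigma=2.5, alpha=0.05, power=0.80)

# Part III (8): within 5 ohms, 99% reliability, sigma about 15
n_item8 = n_for_margin(5.0, 15.0, 0.99)

print(power_199, n_hypothetical, n_item8)
```

Evaluating `lower_tail_z_power` over a grid of true means below 200 gives the points needed to sketch the power curve asked for in Part II b.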

November 21, 2002

I. (36 points) Suppose X_1, ..., X_n are n observations from a population.

(A) Suppose the X_i's are iid and the population distribution is normal with µ and σ unknown. Describe how to compute the power curve for testing the hypotheses: H_0: µ ≤ µ_0 versus H_a: µ > µ_0.

(B) Suppose the X_i's are iid but the population distribution is symmetric with very heavy tails. If n is relatively small, describe an alternative test statistic to the one used in part (A) for testing the hypotheses: H_0: µ ≤ µ_0 versus H_a: µ > µ_0.

(C) Suppose the X_i's are iid and the population distribution is normal. Determine the smallest sample size necessary for an α = 0.05 test of the hypotheses: H_0: µ ≤ µ_0 versus H_a: µ > µ_0 to have power at least .8 whenever µ > µ_0 + .75σ.

(D) Suppose X_1, ..., X_n are iid and the population distribution is very skewed to the right. Describe an interval of values for which you are 95% confident that the interval contains 90% of the population values.

(E) Describe a procedure to determine if the X_i's are positively correlated.

(F) Suppose the population distribution is normal. What is the effect of the positive correlation on the standard test of the hypotheses: H_0: µ ≤ µ_0 versus H_a: µ > µ_0? Justify your answer.

II. (24 points) A company produces a product whose time to failure T has an exponential distribution with average time to failure λ = 25 (in thousands of hours). They make changes to the product and want to determine if the changes have an effect on the distribution of the average times to failure. An accelerated life test is performed on a random sample of 15 units produced after the product changes, yielding a sample mean $\bar{T}$ = 40 (thousands of hours).

(A) Describe a graphical technique to evaluate whether the failure times from the accelerated life tests follow an exponential distribution. Make sure to label your axes.

(B) Construct a 90% confidence interval for the average time to failure. You may assume that T_1, ..., T_15 are iid exponential with $\bar{T}$ = 40. Hint: If T_1, ..., T_n are iid exponential with average value λ, then $2n\bar{T}/\lambda$ has a Chi-squared distribution with d.f. = 2n.

(C) Use the confidence interval you constructed in part (B) to test the hypothesis that the average time to failure after the changes is greater than 25. What is the level of significance of your test?

(D) Construct a 95% prediction interval for the time to failure, T, of a single unit produced after the changes were made to the product. You may assume that T_1, ..., T_15 are iid exponential with $\bar{T}$ = 40. Hint: T and $\bar{T}$ are independent and T_1, ..., T_n are iid exponential with average value λ, thus $2n\bar{T}/\lambda$ has a Chi-squared distribution with d.f. = 2n.

III. (4 points each) Place the letter of the best answer in the blank to the left of each question.

(1) Suppose that X_1, ..., X_n are highly positively correlated with a N(µ, σ^2) distribution. A 95% confidence interval for µ was constructed using the formula $\bar{X} \pm t_{\alpha/2,\,n-1}\,(s/\sqrt{n})$. The true coverage probability of this confidence interval
A. is [value missing].
B. is much less than [value missing].
C. is very close to [value missing].
D. is much greater than [value missing].

E. may be greater or less than [value missing].

(2) The p-value of the observed value of a test statistic is
A. the probability of making the correct decision.
B. the smallest level of significance for which the observed data will reject the null hypothesis.
C. the largest level of significance for which the observed data will reject the null hypothesis.
D. the probability of making a Type I error.
E. the probability of making a Type II error.

(3) There are two sample estimators θ̂_1 and θ̂_2 of a population parameter θ. θ̂_1 is unbiased and θ̂_2 is biased.
A. θ̂_1 is always preferred to θ̂_2.
B. θ̂_1 is preferred to θ̂_2 because it is a more accurate estimator than θ̂_2.
C. θ̂_2 is preferred to θ̂_1 if it has a smaller variance than θ̂_1.
D. θ̂_2 is preferred to θ̂_1 if it has a smaller variance and smaller bias than θ̂_1.
E. θ̂_1 is preferred to θ̂_2 if it has a smaller variance than θ̂_2.

(4) A 95/95 tolerance interval is constructed for a population having pdf f( ) and mean µ.
A. The tolerance interval is wider than a 95% confidence interval for µ.
B. It is necessary to specify the family for f( ) in order to construct the tolerance interval.
C. If we know that the population distribution is normal but use a Distribution-free tolerance interval, then the tolerance interval will contain more than 95% of the population values.
D. A normal based tolerance interval will have approximately the correct probabilities for any pdf provided the sample size is large enough for the central limit theorem to be valid.
E. All of the above.

(5) In an α = 0.05 test of H_0: µ ≥ 12 versus H_1: µ < 12, the probability of a Type II error is
A. greater at µ = 13 than at µ = 14.
B. smaller at µ = 10 than at µ = 11.
C. the same at µ = 10 as at µ = 11.
D. always less than [value missing].
E. cannot be determined because my dog ate my noncentral t-tables.

(6) In testing H_0: µ ≥ µ_0 versus H_1: µ < µ_0, where µ is the finite expected value for a population having a symmetric pdf, f( ),
A. the power of the t-test is greater than the power of the Wilcoxon signed-rank test.
B. the power of the t-test is greater than the power of the sign test.
C. the power of the t-test is less than the power of the Wilcoxon signed-rank test.
D. the power of the t-test is less than the power of the sign test.
E. None of the above

(7) The Anderson-Darling statistic is preferred to the Chi-square statistic in testing H_0: F = F_0 based on a random sample X_1, ..., X_n from a continuous cdf F because
A. the Anderson-Darling is easier to compute.
B. the number of degrees of freedom must be approximated for the Chi-squared test.
C. the probability of Type I error is greater for the Chi-squared test.

D. the Anderson-Darling test has greater power.
E. all of the above

(8) A study is to be conducted to estimate the mean conductivity (in ohms) of a new alloy. What sample size is needed to ensure that the sample mean will estimate the average conductivity to within 10 ohms with a reliability of 99%? Conductivity has a normal distribution with a standard deviation of approximately 30.
A. 35
B. 49
C. 60
D. 538
E. cannot be determined since σ is unknown

(9) The Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling statistics are referred to as Distribution-Free tests for testing H_0: F = F_0 when the population cdf F_0 is completely specified because
A. the three test statistics have the same level of significance.
B. any unknown parameters have very little effect on their null distribution.
C. the null distributions of the three tests do not depend on the particular form of F_0.
D. the power function is identical for the three tests.

(10) In testing H_0: µ ≤ 5 vs H_a: µ > 5, the P-value of the test statistic was computed to be [value missing]. If the level of significance was α = .10 and the true value of µ was µ = 4, the decision based on the data
A. was a Type I error.
B. was a Type II error.
C. was a Type III error.
D. was correct.
E. was either a Type I error or a correct decision.
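The hint for Part II (D) of the November 21, 2002 exam leads to a closed-form prediction interval, sketched below. Since T and the sample mean are independent, the ratio T divided by the sample mean follows an F distribution with 2 and 2n degrees of freedom, and its upper-tail probability is P(ratio > f) = (1 + f/n)^(-n). This follows from evaluating the gamma moment generating function of the sample mean. The sketch assumes an equal-tailed interval, which the exam does not specify, and the function name is illustrative.

```python
def exp_prediction_interval(t_bar, n, confidence):
    """Equal-tailed prediction interval for one new exponential lifetime.

    Uses P(T / t_bar > f) = (1 + f/n) ** (-n), the upper tail of an
    F(2, 2n) variable, inverted in closed form for each tail.
    """
    tail = (1 - confidence) / 2
    f_lower = n * ((1 - tail) ** (-1 / n) - 1)   # ratio exceeded with prob 1 - tail
    f_upper = n * (tail ** (-1 / n) - 1)         # ratio exceeded with prob tail
    return t_bar * f_lower, t_bar * f_upper

# Part II (D): n = 15 units, sample mean 40 (thousands of hours), 95% level
low, high = exp_prediction_interval(40.0, 15, 0.95)
print(low, high)
```

The interval is far wider than a confidence interval for the mean because it must cover the variability of a single exponential lifetime, not just the uncertainty in the estimated mean.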

November 18, 2003

I. (25 points) Suppose Y_1, ..., Y_60 are the distances (in thousands of miles) to failure of 60 transmissions produced by a supplier of truck transmissions. The data is given on the next page.

(A) Determine a warranty value such that you are 90% confident that at least 75% of all transmissions produced from the production line will have miles to failure greater than the warranty value.

(B) Predict with 90% confidence the number of miles to failure of a transmission randomly selected from the production facility.

(C) Estimate with 90% confidence the proportion of transmissions from the production facility having miles to failure greater than 50,000 miles.

(D) Suppose the Y_i's are iid but the population distribution is multimodal due to the supplier having several production facilities with varying levels of quality. Determine a warranty value such that you are 90% confident that at least 75% of all transmissions produced from the production facilities will have miles to failure greater than the warranty value.

The following data is the failure times of the 60 transmissions in thousands of miles:

[Data missing]

The following summary statistics were computed from the 60 failure times:

[Values missing for: $\bar{Y}$, S, Y_(1), Q̂(.25), Q̂(.5), Q̂(.75), Y_(60)]

[Figure: Normal Quantile Plot for Failure Times; axes: Failure Times, Standard Normal Quantiles]

II. (75 points) Answer each of the following questions in 20 words or less.

(1) Suppose that X_1, ..., X_n is a sample from a N(µ, σ^2) distribution but the X_i's are highly positively correlated. A 95% confidence interval for µ was constructed using the formula $\bar{X} \pm t_{\alpha/2,\,n-1}\,S/\sqrt{n}$. What is the effect of the correlation on the true coverage probability of this confidence interval?

(2) There are two sample estimators θ̂_1 and θ̂_2 of a population parameter θ. θ̂_1 is unbiased and θ̂_2 is biased. Which of the two estimators is preferred? Justify your answer.

(3) A 95/95 tolerance interval is to be constructed for a population having pdf f( ). Is the following statement true: "A normal based tolerance interval will have approximately the correct probabilities provided the sample size is large enough for the central limit theorem to be valid."? Justify your answer.

(4) Why is the Anderson-Darling statistic preferred to the Chi-square statistic in testing whether a population cdf is F_0 based on a random sample X_1, ..., X_n from a continuous cdf F?

(5) A study is to be conducted to estimate the mean conductivity (in ohms) of a new alloy. What sample size is needed to ensure that the sample mean will estimate the average conductivity to within 10 ohms with a reliability of 99%? Conductivity has a normal distribution with a standard deviation of approximately [value missing].

(6) Explain why the Kolmogorov-Smirnov statistic is referred to as a Distribution-Free method for evaluating whether a population cdf F is equal to F_0 when F_0 is completely specified.

(7) The Agresti-Coull C.I. for a proportion is somewhat more complex than the standard asymptotic C.I. for a proportion. Why would you recommend using the Agresti-Coull C.I.?

(8) What are the two major sources of error in using the bootstrap procedure to estimate the percentiles of a pivot?

(9) In order to construct a 95% C.I. for the mean µ of a population, a random sample X_1, ..., X_31 was selected from a process having pdf f( ). Because the sample size is relatively large, the biologist used $\bar{X} \pm t_{.025}\,S/\sqrt{n}$ as the C.I. for µ. You plotted the data and noticed that the data was highly right skewed. What problems may exist in using this C.I.?

(10) Referring to question (9), the problem encountered with the interval is due to possible correlation between $\bar{X}$ and S. If the biologist cannot provide you with any information about the population distribution other than the data, describe a method to determine the degree of correlation between $\bar{X}$ and S using the observed data from the study.

(11) Suppose X_1, ..., X_20 are iid with a highly right skewed distribution. The transformation Y = g(X) = $\sqrt{X}$ yields a Shapiro-Wilk p-value = [value missing]. A 95% C.I. for µ_Y is [2.86, 3.97]. Why is [(2.86)^2, (3.97)^2] NOT an appropriate 95% C.I. for µ_X?

(12) For each of the following sentences, state whether the sentence is true or false. If false, explain why.
(A). The sampling distribution of an estimator of a parameter θ will have approximately a normal distribution if the sample size is large enough.
(B). The bootstrap procedure for constructing a C.I. for a parameter θ is always preferred to using a distribution-based procedure in constructing the C.I. because the distribution-based procedure depends on various conditions being valid.
(C). If the population pdf f(y; θ) is symmetric about θ, then the sample mean is a better estimator of θ than is the sample median.
(D). A 95% C.I. for a parameter θ is given by (2.75, 3.25). This means that there is a .95 probability that θ is between 2.75 and
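Question (7) of the November 18, 2003 exam contrasts the Agresti-Coull interval with the standard asymptotic (Wald) interval. The sketch below computes both side by side using the usual textbook form of the Agresti-Coull adjustment: add z^2/2 successes and z^2 trials before applying the Wald formula. The counts in the usage lines are hypothetical, chosen only to show how the two intervals differ when the observed proportion is at or near the boundary. Clamping the limits to [0, 1] is a presentation choice, not part of either definition.

```python
import math
from statistics import NormalDist

def _clamp(lo, hi):
    """Keep interval limits inside the valid range for a proportion."""
    return max(0.0, lo), min(1.0, hi)

def wald_ci(x, n, confidence=0.95):
    """Standard asymptotic interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return _clamp(p_hat - half, p_hat + half)

def agresti_coull_ci(x, n, confidence=0.95):
    """Agresti-Coull interval: Wald formula applied to adjusted counts."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z**2                    # adjusted number of trials
    p_adj = (x + z**2 / 2) / n_adj      # adjusted proportion
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return _clamp(p_adj - half, p_adj + half)

# Hypothetical counts: zero successes makes the Wald interval collapse to a point
print(wald_ci(0, 20), agresti_coull_ci(0, 20))

# Hypothetical counts: a small sample with a proportion away from the boundary
print(wald_ci(5, 25), agresti_coull_ci(5, 25))
```

With zero observed successes the Wald interval has zero width, while the adjusted interval still reports a non-trivial upper limit; that boundary behaviour is one commonly cited argument for the adjustment.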