STA 248 Winter 2005 Assignment 1

Due: Thursday, January 27 at the beginning of lecture. (Late assignments will be subject to a deduction of 10% of the total marks for the assignment for each day late.)

Please hand in your R code when used. On future assignments I won't be typing out the textbook problems. Let me know if you have any difficulty getting a copy of the text.

Problems to be handed in for marking:
Chapter 6: 12, 17, 25
Chapter 7: 9, 23, 25, 30, 59
Additional problems: 3

Problems from the textbook:

Chapter 6:

8. Acute exposure to cadmium produces respiratory distress and kidney and liver damage, and may even result in death. For this reason, the level of airborne cadmium dust and cadmium oxide fume in the air is monitored. This level is measured in milligrams of cadmium per cubic meter of air. A sample of 35 readings yields the given data (available on the web).
(a) Construct a stem-and-leaf diagram for these data. Use the numbers 02, 03, 04, 05, 06, and 07 as stems. (Do by hand.)
(b) Would you be surprised to hear someone claim that the random variable X, the cadmium level in the air, is normally distributed? Explain.
(d) Use R to construct a relative frequency histogram for these data. Does the histogram exhibit the bell-shape characteristic of a normal density?
(e) Construct a relative cumulative frequency ogive for these data. Use the ogive to approximate that point above which 50% of the readings should fall.

12. (Percentiles.) Let X be a random variable. The point $p_{k/100}$ such that $P[X < p_{k/100}] \le k/100$ and $P[X \le p_{k/100}] \ge k/100$ is called the kth percentile for X. For example, let X be binomial with n = 20 and p = .5. The 25th percentile for X is the point $p_{25/100} = 8$, since $P[X < 8] = .1316 \le .25$ and $P[X \le 8] = .2517 \ge .25$.
(a) Let X be binomial with n = 20 and p = .5. Find the 60th percentile for X.
(b) Let X be Poisson with λ = 10. Find the 30th percentile for X.
(d) Let X be exponentially distributed with β = 1. Show that the 20th percentile for X is $-\ln .80$. Hint: Find the point p such that $\int_0^p e^{-x}\,dx = .20$.
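The percentile definition in Exercise 12 can be checked numerically against the textbook's own example (binomial with n = 20, p = .5, 25th percentile equal to 8). The following is a minimal Python sketch rather than the R code the assignment asks for; the function names `binom_pmf` and `is_percentile` are illustrative and do not come from the text.

```python
from math import comb

def binom_pmf(x, n, p):
    """P[X = x] for a binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def is_percentile(point, k, n, p):
    """Check the two inequalities of the definition for a binomial(n, p) X:
    P[X < point] <= k/100 and P[X <= point] >= k/100."""
    below = sum(binom_pmf(x, n, p) for x in range(point))   # P[X < point]
    at_or_below = below + binom_pmf(point, n, p)            # P[X <= point]
    return below <= k / 100 and at_or_below >= k / 100

# The example stated in the exercise: 8 is the 25th percentile.
print(is_percentile(8, 25, 20, 0.5))
```

Neighbouring points such as 7 and 9 fail one of the two inequalities, which is why the definition singles out 8 in this example.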

17. Consider the two given data sets (available on web).
(a) Find the sample mean and sample median for each data set.
(b) Find the sample range for each data set.
(c) Find the sample variance and sample standard deviation for each data set.
(d) Would you be surprised to hear someone claim that these data were drawn from the same population? Explain. Hint: Consider the shape of the distribution as well as the observed values of the sample statistics.

20. Use the data of Exercise 8 to approximate the mean, variance, and standard deviation of the random variable X, the level of airborne cadmium dust and cadmium oxide fumes. Assume that these approximations are fairly accurate. Between what two values would you expect approximately 95% of the readings to fall? Explain.

25. (Approximating σ via the range.) The range can play an important role in the design of statistical studies. To obtain a prespecified degree of accuracy when estimating population parameters, an adequately sized sample must be drawn. Most formulas used to determine sample size require knowledge of σ, the population standard deviation. Often the researcher will not have an estimate of σ available but will have an idea of the expected range of his or her data. When sampling from a normal distribution, $P[-2\sigma < X - \mu < 2\sigma] \doteq .95$. If X is not normally distributed, then Chebyshev's inequality can be applied to conclude that $P[-3\sigma < X - \mu < 3\sigma] \ge .89$. That is, X lies within 3 standard deviations of its mean with high probability. From this it can be concluded that the estimated range covers an interval of roughly 4σ for normally distributed random variables and 6σ otherwise. In the normal case an estimate of σ can be obtained by solving the equation $4\sigma \doteq$ estimated range for σ. If X is not normally distributed, then $\sigma \doteq$ (estimated range)/6. Data are given (available on the web) for the random variable X, the CPU time in seconds required to run a program using a statistical package.
(a) Construct a stem-and-leaf diagram for these data. Is the assumption justified that X is normally distributed?
(b) Approximate σ via the sample standard deviation s.
(c) Find the sample range for these data, and use it to approximate σ. Compare your result to that obtained in part (b).

27. Let X be normally distributed with mean µ and variance $\sigma^2$.
(a) Verify that $q_3 = \mu + .67\sigma$ and that $q_1 = \mu - .67\sigma$.
(b) Find the interquartile range for X.
(c) Verify that the inner fences for X are $f_1 = \mu - 2.68\sigma$ and $f_3 = \mu + 2.68\sigma$.
(d) Verify that the probability that X will fall beyond the inner fences is approximately .007.
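The two probabilities quoted in Exercise 25, and the resulting rules σ ≐ range/4 (normal case) and σ ≐ range/6 (otherwise), can be recomputed with a short Python sketch. The helper names and the range value of 12.0 are hypothetical and chosen only for illustration; they are not the CPU-time data from the web.

```python
from math import erf, sqrt

def normal_within(k):
    """P[-k*sigma < X - mu < k*sigma] for a normal X."""
    return erf(k / sqrt(2))

def chebyshev_bound(k):
    """Chebyshev lower bound on P[|X - mu| < k*sigma], valid for any distribution."""
    return 1 - 1 / k**2

def sigma_from_range(estimated_range, normal=True):
    """Rule of thumb from the exercise: range/4 if normal, range/6 otherwise."""
    return estimated_range / (4 if normal else 6)

print(round(normal_within(2), 4))    # close to the .95 quoted for the normal case
print(round(chebyshev_bound(3), 4))  # the bound rounded to .89 in the exercise

hypothetical_range = 12.0            # assumed value, for illustration only
print(sigma_from_range(hypothetical_range, normal=True))
print(sigma_from_range(hypothetical_range, normal=False))
```

The sketch only restates the arithmetic behind the rules; whether the normal rule is appropriate for a given data set is exactly what part (a) asks you to judge.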

28. Temperature differences between the warm upper surface of the ocean and the colder deeper levels can be utilized to convert thermal energy to mechanical energy. This mechanical energy can in turn be used to produce electrical power using a vapor turbine. Let X denote the difference in temperature between the surface of the water and the water at a depth of 1 kilometer. Measurements are taken at 15 randomly selected sites in the Gulf of Mexico. The measured temperatures are available on the web. Use R to do the following.
(a) Construct a double stem-and-leaf diagram for these data.
(b) Find the sample mean, sample median, and sample standard deviation for these data.
(c) Note that the observation with value 10.1 is very different from the others. It is a potential outlier. Construct a boxplot for these data to verify that the value 10.1 does appear to be an outlier.
(d) To see the effect of this outlier, drop it from the data set and calculate the sample mean, median, and standard deviation for the remaining 14 observations. Which measure is least affected by the presence of the outlier?

36. It is known that power surges or line spikes can damage sensitive electronic equipment. A study of these surges is conducted. The purpose of the study is to ascertain whether or not there are differences in the frequency of these surges among the seven days of the week. Data for the study are found on the website. Variables are observation number; day, with m = Monday, t = Tuesday, w = Wednesday, th = Thursday, f = Friday, s = Saturday, and sn = Sunday; and number of spikes per day. Use R to do the following.
(a) Obtain descriptive statistics on the number of spikes per day for each day of the week. Discuss any differences among days that appear to exist.
(b) Construct boxplots for each day, and use the boxplots for a visual comparison of the days.

Chapter 7:

1. Let $X_1, X_2, \ldots, X_{20}$ be a random sample from a distribution with mean 8 and variance 5. Find the mean and variance of $\bar{X}$.

5. Let $X_1, X_2, X_3, X_4, X_5$ be a random sample from a binomial distribution with n = 10 and p unknown.
(a) Show that $\bar{X}/10$ is an unbiased estimator for p.
(b) Estimate p based on these data: 3, 4, 4, 5, 6.

9. (Weighted means.) Assume that one has k independent random samples of sizes $n_1, n_2, \ldots, n_k$ from the same distribution. These samples generate k unbiased estimators for the mean, namely, $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$.
(a) Show that the arithmetic average of these estimators, $(\bar{X}_1 + \bar{X}_2 + \cdots + \bar{X}_k)/k$, is also unbiased for µ.
(b) Certain mineral elements required by plants are classed as macronutrients. Macronutrients are measured in terms of their percentage of the dry weight of the plant. Proportions of each element vary in different species and in the same species grown under differing conditions. One macronutrient is sulfur. In a study of winter cress, a member of the mustard family, these data, based on three independent random samples, are obtained:
$\bar{x}_1 = .8$, $\bar{x}_2 = .95$, $\bar{x}_3 = .7$
$n_1 = 9$, $n_2 = 3$, $n_3 = 200$
Use the result of part (a) to obtain an unbiased estimate for µ, the mean proportion of sulfur by dry weight in winter cress. By averaging the three values .8, .95, and .7 to obtain the estimate for µ, each sample is being given equal importance or weight. Does this seem reasonable in this problem? Explain.
(c) To take sample sizes into account, a weighted mean is used. This estimator, $\hat{\mu}_W$, is given by
$\hat{\mu}_W = \dfrac{n_1 \bar{X}_1 + \cdots + n_k \bar{X}_k}{n_1 + \cdots + n_k}$
Show that $\hat{\mu}_W$ is an unbiased estimator for µ.
(d) Use the data of part (b) to find the weighted estimate for the mean proportion of sulfur by dry weight in winter cress. Compare your answer to the estimate found in part (b).

16. Let $X_1, X_2, \ldots, X_m$ be a random sample of size m from a binomial distribution with parameters n, assumed to be known, and p. Show that the method of moments estimator for p is $\hat{p} = \bar{X}/n$.

17. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with parameter λ. Find the method of moments estimate for λ.

23. Find the method of moments estimator for the parameter p of a geometric distribution.

25. Using the method of moments estimator for p found in Exercise 23, find an estimator for $\sigma^2$ for the geometric distribution. (You don't have to do the rest of this question that is in the text.)

27. Carbon dioxide is an odorless, colorless gas that constitutes about .035% by volume of the atmosphere. It affects the heat balance by acting as a one-way screen. It lets in the sun's heat to warm the oceans and the land but blocks some of the infrared heat that is radiated from the earth. This reflected heat is absorbed into the lower atmosphere, producing a greenhouse effect which causes the earth's surface to become warmer than it would be otherwise. Systematic measurements of CO2 began in 1957 with Charles D. Keeling monitoring at Mauna Loa in Hawaii.
(a) Given the data (available on the web) that are CO2 readings in ppm, construct a stem-and-leaf plot (by hand) for these data using 31, 32, 32, 33, 33, 34, 34, 35 as stems. Graph leaves 0-4 on the first of each repeated stem and leaves 5-9 on the other. Is it reasonable to assume that the CO2 level in the atmosphere is normally distributed? Explain.
(b) Estimate µ and $\sigma^2$ using the method of moments estimators.
(c) Find an unbiased estimate for $\sigma^2$.

29. Based on the data of Exercise 27, what are the maximum likelihood estimates for the mean and variance of the atmospheric CO2 level?
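Parts (b) and (c) of Exercise 27 contrast two variance estimates that differ only in the divisor: the method of moments estimate $\frac{1}{n}\sum (x_i - \bar{x})^2$ and the unbiased estimate $\frac{1}{n-1}\sum (x_i - \bar{x})^2$. A minimal Python sketch of that difference follows; the four readings are made-up values for illustration, not the Mauna Loa data, and the function name is hypothetical.

```python
def moment_and_unbiased_variance(xs):
    """Return (sample mean, divide-by-n variance, divide-by-(n-1) variance)."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)   # sum of squared deviations
    return mean, ss / n, ss / (n - 1)

readings = [2.0, 4.0, 4.0, 6.0]             # hypothetical values only
mean, var_moments, var_unbiased = moment_and_unbiased_variance(readings)
print(mean, var_moments, var_unbiased)
```

The two variance estimates always differ by the factor n/(n - 1), so the gap shrinks as the sample grows.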

30. Let $X_1, X_2, \ldots, X_m$ be a random sample of size m from a binomial distribution with parameters n, assumed to be known, and p. Find the maximum likelihood estimator for p. Does it differ from the method of moments estimator found in Exercise 16?

31. Let W be an exponential random variable with parameter β unknown. Find the maximum likelihood estimator for β based on a sample of size n. Does it differ from the method of moments estimator (derived in lecture)?

34. Computer terminals have a battery pack that maintains the configuration of the terminal. These packs must be replaced occasionally. Let X denote the life span in years of such a battery. Assume that X is exponentially distributed with unknown parameter β. Find the maximum likelihood estimate for β based on the given data (available on the web).

35. To estimate the proportion of defective microprocessor chips being produced by a particular maker, samples of five chips are selected at 10 randomly selected times during the day. These chips are inspected, and X, the number of defective chips in each batch of size 5, is recorded. Assume that X is binomially distributed with n = 5 and p unknown. Use the data given (available on the web) to find the maximum likelihood estimate for p.

54. Let X denote the unit price of a 3.5-inch floppy diskette. Observations are obtained from a random sample of 10 suppliers. (Data are available on web.)
(a) Find an unbiased estimate for the mean price of these diskettes.
(b) Find an unbiased estimate for the variance in the price of these diskettes.
(c) Find the sample standard deviation. Is this an unbiased estimate for σ?
(d) Assume that X is normally distributed. Find the maximum likelihood estimate for $\sigma^2$. Does this agree with your answer to (b)?

59. Consider the random variable X with density given by $f(x) = (1/\theta^2)\, x e^{-x/\theta}$, $x > 0$.
(b) Show that $E(X) = 2\theta$.
(c) Find the method of moments estimator for θ.
(d) Find the maximum likelihood estimator for θ based on a random sample of size n. Does this estimator differ from that found in part (c)?
(e) Estimate θ based on these data: 3 5 2 3 4 1 4 3 3 3
(f) Are the estimators found in parts (c) and (d) unbiased estimators for θ?

Additional problems:

1. Which of the following statistics can be made arbitrarily large by making one number out of a batch of 100 numbers arbitrarily large: the mean, the median, the 10% trimmed mean, the standard deviation, the interquartile range?

2. Suppose $X_1, \ldots, X_n$ are n identically distributed random variables with $E(X_i) = \mu$, $i = 1, \ldots, n$. Show that $(\bar{X})^2$ is not an unbiased estimate of $\mu^2$.

3. What general features are evident in a boxplot of data from a normal distribution? From a skewed distribution? From a distribution that is symmetric and bell-shaped like the normal distribution, but has less probability in the tails (the extreme values)? From a distribution that is symmetric and bell-shaped like the normal distribution, but has more probability in the tails (the extreme values)?

4. In data compression of text, a probability model is used where the probability of the next letter is heavily influenced by the preceding letters. In a first-order Markov model, the probability of the next letter depends only on the one letter immediately preceding it. Suppose we are interested in a model for the compression of a binary string. I'll label the values b for black and w for white. For a first-order Markov model we need the following probabilities for the value of a letter given the value preceding it:
$P(w \mid w) = p_w$, $P(b \mid w) = 1 - p_w$, $P(b \mid b) = p_b$, $P(w \mid b) = 1 - p_b$
Suppose $X_i$ is the random variable that is 1 if the ith letter is w and 0 if the ith letter is b. Then given that the (i − 1)th letter is w (say), the probability function of $X_i$ is
$P(X_i = x \mid X_{i-1} = 1) = p_w^{x} (1 - p_w)^{1-x}$.
Suppose the string bbbbwwwbbbbbwwbbbbbbwwwwb is observed. Use maximum likelihood to estimate the parameters $p_w$ and $p_b$.
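For the first-order Markov model in Additional Problem 4, the likelihood (conditioning on the first letter) factors into Bernoulli terms of the form shown above, one per transition, so the estimates depend on the string only through its transition counts. The Python sketch below tallies those counts and forms the proportion of "stay" transitions out of each state. It assumes the likelihood is conditioned on the first letter, uses a short made-up string rather than the assigned one, and the function name is hypothetical.

```python
from collections import Counter

def transition_estimates(s):
    """Estimate p_w = P(w | w) and p_b = P(b | b) from a string of 'w'/'b',
    conditioning on the first letter, as the proportion of stay transitions."""
    pairs = Counter(zip(s, s[1:]))               # counts of (previous, next) pairs
    from_w = pairs[("w", "w")] + pairs[("w", "b")]
    from_b = pairs[("b", "b")] + pairs[("b", "w")]
    p_w = pairs[("w", "w")] / from_w
    p_b = pairs[("b", "b")] / from_b
    return p_w, p_b

# Hypothetical string for illustration, not the one in the problem.
print(transition_estimates("wwwbbwbb"))
```

The function divides by the number of transitions out of each state, so it requires at least one transition starting from w and one starting from b.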