4. Continuous Random Variables, the Pareto and Normal Distributions




A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random variable is given by a probability density curve. The total area under this curve is 1.

Probability density curves, histograms and probability

Note: If we observed a very large number of observations and drew a histogram using a large number of intervals, then the histogram would look very similar to the probability density curve. Hence, a histogram is an estimator of the probability density curve. The probability density curve may be thought of as the histogram for the whole population.

A histogram as an estimator of the probability density curve

Probability density curves and probability

The probability that a random variable takes a value between a and b, P(a < X < b), is the area under the density curve between x = a and x = b. Note that for a < b,

    P(a < X < b) = P(X < b) - P(X < a) = P(X > a) - P(X > b)

Probability density curves and probability - ctd.

The probability that a random variable takes a value less than a, P(X < a), is the area under the density curve up to x = a.

Probability density curves and probability - ctd.

The probability that a random variable takes a value greater than b, P(X > b), is the area under the density curve starting at x = b.

4.1 The Pareto distribution

Pareto distributions are used to model the distribution of full-time salaries, especially when there is a minimum wage. Pareto distributions are clearly right-skewed. A Pareto distribution is defined by two parameters: x_m, the minimum value the random variable can take (i.e. the minimum wage), and α, where x_m > 0 and α > 0. The parameter α describes the degree of concentration of the distribution. The smaller α (i.e. the closer α is to 0), the heavier the tail of the distribution (i.e. the proportion of individuals earning very high salaries increases).

The Pareto distribution

If X has a Pareto distribution, then we write X ~ Pareto(x_m, α). The density function is given by

    f(x) = α x_m^α / x^(α+1),  when x > x_m,

otherwise f(x) = 0. For α > 1, the expected value of a Pareto random variable X is given by

    E(X) = α x_m / (α - 1).

Probability density function of a Pareto distribution

Probability density function of a Pareto distribution

In order to calculate probabilities for the Pareto distribution, we use

    P(X > x) = 1,           for x < x_m,
    P(X > x) = (x_m/x)^α,   for x ≥ x_m,

together with the interval rule, which states that if a ≤ b,

    P(a < X < b) = P(X > a) - P(X > b),

and the rule of complementarity, which states that

    P(X < x) = 1 - P(X > x).

The last two rules hold for any continuous distribution.

Example 4.1

Suppose that monthly salaries, denoted X, have a Pareto(1000, 2) distribution. Calculate
1. The expected (mean) salary.
2. The probability that an individual earns above the mean salary.
3. The probability that an individual earns below 1500.
4. The probability that an individual earns between 3000 and 6000.
5. The median salary (50% of individuals earn above the median salary).

Example 4.1

1. The mean salary is given by

    E(X) = α x_m / (α - 1) = 2 × 1000 / (2 - 1) = 2000.

2. We have

    P(X > 2000) = (x_m/2000)^α = (1000/2000)^2 = 0.25.

Hence, only 25% of individuals earn more than the mean salary.

Example 4.1

3. We need to calculate P(X < 1500). Using the rule of complementarity,

    P(X < 1500) = 1 - P(X > 1500) = 1 - (x_m/1500)^α = 1 - (2/3)^2 = 5/9.

Example 4.1

4. We need to calculate P(3000 < X < 6000). Using the interval rule,

    P(3000 < X < 6000) = P(X > 3000) - P(X > 6000)
                       = (1000/3000)^2 - (1000/6000)^2
                       = 1/9 - 1/36 = 1/12.

Example 4.1

5. We need to find the value k for which P(X > k) = 0.5, i.e. we need to solve (1000/k)^2 = 0.5. Taking square roots, 1000/k = √0.5 ≈ 0.707. It follows that k ≈ 1000/0.707 ≈ 1414. Hence, the median wage is 1414 (just over 70% of the mean wage).
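The survival function and the two rules above are easy to check numerically. The following sketch (plain Python, no external libraries; the function name pareto_sf is mine, not from the notes) reproduces the answers to Example 4.1:

```python
def pareto_sf(x, x_m, alpha):
    """Survival function P(X > x) of a Pareto(x_m, alpha) random variable."""
    return 1.0 if x < x_m else (x_m / x) ** alpha

x_m, alpha = 1000, 2                       # Example 4.1: X ~ Pareto(1000, 2)
mean = alpha * x_m / (alpha - 1)           # E(X) = 2000
p_above_mean = pareto_sf(mean, x_m, alpha)             # P(X > 2000) = 0.25
p_below_1500 = 1 - pareto_sf(1500, x_m, alpha)         # P(X < 1500) = 5/9
p_interval = pareto_sf(3000, x_m, alpha) - pareto_sf(6000, x_m, alpha)  # 1/12
median = x_m / 0.5 ** (1 / alpha)          # solves (x_m/k)^alpha = 0.5, ~1414
```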

4.2 The normal distribution

The normal distribution has a symmetric, bell-shaped density curve. This curve is defined by two quantities:
a) The theoretical (population) mean µ (this gives the centre of the distribution). The distribution is symmetric about x = µ.
b) The variance σ^2 (this describes the dispersion of the distribution).
We write X ~ N(µ, σ^2). The (theoretical) variance of the random variable is σ^2. Note that some textbooks give the standard deviation σ as a parameter of the distribution, rather than the variance.

Probability density function of a normal distribution

4.3 The standard normal distribution

A random variable with a standard normal distribution will be denoted by Z. Such a random variable has mean 0 and variance 1, i.e. the standard deviation is also 1. Values for P(Z > k) are given in Table 3 of the Murdoch and Barnes book of mathematical formulae (which will be available during the exam), for various k ≥ 0. If k is greater than 4, then we may assume P(Z > k) = 0.

Probabilities read from the table for the standard normal distribution

This graph illustrates the probabilities given in the table.

Calculating probabilities using the table for the standard normal distribution

Using the symmetry rule (when we change the sign of k, we change the direction of the inequality):

1. P(Z < -k) = P(Z > k)

Calculating probabilities using the table for the standard normal distribution

In order to calculate P(Z < k), we use the rule of complementarity:

2. P(Z < k) = 1 - P(Z > k)

Calculating probabilities using the table for the standard normal distribution

Also, to calculate probabilities of the form P(a < Z < b), we use the interval rule:

3. P(a < Z < b) = P(Z > a) - P(Z > b)

Calculating probabilities using the table for the standard normal distribution

Using these three rules and the table for the standard normal distribution, we can calculate the probability that Z lies in any interval. The probability P(Z > k) is found in the row corresponding to the digits of k directly before and after the decimal point, and the column corresponding to the second digit after the decimal point.

Example 4.2

Calculate
i) P(Z > 2.13)
ii) P(Z < -1.05)
iii) P(Z < 0.87)
iv) P(Z > -2.06)
v) P(0.23 < Z < 1.49)

Solution to Example 4.2

i) P(Z > 2.13). We can read this directly from the table. This probability is given in the row corresponding to 2.1 and the column corresponding to 0.03. Thus, P(Z > 2.13) = 0.01659.

Solution to Example 4.2

ii) P(Z < -1.05). When we have a negative constant, we use the symmetry rule, i.e. P(Z < -k) = P(Z > k):

    P(Z < -1.05) = P(Z > 1.05) = 0.1469

Note: Whenever the number on the right-hand side is negative, we will have to use the symmetry rule, but it might not lead immediately to the appropriate form (see part iv).

Solution to Example 4.2

iii) P(Z < 0.87). In this case (a positive constant, but the inequality is the wrong way round for us to read the probability directly), we use the rule of complementarity:

    P(Z < 0.87) = 1 - P(Z > 0.87) = 1 - 0.1922 = 0.8078

Solution to Example 4.2

iv) P(Z > -2.06). First we use the rule of symmetry:

    P(Z > -2.06) = P(Z < 2.06)

Using the rule of complementarity,

    P(Z < 2.06) = 1 - P(Z > 2.06) = 1 - 0.0197 = 0.9803.

Solution to Example 4.2

v) P(0.23 < Z < 1.49). In this case we first use the interval rule P(a < Z < b) = P(Z > a) - P(Z > b). Thus,

    P(0.23 < Z < 1.49) = P(Z > 0.23) - P(Z > 1.49) = 0.409 - 0.0681 = 0.3409
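For readers without the Murdoch and Barnes tables to hand, these tail probabilities can be reproduced with Python's standard library via the complementary error function; z_sf is a hypothetical helper name, not from the notes. The sketch applies the symmetry, complementarity and interval rules exactly as in the solutions above:

```python
import math

def z_sf(k):
    """Upper-tail probability P(Z > k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))

p_i = z_sf(2.13)                  # part i:   ~0.0166
p_ii = z_sf(1.05)                 # part ii:  P(Z < -1.05) = P(Z > 1.05), ~0.1469
p_iii = 1 - z_sf(0.87)            # part iii: complementarity, ~0.8078
p_iv = 1 - z_sf(2.06)             # part iv:  P(Z > -2.06), ~0.9803
p_v = z_sf(0.23) - z_sf(1.49)     # part v:   interval rule, ~0.3409
```

The tiny differences from the tabulated answers come from the tables' four-decimal rounding.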

Calculating values from probabilities using the table for the standard normal distribution

Sometimes we may wish to find the value of k for which P(Z > k) = p or P(Z < k) = p for some given p. Using the symmetry and complementarity rules, we can transform such a problem into the problem of finding c such that P(Z > c) = p, where p < 0.5. The value of c is found by locating the value closest to p in the heart of the table; c then corresponds to the row (digits around the decimal point) and the column (second decimal place).

Example 4.3

Find k such that
i) P(Z > k) = 0.4
ii) P(Z < k) = 0.8
iii) P(Z > k) = 0.7
iv) P(Z < k) = 0.1

Solution to Example 4.3

i) P(Z > k) = 0.4. This is in the appropriate form to read directly from the table. We find the value closest to 0.4 in the heart of the table. This value is 0.4013 and is in the row corresponding to 0.2 and the column corresponding to 0.05. Hence, P(Z > 0.25) ≈ 0.4. Thus, k ≈ 0.25.

Solution to Example 4.3 - ctd.

ii) P(Z < k) = 0.8. In this case we cannot read k directly. In order to obtain a value of less than 0.5 on the right-hand side, we use the rule of complementarity:

    P(Z > k) = 1 - P(Z < k) = 0.2.

The value in the heart of the table closest to 0.2 is 0.2005, which is in the row corresponding to 0.8 and the column corresponding to 0.04. It follows that P(Z > 0.84) ≈ 0.2. Hence, k ≈ 0.84.

Solution to Example 4.3 - ctd.

iii) P(Z > k) = 0.7. Again we cannot directly read k. From the sketch below, it is clear that k < 0.

Solution to Example 4.3 - ctd.

Using complementarity to obtain a number on the right-hand side less than 0.5:

    P(Z < k) = 1 - P(Z > k) = 0.3.

Once we have an appropriate number on the right-hand side, we can use the rule of symmetry to obtain the appropriate type of inequality:

    P(Z < k) = P(Z > -k) = 0.3.

P(Z > -k) = 0.3 is in the appropriate form, so we can now read -k from the table. As before, -k ≈ 0.52. Thus, k ≈ -0.52.

Solution to Example 4.3 - ctd.

iv) P(Z < k) = 0.1. Again k < 0. Using symmetry,

    P(Z < k) = P(Z > -k) = 0.1.

From the tables, P(Z > 1.28) ≈ 0.1. It follows that -k ≈ 1.28, i.e. k ≈ -1.28.
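The inverse problem (finding k from a given tail probability) can also be solved numerically. A minimal sketch, assuming only the standard library: since P(Z > k) is strictly decreasing in k, bisection finds the quantile; the helper names z_sf and z_upper_quantile are mine, not from the notes.

```python
import math

def z_sf(k):
    """P(Z > k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def z_upper_quantile(p, lo=-8.0, hi=8.0):
    """Find k with P(Z > k) = p by bisection (z_sf is strictly decreasing)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if z_sf(mid) > p:
            lo = mid          # tail probability still too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

k_i = z_upper_quantile(0.4)       # part i:   ~0.25
k_ii = z_upper_quantile(0.2)      # part ii:  P(Z < k) = 0.8 => P(Z > k) = 0.2, ~0.84
k_iii = z_upper_quantile(0.7)     # part iii: ~-0.52
k_iv = -z_upper_quantile(0.1)     # part iv:  P(Z < k) = 0.1 => k ~ -1.28
```

The table-based answers above agree with these to two decimal places, which is the resolution of the printed tables.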

4.4 Standardisation of a normal random variable

By adding or subtracting a constant from a normally distributed random variable, we add or subtract (as appropriate) that constant from the mean of the random variable. The shape of the density curve remains the same (i.e. the dispersion is unchanged). Hence, if X ~ N(µ, σ^2), then Y = X - µ has a normal distribution with mean 0.

Standardisation of a normal random variable

By dividing a random variable by a factor c, the standard deviation is divided by the same factor. The density curve remains bell-shaped. It follows that if X ~ N(µ, σ^2), then

    Z = (X - µ)/σ

has a normal distribution with mean 0 and variance (and standard deviation) 1, i.e. Z is a standard normal random variable.

Standardisation of a normal random variable

Hence, in order to calculate the probability that any normal random variable takes a value in a given interval, we:
i. First standardise.
ii. Use the three rules given above and the tables to calculate the appropriate probability.

Example 4.4

The height of an Irish adult has a normal distribution with mean 170cm and variance 225cm^2. Calculate the probability that the height of an Irish adult is
a) more than 191cm.
b) less than 164cm.
c) between 158cm and 179cm.

Solution to Example 4.4

We have X ~ N(170, 225). Hence,

    Z = (X - µ)/σ = (X - 170)/√225.

a) We wish to calculate P(X > 191). First we standardise both sides of the inequality by subtracting 170 (the mean µ) and dividing by 15 (the standard deviation σ):

    P(X > 191) = P((X - µ)/σ > (191 - 170)/15)

Solution to Example 4.4

On the left-hand side of this inequality we now have a standard normal random variable:

    P(Z > (191 - 170)/15) = P(Z > 1.4) = 0.0808

Solution to Example 4.4

b) P(X < 164). First we standardise as before:

    P(X < 164) = P((X - µ)/σ < (164 - 170)/15) = P(Z < -0.4)

Using the rule of symmetry,

    P(Z < -0.4) = P(Z > 0.4) = 0.3446

Solution to Example 4.4

c) P(158 < X < 179). First we standardise on all three sides:

    P(158 < X < 179) = P((158 - 170)/15 < Z < (179 - 170)/15) = P(-0.8 < Z < 0.6)

Using the interval rule,

    P(-0.8 < Z < 0.6) = P(Z > -0.8) - P(Z > 0.6)

Solution to Example 4.4

Using the law of symmetry and then the law of complementarity for the first probability,

    P(Z > -0.8) - P(Z > 0.6) = P(Z < 0.8) - 0.2743
                             = 1 - P(Z > 0.8) - 0.2743
                             = 1 - 0.2119 - 0.2743 = 0.5138.
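The same standardisation can be checked numerically. A minimal sketch: normal_sf is a hypothetical helper built on the standard library's math.erfc, and the three parts of Example 4.4 follow from it:

```python
import math

def normal_sf(x, mu, sigma):
    """P(X > x) for X ~ N(mu, sigma^2), via standardisation Z = (X - mu)/sigma."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma = 170, 15                       # variance 225cm^2 => sigma = 15cm
p_a = normal_sf(191, mu, sigma)                       # P(X > 191), ~0.0808
p_b = 1 - normal_sf(164, mu, sigma)                   # P(X < 164), ~0.3446
p_c = normal_sf(158, mu, sigma) - normal_sf(179, mu, sigma)  # interval rule, ~0.5139
```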

The normal distribution and probabilities of extreme values

Note: If a variable has a normal distribution, then
1. The probability of an observation being within 1 standard deviation of the mean is approx. 2/3.
2. The probability of being in one of the tails is thus approx. 1/6.
3. The probability of an observation being within 2 (to be exact, 1.96) standard deviations of the mean is approx. 0.95.
4. The probability of being in one of the tails is thus approx. 0.025.
5. The probability of being within 2.576 standard deviations of the mean is 0.99.
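These benchmark probabilities follow from the identity P(-k < Z < k) = erf(k/√2), which Python's standard library exposes directly; prob_within is a hypothetical helper name. A quick check of the three central probabilities:

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X, i.e. P(-k < Z < k)."""
    return math.erf(k / math.sqrt(2))

# prob_within(1) ~ 0.683 (about 2/3), prob_within(1.96) ~ 0.95,
# prob_within(2.576) ~ 0.99, matching points 1, 3 and 5 above.
```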

The normal distribution and probabilities of extreme values

Example 4.5

The height of humans is normally distributed with mean 170cm and standard deviation 10cm. Approx. 2/3 of people are between 160cm and 180cm (i.e. 170 ± 10cm). Approx. 95% of people are between 150cm and 190cm (i.e. 170 ± 20cm). Note: These approximations are only valid when the distribution is normal.

4.5 Importance of the normal distribution, the central limit theorem

The central limit theorem states that if a variable X is the sum of a large number of independent random variables (of comparable means and standard deviations), then X is approximately normally distributed. Note: "large" is usually interpreted as at least 30. If the random variables in the sum have a symmetric distribution, then this approximation will be very good.

4.5.1 Practical consequences of the central limit theorem

1. Many variables have a normal distribution. For example, height is the sum of many factors: genetic, environmental and dietary (no individual factor is very important), and fits the normal distribution well.
2. The mean of a sample is simply the sum of the observations divided by a constant (the number of observations). Hence, if there is a large number of observations of a single variable, the distribution of the sample mean always fits the normal distribution well (i.e. the sample mean will be normally distributed around the population mean).

Practical consequences of the central limit theorem

If there is a small number of observations, then the sample mean will only fit the normal distribution well if the observations are from a normal distribution (e.g. height, intelligence quotient). This fact is very important in statistical testing, since it is usually assumed that the mean of a sample is normally distributed. This assumption may not be appropriate when we are dealing with small samples.

Practical consequences of the central limit theorem

3. Suppose X has a binomial distribution with parameters n and p, and n is large (and preferably p is not close to either 0 or 1). e.g. The number of heads when I throw a coin n times has a Bin(n, 1/2) distribution. If a proportion p of voters support party Y and n people are asked who they support, then (assuming they do not lie) the number of supporters of party Y has a Bin(n, p) distribution.

Practical consequences of the central limit theorem

In the first example, the total number of heads can be expressed as the sum of the number of heads from each individual throw (the number of heads from one throw is 1 with probability p, otherwise it is zero). It follows that for large n, the number of heads will be approximately normally distributed. Using a similar argument, for large n the number of supporters of party Y will be approximately normally distributed. This approximation works well if the sample size n is at least 30 and p is between 0.1 and 0.9.

Practical consequences of the central limit theorem

4. For a large sample, the proportion of observations in a given class is approximately normally distributed. This results from the fact that this proportion is simply the total number of such observations (from 3, this is approximately normally distributed) divided by a constant (which does not change the form of the distribution).

4.5.2 The normal approximation to the binomial distribution

Suppose n is large and X ~ Bin(n, p). Then X is approximately N(µ, σ^2), where

    µ = np,  σ^2 = np(1 - p).

This approximation is used when n ≥ 30 and 0.1 ≤ p ≤ 0.9.

The continuity correction for the normal approximation to the binomial distribution

It should be noted that X has a discrete distribution, but we are using a continuous distribution in the approximation. For example, suppose we wanted to estimate the probability of obtaining exactly k heads when we throw a coin n times. This probability will in general be positive. However, if we use the normal approximation without an appropriate correction, we cannot sensibly estimate P(X = k) [for continuous distributions P(X = k) = 0].

The continuity correction for the normal approximation to the binomial distribution

Suppose the random variable X takes only integer values and has an approximately normal distribution. In order to estimate P(X = k), we use the continuity correction. This uses the fact that when k is an integer,

    P(X = k) = P(k - 0.5 < X < k + 0.5).

Example 4.6

Suppose a coin is tossed 36 times. Using the CLT, estimate the probability that exactly 20 heads are thrown.

Example 4.6

Let X be the number of heads. We have X ~ Bin(36, 0.5). Hence,

    E(X) = np = 36 × 0.5 = 18
    Var(X) = np(1 - p) = 36 × 0.5 × 0.5 = 9

It follows that X is approximately N(18, 9). We wish to estimate P(X = 20). Using the continuity correction,

    P(X = 20) = P(19.5 < X < 20.5)
              = P((19.5 - 18)/√9 < (X - µ)/σ < (20.5 - 18)/√9)
              ≈ P(0.5 < Z < 0.83) = P(Z > 0.5) - P(Z > 0.83)
              = 0.3085 - 0.2033 = 0.1052
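Since the exact binomial probability is also easy to compute, we can see how good the approximation in Example 4.6 is. A sketch using only the standard library (phi is a hypothetical helper for the standard normal CDF); the small gap from the 0.1052 above comes from rounding 0.8333 to 0.83 for the tables:

```python
import math

n, p, k = 36, 0.5, 20
mu = n * p                                 # 18
sigma = math.sqrt(n * p * (1 - p))         # 3

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Normal approximation with continuity correction: P(19.5 < X < 20.5)
approx = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)

# Exact binomial probability P(X = 20) for comparison
exact = math.comb(n, k) * p**k * (1 - p)**(n - k)
```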

The continuity correction for the normal approximation to the binomial distribution

This continuity correction can be adapted to problems in which we have to estimate the probability that the number of successes is in a given interval, e.g.

    P(15 ≤ X < 21) = P(X = 15) + P(X = 16) + ... + P(X = 20)
                   = P(14.5 < X < 15.5) + ... + P(19.5 < X < 20.5)
                   = P(14.5 < X < 20.5)

In general, if the boundary of an interval is given by a non-strict inequality, we widen that end of the interval by 0.5. If the boundary of an interval is given by a strict inequality, we narrow that end of the interval by 0.5.

Example 4.7

A die is thrown 180 times. Estimate the probability that
1) at least 35 sixes are thrown.
2) between 27 and 33 sixes are thrown (inclusively).

Example 4.7

Let X be the number of sixes. We have X ~ Bin(180, 1/6). Hence,

    E(X) = np = 180 × 1/6 = 30
    Var(X) = np(1 - p) = 180 × 1/6 × 5/6 = 25

Example 4.7

i) Using the continuity correction,

    P(X ≥ 35) = P(X = 35) + P(X = 36) + ...
              = P(34.5 < X < 35.5) + P(35.5 < X < 36.5) + ...
              = P(X > 34.5)

Standardising,

    P(X > 34.5) = P((X - µ)/σ > (34.5 - 30)/√25) ≈ P(Z > 0.9) = 0.1841.

Note: Strictly speaking, we should calculate P(35 ≤ X ≤ 180) = P(X ≥ 35) - P(X > 180). However, when the normal approximation is appropriate, the second probability (that the number of successes is greater than the number of experiments) is estimated to be close to 0.

Example 4.7

ii) Using the continuity correction,

    P(27 ≤ X ≤ 33) = P(X = 27) + P(X = 28) + ... + P(X = 33)
                   = P(26.5 < X < 27.5) + ... + P(32.5 < X < 33.5)
                   = P(26.5 < X < 33.5)

Standardising,

    P(26.5 < X < 33.5) = P((26.5 - 30)/√25 < (X - µ)/σ < (33.5 - 30)/√25)
                       = P(-0.7 < Z < 0.7) = P(Z > -0.7) - P(Z > 0.7)
                       = P(Z < 0.7) - P(Z > 0.7) = 1 - 2P(Z > 0.7)
                       = 1 - 2 × 0.242 = 0.516
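The interval version of the continuity correction can be checked the same way, this time against the exact binomial sum. A sketch under the same assumptions as before (phi is a hypothetical helper, not from the notes):

```python
import math

n, p = 180, 1 / 6
mu = n * p                                 # 30
sigma = math.sqrt(n * p * (1 - p))         # 5

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Normal approximation with continuity correction: P(26.5 < X < 33.5)
approx = phi((33.5 - mu) / sigma) - phi((26.5 - mu) / sigma)

# Exact binomial probability P(27 <= X <= 33) for comparison
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(27, 34))
```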

The normal approximation to the binomial

It should be noted that the normal approximation to the binomial is most accurate when n is large and p is close to 0.5. This is due to the fact that X = X_1 + X_2 + ... + X_n, where each X_i has a 0-1(p) (Bernoulli) distribution. The distribution of X_i is symmetric when p = 0.5.

4.6 Confidence intervals for population means

Suppose we take a large number of samples of size n. The distribution of the sample means will be distributed around the population mean. If the size of the samples is increased, you would expect that the average error obtained by estimating the population mean using the sample mean will decrease (the distribution of the sample mean will be more concentrated around the population mean). If the population standard deviation of the observations is σ, then the standard deviation of the sample mean from a sample of n observations is σ/√n (otherwise known as the standard error, S.E.(x̄)).

4.6.1 Confidence intervals for the mean with large samples

All the calculations in Section 4.6 assume that the sample mean has a normal distribution. This is always reasonable when there is a large number of observations. When there is a small number of observations, this is only reasonable if the observations themselves have a normal distribution. When the sample size is large (n > 30), we may assume that the sample standard deviation s is a good estimate of the population standard deviation σ. Hence, we can use s/√n as an approximation of the standard error. The sample mean x̄ is the best estimator of the population mean µ (this is a point estimate).

Confidence intervals for the population mean

A point estimate does not indicate the expected error of that estimate, so an interval estimate should be used (e.g. the average population height is 175 ± 4cm). The default confidence level is 95%, i.e. if we calculate one hundred 95% confidence intervals for the mean based on 100 samples, then on average 95 of them will contain the real population mean. The population mean is not guaranteed to lie within a confidence interval.

Confidence intervals for the population mean with large samples

Since 95% of the observations of the sample mean will lie within 1.96 standard errors of the population mean, the following is a 95% confidence interval for the population mean (an interval estimate):

95% confidence interval for the population mean (large sample)

    x̄ ± 1.96 s/√n = x̄ ± 1.96 S.E.(x̄)

Confidence intervals for the population mean with large samples

Similarly, since 99% of the observations of the sample mean will lie within 2.576 standard errors of the population mean, the following is a 99% confidence interval for the population mean:

99% confidence interval for the population mean (large sample)

    x̄ ± 2.576 s/√n = x̄ ± 2.576 S.E.(x̄)

A general equation for calculating confidence intervals at a given confidence level will be given in the following subsection.

Example 4.8

Suppose the mean weekly wage of 100 randomly chosen Irish workers is 420 Euros and the sample standard deviation is 300. Calculate a 95% confidence interval for the mean weekly wage of all Irish workers.

Solution to Example 4.8

The standard error of the sample mean is approximately

    S.E.(x̄) = s/√n = 300/√100 = 30.

The 95% confidence interval for the mean weekly wage of all Irish workers is

    x̄ ± 1.96 S.E.(x̄) = 420 ± 1.96 × 30 = 420 ± 58.8 = [361.2, 478.8].

The narrower a confidence interval, the more accurately we are estimating a population mean.
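The large-sample interval is a one-line computation; a minimal sketch of Example 4.8 (variable names are my own):

```python
import math

xbar, s, n = 420, 300, 100                  # sample mean, sample sd, sample size
se = s / math.sqrt(n)                       # standard error: 300/10 = 30
half_width = 1.96 * se                      # 95% margin of error: 58.8
ci = (xbar - half_width, xbar + half_width) # (361.2, 478.8)
```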

4.6.2 Confidence intervals for the population mean with small samples

When the sample size is small, we cannot assume that the sample standard deviation is a good estimate of the population standard deviation. In this case, a confidence interval must reflect the increased degree of uncertainty resulting from not knowing the population standard deviation (i.e. the confidence interval must be wider).

The Student distribution

Suppose the observations X_1, ..., X_n are normally distributed. Then

    Z = √n (x̄ - µ)/σ

has a normal distribution with mean 0 and standard deviation 1 (where µ and σ are the population mean and population standard deviation, respectively). Let

    T_{n-1} = √n (x̄ - µ)/s,

where s is the sample standard deviation. T_{n-1} has a Student distribution with n - 1 degrees of freedom.

Relation between the Student distribution and the normal distribution

Since s is an estimate of σ, the distribution of T_{n-1} will be similar to the distribution of Z. However, this estimation introduces a larger degree of uncertainty, and the dispersion of the T_{n-1} distribution will be larger than the dispersion of the Z distribution. As n increases, s becomes a very good estimate of σ. Thus, for large n, the distribution of T_{n-1} converges to the standard normal distribution.

Critical values for the Student distribution

The p-critical value of the Student distribution with n - 1 degrees of freedom is denoted t_{n-1,p}. It satisfies

    P(T_{n-1} > t_{n-1,p}) = p.

By symmetry, a proportion 1 - 2p of sample means will be within t_{n-1,p} standard deviations of the mean. For n < 30 these critical values can be read from Table 7. For n > 30, we use the fact that for large n the Student distribution converges to the standard normal distribution. The appropriate critical values for the normal distribution can also be read from Table 7; they are given as t_{∞,p}.

Critical values for the Student distribution

The graph illustrates the critical value t_{n-1,0.005} (≈ 2.6 in this case).

Confidence intervals for the population mean with small samples

For a small sample, the following is a 100(1 - α)% confidence interval for the population mean:

100(1 - α)% confidence interval for the population mean (small sample)

    x̄ ± t_{n-1,α/2} s/√n = x̄ ± t_{n-1,α/2} S.E.(x̄)

95% confidence interval for the population mean with small samples

For a 95% confidence interval, 100(1 - α) = 95. Hence, α = 0.05.

95% confidence interval for the population mean (small sample)

    x̄ ± t_{n-1,0.025} s/√n = x̄ ± t_{n-1,0.025} S.E.(x̄)

99% confidence interval for the population mean with small samples

For a 99% confidence interval, 100(1 - α) = 99. Hence, α = 0.01.

99% confidence interval for the population mean (small sample)

    x̄ ± t_{n-1,0.005} s/√n = x̄ ± t_{n-1,0.005} S.E.(x̄)

Example 4.9

Nine students were weighed. Their average weight was 68kg and the standard deviation was 12kg. Calculate a 99% confidence interval for the mean weight of all students.

Solution to Example 4.9

This confidence interval is given by

    x̄ ± t_{8,0.005} s/√n = 68 ± t_{8,0.005} × 12/3
                          = 68 ± 3.355 × 4 = 68 ± 13.42 = [54.58, 81.42]

Note: Since the sample size is small and weight has a skewed distribution (i.e. is not normally distributed), this is only approximately a 99% confidence interval.
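The small-sample computation differs only in the critical value. A sketch of Example 4.9, taking t_{8,0.005} = 3.355 from the tables as the notes do (variable names are my own):

```python
import math

xbar, s, n = 68, 12, 9
t_crit = 3.355                              # t_{8,0.005}, read from Table 7
se = s / math.sqrt(n)                       # 12/3 = 4
half_width = t_crit * se                    # 13.42
ci = (xbar - half_width, xbar + half_width) # (54.58, 81.42)
```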

4.7 Confidence intervals for proportions

We want to estimate the proportion p of people in the population who have trait A. It will be assumed that the sample size n is large (n > 30). In this case, the distribution of the sample proportion will be approximately normal (unless p is very close to 0 or 1). Suppose x people in a sample of n have trait A. Then p̂ = x/n (the sample proportion) is an estimator of p (the population proportion).

Standard error of the sample proportion
The standard error of this estimator is
$$\mathrm{S.E.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}},$$
which can be approximated by
$$\mathrm{S.E.}(\hat{p}) \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
It should be noted that the maximum standard error is attained when $p = 0.5$. It follows that
$$\mathrm{S.E.}(\hat{p}) \le \mathrm{S.E.}_{\max}(\hat{p}) = \frac{1}{2\sqrt{n}}.$$
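The standard error and its upper bound are easy to compute and compare. A minimal sketch in Python (the sample size $n = 400$ below is purely illustrative):

```python
from math import sqrt

def se_prop(p_hat, n):
    """Estimated standard error of the sample proportion."""
    return sqrt(p_hat * (1 - p_hat) / n)

def se_max(n):
    """Upper bound 1/(2 sqrt(n)), attained at p = 0.5."""
    return 1 / (2 * sqrt(n))

n = 400
print(se_prop(0.5, n), se_max(n))    # equal when p_hat = 0.5
print(se_prop(0.1, n) <= se_max(n))  # the bound holds for any p_hat
```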

Formula for a confidence interval for the population proportion
For large $n$ and $p$ not close to 0 or 1, $\hat{p}$ will be approximately normally distributed. It follows that a $100(1-\alpha)\%$ confidence interval for $p$ is given by
$$\hat{p} \pm t_{\infty,\alpha/2}\,\mathrm{S.E.}(\hat{p})$$

Particular cases of confidence intervals for the population proportion
A 95% confidence interval for $p$ is given by
$$\hat{p} \pm t_{\infty,0.025}\,\mathrm{S.E.}(\hat{p})$$
A 99% confidence interval for $p$ is given by
$$\hat{p} \pm t_{\infty,0.005}\,\mathrm{S.E.}(\hat{p})$$

Estimating the population proportion to a given accuracy
The upper bound on the standard error of the sample proportion is useful when determining the sample size needed to estimate the population proportion to a given accuracy. To estimate a population proportion to within $\delta$ with a confidence level of $100(1-\alpha)\%$, we require the error term $t_{\infty,\alpha/2}\,\mathrm{S.E.}(\hat{p})$ to be no greater than the required accuracy $\delta$, i.e.
$$t_{\infty,\alpha/2}\,\mathrm{S.E.}(\hat{p}) \le \delta.$$
This will always be satisfied if
$$t_{\infty,\alpha/2}\,\mathrm{S.E.}_{\max}(\hat{p}) \le \delta \iff \frac{t_{\infty,\alpha/2}}{2\sqrt{n}} \le \delta.$$
To find the appropriate sample size, we solve this inequality for $n$.

Example 4.10
In a survey of 300 people, 75 stated that they would vote for the Labour party.
i) Calculate a 95% confidence interval for the proportion of people wishing to vote for the Labour party.
ii) What sample size is needed in order to estimate this proportion to within 3% with a confidence level of 99%?

Solution to Example 4.10
i) The formula for this confidence interval is
$$\hat{p} \pm t_{\infty,0.025}\,\mathrm{S.E.}(\hat{p})$$
We have $\hat{p} = \frac{75}{300} = \frac{1}{4} = 0.25$. Also,
$$\mathrm{S.E.}(\hat{p}) \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{3/16}{300}} = \sqrt{\frac{1}{1600}} = 0.025,$$
and $t_{\infty,0.025} = 1.96$.

Solution to Example 4.10
Hence, the 95% confidence interval is given by
$$0.25 \pm 1.96 \times 0.025 = 0.25 \pm 0.049 = [0.201, 0.299]$$
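Part (i) can be verified with a few lines of Python using only the standard library (`NormalDist().inv_cdf(0.975)` gives the critical value $t_{\infty,0.025} \approx 1.96$):

```python
from math import sqrt
from statistics import NormalDist

# Example 4.10(i): 75 of 300 would vote Labour; 95% confidence interval.
x, n = 75, 300
p_hat = x / n                        # 0.25
se = sqrt(p_hat * (1 - p_hat) / n)   # 0.025
z = NormalDist().inv_cdf(0.975)      # t_{infinity, 0.025}, approx 1.96

print(round(p_hat - z * se, 3), round(p_hat + z * se, 3))  # 0.201 0.299
```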

Solution to Example 4.10
ii) For a 99% confidence level, we require
$$t_{\infty,\alpha/2}\,\mathrm{S.E.}_{\max}(\hat{p}) \le \delta \iff \frac{2.576}{2\sqrt{n}} \le 0.03 \iff \sqrt{n} \ge \frac{2.576}{2 \times 0.03} \approx 42.933 \iff n \ge 42.933^2 \approx 1843.27.$$
Hence, since the sample size is an integer, we require at least 1844 observations.
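The sample-size bound in part (ii) follows from rearranging the inequality to $n \ge \left(\frac{t_{\infty,\alpha/2}}{2\delta}\right)^2$, which a short script can check (standard library only; the exact normal quantile is used instead of the rounded table value 2.576, so the intermediate number differs slightly from the slides, but the required sample size is the same):

```python
from math import ceil
from statistics import NormalDist

# Example 4.10(ii): estimate p to within delta = 0.03 at 99% confidence.
delta = 0.03
z = NormalDist().inv_cdf(0.995)    # t_{infinity, 0.005}, approx 2.576

# z / (2 sqrt(n)) <= delta  <=>  n >= (z / (2 delta))^2
n_min = (z / (2 * delta)) ** 2
print(n_min, ceil(n_min))          # about 1843, so at least 1844 observations
```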

4.8 Estimating a population mean to a given accuracy
In a similar way, we can estimate the sample size required to estimate a population mean to some required accuracy. However, the standard error depends on the population standard deviation, which is in general unknown. Unlike the standard error of the sample proportion, the standard error of the sample mean has no upper bound (it simply increases as the population standard deviation increases). Normally a two-stage procedure is used: first, a relatively small sample is used to estimate the standard deviation; then the estimation of the required sample size is based on the standard deviation of this sample.

Estimating a population mean to a given accuracy
We assume that the required sample size is relatively large (i.e. $n > 30$). As previously, we require the error term in the corresponding confidence interval to be less than or equal to the permitted error $\delta$, i.e.
$$t_{\infty,\alpha/2}\,\mathrm{S.E.}(\bar{X}) \le \delta \iff t_{\infty,\alpha/2}\frac{s}{\sqrt{n}} \le \delta.$$

Example 4.11
The monthly salaries of 30 randomly chosen Irish adults were observed and the sample standard deviation was 1000 Euro. Find the sample size required to estimate the mean monthly salary of Irish adults to within 200 Euro with a confidence level of 95%.

Solution to Example 4.11
We have $s = 1000$, $\alpha = 0.05$, $\delta = 200$. Hence, we need to solve the following inequality for $n$:
$$t_{\infty,0.025}\frac{s}{\sqrt{n}} \le 200 \iff \frac{1.96 \times 1000}{\sqrt{n}} \le 200.$$
Hence,
$$\sqrt{n} \ge 1.96 \times 5 = 9.8 \iff n \ge 9.8^2 = 96.04.$$
It follows that we need at least 97 observations (i.e. at least another 67 observations in addition to the initial sample of 30).
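The same rearrangement, $n \ge \left(\frac{t_{\infty,\alpha/2}\,s}{\delta}\right)^2$, gives the answer directly. A minimal sketch in Python with the rounded table value $z = 1.96$ hardcoded, as in the solution above:

```python
from math import ceil

# Example 4.11: s = 1000 Euro, delta = 200 Euro, 95% confidence.
s, delta, z = 1000.0, 200.0, 1.96

# z * s / sqrt(n) <= delta  <=>  n >= (z * s / delta)^2
n_min = (z * s / delta) ** 2   # (1.96 * 5)^2 = 9.8^2 = 96.04
n_required = ceil(n_min)
print(n_required)              # 97
print(n_required - 30)         # 67 more than the initial sample of 30
```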