Week 3&4: Z tables and the Sampling Distribution of $\bar{X}$




The Standard Normal Distribution, or Z distribution, is the distribution of a random variable $Z \sim N(0, 1^2)$. Any other normal random variable, $X \sim N(\mu, \sigma^2)$, can be converted to a Z by standardizing: $Z = \frac{X - \mu}{\sigma}$. Probabilities for these variables are areas under the curve, but since we don't use calculus in this course, we use software or a Z table to find them. The random variable Z is continuous, which means the probability at any exact point is always 0. Thus, we will find probabilities for ranges of values.

First, some general characteristics of the Z distribution. The area under the entire curve is 1, since it represents all possible values. Because the curve is symmetric, the mean equals the median, so the area under the curve to the left of 0 is 0.5 (as is the area to the right). We say, "The probability that Z is less than 0 is 0.5." This is written as $P(Z < 0) = 0.5$. Again, since Z is continuous, $P(Z \le 0) = P(Z < 0) = 0.5$.

We will use the Z table found on the Stat30X webpage: http://www.stat.tamu.edu/stat30x/zttables.php
Notice that the only entry that appears on both pages of the table is z = 0.00, with probability 0.5000. The rows of the table give the z-score through the first decimal place, and the columns give the second decimal. The body of the table contains the probability to the left of a particular z-score, z.zz. For example, $P(Z < 0.00) = 0.5000$ and $P(Z < 0.07) = 0.5279$.

Examples of reading the table:
$P(Z < 1.25) = 0.8944$
$P(Z < 0.50) = 0.6915$

$P(Z < -0.75) = 0.2266$
$P(Z < -2.01) = 0.0222$
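The table lookups above can also be reproduced in software. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the helper name `left_tail` is made up for illustration and is not part of the course materials.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # the standard normal distribution

def left_tail(z):
    """Return P(Z < z), rounded to four decimals like a printed Z table."""
    return round(Z.cdf(z), 4)

# The same z-scores that were read from the table above
for z in (0.00, 0.07, 1.25, 0.50, -0.75, -2.01):
    print(f"P(Z < {z:5.2f}) = {left_tail(z):.4f}")
```

Each printed value should agree with the corresponding table entry to four decimals.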

The Z table only gives probabilities to the left of a value. To get probabilities to the right, we use the complement rule, $P(Z > z) = 1 - P(Z < z)$.
$P(Z > 1.25) = 1 - 0.8944 = 0.1056$
$P(Z > 0.50) = 1 - 0.6915 = 0.3085$

$P(Z > -0.75) = 1 - 0.2266 = 0.7734$
$P(Z > -2.01) = 1 - 0.0222 = 0.9778$

To find probabilities between two numbers, find the larger area (using the larger value) first and then subtract the smaller area. Remember, a probability can never be negative, so check your work!
$P(-2.01 < Z < 2.01) = P(Z < 2.01) - P(Z < -2.01) = 0.9778 - 0.0222 = 0.9556$
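The complement rule and the "larger area minus smaller area" rule can be sketched in a few lines. This assumes Python's standard-library `statistics.NormalDist`; the function names `right_tail` and `between` are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mean 0 and standard deviation 1

def right_tail(z):
    """P(Z > z) via the complement rule: 1 - P(Z < z)."""
    return 1 - Z.cdf(z)

def between(a, b):
    """P(a < Z < b): area left of the larger value minus area left of the smaller."""
    low, high = min(a, b), max(a, b)
    return Z.cdf(high) - Z.cdf(low)

print(f"P(Z > 1.25)          = {right_tail(1.25):.4f}")
print(f"P(Z > -0.75)         = {right_tail(-0.75):.4f}")
print(f"P(-2.01 < Z < 2.01)  = {between(-2.01, 2.01):.4f}")
```

Ordering the endpoints inside `between` guards against the negative "probability" the slide warns about.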

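Now suppose we have a non-standard normal, $X \sim N(\mu, \sigma^2)$, and we want to know the probability that X is less than some value. We must first convert the X to a Z and then use the probabilities from the Z table. Recall that if $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1^2)$, so $P(X < x) = P\left(Z < \frac{x - \mu}{\sigma}\right)$.
Beware: the table describes Z, not X, so you must convert to a Z before looking up a probability or applying the complement rule. The complement rule itself, $P(X > x) = 1 - P(X < x)$, holds for any continuous X; what fails when X is not centered at 0 is the symmetry shortcut $P(X > x) = P(X < -x)$.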

Suppose $X \sim N(2, 3^2)$. Given a value x, find the corresponding z and then the probability.
Find $P(X > 5) = P\left(\frac{X - \mu}{\sigma} > \frac{5 - 2}{3}\right) = P(Z > 1) = 1 - P(Z < 1) = 1 - 0.8413 = 0.1587$.
Find $P(-4 < X < 8) = P\left(\frac{-4 - 2}{3} < \frac{X - \mu}{\sigma} < \frac{8 - 2}{3}\right) = P(-2 < Z < 2) = P(Z < 2) - P(Z < -2) = 0.9772 - 0.0228 = 0.9544$.
Find $P(X < -4 \text{ or } X > 8) = P\left(\frac{X - \mu}{\sigma} < \frac{-4 - 2}{3} \text{ or } \frac{X - \mu}{\sigma} > \frac{8 - 2}{3}\right) = P(Z < -2 \text{ or } Z > 2) = P(Z < -2) + P(Z > 2) = 0.0228 + (1 - 0.9772) = 0.0456$.
Note: Since the two areas are the same size, you could have just doubled the lower tail.
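The standardize-then-look-up steps for $X \sim N(2, 3^2)$ can be checked numerically. This is a sketch under the assumption that Python's `statistics.NormalDist` stands in for the Z table; the helper `z_score` and the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 2, 3             # X ~ N(2, 3^2), as in the example
Z = NormalDist()             # standard normal (plays the role of the Z table)
X = NormalDist(mu, sigma)    # the same X, left unstandardized

def z_score(x):
    """Convert a value of X to the corresponding z."""
    return (x - mu) / sigma

p_above_5 = 1 - Z.cdf(z_score(5))                          # P(X > 5)
p_between = Z.cdf(z_score(8)) - Z.cdf(z_score(-4))         # P(-4 < X < 8)
p_outside = Z.cdf(z_score(-4)) + (1 - Z.cdf(z_score(8)))   # P(X < -4 or X > 8)

# Standardizing by hand should agree with asking X directly
assert abs(p_above_5 - (1 - X.cdf(5))) < 1e-12

print(f"P(X > 5)            = {p_above_5:.4f}")
print(f"P(-4 < X < 8)       = {p_between:.4f}")
print(f"P(X < -4 or X > 8)  = {p_outside:.4f}")
```

The unrounded results for the last two probabilities (about 0.9545 and 0.0455) differ from the table-based answers in the last digit, because the table entries 0.9772 and 0.0228 are rounded before they are subtracted or added.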

Reverse use of the Z table: finding z-scores given probabilities.
Find the $z^*$ such that $P(Z < z^*) = 0.8485$, where 0.8485 is some probability. Answer: $z^* = 1.03$.
Find $z^*$ such that $P(Z < z^*) = 0.2981$. Answer: $z^* = -0.53$.
Find $z^*$ such that $P(Z > z^*) = 0.1056 = 1 - P(Z < z^*)$. Answer: $P(Z < z^*) = 1 - P(Z > z^*) = 1 - 0.1056 = 0.8944$, so $z^* = 1.25$.
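Reading the table in reverse corresponds to the inverse of the cumulative distribution function. A minimal sketch, assuming `statistics.NormalDist.inv_cdf` from the Python standard library; the two helper names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def z_for_left_area(p):
    """Return z* with P(Z < z*) = p."""
    return Z.inv_cdf(p)

def z_for_right_area(p):
    """Return z* with P(Z > z*) = p, by first converting to a left area."""
    return Z.inv_cdf(1 - p)

print(f"left area  0.8485 -> z* = {z_for_left_area(0.8485):.2f}")
print(f"left area  0.2981 -> z* = {z_for_left_area(0.2981):.2f}")
print(f"right area 0.1056 -> z* = {z_for_right_area(0.1056):.2f}")
```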

Finding Centered Probabilities
What if $P(-z^* < Z < z^*) = 0.85$, where 0.85 is a central area under the Z curve? (If the area is not central, we can't do this.) Since the total area under the curve is 1 and $1 - 0.85 = 0.15$, there is 0.15 of the area outside $-z^*$ and $z^*$. And since the Z curve is centered at 0, half of this area is below $-z^*$ and the other half is above $z^*$.

This means $P(-z^* < Z < z^*) = 0.85 = P(Z < z^*) - P(Z < -z^*) = (1 - 0.075) - 0.075$. We can now find $-z^*$ such that $P(Z < -z^*) = 0.075$. Answer: $z^* = 1.44$. If we call the central area $1 - \alpha$ (we'll discover why later), then the outside area is $\alpha$ and the area to look up is $\alpha/2$.
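The $\alpha/2$ recipe can be written as a small function. This is a sketch assuming Python's `statistics.NormalDist`; `central_z` is an illustrative name, not something defined in the course.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def central_z(central_area):
    """Return z* such that P(-z* < Z < z*) equals central_area."""
    alpha = 1 - central_area           # total area outside (-z*, z*)
    return Z.inv_cdf(1 - alpha / 2)    # leave alpha/2 in the upper tail

z_star = central_z(0.85)
print(f"z* for central area 0.85: {z_star:.2f}")
print(f"check: P(-z* < Z < z*) = {Z.cdf(z_star) - Z.cdf(-z_star):.4f}")
```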

Standard Normal 5 Number Summary
We know from Chapter 1 that the $IQR = Q_3 - Q_1$ covers the middle 50% of a distribution. So what are $z_{Q1}$ and $z_{Q3}$?
$P(z_{Q1} < Z < z_{Q3}) = 0.50 = P(Z < z_{Q3}) - P(Z < z_{Q1}) = 0.75 - 0.25$, or $P(Z < z_{Q1}) = 0.25$ and $P(Z < z_{Q3}) = 0.75$.
Answer: $z_{Q1} \approx -0.675$ and $z_{Q3} \approx 0.675$.
Adding these numbers to the Empirical Rule numbers, we have estimates for the middle 50%, 68%, 95% and 99.7% as easy references.
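The quartile z-scores and the Empirical Rule areas can be verified together. A minimal sketch, again assuming `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Quartiles: the z-scores with 25% and 75% of the area to their left
z_q1 = Z.inv_cdf(0.25)
z_q3 = Z.inv_cdf(0.75)
print(f"z_Q1 = {z_q1:.4f}, z_Q3 = {z_q3:.4f}")

# Empirical Rule: area within 1, 2 and 3 standard deviations of the mean
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"P(-{k} < Z < {k}) = {area:.4f}")
```

Software gives the quartiles as roughly ±0.6745; the 0.675 quoted above is the usual value obtained by interpolating between table entries.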

Non-standard Normal Example
Suppose the sample proportion of 100 students who think that there is insufficient parking is normally distributed with a mean of 0.8 and a standard deviation of 0.04. As long as we know the distribution is normal, and we know $\mu$ and $\sigma$, we can find any probability!
$\hat{p} \sim N(\mu_{\hat{p}} = 0.8, \sigma_{\hat{p}}^2 = 0.04^2)$
How often would we get a sample proportion of 0.75 or less?
$P(\hat{p} \le 0.75) = P\left(\frac{\hat{p} - \mu}{\sigma} \le \frac{0.75 - 0.8}{0.04}\right) = P(Z < -1.25) = 0.1056$
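The parking example follows the same two steps, standardize and then look up. A short sketch assuming `statistics.NormalDist`; the variable names are made up for illustration.

```python
from statistics import NormalDist

mu_p, sigma_p = 0.8, 0.04      # mean and SD of the sample proportion (from the example)
cutoff = 0.75

z = (cutoff - mu_p) / sigma_p  # standardize the cutoff
p = NormalDist().cdf(z)        # area to the left of z under the Z curve

print(f"z = {z:.2f}")
print(f"P(p-hat <= {cutoff}) = {p:.4f}")
```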

Inference
So what good are these probabilities? Recall from the Introduction that an important area of statistics is inference: drawing a conclusion based on data and making decisions based on how likely something is to occur. Since probabilities tell us how often things occur, we can use them to make our decisions. But exact probabilities come from the whole population, which would require a census, a complete listing of all of the data. We need to be able to make our decisions based on samples, or even one sample.

General Idea of Inferential Statistics
We take a sample from the whole population.
We summarize the sample using important statistics.
We use those summaries to make inference about the whole population.
We realize there may be some error involved in making inference.

Inferential Statistics Example (1988, the Steering Committee of the Physicians' Health Study Research Group)
Question: Can aspirin reduce the risk of heart attack in humans?
Sample: 22,071 male physicians between the ages of 40 and 84, randomly assigned to one of two groups. One group took an ordinary aspirin tablet every other day (headache or not). The other group took a placebo every other day. This group is the control group.
Summary statistic: The rate of heart attacks in the group taking aspirin was only 55% of the rate of heart attacks in the placebo group.
Inference to population: Taking aspirin causes a lower rate of heart attacks in humans.

Basics for sampling
Samples should not be biased: no favoring of any individual in the population.
Examples of biased samples: selecting goldfish from a particular store; polling your neighbor rather than the whole city.
The selection of an individual in the population should not affect the selection of the next individual: independence.
Example of a non-independent sample: when taking a survey on the cost of a college education, we ask both the mother and the father of a student.
Samples should be large enough to adequately cover the population.
Example of a small sample: suppose only 20 physicians were used in the aspirin study.

Samples should have the smallest variability possible. We know that there are many different samples, so we want to make sure our statistics are consistent. The larger the sample we use, the less the different sample statistics will vary.
Although there are many types of samples, we will only discuss the simplest, a simple random sample (SRS). Every sample of a particular size, n, from the population has an equal chance of being selected. An SRS produces an unbiased statistic.


Sampling Distribution
Since statistics vary from sample to sample, there is a distribution of them called a sampling distribution: the distribution of all of the values taken by the statistic in all possible samples of the same size, n, from the same population. We can then examine the shape, center, and spread of the sampling distribution. We know that there are many statistics that we can calculate from a sample, but we're going to start with the sample mean, $\bar{X}$.

Bias and Variance
Bias concerns the center of the sampling distribution. A statistic used to estimate a parameter is unbiased if the mean of the sampling distribution is equal to the true value of the parameter being estimated. For the sample mean, this says that the mean of the sample mean is the same as the mean of the population sampled, $\mu_{\bar{X}} = \mu_X$. To reduce bias, we use a random sample.
Variability is described by the spread of the sampling distribution. To reduce the variability of a statistic, use a larger sample; the larger the sample size, n, the smaller the variance of the statistic. This is true because the variance of the sample mean gets smaller as the sample size increases: $\sigma_{\bar{X}}^2 = \sigma_X^2 / n$, or $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$.
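The claim that $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$ can be illustrated with a small simulation. This is only a sketch: the population parameters `MU` and `SIGMA`, the number of repetitions, and the seed are all hypothetical choices, and a normal population is assumed purely for convenience.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(1)             # fixed seed so the run is repeatable
MU, SIGMA = 10.0, 4.0      # hypothetical population mean and SD
REPS = 2000                # simulated samples per sample size

def sd_of_sample_means(n):
    """Draw REPS samples of size n and return the SD of their sample means."""
    means = [sum(random.gauss(MU, SIGMA) for _ in range(n)) / n
             for _ in range(REPS)]
    return stdev(means)

results = {n: sd_of_sample_means(n) for n in (4, 16, 64)}
for n, sd in results.items():
    print(f"n = {n:2d}: simulated SD of means = {sd:.3f}, "
          f"sigma/sqrt(n) = {SIGMA / sqrt(n):.3f}")
```

Quadrupling the sample size should roughly halve the spread of the sample means, which is what the $\sqrt{n}$ in the denominator predicts.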

Summary
Population distribution of a random variable: the distribution of all the members of the population. Parameters help describe the distribution, for example, $\mu$ and $\sigma$.
Sampling distribution of a sample statistic: this is not the distribution of the sample! The sampling distribution is the distribution of a statistic. If we take many, many samples and calculate the statistic for each of those samples, the distribution of all those statistics is the sampling distribution.
We will start with the sampling distribution of the sample mean, $\bar{X}$.

Sampling Distribution for Numeric Data: Sampling Distribution of a Sample Mean
We already know that if we take random samples the sample mean is unbiased, $\mu_{\bar{X}} = \mu_X$, so we know the center. We can reduce the variance by using a large sample, n, since $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$, so we know the spread. Since the sample mean of a normal random variable is also normal, we know the shape. So, if X is normal, the distribution of the sample mean, or sampling distribution of the sample mean, is
$\bar{X}_n \sim N\left(\mu, \left(\frac{\sigma}{\sqrt{n}}\right)^2\right)$
where the subscript on $\bar{X}$ indicates the sample size.

Examples of Sample Mean
There has been some concern that young children are spending too much time watching television. A study in Columbia, South Carolina recorded the number of cartoon shows watched per child from 7:00 a.m. to 1:00 p.m. on a particular Saturday morning for 28 different children. The results were as follows:
2, 2, 1, 3, 3, 5, 7, 5, 3, 8, 1, 4, 0, 4, 2, 0, 4, 2, 7, 3, 6, 1, 3, 5, 6, 4, 4, 4.
(Adapted from Intro. to Statistics, Milton, McTeer and Corbet, 1997)
Suppose the true average for all of South Carolina is 3.4 with a standard deviation of 2.1, and that the data are normal.

What is the population mean? $\mu = 3.4$
What is the sample mean? $\bar{x} = 99/28 = 3.536$
What is the approximate sampling distribution (of the sample mean)?
$\bar{X}_{28} \sim N\left(3.4, \left(\frac{2.1}{\sqrt{28}}\right)^2\right) = N(3.4, 0.4^2)$
Again, what does this mean?
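The arithmetic behind the sample mean and the standard deviation of the sampling distribution can be checked directly from the 28 observations. A minimal sketch in Python; the variable names are illustrative.

```python
from math import sqrt

# Number of cartoon shows watched by each of the 28 children (from the example)
shows = [2, 2, 1, 3, 3, 5, 7, 5, 3, 8, 1, 4, 0, 4,
         2, 0, 4, 2, 7, 3, 6, 1, 3, 5, 6, 4, 4, 4]

mu, sigma = 3.4, 2.1       # assumed population mean and SD (from the example)
n = len(shows)

x_bar = sum(shows) / n     # sample mean
se = sigma / sqrt(n)       # SD of the sampling distribution of the mean

print(f"n = {n}, sum = {sum(shows)}, sample mean = {x_bar:.3f}")
print(f"sigma / sqrt(n) = {se:.3f}")
```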

Suppose we take many, many samples (each sample of size 28), then we find the sample mean for each sample. All of those means (2.9, 3.4, 4.1, ...) are distributed $N(3.4, 0.4^2)$; this is the sampling distribution.

The Central Limit Theorem
What if the original data (parent population) is not normal? The Central Limit Theorem states that for any population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean, $\bar{X}_n$, is approximately normal when n is large:
$\bar{X}_n \sim N\left(\mu, \left(\frac{\sigma}{\sqrt{n}}\right)^2\right)$ (approximately)
The central limit theorem is a very powerful tool in statistics. Remember, the central limit theorem works for any distribution with a finite mean and standard deviation. Let us see how well it works for the years on pennies.

Example of Central Limit Theorem
[Figure: Penny population distribution (276 pennies)]

Note from the penny population figure that the distribution is highly left skewed. The mean of the 276 pennies is 1992.9. The standard deviation of the 276 pennies is 8.7. Let us take 50 samples of size 10. According to the Central Limit Theorem, the sampling distribution of the sample means should be approximately normal with mean 1992.9 and standard deviation $8.7/\sqrt{10} = 2.75$.
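The penny data themselves are not reproduced in this transcript, so the experiment can only be imitated. The sketch below builds a HYPOTHETICAL left-skewed population of 276 "mint years" (an assumption made purely for illustration, not the real pennies), draws 50 samples of size 10, and compares the results with what the Central Limit Theorem predicts.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

random.seed(276)  # fixed seed so the run is repeatable

# HYPOTHETICAL stand-in for the penny data: most coins are recent,
# with a long tail of older ones, which makes the years left skewed.
population = [2001 - min(int(random.expovariate(1 / 8)), 60)
              for _ in range(276)]

mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation
se = sigma / sqrt(10)        # predicted SD of the sample means

# 50 samples of size 10, each drawn without replacement from the population
sample_means = [mean(random.sample(population, 10)) for _ in range(50)]

print(f"population: mean = {mu:.1f}, SD = {sigma:.2f}")
print(f"predicted:  mean = {mu:.1f}, SD = {se:.2f}")
print(f"observed:   mean = {mean(sample_means):.1f}, SD = {stdev(sample_means):.2f}")
```

With only 50 samples, the observed mean and standard deviation should land near the predicted values without matching them exactly.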

That is, the sampling distribution, the distribution of the $\bar{x}$'s, should be approximately a normal distribution. Suppose we took 50 samples from these pennies and plotted the sample means:
[Figure: plot of the 50 sample means]

[Figure: summary of the distribution of the means of the 50 samples]
Notice that the mean of the sample means, $\bar{x}_{\bar{X}}$, is close to $1992.9 = \mu$ and their standard deviation, $s_{\bar{X}}$, is not far from $2.75 = \sigma_X/\sqrt{n}$. The plot of the 50 sample means is still slightly skewed, but it is closer to the normal distribution than the population is. So, n = 10 isn't large enough, and taking larger samples would produce a more normal distribution of sample means. So what is large enough? A common rule of thumb is at least n = 30, but sometimes more is needed.

Recap
So in general:
The mean of the sample means is the mean of the data, $\mu_{\bar{X}} = \mu_X$.
The standard deviation of the sample means is the standard deviation of the data divided by the square root of the sample size, $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$.
If the data are normal, then the distribution of the sample means is exactly normal.
But even if the distribution of the data isn't known, we can say the distribution of the sample means is approximately normal as long as we take a large sample.

Example
Suppose past studies indicate it takes an average of 15 minutes with a standard deviation of 5 minutes to memorize a short passage of 100 words. A psychologist claims a new method of memorization will reduce the average time. A random sample of 40 people use the new method, and the average time required to memorize the passage is found to be 12.5 minutes. 12.5 minutes is obviously less than 15, but is it small enough to say that the new method actually reduces the average time, or is it just random chance that produced such a small sample mean? How likely is $\bar{x} \le 12.5$ if $\mu = 15$?
First, $\bar{X} \sim N\left(15, \left(\frac{5}{\sqrt{40}}\right)^2\right) = N(15, 0.79^2)$.
$P(\bar{X} < 12.5) = P\left(Z < \frac{12.5 - 15}{0.79}\right) = P(Z < -3.16) = 0.0008$
So, even though 12.5 isn't much different than 15 minutes, an average this small should rarely if ever happen if $\mu$ is really 15.
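The memorization example combines every step covered so far: find the standard deviation of the sample mean, standardize, and look up the left-tail area. A minimal sketch assuming Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 15, 5     # historical mean and SD of memorization time (minutes)
n, x_bar = 40, 12.5    # sample size and observed sample mean

se = sigma / sqrt(n)          # SD of the sampling distribution of the mean
z = (x_bar - mu0) / se        # standardized sample mean
p = NormalDist().cdf(z)       # P(X-bar < 12.5) if mu really is 15

print(f"sigma / sqrt(n) = {se:.2f}")
print(f"z = {z:.2f}")
print(f"P(X-bar < {x_bar}) = {p:.4f}")
```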