Lab 6: Sampling Distributions and the CLT


Objective: The objective of this lab is to give you hands-on experience with, and an understanding of, sampling distributions and the Central Limit Theorem (CLT), a theorem that plays an important role in statistics. In this lab, you will simulate random samples from a known population distribution and compute a sample statistic for each generated sample. You can then examine these sample statistics to learn about the properties of the statistic's sampling distribution.

Application: Jack is a researcher who wants to perform a hypothesis test to learn something about the mean salary of all Major League Baseball players. This test requires that the salaries have a normal distribution. When Jack creates a histogram of the salaries for a sample of players, he notices that the distribution is not normal. However, all hope is not lost for this hypothesis test. The sample mean has a useful property: its distribution is approximately normal when certain conditions are met.

Overview: Statistical inference is the process of drawing conclusions about a population parameter based on data. When a sample is selected from a population, a summary number computed from the observations gives the value of a statistic. A statistic is used to estimate the corresponding value for the population (that is, a sample statistic estimates a population parameter). However, a sample chosen at random will not necessarily yield an estimate (or statistic) exactly equal to the corresponding population parameter, and the next sample of the same size will probably give a different estimate. If additional samples of the same size were taken, you would begin to see how the possible estimates (the possible values of the statistic) vary and how close they tend to be to the parameter value.
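The repeated-sampling idea above can be sketched in a few lines of Python. This is a minimal illustration that assumes a small hypothetical population (the whole numbers 1 to 100, not data from this lab), so the parameter $\mu$ is known. Each simple random sample of the same size gives a different estimate $\bar{x}$.

```python
import random
import statistics

# Hypothetical population (the whole numbers 1..100), used only for illustration.
population = list(range(1, 101))
mu = statistics.fmean(population)  # the population parameter, 50.5


def sample_means(pop, n, reps, seed=0):
    """Draw `reps` simple random samples of size n (without replacement)
    and return the sample mean of each one."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(reps)]


means = sample_means(population, n=10, reps=5)
for i, xbar in enumerate(means, start=1):
    print(f"sample {i}: x-bar = {xbar:.1f}   (mu = {mu})")
```

Raising `reps` to a few thousand and plotting a histogram of `means` gives an empirical picture of the sampling distribution described next.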
With a large number of samples, you can assess whether the value of the statistic (e.g., the sample mean, $\bar{X}$) will frequently be close to the true value of the population parameter (e.g., the population mean, $\mu$), and if so, how close it is on average. When data are gathered by random sampling, the statistic is a random variable and therefore has a probability distribution. The probability distribution of a sample statistic is called its sampling distribution. In general, if we use a statistic to make an inference about a population parameter, we want its sampling distribution to be centered at the true parameter (this property lets us call the statistic unbiased), and we want the variability of the estimates to be as small as possible.
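The following simulation sketch makes "unbiased" and "less variable" concrete. It assumes a Normal population with mean 16 and SD 5 (the applet default used later in this lab). For illustration, it compares two estimators of the center: the sample mean and the sample median. For a symmetric population both are centered at the true mean, but the median's sampling distribution is wider, so the sample mean is the more precise estimator.

```python
import random
import statistics


def simulate(estimator, n=25, reps=10_000, mu=16.0, sigma=5.0, seed=1):
    """Apply `estimator` to `reps` samples of size n from a Normal(mu, sigma) population."""
    rng = random.Random(seed)
    return [estimator([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]


# Two unbiased estimators of the center of a symmetric population.
estimators = {"sample mean": statistics.fmean, "sample median": statistics.median}
results = {name: simulate(est) for name, est in estimators.items()}

for name, values in results.items():
    print(f"{name:13s}: center = {statistics.fmean(values):6.3f}, "
          f"SD = {statistics.stdev(values):.3f}")
```

Both centers should come out close to 16. The SD of the sample means should be close to $5/\sqrt{25} = 1$, and the sample medians should spread noticeably more.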

Consider two estimators that are both unbiased, where Estimator I has less variability (is more precise) than Estimator II. We would therefore prefer Estimator I to Estimator II. Next, we will examine the sampling distribution of the statistic most commonly used to measure the center of a distribution: the sample mean.

[Figure: Formula Card]

Warm-Up: Check Your Understanding

Suppose the time spent waiting to place an order at a drive-through window has a uniform distribution between 0 and 8 minutes. The mean waiting time is 4 minutes and the standard deviation is about 2.3 minutes. Suppose a random sample of 100 drive-through window orders placed over the past week will be selected, and the average waiting time for placing those orders will be calculated. Which graph below best represents the model that could be used to approximate the probability that the average waiting time is at most 5 minutes?

[Figure: answer-choice graphs]
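The CLT model for the warm-up can be worked out numerically. This sketch uses only what the problem states (Uniform(0, 8) waiting times, n = 100) and Python's standard `statistics.NormalDist`. The approximating model is a normal curve centered at 4 minutes with standard deviation $\sigma/\sqrt{n} \approx 0.23$ minutes, which is much narrower than the uniform parent.

```python
from math import sqrt
from statistics import NormalDist

a, b = 0.0, 8.0             # Uniform(0, 8) waiting time, in minutes
mu = (a + b) / 2            # population mean: 4 minutes
sigma = (b - a) / sqrt(12)  # population SD: about 2.31 minutes
n = 100                     # number of orders sampled

se = sigma / sqrt(n)        # SD of the sample mean: about 0.231 minutes
clt_model = NormalDist(mu, se)
p = clt_model.cdf(5.0)      # approximate P(x-bar <= 5)

print(f"x-bar is approx N(mean={mu}, sd={se:.3f}); P(x-bar <= 5) ~ {p:.5f}")
```

Because 5 minutes is more than four standard errors above 4, the approximate probability is essentially 1.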

ILP: How Sample Size and Distribution of Parent Population Affect the Sampling Distribution of the Sample Mean

In this activity, you will observe how the sample size and the distribution of the population you sample from affect the sampling distribution of the sample mean. The sampling distribution of the sample mean, $\bar{X}$, is the distribution of the sample mean values for all possible samples of the same size from the same population. Open the Sampling Distribution applet, which can be found at:
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
This applet will help you simulate sampling distributions for a variety of statistics and lets you vary both the sample size and the population from which the samples are taken. Read the applet instructions.

Press Begin to open the applet. When the applet starts, it displays a histogram of the normal distribution with mean 16 and standard deviation 5 as the default parent distribution. The Sampling Distribution applet offers several options:

The 1st histogram, the Parent Population histogram, is the population from which the samples will be drawn. You can select Normal, Uniform, or Skewed, or you can customize the distribution by selecting Custom and dragging the mouse over the plot. For now, keep the default N(16, 5) distribution as the parent population. When you are done with a particular simulation, click the Clear lower 3 button to clear the remaining histograms, and then select new settings for your next simulation.

The 2nd plot, the Sample Data histogram, displays a histogram of the sampled data. This histogram is initially blank. You can choose to draw Animated Sample, 5 Samples, 1,000 Samples, or 10,000 Samples from the parent population.

The 3rd and 4th histograms show the distribution of statistics computed from the sampled data. The label "Reps=" shows the number of samples (replications) on which the 3rd and 4th histograms are based; it appears once the simulation starts. You can also control which statistic to examine and the sample size by using the drop-down menus to the right of each plot. (Note that the applet uses N to denote sample size, whereas we generally use n.)
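If the applet is unavailable, its core loop can be approximated offline. The following is our own rough re-implementation, not the applet's code. The function names (`sampling_distribution`, `normal_parent`) are hypothetical, and the statistic table mirrors the menu options the applet offers: sd and Variance with N in the denominator, Variance (U) with N - 1, MAD, and Range. The sketch draws 10,000 samples of size N = 5 and N = 25 from N(16, 5) and summarizes the resulting sample means.

```python
import random
import statistics


def mad(xs):
    """Mean absolute value of the deviations from the sample mean."""
    m = statistics.fmean(xs)
    return statistics.fmean(abs(x - m) for x in xs)


# Our own re-implementation of the applet's statistic menu (hypothetical, not the applet's code).
STATISTICS = {
    "Mean": statistics.fmean,
    "Median": statistics.median,
    "sd": statistics.pstdev,              # N in the denominator
    "Variance": statistics.pvariance,     # N in the denominator
    "Variance (U)": statistics.variance,  # N - 1 in the denominator
    "MAD": mad,
    "Range": lambda xs: max(xs) - min(xs),
}


def sampling_distribution(draw, n, reps, stat="Mean", seed=0):
    """Draw `reps` samples of size n using `draw` and return the chosen statistic of each."""
    rng = random.Random(seed)
    f = STATISTICS[stat]
    return [f([draw(rng) for _ in range(n)]) for _ in range(reps)]


def normal_parent(rng):
    """Default parent population in the applet: N(16, 5)."""
    return rng.gauss(16, 5)


results = {}
for n in (5, 25):
    xbars = sampling_distribution(normal_parent, n=n, reps=10_000)
    results[n] = xbars
    print(f"N = {n:2d}: mean of sample means = {statistics.fmean(xbars):.2f}, "
          f"SD of sample means = {statistics.pstdev(xbars):.2f}")
```

To mimic changing the parent population or the statistic in the applet, pass a different sampling function in place of `normal_parent` or a different key as `stat`.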

The statistic options include:
Mean
Median
sd = standard deviation (uses N in the denominator)
Variance = variance of the sample (uses N in the denominator)
Variance (U) = unbiased estimate of variance (uses N - 1 in the denominator)
MAD = mean absolute value of the deviation from the mean
Range

Select Mean as the statistic in the 3rd histogram and a sample size of 5 (the default), then click Animated Sample to draw one sample of size n = 5 from the normal parent population. Five observations will appear in the 2nd histogram, and the sample mean of those five numbers will appear in the 3rd histogram as a blue rectangle. This shows graphically how the sample mean is obtained from one sample of size 5. Repeat this several times and you will see the sampling distribution of the sample mean start to form in the 3rd histogram. Once you have a feel for how this works, you can speed things up by choosing the larger sampling options: 5, 1,000, or 10,000 samples.

1. Select the Normal distribution as the parent population.
a. What are the mean and standard deviation of this population? Mean = 16.00, sd = 5.00
b. Select Mean (sample mean) as the statistic of interest in both the 3rd and 4th histograms, with sample size n = 5 for the 3rd histogram and n = 25 for the 4th. Do about 5 animated samples, and then take 10,000 samples at once. Draw rough sketches of each of the distributions of the sample means. Make sure to label both axes.
n = 5:
n = 25:
c. How do the distributions of each sample mean in the 3rd and 4th histograms compare with the parent population in the 1st histogram? Comment on shape, mean, standard deviation, etc.
d. Looking at the properties of the population and sample distributions (displayed to the left of their respective histograms), what can you say about the relationship between the standard deviation of the sample mean and the population standard deviation?
The standard deviation of the sample mean is smaller than the population standard deviation.

e. What can you say about the relationship between the sample size and the standard deviation of the sample mean? The standard deviation of the sample mean becomes smaller as the sample size increases.
f. Does the number of replications influence the shape of the sampling distribution? That is, as you take more samples, does the shape of the sampling distribution change significantly? No. Only the sample size n and the shape of the parent population influence the shape of the sampling distribution.

2. Clear the lower three graphs and then select the Skewed distribution as the parent population.
a. Select Mean (sample mean) as the statistic of interest in both the 3rd and 4th histograms, with sample size n = 5 for the 3rd histogram and n = 25 for the 4th. Do about 5 animated samples, and then take 10,000 samples at once. Draw rough sketches of each of the distributions of the sample means. Make sure to label both axes.
n = 5:
n = 25:
b. How do the distributions of each sample mean in the 3rd and 4th histograms compare with the parent population in the 1st histogram? Comment on shape, mean, standard deviation, etc.
n = 5: The distribution of the sample mean is more symmetric than the parent population but still slightly skewed right. Its mean is close to the population mean, and its standard deviation is smaller than that of the population.
n = 25: The distribution of the resulting sample mean values has a normal shape centered around the population mean of 8, and the sample means are more concentrated (less varied) around the population mean of 8.
c. How do the distributions of each sample mean in the 3rd and 4th histograms compare to each other? Comment on shape, mean, standard deviation, etc. The histogram for n = 25 is more symmetric and closer to normal than the histogram for n = 5, which is slightly skewed right. The means of the two distributions are essentially the same. The standard deviation for n = 25 is smaller than the standard deviation for n = 5.
d. How do the distributions of each sample mean in the 3rd and 4th histograms compare with the distributions of the sample mean you created when the parent population was normal (in question 1)? Comment on shape, mean, standard deviation, etc.

e. What should the standard deviation of the sample mean be if the population standard deviation is 6.22 and the sample size is n = 25? (Show the calculation.) How does this value compare to the standard deviation displayed to the left of the 4th histogram created above? The standard deviation of the sample mean should be $\sigma/\sqrt{n} = 6.22/\sqrt{25} = 6.22/5 \approx 1.24$.

3. Clear the lower three graphs and select the Custom distribution as the parent population. The parent population plot should be empty. To create a distribution, click and drag the mouse over different parts of the parent population graph until you've drawn a distribution you like. Try to make the distribution as unusual as possible!
a. Provide a rough sketch of your custom population. Be sure to note the mean and standard deviation. This will vary by student. Encourage students to create a unique distribution.
b. Select Mean (sample mean) as the statistic of interest in both the 3rd and 4th histograms, with sample size n = 5 for the 3rd histogram and n = 25 for the 4th. Do a few animated samples, and then take 10,000 samples at once. Draw rough sketches of each of the distributions of the sample means. Make sure to label both axes.
n = 5: Will vary
n = 25: Will vary
c. How do the distributions of each sample mean in the 3rd and 4th histograms compare with the parent population in the 1st histogram? Comment on shape, mean, standard deviation, etc.
d. How do the distributions of each sample mean in the 3rd and 4th histograms compare to each other? Comment on shape, mean, standard deviation, etc. The distribution of sample means for n = 25 is more concentrated around the population mean and more symmetric than the distribution for n = 5.
e. Considering the changes you observed from n = 5 to n = 25 in questions 2 and 3, what can you say about the shape of the distribution of the sample mean with respect to the sample size n?
The larger the sample size n, the closer the distribution of the sample mean is to a normal shape, and the narrower it is (the smaller its standard deviation).
f. What should the standard deviation of the sample mean be for samples of size n = 25 from your custom population? (Show the calculation.) How does this value compare to the standard deviation displayed to the left of the 4th histogram created above? According to the CLT, the standard deviation of the sample mean should equal $\sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation. Calculations will vary.
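The rule $\sigma/\sqrt{n}$ can also be checked by simulation for a skewed parent. This lab does not specify the applet's exact Skewed population, so the sketch assumes a stand-in: a gamma distribution whose mean (8) and standard deviation (6.22) match the values used in question 2. The simulated standard deviation of $\bar{X}$ for n = 25 should land near $6.22/\sqrt{25} \approx 1.24$.

```python
import math
import random
import statistics

# Hypothetical stand-in for the applet's "Skewed" parent: a gamma distribution
# with mean MU and SD SIGMA (mean = k*theta, variance = k*theta**2).
MU, SIGMA = 8.0, 6.22
SHAPE = (MU / SIGMA) ** 2   # gamma shape k
SCALE = SIGMA ** 2 / MU     # gamma scale theta


def sample_means(n, reps=10_000, seed=2):
    """Simulate `reps` sample means of size n from the gamma stand-in population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gammavariate(SHAPE, SCALE) for _ in range(n))
            for _ in range(reps)]


results = {n: sample_means(n) for n in (5, 25)}
for n, xbars in results.items():
    print(f"n = {n:2d}: mean of x-bar = {statistics.fmean(xbars):.3f}, "
          f"simulated SD of x-bar = {statistics.pstdev(xbars):.3f}, "
          f"sigma/sqrt(n) = {SIGMA / math.sqrt(n):.3f}")
```

The same check works for any custom population: replace the gamma draw with a draw from that population and use its SD as `SIGMA`.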

Cool-Down: Check Your Understanding about the CLT

1. If the parent population is NOT normally distributed but has mean $\mu$ and standard deviation $\sigma$, then for a large sample size the sample mean will have approximately a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. This result is known as the Central Limit Theorem.

2. Fill out the chart below to further summarize your findings about the sampling distribution of the sample mean based on the CLT.

Sample Settings                        | Will the sampling distribution of the sample mean be approximately normal?
n = 10, Parent Population Normal       | Yes
n = 10, Parent Population NOT Normal   | No
n = 50, Parent Population Normal       | Yes
n = 70, Parent Population NOT Normal   | Yes
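The chart reflects a rule of thumb rather than a sharp cutoff. One way to see why is to track the skewness of simulated sample means. For a normal parent, the skewness stays near 0 at every n; for a skewed parent, it shrinks as n grows. The sketch assumes a standard normal parent and, as a hypothetical non-normal parent, an exponential distribution with rate 1. For that exponential parent, the sample mean has theoretical skewness $2/\sqrt{n}$.

```python
import math
import random
import statistics


def skewness(xs):
    """Moment-based sample skewness: the average cubed z-score."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


def xbar_skewness(draw, n, reps=10_000, seed=3):
    """Skewness of `reps` simulated sample means of size n."""
    rng = random.Random(seed)
    xbars = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    return skewness(xbars)


def normal_parent(rng):
    return rng.gauss(0.0, 1.0)


def skewed_parent(rng):
    # Hypothetical non-normal parent: exponential with rate 1 (skewness 2).
    return rng.expovariate(1.0)


settings = [(10, "Normal", normal_parent), (10, "NOT Normal", skewed_parent),
            (50, "Normal", normal_parent), (70, "NOT Normal", skewed_parent)]

results = {}
for n, label, draw in settings:
    results[(n, label)] = xbar_skewness(draw, n)
    theory = 0.0 if label == "Normal" else 2 / math.sqrt(n)
    print(f"n = {n:2d}, parent {label:10s}: simulated skewness of x-bar = "
          f"{results[(n, label)]:+.2f} (theory {theory:.2f})")
```

The skewness for the skewed parent falls from about 0.63 at n = 10 to about 0.24 at n = 70. This is why a larger n is needed before the normal approximation is reasonable for a non-normal parent.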

Example Exam Question on Sampling Distribution of the Sample Mean

For a particular community, it is known that the mean amount of water used per home during October is 1250 gallons and the standard deviation is 325 gallons.

a. The distribution for amount of water used is skewed to the right. Sketch a skewed right distribution below and label both axes.
[Sketch: right-skewed density curve; vertical axis "density", horizontal axis "Amount of water (gallons)"]

b. For a promotional campaign, a radio station plans to randomly select 50 homes and pay their water bills for the month of October. Describe the approximate sampling distribution of the sample mean amount of water used for a random sample of 50 homes. Provide all features of the distribution.
The sample mean will have approximately a NORMAL distribution with a mean of 1250 gallons and a standard deviation of $\sigma_{\bar{X}} = 325/\sqrt{50} \approx 45.962$ gallons.

c. The radio station can afford to pay for a total of 67,000 gallons. What is the probability that the total number of gallons for a random sample of 50 homes will exceed 67,000 gallons? Hint: Think about how a total and an average are related.

$$P(\text{total} > 67{,}000 \text{ gallons}) = P\left(\frac{\text{total}}{50 \text{ homes}} > \frac{67{,}000 \text{ gallons}}{50 \text{ homes}}\right) = P(\bar{X} > 1340 \text{ gallons}) = P\left(Z > \frac{1340 - 1250}{45.962}\right) \approx P(Z > 1.96) = 1 - 0.975 = 0.025$$

The probability that the total number of gallons for a random sample of 50 homes will exceed 67,000 gallons is 0.025, or 2.5%.
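The arithmetic in parts b and c can be checked with Python's standard `statistics.NormalDist`. The sketch uses only the numbers given in the problem. The exact z-score is about 1.958, which the solution rounds to 1.96, so the unrounded tail probability differs from 0.025 only in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 1250, 325   # gallons per home in October
n = 50                  # homes in the random sample
budget_total = 67_000   # total gallons the station can pay for

se = sigma / sqrt(n)                  # SD of the sample mean, about 45.962 gallons
cutoff = budget_total / n             # total > 67,000 is the same event as x-bar > 1340
z = (cutoff - mu) / se                # about 1.958
p_exact = 1 - NormalDist().cdf(z)     # tail probability with the unrounded z
p_table = 1 - NormalDist().cdf(1.96)  # tail probability with z rounded to 1.96

print(f"SE = {se:.3f}, cutoff = {cutoff:.0f}, z = {z:.3f}")
print(f"P(total > 67,000) ~ {p_exact:.4f} (z rounded to 1.96: {p_table:.4f})")
```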