Unbeknownst to us, the entire population consists of 5 cloned sheep with ages 10, 11, 12, 13, 14 months.

Similar documents
MEASURES OF VARIATION

Chapter 7 Section 1 Homework Set A

Draft 1, Attempted 2014 FR Solutions, AP Statistics Exam

Chapter 7 Section 7.1: Inference for the Mean of a Population

MATH 10: Elementary Statistics and Probability Chapter 7: The Central Limit Theorem

Lecture 5 : The Poisson Distribution

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Lesson 17: Margin of Error When Estimating a Population Proportion

Chapter 1: Exploring Data

Section 1.3 Exercises (Solutions)

Normal Distribution. Definition A continuous random variable has a normal distribution if its probability density. f ( y ) = 1.

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4%

Descriptive statistics parameters: Measures of centrality

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Dongfeng Li. Autumn 2010

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes

Confidence Intervals for One Standard Deviation Using Standard Deviation

9. Sampling Distributions

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers

The Normal distribution

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Interpreting Data in Normal Distributions

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

The Math. P (x) = 5! = = 120.

Chicago Booth BUSINESS STATISTICS Final Exam Fall 2011

Example 1: Dear Abby. Stat Camp for the Full-time MBA Program

Chapter 4. Probability Distributions

Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods

Name: Date: Use the following to answer questions 3-4:

The Assumption(s) of Normality

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

Characteristics of Binomial Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions

AP STATISTICS (Warm-Up Exercises)

Week 3&4: Z tables and the Sampling Distribution of X

Classify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous

Confidence Intervals for the Difference Between Two Means

Getting Started with Statistics. Out of Control! ID: 10137

Mathematics (Project Maths Phase 1)

Summarizing and Displaying Categorical Data

Recall this chart that showed how most of our course would be organized:

Lecture 8. Confidence intervals and the central limit theorem

Chapter 4. Probability and Probability Distributions

Means, standard deviations and. and standard errors

Diagrams and Graphs of Statistical Data

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

7. Normal Distributions

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

Exact Confidence Intervals

AP Statistics Solutions to Packet 2

Walk the Line Written by: Maryann Huey Drake University

Lecture 1: Review and Exploratory Data Analysis (EDA)

Unit 31: One-Way ANOVA

Example: Find the expected value of the random variable X. X P(X)

Statistics 104: Section 6!

3: Summary Statistics

ax 2 by 2 cxy dx ey f 0 The Distance Formula The distance d between two points (x 1, y 1 ) and (x 2, y 2 ) is given by d (x 2 x 1 )

PRACTICE PROBLEMS FOR BIOSTATISTICS

Week 1. Exploratory Data Analysis

Chi Square Tests. Chapter Introduction

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Unit 8 Data Analysis and Probability

Basics of Statistics

Statistics 151 Practice Midterm 1 Mike Kowalski

Lecture 8 : Coordinate Geometry. The coordinate plane The points on a line can be referenced if we choose an origin and a unit of 20

Lecture 19: Chapter 8, Section 1 Sampling Distributions: Proportions

Example Research Scenarios

Lecture 2: Discrete Distributions, Normal Distributions. Chapter 1

Solutions to Homework 3 Statistics 302 Professor Larget

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

Practice#1(chapter1,2) Name

MAT 155. Key Concept. September 27, S5.5_3 Poisson Probability Distributions. Chapter 5 Probability Distributions

The Distance Formula and the Circle

7 CONTINUOUS PROBABILITY DISTRIBUTIONS

Key Concept. Density Curve

Basic Tools for Process Improvement

Chapter 7 - Practice Problems 1

Geostatistics Exploratory Analysis

Midterm Review Problems

STAT 350 Practice Final Exam Solution (Spring 2015)

5/31/ Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Results from the 2014 AP Statistics Exam. Jessica Utts, University of California, Irvine Chief Reader, AP Statistics

Name: Date: Use the following to answer questions 2-4:

MATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

AP Physics 1 and 2 Lab Investigations

LOGNORMAL MODEL FOR STOCK PRICES

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Chapter 23 Inferences About Means

Nonparametric statistics and model selection

Ch. 3.1 # 3, 4, 7, 30, 31, 32

Demographics of Atlanta, Georgia:

Transcription:

Activity #14: Sampling distributions and the Central Limit Theorem So far, this unit has focused on distributions of discrete and continuous random variables. In this activity, we ll investigate sampling distributions distributions of statistics. Scenario: We want to know the average age of all cloned sheep that exist right now. We don t know how many cloned sheep exist, but we are able to get samples of sheep delivered to us. Unbeknownst to us, the entire population consists of 5 cloned sheep with ages 10, 11, 12, 13, 14 months. 1. Using R, I input the ages of the sheep with the code: sheep <- c(10, 11, 12, 13, 14). I then calculated population parameters that we do not know in this scenario: μ = 12 and σ = 1.414. To estimate μ, the unknown average age of all cloned sheep, we decide to do the following: Take a sample of n sheep from the population Calculate the average from each sample Repeat this process many times and sketch a distribution of the averages we calculate from our samples Suppose we go through this process with a sample size of n=1 sheep. We first sample one sheep and then calculate it s average age. We then take another sample (possibly getting the same sheep) and calculate an average. a) What possible averages could we get? Sketch a dotplot of those averages: (n=1) 10 11 12 13 14 b) Suppose we sample n=1 sheep 25,000 times. What would the distribution of all those look like? c) The mean of all 25,000 should be: d) Suppose we decide to sample n = 2 sheep. How many different averages could we get from samples of two sheep? e) Sketch a dotplot of all possible averages we could get from n=2 sheep: (n=2) 10 11 12 13 14 f) Suppose we sample n=2 sheep 25,000 times. What would the distribution of all those look like? g) The mean of all 25,000 should be:

2. I simulated this process 25,000 times for sample sizes of n=1, 2, 3, 4, and 5 sheep. Fill-in-the-blanks: Sample Size Distribution of 25,000 Mean of sample means Std. Deviation of Probability of a usual event: Probability of an unusual event: n = 1 11.989 1.4197 n = 2 11.994 0.8678 n = 3 12.001 0.5774 n = 4 12.001 0.3526 n = 5 12.000 0.000 3. From these simulations, let s generalize. If we repeatedly take samples of size n from a population and calculate the mean of each sample: a) The expected value of our (i.e., the mean of our means) = b) The standard deviation of our is called the standard error. If we take a larger sample, the size of our standard error. DECREASES INCREASES c) If we take a larger sample, the probability of an unusual sample mean DECREASES INCREASES d) If we take a larger sample, the probability of an usual sample mean.... DECREASES INCREASES

If we repeatedly take samples of size n from a population with an unknown distribution and calculate the mean of each sample, The mean of the will equal the population mean The standard deviation of the (the standard error) shrinks as the sample size increases We still don t know what shape the distribution of will have (although, in this example, it looks like the distribution becomes unimodal and symmetric) 4. Suppose body temperatures for a population of interest follow a normal distribution with μ = 98.5 and σ = 0.75. a) Suppose we randomly select a single individual from this population. Use a computer (R, Wolfram Alpha, or the normal distribution applet*) to calculate: P( 97.75 < X < 99.25) = P( 1< Z < +1) = P( X < 98) = b) Suppose we randomly select a sample of n=100 individuals from this population. Circle the correct symbol. P( 97.75 < X < 99.25) < = > 0.683 P( X < 98) < = > 0.252 c) Sketch the distribution of sample averages we d get if we repeatedly sampled n=100 individuals. d) Now calculate the probabilities for a sample of n=100 individuals: P( 97.75 < X < 99.25)= P( X < 98) = e) To calculate those probabilities, we assumed the sampling distribution had what kind of shape? Applet: http://lock5stat.com/statkey/theoretical_distribution/theoretical_distribution.html#normal

To calculate the previous 2 probabilities, we needed to assume the sampling distribution was approximately normal. Is there a way we can know the shape of the distribution of? Scenario: Researchers collected data from 4,390 babies born to mothers in Georgia from 1980-1992. The birth weights of these babies approximated a normal distribution with a mean of 3156.3 grams and a standard deviation of 570.44. I had a computer randomly sample babies from this dataset and calculate the average weight for each sample. I repeated this process 10,000 times to plot the sampling distributions. Data: Adams MM, et. al. The relationship of interpregnancy interval to infant birthweight and length of gestation among low-risk women, Georgia. Paediatr Perinat Epidemiol. 1997 Jan;11 Suppl 1:48-62 5. Below, I ve pasted results from my computer simulations. Fill-in-the-blanks to see if these simulated sampling distributions agree with the theory we ve derived. Explain why the simulated results do not match the theory perfectly. Sample Size Sampling distribution Mean of sample means Mean Standard deviation of standard error 2 3161.37 398.382 16 3154.85 141.975 100 3156.30 3156.3 56.933 57.044 It looks like these sampling distributions are approximately normal, but that might be because the population distribution was approximately normal. What happens if we start with a population that is not normally distributed?

Scenario: The high school GPAs of 556 St. Ambrose freshmen in 2012 are displayed to the right. These GPAs are obviously not normally distributed (they have a negative skew). The mean is 3.27 with a standard deviation of 0.5527. I had a computer randomly sample GPAs from this dataset and calculate an average. I repeated this 10,000 times and graphed all 10,000 mean GPAs. 6. Fill-in-the-blanks. Do our theoretical results hold for populations that are not normally distributed? Sample Size Sampling distribution Mean of sample means Mean Standard deviation of standard error 2 3.273 0.391 16 3.269 3.27 0.1371 0.1382 100 3.270 3.27 0.0504 0.0553 7. Under what conditions does it appear as though the distribution of the sample mean will be approximately normal? 8. Use the following applet to predict the distribution of various sample statistics under various conditions: http:// www.onlinestatbook.com/stat_sim/sampling_dist/index.html.

Central Limit Theorem: If we repeatedly take samples of size n from a population with an unknown distribution and calculate the mean of each sample, The mean of the will equal the population mean The standard deviation of the (the standard error) shrinks as the sample size increases The sampling distribution of will approximate a normal distribution if: a) The population follows a normal distribution, or b) We repeatedly take large sample sizes (how large?) Scenario: A (hypothetical) statistics professor often continues lecturing after the class period should have ended. Let X = the amount of time the professor lectures after class should have ended. Suppose students recorded X each day for several years and found X has a mean of 5 minutes and a standard deviation of 1.8 minutes. 0 2 4 6 8 10 time 9. Suppose we sample 1, 5, or 25 class days at random. Calculate the following probabilities: Sample 1 day: Sample 5 days Sample 25 days μ = and σ = μ = and σ = μ = and σ = P( X < 5.5)= P( X < 5.5)= P( X < 5.5)= P( X > 7) = P( X > 7) = P( X > 7) = 10. Suppose we repeatedly sample 25 days and calculate the average time lecturing. What average represents the 10th percentile of this distribution? 11. Complete the following: If you sample one day, 0.95 = P ( X ) If you sample 100 days, 0.95 = P ( X ) 12. What sample size would we need in order for 0.95 = P( 4.5 X 5.5)