Population and sample; parameter and statistic. Sociology 360 Statistics for Sociologists I, Chapter 11: Sampling Distributions

The population is the entire group we are interested in. A parameter is a number describing a characteristic of the population. Parameters are usually unknown.

A sample is the part of the population we actually observe. A statistic is a number describing a characteristic of a sample. We often use a statistic to estimate an unknown population parameter.

Populations and samples: notation. Numerical summaries, written as population parameter vs. sample statistic:
Mean: \(\mu\) (population), \(\bar{x}\) (sample)
Standard deviation: \(\sigma\) (population), \(s\) (sample)
Proportion for a dichotomous categorical variable: \(p\) (population), \(\hat{p}\) (sample)

Question about notation: which symbol represents the mean distance run in a year by a sample of subscribers to Runner's World?
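To make the parameter/statistic distinction concrete, here is a minimal Python sketch using only the standard library. The population (the integers 1 to 100), the sample size of 10, and the cutoff of 75 that defines the 0/1 variable are all hypothetical choices for illustration. Note that \(\sigma\) divides by N (`statistics.pstdev`), while \(s\) divides by n - 1 (`statistics.stdev`).

```python
import random
import statistics

def parameters(values):
    """Population parameters: mean mu and standard deviation sigma (divides by N)."""
    return statistics.fmean(values), statistics.pstdev(values)

def statistics_of_sample(values):
    """Sample statistics: mean x-bar and standard deviation s (divides by n - 1)."""
    return statistics.fmean(values), statistics.stdev(values)

def proportion(flags):
    """Share of 1s in a 0/1 list: p for a population, p-hat for a sample."""
    return sum(flags) / len(flags)

# Hypothetical population of 100 cases with values 1..100 (illustration only).
population = list(range(1, 101))
mu, sigma = parameters(population)

# One random sample of n = 10 cases; another seed gives another x-bar and s.
rng = random.Random(42)
sample = rng.sample(population, 10)
x_bar, s = statistics_of_sample(sample)

# Hypothetical dichotomous variable: 1 if the value exceeds 75, else 0.
p = proportion([1 if v > 75 else 0 for v in population])
p_hat = proportion([1 if v > 75 else 0 for v in sample])

print(f"parameters: mu = {mu}, sigma = {sigma:.3f}, p = {p}")
print(f"statistics: x-bar = {x_bar}, s = {s:.3f}, p-hat = {p_hat}")
```

The parameters stay fixed; rerunning with a different seed changes only the statistics.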

Key question: what if we drew another sample? How closely does a sample reflect the population? How likely is it that a statistic estimated from a sample will be close to the population parameter? We can use statistical theory to answer this question if we used a probability sample to calculate the statistic.

[Figure: a population of states and a sample drawn from it; value shown: 7.332.]

The law of large numbers: as the number of randomly drawn observations (n) in a sample increases, the sample mean (\(\bar{x}\)) gets closer and closer to the population mean \(\mu\) (quantitative variable), and the sample proportion (\(\hat{p}\)) gets closer and closer to the population proportion \(p\) (categorical variable).

Distribution of \(\bar{x}\) (the sample mean): we take many random samples of a given size n from a population with mean \(\mu\) and standard deviation \(\sigma\). Some sample means will be above the population mean \(\mu\) and some will be below; together they make up the sampling distribution.

[Figure: histogram of some sample averages.]
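A quick simulation can illustrate the law of large numbers. This sketch assumes a hypothetical Uniform(0, 10) population (so \(\mu = 5\)) and a hypothetical 0/1 variable with \(p = 0.3\). It draws samples of increasing size and prints how far \(\bar{x}\) and \(\hat{p}\) land from the parameters. The distance tends to shrink as n grows, though a single run need not shrink at every step.

```python
import random

rng = random.Random(2024)  # fixed seed so the run is reproducible

MU = 5.0  # mean of a hypothetical Uniform(0, 10) population
P = 0.3   # hypothetical population proportion for a 0/1 variable

results = {}
for n in (10, 100, 1_000, 10_000, 100_000):
    x_bar = sum(rng.uniform(0, 10) for _ in range(n)) / n  # sample mean
    p_hat = sum(rng.random() < P for _ in range(n)) / n     # sample proportion
    results[n] = (x_bar, p_hat)
    print(f"n = {n:>6}: |x-bar - mu| = {abs(x_bar - MU):.4f}, "
          f"|p-hat - p| = {abs(p_hat - P):.4f}")
```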

Sampling distribution of \(\bar{x}\): a sampling distribution is a distribution of sample statistics. The sampling distribution of a statistic is the probability distribution of that statistic: it describes what happens to the statistic when we take all possible random samples of a fixed size n. Like other distributions, we can describe the center and the spread of sampling distributions.

Facts about the distribution of \(\bar{x}\) when sampling randomly from a given population: the mean, or center, of the sampling distribution of \(\bar{x}\) is equal to the population mean \(\mu\), and the standard deviation of the sampling distribution is \(\sigma/\sqrt{n}\), where n is the sample size.

[Figure: sampling distribution of \(\bar{x}\), centered at \(\mu\) with standard deviation \(\sigma/\sqrt{n}\).]

Why we use sampling distributions: we have data from sample surveys. How accurate are our estimates of the population parameters of interest? For example, how well would a sample of 5 states estimate the mean murder rate for all 50 states? A sample of 10 states? If we know the sampling distribution of a statistic, we can say how precise (close to the population parameter) the statistic is likely to be.

Sample size and the spread of sampling distributions:

[Figure: histograms of the means of 1000 samples of size 5, 10, and 15 (x-axis: mean, 0 to 15; y-axis: fraction).]

Size 5: mean = 7.4048, standard deviation = 1.7048
Size 10: mean = 7.3596, standard deviation = 1.1564
Size 15: mean = 7.3380, standard deviation = 0.8553

As n increases, the center stays near the same value while the standard deviation of the sample means shrinks.
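The murder-rate data behind these histograms are not included here, so the sketch below substitutes a hypothetical normal population with \(\mu = 10\) and \(\sigma = 4\). It repeats the slide's design (1000 samples each of size 5, 10, and 15) and compares the standard deviation of the simulated sample means with \(\sigma/\sqrt{n}\).

```python
import random
import statistics

rng = random.Random(7)
MU, SIGMA = 10.0, 4.0  # hypothetical normal population (not the states data)
REPS = 1000            # number of samples per sample size, as in the histograms

summary = {}
for n in (5, 10, 15):
    # Each entry is the mean of one random sample of size n.
    means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
             for _ in range(REPS)]
    summary[n] = (statistics.fmean(means), statistics.stdev(means), SIGMA / n ** 0.5)
    m, sd, theory = summary[n]
    print(f"n = {n:>2}: mean of means = {m:.3f}, "
          f"SD of means = {sd:.3f}, sigma/sqrt(n) = {theory:.3f}")
```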

Relationships of the statistics to the parameters: the mean of the sampling distribution of the sample mean is equal to the population mean; in symbols, \(\mu_{\bar{x}} = \mu\). Because the average value of \(\bar{x}\) over many samples is equal to \(\mu\), we say that \(\bar{x}\) is an unbiased estimator of \(\mu\). The standard deviation of the sampling distribution of \(\bar{x}\) is smaller than the standard deviation of the population if the sample size is larger than 1. Specifically, the standard deviation of \(\bar{x}\) is equal to \(\sigma/\sqrt{n}\). Taking n = 1 as one possibility gives \(\sigma/\sqrt{1} = \sigma\); for any larger n the standard deviation is smaller, so averages are less variable than individual observations.

Shape of the sampling distribution with a normal population: when a variable is normally distributed in its population, the sampling distribution of \(\bar{x}\) over all possible samples of size n is also normal.

Summary for normal populations: if the variable X is \(N(\mu, \sigma)\), then the sample mean \(\bar{x}\) is \(N(\mu, \sigma/\sqrt{n})\).

Problem: IQ scores. In a selected population of adults, IQ is normally distributed with mean 112 and standard deviation 20. Suppose 200 adults are randomly selected for a market research campaign. The distribution of the sample mean IQ is:
A) normal, mean 112, standard deviation 20.
B) normal, mean 112, standard deviation 1.414.
C) normal, mean 112, standard deviation 0.1.
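The IQ problem is a direct application of the summary for normal populations. A minimal sketch using the standard library's `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 112, 20, 200  # IQ problem values from the text

se = SIGMA / math.sqrt(N)        # standard deviation of the sampling distribution of x-bar
x_bar_dist = NormalDist(MU, se)  # normal population, so x-bar is exactly normal
print(f"x-bar ~ N({x_bar_dist.mean}, {x_bar_dist.stdev:.3f})")
```

Since \(20/\sqrt{200} \approx 1.414\), this matches choice B.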

Problem: IQ scores, continued. Suppose that we would be satisfied with a standard deviation of the mean of 5. How many individuals would we need to sample?

Practical notes: large samples are not always feasible, and not all variables are normally distributed. For example, income is strongly skewed to the right. Is \(\bar{x}\) still a good estimator of \(\mu\)? In large samples? In small samples?

The central limit theorem: when randomly sampling from any population with mean \(\mu\) and standard deviation \(\sigma\), the sampling distribution of \(\bar{x}\) is approximately normal, \(N(\mu, \sigma/\sqrt{n})\), when n is large enough.

[Figure: a population with a strongly skewed distribution, and the sampling distributions of \(\bar{x}\) for n = 2, 10, and 25 observations.]

Question about distributions and the CLT: if the first graph shows the population, which plot could be the sampling distribution of \(\bar{x}\) if all samples of size n = 50 are drawn?
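Two short computations relate to this section. First, setting \(\sigma/\sqrt{n} = 5\) with \(\sigma = 20\) gives \(\sqrt{n} = 4\), so n = 16 for the IQ follow-up; the hypothetical helper `required_n` rounds up to a whole person. Second, a simulation illustrates the central limit theorem. It assumes a hypothetical exponential population with mean 1 (strongly right-skewed, skewness 2) and reports the skewness of 5000 simulated sample means for n = 2, 10, 25, and 50. Skewness near 0 indicates a more symmetric, normal-looking sampling distribution; for this population, theory predicts \(2/\sqrt{n}\).

```python
import math
import random
import statistics

def required_n(sigma, target_se):
    """Smallest whole n with sigma / sqrt(n) <= target_se."""
    return math.ceil((sigma / target_se) ** 2)

n_iq = required_n(20, 5)  # IQ problem, continued

def skewness(values):
    """Mean cubed z-score: about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

# Hypothetical strongly right-skewed population: exponential with mean 1.
rng = random.Random(11)
clt = {}
for n in (2, 10, 25, 50):
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(5000)]
    clt[n] = skewness(means)
    print(f"n = {n:>2}: skewness of x-bar = {clt[n]:.2f} (theory {2 / math.sqrt(n):.2f})")

print(f"IQ sample size for a standard deviation of the mean of 5: {n_iq}")
```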

Another question about distributions and the CLT: waiting times at a customer service counter at a national department store follow a density curve with mean 5 minutes and standard deviation 5 minutes. [Figure: density curve of waiting times.] If we took all possible samples of size n = 100, how would you describe the sampling distribution of the sample means \(\bar{x}\)? Shape? Center? Spread?

Sampling distributions and normality: when the sample size is small, the sampling distribution of the mean will resemble the population distribution. As the sample size increases, the sampling distribution of the mean becomes more normal-shaped, regardless of the shape of the population distribution. A sample size of 25 is generally enough to obtain an approximately normal sampling distribution from a strongly skewed population, or even one with mild outliers. A sample size of 40 will typically be enough to overcome extreme skewness and outliers.

The three distributions to keep straight:
1. Distribution of a variable in the population: mean = \(\mu\); standard deviation = \(\sigma\). Units/cases = people, states, etc.
2. Distribution of a variable in a sample: mean = \(\bar{x}\); standard deviation = \(s\). Statistics estimate parameters. Units/cases = people, states, etc.
3. Distribution of a mean calculated from repeated samples: mean = \(\mu\); standard deviation = \(\sigma/\sqrt{n}\). This is the sampling distribution of \(\bar{x}\). Units/cases = samples.

Using the central limit theorem: in 1997, mean family income in the United States was $49,692 with a standard deviation of $39,802. What is the minimum sample size we should use, and why? Using this sample size, find the probability that the sample you draw will have a mean income above $60,000.
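The waiting-time and income questions both use the CLT approximation \(N(\mu, \sigma/\sqrt{n})\). For the waiting times, the spread is \(5/\sqrt{100} = 0.5\) minutes around a center of 5 minutes. For income, the text leaves the choice of sample size to the reader; the sketch below assumes n = 40, following the rule of thumb above for extreme skewness, and shows n = 25 for comparison. `prob_mean_above` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

# Waiting times: mean 5 min, SD 5 min, samples of n = 100.
# By the CLT the shape is approximately normal, centered at 5 minutes.
wait_se = 5 / math.sqrt(100)  # spread of the sampling distribution, in minutes

# Income (1997), values from the text.
MU, SIGMA = 49_692, 39_802

def prob_mean_above(threshold, n):
    """P(x-bar > threshold) under the CLT approximation N(mu, sigma / sqrt(n))."""
    return 1 - NormalDist(MU, SIGMA / math.sqrt(n)).cdf(threshold)

# n = 40 is an ASSUMED choice (rule of thumb for extreme skewness); 25 for comparison.
for n in (25, 40):
    print(f"n = {n}: P(x-bar > 60,000) = {prob_mean_above(60_000, n):.4f}")
```

Larger samples concentrate \(\bar{x}\) more tightly around \(\mu\), so the probability of a sample mean far above $49,692 drops as n grows.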

More problems: stocks. 1987 was a bad year for the stock market. Of 1815 stocks on the NYSE, the average return was 3.5% and the standard deviation was 26%. Stock returns were normally distributed.
1) What is the probability that a randomly selected stock lost more than 30% of its value in 1987?
2) What is the probability that a portfolio of 5 randomly chosen stocks lost more than 30% of its value?
3) Why do experts recommend larger portfolios as less risky?
4) If I randomly picked 5 stocks, what is the least I could have lost if I were in the bottom 5% of the returns distribution?

Concluding comments. Some things to know include:
- what a sampling distribution is and why sampling distributions are important
- the effect of sample size on the sampling distribution
- the center and variability of a sampling distribution
- how to think of a sampling distribution as a probability model
- the Law of Large Numbers and the Central Limit Theorem
- keeping straight the population, the sample, and the sampling distribution
- which parts of the expression \(N(\mu, \sigma/\sqrt{n})\) are true for any sampling distribution of \(\bar{x}\), and which are true only in certain situations

Review of important concepts: sampling distributions are theoretical distributions; they are the distribution of \(\bar{x}\) over all possible samples of size n. The spread of a sampling distribution depends on the number of cases over which you calculate the mean (the sample size n) as well as on the spread of the population, measured by \(\sigma\). When you calculate means over more cases (larger n), the variability of the sampling distribution decreases and the sample means fall closer and closer to the population mean (by the LLN). Because sampling distributions are theoretical distributions, for a given population they depend only on the number of cases used to calculate each mean. Their characteristics are not affected by the number of samples that might be drawn.
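Questions 1, 2, and 4 can be checked with `statistics.NormalDist`. This sketch treats a 5-stock portfolio's return as the equal-weighted mean of 5 randomly chosen stocks and, as an assumption, ignores the small effect of drawing 5 stocks without replacement from 1815, so the portfolio return is taken as \(N(3.5, 26/\sqrt{5})\), in percent.

```python
import math
from statistics import NormalDist

MU, SIGMA = 3.5, 26.0  # 1987 NYSE returns in percent, as given in the text

one_stock = NormalDist(MU, SIGMA)
# Assumption: portfolio return = mean of 5 independent random picks.
portfolio = NormalDist(MU, SIGMA / math.sqrt(5))

p1 = one_stock.cdf(-30)          # 1) a single stock lost more than 30%
p2 = portfolio.cdf(-30)          # 2) a 5-stock portfolio lost more than 30%
cutoff = portfolio.inv_cdf(0.05) # 4) 5th percentile of portfolio returns
print(f"1) {p1:.4f}  2) {p2:.5f}  4) 5th percentile return = {cutoff:.2f}%")
```

Comparing the first two probabilities is one way to approach question 3.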