Estimation; Sampling; The T distribution


I. Estimation

A. In most statistical studies, the population parameters are unknown and must be estimated. Therefore, developing methods for estimating as accurately as possible the values of population parameters is an important part of statistical analysis.

B. Estimators = random variables used to estimate population parameters.

C. Estimates = specific values of the population parameters.

1. Estimates need not have a single value; instead, the estimate could be a range of values.
2. Point estimate = estimate that specifies a single value of the population parameter.
3. Interval estimate = estimate that specifies a range of values.

D. Properties of a good estimator. Let θ (the Greek letter theta) = a population parameter, and let θ̂ = a sample estimate of that parameter. Desirable properties of θ̂ are:

1. Unbiasedness: The expected value of the estimator equals the true value of the parameter, that is, E(θ̂) = θ. For example, E(X̄) = µ and E(s²) = σ².

2. Efficiency: The most efficient estimator among a group of unbiased estimators is the one with the smallest variance. For example, both the sample mean and the sample median are unbiased estimators of the mean of a normally distributed variable; however, X̄ has the smaller variance.

3. Sufficiency: An estimator is said to be sufficient if it uses all the information about the population parameter that the sample can provide. The sample median is not sufficient, because it only uses information about the ranking of the observations. The sample mean is sufficient.

4. Consistency: An estimator is said to be consistent if it yields estimates that converge in probability to the population parameter being estimated as N becomes larger. That is, as N tends to infinity, E(θ̂) = θ and V(θ̂) tends to 0. For example, as N tends to infinity, V(X̄) = σ²/N tends to 0.
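These properties can be illustrated by simulation. The following is a minimal sketch, assuming Python with numpy is available; the population values (µ = 100, σ = 15) and the sample sizes are made up for illustration. It draws repeated samples from a normal population and shows that the sample mean and sample median are both roughly unbiased, that the mean is the more efficient of the two, and that the variance of the mean shrinks toward 0 as N grows, illustrating consistency.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 100, 15          # hypothetical population parameters
reps = 10_000                # number of repeated samples

for N in (10, 100, 1000):
    # Draw `reps` samples of size N from a normal population
    samples = rng.normal(mu, sigma, size=(reps, N))
    means = samples.mean(axis=1)
    medians = np.median(samples, axis=1)

    # Unbiasedness: both averages come out close to mu.
    # Efficiency: Var(mean) is smaller than Var(median); for normal data the
    #             median's variance is roughly pi/2 times larger.
    # Consistency: Var(mean) tracks sigma^2 / N and shrinks as N grows.
    print(f"N={N:5d}  E(mean)~{means.mean():7.2f}  E(median)~{medians.mean():7.2f}  "
          f"Var(mean)~{means.var():6.2f}  Var(median)~{medians.var():6.2f}  "
          f"sigma^2/N={sigma**2 / N:6.2f}")
```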

II. Sampling

A. Usually, we do not have data on the entire population. Hence, we must rely on observing a subset (or sample) of the population. Population parameters are estimated using the sample.

B. The sample may not necessarily represent the population, though.

1. Nonsampling errors - We can study the wrong population. For example, we might exclude those without phones, or we might rely on self-selection into the sample. For now, we will assume nonsampling errors are not present.
2. Sampling errors - Chance alone can cause the sample estimates to differ from the population parameters.

C. There are many different types of sampling schemes. Different types of samples, and the kinds of issues you have to be concerned with when drawing a sample, are discussed in much greater detail in the Research Methods class. For now, we will primarily concern ourselves with a specific type of sampling scheme, simple random sampling. With simple random sampling, each item in the population has an equal and identical chance of being included in the sample.

III. Sample statistics

A. Suppose we are planning to take a sample of N independent observations in order to determine the characteristics of some random variable X. Then X_1, X_2, ..., X_N represent the observations in this sample. The probability distribution of X_1 through X_N is identical to the distribution of the population random variable X. We want to use the sample values of X_1 through X_N to determine the characteristics of X. Sample statistics are used as estimates of population parameters. The population mean and variance are among the parameters we want to estimate using a sample.

B. The sample mean is defined as:

X̄ = (Σ X_i)/N = µ̂

C. The sample variance is:

s² = Σ(X_i - X̄)²/(N - 1) = σ̂²

D. Comments on formulas and notation.

1. Notation differs greatly from source to source and author to author. It is therefore good to become comfortable with a variety of notations.
2. Note the use of the ^ ("hat") in the notation for both the sample mean and the sample variance. It is very common to put a ^ over a population parameter to represent the corresponding sample estimate.
3. In the sample variance, note that the denominator uses N - 1 rather than N. It can be shown that using N - 1 rather than N produces an unbiased estimate of the population variance, that is, E(s²) = σ². Of course, when N is large, it doesn't make much difference whether you use N or N - 1.
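As a quick illustration of these formulas, the sketch below (again assuming Python with numpy; the data values and the population standard deviation of 3 are made up) computes X̄ and s² directly, confirms that numpy's ddof=1 option matches the N - 1 formula, and then shows by repeated sampling why N - 1 is used: averaged over many samples, the N - 1 version comes out close to σ², while the version that divides by N comes out too small.

```python
import numpy as np

# Hypothetical sample of N = 5 observations
x = np.array([4.0, 7.0, 6.0, 5.0, 8.0])
N = len(x)

xbar = x.sum() / N                          # sample mean
s2 = ((x - xbar) ** 2).sum() / (N - 1)      # sample variance, N - 1 denominator
print(xbar, s2, np.var(x, ddof=1))          # ddof=1 matches the N - 1 formula

# Why N - 1? Average both versions over many samples drawn from a
# population whose true variance is sigma^2 = 9.
rng = np.random.default_rng(1)
samples = rng.normal(0, 3, size=(100_000, N))
var_n1 = samples.var(axis=1, ddof=1).mean()   # ~ 9, unbiased
var_n = samples.var(axis=1, ddof=0).mean()    # ~ 9 * (N - 1)/N = 7.2, biased low
print(var_n1, var_n)
```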

IV. Sampling distributions

A. Note that different samples could yield different values for X̄. For example, if I took a random sample of exam scores, I might get a mean of 94; if I took a different random sample of the same size, I might get a somewhat different mean; and so on. That is, X̄ is itself a random variable, which has its own mean and variance. If we take all possible samples of size N and determine X̄ for each sample, the resulting distribution is the probability distribution for X̄. The probability distribution of X̄ is called a sampling distribution.

B. What are the mean and variance of the sampling distribution for X̄? That is, what are the mean and variance of X̄?

1. Mean of X̄:

E(X̄) = E[(X_1 + X_2 + ... + X_N)/N] = (1/N)(µ + µ + ... + µ) = Nµ/N = µ

That is, µ_X̄ = µ.

2. Variance of X̄:

V(X̄) = V[(X_1 + X_2 + ... + X_N)/N] = (1/N²)(σ² + σ² + ... + σ²) = Nσ²/N² = σ²/N

(Because the observations are independent, the variance of the sum is the sum of the variances, and V(aX) = a²V(X).) That is, σ²_X̄ = σ²/N.

3. Standard deviation of X̄:

SD(X̄) = σ_X̄ = σ/√N

The standard deviation of X̄ is referred to as the true standard error of the mean.

C. Question: We know the mean and variance of X̄, but what is the shape of its distribution?

1. If X ~ N(µ, σ²), then X̄ ~ N(µ, σ²/N), and

Z = (X̄ - µ)/σ_X̄ = (X̄ - µ)/(σ/√N) ~ N(0, 1)
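A minimal simulation sketch of these results (Python with numpy; the population values µ = 50, σ = 10 and the sample size N = 25 are made up for illustration): the simulated sample means average out to about µ, their standard deviation is about σ/√N, and the standardized values behave like an N(0, 1) variable.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, N, reps = 50, 10, 25, 100_000

# Sample means from `reps` independent samples of size N
xbars = rng.normal(mu, sigma, size=(reps, N)).mean(axis=1)

true_se = sigma / np.sqrt(N)                 # sigma_xbar = sigma / sqrt(N) = 2.0
print(xbars.mean(), xbars.std(), true_se)    # mean ~ 50, SD ~ 2.0

z = (xbars - mu) / true_se                   # Z transformation of the sample means
print(z.mean(), z.std())                     # ~ 0 and ~ 1, i.e. N(0, 1)
```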

2. What if X is not distributed normally? According to the Central Limit Theorem, regardless of the shape of the parent population (as long as it has a finite mean µ and variance σ²), the distribution of X̄ will approach N(µ, σ²/N) as the sample size approaches infinity. That is, as N --> ∞, X̄ ~ N(µ, σ²/N). In practice, N = 30 is usually a pretty good approximation of infinity!

Example: How X̄ comes to be normally distributed as N gets larger and larger.

As stated above, according to the Central Limit Theorem, regardless of the shape of the parent population (as long as it has a finite mean µ and variance σ²), the distribution of X̄ will approach N(µ, σ²/N) as the sample size approaches infinity. We will now give an example of this, showing how the sampling distribution of X̄ for the number of pips showing on a die changes as N goes from 1 to 2, 3, 10, and 1000. Note how, as N gets bigger and bigger, the distribution of X̄ gets more and more normal-like. Also, the true SE of the mean gets smaller and smaller, causing X̄ to vary less and less from the true mean of 3.5.

N = 1. When N = 1, X̄ and X have the same distribution. This distribution is clearly NOT normal. Rather, it has a uniform distribution: each possible value is equally likely. For example, you are just as likely to get a 1 as you are a 4, even though 4 is much closer to the mean (3.5) than 1 is. By way of contrast, with a normally distributed variable, values close to the mean are more likely than values farther away from the mean.

[SPSS output for N = 1 (frequency table and histogram, cases weighted to reflect the 6 equally likely outcomes): each value 1 through 6 has frequency 1 out of 6 (16.7%). Mean = 3.5, true SE = 1.708.]

N = 2. Notice how you are already getting big changes: a mean of 1 is now much less likely than a mean of 3.5 or 4. The distribution is not quite normal yet, but it certainly is not uniform anymore. As you see from the percentiles, the middle two-thirds of the distribution runs from 2.083 to 4.917, i.e. if we repeatedly drew samples of size 2, 2/3 of the time the sample mean would be between 2.083 and 4.917 (which in practice is 2.5 to 4.5 inclusive).

[SPSS output for N = 2 (frequency table and histogram): the 36 equally likely rolls give means of 1.0, 1.5, ..., 6.0 with frequencies 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1. Mean = 3.5, true SE = 1.208.]

N = 3. The distribution continues to get more and more normal-like, and values further from the mean continue to get less and less likely. The middle two-thirds now covers a smaller range, 2.667 through 4.333.

[SPSS output for N = 3 (frequency table and histogram): the 216 equally likely rolls give means of 1.0, 1.33, ..., 6.0 with frequencies 1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1. Mean = 3.5, true SE = 0.986.]

N = 10. This is very close to being a normal distribution. When N = 1, there is a 1/6 chance that the sample mean would be 1; when N = 10, there is less than 1 chance in 60 million of getting a sample mean that small. Now the middle two-thirds runs from 3 to 4. If you rolled a die 10 times, there is only a very small chance (about 0.3%) that you would get a mean of less than 2 or greater than 5.

[SPSS output for N = 10 (statistics, frequency table, and histogram over all 60,466,176 equally likely combinations, with means running from 1.0 to 6.0 in steps of 0.1): the histogram is nearly indistinguishable from a normal curve. Mean = 3.5, true SE = 0.540.]

N = 1000. I cannot generate enough data to illustrate this! (When N = 10, there are already more than 60 million possible combinations.) But if I could, the mean would be 3.5, and the true standard error would be 0.054. The middle two-thirds would run from about 3.446 to 3.554. If we took the average of 1000 dice rolls, there is only about a 0.3% chance that the average would not be between 3.34 and 3.66.
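The exact distributions summarized above can be reproduced without special software by convolving the single-die distribution with itself N times. The sketch below (Python with numpy, offered as an illustration rather than a reproduction of the original SPSS runs) prints, for each N, the mean, the true standard error, and the probability that the sample mean falls below 2 or above 5.

```python
import numpy as np

die = np.full(6, 1 / 6)                    # P(X = 1) = ... = P(X = 6) = 1/6

for N in (1, 2, 3, 10):
    # Exact distribution of the SUM of N dice via repeated convolution
    p_sum = np.array([1.0])
    for _ in range(N):
        p_sum = np.convolve(p_sum, die)
    means = np.arange(N, 6 * N + 1) / N    # possible values of the sample mean

    mu = (p_sum * means).sum()                          # 3.5 for every N
    se = np.sqrt((p_sum * (means - mu) ** 2).sum())     # 1.708 / sqrt(N)
    tail = p_sum[(means < 2) | (means > 5)].sum()       # P(mean < 2 or mean > 5)
    print(f"N={N:3d}  mean={mu:.3f}  true SE={se:.3f}  "
          f"P(mean < 2 or > 5)={tail:.4%}")
```

For N = 10 the tail probability printed is about 0.3%, matching the figure quoted above; the same loop could be extended to N = 1000 since the convolution, unlike full enumeration, stays small.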

V. The T distribution

Note that, in order to do a Z-score transformation, σ must be known. In reality, such a situation would be extremely rare. Much more common is the situation where both µ and σ are unknown. What happens if we do not want to (or cannot) assume a value for σ (i.e. σ is unknown)?

1. When σ is unknown, we substitute the sample standard deviation s for it; and instead of doing a Z transformation, which produces an N(0, 1) variable, we do a T transformation, which produces a variable with a T distribution. That is, when σ is unknown,

T = (X̄ - µ)/(s/√N) = (X̄ - µ)/est. σ_X̄ ~ T_(N-1)

Unfortunately, when s is substituted for σ, T is not normally distributed, nor is its variance 1. Since s is itself a random variable, there is more uncertainty about the true value of X̄; hence the variance of T is greater than 1.

NOTE: You want to be sure to distinguish between the following:

σ_X̄ = σ/√N = the true standard error of the mean
est. σ_X̄ = s/√N = the estimated standard error of the mean
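As a small computational sketch of the T transformation (Python with numpy; the six data values and the hypothesized µ = 100 are made up for illustration), this is how the estimated standard error and the T statistic would be computed from a sample when σ is unknown:

```python
import numpy as np

x = np.array([96.0, 104.0, 99.0, 110.0, 92.0, 101.0])   # hypothetical sample
mu0 = 100.0                                              # hypothesized population mean
N = len(x)

xbar = x.mean()
s = x.std(ddof=1)                 # sample standard deviation (N - 1 denominator)
est_se = s / np.sqrt(N)           # estimated standard error of the mean
t_stat = (xbar - mu0) / est_se    # T statistic with v = N - 1 = 5 degrees of freedom
print(xbar, s, est_se, t_stat)
```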

2. Characteristics of the T distribution:

- Shape is determined by one parameter, v = N - 1, the degrees of freedom. Why N - 1? If you know s and the values of N - 1 of the observations, you can determine the value of the Nth observation. Hence, only N - 1 observations are free to vary.

- E(T) = 0.

- The T distribution is symmetric.

- As N tends to infinity, T --> N(0, 1). A sample size in the low hundreds is a pretty good approximation of infinity, and even N = 30 isn't too bad. (Note: this is fairly logical, since the bigger the sample size, the more closely s approximates σ.)

- The following diagram (from Harnett) shows how a variable with a T distribution and 3 degrees of freedom compares to a variable with a standardized normal distribution. Note that values are much more spread out in the T-distributed variable, i.e. the tails are fatter. As v increases, the T distribution becomes more and more like a standardized normal distribution; once v is in the low hundreds there is very little difference, and even at v = 30 there is not much difference.

3. The T transformation is appropriate whenever the parent population is normally distributed and σ is unknown. Even if the parent population is not normally distributed, T will often work OK.

4. Tables for the T distribution. T is a little bit harder to work with than Z, because P(a ≤ T ≤ b) depends upon the degrees of freedom. For example, for v = 120, P(-1.98 ≤ T ≤ 1.98) = .95, but when v = 10, P(-2.228 ≤ T ≤ 2.228) = .95. Ergo, tables for the T distribution typically list critical values of T at specific levels of significance (e.g. α = .01, α = .05) for different values of v. See Appendix E, Table III (or Hays, p. 93) for tables for the T distribution. In Table III, the first column gives values for v. The first row lists significance values for what Hays labels as Q and 2Q. If you want to do a 1-tailed test, look at the line labeled Q; for a 2-tailed test, use the line labeled 2Q. Q = P(T_v ≥ t) = P(T_v ≤ -t), and 2Q = 1 - P(-t ≤ T_v ≤ t).

- For example, suppose N = 11 (i.e. v = 10) and we want to find t such that P(T ≥ t) = .05. Looking at the row labeled v = 10 and the column labeled Q = .05, we see that the critical value is 1.812. (Note that this is larger than the z value of 1.65 that we use when σ is known.) Note that this also means that F(1.812) = .95.

- Conversely, if we wanted to find t such that P(T ≤ t) = .05, the appropriate value would be -1.812 (meaning F(-1.812) = .05).

- Or, suppose v = 15, α = .01, and we want P(-a ≤ T ≤ a) = .99. We look at the row for v = 15 and the column for 2Q = .01, and we see that a = 2.947, i.e. P(-2.947 ≤ T_15 ≤ 2.947) = .99. (By way of contrast, when σ is known, the interval is -2.58 to 2.58.)

Other than always having to look up values in the table, working with the T distribution is pretty much the same as working with the N(0, 1) distribution.

EXAMPLES:

1. Find the values of t for the T distribution which satisfy each of the following conditions:
(a) the area between -t and t is .90 and v = 25
(b) the area to the left of t is .025 and v = 20

Solution.

(a) P(-t ≤ T_25 ≤ t) = .90 ==> 2Q = .10. Looking at Table III, Appendix E, at v = 25 and 2Q = .10, we see that t = 1.708.

(b) P(T ≤ t) = .025 = P(T ≥ -t) ==> t = -2.086. (Look at Q = .025, v = 20.)
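The table lookups in these examples can also be checked with software. A brief sketch using scipy's t and normal distributions (assuming scipy is available); t.ppf(p, v) returns the value with area p to its left:

```python
from scipy.stats import norm, t

print(t.ppf(0.95, 10))    # ~ 1.812  (Q = .05, v = 10; compare z = 1.645)
print(norm.ppf(0.95))     # ~ 1.645
print(t.ppf(0.995, 15))   # ~ 2.947  (two-tailed .99 interval, v = 15; compare z = 2.576)
print(t.ppf(0.95, 25))    # ~ 1.708  (example 1a: middle .90, v = 25)
print(t.ppf(0.025, 20))   # ~ -2.086 (example 1b: area .025 to the left, v = 20)
```

The printed values match the table figures used above (1.812, 2.947, 1.708, and -2.086).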