This week: Chapter 9 (will do 9.6 to 9.8 later, with Chap. 11) Understanding Sampling Distributions: Statistics as Random Variables


ANNOUNCEMENTS:
Shandong Min will give the lecture on Friday. See website for different office hours Fri, Mon, Tues.
New use of clickers: to test for understanding. I will give many more clicker questions, and randomly choose five to count for credit each week.
Homework from today and Friday is due Monday, Nov 8. Homework to be assigned Monday is not due.
Midterm in one week. You are allowed two sheets of notes.

HOMEWORK: Due Mon 11/8, Chapter 9: #15, 25, 37, 44

Chapters 9 to 13: Statistical Inference
See picture drawn on board.
Five situations we will cover for the rest of this quarter. For each parameter we will:
Learn how to find a confidence interval for its true value
Test hypotheses about its true value

EXAMPLES OF EACH OF THE 5 SITUATIONS

One proportion: Binomial situation with n and p.
Question: What proportion of households watched Dancing with the Stars the week of Oct 18? Get a confidence interval.
Population parameter: p = proportion of the population of all US households that watched it.
Nielsen ratings measure n = 25,000 households. X = number in sample who watched the show = 3,075.
$\hat{p} = X/n = 3{,}075/25{,}000 = .123$ = proportion of sample who watched. This is called "p-hat."

Difference in two proportions: Compare two population proportions using independent samples of size $n_1$ and $n_2$.
Question: What is the difference in the proportion of smokers who would quit if wearing a nicotine patch versus placebo? Get a confidence interval for the population difference. Test to see if it is statistically significantly different from 0.
Population parameter: $p_1 - p_2$ = population difference in proportions who would quit if everyone were to use each type of patch (nic. - plac.).
Difference in the proportions in the sample who did quit: $\hat{p}_1 - \hat{p}_2 = .46 - .20 = .26$. This is read as "p-one-hat minus p-two-hat."
Note that the parameter and statistic can range from -1 to +1.

One mean: Population mean for a quantitative variable.
Question: An airline would like to know the average weight of checked luggage per passenger, for fuel calculations. Get a confidence interval for the population mean. There is no logical value to test, so we would not do a test.
Population parameter: $\mu$ = mean weight of the luggage for the population of all passengers who check luggage.
Collect a sample of n observations. For instance, suppose they sample 100 passengers and find the mean is 30 pounds.
$\bar{x} = 30$ = the mean for the sample of 100 passengers. Remember this is read as "x-bar."

Mean for paired differences: Population mean for the difference in two quantitative measurements in a matched pairs situation.
Question: How much different on average would IQ be after listening to Mozart compared to after sitting in silence?
Population parameter: $\mu_d$ = population mean for the difference in IQ if everyone in the population were to listen to Mozart versus silence.
For the experiment done with n = 36 UCI students, the mean difference was 9 IQ points.
$\bar{d} = 9$ = the mean difference for the sample of 36 students. Read as "d-bar."

Difference in two means: Comparing two population means when independent samples of size $n_1$ and $n_2$ are available.
Question: What is the difference in mean IQ of 4-year-old children for the population of mothers who smoked during pregnancy and the population who did not? Get a confidence interval for the difference. Test to see if the difference is statistically significantly different from 0.
Population parameter: $\mu_1 - \mu_2$ = difference in the means for the two populations.
Based on a study done at Cornell, the difference in means for two samples was 9 IQ points.
$\bar{x}_1 - \bar{x}_2$ = difference in the means for the two samples = 9. Read as "x-bar-one minus x-bar-two."

GOAL: Estimate and test parameters based on statistics. Get confidence intervals and do hypothesis tests.

SOME LOGICAL NOTES:
1. Assuming the sample is representative of the population, the sample statistic should represent the population parameter fairly well. (Better for larger samples.)
2. But the sample statistic will have some error associated with it, i.e. it won't equal the parameter exactly. Recall the margin of error from Chapter 3!
3. If repeated samples are taken from the same population and the sample statistic is computed each time, these sample statistics will vary but in a predictable way, i.e. they will have a distribution. It is a pdf for the statistic. It is called a sampling distribution for the statistic.
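Note 3 can be illustrated with a small simulation. The sketch below is only an illustration, not part of the lecture: the population proportion `TRUE_P = 0.12`, the sample size `N = 500`, and the helper name `simulate_sample_proportions` are all assumptions chosen to keep the run fast. It draws many random samples from the same population and records the sample proportion each time.

```python
import random
import statistics


def simulate_sample_proportions(true_p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_samples):
        # Each unit in the sample has the trait with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats


# Hypothetical values, chosen only for illustration.
TRUE_P = 0.12
N = 500

p_hats = simulate_sample_proportions(TRUE_P, N, num_samples=1000)
print("mean of the sample proportions:", statistics.mean(p_hats))
print("s.d. of the sample proportions:", statistics.stdev(p_hats))
```

The individual proportions differ from sample to sample, but their average sits close to the assumed population proportion and their spread is small and stable, which is the "predictable way" the note refers to.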

RATIONALE AND DEFINITION FOR SAMPLING DISTRIBUTIONS

Rationale: Remember that a random variable is a number associated with the outcome of a random circumstance, which can change each time the random circumstance occurs. When a sample is taken from a population, the resulting numbers are the outcome of a random circumstance.

Dancing with the Stars example: A random circumstance is taking a random sample of 25,000 households with TVs. The resulting number is the proportion of those households that were watching Dancing with the Stars that week = .123 (or 12.3%).

Example: For each different sample of 25,000 households that week, we would have had a different sample proportion (sample statistic) watching the show.
Therefore, a sample statistic is a random variable.
Therefore, a sample statistic has a pdf associated with it.
The pdf of a sample statistic can be used to find the probability that the sample statistic will fall into specified intervals when a new sample is taken.

Definition: The pdf of a sample statistic is called the sampling distribution for that statistic.

Example: The random variable is $\hat{p}$ = sample proportion = sample statistic. The pdf of $\hat{p}$ will be defined next. It is the distribution of possible sample proportions in this scenario. We already know the pdf for X = number of households out of 25,000 that are watching the show. It is binomial with n = 25,000 and p = true proportion of households in US that watched.

Familiar example: Suppose 48% (p = 0.48) of a population supports a candidate. In a poll of 1000 randomly selected people, what do we expect to get for the sample proportion $\hat{p}$ who support the candidate in the poll?
In the last few lectures, we looked at the pdf for X = the number who support the candidate. X was binomial, and also X was approximately normal with mean = 480 and s.d. = 15.8.
Now let's look at the pdf for the proportion who do: $\hat{p} = X/n$, where X is a binomial random variable. We have seen a picture of possible values of X. Divide all values by n to get the picture for possible $\hat{p}$.
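The arithmetic behind the familiar example can be checked in a few lines. This is a minimal sketch using the poll numbers from the text (n = 1000, p = 0.48); the variable names are just illustrative. It computes the mean and standard deviation of X and then divides both by n to get the corresponding values for the sample proportion.

```python
import math

n = 1000   # poll size, from the example
p = 0.48   # true proportion supporting the candidate, from the example

# Binomial count X = number of supporters in the poll.
mean_x = n * p
sd_x = math.sqrt(n * p * (1 - p))

# Dividing every possible value of X by n rescales the mean and s.d. by 1/n.
mean_p_hat = mean_x / n
sd_p_hat = sd_x / n

print(f"X:     mean = {mean_x:.0f}, s.d. = {sd_x:.1f}")
print(f"p-hat: mean = {mean_p_hat:.2f}, s.d. = {sd_p_hat:.4f}")
```

Dividing by n changes only the scale of the horizontal axis, so the probabilities attached to each value stay the same.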

[Figure, left: PDF for X = number of successes. Plot of possible number who support candidate and probabilities; Binomial, n = 1000, p = 0.48. x-axis: values for number of successes X (number who support candidate), about 420 to 540; y-axis: probability for each possible value of X.]
[Figure, right: PDF for $\hat{p}$ = proportion of successes. Plot of possible proportion who support candidate, with probabilities; Binomial, n = 1000, p = 0.48. x-axis: values for proportion of successes p-hat, about .42 to .54; y-axis: probability for each possible value of p-hat.]

What's different and what's the same about these two pictures?
Everything is the same except the values on the x-axis!
On the left, values are numbers 0, 1, 2, ..., 1000.
On the right, values are proportions 0, 1/1000, 2/1000, ..., 1.

Recall the normal approximation for the binomial: For a binomial random variable X with parameters n and p (with np and n(1 - p) at least 5), X is approximately a normal random variable with:
mean $\mu = np$
standard deviation $\sigma = \sqrt{np(1-p)}$

NOW: Divide everything by n to get a similar result for $\hat{p} = X/n$.
$\hat{p}$ is approximately a normal random variable with:
mean $\mu = p$
standard deviation $\sigma = \text{s.d.}(\hat{p}) = \sqrt{p(1-p)/n}$
So, we can find probabilities that $\hat{p}$ will be in specific intervals if we know n and p.

The Sampling Distribution for a Sample Proportion $\hat{p}$

1. The physical situation: binomial.
Actual population with fixed proportion with a trait or opinion (e.g. polls, TV ratings, etc.)
OR
A repeatable situation with fixed probability of a certain outcome (e.g. birth is a boy, probability of heart attack if one takes aspirin)

2. The experiment:
Random sample of n from the population, $\hat{p}$ = proportion with the trait
OR
Repeat the situation n times, $\hat{p}$ = proportion with the specified outcome

3. Sample size requirement: In either case, must have np and n(1 - p) at least 5, prefer at least 10.

Assuming the above conditions are met, the distribution of possible values of $\hat{p}$ is approximately normal with:
mean $\mu = p$
standard deviation $\sigma = \sqrt{p(1-p)/n}$
The resulting normal distribution is called the sampling distribution of $\hat{p}$.

Notation: $\text{s.d.}(\hat{p})$ = standard deviation of $\hat{p}$ = $\sqrt{p(1-p)/n}$

But suppose p is unknown (which it will be if we are estimating it!). Then instead we approximate the s.d. using
$\text{s.e.}(\hat{p})$ = standard error of $\hat{p}$ = $\sqrt{\hat{p}(1-\hat{p})/n}$ = estimate of the standard deviation of $\hat{p}$
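The two formulas and the sample size requirement can be written as small helper functions. This is a sketch only: the function names `sd_p_hat`, `se_p_hat`, and `sample_size_ok` are made up for illustration, and the `minimum` argument defaults to the "at least 5" rule stated above. The Nielsen numbers (n = 25,000, p-hat = .123) come from the one-proportion example earlier in the lecture.

```python
import math


def sd_p_hat(p, n):
    """Standard deviation of p-hat; requires the true proportion p."""
    return math.sqrt(p * (1 - p) / n)


def se_p_hat(p_hat, n):
    """Standard error of p-hat; plugs the sample proportion in for the unknown p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def sample_size_ok(p, n, minimum=5):
    """Check that np and n(1 - p) are both at least `minimum` (5, or 10 preferred)."""
    return n * p >= minimum and n * (1 - p) >= minimum


# Nielsen example: p is unknown, so only the standard error can be computed.
nielsen_se = se_p_hat(0.123, 25_000)
print(f"s.e.(p-hat) for the Nielsen sample: {nielsen_se:.4f}")

# Poll example: p is assumed known (0.48), so the standard deviation is available.
print(f"s.d.(p-hat) for the poll: {sd_p_hat(0.48, 1000):.4f}")
print("poll meets the 'at least 10' preference:", sample_size_ok(0.48, 1000, minimum=10))
```

The only difference between the two functions is which proportion goes under the square root: the true p for the standard deviation, the observed p-hat for the standard error.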

This result is also called the normal curve approximation rule for sample proportions.

For the poll example: Poll of n = 1000 people, where the true population proportion p = 0.48. The distribution of possible values of $\hat{p}$ is approximately normal with
mean $\mu = p = 0.48$ and s.d. $\sigma = \sqrt{p(1-p)/n} = \sqrt{.48(1-.48)/1000} = 0.0158$

[Figure, actual (tiny rectangles): Plot of possible proportion who support candidate, with probabilities; Binomial, n = 1000, p = 0.48. x-axis: values for proportion of successes p-hat, about .42 to .54; y-axis: probability for each possible value of p-hat.]
[Figure, normal approximation (smooth): Approximate distribution of p-hat; Normal, Mean = 0.48, StDev = 0.0158. x-axis: possible values of p-hat, about 0.42 to 0.54; y-axis: density.]

For example, to find the probability that $\hat{p}$ is at least 0.50:
Could add up areas of rectangles from .501, .502, ..., 1.000, but that would be too much work!
$P(\hat{p} > 0.50) \approx P\left(z > \frac{0.50 - .48}{.0158}\right) = P(z > 1.267) = .103$

Going back to the Big Picture

The sampling distribution for $\hat{p}$ describes the distribution of possibilities for it if we were to take millions of samples of size n and compute $\hat{p}$ each time. It tells us what ranges we can expect $\hat{p}$ to fall in, and with what probability.

To find the sampling distribution, we would need to know the true value of the parameter p. In practice, we don't know the true value of the parameter p. In fact the whole point of statistical inference is to estimate the parameter, or test for possible values of it.

BUT, the standard deviation (or standard error) of the sampling distribution tells us how far the sample statistic is likely to fall from the parameter p, even if we don't know what that value of p is.

For example, in our poll of n = 1000, we know that the standard deviation of $\hat{p}$ is about .0158 (or .016). So, (from the Empirical Rule) we know that for approximately 68% of all samples $\hat{p}$ will be within one standard deviation = .016 of the true parameter p. We can use that to estimate p! For instance, if $\hat{p}$ is 0.45, we can be 68% certain that the true p is somewhere in the range of 0.45 ± .016, or between 0.434 and 0.466.
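Both calculations above can be reproduced with the Python standard library. The sketch below is one way to do it, not the method used in lecture: it uses `statistics.NormalDist` for the normal approximation, and, as an extra check that the lecture skips as "too much work", sums the exact binomial probabilities with a hypothetical helper `binom_log_pmf`. The poll numbers (n = 1000, p = 0.48, p-hat = 0.45) are from the text.

```python
import math
from statistics import NormalDist

n, p = 1000, 0.48
sd = math.sqrt(p * (1 - p) / n)          # s.d. of p-hat, about .0158

# Normal approximation: P(p-hat > 0.50) = P(z > (0.50 - p) / sd)
z = (0.50 - p) / sd
prob_normal = 1 - NormalDist().cdf(z)


def binom_log_pmf(k, n, p):
    """Log of the binomial probability P(X = k), computed with lgamma for stability."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


# Exact version: add up the rectangles for X = 501, 502, ..., 1000.
prob_exact = sum(math.exp(binom_log_pmf(k, n, p)) for k in range(501, n + 1))

print(f"z = {z:.3f}, normal approximation = {prob_normal:.3f}")
print(f"exact binomial sum = {prob_exact:.3f}")

# Rough 68% range for p when the observed p-hat is 0.45 (one s.d. either side).
p_hat = 0.45
low, high = p_hat - sd, p_hat + sd
print(f"68% range: {low:.3f} to {high:.3f}")
```

The exact sum and the normal approximation should land close to each other; they are not identical because the smooth curve only approximates the rectangles.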

PREPARING FOR THE REST OF CHAPTER 9

For all 5 situations we are considering, the sampling distribution of the sample statistic:
Is approximately normal
Has mean = the corresponding population parameter
Has standard deviation that involves the population parameter(s) and thus can't be known without it (them)
Has standard error that doesn't involve the population parameters and is used to estimate the standard deviation
Has standard deviation (and standard error) that get smaller as the sample size(s) n get larger
Summary table on pages 382-383 will help you with these!

New Example

In 2005, according to the Census Bureau, 67% of all children in the United States were living with 2 parents. (Includes step-parents and adoptive parents, but not foster parents.)
In our class, there are about 180 of you who participate in clicker questions. Are you a representative sample for this question? If so, what should we expect the class proportion to be?
n = 180, p = .67
The sampling distribution of $\hat{p}$ is approximately normal with
mean = .67 and standard deviation = $\sqrt{(.67)(.33)/180} = .035$

Clicker question (not for credit, answers anonymous)
In 2005, were you living with 2 parents? (Step-parents and adoptive parents count, but foster parents do not.)
A. Yes
B. No

[Figure: Sampling distribution of p-hat for n = 180, p = .67; Normal, Mean = 0.67, StDev = 0.035. x-axis: possible values of p-hat, 0.565 to 0.775; y-axis: density.]
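The numbers in the class example, including the axis values on the plot, can be recomputed as follows. This is a minimal sketch with illustrative variable names; it takes n = 180 and p = .67 from the text and lists the values at 1, 2, and 3 standard deviations on either side of the mean.

```python
import math

n = 180    # clicker participants, from the example
p = 0.67   # Census Bureau proportion, from the example

sd = math.sqrt(p * (1 - p) / n)
print(f"s.d. of p-hat = {sd:.3f}")

# Values at mean - 3 s.d., ..., mean, ..., mean + 3 s.d.
ticks = [round(p + k * sd, 3) for k in range(-3, 4)]
print(ticks)

# Empirical Rule: about 95% of samples give p-hat within 2 s.d. of p.
low_95, high_95 = p - 2 * sd, p + 2 * sd
print(f"about 95% of class-sized samples: {low_95:.2f} to {high_95:.2f}")
```

If the class behaves like a random sample of size 180 from that population, a class proportion far outside the 2 s.d. range would be surprising.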