
SAMPLING DISTRIBUTIONS

I. Populations, Parameters, and Statistics

1. So far the entire set of elementary events has been called the sample space, since this term is useful and current in probability theory. However, in many fields using statistics it is common to find the word population used to mean the totality of potential units for observation.

2. These potential units for observation are very often real or hypothetical sets of people, plants, or animals, and population provides a very appropriate alternative to sample space in such instances. Nevertheless, whenever the term population is used in the following, we shall mean only the sample space of elementary events from which the samples are drawn.

3. Given a population of potential observations, the particular numerical score assigned to any particular unit observation is a value of a random variable; the distribution of this random variable is the population distribution. This distribution will have some mathematical form, with a mean μ, a variance σ², and all the other characteristic features of any distribution.

4. If you like, you may think of the population distribution as a frequency distribution based on some large but finite number of cases. However, population distributions are almost always discussed as though they were theoretical probability distributions; random sampling of single units with replacement ensures that the long-run relative frequency of any value of the random variable is the same as the probability of that value.

5. Later we shall have occasion to idealize the population distribution and treat it as though the random variable were continuous. This is impossible for real-world observations, but we shall assume that it is "true enough" as an approximation to the population state of affairs.

6. Population values such as μ and σ² will be called parameters of the population. Strictly speaking, a parameter is a value entering as an arbitrary constant in the particular function rule for a probability distribution, although the term is used more loosely to mean any value summarizing the population distribution. Just as parameters are characteristic of populations, so are statistics associated with samples.
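Point 4 can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: it treats a small, hypothetical finite population of ten scores as a frequency distribution, draws single units at random with replacement, and compares the relative frequency of each value with its population proportion. The population, the seed, and the function names `population_probabilities` and `relative_frequencies` are made up for the example.

```python
import random
from collections import Counter

# Hypothetical finite population of numerical scores (assumed for illustration).
POPULATION = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

def population_probabilities(population):
    """Probability of each value = its proportion in the finite population."""
    counts = Counter(population)
    n = len(population)
    return {value: count / n for value, count in counts.items()}

def relative_frequencies(population, draws, seed=0):
    """Draw single units at random WITH replacement and return the relative
    frequency of each population value among the draws."""
    rng = random.Random(seed)
    sample = [rng.choice(population) for _ in range(draws)]
    counts = Counter(sample)
    return {value: counts[value] / draws for value in sorted(set(population))}

probs = population_probabilities(POPULATION)
for draws in (10, 1_000, 100_000):
    freqs = relative_frequencies(POPULATION, draws)
    # Largest gap between observed relative frequency and population probability
    max_gap = max(abs(freqs[v] - probs[v]) for v in probs)
    print(f"{draws:>7} draws: max |freq - prob| = {max_gap:.4f}")
```

As the number of draws grows, the largest discrepancy should shrink, which is the sense in which long-run relative frequency matches probability.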

7. There is no limit to the number of ways in which statistics can be constructed and associated with samples, even for samples as simple as binomial sequences. Perhaps not all of these statistics would be very useful, but we are perfectly free to define them. A statistic is simply a function on samples, such that any sample is paired with a value of that statistic. For samples of numerical data we ordinarily construct and use familiar statistics such as means, variances, medians, percentile ranks, and the like, because they happen to be simple and useful.

8. Moreover, a statistic need not use all of the information in a sample. Certainly the median, like the other percentiles, appears to be based on less information in a sample than the mean or the variance.

II. Sampling Distributions

1. In actual practice, random samples seldom consist of single observations. Almost always some N observations are drawn from the same population, and the value of some statistic is associated with the sample.
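To make point 7 concrete, here is a minimal Python sketch in which each statistic is literally a function from a sample to a value. It also illustrates point 8: changing only the largest observation moves the mean, the variance, and the range, but leaves the median untouched. The sample values and the names `STATISTICS` and `summarize` are hypothetical; `statistics.pvariance` is used because it divides by N, which matches the S² that Section VI describes as biased.

```python
from statistics import mean, median, pvariance

# A statistic is simply a function pairing each sample with a value.
# The dictionary name and its entries are illustrative choices.
STATISTICS = {
    "mean": mean,
    "median": median,
    "variance": pvariance,              # divisor N, like S² in these notes
    "range": lambda xs: max(xs) - min(xs),
}

def summarize(sample):
    """Apply every statistic in STATISTICS to one sample."""
    return {name: stat(sample) for name, stat in STATISTICS.items()}

original = [2, 4, 5, 7, 9]
altered = [2, 4, 5, 7, 90]   # only the largest observation changed

before = summarize(original)
after = summarize(altered)
for name in STATISTICS:
    print(f"{name:>8}: {before[name]} -> {after[name]}")
```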

2. Interest then lies in the distribution of values of this statistic across all possible samples of N observations from this population. Accordingly, we must distinguish still another kind of theoretical distribution, called a sampling distribution:

A sampling distribution is a theoretical probability distribution that shows the functional relation between the possible values of a given statistic based on a sample of N cases and the probability density associated with each value, for all possible samples of size N drawn from a particular population. (Hays, 1988, p. 192)

3. In general, the sampling distribution of values for a particular sample statistic will not be the same as the distribution of the random variable for the population. However, the sampling distribution always depends in some specifiable way upon the population distribution, provided the probability structure underlying the occurrence of samples is known.

4. Notice that this definition is not confined to simple random samples, even though in most applications it will be assumed that samples are drawn at random from the population.
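The definition can be applied literally when the population is small. The sketch below assumes a hypothetical population of three equally likely scores {1, 2, 3} and samples of N = 2 drawn independently with replacement, so every ordered sample is equally likely; it lists all possible samples and tallies the exact probability of each value of the sample mean. The population distribution is flat, but the sampling distribution of the mean is peaked at 2, which illustrates point 3: the two differ, yet one is fully determined by the other once the sampling scheme is known. The function name is illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def sampling_distribution_of_mean(population, n):
    """Exact sampling distribution of the sample mean for samples of size n
    drawn independently, with replacement, from an equiprobable finite
    population. Every ordered sample in population**n is equally likely."""
    total = len(population) ** n
    dist = defaultdict(Fraction)          # Fraction() == 0
    for sample in product(population, repeat=n):
        m = Fraction(sum(sample), n)      # exact sample mean
        dist[m] += Fraction(1, total)     # add this sample's probability
    return dict(sorted(dist.items()))

population = [1, 2, 3]   # hypothetical population, each unit equally likely
dist = sampling_distribution_of_mean(population, n=2)
for value, prob in dist.items():
    print(f"mean = {value}: P = {prob}")
```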

5. Nevertheless, some probability structure linking the occurrence of the possible samples with the population must exist and be known if the population distribution is to be related to the sampling distribution of any statistic.

6. For our elementary purposes this probability structure will be that of simple random sampling, in which each possible sample of size N has exactly the same probability of occurrence as any other. In more advanced work, however, assumptions other than simple random sampling are sometimes made.

7. Actually, we have already used sampling distributions: a binomial distribution, for example, is a sampling distribution. Recall that a binomial distribution is based on a two-category population distribution, or Bernoulli process.

8. A sample of N independent cases is drawn at random from such a distribution, and the number (or proportion) of successes is calculated for each sample. The binomial distribution is then the sampling distribution showing the relation between each possible sample result and its theoretical probability of occurrence.

9. The binomial distribution is not the same as the Bernoulli process unless N is 1; however, given the Bernoulli process and the sample size N, the binomial distribution may be worked out.
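Points 7 to 9 can be checked directly. This sketch assumes a hypothetical Bernoulli process with success probability p = 1/3 and N = 4; it builds the sampling distribution of the number of successes by enumerating all 2^N possible sequences of outcomes and confirms that the result matches the usual binomial rule C(N, k) p^k (1 − p)^(N − k). Exact fractions are used so the comparison is exact; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_by_enumeration(n, p):
    """Sampling distribution of the number of successes in n independent
    Bernoulli trials, found by listing every possible sequence of outcomes."""
    dist = {k: Fraction(0) for k in range(n + 1)}
    for sequence in product((0, 1), repeat=n):    # 1 = success, 0 = failure
        k = sum(sequence)
        dist[k] += p ** k * (1 - p) ** (n - k)    # probability of this sequence
    return dist

def binomial_formula(n, p):
    """The same distribution from the rule C(n, k) p^k (1 - p)^(n - k)."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

p = Fraction(1, 3)   # hypothetical success probability
n = 4
assert binomial_by_enumeration(n, p) == binomial_formula(n, p)
for k, prob in binomial_formula(n, p).items():
    print(f"{k} successes: P = {prob}")
```

Calling `binomial_by_enumeration(1, p)` returns just the two Bernoulli probabilities, which is the case N = 1 mentioned in point 9.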

10. Other examples of sampling distributions will now be given. One of the most important distributions we shall employ is the sampling distribution of the mean. Here, samples of N cases are drawn independently and at random from some population, and each observation is measured numerically. For each sample drawn, the sample mean is calculated. The theoretical distribution that relates the possible values of the sample mean to the probability (density) of each over all possible samples of size N is called the sampling distribution of the mean (Hays, 1988, p. 193).

11. Furthermore, for each sample of size N drawn, the sample variance S² may be found. The theoretical distribution of sample variances in relation to the probability of each is the sampling distribution of the variance. By the same token, the sampling distribution of any summary characteristic (mode, median, range, etc.) of samples of N cases may be found, given the population distribution and the sample size N.
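When enumeration is impractical, a sampling distribution can be approximated by simulation: draw many samples of size N, compute the statistic for each, and look at the resulting collection of values. The sketch below assumes a hypothetical normal population with μ = 50 and σ = 10, N = 25, and 10,000 simulated samples; these values and the helper names are illustrative, not part of the notes. It records the mean, S² (divisor N), and the median from the same samples. The average of the simulated means should be near μ and their variance near σ²/N = 4, while the average of the S² values should land near ((N − 1)/N)σ² = 96 rather than 100, anticipating the bias mentioned in Section VI.

```python
import random
import statistics

# Hypothetical population (an assumption for this sketch): normal, mu = 50, sigma = 10.
MU, SIGMA = 50.0, 10.0
N, REPS = 25, 10_000   # sample size and number of simulated samples

def s_squared(xs):
    """Sample variance S² with divisor N."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def simulate(statistics_to_track, n, reps, seed=1):
    """Draw `reps` independent samples of size n and record each statistic,
    giving an empirical approximation to each sampling distribution."""
    rng = random.Random(seed)
    results = {name: [] for name in statistics_to_track}
    for _ in range(reps):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        for name, stat in statistics_to_track.items():
            results[name].append(stat(sample))
    return results

dists = simulate({"mean": statistics.fmean, "S2": s_squared,
                  "median": statistics.median}, N, REPS)

print("average of sample means:    ", round(statistics.fmean(dists["mean"]), 2))
print("variance of sample means:   ", round(s_squared(dists["mean"]), 3),
      " vs sigma^2 / N =", SIGMA ** 2 / N)
print("average of sample variances:", round(statistics.fmean(dists["S2"]), 2),
      " vs sigma^2 =", SIGMA ** 2)
print("average of sample medians:  ", round(statistics.fmean(dists["median"]), 2))
```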

IV. Characteristics of Single-Variate Sampling Distributions

1. A sampling distribution is a theoretical probability distribution and, like any such distribution, is a statement of the functional relation between the values (or intervals of values) of some random variable and probabilities.

2. Sampling distributions differ from population distributions in that the random variable is always the value of some statistic based on a sample of N cases, such as the sample mean, sample variance, or sample median. Thus, a plot of a sampling distribution, such as figure 5.3.1 (p. 193), always has the different sample statistic values that might occur on the abscissa (horizontal axis).

3. Like population distributions, sampling distributions may be either continuous or discrete. The binomial distribution is discrete, although in applied problems it is sometimes treated as though it were continuous. Most of the commonly encountered sampling distributions based on a continuous population distribution will be continuous.
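One common way of treating the binomial as though it were continuous (point 3) is the normal approximation with a continuity correction: P(X ≤ k) is approximated by the area under a normal curve with mean Np and variance Np(1 − p) up to k + 0.5. The sketch below compares this with the exact discrete sum for a hypothetical case with N = 40 and p = 0.3; the approximation is generally reasonable only when Np and N(1 − p) are not too small. Function names and parameter values are assumptions for illustration.

```python
from math import comb, erf, sqrt

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p): a sum over discrete values."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Z <= x) for a normal distribution, computed via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def binomial_cdf_normal_approx(k, n, p):
    """Treat the binomial as continuous: a normal curve with the binomial's mean
    and variance, evaluated at k + 0.5 (the continuity correction)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mu, sigma)

n, p = 40, 0.3
for k in (8, 12, 16):
    exact = binomial_cdf(k, n, p)
    approx = binomial_cdf_normal_approx(k, n, p)
    print(f"k={k:2d}: exact={exact:.4f}  normal approx={approx:.4f}")
```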

V. Sample Statistics as Estimators

1. Some population parameters have obvious parallels in sample statistics. The population mean μ has its sample counterpart in X̄, the variance σ² in the sample variance S², the population proportion p in the sample proportion P, and so on.

2. More generally, a sample of cases drawn from a population contains information about the population distribution and its parameters, and a statistic computed from the data in the sample contains some of that information. Some statistics contain more information than others, and some statistics may contain more information about certain parameters than about others.

3. A central problem of inferential statistics is point estimation: the use of the value of some statistic to infer the value of a population parameter. The value of some statistic (a point in the "space" of all possible values) is taken as the "best estimate" of the value of some parameter of the population distribution.

4. How does one go from a sample statistic to an inference about the population parameter? In particular, which sample statistic should one use if it is to give an estimate that is in some sense "best"?
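As a small illustration of points 1 and 3, the sketch below computes the three point estimates X̄, S², and P from one hypothetical sample of N = 8 cases; the data are invented. S² is computed with divisor N, the convention under which Section VI calls it biased. The function name `point_estimates` is illustrative.

```python
from statistics import fmean

def point_estimates(scores, successes):
    """Point estimates of mu, sigma^2 and p from a single sample.
    `scores` are numerical observations; `successes` is a 0/1 record of a
    two-category (Bernoulli) variable for the same cases."""
    n = len(scores)
    x_bar = fmean(scores)                                  # estimates mu
    s_squared = sum((x - x_bar) ** 2 for x in scores) / n  # S², divisor N; estimates sigma^2
    p_hat = sum(successes) / len(successes)                # P; estimates p
    return x_bar, s_squared, p_hat

# Hypothetical sample of N = 8 cases (made-up numbers for illustration only).
scores = [12, 15, 11, 18, 14, 16, 13, 17]
passed = [1, 1, 0, 1, 1, 1, 0, 1]

x_bar, s_sq, p_hat = point_estimates(scores, passed)
print(f"X-bar = {x_bar}, S^2 = {s_sq}, P = {p_hat}")
```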

5. Because the sample represents only a small subset of observations drawn from a much larger set of potential observations, it is nearly impossible to say that any estimate is exactly equal to the population value. As a matter of fact, the two very probably will not be the same, since all sorts of factors of which we are ignorant may make the sample a poor representation of the population. Such factors we lump together under the general rubric of chance, or random effects.

6. In the long run such samples should reflect the population characteristics. However, practical action can seldom wait for "the long run"; things must be decided here and now in the face of limited evidence. We need to know how to use the available evidence in the best possible way to infer the characteristics of the population.

7. Various statistics differ in the information they provide about population parameters. They also differ in the extent to which this is "good" information that can be used to estimate the value of the parameter in question. We are now going to examine some statistics in terms of their properties as estimators.

VI. Desirable Properties of Estimators

1. Since there are many ways of devising a sample statistic to estimate a population parameter's value, several criteria are used for judging how effectively a given statistic serves this purpose. Some statistics have the desirable property of being the maximum-likelihood estimator of a population parameter. In addition, good estimators should be unbiased, consistent, and relatively efficient, and a set of estimators used for estimating a set of parameters should be sufficient.

2. An estimator is unbiased when the mean of its sampling distribution equals the parameter it estimates. As we shall see later, the sample mean X̄ is an unbiased estimator of the population mean μ. Furthermore, under binomial sampling, the sample proportion P is an unbiased estimator of the population proportion p.

3. On the other hand, the sample variance S² is an example of a biased estimator, since E(S²) is not, in general, equal to the population variance σ².

4. Another desirable property of an estimator is consistency. Roughly speaking, this means that the larger the sample size N, the higher the probability that the sample statistic comes close to the population parameter. Statistics that have this property are called consistent estimators.
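Unbiasedness is a statement about the mean of a sampling distribution, so for a tiny population it can be verified exactly. The sketch below assumes independent sampling with replacement from a hypothetical equiprobable population {1, 2, 3} with N = 2, enumerates every ordered sample, and computes expectations with exact fractions. It shows E(X̄) = μ, and E(S²) = ((N − 1)/N)σ² for S² with divisor N, so the bias disappears when the same sum of squares is divided by N − 1 instead. Applied to the whole population, the divisor-N function gives σ² itself. Helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def expectation(statistic, population, n):
    """Exact E(statistic) over all equally likely ordered samples of size n
    drawn with replacement from an equiprobable finite population."""
    samples = list(product(population, repeat=n))
    return sum(statistic(s) for s in samples) / Fraction(len(samples))

def sample_mean(xs):
    return Fraction(sum(xs), len(xs))

def sample_variance_S2(xs):
    """S² with divisor N, the convention under which S² is biased."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def sample_variance_unbiased(xs):
    """The same sum of squares divided by N - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

population = [1, 2, 3]                     # hypothetical population
mu = sample_mean(population)
sigma2 = sample_variance_S2(population)    # divisor = population size, i.e. sigma^2
n = 2

print("mu =", mu, "  E(X-bar) =", expectation(sample_mean, population, n))
print("sigma^2 =", sigma2, "  E(S^2) =", expectation(sample_variance_S2, population, n))
print("(N-1)/N * sigma^2 =", Fraction(n - 1, n) * sigma2)
print("E(SS / (N-1)) =", expectation(sample_variance_unbiased, population, n))
```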

5. The sample mean, the sample variance, and many other common statistics are consistent estimators, since they become more likely to lie close to the population value as the sample size increases.

6. A third criterion for choosing an estimator is called relative efficiency. When two or more estimators of the same parameter are compared, the more efficient one is the one with the smaller sampling variance. As we shall see, one of the reasons for preferring the mean to the median is that when the population is of a "normal" type, and both are unbiased estimates of the population mean, the mean is relatively more efficient than the median, given the same sample size N.

7. Still another concept of major importance in the modern theory of statistics is that of sufficiency. A statistic is sufficient for a parameter if it captures all of the information in the sample that is relevant to that parameter. That is, if our statistic is a sufficient statistic, our estimate of the parameter cannot be improved by considering any other aspect of the data not already included in the statistic itself.

8. When more than one parameter is required to specify a population distribution, two or more statistics may be required for sufficiency. In these instances one refers to a set of sufficient statistics, rather than to a single sufficient estimator.
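Consistency and relative efficiency can both be seen in a simulation. The sketch below assumes a standard normal population and estimates, for several sample sizes, the sampling variance of the mean and of the median from 3,000 simulated samples each; because each call restarts the generator with the same seed, both statistics are computed on the same samples. The variance of the mean should be close to σ²/N = 1/N and shrink as N grows (consistency), while the median's variance should be larger; for normal populations the ratio approaches π/2 ≈ 1.57 for large N. Sample sizes, replication count, and function names are illustrative choices.

```python
import random
import statistics

def sampling_variance(statistic, n, reps, seed=2):
    """Monte Carlo estimate of the sampling variance of `statistic` for samples
    of size n from a standard normal population (an assumption for this sketch)."""
    rng = random.Random(seed)
    values = [statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
    return statistics.pvariance(values)

REPS = 3_000
for n in (11, 41, 101):
    var_mean = sampling_variance(statistics.fmean, n, REPS)
    var_median = sampling_variance(statistics.median, n, REPS)
    print(f"N={n:4d}: var(mean)={var_mean:.5f}  var(median)={var_median:.5f}  "
          f"ratio={var_median / var_mean:.2f}")
```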

9. Sufficient statistics do not always exist, and situations can be constructed in which no sufficient set of estimators can be found for a set of parameters. Nevertheless, sets of sufficient estimators are important when they do exist, since if one can find a set of sufficient estimators, it is ordinarily possible to find unbiased and efficient estimators based on that sufficient set. In particular, when a set of sufficient statistics exists, the maximum-likelihood estimators will be based upon that set.
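Sufficiency is easiest to see for a Bernoulli process. The likelihood of an observed 0/1 sequence is p^k (1 − p)^(N − k), where k is the number of successes, so any two sequences with the same k have identical likelihood functions: k (equivalently P = k/N) is sufficient for p, and the maximum-likelihood estimate is a function of it, as point 9 says. The sketch below checks this numerically for two hypothetical sequences and finds the maximum-likelihood estimate by a crude grid search, chosen only for transparency rather than as a recommended method; the names are illustrative.

```python
def bernoulli_likelihood(p, sequence):
    """Likelihood of an observed 0/1 sequence under a Bernoulli process with
    success probability p: a factor p for each 1 and (1 - p) for each 0."""
    result = 1.0
    for outcome in sequence:
        result *= p if outcome == 1 else (1 - p)
    return result

def grid_mle(sequence, steps=1000):
    """Maximum-likelihood estimate of p by brute-force search over a grid."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: bernoulli_likelihood(p, sequence))

a = [1, 1, 0, 1, 0, 0, 1, 1]   # hypothetical samples with the same
b = [0, 0, 1, 1, 1, 1, 1, 0]   # number of successes (5 of 8)

# The likelihood depends on the data only through the count of successes...
same = all(abs(bernoulli_likelihood(p, a) - bernoulli_likelihood(p, b)) < 1e-15
           for p in (0.1, 0.3, 0.5, 0.9))
print("identical likelihood functions:", same)
# ...so the maximum-likelihood estimate is a function of that count: successes / N.
print("grid MLE:", grid_mle(a), "  successes / N:", sum(a) / len(a))
```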