SAMPLING DISTRIBUTIONS

I. Populations, Parameters, and Statistics

1. So far the entire set of elementary events has been called the sample space, since this term is useful and current in probability theory. However, in many fields that use statistics it is common to find the word population used to mean the totality of potential units for observation.

2. These potential units for observation are very often real or hypothetical sets of people, plants, or animals, and population provides a very appropriate alternative to sample space in such instances. Nevertheless, whenever the term population is used in the following, we shall mean only the sample space of elementary events from which the samples are drawn.

3. Given a population of potential observations, the particular numerical score assigned to any particular unit observation is a value of a random variable; the distribution of this random variable is the population distribution. This distribution will have some mathematical form, with a mean μ, a variance σ², and all the other characteristic features of any distribution.
4. If you like, you may think of the population distribution as a frequency distribution based on some large but finite number of cases. However, population distributions are almost always discussed as though they were theoretical probability distributions; the process of randomly sampling single units with replacement ensures that the long-run relative frequency of any value of the random variable is the same as the probability of that value.

5. Later we shall have occasion to idealize the population distribution and treat it as though the random variable were continuous. This is impossible for real-world observations, but we shall assume that it is "true enough" as an approximation to the population state of affairs.

6. Population values such as μ and σ² will be called parameters of the population. Strictly speaking, a parameter is a value entering as an arbitrary constant in the particular function rule for a probability distribution, although the term is used more loosely to mean any value summarizing the population distribution. Just as parameters are characteristic of populations, so are statistics associated with samples.
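The long-run relative-frequency idea in point 4 can be sketched with a short simulation. The population values, the seed, and the number of draws below are all illustrative assumptions, not anything given in the text:

```python
import random

# A small finite "population" of scores (made up for illustration).
# Sampling single units with replacement makes each draw an
# independent trial with fixed probabilities.
population = [1, 1, 2, 3, 3, 3, 4, 5]

rng = random.Random(0)  # fixed seed so the demonstration is repeatable
n_draws = 100_000
draws = [rng.choice(population) for _ in range(n_draws)]

# Long-run relative frequency of the value 3 ...
rel_freq = draws.count(3) / n_draws
# ... should approach its probability in the population: 3/8 = 0.375
prob = population.count(3) / len(population)

print(round(prob, 3), round(rel_freq, 3))
```

With enough draws the two printed numbers agree to a couple of decimal places, which is exactly the sense in which the frequency distribution behaves like a theoretical probability distribution.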
7. There is no limit to the number of ways in which statistics can be constructed and associated with samples, even for samples as simple as binomial sequences. Not all of these statistics would be very useful, perhaps, but we are perfectly free to define them. A statistic is simply a function on samples, such that any sample is paired with a value of that statistic. For samples of numerical data we ordinarily construct and use familiar statistics such as means, variances, medians, percentile ranks, and the like, because they happen to be simple and useful.

8. Moreover, a statistic need not use all of the information in a sample. Certainly the median, like the other percentiles, appears to be based on less information in a sample than is the mean or the variance.

II. Sampling Distributions

1. In actual practice, random samples seldom consist of single observations. Almost always some N observations are drawn from the same population. Furthermore, the value of some statistic is associated with the sample.
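The idea that a statistic is simply a function on samples, and that the median uses less of the sample's information than the mean, can be illustrated with a small made-up sample of scores:

```python
import statistics

# A statistic is a function from samples to numbers; any sample of
# numerical scores can be paired with a mean, median, range, etc.
sample = [4, 8, 6, 5, 3, 7, 9, 5]  # hypothetical scores

mean = statistics.mean(sample)        # uses every observation
median = statistics.median(sample)    # uses only the ordering/middle
sample_range = max(sample) - min(sample)

# Changing an extreme value changes the mean but not the median,
# illustrating that the median is based on less of the sample's
# information than the mean is.
perturbed = sample[:]
perturbed[perturbed.index(9)] = 90

print(mean, statistics.mean(perturbed))      # mean shifts: 5.875 -> 16.0
print(median, statistics.median(perturbed))  # median unchanged: 5.5
```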
2. Interest then lies in the distribution of values of this statistic across all possible samples of N observations from this population. Accordingly, we must distinguish still another kind of theoretical distribution, called a sampling distribution. A sampling distribution is a theoretical probability distribution that shows the functional relation between the possible values of a given statistic based on a sample of N cases and the probability (density) associated with each value, for all possible samples of size N drawn from a particular population. (Hays, 1988, p. 192)

3. In general, the sampling distribution of values for a particular sample statistic will not be the same as the distribution of the random variable for the population. However, the sampling distribution always depends in some specifiable way upon the population distribution, provided the probability structure underlying the occurrence of samples is known.

4. Notice that this definition is not confined to simple random samples, even though in most applications it will be assumed that samples are drawn at random from the population.
5. Nevertheless, some probability structure linking the occurrence of the possible samples with the population must exist and be known if the population distribution is to be related to the sampling distribution of any statistic.

6. For our elementary purposes this probability structure will be that of simple random sampling, in which each possible sample of size N has exactly the same probability of occurrence as any other. However, in more advanced work, assumptions other than simple random sampling are sometimes made.

7. Actually, we have already used sampling distributions. For example, a binomial distribution is a sampling distribution. Recall that a binomial distribution is based on a two-category population distribution, or Bernoulli process.

8. A sample of N independent cases is drawn at random from such a distribution, and the number (or proportion) of successes is calculated for each sample. Then the binomial distribution is the sampling distribution showing the relation between each possible sample result and the theoretical probability of its occurrence.

9. The binomial distribution is not the same as the Bernoulli process unless N is 1; however, given the Bernoulli process and the size of the sample N, the binomial distribution may be worked out.
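As a sketch of how the binomial distribution "may be worked out" from a Bernoulli process, take an illustrative success probability p and sample size N (both assumed values, not from the text):

```python
import math

# Illustrative Bernoulli process and sample size.
p, N = 0.3, 4

def binomial_pmf(r, N, p):
    """Probability of exactly r successes in N independent Bernoulli trials."""
    return math.comb(N, r) * p**r * (1 - p)**(N - r)

# The sampling distribution: each possible sample result r paired
# with its theoretical probability of occurrence.
dist = {r: binomial_pmf(r, N, p) for r in range(N + 1)}

# The probabilities over all possible sample results sum to 1.
print(round(sum(dist.values()), 10))
```

Note that for N = 1 the distribution reduces to the Bernoulli process itself (probabilities 1 - p and p), in line with point 9.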
10. Other examples of sampling distributions will now be given. A most important distribution we shall employ is the sampling distribution of the mean. Here, samples of N cases are drawn independently and at random from some population, and each observation is measured numerically. For each sample drawn, the sample mean is calculated. The theoretical distribution that relates the possible values of the sample mean to the probability (density) of each over all possible samples of size N is called the sampling distribution of the mean. (Hays, 1988, p. 193)

11. Furthermore, for each sample of size N drawn, the sample variance S² may be found. The theoretical distribution of sample variances in relation to the probability of each is the sampling distribution of the variance. By the same token, the sampling distribution of any summary characteristic (mode, median, range, etc.) of samples of N cases may be found, given the population distribution and the sample size N.
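The sampling distribution of the mean can be approximated by brute-force simulation: draw many random samples of size N and collect the sample means. The population (uniform scores 0 through 9), N, and the number of samples below are illustrative assumptions:

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for repeatability
N = 25
n_samples = 20_000

# Illustrative population: uniform on the scores 0..9,
# so μ = 4.5 and σ² = 8.25.
population = list(range(10))

# One sample mean per random sample of size N, drawn with replacement.
sample_means = [
    statistics.mean(rng.choices(population, k=N)) for _ in range(n_samples)
]

# The mean of the (approximate) sampling distribution sits near the
# population mean, and its variance is near σ²/N = 8.25/25 = 0.33.
print(round(statistics.mean(sample_means), 2))      # ≈ 4.5
print(round(statistics.variance(sample_means), 2))  # ≈ 0.33
```

The same loop with `statistics.variance`, `statistics.median`, or any other summary in place of `statistics.mean` approximates the sampling distribution of that statistic, in line with point 11.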
IV. Characteristics of Single-Variate Sampling Distributions

1. A sampling distribution is a theoretical probability distribution and, like any such distribution, is a statement of the functional relation between the values (or intervals of values) of some random variable and probabilities.

2. Sampling distributions differ from population distributions in that the random variable is always the value of some statistic based on a sample of N cases, such as the sample mean, sample variance, or sample median. Thus, a plot of a sampling distribution, such as figure 5.3.1 (p. 193), always has for the abscissa (or horizontal axis) the different sample statistic values that might occur.

3. Like population distributions, sampling distributions may be either continuous or discrete. The binomial distribution is discrete, although in applied problems it is sometimes treated as though it were continuous. Most of the commonly encountered sampling distributions based on a continuous population distribution will be continuous.
V. Sample Statistics as Estimators

1. Some population parameters have obvious parallels in sample statistics. The population mean μ has its sample counterpart in X̄, the variance σ² in the sample variance S², the population proportion p in the sample proportion P, and so on.

2. It is true, however, that a sample of cases drawn from a population contains information about the population distribution and its parameters. Furthermore, a statistic computed from the data in the sample contains some of that information. Some statistics contain more information than others, and some statistics may contain more information about certain parameters than about others.

3. A central problem of inferential statistics is point estimation, the use of the value of some statistic to infer the value of a population parameter. The value of some statistic (or point in the "space" of all possible values) is taken as the "best estimate" of the value of some parameter of the population distribution.

4. How does one go from a sample statistic to an inference about the population parameter? In particular, which sample statistic does one use, if it is to give an estimate that is in some sense "best"?
5. The fact that the sample represents only a small subset of observations drawn from a much larger set of potential observations makes it nearly impossible to say that any estimate is exactly like the population value. As a matter of fact, the two very probably will not be the same, as all sorts of different factors of which we are ignorant may make the sample a poor representation of the population. Such factors we lump together under the general rubric of chance or random effects.

6. In the long run such samples should reflect the population characteristics. However, practical action can seldom wait for "the long run"; things must be decided here and now in the face of limited evidence. We need to know how to use the available evidence in the best possible way to infer the characteristics of the population.

7. Various statistics differ in the information they provide about population parameters. They also differ in the extent to which this is "good" information that can be used to estimate the value of the parameter in question. We are now going to examine some statistics in terms of their properties as estimators.
VI. Desirable Properties of Estimators

1. Since there are many ways of devising a sample statistic for estimating a population parameter's value, several criteria are used for judging how effectively a given statistic serves this purpose. Some statistics have the desirable property of being the maximum-likelihood estimator of a population parameter. In addition, good estimators should be unbiased, consistent, and relatively efficient, and a set of statistics used for estimating a set of parameters should be sufficient.

2. As we shall see later, the sample mean X̄ is an unbiased estimator of the population mean μ. Furthermore, under binomial sampling, the sample proportion P is an unbiased estimator of the population proportion p.

3. On the other hand, the sample variance S² is an example of a biased estimator, since E(S²) is not, in general, equal to the population variance σ².

4. Another desirable property of an estimator is consistency. Roughly speaking, this means that the larger the sample size N, the higher the probability that the sample statistic comes close to the population parameter. Statistics that have this property are called consistent estimators.
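The bias of S² described in point 3 can be checked by simulation. Here S² is taken with divisor N, the usual convention when S² is called biased; the population, sample size, and seed are illustrative assumptions:

```python
import random
import statistics

rng = random.Random(2)  # fixed seed for repeatability
N = 5
# Illustrative population: uniform on 0..9, so σ² = 8.25.
population = list(range(10))

def s_squared(xs):
    """Sample variance with divisor N (a biased estimator of σ²)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Average S² over many random samples to approximate E(S²).
estimates = [
    s_squared(rng.choices(population, k=N)) for _ in range(50_000)
]

# E(S²) = σ²(N - 1)/N = 8.25 × 4/5 = 6.6, not 8.25: S² is biased.
print(round(statistics.mean(estimates), 1))
```

Dividing by N - 1 instead of N removes the bias, which is why the "unbiased sample variance" carries that divisor.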
5. The sample mean, the sample variance, and many other common statistics are consistent estimators, as they tend in likelihood to be closer to the population value as the sample size increases.

6. A third criterion for choosing an estimator is called relative efficiency. When looking at two or more estimators, the more efficient estimator has the smaller sampling variance. As we shall see, one of the reasons for preferring the mean to the median is that when the population is of a "normal" type, and both are unbiased estimates of the population mean, the mean is relatively more efficient than the median, given the same sample size N.

7. Still another concept of major importance in the modern theory of statistics is that of sufficiency. That is, if our statistic is a sufficient statistic, our estimate of the parameter cannot be improved by considering any other aspect of the data not already included in the statistic itself.

8. In some population distributions, more than one parameter may be required to specify the distribution, and then two or more statistics may be required for sufficiency. In these instances one refers to the set of sufficient statistics, rather than to a single sufficient estimator.
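The relative efficiency of the mean over the median for a normal population, noted in point 6, can likewise be checked by simulation (seed, N, and number of samples are illustrative):

```python
import random
import statistics

rng = random.Random(3)  # fixed seed for repeatability
N = 25
n_samples = 10_000

# Draw many samples from a standard normal population; both the
# sample mean and sample median estimate the population mean 0.
means, medians = [], []
for _ in range(n_samples):
    sample = [rng.gauss(0.0, 1.0) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# The mean has the smaller sampling variance, so it is the more
# efficient estimator here (for large N the ratio approaches 2/π).
print(statistics.variance(means) < statistics.variance(medians))  # True
```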
9. Sufficient statistics do not always even exist, and situations can be constructed in which no sufficient set of estimators can be found for a set of parameters. Nevertheless, sets of sufficient estimators, when they do exist, are important, since if one can find a set of sufficient estimators, then it is ordinarily possible to find unbiased and efficient estimators based on that sufficient set. In particular, when a set of sufficient statistics exists, the maximum-likelihood estimators will be based upon that set.