Chapter 7 Sampling Distributions

Every probability distribution is characterized by its parameter(s). For example, a normal distribution is determined by μ and σ, and a binomial distribution, for a given number of trials n, is determined by p. See pages 179 and 214. In practical situations, we must decide what type of probability distribution to use as a model. However, the parameters of that distribution are usually not known to us. In such situations, we obtain information about the values of the parameters by taking a sample from the population involved.

Examples: 1. The responses to a true/false question posed to a population are believed to fit a binomial distribution. In this case, the relative frequency of "true" answers in a sample chosen from the population gives information about the true value of the parameter p. 2. The heights of adult males in North America follow a normal distribution. In this case, the mean and standard deviation of a sample approximate the actual values of μ and σ.
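The two examples above can be sketched as a small simulation. This is only an illustration under stated assumptions: the "true" values TRUE_P, TRUE_MU and TRUE_SIGMA are hypothetical numbers invented for the sketch (they do not come from the text), and the sample size of 2000 is arbitrary. The point is that the sample proportion estimates p, and the sample mean and sample standard deviation estimate μ and σ.

```python
import random
import statistics

def estimate_p(sample):
    """Sample proportion of True answers: the estimate of the binomial p."""
    return sum(sample) / len(sample)

def estimate_mu_sigma(sample):
    """Sample mean and sample standard deviation: estimates of mu and sigma."""
    return statistics.mean(sample), statistics.stdev(sample)

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population values, chosen only for illustration.
TRUE_P, TRUE_MU, TRUE_SIGMA = 0.6, 175.0, 7.0

# Simulated samples: true/false answers and heights (in cm, by assumption).
answers = [rng.random() < TRUE_P for _ in range(2000)]
heights = [rng.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(2000)]

p_hat = estimate_p(answers)
x_bar, s = estimate_mu_sigma(heights)
print(f"p_hat = {p_hat:.3f}, x_bar = {x_bar:.1f}, s = {s:.1f}")
```

The estimates will not equal the assumed values exactly, and a different sample would give different estimates.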

SAMPLING As is now clear, sampling is important in determining the value(s) of the parameters involved. If you want the sample to provide reliable information about the population, you must select your sample in a certain way!

SIMPLE RANDOM SAMPLING The way a sample is selected is called the sampling plan or experimental design. It determines the amount of information you can extract and often allows you to measure the reliability of your inference. Simple random sampling is a method of sampling that gives each possible sample of size n from a population of size N an equal chance (probability) of being selected.

If the population size N is small, simple random sampling can be performed by assigning a number to each member of the population, writing the numbers on pieces of paper, mixing them, and drawing n of them. Example: You have a population of 4 objects and want to choose a sample of 2 from this population. How would you do a simple random sampling?
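One way to work through the 4-object example is to list every possible sample of size 2 and note that a simple random sample gives each of them the same chance. The sketch below assumes the objects are labeled A, B, C and D (hypothetical labels, not from the text) and uses Python's standard random.sample, which draws without replacement, in place of the slips of paper.

```python
from itertools import combinations
from fractions import Fraction
import random

population = ["A", "B", "C", "D"]  # hypothetical labels for the 4 objects
n = 2                              # sample size

# Every possible sample of size n (order does not matter).
all_samples = list(combinations(population, n))

# Under simple random sampling each possible sample is equally likely.
chance = Fraction(1, len(all_samples))

for sample in all_samples:
    print(sample, "probability", chance)

# Drawing one simple random sample, like drawing 2 slips from a hat.
chosen = random.sample(population, n)
print("selected:", chosen)
```

With N = 4 and n = 2 there are 6 possible samples, so each has probability 1/6 of being the one selected.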

EXAMPLE There are 89 students in a statistics class. The instructor wants to choose 5 students to form a project group. How should he proceed? 1. Give each student a number from 01 to 89. 2. Choose 5 pairs of random digits from the random number table. 3. If a pair is 00 or lies between 90 and 99, or repeats a number already chosen, discard it and choose another pair. 4. The five students with those numbers form the group. See Example 7.1.
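The four steps above can be sketched in code, with a pseudo-random generator standing in for the random number table. The function name pick_group and the seed are hypothetical choices made for this illustration. Skipping repeated numbers is an assumption of the sketch: it is needed to end up with five different students.

```python
import random

def pick_group(n_students=89, group_size=5, rng=random):
    """Read two-digit numbers until group_size distinct valid labels appear."""
    chosen = []
    while len(chosen) < group_size:
        pair = rng.randrange(100)          # 00..99, like two digits from a table
        if pair == 0 or pair > n_students:
            continue                       # 00 and 90-99 label no student
        if pair in chosen:
            continue                       # same student drawn twice: skip
        chosen.append(pair)
    return chosen

group = pick_group(rng=random.Random(2024))  # seed fixed only for repeatability
print([f"{k:02d}" for k in group])
```

Discarding invalid pairs, rather than mapping them onto valid labels, keeps every student equally likely to be chosen.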

TYPES OF SAMPLES Sampling can occur in two types of practical situations: 1. Observational studies: The data existed before you decided to study them. Individuals are observed or certain outcomes are measured, and no attempt is made to affect the outcome (for example, no treatment is given). In this case, computer databases make it possible to assign an ID number to each member of the population (even a very large one) and so to select a simple random sample.

Most sample surveys, where information is gathered by asking questions, fall into this category. In such situations, watch out for: Nonresponse: Are the responses biased because only opinionated people responded? Undercoverage: Are certain segments of the population systematically excluded? Wording bias: Is the question too complicated or poorly worded?

2. Experimentation: The data are generated by imposing an experimental condition or treatment on the experimental units. Hypothetical populations: A hypothetical population is a statistical population that has no real existence but is imagined to be generated by repetitions of events of a certain kind. Examples: all possible values of tomorrow's highest temperature; all possible pH values of some unknown liquid; all possible heights of men. Hypothetical populations can make random sampling difficult, if not impossible. Samples must sometimes be chosen so that the experimenter believes they are representative of the whole population.

Experiments vs. Observational Studies In an experiment investigators apply treatments to experimental units (people, animals, plots of land, etc.) and then proceed to observe the effect of the treatments on the experimental units.

In an observational study investigators observe subjects and measure variables of interest without assigning treatments to the subjects. The treatment that each subject receives is determined beyond the control of the investigator. For example, suppose we want to study the effect of smoking on lung capacity in women.

Experiment Find 100 women age 20 who do not currently smoke. Randomly assign 50 of the 100 women to the smoking treatment and the other 50 to the no smoking treatment. Those in the smoking group smoke a pack a day for 10 years while those in the control group remain smoke free for 10 years. Measure lung capacity for each of the 100 women.
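The random assignment step of the experiment above can be sketched as follows. This is a minimal illustration, assuming the 100 subjects are identified by the numbers 1 to 100. The helper name randomize and the seed are hypothetical.

```python
import random

def randomize(units, n_treatment, rng=random):
    """Shuffle the units; the first n_treatment form the treatment group."""
    shuffled = list(units)        # copy so the caller's list is not changed
    rng.shuffle(shuffled)
    return shuffled[:n_treatment], shuffled[n_treatment:]

subjects = list(range(1, 101))    # assumed ID numbers for the 100 women
treatment, control = randomize(subjects, 50, random.Random(1))

print("treatment group size:", len(treatment))
print("control group size:  ", len(control))
```

Because chance alone decides who lands in each group, the investigator does not control which subject gets which treatment, which is what separates the experiment from the observational study.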

Observational Study Find 100 women age 30, of whom 50 have been smoking a pack a day for 10 years while the other 50 have been smoke free for 10 years. Measure lung capacity for each of the 100 women.

OTHER SAMPLING PLANS There are several other sampling plans that still involve randomization: 1. Stratified random sample: Divide the population into subpopulations or strata and select a simple random sample from each stratum. 2. Cluster sample: Divide the population into subgroups called clusters; select a simple random sample of clusters and take a census of every element in each selected cluster. 3. 1-in-k systematic sample: Randomly select one of the first k elements in an ordered population, and then select every k-th element thereafter.

EXAMPLES 1. Divide California into counties and take a simple random sample within each county. (Stratified) 2. Divide California into counties and take a simple random sample of 10 counties. (Cluster) 3. Divide a city into city blocks, choose a simple random sample of 10 city blocks, and interview all who live there. (Cluster) 4. Choose an entry at random from the phone book, and select every 50th number thereafter. (1-in-50 Systematic)
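The three randomized plans can be compared side by side in a short sketch. Everything in the data is made up for illustration: six hypothetical groups of 20 units each stand in for counties or city blocks, and a list of 1000 numbered entries stands in for the phone book. The function names are likewise hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, rng=random):
    """Simple random sample of n_per_stratum elements from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng=random):
    """Simple random sample of clusters, then a census of each chosen cluster."""
    picked = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in picked}

def systematic_sample(ordered, k, rng=random):
    """Random start among the first k elements, then every k-th element."""
    start = rng.randrange(k)
    return ordered[start::k]

rng = random.Random(42)  # seed fixed only for repeatability

# Hypothetical population: 6 groups with 20 units each.
groups = {f"group{g}": [f"g{g}-unit{u}" for u in range(20)]
          for g in range(1, 7)}
phone_book = list(range(1, 1001))  # hypothetical ordered list of entries

stratified = stratified_sample(groups, 3, rng)   # 3 units from every group
clustered = cluster_sample(groups, 2, rng)       # all units of 2 groups
systematic = systematic_sample(phone_book, 50, rng)

print({name: len(units) for name, units in stratified.items()})
print({name: len(units) for name, units in clustered.items()})
print(systematic[:3], "...", len(systematic), "entries")
```

The stratified plan touches every group but only a few units in each, the cluster plan touches only a few groups but every unit in them, and the systematic plan needs a single random choice (the starting entry).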

NON-RANDOM SAMPLING PLANS There are several other sampling plans that do not involve randomization. They should NOT be used for statistical inference! 1. Convenience sample: A sample that can be taken easily without random selection (for example, people walking by on the street). 2. Judgment sample: The sampler decides who will and won't be included in the sample. 3. Quota sample: The makeup of the sample must reflect the makeup of the population on some selected characteristic (for example, race, ethnic origin, or gender).