# Sampling Distributions and the Central Limit Theorem

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 135 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics Chapter 10 Sampling Distributions and the Central Limit Theorem In the previous chapter we explained the differences between sample, population and sampling distributions and we showed how a sampling distribution can be constructed by repeatedly taking random samples of a given size from a population. A quick look back at Table 9-8 shows the outcome of the process. To summarize, when all possible samples of N = 2 are taken from a population with the values 5, 6, 7, 8 and 9, a total of 25 different samples with 9 different means can be observed. When the sample size is increased to 3, 125 different samples with 13 different means are produced. If we were to take samples of N = 4 from this same population of values, we would find that 625 distinct combinations of 4 numbers could be observed. It is quite obvious that constructing a sampling distribution by the method we used in the preceding chapter becomes extremely laborious as soon as sample size and population size are larger than 10. In fact, the number of different samples (S) of any given size (N) taken from a population of size (P) when we sample with replacement can be expressed by the following equation: For populations of 5 units (P = 5) like our example, this yields the number of samples that were illustrated in Chapter 9: For N = 2, S = 52 = 25

2 136 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics For N = 3, S = 53 = 125 For N = 4, S = 54 = 625 Imagine constructing a sampling distribution by drawing samples of N = 5 students out of a population of 50 students in an undergraduate communication research methods class. If we sample with replacement, the number of different samples is 50 raised to the power 5, or 50 x 50 x 50 x 50 x 50 or 312,500,000 different samples. Computing the mean, sampling variance and standard error of such a sampling distribution would be a monumental assignment. Furthermore, determining the probabilities associated with each of the various sample means would be an equally enormous task. But we need these probabilities to make a valid statistical generalization from a random sample. We obviously need to use another approach to obtaining these probabilities. Fortunately such an approach exists. The Central Limit Theorem The Central Limit Theorem provides us with a shortcut to the information required for constructing a sampling distribution. By applying the Theorem we can obtain the descriptive values for a sampling distribution (usually, the mean and the standard error, which is computed from the sampling variance) and we can also obtain probabilities associated with any of the sample means in the sampling distribution. The mathematics which prove the Central Limit Theorem are beyond the scope of this book, so we will not discuss them here. Instead we will focus on its major principles. They are summarized below: If we randomly select all samples of some size N out of a population with some mean M and some variance of ó 2, then The mean of the sample means ( ) will equal M, the population mean; The sampling variance will be here (the population variance divided by N, the sample size). The standard error will equal the square root of the sampling variance; The sampling distribution of sample means will more closely approximate the Normal Distribution as N increases. We ll discuss each of these points separately in the following sections of this chapter. But before we do, let us be sure to emphasize the three assumptions of the Central Limit Theorem: we know what size sample (N) we would like to draw from the population, and, more importantly, that we know the two population parameters M and ó 2. The Mean of the Sampling Distribution of Means: Parameter Known According to the Central Limit Theorem, the mean of the sampling distribution of means is equal to the population mean. We have already observed this in the examples given in the previous chapter. Our population, consisting of the values 5, 6, 7, 8 and 9, has a mean of 7. When we took all samples of N = 2 or N = 3 out of this population, the mean of all the resulting sample means ( ) in the two sampling distributions were both equal to 7. Therefore, if we know the parameter mean, we can set the mean of the sampling distribution equal to M. This allows us to avoid two massively difficult steps: (1) calculating sample means for all possible samples that can be drawn from the population and (2) calculating the sampling distribution mean from this mass of sample means. The Variance of the Sampling Distribution of Means: Parameter Known According to the Theorem, the variance of the sampling distribution of means equals the population variance divided by N, the sample size. The population variance (ó 2 ) and the size of the samples (N) drawn from that population have been identified in the preceding chapter as the two key factors which influence the variability of the sample means. As we saw in the examples in that chapter, the larger the variance of the values in the population, the greater the range of values that the sample means can take on. We also saw that the sample size was inversely related to the variability of

3 137 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics sample means: the greater the sample size, the narrower the range of sample means. The effect of both factors is thus captured by computing the value of the sampling variance as ó 2 /N. If we know the variance of the population as well as the sample size, we can determine the sampling variance and the standard error. This aspect of the theorem can be illustrated by using our running example. As you can see in Table 10-1, the variance of the population equals Applying the Central Limit Theorem to sample sizes of N = 2 and N = 3 yields the sampling variances and standard errors shown in Table For N = 2 and N = 3, these are exactly the same values for the sampling variance and standard error as were computed from the full set of sample means shown in Table 9-5. Table 10-1 also shows the sampling variance and standard error for a sampling distribution based on a sample size of N = 4 drawn from the same population. Had we calculated these values from the set of 625 sample means, we would have obtained exactly the same results for the variance and standard error. But what do we do when the population parameters are unknown? For example, assume that we are interested in studying the population of newly married couples. Specifically, we are interested in the amount of time they spend talking to each other each week about their relationship. It is highly unlikely that any parameters for this population would be available. As we have already mentioned several times, the absence of known parameters is very common in communication research. How are we to proceed under these conditions? In the absence of known parameters we will have to make do with reliable estimates of these parameters. Such reliable estimates can be obtained when we take random samples of sufficient size from the population. Suppose that we draw a random sample of N = 400 from this population. After measuring the amount of time these newlyweds spend talking to one another about their relationship we observe the mean to be 2 hours per week and the sample standard deviation is 1 hour per week. We will use this information to estimate the mean, the variance and the standard error or the sampling distribution. The Mean of the Sampling Distribution of Means: Parameter Unknown Since we have only a single sample mean, we can t compute the mean of the means. But we can make a simple assumption, based on probability, that will allow us to work from the results of this single sample. We know that the most probable mean found in the sampling distribution is the true population mean (you can see this in Table 9-5 in the previous chapter), and that this mean is at the center

4 138 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics of the sampling distribution. So if we have only one sample from a population, the assumption that the value of the sample mean is the same as the value of the population mean is more likely to be correct than any other assumption we could make. When we do this, we place the center of the sampling distribution right at the sample mean. That is, we arrange our sampling distribution around the computed value of our sample mean (see Figure 10-1). It is important to note that the sample mean of 2.0 is the best estimate of the unknown population (or true) mean. But we have to realize that there is also the possibility that the true population mean is somewhat higher or lower than that figure. We can use the sampling distribution to describe how probable it is that the real population mean falls somewhere other than the computed sample mean. We will return to this point once we are able to fully describe the sampling distribution. The Variance of the Sampling Distribution of Means: Parameter Unknown The sampling variance (and hence the standard error) can be estimated from the sample variance if we are willing to make the following assumption. If we are willing to assume about the population variance what we assumed about the population mean, namely, that the most probable value for this unknown parameters is the one which we have computed from our sample, we can again work with the results of our single sample. After we make this assumption, we can estimate the measures of dispersion by the same method as we did in Table Sampling Variance= var / N = 1 / 400 =.0025 Std Error = Std Error =

5 139 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics We now have a complete description of the sampling distribution, constructed from the information provided by a single random sample. Once the sampling distribution has been identified, either by using known parameters, or by using estimates of these parameters obtained from samples, we can now use this distribution to carry out the next important step: computing the probabilities of the means in the sampling distribution. We need these probabilities to be able to make statements about the likelihood of the truth or falsity of our hypotheses, as we ve already mentioned in the previous chapter. Whether the sampling distribution was derived from known parameters or from estimates matters little in terms of the remainder of the discussion in this chapter, as long as one condition is met: if estimates are used, the sample from which the estimates are derived should be sufficiently large. A violation of this condition does not alter the logic, but requires some computational adjustments. Any statistics text will show you how to carry out these adjustments whenever small samples are encountered. The Sampling Distribution of Means and Probability The most important reason why we need to construct sampling distributions is obtaining the probabilities associated with varying amounts of sampling error. Once we have this information, we can use it for statistical generalization and for testing our hypotheses. Recall that the sampling distribution of means, constructed as we did in the previous chapter, yields a number of means, along with the frequency with which these means were observed. From such a table of means we can determine, for any given amount of sampling error, the likelihood of occurrence of such a sampling error. This likelihood is determined by dividing the frequency with which a given sampling error occurred by the total number of samples. The Central Limit Theorem allows us to directly compute the mean, variance, and standard error of the sampling distribution without going to the trouble of drawing all possible samples from the population. But we still need a way to describe the shape of the distribution of sample means. Fortunately, we can do this by using the third principle of the Central Limit Theorem. This principle states that the sampling distribution of means will tend to look like a normal distribution, when we select samples which contain relatively large numbers of observations. What does this mean? Just this: if we were able to compute the thousands or millions of means from every different sample in a population, then collect these means into a classified frequency distribution, we would see that the distribution has the characteristics of the standard normal distribution. But we don t have to do this. We can simply assume a normal distribution (a bell-shaped curve ) will result, as the Central Limit Theorem tells us that we re going to find this distribution if our samples are large enough. A standard normal distribution like our sampling distribution can be described by the following equation: In this equation, the quantity H is known as the ordinate (or height) of the curve at any point. This computation involves two mathematical constants (e, which equals and n, which equals ) and a quantity called z, which is the standard score as defined and discussed in Chapter 8. A standard score defines any data point (which can be either a data value or a statistic) in terms of the number of standard deviation units (or standard error units) that the point is from the mean. For a given value of z (such as the point marked A in Figure 10-2), the height of the normal distribution can then be computed from the above formula. Why do we need to know this? Because the height of the normal distribution represents the frequency in the sampling distribution with which any mean that is some distance z from the center of the distribution occurs. We can use the frequency to compute probabilities, as we did in Chapter 9. A mean with a large z value is distant from the center of the distribution, and it has a small H (i.e., frequency). This implies that a mean with a large z value has a low probability. The most probable mean (the largest H value) appears when z = 0. Furthermore, when we compute the

6 140 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics height of the curve at two points with different values of z, then the two heights will define an area under the curve which will include a certain proportion or percentage of the means in the sampling distribution, which is the same as saying that it defines a certain proportion of the total area under the curve, as can be seen in Figure The Normal Distribution and Areas under the Curve Determining the magnitude of the proportion of means in a given area under the curve is quite straightforward. It is based, essentially, on the mathematical procedure for determining the area of a rectangle: multiply the length of the rectangle by its width. Remember that the formula for the standard normal distribution allows us to compute H, or the height of the curve, for any point z on the baseline. Figure 10-2 shows the values for z= 0.0 and z=1.0, a range in which we might be interested.

7 141 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics When z= 0.00, the value of H is.3989; when z= 1.00, the value of H = The areas of two rectangles can now be computed. The first rectangle is the area defined by the product of these two dimensions:.3989 (or b, the height of the curve at z =0.0 and the length of the rectangle) and 1.00 (a, the difference between z = 0.00 and z = 1.00, and the width of the rectangle), or (a x b), which equals The second rectangle has the dimensions.2420 and 1.00 (or a and c, the height at z = +1.00), giving an area of The first rectangle (a x b) obviously covers more than the area under the curve we are interested in; a certain amount of area in the rectangle is not under the curve. The second rectangle does not cover all the area under the curve: there is a certain amount of area under the curve which is not included in the rectangle. However, it would appear from Figure 10-2 that the overestimate of the first rectangle and the underestimate of the second rectangle might cancel. Taking the average of the two rectangles yields ( )/2 =.3205, which, as we shall see below, is quite close to the actual figure for the portion of the area under the normal curve between that point where the mean is located and another point one standard deviation from the mean. We could also compute the area under the curve between the points z = and z = At z = +2.00, H equals Following the same averaging procedure as we carried out above, we d obtain ( )/2=.1480 as an estimate of the area under the curve between z = and z = Since the right half of the distribution contains.50 of the area under the curve, this crude method of determining areas under the curve leaves.50 - ( )=.0315 of the area beyond z = Also, we already know that the normal distribution is symmetrical and so half of the area under the curve falls to the left of the Mean, the other half to the right. Consequently, the same proportions for areas under the curve hold for the left hand side of the distribution. In Figure 10-2, neither the rectangle a x b nor the rectangle a x c, nor even their average, is a completely satisfactory estimate of the area under the curve. It is obvious that a better approximation of the area under the curve between the mean (z=0.0) and z=1.00 could be obtained by dividing the baseline in tenths (or even smaller parts) of a standard deviation, then computing the area of a larger number of rectangles. The total area could then be obtained by adding together the areas calculated from the larger number of smaller rectangles. Students familiar with elementary calculus will recognize this as the basic process involved in integration of the normal distribution function between two limits. By making the rectangle bases smaller and smaller,, and summing the resulting areas, we will get more and more accurate estimates of the true area between the two z-score limits. Figure 10-3 illustrates the result of the calculus of integrals. It shows the broad outline of the relationship between standard scores and areas under the curve. In a normal distribution approximately.34 (or 34%) of the area under the curve lies between the mean (0.00) and one standard unit in either direction. An additional 14% will be observed between 1.00 and 2.00 standard units. Note that these proportions are quite similar to the values obtained by our rough method of taking the averages of the areas of the various rectangles. This means that between and standard units, we will find approximately 68% of all the observations in the normal distribution. Between and standard deviations roughly 96% of the observations in the distribution will occur. That is, an additional 28% (14% + 14%) are observed between 1 and 2 standard units from the mean. The remaining 4% of cases can be found in each tail (2% in the positive tail, and 2% in the negative tail). These represent the cases that are two or more standard deviations away from the mean of the distribution. Also notice that the.34,.14 and.02 in each half of the distribution sum to the.50 probability that we expect. Earlier in this chapter we wondered about the probability that the true population mean was either higher or lower than the observed sample mean of From the sampling distribution we can find the probability that the true population lies between any two values in the sampling distribution by determining the area under the curve between these two values. For instance, there is a.34 probability that the true population mean lies somewhere between 2.00 hours of conversation and 2.05 hours. The symmetry of the distribution allows us to double that proportion to get the probability that the population mean lies somewhere between 1.95 hours and 2.05 hours. This means that we have better than 2:1 odds that the population mean lies within.05 hour (or one standard error) of the sample mean. The probability is.96 that the population mean lies between 1.90 and 2.10 hours of conversation. This example illustrates the way we can use a sampling distribution to quantify our confidence in the results that we get from a random sample.

8 142 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics The Table of Areas under the Normal Curve Because of the widespread use of the normal distribution, tables have been prepared which show percentages of area under the normal distribution function very accurately for a wide range of z-values. Tabled values for areas under the curve of the normal distribution can be found in Appendix A. We ll reproduce a portion of that table here to illustrate how to interpret it. Use the numbers above the various columns in the Table to refer to areas under the curve. In row (a) of the table under Figure 10-4, the z-score is located exactly on the mean (z = 0.0), and therefore the mean-to-z area is equal to 0.0. The fact that this z-score is located on the mean also implies that the area under the normal curve is divided into two equal parts; thus the larger and the smaller areas are equal to one another (and also equal to.50). Finally, column 5 gives the height of the curve at that point as In row (b) we see a z-score of.75, a location three-fourths of a standard unit away from the mean. The area under the curve from the mean to this value of z is approximately.27 (or 27%). The larger area is now equal to.50 plus.27, or.77. This means that the smaller portion under the curve is equal to =.23. Rows (c) and (d) illustrate what happens when the z-score becomes larger, that is, when we move farther away from the mean of the normal distribution. The area between the mean and the z- value becomes larger, the larger portion becomes larger, the smaller portion becomes smaller, and the height of the curve decreases. For instance, row (d) indicates that when z = 2.50, the values in the larger portion of the distribution (i.e., those with lower z-values) constitute 99.38% of all cases in the distribution. Cases that have values that are 2.50 standard units or further from the mean constitute only the remaining.62% of all the cases in the distribution. Since the normal distribution is symmetrical, the results for negative z-scores are the same as those of the positive z-values. In order to use the tables for negative z-scores, you need only look up the corresponding positive z-score. With negative z-scores the smaller area under the curve will

9 143 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics be located in the left-hand side of the curve; the larger area will be on the right. The mean-to-z and ordinate values are the same for negative z-scores. The Sampling Distribution of Means and the Sample Size We know from the examples in Chapters 5 and 9 that increasing the sample size will decrease the sampling error. By using sampling distributions based on the normal curve, constructed from a single sample with the mean and variance estimates provided by the Central Limit Theorem, we can see the effects of sample size even more clearly. Figure 10-5 illustrates sampling distributions constructed from samples of different size which were drawn from a population with a mean of 16 and a variance of 25. First, a general observation. Figure 10-5 shows us how the normal distribution can be used to

10 144 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics associate probabilities with certain sample means. For instance, the first sampling distribution shows that when we take samples of N = 50 from this population we would expect 96% of all the sample means to fall between and inclusive. Only 2% of the sample means would be expected to be larger than (or smaller than ). In other words, applying the Normal distribution to the Sampling distribution allows us to distinguish between likely and unlikely sample means when sampling from a given population. The ability to do this is of key importance in hypothesis testing and this topic will be taken up again later. There are several items in Figure 10-5 you should pay close attention to. First, all sampling distributions have the same mean, since they are centered at the population mean. Further, since all are based on a normal curve, the z-values for each band of probabilities are the same. But these are the only things the three distributions have in common. Although each distribution has the same central tendency and is symmetrical, the dispersion differs. This can be seen by looking at the distribution as a function of the expected sample means, rather than the standard scores. The same ranges of z-values are associated with an increasingly narrower range of sample means. The sample means become more tightly clustered around the population mean when the sample size increases. Thus the general observation that we can draw from Figure 10-5 is that increases in N are associated with decreases in sampling variance and standard error. That is, the sample means that we will obtain will be better estimates of the true value of the population mean as we increase the number of observations. It is important to note, however, that when we plot the sample sizes and the values of the standard error, as in Figure 10-6, we observe the relationship between sample size and the size of the standard error to be non-linear. That is, for each identical increase in sample size we obtain a smaller reduction in the standard error. When N = 50, the standard error was equal to.707. Doubling N to 100 (an increment of 50) brings the standard error down to.50, for a reduction of.207. An additional increment of 50 in N (to a total of 150) brings the standard error to.408, for a reduction of.092. The reduction in the standard error for the second increment is less than half of that of the first increment. Were we to increase N to 200, the new standard error would be.354, a reduction of.054, which is only one-fourth of the

11 145 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics reduction in the standard error obtained from the first increment of 50 in the sample size. Since subsequent increases in sample size will yield increasingly smaller reductions in sampling error, it clearly is not worth our while to continue increasing sample size indefinitely. In our current example, for instance, there doesn t seem to be much point in increasing the sample size beyond 150. Another point about the relationship between sample size and sampling error is less evident, but just as important. Note that none of our discussion about sampling distributions and sampling error has involved the size of the population. Only the size of the sample has been shown to have an effect on sampling error. This seems counter-intuitive, but it s correct: sampling error in a true random sample is a function only of sample size. It doesn t matter if we sample from a population with 1000 members or with 100,000,000. The standard error depends only on the number of observations that we actually make. Often communication or public opinion research is criticized for generalizing to huge populations on the basis of sample sizes that are small, relative to the total size of the population. For example, people often ask How can the TV ratings services describe what s going on in 100,000,000 households when they only collect information in 1200 homes? The answer to this question is contained in this chapter: the error in measurement due to sampling is determined solely by the sample N of 1200, and the total population size is irrelevant. A sample of 1200 in a single city with only 10,000 houses will be just as accurate (or inaccurate) as a national sample of the same size. Summary In this chapter, we demonstrated how the Central Limit Theorem can eliminate the need to construct a sampling distribution by examining all possible samples that might be drawn from a population. The Central Limit Theorem allows us to determine the sampling distribution by using the parameter mean and variance values or estimates of these obtained from a single sample. In particular, the theorem tells us that the estimated sampling distribution has a mean which is the

12 146 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics

13 147 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics same as the population mean, or our best estimate, a sample mean, and that the variance of the sampling distribution can be determined from either the parameter or its estimate divided by the number of observations in the sample. The third contribution of the Central Limit Theorem is that it tells us that the shape of the sampling distribution can be approximated by a normal curve. The normal curve can be described as a function of standard, or z, scores. Using the normal curve allows us to describe sampling error in terms of probabilities, which are represented as areas under the curve. Standard tables which describe the areas under the normal curve have been constructed to aid in making probability statements. We can use these tables to see what the likelihood is that samples with means a certain distance from the true population mean will be observed. When the sample size is increased, we see that the range of values that correspond to a fixed probability level under the normal curve decreases, indicating a corresponding decrease in sampling error. The decrease in sampling error is a simple function of sample size, and is not related to the size of the population. Furthermore, increases in sample size and decreases in sampling error are related by a law of diminishing returns, making it inefficient to increase samples beyond a certain size. With the tools provided by the Central Limit Theorem, we can proceed to the next task in communication research: testing hypotheses about the relationships between our theoretical constructs. In order to do this we will need to use the sampling distributions and the probability statements that they give us. References and Additional Readings Annenberg/CPB (1989). Against all odds. [Videotape]. Santa Barbara, CA: Intellimation. Hays, W.L. (1981). Statistics (3rd. Ed.). New York: Holt, Rinehart & Winston. (Chapter 6, Normal Population and Sampling Distributions ). Kerlinger, F.N. (1986). Foundations of behaviorial research (3rd ed.) New York: Holt, Rinehart and Winston. (Chapter 12, Testing Hypotheses and the Standard Error ). Moore, D.S. & McCabe, G.P. (1989). Introduction to the practice of statistics. New York: Freeman. (Chapter 6, From Probability to Inference ).

### Distributions: Population, Sample and Sampling Distributions

119 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics Chapter 9 Distributions: Population, Sample and Sampling Distributions In the three preceding chapters

### Hypothesis Testing. Chapter Introduction

Contents 9 Hypothesis Testing 553 9.1 Introduction............................ 553 9.2 Hypothesis Test for a Mean................... 557 9.2.1 Steps in Hypothesis Testing............... 557 9.2.2 Diagrammatic

### The previous chapters have provided us with the basic tools and strategies needed to test. Chapter 12. Testing Hypotheses

165 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics Chapter 12 Testing Hypotheses The previous chapters have provided us with the basic tools and strategies

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1: THE NORMAL CURVE AND "Z" SCORES: The Normal Curve: The "Normal" curve is a mathematical abstraction which conveniently

### Statistical Inference

Statistical Inference Idea: Estimate parameters of the population distribution using data. How: Use the sampling distribution of sample statistics and methods based on what would happen if we used this

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

### Introduction to Statistics for Computer Science Projects

Introduction Introduction to Statistics for Computer Science Projects Peter Coxhead Whole modules are devoted to statistics and related topics in many degree programmes, so in this short session all I

### In this chapter, you will learn to use descriptive statistics to organize, summarize, analyze, and interpret data for contract pricing.

3.0 - Chapter Introduction In this chapter, you will learn to use descriptive statistics to organize, summarize, analyze, and interpret data for contract pricing. Categories of Statistics. Statistics is

### 6 3 The Standard Normal Distribution

290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### Describing Data: Measures of Central Tendency and Dispersion

100 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics Chapter 8 Describing Data: Measures of Central Tendency and Dispersion In the previous chapter we

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Unit 21 Student s t Distribution in Hypotheses Testing

Unit 21 Student s t Distribution in Hypotheses Testing Objectives: To understand the difference between the standard normal distribution and the Student's t distributions To understand the difference between

### ( ) ( ) Central Tendency. Central Tendency

1 Central Tendency CENTRAL TENDENCY: A statistical measure that identifies a single score that is most typical or representative of the entire group Usually, a value that reflects the middle of the distribution

### MAT140: Applied Statistical Methods Summary of Calculating Confidence Intervals and Sample Sizes for Estimating Parameters

MAT140: Applied Statistical Methods Summary of Calculating Confidence Intervals and Sample Sizes for Estimating Parameters Inferences about a population parameter can be made using sample statistics for

### Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

### Lecture Notes Module 1

Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

### Sociology 6Z03 Topic 15: Statistical Inference for Means

Sociology 6Z03 Topic 15: Statistical Inference for Means John Fox McMaster University Fall 2016 John Fox (McMaster University) Soc 6Z03: Statistical Inference for Means Fall 2016 1 / 41 Outline: Statistical

### Unit 16 Normal Distributions

Unit 16 Normal Distributions Objectives: To obtain relative frequencies (probabilities) and percentiles with a population having a normal distribution While there are many different types of distributions

### Normal distribution. ) 2 /2σ. 2π σ

Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

### Module 5 Hypotheses Tests: Comparing Two Groups

Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this

### Lesson 7 Z-Scores and Probability

Lesson 7 Z-Scores and Probability Outline Introduction Areas Under the Normal Curve Using the Z-table Converting Z-score to area -area less than z/area greater than z/area between two z-values Converting

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### The Normal Curve. The Normal Curve and The Sampling Distribution

Discrete vs Continuous Data The Normal Curve and The Sampling Distribution We have seen examples of probability distributions for discrete variables X, such as the binomial distribution. We could use it

### The Chi-Square Distributions

MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation " of a normally distributed measurement and to test the goodness

### STA201 Intermediate Statistics Lecture Notes. Luc Hens

STA201 Intermediate Statistics Lecture Notes Luc Hens 15 January 2016 ii How to use these lecture notes These lecture notes start by reviewing the material from STA101 (most of it covered in Freedman et

### Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

### Week 3&4: Z tables and the Sampling Distribution of X

Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### find confidence interval for a population mean when the population standard deviation is KNOWN Understand the new distribution the t-distribution

Section 8.3 1 Estimating a Population Mean Topics find confidence interval for a population mean when the population standard deviation is KNOWN find confidence interval for a population mean when the

### 10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.

### 9Tests of Hypotheses. for a Single Sample CHAPTER OUTLINE

9Tests of Hypotheses for a Single Sample CHAPTER OUTLINE 9-1 HYPOTHESIS TESTING 9-1.1 Statistical Hypotheses 9-1.2 Tests of Statistical Hypotheses 9-1.3 One-Sided and Two-Sided Hypotheses 9-1.4 General

### CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

### Descriptive Statistics and Measurement Scales

Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

### Advanced High School Statistics. Preliminary Edition

Chapter 2 Summarizing Data After collecting data, the next stage in the investigative process is to summarize the data. Graphical displays allow us to visualize and better understand the important features

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According

### TRANSCRIPT: In this lecture, we will talk about both theoretical and applied concepts related to hypothesis testing.

This is Dr. Chumney. The focus of this lecture is hypothesis testing both what it is, how hypothesis tests are used, and how to conduct hypothesis tests. 1 In this lecture, we will talk about both theoretical

### CONFIDENCE INTERVALS I

CONFIDENCE INTERVALS I ESTIMATION: the sample mean Gx is an estimate of the population mean µ point of sampling is to obtain estimates of population values Example: for 55 students in Section 105, 45 of

### Chapter 8: Interval Estimates and Hypothesis Testing

Chapter 8: Interval Estimates and Hypothesis Testing Chapter 8 Outline Clint s Assignment: Taking Stock Estimate Reliability: Interval Estimate Question o Normal Distribution versus the Student t-distribution:

### Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

### A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment

### Size of a study. Chapter 15

Size of a study 15.1 Introduction It is important to ensure at the design stage that the proposed number of subjects to be recruited into any study will be appropriate to answer the main objective(s) of

### 4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

### Chapter 7 What to do when you have the data

Chapter 7 What to do when you have the data We saw in the previous chapters how to collect data. We will spend the rest of this course looking at how to analyse the data that we have collected. Stem and

### 8. THE NORMAL DISTRIBUTION

8. THE NORMAL DISTRIBUTION The normal distribution with mean μ and variance σ 2 has the following density function: The normal distribution is sometimes called a Gaussian Distribution, after its inventor,

### 1.7 Graphs of Functions

64 Relations and Functions 1.7 Graphs of Functions In Section 1.4 we defined a function as a special type of relation; one in which each x-coordinate was matched with only one y-coordinate. We spent most

### Unit 29 Chi-Square Goodness-of-Fit Test

Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni

### This HW reviews the normal distribution, confidence intervals and the central limit theorem.

Homework 3 Solution This HW reviews the normal distribution, confidence intervals and the central limit theorem. (1) Suppose that X is a normally distributed random variable where X N(75, 3 2 ) (mean 75

### Summarizing Data: Measures of Variation

Summarizing Data: Measures of Variation One aspect of most sets of data is that the values are not all alike; indeed, the extent to which they are unalike, or vary among themselves, is of basic importance

### Continuous Functions, Smooth Functions and the Derivative

UCSC AMS/ECON 11A Supplemental Notes # 4 Continuous Functions, Smooth Functions and the Derivative c 2004 Yonatan Katznelson 1. Continuous functions One of the things that economists like to do with mathematical

### Hypothesis tests, confidence intervals, and bootstrapping

Hypothesis tests, confidence intervals, and bootstrapping Business Statistics 41000 Fall 2015 1 Topics 1. Hypothesis tests Testing a mean: H0 : µ = µ 0 Testing a proportion: H0 : p = p 0 Testing a difference

### Chapter 8. Linear Regression. Copyright 2012, 2008, 2005 Pearson Education, Inc.

Chapter 8 Linear Regression Copyright 2012, 2008, 2005 Pearson Education, Inc. Fat Versus Protein: An Example The following is a scatterplot of total fat versus protein for 30 items on the Burger King

### 5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution

### Report of for Chapter 2 pretest

Report of for Chapter 2 pretest Exam: Chapter 2 pretest Category: Organizing and Graphing Data 1. "For our study of driving habits, we recorded the speed of every fifth vehicle on Drury Lane. Nearly every

### It is time to prove some theorems. There are various strategies for doing

CHAPTER 4 Direct Proof It is time to prove some theorems. There are various strategies for doing this; we now examine the most straightforward approach, a technique called direct proof. As we begin, it

### Chapter 3: Data Description Numerical Methods

Chapter 3: Data Description Numerical Methods Learning Objectives Upon successful completion of Chapter 3, you will be able to: Summarize data using measures of central tendency, such as the mean, median,

### ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015

ECON 459 Game Theory Lecture Notes Auctions Luca Anderlini Spring 2015 These notes have been used before. If you can still spot any errors or have any suggestions for improvement, please let me know. 1

### Sample Size Determination

Sample Size Determination Population A: 10,000 Population B: 5,000 Sample 10% Sample 15% Sample size 1000 Sample size 750 The process of obtaining information from a subset (sample) of a larger group (population)

### Accuracy, Precision and Uncertainty

Accuracy, Precision and Uncertainty How tall are you? How old are you? When you answered these everyday questions, you probably did it in round numbers such as "five foot, six inches" or "nineteen years,

### Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

### Advanced 3G and 4G Wireless Communication Prof. Aditya K. Jagannatham Department of Electrical Engineering Indian Institute of Technology, Kanpur

Advanced 3G and 4G Wireless Communication Prof. Aditya K. Jagannatham Department of Electrical Engineering Indian Institute of Technology, Kanpur Lecture - 3 Rayleigh Fading and BER of Wired Communication

### Frequency Distributions

Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data to get a general overview of the results. Remember, this is the goal

### Recitation, Week 3: Basic Descriptive Statistics and Measures of Central Tendency:

Recitation, Week 3: Basic Descriptive Statistics and Measures of Central Tendency: 1. What does Healey mean by data reduction? a. Data reduction involves using a few numbers to summarize the distribution

### CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous

### Chapter 7 Inference for a Mean or Median

7.1 Introduction 139 Chapter 7 Inference for a Mean or Median 7.1 Introduction There are many situations when we might wish to make inferences about the location of the center of the population distribution

### MATH Fundamental Mathematics II.

MATH 10032 Fundamental Mathematics II http://www.math.kent.edu/ebooks/10032/fun-math-2.pdf Department of Mathematical Sciences Kent State University December 29, 2008 2 Contents 1 Fundamental Mathematics

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### Descriptive Data Summarization

Descriptive Data Summarization (Understanding Data) First: Some data preprocessing problems... 1 Missing Values The approach of the problem of missing values adopted in SQL is based on nulls and three-valued

### We will use the following data sets to illustrate measures of center. DATA SET 1 The following are test scores from a class of 20 students:

MODE The mode of the sample is the value of the variable having the greatest frequency. Example: Obtain the mode for Data Set 1 77 For a grouped frequency distribution, the modal class is the class having

### Chapter 8: Introduction to Hypothesis Testing

Chapter 8: Introduction to Hypothesis Testing We re now at the point where we can discuss the logic of hypothesis testing. This procedure will underlie the statistical analyses that we ll use for the remainder

### Introduction to Regression. Dr. Tom Pierce Radford University

Introduction to Regression Dr. Tom Pierce Radford University In the chapter on correlational techniques we focused on the Pearson R as a tool for learning about the relationship between two variables.

### AP Statistics 2002 Scoring Guidelines

AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought

### Statistical tests for SPSS

Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

### Week 4: Standard Error and Confidence Intervals

Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.

### x Measures of Central Tendency for Ungrouped Data Chapter 3 Numerical Descriptive Measures Example 3-1 Example 3-1: Solution

Chapter 3 umerical Descriptive Measures 3.1 Measures of Central Tendency for Ungrouped Data 3. Measures of Dispersion for Ungrouped Data 3.3 Mean, Variance, and Standard Deviation for Grouped Data 3.4

### Test of Hypotheses. Since the Neyman-Pearson approach involves two statistical hypotheses, one has to decide which one

Test of Hypotheses Hypothesis, Test Statistic, and Rejection Region Imagine that you play a repeated Bernoulli game: you win \$1 if head and lose \$1 if tail. After 10 plays, you lost \$2 in net (4 heads

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### WRITING PROOFS. Christopher Heil Georgia Institute of Technology

WRITING PROOFS Christopher Heil Georgia Institute of Technology A theorem is just a statement of fact A proof of the theorem is a logical explanation of why the theorem is true Many theorems have this

### Each exam covers lectures from since the previous exam and up to the exam date.

Sociology 301 Exam Review Liying Luo 03.22 Exam Review: Logistics Exams must be taken at the scheduled date and time unless 1. You provide verifiable documents of unforeseen illness or family emergency,

### Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

### Session 6 Number Theory

Key Terms in This Session Session 6 Number Theory Previously Introduced counting numbers factor factor tree prime number New in This Session composite number greatest common factor least common multiple

### Position, Velocity, and Acceleration

Chapter 2 Position, Velocity, and Acceleration In this section, we will define the first set of quantities (position, velocity, and acceleration) that will form a foundation for most of what we do in this

### Math Matters: Why Do I Need To Know This?

Math Matters: Why Do I Need To Know This? Bruce Kessler, Department of Mathematics Western Kentucky University Episode Nine 1 Normal distributions Statistical models Objective: To explain and illustrate

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

### Lecture 7 Cogsci 109. Thurs. Oct. 12, 2006

Lecture 7 Cogsci 109 Thurs. Oct. 12, 2006 Announcements Homework 2 is posted. Office hours Midterm is coming up somewhere in the next couple of weeks Anything lectured on, presented in section, in the

### Hypothesis Testing. Concept of Hypothesis Testing

Quantitative Methods 2013 Hypothesis Testing with One Sample 1 Concept of Hypothesis Testing Testing Hypotheses is another way to deal with the problem of making a statement about an unknown population

### Statistical Foundations: Measures of Location and Central Tendency and Summation and Expectation

Statistical Foundations: and Central Tendency and and Lecture 4 September 5, 2006 Psychology 790 Lecture #4-9/05/2006 Slide 1 of 26 Today s Lecture Today s Lecture Where this Fits central tendency/location

### I know when I have written a number backwards and can correct it when it is pointed out to me I can arrange numbers in order from 1 to 10

Mathematics Targets Moving from Level W and working towards level 1c I can count from 1 to 10 I know and write all my numbers to 10 I know when I have written a number backwards and can correct it when

### 302 Part 3 / Research Designs, Settings, and Procedures

302 Part 3 / Research Designs, Settings, and Procedures Chapter 19 Selecting Statistical Tests We have now arrived at the point where we have to talk about selecting the appropriate statistics to test

### Biodiversity Data Analysis: Testing Statistical Hypotheses By Joanna Weremijewicz, Simeon Yurek, Steven Green, Ph. D. and Dana Krempels, Ph. D.

Biodiversity Data Analysis: Testing Statistical Hypotheses By Joanna Weremijewicz, Simeon Yurek, Steven Green, Ph. D. and Dana Krempels, Ph. D. In biological science, investigators often collect biological

### 2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations

### SEQUENCES, MATHEMATICAL INDUCTION, AND RECURSION

CHAPTER 5 SEQUENCES, MATHEMATICAL INDUCTION, AND RECURSION Alessandro Artale UniBZ - http://www.inf.unibz.it/ artale/ SECTION 5.2 Mathematical Induction I Copyright Cengage Learning. All rights reserved.

### Inference, Sampling, and Confidence. Inference. Inference

Inference, Sampling, and Confidence Inference defined Sampling Statistics and parameters Sampling distribution Confidence and standard error Estimation Precision and accuracy Estimating sample size 1 Inference

### , then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

### Probability Models for Continuous Random Variables

Density Probability Models for Continuous Random Variables At right you see a histogram of female length of life. (Births and deaths are recorded to the nearest minute. The data are essentially continuous.)

### The answer is (a) describes the relation between desired consumption expenditures and the factors that determine it, like real disposable income.

1 Solution Set: Chapter 22 Chapter Review: 2. The consumption function The answer is (a) describes the relation between desired consumption expenditures and the factors that determine it, like real disposable