Objectives. 3.3 Toward statistical inference. Toward statistical inference Sampling variability
- Marianna Webb
- 7 years ago
1 Objectives 3.3 Toward statistical inference. Population versus sample (CIS, Chapter 6). Toward statistical inference. Sampling variability. Further reading: (some of the concepts introduced in this link are beyond this class). Adapted from authors' slides, 2012 W.H. Freeman and Company
2 The inconvenient truth So far we have assumed that the mean of a population is known. In reality the population is not observed in full, so its mean is unknown. Inference means estimating the unknown population mean from a sample that is very small relative to the population. We illustrate what is meant by this in the following examples. See also the recent journal article from Poultry Science.
3 Towards statistical inference In a survey of 2000 randomly sampled college students, 62% reported that they had encountered some type of harassment. Parents are worried: what is the truth about the millions of students who are currently at college? Because the sample was taken at random, it seems quite reasonable to suppose that it is representative of the population of college students. This suggests that about 62% of all college students may have encountered some type of harassment. The 62% is an estimate of the population proportion; the exact proportion is unknown. This is the start of statistical inference, where we infer conclusions about the entire population based on a sample. 62% is not the exact value: it will vary from sample to sample, and our objective in the next few lectures is to understand this variability. This will help us to understand the reliability of the estimate.
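To make "it will vary from sample to sample" concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that the true population proportion is exactly 0.62 (in practice this is unknown), and the function name `simulate_survey` is hypothetical.

```python
import random

def simulate_survey(true_p, n, rng):
    """Return the sample proportion from one random survey of n people."""
    yes_count = sum(1 for _ in range(n) if rng.random() < true_p)
    return yes_count / n

rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_P = 0.62           # ASSUMED population proportion (illustration only)
N = 2000                # sample size used in the survey

# Repeat the survey 10 times and look at how the estimate moves around.
estimates = [simulate_survey(TRUE_P, N, rng) for _ in range(10)]
for i, p_hat in enumerate(estimates, start=1):
    print(f"survey {i:2d}: sample proportion = {p_hat:.3f}")
```

Each simulated survey gives a slightly different estimate, but all of them stay close to the assumed population value.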
4 Refresher: Definitions Population: The entire group of individuals in which we are interested but cannot assess or observe directly. Examples: all college students, all calves etc. Often the population is described by a mathematical model. Sample: The part of the population we actually examine and for which we do have data. How well the sample represents the population depends on the sampling method, as well as on the sample size. A parameter is a number describing a characteristic of the population. A statistic is a number describing a characteristic of a sample.
5 Example: M&M data To illustrate what we mean by a population and a sample, let us return to the M&M example. Let us suppose that the 170 M&M bags represent the population of M&Ms (in reality we do not observe the population, so this is just an example for illustration). The population mean for the number of M&Ms per bag can then be calculated directly. A random sample of size 5 is taken; there are many different random samples that can be taken! Note: Examples of random samples are given in homework 1. On the next two slides we show how to sample from the distribution. Top plot: The distribution of the number of M&Ms in a bag (over 170 bags). Middle plot: One sample of size 5. Lower plot: The average of that sample (sample mean). Observe how the sample mean is different for the two samples.
6 Sample 1
7 Sample 2
8 Sampling variability As the previous example illustrates, each sample taken from a population is likely to contain a different set of individuals and to give a different value for our statistic (such as the sample mean). This is called sampling variability. This might suggest that the sample and the statistic contain no information about the population. However, the good news is that if we imagine taking lots of random samples of the same size from a given population, the variation from sample to sample (the sampling distribution) will follow a predictable pattern. All of statistical inference is based on this idea: to see how trustworthy a statistic is, we ask what would happen if we kept repeating the sampling many times.
9 We measure the quality of a statistic (such as the sample mean) with: Accuracy (bias): Random samples provide accurate estimates of a parameter because they are unbiased (or close to unbiased, depending on the random sampling method). Accuracy is achieved by sampling in a good way (i.e. randomly sampling over the population of interest) and by using a well-constructed statistic. Typically we will assume an estimator is unbiased. When reading an article, identify the population of interest and the potential biases which may arise. Reliability (variability): A reliable estimation method is one that would give similar results if the random sampling were repeated. The less variable a statistic, the more reliable it is. Random sampling enables us to measure the variability of a statistic. We do this with the standard error; on the next slide we define what this means. Important: The larger the sample size, the less variable the corresponding estimator will be. To understand the above concepts look at the question at the end of this page:
10 Measuring Variability We have come across variability before. Recall that in Chapter 3 we used the standard deviation to measure the variability in the sample. The sample standard deviation measures the typical deviation of each observation from the sample mean: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$. The same criterion is used to measure the variability in the sample mean (and all other estimators). This is called the standard error. More precisely, we measure the average spread of the estimator around the true mean. Looking back at the M&M example, it would appear that we have to calculate the sample mean of every possible sample! This is impossible. Remarkably, we can find a very nice expression for the standard error which requires very little effort!
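The sample standard deviation formula can be checked step by step in code. The sketch below uses a small hypothetical sample of bag counts (not the course data) and compares a hand-written version, `sample_sd`, against Python's built-in `statistics.stdev`.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: sqrt of sum of squared deviations / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

counts = [17, 18, 20, 15, 19]  # hypothetical sample, NOT the course data
s = sample_sd(counts)

print(f"hand-written formula: {s:.4f}")
print(f"statistics.stdev:     {statistics.stdev(counts):.4f}")
```

Both lines should print the same value, since `statistics.stdev` also divides by n - 1.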
11 Population size does not matter There are about 15 million students in higher education. In the harassment survey about 2000 people were randomly surveyed. This means that the sexual harassment survey interviewed about one in every 7500 students. 62% is an estimate of the true population proportion. Question: Would the estimate of the proportion be better if the population size were smaller? For example, 1.5 million students rather than 15 million students. Answer: No. Only the size of the sample, in this case n = 2000, has an influence on its reliability, not the size of the population. Statistical inference is not based on how close the sample size is to the population size (usually we assume that the population is infinite). It is based on the idea that a simple random sample gives a representative sample of the entire population.
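This claim can be checked with a small simulation sketch. It assumes, for illustration only, that exactly 62% of each population would answer "yes"; the helper name `sd_of_estimates` and the number of repetitions are hypothetical choices. The same sample size (n = 2000) is drawn without replacement from a population of 1.5 million and from one of 15 million.

```python
import random
import statistics

def sd_of_estimates(pop_size, true_p, n, reps, rng):
    """SD of the sample proportion over repeated simple random samples."""
    n_yes = round(pop_size * true_p)  # individuals 0..n_yes-1 count as "yes"
    estimates = []
    for _ in range(reps):
        sample = rng.sample(range(pop_size), n)  # sampling without replacement
        estimates.append(sum(1 for i in sample if i < n_yes) / n)
    return statistics.stdev(estimates)

rng = random.Random(1)  # fixed seed so the run is repeatable
sd_small = sd_of_estimates(1_500_000, 0.62, 2000, 300, rng)
sd_large = sd_of_estimates(15_000_000, 0.62, 2000, 300, rng)

print(f"population  1.5 million: SD of estimate = {sd_small:.4f}")
print(f"population 15   million: SD of estimate = {sd_large:.4f}")
```

The two spreads come out nearly identical, which is the point of the slide: reliability is driven by n, not by the population size.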
12 Summary and what's to come The techniques of statistics allow us to draw inferences or conclusions about a population using the data from a sample. Your estimate of the population parameter is only as good as your sampling design, so work hard to eliminate biases (design your experiment well). Your sample statistic is only an estimate, and if you randomly sampled again you would probably get a somewhat different result (more on this next). In the next section we will show: The distribution of the estimates (for much of the course it will be the sample mean) will, if the sample size is large enough, be approximately normal even if the observations are not normal. The standard error (reliability) has a simple formula!
13 Objectives 5.1 Sampling distribution of a sample mean (CIS, Chapter 8) The mean and standard deviation of x̄ For normally distributed populations The central limit theorem (CIS, Chapter 8 and p103) Additional reading: samp_dist_mean.html Adapted from authors' slides 2012 W.H. Freeman and Company
14 Simulation tools used To demonstrate the concepts here I will be using an applet in StatCrunch called Sampling Distributions. It is highly recommended that you try this out yourself. Go to Applets -> Sampling Distributions. Select the distribution (uniform etc.) or choose the data table (your own data). Press Compute. Choose your sample size (this is how large a sample you use). The number of times has NOTHING to do with sample size: it is the number of samples you draw (this part is the thought experiment). You should make this as large as possible (I usually set it to 100,000). Press the + sign next to Sampling means to get the QQplot of the distribution of the sample mean. Do not press the + sign next to Samples; this will give you the QQplot of the sample. Conceptually, what we will be doing is rather sophisticated and it will take time to precisely understand the ideas behind inference. This is NOT plug and chug. Note that you can customize the (parent) distribution from which you sample by left-clicking over the parent distribution and moving the cursor to draw the shape you want.
15 M&M example Look first at the distribution of the total number of M&Ms in a bag. We will treat this as our "population". Just by comparing the histogram with the normal curve we can see that it is not normal. There are two reasons for this: a) The mix of different types of M&Ms (milk chocolate, peanut and peanut butter) induces multimodality in the distribution. b) The number of M&Ms is a discrete numerical random variable. In the following examples we will be drawing M&M bags (numbers) from this distribution. It is analogous to putting all 170 counts in a bag and drawing them out (with replacement). We see that we are most likely to draw the number 18 and least likely to draw 14 (within the range 5-21).
16 Distribution of average: sample 5 Let us now look at the distribution of the sample mean of all samples of size 5. That is we randomly sample 5 values from the population, and take the sample mean.
17 QQplot of average: sample 5 Let us now look at the QQplot of the sample mean of all samples of size 5 (corresponding to the histogram on the previous page). Observations: 1. The histogram of the sample mean is more bell-shaped than the original distribution. However, it is certainly not normal (the spikes we see are due to taking the average of only 5 numbers, which is not continuous enough). 2. There is less spread in the distribution of the averages than in the original histogram. 3. The QQplot shows a large deviation from normality in the tails.
18 Distribution of average: sample 10 Let us now look at the distribution of the sample mean of all samples of size 10. That is we randomly sample 10 values from the population, and take the sample mean.
19 QQplot of average: sample 10 Let us now look at the QQplot of the sample mean of all samples of size 10 (corresponding to the histogram on the previous page). Observations: 1. The histogram of the sample mean is a lot more bell-shaped than the original distribution. The spikes that were seen for sample size 5 have gone (the bumps you see on the histogram are due to binwidth). 2. There is even less spread in the distribution of the averages than in the original histogram. 3. The QQplot shows only a small deviation from normality in the top tail of the distribution.
20 Distribution of average: sample 20 Let us now look at the distribution of the sample mean of all samples of size 20. That is we randomly sample 20 values from the population, and take the sample mean.
21 QQplot of average: sample 20 Let us now look at the QQplot of the sample mean of all samples of size 20 (corresponding to the histogram on the previous page) Observations: 1. The histogram of the sample mean is pretty much normal. 2. There is even less spread in the distribution of the averages than the original histogram. 3. The QQplot shows only a very tiny deviation from normality in the tails of the distribution.
22 Distribution of average: sample 40 Let us now look at the distribution of the sample mean of all samples of size 40. That is we randomly sample 40 values from the population, and take the sample mean.
23 QQplot of average: sample 40 Let us now look at the QQplot of the sample mean of all samples of size 40 (corresponding to the histogram on the previous page) Observations: 1. The histogram of the sample mean is almost normal. 2. There is even less spread in the distribution of the averages than the original histogram. 3. The QQplot is very close to the x=y line.
24 Summary: Sampling distribution of M&Ms
25 Summary of averages of M&Ms
Sample size | Standard error | Comment
original | 4.64/√1 | Not normal
5 | 4.64/√5 | More unimodal
10 | 4.64/√10 | Getting normal
20 | 4.64/√20 | Mostly there
40 | 4.64/√40 | Pretty much normal
This example illustrates three major insights: The distributions of the sample means are centered about the true mean. This tells us that the sample mean is not biased. The spread in the sample means decreases as the sample size used to evaluate them increases. The spread/reliability/variability is measured using the standard error, which has the formula σ/√n (in this case σ = 4.64 and n = 5, 10, 20 or 40). The distribution of the sample mean becomes more normal (look at the QQplots) as the sample size grows.
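The first two insights can be reproduced with a short simulation sketch. The 170 M&M counts themselves are not available here, so the code builds a hypothetical stand-in population (lumpy, discrete, on the range 5-21); the names `population` and `simulate_sample_means` are illustrative assumptions, and the resulting σ will differ from the 4.64 on the slide.

```python
import math
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the 170 bag counts (NOT the course data):
# a lumpy, discrete, non-normal population on the range 5-21.
population = [rng.choice([5, 6, 7, 14, 15, 16, 17, 18, 18, 19, 20, 21])
              for _ in range(170)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population SD (divides by N)

def simulate_sample_means(n, reps=20000):
    """Draw `reps` samples of size n with replacement; return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

results = {}
for n in (1, 5, 10, 20, 40):
    means = simulate_sample_means(n)
    simulated_se = statistics.pstdev(means)
    theoretical_se = sigma / math.sqrt(n)
    results[n] = (statistics.fmean(means), simulated_se, theoretical_se)
    print(f"n={n:2d}  mean of means={results[n][0]:.2f}  "
          f"simulated SE={simulated_se:.3f}  sigma/sqrt(n)={theoretical_se:.3f}")
```

For every sample size the mean of the sample means sits close to the population mean, and the simulated spread tracks σ/√n.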
26 Properties: Sample mean for normally distributed data When a variable in a population is normally distributed, the sampling distribution of x̄ for all possible samples of size n is also normally distributed. If the population is Normal(µ, σ) then the sampling distribution of the sample mean is Normal(µ, σ/√n). Note that the sample average has less variability than any individual observation.
27 Properties: Sample mean of non-normally distributed data Central Limit Theorem: When randomly sampling from any population with mean µ and standard deviation σ, if n is large enough then the sampling distribution of x̄ is approximately normal: x̄ ~ N(µ, σ/√n). Figure panels: population with strongly skewed distribution; sampling distribution of x̄ for n = 2 observations; sampling distribution of x̄ for n = 10 observations; sampling distribution of x̄ for n = 25 observations.
28 Calculation Practice In 2010 the combined SAT scores had mean 1016 and standard deviation 212. They also had an approximately normal distribution, so the population distribution is Normal(µ = 1016; σ = 212). In Chapter 4, we used the normal distribution to show that the probability of a randomly selected student scoring 1100 or higher is 34.5%. Now, suppose 50 students are randomly selected and their SAT scores averaged. What is the probability that the average is greater than 1100? The sampling distribution of the sample average when n = 50 is Normal(µ = 1016; σ/√n = 212/√50 = 29.98). Using these values, the z-score for 1100 is $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{1100 - 1016}{29.98} = 2.80$. In Table A, the area to the right of 2.80 is about 0.0025. So there is only a 0.25% chance that the average of 50 randomly sampled students is more than 1100. In this example we do not use the CLT because the original data is assumed normal.
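The same calculation can be done without Table A. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative, and the one-student probability comes out slightly different from the table-based 34.5% because no rounding of z is involved.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1016, 212, 50

# One randomly selected student: use the population SD.
p_one = 1 - NormalDist(mu, sigma).cdf(1100)

# Average of 50 students: use the standard error sigma / sqrt(n).
se = sigma / sqrt(n)
z = (1100 - mu) / se
p_mean = 1 - NormalDist(mu, se).cdf(1100)

print(f"P(one student > 1100)    = {p_one:.4f}")
print(f"standard error           = {se:.2f}")
print(f"z-score for 1100         = {z:.2f}")
print(f"P(average of 50 > 1100)  = {p_mean:.4f}")
```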
29 Calculation Practice Hypokalemia is diagnosed when blood potassium levels are below 3.5 mEq/dl. Let's assume that we know a patient whose measured potassium levels vary daily according to the Normal(µ = 3.8, σ = 0.2) distribution. If only one measurement is made, what is the probability that this patient will be misdiagnosed with hypokalemia? $z = \frac{x - \mu}{\sigma} = \frac{3.5 - 3.8}{0.2} = -1.5$, so $P(z < -1.5) \approx 6.7\%$. Instead, if measurements are taken on 4 separate days, what is the probability of a misdiagnosis (in this case, that the sample mean of the 4 measurements is below 3.5)? $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{3.5 - 3.8}{0.2/\sqrt{4}} = -3$, so $P(z < -3) \approx 0.13\%$. Note: If the problem is about the sample mean, make sure to standardize (get z) using the standard error for the sample mean.
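The closing note (standardize with the standard error when the question is about a sample mean) can be wrapped in one small helper. This is a sketch; the function name `prob_mean_below` is hypothetical, and n = 9 is added only to show the trend.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_below(threshold, mu, sigma, n=1):
    """P(sample mean of n measurements < threshold) for a Normal(mu, sigma) population."""
    se = sigma / sqrt(n)  # with n = 1 this is just sigma
    return NormalDist(mu, se).cdf(threshold)

# Misdiagnosis probability for 1, 4 and 9 measurement days.
for days in (1, 4, 9):
    p = prob_mean_below(3.5, mu=3.8, sigma=0.2, n=days)
    print(f"{days} measurement(s): P(mean < 3.5) = {p:.5f}")
```

Averaging over more days shrinks the standard error, so the chance of a misdiagnosis drops quickly.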
30 Calculation Practice: using the CLT In Chapter 4 we discussed ACT scores. We argued that because the grades are discrete numerical values over a small range, the grade distribution could not be normal. This means we cannot use the normal distribution to calculate probabilities for one randomly selected person. BUT if the sample size is large enough, we can use the normal distribution to calculate probabilities for averages. We recall that the mean ACT score is 20 with standard deviation 5. Question: 50 students are randomly selected. Calculate the probability that their average (sample mean) score will be greater than 18. Answer: The sample mean has the same mean as the original distribution, which we know is 20. The standard error of the sample mean is s.e. = 5/√50 ≈ 0.707. We use this to make the z-transform $z = \frac{18 - 20}{0.707} \approx -2.83$. Using a computer in place of the z-tables, we see that the probability is 99.7%. This means there is a very large chance that the sample mean is greater than 18.
31 Calculation Practice Let us return to the weights of calves at 0.5 weeks. The distribution is below. Looking at the plot, it seems that a normal density (with mean µ and standard deviation 7.7) is a rough approximation of the underlying distribution of calves' weights (see also the QQplot given at the end of Chapter 4). Question (a): Using the normal density, calculate the approximate probability that a calf weighs more than 100 pounds. Answer: Make a z-transform: z = (100 − µ)/7.7 = 1.28. Looking this up in the z-tables we have 90%. Therefore the approximate probability that a calf weighs more than 100 pounds is 10%.
32 Question (b): Let us suppose that the sample mean of 10 calves is taken. Using the normal approximation of the sample mean, what is the probability that the sample mean will be greater than 100 pounds? Answer: The mean of the sample mean is the same as the mean weight of calves, µ. The standard deviation of the sample mean is 7.7/√10 = 2.4. By making the z-transform we have z = (100 − µ)/2.4 = 4.12. Looking 4.12 up in the z-tables, we see that it is in the far upper tail, thus the probability is close to 0%. The sizes of the probabilities calculated in (a) and (b) are compared in the above plots.
33 Of the two probabilities calculated above, which is likely to be closest to the true probability? Both probabilities were calculated using the normal distribution. But this is only an approximation of the true distribution of calf weights and of the sample mean of calf weights. From the histogram two pages back, it appears that the density for the underlying weights of calves is only very approximately normal. Thus it is unlikely that the probability calculated for the weight of one calf is that accurate. On the other hand, the Central Limit Theorem tells us that the distribution of the sample mean gets closer to normal as the sample size grows. The second probability we calculated was based on the average weight of 10 calves. The distribution of the average is likely to be more normal than the distribution of the weight of a single calf. Thus the second probability, based on the average, is more accurate (closer to the true probability).
34 Calculation practice A farmer wants to use a vehicle to carry 30 week-old calves. The vehicle he plans to use can carry a maximum load of 2760 pounds. He knows that the mean weight of a calf is µ pounds and the standard deviation is 7.7. What is the chance the vehicle can carry the calves? We need to turn the total weight into the sample mean. We observe that requiring the total weight of 30 calves to be less than 2760 pounds is the same as requiring the sample mean weight of 30 calves to be less than 2760/30 = 92: $\sum_{i=1}^{30} X_i < 2760 \;\Rightarrow\; \bar{X} = \frac{1}{30}\sum_{i=1}^{30} X_i < 92$. Therefore, we have turned the problem from totals into averages and can apply the CLT to calculate the probability using the normal distribution.
35 Calculation practice (cont) We know from the central limit theorem that the sample mean is close to normally distributed. Thus the distribution of the sample mean is approximately normal with mean µ (the mean calf weight) and standard deviation 7.7/√30 = 1.4. We know that for the vehicle to carry the calves, the sample mean has to be less than 92 pounds. Calculate the z-transform z = (92 − µ)/1.4 = 1.35 and look up the z-tables to get 0.911. Conclusion: There is a 91.1% chance the vehicle can carry the week-old calves. In mathematical symbols: $P\left(\sum_{i=1}^{30} X_i < 2760\right) = P\left(\bar{X} = \frac{1}{30}\sum_{i=1}^{30} X_i < 92\right) = P(Z < 1.35) = 0.911$
36 How large is a large enough sample size? It depends on the population distribution. More observations are required if the population distribution has a large standard deviation or if it is far from normal. A sample size of 25 is generally enough to obtain an approximately normal sampling distribution from a population with some skewness or even mild outliers. A sample size of 40 will typically be good enough to overcome some skewness and outliers. More importantly, n should be large enough to make the standard error sufficiently small; then we can get meaningful and precise inferences. We can check this by using the Sampling Distributions applet. In many cases, even n = 40 is not large enough to give reliable enough results when there is a lot at stake. This is why clinical trials, political polls and marketing surveys typically observe hundreds or even thousands of individuals.
37 The effect of skewness on the CLT Below we look at the sample mean taken from data with a large right skew
38 The corresponding QQplot of the sample mean Observations: 1. We see that the standard error is 4.7/√40, which is as it should be. 2. However, the QQplot deviates far from normality in the tails. The distribution of the sample mean still has a slight right skew (look back at the QQplots in Chapter 4). This demonstrates that when data is highly skewed, we need a much larger sample size for the CLT to kick in. 3. Calculations based on normality of the average will not be completely correct.
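The lingering right skew can be quantified with a simulation sketch. The slide's data set is not available, so the code assumes an Exponential(1) population as a stand-in for strongly right-skewed data; `skewness` and `mean_of_skewed_sample` are hypothetical helper names. For independent draws the skewness of the sample mean shrinks like 1/√n, so it fades slowly.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(3)  # fixed seed so the run is repeatable

def mean_of_skewed_sample(n):
    # Exponential(1) is an ASSUMED stand-in for strongly right-skewed data.
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

skew_by_n = {}
for n in (1, 10, 40):
    means = [mean_of_skewed_sample(n) for _ in range(20000)]
    skew_by_n[n] = skewness(means)
    print(f"n={n:2d}: skewness of the sample mean = {skew_by_n[n]:.2f}")
```

Even at n = 40 the skewness of the sample mean is visibly above zero, which matches the deviation seen in the tails of the QQplot.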
39 Effect of binary data on the CLT Binary data arises in several situations. Examples include Male or Female, Like or Dislike: anywhere there are two possible outcomes. In this example, we have encoded one outcome as 0 and the other as 1 (it does not really matter which way). We see that the proportion in the "1" category is about 20%; this is what is meant by the mean. This data is discrete and clearly skewed.
40 The corresponding QQplot of the sample mean Observations: 1. We see that the standard error is 0.405/√50, which is as it should be. 2. However, the QQplot deviates far from normality in the tails. The lines across demonstrate that the average over 50 still takes discrete values (though not integers). We also see a U shape that shows that the sample mean is still skewed. 3. Calculations based on normality of the average will not be completely correct.
41 Example: Income distribution Let's consider the very large database of individual incomes from the Bureau of Labor Statistics as our population. Income is strongly right skewed. We take 1000 SRSs of 25 incomes, calculate the sample mean for each, and make a histogram of these 1000 means. We also take 1000 SRSs of 100 incomes, calculate the sample mean for each, and make a histogram of these 1000 means. Which histogram corresponds to samples of size 100? Which to samples of size 25?
42 So many standard deviations! In statistics we talk about different kinds of standard deviations, and it can be hard to keep track of them: s is the standard deviation of a set (sample) of data. It is a statistic we can compute once we have the data. σ is the standard deviation of a population (which is much too big to observe completely). It is a parameter; usually, we will never know its true value. σ/√n is the standard deviation of the values of x̄ from all possible random samples of size n. It refers to the sample mean, not to data. It is also called the standard error of x̄. s/√n is our estimate of σ/√n, since we do not know the value of σ. Example: From a survey of students taking statistics, n = 459 responded to the question "How many Facebook friends do you have?" The sample mean x̄ and the sample standard deviation s = 589.5 were computed from the responses. The standard error for the sample mean is s/√n = 589.5/√459 ≈ 27.5. Here x̄ is an estimate for µ, the mean of the population of all students required to take the class, and s is an estimate for the population standard deviation σ.
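The estimated standard error s/√n is a one-line computation. The sketch below first reproduces the Facebook-friends figure from the summary values on the slide, then applies a hypothetical helper `standard_error` to a small made-up data set (not survey data).

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# From the summary values on the slide: s = 589.5, n = 459.
se_facebook = 589.5 / math.sqrt(459)
print(f"standard error from summary values: {se_facebook:.1f}")

# From raw data (made-up numbers, for illustration only).
toy_sample = [2, 4, 4, 4, 5, 5, 7, 9]
se_toy = standard_error(toy_sample)
print(f"standard error from raw data:       {se_toy:.4f}")
```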
43 Summary x̄ is always unbiased for µ, even if the population's distribution is very different from a normal distribution. The standard deviation of x̄, σ/√n, measures the variability due to random sampling. If the population is approximately normal or if the sample size n is large, we can use the normal distribution to compute probabilities for x̄. We just have to remember to use σ/√n, not σ, in the denominator when calculating z. This means we can say something about how close x̄ is likely to be to µ. Generally it is quite likely (95% chance) that it will be within 2 standard errors of µ. Not all variables are normally distributed and large samples are not always attainable. In such circumstances, a statistician should be consulted for proper methods of statistical inference and calculation.
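The "within 2 standard errors" statement can be checked by simulation. This sketch assumes a Uniform(0, 1) population (clearly not normal) and n = 40; both are illustrative choices, not taken from the slides.

```python
import math
import random
import statistics

rng = random.Random(4)       # fixed seed so the run is repeatable
MU = 0.5                     # mean of Uniform(0, 1)
SIGMA = math.sqrt(1 / 12)    # SD of Uniform(0, 1)
N = 40                       # sample size (assumed for illustration)
REPS = 20000                 # number of repeated samples

se = SIGMA / math.sqrt(N)    # standard error of the sample mean

hits = 0
for _ in range(REPS):
    x_bar = statistics.fmean([rng.random() for _ in range(N)])
    if abs(x_bar - MU) < 2 * se:
        hits += 1

coverage = hits / REPS
print(f"share of sample means within 2 SE of mu: {coverage:.3f}")
```

The share comes out close to 95%, even though the individual observations are uniform rather than normal.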
44 Accompanying problems associated with this Chapter Quiz 5 Quiz 6 Homework 2, Q6. Homework 3.
More information9. Sampling Distributions
9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling
More informationJohn Kerrich s coin-tossing Experiment. Law of Averages - pg. 294 Moore s Text
Law of Averages - pg. 294 Moore s Text When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So, if the coin is tossed a large number of times, the number of heads and the
More informationStatistics courses often teach the two-sample t-test, linear regression, and analysis of variance
2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample
More informationChapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
More informationUnit 26 Estimation with Confidence Intervals
Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference
More informationCONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont
CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont To most people studying statistics a contingency table is a contingency table. We tend to forget, if we ever knew, that contingency
More informationLecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
More informationPr(X = x) = f(x) = λe λx
Old Business - variance/std. dev. of binomial distribution - mid-term (day, policies) - class strategies (problems, etc.) - exponential distributions New Business - Central Limit Theorem, standard error
More informationSKEWNESS. Measure of Dispersion tells us about the variation of the data set. Skewness tells us about the direction of variation of the data set.
SKEWNESS All about Skewness: Aim Definition Types of Skewness Measure of Skewness Example A fundamental task in many statistical analyses is to characterize the location and variability of a data set.
More informationYou flip a fair coin four times, what is the probability that you obtain three heads.
Handout 4: Binomial Distribution Reading Assignment: Chapter 5 In the previous handout, we looked at continuous random variables and calculating probabilities and percentiles for those type of variables.
More informationExperimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test
Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely
More informationIndependent samples t-test. Dr. Tom Pierce Radford University
Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of
More informationStandard Deviation Estimator
CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of
More informationSENSITIVITY ANALYSIS AND INFERENCE. Lecture 12
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationFairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
More information8. THE NORMAL DISTRIBUTION
8. THE NORMAL DISTRIBUTION The normal distribution with mean μ and variance σ 2 has the following density function: The normal distribution is sometimes called a Gaussian Distribution, after its inventor,
More informationNormal Distribution. Definition A continuous random variable has a normal distribution if its probability density. f ( y ) = 1.
Normal Distribution Definition A continuous random variable has a normal distribution if its probability density e -(y -µ Y ) 2 2 / 2 σ function can be written as for < y < as Y f ( y ) = 1 σ Y 2 π Notation:
More informationThe Math. P (x) = 5! = 1 2 3 4 5 = 120.
The Math Suppose there are n experiments, and the probability that someone gets the right answer on any given experiment is p. So in the first example above, n = 5 and p = 0.2. Let X be the number of correct
More informationNormal distribution. ) 2 /2σ. 2π σ
Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a
More informationMATH 10: Elementary Statistics and Probability Chapter 7: The Central Limit Theorem
MATH 10: Elementary Statistics and Probability Chapter 7: The Central Limit Theorem Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of slides, you
More informationCOMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012
Binary numbers The reason humans represent numbers using decimal (the ten digits from 0,1,... 9) is that we have ten fingers. There is no other reason than that. There is nothing special otherwise about
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationz-scores AND THE NORMAL CURVE MODEL
z-scores AND THE NORMAL CURVE MODEL 1 Understanding z-scores 2 z-scores A z-score is a location on the distribution. A z- score also automatically communicates the raw score s distance from the mean A
More informationSTAT 350 Practice Final Exam Solution (Spring 2015)
PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects
More informationSection 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
More information7. Normal Distributions
7. Normal Distributions A. Introduction B. History C. Areas of Normal Distributions D. Standard Normal E. Exercises Most of the statistical analyses presented in this book are based on the bell-shaped
More informationThe right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median
CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box
More informationStatistics 2014 Scoring Guidelines
AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home
More informationDescriptive Statistics
Descriptive Statistics Suppose following data have been collected (heights of 99 five-year-old boys) 117.9 11.2 112.9 115.9 18. 14.6 17.1 117.9 111.8 16.3 111. 1.4 112.1 19.2 11. 15.4 99.4 11.1 13.3 16.9
More informationTest Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but
Test Bias As we have seen, psychological tests can be well-conceived and well-constructed, but none are perfect. The reliability of test scores can be compromised by random measurement error (unsystematic
More informationIntroduction to Hypothesis Testing
I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true
More informationp ˆ (sample mean and sample
Chapter 6: Confidence Intervals and Hypothesis Testing When analyzing data, we can t just accept the sample mean or sample proportion as the official mean or proportion. When we estimate the statistics
More informationStat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015
Stat 411/511 THE RANDOMIZATION TEST Oct 16 2015 Charlotte Wickham stat511.cwick.co.nz Today Review randomization model Conduct randomization test What about CIs? Using a t-distribution as an approximation
More informationWhy Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
More informationc. Construct a boxplot for the data. Write a one sentence interpretation of your graph.
MBA/MIB 5315 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?
More informationHypothesis Testing for Beginners
Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes
More informationDescriptive Methods Ch. 6 and 7
Descriptive Methods Ch. 6 and 7 Purpose of Descriptive Research Purely descriptive research describes the characteristics or behaviors of a given population in a systematic and accurate fashion. Correlational
More informationLecture 2. Summarizing the Sample
Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting
More informationProbability and Statistics Vocabulary List (Definitions for Middle School Teachers)
Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence
More informationCenter: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)
Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center
More informationLecture Notes Module 1
Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific
More informationSolutions to Homework 6 Statistics 302 Professor Larget
s to Homework 6 Statistics 302 Professor Larget Textbook Exercises 5.29 (Graded for Completeness) What Proportion Have College Degrees? According to the US Census Bureau, about 27.5% of US adults over
More informationMATH 140 Lab 4: Probability and the Standard Normal Distribution
MATH 140 Lab 4: Probability and the Standard Normal Distribution Problem 1. Flipping a Coin Problem In this problem, we want to simualte the process of flipping a fair coin 1000 times. Note that the outcomes
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationMath 461 Fall 2006 Test 2 Solutions
Math 461 Fall 2006 Test 2 Solutions Total points: 100. Do all questions. Explain all answers. No notes, books, or electronic devices. 1. [105+5 points] Assume X Exponential(λ). Justify the following two
More informationAn Introduction to Basic Statistics and Probability
An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random
More information2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)
2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came
More informationMidterm Review Problems
Midterm Review Problems October 19, 2013 1. Consider the following research title: Cooperation among nursery school children under two types of instruction. In this study, what is the independent variable?
More informationCA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction
CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous
More informationComparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples
Comparing Two Groups Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test. The
More informationStatistics. Measurement. Scales of Measurement 7/18/2012
Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does
More informationAnswer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade
Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements
More informationClassify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous
Chapter 2 Overview Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Classify as categorical or qualitative data. 1) A survey of autos parked in
More informationMind on Statistics. Chapter 2
Mind on Statistics Chapter 2 Sections 2.1 2.3 1. Tallies and cross-tabulations are used to summarize which of these variable types? A. Quantitative B. Mathematical C. Continuous D. Categorical 2. The table
More informationLOGNORMAL MODEL FOR STOCK PRICES
LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as
More informationExploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
More informationChapter 8: Quantitative Sampling
Chapter 8: Quantitative Sampling I. Introduction to Sampling a. The primary goal of sampling is to get a representative sample, or a small collection of units or cases from a much larger collection or
More informationRelationships Between Two Variables: Scatterplots and Correlation
Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)
More informationEstimation and Confidence Intervals
Estimation and Confidence Intervals Fall 2001 Professor Paul Glasserman B6014: Managerial Statistics 403 Uris Hall Properties of Point Estimates 1 We have already encountered two point estimators: th e
More informationIntroduction to Statistics for Psychology. Quantitative Methods for Human Sciences
Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html
More informationCHAPTER 14 NONPARAMETRIC TESTS
CHAPTER 14 NONPARAMETRIC TESTS Everything that we have done up until now in statistics has relied heavily on one major fact: that our data is normally distributed. We have been able to make inferences
More information5544 = 2 2772 = 2 2 1386 = 2 2 2 693. Now we have to find a divisor of 693. We can try 3, and 693 = 3 231,and we keep dividing by 3 to get: 1
MATH 13150: Freshman Seminar Unit 8 1. Prime numbers 1.1. Primes. A number bigger than 1 is called prime if its only divisors are 1 and itself. For example, 3 is prime because the only numbers dividing
More informationMONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010
MONT 07N Understanding Randomness Solutions For Final Examination May, 00 Short Answer (a) (0) How are the EV and SE for the sum of n draws with replacement from a box computed? Solution: The EV is n times
More informationHYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...
HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men
More informationName: Date: Use the following to answer questions 3-4:
Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin
More information