Hypothesis tests, confidence intervals, and bootstrapping
- Alexia Nicholson
- 7 years ago
1 Hypothesis tests, confidence intervals, and bootstrapping
Business Statistics, Fall
2 Topics
1. Hypothesis tests
   - Testing a mean: \(H_0: \mu = \mu_0\)
   - Testing a proportion: \(H_0: p = p_0\)
   - Testing a difference in means: \(H_0: \mu_1 - \mu_2 = 0\)
   - Testing a difference in proportions: \(H_0: p_1 - p_2 = 0\)
   - Testing a difference in means: \(H_0: \mu_1 - \mu_2 = 0\) (paired sample)
   - Testing a difference in means: \(H_0: \mu_1 - \mu_2 = 0\) (same variance)
   - Simulating from a null distribution
2. Confidence intervals
3. Bootstrap confidence intervals

Read chapter 15 from Kaplan, chapter 9 in Naked Statistics and chapters 4-6 of OpenIntro.
3 Homeless guys don't wear nice shoes
The guy asking for your change outside Alinea is wearing a posh pair of loafers. Would you be willing to conclude that he's not actually homeless on the basis of this evidence? To make things numerical, assume we recognize the shoes and know for a fact they cost $285, or 5.65 log dollars. Also assume that the distribution of log-prices of homeless guys' shoes is described by \(X \sim N(3.7, 0.6^2)\). Then we find \(P(X > 4.69) = 0.05\) (using qnorm(0.05, 3.7, 0.6, lower.tail = FALSE)). So, if we call out all supposed homeless guys with shoes worth more than \(\exp(4.69)\) dollars, about $108, we'll only do so incorrectly 5% of the time. Since 5.65 > 4.69, this guy gets called out.
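The cutoff arithmetic can be reproduced in Python as a minimal sketch. Here the standard-library `statistics.NormalDist(...).inv_cdf(0.95)` plays the role of the R call `qnorm(0.05, 3.7, 0.6, lower.tail = FALSE)`. The mean 3.7 and standard deviation 0.6 are the slide's assumed null values.

```python
import math
from statistics import NormalDist

# Assumed null model for log shoe prices of homeless guys: N(3.7, 0.6^2)
null = NormalDist(mu=3.7, sigma=0.6)
alpha = 0.05

# Right-tail cutoff with P(X > cutoff) = alpha; the same number as
# qnorm(0.05, 3.7, 0.6, lower.tail = FALSE) in R
cutoff_log = null.inv_cdf(1 - alpha)
cutoff_dollars = math.exp(cutoff_log)

observed_log = math.log(285)      # the loafers cost $285
reject = observed_log > cutoff_log

print(f"cutoff: {cutoff_log:.2f} log dollars (${cutoff_dollars:.2f})")
print(f"observed: {observed_log:.2f} log dollars -> reject H0: {reject}")
```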
4 Homeless guys don't wear nice shoes
[Figure: Shoe price distribution for homeless dudes; density vs. price in dollars]
5 Homeless guys don't wear nice shoes
[Figure: Homeless guy shoe prices in log-dollars; density vs. x]
6 Homeless guys don't wear nice shoes
To turn this classification problem into a hypothesis testing problem, we must phrase the question in terms of probability distributions and their parameters. Assume the data we observe (the shoe price) was a draw from a normal probability distribution with an unknown mean, \(\mu\), and a known variance (for now), \(\sigma^2 = 0.6^2 = 0.36\). If the guy were homeless, then \(\mu = 3.7\). So we want to test the hypothesis \(H_0: \mu = \mu_0\), where \(\mu_0 = 3.7\).
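7 Logic of hypothesis tests
Consider a normal random variable \(X \sim N(\mu, \sigma^2)\).
[Figure: density of X; x-axis marked at \(\mu - 3\sigma, \mu - 2\sigma, \mu - \sigma, \mu, \mu + \sigma, \mu + 2\sigma, \mu + 3\sigma\)]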
8 Logic of hypothesis tests
Imagine observing a single draw of this random variable; call it \(x\). Assume the variance \(\sigma^2\) is known, but the mean parameter \(\mu\) is not.
[Figure: density of X; x-axis marked from \(\mu - 3\sigma\) to \(\mu + 3\sigma\)]
9 Logic of hypothesis tests
Intuitively, this single observed value \(x\) tells us something about the unknown parameter: more often than not, the observed value will tend to be close to the parameter value. Then again, sometimes it will not be. But it will only rarely be too far off.
[Figure: density of X; x-axis marked from \(\mu - 3\sigma\) to \(\mu + 3\sigma\)]
10 Logic of hypothesis tests
Assume we have a guess in mind for our true parameter value. We denote this guess by \(\mu_0\), pronounced "mew-naught." We refer to this as the null hypothesis, which we write: \(H_0: \mu = \mu_0\). The symbol \(\mu\) is the true value and \(\mu_0\) is the hypothesized value.
11 Logic of hypothesis tests
Hypothesis testing asks the following question: if the true value were \(\mu_0\), would my data be in an unlikely region? If we consider it too unlikely, we decide not to believe our hypothesis and we reject the null hypothesis.
[Figure: density of X under \(H_0\); x-axis marked from \(\mu_0 - 3\sigma\) to \(\mu_0 + 3\sigma\)]
12 Logic of hypothesis tests
On the other hand, if the data fall in a likely region, we decide our hypothesis was plausible and we fail to reject the null hypothesis.
[Figure: density of X under \(H_0\); x-axis marked from \(\mu_0 - 3\sigma\) to \(\mu_0 + 3\sigma\)]
13 Level of tests
Where do we put the rejection region? In general, it depends on the problem (more on that in a minute). But one thing is always true: the probability of the rejection region (the area under the curve) dictates how often we will falsely reject the null hypothesis. This probability is called the level of the test.
[Figure: density of X under \(H_0\) with a shaded rejection region; x-axis marked from \(\mu_0 - 3\sigma\) to \(\mu_0 + 3\sigma\)]
14 Level of tests
This is because even when the null hypothesis is true, we still end up in unusual regions sometimes. How often this happens is exactly the level of the test.
[Figure: density of X under \(H_0\); x-axis marked from \(\mu_0 - 3\sigma\) to \(\mu_0 + 3\sigma\)]
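The claim that the level equals the false-rejection rate can be checked by simulation. The sketch below uses a hypothetical helper, `false_rejection_rate`, and only Python's standard library. It draws many observations with the null hypothesis true and counts how often they land in a right-tail rejection region. The standard normal null and \(\alpha = 0.05\) are illustrative choices, not values from the slides.

```python
import random
from statistics import NormalDist

def false_rejection_rate(mu0=0.0, sigma=1.0, alpha=0.05, reps=100_000, seed=1):
    """Simulate draws under H0 and return the fraction landing in a right-tail region of level alpha."""
    rng = random.Random(seed)
    cutoff = NormalDist(mu0, sigma).inv_cdf(1 - alpha)  # start of the rejection region
    hits = sum(rng.gauss(mu0, sigma) > cutoff for _ in range(reps))
    return hits / reps

rate = false_rejection_rate()
print(f"fraction of null draws in the rejection region: {rate:.3f}")
```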
15 Where to put the rejection region
One way to think about rejection regions is in terms of alternative hypotheses, such as \(H_A: \mu > \mu_0\). I prefer to think of it the other way around: where we place our rejection region dictates what the alternative hypothesis is, because it determines what counts as unusual.
[Figure: right-tail rejection region; x-axis marked from \(\mu_0 - 3\sigma\) to \(\mu_0 + 3\sigma\)]
16 Where to put the rejection region
For \(H_A: \mu < \mu_0\) the rejection region is on the other side.
[Figure: left-tail rejection region; x-axis marked from \(\mu_0 - 3\sigma\) to \(\mu_0 + 3\sigma\)]
In all of the pictures so far, the level of the test has been \(\alpha = 0.05\). There is nothing special about that number.
17 Where to put the rejection region
We could even have a rejection region in a small sliver around the null hypothesis value. Perhaps this would reflect evidence of cheating of some sort: the data fit too well.
[Figure: rejection region in a narrow band around \(\mu_0\); x-axis marked from \(\mu_0 - 3\sigma\) to \(\mu_0 + 3\sigma\)]
18 More than one observation
To apply this logic to more than one data point, we simply collapse our data into a single number, or statistic, and figure out the sampling distribution of this statistic. Then we proceed as before. In this lecture we will use sample means as our test statistic. Conveniently, if we have \(n\) samples, each drawn independently as \(X_i \overset{iid}{\sim} N(\mu, \sigma^2)\), we have the result that \(\bar{X} \sim N(\mu, \sigma^2/n)\), where \(\bar{X} = n^{-1}\sum_{i=1}^{n} X_i\).
19 Did something change?
You have implemented a new incentive policy with your sales force and you want to measure whether the new policy is translating into increased sales. Previously, sales hovered around $50,000 a week, with a standard deviation of $6,000. The first five weeks have produced the following sales figures (in thousands of dollars): [61, 52, 48, 43, 65]. Do you reject the null hypothesis that nothing has changed?
20 Did something change?
Our test statistic is \(\bar{X}\). Under the null distribution, \(\bar{X} \sim N(50, 6^2/5)\). We observe \(\bar{x} = (61 + 52 + 48 + 43 + 65)/5 = 53.8\). We want our unusual region to be unusually high sales. At a level of 10%, the rejection region starts at 53.44, so we reject. At a level of 5%, the rejection region starts at 54.4, so we fail to reject.
21 Did something change?
[Figure: null density of the sample mean; x-axis: x]
The empirical (sample) mean falls in the 10% rejection region, but not in the 5% rejection region.
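The sales example is summarized below as a minimal Python sketch. It assumes, as the slides do, a known standard deviation of 6 (thousand dollars) and a right-tail alternative. It computes the sample mean, the start of each rejection region, and the one-sided p-value.

```python
import math
from statistics import NormalDist, mean

sales = [61, 52, 48, 43, 65]  # first five weeks, thousands of dollars
mu0, sigma = 50, 6            # null mean and (assumed known) standard deviation
n = len(sales)

xbar = mean(sales)
null_xbar = NormalDist(mu0, sigma / math.sqrt(n))  # sampling distribution of the mean under H0

for alpha in (0.10, 0.05):
    start = null_xbar.inv_cdf(1 - alpha)  # right-tail rejection region starts here
    print(f"level {alpha:.0%}: region starts at {start:.2f}, reject: {xbar > start}")

p_value = 1 - null_xbar.cdf(xbar)  # one-sided (right-tail) p-value
print(f"xbar = {xbar}, one-sided p-value = {p_value:.3f}")
```

The p-value lands between 0.05 and 0.10, which is the same conclusion as comparing against the two rejection regions.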
22 p-values
The smallest level at which we would reject, given our observed value, is called the p-value of the data. In other words, the p-value is the probability, assuming the null hypothesis is true, of seeing data as extreme as, or more extreme than, the data actually observed. So the p-value will change depending on the shape of the rejection region. A p-value larger than the level of a test implies that you fail to reject; a p-value smaller than the level of a test implies that you reject.
23 Application to a proportion
We ask \(n = 50\) cola drinkers if they prefer Coke to Pepsi; 28 say they do. Can we reject the null hypothesis that the two brands have evenly split the local market? We can approach this problem using a normal approximation to the binomial. Under the null distribution, the proportion of Coke drinkers has an approximate \(N(0.5, 0.5(1 - 0.5)/50)\) distribution. We observe \(\bar{x} = 28/50 = 0.56\). The p-value is the area under the curve less than 0.44 and greater than 0.56.
24 Coke vs Pepsi
The p-value is an uncompelling 40%.
[Figure: approximate null density of the sample proportion; x-axis: x]
What would happen if we had the same observed proportion of Coke drinkers, but a sample size of 200?
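25 Coke vs Pepsi
At \(n = 200\) we have reduced our standard deviation by a factor of 2 (because \(\sqrt{200/50} = 2\)). Our p-value drops to about 0.09.
[Figure: approximate null density of the sample proportion at n = 200; x-axis: x]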
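Both Coke vs Pepsi p-values can be computed with a short Python sketch using the normal approximation to the binomial. The test is two-sided because the null hypothesis (an even split) has no direction, and the slides count both tails. The helper name `two_sided_p_value` is hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_p_value(p_hat, p0, n):
    """Two-sided p-value for a proportion via the normal approximation to the binomial."""
    se = math.sqrt(p0 * (1 - p0) / n)   # sd of the sample proportion under H0
    z = abs(p_hat - p0) / se
    return 2 * (1 - NormalDist().cdf(z))  # area in both tails

p_hat = 28 / 50
for n in (50, 200):
    print(f"n = {n}: p-value = {two_sided_p_value(p_hat, 0.5, n):.3f}")
```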
26 Variance unknown
So far we have been considering normal hypothesis tests when the variance \(\sigma^2\) is known. Very often it is unknown. But if we have a sample of reasonable size (say, more than 30), then we can use a plug-in estimate without much inaccuracy. That is, we use the empirical standard deviation (the sample standard deviation) as if it were our known standard deviation: we treat \(\hat{\sigma}\) as if it were \(\sigma\).
27 Are mountain people taller?
It is claimed that individuals from the Eastern mountains are much taller, on average, than city dwellers, who are known to have an average height of 67 inches. Of course, some mountain people are hobbits, so obviously there is a lot of variability. Based on a sample of 35 mountain people, we find \(\bar{x} = 73\) and \(\hat{\sigma} = 12.6\). Can we reject the null hypothesis that there is no difference in height?
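28 Are mountain people taller?
We assume that our test statistic is distributed \(\bar{X} \sim N(67, 12.6^2/35)\) under the null distribution.
[Figure: null density of the sample mean; x-axis: x]
Our p-value (one-sided, since the claim is that mountain people are taller) is about 0.002.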
29 Z scores
In normal hypothesis tests where the rejection region is in the tail, we are essentially measuring the distance of our observed measurement from the mean under the null distribution. How far is too far is determined by the level of our test and by the standard deviation under the null. To get a sense of how far into the tail an observation is, we can standardize it. If \(X \sim N(\mu, \sigma^2)\), then \((X - \mu)/\sigma \sim N(0, 1)\). Applying this idea to a normal test statistic tells us how many standard deviations away from the mean our observed value is. In this last example we would get \(z = (\bar{x} - 67)/(12.6/\sqrt{35}) = (73 - 67)/2.13 \approx 2.82\).
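The mountain-people z score and its one-sided p-value can be computed as a minimal sketch. It assumes the plug-in standard deviation \(\hat{\sigma} = 12.6\), the value used in the normal interval on the bootstrap slide, and treats it as known, as the "variance unknown" slide suggests.

```python
import math
from statistics import NormalDist

xbar, mu0 = 73, 67       # sample mean and null mean, inches
sigma_hat, n = 12.6, 35  # plug-in standard deviation (assumed 12.6) and sample size

se = sigma_hat / math.sqrt(n)      # standard error of the mean
z = (xbar - mu0) / se              # standardized distance from the null mean
p_value = 1 - NormalDist().cdf(z)  # one-sided: H_A says mountain people are taller
print(f"se = {se:.2f}, z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```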
30 Z scores
The usefulness of this approach is mainly that we can remember a few special rejection regions:
\(P(Z > 2.33) = 1\%\), \(P(Z > 1.64) = 5\%\), \(P(Z > 1.28) = 10\%\).
These define rejection regions for one-sided tests at the 1%, 5% and 10% levels, respectively. (Include a negative sign as the circumstances require.) The analogous two-sided thresholds are given by
\(P(Z > 2.57) = 0.5\%\), \(P(Z > 1.96) = 2.5\%\), \(P(Z > 1.64) = 5\%\).
We arrive at these numbers by dividing the test level by 2 and putting half of it in the left tail and half of it in the right tail.
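The memorized thresholds are standard normal quantiles, and a minimal Python sketch can regenerate them. A one-sided cutoff puts the whole level in one tail. A two-sided cutoff splits it in half. The function names are illustrative, not from the slides.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def one_sided_cutoff(level):
    """z with P(Z > z) = level: all of the level in one tail."""
    return Z.inv_cdf(1 - level)

def two_sided_cutoff(level):
    """z with P(|Z| > z) = level: half of the level in each tail."""
    return Z.inv_cdf(1 - level / 2)

for level in (0.01, 0.05, 0.10):
    print(f"level {level:.0%}: one-sided {one_sided_cutoff(level):.3f}, "
          f"two-sided {two_sided_cutoff(level):.3f}")
```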
31 Difference of two means
A common use of hypothesis testing is to compare the means between two groups based on observed data from each group. For example, we may want to compare a drug to a placebo pill in terms of how much it reduces a patient's weight. In this case we have \(X_i \sim N(\mu_X, \sigma_X^2)\) and \(Y_j \sim N(\mu_Y, \sigma_Y^2)\) for \(i = 1, \ldots, n\) and \(j = 1, \ldots, m\). Our test statistic in this case will be \(\bar{X} - \bar{Y}\), the difference in the observed sample means.
32 Better than a placebo?
Our test is \(H_0: \mu_X - \mu_Y = 0\), \(H_A: \mu_X > \mu_Y\), which defines a rejection region in the right tail. The test statistic has null distribution \(\bar{X} - \bar{Y} = D \sim N(0, \sigma_X^2/n + \sigma_Y^2/m)\), which we approximate as \(N(0, \hat{\sigma}_X^2/n + \hat{\sigma}_Y^2/m)\).
33 Better than a placebo?
We observe that 34 patients receiving treatment have a mean reduction in weight of 5 pounds, with a standard deviation of 4 pounds. The 60 patients in the placebo group show a mean reduction in weight of 3 pounds, with a standard deviation of 6 pounds. Can we reject the null hypothesis at the 5% level? In this case
\(z = \dfrac{(5 - 3) - 0}{\sqrt{4^2/34 + 6^2/60}} = 1.933,\)
so we reject at the 5% level because \(P(Z > 1.933) < 5\%\). If this were a 5% two-sided test, would we reject?
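The placebo comparison can be sketched as a two-sample z statistic with plug-in standard deviations. The helper `two_sample_z` is a hypothetical name. The sketch also compares \(z\) against the two-sided 5% cutoff to address the slide's closing question.

```python
import math
from statistics import NormalDist

def two_sample_z(mean_x, sd_x, n_x, mean_y, sd_y, n_y, diff0=0.0):
    """z statistic for H0: mu_X - mu_Y = diff0, using plug-in standard deviations."""
    se = math.sqrt(sd_x**2 / n_x + sd_y**2 / n_y)
    return (mean_x - mean_y - diff0) / se

# Treatment: 34 patients, mean loss 5 lb, sd 4; placebo: 60 patients, mean loss 3 lb, sd 6
z = two_sample_z(5, 4, 34, 3, 6, 60)
Z = NormalDist()
print(f"z = {z:.3f}")
print("reject, one-sided 5%:", z > Z.inv_cdf(0.95))
print("reject, two-sided 5%:", abs(z) > Z.inv_cdf(0.975))
```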
34 Difference in proportions
Suppose we try to address the Coke/Pepsi local market share with a different kind of survey, in which we conduct two separate polls and ask each person either "Do you regularly drink Coke?" or "Do you regularly drink Pepsi?". With this setup we want to know if \(p_X = p_Y\). Suppose we ask 40 people the Coke question and 53 people the Pepsi question. In this case, under the null, the observed difference in proportions has approximate distribution \(D \sim N(0, s^2)\), where
\(s = \sqrt{\dfrac{p_X(1 - p_X)}{40} + \dfrac{p_Y(1 - p_Y)}{53}}.\)
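35 Difference in proportions
In practice we have to use the plug-in estimate
\(\hat{s} = \sqrt{\dfrac{\hat{p}_X(1 - \hat{p}_X)}{40} + \dfrac{\hat{p}_Y(1 - \hat{p}_Y)}{53}}.\)
If 30 out of 40 people say that they regularly drink Coke and 30 out of 53 people say they regularly drink Pepsi, do we reject the null hypothesis at the 10% level? We find
\(\hat{s} = \sqrt{\dfrac{0.75(1 - 0.75)}{40} + \dfrac{0.566(1 - 0.566)}{53}} \approx 0.0966,\)
so \(z = \dfrac{(0.75 - 0.566) - 0}{\hat{s}} \approx 1.91\). Do we reject at the 5% level?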
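The difference-in-proportions arithmetic can be checked with a short Python sketch. It assumes a two-sided test, since the null \(p_X = p_Y\) has no direction, and it checks the 10% and 5% levels the slide asks about.

```python
import math
from statistics import NormalDist

p_coke, n_coke = 30 / 40, 40    # Coke poll
p_pepsi, n_pepsi = 30 / 53, 53  # Pepsi poll

# Plug-in standard error of the difference in sample proportions
s_hat = math.sqrt(p_coke * (1 - p_coke) / n_coke + p_pepsi * (1 - p_pepsi) / n_pepsi)
z = (p_coke - p_pepsi - 0) / s_hat

Z = NormalDist()
print(f"s_hat = {s_hat:.4f}, z = {z:.2f}")
for level in (0.10, 0.05):
    print(f"two-sided {level:.0%}: reject = {abs(z) > Z.inv_cdf(1 - level / 2)}")
```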
36 Paired samples
As a final variant, sometimes data from two groups come paired, which changes yet again how we approximate our variance term. Suppose I want to know which route to work is faster. On each day in February I take the Lakeshore route and my colleague takes the MLK route. I average 22.2 minutes with a standard deviation of 3.75 minutes, and he averages 20.8 minutes with a standard deviation of 3.96 minutes. Which way is faster?
37 Paired sample
Because the samples are paired (in the sense of each pair happening on the same day), we can directly approximate the variance of the difference \(D_i = X_i - Y_i\) by
\(\hat{\sigma}_D^2 = n^{-1}\sum_{i=1}^{n} (d_i - \bar{d})^2\), where \(d_i = x_i - y_i\).
So the extra bit of information we need is that the standard deviation of the daily difference between the commute times was 4 minutes. In this case we have, under the null, \(\bar{D} = \bar{X} - \bar{Y} \sim N(0, 4^2/28)\), and the observed difference \(\bar{d} = 1.4\) leads to
\(z = \dfrac{\bar{d} - 0}{4/\sqrt{28}} \approx 1.85.\)
We cannot reject at the 5% level (two-sided, since we did not know in advance which route would be faster).
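The paired test is sketched below in Python. The slides give only summary numbers (28 days, mean difference 1.4 minutes, standard deviation of differences 4 minutes), so the z value is computed from those. The hypothetical helper `paired_z` shows how the same statistic would come from raw paired data. It uses `statistics.stdev`, which divides by \(n - 1\) rather than \(n\), a negligible difference at this sample size.

```python
import math
from statistics import NormalDist, mean, stdev

def paired_z(x, y):
    """Paired z statistic from raw data: mean daily difference over its standard error."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# From the slide's summary numbers: n = 28 days, mean difference 1.4, sd of differences 4
z = 1.4 / (4 / math.sqrt(28))
reject = abs(z) > NormalDist().inv_cdf(0.975)  # two-sided 5% test
print(f"z = {z:.2f}, reject at two-sided 5%: {reject}")
```

Given raw commute times for each day, `paired_z(lakeshore_times, mlk_times)` would return the same kind of statistic directly (those lists are hypothetical; the slides do not include the raw data).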
38 Practical vs. statistical significance
Suppose that the average commute times differ by 2 seconds. Do we care? If we have enough observations, we will eventually reject the null hypothesis and conclude that they are statistically significantly different. However, that result says nothing of the effect size, the magnitude of the difference. This inability of the statistical machinery to distinguish between the two types of significance is sometimes called Lindley's paradox: with enough data, you'll reject any hypothesis at all!
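To put a number on "enough observations", here is a hedged Python sketch. It asks how many paired days would make a true 2-second difference significant in a two-sided 5% z-test. It assumes, purely for illustration, that the standard deviation of daily differences stays at the 4 minutes from the commute example. The helper `days_needed` is hypothetical.

```python
import math
from statistics import NormalDist

def days_needed(effect, sd, alpha=0.05):
    """Smallest n for which a mean difference of `effect` is significant (two-sided z-test)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject when effect / (sd / sqrt(n)) > z_crit, i.e. n > (z_crit * sd / effect) ** 2
    return math.floor((z_crit * sd / effect) ** 2) + 1

n_needed = days_needed(effect=2 / 60, sd=4)  # 2 seconds, expressed in minutes
print(f"about {n_needed} paired days, roughly {n_needed / 365:.0f} years of commuting")
```

The answer is tens of thousands of days. A tiny effect becomes "significant" only at an enormous sample size, which is why significance alone says nothing about practical importance.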
39 Power (see section 4.6 in OpenIntro)
[Figure: probability of rejecting vs. underlying mean for two-sided and one-sided (right tail) tests; se = 1, 5% level]
The probability of rejecting the null hypothesis is called the power of the test. It depends on the actual underlying value. The level of a test is precisely the power of the test when the null hypothesis is true.
40 Power
[Figure: probability of rejecting vs. underlying mean for two-sided and one-sided (right tail) tests; se = 0.1, 5% level]
The power function gets more pointed around the null hypothesis value as the sample size gets larger (which makes the standard error smaller).
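The power curves in the two figures can be computed directly, as in this minimal Python sketch. It assumes the test statistic is \(N(\mu, se^2)\) with null value \(\mu_0 = 0\), and uses a hypothetical `power` function for z-tests at the 5% level. At \(\mu = \mu_0\) the power equals the level, and shrinking the standard error from 1 to 0.1 sharpens the curve.

```python
from statistics import NormalDist

Z = NormalDist()

def power(mu, mu0=0.0, se=1.0, alpha=0.05, sided="two"):
    """P(reject H0: mu = mu0) for a z-test when the statistic is N(mu, se^2)."""
    shift = (mu - mu0) / se          # true mean measured in standard errors
    if sided == "right":             # reject when z > z_{1 - alpha}
        return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - shift)
    c = Z.inv_cdf(1 - alpha / 2)     # two-sided: reject when |z| > c
    return (1 - Z.cdf(c - shift)) + Z.cdf(-c - shift)

for se in (1.0, 0.1):
    two = [round(power(mu, se=se), 3) for mu in (0.0, 0.25, 0.5, 1.0)]
    right = [round(power(mu, se=se, sided="right"), 3) for mu in (0.0, 0.25, 0.5, 1.0)]
    print(f"se = {se}: two-sided {two}, one-sided right {right}")
```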
41 Other test statistics
Nothing obliges us to use the sample mean as our test statistic, other than convenience. A manufacturing facility produces 15 golf carts a day. Some days it produces more, some days fewer. We can model this variability with a normal distribution with mean 15 and standard deviation 3. The last 25 production days saw unusually low production numbers. Has the production facility changed: has the \(N(15, 3^2)\) description of the production variability become something left-skewed?
42 Skewed golf cart production
To test this hypothesis we consider the difference between the mean number of carts produced in the last period (13.96) and the median (15). We simulate (using R) from the null hypothesis (nothing has changed) and find the 0.5%, 1%, 2.5% and 5% quantiles of the distribution of the difference \(D\). How many total carts were produced in the last production period? What is the statistic used in this hypothesis test?
43 Skewed golf cart production
Given the simulated null quantiles (0.5%, 1%, 2.5%, 5%), is a one-sided or a two-sided test more appropriate? Would we reject the null hypothesis at the 1% level?
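The slide's null quantiles came from an R simulation. Here is a hedged Python sketch of the same idea of simulating from a null distribution. The hypothetical helper draws many 25-day periods from \(N(15, 3^2)\) and records mean minus median for each. The replication count and seed are arbitrary, so the quantiles it produces are approximate and need not match the slide's values.

```python
import random
from statistics import median, pstdev

def simulate_mean_minus_median(n=25, mu=15, sigma=3, reps=20_000, seed=42):
    """Draw `reps` periods of n days from N(mu, sigma^2); record mean - median for each."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        days = [rng.gauss(mu, sigma) for _ in range(n)]
        stats.append(sum(days) / n - median(days))
    return stats

null_stats = sorted(simulate_mean_minus_median())
# Lower-tail empirical quantiles (left skew pushes the mean below the median)
lower = {q: null_stats[int(q * len(null_stats))] for q in (0.005, 0.01, 0.025, 0.05)}

observed = 13.96 - 15  # last period: sample mean minus sample median
print({f"{q:.1%}": round(v, 3) for q, v in lower.items()})
print(f"observed D = {observed:.2f}, null sd ~ {pstdev(null_stats):.3f}")
```

Comparing `observed` with the entries of `lower` gives the one-sided decision at each level.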
44 Confidence intervals
A confidence interval consists of two numbers, a lower bound and an upper bound. The idea is to give a range of plausible values for the true (unobserved, underlying) parameter. The basic goal is to achieve coverage: we want our interval to capture the true value a specified fraction of the time, such as 95%. One way to guarantee this is to make our intervals huge, but we can be more clever by using ideas from hypothesis testing.
45 Confidence intervals
We consider normal confidence intervals for simplicity. Let \(X \sim N(\mu, \sigma^2)\). We know the following facts (with a bit of algebra):
\(P(\mu - 1.96\sigma < X < \mu + 1.96\sigma) = 0.95\)
\(P(-1.96\sigma < X - \mu < 1.96\sigma) = 0.95\)
\(P(-X - 1.96\sigma < -\mu < -X + 1.96\sigma) = 0.95\)
\(P(X - 1.96\sigma < \mu < X + 1.96\sigma) = 0.95\)
This tells us that the random interval \((X - 1.96\sigma, X + 1.96\sigma)\) will cover the mean 95% of the time.
46 Confidence interval
This means that for normally distributed data, if we construct our interval estimate as \(x \pm 1.96\sigma\), where \(x\) is our observed data, we will cover the true value 95% of the time. Naturally, we can apply this to \(\bar{X}\) as well, yielding an interval of the form \(\bar{x} \pm 1.96\,\sigma/\sqrt{n}\).
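The coverage claim can be checked by brute force, as in this hedged Python sketch. It repeatedly samples \(n\) normal observations, builds \(\bar{x} \pm z\,\sigma/\sqrt{n}\), and counts how often the interval contains the true mean. The values of \(\mu\), \(\sigma\) and \(n\) are arbitrary illustrative choices.

```python
import math
import random
from statistics import NormalDist

def coverage(mu=10.0, sigma=2.0, n=20, level=0.95, reps=20_000, seed=7):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # 1.96 when level = 0.95
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        hits += xbar - half_width < mu < xbar + half_width
    return hits / reps

print(f"95% intervals: coverage {coverage():.3f}")
print(f"90% intervals: coverage {coverage(level=0.90):.3f}")
```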
47 Confidence interval
If we use a different number instead of 1.96, we can get different levels of coverage. For instance, an interval of the form \(x \pm 1.64\sigma\) has 90% coverage. You will notice a straightforward relationship between confidence intervals and (two-sided) hypothesis tests:
- A null hypothesis value \(\mu_0\) outside the confidence interval implies an observation inside the rejection region.
- A null hypothesis value \(\mu_0\) inside the confidence interval implies an observation outside of the rejection region.
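The duality between intervals and tests can be illustrated with a minimal Python sketch that reuses the sales numbers (\(\bar{x} = 53.8\), \(\sigma = 6\), \(n = 5\)). Note that it uses a two-sided test, whereas the sales slide used a one-sided one. For each candidate \(\mu_0\), "outside the 95% interval" and "rejected by the two-sided 5% test" agree.

```python
import math
from statistics import NormalDist

Z = NormalDist()
xbar, sigma, n = 53.8, 6, 5   # sales example: sample mean, known sd, sample size
se = sigma / math.sqrt(n)
z_crit = Z.inv_cdf(0.975)

ci = (xbar - z_crit * se, xbar + z_crit * se)  # 95% confidence interval for mu

def rejects(mu0, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 at level alpha."""
    return 2 * (1 - Z.cdf(abs(xbar - mu0) / se)) < alpha

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
for mu0 in (45, 48, 50, 55, 59, 60):
    outside = not (ci[0] < mu0 < ci[1])
    print(f"mu0 = {mu0}: outside CI = {outside}, test rejects = {rejects(mu0)}")
```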
48 Asymmetric confidence intervals
Confidence intervals don't have to be symmetric. By the same algebra as before,
\(P(\mu - a < X < \mu + b) = 0.95\)
\(P(-a < X - \mu < b) = 0.95\)
\(P(-X - a < -\mu < -X + b) = 0.95\)
\(P(X - b < \mu < X + a) = 0.95\)
49 Asymmetric confidence intervals
Such a confidence interval is based on an asymmetric rejection region.
[Figure: density of X with an asymmetric rejection region; x-axis marked from \(\mu - 3\sigma\) to \(\mu + 3\sigma\)]
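Here is one concrete way to pick \(a\) and \(b\), sketched in Python under an illustrative assumption: the uncovered 5% is split as 1% in the left tail and 4% in the right tail. Then \(a\) and \(b\) are normal quantiles, the interval is \((x - b, x + a)\), and a quick simulation checks that it still covers \(\mu\) about 95% of the time.

```python
import random
from statistics import NormalDist

Z = NormalDist()
sigma = 1.0
left_tail, right_tail = 0.01, 0.04       # illustrative split of the 5% not covered
a = Z.inv_cdf(1 - left_tail) * sigma     # P(X < mu - a) = 1%
b = Z.inv_cdf(1 - right_tail) * sigma    # P(X > mu + b) = 4%

def interval(x):
    """Asymmetric 95% interval (x - b, x + a) for mu from one draw x ~ N(mu, sigma^2)."""
    return x - b, x + a

rng = random.Random(3)
mu, reps = 5.0, 50_000
hits = 0
for _ in range(reps):
    lo, hi = interval(rng.gauss(mu, sigma))
    hits += lo < mu < hi
print(f"a = {a:.3f}, b = {b:.3f}, coverage = {hits / reps:.3f}")
```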
50 Simulation demo
51 Bootstrapping
Recall the idea of bootstrapping introduced last lecture. To get a sense of our sampling variability, we simply resample our data (with replacement) to get a sample of the same size \(n\). We compute our estimate on this sample over and over (thousands of times) and visualize the results in a histogram. We can use this approach to construct a bootstrap confidence interval: an interval which contains a fraction \(1 - \alpha\) of the bootstrap estimates.
52 Bootstrapping a mean
Let's try this idea out on our mountain people example.
[Figure: histogram; x-axis: height in inches, y-axis: frequency]
A normal confidence interval would be \(73 \pm 1.96(12.6/\sqrt{35})\).
53 Bootstrap samples
Here is the code. There are dedicated R packages for bootstrapping, but this one is by hand.
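A by-hand percentile bootstrap can be sketched in a few lines of Python using only the standard library. This is not the slide's R code, and the helper names are hypothetical. The 35 measured heights are not available here, so the sketch draws a made-up stand-in sample from \(N(73, 12.6^2)\). Its intervals will therefore not match the numbers on the next slide.

```python
import math
import random
from statistics import mean, stdev

def bootstrap_means(data, reps=10_000, seed=0):
    """Resample `data` with replacement (same size n) `reps` times; return each resample's mean."""
    rng = random.Random(seed)
    n = len(data)
    return [sum(rng.choices(data, k=n)) / n for _ in range(reps)]

def percentile_interval(estimates, alpha=0.05):
    """Interval holding the middle 1 - alpha of the estimates (percentile bootstrap)."""
    ordered = sorted(estimates)
    k = round(alpha / 2 * len(ordered))  # number of estimates cut from each tail
    return ordered[k], ordered[len(ordered) - 1 - k]

# Hypothetical stand-in for the 35 measured heights (not the lecture's data)
rng = random.Random(2024)
heights = [rng.gauss(73, 12.6) for _ in range(35)]

se = stdev(heights) / math.sqrt(len(heights))
normal_ci = (mean(heights) - 1.96 * se, mean(heights) + 1.96 * se)
boot_ci = percentile_interval(bootstrap_means(heights))
print("normal:   ", tuple(round(v, 2) for v in normal_ci))
print("bootstrap:", tuple(round(v, 2) for v in boot_ci))
```

For a sample mean from roughly normal data, the two intervals should be close, which is the pattern the side-by-side comparison shows.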
54 Bootstrap confidence intervals
Here is a side-by-side comparison of the two results.
[Figure: the two intervals side by side; axis: inches]
We find (68.63, 77.17) with the standard approach (black) and (68.75, 77.01) with the bootstrap approach (red).
More informationChapter 2. Hypothesis testing in one population
Chapter 2. Hypothesis testing in one population Contents Introduction, the null and alternative hypotheses Hypothesis testing process Type I and Type II errors, power Test statistic, level of significance
More informationHow To Check For Differences In The One Way Anova
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way
More informationSTAT 350 Practice Final Exam Solution (Spring 2015)
PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects
More informationBNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I
BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential
More informationThe Normal distribution
The Normal distribution The normal probability distribution is the most common model for relative frequencies of a quantitative variable. Bell-shaped and described by the function f(y) = 1 2σ π e{ 1 2σ
More informationMULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
STATISTICS/GRACEY PRACTICE TEST/EXAM 2 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Identify the given random variable as being discrete or continuous.
More informationConfidence Intervals for One Standard Deviation Using Standard Deviation
Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from
More informationSession 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationFairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationSTATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS
STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS 1. If two events (both with probability greater than 0) are mutually exclusive, then: A. They also must be independent. B. They also could
More informationIntroduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.
Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative
More informationAn Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS
The Islamic University of Gaza Faculty of Commerce Department of Economics and Political Sciences An Introduction to Statistics Course (ECOE 130) Spring Semester 011 Chapter 10- TWO-SAMPLE TESTS Practice
More informationWeek 4: Standard Error and Confidence Intervals
Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.
More information1 Nonparametric Statistics
1 Nonparametric Statistics When finding confidence intervals or conducting tests so far, we always described the population with a model, which includes a set of parameters. Then we could make decisions
More informationStatistical tests for SPSS
Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly
More informationc. Construct a boxplot for the data. Write a one sentence interpretation of your graph.
MBA/MIB 5315 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?
More informationUnderstanding Confidence Intervals and Hypothesis Testing Using Excel Data Table Simulation
Understanding Confidence Intervals and Hypothesis Testing Using Excel Data Table Simulation Leslie Chandrakantha lchandra@jjay.cuny.edu Department of Mathematics & Computer Science John Jay College of
More informationHYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...
HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men
More informationTutorial 5: Hypothesis Testing
Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................
More informationPermutation Tests for Comparing Two Populations
Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of
More informationDef: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.
Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.
More informationStatistics 2014 Scoring Guidelines
AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home
More informationA POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment
More informationTwo-sample inference: Continuous data
Two-sample inference: Continuous data Patrick Breheny April 5 Patrick Breheny STA 580: Biostatistics I 1/32 Introduction Our next two lectures will deal with two-sample inference for continuous data As
More informationReview. March 21, 2011. 155S7.1 2_3 Estimating a Population Proportion. Chapter 7 Estimates and Sample Sizes. Test 2 (Chapters 4, 5, & 6) Results
MAT 155 Statistical Analysis Dr. Claude Moore Cape Fear Community College Chapter 7 Estimates and Sample Sizes 7 1 Review and Preview 7 2 Estimating a Population Proportion 7 3 Estimating a Population
More informationChapter 27: Taxation. 27.1: Introduction. 27.2: The Two Prices with a Tax. 27.2: The Pre-Tax Position
Chapter 27: Taxation 27.1: Introduction We consider the effect of taxation on some good on the market for that good. We ask the questions: who pays the tax? what effect does it have on the equilibrium
More informationMultivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
More informationHYPOTHESIS TESTING: POWER OF THE TEST
HYPOTHESIS TESTING: POWER OF THE TEST The first 6 steps of the 9-step test of hypothesis are called "the test". These steps are not dependent on the observed data values. When planning a research project,
More informationClass 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
More informationStatistics 100 Sample Final Questions (Note: These are mostly multiple choice, for extra practice. Your Final Exam will NOT have any multiple choice!
Statistics 100 Sample Final Questions (Note: These are mostly multiple choice, for extra practice. Your Final Exam will NOT have any multiple choice!) Part A - Multiple Choice Indicate the best choice
More informationThe Wilcoxon Rank-Sum Test
1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We
More informationTwo-Sample T-Tests Allowing Unequal Variance (Enter Difference)
Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption
More informationCharacteristics of Binomial Distributions
Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation
More informationQuestion: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?
ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the
More information1. How different is the t distribution from the normal?
Statistics 101 106 Lecture 7 (20 October 98) c David Pollard Page 1 Read M&M 7.1 and 7.2, ignoring starred parts. Reread M&M 3.2. The effects of estimated variances on normal approximations. t-distributions.
More informationColored Hats and Logic Puzzles
Colored Hats and Logic Puzzles Alex Zorn January 21, 2013 1 Introduction In this talk we ll discuss a collection of logic puzzles/games in which a number of people are given colored hats, and they try
More informationSKEWNESS. Measure of Dispersion tells us about the variation of the data set. Skewness tells us about the direction of variation of the data set.
SKEWNESS All about Skewness: Aim Definition Types of Skewness Measure of Skewness Example A fundamental task in many statistical analyses is to characterize the location and variability of a data set.
More information