# Hypothesis Testing


## 1 Introduction

This document is a short tutorial on hypothesis testing. It presents the basic concepts and definitions, along with some frequently asked questions associated with hypothesis testing. Most of the material is taken directly from either Chapter 4 of Scharf [3] or Chapter 10 of Wasserman [4].

## 2 Hypotheses

Let $X$ be a random vector with range $\chi$ and distribution $F_\theta(x)$. The parameter $\theta$ belongs to the parameter space $\Theta$. Let $\Theta = \Theta_0 \cup \Theta_1 \cup \dots \cup \Theta_{M-1}$ be a disjoint covering of the parameter space. We define $H_i$ as the hypothesis that $\theta \in \Theta_i$. An $M$-ary hypothesis test chooses which of the $M$ disjoint subsets contains the unknown parameter $\theta$. When $M = 2$ we have a binary hypothesis test. For the remainder of this document we discuss only binary hypothesis tests ($H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$).

### 2.1 Null and Alternative Hypotheses

In many binary hypothesis tests $H_0$ is referred to as the null hypothesis and $H_1$ as the alternative hypothesis. This is because in most binary tests $H_0$ is set up to be refuted in order to support the alternative hypothesis $H_1$; that is, $H_0$ usually represents the absence of some effect, factor, or condition. For example, when testing whether a new drug is better than a placebo at relieving a set of symptoms, the null hypothesis $H_0$ says the new drug has the same effect as the placebo. With tests of this form it is common to talk about a hypothesis test in terms of accepting or rejecting the null hypothesis.

### 2.2 Simple vs. Composite

When $\Theta_i$ contains a single element $\theta_i$, hypothesis $H_i$ is said to be simple; otherwise it is composite. A binary hypothesis test can be simple vs. simple, simple vs. composite, composite vs. simple, or composite vs. composite.
Here are some simple examples:

- $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$ (simple vs. simple)
- $H_0: \theta = 0$ versus $H_1: \theta \neq 0$ (simple vs. composite)
- $H_0: \theta < 0$ versus $H_1: \theta > 0$ (composite vs. composite)

### 2.3 One-Sided and Two-Sided Tests

A hypothesis test is considered two-sided if it is of the form

$$H_0: \theta = \theta_0 \text{ versus } H_1: \theta \neq \theta_0,$$

where the alternative hypothesis $H_1$ lies on both sides of $H_0$. A test of the form

$$H_0: \theta \leq \theta_0 \text{ versus } H_1: \theta > \theta_0$$

or

$$H_0: \theta \geq \theta_0 \text{ versus } H_1: \theta < \theta_0$$

is called a one-sided test. Note that one-sided and two-sided tests are only defined for scalar parameter spaces, and at least one hypothesis must be composite.

### 2.4 Frequentist vs. Bayesian View of Hypotheses

Thus far we have discussed hypothesis testing in terms of determining in which subset of a parameter space an unknown $\theta$ lies. The classical/frequentist approach to hypothesis testing treats $\theta$ as deterministic but unknown. A Bayesian approach treats $\theta$ as a random variable and assumes there is a distribution on the possible $\theta$ in the parameter space; that is, one can define a prior on each hypothesis being true. Discussion of the advantages and disadvantages of each view is spread throughout the following sections.

## 3 Binary Hypothesis Testing

Given a sample $x$ of a random vector $X$ whose range is $\chi$ and whose distribution is $F_\theta(x)$, a binary hypothesis test ($H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$) takes the form

$$\phi(x) = \begin{cases} 0 \;(H_0), & x \in A \\ 1 \;(H_1), & x \in A^c \end{cases} \tag{1}$$

This equation is read as: the test function $\phi(x)$ equals 0, and the hypothesis $H_0$ is accepted, if the measurement $x$ lies in the acceptance region $A$ (where $A \subset \chi$). If the measurement lies outside this region, the test function equals 1, hypothesis $H_0$ is rejected, and $H_1$ is accepted. Usually the region $A$ is of the form

$$A = \{x : T(x) < c\} \tag{2}$$

where $T$ is a test statistic and $c$ is a critical value. The trick is to find an appropriate test statistic $T$ and an appropriate critical value $c$. We will be more explicit about what "appropriate" means in the following sections.

### 3.1 Type I and Type II Errors

There are two types of errors a binary hypothesis test can make. A type I error, or false alarm, occurs when $H_0$ is true but $x \in A^c$; that is, the test chooses $H_1$ when $H_0$ is true. A type II error, or miss, occurs when $H_1$ is true but $x \in A$; that is, the test chooses $H_0$ when $H_1$ is true.
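The two error types can be estimated by simulation. The sketch below is an assumption, not an example from the text: a single observation $X \sim N(\theta, 1)$, $H_0: \theta = 0$ versus $H_1: \theta = 1$, with acceptance region $A = \{x : x < c\}$ and $c = 0.5$.

```python
import random

# Hypothetical setup (not from the text): X ~ N(theta, 1), one observation,
# H0: theta = 0 versus H1: theta = 1, acceptance region A = {x : x < 0.5}.
random.seed(0)
c = 0.5
trials = 100_000

# Type I error (false alarm): H0 is true but x falls in A^c.
type_I = sum(random.gauss(0.0, 1.0) >= c for _ in range(trials)) / trials

# Type II error (miss): H1 is true but x falls in A.
type_II = sum(random.gauss(1.0, 1.0) < c for _ in range(trials)) / trials

print(f"estimated P(type I)  = {type_I:.3f}")   # approx 1 - Phi(0.5) = 0.309
print(f"estimated P(type II) = {type_II:.3f}")  # approx Phi(-0.5)    = 0.309
```

By symmetry of this particular setup (the threshold sits midway between the two means), both error probabilities come out equal; moving $c$ trades one error type against the other.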
### 3.2 Size and Power

If $H_0$ is simple ($\Theta_0 = \{\theta_0\}$), the size or probability of false alarm is

$$\alpha = P_{\theta_0}(\phi(x) = 1) = E_{\theta_0}[\phi(x)] = P_{FA}, \tag{3}$$

where $E_{\theta_0}[\phi(x)]$ indicates that $\phi(x)$ is averaged under the density function $f_{\theta_0}(x)$. If $H_0$ is composite, the size is defined to be

$$\alpha = \sup_{\theta \in \Theta_0} E_\theta[\phi(x)]. \tag{4}$$

The size is the worst-case probability of choosing $H_1$ when $H_0$ is true. A test is said to have level $\alpha$ if its size is less than or equal to $\alpha$.

A hit or detection occurs when $H_1$ is true and $x \in A^c$. If $H_1$ is simple, the power or probability of detection is

$$\beta = P_{\theta_1}(\phi(x) = 1) = E_{\theta_1}[\phi(x)] = P_D. \tag{5}$$
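For a simple-vs.-simple Gaussian test, size and power have closed forms in terms of the normal CDF. The setup below (unit-variance Gaussians, means 0 and 2, critical value $c = 1$) is an assumed illustration, not from the text.

```python
from statistics import NormalDist

# Assumed simple-vs-simple example (not from the text): one observation
# X ~ N(theta, 1), H0: theta = 0 versus H1: theta = 2, and the test
# rejects H0 when x >= c, i.e. A = {x : x < c} with c = 1.
theta0, theta1, c = 0.0, 2.0, 1.0

size  = 1.0 - NormalDist(theta0, 1.0).cdf(c)  # Equation (3): P_{theta0}(phi(x) = 1)
power = 1.0 - NormalDist(theta1, 1.0).cdf(c)  # Equation (5): P_{theta1}(phi(x) = 1)

print(f"size  alpha = {size:.4f}")   # 1 - Phi(1)  = 0.1587
print(f"power beta  = {power:.4f}")  # 1 - Phi(-1) = 0.8413
```

Sliding $c$ up shrinks the size but also the power; tracing out the resulting $(\alpha, \beta)$ pairs is exactly how a ROC curve is produced.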

If $H_1$ is composite, the power is defined for each $\theta \in \Theta_1$ as $\beta(\theta)$. In fact, everything can be defined in terms of this power function:

$$\beta(\theta) = P_\theta(\phi(x) = 1) = P_\theta(x \in A^c). \tag{6}$$

That is, the power of a composite test is defined for each $\theta \in \Theta_1$, and the size can be written as

$$\alpha = \sup_{\theta \in \Theta_0} \beta(\theta). \tag{7}$$

A receiver operating characteristic (ROC) curve is a plot of $\beta$ versus $\alpha$ (for a simple vs. simple hypothesis test). Usually multiple $(\beta, \alpha)$ pairs are obtained by adjusting the threshold/critical value $c$ in Equation 2.

### 3.3 Bias

A test $\phi(x)$ is said to be unbiased if its power is never smaller than its size. That is,

$$\beta(\theta) \geq \alpha \quad \forall \theta \in \Theta_1. \tag{8}$$

### 3.4 Best and Uniformly Most Powerful Tests

For a simple versus simple binary hypothesis test, $\phi(x)$ is the best test of size $\alpha$ if it has the most power among all tests of size $\alpha$. That is, if $\phi(x)$ and $\phi'(x)$ are two competing tests each of which has size $\alpha$, then $\beta \geq \beta'$. The best test maximizes the probability of detection (power) for a fixed probability of false alarm (size). The Neyman-Pearson lemma will show us the form of the best test for a fixed $\alpha$ in the next section.

A test $\phi(x)$ is said to be uniformly most powerful (UMP) of size $\alpha$ if it has size $\alpha$ and its power is uniformly (for all $\theta$) greater than the power of any other test $\phi'(x)$ whose size is less than or equal to $\alpha$. That is,

$$\sup_{\theta \in \Theta_0} E_\theta[\phi(x)] = \alpha, \quad \sup_{\theta \in \Theta_0} E_\theta[\phi'(x)] \leq \alpha \;\implies\; E_\theta[\phi(x)] \geq E_\theta[\phi'(x)] \quad \forall \theta \in \Theta_1.$$

In general a UMP test may be difficult to find or may not exist. One strategy for proving a test is UMP is to find the best test for a particular $\theta$ and then show that the test does not depend on $\theta$. The Karlin-Rubin theorem shows how to obtain the UMP test for certain one-sided hypothesis tests (see [3]).

## 4 Neyman-Pearson Lemma

The Neyman-Pearson lemma shows how to find the most powerful or best test of size $\alpha$ when $H_0$ and $H_1$ are both simple. The lemma tells us the appropriate test statistic $T$ to maximize the power for a fixed $\alpha$. The test is a slight generalization of the test defined in Equation 1.
The lemma states that

$$\phi(x) = \begin{cases} 1, & f_{\theta_1}(x) > k f_{\theta_0}(x) \\ \gamma, & f_{\theta_1}(x) = k f_{\theta_0}(x) \\ 0, & f_{\theta_1}(x) < k f_{\theta_0}(x) \end{cases} \tag{9}$$

or alternatively

$$\phi(x) = \begin{cases} 1, & T(x) > k \\ \gamma, & T(x) = k \\ 0, & T(x) < k \end{cases} \tag{10}$$

for some $k \geq 0$ and $0 \leq \gamma \leq 1$, is the most powerful test of size $\alpha > 0$ for testing $H_0$ versus $H_1$. Here $T(x) = f_{\theta_1}(x)/f_{\theta_0}(x) = L(x)$ is called the likelihood ratio. When $\phi(x)$ is 1 or 0 the test is the same as in Equation 1. However, when $\phi(x) = \gamma$ we flip a $\gamma$-coin to select $H_1$ with probability $\gamma$ (when the coin comes up heads).
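A minimal sketch of the randomized test in Equations (9)–(10), under an assumed Gaussian setup that is not from the text ($X \sim N(\theta, 1)$, $H_0: \theta = 0$, $H_1: \theta = 2$). Here $L(x) = e^{2x - 2}$ is continuous and strictly increasing in $x$, so $P(L(x) = k) = 0$, the $\gamma$-coin essentially never fires, and the size-$\alpha$ threshold has a closed form.

```python
import math
import random
from statistics import NormalDist

# Assumed example (not from the text): one observation X ~ N(theta, 1),
# H0: theta = 0 versus H1: theta = 2, so L(x) = exp(2x - 2).
theta0, theta1 = 0.0, 2.0

def likelihood_ratio(x: float) -> float:
    return math.exp((theta1 - theta0) * x - (theta1**2 - theta0**2) / 2.0)

def np_test(x: float, k: float, gamma: float = 0.0) -> int:
    """The randomized test of Equations (9)-(10): 1 chooses H1, 0 chooses H0."""
    L = likelihood_ratio(x)
    if L > k:
        return 1
    if L < k:
        return 0
    return 1 if random.random() < gamma else 0  # boundary: flip the gamma-coin

# Pick k to achieve size alpha: L(x) is monotone in x, so thresholding L at k
# is the same as thresholding x at c, where alpha = P_{theta0}(x > c).
alpha = 0.05
c = NormalDist(theta0, 1.0).inv_cdf(1.0 - alpha)  # c = Phi^{-1}(0.95) = 1.6449
k = likelihood_ratio(c)

print(np_test(3.0, k))   # L(3) = e^4, well above k -> 1
print(np_test(0.0, k))   # L(0) = e^-2, below k     -> 0
```

The monotone-likelihood-ratio shortcut used here (threshold $x$ instead of $L(x)$) is exactly the structure the Karlin-Rubin theorem exploits for one-sided tests.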

*Proof.* Consider any test $\phi'(x)$ such that $\alpha' \leq \alpha$. By the construction of $\phi$,

$$[\phi(x) - \phi'(x)][f_{\theta_1}(x) - k f_{\theta_0}(x)] \geq 0 \tag{11}$$

pointwise; integrating over $x$ gives

$$\beta - \beta' \geq k(\alpha - \alpha') \tag{12}$$

$$\geq 0. \tag{13}$$

### 4.1 Setting the size to $\alpha$

The question remains of how to set $k$ to produce a test of size $\alpha$. The size of this test is

$$\alpha = E_{\theta_0}[\phi(x)] = P_{\theta_0}[L(x) > k] + \gamma P_{\theta_0}[L(x) = k]. \tag{14}$$

If there exists a $k_0$ such that

$$P_{\theta_0}[L(x) > k_0] = \alpha, \tag{15}$$

then we set $\gamma = 0$ and pick $k = k_0$. Otherwise there exists a $k_0$ such that

$$P_{\theta_0}[L(x) > k_0] \leq \alpha < P_{\theta_0}[L(x) \geq k_0]. \tag{16}$$

We then use $k = k_0$ and choose the $\gamma$ that solves

$$\gamma P_{\theta_0}[L(x) = k_0] = \alpha - P_{\theta_0}[L(x) > k_0]. \tag{17}$$

### 4.2 Interpretation

So the Neyman-Pearson lemma tells us that the best/most powerful test for a fixed size $\alpha$ is one that makes decisions by thresholding the likelihood ratio $L(x)$. We refer to such tests as likelihood ratio tests (LRTs). Note that the test statistic is the likelihood ratio $L(x)$, which is itself a random variable. If $L(x) = k$ with probability zero (which is usually the case for continuous $x$), the threshold $k$ is found from

$$\alpha = P_{\theta_0}[L(x) > k] = \int_k^\infty f_{\theta_0}(L)\,dL, \tag{18}$$

where $f_{\theta_0}(L)$ is the density function of $L(x)$ under $H_0$.

## 5 Bayesian Hypothesis Testing

In the previous sections we discussed simple binary hypothesis testing in the following framework: given a measurement $x$ drawn from the distribution $F_\theta(x)$, how do we choose whether $\theta = \theta_0$ or $\theta = \theta_1$? We defined hypotheses $H_0: \theta = \theta_0$ and $H_1: \theta = \theta_1$ and looked for a test of $H_1$ versus $H_0$ that is optimal. We talked about optimality in terms of maximizing the power ($\beta$) of such a test for a fixed size $\alpha$. The parameter $\theta$ (and the hypotheses) are treated as deterministic but unknown quantities; that is, either $H_0$ or $H_1$ is true and we don't know which. We don't have any prior knowledge of how likely $H_0$ or $H_1$ is to occur, or how likely any parameter choice is. Note that the power and size are both defined in terms of one of the hypotheses being true.
The Bayesian approach to hypothesis testing treats $\theta$ and the hypothesis $H$ as unknown random variables. Here we introduce the random variable $H$ to represent the hypothesis: $H = i$ means hypothesis $H_i$ is true. We can think of the test function $\phi$ as an estimator of $H$.

The conceptual framework is as follows. Mother Nature selects $H$ and the parameter $\theta$ from a joint distribution $f(\theta, H) = p(H) f(\theta \mid H)$. She does this by first choosing $H$ according to $p(H)$ and then picking a $\theta$ according to $f(\theta \mid H)$. Note that $f(\theta \mid H = i) = 0$ for all $\theta \notin \Theta_i$. Her selection determines from which distribution $F_\theta(x)$ Father Nature draws his measurement. This measurement is given to the experimenter, who must estimate the value of $H$ via a decision function $\phi$. Each time the experiment is run, a parameter $\theta$ is chosen by Mother Nature, and the

experimenter outputs a decision $\hat{H} = \phi(x)$. The goal in Bayesian hypothesis testing is to design a test $\phi$ that is optimal, i.e., gives the best performance. Here optimality is described in terms of the Bayes risk, defined below.

To be more concrete, let us consider the simple versus simple binary hypothesis test we have been discussing so far. Let $p(H = H_0) = p_0$ and $p(H = H_1) = 1 - p_0$. Since the hypotheses are simple, $f(\theta \mid H = H_i) = \delta(\theta - \theta_i)$. Mother Nature selects $H$ according to $p(H)$; in the simple binary case this is equivalent to picking $\theta = \theta_0$ with probability $p_0$ and $\theta = \theta_1$ with probability $p_1 = 1 - p_0$. Depending on her choice, Father Nature then draws a measurement $x$ from either $F_{\theta_0}(x)$ or $F_{\theta_1}(x)$. Our goal is to obtain $\hat{H} = \phi(x)$ that minimizes the Bayes risk.

### 5.1 Cost of Decisions

A cost or loss function is defined for each possible pairing of the true hypothesis $H$ and the decision $\hat{H} = \phi(x)$. That is, for the pair $(H, \phi(x)) = (i, j)$ we assign a nonnegative cost $C[H = i, \phi(x) = j] = c_{ij}$. We say $c_{ij}$ is the loss incurred when Mother Nature selects hypothesis $H_i$ and the experimenter decides to choose $H_j$. So for a simple binary hypothesis test we give values for $c_{00}$, $c_{11}$, $c_{01}$, and $c_{10}$. Normally we do not associate a loss/cost with making a correct decision, i.e. $c_{00} = c_{11} = 0$.

### 5.2 Risk

We define the risk $R(H_i, \phi(x))$ for each $H_i$ as the expected loss given $H = H_i$ for a particular test function $\phi$:

$$R(H, \phi) = E_x[C[H, \phi(x)]] = \begin{cases} c_{00} P_{00} + c_{01} P_{01}, & \theta = \theta_0 \\ c_{10} P_{10} + c_{11} P_{11}, & \theta = \theta_1 \end{cases} \tag{19}$$

where $P_{ij} = p(\phi(x) = j \mid H = H_i)$, which is equivalent to $P_{\theta_i}(\phi(x) = j)$ in the simple binary hypothesis case.

### 5.3 Bayes Risk

The Bayes risk is the average risk over the distribution of $H$ that Mother Nature used (for the binary case, only $P(H = 0) = p_0$ is needed):
$$R(p_0, \phi) = E_H[R(H, \phi)] = p_0 R(H = 0, \phi) + (1 - p_0) R(H = 1, \phi) \tag{20}$$

Given that we know the prior $p_0$, the optimal test $\phi$ is defined to be the one that minimizes the Bayes risk:

$$\phi = \arg\min_{\phi'} R(p_0, \phi') \tag{21}$$

It turns out that the solution to this optimization has the form (see the notes or Scharf [3], Chapter 5):

$$\phi(x) = \begin{cases} 1, & L(x) > \eta \\ 0, & L(x) < \eta \end{cases} \tag{22}$$

where

$$L(x) = \frac{f(x \mid H = 1)}{f(x \mid H = 0)} = \frac{\int_\theta f(x \mid H = 1, \theta) f(\theta \mid H = 1)\,d\theta}{\int_\theta f(x \mid H = 0, \theta) f(\theta \mid H = 0)\,d\theta} \tag{23}$$

is the likelihood ratio, which for the simple versus simple case is

$$L(x) = \frac{f(x \mid \theta_1)}{f(x \mid \theta_0)} = \frac{f_{\theta_1}(x)}{f_{\theta_0}(x)}, \tag{24}$$

and the threshold is

$$\eta = \frac{p_0 (c_{10} - c_{00})}{(1 - p_0)(c_{01} - c_{11})}. \tag{25}$$
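The threshold in Equation (25) is easy to compute once priors and costs are fixed. The setup below is an assumed illustration (Gaussian likelihoods, 0/1 costs, prior $p_0 = 0.7$), not an example from the text.

```python
import math

# Hypothetical Bayes LRT (not from the text): X ~ N(theta, 1) with
# H0: theta = 0, H1: theta = 2, prior p0 = P(H = 0) = 0.7, and 0/1 costs
# c00 = c11 = 0, c01 = c10 = 1.
p0 = 0.7
c00, c01, c10, c11 = 0.0, 1.0, 1.0, 0.0
theta0, theta1 = 0.0, 2.0

# Threshold from Equation (25).
eta = (p0 * (c10 - c00)) / ((1.0 - p0) * (c01 - c11))

def likelihood_ratio(x: float) -> float:
    """Equation (24) for unit-variance Gaussian densities."""
    return math.exp((theta1 - theta0) * x - (theta1**2 - theta0**2) / 2.0)

def bayes_test(x: float) -> int:
    """Equation (22): choose H1 (return 1) iff L(x) > eta."""
    return 1 if likelihood_ratio(x) > eta else 0

print(f"eta = {eta:.4f}")  # 0.7 / 0.3 = 2.3333
print(bayes_test(2.0))     # L(2) = e^2, above eta  -> 1
print(bayes_test(0.0))     # L(0) = e^-2, below eta -> 0
```

With 0/1 costs, $\eta$ reduces to the prior odds $p_0 / (1 - p_0)$: the less likely $H_1$ is a priori, the more evidence the data must supply before we choose it.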

So once again we see that the optimal test is a likelihood ratio test; here, optimal means minimizing the Bayes risk. The test statistic is the likelihood ratio, and the acceptance region depends on the threshold $\eta$, which is based on the priors and decision costs.

Minimizing risk sounds like the right way to think about these problems. However, this approach requires us to have some knowledge of the prior on $\theta$ and to be able to assign a cost to each possible outcome. This may be difficult in some applications; for example, what is the prior probability of a missile coming toward your home? We will quickly discuss what can be done when we can assign costs to decisions but don't know the a priori probabilities of each hypothesis in the Minimax section below. When we don't know the prior AND cannot think of meaningful costs, we go back to Neyman-Pearson testing.

## 6 Test Statistics and Sufficiency

When we treat the hypothesis $H$ as a random variable, we can consider the following Markov chain:

$$H \to X \to T(X)$$

where $T(X)$ is some test statistic. A test statistic is said to be sufficient for $H$ if

$$p(H \mid X) = p(H \mid T(X)).$$

That is, if $T(X)$ is sufficient, it tells us everything we need to know about the observation $x$ in order to estimate $H$, and the Markov chain can be written as $H \to T(X) \to X$. Note that the likelihood ratio $L(X)$ was the optimal test statistic for our hypothesis tests: it takes our $K$-dimensional observation $x$ and maps it to a scalar value. It can be shown that $L(X)$ is a sufficient statistic for $H$. This was just a quick note; more details can be found in the notes.

## 7 Minimax Tests

As we alluded to before, we may be able to associate a cost with each of the possible outcomes of a hypothesis test but have no idea what the prior on $H$ is (or, even worse, the full $f(\theta, H)$). In such cases we can play a game in which we assume Mother Nature is really mean and will choose a prior that makes whatever test we choose look bad.
That is, for a simple versus simple binary hypothesis test she achieves

$$\max_{p_0} \min_\phi R(p_0, \phi). \tag{26}$$

To combat this, we will try to find a $\phi$ that minimizes the worst she can do:

$$\min_\phi \max_{p_0} R(p_0, \phi). \tag{27}$$

Section 5.3 of Scharf [3] shows how to find such a minimax detector/test function $\phi$. It is also shown that

$$\max_{p_0} \min_\phi R(p_0, \phi) = \min_\phi \max_{p_0} R(p_0, \phi). \tag{28}$$

This topic is also discussed in the notes.

## 8 Generalized Likelihood Ratio Test

As discussed before, when dealing with composite hypotheses in the Neyman-Pearson framework we wish to find the UMP test for a fixed size $\alpha$ (put your Bayes hat away for a bit). However, it is typically the case that such a test does not exist. The generalized likelihood ratio test is a way to deal with general composite hypothesis tests. Again we focus on the binary case with $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$. The generalized likelihood ratio is a test statistic of the following form:

$$L_G(x) = \frac{\sup_{\theta \in \Theta_1} f(x \mid \theta)}{\sup_{\theta \in \Theta_0} f(x \mid \theta)} \tag{29}$$

We see that for a simple versus simple hypothesis test $L_G(x) = L(x)$. This test statistic is rather intuitive: if the numerator of the fraction in Equation 29 is greater than the denominator, then the data is best explained when

$\theta \in \Theta_1$. If the opposite is true, the data is best explained by $\theta \in \Theta_0$. Instead of Equation 29, one generally uses the test statistic

$$\lambda(x) = \frac{\sup_{\theta \in \Theta} f(x \mid \theta)}{\sup_{\theta \in \Theta_0} f(x \mid \theta)} \tag{30}$$

where $\Theta = \Theta_0 \cup \Theta_1$. It can be shown in general that $\lambda(x) = \max(L_G(x), 1)$. This may seem a little strange in that $\Theta$ and $\Theta_0$ are nested (they are not disjoint). Much more can be said about this, but for now I will simply paraphrase Wasserman's tutorial on the subject [4]: using $\lambda(x)$ instead of $L_G(x)$ has little effect in practice, and the theoretical properties of $\lambda(x)$ are usually much easier to obtain.

So the generalized likelihood ratio test gives us a test statistic that makes some intuitive sense. We can threshold this statistic and calculate its size and power if we can derive its distribution under each hypothesis.

Now let's put our Bayes hat back on. We showed before that the likelihood ratio test minimizes the Bayes risk and that for a composite test the likelihood ratio is

$$L(x) = \frac{\int_\theta f(x \mid H = 1, \theta) f(\theta \mid H = 1)\,d\theta}{\int_\theta f(x \mid H = 0, \theta) f(\theta \mid H = 0)\,d\theta} \tag{31}$$

That is, we should integrate over all possible values of $\theta$ under each hypothesis. However, typically our parameter space is extremely large. If we assume that $f(x \mid H = 1, \theta) f(\theta \mid H = 1)$ contains a large peak (looks like a delta function) at $\hat{\theta}_1 = \arg\max_{\theta \in \Theta_1} f(x \mid H = 1, \theta) f(\theta \mid H = 1)$, and likewise $f(x \mid H = 0, \theta) f(\theta \mid H = 0)$ peaks at $\hat{\theta}_0$, we can approximate the likelihood ratio as

$$\hat{L}(x) = \frac{f(x \mid \hat{\theta}_1) f(\hat{\theta}_1 \mid H = 1)}{f(x \mid \hat{\theta}_0) f(\hat{\theta}_0 \mid H = 0)} = \frac{\max_{\theta \in \Theta_1} f(x \mid H = 1, \theta) f(\theta \mid H = 1)}{\max_{\theta \in \Theta_0} f(x \mid H = 0, \theta) f(\theta \mid H = 0)} \tag{32}$$

which gives one possible interpretation of the generalized likelihood ratio $L_G(x)$ (ignoring the details involved with max and sup).

## 9 Tests of Significance

In the Neyman-Pearson testing framework one fixes the size $\alpha$ of the test. However, different people may have different criteria for choosing an appropriate size.
One experimenter may be happy setting the size to $\alpha = 0.05$ while another demands a smaller $\alpha$. In such cases it is possible that one experimenter accepts $H_0$ while the other rejects it when given the same data $x$. Only reporting whether $H_0$ was accepted or rejected may not be very informative in such cases. If the experimenters both use the same test statistic (i.e., both do a likelihood ratio test), it may be more useful for them to report the outcome of their experiment in terms of the significance probability or p-value of the test (also referred to as the observed size). We give a formal definition of the p-value below.

### 9.1 P-value

If a test rejects $H_0$ at a level $\alpha$, it will also reject at any level $\alpha' > \alpha$. Remember that when a test rejects $H_0$ at level $\alpha$, it means its size is less than or equal to $\alpha$ (i.e., if we say a test rejects at level 0.05, its size is $\alpha \leq 0.05$). The p-value is the smallest $\alpha$ at which a test rejects $H_0$. Suppose that for every $\alpha \in (0, 1)$ we have a size-$\alpha$ test with rejection region $A^c_\alpha$; then

$$\text{p-value} = \inf\{\alpha : T(x) \in A^c_\alpha\}. \tag{33}$$

That is, the p-value is the smallest level $\alpha$ at which an experimenter using the test statistic $T$ would reject $H_0$ on the basis of the observation $x$. That definition requires a lot of thought to work through. It may be easier to understand what a p-value is if we explain how to calculate one:

$$\text{p-value} = \sup_{\theta \in \Theta_0} P_\theta(T(X) \geq T(x)) \tag{34}$$

where $x$ is the observed value of $X$. If $H_0$ is a simple hypothesis with $\Theta_0 = \{\theta_0\}$, then

$$\text{p-value} = P_{\theta_0}(T(X) \geq T(x)) = P(T(X) \geq T(x) \mid H_0). \tag{35}$$

It is important to remember that $T(X)$ is a random variable with a particular distribution under $H_0$, while $T(x)$ is a number: the value of the test statistic for the observed $x$. In the case of a simple $H_0$, the p-value is the probability of obtaining a test statistic value greater than or equal to the one you observed when $H_0$ is true. Another way to look at it is that the p-value is the size of a test using your observed $T(x)$ as the threshold for rejecting $H_0$.

If the test statistic $T(X)$ has a continuous distribution, then under a simple $H_0: \theta = \theta_0$ the p-value has a uniform distribution between 0 and 1. If we reject $H_0$ when the p-value is less than $\alpha$, then the probability of false alarm (the size of the test) is $\alpha$. That is, we can set up a test of size $\alpha$ by making the test function

$$\phi(x) = \begin{cases} 1, & \text{p-value} \leq \alpha \\ 0, & \text{p-value} > \alpha \end{cases} \tag{36}$$

So a small p-value is strong evidence against $H_0$. However, note that a large p-value is NOT strong evidence in favor of $H_0$: a large p-value can mean $H_0$ is true, OR $H_0$ is false but the test has low power $\beta$. It is also important to note that the p-value is not the probability that the null hypothesis is true; that is, in almost every case the p-value $\neq p(H_0 \mid x)$. We show one exception in the next section.

### 9.2 P-values for One-Sided Tests

Let's look at a one-sided test where $\Theta \subset \mathbb{R}$ and $\Theta_0$ lies entirely to one side of $\Theta_1$. In this case p-values will sometimes have a Bayesian justification. For example, if $X \sim N(\theta, \sigma^2)$ and $p(\theta) = 1$, then $p(\theta \mid x)$ is $N(x, \sigma^2)$. We test $H_0: \theta \leq \theta_0$ against $H_1: \theta > \theta_0$:

$$p(H_0 \mid x) = p(\theta \leq \theta_0 \mid x) = \Phi\left(\frac{\theta_0 - x}{\sigma}\right). \tag{37}$$

The p-value is

$$\text{p-value} = P_{\theta_0}(X \geq x) = 1 - \Phi\left(\frac{x - \theta_0}{\sigma}\right) = p(H_0 \mid x) \tag{38}$$

because $\Phi$ is symmetric.

## 10 Permutation Tests

We showed how to calculate a p-value or significance in the previous section.
This calculation requires knowing the distribution (the CDF) of the test statistic $T(X)$ under $H_0$. However, in many cases it may be difficult to obtain this distribution (e.g., the distribution of $X$ may be unknown). Permutation tests are tests based on non-parametric estimates of significance; they do not rely on the distribution of the test statistic. The basic procedure is as follows:

1. Compute the observed value of the test statistic, $t_{obs} = T(x)$.
2. Obtain a new sample $x^s$ that obeys the null hypothesis $H_0$ via a resampling function $\pi$; that is, $x^s = \pi(x)$.
3. Compute $t^s = T(x^s)$.
4. Repeat steps 2 and 3 $B$ times and let $t_1, \dots, t_B$ denote the resulting values.
5. Calculate an approximate $\hat{p}\text{-value} = \frac{1}{B} \sum_{j=1}^B I(t_j > t_{obs})$, where $I(\text{true}) = 1$ and $I(\text{false}) = 0$.
6. Reject $H_0$ (choose $H_1$) if the $\hat{p}$-value is at most $\alpha$.

Step 5 uses the empirical cumulative distribution obtained from the samples in step 2 to estimate the p-value. The question remains what we mean by "obeys the null hypothesis" in step 2. The resampling function $\pi$ obeys $H_0$ if each new sample $x^s$ is equally likely when $H_0$ is true. Take for example an observation $x = (y_1, y_2, \dots, y_N, z_1, \dots, z_M)$, which is $N$ observations of some random variable $Y$ and $M$ observations of the random variable $Z$. If $H_0$ is that both $Z$ and $Y$ have the same mean, one possible test statistic would be

$$T(x) = \frac{1}{N} \sum_{i=1}^N x_i - \frac{1}{M} \sum_{j=N+1}^{N+M} x_j,$$

and we let $\pi(x)$ produce a new sample that is a random permutation of the order of the elements of $x$. There are $(N + M)!$ possible permutations, each of which is equally likely under $H_0$.

A simple introduction to permutation tests can be found in [1]. In [2], Joseph P. Romano shows that for any finite set of transformations $\pi \in \Pi$ that map $\chi$ onto itself, and for which $\pi(X)$ and $X$ have the same distribution under $H_0$, the testing procedure described above produces a test of size $\alpha$.

## References

[1] P. Good. *Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses*. Springer-Verlag.

[2] J. P. Romano. On the behavior of randomization tests without a group invariance assumption. *Journal of the American Statistical Association*, 85(411).

[3] L. Scharf. *Statistical Signal Processing: Detection, Estimation, and Time Series Analysis*. Addison-Wesley Publishing Company.

[4] L. Wasserman. *All of Statistics: A Concise Course in Statistical Inference*. Springer-Verlag New York, Inc.


### Hypothesis testing and the error of the third kind

Psychological Test and Assessment Modeling, Volume 54, 22 (), 9-99 Hypothesis testing and the error of the third kind Dieter Rasch Abstract In this note it is shown that the concept of an error of the

### Chapter 9 Testing Hypothesis and Assesing Goodness of Fit

Chapter 9 Testing Hypothesis and Assesing Goodness of Fit 9.1-9.3 The Neyman-Pearson Paradigm and Neyman Pearson Lemma We begin by reviewing some basic terms in hypothesis testing. Let H 0 denote the null

### Non-Inferiority Tests for Two Means using Differences

Chapter 450 on-inferiority Tests for Two Means using Differences Introduction This procedure computes power and sample size for non-inferiority tests in two-sample designs in which the outcome is a continuous

### Statistical Inference: Hypothesis Testing

Statistical Inference: Hypothesis Testing Scott Evans, Ph.D. 1 The Big Picture Populations and Samples Sample / Statistics x, s, s 2 Population Parameters μ, σ, σ 2 Scott Evans, Ph.D. 2 Statistical Inference

### Hypothesis testing S2

Basic medical statistics for clinical and experimental research Hypothesis testing S2 Katarzyna Jóźwiak k.jozwiak@nki.nl 2nd November 2015 1/43 Introduction Point estimation: use a sample statistic to

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Inference of Probability Distributions for Trust and Security applications

Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations

### Null Hypothesis Significance Testing Signifcance Level, Power, t-tests. 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom

Null Hypothesis Significance Testing Signifcance Level, Power, t-tests 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom Simple and composite hypotheses Simple hypothesis: the sampling distribution is

### Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

### Classical Hypothesis Testing and GWAS

Department of Computer Science Brown University, Providence sorin@cs.brown.edu September 15, 2014 Outline General Principles Classical statistical hypothesis testing involves the test of a null hypothesis

### 3.6: General Hypothesis Tests

3.6: General Hypothesis Tests The χ 2 goodness of fit tests which we introduced in the previous section were an example of a hypothesis test. In this section we now consider hypothesis tests more generally.

### TRANSCRIPT: In this lecture, we will talk about both theoretical and applied concepts related to hypothesis testing.

This is Dr. Chumney. The focus of this lecture is hypothesis testing both what it is, how hypothesis tests are used, and how to conduct hypothesis tests. 1 In this lecture, we will talk about both theoretical

### 38. STATISTICS. 38. Statistics 1. 38.1. Fundamental concepts

38. Statistics 1 38. STATISTICS Revised September 2013 by G. Cowan (RHUL). This chapter gives an overview of statistical methods used in high-energy physics. In statistics, we are interested in using a

### Multiple Testing. Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf. Abstract

Multiple Testing Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf Abstract Multiple testing refers to any instance that involves the simultaneous testing of more than one hypothesis. If decisions about

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics

Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics Jan Krhovják Basic Idea Behind the Statistical Tests Generated random sequences properties as sample drawn from uniform/rectangular

### Introduction to Hypothesis Testing OPRE 6301

Introduction to Hypothesis Testing OPRE 6301 Motivation... The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about

### Prep Course in Statistics Prof. Massimo Guidolin

MSc. Finance & CLEFIN A.A. 3/ ep Course in Statistics of. Massimo Guidolin A Few Review Questions and oblems concerning, Hypothesis Testing and Confidence Intervals SUGGESTION: try to approach the questions

### The CUSUM algorithm a small review. Pierre Granjon

The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................

### Lecture 9: Bayesian hypothesis testing

Lecture 9: Bayesian hypothesis testing 5 November 27 In this lecture we ll learn about Bayesian hypothesis testing. 1 Introduction to Bayesian hypothesis testing Before we go into the details of Bayesian

### . (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric

### Lecture 8 Hypothesis Testing

Lecture 8 Hypothesis Testing Fall 2013 Prof. Yao Xie, yao.xie@isye.gatech.edu H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech Midterm 1 Score 46 students Highest score: 98 Lowest

### People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

PROBABILITY AND LIKELIHOOD, A BRIEF INTRODUCTION IN SUPPORT OF A COURSE ON MOLECULAR EVOLUTION (BIOL 3046) Probability The subject of PROBABILITY is a branch of mathematics dedicated to building models

### BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 1. Which of the following will increase the value of the power in a statistical test

### Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

### The Neyman-Pearson lemma. The Neyman-Pearson lemma

The Neyman-Pearson lemma In practical hypothesis testing situations, there are typically many tests possible with significance level α for a null hypothesis versus alternative hypothesis. This leads to

### Point Biserial Correlation Tests

Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

### Hypothesis Testing or How to Decide to Decide Edpsy 580

Hypothesis Testing or How to Decide to Decide Edpsy 580 Carolyn J. Anderson Department of Educational Psychology University of Illinois at Urbana-Champaign Hypothesis Testing or How to Decide to Decide

### PASS Sample Size Software. Linear Regression

Chapter 855 Introduction Linear regression is a commonly used procedure in statistical analysis. One of the main objectives in linear regression analysis is to test hypotheses about the slope (sometimes

### Hypothesis testing. c 2014, Jeffrey S. Simonoff 1

Hypothesis testing So far, we ve talked about inference from the point of estimation. We ve tried to answer questions like What is a good estimate for a typical value? or How much variability is there

### Probability and Statistics

CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

### Module 5 Hypotheses Tests: Comparing Two Groups

Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this

### Non-Inferiority Tests for One Mean

Chapter 45 Non-Inferiority ests for One Mean Introduction his module computes power and sample size for non-inferiority tests in one-sample designs in which the outcome is distributed as a normal random

### Bayesian Analysis for the Social Sciences

Bayesian Analysis for the Social Sciences Simon Jackman Stanford University http://jackman.stanford.edu/bass November 9, 2012 Simon Jackman (Stanford) Bayesian Analysis for the Social Sciences November

### SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation

SYSM 6304: Risk and Decision Analysis Lecture 3 Monte Carlo Simulation M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu September 19, 2015 Outline

### Chapter 2. Hypothesis testing in one population

Chapter 2. Hypothesis testing in one population Contents Introduction, the null and alternative hypotheses Hypothesis testing process Type I and Type II errors, power Test statistic, level of significance

### Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

### ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015

ECON 459 Game Theory Lecture Notes Auctions Luca Anderlini Spring 2015 These notes have been used before. If you can still spot any errors or have any suggestions for improvement, please let me know. 1

### 1 Norms and Vector Spaces

008.10.07.01 1 Norms and Vector Spaces Suppose we have a complex vector space V. A norm is a function f : V R which satisfies (i) f(x) 0 for all x V (ii) f(x + y) f(x) + f(y) for all x,y V (iii) f(λx)

### Chapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:

Chapter 7 Notes - Inference for Single Samples You know already for a large sample, you can invoke the CLT so: X N(µ, ). Also for a large sample, you can replace an unknown σ by s. You know how to do a

### Reject Inference in Credit Scoring. Jie-Men Mok

Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business

### HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

### SOLUTIONS TO EXERCISES FOR. MATHEMATICS 205A Part 3. Spaces with special properties

SOLUTIONS TO EXERCISES FOR MATHEMATICS 205A Part 3 Fall 2008 III. Spaces with special properties III.1 : Compact spaces I Problems from Munkres, 26, pp. 170 172 3. Show that a finite union of compact subspaces

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

### Chapter 4 Statistical Inference in Quality Control and Improvement. Statistical Quality Control (D. C. Montgomery)

Chapter 4 Statistical Inference in Quality Control and Improvement 許 湘 伶 Statistical Quality Control (D. C. Montgomery) Sampling distribution I a random sample of size n: if it is selected so that the

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### Module 7: Hypothesis Testing I Statistics (OA3102)

Module 7: Hypothesis Testing I Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 10.1-10.5 Revision: 2-12 1 Goals for this Module

### HypoTesting. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Name: Class: Date: HypoTesting Multiple Choice Identify the choice that best completes the statement or answers the question. 1. A Type II error is committed if we make: a. a correct decision when the

### Correlational Research

Correlational Research Chapter Fifteen Correlational Research Chapter Fifteen Bring folder of readings The Nature of Correlational Research Correlational Research is also known as Associational Research.

### MATH 10: Elementary Statistics and Probability Chapter 9: Hypothesis Testing with One Sample

MATH 10: Elementary Statistics and Probability Chapter 9: Hypothesis Testing with One Sample Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of

### Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

Comparing Two Groups Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test. The

### 3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

### Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

### Online Appendix to Stochastic Imitative Game Dynamics with Committed Agents

Online Appendix to Stochastic Imitative Game Dynamics with Committed Agents William H. Sandholm January 6, 22 O.. Imitative protocols, mean dynamics, and equilibrium selection In this section, we consider

### Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Be able to explain the difference between the p-value and a posterior

### Likelihood Approaches for Trial Designs in Early Phase Oncology

Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University

### Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma

Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma Shinto Eguchi a, and John Copas b a Institute of Statistical Mathematics and Graduate University of Advanced Studies, Minami-azabu

### Nonparametric statistics and model selection

Chapter 5 Nonparametric statistics and model selection In Chapter, we learned about the t-test and its variations. These were designed to compare sample means, and relied heavily on assumptions of normality.