Statistical foundations of machine learning

1 Machine learning p. 1/45 Statistical foundations of machine learning INFO-F-422 Gianluca Bontempi Département d'Informatique Boulevard de Triomphe - CP 212

Testing hypothesis

Hypothesis testing is the second major area of statistical inference. A statistical hypothesis is an assertion or conjecture about the distribution of one or more random variables. A test of a statistical hypothesis is a rule or procedure for deciding whether to reject the assertion on the basis of the observed data. The basic idea is to formulate some statistical hypothesis and to see whether the data provide any evidence to reject it.

A hypothesis testing problem

Consider the model of the traffic in the boulevard. Suppose that the measured inter-arrival times are $D_N = \{10, 11, 1, 21, 2, \dots\}$ seconds. Can we say that the mean inter-arrival time $\theta$ is different from 10?

Consider the grades of two different school sections. Section A had {15, 10, 12, 19, 5, 7}. Section B had {14, 11, 11, 12, 6, 7}. Can we say that Section A had better grades than Section B?

Consider two protein coding genes and their expression levels in a cell. Are the two genes differentially expressed?

A statistical test is a procedure that aims to answer such questions.

Types of hypothesis

We start by declaring the working (basic, null) hypothesis H to be tested, in the form $\theta = \theta_0$ or $\theta \in \omega \subset \Theta$, where $\theta_0$ or $\omega$ are given. The hypothesis can be:

Simple. It fully specifies the distribution of z.
Composite. It partially specifies the distribution of z.

Example: if $D_N$ constitutes a random sample of size N from $N(\mu, \sigma^2)$, the hypothesis $H : \mu = \mu_0, \sigma = \sigma_0$ (with $\mu_0$ and $\sigma_0$ known values) is simple, while the hypothesis $H : \mu = \mu_0$ is composite since it leaves open the value of $\sigma$ in $(0, \infty)$.

Types of statistical test

Suppose we have collected N samples $D_N = \{z_1, \dots, z_N\}$ from a distribution $F_z$ and we have declared a null hypothesis H about F. The three most common types of statistical test are:

Pure significance test: the data $D_N$ are used to assess the inferential evidence against H.
Significance test: the inferential evidence against H is used to judge whether H is inappropriate. In other words, it is a rule for rejecting H.
Hypothesis test: the data $D_N$ are used to assess the hypothesis H against a specific alternative hypothesis $\bar{H}$. In other words, it is a rule for rejecting H in favour of $\bar{H}$.

Pure significance test

Suppose that the null hypothesis H is simple. Let $t(D_N)$ be a statistic such that the larger its value, the more it casts doubt on H. The quantity $t(D_N)$ is called the test statistic or discrepancy measure. Let $t_N = t(D_N)$ be the value of t calculated on the basis of the sample data $D_N$. Consider the p-value

$$p = \mathrm{Prob}\{t(D_N) > t_N \mid H\}$$

If p is small, the sample data $D_N$ are highly inconsistent with H, and p (significance probability or significance level) is the measure of such inconsistency.

Some considerations

p is the proportion of situations under the hypothesis H where we would observe a degree of inconsistency at least to the extent represented by $t_N$.
$t_N$ is the observed value of the statistic for a given $D_N$. Different $D_N$ yield different values of $p \in (0, 1)$.
It is essential that the distribution of $t(D_N)$ under H is known.
We cannot say that p is the probability that H is true. Rather, p is the probability of observing a value of the test statistic at least as extreme as $t_N$, given that H is true.

Open issues
1. What if H is composite?
2. How to choose $t(D_N)$?

Tests of significance

Suppose that the value p is known. If p is small, either a rare event has occurred or perhaps H is not true. Idea: if p is less than some stated value $\alpha$, we reject H. We choose a critical level $\alpha$, we observe $D_N$ and we reject H at level $\alpha$ if

$$\mathrm{Prob}\{t(D_N) > t_N \mid H\} \le \alpha$$

This is equivalent to choosing some critical value $t_\alpha$ and rejecting H if $t_N > t_\alpha$. We obtain two regions in the space of sample data:

critical region $S_0$, where if $D_N \in S_0$ we reject H;
non-critical region $S_1$, where the sample data $D_N$ give us no reason to reject H on the basis of the level-$\alpha$ test.

Some considerations

The principle is that we will accept H unless we witness some event that has sufficiently small probability of arising when H is true. If H were true we could still obtain data in $S_0$ and consequently wrongly reject H with probability

$$\mathrm{Prob}\{D_N \in S_0 \mid H\} = \mathrm{Prob}\{t(D_N) > t_\alpha \mid H\} = \alpha$$

The significance level $\alpha$ provides an upper bound to the maximum probability of incorrectly rejecting H. The p-value is the probability that the test statistic is more extreme than its observed value. The p-value changes with the observed data (i.e. it is a random variable) while $\alpha$ is a level fixed by the user.
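
The claim that a level-$\alpha$ test wrongly rejects a true H with probability $\alpha$ can be checked by simulation. The following is a minimal sketch, not part of the original slides: it assumes a two-sided test on the mean of a standard normal sample, and the sample size, number of trials and seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def type_one_error_rate(alpha=0.1, n=10, trials=20000, seed=0):
    """Estimate Prob{D_N in S_0 | H} for a two-sided test on a normal mean."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0  # H is true: the data really come from N(mu0, sigma^2)
    # Critical value t_alpha such that Prob{|mu_hat - mu0| > t_alpha | H} = alpha
    t_alpha = NormalDist().inv_cdf(1 - alpha / 2) * sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if abs(fmean(sample) - mu0) > t_alpha:
            rejections += 1  # data fell in the critical region S_0
    return rejections / trials


rate = type_one_error_rate()
print(f"empirical type I error rate: {rate:.3f}")
```

The printed rate should be close to the chosen $\alpha$, up to Monte Carlo noise.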

Standard normal distribution

[Figure: normal distribution function and normal density function for µ = 0, σ = 1.]

Remember that $z_{0.05} = 1.645$. This means that, if $z \sim N(0, 1)$, then

$$\mathrm{Prob}\{z \ge z_{0.05}\} = 0.05$$

and also that

$$\mathrm{Prob}\{|z| \ge z_{0.05}\} = 0.05 + 0.05 = 0.1$$

For a generic $z \sim N(\mu, \sigma^2)$

$$\mathrm{Prob}\left\{\frac{|z - \mu|}{\sigma} \ge z_{0.05}\right\} = 0.05 + 0.05 = 0.1$$

TP: example

Let $D_N$ consist of N independent observations of $x \sim N(\mu, \sigma^2)$, with known variance $\sigma^2$. We want to test the hypothesis $H : \mu = \mu_0$ with $\mu_0$ known. Consider as test statistic $t(D_N)$ the quantity $|\hat{\mu} - \mu_0|$, where $\hat{\mu}$ is the sample average estimator. If H is true we know that $\hat{\mu} \sim N(\mu_0, \sigma^2/N)$. Let us calculate the value $t(D_N) = |\hat{\mu} - \mu_0|$ and assume that the rejection region is $S_0 = \{D_N : |\hat{\mu} - \mu_0| > t_\alpha\}$. Let us set a significance level $\alpha = 10\% = 0.1$. This means that $t_\alpha$ should satisfy

$$\mathrm{Prob}\{t(D_N) > t_\alpha \mid H\} = \mathrm{Prob}\{|\hat{\mu} - \mu_0| > t_\alpha \mid H\} = \mathrm{Prob}\{(\hat{\mu} - \mu_0 > t_\alpha) \text{ OR } (\hat{\mu} - \mu_0 < -t_\alpha) \mid H\} = 0.1$$

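TP: example (II)

For a normal variable $x \sim N(\mu, \sigma^2)$

$$\mathrm{Prob}\{x - \mu > 1.645\sigma\} = 1 - F_x(\mu + 1.645\sigma) = 0.05$$

and consequently

$$\mathrm{Prob}\{x - \mu > 1.645\sigma \text{ OR } x - \mu < -1.645\sigma\} = 0.05 + 0.05 = 0.1$$

Since $\hat{\mu} \sim N(\mu_0, \sigma^2/N)$, i.e. $\frac{\hat{\mu} - \mu_0}{\sigma/\sqrt{N}} \sim N(0, 1)$, it follows that once we set $t_\alpha = 1.645\sigma/\sqrt{N}$ we have

$$\mathrm{Prob}\{|\hat{\mu} - \mu_0| > t_\alpha \mid H\} = 0.1$$

and that the critical region is

$$S_0 = \left\{D_N : |\hat{\mu} - \mu_0| > 1.645\sigma/\sqrt{N}\right\}$$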

TP: example (III)

Suppose that $\sigma = 0.1$ and that we want to test if $\mu = \mu_0 = 10$ with a significance level of 10%. After N = 6 observations we have $D_N = \{10, 11, 12, 13, 14, 15\}$. On the basis of the dataset we compute

$$\hat{\mu} = \frac{10 + 11 + 12 + 13 + 14 + 15}{6} = 12.5$$

and

$$t(D_N) = |\hat{\mu} - \mu_0| = 2.5$$

Since $t_\alpha = 1.645 \cdot 0.1/\sqrt{6} \approx 0.0672$ and $t(D_N) > t_\alpha$, the observations $D_N$ are in the critical region. The hypothesis is rejected.
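
The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch, not taken from the slides; it uses Python's standard-library `statistics.NormalDist` to obtain the 1.645 quantile instead of a table, and the variable names are illustrative.

```python
from statistics import NormalDist, fmean

data = [10, 11, 12, 13, 14, 15]      # D_N from the example
mu0, sigma, alpha = 10.0, 0.1, 0.10  # null mean, known std deviation, level

n = len(data)
mu_hat = fmean(data)                 # sample average estimator
t_obs = abs(mu_hat - mu0)            # test statistic |mu_hat - mu0|

# Two-sided test: put alpha/2 in each tail of N(0, 1), giving about 1.645
z_half = NormalDist().inv_cdf(1 - alpha / 2)
t_alpha = z_half * sigma / n ** 0.5  # critical value 1.645 * sigma / sqrt(N)

reject = t_obs > t_alpha
print(f"mu_hat={mu_hat}, t_obs={t_obs}, t_alpha={t_alpha:.4f}, reject H: {reject}")
```

Because the observed discrepancy is far larger than the critical value, the decision does not depend on rounding of the quantile.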

Hypothesis testing: types of error

So far we considered a single hypothesis. Let us now consider two alternative hypotheses: H and $\bar{H}$.

Type I error. It is the error we make when we reject H if it is true. The significance level represents the probability of making the type I error.
Type II error. It is the error we make when we accept H if it is false. In order to define this error, we are forced to declare an alternative hypothesis $\bar{H}$ as a formal definition of what is meant by H being false. The probability of type II error is the probability that the test leads to acceptance of H when in fact $\bar{H}$ prevails.

When the alternative hypothesis is composite, there is no unique Type II error.

An analogy

Consider the analogy with a murder trial, where the suspect is Mr. Bean. The null hypothesis H is "Mr. Bean is innocent". The dataset is the amount of evidence collected by the police against Mr. Bean. The Type I error is the error that we make if, Mr. Bean being innocent, we sentence him to death. The Type II error is the error that we make if, Mr. Bean being guilty, we acquit him.

Hypothesis testing

Suppose we have some data $\{z_1, \dots, z_N\} \sim F$ from a distribution F. H and $\bar{H}$ represent two hypotheses about F. On the basis of the data, one is accepted and one is rejected. Note that the two hypotheses have different philosophical status (asymmetry). H is a conservative hypothesis, not to be rejected unless evidence is clear. This means that a type I error is more serious than a type II error (benefit of the doubt). It is often assumed that F belongs to a parametric family $F(z, \theta)$. The test on F becomes a test on $\theta$. A particular example of hypothesis test is the goodness-of-fit test, where we test $H : F = F_0$ against $\bar{H} : F \ne F_0$.

The five steps of hypothesis testing

1. Declare the null hypothesis (e.g. H: honest student) and the alternative hypothesis (e.g. $\bar{H}$: cheating student).
2. Choose the numeric value of the type I error (e.g. the risk I want to run).
3. Choose a procedure to obtain the test statistic (e.g. number of similar lines).
4. Determine the critical value of the test statistic (e.g. 4 identical lines) that leads to a rejection of H. This is done in order to ensure the Type I error defined in Step 2.
5. Obtain the data and determine whether the observed value of the test statistic leads to an acceptance or rejection of H.

Quality of the test

Suppose that N students took part in the exam:

$N_N$ did not copy,
$N_P$ copied,
$\hat{N}_N$ were considered not guilty and passed the exam,
$\hat{N}_P$ were considered guilty and were refused,
$F_P$ honest students were refused,
$F_N$ cheating students passed.

Confusion matrix

Then we have

                                    Not refused    Refused    Total
H: Not guilty student (-)           T_N            F_P        N_N
H-bar: Guilty student (+)           F_N            T_P        N_P
Total                               N̂_N            N̂_P        N

$F_P$ is the number of False Positives and the ratio $F_P/N_N$ represents the type I error. $F_N$ is the number of False Negatives and the ratio $F_N/N_P$ represents the type II error.

Specificity and sensitivity

Specificity: the ratio (to be maximized)

$$SP = \frac{T_N}{F_P + T_N} = \frac{T_N}{N_N} = \frac{N_N - F_P}{N_N} = 1 - \frac{F_P}{N_N}, \qquad 0 \le SP \le 1$$

It increases by reducing the number of false positives.

Sensitivity: the ratio (to be maximized)

$$SE = \frac{T_P}{T_P + F_N} = \frac{T_P}{N_P} = \frac{N_P - F_N}{N_P} = 1 - \frac{F_N}{N_P}, \qquad 0 \le SE \le 1$$

It increases by reducing the number of false negatives and corresponds to the power of the test (i.e. it estimates the quantity 1 - Type II error).

Specificity and sensitivity (II)

There exists a trade-off between these two quantities. In the case of a test that always returns H (e.g. a very kind professor) we have $\hat{N}_P = 0$, $\hat{N}_N = N$, $F_P = 0$, $T_N = N_N$ and SP = 1 but SE = 0. In the case of a test that always returns $\bar{H}$ (e.g. a very suspicious professor) we have $\hat{N}_P = N$, $\hat{N}_N = 0$, $F_N = 0$, $T_P = N_P$ and SE = 1 but SP = 0.

False Positive and False Negative Rate

False Positive Rate:

$$FPR = 1 - SP = 1 - \frac{T_N}{F_P + T_N} = \frac{F_P}{F_P + T_N} = \frac{F_P}{N_N}, \qquad 0 \le FPR \le 1$$

It decreases by reducing the number of false positives and estimates the Type I error.

False Negative Rate:

$$FNR = 1 - SE = 1 - \frac{T_P}{T_P + F_N} = \frac{F_N}{T_P + F_N} = \frac{F_N}{N_P}, \qquad 0 \le FNR \le 1$$

It decreases by reducing the number of false negatives.

Predictive value

Positive Predictive Value: the ratio (to be maximized)

$$PPV = \frac{T_P}{T_P + F_P} = \frac{T_P}{\hat{N}_P}, \qquad 0 \le PPV \le 1$$

Negative Predictive Value: the ratio (to be maximized)

$$NPV = \frac{T_N}{T_N + F_N} = \frac{T_N}{\hat{N}_N}, \qquad 0 \le NPV \le 1$$

False Discovery Rate: the ratio (to be minimized)

$$FDR = \frac{F_P}{T_P + F_P} = \frac{F_P}{\hat{N}_P} = 1 - PPV, \qquad 0 \le FDR \le 1$$
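
The quality measures defined on the last few slides all derive from the four cells of the confusion matrix. The following minimal sketch collects them in one function; the function name `quality_measures` and the counts in the usage example are hypothetical and chosen only for illustration.

```python
def quality_measures(tn, fp, fn, tp):
    """Return the test-quality ratios computed from confusion-matrix counts."""
    n_neg = tn + fp        # N_N: students who did not copy
    n_pos = fn + tp        # N_P: students who copied
    pred_neg = tn + fn     # students considered not guilty
    pred_pos = fp + tp     # students considered guilty
    return {
        "SP": tn / n_neg,      # specificity
        "SE": tp / n_pos,      # sensitivity (power)
        "FPR": fp / n_neg,     # false positive rate, estimates type I error
        "FNR": fn / n_pos,     # false negative rate, estimates type II error
        "PPV": tp / pred_pos,  # positive predictive value
        "NPV": tn / pred_neg,  # negative predictive value
        "FDR": fp / pred_pos,  # false discovery rate
    }


# Hypothetical exam: 90 honest students (10 wrongly refused),
# 10 cheating students (5 caught).
measures = quality_measures(tn=80, fp=10, fn=5, tp=5)
for name, value in measures.items():
    print(f"{name} = {value:.3f}")
```

The sketch does not guard against empty rows or columns (division by zero), which a real implementation would need to handle.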

Receiver Operating Characteristic curve

The Receiver Operating Characteristic (also known as ROC curve) is a plot of the true positive rate (i.e. sensitivity or power) against the false positive rate (Type I error) for the different possible decision thresholds of a test. Consider an example where $t^+ \sim N(1, 1)$ and $t^- \sim N(-1, 1)$. Suppose that the examples are classed as positive if t > THR and negative if t < THR, where THR is a threshold.

If $THR = -\infty$, all the examples are classed as positive: $T_N = F_N = 0$, which implies $SE = \frac{T_P}{N_P} = 1$ and $FPR = \frac{F_P}{F_P + T_N} = 1$.
If $THR = \infty$, all the examples are classed as negative: $T_P = F_P = 0$, which implies SE = 0 and FPR = 0.
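
For this Gaussian example the ROC curve can be computed in closed form, since SE and FPR are the upper-tail probabilities of the two score distributions at the threshold. The sketch below is an illustration in Python, not the course's own script; the threshold grid is an arbitrary choice.

```python
from statistics import NormalDist

pos_scores = NormalDist(mu=1, sigma=1)   # t+ ~ N(1, 1)
neg_scores = NormalDist(mu=-1, sigma=1)  # t- ~ N(-1, 1)


def roc_point(thr):
    """Return (FPR, SE) when examples with t > thr are classed as positive."""
    se = 1 - pos_scores.cdf(thr)   # fraction of positives above the threshold
    fpr = 1 - neg_scores.cdf(thr)  # fraction of negatives above the threshold
    return fpr, se


thresholds = [-4 + 0.5 * i for i in range(17)]  # grid from -4 to 4
curve = [roc_point(thr) for thr in thresholds]
for thr, (fpr, se) in zip(thresholds, curve):
    print(f"THR={thr:+.1f}  FPR={fpr:.3f}  SE={se:.3f}")
```

Very negative thresholds give points near (1, 1) and very positive ones give points near (0, 0), matching the two limiting cases described above.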

ROC curve

[Figure: ROC curve, SE plotted against FPR.]

R script roc.r

Choice of test

The choice of test, and consequently the choice of the partition $\{S_0, S_1\}$, is based on two steps:

1. Define a significance level $\alpha$, that is the probability of type I error

$$\mathrm{Prob}\{\text{reject } H \mid H\} = \mathrm{Prob}\{D_N \in S_0 \mid H\} \le \alpha$$

that is the probability of incorrectly rejecting H.

2. Among the set of tests $\{S_0, S_1\}$ of level $\alpha$, choose the test that minimizes the probability of type II error

$$\mathrm{Prob}\{\text{accept } H \mid \bar{H}\} = \mathrm{Prob}\{D_N \in S_1 \mid \bar{H}\}$$

that is the probability of incorrectly accepting H. This is equivalent to maximizing the power of the test

$$\mathrm{Prob}\{\text{reject } H \mid \bar{H}\} = \mathrm{Prob}\{D_N \in S_0 \mid \bar{H}\} = 1 - \mathrm{Prob}\{D_N \in S_1 \mid \bar{H}\}$$

which is the probability of correctly rejecting H. The higher the power, the better!

TP example

Consider a r.v. $z \sim N(\mu, \sigma^2)$, where $\sigma$ is known, and a set of N i.i.d. observations. We want to test the null hypothesis $\mu = \mu_0 = 0$, with $\alpha = 0.1$. Consider the 3 critical regions $S_0$:

1. $|\hat{\mu} - \mu_0| > 1.645\sigma/\sqrt{N}$
2. $\hat{\mu} - \mu_0 > 1.282\sigma/\sqrt{N}$
3. $|\hat{\mu} - \mu_0| < 0.126\sigma/\sqrt{N}$

For all these tests $\mathrm{Prob}\{D_N \in S_0 \mid H\} \le \alpha$, hence the significance level is the same. However, if $\bar{H} : \mu = 10$, the type II error of the three tests is significantly different. Which one is the best?
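
The comparison between the three critical regions can be made concrete by computing their type I and type II errors. The sketch below works with the standardized statistic $(\hat{\mu} - \mu_0)\sqrt{N}/\sigma$, which is N(0, 1) under H and $N(\delta, 1)$ under the alternative, with $\delta = (10 - \mu_0)\sqrt{N}/\sigma$. The slide does not give $\sigma$ or N, so the values $\sigma = 10$ and N = 4 below are assumptions made purely for illustration.

```python
from statistics import NormalDist

sigma, n = 10.0, 4                 # assumed values, not given in the slides
mu0, mu1 = 0.0, 10.0               # null and alternative means
delta = (mu1 - mu0) * n ** 0.5 / sigma

null = NormalDist(0, 1)            # standardized statistic under H
alt = NormalDist(delta, 1)         # standardized statistic under the alternative

# Type I error: probability of landing in S_0 when H is true
alpha1 = 1 - (null.cdf(1.645) - null.cdf(-1.645))  # region 1: |Z| > 1.645
alpha2 = 1 - null.cdf(1.282)                       # region 2: Z > 1.282
alpha3 = null.cdf(0.126) - null.cdf(-0.126)        # region 3: |Z| < 0.126

# Type II error: probability of landing in S_1 when the alternative is true
beta1 = alt.cdf(1.645) - alt.cdf(-1.645)
beta2 = alt.cdf(1.282)
beta3 = 1 - (alt.cdf(0.126) - alt.cdf(-0.126))

for k, (a, b) in enumerate([(alpha1, beta1), (alpha2, beta2), (alpha3, beta3)], 1):
    print(f"region {k}: type I = {a:.3f}, type II = {b:.3f}, power = {1 - b:.3f}")
```

Under these assumed values all three regions have a type I error of about 0.1, while the one-sided region 2 has the smallest type II error and region 3 the largest.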

TP example (II)

[Figure: distributions of $\hat{\mu}$ under H and under $\bar{H}$, with the regions $S_1$ and $S_0$ marked.]

On the left: distribution of the test statistic $\hat{\mu}$ if $H : \mu_0 = 0$ is true. On the right: distribution of the test statistic $\hat{\mu}$ if $\bar{H} : \mu_1 = 10$ is true. The interval marked by $S_1$ denotes the set of observed $\hat{\mu}$ values for which H is accepted (non-critical region). The interval marked by $S_0$ denotes the set of observed $\hat{\mu}$ values for which H is rejected (critical region). The area of the black pattern region on the right equals $\mathrm{Prob}\{D_N \in S_0 \mid H\}$, i.e. the probability of rejecting H when H is true (Type I error). The area of the grey shaded region on the left equals the probability of accepting H when H is false (Type II error).

TP example (III)

[Figure: distributions of $\hat{\mu}$ under H (centred at 0) and under $\bar{H}$ (centred at 10), with two intervals $S_1$ on either side of $S_0$.]

On the left: distribution of the test statistic $\hat{\mu}$ if $H : \mu_0 = 0$ is true. On the right: distribution of the test statistic $\hat{\mu}$ if $\bar{H} : \mu_1 = 10$ is true. The two intervals marked by $S_1$ denote the set of observed $\hat{\mu}$ values for which H is accepted (non-critical region). The interval marked by $S_0$ denotes the set of observed $\hat{\mu}$ values for which H is rejected (critical region). The area of the pattern region equals $\mathrm{Prob}\{D_N \in S_0 \mid H\}$, i.e. the probability of rejecting H when H is true (Type I error). Which area corresponds to the probability of the Type II error?

Types of parametric tests

Consider random variables with a parametric distribution $F(\cdot, \theta)$.

One-sample vs. two-sample: in the one-sample test we consider a single r.v. and we formulate hypotheses about its distribution. In the two-sample test we consider 2 r.v.s $z_1$ and $z_2$ and we formulate hypotheses about their differences/similarities.
Simple vs. composite: the test is simple if H describes completely the distributions of the involved r.v.s, otherwise it is composite.
Single-sided (or one-tailed) vs. two-sided (or two-tailed): in the single-sided test the region of rejection concerns only one tail of the null distribution. This means that $\bar{H}$ indicates the predicted direction of the difference (e.g. $\bar{H} : \theta > \theta_0$). In the two-sided test, the region of rejection concerns both tails of the null distribution. This means that $\bar{H}$ does not indicate the predicted direction of the difference (e.g. $\bar{H} : \theta \ne \theta_0$).

Example of parametric test

Consider a parametric test on the distribution of a Gaussian r.v., and suppose that the null hypothesis is $H : \theta = \theta_0$, where $\theta_0$ is given and represents the mean. The test is one-sample and composite. In order to know whether it is one-sided or two-sided we have to define the alternative configuration:

if $\bar{H} : \theta < \theta_0$ the test is one-sided down,
if $\bar{H} : \theta > \theta_0$ the test is one-sided up,
if $\bar{H} : \theta \ne \theta_0$ the test is double-sided.

z-test (one-sample and one-sided)

Consider a random sample $D_N$ of $x \sim N(\mu, \sigma^2)$ with $\mu$ unknown and $\sigma^2$ known.

STEP 1: Consider the null hypothesis and the alternative (composite and one-sided)

$$H : \mu = \mu_0; \qquad \bar{H} : \mu > \mu_0$$

STEP 2: Fix the value $\alpha$ of the type I error.

STEP 3: Choose a test statistic. If H is true then the distribution of $\hat{\mu}$ is $N(\mu_0, \sigma^2/N)$. This means that the variable z is

$$z = \frac{(\hat{\mu} - \mu_0)\sqrt{N}}{\sigma} \sim N(0, 1)$$

It is convenient to rephrase the test in terms of the test statistic z.

z-test (one-sample and one-sided) (II)

STEP 4: Determine the critical value for z. The hypothesis H is rejected if $z_N > z_\alpha$, where $z_\alpha$ is such that $\mathrm{Prob}\{N(0, 1) > z_\alpha\} = \alpha$. Example: for $\alpha = 0.05$ we would take $z_\alpha = 1.645$, since 5% of the standard normal distribution lies to the right of 1.645. R command: z_alpha = qnorm(alpha, lower.tail=FALSE)

STEP 5: Once the dataset $D_N$ is measured, the value of the test statistic is

$$z_N = \frac{(\hat{\mu} - \mu_0)\sqrt{N}}{\sigma}$$

TP: example z-test

Consider a r.v. $z \sim N(\mu, 1)$. We want to test $H : \mu = 5$ against $\bar{H} : \mu > 5$ with significance level $\alpha = 0.05$. Suppose that the data are $D_N = \{5.1, 5.5, 4.9, 5.3\}$. Then $\hat{\mu} = 5.2$ and $z_N = (5.2 - 5) \cdot 2/1 = 0.4$. Since this is less than $z_\alpha = 1.645$, we do not reject the null hypothesis.
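
The five steps of the one-sided z-test can be traced on this example with a short script. This is a minimal sketch using Python's standard library rather than the R command quoted in the slides; the level 0.05 is the one matching the critical value 1.645 used above, and the one-sided p-value is an extra quantity added for illustration.

```python
from statistics import NormalDist, fmean

data = [5.1, 5.5, 4.9, 5.3]        # D_N
mu0, sigma, alpha = 5.0, 1.0, 0.05  # H: mu = 5, known sigma, type I error

n = len(data)
mu_hat = fmean(data)
z_n = (mu_hat - mu0) * n ** 0.5 / sigma  # observed test statistic

std_normal = NormalDist()
z_alpha = std_normal.inv_cdf(1 - alpha)  # upper alpha point of N(0, 1)
p_value = 1 - std_normal.cdf(z_n)        # one-sided p-value

reject = z_n > z_alpha
print(f"z_N={z_n:.3f}, z_alpha={z_alpha:.3f}, p-value={p_value:.3f}, reject H: {reject}")
```

The decision agrees with the slide: the observed statistic is well below the critical value, so H is not rejected.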

Two-sided parametric tests

Assumption: all the variables are normal!

Name        one/two sample   known                      H                          H-bar
z-test      one              $\sigma^2$                 $\mu = \mu_0$              $\mu \ne \mu_0$
z-test      two              $\sigma_1^2 = \sigma_2^2$  $\mu_1 = \mu_2$            $\mu_1 \ne \mu_2$
t-test      one                                         $\mu = \mu_0$              $\mu \ne \mu_0$
t-test      two                                         $\mu_1 = \mu_2$            $\mu_1 \ne \mu_2$
χ²-test     one              $\mu$                      $\sigma^2 = \sigma_0^2$    $\sigma^2 \ne \sigma_0^2$
χ²-test     one                                         $\sigma^2 = \sigma_0^2$    $\sigma^2 \ne \sigma_0^2$
F-test      two                                         $\sigma_1^2 = \sigma_2^2$  $\sigma_1^2 \ne \sigma_2^2$

Student's t-distribution

If $x \sim N(0, 1)$ and $y \sim \chi^2_N$ are independent, then the Student's t-distribution with N degrees of freedom is the distribution of the r.v.

$$z = \frac{x}{\sqrt{y/N}}$$

We denote this with $z \sim T_N$. If $z_1, \dots, z_N$ are i.i.d. $N(\mu, \sigma^2)$ then

$$\frac{\sqrt{N}(\hat{\mu} - \mu)}{\sqrt{\widehat{SS}/(N-1)}} = \frac{\sqrt{N}(\hat{\mu} - \mu)}{\hat{\sigma}} \sim T_{N-1}$$

t-test: one-sample and two-sided

Consider a random sample from $N(\mu, \sigma^2)$ with $\sigma^2$ unknown. Let

$$H : \mu = \mu_0; \qquad \bar{H} : \mu \ne \mu_0$$

Let

$$t(D_N) = T = \frac{\sqrt{N}(\hat{\mu} - \mu_0)}{\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(z_i - \hat{\mu})^2}} = \frac{\hat{\mu} - \mu_0}{\sqrt{\hat{\sigma}^2/N}}$$

be a statistic computed using the data set $D_N$.

t-test: one-sample and two-sided (II)

It can be shown that if the hypothesis H holds, $T \sim T_{N-1}$ is a r.v. with a Student distribution with N - 1 degrees of freedom. The size-$\alpha$ t-test consists in rejecting H if

$$|T| > k = t_{\alpha/2, N-1}$$

where $t_{\alpha/2, N-1}$ is the upper $\alpha/2$ point of a T-distribution with N - 1 degrees of freedom, i.e.

$$\mathrm{Prob}\{t_{N-1} > t_{\alpha/2, N-1}\} = \alpha/2$$

where $t_{N-1} \sim T_{N-1}$. In other terms, H is rejected when $|T|$ is large. R command: t_crit = qt(alpha/2, N-1, lower.tail=FALSE)

TP example

Does jogging lead to a reduction in pulse rate? Eight non-jogging volunteers engaged in a one-month jogging programme. Their pulses were taken before and after the programme.

[Table: pulse rate before, pulse rate after, and decrease for each of the eight volunteers.]

Suppose that the decreases are samples from $N(\mu, \sigma^2)$ for some unknown $\sigma^2$. We want to test $H : \mu = \mu_0 = 0$ against $\bar{H} : \mu \ne 0$ with a significance level $\alpha$. We have N = 8, $\hat{\mu} = 2.75$ and T = 1.263. Since $|T| \le t_{\alpha/2, N-1}$, the data are not sufficient to reject the hypothesis H. In other terms, we do not have enough evidence to show that there is a reduction in pulse rate.
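
The decision in this example can also be expressed through the two-sided p-value of the observed T = 1.263 with N - 1 = 7 degrees of freedom. The sketch below is an illustration only: Python's standard library has no Student distribution, so the tail probability is obtained by numerically integrating the t density with Simpson's rule, a technique swapped in here in place of a t table or R's `pt`/`qt`. The function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_obs, df, steps=2000):
    """Prob{|t_df| > |t_obs|}, using Simpson's rule on [0, |t_obs|]."""
    b = abs(t_obs)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # Prob{0 < t_df < |t_obs|}
    return 1 - 2 * central_half


p = two_sided_p_value(1.263, df=7)  # T and N - 1 from the jogging example
print(f"two-sided p-value: {p:.3f}")
```

A p-value this far above the usual levels (0.05 or 0.1) is consistent with the slide's conclusion that H cannot be rejected.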

The chi-squared distribution

For a positive integer N, a r.v. z has a $\chi^2_N$ distribution if

$$z = x_1^2 + \dots + x_N^2$$

where $x_1, x_2, \dots, x_N$ are i.i.d. random variables N(0, 1). The probability distribution is a gamma distribution with parameters $(\frac{1}{2}N, \frac{1}{2})$. $E[z] = N$ and $\mathrm{Var}[z] = 2N$. The distribution is called a chi-squared distribution with N degrees of freedom.
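
The definition as a sum of squared standard normals, together with the stated mean N and variance 2N, can be checked empirically. This is a minimal Monte Carlo sketch; the number of degrees of freedom, the number of draws and the seed are arbitrary illustrative choices.

```python
import random
from statistics import fmean, pvariance


def sample_chi_squared(n_dof, trials=20000, seed=0):
    """Draw chi-squared variates as sums of n_dof squared N(0, 1) variables."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(n_dof)) for _ in range(trials)]


N = 5
draws = sample_chi_squared(N)
emp_mean = fmean(draws)      # should be close to N
emp_var = pvariance(draws)   # should be close to 2N
print(f"empirical mean {emp_mean:.2f} (theory {N}), "
      f"empirical variance {emp_var:.2f} (theory {2 * N})")
```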

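χ²-test: one-sample and two-sided

Consider a random sample from $N(\mu, \sigma^2)$ with $\mu$ known. Let

$$H : \sigma^2 = \sigma_0^2; \qquad \bar{H} : \sigma^2 \ne \sigma_0^2$$

Let $\widehat{SS} = \sum_i (z_i - \mu)^2$. It can be shown that if H is true then $\widehat{SS}/\sigma_0^2 \sim \chi^2_N$. The size-$\alpha$ χ²-test rejects H if $\widehat{SS}/\sigma_0^2 < a_1$ or $\widehat{SS}/\sigma_0^2 > a_2$, where

$$\mathrm{Prob}\left\{\frac{\widehat{SS}}{\sigma_0^2} < a_1\right\} + \mathrm{Prob}\left\{\frac{\widehat{SS}}{\sigma_0^2} > a_2\right\} = \alpha$$

If $\mu$ is unknown, you must
1. replace $\mu$ with $\hat{\mu}$ in the quantity $\widehat{SS}$,
2. use a $\chi^2_{N-1}$ distribution.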

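t-test: two-samples, two-sided

Consider two r.v.s $x \sim N(\mu_1, \sigma^2)$ and $y \sim N(\mu_2, \sigma^2)$ with the same variance. Let $D_N^x$ and $D_M^y$ be two independent sets of samples. We want to test $H : \mu_1 = \mu_2$ against $\bar{H} : \mu_1 \ne \mu_2$. Let

$$\hat{\mu}_x = \frac{\sum_{i=1}^{N} x_i}{N}, \quad SS_x = \sum_{i=1}^{N}(x_i - \hat{\mu}_x)^2, \quad \hat{\mu}_y = \frac{\sum_{i=1}^{M} y_i}{M}, \quad SS_y = \sum_{i=1}^{M}(y_i - \hat{\mu}_y)^2$$

Once we have defined the statistic

$$T = \frac{\hat{\mu}_x - \hat{\mu}_y}{\sqrt{\left(\frac{1}{M} + \frac{1}{N}\right)\left(\frac{SS_x + SS_y}{M + N - 2}\right)}} \sim T_{M+N-2}$$

it can be shown that a test of size $\alpha$ rejects H if $|T| > t_{\alpha/2, M+N-2}$.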
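
As an illustration, the two-sample statistic can be computed on the grades of Section A and Section B introduced at the start of these slides. This is a sketch under the slide's assumptions (normal data, equal variances), which the slides do not verify for the grades; the critical value 2.228 is the standard t-table entry for $\alpha = 0.05$ and 10 degrees of freedom, inserted here as a constant rather than computed.

```python
from statistics import fmean


def pooled_t_statistic(x, y):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    n, m = len(x), len(y)
    mu_x, mu_y = fmean(x), fmean(y)
    ss_x = sum((xi - mu_x) ** 2 for xi in x)  # SS_x
    ss_y = sum((yi - mu_y) ** 2 for yi in y)  # SS_y
    pooled_var = (ss_x + ss_y) / (m + n - 2)
    t = (mu_x - mu_y) / ((1 / m + 1 / n) * pooled_var) ** 0.5
    return t, m + n - 2


section_a = [15, 10, 12, 19, 5, 7]
section_b = [14, 11, 11, 12, 6, 7]

T, dof = pooled_t_statistic(section_a, section_b)
t_crit = 2.228  # table value of t_{0.025, 10}, assumed alpha = 0.05
print(f"T = {T:.3f} with {dof} degrees of freedom, reject H: {abs(T) > t_crit}")
```

With these twelve grades the statistic is small compared with the critical value, so the two-sided test gives no evidence of a difference between the section means.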

F-distribution

Let $x \sim \chi^2_M$ and $y \sim \chi^2_N$ be two independent r.v.s. A r.v. z has an F-distribution $F_{M,N}$ with M and N degrees of freedom if

$$z = \frac{x/M}{y/N}$$

If $z \sim F_{M,N}$ then $1/z \sim F_{N,M}$. If $z \sim T_N$ then $z^2 \sim F_{1,N}$.

F-distribution

[Figure: $F_{M,N}$ density and $F_{M,N}$ cumulative distribution for M = 20, N = 10.]

R script s_f.r.

F-test: two-samples, two-sided

Consider a random sample $x_1, \dots, x_M$ from $N(\mu_1, \sigma_1^2)$ and a random sample $y_1, \dots, y_N$ from $N(\mu_2, \sigma_2^2)$ with $\mu_1$ and $\mu_2$ unknown. Suppose we want to test

$$H : \sigma_1^2 = \sigma_2^2; \qquad \bar{H} : \sigma_1^2 \ne \sigma_2^2$$

Let us consider the statistic

$$f = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} = \frac{\widehat{SS}_1/(M-1)}{\widehat{SS}_2/(N-1)} \sim \frac{\sigma_1^2 \chi^2_{M-1}/(M-1)}{\sigma_2^2 \chi^2_{N-1}/(N-1)} = \frac{\sigma_1^2}{\sigma_2^2} F_{M-1,N-1}$$

It can be shown that if H is true, the ratio f has an F-distribution $F_{M-1,N-1}$. We reject H if the ratio f is large, i.e. $f > F_{\alpha, M-1, N-1}$, where

$$\mathrm{Prob}\{z > F_{\alpha, M-1, N-1}\} = \alpha$$

if $z \sim F_{M-1,N-1}$.

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

22. HYPOTHESIS TESTING

22. HYPOTHESIS TESTING 22. HYPOTHESIS TESTING Often, we need to make decisions based on incomplete information. Do the data support some belief ( hypothesis ) about the value of a population parameter? Is OJ Simpson guilty?

More information

HYPOTHESIS TESTING: POWER OF THE TEST

HYPOTHESIS TESTING: POWER OF THE TEST HYPOTHESIS TESTING: POWER OF THE TEST The first 6 steps of the 9-step test of hypothesis are called "the test". These steps are not dependent on the observed data values. When planning a research project,

More information

Section 7.1. Introduction to Hypothesis Testing. Schrodinger s cat quantum mechanics thought experiment (1935)

Section 7.1. Introduction to Hypothesis Testing. Schrodinger s cat quantum mechanics thought experiment (1935) Section 7.1 Introduction to Hypothesis Testing Schrodinger s cat quantum mechanics thought experiment (1935) Statistical Hypotheses A statistical hypothesis is a claim about a population. Null hypothesis

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Assuming Equal Variance (Enter Means) Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

More information

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7. THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM

More information

Chapter 2. Hypothesis testing in one population

Chapter 2. Hypothesis testing in one population Chapter 2. Hypothesis testing in one population Contents Introduction, the null and alternative hypotheses Hypothesis testing process Type I and Type II errors, power Test statistic, level of significance

More information

3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples 3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Principles of Hypothesis Testing for Public Health

Principles of Hypothesis Testing for Public Health Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions

More information

Introduction to Hypothesis Testing

Introduction to Hypothesis Testing I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true

More information

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

More information

Testing a claim about a population mean

Testing a claim about a population mean Introductory Statistics Lectures Testing a claim about a population mean One sample hypothesis test of the mean Department of Mathematics Pima Community College Redistribution of this material is prohibited

More information

Name: Date: Use the following to answer questions 3-4:

Name: Date: Use the following to answer questions 3-4: Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin

More information

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS The Islamic University of Gaza Faculty of Commerce Department of Economics and Political Sciences An Introduction to Statistics Course (ECOE 130) Spring Semester 011 Chapter 10- TWO-SAMPLE TESTS Practice

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

Mind on Statistics. Chapter 12

Mind on Statistics. Chapter 12 Mind on Statistics Chapter 12 Sections 12.1 Questions 1 to 6: For each statement, determine if the statement is a typical null hypothesis (H 0 ) or alternative hypothesis (H a ). 1. There is no difference

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

More information

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as... HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

More information

Tests for Two Proportions
