Lecture 5: Contingency Tables


1 Lecture 5: Contingency Tables
Dipankar Bandyopadhyay, Ph.D.
BMTRY 711: Analysis of Categorical Data, Spring 2011
Division of Biostatistics and Epidemiology, Medical University of South Carolina

2 Overview
Over the next few lectures, we will examine the 2 × 2 contingency table. Some authors refer to this as a fourfold table. We will consider various study designs and their impact on the summary measures of association.

3 Rows Fixed: Product Binomial Case - Generally Prospective Studies
Question of interest: Does treatment affect outcome?

                                     OUTCOME
                               NO COLD    COLD    Total
TREATMENT   VITAMIN C                                    <-- (fixed by design)
            NO VITAMIN C                                 <-- (fixed by design)
            Total

4 Columns Fixed: Also Product Binomial - Generally Retrospective Studies
Question of interest: Does alcohol exposure vary among cases and controls?

                                      STATUS
                                CASE       CONTROL     Total
ALCOHOL   80+  (gm/day)
          0-79 (gm/day)
          Total                   ^           ^
                              (fixed by design)

5 N Fixed: Multinomial Case - Generally Cross-Sectional Studies
Question of interest: Is there an association between cancer stage and smoking status?

                                  CANCER STAGE
                           NOT SPREAD    SPREAD    Total
SMOKE    YES
         NO
         Total                                          <--- (fixed by design)

6 Rows and Columns Fixed: Hypergeometric Case
Question of interest: Is there gender bias in juror selection?

                               SELECTED FOR JURY
                               YES      NO      Total
GENDER    FEMALE
          MALE
          Total

This distribution will be used in Fisher's exact test.

7 Prospective Studies
We are going to begin examining contingency tables by looking first at prospective studies.
The number on each treatment (or experimental) arm is fixed by design.
The rows are independent binomials.
Question of interest: Does treatment affect outcome?
This is usually the design for experimental studies and clinical trials.

In general, the 2 × 2 table is written as

                          Outcome
                     1             2          Total
Treatment   1       Y_1       n_1 - Y_1        n_1
            2       Y_2       n_2 - Y_2        n_2

8 Facts about the distribution
n_1 and n_2 are fixed by design.
Y_1 and Y_2 are independent with distributions
    Y_1 ~ Bin(n_1, p_1)    and    Y_2 ~ Bin(n_2, p_2).
The joint distribution is the product of 2 independent binomials, often called the "product binomial":
    P(y_1, y_2 | p_1, p_2) = P(Y_1 = y_1 | p_1) P(Y_2 = y_2 | p_2)
                           = (n_1 choose y_1)(n_2 choose y_2) p_1^{y_1} (1 - p_1)^{n_1 - y_1} p_2^{y_2} (1 - p_2)^{n_2 - y_2}
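As a small numerical illustration of the product binomial (a sketch only, assuming Python with scipy is available; the counts and probabilities below are made up):

```python
from scipy.stats import binom

def product_binomial_pmf(y1, n1, p1, y2, n2, p2):
    """P(Y1 = y1, Y2 = y2 | p1, p2): the product of two independent binomial pmfs."""
    return binom.pmf(y1, n1, p1) * binom.pmf(y2, n2, p2)

# Hypothetical arms: 12 successes out of 20 when p1 = 0.5, 4 out of 20 when p2 = 0.25
print(product_binomial_pmf(12, 20, 0.5, 4, 20, 0.25))
```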

9 Question of interest (all the same)
Does treatment affect outcome?
Are treatment and outcome associated?
Is the probability of success the same on both treatments?
How do we quantify treatment differences?
Also, what test statistics can we use for the null hypothesis
    H_0: p_1 = p_2 = p
versus the alternative
    H_A: p_1 ≠ p_2 ?

10 MLE and estimated SEs of treatment differences
To estimate these treatment differences, we must estimate the success probabilities p_1 and p_2.
Intuitively, thinking of the two groups separately, the MLEs should be the proportions of successes in the two groups, i.e.,
    p̂_1 = Y_1 / n_1    and    p̂_2 = Y_2 / n_2.
However, we will derive these from the likelihood of the product binomial.

11 The likelihood for (p_1, p_2) is the product binomial distribution of (y_1, y_2):
    L(p_1, p_2) = P(Y_1 = y_1 | p_1) P(Y_2 = y_2 | p_2)
                = (n_1 choose y_1)(n_2 choose y_2) p_1^{y_1} (1 - p_1)^{n_1 - y_1} p_2^{y_2} (1 - p_2)^{n_2 - y_2}

12 Then the log-likelihood is the sum of the two pieces:
    log L(p_1, p_2) = log[(n_1 choose y_1) p_1^{y_1} (1 - p_1)^{n_1 - y_1}] + log[(n_2 choose y_2) p_2^{y_2} (1 - p_2)^{n_2 - y_2}]
As before, to find the MLEs we set the partial derivatives of log L(p_1, p_2) with respect to p_1 and p_2 to 0 and solve for p̂_1 and p̂_2.
Note: Agresti (and most statisticians) simply writes the natural logarithm as log, rather than ln as you would see in mathematics or physics. In this class, all references to log mean the logarithm to base e.

13 Now,
    log L(p_1, p_2) = log[(n_1 choose y_1) p_1^{y_1} (1 - p_1)^{n_1 - y_1}] + log[(n_2 choose y_2) p_2^{y_2} (1 - p_2)^{n_2 - y_2}]
The derivative of the log-likelihood with respect to p_1 is
    d log L(p_1, p_2)/dp_1 = d/dp_1 log[(n_1 choose y_1) p_1^{y_1} (1 - p_1)^{n_1 - y_1}]
                             + d/dp_1 log[(n_2 choose y_2) p_2^{y_2} (1 - p_2)^{n_2 - y_2}]
                           = d/dp_1 log[(n_1 choose y_1) p_1^{y_1} (1 - p_1)^{n_1 - y_1}] + 0,
since the second part is not a function of p_1.

14 Note, though, that
    d log L(p_1, p_2)/dp_1 = d/dp_1 log[(n_1 choose y_1) p_1^{y_1} (1 - p_1)^{n_1 - y_1}]
is just the derivative of a binomial log-likelihood with respect to its parameter p_1. From before, we have p̂_1 = y_1 / n_1.
To show this explicitly: in the single-binomial section we showed that
    d log L(p_1, p_2)/dp_1 = d/dp_1 log[(n_1 choose y_1) p_1^{y_1} (1 - p_1)^{n_1 - y_1}] = (y_1 - n_1 p_1) / [p_1(1 - p_1)].
Similarly,
    d log L(p_1, p_2)/dp_2 = (y_2 - n_2 p_2) / [p_2(1 - p_2)].

15 Then, the MLEs are found by simultaneously solving
    d log L(p_1, p_2)/dp_1 = (y_1 - n_1 p̂_1) / [p̂_1(1 - p̂_1)] = 0
and
    d log L(p_1, p_2)/dp_2 = (y_2 - n_2 p̂_2) / [p̂_2(1 - p̂_2)] = 0,
which gives
    p̂_1 = y_1 / n_1    and    p̂_2 = y_2 / n_2,
provided that p̂_1 and p̂_2 are not 0 or 1.

16 Since Y_1 and Y_2 are independent binomials, we know that
    Var(p̂_1) = p_1(1 - p_1) / n_1
and
    Var(p̂_2) = p_2(1 - p_2) / n_2.

17 Estimating the treatment differences
To obtain the MLE of the log odds ratio, we just plug in p̂_1 and p̂_2:
    log(ÔR) = log[ (p̂_1/(1 - p̂_1)) / (p̂_2/(1 - p̂_2)) ] = logit(p̂_1) - logit(p̂_2)
Now suppose we want to estimate the variance of log(ÔR). Since the treatment groups are independent, logit(p̂_1) and logit(p̂_2) are independent, so that
    Cov[logit(p̂_1), logit(p̂_2)] = 0.
The variance of a difference of independent random variables then gives
    Var[log(ÔR)] = Var[logit(p̂_1) - logit(p̂_2)] = Var[logit(p̂_1)] + Var[logit(p̂_2)].

18 Delta method approximation
Var[log(ÔR)] can be approximated by the delta method. To do so we need to calculate
    d/dp [log(p) - log(1 - p)] = 1/p + 1/(1 - p) = 1/[p(1 - p)].
Therefore,
    Var[ log(p̂/(1 - p̂)) ] ≈ [1/(p(1 - p))]² × p(1 - p)/n = 1/[n p(1 - p)] = 1/(np) + 1/[n(1 - p)].
Using these results from the delta method, we have
    Var[logit(p̂_1)] = 1/(n_1 p_1) + 1/[n_1(1 - p_1)]
and
    Var[logit(p̂_2)] = 1/(n_2 p_2) + 1/[n_2(1 - p_2)].
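A quick way to see that this approximation behaves as claimed is to simulate; the sketch below (assuming Python with numpy; n, p, and the number of replicates are arbitrary choices) compares the empirical variance of logit(p̂) with 1/(np) + 1/[n(1 - p)].

```python
import numpy as np

rng = np.random.default_rng(2011)
n, p, reps = 200, 0.3, 100_000

# Simulate Y ~ Bin(n, p) many times, compute logit(p_hat) for each replicate,
# and drop the rare replicates where p_hat is exactly 0 or 1 (logit undefined).
y = rng.binomial(n, p, size=reps)
keep = (y > 0) & (y < n)
p_hat = y[keep] / n
logit_hat = np.log(p_hat / (1 - p_hat))

print(logit_hat.var())                   # empirical variance of logit(p_hat)
print(1 / (n * p) + 1 / (n * (1 - p)))   # delta-method approximation
```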

19 Then,
    Var[log(ÔR)] = Var[logit(p̂_1)] + Var[logit(p̂_2)]
                 = 1/(n_1 p_1) + 1/[n_1(1 - p_1)] + 1/(n_2 p_2) + 1/[n_2(1 - p_2)],
which we estimate by replacing p_1 and p_2 with p̂_1 and p̂_2:
    V̂ar[log(ÔR)] = 1/(n_1 p̂_1) + 1/[n_1(1 - p̂_1)] + 1/(n_2 p̂_2) + 1/[n_2(1 - p̂_2)]
                 = 1/y_1 + 1/(n_1 - y_1) + 1/y_2 + 1/(n_2 - y_2).
Note: This is the same result we obtained in the previous lecture; however, in this case we assumed two independent binomial distributions.

20 General formula for the variance of a treatment difference
The MLE of a treatment difference θ = g(p_1) - g(p_2) is
    θ̂ = g(p̂_1) - g(p̂_2).
Also, since p̂_1 and p̂_2 are independent, g(p̂_1) and g(p̂_2) are independent. Recall that the variance of a difference of two independent random variables is
    Var[g(p̂_1) - g(p̂_2)] = Var[g(p̂_1)] + Var[g(p̂_2)].
Then, to obtain the large-sample variance, we apply the delta method to g(p̂_1) to get Var[g(p̂_1)] and to g(p̂_2) to get Var[g(p̂_2)], and sum the two.

21 The results are summarized in the following table:

TREATMENT DIFFERENCE   ESTIMATE                                    Var(ESTIMATE)
RISK DIFF              p̂_1 - p̂_2                                   p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2
log(RR)                log(p̂_1 / p̂_2)                              (1 - p_1)/(n_1 p_1) + (1 - p_2)/(n_2 p_2)
log(OR)                log[(p̂_1/(1 - p̂_1)) / (p̂_2/(1 - p̂_2))]      1/(n_1 p_1) + 1/[n_1(1 - p_1)] + 1/(n_2 p_2) + 1/[n_2(1 - p_2)]
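The table above maps directly onto a few lines of code. The sketch below (plain Python; the function name is ours, not from the lecture) returns each estimate and its estimated large-sample variance from the four cell counts of a prospective 2 × 2 table.

```python
import math

def two_by_two_summaries(y1, n1, y2, n2):
    """Risk difference, log RR, and log OR with estimated large-sample variances,
    given y_i successes out of n_i on each arm (all four cells assumed nonzero)."""
    p1, p2 = y1 / n1, y2 / n2

    risk_diff = p1 - p2
    var_rd = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2

    log_rr = math.log(p1 / p2)
    var_log_rr = (1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2)

    log_or = math.log((p1 / (1 - p1)) / (p2 / (1 - p2)))
    var_log_or = 1 / y1 + 1 / (n1 - y1) + 1 / y2 + 1 / (n2 - y2)

    return {"risk_diff": (risk_diff, var_rd),
            "log_rr": (log_rr, var_log_rr),
            "log_or": (log_or, var_log_or)}
```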

22 ESTIMATES of Standard Error, and LARGE SAMPLE CONFIDENCE INTERVALS
To estimate the variances, we replace p_1 and p_2 with p̂_1 and p̂_2:
    V̂ar(p̂_1 - p̂_2) = p̂_1(1 - p̂_1)/n_1 + p̂_2(1 - p̂_2)/n_2;
    V̂ar[log(R̂R)] = (1 - p̂_1)/(n_1 p̂_1) + (1 - p̂_2)/(n_2 p̂_2);
    V̂ar[log(ÔR)] = 1/(n_1 p̂_1) + 1/[n_1(1 - p̂_1)] + 1/(n_2 p̂_2) + 1/[n_2(1 - p̂_2)]
                 = 1/y_1 + 1/(n_1 - y_1) + 1/y_2 + 1/(n_2 - y_2).

23 Then large-sample 95% confidence intervals for the treatment differences can be obtained via
    (p̂_1 - p̂_2) ± 1.96 √[ p̂_1(1 - p̂_1)/n_1 + p̂_2(1 - p̂_2)/n_2 ],
    log(R̂R) ± 1.96 √[ (1 - p̂_1)/(n_1 p̂_1) + (1 - p̂_2)/(n_2 p̂_2) ],
and
    log(ÔR) ± 1.96 √[ 1/y_1 + 1/(n_1 - y_1) + 1/y_2 + 1/(n_2 - y_2) ].

24 Confidence Interval for OR and RR
You want a confidence interval for RR or OR that is assured to be in the interval (0, ∞). Similar to what we did for a confidence interval for p, it is better to first get confidence intervals for log(RR) or log(OR) and then exponentiate the endpoints, i.e.,
    exp{ log(ÔR) ± 1.96 √V̂ar[log(ÔR)] }
and
    exp{ log(R̂R) ± 1.96 √V̂ar[log(R̂R)] }.
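As a sketch of this recipe (plain Python; the function names are ours), the interval is formed on the log scale and the endpoints are then exponentiated:

```python
import math

def wald_ci(estimate, variance, z=1.96):
    """Large-sample 95% CI on the scale of the estimate."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width

def ratio_ci(log_estimate, variance, z=1.96):
    """95% CI for an OR or RR: exponentiate the endpoints of the log-scale CI."""
    lo, hi = wald_ci(log_estimate, variance, z)
    return math.exp(lo), math.exp(hi)
```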

25 Example: MI example
Suppose clinical trial participants are randomized to either Placebo or Aspirin. The subjects are followed prospectively for 5 years to determine whether or not an MI (heart attack) occurs. The following table summarizes the results:

                     Myocardial Infarction
               Heart Attack    No Attack    Total per Arm
Placebo             189          10,845         11,034
Aspirin             104          10,933         11,037

About 11,000 were randomized to each treatment.
The overall probability of a heart attack is low; the disease is rare:
    (189 + 104) / (11,034 + 11,037) = 0.0133, i.e., about 1.33%.
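With the counts as reconstructed in the table above, the summary measures quoted on the next slides can be checked numerically; this sketch reuses the hypothetical helpers from the earlier code blocks.

```python
import math

# Placebo: 189 MIs out of 11,034; Aspirin: 104 MIs out of 11,037
summaries = two_by_two_summaries(189, 11034, 104, 11037)

rd, var_rd = summaries["risk_diff"]
log_rr, var_log_rr = summaries["log_rr"]
log_or, var_log_or = summaries["log_or"]

print(round(rd, 4))                    # risk difference, about 0.0077
print(round(math.exp(log_rr), 3))      # RR, about 1.818
print(round(math.exp(log_or), 3))      # OR, about 1.832
print(ratio_ci(log_or, var_log_or))    # 95% CI for the OR
```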

26 Estimates and Test Statistics
The test statistics for H_0: p_1 = p_2 versus H_A: p_1 ≠ p_2:

                            Estimated          Z Statistic
Parameter     Estimate    Standard Error        (Est/SE)
RISK DIFF      0.0077         0.0015              5.0
log(RR)        0.598          0.121               4.9        (RR = 1.818)
log(OR)        0.606          0.123               4.9        (OR = 1.832)

In each case we reject the null; the Z statistic is about 5.

27 Confidence Intervals Creation
The following are the 95% confidence intervals:

Parameter     Estimate     95% C.I.
RISK DIFF      0.0077      [0.0047, 0.0107]
RR             1.818       [1.433, 2.306]
OR             1.832       [1.440, 2.331]

For the OR and RR, we exponentiated the 95% confidence intervals for log(OR) and log(RR), respectively. None of the confidence intervals contain the null value for no association (0 for the risk difference, 1 for the OR and RR).

28 Interpretation
The risk difference has the interpretation that the excess risk of a heart attack on Placebo is 0.0077. This fraction is not very meaningful for rare diseases, but stated in terms of 10,000 subjects, we can say that we would expect 77 more heart attacks among 10,000 placebo subjects than among 10,000 aspirin users.
The relative risk has the interpretation that individuals on Placebo have almost twice (1.8 times) the risk (or probability) of a heart attack compared with individuals on Aspirin.
The odds ratio has the interpretation that individuals on Placebo have almost twice (1.8 times) the odds of a heart attack (versus no heart attack) compared with individuals on Aspirin.

29 Relationship between OR and RR
Recall,
    OR = [p_1/(1 - p_1)] / [p_2/(1 - p_2)]
       = (p_1/p_2) × [(1 - p_2)/(1 - p_1)]
       = RR × [(1 - p_2)/(1 - p_1)].
When the disease is rare (in the example, p̂_2 < p̂_1 < 2%),
    (1 - p_2)/(1 - p_1) ≈ 1/1 = 1,   and so   OR ≈ RR.
In the example, ÔR = 1.832 and R̂R = 1.818, i.e., they are almost identical.

30 Likelihood Ratio Test
Now we want to test the null hypothesis
    H_0: p_1 = p_2 = p
versus the alternative
    H_A: p_1 ≠ p_2
with the likelihood ratio statistic (the likelihood ratio statistic generally has a two-sided alternative, i.e., it is χ²-based).
The general likelihood ratio statistic involves the estimate of p_1 = p_2 = p under the null and of (p_1, p_2) under the alternative. Thus, unlike the simple single binomial sample we discussed earlier, in which the null was H_0: p = 0.5, the parameters are not completely specified under the null; i.e., we must still estimate a common p under the null for the likelihood ratio.

31 General Likelihood Ratio Statistic
The likelihood is a function of the parameter vector p = [p_1, p_2]. In large samples, it can be shown that
    2 log{ L(p̂_1, p̂_2 | H_A) / L(p̃_1, p̃_2 | H_0) } = 2[ log L(p̂_1, p̂_2 | H_A) - log L(p̃_1, p̃_2 | H_0) ] ~ χ²_df,
where
    L(p̂_1, p̂_2 | H_A) is the likelihood after replacing [p_1, p_2] by its estimate [p̂_1, p̂_2] under H_A,
    L(p̃_1, p̃_2 | H_0) is the likelihood after replacing [p_1, p_2] by its estimate [p̃_1, p̃_2] under H_0 (in our case [p̃_1, p̃_2] = [p̂, p̂], since p_1 = p_2 = p under the null),
and the degrees of freedom df is the difference in the number of parameters estimated under the alternative and under the null (in our example, df = 2 - 1 = 1).

32 MLE under the Null
Thus, to use the likelihood ratio statistic, we need to estimate the common p under the null hypothesis. When H_0: p_1 = p_2 = p,
    E(Y_1) = n_1 p   and   E(Y_2) = n_2 p.
Then,
    E(Y_1 + Y_2) = E(Y_1) + E(Y_2) = n_1 p + n_2 p = (n_1 + n_2) p.
The pooled estimate of p is
    p̂ = (Y_1 + Y_2) / (n_1 + n_2) = (total # successes) / (total sample size),
which is unbiased and is the MLE. Intuitively, when the probability of success is the same on both treatments, the best estimate (MLE) of p is obtained by pooling over the treatments.

33 Using the likelihood to obtain the MLE under the null
Under the null H_0: p_1 = p_2 = p, the MLE of p is obtained from the likelihood
    L(p) = (n_1 choose y_1)(n_2 choose y_2) p^{y_1}(1 - p)^{n_1 - y_1} p^{y_2}(1 - p)^{n_2 - y_2}
         = (n_1 choose y_1)(n_2 choose y_2) p^{y_1 + y_2} (1 - p)^{(n_1 + n_2) - (y_1 + y_2)}.
Then,
    d log L(p)/dp = d/dp log[(n_1 choose y_1)(n_2 choose y_2)] + d/dp log[ p^{y_1 + y_2} (1 - p)^{(n_1 + n_2) - (y_1 + y_2)} ]
                  = [ (y_1 + y_2) - (n_1 + n_2) p ] / [ p(1 - p) ].

34 This is the same first derivative as for a single binomial sample; in fact, under the null,
    Y_1 + Y_2 ~ Bin(n_1 + n_2, p),
and it is easily shown that the solution is
    p̂ = (Y_1 + Y_2) / (n_1 + n_2).

35 Using the Estimates to obtain the Likelihood Ratio Statistic
Under the alternative,
    p̂_1 = Y_1/n_1   and   p̂_2 = Y_2/n_2,
and
    log[L(p̂_1, p̂_2 | H_A)] = log(n_1 choose y_1) + log(n_2 choose y_2)
                              + y_1 log(p̂_1) + (n_1 - y_1) log(1 - p̂_1)
                              + y_2 log(p̂_2) + (n_2 - y_2) log(1 - p̂_2).

36 Then,
    log[L(p̂, p̂ | H_0)] = log(n_1 choose y_1) + log(n_2 choose y_2)
                          + y_1 log(p̂) + (n_1 - y_1) log(1 - p̂)
                          + y_2 log(p̂) + (n_2 - y_2) log(1 - p̂).
Under the alternative we estimate 2 parameters; under the null we estimate 1, so df = 2 - 1 = 1. Then we take 2 times the difference in the log-likelihoods and compare it to a chi-square with 1 df.

37 Simplification of the Likelihood Ratio Statistic
The likelihood ratio statistic equals 2 times the difference in the log-likelihoods under the alternative and the null:
    G² = 2[ y_1 log(p̂_1/p̂) + (n_1 - y_1) log((1 - p̂_1)/(1 - p̂))
          + y_2 log(p̂_2/p̂) + (n_2 - y_2) log((1 - p̂_2)/(1 - p̂)) ]
       = 2[ y_1 log(y_1/(n_1 p̂)) + (n_1 - y_1) log((n_1 - y_1)/(n_1(1 - p̂)))
          + y_2 log(y_2/(n_2 p̂)) + (n_2 - y_2) log((n_2 - y_2)/(n_2(1 - p̂))) ]
       ~ χ²_1 under the null, in large samples.
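A direct transcription of this form of G² (a sketch in plain Python; zero cells would need special handling and are not guarded against here):

```python
import math

def lr_statistic(y1, n1, y2, n2):
    """Likelihood ratio statistic G^2 for H0: p1 = p2 in a prospective 2x2 table."""
    p_pool = (y1 + y2) / (n1 + n2)   # pooled estimate of p under the null
    return 2 * (y1 * math.log(y1 / (n1 * p_pool))
                + (n1 - y1) * math.log((n1 - y1) / (n1 * (1 - p_pool)))
                + y2 * math.log(y2 / (n2 * p_pool))
                + (n2 - y2) * math.log((n2 - y2) / (n2 * (1 - p_pool))))
```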

38 OBSERVED and EXPECTED Cell Counts
First, let's look at the (2 × 2) table of OBSERVED cell counts:

                       OUTCOME
                   1              2                          Total
TRT   1           Y_1         n_1 - Y_1                       n_1
      2           Y_2         n_2 - Y_2                       n_2
      Total    Y_1 + Y_2   (n_1 + n_2) - (Y_1 + Y_2)       n_1 + n_2

39 If we look at the likelihood ratio statistic,
    G² = 2[ y_1 log(y_1/(n_1 p̂)) + (n_1 - y_1) log((n_1 - y_1)/(n_1(1 - p̂)))
          + y_2 log(y_2/(n_2 p̂)) + (n_2 - y_2) log((n_2 - y_2)/(n_2(1 - p̂))) ],
the numerators inside the logs are the observed cell counts for the 4 cells in the table. Statisticians often let O_ij denote the observed count in row i, column j:
    O_11 = Y_1,   O_12 = n_1 - Y_1,   O_21 = Y_2,   O_22 = n_2 - Y_2.

40 Then, we can rewrite the observed table as

                    OUTCOME
                 1             2              Total
TRT   1         O_11          O_12         O_11 + O_12
      2         O_21          O_22         O_21 + O_22
      Total  O_11 + O_21   O_12 + O_22

We will show that the likelihood ratio statistic is often written as
    G² = 2[ y_1 log(y_1/(n_1 p̂)) + (n_1 - y_1) log((n_1 - y_1)/(n_1(1 - p̂)))
          + y_2 log(y_2/(n_2 p̂)) + (n_2 - y_2) log((n_2 - y_2)/(n_2(1 - p̂))) ]
       = 2 Σ_{i=1}^{2} Σ_{j=1}^{2} O_ij log(O_ij / E_ij).

41 Simple Form of the Estimated Expected Counts
First, suppose p_1 ≠ p_2. Then the (2 × 2) table of expected cell counts is

                       OUTCOME
                 1                    2                                    Total
TRT   1        n_1 p_1            n_1(1 - p_1)                              n_1
      2        n_2 p_2            n_2(1 - p_2)                              n_2
      Total  n_1 p_1 + n_2 p_2   (n_1 + n_2) - (n_1 p_1 + n_2 p_2)       n_1 + n_2

If we look at the n_1 subjects in the first row, we expect n_1 p_1 of them to have outcome 1 and n_1(1 - p_1) to have outcome 2. Similarly, for the n_2 subjects in the second row, we expect n_2 p_2 to have outcome 1 and n_2(1 - p_2) to have outcome 2.

42 Under the null, when the probability of success is the same on both treatments (p_1 = p_2 = p), the table of expected counts looks like

                       OUTCOME
                 1                  2                  Total
TRT   1        n_1 p            n_1(1 - p)              n_1
      2        n_2 p            n_2(1 - p)              n_2
      Total  (n_1 + n_2) p   (n_1 + n_2)(1 - p)       n_1 + n_2

Here, if we look at the n_1 subjects in the first row, we expect n_1 p of them to have outcome 1 and n_1(1 - p) to have outcome 2. Similarly, for the n_2 subjects in the second row, we expect n_2 p to have outcome 1 and n_2(1 - p) to have outcome 2.

43 Under H_0: p_1 = p_2 = p, the table of estimated expected counts looks like

                       OUTCOME
                 1                  2                  Total
TRT   1        n_1 p̂            n_1(1 - p̂)              n_1
      2        n_2 p̂            n_2(1 - p̂)              n_2
      Total  (n_1 + n_2) p̂   (n_1 + n_2)(1 - p̂)       n_1 + n_2

where, recall, p̂ is the pooled estimate of p,
    p̂ = (Y_1 + Y_2)/(n_1 + n_2) = (total # successes)/(total sample size).
These estimated expected counts are denoted E_ij (i-th row, j-th column) and appear in the denominators of the likelihood ratio statistic, with
    E_11 = n_1 p̂,   E_12 = n_1(1 - p̂),   E_21 = n_2 p̂,   E_22 = n_2(1 - p̂).

44 Simplification of Expected Cell Counts
Substituting
    p̂ = (Y_1 + Y_2)/(n_1 + n_2)
and
    1 - p̂ = 1 - (Y_1 + Y_2)/(n_1 + n_2) = [(n_1 + n_2) - (Y_1 + Y_2)]/(n_1 + n_2)
into the table, we get the E_ij's:

                       OUTCOME
                 1                                2                                            Total
TRT   1    n_1(Y_1 + Y_2)/(n_1 + n_2)       n_1[(n_1 + n_2) - (Y_1 + Y_2)]/(n_1 + n_2)          n_1
      2    n_2(Y_1 + Y_2)/(n_1 + n_2)       n_2[(n_1 + n_2) - (Y_1 + Y_2)]/(n_1 + n_2)          n_2
      Total       Y_1 + Y_2                 (n_1 + n_2) - (Y_1 + Y_2)                         n_1 + n_2

45 From this table, you can see that
    E_ij = [i-th row total] × [j-th column total] / [total sample size (n_1 + n_2)].
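This row-total-times-column-total rule is simple to implement; a sketch (plain Python, observed table passed as a list of row lists):

```python
def expected_counts(observed):
    """E_ij = (row i total) * (column j total) / (grand total) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]
```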

46 Summary
We did all this to show that
    G² = 2 Σ_{i=1}^{2} Σ_{j=1}^{2} O_ij log(O_ij / E_ij).
Note that we can also write this as
    G² = 2 Σ_{i=1}^{2} Σ_{j=1}^{2} O_ij [log(O_ij) - log(E_ij)].
Writing it this way, we see that the likelihood ratio statistic measures the discrepancy between the log of the observed counts and the log of the estimated expected counts under the null; if they are similar, you would expect the statistic to be small and the null not to be rejected.
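Putting the pieces together for the aspirin example (counts as reconstructed earlier), the sketch below computes G² in its observed/expected form and, as an optional cross-check, compares it against scipy's chi2_contingency called with lambda_="log-likelihood" (which requests the G-test) and the continuity correction turned off.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

obs = np.array([[189, 10845],    # Placebo: MI, no MI
                [104, 10933]])   # Aspirin: MI, no MI

# E_ij = (row total)(column total)/(grand total), then G^2 = 2 * sum O log(O/E)
expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
g2 = 2 * np.sum(obs * np.log(obs / expected))
print(g2, chi2.sf(g2, df=1))

# Cross-check against scipy's implementation of the same statistic
g2_scipy, p_scipy, dof, _ = chi2_contingency(obs, correction=False,
                                             lambda_="log-likelihood")
print(g2_scipy, p_scipy, dof)
```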
