Chi Squared and Fisher's Exact Tests. Observed vs Expected Distributions




BMS 617 Statistical Techniques for the Biomedical Sciences
Lecture 11: Chi-Squared and Fisher's Exact Tests

Chi Squared and Fisher's Exact Tests
- This lecture presents two similarly structured tests: the chi-squared test and Fisher's exact test
  - Fisher's exact test is a computationally intensive test
  - The chi-squared test provides a good approximation in many cases
- Both tests are for purely categorical data
  - i.e. looking at the count of values in given categories
- There are two distinct uses for these tests:
  - Testing if data match an expected distribution
  - Comparing proportions among different groups

Observed vs Expected Distributions
- The chi-squared test can be used to test whether a discrete distribution of results follows a predicted or expected distribution.
- Example (from Motulsky):
  - Study from 2007 investigated heart disease among firefighters
  - Hypothesized that risk of death from heart disease was related to the duty the firefighter was performing at the time
  - Null hypothesis is that risk of death from heart disease is independent of duty performed

Kales et al. study

Duty                Number of heart attacks observed   Proportion of time spent in duty
Fire suppression    144                                2.0%
Alarm response      138                                16.0%
Physical training   56                                 8.0%
Other duties        111                                74.0%
Total               449                                100.0%

- Under the null hypothesis, the number of heart attack deaths occurring during each duty would be proportional to the time spent in that duty
- So, for example, the number of deaths expected during fire suppression would be 2.0% of 449 = 9.0.
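As a minimal sketch of the expected-count arithmetic described above, the following Python snippet multiplies the total number of deaths by the share of time spent in each duty. The counts and percentages are taken from the Kales et al. table; the function and variable names are illustrative only and are not part of the lecture.

```python
# Observed deaths and share of working time per duty (from the Kales et al. table).
observed = {
    "Fire suppression": 144,
    "Alarm response": 138,
    "Physical training": 56,
    "Other duties": 111,
}
time_share = {
    "Fire suppression": 0.02,
    "Alarm response": 0.16,
    "Physical training": 0.08,
    "Other duties": 0.74,
}


def expected_counts(observed, shares):
    """Expected count per category under the null: total count times category share."""
    total = sum(observed.values())
    return {category: total * shares[category] for category in observed}


expected = expected_counts(observed, time_share)
for duty, value in expected.items():
    print(f"{duty:18s} observed={observed[duty]:4d} expected={value:6.1f}")
```

The expected counts always sum to the observed total, which is a quick check that the proportions add up to 100%.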

Observed and Expected values

Duty                Number of heart attacks observed   Expected number
Fire suppression    144                                9.0
Alarm response      138                                71.8
Physical training   56                                 35.9
Other duties        111                                332.3
Total               449                                449.0

- The number of deaths from heart attack while on active duty, and particularly while actively working on fire suppression, is far higher than the expected number

The Chi-squared goodness of fit test
- The chi-squared goodness of fit test computes a p-value for the following question:
  - If the data were distributed as defined in the null hypothesis, what is the probability of seeing a discrepancy between the observed and expected values at least this large solely by random sample selection?
- In our example, the distribution under the null hypothesis is simply the one in which deaths from heart disease are distributed across duties in proportion to the time spent in each
- The chi-squared statistic is defined as follows:
  - For each category:
    - Subtract the expected value from the observed value and square the difference
    - Divide the square by the expected value
  - Sum these up. This is the chi-squared (χ²) value.

The chi-squared distribution
- The chi-squared value under the null hypothesis follows a known distribution that depends on the number of degrees of freedom
- The number of degrees of freedom in a goodness of fit test is the number of categories minus one
  - In the firefighter example there are three degrees of freedom
- The p-value is obtained by comparing the χ² value to a table of χ² values for that number of degrees of freedom, or by using statistical software.
- For our example, χ² = 2245, d.f. = 3, and p < 10^-6

Chi-squared goodness of fit test: cautions
- When using the chi-squared goodness of fit test:
  - Always use actual count data for the observed numbers
  - Do not use percentages or normalized values
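The statistic and p-value for the firefighter example can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's own code: the function names are hypothetical, and the p-value uses a closed-form upper-tail formula that is valid only for exactly 3 degrees of freedom (any other number of degrees of freedom would need a table or a statistics library).

```python
import math


def chi_squared_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_upper_tail_3df(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 d.f.

    Closed form for 3 d.f. only (assumption of this sketch):
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


observed = [144, 138, 56, 111]        # actual counts, never percentages
shares = [0.02, 0.16, 0.08, 0.74]     # proportion of time spent in each duty
total = sum(observed)
expected = [total * s for s in shares]

chi2 = chi_squared_statistic(observed, expected)
df = len(observed) - 1                # categories minus one
p_value = chi2_upper_tail_3df(chi2)

print(f"chi-squared = {chi2:.1f}, d.f. = {df}, p < 1e-6: {p_value < 1e-6}")
```

With unrounded expected counts the statistic comes out near 2250; the value of 2245 quoted on the slide appears to come from expected counts rounded to one decimal place. Either way the conclusion is the same. If SciPy is available, `scipy.stats.chisquare(observed, f_exp=expected)` performs the same test for any number of categories.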

Chi-squared goodness of fit test: cautions (continued)
- The observed values should always be integers
- If the expected number is less than 5 in any category, or less than 10 if there are only two categories, the results may be invalid.
  - For two categories, a binomial test may be used instead
  - Or the sample size can be increased to achieve the required expected values
- Do not confuse this test (goodness of fit test) with the chi-squared test for comparing proportions (test of independence).

Proportion comparison studies
- Many studies, particularly clinical studies, answer a question of the type "Does exposure to a risk factor (or a specific treatment, etc.) change the rate of disease?"
- Note that in these studies, both the dependent variable (disease status) and the independent variable (treated vs untreated, etc.) are categorical

Types of study
- Some jargon:
  - Incidence: rate of new cases of disease
  - Prevalence: proportion of a sample which has the disease
- Cross-sectional study: a sample is chosen without control as to how many are affected with the disease or as to how many were exposed to the risk factor. The subjects are divided as to exposure to the risk factor and disease prevalence is compared.
- Prospective or longitudinal study: Two groups are selected, one exposed to a risk factor (or treatment), one not. They are followed over the natural timeline of the disease, and disease incidence is compared.
- Experimental study: A single sample is chosen and randomly divided into two groups. One group is treated (or exposed to a risk factor), one is not. Incidence is compared between the two groups.
- Case-control or retrospective study: Two groups are selected, one with the disease, one without. The number exposed to the risk factor (or treated, etc.) is compared between the two groups.
Contingency Tables
- Data from all these types of study may be summarized in a contingency table
- Rows in the table represent exposure to the risk factor or treatment status
- Columns in the table represent disease status
- Cells in the table are counts of the number of subjects in that category

                          Disease   No disease   Total
Exposed/Treated           A         B            A+B
Not Exposed/Not Treated   C         D            C+D
Total                     A+C       B+D          A+B+C+D

Example experimental study
- Example study we saw previously: CABG vs PTCA in coronary artery disease (CAD) patients
- Sample of 1829 patients with CAD. 914 randomly assigned to bypass (CABG), 915 to angioplasty (PTCA).
- Focus on five-year survival rates

        Survived 5 years   Did not survive 5 years   Total
CABG    542                372                       914
PTCA    537                378                       915
Total   1079               750                       1829

- 40.7% of those receiving CABG died within five years, and 41.3% of those receiving PTCA died within five years
- Clearly little difference

Diabetes and CABG/PTCA
- Study also looked at survival rates for CABG vs PTCA among patients treated for diabetes
- Note this is still an experimental study, as the treatments were assigned and the outcome was measured subsequently
  - Even though the diabetes diagnosis was retrieved retrospectively
- Five-year survival among diabetic patients:

        Survived 5 years   Did not survive 5 years   Total
CABG    93                 87                        180
PTCA    69                 104                       173
Total   162                191                       353

- Mortality rates were 48.3% for CABG-treated patients and 60.1% for PTCA-treated patients

Diabetes and CABG/PTCA: Data Analysis
- Aim is to know the extent to which these data generalize to the general population of CAD patients with diabetes
- One way is to compute confidence intervals for these proportions
  - Saw how to do this in lecture 2.
  - 95% CI for CABG is 41% to 56% and for PTCA is 53% to 67%
  - The CIs overlap, but this does not mean the difference is not statistically significant

Attributable Risk and NNT
- The difference between the two proportions is called the attributable risk
  - The amount of risk which can be attributed to the treatment or exposure to risk factor
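The mortality proportions, their confidence intervals, and the attributable risk for the diabetic patients can be reproduced with a short Python sketch. The interval formula here is the simple normal-approximation (Wald) interval, which is an assumption of this example; the method taught in lecture 2 is not shown in this transcript and may differ slightly. Function and variable names are illustrative.

```python
import math


def proportion_with_ci(events, n, z=1.96):
    """Proportion and approximate 95% CI using the Wald (normal approximation) interval."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Five-year deaths among diabetic patients, from the contingency table.
cabg_deaths, cabg_total = 87, 180
ptca_deaths, ptca_total = 104, 173

cabg_risk, cabg_low, cabg_high = proportion_with_ci(cabg_deaths, cabg_total)
ptca_risk, ptca_low, ptca_high = proportion_with_ci(ptca_deaths, ptca_total)

# Attributable risk: the difference between the two mortality proportions.
attributable_risk = ptca_risk - cabg_risk

print(f"CABG mortality {cabg_risk:.1%} (95% CI {cabg_low:.0%} to {cabg_high:.0%})")
print(f"PTCA mortality {ptca_risk:.1%} (95% CI {ptca_low:.0%} to {ptca_high:.0%})")
print(f"Attributable risk {attributable_risk:.1%}")
```

Under the Wald assumption the intervals round to the same 41% to 56% and 53% to 67% quoted on the slide.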

Attributable Risk and NNT (continued)
- For our example the attributable risk is 60.1% - 48.3% = 11.8%.
  - This means that for diabetic patients, choosing PTCA over CABG is associated with an additional 11.8 percentage points of five-year mortality risk.
- The reciprocal of attributable risk is called the number needed to treat (NNT).
  - For this example, NNT = 1/0.118 = 8.5.
  - This means that if we choose to give CAD patients with diabetes CABG instead of PTCA, for every 8.5 patients treated, one will survive five years who would not have done so under PTCA.

Relative Risk
- The relative risk is the ratio of the risks.
  - In this example it is 48.3%/60.1% = 0.80.
  - This means that CAD-diabetic patients treated with CABG have 80% of the chance of mortality of PTCA-treated patients.
- The confidence interval of the relative risk can also be computed.
  - For this example the 95% confidence interval is 66% to 98%.
- Be careful with the percentages:
  - For attributable risk, the percentages represent the percentages of subjects
  - For relative risk, the proportion 0.80 (80%) represents a relative probability; it's the proportion of risk assumed by one group relative to the other group

Attributable Risk or Relative Risk?
- Whether attributable risk (or NNT) or relative risk is more useful depends on the context
- When making a choice between treatments, the relative risk is often more intuitive
  - Risk under CABG is 80% of the risk under PTCA
- But on a population level, the relative risk can be misleading
  - Imagine a vaccine which reduces the occurrence of a disease by 60%
  - Relative risk of taking the vaccine is 40% compared to no vaccine
  - The utility of the vaccine depends on the prevalence of the disease in the general population
  - If the disease is very rare, say prevalence is 1 in 1 million, then administering the vaccine will save 6 in 10 million people
  - If the disease is common, say prevalence of 1%, then the vaccine will save 6 in 1000 people

Fisher's Exact Test
- The data analysis can be supplemented with a p-value
- Remember the p-value cannot be interpreted without a null hypothesis
  - The null hypothesis is that the categories corresponding to the rows in the contingency table are independent of the categories corresponding to the columns.
  - In this example, the null hypothesis is that the mortality rate does not depend on which treatment was used
- The p-value is the probability of seeing mortality rates at least this different under the assumption of the null hypothesis.
- The best way to compute a p-value for contingency table data is with Fisher's Exact Test.
  - For very large sample sizes (in the 100,000s and above), this test is computationally unwieldy, and a chi-squared test can be used as an approximation
- For the example data, p = 0.0325 by Fisher's Exact Test

Summary
- Chi-squared and Fisher's Exact Tests can be used for two scenarios:
  - Testing if categorical data match an expected distribution
  - Testing for independence of two categorical variables
    - i.e. testing if proportions are equivalent among two or more sets of counted data
- Chi-squared goodness of fit test used to test if categorical data match an expected distribution
  - Approximate test
  - Good when expected values are at least 5 in each category (at least 10 if only two categories)
  - Must use actual counts, not proportions or normalized values
- Contingency tables tabulate subject counts in different categories
  - Usually use rows in the table for the independent (predictor) variable
  - Columns for the dependent (outcome) variable

Summary (continued)
- Attributable risk is the difference in proportions between treatment categories
- Number needed to treat (NNT) is the reciprocal of attributable risk
- Relative risk is the ratio of proportions between treatment categories
- Fisher's exact test can be used to compute a p-value for the null hypothesis that there is no relationship between the dependent and independent variable (i.e. the variables are independent of each other)
  - Computationally prohibitive for very large data sets
  - Can use chi-squared test for independence instead (but never needed in practice)
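The quantities listed in the summary can be sketched for the diabetic CABG/PTCA table using only the Python standard library. Several choices here are assumptions of this example rather than the lecture's stated method: the relative-risk interval uses the common log-scale (Katz) approximation, and the two-sided Fisher p-value sums the hypergeometric probabilities of all tables that are no more likely than the observed one. All function names are hypothetical.

```python
import math


def relative_risk_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Relative risk of group A vs group B with a log-scale (Katz) 95% CI."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell is hypergeometric; the p-value
    sums the probabilities of every table no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = math.comb(row1 + row2, col1)

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


# Diabetic patients: rows are CABG and PTCA, columns are survived / did not survive.
cabg_survived, cabg_died = 93, 87
ptca_survived, ptca_died = 69, 104
cabg_total = cabg_survived + cabg_died
ptca_total = ptca_survived + ptca_died

attributable_risk = ptca_died / ptca_total - cabg_died / cabg_total
nnt = 1 / attributable_risk
rr, rr_low, rr_high = relative_risk_ci(cabg_died, cabg_total, ptca_died, ptca_total)
p_value = fisher_exact_two_sided(cabg_survived, cabg_died, ptca_survived, ptca_died)

print(f"Attributable risk {attributable_risk:.1%}, NNT {nnt:.1f}")
print(f"Relative risk {rr:.2f} (95% CI {rr_low:.2f} to {rr_high:.2f})")
print(f"Fisher's exact test, two-sided p = {p_value:.4f}")
```

The exact enumeration is cheap for a table of this size, which is why the chi-squared approximation is only worth considering for very large samples. Statistical packages provide the same test directly, for example `scipy.stats.fisher_exact` in SciPy.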