Multiple samples: Pairwise comparisons and categorical outcomes



Similar documents
Two-sample inference: Continuous data

Analysis of Variance ANOVA

Statistics for Sports Medicine

Chi-square test Fisher s Exact test

Section 13, Part 1 ANOVA. Analysis Of Variance

Analysing Questionnaires using Minitab (for SPSS queries contact -)

Recall this chart that showed how most of our course would be organized:

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Odds ratio, Odds ratio test for independence, chi-squared statistic.

Comparing Multiple Proportions, Test of Independence and Goodness of Fit

Chi Squared and Fisher's Exact Tests. Observed vs Expected Distributions

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Testing Research and Statistical Hypotheses

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Statistiek II. John Nerbonne. October 1, Dept of Information Science

Having a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails.

Chapter 23. Two Categorical Variables: The Chi-Square Test

Association Between Variables

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Comparing Means in Two Populations

Testing differences in proportions

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

Section 12 Part 2. Chi-square test

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont

13: Additional ANOVA Topics. Post hoc Comparisons

Chapter 7. Comparing Means in SPSS (t-tests) Compare Means analyses. Specifically, we demonstrate procedures for running Dependent-Sample (or

Multiple-Comparison Procedures

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

Data Analysis, Research Study Design and the IRB

Test Positive True Positive False Positive. Test Negative False Negative True Negative. Figure 5-1: 2 x 2 Contingency Table

Using Excel for inferential statistics

Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish

One-Way Analysis of Variance

Contingency Tables and the Chi Square Statistic. Interpreting Computer Printouts and Constructing Tables

TABLE OF CONTENTS. About Chi Squares What is a CHI SQUARE? Chi Squares Hypothesis Testing with Chi Squares... 2

1 Nonparametric Statistics

The Dummy s Guide to Data Analysis Using SPSS

Hypothesis Testing: Two Means, Paired Data, Two Proportions

Chapter 5 Analysis of variance SPSS Analysis of variance

Principles of Hypothesis Testing for Public Health

Nonparametric Statistics

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Topic 8. Chi Square Tests

SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS

3.4 Statistical inference for 2 populations based on two samples

Week 4: Standard Error and Confidence Intervals

UNDERSTANDING THE TWO-WAY ANOVA

MATH 140 Lab 4: Probability and the Standard Normal Distribution

Final Exam Practice Problem Answers

Tests for Two Proportions

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

Biostatistics: Types of Data Analysis

Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

Descriptive Statistics

1 Basic ANOVA concepts

II. DISTRIBUTIONS distribution normal distribution. standard scores

ABSORBENCY OF PAPER TOWELS

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

Understand the role that hypothesis testing plays in an improvement project. Know how to perform a two sample hypothesis test.

Hypothesis testing - Steps

Statistics in Medicine Research Lecture Series CSMC Fall 2014

Chapter 15. Mixed Models Overview. A flexible approach to correlated data.

Is it statistically significant? The chi-square test

How To Check For Differences In The One Way Anova

Hypothesis Testing for Beginners

Generalized Linear Models

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Parametric and non-parametric statistical methods for the life sciences - Session I

Chapter 14: Repeated Measures Analysis of Variance (ANOVA)

This chapter discusses some of the basic concepts in inferential statistics.

Two Correlated Proportions (McNemar Test)

SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES

Statistical Models in R

Lecture 25. December 19, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Math 108 Exam 3 Solutions Spring 00

CHAPTER 11 CHI-SQUARE AND F DISTRIBUTIONS

SPSS Explore procedure

Research Methodology: Tools

Elementary Statistics Sample Exam #3

Chapter 7 Section 7.1: Inference for the Mean of a Population

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Chapter 2 Probability Topics SPSS T tests

THE KRUSKAL WALLLIS TEST

Use of the Chi-Square Statistic. Marie Diener-West, PhD Johns Hopkins University

2 Sample t-test (unequal sample sizes and unequal variances)

11. Analysis of Case-control Studies Logistic Regression

Prospective, retrospective, and cross-sectional studies

statistics Chi-square tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals

Simple Linear Regression Inference

Linear Models in STATA and ANOVA

PRACTICE PROBLEMS FOR BIOSTATISTICS

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Projects Involving Statistics (& SPSS)

EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test

Tests for One Proportion

This clinical study synopsis is provided in line with Boehringer Ingelheim s Policy on Transparency and Publication of Clinical Study Data.

Transcription:

Multiple samples: Pairwise comparisons and categorical outcomes Patrick Breheny May 1 Patrick Breheny Introduction to Biostatistics (171:161) 1/19

Introduction Pairwise comparisons In the previous lecture, we saw how one could use ANOVA with the tailgating study to test the hypothesis that the average following distances in all four of the groups were the same There was strong evidence (p = 0.006 using the rank transformation) that this was not the case We then looked at the estimated means and confidence intervals for the average quantile in each group and remarked that it looked like the MDMA group had much lower following distances than the other three groups, but that the alcohol, marijuana, drug-free groups were about the same Patrick Breheny Introduction to Biostatistics (171:161) 2/19

Pairwise comparisons In many ways, this is fine our primary analysis determined that there was a difference among the means, and the rest is just commentary about which of those differences are most substantial However, it is often desirable to have a formal, objective criterion for deciding which pairs are significantly different from each other One approach would be to carry out all 6 pairwise comparisons with a Bonferroni correction to the significance levels Patrick Breheny Introduction to Biostatistics (171:161) 3/19

Tukey s range distribution A somewhat more powerful approach, however, is to use Tukey s range test (also known as Tukey s Honest Significant Difference test and the Tukey-Kramer method) Tukey s idea was to focus on the distribution of the largest difference (i.e., the range) in the means of multiple groups, and he worked out mathematical expressions for the distribution of the range when comparing K sample means, all of which have the same population mean With these expressions, we can set a threshold of, say, the 95th percentile of the range distribution; doing so directly controls the family-wise error rate as there is only a 5% chance that a single pairwise difference will exceed this bound Patrick Breheny Introduction to Biostatistics (171:161) 4/19

Tukey s range distribution N A T N T A T M N MM A 0.4 0.3 Density 0.2 0.1 0.0 0.0 0.5 1.0 1.5 Range Patrick Breheny Introduction to Biostatistics (171:161) 5/19

Adjusted p-values Pairwise comparisons Calculating tail areas, we arrive at adjusted p-values 1 : Difference Adjusted in means 2 p-value MDMA-ALC 0.29 0.01 MDMA-NODRUG 0.27 0.01 MDMA-THC 0.22 0.04 THC-ALC 0.07 0.78 THC-NODRUG 0.05 0.87 ALC-NODRUG -0.02 0.99 1 The range distribution on the previous page was for n = 30 in each group. In the actual study, some groups had more than 30 and others less than 30, which is properly accounted for in the above p-values 2 Difference in average quantiles, since we carried out a rank transformation Patrick Breheny Introduction to Biostatistics (171:161) 6/19

Bonferroni, Tukey, and Unadjusted p-values Difference Adjusted p-values in means p Bonferroni Tukey MDMA-ALC 0.29 0.002 0.01 0.01 MDMA-NODRUG 0.27 0.001 0.01 0.01 MDMA-THC 0.22 0.008 0.05 0.04 THC-ALC 0.07 0.349 1.00 0.78 THC-NODRUG 0.05 0.448 1.00 0.87 NODRUG-ALC -0.02 0.780 1.00 0.99 Patrick Breheny Introduction to Biostatistics (171:161) 7/19

Testosterone treatment for low libido Let s look at another example of a multiple group study, this one from 2008 and examining the use of testosterone as a therapy for diminished libido in postmenopausal women In the study, women were randomly assigned to three groups: one group received a patch delivering 300 µg of testosterone per day, one group received a patch delivering 150 µg of testosterone per day, and one group was assigned to placebo (a patch delivering no testosterone) Potentially, we are interested in three different comparisons here: comparing each of the two treatments to the placebo, as well as comparing the two testosterone treatments to each other Patrick Breheny Introduction to Biostatistics (171:161) 8/19

Outcome: Number of satisfying episodes As a primary outcome, the authors looked at the number of satisfying sexual episodes over the final four-week-period of the study: Testosterone for Low Libido in Postmenopausal Women Not Taking No. of Satisfying Episodes 7.0 6.0 5.0 4.0 3.0 2.0 1.0 0.0 0.7 Placebo (N=265) 1.2 TPT, 150 µg (N=252) P<0.001 P<0.001 2.1 P=0.02 2.0 1.2 TPT, 300 µg (N=254) 0.5 Placebo (N=196) Figure from Davis, et 1.5 al. (2008), Testosterone for Low Libido in Postmenopausal Women Not Taking Estrogen, N Engl J Med, 359:2005-17: TPT, 150 µg (N=187) TPT, 300 µg (N=189) Additiona Baseline Placebo (N=69) Intention-to-Treat Population Subgroup with Natural Subgroup w Patrick Breheny Introduction to Biostatistics Menopause(171:161) Induced 9/19 T 15 (N

ANOVA Pairwise comparisons Performing an ANOVA comparing the three-mean model and the common-mean model: RSS 0 = 9, 668 RSS 1 = 9, 408 The three-mean model explains 2.7% of the variability in frequency of satisfying episodes: 9, 668 9, 408 9, 668 =.027 A total of 771 women completed the study, so ˆσ 2 = 12.25, F = 10.6, and p = 0.00003 Patrick Breheny Introduction to Biostatistics (171:161) 10/19

Estimates (and CIs) for difference from baseline Difference from baseline four-week frequency of satisfying sexual episodes: Patrick Breheny Introduction to Biostatistics (171:161) 11/19

Significance of pairwise comparisons Difference p p Tukey High-Placebo 1.4 < 0.0001 < 0.0001 Low-Placebo 0.5 0.11 0.17 High-Low 0.9 0.01 0.01 Patrick Breheny Introduction to Biostatistics (171:161) 12/19

Complications from testosterone therapy The study also looked at complications arising from testosterone therapy We ll take a closer look at two binary (yes/no) complications to get a sense of how to analyze multiple-sample studies involving categorical outcomes Patrick Breheny Introduction to Biostatistics (171:161) 13/19

Complications from acne Acne Yes No Placebo 14 263 Low dose (150 µg) 15 252 High dose (300 µg) 16 251 Patrick Breheny Introduction to Biostatistics (171:161) 14/19

Expected counts We can perform a test of the overall hypothesis that the complication rate is the same in all three groups by using a χ 2 test, which proceeds exactly as in the 2 2 case We begin by calculating expected counts under the null: Acne Yes No Placebo 15.4 261.6 Low dose (150 µg) 14.8 252.2 High dose (300 µg) 14.8 251.2 Patrick Breheny Introduction to Biostatistics (171:161) 15/19

χ 2 and Fisher tests The χ 2 test statistic here is 0.23 For tables bigger than 2 2, we can still use a χ 2 distribution, but the degrees of freedom change; specifically, df = (I 1)(J 1) where I and J are the number of rows and columns of the table Comparing X 2 = 0.23 to a χ 2 distribution with 2 df, we obtain p = 0.89 Fisher s Exact Test can also be applied to larger tables, and yields a very similar result: 0.91 Patrick Breheny Introduction to Biostatistics (171:161) 16/19

Increased hair growth So there s no evidence that testosterone therapy leads to acne, but what about increased hair growth? Increased hair growth Yes No Placebo 29 248 Low dose (150 µg) 31 236 High dose (300 µg) 53 214 Patrick Breheny Introduction to Biostatistics (171:161) 17/19

Expected counts Increased hair growth Yes No Placebo 38.6 238.4 Low dose (150 µg) 37.2 229.8 High dose (300 µg) 37.2 229.8 X 2 = 11.8; p = 0.003 Patrick Breheny Introduction to Biostatistics (171:161) 18/19

Odds ratios Pairwise comparisons So there is clear evidence that testosterone therapy increases hair growth As for continuous outcomes, we often wish to follow up on a significant finding with pairwise comparisons: High dose vs. placebo: OR=2.1 (1.3, 3.6) Low dose vs. placebo: OR=1.1 (0.6, 2.0) High dose vs. low dose: OR=1.9 (1.1, 3.2) So in conclusion, the 300 µg dose of testosterone definitely improves sexual desire in postmenopausal women, but leads to increased hair growth There is no evidence that the 150 µg dose leads to any complications, but it seems to have little benefit Patrick Breheny Introduction to Biostatistics (171:161) 19/19