Laboratory 3 Type I, II Error, Sample Size, Statistical Power




Applied Epidemiologic Analysis P8400

Calculating the Probability of a Type I Error

Draw two samples (n1=10 and n2=10) by random sampling from the same normal population, N(5,1), with population mean 5 and population variance 1. Perform the independent t-test procedure. Repeat 10 times and fill in Table 1.

SAS program

    title1 '*******************************************************';
    title2 'Probability of type 1 error by 2002-08-22';
    title3 '*******************************************************';
    data type1error;
      do j=1 to 2;            /* j identifies the sample (group) */
        do i=1 to 10;
          x=rannor(0)+5;      /* one draw from N(5,1) */
          output;
        end;
      end;
    run;
    proc ttest;
      class j;
      var x;
    run;

Table 1. Results of 10 independent t-tests

    Repetition   1   2   3   4   5   6   7   8   9   10
    Sample 1
    Sample 2
    t Value
    P Value

What is a Type I error? Did you get any P value less than 0.05 (a Type I error)? Calculate the probability of a Type I error based on the results of all the t-tests your classmates ran in your laboratory. Why can a P value fall below 0.05 when both samples come from the same population?

Calculating the Probability of a Type II Error

Draw one sample (n1=10) from a normal population, N(6,1), and another sample (n2=10) from a different normal population, N(8,1), by random sampling. Perform the independent t-test procedure. Repeat 10 times and fill in Table 2.

SAS program

    title1 '********************************************************';
    title2 'Probability of type 2 error by 2002-08-22';
    title3 '********************************************************';
    data type2error;
      do i=1 to 10;
        j=1;
        x=rannor(0)+6;        /* sample 1: one draw from N(6,1) */
        output;
      end;
      do i=1 to 10;
        j=2;
        x=rannor(0)+8;        /* sample 2: one draw from N(8,1) */
        output;
      end;
    run;
    proc ttest;
      class j;
      var x;
    run;

Table 2. Results of 10 independent t-tests

    Repetition   1   2   3   4   5   6   7   8   9   10
    Sample 1
    Sample 2
    t Value
    P Value

What is a Type II error? Did you get any P value greater than 0.05 (a Type II error)? Calculate the probability of a Type II error based on the results of all the t-tests your classmates ran in your laboratory. Why can a P value exceed 0.05 when the two samples come from different populations? What is the relationship between sample size and the probability of a Type II error? Try rerunning this program with a bigger sample, say n=50 or n=100.
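Ten repetitions give only a rough estimate of either error rate. The two exercises above can be repeated many times in one program. What follows is a minimal sketch, not part of the original handout: the macro name reject_rate, the number of replicates, the seed, and the data set names are illustrative assumptions, and rand('normal', ...) is used in place of rannor(0) so that the run is reproducible. It assumes the PROC TTEST ODS table TTests with its Method and Probt columns.

```sas
/* Sketch: empirical rejection rate of the pooled t-test at alpha = 0.05. */
/* All names and defaults below are illustrative assumptions.             */
%macro reject_rate(mu1=, mu2=, n=10, nrep=1000, seed=20020822);
  data sim;
    call streaminit(&seed);
    do rep = 1 to &nrep;                 /* one replicate = one lab run   */
      do j = 1 to 2;
        do i = 1 to &n;
          if j = 1 then x = rand('normal', &mu1, 1);
          else x = rand('normal', &mu2, 1);
          output;
        end;
      end;
    end;
  run;

  ods exclude all;                       /* suppress the printed output   */
  ods output TTests=tt;                  /* keep the test table per rep   */
  proc ttest data=sim;
    by rep;
    class j;
    var x;
  run;
  ods exclude none;

  data reject;
    set tt;
    where Method = 'Pooled';             /* equal-variance t-test         */
    reject = (Probt < 0.05);             /* 1 if P value < 0.05           */
  run;

  title "Rejection rate: mu1=&mu1, mu2=&mu2, n=&n per group";
  proc means data=reject n mean;
    var reject;
  run;
%mend reject_rate;

%reject_rate(mu1=5, mu2=5);              /* mean estimates P(Type I)      */
%reject_rate(mu1=6, mu2=8);              /* 1 - mean estimates P(Type II) */
%reject_rate(mu1=6, mu2=8, n=50);        /* same means, larger samples    */
```

In the first call the mean of reject estimates the Type I error rate and should settle near 0.05. In the other two calls the mean estimates power, so one minus the mean estimates the Type II error rate for that sample size.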

Power Calculations

1. The Esophageal Cancer Study has 200 cases and 778 controls. Because the study has already been conducted, this sample size is fixed. What we can compute, however, is the range of statistical power we expect under different scenarios. Assume the following:

    1) Sample size as given
    2) Type I error of .05
    3) Range of exposure prevalences in controls as listed in the table
    4) Range of detectable odds ratios

a. Fill in Table 3 with the estimate of statistical power for each cell, using the following SAS program.

    title ' Statistical Power Calculations';
    data power;
      c=3.89;                       /* controls per case: 778/200 */
      n=200;                        /* number of cases */
      do p0=0.05 to 0.50 by 0.05;   /* exposure prevalence in controls */
        do or=1.5 to 3.0 by 0.5;    /* detectable odds ratio */
          q0=1-p0;
          p1=(p0*or)/(1+p0*(or-1)); /* exposure prevalence in cases */
          q1=1-p1;
          pbar=(p1+c*p0)/(1+c);
          qbar=1-pbar;
          zbeta=(sqrt(n*(p1-p0)**2)-1.96*sqrt((1+1/c)*pbar*qbar))
                / sqrt((p1*q1)+p0*q0/c);
          power=probnorm(zbeta);
          output;
        end;
      end;
    run;
    proc print data=power;
      var p0 or zbeta power;
    run;
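Read directly off the SAS statements above, the data step evaluates a normal-approximation power formula. The notation here is an assumption made for readability: Φ is the standard normal distribution function (probnorm), 1.96 is the two-sided critical value for a Type I error of .05, n is the number of cases, and c is the number of controls per case.

```latex
\begin{aligned}
p_1 &= \frac{p_0\,\mathrm{OR}}{1 + p_0(\mathrm{OR} - 1)}, \qquad
q_1 = 1 - p_1, \qquad q_0 = 1 - p_0, \\[4pt]
\bar{p} &= \frac{p_1 + c\,p_0}{1 + c}, \qquad \bar{q} = 1 - \bar{p}, \\[4pt]
z_\beta &= \frac{\sqrt{n}\,\lvert p_1 - p_0 \rvert
          - 1.96\sqrt{\left(1 + \tfrac{1}{c}\right)\bar{p}\,\bar{q}}}
         {\sqrt{p_1 q_1 + p_0 q_0 / c}}, \\[4pt]
\text{power} &= \Phi(z_\beta).
\end{aligned}
```

As a check by hand, p0 = 0.05 and OR = 1.5 with n = 200 and c = 3.89 give p1 ≈ 0.073, z_beta ≈ -0.61, and power ≈ 0.27, which agrees with the .27 given in Table 3 for OR = 1.5.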

Table 3: Calculate power with varying exposure prevalence & OR

Columns (exposure prevalence in controls): 5%, 10%, 20%, 25%, 50%
Rows (detectable difference, OR), with the power values already given, in order:

    OR 1.5:   .27    .60    .65
    OR 2.0:   .86    .98    .99
    OR 2.5:   .88    .99    >.99   >.99
    OR 3.0:   .97    .99    >.99   >.99   >.99

Rows with fewer than five values contain blank cells to be filled in; the column of each given value is not marked.

b. Using the information in Table 3, how much statistical power do we have for the parent study?

2. Estimate the statistical power for a study with the following parameters:

    (i) Range of sample sizes (50 to 300), case:control ratio=1
    (ii) Type I error of .05
    (iii) Exposure prevalence in controls fixed at 30%
    (iv) Range of detectable odds ratios (1.5 to 4.0)

a. Fill in Table 4 with the estimate of statistical power for each cell (refer to the SAS program for Table 3).

Table 4: Calculate power with varying sample size & OR

Columns (sample size): 0, 50, 100, 200, 300
Rows (detectable difference, OR), with the power values already given, in order:

    OR 1.0:   .025   .025   .025   .025
    OR 1.5:   .024   .16    .27
    OR 2.0:   .023   .65    .92    .98
    OR 3.0:   .021   .76    >.99   >.99
    OR 4.0:   .019   .93    .99    >.99   >.99

Rows with fewer than five values contain blank cells to be filled in; the column of each given value is not marked.

b. Using the following commands, plot five power curves, one for each odds ratio (ORs from 1 to 4), across sample sizes (n from 50 to 300), when the prevalence of exposure among controls is 30%, the Type I error (α) is .05, and the numbers of cases and controls are equal. In SAS, input the data from Table 4 and run the following program.

    data powercurves;
      input or n power;
      cards;
    1.0 0 0.025
    1.5 0 0.024
    2.0 0 0.023
    3.0 0 0.021
    4.0 0 0.019
    1.0 50 0.025
    1.5 50 0.157
    [enter data here]
    4.0 300 1.000
    ;
    proc gplot data=powercurves;
      plot power*n=or;                       /* one curve per odds ratio */
      symbol line=1 width=5 interpol=join;
      title 'Figure 1. Power curve with varying sample size & OR';
    run;
    quit;

c. Examine the plot and answer the following questions.

    i. Describe the relationships between statistical power, sample size, and detectable differences.
    ii. Which parameter (n or OR) is relatively more important for increasing statistical power?

3. An investigator is applying for an NIH grant and wants to show the reviewers that the proposed study has at least 80% statistical power to detect an association of OR=2.0 when the exposure prevalence among controls is 30% and the Type I error is .05. Recruiting cases is much more expensive than recruiting controls. Facing a budget constraint, the investigator has to limit the recruitment of cases to 100.

a) Based on the information in Table 5 below, plot a power curve with varying case:control ratios for this particular study (refer to the SAS program for Table 4) and suggest ways to increase the statistical power.

Table 5. Statistical power with varying case:control ratio scenarios

    OR    c     n     power
    2     1     100   .654
    2     2     100   .784
    2     4     100   .856
    2     8     100   .892
    2     10    100   .898

    OR: odds ratio
    c: case:control ratio (1:c, that is, c controls per case)
    n: number of cases
    power: statistical power at the 5% significance level

b) Referring to the power curve, how many cases and controls would this investigator need in order to have at least 80% power?

c) Beyond which case:control ratio (c) would you no longer recommend adding controls to increase the statistical power? Why?

We will practice Multiple Regression in next week's laboratory. Please bring the dataset life-satisfaction751.txt.
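Questions 2 and 3 both say to refer to the Table 3 program. One way to adapt it is sketched below, under stated assumptions. The formula is copied unchanged from that program. The data set names scenarios and power45, the variable table, and the variable oddsratio (used in place of or) are illustrative choices, not part of the handout.

```sas
/* Sketch: power for the Table 4 and Table 5 scenarios (p0 fixed at 30%). */
data scenarios;
  p0 = 0.30;                              /* exposure prevalence, controls */

  /* Question 2 (Table 4): one control per case, vary n and OR */
  table = 4;
  c = 1;
  do n = 0, 50, 100, 200, 300;
    do oddsratio = 1.0, 1.5, 2.0, 3.0, 4.0;
      output;
    end;
  end;

  /* Question 3 (Table 5): OR = 2, 100 cases, vary controls per case */
  table = 5;
  oddsratio = 2.0;
  n = 100;
  do c = 1, 2, 4, 8, 10;
    output;
  end;
run;

data power45;
  set scenarios;
  q0 = 1 - p0;
  p1 = (p0*oddsratio) / (1 + p0*(oddsratio - 1));
  q1 = 1 - p1;
  pbar = (p1 + c*p0) / (1 + c);
  qbar = 1 - pbar;
  zbeta = (sqrt(n*(p1 - p0)**2) - 1.96*sqrt((1 + 1/c)*pbar*qbar))
          / sqrt(p1*q1 + p0*q0/c);
  power = probnorm(zbeta);
run;

title 'Power for the Table 4 and Table 5 scenarios';
proc print data=power45 noobs;
  by table;                               /* rows are already in order     */
  var oddsratio c n power;
  format power 6.4;
run;
```

The rows printed for table=5 should come out close to the values listed in Table 5, and the rows for table=4 can be copied into the cards block of the power-curve program.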