# Statistical Simulations for Sample Size Calculation with PROC IML Iza Peszek, Merck & Co., Inc

Save this PDF as:

Size: px
Start display at page:

Download "Statistical Simulations for Sample Size Calculation with PROC IML Iza Peszek, Merck & Co., Inc"

## Transcription

1 Paper SP08 Statistical Simulations for Sample Size Calculation with PROC IML Iza Peszek, Merck & Co., Inc ABSTRACT This paper describes some techniques for sample size calculation via statistical simulation with PROC IML. The simulation approach may turn out to be the most practical way to calculate sample size when trial design is very complex. The concept is illustrated in a case example: showing how to validate a sample size calculation program from the public domain using simulation methods. The basic concepts of statistical simulation are described and corresponding IML code is provided. INTRODUCTION Calculation of the sample size is a typical activity performed when planning clinical trials. For simple trial designs with fixed duration, this task is often relatively simple and reduces to using some well known formulas. Complications arise when the trial design is more exotic, and especially when dealing with variable duration trials, for example, when the stopping rule depends on the number of observed clinical events. There is a vast literature covering sample size calculation for such trials. Most described methods assume some simplifications on the underlying distribution of time-to-event, and are not well equipped to deal with any but the most simple trial designs. For example, a proportional hazard model is often assumed, even though this assumption is rarely realistic. Another example is stratified analysis. Most described methods assume uniform population and perform poorly in case of multiple stratification variables. There are papers describing methodologies for dealing with complex trial designs, but the solutions they present are far from trivial and require substantial learning time. Statisticians asked to calculate a sample size for a trial are often hard-pressed to deliver the solution on short notice, and may not have the luxury to study the described methods in sufficient detail. An alternative to such theoretical solutions is to use simulation techniques to assist in sample size calculation. This approach can be also used to predict a number of other characteristics in the trial (e.g., the number of clinic visits, or the amount of study drug), and allows for flexibility of assumptions and scenarios. Thus, sensitivity calculations can be performed, comparing sample sizes under a variety of assumptions. Moreover, once this technique is mastered, it can be applied to almost any design. This paper illustrates calculation of the sample size via statistical simulation with PROC IML. We illustrate the concept by validating a published method of sample size calculation ([1]) via simulation techniques. EXAMPLE 1: BASIC CONCEPTS Most time-to-event trials can be described using the following concepts: - There are several patient populations (different drug regimens, and possibly different strata). - Each population is assumed to be homogeneous with respect to the distributions of time-to-enrollment, timeto-event, time-to-loss-to-follow-up (leaving the trial), and time-to-crossover (switching treatment group). - Sometimes there is a time limit on the trial, so that the trial will terminate after a specified amount of time. - Sometimes the trial will continue until a pre-specified number of clinical events are observed. - Sometimes the trial will terminate whenever a pre-specified number of clinical events are observed, or after some pre-specified number of months, whichever comes first. These factors will be translated into a statistical model, and simulation will be used to run a trial. We will start with defining some variables. There are several different populations in the trial (e.g., different treatment groups). There are N i patients in the i-th population, who enroll in the trial according to distribution F i, experience an event according to distribution G i, and drop-out for various reasons according to distribution H i. The time of enrollment is from time 0 (start of the trial), and the times of event and loss-to-follow-up are from the time of enrollment. For now, we will assume no crossovers. For each of the trial populations, the simulation model will need to generate N i variables according to distribution F i, N i variables according to distribution G i, and N i variables according to distribution H i. If there is no time limit on the trial, then the loss-to-follow-up time T D (from the start of the trial) is F i +H i. If there is a time limit on the trial, e.g., the trial will stop at time T (from the start of the trial), then the loss-to-follow-up time T D = min (T, F i +H i ). A patient experiences an event at the time F i +G i (as long as F i +G i T D ), or is censored at the time F i +H i (if F i +G i >T D ).

2 For simplicity we assume 2 populations (2 treatment groups), having N 1 =2000 and N 2 = 2000 patients respectively. We will also assume that G i and H i are exponential distributions with one-year event rates p e1 = 0.1, p e2 = 0.12 and one-year loss-to-follow-up rates p d1 =0.2, p d2 =0.25, and that the distribution of time of enrollment is uniform over a 0.25 year period in both populations. The study will terminate 3 years after enrollment starts. More complicated cases will be considered later. It is a well known fact that to generate a distribution according to cdf F, we calculate F -1 (X), where X has a uniform distribution over the interval (0,1). For the exponential distribution, F(t) = 1-exp(-λt), and F -1 (t) =-ln(1-t)/λ. The rate of events at unit time is 1-exp(-λ), so λ=-ln(1-p), where p is a one-year rate of events. Thus, to generate an exponential distribution corresponding to one-year rate of p, we will calculate -ln(1-t)/ λ or ln(1-t)/ln(1-p), where t is uniform over (0,1) and can be easily generated by a UNIFORM function. Here is PROC IML code to implement this scenario: n1 = 1000; * group 1 sample size; n2 = 1000; * group 2 sample size; pe1 = 0.1; * prob of event in group 1 after 1 year; pe2 = 0.12; * prob of event in group 2 after 1 year; pd1 = 0.20; * prob of loss-to-follow-up in group 1 after 1 year; pd2 = 0.25; * prob of loss-to-follow-up in group 2 after 1 year; studyduration = 3; * initialize a dummy used for random number generation; myran1=j(n1, 1, 0); myran2=j(n2, 1, 0); * generate enrollment time according to uniform distribution over 0.25 years; enroll = 0.25*(uniform(myran1))// 0.25*(uniform(myran2)); * generate event time according to exponential distribution; * this is time from enrollment; event_ = (log(1-uniform(myran1))/log(1-pe1)) //(log(1-uniform(myran2))/log(1-pe2)); * calculate event time from time 0 (study start); event = enroll+event_; * generate loss-to-follow-up time according to exponential distribution; * this is time from enrollment; loss_ = (log(1-uniform(myran1))/log(1-pd1)) //(log(1-uniform(myran2))/log(1-pd2)); * calculate loss-to-follow-up time from time time 0 (study start); loss = enroll+loss_; * in addition, administrative censoring occurs at specified time after study started; censor = (loss<studyduration)#loss+studyduration*(loss>=studyduration); * the following matrix has time of event or loss-to-follow-up or censoring - whichever comes first - in the first column, an indicator of censoring (1 if this is time of loss-to-follow-up /censor, 0 otherwise) in the 2nd column, and treatment group (0/1) in the 3rd column; status = (event censor)[,><] (censor<event) (myran1//(1+myran2)); * output data with log-rank test statistics values; create status from status [colname={'time' 'censor' 'trt'}]; append from status; Once we have run this code, we will have a simulated data set STATUS with trial results. For simplicity, we will assume that the final statistical analysis will compare distributions of time-to-event via log-rank statistics. We can run a log-rank test on this data set and determine whether we should reject the null hypothesis that the distribution of time-to-event is the same in both groups. By repeating this process a number of times, and calculating the percent of rejections of the null hypothesis, we will estimate the power of the trial. Typically, 5000 iterations give a reasonable estimate of the power.

3 There are several ways to run the log-rank test. We can leave PROC IML and conduct the test via PROC PHREG, or we can write an IML module for the log-rank test and run it within PROC IML. The latter solution seems to be more practical, as the CPU time is significantly shorter. The module for log-rank statistics is not annotated below, as it is not the focus of this paper. Here is the final code for this simple example: proc iml; * define a module to calculate log-rank statistics; * takes a matrix as input * in first column are event times (death or censoring) * in second column are censoring indicators (1 if censored, 0 otherwise) * in third column are treatment codes(1/0); start logrank(times); * sort matrix; b=times; times[rank(times[,1]),]=b; times=times j(nrow(times),1,1); trt1id=times[,4]#(times[,3]=0); trt2id=times[,4]#(times[,3]=1); ntrt1 = sum(trt1id); ntrt2 = sum(trt2id); * indicator if death occurred; death1 = (trt1id)#(1-times[,2]); death2 = (trt2id)#(1-times[,2]); death=death1+death2; cumgone1 = cusum(trt1id); cumgone2 = cusum(trt2id); cumgone1_={0}//cumgone1[1:nrow(cumgone1)-1]; cumgone2_={0}//cumgone2[1:nrow(cumgone2)-1]; * number at risk before each event time point; atrisk1 = ntrt1-cumgone1_; atrisk2 = ntrt2-cumgone2_; atrisk=atrisk1+atrisk2; disp = atrisk atrisk1 atrisk2 death death1 death2 times[,1]; create tmpdata from disp; append from disp; summary class {col7} var {col1 col2 col3} stat{max} opt{noprint save}; summary class {col7} var {col4 col5 col6} stat{sum} opt{noprint save}; close ; call delete(tmpdata); atriskf = col1 col2 col3 col4 col5 col6 col7; small = loc(col1<2); if nrow(small)>0 then do; min0 = min(small)-1; atriskf = atriskf[1:min0,1:5]; end; w = atriskf[,5]-(atriskf[,2]#atriskf[,4])/atriskf[,1]; v = (atriskf[,2]#atriskf[,3]#atriskf[,4]#(atriskf[,1]- atriskf[,4]))/(atriskf[,1]#atriskf[,1]#(atriskf[,1]-1)); sum_w=sum(w); sum_v=sum(v); chisq = sum_w*sum_w/sum_v; return(chisq); finish logrank; * ; * start simulation * ; n1 = 100; * group 1 sample size; n2 = 100; * group 2 sample size;

4 pe1 = 0.1; * prob of event in group 1 after 1 year; pe2 = 0.12;* prob of event in group 2 after 1 year; pd1 = 0.20;* prob of loss-to-follow-up in group 1 after 1 year; pd2 = 0.25;* prob of loss-to-follow-up in group 2 after 1 year; studyduration = 3; * initialize a dummy used for random number generation; myran1=j(n1, 1, 0); myran2=j(n2, 1, 0); * initialize matrix to which we will append values of chi-square statistics from logrank test; result = j(1,2,0); do i=1 to 5000 by 1; * generate enrollment times, event times and loss-to-follow-up times enroll = 0.25*(uniform(myran1)) // 0.25*(uniform(myran2)); event_ = (log(1-uniform(myran1))/log(1-pe1)) //(log(1-uniform(myran2))/log(1-pe2)); event = enroll+event_; loss_ = (log(1-uniform(myran1))/log(1-pd1)) //(log(1-uniform(myran2))/log(1-pd2)); loss = enroll+loss_; censor = (loss<studyduration)#loss+studyduration*(loss>=studyduration); status = (event censor)[,><] (censor<event) (myran1//(1+myran2)); * run logrank test and append test statistics to result matrix; chisq = logrank(status); result = result//(chisq i); end; * output data with logrank test statistics values; create result from result [colname={'chisq' 'runid'}]; append from result; quit; * calculate percent of rejections of null hypothesis by logrank test; data result; set result (where=( runid>0)); prob = 1-probchi(chisq,1); if prob<0.05 then reject=1; else reject=0; run; proc freq data=result; tables reject; title "Percent of rejections of null hypothesis"; run; EXAMPLE 2 The paper by Shih ([1]) describes the use of a SAS program to calculate sample size in complex time-to-event trials. We want to validate the results of this SAS program via simulations. The author of the paper describes several examples and gives the results generated by their program. The first example is as follows: a hypothetical 2-arm trial with 2 years of average-follow up. The probability of having an event in the control group after 2 years is 30%. The hazard rate is piecewise constant such that λ 1 : λ 2 : λ 3 : λ 4 = 1: 2: 2: 2, where study is divided into four 6-months periods and λ i is the hazard rate in the i-th period. The ratio of the hazard in the treatment group to that in the control group is assumed constant and equal to The program described in the paper calculates a total sample of 683 to achieve 80% power. We will validate the results of this example via IML code. First, we need to translate the problem into the language of a statistical model. Constant hazard rate implies exponential distribution, so time-to-event is piece-wise exponential. In order to determine the parameters of these exponential distributions, note that the assumptions imply that in the control group, 1-(1-P 1 )*(1-P 2 ) 3 = 0.3, where P 1 = 1-exp(-λ 1 ) is the probability of event in period 1 and P 2 = 1-exp(-2λ 1 )

5 is the probability of event in periods 2-4, respectively. Hence, 0.7 = exp(-λ 1 )(exp(-2λ 1 )) 3 and λ 1 =-ln(0.7)/7. In periods 2-4, the hazard of event in the control group is 2λ 1. In the treatment group, hazard in each period is 0.65 of the hazard in the control group. All patients enter the study simultaneously and there are no losses to follow-up. Study duration is 2 years. In the previous example, our time unit was year. In this example, it is more convenient to use 6-months as the time unit, so study duration is 4. Based on a total sample size of 683, we set N 1 = 342 and N 2 =341. Now we can write PROC IML statements to implement this statistical model. For brevity, we quote only the statements to simulate one repetition of the study outcome. Calculating log-rank statistics and running iterations can be done in the same way as in Example 1. n = 683; * total sample size; n1 = 342; * control sample size; n2 = 341; * treatment sample size; hctrl1 = -1*log(0.7)/7; * hazard, period 1, control grp; hctrl2 = 2*hctrl1; * hazard, periods 2-4, conrol grp; htrt1 = 0.65*hctrl1; * hazard, period1, trt grp; htrt2 = 0.65*hctrl2; * hazard, period2, trt grp; * initialize a dummy used for random number generation; myran1=j(n1, 1, 0); myran2=j(n2, 1, 0); * create treatment groups and study duration matrices; trt=j(n2, 1, 1); control=j(n1,1,0); studyduration=j(n,1,4); p1duration = j(n,1,1); * duration of period 1 * generate time of event according to 2-stage distribution; * time of event according to initial distribution (in period1); period1 = -1#log(1-uniform(myran1)/hctrl1; // -l#log(1-uniform(myran2)/htrt1; * time of event according to 2nd stage distribution, applies only if no event in period1 (conditional on no event in period1); period2 = p1duration +(-1#log(1-uniform(myran1)/hctrl2 //(-1#log(1-uniform(myran2)/htrt2); * finally, absolute time of event; teventbx = period1#(period1<=p1duration)+period2#(period1>p1duration); * time of event is censored if it occurs study ended; censor = (studyduration<teventbx); * final time of event is the smaller of study ending time and event time; event = studyduration#censor+teventbx#(1-censor); * final matrix has time of event/censoring in 1 st column, censoring indicator in the second column and treatment group indicator in the third column; times0 = event censor (control//trt); EXAMPLE 3 Let us now assume that in the previous example the study duration is 2.5 years, participants enter the study uniformly for 1 year and are followed for a minimum of 1.5 years. In addition, 10% of each group are crossovers (switch to another group) and 10% are lost to follow up after 2 years. For loss-to-follow-up and for crossover from control to treatment group, the hazard rate is constant. For crossover from treatment to control group, the hazard rate is piecewise constant such that λ 1 : λ 2 : λ 3 : λ 4 : λ 5 = 1: 1: 2: 2: 2 when the study is divided into five 6-months intervals. We assume that patients who crossed over to another treatment group immediately experience the hazard of clinical event specific to this group. Hence we need to simulate the following: time of entry into the study, time of lost to follow up, time of event conditional on no crossover, time of crossover, and time of event after crossover (conditional on no event and no lost to follow up before crossover). To simulate time of loss-to-follow-up and time of crossover from treatment to control group, note that the assumptions

6 imply that 0.9= exp(-β)exp(-β)exp(-β)exp(-β), where β is a hazard rate for lost to follow up. Hence the (constant) hazard of lost to follow, β=-ln(0.9)/4. This is also the hazard of crossover from treatment to control. For crossovers from control to treatment group, 0.9 = exp(-γ)exp(-γ)exp(-2γ)exp(-2γ), so γ = -ln(0.9)/6. Here is the IML code to simulate the study data. hctrl1 = -1*log(0.7)/7; * hazard for event, period 1, control grp; hctrl2 = 2*hctrl1; * hazard for event, periods 2-5, control grp; htrt1 = 0.65*hctrl1; * hazard for event, period 1, trt grp; htrt2 = 0.65*hctrl2; * hazard for event, period2, trt grp; hloss = -1*log(0.9)/4; * hazard for loss to follow up; * hazards of crossover:; htc12 = -1*log(0.9)/6; * treatment to control, periods 1 and 2; htc34 = 2*htc12; * treatment to control, periods 3,4,5; hct = -1*log(0.9)/4; * control to treatment, all periods; * initialize a dummy used for random number generation; myran1=j(n1, 1, 0); myran2=j(n2, 1, 0); * create duration matrices; studyduration_c = j(n1,1,5); * study duration for control; studyduration_t = j(n2,1,5); * study duration for treatment group; p1duration_c = j(n1,1,1); * period 1 duration control; p1duration_t = j(n2,1,1); * period 1 duration treatment group; p12duration_c = j(n1,1,2); * period 1 + period 2 duration control; p12duration_t = j(n2,1,2); * period 1 + period 2 duration treatment group; * for control group; enroll_c = 2#uniform(myran1);* enrollment time uniform over priods 1,2; timeofloss_c = enroll_c - log(1-uniform(myran1))/hloss; timesofstop_c = timeofloss_c studyduration_c; tloss_c = timesofstop_c[,><]; * minumum of lost-to-follow and study stop; period1_c = -1#log(1-uniform(myran1))/hctrl1; * time of event in period 1; period2_c = p1duration_c - log(1-uniform(myran1))/hctrl2; * time of event in periods 2-5, conditional on no event in period 1; teventbx_c = enroll_c + period1_c#(period1_c<=p1duration_c) + period2_c#(period1_c>p1duration_c); * the unconditional time of event, from start of study, if no crossovers, control group; tcross_c = enroll_c-log(1-uniform(myran1))/hct; * time of crossover from control to trt, from study start; * generate time of event after noncompliance started; period1x_c = -1#log(1-uniform(myran1))/htrt1; * time of event in period 1; period2x_c = p1duration_c - log(1-uniform(myran1))/htrt2; * time of event in periods 2-5, conditional on no event in period 1; teventx0_c = tcross_c + period1x_c#(period1x_c<=p1duration_c) + period2x_c#(period1x_c>p1duration_c); * unconditional time of event from start of study, control group; tevent_c = teventbx_c#(teventbx_c<=tcross_c)+teventx_c#(teventbx_c>tcross_c); * observed event time is teventbx_c of it occurs before x-over or teventx_c otherwise; censor_c = (tloss_c<tevent_c); eventtime_c = tevent_c#(1-censor_c)+tloss_c#censor_c; * event is observed only if it occurs before loss to follow and before study end;

7 * for treatment group; enroll_t = 2#uniform(myran2);* enrollment time; timeofloss_t = enroll_t - log(1-uniform(myran2))/hloss; timesofstop_t = timeofloss_t studyduration_t; tloss_t = timesofstop_t[,><]; * minumum of lost-to-follow and study stop; period1_t = -1#log(1-uniform(myran2))/htrt1; * time of event in period 1; period2_t = p1duration_t - log(1-uniform(myran2))/htrt2; * time of event in periods 2-5, conditional on no event in period 1; teventbx_t = enroll_t + period1_t#(period1_t<=p1duration_t) + period2_t#(period1_t>p1duration_t); * the unconditional time of event, from start of study, if no crossovers, control group; tcross12_t = -1#log(1-uniform(myran2))/htc12; * time of x-over in periods 1-2; tcross34_t = p12duration_t - log(1-uniform(myran2))/htc34; * time of x-over in periods 3-5, conditional on no x-over in periods 1-2; tcross_t = enroll_t + tcross12_t#(tcross12_t<=p12duration_t) + tcross34_t#(tcross34_t>p12duration_t); * time of x-over from treatment to control group, from study start; * generate time of event after noncompliance started; period1x_t = -1#log(1-uniform(myran2))/hctrl1; * time of event according to initial distribution; period2x_t = p1duration_t - log(1-uniform(myran2))/hctrl2; * time of event in periods 2-5, conditional on no event in period 1; teventx_t = tcross_t + period1x_t#(period1x_t<=p1duration_t) + period2x_t#(period1x_t>p1duration_t); * unconditional time of event from start of study, treatment group; tevent_t = teventbx_t#(teventbx_t<=tcross_t)+teventx_t#(teventbx_t>tcross_t); * observed event time is teventbx_c of it occurs before x-over or teventx_c otherwise; censor_t = (tloss_t<tevent_t); eventtime_t = tevent_t#(1-censor_t) + tloss_t#censor_t; * event is observed only if it occurs before loss to follow and before study end; * create matrix for logrank routine; finaldata = (eventtime_c censor_c (myran1))//(eventtime_t censor_t (1+myran2)); chisq=logrank(finaldata); ADDITIONAL REMARKS We illustrated the simulation of a clinical trial via some simple examples. In these examples, we started with a predetermined sample size and estimated the power of the study via simulations. To determine a sample size for a specified power, a search algorithm should be implemented, trying different sample sizes until the desired power is achieved. A practical way would be to select a reasonable initial estimate using a simple formula based on the logrank test and recursively refine the initial estimate until the specified power is achieved. One way to speed up the simulation time would be to conduct a search for N with a small number of repetitions (say, 200). This will quickly produce a rough estimate of the power. Once we determine the sample size which gives a result close to the desired power, we can verify the accuracy by increasing the number of repetitions to a more reasonable number (5000 repetitions or so). One of several search algorithms is as follows: 1. Calculate the initial sample size N based on the log-rank formula: D = 4(z α/2 + z β ) 2 /θ 2, where D denotes required number of events, z α is an upper α-th percentile of standard normal distribution and θ is a log-hazard ratio. Given D, sample sizes in treatment groups 1 and 2 can be approximated from the formula D=N 1 *P 1 +N 2 *P 2, where N 1, N 2 denote sample sizes, and P 1, P 2 denote expected rates of events over an average study duration. 2. Perform low-repetition simulations to obtain power estimate.

### Design and Analysis of Equivalence Clinical Trials Via the SAS System

Design and Analysis of Equivalence Clinical Trials Via the SAS System Pamela J. Atherton Skaff, Jeff A. Sloan, Mayo Clinic, Rochester, MN 55905 ABSTRACT An equivalence clinical trial typically is conducted

### Given hazard ratio, what sample size is needed to achieve adequate power? Given sample size, what hazard ratio are we powered to detect?

Power and Sample Size, Bios 680 329 Power and Sample Size Calculations Collett Chapter 10 Two sample problem Given hazard ratio, what sample size is needed to achieve adequate power? Given sample size,

### Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

### Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

### Using simulation to calculate the NPV of a project

Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial

### p Values and Alternative Boundaries

p Values and Alternative Boundaries for CUSUM Tests Achim Zeileis Working Paper No. 78 December 2000 December 2000 SFB Adaptive Information Systems and Modelling in Economics and Management Science Vienna

### Surviving Left Truncation Using PROC PHREG Aimee J. Foreman, Ginny P. Lai, Dave P. Miller, ICON Clinical Research, San Francisco, CA

Surviving Left Truncation Using PROC PHREG Aimee J. Foreman, Ginny P. Lai, Dave P. Miller, ICON Clinical Research, San Francisco, CA ABSTRACT Many of us are familiar with PROC LIFETEST as a tool for producing

### Monte Carlo Simulation using Excel for Predicting Reliability of a Geothermal Plant

Monte Carlo Simulation using Excel for Predicting Reliability of a Geothermal Plant Daniela E. Popescu, Corneliu Popescu, Gianina Gabor University of Oradea, Str. Armatei Romane nr.5, Oradea, România depopescu@uoradea.ro,

### Statistics for Clinical Trial SAS Programmers 1: paired t-test Kevin Lee, Covance Inc., Conshohocken, PA

Statistics for Clinical Trial SAS Programmers 1: paired t-test Kevin Lee, Covance Inc., Conshohocken, PA ABSTRACT This paper is intended for SAS programmers who are interested in understanding common statistical

### tests whether there is an association between the outcome variable and a predictor variable. In the Assistant, you can perform a Chi-Square Test for

This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. In practice, quality professionals sometimes

### A SAS Macro to Calculate Exact Confidence Intervals for the Difference of Two Proportions

A SAS Macro to Calculate Exact Confidence Intervals for the Difference of Two Proportions Paul R. Coe, Dominican University, River Forest, IL Abstract When comparing two proportions, it is often desirable

### Introduction to Software Reliability Estimation

Introduction to Software Reliability Estimation 1 Motivations Software is an essential component of many safetycritical systems These systems depend on the reliable operation of software components How

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Paper PO06. Randomization in Clinical Trial Studies

Paper PO06 Randomization in Clinical Trial Studies David Shen, WCI, Inc. Zaizai Lu, AstraZeneca Pharmaceuticals ABSTRACT Randomization is of central importance in clinical trials. It prevents selection

### Chapter 9 Testing Hypothesis and Assesing Goodness of Fit

Chapter 9 Testing Hypothesis and Assesing Goodness of Fit 9.1-9.3 The Neyman-Pearson Paradigm and Neyman Pearson Lemma We begin by reviewing some basic terms in hypothesis testing. Let H 0 denote the null

### ABSTRACT INTRODUCTION THE DATA SET

Computing Hazard Ratios and Confidence Intervals for a Two-Factor Cox Model with Interaction Terms Jing Wang, M.S., Fred Hutchinson Cancer Research Center, Seattle, WA Marla Husnik, M.S., Fred Hutchinson

### Size of a study. Chapter 15

Size of a study 15.1 Introduction It is important to ensure at the design stage that the proposed number of subjects to be recruited into any study will be appropriate to answer the main objective(s) of

### Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes

### Nursing Practice Development Unit, Princess Alexandra Hospital and Research Centre for Clinical and

SAMPLE SIZE: HOW MANY IS ENOUGH? Elizabeth Burmeister BN MSc Nurse Researcher Nursing Practice Development Unit, Princess Alexandra Hospital and Research Centre for Clinical and Community Practice Innovation,

### PASS Sample Size Software. Linear Regression

Chapter 855 Introduction Linear regression is a commonly used procedure in statistical analysis. One of the main objectives in linear regression analysis is to test hypotheses about the slope (sometimes

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### LOGNORMAL MODEL FOR STOCK PRICES

LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as

### SUGI 29 Statistics and Data Analysis

Paper 195-29 "Proc Power in SAS 9.1" Debbie Bauer, MS, Sanofi-Synthelabo, Russell Lavery, Contractor, About Consulting, Chadds Ford, PA ABSTRACT Features were added to SAS V9.1 STAT that will allow many

### Module 10: Mixed model theory II: Tests and confidence intervals

St@tmaster 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE Module 10: Mixed model theory II: Tests and confidence intervals 10.1 Notes..................................

### Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980)

Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980) For the Driver Behavior Study, the Chi Square Analysis II is the appropriate analysis below.

### Survival Analysis Using SPSS. By Hui Bian Office for Faculty Excellence

Survival Analysis Using SPSS By Hui Bian Office for Faculty Excellence Survival analysis What is survival analysis Event history analysis Time series analysis When use survival analysis Research interest

### 11.2 POINT ESTIMATES AND CONFIDENCE INTERVALS

11.2 POINT ESTIMATES AND CONFIDENCE INTERVALS Point Estimates Suppose we want to estimate the proportion of Americans who approve of the president. In the previous section we took a random sample of size

### Worked examples Random Processes

Worked examples Random Processes Example 1 Consider patients coming to a doctor s office at random points in time. Let X n denote the time (in hrs) that the n th patient has to wait before being admitted

### More details on the inputs, functionality, and output can be found below.

Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

### Median of the p-value Under the Alternative Hypothesis

Median of the p-value Under the Alternative Hypothesis Bhaskar Bhattacharya Department of Mathematics, Southern Illinois University, Carbondale, IL, USA Desale Habtzghi Department of Statistics, University

### Lecture Outline (week 13)

Lecture Outline (week 3) Analysis of Covariance in Randomized studies Mixed models: Randomized block models Repeated Measures models Pretest-posttest models Analysis of Covariance in Randomized studies

### Lecture Quantitative Finance

Lecture Quantitative Finance Spring 2011 Prof. Dr. Erich Walter Farkas Lecture 12: May 19, 2011 Chapter 8: Estimating volatility and correlations Prof. Dr. Erich Walter Farkas Quantitative Finance 11:

### ON THE FIBONACCI NUMBERS

ON THE FIBONACCI NUMBERS Prepared by Kei Nakamura The Fibonacci numbers are terms of the sequence defined in a quite simple recursive fashion. However, despite its simplicity, they have some curious properties

### DETERMINING AVERAGE BIOEQUIVALENCE AND CORRESPONDING SAMPLE SIZES IN GENERAL ASYMMETRIC SETTINGS

Sankhyā : The Indian Journal of Statistics Special Issue on Biostatistics 2000, Volume 62, Series B, Pt. 1, pp. 162 174 DETERMINING AVERAGE BIOEQUIVALENCE AND CORRESPONDING SAMPLE SIZES IN GENERAL ASYMMETRIC

### 3.6: General Hypothesis Tests

3.6: General Hypothesis Tests The χ 2 goodness of fit tests which we introduced in the previous section were an example of a hypothesis test. In this section we now consider hypothesis tests more generally.

### Episode 516: Exponential and logarithmic equations

Episode 516: Exponential and logarithmic equations Students may find this mathematical section difficult. It is worth pointing out that they have already covered the basic ideas of radioactive decay in

### 13.2 The Chi Square Test for Homogeneity of Populations The setting: Used to compare distribution of proportions in two or more populations.

13.2 The Chi Square Test for Homogeneity of Populations The setting: Used to compare distribution of proportions in two or more populations. Data is organized in a two way table Explanatory variable (Treatments)

### SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation

### Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

### Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

### An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG

Paper 3140-2015 An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG Iván Darío Atehortua Rojas, Banco Colpatria

### AP Statistics 2004 Scoring Guidelines

AP Statistics 2004 Scoring Guidelines The materials included in these files are intended for noncommercial use by AP teachers for course and exam preparation; permission for any other use must be sought

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. 2-Sample

### NPTEL STRUCTURAL RELIABILITY

NPTEL Course On STRUCTURAL RELIABILITY Module # 02 Lecture 6 Course Format: Web Instructor: Dr. Arunasis Chakraborty Department of Civil Engineering Indian Institute of Technology Guwahati 6. Lecture 06:

### SAS and R calculations for cause specific hazard ratios in a competing risks analysis with time dependent covariates

SAS and R calculations for cause specific hazard ratios in a competing risks analysis with time dependent covariates Martin Wolkewitz, Ralf Peter Vonberg, Hajo Grundmann, Jan Beyersmann, Petra Gastmeier,

### Contents. 3 Survival Distributions: Force of Mortality 37 Exercises Solutions... 51

Contents 1 Probability Review 1 1.1 Functions and moments...................................... 1 1.2 Probability distributions...................................... 2 1.2.1 Bernoulli distribution...................................

### Calculating Changes and Differences Using PROC SQL With Clinical Data Examples

Calculating Changes and Differences Using PROC SQL With Clinical Data Examples Chang Y. Chung, Princeton University, Princeton, NJ Lei Zhang, Celgene Corporation, Summit, NJ ABSTRACT It is very common

### Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

### Estimating survival functions has interested statisticians for numerous years.

ZHAO, GUOLIN, M.A. Nonparametric and Parametric Survival Analysis of Censored Data with Possible Violation of Method Assumptions. (2008) Directed by Dr. Kirsten Doehler. 55pp. Estimating survival functions

### Comparison of Survival Curves

Comparison of Survival Curves We spent the last class looking at some nonparametric approaches for estimating the survival function, Ŝ(t), over time for a single sample of individuals. Now we want to compare

### Chapter 10. Verification and Validation of Simulation Models Prof. Dr. Mesut Güneş Ch. 10 Verification and Validation of Simulation Models

Chapter 10 Verification and Validation of Simulation Models 10.1 Contents Model-Building, Verification, and Validation Verification of Simulation Models Calibration and Validation 10.2 Purpose & Overview

### Lecture 14: Fourier Transform

Machine Learning: Foundations Fall Semester, 2010/2011 Lecture 14: Fourier Transform Lecturer: Yishay Mansour Scribes: Avi Goren & Oron Navon 1 14.1 Introduction In this lecture, we examine the Fourier

### Two Correlated Proportions (McNemar Test)

Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

### Lecture 22: Introduction to Log-linear Models

Lecture 22: Introduction to Log-linear Models Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina

### Simulating Chi-Square Test Using Excel

Simulating Chi-Square Test Using Excel Leslie Chandrakantha John Jay College of Criminal Justice of CUNY Mathematics and Computer Science Department 524 West 59 th Street, New York, NY 10019 lchandra@jjay.cuny.edu

### Missing Data Approach to Handle Delayed Outcomes in Early

Missing Data Approach to Handle Delayed Outcomes in Early Phase Clinical Trials Department of Biostatistics The University of Texas, MD Anderson Cancer Center Outline Introduction Method Conclusion Background

### Hypothesis Testing - II

-3σ -2σ +σ +2σ +3σ Hypothesis Testing - II Lecture 9 0909.400.01 / 0909.400.02 Dr. P. s Clinic Consultant Module in Probability & Statistics in Engineering Today in P&S -3σ -2σ +σ +2σ +3σ Review: Hypothesis

### Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

### The Utility of Bayesian Predictive Probabilities for Interim Monitoring of Clinical Trials

The Utility of Bayesian Predictive Probabilities for Interim Monitoring of Clinical Trials Ben Saville, Ph.D. Berry Consultants KOL Lecture Series, Nov 2015 1 / 40 Introduction How are clinical trials

### Indices of Model Fit STRUCTURAL EQUATION MODELING 2013

Indices of Model Fit STRUCTURAL EQUATION MODELING 2013 Indices of Model Fit A recommended minimal set of fit indices that should be reported and interpreted when reporting the results of SEM analyses:

### Design and Analysis of Phase III Clinical Trials

Cancer Biostatistics Center, Biostatistics Shared Resource, Vanderbilt University School of Medicine June 19, 2008 Outline 1 Phases of Clinical Trials 2 3 4 5 6 Phase I Trials: Safety, Dosage Range, and

### Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

### Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

### Chi-Square Tests. In This Chapter BONUS CHAPTER

BONUS CHAPTER Chi-Square Tests In the previous chapters, we explored the wonderful world of hypothesis testing as we compared means and proportions of one, two, three, and more populations, making an educated

### STT 430/630/ES 760 Lecture Notes: Chapter 6: Hypothesis Testing 1. February 23, 2009 Chapter 6: Introduction to Hypothesis Testing

STT 430/630/ES 760 Lecture Notes: Chapter 6: Hypothesis Testing 1 February 23, 2009 Chapter 6: Introduction to Hypothesis Testing One of the primary uses of statistics is to use data to infer something

### A Formalization of the Turing Test Evgeny Chutchev

A Formalization of the Turing Test Evgeny Chutchev 1. Introduction The Turing test was described by A. Turing in his paper [4] as follows: An interrogator questions both Turing machine and second participant

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### Financial Risk Forecasting Chapter 8 Backtesting and stresstesting

Financial Risk Forecasting Chapter 8 Backtesting and stresstesting Jon Danielsson London School of Economics 2015 To accompany Financial Risk Forecasting http://www.financialriskforecasting.com/ Published

### Factorial Analysis of Variance

Chapter 560 Factorial Analysis of Variance Introduction A common task in research is to compare the average response across levels of one or more factor variables. Examples of factor variables are income

### 1. Chi-Squared Tests

1. Chi-Squared Tests We'll now look at how to test statistical hypotheses concerning nominal data, and specifically when nominal data are summarized as tables of frequencies. The tests we will considered

### Use of Pearson s Chi-Square for Testing Equality of Percentile Profiles across Multiple Populations

Open Journal of Statistics, 2015, 5, 412-420 Published Online August 2015 in SciRes. http://www.scirp.org/journal/ojs http://dx.doi.org/10.4236/ojs.2015.55043 Use of Pearson s Chi-Square for Testing Equality

### Modern Geometry Homework.

Modern Geometry Homework. 1. Rigid motions of the line. Let R be the real numbers. We define the distance between x, y R by where is the usual absolute value. distance between x and y = x y z = { z, z

### SAS and Clinical IVRS: Beyond Schedule Creation Gayle Flynn, Cenduit, Durham, NC

Paper SD-001 SAS and Clinical IVRS: Beyond Schedule Creation Gayle Flynn, Cenduit, Durham, NC ABSTRACT SAS is the preferred method for generating randomization and kit schedules used in clinical trials.

### Lecture 7: Binomial Test, Chisquare

Lecture 7: Binomial Test, Chisquare Test, and ANOVA May, 01 GENOME 560, Spring 01 Goals ANOVA Binomial test Chi square test Fisher s exact test Su In Lee, CSE & GS suinlee@uw.edu 1 Whirlwind Tour of One/Two

### SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

### SAMPLE SIZE ESTIMATION USING KREJCIE AND MORGAN AND COHEN STATISTICAL POWER ANALYSIS: A COMPARISON. Chua Lee Chuan Jabatan Penyelidikan ABSTRACT

SAMPLE SIZE ESTIMATION USING KREJCIE AND MORGAN AND COHEN STATISTICAL POWER ANALYSIS: A COMPARISON Chua Lee Chuan Jabatan Penyelidikan ABSTRACT In most situations, researchers do not have access to an

### The SURVEYFREQ Procedure in SAS 9.2: Avoiding FREQuent Mistakes When Analyzing Survey Data ABSTRACT INTRODUCTION SURVEY DESIGN 101 WHY STRATIFY?

The SURVEYFREQ Procedure in SAS 9.2: Avoiding FREQuent Mistakes When Analyzing Survey Data Kathryn Martin, Maternal, Child and Adolescent Health Program, California Department of Public Health, ABSTRACT

### Prime Numbers. Chapter Primes and Composites

Chapter 2 Prime Numbers The term factoring or factorization refers to the process of expressing an integer as the product of two or more integers in a nontrivial way, e.g., 42 = 6 7. Prime numbers are

### Credibility and Pooling Applications to Group Life and Group Disability Insurance

Credibility and Pooling Applications to Group Life and Group Disability Insurance Presented by Paul L. Correia Consulting Actuary paul.correia@milliman.com (207) 771-1204 May 20, 2014 What I plan to cover

### Generalized Linear Mixed Modeling and PROC GLIMMIX

Generalized Linear Mixed Modeling and PROC GLIMMIX Richard Charnigo Professor of Statistics and Biostatistics Director of Statistics and Psychometrics Core, CDART RJCharn2@aol.com Objectives First ~80

### Fast and Robust Signaling Overload Control

Fast and Robust Signaling Overload Control Sneha Kasera, José Pinheiro, Catherine Loader, Mehmet Karaul, Adiseshu Hari and Tom LaPorta Bell Laboratories, Lucent Technologies {kasera,jcp,cathl,karaul,hari,tlp}@lucent.com

### Multi-service Load Balancing in a Heterogeneous Network with Vertical Handover

1 Multi-service Load Balancing in a Heterogeneous Network with Vertical Handover Jie Xu, Member, IEEE, Yuming Jiang, Member, IEEE, and Andrew Perkis, Member, IEEE Abstract In this paper we investigate

### Matrix Inverse and Determinants

DM554 Linear and Integer Programming Lecture 5 and Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1 2 3 4 and Cramer s rule 2 Outline 1 2 3 4 and

### Models Used in Variance Swap Pricing

Models Used in Variance Swap Pricing Final Analysis Report Jason Vinar, Xu Li, Bowen Sun, Jingnan Zhang Qi Zhang, Tianyi Luo, Wensheng Sun, Yiming Wang Financial Modelling Workshop 2011 Presentation Jan

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

### Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY

Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC FREQ is an essential procedure within BASE

### Parametric Models. dh(t) dt > 0 (1)

Parametric Models: The Intuition Parametric Models As we saw early, a central component of duration analysis is the hazard rate. The hazard rate is the probability of experiencing an event at time t i

### Regression analysis in the Assistant fits a model with one continuous predictor and one continuous response and can fit two types of models:

This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. The simple regression procedure in the

### The three statistics of interest comparing the exposed to the not exposed are:

Review: Common notation for a x table: Not Exposed Exposed Disease a b a+b No Disease c d c+d a+c b+d N Warning: Rosner presents the transposed table so b and c have their meanings reversed in the text.

### Logistic regression diagnostics

Logistic regression diagnostics Biometry 755 Spring 2009 Logistic regression diagnostics p. 1/28 Assessing model fit A good model is one that fits the data well, in the sense that the values predicted

### Life Table Analysis using Weighted Survey Data

Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using

### Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

### Mathematical Induction

Chapter 2 Mathematical Induction 2.1 First Examples Suppose we want to find a simple formula for the sum of the first n odd numbers: 1 + 3 + 5 +... + (2n 1) = n (2k 1). How might we proceed? The most natural

### The Method of Least Squares

33 The Method of Least Squares KEY WORDS confidence interval, critical sum of squares, dependent variable, empirical model, experimental error, independent variable, joint confidence region, least squares,

### EAA492/6: FINAL YEAR PROJECT DEVELOPING QUESTIONNAIRES PART 2 A H M A D S H U K R I Y A H A Y A E N G I N E E R I N G C A M P U S U S M

EAA492/6: FINAL YEAR PROJECT DEVELOPING QUESTIONNAIRES PART 2 1 A H M A D S H U K R I Y A H A Y A E N G I N E E R I N G C A M P U S U S M CONTENTS Reliability And Validity Sample Size Determination Sampling

### Multiple One-Sample or Paired T-Tests

Chapter 610 Multiple One-Sample or Paired T-Tests Introduction This chapter describes how to estimate power and sample size (number of arrays) for paired and one sample highthroughput studies using the.

### Classical Hypothesis Testing and GWAS

Department of Computer Science Brown University, Providence sorin@cs.brown.edu September 15, 2014 Outline General Principles Classical statistical hypothesis testing involves the test of a null hypothesis