Sample Size/Power Calculations

1 Sample Size/Power Calculations

2 Why Power Analysis?
Research is expensive. You wouldn't want to conduct an experiment with far too
1. few experimental units (EUs): the project won't find important differences that exist, so it is not really worth doing;
2. many experimental units (EUs): the project will be unnecessarily expensive.
Power analysis is also a typical granting agency requirement.

3 A Simple Experimental Design
Effect of diet on blood pressure (mmHg) in rats.
Consider a Completely Randomized Design (CRD): 12 rats randomly assigned to one of two different diets.
Trt 1: DASH diet, n=6
Trt 2: Standard diet, n=6
The investigator expects higher mean blood pressure (BP) at the end of 12 weeks under Trt 2.
Is n=6 enough to detect this difference?

4 Statistical Analysis
Two competing hypotheses:
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 < \mu_2$, i.e. a one-tailed test (for now)
The basis for choosing between the two is the degree of evidence against the null hypothesis. We compare the P-value with a declared significance level $\alpha$:
$P \le \alpha$: reject $H_0$ and conclude that mean BP is larger in Trt 2
$P > \alpha$: fail to reject $H_0$; not enough evidence to conclude $H_1$
There are two possible incorrect conclusions based on this approach to inference.

5 Type I and Type II errors

True unknown state | What the data indicate:
                   | Fail to reject $H_0$ ($P > \alpha$) | Reject $H_0$ ($P \le \alpha$)
$H_0$              | No error                            | Type I error (Prob = $\alpha$)
$H_1$              | Type II error (Prob = $\beta$)      | No error

6 So is n = 6 rats large enough?
Rephrase: Do we have enough statistical power? We need to know two things:
1. How large is the true mean difference ($\delta = \mu_2 - \mu_1$)?
   a) What do you anticipate and/or want to detect?
   b) What would be economically/practically important?
2. How much variability ($\sigma$) is there between rats within a group?
Sometimes prior information is available from a pilot study or previously published studies. Otherwise you need to make an educated guess. Always round up to be a little conservative.

7 One way to elicit values for $\sigma$
Empirical rule: consider the range of responses to be roughly equal to $4\sigma$ ($R \approx 4\sigma$).
Question to client: What would be the likely range (max - min) of responses for rats within the same trt?
Suppose the answer was 60 mmHg: $R = 60$ gives $\sigma = 15$ mmHg.
Suppose the researchers also believe that $\delta \ge 20$ mmHg is important.

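8 Two competing hypotheses
Under $H_0$: $\bar{y}_2 - \bar{y}_1 \sim N\left(0, \frac{2\sigma^2}{n}\right)$
Under $H_1$: $\bar{y}_2 - \bar{y}_1 \sim N\left(\delta, \frac{2\sigma^2}{n}\right)$
We are currently assuming the data are Normal. The sole difference is in the mean of the distribution.
Conduct a one-tailed z-test for a chosen $\alpha$. Reject $H_0$ if
$z = \frac{\bar{y}_2 - \bar{y}_1}{\sqrt{2\sigma^2/n}} > z_\alpha$, i.e. if $\bar{y}_2 - \bar{y}_1 > z_\alpha \sqrt{\frac{2\sigma^2}{n}}$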

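9 Distributions of $\bar{y}_2 - \bar{y}_1$
[Figure: sampling distributions of $\bar{y}_2 - \bar{y}_1$ under $H_0$ (centered at 0) and under $H_1$ (centered at $\delta$). The critical value is $z_\alpha \sqrt{2\sigma^2/n}$; the area beyond it is $\alpha$ under $H_0$ and the power, $1-\beta$, under $H_1$.]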
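The shaded power region described above can be computed directly when $\sigma$ is treated as known. The following is a minimal sketch, not part of the original slides: the function name `z_test_power` is hypothetical, and it uses only Python's standard library. It plugs in the rat example ($\delta = 20$, $\sigma = 60/4 = 15$, n = 6 per group, $\alpha = 0.05$).

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of the one-tailed two-sample z-test with n units per group (sigma assumed known)."""
    se = sqrt(2 * sigma**2 / n)                  # standard deviation of ybar2 - ybar1
    z_alpha = NormalDist().inv_cdf(1 - alpha)    # upper-alpha critical value
    # Under H1 the difference is N(delta, se^2); power is P(difference > z_alpha * se).
    return 1 - NormalDist().cdf(z_alpha - delta / se)

sigma = 60 / 4   # range rule: R = 60 mmHg, so sigma is about 15 mmHg
power = z_test_power(delta=20, sigma=sigma, n=6)
print(f"Approximate power (z-test, sigma known): {power:.3f}")
```

Under these assumptions the power comes out at roughly 0.75. Because it assumes $\sigma$ is known, it should be read as an optimistic approximation to the t-test power.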

10 More reasonable statistical test: t-test
Because you likely won't be able to assume $\sigma^2$ is known.
One-sided: reject $H_0$ if $t = \frac{\bar{y}_2 - \bar{y}_1}{\sqrt{\frac{s^2}{n} + \frac{s^2}{n}}} > t_{\alpha, df}$
Two-sided ($H_1: \mu_1 \ne \mu_2$): reject $H_0$ if $|t| > t_{\alpha/2, df}$
Use of the t distribution results in a more complicated distribution under the alternative hypothesis (the non-central t).

11 Using SAS for power analysis
proc power;
  twosamplemeans
    alpha=.05 nulldiff=0 sides=1
    meandiff=20 npergroup=6 stddev=15
    power=.;
run;
or
proc power;
  onewayanova
    alpha=.05 test=overall
    groupmeans=(0 20) npergroup=6 stddev=15
    power=.;
run;
(The one-way ANOVA version is similar to a two-sided t-test.)

12 SAS Output
Two-sample t Test for Mean Difference

Fixed Scenario Elements
Distribution             Normal
Method                   Exact
Number of Sides          1
Null Difference          0
Alpha                    0.05
Mean Difference          20
Standard Deviation       15
Sample Size Per Group    6

Computed Power
Power

13 SAS Output
Overall F Test for One-Way ANOVA

Fixed Scenario Elements
Method                   Exact
Alpha                    0.05
Group Means              0 20
Standard Deviation       15
Sample Size Per Group    6

Computed Power
Power

Typically we want power to be larger than 80%, so more rats would be desirable.

14 Using SAS for sample size
proc power;
  twosamplemeans
    alpha=.05 nulldiff=0 sides=1
    meandiff=20 npergroup=. stddev=15
    power=.80;
run;
or
proc power;
  onewayanova
    alpha=.05 test=overall
    groupmeans=(0 20) npergroup=. stddev=15
    power=.80;
run;

15 SAS Output
Two-sample t Test for Mean Difference

Fixed Scenario Elements
Distribution             Normal
Method                   Exact
Number of Sides          1
Null Difference          0
Alpha                    0.05
Mean Difference          20
Standard Deviation       15
Nominal Power            0.8

Computed N Per Group
Actual Power    N Per Group

16 SAS Output
Overall F Test for One-Way ANOVA

Fixed Scenario Elements
Method                   Exact
Alpha                    0.05
Group Means              0 20
Standard Deviation       15
Nominal Power            0.8

Computed N Per Group
Actual Power    N Per Group
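As a rough cross-check on the sample sizes PROC POWER reports, the one-sided z-test gives a closed-form approximation, $n \approx 2\sigma^2 (z_\alpha + z_\beta)^2 / \delta^2$ per group. The sketch below is an illustration only, not the method PROC POWER uses (which is based on the non-central t). The function name `n_per_group_z` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_z(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a one-sided two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value for the test
    z_beta = NormalDist().inv_cdf(power)        # quantile matching the target power
    # Round up so the target power is at least met under the approximation.
    return ceil(2 * sigma**2 * (z_alpha + z_beta)**2 / delta**2)

n_approx = n_per_group_z(delta=20, sigma=15)
print("Approximate n per group:", n_approx)
```

Since the approximation treats $\sigma$ as known, the t-test calculation can be expected to ask for at least this many rats per group, possibly one or two more.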

17 Generating Power Curve I
proc power;
  twosamplemeans
    alpha=.05 nulldiff=0 sides=1
    meandiff=20 stddev=15
    power=.
    npergroup=3 to 20 by 1;
  plot interpol=join yopts=(ref=0.80);
run;

18 Power Curve for one-sided t test
[Figure: power (y-axis) against sample size per group (x-axis).]

19 Generating Power Curve II
proc power;
  twosamplemeans
    alpha=.05 nulldiff=0 sides=1
    meandiff=10 to 30 by 1
    stddev=15 npergroup=6
    power=.;
  plot x=effect interpol=join yopts=(ref=0.80);
run;

21 Determining sample size for a desired margin of error
Confidence interval: $\bar{y}_2 - \bar{y}_1 \pm t_{df} \sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{n}}$
Margin of error: $t_{df} \sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{n}}$
Given guesstimates for the variances, one can set the margin of error equal to the desired amount and solve for n.
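Solving the margin-of-error expression for n can be sketched as follows. This is a simplified illustration under stated assumptions: the t critical value is replaced by the normal quantile (so the answer is slightly too small for small n), the target margin of 10 mmHg is a made-up value, and the function name `n_for_margin` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_margin(margin, var1, var2, conf=0.95):
    """Smallest n per group with z * sqrt(var1/n + var2/n) <= margin (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # two-sided critical value
    # margin = z * sqrt((var1 + var2) / n)  =>  n = z^2 * (var1 + var2) / margin^2
    return ceil(z**2 * (var1 + var2) / margin**2)

# Guesstimated variances of 15^2 in each group; hypothetical target margin of 10 mmHg.
n_moe = n_for_margin(margin=10, var1=15**2, var2=15**2)
print("Approximate n per group:", n_moe)
```

One could refine the answer by recomputing the margin with the t quantile at the resulting degrees of freedom and increasing n until the target is met.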

22 What if more than two trts?
Example: In a study of vitamin supplementation, pigs are assigned to each of 5 treatment groups, and weight gains over a specified time period are to be recorded.
Researchers anticipate mean responses of 3.9, 4.1, 4.2, 4.3 and 4.5 kg for the five treatments, respectively.
Based on previous experience, they anticipate a within-treatment variance of about 0.30 kg$^2$.
They want to know whether n=4 animals per treatment would provide sufficient power for the ANOVA F-test.

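23 Linear model written two ways
1) Cell means model: $Y_{ij} = \mu_i + e_{ij}$
2) Factor level effects model: $Y_{ij} = \mu + \alpha_i + e_{ij}$
$i = 1, \ldots, r = 5$; $j = 1, 2, \ldots, n = 4$; $e_{ij} \sim NIID(0, \sigma^2)$
$\mu = \frac{1}{r}\sum_{i=1}^{r} \mu_i$ and $\alpha_i = \mu_i - \mu$, so that $\sum_{i=1}^{r} \alpha_i = 0$, i.e. sum-to-zero constraints.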

24 One-way ANOVA table
Source      Df       SS      MS      EMS
Treatment   r-1      SSTrt   MSTrt   $\sigma^2 + \text{function}\left(\sum_{i=1}^{r} \alpha_i^2\right)$
Error       r(n-1)   SSE     MSE     $\sigma^2$
ANOVA F-test:
1) $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$ versus $H_1$: at least one $\mu_i \ne \mu_{i'}$
2) $H_0$: all $\alpha_i = 0$ versus $H_1$: at least one $\alpha_i \ne 0$
These are equivalent specifications.
Note: if $H_0$ is true then both EMS $= \sigma^2$, so that $F = MSTrt/MSE \sim F_{r-1,\, r(n-1)}$, a central F-distribution.

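25 Power determination for F-test
Under $H_1$: $\mu_1 = 3.9$, $\mu_2 = 4.1$, $\mu_3 = 4.2$, $\mu_4 = 4.3$, $\mu_5 = 4.5$, or equivalently $\alpha_1 = -0.3$, $\alpha_2 = -0.1$, $\alpha_3 = 0.0$, $\alpha_4 = 0.1$, $\alpha_5 = 0.3$ with $\mu = 4.2$.
This means $F = MSTrt/MSE \sim F_{r-1,\, r(n-1),\, \phi}$, a non-central F-distribution (if $\phi > 0$), where
$\phi = \frac{n \sum_{i=1}^{r} \alpha_i^2}{\sigma^2}$ is the non-centrality parameter.
Corrected sum of squared means (CSSM) $= \sum_{i=1}^{r} \alpha_i^2$; for example, $(-0.3)^2 + (-0.1)^2 + (0.0)^2 + (0.1)^2 + (0.3)^2 = 0.20$.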
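The CSSM and non-centrality arithmetic for the pig example can be checked with a few lines of code. This is a small illustrative sketch; the variable names are not from the slides.

```python
# Anticipated treatment means (kg), replicates per treatment, and within-treatment variance.
means = [3.9, 4.1, 4.2, 4.3, 4.5]
n = 4
sigma2 = 0.30

grand_mean = sum(means) / len(means)          # mu = 4.2
effects = [m - grand_mean for m in means]     # alpha_i = mu_i - mu
cssm = sum(a**2 for a in effects)             # corrected sum of squared means
phi = n * cssm / sigma2                       # non-centrality parameter

print(f"grand mean = {grand_mean:.2f}, CSSM = {cssm:.2f}, phi = {phi:.3f}")
```

With n = 4 the non-centrality parameter is only about 2.67, which already hints that the design has little power.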

26 SAS Code
proc power;
  onewayanova
    alpha=.05 test=overall
    groupmeans=( )
    npergroup=4
    stddev=
    power=.;
run;
Note on stddev: This is the square root of

27 SAS Output
Overall F Test for One-Way ANOVA

Fixed Scenario Elements
Method                   Exact
Alpha                    0.05
Group Means
Standard Deviation
Sample Size Per Group    4

Computed Power
Power

Severely underpowered. As designed, this would be a waste of time and money to run!

28 SAS Code
Let's look at a power curve to get an idea of the necessary sample size.
proc power;
  onewayanova
    alpha=.05 test=overall
    groupmeans=( )
    npergroup=4 to 30
    stddev=
    power=.;
  plot interpol=join yopts=(ref=.80);
run;

29 [Figure: power (y-axis) against sample size per group (x-axis).]
Looks like we need about 19 animals per group (almost 5 times the number before).

30 What if trt means unknown?
Use the worst-case scenario for a conservative assessment of power.
You just have to know the difference between the largest and smallest means, or the smallest difference D that is scientifically meaningful.
Use $-D/2$ and $+D/2$, with all other means clumped at zero.
This minimizes $\sum \alpha_i^2$, so it minimizes $\phi$.
True power will be greater than or equal to this.
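The worst-case configuration can be written out explicitly to see how much smaller its CSSM is than for the anticipated means. The sketch below assumes D = 0.6 and r = 5 treatments, matching the pig example (largest minus smallest anticipated mean is 4.5 - 3.9 = 0.6). The function name `worst_case_cssm` is hypothetical.

```python
def worst_case_cssm(D, r):
    """CSSM when two means sit at -D/2 and +D/2 and the other r-2 means sit at zero."""
    means = [-D / 2, D / 2] + [0.0] * (r - 2)
    grand_mean = sum(means) / r                    # zero by construction
    return sum((m - grand_mean) ** 2 for m in means)

n, sigma2 = 4, 0.30
cssm_worst = worst_case_cssm(D=0.6, r=5)           # equals D^2 / 2
phi_worst = n * cssm_worst / sigma2

print(f"worst-case CSSM = {cssm_worst:.2f}, phi = {phi_worst:.2f}")
```

Here the worst-case CSSM is $D^2/2 = 0.18$, slightly below the 0.20 implied by the anticipated means, so the resulting power is a lower bound for any configuration with the same range.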

31 SAS Code
* Suppose D=0.6;
proc power;
  onewayanova
    alpha=.05 test=overall
    groupmeans=( )
    npergroup=4 to 30
    stddev=
    power=.;
  plot interpol=join yopts=(ref=.80);
run;

32 [Figure: power (y-axis) against sample size per group (x-axis).]
Looks like we need about 21 animals per group in the worst case.

33 There is actually a trick to computing $\phi$ using ANOVA software like PROC GLM/MIXED (O'Brien and Lohr, 1984)
1) Substitute the true means for the data in the ANOVA.
2) Use the ANOVA table to compute the noncentrality parameter.
3) Then use that computed value in power calculations!

34 Using true means for data
Suppose you are interested in 3 treatments.
Anticipate true mean responses of 4.0, 4.3 and 4.6.
Anticipate a residual variance of 0.30.
Wish to compute power based on a sample size of n = 4 for each treatment.
data oneway;
  input treatment mean;
  datalines;
;
proc mixed data=oneway noprofile;
  class treatment;
  model mean = treatment;
  parms (0.30) / noiter;
  ods output tests3 = tests3;   /* output the ANOVA table to a data set called tests3 */
run;

35 Trick to compute $\phi$
Compute the ANOVA treatment F ratio, "F_Treatment":
Obs   Effect      NumDF   DenDF   FValue   ProbF
1     treatment
Multiply "F_Treatment" by the numerator degrees of freedom (NumDF) to get $\phi$:
$\phi = F_{\text{Treatment}} \times df_{\text{Treatment}} = 1.2 \times 2 = 2.4$
"F_Treatment" is a function of CSSM.
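The trick can be mimicked by hand to see why it works. The sketch below is an illustration, not the PROC MIXED computation itself: it assumes the "data" consist of each true mean repeated n = 4 times, builds the treatment sum of squares, and divides by the residual variance, which is held fixed at 0.30. All variable names are hypothetical.

```python
means = [4.0, 4.3, 4.6]    # anticipated true treatment means
n, sigma2 = 4, 0.30        # replicates per treatment, fixed residual variance
r = len(means)

# "Data" set in which every observation equals its treatment's true mean.
data = [m for m in means for _ in range(n)]
grand_mean = sum(data) / len(data)

# With no within-treatment scatter, all variation is between treatments.
ss_trt = sum((y - grand_mean) ** 2 for y in data)   # n * sum(alpha_i^2)
num_df = r - 1
den_df = r * (n - 1)

f_value = (ss_trt / num_df) / sigma2    # "F_Treatment" with the variance held at sigma2
noncent = f_value * num_df              # phi = n * CSSM / sigma2

print(f"F = {f_value:.2f}, NumDF = {num_df}, DenDF = {den_df}, phi = {noncent:.2f}")
```

Multiplying F by its numerator degrees of freedom simply undoes the division by $r-1$ in the mean square, leaving $n \sum \alpha_i^2 / \sigma^2$.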

36 Use $\phi$ to compute power
data power;
  set tests3;
  noncent = Fvalue*numdf;
  alpha = 0.05;
  /* critical value separating the acceptance region from the rejection region */
  criticalvalue = Finv(1-alpha,numdf,dendf,0);
  /* probability of falling in the rejection region if H1 is true */
  Power = 1-Probf(criticalvalue,numdf,dendf,noncent);
run;
proc print data=power;
run;

Effect      NumDF   DenDF   FValue   ProbF   noncent   alpha   criticalvalue   Power
treatment
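For readers without SAS, the same two steps (central-F critical value, then non-central-F tail probability) can be sketched with the Python standard library alone. This is an illustrative implementation under stated assumptions, not the algorithm SAS uses: the non-central F CDF is computed as a Poisson-weighted sum of regularized incomplete beta functions, the critical value is found by bisection, and all function names are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def ncf_cdf(f, num_df, den_df, noncent=0.0, terms=200):
    """CDF of the (non-central) F distribution as a Poisson mixture of central terms."""
    y = num_df * f / (num_df * f + den_df)
    if noncent == 0.0:
        return reg_inc_beta(num_df / 2, den_df / 2, y)
    half = noncent / 2.0
    total = 0.0
    for j in range(terms):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * reg_inc_beta(num_df / 2 + j, den_df / 2, y)
    return total

def f_critical(alpha, num_df, den_df):
    """Upper-alpha critical value of the central F distribution, by bisection."""
    lo, hi = 0.0, 1.0e4
    for _ in range(200):
        mid = (lo + hi) / 2
        if ncf_cdf(mid, num_df, den_df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def anova_power(noncent, num_df, den_df, alpha=0.05):
    """Probability of landing in the rejection region when the non-centrality is noncent."""
    crit = f_critical(alpha, num_df, den_df)
    return 1 - ncf_cdf(crit, num_df, den_df, noncent)

# Three-treatment example: phi = 2.4 with 2 and 9 degrees of freedom.
print(f"critical value = {f_critical(0.05, 2, 9):.3f}")
print(f"power = {anova_power(2.4, 2, 9):.3f}")
```

The helper `f_critical` plays the role of `Finv` and `ncf_cdf` the role of `Probf` in the SAS data step. In practice a statistics library or SAS itself would be the safer choice. The sketch is only meant to make the two steps concrete.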

37 PROC GLMPOWER does this
data example1;              /* much simpler data step */
  input FactorA $ mean;
  datalines;
run;
proc glmpower data=example1;
  class FactorA;
  model mean = FactorA;
  power
    stddev = .548
    ntotal = 12             /* total number of experimental units */
    power = .
    alpha = 0.05;
run;

The GLMPOWER Procedure

Fixed Scenario Elements
Dependent Variable          mean
Source                      FactorA
Alpha                       0.05
Error Standard Deviation
Total Sample Size           12
Test Degrees of Freedom     2
Error Degrees of Freedom    9

Computed Power
Power

38 What about Factorial Designs?
An experiment was conducted to determine the effects of three different sources of dietary phosphorus and four different varieties of corn silage on daily milk production.
Proposed a 3 x 4 factorial experiment:
Factor A, dietary phosphorus: 1, 2, & 3 (a=3)
Factor B, corn silage varieties: 1, 2, 3, & 4 (b=4)
Each cow is randomly assigned to just one particular A*B treatment combination.
How many cows should be considered?

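39 Need to specify true means
Power analysis requires knowledge of $\mu_{ij}$ and $\sigma^2$. Suppose the investigator anticipates that:
$\mu_{11}$        $\mu_{12} = 38$   $\mu_{13} = 44$   $\mu_{14} = 41$
$\mu_{21}$        $\mu_{22} = 43$   $\mu_{23} = 49$   $\mu_{24} = 46$
$\mu_{31}$        $\mu_{32} = 48$   $\mu_{33} = 54$   $\mu_{34}$
$\sigma^2 = 5$ kg$^2$
The investigator wishes to determine power for both main effects and the two-way interaction, and also for the difference between, say, Levels 1 and 2 of A:
$\frac{1}{4}\left(\mu_{11} + \mu_{12} + \mu_{13} + \mu_{14}\right) - \frac{1}{4}\left(\mu_{21} + \mu_{22} + \mu_{23} + \mu_{24}\right)$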

40 Setup data
data power;
  input FactorA FactorB cellmean;
  datalines;
run;
symbol1 i=join;
proc gplot;
  plot cellmean*factorb=factora;
run;

41 Profile means plot
[Figure: profile plot of cellmean (y-axis) against FactorB (x-axis), with one line per level of FactorA.]
The researcher is anticipating no interaction. (The power analysis should still take its possibility into account in the ANOVA.)

42 Using GLMPower
proc glmpower data=power;
  class FactorA FactorB;
  model cellmean = FactorA FactorB FactorA*FactorB;
  contrast 'A1 vs A2' FactorA 1 -1 0;
  power
    stddev = 5      /* residual standard deviation, i.e. the square root of the residual variance */
    ntotal = 36     /* provides power determination for n = 36/12 = 3 reps per group */
    power = .       /* missing because you want to compute power */
    alpha = 0.05;
  plot x=n min=24 max=96;   /* power curve plot ranging from n = 24/12 to 96/12 reps per group */
run;

43 PROC GLMPOWER OUTPUT
Fixed Scenario Elements
Dependent Variable          cellmean
Alpha                       0.05
Error Standard Deviation    5
Total Sample Size           36
Error Degrees of Freedom    24

Computed Power
Index   Type       Source             Test DF   Power
1       Effect     FactorA
        Effect     FactorB
        Effect     FactorA*FactorB
        Contrast   A1 vs A2

44 Power curves