Statistics for the intensivist (2) ICU Fellowship Training Radboudumc

Hypothesis testing: the null hypothesis
Therapeutic hypothermia (33.0 °C) vs therapeutic normothermia (36.8 °C)
N = 260, children aged 2 days to < 18 years, comatose after OHCA
[Figure: bar chart of 1-year survival with VABS-II score ≥ 70: hypothermia 20% vs normothermia 12%, P = 0.14]

Null hypothesis
Due to sampling variation, it is extremely unlikely that exactly the same proportion of patients will die in each group.
Observed differences may be due to treatment or may be due to chance.
Hypothesis testing tries to establish which of these explanations is most likely (but can never prove the truth).

Null hypothesis
The null hypothesis states that there is no difference between the two treatments.
Hypothesis testing explores how likely it is that the observed difference would be seen by chance alone if the null hypothesis were true: how likely is it to find 20% survival in the hypothermia group and 12% in the normothermia group when in fact there is no difference?

P-value
P = probability
It measures how likely a difference between groups at least as large as the one observed (20% vs 12%) would be if chance alone were at work, and thus the strength of evidence against the null hypothesis.
In this example, a difference this large would be found in 14% of samples if in fact there were no difference between the treatments.
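As a rough illustration of where such a P-value comes from, the sketch below runs a two-sided two-proportion z-test on the 20% vs 12% survival figures. The group sizes (130 per group) and the rounded event counts are assumptions, since the slide gives only N = 260. The published analysis may have used a different test, so the result need not equal the reported P = 0.14.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided P-value for H0: both groups share one underlying proportion."""
    p_a, p_b = events_a / n_a, events_b / n_b
    # Under the null hypothesis the best estimate is the pooled proportion.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Probability of a difference at least this extreme in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Assumed split: 130 children per group, 26/130 = 20% and 16/130 = about 12% survivors.
p = two_proportion_p_value(26, 130, 16, 130)
print(f"P = {p:.2f}")
```

The function returns the probability of seeing a difference this large or larger when the null hypothesis holds. It says nothing about the probability that the null hypothesis itself is true.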

Interpreting P-values
Trial      Patients (n)        Tidal volume (mL/kg)       Mortality (%)       Absolute     P-value
           Low Tv   Control    Low Tv       Control       Low Tv   Control    difference
Amato      29       24         6.1 ± 0.2    11.9 ± 0.5    38       71         33%          < 0.001
Stewart    60       60         7.2 ± 0.8    10.6 ± 0.2    50       47         3%           0.72
Brochard   58       58         7.2 ± 0.2    10.4 ± 0.2    47       38         9%           0.38
Brower     26       26         7.3 ± 0.1    10.2 ± 0.1    50       46         4%           0.60
ARDSnet    432      429        6.3 ± 0.1    11.7 ± 0.1    31       40         9%           0.007
Villar     50       45         7.3 ± 0.9    10.2 ± 1.2    30       54         24%          0.017

Interpretation
The ARDSnet trial suggests that there is a 0.7% chance of finding a 9% mortality difference between low and high tidal volume ventilation by coincidence if in fact there is no difference.
The Brochard trial suggests that there is a 38% chance of finding a 9% mortality difference between low and high tidal volume ventilation by coincidence if in fact there is no difference.

Interpretation
The P-value is strongly influenced by sample size (compare Brochard with ARDSnet).
The P-value changes with the size of the effect (compare Brochard with Villar).
Never use an arbitrary cut-off P-value to say that something is true or not true.
The study of Villar does not prove that LTVV decreases mortality, because in 1.7% of samples this difference will be found by chance while in fact there is no difference (Type I error).
The study of Brochard does not prove that LTVV is not harmful, despite the 38% chance that the observed difference of 9% is due to chance (Type II error, a power issue).
Statistical significance does not mean clinical relevance.
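The effect of sample size can be checked with a short sketch. It applies a plain two-proportion z-test (an assumption; the trials may have used other tests) to the rounded mortality percentages and group sizes from the table, so the results only approximate the reported P-values.

```python
from math import sqrt
from statistics import NormalDist


def p_value(mort_a, n_a, mort_b, n_b):
    """Two-sided z-test P-value for two mortality proportions (given as 0..1)."""
    pooled = (mort_a * n_a + mort_b * n_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(mort_a - mort_b) / se
    return 2 * (1 - NormalDist().cdf(z))


# Same 9-point mortality difference, very different sample sizes.
p_brochard = p_value(0.47, 58, 0.38, 58)     # slide reports P = 0.38
p_ardsnet = p_value(0.31, 432, 0.40, 429)    # slide reports P = 0.007
print(f"Brochard: P = {p_brochard:.2f}, ARDSnet: P = {p_ardsnet:.3f}")
```

With about 58 patients per group the 9-point difference is well within what chance produces. With about 430 per group the same difference is not.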

P-values and confidence intervals
OPTIMISE trial: N = 734, major GI surgery
[Figure: bar chart of 30-day mortality or moderate/major complication (%): control 43.4 vs optimization 36.6, P = 0.07]
Pearse RM. JAMA 2014;311:2181-2190

17-29% lower chance of a serious complication with a "tune-up"
Pearse RM. JAMA 2014;311:2181-2190
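A confidence interval shows the range of effects that remain plausible, which a bare P = 0.07 does not. The sketch below computes 95% intervals for the OPTIMISE risk difference and relative risk using standard large-sample (Wald) formulas. The equal split of 367 patients per arm is an assumption, since the slide gives only N = 734.

```python
from math import exp, sqrt
from statistics import NormalDist


def risk_difference_ci(p_ctrl, n_ctrl, p_trt, n_trt, level=0.95):
    """Wald interval for the absolute risk difference (control minus treatment)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_ctrl - p_trt
    se = sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_trt * (1 - p_trt) / n_trt)
    return diff - z * se, diff + z * se


def relative_risk_ci(p_ctrl, n_ctrl, p_trt, n_trt, level=0.95):
    """Interval for the relative risk (treatment / control), built on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rr = p_trt / p_ctrl
    se_log = sqrt((1 - p_trt) / (n_trt * p_trt) + (1 - p_ctrl) / (n_ctrl * p_ctrl))
    return rr * exp(-z * se_log), rr * exp(z * se_log)


# Assumed: 367 patients per arm; 43.4% (control) vs 36.6% (optimization).
rd_low, rd_high = risk_difference_ci(0.434, 367, 0.366, 367)
rr_low, rr_high = relative_risk_ci(0.434, 367, 0.366, 367)
print(f"Risk difference: {rd_low:.3f} to {rd_high:.3f}")
print(f"Relative risk:   {rr_low:.2f} to {rr_high:.2f}")
```

Under these assumptions both intervals just include "no effect" (0 for the difference, 1 for the ratio), which agrees with P = 0.07, while still leaving room for a clinically relevant benefit.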

Sample size calculation
Power = probability of correctly identifying a difference between two groups in the study sample when one genuinely exists in the populations from which the samples were drawn.
Higher power is usually obtained by increasing the sample size.
Always calculate the power when a clinically relevant difference between groups is not statistically significant.

Hypothermia after OHCA in children
Multicenter trial (38 centers) in comatose children (2 days to 18 years) within 6 hours after OHCA
Therapeutic hypothermia (33.0 °C) vs normothermia (36.8 °C)
Primary outcome: 12-month survival with VABS-II score ≥ 70
Moler FW. N Engl J Med 2015;372:1898-1908

Hypothermia after OHCA in children
[Figure: bar chart of 1-year survival (%): hypothermia 20 vs normothermia 12, P = 0.14, N = 260]
Power calculation:
Absolute effect size of 15-20%
Primary outcome rate in normothermia group 15-35%
Needed sample size 275 for power of 85%
Totally unrealistic

Factors affecting sample size calculation
Factor    Magnitude   Impact on identification             Required sample size
P-value   Small       Difficult to achieve significance    Large
P-value   Large       Significance easy to obtain          Small
Power     Low         Identification unlikely              Small
Power     High        Identification more probable         Large
Effect    Small       Difficult to identify                Large
Effect    Large       Easy to identify                     Small
Very often the p-value is set at 0.05 and power at 80-95%.

Effect size
Take a clinically relevant value (a 0.1% mortality reduction in ARDS will not change practice).

Assume...
You have a new inotropic agent (levosimendan).
You would like to compare it with placebo and see whether the CO increases.
You think that a clinically relevant increase in CO is 1 L/min and that the CO of the placebo group will be 3 ± 1 L/min.
Calculate the standardized difference = target difference (1 L/min) / standard deviation (1 L/min)
Standardized difference = 1

SS calculation
Standardized difference: read N from the Altman nomogram, or use a formula
$N = \frac{2}{d^2} \times C_{p,\text{power}}$
N = number of patients per group, d = standardized difference, $C_{p,\text{power}}$ = constant
P-value   Power 80%   Power 90%   Power 95%
0.05      7.9         10.5        13.0
0.01      11.7        14.9        17.8
For p-value 0.05 and power 0.80: $N = 2/1^2 \times 7.9 = 15.8$, i.e. 16 per group (32 patients, 2 × 16)
For p-value 0.01 and power 0.80: $N = 2/1^2 \times 11.7 = 23.4$, i.e. about 23 per group (46 patients, 2 × 23)
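The formula is easy to check with a few lines of code. The sketch below uses the constants exactly as tabulated on the slide. The names `C_TABLE` and `n_per_group` are illustrative only. Rounding up to whole patients is a conservative choice made here: for p = 0.01 it gives 24 per group, where the slide rounds 23.4 to 23.

```python
from math import ceil

# Constants C(p, power) as tabulated on the slide.
C_TABLE = {
    (0.05, 0.80): 7.9, (0.05, 0.90): 10.5, (0.05, 0.95): 13.0,
    (0.01, 0.80): 11.7, (0.01, 0.90): 14.9, (0.01, 0.95): 17.8,
}


def n_per_group(d, p_value=0.05, power=0.80):
    """Unrounded patients per group: N = 2 / d^2 * C(p, power)."""
    return 2 / d**2 * C_TABLE[(p_value, power)]


# Levosimendan example: target difference 1 L/min, standard deviation 1 L/min.
d = 1.0 / 1.0
for p_value in (0.05, 0.01):
    n = n_per_group(d, p_value, 0.80)
    print(f"p = {p_value}: N = {n:.1f}, {ceil(n)} per group when rounded up")
```

Because N scales with 1/d², halving the standardized difference to 0.5 quadruples the required sample size.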

SS calculation for proportions
[Figure: bar chart of 1-year survival (%): hypothermia 20 vs normothermia 12, P = 0.14, N = 260]
Power calculation: absolute effect size of 15-20%, primary outcome rate in normothermia group 15-35%, needed SS 275 for power of 85%
Standardized difference = $(p_1 - p_2)/\sqrt{p_{\text{mean}}(1 - p_{\text{mean}})} = (0.40 - 0.20)/\sqrt{0.3(1 - 0.3)} = 0.436$

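SS calculation for proportions
Or a formula:
$N = \frac{p_1(1 - p_1) + p_2(1 - p_2)}{(p_1 - p_2)^2} \times C_{p,\text{power}}$
N = number of patients per group, $p_1$ = proportion 1, $p_2$ = proportion 2, $C_{p,\text{power}}$ = constant
P-value   Power 80%   Power 90%   Power 95%
0.05      7.9         10.5        13.0
0.01      11.7        14.9        17.8
For p-value 0.01 and power 0.85: 270 patients (2 × 135) via the standardized difference, or about 130 per group via the formula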
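The table has no column for 85% power, so the constant has to come from elsewhere. The sketch below assumes the constants follow the usual normal approximation, C = (z for the two-sided p-value + z for the power)², which reproduces the tabulated values. It also assumes the slide's formula is the standard one for two proportions, N = [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)² × C. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def c_constant(p_value, power):
    """C(p, power) = (z for two-sided p-value + z for power)^2, normal approximation."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - p_value / 2) + nd.inv_cdf(power)) ** 2


def standardized_difference(p1, p2):
    """(p1 - p2) / sqrt(p_mean * (1 - p_mean)), with p_mean the average of p1 and p2."""
    p_mean = (p1 + p2) / 2
    return (p1 - p2) / sqrt(p_mean * (1 - p_mean))


def n_per_group_proportions(p1, p2, p_value, power):
    """Assumed formula: [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2 * C(p, power)."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return variance_sum / (p1 - p2) ** 2 * c_constant(p_value, power)


c = c_constant(0.01, 0.85)
d = standardized_difference(0.40, 0.20)
n_from_d = 2 / d**2 * c                                   # standardized-difference route
n_from_formula = n_per_group_proportions(0.40, 0.20, 0.01, 0.85)
print(f"C = {c:.2f}, d = {d:.3f}")
print(f"Per group: {n_from_d:.0f} via d, {n_from_formula:.0f} via the formula")
```

The two routes differ slightly because the standardized difference uses the mean proportion while the formula uses each group's own variance. Both land close to the slide's 135 and 130 per group.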

Calculating power afterwards
[Figure: bar chart of 1-year survival (%): hypothermia 20 vs normothermia 12]
Standardized difference = $(0.20 - 0.12)/\sqrt{0.16(1 - 0.16)} = 0.08/\sqrt{0.16 \times 0.84} = 0.218$
Power < 25% to detect a significant difference
1100 patients needed
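Both numbers on this slide can be approximated from the standardized difference. The sketch below uses a normal approximation and assumes 130 patients per group (N = 260 split equally), a two-sided p-value of 0.01 and a target power of 85%. Those settings are carried over from the sample size slides and are not stated here.

```python
from math import ceil, sqrt
from statistics import NormalDist

ND = NormalDist()


def standardized_difference(p1, p2):
    p_mean = (p1 + p2) / 2
    return (p1 - p2) / sqrt(p_mean * (1 - p_mean))


def power_two_groups(d, n_per_group, p_value):
    """Approximate power of a two-sided test for standardized difference d."""
    z_crit = ND.inv_cdf(1 - p_value / 2)
    return ND.cdf(d * sqrt(n_per_group / 2) - z_crit)


def total_patients_needed(d, p_value, power):
    """Total over both groups, from N per group = 2 / d^2 * C(p, power)."""
    c = (ND.inv_cdf(1 - p_value / 2) + ND.inv_cdf(power)) ** 2
    return 2 * ceil(2 / d**2 * c)


d = standardized_difference(0.20, 0.12)
achieved = power_two_groups(d, n_per_group=130, p_value=0.01)   # assumed 130 per group
needed = total_patients_needed(d, p_value=0.01, power=0.85)
print(f"d = {d:.3f}, power = {achieved:.0%}, patients needed = {needed}")
```

Under these assumptions the trial had roughly a one in five chance of detecting the observed 8-point difference, and about 1100 patients would have been needed to detect it reliably.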