Biostatistics 102: Quantitative Data Parametric & Non-parametric Tests

Singapore Med J 2003 Vol 44(8) : 391-396
Basic Statistics For Doctors
Y H Chan

Clinical Trials and Epidemiology Research Unit, 226 Outram Road, Blk A #02-02, Singapore 169039
Y H Chan, PhD, Head of Biostatistics
Correspondence to: Y H Chan. Tel: (65) 6317 2121. Fax: (65) 6317 2122. Email: chanyh@cteru.com.sg

In this article, we discuss the statistical tests available to analyse continuous outcome variables. The parametric tests are applied when the normality (and homogeneity of variance) assumptions are satisfied; otherwise the equivalent non-parametric test is used (see Table I).

Table I. Parametric vs Non-Parametric tests.

Parametric         Non-Parametric
1 Sample T-test    Sign Test/Wilcoxon Signed Rank test
Paired T-test      Sign Test/Wilcoxon Signed Rank test
2 Sample T-test    Mann Whitney U test/Wilcoxon Rank Sum test
ANOVA              Kruskal Wallis test

We shall look at various examples to understand when each test is used.

1 SAMPLE T-TEST
The 1-Sample T test procedure determines whether the mean of a single variable differs from a specified constant. For example, we are interested to find out whether subjects with acute chest pain have abnormal systolic (normal = 120 mmHg) and/or diastolic (normal = 80 mmHg) blood pressures. 500 subjects presenting themselves to an emergency physician were enrolled.

Assumption for the 1 Sample T test: Data are normally distributed.

We discussed in the last article (1) how to check the normality assumption of quantitative data. One issue highlighted was that the formal normality tests are very sensitive to the sample size of the variable concerned. As seen here, Table II shows that the normality assumptions for both the systolic and diastolic blood pressures are formally violated but, based on their histograms (see Figure 1), the normality assumptions are feasible.

Figure 1. Histograms of Systolic & Diastolic blood pressures. [Two histograms: frequency against systolic blood pressure (axis 80 to 200) and frequency against diastolic blood pressure (axis 50 to 110).]

Table II. Formal normality tests.

Tests of Normality
                            Kolmogorov-Smirnov(a)       Shapiro-Wilk
                            Statistic   df    Sig.      Statistic   df    Sig.
systolic blood pressure     .049        500   .006      .990        500   .002
diastolic blood pressure    .042        500   .032      .992        500   .011
a Lilliefors Significance Correction.

So with the normality assumptions satisfied, we could use the 1 Sample T-test to check whether the systolic and diastolic blood pressures for these subjects are statistically different from the norms of 120 mmHg and 80 mmHg respectively. Firstly, simple descriptive statistics would give us some idea; see Table III.

Table III. Descriptive statistics for the systolic & diastolic BP.

                            Mean     Std Deviation   Minimum   Maximum   Median
systolic blood pressure     140.51   24.08           79.00     200.00    139.00
diastolic blood pressure    78.65    12.91           48.00     110.00    79.00
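The 1 Sample T statistic can be approximated by hand from the descriptive statistics in Table III. The sketch below is only an illustration, not part of the SPSS workflow: the function name `one_sample_t` is hypothetical, the inputs are the rounded means and standard deviations from Table III, and the confidence interval uses a normal quantile in place of the t quantile (an assumption that is reasonable only because n = 500 is large). The results should therefore agree with Tables IV & V only approximately.

```python
import math
from statistics import NormalDist


def one_sample_t(mean, sd, n, test_value, conf=0.95):
    """Return (t, df, mean difference, approximate CI) from summary statistics."""
    se = sd / math.sqrt(n)          # standard error of the mean
    diff = mean - test_value        # observed mean minus the norm
    t = diff / se
    # Assumption: large-sample shortcut, normal quantile instead of t quantile.
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return t, n - 1, diff, (diff - z * se, diff + z * se)


# Rounded summary statistics taken from Table III (n = 500 subjects).
systolic = one_sample_t(140.51, 24.08, 500, 120)
diastolic = one_sample_t(78.65, 12.91, 500, 80)

for name, (t, df, diff, ci) in (("systolic", systolic), ("diastolic", diastolic)):
    print(f"{name}: t = {t:.3f}, df = {df}, difference = {diff:.2f}, "
          f"approx. 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

With these rounded inputs the systolic t statistic comes out at about 19.05 and the diastolic one at about -2.34, close to the SPSS values; the small gaps come from rounding in Table III.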

To perform the 1 Sample T-test in SPSS, use Analyze, Compare Means, One-Sample T test. For systolic BP, put test value = 120 and for diastolic BP put test value = 80 (we have to do each test separately). Tables IV & V show the SPSS output.

Table IV. 1 Sample T-test for systolic BP testing at 120 mmHg.

One-Sample Test (Test Value = 120)
                            t        df    Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
systolic blood pressure     19.046   499   .000              20.5080           18.3925        22.6235

Table V. 1 Sample T-test for diastolic BP testing at 80 mmHg.

One-Sample Test (Test Value = 80)
                            t        df    Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
diastolic blood pressure    -2.345   499   .019              -1.3540           -2.4886        -0.2194

These subjects had much higher systolic BP (p<0.001, difference = 20.5, 95% CI 18.4 to 22.6 mmHg) compared to the norm of 120 mmHg. This difference is clinically relevant too. For the diastolic BP, though it was statistically significantly lower than the norm of 80 mmHg by 1.35 mmHg (95% CI 0.22 to 2.5, p = 0.019), this difference may not be of clinical significance. By now, we should realize that the p-value is significantly affected by sample size (2), thus we should look at the clinical significance first and then the statistical significance.

If the normality assumptions were not satisfied, then the equivalent non-parametric Sign test or Wilcoxon Signed Rank test would be used. In SPSS, before we can perform the non-parametric analysis, we have to create a new variable in the dataset, say, sysnorm (which is just a column of 120). Use the Transform, Compute command to do this (likewise, we have to create a new variable, say, dianorm, which is just a column of 80). Then go to Analyze, Non Parametric tests, 2 related samples to do the tests (we can do both tests for systolic and diastolic simultaneously; Tables VI & VII show the SPSS outputs). In this case, we are analyzing the medians of the variables rather than their means.

Table VI. Wilcoxon Signed Rank tests.

Test Statistics(c)
                          SYSTOLIC -                 DIASTOLI -
                          systolic blood pressure    diastolic blood pressure
z                         -14.965                    -2.474
Asymp. Sig. (2-tailed)    .000                       .013
c Wilcoxon Signed Ranks test.

Table VII. Sign test.
Test Statistics(a)
                          SYSTOLIC -                 DIASTOLI -
                          systolic blood pressure    diastolic blood pressure
z                         -13.169                    -2.343
Asymp. Sig. (2-tailed)    .000                       .019
a Sign Test.

In the Sign test, the magnitude of the differences between the variable and the norm is not taken into consideration when deriving the significance. It uses only the numbers of positive and negative differences. Thus if there were nearly equal numbers of positives and negatives, no statistical significance will be found regardless of the magnitude of the positives/negatives. The Wilcoxon Signed Rank test, on the other hand, uses the magnitude of the positives/negatives as ranks in the calculation of the significance, and is thus a more sensitive test.

PAIRED T-TEST
When the interest is in the before and after responses of an outcome (within group comparison), say, the systolic BP before and after an intervention, the paired T-test would be applied. Table VIII shows the descriptive statistics for the before and after intervention systolic BPs of 167 subjects.

Table VIII. Descriptive statistics for the before & after intervention systolic BP.

                      Mean     Std Deviation   Minimum   Maximum   Median
Systolic BP before    142.31   22.38           90.00     200.00    139.00
Systolic BP after     137.14   24.87           90.00     199.00    137.00

Assumption for the Paired T test: The difference between the before & after is normally distributed.

We have to compute a new variable for the difference between the before & after and then check its normality assumption. Table IX shows the formal tests for checking the normality assumption and Figure 2 shows the corresponding histogram.

Table IX. Normality assumption checks.

                      Kolmogorov-Smirnov          Shapiro-Wilk
                      Statistic   df    Sig.      Statistic   df    Sig.
Systolic BP before    .048        167   .200      .991        167   .388

Figure 2. Histogram of the difference between the Before & After intervention. [Histogram: frequency against before - after, axis from -70 to 100.]

Since the normality assumption is satisfied, we can use the paired T-test to perform the analysis. In SPSS, use Analyze, Compare Means, Paired Samples T test. Table X shows the SPSS output for the paired T-test.

Table X. Paired T-test for the Before & After intervention.

Paired Differences
                          Mean   Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t       df    Sig. (2-tailed)
Pair 1  before - after    5.17   34.11            2.64              -.04           10.38          1.958   166   .052

Clinically there was a mean reduction of 5.17 mmHg but this was not statistically significant (p = 0.052). Should we then increase the sample size to chase after the p-value? We shall discuss this issue at the end of this article.

Alternatively, we can use the 1-Sample T test (with test value = 0) on the difference between the Before & After to check whether there was statistical significance; see Table XI.

Table XI. 1 Sample T test for the difference between the Before & After intervention.

One-Sample Test (Test Value = 0)
                  t       df    Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
before - after    1.958   166   .052              5.1677            -.0442         10.3795

In the event that the normality assumption is not satisfied, we will use the Wilcoxon Signed Rank test to perform the comparison on the medians. In SPSS, use Analyze, Non Parametric tests, 2 related samples; Table XII shows the SPSS output.

Table XII. Wilcoxon Signed Rank test on the difference between the Before & After intervention.

Test Statistics(b)
                          after - before
z                         -1.803(a)
Asymp. Sig. (2-tailed)    .071
a Based on positive ranks.
b Wilcoxon Signed Ranks Test.

2 SAMPLE T-TEST
When our interest is the Between-Group comparison, the 2 Sample T test would be applied. For example, we want to compare the systolic BP between the normal weight and the over-weight (a proper power analysis should be done before embarking on the study (2)). 250 subjects for each group were recruited. Table XIII gives the descriptive statistics.

Table XIII. Descriptive statistics of Systolic BP by group.

                 Mean     Std Deviation   Minimum   Maximum   Median
over-weight      141.65   23.06           90.00     200.00    138.00
normal-weight    97.12    10.82           80.00     132.00    100.00

Assumptions of the 2 Sample T test:
1. Observations are normally distributed in each population.
2. Homogeneity of variance (the population variances are equal).
3. The 2 groups are independent random samples.

The 3rd assumption is easily checked from the design of the experiment: each subject can only be in one of the groups or interventions. The 1st assumption of normality is also easily checked by using the Explore option in SPSS (with group declared in the Factor list, this will produce normality checks for each group separately). Normality assumptions must be satisfied for both groups for the 2 Sample T test to be applied. Lastly, the check of the 2nd assumption of homogeneity of variance is given in the 2 Sample T test analysis.
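To see what the unequal-variance form of the 2 Sample T test does with the group summaries in Table XIII, the calculation can be sketched by hand. This is a minimal illustration under stated assumptions: the function name `welch_t` is hypothetical, the Welch-Satterthwaite formula is used for the degrees of freedom (which is, to our understanding, what the "equal variances not assumed" row of the SPSS output reports), and the inputs are the rounded values from Table XIII, so agreement with Table XIV is only approximate.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance (Welch) t statistic, Welch-Satterthwaite df and SE."""
    v1 = sd1 ** 2 / n1              # squared standard error, group 1
    v2 = sd2 ** 2 / n2              # squared standard error, group 2
    se = math.sqrt(v1 + v2)         # standard error of the mean difference
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Rounded summary statistics from Table XIII (250 subjects per group).
t, df, se = welch_t(141.65, 23.06, 250, 97.12, 10.82, 250)

# A crude look at homogeneity of variance: ratio of the two sample variances.
variance_ratio = (23.06 / 10.82) ** 2

print(f"t = {t:.3f}, df = {df:.1f}, std. error = {se:.4f}")
print(f"variance ratio (over-weight / normal-weight) = {variance_ratio:.2f}")
```

The variance ratio of roughly 4.5 already hints that the equal-variance assumption is doubtful for these two groups, and the reduced degrees of freedom (about 354 instead of 498) show the price paid for not assuming it.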

Table XIV. 2 Sample T test.

Independent Samples Test
Levene's Test for Equality of Variances (F, Sig.); t-test for Equality of Means (remaining columns)
                               F         Sig.   t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed        131.183   .000   27.638   498       .000              44.5280           1.61111                 41.36258       47.69342
Equal variances not assumed                     27.638   353.465   .000              44.5280           1.61111                 41.35943       47.69657

To perform the 2 Sample T test in SPSS, use Analyze, Compare Means, Independent Samples T-test. Table XIV shows the SPSS output.

Levene's Test for equality of variances checks the 2nd assumption. The Null hypothesis is: Equal Variances assumed. The Sig value (given in the 3rd column) shows that the Null hypothesis of equal variances was rejected, and SPSS adjusts the results for us. In this case we have to read off the p-value (Sig 2-tailed) from the 2nd line (equal variances not assumed) rather than from the 1st line (equal variances assumed).

As expected, there was a significant difference in the systolic BP between the over-weight and normal (p<0.001, difference = 44.53, 95% CI 41.36 to 47.69 mmHg).

When normality assumptions are not satisfied for any one or both of the groups, the equivalent non-parametric Mann Whitney U/Wilcoxon Rank Sum tests should be applied. In SPSS, use Analyze, Non Parametric tests, 2 Independent Samples. Table XV shows the results for the non-parametric test.

Table XV. Mann Whitney U & Wilcoxon Rank Sum tests.

Test Statistics(a)
                          BPSYS
Mann-Whitney U            1664.000
Wilcoxon W                33039.000
z                         -18.454
Asymp. Sig. (2-tailed)    .000
a Grouping Variable: TRT.

Observe that only 1 p-value is given for both the Mann Whitney U and Wilcoxon Rank Sum tests.

ANOVA (ONE-WAY ANALYSIS OF VARIANCE)
The ANOVA is just an extension of the 2-Sample T test when there are more than 2 groups to be compared. The 3 assumptions for the 2-Sample T test also apply for the ANOVA. Let's say, this time we have 3 weight groups (normal, under and over weight); the descriptive statistics are given in Table XVI.

Table XVI. Descriptive statistics of Systolic BP by weight groups.

                 Mean     Std Deviation   Minimum   Maximum   Median
over-weight      140.89   24.65           90.00     195.00    137.50
under-weight     104.72   21.58           80.00     186.00    100.00
normal-weight    112.14   26.79           80.00     194.00    100.00

After checking for the normality assumptions, to perform an ANOVA in SPSS, use Analyze, Compare Means, One-Way ANOVA. Click on Options and tick the Homogeneity of Variance test. Tables XVII & XVIII show the results for the homogeneity of variance and ANOVA tests respectively.

Table XVII. Homogeneity of Variance test.

Test of Homogeneity of Variances
Levene statistic   df1   df2   Sig.
4.249              2     296   .090

The Null hypothesis is: Equal Variances assumed. Since p = 0.09 > 0.05, we cannot reject the null hypothesis of equal variance.

Table XVIII. ANOVA results.

ANOVA
                  Sum of Squares   df    Mean Square   F        Sig.
Between Groups    72943.542        2     36471.771     61.126   .000
Within Groups     176613.970       296   596.669
Total             249557.512       298

The Null Hypothesis: All the group means are equal. Since p<0.001, not all the group means are equal. We would want to carry out a post-hoc test to determine where the differences were. In SPSS, under the ANOVA, click on the Post Hoc button and tick Bonferroni (3) (this method is most commonly used and rather conservative in testing for multiple comparisons). Table XIX shows the post-hoc multiple comparisons using Bonferroni adjustments.

Table XIX. ANOVA Bonferroni adjustment for multiple comparisons.

Multiple Comparisons
Dependent variable: systolic BP
Bonferroni
(I) GROUP        (J) GROUP        Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
over-weight      under-weight     36.1700*                3.45447      .000   27.8528              44.4872
                 normal-weight    28.7486*                3.46318      .000   20.4104              37.0868
under-weight     over-weight      -36.1700*               3.45447      .000   -44.4872             -27.8528
                 normal-weight    -7.4214                 3.46318      .099   -15.7596             .9168
normal-weight    over-weight      -28.7486*               3.46318      .000   -37.0868             -20.4104
                 under-weight     7.4214                  3.46318      .099   -.9168               15.7596
* The mean difference is significant at the .05 level.

The systolic BP of the over-weight group was statistically (and clinically) higher than that of the other 2 weight groups but there was no statistical difference between the normal and under weights (p = 0.099).

If we had carried out multiple 2 Sample T tests on our own, we would have to adjust the type 1 error manually. By Bonferroni, we have to multiply the p-value obtained by the number of comparisons performed. For 3 groups, there will be 3 comparisons (i.e. A vs B, B vs C & A vs C). Table XX shows the 2 Sample T-test between the normal and under weights. It seems that there is also a statistical difference in systolic BP between the 2 groups but, taking into account multiple comparisons and adjusting for type 1 error, we have to multiply the p-value (= 0.033) by 3, which gives the same result as in the ANOVA post-hoc (0.033 x 3 = 0.099).

Table XX. 2 Sample T test for Normal vs Under weight.

Independent Samples Test
Levene's Test for Equality of Variances (F, Sig.); t-test for Equality of Means (remaining columns)
                               F       Sig.   t        df       Sig. (2-tailed)
Equal variances assumed        7.470   .007   -2.153   197      .033
Equal variances not assumed                   -2.151   187.71   .033

When normality and homogeneity of variance assumptions are not satisfied, the equivalent non-parametric Kruskal Wallis test will be applied. In SPSS, use Analyze, Non Parametric tests, k Independent Samples. Table XXI shows the SPSS results.

Table XXI.
Kruskal Wallis test on systolic BP for the 3 groups.

Test Statistics(a,b)
Chi-Square     101.083
df             2
Asymp. Sig.    .000
a Kruskal Wallis Test.
b Grouping Variable: GROUP.

There was a statistically significant difference amongst the groups. In Kruskal Wallis, there is no post-hoc option available; we have to adjust for the type 1 error manually for multiple comparisons.

TYPE 1 ERROR ADJUSTMENTS
A type 1 error is committed when we reject the Null Hypothesis of no difference when it is in fact true. If we take the conventional level of statistical significance at 5%, it means that there is a 0.05 (5%) probability that a result as extreme as the critical value could occur just by chance, i.e. the probability of a false positive is 0.05. There are a few scenarios when adjustments for type 1 error are required:

Multiple comparisons
When we are comparing between 2 treatments A & B with a 5% significance level, the chance of a true negative in this test is 0.95. But when we perform A vs B and A vs C (in a three treatment study), then the probability that neither test will give a significant result when there is no real difference is 0.95 x 0.95 = 0.90, which means the type 1 error has increased to 10%.
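The arithmetic behind this inflation, and the manual Bonferroni adjustment described earlier, can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the family-wise calculation assumes the comparisons are independent (as the 0.95 x 0.95 reasoning above does), and capping the adjusted p-value at 1 is a common convention that we add here rather than something stated in the text.

```python
def pairwise_comparisons(groups):
    """Number of pairwise comparisons for an n-group study: n(n-1)/2."""
    return groups * (groups - 1) // 2


def familywise_error(comparisons, alpha=0.05):
    """Probability of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni(p_value, comparisons):
    """Bonferroni-adjusted p-value (capped at 1 by convention)."""
    return min(1.0, p_value * comparisons)


# Inflation of the type 1 error as the number of comparisons grows.
for k in range(1, 11):
    print(f"{k:2d} comparisons: {familywise_error(k):.1%}")

# 3 groups give 3 pairwise comparisons; 4 groups give 6.
print(pairwise_comparisons(3), pairwise_comparisons(4))

# Normal vs under weight: unadjusted p = 0.033, adjusted for 3 comparisons.
print(round(bonferroni(0.033, 3), 3))
```

Printed to one decimal place, the exact values make the rounding visible: two comparisons give 9.75% (quoted as 10% above) and three give about 14.3%.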

Table XXII shows the probability of getting a false positive when repeated comparisons at the 5% level of significance are performed. Thus for the 3 pairwise comparisons in a 3-treatment group study (generally, the number of pairwise comparisons for an n-group study is given by n(n-1)/2), without performing type 1 error adjustment, the probability of a false positive is 14%.

Table XXII.

Number of comparisons            1    2     3     4     5     6     7     8     9     10
Probability of false positive    5%   10%   14%   19%   23%   27%   30%   34%   37%   40%

As mentioned in ANOVA, Bonferroni adjustments (multiplying the p-value obtained in each multiple testing by the number of comparisons) would be the most convenient and conservative test. But this test has low power (the ability to detect an existing significant difference) when the number of comparisons is large. For example, with 4 treatment groups we will have 6 comparisons, which means that every pair-wise p-value obtained has to be multiplied by 6. In such a situation, other multiple comparison techniques like Tukey or Scheffe would be appropriate. Miller (1981) (4) gave a comprehensive review of the pros and cons of the various methods available for multiple comparisons. In ANOVA, this multiple comparison is automatically handled by the post-hoc option but for the Kruskal Wallis test, manual adjustments need to be carried out by the user, which means that the Bonferroni method would normally be used because of its simplicity.

Chasing after the p-value
In the example of the Paired T-test, the before & after treatment analysis gave a p-value of 0.052 with n = 167. Perhaps this p-value will be significant if we increase the sample size. If the sample size was indeed increased, then the obtained p-value will have to be multiplied by 2! The reason is that we are already biased by the positive trend of the findings and the type 1 error needs to be controlled.

Interim analysis
Normally in large sample size clinical trials, interim analyses are carried out at certain time points to assess the efficacy of the active treatment over the control.
This is usually carried out on the ethical basis that perhaps the active treatment is really superior by a larger effect difference than expected (thus a smaller sample size would be sufficient to detect statistical significance) and we do not want to put further subjects on the control arm. These planned interim analyses, with documentation of how the type 1 error will be adjusted for multiple comparisons, must be specifically written up in the protocol.

CONCLUSIONS
The concentration of the above discussions has been on the application of the relevant tests for different types of designs. The theoretical aspects of the various statistical techniques can be easily referenced from any statistical book. In the next article (Biostatistics 103: Qualitative Data Test of Independence), we will discuss the techniques available to analyse categorical variables.

REFERENCES
1. Chan YH. Biostatistics 101: Data presentation. Singapore Medical Journal 2003; Vol 44(6):280-5.
2. Chan YH. Randomised Controlled Trials (RCTs) sample size: the magic number? Singapore Medical Journal 2003; Vol 44(4):172-4.
3. Bland JM, Altman DG. Multiple significance tests: the Bonferroni method. British Medical Journal 1995; 310:170.
4. Miller RG Jr. Simultaneous Statistical Inference, 2nd ed. New York: Springer-Verlag, 1981.