Data Set 6: Comparison of arm size

Background

In this handout I use the data provided by Biometry students in 2006 on the circumferences of their dominant and other arms. I had hypothesized that most people's dominant arm would be slightly larger than their other arm, and this handout tests this not-so-serious hypothesis. I also had expected that the difference between dominant and non-dominant arms would be very small compared to the variation among people in arm size. The main purpose of this study was to determine whether a paired-sample design would provide enough power to detect a difference using a fairly small sample. This handout thus illustrates the analysis of paired samples, and the effectiveness of pairing in eliminating the effect of the great variability among individuals.

The question: Is a person's dominant arm larger than their other arm?

The data

Thirty students turned in arm measurements and indicated which arm is their dominant one. Measurements are the circumference of the upper arm at its largest part, in mm.

[Table: student, dominant, other (three column groups side by side)]

Data exploration

Arms separately

[Figure: histograms of dominant arm and other arm circumference; vertical axis: Frequency]

[Table: Mean, StDev, Minimum, Q1, Median, Q3, Maximum for dominant arm and other arm]

Comparison of the means or medians shows that the dominant arms are indeed slightly larger than the other arms, but that the difference (6.5 or 3 mm, respectively) is much smaller than the variability in size of either arm. The distributions are asymmetric, but fairly similar to each other.

Paired values

Not surprisingly, the sizes of a person's two arms are quite similar, so that the two variables are highly correlated, as shown by a scatterplot. It also is evident that most of the points are to the right of the line indicating equality: most dominant arms are larger, or at least not smaller, than the corresponding other arm.

[Figure: scatterplot of other arm against dominant arm, with the line indicating equality]

Data Set 6: Comparison of arm size (rev. October 16, 2009)

In a paired design the proper analysis is of the within-pair differences. These are shown in the following plots. The first plots the two arm measures by person, and shows the very small but fairly consistent difference between arms as well as the large differences between people.

[Figure: circumference (mm) by person, for dominant arm and other arm]

Differences

Examining the distribution of the differences, the first thing we can notice is that most of the differences are positive: at least three-quarters are 0 or positive (the lower quartile is 0). The distribution is roughly symmetric and bell shaped, with a mild left (negative) outlier and a substantial right (large positive) outlier.

[Figure: two plots of difference: dominant - other, the first a histogram (vertical axis: Frequency)]

The normal quantile-quantile plot of the differences (below) indicates that the distribution is close to normal. The principal exception is the outlier at the right end. There also is a suggestion of a longer-than-normal tail on the left side. There also is some granularity, resulting from a tendency for the values to have been recorded with only 5 mm precision.

[Figure: Probability Plot of difference; horizontal axis: difference, vertical axis: Percent]

[Table: Mean, StDev, Minimum, Q1, Median, Q3, Maximum for difference]

These statistics quantify what is seen in the plots: most differences are 0 or larger, the typical difference is about 6 mm, the variability is considerably larger than this (s = 9.47, IQR = 10), and the distribution is roughly symmetric (shown by the similarity of the mean and median, as well as by the symmetry of Q1 and Q3 around the median).

Inference

The purpose of the study was to test whether the dominant arms are larger than the other arms, so one-sided paired-sample tests are appropriate. The difference will also be estimated.

Scope of inference

Clearly the 30 students who submitted data are not a random sample of any population. They probably are, though, reasonably typical, in terms of arm circumferences, of healthy (mostly) young adults. I expect the conclusion drawn from these data is true for most people in that loosely defined population, though there could be ethnic, occupational, or other differences which are not represented in the class. In addition it is quite possible that the between-arm difference might not be the same, or might not even be present, in populations differing in health, nutritional status, age, or various other factors.

t procedures

The standard t test and confidence interval are based on these statistics for the sample of differences:

[Table: N, Mean, SE Mean, StDev for difference]

From these, the result for the one-sided test of the null hypothesis that the mean difference is 0 (i.e. $H_0: \mu_D = 0$ vs. $H_a: \mu_D > 0$) is

T-Value = 3.79   P-Value < 0.001

Assuming for the moment that this test is valid, it indicates that it would be very unusual to get a mean difference, or a t statistic, as large as was observed if the true difference were 0: there is fairly strong evidence that dominant arms are larger than non-dominant arms.

To quantify how much larger, a one-sided confidence interval (that is, a lower bound) can be calculated, as $\bar{x} - t_{\alpha}\,(\hat{\sigma}_D / \sqrt{n})$, where $\hat{\sigma}_D$ is the sample standard deviation of the differences. (The differences from a standard CI are that the t critical value is for α rather than α/2, and only the lower side is calculated.) The result is a 95% lower bound for the mean difference, which would be expressed as "we are 95% confident that the mean difference is at least 3.6 mm."

Nonparametric procedures

Sign procedures

Minitab's output for the one-sided sign test of $H_0: M_D = 0$ vs. $H_a: M_D > 0$ is

Sign test of median = 0 versus > 0

[Table: N, Below, Equal, Above, P, Median for difference: dominant - other]

In a situation such as this, with a one-sided alternative hypothesis and several of the observations equalling 0, it would be better to count those 0s as supporting $H_0$ rather than following the usual procedure of excluding them, as Minitab does. Defined this way, the P-value would be $P(X \ge 22)$ where X has a Binomial (n = 30, p = 0.5) distribution. This gives a much more conservative result: P = 0.0081.

Minitab's sign procedure does not explicitly produce one-sided confidence bounds, but one can be obtained by requesting a CI with twice the α desired for the bound, and then reporting only the appropriate end of the interval, in this case the lower end. (In calculating the CI, Minitab uses all observations, including 0s.)

[Table: Achieved Confidence, Confidence Interval (Lower, Upper), Position; includes an NLI row]

According to these procedures, there is strong evidence the true median of the differences is larger than 0, and indeed we are 95% confident it is at least 4 mm.

Signed-rank procedures

Minitab gives the following results:

[Table: N, N for Test, Wilcoxon Statistic, P, Estimated Median for difference; P < 0.001, Estimated Median 6.0]

The preceding test also excludes 0s. The procedure can be tricked into counting the 0s as support for $H_0$, however, by setting them equal to some negative value smaller in absolute value than any of the positive values. This gives the following:

[Table: N, N for Test, Wilcoxon Statistic, P, Estimated Median for shifted difference; P < 0.001, Estimated Median 6.0]

For this test the effect of excluding or including the 0s is much less than for the sign test. The 95% confidence lower bound for the true median is:

[Table: N, Estimated Median, Achieved Confidence, Confidence Interval (Lower, Upper) for difference]

These signed-rank results are quite similar to those of the t test above: the evidence against the null hypothesis is quite significant and the true median is estimated to be at least 4 mm.

Resampling procedures

Randomization tests

Following is the result for a test of $H_0: \mu_D = 0$ vs. $H_a: \mu_D > 0$:

Minitab onesampleran macro: P = 0.001

The resampled distribution for this procedure is shown below; the test statistic is the sum of the differences. The distribution is fairly close to normal, with slightly shorter tails than a normal distribution. The observed value of the statistic, shown by the vertical line, is in the far right tail of the distribution, resulting in the low P-value.
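As a rough check on the t and sign-test figures quoted in this section, the following sketch recomputes them from the summary values given elsewhere in this handout (n = 30, mean difference 6.55 mm, s = 9.47 mm). The critical value `T_CRIT` is an assumption taken from a standard t table (upper 5% point, 29 degrees of freedom), and the variable names are illustrative rather than anything Minitab produces.

```python
from math import comb, sqrt

# Summary statistics quoted in the handout for the within-pair differences.
n = 30          # number of students
mean_d = 6.55   # mean difference, mm
sd_d = 9.47     # standard deviation of the differences, mm

# Paired t statistic for H0: mu_D = 0.
se = sd_d / sqrt(n)
t_stat = mean_d / se

# One-sided 95% lower bound for the mean difference.
# T_CRIT is an assumed table value (t, upper 5% point, df = 29).
T_CRIT = 1.699
lower_bound = mean_d - T_CRIT * se

# Sign test with the zeros counted as support for H0:
# P(X >= 22) where X ~ Binomial(30, 0.5).
p_sign = sum(comb(n, k) for k in range(22, n + 1)) / 2 ** n

print(f"t = {t_stat:.2f}")
print(f"95% lower bound = {lower_bound:.2f} mm")
print(f"sign-test P = {p_sign:.4f}")
```

The three printed values should agree, to rounding, with the t statistic of 3.79, the lower bound of about 3.6 mm, and the conservative sign-test P-value of 0.0081 reported in the text.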

[Figure: Histogram of tssim (vertical axis: Frequency) and Probability Plot of tssim (Normal - 95% CI; vertical axis: Percent)]

Bootstrap confidence bounds

The Minitab bootstrap procedures do not explicitly provide one-sided confidence bounds, so 90% confidence intervals were requested, and the upper bounds will be ignored. The various forms of bounds for the true mean are as follows:

Bootstrap-t method: 3.91
Efron percentile method:
Hall percentile method: 3.5
BC percentile method:
BCA percentile method:

The bootstrap distribution is quite close to normal.

[Figure: Histogram of sim means (vertical axis: Frequency) and Probability Plot of sim means (Normal - 95% CI; vertical axis: Percent)]

There also is a Minitab macro for bootstrap confidence intervals for the population median. The results of this are as follows (again requesting 90% intervals and ignoring the upper bound):

Bootstrap-t method:
Efron percentile method: 4.0
Hall percentile method: 3.0
BC percentile method: 2.5
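The two resampling ideas used above can be sketched in a few lines. This is only an illustration under stated assumptions: the handout does not show the internals of the Minitab macros, so the randomization test is assumed here to flip the sign of each difference at random (the usual one-sample scheme, with the sum of the differences as the test statistic), and only the Efron percentile form of the bootstrap bound is shown. The function names and `example_diffs` are hypothetical, and `example_diffs` is made-up input, not the class data.

```python
import random
from statistics import mean

def randomization_p_value(diffs, n_resamples=9999, seed=1):
    """One-sided sign-flip randomization test of H0: mu_D = 0 vs Ha: mu_D > 0.

    Under H0 each within-pair difference is equally likely to be positive
    or negative, so signs are flipped at random and the sum recomputed.
    """
    rng = random.Random(seed)
    observed = sum(diffs)
    count = 0
    for _ in range(n_resamples):
        resampled = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if resampled >= observed:
            count += 1
    # Include the observed arrangement so the P-value is never exactly 0.
    return (count + 1) / (n_resamples + 1)

def percentile_lower_bound(diffs, conf=0.95, n_boot=5000, seed=1):
    """Efron percentile lower confidence bound for the mean difference."""
    rng = random.Random(seed)
    n = len(diffs)
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    # The (1 - conf) quantile of the bootstrap means is the lower bound.
    return boot_means[int((1 - conf) * n_boot)]

# Hypothetical differences in mm, for illustration only (NOT the class data).
example_diffs = [0, 5, 10, -5, 15, 5, 0, 10, 20, 5, -10, 5, 10, 0, 15]
p = randomization_p_value(example_diffs)
lb = percentile_lower_bound(example_diffs)
print(f"randomization P = {p:.4f}, 95% percentile lower bound = {lb:.2f} mm")
```

Taking the 5th percentile of the bootstrap means is equivalent to requesting a 90% interval and keeping only its lower end, which is the approach described in the text.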

Which method to use?

All the procedures (t, sign, signed-rank, and bootstrap/permutation) gave quite similar results: P-values of 0.001 or smaller, and 95% confidence lower bounds for the mean or median of about 3.6 to 4 mm. The distribution of the data looks somewhat non-normal, largely because of one unusually large value. The sample size is not small, however, and the various resampling distributions all are quite close to normal. Considering the similarity of the t and resampling results, and the near normality of the resampling distributions, I think the standard t results are valid for these analyses.

Effect of pairing

The power of the one-sided paired-t test with n = 30, α = 0.05, and assuming parameter values of $\sigma_D = 9.5$ and $H_a: \mu_D = 6.5$, similar to the observed statistics, is very high: 0.978.

The study could have been done using two independent samples of people, with dominant arms measured for one sample and non-dominant arms measured for the other sample. In this case the large variability among people would make the sampling distribution of the difference in sample means much wider than the sampling distribution of mean within-pair differences for the paired design. Specifically, the standard deviations of the dominant-arm circumferences and the other-arm circumferences were each about five times the standard deviation of the differences.

Assuming $\sigma_{dom} = \sigma_{non} = 48$, $H_a: \mu_{dom} - \mu_{non} = 6.5$, and with α = 0.05, a one-sided two-sample t-test would require sample sizes (for each sample) of 1461 to achieve the same power of 0.978 as the paired-t analysis had with only 30 observations! With the same assumptions and sample sizes of 30 (giving the same total number of arm measurements), the two-sample test would have power of only 0.13; to reach power of 0.978 with n = 30, the true difference in circumference means would have to be nearly 46 mm (equivalent to more than 1 cm in arm diameter)!

Conclusions

Is a person's dominant arm larger than their other arm?

Yes, at least for a poorly defined population of healthy young(ish) adults like those students who gave me data. For the majority of the people in the sample the dominant arm was larger, and the data provide strong evidence that the true mean difference is greater than 0: $\bar{x}$ = 6.55, t = 3.79, P = 0.001. We can be 95% confident the mean difference is at least 3.6 mm.

The paired sampling design increased the power of this analysis enormously, as a result of the small but fairly consistent difference between a given person's arms alongside the great variability among people in the size of their arms.
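The power comparison in the "Effect of pairing" section can be approximated with a short calculation. The sketch below uses a normal approximation instead of the t-based power calculation behind the handout's figures, so its results should be close to, but not exactly equal to, the quoted values. The parameter values are the ones assumed in the text; the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()      # standard normal distribution
alpha = 0.05
delta = 6.5           # assumed true mean difference, mm
sigma_d = 9.5         # assumed SD of the within-pair differences, mm
sigma_arm = 48.0      # assumed SD of arm circumference among people, mm
n = 30
z_alpha = Z.inv_cdf(1 - alpha)

# Paired design: the standard error is based on the SD of the differences.
power_paired = Z.cdf(delta / (sigma_d / sqrt(n)) - z_alpha)

# Two independent samples of n people each: the standard error is based
# on the much larger between-person SD.
se_two = sigma_arm * sqrt(2 / n)
power_two = Z.cdf(delta / se_two - z_alpha)

# Per-group sample size needed by the two-sample design for power 0.978.
z_beta = Z.inv_cdf(0.978)
n_needed = ceil(2 * ((z_alpha + z_beta) * sigma_arm / delta) ** 2)

# True difference detectable with power 0.978 using n = 30 per group.
delta_needed = (z_alpha + z_beta) * se_two

print(f"paired power      = {power_paired:.3f}")
print(f"two-sample power  = {power_two:.3f}")
print(f"n per group       = {n_needed}")
print(f"detectable diff   = {delta_needed:.1f} mm")
```

Because the normal approximation ignores the extra uncertainty from estimating the standard deviation, the paired power comes out slightly above 0.978 and the detectable difference slightly below 46 mm. The overall picture is unchanged: roughly 0.98 versus 0.13 power at n = 30, and well over a thousand people per group for the unpaired design.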