UCLA STAT 13
Introduction to Statistical Methods for the Life and Health Sciences
Instructor: Ivo Dinov, Asst. Prof. of Statistics and Neurology
Teaching Assistants: Jacquelina Dacosta & Chris Barr
University of California, Los Angeles, Fall 2006
http://www.stat.ucla.edu/~dinov/courses_students.html

Chapter 9: Paired Data

Comparison of Paired Samples
In chapter 7 we discussed how to compare two independent samples.
In chapter 9 we discuss how to compare two samples that are paired.
In other words, the two samples are not independent: $Y_1$ and $Y_2$ are linked in some way, usually by a direct relationship.
For example, measure the weight of subjects before and after a six-month diet.

Paired data
To study paired data we would like to examine the differences between the members of each pair: $d = Y_1 - Y_2$.
Each $(Y_1, Y_2)$ pair will have a difference calculated.
With the paired t test we concentrate our efforts on this difference data: we will be calculating the mean of the differences and the standard error of the differences.

The mean of the differences is calculated just like the one-sample mean we calculated in chapter 2:
$\bar{d} = \frac{\sum d}{n_d}$
It also happens to be equal to the difference in the sample means, which is similar to the independent t test:
$\bar{d} = \bar{y}_1 - \bar{y}_2$
This sample mean difference is an estimate of the population mean difference $\mu_d = \mu_1 - \mu_2$.

Because we are focusing on the differences, we can use the same reasoning as we did for a single sample in chapter 6 to calculate the standard error of $\bar{d}$, i.e. the standard deviation of the sampling distribution of $\bar{d}$.
Recall: $SE_{\bar{y}} = \frac{s}{\sqrt{n}}$
Using similar logic: $SE_{\bar{d}} = \frac{s_d}{\sqrt{n_d}}$
where $s_d$ is the standard deviation of the differences and $n_d$ is the sample size of the differences.
Paired data
Example: Suppose we measure the thickness of plaque (mm) in the carotid artery of 10 randomly selected patients with mild atherosclerotic disease. Two measurements are taken: thickness before treatment with Vitamin E (baseline) and after two years of taking Vitamin E daily.
What makes this paired data rather than independent data?
Why would we want to use pairing in this example?

Subject   Before   After    Difference
1         0.66     0.60      0.06
2         0.72     0.65      0.07
3         0.85     0.79      0.06
4         0.62     0.63     -0.01
5         0.59     0.54      0.05
6         0.63     0.55      0.08
7         0.64     0.62      0.02
8         0.70     0.67      0.03
9         0.73     0.68      0.05
10        0.68     0.64      0.04
mean      0.682    0.637     0.045
s         0.0742   0.0709    0.0264

Calculate the mean of the differences and the standard error for that estimate:
$\bar{d} = 0.045$, $s_d = 0.0264$
$SE_{\bar{d}} = \frac{s_d}{\sqrt{n_d}} = \frac{0.0264}{\sqrt{10}} = 0.00833$

Paired CI for $\mu_d$
A $100(1 - \alpha)\%$ confidence interval for $\mu_d$:
$\bar{d} \pm t(df)_{\alpha/2} \, (SE_{\bar{d}})$, where $df = n_d - 1$
This is very similar to the one-sample confidence interval we learned in section 6.3, but this time we are concentrating on a difference column rather than a single sample.

Example: Vitamin E (cont'd)
Calculate a 90% confidence interval for the true mean difference in plaque thickness before and after treatment with Vitamin E.
$\bar{d} \pm t(df)_{\alpha/2} \, (SE_{\bar{d}}) = 0.045 \pm t(9)_{0.05} \, (0.00833) = 0.045 \pm (1.833)(0.00833) = (0.0297, 0.0603)$
CONCLUSION: We are highly confident, at the 0.10 level, that the true mean difference in plaque thickness before and after treatment with Vitamin E is between 0.03 mm and 0.06 mm.
Great, what does this really mean?
Does the zero rule work on this one?
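The interval above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the data are the ten before/after plaque measurements from the example, and the critical value `t_crit = 1.833` is the tabled t(9) value quoted in the slides rather than something the script computes. The variable names are illustrative choices, not part of the course material.

```python
import math
import statistics

# Plaque thickness (mm) for the 10 patients in the Vitamin E example.
before = [0.66, 0.72, 0.85, 0.62, 0.59, 0.63, 0.64, 0.70, 0.73, 0.68]
after = [0.60, 0.65, 0.79, 0.63, 0.54, 0.55, 0.62, 0.67, 0.68, 0.64]

# Paired analysis works on the column of differences d = before - after.
diffs = [b - a for b, a in zip(before, after)]
n_d = len(diffs)

d_bar = statistics.mean(diffs)      # mean of the differences
s_d = statistics.stdev(diffs)       # sample SD of the differences
se_d = s_d / math.sqrt(n_d)         # standard error of d_bar

# Assumption: tabled critical value t(9) with 0.05 in the upper tail (90% CI).
t_crit = 1.833
ci_low = d_bar - t_crit * se_d
ci_high = d_bar + t_crit * se_d

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.4f}, SE = {se_d:.5f}")
print(f"90% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

With a statistics package available, the critical value could be looked up programmatically instead of being typed in from the table.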
Paired t test
Of course there is also a hypothesis test for paired data.
#1 Hypotheses:
$H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$ or $H_a: \mu_d < 0$ or $H_a: \mu_d > 0$
#2 Test statistic:
$t_s = \frac{\bar{d} - 0}{SE_{\bar{d}}}$, where $df = n_d - 1$
#3 p-value and #4 conclusion: similar idea to that of the independent t test.

Example: Vitamin E (cont'd)
Do the data provide enough evidence to indicate that there is a difference in plaque before and after treatment with vitamin E for two years? Test using $\alpha = 0.10$.
$H_0: \mu_d = 0$ (thickness of plaque is the same before and after treatment with Vitamin E)
$H_a: \mu_d \neq 0$ (thickness of plaque after treatment is different than before treatment with Vitamin E)
$t_s = \frac{0.045 - 0}{0.00833} = 5.40$
$df = 9$
$p < 2(0.0005) = 0.001$, so we reject $H_0$.

CONCLUSION: These data show that the true mean thickness of plaque after two years of treatment with Vitamin E is statistically significantly different than before the treatment (p < 0.001).
In other words, vitamin E appears to be effective in changing carotid artery plaque after treatment.
It may have been better to conduct this as an upper-tailed test, because we would hope that vitamin E will reduce clogging; however, researchers need to make this decision before analyzing data.

Paired T-Test and CI: Before, After
Paired T for Before - After
              N    Mean       StDev      SE Mean
Before       10    0.682000   0.074207   0.023466
After        10    0.637000   0.070875   0.022413
Difference   10    0.045000   0.026352   0.008333
90% CI for mean difference: (0.029724, 0.060276)
T-Test of mean difference = 0 (vs not = 0): T-Value = 5.40  P-Value = 0.000

[Figure: normal probability plot of the differences; Mean 0.045, StDev 0.02635, N 10]

Results of Ignoring Pairing
Suppose we accidentally analyze the groups independently (as in an independent t test) rather than with a paired test. Keep in mind this would be an incorrect way of analyzing the data.
How would this change our results?
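Example: Vitamin E (cont'd)
Calculate the test statistic and p-value as if this were an independent t test.
$SE_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{0.0742^2}{10} + \frac{0.0709^2}{10}} = 0.0325$
$t_s = \frac{\bar{y}_1 - \bar{y}_2}{SE_{\bar{y}_1 - \bar{y}_2}} = \frac{0.682 - 0.637}{0.0325} = 1.38$
$df = 17$
$2(0.05) < p < 2(0.1)$, i.e. $0.1 < p < 0.2$
Fail to reject $H_0$!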
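To see side by side how much the choice of analysis matters, the sketch below computes both standard errors and both test statistics from the same ten pairs. It is an illustration using only the Python standard library; the variable names are made up for this example, and the independent-samples SE uses the unpooled formula shown above.

```python
import math
import statistics

# Plaque thickness (mm) for the 10 patients in the Vitamin E example.
before = [0.66, 0.72, 0.85, 0.62, 0.59, 0.63, 0.64, 0.70, 0.73, 0.68]
after = [0.60, 0.65, 0.79, 0.63, 0.54, 0.55, 0.62, 0.67, 0.68, 0.64]
n = len(before)

mean_diff = statistics.mean(before) - statistics.mean(after)

# Correct (paired) analysis: SE is built from the SD of the differences.
diffs = [b - a for b, a in zip(before, after)]
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = mean_diff / se_paired

# Incorrect (independent) analysis: SE is built from the two sample SDs,
# so subject-to-subject variation inflates it.
s1 = statistics.stdev(before)
s2 = statistics.stdev(after)
se_indep = math.sqrt(s1 ** 2 / n + s2 ** 2 / n)
t_indep = mean_diff / se_indep

print(f"paired:      SE = {se_paired:.5f}, t = {t_paired:.2f}")
print(f"independent: SE = {se_indep:.5f}, t = {t_indep:.2f}")
```

The numerator is identical in both cases; only the standard error changes, which is why the paired statistic is so much larger here.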
Results of Ignoring Pairing
What happens to a CI?
Calculate a 90% confidence interval for $\mu_1 - \mu_2$:
$(\bar{y}_1 - \bar{y}_2) \pm t(df)_{\alpha/2} \, (SE_{\bar{y}_1 - \bar{y}_2}) = (0.682 - 0.637) \pm t(17)_{0.05} \, (0.0325) = 0.045 \pm (1.740)(0.0325) = (-0.0116, 0.1016)$
How does the significance of this interval compare to the paired 90% CI (0.03 mm to 0.06 mm)?
Why is this happening?
Is there anything better about the independent CI? Is it worth it in this situation?

Paired T-Test and CI: Before, After
Paired T for Before - After
              N    Mean       StDev      SE Mean
Before       10    0.682000   0.074207   0.023466
After        10    0.637000   0.070875   0.022413
Difference   10    0.045000   0.026352   0.008333
90% CI for mean difference: (0.029724, 0.060276)
T-Test of mean difference = 0 (vs not = 0): T-Value = 5.40  P-Value = 0.000

Two-Sample T-Test and CI: Before, After
Two-sample T for Before vs After
          N    Mean     StDev    SE Mean
Before   10    0.6820   0.0742   0.023
After    10    0.6370   0.0709   0.022
Difference = mu (Before) - mu (After)
Estimate for difference: 0.045000
90% CI for difference: (-0.011450, 0.101450)
T-Test of difference = 0 (vs not =): T-Value = 1.39  P-Value = 0.183  DF = 17

Results of Ignoring Pairing
Why would the SE be smaller for correctly paired data?
If we look within each sample at the data, we notice variation from one subject to the next.
This information gets incorporated into the SE for the independent t test via $s_1$ and $s_2$.
The original reason we paired was to try to control for some of this inter-subject variation.
This inter-subject variation has no influence on the SE for the paired test, because only the differences were used in the calculation.
The price of pairing is a smaller df. However, this can be compensated for by a smaller SE if we have paired correctly.

Conditions for the validity of the paired t test
Conditions we must meet for the paired t test to be valid:
It must be reasonable to regard the differences as a random sample from some large population.
The population distribution of the differences must be normally distributed. The methods are approximately valid if the population is approximately normal or the sample size $n_d$ is large.
These conditions are the same as the conditions we discussed in chapter 6.

Conditions for the validity of the paired t test
How can we check:
check the study design to assure that the differences are independent (i.e. no hierarchical structure within the d's)
create normal probability plots to check normality of the differences
NOTE: p.355 summary of formulas

The Paired Design
Ideally, in the paired design the members of a pair are relatively similar to each other.
Common Paired Designs:
Randomized block experiments with two units per block
Observational studies with individually matched controls
Repeated measurements on the same individual
Blocking by time, formed implicitly when replicate measurements are made at different times
IDEA of pairing: members of a pair are similar to each other with respect to extraneous variables.
The Paired Design
Example: Vitamin E (cont'd)
Measurements on the same individual made at different times, before and after treatment (comparing within a patient controls for patient-to-patient variation).
Example: Growing two types of bacteria cells in a petri dish, replicated on multiple different days. These are measurements on different bacteria at the same time (controls for time variation).

Purpose of Pairing
Pairing is used to reduce bias and increase precision.
By matching/blocking we can control variation due to extraneous variables. For example, if two groups are matched on age, then a comparison between the groups is free of any bias due to a difference in age distribution.
Pairing is a strategy of design, not analysis.
Pairing needs to be carried out before the data are observed.
It is not correct to use the observations to make pairs after the data have been collected.

Paired vs. Unpaired
If the observed variable Y is not related to the factors used in pairing, the paired analysis may not be effective. For example, suppose we wanted to match subjects on race/ethnicity and then compare how much ice cream men vs. women can consume in an hour.
The choice of pairing depends on practical considerations (feasibility, cost, etc.) and on precision considerations.
If the variability between subjects is large, then pairing is preferable.
If the experimental units are homogeneous, then use the independent t test.

The Sign Test
The sign test is a non-parametric version of the paired t test.
We use the sign test when pairing is appropriate, but we can't meet the normality assumption required for the t test.
The sign test is not very sophisticated and therefore quite easy to understand.
The sign test is also based on the differences $d = Y_1 - Y_2$.
The only information the sign test uses from this difference is the sign of d (+ or -).

#1 Hypotheses:
$H_0$: the distributions of the two groups are the same
$H_a$: the distributions of the two groups are different
or $H_a$: the distribution of group 1 is less than group 2
or $H_a$: the distribution of group 1 is greater than group 2

#2 Test statistic $B_s$, method:
1. Find the sign of the differences.
2. Calculate $N_+$ and $N_-$.
3. If $H_a$ is non-directional, $B_s$ is the larger of $N_+$ and $N_-$.
   If $H_a$ is directional, $B_s$ is the N that agrees with the direction of $H_a$:
   if $H_a: Y_1 < Y_2$ then we expect a larger $N_-$,
   if $H_a: Y_1 > Y_2$ then we expect a larger $N_+$.
NOTE: A difference of zero is not included in $N_+$ or $N_-$; therefore $n_d$ needs to be adjusted.
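The steps of the method can be sketched as a small function. This is an illustrative implementation, not the course's own code: the function name `sign_test`, its `alternative` argument, and the toy data are all hypothetical. The slides read the p-value from a table; here the p-value is computed instead from the exact Binomial($n_d$, 0.5) upper tail, doubled (and capped at 1) for a non-directional alternative, which is an assumption about how one would automate the lookup.

```python
from math import comb

def sign_test(y1, y2, alternative="two-sided"):
    """Sign test on paired samples.

    alternative: "two-sided", "greater" (expect Y1 > Y2), or "less".
    Returns (b_s, n_d, p_value).
    """
    # Step 1: differences; zero differences carry no sign and are dropped.
    diffs = [a - b for a, b in zip(y1, y2) if a != b]
    n_d = len(diffs)

    # Step 2: count positive and negative signs.
    n_plus = sum(d > 0 for d in diffs)
    n_minus = n_d - n_plus

    # Step 3: choose B_s according to the direction of H_a.
    if alternative == "greater":
        b_s = n_plus
    elif alternative == "less":
        b_s = n_minus
    else:
        b_s = max(n_plus, n_minus)

    # Exact upper-tail probability P(X >= b_s) for X ~ Binomial(n_d, 0.5).
    tail = sum(comb(n_d, k) for k in range(b_s, n_d + 1)) / 2 ** n_d
    p_value = tail if alternative != "two-sided" else min(1.0, 2 * tail)
    return b_s, n_d, p_value

# Hypothetical toy data: five pairs, one of them tied.
y1 = [5, 3, 8, 6, 7]
y2 = [4, 3, 6, 7, 5]
b_s, n_d, p = sign_test(y1, y2)
print(b_s, n_d, p)
```

In the toy data the tied pair is dropped, leaving four informative pairs with three positive signs.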
#3 p-value:
Table 7, p.684
Similar to the WMW
Use $n_d$, the number of pairs with quality information
http://www.socr.ucla.edu/htmls/socr_analyses.html
#4 Conclusion:
Similar to the Wilcoxon-Mann-Whitney test
Do NOT mention any parameters!

Example: 12 sets of identical twins are given psychological tests to determine whether the first born of the set tends to be more aggressive than the second born. Each twin is scored according to aggressiveness; a higher score indicates greater aggressiveness.
Because of the natural pairing in a set of twins, these data can be considered paired.

Set   1st born   2nd born   Sign of d
1     86         88         -
2     71         77         -
3     77         76         +
4     68         64         +
5     91         96         -
6     72         72         Drop
7     77         65         +
8     91         90         +
9     70         65         +
10    71         80         -
11    88         81         +
12    87         72         +

Do the data provide sufficient evidence to indicate that the first born of a set of twins is more aggressive than the second? Test using $\alpha = 0.05$.
$H_0$: The aggressiveness is the same for 1st born and 2nd born twins.
$H_a$: The aggressiveness of the 1st born twin tends to be more than that of the 2nd born.
NOTE: Directional $H_a$ (we're expecting higher scores for the 1st born twin); this means we predict that most of the differences will be positive.
$N_+$ = number of positive differences = 7
$N_-$ = number of negative differences = 4
$n_d$ = number of pairs with useful info = 11
$B_s = N_+ = 7$ (because of the directional alternative)
P > 0.10, fail to reject $H_0$.
CONCLUSION: These data show that the aggressiveness of 1st born twins is not significantly greater than that of the 2nd born twins (P > 0.10).
$X \sim B(11, 0.5)$, $P(X \geq 7) = 0.2744140625$
http://socr.stat.ucla.edu/htmls/socr_distributions.html (Binomial Distribution)
http://socr.stat.ucla.edu/applets.dir/normal_t_chi2_f_tables.htm

Hold on, did we actually need to carry out a sign test? What should we have checked first?
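The exact binomial tail quoted for the twins example can be checked directly. The sketch below assumes only the counts stated in the example (7 positive signs among 11 informative pairs) and uses the standard library's `math.comb`; it is a check of the arithmetic, not a replacement for the table lookup used in the course.

```python
from math import comb

n_d = 11   # pairs with a nonzero difference
b_s = 7    # number of positive differences (directional H_a)

# P(X >= b_s) for X ~ Binomial(n_d, 0.5): count the favorable sign patterns
# and divide by the 2**n_d equally likely patterns under H_0.
favorable = sum(comb(n_d, k) for k in range(b_s, n_d + 1))
p_upper = favorable / 2 ** n_d

print(favorable, 2 ** n_d, p_upper)
```

Because the tail probability is well above 0.05, the exact calculation agrees with the decision not to reject $H_0$.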
[Figure: normal probability plot of the differences (diff); Mean 1.917, StDev 7.154, N 12]

Practice
Suppose $H_a$ is one-tailed and $n_d$ and $B_s$ are given. Find the appropriate p-value from the table, for example 0.005 < p < 0.01.
Pick the smallest tabled p-value that $B_s$ reaches, and bracket.
NOTE: The distribution for the sign test is discrete, so the actual probabilities are somewhat smaller than the tabled levels (similar to Wilcoxon-Mann-Whitney).
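The discreteness mentioned in the note is easy to see by listing every one-tailed p-value the sign test can produce for a given number of pairs. The sketch below uses a hypothetical $n_d = 11$, chosen only for illustration; the helper name `upper_tail` is likewise made up for this example.

```python
from math import comb

def upper_tail(n_d, b_s):
    """Exact P(X >= b_s) for X ~ Binomial(n_d, 0.5)."""
    return sum(comb(n_d, k) for k in range(b_s, n_d + 1)) / 2 ** n_d

n_d = 11  # hypothetical number of informative pairs

# Only these few one-tailed p-values are attainable; nothing in between.
attainable = {b_s: upper_tail(n_d, b_s) for b_s in range((n_d + 1) // 2, n_d + 1)}
for b_s, p in attainable.items():
    print(f"B_s = {b_s:2d}  ->  p = {p:.6f}")
```

Since the attainable p-values jump from one value to the next, a tabled level such as 0.01 is generally not hit exactly, and the true p-value sits somewhat below the upper end of the bracket.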
Applicability of the Sign Test
Valid in any situation where the d's are independent of each other.
Distribution-free: it doesn't depend on the population distribution of the d's, although if the d's are normal the t test is more powerful.
Can be used quickly and can be applied to data that do not permit a t test.

Example: 10 randomly selected rats were chosen to see if they could be trained to escape a maze. The rats were released and timed (sec.) before and after weeks of training. Do the data provide evidence to suggest that the escape time of rats is different after weeks of training? Test using $\alpha = 0.05$.

Rat   Before   After   Sign of d
1     ?0       50      +
2     ?        38      +
3     N        45      +
4     ?        6?      +
5     95       90      +
6     6?       ?0      +
7     56       75      -
8     35       ?5      +
9     4?       44      +
10    N        50      +
N denotes a rat that could not escape the maze; ? marks digits that are illegible in the source.

$H_0$: The escape times (sec.) of rats are the same before and after training.
$H_a$: The escape times (sec.) of rats are different before and after training.
$N_+ = 9$; $N_- = 1$; $n_d = 10$
$B_s$ = larger of $N_+$ or $N_-$ = 9
0.02 < p < 0.05, reject $H_0$.
$X \sim Bin(10, 0.5)$, $P(X \geq 9) = 0.0107421875$
http://socr.stat.ucla.edu/applets.dir/normal_t_chi2_f_tables.htm
CONCLUSION: These data show that the escape times (sec.) of rats before training are different from the escape times after training (0.02 < p < 0.05).

Further Considerations in Paired Experiments
Many studies compare measurements before and after treatment.
There can be difficulty because the effect of treatment could be confounded with other changes over time or with outside variability. For example, suppose we want to study a cholesterol-lowering medication. Some patients may have a response because they are under study, not because of the medication.
We can protect against this by using randomized concurrent controls.

Example: A researcher conducts a study to examine the effect of a new anti-smoking pill on smoking behavior.
Suppose he has collected data on 25 randomly selected smokers: 12 will receive treatment (a treatment pill once a day for three months) and 13 will receive a placebo (a mock pill once a day for three months). The researcher measures the number of cigarettes smoked per week before and after treatment, regardless of treatment group. Assume normality. The summary statistics are:

# cigs / week   n    y-bar before   y-bar after   d-bar   SE of d-bar
Treatment       12   163.92         152.50        11.42   1.10
Placebo         13   163.08         160.23         2.85   1.29

Test to see if there is a difference in the number of cigs smoked per week before and after the new treatment, using $\alpha = 0.05$.
$H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$
$t_s = \frac{11.42}{1.10} = 10.38$
$df = n_d - 1 = 12 - 1 = 11$
$p < 2(0.0005) = 0.001$, reject $H_0$.
These data show that there is a statistically significant difference in the true mean number of cigs/week before and after treatment with the new drug.
Further Considerations in Paired Experiments
This result does not necessarily demonstrate the effectiveness of the new medication.
Smoking less per week could be due to the fact that patients know they are being studied; the test only shows that the mean difference is statistically significantly different from zero.
All we can say is that the new medication appears to have a significant effect on smoking behavior.

Test to see if there is a difference in the number of cigs smoked per week before and after in the placebo group, using $\alpha = 0.05$.

# cigs / week   n    d-bar   SE of d-bar
Treatment       12   11.42   1.10
Placebo         13    2.85   1.29

$H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$
$t_s = \frac{2.85}{1.29} = 2.21$
$df = n_d - 1 = 13 - 1 = 12$
$2(0.02) < p < 2(0.025)$, i.e. $0.04 < p < 0.05$, reject $H_0$.
These data show that there is a statistically significant difference in the true mean number of cigs/week before and after treatment with a placebo.

Patients who did not receive the new drug also experienced a statistically significant drop in the number of cigs smoked per week.
This doesn't necessarily mean that the treatment was a failure, because both groups had a significant decrease.
We need to isolate the effect of therapy on the treatment group.
Now the question becomes: was the drop in the number of cigs/week significantly different between the medication and placebo groups?
How can we verify this?

Test to see if the difference in the number of cigs smoked per week before and after treatment was significantly different between the treatment and placebo groups, using $\alpha = 0.05$. Here group 1 is treatment, group 2 is placebo, and the means refer to the mean before-after differences.
$SE_{\bar{y}_1 - \bar{y}_2} = \sqrt{(1.10)^2 + (1.29)^2} = 1.695$
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$
$t_s = \frac{11.42 - 2.85}{1.695} = 5.06$
$df = 22$
$p < 2(0.0005) = 0.001$, reject $H_0$.
These hypothesis tests provide strong evidence that the new anti-smoking medication is effective. If the experimental design had not included the placebo group, the last comparison could not have been made and we could not support the efficacy of the drug.
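The treatment-versus-placebo comparison can be reproduced from the summary statistics alone. The sketch below is illustrative: the inputs are the mean differences, standard errors, and group sizes as read from the summary table (11.42 and 1.10 for 12 treated smokers, 2.85 and 1.29 for 13 placebo smokers), and the degrees of freedom are computed with the Welch-Satterthwaite approximation rounded down, which is an assumption about how the slide's df was obtained.

```python
import math

# Summary statistics of the before-after differences (cigs/week),
# as read from the slides' summary table.
n_trt, d_bar_trt, se_trt = 12, 11.42, 1.10
n_plc, d_bar_plc, se_plc = 13, 2.85, 1.29

# Standard error of the difference between the two mean differences.
se_diff = math.sqrt(se_trt ** 2 + se_plc ** 2)

# Test statistic for H0: the mean drop is the same in both groups.
t_s = (d_bar_trt - d_bar_plc) / se_diff

# Assumption: Welch-Satterthwaite degrees of freedom, rounded down.
df_welch = (se_trt ** 2 + se_plc ** 2) ** 2 / (
    se_trt ** 4 / (n_trt - 1) + se_plc ** 4 / (n_plc - 1)
)
df = math.floor(df_welch)

print(f"SE = {se_diff:.3f}, t_s = {t_s:.2f}, df = {df}")
```

The same pattern (difference of means divided by the combined SE) applies whenever two independent groups are each summarized by a mean change and its standard error.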
Reporting of Paired Data
It is common in publications to report the mean and standard deviation of the two groups being compared.
In a paired situation it is important to report the mean of the differences as well as the standard deviation of the differences.
Why?

Limitations of $\bar{d}$
There are two major limitations of $\bar{d}$:
1. We are restricted to questions concerning $\bar{d}$.
When some of the differences are positive and some are negative, the magnitude of $\bar{d}$ does not reflect the typical magnitude of the differences.
Suppose we had the following differences: +40, -35, +20, -42, +61, -31.

Descriptive Statistics: data
Variable   N   N*   Mean   SE Mean   StDev   Minimum   Q1      Median   Q3     Max
data       6   0    2.17   17.9      43.9    -42.0     -36.8   -5.50    45.3   61.0

What is the problem with this?
Small average, but the differences are large.
What other statistic would help the reader recognize this issue?
Limitations of $\bar{d}$
2. We are limited to questions about aggregate differences.
If treatment A is given to one group of subjects and treatment B is given to a second group of subjects, it is impossible to know how a person in group A would have responded to treatment B.
We need to beware of these viewpoints and take time to look at the data, not just the summaries.
To verify accuracy we need to look at the individual measurements. Accuracy implies that the d's are small.

Inference for Proportions
We have discussed two major methods of data analysis:
Confidence intervals: quantitative and categorical data
Hypothesis testing: quantitative data
In chapter 10, we will be discussing hypothesis tests for categorical variables.
RECALL: Categorical data
Gender (M or F)
Type of car (compact, mid-size, luxury, SUV, truck)
We typically summarize this type of data with proportions, or probabilities of the various categories.