1. Comparing Two Means: Dependent Samples

In the preceding lectures we've considered how to test a difference of two means for independent samples. Now we look at how to do the same thing with dependent samples, specifically when observations from both samples can be matched one-for-one. This method is called the matched-pairs t-test or paired t-test. Some example applications:

- Are student ability scores the same before vs. after a course?
- Do patients show improvement after a treatment?
- If Treatment A and Treatment B are given to the same patients, which works better?

This procedure is very simple, because it is ultimately merely a test of a single mean. That is, let X1 and X2 be two measurements (e.g., Pre and Post scores) made on the same sample of subjects/objects. Define the new variable Difference = D = X1 − X2 for all cases. If our scientific hypothesis is that the means of X1 and X2 are different (e.g., one treatment is better than the other), our null and alternative hypotheses are simply:

H0: μD = 0 (i.e., μ1 = μ2)
H1: μD ≠ 0 (i.e., μ1 ≠ μ2)

where μD is the (population) mean difference of X1 and X2, equal to μ1 − μ2. Alternatively, if we want to test for a difference of, say, greater than some value c:

H0: μD = c (i.e., μ1 = μ2 + c)
H1: μD > c (i.e., μ1 > μ2 + c)

When the null hypothesis is for no difference (H0: μD = 0), our test statistic is:

t = D̄ / (sD / √n)

where:
- D̄ is the sample mean of the differences D,
- n is the number of pairs,
- sD is the sample standard deviation computed for D = (X1 − X2).

As before, we then determine the probability (p) of this t value and compare it to a pre-specified α (e.g., α = 0.05). If p < α, reject H0.

Credible/Confidence Intervals
To compute a credible/confidence interval for the mean difference between matched pairs, look up the critical value t_crit for the desired width of the credible/confidence interval (e.g., 95%). Then use the formulas:

LL = (X̄1 − X̄2) − t_crit × sD̄
UL = (X̄1 − X̄2) + t_crit × sD̄

where X̄1 − X̄2 = D̄ is the mean difference and sD̄ = sD/√n is its standard error.

2. Paired t-tests in Excel and JMP

Excel
1. State H0 and H1; choose α.
2. Enter X1 and X2 values side by side in adjacent columns.
3. Make a new column for D = (X1 − X2).
4. Calculate the mean and sample standard deviation of D.
5. Compute the t statistic t = D̄/(sD/√n) (assuming H0: μD = 0).
6. Use the Excel function T.DIST to find p = the probability in the tail area(s) of the t distribution.
7. If p < α, reject H0.

Figure 1

JMP
1. Paste the X and Y variables into two separate columns, side by side.
2. Highlight the columns.
3. Analyze > Matched Pairs.
4. In the pop-up window, designate both variables as "Y, Paired Responses", and press OK.

[Screenshots: Step 3 and Step 4]
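The whole paired procedure above (difference scores, t statistic, and confidence interval) can be sketched in a few lines of Python. The pre/post scores below are made-up illustration data, and t_crit = 2.776 is the usual tabled value for a 95% interval with n − 1 = 4 df.

```python
import math
import statistics

# Hypothetical pre- and post-course scores for 5 students (illustration only)
pre  = [72, 80, 65, 78, 85]
post = [78, 85, 70, 80, 84]

d = [x1 - x2 for x1, x2 in zip(post, pre)]  # D = X1 - X2 for each pair
n = len(d)
d_bar = statistics.mean(d)                  # mean difference
s_d = statistics.stdev(d)                   # sample standard deviation of D
se = s_d / math.sqrt(n)                     # standard error of the mean difference

t = d_bar / se                              # test statistic for H0: mu_D = 0
t_crit = 2.776                              # tabled t for a 95% CI with 4 df
ll = d_bar - t_crit * se                    # lower limit
ul = d_bar + t_crit * se                    # upper limit

print(round(t, 2))                 # 2.64
print(round(ll, 2), round(ul, 2))  # -0.18 6.98
```

Note that since |t| = 2.64 is below t_crit = 2.776, a two-tailed test at α = .05 would not reject H0 here; equivalently, the 95% interval (−0.18, 6.98) includes 0.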
3. Chi-Square Tests

We'll now look at how to test statistical hypotheses concerning nominal data, specifically when nominal data are summarized as tables of frequencies. The tests we will consider are generically called chi-squared (or chi-square) tests. Each test involves computing a test statistic, and then calculating the area in the tail of a theoretical distribution called the chi-squared (χ²) distribution. The χ² distribution, like the t distribution, is actually a family of distributions, each one corresponding to a certain number of degrees of freedom. However, in the case of the χ² distribution we are almost always concerned with upper-tail probabilities. That is, chi-squared tests are usually 1-tailed.
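To illustrate the upper-tail idea: for the smallest degrees of freedom the χ² tail area has simple closed forms (shown below for df = 1 and df = 2; the cutoff x = 5.0 is just an arbitrary example). For other df you would use a table or statistical software.

```python
import math

def chi2_upper_tail_df1(x):
    # A chi-squared variable with 1 df is the square of a standard normal Z,
    # so P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(x / 2))

def chi2_upper_tail_df2(x):
    # A chi-squared variable with 2 df is exponential with mean 2
    return math.exp(-x / 2)

x = 5.0
print(round(chi2_upper_tail_df1(x), 3))  # 0.025
print(round(chi2_upper_tail_df2(x), 3))  # 0.082
```

The same cutoff value leaves different upper-tail areas under different members of the family, which is why the degrees of freedom must always be specified.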
Hypothetical Data: Various Outcomes of Arterial Stent Placement

Outcome       Observed (O)   Expected (E)
Rejected           15              7
1–100 days         75             60
> 100 days        118            156
Replaced           20              5
Total             228            228

Our observed frequencies come from data on 228 patients who receive the treatment. Our expected frequencies may come from theoretical models or from estimates of probabilities derived from some larger reference population. Our null hypothesis is that the observed frequencies do not differ from the expected frequencies by more than is expected by chance. Or:

H0: Our sample comes from some specified reference population.

To test the null hypothesis, we may use either of two test statistics.

Pearson X-squared statistic:

X² = Σ(all cells) (O − E)² / E

Likelihood ratio statistic:

L² = 2 Σ(all cells) O ln(O/E)

Both of these test statistics follow a theoretical χ² distribution. They are typically (though not necessarily always) close in value to each other. Note that in the former case the test statistic is denoted X². This should be called "ex-squared". It is not the same as the theoretical distribution, χ² (chi-squared). Most textbooks mistakenly call the test statistic (X²) "chi-squared." That is, the name "chi-squared" test comes from the distribution used to test the hypothesis (the χ² distribution), and not from the test statistic itself.
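Both statistics can be computed directly from the table above with a few lines of Python:

```python
import math

# Observed and expected frequencies from the stent-placement example
observed = [15, 75, 118, 20]
expected = [7, 60, 156, 5]

# Pearson X-squared: sum of (O - E)^2 / E over all cells
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Likelihood ratio statistic: 2 * sum of O * ln(O / E) over all cells
l2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

print(round(x2, 2))  # 67.15
print(round(l2, 2))  # 45.9
```

In this example the two statistics are not especially close (67.15 vs. roughly 45.9), illustrating the caveat that they need not always agree closely; both are nonetheless referred to the same χ² distribution.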
We perform our test by computing X². Our calculations for the example data are shown below:

Hypothetical Data: Various Outcomes of Arterial Stent Placement

Outcome       Observed (O)   Expected (E)   (O − E)²   (O − E)²/E
Rejected           15              7            64         9.14
1–100 days         75             60           225         3.75
> 100 days        118            156          1444         9.26
Replaced           20              5           225        45.00
Total             228            228      Sum = X² = 67.15

The area of the χ² distribution (with 4 − 1 = 3 df) above 67.15 is vanishingly small (p = 1.739E-14). Even assuming a low α (e.g., α = 0.001), p < α, so we reject H0, which asserted that our data came from the reference population. That is, our sample comes from some other population, with probabilities of each level that are different from those of the reference population. We can check our results here: http://vassarstats.net/csfit.html

Homework 4

Work 9.9 (a) and (c) using Excel, as in Figure 1 above and the class demonstration. Use data = Gasmile.xls. (Hint: first do the problem in JMP to find the correct results.) Print results (or check with me for an alternative).

Read:
http://onlinestatbook.com/2/chi_square/distribution.html
http://onlinestatbook.com/2/chi_square/one-way.html
http://onlinestatbook.com/2/chi_square/contingency.html
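As a final check on the stent example above: the quoted p-value can be verified without special software, because for 3 degrees of freedom the χ² upper-tail area also has a closed form (for other df you would use a table or a library routine such as scipy.stats.chi2.sf).

```python
import math

def chi2_upper_tail_df3(x):
    # Closed-form upper-tail area of the chi-squared distribution with 3 df:
    # P(chi2_3 > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p = chi2_upper_tail_df3(67.15)
print(p)  # about 1.739e-14, matching the p-value quoted above
```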