One-sample inference: Categorical Data


1 One-sample inference: Categorical Data October 8

2 One-sample vs. two-sample studies
A common research design is to obtain two groups of people and look for differences between them. We will learn how to analyze these types of two-group, or two-sample, studies in a few weeks. We are going to start, however, with a simpler case: the one-sample study.

3 One-sample inference
For example, a researcher collects a random sample of individuals, measures their heights, and wants to make a generalization about the heights in the population. Or a researcher collects a random sample of individuals, determines whether or not they smoke, and wants to make inferences about the percentage of the population that smokes. These are examples of one-sample inference problems: the first involves continuous data, the second categorical data.

4 One-sample inference: categorical data
Today's topic is inference for one-sample categorical data. The object of such inference is a percentage: What percent of patients survive surgery? What percent of women develop breast cancer? What percent of people do better on one therapy than on another? Investigators see one percentage in their sample, but what does that tell them about the population percentage? In short, how accurate are percentages?

5 The normal approximation
A percentage is a kind of average: the average number of times an event occurs per opportunity. Thus, one approach is to use the central limit theorem, which tells us that:
(1) the expected value of the sample percentage is the population percentage;
(2) the standard error of the sample average is equal to the population standard deviation divided by the square root of n;
(3) the shape of the sampling distribution is approximately normal (how accurate this is depends on n).
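A quick simulation can make these three claims concrete. The sketch below draws many samples of size n from a population in which a fraction p has the trait of interest, then compares the mean and standard deviation of the resulting sample proportions with p and with the population standard deviation divided by √n. For a yes/no outcome that population standard deviation is √(p(1 − p)). The values of p and n, the number of replications, the seed, and the function name are all arbitrary choices for illustration.

```python
import random
import statistics
from math import sqrt

def simulate_phat(p, n, reps, seed=1):
    """Return `reps` sample proportions, each from n independent yes/no draws with P(yes) = p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.8, 39                          # illustrative values only
phats = simulate_phat(p, n, reps=20000)

print(statistics.mean(phats))           # should be close to p
print(statistics.stdev(phats))          # should be close to the next line
print(sqrt(p * (1 - p)) / sqrt(n))      # population SD divided by sqrt(n)
```

A histogram of `phats` would also show the roughly normal shape described in point (3).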

6 The normal approximation (cont'd)
Statisticians often use $p$ to represent the population proportion and $\hat p$ to represent the sample proportion. Thus, if we observe $\hat p$ in our sample, the central limit theorem suggests that $\hat p$ is a good estimate of $p$. For a yes/no outcome, the population standard deviation is $\sqrt{p(1-p)}$, so if $\hat p$ is a good estimate of the population percentage, then it follows that $\sqrt{\hat p(1-\hat p)}$ is a good estimate of the population standard deviation. Continuing, a good estimate for the SE is
$$SE = \sqrt{\frac{\hat p(1-\hat p)}{n}}$$

7 The probability that $p$ and $\hat p$ are close
If the probability that $\hat p$ is within 1 standard error of $p$ is 68%, what is the probability that $p$ is within 1 standard error of $\hat p$? Also 68%; it's the same thing, just worded differently. Therefore, if $p$ plus or minus 1.96 standard errors has a 95% chance of containing $\hat p$, then $\hat p$ plus or minus 1.96 standard errors has a 95% chance of containing $p$.

8 The form of confidence intervals
Thus, x% confidence intervals look like
$$(\hat p - z_{x\%}\,SE,\ \hat p + z_{x\%}\,SE)$$
where $-z_{x\%}$ and $z_{x\%}$ bound the middle x% of the standard normal distribution. For 95% confidence intervals, then, $z$ is always 1.96.
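The multiplier $z_{x\%}$ comes from the inverse of the standard normal CDF: leaving (1 − x%)/2 in each tail means $z_{x\%}$ is the (1 + x%)/2 quantile. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); the function name is hypothetical.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return z such that the middle `level` fraction of N(0, 1) lies between -z and z."""
    # (1 - level)/2 of the probability sits in each tail, so z is the (1 + level)/2 quantile.
    return NormalDist().inv_cdf((1 + level) / 2)

for level in (0.68, 0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```

For 0.95 this gives 1.96, the value used throughout these notes.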

9 Procedure for finding confidence intervals
To sum up, the central limit theorem tells us that we can create x% confidence intervals by:
#1 Calculate the standard error: $SE = \sqrt{\hat p(1-\hat p)/n}$
#2 Determine the values of the normal distribution that contain the middle x% of the data; denote these values $\pm z_{x\%}$
#3 Calculate the confidence interval: $(\hat p - z_{x\%}\,SE,\ \hat p + z_{x\%}\,SE)$
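The three steps can be sketched as a short function (the name `approx_ci` is hypothetical), again using `statistics.NormalDist` for step #2. Applied to the 25-week infants in the example that follows (31 survivors out of 39), it reproduces the interval computed there by hand.

```python
from math import sqrt
from statistics import NormalDist

def approx_ci(successes, n, level=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)          # #1: estimated standard error
    z = NormalDist().inv_cdf((1 + level) / 2)   # #2: middle `level` of N(0, 1)
    return p_hat - z * se, p_hat + z * se       # #3: the interval

lo, hi = approx_ci(31, 39)
print(f"({lo:.1%}, {hi:.1%})")   # (66.8%, 92.2%)
```

Called with 0 successes, for example `approx_ci(0, 29)`, it returns `(0.0, 0.0)`: the estimated standard error is 0, one symptom of the breakdown discussed below for the 22-week infants.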

10 Example: Survival of premature infants
In order to estimate the survival chances of infants born prematurely, researchers at Johns Hopkins surveyed the records of all premature babies born at their hospital in a three-year period. They found 39 babies who were born at 25 weeks gestation, 31 of whom survived at least 6 months. Their best estimate (point estimate) is that 31/39 = 79.5% of all babies (in other hospitals, in future years) born at 25 weeks gestation would survive at least 6 months, but how accurate is that percentage?

11 Example: Survival of premature infants (cont'd)
With $\hat p = 31/39 \approx .795$, the standard error of the percentage is
$$SE = \sqrt{\frac{\hat p(1-\hat p)}{39}} = .0647 = 6.47\%$$
So, one way of expressing the accuracy of the estimated percentage is 79.5% ± 6.5% (this would be about a 68% confidence interval). Another way would be to calculate the 95% confidence interval:
$$(79.5 - 1.96(6.47),\ 79.5 + 1.96(6.47)) = (66.8\%,\ 92.2\%)$$

12 Problems with the normal approximation
That approach works pretty well, but if you think about it, the distribution of our data isn't normal; it's binomial. The normal approximation works because the binomial distribution looks a lot like the normal distribution when n is large and p isn't close to 0 or 1. Other times, the normal approximation doesn't work as well.
[Figure: binomial probability distributions for n = 39, p = 0.8 and for n = 15, p = 0.95]
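One way to see the problem numerically: a normal curve has tails that extend past the largest possible count n, while a binomial count cannot exceed n. The sketch below (function name hypothetical) measures how much probability the matching normal distribution, with mean np and standard deviation √(np(1 − p)), places above n for the two cases n = 39, p = 0.8 and n = 15, p = 0.95. This is an illustrative diagnostic, not a formal rule.

```python
from math import sqrt
from statistics import NormalDist

def normal_mass_above_n(n, p):
    """Probability that the normal approximation to Binomial(n, p) puts above n,
    a region the binomial itself can never reach."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return 1 - approx.cdf(n)

print(normal_mass_above_n(39, 0.8))   # about 0.001
print(normal_mass_above_n(15, 0.95))  # about 0.19
```

For n = 15 and p = 0.95, nearly a fifth of the approximating normal curve lies on impossible values, which is why the approximation degrades when p is near 1 and n is small.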

13 Example: Survival of premature infants, part II
In their study, the Johns Hopkins researchers also found 29 infants born at 22 weeks gestation, none of whom survived 6 months. The normal approximation is clearly not going to work here, for two reasons: the estimated standard deviation will be 0, and even if it weren't, the confidence interval would be symmetric about 0, so half of it would be negative.

14 Using the binomial distribution directly
But why settle for an approximation? The number of infants who survive is going to follow a binomial distribution; why not use that directly? It seems pretty obvious that the lower limit of our confidence interval should be 0, but how can we use the binomial distribution to find an upper limit? The upper limit should be a number p such that there would only be a 2.5% probability of observing 0 infants who survive if the probability of surviving really were p.
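When no successes are observed, this condition has a closed form. The probability that none of n infants survive is $(1-p)^n$, so setting $(1-p)^n = 0.025$ gives $p = 1 - 0.025^{1/n}$. A minimal check with n = 29 (the function name is hypothetical):

```python
def upper_limit_zero_successes(n, tail=0.025):
    """Largest p for which P(0 successes in n trials) = (1 - p)**n is still at least `tail`."""
    return 1 - tail ** (1 / n)

print(f"{upper_limit_zero_successes(29):.1%}")   # 11.9%
```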

15 Finding the upper limit for p
[Figure: P(0 out of 29 infants survive) plotted against p]

16 Exact confidence intervals
Thus, the exact confidence interval for the population percentage of infants who survive after being born at 22 weeks is (0%, 11.9%). The exact confidence interval for the population percentage of infants who survive after being born at 25 weeks is (63.5%, 90.7%). Recall that our approximate confidence interval for the population percentage of infants who survive after being born at 25 weeks was (66.8%, 92.2%).
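The notes do not say how the exact intervals were computed. One standard construction that matches the reasoning used for the 22-week infants is the Clopper-Pearson interval: the upper limit is the p at which observing x or fewer successes has probability 2.5%, and the lower limit is the p at which observing x or more has probability 2.5%. The sketch below, whose function names are hypothetical, finds both limits by bisection using only the standard library; its results agree with the intervals quoted above to the reported precision.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_decreasing(f, target, tol=1e-10):
    """Bisection for p in [0, 1] with f(p) = target, assuming f decreases in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid   # f is still above target, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, level=0.95):
    """Clopper-Pearson interval for a binomial proportion (x successes out of n)."""
    tail = (1 - level) / 2
    # Lower limit: P(X >= x) = tail, i.e. P(X <= x - 1) = 1 - tail; taken as 0 when x = 0.
    lower = 0.0 if x == 0 else solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - tail)
    # Upper limit: P(X <= x) = tail; taken as 1 when x = n.
    upper = 1.0 if x == n else solve_decreasing(lambda p: binom_cdf(x, n, p), tail)
    return lower, upper

for x, n in [(0, 29), (31, 39)]:
    lo, hi = exact_ci(x, n)
    print(f"{x}/{n}: ({lo:.1%}, {hi:.1%})")
```

Bisection is used here only because it needs nothing beyond `math.comb`; a statistics library's beta-distribution quantile function would give the same limits directly.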

17 Exact vs. approximate intervals
When n is large and p isn't close to 0 or 1, it doesn't really matter whether you choose the approximate or the exact approach. The advantage of the approximate approach is that it's easy to do by hand. In comparison, finding exact confidence intervals by hand is quite time-consuming.

18 Exact vs. approximate intervals (cont'd)
However, we live in an era with computers, which do the work of finding confidence intervals instantly (as we will see in lab). If we can obtain the exact answer easily, there is no reason to settle for the approximate answer. That said, in practice, people use and report the approximate approach all the time. Possibly, this is because the analyst knew it wouldn't matter, but more likely, it's because the analyst learned the approximate approach in their introductory statistics course and doesn't know any other way to calculate a confidence interval.

19 One-sample hypothesis tests
It is relatively rare to have specific hypotheses about population percentages. One important exception is the collection of paired samples. In a paired sampling design, we collect n pairs of observations and analyze the difference within each pair.

20 Hypothetical example: A sunblock study
Suppose we are conducting a study investigating whether sunblock A is better than sunblock B at preventing sunburns. The first design that comes to mind is probably to randomly assign sunblock A to one group and sunblock B to a different group. There is nothing wrong with this design, but we can do better.

21 Signal and noise
Generally speaking, our ability to make generalizations about the population depends on two factors: signal and noise. Signal is the magnitude of the difference between the two groups; in the present context, how much better one sunblock is than the other. Noise is the variability present in the outcome from all other sources besides the one you're interested in; in the sunblock experiment, this would include factors like how sunny the day was, how much time the person spent outside, how easily the person burns, etc. More precisely, what matters is the ratio of signal to noise: how easily we can distinguish the treatment effect from all other sources of variability.

22 Signal-to-noise ratio
To get a larger signal-to-noise ratio, we must either increase the signal or reduce the variability. The signal is usually determined by nature and out of our control. Instead, we are going to have to reduce the variability (noise). If our sunblock experiment were controlled, we could attempt such steps as forcing all participants to spend an equal amount of time outside, on the same day, in an equally sunny area, etc.

23 Person-to-person variability
But what can be done about person-to-person variability (how easily certain people burn)? A powerful technique for reducing person-to-person variability is pairing. For each person, we can apply sunblock A to one of their arms and sunblock B to the other arm, and, as an outcome, look at the difference between the two arms. In this experiment, the items that we randomly sample from the population are pairs of arms belonging to the same person.

24 Benefits of paired designs
What do we gain from this? As variability goes down, confidence intervals become narrower and hypothesis tests become more powerful. How much narrower? How much more powerful? This depends on the fraction of the total variability that comes from person-to-person variability.
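To see why the gain depends on person-to-person variability, here is a small simulation under made-up assumptions: each person has a susceptibility drawn from a normal distribution (standard deviation `PERSON_SD`), each arm adds independent noise (`NOISE_SD`), and sunblock A shifts the outcome by `EFFECT`. These names and values are hypothetical. In the unpaired design the two sunblocks go to different people; in the paired design they go on the two arms of the same person, so that person's susceptibility cancels out of the difference.

```python
import random
import statistics

# Hypothetical parameters, chosen only for illustration.
PERSON_SD = 2.0   # spread of person-to-person susceptibility
NOISE_SD = 1.0    # arm-level noise (sun exposure, measurement, etc.)
EFFECT = 0.5      # shift in outcome from sunblock A relative to sunblock B
N = 5000          # number of differences simulated for each design

rng = random.Random(2024)

def outcome(person_effect, sunblock_effect):
    """Simulated burn score for one arm: person effect + sunblock effect + noise."""
    return person_effect + sunblock_effect + rng.gauss(0, NOISE_SD)

# Unpaired: A and B are applied to two different, independently drawn people.
unpaired = [
    outcome(rng.gauss(0, PERSON_SD), EFFECT) - outcome(rng.gauss(0, PERSON_SD), 0.0)
    for _ in range(N)
]

# Paired: A and B are applied to the two arms of the same person,
# so the person effect appears in both terms and cancels in the difference.
paired = []
for _ in range(N):
    person = rng.gauss(0, PERSON_SD)
    paired.append(outcome(person, EFFECT) - outcome(person, 0.0))

print(statistics.variance(unpaired))  # theory: 2 * (PERSON_SD**2 + NOISE_SD**2) = 10
print(statistics.variance(paired))    # theory: 2 * NOISE_SD**2 = 2
```

With these made-up values the paired differences have about one fifth of the unpaired variance, so an interval built from them would be roughly √(1/5) ≈ 0.45 times as wide. Shrinking `PERSON_SD` shrinks that gain, which is the dependence described above.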

25 More examples
Investigators have come up with all kinds of clever ways to use pairing to cut down on variability: before-and-after studies, crossover studies, and split-plot experiments.

26 Pairing in observational studies
Pairing is also widely used in observational studies, such as twin studies and matched studies. In a matched study, the investigator will pair up ("match") subjects on the basis of variables such as age, sex, or race, then analyze the difference within each pair. In addition to increasing power, pairing in observational studies also eliminates (some of the) potential confounding variables.

27 Cystic fibrosis experiment
You may not have known it at the time, but you have already conducted an exact hypothesis test for paired categorical data in your homework. Recall our cystic fibrosis experiment, in which each patient took both drug and placebo and the reduction in their lung function (measured by FVC) over a 25-week period was recorded. This is a crossover study, an example of a paired design.

28 The null hypothesis
The null hypothesis here is that the drug provides no benefit: whether the patient received drug or placebo has no impact on their lung function. Under the null hypothesis, then, the probability that a patient does better on drug than placebo (let's call this $p$) is 50%. So another, more compact and mathematical way of writing the null hypothesis is $p_0 = .5$ (statisticians like to use a subscript 0 to denote the null hypothesis).

29 The sign test
We can test this null hypothesis by using our knowledge that, under the null hypothesis, the number of patients who do better on the drug than placebo ($x$) will follow a binomial distribution with $n = 14$ and $p = 0.5$. This approach to hypothesis testing is called the sign test. All we need to do is calculate the p-value (the probability of obtaining results as extreme or more extreme than the one observed in the data, given that the null hypothesis is true).

30 As extreme or more extreme
The result observed in the data was that 11 patients did better on the drug. But what exactly is meant by "as extreme or more extreme" than 11? It is uncontroversial that 11, 12, 13, and 14 are as extreme or more extreme than 11. But what about 0? Is that more extreme than 11? Under the null, $P(11) = 2.2\%$, while $P(0) = .006\%$. So 0 is more extreme than 11, but in a different direction.

31 One-sided vs. two-sided tests
Potentially, then, we have two different approaches to calculating this p-value:
(1) Find the probability that $x \ge 11$
(2) Find the probability that $x \ge 11$ or $x \le 3$ (3 is the number that is as far away from the expected value of 7 as 11 is, but in the other direction)
These are both reasonable things to do, and intelligent people have argued both sides of the debate. However, the statistical and scientific community has for the most part come down in favor of the latter, the so-called two-sided test. For this class, all of our tests will be two-sided tests.

32 The sign test
Thus, the p-value of the sign test is
$$p = P(x \le 3) + P(x \ge 11) = P(x = 0) + \cdots + P(x = 3) + P(x = 11) + \cdots + P(x = 14)$$
$$= .006\% + .09\% + .6\% + 2.2\% + 2.2\% + .6\% + .09\% + .006\% = 5.7\%$$
One might call this result borderline significant: it isn't below .05, but it's close. These results suggest that the drug has potential, but with a sample size of only 14, it's hard to say for sure.
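The calculation above can be reproduced exactly with integer binomial coefficients. The sketch below (function name hypothetical) adds up P(x = k) for every k at least as far from the expected value n/2 as the observed count, which is the two-sided definition used in these notes; the one-sided tail is computed alongside for comparison.

```python
from math import comb

def sign_test(successes, n):
    """Two-sided sign test of H0: p0 = 0.5.

    Adds P(X = k) for every k at least as far from n/2 as the observed count.
    """
    expected = n / 2
    distance = abs(successes - expected)
    return sum(comb(n, k) for k in range(n + 1) if abs(k - expected) >= distance) / 2**n

two_sided = sign_test(11, 14)
one_sided = sum(comb(14, k) for k in range(11, 15)) / 2**14   # P(x >= 11) only

print(f"two-sided: {two_sided:.1%}")   # two-sided: 5.7%
print(f"one-sided: {one_sided:.1%}")   # one-sided: 2.9%
```

The one-sided version would fall below .05 while the two-sided version does not, which is exactly why the choice debated above matters.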

33
Thinking about the sign test, what enabled us to calculate the p-value? How were we able to attach a specific number to the probability that $x$ would take on certain values? We were able to do this because we knew that, under the null, $x$ followed a specific distribution (in that case, the binomial). This is the most common strategy for developing hypothesis tests: calculate from the data a quantity whose distribution under the null hypothesis we know. Note that, in general, we would not know the distribution of the number of patients who do better on drug than placebo; we know it only under the null hypothesis.

34 Test statistics
A quantity whose distribution under the null hypothesis we know is called a test statistic. Because we can calculate the test statistic from the data, and because we know its distribution under the null hypothesis, we can calculate the probability of obtaining a result as extreme or more extreme than the observed result (the p-value).

35 The z test statistic
As we did before with confidence intervals, we can use the central limit theorem for this problem, now to create a test statistic. From the central limit theorem, we know that $z$, the number of standard errors away from $p$ that $\hat p$ falls, follows (approximately) a standard normal distribution. Our test statistic, then, is
$$z = \frac{\hat p - p_0}{SE}$$
Having calculated $z$, we can get p-values from the standard normal distribution. This approach to hypothesis testing is called the z-test.

36 The standard error
What about the standard error? Under the null, the population standard deviation is $\sqrt{p_0(1-p_0)}$, which means that, under the null,
$$SE = \sqrt{\frac{p_0(1-p_0)}{n}}$$

37 Procedure for a z-test
The procedure for a z-test is then:
#1 Calculate the standard error: $SE = \sqrt{p_0(1-p_0)/n}$
#2 Calculate the test statistic: $z = (\hat p - p_0)/SE$
#3 Calculate the area under the normal curve outside $\pm z$
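The three steps translate directly into code. The sketch below (function name hypothetical) uses the standard library's `statistics.NormalDist` for the normal curve and applies the procedure to the cystic fibrosis data, where 11 of 14 patients did better on the drug.

```python
from math import sqrt
from statistics import NormalDist

def z_test(successes, n, p0):
    """Two-sided one-sample z-test for a proportion; returns (z, p_value)."""
    se = sqrt(p0 * (1 - p0) / n)                   # #1: standard error under the null
    z = (successes / n - p0) / se                  # #2: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # #3: area outside ±z
    return z, p_value

z, p_value = z_test(11, 14, 0.5)
print(round(z, 2))   # 2.14
print(p_value)       # about 0.0325
```

Carrying the unrounded z ≈ 2.138 through gives a p-value of about 0.0325; rounding z to 2.14 before looking up the tail area, as in the hand calculation, gives 3.2%.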

38 The z-test for the cystic fibrosis experiment
For the cystic fibrosis experiment, $p_0 = 0.5$. Therefore,
$$SE = \sqrt{\frac{p_0(1-p_0)}{n}} = \sqrt{\frac{0.5(0.5)}{14}} = .134$$

39 The z-test for the cystic fibrosis experiment (cont'd)
Since 11 of the 14 patients did better on the drug, $\hat p = 11/14 = .786$. The test statistic is therefore
$$z = \frac{\hat p - p_0}{SE} = \frac{11/14 - .5}{\sqrt{0.5(0.5)/14}} = 2.14$$
The p-value of this test is therefore 2(1.6%) = 3.2%.

40 Confidence intervals can produce hypothesis tests
It may not be obvious, but there is a close connection between confidence intervals and hypothesis tests. For example, suppose our hypothesis test was to construct a 95% confidence interval and then reject the null hypothesis if $p_0$ was outside the interval. It turns out that this is exactly the same as conducting a hypothesis test with $\alpha = 5\%$.

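41 Hypothesis tests can produce confidence intervals
Alternatively, suppose we formed a collection of all the values of $p_0$ for which the p-value of our hypothesis test was above 5%. This would form a 95% confidence interval for $p$. Note, then, that there is a correspondence between hypothesis testing at significance level $\alpha$ and confidence intervals with confidence level $1 - \alpha$. It turns out that the z-test corresponds to a normal-approximation interval (strictly speaking, because the z-test computes its SE from $p_0$ rather than from $\hat p$, the interval it produces is close to, but not identical with, the approximate interval above), and that the sign test, generalized to any $p_0$ as the exact binomial test, corresponds to the exact interval.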
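The correspondence can be checked by brute force. The sketch below scans a grid of candidate values of $p_0$, computes an exact binomial p-value at each one, and keeps those whose p-value is above 5%. It assumes the two-sided p-value is twice the smaller tail (capped at 1), which for $p_0 = 0.5$ matches the "as far from the expected value" definition used for the sign test; the grid spacing and function names are arbitrary, hypothetical choices. For the 25-week infants (31 of 39), the retained range should land within one grid step of the exact interval (63.5%, 90.7%).

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_p_value(x, n, p0):
    """Two-sided exact binomial p-value, taken here as twice the smaller tail, capped at 1."""
    lower_tail = sum(binom_pmf(k, n, p0) for k in range(x + 1))      # P(X <= x)
    upper_tail = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))   # P(X >= x)
    return min(1.0, 2 * min(lower_tail, upper_tail))

def invert_test(x, n, alpha=0.05, step=0.0005):
    """Return the smallest and largest grid values of p0 whose p-value exceeds alpha."""
    grid = [i * step for i in range(1, round(1 / step))]   # excludes 0 and 1
    kept = [p0 for p0 in grid if exact_p_value(x, n, p0) > alpha]
    return min(kept), max(kept)

low, high = invert_test(31, 39)
print(f"({low:.2%}, {high:.2%})")
```

Every value of $p_0$ inside the printed range would not be rejected at the 5% level, and every value outside it would be, which is the correspondence described above.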

42
In general, then, a confidence interval and its corresponding hypothesis test always lead to the same conclusion. This is a good thing; it would be confusing otherwise. Furthermore, this is not just true of confidence intervals for one-sample categorical data; it is generally true of all confidence intervals and hypothesis tests. However, the information provided by each technique is different: the confidence interval is an attempt to estimate a parameter, while the hypothesis test is an attempt to measure the evidence against the hypothesis that the parameter is equal to a certain, specific number.