OPINION Two cheers for P-values?

Size: px
Start display at page:

Download "OPINION Two cheers for P-values?"


1 Journa of Epidemioogy and Biostatistics (2001) Vo. 6, No. 2, OPINION Two cheers for P-vaues? S SENN Department of Epidemioogy and Pubic Heath, Department of Statistica Science, University Coege London, UK Abstract P-vaues are a practica success but a critica faiure. Scientists the word over use them, but scarcey a statistician can be found to defend them. Bayesians in particuar find them ridicuous, but even the modern frequentist has itte time for them. In this essay, I consider what, if anything, might be said in their favour. Keywords Bayesian, hypothesis-tests, significance tests, Jeffreys Lindey paradox, repication probabiities. Introduction I do not beieve in beief (EM Forster Two Cheers for Democracy) P-vaues have ong inked medicine and statistics. John Arbuthnot and Danie Bernoui were both physicians, in addition to being mathematicians, and their anayses of sex ratios at birth (Arbuthnot) and incination of the panets orbits (Bernoui) provide the two most famous eary exampes of significance tests 1 4. If their ubiquity in medica journas is the standard by which they are judged, P-vaues are aso extremey popuar with the medica profession. On the other hand, they are subject to reguar criticism from statisticians 5 7 and ony reuctanty defended 8. For exampe, a dozen years ago, the prominent biostatisticians, the ate Martin Gardner and Doug Atman 9, together with other coeagues, mounted a successfu campaign to persuade the British Medica Journa to pace ess emphasis on P-vaues and more on confidence intervas. The journa Epidemioogy has banned them atogether. Recenty, attacks have even appeared in the popuar press 10,11. P- vaues thus seem to be an appropriate subject for the Journa of Epidemioogy and Biostatistics. This essay represents a persona view of what, if anything, may be said to defend them. I sha offer a imited defence of P-vaues ony. I sha argue the foowing: Certain paradoxica behaviour of P-vaues is not so much an inherent feature of P-vaues, but a consequence of the fact that Bayesians can disagree with each other (possiby quite reasonaby) about the concusions to be reached when data are anaysed. A supposedy undesirabe property of P-vaues, that they have moderate repication probabiity, is in fact desirabe. Even expert Bayesians can be fauted in choosing their priors when presenting aternatives to P-vaues. In short, I sha concude that if you can do better than using P-vaues you shoud, but that to do better is not as simpe as has sometimes been impied. Significance tests or hypothesis tests It is often considered important in discussing P-vaues to make a cear distinction between the Fisherian test of significance and Neyman Pearson hypothesis testing. However, operationay at east, this distinction is ess important than is sometimes supposed. In the Fisherian test of significance, three things are needed for examining a hypothesis: first, a statistic that measures discrepancy from what is considered expected, or reasonabe, given the hypothesis, second an ordering of the statistic and third the probabiity distribution of that statistic under the hypothesis. Given a particuar vaue of the test statistic, the probabiity of observing a vaue as extreme or more extreme than the one actuay observed is cacuated. This probabiity is the P-vaue and is used as a measure of credibiity of the hypothesis. If the P-vaue is sma, then Fisher invites us to consider the disjunction: Either an exceptionay rare chance has occurred, or the theory of random distribution is not Correspondence to: S Senn, Department of Epidemioogy and Pubic Heath, Department of Statistica Science, University Coege London, 1 19 Torrington Pace, London, WC1 E 6BT, UK. Received 25 Juy 2000 Revised 29 October 2000 Accepted 15 November Isis Medica Media Limited

2 194 S SENN true (Ref. 12, p. 42). Using the P-vaue is then considered to constitute a statistica inference. In the Neyman Pearson theory, however, a different approach is adopted. Nu and aternative hypotheses are identified and the task is to decide between them. Two kinds of error thus become possibe. Type I, in which the nu hypothesis is rejected athough true and Type II, where the aternative hypothesis is rejected athough true. Rues for deciding between the hypotheses can then be characterised in terms of particuar probabiities. The size of the test is the probabiity of deciding that the nu hypothesis is fase given that it is true. (Note that this is not the same as the probabiity of a Type I error; it is the probabiity of a Type I error under the nu hypothesis, the probabiity under the aternative hypothesis being zero and the unconditiona probabiity being either undefined or some average of the two depending on one s phiosophy.) The power of the test is the probabiity of deciding that the aternative hypothesis is true given that it is true. The actua business of making a decision is made using a statistic which, if it fas in the so-caed critica region, eads to the decision, reject the nu hypothesis, accept the aternative hypothesis and if it does not, eads to the decision, accept the nu hypothesis, reject the aternative hypothesis. Given this genera framework, the further step is then usuay taken of considering rues for which the size is fixed to be at or beow some predefined imit and the power is made to be as arge as possibe. Various features of the simpe theory do not hod for more compex cases, but the task of the scientist is seen to be one of making decisions not inferences. The Neyman Pearson theory is generay hed to improve on Fisher s in that the choice of hypotheses indicates the statistic to be empoyed and this is regarded as eiminating an arbitrary eement in Fisher s approach: how to decide on the test statistic. This view, however, depends on the assumption that hypotheses are somehow more primitive than statistics, anaogous to the way in which we-chosen axioms are more fundamenta than theorems. In Fisher s view this assumption is fase. According to him, nu hypotheses are more primitive than statistics but statistics are more primitive than aternative hypotheses 13. In practice any attempt to operate the Neyman Pearson theory in a way that eads automaticay from hypotheses to statistic invoves a high degree of mathematica abstraction and the reduction of the probem to a set of aternatives known a priori to be fase. The probem of the (partiay) arbitrary choice of test statistic is then repaced by one of the (partiay) arbitrary choice of aternative hypothesis, which is usuay obscured by a mathematica goss. For exampe, depending on the aternative hypothesis a probit or ogistic anaysis of binary data might be indicated. But you coud not know which of these aternative hypotheses was true and ony prior experience with test statistics coud suggest to you which was a better choice. A simiar probem occurs in forma Bayesian anayses, but it is often caimed that the mode is ony a convenient approximation. For the Bayesian, with enough determination, any choice between modes can be reduced to a question of a choice of prior probabiities, which is a purey persona matter. In many practica appications, statistics used for significance testing and hypothesis testing are the same, and a hypothesis test can be carried out via a P-vaue, so that if, for exampe, the P-vaue is noted to be, 0.05 (e.g ) then the nu hypothesis may be rejected at the 5% eve. At the end, a function that has many possibe vaues (the P-vaue function) is mapped onto a function which has ony two (the decision function). In the hypothesis-testing framework it coud then be argued that the P-vaue is ony a means to an end, and irreevant once the dichotomy has been made. This argument depends, however, on what might be referred to as the myth of the singe processor. It assumes that a statistica hypothesis test is either carried out by a scientist on his own behaf ony or, if performed on behaf of others, for exampe a scientific posterity, that he has been mandated to make the decision on their behaf as we and that he has the right to determine the size of the test to be used. If, however, Neyman operates tests with a size of 0.05 and Pearson operates with a size of 0.01, then knowedge that Neyman has rejected the nu hypothesis is not sufficient to enabe Pearson to make a decision. Simiary, if Pearson s resut is simpy that he does not reject the nu hypothesis then this does not yied a decision for Neyman. In the case of continuous statistics they shoud communicate P-vaues to each other. (See, for exampe, Lehmann, Ref. 14, p. 70). In the case of discrete statistics they shoud communicate the P-vaue for the observed resut and for the next most extreme resut or, equivaenty, the mid-p 15,16 and the probabiity of the observed resut under the nu hypothesis. In short, it is not surprising that Johnstone 17, in his survey of cassica statistics as actuay appied, found Neyman Pearson theory, but Fisherian practice. The two wi aways be difficut to separate. The Jeffreys Lindey paradox Nothing in this word... is probabe uness it appeas to our own trumpery experience. (Wikie Coins, The Moonstone) In an important paper that appeared in Biometrika in 1957, and which has been much cited, Lindey 18 eaborated an objection to the Fisherian test of significance originay due to Jeffreys 19. Lindey took the exampe of

3 TWO CHEERS FOR P-VALUES? 195 a simpe random sampe (x 1, x 2,... x n ) from a norma distribution of mean q and known variance s 2. He supposed that the prior probabiity that q = q 0, where q 0 is the vaue under the nu hypothesis, was c. He further supposed that the remainder of the prior probabiity was distributed over an interva I which incuded q 0. He then showed that if the vaue of the sampe mean was significant at the a percentage point so that x _ 5 q 0 1 a s / Ö n where a is a number dependent on a ony and can be found from tabes of the norma distribution function. (p. 187), then the posterior probabiity that q = q 0 was given by a 2 a c _ 5 ce 1(1 2 c)s (2p /{ce Ö / n)} (1) Lindey then pointed out that a consequence of the formua (1) is that as n increases, c _ approaches one and that therefore, whatever the vaue of c, a vaue of n coud be found such that if the sampe mean were significant at the a eve the posterior probabiity that q = q 0 woud be (100 a)%. Actuay, as Bartett 20 pointed out, formua (1), which is as presented by Lindey 18, is rather curious in that it is not dimensioness, but depends on the units of s, which therefore needs to be expressed as a fraction of I. To make this expicit, it needs to be rewritten in this form: a 2 a c _ 5 ce /{ce 1(1 2 c) I s (2p / n)} (2) Once this is done it can be seen that the actua vaue of c _ depends not ony on c, the prior probabiity of the nu hypothesis, but aso on the conditiona probabiity density over the region specified by the aternative hypothesis. The paradox sti appies but is no onger automatic ; two Bayesians having the same prior probabiity that a hypothesis is true and having seen the same data can come to radicay different concusions because they differ regarding the aternative hypothesis. (A particuary sharp exampe is given ater in this paper.) Now, this is not iogica. Indeed, from a Bayesian perspective it is perfecty reasonabe. What is unreasonabe is to regard the sort of contradiction impicit in the Lindey paradox as a reason in itsef for regarding P-vaues as iogica for, to adopt and adapt the rhetoric of Jeffreys (Ref. 19, p. 385), it woud require that a procedure is dismissed because, when combined with information which Ö it doesn t require and which may not exist, it disagrees with a procedure that disagrees with itsef. The esson from the Lindey paradox is this: knowedge about hypotheses may make P-vaues inappropriate 16. Before eaving the Lindey paradox, it is worth drawing attention to one other feature. From time to time certain Bayesians have fet it necessary to expain why, despite that the test of significance gives significance too easiy, nevertheess, frequentists using P-vaues come to reasonabe concusions (in the sense of agreeing with Bayesian concusions). The expanation usuay given is that they tend to instinctivey act as Bayesians, requiring a stricter eve of significance for arger sampes. This expanation is: irreevant in Bayesian terms and wrong in frequentist terms. The reason that it is irreevant in Bayesian terms is that it is no requirement of Bayesian inference that one Bayesian s concusions shoud agree with anothers, et aone with a frequentist s. Coherence is the ony standard by which probabiities are to be judged. Pure Bayesianism is a theory of how to remain perfect. The reason that it is wrong in frequentist terms is that it presupposes that the Jeffrey Lindey Paradox occurs frequenty and can be recognised as having occurred. Now, as regards the first point, the opinion of the eading Bayesians is that the Fisherian test of significance gives significance too easiy. If this reay is true, then whenever (or at east usuay, whenever) the frequentist concudes that a resut is not significant he must be agreeing with the Bayesians. On the other hand, what the frequentists caim about significance is sti true whether or not the Bayesians are right in regarding these caims as irreevant. As Fisher put it, A man who rejects a hypothesis provisionay, as a matter of habitua practice, when the significance eve is at the 1% eve or higher, wi certainy be mistaken in not more than 1% of such decisions. For when the hypothesis is correct he wi be mistaken in just 1% of these cases, and when it is correct he wi never be mistaken in rejection. (Ref. 12, p. 45.) Hence, if the word is fu of true nu hypotheses, and if tests reay do give significance too easiy then, using the more common standard, at the most he can ony disagree with the Bayesians 1 in 20 times. (Assuming that they never reject a hypothesis. If they occasionay reject a hypothesis the rate of disagreement is ower.) If, on the other hand, some nu hypotheses are fase, then the rate of rejecting them must increase, but if the increase eads to an increase of

4 196 S SENN the rate of disagreement with Bayesians then this increase must be made up of cases where the frequentist is right and the Bayesian is wrong (if right to be wrong). As regards the second point, the frequentist may argue that the Lindey paradox can ony be recognised as having occurred using prior information that doesn t exist. Any Bayesian who recognises that the paradox has occurred cannot guarantee that another Bayesian wi not disagree with him. An exampe of Howson and Urbach s A hair perhaps divides the fase and true. [E. Fitzgerad, The Rubaiyat of Omar Khayam (5th edition)] In this section an exampe taken from Howson and Urbach s famous book 21 wi be considered to show that where intuition conficts with the resut of a test of significance, intuition may be wrong. The book summarises many powerfu Bayesian criticisms of frequentist methods. For a critica review, see Senn 22. For a more sympathetic review see Gies 23. Howson and Urbach (Ref. 21, p. 136.) consider the exampe of a die, which is roed 600 times and shows 100 sixes, fives, fours and threes but 123 twos and 77 ones. They then point out that the Pearson Fisher c 2 statistic is 10.58, which, on five degrees of freedom, is not significant. As they put it, one is, therefore, under no obigation to reject the nu hypothesis, even though that hypothesis has pretty ceary got it bady wrong, in particuar, in regard to the outcomes two and one (p. 136, my itaics). Before going on to discuss this exampe in more detai, I note certain features that wi not be primariy reevant to the discussion. The first is that the geometry of the die makes it a priori unikey that if biased it shoud produce an excess of twos and a deficit of ones, whist maintaining a rate of occurrence of other vaues that woud be expected from a fair die. In fact, any Bayesian who had read Jeffreys s chapter on significance tests might expect a rather different apparent bias in a die for, in the manufacture of the dice sma pits are made in the faces to accommodate the marking materia, and this ightens the faces with five or six spots dispacing the centre of gravity towards the opposite sides and increasing the chance that these faces wi sette upwards. (Jeffreys, Ref. 19, p. 258.) This is, perhaps, an exampe where a singe test on five degrees of freedom makes too itte use of the ikey nature of departures from fairness and I am not sure that Fisher himsef woud choose to anayse these resuts in such a way. (Fisher 24 did, of course, anayse Wedon s famous data as did Jeffreys 19 after him, but the data are rather different from the Howson Urbach exampe. An interesting discussion of Wedon s data has been given by Kemp and Kemp 25.) Nevertheess, this particuar point is not essentia to the discussion and wi not be pursued further, except to note that Howson and Urbach 21 themseves seem to have avoided thinking about the probem expicity in terms of aternatives deemed reasonabe a priori and the same wi be done here. The second point is that the criticism proposed here is, in a sense, the opposite of Lindey s. Lindey s attitude is that significance is given too easiy by the Fisherian significance test, especiay if the sampe size is arge. This is aso the point of view of Matthews, who has gone so far as to caim that the reason that scientists find that some of their caims are not confirmed subsequenty is because they have been reying on significance tests 10,11,26. Here, however, the test is being criticised for not giving significance easiy enough. The third point is that Howson and Urbach confusingy cite a paper of Good s 27 on goodness-of-fit tests in support of their argument. However, Good s paper deas with a different matter atogether: that of determining goodness-of-fit when the vaues are continuous and the number of categories (histogram bins) has been determined arbitrariy. What can we say about the exampe of Howson and Urbach 21? The probem, of course, is that we have six parameters to fix, q 1, q 2, q 6 subject ony to the constraint that they shoud a add to 1 and this gives a arge consteation of possibe ikeihoods to examine. We can, however, compare the vaue of the ikeihood under the nu hypothesis with the argest vaue it can possiby obtain. The ikeihood is proportiona to: q q q q q q (3) Under the nu hypothesis we must substitute 1/6 for each of the parameters. To obtain the maximum we substitute the observed ratios for the parameters. Ceary, if we then cacuate the ratios of these two ikeihoods, ony the terms for parameters q 1 and q 2 are reevant since the observed ratios for the others are, in fact, 1/6. The ratio then reduces to (77/100) 77 (123/100) However, for a Bayesian to accept this as overwheming grounds for disbeieving the nu hypothesis has severa consequences. Suppose, for exampe the resuts of roing the die had been 89 ones, 113 twos, 111 threes, 85 fours, 116 fives and 86 sixes. For this second exampe the ratio of the ikeihood is 234 to 1 and the Pearson Fisher c 2 is If, therefore, the Bayesian accepts the first exampe as definitey exposing the nu hypothesis as scarcey credibe (s)he must then either:

5 TWO CHEERS FOR P-VALUES? 197 regard the nu hypothesis as even ess credibe for the second exampe caim that these ratios are not particuary reevant or produce an argument in terms of priors as to why the nu hypothesis is, after a more credibe for the second exampe. What can the cassica statistician say about this exampe? (S)He can point out that twice the og of the ikeihood ratio has approximatey a c 2 distribution, so that the ikeihood ratio c 2 for the first exampe is For the second it is We may now note the foowing points: The ikeihood ratio c 2 is very simiar to the Pearson Fisher c 2. None of these c 2 are significant at the 5% eve. If you accept that for exampe 1 the nu hypothesis has pretty ceary got it bady wrong and if you accept ikeihood as your guide, you wi, whenever you ro a fair die 600 times, come to this concusion with a probabiity in excess of 1/20. Now, it must be conceded that the third point is not necessariy unreasonabe. You may, indeed, beieve the word to be fu of curiousy biased dice. I suspect, however, that many peope wi, at first sight, find the first exampe gives more convincing evidence against the nu hypothesis than the second. The reason they wi do so, is not that they are Bayesians with strange priors, rather than frequentists, but that if they are Bayesians they are bad Bayesians and if they are frequentists, bad frequentists. What I think happens with this exampe is the foowing: the probem is redefined once the data are in. The extremey good fit for the numbers three to six causes their evidence to be mentay dismissed as irreevant, as is the fact that the tota of one and two is exacty as expected for a fair die and the probem, which was originay one of testing the fairness of the die against any possibe departure, is repaced by one of testing its fairness against the specific departure observed. In frequentist terms this is easiy expained: a c 2 with five degrees of freedom is iegay reduced to one with one degree of freedom. In Bayesian terms one coud say that some principe of tota information is being vioated: the 400 ros of the die that give good support to its fairness are ignored and ony the 200 that do not are accepted. Aternativey, the probem may be regarded as being one of using the data twice: once to define the hypotheses and then again to assess them. As Jeffreys s correspondence with Fisher shows, he woud regard the probem as being one of debating, a motion not before the House 28. In fact, one can go further: a Bayesian anaysis woud probaby not concude that the die is biased, especiay if what Cox has caed a pausibe hypothesis 29 and what Berger and Deampady refer to as a point hypothesis 30 is being tested. (Berger and Deampady aso consider a more genera cass of precise hypotheses that incudes point and sma interva hypotheses.) For such a Bayesian hypothesis test, a ump of prior probabiity woud be paced on the die s being perfecty fair and the rest of the probabiity woud be smeared over the rest of the possibiities. It is this smearing that causes the Bayesian significance test to be so conservative compared with the frequentist one. Athough the ikeihood ratio of 208 noted seems particuary impressive, we must not forget that the parameter combination under the aternative is, of course, that corresponding to the maximum ikeihood soution. By definition, for every other parameter combination under the aternative there is ess evidence against the nu and the proportion of the parameter space for which the ratio of ikeihoods is, 1 (that is to say in favour of the nu hypothesis) is enormous. Of course, depending on the prior, some of these combinations woud be beieved unikey. Nevertheess, carefu consideration of this point suggests that Howson and Urbach woud be hard put to come up with a prior that did suggest that the nu hypothesis pretty ceary had got matters wrong in the case of their die without rendering the Bayesian test of significance vunerabe to faiure to detect other curious patterns of departure. Your prior probabiity has to add up to one and you have to spend it wisey. Consider, for exampe, an extremey interesting proposa of IJ Good s regarding the anaysis of data from mutinomia distributions 31. Good suggests that one might use a symmetric Dirichet distribution with parameter k as a prior over the region of the aternative hypothesis for the purpose of constructing a Bayes factor for comparing nu and aternative. He cas such a probabiity mode, a prior conditiona on a given vaue for a parameter, a Type II mode. The probabiity distribution of the data conditiona on a given combination of prior parameters is a Type I mode. As Good points out, in practice, to use such a Type II Bayes factor (the ikeihood ratio might be regarded as the maximum Type I factor) we have to commit ourseves to a particuar vaue of k. We may fee happier by going to an even higher eve, Leve III, at which we aso postuate a prior distribution for k. This woud enabe us to produce a Type III Bayes factor. In practice, however, this may be difficut and we might prefer to examine the Type II Bayes factor instead as a function of k, possiby paying particuar attention to its maximum. Suppose that we have such a Leve II symmetric prior with parameter k for a mutinomia Type I probabiity

6 198 S SENN distribution. Good shows 31 that the Type II Bayes factor, F(k) is given by: t n i 21 N21 F(k) 5 [ P P {1 1 j / k}] / P {1 1 j / (tk)} (4) i51 j51 j51 where n i is the frequency in ce i, t is the number of ces, N is the tota of the frequencies and, by convention, n i 21 P {1 1 ( j / k)} (5) j51 is to be assigned the vaue 1 when n i 5 0 or 1. Figure 1 shows a pot of F(k) against k for the exampe originay considered by Howson and Urbach 2 and for the aternative exampe suggested above. It wi be noted that: Type II Bayes factor Just as was the case for c 2 and ikeihood ratio statistics, and whatever the vaue of k chosen for the prior, the Type II Bayes factor is more against the nu hypothesis for the second exampe than that originay considered by Howson and Urbach. Concentrating on the Howson and Urbach exampe, however, we note that uness k is. 20 then, according to the ogic of the Bayesian significance test, the resuts provide no evidence against the nu hypothesis. For exampe for k # 5.7, the posterior odds of the die s being fair are $ 10 times what they were to start with. The maximum vaue of F(k) is attained at about k 5 88, at which point it equas about 2.5. In other words, if you start with odds of evens that the die is fair, then under the east favourabe vaue of k for the nu hypothesis, your posterior odds on the die being biased wi be about five to two Howson and Urbach k Modified exampe Fig. 1 Good s Type II Bayes factor for the Howson and Urbach die-roing exampe. The statement by Howson and Urbach that the c 2 test has pretty ceary got it bady wrong seems, in the ight of these resuts, rather extreme, to say the east. If that is what they think of the c 2 test there must be ots of Bayesians (those with the wrong vaues of k) who woud get it even more bady wrong than the c 2 test. Of course, if you beieved a priori that this die might be one that some phiosophers, who were panning to show the c 2 test in a bad ight, were intending to ro, then you might be abe to arrange to have a prior with peaks at a 30 possibe combinations (1 and 2, 2 and 1, 1 and 3 and so forth) of probabiities of 77/600 and 123/600, these being the imits that just aren t significant and probabiity zero esewhere (except on the nu). Such a prior woud yied a Bayes factor of about 7.5 against the nu hypothesis for the phiosopher s die. This is impressive, but of course the prior eaves you very vunerabe if other sorts of individuas, say statisticians, are tampering with dice. If you beieved that it was possibe that the phiosophers had actuay fasified the data themseves, then much more impressive odds coud be arranged. For a rea die, however, the c 2 resut observed does not seem exceptiona 22. An exampe of Lindey s The danger the significance test appears to be avoiding in the previous exampe is that of over-reacting to chance events. A forma Bayesian anaysis can, of course, dea with this probem, but it requires updating of a prior estabished before encountering the data and this prior then commits the anayst to a particuar given posterior for every data-set, not just the data-set seen. It appears to be rare that a Bayesian wi actuay commit him or hersef to a singe prior beforehand. More popuar nowadays seems to be what might be described as subjunctive Bayes: a range of different priors are assayed and the posteriors they give rise to are examined. The anaysis is then of the what if form. What if this had been our prior? Why, then this woud be our posterior. This form of statistica anaysis has many attractions, but aso some dangers, and is rather difficut to justify in terms of Bayesian coherence. It seems to be a retreat from pure subjectivism but pure Bayesianism is subjective but not subjunctive. However, Lindey himsef, in a paper that starts with a detaied examination and criticism of the Fisherian test of significance, has pubished his prior for a ady tasting wine 32. Since this is a Bayesian attempt to provide an aternative soution to the sort of probem for which significance tests have famousy been iustrated 33, it seems reasonabe to examine how successfu Lindey s purey subjective Bayesian soution is.

7 TWO CHEERS FOR P-VALUES? 199 Lindey supposes that a ady who has a quaification in wine-tasting caims to be abe to te caret from Caifornian wine with the same grape mix. She is given six pairs of gasses in a bind experiment and asked to distinguish which member of each pair is French and which is Caifornian. Lindey s prior for her probabiity of correct identification is given by f(q) 5 48(1 2 q)(q 2 1 2), 1 2, q, 1. This is a beta distribution, modified to cover the range 1 2, q, 1 and its mode is at The probabiity density function is shown in Figure 2. Lindey then shows how this prior beief may be updated in a Bayesian approach given any particuar resut of the experiment. (He proposes testing the ady with six pairs of gasses.) For exampe, if the ady gets five pairs right and one wrong, then the probabiity distribution for 6 is proportiona to (1 2 q) 2 q 5 (q 2 1 2). This anaysis is, of course, beyond reproach in the sense that if coherence is your guide then, having once expressed the prior, the posterior probabiity statements must be as given by Bayes theorem. Lindey goes further, however, he actuay expresses the hope that the reader wi agree with him regarding the prior. However, my prior, to the extent that I can identify it, is amost the opposite of Lindey s. I think, that either the ady is justified in her beief in her discriminatory powers or she is misguided. If the former is the case, then I beieve that she wi repeat the trick of identifying the correct member of a pair with high probabiity; if not, she is guessing and wi have a probabiity near one haf. Her quaifications merey make the former more ikey than it woud be otherwise. Like Lindey, in a previous examination of this probem 34 I woud aso aow a sma probabiity for her having a fine paate but a poor knowedge, so that she consistenty abes the wrong member of the pair as Caifornian. Thus I require a prior with a considerabe ump around 0.5, a considerabe smear in the vicinity of 0.95 (say) and a smaer smear near Something of the sort is iustrated in Figure 3, where the rectange in the midde is supposed to represent a ump of probabiity on the exact vaue 0.5. (An aternative might be to regard this as a further concentrated smear of probabiity density very cose to 0.5.) Let us imagine a suitabe way of deciding if you can accept Lindey s prior. Suppose that we can arrange to test the ady 20 times over a suitabe period. Just to make it interesting suppose that you wi be given at no cost to you if you can correcty predict which of the foowing mutuay excusive and exhaustive events wi occur. Event A: the ady guesses correcty for 12 to 16 pairs of gasses incusive. Event B: the ady guesses correcty for 0 to 11 or 17 to 20 gasses incusive. If you prefer to bet A, then at east as regards this concrete test, you are with Lindey, whose predictive probabiity of event A shoud be 16 1 N S # 48(1 2 q)(q 2 1 2) ( ) qx X512 1/2 X (1 2 q) N2X dq It is a necessary but not sufficient test of the suitabiity of Lindey s prior for you that you must prefer bet A. If you prefer B, then you cannot accept his prior. A comment of Goodman s In an interesting paper in Statistics in Medicine, Goodman has cacuated the probabiity of repeating a significant resut (at the 5% eve) given a particuar P-vaue given two different sets of assumptions 35. First, that the true difference is the observed difference from the first experiment and second on the assumption that there was an uninformative prior for the treatment effect (6) Prior probabiity density Probabiity of successfu identification Fig. 2 Lindey s prior for a ady tasting wine. Prior probabiity density Probabiity of successfu identification Fig. 3 A possibe aternative prior for the ady tasting wine.

8 200 S SENN prior to the first. For both of these cases, if the P-vaue is 0.05 the probabiity of a repetition is 50%. If the P-vaue is 0.01 the probabiity of a repeated significant resut is 0.73 and 0.66 respectivey. This demonstration is usefu, but one shoud be carefu in regarding it as showing the unsuitabiity of P- vaues. For exampe, Goodman does not mention that this property is shared by Bayesian statements. This is obvious when one considers the history of P-vaues. Onesided P-vaues were cacuated ong before the Fisherian revoution and given a Bayesian interpretation as the probabiity that the true treatment difference was in the opposite direction to that observed given a uniform prior. (For exampe, the probabiity that the treatment was, after a, inferior to pacebo even though the difference in this tria was in favour of treatment.) This means that, under this interpretation, the one-sided P-vaue is the Bayesian probabiity that a tria of infinite size woud produce a treatment effect in the opposite direction to that observed. In other words, there is a 95% probabiity that an infinitey arge tria woud show that treatment was superior to pacebo if the one-sided P-vaue in favour of the treatment is 5%. Such a demonstration, being based on an infinite tria woud be concusive. But the repetition probabiity that Goodman considers differs from this probabiity in two important ways 36. First, the tria to come is not of infinite size and second the probabiity is not for the event that the sign of the treatment effect from the second tria is concordant with the first. Instead, what Goodman requires is the probabiity that the evidence from the second tria taken aone shoud have the same vaue as that from the first. But two trias, both significant at the 5% eve, are more impressive than one and what Goodman is requiring is that a singe significant P-vaue shoud carry with it the near certain promise that a second wi foow. This is, however, a highy undesirabe property. The inferentia vaue of one tria is the vaue of one tria, not the vaue of two. Anticipated evidence is not evidence, nor do we want it to be. To expect that it is, is to make exacty the same mistake that physicians make in saying, the resut was not significant, p , because the tria was too sma. In fact, P-vaues are not unreasonabe objects from a Bayesian perspective, given an uninformative prior, therefore Goodman s demonstration of their behaviour under these circumstances cannot be taken as a reevant (Bayesian) criticism of their behaviour. This is not to say that their use is justified to the Bayesian, athough those who favour subjunctive Bayes (see above) may find a use for them. However, their unreasonabeness depends on an uninformative prior being inappropriate. Changing the prior wi make a posterior statement inappropriate. In this sense, P-vaues can be rendered inappropriate to the Bayesian because posterior statements can be inappropriate, not because P-vaues are not Bayesian statements. In moving away from P-vaues, however, it must be understood that sharp disagreement between Bayesians is possibe. Consider an exampe. Two scientists are jointy invoved in testing a new drug and estabishing its treatment effect, d, where positive vaues of d are good. The variance of the response in this group of patients is known to be about one. Scientist A has a vague prior beief regarding d, which is described by a norma distribution with mean zero and standard deviation 25. Scientist B has exacty the same distribution as regards positive vaues of d, but is ess pessimistic than A as regards the effect of the drug. If it is not usefu, she beieves that it wi have no effect at a. Thus, the two scientists share the same beief that the drug has a positive effect. Given that it has a positive effect, they share the same beief regarding its effect. They share the same beief that it wi not be usefu. They differ ony in beief as to how harmfu it might be. A cinica tria is run with 70 patients per group and the observed vaue of d is found to be d , corresponding to a standardised difference of 1.65 and a onesided P-vaue of about What is the probabiity that the drug does not have a positive effect? The answer is 1/20 (Scientist A) but the answer is aso 19/20 (Scientist B). (Detais avaiabe on request.) Not ony do the two scientists not agree, but from a common beief in the drug s efficacy they have moved in opposite directions. (It is interesting to note that the ess pessimistic of the two scientists ends up being the more sceptica. But one shoud be carefu: utiity has not been incuded in the formuation.) Sampe size: a serious difficuty It needs to be stressed that, athough the rank order correation between P-vaues and ikeihood ratio can be perfect for tests based on continuous statistics, provided the precision is constant, as is, in fact, impied by the Neyman Pearson emma, this is not true where precision varies. This is, of course, we known to Bayesians, but is worth repeating. Consider a cinica tria of a treatment effect D, comparing two groups where it is desired to test H 0 : D 5 0 against H 1 : D 5 d, d. 0 using a test of size a. Let d be the observed difference, assumed normay distributed, and suppose that the known variance is one. In that case the critica vaue, c, of d is: c(n) 5 z a Ö 2/n (7) where z a is such that 1 2 F(z a ) 5 a and F(.) is the norma distribution function. The og of the ratio of the

9 TWO CHEERS FOR P-VALUES? 201 ikeihood under H 1 compared to H 0 as a function of d is given by: L(d) 5 (2n / 4)(d 2 2 2dd ) (8) This is an increasing function of d. Substituting, however, a critica vaue (7) for d in (8) and then ooking at the og of the ratio of ikeihoods as a function of n we have L(n) 5 (2n / 4)(d 2 2 2dz a Ö 2/n ) (9) (9) is not, however, a monotonic function of n. Figure 4 pots both critica vaue c(n) and the og ratio of ikeihoods L(n) for this exampe, for the case where A 5 1. It wi be seen that, athough the critica vaue is a decreasing function of n, L(n) increases at first but then decreases. Further discussion of this can be found in chapter 13 of Statistica Issues in Drug Deveopment. 37 In fact, contrary to what is often stated about the Neyman Pearson theory, it seems most probematic when used to compare two simpe hypotheses. A good discussion of this is given by Howson and Urbach (Ref. 21, chapter 7). This carries over to P-vaues if these are used in comparing the nu hypothesis against a simpe aternative. The consequence of this is that P-vaues cannot be used to compare across sampes of different sizes if the aternative hypothesis is known. As Barnard puts it, P-vaues are usefu in situations where the probems discussed are reativey unstructured (Ref. 16, p. 610). In fact, I cannot think of any case where the aternative hypothesis is known in drug deveopment, athough some simiar inferentia probems can arise in equivaence trias if an attempt is made to appy Neyman Pearson theory for very ow precision (Ref. 37, chapters 15 and 22). Critica vaue or og ratio of ikeihoods Patients per arm Critica vaue Log ratio of ikeihoods Fig. 4 The effect of sampe size on the interpretation of a P-vaue when there is a fixed aternative. One cheer for P-vaues? I found an atar with this inscription. TO THE UNKNOWN GOD (Acts, ch. 17, v. 23) On the whoe it may be admitted that the pedestrian shoud cant a map but... he shoud aways cherish the thriing and secret thought that it may be a wrong. (GK Chesterton) The esson shoud be cear. Tai area probabiities are not appropriate for comparing two precise hypotheses. For comparing what Cox has caed dividing hypotheses 29, hypotheses of the form H 0 : t. 0 and H 1 : t. 0, they may have a use. Indeed, they even have a Bayesian interpretation. A one-sided P-vaue then corresponds to a posterior probabiity that t. 0 given a uniform prior. As a piece of subjunctive Bayesian reasoning this may be of interest. Indeed, it is precisey with this prior in mind that Student, who was a Bayesian, fet that it was important to go to the considerabe abour of tabuating the integra of his distribution, rather than merey providing the formua for its density 38. On the other hand, if you are perfect, any use of any inferentia device other than that of Bayesian updating of your subjective prior cannot be recommended. However, you shoud be aware that phrases such as back to the drawing board are forbidden to you. Furthermore you cannot become perfect. If you are incoherent you remain incoherent: the number of incoherencies can ony grow with time, uness the striking out of priors is acceptabe. However, to accept that striking out of priors is egitimate is to accept that the striking out of posterior statements is aso egitimate (since a priors are aso posteriors with respect to some other experience) and hence that Bayesian updating is not necessary. If, on the other hand you see statistica inference as providing a set of toos, not an a-embracing method, and beieve that being ocay Bayesian can be an exceent thing but that being gobay Bayesian is amost impossibe, then means of checking assumptions become interesting. One such means is the P-vaue and, indeed, the pioneering neo-bayesian IJ Good accepts such a (imited) roe for it 27. It is best regarded, in my view, as the sceptic s concession to the guibe. The guibe concentrates on the particuar, pecuiar pattern observed, as in the die-roing exampe. Good points out that in code-breaking circes such an observed, but not anticipated, pattern was referred to as a kinkus. The psychoogica danger to which the guibe is vunerabe is, effectivey, to adopt the maximum ikeihood soution as the aternative hypothesis. The prior probabiities have not been specified. (And peope tend not to specify their priors. It is my view, however, that attempting to be a subjective Bayesian and not specifying your priors prior

10 202 S SENN to seeing the data can be very dangerous.) Therefore, the option of integrating the ikeihood over the region of the aternative is not avaiabe. A that the sceptic can counter with is a warning as to what the genera consequences wi be of a behaviour that regards this sort of thing as significant. For this purpose, a different type of measure atogether is needed. The P-vaue measure can be regarded as a sort of crude surprise index. Indeed, Matthews himsef uses it in this way without even noticing that he has done so 26. In referring to an experiment of Miikan s and comparing his determination of the charge of the eectron to the modern vaue he says... the discrepancy is so arge that the probabiity of generating it by chance aone is ess than 1 in Matthews s purpose here is to draw attention to the effect of subjectivity in science and he uses a surprising resut to do so, but his one in 10 3 is nothing more nor ess than a P-vaue. But one aso has to be very cear about the cost of using this sort of measure. A P-vaue of for a onesided test corresponds, in the case of a normay distributed test statistic with known variance, to a vaue of about six of the ikeihood ratio 37. This is true whatever the sampe size, but the best-supported aternative hypothesis changes with the sampe size, getting coser and coser to the nu as the sampe size increases. Thus, P-vaues shoud not be considered aone, but in conjunction with point estimates and standard errors or confidence intervas or, even better, ikeihood functions. But a P-vaue of 0.05 does correspond to a very ow standard of evidence. There are two points to make in connection with this. If you wish to be Bayesian then, as Jefrreys 19 pointed out, there is no independent principe of parsimony. Simper modes are to be preferred to more compex ones, because they are inherenty more probabe and the parsimony is refected in, and hence consumed by, the choice of prior probabiity (Ref. 19, p. 119). It thus foows that compex modes are to be preferred to simper ones as soon as they become more probabe. Hence, as a Bayesian, at east one of the threshods of significance you ought to use is a probabiity of 0.5. The probabiity of 0.95 is aready a much more stringent requirement. I mention this because some Bayesians in criticising P-vaues seem to think that it is appropriate to use a threshod for significance of 0.95 of the probabiity of the aternative hypothesis being true 26,30. This makes no more sense than, in moving from a minimum height standard (say) for recruiting poice officers to a minimum weight standard, decaring that since it was previousy 6 foot it must now be 6 stone. The second point is that, contrary to what is usuay impied (see for exampe Matthews 26 ), significant P-vaues are rarey used by medica statisticians to persuade cinicians against their better judgement that treatments are effective. On the contrary, reying on intuition, as for exampe Howson and Urbach stated they did with their die (Ref. 21, p. 136), scientists are a too ready to ca resuts significant. In 8 years in drug deveopment, during which time I was guity of producing hundreds of P-vaues, in addition to thousands of other statistics, I can ony think of two occasions where I found mysef defending a significant P-vaue against a contrary judgement. The first occasion concerned an equivaence tria comparing two doses of a new formuation to a standard. This tria was for interna decisionmaking. If a Bayesian formuation had been used, the presumption of equivaence woud have been much stronger than for the conventiona pacebo-controed tria. Despite apparent equivaence in efficacy there was a significant difference in a key toerabiity measure, to the detriment of the new formuation. The cinicians were convinced that this resut was of no consequence. I cautioned against assuming that it was necessariy a fuke. The further deveopment of this formuation was ater abandoned as a resut of spontaneous adverse toerabiity reports and this was eventuay traced to a manufacturing difficuty that was causing over-dosing. In the second case, a riva company s drug, that had been studied many times and was beieved to have a duration of 6 h showed a (just) significant effect at 12 h compared with a pacebo. The investigator with whom the tria had been paced wanted me to reanayse the tria using a non-parametric procedure to make this mistake go away, caiming that it woud jeopardise his abiity to pubish. (A rare exampe of non-significance being preferred. However, the resut for our product was highy significant anyway.) I refused, pointing out that this was amost certainy a fuke, but had to be interpreted together with a the other trias. This was certainy a case where a Bayesian anaysis woud have been abe to swamp the evidence from this tria, but as the point estimate and standard error were reported from this tria the information needed was there for anyone who wanted to update a prior. (Certainy my posterior distribution woud have been useess to them.) Maybe I over-estimated the scientific maturity of the investigator. Athough the company report produced the anaysis originay proposed in the protoco, the investigator independenty pubished his account using the anaysis that made the unwanted significant P-vaue go away. In summary, my advice regarding the use of P-vaues is as foows. Do not rey on P-vaues aone. Use ikeihood as we. Report point estimates and standard errors (or confidence intervas).

11 TWO CHEERS FOR P-VALUES? 203 Where extra precision is required, it wi be worth making a more stringent requirement for significance rather than consuming a of the increased precision in increased power. Consider aso using Bayesian methods. If you are in the habit of using P-vaues, acquire some famiiarity with their behaviour under the aternative hypothesis If, having used a Bayesian technique and found itte evidence against the nu, you nevertheess find that there is a ow P-vaue, consider if there might be any feature of the probem you have overooked. Finay, I give the ast word to RA Fisher.... tests of significance are based on hypothetica probabiities cacuated from their nu hypothesis. They do not generay ead to any probabiity statement about the word but to a rationa and we-defined measure of reuctance to the acceptance of the hypothesis they test. (Ref. 12, p. 47) Acknowedgements I thank Steve Goodman, Robert Matthews, Professor Jack Good and Professor Dennis Lindey for hepfu discussions on this topic and the referees for hepfu comments on an earier version of this paper. References 1 Had A. A history of probabiity and statistics and their appications before New York: Wiey, Shoesmith E, Arbuthnot, J. In: Johnson, NL, Kotz, S, editors. Leading personaities in statistica sciences. New York: Wiey, 1997: Bernoui, D. Sur e probeme propose pour a seconde fois par Acadamie Royae des Sciences de Paris. In: Speiser D, editor. Die Werke von Danie Bernoui, Band 3, Base: Birkhauser Verag, 1987: Arbuthnot J. An argument for divine providence taken from the constant reguarity observ d in the births of both sexes. Phi Trans R Soc 1710;27: Freeman P. The roe of P-vaues in anaysing tria resuts. Statist Med 1993;12: Anscombe FJ. The summarizing of cinica experiments by significance eves. Statist Med 1990;9: Roya R. The effect of sampe size on the meaning of significance tests. Am Stat 1986;40: Senn SJ. Discussion of Freeman s paper. Statist Med 1993;12: Gardner M, Atman D. Statistics with confidence. Br Med J Matthews R. The great heath hoax. Sunday Teegraph 13 September, Matthews R. Fukes and faws. Prospect 20 24, November Fisher RA. Statistica methods and scientific inference. In: Bennett JH, editor. Statistica methods, experimenta design and scientific inference. Oxford: Oxford University Press, Fisher RA. Letter to Chester Biss 6 October In: Bennett JH, editor Statistica inference and anaysis. Seected correspondenc e of R.A. Fisher. Oxford: Oxford Science Pubications, Lehmann E. Testing statistica hypotheses. (2nd edn) New York: Chapman and Ha, Lancaster HO. Statistica contro of counting experiments. Biometrika 1949;39: Barnard GA. Must cinica trias be arge? The interpretatio n of P-vaues and the combination of test resuts. Statist Med 1990;9: Johnstone DJ. Tests of significance in theory and practice. Statistician 1986;35: Lindey DV. A statistica paradox. Biometrika. 1957;44: Jeffreys H. Theory of probabiit y. (3rd edn) Oxford: Carendon Press, Bartett MS. A comment on D.V. Lindey s statistica paradox. Biometrika 1957;44: Howson C, Urbach P. Scientific reasoning: the Bayesian approach. La Sae: Open Court, Senn SJ. Review of Howson and Urbach, Scientific Reasoning, the Bayesian Approach. Statist Med 1991;10: Gies D. Bayesianism versus fasificationism. Ratio 1990;3: Fisher RA. Statistica methods for research workers. In: Bennett, JH editor. Statistica methods, experimenta design and scientific inference. Oxford: Oxford University Press, Kemp AW, Kemp CD. Wedon s dice data revisited. Am Stat 1991;45: Matthews R Fact versus factions: the uses and abuse of subjectivit y in scientific research, European Science and Environment: Forum, Good IJ. Some ogic and history of hypothesis testing. In: Pitt JC editor. Phiosophica foundations of economics. Dordrecht: Reide, 1981: Jeffreys H. Letter to RA Fisher, 8 May In: Bennett JH editor. Statistica inference and anaysis. Seected correspond - ence of R.A. Fisher. Oxford: Oxford Science Pubications, 1990: Cox DR. The roe of significance tests. Scand J Stat 1977;4: Berger JO, Deampady M. Testing precise hypotheses. Statist Sci 1987;2: Good IJ. A Bayesian significance test for mutinomia distributions (with discussion). J R Stat Soc Ser B 1967;29: Lindey DV. The anaysis of experimenta data: the appreciation of tea and wine. Teach Statist 1993;15: Fisher RA. The design of experiments. In: Bennett JH, editor. Statistica methods, experimenta design and scientific inference. Oxford: Oxford University Press, Lindey DV. A Bayesian ady tasting tea. In: David HA, David HT, editors. Statistics an appraisa. Ames: Iowa State University Press, 1984:

12 204 S SENN 35 Goodman SN. A comment on repication, p-vaues and evidence. Statist Med 1992;11: Senn SJ. A note on p-vaues and repication probabiities. Statist Med. In press. 37 Senn SJ. Statistica issues in drug deveopment. Chichester: Wiey, Student. The probabe error of a mean. Biometrika 1908;VI: Miettinen O. Theoretica epidemioogy. Demar Pubishing. 40 Senn SJ. Suspended judgment: n of 1 trias. Cont Cin Trias 1993;14: Hung HM, O Nei RT, Bauer P, Kohne K. The behavior of the P-vaue when the aternative hypothesis is true. Biometrics 1997;53:11 22.

Use R! Series Editors: Robert Gentleman Kurt Hornik Giovanni G. Parmigiani. For further volumes: http://www.springer.

Use R! Series Editors: Robert Gentleman Kurt Hornik Giovanni G. Parmigiani. For further volumes: http://www.springer. Use R! Series Editors: Robert Genteman Kurt Hornik Giovanni G. Parmigiani For further voumes: http://www.springer.com/series/6991 Graham Wiiams Data Mining with Ratte and R The Art of Excavating Data

More information

Health Literacy Online

Health Literacy Online Heath Literacy Onine A guide to writing and designing easy-to-use heath Web sites Strategies Actions Testing Methods Resources HEALTH OF & HUMAN SERVICES USA U.S. Department of Heath and Human Services

More information

You May Believe You Are a Bayesian But You Are Probably Wrong

You May Believe You Are a Bayesian But You Are Probably Wrong RMM Vol. 2, 2011, 48 66 Special Topic: Statistical Science and Philosophy of Science Edited by Deborah G. Mayo, Aris Spanos and Kent W. Staley http://www.rmm-journal.de/ Stephen Senn You May Believe You

More information

Are Health Problems Systemic?

Are Health Problems Systemic? Document de travai Working paper Are Heath Probems Systemic? Poitics of Access and Choice under Beveridge and Bismarck Systems Zeynep Or (Irdes) Chanta Cases (Irdes) Meanie Lisac (Bertesmann Stiftung)

More information

All Aspects. of a...business...industry...company. Planning. Management. Finance. An Information. Technical Skills. Technology.

All Aspects. of a...business...industry...company. Planning. Management. Finance. An Information. Technical Skills. Technology. A Aspects Panning of a...business...industry...company Management Finance Technica Skis Technoogy Labor Issues An Information Sourcebook Community Issues Heath & Safety Persona Work Habits Acknowedgement

More information

On the relationship between radiance and irradiance: determining the illumination from images of a convex Lambertian object

On the relationship between radiance and irradiance: determining the illumination from images of a convex Lambertian object 2448 J. Opt. Soc. Am. A/ Vo. 18, No. 10/ October 2001 R. Ramamoorthi and P. Hanrahan On the reationship between radiance and irradiance: determining the iumination from images of a convex Lambertian object

More information

Securing the future of excellent patient care. Final report of the independent review Led by Professor David Greenaway

Securing the future of excellent patient care. Final report of the independent review Led by Professor David Greenaway Securing the future of exceent patient care Fina report of the independent review Led by Professor David Greenaway Contents Foreword 3 Executive summary 4 Training structure for the future 6 Recommendations

More information

How to Make Adoption an Affordable Option

How to Make Adoption an Affordable Option How to Make Adoption an Affordabe Option How to Make Adoption an Affordabe Option 2015 Nationa Endowment for Financia Education. A rights reserved. The content areas in this materia are beieved to be current

More information

YOUR GUIDE TO Healthy Sleep

YOUR GUIDE TO Healthy Sleep YOUR GUIDE TO Heathy Seep Your Guide to Heathy Seep NIH Pubication No. 11-5271 Originay printed November 2005 Revised August 2011 Contents Introduction...1 What Is Seep?...4 What Makes You Seep?...7

More information

Relationship Between the Retirement, Disability, and Unemployment Insurance Programs: The U.S. Experience

Relationship Between the Retirement, Disability, and Unemployment Insurance Programs: The U.S. Experience Reationship Between the Retirement, Disabiity, and Unempoyment Insurance Programs The US Experience by Virginia P Reno and Danie N, Price* This artice was prepared initiay for an internationa conference

More information

Some statistical heresies

Some statistical heresies The Statistician (1999) 48, Part 1, pp. 1±40 Some statistical heresies J. K. Lindsey Limburgs Universitair Centrum, Diepenbeek, Belgium [Read before The Royal Statistical Society on Wednesday, July 15th,

More information

Minimum Required Payment and Supplemental Information Disclosure Effects on Consumer Debt Repayment Decisions

Minimum Required Payment and Supplemental Information Disclosure Effects on Consumer Debt Repayment Decisions DANIEL NAVARRO-MARTINEZ, LINDA COURT SALISBURY, KATHERINE N. LEMON, NEIL STEWART, WILLIAM J. MATTHEWS, and ADAM J.L. HARRIS Repayment decisions ow muc of te oan to repay and wen to make te payments directy

More information

How to Make Cognitive Illusions Disappear: Beyond Heuristics and Biases

How to Make Cognitive Illusions Disappear: Beyond Heuristics and Biases Published in: W. Stroebe & M. Hewstone (Eds.). (1991). European Review of Social Psychology (Vol. 2, pp. 83 115). Chichester: Wiley. 1991 John Wiley & Sons Ltd. How to Make Cognitive Illusions Disappear:

More information

Centre for Philosophy of Natural and Social Science. Why There s No Cause to Randomize

Centre for Philosophy of Natural and Social Science. Why There s No Cause to Randomize Centre for Philosophy of Natural and Social Science Causality: Metaphysics and Methods Technical Report 24/04 Why There s No Cause to Randomize John Worrall Editor: Julian Reiss Why There s No Cause to

More information

Ending the Rationality Wars: How to Make Disputes About Human Rationality Disappear

Ending the Rationality Wars: How to Make Disputes About Human Rationality Disappear This paper was published in Common Sense, Reasoning and Rationality, ed. by Renee Elio. New York: Oxford University Press, 2002, pp. 236-268. Ending the Rationality Wars: How to Make Disputes About Human

More information



More information


EVALUATION OF GAUSSIAN PROCESSES AND OTHER METHODS FOR NON-LINEAR REGRESSION. Carl Edward Rasmussen EVALUATION OF GAUSSIAN PROCESSES AND OTHER METHODS FOR NON-LINEAR REGRESSION Carl Edward Rasmussen A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate

More information

The Lovely but Lonely Vickrey Auction

The Lovely but Lonely Vickrey Auction The Lovey but Loney Vickrey Auction Lawrence M. Ausube and Pau Migro 1. Introduction Wiia Vickrey s (1961) inquiry into auctions and counterspecuation arked the first serious attept by an econoist to anayze

More information

Epistemological puzzles about disagreement. Richard Feldman. Disagreements among intelligent and informed people pose challenging epistemological

Epistemological puzzles about disagreement. Richard Feldman. Disagreements among intelligent and informed people pose challenging epistemological 414 Epistemological puzzles about disagreement Richard Feldman Disagreements among intelligent and informed people pose challenging epistemological issues. One issue concerns the reasonableness of maintaining

More information

Who to Follow and Why: Link Prediction with Explanations

Who to Follow and Why: Link Prediction with Explanations Who to Foow and Why: Lin rediction with Expanations Nicoa Barbieri Yahoo Labs Barceona, Spain barbieri@yahoo-inc.com Francesco Bonchi Yahoo Labs Barceona, Spain bonchi@yahoo-inc.com Giuseppe Manco ICAR-CNR

More information



More information

Can political science literatures be believed? A study of publication bias in the APSR and the AJPS

Can political science literatures be believed? A study of publication bias in the APSR and the AJPS Can political science literatures be believed? A study of publication bias in the APSR and the AJPS Alan Gerber Yale University Neil Malhotra Stanford University Abstract Despite great attention to the

More information

Are Disagreements Honest? Tyler Cowen Robin Hanson * Department of Economics George Mason University. August 18, 2004 (First version April 16, 2001.

Are Disagreements Honest? Tyler Cowen Robin Hanson * Department of Economics George Mason University. August 18, 2004 (First version April 16, 2001. Are Disagreements Honest? Tyler Cowen Robin Hanson * Department of Economics George Mason University August 18, 2004 (First version April 16, 2001.) *The authors wish to thank Curt Adams, Nick Bostrom,

More information



More information


COMPUTING MACHINERY AND INTELLIGENCE A. M. Turing (1950) Computing Machinery and Intelligence. Mind 49: 433-460. COMPUTING MACHINERY AND INTELLIGENCE 1. The Imitation Game By A. M. Turing I propose to consider the question, "Can machines

More information

Computing Machinery and Intelligence

Computing Machinery and Intelligence Computing Machinery and Intelligence A. M. Turing 1950 1 The Imitation Game I propose to consider the question, Can machines think? This should begin with definitions of the meaning of the terms machine

More information

The New Development Economics: We Shall Experiment, but How Shall We Learn? October 2008 RWP08-055

The New Development Economics: We Shall Experiment, but How Shall We Learn? October 2008 RWP08-055 Faculty Research Working Papers Series The New Development Economics: We Shall Experiment, but How Shall We Learn? Dani Rodrik John F. Kennedy School of Government - Harvard University October 2008 RWP08-055

More information

An Introduction to Regression Analysis

An Introduction to Regression Analysis The Inaugural Coase Lecture An Introduction to Regression Analysis Alan O. Sykes * Regression analysis is a statistical tool for the investigation of relationships between variables. Usually, the investigator

More information

Why I Am Not a Bayesian*

Why I Am Not a Bayesian* THEORIES OF CONFIRMATION objective mark by which, in actual cases, analytically true hypotheses may be picked out. The body of hypotheses by which a theoretical term is introduced is generally lost in

More information

The InStat guide to choosing and interpreting statistical tests

The InStat guide to choosing and interpreting statistical tests Version 3.0 The InStat guide to choosing and interpreting statistical tests Harvey Motulsky 1990-2003, GraphPad Software, Inc. All rights reserved. Program design, manual and help screens: Programming:

More information