Teaching Bayesian Reasoning in Less Than Two Hours


Journal of Experimental Psychology: General, 2001, Vol. 130, No. 3, 380–400. Copyright 2001 by the American Psychological Association, Inc. 0096-3445/01/$5.00 DOI: 10.1037//0096-3445.130.3.380

Peter Sedlmeier, Chemnitz University of Technology
Gerd Gigerenzer, Max Planck Institute for Human Development

The authors present and test a new method of teaching Bayesian reasoning, something about which previous teaching studies reported little success. Based on G. Gigerenzer and U. Hoffrage's (1995) ecological framework, the authors wrote a computerized tutorial program to train people to construct frequency representations (representation training) rather than to insert probabilities into Bayes's rule (rule training). Bayesian computations are simpler to perform with natural frequencies than with probabilities, and there are evolutionary reasons for assuming that cognitive algorithms have been developed to deal with natural frequencies. In 2 studies, the authors compared representation training with rule training; the criteria were an immediate learning effect, transfer to new problems, and long-term temporal stability. Rule training was as good in transfer as representation training, but representation training had a higher immediate learning effect and greater temporal stability.

Statistical literacy, like reading and writing, is indispensable for an educated citizenship in a functioning democracy, and the dissemination of statistical information in the 19th and 20th centuries has been linked to the rise of democracies in the Western world (Porter, 1986). Interest in statistical information such as population figures has been common among political leaders for centuries (e.g., Bourguet, 1987). The willingness to make economic and demographic numbers public rather than to treat them as state secrets, however, is of recent origin: The avalanche of printed statistics after about 1820 both informed the public and justified governmental action to the public (Krüger, Daston, & Heidelberger, 1987).
Nevertheless, unlike reading and writing, statistical literacy, the art of drawing reasonable inferences from such numbers, is rarely taught (e.g., Garfield & Ahlgren, 1988; Shaughnessy, 1992). The result of this has been termed "innumeracy" (Paulos, 1988). In this article, we address the question of how best to teach statistical literacy. We focus on the special case of Bayesian inference with binary hypotheses and binary information (for results of training in reasoning about other kinds of statistical tasks, see Sedlmeier, 1999, 2000). Here are two examples to which this form of statistical inference applies. First, consider the case of a 20-year-old man from Dallas who had a routine HIV test (Gigerenzer, 1998). The test result was positive; the young man assumed this meant he was infected with the virus and was plagued by thoughts of suicide.

Peter Sedlmeier, Department of Psychology, Chemnitz University of Technology, Chemnitz, Germany; Gerd Gigerenzer, Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Berlin, Germany. This research was supported by a Feodor Lynen Stipend of the Humboldt Foundation as well as a Habilitationsstipendium of the Deutsche Forschungsgemeinschaft (awarded to Peter Sedlmeier) and a UCSMP Mathematics Project grant from the University of Chicago (awarded to Gerd Gigerenzer). We are especially grateful to Jim Magnusson and Tom McDougal, who helped the project get started, and to Brad Pasanek, Nicola Kornherr, Ursel Dohme, and Gregor Caregnato for their assistance in the training studies. We thank Donna Alexander, Berna Eden, Dan Goldstein, Ralph Hertwig, and Anita Todd for comments on an earlier version of this article. Correspondence concerning this article should be addressed to Peter Sedlmeier, Department of Psychology, Chemnitz University of Technology, 09107 Chemnitz, Germany, or to Gerd Gigerenzer, Max Planck Institute for Human Development, Lentzeallee 94, 14195 Berlin, Germany. Electronic mail may be sent to peter.sedlmeier@phil.tu-chemnitz.de or to gigerenzer@mpib-berlin.mpg.de.
But what is the probability that he really has the virus, given a positive test? Or consider the case of Alan Dershowitz, a Harvard professor and advisor to the O. J. Simpson defense team. He stated on U.S. television that only about 0.1% of wife batterers actually murder their wives and claimed that therefore evidence of abuse and battering should not be admissible in a murder trial. But what is the probability that the husband was the murderer, given that he battered his wife and the wife was killed (Good, 1995; Koehler, 1997)?

Bayesian Inference

Our goal is to design an effective method of teaching Bayesian inference. This goal might appear to be doomed to failure for two reasons. First, a large body of experimental results suggests that Bayesian inference is alien to human inference; second, a small number of studies actually attempting to teach people Bayesian reasoning met with little or no success. These two reasons need to be addressed in more detail.

Since the pioneering work of Ward Edwards and his colleagues, an avalanche of experimental studies has investigated whether people reason according to Bayes's rule (for a summary, see Koehler, 1996). Edwards's (1968) major finding was "conservatism," that is, that participants overweighted base rates. In the 1970s, however, Kahneman and Tversky (1972) argued that "in his evaluation of evidence, man is apparently not a conservative Bayesian: he is not a Bayesian at all" (p. 450). Neglect rather than overweighting of base rates became the message of their heuristics-and-biases program in the 1970s and 1980s. "The genuineness, the robustness, and the generality of the base-rate fallacy are matters of established fact" (Bar-Hillel, 1980, p. 2). These demonstrations that human inference deviated radically from Bayesian inference were not confined to laboratory studies; some experts conducted studies in the field and reported similar results. For instance, Eddy (1982) asked physicians to estimate the probability that a woman with a positive mammogram actually has breast cancer, given a base rate of 1% for breast cancer, a hit rate of about 80%, and a false-alarm rate of about 10%. He reported that 95 of 100 physicians estimated the probability that she actually has breast cancer to be between 70% and 80%, whereas Bayes's rule gives a value of about 7.5%. Such systematic deviations from Bayesian reasoning have been called "cognitive illusions," analogous to stable and incorrigible visual illusions (von Winterfeldt & Edwards, 1986; for a discussion of the analogy, see Gigerenzer, 1991). If the analogy between cognitive illusions and visual illusions holds, the teaching of statistical reasoning has little hope of success.

This conclusion seems to be confirmed by the results of the few studies that have attempted to teach Bayesian inference, mostly by means of corrective feedback. Peterson, DuCharme, and Edwards (1968) repeatedly showed their participants binomial sampling distributions to correct their "conservative" judgments. Yet this training did very little to reduce conservatism in further judgments of the same type. Schaefer's (1976) statistically well-trained participants received corrective feedback on their probability estimates and also showed practically no training effect. Lindeman, van den Brink, and Hoogstraten (1988) gave corrective feedback on participants' solutions of problems like those used by Kahneman and Tversky (1973). No transfer effect was found in the test phase. Finally, Fong, Lurigio, and Stalans (1990) trained participants on the "law of large numbers" rather than on what they called the "base-rate principle" and thereby trained Bayesian inference only indirectly; this training enhanced the use of base-rate information in only one of several experimental conditions.
In these studies, training had little or no success.¹ The negative conclusions of the heuristics-and-biases program (Kahneman & Tversky, 1996) and the meager results of the teaching studies seem to suggest to many what Gould (1992) so bluntly stated: "Tversky and Kahneman argue, correctly I think, that our minds are not built (for whatever reason) to work by the rules of probability" (p. 469).

Bayesian Algorithms Depend on Information Format

In the face of these results, there seems to be little hope for a successful method of teaching Bayesian inference, and statistical reasoning in general. And we would not have tried had there not been two novel results, both theoretical and empirical (Gigerenzer & Hoffrage, 1995). To understand the novelty of the theoretical results, one needs to recall that research on statistical reasoning has focused on whether cognitive algorithms correspond to the laws of statistics or probability (as Piaget & Inhelder, 1951/1975, claimed for children aged 11 and older) or to simple nonstatistical rules of thumb, as Kahneman and Tversky (1996) claim. However, to discuss human inference only in terms of "what kind of rule?" is incomplete, because cognitive algorithms work on information, and information always needs a representation (Marr, 1982). Take numerical information and the algorithms in a pocket calculator as an example. Numerical information can be represented by the Arabic system, the Roman system, and the binary system, among others. These representations are mathematically equivalent (an isomorphic mapping exists), but they are not equivalent for a calculator or a mind. The algorithms of pocket calculators are tuned to Arabic numbers as input data and would perform badly if one entered binary numbers. The human mind seems to have evolved and learned analogous preferences for particular formats. Contemplate, for a moment, long division with Roman numerals. The argument that cognitive algorithms are tuned to particular formats of numerical information connects cognition with the environment and can be applied to Bayesian inference.
Assume that some capacity or algorithm for inductive inference has been built up in animals and humans through evolution. To what information format would such an algorithm be tuned? It certainly would not be tuned to percentages and probabilities (as in the typical experiments on cognitive illusions), because these took millennia of literacy and numeracy to evolve as tools of communication. Mathematical probability and percentages are, after all, comparatively recent developments (Gigerenzer et al., 1989). Rather, in an illiterate world, the input format would be natural frequencies, acquired by natural sampling (see below). The crucial theoretical results are (a) that Bayesian computations are simpler when information is represented in natural frequencies than when it is represented in probabilities, percentages, or relative frequencies and (b) that natural frequencies seem to correspond to the format of information humans have encountered throughout most of their evolutionary development (Cosmides & Tooby, 1996; Gigerenzer, 1994, 1998; Gigerenzer & Hoffrage, 1995; Kleiter, 1994).

Let us illustrate the concept of natural frequencies, and how they facilitate computations, with the mammography problem introduced earlier, in the form in which it was used in our training study:

A reporter for a women's monthly magazine would like to write an article about breast cancer. As a part of her research, she focuses on mammography as an indicator of breast cancer. She wonders what it really means if a woman tests positive for breast cancer during her routine mammography examination. She has the following data: The probability that a woman who undergoes a mammography will have breast cancer is 1%. If a woman undergoing a mammography has breast cancer, the probability that she will test positive is 80%. If a woman undergoing a mammography does not have cancer, the probability that she will test positive is 10%. What is the probability that a woman who has undergone a mammography actually has breast cancer if she tests positive?
¹ There were also attempts to improve Bayesian reasoning by focusing participants' attention on certain parts of Bayes's formula. Fischhoff, Slovic, and Lichtenstein (1979, Study 1) tried to increase participants' sensitivity to the impact of base rates by varying the base rates of a Bayesian problem within the same individual, but without giving feedback on participants' solutions. This manipulation had almost no generalizing effect on a second task. In three experiments, Fischhoff and Bar-Hillel (1984) examined the effect of different focusing techniques on performance in Bayesian inference tasks. They found that participants took into account the information to which the experimenters called their attention, but they did so equally for relevant and irrelevant information. In a recent study, Wolfe (1995, Experiment 3) found comparable results.

The numerical information in the mammography problem is represented in terms of single-event probabilities, that is, in a probability format. The three pieces of information are the base rate p(cancer) = .01, the hit rate p(positive | cancer) = .80, and the false-alarm rate p(positive | no cancer) = .10. The task is to estimate the posterior probability p(cancer | positive). The Bayesian algorithm for computing the posterior probability from the probability format amounts to solving the following equation:

$$
p(\text{cancer} \mid \text{positive}) = \frac{p(\text{cancer})\,p(\text{positive} \mid \text{cancer})}{p(\text{cancer})\,p(\text{positive} \mid \text{cancer}) + p(\text{no cancer})\,p(\text{positive} \mid \text{no cancer})} = \frac{.01 \times .80}{.01 \times .80 + .99 \times .10} = .075 \tag{1}
$$

Both laypeople and physicians have great difficulties with Bayesian inference when information is given in a probability format (e.g., Abernathy & Hamm, 1995; Dowie & Elstein, 1988). For instance, Hoffrage and Gigerenzer (1998; Gigerenzer, 1996) tested 48 physicians on four standard diagnostic problems, including mammography. When information was presented in terms of probabilities, only 10% of the physicians reasoned consistently with Bayes's rule. Gigerenzer, Hoffrage, and Ebert (1998) studied how AIDS counselors explain to a low-risk client what his chances are that he actually has the virus if he tests positive. As an assumed client, one of the authors visited 20 public health centers in Germany to have 20 counseling sessions and HIV tests. All the counselors communicated the risks in probabilities and percentages (rather than in natural frequencies; see below) and consistently overestimated the posterior probability of having the virus given a positive test (… of 20 counselors estimated the probability as 99.9% or higher, whereas a reasonable estimate is about 50%), and some counselors even gave inconsistent probability judgments without noticing.

Do these and similar results imply that people are not Bayesians? As the pocket calculator example illustrates, such a conclusion may be unwarranted. Let us now change the format of information from probabilities and percentages to natural frequencies.
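Before switching formats, Equation 1 can be checked numerically. The following is a minimal Python sketch of Bayes's rule in the probability format; the function name `posterior_from_probabilities` is our own illustrative choice and is not part of the authors' tutorial software.

```python
def posterior_from_probabilities(base_rate: float, hit_rate: float,
                                 false_alarm_rate: float) -> float:
    """Bayes's rule with single-event probabilities (Equation 1).

    base_rate        -- p(H), e.g. p(cancer)
    hit_rate         -- p(D | H), e.g. p(positive | cancer)
    false_alarm_rate -- p(D | not H), e.g. p(positive | no cancer)
    """
    p_not_h = 1.0 - base_rate                 # p(no cancer)
    joint_true = base_rate * hit_rate         # p(cancer) * p(positive | cancer)
    joint_false = p_not_h * false_alarm_rate  # p(no cancer) * p(positive | no cancer)
    return joint_true / (joint_true + joint_false)


# Mammography problem, probability format as stated in the text.
posterior = posterior_from_probabilities(0.01, 0.80, 0.10)
print(f"p(cancer | positive) = {posterior:.3f}")  # p(cancer | positive) = 0.075
```

Every step here manipulates normalized probabilities: the complement p(no cancer) must be derived, and two products must be formed and summed before the division.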
Natural frequencies represent numerical information in terms of frequencies as they can actually be experienced in a series of events. More technically, natural frequencies are frequencies that have not been normalized with respect to the base rates; that is, they still carry information about base rates (Gigerenzer & Hoffrage, 1995, 1999):²

A reporter for a women's monthly magazine would like to write an article about breast cancer. As a part of her research, she focuses on mammography as an indicator of breast cancer. She wonders what it really means if a woman tests positive for breast cancer during her routine mammography examination. She has the following data: Ten of every 1,000 women who undergo a mammography have breast cancer. Eight of every 10 women with breast cancer who undergo a mammography will test positive. Ninety-nine of every 990 women without breast cancer who undergo a mammography will test positive. Imagine a new representative sample of women who have had a positive mammogram. How many of these women would you expect to actually have breast cancer?

What is the Bayesian algorithm when the information is presented in natural frequencies? There are 8 women with positive tests and breast cancer (P & C) and 99 women with positive tests and no breast cancer. Thus, the proportion of women with breast cancer among those who test positive is 8 out of 107 (8 + 99). Expressed as a probability, one gets

$$
p(\text{cancer} \mid \text{positive}) = \frac{\#(P \,\&\, C)}{\#P} = \frac{8}{107} = .075 \tag{2}
$$

Thus, Bayesian computations are simpler when the information is represented in a frequency format (i.e., natural frequencies) than in a probability format (Gigerenzer & Hoffrage, 1995). In the frequency format, one can immediately "see" the answer: About 8 of 107 women who test positive will have cancer. The general point here is that Bayesian algorithms depend on the information format. Note that the two information formats, probability and frequency, are mathematically equivalent, and so are the two equations; but the Bayesian algorithms are not computationally and psychologically equivalent.
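The natural-frequency route of Equation 2 reduces to counting. The sketch below is our illustration, not the authors' program: the hypothetical helper `natural_frequencies` translates the three rates into counts for a reference class of a chosen size (rounding to whole persons, which is an assumption of the sketch), and the posterior is then true positives divided by all positives.

```python
from typing import NamedTuple


class NaturalFrequencies(NamedTuple):
    with_condition: int     # e.g., 10 women with breast cancer
    without_condition: int  # e.g., 990 women without breast cancer
    true_positives: int     # cancer and positive test, #(P & C)
    false_positives: int    # no cancer but positive test


def natural_frequencies(total: int, base_rate: float, hit_rate: float,
                        false_alarm_rate: float) -> NaturalFrequencies:
    """Split a reference class into whole-person counts.

    Python's round() rounds exact halves to even; none of the
    examples below produce an exact half.
    """
    with_c = round(total * base_rate)
    without_c = total - with_c
    return NaturalFrequencies(
        with_condition=with_c,
        without_condition=without_c,
        true_positives=round(with_c * hit_rate),
        false_positives=round(without_c * false_alarm_rate),
    )


def posterior_from_counts(freq: NaturalFrequencies) -> float:
    # Equation 2: #(P & C) / #P
    return freq.true_positives / (freq.true_positives + freq.false_positives)


mammography = natural_frequencies(1000, 0.01, 0.80, 0.10)
print(mammography)
positives = mammography.true_positives + mammography.false_positives
print(f"{mammography.true_positives} of {positives} positives have cancer")
print(round(posterior_from_counts(mammography), 3))
```

With a reference class of 1,000 the counts are 10, 990, 8, and 99, and the posterior is 8/107 ≈ .075. With a reference class of only 100, the same rates would require fractional persons (1 woman with cancer, 0.8 true positives); forcing whole counts gives 1/11 ≈ .091 instead, which shows why the size of the reference class matters.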
Consistent with the theoretical result that Bayesian algorithms are simpler to use with natural frequencies than with the widely used probabilities, and with the ecological thesis that, if the mind has evolved Bayesian algorithms, these are likely to be tuned to natural frequencies, experimental studies have shown that people are more likely to use Bayesian reasoning with natural frequencies. Gigerenzer and Hoffrage (1995) tested laypeople on Bayesian inference problems such as the mammography problem and found that, in every single one, Bayesian reasoning occurred more often when probabilities were replaced with natural frequencies (the two formats shown earlier), with an average increase in Bayesian solutions from 16% to 46%. Bayesian reasoning was measured both by process analysis and by outcome analysis. Similar results with laypeople were found by Christensen-Szalanski and Beach (1982) and Cosmides and Tooby (1996). Hoffrage and Gigerenzer (1998) tested physicians with an average of … years of professional experience and found that natural frequencies improve "insight" in physicians to about the same extent as in laypeople. As mentioned earlier, with probabilities, physicians found the Bayesian answer in only 10% of the cases; when the same information was represented in natural frequencies, this number went up to 46%. We applied these theoretical and empirical results when designing a tutorial program for teaching Bayesian reasoning, focusing on everyday situations rather than on the abstract world of "urns and balls."

² Natural frequencies must not be confused with frequencies that have been normalized with respect to the base rates. For instance, the information in the mammography problem can be expressed in relative frequencies that are normalized with respect to the base rates: a base rate of .01, a hit rate of .80, and a false positive rate of .10. Also, absolute frequencies can be normalized: a base rate of 1 in 100, a hit rate of 80 in 100, and a false positive rate of 10 in 100.
Normalized frequencies, like probabilities or percentages, are normalized numbers that no longer carry information about natural base rates (e.g., about the base rate of breast cancer). They do not facilitate Bayesian reasoning.
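A small sketch can illustrate why normalized numbers do not afford the same shortcut. Below, the simple "true positives divided by all positives" ratio is applied both to natural frequencies and to the normalized absolute frequencies described above. This is our own illustration, not an analysis reported in the study; the function name `count_ratio` is hypothetical.

```python
def count_ratio(true_positive_count: float, false_positive_count: float) -> float:
    """Share of true positives among all positives."""
    return true_positive_count / (true_positive_count + false_positive_count)


# Natural frequencies: 8 of 10 women with cancer and 99 of 990 without test positive.
natural = count_ratio(8, 99)

# Normalized frequencies: "80 in 100" and "10 in 100" each refer to a different
# reference class of 100, so the base rate (1 in 100) is no longer in the numbers.
naive_normalized = count_ratio(80, 10)

# Getting the right answer back requires reweighting by the base rates,
# which is the probability-format computation of Equation 1.
reweighted = count_ratio(80 * 0.01, 10 * 0.99)

print(f"natural:          {natural:.3f}")           # 0.075
print(f"naive normalized: {naive_normalized:.3f}")  # 0.889
print(f"reweighted:       {reweighted:.3f}")        # 0.075
```

The same one-step ratio that works for natural frequencies gives .889 when applied to normalized frequencies, because the normalization has removed the base-rate information that the reweighting step has to put back.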

Teaching Bayesian Inference

Teaching representations is an alternative to the traditional program of teaching rules, that is, teaching rules without simultaneously teaching representations (e.g., Arkes, 1981). A rule-training program would try to teach Bayesian reasoning by first explaining Bayes's rule in its abstract form and then explaining how to insert single-event probabilities into the rule (Falk & Konold, 1992). We are not aware of any studies on rule training for Bayesian reasoning, but rule-training programs exist for other statistical rules, such as the "law of large numbers," or more precisely, for recognizing the impact of sample size (see Sedlmeier & Gigerenzer, 1997, 2000).³ For instance, Fong and Nisbett (1991) proposed rule training for the law of large numbers and found moderate improvement over an untrained control; when generalization to a new domain was tested after 2 weeks, this moderate effect was considerably diminished (Ploger & Wilson, 1991; Reeves & Weisberg, 1993).

We propose an alternative method: teaching Bayesian reasoning by showing people how to construct frequency representations. For this purpose, we designed two versions of frequency representations. One, the frequency grid, has been suggested as a means to make statistical tasks easier to understand (e.g., Cole, 1988); the second, the frequency tree, is a variant of a tree structure often used in decision analysis. In the frequency grid tutorial, participants learned how to construct frequency representations by means of grids, and in the frequency tree tutorial, they learned to construct frequency representations by means of trees. We also designed a rule-training tutorial as a control, in which participants were taught how to insert probabilities into Bayes's formula. All three tutorials were implemented as a computer program on Macintosh computers, written in Macintosh Common Lisp (Apple Computer, Inc., 1992).
In all conditions, the basic training mechanism was to have participants translate the information in the problem text into a given format, that is, Bayes's formula, the frequency grid, or the frequency tree, and to have them practice with that format. The training procedure for each of the tutorials had two parts. The first part guided participants through two inferential tasks: the sepsis problem (see below) and the mammography problem. In the rule-training tutorial, participants were instructed how to insert probability information into Bayes's formula. In the two tutorials that taught frequency representations, the system showed participants how to translate probability information into either a frequency grid or a frequency tree. After participants had been guided through each step in Part 1, the second part of the training required them to solve eight additional problems on their own with step-by-step feedback. The system asked them to solve each step before going on to the next one. If participants had difficulties following the requests or made mistakes, the system provided immediate help or feedback. If, for instance, the user was required to enter numbers in the formula or the frequency tree and the numbers entered were not correct, the system gave immediate feedback. The user always had a choice between trying again and letting the system perform the corrections. If the user decided to try again, the system supplied some hints that were specific to the format used. If, after several corrective interventions, the user was still unable to fill in the numbers correctly and did not want to try again, the system inserted the correct numbers into the respective nodes (frequency tree) or slots (formula). For all training procedures, the help was sufficient to ensure that all participants would solve all problems correctly and complete the training.
We now describe the rule-training procedure and the two frequency representation training procedures (see Sedlmeier, 1997, for a detailed description of an extended version of the system; for a program that provides a comprehensive treatment of basic probability theory and includes Bayesian reasoning as a part, see Sedlmeier & Köhlers, 2001).

Rule Training

During training, and in all three tutorials, participants saw three windows on the screen. The problem window, located in the top right portion (see Figure 1), displayed the problem text, in this case, the text of the sepsis problem. The tutor window (white area) provided the explanations and instructions and asked the user to perform certain actions. The representation window (left half of Figure 1) performed demonstrations and allowed the user to manipulate its contents. Figure 1 shows a screen at the beginning of the first part of the rule-training procedure. Just before, the program had explained to the participant that Bayes's formula allows one to calculate the probability that the walk-in patient who displays the symptoms mentioned in the problem has sepsis. The program had also mentioned that to calculate that probability, one needs p(H), p(not H), p(D | H), and p(D | not H). H is short for hypothesis, such as sepsis, and D stands for data, such as the presence of the symptoms. At the current point in the training, the system begins to explain how to extract the numerical information from the problem text. The tutor window in Figure 1 explains which information in the problem text corresponds to p(H), the base rate. In the next step (not shown), the base-rate information is "translated" into a component of Bayes's formula. In this step, the empty slot in the representation window is filled with the base-rate value of .10, and the program explains how the next piece of information, p(not H), is calculated from the value of p(H) by subtracting it from 1.
Then, analogously, the program explains which parts of the problem text correspond to p(D | H), the hit rate, and p(D | not H), the false-alarm rate, and inserts the respective probabilities, that is, .80 (the probability of the symptoms given sepsis) and .10 (the probability of the symptoms given no sepsis). When the slots for the four pieces of information are filled, the system creates an initially empty "frame" for Bayes's formula and demonstrates how the probabilities are to be inserted into the frame. Inserting the correct numbers into that frame and calculating the result gives the posterior probability p(sepsis | symptoms). Figure 2 shows this final state of the translation process for the mammography problem. All pieces of information needed in Bayes's formula have been extracted from the problem text and inserted into the respective slots (Figure 2, upper left). Then, the frame for Bayes's formula has been filled with the respective probabilities (lower left), and the result has been calculated (lower right). In the second part of the training procedure, the upper part of the representation window, including the formula, was shown immediately. The frame for Bayes's formula appeared only when all the probabilities had been correctly filled in.

³ Other authors have provided advice on how to reason the Bayesian way but have not reported training studies. Such advice includes structuring the problem, modeling prior probabilities explicitly, stressing the statistical nature of base-rate information, clarifying causal chains, and providing individuating information about base rates (e.g., von Winterfeldt & Edwards, 1986).

Figure 1. Rule training (Bayes's rule). Screen shot is from the beginning of the first phase of training (sepsis problem).

Frequency Grid

In a frequency grid, each square represents one case. Figure 3 shows a screen at the beginning of the first part of the frequency grid training procedure. Just before, the program had informed the participants that the empty squares in Figure 3 represent 100 walk-in patients. Again, the tutor window explains which part of the problem text corresponds to the base rate. In the next step (not shown), 10 of the 100 squares are shaded to represent the 10% of walk-in patients who suffer from sepsis. Eventually, circled pluses ("positives") are added to 8 of the 10 shaded squares (corresponding to the hit rate of 80%) and to 9 of the 90 nonshaded squares (corresponding to the false-alarm rate of 10%). Figure 4 shows the point in training when all the information necessary to solve the sepsis problem has been filled in on the frequency grid. The number of circled pluses in the shaded squares divided by the number of all circled pluses gives the desired posterior probability, that is, p(sepsis | symptoms).
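The frequency grid procedure for the sepsis problem can be mimicked in a few lines. In this sketch (our illustration only; the actual tutorial was written in Macintosh Common Lisp and is not reproduced here), each cell is one case, `#` marks a shaded square (sepsis) and `+` a circled plus (symptoms present). The text rendering and the helper names `build_grid`, `posterior_from_grid`, and `render` are assumptions.

```python
from typing import List, Tuple

Cell = Tuple[bool, bool]  # (shaded = has the condition, plus = shows the symptoms)


def build_grid(n_cases: int, n_condition: int, n_hits: int,
               n_false_alarms: int) -> List[Cell]:
    """Shade the first n_condition squares, then add the circled pluses."""
    cells = []
    for i in range(n_cases):
        shaded = i < n_condition
        if shaded:
            plus = i < n_hits                        # pluses on shaded squares
        else:
            plus = i - n_condition < n_false_alarms  # pluses on nonshaded squares
        cells.append((shaded, plus))
    return cells


def posterior_from_grid(cells: List[Cell]) -> float:
    """Pluses in shaded squares divided by all pluses."""
    plus_in_shaded = sum(1 for shaded, plus in cells if shaded and plus)
    all_plus = sum(1 for _, plus in cells if plus)
    return plus_in_shaded / all_plus


def render(cells: List[Cell], width: int) -> str:
    symbol = {(True, True): "#+", (True, False): "# ",
              (False, True): ".+", (False, False): ". "}
    rows = [" ".join(symbol[c] for c in cells[r:r + width])
            for r in range(0, len(cells), width)]
    return "\n".join(rows)


# 10 x 10 grid: 10 of 100 with sepsis, 8 of 10 hits, 9 of 90 false alarms.
sepsis = build_grid(100, 10, 8, 9)
print(render(sepsis, width=10))
print(f"p(sepsis | symptoms) = {posterior_from_grid(sepsis):.3f}")  # 0.471
```

For the mammography problem, `build_grid(1000, 10, 8, 99)` rendered with `width=50` produces the 50 × 20 layout, and the same division yields 8/107 ≈ .075.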
Participants could choose between two grid sizes (100 and 1,000 cases) and were encouraged to select the one that best represented the information given in a problem. For instance, for the mammography problem, the 50 × 20 grid is superior to the 10 × 10 grid because, in the latter, one would have to deal with "rounded" persons. Figure 5 shows the completely filled-in frequency grid for the mammography problem, where the number of circled pluses in the shaded squares (8) divided by the number of all circled pluses (107) gives the desired posterior probability p(cancer | positive test) = .075.

Frequency Tree

A frequency tree (Figures 6 and 7) does not represent individual cases but constructs a reference class (the total number of observations) that is broken down into four subclasses. The top node shows the size of the reference class (100 in Figure 6 and 1,000 in Figure 7), which can be chosen freely in the program. In Figure 6, the program explains how one obtains from the problem text the base-rate frequency of the walk-in patients to be inserted in the "sepsis" node (left middle node). In the next step (not shown), 10 is inserted in the sepsis node, and the program explains how one obtains the number to be inserted in the "no sepsis" node (by subtracting the 10 patients with sepsis from all 100 patients). Eventually, the 10 patients in the sepsis node are divided into 8 (80% of 10) showing the symptoms and 2 not showing the symptoms (left two lower nodes), and the 90 patients in the no sepsis node are divided into 9 (10% of 90) showing the symptoms and 81 not showing the symptoms. The posterior probability p(sepsis | symptoms) is calculated by dividing the number in the left black node, the number of true positives, by the sum of the numbers in both black nodes, the total number of positives. Figure 7 shows the complete frequency tree for the mammography problem: The two middle nodes specify the base-rate frequencies, that is, the number of cases for which the hypothesis is true (10 women with breast cancer) and the number of cases for which the hypothesis is false (990 women without breast cancer). The four nodes at the lowest level split up the base-rate frequencies according to the diagnostic information (the result of the mammography). The posterior probability p(cancer | positive test) is again calculated by dividing the number in the left black node, the true positives, by the sum of the numbers in both black nodes, the total number of positives.

Figure 2. Rule training (Bayes's rule). Screen shot is from the end of the first phase of training (mammography problem).

Evaluation of Training Effectiveness

To measure the effect of representation training or rule training, the test problems were always given in a probability format. Before participants started to work on the problems, the program was explained to them, and we made sure that they understood all instructions. Figure 8 shows an example of a test problem presented to the participants, the cab problem (Tversky & Kahneman, 1982). The problem text and the question were always in two different windows.
Participants did not have to do the calculations; they were encouraged simply to type in their solution as a formula. A formula consisted of numbers, arithmetic operators, and parentheses. This answer format was used to minimize errors due to faulty calculations. To avoid a systematic effect of problem difficulty on training results, the order of problems was systematically varied between participants in Study 1a and completely counterbalanced according to a Latin square in Studies 1b and 2. In all studies, training effectiveness was measured by comparing participants' solution rates immediately after the training (Test 2), about a week after the training (Test 3), and 1 to 3 months after the training (Test 4) with the solution rates at baseline (Test 1).

Two Scoring Criteria

We used two criteria to classify an answer as a Bayesian solution, one strict and one liberal. For the strict criterion, the posterior probability calculated by the participant, either in the form of a numerical value or a formula, had to match exactly the value obtained by Bayes's rule, rounding up or down to the next digit (percentage point). This measure might, however, obscure the fact that participants gained some "ballpark" insight that enabled them to produce a sound but inexact response. To take this possibility into account, we also used a more liberal scoring criterion, which counted a participant's estimate as a Bayesian solution when it came within ±5 percentage points of the value obtained by Bayes's rule. The liberal criterion, however, increased the possibility that non-Bayesian reasoning is mistaken for Bayesian reasoning. As Gigerenzer and Hoffrage (1995) have demonstrated, participants confronted with Bayesian tasks often use non-Bayesian algorithms, which by accident might yield results that fall into the interval specified by the liberal scoring criterion. The most frequent non-Bayesian algorithms they identified include computing p(H & D) by multiplying p(H) and p(D | H); computing p(D | H) − p(D | not H); or simply picking p(D | H) or p(H) from the problem description.
In the mammography problem, none of these alternative strategies leads to a result that a liberal scoring rule would misclassify as a "Bayesian solution," but this does occur in other problems.

[Figure 3 screen shot: an empty grid beside the sepsis problem text, with the note that 10 out of our 100 patients actually have sepsis.]
Figure 3. An empty 10 × 10 frequency grid (grid size 100). Screen shot is from the beginning of the first phase of training (sepsis problem).

The "rubella problem" illustrates this case:

In Germany, every expectant mother must have an obligatory test for rubella infection because children born to women who have rubella while pregnant are often born with terrible deformities. The following information is at your disposal: The probability that a newborn will have deformities traceable to a sickness of its mother during pregnancy is 1%. If a child is born healthy and normal, the probability that the mother had rubella during her pregnancy is 10%. If a child is born with deformities and it can be traced to some sickness of the mother, the probability that the mother had rubella during her pregnancy is 50%. What is the probability that a child will be born with deformities if its mother had rubella during her pregnancy?

The Bayesian solution p(H | D) is .048. But participants who use one of two non-Bayesian algorithms, computing p(H&D) = .005 or picking p(H) = .01, will produce estimates that lie in the interval of ±5 percentage points around the Bayesian solution. These cases would be misclassified by a liberal scoring criterion but not by a strict scoring criterion (for details, see Gigerenzer & Hoffrage, 1995).
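A minimal sketch of the two scoring criteria, applied to the rubella numbers (p(H) = .01, p(D | H) = .50, p(D | not H) = .10). It assumes an answer has already been reduced to a single number. It compares that number with the Bayesian value at the percentage-point level (strict) and within ±5 points (liberal). It also anticipates the adjustment described in the next paragraph: an answer that exactly equals one of the shortcut results counts as non-Bayesian under both criteria. The function `score_answer` and its matching tolerance are hypothetical choices for this sketch.

```python
def score_answer(answer: float, p_h: float, hit: float, false_alarm: float,
                 tol: float = 1e-9) -> tuple:
    """Return (strict, liberal) classification of a numeric answer."""
    joint = p_h * hit
    bayes = joint / (joint + (1 - p_h) * false_alarm)
    shortcuts = (joint, hit - false_alarm, hit, p_h)

    # An exact match with a non-Bayesian algorithm is never counted as Bayesian.
    if any(abs(answer - s) < tol for s in shortcuts):
        return False, False

    # Same percentage point after rounding (Python's round() sends exact halves to even).
    strict = round(answer * 100) == round(bayes * 100)
    liberal = abs(answer - bayes) <= 0.05   # within 5 percentage points
    return strict, liberal


rubella = dict(p_h=0.01, hit=0.50, false_alarm=0.10)   # Bayesian answer is about .048
for answer in (0.048, 0.09, 0.005, 0.01):
    print(answer, score_answer(answer, **rubella))
```

Without the exact-match rule, the shortcut answers .005 and .01 would both pass the liberal test. That is the misclassification the strict criterion and the adjustment guard against.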
To reduce the possibility of such misclassifications, we computed for each problem the results of the non-Bayesian algorithms. When a participant responded with exactly one of these results, the answer was counted as non-Bayesian even though it was within the 5-percentage-point range.

[Figure 4 screen shot: the sepsis problem with the filled grid; 8 people who actually have sepsis have the symptoms, but so do 9 of the people who do not, so 8 of the 17 people with symptoms have sepsis: 8/(8 + 9) = .47 (or 47%).]
Figure 4. A filled 10 × 10 frequency grid (grid size 100). Screen shot is from the middle of the first phase of training (sepsis problem).

Three Measures of Training Effectiveness

We measured three possible effects of the training: the immediate training effect (Test 1 compared with Test 2), the generalization or transfer to new problems, and the temporal stability of learning over time (Test 2 compared with Tests 3 and 4). The most interesting measure is stability. Many who teach statistics have the experience that students often study successfully for an exam but quickly forget what they learned after the exam: a steep decay curve. That statistical reasoning does not turn into a habit of mind may not be entirely the students' fault; rather, we conjecture, it is linked to the widespread use of probabilities or percentages as representations for uncertainties and risks. Suppose the thesis is correct that natural frequencies correspond to the format of information humans have encountered throughout most of their evolutionary development. Then decay after representation training should not be as quick as after rule training.

Effect sizes rather than significance tests were used for the statistical analysis of training effects (for reasons for using effect sizes, see Cohen, 1990; Loftus, 1993; Rosnow & Rosenthal, 1996; Schmidt, 1996; Sedlmeier, 1996, 1999, Appendix C). Correlational effect sizes (r) in all studies were calculated from the results of significance tests as follows (e.g., Rosenthal & Rosnow, 1991). To evaluate immediate training effects for a given training condition, effect sizes were obtained from repeated measures analyses of variance (ANOVAs) with tests (Test 1, Test 2) as the repeated factors by calculating $r = [F/(F + df)]^{1/2}$. To evaluate differential training effects for two given training conditions, that is, for how much better one training condition does than the other, we used the improvement scores for the short-term training effect (Test 2 − Test 1) and for the long-term training effect (Test 4 − Test 1). Effect sizes were obtained from t tests that compared these improvement scores between the two conditions by calculating $r = [t^2/(t^2 + df)]^{1/2}$. The tables that report effect sizes also contain test statistics, that is, values for F and t, and degrees of freedom, so that interested readers can easily look up p values from tables of the F and the t distributions.

Note that the effect sizes rely on comparisons between means and therefore tend to underestimate the true effects if the distributions contain outliers. This was the case for all studies reported here. Therefore, the figures report medians, which are more robust and can give a more realistic picture. Unless specified otherwise, we report performance in terms of the liberal criterion. Overall, the two criteria differed only in quantity, not in quality.
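The two conversions from test statistics to r can be sketched with the Python standard library. With only two repeated measures (Test 1, Test 2), the repeated measures F(1, n − 1) equals the square of the paired t, so the sketch computes F that way. For the differential effect it uses a pooled-variance independent-samples t on improvement scores. The paper does not say which t-test variant was used, so pooled variance is an assumption here. The helper names and the small data sets are invented for illustration and are not data from the studies.

```python
import math
from statistics import mean, stdev, variance


def r_from_f(f: float, df_error: float) -> float:
    """r = [F / (F + df)]^(1/2), for an F with 1 numerator df."""
    return math.sqrt(f / (f + df_error))


def r_from_t(t: float, df: float) -> float:
    """r = [t^2 / (t^2 + df)]^(1/2)."""
    return math.sqrt(t * t / (t * t + df))


def repeated_measures_f(test1: list, test2: list) -> tuple:
    """F and error df for two repeated measures: F(1, n - 1) = paired t squared."""
    diffs = [b - a for a, b in zip(test1, test2)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t * t, n - 1


def independent_t(x: list, y: list) -> tuple:
    """Pooled-variance t test comparing two groups' improvement scores."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical solved-problem counts (out of 10) for one training group.
test1 = [1, 0, 2, 1, 0, 1]
test2 = [8, 7, 9, 6, 8, 9]
f, df = repeated_measures_f(test1, test2)
print(f"immediate effect: F(1, {df}) = {f:.1f}, r = {r_from_f(f, df):.2f}")

# Hypothetical improvement scores (Test 2 - Test 1) for two conditions.
group_a = [b - a for a, b in zip(test1, test2)]
group_b = [5, 4, 6, 5, 5]
t, df_t = independent_t(group_a, group_b)
print(f"differential effect: t({df_t}) = {t:.2f}, r = {r_from_t(t, df_t):.2f}")
```

The same t or F fed to either function gives the r reported in the tables. For example, t = 2 with 12 df and F = 4 with 12 df both give r = .5.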
The complete results, including medians, means, standard deviations, and group sizes for both the liberal and strict scoring criteria, are shown in the Appendix.

Study 1a

When people are taught to construct frequency representations, will their Bayesian reasoning improve after training? Will teaching representations enable the transfer of these new skills to new problems? Will performance decay over time, or will there be some stability? Two considerations gave us some hope for improvement, transfer, and stability: the computational result (that Bayesian calculations are easier with natural frequencies) and the evolutionary hypothesis (that minds are tuned to frequency representations). We designed a training study to put our hopes to the test.

Method

Four groups of participants took part in the study. One group worked with the frequency grid, one with the frequency tree, and one with the rule training. A fourth group did not receive training and served as a control. For the three training groups, the study consisted of three sessions with four tests altogether. For the control group, there were two sessions and two tests. The training and all tests were administered on the computer.

Procedure. Test 1 (first session) provided a baseline for performance. Participants were given 10 problems. Before they started to work on the problems, the program was explained, and we confirmed that they understood all instructions. After the baseline test (Test 1), participants in the three training groups received training on 10 problems (2 in Part 1 and 8 in Part 2). They then had to solve another 10 problems (Test 2). The training lasted between 1 and 2 hr; the computerized tutorials allowed participants to work at their own pace. The entire first session (including Tests 1 and 2) lasted between 1 hr 45 min and 3 hr for training groups and between […] and 30 min for the control group. The second session (1 week after the first session) and the third session (5 weeks after the first session) served to test transfer and stability. Participants in the control group participated in Sessions 1 and 2 only, 1 week apart.
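In the frequency grid condition, the sepsis screens (Figures 3 and 4) fill a 10 × 10 grid of 100 patients. The following is a rough text-mode imitation, a hypothetical helper rather than the tutorial's code. It marks each patient's cell by disease status and symptoms and reads the answer off the marked cells. The cell symbols and the rounding of counts to whole patients are assumptions of this sketch.

```python
def frequency_grid(rows: int, cols: int, base_rate: float, hit_rate: float,
                   false_alarm_rate: float) -> tuple:
    """Lay out rows*cols cases in a grid and count the two kinds of positives.

    Cell codes (invented for this sketch):
      'S' = has the condition and shows the symptoms
      's' = has the condition, no symptoms
      'X' = no condition, but shows the symptoms
      '.' = no condition, no symptoms
    """
    n = rows * cols
    sick = round(n * base_rate)
    sick_pos = round(sick * hit_rate)
    healthy_pos = round((n - sick) * false_alarm_rate)
    cells = (["S"] * sick_pos + ["s"] * (sick - sick_pos)
             + ["X"] * healthy_pos + ["."] * (n - sick - healthy_pos))
    grid = ["".join(cells[r * cols:(r + 1) * cols]) for r in range(rows)]
    return grid, sick_pos, healthy_pos


# Sepsis problem: 10% base rate, 80% and 10% symptom rates, grid size 100.
grid, sick_pos, healthy_pos = frequency_grid(10, 10, 0.10, 0.80, 0.10)
print("\n".join(grid))
print(f"{sick_pos}/({sick_pos}+{healthy_pos}) = {sick_pos / (sick_pos + healthy_pos):.2f}")
```

The last line prints the same two counts the tutorial asks for, 8 and 9, and their ratio, .47. Only the "S" and "X" cells matter for the answer.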
In each of the tests, participants had to solve 10 problems, most of them from Gigerenzer and Hoffrage (1995). Two of the problems, the sepsis problem and the mammography problem (see Figures 1 and 2), were used in all four tests and in the training. Tests 3 and 4 each contained one additional "old" problem, that is, a problem already used in the training. All the other problems were "new," that is, not used before, either in a test or in the training. The use of both old and new problems allowed us to examine how well the training generalized to problems participants had not seen before. The problems were counterbalanced across sessions, and participants were assigned randomly to one of the four groups.

[Figure 5 screen shot: a 50 × 20 grid beside the mammography problem text, asking how many people out of our 1,000 cases test positive.]
Figure 5. A 50 × 20 frequency grid (grid size 1,000). Screen shot is from the end of the first phase of training (mammography problem).

Participants. Sixty-two University of Chicago students were paid for their participation in two installments, after the first and third sessions, respectively. Six participants who achieved 60% or more correct solutions in Test 1 (baseline) were excluded from the study. Two participants did not complete the first session. We trained […] participants in the grid condition, […] participants in the tree condition,
2 participants in the rule training condition, and we had 5 participants in the control condition. We had some loss of participants over the 5-week period due to heavy study loads (end of spring term). The numbers of participants in the second and third sessions were 12 and 7, respectively, in the grid condition, […] and 5 in the tree condition, and […] and 1 in the rule training condition. Four of the five members of the control group took part in the second session.

Results

Figure 9 shows the median percentages of correct solutions for the three training conditions and the control group using the liberal scoring criterion.

Immediate effect. At baseline (Test 1), the median percentage of Bayesian solutions was 10% in the frequency conditions and […]% in the rule training condition. After training, there was a substantial improvement in Bayesian reasoning in each of the three training conditions. The median performance after rule training increased to 60%, whereas it was 75% and 90% for the two frequency representation training conditions. In terms of correlational effect sizes, which express the immediate effect of a training procedure, the training effects were very large for each training, with r > .9 for the representation training and r > .8 for the rule training (Table 1). In contrast, the control group showed only minimal improvement.

Transfer. To what extent were participants able to generalize from the 10 problems they solved during training (training problems) to problems with different contents (transfer problems)? To test transfer, we compared the solutions for training and transfer problems. Recall that 2 of the training problems, the mammography problem and the sepsis problem, were given in all four tests, and in Tests 3 and 4, participants encountered 1 additional training problem. To the extent that a training method promotes the ability to generalize a technique (to construct frequency representations or to insert probabilities into a formula), there should be little difference between training and transfer problems.
A zero difference between training and transfer problems would mean perfect transfer. A large positive difference, that is, 60 to 80 percentage points, would mean complete lack of transfer. The mean percentage of Bayesian solutions was generally almost as high for the transfer problems as for the training problems. With the liberal scoring procedure, the differences between training problems and transfer problems were, on average, 7.2, 3.0, and −0.8 percentage points for the frequency tree, frequency grid, and rule training methods, respectively. With the strict scoring procedure, they were 6.0, 6.3, and 5.3 percentage points. To summarize, each of the three training programs led to high levels of transfer; that is, participants' average performance in new problems was almost as good as in old problems. Note that this result concerns the difference between the number of Bayesian solutions in training and transfer problems, not the absolute number of Bayesian solutions in transfer problems. The absolute number was consistently larger for those participants who were taught to construct frequency representations (Figure 9).

[Figure 6 screen shot: a frequency tree beside the sepsis problem text, with the note that 10 out of our 100 patients actually have sepsis.]
Figure 6. A frequency tree. Screen shot is from the beginning of the first phase of training (sepsis problem).

Stability. For the rule training, Figure 9 shows that, 5 weeks after training, Bayesian reasoning is down to a median of 20%, almost back to where it was before training. The students who were taught to construct frequency representations, however, show a different curve. The higher immediate effect of training is not lost, and 5 weeks after training, there is even an increase in the median number of Bayesian inferences in the frequency grid condition. We also computed a correlational effect size for the difference in the long-term training effect (Test 4 − Test 1) between the combined frequency conditions and the rule training condition. It was medium to large according to Cohen's (1992) conventions (see Table 1, long-term differential training effect).
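The transfer measure used above is a difference of two solution rates. A minimal sketch, with an invented response pattern: for each participant, take the percentage of Bayesian solutions on training problems minus the percentage on transfer problems, then average over participants. Zero means perfect transfer, and a negative value means participants did slightly better on transfer problems, as in the −0.8 found for rule training. Averaging per-participant differences, rather than differencing pooled rates, is an assumption of this sketch. `percent_solved`, `transfer_gap`, and the data are hypothetical.

```python
from statistics import mean


def percent_solved(flags: list) -> float:
    """Percentage of problems scored as Bayesian solutions (True/False flags)."""
    return 100 * sum(flags) / len(flags)


def transfer_gap(participants: list) -> float:
    """Mean of (training % - transfer %) over participants, in percentage points."""
    return mean(percent_solved(train) - percent_solved(transfer)
                for train, transfer in participants)


# Each entry: (training-problem flags, transfer-problem flags) for one participant.
participants = [
    ([True, True, True], [True, True, True, True, True, True, True]),
    ([True, True, False], [True, False, True, True, False, True, True]),
]
print(f"{transfer_gap(participants):.1f} percentage points")
```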
However, the long-term stability might have another explanation: the high attrition rate toward the end of the study. If predominantly weaker participants had dropped out, then the long-term results would be upwardly biased because they would mainly reflect the achievement of the stronger participants. Is there evidence for this conjecture? We checked whether the performance of participants who completed all four tests differed from those who did not. For both representation training versions, the median performance of those who completed all tests was the same as that of the total group shown in Figure 9, except for one point: the frequency grid group at Test 2 matched the median of the frequency tree group. Thus, the results of the representation training seem to be uninfluenced by the attrition. For the rule training, there was a small difference, which is shown in Figure 9. The dotted line shows the performance of those participants who completed all tests. Their performance was slightly above the total group but showed the same pattern of decay. This analysis indicates that the results in Figure 9 are not much influenced by a potential difference between those participants who dropped out and those who completed all four tests.

Discussion

Study 1a showed that all three training programs can improve Bayesian reasoning. The degree of improvement was, as it should be, larger than what Gigerenzer and Hoffrage (1995) had reached without training, that is, by merely presenting information in natural frequencies (46% on average, with a strict scoring procedure). The difference between the representation and the rule training was most pronounced in the temporal stability of what participants had learned. The time needed for teaching representations was short, between 1 and 2 hr (not counting the time needed for the tests), depending on the speed of the individual participant.

[Figure 7 screen shot: a frequency tree beside the mammography problem text, asking how many people out of our 1,000 cases test positive.]
Figure 7. A frequency tree. Screen shot is from the end of the first phase of training (mammography problem).

Study 1a, however, had its limits and therefore should be assigned the status of a pilot study. First, there was the high attrition rate. The participants who completed all four tests did not seem to differ from those who completed only the first two or three tests. Even so, the high attrition rate may have affected the reliability of the results in Test 4. Second, the study does not necessarily show that the distinction between probabilities and frequencies was the only factor that made a difference. The training conditions differed not only in whether frequency or probability formats were used but also in whether they relied on a graphical aid, and graphics might have been an important factor in achieving training success. Study 1b addressed the first limitation, and Study 2 addressed the second.

Study 1b

This study investigated whether the results of Study 1a could be replicated in the absence of high attrition rates.
To help prevent high attrition rates, participants were paid only at the end of Session 3 rather than in two installments, as in Study 1a. Furthermore, Study 1b addressed the question of whether results are influenced by performance-contingent payment (participants in Study 1a were paid a flat sum, independent of their performance). If a flat fee is paid, participants might not be motivated to do their best (Hertwig & Ortmann, 1999).

Method

Two of the three training programs from Study 1a were used in this study: the frequency tree training and the rule training. German versions of the programs were used because participants in Study 1b were German.

Procedure. Two groups of participants took part in the study. One group was taught with the frequency tree and the other with rule training. About half of the participants in each group were told at the beginning of the first session that the 20% of participants who achieved the best results overall would receive a monetary bonus. The first session contained a baseline test (Test 1), the training, and a posttest (Test 2). Testing and training proceeded as in Study 1a. The German participants took more time than did their American counterparts in Study 1a. Observation of participants suggested that the Germans took the task more seriously. If they could not solve a task, they did not easily switch to the next one, whereas the American participants frequently did. To achieve average times comparable to those needed in Study 1a, that is, about 2.5 hr for tests and training combined, we reduced the number of tasks to seven per test and the number of training tasks to six (two in Part 1 and four in Part 2). Each test contained two "old" tasks, the sepsis and mammography tasks, that were also used in the training, and five "new" tasks, that is, tasks not previously used in either test or training. The second session (Test 3, about 1 week after the first) served to assess transfer and short-term stability.
Finally, the third session, which was held, on average, about 5 weeks after the first, measured long-term stability.

Participants. Fifty-six students at the Free University of Berlin, Germany, were paid for their participation. Unlike in Study 1a, none of the participants in Studies 1b and 2 reached more than 60% solutions at the baseline test; thus no participants were excluded in these studies. Twenty-eight participants were trained in each condition. Fourteen participants in the frequency tree condition and in the rule training condition were told that they would receive a monetary bonus if their results were among the best 20%. With the help of the revised payment schedule, there was no attrition of participants over the course of the study.
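Because answers were typed as formulas (Figure 8), a scorer has to evaluate them before applying either criterion. One hedged way to do that in Python without calling `eval` on raw input is to parse the text with the standard `ast` module. The evaluator then accepts only numbers, the four arithmetic operators, unary minus, and parentheses. `evaluate_answer` is a hypothetical helper, and Python 3.8 or later is assumed for `ast.Constant`. The example formula is one correct answer to the cab problem: 15% of the cabs are Blue, and the witness is correct 80% of the time.

```python
import ast
import operator

# Only these binary operators are accepted in a typed answer.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_answer(text: str) -> float:
    """Evaluate an answer built from numbers, + - * /, and parentheses."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        # Names, calls, powers, etc. are rejected rather than executed.
        raise ValueError(f"not allowed in an answer: {type(node).__name__}")

    return walk(ast.parse(text, mode="eval"))


# Cab problem: p(Blue) = .15, p("Blue" | Blue) = .80, p("Blue" | Green) = .20
cab = evaluate_answer("(.15 * .8) / (.15 * .8 + .85 * .2)")
print(round(cab, 2))  # 0.41
```

Malformed input such as an unbalanced parenthesis raises `SyntaxError` from `ast.parse`. A real scorer would catch it and count the answer as unscorable.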

[Figure 8 screen shot: the cab problem (a witness identified the cab in a hit-and-run accident as Blue and correctly identifies each of the two colors 80% of the time and fails 20% of the time; 85% of the cabs in the city are Green and 15% are Blue), an answer box for a number or a formula, and the instruction to keep spaces on both sides of +, -, /, and *.]
Figure 8. Testing session: The problem text (here, the cab problem) is in the upper window, the question in the lower left window, and the instructions in the lower right window. Participants can type in numbers or formulas consisting of parentheses and basic arithmetic operators.

Results

Figure 10 shows the performance for both training methods (Figure 10a) and this performance broken down by participants with and without the performance-contingent bonus (Figures 10b and 10c).

Immediate effect. Like the American participants in Study 1a, the German participants showed little or no skill in solving Bayesian tasks: At Test 1, the median number of problems solved was zero. The immediate training effect (Test 2 − Test 1) was of very similar magnitude to that for the American participants in Study 1a. Both teaching methods improved Bayesian reasoning, with a median of 64% for the rule training and […]% for the representation training. This differential effect was more pronounced for participants who were not told about a bonus than for those who could expect to earn one (compare Figures 10b and 10c). The effect size analysis, which relied on the more conservative means rather than the medians, gives a similar picture (Table 2).
All effect sizes measuring the immediate effect were large and were more pronounced in the frequency tree condition than in rule training, in particular when payment was not performance contingent (Table 2, short-term differential training effect). In no single test, regardless of whether there was a prospect of a bonus, did the rule training performance surpass that of the representation training.

Transfer. To test transfer, we compared the solutions for the two "old" tasks, which were used in all tests as well as in the training, with the results for the "new" tasks, which were used only once. Transfer was excellent in both training programs. With the liberal scoring criterion, the average difference between old and new tasks was only 2.4 percentage points in the frequency tree condition and zero in the rule training condition. With the strict scoring criterion, the corresponding values were 4.6 and 2.4 percentage points, respectively. Transfer was not influenced by whether a bonus could be expected. The differences between old and new tasks in the bonus and no-bonus subgroups differed from those in the overall analysis by, at most, 0.9 percentage points.

Stability. Figure 10a shows basically the same pattern as Figure 9, except that the decay in the rule training is not as strong: after 5 weeks, performance is down to only 43%, compared with 20% in Study 1a (Figure 9). But this direct comparison between the two studies would be misleading because it aggregates over the bonus and no-bonus groups in Study 1b, which show different performance patterns (Figures 10b and 10c). When participants could not win a bonus, the performance was almost identical to that in Study 1a, where participants were also not offered bonuses. The decay curve found in the rule training condition of Study 1a was replicated almost perfectly (compare Figure 9, "Frequency Tree" and "Rule Training," with Figure 10b): After 5 weeks, a median of only […]% Bayesian solutions was found. If, however, participants had the prospect of a bonus, there was no decay in the rule training condition (see Figure 10c).
In contrast, the results in the frequency tree condition were not influenced by whether participants could expect to receive a bonus. For instance, in the no-bonus group, the performance remained at a median of […]% Bayesian solutions over 5 weeks, from Test 2 to Test 4. Taken together, the no-bonus group provides an almost exact replication of the results found in Study 1a. Also, the effect sizes for the long-term differential training effect, that is, how much better participants learned in the frequency tree than in the rule training condition, are comparable to the one found in Study 1a (Table 2, long-term differential training effect, "No bonus"). The combination of a bonus with rule training, however, led to a new result. We try to explain this result in the next section.

[Figure 9 plot: Frequency Tree, Frequency Grid, Rule Training, Rule training (complete data only), and Control across Baseline (Test 1), Post training (Test 2), One-week follow up (Test 3), and Five-weeks follow up (Test 4).]
Figure 9. Median percentages of Bayesian solutions obtained in Study 1a (out of 10 possible) for the three training conditions and the control condition (liberal scoring criterion). For the rule training, values for all participants and those participants who completed all four tests are shown separately.

[Table 1 layout: for each comparison, the test statistic and r under liberal scoring and under strict scoring, plus df. Rows: Immediate (Test 2 − Test 1) for frequency tree, frequency grid, and rule training; Short-term differential (Test 2 − Test 1), frequency conditions versus rule training; Long-term differential (Test 4 − Test 1), frequency condition(s) versus rule training.]
Table 1
Correlational Effect Sizes Expressing Immediate Training Effects Within Conditions and Differential Training Effects Across Conditions in Study 1a
Note. The effect sizes for the immediate training effects were calculated from repeated measures ANOVAs with tests (Test 1, Test 2) as the repeated factors, and differential training effects were calculated from t tests of group differences using improvement scores (Test 2 − Test 1 and Test 4 − Test 1). The table includes data for both liberal and strict scoring criteria. For each comparison, it shows the test statistic (F for immediate effects and t for differential effects), correlational effect size r, and df.

Discussion

Study 1b reduced the attrition rate to zero and replicated the major findings of Study 1a. Both the representation and the rule training led to a substantial and immediate improvement in Bayesian inference, with about the same advantage for representation training as in Study 1a. Both types of training were equally excellent in transfer. The representation training provided temporally stable improvements, whereas the rule training showed decay. This holds true for both the median solution rates and the effect sizes based on the means. The new finding was that the rule training did not show a decay when participants were offered a bonus (there was none in Study 1a).

What could be the reason for this bonus effect? We suggest the following: Many German high school students and most German university students have heard about Bayes's formula. At minimum, our participants probably knew where they could find out about the formula: in mathematics school books for Grades […] to […] and in statistics textbooks. Thus, some of the participants who were motivated by the prospect of a bonus may have looked up Bayes's rule in the books. To check this hypothesis, we tried to contact all participants in the rule training who were told about the bonus. Because of address changes and other reasons, we were able to reach only 7 participants. Of these, only 1, a law student, reported that he had learned the formula during the training and remembered it well over the whole period without thinking much about it. The other 6 conceded that they had recognized the formula from some statistics course and had thought about it before the retests. Two of the participants said that they had looked it up in statistics textbooks, and 1 admitted to having made a copy of the formula from the training session and practicing with that copy at home. Thus, additional effort seems a plausible explanation for the better results of those participants in the rule training condition who could expect to receive a bonus. In contrast, there was no way that participants could learn about the frequency tree because German mathematics or statistics textbooks do not introduce that kind of representation.

This interpretation suggests that financial incentives can play an important role in statistical training because they lead to additional efforts to look up the formula outside the laboratory. Nobody denies that students can, in principle, learn to apply Bayes's rule successfully (otherwise, there would be no experts in statistics), and this study has shown that monetary incentives help. Frequency representations, however, still lead to slightly better (and cheaper) results without a monetary motivation.

[Figure 10 plots, Panels a to c: Frequency Tree and Rule Training across Baseline (Test 1), Post training (Test 2), One-week follow up (Test 3), and Five-weeks follow up (Test 4).]
Figure 10. Median percentages of Bayesian solutions obtained in Study 1b (out of seven possible) for the two training conditions (liberal scoring criterion). Combined results (Panel a) and separate results for bonus and no-bonus subgroups (Panels b and c) are shown.

[Table 2 layout: for each comparison, the test statistic and r under liberal scoring and under strict scoring, plus df. Rows, each split into Bonus and No bonus: Immediate (Test 2 − Test 1) for frequency tree and for rule training; Short-term differential (Test 2 − Test 1), frequency tree versus rule training; Long-term differential (Test 4 − Test 1), frequency tree versus rule training.]
Table 2
Correlational Effect Sizes Expressing Immediate Training Effects Within Conditions and Differential Training Effects Across Conditions in Study 1b
Note. The effect sizes for the immediate training effects were calculated from repeated measures ANOVAs with tests (Test 1, Test 2) as the repeated factors, and differential training effects were calculated from t tests of group differences using improvement scores (Test 2 − Test 1 and Test 4 − Test 1). For each comparison, the test statistic (F for immediate effects and t for differential effects), correlational effect size r, and df are shown.

Study 2

Study 1b successfully replicated the difference in learning effect between a training program using a frequency representation and one relying on the use of rules. However, the results of both Studies 1a and 1b still left open an alternative explanation for the superiority of the representation training over the rule training. Perhaps the difference in training results came not from frequency versus probability formats but from whether graphical aids were used. The main aim of Study 2 was to test this objection. Furthermore, this study examined whether the stability over time found for the representation training in the previous studies holds for a longer period of time, […] weeks rather than 5 weeks. To avoid the potential influence of looking up Bayes's rule outside the laboratory, no bonus was offered in this study.
As in Study 1b, participants were paid at the end of Session 3. A graphical aid, the tree, was used for both the frequency and the probability formats. If the graphical aid is the decisive factor in Bayesian inference training, then there should be no systematic difference in results between the probability and frequency conditions. The tree conditions were compared against the standard rule training used in the previous studies.

Method

Two of the three training methods used in this study, the rule training and the frequency tree training, were identical to the ones in Study 1b. We refer to the third training method as the "probability tree" training.

Probability tree. In a probability tree (Figure 11), the top node contains the value 1, that is, the probability that the respective hypothesis is true or not true. In the specific example that uses the mammography task (see earlier example), this is the probability that a woman who has undergone a mammography does or does not have breast cancer. The two middle nodes show the base-rate probabilities of breast cancer (p = .01) and its complement, no breast cancer (p = .99). The four nodes at the lowest level split up the base-rate probabilities according to the diagnostic information, in our case, the result of the mammography. Only the values in the two shaded nodes are needed to calculate the posterior probability, p(cancer | positive test), because

$$p(\text{cancer} \mid \text{positive test}) = \frac{p(\text{cancer} \,\&\, \text{positive test})}{p(\text{positive test})},$$

where p(cancer & positive test) is represented by the left shaded node, and the sum of both shaded nodes gives p(positive test). Thus, calculation in the probability tree is identical to that in the frequency tree except that the value in the top node is always 1.

Figure 11. Probability tree as used in Study 2. Screen shot is from the first phase of training (mammography task).

Procedure. Three groups of participants took part in the study. One group worked with the frequency tree, the second with the probability tree, and the third with Bayes's formula. Participants used German versions of the tasks from Study 1a and completed three sessions. The first session contained a baseline test (Test 1), the training, and a posttest (Test 2). Testing and training proceeded as in Study 1b. The second session (Test 3, about 1 week after the first) served to assess transfer and short-term stability. Finally, the third session, which was held about […] weeks after the first, measured long-term stability. The average intervals between training and Test 4 for the frequency tree, the probability tree, and the rule training conditions were […].4 weeks, […].8 weeks, and […].8 weeks, respectively. This prolonged time interval allowed us to test to what degree the excellent stability observed in Studies 1a and 1b, 5 weeks after training, still existed at the later time.

Participants. Seventy-two students at the University of Munich, Germany, were paid for their participation. Twenty-four participants were trained in each of the three conditions. The data for one participant in the rule training condition were lost due to a computer breakdown. There was no attrition in the first two sessions, but there was attrition in the third session (Test 4), probably due to the long time interval ([…] weeks) between Sessions 2 and 3. The number of participants in Test 4 was n = 21, n = 21, and n = 18 in the frequency tree, probability tree, and rule training conditions, respectively.
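The two tree calculations described in the Method can be illustrated with a short Python sketch that computes p(cancer | positive test) once from a frequency tree and once from a probability tree. The input values are assumptions borrowed from the classic mammography problem (a 1% base rate, an 80% hit rate, and a 9.6% false-positive rate; as natural frequencies, 10 of 1,000 women have cancer, 8 of them test positive, and 95 of the remaining 990 also test positive); they are not restated in this section. The function names are hypothetical and are not taken from the authors' tutorial program.

```python
# Hypothetical sketch (not the authors' tutorial code): p(cancer | positive test)
# computed with a frequency tree and with a probability tree.
# Assumed inputs, taken from the classic mammography problem rather than this
# section: 10 of 1,000 women have cancer; 8 of those 10 test positive;
# 95 of the remaining 990 also test positive (1%, 80%, and 9.6% as probabilities).


def frequency_tree_posterior(cancer_and_positive: int, no_cancer_and_positive: int) -> float:
    """Frequency tree: the left positive leaf divided by the sum of both positive leaves."""
    return cancer_and_positive / (cancer_and_positive + no_cancer_and_positive)


def probability_tree_posterior(p_cancer: float, p_pos_given_cancer: float,
                               p_pos_given_no_cancer: float) -> float:
    """Probability tree: top node 1, middle nodes p(cancer) and its complement,
    each lower node = parent node * conditional probability of a positive test."""
    top = 1.0
    p_no_cancer = top - p_cancer
    p_cancer_and_pos = p_cancer * p_pos_given_cancer           # left shaded node
    p_no_cancer_and_pos = p_no_cancer * p_pos_given_no_cancer  # right shaded node
    # p(cancer | positive) = p(cancer & positive) / p(positive)
    return p_cancer_and_pos / (p_cancer_and_pos + p_no_cancer_and_pos)


freq = frequency_tree_posterior(8, 95)
prob = probability_tree_posterior(0.01, 0.80, 0.096)
print(f"frequency tree:   8 / (8 + 95) = {freq:.3f}")              # 0.078
print(f"probability tree: .008 / (.008 + .09504) = {prob:.3f}")    # 0.078
```

Both functions end in the same division of one positive leaf by the sum of both positive leaves. The probability tree carries essentially the same numbers with the decimal point shifted (10, 8, and 95 out of 1,000 become .01, .008, and roughly .095), which matches the point made in the Discussion that the two trees do not differ in computational simplicity.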

[Figure 12: line graph; x-axis: Baseline (Test 1), Posttraining (Test 2), One-week follow-up (Test 3), Three-months follow-up (Test 4); one line each for the frequency tree, probability tree, and rule training conditions.]

Figure 12. Median percentages of Bayesian solutions obtained in Study 2 (out of seven possible) for the three training conditions (liberal scoring criterion).

Results

The same two scoring criteria as in Studies 1a and 1b were used. Figure 12 shows the median percentages of Bayesian solutions for the liberal scoring criterion. Again, results for the liberal and strict criteria differed in quantity but not in quality.

Immediate effect. As in the previous studies, the baseline test (Test 1) indicated that participants had few skills for solving Bayesian tasks. Before the training, the median percentage of Bayesian solutions over all participants was 0%. The immediate training effect was strong for all three training programs and again yielded large effect sizes that were comparable to those obtained in the previous studies (see Table 3).

Transfer. As in the previous studies, transfer was excellent in all three training programs. On average, using the liberal scoring criterion, the difference between old and new tasks was .6 percentage points in the frequency tree, 4.9 in the probability tree, and 4.8 in the rule training condition (using the strict scoring criterion, the corresponding values were 2.1, 3.8, and 1.2 percentage points, respectively).

Stability. In the previous studies, the effect of the representation training was stable over a 5-week period. In rule training, by contrast, the effect faded away over time (with the notable exception of the bonus group in Study 1b). Can participants still maintain their representation skills […] weeks after training? Do we still obtain the difference between the rule and representation training found in Studies 1a and 1b? Figure 12 shows that, consistent with the results in the previous studies, no decay occurred in the group that received representation training (frequency tree).
Here, the immediate training effect of a median of 93% Bayesian solutions remained stable over the whole period of […] weeks. In fact, it even increased to 100% Bayesian solutions at Test 4. In contrast, the rule training group began high, at a median of […]% Bayesian solutions, and ended up at a median of 5[…]% after […] weeks. The probability tree training shows a pattern of results similar to that of the rule training. There is some decay from Test 2 to Test 3 and a more pronounced decay from there to Test 4, with a final level of 57% Bayesian solutions. A comparison of the long-term improvement scores (Test 4 - Test 1) between the frequency tree condition on the one hand and the probability tree and rule training conditions on the other hand again yields medium- to large-sized effects. The difference between the two probability conditions is, by contrast, small, especially when the liberal scoring criterion is used (Figure 12; see Table 3, long-term differential effect).⁴

Table 3
Correlational Effect Sizes Expressing Immediate Training Effects Within Conditions and Differential Training Effects Across Conditions in Study 2

Columns: Training effect | Liberal scoring (test statistic, r) | Strict scoring (test statistic, r) | df

Immediate (Test 2 - Test 1)
  Frequency tree
  Rule training
  Probability tree
Short-term differential (Test 2 - Test 1)
  Frequency tree versus rule training
  Frequency tree versus probability tree
  Probability tree versus rule training
Long-term differential (Test 4 - Test 1)
  Frequency tree versus rule training
  Frequency tree versus probability tree
  Probability tree versus rule training

Note. The effect sizes for the immediate training effects were calculated from repeated measures ANOVAs with tests (Test 1, Test 2) as the repeated factor, and differential training effects were calculated from t tests of group differences using improvement scores (Test 2 - Test 1 and Test 4 - Test 1). For each comparison, the test statistic (F for immediate effects and t for differential effects), correlational effect size r, and df are shown.

Discussion

Studies 1a and 1b left open a possible alternative explanation for the superior results in the representation training as compared with the rule training. The former used graphical aids, whereas the latter did not, and therefore the graphical aid might have made the difference. In Study 2, both a frequentistic and a probabilistic condition used the same graphical aid, a tree structure. The immediate training results were very high in both tree conditions, but they differed markedly in the stability of the training success over time. The frequency tree training enabled participants to retain what they had learned more than 3 months before, whereas the effect of the probability tree training decayed over time, to a median of 57%. How much could the probability training gain by using a graphical aid? Figure 12 shows that performance was slightly better after […] weeks, but overall, there is little, if any, difference. This holds despite the rule training group having to learn a more complicated formula (Bayes's rule for probabilities) than the probability tree group. The similar performance in the two probability training programs indicates that the important question is not whether a graphical aid should be used in teaching statistical literacy but what a proper representation for a graphical aid is.⁵ It also indicates that the superior effect of natural frequencies is not due solely to computational simplicity, which is the same for probability trees as for frequency trees except that the decimal point is moved to the left. The results in Study 2 are consistent with Gigerenzer and Hoffrage's (1995) conclusion that natural frequencies constitute a proper representation of uncertainties.

Conclusion

Gigerenzer and Hoffrage (1995) have stressed the importance of studying cognitive algorithms in tandem with the information format for which they are designed.
The thesis is that humans and animals encode information about uncertain environments more easily in terms of natural frequencies than in terms of probabilities, and one can show that Bayesian computations are simpler when information is represented in natural frequencies. Both the frequency grid and the frequency tree are realizations of natural sampling of frequencies. We applied Gigerenzer and Hoffrage's work to an unresolved problem: how to design a method for teaching Bayesian reasoning that is built on psychological principles and can overcome the lack of success reported in previous studies. The central idea is to teach people to represent information in a way that is tuned to their cognitive algorithms. Whether such cognitive algorithms are the direct result of evolution, or whether they rely on evolved mental architectures and are shaped to a large extent during ontogenesis by learning processes, does not matter much for our argument. For instance, Sedlmeier (1999) showed that an associative learning model also arrives at the prediction that Bayesian algorithms crucially depend on information format. Our research emphasizes the role of the information representation at encoding. If information is encoded in terms of natural frequencies, probability judgments can be quite exact (Sedlmeier, 1999, pp. […]). This psychological approach was contrasted with the traditional approach to the teaching of statistical reasoning, which emphasizes how to insert the right numbers into the right rule. Similar to prior training attempts on the impact of sample size (e.g., Fong & Nisbett, 1991), the rule training method showed a substantial short-term increase in performance and, relative to this increase, an excellent transfer. After several weeks, however, Bayesian reasoning had undergone the well-known decay function. When participants were taught representations instead of rules, the initial training effect was noticeably higher, transfer was equally good, and there was no loss of performance after […] weeks. Let us reflect on the larger context in which the present approach to teaching stands.
First, there is an ecological perspective: Cognitive algorithms (or rules) are adapted to specific information formats in the environment. Specifically, the external representation of information can "perform" part of the computations. Second, there is the evolutionary distinction between the past environment to which the cognitive processes of an organism are adapted and the present environment in which an organism lives (e.g., Buss, Haselton, Shackelford, Bleske, & Wakefield, 1998; Cummins, 1998). When environments change, for example through the invention of new forms for representing information such as probabilities, cognitive processes may no longer function as well as before, and "illusions" can be a consequence. As an example from vision, consider color constancy, an impressive adaptation of the human perceptual system. It allows people to see the same color under changing illuminations: under the bluish light of day as well as the reddish light of the setting sun. Color constancy, however, fails under certain artificial lights such as sodium or mercury vapor lamps, which were not present in the environment when mammals first evolved (Shepard, 1992). The same type of argument can be made for statistical reasoning (Gigerenzer, 1998): Natural frequencies correspond to the format of information a foraging organism would have encountered before the invention of books and statistics, whereas probabilities and percentages correspond to an information environment that has been changed by the invention of mathematical probability. Compared with the earlier emphasis on demonstrating cognitive biases in statistical reasoning, or so-called "inevitable" illusions (e.g., Piattelli-Palmarini, 1994), the ecological perspective can actually advise us how to help people understand statistical information. Here, the external representation of numerical information, and the internal translation of one representation into another, can be a major tool for helping people attain insight. This is not to say that frequency representations are the only tool. Study 1b, for instance, indicated that offering monetary incentives can motivate students to make additional effort and can enable them to perform about as well as those who had a representation training.

⁴ There is one notable exception, in Study 2, to the finding that the results based on the strict and liberal scoring criteria differ only in quantity. According to the strict scoring criterion, there is a relatively large difference in the median percentages at Test 4 (36 percentage points) between the rule training and probability tree conditions, which is much smaller when expressed in means (6 percentage points) and which is not found when applying the liberal criterion (see Figure 12 and the Appendix). The large difference is, however, due in part to the coarse step size that determines the possible median percentage values. Recall that with eight possible values (0 to 7 problems solved), a difference of one problem solved amounts to a difference of 14.3 percentage points.

⁵ An alternative way to disentangle the possible influence of a graphical aid from that of the information representation (frequentistic vs. probabilistic) would have been to dispense with graphical aids in both representations (rather than to use them in both representations, as in Study 2). We did not pursue this route because Gigerenzer and Hoffrage (1995) have already shown that, without graphical aids, frequency representations yield solution rates about three times as high (about 50% correct solutions) as probability representations.

We conclude with some open questions and possible extensions of the present work. First, we have dealt with only an elementary form of Bayesian inference, and we do not know how these results generalize to situations in which hypotheses and data are not binary but multivalued or continuous. Second, we have not dealt with situations in which there is more than one piece of diagnostic information, such as two medical tests in sequence. Multiple pieces of information can be reduced to the sequential application of two frequency representations, and Krauss, Martignon, and Hoffrage (1999) have shown that the effect of natural frequencies remains as strong with two pieces of information as it is with one. This result suggests extending teaching representations to situations with multiple pieces of information. Third, an extension of the training program would be to teach Bayesian shortcuts, as described in Gigerenzer and Hoffrage (1995). For instance, when a disease is rare (low base rate) and can be easily detected (high hit rate), and false positives are numerous compared with true positives, then the ratio between the base rate and the false-alarm rate is a good approximation of the Bayesian estimate. For example, assume that only 2 out of 10,000 men have HIV; the hit rate of an ELISA test is very high; and there are about 20 false positives among those 9,998 men who do not have the virus. The probability that a man who tests positive actually has the virus can be approximated by simply dividing the base-rate frequency (2) by the false-alarm frequency (20); this shortcut results in a value of 1 in 10. A final extension of the training program would be to teach participants to understand and judge the assumptions for the applicability of Bayes's rule (e.g., Earman, 1992) as well as other, competing statistical methods for inference.

Tutorial programs could play a useful role in education for mathematical and statistical literacy and in overcoming innumeracy (Paulos, 1988; Sedlmeier, 1999, 2000). Because the representation training lasts only 1-2 hr, it can be used, for instance, in high school curricula to teach young people how to evaluate the results of pregnancy, HIV, or drug tests. Similarly, it can be used to teach both patients and physicians to estimate the chances of actually having breast cancer after a positive mammogram, and the like. Computerized programs have been proven to attract the attention of young and old alike, and we have observed in our participants a high degree of involvement and desire to succeed. The teaching of statistical literacy can take advantage of human psychology.

References

Abernathy, C. M., & Hamm, R. M. (1995). Surgical intuition: What it is and how to get it. Philadelphia, PA: Hanley & Belfus.
Apple Computer, Inc. (1992). Macintosh Common Lisp reference. Cupertino, CA: Author.
Arkes, H. R. (1981). Impediments to accurate clinical judgment and possible ways to minimize their impact. Journal of Consulting and Clinical Psychology, 49, […].
Bar-Hillel, M. (1980). The base rate fallacy in probability judgments. Acta Psychologica, 44, […].
Bourguet, M.-N. (1987). Decrire, compter, calculer: The debate over statistics during the Napoleonic period. In L. Krüger, L. Daston, & M. Heidelberger (Eds.), The probabilistic revolution: Vol. 1. Ideas in history (pp. […]). Cambridge, MA: MIT Press.
Buss, D. M., Haselton, M. G., Shackelford, T. K., Bleske, A. L., & Wakefield, J. C. (1998). Adaptations, exaptations, and spandrels. American Psychologist, 53, […].
Christensen-Szalanski, J. J. J., & Beach, L. R. (1982). Experience and the base-rate fallacy. Organizational Behavior and Human Performance, 29, […].
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, […].
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, […].
Cole, W. G. (1988). Three graphic representations to aid Bayesian inference. Methods of Information in Medicine, 27, […].
Cosmides, L., & Tooby, J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition, 58, […].
Cummins, D. D. (1998). Social norms and other minds: The evolutionary roots of higher cognition. In D. D. Cummins & C. Allen (Eds.), The evolution of mind (pp. […]). New York: Oxford University Press.
Dowie, J., & Elstein, A. (1988). Professional judgment: A reader in clinical decision making. Cambridge, England: Cambridge University Press.
Earman, J. (1992). Bayes or bust? A critical examination of Bayesian confirmation theory. Cambridge, MA: MIT Press.
Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. […]). Cambridge, England: Cambridge University Press.
Edwards, W. (1968). Conservatism in human information processing. In B. Kleinmuntz (Ed.), Formal representation of human judgment (pp. […]). New York: Wiley.
Falk, R., & Konold, C. (1992). The psychology of learning probability. In F. S. Gordon & S. P. Gordon (Eds.), Statistics for the twenty-first century (pp. […]). Washington, DC: The Mathematical Association of America.
Fischhoff, B., & Bar-Hillel, M. (1984). Focusing techniques: A shortcut to improving probability judgments? Organizational Behavior and Human Performance, 34, […].
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1979). Subjective sensitivity analysis. Organizational Behavior and Human Performance, […].
Fong, G. T., Lurigio, A. J., & Stalans, L. J. (1990). Improving probation decisions through statistical training. Criminal Justice and Behavior, 17, […].
Fong, G. T., & Nisbett, R. E. (1991). Immediate and delayed transfer of training effects in statistical reasoning. Journal of Experimental Psychology: General, 120, […].
Garfield, J., & Ahlgren, A. (1988). Difficulties in learning basic concepts in probability and statistics: Implications for research. Journal of Research in Mathematics Education, 19, […].
Gigerenzer, G. (1991). On cognitive illusions and rationality. Poznan Studies in the Philosophy of the Sciences and the Humanities, 21, […].
Gigerenzer, G. (1994). Why the distinction between single-event probabilities and frequencies is important for psychology (and vice versa). In G. Wright & P. Ayton (Eds.), Subjective probability (pp. […]). New York: Wiley.
Gigerenzer, G. (1996). The psychology of good judgment: Frequency formats and simple algorithms. Journal of Medical Decision Making, 16, […].
Gigerenzer, G. (1998). Ecological intelligence: An adaptation for frequencies. In D. D. Cummins & C. Allen (Eds.), The evolution of mind (pp. 9-29). New York: Oxford University Press.
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102, […].
Gigerenzer, G., & Hoffrage, U. (1999). Overcoming difficulties in Bayesian reasoning: A reply to Lewis and Keren (1999) and Mellers and McGraw (1999). Psychological Review, 106, […].
Gigerenzer, G., Hoffrage, U., & Ebert, A. (1998). AIDS counselling for low-risk clients. AIDS Care, 10, […].
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Krüger, L. (1989). The empire of chance: How probability changed science and everyday life. Cambridge, England: Cambridge University Press.
Good, I. J. (1995, June). When batterer turns murderer. Nature, 375, 541.
Gould, S. J. (1992). Bully for brontosaurus: Further reflections in natural history. New York: Penguin Books.
Hertwig, R., & Ortmann, A. (1999). Experimental practices in economics: A methodological challenge for psychologists? Manuscript submitted for publication.
Hoffrage, U., & Gigerenzer, G. (1998). Using natural frequencies to improve diagnostic inferences. Academic Medicine, 73, […].
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3, […].
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, […].
Kahneman, D., & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review, […].
Kleiter, G. (1994). Natural sampling: Rationality without base rates. In G. H. Fischer & D. Laming (Eds.), Contributions to mathematical psychology, psychometrics, and methodology (pp. […]). New York: Springer.
Koehler, J. J. (1996). The base rate fallacy reconsidered: Descriptive, normative, and methodological challenges. Behavioral and Brain Sciences, 19, […].
Koehler, J. J. (1997). One in millions, billions, and trillions: Lessons from People v. Collins (1968) for People v. Simpson (1995). Journal of Legal Education, 47, […].
Krauss, S., Martignon, L., & Hoffrage, U. (1999). Simplifying Bayesian inference. In L. Magnani, N. Nersessian, & P. Thagard (Eds.), Model-based reasoning in scientific discovery (pp. […]). New York: Plenum Press.
Krüger, L., Daston, L., & Heidelberger, M. (Eds.). (1987). The probabilistic revolution: Vol. 1. Ideas in history. Cambridge, MA: MIT Press.
Lindeman, S. T., van den Brink, W. P., & Hoogstraten, J. (1988). Effect of feedback on base-rate utilization. Perceptual and Motor Skills, 67, […].
Loftus, G. R. (1993). A picture is worth a thousand p values: On the irrelevance of hypothesis testing in the microcomputer age. Behavior Research Methods, Instruments, & Computers, 25, […].
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: Freeman.
Paulos, J. A. (1988). Innumeracy: Mathematical illiteracy and its consequences. New York: Vintage Books.
Peterson, C. R., DuCharme, W. M., & Edwards, W. (1968). Sampling distributions and probability revision. Journal of Experimental Psychology, 76, […].
Piaget, J., & Inhelder, B. (1975). The origin of the idea of chance in children. New York: Norton & Company. (Original work published 1951)
Piattelli-Palmarini, M. (1994). Inevitable illusions: How mistakes of reason rule our minds. New York: Wiley.
Ploger, D., & Wilson, M. (1991). Statistical reasoning: What is the role of inferential rule training? Comment on Fong and Nisbett. Journal of Experimental Psychology: General, 120, […].
Porter, T. M. (1986). The rise of statistical thinking. Princeton, NJ: Princeton University Press.
Reeves, L. M., & Weisberg, R. W. (1993). Abstract versus concrete information as the basis for transfer in problem solving: Comment on Fong and Nisbett (1991). Journal of Experimental Psychology: General, 122, […].
Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research: Methods and data analysis (2nd ed.). New York: McGraw-Hill.
Rosnow, R. L., & Rosenthal, R. (1996). Computing contrasts, effect sizes, and counternulls on other people's published data: General procedures for research consumers. Psychological Methods, 1, […].
Schaefer, R. E. (1976). The evaluation of individual and aggregated subjective probability distributions. Organizational Behavior and Human Performance, 17, […].
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, […].
Sedlmeier, P. (1996). Jenseits des Signifikanztest-Rituals: Ergänzungen und Alternativen [Beyond the ritual of significance testing: Alternative and supplementary methods]. Methods of Psychological Research Online, 1. Available on the World Wide Web: […]
Sedlmeier, P. (1997). BasicBayes: A tutor system for simple Bayesian inference. Behavior Research Methods, Instruments, & Computers, 29, […].
Sedlmeier, P. (1999). Improving statistical reasoning: Theoretical models and practical implications. Mahwah, NJ: Erlbaum.
Sedlmeier, P. (2000). How to improve statistical thinking: Choose the task representation wisely and learn by doing. Instructional Science, […].
Sedlmeier, P., & Gigerenzer, G. (1997). Intuitions about sample size: The empirical law of large numbers? Journal of Behavioral Decision Making, 10, […].
Sedlmeier, P., & Gigerenzer, G. (2000). Was Bernoulli wrong? On intuitions about sample size. Journal of Behavioral Decision Making, […].
Sedlmeier, P., & Kohlers, D. (2001). Wahrscheinlichkeiten im Alltag: Statistik ohne Formeln [Probabilities in everyday life: Statistics without formulas]. Braunschweig, Germany: Westermann.
Shaughnessy, J. M. (1992). Research in probability and statistics: Reflections and directions. In D. A. Grouws (Ed.), Handbook of research on mathematics teaching and learning (pp. […]). New York: Macmillan.
Shepard, R. N. (1992). The perceptual organization of colors: An adaptation to regularities of the terrestrial world? In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture (pp. […]). New York: Oxford University Press.
Tversky, A., & Kahneman, D. (1982). Evidential impact of base rates. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. […]). Cambridge, England: Cambridge University Press.
von Winterfeldt, D., & Edwards, W. (1986). Decision analysis and behavioral research. Cambridge, England: Cambridge University Press.
Wolfe, C. R. (1995). Information seeking on Bayesian conditional probability problems: A fuzzy-trace theory. Journal of Behavioral Decision Making, 8, […].

Appendix

Median and Mean Percentages, and Standard Deviations and Group Sizes, for All Tests in Studies 1a, 1b, and 2

Columns: Measure | Liberal scoring: Test 1, Test 2, Test 3, Test 4 | Strict scoring: Test 1, Test 2, Test 3, Test 4

Study 1a
  Frequency tree
  Frequency grid
  Rule training
  Control
Study 1a, complete data sets
  Frequency tree (n = 5)
  Frequency grid (n = 7)
  Rule training (n = 1)
Study 1b
  Frequency tree
  Rule training
  Bonus
    Frequency tree
    Rule training

(Appendix continues)