Bayesian probabilistic extensions of a deterministic classification model

Iwin Leenen and Iven Van Mechelen
K.U.Leuven, Belgium

Andrew Gelman
Columbia University, New York

Summary

This paper extends deterministic models for Boolean regression within a Bayesian framework. For a given binary criterion variable $Y$ and a set of $k$ binary predictor variables $X_1, \ldots, X_k$, a Boolean regression model is a conjunctive (or disjunctive) logical combination consisting of a subset $S$ of the $X$ variables, which predicts $Y$. Formally, Boolean regression models include a specification of a $k$-dimensional binary indicator vector $(\theta_1, \ldots, \theta_k)$ with $\theta_j = 1$ iff $X_j \in S$. In a probabilistic extension, a parameter $\pi$ is added which represents the probability that the predicted value $\hat{y}_i$ and the observed value $y_i$ differ (for any observation $i$). Within Bayesian estimation, a posterior distribution of the parameters $(\theta_1, \ldots, \theta_k, \pi)$ is looked for. The advantages of such a Bayesian approach include a proper account of the uncertainty in the model estimates and various possibilities for model checking (using posterior predictive checks). We illustrate the approach in an example using real data.

This work was supported in part by the U.S. National Science Foundation Grant SBR-9708424. The authors gratefully acknowledge Brian Junker for his helpful comments on an earlier draft of this paper, and Johannes Berkhof for helpful discussions. Address correspondence to Iwin Leenen, Department of Psychology, K.U.Leuven, Tiensestraat 102, B-3000 Leuven, Belgium.

Keywords: Bayesian estimation, Boolean regression, logical rule analysis, posterior predictive checks

1 Introduction

In many research lines, prediction problems are considered with the predictors and/or criteria being binary variables. As a result, a number of models and associated techniques have been developed to examine the relations in this type of data, including instantiations of the generalized linear model. For example, in a logistic regression model with binary variables, the logit of the probability that the criterion variable assumes either of the two possible values is a linear function of a number of predictors. In many relevant cases, though, one aims at finding the sufficient and/or necessary conditions for a criterion to occur, which, as a result, makes the generalized linear model approach, which assumes a compensatory association rule, less appropriate from a theoretical point of view. In medical diagnoses, for example, assigning a disease to a given patient is often based on considering a list of necessary and sufficient conditions; as another example, some theories on categories and concepts assume that assignment to a category is based on the presence of a set of singly necessary and jointly sufficient attributes.

In search of necessary and/or sufficient conditions, a Boolean regression model (Van Mechelen, 1988; Van Mechelen & De Boeck, 1990) may be helpful as it identifies for a given binary criterion and a given set of binary predictors a subset of the predictors that are conjunctively (resp. disjunctively) combined to predict the value on the criterion variable. Besides applications in the social sciences (McKenzie, Clarke, & Low, 1992; Ragin, Mayer, & Drass, 1984; Van Mechelen & De Boeck, 1990), techniques related to Boolean regression have been studied in discrete mathematics and in the context of the design of switching circuits in electronics (Biswas, 1975; Halder, 1978; McCluskey, 1965; Sen, 1983). In the latter publications, more complex rules, such as disjunctive combinations of conjunctions (or vice versa), are also considered.
Boolean regression has initially been formulated as a deterministic model. Existing algorithms for Boolean regression aim at finding a subset of the predictors which minimizes the number of prediction errors (Van Mechelen, 1988). However, at least three shortcomings go with the approach of finding a single best solution. First, in many empirical applications, several different subsets of the predictors may fit the data (almost) equally well, whereas from a substantive viewpoint they may be quite different. Second, it is not obvious how to draw statistical inferences about the size of the prediction error, as the prediction error associated with the single best solution probably underestimates the true model error (because the algorithm aims at minimizing the number of prediction errors). Third, the deterministic model does not provide any tools for model checking, due to the fact that the model does not specify its relation to the data. Hence, a method which gives insight into several concurring models and into the level of uncertainty associated with them is of great interest.

Therefore, the present paper extends the model for Boolean regression within a Bayesian framework. Bayesian statistics can be considered a natural conceptual framework for exploring the likelihood of several possible concurring models for a given data set. The model extension presented here follows the general recipe proposed by Gelman, Leenen, Van Mechelen, and De Boeck (in preparation), which brings most of the tools that are available for stochastic models within the realm of deterministic models (like the model of Boolean regression).

The remainder of the paper is organized as follows: Section 2 recapitulates the deterministic model of Boolean regression. In Section 3, the stochastic extension is presented and estimation and checking of the model within a Bayesian framework are discussed. In Section 4, an example on definitions of emotions illustrates the application of the new model to real data. Section 5 deals with possible extensions and contains some concluding remarks.

2 The Deterministic Boolean Regression Model

2.1 Model formulation

Consider an $n \times k$ binary matrix $X$, which denotes the observations for $n$ units on $k$ explanatory variables $X_1, \ldots, X_j, \ldots, X_k$, and a binary vector $y = (y_1, \ldots, y_i, \ldots, y_n)$, which contains the observed values for the $n$ units on a criterion variable $Y$. Boolean regression, then, specifies a parameter vector $\theta = (\theta_1, \ldots, \theta_k)$ with $\theta_j \in \{0, 1\}$ $(j = 1, \ldots, k)$ which is subsequently combined with $X$ to get a binary vector $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_n)$ of predicted values on the criterion. Both a disjunctive and a conjunctive variant of the model exist, which differ in the way that $\theta$ and $X$ are combined to get $\hat{y}$. In a conjunctive model,

\[ \hat{y}_i \equiv \hat{y}(\theta, X)_i = \prod_{j \mid \theta_j = 1} x_{ij}, \tag{1} \]

whereas in the disjunctive variant:

\[ \hat{y}_i \equiv \hat{y}(\theta, X)_i = 1 - \prod_{j \mid \theta_j = 1} (1 - x_{ij}). \tag{2} \]

Despite their substantive difference, conjunctive and disjunctive models are dual models: a comparison of Eq. (1) and Eq. (2) shows that if a conjunctive model $\theta$ fits some data set $X$ and $y$, then simultaneously the disjunctive model $\theta$ fits the complemented data $X^c$ and $y^c$, and vice versa, where $x^c_{ij} = 1 - x_{ij}$ and $y^c_i = 1 - y_i$ $(i = 1, \ldots, n;\ j = 1, \ldots, k)$. As a result, only one of both variants needs to be considered; in this paper, we focus on the conjunctive model and, unless otherwise stated, any $\hat{y}_i$ is calculated as in Eq. (1).
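The two prediction rules and their duality can be illustrated with a minimal Python sketch. The function names and the toy data are hypothetical and chosen for illustration only; they are not part of the original paper.

```python
def predict_conjunctive(theta, X):
    """Eq. (1): y_hat_i is the product of x_ij over all j with theta_j = 1."""
    return [int(all(x for x, t in zip(row, theta) if t == 1)) for row in X]


def predict_disjunctive(theta, X):
    """Eq. (2): y_hat_i is 1 minus the product of (1 - x_ij) over all j with theta_j = 1."""
    return [int(any(x for x, t in zip(row, theta) if t == 1)) for row in X]


def complement(values):
    """Replace every 0 by 1 and every 1 by 0."""
    return [1 - v for v in values]


# Hypothetical toy data: 4 units, 3 binary predictors.
X = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
theta = [1, 1, 0]  # the rule uses the first two predictors only

y_hat_conj = predict_conjunctive(theta, X)
X_c = [complement(row) for row in X]
y_hat_disj_c = predict_disjunctive(theta, X_c)

# Duality: the disjunctive prediction on the complemented predictors is the
# complement of the conjunctive prediction on the original predictors.
print(y_hat_conj, y_hat_disj_c)
```

Note that an all-zero $\theta$ yields an empty product, so the conjunctive rule then predicts 1 for every unit, in line with Eq. (1).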

Boolean regression, being a deterministic model, does not include a specification of the relation between the observed $y$ and the predicted $\hat{y}$. Even more, strictly speaking, the model requires them to be equal. Hence, whenever an observation $i$ exists for which $y_i$ and $\hat{y}_i$ are discrepant (i.e., $y_i \neq \hat{y}_i$), the model should be rejected. In practical applications of the model, though, one allows for prediction errors, and the model goes with algorithms that aim at finding $\theta$ with the minimal number of discrepancies:

\[ D(y, \theta) = \sum_{i=1}^{n} \left[ y_i - \hat{y}(\theta, X)_i \right]^2 . \]

2.2 Model estimation

To find a $\theta$ that minimizes $D(y, \theta)$, two strategies have been proposed. Most algorithms (Mickey, Mundle, & Engelman, 1983; Van Mechelen, 1988; Van Mechelen & De Boeck, 1990) use a greedy heuristic which initializes the entries in $\theta$ to 1 and successively changes the value of some entry $\theta_j$ into 0, each time selecting that $j$ for which the change yields the largest decrease in the number of discrepancies, until changing any of the remaining $\theta_j$'s does not further improve the solution.

Recently, Leenen and Van Mechelen (1998) have proposed a branch-and-bound algorithm that guarantees that a solution with minimal value on $D(y, \theta)$ is found. This algorithm passes through a tree, making extensive use of the property that in a conjunctive model changing an arbitrary entry $\theta_j$ from 0 into 1 does not decrease the number of false negatives (a false negative being defined as an observation $i$ for which the predicted value $\hat{y}_i$ equals 0 and the observed value $y_i$ equals 1). In many cases, the latter property allows the algorithm to apply branching and bounding to a large extent, thereby strongly reducing the processing time compared to an enumerative search among all possible solutions.

2.3 Model checking

The goodness of fit of the deterministic model can be summarized into a number of descriptive statistics, including the proportion of discrepancies, Jaccard's goodness-of-fit statistic (Sneath & Sokal, 1973; Tversky, 1977), and Van Mechelen and De Boeck's (1990) $\hat{p}$, which indicates the amount of predictive gain by knowing the model over a prediction based on the marginal criterion probability only. However, these statistics are limited in that they are based on the total goodness of fit and do not examine the structure of the errors. Also, only rules of thumb are available to decide on whether or not a solution is "sufficiently good."
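The discrepancy count $D(y, \theta)$ and the greedy heuristic of Section 2.2 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the tie-breaking rule (lowest index wins) and the toy data are hypothetical, and the sketch is not the implementation used in the cited algorithms.

```python
from itertools import product


def predict_conjunctive(theta, X):
    """Eq. (1): conjunction of the predictors selected by theta."""
    return [int(all(x for x, t in zip(row, theta) if t == 1)) for row in X]


def discrepancies(y, theta, X):
    """D(y, theta): number of units for which y_i and y_hat_i differ."""
    return sum((yi - yh) ** 2 for yi, yh in zip(y, predict_conjunctive(theta, X)))


def greedy_fit(y, X):
    """Start from theta = (1, ..., 1) and repeatedly switch to 0 the entry
    whose removal gives the largest decrease in D, until no removal helps."""
    k = len(X[0])
    theta = [1] * k
    best = discrepancies(y, theta, X)
    while True:
        trials = []
        for j in range(k):
            if theta[j] == 1:
                trial = theta[:]
                trial[j] = 0
                trials.append((discrepancies(y, trial, X), j))
        if not trials:
            break
        d, j = min(trials)  # ties are broken by the lowest index (an assumption)
        if d >= best:
            break
        theta[j] = 0
        best = d
    return theta, best


# Hypothetical toy data: all 8 patterns of 3 predictors, criterion = X1 AND X2.
X = [list(row) for row in product([0, 1], repeat=3)]
y = [row[0] * row[1] for row in X]
theta_hat, d_min = greedy_fit(y, X)
print(theta_hat, d_min)
```

Being a heuristic, the greedy search may stop in a local optimum on other data sets, which is what the branch-and-bound algorithm is designed to avoid.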

3 Bayesian Boolean Regression

3.1 Model formulation

Allowing for discrepancies reveals the implicit assumption of a stochastic model underlying the deterministic model. A natural extension of the model may therefore be considered that explicitly includes the possibility of a prediction error.

The stochastic extension implies the addition of a Bernoulli-like process to the deterministic model, which accounts for the values on the criterion variable possibly changing from 0 into 1 or vice versa. For this purpose, a new parameter $\pi$ is added to the model, which is the expected error rate of the model and which is assumed to be identical across observations. Hence, for any observation $i$, it holds that:

\[ \Pr(y_i = \hat{y}_i \mid \theta, \pi) = 1 - \pi. \tag{3} \]

(In the latter and all following equations, the dependence on $X$ is not explicitly indicated because the predictor values are considered fixed.) Under local stochastic independence, it further holds that the likelihood of $y$ under this model is:

\[ p(y \mid \theta, \pi) = \pi^{D} (1 - \pi)^{n - D}. \]

For convenience, $D(y, \theta)$ is abbreviated to $D$ in formulas.

In a next step, the stochastic model is considered within a Bayesian framework, which provides tools for exploring the posterior distribution:

\[ p(\theta, \pi \mid y) = \frac{p(y \mid \theta, \pi)\, p(\theta, \pi)}{p(y)}. \tag{4} \]

We will assume $\theta$ and $\pi$ to have independent and uniform prior distributions. Uniform prior distributions imply a minimal extension of the already existing deterministic model: in this case, maximizing the likelihood (which implies minimizing the number of discrepancies) corresponds to finding the mode of the posterior distribution (Gelman et al., in preparation).

As shown in the Appendix, working out the posterior yields:

\[ p(\theta, \pi \mid y) = \frac{(n + 1)\, \pi^{D} (1 - \pi)^{n - D}}{\sum_{\vartheta \in \{0,1\}^k} \binom{n}{D_\vartheta}^{-1}}, \tag{5} \]

where $D_\vartheta$ abbreviates $D(y, \vartheta)$ and the sum in the denominator is over all $2^k$ values in the parameter space. Clearly, evaluating this sum is feasible for small $k$ only.

Often, one will be interested in the marginal posterior distribution of the parameter $\theta$. Again in the Appendix, it is shown that integrating out $\pi$ in Eq. (5) results in:

\[ p(\theta \mid y) = \frac{\binom{n}{D}^{-1}}{\sum_{\vartheta \in \{0,1\}^k} \binom{n}{D_\vartheta}^{-1}}. \tag{6} \]

yhaveequalposteriorprobabilities.furthermore,itfollowsthatifhas Thelatterimpliesthattwoparameterswhichareequallydiscrepantwith 6 probabilitiesequals: onediscrepancyfewerthanthentheratiooftheirmarginalposterior 3.2Modelestimation p(jy)=n D p(jy) D: (7) Inthissectionweshowhowonecangaininsightintheposteriordistribution Step0Asaninitializationstep,mestimates(s;0)andmestimates(s;0), bydrawingsimulationswithagibbs-metropolisalgorithm: value: vectorwithpr((s;0) (s=1;:::;m),areconstructedasfollows:(s;0)isarandombinary j =1)=0.5(j=1;:::;k)and(s;0)isgiventhe estimatesoftobe0or1(gelmanetal.,inpreparation). Weadd1inthenominatorand2inthedenominatortoavoidinitial D(y;(s;0))+1 n+2 : Step1WerunmparallelsequencesofaMetropolisalgorithm,with((s;0);(s;0)) asthestartingpointforsequences(s=1;:::;m).ateachiterationt (t=1;2;:::),thefollowingsubstepsareexecutedforeachsequences: 1.Acandidatevalueisconstructedbasedonthevalue(s;t 1) inthepreviousiteration.therefore,rstanintegerw(s;t)from adiscretedensity(e.g.,poissonorbinomial)isdrawnwiththe or1into0)toobtain.assuch,w(s;t)representsthenumberof randomlyselectedandsubsequentlychanged(fromeither0into1 entriesinthatarechangedfrom(s;t 1). restriction1w(s;t)k.next,w(s;t)entriesin(s;t 1)are ingfromthefollowingjumpingdistribution: Thisprocedureforconstructingtechnicallycorrespondstodraw- J(j(s;t 1))=kXw=11 Thejumpingdistributionreturnstheprobabilityofconsideringthe wherep(w)isthe(truncated)discretedensitymentionedabove. kwp(w); algorithmisofthemetropolistype. Clearly,Jissymmetric:J(j)=J(j)suchthattheresulting candidate,giventhevalueof(s;t 1)ofthepreviousiteration.

2. The ratio of the posterior densities, or equivalently, the ratio of the likelihoods, is calculated:

\[ r = \frac{p(y \mid \theta^{*}, \pi^{(s,t-1)})}{p(y \mid \theta^{(s,t-1)}, \pi^{(s,t-1)})} = \left( \frac{1 - \pi^{(s,t-1)}}{\pi^{(s,t-1)}} \right)^{D(y, \theta^{(s,t-1)}) - D(y, \theta^{*})}. \]

3. Values are assigned to $\theta^{(s,t)}$ and $\pi^{(s,t)}$:

\[ \theta^{(s,t)} = \begin{cases} \theta^{*} & \text{with probability } \min(1, r) \\ \theta^{(s,t-1)} & \text{otherwise.} \end{cases} \]

The value for $\pi^{(s,t)}$ is obtained by a draw from a $\mathrm{Beta}(D(y, \theta^{(s,t)}) + 1,\ n - D(y, \theta^{(s,t)}) + 1)$ distribution.

These steps are repeated until the $m$ sequences appear mixed. Gelman and Rubin's (1992) $\sqrt{\hat{R}}$ statistic may be used as a diagnostic instrument in monitoring the convergence.

Step 2. In order to obtain $L$ posterior simulation draws, the procedure described in Step 1 continues, after convergence of the sequences, for another $L/m$ iterations. The latter draws in the $m$ sequences are collected and will eventually constitute the set of simulation draws $\{(\theta^{(l)}, \pi^{(l)}) \mid l = 1, \ldots, L\}$ from the posterior distribution.

3.3 Model checking

A natural way for model checking in Bayesian statistics is using posterior predictive checks. To this end, we proceed with the next steps:

Step 3. For each of the $L$ posterior simulation draws, a replicated data set $y^{(l)}$ is simulated as follows: first, $\hat{y}^{(l)} = \hat{y}(\theta^{(l)}, X)$ is computed using Eq. (1), and, subsequently, the $n$ components of $y^{(l)}$ are independently simulated from $\hat{y}^{(l)}$ based on Eq. (3) (with $\pi^{(l)}$ substituted for $\pi$).

Step 4. A test variable $T(y, \theta)$ is defined which summarizes some aspect of interest of the data or the discrepancy between model and data.

Step 5. The realized value $T(y, \theta^{(l)})$ for the observed data and the replicated value $T(y^{(l)}, \theta^{(l)})$ for the replicated data are computed for each of the $L$ simulation draws.

Step 6. The realized value and the replicated value are compared to estimate the posterior predictive p-value as the proportion of the $L$ simulations for which $T(y^{(l)}, \theta^{(l)}) > T(y, \theta^{(l)})$.

The model checking procedure presented here will be illustrated in the example.
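Steps 0 to 2 of Section 3.2 can be sketched for a single sequence as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of a truncated binomial density with flip probability 0.25 for the jump size, the clamping of $\pi$ away from 0 and 1, the burn-in length and the toy data are all assumptions made here. Convergence monitoring over several parallel sequences is omitted.

```python
import math
import random
from collections import Counter
from itertools import product


def predict_conjunctive(theta, X):
    """Eq. (1): conjunction of the predictors selected by theta."""
    return [int(all(x for x, t in zip(row, theta) if t == 1)) for row in X]


def discrepancies(y, theta, X):
    """D(y, theta): number of units for which y_i and y_hat_i differ."""
    return sum(yi != yh for yi, yh in zip(y, predict_conjunctive(theta, X)))


def draw_jump_size(k, rng, p_flip=0.25):
    """Binomial(k, p_flip) draw truncated to 1..k (assumed choice of p(w))."""
    while True:
        w = sum(rng.random() < p_flip for _ in range(k))
        if w >= 1:
            return w


def run_chain(y, X, n_iter, rng):
    """One Gibbs-Metropolis sequence; returns a list of (theta, pi) draws."""
    n, k = len(y), len(X[0])
    theta = [rng.randint(0, 1) for _ in range(k)]        # Step 0
    pi = (discrepancies(y, theta, X) + 1) / (n + 2)
    draws = []
    for _ in range(n_iter):                              # Step 1
        cand = theta[:]
        for j in rng.sample(range(k), draw_jump_size(k, rng)):
            cand[j] = 1 - cand[j]                        # substep 1: flip w entries
        d_old = discrepancies(y, theta, X)
        d_new = discrepancies(y, cand, X)
        # substep 2: log of r = ((1 - pi) / pi) ** (d_old - d_new)
        log_r = (d_old - d_new) * (math.log(1 - pi) - math.log(pi))
        if log_r >= 0 or rng.random() < math.exp(log_r):  # substep 3
            theta = cand
        d = discrepancies(y, theta, X)
        pi = rng.betavariate(d + 1, n - d + 1)
        pi = min(max(pi, 1e-9), 1 - 1e-9)                # numerical safeguard (assumption)
        draws.append((tuple(theta), pi))
    return draws


# Hypothetical toy data: 48 units, 4 predictors, criterion = X1 AND X2 with two errors.
rng = random.Random(1)
X = [list(row) for row in product([0, 1], repeat=4)] * 3
y = [row[0] * row[1] for row in X]
y[0], y[5] = 1 - y[0], 1 - y[5]

draws = run_chain(y, X, n_iter=2000, rng=rng)
kept = draws[1000:]                                      # discard an assumed burn-in
most_visited = Counter(theta for theta, _ in kept).most_common(1)[0][0]
print(most_visited)
```

The acceptance step works on the log scale so that large differences in the number of discrepancies do not overflow; this is a design choice of the sketch and does not change the acceptance probability $\min(1, r)$.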

4 Illustrative Application

4.1 Problem and Data

In this section we illustrate the new approach by an example in the field of defining emotion concepts. According to Wierzbicka (1992, p. 541), emotion concepts can be defined by a set of singly necessary and jointly sufficient semantic primitives, which are "terms of words which are intuitively understandable (nontechnical), and which themselves are not names of specific emotions or emotional states." Table 1 lists some of the semantic primitives she proposed. As her definitions of emotions are conjunctive combinations of semantic primitives, a Boolean regression model may be expected to appropriately describe the relation between semantic primitives (as predictors) and an emotion concept (as the criterion). Whereas Wierzbicka deals with explicit definitions (i.e., by experts), the present study considers implicit theories in laymen and evaluates whether these implicit theories are conjunctive combinations of semantic primitives as well.

Predictor   Semantic primitive
X1          A person did something bad
X2          I don't want this
X3          I would want to change this
X4          I would want to do something bad to somebody
X5          I feel bad
X6          Something bad happened
X7          I would want that something didn't happen
X8          I can't change the situation
X9          Something good happened
X10         I want something like this
X11         I feel good
X12         Somebody did something good
X13         I don't want to change this
X14         I would want to do something good for somebody

Table 1: List of the (noncomplemented) predictors for the Boolean regression analyses in the application

Five first-year psychology students of the University of Leuven were each asked to generate twenty different situations in which they had recently been involved and felt either angry, sad, grateful, or happy. Next, the subjects were asked to specify for the twenty situations they generated: (1) whether or not each of the 14 semantic primitives in Table 1 was true for the given situation and (2) whether or not they experienced each of the 4 aforementioned emotions: anger, sadness, gratitude, and happiness. In the analyses, the 5 × 20 situations were concatenated, resulting in n = 100 observations, and both the original and the complemented semantic primitives are included as predictors, eventually resulting in 28 predictors (X1, ..., X14, ¬X1, ..., ¬X14) and 4 criteria $Y_{\text{angry}}$, $Y_{\text{sad}}$, $Y_{\text{grateful}}$, and $Y_{\text{happy}}$. Because the results for both negative emotions, anger and sadness, were very similar, as were the results for both positive emotions, gratitude and happiness, only analyses with anger and happiness are presented in the following sections.

4.2 Deterministic analysis

Optimal conjunctive logical rules (i.e., with minimal number of discrepancies) were found using the previously discussed branch-and-bound algorithm (Leenen & Van Mechelen, 1998). For $Y_{\text{angry}}$, the best logical rule combines the complements of the predictors 9, 10, and 14: a person reports (s)he experiences anger in a given situation iff "it is not the case that something good happened and (s)he does not want something like this and (s)he does not want to do anything good for somebody." $Y_{\text{happy}}$, on the other hand, is best predicted by the single predictor 9: a person reports (s)he feels happy iff "something good happened." Table 2 presents some goodness-of-fit indices for both optimal rules.

Emotion     Optimal rule          % discrepancies   Jaccard index   $\hat{p}$
Anger       ¬X9 ∧ ¬X10 ∧ ¬X14     9                 .80             .75
Happiness   X9                    6                 .89             .88

Table 2: Optimal logical rules for $Y_{\text{angry}}$ and $Y_{\text{happy}}$ and associated goodness-of-fit statistics as found by a deterministic analysis

4.3 Bayesian analysis

4.3.1 Model estimation

The procedure discussed in Section 3.2 was used to simulate the posterior distribution of $(\theta, \pi)$. For each criterion, we ran m = 5 sequences of the described Gibbs-Metropolis algorithm. After convergence, namely when Gelman and Rubin's (1992) $\sqrt{\hat{R}}$ statistic was smaller than 1.1 for each of the parameters $\theta_j$ $(j = 1, \ldots, 28)$ and $\pi$, another 2000 runs in each sequence were executed, ending up with L = 10000 posterior draws for each criterion.

Table 3 gives the marginal distribution of the $\theta$-parameters (or, equivalently, the conjunctive combinations) for anger and happiness, respectively. The results of the Bayesian analysis show that the rule found by the deterministic branch-and-bound algorithm may be one of several "best" solutions: for anger, (at least) five different conjunctive combinations have a minimal number of discrepancies and two have only 1 discrepancy more; for happiness, four conjunctive combinations do equally well and 12 logical rules have 1 discrepancy more. (In our example, n = 100, so the % discrepancies in Table 3 are equal to the number of discrepancies. In the table, models with the same number of discrepancies have different computed posterior probabilities; this is entirely due to simulation variability.)

Criterion   % discrepancies   Logical rules                        Posterior probabilities
Angry       9                 ¬X9 ∧ ¬X10 ∧ ¬X14                    .189, .183, .166, .164
                              ¬X9 ∧ ¬X10 ∧ ¬X11 ∧ ¬X14
                              ¬X9 ∧ ¬X10 ∧ ¬X12 ∧ ¬X14
                              ¬X10 ∧ ¬X11 ∧ ¬X12 ∧ ¬X14
                              ¬X9 ∧ ¬X10 ∧ ¬X11 ∧ ¬X12 ∧ ¬X14
            10                ¬X10 ∧ ¬X11 ∧ ¬X14                   .019, .015
                              ¬X10 ∧ ¬X12 ∧ ¬X14
            ≥ 11              other                                < .006
Happy       6                 X9                                   .204, .185, .170
                              ¬X1 ∧ X9
                              ¬X4 ∧ X9
                              ¬X1 ∧ ¬X4 ∧ X9
            7                 ¬X5 ∧ X9                             .030, .023, .017, .016,
                              ¬X1 ∧ ¬X5 ∧ X9                       .015, .014, .010
                              ¬X4 ∧ ¬X5 ∧ X9
                              ¬X1 ∧ ¬X4 ∧ ¬X5 ∧ X9
                              X11
                              ¬X1 ∧ X11
                              ¬X4 ∧ X11
                              ¬X5 ∧ X11
                              ¬X1 ∧ ¬X4 ∧ X11
                              ¬X1 ∧ ¬X5 ∧ X11
                              ¬X4 ∧ ¬X5 ∧ X11
                              ¬X1 ∧ ¬X4 ∧ ¬X5 ∧ X11
            ≥ 8               other                                < .004

Table 3: Simulated posterior distribution of $\theta$ for $Y_{\text{angry}}$ and $Y_{\text{happy}}$

The wide range of available models that fit about equally well indicates that the stochastic extension can add a considerable amount of information to the deterministic analysis, as otherwise only a single rule might be considered. For this particular case, one may remark that most of the other rules merely add one or more predictors to the rule found by the deterministic algorithm, which makes them less important, as the added predictors cannot be considered singly necessary. But even then, the Bayesian analysis gives more insight into the uncertainty associated with the best solutions and into which other values for $\theta$ can reasonably be considered for the given data set. With respect to the uncertainty associated with the models, it was found for anger that $\pi = .100$ and for happiness that $\pi = .072$.

For the criterion anger, only five predictors showed up in the conjunctive rules with posterior probability over .10, namely: ¬X9, ¬X10, ¬X11, ¬X12, and ¬X14. Similarly, for happiness, the logical rules with highest posterior mass only use a subset of the six predictors ¬X1, ¬X4, ¬X5, X9, X10, and X11. By way of illustration, we discuss the results of a reanalysis of both criteria with only the semantic primitives that appeared relevant as predictors. This allows us to compute the posterior distributions exactly and to compare this theoretical distribution with the simulated distribution. As in the previous analysis, we use m = 5 sequences and, after convergence, we collect L = 10000 posterior draws for both criteria.

Table 4 displays the marginal posterior distribution of $\theta$ for anger and happiness, respectively, both the simulated and the theoretical posterior distribution. The results show that the simulated distribution is always close to the theoretical distribution, from which we may conclude that the estimation procedure works fine.

Criterion   Logical rules                                          Exact     Simulated
Angry       ¬X9 ∧ ¬X10 ∧ ¬X14                                      .189      .170, .178, .189, .212
            ¬X9 ∧ ¬X10 ∧ ¬X11 ∧ ¬X14
            ¬X9 ∧ ¬X10 ∧ ¬X12 ∧ ¬X14
            ¬X10 ∧ ¬X11 ∧ ¬X12 ∧ ¬X14
            ¬X9 ∧ ¬X10 ∧ ¬X11 ∧ ¬X12 ∧ ¬X14
            ¬X10 ∧ ¬X11 ∧ ¬X14                                     .021      .017, .028
            ¬X10 ∧ ¬X12 ∧ ¬X14
            other                                                  < .005    < .003
Happy       X9                                                     .200      .190, .192, .195, .212
            ¬X1 ∧ X9
            ¬X4 ∧ X9
            ¬X1 ∧ ¬X4 ∧ X9
            the 12 rules with one discrepancy more (see Table 3)   .015      .012 to .019
            other                                                  < .003    < .002

Table 4: Exact and simulated posterior distribution of $\theta$ for $Y_{\text{angry}}$ and $Y_{\text{happy}}$ using relevant predictors only

4.3.2 Model checking

In this section, the posterior-predictive-check approach is illustrated for checking one assumption that implicitly underlies the model applied in the previous subsection. We assumed that $\pi$ was constant across observations (see Eq. (3)), such that in the study discussed above no differences among the five subjects involved are allowed. Or, otherwise stated, the subjects apply the respective logical rules with equal accuracy.

Individual differences in error rate may be quantified by the variance in the number of prediction errors between the five subjects. Therefore, a test variable $T(y, \theta)$ is defined as:

\[ T(y, \theta) = \frac{\sum_{h=1}^{5} \left[ \dfrac{D_h(y, \theta)}{20} - \dfrac{D(y, \theta)}{100} \right]^2}{5 - 1}, \]

where $D_h$ is the number of discrepancies between the 20-component $y$ and $\hat{y}$ vectors of subject $h$. The larger the variation among subjects, the larger the value of $T$.

For both criteria anger and happiness, 10000 replicated data sets $y^{(1)}, \ldots, y^{(10000)}$ were simulated as described in Section 3.3, and both the realized value $T(y, \theta^{(l)})$ and the replicated value $T(y^{(l)}, \theta^{(l)})$ $(l = 1, \ldots, 10000)$ were calculated. Next, posterior predictive p-values were computed as the proportion of the 10000 simulations for which $T(y^{(l)}, \theta^{(l)}) > T(y, \theta^{(l)})$. For anger, the posterior predictive p-value equals .566 and for happiness it equals .624, which is visualized in Figure 1. Figure 1 plots the observed versus the replicated values on the test variable: roughly half of the points lie above and half of the points below the first bisector. As a result, it is concluded that the posterior predictive check provides no evidence for individual differences in accuracy.
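The posterior predictive check of Steps 3 to 6 with the between-subject variance as test variable can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, subjects are assumed to occupy consecutive blocks of equal size in the data, and the posterior draws in the usage example are fabricated for the toy data rather than taken from the application.

```python
import random
from itertools import product


def predict_conjunctive(theta, X):
    """Eq. (1): conjunction of the predictors selected by theta."""
    return [int(all(x for x, t in zip(row, theta) if t == 1)) for row in X]


def error_rate_variance(y, y_hat, n_subjects):
    """Test variable T: variance of the per-subject error rates around the
    overall error rate, with divisor n_subjects - 1."""
    n = len(y)
    size = n // n_subjects  # equal block size per subject (assumption)
    overall = sum(a != b for a, b in zip(y, y_hat)) / n
    total = 0.0
    for h in range(n_subjects):
        block = range(h * size, (h + 1) * size)
        rate_h = sum(y[i] != y_hat[i] for i in block) / size
        total += (rate_h - overall) ** 2
    return total / (n_subjects - 1)


def posterior_predictive_pvalue(y, X, draws, n_subjects, rng):
    """Proportion of draws for which T(y_rep, theta) > T(y, theta)."""
    exceed = 0
    for theta, pi in draws:
        y_hat = predict_conjunctive(theta, X)
        # Step 3: each replicated value differs from y_hat with probability pi.
        y_rep = [1 - yh if rng.random() < pi else yh for yh in y_hat]
        t_rep = error_rate_variance(y_rep, y_hat, n_subjects)   # Step 5
        t_obs = error_rate_variance(y, y_hat, n_subjects)
        exceed += t_rep > t_obs                                 # Step 6
    return exceed / len(draws)


# Hypothetical toy data: 4 subjects with 8 situations each, criterion = X1 AND X2
# with three prediction errors spread over three subjects.
rng = random.Random(7)
X = [list(row) for row in product([0, 1], repeat=3)] * 4
y = [row[0] * row[1] for row in X]
for i in (0, 9, 18):
    y[i] = 1 - y[i]

# Stand-in posterior draws: theta fixed at the generating rule, pi from its
# conditional Beta(D + 1, n - D + 1) distribution with D = 3 and n = 32.
draws = [((1, 1, 0), rng.betavariate(4, 30)) for _ in range(500)]
p_value = posterior_predictive_pvalue(y, X, draws, n_subjects=4, rng=rng)
print(p_value)
```

A p-value close to 0 would indicate more between-subject variation in the observed data than the model with a common $\pi$ reproduces.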

Figure 1: Plot of the realized $T(y, \theta^{(l)})$ versus the replicated $T(y^{(l)}, \theta^{(l)})$ for the emotions "happy" and "angry." The x and y coordinates are jittered by adding normal random numbers to each point's coordinates (with standard deviation .001) in order to display multiple values.

5 Concluding remarks

In some cases, one may expect the probability of a false positive to differ from the probability of a false negative prediction error. For example, in a medical context, caution may cause a bias in predicting success on a dangerous surgery, which makes it unlikely that failure occurs when success was predicted, whereas the reverse prediction error is (fortunately) more likely. As discussed by Gelman et al. (in preparation), the model can be straightforwardly expanded by allowing different error rates $\pi_0$ and $\pi_1$ for responses predicted to be 0 and 1, respectively.

In contrast with most other Bayesian generalizations of deterministic disjunctive/conjunctive models (Gelman et al., in preparation), there is for the Boolean regression model no need to restrict $\pi$ (or $\pi_0$ and $\pi_1$) to be smaller than 0.5. Moreover, allowing $\pi$ to cover the complete range from 0 to 1 may be helpful in distinguishing between disjunctive and conjunctive association rules. From our discussion on the duality of conjunctive and disjunctive rules in Section 2.1, it is clear that if for each $X_j$ both the original variable $X_j$ and the complemented variable $\neg X_j$ are included as predictors, then a conjunctive rule with error rate $\pi$ is formally equivalent to a disjunctive rule with error rate $1 - \pi$. The analyses in Section 4 for the illustrative example did include for every predictor both the original and the complemented version and resulted in values for $\pi$ that are (considerably) smaller than 0.5. Hence, for this particular case a conjunctive rule is found to be more appropriate than a disjunctive one, which is a result that corresponds with earlier theories and that was established only a posteriori.

As a final comment, we note that both the deterministic and Bayesian approaches for Boolean regression seem to be less useful when the number of observations is very large. It is true in general that the variance in $D(y, \theta)$ among $\theta$ (i.e., differences in the number of discrepancies associated with the respective models) is expected to increase with the number of observations. More in particular, the difference between the best and the second best model most likely increases with the number of observations. From Equations (6) and (7), which make clear how the posterior density depends on the number of discrepancies, it follows that the larger the number of observations, the more sharply the (marginal) posterior distribution (for $\theta$) is peaked. This implies that if a model $\theta$ for some data set with, say, n = 1 million observations has 10% discrepancies and $\theta'$ has 10.01% discrepancies, then $\theta$ has a much higher posterior density than $\theta'$. How this finding can be reconciled with the intuition that both models should have about equal posterior density is one of the objectives for further research.
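The size of the effect in the last example can be worked out by applying Eq. (7) repeatedly, once for every additional discrepancy. The sketch below is an illustration only; the function name is hypothetical. With n = 1,000,000, D = 100,000 (10%) and D' = 100,100 (10.01%), the log of the posterior ratio comes out near 220, that is, a ratio on the order of $10^{95}$.

```python
import math


def log_posterior_ratio(n, d_low, d_high):
    """log of p(theta | y) / p(theta' | y) when theta has d_low and theta'
    has d_high discrepancies: the sum of log((n - d) / (d + 1)) over
    d = d_low, ..., d_high - 1, i.e. Eq. (7) applied step by step."""
    return sum(math.log((n - d) / (d + 1)) for d in range(d_low, d_high))


n = 1_000_000
log_ratio = log_posterior_ratio(n, 100_000, 100_100)
print(round(log_ratio, 1), round(log_ratio / math.log(10), 1))
```

Working on the log scale avoids evaluating the binomial coefficients of Eq. (6) directly, which would be very large integers for n of this size.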

References

Biswas, N. N. (1975). Introduction to logic and switching theory. New York: Gordon and Breach.

Gelman, A., Leenen, I., Van Mechelen, I., & De Boeck, P. (in preparation). Bridges between deterministic and probabilistic classification models.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-511.

Halder, A. K. (1978). Grouping table for the minimization of n-variable Boolean functions. Proceedings of the Institution of Electric Engineers London, 125, 474-482.

Leenen, I., & Van Mechelen, I. (1998). A branch-and-bound algorithm for Boolean regression. In: I. Balderjahn, R. Mathar, & M. Schader (Eds.), Data Highways and Information Flooding, a Challenge for Classification and Data Analysis (pp. 164-171). Berlin: Springer-Verlag.

McCluskey, E. J. (1965). Introduction to the theory of switching circuits. New York: McGraw-Hill.

McKenzie, D. M., Clarke, D. M., & Low, L. H. (1992). A method of constructing parsimonious diagnostic and screening tests. International Journal of Methods in Psychiatric Research, 2, 71-79.

Ragin, C. C., Mayer, S. E., & Drass, K. A. (1984). Assessing discrimination: A Boolean approach. American Sociological Review, 49, 221-234.

Sen, M. (1983). Minimization of Boolean functions of any number of variables using decimal labels. Information Sciences, 30, 37-45.

Sneath, P. H. A., & Sokal, R. R. (1973). Numerical taxonomy. San Francisco: Freeman.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Van Mechelen, I. (1988). Prediction of a dichotomous criterion variable by means of a logical combination of dichotomous predictors. Mathématiques, Informatiques et Sciences humaines, 102, 47-54.

Van Mechelen, I., & De Boeck, P. (1990). Projection of a binary criterion into a model of hierarchical classes. Psychometrika, 55, 677-694.

Wierzbicka, A. (1992). Defining emotion concepts. Cognitive Science, 16, 539-581.

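Appendix: Deriving posterior distributions

We first work out the prior predictive distribution $p(y)$, writing $D_\vartheta$ for $D(y, \vartheta)$:

\[
\begin{aligned}
p(y) &= \sum_{\vartheta \in \{0,1\}^k} \int_0^1 p(y \mid \vartheta, \pi)\, p(\vartheta)\, p(\pi)\, d\pi \\
&= \sum_{\vartheta \in \{0,1\}^k} \int_0^1 \pi^{D_\vartheta} (1 - \pi)^{n - D_\vartheta}\, \frac{1}{2^k} \cdot 1\, d\pi \\
&= \frac{1}{2^k} \sum_{\vartheta \in \{0,1\}^k} B(D_\vartheta + 1,\ n - D_\vartheta + 1) \int_0^1 \frac{\pi^{D_\vartheta} (1 - \pi)^{n - D_\vartheta}}{B(D_\vartheta + 1,\ n - D_\vartheta + 1)}\, d\pi \\
&= \frac{1}{2^k} \sum_{\vartheta \in \{0,1\}^k} \frac{D_\vartheta!\,(n - D_\vartheta)!}{(n + 1)!} \\
&= \frac{1}{2^k (n + 1)} \sum_{\vartheta \in \{0,1\}^k} \binom{n}{D_\vartheta}^{-1},
\end{aligned}
\]

the integral in the third step being equal to 1 as it is the area under a Beta density.

For the posterior distribution of $(\theta, \pi)$, we start from Eq. (4):

\[
\begin{aligned}
p(\theta, \pi \mid y) &= \frac{p(y \mid \theta, \pi)\, p(\theta, \pi)}{p(y)} \\
&= \frac{\pi^{D} (1 - \pi)^{n - D}\, \dfrac{1}{2^k}}{\dfrac{1}{2^k (n + 1)} \sum_{\vartheta \in \{0,1\}^k} \binom{n}{D_\vartheta}^{-1}} \\
&= \frac{(n + 1)\, \pi^{D} (1 - \pi)^{n - D}}{\sum_{\vartheta \in \{0,1\}^k} \binom{n}{D_\vartheta}^{-1}}.
\end{aligned}
\]

To derive the marginal posterior distribution of $\theta$, $\pi$ is integrated out in the joint posterior distribution for $\theta$ and $\pi$ in the formula above:

\[
\begin{aligned}
p(\theta \mid y) &= \int_0^1 p(\theta, \pi \mid y)\, d\pi \\
&= \int_0^1 \frac{(n + 1)\, \pi^{D} (1 - \pi)^{n - D}}{\sum_{\vartheta \in \{0,1\}^k} \binom{n}{D_\vartheta}^{-1}}\, d\pi \\
&= \frac{\binom{n}{D}^{-1}}{\sum_{\vartheta \in \{0,1\}^k} \binom{n}{D_\vartheta}^{-1}} \int_0^1 \binom{n}{D} (n + 1)\, \pi^{D} (1 - \pi)^{n - D}\, d\pi \\
&= \frac{\binom{n}{D}^{-1}}{\sum_{\vartheta \in \{0,1\}^k} \binom{n}{D_\vartheta}^{-1}},
\end{aligned}
\]

the latter integral being 1 as it is again the area under a Beta density.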
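For small $k$, the marginal posterior derived above can be evaluated exactly by enumerating all $2^k$ parameter vectors. The sketch below is an illustration under stated assumptions: the function names and the toy data are hypothetical, and the enumeration is only practical for a handful of predictors.

```python
from itertools import product
from math import comb


def predict_conjunctive(theta, X):
    """Eq. (1): conjunction of the predictors selected by theta."""
    return [int(all(x for x, t in zip(row, theta) if t == 1)) for row in X]


def discrepancies(y, theta, X):
    """D(y, theta): number of units for which y_i and y_hat_i differ."""
    return sum(yi != yh for yi, yh in zip(y, predict_conjunctive(theta, X)))


def exact_marginal_posterior(y, X):
    """Eq. (6): p(theta | y) is proportional to 1 / C(n, D(y, theta)),
    normalized over all 2**k binary vectors theta."""
    n, k = len(y), len(X[0])
    weights = {
        theta: 1 / comb(n, discrepancies(y, theta, X))
        for theta in product([0, 1], repeat=k)
    }
    total = sum(weights.values())
    return {theta: w / total for theta, w in weights.items()}


# Hypothetical toy data: 16 units, 3 predictors, criterion = X1 AND X2 with one error.
X = [list(row) for row in product([0, 1], repeat=3)] * 2
y = [row[0] * row[1] for row in X]
y[0] = 1 - y[0]

posterior = exact_marginal_posterior(y, X)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

The ratio of any two entries depends only on their numbers of discrepancies, which is the property expressed by Eq. (7).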