Statistical Themes and Lessons for Data Mining

Data Mining and Knowledge Discovery, 1, 25-42 (1996)
© 1996 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

CLARK GLYMOUR
Department of Cognitive Psychology, Carnegie Mellon University, Pittsburgh, PA 15213

DAVID MADIGAN
Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195

DARYL PREGIBON

PADHRAIC SMYTH
Information and Computer Science, University of California, Irvine, CA 92717

Editor: Usama Fayyad

Abstract. Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.

Keywords: Statistics, uncertainty, modeling, bias, variance

1. Introduction

Sta-tis-tics (noun). The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling. (American Heritage Dictionary).

Statistics is enjoying a renaissance period. Modern computing hardware and software have freed the statistician from narrowly specified models and spawned a fresh approach to the subject, especially as it relates to data analysis. Today's statistical toolkit draws on a rich body of theoretical and methodological research (Table 1).

The field of data mining, like statistics, concerns itself with "learning from data" or "turning data into information". The context encompasses statistics, but with a somewhat different emphasis. In particular, data mining involves retrospective analyses of data: thus, topics such as experimental design are outside the scope of data mining and fall within statistics proper. Data miners are often more interested in understandability than accuracy or predictability per se. Thus, there is a focus on relatively simple interpretable models involving rules, trees, graphs, and so forth. Applications involving very large numbers of variables and vast numbers of measurements are also common in data mining. Thus, computational efficiency
and scalability are critically important, and issues of statistical consistency may be a secondary consideration. Furthermore, the current practice of data mining is often pattern-focused rather than model-focused, i.e., rather than building a coherent global model which includes all variables of interest, data mining algorithms (such as any of the many rule induction systems on the market) will produce sets of statements about local dependencies among variables (in rule form).

Table 1. Statisticians have developed a large infrastructure (theory) to support their methods and a language (probability calculus) to describe their approach to quantifying the uncertainty associated with drawing inferences from data. These methods enable one to describe relationships between variables for prediction, quantifying effects, or suggesting causal paths.

Area of Statistics                  Description of Activities
experimental design & sampling      how to select cases if one has the liberty to choose
exploratory data analysis           hypothesis generation rather than hypothesis testing
statistical graphics                data visualization
statistical modeling                regression and classification techniques
statistical inference               estimation and prediction techniques

In this overall context, current data mining practice is very much driven by practical computational concerns. However, in focusing almost exclusively on computational issues, it is easy to forget that statistics is in fact a core component. The term "data mining" has long had negative connotations in the statistics literature (Selvin and Stuart, 1966; Chatfield, 1995). Data mining without proper consideration of the fundamental statistical nature of the inference problem is indeed to be avoided. However, a goal of this article is to convince the reader that modern statistics can offer significant constructive advice to the data miner, although many problems remain unsolved. Throughout the article we highlight some major themes of statistics research, focusing in particular on the practical lessons pertinent to data mining.

2. An Overview of Statistical Science

This Section briefly describes some of the central statistical ideas we think relevant to data mining. For a rigorous survey of statistics, the mathematically inclined reader should see, for example, Schervish (1995). For reasons of space we will ignore a number of interesting topics, including time series analysis and meta-analysis.

Probability Distributions. The statistical literature contains mathematical characterizations of a wealth of probability distributions, as well as properties of random variables (functions defined on the "events" to which a probability measure assigns values). Important relations among probability distributions include marginalization (summing over a subset of values) and conditionalization (forming
a conditional probability measure from a measure on a sample space and some event of positive measure). Essential relations among random variables include independence, conditional independence, and various measures of dependence, of which the most famous is the correlation coefficient. The statistical literature also characterizes families of distributions by properties that are useful in identifying any particular member of the family from data, or by closure properties useful in model construction or inference, for example conjugate families, closed under conditionalization, and the multinormal family, closed under linear combination. A knowledge of the properties of distribution families can be invaluable in analyzing data and making appropriate inferences.

Estimation, Consistency, Uncertainty, Assumptions, Robustness, and Model Averaging. An estimator is a function from sample data to some estimand, such as the value of a parameter. When the data comprise a sample from a larger actual or potential collection governed by some probability distribution, the family of estimators corresponding to all possible samples from that collection also has a probability distribution. Classical statistics investigates such distributions of estimators in order to establish basic properties such as reliability and uncertainty. A variety of resampling and simulation techniques also exist for assessing estimator uncertainty (Efron and Tibshirani, 1993).

Estimation almost always requires some set of assumptions. Such assumptions are typically false, but often useful. If a model (which we can think of as a set of assumptions) is incorrect, estimates based on it can be expected to be incorrect as well. One of the aims of statistical research is to find ways to weaken the assumptions necessary for good estimation. "Robust statistics" (Huber, 1981) looks for estimators that work satisfactorily for larger families of distributions and have small errors when assumptions are violated.

Bayesian estimation emphasizes that alternative models and their competing assumptions are often plausible. Rather than making an estimate based on a single model, several models can be considered and an estimate obtained as the weighted average of the estimates given by the individual models (Madigan and Raftery, 1994). In fact, such Bayesian model averaging is bound to improve predictive performance, on average. Since the models obtained in data mining are usually the results of some automated search procedure, accounting for the potential errors associated with the search itself is crucial. In practice, this often requires a Monte Carlo analysis. Our impression is that the error rates of search procedures proposed and used in the data mining and in the statistical literature are far too rarely estimated in this way. (See Spirtes et al., 1993 for Monte Carlo test design for search procedures.)

Hypothesis Testing. Since statistical tests are widely used, some of their important limitations should be noted. Viewed as a one-sided estimation method, hypothesis testing is inconsistent unless the alpha level of the testing rule is decreased appropriately as the sample size increases. Generally, an α level test of one hypothesis and an α level test of another hypothesis do not jointly provide an α level test of the conjunction of the two hypotheses. In special cases, rules (sometimes called contrasts) exist for simultaneously testing several hypotheses (Miller, 1981). An important corollary for data mining is that the α level of a test has nothing directly to do with the probability of error in a search procedure that involves testing a series of hypotheses. If, for example, for each pair of a set of variables, hypotheses of independence are tested at α = 0.05, then 0.05 is not the probability of erroneously finding some dependent set of variables when in fact all pairs are independent. Thus, in data mining procedures that use a sequence of hypothesis tests, the alpha level of the tests cannot generally be taken as an estimate of any error probability related to the outcome of the search.

Data miners should note that while error probabilities of tests have something to do with the truth of hypotheses, the connection is somewhat tenuous (see section 5.3). Hypotheses that are excellent approximations may be rejected in large samples; tests of linear models, for example, typically reject them in very large samples no matter how closely they seem to fit the data.

Model Scoring. The evidence provided by data should lead us to prefer some models or hypotheses to others, and to be indifferent between still other models. A score is any rule that maps models and data to numbers whose numerical ordering corresponds to a preference ordering over the space of models, given the data. For the reasons just considered, scoring rules are often an attractive alternative to tests. Typical rules assign models a value determined by the likelihood function associated with the model, the number of parameters, or dimension, of the model, and the data. Popular rules include the Akaike Information Criterion (Akaike, 1974), Bayes Information Criterion (Raftery, 1995), and Minimum Description Length (Rissanen, 1978). Given a prior probability distribution over models, the posterior probability on the data is itself a scoring function, arguably a privileged one. The Bayes Information Criterion approximates posterior probabilities in large samples.

There is a notion of consistency appropriate to scoring rules; in the large sample limit, almost surely the true model should be among those receiving maximal scores. AIC scores are not, in general, consistent (Schwarz, 1978). There are also uncertainties associated with scores, since two different samples of the same size from the same distribution may yield not only different numerical values for the same model, but even different orderings of models.

For obvious combinatorial reasons, it is often impossible when searching a large model space to calculate scores for all models; it is, however, often feasible to describe and calculate scores for a few equivalence classes of models receiving the highest scores.

In some contexts, inferences made using Bayesian scores can differ a great deal from inferences made with hypothesis tests. Raftery (1995) gives examples of models that account for almost all of the variance of an outcome of interest, and have very high Bayesian scores, but are overwhelmingly rejected by statistical tests.

Markov Chain Monte Carlo. Historically, insurmountable computational difficulties forced data analysts to eschew exact analysis of elaborate hierarchical Bayesian models and complex likelihood calculations. Recent dramatic advances in Monte Carlo methods have, however, liberated analysts from some of these constraints. One particular class of simulation methods, dubbed Markov chain Monte Carlo, originally developed in statistical mechanics, has revolutionized the practice of Bayesian statistics. Smith and Roberts (1993) provide an accessible overview from the Bayesian perspective; Gilks et al. (1996) provide a practical introduction addressing both Bayesian and non-Bayesian perspectives.

Simulation methods may become unacceptably slow when faced with massive data sets. In such cases, recent advances in analytic approximations prove useful; see for example Kooperberg et al. (1996), Kass and Raftery (1995), and Geiger et al. (1996).

Generalized model classes. A major achievement of statistical methodological research has been the development of very general and flexible model classes. Generalized Linear Models, for instance, embrace many classical linear models, and unify estimation and testing theory for such models (McCullagh and Nelder, 1989). Generalized Additive Models show similar potential (Hastie and Tibshirani, 1990). Graphical models (Lauritzen, 1996) represent probabilistic and statistical models with planar graphs, where the vertices represent (possibly latent) random variables and the edges represent stochastic dependences. This provides a powerful language for describing models, and the graphs themselves make modeling assumptions explicit. Graphical models provide important bridges between the vast statistical literature on multivariate analysis and such fields as artificial intelligence, causal analysis, and data mining.

Rational Decision Making and Planning. The theory of rational choice assumes the decision maker has available a definite set of alternative actions, knowledge of a definite set of possible alternative states of the world, knowledge of the payoffs or utilities of the outcomes of each possible action in each possible state of the world, and knowledge of the probabilities of various possible states of the world. Given all of this information, a decision rule specifies which of the alternative actions ought to be taken. A large literature in statistics and economics addresses alternative decision rules: maximizing expected utility, minimizing maximum loss, etc. Typically, rational decision making and planning are the goals of data mining, and rather than providing techniques or methods for data mining, the theory of rational choice poses norms for the use of information obtained from a database.

The very framework of rational decision making requires probabilities and a knowledge of the effects alternative actions will have. To know the outcomes of actions is to know something of cause and effect relations, and extracting such causal information is often one of the principal goals of data mining and of statistical inference more generally.

Inference to Causes. Understanding causation is the hidden force behind the historical development of statistics. From the beginning of the subject, in the work of Bernoulli and Laplace, the absence of causal connection between two variables has been taken to imply their probabilistic independence (see Stigler, 1986), and the same idea is fundamental in the theory of experimental design (Fisher, 1958). Early in this century, Wright (1921) introduced directed graphs to represent causal hypotheses (with vertices as random variables and edges representing direct influences), and they have become common representations of causal hypotheses in the social sciences, biology, computer science and engineering.

Kiiveri and Speed (1982) combined directed graphs with a generalized connection between independence and absence of causal connection in what they called the Markov condition: provided Y is not an effect of X, X and Y are conditionally independent given the direct causes of X. They showed that much of the linear modeling literature tacitly assumed the Markov condition; the same is true for causal models of categorical data, and virtually all causal models of systems without feedback. Under additional assumptions, conditional independence therefore provides information about causal dependence. The most common, and most thoroughly investigated, additional assumption is that all conditional independencies are due to the Markov condition applied to the directed graph describing the actual causal processes generating the data, a requirement that has been given many names, including "faithfulness." Directed graphs with associated probability distributions satisfying the Markov condition are called by different names in different literatures: Bayes nets, belief nets, structural equation models, path models, etc. Nonetheless, causal inferences from uncontrolled convenience samples are liable to many sources of error and data miners should proceed with extreme caution.

Prediction. Sometimes one is interested in using a sample, or a database, to predict properties of a new sample, where it is assumed that the two samples are obtained from the same probability distribution. As with estimation, in prediction we are interested both in reliability and in uncertainty, often measured by the variance of the predictor.

Prediction methods for this sort of problem always assume some structure in the probability distribution. In data mining contexts, structure is typically either supplied by human experts, or inferred from the database automatically. Regression, for example, assumes a particular functional form relating variables. Structure can also be specified in terms of constraints, such as independence, conditional independence, higher order conditions on correlations, etc. On average, a prediction method that guarantees satisfaction of the constraints realized in the probability distribution (and no others) will be more accurate and have smaller variance than one that does not. Finding the appropriate constraints to satisfy is the most difficult issue in this sort of prediction. As with estimation, prediction can be improved by model averaging, provided the prior probabilities of the alternative assumptions imposed by the model are available.

3. Is Data Mining "Statistical Déjà Vu" (All Over Again)?

In the mid 1960's, the statistics community referred to unfettered exploration of data as "fishing" or "data dredging" (Selvin and Stuart, 1966). The community, enamored by elegant (analytical) mathematical solutions to inferential problems, argued that since their theories were invalidated by "looking at the data", it was wrong to do so. The major proponent of the exploratory data analysis (EDA) school, J. W. Tukey, countered this argument with the obvious retort that statisticians were putting the cart before the horse. He argued that statistical theory should adapt to the scientific method rather than the other way around. Thirty years on, the statistical community has largely adopted Tukey's perspective, and has made considerable progress in serving both masters, namely acknowledging that model search is a critical and unavoidable step in the modeling process, and devising formal methods to account for search in their inferential procedures.

Three themes of modern statistics that are of fundamental importance to data miners are: clarity about goals, appropriate reliability assessment, and adequate accounting for sources of uncertainty.

Clarity about goals. Sometimes data analysis aims to find a convenient, easily computable representation of how the data are distributed in a particular database. In other cases, data analysis aims to predict features of new cases, or new samples, drawn from outside the database used to develop a predictive model (this is particularly challenging in dynamic situations). In yet other cases, data analysis aims to provide a basis for policy. That is, the analysis is intended to yield insight into causal mechanisms that are used to form predictions about new samples that might be produced by interventions or actions that did not apply in the original database from which the model (or models) were developed. Each of these goals presents distinct inference problems, with distinct hazards. Confusing or equivocating over the aim invites the use of inappropriate methods and may result in unfortunate predictions and inferences.

As an example, consider the observational study reported by Chasnoff et al. (1989) comparing babies born to cocaine-using mothers with babies born to non-cocaine-using mothers. The authors concluded: "For women who become pregnant and are users of cocaine, intervention in early pregnancy with cessation of cocaine use will result in improved obstetric outcome". Fortunately, there exists independent evidence to support this causal claim. However, much of Chasnoff et al.'s paper focuses on a statistical analysis (analysis of variance) that has little, if anything, to do with the causal question of interest.

Hand (1994) provides a series of examples illustrating how easy it is to give the right answers to the wrong question. For example, he discusses the problem of analyzing clinical trial data where patients drop out due to adverse side-effects of a particular treatment (Diggle and Kenward, 1994). In this case, the important issue is which population one is interested in modelling: the population at large, or the population who remain within the trial? This problem arises in more general settings than in clinical trials, e.g., non-respondents (refusers) in survey data. In such situations it is important to be explicit about the questions one is trying to answer.

In this general context an important issue (discussed at length in Hand (1994)) is that of formulating statistical strategy, i.e., how does one structure a data analysis problem so that the right question can be asked? Hand's conclusion is that this is largely an "art" because it is less well formalized than the mathematical and computational details of applying a particular technique. This "art" is gained through experience (at present at least) rather than taught. The implication for
data mining is that human judgement is essential for many non-trivial inference problems. Thus, automation can at best only partially guide the data analysis process. Properly defining the goals of an analysis remains a human-centred, and often difficult, process.

Use of methods that are reliable means to the goal, under assumptions the user (and consumer) understands and finds plausible in the context. Statistical theory applies several meanings to the word "Reliability", many of which also apply to model search. For example, under what conditions does a search procedure provide correct information, of the kind sought, with probability one as the sample size increases without bound? Answers to such questions are often elusive and can require sophisticated mathematical analysis. Where answers are available, the data analyst should pay careful attention to the reasonableness of underlying assumptions. Another key data mining question is this: what are the probabilities of various kinds of errors that result from using a method in finite samples? The answers to this question will typically vary with the kinds of errors considered, with the sample size, and with the frequency of occurrence of the various kinds of targets or signals whose description is the goal of inference. These questions are often best addressed by Monte Carlo methods, although in some cases analytic results may be available.

A sense of the uncertainties of models and predictions. Quite often background knowledge and even the best methods of search and statistical assessment should leave the investigator with a range of uncertainties about the correct model, or the correct prediction. The data analyst must quantify these uncertainties so that subsequent decisions can be appropriately hedged. Section 4 provides a compelling example.

Another example involves a current debate in the atmospheric sciences. The question is whether or not specific recurrent pressure patterns can be clearly identified from daily geopotential height records which have been compiled in the Northern Hemisphere since 1948. The existence of well-defined recurrent patterns (or "regimes") has significant implications for models of upper atmosphere low-frequency variability beyond the time-scale of daily weather disturbances (and, thus, models of the earth's climate over large time-scales). Several studies have used a variety of clustering algorithms to detect inhomogeneities ("bumps") in low-dimensional projections of the gridded data (see Michelangeli et al. (1995) and others referred to therein). While this work has attempted to validate the cluster models via resampling techniques, it is difficult to infer from the multiple studies whether regimes truly exist, and, if they do, where precisely they are located. It seems likely that 48 winters worth of data is not enough to identify regimes to any degree of certainty and that there is a fundamental uncertainty (given the current data) about the underlying mechanisms at work. All is not lost, however, since it is also clear that one could quantify model uncertainty in this context, and theorize accordingly (see section 4).

In what follows we will elaborate on these points and offer a perspective on some of the hazards of data mining.
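The finite-sample question raised in this section (how often does a search procedure err?) can be explored with a small Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: the "search" is a hypothetical procedure that tests every pair of variables for independence at a nominal 0.05 level using the Fisher z-transform of the sample correlation, and the data are independent standard normal noise. The function names, the number of variables, the sample size, and the number of trials are all choices made for this example, not taken from the article.

```python
import math
import random


def sample_correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def search_finds_dependence(n_vars, n_cases, rng, z_crit=1.96):
    """Run one pairwise search on pure noise; True if any pair looks 'dependent'."""
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_cases)] for _ in range(n_vars)]
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            r = sample_correlation(data[i], data[j])
            # Fisher z-transform: approximately N(0, 1) when the pair is independent.
            z = math.atanh(r) * math.sqrt(n_cases - 3)
            if abs(z) > z_crit:
                return True
    return False


def search_error_rate(n_vars=10, n_cases=50, n_trials=200, seed=0):
    """Fraction of noise-only data sets in which the search reports a dependence."""
    rng = random.Random(seed)
    hits = sum(search_finds_dependence(n_vars, n_cases, rng) for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    print("one pair:     ", search_error_rate(n_vars=2, n_trials=400, seed=1))
    print("ten variables:", search_error_rate(n_vars=10))
```

With ten variables there are 45 pairwise tests; if those tests were independent, the chance of at least one false rejection would be 1 - 0.95^45, roughly 0.90, even though each individual test has level 0.05. The simulation estimates the corresponding rate for the search as a whole, which is the quantity a data miner actually needs.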

4. Characterizing Uncertainty

The statistical approach contends that reporting a single number for a parameter estimate or a prediction is almost always inadequate. Quantification of the uncertainty associated with a single number, while often challenging, is critical for subsequent decision making. As an example, Draper (1995) considered the case of the 1980 Energy Modeling Forum (EMF) at Stanford University where a 43-person working group of economists and energy experts convened to forecast world oil prices from 1981 to 2020. The group generated predictions based on a number of econometric models and scenarios, embodying a variety of assumptions about supply, demand, and growth rates of relevant quantities. A plausible reference scenario and model was selected as representative, but the summary report (EMF, 1982) cautioned against interpreting point predictions based on the reference scenario as "[the working group's] 'forecast' of the oil future, as there are too many unknowns to accept any projection as a forecast." The summary report did conclude, however, that most of the uncertainty about future oil prices "concerns not whether these prices will rise ... but how rapidly they will rise."

In 1980, the average spot price of crude oil was around $32 per barrel. Despite the warning about the potential uncertainty associated with the point estimates, governments and private companies around the world focused on the last sentence in the quotation above, and proceeded to invest an estimated $500 billion, on the basis that the price would probably be close to $40 per barrel in the mid-eighties. In fact, the actual 1986 world average spot price of oil was about $13 per barrel.

Using only the information available to the EMF in 1980, along with thoughtful but elementary statistical methods, Draper (1995) shows that a 90% predictive interval for the 1986 price would have ranged from about $20 to over $90. Note that this interval does not actually contain the actual 1986 price; insightful statistical analysis does not provide clairvoyance. However, decision makers would (and should) have proceeded more cautiously in 1980, had they understood the full extent of their uncertainty.

Correctly accounting for the different sources of uncertainty presents significant challenges. Until recently, the statistical literature focused primarily on quantifying parametric and predictive uncertainty in the context of a particular model. Two distinct approaches are in common use. "Frequentist" statisticians focus on the randomness in sampled data and summarize the induced randomness in parameters and predictions by so-called sampling distributions. "Bayesian" statisticians instead treat the data as fixed, and use Bayes theorem to turn prior opinion about quantities of interest (always expressed by a probability distribution) into a so-called posterior distribution that embraces all the available information. The fierce conflicts between previous generations of frequentists and Bayesians have largely given way in recent years to a more pragmatic approach; most statisticians will base their choice of tool on scientific appropriateness and convenience.
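To make the two approaches concrete, the following sketch computes both kinds of uncertainty statement for a single binomial proportion. It is an illustration only: the data (30 successes in 100 trials), the uniform Beta(1, 1) prior, and the function names are assumptions introduced here, and the frequentist interval uses the simple normal (Wald) approximation to the sampling distribution rather than any method discussed in the article.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Frequentist 95% interval from the normal approximation to the
    sampling distribution of the sample proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def posterior_interval(successes, n, prior_a=1.0, prior_b=1.0, draws=20000, seed=0):
    """Bayesian 95% interval: a Beta prior combined with binomial data gives a
    Beta posterior, summarized here by quantiles of simulated posterior draws."""
    rng = random.Random(seed)
    a = prior_a + successes
    b = prior_b + (n - successes)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    return samples[int(0.025 * draws)], samples[int(0.975 * draws)]


if __name__ == "__main__":
    # Hypothetical data: 30 successes observed in 100 trials.
    print("sampling-distribution interval:", wald_interval(30, 100))
    print("posterior interval:            ", posterior_interval(30, 100))
```

With a reasonable amount of data and a diffuse prior the two intervals are numerically close, which is consistent with the pragmatic attitude described above; they differ in interpretation, and can differ in value when data are scarce or the prior is informative.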

In any event, recent research has led to increased awareness that within-model uncertainty (as discussed above) may often, in practice, be dominated by between-model uncertainty (Chatfield, 1995; Draper, 1995; Madigan and York, 1995). It is common practice nowadays for statisticians and data miners to use computationally intensive model selection algorithms to seek out a single optimal model from an enormous class of potential models. The problem is that several different models may be close to optimal, yet lead to different inferences. Intuitively, ambiguity over the model should dilute information about effect parameters and predictions, since "part of the evidence is spent to specify the model" (Leamer, 1978, p. 91). Promising techniques for properly accounting for this source of uncertainty include Bayesian model averaging (Draper, 1995) and resampling methods (Breiman, 1996). The main point here is that data miners need to think carefully about model assessment and look beyond commonly used goodness-of-fit measures such as mean square error.

5. What can go wrong, will go wrong

Data mining poses difficult and fundamental challenges to the theory and practice of statistics. While statistics does not have all the answers for the data miner, it does provide a useful and practical framework within which to search for solutions. In this section, we describe some lessons that statisticians have learned when theory meets data.

5.1. Data Can Lie

Data mining applications typically rely on observational (as opposed to experimental) data. Interpreting observed associations in such data is challenging; sensible inferences require careful analysis, and detailed consideration of the underlying factors. Here we offer a detailed example to support this position.

Wen, Hernandez, and Naylor (1995; WHN hereafter) analyzed administrative records of all Ontario general hospital separations (discharges, transfers, or in-hospital deaths) from 1981 to 1990, focusing specifically on patients who had received a primary open cholecystectomy. Some of these patients had in addition received an incidental (i.e. discretionary) appendectomy during the cholecystectomy procedure. Table 2 displays the data on one outcome, namely in-hospital deaths. A chi-square test comparing this outcome for the two groups of patients shows a "statistically significant" difference. This "finding" is surprising since long-term prevention of appendicitis is the sole rationale for the incidental appendectomy procedure; no short-term improvement in outcomes is expected. This "finding" might lead a naive hospital policy maker to conclude that all cholecystectomy patients should have an incidental appendectomy to improve their chances of a good outcome! Clearly something is amiss: how could incidental appendectomy improve outcomes?
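The chi-square comparison mentioned above can be reproduced from the counts reported in Table 2 (21 deaths and 7,825 survivors with incidental appendectomy; 1,394 deaths and 190,205 survivors without). The sketch below is a minimal illustration, assuming the ordinary Pearson statistic for a 2x2 table without continuity correction; the article does not say which variant was used, and the function name is introduced here for the example.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# (deaths, survivors) for each group, as reported in Table 2.
WITH_APPENDECTOMY = (21, 7825)
WITHOUT_APPENDECTOMY = (1394, 190205)

# Upper 5% point of the chi-square distribution with 1 degree of freedom.
CRITICAL_VALUE_05 = 3.841

statistic = chi_square_2x2(*WITH_APPENDECTOMY, *WITHOUT_APPENDECTOMY)

if __name__ == "__main__":
    print("chi-square statistic:", round(statistic, 2))
    print("exceeds 5% critical value:", statistic > CRITICAL_VALUE_05)
```

The statistic comfortably exceeds the 5% critical value, which is exactly what makes the result tempting to misread: the test establishes an association in these records, not that the procedure improves survival.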

Table 2. In-hospital Survival of Patients Undergoing Primary Open Cholecystectomy With and Without Incidental Appendectomy.

                                   With Appendectomy    Without Appendectomy
In-hospital deaths, No. (%)        21 (0.27%)           1,394 (0.73%)
In-hospital survivors, No. (%)     7,825 (99.73%)       190,205 (99.27%)

WHN did separately consider a subgroup of low-risk patients. For these patients (using ten different definitions of "low-risk"), incidental appendectomy indeed resulted in poorer outcomes. Paradoxically, it could even be the case that appendectomy adversely affects outcomes for both high-risk patients and low-risk patients, but appears to positively affect outcomes when the low-risk and high-risk patients are combined. WHN do not provide enough data to check whether this so-called "Simpson's Paradox" (Simpson, 1951) occurred in this example. However, Table 3 presents data that are plausible and consistent with WHN's data.

Table 3. Fictitious data consistent with the Wen et al. (1995) data.

            With Appendectomy         Without Appendectomy
            Low-Risk    High-Risk     Low-Risk    High-Risk
Death       7           ...           ...         1294
Survival    7700        ...           ...         ...

Table 4 displays the corresponding proportions of in-hospital death for these fictitious data. Clearly the risk and death categories are directly correlated. In addition, appendectomies are more likely to be carried out on low-risk patients than on high-risk ones. Thus, if we did not know the risk category (age) of a patient, knowing that they had an appendectomy allows us to infer that they are more likely to be lower risk (younger). However, this does not in any way imply that having an appendectomy will lower one's risk. Nonetheless, when risk is omitted from the table, exactly such a fallacious conclusion appears justified from the data.

Returning to the original data, WHN provide a more sophisticated regression analysis, adjusting for many possible confounding variables (e.g. age, sex, admission status). They conclude that "there is absolutely no basis for any short-term improvement in outcomes" due to incidental appendectomy. This careful analysis agrees with common sense in this case. In general, analyses of observational data demand such care, and come with no guarantees. Other characteristics of available data that connive to spoil causal inferences include:
- Associations in the database may be due in whole or part to unrecorded common causes (latent variables).
- The population under study may be a mixture of distinct causal systems, resulting in statistical associations that are due to the mixing rather than to any direct influence of variables on one another or any substantive common cause.
- Missing values of variables for some units may result in misleading associations among the recorded values.
- Membership in the database may be influenced by two or more factors under study, which will create a "spurious" statistical association between those variables.
- Many models with quite distinct causal implications may "fit" the data equally or almost equally well.
- The frequency distributions in samples may not be well approximated by the most familiar families of probability distributions.
- The recorded values of variables may be the result of "feedback" mechanisms which are not well represented by simple "non-recursive" statistical models.

There is research that addresses aspects of these problems, but there are few statistical procedures yet available that can be used "off the shelf" (the way randomization is used in experimental design) to reduce these risks. Standard techniques such as multiple regression and logistic regression may work in many cases, such as in the appendectomy example, but they are not always adequate guards against these hazards. Indeed, controlling for possibly confounding variables with multiple regression can in some cases produce inferior estimates of effect sizes. Procedures recently developed in the artificial intelligence and statistics literature (Spirtes et al., 1993) address some of the problems associated with latent variables and mixing, but so far only for two families of probability distributions, the normal and multinomial.

Table 4. Proportion of in-hospital deaths cross-classified by incidental appendectomy and patient risk grouping for the fictitious data of Table 3.

            With Appendectomy    Without Appendectomy
Low-Risk    ...                  ...
High-Risk   ...                  ...
Combined    0.003                ...
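The reversal described in the appendectomy example (worse outcomes within each risk group, apparently better outcomes overall) can be checked numerically. The sketch below is illustrative only. The stratified counts in `COUNTS` are hypothetical values assumed for this example; they were chosen so that their column totals agree with the deaths and survivors in Table 2, and they should not be read as the contents of Table 3. The function names are likewise introduced here.

```python
# Hypothetical (deaths, survivors) counts per treatment group and risk stratum.
# ASSUMPTION: illustrative values whose totals match Table 2
# (21 / 7,825 with appendectomy; 1,394 / 190,205 without).
COUNTS = {
    "with": {"low": (7, 7700), "high": (14, 125)},
    "without": {"low": (100, 164009), "high": (1294, 26196)},
}


def death_rate(deaths, survivors):
    """Proportion of patients who died."""
    return deaths / (deaths + survivors)


def stratum_rates(counts):
    """Death rate for each (group, risk stratum) pair."""
    return {
        group: {risk: death_rate(*cell) for risk, cell in strata.items()}
        for group, strata in counts.items()
    }


def combined_rates(counts):
    """Death rate per group after pooling the risk strata."""
    pooled = {}
    for group, strata in counts.items():
        deaths = sum(cell[0] for cell in strata.values())
        survivors = sum(cell[1] for cell in strata.values())
        pooled[group] = death_rate(deaths, survivors)
    return pooled


if __name__ == "__main__":
    by_stratum = stratum_rates(COUNTS)
    pooled = combined_rates(COUNTS)
    for risk in ("low", "high"):
        print(risk, "risk:", by_stratum["with"][risk], "vs", by_stratum["without"][risk])
    print("combined:", pooled["with"], "vs", pooled["without"])
```

Under these assumed counts the appendectomy group has the higher death rate in both strata but the lower death rate once the strata are pooled, because most appendectomy patients fall in the low-risk stratum. Omitting the risk variable therefore reverses the apparent direction of the effect.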

5.2. Sometimes it's not what's in the data that matters

Classical statistical methods start with a random sample, yet in practice, data, or the institutions that give rise to data, can be uncooperative. In such cases, inferences that ignore how the data were "selected" can lead to distorted conclusions.

Consider, for example, the Challenger Space Shuttle accident. The Rogers Commission concluded that an O-ring failure in the solid rocket booster led to the structural breakup and loss of the Challenger. In reconstructing the events leading up to the decision to launch, the Commission noted a mistake in the analysis of thermal-distress data whereby flights with no (i.e. zero) incidents of O-ring damage were excluded from critical plots of O-ring damage and ambient launch temperature since it was felt that they did not contribute any information about the temperature effect. This truncation of the data led to the conclusion that no relationship between O-ring damage and temperature existed, and ultimately, the decision to launch. Dalal et al. (1989) throw statistical light on the matter by demonstrating the strong correlation between O-ring damage and temperature, and quantifying the risk (of catastrophic failure) at 31°F. Had the original analysis used all of the data, it would have indicated that the decision to launch was at best a risky proposition.

In the above case, the selection bias problem was one of "human error" and could easily have been avoided. In most problems, selection bias is an inherent characteristic of the available data and methods of analysis need to deal with it. It is our experience that every data set has the potential for selection bias to invalidate standard inferences. The lessons to be learned here are

- that any technique used to analyze truncated data as if it were a random sample can be fooled, regardless of how the truncation was induced;
- the data themselves are seldom capable of alerting the analyst that a selection mechanism is operating;
- information external to the data at hand is critical in understanding the nature and extent of potential biases.

5.3. The Perversity of the Pervasive P-value

P-values and associated significance (or hypothesis) tests play a central role in classical (frequentist) statistics. It seems natural, therefore, that data miners should make widespread use of P-values. However, indiscriminate use of P-values can lead data miners astray in most applications.

The standard significance test proceeds as follows. Consider two competing hypotheses about the world: the Null Hypothesis, commonly denoted by H0, and the Alternative Hypothesis, commonly denoted by HA. Typically H0 is "nested" within HA; for example, H0 might state that a certain combination of parameters is equal to zero, while HA might place no restriction on the combination. A test statistic, T, is selected and calculated from the data at hand. The idea is that T(data) should
14 38 measuretheevidenceinthedataagainsth0.theanalystrejectsh0infavorofha ift(data)ismoreextremethanwouldbeexpectedifh0weretrue.specically, islessthanapresetsignicancelevel,. orequaltot(data),giventhath0istrue.theanalystrejectsh0ifthep-value theanalystcomputesthep-value,thatis,theprobabilityoftbeinggreaterthan Therearethreeprimarydicultiesassociatedwiththisapproach: C.GLYMOUR,D.MADIGAN,D.PREGIBONANDP.SMYTH 1.Thestandardadvicethatstatisticseducatorsprovide,andscienticjournals 2.Raftery(1995)pointsoutthatthewholehypothesistestingframeworkrests rigidlyadhereto,istochoosetobe0.05or0.01,regardlessofsamplesize. agriculturalexperiments(ontheorderof30-200plots).textbookadvice(e.g., NeymanandPearson,1933)hasemphasizedtheneedtotakeaccountofthe Theseparticular-levelsaroseinSirRonaldFisher'sstudyofrelativelysmall samplesizeislarge.thiscrucialbutvagueadvicehaslargelyfallenondeaf powerofthetestagainsthawhensetting,andsomehowreducewhenthe onthebasicassumptionthatonlytwohypothesesareeverentertained.in ears. 3.TheP-valueistheprobabilityassociatedwiththeeventthattheteststatistic canleadtoundesirableoutcomessuchasselectingamodelwithparameters thatarehighlysignicantlydierentfromzero,evenwhenthetrainingdata aconsequence,indiscriminateuseofp-valueswith\standard"xed-levels practice,dataminerswillconsiderverylargenumbersofpossiblemodels.as arepurenoise(freedman,1983).thispointisoffundamentalimportancefor dataminers. wasasextremeasthevalueobserved,ormoreso.however,theeventthat actuallyhappenedwasthataspecicvalueoftheteststatisticwasobserved. Consequently,therelationshipbetweentheP-valueandtheveracityofH0is subtleatbest.jereys(1980)putsitthisway: toamoredirectinterpretation-thebayesiananalystcomputestheposteriorprobabilitythatahypothesisiscorrect.withxed-levels,thefrequentistandthe BayesFactorsaretheBayesiananalogueofthefrequentistP-valuesandadmit Theyamounttosayingthatahypothesisthatmayormaynotbe trueisrejectedbecauseagreaterdeparturefromthetrialvaluewas happened. 
improbable;thatis,thatithasnotpredictedsomethingthathasnot IhavealwaysconsideredtheargumentsfortheuseofPabsurd. Bayesianwillarriveatverydierentconclusions.Forexample,BergerandSellke distribution.onewaytoreconcilethetwopositionsistoviewbayesfactorsasa resultinaposteriorprobabilityforh0thatisatleast0.30forany\objective"prior methodforselectingappropriate-levels-seeraftery(1995). (1987)showthatdatathatyieldaP-valueof0.05whentestinganormalmean,
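The second difficulty, that screening many candidate models at a fixed α-level finds "significant" effects in pure noise, can be seen in a small simulation. The following is a minimal sketch in the spirit of Freedman (1983), not a reproduction of his experiment: the sample size, the number of predictors, the seed, and the function names are all assumptions made for illustration, and the P-values use Fisher's z normal approximation rather than an exact test.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided P-value for H0: correlation = 0 (Fisher z approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def screen_noise(n_obs=100, n_predictors=200, alpha=0.05, seed=0):
    """Test each of many pure-noise predictors against a pure-noise response."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    p_values = []
    for _ in range(n_predictors):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]  # unrelated to y
        p_values.append(approx_p_value(pearson_r(x, y), n_obs))
    n_significant = sum(p < alpha for p in p_values)
    return n_significant, min(p_values)


n_sig, p_min = screen_noise()
print(f"{n_sig} of 200 noise predictors have P < 0.05; smallest P = {p_min:.4g}")
```

Because every null hypothesis here is true by construction, roughly a fraction α of the predictors is expected to pass the test, and the smallest P-value among them can look impressive if the search that produced it is ignored.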

5.4. Intervention and Prediction

A specific class of prediction problems involves interventions that alter the probability distribution of the problem, as in predicting the values (or probabilities) of variables under a change in manufacturing procedures, or changes in economic or medical treatment policies. Accurate predictions of this kind require some knowledge of the relevant causal structure, and are in general quite different from prediction without intervention, although the usual caveats about uncertainty and model averaging apply. For graphical representations of causal hypotheses according to the Markov condition, general algorithms for predicting the outcomes of interventions from complete or incomplete causal models were developed in Spirtes et al. (1993). Some of these procedures have been extended and made into a more convenient calculus by Pearl (1995). A related theory without graphical models was developed earlier by Rubin (1974) and others, and by Robbins (1986).

Consider the following example. Herbert Needleman's famous studies of the correlation of lead deposits in children's teeth with their IQs resulted, eventually, in removal of tetraethyl lead from gasoline in the United States. One data set Needleman examined included more than 200 subjects, and measured a large number of covariates. Needleman, Geiger, and Frank (1985) re-analyzed the data using backwards step-wise regression of verbal IQ on these variables and obtained six significant regressors, including lead. Klepper (1988) reanalyzed the data assuming that all of the variables were measured with error. The model assumes that each measured number is a linear combination of the true value and an error, and that the parameters of interest are not the regression coefficients but rather the coefficients relating the unmeasured "true value" variables to the unmeasured true value of verbal IQ. These coefficients are in fact indeterminate (in econometric terminology, "unidentifiable"). An interval estimate of the coefficients that is strictly positive or negative for each coefficient can be made, however, if the amount of measurement error can be bounded with prior knowledge by an amount that varies from case to case. Klepper found that the bound required to ensure the existence of a strictly negative interval estimate for the lead–IQ coefficient was much too strict to be credible; thus he concluded that the case against lead was not nearly as strong as Needleman's analysis suggested.

Allowing the possibility of latent variables, Scheines (1996) reanalyzed the correlations (using TETRAD methodology) and concluded that three of the six regressors could have no influence on IQ. The regression included the three extra variables only because the partial regression coefficient is estimated by conditioning on all other regressors, which is just the right thing to do for linear prediction, but the wrong thing to do for causal inference using the Markov condition. Using the Klepper model, but without the three irrelevant variables, and assigning to all of the parameters a normal prior probability with mean zero and a substantial variance, Scheines then used Markov chain Monte Carlo to compute a posterior probability distribution for the lead–IQ parameter. The probability is very high that lead exposure reduces verbal IQ.
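The difference between conditioning and intervening can be illustrated with a toy linear model. This is only a sketch under stated assumptions: the model (a latent common cause U of X and Y, and no effect of X on Y), the parameter values, and the names `ols_slope` and `simulate` are hypothetical, and the code does not implement the algorithms of Spirtes et al. (1993) or any analysis of the Needleman data.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, effect=0.0, confounding=1.0, intervene=False, seed=1):
    """Regress Y on X in observational data or in data where X is set externally."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)            # latent common cause
        if intervene:
            x = rng.gauss(0.0, 1.0)        # X assigned independently of U
        else:
            x = u + rng.gauss(0.0, 1.0)    # X influenced by U
        y = effect * x + confounding * u + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return ols_slope(xs, ys)


observational = simulate(intervene=False)
interventional = simulate(intervene=True)
print(f"slope when conditioning on observed X: {observational:.3f}")
print(f"slope when X is set by intervention:   {interventional:.3f}")
```

In this toy model the observational slope is close to cov(X, Y) / var(X) = 1/2 even though X has no effect on Y, while the slope under intervention is close to the true effect of zero. The observational regression is still the right tool for predicting Y from an observed X; it is the wrong tool for predicting what happens when X is changed.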

6. Symbiosis in Statistics

Easy access to data in digital form and the availability of software tools for statistical analyses have made it possible for the man in the street to set up shop and "do statistics." Nowhere is this more true today than in data mining. Based on the arguments in this article, let us assume that statistics is a necessary but not sufficient component in the practice of data mining. How well will the statistics profession serve the data mining community? Hoerl et al. (1993), for example, assert that:

"We are our own best customers. Much of the work of the statistical profession is intended for other members of the statistical profession."

Despite this rather negative view of the relevance of statistical research, real-world applications do in fact drive much of what goes on in statistics, although often in a very indirect manner.

As an example, consider the field of signal processing and communications, an area where a specialized set of relatively sophisticated statistical methods and models has been honed for practical use. The field was driven by fundamental advances from Claude Shannon and others in the 1940's. Like most of the other contributors to the field, Shannon was not a statistician, but possessed a deep understanding of probability theory and its applications. From the 1950's to the present, due to rapid advances in both theory and hardware, the field has exploded and relevant statistical methods such as estimation and detection have found their way into everyday use in radio and network communications systems. Modern statistical communications reflects the symbiosis of statistical theory and engineering practice. Engineering researchers in the field are in effect "adjunct" statisticians: educated in probability theory and basic statistics, they have the tools to apply statistical methods to their problems of interest. Meanwhile statisticians continue to develop more general models and estimation techniques of potential applicability to new problems in communications.

This type of symbiosis can also be seen in other areas such as financial modelling, speech recognition (where, for example, hidden Markov models provide the state of the art in the field), and most notably, epidemiology. Indeed, if statistics can claim to have revolutionized any field, it is in the biological and health sciences, where the statistical approach to data analysis gave birth to the field of biostatistics.

The relevance of this symbiosis for data mining is that data miners need to understand statistical principles, and statisticians need to understand the nature of the important problems that the data mining community is attacking or being asked to attack. This has been a successful model in the past for fields where statistics has had considerable impact and has the potential to see ongoing success.

7. Conclusion

The statistical literature has a wealth of technical procedures and results to offer data mining, but it also has a few simple methodological morals: prove that estimation and search procedures used in data mining are consistent under conditions reasonably thought to apply in applications; use and reveal uncertainty, don't hide it; calibrate the errors of search, both for honesty and to take advantage of model averaging; don't confuse conditioning with intervening; and finally, don't take the error probabilities of hypothesis tests to be the error probabilities of search procedures.

References

Akaike, H. 1974. A new look at the statistical model identification. IEEE Trans. Automat. Contr. AC-19:716–723.
Berger, J.O. and Sellke, T. 1987. Testing a point null hypothesis: the irreconcilability of P values and evidence (with discussion). Journal of the American Statistical Association 82:112–122.
Breiman, L. 1996. Bagging predictors. Machine Learning, to appear.
Chasnoff, I.J., Griffith, D.R., MacGregor, S., Dirkes, K., Burns, K.A. 1989. Temporal patterns of cocaine use in pregnancy: perinatal outcome. Journal of the American Medical Association 261(12):1741–4.
Chatfield, C. 1995. Model uncertainty, data mining, and statistical inference (with discussion). Journal of the Royal Statistical Society (Series A) 158:419–466.
Dalal, S.R., Fowlkes, E.B. and Hoadley, B. 1989. Risk analysis of the space shuttle: Pre-Challenger prediction of failure. Journal of the American Statistical Association 84:945–957.
Diggle, P. and Kenward, M.G. 1994. Informative drop-out in longitudinal data analysis (with discussion). Applied Statistics 43:49–93.
Draper, D., Gaver, D.P., Goel, P.K., Greenhouse, J.B., Hedges, L.V., Morris, C.N., Tucker, J., and Waternaux, C. 1993. Combining Information: National Research Council Panel on Statistical Issues and Opportunities for Research in the Combination of Information. Washington: National Academy Press.
Draper, D. 1995. Assessment and propagation of model uncertainty (with discussion). Journal of the Royal Statistical Society (Series B) 57:45–97.
Efron, B. and Tibshirani, R.J. 1993. An Introduction to the Bootstrap. New York: Chapman and Hall.
Energy Modeling Forum 1982. World Oil: Summary report. EMF Report 6, Energy Modeling Forum, Stanford University, Stanford, CA.
Fisher, R.A. 1958. Statistical methods for research workers. New York: Hafner Pub. Co.
Freedman, D.A. 1983. A note on screening regression equations. The American Statistician 37:152–155.
Geiger, D., Heckerman, D., and Meek, C. 1996. Asymptotic model selection for directed networks with hidden variables. Proceedings of the Twelfth Annual Conference on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufman.
Gilks, W.R., Richardson, S., and Spiegelhalter, D.J. 1996. Markov chain Monte Carlo in practice. London: Chapman and Hall.
Hand, D.J. 1994. Deconstructing statistical questions (with discussion). Journal of the Royal Statistical Society (Series A) 157:317–356.
Hastie, T.J. and Tibshirani, R. 1990. Generalized Additive Models. London: Chapman and Hall.
Hoerl, R.W., Hooper, J.H., Jacobs, P.J., Lucas, J.M. 1993. Skills for industrial statisticians to survive and prosper in the emerging quality environment. The American Statistician 47:280–292.
Huber, P.J. 1981. Robust Statistics. New York: Wiley.
Jeffreys, H. 1980. Some general points in probability theory. In: A. Zellner (Ed.), Bayesian Analysis in Econometrics and Statistics. Amsterdam: North-Holland, 451–454.
Kass, R.E. and Raftery, A.E. 1995. Bayes factors. Journal of the American Statistical Association 90:773–795.
Kiiveri, H. and Speed, T.P. 1982. Structural analysis of multivariate data: A review. Sociological Methodology 209–289.
Kooperberg, C., Bose, S., and Stone, C.J. 1996. Polychotomous regression. Journal of the American Statistical Association, to appear.
Lauritzen, S.L. 1996. Graphical Models. Oxford: Oxford University Press.
Leamer, E.E. 1978. Specification Searches. Ad Hoc Inference with Nonexperimental Data. New York: Wiley.
Madigan, D. and Raftery, A.E. 1994. Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association 89:1335–1346.
Madigan, D. and York, J. 1995. Bayesian graphical models for discrete data. International Statistical Review 63:215–232.
Matheson, J.E. and Winkler, R.L. 1976. Scoring rules for continuous probability distributions. Management Science 22:1087–1096.
McCullagh, P. and Nelder, J.A. 1989. Generalized Linear Models. London: Chapman and Hall.
Michelangeli, P.A., Vautard, R., and Legras, B. 1995. Weather regimes: recurrence and quasi-stationarity. Journal of the Atmospheric Sciences 52(8):1237–56.
Miller, R.G. Jr. 1981. Simultaneous statistical inference (Second Edition). New York: Springer-Verlag.
Neyman, J. and Pearson, E.S. 1933. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society (Series A) 231:289–337.
Raftery, A.E. 1995. Bayesian model selection in social research (with discussion). In Sociological Methodology (ed. P.V. Marsden), Oxford, U.K.: Blackwells, 111–196.
Rissanen, J. 1978. Modeling by shortest data description. Automatica 14:465–471.
Schervish, M.J. 1995. Theory of Statistics. New York: Springer Verlag.
Schwartz, G. 1978. Estimating the dimension of a model. Annals of Statistics 6:461–464.
Selvin, H. and Stuart, A. 1966. Data dredging procedures in survey analysis. The American Statistician 20(3):20–23.
Simpson, C.H. 1951. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society (Series B) 13:238–241.
Smith, A.F.M. and Roberts, G. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). Journal of the Royal Statistical Society (Series B) 55:3–23.
Spirtes, P., Glymour, C., and Scheines, R. 1993. Causation, Prediction and Search. Springer Lecture Notes in Statistics. New York: Springer Verlag.
Stigler, S.M. 1986. The history of statistics: The measurement of uncertainty before 1900. Harvard: Harvard University Press.
Wen, S.W., Hernandez, R., and Naylor, C.D. 1995. Pitfalls in nonrandomized studies: The case of incidental appendectomy with open cholecystectomy. Journal of the American Medical Association 274:1687–1691.
Wright, S. 1921. Correlation and causation. Journal of Agricultural Research 20:557–585.

UNIVERSITY of TORONTO. Faculty of Arts and Science

UNIVERSITY of TORONTO. Faculty of Arts and Science UNIVERSITY of TORONTO Faculty of Arts and Science AUGUST 2005 EXAMINATION AT245HS uration - 3 hours Examination Aids: Non-programmable or SOA-approved calculator. Instruction:. There are 27 equally weighted

More information

Centralized vs Onsite Monitoring:

Centralized vs Onsite Monitoring: Centralized vs Onsite Monitoring: A Sponsor s Balancing Act Applying a Risk-based Approach Introduction Since the August 2011 release of the draft guidance document by FDA on a risk-based approach to monitoring

More information

Accident Prevention Techniques

Accident Prevention Techniques Topic 9 Accident Prevention Techniques LEARNING OUTCOMES By the end of this topic, you should be able to: 1. Describe Job Hazard Analysis (JHA) as an accident prevention technique; 2. Describe Job Safety

More information

Title: The BCL2-938 C>A promoter polymorphism is associated with risk group classification in children with acute lymphoblastic leukemia

Title: The BCL2-938 C>A promoter polymorphism is associated with risk group classification in children with acute lymphoblastic leukemia Author's response to reviews Title: The BCL2-938 C>A promoter polymorphism is associated with risk group classification in children with acute lymphoblastic leukemia Authors: Annette Kuenkele ([email protected])

More information

PRE/POST TESTS and PRE/POST TEST INSTRUCTOR KEYS

PRE/POST TESTS and PRE/POST TEST INSTRUCTOR KEYS PRE/POST TESTS and PRE/POST TEST INSTRUCTOR KEYS Enclosed are two versions of optional PRIME For Life Pre/Post Tests and Test Keys for your participants. You may use either test with your groups. For accurate

More information

ESI ANNUAL SALARY SURVEY

ESI ANNUAL SALARY SURVEY ESI ANNUAL SALARY SURVEY In order to uncover how public and private sector organizations are going about building and developing their project communities, ESI International conducted the ESI 2013 Project

More information

Longitudinal Data Analysis. Wiley Series in Probability and Statistics

Longitudinal Data Analysis. Wiley Series in Probability and Statistics Brochure More information from http://www.researchandmarkets.com/reports/2172736/ Longitudinal Data Analysis. Wiley Series in Probability and Statistics Description: Longitudinal data analysis for biomedical

More information

VCE Business Management 2013 2015

VCE Business Management 2013 2015 VCE Business Management 2013 2015 Written examination November Examination specifications The following information updates the specifications published in 2010. It reflects a change to the format introduced

More information

Creating Customer Value, Satisfaction, and Loyalty 9/5/2008. Building Customer Value and Satisfaction

Creating Customer Value, Satisfaction, and Loyalty 9/5/2008. Building Customer Value and Satisfaction Chapter 4 Creating Customer Value, Satisfaction, and Loyalty 4-1 Chapter Questions How can companies deliver customer value, satisfaction, and loyalty? What is the lifetime value of a customer, and why

More information

Colocation Services. Retail Colocation as it s meant to be

Colocation Services. Retail Colocation as it s meant to be Colocation Services Retail Colocation as it s meant to be We are an agile business and look for similar organisations we can scale with. Infinity was the perfect choice. Jamie Donnelly Managing Director,

More information

Homework 3 Solution, due July 16

Homework 3 Solution, due July 16 Homework 3 Solution, due July 16 Problems from old actuarial exams are marked by a star. Problem 1*. Upon arrival at a hospital emergency room, patients are categorized according to their condition as

More information

Essential QA Metrics for Determining Solution Quality

Essential QA Metrics for Determining Solution Quality 1.0 Introduction In today s fast-paced lifestyle with programmers churning out code in order to make impending deadlines, it is imperative that management receives the appropriate information to make project

More information

THE PREDICTIVE MODELLING PROCESS

THE PREDICTIVE MODELLING PROCESS THE PREDICTIVE MODELLING PROCESS Models are used extensively in business and have an important role to play in sound decision making. This paper is intended for people who need to understand the process

More information

Math 370/408, Spring 2008 Prof. A.J. Hildebrand. Actuarial Exam Practice Problem Set 1

Math 370/408, Spring 2008 Prof. A.J. Hildebrand. Actuarial Exam Practice Problem Set 1 Math 370/408, Spring 2008 Prof. A.J. Hildebrand Actuarial Exam Practice Problem Set 1 About this problem set: These are problems from Course 1/P actuarial exams that I have collected over the years, grouped

More information

Master of Science in Statistics

Master of Science in Statistics Master of Science in Statistics Options: Biometrics Social, Behavioural and Educational Statistics Business Statistics Industrial Statistics General Statistical Methodology All Round Statistics Rubik s

More information

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

More information

Sample Script of an Initial Brief Alcohol Counseling Session

Sample Script of an Initial Brief Alcohol Counseling Session Information Sheet for Behavioral Health Providers in Primary Care Sample Script of an Initial Brief Alcohol Counseling Session Introduce the Subject with a Transitional Statement From your answers it appears

More information

Master of Science in Statistics

Master of Science in Statistics Master of Science in Statistics Majors: Biometrics Social, Behavioural and Educational Statistics Business Statistics Industrial Statistics General Statistical Methodology All Round Statistics INTERFACULTY

More information

Getting Started Different Ways of Deleading Other Options and Resources

Getting Started Different Ways of Deleading Other Options and Resources Contents Getting Started Protecting Children from Lead Poisoning page 2 Massachusetts Lead Law page 3 What is Deleading? page 4 Getting Your Home Inspected for Lead page 5 Different Ways of Deleading Low-Risk

More information

Auditorium Acoustics and Architectural Design

Auditorium Acoustics and Architectural Design Auditorium Acoustics and Architectural Design Second Edition Michael Barron. J ^A Spon Press an imprint of Taylor & Francis LONDON AND NEWYORK Contents Preface Preface to the first edition Foreword ix

More information

Does my patient need more therapy after prostate cancer surgery?

Does my patient need more therapy after prostate cancer surgery? Does my patient need more therapy after prostate cancer surgery? Contact the GenomeDx Patient Care Team at: 1.888.792.1601 (toll-free) or e-mail: [email protected] Prostate Cancer Classifier

More information

Statistics in Applications III. Distribution Theory and Inference

Statistics in Applications III. Distribution Theory and Inference 2.2 Master of Science Degrees The Department of Statistics at FSU offers three different options for an MS degree. 1. The applied statistics degree is for a student preparing for a career as an applied

More information

Small employers. Issue Brief. Health Insurance Purchasing Cooperatives. Elliot K.Wicks Economic and Social Research Institute

Small employers. Issue Brief. Health Insurance Purchasing Cooperatives. Elliot K.Wicks Economic and Social Research Institute TASK FORCE ON THE FUTURE OF HEALTH INSURANCE Issue Brief NOVEMBER 2002 Health Insurance Purchasing Cooperatives Elliot K.Wicks Economic and Social Research Institute The Commonwealth Fund is a private

More information

Multinational Comparisons of Health Systems Data, 2014

Multinational Comparisons of Health Systems Data, 2014 Multinational Comparisons of Health Systems Data, 214 Chloe Anderson The Commonwealth Fund November 214 Health Care Spending 2 Dollars ($US) Average Health Care Spending per Capita, 198 212 Adjusted for

More information

Prostate cancer. Christopher Eden. The Royal Surrey County Hospital, Guildford & The Hampshire Clinic, Old Basing.

Prostate cancer. Christopher Eden. The Royal Surrey County Hospital, Guildford & The Hampshire Clinic, Old Basing. Prostate cancer Christopher Eden The Royal Surrey County Hospital, Guildford & The Hampshire Clinic, Old Basing. Screening Screening men for PCa (prostate cancer) using PSA (Prostate Specific Antigen blood

More information

MBA PROGRAMME: 2015. Appendix 1 FINANCE AND RESPONSIBLE INVESTMENT SUBJECT CODE: CMBC 191

MBA PROGRAMME: 2015. Appendix 1 FINANCE AND RESPONSIBLE INVESTMENT SUBJECT CODE: CMBC 191 MBA PROGRAMME: 2015 Appendix 1 FINANCE AND RESPONSIBLE INVESTMENT STUDY GUIDE AND COURSE OUTLINE SUBJECT CODE: CMBC 191 1. Lecturing Dates February 7 February 27 March 27 April 17 May 17 May 22 2. Module

More information

How To Understand Predictive Analysis And Data Mining

How To Understand Predictive Analysis And Data Mining DATA MINING AND PREDICTIVE ANALYSIS PDF ==> Download: DATA MINING AND PREDICTIVE ANALYSIS PDF DATA MINING AND PREDICTIVE ANALYSIS PDF - Are you searching for Data Mining And Predictive Analysis Books?

More information

Curriculum Vitae: Raul J. Cano, Ph.D.

Curriculum Vitae: Raul J. Cano, Ph.D. CurriculumVitae:RaulJ.Cano,Ph.D. I.PERSONALINFORMATION NAME: RaulJ.Cano OFFICEADDRESS: BiologicalSciencesDepartment,53 210E CaliforniaPolytechnicStateUniversity SanLuisObispo,CA93407 OFFICETELEPHONE: (805)756

More information

Social Networks and their Economics. Influencing Consumer Choice. Daniel Birke

Social Networks and their Economics. Influencing Consumer Choice. Daniel Birke Social Networks and their Economics Influencing Consumer Choice Daniel Birke Visiting Researcher, Aston Business School, Birmingham, and works in a leading international management consultancy in Germany.

More information

Trends in Publicly Reported Nursing Facility Quality Measures

Trends in Publicly Reported Nursing Facility Quality Measures Trends in Publicly Reported Nursing Facility Quality Measures American Health Care Association Reimbursement and Research Department January 2011 Trends in Publicly Reported Nursing Facility Quality Measures

More information

Radiation Therapy for Prostate Cancer: Treatment options and future directions

Radiation Therapy for Prostate Cancer: Treatment options and future directions Radiation Therapy for Prostate Cancer: Treatment options and future directions David Weksberg, M.D., Ph.D. PinnacleHealth Cancer Institute September 12, 2015 Radiation Therapy for Prostate Cancer: Treatment

More information

Quality Scorecard overall heart attack care overall heart failure overall pneumonia care overall surgical infection rate patient safety survival

Quality Scorecard overall heart attack care overall heart failure overall pneumonia care overall surgical infection rate patient safety survival Quality Scorecard s are required to report quality statistics to the s for Medicare and Medicaid Services (CMS) and the Department of Health (DOH). This information is made available at www.hospitalcompare.hhs.gov

More information

Atherosclerosis of the aorta. Artur Evangelista

Atherosclerosis of the aorta. Artur Evangelista Atherosclerosis of the aorta Artur Evangelista Atherosclerosis of the aorta Diagnosis Classification Prevalence Risk factors Marker of generalized atherosclerosis Risk of embolism Therapy Diagnosis Atherosclerosis

More information

Julio is [it] the best option?

Julio is [it] the best option? BEG_CTRL_NUM : DONZ000043764 END_CTRL_NUM : DONZ000043764 DATESENT = July 11, 2007 TIMESENT = 3:20:43 pm RECEIVEDDATE = July 11, 2007 TIMERECEIVED = 3:20:43 pm FILENAME : Re: seguro para el wao.msg SUBJECT

More information

Core Music Curriculum General Education

Core Music Curriculum General Education Department of Music BA Degree, Major in Music: 120 hours BM Degree, Major in Music Education: 126 hours BM Degree, Major in Performance: 120 hours College of Arts and Architecture UNC Charlotte www.music.uncc.edu

More information

Creating Strategic Alliances for Post-Acute Coordination of Care

Creating Strategic Alliances for Post-Acute Coordination of Care Creating Strategic Alliances for Post-Acute Coordination of Care Kathleen Yosko, PhD President/CEO Wheaton Franciscan Health Care Sole Illinois property Free-standing facility 101 IRF beds 27 SNF beds

More information

An Introduction to Advanced Analytics and Data Mining

An Introduction to Advanced Analytics and Data Mining An Introduction to Advanced Analytics and Data Mining Dr Barry Leventhal Henry Stewart Briefing on Marketing Analytics 19 th November 2010 Agenda What are Advanced Analytics and Data Mining? The toolkit

More information

Test your knowledge on risk. Fill in the box for the correct answer for each question or statement.

Test your knowledge on risk. Fill in the box for the correct answer for each question or statement. Test your knowledge on risk. Fill in the box for the correct answer for each question or statement. 1 2 Which 3 Which 4 The 5 In Which statement(s) describe the relationship between risk and insurance?

More information

Multinomial Logistic Regression

Multinomial Logistic Regression Multinomial Logistic Regression Dr. Jon Starkweather and Dr. Amanda Kay Moske Multinomial logistic regression is used to predict categorical placement in or the probability of category membership on a

More information

TABLE OF CONTENTS BACKGROUND AND INTRODUCTION... 5 PURPOSE... 5 SCOPE... 6 RISK ASSESSMENT PROCESS... 6

TABLE OF CONTENTS BACKGROUND AND INTRODUCTION... 5 PURPOSE... 5 SCOPE... 6 RISK ASSESSMENT PROCESS... 6 TABLE OF CONTENTS BACKGROUND AND INTRODUCTION... 5 PURPOSE... 5 SCOPE... 6 RISK ASSESSMENT PROCESS... 6 RISK ASSESSMENT AND EVALUATION METHODOLOGY... 6 RESULTS... 8 RISK ASSESSMENT GAPS... 9 RISK ASSESSMENT

More information

Plugging Premium Leakage

Plugging Premium Leakage Plugging Premium Leakage Using Analytics to Prevent Underwriting Fraud WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Types of Underwriting Fraud... 1 Application Fraud/Rate Manipulation....

More information

Decision & Risk Analysis Lecture 6. Risk and Utility

Decision & Risk Analysis Lecture 6. Risk and Utility Risk and Utility Risk - Introduction Payoff Game 1 $14.50 0.5 0.5 $30 - $1 EMV 30*0.5+(-1)*0.5= 14.5 Game 2 Which game will you play? Which game is risky? $50.00 Figure 13.1 0.5 0.5 $2,000 - $1,900 EMV

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

Milwaukee County Early Intervention Program

Milwaukee County Early Intervention Program Milwaukee County Early Intervention Program National Symposium on Pretrial Diversion Strengthening the Evidence-Based Framework Washington D.C. May 30, 2012 District Attorney John T. Chisholm First Assistant

More information

Building flexible, easy to change and rock-solid applications with BRFplus decision services. Carsten Ziegler, James Taylor

Building flexible, easy to change and rock-solid applications with BRFplus decision services. Carsten Ziegler, James Taylor [ Building flexible, easy to change and rock-solid applications with BRFplus decision services Carsten Ziegler, James Taylor [ Learning Points Learn how the empowerment of business experts is built into

More information

Waterfall vs. Agile Project Management

Waterfall vs. Agile Project Management Lisa Sieverts, PMP, PMI-ACP Phil Ailes, PMI-ACP Agenda What is a Project Overview Traditional Project Management Agile Project Management The Differences Product Life Cycle The Teams Requirements WBS/Product

More information

Project Management in a Multi-Environment Ken Halloway, PMP, ITIL 21 October 2015

Project Management in a Multi-Environment Ken Halloway, PMP, ITIL 21 October 2015 Project Management in a Multi-Environment Ken Halloway, PMP, ITIL 21 October 2015 www.pmihr.org 1 What Am I Talking About? www.pmihr.org 2 Project www.pmihr.org 3 Lifecycle Initiating Planning Executing

More information

Information asymmetries

Information asymmetries Adverse selection 1 Repeat: Information asymmetries Problems before a contract is written: Adverse selection i.e. trading partner cannot observe quality of the other partner Use signaling g or screening

More information

Life expectancy of children with cerebral palsy

Life expectancy of children with cerebral palsy Life expectancy of children with cerebral palsy J L Hutton, K Hemming and UKCP collaboration What is UKCP? Information about the physical effects of cerebral palsy on the everyday lives of children and

More information

Administrative Measures of Settlement Reserve Funds by China Securities Depository and Clearing Corporation Limited

Administrative Measures of Settlement Reserve Funds by China Securities Depository and Clearing Corporation Limited Administrative Measures of Settlement Reserve Funds by China Securities Depository and Clearing Corporation Limited Article 1: In order to prevent and remove the securities transactions clearing and settlement

More information

Rockford s map update project is a joint effort with FEMA in cooperation with local associations and other state partners.

FREQUENTLY ASKED QUESTIONS 1. Why is Rockford getting new flood hazard maps? Flood hazard maps, also known as Flood Insurance Rate Maps (FIRMs), are important tools in the effort to protect lives and properties

The Entrepreneur's Guide to Financial Maturity Factoring - Financing for Companies Seeking Fast Cash

A healthy cash flow is an essential part of any successful business. Some entrepreneurs claim that a

Copyright 2009 Pearson Education Canada

The consequence of failing to adjust the discount rate for the risk implicit in projects is that the firm will accept high-risk projects, which usually have higher IRR due to their high-risk nature, and

International Services

Consistently ranked as one of the best hospitals in the United States by U.S. News & World Report, patients from around the world travel to UCSF Medical Center and UCSF Benioff Children

Sun Li Centre for Academic Computing

lsun@smu.edu.sg Elementary Data Analysis Group Comparison & One-way ANOVA Non-parametric Tests Correlations General Linear Regression Logistic Models Binary Logistic

Sample Size Designs to Assess Controls

B. Ricky Rambharat, PhD, PStat, Lead Statistician, Office of the Comptroller of the Currency, U.S. Department of the Treasury, Washington, DC. FCSM Research Conference

Doctorates in Occupational Safety and Health: A Critical Shortage

By Anthony Veltri, Ed.D., MS, CSHM and Jim Ramsay, Ph.D., MA, CSP. Contact Information: Anthony Veltri, Ed.D., MS, CSHM, Associate Professor

UNIT-LINKED LIFE INSURANCE CONTRACTS WITH INVESTMENT GUARANTEES A PROPOSAL FOR ROMANIAN LIFE INSURANCE MARKET

Cristina CIUMAŞ, Department of Finance, Faculty of Economics and Business Administration, Babeş-Bolyai

Time's Up: DCAA's Renewed Focus on Incurred Cost Submissions

Nicole Mitchell, CPA; Donna Dominguez. Aronson LLC, May 1, 2013. 2013 All Rights Reserved. 805 King Farm Boulevard, Suite 300, Rockville, Maryland

WORKING CAPITAL MANAGEMENT OF BAJAJ AUTO LTD. WITH SPECIAL REFERENCE TO AUTOMOBILE INDUSTRY.

International Journal of Entrepreneurship and Management Research, Vol. 1, No. 1 (January-June 2011), pp. 63-71

Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn

Gordon K. Smyth & Belinda Phipson, Walter and Eliza Hall Institute of Medical Research, Melbourne,

White Paper. Redefine Your Analytics Journey With Self-Service Data Discovery and Interactive Predictive Analytics

Contents: Self-service data discovery and interactive predictive analytics. What does

The Use of M&S VV&A as a Risk Mitigation Strategy in Defense Acquisition

Michelle Kilikauskas, Joint Accreditation Support Activity, NAVAIR Weapons Division, China Lake, CA 93555

FP7-ICT-2013-11-4.2. Scalable Data Analytics. Deadline: 16 April 2013 at 17:00:00 (Brussels local time)

Agenda: 14H30, Overview of Objective 4.2 Scalable Data Analytics, by Carola Carstens, European Commission,

Statistics 215b 11/20/03 D.R. Brillinger. A field in search of a definition a vague concept

Data mining: A field in search of a definition, a vague concept. D. Hand, H. Mannila and P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge. Some definitions/descriptions
