Statistical Themes and Lessons for Data Mining

Data Mining and Knowledge Discovery, 1, 25-42 (1996)
(c) 1996 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

CLARK GLYMOUR
Department of Cognitive Psychology, Carnegie Mellon University, Pittsburgh, PA 15213

DAVID MADIGAN
Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195

DARYL PREGIBON

PADHRAIC SMYTH
Information and Computer Science, University of California, Irvine, CA 92717

Editor: Usama Fayyad

Abstract. Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.

Keywords: Statistics, uncertainty, modeling, bias, variance

1. Introduction

Sta-tis-tics (noun). The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling. (American Heritage Dictionary).

Statistics is enjoying a renaissance period. Modern computing hardware and software have freed the statistician from narrowly specified models and spawned a fresh approach to the subject, especially as it relates to data analysis. Today's statistical toolkit draws on a rich body of theoretical and methodological research (Table 1).

The field of data mining, like statistics, concerns itself with "learning from data" or "turning data into information". The context encompasses statistics, but with a somewhat different emphasis. In particular, data mining involves retrospective analyses of data: thus, topics such as experimental design are outside the scope of data mining and fall within statistics proper. Data miners are often more interested in understandability than accuracy or predictability per se. Thus, there is a focus on relatively simple interpretable models involving rules, trees, graphs, and so forth. Applications involving very large numbers of variables and vast numbers of measurements are also common in data mining. Thus, computational efficiency and scalability are critically important, and issues of statistical consistency may be a secondary consideration. Furthermore, the current practice of data mining is often pattern-focused rather than model-focused, i.e., rather than building a coherent global model which includes all variables of interest, data mining algorithms (such as any of the many rule induction systems on the market) will produce sets of statements about local dependencies among variables (in rule form).

Table 1. Statisticians have developed a large infrastructure (theory) to support their methods and a language (probability calculus) to describe their approach to quantifying the uncertainty associated with drawing inferences from data. These methods enable one to describe relationships between variables for prediction, quantifying effects, or suggesting causal paths.

Area of Statistics                  Description of Activities
experimental design & sampling      how to select cases if one has the liberty to choose
exploratory data analysis           hypothesis generation rather than hypothesis testing
statistical graphics                data visualization
statistical modeling                regression and classification techniques
statistical inference               estimation and prediction techniques

In this overall context, current data mining practice is very much driven by practical computational concerns. However, in focusing almost exclusively on computational issues, it is easy to forget that statistics is in fact a core component. The term "data mining" has long had negative connotations in the statistics literature (Selvin and Stuart, 1966; Chatfield, 1995). Data mining without proper consideration of the fundamental statistical nature of the inference problem is indeed to be avoided. However, a goal of this article is to convince the reader that modern statistics can offer significant constructive advice to the data miner, although many problems remain unsolved. Throughout the article we highlight some major themes of statistics research, focusing in particular on the practical lessons pertinent to data mining.

2. An Overview of Statistical Science

This Section briefly describes some of the central statistical ideas we think relevant to data mining. For a rigorous survey of statistics, the mathematically inclined reader should see, for example, Schervish (1995). For reasons of space we will ignore a number of interesting topics, including time series analysis and meta-analysis.

Probability Distributions. The statistical literature contains mathematical characterizations of a wealth of probability distributions, as well as properties of random variables (functions defined on the "events" to which a probability measure assigns values). Important relations among probability distributions include marginalization (summing over a subset of values) and conditionalization (forming
a conditional probability measure from a measure on a sample space and some event of positive measure). Essential relations among random variables include independence, conditional independence, and various measures of dependence, of which the most famous is the correlation coefficient. The statistical literature also characterizes families of distributions by properties that are useful in identifying any particular member of the family from data, or by closure properties useful in model construction or inference, for example conjugate families, closed under conditionalization, and the multinormal family, closed under linear combination. A knowledge of the properties of distribution families can be invaluable in analyzing data and making appropriate inferences.

Estimation, Consistency, Uncertainty, Assumptions, Robustness, and Model Averaging. An estimator is a function from sample data to some estimand, such as the value of a parameter. When the data comprise a sample from a larger actual or potential collection governed by some probability distribution, the family of estimators corresponding to all possible samples from that collection also has a probability distribution. Classical statistics investigates such distributions of estimators in order to establish basic properties such as reliability and uncertainty. A variety of resampling and simulation techniques also exist for assessing estimator uncertainty (Efron and Tibshirani, 1993).

Estimation almost always requires some set of assumptions. Such assumptions are typically false, but often useful. If a model (which we can think of as a set of assumptions) is incorrect, estimates based on it can be expected to be incorrect as well. One of the aims of statistical research is to find ways to weaken the assumptions necessary for good estimation. "Robust Statistics" (Huber, 1981) looks for estimators that work satisfactorily for larger families of distributions and have small errors when assumptions are violated.

Bayesian estimation emphasizes that alternative models and their competing assumptions are often plausible. Rather than making an estimate based on a single model, several models can be considered and an estimate obtained as the weighted average of the estimates given by the individual models (Madigan and Raftery, 1994). In fact, such Bayesian model averaging is bound to improve predictive performance, on average. Since the models obtained in data mining are usually the results of some automated search procedure, accounting for the potential errors associated with the search itself is crucial. In practice, this often requires a Monte Carlo analysis. Our impression is that the error rates of search procedures proposed and used in the data mining and in the statistical literature are far too rarely estimated in this way. (See Spirtes et al., 1993 for Monte Carlo test design for search procedures.)

Hypothesis Testing. Since statistical tests are widely used, some of their important limitations should be noted. Viewed as a one-sided estimation method, hypothesis testing is inconsistent unless the alpha level of the testing rule is decreased appropriately as the sample size increases. Generally, an α level test of one hypothesis and an α level test of another hypothesis do not jointly provide an α level test of the conjunction of the two hypotheses. In special cases, rules (sometimes called contrasts) exist for simultaneously testing several hypotheses (Miller, 1981). An important corollary for data mining is that the α level of a test has nothing directly to do with the probability of error in a search procedure that involves testing a series of hypotheses. If, for example, for each pair of a set of variables, hypotheses of independence are tested at α = 0.05, then 0.05 is not the probability of erroneously finding some dependent set of variables when in fact all pairs are independent. Thus, in data mining procedures that use a sequence of hypothesis tests, the alpha level of the tests cannot generally be taken as an estimate of any error probability related to the outcome of the search.

Data miners should note that while error probabilities of tests have something to do with the truth of hypotheses, the connection is somewhat tenuous (see Section 5.3). Hypotheses that are excellent approximations may be rejected in large samples; tests of linear models, for example, typically reject them in very large samples no matter how closely they seem to fit the data.

Model Scoring. The evidence provided by data should lead us to prefer some models or hypotheses to others, and to be indifferent between still other models. A score is any rule that maps models and data to numbers whose numerical ordering corresponds to a preference ordering over the space of models, given the data. For the reasons just considered, scoring rules are often an attractive alternative to tests. Typical rules assign models a value determined by the likelihood function associated with the model, the number of parameters, or dimension, of the model, and the data. Popular rules include the Akaike Information Criterion (Akaike, 1974), Bayes Information Criterion (Raftery, 1995), and Minimum Description Length (Rissanen, 1978). Given a prior probability distribution over models, the posterior probability on the data is itself a scoring function, arguably a privileged one. The Bayes Information Criterion approximates posterior probabilities in large samples.

There is a notion of consistency appropriate to scoring rules; in the large sample limit, almost surely the true model should be among those receiving maximal scores. AIC scores are not, in general, consistent (Schwartz, 1978). There are also uncertainties associated with scores, since two different samples of the same size from the same distribution may yield not only different numerical values for the same model, but even different orderings of models.

For obvious combinatorial reasons, it is often impossible when searching a large model space to calculate scores for all models; it is, however, often feasible to describe and calculate scores for a few equivalence classes of models receiving the highest scores.

In some contexts, inferences made using Bayesian scores can differ a great deal from inferences made with hypothesis tests. Raftery (1995) gives examples of models that account for almost all of the variance of an outcome of interest, and have very high Bayesian scores, but are overwhelmingly rejected by statistical tests.

Markov Chain Monte Carlo. Historically, insurmountable computational difficulties forced data analysts to eschew exact analysis of elaborate hierarchical Bayesian models and complex likelihood calculations. Recent dramatic advances in Monte Carlo methods have, however, liberated analysts from some of these constraints. One particular class of simulation methods, dubbed Markov chain Monte Carlo, originally developed in statistical mechanics, has revolutionized the practice of Bayesian statistics. Smith and Roberts (1993) provide an accessible overview from the Bayesian perspective; Gilks et al. (1996) provide a practical introduction addressing both Bayesian and non-Bayesian perspectives.

Simulation methods may become unacceptably slow when faced with massive data sets. In such cases, recent advances in analytic approximations prove useful - see for example Kooperberg et al. (1996), Kass and Raftery (1995), and Geiger et al. (1996).

Generalized model classes. A major achievement of statistical methodological research has been the development of very general and flexible model classes. Generalized Linear Models, for instance, embrace many classical linear models, and unify estimation and testing theory for such models (McCullagh and Nelder, 1989). Generalized Additive Models show similar potential (Hastie and Tibshirani, 1990). Graphical models (Lauritzen, 1996) represent probabilistic and statistical models with planar graphs, where the vertices represent (possibly latent) random variables and the edges represent stochastic dependences. This provides a powerful language for describing models and the graphs themselves make modeling assumptions explicit. Graphical models provide important bridges between the vast statistical literature on multivariate analysis and such fields as artificial intelligence, causal analysis, and data mining.

Rational Decision Making and Planning. The theory of rational choice assumes the decision maker has available a definite set of alternative actions, knowledge of a definite set of possible alternative states of the world, knowledge of the payoffs or utilities of the outcomes of each possible action in each possible state of the world, and knowledge of the probabilities of various possible states of the world. Given all of this information, a decision rule specifies which of the alternative actions ought to be taken. A large literature in statistics and economics addresses alternative decision rules: maximizing expected utility, minimizing maximum loss, etc. Typically, rational decision making and planning are the goals of data mining, and rather than providing techniques or methods for data mining, the theory of rational choice poses norms for the use of information obtained from a database.

The very framework of rational decision making requires probabilities and a knowledge of the effects alternative actions will have. To know the outcomes of actions is to know something of cause and effect relations, and extracting such causal information is often one of the principal goals of data mining and of statistical inference more generally.

Inference to Causes. Understanding causation is the hidden force behind the historical development of statistics. From the beginning of the subject, in the work of Bernoulli and Laplace, the absence of causal connection between two variables has been taken to imply their probabilistic independence (see Stigler, 1986), and the same idea is fundamental in the theory of experimental design (Fisher, 1958). Early in this century, Wright (1921) introduced directed graphs to represent causal hypotheses (with vertices as random variables and edges representing direct influences), and they have become common representations of causal hypotheses in the social sciences, biology, computer science and engineering.

Kiiveri and Speed (1982) combined directed graphs with a generalized connection between independence and absence of causal connection in what they called the Markov condition: provided Y is not an effect of X, X and Y are conditionally independent given the direct causes of X. They showed that much of the linear modeling literature tacitly assumed the Markov condition; the same is true for causal models of categorical data, and virtually all causal models of systems without feedback. Under additional assumptions, conditional independence therefore provides information about causal dependence. The most common, and most thoroughly investigated, additional assumption is that all conditional independencies are due to the Markov condition applied to the directed graph describing the actual causal processes generating the data, a requirement that has been given many names, including "faithfulness." Directed graphs with associated probability distributions satisfying the Markov condition are called by different names in different literatures: Bayes nets, belief nets, structural equation models, path models, etc. Nonetheless, causal inferences from uncontrolled convenience samples are liable to many sources of error and data miners should proceed with extreme caution.

Prediction. Sometimes one is interested in using a sample, or a database, to predict properties of a new sample, where it is assumed that the two samples are obtained from the same probability distribution. As with estimation, in prediction we are interested both in reliability and in uncertainty, often measured by the variance of the predictor.

Prediction methods for this sort of problem always assume some structure in the probability distribution. In data mining contexts, structure is typically either supplied by human experts, or inferred from the database automatically. Regression, for example, assumes a particular functional form relating variables. Structure can also be specified in terms of constraints, such as independence, conditional independence, higher order conditions on correlations, etc. On average, a prediction method that guarantees satisfaction of the constraints realized in the probability distribution (and no others) will be more accurate and have smaller variance than one that does not. Finding the appropriate constraints to satisfy is the most difficult issue in this sort of prediction. As with estimation, prediction can be improved by model averaging, provided the prior probabilities of the alternative assumptions imposed by the model are available.

3. Is Data Mining "Statistical Deja Vu" (All Over Again)?

In the mid 1960's, the statistics community referred to unfettered exploration of data as "fishing" or "data dredging" (Selvin and Stuart, 1966). The community, enamored by elegant (analytical) mathematical solutions to inferential problems, argued that since their theories were invalidated by "looking at the data", it was wrong to do so. The major proponent of the Exploratory Data Analysis (EDA) school, J.W. Tukey, countered this argument with the obvious retort that statisticians were putting the cart before the horse. He argued that statistical theory should adapt to the scientific method rather than the other way around. Thirty years later, the statistical community has largely adopted Tukey's perspective, and has made considerable progress in serving both masters, namely acknowledging that model search is a critical and unavoidable step in the modeling process, and devising formal methods to account for search in their inferential procedures.

Three themes of modern statistics that are of fundamental importance to data miners are: clarity about goals, appropriate reliability assessment, and adequate accounting for sources of uncertainty.

Clarity about goals. Sometimes data analysis aims to find a convenient, easily computable representation of how the data are distributed in a particular database. In other cases, data analysis aims to predict features of new cases, or new samples, drawn from outside the database used to develop a predictive model (this is particularly challenging in dynamic situations). In yet other cases, data analysis aims to provide a basis for policy. That is, the analysis is intended to yield insight into causal mechanisms that are used to form predictions about new samples that might be produced by interventions or actions that did not apply in the original database from which the model (or models) were developed. Each of these goals presents distinct inference problems, with distinct hazards. Confusing or equivocating over the aim invites the use of inappropriate methods and may result in unfortunate predictions and inferences.

As an example, consider the observational study reported by Chasnoff et al. (1989) comparing babies born to cocaine-using mothers with babies born to non-cocaine-using mothers. The authors concluded: "For women who become pregnant and are users of cocaine, intervention in early pregnancy with cessation of cocaine use will result in improved obstetric outcome". Fortunately, there exists independent evidence to support this causal claim. However, much of Chasnoff et al.'s paper focuses on a statistical analysis (analysis of variance) that has little, if anything, to do with the causal question of interest.

Hand (1994) provides a series of examples illustrating how easy it is to give the right answers to the wrong question. For example, he discusses the problem of analyzing clinical trial data where patients drop out due to adverse side-effects of a particular treatment (Diggle and Kenward, 1994). In this case, the important issue is which population one is interested in modelling: the population at large, or the population who remain within the trial? This problem arises in more general settings than in clinical trials, e.g., non-respondents (refusers) in survey data. In such situations it is important to be explicit about the questions one is trying to answer.

In this general context an important issue (discussed at length in Hand (1994)) is that of formulating statistical strategy, i.e., how does one structure a data analysis problem so that the right question can be asked? Hand's conclusion is that this is largely an "art" because it is less well formalized than the mathematical and computational details of applying a particular technique. This "art" is gained through experience (at present at least) rather than taught. The implication for
data mining is that human judgement is essential for many non-trivial inference problems. Thus, automation can at best only partially guide the data analysis process. Properly defining the goals of an analysis remains a human-centred, and often difficult, process.

Use of methods that are reliable means to the goal, under assumptions the user (and consumer) understands and finds plausible in the context. Statistical theory applies several meanings to the word "Reliability", many of which also apply to model search. For example, under what conditions does a search procedure provide correct information, of the kind sought, with probability one as the sample size increases without bound? Answers to such questions are often elusive and can require sophisticated mathematical analysis. Where answers are available, the data analyst should pay careful attention to the reasonableness of underlying assumptions. Another key data mining question is this: what are the probabilities of various kinds of errors that result from using a method in finite samples? The answers to this question will typically vary with the kinds of errors considered, with the sample size, and with the frequency of occurrence of the various kinds of targets or signals whose description is the goal of inference. These questions are often best addressed by Monte Carlo methods, although in some cases analytic results may be available.

A sense of the uncertainties of models and predictions. Quite often background knowledge and even the best methods of search and statistical assessment should leave the investigator with a range of uncertainties about the correct model, or the correct prediction. The data analyst must quantify these uncertainties so that subsequent decisions can be appropriately hedged. Section 4 provides a compelling example.

Another example involves a current debate in the atmospheric sciences. The question is whether or not specific recurrent pressure patterns can be clearly identified from daily geopotential height records which have been compiled in the Northern Hemisphere since 1948. The existence of well-defined recurrent patterns (or "regimes") has significant implications for models of upper atmosphere low-frequency variability beyond the time-scale of daily weather disturbances (and, thus, models of the earth's climate over large time-scales). Several studies have used a variety of clustering algorithms to detect inhomogeneities ("bumps") in low-dimensional projections of the gridded data (see Michelangeli et al. (1995) and others referred to therein). While this work has attempted to validate the cluster models via resampling techniques, it is difficult to infer from the multiple studies whether regimes truly exist, and, if they do, where precisely they are located. It seems likely that 48 winters worth of data is not enough to identify regimes to any degree of certainty and that there is a fundamental uncertainty (given the current data) about the underlying mechanisms at work. All is not lost, however, since it is also clear that one could quantify model uncertainty in this context, and theorize accordingly (see Section 4).

In what follows we will elaborate on these points and offer a perspective on some of the hazards of data mining.
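
The finite-sample error question raised under the reliability theme can be explored by simulation. The following Python sketch is an illustration only and is not taken from the article: it assumes ten mutually independent standard normal variables, samples of size 50, and a hypothetical "search" that tests every pair for independence with a Fisher z test at the 0.05 level. The function names, sample sizes, trial count, and seed are all assumptions. It estimates how often such a search reports at least one dependence when none exists.

```python
import math
import random
from itertools import combinations


def correlation(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def search_finds_dependence(data, z_crit=1.96):
    """Hypothetical search: report True if any pairwise Fisher z test
    rejects independence at (approximately) the two-sided 0.05 level."""
    n = len(data[0])
    for x, y in combinations(data, 2):
        z = math.atanh(correlation(x, y)) * math.sqrt(n - 3)
        if abs(z) > z_crit:
            return True
    return False


def estimate_search_error(n_vars=10, n_samples=50, n_trials=300, seed=0):
    """Monte Carlo estimate of the probability that the search reports a
    dependence when all variables are in fact independent."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(n_trials):
        data = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
                for _ in range(n_vars)]
        if search_finds_dependence(data):
            false_alarms += 1
    return false_alarms / n_trials


if __name__ == "__main__":
    print(estimate_search_error())
```

With 45 pairwise tests per search, the estimated error rate of the search as a whole comes out far above the 0.05 level of each individual test, which is the point made in Section 2 about sequences of hypothesis tests.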
4. Characterizing Uncertainty

The statistical approach contends that reporting a single number for a parameter estimate or a prediction is almost always inadequate. Quantification of the uncertainty associated with a single number, while often challenging, is critical for subsequent decision making. As an example, Draper (1995) considered the case of the 1980 Energy Modeling Forum (EMF) at Stanford University where a 43-person working group of economists and energy experts convened to forecast world oil prices from 1981 to 2020. The group generated predictions based on a number of econometric models and scenarios, embodying a variety of assumptions about supply, demand, and growth rates of relevant quantities. A plausible reference scenario and model was selected as representative, but the summary report (EMF, 1982) cautioned against interpreting point predictions based on the reference scenario as "[the working group's] 'forecast' of the oil future, as there are too many unknowns to accept any projection as a forecast." The summary report did conclude, however, that most of the uncertainty about future oil prices "concerns not whether these prices will rise... but how rapidly they will rise."

In 1980, the average spot price of crude oil was around $32 per barrel. Despite the warning about the potential uncertainty associated with the point estimates, governments and private companies around the world focused on the last sentence in the quotation above, and proceeded to invest an estimated $500 billion, on the basis that the price would probably be close to $40 per barrel in the mid-eighties. In fact, the actual 1986 world average spot price of oil was about $13 per barrel.

Using only the information available to the EMF in 1980, along with thoughtful but elementary statistical methods, Draper (1995) shows that a 90% predictive interval for the 1986 price would have ranged from about $20 to over $90. Note that this interval does not actually contain the actual 1986 price: insightful statistical analysis does not provide clairvoyance. However, decision makers would (and should) have proceeded more cautiously in 1980, had they understood the full extent of their uncertainty.

Correctly accounting for the different sources of uncertainty presents significant challenges. Until recently, the statistical literature focused primarily on quantifying parametric and predictive uncertainty in the context of a particular model. Two distinct approaches are in common use. "Frequentist" statisticians focus on the randomness in sampled data and summarize the induced randomness in parameters and predictions by so-called sampling distributions. "Bayesian" statisticians instead treat the data as fixed, and use Bayes Theorem to turn prior opinion about quantities of interest (always expressed by a probability distribution) into a so-called posterior distribution that embraces all the available information. The fierce conflicts between previous generations of frequentists and Bayesians have largely given way in recent years to a more pragmatic approach; most statisticians will base their choice of tool on scientific appropriateness and convenience.
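
To make the contrast between the two approaches concrete, the following Python sketch summarizes uncertainty about a single proportion in both ways. It is a minimal illustration under stated assumptions, not a method from the article: the data (30 successes in 100 trials), the uniform Beta(1, 1) prior, the use of a bootstrap to approximate the sampling distribution, and all function names are hypothetical choices.

```python
import random


def percentile_interval(draws, level=0.90):
    """Central interval read off from sorted Monte Carlo draws."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * len(s))]
    hi = s[int((1 + level) / 2 * len(s)) - 1]
    return lo, hi


def bootstrap_interval(successes, n, n_draws=5000, seed=1):
    """Frequentist-style summary: resample the observed data to approximate
    the sampling distribution of the sample proportion."""
    rng = random.Random(seed)
    data = [1] * successes + [0] * (n - successes)
    draws = [sum(rng.choices(data, k=n)) / n for _ in range(n_draws)]
    return percentile_interval(draws)


def posterior_interval(successes, n, n_draws=5000, seed=1):
    """Bayesian summary: treat the data as fixed and draw from the
    Beta posterior implied by an assumed uniform Beta(1, 1) prior."""
    rng = random.Random(seed)
    draws = [rng.betavariate(1 + successes, 1 + n - successes)
             for _ in range(n_draws)]
    return percentile_interval(draws)


if __name__ == "__main__":
    print("bootstrap 90% interval:", bootstrap_interval(30, 100))
    print("posterior 90% interval:", posterior_interval(30, 100))
```

For a simple problem like this one the two intervals are close, which is consistent with the pragmatic view that the choice between the approaches can often rest on appropriateness and convenience.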
In any event, recent research has led to increased awareness that within-model uncertainty (as discussed in the previous paragraph) may often, in practice, be dominated by between-model uncertainty (Chatfield, 1995, Draper, 1995, Madigan and York, 1995). It is common practice nowadays for statisticians and data miners to use computationally intensive model selection algorithms to seek out a single optimal model from an enormous class of potential models. The problem is that several different models may be close to optimal, yet lead to different inferences. Intuitively, ambiguity over the model should dilute information about effect parameters and predictions, since "part of the evidence is spent to specify the model" (Leamer, 1978, p. 91). Promising techniques for properly accounting for this source of uncertainty include Bayesian model averaging (Draper, 1995) and resampling methods (Breiman, 1996). The main point here is that data miners need to think carefully about model assessment and look beyond commonly used goodness-of-fit measures such as mean square error.

5. What can go wrong, will go wrong

Data mining poses difficult and fundamental challenges to the theory and practice of statistics. While statistics does not have all the answers for the data miner, it does provide a useful and practical framework within which to search for solutions. In this section, we describe some lessons that statisticians have learned when theory meets data.

5.1. Data Can Lie

Data mining applications typically rely on observational (as opposed to experimental) data. Interpreting observed associations in such data is challenging; sensible inferences require careful analysis, and detailed consideration of the underlying factors. Here we offer a detailed example to support this position.

Wen, Hernandez, and Naylor (1995; WHN hereafter) analyzed administrative records of all Ontario general hospital separations (discharges, transfers, or in-hospital deaths) from 1981 to 1990, focusing specifically on patients who had received a primary open cholecystectomy. Some of these patients had in addition received an incidental (i.e. discretionary) appendectomy during the cholecystectomy procedure. Table 2 displays the data on one outcome, namely in-hospital deaths. A chi-square test comparing this outcome for the two groups of patients shows a "statistically significant" difference. This "finding" is surprising since long-term prevention of appendicitis is the sole rationale for the incidental appendectomy procedure; no short-term improvement in outcomes is expected. This "finding" might lead a naive hospital policy maker to conclude that all cholecystectomy patients should have an incidental appendectomy to improve their chances of a good outcome! Clearly something is amiss - how could incidental appendectomy improve outcomes?
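
The chi-square comparison mentioned above can be reproduced from the four counts reported in Table 2. The Python sketch below is illustrative: it assumes a Pearson chi-square test without continuity correction (the article does not say which variant was used), and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and p-value
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: with / without incidental appendectomy; columns: deaths / survivors
# (counts as reported in Table 2).
stat, p = chi_square_2x2(21, 7825, 1394, 190205)

if __name__ == "__main__":
    print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

The statistic is far beyond the conventional 0.05 critical value of 3.84, so the association is "statistically significant" in the usual sense, even though, as the text goes on to argue, it does not support the causal reading.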
Table 2. In-hospital Survival of Patients Undergoing Primary Open Cholecystectomy With and Without Incidental Appendectomy.

                                   With Appendectomy    Without Appendectomy
In-hospital deaths, No. (%)        21 (0.27%)           1,394 (0.73%)
In-hospital survivors, No. (%)     7,825 (99.73%)       190,205 (99.27%)

WHN did separately consider a subgroup of low-risk patients. For these patients (using ten different definitions of "low-risk"), incidental appendectomy indeed resulted in poorer outcomes. Paradoxically, it could even be the case that appendectomy adversely affects outcomes for both high-risk patients and low-risk patients, but appears to positively affect outcomes when the low-risk and high-risk patients are combined. WHN do not provide enough data to check whether this so-called "Simpson's Paradox" (Simpson, 1951) occurred in this example. However, Table 3 presents data that are plausible and consistent with WHN's data.

Table 3. Fictitious data consistent with the Wen et al. (1995) data.

              With Appendectomy         Without Appendectomy
              Low-Risk    High-Risk     Low-Risk    High-Risk
Death         7           14            100         1294
Survival      7700        125           [...]       [...]

Table 4. Proportion of in-hospital deaths cross classified by incidental appendectomy and patient risk grouping for the fictitious data of Table 3.

              With Appendectomy    Without Appendectomy
Low-Risk      0.0009               [...]
High-Risk     0.10                 [...]
Combined      0.003                0.007

Table 4 displays the corresponding proportions of in-hospital death for these fictitious data. Clearly the risk and death categories are directly correlated. In addition, appendectomies are more likely to be carried out on low-risk patients than on high-risk ones. Thus, if we did not know the risk category (age) of a patient, knowing that they had an appendectomy allows us to infer that they are more likely to be lower risk (younger). However, this does not in any way imply that having an appendectomy will lower one's risk. Nonetheless, when risk is omitted from the table, exactly such a fallacious conclusion appears justified from the data.

Returning to the original data, WHN provide a more sophisticated regression analysis, adjusting for many possible confounding variables (e.g. age, sex, admission status). They conclude that "there is absolutely no basis for any short-term improvement in outcomes" due to incidental appendectomy. This careful analysis agrees with common sense in this case. In general, analyses of observational data demand such care, and come with no guarantees. Other characteristics of available data that connive to spoil causal inferences include:

- Associations in the database may be due in whole or part to unrecorded common causes (latent variables).
- The population under study may be a mixture of distinct causal systems, resulting in statistical associations that are due to the mixing rather than to any direct influence of variables on one another or any substantive common cause.
- Missing values of variables for some units may result in misleading associations among the recorded values.
- Membership in the database may be influenced by two or more factors under study, which will create a "spurious" statistical association between those variables.
- Many models with quite distinct causal implications may "fit" the data equally or almost equally well.
- The frequency distributions in samples may not be well approximated by the most familiar families of probability distributions.
- The recorded values of variables may be the result of "feedback" mechanisms which are not well represented by simple "non-recursive" statistical models.

There is research that addresses aspects of these problems, but there are few statistical procedures yet available that can be used "off the shelf" (the way randomization is used in experimental design) to reduce these risks. Standard techniques such as multiple regression and logistic regression may work in many cases, such as in the appendectomy example, but they are not always adequate guards against these hazards. Indeed, controlling for possibly confounding variables with multiple regression can in some cases produce inferior estimates of effect sizes. Procedures recently developed in the artificial intelligence and statistics literature (Spirtes et al., 1993) address some of the problems associated with latent variables and mixing, but so far only for two families of probability distributions, the normal and multinomial.
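
The reversal described earlier in this section (Simpson's Paradox) is easy to verify numerically. The Python sketch below uses illustrative stratified counts chosen so that their totals match Table 2; in particular, the split of the 190,205 survivors without appendectomy between the low-risk and high-risk groups is an assumption made for this example, as are all function and variable names.

```python
# (deaths, survivors) by treatment group and risk stratum.
# Illustrative counts: totals match Table 2; the split of the "without"
# survivors across risk strata is assumed for this sketch.
counts = {
    "with":    {"low": (7, 7700),     "high": (14, 125)},
    "without": {"low": (100, 164009), "high": (1294, 26196)},
}


def death_rate(deaths, survivors):
    """Proportion of in-hospital deaths in one cell."""
    return deaths / (deaths + survivors)


def stratum_rates(group):
    """Death rate within each risk stratum for one treatment group."""
    return {risk: death_rate(*cell) for risk, cell in counts[group].items()}


def combined_rate(group):
    """Death rate for one treatment group with risk strata pooled."""
    deaths = sum(d for d, _ in counts[group].values())
    survivors = sum(s for _, s in counts[group].values())
    return death_rate(deaths, survivors)


if __name__ == "__main__":
    for group in ("with", "without"):
        print(group, stratum_rates(group), combined_rate(group))
```

Within each risk stratum the death rate is higher with appendectomy, yet the pooled rate is lower, because appendectomy patients are overwhelmingly low-risk. Omitting the risk variable therefore reverses the apparent direction of the effect.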
5.2. Sometimes it's not what's in the data that matters

Classical statistical methods start with a random sample, yet in practice, data, or the institutions that give rise to data, can be uncooperative. In such cases, inferences that ignore how the data were "selected" can lead to distorted conclusions.

Consider, for example, the Challenger Space Shuttle accident. The Rogers Commission concluded that an O-ring failure in the solid rocket booster led to the structural breakup and loss of the Challenger. In reconstructing the events leading up to the decision to launch, the Commission noted a mistake in the analysis of thermal-distress data whereby flights with no (i.e. zero) incidents of O-ring damage were excluded from critical plots of O-ring damage and ambient launch temperature, since it was felt that they did not contribute any information about the temperature effect. This truncation of the data led to the conclusion that no relationship between O-ring damage and temperature existed, and ultimately, to the decision to launch. Dalal et al. (1989) throw statistical light on the matter by demonstrating the strong correlation between O-ring damage and temperature, and quantifying the risk (of catastrophic failure) at 31°F. Had the original analysis used all of the data, it would have indicated that the decision to launch was at best a risky proposition.

In the above case, the selection bias problem was one of "human error" and could easily have been avoided. In most problems, selection bias is an inherent characteristic of the available data, and methods of analysis need to deal with it. It is our experience that every data set has the potential for selection bias to invalidate standard inferences. The lessons to be learned here are:

- any technique used to analyze truncated data as if they were a random sample can be fooled, regardless of how the truncation was induced;
- the data themselves are seldom capable of alerting the analyst that a selection mechanism is operating; information external to the data at hand is critical in understanding the nature and extent of potential biases.

5.3. The Perversity of the Pervasive P-value

P-values and associated significance (or hypothesis) tests play a central role in classical (frequentist) statistics. It seems natural, therefore, that data miners should make widespread use of P-values. However, indiscriminate use of P-values can lead data miners astray in most applications.

The standard significance test proceeds as follows. Consider two competing hypotheses about the world: the Null Hypothesis, commonly denoted by H0, and the Alternative Hypothesis, commonly denoted by HA. Typically H0 is "nested" within HA; for example, H0 might state that a certain combination of parameters is equal to zero, while HA might place no restriction on the combination. A test statistic, T, is selected and calculated from the data at hand. The idea is that T(data) should measure the evidence in the data against H0. The analyst rejects H0 in favor of HA if T(data) is more extreme than would be expected if H0 were true. Specifically, the analyst computes the P-value, that is, the probability of T being greater than or equal to T(data), given that H0 is true. The analyst rejects H0 if the P-value is less than a preset significance level, α.

There are three primary difficulties associated with this approach:

1. The standard advice that statistics educators provide, and scientific journals rigidly adhere to, is to choose α to be 0.05 or 0.01, regardless of sample size. These particular α-levels arose in Sir Ronald Fisher's study of relatively small agricultural experiments (on the order of 30-200 plots). Textbook advice (e.g., Neyman and Pearson, 1933) has emphasized the need to take account of the power of the test against HA when setting α, and somehow to reduce α when the sample size is large. This crucial but vague advice has largely fallen on deaf ears.

2. Raftery (1995) points out that the whole hypothesis testing framework rests on the basic assumption that only two hypotheses are ever entertained. In practice, data miners will consider very large numbers of possible models. As a consequence, indiscriminate use of P-values with "standard" fixed α-levels can lead to undesirable outcomes such as selecting a model with parameters that are highly significantly different from zero, even when the training data are pure noise (Freedman, 1983). This point is of fundamental importance for data miners.

3. The P-value is the probability associated with the event that the test statistic was as extreme as the value observed, or more so. However, the event that actually happened was that a specific value of the test statistic was observed. Consequently, the relationship between the P-value and the veracity of H0 is subtle at best. Jeffreys (1980) puts it this way:

    I have always considered the arguments for the use of P absurd. They amount to saying that a hypothesis that may or may not be true is rejected because a greater departure from the trial value was improbable; that is, that it has not predicted something that has not happened.

Bayes Factors are the Bayesian analogue of the frequentist P-values and admit a more direct interpretation: the Bayesian analyst computes the posterior probability that a hypothesis is correct. With fixed α-levels, the frequentist and the Bayesian will arrive at very different conclusions. For example, Berger and Sellke (1987) show that data that yield a P-value of 0.05 when testing a normal mean result in a posterior probability for H0 that is at least 0.30 for any "objective" prior distribution. One way to reconcile the two positions is to view Bayes Factors as a method for selecting appropriate α-levels (see Raftery, 1995).
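The second difficulty above, a search over many candidate models with a fixed α-level, can be made concrete with a small simulation. The sketch below is not Freedman's (1983) screening design, which involves multiple regression; it is a simpler stand-in that tests many pure-noise predictors one at a time against a pure-noise response. The function names, the sample sizes, and the use of the Fisher z-transform as a normal approximation for the correlation test are assumptions made for illustration.

```python
import math
import random


def two_sided_p(xs, ys):
    """Approximate two-sided P-value for H0: zero correlation.

    Uses the Fisher z-transform, under which atanh(r) * sqrt(n - 3) is
    approximately standard normal when H0 is true.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def screen_noise(n_rows=100, n_predictors=1000, alpha=0.05, seed=0):
    """Test many noise predictors against a noise response.

    Returns the number of predictors declared significant at level alpha
    and the smallest P-value found by the search.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_rows)]
    p_values = []
    for _ in range(n_predictors):
        x = [rng.gauss(0, 1) for _ in range(n_rows)]
        p_values.append(two_sided_p(x, y))
    n_significant = sum(p < alpha for p in p_values)
    return n_significant, min(p_values)


if __name__ == "__main__":
    count, p_min = screen_noise()
    print(f"{count} of 1000 noise predictors have P < 0.05")
    print(f"smallest P-value found: {p_min:.5f}")
```

Roughly α times the number of predictors are expected to pass the test even though no relationship exists, and the best of them can look highly significant. The error probability of a single test is therefore not the error probability of the search.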
5.4. Intervention and Prediction

A specific class of prediction problems involves interventions that alter the probability distribution of the problem, as in predicting the values (or probabilities) of variables under a change in manufacturing procedures, or changes in economic or medical treatment policies. Accurate predictions of this kind require some knowledge of the relevant causal structure, and are in general quite different from prediction without intervention, although the usual caveats about uncertainty and model averaging apply. For graphical representations of causal hypotheses according to the Markov condition, general algorithms for predicting the outcomes of interventions from complete or incomplete causal models were developed in Spirtes et al. (1993). Some of these procedures have been extended and made into a more convenient calculus by Pearl (1995). A related theory without graphical models was developed earlier by Rubin (1974) and others, and by Robbins (1986).

Consider the following example. Herbert Needleman's famous studies of the correlation of lead deposits in children's teeth with their IQs resulted, eventually, in removal of tetraethyl lead from gasoline in the United States. One data set Needleman examined included more than 200 subjects, and measured a large number of covariates. Needleman, Geiger, and Frank (1985) re-analyzed the data using backwards step-wise regression of verbal IQ on these variables and obtained six significant regressors, including lead. Klepper (1988) reanalyzed the data assuming that all of the variables were measured with error. The model assumes that each measured number is a linear combination of the true value and an error, and that the parameters of interest are not the regression coefficients but rather the coefficients relating the unmeasured "true value" variables to the unmeasured true value of verbal IQ. These coefficients are in fact indeterminate (in econometric terminology, "unidentifiable"). An interval estimate of the coefficients that is strictly positive or negative for each coefficient can be made, however, if the amount of measurement error can be bounded with prior knowledge by an amount that varies from case to case. Klepper found that the bound required to ensure the existence of a strictly negative interval estimate for the lead-IQ coefficient was much too strict to be credible; thus he concluded that the case against lead was not nearly as strong as Needleman's analysis suggested.

Allowing the possibility of latent variables, Scheines (1996) reanalyzed the correlations (using TETRAD methodology) and concluded that three of the six regressors could have no influence on IQ. The regression included the three extra variables only because the partial regression coefficient is estimated by conditioning on all other regressors, which is just the right thing to do for linear prediction, but the wrong thing to do for causal inference using the Markov condition. Using the Klepper model, but without the three irrelevant variables, and assigning to all of the parameters a normal prior probability with mean zero and a substantial variance, Scheines then used Markov chain Monte Carlo to compute a posterior probability distribution for the lead-IQ parameter. The probability is very high that lead exposure reduces verbal IQ.
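The difference between prediction by conditioning and prediction under intervention can be seen in a toy simulation. The sketch below is an illustration under assumed conditions and is unrelated to the lead-IQ data: a hidden variable U drives both X and Y, X has no effect on Y, and all noise terms are standard normal. The variable names and the linear structure are hypothetical.

```python
import random


def slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, seed=0):
    """Return (slope when X is observed, slope when X is set by intervention)."""
    rng = random.Random(seed)

    # Observational regime: hidden U drives both X and Y; X does not affect Y.
    u = [rng.gauss(0, 1) for _ in range(n)]
    x_obs = [ui + rng.gauss(0, 1) for ui in u]
    y_obs = [ui + rng.gauss(0, 1) for ui in u]

    # Interventional regime: X is set from outside, independently of U,
    # with the same marginal variance as before; Y is generated as before.
    u2 = [rng.gauss(0, 1) for _ in range(n)]
    x_do = [rng.gauss(0, 2 ** 0.5) for _ in range(n)]
    y_do = [ui + rng.gauss(0, 1) for ui in u2]

    return slope(x_obs, y_obs), slope(x_do, y_do)


if __name__ == "__main__":
    s_obs, s_do = simulate()
    print(f"slope of Y on X, observed X:   {s_obs:+.3f}")
    print(f"slope of Y on X, intervened X: {s_do:+.3f}")
```

In this setup the observational slope is near 0.5, which is the right quantity for predicting Y from an observed X, while the slope under intervention is near zero, which is the right quantity for predicting what happens to Y when X is changed.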
6. Symbiosis in Statistics

Easy access to data in digital form and the availability of software tools for statistical analyses have made it possible for the man in the street to set up shop and "do statistics." Nowhere is this more true today than in data mining. Based on the arguments in this article, let us assume that statistics is a necessary but not sufficient component in the practice of data mining. How well will the statistics profession serve the data mining community? Hoerl et al. (1993), for example, assert that:

    We are our own best customers. Much of the work of the statistical profession is intended for other members of the statistical profession.

Despite this rather negative view of the relevance of statistical research, real-world applications do in fact drive much of what goes on in statistics, although often in a very indirect manner.

As an example, consider the field of signal processing and communications, an area where a specialized set of relatively sophisticated statistical methods and models has been honed for practical use. The field was driven by fundamental advances from Claude Shannon and others in the 1940's. Like most of the other contributors to the field, Shannon was not a statistician, but possessed a deep understanding of probability theory and its applications. From the 1950's to the present, due to rapid advances in both theory and hardware, the field has exploded and relevant statistical methods such as estimation and detection have found their way into everyday use in radio and network communications systems. Modern statistical communications reflects the symbiosis of statistical theory and engineering practice. Engineering researchers in the field are in effect "adjunct" statisticians: educated in probability theory and basic statistics, they have the tools to apply statistical methods to their problems of interest. Meanwhile statisticians continue to develop more general models and estimation techniques of potential applicability to new problems in communications.

This type of symbiosis can also be seen in other areas such as financial modelling, speech recognition (where, for example, hidden Markov models provide the state of the art in the field), and most notably, epidemiology. Indeed, if statistics can claim to have revolutionized any field, it is in the biological and health sciences, where the statistical approach to data analysis gave birth to the field of biostatistics.

The relevance of this symbiosis for data mining is that data miners need to understand statistical principles, and statisticians need to understand the nature of the important problems that the data mining community is attacking or being asked to attack. This has been a successful model in the past for fields where statistics has had considerable impact and has the potential to see ongoing success.
7. Conclusion

The statistical literature has a wealth of technical procedures and results to offer data mining, but it also has a few simple methodological morals: prove that estimation and search procedures used in data mining are consistent under conditions reasonably thought to apply in applications; use and reveal uncertainty, don't hide it; calibrate the errors of search, both for honesty and to take advantage of model averaging; don't confuse conditioning with intervening; and finally, don't take the error probabilities of hypothesis tests to be the error probabilities of search procedures.

References

Akaike, H. 1974. A new look at the statistical model identification. IEEE Trans. Automat. Contr. AC-19:716-723.
Berger, J.O. and Sellke, T. 1987. Testing a point null hypothesis: the irreconcilability of P values and evidence (with discussion). Journal of the American Statistical Association 82:112-122.
Breiman, L. 1996. Bagging predictors. Machine Learning, to appear.
Chasnoff, I.J., Griffith, D.R., MacGregor, S., Dirkes, K., Burns, K.A. 1989. Temporal patterns of cocaine use in pregnancy: perinatal outcome. Journal of the American Medical Association 261(12):1741-4.
Chatfield, C. 1995. Model uncertainty, data mining, and statistical inference (with discussion). Journal of the Royal Statistical Society (Series A) 158:419-466.
Dalal, S.R., Fowlkes, E.B. and Hoadley, B. 1989. Risk analysis of the space shuttle: Pre-Challenger prediction of failure. Journal of the American Statistical Association 84:945-957.
Diggle, P. and Kenward, M.G. 1994. Informative drop-out in longitudinal data analysis (with discussion). Applied Statistics 43:49-93.
Draper, D., Gaver, D.P., Goel, P.K., Greenhouse, J.B., Hedges, L.V., Morris, C.N., Tucker, J., and Waternaux, C. 1993. Combining information: National Research Council Panel on Statistical Issues and Opportunities for Research in the Combination of Information. Washington: National Academy Press.
Draper, D. 1995. Assessment and propagation of model uncertainty (with discussion). Journal of the Royal Statistical Society (Series B) 57:45-97.
Efron, B. and Tibshirani, R.J. 1993. An Introduction to the Bootstrap. New York: Chapman and Hall.
Energy Modeling Forum. 1982. World Oil: Summary report. EMF Report 6, Energy Modeling Forum, Stanford University, Stanford, CA.
Fisher, R.A. 1958. Statistical methods for research workers. New York: Hafner Pub. Co.
Freedman, D.A. 1983. A note on screening regression equations. The American Statistician 37:152-155.
Geiger, D., Heckerman, D., and Meek, C. 1996. Asymptotic model selection for directed networks with hidden variables. Proceedings of the Twelfth Annual Conference on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann.
Gilks, W.R., Richardson, S., and Spiegelhalter, D.J. 1996. Markov chain Monte Carlo in practice. London: Chapman and Hall.
Hand, D.J. 1994. Deconstructing statistical questions (with discussion). Journal of the Royal Statistical Society (Series A) 157:317-356.
Hastie, T.J. and Tibshirani, R. 1990. Generalized Additive Models. London: Chapman and Hall.
Hoerl, R.W., Hooper, J.H., Jacobs, P.J., Lucas, J.M. 1993. Skills for industrial statisticians to survive and prosper in the emerging quality environment. The American Statistician 47:280-292.
Huber, P.J. 1981. Robust Statistics. New York: Wiley.
Jeffreys, H. 1980. Some general points in probability theory. In: A. Zellner (Ed.), Bayesian Analysis in Econometrics and Statistics. Amsterdam: North-Holland, 451-454.
Kass, R.E. and Raftery, A.E. 1995. Bayes factors. Journal of the American Statistical Association 90:773-795.
Kiiveri, H. and Speed, T.P. 1982. Structural analysis of multivariate data: A review. Sociological Methodology 209-289.
Kooperberg, C., Bose, S., and Stone, C.J. 1996. Polychotomous regression. Journal of the American Statistical Association, to appear.
Lauritzen, S.L. 1996. Graphical Models. Oxford: Oxford University Press.
Leamer, E.E. 1978. Specification Searches. Ad Hoc Inference with Nonexperimental Data. New York: Wiley.
Madigan, D. and Raftery, A.E. 1994. Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association 89:1335-1346.
Madigan, D. and York, J. 1995. Bayesian graphical models for discrete data. International Statistical Review 63:215-232.
Matheson, J.E. and Winkler, R.L. 1976. Scoring rules for continuous probability distributions. Management Science 22:1087-1096.
McCullagh, P. and Nelder, J.A. 1989. Generalized Linear Models. London: Chapman and Hall.
Michelangeli, P.A., Vautard, R., and Legras, B. 1995. Weather regimes: recurrence and quasi-stationarity. Journal of the Atmospheric Sciences 52(8):1237-56.
Miller, R.G. Jr. 1981. Simultaneous statistical inference (Second Edition). New York: Springer-Verlag.
Neyman, J. and Pearson, E.S. 1933. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society (Series A) 231:289-337.
Raftery, A.E. 1995. Bayesian model selection in social research (with discussion). In Sociological Methodology (ed. P.V. Marsden), Oxford, U.K.: Blackwells, 111-196.
Rissanen, J. 1978. Modeling by shortest data description. Automatica 14:465-471.
Schervish, M.J. 1995. Theory of Statistics. New York: Springer-Verlag.
Schwarz, G. 1978. Estimating the dimension of a model. Annals of Statistics 6:461-464.
Selvin, H. and Stuart, A. 1966. Data dredging procedures in survey analysis. The American Statistician 20(3):20-23.
Simpson, E.H. 1951. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society (Series B) 13:238-241.
Smith, A.F.M. and Roberts, G. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). Journal of the Royal Statistical Society (Series B) 55:3-23.
Spirtes, P., Glymour, C., and Scheines, R. 1993. Causation, Prediction and Search. Springer Lecture Notes in Statistics. New York: Springer-Verlag.
Stigler, S.M. 1986. The history of statistics: The measurement of uncertainty before 1900. Harvard: Harvard University Press.
Wen, S.W., Hernandez, R., and Naylor, C.D. 1995. Pitfalls in nonrandomized studies: The case of incidental appendectomy with open cholecystectomy. Journal of the American Medical Association 274:1687-1691.
Wright, S. 1921. Correlation and causation. Journal of Agricultural Research 20:557-585.
Abstractzmhasan@db.toronto.edu
SupportingNetworkManagement consens@db.toronto.edu MarianoP.Consens throughdeclarativelyspecied DataVisualizations1 ComputerSystemsResearchInstitute Toronto,CanadaM5S1A1 UniversityofToronto MasumZ.Hasan
More informationInsurance Brokers and the PPACA: The Potential for Field Underwriting and Other Concerns
Insurance Brokers and the PPACA: The Potential for Field Underwriting and Other Concerns Joshua P. Booth, J.D., LL.M. candidate (Health Law) jpbooth@email.wm.edu Introduction Independent health insurance
More informationBUSINESS APPLICATIONS OF DATA MINING
BUSINESS They help identify and predict individual, as well as aggregate, behavior, as illustrated by four application domains: direct mail, retail, automobile insurance, and health care. APPLICATIONS
More informationDRAFT ALCOHOL POLICY 1. Particulars to follow 2. POLICY STATEMENT
DRAFT ALCOHOL POLICY 1. Particulars to follow 2. POLICY STATEMENT The University encourages an enlightened, mature and responsible approach to moderate alcohol consumption, based on the undeniable fact
More informationUNIVERSITY of TORONTO. Faculty of Arts and Science
UNIVERSITY of TORONTO Faculty of Arts and Science AUGUST 2005 EXAMINATION AT245HS uration - 3 hours Examination Aids: Non-programmable or SOA-approved calculator. Instruction:. There are 27 equally weighted
More informationCentralized vs Onsite Monitoring:
Centralized vs Onsite Monitoring: A Sponsor s Balancing Act Applying a Risk-based Approach Introduction Since the August 2011 release of the draft guidance document by FDA on a risk-based approach to monitoring
More informationHow to Read the New Preliminary Online Flood Zone Maps
How to Read the New Preliminary Online Flood Zone Maps On the Hillsborough County website at www.hillsboroughcounty.org, select the Mapping the Risk: Flood Map Update link, next select the Proposed Flood
More informationAccident Prevention Techniques
Topic 9 Accident Prevention Techniques LEARNING OUTCOMES By the end of this topic, you should be able to: 1. Describe Job Hazard Analysis (JHA) as an accident prevention technique; 2. Describe Job Safety
More informationTitle: The BCL2-938 C>A promoter polymorphism is associated with risk group classification in children with acute lymphoblastic leukemia
Author's response to reviews Title: The BCL2-938 C>A promoter polymorphism is associated with risk group classification in children with acute lymphoblastic leukemia Authors: Annette Kuenkele (annette.kuenkele@uk-essen.de)
More informationPRE/POST TESTS and PRE/POST TEST INSTRUCTOR KEYS
PRE/POST TESTS and PRE/POST TEST INSTRUCTOR KEYS Enclosed are two versions of optional PRIME For Life Pre/Post Tests and Test Keys for your participants. You may use either test with your groups. For accurate
More informationESI ANNUAL SALARY SURVEY
ESI ANNUAL SALARY SURVEY In order to uncover how public and private sector organizations are going about building and developing their project communities, ESI International conducted the ESI 2013 Project
More informationType B Risk Assessment & Reporting Findings
Type B Risk Assessment & Reporting Findings Virginia s Practices for Completing Type B Risk Assessments & Reporting Findings Presented at: Mid-America Intergovernmental Audit Forum (MAMIAF) Single Audit
More information1 M.P,LeRouxB;ForewordbyP.Suppes: Excerptfrom significanceteststobayesianinference",bern, RouanetH,BernardJ.M,LecoutreB,Lecoutre PeterLang. "Newwaysinstatisticalmethodology:from Chapter4 Introductionto
More informationLongitudinal Data Analysis. Wiley Series in Probability and Statistics
Brochure More information from http://www.researchandmarkets.com/reports/2172736/ Longitudinal Data Analysis. Wiley Series in Probability and Statistics Description: Longitudinal data analysis for biomedical
More informationVCE Business Management 2013 2015
VCE Business Management 2013 2015 Written examination November Examination specifications The following information updates the specifications published in 2010. It reflects a change to the format introduced
More informationCreating Customer Value, Satisfaction, and Loyalty 9/5/2008. Building Customer Value and Satisfaction
Chapter 4 Creating Customer Value, Satisfaction, and Loyalty 4-1 Chapter Questions How can companies deliver customer value, satisfaction, and loyalty? What is the lifetime value of a customer, and why
More information!! Data$Analytics$and$the$ Microsoft)Ecosystem)for)the) Political(Campaign!
Data$Analytics$and$the$ Microsoft)Ecosystem)for)the) Political(Campaign A"brief"outline"of"Microsoft"data"and"analytics"tools"useful"to"assist"a"political"campaign." Created"by: MicrosoftTechnologyandCivicEngagement
More informationSupplemental Materials
Supplemental Materials How Can I Use Student Feedback to Improve My Teaching? Presented by: Ken Alford, Ph.D. and Tyler Griffin, Ph.D. 2014 Magna Publications Inc. All rights reserved. It is unlawful to
More informationWashington State Health Benefit Exchange Program
Summary Washington State Health Benefit Exchange Program Issue Brief #7: Managing Health Insurance Expenditure Risks for Washington State s Exchange As Submitted to the Federal Department of Health and
More informationColocation Services. Retail Colocation as it s meant to be
Colocation Services Retail Colocation as it s meant to be We are an agile business and look for similar organisations we can scale with. Infinity was the perfect choice. Jamie Donnelly Managing Director,
More informationHomework 3 Solution, due July 16
Homework 3 Solution, due July 16 Problems from old actuarial exams are marked by a star. Problem 1*. Upon arrival at a hospital emergency room, patients are categorized according to their condition as
More informationEssential QA Metrics for Determining Solution Quality
1.0 Introduction In today s fast-paced lifestyle with programmers churning out code in order to make impending deadlines, it is imperative that management receives the appropriate information to make project
More informationTHE PREDICTIVE MODELLING PROCESS
THE PREDICTIVE MODELLING PROCESS Models are used extensively in business and have an important role to play in sound decision making. This paper is intended for people who need to understand the process
More informationMath 370/408, Spring 2008 Prof. A.J. Hildebrand. Actuarial Exam Practice Problem Set 1
Math 370/408, Spring 2008 Prof. A.J. Hildebrand Actuarial Exam Practice Problem Set 1 About this problem set: These are problems from Course 1/P actuarial exams that I have collected over the years, grouped
More informationAn Empirical Analysis on Individuals Deposit- Withdrawal Behaviors Using Data Collected through a Web-Based Survey
Eurasian Journal of Business and Economics 2009, 2 (4), 27-41. An Empirical Analysis on Individuals Deposit- Withdrawal Behaviors Using Data Collected through a Web-Based Survey Toshihiko TAKEMURA *, Takashi
More informationMaster of Science in Statistics
Master of Science in Statistics Options: Biometrics Social, Behavioural and Educational Statistics Business Statistics Industrial Statistics General Statistical Methodology All Round Statistics Rubik s
More informationSECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION. Friday 14 March 2008 9.00-9.
SECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION Friday 14 March 2008 9.00-9.45 am Attempt all ten questions. For each question, choose the
More informationWork Account: Experience Rating Consultation. Submission by: The Southern Cross Medical Care Society
Work Account: Experience Rating Consultation Submission by: October 2010 Work Account: Experience Rating Consultation Page 2 1. Introduction This submission is made on behalf of The Southern Cross Medical
More informationWebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
More informationA dynamic, integrated wellness platform and app designed to engage workers on their Personal Pathways to better health.
Vitality Overview Who is Vitality? A dynamic, integrated wellness platform and app designed to engage workers on their Personal Pathways to better health. Vitality encourages individuals to commit to long-term
More informationSample Script of an Initial Brief Alcohol Counseling Session
Information Sheet for Behavioral Health Providers in Primary Care Sample Script of an Initial Brief Alcohol Counseling Session Introduce the Subject with a Transitional Statement From your answers it appears
More informationNew York State Department of Financial Services. Update on Cyber Security in the Banking Sector: Third Party Service Providers
New York State Department of Financial Services Update on Cyber Security in the Banking Sector: Third Party Service Providers April 2015 Update on Cyber Security in Banking Sector: Third-Party Service
More informationMaster of Science in Statistics
Master of Science in Statistics Majors: Biometrics Social, Behavioural and Educational Statistics Business Statistics Industrial Statistics General Statistical Methodology All Round Statistics INTERFACULTY
More informationLife Insurance and AD&D
Life Insurance and AD&D The plan provides you with life and accident coverage that gives you and your family protection against some of the financial hardships that can occur if you become injured or die.
More informationGetting Started Different Ways of Deleading Other Options and Resources
Contents Getting Started Protecting Children from Lead Poisoning page 2 Massachusetts Lead Law page 3 What is Deleading? page 4 Getting Your Home Inspected for Lead page 5 Different Ways of Deleading Low-Risk
More informationPROJECT MATURITY IN ORGANISATIONS
PROJECT MATURITY IN ORGANISATIONS By Erling S. Andersen Professor of Information Systems and Project Management Norwegian School of Management BI P.O.Box 580, N-1302 Sandvika, Norway erling.s.andersen@bi.no
More informationBreaking Down Work Comp Premium
3. 4. Missouri Employers Mutual Understand How it Can Add Up to Savings For more information: www.mem-ins.com 1.800.442.0593 These recommendations were developed from national standards and sources believed
More informationAuditorium Acoustics and Architectural Design
Auditorium Acoustics and Architectural Design Second Edition Michael Barron. J ^A Spon Press an imprint of Taylor & Francis LONDON AND NEWYORK Contents Preface Preface to the first edition Foreword ix
More informationDoes my patient need more therapy after prostate cancer surgery?
Does my patient need more therapy after prostate cancer surgery? Contact the GenomeDx Patient Care Team at: 1.888.792.1601 (toll-free) or e-mail: client.service@genomedx.com Prostate Cancer Classifier
More informationStatistics in Applications III. Distribution Theory and Inference
2.2 Master of Science Degrees The Department of Statistics at FSU offers three different options for an MS degree. 1. The applied statistics degree is for a student preparing for a career as an applied
More informationViewPoint Accountable Care Organizations
ViewPoint Accountable Care Organizations Improving the Quality and Accountability of Care for ACOs with Web- Based Technology As an ACO, if you are finding it difficult to manage your patient populations
More informationIowa State Board of Education
Iowa State of Education Executive Summary January 23, 2014 Framework for Policy Development and Decision Making Issue Identification Follow- Through Identifies Priorities Analysis Study Action Agenda Item:
More informationOctober 3, 2011. Richard Van Acker, Chair Senate Committee on Educational Policy. Kim Neumann, Assistant Director for Academic Programs
Office of Programs and Academic Assessment (MC 103) 2630 University Hall 601 South Morgan Street Chicago, Illinois 60607-7128 October 3, 2011 TO: FROM: Richard Van Acker, Chair Senate Committee on Educational
More informationSmall employers. Issue Brief. Health Insurance Purchasing Cooperatives. Elliot K.Wicks Economic and Social Research Institute
TASK FORCE ON THE FUTURE OF HEALTH INSURANCE Issue Brief NOVEMBER 2002 Health Insurance Purchasing Cooperatives Elliot K.Wicks Economic and Social Research Institute The Commonwealth Fund is a private
More informationInfluence of the Premium Subsidy on Farmers Crop Insurance Coverage Decisions
Influence of the Premium Subsidy on Farmers Crop Insurance Coverage Decisions Bruce A. Babcock and Chad E. Hart Working Paper 05-WP 393 April 2005 Center for Agricultural and Rural Development Iowa State
More informationMultinational Comparisons of Health Systems Data, 2014
Multinational Comparisons of Health Systems Data, 214 Chloe Anderson The Commonwealth Fund November 214 Health Care Spending 2 Dollars ($US) Average Health Care Spending per Capita, 198 212 Adjusted for
More informationProstate cancer. Christopher Eden. The Royal Surrey County Hospital, Guildford & The Hampshire Clinic, Old Basing.
Prostate cancer Christopher Eden The Royal Surrey County Hospital, Guildford & The Hampshire Clinic, Old Basing. Screening Screening men for PCa (prostate cancer) using PSA (Prostate Specific Antigen blood
More informationMBA PROGRAMME: 2015. Appendix 1 FINANCE AND RESPONSIBLE INVESTMENT SUBJECT CODE: CMBC 191
MBA PROGRAMME: 2015 Appendix 1 FINANCE AND RESPONSIBLE INVESTMENT STUDY GUIDE AND COURSE OUTLINE SUBJECT CODE: CMBC 191 1. Lecturing Dates February 7 February 27 March 27 April 17 May 17 May 22 2. Module
More informationHow To Understand Predictive Analysis And Data Mining
DATA MINING AND PREDICTIVE ANALYSIS PDF ==> Download: DATA MINING AND PREDICTIVE ANALYSIS PDF DATA MINING AND PREDICTIVE ANALYSIS PDF - Are you searching for Data Mining And Predictive Analysis Books?
More informationCurriculum Vitae: Raul J. Cano, Ph.D.
CurriculumVitae:RaulJ.Cano,Ph.D. I.PERSONALINFORMATION NAME: RaulJ.Cano OFFICEADDRESS: BiologicalSciencesDepartment,53 210E CaliforniaPolytechnicStateUniversity SanLuisObispo,CA93407 OFFICETELEPHONE: (805)756
More informationUse of Androgen Deprivation Therapy (ADT) in Localized Prostate Cancer
Use of Androgen Deprivation Therapy (ADT) in Localized Prostate Cancer Adam R. Kuykendal, MD; Laura H. Hendrix, MS; Ramzi G. Salloum, PhD; Paul A. Godley, MD, PhD; Ronald C. Chen, MD, MPH No conflicts
More information!!!!!!! !! Homeowners!Insurance!in!New!York:
HomeownersInsuranceinNewYork: AsInsurersMakeOutsizeProfits,PolicyholdersHaveLittleLegal RecoursetoChallengeUnfairClaimsSettlementPractices June8,2015 HomeownersInsuranceinNewYork: AsInsurersMakeOutsizeProfits,PolicyholdersHaveLittleLegalRecourseto
More informationSocial Networks and their Economics. Influencing Consumer Choice. Daniel Birke
Social Networks and their Economics Influencing Consumer Choice Daniel Birke Visiting Researcher, Aston Business School, Birmingham, and works in a leading international management consultancy in Germany.
More informationTrends in Publicly Reported Nursing Facility Quality Measures
Trends in Publicly Reported Nursing Facility Quality Measures American Health Care Association Reimbursement and Research Department January 2011 Trends in Publicly Reported Nursing Facility Quality Measures
More informationRadiation Therapy for Prostate Cancer: Treatment options and future directions
Radiation Therapy for Prostate Cancer: Treatment options and future directions David Weksberg, M.D., Ph.D. PinnacleHealth Cancer Institute September 12, 2015 Radiation Therapy for Prostate Cancer: Treatment
More informationQuality Scorecard overall heart attack care overall heart failure overall pneumonia care overall surgical infection rate patient safety survival
Quality Scorecard s are required to report quality statistics to the s for Medicare and Medicaid Services (CMS) and the Department of Health (DOH). This information is made available at www.hospitalcompare.hhs.gov
More informationAtherosclerosis of the aorta. Artur Evangelista
Atherosclerosis of the aorta Artur Evangelista Atherosclerosis of the aorta Diagnosis Classification Prevalence Risk factors Marker of generalized atherosclerosis Risk of embolism Therapy Diagnosis Atherosclerosis
More informationJulio is [it] the best option?
BEG_CTRL_NUM : DONZ000043764 END_CTRL_NUM : DONZ000043764 DATESENT = July 11, 2007 TIMESENT = 3:20:43 pm RECEIVEDDATE = July 11, 2007 TIMERECEIVED = 3:20:43 pm FILENAME : Re: seguro para el wao.msg SUBJECT
More informationCore Music Curriculum General Education
Department of Music BA Degree, Major in Music: 120 hours BM Degree, Major in Music Education: 126 hours BM Degree, Major in Performance: 120 hours College of Arts and Architecture UNC Charlotte www.music.uncc.edu
More informationCreating Strategic Alliances for Post-Acute Coordination of Care
Creating Strategic Alliances for Post-Acute Coordination of Care Kathleen Yosko, PhD President/CEO Wheaton Franciscan Health Care Sole Illinois property Free-standing facility 101 IRF beds 27 SNF beds
More informationAn Introduction to Advanced Analytics and Data Mining
An Introduction to Advanced Analytics and Data Mining Dr Barry Leventhal Henry Stewart Briefing on Marketing Analytics 19 th November 2010 Agenda What are Advanced Analytics and Data Mining? The toolkit
More informationAssessing Data Mining: The State of the Practice
Assessing Data Mining: The State of the Practice 2003 Herbert A. Edelstein Two Crows Corporation 10500 Falls Road Potomac, Maryland 20854 www.twocrows.com (301) 983-3555 Objectives Separate myth from reality