A statistical perspective on data mining

Jonathan Hosking, Edwin Pednault and Madhu Sudan
IBM T.J. Watson Research Center, Yorktown Heights, N.Y., U.S.A.

Abstract

Data mining can be regarded as a collection of methods for drawing inferences from data. The aims of data mining, and some of its methods, overlap with those of classical statistics. However, there are some philosophical and methodological differences. We examine these differences, and we describe three approaches to machine learning that have developed largely independently: classical statistics, Vapnik's statistical learning theory, and computational learning theory. Comparing these approaches, we conclude that statisticians and data miners can profit by studying each other's methods and using a judiciously chosen combination of them.

Keywords: classification, frequentist inference, PAC learning, statistical learning theory.

1 Introduction: a statistician looks at data mining

The recent upsurge of interest in the field variously known as data mining, knowledge discovery or machine learning¹ has taken many statisticians by surprise. Data mining attacks such problems as obtaining efficient summaries of large amounts of data, identifying interesting structures and relationships within a data set, and using a set of previously observed data to construct predictors of future observations. Statisticians have well-established techniques for attacking all of these problems. Exploratory data analysis, a field particularly associated with J. W. Tukey [18], is a collection of methods for summarizing and identifying patterns in data. Many statistical models exist for explaining relationships in a data set or for making predictions: cluster analysis, discriminant analysis and nonparametric regression can be used in many data mining problems. It is therefore tempting for a statistician to regard data mining as no more than a branch of statistics.

Nonetheless, the problems and methods of data mining have some distinct features of their own. Data sets can be very much larger than is usual in statistics, running to hundreds of gigabytes or terabytes. Data analyses are on a correspondingly larger scale, often requiring days of computer time to fit a single model. There are differences of emphasis in the approach to modeling: compared with statistics, data mining pays less attention to the large-sample asymptotic properties of its inferences and more to the general philosophy of "learning", including consideration of the complexity of models and of the computations that they require. Some modeling techniques, such as rule-based methods, are difficult to fit into the classical statistical framework, and others, such as neural networks, have an extensive methodology and terminology that has developed largely independently of input from statisticians.

¹Unfortunately, "data mining" is a pejorative term to statisticians, who use it to describe the fitting of a statistical model that is unjustifiably elaborate for a given data set (e.g. [11]). "Machine learning" is probably better, though "learning" is a loaded term.
This paper is a brief introduction to some of the similarities and differences between statistics and data mining. In section 2 we observe some of the differences between the statistical and data-mining approaches to data analysis and modeling. In sections 3–5 we describe in more detail some approaches to machine learning that have arisen in three more-or-less disjoint academic communities: classical statistics, the statistical learning theory of V. Vapnik, and computational learning theory. Section 6 contains some comparisons and conclusions.

2 Statistics and data mining

Both statistics and data mining are concerned with drawing inferences from data. The aim of the inference may be understanding the patterns of correlation and causal links among the data values ("explanation"), or making predictions of future data values ("generalization"). Classical statistics has developed an approach, described further in section 3 below, that involves specifying a model for the probability distribution of the data and making inferences in the form of probability statements.

2.1 Features of data mining

Data-mining methods have in many cases been developed for problems that do not fit easily into the framework of classical statistics and have evolved in isolation from statistics. Even when applied to familiar statistical problems such as classification and regression, they retain some distinct features. We now mention some features of the data-mining approaches and their typical implementations.

Complex models. Some problems involve complex interactions between feature variables, with no simple relationships being apparent in the data. Character recognition is a good example; given a 16×16 array of pixels, it is difficult to formulate a comprehensible statistical model that can identify the character that corresponds to a given pattern of dots. Data-mining techniques such as neural networks and rule-based classifiers have the capacity to model complex relationships and should have better prospects of success in complex problems.

Large problems. By the standards of classical statistics, data mining often deals with very large data sets (10^4 to 10^7 examples). This is in some cases a consequence of the use of complex models, for which large amounts of data are needed to derive secure inferences. In consequence, issues of computational complexity and scalability of algorithms are often of great importance in data mining.

Many discrete variables. Data sets that contain a mixture of continuous and discrete-valued variables are common in practice. Most multivariate analysis methods in statistics are designed for continuous variables. Many data mining methods are more tolerant of discrete-valued variables. Indeed, some rule-based approaches use only discrete variables and require continuous variables to be discretized.

Wide use of cross-validation. Data-mining methods often seek to minimize a loss function expressed in terms of prediction error: for example, in classification problems the loss function might be the misclassification rate on a set of examples not used in the model-fitting procedure. Prediction error is often estimated by cross-validation, a technique known to statistics but used much more widely in data mining. Minimization of the prediction error estimated by cross-validation is a powerful technique that can be used in a nested fashion (the "wrapper method" [7]) to optimize several aspects of the model. These include various parameters that might otherwise be chosen arbitrarily (e.g., the amount of pruning of a decision tree, or the number of neighbors to use in a nearest-neighbor classifier) and the choice of which feature variables are relevant for classification and which can be eliminated from the model.

Few comparisons with simple statistical models. When data mining methods are used on problems to which classical statistical methods are also applicable, direct comparison of the approaches is possible but seems rarely to be performed. Some comparisons have found that the greater complexity of data mining methods is not always justifiable: Ripley [16] cites several examples. Statistical methods are particularly likely to be preferable when fairly simple models are adequate and the important variables can be identified before modeling. This is a common situation in biomedical research, for example. In this context Vach et al. [19] compared neural networks and logistic regression and concluded that the use of neural networks "does not necessarily imply any progress: they fail in translating their increased flexibility into an improved estimation of the regression function due to insufficient sample sizes, they do not give direct insight to the influence of single covariates, and they are lacking uniqueness and reproducibility".

2.2 Classification: an illustrative problem

A common problem in statistics and data mining is to use observations on a set of "feature variables" to predict the value of a "class variable". This problem corresponds to statistical models for classification when the class variable takes a discrete set of values and for regression when the values of the class variable cover a continuous range. To illustrate the range of approaches available in statistics and data mining we consider the classification problem. Many different methods are used for classification. The classical statistical approach is discriminant analysis; starting from this one can list various data-mining methods in decreasing order of their resemblance to classical statistical modeling. More details of many of these methods can be found in [13]. We denote the class variable by y and the feature variables by the vector x = [x1 ... xf]. It is sometimes convenient to think of the feature variables as ordinates of a "feature space", with the aim of the analysis being to partition the feature space into regions corresponding to the different classes (values of y).

Linear/quadratic/logistic discriminant analysis. Discriminant analysis is a classical statistical technique based on statistical models containing, usually, relatively few parameters. The modeling procedure seeks linear or quadratic combinations of the feature variables that identify the boundaries between classes. The most detailed theory applies to cases in which the features are continuous-valued and, within each class, approximately normally distributed.

Projection pursuit. For classification problems, projection pursuit can be thought of as a generalization of logistic discrimination that also involves linear combinations of features but also includes nonlinear transformations of these linear combinations, with the probability of a feature vector x belonging to class k being modeled as

    Σ_{m=1}^{M} β_{km} φ_m( Σ_{j=1}^{f} α_{mj} x_j ).    (1)

The φ_m are prespecified scatterplot smoothing functions, chosen in part for their speed of computation. The nonlinearities and often large numbers of parameters in the model lead one to regard projection pursuit as a "neostatistical" rather than a classical statistical technique.
Radial basis functions. Radial basis functions form another kind of nonlinear neostatistical model. The probability of a feature vector x belonging to class k is modeled as

    Σ_{m=1}^{M} β_m φ(||x − c_m|| / σ_m).

Here ||x − c_m|| is the distance from point x in feature space to the m-th center c_m, σ_m is a scale factor, and φ is a basis function, often chosen to be the Gaussian function φ(r) = exp(−r²).

Neural networks. A common form of neural network for the classification problem, the multilayer feedforward network, can be thought of as a model similar to (1). However, the φ_m transformations are different (generally the logistic function φ_m(t) = 1/{1 + exp(−t)} is used), and more than one layer of logistic transformations may be applied. Neural networks are recognizably close to neostatistical models, but a unique methodology and terminology for neural networks has developed that is unfamiliar to statisticians.

Graphical models. Graphical models, also known as Bayesian networks, belief functions, or causal diagrams, involve the specification of a network of links between feature and class variables. The links specify relations of statistical dependence between particular features; equally importantly, absence of a direct link between two features is an assertion of their conditional independence given the other features appearing in the network. Links in the network can be interpreted as causal relations between features (though this is not always straightforward, as exemplified by the discussion in [15]), which can yield particularly informative inferences. For realistic problems, graphical models involve large numbers of parameters and do not fit well into the framework of classical statistical inference.

Nearest-neighbor methods. At its simplest, the k-nearest-neighbor procedure assigns a class to point x in feature space according to the majority vote of the k nearest data points to x. This is a smoothing procedure, and will be effective when class probabilities vary smoothly over the feature space. Questions arise as to the choice of k and of an appropriate distance measure in feature space. These issues are not easily expressed in terms of classical statistical models. Model specification is therefore determined by maximizing classification accuracy on a set of training data rather than by formally specifying and fitting a statistical model.

Decision trees. A decision tree is a succession of partitions of feature space, each partition usually based on the value taken by a single feature, until the partitions are so fine that each corresponds to a single value of the class variable. This formulation bears little resemblance to classical parametric statistical models. Choice of the best tree representation is obtained by comparing different trees in terms of their predictive accuracy, estimated by cross-validation, and their complexity, often measured by minimum description length.

Rules. Rule-based methods seek to assign class labels to subregions of feature space according to logical criteria such as: if x1 = 3 and x2 ≥ 15 and x2 < 30 then y = 1. Individual rules can be complex and hard to interpret subjectively. Rule-generation methods often involve parameters whose optimal values are unknown. The methods cannot be expressed in terms of classical statistical models, and the parameter values are optimized, as for decision trees, by consideration of a rule set's predictive accuracy and complexity.
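Rule sets of the kind just quoted are straightforward to execute. The following sketch is not code from the paper; the rule list, the second rule and the default class are hypothetical, added only to show an ordered rule set in which the first matching rule assigns the class label.

```python
# An ordered rule set: each rule is (predicate, class label).
# The first rule whose predicate holds assigns the class; if no
# rule fires, a default class is returned.
rules = [
    (lambda x: x[0] == 3 and 15 <= x[1] < 30, 1),  # the rule quoted in the text
    (lambda x: x[1] >= 30, 2),                     # a hypothetical second rule
]

def classify(x, rules, default=0):
    for predicate, label in rules:
        if predicate(x):
            return label
    return default
```

A real rule-generation method would learn the thresholds (3, 15, 30) from data, trading off the rule set's predictive accuracy against its complexity as described above.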
The foregoing list illustrates a wide range of statistical and data-mining approaches to the classification problem. However, each approach requires at some stage the selection of appropriate features x1, ..., xf. It can be argued that this similarity between the approaches outweighs all of their differences. Any given data set may contain irrelevant or poorly measured features which only add noise to the analysis and should for efficiency's sake be deleted; some dependences between class and features may be most succinctly expressed in terms of a function of several features rather than by a single feature. No method can be expected to perform well if it does not use the most informative features: "garbage in, garbage out".

Explicit feature selection criteria have been developed for several of the methods described above. These range from criteria based on significance tests for statistical models to measures based on the impurity of the conditional probability distribution of the class variable given the features, used in decision-tree and rule-based classifiers [10]. As noted above, the "wrapper" method is a powerful and widely applicable technique for feature selection.

Construction of new features can be explicit or implicit. Some techniques such as principal-components regression explicitly form linear combinations of features that are then used as new feature variables in the model. Conversely, the linear combinations Σ_j α_mj x_j of features that appear in the representation (1) for projection-pursuit and neural-network classifiers are implicit constructed features. Construction of nonlinear combinations of features is generally a matter for subjective judgement.

3 Classical statistical modeling

In this section we give a brief summary of the classical "frequentist" approach to statistical modeling and scientific inference. A detailed account of the theory is given by Cox and Hinkley [2]. The techniques used in applied statistical analyses are described in more specialized texts such as [4] for classification problems and [27] for regression.

3.1 Model specification

A statistical model is the specification of a frequency distribution P(z) for the elements of the data vector z. This enables "what happened" (the observed data vector) to be quantitatively compared with "what might have happened, but didn't" (other potentially observable data vectors). We assume that inference focuses on a data vector z, with the available data z_i, i = 1, ..., ℓ, being ℓ instances of z. In many problems, such as regression and classification, the data vector z is decomposed into z = [x, y] and y is modeled as a function of the x values.

In regression and classification problems the conditional distribution of y given x, p(y|x), is of interest; the frequency distribution of x may or may not be relevant. In most statistical regression analyses the model has the form

    y = f(x) + e    (2)

where e is an error term having mean zero and some probability distribution; i.e., it is assumed that the relationship between y and x is observed with error. The alternative specification, in which the functional relationship y = f(x) is exact and uncertainty arises only when predicting y at hitherto unobserved values of x, is much less common: one example is the interpolation of random spatial processes by kriging [8].

In classical statistics, model specification has a large subjective component. Candidates for the distribution of z, or the form of the relationship between y and x, may be obtained from inspection of the data, from familiarity with relations established by previous analysis of similar data sets, or from a scientific theory that entails particular relations between elements of the data vector.

3.2 Estimation

Model specification generally involves an unknown parameter vector θ. This is typically estimated by the maximum-likelihood procedure: the joint probability density function of the data, p(z; θ), is maximized over θ. Maximum-likelihood estimation can be regarded as minimization of the loss function −log p(z; θ). When the data are assumed to be a set of independent and identically distributed vectors z_i, i = 1, ..., ℓ, this loss function is

    −Σ_{i=1}^{ℓ} log p(z_i; θ).

When the data vector is decomposed as z = [x, y], the observed data are similarly decomposed as z_i = [x_i, y_i], and the loss function (negative log-likelihood) is

    −Σ_{i=1}^{ℓ} log p(y_i | x_i; θ).

If the conditional distribution of y_i given x_i is Normal with mean a function of x_i, f(x_i; θ), and variance independent of i, this loss function is equivalent to the sum of squares

    Σ_{i=1}^{ℓ} {y_i − f(x_i; θ)}².

The justification for maximum-likelihood estimation is asymptotic: the estimators are consistent and efficient as the sample size ℓ increases to infinity. Except for certain models whose analysis is particularly simple, classical statistics has little to say about finite-sample properties of estimators and predictors.

Assessment of the accuracy of estimated parameters is an important part of frequentist inference. Estimates of accuracy are typically expressed in terms of confidence regions. In frequentist inference the parameter θ is regarded as fixed but unknown, and does not have a probability distribution. Instead one considers hypothetical repetitions of the process of generation of data from the model with a fixed value θ₀ of the parameter vector, followed by computation of θ̂, the maximum-likelihood estimator of θ. Over these repetitions a probability distribution for θ̂ will be built up. Likelihood theory provides an asymptotic large-sample approximation to this distribution. From it one can determine a region C(θ̂), depending on θ̂, of the space of possible values of θ, that contains the true value θ₀ with probability γ (no matter what this true value may be). C(θ̂) is then a confidence region for θ with confidence level γ. The size of the region is a measure of the accuracy with which the parameter can be estimated.

Confidence regions can also be obtained for subsets of the model parameters and for predictions made from the model. These too are asymptotic large-sample approximations. Confidence statements for parameters and predictions are valid only on the assumption that the model is correct, i.e. that for some value of θ the specified frequency distribution p(z; θ) for z accurately represents the relative frequencies of all of the possible values of z. If the model is false, predictions may be inaccurate and estimated parameters may not be meaningful.
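The equivalence noted above between maximum likelihood under Normal errors and least squares can be made concrete. The sketch below, which is illustrative rather than taken from the paper, fits a straight line f(x; θ) = θ0 + θ1 x by minimizing the sum of squares in closed form; the data values are invented.

```python
def least_squares_fit(xs, ys):
    # Closed-form minimizer of sum_i {y_i - (t0 + t1*x_i)}^2, which under
    # Normal errors with constant variance is also the maximum-likelihood fit.
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    t1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
        (x - xbar) ** 2 for x in xs)
    t0 = ybar - t1 * xbar
    return t0, t1

# invented data: roughly y = 1 + 2x observed with error
t0, t1 = least_squares_fit([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8])
```

The fitted slope and intercept are the maximum-likelihood estimates θ̂ under the Normal-error model; their sampling variability over hypothetical repetitions is what a confidence region summarizes.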
3.3 Diagnostic checking

Inadequacy of a statistical model may arise from three sources. Overfitting occurs when the model is unjustifiably elaborate, with the model structure in part representing merely random noise in the data. Underfitting is the converse situation, in which the model is an oversimplification of reality, with additional structure being needed to describe the patterns in the data. A model may also be inadequate through having the wrong structure: for example, a regression model may relate y linearly to x when the correct physical relation is linear between log y and log x.

Comparison of parameters with their estimated accuracy provides a check against overfitting. If the confidence region for a parameter includes the value zero, then a simpler model in which the parameter is dropped will usually be deemed adequate.

In the frequentist framework, underfitting by a statistical model is typically assessed by diagnostic goodness-of-fit tests. A statistic T is computed whose distribution can be found, either exactly or as a large-sample asymptotic approximation, under the assumption that the model is correct. If the computed value of T is in the extreme tail of its distribution there is an indication of model inadequacy: either the model is wrong or something very unusual has occurred. An extreme value of T often (but not always) suggests a particular direction in which the model is inadequate, and a way of modifying the model to correct the inadequacy.

Many diagnostic plots and statistics have been devised for particular statistical models. Though not used in formal goodness-of-fit tests, they can be used as the basis of subjective judgements of model adequacy, for identification either of underfitting or of incorrect model structure. For example, the residuals from a regression model that is correctly specified will be approximately independently distributed; if a plot of residuals against the fitted values shows any noticeable structure, this is an indication of model inadequacy and may suggest some way in which the model should be modified.

Diagnostic plots are also used to identify data values that are unusual in some respect. Unusual observations may be outliers, values that are discordant with the pattern of the other data values, or influential values, which are such that a small change in the data value will have a large effect on the estimated values of the model parameters. Such data points merit close inspection to check whether the outliers may have arisen from faulty data collection or transcription, and whether the influential values have been measured with sufficient accuracy to justify conclusions drawn from the model and its particular estimated parameter values. In analyses in which there is the option of collecting additional data at controlled points, for example when modeling the relation y = f(x) where x can be fixed and the corresponding value of y observed, the most informative x values at which to collect more data will be in the neighborhood of outlying and influential data points.

3.4 Model building as an iterative procedure

The sequence of specification, estimation and checking lends itself to an iterative procedure in which model inadequacy revealed by diagnostic checks suggests a modified model specification designed to correct the inadequacy; the modified model is then itself estimated and checked, and the cycle is repeated until a satisfactory model is obtained. This procedure often has a large subjective component, arising from the model specifications and the choice of diagnostic checks. However, formal procedures to identify the best model can be devised if the class of candidate models can be specified a priori. This is the case, for example, when the candidates form a sequence of nested models M1, ..., Mm, whose parameter vectors θ(1), ..., θ(m) are such that every element of θ(j) is also included in θ(j+1). Careful control over the procedure is necessary in order to ensure that inferences are valid, for example that confidence regions for the parameters in the final model have the correct coverage probability.

Classical frequentist statistics has little to say about the choice between nonnested models, for example whether a regression model y = θ(1)_1 x1 + θ(1)_2 x2 is superior to an alternative model log y = θ(2)_1 x1 + θ(2)_2 x3. Such decisions are generally left as a matter of subjective judgement based on the quality of fit of the models, their ease of interpretation and their concordance with known physical mechanisms relating the variables in the model.

Once a satisfactory model has been obtained, further inferences and predictions are typically based on the assumption that the final model is correct. This is problematical in two respects. In many situations one may believe that the true distribution of z has a very complex structure to which any statistical model is at best an approximation. Furthermore, the statistical properties of parameter estimators in the final model may be affected by the fact that several models have been estimated and tested on the same set of data, and failure to allow for this can lead to inaccurate inferences.

As an example of this last problem, we consider stepwise regression. This is a widely used procedure for identifying the best statistical model, in this case deciding which elements of the x component of the data vector should appear in the regression model (2). Because random variability can cause x variables that are actually unrelated to y to appear to be statistically significant, the estimated regression coefficients of the variables selected for the final model tend to be overestimates of the absolute magnitude of the true parameter values. This "selection bias" leads to underestimation of the variability of the error term in the regression model, which can lead to poor results when the final model is used for prediction. In practice it is often better to use all of the available variables rather than a stepwise procedure for prediction [14].

3.5 Recent developments

Developments in statistical theory since the 1970s have addressed some of the difficulties with the classical frequentist approach. Akaike's information criterion [17], and related measures of Schwarz and Rissanen, provide likelihood-based comparisons of nonnested models. Development of robust estimators [6] has made inference less susceptible to outliers and influential data values. Greater use of nonlinear models enables a wider range of x–y relationships to be accurately modeled. Simulation-based methods such as the bootstrap [3] enable better assessment of accuracy in finite samples.

4 Vapnik's statistical learning theory

One reason that classical statistical modeling has a large subjective component is that most of the mathematical techniques used in the classical approach assume that the form of the correct model is known and that the problem is to estimate its parameters. In data mining, on the other hand, the form of the correct model is usually unknown. In fact, discovering an adequate model, even if its form is not exactly correct, is often the purpose of the analysis. This situation is also faced in classical statistical modeling and has led to the creation of the diagnostic checks discussed earlier. However, even with these diagnostics, the classical approach does not provide firm mathematical guidance when comparing different types of models. The question of model adequacy must still be decided subjectively, based on the judgment and experience of the data analyst.

This latter source of subjectivity has motivated Vapnik and Chervonenkis [24, 25, 26] to develop a mathematical basis for comparing models of different forms and for estimating their relative adequacies. This body of work, now known as statistical learning theory, presumes that the form of the correct model is truly unknown and that the goal is to identify the best possible model from a given set of models. The models need not be of the same form and none of them need be correct. In addition, comparisons between models are based on finite-sample statistics, not asymptotic statistics as is usually the case in the classical approach. This shift of emphasis to finite samples enables overfitting to be quantitatively assessed. Thus, the underlying premise of statistical learning theory closely matches the situation actually faced in data mining.

4.1 Model specification

As in classical statistical modeling, models for the data must be specified by the analyst. However, instead of specifying a single (parametric) model whose form is then assumed to be correct, a series of competing models must be specified, one of which will be selected based on an examination of the data. In addition, a preference ordering over the models must also be specified. This preference ordering is used to address the issue of overfitting. In practice, models with fewer parameters or degrees of freedom are preferable to those with more, since they are less likely to overfit the data. When applying statistical learning theory, one searches for the most preferable model that best explains the data.

4.2 Estimation

Estimation plays a central role in statistical learning theory just as it does in classical statistical modeling; however, what is being estimated is quite different. In the classical approach, the form of the model is assumed to be known and, hence, emphasis is placed on estimating its parameters. In statistical learning theory, the correct model is assumed to be unknown and emphasis is placed on estimating the relative performance of competing models so that the best model can be selected.

The relative performance of competing models is measured through the use of loss functions. In general, statistical learning theory considers the loss Q(z; α) between a data vector z and a specific model. In the case of a parametric family of models, the notation introduced earlier is extended so that α defines both the specific parameters of the model and the parametric family to which the model belongs. In this way, models from different families can be compared.

The negative log-likelihood functions employed in classical statistical modeling are also used in statistical learning theory when comparing probability distributions. However, other loss functions are also considered for different kinds of modeling problems. When modeling the joint probability density of the data, the appropriate loss function is the same joint negative log-likelihood used in classical statistical modeling:

    Q(z; α) = −log p(z; α).

Similarly, when the data vector z can be decomposed into two components, z = [x, y], and we are interested in modeling the conditional probability distribution of y as a function of x, then the conditional negative log-likelihood is the appropriate loss function:

    Q(z; α) = −log p(y | x; α).

On the other hand, if we are not interested in the actual distribution of y but only in constructing a predictor f(x; α) for y that minimizes the probability of making an incorrect prediction, then the 0/1 loss function used in pattern recognition is appropriate:

    Q(z; α) = 0 if f(x; α) = y;  1 if f(x; α) ≠ y.
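These loss functions can be compared side by side on a toy data set. The sketch below is a hypothetical illustration (the two-class predictive probabilities are invented, not from the paper); it computes the conditional negative log-likelihood and the 0/1 loss for the same model, together with their averages over the observations.

```python
import math

def nll_loss(p_y_given_x, y):
    # conditional negative log-likelihood: Q(z; a) = -log p(y | x; a)
    return -math.log(p_y_given_x[y])

def zero_one_loss(prediction, y):
    # pattern-recognition loss: Q(z; a) = 0 if f(x; a) == y, else 1
    return 0 if prediction == y else 1

def average_loss(losses):
    # the mean of Q over the observed data vectors
    return sum(losses) / len(losses)

# each observation: (the model's p(y|x) over classes {0, 1}, observed y)
data = [({0: 0.9, 1: 0.1}, 0), ({0: 0.2, 1: 0.8}, 1), ({0: 0.6, 1: 0.4}, 1)]
r_nll = average_loss([nll_loss(p, y) for p, y in data])
r_01 = average_loss([zero_one_loss(max(p, key=p.get), y) for p, y in data])
```

The hard predictor here takes f(x; α) to be the most probable class under p(y | x; α); the third observation is misclassified, so the average 0/1 loss is 1/3 while the negative log-likelihood also penalizes the low probability assigned to the observed class.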
10 wealreadyknewallofthestatisticalpropertiesofthedata.ifthedatavectorzisgeneratedbya Ingeneral,Q(z;)canbechosendependingonthenatureofthemodelingproblemonefaces.Its minimizestheexpectedlossr()withrespecttof(z),where randomprocessaccordingtotheprobabilitymeasuref(z),thenthebestmodelistheonethat lossesimplybettermodelsofthedata. purposeistomeasuretheperformanceofamodelsothatthebestmodelcanbeselected.theonly requirementfromthepointofviewofstatisticallearningtheoryisthat,byconvention,smaller Oncealossfunctionhasbeenselected,identifyingthebestmodelwouldberelativelyeasyif utilitymeasureoftheoutcomegiventhedecision.utilitymeasuresprovideanumericalencoding ofuncertaintyoneiswillingtoacceptinchoosingariskydecisionthathasalowprobabilityof ofwhichoutcomesarepreferredoverothers,aswellasaquantitativemeasurementofthedegree nologyofdecisiontheory,isadecisionvector,zisanoutcome,andq(z;)isthe(negative) ThemodelthatminimizesR()isoptimalfromadecision-theoreticpointofview.Inthetermi- R()=ZQ(z;)dF(z): onemustchoosethemostsuitablemodelonecanidentifybasedonasetofobserveddatavectors probabilitymeasuref(z)thatdenesthestatisticalpropertiesofthedataisunknown.instead, measure thatis,thebestmodelgiventhelossfunction. utilityr()producesanoptimaldecisionconsistentwiththeriskpreferencesdenedbytheutility obtainingahighlydesirableoutcomeversusamoreconservativedecisionwithahighprobability ofamoderateoutcome.choosingthedecisionvectorthathasthebestexpected(negative) distributed,theaveragelossremp(;`)fortheobserveddatacanbeusedasanempiricalestimator zi,i=1;:::;`.assumingthattheobservedvectorsarestatisticallyindependentandidentically oftheexpectedloss,where Unfortunately,inpractice,theexpectedlossR()cannotbecalculateddirectlybecausethe modelsand/ortheirparametersareselectedbyoptimizingnumericalcriteriaofthisgeneralform. 
StatisticallearningtheorypresumesthatmodelsarechosenbyminimizingRemp(;`).Notethat thispresumptionisconsistentwithstandardmodel-ttingproceduresusedinstatisticsinwhich doesminimizingtheaverageempiricallossremp(;`)yieldmodelsthatalsominimizetheexpected Thefundamentalquestionofstatisticallearningtheoryisthefollowing:underwhatconditions Remp(;`)=1``Xi=1Q(zi;): fortheexpectedlosses,notfortheparameters.theexpectedlossr()foramodelisregarded expressedintermsofcondenceregions;however,inthiscase,condenceregionsareconstructed isarandomquantitythatwecansample,sinceitsvaluedependsonthevaluesoftheobserved asxedbutunknown,sincetheprobabilitymeasuref(z)thatdenesthestatisticalpropertiesof byconsideringtheaccuracyoftheempiricallossestimate.asinclassicalstatistics,accuracyis thedatavectorsisxedbutunknown.ontheotherhand,theaverageempiricallossremp(;`) lossr(),sincethelatteriswhatweactuallywanttoaccomplish?thisquestionisanswered datavectorszi,i=1;:::;`,usedinitscalculation.statisticallearningtheorythereforeconsiders condenceregionsforr()givenremp(;`). distinguishesstatisticallearningtheoryfromclassicalstatistics.oneofthefundamentaltheorems modelsareselectedbyminimizingaverageempiricalloss.thislattercaveatisthekeyissuethat dierencebetweentheexpectedandaverageempiricallosseswhiletakingintoaccountthefactthat Toconstructthesecondenceregions,weneedtoconsidertheprobabilitydistributionofthe 10
The landmark contribution of Vapnik and Chervonenkis is a series of probability bounds that they have developed to construct small-sample confidence regions for the expected loss given the average empirical loss. The resulting confidence regions differ from those obtained in classical statistics in three respects. First, they do not assume that the chosen model is correct. Second, they are based on small-sample statistics and are not asymptotic approximations, as is typically the case. Third, a uniform method is used to take into account the degrees of freedom in the set of models one is selecting from, independent of the forms of those models. This method is based on a measurement known as the Vapnik-Chervonenkis (VC) dimension.

A key result of statistical learning theory shows that, in order to account for the fact that models are selected by minimizing average empirical loss, one must consider the maximum difference between the expected and average empirical losses; that is, one must consider the distribution of

    sup_{α ∈ Λ} [ R(α) − R_emp(α; ℓ) ],

where Λ is the set of models one is selecting from.

The reason that the maximum difference must be considered has to do with the phenomenon of overfitting. Intuitively speaking, overfitting occurs when the set of models to choose from has so many degrees of freedom that one can find a model that fits the noise in the data but does not adequately reflect the underlying relationships. As a result, one obtains a model that looks good relative to the training data but that performs poorly when applied to new data. This mathematically corresponds to a situation in which the average empirical loss R_emp(α; ℓ) substantially underestimates the expected loss R(α). Although there is always some probability that the average empirical loss will underestimate the expected loss for a fixed model, both the probability and the degree of underestimation are increased by the fact that we explicitly search for the model that minimizes R_emp(α; ℓ). Because of this search, the maximum difference between the expected and average empirical losses is the quantity that governs the confidence region.

The VC dimension of a set of models can conceptually be thought of as the maximum number of data vectors for which one is pretty much guaranteed to find a model that fits exactly. For example, the VC dimension of a linear regression or discriminant model is equal to the number of terms in the model (i.e., the number of degrees of freedom in the classical sense), since n linear terms can be used to exactly fit n points. The actual definition of VC dimension is more general and does not formally require an exact fit; nevertheless, the intuitive insights gained by thinking about the consequences of exact fits are often valid with regard to VC dimension. For example, one consequence is that, in order to avoid overfitting, the number of data samples should substantially exceed the VC dimension of the set of models to choose from; otherwise, one could obtain an exact fit to arbitrary data.

Because VC dimension is defined in terms of model fitting and numbers of data points, it is equally applicable to linear, nonlinear and nonparametric models, and to combinations of dissimilar model families. This includes neural networks, classification and regression trees, classification and regression rules, radial basis functions, Bayesian networks, and virtually any other model family imaginable. In addition, VC dimension is a much better indicator of the ability of models to fit arbitrary data than is the number of parameters in the models. There are examples of models with only one parameter that have infinite VC dimension and, hence, are able to exactly fit any set of data [22, 23]. There are also models with billions of parameters that have small VC dimensions, which enables one to obtain reliable models even when the number of data samples is much less than the number of parameters. VC dimension coincides with the number of parameters only for certain model families, such as linear regression/discriminant models. VC dimension therefore offers a much more general notion of degrees of freedom than is found in classical statistics.

In the probability bounds obtained by Vapnik and Chervonenkis, the size of the confidence region is largely determined by the ratio of the VC dimension to the number of data vectors. For example, if the loss function Q(z; α) is the 0/1 loss used in pattern recognition, then with probability at least 1 − η,

    R(α) ≤ R_emp(α; ℓ) + (E/2) ( 1 + √(1 + 4 R_emp(α; ℓ)/E) ),

where

    E = (4/ℓ) [ h (ln(2ℓ/h) + 1) − ln(η/4) ]

and where h is the VC dimension of the set of models to choose from. Note that the ratio of the VC dimension h to the number of data vectors ℓ is the dominant term in the definition of E and, hence, in the size of the confidence region for R(α). Other families of loss functions have analogous confidence regions involving the quantity E. The bounds are therefore applicable for an extremely wide range of modeling problems and for any family of models imaginable.

The concept of VC dimension and confidence bounds for various families of loss functions are discussed in detail in books by Vapnik [21, 22, 23]. The remarkable properties of these bounds are that they make no assumptions about the probability distribution F(z) that defines the statistical properties of the data vectors, they are valid for small sample sizes, and they depend only on the VC dimension of the set of models and on the properties of the loss function employed.
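Numerically, the behaviour of this 0/1-loss bound is easy to explore. The sketch below is our own illustration, not code from the paper; the values of R_emp, h, ℓ and η are hypothetical:

```python
import math

def vc_confidence_bound(r_emp, h, n, eta):
    """Upper confidence bound on the expected 0/1 loss R(alpha), given the
    average empirical loss r_emp, VC dimension h, sample size n (the
    text's ell) and confidence parameter eta."""
    # E = (4/n) * (h * (ln(2n/h) + 1) - ln(eta/4))
    e = 4.0 * (h * (math.log(2.0 * n / h) + 1.0) - math.log(eta / 4.0)) / n
    # R(alpha) <= R_emp + (E/2) * (1 + sqrt(1 + 4*R_emp/E))
    return r_emp + 0.5 * e * (1.0 + math.sqrt(1.0 + 4.0 * r_emp / e))

# The h/n ratio dominates: with the same empirical loss, more data
# (or a lower VC dimension) gives a much tighter bound.
loose = vc_confidence_bound(0.10, h=50, n=1000, eta=0.05)
tight = vc_confidence_bound(0.10, h=50, n=100000, eta=0.05)
```

With h/ℓ = 0.05 the bound is far above the empirical loss, while at h/ℓ = 0.0005 it comes close to it, which is the behaviour the text describes.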
4.3 Model selection

As discussed at the beginning of this section, the data analyst is expected to provide not just a single parametric model, but an entire series of competing models ordered according to preference, one of which will be selected based on an examination of the data. The results of statistical learning theory are then used to select the most preferable model that best explains the data.

The selection process has two components: one is to determine a cut-off point in the preference ordering; the other is to select the model with the smallest average empirical loss R_emp(α; ℓ) from among those models that occur before the cut-off. As the cut-off point is advanced through the preference ordering, both the set of models that appear before the cut-off and the VC dimension of this set steadily increase. This increase in VC dimension has two effects. The first effect is that with more models to choose from one can usually obtain a better fit to the data; hence, the minimum average empirical loss steadily decreases. The second effect is that the size of the confidence region for the expected loss R(α) steadily increases, because the size is governed by the VC dimension. To choose a cut-off point in the preference ordering, Vapnik and Chervonenkis advocate minimizing the upper bound of the confidence region for the expected loss; that is, minimizing the worst-case estimate of R(α). For example, if the 0/1 loss function were being used, one would choose the cut-off so as to minimize the upper bound in the inequality presented above, for a desired setting of the confidence parameter η. The model that minimizes the average empirical loss R_emp(α; ℓ) among those models that occur before the chosen cut-off is then selected as the most suitable model for the data.
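As a concrete sketch of this selection rule (our illustration, not the paper's; the empirical losses, VC dimensions and sample size below are hypothetical), one can compute the worst-case bound at each cut-off and keep the minimizer:

```python
import math

def vc_bound(r_emp, h, n, eta=0.05):
    # Vapnik-Chervonenkis upper bound on R(alpha) for the 0/1 loss.
    e = 4.0 * (h * (math.log(2.0 * n / h) + 1.0) - math.log(eta / 4.0)) / n
    return r_emp + 0.5 * e * (1.0 + math.sqrt(1.0 + 4.0 * r_emp / e))

def choose_cutoff(emp_losses, vc_dims, n):
    """emp_losses[k] is the smallest average empirical loss among models
    before cut-off k; vc_dims[k] is the VC dimension of that set. As k
    advances, losses fall while dimensions rise; return the cut-off that
    minimizes the worst-case (upper-bound) estimate of the expected loss."""
    bounds = [vc_bound(r, h, n) for r, h in zip(emp_losses, vc_dims)]
    return min(range(len(bounds)), key=bounds.__getitem__)

# Hypothetical preference ordering: fit improves while complexity grows.
emp = [0.30, 0.18, 0.12, 0.10, 0.09]
dims = [2, 5, 20, 200, 2000]
best = choose_cutoff(emp, dims, n=5000)
```

The chosen cut-off is neither the simplest model (which fits poorly) nor the richest (whose confidence region is widest), illustrating the trade-off described above.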
The overall approach is illustrated by the graph in Figure 1. The process balances the ability to find increasingly better fits to the data against the danger of overfitting and thereby selecting a poor model. The preference ordering provides the necessary structure in which to compare competing models while at the same time taking into account their effective degrees of freedom (i.e., VC dimension). The result is a model that minimizes the worst-case loss on future data. The process itself attempts to maximize the rate of convergence to an optimum model as the quantity of available data increases.

[Figure 1: Expected loss and average empirical loss as a function of the preference cut-off. Labels in the original graph: loss; preference cut-off; upper bound on expected loss; minimum average empirical loss; best cut-off.]

4.4 Use of validation data

One drawback to the Vapnik-Chervonenkis approach is that it can be difficult to determine the VC dimension of a set of models, especially for the more exotic types of models. Even for simple linear regression/discriminant models, the situation is not entirely straightforward. The relationship stated above, that the VC dimension is equal to the number of terms in such a model, is actually an upper bound on the VC dimension. If the models are written in a certain canonical form, then the VC dimension is also bounded by the quantity R²A² + 1, where R is the radius of the smallest sphere that encloses the available data vectors and A² is the sum of the squares of the coefficients of the model in its canonical form. As Vapnik has shown [22], this additional bound on the VC dimension makes it possible to obtain linear regression/discriminant models whose VC dimensions are orders of magnitude smaller than the number of terms, even if the models contain billions of terms. This fact is extremely fortunate because it offers a means of avoiding the "curse of dimensionality," enabling reliable models to be obtained even in high-dimensional spaces by basing the preference ordering of the models on the sum of the squares of the model coefficients.
In cases where the VC dimension of a set of models is difficult to determine, the expected loss can be estimated using resampling techniques [3]. In the simplest of these approaches, the available set of data is randomly divided into training and validation sets. The training set is used first to select the best-fitting model for each cut-off point in the preference ordering. The validation set is then used to estimate the expected losses of the selected models by calculating their average empirical losses on the validation data. Finally, the model with the smallest upper bound for the expected loss on the validation data is chosen as the most suitable model.

Because only a finite number of models are evaluated on the validation set (models with continuous parameters imply an infinite set of models), it is very easy to obtain confidence bounds for the expected losses of these models, independent of their exact forms and without having to worry about VC dimension [22]. In particular, the same equations for the confidence bounds are used as before, except that E now has the value

    E = (2/ℓ_v) ( ln N − ln η ),

where N is the number of models evaluated against the validation set and ℓ_v is the size of the validation set. Moreover, because the number N of such models is typically small relative to the size ℓ_v of the validation set, one can obtain tight confidence regions for the expected losses of these models given their average empirical losses on the validation data. Since the same underlying principles are at work, this approach exhibits the same kind of relationship between the expected and average empirical losses as that shown in Figure 1.

Although this validation-set approach has the advantage that it is relatively easy to obtain expected loss estimates, it has the disadvantage that dividing the available data into subsets decreases the overall accuracy of the resulting estimates. This decrease in accuracy is usually not much of a concern when data is plentiful. However, when the sample size is small, fitting models to all of the data and calculating the VC dimension for all relevant sets of models becomes more attractive.
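A sketch of this finite-model-set bound (our own illustration, with hypothetical values; it reuses the same bound form as the VC-dimension case, with the new E):

```python
import math

def validation_bound(r_emp_val, num_models, n_val, eta):
    """Upper confidence bound on the expected loss of a model chosen from
    a finite set of num_models candidates, computed from its average
    empirical loss on a validation set of size n_val."""
    # Same bound form as before, with E = (2/n_val) * (ln N - ln eta).
    e = 2.0 * (math.log(num_models) - math.log(eta)) / n_val
    return r_emp_val + 0.5 * e * (1.0 + math.sqrt(1.0 + 4.0 * r_emp_val / e))

# N enters only logarithmically, so even many candidate models evaluated
# on a modest validation set still yield a tight bound:
few = validation_bound(0.10, num_models=10, n_val=2000, eta=0.05)
many = validation_bound(0.10, num_models=1000, n_val=2000, eta=0.05)
```

Going from 10 to 1000 candidate models widens the bound only slightly, which is why the validation-set bounds stay tight when N is small relative to ℓ_v.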
5 Computational learning theory and PAC learning

The statistical theory of minimization of loss functions provides a general analysis of the conditions under which a class of models is learnable. The theory reduces the task of learning to that of solving a minimization problem: minimizing the average empirical loss on the samples z_1, ..., z_ℓ. The perfect complement to this theory would be an efficient algorithm, for every class of models, that solves this minimization problem. Before even defining efficiency formally (we shall do so soon), we point out that such efficient algorithms are not known to exist. Furthermore, the widespread belief is that such algorithms will not exist for many classes of models. As we shall elaborate presently, this turns out to be related to the famous "Is NP = P?" question from computational complexity theory. Given that the answer to this question is most probably negative, the next best hope would be to characterize the model classes for which efficient algorithms do exist. Unfortunately, such characterizations are also ruled out, owing to the inherent undecidability of such questions. In view of these barriers, it becomes clear that the question of whether a given model class allows for an efficient algorithm to solve the minimization problem has to be tackled on an individual basis.

The computational theory of learning, initiated by Valiant's work in 1984, is devoted to the analysis of these problems. There are plenty of results that show how to solve such minimization problems for various classes of models; these show the diversity within the area of computational learning. We shall, however, focus on results that tend to unify the area. Thus most of this survey is devoted to formulating the right definition for the computational setting and examining several parameters and attributes of the model. We cover some of the salient results in this area in this brief survey.

5.1 Computational model of learning

The complexity of a computational task is the number of elementary steps (addition, subtraction, multiplication, division, comparison, etc.) it takes to perform the computation. This is studied as a function of the input and output size of the function to be computed. The well-entrenched and well-studied notion of efficiency is that of polynomial time: an algorithm is considered efficient if the number of elementary operations it performs is bounded by some fixed polynomial in the input and output sizes. The class of problems which can be solved by such efficient algorithms is denoted by P (for polynomial time). This shall be our notion of efficiency as well.

In order to study the computational complexity of the learning problem, we have to define the input and output sizes carefully. The input to the learning task is a collection of vectors z_1, ..., z_ℓ ∈ R^n, but ℓ itself may be thought of as a parameter to be chosen by the learning algorithm. Similarly, the output of the learning algorithm is again a representation of the model, the choice of which may be left unclear by the problem. The choice could easily allow an inefficient algorithm to pass as efficient, by picking an unnecessarily large number of samples or an unnecessarily verbose representation of the hypothesis. In order to circumvent such difficulties, one forces the running time of the algorithm to be polynomial in n (the input size of a single sample) and the size of the representation of the hypothesis, but not in ℓ, at least not directly. However, the smallest ℓ required to guarantee good convergence grows with the complexity of the model class (roughly, the number of samples needed before every model consistent with the data generalizes well is at least the effective dimension d of the class), so indirectly this does allow the running time to be a polynomial in ℓ.

Finally, a learning algorithm is given a loss function Q and a source of random vectors z ∈ R^n that follow some unknown distribution F(z). The requirement on the algorithm is that, with high probability (bounded away from 1 by a confidence parameter δ), the learning algorithm produces a hypothesis whose prediction ability is very close (given by an accuracy parameter ε) to the best achievable in the class. The running time is allowed to be a polynomial in 1/ε and 1/δ as well.
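For intuition about why ℓ can be polynomial in 1/ε and 1/δ, consider the textbook bound for a finite hypothesis class and a consistent learner (a standard result covered, e.g., by Kearns and Vazirani [9]; it is not derived in this paper and is included only as an illustration):

```python
import math

def pac_sample_size(num_hypotheses, epsilon, delta):
    """Textbook sufficient sample size for PAC learning a finite
    hypothesis class with a consistent learner:
    ell >= (1/epsilon) * (ln |H| + ln(1/delta))."""
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / epsilon)

# ell grows linearly in 1/epsilon but only logarithmically in 1/delta
# and in |H|, so the learner's input size stays polynomially bounded.
a = pac_sample_size(2 ** 20, epsilon=0.1, delta=0.01)
b = pac_sample_size(2 ** 20, epsilon=0.01, delta=0.01)
```

Tightening the accuracy from 0.1 to 0.01 multiplies the required sample size by ten, while squaring the hypothesis count or the confidence only adds to it.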
The above discussion can now be formalized in the following definition, which is popularly known as PAC (probably approximately correct) learning. A (generalized) PAC learning algorithm is one that takes two parameters ε (the accuracy parameter) and δ (the confidence parameter), reads ℓ random examples z_1, ..., z_ℓ as input, the choice of ℓ being decided by the algorithm, and outputs a model (hypothesis) h(z_1, ..., z_ℓ), possibly from a different class of models, such that

    Pr_F { (z_1, ..., z_ℓ) ∈ R^{nℓ} : R(h(z_1, ..., z_ℓ)) ≥ inf_{α ∈ Λ} R(α) + ε } ≤ δ,

where R(·) is the same expected loss considered in statistical learning theory. The algorithm is said to be efficient if its running time is bounded by a polynomial in n, 1/ε, 1/δ and the representation size of the model.
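A toy simulation of this definition may help. The example below is entirely our own construction (a hypothetical threshold-learning task with the 0/1 loss and the uniform distribution on [0, 1]; none of the names come from the paper):

```python
import random

def learn_threshold(samples):
    """Toy consistent learner: data are pairs (x, y) with y = 1 iff
    x >= theta for an unknown theta; output the smallest positive
    example seen."""
    positives = [x for x, y in samples if y == 1]
    return min(positives) if positives else 1.0

def expected_loss(theta_hat, theta):
    # Expected 0/1 loss R(.) under the uniform distribution on [0, 1]:
    # hypothesis and target disagree on an interval of this length.
    return abs(theta_hat - theta)

random.seed(0)
theta = 0.37                 # unknown to the learner
ell = 2000                   # sample size chosen by the learner
samples = [(x, int(x >= theta)) for x in (random.random() for _ in range(ell))]
risk = expected_loss(learn_threshold(samples), theta)
```

With ℓ = 2000 the returned hypothesis is, with overwhelming probability, within ε = 0.01 of the best possible expected loss (which is 0 here), matching the "probably approximately correct" reading of the definition.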
Hence the accuracy parameter ε represents the maximum prediction error desired for the model. While the notion of generalized PAC learning (cf. [5]) is itself general enough to study any learning problem, in this survey we shall focus on the Boolean pattern-recognition problems typically examined in computational learning theory. Here the data vector z is partitioned into a vector x ∈ {0,1}^{n−1} and a bit y ∈ {0,1} that is to be predicted. The model is given by a function f: {0,1}^{n−1} → {0,1}, and the loss function Q(z; α) of a vector z = [x, y] is 0 if f(x) = y and 1 otherwise. Henceforth we focus on problems for which Q(z; α) is computable efficiently (i.e., f(x) is computable efficiently).

5.2 Intractable learning problems

It is easy to show that several PAC learning problems are NP-hard if the hypothesis class is restricted (to something fixed). The relevant notion here is the well-studied computational class NP. NP consists of problems that can be solved efficiently by an algorithm that is allowed to make nondeterministic choices. In the case of learning, the nondeterministic machine can nondeterministically guess the α that minimizes the loss, thus solving the problem easily. Of course, the idea of an algorithm that makes nondeterministic choices is merely a mathematical abstraction and not efficiently realizable. The importance of the computational class NP comes from the fact that it captures many widely studied problems, such as the traveling-salesperson problem or the graph-coloring problem. Even more important is the notion of NP-hardness: a problem is NP-hard if the existence of an efficient (polynomial-time) algorithm to solve it would imply a polynomial-time algorithm for every problem in NP. The famous question "Is NP = P?" asks exactly this: do NP-hard problems have efficient algorithms to solve them?

A typical example is that of learning a pattern-recognition problem with the hypothesis class restricted to "3-term DNF". It can be shown that learning 3-term DNF formulae with 3-term DNF is NP-hard. Interestingly, however, it is possible to efficiently learn the broader class "3-CNF", which contains 3-term DNF. Thus this NP-hardness result is not pointing to any inherent computational bottlenecks to the task of learning; it merely advocates a judicious choice of the hypothesis class to make the learning problem tractable.

It is harder to show that a class of problems is hard to learn independent of the representation of choice for the output. In order to show the hardness of such problems, one needs to assume something stronger than NP ≠ P. A common assumption here is that there exist functions which are easy to compute but hard to invert, even on randomly chosen instances. Such functions are common in cryptography, and in particular are at the heart of well-known cryptosystems such as RSA. If this assumption is true, it implies that NP ≠ P. Under this assumption it is possible to show that pattern recognition problems where the pattern is generated by a deterministic finite automaton (or hidden Markov model) are hard to learn under some distributions on the space of the data vectors. Recent results also show that patterns generated by constant-depth Boolean circuits are hard to learn under the uniform distribution.

In summary, the negative results shed new light on two aspects of learning. Learning is easier, i.e., more tractable, when no restrictions are placed on the model used to describe the given data. Furthermore, the complexity of the learning process is definitely dependent on the underlying distribution according to which we wish to learn.
5.3 PAC learning algorithms

We now move to some lessons learnt from positive results in learning. The first of these focuses on the role of the parameters ε and δ in the definition of learning; as we will see, these are not very critical to the learning process. The second issue we will consider is the role of "classification noise" in learning, and we present an alternate model which shows more robustness towards such noise.

The strength of weak learning. Of the two fuzz parameters, ε and δ, used in the definition of PAC learning, it seems clear that ε (the accuracy) is more significant than δ (the confidence), especially for pattern-recognition problems. For such problems, given an algorithm which can learn a model with probability, say, 2/3 (or any confidence strictly greater than 1/2), it is easy to boost the confidence of getting a good hypothesis as follows. Pick a parameter k and run the learning algorithm k times, producing a new hypothesis each time. Denote these hypotheses by h_1, ..., h_k. For the new prediction, use the algorithm whose prediction on any vector x is the majority vote of the predictions of h_1, ..., h_k. It is easy to show, by an application of the law of large numbers, that the majority vote has inaccuracy at most ε with probability 1 − exp(−ck) for some c > 0.

The accuracy parameter, on the other hand, does not appear to allow such simple boosting. It is unclear how one could use a learning algorithm which can learn to predict a model with inaccuracy 1/3 to get a new algorithm which can predict a model with inaccuracy 1%. However, if we are lucky enough to be able to find learning algorithms which learn to predict with inaccuracy 1/3, independent of the distribution from which the data vectors are picked, then we could use the same learning algorithm on the region where our earlier predictions are inaccurate to boost our accuracy. Of course, the problem is that we don't know where our earlier predictions were wrong (if we knew, we would change our prediction!). Though it appears that this reasoning has led us back to square one, it turns out not to be the case. In 1990, Schapire showed how to turn this intuition into a boosting result for the accuracy parameter as well. This result demonstrates a surprising robustness of PAC learning: weak learning (with inaccuracy barely below 1/2) is equivalent to strong learning (with inaccuracy arbitrarily close to 0). We stress, however, that this equivalence relies on learning algorithms that succeed independent of the distribution of the data vectors.
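The confidence-boosting construction described in this section can be simulated directly. The sketch below is our own toy illustration (the "flaky" base learner, its target concept and all names are hypothetical): a learner that returns an accurate hypothesis only with probability 2/3 is boosted by a majority vote over k = 201 runs.

```python
import random

def boost_confidence(learn, k, rng):
    """Run a low-confidence learner k times and predict with the
    majority vote of the resulting hypotheses h1, ..., hk."""
    hyps = [learn(rng) for _ in range(k)]
    def majority(x):
        votes = sum(h(x) for h in hyps)
        return int(2 * votes > len(hyps))
    return majority

# Hypothetical base learner for the target concept y = 1 iff x >= 0.5:
# it returns an accurate hypothesis only with probability 2/3, and a
# useless constant hypothesis otherwise.
def flaky_learner(rng):
    if rng.random() < 2 / 3:
        return lambda x: int(x >= 0.5)
    return lambda x: 0

rng = random.Random(1)
boosted = boost_confidence(flaky_learner, k=201, rng=rng)
errors = sum(boosted(x / 100) != int(x / 100 >= 0.5) for x in range(100))
```

With 201 runs, the probability that fewer than half the hypotheses are accurate is exponentially small in k, mirroring the 1 − exp(−ck) guarantee for the majority vote.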
Learning with noise. Most results in computational learning start by assuming that the data is observed with no prediction noise. This is not an assumption justified by reality; it is made usually to get a basic understanding of the problem. However, in order to make a computational learning result useful in practice, one must allow for noise. Numerous examples are known where an algorithm which learns without classification noise can be converted into one that tolerates some amount of noise as well. However, this is not universally true. To understand why some algorithms are tolerant to errors while others are not, a model of learning called the statistical query model was proposed by Kearns in 1992. This model restricts a learning algorithm in the following way: instead of actually seeing data vectors z as sampled from the space, the learning algorithm works with an oracle and gets to ask "statistical" questions about the data vectors. A typical statistical query asks for the probability that an event defined over the data space occurs for a vector chosen at random from the distribution under which we are attempting to learn. Further, the query is presented with a tolerance parameter; the oracle responds with the probability of the event to within an additive error given by the tolerance. It is easy to see how to simulate this oracle given access to random samples of the data. Furthermore, it is easy to see how to simulate this oracle even when the data vectors come with some classification noise, provided the noise rate is suitably bounded. Thus learning with access only to a statistical query oracle is a sufficient condition for learning with classification noise. Almost all known algorithms that learn with classification noise can be shown to learn in the statistical query model. Thus this model provides a good standpoint from which to analyse the effectiveness of a potential learning strategy when attempting to learn in the presence of noise.

Alternate models for learning. This survey has focused on the PAC model, since it is close to the spirit of data mining. However, a large body of work in computational learning focuses on models other than the PAC model. One such body of work considers learning when one is allowed to ask questions about the data one is trying to learn. Consider for instance a handwriting recognition program which generates some patterns and asks the teacher to indicate what letter each pattern seems to resemble. It is conceivable that such learning programs may be more efficient than passive handwriting recognition programs. A class of learning algorithms that behave in this manner has been studied under the label of learning with queries. Other models for learning that have been studied capture scenarios of supervised learning and learning in an online setting.

5.4 Further reading

We have given a very informal sketch of the various new questions posed by studying the process of learning, or fitting models to given data, from the point of view of computation. Due to space limitations, we do not give a complete list of references to the sources of the results mentioned above. The interested reader is referred to the text on this subject by Kearns and Vazirani [9] for a detailed coverage of the topics above with complete references. Other surveys on this topic include those by Valiant [20] and Angluin [1]. Finally, a number of different lecture notes are now available online on this topic. This survey has in particular used those of Mansour [12], which include pointers to other useful home pages for tracking recent developments in computational learning and their applicability to practical scenarios.
6 Conclusions

The foregoing sections illustrate some differences of approach between classical statistics and data-mining methods that originated in computer science and engineering. Table 1 summarizes what we regard as the principal issues in data analysis that would be considered by statisticians and data miners.

Table 1: Statisticians' and data miners' issues in data analysis.

    Statisticians' issues     Data miners' issues
    Model specification       Accuracy
    Parameter estimation      Generalizability
    Model comparison          Computational complexity
    Diagnostic checks         Model complexity
    Asymptotics               Speed of computation

In addition, the approaches of statistical learning theory and computational learning theory provide productive extensions of classical statistical inference. The inference procedures of classical statistics involve repeated sampling under a given statistical model; they allow for variation across data samples but not for the fact that in many cases the choice of model is dependent on the data. Statistical learning theory bases its inferences on repeated sampling from an unknown distribution of the data, and allows for the effect of model choice, at least within a prespecified class of models that could in practice be very large. The PAC-learning results from computational learning theory seek to identify modeling procedures that have a high probability of near-optimality over all possible distributions of the data. However, the majority of the results assume that the data are noise-free and that the target concept is deterministic. Even with these simplifications, useful positive results for near-optimal modeling are difficult to obtain, and for some modeling problems only negative results have been obtained.

To some extent, the differences between statistical and data-mining approaches to modeling and inference are related to the different kinds of problems on which these approaches have been used. For example, statisticians tend to work with relatively simple models for which issues of computational speed have rarely been a concern. Some of the differences, however, present opportunities for statisticians and data miners to learn from each other's approaches. Statisticians would do well to downplay the role of asymptotic accuracy estimates based on the assumption that the correct model has been identified, and instead give more attention to estimates of predictive accuracy obtained from data separate from those used to fit the model. Data miners can benefit by learning from statisticians' awareness of the problems caused by outliers and influential data values, and by making greater use of diagnostic statistics and plots to identify irregularities in the data and inadequacies in the model.

As noted earlier, statistical methods are particularly likely to be preferable when fairly simple models are adequate and the important variables can be identified before modeling. In problems with large data sets in which the relation between class and feature variables is complex and poorly understood, data-mining methods offer a better chance of success. However, many practical problems fall between these extremes, and the variety of available models for data analysis, exemplified by those listed in section 2.2, offers no sharp distinction between statistical and data-mining methods. No single method is likely to be obviously best for a given problem, and use of a combination of approaches offers the best chance of making secure inferences. For example, a rule-based classifier might use additional feature variables formed from linear combinations of features, computed implicitly by a logistic discriminant or a neural-network classifier. Inferences from several distinct families of models can be combined, either by weighting the models' predictions or by an additional stage of modeling in which predictions from different models are themselves used as input features, an approach known as "stacked generalization" [28]. The overall conclusion is that statisticians and data miners can profit by studying each other's methods and using a judiciously chosen combination of them.

Acknowledgements

We are happy to acknowledge helpful discussions with several participants at the Workshop on Data Mining and its Applications, Institute of Mathematics and its Applications, Minneapolis, November 1996 (J.H.), many conversations with Vladimir Vapnik (E.P.), and comments and pointers from Yishay Mansour, Dana Ron and Ronitt Rubinfeld (M.S.).
References

[1] Angluin, D. (1992). Computational learning theory: survey and selected bibliography. In Proceedings of the Twenty-Fourth Annual Symposium on Theory of Computing, 351-369. ACM.
[2] Cox, D. R., and Hinkley, D. V. (1986). Theoretical statistics. London: Chapman and Hall.
[3] Efron, B. (1981). The jackknife, the bootstrap, and other resampling plans, CBMS Monograph 38. Philadelphia, Pa.: SIAM.
[4] Hand, D. J. (1981). Discrimination and classification. Chichester, U.K.: Wiley.
[5] Haussler, D. (1990). Decision theoretic generalizations of the PAC learning model. In Algorithmic Learning Theory, eds. S. Arikawa, S. Goto, S. Ohsuga, and T. Yokomori, pp. 21-41. New York: Springer-Verlag.
[6] Huber, P. J. (1981). Robust statistics. New York: Wiley.
[7] John, G., Kohavi, R., and Pfleger, K. (1994). Irrelevant features and the subset selection problem. In Machine Learning: Proceedings of the Eleventh International Conference, pp. 121-129. San Mateo, Calif.: Morgan Kaufmann.
[8] Journel, A. G., and Huijbregts, C. J. (1978). Mining geostatistics. London: Academic Press.
[9] Kearns, M. J., and Vazirani, U. V. (1994). An introduction to computational learning theory. Cambridge, Mass.: MIT Press.
[10] Kononenko, I., and Hong, S. J. (1997). Attribute selection for modeling. Future Generation Computer Systems, this issue.
[11] Lovell, M. C. (1983). Data mining. Review of Economics and Statistics, 65, 1-12.
[12] Mansour, Y. Lecture notes on learning theory. Available from http://
[13] Michie, D., Spiegelhalter, D. J., and Taylor, C. C. (eds.) (1994). Machine learning, neural and statistical classification. Hemel Hempstead, U.K.: Ellis Horwood.
[14] Miller, A. J. (1983). Contribution to the discussion of "Regression, prediction and shrinkage" by J. B. Copas. Journal of the Royal Statistical Society, Series B, 45, 346-347.
[15] Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82, 669-710.
[16] Ripley, B. D. (1994). Comment on "Neural networks: a review from a statistical perspective" by B. Cheng and D. M. Titterington. Statistical Science, 9, 45-48.
[17] Sakamoto, Y., Ishiguro, M., and Kitagawa, G. (1986). Akaike information criterion statistics. Dordrecht, Holland: Reidel.
[18] Tukey, J. W. (1977). Exploratory data analysis. Reading, Mass.: Addison-Wesley.
[19] Vach, W., Rossner, R., and Schumacher, M. (1996). Neural networks and logistic regression: part II. Computational Statistics and Data Analysis, 21, 683-701.
[20] Valiant, L. (1991). A view of computational learning theory. In Computation and Cognition: Proceedings of the First NEC Research Symposium, 32-51. Philadelphia, Pa.: SIAM.
[21] Vapnik, V. N. (1982). Estimation of dependences based on empirical data. New York: Springer-Verlag.
[22] Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer-Verlag.
[23] Vapnik, V. N. (to appear, 1997). Statistical learning theory. New York: Wiley.
[24] Vapnik, V. N., and Chervonenkis, A. Ja. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16, 264-280. Originally published in Doklady Akademii Nauk USSR, 181 (1968).
[25] Vapnik, V. N., and Chervonenkis, A. Ja. (1981). Necessary and sufficient conditions for the uniform convergence of means to their expectations. Theory of Probability and its Applications, 26, 532-553.
[26] Vapnik, V. N., and Chervonenkis, A. Ja. (1991). The necessary and sufficient conditions for consistency of the method of empirical risk minimization. Pattern Recognition and Image Analysis, 1, 284-305. Originally published in Yearbook of the Academy of Sciences of the USSR on Recognition, Classification, and Forecasting, 2 (1989).
[27] Weisberg, S. (1985). Applied regression analysis, 2nd edn. New York: Wiley.
[28] Wolpert, D. (1992). Stacked generalization. Neural Networks, 5, 241-259.
More informationExpectations and Future Direction of MOP Guidelines Matthew Newton, Principal Officer Rehabilitation Standards Division of Resources & Energy
Expectations and Future Direction of MOP Guidelines Matthew Newton, Principal Officer Rehabilitation Standards Division of Resources & Energy Mine Rehab Conference 2014 Best Practice Ecological Rehabilitation
More informationPortfolio Using Queuing Theory
Modeling the Number of Insured Households in an Insurance Portfolio Using Queuing Theory Jean-Philippe Boucher and Guillaume Couture-Piché December 8, 2015 Quantact / Département de mathématiques, UQAM.
More informationAN INTRODUCTION TO MATCHING METHODS FOR CAUSAL INFERENCE
AN INTRODUCTION TO MATCHING METHODS FOR CAUSAL INFERENCE AND THEIR IMPLEMENTATION IN STATA Barbara Sianesi IFS Stata Users Group Meeting Berlin, June 25, 2010 1 (PS)MATCHING IS EXTREMELY POPULAR 240,000
More informationThe term structure of Russian interest rates
The term structure of Russian interest rates Stanislav Anatolyev New Economic School, Moscow Sergey Korepanov EvrazHolding, Moscow Corresponding author. Address: Stanislav Anatolyev, New Economic School,
More informationMethodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS).
Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS. Vladislav Beresovsky National Center for Health Statistics 3311 Toledo Road Hyattsville, MD 078
More informationRisk-minimization for life insurance liabilities
Risk-minimization for life insurance liabilities Francesca Biagini Mathematisches Institut Ludwig Maximilians Universität München February 24, 2014 Francesca Biagini USC 1/25 Introduction A large number
More informationMansun Chan, Xuemei Xi, Jin He, and Chenming Hu
Mansun Chan, Xuemei Xi, Jin He, and Chenming Hu Acknowledgement The BSIM project is partially supported by SRC, CMC, Conexant, TI, Mentor Graphics, and Xilinx BSIM Team: Prof. Chenming Hu, Dr, Jane Xi,
More informationStirling s formula, n-spheres and the Gamma Function
Stirling s formula, n-spheres and the Gamma Function We start by noticing that and hence x n e x dx lim a 1 ( 1 n n a n n! e ax dx lim a 1 ( 1 n n a n a 1 x n e x dx (1 Let us make a remark in passing.
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More information529 QuickView Ease of Enrollment and Access to Your Client Accounts
529 QuickView TH E S TAT E T R EA SURER Administered by Nevada State Treasurer OFFICE O F Ease of Enrollment and Access to Your Client Accounts 18 64 4 186 DIO ECETES CIVITAS NE VA D A Access the Client
More informationStatistics 305: Introduction to Biostatistical Methods for Health Sciences
Statistics 305: Introduction to Biostatistical Methods for Health Sciences Modelling the Log Odds Logistic Regression (Chap 20) Instructor: Liangliang Wang Statistics and Actuarial Science, Simon Fraser
More informationSOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS
SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS Copyright 005 by the Society of Actuaries and the Casualty Actuarial Society
More informationAKRON PUBLIC SCHOOLS CURRICULUM PACING GUIDE 2013-14
GRADE/COURSE: Drawing and Design Semester The student will: Suggested Artworks Suggested Text/ Resources ELA s One- Three Review Elements of Art and Principles of Design as artworks are viewed, discussed
More informationHome Loan Documents Checklist Malaysians Working In Malaysia
Home Loan Documents Checklist Malaysians Working In Malaysia A. EMPLOYMENT NRIC (copy) Vendor /New Sales & Purchase Agreement Latest 3 months pay slip (for Basic Salary)/Latest 6 months pay slip (for Basic
More informationConstant Elasticity of Variance (CEV) Option Pricing Model:Integration and Detailed Derivation
Constant Elasticity of Variance (CEV) Option Pricing Model:Integration and Detailed Derivation Ying-Lin Hsu Department of Applied Mathematics National Chung Hsing University Co-authors: T. I. Lin and C.
More informationErrata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page
Errata for ASM Exam C/4 Study Manual (Sixteenth Edition) Sorted by Page 1 Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Practice exam 1:9, 1:22, 1:29, 9:5, and 10:8
More informationTHE SVM APPROACH FOR BOX JENKINS MODELS
REVSTAT Statistical Journal Volume 7, Number 1, April 2009, 23 36 THE SVM APPROACH FOR BOX JENKINS MODELS Authors: Saeid Amiri Dep. of Energy and Technology, Swedish Univ. of Agriculture Sciences, P.O.Box
More informationBayesian Networks. Mausam (Slides by UW-AI faculty)
Bayesian Networks Mausam (Slides by UW-AI faculty) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation & inference BNs provide
More informationANSWERS TO QUESTIONS FOR GROUP LEARNING
Accounting for a 5 Merchandising Business ANSWERS TO QUESTIONS FOR GROUP LEARNING Q5-1 A merchandising business has a major revenue reduction called cost of goods sold. The computation of cost of goods
More informationPacific Journal of Mathematics
Pacific Journal of Mathematics GLOBAL EXISTENCE AND DECREASING PROPERTY OF BOUNDARY VALUES OF SOLUTIONS TO PARABOLIC EQUATIONS WITH NONLOCAL BOUNDARY CONDITIONS Sangwon Seo Volume 193 No. 1 March 2000
More informationClicking on the + will display the courses available for selection. Science Options for Classes of 2018 If you have not yet completed Earth Science Essentials or Biology, please select these for 2015-2016
More informationEXP 481 -- Capital Markets Option Pricing. Options: Definitions. Arbitrage Restrictions on Call Prices. Arbitrage Restrictions on Call Prices 1) C > 0
EXP 481 -- Capital Markets Option Pricing imple arbitrage relations Payoffs to call options Black-choles model Put-Call Parity Implied Volatility Options: Definitions A call option gives the buyer the
More informationDistribution Analysis
Finding the best distribution that explains your data ENMAX Energy Corporation 8 October, 2015 Introduction Introduction Statistical tests Goodness of fit We often fit observations to a model (e.g., lognormal
More informationOn closed-form solutions of a resource allocation problem in parallel funding of R&D projects
Operations Research Letters 27 (2000) 229 234 www.elsevier.com/locate/dsw On closed-form solutions of a resource allocation problem in parallel funding of R&D proects Ulku Gurler, Mustafa. C. Pnar, Mohamed
More informationSolutions to Exercises, Section 4.5
Instructor s Solutions Manual, Section 4.5 Exercise 1 Solutions to Exercises, Section 4.5 1. How much would an initial amount of $2000, compounded continuously at 6% annual interest, become after 25 years?
More informationStochastic programming approaches to pricing in non-life insurance
Stochastic programming approaches to pricing in non-life insurance Martin Branda Charles University in Prague Department of Probability and Mathematical Statistics 11th International Conference on COMPUTATIONAL
More informationStatistik for MPH: 2. 10. september 2015. www.biostat.ku.dk/~pka/mph15. Risiko, relativ risiko, signifikanstest (Silva: 110-133.) Per Kragh Andersen
Statistik for MPH: 2 10. september 2015 www.biostat.ku.dk/~pka/mph15 Risiko, relativ risiko, signifikanstest (Silva: 110-133.) Per Kragh Andersen 1 Fra den. 1 uges statistikundervisning: skulle jeg gerne
More informationENERGY EFFICIENCY METRICS
ENERGY EFFICIENCY METRICS Ian Househam 011 482 5990 ihouseham@iiec.org Overview of South Africa s Energy Efficiency Strategy Energy Efficiency Strategy set sectoral and economy-wide energy efficiency targets
More informationContents. Dedication List of Figures List of Tables. Acknowledgments
Contents Dedication List of Figures List of Tables Foreword Preface Acknowledgments v xiii xvii xix xxi xxv Part I Concepts and Techniques 1. INTRODUCTION 3 1 The Quest for Knowledge 3 2 Problem Description
More informationVoluntary Voting: Costs and Bene ts
Voluntary Voting: Costs and Bene ts Vijay Krishna y and John Morgan z November 7, 2008 Abstract We study strategic voting in a Condorcet type model in which voters have identical preferences but di erential
More informationNaïve Bayes and Hadoop. Shannon Quinn
Naïve Bayes and Hadoop Shannon Quinn http://xkcd.com/ngram-charts/ Coupled Temporal Scoping of Relational Facts. P.P. Talukdar, D.T. Wijaya and T.M. Mitchell. In Proceedings of the ACM International Conference
More informationA POOLING METHODOLOGY FOR COEFFICIENT OF VARIATION
Sankhyā : The Indian Journal of Statistics 1995, Volume 57, Series B, Pt. 1, pp. 57-75 A POOLING METHODOLOGY FOR COEFFICIENT OF VARIATION By S.E. AHMED University of Regina SUMMARY. The problem of estimating
More informationMarketing & Communications
& 1 & Coordinator / Assistant Supports the Department with the coordination and development of reports. May also be required to perform marketing administrative duties. Diploma $1,800-$2,500 & Oversees
More informationPresenter: Sharon S. Yang National Central University, Taiwan
Pricing Non-Recourse Provisions and Mortgage Insurance for Joint-Life Reverse Mortgages Considering Mortality Dependence: a Copula Approach Presenter: Sharon S. Yang National Central University, Taiwan
More informationProbability and Statistics Vocabulary List (Definitions for Middle School Teachers)
Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence
More informationCourse Syllabus Business Intelligence and CRM Technologies
Course Syllabus Business Intelligence and CRM Technologies August December 2014 IX Semester Rolando Gonzales I. General characteristics Name : Business Intelligence CRM Technologies Code : 06063 Requirement
More informationELY, WILLIAM, M.A. Pricing European Stock Options using Stochastic and Fuzzy Continuous Time Processes. (2012) Directed by Jan Rychtar. 71 pp.
ELY, WILLIAM, M.A. Pricing European Stock Options using Stochastic and Fuzzy Continuous Time Processes. (2012) Directed by Jan Rychtar. 71 pp. Over the past 40 years, much of mathematical nance has been
More informationBSc in Information Technology Degree Programme. Syllabus
BSc in Information Technology Degree Programme Syllabus Semester 1 Title IT1012 Introduction to Computer Systems 30 - - 2 IT1022 Information Technology Concepts 30 - - 2 IT1033 Fundamentals of Programming
More informationImplementing Propensity Score Matching Estimators with STATA
Implementing Propensity Score Matching Estimators with STATA Barbara Sianesi University College London and Institute for Fiscal Studies E-mail: barbara_s@ifs.org.uk Prepared for UK Stata Users Group, VII
More informationQuantity Purchase Agreement With The State Of Indiana
1 of 5 This is an award of a with the Goodyear Tire & Rubber Company for tire and tire services, per RFP 15-041. The vendor agrees to charge these prices for any products ordered on any QPA release received
More information1. Datsenka Dog Insurance Company has developed the following mortality table for dogs:
1 Datsenka Dog Insurance Company has developed the following mortality table for dogs: Age l Age l 0 2000 5 1200 1 1950 6 1000 2 1850 7 700 3 1600 8 300 4 1400 9 0 Datsenka sells an whole life annuity
More informationCollege Algebra. George Voutsadakis 1. LSSU Math 111. Lake Superior State University. 1 Mathematics and Computer Science
College Algebra George Voutsadakis 1 1 Mathematics and Computer Science Lake Superior State University LSSU Math 111 George Voutsadakis (LSSU) College Algebra December 2014 1 / 91 Outline 1 Exponential
More informationThe Impact of Publicly Available Information on Betting Markets: Implications for Bettors, Betting Operators and Regulators
1 The Impact of Publicly Available Information on Betting Markets: Implications for Bettors, Betting Operators and Regulators Ming-Chien Sung and Johnnie Johnson The 6 th European conference on Gambling
More informationAn Empirical Analysis of Sponsored Search Performance in Search Engine Advertising. Anindya Ghose Sha Yang
An Empirical Analysis of Sponsored Search Performance in Search Engine Advertising Anindya Ghose Sha Yang Stern School of Business New York University Outline Background Research Question and Summary of
More informationMissing data and net survival analysis Bernard Rachet
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 27-29 July 2015 Missing data and net survival analysis Bernard Rachet General context Population-based,
More informationUsing the SABR Model
Definitions Ameriprise Workshop 2012 Overview Definitions The Black-76 model has been the standard model for European options on currency, interest rates, and stock indices with it s main drawback being
More informationSome Research Problems in Uncertainty Theory
Journal of Uncertain Systems Vol.3, No.1, pp.3-10, 2009 Online at: www.jus.org.uk Some Research Problems in Uncertainty Theory aoding Liu Uncertainty Theory Laboratory, Department of Mathematical Sciences
More informationProject & Programme Management Training Schedule January 2016 - July 2016
Project & Programme Management Training Schedule January 2016 - July 2016 Upper Tier Bundle One Bundle Two PRINCE2 Foundation & Practitioner M_o_R Foundation & Practitioner APMP APM Professional Individual
More informationPractice problems for Homework 11 - Point Estimation
Practice problems for Homework 11 - Point Estimation 1. (10 marks) Suppose we want to select a random sample of size 5 from the current CS 3341 students. Which of the following strategies is the best:
More informationBACKGROUND DISCUSSION
CITY COMMISSION AGENDA MEMO September 24, 2014 FROM: Brian D. Johnson, P.E., City Engineer MEETING: October 7, 2014 SUBJECT: PRESENTER: Award Construction Contract for Stone Valley Addition, Unit Two,
More informationMACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
More information1 Inleiding 1.1 Probleemstelling 1.2 OverKPMGenKPMGITAdvisory 2 OperationeelRisico 2.1 DefinitieenomschrijvingRisico 2.2 DefinitieenomschrijvingOperationeelRisico 2.3 Regelgeving 2.3.1 LossDatabases
More informationThe Sieve Re-Imagined: Integer Factorization Methods
The Sieve Re-Imagined: Integer Factorization Methods by Jennifer Smith A research paper presented to the University of Waterloo in partial fulfillment of the requirement for the degree of Master of Mathematics
More information5.3 Improper Integrals Involving Rational and Exponential Functions
Section 5.3 Improper Integrals Involving Rational and Exponential Functions 99.. 3. 4. dθ +a cos θ =, < a
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationSUP Ann 6R: Persistency Report
SUP Ann 6R: Persistency Report 1. REP003 Persistency Report Nil Return Declaration 2. Persistency Report Life Policies 3. Persistency Report Stakeholder Pensions Financial Conduct Authority REP003 Persistency
More informationImpact of child care support on female labor supply, family income and public finance
Impact of child care support on female labor supply, family income and public finance Nicholas-James Clavet and Jean-Yves Duclos CIRPÉE, Université Laval May 2011 Preliminary please do not quote Abstract
More informationAppendix for Hierarchical Dirichlet Scaling Process for Multi-labeled Data
Appendix for Hierarchical Dirichlet Scaling Process for Multi-labeled Data Dongwoo Kim DW.KIM@KAIST.AC.KR KAIST, Daeeon, Korea Alice Oh ALICE.OH@KAIST.EDU KAIST, Daeeon, Korea This appendix has been provided
More informationBig Data for Law Firms DAMIAN BLACKBURN
Big Data for Law Firms DAMIAN BLACKBURN PUBLISHED BY IN ASSOCIATION WITH Contents Executive summary VII About the author XI Chapter 1: Introduction to big data 1 Factors leading to big data 2 The three
More informationDetail SE Transaction Set Trailer Summary GE Functional Group Trailer Summary IEA Interchange Control Trailer Summary. ISA Interchange Control Header
820 Payment Order / Remittance Advice Segment ID Description Location ISA Interchange Control Header Heading GS Functional Group Header Heading ST Transaction Set Header Heading 1 BPR Beginning Segment
More information3.4 - BJT DIFFERENTIAL AMPLIFIERS
BJT Differential Amplifiers (6/4/00) Page 1 3.4 BJT DIFFERENTIAL AMPLIFIERS INTRODUCTION Objective The objective of this presentation is: 1.) Define and characterize the differential amplifier.) Show the
More information1.5 / 1 -- Communication Networks II (Görg) -- www.comnets.uni-bremen.de. 1.5 Transforms
.5 / -- Communication Networks II (Görg) -- www.comnets.uni-bremen.de.5 Transforms Using different summation and integral transformations pmf, pdf and cdf/ccdf can be transformed in such a way, that even
More informationPRAXIS Pass Rates Fall 2010 through Spring 2013
PRAXIS Pass Rates Fall 2010 through Spring 2013 Program Semester Test # N Percent Comments BS Elementary Education Fall 2010 0710 4 100% 1 was ACT PRAXIS exempt BS Elementary Education Fall 2010 0172 4
More informationOnline Convex Programming and Generalized Infinitesimal Gradient Ascent
Online Convex Programming and Generalized Infinitesimal Gradient Ascent Martin Zinkevich February 003 CMU-CS-03-110 School of Computer Science Carnegie Mellon University Pittsburgh, PA 1513 Abstract Convex
More informationCould your house sale or purchase be affected by Contaminated Land?
Could your house sale or purchase be affected by Contaminated Land? What is Contaminated Land? The legal definition of Contaminated Land, as provided by Part IIA of the Environmental Protection Act 1990,
More informationTwo Correlated Proportions (McNemar Test)
Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with
More informationDATA MINING IN FINANCE
DATA MINING IN FINANCE Advances in Relational and Hybrid Methods by BORIS KOVALERCHUK Central Washington University, USA and EVGENII VITYAEV Institute of Mathematics Russian Academy of Sciences, Russia
More informationThe Fast Convergence of Incremental PCA
The Fast Convergence of Incremental PCA Akshay Balsubramani UC San Diego abalsubr@cs.ucsd.edu Sanjoy Dasgupta UC San Diego dasgupta@cs.ucsd.edu Yoav Freund UC San Diego yfreund@cs.ucsd.edu Abstract We
More informationHow To Invest In Stocks With Options
Applied Options Strategies for Portfolio Managers Gary Trennepohl Oklahoma State University Jim Bittman The Options Institute Session Outline Typical Fund Objectives Strategies for special situations Six
More information