Improving Rooftop Detection in Aerial Images Through Machine Learning

Marcus A. Maloof† (maloof@apres.stanford.edu)
Pat Langley† (langley@newatlantis.isle.org)
Thomas Binford‡ (binford@cs.stanford.edu)
Stephanie Sage† (sage@icaria.isle.org)

† Institute for the Study of Learning and Expertise, 2164 Staunton Court, Palo Alto, CA 94306
‡ Robotics Laboratory, Department of Computer Science, Stanford University, Stanford, CA 94305

Abstract

In this paper, we examine the use of machine learning to improve a rooftop detection process, which is one step in a vision system that recognizes buildings in overhead imagery. We review the problem of analyzing aerial images and describe an existing vision system that automates the recognition of buildings in such images. After this, we briefly review two well-known learning algorithms, representing different inductive biases, that we selected to improve rooftop detection. An important aspect of this problem is that the data sets are highly skewed and the cost of mistakes differs for the two classes, so we evaluate the algorithms under varying misclassification costs using ROC analysis. We report three sets of experiments designed to illuminate facets of applying machine learning to the image analysis task. One set of studies focuses on within-image learning, in which both training and testing data are derived from the same image. Another addresses between-image learning, in which training and testing sets come from different images. A final set investigates learning using all available image data in an effort to determine the best performing method. Experimental results demonstrate that useful generalization occurs when training and testing on data derived from images that differ in location and in aspect. Furthermore, they demonstrate that, under most conditions and across a range of misclassification costs, a trained naive Bayesian classifier exceeded, by as much as a factor of two, the predictive accuracy of nearest neighbor and a handcrafted linear classifier, the solution currently being used in the building detection system. Analysis of learning curves reveals that naive Bayes achieved superiority using as little as 6% of the available training data.
1. Introduction

The number of images available to image analysts is growing rapidly, and will soon outpace their ability to process them. Computational aids will be required to filter this flood of images and focus the analyst's attention on interesting events, but current image understanding systems are not yet robust enough to support this process. Successful image understanding relies on knowledge, and despite theoretical progress, implemented vision systems still rely on heuristic methods and consequently remain fragile. Handcrafted knowledge about when and how to use particular vision operations can give acceptable results on some images but not others.

In this paper, we explore the use of machine learning as a means for improving knowledge used in the vision process, and thus for producing more robust software. Recent applications of machine learning in business and industry (Langley & Simon 1995) hold useful lessons for applications in image analysis. A key idea in applied machine learning involves building an advisory system that recommends actions but gives final control to a human user, with each decision generating a training case, gathered in an unobtrusive way, for use in learning. This setting for knowledge acquisition is similar to the scenario in which an image analyst interacts with a vision system, finding some system analyses acceptable and others uninteresting or in error. The aim of our research program is to embed machine learning into this interactive process of image analysis.

This adaptive approach to computer vision promises to greatly reduce the number of decisions that image analysts must make per picture, thus improving their ability to deal with a high flow of images. Moreover, the resulting systems should adapt their knowledge to the preferences of individuals in response to feedback from those users. The overall effect should be a new class of systems for image analysis that reduce the workload on human analysts and give them more reliable results, thus speeding the image analysis process.
In the sections that follow, we report progress on using machine learning to improve decision making at one stage in an existing image understanding system. We begin by explaining the task domain (identifying buildings in aerial photographs) and then describe the vision system designed for this task. Next, we review two well-known algorithms for supervised learning that hold potential for improving the reliability of image analysis in this domain. After this, we report the design of experiments to evaluate these methods and the results of those studies. In closing, we discuss related and future work.

2. Nature of the Image Analysis Task

The image analyst interprets aerial images of ground sites with an eye to unusual activity or other interesting behavior. The images under scrutiny are usually complex, involving many objects arranged in a variety of patterns. Overhead images of Fort Hood, Texas, collected as part of the RADIUS project (Firschein & Strat 1997), are typical of a military base and include buildings in a range of sizes and shapes, major and minor roadways, sidewalks, parking lots, vehicles, and vegetation. A common task faced by the image analyst is to detect change at a site as reflected in differences between two images, as in the number of buildings, roads, and vehicles. This in turn requires the ability to recognize examples from each class of interest. In this paper, we focus on the performance task of identifying buildings in satellite photographs.
Aerial images can vary across a number of dimensions. The most obvious factors concern viewing parameters, such as distance from the site (which affects size and resolution) and viewing angle (which affects perspective and visible surfaces). But other variables also influence the nature of the image, including the time of day (which affects contrast and shadows), the time of year (which affects foliage), and the site itself (which determines the shapes of viewed objects). Taken together, these factors introduce considerable variability into the images that confront the analyst.

In turn, this variability can significantly complicate the task of recognizing object classes. Although a building or vehicle will appear different from alternative perspectives and distances, the effects of such transformations are reasonably well understood. But variations due to time of day, the season, and the site are more serious. Shadows and foliage can hide edges and obscure surfaces, and buildings at distinct sites may have quite different structures and layouts. Such variations serve as mere distractions to the human image analyst, yet they provide serious challenges to existing computer vision systems.

This suggests a natural task for machine learning: given aerial images as training data, acquire knowledge that improves the reliability of such an image analysis system. However, we cannot study this task in the abstract. We must explore the effect of specific induction algorithms on particular vision software. In the next two sections, we briefly review one such system for image analysis and two learning methods that might give it more robust behavior.

3. An Architecture for Image Analysis

Lin and Nevatia (1996) report a computer vision package, called the Buildings Detection and Description System (Budds), for the analysis of ground sites in aerial images. Like many programs for image understanding, their system operates in a series of processing stages. Each step involves aggregating lower-level features into higher-level ones, eventually reaching hypotheses about the locations and descriptions of buildings. We will consider these stages in the order that they occur.
Starting at the pixel level, Budds uses an edge detector to group pixels into edgels, and then invokes a line finder to group edgels into lines. Junctions and parallel lines are identified and combined to form three-sided structures or "Us". The algorithm then groups selected Us and junctions to form parallelograms. Each such parallelogram constitutes a hypothesis about the position and orientation of the roof for some building, so we may call this step rooftop generation.

After the system has completed the above aggregation process, a rooftop selection stage evaluates each rooftop candidate to determine whether it has sufficient evidence to be retained. The aim of this process is to remove candidates that do not correspond to actual buildings. Ideally, the system will reject most spurious candidates at this point, although a final verification step may still collapse duplicate or overlapping rooftops. This stage may also exclude candidates if there is no evidence of three-dimensional structure, such as shadows and walls.

Analysis of the system's operation suggested that rooftop selection held the most promise for improvement through machine learning, because this stage must deal with many spurious rooftop candidates. This process takes into account both local and global criteria. Local support comes from features such as lines and corners that are close to a given parallelogram. Since these suggest walls and shadows, they provide evidence that the candidate corresponds to an actual building.
Global criteria consider containment, overlap, and duplication of candidates. Using these evaluation criteria, the set of rooftop candidates is reduced to a more manageable size. The individual constraints applied in this process have a solid foundation in both theory and practice.

The problem is that we have only heuristic knowledge about how to combine these constraints. Moreover, such rules of thumb are currently crafted by hand, and they do not fare well on images that vary in their global characteristics, such as contrast and amount of shadow. However, methods from machine learning, to which we now turn, may be able to induce better conditions for selecting or rejecting candidate rooftops. If these acquired heuristics are more accurate than the existing handcrafted solutions, they will improve the reliability of the rooftop selection process.

4. A Review of Three Learning Techniques

We can formulate the task of acquiring rooftop selection heuristics in terms of supervised learning. In this process, training cases of some concept are labeled as to their class. In rooftop selection, only two classes exist, rooftop and non-rooftop, which we will refer to as positive and negative examples of the concept "rooftop". Each instance consists of a number of attributes and their associated values, along with a class label. These labeled instances constitute training data that are provided as input to an inductive learning routine, which generates concept descriptions designed to distinguish the positive examples from the negative ones. These knowledge structures state the conditions under which the concept, in this case "rooftop", is satisfied.
In a previous study (Maloof et al. 1997), we evaluated a variety of machine learning methods for the rooftop detection task and selected the two that showed promise of achieving a balance between the true positive and false positive rates: nearest neighbor and naive Bayes. These methods use different representations, performance schemes, and learning mechanisms for supervised concept learning, and exhibit different inductive biases, meaning that each algorithm acquires certain concepts more easily than others.

The nearest-neighbor method (e.g., Aha, Kibler, & Albert 1991) uses an instance-based representation of knowledge that simply retains training cases in memory. This approach classifies new instances by finding the "nearest" stored case, as measured by some distance metric, then predicting the class associated with that case. For numeric attributes, a common metric (which we use in our studies) is Euclidean distance. In this framework, learning involves nothing more than storing each training instance, along with its associated class. Although this method is quite simple and has known sensitivity to irrelevant attributes, in practice it performs well in many domains. Some versions select the k closest cases and predict the majority class; here we will focus on the "simple" nearest-neighbor scheme, which uses only the nearest case for prediction.

The naive Bayesian classifier (e.g., Langley, Iba, & Thompson 1992) stores a probabilistic concept description for each class. This description includes an estimate of the class probability and the estimated conditional probabilities of each attribute value given the class. The method classifies new instances by computing the posterior probability of each class using Bayes' rule, combining the stored probabilities by assuming that the attributes are independent given the class, and predicting the class with the highest posterior probability. Like nearest neighbor, naive Bayes has known limitations, such as sensitivity to attribute correlations and an inability to represent multiple decision regions, but in practice it behaves well on many natural domains.

Figure 1. Visualization interface for labeling rooftop candidates. The system presents candidates to a user, who labels them by clicking either the 'roof' or 'non-roof' button. It also incorporates a simple learning algorithm to provide feedback to the user about the statistical properties of a candidate, based on previously labeled examples.

Currently, Budds uses a handcrafted linear classifier for rooftop detection (Lin & Nevatia 1996), which is equivalent to a perceptron classifier (e.g., Zurada 1992). Although we did not train this method as we did naive Bayes and nearest neighbor, we included it in our evaluation for the purpose of comparison. This method represents concepts using a collection of weights w and a threshold θ. To classify an instance, which we represent as a vector of n numbers x, we compute the output o of the classifier using the formula:

    o = +1  if Σ_{i=1}^{n} w_i x_i > θ,
        −1  otherwise.

For our application, the classifier predicts the positive class if the output is +1 and predicts the negative class otherwise. There are a number of established methods for training perceptrons, but our preliminary studies suggested that they fared worse than the manually set weights, so we did not use the learned perceptrons here. Henceforth, we will refer to the handcrafted linear classifier used in Budds as the "Budds classifier".
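As a concrete illustration of the three decision rules just reviewed, consider the following minimal sketch. It is not the Budds implementation: the training cases, weights, and threshold below are hypothetical, and naive Bayes is shown with per-class Gaussian estimates for the continuous attributes, one common choice for numeric data.

```python
import math

def nearest_neighbor(train, query):
    """Simple 1-NN: predict the class of the closest stored case (Euclidean)."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    case, label = min(train, key=lambda cl: dist(cl[0], query))
    return label

def naive_bayes_posteriors(train, query):
    """Class posteriors under the naive (conditional independence) assumption.
    Each continuous attribute is modeled with a per-class Gaussian; the
    class prior is estimated from the training frequencies."""
    classes = {}
    for x, y in train:
        classes.setdefault(y, []).append(x)
    scores = {}
    for y, xs in classes.items():
        score = len(xs) / len(train)          # class prior P(class)
        for i in range(len(query)):
            vals = [x[i] for x in xs]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) or 1e-9
            # multiply in the Gaussian density of attribute i given the class
            score *= math.exp(-(query[i] - mu) ** 2 / (2 * var)) \
                     / math.sqrt(2 * math.pi * var)
        scores[y] = score
    norm = sum(scores.values()) or 1.0
    return {y: s / norm for y, s in scores.items()}

def linear_classifier(weights, threshold, query):
    """Perceptron-style rule: +1 if the weighted sum exceeds the threshold."""
    return +1 if sum(w * v for w, v in zip(weights, query)) > threshold else -1
```

Predicting with naive Bayes then amounts to taking the class with the greatest posterior, and the linear rule mirrors the formula above with fixed, hand-set weights.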
5. Generating, Representing, and Labeling Rooftop Candidates

We were interested in how well the various induction algorithms could learn to classify rooftop candidates in aerial images. This required three things: a set of images that contain buildings, some means to generate and represent plausible rooftops, and labels for each such candidate.

As our first step, we selected two images, FHOV1027 and FHOV625, of Fort Hood, Texas, which were collected as part of the RADIUS program (Firschein & Strat 1997). These images cover the same area but were taken from different viewpoints, one from a nadir angle and the other from an oblique angle. We subdivided each image into three subimages, focusing on locations that contained concentrations of buildings, to maximize the number of positive rooftop candidates. This gave us three pairs of images, each pair covering the same area but viewed from different aspects.

Table 1. Characteristics of the images and data sets. We began with a nadir and an oblique image of an area of Fort Hood, Texas, and derived three subimages from each that contained concentrations of buildings. We then used Budds to extract rooftop candidates and labeled each as either a positive or negative example of the concept "rooftop".

    Image   Original Image   Location   Aspect    Positive Examples   Negative Examples
    1       FHOV1027         1          Nadir     197                 982
    2       FHOV625          1          Oblique   238                 1955
    3       FHOV1027         2          Nadir     71                  2645
    4       FHOV625          2          Oblique   74                  3349
    5       FHOV1027         3          Nadir     87                  3722
    6       FHOV625          3          Oblique   114                 4395

Our aim was to improve Budds, so we used this system to generate candidate rooftops for each image, producing six data sets. Following Lin and Nevatia (1996), the data sets described each rooftop candidate in terms of nine continuous features that summarize the evidence gathered from the various levels of analysis. For example, positive indications for the existence of a rooftop included evidence for edges and corners, the degree to which a candidate's opposing lines are parallel, support for the existence of orthogonal trihedral vertices, and shadows near the corners of the candidate. Negative evidence included the existence of lines that cross the candidate, L-junctions adjacent to the candidate, similarly adjacent T-junctions, gaps in the candidate's edges, and the degree to which enclosing lines failed to form a parallelogram.

We should note that induction algorithms are often sensitive to the features one uses to describe the data, and we make no claims that these nine attributes are the best ones for recognizing rooftops in aerial images. However, because our aim was to improve the robustness of Budds, we needed to use the same features as Lin and Nevatia's handcrafted classifier. Moreover, it seemed unlikely that we could devise better features than the system's authors had developed during years of research.

The third problem, labeling the generated rooftop candidates, proved the most challenging and the most interesting. Budds itself classifies each candidate, but since we were trying to improve on its ability, we could not use those labels. Thus, we tried an approach in which an expert specified the vertices of actual rooftops in the image, then we automatically labeled candidates as positive or negative depending on the distance of their vertices from the nearest actual rooftop's corners. We also tried a second scheme that used the number of candidate vertices that fell within a region surrounding the actual rooftop. Unfortunately, upon inspection neither approach gave us satisfactory labeling results.

Analysis revealed the difficulties with using such relations to actual rooftops in the labeling process. One is that they ignore information about the candidate's shape; a good rooftop should be a parallelogram, yet nearness of vertices is neither sufficient nor necessary for this form. A second drawback is that they ignore other information contained in the nine Budds attributes, such as shadows and crossing lines. The basic problem is that such methods deal only with the two-dimensional space that describes location within the image, rather than the nine-dimensional space that we want the vision system to use in classifying a candidate.

Reluctantly, we concluded that manual labeling by a human was necessary, but this task was daunting, as each image produced thousands of candidate rooftops. To support the process, we implemented an interactive labeling system in Java, shown in Figure 1, that successively displays each extracted rooftop to the user. The system draws each candidate over the portion of the image from which it was extracted, then lets the user click buttons for 'roof' or 'non-roof' to label the example.
The visual interface itself incorporates a simple learning mechanism, nearest neighbor, designed to improve the labeling process. As the system obtains feedback from the user about positive and negative examples, it divides unlabeled candidates into three classes: likely rooftops, unlikely rooftops, and unknown. The interface displays likely rooftops using green rectangles, unlikely rooftops as red rectangles, and unknown candidates as blue rectangles. The system includes a sensitivity parameter[1] that affects how certain the system must be before it proposes a label. After displaying a rooftop, the user either confirms or contradicts the system's prediction by clicking either the 'roof' or 'non-roof' button. The simple learning mechanism then uses this information to improve subsequent predictions of candidate labels.

Our intent was that, as the interface gained experience with the user's labels, it would display fewer and fewer candidates about which it was uncertain, and thus speed up the later stages of interaction. Informal studies suggested that the system achieves this aim: by the end of the labeling session, the user typically confirms nearly all of the interface's recommendations. However, because we were concerned that our use of nearest neighbor might bias the labeling process in favor of this algorithm during later studies, we generated the data used in Section 7 by setting the sensitivity parameter so that the system presented all candidates as uncertain. Even handicapped in this manner, the interface required only about five hours to label the 17,829 roof candidates extracted from the six images. This comes to under one second per candidate, which still seems quite efficient.

In summary, what began as the simple task of labeling visual data led us to some of the more fascinating issues in our work. To incorporate supervised concept learning into vision systems, which can generate thousands of candidates per image, we must develop methods to reduce the burden of labeling these data. In future work, we intend to measure more carefully the ability of our adaptive labeling system to speed this process. We also plan to explore extensions that use the learned classifier to order candidate rooftops (showing the least certain ones first) and even to filter candidates before they are passed on to the user (automatically labeling the most confident ones). Techniques such as selective sampling (e.g., Freund et al. 1997) and uncertainty sampling (Lewis & Catlett 1994) should prove useful toward these ends.

[1] The user can set this parameter using the slider bar and number field in the bottom right corner of Figure 1.

6. Cost-Sensitive Learning and Skewed Data

Two aspects of the rooftop selection task influenced our approach to implementation and evaluation. First, Budds works in a bottom-up manner, so if the system discards a rooftop, it cannot retrieve it later. Consequently, errors on the rooftop class (false negatives) are more expensive than errors on the non-rooftop class (false positives), so it is better to retain a false positive than to discard a false negative. The system has the potential for discarding false positives in later stages of processing, when it can draw upon accumulated evidence, such as the existence of walls and shadows. However, since false negatives cannot be recovered, we need to minimize errors on the rooftop class.

Second, we have a severely skewed data set, with training examples distributed non-uniformly across classes (781 rooftops vs. 17,048 non-rooftops). Given such skewed data, most induction algorithms have difficulty learning to predict the minority class. Moreover, we have established that errors on our minority class (rooftops) are most expensive, and the extreme skew only increases such errors. This interaction between skewed class distribution and unequal error costs occurs in many computer vision applications, in which a vision system generates thousands of candidates but only a handful correspond to objects of interest. It also holds in many other applications of machine learning, such as fraud detection (Fawcett & Provost 1997), discourse analysis (Soderland & Lehnert 1994), and telecommunications risk management (Ezawa, Singh, & Norton 1996).
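To make the skew concrete, a degenerate classifier that always predicts the majority (non-rooftop) class already scores very well on overall accuracy while missing every rooftop, which is why accuracy alone is uninformative here. A quick check using the class counts above:

```python
positives = 781     # candidates labeled 'roof'
negatives = 17048   # candidates labeled 'non-roof'
total = positives + negatives   # 17,829 candidates in all

# Accuracy of always predicting the majority (non-rooftop) class:
majority_accuracy = negatives / total
print(round(majority_accuracy, 3))   # roughly 0.956, yet every rooftop is missed
```

This is the roughly 95 percent "hit rate" discussed below.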
These issues raise two challenges. First, they suggest the need for modified learning algorithms that can achieve high accuracy on the minority class. Second, they require an experimental methodology that lets us compare different methods on domains like rooftop detection, in which the classes are skewed and errors have different costs. In the remainder of this section, we further clarify the nature of the problem, after which we propose some cost-sensitive learning methods and an approach to experimental evaluation.

6.1 Favoritism Toward the Majority Class

In a previous study (Maloof et al. 1997), we evaluated several algorithms without taking into account the cost of classification errors and got confusing experimental results. Some methods, like the standard error-driven algorithm for revising perceptron weights (e.g., Zurada 1992), learned to always predict the majority class. The naive Bayesian classifier found a more comfortable trade-off between the true positive and false positive rates, but still favored the majority class.[2] For data sets that are skewed, an inductive method that learns to predict the majority class will often have a higher overall accuracy than a method that finds a balance between true positive and false positive rates. Indeed, always predicting the majority class for our problem yields a hit rate of 95 percent, which makes it a misleading measure of performance.

This bias toward the majority class only causes difficulty when we care more about errors on the minority class. For the rooftop domain, if the error costs for the two classes were the same, then we would not care on which class we made errors, provided we minimized the total number of mistakes. Nor would there be any problem if mistakes on the majority class were more expensive, since most learning methods are biased toward minimizing such errors anyway. On the other hand, if the class distribution runs counter to the relative cost of mistakes, as in our domain, then we must do something to compensate, both in the learning algorithm itself and in measuring its performance.

Breiman et al. (1984) note the close relation between the distribution of classes and the relative cost of errors. In particular, they point out that one can mitigate the bias against the minority class by duplicating examples of that class in the training data. This also helps explain why most induction methods give more weight to accuracy on the majority class, since skewed training data implicitly places more weight on errors for that class. In response, several researchers have explored approaches that alter the distribution of training data in various ways, including use of weights to bias the performance element (Cardie & Howe 1997), removing unimportant examples from the majority class (Kubat & Matwin 1997), and 'boosting' the examples in the under-represented class (Freund & Schapire 1996). However, as we will see shortly, one can also modify the algorithms themselves to more directly respond to error costs.

[2] Covering algorithms, like AQ15 (Michalski et al. 1986) or CN2 (Clark & Niblett 1989), may be less susceptible to skewed data sets, but this is highly dependent on their rule selection criteria.

6.2 Cost-Sensitive Learning Methods

Empirical comparisons among machine learning algorithms seldom focus on the cost of classification errors, possibly because most learning methods do not provide ways to take such costs into account. Happily, some researchers have explored variations on standard algorithms that effectively bias the method in favor of one class over others. For example, Lewis and Catlett (1994) introduced a loss ratio into C4.5 (Quinlan 1993) to bias it toward under-represented classes. Pazzani et al. (1994) have also done some preliminary work along these lines, which they describe as addressing the costs of different error types. Their method finds the minimum-cost classifier for a variety of problems using a set of hypothetical error costs. Turney (1995) presents results from an empirical evaluation of algorithms that take into account both the cost of tests to measure attributes and the cost of classification error.

When implementing cost-sensitive learning methods, the basic idea is to change the way the algorithm treats instances from the more expensive class relative to the other instances, either during the learning process or at the time of testing. In essence, we want to incorporate a cost heuristic into the algorithms so we can bias them toward making mistakes on the less costly class rather than on the more expensive class.

To accomplish this, we defined a cost for each class on the range [0.0, 1.0] that indicates the relative cost of making a mistake on one class versus another. Zero indicates that errors cost nothing, whereas one means that errors are maximally expensive. To incorporate a cost heuristic into the algorithms, we chose to modify the performance element of the algorithms, rather than the learning element, by using the cost heuristic to adjust the decision boundary at which the algorithm selects one class versus the other.

Recall that naive Bayes predicts the class with the highest posterior probability as computed using Bayes' rule, so we want the cost heuristic to bias prediction in favor of the more expensive class. For a cost parameter c_j ∈ [0.0, 1.0], we computed an adjusted score λ_j for the class ω_j using the formula:

    λ_j = P(ω_j | x) + c_j [1 − P(ω_j | x)],

where x is the query, and P(ω_j | x) is the posterior probability of the jth class given the query. The cost-sensitive version of naive Bayes predicts the class ω_j with the greatest λ_j; note that when both costs are zero, this rule reduces to standard naive Bayes, while a higher cost inflates a class's score and so biases prediction toward it.

Nearest neighbor, as normally used, predicts the class of the example that is closest to the query. Therefore, the cost heuristic should have the effect of moving the query point closer to the closest example of the more expensive class. The magnitude of this change should be proportional to the magnitude of the cost parameter. Therefore, we computed the adjusted distance λ_j for the class ω_j using the formula:

    λ_j = d_E(x, x_j) − c_j d_E(x, x_j),

where x_j is the closest neighbor from class ω_j to the query point, and d_E(x, y) is the Euclidean distance function. The cost-sensitive version of nearest neighbor predicts the class with the least λ_j. This modification also works for k-nearest neighbor, which considers the k closest neighbors when classifying unknown instances.

Finally, because our modifications focused on the performance elements rather than on the learning algorithms, we can make similar changes to the Budds classifier. Since this classifier uses a linear discriminant function, we want the cost heuristic to adjust the threshold so the hyperplane of discrimination is farther from the hypothetical region of examples of the more expensive class, thus enlarging the decision region of that class. The degree to which the algorithm adjusts the threshold is again dependent on the magnitude of the cost parameter. The adjusted threshold θ′ is computed by:

    θ′ = θ − Σ_{j=1}^{2} sgn(ω_j) c_j λ_j,

where θ is the original threshold for the linear discriminant function, sgn(ω_j) returns positive for the positive class and negative for the negative class, and λ_j is the maximum value the weighted sum can take for the jth class. The cost-sensitive version of the Budds classifier predicts the positive class if the weighted sum of an instance's attributes surpasses the adjusted threshold θ′; otherwise, it predicts the negative class.
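The three cost-adjusted decision rules above can be sketched as follows. This is a minimal illustration rather than the Budds implementation: the posteriors, stored neighbors, weights, thresholds, and costs are hypothetical inputs, classes are tagged 'pos' and 'neg', and naive Bayes is shown under the reading that the class with the greatest adjusted score is predicted.

```python
import math

def cost_sensitive_bayes(posteriors, costs):
    """lambda_j = P(w_j|x) + c_j * (1 - P(w_j|x)); a higher cost inflates a
    class's score, biasing prediction toward the more expensive class."""
    scores = {j: p + costs[j] * (1.0 - p) for j, p in posteriors.items()}
    return max(scores, key=scores.get)

def cost_sensitive_nn(neighbors, costs, query):
    """lambda_j = d(x, x_j) - c_j * d(x, x_j), i.e. (1 - c_j) * d(x, x_j),
    where x_j is the closest stored case of class j; predict the class
    with the least adjusted distance."""
    def d(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    scores = {j: (1.0 - costs[j]) * d(query, xj) for j, xj in neighbors.items()}
    return min(scores, key=scores.get)

def cost_sensitive_linear(weights, theta, costs, max_sums, x):
    """Shift the threshold: theta' = theta - sum_j sgn(w_j) * c_j * lambda_j,
    with sgn = +1 for 'pos', -1 for 'neg', and lambda_j the maximum value
    the weighted sum can take for class j."""
    sgn = {'pos': +1.0, 'neg': -1.0}
    theta_adj = theta - sum(sgn[j] * costs[j] * max_sums[j] for j in costs)
    return 'pos' if sum(w * v for w, v in zip(weights, x)) > theta_adj else 'neg'
```

With both costs set to zero, each rule reduces to its standard counterpart; raising the cost of the positive class enlarges that class's decision region in all three cases.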
6.3 ROC Analysis for Evaluating Performance

Our second challenge was to identify an experimental methodology that would let us compare the behavior of our cost-sensitive learning methods on the rooftop data. We have already seen that comparisons based on overall accuracy are not sufficient for domains that involve non-uniform costs or skewed distributions. Rather, we must separately measure accuracy on both classes, in terms of false positives and false negatives. Given information about the relative costs of errors, say from conversations with domain experts or from a domain analysis, we could then compute a weighted accuracy for each algorithm that takes cost into account (e.g., Pazzani et al. 1994; Fawcett & Provost 1997).

However, in this case, we had no access to image analysts or enough information about the results of their interpretations to determine the actual costs for the domain. In such situations, rather than aiming for a single performance measure, as typically done in machine learning experiments, a natural solution is to evaluate each learning method over a range of cost settings. ROC (Receiver Operating Characteristic) analysis (Swets 1988) provides a framework for carrying out such comparisons. The basic idea is to systematically vary some aspect of the situation, such as the cost ratio or the class distribution, and to plot the true positive rate against the false positive rate for each situation. Although researchers have used such ROC curves in signal detection and psychophysics for decades (e.g., Green & Swets 1974; Egan 1975), this technique has only recently begun to filter into machine learning research (e.g., Ezawa, Singh, & Norton 1996; Maloof et al. 1997; Provost & Fawcett 1997).

Figure 2. An idealized Receiver Operating Characteristic (ROC) curve, plotting the true positive rate (y-axis, 0 to 1) against the false positive rate (x-axis, 0 to 1).

Figure 2 shows an idealized ROC curve generated by varying the cost parameter of a cost-sensitive learning algorithm. The lower left corner of the figure represents the situation in which mistakes on the negative class are maximally expensive (i.e., c+ = 0.0 and c− = 1.0). Conversely, the upper right corner of the ROC graph represents the situation in which mistakes on the positive class are maximally expensive (i.e., c+ = 1.0 and c− = 0.0). By varying over the range of cost parameters and plotting the classifier's true positive and false positive rates, we produce a series of points that represents the algorithm's accuracy trade-off. The point (0, 1) is where classification is perfect, with a false positive rate of zero and a true positive rate of one, so we want ROC curves that "push" toward this corner.

Traditional ROC analysis uses area under the curve as the preferred measure of performance, with curves that cover larger areas generally being viewed as better (Hanley & McNeil 1982; Swets 1988). Given the skewed nature of the rooftop data, and the different but imprecise costs of errors on the two classes, we decided to use area under the ROC curve as the dependent variable in our experimental studies. This measure raises problems when two curves have similar areas but are dissimilar and asymmetric, and thus occupy different regions of the ROC space. In such cases, other types of analysis are more useful (e.g., Provost & Fawcett 1997), but area under the curve appears to be most appropriate when curves have similar shapes and when one is nested within the other. As we will see, this relation typically holds for our cost-sensitive algorithms in the rooftop detection domain.
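Area under an ROC curve can be approximated by summing the trapezoids defined by adjacent points, the same approximation used for the results reported later. A minimal sketch, with a made-up curve from a hypothetical cost sweep:

```python
def roc_auc(points):
    """Approximate area under an ROC curve by summing the trapezoids
    defined by each pair of adjacent (fpr, tpr) points."""
    pts = sorted(points)  # order points by false positive rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical (fpr, tpr) points from a cost sweep, anchored at (0,0) and (1,1):
curve = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.8), (0.6, 0.95), (1.0, 1.0)]
```

A random classifier's diagonal yields an area of 0.5, and a perfect classifier passing through (0, 1) yields 1.0, so larger areas correspond to curves that "push" toward the upper left corner.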
RooftopDetectionThroughMachineLearning 11 1 1 True Positive Rate 0.8 0.8 True Positive Rate 0.6 0.6 Figure3.ROCcurvesforImages1and2.Weraneachmethodbytrainingandtestingusingdataderived 0.4 0.4 fromthesameimageoverarangeofmisclassicationcosts.weconductedtensuchrunsand Naive Bayes Naive Bayes 0.2 Nearest Neighbor 0.2 Nearest Neighbor Budds Classifier Budds Classifier 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 7.ExperimentalStudies butdierentaspects:image1isanadirview,whileimage2isanoblique. plottedtheaveragetruepositiveandfalsepositiverates.theseimagesareofthesamelocation False Positive Rate False Positive Rate (rooftopcandidates)separatefromthoseusedtotestthelearnedclassiers.aswewillsee,the Astypicallydoneinsuchstudies,ineachexperimentwetrainedtheinductionmethodsondata Toinvestigatetheuseofmachinelearningforthetaskofrooftopdetection,weconductedexperimentsusingthecost-sensitiveversionsofnaiveBayes,nearestneighbor,andtheBuddsclassier. experimentsdieredinwhetherthetrainingandtestcasescamefromthesameordistinctimages, whichletusexaminedierentformsofgeneralizationbeyondthetrainingdata. Ourrstexperimentalstudyexaminedhowthevariousmethodsbehavedgivenwithin-imagelearning,thatis,whengeneralizingtotestcasestakenfromthesameimageonwhichwetrainedthem. 7.1Within-ImageLearning rooftopclassierswouldhavelargerareasthanthoseofthebuddsclassier. 
Ourresearchhypothesiswasthatthelearnedclassierswouldbemoreaccurate,overarangeof misclassicationcosts,thanthehandcraftedlinearclassier.becauseourmeasureofperformance wasareaundertheroccurve,thistranslatesintoapredictionthattheroccurvesofthelearned andfalsepositiveratesfortenruns.sincecostsarerelative(i.e.,c+=0:0andc?=0:5isequivalent toc+=0:25andc?=0:75)andourdomaininvolvedonlytwoclasses,wevariedthecostparameter foronlyoneclassatatimeandxedtheotheratzero.eachruninvolvedpartitioningthedataset Foreachimageandmethod,wevariedtheerrorcostsandmeasuredtheresultingtruepositive set.becausethebuddsclassierwashand-congured,ithadnotrainingphase,soweappliedit inthetrainingset,andevaluatingtheresultingconceptdescriptionsusingthedatainthetest directlytotheinstancesinthetestset.foreachcostsettingandeachclassier,weplottedthe randomlyintotraining(60%)andtest(40%)sets,runningthelearningalgorithmsontheinstances similarresults,butbothfarebetterthanthebuddsclassier.ratherthanpresentthecurves averagefalsepositiverateagainsttheaveragetruepositiverateoverthetenruns. Figure3presentstheROCcurvesforImages1and2.NaiveBayesandnearestneighborgive
Rather than present the curves for the remaining four images, we follow Swets (1988) and report, in Table 2, the area under each ROC curve, which we approximated by summing the areas of the trapezoids defined by each pair of adjacent points in the ROC curve. For all images except Image 6, naive Bayes produced curves with areas greater than those for the Budds classifier, thus generally supporting our research hypothesis. On Images 4, 5, and 6, nearest neighbor did worse than the handcrafted method, which runs counter to our prediction.

Table 2. Results for within-image experiments. For each image, we generated ROC curves by training and testing each method over a range of costs. We used the approximate area under the curve as the measure of performance, which appears with 95% confidence intervals. Naive Bayes performed best overall, with the Budds classifier outperforming nearest neighbor on three of the six images.

                     Approximate Area under ROC Curve
                   Image 1      Image 2      Image 3      Image 4      Image 5      Image 6
Naive Bayes        0.870±0.008  0.812±0.017  0.962±0.013  0.908±0.025  0.869±0.016  0.835±0.025
Nearest Neighbor   0.823±0.019  0.833±0.016  0.911±0.010  0.801±0.028  0.819±0.027  0.739±0.017
Budds Classifier   0.717±0.009  0.773±0.004  0.899±0.015  0.901±0.007  0.833±0.021  0.849±0.010

7.2 Between-Image Learning

We geared our next set of experiments more toward the goals of image analysis. Recall that our motivating problem is the large number of images that the analyst must process. In order to alleviate this burden, we want to apply knowledge learned from some images to many other images. But we have already noted that several dimensions of variation pose problems for transferring such learned knowledge to new images. For example, one viewpoint of a given site can differ from other viewpoints of the same site in orientation or in angle from the perpendicular. Images taken at different times and images of different areas present similar issues.

We designed experiments to let us better understand how the knowledge learned from one image generalizes to other images that differ along such dimensions. Our hypothesis here was a refined version of the previous one: classifiers learned from one set of images would be more accurate on unseen images than handcrafted classifiers. However, we also expected that between-image learning would give lower accuracy than the within-image situation, since differences across images would make generalization more difficult.

One experiment focused on how the methods generalize over aspect. Recall from Table 1 that we had images from two aspects (i.e., nadir and oblique) and from three locations. This let us train the learning algorithms on an image from one aspect and test on an image from another aspect but from the same location. As an example, for the nadir aspect, we chose Image 1 and then tested on Image 2, which is an oblique image of the same location. We ran the algorithms in this manner using the images from each location, while varying their cost parameters and measuring their true positive and false positive rates. We then averaged these measures across the three locations and plotted the results as ROC curves, as shown in Figure 4. The areas under these curves and their 95% confidence intervals appear in Table 3.
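The area approximation used in Table 2 (summing the trapezoids defined by each pair of adjacent ROC points) is simple to state in code. A minimal sketch, with a toy point set rather than data from the experiments:

```python
def roc_area(points):
    """Approximate the area under an ROC curve by summing the trapezoids
    defined by each pair of adjacent (fpr, tpr) points."""
    pts = sorted(points)  # order by false positive rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# A toy curve running from (0, 0) to (1, 1):
points = [(0.0, 0.0), (0.2, 0.6), (0.5, 0.8), (1.0, 1.0)]
area = roc_area(points)  # about 0.72 for this toy curve
```

A chance-level classifier, whose curve is the diagonal from (0, 0) to (1, 1), gets an area of exactly 0.5 under this approximation, which is why areas well above 0.5 indicate useful discrimination.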
Figure 4. ROC curves for experiments that tested generalization over aspect. Left: For each location, we trained each method on the oblique image and tested the resulting concept descriptions on the nadir image. We plotted the average true positive and false positive rates. Right: We followed a similar methodology, except that we trained the methods on the nadir images and tested on the oblique images.

One obvious conclusion is that the nadir images appear to pose an easier problem than the oblique images, since the curves for testing on nadir candidates are generally higher than those for testing on data from oblique images. For example, Table 3 shows that naive Bayes generates a curve with an area of 0.878 for the nadir images, but produces a curve with an area of 0.842 for the oblique images. The other two methods show a similar degradation in performance when generalizing from nadir to oblique images rather than from oblique to nadir.

Upon comparing the behavior of the different methods, we find that, for oblique-to-nadir generalization, naive Bayes (with an area under the ROC curve of 0.878) performs better than the Budds classifier, with an area of 0.837, which in turn did better than nearest neighbor (0.795). For nadir-to-oblique generalization, naive Bayes performs slightly better than the Budds classifier, the two producing areas of 0.842 and 0.831, respectively. Nearest neighbor's curve in this situation covers an area of 0.785, which is considerably smaller.

A second experiment examined generalization over location. To this end, we trained the learning methods on pairs of images from one aspect and tested on the third image from the same aspect. As an example, for the nadir images, one of the three learning runs involved training on rooftop candidates from Images 1 and 3, then testing on candidates from Image 5. We then ran each of the algorithms across a range of costs, measuring the false positive and true positive rates. We plotted the averages of these measures across all three learning runs for one aspect in an ROC curve, as shown in Figure 5.

Comparing the behavior of the various methods, Table 3 shows that, for the nadir aspect, naive Bayes performs slightly better than the Budds classifier, the two giving areas of 0.901 and 0.837. As before, both did better than nearest neighbor, which yielded an area of 0.819 under its ROC curve. When generalizing over location on the oblique images, naive Bayes and the Budds classifier produced ROC curves with equal areas of 0.831. These were considerably better than nearest neighbor's, which had an area of 0.697. In this context, we again see evidence that the oblique images presented a more difficult recognition task than the nadir images, since the oblique areas are less than those for the nadir images.
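The train-on-two, test-on-the-third protocol for one aspect can be sketched as below. The image identifiers and placeholder data are hypothetical stand-ins for the actual candidate sets:

```python
from itertools import combinations

# Hypothetical per-image candidate sets for one aspect (e.g., nadir
# Images 1, 3, and 5); each list stands in for (features, label) data.
images = {1: ["data1"], 3: ["data3"], 5: ["data5"]}

def location_runs(images):
    """Yield (training data, test data) pairs: train on each pair of
    images and test on the held-out third, as in the location experiment."""
    for train_ids in combinations(sorted(images), 2):
        test_id = next(i for i in images if i not in train_ids)
        train = [x for i in train_ids for x in images[i]]
        yield train, images[test_id]

runs = list(location_runs(images))  # three runs, one per held-out image
```

Averaging the true and false positive rates across the three runs, at each cost setting, yields one ROC curve per aspect, as plotted in Figure 5.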
Figure 5. ROC curves for experiments that tested generalization over location. Left: For each pair of images for the nadir aspect, we trained the methods on that pair and tested the resulting concept descriptions on the third image. We then plotted the average true positive and false positive rates. Right: We applied the same methodology using the images for the oblique aspect.

Thus, the results with the naive Bayesian classifier support our main hypothesis. In all experimental conditions this method fared better than or equal to the Budds linear classifier. On the other hand, nearest neighbor typically gave worse results than the handcrafted rooftop detector, which went against our original expectations.

Recall that we also anticipated that generalizing across images would give lower accuracies than generalizing within images. To test this hypothesis, we must compare the results from these experiments with those from the within-image experiments (see Table 3). Simple calculation shows that, for the within-image condition (Table 2), naive Bayes produced an average ROC area of 0.900 for the nadir images and 0.851 for the oblique images. Similarly, nearest neighbor averaged 0.851 for the nadir images and 0.791 for the oblique images. Most of these areas are substantially higher than the analogous areas that resulted when these methods generalized across location and aspect. The one exception is that naive Bayes actually did equally well when generalizing over location for the nadir images, but the results generally support our prediction.

Also note that naive Bayes' performance degraded less than that of nearest neighbor when generalizing to unseen images. This can be seen by comparing the differences between each method's performance in the within-image condition and in the between-image conditions. For example, naive Bayes' average degradation in performance over all experimental conditions was 0.013, while nearest neighbor's was 0.047. This constitutes further evidence that naive Bayes is better suited for this domain, at least when operating over the nine features used in our experiments.

7.3 Learning from All Available Images

Our next study used all of the rooftop candidates generated from the six Fort Hood images, since we wanted to replicate our previous results in a situation similar to the one we envision in practice, which would draw on training cases from all images. Based on the earlier experiments, we anticipated that the naive Bayesian classifier would yield an ROC curve of greater area than those of the other methods.
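The degradation comparison can be recomputed directly from the average areas reported in Tables 2 and 3. A quick check, using the rounded values from the text (so the last digit may differ slightly):

```python
# Average ROC areas from the text: within-image condition versus the
# between-image conditions (aspect and location experiments, nadir and
# oblique test conditions), for naive Bayes (nb) and nearest neighbor (nn).
within = {"nb": (0.900 + 0.851) / 2, "nn": (0.851 + 0.791) / 2}
between = {"nb": (0.878 + 0.842 + 0.901 + 0.831) / 4,
           "nn": (0.795 + 0.785 + 0.819 + 0.697) / 4}

degradation = {m: within[m] - between[m] for m in within}
# naive Bayes loses about 0.013 in area; nearest neighbor about 0.047
```

Running the check confirms the roughly threefold difference in degradation between the two learned classifiers.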
Table 3. Results for between-image experiments. We again used the approximate area under the ROC curve as the measure of performance; approximate areas appear with 95% confidence intervals. Naive Bayes performed the best, while the Budds classifier generally outperformed nearest neighbor. The labels `Nadir' and `Oblique' indicate the testing condition. We derived analogous results for the within-image experiments by averaging the results for each condition.

                   Aspect Experiment          Location Experiment        Within Image
                   Nadir        Oblique       Nadir        Oblique       Nadir        Oblique
Naive Bayes        0.878±0.042  0.842±0.063   0.901±0.079  0.831±0.067   0.900±0.012  0.851±0.022
Nearest Neighbor   0.795±0.035  0.785±0.053   0.819±0.058  0.697±0.027   0.851±0.019  0.791±0.020
Budds Classifier   0.837±0.085  0.831±0.068   0.837±0.085  0.831±0.068   0.837±0.085  0.831±0.068

Combining the rooftop candidates from all six images gave us 17,829 instances, 781 labeled positive and 17,048 labeled negative. We ran each algorithm ten times over a range of costs. For each run and set of cost parameters, we randomly split the data into training (60%) and testing (40%) sets, then averaged the results for each cost level over its ten runs.

Figure 6 shows the resulting ROC curves, which plot the true positive and false positive rates, whereas Table 4 gives the approximate area under these curves. As anticipated, naive Bayes performed the best overall, producing a curve with area 0.850. Nearest neighbor fared slightly better than the Budds classifier, yielding an area of 0.801, compared to 0.787 for the latter.

In practice, image analysts will not evaluate a classifier's performance using area under the ROC curve but, rather, will have specific error costs in mind, even if they cannot state them formally. We have used ROC curves because we do not know these costs in advance, but we can inspect the behavior of the various classifiers at different points on these curves to gain further insight into how much the learned classifiers are likely to aid analysts during actual use.

For example, consider the behavior of the naive Bayesian classifier when it achieves a true positive rate of 0.84 and a false positive rate of 0.27, the third diamond from the right in Figure 6. To obtain the same true positive rate, the Budds classifier produced a 0.62 false positive rate. This means that, for a given true positive rate, naive Bayes reduced the false positive rate by more than half relative to the handcrafted classifier. Hence, for the images we considered, the naive Bayesian classifier would have rejected 5,969 more non-rooftops than Budds. Similarly, by fixing the false positive rate, naive Bayes improved the true positive rate by 0.12 over the Budds classifier. In this case, the Bayesian classifier would have found 86 more rooftops than Budds would have detected.

7.4 Rates of Learning

We were also interested in the behavior of the learning methods as they processed increasing amounts of training data. Our long-term goal is to embed the learned classifier in an interactive system that supports an image analyst. For this reason, we would prefer a learning algorithm that achieves high accuracy from relatively few training cases, since this should reduce the load on the human analyst.
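The operating-point comparison amounts to simple arithmetic on the reported rates. The sketch below uses the rounded rates quoted in the text, so it lands slightly below the count of 5,969 obtained from the unrounded rates:

```python
# Comparing classifiers at a fixed operating point rather than by area.
negatives = 17_048               # non-rooftop candidates across all images
fpr_budds, fpr_nb = 0.62, 0.27   # at a shared true positive rate of 0.84

# Extra non-rooftops rejected by naive Bayes at the same true positive rate;
# about 5,967 when computed from these rounded rates.
extra_rejected = negatives * (fpr_budds - fpr_nb)
```

Fixing an operating point in this way answers the analyst's practical question (how many spurious candidates disappear at a given detection rate) more directly than total curve area does.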
Figure 6. ROC curves for the experiment using all available image data. We ran each method over a range of costs using a training set (60%) and a testing set (40%) and averaged the true positive and false positive rates over ten runs. Naive Bayes produced the curve with the largest area, but nearest neighbor also yielded a curve larger in area than that for the Budds classifier.

To this end, we carried out a final experiment in which we systematically varied the number of training cases available to the learning methods. We again used all of the available rooftop candidates, splitting the data into training (60%) and test (40%) sets, but further dividing the training set randomly into ten subsets (10%, 20%, ..., 100%). We ran the learning algorithms on each of the training subsets and evaluated the acquired concept descriptions on the reserved testing data, averaging our results over 25 separate training/test splits.

Figure 7 shows the resulting learning curves, each point of which corresponds to the average area under the ROC curves for a given number of training cases. As expected, the learning curve for the Budds classifier is flat, since it involves no training and we simply applied it to the same test set for each number of training cases. Nearest neighbor, however, produces a curve that starts below that of the Budds classifier and then surpasses it after seeing 70% of the training data. Naive Bayes shows similar improvement with increasing amounts of training data, but its performance was better than the Budds classifier's from the start, after observing only 10% of the training data. This equates to roughly 6% of the available data and is less than the amount of data derived from one image. Not only was naive Bayes the best performing method, but it was also able to achieve this performance using very little of the available training data.
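The learning-curve protocol (a fixed 40% test set, with the training portion subsampled at increasing fractions) can be sketched as follows. The data, function name, and seed are illustrative, not taken from the experiments:

```python
import random

def learning_curve_splits(data, fractions, seed=0):
    """Sketch of a learning-curve protocol: hold out 40% of the data for
    testing, then train on growing random fractions of the remaining 60%."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(0.6 * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]
    for f in fractions:
        # Prefixes of the shuffled training set give nested random subsets.
        yield train[: int(f * len(train))], test

data = list(range(100))  # placeholder for the rooftop candidates
splits = list(learning_curve_splits(data, [0.1 * k for k in range(1, 11)]))
```

Because the subsets are nested prefixes of one shuffle, each larger training set contains the smaller ones, so any change in area under the curve reflects added data rather than a fresh sample.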
7.5 Summary

From the within-image experiments, in which we trained and tested the learning methods using data derived from the same image, it was apparent that at least one machine learning method, naive Bayes, showed promise of improving the rooftop detection task over the handcrafted linear classifier. The results from this experiment also established baseline performance conditions for the learning methods, because they controlled for differences in aspect and location.

In an effort to test the learning methods for their ability to generalize to unseen images, we found that rooftop detection for oblique images posed a more difficult problem than for nadir images. This could be because Budds was initially developed using nadir images and then extended to handle oblique images; thus, the features may be biased toward nadir-view rooftops. A more likely explanation is that oblique images are simply harder than nadir images. Nevertheless, under all circumstances, the performance of naive Bayes was equal to or better than that of the handcrafted linear classifier. Finally, we also discovered that the performance of the methods degraded when generalizing to unseen images, but that the performance of naive Bayes degraded less than that of nearest neighbor.

Our final experiment used all of the available image data for learning and demonstrated that naive Bayes and nearest neighbor outperformed the Budds classifier. Further analysis of specific points on the ROC curves revealed that naive Bayes improved upon the false positive rate of the handcrafted solution by more than a factor of two for true positive rates of 0.84 and higher. Learning curves demonstrated that naive Bayes achieved superior performance using very little of the available training data.

Table 4. Results for the experiment using all of the image data. We split the data into training (60%) and test (40%) sets and ran each method over a range of costs. We then computed the average area under the ROC curve and 95% confidence intervals over ten runs.

Classifier          Approximate Area
Naive Bayes         0.850±0.008
Nearest Neighbor    0.801±0.008
Budds Classifier    0.787±0.008

8. Related Work

Research on learning in computer vision has become increasingly common in recent years. Some work in visual learning takes an image-based approach (e.g., Beymer & Poggio 1996), in which the images themselves, usually normalized or transformed in some way, are used as input to a learning process that is responsible for forming the intermediate representations necessary to transform the pixels into a decision or classification. Researchers have used this approach extensively for face and gesture recognition (e.g., Chan, Nasrabadi, & Mirelli 1996; Gutta et al. 1996; Osuna, Freund, & Girosi 1997; Segen 1994), although it has seen other applications as well (e.g., Nayar & Poggio 1996; Pomerleau 1996; Viola 1993).
A slightly different approach relies on handcrafted vision routines to extract relevant image features, based on intensity or shape properties, and then learns to recognize the desired objects from these machine-extracted features. Shepherd (1983) used decision-tree induction to classify shapes of chocolates for an industrial vision application. Cromwell and Kak (1991) took a similar approach for recognizing electrical components, such as transistors, resistors, and capacitors. Maloof and Michalski (1997) examined various methods of learning shape characteristics for detecting blasting caps in X-ray images, whereas additional work (Maloof et al. 1996) discussed learning in a multi-step vision system for the same detection problem.

Several researchers have also investigated learning for three-dimensional vision systems. Papers by Conklin (1993), Connell and Brady (1987), Cook et al. (1993), Provan, Langley, and Binford
(1996), and Sengupta and Boyer (1993) all describe inductive approaches aimed at improving object recognition. The aim here is to learn the three-dimensional structure that characterizes an object or object class, rather than its appearance. Another line of research, which falls midway between this approach and image-based schemes, instead attempts to learn a small set of characteristic views, each of which can be used to recognize an object from a different perspective (e.g., Gros 1993; Pope & Lowe 1996).

Figure 7. Learning curves for area under the ROC curve using all available image data. We ran each method on increasing amounts of training data and evaluated the resulting concept descriptions on reserved testing data. Each point is an average of ten runs.

Most work on visual learning ignores the importance of misclassification costs, but our work along these lines has some precedents. In particular, Draper, Brodley, and Utgoff (1994) incorporate the cost of errors into their algorithm for constructing and pruning multivariate decision trees. They tested this approach on the task of labeling pixels from outdoor images for use by a road-following vehicle. They determined that, in this context, labeling a road pixel as non-road was more costly than the reverse, and showed experimentally that their method could reduce such errors on novel test pixels. Woods, Bowyer, and Kegelmeyer (1996), as well as Rowley, Baluja, and Kanade (1996), report similar work that takes into account the cost of errors.

Much of the research on visual learning uses images of scenes or objects viewed at eye level (e.g., Draper 1997; Teller & Veloso 1997). One exception is Connell and Brady's (1987) work on learning structural descriptions of airplanes from aerial views. Their method converted training images into semantic networks, which it then generalized by comparing them to descriptions of other instances. However, the authors do not appear to have tested experimentally their algorithm's ability to classify objects in new images accurately. Another example is the SKICAT system (Fayyad et al. 1996), which catalogs celestial objects, such as galaxies and stars, using images from the Second Palomar Observatory Sky Survey.

A related system, JARTool (Fayyad et al. 1996), also analyzes aerial images, in this case to detect Venusian volcanos using synthetic aperture radar from the Magellan spacecraft. Asker and Maclin (1997) extend JARTool by using an ensemble of 48 neural networks to improve performance. Using ROC curves, they demonstrate that the ensemble achieved better performance than either the individual learned classifiers or the one used originally in JARTool. They also document some
of the difficulties associated with applying machine learning techniques to real-world problems, such as feature selection and instance labeling, which were similar to the problems we encountered.

Finally, Draper (1996) reports a careful study of learning in the context of analyzing aerial images. His approach adapts methods for reinforcement learning to assign credit in a multi-stage recognition procedure (for software similar to Budds), then uses an induction method (backpropagation in neural networks) to learn conditions on operator selection. He presents initial results on a RADIUS task that also involves the detection of roofs. Our framework shares some features with Draper's approach, but assumes that learning is directed by feedback from a human expert. We predict that our supervised method will be more computationally tractable than his use of reinforcement learning, which is well known for its high complexity. Our approach does require more interaction with users, but we believe this interaction will be unobtrusive if cast within the context of an advisory system for image analysis.

9. Concluding Remarks

Although this study has provided some insight into the role of machine learning in image analysis, much still remains to be done. For example, we may want to consider other measures of performance that take into account the presence of multiple valid candidates for a given rooftop; classifying one of these candidates correctly is sufficient for the purposes of image analysis.

In addition, although the rooftop selection stage was a natural place to start in applying our methods, we intend to work at both earlier and later levels of the building detection process. The goal here is not only to increase classification accuracy, which could be handled entirely by candidate selection, but also to reduce the complexity of processing by removing poor candidates before they are aggregated into larger structures. With this aim in mind, we plan to extend our work to all levels of the image understanding process. We must address a number of issues before we can make progress on these other stages. One involves identifying the cost of different errors at each level and taking this into account in our modified induction algorithms. Another concerns whether we should use the same induction algorithm at each level or different methods at each stage.

As we mentioned earlier, in order to automate the collection of training data for learning, we also hope to integrate learning routines into Budds. This system was not designed initially to be interactive, but we intend to modify it so that the image analyst can accept or reject recommendations made by the image understanding system, generating training data in the process. At intervals, the system would invoke its learning algorithms, producing revised knowledge that would alter the system's behavior in the future and, hopefully, reduce the user's need to make corrections. The interactive labeling system described in Section 5 could serve as an initial model for this interface.

In conclusion, our studies suggest that machine learning has an important role to play in improving the accuracy, and thus the robustness, of image analysis systems. However, we need additional experiments to better understand the factors affecting between-image generalization, and we need to extend learning to additional levels of the image understanding process. Also, before we can build a system that truly aids the human image analyst, we must further develop unobtrusive ways to collect training data to support learning.
Acknowledgements

The authors thank Ram Nevatia, Andres Huertas, and Andy Lin for their assistance in obtaining the images and data used for experimentation and for providing valuable comments and advice. We would also like to thank Dan Shapiro for discussions about decision theory and Wayne Iba for his assistance with naive Bayes. This work was conducted at the Institute for the Study of Learning and Expertise and in the Computational Learning Laboratory, Center for the Study of Language and Information, at Stanford University. The research was supported by the Defense Advanced Research Projects Agency, under grant N00014-94-1-0746, administered by the Office of Naval Research, and by Sun Microsystems through a generous equipment grant.

References

Aha, D.; Kibler, D.; and Albert, M. 1991. Instance-based learning algorithms. Machine Learning 6:37–66.

Asker, L., and Maclin, R. 1997. Feature engineering and classifier selection: a case study in Venusian volcano detection. In Proceedings of the Fourteenth International Conference on Machine Learning, 3–11. San Francisco, CA: Morgan Kaufmann.

Beymer, D., and Poggio, T. 1996. Image representations for visual learning. Science 272:1905–1909.

Bradley, A. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30:1145–1159.

Breiman, L.; Friedman, J.; Olshen, R.; and Stone, C. 1984. Classification and Regression Trees. Belmont, CA: Wadsworth.

Cardie, C., and Howe, N. 1997. Improving minority class prediction using case-specific feature weights. In Proceedings of the Fourteenth International Conference on Machine Learning, 57–65. San Francisco, CA: Morgan Kaufmann.

Chan, L.; Nasrabadi, N.; and Mirelli, V. 1996. Multi-stage target recognition using modular vector quantizers and multilayer perceptrons. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 114–119. Los Alamitos, CA: IEEE Press.

Clark, P., and Niblett, T. 1989. The CN2 induction algorithm. Machine Learning 3:261–284.

Conklin, D. 1993. Transformation-invariant indexing and machine discovery for computer vision. In Working Notes of the AAAI Fall Symposium on Machine Learning in Computer Vision, 10–14. Menlo Park, CA: AAAI Press.

Connell, J., and Brady, M. 1987. Generating and generalizing models of visual objects. Artificial Intelligence 31:159–183.

Cook, D.; Hall, L.; Stark, L.; and Bowyer, K. 1993. Learning combination of evidence functions in object recognition. In Working Notes of the AAAI Fall Symposium on Machine Learning in Computer Vision, 139–143. Menlo Park, CA: AAAI Press.

Cromwell, R., and Kak, A. 1991. Automatic generation of object class descriptions using symbolic learning techniques. In Proceedings of the Ninth National Conference on Artificial Intelligence, 710–717.
Draper, B.; Brodley, C.; and Utgoff, P. 1994. Goal-directed classification using linear machine decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence 16(9):888–893.

Draper, B. 1996. Learning grouping strategies for 2D and 3D object recognition. In Proceedings of the Image Understanding Workshop, 1447–1454. San Francisco, CA: Morgan Kaufmann.

Draper, B. 1997. Learning control strategies for object recognition. In Ikeuchi, K., and Veloso, M., eds., Symbolic Visual Learning. New York, NY: Oxford University Press. 49–76.

Egan, J. 1975. Signal Detection Theory and ROC Analysis. New York, NY: Academic Press.

Ezawa, K.; Singh, M.; and Norton, S. 1996. Learning goal-oriented Bayesian networks for telecommunications risk management. In Proceedings of the Thirteenth International Conference on Machine Learning, 139–147. San Francisco, CA: Morgan Kaufmann.

Fawcett, T., and Provost, F. 1997. Adaptive fraud detection. Data Mining and Knowledge Discovery 1:291–316.

Fayyad, U.; Smyth, P.; Burl, M.; and Perona, P. 1996. Learning to catalog science images. In Nayar, S., and Poggio, T., eds., Early Visual Learning. New York, NY: Oxford University Press. 237–268.

Firschein, O., and Strat, T., eds. 1997. RADIUS: Image Understanding for Imagery Intelligence. San Francisco, CA: Morgan Kaufmann.

Freund, Y., and Schapire, R. 1996. Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning, 148–156. San Francisco, CA: Morgan Kaufmann.

Freund, Y.; Seung, H.; Shamir, E.; and Tishby, N. 1997. Selective sampling using the Query by Committee algorithm. Machine Learning 28:133–168.

Green, D., and Swets, J. 1974. Signal Detection Theory and Psychophysics. New York, NY: Robert E. Krieger Publishing.

Gros, P. 1993. Matching and clustering: two steps towards automatic object model generation in computer vision. In Working Notes of the AAAI Fall Symposium on Machine Learning in Computer Vision, 40–44. Menlo Park, CA: AAAI Press.

Gutta, S.; Huang, J.; Imam, I.; and Weschler, H. 1996. Face and hand gesture recognition using hybrid classifiers. In Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, 164–169. Los Alamitos, CA: IEEE Press.

Hanley, J., and McNeil, B. 1982. The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology 143:29–36.

Kubat, M., and Matwin, S. 1997. Addressing the curse of imbalanced training sets: one-sided selection. In Proceedings of the Fourteenth International Conference on Machine Learning, 179–186. San Francisco, CA: Morgan Kaufmann.

Langley, P., and Simon, H. 1995. Applications of machine learning and rule induction. Communications of the ACM 38:55–64.

Langley, P.; Iba, W.; and Thompson, K. 1992. An analysis of Bayesian classifiers. In Proceedings of the Tenth National Conference on Artificial Intelligence, 223–228. Menlo Park, CA: AAAI Press.
Lewis, D., and Catlett, J. 1994. Heterogeneous uncertainty sampling for supervised learning. In Proceedings of the Eleventh International Conference on Machine Learning, 148–156. San Francisco, CA: Morgan Kaufmann.

Lin, C., and Nevatia, R. 1996. Building detection and description from monocular aerial images. In Proceedings of the Image Understanding Workshop, 461–468. San Francisco, CA: Morgan Kaufmann.

Maloof, M., and Michalski, R. 1997. Learning symbolic descriptions of shape for object recognition in X-ray images. Expert Systems with Applications 12:11–20.

Maloof, M.; Langley, P.; Sage, S.; and Binford, T. 1997. Learning to detect rooftops in aerial images. In Proceedings of the Image Understanding Workshop, 835–845. San Francisco, CA: Morgan Kaufmann.

Maloof, M.; Duric, Z.; Michalski, R.; and Rosenfeld, A. 1996. Recognizing blasting caps in X-ray images. In Proceedings of the Image Understanding Workshop, 1257–1261. San Francisco, CA: Morgan Kaufmann.

Michalski, R.; Mozetic, I.; Hong, J.; and Lavrac, N. 1986. The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In Proceedings of the Fifth National Conference on Artificial Intelligence, 1041–1045. Menlo Park, CA: AAAI Press.

Nayar, S., and Poggio, T., eds. 1996. Early Visual Learning. New York, NY: Oxford University Press.

Osuna, E.; Freund, R.; and Girosi, F. 1997. Training support vector machines: an application to face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 130–136. Los Alamitos, CA: IEEE Press.

Pazzani, M.; Merz, C.; Murphy, P.; Ali, K.; Hume, T.; and Brunk, C. 1994. Reducing misclassification costs. In Proceedings of the Eleventh International Conference on Machine Learning, 217–225. San Francisco, CA: Morgan Kaufmann.

Pomerleau, D. 1996. Neural network vision for robot driving. In Nayar, S., and Poggio, T., eds., Early Visual Learning. New York, NY: Oxford University Press. 161–181.
Pope, A., and Lowe, D. 1996. Learning probabilistic appearance models for object recognition. In Nayar, S., and Poggio, T., eds., Early Visual Learning. New York, NY: Oxford University Press. 67–97.

Provan, G.; Langley, P.; and Binford, T. 1996. Probabilistic learning of three-dimensional object models. In Proceedings of the Image Understanding Workshop, 1403–1413. San Francisco, CA: Morgan Kaufmann.

Provost, F., and Fawcett, T. 1997. Analysis and visualization of classifier performance: comparison under imprecise class and cost distributions. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, 43–48. Menlo Park, CA: AAAI Press.

Quinlan, J. 1993. C4.5: Programs for Machine Learning. San Francisco, CA: Morgan Kaufmann.

Rowley, H.; Baluja, S.; and Kanade, T. 1996. Neural network-based face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 203–208. Los Alamitos, CA: IEEE Press.
Segen, J. 1994. GEST: a learning computer vision system that recognizes hand gestures. In Michalski, R., and Tecuci, G., eds., Machine Learning: A Multistrategy Approach, volume 4. San Francisco, CA: Morgan Kaufmann. 621–634.

Sengupta, K., and Boyer, K. 1993. Incremental model base updating: learning new model sites. In Working Notes of the AAAI Fall Symposium on Machine Learning in Computer Vision, 1–5. Menlo Park, CA: AAAI Press.

Shepherd, B. 1983. An appraisal of a decision tree approach to image classification. In IJCAI-83, 473–475.

Soderland, S., and Lehnert, W. 1994. Corpus-driven knowledge acquisition for discourse analysis. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 827–832.

Swets, J. 1988. Measuring the accuracy of diagnostic systems. Science 240:1285–1293.

Teller, A., and Veloso, M. 1997. PADO: a new learning architecture for object recognition. In Ikeuchi, K., and Veloso, M., eds., Symbolic Visual Learning. New York, NY: Oxford University Press. 77–112.

Turney, P. 1995. Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research 2:369–409.

Viola, P. 1993. Feature-based recognition of objects. In Working Notes of the AAAI Fall Symposium on Machine Learning in Computer Vision, 60–64. Menlo Park, CA: AAAI Press.

Woods, K.; Bowyer, K.; and Kegelmeyer, W. 1996. Combination of multiple classifiers using local accuracy estimates. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 391–396. Los Alamitos, CA: IEEE Press.

Zurada, J. 1992. Introduction to Artificial Neural Systems. St. Paul, MN: West Publishing.