Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach

Paul E. Debevec    Camillo J. Taylor    Jitendra Malik
University of California at Berkeley 1

ABSTRACT

We present a new approach for modeling and rendering existing architectural scenes from a sparse set of still photographs. Our modeling approach, which combines both geometry-based and image-based techniques, has two components. The first component is a photogrammetric modeling method which facilitates the recovery of the basic geometry of the photographed scene. Our photogrammetric modeling approach is effective, convenient, and robust because it exploits the constraints that are characteristic of architectural scenes. The second component is a model-based stereo algorithm, which recovers how the real scene deviates from the basic model. By making use of the model, our stereo technique robustly recovers accurate depth from widely-spaced image pairs. Consequently, our approach can model large architectural environments with far fewer photographs than current image-based modeling approaches. For producing renderings, we present view-dependent texture mapping, a method of compositing multiple views of a scene that better simulates geometric detail on basic models. Our approach can be used to recover models for use in either geometry-based or image-based rendering systems. We present results that demonstrate our approach's ability to create realistic renderings of architectural scenes from viewpoints far from the original photographs.

CR Descriptors: I.2.10 [Artificial Intelligence]: Vision and Scene Understanding - Modeling and recovery of physical attributes; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Color, shading, shadowing, and texture; I.4.8 [Image Processing]: Scene Analysis - Stereo; J.6 [Computer-Aided Engineering]: Computer-aided design (CAD).

1 INTRODUCTION

Efforts to model the appearance and dynamics of the real world have produced some of the most compelling imagery in computer graphics.
In particular, efforts to model architectural scenes, from the Amiens Cathedral to the Giza Pyramids to Berkeley's Soda Hall, have produced impressive walk-throughs and inspiring fly-bys. Clearly, it is an attractive application to be able to explore the world's architecture unencumbered by fences, gravity, customs, or jetlag.

1 Computer Science Division, University of California at Berkeley, Berkeley, CA; see also debevec/research

Figure 1: Schematic of how our hybrid approach combines geometry-based and image-based approaches to modeling and rendering architecture from photographs. (a) Geometry-based: user input drives a modeling program to produce a model, which a rendering algorithm converts (with texture maps) into renderings. (b) Hybrid approach: images plus user input drive a photogrammetric modeling program to produce a basic model; model-based stereo then yields depth maps, and image warping produces renderings. (c) Image-based: images (with some user input) undergo stereo correspondence to produce depth maps, and image warping produces renderings.

Unfortunately, current geometry-based methods (Fig. 1a) of modeling existing architecture, in which a modeling program is used to manually position the elements of the scene, have several drawbacks. First, the process is extremely labor-intensive, typically involving surveying the site, locating and digitizing architectural plans (if available), or converting existing CAD data (again, if available). Second, it is difficult to verify whether the resulting model is accurate. Most disappointing, though, is that the renderings of the resulting models are noticeably computer-generated; even those that employ liberal texture-mapping generally fail to resemble real photographs.

Recently, creating models directly from photographs has received increased interest in computer graphics. Since real images are used as input, such an image-based system (Fig. 1c) has an advantage in producing photorealistic renderings as output. Some of the most promising of these systems [16, 13] rely on the computer vision technique of computational stereopsis to automatically determine the structure of the scene from the multiple photographs available.
As a consequence, however, these systems are only as strong as the underlying stereo algorithms. This has caused problems because state-of-the-art stereo algorithms have a number of significant weaknesses; in particular, the photographs need to appear very similar for reliable results to be obtained. Because of this, current image-based techniques must use many closely spaced images, and in some cases employ significant amounts of user input for each image pair to supervise the stereo algorithm. In this framework, capturing the data for a realistically renderable model would require an impractical number of closely spaced photographs, and deriving the depth from the photographs could require an impractical amount of user input. These concessions to the weakness of stereo algorithms bode poorly for creating large-scale, freely navigable virtual environments from photographs.

Our research aims to make the process of modeling architectural scenes more convenient, more accurate, and more photorealistic than the methods currently available. To do this, we have developed a new approach that draws on the strengths of both geometry-based and image-based methods, as illustrated in Fig. 1b. The result is that our approach to modeling and rendering architecture requires only a sparse set of photographs and can produce realistic renderings from arbitrary viewpoints. In our approach, a basic geometric model of the architecture is recovered interactively with an easy-to-use photogrammetric modeling system, novel views are created using view-dependent texture mapping, and additional geometric detail can be recovered automatically through stereo correspondence. The final images can be rendered with current image-based rendering techniques. Because only photographs are required, our approach to modeling architecture is neither invasive nor does it require architectural plans, CAD models, or specialized instrumentation such as surveying equipment, GPS sensors, or range scanners.

1.1 Background and Related Work

The process of recovering 3D structure from 2D images has been a central endeavor within computer vision, and the process of rendering such recovered structures is a subject receiving increased interest in computer graphics. Although no general technique exists to derive models from images, four particular areas of research have provided results that are applicable to the problem of modeling and rendering architectural scenes. They are: Camera Calibration, Structure from Motion, Stereo Correspondence, and Image-Based Rendering.

Camera Calibration

Recovering 3D structure from images becomes a simpler problem when the cameras used are calibrated, that is, the mapping between image coordinates and directions relative to each camera is known. This mapping is determined by, among other parameters, the camera's focal length and its pattern of radial distortion. Camera calibration is a well-studied problem both in photogrammetry and computer vision; some successful methods include [20] and [5].
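The calibrated mapping described above can be sketched in a few lines. This is a minimal illustration of what "calibrated" means here, assuming a pinhole model with a single radial-distortion coefficient; the cited calibration methods recover such parameters, but this exact form and the function name are assumptions, not taken from the paper.

```python
import math

def pixel_to_ray(u, v, f, cx, cy, k1=0.0):
    """Unit ray direction (camera frame, +z forward) for pixel (u, v).

    Illustrative pinhole-plus-radial-distortion model: f is the focal
    length in pixels, (cx, cy) the principal point, k1 a first-order
    radial distortion coefficient.
    """
    x = (u - cx) / f          # normalized image coordinates
    y = (v - cy) / f
    r2 = x * x + y * y
    x /= (1.0 + k1 * r2)      # approximate radial undistortion
    y /= (1.0 + k1 * r2)
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)
```

The paper later places the image plane at z = -f; the sign convention above is cosmetic and does not affect the idea that calibration makes the pixel-to-direction mapping known.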
While there has been recent progress in the use of uncalibrated views for 3D reconstruction [7], we have found camera calibration to be a straightforward process that considerably simplifies the problem.

Structure from Motion

Given the 2D projection of a point in the world, its position in 3D space could be anywhere on a ray extending out in a particular direction from the camera's optical center. However, when the projections of a sufficient number of points in the world are observed in multiple images from different positions, it is theoretically possible to deduce the 3D locations of the points as well as the positions of the original cameras, up to an unknown factor of scale.

This problem has been studied in the area of photogrammetry for the principal purpose of producing topographic maps. In 1913, Kruppa [10] proved the fundamental result that given two views of five distinct points, one could recover the rotation and translation between the two camera positions as well as the 3D locations of the points (up to a scale factor). Since then, the problem's mathematical and algorithmic aspects have been explored starting from the fundamental work of Ullman [21] and Longuet-Higgins [11], in the early 1980s. Faugeras's book [6] overviews the state of the art as of 1992. So far, a key realization has been that the recovery of structure is very sensitive to noise in image measurements when the translation between the available camera positions is small.

Attention has turned to using more than two views with image stream methods such as [19] or recursive approaches (e.g. [1]). [19] shows excellent results for the case of orthographic cameras, but direct solutions for the perspective case remain elusive. In general, linear algorithms for the problem fail to make use of all available information while nonlinear minimization methods are prone to difficulties arising from local minima in the parameter space. An alternative formulation of the problem [17] uses lines rather than points as image measurements, but the previously stated concerns were shown to remain largely valid.
For purposes of computer graphics, there is yet another problem: the models recovered by these algorithms consist of sparse point fields or individual line segments, which are not directly renderable as solid 3D models.

In our approach, we exploit the fact that we are trying to recover geometric models of architectural scenes, not arbitrary three-dimensional point sets. This enables us to include additional constraints not typically available to structure from motion algorithms and to overcome the problems of numerical instability that plague such approaches. Our approach is demonstrated in a useful interactive system for building architectural models from photographs.

Stereo Correspondence

The geometrical theory of structure from motion assumes that one is able to solve the correspondence problem, which is to identify the points in two or more images that are projections of the same point in the world. In humans, corresponding points in the two slightly differing images on the retinas are determined by the visual cortex in the process called binocular stereopsis.

Years of research (e.g. [2, 4, 8, 9, 12, 15]) have shown that determining stereo correspondences by computer is a difficult problem. In general, current methods are successful only when the images are similar in appearance, as in the case of human vision, which is usually obtained by using cameras that are closely spaced relative to the objects in the scene. When the distance between the cameras (often called the baseline) becomes large, surfaces in the images exhibit different degrees of foreshortening, different patterns of occlusion, and large disparities in their locations in the two images, all of which makes it much more difficult for the computer to determine correct stereo correspondences. Unfortunately, the alternative of improving stereo correspondence by using images taken from nearby locations has the disadvantage that computing depth becomes very sensitive to noise in image measurements.

In this paper, we show that having an approximate model of the photographed scene makes it possible to robustly determine stereo correspondences from images taken from widely varying viewpoints.
Specifically, the model enables us to warp the images to eliminate unequal foreshortening and to predict major instances of occlusion before trying to find correspondences.

Image-Based Rendering

In an image-based rendering system, the model consists of a set of images of a scene and their corresponding depth maps. When the depth of every point in an image is known, the image can be re-rendered from any nearby point of view by projecting the pixels of the image to their proper 3D locations and reprojecting them onto a new image plane. Thus, a new image of the scene is created by warping the images according to their depth maps. A principal attraction of image-based rendering is that it offers a method of rendering arbitrarily complex scenes with a constant amount of computation required per pixel. Using this property, [23] demonstrated how regularly spaced synthetic images (with their computed depth maps) could be warped and composited in real time to produce a virtual environment.

More recently, [13] presented a real-time image-based rendering system that used panoramic photographs with depth computed, in part, from stereo correspondence. One finding of the paper was that extracting reliable depth estimates from stereo is very difficult. The method was nonetheless able to obtain acceptable results for nearby views using user input to aid the stereo depth recovery: the correspondence map for each image pair was seeded with 100 to 500 user-supplied point correspondences and also post-processed. Even with user assistance, the images used still had to be closely spaced; the largest baseline described in the paper was five feet.

The requirement that samples be close together is a serious limitation to generating a freely navigable virtual environment. Covering the size of just one city block would require thousands of panoramic images spaced five feet apart. Clearly, acquiring so many photographs is impractical. Moreover, even a dense lattice of ground-based photographs would only allow renderings to be generated from within a few feet of the original camera level, precluding any virtual fly-bys of the scene. Extending the dense lattice of photographs into three dimensions would clearly make the acquisition process even more difficult. The approach described in this paper takes advantage of the structure in architectural scenes so that it requires only a sparse set of photographs. For example, our approach has yielded a virtual fly-around of a building from just twelve standard photographs.

1.2 Overview

In this paper we present three new modeling and rendering techniques: photogrammetric modeling, view-dependent texture mapping, and model-based stereo. We show how these techniques can be used in conjunction to yield a convenient, accurate, and photorealistic method of modeling and rendering architecture from photographs. In our approach, the photogrammetric modeling program is used to create a basic volumetric model of the scene, which is then used to constrain stereo matching. Our rendering method composites information from multiple images with view-dependent texture mapping. Our approach is successful because it splits the task of modeling from images into tasks which are easily accomplished by a person (but not a computer algorithm), and tasks which are easily performed by a computer algorithm (but not a person).

In Section 2, we present our photogrammetric modeling method. In essence, we have recast the structure from motion problem not as the recovery of individual point coordinates, but as the recovery of the parameters of a constrained hierarchy of parametric primitives.
The result is that accurate architectural models can be recovered robustly from just a few photographs and with a minimal number of user-supplied correspondences.

In Section 3, we present view-dependent texture mapping, and show how it can be used to realistically render the recovered model. Unlike traditional texture-mapping, in which a single static image is used to color in each face of the model, view-dependent texture mapping interpolates between the available photographs of the scene depending on the user's point of view. This results in more lifelike animations that better capture surface specularities and unmodeled geometric detail.

Lastly, in Section 4, we present model-based stereo, which is used to automatically refine a basic model of a photographed scene. This technique can be used to recover the structure of architectural ornamentation that would be difficult to recover with photogrammetric modeling. In particular, we show that projecting pairs of images onto an initial approximate model allows conventional stereo techniques to robustly recover very accurate depth measurements from images with widely varying viewpoints.

2 Photogrammetric Modeling

In this section we present our method for photogrammetric modeling, in which the computer determines the parameters of a hierarchical model of parametric polyhedral primitives to reconstruct the architectural scene. We have implemented this method in Façade, an easy-to-use interactive modeling program that allows the user to construct a geometric model of a scene from digitized photographs. We first overview Façade from the point of view of the user, then we describe our model representation, and then we explain our reconstruction algorithm. Lastly, we present results from using Façade to reconstruct several architectural scenes.

2.1 The User's View

Constructing a geometric model of an architectural scene using Façade is an incremental and straightforward process. Typically, the user selects a small number of photographs to begin with, and models the scene one piece at a time. The user may refine the model and include more images in the project until the model meets the desired level of detail.

Fig. 2(a) and (b) shows the two types of windows used in Façade: image viewers and model viewers. The user instantiates the components of the model, marks edges in the images, and corresponds the edges in the images to the edges in the model. When instructed, Façade computes the sizes and relative positions of the model components that best fit the edges marked in the photographs.

Components of the model, called blocks, are parameterized geometric primitives such as boxes, prisms, and surfaces of revolution. A box, for example, is parameterized by its length, width, and height. The user models the scene as a collection of such blocks, creating new block classes as desired. Of course, the user does not need to specify numerical values for the blocks' parameters, since these are recovered by the program.

The user may choose to constrain the sizes and positions of any of the blocks. In Fig. 2(b), most of the blocks have been constrained to have equal length and width. Additionally, the four pinnacles have been constrained to have the same shape. Blocks may also be placed in constrained relations to one another. For example, many of the blocks in Fig. 2(b) have been constrained to sit centered and on top of the block below. Such constraints are specified using a graphical 3D interface. When such constraints are provided, they are used to simplify the reconstruction problem.

The user marks edge features in the images using a point-and-click interface; a gradient-based technique as in [14] can be used to align the edges with sub-pixel accuracy. We use edge rather than point features since they are easier to localize and less likely to be completely obscured. Only a section of each edge needs to be marked, making it possible to use partially visible edges. For each marked edge, the user also indicates the corresponding edge in the model. Generally, accurate reconstructions are obtained if there are as many correspondences in the images as there are free camera and model parameters. Thus, Façade reconstructs scenes accurately even when just a portion of the visible edges are marked in the images, and when just a portion of the model edges are given correspondences.
At any time, the user may instruct the computer to reconstruct the scene. The computer then solves for the parameters of the model that cause it to align with the marked features in the images. During the reconstruction, the computer computes and displays the locations from which the photographs were taken. For simple models consisting of just a few blocks, a full reconstruction takes only a few seconds; for more complex models, it can take a few minutes. For this reason, the user can instruct the computer to employ faster but less precise reconstruction algorithms (see Sec. 2.4) during the intermediate stages of modeling.

To verify the accuracy of the recovered model and camera positions, Façade can project the model into the original photographs. Typically, the projected model deviates from the photographs by less than a pixel. Fig. 2(c) shows the results of projecting the edges of the model in Fig. 2(b) into the original photograph.

Lastly, the user may generate novel views of the model by positioning a virtual camera at any desired location. Façade will then use the view-dependent texture-mapping method of Section 3 to render a novel view of the scene from the desired location. Fig. 2(d) shows an aerial rendering of the tower model.

Figure 2: (a) A photograph of the Campanile, Berkeley's clock tower, with marked edges shown in green. (b) The model recovered by our photogrammetric modeling method. Although only the left pinnacle was marked, the remaining three (including one not visible) were recovered from symmetrical constraints in the model. Our method allows any number of images to be used, but in this case constraints of symmetry made it possible to recover an accurate 3D model from a single photograph. (c) The accuracy of the model is verified by reprojecting it into the original photograph through the recovered camera position. (d) A synthetic view of the Campanile generated using the view-dependent texture-mapping method described in Section 3. A real photograph from this position would be difficult to take since the camera position is 250 feet above the ground.

2.2 Model Representation

The purpose of our choice of model representation is to represent the scene as a surface model with as few parameters as possible: when the model has fewer parameters, the user needs to specify fewer correspondences, and the computer can reconstruct the model more efficiently. In Façade, the scene is represented as a constrained hierarchical model of parametric polyhedral primitives, called blocks. Each block has a small set of parameters which serve to define its size and shape. Each coordinate of each vertex of the block is then expressed as a linear combination of the block's parameters, relative to an internal coordinate frame. For example, for the wedge block in Fig. 3, the coordinates of the vertex P0 are written in terms of the block parameters width, height, and length as P0 = (-width, -height, length)^T. Each block is also given an associated bounding box.

Figure 3: A wedge block with its parameters (wedge_width, wedge_height, wedge_depth) and bounding box.

Figure 4: (a) A geometric model of a simple building, with blocks such as ground_plane, first_storey, entrance, and roof, linked by transformations g1(X) and g2(X). (b) The model's hierarchical representation. The nodes in the tree represent parametric primitives (called blocks) while the links contain the spatial relationships between the blocks.
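The vertex parameterization just described can be sketched concretely. This is an illustrative fragment, not Façade's actual code; the parameter values are made up, and only the single vertex P0 from the wedge example is shown.

```python
def wedge_vertex_p0(width, height, length):
    """Vertex P0 of the wedge block: P0 = (-width, -height, length)^T.

    Each coordinate is a linear combination of the block's parameters,
    expressed in the block's internal coordinate frame.
    """
    return (-width, -height, length)

# Equivalently, P0 = M_P0 applied to the parameter vector for a constant
# matrix M_P0; this linearity is what keeps the reconstruction tractable.
M_P0 = ((-1, 0, 0),
        ( 0, -1, 0),
        ( 0, 0, 1))

def apply(M, p):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(M[i][j] * p[j] for j in range(3)) for i in range(3))
```

For example, a wedge with (width, height, length) = (10, 4, 20) places P0 at (-10, -4, 20) by either route.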
The blocks in Façade are organized in a hierarchical tree structure as shown in Fig. 4(b). Each node of the tree represents an individual block, while the links in the tree contain the spatial relationships between blocks, called relations. Such hierarchical structures are also used in traditional modeling systems.

The relation between a block and its parent is most generally represented as a rotation matrix R and a translation vector t. This representation requires six parameters: three each for R and t. In architectural scenes, however, the relationship between two blocks usually has a simple form that can be represented with fewer parameters, and Façade allows the user to build such constraints on R and t into the model. The rotation R between a block and its parent can be specified in one of three ways: first, as an unconstrained rotation, requiring three parameters; second, as a rotation about a particular coordinate axis, requiring just one parameter; or third, as a fixed or null rotation, requiring no parameters.

Likewise, Façade allows for constraints to be placed on each component of the translation vector t. Specifically, the user can constrain the bounding boxes of two blocks to align themselves in some manner along each dimension. For example, in order to ensure that the roof block in Fig. 4 lies on top of the first_storey block, the user can require that the maximum y extent of the first_storey block be equal to the minimum y extent of the roof block. With this constraint, the translation along the y axis is computed (t_y = first_storey_MAX_y - roof_MIN_y) rather than represented as a parameter of the model.

Each parameter of each instantiated block is actually a reference to a named symbolic variable, as illustrated in Fig. 5. As a result, two parameters of different blocks (or of the same block) can be equated by having each parameter reference the same symbol. This facility allows the user to equate two or more of the dimensions in a model, which makes modeling symmetrical blocks and repeated structure more convenient.
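The symbol-reference mechanism can be sketched as follows. This is an illustrative toy, not Façade's implementation; the class, block dictionaries, and values are made up to show the idea that equating two dimensions means pointing both at the same variable.

```python
class Symbol:
    """A named symbolic variable that block parameters reference."""
    def __init__(self, name, value):
        self.name = name
        self.value = value

# One shared variable...
building_width = Symbol("building_width", 10.0)

# ...referenced by the width parameter of two different blocks, so the
# equal-width constraint is structural rather than enforced numerically.
block1 = {"type": "wedge", "width": building_width}
block2 = {"type": "box", "width": building_width}

# Updating the symbol (as the reconstruction algorithm would) updates
# every block that references it.
building_width.value = 12.0
```

Because both parameters are the same object, the model has one degree of freedom here instead of two, which is exactly the reduction the next paragraph relies on.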
Importantly, these constraints reduce the number of degrees of freedom in the model, which, as we will show, simplifies the structure recovery problem.

Once the blocks and their relations have been parameterized, it is straightforward to derive expressions for the world coordinates of the block vertices. Consider the set of edges which link a specific block in the model to the ground plane as shown in Fig. 4.

To appear in the SIGGRAPH '96 conference proceedings

Figure 5: Representation of block parameters as symbol references. A single variable can be referenced by the model in multiple places, allowing constraints of symmetry to be embedded in the model. (The figure shows Block1, a wedge whose length and width reference the variables "building_length" (value 20.0) and "building_width" (value 10.0), and Block2, a box whose parameters reference "roof_height" (value 2.0) and "first_storey_height" (value 4.0).)

Let g1(X), ..., gn(X) represent the rigid transformations associated with each of these links, where X represents the vector of all the model parameters. The world coordinates P_w(X) of a particular block vertex P(X) is then:

P_w(X) = g1(X) ... gn(X) P(X)    (1)

Similarly, the world orientation v_w(X) of a particular line segment v(X) is:

v_w(X) = g1(X) ... gn(X) v(X)    (2)

In these equations, the point vectors P and P_w and the orientation vectors v and v_w are represented in homogeneous coordinates.

Modeling the scene with polyhedral blocks, as opposed to points, line segments, surface patches, or polygons, is advantageous for a number of reasons:

- Most architectural scenes are well modeled by an arrangement of geometric primitives.
- Blocks implicitly contain common architectural elements such as parallel lines and right angles.
- Manipulating block primitives is convenient since they are at a suitably high level of abstraction; individual features such as points and lines are less manageable.
- A surface model of the scene is readily obtained from the blocks, so there is no need to infer surfaces from discrete features.
- Modeling in terms of blocks and relationships greatly reduces the number of parameters that the reconstruction algorithm needs to recover.

The last point is crucial to the robustness of our reconstruction algorithm and the viability of our modeling system, and is illustrated best with an example. The model in Fig. 2 is parameterized by just 33 variables (the unknown camera position adds six more). If each block in the scene were unconstrained (in its dimensions and position), the model would have 240 parameters; if each line segment in the scene were treated independently, the model would have 2,896 parameters. This reduction in the number of parameters greatly enhances the robustness and efficiency of the method as compared to traditional structure from motion algorithms. Lastly, since the number of correspondences needed to suitably overconstrain the minimization is roughly proportional to the number of parameters in the model, this reduction means that the number of correspondences required of the user is manageable.

2.3 Reconstruction Algorithm

Our reconstruction algorithm works by minimizing an objective function O that sums the disparity between the projected edges of the model and the edges marked in the images, i.e. O = Σ Err_i where Err_i represents the disparity computed for edge feature i. Thus, the unknown model parameters and camera positions are computed by minimizing O with respect to these variables. Our system uses the error function Err_i from [17], described below.

Figure 6: (a) Projection of a straight line onto a camera's image plane. (b) The error function used in the reconstruction algorithm. The heavy line represents the observed edge segment (marked by the user) and the lighter line represents the model edge predicted by the current camera and model parameters; the labels include the predicted line m_x x + m_y y - m_z f = 0, the observed endpoints (x1, y1) and (x2, y2), and the endpoint distances h1 and h2.

Fig. 6(a) shows how a straight line in the model projects onto the image plane of a camera. The straight line can be defined by a pair of vectors <v, d> where v represents the direction of the line and d represents a point on the line. These vectors can be computed from Equations 2 and 1, respectively.
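The chained transformations of Equations 1 and 2 can be sketched directly. This is an illustrative plain-Python fragment (Façade's internals are symbolic, not numeric like this); each link is a (R, t) pair, and the only substantive point is that a point uses both R and t while a direction (homogeneous coordinate w = 0) is affected by rotations only.

```python
def apply_rotation(R, p):
    """Multiply a 3x3 rotation matrix by a 3-vector."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))

def world_point(chain, P):
    """P_w(X) = g1(X) ... gn(X) P(X): g_n acts first, g_1 last."""
    for R, t in reversed(chain):
        p = apply_rotation(R, P)
        P = tuple(p[i] + t[i] for i in range(3))
    return P

def world_direction(chain, v):
    """v_w(X) = g1(X) ... gn(X) v(X): translations drop out (w = 0)."""
    for R, _t in reversed(chain):
        v = apply_rotation(R, v)
    return v
```

For example, with two pure translations in the chain, a vertex is shifted by their sum while an edge direction is unchanged, which is exactly how v and d for a model line are obtained from the hierarchy.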
The position of the camera with respect to world coordinates is given in terms of a rotation matrix R_j and a translation vector t_j. The normal vector denoted by m in the figure is computed from the following expression:

m = R_j (v × (d - t_j))    (3)

The projection of the line onto the image plane is simply the intersection of the plane defined by m with the image plane, located at z = -f where f is the focal length of the camera. Thus, the image edge is defined by the equation m_x x + m_y y - m_z f = 0.

Fig. 6(b) shows how the error between the observed image edge {(x1, y1), (x2, y2)} and the predicted image line is calculated for each correspondence. Points on the observed edge segment can be parameterized by a single scalar variable s ∈ [0, l] where l is the length of the edge. We let h(s) be the function that returns the shortest distance from a point on the segment, p(s), to the predicted edge. With these definitions, the total error between the observed edge segment and the predicted edge is calculated as:

Err_i = ∫_0^l h²(s) ds = (l/3)(h1² + h1·h2 + h2²) = m^T (A^T B A) m    (4)

where:

    m = (m_x, m_y, m_z)^T,  h1 = h(0),  h2 = h(l)

    A = | x1  y1  1 |        B = (l / (3(m_x² + m_y²))) | 1    0.5 |
        | x2  y2  1 |                                   | 0.5  1   |

The final objective function O is the sum of the error terms resulting from each correspondence. We minimize O using a variant of the Newton-Raphson method, which involves calculating the gradient and Hessian of O with respect to the parameters of the camera and the model. As we have shown, it is simple to construct symbolic expressions for m in terms of the unknown model parameters. The minimization algorithm differentiates these expressions symbolically to evaluate the gradient and Hessian after each iteration. The procedure is inexpensive since the expressions for d and v in Equations 2 and 1 have a particularly simple form.

2.4 Computing an Initial Estimate

The objective function described in Section 2.3 is non-linear with respect to the model and camera parameters and consequently can have local minima. If the algorithm begins at a random location in the parameter space, it stands little chance of converging to the correct solution. To overcome this problem we have developed a method to directly compute a good initial estimate for the model parameters and camera positions that is near the correct solution. In practice, our initial estimate method consistently enables the nonlinear minimization algorithm to converge to the correct solution.

Our initial estimate method consists of two procedures performed in sequence. The first procedure estimates the camera rotations while the second estimates the camera translations and the parameters of the model. Both initial estimate procedures are based upon an examination of Equation 3. From this equation the following constraints can be deduced:

m^T R_j v = 0    (5)

m^T R_j (d - t_j) = 0    (6)

Given an observed edge u_ij, the measured normal m' to the plane passing through the camera center is:

m' = (x1, y1, -f) × (x2, y2, -f)    (7)

From these equations, we see that any model edges of known orientation constrain the possible values for R_j. Since most architectural models contain many such edges (e.g. horizontal and vertical lines), each camera rotation can usually be estimated from the model independent of the model parameters and independent of the camera's location in space. Our method does this by minimizing the following objective function O1 that sums the extents to which the rotations R_j violate the constraints arising from Equation 5:

O1 = Σ_i (m^T R_j v_i)²,  v_i ∈ {x̂, ŷ, ẑ}    (8)

Once initial estimates for the camera rotations are computed, Equation 6 is used to obtain initial estimates of the model parameters and camera locations. Equation 6 reflects the constraint that all of the points on the line defined by the tuple <v, d> should lie on the plane with normal vector m passing through the camera center. This constraint is expressed in the following objective function O2, where P_i(X) and Q_i(X) are expressions for the vertices of an edge of the model:

O2 = Σ_i (m^T R_j (P_i(X) - t_j))² + (m^T R_j (Q_i(X) - t_j))²    (9)

In the special case where all of the block relations in the model have a known rotation, this objective function becomes a simple quadratic form which is easily minimized by solving a set of linear equations.

Once the initial estimate is obtained, the non-linear minimization over the entire parameter space is applied to produce the best possible reconstruction. Typically, the minimization requires fewer than ten iterations and adjusts the parameters of the model by at most a few percent from the initial estimates. The edges of the recovered models typically conform to the original photographs to within a pixel.

Figure 7: Three of twelve photographs used to reconstruct the entire exterior of University High School in Urbana, Illinois. The superimposed lines indicate the edges the user has marked.

Figure 8: The high school model, reconstructed from twelve photographs. (a) Overhead view. (b) Rear view. (c) Aerial view showing the recovered camera positions. Two nearly coincident cameras can be observed in front of the building; their photographs were taken from the second story of a building across the street.

Figure 9: A synthetic view of University High School. This is a frame from an animation of flying around the entire building.
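The per-edge disparity of Equations 3 and 4 can be sketched numerically. This is an illustrative check of the geometry only (the actual system works with symbolic expressions, and the closed form m^T(A^T B A)m is used there); the inputs below are made up, and the integral is evaluated through the equivalent endpoint formula (l/3)(h1² + h1·h2 + h2²).

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def edge_error(v, d, R, t, f, p1, p2):
    """Disparity between observed edge {p1, p2} and the model line <v, d>.

    R, t are the camera rotation and translation, f the focal length.
    """
    d_minus_t = tuple(d[i] - t[i] for i in range(3))
    u = cross(v, d_minus_t)
    m = tuple(sum(R[i][j] * u[j] for j in range(3)) for i in range(3))  # Eq. 3

    def h(p):  # distance from an endpoint to the line m_x x + m_y y - m_z f = 0
        return (m[0]*p[0] + m[1]*p[1] - m[2]*f) / math.hypot(m[0], m[1])

    h1, h2 = h(p1), h(p2)
    l = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    return (l / 3.0) * (h1*h1 + h1*h2 + h2*h2)                           # Eq. 4
```

When the observed edge lies exactly on the predicted line the error is zero, and it grows with both the endpoint distances and the edge length, as the integral form requires.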

Figure 10: Reconstruction of Hoover Tower, Stanford, CA. (a) Original photograph, with marked edges indicated. (b) Model recovered from the single photograph shown in (a). (c) Texture-mapped aerial view from the virtual camera position indicated in (b). Regions not seen in (a) are indicated in blue.

2.5 Results

Fig. 2 showed the results of using Façade to reconstruct a clock tower from a single image. Figs. 7 and 8 show the results of using Façade to reconstruct a high school building from twelve photographs. (The model was originally constructed from just five images; the remaining images were added to the project for purposes of generating renderings using the techniques of Section 3.) The photographs were taken with a calibrated 35mm still camera with a standard 50mm lens and digitized with the PhotoCD process. Images were processed at full resolution to correct for lens distortion, then filtered down to a lower resolution for use in the modeling system. Fig. 8 shows some views of the recovered model and camera positions, and Fig. 9 shows a synthetic view of the building generated by the technique in Sec. 3.

Fig. 10 shows the reconstruction of another tower from a single photograph. The dome was modeled specially since the reconstruction algorithm does not recover curved surfaces. The user constrained a two-parameter hemisphere block to sit centered on top of the tower, and manually adjusted its height and width to align with the photograph. Each of the models presented took approximately four hours to create.

3 View-Dependent Texture-Mapping

In this section we present view-dependent texture-mapping, an effective method of rendering the scene that involves projecting the original photographs onto the model. This form of texture-mapping is most effective when the model conforms closely to the actual structure of the scene, and when the original photographs show the scene in similar lighting conditions. In Section 4 we will show how view-dependent texture-mapping can be used in conjunction with model-based stereo to produce realistic renderings when the recovered model only approximately models the structure of the scene.
Since the camera positions of the original photographs are recovered during the modeling phase, projecting the images onto the model is straightforward. In this section we first describe how we project a single image onto the model, and then how we merge several image projections to render the entire model. Unlike traditional texture-mapping, our method projects different images onto the model depending on the user's viewpoint. As a result, our view-dependent texture mapping can give a better illusion of additional geometric detail in the model.

3.1 Projecting a Single Image

The process of texture-mapping a single image onto the model can be thought of as replacing each camera with a slide projector that projects the original image onto the model. When the model is not convex, it is possible that some parts of the model will shadow others with respect to the camera. While such shadowed regions could be determined using an object-space visible surface algorithm, or an image-space ray casting algorithm, we use an image-space shadow map algorithm based on [22] since it is efficiently implemented using z-buffer hardware.

Fig. 11, upper left, shows the results of mapping a single image onto the high school building model. The recovered camera position for the projected image is indicated in the lower left corner of the image. Because of self-shadowing, not every point on the model within the camera's viewing frustum is mapped.

3.2 Compositing Multiple Images

In general, each photograph will view only a piece of the model. Thus, it is usually necessary to use multiple images in order to render the entire model from a novel point of view. The top images of Fig. 11 show two different images mapped onto the model and rendered from a novel viewpoint. Some pixels are colored in just one of the renderings, while some are colored in both. These two renderings can be merged into a composite rendering by considering the corresponding pixels in the rendered views. If a pixel is mapped in only one rendering, its value from that rendering is used in the composite. If it is mapped in more than one rendering, the renderer has to decide which image (or combination of images) to use.
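The per-pixel compositing rule just described can be sketched as follows. The toy 1-D "renderings", the NaN-as-unmapped convention, and the uniform blend weights are all illustrative assumptions; the paper's actual weights are the view-dependent ones of Fig. 12:

```python
import numpy as np

# Sketch of merging two image projections into a composite: a pixel mapped
# in only one rendering keeps that value; a pixel mapped in both is blended;
# pixels mapped in neither (NaN) are left for hole filling.

def composite(r1, r2, w1=0.5, w2=0.5):
    out = np.full_like(r1, np.nan)
    only1 = ~np.isnan(r1) & np.isnan(r2)
    only2 = np.isnan(r1) & ~np.isnan(r2)
    both  = ~np.isnan(r1) & ~np.isnan(r2)
    out[only1] = r1[only1]
    out[only2] = r2[only2]
    out[both] = (w1 * r1[both] + w2 * r2[both]) / (w1 + w2)
    return out

r1 = np.array([0.2, 0.4, np.nan, np.nan])
r2 = np.array([np.nan, 0.8, 0.6, np.nan])
print(composite(r1, r2))   # -> 0.2, 0.6, 0.6, nan
```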
It would be convenient, of course, if the projected images would agree perfectly where they overlap. However, the images will not necessarily agree if there is unmodeled geometric detail in the building, or if the surfaces of the building exhibit non-Lambertian reflection. In this case, the best image to use is clearly the one with the viewing angle closest to that of the rendered view. However, using the image closest in angle at every pixel means that neighboring rendered pixels may be sampled from different original images. When this happens, specularity and unmodeled geometric detail can cause visible seams in the rendering. To avoid this problem, we smooth these transitions through weighted averaging as in Fig. 12.

Figure 11: The process of assembling projected images to form a composite rendering. The top two pictures show two images projected onto the model from their respective recovered camera positions. The lower left picture shows the results of compositing these two renderings using our view-dependent weighting function. The lower right picture shows the results of compositing renderings of all twelve original images. Some pixels near the front edge of the roof not seen in any image have been filled in with the hole-filling algorithm from [23].

To appear in the SIGGRAPH '96 conference proceedings.

Figure 12: The weighting function used in view-dependent texture mapping. The pixel in the virtual view corresponding to the point on the model is assigned a weighted average of the corresponding pixels in actual views 1 and 2. The weights w_1 and w_2 are inversely proportional to the magnitudes of angles a_1 and a_2. Alternately, more sophisticated weighting functions based on expected foreshortening and image resampling can be used.

Even with this weighting, neighboring pixels can still be sampled from different views at the boundary of a projected image, since the contribution of an image must be zero outside its boundary. To address this, the pixel weights are ramped down near the boundary of the projected images. Although this method does not guarantee smooth transitions in all cases, we have found that it eliminates most artifacts in renderings and animations arising from such seams.

If an original photograph features an unwanted car, tourist, or other object in front of the architecture of interest, the unwanted object will be projected onto the surface of the model. To prevent this from happening, the user may mask out the object by painting over the obstruction with a reserved color. The rendering algorithm will then set the weights for any pixels corresponding to the masked regions to zero, and decrease the weights of the pixels near the boundary as before to minimize seams. Any regions in the composite image which are occluded in every projected image are filled in using the hole-filling method from [23].

In the discussion so far, projected image weights are computed at every pixel of every projected rendering. Since the weighting function is smooth (though not constant) across flat surfaces, it is not generally necessary to compute it for every pixel of every face of the model. For example, using a single weight for each face of the model, computed at the face's center, produces acceptable results. By coarsely subdividing large faces, the results are visually indistinguishable from the case where a unique weight is computed for every pixel. Importantly, this technique suggests a real-time implementation of view-dependent texture mapping using a texture-mapping graphics pipeline to render the projected views, and α-channel blending to composite them.
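The inverse-angle weighting of Fig. 12, evaluated once per model point (or once per face, as suggested above), can be sketched as follows; the camera placements are purely illustrative:

```python
import numpy as np

# Sketch of the Fig. 12 weighting: each source photograph is weighted
# inversely to the angle between its viewing ray to the model point and
# the virtual camera's ray, then the weights are normalized to sum to 1.

def view_weights(point, virtual_cam, source_cams, eps=1e-9):
    d_virt = point - virtual_cam
    d_virt = d_virt / np.linalg.norm(d_virt)
    angles = []
    for c in source_cams:
        d = point - c
        d = d / np.linalg.norm(d)
        angles.append(np.arccos(np.clip(d @ d_virt, -1.0, 1.0)))
    w = 1.0 / (np.array(angles) + eps)     # inverse-angle weights
    return w / w.sum()                     # normalized blend weights

point = np.array([0.0, 0.0, 10.0])
virtual = np.array([1.0, 0.0, 0.0])
sources = [np.array([1.0, 0.0, 0.0]),      # coincides with the virtual view
           np.array([8.0, 0.0, 0.0])]
w = view_weights(point, virtual, sources)
print(w)   # nearly all weight on the first (angle ~ 0) view
```

Ramping these weights to zero near each image's boundary, as the text describes, would be applied on top of this angular term.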
For complex models where most images are entirely occluded for the typical view, it can be very inefficient to project every original photograph to the novel viewpoint. Some efficient techniques to determine such visibility a priori in architectural scenes through spatial partitioning are presented in [18].

4 Model-Based Stereopsis

The modeling system described in Section 2 allows the user to create a basic model of a scene, but in general the scene will have additional geometric detail (such as friezes and cornices) not captured in the model. In this section we present a new method of recovering such additional geometric detail automatically through stereo correspondence, which we call model-based stereo. Model-based stereo differs from traditional stereo in that it measures how the actual scene deviates from the approximate model, rather than trying to measure the structure of the scene without any prior information. The model serves to place the images into a common frame of reference that makes the stereo correspondence possible even for images taken from relatively far apart. The stereo correspondence information can then be used to render novel views of the scene using image-based rendering techniques.

Figure 13: View-dependent texture mapping. (a) A detail view of the high school model. (b) A rendering of the model from the same position using view-dependent texture mapping. Note that although the model does not capture the slightly recessed windows, the windows appear properly recessed because the texture map is sampled primarily from a photograph which viewed the windows from approximately the same direction. (c) The same piece of the model viewed from a different angle, using the same texture map as in (b). Since the texture is not selected from an image that viewed the model from approximately the same angle, the recessed windows appear unnatural. (d) A more natural result obtained by using view-dependent texture mapping. Since the angle of view in (d) is different than in (b), a different composition of original images is used to texture-map the model.
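Placing the images into a common frame of reference amounts to reprojecting one photograph through the model into the other camera. A minimal sketch for the simplest case of a single planar model surface; the axis-aligned cameras, focal length, and tiny image size are illustrative assumptions, and nearest-neighbor sampling stands in for proper resampling:

```python
import numpy as np

# Sketch of building a warped offset image for a planar model z = Z0:
# each key-image pixel's ray is intersected with the model plane, the hit
# point is projected into the offset camera, and the offset image is
# sampled there.  Key camera at the origin; both cameras axis-aligned.

W, f, Z0 = 16, 8.0, 4.0
t_off = np.array([1.0, 0.0, 0.0])          # offset camera center

rng = np.random.default_rng(1)
offset_img = rng.random((W, W))            # stand-in for the offset photo

def pix_to_dir(u, v):
    return np.array([(u - W/2) / f, (v - W/2) / f, 1.0])

def dir_to_pix(p, t):
    q = p - t
    return int(round(f * q[0] / q[2] + W/2)), int(round(f * q[1] / q[2] + W/2))

warped = np.zeros((W, W))
for v in range(W):
    for u in range(W):
        ray = pix_to_dir(u, v)
        P = ray * (Z0 / ray[2])            # intersect ray with model plane
        uo, vo = dir_to_pix(P, t_off)
        if 0 <= uo < W and 0 <= vo < W:
            warped[v, u] = offset_img[vo, uo]

# A scene point on the model lands at the same key pixel in both images:
print(np.allclose(warped[9, 10], offset_img[9, 8]))   # True
```

By construction, any scene point that actually lies on the model plane has zero disparity between the key and warped offset images, which is the property the stereo stage exploits.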
As in traditional stereo, given two images (which we call the key and offset), model-based stereo computes the associated depth map for the key image by determining corresponding points in the key and offset images. Like many stereo algorithms, our method is correlation-based, in that it attempts to determine the corresponding point in the offset image by comparing small pixel neighborhoods around the points. As such, correlation-based stereo algorithms generally require the neighborhood of each point in the key image to resemble the neighborhood of its corresponding point in the offset image.

The problem we face is that when the key and offset images are taken from relatively far apart, as is the case for our modeling method, corresponding pixel neighborhoods can be foreshortened very differently. In Figs. 14(a) and (c), pixel neighborhoods toward the right of the key image are foreshortened horizontally by nearly a factor of four in the offset image.

The key observation in model-based stereo is that even though two images of the same scene may appear very different, they appear similar after being projected onto an approximate model of the scene. In particular, projecting the offset image onto the model and viewing it from the position of the key image produces what we call the warped offset image, which appears very similar to the key image. The geometrically detailed scene in Fig. 14 was modeled as two flat surfaces with our modeling program, which also determined the relative camera positions. As expected, the warped offset image (Fig. 14(b)) exhibits the same pattern of foreshortening as the key image.

In model-based stereo, pixel neighborhoods are compared between the key and warped offset images rather than the key and offset images. When a correspondence is found, it is simple to convert its disparity to the corresponding disparity between the key and offset images, from which the point's depth is easily calculated. Fig. 14(d) shows a disparity map computed for the key image in (a).

Figure 14: (a) and (c) Two images of the entrance to Peterhouse chapel in Cambridge, UK. The Façade program was used to model the façade and ground as flat surfaces and to recover the relative camera positions. (b) The warped offset image, produced by projecting the offset image onto the approximate model and viewing it from the position of the key camera. This projection eliminates most of the disparity and foreshortening with respect to the key image, greatly simplifying stereo correspondence. (d) An unedited disparity map produced by our model-based stereo algorithm.

The reduction of differences in foreshortening is just one of several ways that the warped offset image simplifies stereo correspondence. Some other desirable properties of the warped offset image are:

- Any point in the scene which lies on the approximate model will have zero disparity between the key image and the warped offset image.
- Disparities between the key and warped offset images are easily converted to a depth map for the key image.
- Depth estimates are far less sensitive to noise in image measurements since images taken from relatively far apart can be compared.
- Places where the model occludes itself relative to the key image can be detected and indicated in the warped offset image.
- A linear epipolar geometry (Sec. 4.1) exists between the key and warped offset images, despite the warping. In fact, the epipolar lines of the warped offset image coincide with the epipolar lines of the key image.

4.1 Model-Based Epipolar Geometry

In traditional stereo, the epipolar constraint (see [6]) is often used to constrain the search for corresponding points in the offset image to searching along an epipolar line.
This constraint simplifies stereo not only by reducing the search for each correspondence to one dimension, but also by reducing the chance of selecting a false match. In this section we show that taking advantage of the epipolar constraint is no more difficult in the model-based stereo case, despite the fact that the offset image is non-uniformly warped.

Fig. 15 shows the epipolar geometry for model-based stereo. If we consider a point P in the scene, there is a unique epipolar plane which passes through P and the centers of the key and offset cameras. This epipolar plane intersects the key and offset image planes in epipolar lines e_k and e_o. If we consider the projection p_k of P onto the key image plane, the epipolar constraint states that the corresponding point in the offset image must lie somewhere along the offset image's epipolar line.

In model-based stereo, neighborhoods in the key image are compared to the warped offset image rather than the offset image. Thus, to make use of the epipolar constraint, it is necessary to determine where the pixels on the offset image's epipolar line project to in the warped offset image. The warped offset image is formed by projecting the offset image onto the model, and then reprojecting the model onto the image plane of the key camera. Thus, the projection p_o of P in the offset image projects onto the model at Q, and then reprojects to q_k in the warped offset image. Since each of these projections occurs within the epipolar plane, any possible correspondence for p_k in the key image must lie on the key image's epipolar line in the warped offset image. In the case where the actual structure and the model coincide at P, p_o is projected to P and then reprojected to p_k, yielding a correspondence with zero disparity.

Figure 15: Epipolar geometry for model-based stereo.

The fact that the epipolar geometry remains linear after the warping step also facilitates the use of the ordering constraint [2, 6] through a dynamic programming technique.
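Because the warp leaves only small residual disparities along the key image's own epipolar lines, the correspondence search reduces to a short windowed scan along a scanline. A minimal sketch with synthetic 1-D data; SSD stands in for the actual correlation measure, and the full algorithm of [3] is not reproduced here:

```python
import numpy as np

# Sketch of the correlation step in model-based stereo: a key-image
# neighborhood is matched against nearby neighborhoods on the same
# scanline of the warped offset image.  The synthetic warped row is the
# key row shifted by a small residual disparity (the scene's deviation
# from the model).

rng = np.random.default_rng(0)
key = rng.standard_normal(64)               # synthetic key scanline
true_disp = 3                               # residual deviation from model
warped_offset = np.roll(key, true_disp)     # synthetic warped offset row

def match(key_row, warped_row, x, half=4, search=5):
    """Residual disparity at key pixel x via SSD over a small window."""
    patch = key_row[x - half : x + half + 1]
    best, best_d = np.inf, 0
    for d in range(-search, search + 1):
        cand = warped_row[x + d - half : x + d + half + 1]
        ssd = np.sum((patch - cand) ** 2)
        if ssd < best:
            best, best_d = ssd, d
    return best_d

print(match(key, warped_offset, 30))   # 3
```

Restricting `d` to the scanline is exactly the epipolar search described above; converting the recovered residual disparity back to key/offset disparity, and then to depth, follows as in Section 4.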
4.2 Stereo Results and Rerendering

While the warping step makes it dramatically easier to determine stereo correspondences, a stereo algorithm is still necessary to actually determine them. The algorithm we developed to produce the images in this paper is described in [3].

Once a depth map has been computed for a particular image, we can rerender the scene from novel viewpoints using the methods described in [23, 16, 13]. Furthermore, when several images and their corresponding depth maps are available, we can use the view-dependent texture-mapping method of Section 3 to composite the multiple renderings. The novel views of the chapel façade in Fig. 16 were produced through such compositing of four images.

Figure 16: Novel views of the scene generated from four original photographs. These are frames from an animated movie in which the façade rotates continuously. The depth is computed from model-based stereo and the frames are made by compositing image-based renderings with view-dependent texture-mapping.

5 Conclusion and Future Work

To conclude, we have presented a new, photograph-based approach to modeling and rendering architectural scenes. Our modeling approach, which combines both geometry-based and image-based modeling techniques, is built from two components that we have developed. The first component is an easy-to-use photogrammetric modeling system which facilitates the recovery of a basic geometric model of the photographed scene. The second component is a model-based stereo algorithm, which recovers precisely how the real scene differs from the basic model. For rendering, we have presented view-dependent texture-mapping, which produces images by warping and compositing multiple views of the scene. Through judicious use of images, models, and human assistance, our approach is more convenient, more accurate, and more photorealistic than current geometry-based or image-based approaches for modeling and rendering real-world architectural scenes.

There are several improvements and extensions that can be made to our approach. First, surfaces of revolution represent an important component of architecture (e.g. domes, columns, and minarets) that are not recovered in our photogrammetric modeling approach. (As noted, the dome in Fig. 10 was manually sized by the user.) Fortunately, there has been much work (e.g. [24]) that presents methods of recovering such structures from image contours. Curved model geometry is also entirely consistent with our approach to recovering additional detail with model-based stereo.

Second, our techniques should be extended to recognize and model the photometric properties of the materials in the scene. The system should be able to make better use of photographs taken in varying lighting conditions, and it should be able to render images of the scene as it would appear at any time of day, in any weather, and with any configuration of artificial light. Already, the recovered model can be used to predict shadowing in the scene with respect to an arbitrary light source. However, a full treatment of the problem will require estimating the photometric properties (i.e. the bidirectional reflectance distribution functions) of the surfaces in the scene.
Third, it is clear that further investigation should be made into the problem of selecting which original images to use when rendering a novel view of the scene. This problem is especially difficult when the available images are taken at arbitrary locations. Our current solution to this problem, the weighting function presented in Section 3, still allows seams to appear in renderings and does not consider issues arising from image resampling. Another form of view selection is required to choose which pairs of images should be matched to recover depth in the model-based stereo algorithm.

Lastly, it will clearly be an attractive application to integrate the models created with the techniques presented in this paper into forthcoming real-time image-based rendering systems.

Acknowledgments

This research was supported by a National Science Foundation Graduate Research Fellowship and grants from Interval Research Corporation, the California MICRO program, and JSEP contract F C. The authors also wish to thank Tim Hawkins, Carlo Séquin, David Forsyth, and Jianbo Shi for their valuable help in revising this paper.

References

[1] Ali Azarbayejani and Alex Pentland. Recursive estimation of motion, structure, and focal length. IEEE Trans. Pattern Anal. Machine Intell., 17(6), June 1995.
[2] H. H. Baker and T. O. Binford. Depth from edge and intensity based stereo. In Proceedings of the Seventh IJCAI, Vancouver, BC, 1981.
[3] Paul E. Debevec, Camillo J. Taylor, and Jitendra Malik. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. Technical Report UCB//CSD, U.C. Berkeley, CS Division, January 1996.
[4] D. J. Fleet, A. D. Jepson, and M. R. M. Jenkin. Phase-based disparity measurement. CVGIP: Image Understanding, 53(2), 1991.
[5] Olivier Faugeras and Giorgio Toscani. The calibration problem for stereo. In Proceedings IEEE CVPR 86, pages 15-20, 1986.
[6] Olivier Faugeras. Three-Dimensional Computer Vision. MIT Press.
[7] Olivier Faugeras, Stephane Laveau, Luc Robert, Gabriella Csurka, and Cyril Zeller. 3-D reconstruction of urban scenes from sequences of images. Technical Report 2572, INRIA, June 1995.
[8] W. E. L. Grimson. From Images to Surfaces. MIT Press.
[9] D. Jones and J. Malik. Computational framework for determining stereo correspondence from a set of linear spatial filters. Image and Vision Computing, 10(10), December 1992.
[10] E. Kruppa. Zur Ermittlung eines Objektes aus zwei Perspektiven mit innerer Orientierung. Sitz.-Ber. Akad. Wiss., Wien, Math. Naturw. Kl., Abt. IIa., 122, 1913.
[11] H. C. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature, 293, September 1981.
[12] D. Marr and T. Poggio. A computational theory of human stereo vision. Proceedings of the Royal Society of London, 204, 1979.
[13] Leonard McMillan and Gary Bishop. Plenoptic modeling: An image-based rendering system. In SIGGRAPH '95, 1995.
[14] Eric N. Mortensen and William A. Barrett. Intelligent scissors for image composition. In SIGGRAPH '95, 1995.
[15] S. B. Pollard, J. E. W. Mayhew, and J. P. Frisby. A stereo correspondence algorithm using a disparity gradient limit. Perception, 14, 1985.
[16] R. Szeliski. Image mosaicing for tele-reality applications. In IEEE Computer Graphics and Applications, 1996.
[17] Camillo J. Taylor and David J. Kriegman. Structure and motion from line segments in multiple images. IEEE Trans. Pattern Anal. Machine Intell., 17(11), November 1995.
[18] S. J. Teller, Celeste Fowler, Thomas Funkhouser, and Pat Hanrahan. Partitioning and ordering large radiosity computations. In SIGGRAPH '94, 1994.
[19] Carlo Tomasi and Takeo Kanade. Shape and motion from image streams under orthography: a factorization method. International Journal of Computer Vision, 9(2), November 1992.
[20] Roger Tsai. A versatile camera calibration technique for high accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, 3(4), August 1987.
[21] S. Ullman. The Interpretation of Visual Motion. The MIT Press, Cambridge, MA.
[22] L. Williams. Casting curved shadows on curved surfaces. In SIGGRAPH '78, 1978.
[23] Lance Williams and Eric Chen. View interpolation for image synthesis. In SIGGRAPH '93, 1993.
[24] Mourad Zerroug and Ramakant Nevatia. Segmentation and recovery of SHGCs from a real intensity image. In European Conference on Computer Vision, 1994.
