EECS 598 Advanced Topics in Mobile Computer Vision Course Project Final Report

Augmented Reality Based Building Design Exhibition on DROID

Chen Feng, PhD Student, Department of Civil and Environmental Engineering, University of Michigan, 2350 Hayward St., Ann Arbor, MI 48109, USA
simba.forrest@gmail.com

Chunxia Li, PhD Student, Department of Civil and Environmental Engineering, University of Michigan, 2350 Hayward St., Ann Arbor, MI 48109, USA
chunxia@umich.edu

ABSTRACT: In this project, we successfully modified FERNs, a newly proposed key point recognition algorithm, and implemented it on the Android 2.2 mobile operating system. We first explain in detail the original FERNs algorithm as well as the idea of treating matching as a classification problem. We then explain how to modify the algorithm to make it applicable on mobile platforms. By using the FAST corner detector instead of the local extrema of the Laplacian used in the original method, and a dynamic threshold technique to limit the maximum number of key points detected in each frame, we achieve 15~20 frames per second on a 1.3GHz desktop PC and 3~5 frames per second on the MOTOROLA DROID A855 mobile platform. We also discuss what could be done to further improve our results.

KEYWORDS: Building Design Exhibition, Computer Vision, Planar Marker Based Augmented Reality, FERNs, PhonyFERNs

1. INTRODUCTION

The construction industry has witnessed rapid development during the last few decades, especially in developing countries such as China. Architects all over the world have opportunities to bid with their designs and make real buildings. However, in order to win the bidding process, a clear and vivid exhibition of the design can help a great deal. Currently, when architects design a building and want to show it to the client, there are typically three methods besides architectural blueprints.
First, 2D architectural renderings are widely used; secondly, as Building Information Modeling (BIM) becomes more and more popular, 3D virtual reality techniques have been introduced for design shows; thirdly, solid models are another way to give clients a real sense of the design (see FIG. 1).

FIG. 1 Three methods to show designs: (left) 2D rendering; (center) 3D virtual model; (right) solid model

Recently, the widespread use of mobile phones has brought rapid upgrades of their hardware, which makes it possible to develop computer vision and augmented reality applications on mobile platforms. Suppose we integrate these developments into a new human-computer interface for building design exhibition, in which clients use their own mobile phones to observe all the details of the design. Such an exhibition method could make the interaction more engaging, convey the designer's idea better, and thus help architects win their bids (see FIG. 2).
Dec. 19, 2010

FIG. 2 Proposed mobile interface to show designs

In this project, we applied planar marker based augmented reality. By tracking the planar marker through the camera on the mobile phone, we can recover the phone's pose, i.e. position and orientation, relative to the marker. Once we have the phone's pose, the designed virtual 3D model can be augmented accordingly on the frame captured by the camera.

Next, we report the project in detail. In section 2, we give an overview of the problem as well as a brief introduction to the techniques we applied. In section 3, we explain in detail how the algorithm works and how we modified it so that it can be used on mobile platforms. In section 4, we show our experimental results. In section 5, we summarize the project and discuss future work.

2. OVERVIEW

The problem of obtaining the camera's position and orientation is termed the registration problem in augmented reality. First, let us briefly review the traditional registration methods.

2.1 Traditional Registration

The registration problem has been intensively studied by both the computer vision and the augmented reality communities. Based on whether they rely on prior knowledge of the environment, registration methods fall roughly into two groups.

The first group requires some prior knowledge about the scene to be augmented. Methods in this group can be further divided by whether they use 2D planar information or 3D information. The former are usually called planar marker based methods. Among these, template matching has been successfully used in applications such as ARTOOLKIT [1]. By tracking a particular black-and-white fiducial marker, it estimates a homography between the marker plane and the image plane and then obtains the camera pose relative to the marker. The shortcoming of this method is that the black-and-white marker usually has nothing to do with the augmented reality application, which may be annoying.
Besides this method, interest point based planar marker methods have attracted a lot of attention since the development of key point detection and recognition methods such as SIFT [2] and SURF [3]. They typically use interest point detectors and matching schemes to associate 2D locations in the video image with 3D locations. The location invariance afforded by interest point detectors is attractive for localization without prior knowledge and for wide baseline matching. However, computing SIFT and SURF descriptors that are invariant across large view changes is usually too expensive (less than 5 frames per second) to be applied in augmented reality applications.

The second group usually takes advantage of visual simultaneous localization and mapping (SLAM). Among these methods, the recently proposed parallel tracking and mapping (PTAM) [4] system is typical and successful. It splits tracking and mapping into two threads and gives the tracking thread higher priority. In the mapping thread, it uses bundle adjustment to refine the map reconstructed from key frames. Although this method requires no prior knowledge of the scene, it is still too computationally expensive and too complicated to be implemented on a mobile platform.

2.2 FERNs Based Registration

After careful comparison, we chose a newly proposed key point recognition method called FERNs [5], which falls into the interest point based planar marker methods. This algorithm has several advantages. Firstly, it runs fast: on a normal desktop PC, it can easily achieve about 15 frames per second, which gives us confidence that a satisfactory frame rate can be reached on mobile platforms. Secondly, it does not require multiple threads nor any
numerical iterations, which makes it easy to implement on mobile platforms. Thirdly, it is very flexible, which makes it easy to use different corner detectors such as FAST [6].

FERNs based registration involves two stages: a training stage and a recognition stage. The former is done offline, i.e. before real-time tracking. After training, the recognition stage proceeds as follows for each frame: first, a corner detector (FAST in our modified case) detects a set of key points; secondly, a small image patch around each key point and the training data are used to recognize whether the point corresponds to a key point on the planar marker; then, using the matched point pairs, a homography between the two planes is estimated by RANSAC (see FIG. 3).

FIG. 3 General schema of FERNs

3. TECHNICAL DETAILS

In this section, we first explain how FERNs works and then how we modified it into PhonyFERNs so that it is applicable on mobile platforms.

3.1 Matching as Classification

The whole idea behind FERNs is the so-called matching as a classification problem [7]. During the training stage, a set $F = \{f_1, f_2, \ldots, f_N\}$ of N key points is constructed. Each key point $f_i$ is represented by a sampling of its so-called view set, i.e. the set of all its possible appearances under different viewing conditions. This is simulated by random affine transformations, which makes the method invariant to viewpoint change (see FIG. 4). At the recognition stage, an input image patch $p$ is assigned a label $Y(p) \in C = \{-1, 1, 2, \ldots, N\}$, where -1 means that the newly detected key point does not belong to the set $F$.

FIG. 4 Example of sampling from the view set of an image patch around a key point by random affine transformations

The problem now becomes how to construct a classifier $\hat{Y}: P \to C$, in which $P$ is the space of all image patches of a given size. The authors tried several different classifiers.
The first one they used is principal component analysis (PCA) plus K-means and nearest neighbor search applied directly to image patches [7], which proves the applicability of their matching as a classification problem idea, though it is still slow (about 5 fps on a 2GHz desktop PC). As one can see from FIG. 5, it gives a decent result.
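The view set sampling described earlier in this section can be illustrated with a minimal Python sketch. It is not the code used in this project: the decomposition of the affine matrix into two rotations and a pair of scales, the scale range, the nearest-neighbour warping, and the function names `random_affine`, `warp_patch` and `sample_view_set` are all assumptions made for illustration.

```python
import math
import random


def random_affine(rng, scale_range=(0.6, 1.5)):
    """Return a random 2x2 matrix A = R(theta) * R(-phi) * diag(s1, s2) * R(phi).

    This parameterisation and the scale range are illustrative assumptions.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    phi = rng.uniform(0.0, math.pi)
    s1 = rng.uniform(*scale_range)
    s2 = rng.uniform(*scale_range)

    def rot(a):
        return [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]

    def mul(m, k):
        return [[sum(m[i][t] * k[t][j] for t in range(2)) for j in range(2)]
                for i in range(2)]

    scale = [[s1, 0.0], [0.0, s2]]
    return mul(rot(theta), mul(rot(-phi), mul(scale, rot(phi))))


def warp_patch(image, center, size, A):
    """Sample a size x size patch around `center` as seen through the map A.

    Each output offset (dx, dy) from the patch centre reads the source pixel
    at center + A * (dx, dy), using nearest-neighbour lookup clamped to the
    image borders.
    """
    h, w = len(image), len(image[0])
    cx, cy = center
    half = size // 2
    patch = []
    for y in range(size):
        row = []
        for x in range(size):
            dx, dy = x - half, y - half
            sx = cx + A[0][0] * dx + A[0][1] * dy
            sy = cy + A[1][0] * dx + A[1][1] * dy
            ix = min(max(int(round(sx)), 0), w - 1)
            iy = min(max(int(round(sy)), 0), h - 1)
            row.append(image[iy][ix])
        patch.append(row)
    return patch


def sample_view_set(image, center, size, count, seed=0):
    """Generate `count` randomly warped views of the patch around one key point."""
    rng = random.Random(seed)
    return [warp_patch(image, center, size, random_affine(rng))
            for _ in range(count)]
```

Here `image` is a list of rows of grey values and `center` is the (x, y) position of a key point. Calling `sample_view_set(image, (32, 32), 32, 10000)` would mimic the patch size and view set size reported in section 4, at the cost of a long running time in pure Python.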
After this, the authors tried their second classifier, the randomized trees method [8]. However, they found that, when the tests are chosen randomly, the power of the approach derives not from the tree structure itself but from the fact that combining groups of binary tests allows improved classification rates [5]. This results in FERNs: replacing the trees by non-hierarchical ferns and pooling their answers in a Naive Bayesian manner yields better results and scalability in terms of the number of classes.

FIG. 5 (left) PCA+Kmeans+NN search classifier results; (right) SIFT results

3.2 FERNs

The theoretical background of FERNs is as follows. If we represent a patch by a set of image features $\{f_i\}$, we have

$$\hat{c} = \arg\max_{c} P(C = c \mid \text{patch}) \tag{1}$$

$$P(C = c \mid \text{patch}) = P(C = c \mid f_1, f_2, \ldots, f_N) \tag{2}$$

Using Bayes' rule (and assuming a uniform prior over the classes), we have

$$P(C = c \mid f_1, f_2, \ldots, f_N) \propto P(f_1, f_2, \ldots, f_N \mid C = c) \tag{3}$$

To keep some of the correlations between image features, the features are split into groups of size n that are treated as independent of one another, which gives

$$P(f_1, f_2, \ldots, f_N \mid C = c) \approx P(f_1, f_2, \ldots, f_n \mid C = c)\, P(f_{n+1}, f_{n+2}, \ldots, f_{2n} \mid C = c) \cdots \tag{4}$$

in which each factor $P(f_1, f_2, \ldots, f_n \mid C = c)$ is called a FERN, and each image feature $f$ is a comparison between the intensities of two randomly selected locations $p_1$ and $p_2$ within the image patch:

$$f = \begin{cases} 1 & \text{if } I(p_1) < I(p_2) \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

which makes FERNs invariant to lighting changes. In implementation, each FERN can be seen as a 2D histogram with one axis representing the class $C = c$ and the other axis representing the group test result of $f_1, f_2, \ldots, f_n$ (see FIG. 6 left).
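To make equations (4) and (5) more concrete, the following Python sketch evaluates the binary tests of a fern, fills one histogram per fern, and combines the ferns in a Naive Bayesian manner. It is a simplified illustration rather than the PhonyFERNs implementation. The function names, the direction of the intensity comparison (1 when I(p1) < I(p2)), the add-one initial count, the per-class normalization of each histogram, and the use of summed logarithms in place of the product in equation (4) are all assumptions.

```python
import math
import random


def make_fern(rng, patch_size, n_features):
    """Pick n random pixel pairs inside the patch; each pair is one binary test."""
    def point():
        return (rng.randrange(patch_size), rng.randrange(patch_size))
    return [(point(), point()) for _ in range(n_features)]


def fern_index(patch, fern):
    """Evaluate the test of equation (5) for every pair and pack the bits.

    The result is an integer in [0, 2**n) that selects one histogram bin.
    """
    index = 0
    for (x1, y1), (x2, y2) in fern:
        bit = 1 if patch[y1][x1] < patch[y2][x2] else 0
        index = (index << 1) | bit
    return index


def train(ferns, samples, n_classes):
    """Build, for every fern, a table of log P(bin | class).

    `samples` is a list of (patch, class_id) pairs. Counts start at 1 so that
    empty bins do not produce log(0); this smoothing is an assumption.
    """
    tables = []
    for fern in ferns:
        bins = 2 ** len(fern)
        counts = [[1.0] * bins for _ in range(n_classes)]
        for patch, class_id in samples:
            counts[class_id][fern_index(patch, fern)] += 1.0
        table = []
        for row in counts:
            total = sum(row)
            table.append([math.log(value / total) for value in row])
        tables.append(table)
    return tables


def classify(patch, ferns, tables):
    """Sum the per-fern log probabilities and return (best class, scores)."""
    n_classes = len(tables[0])
    scores = [0.0] * n_classes
    for fern, table in zip(ferns, tables):
        index = fern_index(patch, fern)
        for class_id in range(n_classes):
            scores[class_id] += table[class_id][index]
    best = max(range(n_classes), key=lambda class_id: scores[class_id])
    return best, scores
```

With the parameters reported in section 4, `make_fern(rng, 32, 9)` would be called 100 times and each table row would have 2^9 = 512 bins. Summing logarithms gives the same arg max as multiplying the factors of equation (4) while avoiding numerical underflow.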
In the training stage, the image patch around each key point detected by the key point detector is tested by all FERNs so as to update the histograms. After all image patches are tested, normalization is performed on each FERN's 2D histogram so that all its values sum to 1. This finishes the training stage.

In the recognition stage, for any newly detected key point, its surrounding image patch is tested by each FERN. For each class c, all its FERN probability values are looked up and $P(f_1, f_2, \ldots, f_N \mid C = c)$ is calculated by equation (4). Among all classes, the class that gives the maximum response is selected, and the new key point is thereby classified and recognized.

FIG. 6 (left) an example of three FERNs for 3 classes; each FERN has 3 features, resulting in a 3 by 2^3 histogram; red and blue dots represent image features. (right) recognition method by FERNs

3.3 PhonyFERNs

In order to apply FERNs on mobile platforms, several changes are needed:

1. Using the FAST corner detector instead of the local extrema of the Laplacian used in the original method. This replacement gives better performance since it takes advantage of a fast key point detection algorithm;
2. Because of the limited computational power of mobile platforms, a dynamic threshold technique is introduced. This limits the maximum number of key points detected in each frame, since the time complexity of FERNs is directly related to the number of key points;
3. Since memory is relatively small on mobile platforms, the parameters need to be tuned to lower memory consumption.

Besides these, further modifications could make it run even faster on mobile platforms, such as replacing floating point numbers with integers [9].

4. EXPERIMENTS

We first tested PhonyFERNs on a 1.3GHz desktop PC. By using the dynamic threshold in the training stage, we limit the number of model key points (points used to train) to 100±20. The image patch size is 32x32. We use 100 FERNs, and the number of image features for each FERN is 9.
Also, during the training stage, each image patch's view set is sampled by 10000 random affine transformations. During the recognition stage, we also limit the number of key points detected in each frame to 500±50. This gives fast performance of 15~20 frames per second and uses about 40MB of memory. The breakdown of time consumption within each frame is as follows: 5 milliseconds for FAST corner detection, 45 milliseconds for PhonyFERNs and 15 milliseconds for RANSAC estimation of the homography, i.e. about 65 milliseconds per frame in total (see FIG. 7 left).
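This report does not spell out how the dynamic threshold keeps the key point count near its target (500±50 on the PC). One plausible reading is a per-frame feedback rule on the corner detector threshold, sketched below in Python. The update rule, the step size, the threshold bounds and the helper names `adapt_threshold`, `limit_keypoints` and `count_above` are hypothetical and do not describe the actual implementation.

```python
def adapt_threshold(threshold, n_detected, target=500, tolerance=50,
                    step=2, t_min=5, t_max=255):
    """One feedback step on the detector threshold (assumed rule).

    A higher threshold yields fewer corners, so the threshold is raised when
    too many key points were found and lowered when too few were found.
    """
    if n_detected > target + tolerance:
        threshold = min(threshold + step, t_max)
    elif n_detected < target - tolerance:
        threshold = max(threshold - step, t_min)
    return threshold


def limit_keypoints(keypoints, max_count):
    """Hard cap: keep only the strongest key points.

    Each key point is a tuple (x, y, response).
    """
    ranked = sorted(keypoints, key=lambda k: k[2], reverse=True)
    return ranked[:max_count]


def count_above(scores, threshold):
    """Stand-in for a corner detector: count the scores above the threshold."""
    return sum(1 for s in scores if s > threshold)
```

In a real pipeline, `count_above` would be replaced by the number of corners returned by the FAST detector for the current frame, and the adapted threshold would be used for the next frame.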
FIG. 7 (left) time consumption breakdown within each frame for PhonyFERNs on PC; (right) time consumption breakdown for PhonyFERNs on DROID

We then implemented PhonyFERNs on the MOTOROLA DROID A855 platform running Android 2.2, using the same training parameters; the only difference is that the limit on the number of key points detected in each frame is now set to 300±50. This gives a performance of 3~5 frames per second and also uses about 40MB of memory. The breakdown of time consumption within each frame is as follows: 10 milliseconds for FAST corner detection, 250 milliseconds for PhonyFERNs and 50~100 milliseconds for RANSAC estimation of the homography, i.e. roughly 310~360 milliseconds per frame in total (see FIG. 7 right).

The results of PhonyFERNs can be seen in FIG. 8. As one can see, when there is motion blur, the algorithm fails since no key point is found within the marker plane.

FIG. 8 PhonyFERNs results on PC: (top left) success case, red dots are candidate key points detected by the FAST detector, green dots are recognized inlier key points and the green polygon is calculated from the estimated homography; (top right) failure case, due to motion blur of the marker. PhonyFERNs results on DROID: (bottom left) success case; (bottom right) failure case, due to motion blur.

5. CONCLUSIONS

In this project, we successfully modified FERNs, a newly proposed key point recognition algorithm, and implemented it on the Android 2.2 mobile operating system. We have four major conclusions:
1. FERNs is very fast on PC, but still slow on mobile platforms, even when using FAST key point detection;
2. The memory use of FERNs could be an issue for mobile applications;
3. RANSAC is too costly in time for mobile applications;
4. Natural marker tracking could become more popular in AR because it can easily be made more relevant to the application (in our case, the marker could be a blueprint of the design).

For future work, we plan to replace RANSAC with a faster robust estimator such as PROSAC [10]. We also want to adopt other advanced modifications of FERNs suggested by D. Wagner et al. [9].

6. REFERENCES

[1] H. Kato and M. Billinghurst, Marker tracking and HMD calibration for a video-based augmented reality conferencing system, Proceedings 2nd IEEE and ACM International Workshop on Augmented Reality (IWAR'99), IEEE Comput. Soc, 1999, pp. 85-94.
[2] D.G. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, 1999, pp. 1150-1157.
[3] H. Bay, T. Tuytelaars, and L. Van Gool, SURF: Speeded Up Robust Features, Computer Vision - ECCV 2006, vol. 3951, 2006, pp. 404-417.
[4] G. Klein and D. Murray, Parallel Tracking and Mapping for Small AR Workspaces, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, vol. 07, 2007, pp. 1-10.
[5] M. Ozuysal, P. Fua, and V. Lepetit, Fast Keypoint Recognition in Ten Lines of Code, 2007 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2007, pp. 1-8.
[6] E. Rosten and T. Drummond, Machine learning for high-speed corner detection, European Conference on Computer Vision, Springer, 2006, pp. 430-443.
[7] V. Lepetit, J. Pilet, and P. Fua, Point matching as a classification problem for fast and robust object pose estimation, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), vol. 2, 2004, pp. 244-250.
[8] V. Lepetit and P.
Fua, Towards Recognizing Feature Points using Classification Trees, 2004.
[9] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg, Real-time detection and tracking for augmented reality on mobile phones, IEEE Transactions on Visualization and Computer Graphics, vol. 16, 2010, pp. 355-368.
[10] O. Chum and J. Matas, Matching with PROSAC - Progressive Sample Consensus, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1, 2005, pp. 220-226.