Gaze Manipulation for One-to-one Teleconferencing

A. Criminisi, J. Shotton, A. Blake, P.H.S. Torr
Microsoft Research Ltd, Cambridge, UK

Abstract

A new algorithm is proposed for novel view generation in one-to-one teleconferencing applications. Given the video streams acquired by two cameras placed on either side of a computer monitor, the proposed algorithm synthesises images from a virtual camera in arbitrary position (typically located within the monitor) to facilitate eye contact. Our technique is based on an improved, dynamic-programming, stereo algorithm for efficient novel-view generation. The two main contributions of this paper are: i) a new type of three-plane graph for dense-stereo dynamic-programming, that encourages correct occlusion labeling; ii) a compact geometric derivation for novel-view synthesis by direct projection of the minimum-cost surface. Furthermore, this paper presents a novel algorithm for the temporal maintenance of a background model to enhance the rendering of occlusions and reduce temporal artefacts (flicker); and a cost aggregation algorithm that acts directly on our three-dimensional matching cost space. Examples are given that demonstrate the robustness of the new algorithm to spatial and temporal artefacts for long stereo video streams. These include demonstrations of synthesis of cyclopean views of extended conversational sequences. We further demonstrate synthesis from a freely translating virtual camera.

1. Introduction

This paper addresses the problem of novel-view synthesis from a pair of rectified images with specific emphasis on peer-to-peer teleconferencing. With the rise of instant messaging technologies, it is envisaged that the PC will increasingly be used for interactive visual communication. One pressing problem is that any camera used to capture images of one of the participants has to be positioned offset to his or her gaze. This can lead to lack of eye contact and hence undesirable effects on the interaction [4]. Previously proposed solutions to this problem can be broadly categorized as model-based or image-based.
One model-based approach is to use a detailed head model and reproject it into the cyclopean view; whilst this can be successful [10, 11], it is limited to imaging heads, and would not, for example, deal with a hand in front of the face or whiteboard scribbling. A more general approach therefore is to use low level stereo matching, in some form, and follow with an image based rendering approach (IBR) [1].

Figure 1: Camera configuration. The basic setup considers two cameras placed on the frame of the computer monitor. The goal of this paper is that of generating high-quality images for virtual cameras placed anywhere near the computer monitor. It will be demonstrated that gaze correction can be achieved by this technique in an efficient and compelling way.

The aim is to synthesize a view from a virtual camera that is located roughly where the image of the head will be displayed on the screen for each participant, thus achieving eye contact. The basic setup is illustrated in fig. 1. In IBR a depth map is combined with an image to produce the new view. However, in our approach we use a new min-cost surface projection algorithm for novel view generation that deals with occlusions and hole filling in a direct manner, by avoiding the explicit construction of the scene 3D model. In order to generate a depth map a dense stereo algorithm is required, a substantial review of which can be found in [8]. According to the evaluation, two of the most powerful approaches use graph cuts [5, 7] and loopy belief propagation [9] but both of these are currently too computationally intensive for real time applications. Furthermore, the evaluation in [8] may not be valid for our purposes as: (i) the range of disparities considered in [8] is much smaller than in our application (0-29 pixels there, whereas we typically consider 0-80 pixel disparities), (ii) we are primarily interested in new view synthesis, thus it does not matter if the disparities are relatively inaccurate in homogeneous image regions; all that matters is that the new view is well synthesized.
On the other hand where occlusions occur it is important to estimate them accurately since otherwise artefacts such as haloing (a lens-like distortion around foreground objects) become visible. (iii) we consider long video sequences, thus stability of estimation also plays a part: a flickery reconstruction is less desirable than a consistent one.

One of the most computationally efficient algorithms for stereo is Epipolar line Dynamic Programming [6], referred to as DP. We have implemented DP for dense stereo [3], the use of which has previously been demonstrated for cyclopean view interpolation [2] in video. To obtain computational efficiency, observations consist of single-pixel intensities, and the consequent quality of reconstruction (especially on cluttered backgrounds) is not consistently satisfactory, as fig. 2 shows. Reconstruction from standard dynamic programming is characterized by two kinds of error: (i) artefacts produced by mismatches (horizontal streaks), and (ii) the halo in the regions where the background is visible in only one of the two input views. This paper sets out to address and solve both kinds of artefacts.

Figure 2: Fast cyclopean view synthesis by dynamic programming. (a,c) Input left and right views, respectively; (b) Cyclopean view synthesized by standard dynamic programming [2]. Note that gaze is correct in the cyclopean view. The algorithm runs at near real-time rate, but produces artefacts in the cyclopean image. In this case considerable undesired streaks corrupt the synthesized face. Furthermore, a reconstructed temporal cyclopean sequence shows considerable flicker and also an undesirable halo effect around the head.

There are two parts to our method: the first is generating accurate disparity and occlusion maps. Indeed, as will be seen, accurate labeling of occlusions is necessary to remove the haloing effect. The second is representing and using the computed disparities and occlusions to generate a new view. Within the paper we present new contributions in both areas: For the generation of disparities we propose a new type of dynamic programming approach, as path finding through a three-plane graph (as opposed to the traditional single-plane DP), introducing new labels to help the correct identification of occlusions, and altering the cost function employed to favour: (i) correct grouping of occlusions, (ii) formation of occlusions at the boundaries of foreground objects, and (iii) inter scanline consistency.
Second, we introduce the elegant geometry of min-cost surface projection as an efficient technique for generating synthetic views from arbitrary virtual cameras directly from the minimum-cost surface obtained during the DP process.

The new algorithm is demonstrated on long sequences of pairs of synchronized stereo videos taken from static cameras; the methods show compelling novel view generation results. Section 2 describes traditional dynamic programming. Section 3 introduces our improved dense stereo technique. Our view generation approach is described in section 4, and finally, section 5 presents concluding results.

Figure 3: Basic notation. P is a 3D scene point. $O_l$ and $O_r$ are the optical centres of left and right cameras, respectively. f is the focal length of the cameras and B is the baseline between the two optical centres. O is the origin of the reference X, Y, Z coordinate system.

Figure 4: Traditional dynamic programming. (a) Basic diagram for dynamic programming. It represents the matrix of cumulative costs used for computing the minimum cost path. (b) A single-step view of (a) showing the set of the three allowed moves between pixel pairs in [2]. The circles represent elements of the cost matrix in (a).

2. Traditional dynamic programming

In the interests of clarity this section describes briefly the basic dynamic programming algorithm [3]. Figure 3 shows a plan view of the setup. The left and right cameras provide us with the synchronized and rectified input videos. f is the focal length and B the distance between the two optical centres (baseline). A Cartesian coordinate system is chosen with origin at the mid point between the left and right optical centres. In the remainder of the paper we will refer to cyclopean images as the ones generated by a virtual camera with optical centre in O. Given the configuration in fig. 3 the diagram in fig. 4 represents the cumulative cost matrix for a pair of corresponding scanlines in the two input images [3].
The cost M(l, r) of matching a pixel at position l along the left scanline and a pixel at position r along the right scanline is defined simply as the square difference of pixel intensities (which in our experiment in fig. 2 is normalized between 0 and 1). Note that, since $l \ge r \;\; \forall P$ (i.e. disparities are always positive), then it is only ever necessary to consider the lower half of the diagram (grey in fig. 4). The limiting case l = r corresponds to a point at infinity with consequent zero disparity. It is assumed from here on that any computation is restricted in this fashion. Apart from a simple initialization step, the elements of the cumulative cost matrix C are filled, at each iteration, by the following recurrence:

$$C(l, r) = \min \begin{cases} C(l-1, r) + \mathrm{OccCost} \\ C(l-1, r-1) + M(l, r) \\ C(l, r-1) + \mathrm{OccCost} \end{cases} \tag{1}$$

where C(l, r) indicates the cumulative cost of the path reaching the point (l, r) in the matrix. This recurrence defines the forward pass of the DP algorithm. Notice that three moves (or labels) are permitted: horizontal occluded move, diagonal matched move and vertical occluded move, respectively. The cost of having an occluded pixel, OccCost, is a manually set parameter which depends on the image pair being examined; a value of 0.3 seems to yield the best results on a variety of images. At each iteration the minimum cost between the three possible moves is chosen and a table of backwards links is stored for use in the second part of DP. The backward pass of the algorithm follows the saved links, producing the minimum-cost path (fig. 4) and, thereby, the disparity map. It is important to stress the fact that only three types of move are possible in [3], thus confounding occluded and non-occluded moves. As described later, one of the main contributions of this paper is that of expanding the set of permitted moves for correct detection and classification of the occlusion events.

Figure 5: The 3D cost space for a pair of stereo images. (a) The basic diagram of fig. 4 becomes a 3D diagram when all the scanline pairs are considered. The value of each element inside the parallelepiped is the M(l, r) cost of matching two pixels. See text for details. The diagonal plane is called the plane of the virtual image for reasons that will become apparent in the remainder of the paper. (b) A 2D Gaussian filter parallel to the virtual image plane is applied to the 3D cost space to enforce inter-scanline consistency.

3. Our improved DP algorithm

This section describes the proposed matching algorithm.
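For comparison with the improvements that follow, the traditional three-move recurrence (1) of section 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of [3]: the function name `dp_scanline`, the move labels and the "number of pixels consumed" indexing with a zero-cost start cell are assumptions of the sketch, and both scanlines are assumed to have the same length.

```python
import math

def dp_scanline(left, right, occ_cost=0.3):
    """Three-move DP of recurrence (1) for one pair of rectified scanlines.

    left, right: equal-length lists of intensities normalised to [0, 1].
    C[l][r] is the cost of the best path that has consumed l left pixels
    and r right pixels (an indexing convention assumed by this sketch).
    Returns (total_cost, moves); the move labels are hypothetical names.
    """
    n, m = len(left), len(right)
    C = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    C[0][0] = 0.0
    # Forward pass, restricted to the lower half of the matrix (l >= r).
    for l in range(1, n + 1):
        for r in range(min(l, m) + 1):
            candidates = [(C[l - 1][r] + occ_cost, "occ_l")]
            if r > 0:
                match = (left[l - 1] - right[r - 1]) ** 2
                candidates.append((C[l - 1][r - 1] + match, "match"))
                candidates.append((C[l][r - 1] + occ_cost, "occ_r"))
            C[l][r], back[l][r] = min(candidates)
    # Backward pass: follow the stored links from the end of both scanlines.
    moves = []
    l, r = n, m
    while (l, r) != (0, 0):
        move = back[l][r]
        moves.append(move)
        if move != "occ_r":
            l -= 1          # "match" and "occ_l" consume a left pixel
        if move != "occ_l":
            r -= 1          # "match" and "occ_r" consume a right pixel
    moves.reverse()
    return C[n][m], moves
```

For a matched move the disparity is simply the difference between the two pixel positions; note that every horizontal or vertical step is labelled as an occlusion here, which is exactly the ambiguity the five-move model below is designed to remove.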
Our improved DP technique produces better occlusion classification and improved disparity maps which, in turn, yield the removal of rendering artefacts in the synthesized virtual images.

Computing matching costs. In order to aid inter-scanline consistency the matching cost M(l, r) is calculated for every pair of pixels along corresponding epipolar lines with windowed normalised cross-correlation:

$$M(l, r) = \frac{1 - M'(l, r)}{2}, \qquad \text{where} \qquad M'(l, r) = \frac{\sum (I^L - \bar{I}^L)(I^R - \bar{I}^R)}{\sqrt{\sum (I^L - \bar{I}^L)^2 \, \sum (I^R - \bar{I}^R)^2}}$$

is the correlation coefficient (the sums run over the correlation window). Notice that since $-1 \le M'(l, r) \le +1 \;\; \forall l, r$, then $0 \le M(l, r) \le 1 \;\; \forall l, r$. Taller neighborhood windows (e.g. 3 x 7) help incorporate inter-scanline information better than square windows. Computing the costs M(l, r) can be performed efficiently by expansion of the above equation and keeping track of sums from one pair of epipolar lines to the next.

Filtering the matching costs. The use of windows for the computation of the M(l, r) costs helps to reduce artefacts in the final disparity and occlusion maps but it is not sufficient for the complete removal of the artefacts. Therefore, in order to obtain cleaner disparity and occlusion maps we first take all the matrices of M(l, r) costs associated to each pair of scanlines (note: not the cumulative cost matrices), stack them up together to create a 3D cost space (fig. 5), and then apply a two-dimensional Gaussian smoothing filter (parallel to the virtual image plane) to the 3D cost space. The axis of the Gaussian kernel orthogonal to the left and right scanline axes (marked in fig. 5) is responsible for enforcing inter-scanline consistency of the costs and the orthogonal axis (also marked in fig. 5) produces additional smoothing of sharp corners in the occlusion map. The output of this process is the new set of M(l, r) costs used as input to our improved DP algorithm, described below.

The five-move model. A major drawback to the standard DP approach is that slanted surfaces (e.g. non fronto-parallel walls and table tops) in space must be approximated by a combination of diagonal (matched) and horizontal or vertical (occluded) moves.
In such a case the occluded moves do not correspond to real occlusions and therefore, in order to disambiguate between approximating occluded moves and real occlusions we augment the basic 3-move model by adding a further pair of horizontal and vertical matched moves, thus defining a 5-label (a.k.a. 5-move) model. The new model is illustrated in the diagram in fig. 6 (to be compared with fig. 4). This improved model produces a more consistent classification of occluded and matched pixels. In fact, slanted surfaces are correctly approximated by sequences of matched moves only, and real occlusion labels are correctly assigned to pixels visible only in one of the two input views.
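The windowed normalised cross-correlation cost defined under "Computing matching costs" can be illustrated with a direct, unoptimised Python sketch. The function name `ncc_cost`, the image layout (lists of rows), the neutral value returned for a textureless window and the absence of border handling are all assumptions of this example; the running-sum speed-up mentioned in the text is deliberately not reproduced.

```python
import math

def ncc_cost(left_img, right_img, row, l, r, half_h=3, half_w=1):
    """Matching cost M(l, r) = (1 - M'(l, r)) / 2 for one pixel pair.

    left_img, right_img: rectified images as lists of rows of floats.
    The window is (2*half_w + 1) wide and (2*half_h + 1) tall, so the
    defaults give the "taller" 3 x 7 window. The caller must keep the
    window inside both images (no border handling in this sketch).
    """
    a, b = [], []
    for dy in range(-half_h, half_h + 1):
        for dx in range(-half_w, half_w + 1):
            a.append(left_img[row + dy][l + dx])
            b.append(right_img[row + dy][r + dx])
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    if den == 0.0:
        # Assumption: a flat window has no defined correlation, so return
        # the neutral cost that corresponds to M' = 0.
        return 0.5
    rho = max(-1.0, min(1.0, num / den))   # guard against rounding error
    return (1.0 - rho) / 2.0
```

Identical windows give a cost close to 0 and perfectly anti-correlated windows a cost close to 1, in agreement with the bounds stated for M(l, r).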

Figure 6: The improved 5-move model for DP. Five different moves are allowed: three matched moves (horizontal, diagonal and vertical), and two occluded moves (horizontal and vertical). Thus, vertical and horizontal moves acquire two possible meanings: (i) for the explicit modeling of occlusion events, and (ii) approximating slanted surfaces.

The three-plane graph for DP. Furthermore, in order to bias towards runs of identical moves we introduce a further extension of the basic DP technique by defining the DP algorithm on three planes of cumulative cost matrices (as opposed to the single plane in [2]): a left-occluded plane L, a matched plane M, and a right-occluded plane R (see fig. 7). As illustrated in fig. 7, in this new model a total of thirteen moves are permitted. The improved three-plane graph is the basis of our new algorithm for recovering the disparity map. The main advantage of the new model is that it allows us to alter the individual costs of each type of move, independently. For example, biasing the penalty costs against inter-plane moves would tend to keep runs of occluded or non-occluded pixels together, thus reducing most of the inaccuracies in the reconstructed occlusions and disparities. Also, physically impossible moves such as the direct transition between left and right occlusions are prohibited simply by removing certain transitions from the set of allowed ones in the three-plane graph (in fig. 7 the top and bottom planes are never directly linked).

At present, the cost $\pi(A \to B)$ of a generic transition between two planes A and B is manually set, as described below, but further investigation about a possible probabilistic framework is necessary. Moreover, it is reasonable to assume that $\pi(A \to B)$ is symmetric (see footnote 1), i.e. $\pi(A \to B) = \pi(B \to A)$, and also that any move involving the left occluded plane has the same cost as a corresponding move involving the right occluded plane.
This reduces the total number of penalty parameters to three, and then by setting $\pi(M \to M)$ to zero in the matched plane we end up with only two parameters: $\alpha$ being the cost of a move within an occluded plane, and $\beta$ being the cost of a move between different planes (fig. 7).

Footnote 1: To avoid introducing unjustified asymmetries in the way the image data is treated.

Figure 7: The proposed 13-move, 3-plane model for DP. (a) The graph associated to our DP algorithm now lives in three planes to impose constraints between the allowed moves. The allowed moves within planes and between planes are shown by arrows. (b) The permitted moves have been labelled with the associated costs. Some of the labels have been left out for clarity. The entire set of permitted moves and their associated costs is described in the text.

In this new framework the matrices of cumulative costs $C_L$, $C_M$ and $C_R$ (one for each plane in the graph) are initialised to $\infty$ everywhere except in the right occluded plane, where:

$$C_R(i, 0) = i\alpha \tag{2}$$

and then the forward step of the dynamic programming proceeds as follows:

$$C_L(l, r) = \min \begin{cases} C_L(l, r-1) + \alpha \\ C_M(l, r-1) + \beta \end{cases} \tag{3}$$

$$C_M(l, r) = M(l, r) + \min \begin{cases} C_M(l-1, r) \\ C_L(l-1, r) + \beta \\ C_R(l-1, r) + \beta \\ C_M(l, r-1) \\ C_L(l, r-1) + \beta \\ C_R(l, r-1) + \beta \\ C_M(l-1, r-1) \\ C_L(l-1, r-1) + \beta \\ C_R(l-1, r-1) + \beta \end{cases} \tag{4}$$

$$C_R(l, r) = \min \begin{cases} C_R(l-1, r) + \alpha \\ C_M(l-1, r) + \beta \end{cases} \tag{5}$$

where M(l, r) is the filtered cost (as described previously) of matching the l-th pixel in the left scanline with the r-th pixel in the right scanline.

The parameters are chosen as follows: $\alpha$ is set to 1/2, a value chosen such that most good matching costs M(l, r) are less than this. The parameter $\beta$ is set to 1.0: a value too low produces spurious isolated matched pixels within occluded regions, with consequent artefacts; while a value too high results in the minimum-cost path rarely leaving the matched plane.
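The recurrences (2)-(5) translate almost line by line into code. The following Python sketch is one possible reading, not the authors' implementation: the function name `three_plane_dp`, the use of row and column 0 as a boundary, the restriction of the forward pass to l >= r, and the choice of ending the path at the last cell of whichever plane is cheapest are all assumptions made for illustration.

```python
import math

def three_plane_dp(M, alpha=0.5, beta=1.0):
    """Forward and backward pass of recurrences (2)-(5) for one scanline pair.

    M[l - 1][r - 1] holds the filtered matching cost of left pixel l and
    right pixel r (1-based, square matrix). Row and column 0 of the three
    cumulative matrices act as a boundary (an assumption of this sketch).
    Returns (total_cost, path), with path a list of (plane, l, r) states.
    """
    n = len(M)
    C = {p: [[math.inf] * (n + 1) for _ in range(n + 1)] for p in "LMR"}
    back = {}
    for i in range(n + 1):                       # initialisation, eq. (2)
        C["R"][i][0] = i * alpha
        if i > 0:
            back[("R", i, 0)] = ("R", i - 1, 0)

    def best(candidates):
        # candidates: (plane, l, r, penalty) -> (cost, predecessor state)
        return min((C[p][l][r] + pen, (p, l, r)) for p, l, r, pen in candidates)

    for l in range(1, n + 1):
        for r in range(1, l + 1):                # lower half only: l >= r
            # eq. (3): left-occluded plane, two incoming moves
            C["L"][l][r], back[("L", l, r)] = best(
                [("L", l, r - 1, alpha), ("M", l, r - 1, beta)])
            # eq. (4): matched plane, nine incoming moves
            moves = [(p, l - dl, r - dr, 0.0 if p == "M" else beta)
                     for dl, dr in ((1, 0), (0, 1), (1, 1)) for p in "MLR"]
            cost, prev = best(moves)
            C["M"][l][r], back[("M", l, r)] = M[l - 1][r - 1] + cost, prev
            # eq. (5): right-occluded plane, two incoming moves
            C["R"][l][r], back[("R", l, r)] = best(
                [("R", l - 1, r, alpha), ("M", l - 1, r, beta)])

    end_plane = min("LMR", key=lambda p: C[p][n][n])
    total = C[end_plane][n][n]
    state, path = (end_plane, n, n), []
    while state != ("R", 0, 0):                  # backward pass
        path.append(state)
        state = back[state]
    path.reverse()
    return total, path
```

The two, nine and two incoming moves of the L, M and R planes add up to the thirteen permitted moves, and the left and right occluded planes never exchange cost directly, which is how the physically impossible transitions are excluded.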

Figure 8: Filtering the cost space. (a) One of the two input images. (b) Occlusion map for the cyclopean view obtained without cost smoothing; green and red mark occluded pixels, cyan and magenta mark horizontal and vertical matched moves, white is foreground and black is background. (c) Corresponding disparity map obtained with no cost smoothing. (d) Occlusion map for the cyclopean view obtained with Gaussian smoothing ($\sigma = 4.0$) of the 3D cost space. Only the real occlusions are shown here. (e) Corresponding disparity map obtained with Gaussian smoothing ($\sigma = 4.0$) of the 3D cost space. A 3 x 3 correlation window was used in both examples.

Example result from improved DP. Figure 8 demonstrates two concepts: i) our improved DP algorithm correctly labels occluded pixels and distinguishes them from horizontal and vertical matched moves; and ii) solid occluded areas are reliably detected around the head. In fig. 8d the large left and right occluded areas appear cleaner and solid (no spurious matched pixels within). Furthermore, slanted surfaces such as the walls and the face are modeled by sequences of diagonal, vertical and horizontal matched moves (not shown in the figure, for lack of space). We have found that smoothing of the costs considerably improves the results of the dynamic programming, and enables the window size of the cross-correlation matching function to be reduced considerably (without reducing the quality of the results) for potentially faster execution speeds. A 3 x 3 window has been found to work consistently well. Much poorer results are obtained from smoothing the estimated disparity maps. Shiftable windows [8] were also tried here but did not appear to have a large effect, probably due to the small window sizes in use (3 x 3).

Generating the cyclopean view. The synthesis of the cyclopean (central) view can be done for each scanline by simply taking a point p (fig. 9) on the minimum cost path, taking the colours of the corresponding pixels $p_l$ and $p_r$ in the left and right scanlines, averaging them and projecting the newly obtained pixel orthogonally to the virtual image plane into the virtual image point $p_v$.

Figure 9: Generating the cyclopean view. (a) For matched points: a matched point p (on blue segments) is projected orthogonally onto its corresponding point $p_v$ on the virtual scanline. The pixel value of the virtual pixel $p_v$ is the average of the corresponding pixels $p_l$ and $p_r$ on left and right images, respectively. (b) For occluded regions: a point p on the continuation of the background (with same disparity, dashed blue line) is projected orthogonally onto its corresponding point $p_v$ on the virtual scanline. Since we are dealing with a left occlusion then the pixel value for $p_v$ is the same as that of its corresponding point $p_r$ on the right view only. This implements the fronto-parallel occlusion filling directly from the analysis of the graph.

In order to fill the occluded regions, a fronto-parallel assumption is used, i.e. the background is continued at the same depth (dashed lines in fig. 9). Here, for a left occlusion, the pixel values are taken only from the right image and vice-versa. The disparity value is set accordingly. The fronto-parallel approximation does not work well if the occlusion regions present isolated matched pixels and therefore, obtaining solid and reliable occlusion regions is of paramount importance. As shown, the use of the proposed thirteen-move, three-plane algorithm with the extra cost-smoothing step produces extremely solid occlusion regions (fig. 8d) and, consequently, visually convincing background propagation into the occlusion regions.

Temporal occlusion filling. Despite the progress obtained in the synthesis of cyclopean views from a stereo pair of still images, when the same algorithm is applied to a sequence of stereo images then small temporal artefacts become visible (e.g. flickering).
To avoid this problem we proceed with a temporal construction of a model of the background that allows us to fill in the regions of missing information at a given time with pixel values which may have been available in previous time instances.

In order to do so the accurately estimated disparity surface is first segmented into foreground and background for each frame by employing the following algorithm: along each scanline in the disparity surface, for each run of occlusions, the disparity at the highest disparity end of the run is histogrammed. The valley in the resulting bi-modal histogram defines the disparity threshold. This is in line with the assumption that almost all runs of occlusions will occur to either side of the head and so the histogram will have a sharp peak where the foreground starts. This turns out to be the case for a large number of sequences and so a foreground threshold disparity can be automatically set. Figure 8d shows also the results of the segmentation (foreground in white and background in black).

Figure 10: Temporal background generation. (left column) Synthesized cyclopean views for different frames (0, 70 and 170). More examples of synthesised cyclopean views are provided in the remainder of the paper. (right column) Corresponding cyclopean background models. As new regions of the background are discovered the background model is updated and occluded areas filled in.

Figure 11: Basic notation for virtual image generation. $O_l$, $O_r$ and $O_v$ are the optical centres of left, right and virtual cameras respectively. The optical centre of the virtual camera can be placed anywhere in space and the corresponding virtual image is synthesized by our algorithm.

Given the segmented foreground and background, new options become available, e.g. to replace the background entirely with a chosen photograph or video, or to dynamically update the background model for use in filling occluded areas in successive frames. The latter is especially useful in the next section which introduces the three-dimensional motion of the virtual camera. In fact, for example, as the virtual camera centre moves away from the baseline of the two input cameras less information is available from individual frames in the occluded regions and temporally acquired information becomes extremely useful.

In the second step of the algorithm a background model is constructed and updated at each time instance. The background model is made of three elements: its disparity map $D_B$ in cyclopean coordinates, and the corresponding left and right images $I_B^l$ and $I_B^r$, respectively.
At each time instance t the background model is updated by the following rule:

$$\begin{aligned} D_B^{t}(p) &= \tau D_B^{t-1}(p) + (1 - \tau)\, D^{t}(p) \\ I_B^{l,t}(p_l) &= \tau I_B^{l,t-1}(p_l) + (1 - \tau)\, I^{l,t}(p_l) \\ I_B^{r,t}(p_r) &= \tau I_B^{r,t-1}(p_r) + (1 - \tau)\, I^{r,t}(p_r) \end{aligned} \tag{6}$$

where p is a pixel whose disparity D(p) falls below the automatically computed foreground threshold $\hat{d}$ (and thus belongs to the background), and $p_l$ and $p_r$ are the corresponding positions on left and right input images, respectively. $D_B^t(p)$ is the disparity of the pixel p in the current background model at time t. The scalar factor $\tau$ represents a decay constant ($0 \le \tau \le 1$) and I indicates intensities. The update rule (6) applies to all the pixels which belong to the background and are visible, and does not apply to occluded pixels.

The presence of the decay factor $\tau$ (we use $\tau = 0.9$) in the update rule (6) has the desired effect of temporal smoothing of the output visual data, with the consequent reduction of pixel flicker (instability). This leads to an improved temporal consistency in the reconstructed occluded regions. Figure 10 illustrates the results of the temporal background filling algorithm.

4. Simulating the 3D motion of the virtual camera

This section describes a novel, compact technique for rendering virtual views directly from the estimated minimum-cost surface, thus negating the need to construct an explicit 3D model of the scene.

Figure 11 shows a plan view of the system with the optical centre of the virtual camera being placed in $O_v$. A 3D scene point P is projected on the left and right images into the points $p_l = (x_l, y_l)$ and $p_r = (x_r, y_r)$ respectively. Also, P is projected on the cyclopean camera (with optical centre in $O_c = O$) in the point $p_c = (x_c, y_c)$ (not shown in the figure) and on the virtual camera (with optical centre $O_v$ in a generic position) in the point $p_v = (x_v, y_v)$. The disparity between the corresponding left and right image points is easily computed as

$$d = x_l - x_r = \frac{fB}{Z}. \tag{7}$$

In the cyclopean camera, by triangle similarity we can compute

$$x_c = \frac{fX}{Z}. \tag{8}$$

Figure 12: Virtual camera motion. (a) The 3D motion of the virtual camera is achieved by direct projection of points on the minimum-cost surface into the virtual image plane. (b) Moving the centre of projection Q corresponds to translating the virtual camera. The coloured arrows indicate the mapping between moving the centre of projection Q in our diagram and the corresponding translations (inwards, outwards, upwards, downwards) of the virtual camera in the scene.

Figure 13: Example of gaze correction. (a,c) Input left and right views, respectively; (b) Our algorithm does correct the gaze while eliminating the artefacts. To be compared with fig. 2.

For a virtual camera with optical centre in $O_v = (T_x, T_y, T_z)$ we can write $(X - T_x) : x_v = (Z - T_z) : f$, from which

$$x_v = f\,\frac{X - T_x}{Z - T_z}. \tag{9}$$

By substituting (7) and (8) into (9), i.e. $Z = fB/d$ and $X = x_c B/d$, we obtain:

$$x_v = \frac{x_c - d\,T_x/B}{1 - d\,T_z/(fB)}$$

which, together with the analogous equation for the $y_v$ coordinate, can be rewritten in homogeneous coordinates as:

$$\begin{pmatrix} w\,x_v \\ w\,y_v \\ w \end{pmatrix} = \begin{pmatrix} 1 & 0 & -T_x/B & 0 \\ 0 & 1 & -T_y/B & 0 \\ 0 & 0 & -T_z/(fB) & 1 \end{pmatrix} \begin{pmatrix} x_c \\ y_c \\ d \\ 1 \end{pmatrix}. \tag{10}$$

Equation (10) represents a projection of 3D points into a plane. It can be proven that (10) corresponds to projecting points of the disparity surface into the corresponding points on the plane of the virtual image (up to a scale, diagonal matrix) as illustrated in fig. 12. From (10) the centre of projection Q is readily computed as the null vector of the projection matrix, thus yielding:

$$Q = \left( \frac{T_x}{B}, \; \frac{T_y}{B}, \; 1, \; \frac{T_z}{fB} \right).$$

Notice that for $T_z = 0$ the transformation (10) is a parallel projection (Q is at infinity). This, in turn, means that sidewise (in the X direction) and up/down (in the Y direction) motion of the virtual camera can be simulated by simple projection of points of the min-cost surface onto the virtual image plane via parallel rays. The inwards/outwards translation of the virtual camera ($T_z \ne 0$), instead, is achieved by means of a central projection with finite centre of projection Q.
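As a numerical check of this derivation, the sketch below builds the 3 x 4 matrix of (10), applies it to a cyclopean point with known disparity, and returns the centre of projection Q. It is only an illustration under stated assumptions: the function names (`projection_matrix`, `project`, `centre_of_projection`) are hypothetical, and rendering details such as inverse mapping and interpolation are left out.

```python
def projection_matrix(T, f, B):
    """3 x 4 matrix of equation (10) for a virtual camera centred at T."""
    Tx, Ty, Tz = T
    return [
        [1.0, 0.0, -Tx / B, 0.0],
        [0.0, 1.0, -Ty / B, 0.0],
        [0.0, 0.0, -Tz / (f * B), 1.0],
    ]

def project(H, xc, yc, d):
    """Map a cyclopean point (xc, yc) with disparity d to (xv, yv)."""
    v = (xc, yc, d, 1.0)
    xw, yw, w = (sum(a * b for a, b in zip(row, v)) for row in H)
    return xw / w, yw / w          # divide out the homogeneous scale w

def centre_of_projection(T, f, B):
    """Null vector Q of the matrix in (10)."""
    Tx, Ty, Tz = T
    return (Tx / B, Ty / B, 1.0, Tz / (f * B))
```

For example, with T = (-B/2, 0, 0) the mapping reduces to x_v = x_c + d/2, which is the left-image coordinate x_l, and with T = (0, 0, 0) it is the identity on (x_c, y_c), i.e. the cyclopean view.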
The simple mapping between the motion of the centre of projection Q and the corresponding translation of the virtual camera is illustrated in fig. 12. For instance, inwards translation (not zoom) is achieved by moving the centre Q from $+\infty$ towards the plane of the virtual image.

Figure 14: Forward/backward translation of virtual camera. The top row shows the input left and right views. The bottom row shows the synthesized cyclopean views with (left) forward virtual camera translation, (center) cyclopean view, (right) backward virtual camera translation. Notice the parallax effect around the head.

Notice that for Q = (-1/2, 0, 1, 0) (i.e. $O_v = (-B/2, 0, 0)$) the virtual image corresponds to the input left image, for Q = (1/2, 0, 1, 0) (i.e. $O_v = (B/2, 0, 0)$) the virtual image corresponds to the input right image, and for Q = (0, 0, 1, 0) (i.e. $O_v = (0, 0, 0)$) the virtual image corresponds to the half-way cyclopean image.

In order to produce high quality output images inverse mapping and bilinear interpolation techniques are used.

5. Results

Generating cyclopean views from still image pairs. Figure 13 shows an example where the input left and right images of fig. 2 have been used to generate the cyclopean view via our algorithm. Notice that the spatial artefacts (streaks in fig. 2) have been considerably reduced. In the central image the gaze has been corrected.

3D translation of the virtual camera. Figure 14 shows an example of translating the virtual camera towards and away from the viewed scene. This is different from simple zooming or cropping of the output image. In fact, a parallax effect may be noticed in the boundary between the head and the background, thus providing the correct three-dimensional feeling.

Figure 15: In-plane translation of virtual camera. The left and right input images are the same as in fig. 14. This table shows the synthesized images corresponding to translation of the virtual camera along the x and y axes. Notice the parallax effect around the head. Also, the door frame is reconstructed nicely despite its partial occlusion in the right input view.

Figure 15 shows an example of in-plane translation (along the X and Y directions) of the virtual camera. Notice the relative displacement of the head with respect to the background.

Cyclopean view generation in long sequences. Finally fig. 16 demonstrates the effectiveness of the proposed algorithm for reconstructing cyclopean views of extended temporal sequences. Notice that most of the spatial and temporal artefacts are removed.

Figure 16: Gaze correction for long sequences. Frames extracted from a reconstructed cyclopean sequence (over 10 sec long).

6. Conclusions and Future Work

We have presented an efficient algorithm for the synthesis and geometric manipulation of high-quality virtual images from a pair of synchronized stereo sequences with large disparities (0-80 pixels). In this paper we have focused on one-to-one teleconferencing applications but the techniques described are more general and can be employed in other applications requiring novel view generation and dense stereo.

The newly proposed three-layer graph for dynamic programming, the anisotropic cost filtering and the temporal background model building have been demonstrated effective in the synthesis of novel views with virtual cameras placed in generic location. With the current unoptimized implementation these results have been produced at a rate of about a frame every 2 sec (on 320 x 240 images, on a 2.8GHz Pentium).

Finally, for the future development of the work presented in this paper, thorough experimentation with different camera layouts, code optimization for real-time synthesis, and the generation of standard test sequences for evaluation of the algorithm will be necessary.

Acknowledgements.
The authors would like to thank C. Rother, I. Cox, P. Anandan and R. Szeliski for their useful comments and inspiring discussions.

References

[1] E. Chen and L. Williams. View interpolation for image synthesis. In SIGGRAPH, pages 279-288, 1993.
[2] I. Cox, M. Ott, and J.P. Lewis. Videoconference system using a virtual camera image. US Patent, 5,359,362, 1993.
[3] I.J. Cox, S.L. Hingorani, and S.B. Rao. A maximum likelihood stereo algorithm. Computer Vision and Image Understanding, 63(3):542-567, 1996.
[4] J. Gemmell, K. Toyama, C. Zitnick, T. Kang, and S. Seitz. Gaze awareness for video-conferencing: A software approach. IEEE Multimedia, 7(4), 2000.
[5] V. Kolmogorov and R. Zabih. Multi-camera scene reconstruction via graph cuts. In Proc. Europ. Conf. Computer Vision, Copenhagen, Denmark, May 2002.
[6] Y. Ohta and T. Kanade. Stereo by intra- and inter-scan line search using dynamic programming. IEEE Trans. on Pattern Analysis and Machine Intelligence, 7(2):139-154, 1985.
[7] S. Roy and I.J. Cox. A maximum-flow formulation of the n-camera stereo correspondence problem. In Proc. Int. Conf. Computer Vision, pages 492-499, 1998.
[8] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Computer Vision, 47(1-3):7-42, 2002.
[9] J. Sun, H. Y. Shum, and N. N. Zheng. Stereo matching using belief propagation. In Proc. Europ. Conf. Computer Vision, Copenhagen, Denmark, May 2002.
[10] T. Vetter. Synthesis of novel views from a single face image. Int. J. Computer Vision, 28(2):103-116, 1998.
[11] R. Yang and Z. Zhang. Eye gaze correction with stereovision for video tele-conferencing. In Proc. Europ. Conf. Computer Vision, volume 2, pages 479-494, Copenhagen, Denmark, May 2002.