Real-Time Multi-Step View Reconstruction for a Virtual Teleconference System




EURASIP Journal on Applied Signal Processing 2002:10, 1067-1087 © 2002 Hindawi Publishing Corporation

Real-Time Multi-Step View Reconstruction for a Virtual Teleconference System

B. J. Lei
Information and Communication Theory Group, Department of Mediamatics, Faculty of Information Technology and Systems (ITS), Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
Email: B.J.Lei@its.tudelft.nl

E. A. Hendriks
Information and Communication Theory Group, Department of Mediamatics, Faculty of Information Technology and Systems (ITS), Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
Email: E.A.Hendriks@its.tudelft.nl

Received 9 August 2001 and in revised form 0 February 2002

We propose a real-time multi-step view reconstruction algorithm and tune its implementation to a virtual teleconference application. Theoretical motivations and practical implementation issues of the algorithm are detailed. The proposed algorithm can be used to reconstruct novel views at arbitrary poses (position and orientation) in a way that is geometrically valid. The algorithm is applied to a virtual teleconference system. In this system, we show that it can provide high-quality nearby virtual views that are comparable with the real perceived view. We show experimentally that, due to the modular approach, a real-time implementation is feasible. Finally, we show that the proposed view reconstruction approach can be integrated seamlessly with other parts of the teleconference system; this integration can speed up the virtual view reconstruction.

Keywords and phrases: teleconference, 3D vision, view reconstruction, real-time, virtual.

1. INTRODUCTION

There is a recent trend in computer graphics and computer vision to merge with and benefit from each other []. Computer graphics is good at providing high-quality 3D perception with the help of complex 3D models []. However, human eyes are still able to easily distinguish a computer-generated picture from real-life photographs [3].
On the other hand, in the computer vision field we have photorealistic images, from which various kinds of meaningful information can be extracted [4]. But until now, except in restricted environments, the information we can obtain has not been sufficient to construct very accurate 3D models automatically [5]. Modelling complex, nonrigid objects like the human body and face [], which is important for telepresence video conference systems, is even more difficult. Approaches like Image Based Modeling (IBM) [6] and Image Based Rendering (IBR) [7] have been proposed in order to efficiently bridge the gap between these two fields. This exchange of knowledge and experience has proved to be very promising [8].

Within the Fifth European Framework project VIRTUE (VIRtual Team User Environment) [9], a 3-party virtual teleconferencing system is being developed in order to realize the convergence of computer graphics and computer vision in a real application. In this system the participants are given a convincing impression of presence in a semi-immersive environment. This environment is required to be characterized by eye-to-eye contact, gaze awareness, direct body language, life-sized portraits, and a shared work space (Figure 1). It was found that, in order to fulfill these requirements, it is essential to adapt the perceived views of all conferees to their viewpoint changes. For example, in Figure 2, if conferee A wants to be given a looking-around feeling about conferee C (in order to have a telepresence of C at the location of A), the correct view of C corresponding to the viewpoint of A should be reconstructed in real time from the two fixed views coming from cameras 1 and 2 at site 2. (For a more detailed discussion of VIRTUE, see Appendix A.) Real-time novel view reconstruction is the generation of arbitrary novel views from a limited number of known views; it is crucial for the success of VIRTUE.
Not only should the quality of the reconstructed novel view be realistic enough, but the broadcasting and the view reconstruction should also be done in real time (e.g., 25 frames/second). The real-time broadcasting issue has been addressed in [0].

Figure 1: The mock-up of one VIRTUE station, where the real table is extended seamlessly into the virtual table in the display. The full-size remote participants are rendered as arbitrary 2D video objects and their synthesized looks change in line with the local participant's head position. The two cameras mounted on the left and top-left side of the screen provide two video streams for 3D analysis and view synthesis for the left viewer in the display; likewise for the two right-hand side cameras. Eye-to-eye contact, normal habitual hand gesturing, and gaze awareness are expected to be maintained.

This paper mainly deals with the view reconstruction issue, following the aforementioned IBR approach. Due to the symmetry in the system, we concentrate only on reconstructing a novel view of conferee C (remote participant) at site 1 (local site) from the information provided by cameras 1 and 2 at site 2 (remote site) and by the viewpoint of conferee A (local participant). Henceforth, cameras 1 and 2 at the remote site are referred to as the left (C_L) and the right (C_R) camera, respectively. These two cameras form a stereo setup for the view reconstruction. Each time, at the remote site, the fixed stereo setup acquires two images. After segmentation, the pair of stereo views, containing only the remote participant without background, is broadcast to the local site. Locally, based on the information about the stereo setup, the local display, and the pose (position and orientation) of the viewpoint of the local participant, these two views are used to reconstruct a novel view (telepresence) of the remote participant that is adapted to the current local viewpoint. The reconstructed novel view is then combined with a man-made uniform virtual environment in order to give the local participant the perception of being in a local conference with the remote participant.

The paper is organized as follows. In Section 2, an overview of work related to IBR is given. The processing chain of the whole VIRTUE system is introduced in Section 3. In Section 4, we conduct a theoretical analysis of the proposed view reconstruction algorithm. Furthermore, the algorithm's relationship with previous work is examined. Based on this analysis, the real-time implementation issues are considered in Section 5. Two possible schemes are investigated and compared with each other. The best option is chosen for real-time realization in VIRTUE. Experiments are presented in Section 6 in order to demonstrate the quality and the speed of the implemented algorithm. Comparisons are made with two other well-studied approaches. Finally, we draw conclusions in Section 7.

2. RELATED WORK

Traditionally, the view reconstruction problem was solved by first constructing a 3D model out of the acquired information. The 3D model is then projected into a virtual camera, combined with texture mapping, in order to obtain the desired view. The first step is usually called 3D reconstruction in computer vision. The second step is called either rendering in computer graphics or simply projection in computer vision. However, since real-time 3D reconstruction has proved to be very difficult [] and often ill-posed [], IBR has been studied intensively as an alternative [7]. One big advantage of the IBR scheme is that its complexity is proportional to the image size (independent of the scene complexity). This property makes a stable real-time implementation possible [3]. Below we briefly address the IBR method. More general and detailed discussions can be found in [3, 7, ].

Following the IBR approach, different kinds of representations of the 3D scene have been proposed.

Collection of pure 2D images
A collection of pure 2D images from either video sequences [4] or multiple cameras at different poses [5] is directly used for reconstructing novel views.

Layer representation
The whole 3D scene is decomposed into multiple layers according to different purposes. These layers provide faster and more robust rendering possibilities. Several kinds of layers have been proposed in the literature: motion-consistent layers in [6], occlusion-derived layers in [7, 8, 9], and planar layers in [0].

Plenoptic function
This function was introduced by Adelson and Bergen [] in order to characterize the complete flow of light for all possible wavelengths, at any time, from each direction of a 3D scene, and at every pose. Afterwards many simplifications of it have been attempted [7]. No geometry information is needed to produce a new view, due to its complete description of the 3D scene. However, it inevitably relies on oversampling, which is both time and labor consuming.

Panorama view
A hand-created or automatically generated view on a circular space from which a viewer can inspect his surroundings in all directions []. This technology has gained great success in real applications such as the QuickTime VR system [3]. But the position of the viewer is constrained to a fixed point and the capturing of the needed views should be done in a special way (all focal centers coincide with each other).

Mosaic
All images are warped into a uniform image coordinate system [4]. It can produce a very realistic immersive environment map. Recently, the stereo cue has started to be added into it [5]. However, manual work cannot be avoided.

Figure 2: Illustration of the networked VIRTUE stations in a three-way telepresence video conferencing session. Assuming accurate 3D scene analysis, the technique behind the novel view generation is the focus of the current paper. (The figure shows three sites with participants A, B, and C connected over a network; each station comprises a camera system, stereo microphones and speakers, a real table extended by a virtual table, and document sharing. Motion parallax cue: when the viewer moves his/her head, his viewpoint changes, and so does the view he receives.)

Tour into a picture
By incorporating some human knowledge and observation about the 3D information embedded in an image, Youichi et al. [6] can even represent a 3D scene by one single picture (also see [7]). Moreover, by taking into account more constraints, a panorama can even be created from only one available view [8]. Again, manual work is needed to provide the rough 3D information.

All representations mentioned above provide certain nice properties for reconstructing new views. However, except for the first two, all of them are very difficult, if not impossible, to obtain automatically. The layered representation of 3D scenes offers a very nice opportunity to handle occlusions and redundancy in the stereo views. But it is rather complicated to segment the images according to a certain consistency (e.g., motion) and later group the regions into different layers [8]. Therefore, the first representation, with pure stereo views, is chosen as the starting point for the VIRTUE view reconstruction.

In order to reconstruct a new view directly from a discrete set of images at different poses, four main approaches are currently followed.

Hybrid 3D models and image rendering
By combining simplified 3D structure models and realistic photographs, Debevec [8] has made amazing visual products (see http://www.debevec.org/). However, due to the use of 3D models, manual work cannot be avoided and thus the whole process (including modelling and rendering) cannot be automatic.

Visual hull
Starting from the visible silhouette information embedded in the given views, this approach approximates the geometry of the considered scene with either multiple visual hulls (the intersections of the silhouette cones from all the given views) [9] or a single convex hull [30]. By performing all computations in the 2D image space instead of in the 3D space, as the traditional discrete volumetric representations did [3], real-time rendering may be guaranteed [9]. However, the visual hull of an object does not match the object's exact geometry and in particular it cannot represent concave surface regions [3].

Geometry driven methods
Geometric relations between known views are first completely recovered and then used for transferring the known views to the new pose. Two schemes have been investigated in this category.

(1) Using the fundamental matrix: Laveau and Faugeras [33], by using the fundamental matrix, tried to construct a new virtual view from N views. Each pixel in the desired novel view is mapped from a pixel in one known view based on the fundamental matrix between them. An attractive idea in [33] is the use of the ray-tracing technique for handling the occlusion areas.

(2) Using the trifocal tensor: trying to avoid the singular situations (the optical centers of the two known views and the virtual view being collinear) that may occur with the fundamental matrix, Avidan and Shashua [34] explored the usage of trifocal tensors (also called trilinearity) in view reconstruction.

Figure 3: The processing chain for adapting the synthesized look of one participant in line with the viewpoint change of another participant. Based on a pair of stereo sequences, the virtually perceived view should be reconstructed and integrated seamlessly with the man-made uniform environment in real time. (The chain runs from the left and right cameras through segmentation and distortion correction to the distortion-free views V_L and V_R, rectification to the parallel views V_recti_L and V_recti_R, estimation of the LR and RL disparity maps, synthesis of the virtual destination view V_D, and finally composition with the man-made virtual surroundings into the virtual local conference with the remote party.)

The solution in [34] is theoretically perfect for a stereo setup. It can recover arbitrary viewpoints from a discrete set of stereo images in a geometrically valid way. Even more, by replacing the trifocal tensor with higher-order tensors (e.g., the quadrifocal tensor [35]), the geometry of the new view can be reconstructed directly from multiple (>2) discrete known views. However, there are two main disadvantages to this approach: (1) occlusion areas are very difficult to take into consideration; (2) the calculation of the trifocal tensor is rather complex. Although a simple equation is derived in [34] to transform the trifocal tensor in line with the changing viewpoint, all its 27 elements must be updated each time. This hinders the possibility of real-time processing.

Interpolation based methods
Chen and Williams [36] postulated a possible interpolation scheme for intermediate views from a stereo pair. Later this method was applied successfully in a vision system, QuickTime VR [3]. Seitz and Dyer [5, 37] investigated further along this line and proposed the physically valid view interpolation method (pre-warping followed by interpolation plus post-warping). With this method, arbitrary views (with valid geometry) on the baseline between a pair of stereo views can be reconstructed. Later on, Scharstein [] added more freedom to the possible pose of the desired novel view. Therein the only constraint is that the trifocal plane formed by the three focal centers of the two original views and the novel view should not intersect the concerned 3D scene (see [, page 47]).

One major advantage of this approach is that it performs very well when processing relatively complex scenes. Two other nice properties of this interpolation based approach that have not been addressed so far are the following.

Modular arrangement
This gives us much space to speed up the whole view reconstruction process by either rearranging processing components or simplifying the complex parts without affecting the quality of the final novel view. Thus a stable real-time realization may be guaranteed.

Parallel setup
By applying this setup, computations may be simplified from the 2D space to multiple 1D spaces, allowing parallel processing. Therefore, it can help to keep the latency very low. Further, this parallel setup allows us to handle the occlusion areas in a very intuitive way.

For our virtual conference system with telepresence perception, the interpolation based approach seems to be the most promising one. Our objective is to put it into a practical application, which has seldom been studied except in rather constrained environments like the one described in [3]. Modular arrangement and parallel setup will be explored in Section 5. In the following section, an introduction to the processing chain of the whole VIRTUE system is given. Each stage in the process is briefly addressed.

3. VIRTUE SYSTEM PROCESSING CHAIN

The intended processing chain of VIRTUE is depicted in Figure 3. It can be divided into four stages: pre-processing, disparity estimation, view reconstruction, and composition.

3.1. Pre-processing

The original images coming from the fixed stereo setup contain various kinds of imaging distortions [38] and background that should be removed or neglected. Some pre-processing is needed to get rid of this unnecessary information.

3.1.1. Segmentation
For the segmentation purpose, the background is first imaged by all employed cameras for several seconds. These images are processed to build a Gaussian model at every pixel for each camera, approximating the scene background with a textured surface. This Gaussian model is then used to distinguish foreground pixels from background pixels during the conference session. It has been shown that this kind of change detection scheme is very flexible, feasible for real-time implementation, and thus able to meet the requirements of VIRTUE [39].

3.1.2. Distortion correction
Camera distortions [38] are compensated in this step to keep all following operations linear in the projective space. Camera distortions are nonlinear. However, if the imaging distortion definition is employed [38], then there is a one-to-many correspondence between the distorted image coordinates and the distortion-free coordinates. This enables us to construct beforehand a backward-mapping lookup table for distortion correction, which is attractive for real-time implementation.

3.2. Disparity estimation
Based on a pair of rectified parallel views, the disparity maps (both left-to-right and right-to-left) [40] are estimated to represent implicitly the 3D information contained in them. A hybrid block- and pixel-recursive approach has been chosen for VIRTUE. The main idea of this algorithm is to combine the advantages of a block-recursive disparity estimator and a pixel-recursive optical flow estimator into one common scheme, leading to fast, hybrid, recursive disparity estimation [4]. Although the disparity estimation is very important for the quality of the overall result, it is not within the scope of this paper. For a detailed discussion of the disparity estimator used for VIRTUE, see [4].
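The per-pixel Gaussian change-detection idea behind the segmentation step can be sketched as follows. This is a minimal illustration only, not the VIRTUE implementation of [39]; the function names and the deviation threshold `k` are assumptions introduced here.

```python
import numpy as np

def fit_background_model(frames):
    """Fit a per-pixel Gaussian to a stack of background-only frames.

    frames: (N, H, W) array of grayscale background images.
    Returns the per-pixel mean and standard deviation.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mean = frames.mean(axis=0)
    # Floor the std so noiseless pixels do not yield a zero threshold.
    std = np.maximum(frames.std(axis=0), 1.0)
    return mean, std

def segment_foreground(image, mean, std, k=2.5):
    """Change detection: a pixel is foreground when it deviates from its
    background Gaussian by more than k standard deviations."""
    image = np.asarray(image, dtype=np.float64)
    return np.abs(image - mean) > k * std
```

In a conference session the model is fitted once from a few seconds of empty-room footage and the boolean mask is then computed per frame, which keeps the per-pixel cost to a subtraction and a comparison.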
3.3. View reconstruction
With a pair of pre-processed clean views V_L (the left view) and V_R (the right view), the view reconstruction generates a virtual view V_D in accordance with the actual viewpoint of the local participant. This is the topic of this paper and will be discussed in detail.

3.4. Composition
In the compositor, the 3D model of a man-made uniform environment map is combined with the reconstructed virtual view of the remote participants in order to construct the final virtual conferencing environment. This composition can be realized efficiently using a current off-the-shelf graphics card with texture handling capability []. The final composite view is put on a life-size display to give the local participant the impression of being in a conference room with the other participants.

4. MULTI-STEP VIEW RECONSTRUCTION

In the VIRTUE system described above, the view reconstruction stage plays a vital role. In this section, we generalize the theory of interpolation-based IBR to accommodate arbitrary novel viewpoints. We base this generalization on the known camera geometry and call the generalized method multi-step view reconstruction. We will show that, without assuming any constraints on either the 3D scene or the camera geometry, our multi-step view reconstruction maintains the geometric validity of the reconstructed view.

4.1. Problem analysis
In brief, our problem is to reconstruct an arbitrary novel view V_D from a pair of stereo views V_L (the left view) and V_R (the right view), which come from two fixed-pose cameras C_L and C_R, respectively. Correspondingly, we can treat V_D as coming from a virtual camera C_D (in correspondence with the current viewpoint; its geometry can be calculated from the system configuration, the stereo setup, the local display, and the pose of the viewpoint). So the problem to be solved is simply: to compute V_D as coming from C_D, given V_L from C_L and V_R from C_R.
Considering a single 3D-scene point W, the problem can be simplified further as: given P_L (the projection of W into C_L) and/or (in case of occlusion) P_R (the projection of W into C_R), compute P_D (the projection of W into C_D). Two things should be noticed here: (1) we only address a fully calibrated situation [4], which means that the geometry details of C_L, C_R, and C_D are all known in advance with high accuracy; (2) distortions due to the imaging system have been removed by pre-processing. Thus only linear relations are taken into account henceforth.

4.2. Notation
We denote by $V_\bullet$ the 2D view generated from the camera $C_\bullet$, where $\bullet$ stands for, for example, L, R, D, recti L, recti R, X, Y, Z. $P_\bullet$ is the projection of the 3D-scene point W into $C_\bullet$. The intrinsic parameters of $C_\bullet$ are $f_\bullet$ (focal length), $s_x^\bullet$ and $s_y^\bullet$ (the effective pixel distance in the horizontal (x-axis) and vertical (y-axis) direction, determined by the sampling rates for Vidicon cameras or the sensitive-element distance for CCD or CID cameras, respectively), and $x_0^\bullet$ and $y_0^\bullet$ (the pixel coordinates of the principal point with respect to the image plane coordinate system). The extrinsic parameters of $C_\bullet$ are $R_{c\bullet}$ and $t_{c\bullet}$ (the rotation matrix and translation vector of the CCS (camera coordinate system) with respect to the WCS (world coordinate system)). The projection matrix is $P_\bullet = K_\bullet \tilde{E}_\bullet$, where $K_\bullet$ is the intrinsic matrix and $\tilde{E}_\bullet$ is the extrinsic matrix [38]. In the CCS of $C_\bullet$, the coordinate of W is $w_\bullet$ (its projective correspondence is $\tilde{w}_\bullet$). In the WCS, the coordinate of W is $w$ (its projective correspondence is $\tilde{w}$). The coordinate of $P_\bullet$ in view $V_\bullet$ is $p_\bullet$ (its projective correspondence is $\tilde{p}_\bullet$).
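As a concrete illustration of this notation, the projection of a world point into a camera can be sketched as below. We assume here that the extrinsic matrix maps world coordinates into the CCS as $w_\bullet = R_{c\bullet}^T (w - t_{c\bullet})$, i.e., $\tilde{E}_\bullet = [R_{c\bullet}^T \mid -R_{c\bullet}^T t_{c\bullet}]$; this convention and the function names are our own, for illustration only.

```python
import numpy as np

def intrinsic_matrix(f, s_x, s_y, x0, y0):
    """Pinhole intrinsic matrix K built from the parameters of Section 4.2."""
    return np.array([[f / s_x, 0.0, x0],
                     [0.0, f / s_y, y0],
                     [0.0, 0.0, 1.0]])

def project(K, R_c, t_c, w):
    """Project the world point w into a camera with pose (R_c, t_c) in the WCS.

    Computes lambda * p~ = K [R_c^T | -R_c^T t_c] w~ and returns the pixel
    coordinates p together with the depth lambda of W in the camera frame.
    """
    w_cam = R_c.T @ (np.asarray(w, dtype=float) - t_c)  # world -> CCS
    p_tilde = K @ w_cam
    depth = p_tilde[2]
    return p_tilde[:2] / depth, depth
```

Dividing by the third component of the homogeneous vector is exactly what fixes the otherwise unknown scale factor $\lambda_\bullet$.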

Figure 4: The multi-step view synthesis framework for VIRTUE. Multiple separate steps work together to eliminate three major differences between the final novel view V_D and the two original views V_L and V_R: (1) photometric differences such as focal length, aspect ratio, etc.; (2) position in 3D space (x, y, z); (3) orientation. (The chain runs: distortion-free views V_L, V_R; rectification into the parallel views V_recti_L, V_recti_R with LR and RL disparity maps; interpolation along the x-axis into the X-interpolated view V_X; extrapolation along the y-axis into the Y-extrapolated view V_Y; transfer along the z-axis into the Z-transferred view V_Z; de-rectification into the virtual destination view V_D.)

Further,
$$t_{c\bullet} = \begin{bmatrix} x_{c\bullet} \\ y_{c\bullet} \\ z_{c\bullet} \end{bmatrix}, \quad w_\bullet = \begin{bmatrix} x_w^\bullet \\ y_w^\bullet \\ z_w^\bullet \end{bmatrix}, \quad \tilde{w}_\bullet = \begin{bmatrix} x_w^\bullet \\ y_w^\bullet \\ z_w^\bullet \\ 1 \end{bmatrix}, \quad w = \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix}, \quad \tilde{w} = \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}. \quad (1)$$
Then we can write [38]
$$\lambda_\bullet \tilde{p}_\bullet = P_\bullet \tilde{w} = K_\bullet \tilde{E}_\bullet \tilde{w} = \begin{bmatrix} f_\bullet/s_x^\bullet & 0 & x_0^\bullet \\ 0 & f_\bullet/s_y^\bullet & y_0^\bullet \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R_{c\bullet}^T & -R_{c\bullet}^T t_{c\bullet} \end{bmatrix} \tilde{w}, \quad (2)$$
where $\lambda_\bullet$ is the depth of the point W in the CCS of $C_\bullet$.

The line connecting the focal points of C_L and C_R is usually called the baseline b. Its length is simply denoted b.

4.3. Multi-step view reconstruction process
Without loss of generality, we can specifically select the WCS such that
$$t_{cL} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad t_{cR} = \begin{bmatrix} -1 \\ 0 \\ 0 \end{bmatrix}, \quad t_{cD} = \begin{bmatrix} x_{cD} \\ y_{cD} \\ z_{cD} \end{bmatrix}. \quad (3)$$
This means that the x-axis of the WCS lies on the baseline b and its direction is from C_R to C_L, the origin of the WCS is at the middle point of b, and the unit of the WCS is b/2.

In the general case, the multi-step view reconstruction process can be divided into five steps (Figure 4).

(1) Rectification. We transform the stereo views V_L and V_R into a pair of new views V_recti_L and V_recti_R, respectively. The two virtual cameras C_recti_L and C_recti_R, which generate those two new views, are parallel to each other and share the same image plane. This process is known as stereo rectification [43]. It is intended to eliminate the photometric and orientation differences between the two source cameras, to simplify the correspondence estimation into a 1D search problem along the scan line, and at the same time to provide a parallel processing possibility for the later steps.

(2) X-interpolation. Given the necessary disparity information, the two parallel views V_recti_L and V_recti_R are combined by either interpolation [44] or extrapolation to produce another parallel view V_X. The corresponding camera C_X is located at [x_cD 0 0] with the same rotation and intrinsic parameters [4] as C_recti_L and C_recti_R. Through this step, the x-position difference from the known views V_recti_L and V_recti_R to the final view V_D is eliminated.

(3) Y-extrapolation. The X-interpolated view V_X is extrapolated [] by pixel shifting in the y direction to produce the view V_Y, which comes from a virtual camera C_Y located at [x_cD y_cD 0] with the same rotation and intrinsic parameters as C_X. Through this step, the y-position difference from V_X to the final view V_D is eliminated.

(4) Z-transfer. The Y-extrapolated view V_Y is transferred along the z direction to generate either a closer or a further look V_Z. In the same way, the corresponding camera C_Z is located at [x_cD y_cD z_cD] with the same rotation and intrinsic parameters as C_Y. Finally, the z-position difference from V_Y to the final view V_D is eliminated.

(5) De-rectification. The Z-transferred view V_Z is rotated and scaled to get the final view V_D.

In the following subsections, we go through these five steps and discuss in detail all the formulas involved. Note that throughout the whole transformation process the geometric validity is maintained.
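Each of these five steps is ultimately an image warp. Rectification (step (1)), for instance, amounts to warping each view through a 3x3 homography. A minimal backward-mapping sketch for a grayscale image follows; nearest-neighbor sampling and the helper name are our own simplifications, not the VIRTUE implementation.

```python
import numpy as np

def warp_homography(image, H):
    """Backward-map a grayscale `image` through the 3x3 homography H.

    For every output pixel (x, y), the source position is H^{-1} (x, y, 1)^T
    after dehomogenization; nearest-neighbor sampling, out-of-range -> 0.
    """
    h, w = image.shape
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:h, 0:w]
    ones = np.ones_like(xs)
    # Homogeneous coordinates of all output pixels, mapped back to the source.
    src = Hinv @ np.stack([xs.ravel(), ys.ravel(), ones.ravel()]).astype(float)
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(image)
    out[ys.ravel()[valid], xs.ravel()[valid]] = image[sy[valid], sx[valid]]
    return out
```

Backward mapping (sampling the source at the inverse-warped position) leaves no holes in the output, which is why the distortion-correction lookup table of Section 3.1.2 is built in the same backward direction.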

Real-Time Multi-Step View Reconstruction for a Virtual Teleconference System

4.4. Rectification

To obtain a parallel configuration, $C_{\mathrm{recti}L}$ and $C_{\mathrm{recti}R}$ should have the same set of intrinsic parameters and the same orientation in space []. Further, in order to simplify the notation we may assume, without loss of generality, that $C_{\mathrm{recti}L}$ and $C_{\mathrm{recti}R}$ have the same orientation as the WCS:
$$ K_{\mathrm{recti}L} = K_{\mathrm{recti}R} = K = \begin{bmatrix} \dfrac{f}{s_x} & 0 & x_0 \\[2pt] 0 & \dfrac{f}{s_y} & y_0 \\[2pt] 0 & 0 & 1 \end{bmatrix}, \qquad
R_{c,\mathrm{recti}L} = R_{c,\mathrm{recti}R} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{4} $$
where the intrinsic transformation matrix $K$ [38] and its associated intrinsic parameters are employed just for notational simplicity. Based on the above notation, as both $K_L$ and $K_R$ are invertible, we have [45]
$$ \lambda_{\mathrm{recti}L}\, \tilde{p}_{\mathrm{recti}L} = \lambda_L\, K R_{cL} (K_L)^{-1} \tilde{p}_L + K\left( t_{cL} - t_{c,\mathrm{recti}L} \right), \qquad
\lambda_{\mathrm{recti}R}\, \tilde{p}_{\mathrm{recti}R} = \lambda_R\, K R_{cR} (K_R)^{-1} \tilde{p}_R + K\left( t_{cR} - t_{c,\mathrm{recti}R} \right). \tag{5} $$
However, we do not know the depth of the point $W$ yet. In order to make the above two equations independent of the depth information, we have only one choice [43], which is to set
$$ t_{c,\mathrm{recti}L} = t_{cL}, \qquad t_{c,\mathrm{recti}R} = t_{cR}. \tag{6} $$
In this case, we can construct two homographies [4] $T_L$ and $T_R$ such that
$$ \tilde{p}_{\mathrm{recti}L} = \frac{\lambda_L}{\lambda_{\mathrm{recti}L}} T_L \tilde{p}_L, \qquad
\tilde{p}_{\mathrm{recti}R} = \frac{\lambda_R}{\lambda_{\mathrm{recti}R}} T_R \tilde{p}_R, \tag{7} $$
where
$$ T_L = K R_{cL} (K_L)^{-1}, \qquad T_R = K R_{cR} (K_R)^{-1}. \tag{8} $$
It should be noted that here we need not know the exact values of $\lambda_L/\lambda_{\mathrm{recti}L}$ and $\lambda_R/\lambda_{\mathrm{recti}R}$, because they are determined implicitly by the requirement that the third components of $\tilde{p}_{\mathrm{recti}L}$, $\tilde{p}_L$, $\tilde{p}_{\mathrm{recti}R}$, and $\tilde{p}_R$ are all 1.

After the rectification in (7) is performed, it can be verified that $p_{\mathrm{recti}L}$ and $p_{\mathrm{recti}R}$ have the same $y$ coordinates but different $x$ coordinates [43]. Therefore, we can compute, using a disparity estimation algorithm [40], two disparity maps $D_{LR}$ (left-to-right, based on the view $V_{\mathrm{recti}L}$; the disparity value at $P_{\mathrm{recti}L}$ is denoted $d_{LR}$) and $D_{RL}$ (right-to-left, based on the view $V_{\mathrm{recti}R}$; the disparity value at $P_{\mathrm{recti}R}$ is denoted $d_{RL}$), where
$$ d_{LR} = x_{\mathrm{recti}R} - x_{\mathrm{recti}L}, \qquad d_{RL} = x_{\mathrm{recti}L} - x_{\mathrm{recti}R}. \tag{9} $$

4.5. X-interpolation

Because $C_X$, $C_{\mathrm{recti}L}$, and $C_{\mathrm{recti}R}$ are parallel to each other with the $x$-axis on the baseline $b$, the projections of $W$ into them have the same $y$-coordinates but different $x$-coordinates:
$$ y_X = y_{\mathrm{recti}L} = y_{\mathrm{recti}R} = \frac{f}{s_y} \frac{y_w}{z_w} + y_0, \tag{10} $$
$$ x_{\mathrm{recti}L} = \frac{f}{s_x} \frac{x_w - 1}{z_w} + x_0, \tag{11} $$
$$ x_{\mathrm{recti}R} = \frac{f}{s_x} \frac{x_w + 1}{z_w} + x_0, \tag{12} $$
$$ x_X = \frac{f}{s_x} \frac{x_w - x_{cd}}{z_w} + x_0. \tag{13} $$
By subtracting (11) from (12), we obtain
$$ \frac{2}{z_w} = \frac{s_x}{f} \left( x_{\mathrm{recti}R} - x_{\mathrm{recti}L} \right) = \frac{s_x}{f} d_{LR} = -\frac{s_x}{f} d_{RL}. \tag{14} $$
By subtracting (12) from (13), we get
$$ x_X = x_{\mathrm{recti}R} - \frac{f}{s_x} \frac{x_{cd} + 1}{z_w}. \tag{15} $$
By substituting (14) into (15), we get
$$ x_X = \underbrace{\frac{x_{\mathrm{recti}L} + x_{\mathrm{recti}R}}{2}}_{\text{Middle view}} - \frac{x_{cd}}{2} \underbrace{\left( x_{\mathrm{recti}R} - x_{\mathrm{recti}L} \right)}_{\text{Disparity}}. \tag{16} $$
Equation (16) is just the interpolation equation derived by Seitz and Dyer in [44]. However, it should be noted that, in order to obtain (16), we did not add the ordering constraint [46]. By rearranging (16) we have
$$ x_X = x_{\mathrm{recti}L} + \frac{1 - x_{cd}}{2} d_{LR} \tag{17} $$
and/or (in case of occlusion)
$$ x_X = x_{\mathrm{recti}R} + \frac{1 + x_{cd}}{2} d_{RL}. \tag{18} $$
To keep the 3D information that was recovered from the stereo data, a new disparity map $D_X$ is constructed wherein (based on the view $V_X$; the disparity value at $P_X$ is denoted $d_X$)
$$ d_X = \frac{d_{RL}}{2} = -\frac{f}{s_x z_w} \quad \text{or} \quad d_X = \frac{d_{LR}}{2} = \frac{f}{s_x z_w}. \tag{19} $$

4.6. Y-extrapolation

As $C_X$ and $C_Y$ are parallel to each other with aligned $y$-axes, the projections of $W$ into them have the same $x$-coordinates but different $y$-coordinates:
$$ x_Y = x_X = \frac{f}{s_x} \frac{x_w - x_{cd}}{z_w} + x_0, \tag{20} $$

$$ y_X = \frac{f}{s_y} \frac{y_w}{z_w} + y_0, \tag{21} $$
$$ y_Y = \frac{f}{s_y} \frac{y_w - y_{cd}}{z_w} + y_0. \tag{22} $$
By subtracting (21) from (22) we have
$$ y_Y = y_X - \frac{f}{s_y} \frac{y_{cd}}{z_w} = y_X - y_{cd} \frac{s_x}{s_y} d_X. \tag{23} $$
Scharstein has derived results similar to (17), (18), and (23) (see [], page 45). One major difference is that Scharstein neglected the effect of $s_x$ and $s_y$ on the extrapolation. To retain the 3D information, another disparity map $D_Y$ is constructed wherein (based on the view $V_Y$; the disparity value at $P_Y$ is denoted $d_Y$)
$$ d_Y = \frac{f}{s_x z_w}. \tag{24} $$
Although it looks as if we repeatedly transfer the same information here, it is really necessary, because the position information (the content of $p_\ast$, where $\ast$ stands for, e.g., $X$, $Y$) is implicitly encoded in the view itself.

4.7. Z-transfer

When we move along the depth direction, both $x$ and $y$ coordinates change accordingly:
$$ x_Z = \frac{f}{s_x} \frac{x_w - x_{cd}}{z_w - z_{cd}} + x_0, \tag{25} $$
$$ y_Z = \frac{f}{s_y} \frac{y_w - y_{cd}}{z_w - z_{cd}} + y_0. \tag{26} $$
By substituting (20) and (24) into (25), we get
$$ x_Z = \left( x_Y - x_0 \right) \frac{f}{f - s_x z_{cd} d_Y} + x_0. \tag{27} $$
Similarly for $y_Z$:
$$ y_Z = \left( y_Y - y_0 \right) \frac{f}{f - s_x z_{cd} d_Y} + y_0. \tag{28} $$

4.8. De-rectification

Up to this stage, the view has been moved to the destination position $t_{cD} = [x_{cd}\; y_{cd}\; z_{cd}]^T$ with orientation $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$. In $C_Z$ and $C_D$, we have
$$ \lambda_Z \tilde{p}_Z = K \left( w - t_{cD} \right), \qquad \lambda_D \tilde{p}_D = K_D R_{cD}^T \left( w - t_{cD} \right). \tag{29} $$
As both $K$ and $K_D R_{cD}^T$ are invertible [45], we can easily write down, for the de-rectification step,
$$ \lambda_D \tilde{p}_D = \lambda_Z T_D \tilde{p}_Z, \tag{30} $$
where
$$ T_D = K_D R_{cD}^T K^{-1}. \tag{31} $$
This is similar to the rectification (see (7)).

4.9. Summary

The multi-step reconstruction process, as described in this section, is a generalization of the interpolation-based approach. It combines existing methods and generalizes them to accommodate arbitrary novel viewpoints.

(1) In the case of a parallel setup, in order to obtain an intermediate virtual view, Chen and Williams [36] proposed a view interpolation method that is the same as step 2 here.
(2) In the arbitrary stereo configuration case, in order to maintain the geometric validity of the reconstructed view, Seitz and Dyer [44] added the rectification (step 1) and the de-rectification (step 5) to the view interpolation system.

(3) In order to get a novel view at an arbitrary camera pose confined to a plane, Scharstein [] added step 3.

(4) Based on the complete camera geometry, we formally generalize the interpolation-based approach to arbitrary stereo configurations and novel views at arbitrary poses.

5. IMPLEMENTATION CONSIDERATIONS

To make the whole view reconstruction process suitable for real-time applications such as VIRTUE, two critical issues should be carefully addressed: occlusion handling and computational efficiency. In this section, these two issues are elaborated along with a detailed analysis of the realization.

Instead of reconstructing a novel view in one complex single step like the trifocal tensor approach in [34], the multi-step scheme decomposes the whole view reconstruction process into two parts: one part (the rectification and the de-rectification) deals with the photometric and orientation differences, and the other (the middle three steps) handles the position component. First we discuss the implementation issues for these two parts; fast processing potential is explored in detail. For the translation handling part, two simplifying possibilities are proposed and compared with each other, and the better one is chosen for VIRTUE. After this, the view reconstruction process is integrated with the other stages of the intended VIRTUE framework. To show the result of each step, we use a stereo pair (Figure 5, image size 720 × 576) from the real VIRTUE setup as an example.

5.1. Photometric and orientation difference handling

Rectification is very important for two main reasons. First, two views can be interpolated easily if they come from two cameras that are parallel to each other and share the same image plane.
Second, the parallel setup significantly facilitates the correspondence estimation task: the correspondence search in 2D space is simplified to a search along 1D scanlines [40].
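As an illustration of how such a 1D search works, here is a minimal sum-of-absolute-differences scanline matcher; the window size, disparity range, and synthetic signal are assumptions for demonstration only, not the disparity estimator of [40]:

```python
import numpy as np

def scanline_disparity(left_row, right_row, max_disp=8, win=2):
    """Left-to-right disparity along one rectified scanline via SAD matching.

    For each pixel x in the left row, compare a small window against the
    right row at x + d (d in [0, max_disp]) and keep the best-matching shift.
    """
    n = len(left_row)
    disp = np.zeros(n, dtype=int)
    for x in range(win, n - win - max_disp):
        patch = left_row[x - win:x + win + 1]
        costs = [np.abs(patch - right_row[x + d - win:x + d + win + 1]).sum()
                 for d in range(max_disp + 1)]
        disp[x] = int(np.argmin(costs))
    return disp

# A synthetic rectified pair: the right row is the left row shifted by 3 pixels,
# so the true disparity is 3 everywhere in the interior.
left = np.sin(np.linspace(0, 6 * np.pi, 64))
right = np.roll(left, 3)
d = scanline_disparity(left, right, max_disp=8, win=2)
```

Because the views are rectified, each row can be matched independently, which is the same property the paper later exploits for row-parallel processing.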

(a) Original left view $V_L$. (b) Original right view $V_R$.
Figure 5: An example stereo pair from cameras 1 and 2 of the VIRTUE setup as shown in Figure 10. Notice the large occlusion areas around the right arm of the participant and the converging effect of the setup.

(a) Rectified left view $V_L$. (b) Rectified right view $V_R$.
Figure 6: The stereo views are rectified without considering the epipolar geometry. To retain all the information (including the black area) from the original views, they may have different sizes. In this stereo setup, the size of the rectified left view is 0 × 955 and that of the rectified right view is 848 × 8.

(a) Rectified left view $V_L$. (b) Rectified right view $V_R$.
Figure 7: The stereo views are rectified considering the epipolar geometry. Thus two constraints can be applied to make them have the same smallest possible size 0 × 8.

From (7), since $K$, $R_{cL}$, and $R_{cR}$ are all invertible, we can write down
$$ \tilde{p}_L = \frac{\lambda_{\mathrm{recti}L}}{\lambda_L} T_L^{-1} \tilde{p}_{\mathrm{recti}L}, \qquad
\tilde{p}_R = \frac{\lambda_{\mathrm{recti}R}}{\lambda_R} T_R^{-1} \tilde{p}_{\mathrm{recti}R}. \tag{32} $$
This means that for every pixel in the rectified view there exists a corresponding pixel in the corresponding original view, which enables us to employ the backward mapping technique (see Appendix B). The backward mapping can be realized very fast by a simple lookup table, which can be constructed beforehand and stays the same as long as the setup is fixed. Two fast ways to assign intensities in the rectified view are zero-order (nearest neighbor) and first-order (bilinear) interpolation. Through extensive experiments we found that, for the VIRTUE setup, the nearest neighbor is three times faster than the bilinear interpolation, while the image quality generated by the two is comparable. Thus we adopt the nearest neighbor for the rectification process.

During the rectification, in order to retain all the information we have from the original images and at the same time keep the rectified image size as small as possible, the rectified image should be shifted and cut to keep only the useful information. Further, due to the epipolar constraint [4], two constraints should be applied:

(1) the width of the rectified stereo pair should be the larger one of the widths of the two rectified images (left and right);
(2) the height of the rectified stereo pair should be the smaller one of the heights of the two rectified images (left and right).

Figure 6 shows the rectification applied to the left and right views separately. The size of the rectified left view is 0 × 955 and that of the rectified right view is 848 × 8. By applying the two constraints mentioned above, we get a rectified stereo pair of size 0 × 8 (shown in Figure 7).

Since the de-rectification is similar to the rectification process (compare (30) with (7)), it can be implemented by a nearest-neighbor-based lookup table as well:
$$ \tilde{p}_Z = \frac{\lambda_D}{\lambda_Z} T_D^{-1} \tilde{p}_D. \tag{33} $$
From Section 4.4, we note that there are two freedoms in choosing the geometric properties of $C_{\mathrm{recti}L}$ and $C_{\mathrm{recti}R}$: (1) the intrinsic transformation matrix $K$; (2) the $z$-axis of the CCSs (implicitly, the WCS). The intrinsic transformation matrix $K$ can be determined by the final display, in this case $C_D$. In our current implementation we choose $K = (K_L + K_R)/2$ and the $z$-axis of $C_{\mathrm{recti}L}$ and $C_{\mathrm{recti}R}$ to be at the middle position between the $z$-axes of $C_L$ and $C_R$ (for the detailed equations involved, see [38, Appendix A..5]). The advantage of this choice is that the rectification homographies $T_L$ and $T_R$ are fixed for a fixed stereo setup. Thus two fixed lookup tables can be built beforehand.
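The lookup-table-based backward mapping described in this subsection can be sketched as follows; the toy homography (a pure translation) and the tiny image stand in for the real rectification homographies $T_L$, $T_R$ and camera frames:

```python
import numpy as np

def build_lut(H_inv, out_shape):
    """Precompute, for every pixel of the rectified view, the nearest-neighbor
    source pixel in the original view: p = H_inv @ p_recti (homogeneous)."""
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = H_inv @ pts
    src /= src[2]                        # normalize homogeneous coordinates
    sx = np.rint(src[0]).astype(int)     # zero-order (nearest neighbor)
    sy = np.rint(src[1]).astype(int)
    return sx.reshape(h, w), sy.reshape(h, w)

def backward_map(img, lut):
    """Per-frame work is only intensity fetching through the fixed table."""
    sx, sy = lut
    h, w = img.shape
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out

# A pure translation by (2, 1) as a stand-in for the rectification homography.
H = np.array([[1.0, 0, 2], [0, 1.0, 1], [0, 0, 1]])
lut = build_lut(np.linalg.inv(H), (8, 8))
img = np.arange(64, dtype=float).reshape(8, 8)
warped = backward_map(img, lut)
```

As in the paper, the table is built once per setup; each frame then costs only index lookups and intensity assignments.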
5.2. Position difference handling

Although translating the view in three separate steps along three orthogonal directions sounds straightforward, there is one major difficulty: in the Z-transfer step, at each pixel, we have to calculate a different factor $c$:
$$ c = \frac{f}{f - s_x z_{cd} d_Y}. \tag{34} $$
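To make the difficulty concrete, the following sketch evaluates (34) over a toy two-layer disparity map (all numbers are made up for illustration); every pixel ends up with its own scale factor $c$:

```python
import numpy as np

# Illustrative (made-up) values: f and s_x give f/s_x = 800 pixels;
# z_cd is the depth displacement of the virtual camera (WCS units of b/2).
f, s_x, z_cd = 8.0, 0.01, 2.0

# Toy Y-extrapolated disparity map d_Y = f / (s_x * z_w) with two depth
# layers (z_w = 10 in front, z_w = 20 behind).
d_Y = np.full((4, 6), 80.0)
d_Y[:, 3:] = 40.0

# Equation (34): the Z-transfer scale factor differs at every pixel,
# which is what makes a direct per-pixel implementation expensive.
c = f / (f - s_x * z_cd * d_Y)
```

The front layer gets $c = 1.25$ and the back layer roughly $1.11$, so a single uniform factor is only an approximation whenever the scene spans several depths.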

This makes the Z-transfer time-consuming, not only because of the computational load but also because of the random 2D memory access. In addition, (27) and (28) can only be implemented by the forward mapping technique, in which case the remaining 2D holes have to be filled in afterwards, which, again, is computationally expensive. (Note that although there exist forward mapping techniques that do not leave holes [47], we found that a real-time implementation of them is even more difficult, if not impossible.) Fortunately, there exist two schemes that can be exploited here to ease the processing burden. One or the other of these two schemes can be employed, according to the situation.

5.2.1. Scheme one: one optimal uniform scale factor

One scheme is to replace $c$ by a constant $c$ for every pixel in the destination view. A possible choice is
$$ c = \frac{f}{f - s_x z_{cd} d}, \tag{35} $$
where $d$ is a constant but optimal disparity value. Optimal here means optimal with respect to the minimal perspective distortion. In this case, (27) and (28) become
$$ x_Z = c\, x_Y + (1 - c)\, x_0, \qquad y_Z = c\, y_Y + (1 - c)\, y_0. \tag{36} $$
A nice property of the two equations above is that backward mapping (see Appendix B) can be applied if we rewrite them as
$$ x_Y = \frac{x_Z - (1 - c)\, x_0}{c}, \qquad y_Y = \frac{y_Z - (1 - c)\, y_0}{c}, \tag{37} $$
where $c$ can be computed in advance by locating $d$ as the disparity corresponding to the peak of the disparity histogram. Thus
$$ \tilde{p}_Y = Z_Z \tilde{p}_Z, \tag{38} $$
where
$$ Z_Z = \begin{bmatrix} \dfrac{1}{c} & 0 & -\dfrac{(1 - c)\, x_0}{c} \\[2pt] 0 & \dfrac{1}{c} & -\dfrac{(1 - c)\, y_0}{c} \\[2pt] 0 & 0 & 1 \end{bmatrix}. \tag{39} $$
As this simplification in fact changes the Z-transfer step into a zooming effect of the camera, we call it Z-zooming, to distinguish it from the general Z-transfer process.

Of course, the simplification from Z-transfer to Z-zooming will introduce some perspective distortions. However, in our extensive experiments on the real VIRTUE setup, we have not found any noticeable artifacts yet. The quality of the results from the two cases is comparable. The visual quality of the result from Z-zooming is even better, since with the optimal disparity we get rid of minor disparity artifacts (see Figure 8).

(a) Z-transfer. (b) Z-zooming.
Figure 8: Both the Z-transfer and the Z-zooming are applied to the same Y-extrapolated view. In the Z-transfer, each pixel has a different transform coefficient, while in the Z-zooming, all pixels are transformed by a uniform optimal scaling factor. The uniform factor is chosen to minimize the possible perspective distortions.

The above simplification holds well for the current VIRTUE setup. In other circumstances, for example, when the depth differences in the scene are too large, we can segment the view into several layers by analyzing the disparity histogram in advance, then apply the above technique to each layer separately. Finally, a warping technique [47] can be applied to stitch the layers together to get the Z-transferred view. This multiple-scale-factors proposal would produce more geometrically accurate novel views than using one single scale factor, but it would inevitably involve more computation.

5.2.2. Scheme two: two 1D transfer steps

On the other hand, if we take a closer look at (27) and (28), we note that the calculations in $x$ and $y$ are independent of each other. Thus it is possible to integrate the calculations in (27) into the X-interpolation step and the calculations in (28) into the Y-extrapolation step. If so, the translation handling part can be reduced from three steps to two 1D operation steps:

(1) X-transfer:
$$ x_X = \left( x_{\mathrm{recti}L} + \frac{1 - x_{cd}}{2} d_{LR} - x_0 \right) \frac{f}{f - s_x z_{cd} d_{LR}/2} + x_0 \tag{40} $$
or
$$ x_X = \left( x_{\mathrm{recti}R} + \frac{1 + x_{cd}}{2} d_{RL} - x_0 \right) \frac{f}{f + s_x z_{cd} d_{RL}/2} + x_0. \tag{41} $$

Figure 9: Top view of three parallel cameras imaging a 3D scene. The bold curve indicates the scene surface from the viewpoint of the three cameras $C_{\mathrm{recti}L}$, $C_{\mathrm{recti}R}$, and $C_X$. The left occlusion areas appear in $V_{\mathrm{recti}L}$ but not in $V_{\mathrm{recti}R}$. The right occlusion areas appear only in $V_{\mathrm{recti}R}$ but not in $V_{\mathrm{recti}L}$. The complete 3D areas appear both in $V_{\mathrm{recti}L}$ and in $V_{\mathrm{recti}R}$. These three types of information may or may not appear in $V_X$, while the middle occlusion areas appear only in $V_X$ but in neither $V_{\mathrm{recti}L}$ nor $V_{\mathrm{recti}R}$.

(2) Y-transfer:
$$ y_Y = \left( y_X - y_{cd} \frac{s_x}{s_y} d_X - y_0 \right) \frac{f}{f - s_x z_{cd} d_X} + y_0. \tag{42} $$

In this manner, only two 1D operations are needed to transfer the rectified stereo views into the destination pose. Compared with a 2D operator like the original Z-transfer, the 1D operations have two advantages: (1) straightforward hole filling; (2) more regular memory access (combining information only from the same row or column).

5.2.3. Comparison

Although the two 1D transfer steps simplification is geometrically valid and sounds more promising, the adoption of one optimal uniform scale factor is more practical for a real-time application, mainly for three reasons:

(1) As the disparity range in a stereo setup is limited, three small lookup tables can be built in advance for $\frac{1 - x_{cd}}{2} d_{LR}$, $\frac{1 + x_{cd}}{2} d_{RL}$, and $y_{cd} \frac{s_x}{s_y} d_X$ in (17), (18), and (23), respectively, while this is impossible for (40), (41), and (42).

(2) The simplified Z-zooming function can be integrated together with the de-rectification step as one backward mapping step without adding any additional computational load:
$$ \tilde{p}_Y = \frac{\lambda_D}{\lambda_Z} Z_Z T_D^{-1} \tilde{p}_D. \tag{43} $$

(3) The visibility issue can be solved implicitly in the X-interpolation and Y-extrapolation operations, but it is rather complicated in the X-transfer and the Y-transfer [48]. (Multiple scene points may map to the same pixel in a view, while only the one nearest to the camera contributes intensity to that pixel.)

Based on these three considerations, the one optimal uniform scale factor scheme is chosen for the current VIRTUE realization. The multiple scale factors proposal will be investigated as a further extension. The two 1D transfer steps possibility is reserved for a future improvement when more advanced hardware is available.

5.2.4. Realization using one optimal uniform scale factor

Considering the occlusion and visibility issues, our current realization based on the one optimal uniform scale factor scheme can be summarized as follows.

X-interpolation

From (17) and (18), it seems that we need only one disparity map plus one view to construct the X-interpolated view $V_X$ perfectly. However, in practice, we always suffer from the occlusion problem [0]. In general, four cases in the 3D scene can be distinguished (Figure 9) [0]:

(i) Complete 3D info. The information that can be viewed in $C_{\mathrm{recti}L}$, $C_{\mathrm{recti}R}$, and $C_X$.
(ii) Left occlusion. The parts of the view that are visible in the left camera but not in the right camera.
(iii) Right occlusion. The parts of the view that are visible in the right camera but not in the left camera.
(iv) Middle occlusion. The parts of the view that are visible only in the X-interpolated view.

Suppose the estimated disparity maps contain the correct (pseudo) disparity in occluded regions. Then we can generate the X-interpolated view $V_X$ as follows.

(1) For complete 3D info: apply either (17) or (18). A scene point may appear, due to the visibility issue, either only in the left view or only in the right view. So for each pixel in the destination view, the intensity may be contributed by a pixel in either the left view or the right view. But we do not know this information in advance.
It is revealed only after the forward mapping has been applied to both the left and the right views.

(2) For left occlusion areas: use (17).

(3) For right occlusion areas: employ (18).

(4) For middle occlusion areas: no information is available from either $V_{\mathrm{recti}L}$ or $V_{\mathrm{recti}R}$. We have to approximate them, for example, by either nearest-neighbor or linear interpolation.

Assuming that the lighting conditions between the left and the right view do not change too much, four cases can be considered to facilitate the computation:

(i) $x_{cd} \geq 1$: fetch the intensity of each pixel in $V_X$ from the corresponding pixel in $V_{\mathrm{recti}L}$;
(ii) $0 \leq x_{cd} < 1$: fetch them from $V_{\mathrm{recti}L}$, except for the right occlusion areas, where the information should come from $V_{\mathrm{recti}R}$;
(iii) $-1 < x_{cd} < 0$: fetch them from $V_{\mathrm{recti}R}$, except for the left occlusion areas, where the information should come from $V_{\mathrm{recti}L}$;
(iv) $x_{cd} \leq -1$: fetch the intensity of each pixel in $V_X$ from the corresponding pixel in $V_{\mathrm{recti}R}$.

To avoid repetition, we give below only the implementation description for the second case. The realization for the other three cases can be derived in a similar way.

(1) Two lookup tables are constructed for $\frac{1 - x_{cd}}{2} d_{LR}$ and $\frac{1 + x_{cd}}{2} d_{RL}$, respectively, based on the predetermined disparity range.

(2) From (17), we generate a disparity map $D_{XL}$ which is based on the view $V_X$. The relations are $d_{XL} = d_{LR}/2$ and $x_X = x_{\mathrm{recti}L} + \frac{1 - x_{cd}}{2} d_{LR}$. To solve the visibility problem, this step processes pixels from left to right [48].

(3) From (18), we generate a disparity map $D_{XR}$ which is based on the view $V_X$. The relations are $d_{XR} = d_{RL}/2$ and $x_X = x_{\mathrm{recti}R} + \frac{1 + x_{cd}}{2} d_{RL}$. To solve the visibility problem, this step processes pixels from right to left [48].

(4) $D_{XL}$ and $D_{XR}$ are integrated to generate $D_X$.

(5) Consider now $D_X$ scanline by scanline: for every hole area where no value has been assigned (in fact, these are middle occlusions), if the two ends of the segment are both left-available or both right-available, the hole is filled in by linear interpolation.
Otherwise, it is filled in by nearest neighbor.

(6) We synthesize the view $V_X$ by the backward mapping technique based on $D_X$, $V_{\mathrm{recti}L}$, and $V_{\mathrm{recti}R}$.

Here we first implicitly reconstruct the geometry of the view $V_X$ and then use the backward mapping technique. In this respect, our approach is similar to the one stated in [49] and [7]. It has been demonstrated to be very fast and stable. Note that the X-interpolation can be performed independently row by row, facilitating a parallel implementation.

Y-extrapolation

From (23) we know that the Y-extrapolation is a forward mapping. The holes appearing in this process can be filled in by linear interpolation along the $y$ direction. The visibility problem can be solved implicitly by processing either from top to bottom (if $y_{cd} < 0$), or from bottom to top (if $y_{cd} > 0$), or by copying $V_X$ directly into $V_Y$ (if $y_{cd} = 0$). So the view is processed independently column by column; a parallel realization is also possible here. Normally, since the image data are stored row by row, the Y-extrapolation may cause large memory jumps. Therefore, several memory manipulation schemes (e.g., transposing, processing along $x$, and transposing back afterwards) have been investigated.

5.3. Integration with the whole system

All considerations discussed above have considerably sped up the view reconstruction part. Further, if we consider the whole VIRTUE system (see Figure 3), another advantage is that we can combine components of the view reconstruction with other processing in the VIRTUE processing chain.

5.3.1. Integrating rectification with distortion correction

As we indicated in Section 3, if we account for the imaging distortion, then for each pixel $p$ in the distortion-free view we can find a corresponding pixel $\hat{p}$ in the distorted view by
$$ \hat{p} = f(p), \tag{44} $$
where $f$ is the distortion function [38]. Then the intensity of $\hat{p}$ in the distorted view can be assigned to the pixel $p$ in the distortion-free view.
From Section 5.1, we know that for each pixel $p_{\mathrm{recti}}$ in the rectified view, we may find a corresponding pixel $p$ in the original view by
$$ p = f_{\mathrm{recti}}\left( p_{\mathrm{recti}} \right), \tag{45} $$
where $f_{\mathrm{recti}}$ is the rectification function as in (32). Combining the above two equations, we get
$$ \hat{p} = f_{\&\mathrm{recti}}\left( p_{\mathrm{recti}} \right), \tag{46} $$
where $f_{\&\mathrm{recti}} = f \circ f_{\mathrm{recti}}$. Then the intensity of $\hat{p}$ in the distorted view can be assigned to the pixel $p_{\mathrm{recti}}$ in the rectified view. As the stereo setup is fixed for one conferencing session, the lookup table for $f_{\&\mathrm{recti}}$ can be constructed once beforehand. It is easy to see that the combined operation $f_{\&\mathrm{recti}}$ has the same on-line calculation load (intensity assignment) as the distortion correction process $f$ alone.

5.3.2. Combining Z-zooming, de-rectification, and the compositor

By combining (38) with (33), we get
$$ \tilde{p}_Y = \frac{\lambda_D}{\lambda_Z} Z_Z T_D^{-1} \tilde{p}_D. \tag{47} $$
In the compositor, to composite the uniform man-made environment with the telepresences, another transformation $T_{\mathrm{compo}}$ is applied to $\tilde{p}_D$ to get the final displayed view pixel $\tilde{p}_{\mathrm{final}}$. As $T_{\mathrm{compo}}$ is invertible, we have
$$ \tilde{p}_D = T_{\mathrm{compo}}^{-1} \tilde{p}_{\mathrm{final}}. \tag{48} $$

Integrating the above two equations, we get
$$ \tilde{p}_Y = \frac{\lambda_D}{\lambda_Z} Z_Z T_D^{-1} T_{\mathrm{compo}}^{-1} \tilde{p}_{\mathrm{final}}. \tag{49} $$
Thus, except for an extra 3-by-3 matrix multiplication, the computation occurring in this equation is the same as that for the composition. This means that if the view reconstruction is integrated into the whole VIRTUE processing chain, then only the X-interpolation and Y-extrapolation take time, reducing the processing time further. The rectification step and the de-rectification (including Z-zooming) step have been combined into the other stages as shown above.

6. EXPERIMENTS AND COMPARISON

Based on the implementation of the one optimal uniform scale factor idea described in Section 5.2.1, we experimented with our algorithm not only on the stereo sequences from the final VIRTUE configuration, but also on synthetic stereo views with ground-truth disparity maps. Moreover, to show the advantage of our proposal, we compared our algorithm with two other approaches (the trifocal tensor and the visual hull).

6.1. Experiments

To verify the quality (both geometric validity and visual quality in terms of temporal and spatial continuity) and the speed of our implementation, we performed experiments using various stereo sequences coming from the VIRTUE setup (using cameras 1 (referred to as $C_L$) and 2 (referred to as $C_R$)) (see Figure 10).

6.1.1. Geometry validity

To verify the geometric validity, a third camera $C_3$ (capturing the real view $V_3$) was arbitrarily put between $C_L$ and $C_R$. $C_L$, $C_R$, and $C_3$ were all calibrated at the same time (Figure 10). With this setup, we captured three sequences simultaneously (from $C_L$, $C_R$, and $C_3$, respectively) of a scene containing a talking person. After removing the background, correcting the distortion, and estimating the disparity, we choose $C_D = C_3$ in the view reconstruction process to reconstruct the novel view $V_D$ from $V_L$ and $V_R$.
$V_D$ and $V_3$ are compared with each other to check the geometric validity. The three sequences have 30 frames each. In Figure 11, we show all steps of the view reconstruction for the last frame (note that all the example pictures shown in previous sections are also based on this frame). For better visibility, all images in this figure have been cut and only the interesting parts are kept. The rectification has been done together with the distortion correction, and the Z-zooming has been combined with the de-rectification step. Further, using the area containing only the person as a mask, we calculate the difference image between $V_3$ and $V_D$, together with the histogram of the difference, in Figure 12.

(a) A recording process for the experiment. (b) The detailed configuration of the setup (top and front views, with the chair and camera distances).
Figure 10: The real scenario of a data capturing process together with the detailed description of the setup configuration for experimenting with the view reconstruction algorithm. The camera $C_3$ (3) serves as a virtual viewpoint to verify the final reconstruction result.

Figure 14 shows another example. In this example, the two cameras in the stereo setup have been rotated a bit (Figure 13). With this setup, we may reduce the rectified image size to further save some processing time.

Further, to reduce the influence of the disparity map on the final results, we also synthesized several novel views from a pair of synthetic stereo views. The stereo pairs are made by ray tracing a 3D scene that contains three spheres and a flat background. Four real images are mapped to the surfaces of the four objects, respectively. All pixels of the background have disparity value 0; this means that the background is at infinity. The left and right cameras are located at $[1\; 0\; 0]^T$ and $[-1\; 0\; 0]^T$, respectively. We employed the two ground-truth disparity maps to reconstruct novel views located at four different poses, respectively (Figure 15).
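The ground-truth disparities of this synthetic scene follow the depth relation used throughout this section, $d = f/(s_x z_w)$, so the background at infinity indeed gets disparity 0. A quick numeric check (the focal length and pixel size below are illustrative, not the values of the synthetic cameras):

```python
f, s_x = 8.0, 0.01  # illustrative (made-up) focal length and pixel size

def disparity_from_depth(z_w):
    # d = f / (s_x * z_w); as z_w -> infinity, d -> 0 (background at infinity).
    return 0.0 if z_w == float("inf") else f / (s_x * z_w)

assert disparity_from_depth(10.0) == 80.0          # a near scene point
assert disparity_from_depth(float("inf")) == 0.0   # the flat background
assert disparity_from_depth(20.0) < disparity_from_depth(10.0)  # nearer => larger
```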
From Figure 15, we can clearly get a feeling of the viewpoint changing. Another pair of synthetic stereo views is processed in Figure 16, following the same steps described above. In contrast to the synthetic pair in Figure 15, the stereo data in Figure 16 contain large occlusion areas. The experiment on them reveals the inevitable problem of the current approach: some parts of the 3D scene may be visible in the desired novel view but not present in the stereo pair we have. Currently, we simply fill these areas by linear interpolation, and artifacts may appear because of this. More advanced texture analysis techniques [50] have to be employed.
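The geometry-validity check of Section 6.1.1 (a masked difference image between the reconstructed view $V_D$ and the real view $V_3$, plus its histogram) can be sketched as follows; the array shapes, mask, and histogram binning are our own toy stand-ins:

```python
import numpy as np

def masked_difference_histogram(v_d, v_3, mask, bins=16, rng=(-32, 32)):
    """Difference image between the reconstructed and real views, restricted
    to the person mask, plus the histogram used for the consistency check."""
    diff = v_d.astype(float) - v_3.astype(float)
    vals = diff[mask]
    hist, edges = np.histogram(vals, bins=bins, range=rng)
    return diff, hist, edges

# Toy data: identical views except one masked pixel that differs by 5.
v3 = np.zeros((4, 4))
vd = v3.copy()
vd[1, 1] = 5.0
mask = np.ones((4, 4), dtype=bool)
diff, hist, edges = masked_difference_histogram(vd, v3, mask)
```

A histogram sharply peaked around zero, as in Figure 12, indicates that the reconstructed view is consistent with the view really perceived at the novel viewpoint.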

(a) Original left view $V_L$. (b) Original right view $V_R$. (c) Rectified left view $V_{\mathrm{recti}L}$. (d) Rectified right view $V_{\mathrm{recti}R}$. (e) Left disparity map $D_{LR}$. (f) Right disparity map $D_{RL}$. (g) X-interpolated view $V_X$. (h) Y-extrapolated view $V_Y$. (i) Reconstructed view $V_D$. (j) Real destination view $V_3$.
Figure 11: All intermediate and final results from our multi-step view reconstruction algorithm are listed above together with the disparity data. They are based on a pair of stereo frames ((a) and (b)) from the setup shown in Figure 10.

(a) Difference image. (b) Histogram.
Figure 12: The difference image between the views $V_3$ and $V_D$ in Figure 11, considering only pixels within the area containing the talking person, together with its histogram, is shown here to check the consistency of the reconstructed view with the view really perceived at the novel viewpoint.

Figure 13: Setup with rotated cameras for the experiments in Figure 14 (top and front views). The advantage of this configuration is that the rectified stereo views may get smaller dimensions, which eases the memory access load.

6.1.2. View adaptation quality

To test the visual quality of the view reconstruction, the 5th, 10th, 15th, 20th, 25th, and 30th frames of the stereo sequences acquired above are selected for experiments in two scenarios:

(a) Original left view $V_L$. (b) Original right view $V_R$. (c) Rectified left view $V_{\mathrm{recti}L}$. (d) Rectified right view $V_{\mathrm{recti}R}$. (e) Left disparity map $D_{LR}$. (f) Right disparity map $D_{RL}$. (g) X-interpolated view $V_X$. (h) Y-extrapolated view $V_Y$. (i) Reconstructed novel view $V_D$. (j) Real destination view $V_3$.
Figure 14: All intermediate and final results from our multi-step view reconstruction algorithm are listed above together with the disparity data. They are based on a pair of stereo frames ((a) and (b)) from the setup shown in Figure 13, with a different talking person as the participant.

(a) Original left view $V_L$. (b) Original right view $V_R$. (c) Left disparity map $D_{LR}$. (d) Right disparity map $D_{RL}$. (e)–(h) Novel views at four different poses.
Figure 15: Novel views at different poses reconstructed from the synthetic stereo pair (a) and (b) by our multi-step algorithm based on the ground-truth disparity maps (c) and (d). A clear feeling of viewpoint movement can be perceived.

(1) Fixed viewpoint. The same as above: let $C_D = C_3$ to fix the viewpoint for all frames. This is intended to check the temporal continuity.

(2) Dynamic viewpoint. The viewpoint $C_D$ moves along a circle whose diameter coincides with the baseline $b$, from [.5 0. 0.] to [.5 0. 0.] in the WCS described in Section 4.3.

Table 1: The average time needed for the implementation based on the one optimal uniform scale factor idea on a Pentium III 550 MHz.

Function         | Time  | Frame size           | Total time
X-interpolation  | 27 ms | 0 × 8, YUV 4:2:2     | 54 ms
Y-extrapolation  | 27 ms |                      |
X-interpolation  | 10 ms | 56 × 484, YUV 4:2:2  | 20 ms
Y-extrapolation  | 10 ms |                      |

Table 2: The average time needed for the implementation based on the two 1D transfer steps idea on a Pentium III 550 MHz.

Function    | Time  | Frame size           | Total time
X-transfer  | 45 ms | 0 × 8, YUV 4:2:2     | 90 ms
Y-transfer  | 45 ms |                      |
X-transfer  | 12 ms | 56 × 484, YUV 4:2:2  | 24 ms
Y-transfer  | 12 ms |                      |

Results for these two tests are both shown in Figure 18.

6.1.3. Speed

To investigate the speed of the view reconstruction part, our algorithm has been evaluated on a Pentium III 550 MHz. The average execution times for the X-interpolation and Y-extrapolation steps are shown in Table 1; the average execution times for the X-transfer and Y-transfer steps are shown in Table 2. In the final VIRTUE system, the CCIR601 standard (720 × 576, YUV 4:2:2) will be used. Since the processing time is roughly linear in the number of pixels processed, we estimate from Table 1 that, for this format, our implementation based on the one optimal uniform scale factor idea is feasible for real-time processing (25 fps). However, since other processing has to be done as well (e.g., disparity estimation, head tracking, network handling), based on the system analysis we are exploring the parallel processing ability, which is inherent in the realization of the X-interpolation and the Y-extrapolation, on the final dedicated hardware (Trimedia).

(a) Original left view $V_L$. (b) Original right view $V_R$. (c) Left disparity map $D_{LR}$. (d) Right disparity map $D_{RL}$. (e)–(h) Novel views at four different poses.
Figure 16: Novel views at different poses reconstructed from the synthetic stereo pair (a) and (b) by our multi-step algorithm based on the ground-truth disparity maps (c) and (d).
It reveals the inevitable problem of the current approach: some parts of the 3D scene may be visible in the desired novel view although they are not present in the given stereo pair.

Section 4.3. At the same time, the view direction changes continuously from frame to frame. This is used for checking the spatial continuity of the viewpoint.

6.2. Comparison

Within the VIRTUE consortium, we also tried the trifocal tensor approach [5]. Using the same example data and disparity maps as in Figure 2, the reconstructed novel views from our multi-step algorithm and from the trifocal tensor approach are shown in Figure 7. Except for the fact that the reconstructed view from our algorithm is a bit sharper, they are comparable with each other. However, under the same conditions, on the Pentium III 550 MHz PC, the trifocal tensor implementation costs on average more than 140 ms per frame. This is more than twice the average time shown in Table 1 that our algorithm costs. Further, since

Figure 7: The reconstructed novel view from our multi-step algorithm (left) compared with that from the trifocal tensor approach (right), using the same input data and system configuration as in Figure 2. Except that the reconstructed view from our algorithm is a bit sharper, they are comparable with each other.

the final dedicated hardware TriMedia is not good at floating-point calculation, with a TriMedia (133 MHz), using the same data as for the above PC evaluation, the trifocal tensor implementation costs more than 1000 ms per frame. In our algorithm, however, several small lookup tables can be built in advance, so that in the multi-step view reconstruction process only integer additions and memory accesses are performed, reducing the total reconstruction time to 150 ms per frame. The parallel processing ability of the X-interpolation and the Y-extrapolation is now being exploited on four TriMedias to achieve the real-time requirement.

It would be interesting to compare the visual results and timing figures presented here with those reported in [29] and [30]. In [29] it is declared that their implementation is already real-time, but advanced hardware (four 600 MHz PCs and one dual 933 MHz Pentium III PC) is needed to guarantee the high speed, and the image dimension is much smaller than what we use. On the other hand, it has been found that the visual hull of an object does not match the object's exact geometry and, in particular, cannot represent concave surface regions [32]. Because of this, such approaches may encounter difficulty in reconstructing complex facial expressions and human gestures; strong geometric distortions may be perceived in the novel view of a person making certain gestures, as shown in [30].

7.
CONCLUSIONS

The fixed-viewpoint experiments with both the original setup (Figures 2 and 8) and the rotated setup (Figure 4) show that the reconstructed novel views are comparable with the real views perceived at the virtual viewpoint. Our algorithm reproduces the viewpoint in good approximation, minimizing the visual projective distortion. The dynamic-viewpoint experiment shown in Figure 8 illustrates the continuity of the spatial change of the viewpoint, which is crucial for providing the motion parallax cue. The overall visual quality is good. However, there are still some artifacts along the object borders and fingers. These are mainly caused by inaccuracies of the estimated disparity fields and by occlusion areas, which we filled in by linear interpolation (Figure 6, compared with Figure 5). The quality of the final displayed view can be further improved in the compositor, but this is not shown here. Since only still images can be shown in this paper, more extensive video results can be seen at http://www-ict.its.tudelft.nl/~bangjun/publications.html#viewsynthesis. The view reconstruction can be done in real time on a powerful Pentium processor or on multiple dedicated hardware units such as TriMedias, thanks to the inherent parallel processing ability. Our contribution is twofold: (1) for the first time, a theoretical analysis of the interpolation-based view reconstruction approach is given and extended to the general case; (2) optimization and quality improvement of the proposed multi-step algorithm are considered in a practical application (in this case VIRTUE). The proposed algorithm together with its implementation can be integrated tightly with the whole VIRTUE processing chain to speed up the system. Further, since our algorithm only requires simple hardware such as a TriMedia for real-time processing, the computation power of, for example, the main processor can be used for other tasks.
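The lookup-table strategy mentioned in the comparison with the trifocal tensor approach (precompute the scaled offset for every possible disparity value, so the per-pixel inner loop needs only integer additions and memory accesses) can be sketched as follows. This is an illustrative one-scanline sketch, not the VIRTUE implementation; function names and test values are hypothetical:

```python
def build_shift_lut(scale, max_disp=256):
    """Precompute the scaled pixel offset for every possible disparity
    value, so the inner loop avoids multiplication and floating point."""
    return [int(round(d * scale)) for d in range(max_disp)]

def warp_row(row, disp_row, lut):
    """Shift one scanline according to per-pixel disparities.
    Target pixels that receive no value stay None (holes for a later
    post-processing pass); over-mapped pixels are simply overwritten,
    which stands in for the separate visibility handling."""
    out = [None] * len(row)
    for x, (v, d) in enumerate(zip(row, disp_row)):
        tx = x - lut[d]          # only an addition and a table lookup
        if 0 <= tx < len(out):
            out[tx] = v
    return out

lut = build_shift_lut(0.5)       # e.g. map toward the halfway viewpoint
shifted = warp_row([10, 20, 30, 40], [0, 2, 2, 0], lut)
```

The design point is that the table is tiny (one entry per disparity level), so it stays in cache and replaces a per-pixel multiply, which is what makes an integer-only inner loop possible on hardware that is weak at floating point.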
For our algorithm proposed in this paper, both the multiple scale factors extension and the two 1D transfer steps scheme would be interesting topics for the future VIRTUE++ system. Further, our algorithm currently processes video sequences frame by frame, independently. It would be interesting to exploit the temporal information: only updated content would then need to be reconstructed, so most computations could be omitted.

APPENDICES

A. WORKING PRINCIPLE OF VIRTUE

The VIRTUE project aims at constructing a virtual cooperation environment in which a three-party teleconference system becomes reality. In VIRTUE, you are at a meeting table with people spaced around in front of you. You are able to communicate with them effectively as if they were sitting next to you in the same room; in fact, you are led to believe they are present in the same room, while they are actually located at several remote locations. To achieve this goal, seven key components should mainly be investigated and developed: (1) highly accurate camera calibration; (2) dynamic eye tracking; (3) realistic wide view synthesis for dynamic scenes; (4) virtual and real scene fusion; (5) tele-audio/video transmission; (6) real-time processing platform; (7) human factors investigation. Supported by these techniques, the final VIRTUE system will feature: a semi-immersive display with life-size head and torso images; camera views for multiple participants; an integrated visual environment for multiple participants; and a compression and multiplex layer.

[Figure 8 panels, six rows: for each selected frame (5, 10, 15, 20, 25, 30): (a) left view; (b) right view; (c) fixed-viewpoint reconstruction; (d) dynamic-viewpoint reconstruction.]

Figure 8: The results of our multi-step algorithm obtained from selected frames (5th, 10th, 15th, 20th, 25th, and 30th) of a pair of stereo sequences (columns 1 and 2) containing 30 frames, with the viewpoint fixed (column 3) or changed from frame to frame (column 4).
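The dynamic-viewpoint trajectory used for column 4 of Figure 8 (a viewpoint sweeping along a circle whose diameter is the stereo baseline) can be sketched as below. The camera centres and the axis conventions are hypothetical placeholders; the paper's exact WCS coordinates are not reproduced here:

```python
import math

def circular_viewpoints(c_left, c_right, n_frames):
    """Sample one viewpoint per frame on the half-circle whose
    diameter is the stereo baseline (illustrative sketch only)."""
    # The circle centre is the baseline midpoint; the radius is half
    # the baseline length.
    centre = [(l + r) / 2.0 for l, r in zip(c_left, c_right)]
    radius = math.dist(c_left, c_right) / 2.0
    views = []
    for i in range(n_frames):
        # Sweep from the left camera position around to the right one.
        theta = math.pi * i / (n_frames - 1)
        views.append((centre[0] - radius * math.cos(theta),
                      centre[1],
                      centre[2] + radius * math.sin(theta)))
    return views

# Hypothetical camera centres in the WCS, 30 frames as in Figure 8.
path = circular_viewpoints((-1.5, 0.0, 0.0), (1.5, 0.0, 0.0), 30)
```

Sampling the arc uniformly in the angle, as done here, is one simple way to obtain the smooth frame-to-frame viewpoint change needed for the spatial-continuity test.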

To fulfill the semi-immersive requirement, an integrated environment should be built for all participating conferees. For this purpose, techniques from computer graphics [2] can be employed to construct a uniform virtual environment. This virtual background can then be combined with the tele-presences of the remote conferees to produce, at each site, a local conference atmosphere for the local participant. The tele-presence of the conferees should be perceived differently by each participating conferee, in correspondence with his or her viewpoint. This means that a motion parallax cue needs to be provided for all participants [5]. To provide the motion parallax cue, an adaptive-viewpoint vision system has been developed. This system enables all conferees to experience a look-around feeling and at the same time provides realistic eye-to-eye contact. Specifically, in VIRTUE we have four cameras at each site to support a 3-party teleconference. In the virtual meeting space (Figure 1), if, for instance, at the local site we have to reconstruct a novel view of conferee C, we reconstruct it by only using the broadcast stereo views coming from cameras 1 and 2 at the remote site. The images from these two cameras are most similar to what conferee A sees of conferee C. In the same way, conferee B is reconstructed at the local site from cameras 3 and 4. This approach has three advantages: (1) it eases the bandwidth requirements; (2) it facilitates the stereo correspondence estimation (smaller baseline); (3) it supports a straightforward extension to 4-party or even more-party teleconference sessions.

B. FORWARD MAPPING VERSUS BACKWARD MAPPING

Assume that, for a parallel setup, the stereo images can be described by two functions I_L(x, y) and I_R(x, y), respectively. The objective is to construct a virtual middle view I_M(x, y) situated just at the middle of the stereo pair.
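The two mapping directions defined next can be sketched on a single scanline with integer disparities. This is an illustrative simplification of Algorithms B.1 and B.2 (1D, integer-valued, with a simple clamp in the backward gather), not the VIRTUE implementation:

```python
def forward_map_middle(I_L, I_R, d_L):
    """Forward mapping (cf. Algorithm B.1): scatter each left-image
    pixel to the middle view at x - d/2. Holes (None) and over-mapped
    pixels must be handled by separate passes."""
    w = len(I_L)
    I_M = [None] * w
    for x in range(w):
        xm = x - d_L[x] // 2          # target position in middle view
        xr = x - d_L[x]               # corresponding right-image pixel
        if 0 <= xm < w and 0 <= xr < w:
            I_M[xm] = (I_L[x] + I_R[xr]) // 2
    return I_M

def backward_map_middle(I_L, I_R, d_M):
    """Backward mapping (cf. Algorithm B.2): gather, for each middle-
    view pixel, the two source pixels it averages. Every output pixel
    is assigned exactly once, so no hole filling is needed."""
    w = len(I_L)
    I_M = [0] * w
    for x in range(w):
        xl = min(max(x + d_M[x], 0), w - 1)   # clamp at the borders
        xr = min(max(x - d_M[x], 0), w - 1)
        I_M[x] = (I_L[xl] + I_R[xr]) // 2
    return I_M
```

Running the forward version on a scanline with a nonzero disparity leaves a `None` hole, which is exactly the post-processing burden the appendix attributes to forward mapping; the backward version fills every output pixel but requires a disparity map defined on the (unknown) middle view.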
Forward Mapping

Assumption: We have the disparity map d_L(x, y), based on the left image, mapping to the right image.

Interpolating the middle view:
For every pixel in the left image:
    I_M(x - d_L(x, y)/2, y) = (I_L(x, y) + I_R(x - d_L(x, y), y))/2
End For

Algorithm B.1

This process is called forward mapping, since it goes from the known to the unknown [12, 47]. During this process, the visibility problem caused by over-mapping has to be solved separately. Further, some post-processing step must be adopted to fill in the holes at positions to which no value was assigned. An alternative is backward mapping, which searches from the unknown into the known [47].

Backward Mapping

Assumption: We have the disparity map d_M(x, y), based on the middle image, mapping to the right image.

Interpolating the middle view:
For every pixel in the middle image:
    I_M(x, y) = (I_L(x + d_M(x, y), y) + I_R(x - d_M(x, y), y))/2
End For

Algorithm B.2

This backward mapping process is very straightforward and, unless subpixels are taken into account, needs no extra computation. It has been widely used in image transformations such as rotation, warping, and distortion correction. However, many authors have argued that it is very difficult to apply this technique to view reconstruction [12, 34] because of the non-invertibility of the geometric transformation equations.

ACKNOWLEDGMENTS

The authors are thankful to the partners HHI, BT, Sony UK, and HWU in the VIRTUE consortium for valuable discussions and relevant contributions.

REFERENCES

[1] J. Lengyel, The convergence of graphics and vision, IEEE Computer, vol. 31, no. 7, pp. 46-53, 1998.
[2] A. Watt, 3D Computer Graphics, Addison-Wesley, New York, NY, USA, 3rd edition, 2000.
[3] Z. Zhang, Image-based geometrically-correct photorealistic scene/object modeling (IBPhM): a review, in Proc. 3rd Asian Conference on Computer Vision (ACCV '98), pp. 340-349, Hong Kong, January 1998.
[4] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, Cambridge, Mass, USA, 2nd edition, 1993.
[5] S. F.
El-Hakim, Three-dimensional modeling of complex environments, in Videometrics and Optical Methods for 3D Shape Measurement, vol. 4309 of SPIE Proceedings, San Jose, Calif, USA, January 2001.
[6] Z. Zhang, Image-based modeling of objects and human faces, in Videometrics and Optical Methods for 3D Shape Measurement, vol. 4309 of SPIE Proceedings, San Jose, Calif, USA, January 2001.
[7] H.-Y. Shum and S. B. Kang, A review of image-based rendering techniques, in IEEE/SPIE Visual Communications and Image Processing (VCIP 2000), pp. 2-13, Perth, Australia, June 2000.
[8] P. E. Debevec, Pursuing reality with image-based modeling, rendering, and lighting, keynote paper for the 2nd Workshop on 3D Structure from Multiple Images of Large-Scale Environments and Applications to Virtual and Augmented Reality (SMILE), Dublin, Ireland, June 2000.

[9] VIRTUE, European IST project IST-1999-10044, 2000-2003, http://93.3.58.09/index.html.
[10] B. J. Lei and E. A. Hendriks, Middle view stereo representation: an efficient architecture for teleconferencing with handling of occlusions, in Proc. IEEE International Conference on Image Processing, 2001.
[11] U. R. Dhond and J. K. Aggarwal, Structure from stereo: a review, IEEE Trans. Systems, Man, and Cybernetics, vol. 19, no. 6, pp. 1489-1510, 1989.
[12] D. Scharstein, View Synthesis Using Stereo Vision, vol. 1583 of Lecture Notes in Computer Science (LNCS), Springer-Verlag, New York, NY, USA, 1999.
[13] L.-Q. Xu, A. Loffler, P. J. Sheppard, and D. Machin, True-view videoconferencing system through 3-D impression of telepresence, BT Technology Journal, vol. 17, no. 1, 1999.
[14] N. L. Chang and A. Zakhor, Intermediate view reconstruction for three-dimensional scenes, in Proc. International Conference on Digital Signal Processing, pp. 636-641, Nicosia, Cyprus, July 1993.
[15] S. M. Seitz, Image-based transformation of viewpoint and scene appearance, Ph.D. thesis, University of Wisconsin-Madison, Computer Sciences Department, October 1997.
[16] H. Sawhney and S. Ayer, Compact representations of videos through dominant and multiple motion estimation, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 814-830, 1996.
[17] J. Shade, S. Gortler, L.-W. He, and R. Szeliski, Layered depth images, in SIGGRAPH '98, pp. 231-242, Orlando, Fla, USA, July 1998.
[18] N. L. Chang and A. Zakhor, A multivalued representation for view synthesis, in Proc. IEEE International Conference on Image Processing, Kobe, Japan, October 1999.
[19] C.-F. Chang, G. Bishop, and A. Lastra, LDI tree: a hierarchical representation for image-based rendering, in SIGGRAPH '99, pp. 291-298, Los Angeles, Calif, USA, August 1999.
[20] S. Baker, R. Szeliski, and P. Anandan, A layered approach to stereo reconstruction, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 434-441, June 1998.
[21] E. H.
Adelson and J. R. Bergen, The plenoptic function and the elements of early vision, in Computational Models of Visual Processing, M. Landy and J. A. Movshon, Eds., pp. 3-20, MIT Press, Cambridge, Mass, USA, 1991.
[22] K. Yamada, T. Ichikawa, T. Naemura, K. Aizawa, and T. Saito, High-quality stereo panorama generation using a three-camera system, in SPIE Visual Communications and Image Processing (VCIP 2000), vol. 4067, pp. 419-428, Perth, Australia, June 2000.
[23] S. E. Chen, QuickTime VR: an image-based approach to virtual environment navigation, in SIGGRAPH '95, pp. 29-38, Los Angeles, Calif, USA, August 1995.
[24] M. Aggarwal and N. Ahuja, On generating seamless mosaics with large depth of field, in Proc. 15th International Conference on Pattern Recognition, pp. 588-591, Barcelona, Spain, September 2000.
[25] S. Peleg, M. Ben-Ezra, and Y. Pritch, OmniStereo: panoramic stereo imaging, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 279-290, 2001.
[26] Y. Horry, K. I. Anjyo, and K. Arai, Tour into the picture: using a spidery mesh interface to make animation from a single image, in Computer Graphics (SIGGRAPH '97 Proceedings), pp. 225-232, Los Angeles, Calif, USA, August 1997.
[27] D. Svedberg and S. Carlsson, Calibration, pose and novel views from single images of constrained scenes, in Proc. 11th Scandinavian Conference on Image Analysis, Kangerlussuaq, Greenland, June 1999.
[28] H. W. Kang, S. Y. Pyo, K. Anjyo, and S. Y. Shin, Tour into the picture using a vanishing line and its extension to panoramic images, Computer Graphics Forum, vol. 20, no. 3, pp. 132-141, 2001.
[29] W. Matusik, C. Buehler, and L. McMillan, Polyhedral visual hulls for real-time rendering, in Proc. 12th Eurographics Workshop on Rendering, pp. 115-125, London, UK, June 2001.
[30] Y. Wexler and R. Chellappa, View synthesis using convex and visual hulls, in BMVC 2001, Manchester, UK, 2001.
[31] R. Szeliski, Rapid octree construction from image sequences, CVGIP: Image Understanding, vol. 58, no. 1, pp. 23-32, 1993.
[32] W. Matusik, C. Buehler, R. Raskar, S. Gortler, and L.
McMillan, Image-based visual hulls, in Proc. ACM SIGGRAPH 2000, pp. 369-374, New Orleans, La, USA, July 2000.
[33] S. Laveau and O. D. Faugeras, 3-D scene representation as a collection of images, in Proc. International Conference on Pattern Recognition, pp. 689-691, Jerusalem, Israel, 1994.
[34] S. Avidan and A. Shashua, Novel view synthesis by cascading trilinear tensors, IEEE Transactions on Visualization and Computer Graphics, vol. 4, no. 4, pp. 293-306, October-December 1998.
[35] A. Shashua and L. Wolf, On the structure and properties of the quadrifocal tensor, in Proc. European Conference on Computer Vision (ECCV), Dublin, Ireland, June 2000.
[36] S. E. Chen and L. Williams, View interpolation for image synthesis, in SIGGRAPH '93, pp. 279-288, 1993.
[37] S. M. Seitz and C. R. Dyer, View morphing, in SIGGRAPH '96, New Orleans, La, USA, August 1996.
[38] B. J. Lei and E. A. Hendriks, Reviewing camera calibration techniques, Tech. Rep. ict-00-0, Information and Communication Theory Group, ITS, TUDelft, April 2000, http://www-ict.its.tudelft.nl/~bangjun/doc/Camera Calibration ps.zip.
[39] S. Rauthenberg, A. Graffunder, U. Kowalik, and P. Kauff, Virtual shop and virtual meeting point: two prototype applications of interactive services using the new multimedia coding standard MPEG-4, in Conference on Computer Communication, Tokyo, Japan, September 1999.
[40] A. Redert, E. Hendriks, and J. Biemond, Correspondence estimation in image pairs, IEEE Signal Processing Magazine, vol. 16, no. 3, pp. 29-46, 1999.
[41] P. Kauff, N. Brandenburg, M. Karl, and O. Schreer, Fast hybrid block- and pixel-recursive disparity analysis for real-time applications in immersive tele-conference scenarios, in Proc. WSCG 2001, 9th Int. Conference on Computer Graphics, Visualization and Computer Vision, Plzen, Czech Republic, February 2001.
[42] O. Schreer, N. Brandenburg, S. Askar, and P. Kauff, Hybrid recursive matching and segmentation-based postprocessing in real-time immersive video conferencing, in Proc.
VMV 2001, Vision, Modeling and Visualization 2001, Stuttgart, Germany, November 2001.
[43] A. Fusiello, E. Trucco, and A. Verri, A compact algorithm for rectification of stereo pairs, Machine Vision and Applications, vol. 12, no. 1, pp. 16-22, 2000.
[44] S. M. Seitz and C. R. Dyer, Physically-valid view synthesis by image interpolation, in Proc. Workshop on Representation of Visual Scenes, pp. 18-25, Cambridge, Mass, USA, 1995.
[45] B. J. Lei and E. A. Hendriks, View synthesis for VIRTUE, Tech. Rep. ict-00-03, Information and Communication Theory Group, TUDelft, August 2000, http://www-ict.its.tudelft.nl/~bangjun/publications.html#technical.
[46] A. F. Bobick and S. S. Intille, Large occlusion stereo, International Journal of Computer Vision, vol. 33, no. 3, pp. 181-200, 1998.

[47] G. Wolberg, Digital Image Warping, IEEE Computer Society Press, Los Alamitos, Calif, USA, 1990.
[48] L. McMillan, Computing visibility without depth, UNC Technical Report TR95-047, University of North Carolina, 1995.
[49] M. M. Oliveira and G. Bishop, Relief textures, UNC Computer Science Technical Report TR99-015, University of North Carolina, March 1999.
[50] L.-Y. Wei and M. Levoy, Texture synthesis over arbitrary manifold surfaces, in SIGGRAPH 2001, pp. 355-360, Los Angeles, Calif, USA, August 2001.
[51] F. Isgrò, E. Trucco, and L. Q. Xu, Towards teleconferencing by view synthesis and large-baseline stereo, in Proc. IAPR/IEEE International Conference on Image Analysis and Processing, Palermo, Italy, 2001.
[52] O. Schreer and P. Sheppard, VIRTUE: the step towards immersive telepresence in virtual video-conference systems, in eWork and eBusiness 2000, Madrid, Spain, October 2000.

B. J. Lei received his B.S. in computer software and his M.S. in parallel network computing from Xi'an Jiaotong University, China, in 1995 and 1998, respectively. He is now working towards a Ph.D. in the Information and Communication Theory Group at the Delft University of Technology, The Netherlands. His main research interests are feature extraction, camera calibration, correspondence estimation, and novel view reconstruction. Currently he is involved in the EU Framework V VIRTUE project.

E. A. Hendriks received his M.S. and Ph.D. degrees from the University of Utrecht in 1983 and 1987, respectively, both in physics. In 1987 he joined the Electrical Engineering Faculty of Delft University of Technology as an assistant professor. In 1994 he became a member of the Information and Communication Theory Group of this faculty, and since 1997 he has been heading the computer vision section of this group as an associate professor.
His interests are in low-level image processing, image segmentation, stereoscopic and 3D imaging, motion and disparity estimation, and real-time applications.