VIRTUE The step towards immersive telepresence in virtual video-conference systems




Oliver SCHREER (HHI) (1) and Phil SHEPPARD (British Telecom) (2)

(1) Heinrich-Hertz-Institut, Einsteinufer 37, D-10587 Berlin, Germany
Tel: ++49 30 31002 620, Fax: ++49 30 3927200, Email: schreer@hhi.de

(2) British Telecommunications plc, BT Adastral Park, Martlesham Heath, UK-IP5 3RE
Tel: +44 1473 644352, Fax: +44 1473 644649, Email: phil.j.sheppard@bt.com

Abstract. We present the challenging project VIRTUE, which will supersede current video conferencing systems. We present the essential features of this virtual video conferencing system, which aims at a convincing impression of immersive telepresence. To show what telepresence in video conferencing means, we describe the main properties of such a system together with the approaches used to achieve the illusion of telepresence. We then discuss the technological challenges of the project. The project aims to advance the state of the art in a number of component disciplines, either by proposing novel paradigms, approaches and algorithms or by pushing forward the frontiers of current approaches. Finally, we give first results of the work, which culminate in the requirements specification of the advanced real-time demonstration system. This system will be designed and built as a 3-way telepresence video conferencing system supporting life-sized upper-body video images in a shared virtual environment.

1. Introduction

The IST project VIRTUE started at the beginning of 2000 within the 5th Framework Programme. It will develop the innovative technology needed to produce a convincing impression of presence in a semi-immersive teleconferencing system. The consortium consists of British Telecom, Sony, Heriot-Watt University, TU Delft, TNO (Human Factors Research Institute, NL) and HHI. In this paper we present the main concept of VIRTUE and first results of the project.
Figure 1: An artist's impression of a future multi-party video conferencing system

The convincing impression of presence can be explained as follows (see Figure 1). You are at a meeting table with people spaced around in front of you. You can communicate with them as effectively as if they were sitting next to you in the same room. In fact, you are led to believe they are present in the same room, although they are actually at several remote locations. For many types of meeting, such a high-realism telepresence conferencing system will replace the need to travel. Technologies in computer vision and graphics have developed to a point where this vision is achievable, and VIRTUE will make it happen in Europe. VIRTUE will investigate the human factors involved in maximising the effectiveness and realism of telepresence. It will then use them to drive the design of a final demonstrator.

In this article we discuss what telepresence means in the context of virtual video conferencing. We then give a brief overview of the technical challenges of the project and of the final demonstrator, which will be presented at the end of the project. Next, we introduce the first results of the project with respect to the system definition. Finally, we conclude and give a short outlook on forthcoming issues in the project.

1.1 Essential features

VIRTUE aims to create a multi-party meeting space in which virtual and real worlds are seamlessly combined. Participants at different locations should have the impression that they are sitting next to each other and can work co-operatively. The participants will view each other from the proper perspective, with realistic eye contact. Each individual viewpoint will change appropriately as people move around within the common meeting environment.
Specifically, the final system will incorporate the following features:

Feature 1: Semi-immersive display with life-size head and torso images, enabling:
- Accurate reproduction of facial expressions and effective body language
- Balance of power (the relative size of people's video images does not give unfair psychological dominance)

Feature 2: Camera views for multiple participants, enabling:
- Eye-to-eye contact and gaze awareness
- Spatial awareness (consistent spatial positioning of people in the environment)
- Directed body language (i.e. who am I pointing to?)
- Moveable viewpoint (motion parallax or look-around effect)

Feature 3: Integrated visual environment for multiple participants, enabling:
- Participants to feel as if they are sitting around the same "virtual" table
- Correct perspective (participants around the virtual table appear in correct proportions relative to their position around the table)
- Harmonisation of visual parameters (contrast, illumination, shadow etc.) of video objects from different locations, so that they can be combined in a common virtual environment

2. What Does Telepresence Mean?

The idea of telepresence in virtual video conferencing is to see the other, geographically distributed conferees as if they were sitting in the same meeting room around a joint table. They should appear natural in size and perspective. To create such an impression, many different aspects of the system design have to be considered. The main principle that guarantees a realistic perspective view for each participant is the principle of the shared table. One major constraint of the shared table is that the participants' positions around the table are predefined and symmetrically distributed.
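The shared-table constraint can be made concrete with a small geometric sketch. Assume a round virtual table with the n participants placed at equal angular spacing on a circle. The sketch computes, for each remote seat, the direction from which that seat looks at a given conferee, and the angle at which one conferee sees the other two. The round table, the unit radius and all function names are illustrative assumptions, not part of the VIRTUE specification.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in the plane of the virtual table


def shared_table_seats(n: int, radius: float = 1.0) -> List[Point]:
    """Hypothetical helper: place n participants at equal angular spacing
    on a circle of the given radius around a round virtual table."""
    if n < 2:
        raise ValueError("a shared table needs at least two participants")
    step = 2.0 * math.pi / n
    return [(radius * math.cos(k * step), radius * math.sin(k * step))
            for k in range(n)]


def view_direction_deg(viewer: Point, target: Point) -> float:
    """Direction from viewer to target in the table plane, in degrees as returned by atan2."""
    return math.degrees(math.atan2(target[1] - viewer[1], target[0] - viewer[0]))


def viewing_angle_deg(viewer: Point, a: Point, b: Point) -> float:
    """Angle in degrees between the lines of sight from viewer to a and from viewer to b."""
    ax, ay = a[0] - viewer[0], a[1] - viewer[1]
    bx, by = b[0] - viewer[0], b[1] - viewer[1]
    cos_angle = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


if __name__ == "__main__":
    pos_a, pos_b, pos_c = shared_table_seats(3)
    # Each remote site needs a view of A taken from the direction of its own seat.
    print("B looks at A along", round(view_direction_deg(pos_b, pos_a), 1), "deg")
    print("C looks at A along", round(view_direction_deg(pos_c, pos_a), 1), "deg")
    print("A sees B and C separated by", round(viewing_angle_deg(pos_a, pos_b, pos_c), 1), "deg")
```

With three seats the participants form an equilateral triangle. B and C therefore look at A from directions 30 degrees either side of the table's symmetry axis, and each of them needs a different view of A.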

This means that each conferee has to receive the view corresponding to his or her position at the table. The positions for three participants are shown on the right of Figure 2. To make different views of a conferee available, more than one camera is required. The left side of Figure 2 shows a display positioned on the conference table. Three cameras are mounted on this display, providing three different views.

Figure 2: Different views of the conferee at Pos. A and the corresponding positions of the participants (Pos. A, Pos. B, Pos. C)

Figure 3 shows the virtual camera for the conferee at Pos. A. The position of the virtual camera coincides with the position of the conferee and has to be adapted as the person's head moves.

Figure 3: Synthesis of the virtual views of the conferee at position A (labels: virtual camera for Pos. A, Pos. B, Pos. C, head position)

A very important issue in video conferencing systems that provide telepresence is eye contact between the real and virtual participants [1][2]. The size and geometry of the display prevent the cameras from being placed at the virtual positions of the participants. New techniques for generating intermediate views therefore have to be applied. Given two neighbouring camera views and a 3D analysis (i.e. a disparity analysis), a novel view can be synthesised [3][4][5][6]. Figure 4 shows the stereo views of a conferee. Neither the left camera nor the right camera provides the direct view needed to establish eye contact. The middle image of Figure 5, however, shows the synthesised middle view, which gives the impression of eye contact.
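The idea of synthesising an intermediate view from two neighbouring cameras plus a disparity analysis can be sketched on a single rectified scanline. This is a generic forward-warping technique with a z-buffer. It is not the specific method of [3]-[6] or of VIRTUE, and it leaves disoccluded pixels as holes rather than filling them. The sketch assumes rectified cameras with the convention x_right = x_left - d. The parameter alpha places the virtual camera along the baseline and is derived from a tracked head position. The function names head_to_alpha and synthesise_scanline are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def head_to_alpha(head_x: float, left_cam_x: float, right_cam_x: float) -> float:
    """Hypothetical mapping from a tracked horizontal head position to alpha.

    alpha = 0 reproduces the left camera, alpha = 1 the right camera and
    alpha = 0.5 the midpoint (e.g. for eye contact). Values outside [0, 1]
    request an extrapolated view beyond the camera baseline.
    """
    if right_cam_x == left_cam_x:
        raise ValueError("cameras must be horizontally separated")
    return (head_x - left_cam_x) / (right_cam_x - left_cam_x)


def synthesise_scanline(left_row: Sequence[float], disparity: Sequence[float],
                        alpha: float) -> List[Optional[float]]:
    """Forward-warp one rectified scanline of the left image to a virtual view.

    With x_right = x_left - d, a virtual camera at fraction alpha of the
    baseline sees the left pixel x at x - alpha * d. If two pixels land on
    the same target, the one with the larger disparity (closer to the
    cameras) wins, as in a z-buffer. Targets that receive no pixel stay
    None: these are disoccluded holes that a full system must fill.
    """
    width = len(left_row)
    out: List[Optional[float]] = [None] * width
    best_d = [-math.inf] * width
    for x, (value, d) in enumerate(zip(left_row, disparity)):
        xv = int(math.floor(x - alpha * d + 0.5))  # nearest target pixel
        if 0 <= xv < width and d > best_d[xv]:
            out[xv] = value
            best_d[xv] = d
    return out


if __name__ == "__main__":
    row = [10, 20, 30, 40, 50]
    disp = [2, 2, 2, 2, 2]
    alpha = head_to_alpha(0.0, left_cam_x=-0.1, right_cam_x=0.1)  # head centred
    print(alpha, synthesise_scanline(row, disp, alpha))
```

The hole left at the right edge of the example shows why occlusion handling is treated in the text as an open problem for wide-viewpoint synthesis. A practical system would also warp the right image and blend the two results.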

Figure 4: Stereo views of a conferee

Figure 5: Synthesised views at positions to the left (a) and right (c) of the available views, and the middle view, which provides eye contact (b)

A further property of a convincing telepresence system is motion parallax. It allows the viewpoint to move during the conference while the system still offers the correct perspective view of the other conferees. We call this the look-around effect. Images (a) and (c) of Figure 5 are synthesised to the left and right of the available camera views and illustrate this motion parallax capability.

3. The Technical Challenge

To achieve this vision, VIRTUE focuses on the following three issues:

3.1. Develop new technologies

We will investigate practical and efficient solutions and new algorithms. The most important areas are:
- 3D analysis
- wide-viewpoint synthesis from multiple cameras that deals with occlusion
- object segmentation from non-uniform backgrounds
- motion tracking of arms and hands
- 3D scene composition of natural and graphical video objects

The software to be developed has to meet real-time constraints, which will be achieved by optimising the algorithms. The system architecture should be scalable, so that it can be used in a wide range of conference situations, i.e. with different display resolutions and numbers of participants.

3.2. Build a technology demonstrator

To show the efficiency of the developed algorithms, a real-time demonstrator will be built. It is a 3-way telepresence conference system containing a large flat display that supports life-sized upper-body video images in a shared virtual environment.

The geographically separate sites will be connected via a broadband IP network.

3.3. Investigate the human factors involved

A continuous line of human factors studies accompanies the project to take user acceptance into consideration. In the definition phase, these studies will drive the system design. The aim is the best technical solution combined with maximum effectiveness and realism of telepresence. The final demonstrator will be subject to user testing and human interaction experiments.

Compared with the state of the art in recent video conferencing systems [6], the technical challenges of the VIRTUE project can be summarised by the following properties (see Table 1):

Table 1: Technical challenge of the VIRTUE project compared with recent systems

State of the art (PANORAMA etc.) | VIRTUE
Head and shoulder view only | Head and torso view including held objects
Restricted point of view (inter-ocular, relatively small camera separation) | Wide point of view supporting full look-around capability and flexible positioning of participants
Point-to-point | Scalable multi-location
Natural content only | Integrated natural, synthetic and graphical content
Uniform background for segmentation | Unconstrained environment
"Presence" not fully understood or characterised | Development and application of "presence" metrics
Implied fixed camera and display configuration | System model abstracts the camera and display configuration

4. First Results of the Project

Figure 6 shows a mock-up of the final demonstrator with the different components of the system. The system consists of an audio-visual input/output module: cameras and microphones for input, and the flat-panel display and surround-sound speakers for output. A vision analysis subsystem performs a number of complex tasks, including depth estimation and object segmentation. The video, depth and audio information is coded and transmitted over a broadband IP network to the other remote locations.
On the receiver side, a rendering module combines the video, depth and audio information of the other conferees into a common virtual environment. The remote participants, as natural video objects, are composed together with synthetic objects (e.g. a table or a clock) in the virtual environment.

The first major deliverable of the project has been the Requirements Specification. For VIRTUE to succeed, user requirements must be considered from the outset. Indeed, a stated objective of the project is to investigate the human factors involved in maximising the effectiveness and realism of telepresence, and to use them to drive the design of the system. The Requirements Specification forms the foundation of future technical design activities.

Figure 6: Mock-up of the final demonstrator (labelled components: flat-panel display, virtual environment (synthetic objects), camera, remote participant (natural video), virtual table, speaker, local participant, real table, microphone, document sharing)

The Requirements Specification concentrates on specifying system design parameters from a user-perception standpoint. It also incorporates commercial and technical constraints, so that the end system will have a feasible route to commercial exploitation. The requirements have been broken down into 10 areas:
- User
- Interaction
- Workspace
- Video
- Audio
- Software
- Hardware
- Network
- Assessment
- Terminal cost

These areas cover in detail a wide range of requirements on the system to support effective and realistic telepresence meetings. This work shows that a very large number of factors contribute towards a convincing impression of presence. It also shows that any one of these factors on its own could weaken that impression if it is not implemented to a sufficient quality. Two examples from the report illustrate this.

Distance from user to screen, and screen size. Human factors tests have shown that larger images of people create a greater impression of presence [7]; life-size images are the goal of VIRTUE. However, viewing multiple people at life size around a virtual meeting table requires a sufficiently large field of view for the user. With a flat-screen display, a larger field of view can be achieved by seating the user closer to the screen, by increasing the size of the screen, or both. Too close a viewing distance can become uncomfortable, however. With a near-immersive display, at least 0.8 m was found to be needed, but more than 1 m was preferred. This minimum viewing distance is a major factor in determining the required screen size and hence also the camera configuration.

Gaze awareness. As described earlier in this paper, the VIRTUE system will provide eye-to-eye contact for all participants by rendering interpolated views from the correct angle. However, to fit multiple people around a virtual table displayed on a screen of limited size, it is desirable to be able to place people non-symmetrically around the table. Initial experiments on sensitivity to gaze deviation have revealed that a gaze deviation of up to 3 degrees is tolerable in face-to-face communication. Larger deviations (between 5 and 10 degrees) are tolerable when the participant is looking elsewhere, as long as the represented eye-gaze deviation is in the direction of the observer. Such experimental evidence gives us quantifiable targets for the design of the VIRTUE system.
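The two design examples above reduce to simple geometry. The Python sketch below makes several assumptions. The viewer sits centred in front of a flat screen, and positions are 2D points in the plane of the virtual table. Only the 3-degree face-to-face tolerance reported above is checked; the 5-10 degree case, which depends on the direction of the deviation, is not modelled. The sketch computes the horizontal field of view for a given screen width and viewing distance. It also computes the angular deviation between where a rendered participant appears to look and where the observer actually is. The function names and the screen width and layout in the usage lines are hypothetical and are not VIRTUE requirements.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in the plane of the virtual table, metres

# Face-to-face tolerance for gaze deviation quoted in the text.
FACE_TO_FACE_TOLERANCE_DEG = 3.0


def horizontal_fov_deg(screen_width_m: float, viewing_distance_m: float) -> float:
    """Horizontal angle subtended by a flat screen for a viewer centred in front of it."""
    return math.degrees(2.0 * math.atan(screen_width_m / (2.0 * viewing_distance_m)))


def screen_width_for_fov(fov_deg: float, viewing_distance_m: float) -> float:
    """Flat-screen width needed to subtend fov_deg at the given viewing distance."""
    return 2.0 * viewing_distance_m * math.tan(math.radians(fov_deg) / 2.0)


def gaze_deviation_deg(eye: Point, rendered_target: Point, observer: Point) -> float:
    """Unsigned angle between the rendered gaze direction and the direction to the observer."""
    gx, gy = rendered_target[0] - eye[0], rendered_target[1] - eye[1]
    ox, oy = observer[0] - eye[0], observer[1] - eye[1]
    # atan2(cross, dot) gives the signed angle between the two direction vectors.
    signed = math.atan2(gx * oy - gy * ox, gx * ox + gy * oy)
    return abs(math.degrees(signed))


def eye_contact_preserved(deviation_deg: float) -> bool:
    """True if the deviation stays within the face-to-face tolerance."""
    return deviation_deg <= FACE_TO_FACE_TOLERANCE_DEG


if __name__ == "__main__":
    for d in (0.8, 1.0):  # viewing distances mentioned in the text
        print(f"2.0 m screen at {d} m: {horizontal_fov_deg(2.0, d):.1f} deg")
    # Hypothetical layout: remote eye 1.2 m away, rendered gaze aimed 0.05 m off the viewer.
    dev = gaze_deviation_deg(eye=(0.0, 1.2), rendered_target=(0.05, 0.0), observer=(0.0, 0.0))
    print(f"deviation {dev:.2f} deg, eye contact preserved: {eye_contact_preserved(dev)}")
```

The field-of-view function shows the trade-off described above. Moving the viewer from 1 m back to the comfortable limit of 0.8 m widens the field of view for the same screen, while keeping the viewer at 1 m or more requires a wider screen.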

5. Conclusion

In this paper, we have presented the key challenges of the IST project VIRTUE, which focuses on providing immersive telepresence in virtual video conferencing. This telepresence will be achieved by addressing many different aspects in the design of the complete system. The most important are motion parallax, eye contact and life-size images. These require new algorithms and approaches, which will be investigated in this project. The project's Requirements Specification takes a user-perception standpoint but also incorporates commercial and technical constraints, so that the end system will have a feasible route to commercial exploitation. In that respect, a key aspect of VIRTUE will be investigating the trade-off between current technical capabilities and the challenging goal of immersive telepresence in virtual video conferencing.

In parallel with its technical activities, VIRTUE is considering a variety of ways to exploit the technologies being developed commercially. A range of applications within telecommunications, broadcasting, games and engineering could make use of the VIRTUE technologies. For the primary goal of high-realism videoconferencing, a VIRTUE-like system would initially be a premium solution for the top end of the market. There it is expected to greatly enhance the effectiveness of communication. We anticipate, however, that:
- A VIRTUE-like solution will provide a "showcase" that can be used to drive uptake of a range of conferencing solutions.
- The cost of signal processing, display technology and bandwidth will continue to fall, improving the prospects for much wider deployment. Eventually one can envisage remote workers being equipped as standard with a VIRTUE-like system.

6. Acknowledgement

This work is supported by the IST programme of the EC under proposal No. IST-1999-10044.
We would like to thank our partners, represented by John Stone (Sony, UK), Emanuele Trucco (Heriot-Watt University, UK), Emile Hendriks (TU Delft, NL) and Peter Werkhoven (TNO, NL), for their contributions to this project.

References

[1] L. Mühlbach, M. Böcker, A. Prussog: Telepresence in Videocommunications: A Study on Stereoscopy and Individual Eye Contact, Human Factors 37, No. 2, pp. 290-305, 1995.
[2] A. Suwita, M. Böcker, L. Mühlbach, D. Runde: Overcoming Human Factors Deficiencies of Videocommunications Systems by Means of Advanced Image Technologies, Displays 17, pp. 75-88, 1997.
[3] D. Scharstein: Stereo Vision for View Synthesis, IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, pp. 852-858, June 1996.
[4] E. Chen, L. Williams: View Interpolation for Image Synthesis, Proc. SIGGRAPH 93, ACM Press, New York, pp. 279-288, 1993.
[5] S. Avidan, A. Shashua: Novel View Synthesis in Tensor Space, Proc. Int. Conf. on Computer Vision and Pattern Recognition, pp. 1034-1040, June 1997.
[6] J.-R. Ohm, K. Grüneberg, E. Hendriks, E. Izquierdo M., D. Kalivas, M. Karl, D. Papadimatos, A. Redert: A Realtime Hardware System for Stereoscopic Videoconferencing with Viewpoint Adaptation, Image Communication, Special Issue on 3D Technology, January 1998.
[7] J.S. Angiolillo, H.E. Blanchard, E.W. Israelski, A. Mané: Technology Constraints of Video-Mediated Communication, in K. Finn, A. Sellen, S. Wilbur (Eds.), Video-Mediated Communication, pp. 51-74, Lawrence Erlbaum Associates, Mahwah, NJ, 1997.