A Prototype For Eye-Gaze Corrected Video Chat on Graphics Hardware




A Prototype For Eye-Gaze Corrected Video Chat on Graphics Hardware Maarten Dumont, Steven Maesen, Sammy Rogmans and Philippe Bekaert

Introduction Traditional webcam video chat offers no eye contact and no extensive context information. Long-term goal: a fully immersive augmented environment where participants can communicate and cooperate as if they were in the same room.

Overview Related Work System Architecture Preprocessing View Interpolation Joint View/Depth Refinement Movement Analysis Eye Tracking Networking Results Conclusion

Related Work Implemented on commodity CPUs, low frame rate [Criminisi et al., 2003]. Expensive dedicated hardware [Baker et al., 2002]. Impractical camera setup [Schreer et al., 2001]. Optimize parts of the application instead of end-to-end performance: multi-camera video coding [Chien et al., 2003; Guo et al., 2005]; real-time view synthesis [Yang and Pollefeys, 2003; Geys and Van Gool, 2004; Nozick et al., 2006].

Our Solution Peer-to-peer eye-gaze corrected video chat. N input images I_1, …, I_N are fetched from N cameras C_1, …, C_N that are closely aligned along the screen. A virtual camera viewpoint is interpolated to restore eye contact. Implemented on the GPU for real-time performance.

System Architecture Four consecutive GPU processing modules: Preprocessing: lens correction and background/foreground segmentation. View Interpolation: interpolate the eye-gaze corrected view. Joint View/Depth Refinement of the interpolated view. Movement Analysis: avoid heavy constraints on the user's movements. Concurrent CPU processing: Eye Tracking and Networking as stand-alone processing modules.

Preprocessing Radial distortion in each input image is corrected according to the Brown-Conrady distortion model [Brown, 1966]. (Figure: with radial distortion vs. radial distortion corrected.)
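A radial undistortion of this kind can be sketched as below; the one-step radial inversion, the coefficient names k1/k2 and the optical-center parameter are illustrative assumptions, not values from the presentation:

```python
import numpy as np

def brown_conrady_undistort(pts, k1, k2, center=(0.0, 0.0)):
    """Approximate Brown-Conrady radial undistortion: divide each point's
    offset from the optical center by the radial distortion factor
    (a one-step approximation of inverting the forward model)."""
    pts = np.asarray(pts, dtype=float)
    c = np.asarray(center, dtype=float)
    d = pts - c                                   # offsets from the center
    r2 = np.sum(d * d, axis=-1, keepdims=True)    # squared radius per point
    factor = 1.0 + k1 * r2 + k2 * r2 * r2         # radial distortion factor
    return c + d / factor
```

With zero coefficients the mapping is the identity, which makes for an easy sanity check.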

Preprocessing Each input image I_i is segmented into a binary foreground/background silhouette; the consecutive processing modules rely on these silhouettes. Segmentation is done by background subtraction.
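A minimal per-pixel background subtraction of the kind named here might look like this; the color-distance threshold is an assumed tuning knob:

```python
import numpy as np

def segment_foreground(frame, background, threshold=30.0):
    """Per-pixel background subtraction: a pixel is foreground when its
    color differs from the reference background image by more than the
    threshold (Euclidean distance in color space)."""
    diff = np.linalg.norm(frame.astype(float) - background.astype(float),
                          axis=-1)
    return diff > threshold   # boolean silhouette mask
```

The returned mask plays the role of the binary silhouette that the later modules consume.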

Preprocessing Greenscreening: very precise silhouettes make it easy to develop the consecutive processing modules and thereby reduce the design space complexity.

View Interpolation Interpolate an image I_v (and a consistent depth map Z_v) as seen by a virtual camera C_v positioned behind the screen. The image I_v is computed as if the camera C_v captured it through a completely transparent screen, and is thus eye-gaze corrected.

View Interpolation Plane sweep approach [Yang et al., 2002]. The 3D space is discretized into M planes {D_1, …, D_M} parallel to the image plane of the virtual camera C_v.

View Interpolation For each plane D_j, every pixel f_v of the virtual camera image I_v is backprojected onto the plane D_j and reprojected into the input images I_i. For each pixel on each plane D_j, the interpolated color Ψ and the matching cost Κ are computed, and the best color consensus (i.e. minimum cost) is selected. All N cameras are used to interpolate the color, instead of the stereo interpolation of [Yang et al., 2002].
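The per-pixel color consensus can be sketched in NumPy, assuming the backprojection/reprojection step has already produced, for every plane, the colors sampled from each camera; color variance across cameras stands in for the matching cost Κ and the mean color for Ψ:

```python
import numpy as np

def plane_sweep_depth(colors_per_plane):
    """colors_per_plane: array of shape (M, N, H, W, 3) holding, for each
    of M depth planes, the colors reprojected from the N input cameras.
    For each pixel, pick the plane with the smallest color variance
    (matching cost) and return the mean color (consensus) and plane index."""
    mean = colors_per_plane.mean(axis=1)                 # (M, H, W, 3)
    cost = ((colors_per_plane - mean[:, None]) ** 2).sum(axis=(1, 4))
    best = cost.argmin(axis=0)                           # (H, W) plane index
    h, w = best.shape
    psi = mean[best, np.arange(h)[:, None], np.arange(w)[None, :]]
    return psi, best                                     # color and depth plane
```

The `best` array is a per-pixel plane index, i.e. a quantized depth map alongside the interpolated image.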

View Interpolation However, points on the plane D_j that project outside a foreground silhouette in at least one of the input images are immediately rejected. This leverages both: Speed: all further operations are automatically discarded by the GPU hardware. Quality: segmentation noise will, with high probability, not be present in all N cameras.

View Interpolation Result: interpolated eye-gaze corrected image I_v and joint depth map Z_v.

Joint View/Depth Refinement The interpolated view still contains visually disturbing artifacts. The interpolated image I_v and depth map Z_v are jointly linked, and errors are even more apparent in the depth map Z_v. Detect and restore errors in the depth map Z_v, then restore the link between I_v and Z_v by recoloring Z_v.

Joint View/Depth Refinement Two types of errors: erroneous patches and speckle noise, caused by illumination changes, partially occluded areas and the naturally homogeneous texture of the human face. (Figure: patch error vs. noise error.)

Joint View/Depth Refinement Erroneous patches solution: naive Gaussian smoothing to remove patches does not work. Instead, a photometric outlier detection algorithm (a) detects and (b) restores erroneous patches in the depth map Z_v. (Figure: Gaussian smoothing vs. outlier detection.)

Joint View/Depth Refinement Erroneous patch filtering: (a) Centers of patches are detected and morphologically grown from center to border.

Joint View/Depth Refinement Erroneous patch filtering: (b) Patches are filled with reliable depth values from their neighbourhood by a reverse morphological grow from border to center.
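A toy version of the two-step patch repair (detect, then fill from the neighbourhood) might look like this; the slides' photometric detection and center-to-border morphological grow are simplified here to a single local-median pass, and the window size and tolerance are assumed values:

```python
import numpy as np

def fill_depth_outliers(depth, window=2, tol=0.5):
    """Flag depth values that deviate strongly from their neighbourhood
    median and replace them with that median -- a one-pass simplification
    of detecting erroneous patches and refilling them from reliable
    surrounding depth values."""
    out = depth.astype(float).copy()
    h, w = out.shape
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - window), min(h, y + window + 1)
            x0, x1 = max(0, x - window), min(w, x + window + 1)
            med = np.median(depth[y0:y1, x0:x1])   # reliable local estimate
            if abs(depth[y, x] - med) > tol:       # photometric-style outlier
                out[y, x] = med                    # fill from neighbourhood
    return out
```

A real implementation would run this as a data-parallel GPU kernel rather than Python loops.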

Joint View/Depth Refinement Speckle noise solution: large homogeneous texture regions of the human face cause the depth map to contain spatially high-frequency speckle noise, which can be smoothed with a Gaussian low-pass filter. Gaussian smoothing sacrifices the geometric correctness of the depth map, but enhances perceptual visual quality.
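The speckle-smoothing step can be sketched as a separable Gaussian low-pass over the depth map; sigma and the kernel radius are assumed parameters, not values from the slides:

```python
import numpy as np

def gaussian_smooth_depth(depth, sigma=1.5, radius=3):
    """Separable Gaussian low-pass filter that suppresses high-frequency
    speckle noise in the depth map (trading geometric accuracy for
    perceptual quality, as the slides note)."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()                                   # normalized 1D kernel
    pad = np.pad(depth.astype(float), radius, mode='edge')
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, k, mode='valid'), 1, pad)   # horizontal pass
    return np.apply_along_axis(
        lambda c: np.convolve(c, k, mode='valid'), 0, rows)  # vertical pass
```

Separability is what makes this cheap on the GPU: two 1D passes instead of one 2D convolution.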

Joint View/Depth Refinement Joint View/Depth Refinement example result.

Movement Analysis Because the position of the user relative to the screen is not known, a large depth range has to be scanned. Problems: high probability of mismatches (bad visual quality); real-time performance is endangered.

Movement Analysis Solution: limit the effective depth range to narrowly encompass the user's head: lower probability of mismatches (much better visual quality) and improved real-time performance.

Movement Analysis Problem: a small depth range heavily constrains the user's movements. Solution: dynamically adjust the depth range to track the user's head and narrowly encompass it at all times.

Movement Analysis How: the peak of a Gaussian distribution G(μ,σ) fitted to the depth map histogram indicates the position of the user. Dynamically place the depth range around this peak. Three separate cases: Forward: the user moves forward and exits the active scanning range; peak towards the front of the histogram. Stable: the user remains stationary; clear peak in the middle. Backward: the user moves backward and exits the active scanning range; peak towards the back of the histogram.
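A CPU sketch of this histogram-based range adaptation follows; the peak bin stands in for μ and the weighted spread around it for the fitted σ, and the bin count and range multiplier are illustrative assumptions:

```python
import numpy as np

def update_depth_range(depth_map, num_bins=32, span=3.0):
    """Locate the dominant depth mode: take the histogram peak as mu,
    estimate sigma as the weighted spread of the bins around it, and
    re-center the next scanning range on the peak."""
    vals = np.asarray(depth_map, dtype=float).ravel()
    hist, edges = np.histogram(vals, bins=num_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mu = centers[hist.argmax()]                    # peak of the histogram
    sigma = np.sqrt(np.average((centers - mu) ** 2,
                               weights=hist + 1e-9))
    return mu - span * sigma, mu + span * sigma    # new (near, far) range
```

Using fewer bins or subsampling the depth map, as the next slide suggests, only shifts the estimated peak slightly.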

Movement Analysis The histogram can be efficiently implemented on the GPU. Optimizations: fewer bins, fewer samples. The approximated peak location remains virtually the same: a quality vs. complexity trade-off.

Eye Tracking The virtual camera C_v needs to look directly into the user's eyes at all times to ensure eye contact. The eye tracking module runs concurrently on the CPU: face and eye candidates are detected in every input image; eye candidates are used to triangulate the 3D positions of the eyes; the 3D eye positions are expressed in a coordinate system relative to the screen.
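The triangulation step could be sketched as follows; the slides do not specify the method, so this uses the common midpoint scheme on two viewing rays, assuming calibrated cameras supply the ray origins and directions:

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Triangulate a 3D point from two viewing rays (origin o, unit
    direction d) as the midpoint of the shortest segment between them.
    This is a simple stand-in for whatever triangulation the system uses."""
    r = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    e, f = d1 @ r, d2 @ r
    denom = a * c - b * b          # zero only for parallel rays
    t1 = (b * f - c * e) / denom   # closest-approach parameter on ray 1
    t2 = (a * f - b * e) / denom   # closest-approach parameter on ray 2
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))
```

For perfectly intersecting rays the midpoint coincides with the intersection; with detection noise it averages the two closest points.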

Networking Only the interpolated image I_v (instead of N images) and the eye coordinates are sent over the network: minimal network communication allows for real-time speeds over various types of networks.

Results Setup: N = 6 auto-synchronized PGR Grasshopper cameras mounted closely around the screen. Few occlusions, no extrapolations. Can be integrated into the monitor frame (avoiding tedious calibration procedures). Still allows real-time processing (as opposed to a much larger N).

Results Workload profiling on an NVIDIA GeForce 8800 GTX with 800x600 @ 15 Hz cameras: 33 ms processing time for a single frame, a theoretical speed of 30 fps. Image Download / Readback (54%): demonstrates the importance of data locality and justifies porting all processing to the GPU. Preprocessing (15%) and View Interpolation (7%): computational complexity is linear in N (the number of cameras). Joint View/Depth Refinement (15%) and Movement Analysis (9%): improve the quality independently of the number of input images.

Results Although minor artifacts remain, the results yield high perceptual visual quality: participants convincingly seem to be making eye contact.

Conclusion Prototype for eye-gaze correction between two video chat participants: convenient camera setup; minimal constraints, large freedom of movement; real-time performance through GPGPU; high perceptual visual quality; practical usability. Future work: improving the movement analysis; multi-party video conferencing; interpolating the background with correct motion parallax; creating the immersive effect of a virtual window into the world of the other participant!

Demo

Thank you! Questions?