A Multi-Touch Surface Using Multiple Cameras


Itai Katz, Kevin Gabayan, Hamid Aghajan
Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305
{katz,gabayan,aghajan}@stanford.edu

Abstract. In this project we present a framework for a multi-touch surface using multiple cameras. With an overhead camera and a side-mounted camera we determine the three-dimensional coordinates of the fingertips and detect touch events. We interpret these events as hand gestures which can be generalized into commands for manipulating applications. We offer an example application of a multi-touch finger painting program.

1 Motivation

Traditional human input devices such as the keyboard and mouse are not sufficiently expressive to capture more natural and intuitive hand gestures. A richer gestural vocabulary may be enabled through devices that capture and understand more gestural information. Interactive tables that detect fingertip touches and interpret these events as gestures have succeeded in providing a richer input alternative. Touch detection in overhead-camera-based systems suffers from poor depth resolution, causing a system to report touches when a user's finger merely hovers near the touch surface. With our multi-touch surface we strive to build a system that is low-cost, scalable, and improves on the accuracy of existing vision-based touch tables.

2 Background

Multi-touch input devices have been explored by Bill Buxton [1] as early as 1984. Touchpads, a common laptop input device, sense touch position by measuring the capacitance created between a user's finger and an array of charged plates under the touch surface [2]. Larger touch-sensitive surfaces have been constructed from similar sensor arrays. The Mitsubishi DiamondTouch display sends a unique electrical signal through the seat of each of its users, and an antenna array under the touch surface localizes and identifies unique touches [3].

Fig. 1. Diagram of the multi-touch surface prototype

Computer vision techniques for human-computer interfaces have provided a high-bandwidth alternative to traditional mechanical input devices as early as 1969 in Krueger's Videoplace [4]. While cameras provide a high rate of input information, algorithms that efficiently reduce this data to the desired parameters are necessary to minimize interface latency. Modifications to the operating conditions of an optical interface that isolate desirable features can simplify the subsequent image analysis.

Several touch detection methods cause a fingertip to illuminate on contact. The Coeno Office of Tomorrow touch-sensitive surface uses a scanning laser diode to produce a thin layer of laser light over the touch-sensitive surface [5]. Touches that contact the surface reflect the light, reducing fingertip localization to the identification of bright spots in an image. Jeff Han revisited touch surfaces based on frustrated total internal reflection (FTIR) in a novel multi-touch sensitive surface. In his implementation, an acrylic pane lined with infrared LEDs acts as an optical waveguide whose total internal reflection is frustrated by the mechanical pressure of finger presses [6]. A camera behind the acrylic pane views the fingertip presses as they illuminate with infrared light, and connected components analysis provides the positions of multiple finger presses. The system does not provide information about where fingertips hover or which fingertips belong to which hand [6].

Touch-like gestures may be used to approximate actual surface contact in situations where fingertip proximity to a surface cannot be measured directly. The steerable projector and camera of IBM's Everywhere Displays sense touch-like gestures performed upon flat surfaces in a room [7]. The SmartCanvas [8] project classifies completed strokes seen from a monocular camera as either touch or transitional strokes. The pauses required between strokes in the SmartCanvas system make drawing gestures unnatural yet simplify stroke segmentation [8]. Malik [9] offers a solution requiring minimal hardware, using a stereo pair of cameras and a black paper background; it performs hand contour analysis for fingertip detection and orientation measurement, and measures fingertip altitude from stereo disparity. The use of hand shadow analysis for the extraction of touch and hover information has been explored by Wilson [10], who subjectively estimated the best touch detection threshold altitude of the PlayAnywhere system to be 3-4 mm [10].

The SmartCanvas paper suggests the use of side cameras aligned with the touch surface to detect touches, but does not explore an implementation [8]. We have chosen to explore the use of side cameras as a means of improving vision-based touch detection accuracy.

3 Overview

Our hardware consists of a flat table surface mounted in a PVC frame. Attached to the frame are support struts which house a downward-pointing overhead camera approximately two feet above the surface. A side-mounted camera supported by a tripod is stationed two feet away and is level with the surface.

Fig. 2. The touch surface and camera rig

The software consists of a pipeline divided into three layers: image processing, interpretation, and application. By dividing the system in this fashion, each module can be developed independently without affecting the other modules. This division also allows for greater generalization, as different applications can easily be substituted.

4 Image Processing Layer

The image processing layer is responsible for converting the raw images into a descriptor suitable for gesture interpretation. This layer consists of image acquisition, hand segmentation, and contour extraction.
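This layering, together with the stages of the image processing layer, can be mirrored directly in code. The skeleton below is purely illustrative of the structure (Python is assumed; the paper does not state its implementation language, and all class and function names here are hypothetical):

    # Hypothetical skeleton of the three-layer pipeline; the interfaces are
    # illustrative, not the authors' actual API.

    class ImageProcessingLayer:
        def process(self, overhead_frame, side_frame):
            """Image acquisition -> hand segmentation -> contour extraction."""
            raise NotImplementedError

    class InterpretationLayer:
        def interpret(self, contours):
            """Fingertip detection, cross-camera association, touch and gesture events."""
            raise NotImplementedError

    class ApplicationLayer:
        def handle(self, events):
            """Consume gesture events (e.g., the paint program of Section 6)."""
            raise NotImplementedError

    def run(cameras, image_layer, interp_layer, app_layer):
        while True:
            overhead_frame, side_frame = cameras.grab()      # image acquisition
            contours = image_layer.process(overhead_frame, side_frame)
            events = interp_layer.interpret(contours)
            app_layer.handle(events)                         # applications are swappable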

4.1 Hand Segmentation

After image acquisition, the hand is segmented in both cameras. For the overhead camera, segmentation is required for fingertip detection. For the side-mounted camera, segmentation is required for contact detection.

Initially, we assumed that having a uniform, matte background would allow us to implement an extremely simple background subtraction and thresholding algorithm. Further examination showed this to be ineffective for several reasons. First, as the cameras we use are cheap and have poor noise characteristics, the black background regions are exceptionally noisy. Thus, black regions deviate strongly from a learned background, resulting in misclassification. Second, having the hand in close proximity to the table surface produces highly conspicuous shadows, which are also misclassified by the aforementioned algorithm. These drawbacks led us to a more sophisticated method given by Morris [11]. Rather than acquiring multiple background images and noting only the per-pixel mean, we also note the per-pixel standard deviation. Upon receiving an input image, we determine how many standard deviations each pixel is from its corresponding mean.

Fig. 3. Mean and standard deviation images

Morris [11] observes that the color of a shadow is always the same as that of the surface it is projected onto. Furthermore, the luminosity of a shadow is always lower than that of the surface. Therefore, we can effectively remove the hand's shadow by applying a luminosity threshold after the segmentation phase. In general, shadow removal is an open problem. With our simplifying assumption of a uniform background, this simple heuristic works surprisingly well.

One difficulty remains when the hand color closely resembles the background color. This occurs when parts of the hand are in shadow, and results in those pixels being falsely classified as background. We solve this problem by supplying additional lighting in the form of lamps attached to the support structure.

4.2 Contour Extraction

The final step in the image processing layer is to find the contour of the hand silhouette. After finding the contours of all foreground blobs, all but the longest contour are eliminated. The eliminated contours correspond to regions that the segmentation algorithm misclassified. Thus the segmentation phase can misclassify portions of the image without adversely affecting the end result.
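As an illustration, the statistical background model, the luminosity-based shadow removal, and the longest-contour selection might be sketched as follows, assuming OpenCV 4.x Python bindings; the deviation threshold k and the luminosity threshold are illustrative values, not those used by the authors:

    import cv2
    import numpy as np

    def learn_background(frames):
        """Per-pixel mean and standard deviation from a stack of background images."""
        stack = np.stack([f.astype(np.float32) for f in frames])
        return stack.mean(axis=0), stack.std(axis=0) + 1e-6

    def segment_hand(frame, bg_mean, bg_std, k=4.0, shadow_lum=60):
        """Foreground = pixels more than k standard deviations from the background
        mean; dark pixels are then dropped, since shadows match the background
        color but have lower luminosity."""
        dev = np.abs(frame.astype(np.float32) - bg_mean) / bg_std
        fg = dev.max(axis=2) > k
        lum = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        fg &= lum > shadow_lum              # luminosity threshold removes shadows
        return fg.astype(np.uint8) * 255

    def longest_contour(mask):
        """Keep only the longest contour; shorter blobs are segmentation noise."""
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        if not contours:
            return None
        return max(contours, key=lambda c: cv2.arcLength(c, True))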

5 Interpretation Layer

The interpretation layer is responsible for converting the pixel information from the previous layer into semantic knowledge to be passed up to the application layer. This layer consists of modules that detect fingertip positions and share information between the two cameras. It also handles events including contact with the table surface, gestures, and occlusions.

5.1 Fingertip Detection

Initially, every point on the hand's contour is a potential fingertip. The objective of the fingertip detection module is to eliminate the irrelevant points. The first step is to identify the palm center as the hand silhouette pixel whose distance to the nearest background pixel is largest. Contour pixels whose distance to the palm center is not a local maximum are discarded as fingertip candidates. The remaining points include the fingertips, but also the knuckles of the thumb, the outside edge of the palm, and any peaks that are the result of noise. To eliminate these false positives, we introduce the following heuristic: a neighborhood of 60 contour points, centered about the point in question, is analyzed to determine its local centroid. The distance between the point in question and the local centroid is thresholded; true fingertips have a characteristically high distance.

Fig. 4. Fingertip detection

Finally, we account for a few special cases. Points close to the image border are eliminated, as are multiple fingertip points clustered together. The performance of this module is excellent; the performance of the pipeline to this point seems to be limited by the capabilities of the segmentation phase, as described earlier.
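A sketch of the palm-center and fingertip heuristics described above, again assuming OpenCV and NumPy; the 60-point neighborhood comes from the text, while the centroid-distance threshold is an illustrative value:

    import cv2
    import numpy as np

    def palm_center(mask):
        """Palm center = silhouette pixel farthest from the nearest background pixel."""
        dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
        _, _, _, max_loc = cv2.minMaxLoc(dist)
        return np.array(max_loc, dtype=np.float32)

    def fingertip_candidates(contour, center, nbhd=60, min_offset=12.0):
        """Keep contour points that (a) are local maxima of distance to the palm
        center and (b) stand far from the centroid of their 60-point neighborhood."""
        pts = contour.reshape(-1, 2).astype(np.float32)
        d = np.linalg.norm(pts - center, axis=1)
        n = len(pts)
        tips = []
        for i in range(n):
            if d[i] < d[(i - 1) % n] or d[i] < d[(i + 1) % n]:
                continue                                  # not a local maximum
            idx = [(i + j) % n for j in range(-nbhd // 2, nbhd // 2 + 1)]
            local_centroid = pts[idx].mean(axis=0)
            if np.linalg.norm(pts[i] - local_centroid) > min_offset:
                tips.append(pts[i])                       # characteristically far from centroid
        return tips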

5.2 Coordinate Transform

Once the fingertip positions are acquired in overhead camera coordinates, they must be converted into side-mounted camera coordinates for contact detection. In this section we describe two methods of performing this transformation.

The first method is a generalization of the common bilinear interpolation algorithm, extended to non-orthogonal data points. It is assumed that there are four marker points viewable in the overhead camera, denoted Q11, Q12, Q21, and Q22, with known x, y values. For each marker point, the corresponding x value in the side-mounted camera is also known. These values are denoted by F(Q11), etc.

Fig. 5. Modified bilinear interpolation

First, the interpolated value at R1, the point on segment Q11-Q21 with the same x coordinate as P, is computed:

    y_R1 = y_Q11 + (y_Q21 - y_Q11)(x_P - x_Q11) / (x_Q21 - x_Q11)

    L = sqrt[(y_R1 - y_Q11)² + (x_P - x_Q11)²] / sqrt[(y_Q21 - y_Q11)² + (x_Q21 - x_Q11)²]

    F(R1) = F(Q11) + L [F(Q21) - F(Q11)]    (1)

After computing F(R2) in a similar manner, interpolate along the y axis:

    F(P) = [(y_R2 - y_P) F(R1) + (y_P - y_R1) F(R2)] / (y_R2 - y_R1)    (2)

This method works well, but can only interpolate for values of P that lie within the convex hull of the data points.
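Equations (1) and (2) translate almost directly into code; the following sketch assumes NumPy, with q11..q22 the marker positions in the overhead view and f11..f22 their known side-camera x values:

    import numpy as np

    def interp_edge(q_a, q_b, f_a, f_b, x_p):
        """Interpolate F along the edge q_a -> q_b at the x coordinate of P (eq. 1)."""
        y_r = q_a[1] + (q_b[1] - q_a[1]) * (x_p - q_a[0]) / (q_b[0] - q_a[0])
        r = np.array([x_p, y_r])
        L = np.linalg.norm(r - q_a) / np.linalg.norm(np.asarray(q_b) - np.asarray(q_a))
        return f_a + L * (f_b - f_a), y_r

    def modified_bilinear(p, q11, q12, q21, q22, f11, f12, f21, f22):
        """Generalized bilinear interpolation over non-orthogonal markers (eq. 2).
        Returns the side-camera x coordinate F(P) for an overhead point P."""
        f_r1, y_r1 = interp_edge(np.asarray(q11), np.asarray(q21), f11, f21, p[0])
        f_r2, y_r2 = interp_edge(np.asarray(q12), np.asarray(q22), f12, f22, p[0])
        return ((y_r2 - p[1]) * f_r1 + (p[1] - y_r1) * f_r2) / (y_r2 - y_r1)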

A less restrictive alternative transforms coordinates using the fundamental matrix, at the cost of more complex calibration. To calculate the fundamental matrix, we manually choose 16 corresponding points in the overhead and side images. These points are chosen so as to provide maximal coverage in the two planes. Although only 8 point correspondences are strictly required, doubling the number of correspondences and using RANSAC estimation [12] returns a more accurate matrix. The resulting matrix F can be used with a fingertip point m to determine the epipolar line l in the side image:

    F m = l    (3)

5.3 Fingertip Association

For meaningful gesture interpretation, the position of each individual finger must be tracked through multiple frames. This involves assigning a label to each finger and maintaining the consistency of that label for as long as the finger is visible in both cameras. Note that it is not necessary to assign a label based on the type of the finger (e.g. the index finger is assigned label 1, etc.). For each input image, we assign labels by comparing the current frame to the previous frame: for each fingertip in the current frame, we look for the nearest-neighbor fingertip in the previous frame. If the hand is in motion, it is possible for fingertips to be mislabeled, but experimentation has shown this to be a rare occurrence, since the time between frames is generally short. If the number of fingertips differs between two successive frames, new labels are registered or old labels are deleted as necessary.

5.4 Contact Detection

By calculating the height of each finger above the touch surface, i.e., its distance along the z-axis, we can define a contact event as occurring when the fingertip is at z = 0. A priori, we manually register the camera parameters (position, orientation, and principal point) and the endpoints of the table edge in the side camera image. During runtime, we search the camera images to establish a correspondence, then project the corresponding points to localize the fingertip in three dimensions.

Our method of contact detection requires that each side camera be placed such that its principal axis lies within the touch surface plane, to optimally view the intersection of a fingertip with the touch surface. This camera configuration also simplifies our localization calculations. After fingertips have been detected in the overhead view, corresponding fingertip locations are searched for in the side camera binary image. Using the fundamental matrix between the two views, an epipolar line is defined in the side camera image for each fingertip detected in the overhead view. The corresponding fingertip location is searched for along this epipolar line, beginning at the intersection of the epipolar line and the table edge line. Traveling up the epipolar line from this intersection point, a fingertip is detected as the first ON pixel encountered.
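The correspondence search just described might be sketched as follows, assuming OpenCV 4.x Python bindings; the fundamental matrix is estimated once from the hand-picked correspondences, and the table edge is approximated here by a single image row, which is a simplification of the edge line used in the paper:

    import cv2
    import numpy as np

    def estimate_F(pts_overhead, pts_side):
        """Fundamental matrix from manually chosen correspondences, refined with RANSAC [12]."""
        F, _ = cv2.findFundamentalMat(np.float32(pts_overhead), np.float32(pts_side),
                                      cv2.FM_RANSAC, 3.0, 0.99)
        return F

    def side_view_fingertip(F, tip_overhead, side_mask, edge_y):
        """Walk up the epipolar line of an overhead fingertip, starting at the table
        edge (approximated as image row edge_y), and return the first foreground
        ('ON') pixel, i.e. the fingertip as seen by the side camera."""
        a, b, c = (F @ np.array([tip_overhead[0], tip_overhead[1], 1.0])).ravel()
        if abs(a) < 1e-9:
            return None
        for y in range(int(edge_y), -1, -1):          # travel upward in the image
            x = int(round(-(b * y + c) / a))          # point on the epipolar line at row y
            if 0 <= x < side_mask.shape[1] and side_mask[y, x] > 0:
                return x, y
        return None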

The angular offset of any pixel from the principal axis of a camera may be calculated as a function of the principal point, pixel offset, camera field of view, and image dimensions:

    θ_offsetH = ((x - x_principal) / W) θ_HFOV    (4)

    θ_offsetV = ((y - y_principal) / H) θ_VFOV    (5)

where x and y are the coordinates in question, x_principal and y_principal are the coordinates of the principal point, θ_HFOV and θ_VFOV are the camera's horizontal and vertical fields of view, and W and H are the image width and height in pixels, respectively.

Fig. 6. (a) Detected fingertips with associated epipolar lines (b) Detected fingertips labeled with height values

With the fingertip localized in both cameras, the angular offsets of each fingertip are determined for each camera view. The vertical angular offset θ of the fingertip in the side camera view defines a plane T in which the fingertip exists, whose relative normal vector is

    n = (0, d tan θ, d)    (6)

where d is an arbitrary constant. The horizontal and vertical angular offsets θ and φ from the overhead camera O at (O_x, O_y, O_z) project a point P on the touch surface at the coordinates

    P = (O_x + h tan θ, O_y + h tan φ, 0)    (7)

where h = O_z is the height of the overhead camera above the surface. The fingertip lies along line segment OP at the point Q where the segment intersects plane T:

    Q = O + [n · (S - O) / n · (P - O)] (P - O)    (8)

where S is the position of the side camera.
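Equations (4)-(8) combine into a small triangulation routine; the sketch below assumes NumPy and follows the coordinate conventions of the equations above (the sign conventions of the angular offsets may differ from the actual implementation):

    import numpy as np

    def angular_offsets(x, y, principal, fov_h, fov_v, width, height):
        """Horizontal and vertical angular offsets of a pixel (eqs. 4 and 5)."""
        theta_h = (x - principal[0]) / width * fov_h
        theta_v = (y - principal[1]) / height * fov_v
        return theta_h, theta_v

    def fingertip_position(O, S, theta, phi, theta_side, d=1.0):
        """Intersect the overhead-camera ray with the plane T defined by the side
        camera's vertical offset theta_side (eqs. 6-8). Returns the 3D point Q."""
        O, S = np.asarray(O, float), np.asarray(S, float)
        n = np.array([0.0, d * np.tan(theta_side), d])    # normal of plane T (eq. 6)
        h = O[2]                                          # overhead camera height
        P = np.array([O[0] + h * np.tan(theta),           # ray hits the table plane (eq. 7)
                      O[1] + h * np.tan(phi), 0.0])
        t = np.dot(n, S - O) / np.dot(n, P - O)           # line-plane intersection parameter
        return O + t * (P - O)                            # fingertip position Q (eq. 8)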

Fig. 7. Calculation of the fingertip height from the touch surface

The error of the measured z-value corresponding to one pixel in the side camera image can be calculated with a right triangle, knowing the camera's vertical field of view θ_VFOV, the image height H in pixels, and the fingertip distance d from the side camera:

    Δz = d tan(θ_VFOV / 2) / (H / 2)    (9)

5.5 Occlusion Detection

Occlusion of detected fingertip touches can be expected to occur in any side camera. Touches that are occluded by touches occurring nearer to the side camera share the same virtual button position as the occluding touch and should not be evaluated for touch detection. Our coarse approximation of the fingertips as points of contact on a planar surface is convenient for fingertip detection, but it ignores the unique volume that each finger occupies in space. With such an assumption we may report a detected touch as occluded when another touch is detected along the line between it and the side camera.

The side camera touch positions conveniently correspond to the horizontal angular offsets of the detected fingertips. When fingertip coordinates from the overhead camera are transformed to angular offsets in the side camera, the fingertips are evaluated for occlusions: of all touches that share the same angular offset, every touch except the one nearest to the side camera is reported as occluded. In reality the volume occupied by each finger gives its profile some width in a side camera view, so it occludes touches that are not perfectly collinear with itself and the camera position.
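A minimal sketch of this occlusion test, grouping touches by their side-camera angular offset; the angular threshold and the per-touch fields are illustrative:

    def flag_occlusions(touches, angle_threshold=0.02):
        """touches: list of dicts with 'angle' (horizontal offset in the side camera,
        radians) and 'distance' (distance from the side camera). Marks every touch
        occluded except the nearest one in each angular group."""
        for t in touches:
            t["occluded"] = False
        ordered = sorted(touches, key=lambda t: t["angle"])
        group = []
        for t in ordered + [None]:                      # sentinel flushes the last group
            if group and (t is None or t["angle"] - group[-1]["angle"] > angle_threshold):
                nearest = min(group, key=lambda g: g["distance"])
                for g in group:
                    g["occluded"] = g is not nearest    # only the nearest touch stays visible
                group = []
            if t is not None:
                group.append(t)
        return touches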

Touches in the side camera that are within a threshold angular offset of another touch are evaluated for occlusions.

Fig. 8. Occluding touches are approximately collinear with the position of the observing side camera

5.6 Gesture Recognition

Touch detection and tracking provide the current and previous states of the touch positions, which may be interpreted as gestures. Our gestures are detected simply as a change of state from the previous frame to the current frame; this incremental output is well suited to applications such as map viewers that are navigated via incremental motions. If only a single touch is detected and the touch has translated more than a threshold distance, a pan command is issued with the detected motion vector as its parameter. If only two touches are detected, the distance between them is measured, and their angular orientation is measured by drawing a line between the touch positions and taking its angle against the positive x-axis of the overhead camera view. If the angular orientation of the two touches changes beyond a threshold angle, a rotate command is issued with the angle change as its parameter. If the distance between the two touches increases or decreases beyond a threshold, a zoom-in or zoom-out is performed, respectively.
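The frame-to-frame gesture logic lends itself to a compact sketch; the thresholds and the event representation below are illustrative:

    import math

    def recognize_gesture(prev, curr, pan_thresh=5.0, rot_thresh=0.1, zoom_thresh=8.0):
        """Compare the previous and current frame's touch positions (lists of (x, y))
        and emit an incremental pan / rotate / zoom event, or None."""
        if len(prev) == 1 and len(curr) == 1:
            dx, dy = curr[0][0] - prev[0][0], curr[0][1] - prev[0][1]
            if math.hypot(dx, dy) > pan_thresh:
                return ("pan", (dx, dy))
        elif len(prev) == 2 and len(curr) == 2:
            def span(t):       # distance and orientation of the two-touch line
                dx, dy = t[1][0] - t[0][0], t[1][1] - t[0][1]
                return math.hypot(dx, dy), math.atan2(dy, dx)
            d0, a0 = span(prev)
            d1, a1 = span(curr)
            if abs(a1 - a0) > rot_thresh:               # angle wrap-around ignored for brevity
                return ("rotate", a1 - a0)
            if abs(d1 - d0) > zoom_thresh:
                return ("zoom_in" if d1 > d0 else "zoom_out", d1 - d0)
        return None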

Fig. 9. Single and double-touch gestures that are suitable for a map viewing application

6 Application Layer

A simple paint program has been developed that takes advantage of the multi-touch output of our interface. It paints a different color on a black background for each detected fingertip press, allowing a user to draw freehand and leave rainbow-colored stripes on the virtual canvas by dragging an open hand across the multi-touch surface. An unfilled outline of the user's hand is displayed on the canvas to aid the user in placing strokes at the desired locations on the screen.

Fig. 10. A multi-touch paint program screenshot

7 Analysis and Future Work

The slow frame rate, qualitatively under 10 Hz, introduces a latency between the update of virtual button positions in the side camera image and the true position of the finger, forcing the user to perform gestures slowly. These issues may be resolved by improvements to the multiple-camera image acquisition rate in OpenCV. We encounter occasional false positives in gesture recognition among many true positive identifications, which may be resolved through spatio-temporal smoothing. The addition of more side cameras is expected to markedly reduce the number of reported occlusions. A larger surface that could comfortably fit two hands and incorporate more side cameras would provide a larger workspace and would have the potential to determine which hand owns each touch.

Future work aims to improve the robustness of each component. The current method of segmentation relies on a static background; improved implementations will dispense with this requirement, possibly by using adaptive background subtraction. More sophisticated fingertip detection methods could also be of great value in eliminating occasional false positives.

8 Conclusion

We have demonstrated a multi-touch surface built from commodity hardware that explores the use of a side camera for touch detection. Gestures suitable for map viewing are recognized, and a paint application that takes advantage of multiple touches has been developed.

References

1. Buxton, B., "Multi-Touch Systems That I Have Known And Loved," http://billbuxton.com/multitouchoverview.html, accessed March 1, 2007.
2. Synaptics Technologies, "Capacitive Position Sensing," http://www.synaptics.com/technology/cps.cfm, visited March 1, 2007.
3. Mitsubishi DiamondTouch, http://www.merl.com/projects/diamondtouch/, visited March 1, 2007.
4. Levin, G., "Computer Vision for Artists and Designers: Pedagogic Tools and Techniques for Novice Programmers," AI & Society, Vol. 20, Issue 4 (August 2006), pp. 462-482.
5. Haller, M., Leithinger, D., Leitner, J., Seifried, T., "Coeno-Storyboard: An Augmented Surface for Storyboard Presentations," Mensch & Computer 2005, September 4-7, 2005, Linz, Austria.
6. Han, J., "Low-Cost Multi-Touch Sensing through Frustrated Total Internal Reflection," ACM Symposium on User Interface Software and Technology (UIST), 2005.
7. Kjeldsen, R., Levas, A., Pinhanez, C., "Dynamically Reconfigurable Vision-Based User Interfaces," 3rd International Conference on Vision Systems (ICVS '03), Graz, Austria, April 2003.
8. Mo, Z., Lewis, J.P., Neumann, U., "SmartCanvas: A Gesture-Driven Intelligent Drawing Desk System," Proceedings of Intelligent User Interfaces (IUI '05), January 9-12, 2005.
9. Malik, S., Laszlo, J., "Visual Touchpad: A Two-handed Gestural Input Device," International Conference on Multimodal Interfaces (ICMI '04), October 13-15, 2004.
10. Wilson, A., "PlayAnywhere: A Compact Interactive Tabletop Projection-Vision System," ACM Symposium on User Interface Software and Technology (UIST), 2005.
11. Morris, T., Eshehry, H., "Real-Time Fingertip Detection for Hand Gesture Recognition," Advanced Concepts for Intelligent Vision Systems (ACIVS '02), September 9-11, 2002, Ghent University, Belgium.
12. Torr, P.H.S., Murray, D.W., "The Development and Comparison of Robust Methods for Estimating the Fundamental Matrix," Intl. J. of Computer Vision, 24(3), pp. 271-300, 1997.