PROJECT REPORT. CSE 527 : Introduction to Computer Vision. Nafees Ahmed :


PROJECT REPORT CSE 527 : Introduction to Computer Vision Nafees Ahmed : 107403294

Abstract

Skeleton reconstruction is an integral part of gesture-driven computer interfaces, where the input comes from human body movement. Gesture recognition and classification require the identification of features, and in most cases the first step is to reconstruct the human skeleton from the given input. Depending on the input setup, the problem can take many different forms. In this project, we concentrate on self-occlusion of the human body when motion is captured from a single perspective. As a solution, we propose an application-specific, data-model-driven approach for reconstructing occluded skeleton joints, and we assess its performance against existing approaches.

Computer interfaces driven by human gestures rely on features derived from skeletal structures reconstructed from captures of the human body. Depending on the type of capture, the reconstruction problem is posed in many different ways: reconstruction can be done from a single image, from multiple images over continuous time, or from different perspectives using multiple cameras. How faithfully we can extract and recreate the human skeletal form depends on how much information is available about the human structure and on what we want to obtain from that input. We can choose to reconstruct a 2D skeleton from one or more images, or a 3D skeleton from one or more perspectives. Figure 1 shows an example of 3D reconstruction from a single image.

Figure 1 : 3D reconstruction from images

When we reconstruct a 3D skeleton from source images or videos, one of the major obstacles is self-occlusion of the human body. If we rely only on the images or videos, the only information available for reconstruction is what the camera sees from its predefined position. Because the camera position is fixed, not all parts of the body are always clearly visible. In most cases, articulated movements produce images in which a hand, a leg, or another part of the body is occluded and ambiguously placed. The problem is less prominent in a multi-perspective setup, where pictures are taken from many different locations and the system can see positions that would otherwise be occluded. A depth image also provides more information towards a true reconstruction.

Recently, Microsoft introduced Kinect, a commercially available and comparatively low-priced solution for monocular depth perception. Kinect projects IR patterns and uses IR sensors to read their reflection, producing a depth image of low spatial resolution for the objects in front of the sensor. The introduction of this cheap depth sensor has created many opportunities, both in gaming and in the development of useful computer interaction tools driven by human motion. Figure 2 shows a Kinect device and a simple setup.

Figure 2 : (Top Left) Kinect Device (Top Right) Simple Kinect Setup (Bottom) Skeleton Tracking using Kinect

Since Kinect is a single-perspective device, skeleton reconstruction from its depth image faces the standard problem of occlusion. Given only a single frame and no other information, any skeleton reconstruction algorithm will try its best to fit a skeleton into the frame, with some joints identified fully and the rest identified with some range of error. To understand the kind of problem that arises when the reconstruction algorithm uses only the image data, we consider a specific application: tracking skeletons in the game of cricket, where Kinect is used to reconstruct the skeletons of batsmen and bowlers. Cricket is considered here because the postures of batsmen and bowlers during gameplay cause partial self-occlusions, which make reconstruction from a single-perspective device really hard. Figure 3 shows the setup of a real cricket game. Figures 4 and 5 show examples of the possible motions generated by players in real life.

Figure 3 : Snapshot from a Cricket game

Figure 4 : One of the many possible bowling motions, looking from a side.

Figure 5 : Several examples of batting motion. Identification of a shot requires identification of both hands, legs and wrists, and shots can be played all around 360 degrees.

Now, if we want to track the skeleton of a human using Kinect by placing it in front of a batsman, in most cases the reconstruction will be incomplete because of occlusion, and hence getting the right position and movement of the bat will be tough. As an example, we consider the specific batting position shown in Figure 6.

Figure 6 : Front-foot defense (legend: Occluded Limbs, Visible Limbs)

Figure 7 : Skeleton Reconstruction by OpenNI using Kinect

Figure 7 shows the output of the standard skeleton reconstruction for this shot, based only on the image captured by Kinect. The lines in gray identify the joints with a reconstruction confidence level of less than 1.0. To address this problem, we consider the fact that we are not utilizing all the information we have in hand. The reconstruction algorithm uses only the input depth image from the sensor to produce the human skeleton joint positions. But in a scenario like this one, if we know beforehand that the reconstruction is purely for the purpose of tracking the motion of a cricket batsman with a specified range of shots, then the unoccluded joints should provide a very good cue about where the occluded joints should be. To derive such probabilistic values for the occluded joints, we need a model that is built from batting motion captures for many different shots, from which we can interpolate the unknown positions. In this project we explore such possibilities and show an example reconstruction of a batting motion. The framework is summarized in the following diagram (Figure 8). The system works in two phases.

Model Construction Phase: In this phase, we allow the batsman to play a large number of shots in a controlled environment. We capture the motions and keep the poses that have the full confidence level. Using these known values (relative joint positions, orientations, velocities, etc.) as features, we construct a model.

User Tracking Phase: In this phase, we use Kinect in the standard setup. At each capture, we first use OpenNI to reconstruct the skeleton. From that skeleton, we use the confidence level to identify which joints came from an occluded field of view. Then, using the unoccluded joint positions as input, we interpolate the most probable values of the occluded joint positions. Merging these two sets of values, we produce the final skeleton.
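The tracking phase can be sketched in a few lines of Python. This is a minimal illustration and not the project's implementation: the joint names, the `(position, confidence)` tuple layout and the `estimate` callback are assumptions made for the example, and no OpenNI call is shown. The only detail taken from the report is that a joint with a confidence level below 1.0 is treated as occluded.

```python
from typing import Callable, Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Joint = Tuple[Vec3, float]  # (position, confidence); layout assumed for this sketch


def split_by_confidence(skeleton: Dict[str, Joint], threshold: float = 1.0):
    """Separate fully confident (visible) joints from the occluded ones."""
    visible: Dict[str, Vec3] = {}
    occluded: List[str] = []
    for name, (position, confidence) in skeleton.items():
        if confidence >= threshold:
            visible[name] = position
        else:
            occluded.append(name)
    return visible, occluded


def reconstruct(
    skeleton: Dict[str, Joint],
    estimate: Callable[[Dict[str, Vec3], str], Vec3],
    threshold: float = 1.0,
) -> Dict[str, Vec3]:
    """Keep visible joints as captured and fill occluded ones from a model."""
    visible, occluded = split_by_confidence(skeleton, threshold)
    merged = dict(visible)
    for name in occluded:
        # `estimate` stands in for whatever model was built in phase one.
        merged[name] = estimate(visible, name)
    return merged


def zero_model(visible: Dict[str, Vec3], joint_name: str) -> Vec3:
    """Trivial estimator: place every occluded joint at the origin."""
    return (0.0, 0.0, 0.0)


# Hypothetical frame: the left hand is hidden behind the body.
frame = {
    "head": ((0.0, 1.7, 2.0), 1.0),
    "left_hand": ((0.3, 1.0, 2.1), 0.5),
}
final_skeleton = reconstruct(frame, zero_model)
```

Passing the estimator as a function keeps the split and merge steps independent of the model, so different models can be swapped in without touching the tracking loop.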

Figure 8 : Framework. Diagram labels: Filter to extract full skeletons with complete confidence value; Compute Model Parameters; MODEL; Separate occluded joints from visible joints; Approximate occluded joints using model; Merge visible joints with approximated joints.

The performance of the system relies a great deal on the kind of model construction adopted. Many different approaches can be taken and many features can be considered. In this project, we test a very simple linear system and compare it with some trivial models. The models considered are listed below.

Model 0 : The Zero Model. Whenever a joint is occluded, replace its values with (0,0,0).

Model 1 : Last Known Position. Whenever the system captures a joint with full confidence, store it in the model. Whenever the system fails to provide a confident joint position, replace it with the value from the model. Hence, the occluded limb is kept at its last seen location.

Model 2 : Last Known Orientation. Almost the same as the previous model, but here the system stores only the last valid orientation and drives the occluded joint with that value.

Model 3 : Linear Interpolation. We build this model on the following principle: given the orientations and relative positions of the unoccluded joints, and given a proper model, it is possible to interpolate the most probable orientation values for the occluded joints. During the model construction phase, we capture joint positions for different batsman postures, compute the orientation of each joint relative to its parent body part in the scene graph of the human body structure, and store these values. During the tracking phase, we resolve the occluded joints by first finding the orientation values of the unoccluded joints. We then search for the two nearest points in the high-dimensional space of the model and linearly interpolate between those points to obtain the value of the occluded joint.
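The lookup step of Model 3 could look roughly like the sketch below. The report does not say how the interpolation weight between the two nearest points is chosen, so the projection of the query onto the segment joining them is an assumption made here, as are the Euclidean distance, the sample layout and the function names. Values are treated as plain numbers; real joint orientations may need an angle-aware interpolation.

```python
import math
from typing import List, Sequence, Tuple

# One stored pose: (features of the unoccluded joints, values of the occluded joint).
# This layout is an assumption made for the sketch.
Sample = Tuple[Sequence[float], Sequence[float]]


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interpolate_occluded(model: List[Sample], query: Sequence[float]) -> List[float]:
    """Estimate occluded-joint values from the two stored poses nearest to `query`."""
    if len(model) < 2:
        raise ValueError("the model needs at least two stored poses")

    # The two stored poses whose visible-joint features are closest to the query.
    nearest = sorted(model, key=lambda sample: _distance(sample[0], query))[:2]
    (feat_a, val_a), (feat_b, val_b) = nearest

    # Assumed weighting: project the query onto the segment feat_a -> feat_b
    # and clamp the parameter so the result stays between the two poses.
    ab = [b - a for a, b in zip(feat_a, feat_b)]
    aq = [q - a for a, q in zip(feat_a, query)]
    denom = sum(x * x for x in ab)
    t = 0.0 if denom == 0.0 else sum(x * y for x, y in zip(ab, aq)) / denom
    t = max(0.0, min(1.0, t))

    # Linear interpolation of the occluded joint's stored values.
    return [a + t * (b - a) for a, b in zip(val_a, val_b)]


# Toy model with 2-D features and a single occluded value per pose.
toy_model = [
    ([0.0, 0.0], [10.0]),
    ([2.0, 0.0], [20.0]),
    ([10.0, 10.0], [99.0]),
]
estimate = interpolate_occluded(toy_model, [0.5, 0.0])
```

In the toy model the query lies a quarter of the way from the first stored pose to the second, so the estimate is a quarter of the way between their stored values. A linear scan over the stored poses is enough for a small model; a larger one would benefit from a spatial index.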

[Figure panels: Model 0 : The Zero Model; Model 1 : Last Known Position; Model 2 : Last Known Orientation; Model 3 : Linear Interpolation]

The results show that, if it is trained with proper data and driven by a sufficiently flexible model, the reconstruction algorithm can provide a better skeletal structure for a specific application. In this project, we showed one such model that improves upon the present reconstruction method. However, there can be many different approaches to model construction and to the interpolation of occluded joints, and each may have advantages over the others in specific settings and applications. As future work, we intend to explore such models.