Rock, Paper & Scissors with Nao

Nimrod Raiman [0336696]    Silvia L. Pintea [6109960]

Abstract

In this paper we explain our work on teaching a humanoid robot (Nao) to play "Rock, Paper & Scissors". To accomplish this task we used several methods, which are described in Section 2. Section 3 presents our experimental results. Finally, we give an overall view of the paper and indicate possible future work on this subject.

1 Introduction

"Rock, Paper & Scissors" is a game with simple rules that is extremely easy for a human to play. This is what makes it interesting to teach a robot to play it against human players. To do so, the robot needs to be able to recognize the hands of its opponent and classify the gesture as "rock", "paper" or "scissors". In this paper we describe our approach to accomplishing this in real time. Our solution is fairly robust to lighting conditions, and the gestures of the player need not be restricted to certain angles or positions in the frame (given that the gesture is clear enough).

The problem has been split into three main tasks: extracting the hands from the webcam stream, recognizing the gesture of the extracted hand, and implementing motion and speech on Nao. Throughout the project we tried different approaches in order to find the best method for solving the problem. We experimented with techniques such as backprojection of a probability histogram for hand detection, and Gabor filters and PCA for feature extraction. We start by describing the methods we tried, and then continue with an overview of the results and the conclusions.

2 Methods

For hand detection we employed two different techniques: a naive approach, and backprojection of the probability histogram of the pixels corresponding to skin. For gesture recognition we experimented with different sets of data and extracted different features using methods such as PCA and Gabor filters. We also tried two different types of classifiers: SVM (support vector machine) and KNN (k nearest neighbors).

2.1 Hands extraction

2.1.1 Naive approach

Our first approach for extracting the area of the webcam stream corresponding to the hand was to define thresholds on the hue and saturation values of the frames received from the webcam. The resulting binary image would contain all pixels corresponding to skin. Noise was then reduced using erosion and dilation, and from the resulting binary image we could easily find the area corresponding to the hand. This naive approach was not reliable enough: it is very sensitive to objects in the camera view whose color is close to skin color, and it is also very sensitive to lighting changes (artificial light is more blue than sunlight). Finally, the thresholds need to be defined very broadly to make this work for the majority of skin colors, which may introduce even more false skin responses.
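
For illustration, a minimal sketch of this kind of hue/saturation thresholding, assuming OpenCV; the threshold values and kernel size are illustrative guesses, not the parameters used in the project:

    import cv2
    import numpy as np

    def naive_hand_mask(frame_bgr):
        # Convert to HSV so skin can be described by hue and saturation.
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        # Illustrative skin thresholds (H, S, V); in practice they must be
        # tuned very broadly to cover the majority of skin colors.
        lower = np.array([0, 40, 60], dtype=np.uint8)
        upper = np.array([25, 180, 255], dtype=np.uint8)
        mask = cv2.inRange(hsv, lower, upper)
        # Erosion followed by dilation removes small noise blobs.
        kernel = np.ones((5, 5), np.uint8)
        mask = cv2.erode(mask, kernel, iterations=1)
        mask = cv2.dilate(mask, kernel, iterations=2)
        return mask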

2.1.2 Backprojection of skin pixels

After realizing that our first approach was too naive and unstable, we needed a more robust skin finder. Inspired by [3] and Gijs Molenaar (an AI master student working on the project Sonic Gestures), we implemented a skin PDF (probability distribution function) approach. To build such a PDF we needed a histogram (normalizing the histogram gives a PDF) of an area that we knew for sure was representative of the skin color of the person standing in front of the camera. For this we chose the face area. Using a frontal-face Haar detector we found the face of the other player and constructed a histogram from the inside of the rectangle returned by the face detector. We removed the borders of the rectangle in order to lose as much background and hair as possible and get an even more reliable PDF of the skin.

Once we had the histogram, we could backproject it onto the image. The result was a gray-scale image showing which areas are more or less probable to correspond to skin (see Figure 1, left image). We first set the probability of the pixels in the face area to zero, because we did not want the face to interfere with finding the hand in the frame. Then we thresholded the image to discard areas with very low skin probability. Using rough erosion and dilation we removed noise and connected the areas that belonged together. This was done roughly because at this step we just wanted to find the area of the hand and process it separately in a more sophisticated manner. From the image obtained in this way we selected the biggest blob of connected pixels and chose it as the most probable area to correspond to the hand.

[Figure 1: Hand detection example]

Once we had the area that corresponded to the hand, we went back to the backprojection image. Here we selected the hand area again to apply a more sophisticated thresholding, erosion and dilation to it. After dilation, part of the hand area would end up white (probably skin) and another part black (probably not skin). The black area was left unchanged, while the white area was replaced by the original gray-scale image. The final step was to resize the hand image to 70x70 pixels to match the training images. The resulting image was one in which the hand area was selected and most of the background was removed (see Figure 1, right image).
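
A condensed sketch of such a backprojection pipeline, assuming OpenCV and its bundled frontal-face Haar cascade; the function structure, face margins and histogram bin counts are our assumptions, not the project's actual code:

    import cv2
    import numpy as np

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def skin_backprojection(frame_bgr):
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        if len(faces) == 0:
            return None
        x, y, w, h = faces[0]
        # Shrink the face rectangle to drop borders (background, hair).
        roi = hsv[y + h // 4 : y + 3 * h // 4, x + w // 4 : x + 3 * w // 4]
        # A hue-saturation histogram of the face region approximates a skin PDF.
        hist = cv2.calcHist([roi], [0, 1], None, [30, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
        prob = cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], 1)
        # Zero out the face itself so only the hand remains a candidate.
        prob[y : y + h, x : x + w] = 0
        return prob

The probability image returned here would then be thresholded, cleaned up with erosion and dilation, and scanned for the largest connected blob, as described above.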

2.2 Gesture recognition

For the gesture recognition task we started with a training set containing 70x70 px images of hands with different backgrounds. The problem proved to be too complex for our classifiers, so we decided to switch to a simpler set containing only centered hands on a black background. From this dataset we extracted the features to be used during classification.

2.2.1 PCA

The first technique we tried for feature extraction was PCA. We computed the eigenhands of our data and then projected each set, separately, onto this space. There are three steps for generating the eigenhands:

- subtracting the mean of the data;
- computing the covariance of the data and the eigenvectors of the covariance matrix;
- projecting each training set (corresponding to each sign: "rock", "paper" or "scissors") onto the space defined by the eigenvectors.

For high-dimensional data, a better approach than computing the eigenvectors of the matrix Data^T · Data (which for a dataset of size [N, D] with D ≫ N has dimension [D, D]) is to use an intermediate step: compute the eigenvectors V = eigh(Data · Data^T) and then determine the final eigenspace as U = (Data^T · V) / norm(Data^T · V).

Unfortunately, since PCA is not background invariant, translation invariant or rotation invariant, the results were not as good as we expected.
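
The intermediate-step trick can be written compactly with NumPy; the sketch below is our illustration of the formulas above (the function name and the number of retained eigenvectors are arbitrary):

    import numpy as np

    def eigenhands(data, k=20):
        """data: [N, D] matrix, one flattened hand image per row, with D >> N.
        k: number of eigenhands to keep (illustrative choice)."""
        mean = data.mean(axis=0)
        centered = data - mean
        # Eigenvectors of the small [N, N] matrix instead of the huge [D, D] one.
        small_cov = centered @ centered.T
        eigvals, v = np.linalg.eigh(small_cov)
        # eigh returns eigenvalues in ascending order; keep the k largest.
        v = v[:, ::-1][:, :k]
        # Map back to image space and normalize each eigenhand.
        u = centered.T @ v
        u /= np.linalg.norm(u, axis=0, keepdims=True)
        return mean, u

    # Projecting a training set onto the eigenspace:
    # features = (images - mean) @ u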

2.2.2 Gabor filters

The next technique we attempted for feature extraction was Gabor wavelets. A Gabor wavelet is created by defining the size of the wavelet to be generated and then looping over the coordinates of each pixel (x and y), applying the following formula:

g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′² + γ²y′²) / (2σ²)) · cos(2π x′/λ + ψ)

where x′ = x cos θ + y sin θ and y′ = −x sin θ + y cos θ. The parameter λ represents the frequency of the stripes in the wavelet, θ gives the angle of the stripes, ψ is a translation parameter, σ gives the size of the stripes, and γ indicates how elliptical they are.

[Figure 2: Gabor wavelet example]

Each image in the training set was convolved with the Gabor wavelets and then reshaped into a single row. The reshaping is necessary in order to have one row in the training matrix for each data sample (image). In our project we used 4 Gabor wavelets, with angles θ ∈ {0, π/4, π/2, 3π/4}. Each image was convolved with each of these 4 wavelets and then reshaped into a row; the 4 results were concatenated and stored as a single row per image. Besides these 4 Gabor wavelets, we also tried concatenating the original image, likewise reshaped into a single row. Unlike PCA, this method gave satisfactory results, which indicates that it is capable of determining a set of representative features for our training set.
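
A sketch of how such a feature row could be assembled, assuming NumPy and OpenCV, whose cv2.getGaborKernel implements the same formula; the kernel size and the λ, σ, ψ, γ values below are illustrative, not the project's parameters:

    import cv2
    import numpy as np

    # Four orientations, as in the paper; the remaining parameters are guesses.
    THETAS = (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)

    def gabor_features(image, include_image=True):
        feats = []
        for theta in THETAS:
            # Arguments: ksize, sigma, theta, lambda, gamma, psi.
            kernel = cv2.getGaborKernel((9, 9), 2.0, theta, 4.0, 0.5, 0)
            response = cv2.filter2D(image, cv2.CV_32F, kernel)
            feats.append(response.reshape(-1))  # reshape to a single row
        if include_image:
            # Optionally concatenate the raw image, also reshaped to a row.
            feats.append(image.reshape(-1).astype(np.float32))
        return np.concatenate(feats)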

2.2.3 Classification

To classify the gesture we tried both the SVM (support vector machine) and the KNN (k nearest neighbors) classifier. Given that the data is not perfectly aligned (hands have slightly different positions in the image and different angles), the problem seemed to be too hard for the SVM: it was not able to find a good boundary between the classes. To verify that our approach was correctly implemented, we created a more aligned training set in which the hands differed only slightly in orientation ([−20, +20] degrees). On this task the SVM classified the data perfectly (it found a good boundary between the classes) and gave an error of 0.0. Because we did not want to restrict the player to only a small number of positions and ways in which a sign could be made, we decided not to use the SVM. We also experimented with the KNN classifier. It was able to give accurate predictions even on the raw training images, and we therefore employed it for the gesture recognition task.

2.3 Motion & Speech on Nao

Nao is a humanoid robot of 58 cm height, with a 500 MHz processor and 3 fingers that move simultaneously. Because of the limited computational power we had to run our software on a laptop connected to Nao. All decisions were made by the software running on the laptop, and the resulting interaction was sent to and executed by Nao. Because the fingers can only be moved simultaneously, we had to represent "paper" as the hand open, "rock" as the hand closed, and "scissors" by closing and opening the hand a few times.

To make Nao move and talk we generated behavior files using the "Choregraphe" simulator. These files were then transferred to Nao's memory; we could then connect to Nao remotely via its IP address and execute whichever predefined behavior was needed at that specific moment in the game. The hand-detection and gesture-recognition parts needed to run in parallel with Nao's motion and speech behaviors, so that Nao could vocalize the outcome of the game by comparing its choice of move with the player's. To accomplish this we used multiple threads.

[Figure 3: Nao (Aldebaran)]

3 Results

We trained our models using images of hands in different positions and orientations. We used:

- 1774 training examples for the "rock" sign;
- 1747 training examples for the "paper" sign;
- 1445 training examples for the "scissors" sign.

Besides images of 70x70 pixels, we also experimented with images scaled down to 20x20 pixels. This proved useful because the slight differences in the positions of the hands (the hands were not always centered) were considerably reduced, so the need for translation-invariant features became less important. The smaller image size also helped reduce the time required for the sign recognition task. The results were obtained using 50-fold cross validation. Table 1 shows the average errors obtained while testing our methods. We can see that PCA tends to perform worse than the other methods, that the Gabor wavelets have a high prediction accuracy, and that the best results are obtained when using raw grayscale images of size 20x20 px.

    Size     Method                   Average error
    70x70    PCA                      0.475
    20x20    PCA                      0.470
    20x20    Gabor                    0.046
    20x20    PCA on Gabor             0.510
    20x20    Gabor & image            0.039
    20x20    PCA on (Gabor & image)   0.447
    70x70    Grayscale                0.030
    20x20    Grayscale                0.020

Table 1: Average errors for different methods

4 Conclusions

In this paper we presented the methods we employed to make a humanoid robot (Nao) play "Rock, Paper & Scissors" against a human opponent. Our experiments indicate that the SVM classifier is not well suited for this problem: it is capable of finding a good boundary between the classes only when the orientation and position of the hands do not vary too much. On the other hand, a simpler classifier such as KNN is capable of solving the problem and returning satisfactory results. We have also seen that PCA was not able to give features strong enough to allow good classification performance. This might be because PCA is not background invariant, rotation invariant or translation invariant.

Since the hand-detection component of our project did not separate the hand from the background as cleanly as in the training images, we had to augment our training set with a number of examples produced by the hand-detection component, in order to ensure the stability of our system and a more precise and robust prediction.

At present our software runs at a rate of 4 frames per second. With proper optimization this can be improved to make Nao react even faster and more naturally. Finally, we believe that other methods might also give good results and should therefore be explored; a good candidate would be SIFT features.

References

[1] Yen-Ting Chen, Kuo-Tsung Tsen, Developing a Multiple-angle Hand Gesture Recognition System for Human Machine Interaction; 2007.

[2] M. R. Turner, Texture Discrimination by Gabor Functions; 1986, Biol. Cybern. 55, pp. 71-82.

[3] Dorin Comaniciu, Visvanathan Ramesh, Peter Meer, Kernel-Based Object Tracking; 2003, IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, pp. 564-575.