Vision Mouse

Saurabh Sarkar a*
a University of Cincinnati, Cincinnati, USA

ABSTRACT

The report discusses a vision-based approach to tracking eyes and fingers. It describes the process of locating the possible regions of interest in the live feed from the camera. This is done by evaluating a probability equation and filtering the result at a certain threshold probability. A clustering approach is then used to locate the eyes or the fingers. The report further discusses how a neural network can be utilized to control a robot hand, along with other potential applications.

Keywords: Machine Vision, Clustering, K-Means, Artificial Neural Network.

1. INTRODUCTION

The computer mouse or joystick has become part of every computer we buy today. Despite their popularity, these devices have certain constraints. They are restricted to a two-dimensional space. The user also needs to develop hand-eye coordination to control them effectively. If we could control the objects in the computer with our hands the way we control objects in the real world, it would give us a whole new level of computer operating experience. This report describes an attempt to capture the motion of human fingers and eyes to generate tracking data of their motion. In the past few years people have tried to solve this problem through various approaches [1, 2].

The system discussed here takes in a live feed from a webcam, and code written in Matlab then tries to find out where the person is looking. In another extension of the code, the system tries to trace the motion of the human hand. In future, the motion of three fingers could be used to control the motion of a robot gripper with three fingers. Three points define a plane in three-dimensional space, so the relative locations of three fingers can likewise define a plane. This would enable us to work in three-dimensional environments instead of two.

Figure 1. The science fiction movie Minority Report depicts a futuristic computer interface. Here actor Tom Cruise is using his hands to manipulate the images on the screen.
* Further author information: (Send correspondence to Saurabh Sarkar.) sarkarsh@email.uc.edu

2. SYSTEM DETAIL

This section describes the detailed architecture of the system. It covers the image processing, which involves filtering out regions of interest, clustering the regions into different groups, and tracing their movement.

2.1. Hardware Architecture

The system utilizes the Image Acquisition Toolbox (IAT) of Matlab. The IAT connects to a Windows Media Device (WMD), in this case a webcam, to acquire real-time images. The system on which the code was tested is a Celeron laptop with 512 MB of RAM. The webcam used to acquire images is a low-end off-the-shelf product with a resolution of 320 × 240 pixels.

2.2. Probabilistic Estimation of Regions of Interest

Each frame acquired from the camera is transformed into a two-dimensional array containing, for every location, the probability of that location being part of the object we are tracking. This is done using a multivariate probability equation:

\[
P(x, y) = \prod_{i=1}^{3} \exp\left( -\frac{\left( I_i(x, y) - \mu_i \right)^2}{2\sigma_i^2} \right) \tag{1}
\]

P(x, y) : Probability matrix
I_i(x, y) : Color value at (x, y) in the i-th color domain
\mu_i : Mean of the color in the i-th color domain
\sigma_i : Standard deviation in the i-th color domain

Figure 2. The image on the left is a sample frame from the webcam. The image in the middle is the probability plot of the location of the eye. The one on the right is the filtered image with the selected data points.

The two-dimensional array containing probability values is then filtered against a certain threshold value to minimize the noise and to find the data points healthy enough to be passed on to the next section of the algorithm.

2.3. Clustering

The selected data points from the previous section of the algorithm are then clustered using the K-means clustering algorithm. K-means is a fast algorithm that clusters objects, based on their attributes, into k partitions. It assumes that the object attributes form a vector space. It tries to minimize the total intra-cluster variance:

\[
V = \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - \mu_i \right\|^2 \tag{2}
\]

where there are k clusters S_i, i = 1, 2, ..., k, and \mu_i is the centroid or mean point of all the points x_j in S_i. Depending on the number of objects we are tracking, we change the K-means clustering parameter.
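Returning to Section 2.2, the probability map of Eq. (1) and the threshold filter can be sketched in base Matlab as follows. This is only an illustration under stated assumptions: the target color, the standard deviation, and the threshold mirror the constants that appear in the Appendix A listing, while the 4 × 4 synthetic frame is made up here so that the result is easy to check by hand.

```matlab
% Sketch of Eq. (1) and the threshold filter on a synthetic RGB frame.
% Assumption: mu, sigma and thresh mirror the constants in Appendix A
% (2*sigma^2 = 800, threshold 70 on a 0..100 scale); the frame is invented.
mu     = [14 51 7];     % mean target color per channel (R, G, B)
sigma  = [20 20 20];    % standard deviation per channel
thresh = 0.70;          % keep pixels whose probability exceeds this value

frame = 200 * ones(4, 4, 3);                        % background far from the target color
frame(2, 3, :) = reshape(mu, 1, 1, 3);              % one pixel exactly on target
frame(3, 3, :) = reshape(mu + [5 -5 5], 1, 1, 3);   % one pixel close to target

% Product of the three per-channel Gaussian terms of Eq. (1).
P = ones(size(frame, 1), size(frame, 2));
for ch = 1:3
    P = P .* exp(-(frame(:, :, ch) - mu(ch)).^2 / (2 * sigma(ch)^2));
end

mask = P > thresh;              % threshold filter
[rows, cols] = find(mask);      % coordinates of the selected data points
data = [rows, cols];            % one (row, column) pair per selected pixel
disp(data);
```

Only the two pixels near the target color survive the filter; their coordinates in `data` are what the clustering step would receive.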
When tracking the eyes we use two clusters, and when tracking three fingers we use three clusters. The system uses the command

    [center,U,obj_fcn] = fcm(data, 2)

to locate the centroids of the individual clusters. Note that fcm is the fuzzy c-means function of Matlab's Fuzzy Logic Toolbox, a soft-assignment variant of K-means. The centers are stored in the variable center.

Figure 3. The image here has been clustered into two clusters. The centers have been marked with small crosses.

2.4. Artificial Neural Network Approach to Finding the Location of Gaze

The author proposes an artificial neural network approach to calculate the location at which the user is currently looking. The center of the gaze can be found by averaging the x and y values of the two cluster centers. Feeding this input into a neural network, we can obtain the location of the original gaze.

2.5. Three-Point Tracking of Fingers

In this variant, instead of tracking eyes, the system tracks three fingers of the user. The algorithm is the same as the one used to track the eyes. The only variations are the color the algorithm looks for and the number of clusters.

Figure 4. The image on the left is a frame from the camera feed and the image on the right shows the output of the system. The small red crosses denote the locations of the fingers found by the algorithm.

3. FUTURE APPLICATIONS

The eye tracking system can make the way we use our computers today more flexible. We can replace computer pointing devices like the mouse with this system. The cursor would simply point where we look. This system can also be used as a feedback system that records where the user has been looking and for how long. The system can alert the user if he misses important information displayed on the computer.

The finger tracking system will open up more dimensions in how we use and interact with computers today. Today's computer interfaces are largely two-dimensional environments. By tracking three or more fingers we can have three-dimensional interfaces. This would open up new ways of looking at data and information in three dimensions. This concept can further be extended to tele-robotics. The motion of the fingers captured by the system can be transmitted over the internet to a remote three-finger robotic hand.
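The idea that three tracked fingertips define a plane can be made concrete with a short base-Matlab sketch. It assumes three-dimensional fingertip coordinates are available; the system described in this report tracks two-dimensional image positions, and how depth would be obtained is not specified, so the points below are purely hypothetical.

```matlab
% Plane through three fingertip positions.
% Assumption: p1, p2, p3 are hypothetical 3-D coordinates; the report only
% describes 2-D tracking and does not say how depth would be measured.
p1 = [0 0 0];
p2 = [1 0 0];
p3 = [0 1 0];

n = cross(p2 - p1, p3 - p1);    % normal of the plane through the three points
if norm(n) < 1e-9
    error('Fingertips are collinear; the plane is undefined.');
end
n = n / norm(n);                % unit normal
d = dot(n, p1);                 % plane equation: dot(n, p) = d

q = [0.5 0.5 2];                % an arbitrary point to test against the plane
dist = dot(n, q) - d;           % signed distance of q from the plane
fprintf('normal = [%g %g %g], distance = %g\n', n, dist);
```

The orientation of the unit normal changes as the fingers move relative to each other, which is one way the hand pose could drive a three-dimensional interface or a three-finger gripper.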

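REFERENCES

1. Baluja, S. & Pomerleau, D.A. Non-Intrusive Gaze Tracking Using Artificial Neural Networks, Advances in Neural Information Processing Systems (NIPS) 6. Cowan J.D., Tesauro, G. & Alspector, J. (eds.) Morgan Kaufmann Publishers, San Francisco, CA., 1994.
2. P. Fitzpatrick, Detecting head orientation, MIT

APPENDIX A. MATLAB CODE

    clc; clear all;
    vid = videoinput('winvideo', 1);      % webcam via Image Acquisition Toolbox
    preview(vid);
    pause(2);
    for z = 1:900
        frame = getsnapshot(vid);
        I = double(frame);
        % Eq. (1) scaled to 0..100: target color (14, 51, 7), 2*sigma^2 = 800
        J = exp(-1*(((I(:,:,1)-14).^2)/(800)+((I(:,:,2)-51).^2)/(800)+((I(:,:,3)-7).^2)/(800)))*100;
        sz = size(J);
        c = 0; X = J; data = zeros(0, 2);
        % Threshold filter: collect the (row, column) of every pixel above 70
        for i = 1:sz(1)
            for j = 1:sz(2)
                if (J(i,j) > 70)
                    X(i,j) = 20; c = c+1; data(c,1) = i; data(c,2) = j;
                else
                    X(i,j) = 0;
                end
            end
        end
        if c < 2, continue; end           % fcm needs at least two data points
        [center,U,obj_fcn] = fcm(data, 2);
        % Mark each cluster center with a small cross, clamped to the image
        for i = 1:2
            for j = -5:5
                x = min(max(round(center(i,1))+j, 1), sz(1));
                y = min(max(round(center(i,2)), 1), sz(2));
                X(x,y) = 200;
            end
            for j = -5:5
                x = min(max(round(center(i,1)), 1), sz(1));
                y = min(max(round(center(i,2))+j, 1), sz(2));
                X(x,y) = 200;
            end
        end
        % Draw lines through the midpoint of the two centers (gaze center)
        D = X;
        dy = min(max(round((center(1,2)+center(2,2))/2), 1), sz(2));
        for i = 1:sz(1)
            D(i,dy) = 100;
        end
        dx = min(max(round((center(1,1)+center(2,1))/2), 1), sz(1));
        for i = 1:sz(2)
            D(dx,i) = 100;
        end
        image(D); drawnow;                % refresh the figure on every frame
        clear center; clear U; clear obj_fcn;
    end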
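The listing above depends on fcm from the Fuzzy Logic Toolbox. For readers without that toolbox, the hard K-means objective of Eq. (2) and the gaze midpoint of Section 2.4 can be illustrated with a minimal base-Matlab sketch. This is not the author's implementation: the data points are synthetic stand-ins for thresholded eye pixels, and the initialization (first and last point) is a naive choice made here for reproducibility.

```matlab
% Hard K-means in base Matlab on synthetic pixel coordinates.
% Assumption: two well-separated blobs stand in for the thresholded eye pixels.
data = [10 20; 11 21; 12 19; 11 20; ...   % blob around (11, 20)
        10 60; 11 61; 12 59; 11 60];      % blob around (11, 60)
k = 2;
center = data([1 end], :);                % naive initialization: first and last point

for iter = 1:100
    % Assignment step: squared distance from every point to every center.
    d2 = zeros(size(data, 1), k);
    for c = 1:k
        delta = data - repmat(center(c, :), size(data, 1), 1);
        d2(:, c) = sum(delta.^2, 2);
    end
    [~, label] = min(d2, [], 2);

    % Update step: each center moves to the mean of its assigned points.
    newCenter = center;
    for c = 1:k
        if any(label == c)
            newCenter(c, :) = mean(data(label == c, :), 1);
        end
    end
    if isequal(newCenter, center), break; end   % converged
    center = newCenter;
end

V = sum(min(d2, [], 2));      % total intra-cluster variance, Eq. (2)
gaze = mean(center, 1);       % midpoint of the two centers (Section 2.4)
disp(center); disp(gaze);
```

With k set to 3 and three blobs of fingertip pixels, the same loop would return three centers, matching the finger tracking variant of Section 2.5.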