
Hand Segmentation Using Learning-Based Prediction and Verification for Hand Sign Recognition

Yuntao Cui and John J. Weng
Department of Computer Science
Michigan State University
East Lansing, MI 48824, USA
E-mail: {cui, weng}@cps.msu.edu

Abstract

This paper presents a prediction-and-verification segmentation scheme using attention images from multiple fixations. A major advantage of this scheme is that it can handle a large number of different deformable objects presented in complex backgrounds. The scheme is also relatively efficient since the segmentation is guided by the past knowledge through a prediction-and-verification scheme. The system has been tested to segment hands in the sequences of intensity images, where each sequence represents a hand sign. The experimental result showed a 95% correct segmentation rate with a 3% false rejection rate.

1 Introduction

The ability to interpret hand signs is essential for human-machine interface. Recently, there has been a significant amount of research on vision-based hand sign recognition (e.g., [1, 3, 4, 7, 11, 13]). One of the major difficulties faced by the vision-based approach is segmentation of the moving hand from sometimes complex backgrounds. To avoid the above problem, some of the systems rely on markers. The others use restrictive setups such as a uniform background.

In this paper, we present a learning-based approach to perform the task of hand segmentation. In the case of analyzing a temporal sequence, motion is an obvious choice of visual cue for visual attention. If we assume that the object of interest is moving in a stationary environment, it is not very difficult to roughly determine the position of a moving object in the image using motion information. However, it is not simple if the task is to extract the contour of the object from various backgrounds. Several motion segmentation methods have been proposed. These approaches fall into two categories. Approaches in the first category are designed to deal with rigid moving objects (e.g., [2, 8]). This type of approach achieves a segmentation by either building a reference image of the static background [8], or extracting the motion entity based on 3-D motion models or 2-D velocity-field models [2]. The second type of approach fits a shape to deformable moving objects (e.g., [9]). These models typically need a good initial position to converge. They also need a relatively clean background since the external forces are defined by the image gradient.

In order to overcome the difficulties faced by the segmentation methods for deformable objects mentioned above, we have proposed an eigen-subspace learning approach [5]. In that approach, the object was assumed to [...] position in a rectangular attention image together with the background. The attention image went through a reconstruction based on learning which can reduce the background interference to a certain degree. However, the reconstruction is not able to fully get rid of the background interference.

One attention window from a single fixation cannot solve the segmentation problem completely. Similar to human vision, multiple fixations are needed. This kind of multiple fixations has a hierarchical structure. As shown in Fig. 1, the first level of the fixation concentrates on the entire hand, while the next level of the fixation takes care of different parts of the hand. The attention window of the first level fixation usually contains a part of the background. But as we continue zooming in on the object from different fixations, the attention windows focus on different parts of the object. One important feature of these attention windows is that they typically contain much less background than the attention window of the first level fixation. These attention images from multiple fixations can be used as important visual cues to segment the object of interest from the input image. In this paper, we present a learning-based approach which efficiently utilizes the attention images obtained from the multiple fixations through a prediction-and-verification scheme to perform the task of hand segmentation.

Figure 1: An illustration of two level fixations of an input hand image. (Panels: input image; attention window of first level fixation; attention windows of second level fixations.)

2 Valid Segmentation

In this section, we define the verifier $f$ to evaluate the segmentation using function interpolation based on training samples. Given an input image, we can construct an attention image of the hand as shown in Fig. 2.

Figure 2: The illustration of constructing attention images. (Input image; extract and scale the hand; attention image.)

2.1 The Most Expressive Features (MEF)

Let an attention image $F$ of $m$ rows and $n$ columns be an $(mn)$-dimensional vector. For example, the set of image pixels $\{f(i,j) \mid 0 \le i < m,\ 0 \le j < n\}$ can be written as a vector $V = (v_1, v_2, \ldots, v_d)$ where $v_{ni+j+1} = f(i,j)$ and $d = mn$. Typically $d$ is very large. The Karhunen-Loeve projection [12] is a very efficient way to reduce a high-dimensional space to a low-dimensional subspace. The vectors produced by the Karhunen-Loeve projection are typically called the principal components. We call these vectors the most expressive features (MEF) in that they best describe the sample population in the sense of linear transform [4].

2.2 Approximation as Function Interpolation

After projecting hand attention images to a low-dimensional MEF space, we are now ready to approximate the verifier $f$ using function interpolation.

Definition 1 Given a training vector $X_{k,i}$ of gesture $k$ in the MEF space, a Gaussian basis function $s_i$ is $s_i(X) = e^{-\|X - X_{k,i}\|^2/\alpha}$, where $\alpha$ is a positive damping factor, and $\|\cdot\|$ denotes the Euclidean distance.

A very small $\alpha$ tends to reduce the contribution of neighboring training samples.

Definition 2 Given a set of $n$ training samples $L_k = \{X_{k,1}, X_{k,2}, \ldots, X_{k,n}\}$ of gesture $k$, the confidence level of the input $X$ belonging to class $k$ is defined as $g_k(X) = \sum_{i=1}^{n} c_i s_i(X)$, where $s_i$ is a Gaussian basis function and the coefficients $c_i$ are to be determined by the training samples.
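As a rough illustration of Definitions 1 and 2, the following NumPy sketch evaluates the Gaussian basis functions and the confidence level $g_k$ for one gesture, with the coefficients fixed by requiring $g_k = 1$ at every training sample. It is only a sketch under stated assumptions: the function names, the toy data, and the value of `alpha` are hypothetical, the exponent is taken to be the squared distance divided by the damping factor, and `numpy.linalg.solve` stands in for whatever linear solver the authors used.

```python
import numpy as np

def gaussian_basis_matrix(queries, centers, alpha):
    """Return S with S[j, i] = exp(-||queries[j] - centers[i]||^2 / alpha)."""
    diff = queries[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / alpha)

def fit_coefficients(train, alpha):
    """Solve sum_i c_i * s_i(X_j) = 1 for every training sample X_j."""
    basis = gaussian_basis_matrix(train, train, alpha)
    return np.linalg.solve(basis, np.ones(len(train)))

def confidence(x, train, coeffs, alpha):
    """Confidence level g_k(x) = sum_i c_i * s_i(x)."""
    basis_row = gaussian_basis_matrix(x[None, :], train, alpha)[0]
    return float(basis_row @ coeffs)

# Toy stand-in for MEF vectors of one gesture (hypothetical data).
rng = np.random.default_rng(0)
train = rng.normal(size=(5, 3))
alpha = 1.0  # assumed damping factor for this toy example

coeffs = fit_coefficients(train, alpha)
near = confidence(train[0], train, coeffs, alpha)          # at a training sample
far = confidence(train[0] + 100.0, train, coeffs, alpha)   # far from all samples
```

In this toy setting `near` is close to 1 and `far` is close to 0, which matches the intuition that the confidence is high only near samples that have been learned.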
The coefficients $c_i$ are determined as follows. Given $n$ training samples, we have $n$ equations

$$g_k(X_{k,j}) = \sum_{i=1}^{n} c_i s_i(X_{k,j}), \quad j = 1, \ldots, n, \tag{1}$$

which are linear with respect to the coefficients $c_i$. If we set $g_k(X_{k,j})$ equal to 1, we can solve the above equations for $c_i$ using the Gauss-Jordan elimination method.

The confidence level defined in Definition 2 can be used to verify a segmentation result.

Definition 3 Given a segmentation result $S$ and a confidence level $l$, the verifier $f$ outputs valid segmentation for gesture $k$ if $g_k(S) > l$.

Intuitively, a segmentation result $S$ is valid if there is a training sample that is sufficiently close to it.

3 Prediction for Valid Segmentation

This section investigates the problem of how to find a valid segmentation. Our approach is to use the attention images from multiple fixations of training hand images. Given a hand attention image, a fixation image is determined by its fixation position $(s, t)$ and a scale $r$. Fig. 3 shows the attention images of the 19 fixations from one training sample.

Figure 3: The attention images from 19 fixations of a training sample. The first one is the same as the original hand attention image.

3.1 Overview

Given a training set, we obtain a set of attention images from multiple fixations for each image in the set. Each attention image from a fixation is associated with the segmentation mask of the original hand attention image, the scale $r$ and the position of the fixation $(s, t)$. This information is necessary to recover the segmentation for the entire object.

During the segmentation stage, we first use the motion information to select visual attention. Then, we try different fixations on the input image. An attention image from a fixation of an input image is used to query the training set. The segmentation mask associated with the query result is the prediction. The predicted segmentation mask is then applied to the input image. Finally, we verify the segmentation result to see if the extracted subimage corresponds to a hand gesture that has been learned. If the answer is yes, we have found the solution. This solution can further go through a refinement process. Fig. 4 gives the outline of the scheme.

3.2 Organization of Attention Images from Fixations

In order to achieve a fast retrieval, we build a hierarchical structure to organize the data.

Definition 4 A hierarchical quasi-Voronoi diagram $P$ of $S$ is a set of partitions $P = \{P_1, P_2, \ldots, P_m\}$, where every $P_i = \{P_{i,1}, \ldots, P_{i,n_i}\}$, $i = 1, 2, \ldots, m$, is a partition of $S$. $P_{i+1} = \{P_{i+1,1}, \ldots, P_{i+1,n_{i+1}}\}$ is a finer Voronoi diagram partition of $P_i$ in the sense that corresponding to every element $P_{i,k} \in P_i$, $P_{i+1}$ contains a Voronoi partition $\{P_{i+1,s}, \ldots, P_{i+1,t}\}$ of $P_{i,k}$.

Figure 5: A 2-D illustration of a hierarchical quasi-Voronoi diagram.

The graphic description in Fig. 5 gives a simplified but intuitive explanation of the hierarchical quasi-Voronoi diagram. The structure is a tree. The root corresponds to the entire space of all the possible inputs. The children of the root partition the space into large cells, as shown by thick lines in Fig. 5. The children of a parent subdivide the parent's cell further into smaller cells, and so on.

3.3 Prediction as Querying the Training Set

Given a training set $L$, a hierarchical quasi-Voronoi diagram $P = \{P_1, P_2, \ldots, P_n\}$ corresponding to $L$ and a query sample $X$, the prediction problem is to find a training sample $X' \in L$ such that $\|X - X'\| \le \|X - X''\|$ for any $X'' \in L$ with $X'' \ne X'$. The type of query mentioned above is a nearest neighbor problem, also known as the post-office problem [10]. Efficient solutions are still lacking for the case with dimension higher than three. In this section, we will present an efficient algorithm when the training set is d-supportive as defined below.
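For reference, the prediction query stated above can be written as an exhaustive nearest-neighbor search. The sketch below is a brute-force baseline, not the hierarchical algorithm of this paper; the function name, the toy vectors, and the `fixation_info` records (mask identifier, scale, fixation position) are hypothetical placeholders for the data associated with each training attention image.

```python
import numpy as np

def predict_nearest(query, train_vectors):
    """Return (index, distance) of the training vector closest to the query."""
    dists = np.linalg.norm(train_vectors - query, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

# Toy MEF vectors of three training attention images (hypothetical data).
train_vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])

# Information stored with each training attention image (hypothetical values).
fixation_info = [
    {"mask_id": "mask_0", "scale": 1.0, "position": (0, 0)},
    {"mask_id": "mask_1", "scale": 0.5, "position": (4, 8)},
    {"mask_id": "mask_2", "scale": 0.5, "position": (8, 4)},
]

idx, dist = predict_nearest(np.array([0.9, 0.1]), train_vectors)
recalled = fixation_info[idx]  # the mask, scale and position used as prediction
```

The cost of this baseline grows linearly with the number of training attention images, which is the motivation for organizing them in a hierarchy and pruning the search.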
Definition 5 Let $S$ be a set which contains all possible samples. A training set $L = \{L_1, L_2, \ldots, L_n\}$ is a d-supportive training set if for any test sample $X \in S$, there exists $i$ such that $\|X - L_i\| < d$, where $\|\cdot\|$ is the Euclidean distance.

The next two theorems show how to prune the search paths when the training set is d-supportive.

Theorem 1 We have a d-supportive training set $L = \{L_1, L_2, \ldots, L_n\}$, a hierarchical quasi-Voronoi diagram $P = \{P_1, P_2, \ldots, P_n\}$ corresponding to $L$ and a query sample $X \in S$. Let the $i$th partition be $P_i = \{P_{i,1}, P_{i,2}, \ldots, P_{i,n_i}\}$ and $C = \{C_1, C_2, \ldots, C_{n_i}\}$ be the corresponding centers of regions in $P_i$. Assume $C_1$ is the center closest to $X$, that is, $\|C_1 - X\| \le \|C_i - X\|$ for any $i \ne 1$. Let $C_2$ be any other center and $P_1$ be a boundary hyperplane between regions represented by $C_1$ and $C_2$ as illustrated in Fig. 6. Then the region of $C_2$ does not contain the nearest training sample to $X$ if the distance between $X$ and the hyperplane $P_1$ is greater than $d$.

Figure 6: A 2-D illustration of nearest neighbor query theorems.

In order to avoid calculating the point-to-hyperplane distance in a high-dimensional space, we can use the following equivalent theorem.

Theorem 2 Let $\|C_1 - C_2\| = r$, $f = \frac{r}{2}$, $e = \frac{r}{2} - d$, $\|C_1 - X\| = a$ and $\|C_2 - X\| = b$ as shown in Fig. 6. The region of $C_2$ does not contain the nearest training sample to $X$ if $a^2 - e^2 < b^2 - f^2$.

For the proof of Theorem 1 and Theorem 2, the reader is referred to [6].

Figure 4: Overview of the segmentation scheme. (Blocks: input sequence; motion-based visual attention; attention images from multiple fixations; predictor; recalled mask; extractor; verifier, using the approximate function information, e.g., coefficients, for gestures 1 to n; if confident, refinement, otherwise discard.)

4 Experiments

We have applied our segmentation scheme to the task of hand segmentation in the experiments. The number of gestures we used in our experiment is 40. These gestures have appeared in the signs which have been used to test the hand sign recognition system [4]. They are illustrated in Fig. 7. The size of the attention window used in the experiment is $32 \times 32$ pixels.

Figure 7: 40 hand gestures used in the experiment.

4.1 Training

Two types of training were conducted in the experiments. The first type of training is to get the approximation for the verifier $f$, which would be used later to check the validity of the segmentation. For each gesture, a number (between 27 and 36) of training samples were used to obtain the approximation of the verifier $f$ for that gesture. Given a set of training samples $L = \{X_1, X_2, \ldots, X_n\}$ for gesture $k$, we empirically determined the damping factor $\alpha$ in the interpolation function as follows:

$$\alpha = 0.2\,\frac{\sum_{i=1}^{n-1} \|X_i - X_{i+1}\|}{n-1}. \tag{2}$$

The second type of training was to generate the attention images from multiple fixations of training samples. In the current implementation, the selection of the fixations is mechanical. In total, 19 fixations were used for each training sample as shown in Fig. 3. The attention images with more than 30% background pixels presented in the attention window would be discarded. The total number of training attention images is 1742.

4.2 Hand Segmentation

The trained system was tested to perform the segmentation task from a temporal sequence of intensity images. Each sequence represents a complete hand sign. Fig. 8(a) shows two sample sequences.

In order to speed up the process of the segmentation, we utilize motion information to find a motion attention window. The attention algorithm can detect the rough position of a moving object, but the accuracy is not guaranteed as shown in Fig. 8(b). We solve this problem by doing some limited search based on the motion attention window. In the current implementation, given a motion attention window with $m$ rows and $n$ columns, we try the candidates with size from $(0.5m, 0.5n)$ to $(2m, 2n)$ using step size $(0.5m, 0.5n)$.

Figure 8: The samples of the experimental results. (a) The input testing sequences; (b) the results of motion-based visual attention are shown using dark rectangles; (c) the results of the segmentation are shown after masking off the background.

We tested the system with 802 images (161 sequences) which were not used in the training. A result was rejected if the system could not find a valid segmentation with a confidence level $l$. The segmentation was considered as a correct one if the correct gesture segmentation $C$ was retrieved and placed in the right position of the test image. For the case of $l = 0.2$, we have achieved a 95% correct segmentation rate with a 3% false rejection rate. Fig. 8(c) shows some segmentation results. We summarize the experimental results in Table 1. The time was obtained on an SGI INDIGO 2 workstation.

Table 1: Summary of the experimental data

Number of test images    Correct segmentation    False rejection    Time per image
805                      95%                     3%                 58.3 sec.

5 Conclusions and Future Work

A segmentation scheme using attention images from multiple fixations is presented in this paper. The major advantage of this scheme is that it can handle a large number of different deformable objects presented in various complex backgrounds. The scheme is also relatively efficient since the search of the segmentation is guided by the past knowledge through a prediction-and-verification scheme.

In the current implementation, the fixations are generated mechanically. The number of fixations and the positions of fixations are the same regardless of the types of gestures. This is not very efficient. Some gestures may be very simple so that a few fixations are enough to recognize them. Nevertheless, in order to achieve the optimal performance, different gestures may require different positions of fixations. In the future, we plan to investigate the generation of the fixations also based on learning. The previous fixations are used to guide the next action. The next action could be (a) termination of the process of generating fixations if the gesture has already been recognized; or (b) finding the appropriate position for the next fixation.
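The limited search around the motion attention window described in Section 4.2 can be sketched as a small enumeration of candidate window sizes. This is a minimal sketch under assumptions that the text does not settle: both dimensions are scaled by the same factor, sizes are rounded to whole pixels, and the function name and the example window size are hypothetical.

```python
def candidate_window_sizes(m, n, start=0.5, stop=2.0, step=0.5):
    """List candidate (rows, cols) sizes from (start*m, start*n) to (stop*m, stop*n).

    Assumes rows and columns are scaled by the same factor at each step.
    """
    num_steps = int(round((stop - start) / step)) + 1
    sizes = []
    for k in range(num_steps):
        factor = start + k * step
        rows = max(1, round(factor * m))
        cols = max(1, round(factor * n))
        sizes.append((rows, cols))
    return sizes

# Example: a motion attention window of 32 rows and 24 columns (hypothetical).
sizes = candidate_window_sizes(32, 24)
```

Under these assumptions the example yields four candidates, from 16 x 12 up to 64 x 48 pixels. If rows and columns were instead scaled independently, the same range and step would give sixteen candidates.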
Acknowledgements

The authors would like to thank Yu Zhong, Kal Rayes, Doug Neal, and Valerie Bolster for making themselves available for the experiments. This work was supported in part by NSF grant No. IRI 9410741 and ONR grant No. N00014-95-1-0637.

References

[1] A. Bobick and A. Wilson, "A state-based technique for the summarization and recognition of gesture", in Proc. 5th Int'l Conf. Computer Vision, pp. 382-388, Boston, 1995.
[2] P. Bouthemy and E. Francois, "Motion segmentation and qualitative dynamic scene analysis from an image sequence", in International Journal of Computer Vision, vol. 10, pp. 157-182, 1993.
[3] R. Cipolla, Y. Okamoto and Y. Kuno, "Robust structure from motion using motion parallax", in IEEE Conf. Computer Vision and Pattern Recog., pp. 374-382, 1993.
[4] Y. Cui, D. Swets and J. Weng, "Learning-based hand sign recognition using SHOSLIF-M", in Proc. 5th Int'l Conf. Computer Vision, pp. 631-636, Boston, 1995.
[5] Y. Cui and J. Weng, "2D object segmentation from fovea images based on eigen-subspace learning", Proc. IEEE Int'l Symposium on Computer Vision, Coral Gables, FL, Nov. 20-22, 1995.
[6] Y. Cui and J. Weng, "A learning-based prediction-and-verification segmentation scheme for hand sign image sequences", Technical Report CPS-95-43, Computer Science Department, Michigan State University, Dec., 1995.
[7] T. Darrell and A. Pentland, "Space-time gestures", in IEEE Conf. Computer Vision and Pattern Recog., pp. 335-340, 1993.
[8] G. W. Donohoe, D. R. Hush and N. Ahmed, "Change detection for target detection and classification in video sequences", in Proc. Int'l Conf. Acoust., Speech, Signal Processing, pp. 1084-1087, 1988.
[9] M. Kass, A. Witkin and D. Terzopoulos, "Snakes: active contour models", in Proc. 1st ICCV, pp. 259-268, 1987.
[10] D. Knuth, The Art of Computer Programming III: Sorting and Searching, Addison-Wesley, Reading, Mass., 1973.
[11] J. J. Kuch and T. S. Huang, "Vision based hand modeling and tracking", in Proc. International Conference on Computer Vision, June, 1995.
[12] M. M. Loeve, Probability Theory, Princeton, NJ: Van Nostrand, 1955.
[13] T. E. Starner and A. Pentland, "Visual recognition of American sign language using hidden Markov models", in Proc. International Workshop on Automatic Face- and Gesture-Recognition, June 1995.