Part 14: Interaction in VR: Speech and Gesture Input
Virtuelle Realität, Wintersemester 2006/07
Prof. Bernhard Jung

Gestures
Body movements which convey meaningful information.
Gesture types:
1. emblematic gestures (symbols): meaning defined by convention
2. deictic gestures: the action of pointing to an object or to a region
3. mimetic gestures: actions imitating interaction with an object
4. iconic gestures: describe the shape or movement of an object
5. beats: mark the rhythm of speech
Types 2-5 are (often) accompanied by speech (co-verbal).
Gestures: Kinetic Structure
McNeill, Levy & Pedelty (1990)

Hierarchy of kinetic units:
  consistent arm use and body posture
    consistent head movement
      gesture-unit
        gesture-phrase: Preparation - [pre-stroke Hold] - Stroke - [post-stroke Hold] - Retraction

Gesture: uni-modal
- select (deictic)
- turn (mimetic)
- drag and drop
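The phase structure above can be sketched as a simple segmenter over hand speed. This is an illustrative assumption, not code from the lecture: the threshold value and the heuristic of taking the movement run containing the speed peak as the stroke are made up.

```python
# Minimal sketch (assumption for illustration): segment a gesture-phrase
# into preparation / stroke / retraction / hold phases from hand speed.

def segment_phases(speeds, hold_thresh=0.05):
    """Label each frame of a hand-speed signal (m/s) with a gesture phase.

    Heuristic: frames slower than hold_thresh are holds; the movement run
    containing the global speed peak is the stroke; movement before it is
    preparation, movement after it is retraction.
    """
    labels = ["hold" if v < hold_thresh else "move" for v in speeds]
    # find the movement run containing the global speed peak -> stroke
    peak = max(range(len(speeds)), key=lambda i: speeds[i])
    i, j = peak, peak
    while i > 0 and labels[i - 1] == "move":
        i -= 1
    while j < len(labels) - 1 and labels[j + 1] == "move":
        j += 1
    for k in range(len(labels)):
        if labels[k] == "move":
            if k < i:
                labels[k] = "preparation"
            elif k > j:
                labels[k] = "retraction"
            else:
                labels[k] = "stroke"
    return labels
```

Holds between preparation and stroke (or stroke and retraction) come out as the pre-/post-stroke holds of the McNeill scheme.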
Gesture Recognition
Pipeline:
  data glove -> joint angles
  6DOF tracker -> hand position
  joint angles + hand position -> hand model -> classifier

Classifiers:
- neural networks
- decision trees
- grammars: atomic form elements are composed into "gesture words",
  e.g. based on HamNoSys

HamNoSys: Symbols for Body Parts
Prillwitz et al. (1989), "Hamburg Notation System"
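As one concrete instance of the classifier stage, a nearest-neighbour classifier over data-glove joint-angle vectors can be sketched. This is an assumption for illustration, not the lecture's implementation; the posture templates and angle values are invented.

```python
import math

# Sketch (illustrative assumption): classify a hand posture by comparing
# measured data-glove joint angles to stored posture templates.

TEMPLATES = {
    # 5 joint angles in degrees (one flexion value per finger, simplified)
    "point": [10, 170, 170, 170, 0],    # index stretched, others flexed
    "fist":  [170, 170, 170, 170, 160],
    "flat":  [5, 5, 5, 5, 5],
}

def classify_posture(angles):
    """Return the template posture closest (Euclidean) to the measurement."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(TEMPLATES, key=lambda name: dist(TEMPLATES[name], angles))
```

A neural network or decision tree would replace the distance computation, but the interface (joint-angle vector in, posture label out) stays the same.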
Some HamNoSys Symbols
Prillwitz et al. (1989), "Hamburg Notation System"
(The glyph column of the original table is omitted here.)

ASCII notation   Description
BSifinger        basis shape: index finger stretched
<etc.>
EFinA            extended finger orientation: ahead
PalmL            palm orientation: left
LocShoulder      location: shoulder height
LocStretched     location: stretched
MoveA            move hand ahead
MoveR            move hand right
...
( ) PARALLEL     executed in parallel
[ ] SEQUENCE     executed in sequence

HamNoSys parse tree (German nonterminals; "statischer Anteil" = static part, "dynamischer Anteil" = dynamic part):
input
├─ konfiguration (statischer Anteil)
│  ├─ handform → grundform → BSifinger
│  ├─ handstellung → fingeransatzrichtung → EFinA
│  │                 handflaechenorientierung → PalmL
│  └─ lokation → koerperebene → LocShoulder
│                abstand → LocStretched
└─ aktion (dynamischer Anteil)
   └─ BrackSeqL aktion BrackSeqR
      └─ bewegung → einfachebewegung → gerade → MoveR
                    einfachebewegung → gerade → MoveA
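The composition idea behind such grammar-based recognition can be sketched in a few lines: static components execute in parallel, "( ... )", and movements in sequence, "[ ... ]". The symbol sets follow the table above, but the compose() helper itself is a hypothetical illustration, not part of HamNoSys.

```python
# Hedged sketch of composing atomic HamNoSys-style symbols (ASCII notation)
# into a "gesture word". compose() is an invented helper for illustration.

STATIC = {"BSifinger", "EFinA", "PalmL", "LocShoulder", "LocStretched"}
DYNAMIC = {"MoveA", "MoveR"}

def compose(static_syms, move_syms):
    """Build the ASCII notation of a gesture word, validating categories:
    static symbols in parallel, movement symbols in sequence."""
    for s in static_syms:
        if s not in STATIC:
            raise ValueError(f"not a static symbol: {s}")
    for m in move_syms:
        if m not in DYNAMIC:
            raise ValueError(f"not a movement symbol: {m}")
    return "( " + " ".join(static_syms) + " ) [ " + " ".join(move_syms) + " ]"
```

The result corresponds to the terminal string of the parse tree above: a static hand configuration followed by a two-step straight movement.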
Iconic Gesture Recognition
Timo Sowa, University of Bielefeld
Object identification by way of iconic gestural descriptions

Multimodal Interaction (Speech & Gesture)
Multimodal (Speech & Gesture) Interaction
Put-That-There system (1980)
- MIT Media Room, Spatial Data Management
- later: speech and static pointing gestures via Polhemus tracker

Timing of gestures and speech
- The gesture stroke is often marked by an abrupt stop, which is correlated with accented words or syllables.
- The stroke does not occur after an accented word, but simultaneously with it or shortly before it.
Example (German): "Nimm dieses Rohr, steck es da dran" ("Take this pipe, attach it to that")
(Slide shows a timing diagram, 0-3 s, aligning gesture strokes with the accented syllables.)
=> hypotheses for establishing correspondence between accented behaviors in the speech and gesture channels
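The timing hypothesis above can be turned into a small matching rule. This is a sketch under stated assumptions, not code from the lecture: the 0.3 s maximum lead is an invented parameter.

```python
# Sketch (illustrative assumption): a stroke marks an accented syllable if
# it occurs simultaneously with it or shortly before it, never after it.

def match_strokes(stroke_times, accent_times, max_lead=0.3):
    """Map each gesture-stroke time to the first accent it may mark.

    A stroke at t matches an accent at a if t <= a <= t + max_lead,
    i.e. the stroke is simultaneous with, or up to max_lead seconds
    before, the accent.
    """
    matches = {}
    for t in stroke_times:
        for a in accent_times:
            if t <= a <= t + max_lead:
                matches[t] = a
                break
    return matches
```

Strokes that follow their nearest accent are left unmatched, mirroring the observation that strokes do not trail accented words.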
Timing in human face-to-face communication

Multimodal Integration
Two "logistic" problems to be solved (Srihari, 1995):
- Segmentation problem: How can a system be made to cope with open input? How can the units to be processed in one system cycle be determined?
- Correspondence problem: How can cross-references between multiple modalities (speech/gesture) be determined?
Correspondence Problem
How to determine cross-references between gesture and speech?
Importance of reconstructing temporal correspondence; example:
- "Put <gesture> this chair there!" - the gesture specifies the referent (presupposition: target location known)
- "Put this chair <gesture> there!" - the gesture specifies the target location (presupposition: referent known)

VIENA - Virtual Environment
VIENA Project, University of Bielefeld, 1996
Setting: designing a virtual office environment (arrangement, color)
Goal: relief from technical detail by natural, situated interaction
Architecture (diagram on slide): verbal input and questions are parsed and interpreted; mediating agents (space, color, bookkeeping, adaptor, virtual camera, plan & physics) communicate changes to, and observe changes in, an augmented graphics DB of time-stamped scene descriptions (geometry models, materials, object names, object types), which feeds the graphics DB with its renderer, modeler, and viewing components.
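One simple way to operationalize the temporal-correspondence idea is to bind each deictic word to the pointing gesture closest to it in time. The data structures and the nearest-in-time rule are illustrative assumptions, not the VIENA implementation.

```python
# Hypothetical sketch: resolve deictic words via time-stamped gestures.

def bind_deictics(words, gestures):
    """words: list of (word, time); gestures: list of (target, time).

    Returns {word_index: target} for the deictic words "this"/"that"/
    "there", each bound to the temporally closest pointing gesture.
    """
    deictic = {"this", "that", "there"}
    bindings = {}
    for i, (w, t) in enumerate(words):
        if w in deictic and gestures:
            target, _ = min(gestures, key=lambda g: abs(g[1] - t))
            bindings[i] = target
    return bindings
```

For "Put <gesture> this chair there!" the early gesture binds to "this" (the referent); for "Put this chair <gesture> there!" a late gesture binds to "there" (the target location).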
Agents in the VIENA System
VIENA Project, University of Bielefeld

Space agent has "expert" knowledge:
- how to obtain current object positions
- how to calculate object transformations (translation, rotation)
- that objects have to stand on something (not hang in the air)
- that an object can only be where there is space
- what "in front" means for a table or a desk, respectively
- which orientation an object is expected to have
- etc.

Color agent has "expert" knowledge:
- how to obtain current object colors (r, g, b)
- how to identify objects by color ("the blue chair")
- how to calculate a color transformation (blue, lighter) by changing RGB vectors
- etc.

Bookkeeper agent has knowledge about:
- the geometric description of all scene details, even when changed "by hand"
- material descriptions of all scene objects
- previous scene descriptions and alterations
- etc.

Multimodal Parsing
VIENA Project, University of Bielefeld
Examples: speech and hand gesture
- move <gesture> forward
- move <gesture> that to the left
- put the bowl <gesture> there
- make <gesture> this chair green
- turn <gesture> right
- put <gesture> that <gesture> there
- put <gesture> this computer on <gesture> that table
- put <gesture> this computer on the blue desk
(Screenshots: "put this computer on the blue desk" ... "make this chair green")
VIENA Interface Agency

Timed Input Agency
Motivated by work of Pöppel & Schwender (1993)
Rough approach:
- record sensor data in small time cycles (here: 100 ms)
- integrate info from multiple channels in large time cycles (here: 2 sec)
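The two-level timing scheme can be sketched as follows. The cycle lengths follow the slide; everything else (the sample format, the grouping by integer division) is an illustrative assumption.

```python
# Sketch of the two-level timing scheme: sensor data are recorded every
# 100 ms and fused into one multimodal frame every 2 s.

CYCLE_MS = 100        # small cycle: one sensor sample per channel
INTEGRATE_MS = 2000   # large cycle: one multimodal integration step

def integrate(samples):
    """Group per-channel samples (channel, t_ms, value), taken every
    CYCLE_MS, into 2 s integration frames keyed by frame number."""
    frames = {}
    for channel, t_ms, value in samples:
        frame_id = t_ms // INTEGRATE_MS
        frames.setdefault(frame_id, []).append((channel, t_ms, value))
    return frames
```

Each 2 s frame then contains the speech and gesture events that may be cross-referenced in one system cycle, addressing the segmentation problem described earlier.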
Multimodal Parsing
SGIM Project, University of Bielefeld
Temporal ATN (Augmented Transition Network)
- input: gesture and speech streams
- output: logical form of multimodal input

Multimodal Interaction in VR
SGIM & Virtuelle Werkstatt Projects, University of Bielefeld
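A toy version of such a temporal ATN can be sketched: an arc is taken when the next input token matches the arc's category and additionally satisfies a temporal constraint relative to the previously consumed token. The network, the categories, and the 1 s gap limit are assumptions for illustration, not the SGIM network.

```python
# Toy temporal-ATN sketch (illustrative assumption): arcs test a
# linguistic category AND a temporal constraint on the token stream.

ARCS = {
    # state -> list of (category, next_state)
    "S":  [("verb", "V")],
    "V":  [("deictic", "NP")],
    "NP": [("noun", "DONE")],
}

def parse(tokens, max_gap=1.0):
    """tokens: list of (category, word, time). Accepts verb-deictic-noun
    sequences whose consecutive tokens are at most max_gap seconds apart;
    returns the consumed (category, word) pairs as a crude logical form."""
    state, last_t, form = "S", None, []
    for cat, word, t in tokens:
        for arc_cat, nxt in ARCS.get(state, []):
            if cat == arc_cat and (last_t is None or t - last_t <= max_gap):
                state, last_t = nxt, t
                form.append((cat, word))
                break
        else:
            return None  # no arc applicable: reject the input
    return form if state == "DONE" else None
```

A real temporal ATN would carry registers (the "augmented" part) and accept interleaved gesture tokens on dedicated arcs; the point here is only the combined category-plus-time test on each transition.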