Towards automated surface-level phonetic analysis of SL

Tommi Jantunen, Department of Languages, University of Jyväskylä
tommi.j.jantunen@jyu.fi
Presentation at the Brown Bag seminar (JyU) on 20 April 2009

Pre-preamble
From 2007 onwards: An attempt to establish a multidisciplinary research project in which computer vision techniques for the recognition and analysis of gestures and facial expressions from video are developed and applied to the processing of Sign Language in general and Finnish Sign Language in particular.

Cont'd
Project partners:
  Helsinki University of Technology: Jorma Laaksonen & Markus Koskela (CIS)
  University of Jyväskylä: Ritva Takkinen & Tommi Jantunen (Dept. of Languages); Timo Ahonen & Auli Meronen (NMI)
  University of Art and Design: Antti Raike (Media Lab)
  Finnish Association of the Deaf: Päivi Rainò (Sign Language Unit)

Still cont'd!
Publication: Koskela, Markus; Laaksonen, Jorma; Jantunen, Tommi; Takkinen, Ritva; Rainò, Päivi & Raike, Antti (2008). Content-based video analysis and access for Finnish Sign Language - a multidisciplinary research project. In O. Crasborn, E. Efthimiou, T. Hanke, E. D. Thoutenhoofd & I. Zwitserlood (Eds.), Construction and exploitation of sign language corpora [Proceedings of the 3rd Workshop on the representation and processing of sign languages, organised as a part of the 6th Language resources and evaluation conference (LREC) at Marrakech, Morocco, June 1st, 2008], pp. 101-104. Paris: ELRA.

Preamble
From 2009 onwards: A smaller scale multidisciplinary project aiming at representing graphically, from the existing video material, the prosody and rhythm of continuous natural signing
  My own work on signed syllable, sign, and sentence (cf. the need for more empirical research)
  Päivi's project (OSATA) dealing with dyslectic signers (cf. the role of rhythm in explaining dyslexia)
  CIS's interest in testing content-based video retrieval and analysis methods (cf. CIS's celebrated PicSOM technology)

Cont'd
A research group consisting of experts in
  Computer and information science: Jorma Laaksonen & Markus Koskela (HUT/CIS)
  Sign language linguistics: Tommi Jantunen (JyU) & Päivi Rainò (FAD)
  Spoken language phonetics and prosody: Eeva Yli-Luukko ("Kotus"), Eija Aho (University of Helsinki) & Richard Ogden (University of York)

Introduction (1)
Background: In spoken language phonetics, representing speech data directly with different graphical diagrams is quite widespread, or at least not uncommon (cf. the use of Praat etc.).
Graphical representation of data enables more accurate analysis.

Introduction (2)
Current "problem": There exist a number of SL studies that claim to be phonetic in the same way that spoken language phonetic studies are.
However, in the literature, there are only a handful of papers in which data is represented directly in the form of a graphical diagram (cf. the lack of movement tracking hardware/software etc.).
When compared to the phonetic analysis of spoken language, the analysis of SL is based relatively more on estimates and abstractions.

Motion tracking systems (1)
Motion tracking systems have been used in the study of SL phonetics (cf. movement) at least since Wilcox (1992).
These systems are nowadays VERY accurate and enable graphical representation of the data as well as the extraction of various types of information from the data.

My Wiimote experiment
Cont'd
Still cont'd!

Motion tracking systems (2)
However, motion tracking data is ALWAYS laboratory data.
A method(olog)ical prerequisite for natural data (in the sense of "prosody and function") is pre-recording!
Is it possible to graphically represent and analyse SL (cf. movement) purely on the basis of a video's digital content?

One rare example from the existing literature
Boyes Braem, Penny (1999). Rhythmic temporal patterns in the signing of deaf early and late learners of Swiss German Sign Language. Language and Speech 42:2-3, 177-208.
[Figures: Boyes Braem (1999:189); Boyes Braem (1999:188)]

DEMO 1
How to graphically represent and analyse SL movement on the basis of the digital content of existing videos?
Suvi 1038/3

HUT's 2007 demo
An example frame

Koskela & al. (2008:103)
"[A]n essential feature in the analysis of recorded continuous-signing sign language is that of motion. For tracking local motion in the video stream, we apply a standard algorithm based on detecting distinctive pixel neighborhoods and then minimizing the sum of squared intensity differences in small image windows between two successive video frames [...]."

DEMO 2
Towards more sophisticated content-based analysis of SL videos
  Skin filter
  Improved vector count
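
The tracking step quoted from Koskela & al. (2008:103) can be illustrated with a minimal sketch. It is not the project's implementation: the function name `track_point_ssd`, its parameters, and the synthetic frames are assumptions made for illustration. The sketch also skips the detection of distinctive pixel neighborhoods (the point to track is given by hand) and replaces whatever minimization the original algorithm uses with a brute-force search over candidate displacements.

```python
import numpy as np

def track_point_ssd(prev, curr, y, x, half_win=3, search=4):
    """Return the (dy, dx) shift of the window around (y, x) that minimizes
    the sum of squared intensity differences (SSD) between two frames.

    Hypothetical helper: exhaustive search, greyscale frames as 2-D arrays.
    """
    h, w = curr.shape
    template = prev[y - half_win:y + half_win + 1,
                    x - half_win:x + half_win + 1].astype(float)
    best, best_ssd = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cy, cx = y + dy, x + dx
            # Skip candidate windows that would fall outside the frame.
            if (cy - half_win < 0 or cx - half_win < 0
                    or cy + half_win + 1 > h or cx + half_win + 1 > w):
                continue
            cand = curr[cy - half_win:cy + half_win + 1,
                        cx - half_win:cx + half_win + 1].astype(float)
            ssd = np.sum((cand - template) ** 2)
            if ssd < best_ssd:
                best_ssd, best = ssd, (dy, dx)
    return best

# Synthetic test frames (assumption): random texture shifted 2 px down, 3 px left.
rng = np.random.default_rng(0)
frame_a = rng.random((40, 40))
frame_b = np.roll(frame_a, shift=(2, -3), axis=(0, 1))

displacement = track_point_ssd(frame_a, frame_b, y=20, x=20)
print(displacement)
```

Repeating this for many points in every pair of successive frames yields one motion vector per tracked point per frame, which is the kind of raw material the later slides summarize.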

Auto-calculated values
  number of tracked motion points
  horizontal motion
  vertical motion
  length of sum of motion vectors
  sum of motion vector lengths
  length of sum of acceleration
  sum of acceleration lengths

The result

Horizontal motion
Observations (Suvi 1038/3, hm)
  In general, changes in horizontal motion map well to the phonological boundaries (cf. the method of identifying lexical signs)
  However, the boundaries are not unambiguous in all cases
  Horizontal motion of lexical sequences exhibits more variation than that of intersign transitions (cf. the traditional assumption that only lexical movements are modifiable)
  The amount of horizontal motion in reduplicated or iterated signs decreases towards the end (cf. the "disyllabicity constraint")

Vertical motion
Observations (Suvi 1038/3, vm)
  Cf. the previous observations concerning horizontal motion
  Note that the vertical motion during lexical signs is directed from top to bottom, or is a plateau
  Note that the first two signs (BOY and INDEX) are produced with a continuous downward movement
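
The seven auto-calculated values listed above can be sketched as follows for a single frame. This is an illustration under stated assumptions, not the project's code: the function name `motion_descriptors`, the dictionary keys, the choice of summing (rather than averaging) the per-point components, and the definition of acceleration as the difference between two successive motion vectors are all assumptions.

```python
import numpy as np

def motion_descriptors(p0, p1, p2):
    """Per-frame descriptors from the (x, y) positions of the same N tracked
    points in three successive frames (arrays of shape (N, 2)).

    Hypothetical helper; image coordinates are assumed, so y grows downward.
    """
    v_prev = p1 - p0          # motion vectors for the previous frame pair
    v_curr = p2 - p1          # motion vectors for the current frame pair
    acc = v_curr - v_prev     # assumed definition of per-point acceleration
    return {
        "n_points": len(p2),
        "horizontal_motion": float(v_curr[:, 0].sum()),
        "vertical_motion": float(v_curr[:, 1].sum()),
        "len_sum_motion": float(np.linalg.norm(v_curr.sum(axis=0))),
        "sum_motion_len": float(np.linalg.norm(v_curr, axis=1).sum()),
        "len_sum_acc": float(np.linalg.norm(acc.sum(axis=0))),
        "sum_acc_len": float(np.linalg.norm(acc, axis=1).sum()),
    }

# Two made-up points that move downward while drifting apart horizontally.
p0 = np.array([[0.0, 0.0], [10.0, 10.0]])
p1 = np.array([[1.0, 0.0], [9.0, 10.0]])
p2 = np.array([[4.0, 4.0], [6.0, 14.0]])

values = motion_descriptors(p0, p1, p2)
for name, value in values.items():
    print(f"{name}: {value:.3f}")
```

In this toy example the horizontal components cancel, so the length of the summed motion vector (8) is smaller than the sum of the individual lengths (10). Plotting each value against the frame number would give curves of the kind discussed in the observations.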

Number of tracked motion points
Observations (Suvi 1038/3, ntmp)
  Radical changes in the number of motion-tracked interest points (mtips) occur at the sign/transition boundaries
  Sequences with the maximal number of mtips seem to be transitions, or short sequences centering around sign/transition boundaries
  Note the large number of mtips at the end of the compound COMPUTER; also the small number of mtips during the last two signs
  Note the main levels of mtips (cf. the number of signs)

Length of sum of motion vectors
Sum of motion vector lengths
Observations (Suvi 1038/3, lsmv&smvl)
  Peaks map to transitions and to sign/transition boundaries, not to lexical signs per se
  The lowest values occur in the signs HOBBY and PLAY-JOYSTIC (cf. the muscular tension in the production of these signs)
  In general, movement in all tracked points occurs in the same direction

Length of sum of acceleration
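
The two motion-vector measures above (and, analogously, the two acceleration measures) are linked by the triangle inequality, which offers one possible reading of the observation that the tracked points move in the same direction. The notation is introduced here for illustration and does not come from the slides: $\mathbf{v}_i$ is the motion vector of tracked point $i$, $N$ is the number of tracked points, and $\rho$ is a hypothetical direction-coherence ratio.

```latex
\left\lVert \sum_{i=1}^{N} \mathbf{v}_i \right\rVert
\;\le\;
\sum_{i=1}^{N} \left\lVert \mathbf{v}_i \right\rVert ,
\qquad
\rho \;=\;
\frac{\left\lVert \sum_{i=1}^{N} \mathbf{v}_i \right\rVert}
     {\sum_{i=1}^{N} \left\lVert \mathbf{v}_i \right\rVert}
\;\in\; [0, 1]
```

Equality ($\rho = 1$) holds only when all nonzero vectors point the same way, so a "length of sum" curve that stays close to the "sum of lengths" curve would be consistent with largely uniform movement direction, whereas $\rho$ near 0 would indicate movements that cancel each other out.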

Sum of acceleration lengths
Observations (Suvi 1038/3, lsa&sal)
  Acceleration peaks map to sign/transition boundaries and to transitions
  Acceleration values within lexical signs are relatively lower than the values within transitions

DEMO 3
An experiment with a longer story
  Pessi & Illusia (frames 1-512)
  Number of tracked motion points (P&I, frames 1-10705)
  Aho & Yli-Luukko (2005:209; broad intonation units in the narrative of a South Ostrobothnian woman)

DEMO 4
A few words concerning the qualitative difference between sign-internal content movements and transitional movements
  Acceleration curve of one sentence in the P&I story
  Acceleration curve of a lexical sign in P&I (1)
  Acceleration curve of a lexical sign in P&I (2)
  Acceleration curve of a lexical sign in P&I (3)
  Acceleration curve of a transition in P&I (1)

  Acceleration curve of a transition in P&I (2)
  Acceleration curve of a transition in P&I (3)

Acceleration > perception > sonority
  "[A] visual beat is communicated by periods of acceleration or deceleration [...]" (Luck & Sloboda 2008:237)
  Visual beats = "those events that are felt to be more forcefully produced and around which the other events in the sequence are organized" (Allen & al. 1991:197)
  Sonority is perceptual salience (e.g. Ohala 1990)

Consequences for phonological theory?
  The axiom of signed syllable research has been that, in a sign stream, the most salient/sonorous events associate with lexical signs
  However, the present data suggest that this is not the case!

Issues that still need addressing
  The Z-dimension (cf., for example, the sign BOY in Suvi's example 1038/3)
  Symmetrical two-handed signs (cf. the Pessi & Illusia story)
  Option to choose the focus area

References
Aho, Eija & Yli-Luukko, Eeva (2005). Intonaatiojaksoista. Virittäjä 2/2005, 201-220.
Allen, George D.; Wilbur, Ronnie B. & Schick, Brenda B. (1991). Aspects of rhythm in ASL. Sign Language Studies 72, 297-320.
Boyes Braem, Penny (1999). Rhythmic temporal patterns in the signing of deaf early and late learners of Swiss German Sign Language. Language and Speech 42:2-3, 177-208.
Koskela, Markus; Laaksonen, Jorma; Jantunen, Tommi; Takkinen, Ritva; Rainò, Päivi & Raike, Antti (2008). Content-based video analysis and access for Finnish Sign Language - a multidisciplinary research project. In O. Crasborn, E. Efthimiou, T. Hanke, E. D. Thoutenhoofd & I. Zwitserlood (Eds.), Construction and exploitation of sign language corpora [Proceedings of the 3rd Workshop on the representation and processing of sign languages, organised as a part of the 6th Language resources and evaluation conference (LREC) at Marrakech, Morocco, June 1st, 2008], pp. 101-104. Paris: ELRA.
Luck, Geoff & Nte, Sol (2008). An investigation of conductors' temporal gestures and conductor-musician synchronization, and a first experiment. Psychology of Music 36:1, 81-99.
Ohala, J. (1990). Alternatives to the sonority hierarchy for explaining segmental sequential constraints. In M. Ziolkowski, M. Noske & K. Deaton (Eds.), CLS 26. [Papers from the 26th regional meeting of the Chicago Linguistic Society] Vol. 2, The parasession on the syllable in phonetics and phonology, 319-338. Chicago Linguistic Society, University of Chicago, Chicago, Ill.
Wilcox, Sherman (1992). The phonetics of fingerspelling. Studies in speech pathology and clinical linguistics 4. Amsterdam: John Benjamins.