Vocal Emotion Recognition
Vocal Emotion Recognition: State-of-the-Art in Classification of Real-Life Emotions (1 / 49)
October 26, 2010
Stefan Steidl, International Computer Science Institute (ICSI) at Berkeley, CA

Overview 2 / 49
1 Different Perspectives on Emotion Recognition
2 FAU Aibo Emotion Corpus
3 Own Results on Emotion Classification
4 INTERSPEECH 2009 Emotion Challenge
Overview 3 / 49
1 Different Perspectives on Emotion Recognition
  - Psychology of Emotion
  - Computer Science
2 FAU Aibo Emotion Corpus
3 Own Results on Emotion Classification
4 INTERSPEECH 2009 Emotion Challenge

Facial Expressions of Emotion 4 / 49
Universal Basic Emotions 5 / 49
- Paul Ekman postulates the existence of 6 basic emotions: anger, fear, disgust, surprise, joy, sadness
- other emotions are mixed or blended emotions
- universal facial expressions

Terminology 6 / 49
Different affective states [1], compared along seven design features: intensity, duration, synchronization, event focus, appraisal elicitation, rapidity of change, and behavioral impact:
- emotion
- mood
- interpersonal stances
- attitudes
- personality traits
[table rating each affective state from low to very high on each feature; the rating symbols are lost]
[1] K. R. Scherer: Vocal communication of emotion: A review of research paradigms, Speech Communication, Vol. 40, 2003
Terminology (cont.) 7 / 49
Definition of Emotion (Scherer): episodes of coordinated changes in several components, including at least neurophysiological activation, motor expression, and subjective feeling, but possibly also action tendencies and cognitive processes, in response to external or internal events of major significance to the organism.

Vocal Expression of Emotion 8 / 49
Results from studies in the Psychology of Emotion:
[table of changes in intensity, F0 floor/mean, F0 variability, F0 range, sentence contour, high-frequency energy, and speech/articulation rate for anger/rage, fear/panic, sadness, joy/elation, boredom, and stress; the direction markers are lost]
1 Banse and Scherer found a decrease in F0 range
2 inconclusive evidence
Goal: classification of the subject's actual emotional state (some sort of "lie detector" for emotions)
Human-Computer Interaction (HCI) 9 / 49
Emotion-Related User States:
- naturally occurring states of users in human-machine communication
- emotions in a broader sense: coordinated changes in several components NOT required
- classification of the perceived emotional state, not necessarily the actual emotion of the speaker

Pattern Recognition 10 / 49
Pattern Recognition Point of View:
- classification task: choose 1 of n given classes
- discrimination of classes rather than classification
- definition of good features
- machine classification
Actually not needed:
- definition of the term "emotion"
- information on how specific features change
Emotional Speech Corpora 11 / 49
Acted data:
- based on the Basic Emotions theory; suited for studying prototypical emotions
- corpora easy to create (inexpensive, no labeling process)
- high audio quality
- balanced classes
- neutral linguistic content (focus on acoustics only)
- high recognition results

Emotional Speech Corpora (cont.) 12 / 49
Popular corpora:
- Emotional Prosody Speech and Transcript corpus (LDC): 15 classes
- Berlin Emotional Speech Database (EmoDB): 7 classes; 89.9 % accuracy (speaker-independent LOSO evaluation, speaker adaptation, feature selection) [2]
- Danish Emotional Speech Corpus: 5 classes; 74.5 % accuracy (10-fold SCV, feature selection) [3]
[2] B. Vlasenko et al.: Combining Frame and Turn-Level Information for Robust Recognition of Emotions within Speech, INTERSPEECH 2007
[3] B. Schuller et al.: Emotion Recognition in the Noise Applying Large Acoustic Feature Sets, Speech Prosody 2006
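The speaker-independent LOSO (leave-one-speaker-out) evaluation cited above never lets the same speaker appear in both training and test data. A minimal sketch of such a split, with a toy data layout that is purely illustrative (the tuple format and speaker IDs are assumptions, not the original setup):

```python
# Leave-one-speaker-out splitting: one fold per speaker, with the
# held-out speaker's samples forming the test set of that fold.

def loso_splits(samples):
    """samples: list of (speaker_id, features, label) tuples.
    Yields (held_out_speaker, train_list, test_list) per fold."""
    speakers = sorted({spk for spk, _, _ in samples})
    for held_out in speakers:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test

# Toy data: 3 speakers, 2 samples each (features omitted for brevity).
data = [("spk1", None, "A"), ("spk1", None, "N"),
        ("spk2", None, "E"), ("spk2", None, "N"),
        ("spk3", None, "M"), ("spk3", None, "A")]

for spk, train, test in loso_splits(data):
    # No speaker overlap between train and test in any fold.
    assert all(s[0] != spk for s in train)
    print(spk, len(train), len(test))
```

With 51 speakers, as in the FAU Aibo corpus, this scheme yields the 51-fold speaker-independent cross-validation used later in the talk.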
Emotional Speech Corpora (cont.) 13 / 49
Naturally occurring emotions:
- states that actually appear in HCI (real applications)
- difficult to create (appropriate scenario needed, ethical concerns, need to label data)
- low emotional intensity; in general 80 % neutral
- low audio quality (reverberation, noise, far-distance microphones); needed for machine classification, because conditions between training and test must not differ too much
- research on both acoustic and linguistic features possible
- new research questions: optimal emotion unit
- almost no corpora large enough for machine classification available (do not exist or are not available for research)

Overview 14 / 49
1 Different Perspectives on Emotion Recognition
2 FAU Aibo Emotion Corpus
  - Scenario
  - Labeling of User States
  - Data-driven Dimensions of Emotion
  - Units of Analysis
  - Sparse Data Problem
3 Own Results on Emotion Classification
4 INTERSPEECH 2009 Emotion Challenge
The FAU Aibo Emotion Corpus 15 / 49
- 51 children (30 f, 21 m) at the age of 10 to 13
- 9.2 hours of spontaneous speech (mainly short commands)
- 48,401 words in 13,642 audio files

FAU Aibo Emotion Corpus (cont.) 16 / 49
- database for CEICES and the INTERSPEECH 2009 Emotion Challenge
- available for scientific, non-commercial use
[4] S. Steidl: Automatic Classification of Emotion-Related User States in Spontaneous Children's Speech, Logos Verlag, Berlin, 2009; available online
Emotion-Related User States 17 / 49
11 categories (prior inspection of the data before labeling): joyful, surprised, motherese, neutral, bored, emphatic, helpless, touchy/irritated, reprimanding, angry, other
- motherese: the way mothers/parents address their babies, either because Aibo is well-behaving or because the child wants Aibo to obey; positive equivalent to reprimanding
- emphatic: pronounced, accentuated, sometimes hyper-articulated way, but without showing any emotion
- reprimanding: the child is reproachful, reprimanding, "wags the finger"

Labeling of User States 18 / 49
Labeling: 5 students of linguistics, holistic labeling on the word level, majority vote
[table of words per emotion category: angry (A), touchy (T), reprimanding (R), emphatic (E, 2,xxx words), neutral (N, 39,xxx words), motherese (M, 1,xxx words), joyful (J); all: 48,401 words; the exact counts and percentages are lost]
Labeling of User States (cont.) 19 / 49
[confusion matrix of the majority vote vs. the individual labels for angry (A), touchy (T), reprimanding (R), emphatic (E), neutral (N), motherese (M), joyful (J); the cell values are lost]

Data-driven Dimensions of Emotions 20 / 49
Non-metric dimensional scaling: arranging the emotion categories in the 2-dimensional space; states that are often confused are close to each other.
[2-D plot with axes valence (negative to positive) and interaction (+/-): angry, touchy, and reprimanding on the negative side; motherese and joyful on the positive side; neutral and emphatic in between]
Units of Analysis 21 / 49
Example turns (Ohm_18_342, Ohm_18_343): "stopp Aibo geradeaus fein machst du das stopp sitz" ("stop Aibo straight ahead you are doing that well stop sit"), segmented at the word level, the chunk level, and the turn level.
Advantages/disadvantages of larger units:
+ more information
- less emotional homogeneity

Sparse Data Problem 22 / 49
Super classes:
- Anger: angry, touchy/irritated, reprimanding
- Emphatic
- Neutral
- Motherese
[2-D plot of the categories and the four super classes Anger, Emphatic, Neutral, Motherese obtained by non-metric dimensional scaling; S = 0.32, RSQ = 0.73]
Sparse Data Problem (cont.) 23 / 49
Data subsets (word, chunk, and turn sets taken from the Aibo corpus):

  data set         # words   # chunks   # turns
  Aibo corpus       48,401     18,216    13,642
  Aibo word set      6,070      4,543     3,996
  Aibo chunk set    13,217      4,543     3,996
  Aibo turn set     17,618      6,413     3,996

Overview 24 / 49
1 Different Perspectives on Emotion Recognition
2 FAU Aibo Emotion Corpus
3 Own Results on Emotion Classification
  - Results for different Units of Analysis
  - Machine vs. Human
  - Feature Types and their Relevance
4 INTERSPEECH 2009 Emotion Challenge
Most Appropriate Unit of Analysis 25 / 49
Classification: complete set of features, Linear Discriminant Analysis (LDA), 51-fold speaker-independent cross-validation

  unit of analysis   # features   # samples       average recall
  word level             265      6,070 words         67.2 %
  chunk level            700      4,543 chunks        68.9 %
  turn level             700      3,996 turns         63.2 %

Chunks are the best compromise between the length of the segment and the homogeneity of the emotional state within the segment.

Machine Classifier vs. Human Labeler 26 / 49
Entropy-based measure: the machine classifier is treated as an additional labeler; for each word, its decision is pooled with the human labels (classes A, E, N, M) and the entropy of the resulting label distribution (e.g. H_dec = 1.41) is computed. This yields an implicit weighting of classification errors depending on the word that is classified.
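The entropy-based measure can be illustrated with a small sketch: the machine decision is appended to the human labels, and the entropy of the pooled distribution shows how much the decision increases the disagreement. The label sets below are hypothetical examples, not the actual H_dec = 1.41 case from the slide:

```python
from collections import Counter
from math import log2

def label_entropy(labels):
    """Shannon entropy (in bits) of a multiset of emotion labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

human = ["A", "A", "A", "E", "N"]            # five human labelers
print(round(label_entropy(human), 2))         # → 1.37 (humans alone)

# A machine decision agreeing with the majority lowers the entropy;
# an implausible decision (here "M") raises it, so errors on words
# the humans disagree about are penalized less.
print(round(label_entropy(human + ["A"]), 2))  # → 1.25
print(round(label_entropy(human + ["M"]), 2))  # → 1.79
```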
Machine Classifier vs. Human Labeler (cont.) 27 / 49
Classification: Aibo word set
[histogram of relative frequency [%] over entropy, comparing the average human labeler with the machine classifier]
[5] S. Steidl, M. Levit, A. Batliner, E. Nöth, H. Niemann: "Of All Things the Measure is Man": Classification of Emotions and Inter-Labeler Consistency, ICASSP 2005

Evaluation of Different Types of Features 28 / 49
Types of features:
- acoustic features: prosodic features, spectral features, voice quality features
- linguistic features
Evaluation:
- Artificial Neural Networks (ANN)
- 51-fold speaker-independent cross-validation
- combination by early or late fusion
Acoustic Features: Prosody 29 / 49
Prosody: suprasegmental characteristics such as
- pitch contour
- energy contour
- temporal shortening/lengthening of words
- duration of pauses between words

Acoustic Features: Prosody (cont.) 30 / 49
Classification results: Aibo chunk set
[bar chart of average recall [%] for pauses (16 features), duration (37), energy (25), F0 (29), and all prosodic features combined]
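Prosodic features of this kind reduce a variable-length contour to a fixed-length vector of global statistics. A simplified, hypothetical sketch for an F0 contour (the real system used 107 prosodic features; the frame shift and feature names here are assumptions):

```python
# Turn an F0 contour (one value per frame, 0 = unvoiced) into a few
# global prosodic features of the kind used for emotion classification.

def f0_features(contour, frame_shift=0.01):
    voiced = [f for f in contour if f > 0]
    mean = sum(voiced) / len(voiced)
    var = sum((f - mean) ** 2 for f in voiced) / len(voiced)
    return {
        "f0_mean": mean,                        # F0 floor/mean
        "f0_std": var ** 0.5,                   # F0 variability
        "f0_range": max(voiced) - min(voiced),  # F0 range
        "voiced_ratio": len(voiced) / len(contour),
        "duration": len(contour) * frame_shift,  # seconds
    }

contour = [0, 0, 210, 220, 235, 250, 240, 0, 0, 190]  # Hz per frame
feats = f0_features(contour)
print(feats["f0_range"])      # → 60
print(feats["voiced_ratio"])  # → 0.6
```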
Acoustic Features: Spectral Characteristics 31 / 49
Classification results: Aibo chunk set
[bar chart of average recall [%] for MFCC (24 features) and formants (16), compared with prosody (107) and the best combination]

Acoustic Features: Voice Quality 32 / 49
Classification results: Aibo chunk set
[bar chart of average recall [%] for HNR (2 features), jitter/shimmer (4), and TEO (64), compared with prosody (107), MFCC (24), formants (16), and the best combination]
Acoustic Features: Combination 33 / 49
Classification results: Aibo chunk set
[bar chart of average recall [%] for prosody (107 features), MFCC (24), formants (16), jitter/shimmer (4), HNR (2), TEO (64), and the best combination]

Linguistic Features 34 / 49
Types of linguistic features:
- word characteristics: average word length (number of letters, phonemes, syllables), proportion of word fragments, average number of repetitions
- part-of-speech features
- unigram models
- bag-of-words
Linguistic Features (cont.) 35 / 49
Part-of-Speech (POS) Features: only 6 coarse POS categories can be annotated without considering context:
- nouns, proper names
- inflected adjectives
- not inflected adjectives
- present/past participles
- (other) verbs, infinitives, auxiliaries
- articles, pronouns, particles, interjections
[bar chart of the POS distribution (% of total) for Anger, Joyful, Neutral, Emphatic, Motherese, Other]

Linguistic Features (cont.) 36 / 49
Unigram Models: u(w, e) = log10( P(e|w) / P(e) )

  Anger                 P(A|w)    Emphatic        P(E|w)
  böser (bad)           29.2 %    stopp (stop)    30.5 %
  stehenbleiben (stop)  18.9 %    halt (halt)     29.3 %
  nein (no)             17.0 %    links (left)    20.5 %
  aufstehen (get up)    12.3 %    rechts (right)  18.9 %
  Aibo (Aibo)           10.1 %    nein (no)       17.6 %

  Neutral               P(N|w)    Motherese       P(M|w)
  okay (okay)           98.6 %    fein (fine)     57.5 %
  und (and)             98.5 %    ganz (very)     41.9 %
  Stück (bit)           98.5 %    braver (good)   36.0 %
  in (in)               98.2 %    sehr (very)     23.5 %
  noch (still)          96.2 %    brav (good)     21.7 %
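The unigram salience u(w, e) = log10(P(e|w) / P(e)) is positive when word w makes emotion e more likely than its prior. A minimal sketch with a tiny, invented corpus of (word, emotion) pairs (the words echo the table above, but the counts are hypothetical):

```python
from collections import Counter
from math import log10

# Toy corpus of (word, emotion) pairs; A = Anger, E = Emphatic,
# N = Neutral, M = Motherese.
corpus = [("stopp", "E"), ("stopp", "E"), ("stopp", "N"),
          ("fein", "M"), ("fein", "M"), ("okay", "N"),
          ("nein", "A"), ("nein", "E"), ("okay", "N")]

n = len(corpus)
emotions = Counter(e for _, e in corpus)   # counts for P(e)
words = Counter(w for w, _ in corpus)      # counts for P(w)
pairs = Counter(corpus)                    # counts for P(w, e)

def u(w, e):
    """u(w, e) = log10( P(e|w) / P(e) ), estimated by relative frequencies."""
    p_e_given_w = pairs[(w, e)] / words[w]
    return log10(p_e_given_w / (emotions[e] / n))

print(round(u("stopp", "E"), 2))  # → 0.3  ("stopp" is salient for Emphatic)
print(round(u("okay", "N"), 2))   # → 0.48 ("okay" is salient for Neutral)
```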
Linguistic Features (cont.) 37 / 49
Bag-of-Words:
- utterance: "Aibo, geh nach links!" (Aibo, move to the left!)
- each utterance is represented by a count vector over the vocabulary (Aibo, allen, geh, nach, links, Aibolein, ...)
- representation of the linguistic content; word order is lost
- various dimensionality reduction techniques

Linguistic Features (cont.) 38 / 49
Classification results: Aibo chunk set
[bar chart of average recall [%] for word statistics (6 features), POS (6), unigram models (16), BOW (254 reduced to 50), and the best combination]
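The bag-of-words mapping can be sketched in a few lines: build the vocabulary, then count word occurrences per utterance, discarding word order. This toy version skips the dimensionality reduction (from 254 to 50 dimensions) that the real system applied:

```python
def bag_of_words(utterances):
    """Map each utterance to a count vector over the corpus vocabulary."""
    vocab = sorted({w for u in utterances for w in u.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for u in utterances:
        vec = [0] * len(vocab)
        for w in u.split():
            vec[index[w]] += 1
        vectors.append(vec)
    return vocab, vectors

vocab, vecs = bag_of_words(["Aibo geh nach links", "Aibo stopp", "geh geh"])
print(vocab)    # → ['Aibo', 'geh', 'links', 'nach', 'stopp']
print(vecs[2])  # → [0, 2, 0, 0, 0]: only counts remain, word order is lost
```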
Combination of Acoustic and Linguistic Features 39 / 49
Classification results: Aibo chunk set
[bar chart of average recall [%] for the best acoustic features (late fusion, ANN), the best linguistic features, and their combination by late fusion (ANN) and by early fusion (LDA)]

Similar Results within CEICES 40 / 49
CEICES: Combining Efforts for Improving Automatic Classification of Emotional User States
- collaboration of various research groups within the European Network of Excellence HUMAINE
- state-of-the-art feature set with 4,000 features
- SVM (linear kernel), 3-fold speaker-independent cross-validation
- selection of 150 features (SFFS): which feature types survive?
- only chunk-based features, no information outside the Aibo chunk set
[6] A. Batliner, S. Steidl, B. Schuller, D. Seppi, T. Vogt, J. Wagner, L. Devillers, L. Vidrascu, V. Aharonson, L. Kessous, N. Amir: Whodunnit: Searching for the Most Important Feature Types Signalling Emotion-Related User States in Speech, Computer Speech and Language, Vol. 25, Issue 1 (January 2011), pp. 4-28
Similar Results within CEICES (cont.) 41 / 49
[table over feature-type groups (duration, energy, F0, spectrum, cepstrum, voice quality, wavelets, all acoustic; BOW, POS, higher semantics, varia, all linguistic; all) listing the total number of features, the number selected by SFFS, F-measure, share, and portion; the numeric entries are lost]

Overview 42 / 49
1 Different Perspectives on Emotion Recognition
2 FAU Aibo Emotion Corpus
3 Own Results on Emotion Classification
4 INTERSPEECH 2009 Emotion Challenge
INTERSPEECH 2009 Emotion Challenge 43 / 49
New goals:
- challenge with standardized test conditions
- open microphone: using the complete corpus
- highly unbalanced classes
- including all observed emotional categories
- including chunks with low inter-labeler agreement

INTERSPEECH 2009 Emotion Challenge (cont.) 44 / 49
Speaker-independent training and test sets:
- 2-class problem: NEGative vs. IDLe
  [table with the number of NEG and IDL chunks in the training and test sets; figures lost]
- 5-class problem: Anger, Emphatic, Neutral, Positive, Rest
  [table with the number of chunks per class in the training and test sets; figures lost]
INTERSPEECH 2009 Emotion Challenge (cont.) 45 / 49
Sub-Challenges:
1 Feature Sub-Challenge: optimisation of feature extraction/selection; classifier settings fixed
2 Classifier Sub-Challenge: optimisation of classification techniques; feature set given
3 Open Performance Sub-Challenge: optimisation of both feature extraction/selection and classification techniques

INTERSPEECH 2009 Emotion Challenge (cont.) 46 / 49
Participants:
[table with the number of participants per sub-challenge (Open Performance, Classifier, Feature) for the 2-class and 5-class problems; figures lost]
[7] B. Schuller, A. Batliner, S. Steidl, D. Seppi: Recognising Realistic Emotions and Affect in Speech: State of the Art and Lessons Learnt from the First Challenge, Speech Communication, Special Issue "Sensing Emotion and Affect: Facing Realism in Speech Processing", to appear
INTERSPEECH 2009 Emotion Challenge (cont.) 47 / 49
2-class problem: NEGative vs. IDLe
[bar chart of unweighted and weighted average recall [%] (y-axis 60 to 74) for the baseline, the systems of Barra-Chicote et al., Vogt et al., Bozkurt et al., Polzehl et al., Luengo et al., Dumouchel et al., Vlasenko et al., Kockmann et al., and their majority voting]

INTERSPEECH 2009 Emotion Challenge (cont.) 48 / 49
5-class problem: Anger, Emphatic, Neutral, Positive, Rest
[bar chart of unweighted and weighted average recall [%] (y-axis 35 to 55) for the baseline, the systems of Vogt et al., Barra-Chicote et al., Kockmann et al., Bozkurt et al., Lee et al., Vlasenko et al., Luengo et al., Planet et al., Dumouchel et al., and their majority voting]
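The challenge scores unweighted average recall (UAR, the mean of the per-class recalls) rather than weighted average recall (WAR, i.e. overall accuracy), because with highly unbalanced classes a trivial majority-class predictor gets a high WAR. A small sketch of the difference:

```python
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    hit, tot = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        tot[t] += 1
        hit[t] += (t == p)
    return {c: hit[c] / tot[c] for c in tot}

def uar(y_true, y_pred):
    """Unweighted average recall: mean of per-class recalls."""
    r = per_class_recall(y_true, y_pred)
    return sum(r.values()) / len(r)

def war(y_true, y_pred):
    """Weighted average recall: equals overall accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Unbalanced 2-class data: always predicting the majority class "IDL"
# scores 80 % WAR but only chance-level 50 % UAR.
y_true = ["IDL"] * 8 + ["NEG"] * 2
y_pred = ["IDL"] * 10
print(uar(y_true, y_pred), war(y_true, y_pred))  # → 0.5 0.8
```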
State-of-the-Art: Summary 49 / 49
Berlin Emotional Speech Database:
- 7-class problem: hot anger, disgust, fear/panic, happiness, sadness/sorrow, boredom, neutral
- balanced classes
- 90 % accuracy
FAU Aibo Emotion Corpus:
- 4-class problem (Anger, Emphatic, Neutral, Motherese), subset with roughly balanced classes (Aibo chunk set): 69 % unweighted average recall
- 5-class problem (Anger, Emphatic, Neutral, Positive, Rest), highly unbalanced classes, complete corpus: 44 % unweighted average recall
- 2-class problem (NEGative vs. IDLe), highly unbalanced classes, complete corpus: 71 % unweighted average recall
More informationKNOWLEDGE-BASED IN MEDICAL DECISION SUPPORT SYSTEM BASED ON SUBJECTIVE INTELLIGENCE
JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 22/2013, ISSN 1642-6037 medical diagnosis, ontology, subjective intelligence, reasoning, fuzzy rules Hamido FUJITA 1 KNOWLEDGE-BASED IN MEDICAL DECISION
More informationModule Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
More informationAnalysis of SMO and BPNN Model for Speech Emotion Recognition System
International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-4, Issue-4 E-ISSN: 2347-2693 Analysis of SMO and BPNN Model for Speech Emotion Recognition System Rohit katyal
More informationA Method for Automatic De-identification of Medical Records
A Method for Automatic De-identification of Medical Records Arya Tafvizi MIT CSAIL Cambridge, MA 0239, USA tafvizi@csail.mit.edu Maciej Pacula MIT CSAIL Cambridge, MA 0239, USA mpacula@csail.mit.edu Abstract
More informationII. RELATED WORK. Sentiment Mining
Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract
More informationApplications of speech-to-text in customer service. Dr. Joachim Stegmann Deutsche Telekom AG, Laboratories
Applications of speech-to-text in customer service. Dr. Joachim Stegmann Deutsche Telekom AG, Laboratories Contents. 1. Motivation 2. Scenarios 2.1 Voice box / call-back 2.2 Quality management 3. Technology
More informationEricsson T18s Voice Dialing Simulator
Ericsson T18s Voice Dialing Simulator Mauricio Aracena Kovacevic, Anna Dehlbom, Jakob Ekeberg, Guillaume Gariazzo, Eric Lästh and Vanessa Troncoso Dept. of Signals Sensors and Systems Royal Institute of
More informationMicroblog Sentiment Analysis with Emoticon Space Model
Microblog Sentiment Analysis with Emoticon Space Model Fei Jiang, Yiqun Liu, Huanbo Luan, Min Zhang, and Shaoping Ma State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory
More informationIntroduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics
More informationTurkish Radiology Dictation System
Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr
More informationIEEE TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. 6, NO. X, XXXXX 2015 1. Sentiment Analysis: From Opinion Mining to Human-Agent Interaction
TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. 6, NO. X, XXXXX 2015 1 Sentiment Analysis: From Opinion Mining to Human-Agent Interaction Chloe Clavel and Zoraida Callejas Abstract The opinion mining and human-agent
More informationTibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
More informationChapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
More informationAudio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA
Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA The papers at this Convention have been selected on the basis of a submitted abstract
More informationOverview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set
Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification
More informationAUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS
AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection
More informationAnnotated bibliographies for presentations in MUMT 611, Winter 2006
Stephen Sinclair Music Technology Area, McGill University. Montreal, Canada Annotated bibliographies for presentations in MUMT 611, Winter 2006 Presentation 4: Musical Genre Similarity Aucouturier, J.-J.
More information62 Hearing Impaired MI-SG-FLD062-02
62 Hearing Impaired MI-SG-FLD062-02 TABLE OF CONTENTS PART 1: General Information About the MTTC Program and Test Preparation OVERVIEW OF THE TESTING PROGRAM... 1-1 Contact Information Test Development
More informationMODELING DYNAMIC PATTERNS FOR EMOTIONAL CONTENT IN MUSIC
12th International Society for Music Information Retrieval Conference (ISMIR 2011) MODELING DYNAMIC PATTERNS FOR EMOTIONAL CONTENT IN MUSIC Yonatan Vaizman Edmond & Lily Safra Center for Brain Sciences,
More informationTechnologies for Voice Portal Platform
Technologies for Voice Portal Platform V Yasushi Yamazaki V Hitoshi Iwamida V Kazuhiro Watanabe (Manuscript received November 28, 2003) The voice user interface is an important tool for realizing natural,
More informationSocial Media Analytics Summit April 17-18, 2012 Hotel Kabuki, San Francisco WELCOME TO THE SOCIAL MEDIA ANALYTICS SUMMIT #SMAS12
Social Media Analytics Summit April 17-18, 2012 Hotel Kabuki, San Francisco WELCOME TO THE SOCIAL MEDIA ANALYTICS SUMMIT #SMAS12 www.textanalyticsnews.com www.usefulsocialmedia.com New Directions in Social
More informationThe Minor Third Communicates Sadness in Speech, Mirroring Its Use in Music
Emotion 2010 American Psychological Association 2010, Vol. 10, No. 3, 335 348 1528-3542/10/$12.00 DOI: 10.1037/a0017928 The Minor Third Communicates Sadness in Speech, Mirroring Its Use in Music Meagan
More informationMaster of Arts in Linguistics Syllabus
Master of Arts in Linguistics Syllabus Applicants shall hold a Bachelor s degree with Honours of this University or another qualification of equivalent standard from this University or from another university
More informationAnalysis and Synthesis of Hypo and Hyperarticulated Speech
Analysis and Synthesis of and articulated Speech Benjamin Picart, Thomas Drugman, Thierry Dutoit TCTS Lab, Faculté Polytechnique (FPMs), University of Mons (UMons), Belgium {benjamin.picart,thomas.drugman,thierry.dutoit}@umons.ac.be
More informationMeasuring and synthesising expressivity: Some tools to analyse and simulate phonostyle
Measuring and synthesising expressivity: Some tools to analyse and simulate phonostyle J.-Ph. Goldman - University of Geneva EMUS Workshop 05.05.2008 Outline 1. Expressivity What is, how to characterize
More informationOn Intuitive Dialogue-based Communication and Instinctive Dialogue Initiative
On Intuitive Dialogue-based Communication and Instinctive Dialogue Initiative Daniel Sonntag German Research Center for Artificial Intelligence 66123 Saarbrücken, Germany sonntag@dfki.de Introduction AI
More informationVisualization of large data sets using MDS combined with LVQ.
Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk
More informationOpen-Source, Cross-Platform Java Tools Working Together on a Dialogue System
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania oananicolae1981@yahoo.com
More informationLecture 1-10: Spectrograms
Lecture 1-10: Spectrograms Overview 1. Spectra of dynamic signals: like many real world signals, speech changes in quality with time. But so far the only spectral analysis we have performed has assumed
More informationActive Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
More informationSentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015
Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015
More informationFeature Subset Selection in E-mail Spam Detection
Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature
More informationStrand: Reading Literature Topics Standard I can statements Vocabulary Key Ideas and Details
Strand: Reading Literature Key Ideas and Details Craft and Structure RL.3.1 Ask and answer questions to demonstrate understanding of a text, referring explicitly to the text as the basis for the answers.
More informationSchool Class Monitoring System Based on Audio Signal Processing
C. R. Rashmi 1,,C.P.Shantala 2 andt.r.yashavanth 3 1 Department of CSE, PG Student, CIT, Gubbi, Tumkur, Karnataka, India. 2 Department of CSE, Vice Principal & HOD, CIT, Gubbi, Tumkur, Karnataka, India.
More informationObjective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification
Objective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification Raphael Ullmann 1,2, Ramya Rasipuram 1, Mathew Magimai.-Doss 1, and Hervé Bourlard 1,2 1 Idiap Research Institute,
More informationSOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS
SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS Bálint Tóth, Tibor Fegyó, Géza Németh Department of Telecommunications and Media Informatics Budapest University
More information