Multimodal Emotion Recognition


1 Multimodal Emotion Recognition Marco Paleari & Benoit Huet {Paleari, eurecom.fr

2 Outline Motivations Possible HCI scenarios and objectives Automatic emotion recognition Introduction Approach Audio: speech prosody Video: facial expressions Multimodal emotion recognition Conclusions 26/05/2008 Eurecom - MM Talk: Multimodal Emotion Recognition - Marco Paleari and Benoit Huet 2

3 Motivations Why Emotions?

4 Motivations Emotions are an important means of communication. For example, how something is said is sometimes more important than what is said. (Image: search results for "good morning".)

5 Motivations Emotions are an important means of communication. Art is a vehicle for the expression or communication of emotions and ideas (1). Images: Edvard Munch, The Scream; Francesco Hayez, The Kiss. (1) Jerrold Levinson, The Oxford Handbook of Aesthetics, The Oxford University Press, 2003, p. 5.

6 Motivations Human memory tends to group together events that present similar emotive meaning.

7 Possible HCI Scenarios How to use Emotions?

8 Possible HCI Scenarios Emotions could be used for indexing and retrieving media. (Figure: media annotated with emotion labels: happiness, sadness, surprise; example genre: Comedy.)

9 Possible HCI Scenarios Summarization. (Figure: a Comedy annotated with sadness, happiness, happiness, surprise, reduced to a Tailored Summary.)

10 Possible HCI Scenarios Telemedicine, Gaming, E-learning, Ubiquitous / pervasive computing.

11 Automatic Emotion Recognition Introduction

12 Automatic Emotion Recognition Different possibilities. Detect people's emotions: facial expressions; vocal prosody; autonomic nervous system signals (e.g. heart rate, skin conductivity, etc.); gestures / posture; others. Detect object emotions (music, images, etc.): colors; tempo / timbre; others.

13 Emotion Recognition From Facial Expression Duchenne 1862: first studies of facial expressions. Ekman & Friesen 1971: definition of 6 universal emotions: Anger, Disgust, Fear, Happiness, Sadness, Surprise.

14 Emotion Recognition From Facial Expression Different techniques are possible: 3D mesh mapping [Essa 1995, Cohen et al. 2000]; dense motion flow [Lien 1998, Busso 2004]; feature point movement [Lien 1998, Cohen et al. 2000, Michel & Kaliouby 2003]; facial movement energy estimation [Essa 1995].

15 Emotion Recognition From Vocal Prosody Different techniques are possible: pitch and intensity; spectral envelope (formants); harmonicity analysis; Linear Prediction Coefficients (LPC); Mel-Frequency Cepstral Coefficients (MFCC); speech rate; combination of the above [Johnstone et al. 2001, Noble 2003, Busso 2004] ~23/46% ~71%

16 Inter and Intra Individual Differences (Diagram labels: AVERAGE; inter-individual variability; intra-individual variability; psychological meaning, e.g. arousal; emotional cue, e.g. heart rate, pitch, eyebrow shape; personality or mood dependence; day dependence.) [Fiorito and Simons 1994] [Picard et al. 2001]

17 Automatic Emotion Recognition Approach

18 Experimental Setup eNTERFACE database [Martin et al. 2005]: 44 characters (male and female), each acting 6 emotions: Anger, Disgust, Fear, Happiness, Sadness and Surprise. Each emotion is played 5 times by each character, resulting in 1420 videos. Emotions are learned on 30 characters acting each emotion 4 times. Recognition is evaluated on the remainder of the database.

19 Experimental Setup

20 Audio: Speech Prosody

21 Emotion Recognition From Speech Prosody Pipeline: Speech, Feature Extraction, Feature Vector Generation, Classification. Feature extraction: Pitch; Formants; Energy; Harmonicity (Harmonics / Noise Energy Ratio); Linear Prediction Coefficients (LPC); Mel-Frequency Cepstral Coefficients (MFCC). Feature vector generation: Mean; Standard Deviation; Variance; Min (& its position); Max (& its position); [0.05, 0.25, 0.5, 0.75, 0.95] Quantiles; Polynomial Regression. Classification: Support Vector Machines; Neural Networks; Hidden Markov Models.
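The feature vector generation step listed on this slide can be sketched as follows. This is a minimal illustration with NumPy, not the authors' implementation: the function name `feature_vector`, the polynomial degree, and the choice to express min/max positions as normalized time in [0, 1] are all assumptions.

```python
import numpy as np

# Quantile levels as listed on the slide.
QUANTILES = (0.05, 0.25, 0.5, 0.75, 0.95)


def feature_vector(signal, poly_degree=2):
    """Summarize one time signal (e.g. a pitch track) as a fixed-length vector.

    poly_degree is an assumed value; the slides do not state the degree used.
    """
    x = np.asarray(signal, dtype=float)
    t = np.linspace(0.0, 1.0, len(x))  # normalized time axis (assumption)

    stats = [
        x.mean(), x.std(), x.var(),
        x.min(), t[x.argmin()],  # minimum and its position
        x.max(), t[x.argmax()],  # maximum and its position
    ]
    quantiles = np.quantile(x, QUANTILES)
    # Polynomial regression: least-squares fit, highest-order coefficient first.
    poly = np.polyfit(t, x, poly_degree)
    return np.concatenate([stats, quantiles, poly])


# Toy signal: a parabola sampled at 11 points.
toy = np.linspace(0.0, 1.0, 11) ** 2
fv = feature_vector(toy)
print(fv.shape)
```

One such vector per extracted signal (pitch, energy, each MFCC, and so on) could then be concatenated and passed to the classifier.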

22 Audio Result (Bar chart: recognition rate, 0% to 70%, for anger, disgust, fear, happiness, sadness, surprise and average. Bar labels as extracted: 36,19% 58,84% 29,16% 50,31% 38,35% 59,85% 57,47% 66,04% 37,51% 41,63% 19,89% 16,40% 21,79% 17,11%.) Result obtained with 1 NN using Polynomial & Statistical analysis and averaging results on 0.6 seconds. Result obtained with SVM using Polynomial analysis.

23 Video: Facial Expressions

24 Emotion Recognition from Facial Expressions

25 Lucas-Kanade Technique This method assumes that gray values (n x n feature window) do not change between two consecutive frames, but only shift from one position to another.
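The assumption stated on this slide leads to a small least-squares problem: linearizing "gray values only shift" gives Ix*u + Iy*v + It = 0 for every pixel of the window, and the shift (u, v) is the least-squares solution over the window. The sketch below is a single-window, single-iteration version in NumPy for illustration only; the function names, window size, and synthetic test frames are assumptions, and a practical tracker would add image pyramids and iteration.

```python
import numpy as np


def lucas_kanade_shift(frame1, frame2, center, n=21):
    """Estimate the (u, v) shift of an n x n window centered at (row, col)."""
    f1 = np.asarray(frame1, dtype=float)
    f2 = np.asarray(frame2, dtype=float)
    half = n // 2
    row, col = center
    win = (slice(row - half, row + half + 1),
           slice(col - half, col + half + 1))

    grad_y, grad_x = np.gradient(f1)   # spatial derivatives (axis 0 is y)
    ix = grad_x[win].ravel()
    iy = grad_y[win].ravel()
    it = (f2 - f1)[win].ravel()        # temporal derivative

    # Solve [ix iy] @ [u, v] = -it in the least-squares sense.
    a = np.stack([ix, iy], axis=1)
    (u, v), *_ = np.linalg.lstsq(a, -it, rcond=None)
    return float(u), float(v)


def gaussian_blob(size, cx, cy, sigma=6.0):
    """Synthetic smooth test image (hypothetical helper)."""
    y, x = np.mgrid[0:size, 0:size]
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))


# The blob moves by 0.5 pixel in x and 0.3 pixel in y between the two frames.
frame_a = gaussian_blob(41, 20.0, 20.0)
frame_b = gaussian_blob(41, 20.5, 20.3)
u, v = lucas_kanade_shift(frame_a, frame_b, center=(20, 20))
print(round(u, 2), round(v, 2))
```

The linearization only holds for shifts that are small relative to the image structure, which is why the toy example uses a sub-pixel displacement.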

26 Emotion Recognition from Facial Expressions Follow points and analyze the movement signals to extract meaningful features. (Plot: signal 1 and signal 2 over time.)

27 Emotion Recognition from Facial Expressions Pipeline: movement signals (signal 1, signal 2 over time), Feature Vector Generation, Classification. Feature vector generation: Mean; Standard Deviation; Variance; Min (& its position); Max (& its position); [0.05, 0.25, 0.5, 0.75, 0.95] Quantiles; Polynomial Regression. Classification: SVM; NN; HMM.

28 Video Result (Bar chart: recognition rate, 0% to 70%, for anger, disgust, fear, happiness, sadness, surprise and average. Bar labels as extracted: 55,40% 36,48% 34,46% 56,97% 41,43% 62,66% 42,55% 23,54% 30,71% 31,77% 33,49% 32,12% 12,91% 12,84%.) Result obtained with SVM using Polynomial analysis. 6 SVM (one for each emotion).

29 Example

30 Example

31 Example

32 Multimodal Emotion Recognition

33 Multimodal Emotion Recognition (Diagram: Feature Extraction for each modality. Feature fusion path: Feature Fusion, then Emotional Classification, then Emotional Estimate. Decision fusion path: Emotional Classification for each modality, then Fusion, then Emotional Estimate.)
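The two fusion strategies named on this slide can be contrasted in a few lines. This is a hedged sketch: the slides do not specify the fusion rule, so concatenation for feature fusion and a weighted average of per-class scores for decision fusion are assumptions, as are the function names and the example scores.

```python
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def feature_fusion(audio_features, video_features):
    """Feature-level fusion: join both feature vectors before classification."""
    return np.concatenate([np.asarray(audio_features, dtype=float),
                           np.asarray(video_features, dtype=float)])


def decision_fusion(audio_scores, video_scores, audio_weight=0.5):
    """Decision-level fusion: combine per-class scores of two classifiers.

    A weighted average is assumed here; other rules (product, max, voting)
    are equally plausible.
    """
    a = np.asarray(audio_scores, dtype=float)
    v = np.asarray(video_scores, dtype=float)
    fused = audio_weight * a + (1.0 - audio_weight) * v
    return EMOTIONS[int(np.argmax(fused))], fused


# Made-up scores: audio weakly favors anger, video strongly favors happiness.
audio = [0.40, 0.10, 0.10, 0.30, 0.05, 0.05]
video = [0.10, 0.10, 0.10, 0.60, 0.05, 0.05]
label, fused = decision_fusion(audio, video)
joint = feature_fusion([0.1, 0.2, 0.3], [1.0, 2.0])
print(label, joint.shape)
```

With feature fusion a single classifier sees the joint vector, while with decision fusion each modality keeps its own classifier and only the outputs are merged.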

34 Multimodal Result (Bar chart: recognition rate, 0% to 70%, for anger, disgust, fear, happiness, sadness, surprise and average; series: Feature Fusion, Decision Fusion. Bar labels as extracted: 54,67% 60,63% 46,77% 49,95% 25,49% 52,78% 54,61% 61,09% 54,12% 26,94% 42,24% 45,29% 20,62% 22,63%.) Result obtained with NN and data filtering (average on 0.6 sec). Decision Fusion results are obtained by averaging on 0.4 sec.
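The data filtering mentioned on this slide (averaging over a fraction of a second) can be illustrated with a moving average over per-frame class scores. This is only a sketch under assumptions: the slides do not say what exactly is averaged or at which frame rate, so the 25 fps default, the centered window, and the function name `smooth_scores` are hypothetical.

```python
import numpy as np


def smooth_scores(scores, fps=25.0, window_sec=0.6):
    """Average per-frame class scores over a centered time window.

    scores has shape (n_frames, n_classes). The window shrinks at the
    sequence borders so that no padding values are introduced.
    """
    scores = np.asarray(scores, dtype=float)
    width = max(1, int(round(fps * window_sec)))  # window length in frames
    half = width // 2
    out = np.empty_like(scores)
    for t in range(len(scores)):
        lo = max(0, t - half)
        hi = min(len(scores), t + half + 1)
        out[t] = scores[lo:hi].mean(axis=0)
    return out


# 30 frames, 2 classes: class 0 wins everywhere except one outlier frame.
raw = np.tile([0.9, 0.1], (30, 1))
raw[10] = [0.1, 0.9]
smoothed = smooth_scores(raw)
print(raw.argmax(axis=1)[10], smoothed.argmax(axis=1)[10])
```

In this toy case the isolated misclassified frame is outvoted by its neighbors, which is the effect such filtering is meant to have.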

35 Conclusion Similar results (on average over all emotions) are obtained based on audio and video. Simple decision fusion provides more improvement than simple feature fusion. More modalities should lead to improved emotion recognition accuracy.

36 Future Work Video Analysis: Facial Feature Point Detection. Classification: Hidden Markov Models. Multimodal Fusion: Temporally controlled Multimodal Fusion.

37 Multimodal Fusion (Diagram labels: Signal Quality Assessment and Feature Extraction for each modality; Control; Feature Fusion; Emotional Classification; Fusion; Emotional Estimate.)

38 Thank you for your attention
