Multi Modal Affective Data Analytics




Multi Modal Affective Data Analytics
Mykola Pechenizkiy
SDAD 2012 @ ECMLPKDD2012, 2 September 2012, Bristol, UK
http://www.win.tue.nl/stressatwork

Affective data: social media
- Social media produces masses of affective data related to people's emotions, sentiments and opinions.
- In the recent past this data was used mainly for marketing needs (web analytics, social media).
- Whatever the incentive was to study this, sentiment classification has become much more accurate.

Multilingual Sentiment Classification

Rule-based polarity detection
- Rule-based emission model with 8 kinds of rules.

SentiCorr
- How much positive and negative content do we read or write?

Mobile SentiCorr App
User reactions:
- "What a fantastic idea, now if ..."
- "Great idea! Get it on iOS soon" (anonymous)
- "This app is designed to make someone else (or a computer) read our e-mails for and protect us from WHAT??? How lazy can we get? Like someone commented on CNN reactions, if we are getting upset by the tones and/or scoldings in e-mails, we certainly have bigger issues that need to be dealt with. C'mon, guys, go invent something useful. Not to mention, does it detect irony? Will it weed out the liars? Pleeeeezzzeeee, what a WASTE OF SOMEONE'S COLLEGIATE TIME AND ENERGY. Don't we have houses to clean and poor people to feed and old folks to help with their shopping? Go do something useful with your time, inventors of this app!!!"
A counterpoint:
- "Stress is often made worse by anticipation of an unpleasant event and actually dissipated once you tackle the problem directly." (Pamela Briggs, British Psychological Society)

OLAP-Style Exploration of Data Summaries

Exploration of Individual Cases, e.g. E-Mails

Sentiment vs. Fact Classification
- News in media or business is considered sentiment neutral, but it often contains positive or negative information.
- Example: "You will be fired in 3 months because of the serious budget cuts." carries no sentiment, but it is negative information.
- Similarly, work-related correspondence can contain stressful information. How can we identify it?

Sentiment discovery: State of the art
- Sentiment analysis/classification is mature: commercial products, free services, open source, a variety of apps. It evolves in many directions.
- Several great overviews:
  - Sentiment Analysis in Practice, ICDM2011 tutorial by Tiger Zhang (eBay Research Labs): http://web.cs.dal.ca/~yongzhen/publication/paper/icdm2011_sentimentanalysisinpracticetutorial.pdf
  - Modeling Opinions and Beyond in Social Media by Bing Liu (UIC): http://kdd2012.sigkdd.org/sites/images/summerschool/bing Liu.pptx

Outline
- Framework for Stress Analytics: data management, OLAP support, shape-based query by example
- Stress detection from speech and galvanic skin response (GSR): predictive features and classification, from controlled experiments to real life

What is stress? Is it a bad thing?

Stress in NL according to Coosto.nl: not really job related

Impact of Stress at Work
- WHO: by 2020 the top 5 diseases will be stress related.
- USA: health care expenditures are ~50% greater for workers who report high levels of stress at work (J. Occup. Env. Med, 40:843-854).
- The Netherlands (TNO, 2006; TU/e Cursor 2012):
  - The direct costs of stress are 4 billion euro per year.
  - Every year 150,000 to 300,000 employees become ill because of stress at work.
  - 1/7 of disability cases are because of stress at work.
  - At TU Delft, 53% of surveyed students indicated that they experienced huge stress during their studies.

What do organizations try (not) to do?
- Reduce workload (33%)
- Discuss psychological load (28%)
- Change work processes (17%)
- Improve work/life balance (14%)
- Improve managers' skills (13%)
- Extend regulations (9%)
Source: TNO, dossier Werkdruk

What can go wrong?
- People are not always aware of the problem or don't know the exact cause.
- People do not always want to share what they experience with others.
- Help is not always timely enough.
- Meetings with psychologists and interventions are expensive to organize.
- The individual causes are different and not always well understood.
- Giving practical advice is not trivial.

Types of Stress and Stressors
Different types of stress:
- Survival stress: a response to a physical danger
- Environmental stress: noise, crowding, pressure from work or family
- Internal stress: worrying about things we can't control; putting ourselves in situations we know will cause us stress (addicted to stress: expanding the to-do list with more and more conference deadlines)
- Fatigue and overwork: in a long-term perspective
Stress affects both body and mind.

Types of Stress and Stressors
Three kinds of stress:
- Acute: caused by an acute short-term stress factor.
- Episodic acute: occurs more frequently and periodically.
- Chronic: caused by long-term stress factors; harmful.
Factors causing stress@work: long work hours, work overload, time pressure, difficult, demanding or complex tasks, high responsibility, lack of breaks, lack of training, conflicts, underpromotion, job insecurity, lack of variety, and poor physical work conditions (limited space, temperature and lighting conditions).

Concept: Beep!

StressAnalytics
Make people aware of their stress and stressors:
- Overview of stressors
- Exploration of relations
- Access to evidence, i.e. annotated, measured stress
- Empowerment by awareness (+ implicit/explicit advice)

Our approach to StressAnalytics
- What, When, Where, with Whom
- Physiological signs
- Pattern mining
- OLAP cube

Our approach to Stress Analytics
- Make a person aware of what is happening: how they spend their time, and when and from where the stress comes in.
- Provide valuable input for pattern mining/knowledge discovery: much richer data sources.
- Visual analytics: interactive exploration of stress-related data; collecting subjective data/labels from a person through the interaction.

- GUI: exploration, interaction, visual analytics
- OLAP: zoom in & out, slice & dice
- Pattern mining, prediction, query by example
- Data mining: feature extraction, peak/change detection, classification
- Raw data, objective evidence:
  - External environment: temperature, lighting, noise, air conditioning
  - External user-related data: KPI, e-mail, calendar, social media, news
  - Physiological signs: GSR, temperature, voice, heart rate, facial expressions

Evidence: physiological signals & external sources
GSR, temperature, speech, facial expressions, sentiment in text

Alignment of Information Sources
- What the person reads and writes: SentiCorr
- What the person does in general, according to the agenda
- Environment context (lighting, noise, temperature, etc.)
- Annotated data from video, sound, text processing and vital signs
- What the person does with the computer: http://wakoopa.com/
Each source brings different aspects of pre-processing, storing and managing.

Stress Data Cube/OLAP
Quick data summaries with respect to predefined dimensions
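A minimal sketch of what "quick data summaries with respect to predefined dimensions" can look like in code. The dimension names (what, when, where), the `stress` measure and the helper names `rollup` and `slice_facts` are assumptions made for illustration; the slides do not specify the actual cube schema.

```python
from collections import defaultdict

def rollup(facts, dims, measure="stress"):
    """Average the measure over every combination of the chosen dimensions."""
    groups = defaultdict(list)
    for row in facts:
        groups[tuple(row[d] for d in dims)].append(row[measure])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

def slice_facts(facts, **fixed):
    """Keep only the facts whose dimensions match the fixed values (slice/dice)."""
    return [r for r in facts if all(r[d] == v for d, v in fixed.items())]

# Hypothetical fact rows: one row per annotated activity.
facts = [
    {"what": "meeting", "when": "Mon", "where": "office", "stress": 1.0},
    {"what": "meeting", "when": "Tue", "where": "office", "stress": 0.5},
    {"what": "email", "when": "Mon", "where": "home", "stress": 0.25},
]

by_activity = rollup(facts, ["what"])                        # zoom out
monday_by_place = rollup(slice_facts(facts, when="Mon"), ["where"])  # slice, then summarize
```

Zooming in corresponds to adding dimensions to `dims`, zooming out to removing them.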

Stress Analytics Visualization
- OLAP-style exploration: selecting multiple dimensions, zoom in, zoom out.
- Navigating to the evidence, i.e. raw data: GSR, skin temperature, speech and e-mail.
- Shape-based time series similarity search. State of the art: UCR Suite (Keogh et al.)
- Demo: http://www.win.tue.nl:8080/saw_analytics/stress_visualization.jsp

OLAP system, a Star Schema

Shape-Based Query by Example
- Query: a subsequence s of a GSR time series.
- Result: time series with a shape similar to s.

Shape-based QBE: distance measures
- Euclidean distance
- Dynamic Time Warping (DTW)
- State of the art: UCR Suite (Keogh et al.)
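The two distance measures can be sketched as follows, together with a brute-force subsequence search. This is an illustration only: it uses textbook DTW with an absolute-difference cost and no warping window, z-normalizes each candidate window, and has none of the pruning techniques of the UCR Suite. The helper names `znorm` and `best_match` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Dynamic Time Warping distance (absolute-difference cost, no window)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)          # row 0 of the cost matrix
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # extend the cheapest of: insertion, match, deletion
            cur[j] = abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[-1]

def znorm(s):
    """Rescale to zero mean and unit variance so that only the shape matters."""
    mean = sum(s) / len(s)
    sd = math.sqrt(sum((x - mean) ** 2 for x in s) / len(s))
    return [(x - mean) / sd if sd > 0 else 0.0 for x in s]

def best_match(series, query, dist=dtw):
    """Return (distance, start index) of the window most similar to the query."""
    q = znorm(query)
    m = len(query)
    return min((dist(znorm(series[i:i + m]), q), i)
               for i in range(len(series) - m + 1))
```

Passing `dist=euclidean` gives the cheaper measure; DTW additionally tolerates local stretching of the shape along the time axis.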

How to measure stress
Determine the stress level based on observed sweat production.

Detection and Categorization of Stress
Based on GSR data alone, this is not as easy as the figure may suggest.

Challenges in Stress Detection
- All kinds of noise, e.g. losing contact with the skin
- Activity (exercising), environment (cold/hot), context and personal differences may impact the GSR we observe

Interpretation isn't straightforward

Detection as Classification
GSR features:
- Mean, SD, min and max of GSR
- Mean, min and max of peak height
- Total number of GSR responses
- Sum of GSR amplitudes
- Sum of response rising times
- Sum of response energies
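A sketch of how such a feature vector could be computed from one GSR window. The slides do not define a "response", so the definitions here are assumptions: a response is a rise from a local minimum (onset) to the next local maximum (peak) of at least `min_amplitude`, its amplitude is peak minus onset, its rising time is the onset-to-peak duration, and its energy is the sum of squared onset-relative values over the rise. The sampling rate `fs` and the threshold value are hypothetical parameters.

```python
import statistics

def gsr_features(signal, fs=1.0, min_amplitude=0.05):
    """Summary and response-level features of one GSR window (assumed definitions)."""
    responses = []                     # (onset index, peak index) pairs
    onset, rising = 0, False
    for i in range(1, len(signal)):
        if signal[i] > signal[i - 1]:
            if not rising:
                onset, rising = i - 1, True
        elif signal[i] < signal[i - 1] and rising:
            rising = False
            if signal[i - 1] - signal[onset] >= min_amplitude:
                responses.append((onset, i - 1))
    if rising and signal[-1] - signal[onset] >= min_amplitude:
        responses.append((onset, len(signal) - 1))   # rise still open at window end

    heights = [signal[p] for _, p in responses]
    return {
        "mean": statistics.fmean(signal),
        "sd": statistics.pstdev(signal),
        "min": min(signal),
        "max": max(signal),
        "peak_height_mean": statistics.fmean(heights) if heights else 0.0,
        "peak_height_min": min(heights, default=0.0),
        "peak_height_max": max(heights, default=0.0),
        "n_responses": len(responses),
        "amplitude_sum": sum(signal[p] - signal[o] for o, p in responses),
        "rise_time_sum": sum((p - o) / fs for o, p in responses),
        "energy_sum": sum(
            sum((signal[k] - signal[o]) ** 2 for k in range(o, p + 1)) / fs
            for o, p in responses
        ),
    }
```

The resulting dictionary can be flattened into a fixed-order vector and passed to any of the classifiers discussed later.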

Adding more data to disambiguate
Skin and room temperature, noise, accelerometer, voice, face

E.g. activity recognition can help
- Writing vs. typing vs. walking vs. teaching vs. ...
- Analyzing accelerometer data only (wristband)

Uncontrolled and semi-controlled settings
- Philips Research employees wearing the device during their working hours
- Students taking written and multiple-choice exams
- Students presenting demos/posters with course project results
- More to come via HumanCapitalCare

Experiment demo

Measuring GSR in (un)controlled settings
- Philips prototype
- Self-made device based on LEGO Mindstorms NXT

Multi-Source Affective Data Classification
- Stress/emotion classification from text, GSR & speech
- Facial expression analysis
- GSR & other sensors

Automatic Stress Detection
- Speech model: speech -> speech features -> classification
- GSR model: GSR -> GSR features -> classification
- Feature enrichment: speech features + GSR features -> combine features -> classification
- Ensemble learning: speech features -> classification, GSR features -> classification, then ensemble

Stress and Skin Conductance
- Stress -> changes in the autonomic nervous system (ANS) -> activation of sweat glands -> changes in the amount of produced sweat -> changes of skin conductance
- Relaxed: skin is drier, skin conductance is lower
- Stressed: sweat increases, skin conductance is higher

Change detection approach: online settings
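In an online setting the detector sees one GSR sample at a time. The sketch below illustrates the thresholding idea under assumed details that the slides do not give: the baseline is an exponentially weighted running mean, a change is flagged when a sample exceeds it by more than `threshold`, and the baseline restarts at the new level. The parameter values are hypothetical, and ADWIN is not shown.

```python
def threshold_detector(stream, threshold, alpha=0.1):
    """Yield the indices at which a sample jumps above the running baseline."""
    baseline = None
    for i, x in enumerate(stream):
        if baseline is None:
            baseline = x                      # first sample initializes the baseline
        elif x - baseline > threshold:
            yield i                           # upward change detected
            baseline = x                      # restart the baseline at the new level
        else:
            baseline += alpha * (x - baseline)  # slowly track drift

# A flat signal followed by a step up is flagged once, at the step.
changes = list(threshold_detector([1.0] * 5 + [2.0] * 5, threshold=0.5))
```

Because it is a generator, the same function works on a live sensor stream as well as on a recorded list.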

Preprocessing steps

Stress and Speech
- Under stress the respiration rate increases, subglottal pressure increases and pitch increases.
- Voice is a good indicator of stress [Scherer, 1986].

Speech Features: voiced and unvoiced speech

Speech Features: pitch / fundamental frequency

Speech Features: Mel Frequency Cepstral Coefficients (MFCCs)
- MFCCs are coefficients that approximate the human auditory response.
- Pipeline: audio (temporal) -> FFT -> frequency -> mel-scale filter -> filtered frequency -> log power -> DCT -> MFCCs
- Store the first coefficients of the DCT representation.
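The MFCC pipeline can be sketched for a single frame in plain Python. Several details are assumptions, since the slides only name the stages: a Hamming window, the common mel formula 2595·log10(1 + f/700), 20 triangular filters, an unnormalized DCT-II, and 12 stored coefficients. A naive DFT stands in for the FFT to keep the example dependency-free; a real system would use an FFT library.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-windowed power spectrum via a naive DFT (stand-in for the FFT)."""
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(w))) ** 2 / n
            for k in range(n // 2 + 1)]

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced evenly on the mel scale from 0 to fs/2."""
    top = hz_to_mel(fs / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / fs) for f in edges]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):
            row[k] = (k - lo) / (mid - lo)     # rising edge
        for k in range(mid, hi):
            row[k] = (hi - k) / (hi - mid)     # falling edge
        bank.append(row)
    return bank

def mfcc(frame, fs, n_filters=20, n_coeffs=12):
    """MFCCs of one frame: power spectrum -> mel filters -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), fs)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(n_coeffs)]
```

For example, `mfcc(frame, 8000)` on a 256-sample frame returns 12 coefficients that can be appended to the pitch features of the same frame.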

Classification Methods
- Support Vector Machine (SVM): state of the art.
- Decision tree classifier.
- K-means using Vector Quantization (VQ): chosen as a baseline.
- Gaussian Mixture Model (GMM): works well for the speaker recognition task.
- Change detectors: ADWIN, thresholding.

Stress Dataset
Three types of GSR patterns: first, second and third type.

Aligning of data sources
[Figure: GSR and speech streams on a shared time axis marked "60 seconds", each split into Instance 1, Instance 2, Instance 3.]
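One way to produce aligned instances is to cut both streams into the same fixed windows and keep only the windows covered by both sources. The sketch assumes 60-second windows and timestamps in seconds on a common clock; the slide shows the 60-second mark but does not state these details, and the function names are hypothetical.

```python
from collections import defaultdict

def window_instances(samples, window_s=60.0, start=0.0):
    """Group (timestamp_seconds, value) pairs into consecutive fixed windows."""
    windows = defaultdict(list)
    for t, value in samples:
        if t >= start:
            windows[int((t - start) // window_s)].append(value)
    return dict(windows)

def align(gsr, speech, window_s=60.0):
    """Return (window index, GSR values, speech values) for windows both sources cover."""
    g = window_instances(gsr, window_s)
    s = window_instances(speech, window_s)
    return [(k, g[k], s[k]) for k in sorted(g.keys() & s.keys())]
```

Each returned tuple is one instance: features are then extracted separately from its GSR part and its speech part.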

Stress Dataset: Speech Features

Stress Model using GSR features
[Figure: bar chart of accuracy (percent), 10-times 10-fold CV (not subject independent), for k-means, GMM, SVM and Decision Tree on three tasks: recovery vs workloads, recovery vs heavy workload, light vs heavy workload.]
- SVM outperformed the other methods.
- Recognizing light vs heavy workload is harder than recovery vs heavy workload.

Stress Model using speech features
[Figure: bar chart of accuracy (percent) for k-means, GMM, SVM and Decision Tree on four feature sets: Pitch, MFCC, MFCC + Pitch, RASTA.]
- SVM outperforms the other classifiers.
- K-means and GMM do not perform well for speech.
- MFCC is a good indicator for stress detection.

1-subject-leave-out cross validation (subject-independent model)
[Figure: two bar charts of accuracy (percent) comparing 10-times 10-fold CV with 1-subject-leave-out CV. Left panel, GSR tasks: recovery vs workloads, recovery vs heavy workload, light vs heavy workload. Right panel, speech features: Pitch, MFCC, MFCC + Pitch, RASTA PLP.]
- It is better to address the problem of stress detection using a subject-dependent model.
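The difference between the two protocols lies only in how the folds are built: in 1-subject-leave-out CV all instances of one person form the test set, so the model never sees that person during training. A minimal sketch, where `fit` and `predict` are hypothetical placeholders for any classifier:

```python
def leave_one_subject_out(subjects):
    """Yield (held-out subject, train indices, test indices), one fold per subject."""
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        yield held_out, train, test

def evaluate(features, labels, subjects, fit, predict):
    """Mean accuracy over the leave-one-subject-out folds."""
    accuracies = []
    for _, train, test in leave_one_subject_out(subjects):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, features[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)
```

Plain 10-fold CV, by contrast, mixes instances of the same person across training and test folds, which is why it is marked "not subject independent" above.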

Fusion Approaches
- Feature enrichment
- Ensemble learning

Fusion of GSR and Speech
[Figure: bar chart of accuracy (percent) for three feature combinations (MFCC and GSR, MFCC + Pitch and GSR, Pitch and GSR) under two fusion approaches: enriching the feature space, and logistic regression as meta-learner. Task: light vs. heavy workload, balanced data.]
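The two fusion approaches differ in where the sources meet. Feature enrichment concatenates the feature vectors before a single classifier; ensemble learning trains one classifier per source and lets a meta-learner combine their outputs. The sketch below assumes the base classifiers output a stress probability each and fits a small logistic regression on those probabilities by batch gradient descent; the learning rate, epoch count and function names are illustrative choices, not the authors' setup.

```python
import math

def enrich(speech_features, gsr_features):
    """Feature enrichment: one joint feature vector per instance."""
    return list(speech_features) + list(gsr_features)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta(base_probs, labels, lr=0.5, epochs=2000):
    """Logistic regression meta-learner over base-classifier probabilities."""
    n_inputs = len(base_probs[0])
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_inputs, 0.0
        for x, y in zip(base_probs, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        w = [wi - lr * g / len(labels) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(labels)
    return w, b

def predict_meta(model, probs):
    """Combine one instance's base probabilities into a 0/1 stress decision."""
    w, b = model
    return int(sigmoid(b + sum(wi * xi for wi, xi in zip(w, probs))) >= 0.5)
```

In practice the meta-learner should be fitted on base-classifier outputs for held-out instances, otherwise it learns from overconfident predictions.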

Kappa Agreement for Classifiers
- Measure the agreement between two models using Cohen's kappa.
- Kappa = 1: complete agreement.
- Kappa = 0: agreement no better than chance (negative values mean less agreement than chance).
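Cohen's kappa compares the observed agreement of two prediction lists with the agreement expected by chance from their label frequencies. A minimal sketch (the function name is an illustrative choice):

```python
from collections import Counter

def cohens_kappa(pred_a, pred_b):
    """Cohen's kappa between two equally long lists of predicted labels."""
    n = len(pred_a)
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    count_a, count_b = Counter(pred_a), Counter(pred_b)
    # chance agreement: probability that both pick the same label independently
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0          # both models always predict the same single label
    return (observed - expected) / (1.0 - expected)
```

Applied to the speech-based and GSR-based predictions on the same instances, a value near 0 indicates that the two models make largely unrelated decisions.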

Stress detection summary
- Speech is more reliable (in lab settings) than GSR, but more subject dependent.
- SVM performs better on both GSR and speech signals.
- ADWIN & thresholding detectors do well on GSR.
- Combining GSR and speech is not trivial:
  - Speech and GSR predictions are highly independent (low kappa value).
  - This diversity may be exploited with dynamic integration methods.

Further directions
- Extend the notion of stress (positive and negative) in the stress analytics framework: stress analytics -> affective data analytics.
- Collect more data to enable the OLAP and KDD part of the framework.
- Combine with other signals, such as facial expression, heart rate, nutrition.
- Long path from lab setting to real-life situation, but both are needed.

Is Acute Stress Good or Bad?

What is the Relaxation Then?

Is Normal Condition Good or Bad?
What if someone's pattern looks like NNNNNNNNNNNNNNNN?

Summary
The fun parts come from:
- The fact that not much is known about stress
- Playing with heterogeneous/multi-modal data
- Multi-disciplinary work (data collection, data management, data mining, visual analytics)
- Engineering approach to data mining
- How to show the utility, i.e. that what we do helps to better understand stress as a phenomenon, and the stressors, and how it helps people in the end

Take-home messages
- Lab settings vs. real world
- Availability and quality of the signal: voice recorded, someone else's voice recorded
- Noise and missing data, uncertainty: a person cannot speak (during a meeting while someone else is speaking)
- Ground truth, labels, subjective vs. objective
- A large problem space
- If you know how to help us with any part of StressAnalytics, talk to me