Using Crowdsourcing for Labelling Emotional Speech Assets




Using Crowdsourcing for Labelling Emotional Speech Assets
Alexey Tarasov, Charlie Cullen, Sarah Jane Delany
Digital Media Centre, Dublin Institute of Technology
W3C Workshop on Emotion Language Markup - Oct 2010

2 Project Introduction
- Science Foundation Ireland funded project
- Objective: prediction of levels of emotion in natural speech
- 2 strands:
  - acoustic analysis (Dr Charlie Cullen - DIT MmIG)
  - machine learning prediction (Dr Sarah Jane Delany - DIT AIG)
- 4-year project, started in October 2009
- 2 PhD students

3 Requirements for Supervised Learning
- Performance of supervised learning techniques depends on the quality of the training data
- Requirements:
  - high-quality speech assets
  - good labels

4 Starting Point...
- Emotional speech corpus [Cullen et al. LREC 08]
  - natural assets
  - use of Mood Induction Procedures
  - high-quality recording: participants recorded in separate sound isolation booths
  - contextual or metadata recorded where available
  - based on the IMDI annotation schema

5 Next Steps...
- Need to rate these assets...
- Challenges:
  - manual annotation can be expensive and time-consuming
  - experts often disagree
  - expertise does not necessarily correlate with experience
- Consider crowdsourcing?

6 Crowdsourcing
"The act of taking a task traditionally performed by a designated agent and outsourcing it to an undefined, generally large group of people in the form of an open call" [Jeff Howe]

7 Crowdsourcing
- June 2006 Wired magazine article by Jeff Howe: "...the power of many..."
- www.wired.com/wired/archive/14.06/crowds.html

8 https://www.mturk.com/mturk/

9 www.google.com/recaptcha

10 www.gwap.com/gwap/


12 Crowdsourcing
- Triggered a shift in the way labels or ratings are obtained in a variety of domains:
  - natural language tasks [Snow et al. 2008]
  - computer vision [Sorokin & Forsyth 2008, von Ahn & Dabbish 2004]
  - sentiment analysis [Hsueh et al. 2008, Brew et al. 2010]
  - machine translation [Ambati et al. 2010]

13 Practical Experiences
- Speed
  - 300 annotations from each of 10 annotators in < 11 mins [Snow et al. 2008]
  - evidence that obtaining quality annotations affects completion time (avg 4 mins vs 1.5 mins) [Kittur et al. 2008]

14 Practical Experiences
- Quality
  - 875 expert-equivalent affect labels per $1 [Snow et al. 2008]
  - by identifying good annotators, accurate labels can be achieved with a significant reduction in effort [Donmez et al. 2008, Brew et al. 2010]

15 Challenges
- How to:
  - select which assets are presented for rating?
  - estimate the reliability of the annotators?
  - ensure the reliability of the ratings?
  - select training data for the prediction systems?
  - maintain the balance between consensus and data coverage?

16 Asset Selection
- Active learning, used by [Ambati et al. 2010, Donmez et al. 2009] (see the sketch below)
  - a supervised learning technique which selects the most informative examples for annotation
- Clustering, used by [Brew et al. 2010]
  - grouping examples and selecting representative examples from each cluster to annotate
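As a rough illustration (not part of the original slides), the two selection strategies could be sketched as follows. The feature matrices, the LogisticRegression model, and the n_to_label budget are hypothetical placeholders standing in for whatever acoustic features and prediction model the project actually uses.

```python
# Sketch only: uncertainty-based active learning and cluster-based selection
# for choosing which unlabelled speech assets to send for crowd rating.
# All names (unlabelled_X, n_to_label, ...) are illustrative placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def select_by_uncertainty(model, unlabelled_X, n_to_label):
    """Pick the assets the current model is least confident about."""
    probs = model.predict_proba(unlabelled_X)      # shape: (n_assets, n_classes)
    confidence = probs.max(axis=1)                 # confidence of the top class
    return np.argsort(confidence)[:n_to_label]     # least confident first

def select_by_clustering(unlabelled_X, n_to_label):
    """Pick one representative asset per cluster (closest to the centroid)."""
    km = KMeans(n_clusters=n_to_label, n_init=10).fit(unlabelled_X)
    chosen = []
    for c, centre in enumerate(km.cluster_centers_):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(unlabelled_X[members] - centre, axis=1)
        chosen.append(members[np.argmin(dists)])
    return np.array(chosen)

# Example usage with stand-in data:
# X_lab, y_lab, X_unlab = ...   # acoustic feature vectors and known labels
# model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
# to_rate = select_by_uncertainty(model, X_unlab, n_to_label=20)
```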

17 Annotator Reliability
- Depends on whether annotators are identifiable or not...
- Strategies for recognising strong annotators:
  - good annotators: those that agree with the consensus rating [Brew et al. 2010] (see the sketch below)
  - iterative approach to filter out weaker annotators [Donmez et al. 2009]
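A minimal sketch of the consensus-agreement idea, assuming a hypothetical nested-dictionary layout for the collected ratings; this is an illustration, not the implementation used in [Brew et al. 2010].

```python
# Sketch only: score annotators by how often their rating matches the
# per-asset consensus (majority) rating. Hypothetical data layout:
# ratings = {asset_id: {annotator_id: rating, ...}, ...}
from collections import Counter, defaultdict

def consensus(asset_ratings):
    """Majority rating for one asset (ties broken arbitrarily)."""
    return Counter(asset_ratings.values()).most_common(1)[0][0]

def annotator_agreement(ratings):
    """Fraction of each annotator's ratings that agree with the consensus."""
    agree, total = defaultdict(int), defaultdict(int)
    for asset_ratings in ratings.values():
        majority = consensus(asset_ratings)
        for annotator, rating in asset_ratings.items():
            total[annotator] += 1
            agree[annotator] += int(rating == majority)
    return {a: agree[a] / total[a] for a in total}

# Annotators below some agreement threshold (e.g. 0.5) could be
# down-weighted or filtered out before deriving final labels.
```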

18 Good Annotators are Useful...
- high-consensus assets good for training... [Brew et al. 2010]

19 Deriving Ratings
- Use consensus rating [Brew et al. 2010] (see the sketch below)
  - select the rating with the highest consensus
  - thresholds can apply
- Only use good annotators to derive the rating [Donmez et al. 2009]
- Use learning techniques to estimate the ground truth from multiple noisy labels [Smyth et al. 1995, Raykar et al. 2009/10]
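Building on the agreement scores above, a thresholded consensus could be derived roughly as follows; the min_score and min_agreement parameters are illustrative assumptions, not values from the slides.

```python
# Sketch only: derive a single rating per asset by consensus, optionally
# requiring a minimum level of agreement and restricting the vote to
# "good" annotators (agreement scores from the previous sketch).
from collections import Counter

def derive_rating(asset_ratings, annotator_scores=None,
                  min_score=0.0, min_agreement=0.5):
    """Return the consensus rating, or None if agreement is too low."""
    kept = [r for a, r in asset_ratings.items()
            if annotator_scores is None or annotator_scores.get(a, 0) >= min_score]
    if not kept:
        return None
    rating, votes = Counter(kept).most_common(1)[0]
    return rating if votes / len(kept) >= min_agreement else None

# Assets that fail the agreement threshold could be sent back to the
# crowd for more ratings rather than being used as training data.
```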

20 Consensus vs. Coverage
- Is it better to label more assets or to get more labels per asset?
- Research suggests fewer annotations per asset are needed in domains with high consensus [Brew et al. 2010]

21 Reliability of the Ratings
- Evidence of gaming with crowdsourcing services
  - the number of untrustworthy users is not large
- Techniques:
  - require users to complete a test first [Ambati et al. 2010]
  - use the percentage of previously accepted submissions [Hsueh et al. 2008]
  - include explicitly verifiable questions [Kittur et al. 2008] (see the sketch below)
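The "explicitly verifiable questions" technique might be approximated as below; the data layout, function name, and 0.8 threshold are assumptions for illustration, not details from [Kittur et al. 2008].

```python
# Sketch only: filter out submissions that fail explicitly verifiable
# ("gold") questions mixed in with the real rating tasks.
def passes_gold_check(submission, gold_answers, min_correct=0.8):
    """submission: {question_id: answer}; gold_answers: {question_id: correct answer}."""
    checked = [qid for qid in submission if qid in gold_answers]
    if not checked:
        return True  # no gold questions in this batch, nothing to verify
    correct = sum(submission[qid] == gold_answers[qid] for qid in checked)
    return correct / len(checked) >= min_correct

# Submissions that fail could be rejected or re-queued, and repeat
# offenders excluded from future rating tasks.
```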

22 Use Case
Seán has a set of speech assets extracted from recordings of experiments using mood induction procedures. He wants to get these assets rated on a number of different scales, including activation and evaluation, by a large number of non-expert annotators. He wants to use a micro-task system such as Mechanical Turk to get these ratings. Active learning will be used to select the most appropriate assets to present to the annotators for labelling. He will then analyse and evaluate different techniques for identifying good annotators and for determining consensus ratings for the assets, which will be used as training data for developing prediction systems for emotion recognition.

23 Experience in Our Group
- Preliminary rating using crowdsourcing [Brian Vaughan]
- Findings:
  - clear instructions
  - asset selection strategy
  - payment amounts

24 References
- V. Ambati, S. Vogel, and J. Carbonell. Active Learning and Crowd-Sourcing for Machine Translation. In Proc. of LREC'10, pages 2169-2174, 2010.
- A. Brew, D. Greene, and P. Cunningham. Using Crowdsourcing and Active Learning to Track Sentiment in Online Media. In Proc. of PAIS 2010, pages 1-11. IOS Press, 2010.
- J. Howe. Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business. Crown Business, 2008.
- A. Kittur, E. Chi, and B. Suh. Crowdsourcing for Usability: Using Micro-Task Markets for Rapid, Remote, and Low-Cost User Measurements. In Proc. of CHI 2008.
- V.C. Raykar, S. Yu, L.H. Zhao, A. Jerebko, C. Florin, G.H. Valadez, L. Bogoni, and L. Moy. Supervised Learning from Multiple Experts: Whom to Trust When Everyone Lies a Bit. In Proc. of ICML 2009, pages 889-896, 2009.
- V.C. Raykar, S. Yu, L.H. Zhao, G.H. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from Crowds. Journal of Machine Learning Research, 11:1297-1322, 2010.
- P. Smyth, U. Fayyad, M. Burl, P. Perona, and P. Baldi. Inferring Ground Truth from Subjective Labelling of Venus Images. Advances in Neural Information Processing Systems, 7:1085-1092, 1995.
- R. Snow, B. O'Connor, D. Jurafsky, and A.Y. Ng. Cheap and Fast - But Is It Good? Evaluating Non-Expert Annotations for Natural Language Tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 254-263. ACL, 2008.
- A. Sorokin and D. Forsyth. Utility Data Annotation with Amazon Mechanical Turk. In Proc. of CVPR 2008, pages 1-8, 2008.
- L. von Ahn and L. Dabbish. Labeling Images with a Computer Game. In Proc. of CHI 2004.