Transcription bottleneck of speech corpus exploitation

1 Transcription bottleneck of speech corpus exploitation
Caren Brinckmann, Institut für Deutsche Sprache, Mannheim, Germany
Lesser Used Languages and Computer Linguistics (LULCL) II, Nov 13/14, 2008, Bozen

2 Overview
- Introduction
- Written corpora vs. speech corpora
- Speech corpus annotation
- Transcription bottleneck
- Crowdsourcing the orthographic transcription
- Automatic broad phonetic alignment
- Query-driven annotation
- Summary

3 Written vs. speech corpora
- Written corpora can be compiled and accessed more easily:
  - the web as corpus
  - large available corpora, e.g. DeReKo for German (3.4 billion words)
- Written corpora can be exploited without any annotation, e.g. extraction of higher-order collocations in CCDB
- Speech corpora are of limited availability
- Speech corpora need at least a basic transcription before they can be exploited

4 Speech corpus annotation
- "Basic" transcription: orthographic transcription
  - what about languages without a standardized orthography?
- Text-to-audio alignment
- Phonetic transcription for phonetic and phonological research
- Prosody, information structure, coreference, POS, ...

5 Transcription bottleneck
- Reliable orthographic transcription is only feasible for (near-)native speakers
  - problem: minority languages / dialectal speech
  - solution: crowdsourcing the orthographic transcription
- Phonetic transcription: manual annotation is very time-consuming (about 1:200, i.e. roughly 200 hours of annotation work per hour of speech) and requires considerable skill
  - solution: automatic broad phonetic alignment
  - solution: query-driven annotation

7 Crowdsourcing: Introduction
- Term coined by Jeff Howe (Wired, June 2006)
- Outsourcing: subcontracting a process, such as product design or manufacturing, to a third-party company
- Crowdsourcing: outsourcing a task traditionally performed by an employee or contractor to an undefined, generally large group of people
- Classical crowdsourcing: self-service restaurants, supermarkets, IKEA, ATMs, ticket machines
- New: using the Internet to publicize and manage crowdsourcing projects
- "Wisdom of crowds": aggregating information in groups results in decisions that are often better than any single member of the group could have made

8 Amazon Mechanical Turk (mturk.com)

9 Distributed Proofreaders (pgdp.net)

10 Recording Teenagers (LMU Munich)

11 Key guidelines for successful crowdsourcing
1. Be focused: vaguely defined problems get vague answers
2. Get your filters right: use crowd and experts to extract the best answers
3. Tap the right crowds: find the best experts in the mass
4. Build community into social networks
(BusinessWeek, September 25, 2006)

12 Possible application: speech corpus "German Today"
- Recordings in 160+ towns throughout the German-speaking area of Europe (D, A, CH, LUX, I, B, FL)
- 4 high school students (aged 16-20) in every town and 2 older adults (aged 50-60) in 80 towns
- 800+ speakers, 90 minutes per speaker: 1200 hrs. of speech
- Material: read speech, interview, map task
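
The corpus figures above can be checked with a quick calculation. The sketch below also estimates how long a fully manual phonetic transcription of the whole corpus would take at the 1:200 ratio quoted for manual phonetic annotation; that whole-corpus scenario is an illustrative assumption, and the minimum counts (160 towns, 80 towns with older adults) stand in for "160+" and "800+". The function names are hypothetical.

```python
def corpus_size(towns: int, students_per_town: int,
                adult_towns: int, adults_per_town: int,
                minutes_per_speaker: int) -> tuple[int, float]:
    """Return (number of speakers, hours of speech) for the recording design."""
    speakers = towns * students_per_town + adult_towns * adults_per_town
    hours = speakers * minutes_per_speaker / 60
    return speakers, hours


def manual_effort_hours(speech_hours: float, ratio: int = 200) -> float:
    """Annotation time needed for `speech_hours` of speech at a 1:ratio real-time factor."""
    return speech_hours * ratio


speakers, hours = corpus_size(towns=160, students_per_town=4,
                              adult_towns=80, adults_per_town=2,
                              minutes_per_speaker=90)
print(speakers, hours)             # 800 1200.0
print(manual_effort_hours(hours))  # 240000.0
```

At an assumed 1,600 working hours per person-year, 240,000 hours would amount to about 150 person-years of manual work, which illustrates the size of the bottleneck.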

14 Map Task [figure: map-task maps for Bruneck and Landeck, each marked "Start" and "Ziel" (destination)]

15 Crowdsourcing the orthographic transcription
- Dialectal spontaneous speech (map task data) can be transcribed reliably only by (near-)native speakers of the dialect.
- Possible crowdsourcing implementation:
  - central database of speech signals, metadata, transcripts, and information about the users/transcribers
  - web-based transcription software, e.g. WebTranscribe (Draxler 2005)
  - clearly defined task: transcribe each inter-pause stretch in standard German orthography
  - quality assurance: parallel transcription, evaluation and control tasks (as employed by CastingWords on mturk.com)
  - recruiting transcribers: contact the schools where the recordings took place and/or the speakers directly
  - community: points / virtual titles, rewards (e.g. a visit to IDS), games, ...
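
The quality-assurance step (parallel transcription plus control tasks) could be implemented roughly as follows. This is a minimal sketch under assumptions of our own, not the CastingWords or WebTranscribe procedure: transcribers are screened by their word error rate (WER) on control items with known reference transcripts, and among the parallel transcripts of one stretch produced by screened transcribers, the one with the smallest summed word-level edit distance to the others (the medoid) is kept. The 0.2 threshold, the data layout and all function names are hypothetical.

```python
from typing import Optional


def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over word tokens (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1,               # delete wa
                          curr[j - 1] + 1,           # insert wb
                          prev[j - 1] + (wa != wb))  # substitute (or match)
        prev = curr
    return prev[-1]


def wer(hypothesis: str, reference: str) -> float:
    """Word error rate of a transcript against a reference transcript."""
    ref = reference.split()
    return word_edit_distance(hypothesis.split(), ref) / max(len(ref), 1)


def trusted_transcribers(control_answers: dict[str, dict[str, str]],
                         control_refs: dict[str, str],
                         max_wer: float = 0.2) -> set[str]:
    """Users whose mean WER on the control items they answered is <= max_wer."""
    trusted = set()
    for user, answers in control_answers.items():
        scores = [wer(answers[item], ref)
                  for item, ref in control_refs.items() if item in answers]
        if scores and sum(scores) / len(scores) <= max_wer:
            trusted.add(user)
    return trusted


def consensus(transcripts: dict[str, str], trusted: set[str]) -> Optional[str]:
    """Trusted transcript with the smallest summed edit distance to all other trusted ones."""
    candidates = {u: t.split() for u, t in transcripts.items() if u in trusted}
    if not candidates:
        return None
    best = min(candidates, key=lambda u: sum(
        word_edit_distance(candidates[u], other) for other in candidates.values()))
    return " ".join(candidates[best])
```

For example, with the control item "wir gehen jetzt nach links", a transcriber who typed "wir gehen nach rechts" has a WER of 2/5 = 0.4 and would be excluded from the consensus.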

17 Automatic broad phonetic alignment
- Input:
  - speech signal
  - orthographic transcription
  - canonical/phonemic transcription of all words in the corpus, from a pronunciation lexicon and/or a grapheme-to-phoneme converter
  - language-specific phoneme models (e.g. trained HMMs)
- Output: time-aligned broad phonetic transcription
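
One way the canonical transcription input might be assembled: look each word up in a pronunciation lexicon and fall back to grapheme-to-phoneme (G2P) rules for out-of-vocabulary words. The lexicon entries, the SAMPA-like symbols and the tiny context-free rule set below are hypothetical toy data for German; a real G2P converter needs context-sensitive rules or a trained model.

```python
# Hypothetical toy pronunciation lexicon (SAMPA-like symbols, space-separated).
LEXICON = {
    "wir": "v i: 6",
    "gehen": "g e: @ n",
    "links": "l I N k s",
}

# Hypothetical toy G2P rules (context-free); longer graphemes are tried first.
G2P_RULES = {
    "sch": "S", "ch": "x", "ei": "aI", "ie": "i:", "ng": "N",
    "a": "a", "b": "b", "d": "d", "e": "@", "f": "f", "g": "g",
    "h": "h", "i": "I", "k": "k", "l": "l", "m": "m", "n": "n",
    "o": "O", "p": "p", "r": "r", "s": "s", "t": "t", "u": "U", "w": "v",
}
MAX_GRAPHEME = max(len(g) for g in G2P_RULES)


def g2p(word: str) -> list[str]:
    """Greedy longest-match grapheme-to-phoneme conversion; letters without a rule are skipped."""
    phones, i = [], 0
    while i < len(word):
        for length in range(MAX_GRAPHEME, 0, -1):
            chunk = word[i:i + length]
            if chunk in G2P_RULES:
                phones.append(G2P_RULES[chunk])
                i += len(chunk)
                break
        else:
            i += 1  # no rule covers this letter
    return phones


def canonical_transcription(text: str) -> list[tuple[str, list[str], str]]:
    """(word, phonemes, source) per word: lexicon lookup first, G2P as fallback."""
    result = []
    for word in text.lower().split():
        if word in LEXICON:
            result.append((word, LEXICON[word].split(), "lexicon"))
        else:
            result.append((word, g2p(word), "g2p"))
    return result


print(canonical_transcription("Wir gehen nach links"))
```

Here "nach" is not in the lexicon and comes out of the toy rules as /n a x/ instead of /n a: x/, which is exactly the kind of error that a larger lexicon or better G2P rules would have to catch.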

18 Example: orthographic transcription

19 Munich Automatic Segmentation System (MAUS)

20 Modelling post-lexical phonological processes
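
Modelling post-lexical processes means that the aligner does not commit to the canonical form alone: it considers pronunciation variants, and the decoder picks the variant that best matches the signal. The sketch below illustrates only the variant-generation idea, with two hypothetical optional rewrite rules over SAMPA-like symbols; MAUS itself uses statistically weighted rule sets (Schiel 2004), which this toy code does not reproduce.

```python
Phones = tuple[str, ...]

# Hypothetical optional rewrite rules: left-hand side -> right-hand side.
RULES: list[tuple[Phones, Phones]] = [
    (("@", "n"), ("n=",)),  # schwa elision with syllabic /n/, e.g. /g e: @ n/ -> /g e: n=/
    (("?",), ()),           # glottal-stop deletion (toy: allowed in any position)
]


def variants(canonical: Phones,
             rules: list[tuple[Phones, Phones]] = RULES) -> set[Phones]:
    """All forms reachable by optionally applying each rule at each matching position.

    Terminates because both toy rules shorten the form.
    """
    seen = {canonical}
    frontier = [canonical]
    while frontier:
        form = frontier.pop()
        for lhs, rhs in rules:
            for i in range(len(form) - len(lhs) + 1):
                if form[i:i + len(lhs)] == lhs:
                    new = form[:i] + rhs + form[i + len(lhs):]
                    if new not in seen:
                        seen.add(new)
                        frontier.append(new)
    return seen


for v in sorted(variants(tuple("? a U f g @ b @ n".split()))):
    print(" ".join(v))
```

For "aufgeben" this yields four candidates (with or without the glottal stop, with or without the final schwa), while the schwa in /g @ b/ is left alone because no rule matches it.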

21 Obvious errors

22 Evaluation: comparison with manual transcription
- Van Bael et al. (2006, 2007) compared 10 aligners for Dutch with a manually obtained reference transcription.
- Results:
  - Best performance: canonical transcription + modelling of post-lexical phonological processes with a decision tree
  - The remaining disagreements with the reference transcription (14.6% for spontaneous speech, 8.1% for read speech) were only slightly higher than the human inter-labeller disagreement scores reported in the literature
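
Disagreement figures like these are obtained by aligning the automatic phone string with the reference phone string. A common formulation, sketched here and assumed to match the metric of Van Bael et al. only in spirit, counts substitutions, deletions and insertions in a minimum-cost Levenshtein alignment and divides their sum by the number of reference phones. The function names are hypothetical.

```python
def align_counts(hyp: list[str], ref: list[str]) -> tuple[int, int, int]:
    """(substitutions, deletions, insertions) of a minimum-cost alignment of hyp to ref."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i  # reference phones missing from hyp: deletions
    for j in range(cols):
        cost[0][j] = j  # extra phones in hyp: insertions
    for i in range(1, rows):
        for j in range(1, cols):
            cost[i][j] = min(cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1,
                             cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    # Trace one optimal path back from the bottom-right corner.
    subs = dels = ins = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += int(ref[i - 1] != hyp[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def disagreement(hyp: list[str], ref: list[str]) -> float:
    """Percentage of disagreement relative to the length of the reference."""
    subs, dels, ins = align_counts(hyp, ref)
    return 100 * (subs + dels + ins) / len(ref)


print(disagreement("g e: n=".split(), "g e: @ n".split()))  # 50.0
```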

23 Task-based evaluation
- Access specific portions of the speech signal for further manual annotation?
- Duration-based analyses (only large, significant effects can be found)
- Analyses in the frequency domain (e.g. formant slope)
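
For duration-based and frequency-domain analyses, the aligner's output is typically consumed as a list of labelled segments. A minimal sketch, assuming segments are (label, start, end) tuples in seconds and that a formant track (times in s, F2 values in Hz) has already been measured with another tool: it computes the mean duration per label and a formant slope as the least-squares regression slope in Hz per second. The data format and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

Segment = tuple[str, float, float]  # (phone label, start in s, end in s)


def mean_durations(segments: list[Segment]) -> dict[str, float]:
    """Mean duration in seconds for each phone label."""
    by_label = defaultdict(list)
    for label, start, end in segments:
        by_label[label].append(end - start)
    return {label: mean(durations) for label, durations in by_label.items()}


def formant_slope(times: list[float], values: list[float]) -> float:
    """Least-squares slope of a formant track, in Hz per second."""
    if len(set(times)) < 2:
        raise ValueError("need at least two distinct time points")
    t_mean, v_mean = mean(times), mean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den
```

Since automatic boundaries carry some placement error, small duration differences can be swamped by that error, which is consistent with only large effects being detectable.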

24 Phonetic aligners for less-resourced languages?
- Build your own using HTK. But: you need at least one hour of phonetically segmented and labelled speech data.
- Or find an aligner for a language that is phonetically similar to your target language and use its pre-built HMMs:
  - add a pronunciation lexicon and/or grapheme-to-phoneme rules
  - map the phonemes of your target language to those of the HMM-modelled language
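
Reusing HMMs trained for a phonetically similar language hinges on the phoneme mapping. In this minimal sketch, every target-language phoneme either has a model of its own in the borrowed aligner or is mapped to one or more modelled phonemes. Phonemes without a mapping are left out of the output sequence and reported, so that the mapping can be completed before alignment. The model inventory and the mapping are hypothetical examples, not taken from any particular aligner.

```python
# Hypothetical inventory of phoneme models available in the borrowed aligner.
MODEL_INVENTORY = {"a", "e:", "i:", "o:", "u:", "j", "k", "l", "m",
                   "n", "p", "r", "s", "t", "x"}

# Hypothetical mapping for target-language phonemes without a model of their own;
# one phoneme may map to a sequence of modelled phonemes.
MAPPING = {
    "J": ["n", "j"],  # palatal nasal approximated as /n/ + /j/
    "c": ["k"],       # palatal stop approximated as /k/
    "L": ["l", "j"],  # palatal lateral approximated as /l/ + /j/
}


def map_to_models(phones: list[str],
                  inventory: set[str] = MODEL_INVENTORY,
                  mapping: dict[str, list[str]] = MAPPING) -> tuple[list[str], list[str]]:
    """Return (sequence of modelled phonemes, phonemes that could not be mapped)."""
    mapped, unmapped = [], []
    for p in phones:
        if p in inventory:
            mapped.append(p)
        elif p in mapping:
            mapped.extend(mapping[p])
        else:
            unmapped.append(p)
    return mapped, unmapped


print(map_to_models(["J", "a", "c", "e:", "T"]))  # (['n', 'j', 'a', 'k', 'e:'], ['T'])
```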

26 Traditional corpus annotation process (Gut 2008)

27 Problems with sequential corpus creation
- Too time-consuming: many years of annotation work before the corpus can be exploited and any results can be published
- Very error-prone: limited reliability of annotations due to coder drift
- Restricted corpus queries: failed/impossible queries lead to re-annotation of the corpus
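
Coder drift can be made visible by having an annotator re-label a sample of items they labelled earlier and measuring agreement between the two passes. The sketch below computes Cohen's kappa, a standard chance-corrected agreement measure. Applying it to drift monitoring in this way is our illustration, not a procedure described in the talk, and the labels are invented.

```python
from collections import Counter


def cohens_kappa(first: list[str], second: list[str]) -> float:
    """Cohen's kappa between two label sequences for the same items."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if expected == 1:
        return 1.0  # both passes used one and the same label throughout
    return (observed - expected) / (1 - expected)


early = ["C", "C", "k", "k"]  # hypothetical labels from the first pass
late = ["C", "k", "k", "k"]   # the same items, re-labelled months later
print(cohens_kappa(early, late))  # 0.5
```

A kappa that falls noticeably between re-labelling rounds signals that the annotator's criteria have shifted.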

28 Cyclic and iterative corpus annotation ("agile corpus creation") (Gut 2008)

29 Query-driven phonetic annotation of "German Today"

32 Advantages of agile corpus creation
- The query-driven approach tests the suitability and consistency of the annotation schema:
  - very little data has to be re-annotated or discarded
  - design errors, annotation errors and conceptual inadequacies become immediately visible
- Successive cycles improve the annotation schema and limit it to the elements necessary for the queries
- Saves time: early publication of first results
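
In query-driven annotation, only the material that a research question needs is annotated. In this minimal sketch, orthographic transcripts are assumed to be tokenized per recording. A regular-expression query selects the tokens to annotate in the current cycle; as a purely hypothetical example, it targets words ending in "-ig", whose pronunciation in Standard German varies regionally between [ɪç] and [ɪk]. The names `Hit` and `select_tokens` are hypothetical.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    recording: str
    index: int  # position of the token in the recording's transcript
    word: str


def select_tokens(transcripts: dict[str, list[str]], pattern: str) -> list[Hit]:
    """All tokens whose whole form matches the query pattern (case-insensitive)."""
    query = re.compile(pattern, re.IGNORECASE)
    return [Hit(rec, i, w)
            for rec, words in transcripts.items()
            for i, w in enumerate(words)
            if query.fullmatch(w)]


transcripts = {
    "rec01": "das ist richtig so".split(),
    "rec02": "der König war fertig".split(),
}
for hit in select_tokens(transcripts, r"\w+ig"):
    print(hit)
```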

33 Combining automatic and query-driven annotation
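
A simple way to combine the two approaches is to use the automatic word alignment to locate query hits in the signal, then cut padded excerpts for manual phonetic annotation. This sketch assumes that each hit is given as a (start, end) time in seconds. The default padding of 0.3 s, the clipping to the recording duration, and the merging of overlapping excerpts are our own illustrative choices.

```python
def excerpt_windows(hits: list[tuple[float, float]], total_duration: float,
                    pad: float = 0.3) -> list[tuple[float, float]]:
    """Padded (start, end) windows around aligned hits, clipped to the recording and merged if overlapping."""
    windows = sorted((max(0.0, start - pad), min(total_duration, end + pad))
                     for start, end in hits)
    merged: list[tuple[float, float]] = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(excerpt_windows([(1.0, 1.5), (1.75, 2.0), (5.0, 5.5)], total_duration=5.6, pad=0.25))
# [(0.75, 2.25), (4.75, 5.6)]
```

Merging overlapping windows ensures that an annotator never hears or labels the same stretch of signal twice.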

34 Summary
- Speech corpora need at least a basic (orthographic) transcription to be exploitable.
  - Such a transcription is difficult to produce for languages/dialects with few native speakers: use crowdsourcing.
- Phonological research additionally requires phonemic/phonetic segmentation and labelling.
  - This is very time-consuming: combine automatic broad phonetic alignment with query-driven annotation.

35 References
Brinckmann, C., Kleiner, S., Knöbl, R., and Berend, N. (2008): German Today: an areally extensive corpus of spoken Standard German. Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco.
Draxler, C. (2005): WebTranscribe - an extensible web-based speech annotation framework. Proceedings of the 8th International Conference on Text, Speech and Dialogue (TSD 2005), Karlovy Vary, Czech Republic.
Keibel, H. and Belica, C. (2007): CCDB: a corpus-linguistic research and development workbench. Proceedings of Corpus Linguistics 2007, Birmingham, United Kingdom.
Raffelsiefen, R. and Brinckmann, C. (2007): Evaluating phonological status: significance of paradigm uniformity vs. prosodic grouping effects. Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS XVI), Saarbrücken, Germany.
Schiel, F. (2004): MAUS goes iterative. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal.
Van Bael, C., Boves, L., van den Heuvel, H., and Strik, H. (2006): Automatic phonetic transcription of large speech corpora. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy.
Van Bael, C., Boves, L., van den Heuvel, H., and Strik, H. (2007): Automatic phonetic transcription of large speech corpora. Computer Speech and Language 21 (4).
Voormann, H. and Gut, U. (2008): Agile corpus creation. Corpus Linguistics and Linguistic Theory 4 (2).

36 Thank you!


More information

English for communication in the workplace

English for communication in the workplace English for communication in the workplace David Bonamy Introduction If you are teaching or planning to teach English to help your students communicate effectively in their present or future place of work,

More information

Data at the SFB "Mehrsprachigkeit"

Data at the SFB Mehrsprachigkeit 1 Workshop on multilingual data, 08 July 2003 MULTILINGUAL DATABASE: Obstacles and Opportunities Thomas Schmidt, Project Zb Data at the SFB "Mehrsprachigkeit" K1: Japanese and German expert discourse in

More information

Technical Report. Overview. Revisions in this Edition. Four-Level Assessment Process

Technical Report. Overview. Revisions in this Edition. Four-Level Assessment Process Technical Report Overview The Clinical Evaluation of Language Fundamentals Fourth Edition (CELF 4) is an individually administered test for determining if a student (ages 5 through 21 years) has a language

More information

Historical Linguistics. Diachronic Analysis. Two Approaches to the Study of Language. Kinds of Language Change. What is Historical Linguistics?

Historical Linguistics. Diachronic Analysis. Two Approaches to the Study of Language. Kinds of Language Change. What is Historical Linguistics? Historical Linguistics Diachronic Analysis What is Historical Linguistics? Historical linguistics is the study of how languages change over time and of their relationships with other languages. All languages

More information

Database Design For Corpus Storage: The ET10-63 Data Model

Database Design For Corpus Storage: The ET10-63 Data Model January 1993 Database Design For Corpus Storage: The ET10-63 Data Model Tony McEnery & Béatrice Daille I. General Presentation Within the ET10-63 project, a French-English bilingual corpus of about 2 million

More information

The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content

The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content Omar F. Zaidan and Chris Callison-Burch Dept. of Computer Science, Johns Hopkins University Baltimore,

More information

Establishing Testing Knowledge and Experience Sharing at Siemens

Establishing Testing Knowledge and Experience Sharing at Siemens WWW.QUALTECHCONFERENCES.COM Europe s Premier Software Testing Event World Forum Convention Centre, The Hague, Netherlands The Future of Software Testing Establishing Testing Knowledge and Experience Sharing

More information

C E D A T 8 5. Innovating services and technologies for speech content management

C E D A T 8 5. Innovating services and technologies for speech content management C E D A T 8 5 Innovating services and technologies for speech content management Company profile 25 years experience in the market of transcription/reporting services; Cedat 85 Group: Cedat 85 srl Subtitle

More information

Crowdsourcing for Speech Processing

Crowdsourcing for Speech Processing Crowdsourcing for Speech Processing Applications to Data Collection, Transcription and Assessment Editors Maxine Eskénazi Gina-Anne Levow Helen Meng Gabriel Parent David Suendermann CROWDSOURCING FOR

More information

Central and South-East European Resources in META-SHARE

Central and South-East European Resources in META-SHARE Central and South-East European Resources in META-SHARE Tamás VÁRADI 1 Marko TADIĆ 2 (1) RESERCH INSTITUTE FOR LINGUISTICS, MTA, Budapest, Hungary (2) FACULTY OF HUMANITIES AND SOCIAL SCIENCES, ZAGREB

More information

CONTENTS: bul BULGARIAN LABOUR MIGRATION, DESK RESEARCH, 2015

CONTENTS: bul BULGARIAN LABOUR MIGRATION, DESK RESEARCH, 2015 215 2 CONTENTS: 1. METHODOLOGY... 3 a. Survey characteristics... 3 b. Purpose of the study... 3 c. Methodological notes... 3 2. DESK RESEARCH... 4 A. Bulgarian emigration tendencies and destinations...

More information

PHONETIC TOOL FOR THE TUNISIAN ARABIC

PHONETIC TOOL FOR THE TUNISIAN ARABIC PHONETIC TOOL FOR THE TUNISIAN ARABIC Abir Masmoudi 1,2, Yannick Estève 1, Mariem Ellouze Khmekhem 2, Fethi Bougares 1, Lamia Hadrich Belguith 2 (1) LIUM, University of Maine, France (2) ANLP Research

More information

Master of Arts in Teaching English to Speakers of Other Languages (MA TESOL)

Master of Arts in Teaching English to Speakers of Other Languages (MA TESOL) Master of Arts in Teaching English to Speakers of Other Languages (MA TESOL) Overview Teaching English to non-native English speakers requires skills beyond just knowing the language. Teachers must have

More information

Prosodic focus marking in Bai

Prosodic focus marking in Bai Prosodic focus marking in Bai Zenghui Liu 1, Aoju Chen 1,2 & Hans Van de Velde 1 Utrecht University 1, Max Planck Institute for Psycholinguistics 2 l.z.h.liu@uu.nl, aoju.chen@uu.nl, h.vandevelde@uu.nl

More information

PICCL: Philosophical Integrator of Computational and Corpus Libraries

PICCL: Philosophical Integrator of Computational and Corpus Libraries 1 PICCL: Philosophical Integrator of Computational and Corpus Libraries Martin Reynaert 12, Maarten van Gompel 1, Ko van der Sloot 1 and Antal van den Bosch 1 Center for Language Studies - Radboud University

More information

Reading Specialist (151)

Reading Specialist (151) Purpose Reading Specialist (151) The purpose of the Reading Specialist test is to measure the requisite knowledge and skills that an entry-level educator in this field in Texas public schools must possess.

More information

Multi-level annotation in the Emu speech database management system

Multi-level annotation in the Emu speech database management system Speech Communication 33 (2001) 61±77 www.elsevier.nl/locate/specom Multi-level annotation in the Emu speech database management system Steve Cassidy a, *, Jonathan Harrington a,b a Speech Hearing and Language

More information

COMMUNICATION POLICY. Adopted by the Board of Directors on 6 March 2008 NORDIC INVESTMENT BANK

COMMUNICATION POLICY. Adopted by the Board of Directors on 6 March 2008 NORDIC INVESTMENT BANK COMMUNICATION POLICY Adopted by the Board of Directors on 6 March 2008 NORDIC INVESTMENT BANK Communication policy 1. Purpose... 3 2. Goals... 3 3. Guiding principles... 3 4. Target groups... 4 5. Messages...

More information

Sample Cities for Multilingual Live Subtitling 2013

Sample Cities for Multilingual Live Subtitling 2013 Carlo Aliprandi, SAVAS Dissemination Manager Live Subtitling 2013 Barcelona, 03.12.2013 1 SAVAS - Rationale SAVAS is a FP7 project co-funded by the EU 2 years project: 2012-2014. 3 R&D companies and 5

More information

4 Pitch and range in language and music

4 Pitch and range in language and music 4 Pitch and range in language and music 4.1 Average and range of pitch in spoken language and song 4.1.1 Average and range of pitch in language Fant (1956) determined the average values for fundamental

More information

IMPROVING TTS BY HIGHER AGREEMENT BETWEEN PREDICTED VERSUS OBSERVED PRONUNCIATIONS

IMPROVING TTS BY HIGHER AGREEMENT BETWEEN PREDICTED VERSUS OBSERVED PRONUNCIATIONS IMPROVING TTS BY HIGHER AGREEMENT BETWEEN PREDICTED VERSUS OBSERVED PRONUNCIATIONS Yeon-Jun Kim, Ann Syrdal AT&T Labs-Research, 180 Park Ave. Florham Park, NJ 07932 Matthias Jilka Institut für Linguistik,

More information

Language Resources and Evaluation for the Support of

Language Resources and Evaluation for the Support of Language Resources and Evaluation for the Support of the Greek Language in the MARY TtS Pepi Stavropoulou 1,2, Dimitrios Tsonos 1, and Georgios Kouroupetroglou 1 1 National and Kapodistrian University

More information

From Fieldwork to Annotated Corpora: The CorpAfroAs project

From Fieldwork to Annotated Corpora: The CorpAfroAs project From Fieldwork to Annotated Corpora: The CorpAfroAs project Amina Mettouchi & Christian Chanard (University of Nantes & Institut Universitaire de France) (CNRS-LLACAN, Villejuif) * Introduction In the

More information

Global Food Security Programme A survey of public attitudes

Global Food Security Programme A survey of public attitudes Global Food Security Programme A survey of public attitudes Contents 1. Executive Summary... 2 2. Introduction... 4 3. Results... 6 4. Appendix Demographics... 17 5. Appendix Sampling and weighting...

More information

Hyunah Ahn hyunah.ahn@hawaii.edu http://www2.hawaii.edu/~ahnhyuna

Hyunah Ahn hyunah.ahn@hawaii.edu http://www2.hawaii.edu/~ahnhyuna Hyunah Ahn hyunah.ahn@hawaii.edu http://www2.hawaii.edu/~ahnhyuna EDUCATION Ph.D. candidate in Second Language Studies (SLS) University of Hawai i at Mānoa (UHM), Honolulu, Hawai i (ABD as of May 16, 2014;

More information

Fundamentals of Information Systems, Fifth Edition. Chapter 8 Systems Development

Fundamentals of Information Systems, Fifth Edition. Chapter 8 Systems Development Fundamentals of Information Systems, Fifth Edition Chapter 8 Systems Development Principles and Learning Objectives Effective systems development requires a team effort of stakeholders, users, managers,

More information

Reporting. Understanding Advanced Reporting Features for Managers

Reporting. Understanding Advanced Reporting Features for Managers Reporting Understanding Advanced Reporting Features for Managers Performance & Talent Management Performance & Talent Management combines tools and processes that allow employees to focus and integrate

More information

Ontology construction on a cloud computing platform

Ontology construction on a cloud computing platform Ontology construction on a cloud computing platform Exposé for a Bachelor's thesis in Computer science - Knowledge management in bioinformatics Tobias Heintz 1 Motivation 1.1 Introduction PhenomicDB is

More information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

More information

Bridgestone Europe HR Transformation. Martha C. White, Vice President, Human Resouces & CSR Bridgestone EMEA 9 September, 2015

Bridgestone Europe HR Transformation. Martha C. White, Vice President, Human Resouces & CSR Bridgestone EMEA 9 September, 2015 Bridgestone Europe HR Transformation Martha C. White, Vice President, Human Resouces & CSR Bridgestone EMEA 9 September, 2015 Agenda Introductions Personal Introduction Bridgstone Europe: Who we are and

More information

W-PhAMT: A web tool for phonetic multilevel timeline visualization

W-PhAMT: A web tool for phonetic multilevel timeline visualization W-PhAMT: A web tool for phonetic multilevel timeline visualization Francesco Cutugno, Vincenza Anna Leano, Antonio Origlia LUSI-Lab @ Dipartimento di Scienze Fisiche Università di Napoli Federico II Complesso

More information