Transcription bottleneck of speech corpus exploitation
1 Transcription bottleneck of speech corpus exploitation
Caren Brinckmann
Institut für Deutsche Sprache, Mannheim, Germany
Lesser Used Languages and Computer Linguistics (LULCL) II, Nov 13/14, 2008, Bozen
2 Overview
- Introduction
- Written corpora vs. speech corpora
- Speech corpus annotation
- Transcription bottleneck
- Crowdsourcing the orthographic transcription
- Automatic broad phonetic alignment
- Query-driven annotation
- Summary
3 Written vs. speech corpora
- Written corpora can be compiled/accessed more easily
  - web as corpus
  - large available corpora, e.g. DeReKo for German (3.4 billion words)
- Written corpora can be exploited without any annotation, e.g. extraction of higher-order collocations in CCDB
- Limited availability of speech corpora
- Speech corpora need at least a basic transcription
4 Speech corpus annotation
- "Basic" transcription: orthographic transcription
  - languages without standardized orthography?
- Text-to-audio alignment
- Phonetic transcription for phonetic and phonological research
- Prosody, information structure, coreferences, POS, ...
5 Transcription bottleneck
- Reliable orthographic transcription: only feasible for near-native speakers
  - problem: minority languages / dialectal speech
  - approach: crowdsourcing the orthographic transcription
- Phonetic transcription: manual annotation is very time-consuming (1:200) and requires considerable skill
  - approach: automatic broad phonetic alignment
  - approach: query-driven annotation
7 Crowdsourcing: Introduction
- Term coined by Jeff Howe (Wired, June 2006)
- Outsourcing: subcontracting a process, such as product design or manufacturing, to a third-party company
- Crowdsourcing: outsourcing a task traditionally performed by an employee or contractor to an undefined, generally large group of people
- Classical crowdsourcing: self-service restaurants, supermarkets, IKEA, ATMs, ticket machines
- New: use the Internet to publicize and manage crowdsourcing projects
- "Wisdom of crowds": aggregation of information in groups results in decisions that are often better than any single member of the group could have made
8 Amazon Mechanical Turk (mturk.com)
9 Distributed Proofreaders (pgdp.net)
10 Recording Teenagers (LMU Munich)
11 Key guidelines for successful crowdsourcing
1. Be focused: vaguely defined problems get vague answers
2. Get your filters right: use crowd and experts to extract the best answers
3. Tap the right crowds: find the best experts in the mass
4. Build community into social networks
(BusinessWeek, September 25, 2006)
12 Possible application: speech corpus "German Today"
- Recordings in 160+ towns throughout the German-speaking area of Europe (D, A, CH, LUX, I, B, FL)
- 4 high school students (aged 16-20) in every town and 2 older adults (aged 50-60) in 80 towns
- 800+ speakers, 90 minutes per speaker: 1200 hrs. of speech
- Material:
  - read speech
  - interview
  - map task
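The corpus size and the cost of a fully manual phonetic transcription can be worked out from the figures on the slides. The sketch below assumes that "1:200" means roughly 200 hours of annotation work per hour of speech, and it uses the lower bound of 800 speakers; the function names are illustrative only.

```python
# Figures quoted on the slides; the reading of "1:200" as
# "1 hour of speech needs about 200 hours of manual work" is an assumption.
SPEAKERS = 800             # "800+ speakers" (lower bound)
MINUTES_PER_SPEAKER = 90   # "90 minutes per speaker"
REAL_TIME_FACTOR = 200     # manual phonetic transcription, "1:200"


def corpus_hours(speakers: int, minutes_per_speaker: int) -> float:
    """Total amount of recorded speech in hours."""
    return speakers * minutes_per_speaker / 60


def manual_effort_hours(speech_hours: float, factor: int) -> float:
    """Estimated annotation time in hours for a given real-time factor."""
    return speech_hours * factor


speech = corpus_hours(SPEAKERS, MINUTES_PER_SPEAKER)
effort = manual_effort_hours(speech, REAL_TIME_FACTOR)
print(f"{speech:.0f} hours of speech, about {effort:.0f} hours of manual work")
```

Under these assumptions, transcribing the whole corpus phonetically by hand is out of reach for a small team, which is the motivation for the automatic and query-driven approaches discussed later in the talk.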
14 Map Task
[Figure: map task sheets for Bruneck and Landeck, each marked with "Start" and "Ziel" (destination)]
15 Crowdsourcing the orthographic transcription
Dialectal spontaneous speech (map task data) can be transcribed reliably only by (near-)native speakers of the dialect.
Possible crowdsourcing implementation:
- central database of speech signals, metadata, transcripts, and information about the users/transcribers
- web-based transcription software, e.g. WebTranscribe (as used in ...)
- clearly defined task: transcribe each inter-pause stretch with standard German orthography
- quality assurance: parallel transcription, evaluation + control tasks (as employed by CastingWords on mturk.com)
- recruit transcribers: contact the schools where the recordings took place and/or the speakers directly
- community: points / virtual titles, rewards (e.g. visit to IDS), games, ...
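The quality-assurance step with parallel transcription could be sketched as follows. This is a minimal illustration, not the procedure used by CastingWords or proposed in the talk: it compares parallel transcripts of one inter-pause stretch on the word level with Python's standard `difflib` and flags the stretch for review when agreement is low. The function names and the 0.8 threshold are assumptions.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def word_agreement(a: str, b: str) -> float:
    """Word-level similarity of two transcripts, between 0.0 and 1.0."""
    words_a, words_b = a.lower().split(), b.lower().split()
    if not words_a and not words_b:
        return 1.0  # two empty transcripts agree trivially
    return SequenceMatcher(None, words_a, words_b).ratio()


def needs_review(transcripts: list[str], threshold: float = 0.8) -> bool:
    """True if any pair of parallel transcripts agrees less than `threshold`."""
    for i in range(len(transcripts)):
        for j in range(i + 1, len(transcripts)):
            if word_agreement(transcripts[i], transcripts[j]) < threshold:
                return True
    return False


# Hypothetical parallel transcripts of the same inter-pause stretch.
parallel = ["geh nach links", "geh nach rechts"]
print(needs_review(parallel))
```

Stretches that are flagged could then be routed to a further transcriber or to an expert, in line with the "use crowd and experts" guideline.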
17 Automatic broad phonetic alignment
Input:
- speech signal
- orthographic transcription
- canonical/phonemic transcription of all words in the corpus
  - pronunciation lexicon
  - grapheme-to-phoneme converter
- language-specific phoneme models (e.g. trained HMMs)
Output:
- time-aligned broad phonetic transcription
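The step from the orthographic transcription to a canonical transcription can be illustrated with a small sketch: each word is looked up in a pronunciation lexicon, and a grapheme-to-phoneme converter is used as a fallback for unknown words. The lexicon entries, the SAMPA-like symbols and the letter-by-letter fallback are toy assumptions; a real aligner would use a full lexicon and a trained or rule-based converter, and the actual time alignment with HMMs is not shown here.

```python
from __future__ import annotations

from typing import Callable

# Toy pronunciation lexicon with SAMPA-like symbols (illustrative only).
LEXICON: dict[str, list[str]] = {
    "start": ["S", "t", "a", "r", "t"],
    "ziel": ["ts", "i:", "l"],
}


def naive_g2p(word: str) -> list[str]:
    """Placeholder grapheme-to-phoneme converter: one symbol per letter."""
    return list(word.lower())


def canonical_transcription(
    transcript: str,
    lexicon: dict[str, list[str]],
    g2p: Callable[[str], list[str]] = naive_g2p,
) -> list[tuple[str, list[str]]]:
    """Pair every word with its canonical phoneme sequence.

    Words found in the lexicon use the lexicon entry; all others fall
    back to the grapheme-to-phoneme converter.
    """
    result = []
    for word in transcript.split():
        phones = lexicon.get(word.lower())
        if phones is None:
            phones = g2p(word)
        result.append((word, phones))
    return result


print(canonical_transcription("Start Ziel", LEXICON))
```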
18 Example: orthographic transcription
19 Munich Automatic Segmentation System MAUS
20 Modelling post-lexical phonological processes
21 Obvious errors
22 Evaluation: comparison with manual transcription
Van Bael et al. (2006, 2007) compared 10 aligners for Dutch with a manually obtained reference transcription.
Results:
- Best performance: canonical transcription + modelling of post-lexical phonological processes with a decision tree
- Number of remaining disagreements with the reference transcription (14.6% for spontaneous speech, 8.1% for read speech) only slightly higher than human inter-labeller disagreement scores reported in the literature
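One common way to quantify disagreement between an automatic and a reference transcription is the edit distance between the two phone sequences, divided by the length of the reference. The slide does not spell out the exact metric, so the sketch below is an assumption about how such a percentage could be computed, with hypothetical example sequences.

```python
from __future__ import annotations


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions, deletions and insertions
    needed to turn `ref` into `hyp` (Levenshtein distance)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def disagreement_percent(ref: list[str], hyp: list[str]) -> float:
    """Disagreements relative to the length of the reference, in percent."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a canonical form against a reduced realisation.
reference = ["h", "a", "b", "@", "n"]
automatic = ["h", "a", "m"]
print(disagreement_percent(reference, automatic))
```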
23 Task-based evaluation
- access specific portions of the speech signal for further manual annotation?
- duration-based analyses (only large, significant effects can be found)
- analyses in the frequency domain (e.g. formant slope)
24 Phonetic aligners for less-resourced languages?
- build your own using HTK
  - but: you need at least one hour of phonetically segmented and labelled speech data
- find an aligner for a language that is phonetically similar to your target language and use its pre-built HMMs, adding:
  - a pronunciation lexicon and/or grapheme-to-phoneme rules
  - a mapping between the phonemes of your target language and the HMM-modelled language
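The phoneme mapping in the second option can be sketched as a simple lookup: every phoneme of the target language is either covered directly by the borrowed HMM set or replaced by a manually chosen close match. The symbols and the mapping below are hypothetical placeholders, and choosing good substitutes is a phonetic decision that the code cannot make.

```python
from __future__ import annotations


def map_pronunciation(
    phones: list[str],
    mapping: dict[str, str],
    available: set[str],
) -> list[str]:
    """Rewrite a target-language pronunciation using only phonemes
    for which the borrowed aligner has an HMM.

    Raises KeyError if a phoneme has neither an HMM nor a valid mapping.
    """
    mapped = []
    for phone in phones:
        candidate = phone if phone in available else mapping.get(phone)
        if candidate is None or candidate not in available:
            raise KeyError(f"no HMM available for phoneme {phone!r}")
        mapped.append(candidate)
    return mapped


# Hypothetical HMM inventory of the source language and a manual mapping.
hmm_inventory = {"a", "k", "t", "s"}
target_to_source = {"q": "k"}  # assumed closest match, for illustration only

print(map_pronunciation(["t", "a", "q"], target_to_source, hmm_inventory))
```

Raising an error for unmapped phonemes makes gaps in the mapping visible before any alignment is run.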
26 Traditional corpus annotation process
Gut (2008)
27 Problems with sequential corpus creation
- too time-consuming: many years of annotation work before the corpus can be exploited and any results can be published
- very error-prone: limited reliability of annotations due to coder drift
- restricted corpus queries: failed/impossible queries require re-annotation of the corpus
28 Cyclic and iterative corpus annotation ("agile corpus creation")
Gut (2008)
29 Query-driven phonetic annotation of "German Today"
32 Advantages of agile corpus creation
- Query-driven approach tests suitability and consistency of annotation schema
- very little data has to be re-annotated or discarded
- design errors, annotation errors and conceptual inadequacies become immediately visible
- successive cycles improve annotation schema and limit it to the elements necessary for the queries
- saves time
- early publication of first results
33 Combining automatic and query-driven annotation
34 Summary
- speech corpora need at least a basic (orthographic) transcription to be exploitable
  - difficult to produce for languages/dialects with only few native speakers: use crowdsourcing
- phonological research further requires phonemic/phonetic segmentation and labelling
  - very time-consuming: combine automatic broad phonetic alignment with query-driven annotation
35 References
- Brinckmann, C., Kleiner, S., Knöbl, R., and Berend, N. (2008): German Today: an areally extensive corpus of spoken Standard German. Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco.
- Draxler, C. (2005): WebTranscribe - an extensible web-based speech annotation framework. Proceedings of the 8th International Conference on Text, Speech and Dialogue (TSD 2005), Karlovy Vary, Czech Republic.
- Keibel, H. and Belica, C. (2007): CCDB: a corpus-linguistic research and development workbench. Proceedings of Corpus Linguistics 2007, Birmingham, United Kingdom.
- Raffelsiefen, R. and Brinckmann, C. (2007): Evaluating phonological status: significance of paradigm uniformity vs. prosodic grouping effects. Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS XVI), Saarbrücken, Germany.
- Schiel, F. (2004): MAUS Goes Iterative. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal.
- Van Bael, C., Boves, L., van den Heuvel, H. and Strik, H. (2006): Automatic phonetic transcription of large speech corpora. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy.
- Van Bael, C., Boves, L., van den Heuvel, H. and Strik, H. (2007): Automatic phonetic transcription of large speech corpora. Computer Speech and Language 21 (4).
- Voormann, H. and Gut, U. (2008): Agile corpus creation. Corpus Linguistics and Linguistic Theory 4 (2).
36 Thank you!
COMMUNICATION POLICY Adopted by the Board of Directors on 6 March 2008 NORDIC INVESTMENT BANK Communication policy 1. Purpose... 3 2. Goals... 3 3. Guiding principles... 3 4. Target groups... 4 5. Messages...
More informationSample Cities for Multilingual Live Subtitling 2013
Carlo Aliprandi, SAVAS Dissemination Manager Live Subtitling 2013 Barcelona, 03.12.2013 1 SAVAS - Rationale SAVAS is a FP7 project co-funded by the EU 2 years project: 2012-2014. 3 R&D companies and 5
More information4 Pitch and range in language and music
4 Pitch and range in language and music 4.1 Average and range of pitch in spoken language and song 4.1.1 Average and range of pitch in language Fant (1956) determined the average values for fundamental
More informationIMPROVING TTS BY HIGHER AGREEMENT BETWEEN PREDICTED VERSUS OBSERVED PRONUNCIATIONS
IMPROVING TTS BY HIGHER AGREEMENT BETWEEN PREDICTED VERSUS OBSERVED PRONUNCIATIONS Yeon-Jun Kim, Ann Syrdal AT&T Labs-Research, 180 Park Ave. Florham Park, NJ 07932 Matthias Jilka Institut für Linguistik,
More informationLanguage Resources and Evaluation for the Support of
Language Resources and Evaluation for the Support of the Greek Language in the MARY TtS Pepi Stavropoulou 1,2, Dimitrios Tsonos 1, and Georgios Kouroupetroglou 1 1 National and Kapodistrian University
More informationFrom Fieldwork to Annotated Corpora: The CorpAfroAs project
From Fieldwork to Annotated Corpora: The CorpAfroAs project Amina Mettouchi & Christian Chanard (University of Nantes & Institut Universitaire de France) (CNRS-LLACAN, Villejuif) * Introduction In the
More informationGlobal Food Security Programme A survey of public attitudes
Global Food Security Programme A survey of public attitudes Contents 1. Executive Summary... 2 2. Introduction... 4 3. Results... 6 4. Appendix Demographics... 17 5. Appendix Sampling and weighting...
More informationHyunah Ahn hyunah.ahn@hawaii.edu http://www2.hawaii.edu/~ahnhyuna
Hyunah Ahn hyunah.ahn@hawaii.edu http://www2.hawaii.edu/~ahnhyuna EDUCATION Ph.D. candidate in Second Language Studies (SLS) University of Hawai i at Mānoa (UHM), Honolulu, Hawai i (ABD as of May 16, 2014;
More informationFundamentals of Information Systems, Fifth Edition. Chapter 8 Systems Development
Fundamentals of Information Systems, Fifth Edition Chapter 8 Systems Development Principles and Learning Objectives Effective systems development requires a team effort of stakeholders, users, managers,
More informationReporting. Understanding Advanced Reporting Features for Managers
Reporting Understanding Advanced Reporting Features for Managers Performance & Talent Management Performance & Talent Management combines tools and processes that allow employees to focus and integrate
More informationOntology construction on a cloud computing platform
Ontology construction on a cloud computing platform Exposé for a Bachelor's thesis in Computer science - Knowledge management in bioinformatics Tobias Heintz 1 Motivation 1.1 Introduction PhenomicDB is
More informationIdentifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of
More informationBridgestone Europe HR Transformation. Martha C. White, Vice President, Human Resouces & CSR Bridgestone EMEA 9 September, 2015
Bridgestone Europe HR Transformation Martha C. White, Vice President, Human Resouces & CSR Bridgestone EMEA 9 September, 2015 Agenda Introductions Personal Introduction Bridgstone Europe: Who we are and
More informationW-PhAMT: A web tool for phonetic multilevel timeline visualization
W-PhAMT: A web tool for phonetic multilevel timeline visualization Francesco Cutugno, Vincenza Anna Leano, Antonio Origlia LUSI-Lab @ Dipartimento di Scienze Fisiche Università di Napoli Federico II Complesso
More information