Building a Regionally Inclusive Dictionary for Speech Recognition

SURJ, Spring 2004. Computer Science & Linguistics

Justin Burdick

Speech Recognition (SR) is the automated conversion of speech into written text. Applications range from simple phone-based information services to commercial-grade automated customer service systems, such as those for airline phone reservations. While this process is complex in itself, it is further complicated by the fact that speakers (the users) from different parts of the country have varying accents and pronounce the same words differently. Our aim is to create a more speaker-independent SR system while maintaining the speed and accuracy of transcription. This requires constructing an SR dictionary that takes into account the existence of multiple pronunciations for the same word. However, too many alternate pronunciations overload the system and are detrimental to accuracy and speed. By finding the optimal number of pronunciations per word, we increased the percentage of words correctly identified by the SR system from 78% using the traditional technique to 85.7% using the improved method outlined in this article. This brings the technology 35% closer to the goal of complete recognition and the use of speech as the primary method of human-computer interaction.

Speech Recognition (SR) is a process that transcribes speech into text using a computer. Many repetitive phone tasks can be automated with speech recognition technology, saving businesses significant amounts of money. However, transcribing everyday human speech is significantly more difficult than recognizing a small set of words, as is done in an SR-based phone information-retrieval system (e.g. 411 services). A complete Speech Recognition system is fairly complex and consists of various subsystems that interact to convert speech into written text, as shown in Figure 1.
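The "35% closer" figure quoted in the abstract is not derived in the text. One reading that is consistent with the reported accuracies, offered here as an assumption rather than the author's stated calculation, is that it is the relative reduction in the share of misrecognized words:

```latex
\begin{align*}
\text{error}_{\text{before}} &= 100\% - 78\% = 22\% \\
\text{error}_{\text{after}}  &= 100\% - 85.7\% = 14.3\% \\
\text{relative reduction}    &= \frac{22 - 14.3}{22} = \frac{7.7}{22} = 0.35 = 35\%
\end{align*}
```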
A typical SR system uses a common pattern-matching tool called Hidden Markov Models (HMMs) to model units of speech sound known as phonemes (units smaller than syllables) [1]. The models are first trained using a set of training data, which consists of a large number of wave sound files containing speech and the corresponding text transcriptions. Each speech file is converted into a series of observations known as "feature vectors." These are sets of numbers that describe the sound mathematically by extracting specific information from the waveform at 25-millisecond intervals. The computer then matches this numerical representation to the corresponding text transcription included in the training data. Applied over the entire training set, this process creates a series of models that later allow the SR system to translate speech into text.

The training process results in the creation of two data files: one containing a model for each phoneme and another containing a list of pronunciations for each word. Phonemes, the most fundamental units of speech, are single sounds; each phoneme is modeled by its average sound, the variation of that sound across speakers, and the transition probabilities between it and other phonemes. The list of pronunciations, termed the pronouncing dictionary, contains each of the words used during training along with the sequence of phonemes that compose that word.

Once the models are trained, the system can be used to perform recognition. In order to produce meaningful sentences, the SR system relies on a dictionary and a grammar model as well as the phoneme models from the training phase. Transcription is accomplished using the Viterbi algorithm. This algorithm divides the incoming speech into individual observations and processes one observation at a time by comparing it to every possible phoneme. When the algorithm moves to the next observation, it eliminates paths that have a low matching probability.
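The path elimination described above can be illustrated with a small sketch. It is not the recognizer used in this study. The three phoneme states, the discrete acoustic symbols "A", "B", and "C", every probability, and the `beam` parameter are hypothetical values chosen for illustration, and a real system would score continuous feature vectors rather than symbols.

```python
import math


def prune(paths, beam):
    """Keep only paths whose log score is within `beam` of the best one."""
    best = max(score for score, _ in paths.values())
    return {s: p for s, p in paths.items() if p[0] >= best - beam}


def viterbi_beam(observations, states, log_start, log_trans, log_emit, beam):
    """Return the most probable state sequence, pruning weak paths each step."""
    # paths maps the final state of a surviving path to (log score, sequence).
    paths = {
        s: (log_start[s] + log_emit[s][observations[0]], [s]) for s in states
    }
    paths = prune(paths, beam)
    for obs in observations[1:]:
        new_paths = {}
        for s in states:
            # Extend every surviving path into state s and keep the best one.
            new_paths[s] = max(
                (
                    (score + log_trans[prev][s] + log_emit[s][obs], seq + [s])
                    for prev, (score, seq) in paths.items()
                ),
                key=lambda p: p[0],
            )
        paths = prune(new_paths, beam)
    return max(paths.values(), key=lambda p: p[0])[1]


def logs(table):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical toy model (assumption, not data from the paper).
STATES = ["m", "ey", "b"]
LOG_START = logs({"m": 0.8, "ey": 0.1, "b": 0.1})
LOG_TRANS = {
    "m": logs({"m": 0.5, "ey": 0.4, "b": 0.1}),
    "ey": logs({"m": 0.1, "ey": 0.5, "b": 0.4}),
    "b": logs({"m": 0.1, "ey": 0.1, "b": 0.8}),
}
LOG_EMIT = {
    "m": logs({"A": 0.8, "B": 0.1, "C": 0.1}),
    "ey": logs({"A": 0.1, "B": 0.8, "C": 0.1}),
    "b": logs({"A": 0.1, "B": 0.1, "C": 0.8}),
}
OBS = ["A", "A", "B", "C"]

best_path = viterbi_beam(OBS, STATES, LOG_START, LOG_TRANS, LOG_EMIT, beam=3.0)
print(best_path)
```

With a very wide beam nothing is pruned and the function reduces to the ordinary Viterbi search. A narrower beam discards more paths at each observation, which saves work at the risk of occasionally discarding the path that would have won.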
By this elimination at each observation, only the most probable paths survive until the end of the observation sequence; the recognized word sequence can then be generated.

The dictionary is important in this process. As the system processes each observation, it must check the dictionary to see whether the phonemes matched so far form a word. Once a word is recognized, the system uses the grammar file to influence the choice of the next word, which helps make the sentence meaningful, much as a grammar checker does in a word-processing program.

[Figure 1. A simplified model of an HMM-based speech recognition system]

Implementing an effective SR system requires the synthesis of concepts from several fields, including:

- Signal Processing (for extracting important features from the sound waves)
- Probability and Statistics (for defining and training the models and for recognition)
- Linguistics (for building the dictionary and the grammar model)
- Computer Science (for creating efficient search algorithms)

Improvements in any of the subsystems above could result in better overall performance. The following performance goals are among the main objectives of any SR system [2]:

1. Increased Accuracy
Accuracy is measured as the percentage of words (or sentences) that are correctly detected. To emphasize the importance of this value, consider an SR system that has an accuracy of 99%. During dictation, one word in every 100 is identified incorrectly. For the average dictation, such an error rate would result in approximately six errors per single-spaced page. Searching for and correcting these errors is a tedious and time-consuming task. Even an accuracy of 99% may therefore not be sufficiently high for dictation.

2. Increased Speed
Many speech recognition applications must run on small hand-held devices with limited CPU power. It is important for the SR system to be as computationally inexpensive as possible. This allows the system to transcribe speech in real time, even on a low-performance device like a PDA or cell phone.

3. Speaker Independence
Most SR systems can operate with very high accuracy if they are trained to a specific speaker. However, problems can arise when the system attempts to transcribe the speech of a different person. The reasons for this include, but are not limited to:
   a. Different speakers may pronounce the same phonemes (sub-words) differently, e.g. a speaker from Brooklyn may pronounce certain vowels differently than a speaker from California.
   b. Different speakers may pronounce the same words differently, with a different sequence of phonemes, e.g. the word "maybe" could be pronounced as "m ey b iy" or as "m ey v iy".

In this project, our goal is to improve the recognition accuracy of speaker-independent SR systems by taking into account that different speakers may pronounce the same words differently. This requires building a dictionary that has multiple pronunciations for any given word. The details of building such a dictionary are explained below. In order to test the effects of this approach, a set of carefully designed experiments was conducted, the results of which are in the Results and Discussion section.

Building the Dictionary

The TIMIT (Texas Instruments-Massachusetts Institute of Technology) database was used for both training and testing of the SR system. This database contains audio files of sample sentences as well as word-level and phoneme-level transcriptions. However, we needed to create our own "pronouncing dictionary" before we could do training and recognition. Each word in a pronouncing dictionary is associated with a series of phonemes; when this sequence of phonemes is pronounced, it approximates the sound of the word being spoken. Some words, however, need multiple pronunciations in order to be best represented (see Table 1 for examples).

Table 1. Sample dictionary entries. Multiple pronunciations can exist for the same word. For example, the first pronunciation of coffee is that of a typical New Yorker, while the second is that of someone from the old Northwest.

  Word        Phoneme-level Transcription
  As          ae z
              ax z
  Coauthors   kcl k ow ao th axr z
  Coffee      k aa iy f
              k ao f iy

All of the TIMIT data was used in the experiments. About 70% was used to train the models, and the rest was used for testing. Table 2 summarizes key characteristics of the TIMIT data.

Table 2. Key characteristics of the TIMIT data

  Number of...
  Speakers                         630 (70% male, 30% female)
  Utterances                       6300 (10 per speaker)
  Distinct texts                   2342
  Words                            6099
  Utterances in the training set   4620
  Utterances in the test set       1680
  Male speakers                    326 (training set) + 112 (test set)
  Female speakers                  136 (training set) + 56 (test set)

Method limitations

In order to provide a versatile training set, data was collected from speakers spanning eight different geographical areas to capture regional dialects [4] (shown in Figure 2). Unfortunately, these regional dialects introduce problems for the construction of a dictionary. Since many of the speakers pronounce words differently, especially short, common words, multiple transcriptions for the same word were common. In fact, the unedited dictionary contained an average of two pronunciations per word [1]. However, these pronunciations tended to cluster around certain words, with most words having only one pronunciation but others having six or more. A certain level of multiple pronunciations can be helpful for capturing common divergences in pronunciation (e.g. potato/pot_to), as even the best hand-made dictionaries contain 1.1 to 1.2 pronunciations per word [5]. Too many pronunciations per word, however, could be misleading when the computer performs recognition, because a poorly recognized string of phonemes could cause a word mismatch that makes a sentence meaningless.

Methodology

In order to reduce the number of redundant definitions in the dictionary, two methods were investigated. The first, called skimming, simply deletes low-frequency transcriptions. This method defines a threshold based on the most commonly occurring pronunciations; anything below this threshold is removed from the dictionary. The second method, called percentaging, encodes the frequency at which each pronunciation was encountered into the definitions themselves, which a speech recognizer can use to modify its recognition network.

As mentioned earlier, the dictionary was created by examining all the speech transcriptions during training. The original system simply added every distinct combination of word and phonemes to the dictionary. The new method does the same thing, except that it keeps count of how many times each combination occurs.
These counts can be used to perform either skimming (removal of low-occurrence pairs) or percentaging (placing percentage-occurrence information with the data). This paper examines the effect on SR system accuracy of skimming alone, percentaging alone, and skimming followed by percentaging.

[Figure 2. Speaker breakdown by region (total number of speakers = 630)]

Results and Discussion

The two methods of dictionary making (skimming and percentaging) were tested independently and then with the two-step method (skimming followed by percentaging). As shown in Figure 4, skimming provided a much larger accuracy gain than did percentaging. When the two were used together, percentaging added only a very slim increase in accuracy over skimming alone.

An unexpected bonus of the skimming method is a reduction in the required recognition time. Since there are fewer possible recognition sequences, the computer can process each speech fragment more rapidly. At optimal skimming (30%), the test took only two hours rather than the full three, which is a 33% reduction in time. Furthermore, at this level of skimming, there was an average of 1.6 pronunciations per word, as opposed to 2.0 pronunciations without skimming. Thus, skimming not only provides more accurate speech recognition but also allows for faster recognition.

Conclusion

Since people speak in many different ways, one would expect a large dictionary, inclusive of all pronunciation and dialectal differences, to be a valuable aid to recognition. However, this paper demonstrates that the most effective SR system dictionary may be one with a small number of select alternate pronunciations per word. The dictionary must be built so that a balance is found between being inclusive of pronunciation variants and conforming to the limitations of a computerized recognition system.
This study observed that while a typical SR system contains on average 2.0 pronunciations per word, maximum SR system accuracy is achieved at 1.6 pronunciations per word. Tailoring a dictionary to contain fewer pronunciations per word is best achieved manually; however, given the size of a typical SR system dictionary (13,000 entries), a computerized method of deleting erroneous and infrequently used pronunciations is more practical. This study revealed that skimming both increases recognition accuracy and results in a 33% reduction in the time required for speech recognition tasks.

An example of an unaltered dictionary

Original
  1       about   ax b ae t
  2       about   ax b ah
  4       about   ax b aw d
  17      about   ax b aw t
  1       about   b ah
  1       about   b aw

Clearly, "ax b aw t" is the most commonly given pronunciation for the word "about", and so it should be given the highest weight, whereas the pronunciations "b ah" and "b aw" should simply be erased.

Examples of dictionaries after processing

After Skimming
  4       about   ax b aw d
  17      about   ax b aw t

Here we have done skimming, deleting the least frequently occurring pronunciations.

After Percentaging
  3.85%   about   ax b ae t
  7.69%   about   ax b ah
  15.38%  about   ax b aw d
  65.38%  about   ax b aw t
  3.85%   about   b ah
  3.85%   about   b aw

Here we have done percentaging, inserting the frequency of each pronunciation as a percentage.

Figure 3. The process of modifying the dictionary

[Table 3. Simulation Test Results]

[Figure 4. Test Results]
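The counting, skimming, and percentaging steps illustrated in Figure 3 can be sketched as follows. This is a minimal illustration, not the study's implementation. The function names are hypothetical, and the skimming rule (keep a pronunciation only if its count is at least `ratio` times the count of the word's most frequent pronunciation) is an assumed reading of "a threshold based on the most commonly occurring pronunciations". The value `ratio=0.2` was picked only because it reproduces the "After Skimming" example in Figure 3; it is not the 30% setting reported in the results, whose exact definition the text does not give.

```python
from collections import Counter, defaultdict


def count_pronunciations(transcribed_words):
    """Count how often each (word, phoneme sequence) pair occurs."""
    counts = defaultdict(Counter)
    for word, phonemes in transcribed_words:
        counts[word][tuple(phonemes.split())] += 1
    return counts


def skim(counts, ratio):
    """Drop pronunciations rarer than ratio * (count of the most common one)."""
    skimmed = {}
    for word, prons in counts.items():
        cutoff = ratio * max(prons.values())
        skimmed[word] = Counter({p: c for p, c in prons.items() if c >= cutoff})
    return skimmed


def percentage(counts):
    """Express each pronunciation's count as a percentage of its word's total."""
    result = {}
    for word, prons in counts.items():
        total = sum(prons.values())
        result[word] = {p: round(100.0 * c / total, 2) for p, c in prons.items()}
    return result


def average_pronunciations(counts):
    """Average number of distinct pronunciations per word."""
    return sum(len(prons) for prons in counts.values()) / len(counts)


# Training occurrences rebuilt from the counts shown in Figure 3.
figure3_counts = [
    (1, "ax b ae t"), (2, "ax b ah"), (4, "ax b aw d"),
    (17, "ax b aw t"), (1, "b ah"), (1, "b aw"),
]
occurrences = [("about", p) for n, p in figure3_counts for _ in range(n)]

counts = count_pronunciations(occurrences)
skimmed = skim(counts, ratio=0.2)        # assumed threshold, see lead-in
percentaged = percentage(counts)
two_step = percentage(skimmed)           # skimming followed by percentaging

for pron, pct in percentaged["about"].items():
    print(f"{pct:.2f}%  about  {' '.join(pron)}")
```

Keeping the counts in the dictionary structure is what makes both methods cheap: skimming and percentaging are each a single pass over the same table, and the two-step method is just their composition.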

References

1. Mohajer, K., Zhong-Min Hu, Pratt, V. Time Boundary Assisted speech recognition. International Journal of Information Fusion (Special Issue on Multi-Sensor Information Fusion For Speech Processing Applications).
2. Rabiner, Juang. Fundamentals of Speech Recognition, Chapter 6, Prentice Hall Signal Processing Series, New Jersey.
3. Young, Evermann, Kershaw, Moore, Odell, Ollason, Valtchev, Woodland. The HTK Book, Ch. 5, Cambridge University Engineering Department.
4. Garofolo, Lamel, Fisher, Fiscus, Pallett, Dahlgren. TIMIT Printed Documentation, Ch. 5, U.S. Dept. of Commerce.
5. Hain, T., Woodland, P.C., Evermann, G. et al. New features in the CU-HTK system for transcription of conversational telephone speech. Acoustics, Speech, and Signal Processing, Proceedings (ICASSP 01), IEEE International Conference on, Vol. 1, pp. 57-60.
6. Culicover, Peter. Lecture: An Introduction to Language in the Humanities. Spring variation.pdf.

Justy Burdick

Justy Burdick is a sophomore majoring in Electrical Engineering and is considering a minor in Computer Science. He would like to thank Professor Vaughan Pratt and the URP office for sponsoring this Research Experience for Undergraduates project. Furthermore, he would like to thank Keyvan Mohajer for the guidance and help he received at every step of the research process. Finally, he would like to thank the 2002 and 2003 speech group REU team members: Simon Hu, Melissa Mansur, Ryan Bickerstaff, Kenneth Lee, Gu Pan Grace.


More information

Dragon Solutions Enterprise Profile Management

Dragon Solutions Enterprise Profile Management Dragon Solutions Enterprise Profile Management summary Simplifying System Administration and Profile Management for Enterprise Dragon Deployments In a distributed enterprise, IT professionals are responsible

More information

Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents

Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents Michael J. Witbrock and Alexander G. Hauptmann Carnegie Mellon University ABSTRACT Library

More information

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA Tuija Niemi-Laitinen*, Juhani Saastamoinen**, Tomi Kinnunen**, Pasi Fränti** *Crime Laboratory, NBI, Finland **Dept. of Computer

More information

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS Bálint Tóth, Tibor Fegyó, Géza Németh Department of Telecommunications and Media Informatics Budapest University

More information

Enterprise Voice Technology Solutions: A Primer

Enterprise Voice Technology Solutions: A Primer Cognizant 20-20 Insights Enterprise Voice Technology Solutions: A Primer A successful enterprise voice journey starts with clearly understanding the range of technology components and options, and often

More information

Khalid Sayood and Martin C. Rost Department of Electrical Engineering University of Nebraska

Khalid Sayood and Martin C. Rost Department of Electrical Engineering University of Nebraska PROBLEM STATEMENT A ROBUST COMPRESSION SYSTEM FOR LOW BIT RATE TELEMETRY - TEST RESULTS WITH LUNAR DATA Khalid Sayood and Martin C. Rost Department of Electrical Engineering University of Nebraska The

More information

have more skill and perform more complex

have more skill and perform more complex Speech Recognition Smartphone UI Speech Recognition Technology and Applications for Improving Terminal Functionality and Service Usability User interfaces that utilize voice input on compact devices such

More information

Corpus Design for a Unit Selection Database

Corpus Design for a Unit Selection Database Corpus Design for a Unit Selection Database Norbert Braunschweiler Institute for Natural Language Processing (IMS) Stuttgart 8 th 9 th October 2002 BITS Workshop, München Norbert Braunschweiler Corpus

More information

Knowledge Management and Speech Recognition

Knowledge Management and Speech Recognition Knowledge Management and Speech Recognition by James Allan Knowledge Management (KM) generally refers to techniques that allow an organization to capture information and practices of its members and customers,

More information

Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition

Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition Tim Morris School of Computer Science, University of Manchester 1 Introduction to speech recognition 1.1 The

More information

VOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications

VOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications VOICE RECOGNITION KIT USING HM2007 Introduction Speech Recognition System The speech recognition system is a completely assembled and easy to use programmable speech recognition circuit. Programmable,

More information

Word Completion and Prediction in Hebrew

Word Completion and Prediction in Hebrew Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology

More information

EXPERIMENTAL DESIGN REFERENCE

EXPERIMENTAL DESIGN REFERENCE EXPERIMENTAL DESIGN REFERENCE Scenario: A group of students is assigned a Populations Project in their Ninth Grade Earth Science class. They decide to determine the effect of sunlight on radish plants.

More information

TIM 50 - Business Information Systems

TIM 50 - Business Information Systems TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz March 1, 2015 The Database Approach to Data Management Database: Collection of related files containing records on people, places, or things.

More information

Speech recognition technology for mobile phones

Speech recognition technology for mobile phones Speech recognition technology for mobile phones Stefan Dobler Following the introduction of mobile phones using voice commands, speech recognition is becoming standard on mobile handsets. Features such

More information

The LENA TM Language Environment Analysis System:

The LENA TM Language Environment Analysis System: FOUNDATION The LENA TM Language Environment Analysis System: The Interpreted Time Segments (ITS) File Dongxin Xu, Umit Yapanel, Sharmi Gray, & Charles T. Baer LENA Foundation, Boulder, CO LTR-04-2 September

More information

The AV-LASYN Database : A synchronous corpus of audio and 3D facial marker data for audio-visual laughter synthesis

The AV-LASYN Database : A synchronous corpus of audio and 3D facial marker data for audio-visual laughter synthesis The AV-LASYN Database : A synchronous corpus of audio and 3D facial marker data for audio-visual laughter synthesis Hüseyin Çakmak, Jérôme Urbain, Joëlle Tilmanne and Thierry Dutoit University of Mons,

More information

Using ELAN for transcription and annotation

Using ELAN for transcription and annotation Using ELAN for transcription and annotation Anthony Jukes What is ELAN? ELAN (EUDICO Linguistic Annotator) is an annotation tool that allows you to create, edit, visualize and search annotations for video

More information

Carla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software

Carla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software Carla Simões, t-carlas@microsoft.com Speech Analysis and Transcription Software 1 Overview Methods for Speech Acoustic Analysis Why Speech Acoustic Analysis? Annotation Segmentation Alignment Speech Analysis

More information

Framework for Joint Recognition of Pronounced and Spelled Proper Names

Framework for Joint Recognition of Pronounced and Spelled Proper Names Framework for Joint Recognition of Pronounced and Spelled Proper Names by Atiwong Suchato B.S. Electrical Engineering, (1998) Chulalongkorn University Submitted to the Department of Electrical Engineering

More information

Robustness of a Spoken Dialogue Interface for a Personal Assistant

Robustness of a Spoken Dialogue Interface for a Personal Assistant Robustness of a Spoken Dialogue Interface for a Personal Assistant Anna Wong, Anh Nguyen and Wayne Wobcke School of Computer Science and Engineering University of New South Wales Sydney NSW 22, Australia

More information

VACA: A Tool for Qualitative Video Analysis

VACA: A Tool for Qualitative Video Analysis VACA: A Tool for Qualitative Video Analysis Brandon Burr Stanford University 353 Serra Mall, Room 160 Stanford, CA 94305 USA bburr@stanford.edu Abstract In experimental research the job of analyzing data

More information

customer care solutions

customer care solutions customer care solutions from Nuance white paper :: Understanding Natural Language Learning to speak customer-ese In recent years speech recognition systems have made impressive advances in their ability

More information

31 Case Studies: Java Natural Language Tools Available on the Web

31 Case Studies: Java Natural Language Tools Available on the Web 31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software

More information

FOURIER TRANSFORM BASED SIMPLE CHORD ANALYSIS. UIUC Physics 193 POM

FOURIER TRANSFORM BASED SIMPLE CHORD ANALYSIS. UIUC Physics 193 POM FOURIER TRANSFORM BASED SIMPLE CHORD ANALYSIS Fanbo Xiang UIUC Physics 193 POM Professor Steven M. Errede Fall 2014 1 Introduction Chords, an essential part of music, have long been analyzed. Different

More information

ABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition

ABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition The CU Communicator: An Architecture for Dialogue Systems 1 Bryan Pellom, Wayne Ward, Sameer Pradhan Center for Spoken Language Research University of Colorado, Boulder Boulder, Colorado 80309-0594, USA

More information

IEEE Proof. Web Version. PROGRESSIVE speaker adaptation has been considered

IEEE Proof. Web Version. PROGRESSIVE speaker adaptation has been considered IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 1 A Joint Factor Analysis Approach to Progressive Model Adaptation in Text-Independent Speaker Verification Shou-Chun Yin, Richard Rose, Senior

More information

Teacher s Guide for Let s Go Mastery Tests

Teacher s Guide for Let s Go Mastery Tests Teacher s Guide for Let s Go Mastery Tests Version 1.1 Copyright 1999~2003, DynEd International, Inc. March 2003 Table of Contents Introduction...3 Mastery Tests...4 When to Take a Test...4 Types of Test

More information

Speech Recognition Software Review

Speech Recognition Software Review Contents 1 Abstract... 2 2 About Recognition Software... 3 3 How to Choose Recognition Software... 4 3.1 Standard Features of Recognition Software... 4 3.2 Definitions... 4 3.3 Models... 5 3.3.1 VoxForge...

More information

Speech Signal Processing: An Overview

Speech Signal Processing: An Overview Speech Signal Processing: An Overview S. R. M. Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati December, 2012 Prasanna (EMST Lab, EEE, IITG) Speech

More information

Toward completely automated vowel extraction: Introducing DARLA

Toward completely automated vowel extraction: Introducing DARLA Toward completely automated vowel extraction: Introducing DARLA Sravana Reddy and James N. Stanford July 2015 (preprint) Abstract Automatic Speech Recognition (ASR) is reaching further and further into

More information

School Class Monitoring System Based on Audio Signal Processing

School Class Monitoring System Based on Audio Signal Processing C. R. Rashmi 1,,C.P.Shantala 2 andt.r.yashavanth 3 1 Department of CSE, PG Student, CIT, Gubbi, Tumkur, Karnataka, India. 2 Department of CSE, Vice Principal & HOD, CIT, Gubbi, Tumkur, Karnataka, India.

More information

Speech recognition for human computer interaction

Speech recognition for human computer interaction Speech recognition for human computer interaction Ubiquitous computing seminar FS2014 Student report Niklas Hofmann ETH Zurich hofmannn@student.ethz.ch ABSTRACT The widespread usage of small mobile devices

More information

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be

More information

Understanding Video Lectures in a Flipped Classroom Setting. A Major Qualifying Project Report. Submitted to the Faculty

Understanding Video Lectures in a Flipped Classroom Setting. A Major Qualifying Project Report. Submitted to the Faculty 1 Project Number: DM3 IQP AAGV Understanding Video Lectures in a Flipped Classroom Setting A Major Qualifying Project Report Submitted to the Faculty Of Worcester Polytechnic Institute In partial fulfillment

More information

WATER BODY EXTRACTION FROM MULTI SPECTRAL IMAGE BY SPECTRAL PATTERN ANALYSIS

WATER BODY EXTRACTION FROM MULTI SPECTRAL IMAGE BY SPECTRAL PATTERN ANALYSIS WATER BODY EXTRACTION FROM MULTI SPECTRAL IMAGE BY SPECTRAL PATTERN ANALYSIS Nguyen Dinh Duong Department of Environmental Information Study and Analysis, Institute of Geography, 18 Hoang Quoc Viet Rd.,

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

6.080/6.089 GITCS Feb 12, 2008. Lecture 3

6.080/6.089 GITCS Feb 12, 2008. Lecture 3 6.8/6.89 GITCS Feb 2, 28 Lecturer: Scott Aaronson Lecture 3 Scribe: Adam Rogal Administrivia. Scribe notes The purpose of scribe notes is to transcribe our lectures. Although I have formal notes of my

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Reading Assistant: Technology for Guided Oral Reading

Reading Assistant: Technology for Guided Oral Reading A Scientific Learning Whitepaper 300 Frank H. Ogawa Plaza, Ste. 600 Oakland, CA 94612 888-358-0212 www.scilearn.com Reading Assistant: Technology for Guided Oral Reading Valerie Beattie, Ph.D. Director

More information

Voice Driven Animation System

Voice Driven Animation System Voice Driven Animation System Zhijin Wang Department of Computer Science University of British Columbia Abstract The goal of this term project is to develop a voice driven animation system that could take

More information

Strand: Reading Literature Topics Standard I can statements Vocabulary Key Ideas and Details

Strand: Reading Literature Topics Standard I can statements Vocabulary Key Ideas and Details Strand: Reading Literature Key Ideas and Craft and Structure Integration of Knowledge and Ideas RL.K.1. With prompting and support, ask and answer questions about key details in a text RL.K.2. With prompting

More information

IFS-8000 V2.0 INFORMATION FUSION SYSTEM

IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 Overview IFS-8000 v2.0 is a flexible, scalable and modular IT system to support the processes of aggregation of information from intercepts to intelligence

More information

Portions have been extracted from this report to protect the identity of the student. RIT/NTID AURAL REHABILITATION REPORT Academic Year 2003 2004

Portions have been extracted from this report to protect the identity of the student. RIT/NTID AURAL REHABILITATION REPORT Academic Year 2003 2004 Portions have been extracted from this report to protect the identity of the student. Sessions: 9/03 5/04 Device: N24 cochlear implant Speech processors: 3G & Sprint RIT/NTID AURAL REHABILITATION REPORT

More information

TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE

TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE Sangam P. Borkar M.E. (Electronics)Dissertation Guided by Prof. S. P. Patil Head of Electronics Department Rajarambapu Institute of Technology Sakharale,

More information

DynEd International, Inc.

DynEd International, Inc. General Description: Proficiency Level: Course Description: Computer-based Tools: Teacher Tools: Assessment: Teacher Materials: is a multimedia course for beginning through advanced-level students of spoken

More information

A Guide to Cambridge English: Preliminary

A Guide to Cambridge English: Preliminary Cambridge English: Preliminary, also known as the Preliminary English Test (PET), is part of a comprehensive range of exams developed by Cambridge English Language Assessment. Cambridge English exams have

More information

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS June 2005 Authorized for Distribution by the New York State Education Department "NYSTCE," "New York State Teacher Certification Examinations," and the

More information

Technologies for Voice Portal Platform

Technologies for Voice Portal Platform Technologies for Voice Portal Platform V Yasushi Yamazaki V Hitoshi Iwamida V Kazuhiro Watanabe (Manuscript received November 28, 2003) The voice user interface is an important tool for realizing natural,

More information