Building a Regionally Inclusive Dictionary for Speech Recognition
SPRING 2004 · Computer Science & Linguistics

Speech Recognition (SR) is the automated conversion of speech into written text. Applications range from simple phone-based information services to commercial-grade automated customer service systems, such as those for airline phone reservations. While this process is complex in its own right, it is further complicated by the fact that speakers (the users) from different parts of the country have varying accents and pronounce the same words differently. Our aim is to create a more speaker-independent SR system while maintaining speed and accuracy of transcription. This requires the construction of an SR dictionary that accounts for multiple pronunciations of the same word. However, too many alternate pronunciations overload the system and harm both accuracy and speed. By finding the optimal number of pronunciations per word, the percentage of words correctly identified by the SR system increased from 78% using the traditional technique to 85.7% using the improved method outlined in this article. This brings the technology 35% closer to the goal of complete recognition and the use of speech as the primary method of human-computer interaction.

Justin Burdick

Speech Recognition (SR) is a process that transcribes speech into text using a computer. Many repetitive phone tasks can be automated with speech recognition technology, saving businesses significant amounts of money. However, transcribing everyday human speech is considerably more difficult than recognizing a small set of words, as in an SR-based phone information-retrieval system (e.g. 411 services). A complete Speech Recognition system is fairly complex and consists of various subsystems that interact to convert speech into written text, as shown in Figure 1.
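The "35% closer" figure quoted above corresponds to the relative reduction in word error rate implied by the two accuracy numbers, which can be checked directly:

```python
# Word accuracies reported in the article.
baseline_acc = 0.78    # traditional dictionary
improved_acc = 0.857   # optimized dictionary

# "35% closer" to complete recognition = relative reduction in error rate.
err_before = 1 - baseline_acc            # 0.22
err_after = 1 - improved_acc             # 0.143
relative_reduction = (err_before - err_after) / err_before
print(f"{relative_reduction:.0%}")       # 35%
```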
A typical SR system uses a common pattern-matching tool called Hidden Markov Models (HMMs) to model units of speech known as phonemes [1]. The models are first trained on a set of training data, which consists of a large number of wave sound files of speech together with the corresponding text transcriptions. Each speech file is converted into a series of observations known as "feature vectors": sets of numbers that describe the sound mathematically by extracting specific information from the waveform at 25-millisecond intervals. The computer then matches this numerical representation to the corresponding text transcription included in the training data. Applied over the entire training set, this process creates a series of models that later allow the SR system to translate speech into text.

The training process produces two data files: one containing a model of properties for each phoneme, and another containing a list of pronunciations for each word. Phonemes, the most fundamental units of speech, are single sounds; each phoneme is modeled by its average sound, the variation of that sound across speakers, and the transition probabilities between it and other phonemes. The list of pronunciations, termed the pronouncing dictionary, contains each word used during training along with the sequence of phonemes that compose it.

Once the models are trained, the system can perform recognition. To produce meaningful sentences, the SR system relies on a dictionary and a grammar model as well as the phoneme models from the training phase. Transcription is accomplished using the Viterbi algorithm, which divides the incoming speech into individual observations and processes one observation at a time, comparing it to every possible phoneme. When the algorithm moves to the next observation, it eliminates paths that have a low matching probability.
Through this elimination at each observation, only the most probable paths survive to the end of the observation sequence, at which point the recognized word sequence can be generated. The dictionary is central to this process: as the system processes each observation, it checks the dictionary to see whether it has completed a word. Once a word is recognized, the system uses the grammar file to influence the choice of the next word, which is valuable for keeping the sentence meaningful, much like the grammar check in a word-processing program.

Implementing an effective SR system requires the synthesis of concepts from several fields.
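The beam-pruned Viterbi search described above can be sketched as follows. This is a minimal illustration, assuming a toy HMM with two phoneme states and hand-picked discrete emission probabilities; the states, symbols, and probabilities are invented for the example, not taken from the study.

```python
import math

def viterbi_beam(observations, states, start_p, trans_p, emit_p, beam=2):
    """Beam-pruned Viterbi: after each observation, keep only the `beam`
    most probable partial paths (log-probabilities avoid underflow)."""
    paths = {s: (math.log(start_p[s] * emit_p[s][observations[0]]), [s])
             for s in states}
    for obs in observations[1:]:
        new_paths = {}
        for s in states:
            # best surviving predecessor for state s
            lp, path = max(
                ((p + math.log(trans_p[prev][s] * emit_p[s][obs]), path)
                 for prev, (p, path) in paths.items()),
                key=lambda x: x[0])
            new_paths[s] = (lp, path + [s])
        # prune low-probability paths before the next observation
        survivors = sorted(new_paths.items(), key=lambda kv: kv[1][0],
                           reverse=True)[:beam]
        paths = dict(survivors)
    best = max(paths, key=lambda s: paths[s][0])
    return paths[best][1]

# Toy example: two phoneme states emitting discrete acoustic symbols.
states = ["m", "ey"]
start_p = {"m": 0.9, "ey": 0.1}
trans_p = {"m": {"m": 0.4, "ey": 0.6}, "ey": {"m": 0.1, "ey": 0.9}}
emit_p = {"m": {"lo": 0.8, "hi": 0.2}, "ey": {"lo": 0.3, "hi": 0.7}}
print(viterbi_beam(["lo", "hi", "hi"], states, start_p, trans_p, emit_p))
# → ['m', 'ey', 'ey']
```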
SURJ · Building the Dictionary

Figure 1. A simplified model of an HMM-based speech recognition system.

These fields include:
- Signal Processing (for extracting important features from the sound waves)
- Probability and Statistics (for defining and training the models and performing recognition)
- Linguistics (for building the dictionary and the grammar model)
- Computer Science (for creating efficient search algorithms)

Improvements in any of the subsystems above could result in better overall performance. The following performance goals are among the main objectives of any SR system [2]:

1. Increased Accuracy. Accuracy is measured as the percentage of words (or sentences) that are correctly detected. To appreciate the importance of this value, consider an SR system with an accuracy of 99%: during dictation, one word in every hundred is identified incorrectly. For the average dictation, such an error rate would produce approximately six errors per single-spaced page. Searching for and correcting these typos is a tedious and time-consuming task, so even 99% accuracy may not be sufficiently high for dictation.

2. Increased Speed. Many speech recognition applications must run on small hand-held devices with limited CPU power, so it is important for the SR system to be as computationally inexpensive as possible. This allows the system to transcribe speech in real time, even on a low-performance platform such as a PDA or cell phone.

3. Speaker Independence. Most SR systems can operate with very high accuracy if they are trained for a specific speaker, but problems arise when the system attempts to transcribe the speech of a different person. The reasons include, but are not limited to:

a. Different speakers may pronounce the same phonemes differently; e.g., a speaker from Brooklyn may pronounce certain vowels differently than a speaker from California.

b.
Different speakers may pronounce the same words with a different sequence of phonemes; e.g., the word maybe could be pronounced as "m ey b iy" or as "m ey v iy".

In this project, our goal is to improve recognition in speaker-independent SR systems by taking into account that different speakers may pronounce the same words differently. This requires building a dictionary that has multiple pronunciations for any given word. The details of building such a dictionary are explained below. To test the effects of this approach, a set of carefully designed experiments was conducted; the results appear in the Results and Discussion section.

The TIMIT (Texas Instruments-Massachusetts Institute of Technology) database was used for both training and testing of the SR system. This database contains audio files of sample sentences as well as word-level and phoneme-level transcriptions. However, we needed to create our own pronouncing dictionary before we could perform training and recognition. Each word in a pronouncing dictionary is associated with a sequence of phonemes that, when pronounced, accurately approximates the sound of that word; some words need multiple pronunciations to be represented well (see Table 1 for examples). All of the TIMIT data was used in the experiments: about 70% to train the models and the rest for testing. Table 2 summarizes key characteristics of the TIMIT data.

Method limitations

To provide a versatile training set, data was collected from speakers spanning eight different geographical areas to capture regional dialects [4] (shown in Figure 2). Unfortunately, these regional dialects introduce problems for the construction of a dictionary. Since many speakers pronounce words differently, especially short, common words, multiple transcriptions of the same word were common.
Word        Phoneme-level Transcription
As          ae z
As          ax z
Coauthors   kcl k ow ao th axr z
Coffee      k ao f iy
Coffee      k aa f iy

Table 1. Sample dictionary entries. Multiple pronunciations can exist for the same word. For example, the first pronunciation of coffee is that of a typical New Yorker, while the second is that of someone from the old Northwest.

In fact, the
unedited dictionary contained an average of two pronunciations per word [1]. However, these pronunciations tended to cluster around certain words: most words had only one pronunciation, but others had six or more. A certain number of alternate pronunciations helps capture common divergences in pronunciation (e.g. potato/potahto), and even the best hand-made dictionaries contain 1.1 to 1.2 pronunciations per word [5]. Too many pronunciations per word, however, can mislead the computer during recognition, because a poorly recognized string of phonemes can cause a word mismatch that makes a sentence meaningless.

Speakers: 630 (70% male, 30% female)
Utterances: 6300 (10 per speaker)
Distinct texts: 2342
Words: 6099
Utterances in the training set: 4620
Utterances in the test set: 1680
Male speakers: 326 (training set) + 112 (test set)
Female speakers: 136 (training set) + 56 (test set)

Table 2. Key characteristics of the TIMIT data.

Methodology

To reduce the number of redundant definitions in the dictionary, two methods were investigated. The first, called skimming, simply deletes low-frequency transcriptions: a threshold is defined relative to the most commonly occurring pronunciations, and anything below that threshold is removed from the dictionary. The second, called percentaging, encodes the frequency at which each pronunciation was encountered into the definitions themselves, which a speech recognizer can use to modify its recognition network.

As mentioned earlier, the dictionary was created by examining all the speech transcriptions during training. The original system simply added every combination of word and phoneme sequence it encountered. The new method does the same, except that it also keeps count of how many times each combination occurs.
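The counting step can be sketched as follows; the function name and the sample data (drawn from the worked "about" example in this article) are illustrative, not the study's actual code.

```python
from collections import Counter

def build_pronouncing_dictionary(transcriptions):
    """Count how many times each (word, pronunciation) pair occurs in the
    training transcriptions; pronunciations are phoneme strings."""
    counts = Counter()
    for word, phonemes in transcriptions:
        counts[(word, phonemes)] += 1
    return counts

# Hypothetical training data: (word, phoneme-string) pairs.
data = ([("about", "ax b aw t")] * 17 + [("about", "ax b aw d")] * 4 +
        [("about", "ax b ae t"), ("about", "b ah")])
counts = build_pronouncing_dictionary(data)
print(counts[("about", "ax b aw t")])  # → 17
```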
These counts can be used to perform either skimming (removal of low-occurrence pairs) or percentaging (attaching percentage-occurrence information to the data). This paper examines the effect on SR system accuracy of skimming alone, percentaging alone, and skimming followed by percentaging.

Results and Discussion

The two methods of dictionary making (skimming and percentaging) were tested independently and then as a two-step method (skimming followed by percentaging). As shown in Figure 4, skimming provided a much larger accuracy gain than did percentaging; when the two were used together, percentaging added only a very slim increase in accuracy over skimming alone.

Figure 2. Speaker breakdown by region (total number of speakers = 630).

An unexpected bonus of the skimming method is a reduction in required recognition time. Since there are fewer possible recognition sequences, the computer can process each speech fragment more rapidly. At optimal skimming (30%), the test took only two hours rather than the full three, a 33% reduction in time. Furthermore, at this level of skimming there was an average of 1.6 pronunciations per word, as opposed to 2.0 without skimming. Thus, skimming not only provides more accurate speech recognition but also allows for faster recognition.

Conclusion

Since people speak in many different ways, one might expect a large dictionary, inclusive of all pronunciation and dialect differences, to be a valuable aid to recognition. However, this paper demonstrates that the most effective SR system dictionary may be one with a small number of select alternate pronunciations per word. The dictionary must be built so that a balance is struck between including pronunciation variants and conforming to the limitations of a computerized recognition system.
This study observed that while a typical SR system contains on average 2.0 pronunciations per word, maximum
SR system accuracy is achieved at 1.6 pronunciations per word. Tailoring a dictionary to contain fewer pronunciations per word is best achieved manually; however, given the size of a typical SR system dictionary (13,000 entries), a computerized method of deleting erroneous and infrequently used pronunciations is more practical. This study showed that skimming both increases recognition accuracy and yields a 33% reduction in the time required for speech recognition tasks.

An example of an unaltered dictionary:

Original
1 about ax b ae t
2 about ax b ah
4 about ax b aw d
17 about ax b aw t
1 about b ah
1 about b aw

Clearly, "ax b aw t" is the most commonly given pronunciation of about, so it should be given the highest weight, whereas the definitions "b ah" and "b aw" should simply be erased.

Examples of dictionaries after processing:

After skimming
4 about ax b aw d
17 about ax b aw t

Here skimming has deleted the least frequently occurring pronunciations.

After percentaging
3.85% about ax b ae t
7.69% about ax b ah
15.38% about ax b aw d
65.38% about ax b aw t
3.85% about b ah
3.85% about b aw

Here percentaging has inserted the frequency of each pronunciation as a percentage.

Figure 3. The process of modifying the dictionary.
Table 3. Simulation test results.
Figure 4. Test results.
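The two operations can be sketched on the "about" counts above. This is a minimal illustration: the threshold semantics here (a fraction of the most common pronunciation's count) are an assumption, since the article does not define its 30% skimming level precisely.

```python
def skim(counts, threshold):
    """Skimming: drop pronunciations whose count falls below a fraction
    (`threshold`) of the word's most common pronunciation count."""
    skimmed = {}
    for (word, pron), n in counts.items():
        best = max(c for (w, _), c in counts.items() if w == word)
        if n >= threshold * best:
            skimmed[(word, pron)] = n
    return skimmed

def percentage(counts):
    """Percentaging: re-express each count as a share of the word's total."""
    totals = {}
    for (word, _), n in counts.items():
        totals[word] = totals.get(word, 0) + n
    return {(w, p): round(100 * n / totals[w], 2)
            for (w, p), n in counts.items()}

# Counts from the "about" example in the article.
counts = {("about", "ax b ae t"): 1, ("about", "ax b ah"): 2,
          ("about", "ax b aw d"): 4, ("about", "ax b aw t"): 17,
          ("about", "b ah"): 1, ("about", "b aw"): 1}
print(skim(counts, 0.2))   # keeps only "ax b aw d" (4) and "ax b aw t" (17)
print(percentage(counts)[("about", "ax b aw t")])  # → 65.38
```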
References

1. Mohajer, K., Hu, Z.-M., Pratt, V. Time Boundary Assisted Speech Recognition. International Journal of Information Fusion (Special Issue on Multi-Sensor Information Fusion for Speech Processing Applications).
2. Rabiner, L., Juang, B.-H. Fundamentals of Speech Recognition, Chapter 6. Prentice Hall Signal Processing Series, New Jersey.
3. Young, Evermann, Kershaw, Moore, Odell, Ollason, Valtchev, Woodland. The HTK Book, Ch. 5. Cambridge University Engineering Department.
4. Garofolo, Lamel, Fisher, Fiscus, Pallett, Dahlgren. TIMIT Printed Documentation, Ch. 5. U.S. Department of Commerce.
5. Hain, T., Woodland, P.C., Evermann, G., et al. New features in the CU-HTK system for transcription of conversational telephone speech. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), Vol. 1, pp. 57-60.
6. Culicover, Peter. Lecture: An Introduction to Language in the Humanities. Spring. variation.pdf.

Justin Burdick is a sophomore majoring in Electrical Engineering and is considering a minor in Computer Science. He would like to thank Professor Vaughan Pratt and the URP office for sponsoring this Research Experience for Undergraduates project. He would also like to thank Keyvan Mohajer for the guidance and help he received at every step of the research process. Finally, he would like to thank the 2002 and 2003 speech group REU team members: Simon Hu, Melissa Mansur, Ryan Bickerstaff, Kenneth Lee, and Grace Gu Pan.
EXPERIMENTAL DESIGN REFERENCE Scenario: A group of students is assigned a Populations Project in their Ninth Grade Earth Science class. They decide to determine the effect of sunlight on radish plants.
More informationTIM 50 - Business Information Systems
TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz March 1, 2015 The Database Approach to Data Management Database: Collection of related files containing records on people, places, or things.
More informationSpeech recognition technology for mobile phones
Speech recognition technology for mobile phones Stefan Dobler Following the introduction of mobile phones using voice commands, speech recognition is becoming standard on mobile handsets. Features such
More informationThe LENA TM Language Environment Analysis System:
FOUNDATION The LENA TM Language Environment Analysis System: The Interpreted Time Segments (ITS) File Dongxin Xu, Umit Yapanel, Sharmi Gray, & Charles T. Baer LENA Foundation, Boulder, CO LTR-04-2 September
More informationThe AV-LASYN Database : A synchronous corpus of audio and 3D facial marker data for audio-visual laughter synthesis
The AV-LASYN Database : A synchronous corpus of audio and 3D facial marker data for audio-visual laughter synthesis Hüseyin Çakmak, Jérôme Urbain, Joëlle Tilmanne and Thierry Dutoit University of Mons,
More informationUsing ELAN for transcription and annotation
Using ELAN for transcription and annotation Anthony Jukes What is ELAN? ELAN (EUDICO Linguistic Annotator) is an annotation tool that allows you to create, edit, visualize and search annotations for video
More informationCarla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software
Carla Simões, t-carlas@microsoft.com Speech Analysis and Transcription Software 1 Overview Methods for Speech Acoustic Analysis Why Speech Acoustic Analysis? Annotation Segmentation Alignment Speech Analysis
More informationFramework for Joint Recognition of Pronounced and Spelled Proper Names
Framework for Joint Recognition of Pronounced and Spelled Proper Names by Atiwong Suchato B.S. Electrical Engineering, (1998) Chulalongkorn University Submitted to the Department of Electrical Engineering
More informationRobustness of a Spoken Dialogue Interface for a Personal Assistant
Robustness of a Spoken Dialogue Interface for a Personal Assistant Anna Wong, Anh Nguyen and Wayne Wobcke School of Computer Science and Engineering University of New South Wales Sydney NSW 22, Australia
More informationVACA: A Tool for Qualitative Video Analysis
VACA: A Tool for Qualitative Video Analysis Brandon Burr Stanford University 353 Serra Mall, Room 160 Stanford, CA 94305 USA bburr@stanford.edu Abstract In experimental research the job of analyzing data
More informationcustomer care solutions
customer care solutions from Nuance white paper :: Understanding Natural Language Learning to speak customer-ese In recent years speech recognition systems have made impressive advances in their ability
More information31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
More informationFOURIER TRANSFORM BASED SIMPLE CHORD ANALYSIS. UIUC Physics 193 POM
FOURIER TRANSFORM BASED SIMPLE CHORD ANALYSIS Fanbo Xiang UIUC Physics 193 POM Professor Steven M. Errede Fall 2014 1 Introduction Chords, an essential part of music, have long been analyzed. Different
More informationABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition
The CU Communicator: An Architecture for Dialogue Systems 1 Bryan Pellom, Wayne Ward, Sameer Pradhan Center for Spoken Language Research University of Colorado, Boulder Boulder, Colorado 80309-0594, USA
More informationIEEE Proof. Web Version. PROGRESSIVE speaker adaptation has been considered
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 1 A Joint Factor Analysis Approach to Progressive Model Adaptation in Text-Independent Speaker Verification Shou-Chun Yin, Richard Rose, Senior
More informationTeacher s Guide for Let s Go Mastery Tests
Teacher s Guide for Let s Go Mastery Tests Version 1.1 Copyright 1999~2003, DynEd International, Inc. March 2003 Table of Contents Introduction...3 Mastery Tests...4 When to Take a Test...4 Types of Test
More informationSpeech Recognition Software Review
Contents 1 Abstract... 2 2 About Recognition Software... 3 3 How to Choose Recognition Software... 4 3.1 Standard Features of Recognition Software... 4 3.2 Definitions... 4 3.3 Models... 5 3.3.1 VoxForge...
More informationSpeech Signal Processing: An Overview
Speech Signal Processing: An Overview S. R. M. Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati December, 2012 Prasanna (EMST Lab, EEE, IITG) Speech
More informationToward completely automated vowel extraction: Introducing DARLA
Toward completely automated vowel extraction: Introducing DARLA Sravana Reddy and James N. Stanford July 2015 (preprint) Abstract Automatic Speech Recognition (ASR) is reaching further and further into
More informationSchool Class Monitoring System Based on Audio Signal Processing
C. R. Rashmi 1,,C.P.Shantala 2 andt.r.yashavanth 3 1 Department of CSE, PG Student, CIT, Gubbi, Tumkur, Karnataka, India. 2 Department of CSE, Vice Principal & HOD, CIT, Gubbi, Tumkur, Karnataka, India.
More informationSpeech recognition for human computer interaction
Speech recognition for human computer interaction Ubiquitous computing seminar FS2014 Student report Niklas Hofmann ETH Zurich hofmannn@student.ethz.ch ABSTRACT The widespread usage of small mobile devices
More informationBLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION
BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be
More informationUnderstanding Video Lectures in a Flipped Classroom Setting. A Major Qualifying Project Report. Submitted to the Faculty
1 Project Number: DM3 IQP AAGV Understanding Video Lectures in a Flipped Classroom Setting A Major Qualifying Project Report Submitted to the Faculty Of Worcester Polytechnic Institute In partial fulfillment
More informationWATER BODY EXTRACTION FROM MULTI SPECTRAL IMAGE BY SPECTRAL PATTERN ANALYSIS
WATER BODY EXTRACTION FROM MULTI SPECTRAL IMAGE BY SPECTRAL PATTERN ANALYSIS Nguyen Dinh Duong Department of Environmental Information Study and Analysis, Institute of Geography, 18 Hoang Quoc Viet Rd.,
More informationIntroduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More information6.080/6.089 GITCS Feb 12, 2008. Lecture 3
6.8/6.89 GITCS Feb 2, 28 Lecturer: Scott Aaronson Lecture 3 Scribe: Adam Rogal Administrivia. Scribe notes The purpose of scribe notes is to transcribe our lectures. Although I have formal notes of my
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationReading Assistant: Technology for Guided Oral Reading
A Scientific Learning Whitepaper 300 Frank H. Ogawa Plaza, Ste. 600 Oakland, CA 94612 888-358-0212 www.scilearn.com Reading Assistant: Technology for Guided Oral Reading Valerie Beattie, Ph.D. Director
More informationVoice Driven Animation System
Voice Driven Animation System Zhijin Wang Department of Computer Science University of British Columbia Abstract The goal of this term project is to develop a voice driven animation system that could take
More informationStrand: Reading Literature Topics Standard I can statements Vocabulary Key Ideas and Details
Strand: Reading Literature Key Ideas and Craft and Structure Integration of Knowledge and Ideas RL.K.1. With prompting and support, ask and answer questions about key details in a text RL.K.2. With prompting
More informationIFS-8000 V2.0 INFORMATION FUSION SYSTEM
IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 Overview IFS-8000 v2.0 is a flexible, scalable and modular IT system to support the processes of aggregation of information from intercepts to intelligence
More informationPortions have been extracted from this report to protect the identity of the student. RIT/NTID AURAL REHABILITATION REPORT Academic Year 2003 2004
Portions have been extracted from this report to protect the identity of the student. Sessions: 9/03 5/04 Device: N24 cochlear implant Speech processors: 3G & Sprint RIT/NTID AURAL REHABILITATION REPORT
More informationTEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE
TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE Sangam P. Borkar M.E. (Electronics)Dissertation Guided by Prof. S. P. Patil Head of Electronics Department Rajarambapu Institute of Technology Sakharale,
More informationDynEd International, Inc.
General Description: Proficiency Level: Course Description: Computer-based Tools: Teacher Tools: Assessment: Teacher Materials: is a multimedia course for beginning through advanced-level students of spoken
More informationA Guide to Cambridge English: Preliminary
Cambridge English: Preliminary, also known as the Preliminary English Test (PET), is part of a comprehensive range of exams developed by Cambridge English Language Assessment. Cambridge English exams have
More informationNEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS June 2005 Authorized for Distribution by the New York State Education Department "NYSTCE," "New York State Teacher Certification Examinations," and the
More informationTechnologies for Voice Portal Platform
Technologies for Voice Portal Platform V Yasushi Yamazaki V Hitoshi Iwamida V Kazuhiro Watanabe (Manuscript received November 28, 2003) The voice user interface is an important tool for realizing natural,
More information