BANKIR NameConverter


Master's Thesis

BANKIR NameConverter
- Automatic generation of transcriptions of fund names in speech recognition

22nd of November, 2001

Terese Arvidsson
terarv@stp.ling.uu.se
Language Engineering Programme
Uppsala University & Telia Promotor AB, Uppsala

Supervisors:
Mats Dahllöf, Uppsala University
Ove Andersson, Telia Promotor AB

Abstract

This thesis describes the development and implementation of NameConverter, a program that automatically generates phonetic transcriptions for speech recognition in Nuance Systems. These transcriptions are meant to reflect the speaker variation that a voice control system has to deal with. The program is developed for fund names and stock names. Automatic generation of transcriptions is problematic for names, since the relation between spelling and pronunciation is often irregular. Therefore, an existing transcription, developed for Infovox speech synthesis, is used when generating the transcriptions for speech recognition. The advantage of this approach is that one pronunciation is known. Infovox speech synthesis and Nuance speech recognition use different phonetic alphabets, and a comparison between them was made. An investigation of speaker variation was carried out to find which transcriptions give high recognition accuracy. These transcriptions were used as a reference when implementing NameConverter. The results indicate that a voice control system can achieve higher accuracy using the NameConverter component instead of Autopron, a utility in Nuance Systems used in many voice control applications.

Sammanfattning

This Master's project describes the development of NameConverter, a program that automatically generates phonetic transcriptions for speech recognition in Nuance Systems. These transcriptions are intended to reflect the speaker variation that a voice response system is exposed to. The program is developed for fund and stock names, whose spelling and pronunciation are often irregular. This irregularity makes automatic generation of phonetic transcriptions difficult. In this project, an existing transcription system, written for Infovox speech synthesis, is used to generate the transcriptions needed for speech recognition. The advantage of this method is that one pronunciation variant is known. Infovox speech synthesis and Nuance speech recognition use two different phonetic alphabets, and the differences between them have been investigated. A study of speaker variation was carried out in order to create transcriptions for speech recognition, and these were used as the starting point for the implementation of NameConverter. NameConverter was tested and evaluated with approximately 240 names. The transcriptions generated by the program resulted in clearly higher recognition accuracy than transcriptions generated by Autopron, the automatic generation utility included in Nuance Systems.

Contents

1 Introduction
   1.1 The task
   1.2 Purpose
   1.3 Outline of this thesis
2 Background
   2.1 BANKIR
   2.2 OneVoice
   2.3 Infovox text-to-speech system
       Infovox transcriptions
   2.4 Nuance speech recognition system
       2.4.1 Nuance transcriptions
       Building Nuance grammars
       Grammar specification language
       Subgrammars
       Autopron function
       Problems with irregular words
   2.5 Differences in the two phoneme systems
       Consonants
       Vowels
       Prosody
3 Implementation
   3.1 Building Nuance grammars
       Special characters
       Abbreviations
       Numbers
       Return values
       Confidence scores
       Dynamic grammars
   BANKIR NameConverter
       BANKIRName overview
   NameConverter rules
       Structure rules
       Conversion rules
       Generation rules
       Searching for word parts
       Extended functionality of NameConverter
4 Evaluation
       Evaluation tool
       Comparison between the two tests
5 Conclusions
       Advantage of a speech synthesis transcription
       The grammar structure
       Usefulness in other applications
       Multilingual support
References
Appendix A: RULSYS
Appendix B: Nuance CPA notation
Appendix C: Fund names
       Swedish fund names
Appendix D: Results
       Results in the NameConverter test
       Result in the Autopron test

Acknowledgements

First of all, I would like to thank my academic supervisor at Uppsala University, Mats Dahllöf, for valuable feedback concerning this thesis. The work was carried out at Telia Promotor in Uppsala, and I would like to thank my industrial supervisor, Ove Andersson, for encouraging me and for always having time for my questions during the work. I would also like to thank Fredrik Engberg, Telia Promotor AB, for helping me with the evaluation. I really enjoyed the discussions concerning BANKIRName with Alf Bergstrand and Anders Möllmark. Last, but not least, I would like to thank my father, Mats-Åke Arvidsson, and Peter Cedermark for proofreading this thesis.

1 Introduction

BANKIR (BANkens KundIngång med Röststyrning) is a project at Telia Promotor AB. The purpose of the project is to provide banks with an automatic, voice-controlled service. The project covers many bank functions, for example providing stock market information. The customer will be able to check rates and buy and sell stock and fund shares. Not only will the rates change over time; the set of names will also change as new corporations are launched. The voice control system has to be able to understand and pronounce these new names.

1.1 The task

This thesis describes the development and evaluation of NameConverter, which is a component in BANKIRName, a system in BANKIR for automatic insertion of fund and stock names. This system also includes NameHandler (Bergstrand & Möllmark, 2001), a component that updates the lexicon at run-time as new names are added to BANKIR.

BANKIR NameConverter generates transcriptions that represent the possible pronunciations of a name. These transcriptions reflect the speaker variation that the speech recognition has to interpret. The relation between spelling and pronunciation of names is often irregular, which complicates automatic generation of phonetic descriptions. For that reason, the automatic transcription generation made by NameConverter is based on an existing phonetic transcription used for speech synthesis. From an input describing one possible pronunciation of an utterance, several transcriptions are generated for speech recognition. These transcriptions improve the recognition of names in the voice control system.

The speech synthesis transcription is used in Infovox Enterprise, a text-to-speech system based on diphone synthesis. The generated pronunciations are used in the Nuance speech recognition platform. These two systems are to be integrated so that the system can fully understand the user and respond with speech synthesis instead of pre-recorded speech. The speech recognition used in BANKIR is developed by Nuance Systems, and a Nuance grammar package was built and evaluated to investigate which transcriptions give the greatest accuracy.

1.2 Purpose

The purpose of this thesis is to develop and evaluate BANKIR NameConverter, which automatically generates a number of phonetic transcriptions used for speech recognition. The application uses a phonetic transcription developed for speech synthesis as input to generate alternative pronunciation possibilities for fund names. The application is a component of a voice control concept that can be used in an automatic banking service to give customers easy access to stock market information and facilitate their business transactions.

A fund name can be pronounced in several ways, and the number of possible pronunciations determines the number of phonetic strings that will be generated. An investigation is carried out to find out what kinds of variation exist among speakers.

The evaluation includes a comparison of the phonetic alphabets used in the two systems.

The Infovox Enterprise text-to-speech system and the Nuance speech recognition platform have been developed independently and, therefore, they use different transcription notations in their lexicons, one for text-to-speech and one for speech recognition. There are several differences between the lexicons. The lexicons have different transcription conventions, so the same phonetic distinction can have different typographical representations in the two lexicons. This difference is important to deal with, since a transcription of a given phonetic distinction in the text-to-speech system may signify a different phonetic distinction in the speech recognition system. This could result in either an erroneous response from the system or, in most cases, no response at all. Neither result is acceptable in a voice-controlled system. A second difference is due to the lexicons being developed for different purposes. For example, a phonetic distinction that is important in text-to-speech may have no significance in speech recognition. In order to find out which transcriptions should be generated for the fund names, a Nuance grammar package was built and evaluated.

Initially, the aim was to be able to generate all existing Swedish and foreign fund names at Swedish banks with NameConverter. However, it proved very difficult, and in some cases impossible, to describe the English words phonetically with the Swedish phoneme set. Every acoustic model is adapted to one particular language and cannot be used for others. Therefore, the task was limited to Swedish fund names.

The thesis also includes an evaluation of Autopron, a utility in Nuance Systems that automatically generates pronunciations of unrecognized words, i.e. words not found in the built-in dictionary and which also lack phonetic transcriptions in the user-defined dictionary. Autopron generates the pronunciations from the orthographic string of the word. It is rule-based and has its shortcomings with words that have irregular spellings and pronunciations, for example names. A comparison of the two systems is made as well. Their transcription generation is organized in different ways, which results in contrasting transcriptions for the same words. These diverse transcriptions will naturally result in different recognition results. Some words will be recognized by both systems, but other words might only be recognized by one of them.

1.3 Outline of this thesis

This thesis consists of five sections. This section is an introduction to the assignment. The second section gives background on the BANKIR concept and the system components. The third section describes the approach and the implementation. The fourth section contains an evaluation and a comparison to another system. The last section presents conclusions from this work.

2 Background

In a voice control application, the system responses can be produced either with pre-recorded speech or with synthetic speech. Pre-recorded speech consists of pre-recorded prompts, which are selected in response to the interpretation of the user utterance. There are two approaches to synthetic speech: formant synthesis and diphone synthesis. Formant synthesis focuses on formant sequences and transitions and has yet to reach natural-sounding quality. Diphone synthesis is used to capture the transitional information between segments. It uses a database containing sound files of diphones, i.e. segments beginning in the center of one phone and ending in the center of the next phone. In the two words tak and tam, for example, the diphones /#t/ta/ak/k#/ and /#t/ta/am/m#/ are used, respectively. There can be more than one sound file representing each pair of sounds. In these two words, the same sound file is not used for the diphone /ta/: in the first word the vowel is not nasalized, but in the second it is.

There are two main approaches to automatic speech recognition, a knowledge-based approach and a data-based approach (Keller, 1994). In the first approach, human linguistic knowledge is expressed as acoustic, phonetic, syntactic and morphological rules. In a data-based approach, as in Nuance speech recognition, algorithms are used to automatically extract knowledge directly from the speech signal in order to model its information. Algorithms that can be used for this purpose are artificial neural networks and hidden Markov models (HMMs). Nuance Systems use HMMs in the interpretation and language understanding¹ of spoken input. The speech pattern of a language is structured by HMMs: a finite set of segments is defined, and the HMMs accept all combinations of the phonemes of a given language. These combinations can form diphones, polyphones and whole words. The interpretation of a speech sound depends on its acoustic waveform. The temporal structure and the pauses are employed to structure speech into words (Keller, 1994). A mapping is made from each spoken segment to a written word by using a representation of groups of phonemes, i.e. a phonetic transcription that is interpreted by the HMMs of the acoustic model.

In Infovox Name Database, there is only one pronunciation for each word. Since a speech recognition system will be subject to pronunciation variation, each word will need transcriptions that can deal with alternative pronunciations. Speech synthesis and speech recognition have different needs for phonetic detail. To make synthetic speech sound natural, a vowel is nasalized if preceded or followed by a nasal, as in human speech. This phonetic detail is redundant in Nuance speech recognition, since the same vowel sound will be recognized equally well with or without nasalization. In Swedish, several prosodic elements have to be controlled. The Swedish language uses word accents, which can distinguish words with identical spelling and stress. Differences in pronunciation are very common in everyday speech. They are caused by speaker variation in, for example, speed or dialect. Speakers differ in vowel and syllable reduction, in vowel and syllable length, and in which syllables they stress.

¹ In this thesis, the term language understanding is used with the same meaning as in Nuance Systems. It is a synonym for speech recognition.
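To make the diphone decomposition described at the beginning of this chapter concrete, the short sketch below (my own illustration, not part of the thesis) splits a phone sequence into the diphone units a concatenative synthesizer would use; the phone sequences for tak and tam follow the example given above.

def to_diphones(phones):
    """Split a phone sequence into diphone units, including the
    silence-to-phone (#t) and phone-to-silence (k#) boundary units."""
    units = ["#"] + list(phones) + ["#"]
    return [units[i] + units[i + 1] for i in range(len(units) - 1)]

# The two example words from the diphone discussion above: "tak" and "tam".
print(to_diphones(["t", "a", "k"]))  # ['#t', 'ta', 'ak', 'k#']
print(to_diphones(["t", "a", "m"]))  # ['#t', 'ta', 'am', 'm#']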

Automatic generation of transcriptions is needed in BANKIR because the system will be used in applications at several banks with different sets of funds and shares. Each bank has different demands on dictionary contents, and the contents will change over time. As a voice control system, BANKIR will have to deal with updating the dictionaries for both speech recognition and speech synthesis. Relying on expert help to keep this linguistic information up to date is expensive and time-consuming. BANKIRName has been developed for two reasons: primarily because it updates the dictionaries automatically when needed, and secondly because the use of a speech synthesis transcription increases the chance of generating better transcriptions for speech recognition.

2.1 BANKIR

With BANKIR, banks will be able to offer their customers a personalized banking service. Every bank customer will have his or her personal information, such as account numbers, permanent transactions and a personal share list, stored separately in the voice control system. This personal information contains the funds and shares of the customer. It can be accessed through speaker verification in two different ways: by voice authentication or by saying a personal PIN code. Customers are given the possibility to update this personal information either by text input or by voice enrollment. Naturally, the customer can only add fund and share names already familiar to the system. Therefore, new names need to be included in the voice control system before their funds and shares are launched.

In BANKIRName, new phrases are added as text through a web page interface. These new phrases and their phonetic transcriptions, if not already found in Infovox Name Database, will be added to the text-to-speech system and to the recognition system. Firstly, the transcription of the new phrase has to be found by searching Infovox Name Database. Secondly, this transcription is used to generate several transcriptions that can be interpreted by the Nuance recognition system. Three platforms are involved in BANKIR: OneVoice, Infovox and Nuance.

Figure 1. The three platforms involved in BANKIR: customers make business transactions by voice over the phone; OneVoice connects the incoming telephone calls, Nuance provides speech recognition and Infovox provides text-to-speech.

NameConverter has no direct contact with OneVoice. The component has Infovox transcriptions as input and Nuance transcriptions as output.

Figure 2. The connection between NameConverter and the two platforms Infovox and Nuance: an Infovox transcription from the Name database is the input to NameConverter, which produces one or more Nuance transcriptions for Nuance speech recognition.

Table 1. Components mentioned in this thesis.

Autopron: A utility in Nuance Speech Recognition. It generates a number of phonetic transcriptions from an input of orthographic words. Autopron is used for unrecognized words in several telephony applications and will be compared to NameConverter.

CPA transcription: Computer Phonetic Alphabet. This phonetic transcription is used in Nuance Speech Recognition. The CPA transcriptions used in BANKIRName are generated by the NameConverter component.

Infovox Enterprise: A diphone-based synthesis developed at Babel-Infovox. It uses the RULSYS notation, which is the input to the NameConverter component.

Infovox Name Database: A database containing approximately 180,000 names, including first names and surnames, street names, cities, countries and company names.

Infovox text-to-speech system: See Infovox Enterprise.

NameConverter: The component of BANKIRName that automatically generates phonetic transcriptions for speech recognition.

NameHandler: The component of BANKIRName that receives the new fund names, searches for the fund name in Infovox Name Database, and dynamically updates the grammar and the dictionary. NameHandler is the connection between NameConverter and the other system components of BANKIRName.

NET: Nuance Evaluation Tool. The development tool used to evaluate NameConverter and Autopron in this thesis.

Nuance (Nuance Systems): The speech recognition platform used in BANKIR.

Nuance transcription: See CPA transcription.

OneVoice: A platform that supports speech recognition, text-to-speech and speaker verification. OneVoice is the connection between the different components in the BANKIR system.

RULSYS: The phonetic transcription format used in Infovox Name Database. It was developed at the Royal Institute of Technology, Stockholm. These transcriptions are the input to the NameConverter component.

2.2 OneVoice

OneVoice Interactive Voice Response (OneVoice IVR) is a speech-ready voice solutions platform supporting speech recognition, text-to-speech (TTS) and speaker verification.

In BANKIR, OneVoice receives the incoming calls and retrieves the data (both choices made by button presses and spoken utterances). It also connects to the databases used in the system.

2.3 Infovox text-to-speech system

Infovox TTS processes written text, which can be orthographic or phonetic, to produce synthetic speech. If the entered text is orthographic, pronunciation lexicons and general pronunciation rules are used to convert it into phonetic text. Special rules are used for numbers, amounts and abbreviations. Since most names are spelled and/or pronounced irregularly, a special database including 180,000 names has been constructed. It contains names of people (first names and surnames), companies, streets, cities and countries, which have been manually transcribed. The phonetic text is converted into digitized speech by using an electronic speech production model.

Figure 3. Conversion from text to synthetic speech (http://www.infovox.se/work.htm). By permission.

Infovox transcriptions

Infovox transcriptions are used to give synthetic speech a natural sound. Each pronunciation is represented by a transcription containing phonetic and prosodic information. The notation used is RULSYS (see Appendix A), developed at the Royal Institute of Technology in Stockholm, Sweden. Word boundaries are marked with #. The phoneme notation consists of capital letters, each letter representing the sound of a phoneme. Each vowel sound has two phonemes, one short and one long, and quantity is marked with a colon:

het   #HE:T#
hett  #HET#

The phonemes E, Ä and Ö have additional symbols representing their allophones. Ä and Ö² have extra symbols representing their more open allophones preceding R:

Här   #HÄ3R#  or  #H[3R#
Herr  #HÄ4R#  or  #H[4R#
För   #FÖ3R#  or  #F/3R#
Förr  #FÖ4R#  or  #F/4R#

² As partly shown in the examples, the Swedish characters å, ä and ö have alternative symbols, which is advantageous when using a system that does not support Swedish characters.

Most consonants have just one phoneme representation, except for consonants with retroflex allophones:

Bord  #BO2D#
Sorl  #SÅ2L#
Kors  #KÅ2S#
Hart  #HÅ2T#

The nasal [ŋ] in tung is represented as NG, and the fricatives [ɧ] in skjorta and [ɕ] in kedja are represented as SJ and TJ, respectively.

Words with accent I are marked with ' before the vowel carrying the main stress, and words with accent II are marked with " before the main stressed vowel. Compound words have the compound boundary marked with hy, the stressed vowel in the first part of the word marked with ", and the stressed vowel in the last part of the word marked with '. Compound words have accent II:

anden = fågeln (the bird)   #'ANDE0N#
anden = spöket (the ghost)  #"ANDE0N#
andblom                     #"ANDhyBL'OM#
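To make the notation concrete, here is a small sketch (my own illustration, not part of the thesis) that picks the markers described above out of a RULSYS string; the example transcriptions are the ones just given.

def describe_rulsys(transcription):
    """Report the RULSYS markers described above: accent I ('), accent II ("),
    compound joints (hy) and vowel quantity (:)."""
    body = transcription.strip("#")          # drop the word-boundary marks
    return {
        "accent_I_mark": "'" in body,
        "accent_II_mark": '"' in body,       # compounds carry " on the first part and ' on the last
        "compound": "hy" in body,
        "long_vowels": body.count(":"),
        "parts": body.split("hy"),
    }

for word, trans in [("anden (the bird)", "#'ANDE0N#"),
                    ("anden (the ghost)", '#"ANDE0N#'),
                    ("andblom", '#"ANDhyBL\'OM#')]:
    print(word, describe_rulsys(trans))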

2.4 Nuance speech recognition system

Nuance speech recognition is one of the components of Nuance Voice Interface Software, used in several applications in telecommunication, enterprise and Internet solutions where an automatic telephone service is needed. Nuance supports several languages, including Swedish. The following flow chart shows the different stages of the speech recognition process in Nuance.

Figure 4. Nuance recognition process overview (reproduction from Nuance Systems, 2001a, by Bergstrand & Möllmark). By permission. The recognition client preprocesses the audio input into an utterance waveform; the recognition server performs front-end processing, recognition search and interpretation (natural language understanding), using acoustic models, dictionaries and grammars, to go from speech features to a word string and finally to a meaning.

The recognition client (RecClient) handles the interaction between the application and the Nuance system by preprocessing utterances, i.e. performing echo cancellation and background noise reduction. The result, a clean utterance, is sent to and interpreted by the recognition server (RecServer). The RecServer performs speech recognition and natural language understanding by using acoustic models, grammars and dictionaries. Nuance Systems have developed several acoustic models. Each model is adapted to a particular language, dialect or combination of languages. Grammars and dictionaries are specific to each application. The grammar contains the lexical entries accepted in the application. The dictionaries contain the phonetic transcriptions of each entry, described by a language-specific phoneme set.

The acoustic models consist of Hidden Markov Models (HMMs) that interpret the meaning of a sampled speech signal. The HMMs provide mappings from the sampled speech signal to a sequence of phonetic units and from the phonetic sequence to the word sequence representing the transcription (Nuance Systems, 2001b, p. 367). The phonetic sequences and the mappings to orthographic words are stored in the dictionary files. The accepted lexical entries (one orthographic word or combinations of orthographic words) are stored in the grammar files. Every acoustic model has a built-in dictionary with a common vocabulary, which is used for natural language understanding. Domain-specific dictionaries can be used alone or combined with the built-in dictionary. These domain-specific dictionaries can be created manually or by Autopron, a rule-based automatic generation of phonetic strings.

HMMs can also be used to compute the probability that the interpretation made is correct. HMMs are constructed for every sentence allowed by the grammar (Nuance Systems, 2001b, p. 368). Nuance uses HMMs at many levels: sentences are broken down into words, words are broken down into phonemes, which in the end are broken down into start, middle and end states (Nuance Systems, 2001b, p. 368). This probability measure was used in evaluating the results when testing the NameConverter dictionary and the built-in dictionary combined with the Autopron dictionary.

2.4.1 Nuance transcriptions

The Computer Phonetic Alphabet (CPA) in Nuance is a broad transcription, i.e. it ignores many phonetic details. It describes the aspects of a pronunciation just enough to differentiate a phoneme from the other phonemes in the same language. Many distinctive features have proved to be inessential in speech recognition and have therefore been omitted from the CPA notation. This results in different sounds having identical transcriptions and some distinctions never being represented in the transcription. For example, the speech recognition does not detect the difference between the word accents (accent I and accent II) in Swedish; thus words with identical spellings, but with different accents, are transcribed identically. Many allophones are also represented identically. For example, the vowels in the words häll (h E l) and herr (h E r) are transcribed identically. Many sounds are not transcribed in the CPA, for example retroflexion and nasalization. Some of the phonetic details transcribed in the RULSYS notation, used by the speech synthesis developed at Babel-Infovox (formerly the division Infovox at Telia Promotor AB), have to be ignored in the conversion to the CPA, since they are redundant in Nuance speech recognition (see section 3.3.2).

Building Nuance grammars

Nuance Systems supply a development tool for building application-dependent grammars. A Nuance grammar package is needed for every application. The package includes five files, each having a file extension specifying its usage:

File.grammar: contains the entries that should be recognized. A grammar file is created manually to fit the needs of a particular application. An entry can consist of one or many words.

File.dictionary: contains the words in the entries with their phonetic transcription(s).

File.slot_definitions: all slot names used in the application need to be stored in this file; otherwise, a compilation error will occur. A slot can be defined for each type of information that is accepted in the application. It is filled with a value during interpretation.

File.missing: during grammar compilation, an automatic check is performed to make sure that all words have at least one phonetic transcription (see the sketch after this list). If one or more words in the dictionaries lack a phonetic transcription, a file with the extension .missing is created containing these words. There are two ways of dealing with untranscribed words: their transcriptions can either be added manually to the dictionary file before the next compilation or be added using the Autopron option when recompiling the grammar.

File.autopron: if specified during grammar compilation, the Nuance Autopron function automatically generates phonetic transcriptions for the words not found in any of the dictionaries used in an application. These will be saved in a file with the extension .autopron.
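As an illustration of the compile-time check described above (my own sketch, not Nuance code), the following computes which grammar words would end up in the .missing file because no dictionary provides a transcription for them. The transcription for mega is an assumption made for the example; the roburs transcriptions are taken from the dictionary example in the next subsection.

def missing_words(grammar_words, *dictionaries):
    """Return the grammar words that lack a phonetic transcription in every
    supplied dictionary -- the words that would end up in File.missing."""
    covered = set()
    for dictionary in dictionaries:
        covered.update(dictionary)
    return sorted(word for word in set(grammar_words) if word not in covered)

built_in_dictionary = {"mega": ["m e g a"]}   # assumed entry, for illustration only
user_dictionary = {"roburs": ["r O b U: r s", "r O: b U: r s"]}
grammar_words = ["roburs", "mega", "bosparfonden"]

print(missing_words(grammar_words, built_in_dictionary, user_dictionary))
# ['bosparfonden']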

Grammar specification language

Since an entry in the grammar files can consist not only of one word but of a combination of words as well, syntactic operators are used to combine words. These syntactic operators are part of the Grammar Specification Language (GSL), developed for Nuance Systems (a small sketch after the list shows which word sequences they accept):

()  Parentheses are used when one entry consists of more than one word.
    Example: (likviditetsfonden mega)

[]  Brackets contain two or more alternative words; exactly one of the words is used.
    Example: (roburs [amerikafond finansfond japanfond])

?   Optional element: (svenska obligationsfonden ?mega)

+   The word occurs 1 to n times: (thank you +very much)

*   The word occurs 0 to n times: (thank you *very much)
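The sketch below (my own, heavily simplified; it is not the Nuance parser and does not read GSL syntax) expands an expression into the concrete word sequences it accepts. The expression encoding with nested tuples is an assumption made for the sketch, and the + and * operators are omitted.

from itertools import product

# Expression encoding used only in this sketch (an assumption, not GSL syntax):
#   ("seq", a, b, ...)  all elements in order, like ( ... )
#   ("alt", a, b, ...)  exactly one of the elements, like [ ... ]
#   ("opt", a)          the element may be left out, like ?a
def expand(expr):
    """Return every word sequence the expression accepts."""
    if isinstance(expr, str):
        return [[expr]]
    kind, *parts = expr
    if kind == "seq":
        return [sum(combo, []) for combo in product(*(expand(p) for p in parts))]
    if kind == "alt":
        return [seq for p in parts for seq in expand(p)]
    if kind == "opt":
        return [[]] + expand(parts[0])
    raise ValueError(kind)

# (roburs [amerikafond finansfond japanfond])
print(expand(("seq", "roburs", ("alt", "amerikafond", "finansfond", "japanfond"))))
# (svenska obligationsfonden ?mega)
print(expand(("seq", "svenska", "obligationsfonden", ("opt", "mega"))))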

Example of a grammar file:

Grammar [
    (likviditetsfonden mega) {prompt}
    (roburs [amerikafond finansfond japanfond]) {prompt}
    bosparfonden {prompt}
]

The name of the grammar is written with the first letter capitalized. If pre-recorded utterances are used, as in most test situations, a word prompt may define, for instance, how the system should respond, for example "No utterance was recognized. Please repeat."

The dictionary file contains the phonetic transcriptions of each word:

telia   t e l i a
telia   t e l j a
roburs  r O b U: r s
roburs  r O: b U: r s

Subgrammars

Subgrammars are used when a slot in an utterance, i.e. a syntactic space, can be filled with several words. Instead of using square brackets as above, a natural language command can be used to fill slots with a value. The variable f is filled with the values returned by the subgrammar Funds. .Fund is a top-level grammar, which is indicated by the initial dot:

.Fund [
    (roburs Funds:f) {<namn $f>}
]

Funds [
    amerikafond {return (amerikafond)}
    finansfond {return (finansfond)}
    japanfond {return (japanfond)}
]

Autopron function

As mentioned, the words that lack a phonetic transcription in all of the dictionaries are stored in a file with the extension .missing during grammar compilation. Transcriptions of these words can be automatically generated by the Autopron utility if that option is specified when the grammar is compiled again. It creates a number of phonetic transcriptions (and a mapping to the orthographic word) in the Autopron file. The maximum number of generated transcriptions can be specified; if it is not, the default is ten. The function generates between one and ten transcriptions, depending on how many possible transcriptions of a word can be generated, which in turn depends on the length of the word and on how many allophones each sound in the word has.
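To illustrate the idea of a rule-based generator that produces several variants per word, here is a toy sketch in the spirit of Autopron (my own illustration; the letter-to-phone alternatives are assumptions, not the actual Nuance rules).

from itertools import islice, product

# Toy letter-to-phone alternatives -- illustrative assumptions, not the actual Nuance rules.
LETTER_PHONES = {"t": ["t"], "e": ["e", "E"], "l": ["l"], "i": ["i", "j"], "a": ["a"]}

def toy_autopron(word, max_prons=10):
    """Rule-based sketch in the spirit of Autopron: combine the per-letter
    phone alternatives and cap the result at max_prons variants."""
    alternatives = [LETTER_PHONES.get(letter, [letter]) for letter in word]
    variants = (" ".join(phones) for phones in product(*alternatives))
    return list(islice(variants, max_prons))

print(toy_autopron("telia"))
# ['t e l i a', 't e l j a', 't E l i a', 't E l j a']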

When speech recognition and language understanding are performed, Nuance uses the phonetic transcriptions in the Autopron file as well as those in the dictionary files.

Problems with irregular words

Since Autopron is rule-based, it does not work well for words with an irregular relation between spelling and pronunciation. When creating an application including names, Nuance Systems suggest the use of the utility pronounce (Nuance Systems, 2001b, p. 37), which generates the correct pronunciation of a regular word. This information can be used to insert transcriptions of irregular words. For example, if you want the transcription of the English name Vaughan, you can use a regular word that rhymes with it (for example gone) to find out which transcription is applicable. The pronounce utility (which uses the Autopron function) will correctly give you the pronunciation g O n. You then have to add Vaughan v O n manually to the dictionary and merge this with the rest of the dictionaries used.

2.5 Differences in the two phoneme systems

The Infovox phoneme system is more detailed than the CPA used in Nuance. The aim of a text-to-speech system is to sound natural, and every phoneme has many allophones to be used in different contexts. In Nuance speech recognition this is not very important, since a particular sound can be pronounced in various ways and still be recognized correctly by the same phoneme representation.

Consonants

Consonants are described with the same amount of detail in the two notations, except for consonants with retroflexion. In RULSYS, a retroflexed consonant is treated as one phoneme (2T in bort), while in the CPA, the same sound is treated as separate phonemes (r t in bort). However, in the phoneme description of the CPA they are described as a phone sequence.

Vowels

The descriptions of vowels differ in the two phoneme notations. RULSYS has 23 different vowel phones and Nuance 17. In RULSYS, vowels preceding retroflexed consonants have a different phoneme, which results in more open vowels in the synthetic speech. This feature is not essential in speech recognition; the vowel will be recognized as the same no matter which context it occurs in. RULSYS also has a special phoneme for unstressed /e/, as in fonden (the fund). Many features in synthetic speech are produced automatically, by choosing the correct diphone segment for a particular context, to create a natural sound. For example, nasalization and velarization are not described in the transcription. These features are not present in the CPA either, because the vowel is interpreted equally by the speech recognition with or without secondary articulation.

Prosody

A speech synthesis needs prosodic parameters to sound natural. Phrasal timing and accents are crucial when listening to and understanding synthetic speech.

Words with accent I have a higher pitch throughout the word, which needs to be transcribed. Words with accent II have two pitch peaks³, which are transcribed as well. The CPA notation lacks symbols describing prosodic features.

³ The synthetic speech at Infovox is based on the Stockholm dialect, in which, as in most Swedish dialects, words with accent II have two pitch peaks. In, for example, Southern Sweden and in many dialects of Dalarna, however, a word with accent II has just one peak. In those dialects, the difference between the word accents lies in the timing of the peak.
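Putting the differences described in section 2.5 together, a conversion from RULSYS to CPA has to drop word boundaries and prosodic marks, split retroflex consonants into phone sequences and collapse allophone symbols. The sketch below is my own illustration of that idea: the symbol table, the RULSYS form #B'Å2T# for bort and the resulting CPA string are all assumptions loosely based on the examples given earlier, not the actual NameConverter rules, which are the topic of chapter 3.

# A deliberately tiny, assumed symbol table covering only the word "bort";
# the real conversion and generation rules are described in chapter 3.
SYMBOL_MAP = {
    "2T": "r t",          # a retroflex consonant becomes a CPA phone sequence
    "B": "b", "Å": "O",   # assumed consonant/vowel mappings for this example
    "'": "", '"': "", "hy": "", "#": "",   # accents, compound marks and boundaries are dropped
}

def rulsys_to_cpa(rulsys):
    """Greedy longest-match tokenization of a RULSYS string followed by
    symbol-by-symbol replacement -- a sketch of the idea, not the real rules."""
    out, i = [], 0
    symbols = sorted(SYMBOL_MAP, key=len, reverse=True)
    while i < len(rulsys):
        for symbol in symbols:
            if rulsys.startswith(symbol, i):
                if SYMBOL_MAP[symbol]:
                    out.append(SYMBOL_MAP[symbol])
                i += len(symbol)
                break
        else:
            raise ValueError("no mapping for " + rulsys[i])
    return " ".join(out)

# Assumed RULSYS form for bort, built from the examples in section 2.3.
print(rulsys_to_cpa("#B'Å2T#"))   # b O r t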

3 Implementation

This chapter describes the development and implementation of the NameConverter component. Since the aim of the component is to generate transcriptions that give high accuracy in speech recognition, an investigation of speaker variation was carried out: 239 fund names were recorded by six persons (see Appendix C). Each utterance was saved in a WAV file and used as a reference when transcribing the words manually in a Nuance dictionary file.

3.1 Building Nuance grammars

A grammar file containing the fund names and a dictionary containing their transcriptions were built. Some characters in the fund names, so-called special characters, are not allowed in Nuance grammars, and some types of words, namely abbreviations and numbers, showed a very low recognition rate.

Special characters

Some characters, such as numbers, &, - and +, are not accepted in Nuance grammars. As a consequence, they have to be rewritten with alphabetic letters. For example, & has to be written as och (and) to be correctly interpreted by the recognition system. These characters occur in many fund names in different ways:

Carlson ST-fond år
Carlson fond 59+
Eldsjäl 1
Evli euro 50
Handelsbanken Sverige topp 30 index
Handelsbankens generationsfond 40-tal
KPA etisk blandfond 2
Nordbanken premiepensionsfond
Placeringsfonden Seligson & Co Europa 50-indexfond
Pension 2040
SPP generation 30tal
SEB premiefond 50
Hagströmer & Qviberg likviditetsfond
D&G aktiefond

As shown above, there are different writing conventions for special characters in similar names. The hyphen, for example, is an optional element in decades, as in 50tal (50s) and 40-tal (40s). A hyphen between two numbers can be pronounced as the word till (to) or be omitted in speech. The ampersand is always pronounced as och (and); it can occur alone or together with other characters. How these characters and the differences between them are dealt with for consistency is described in a later section.

Abbreviations

Testing the grammar showed that fund names containing abbreviations were almost never recognized. The abbreviation is pronounced as many words, and if it is
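Returning to the special characters discussed above, the short sketch below (my own illustration, not the actual handling in the thesis) rewrites two of the mentioned characters into the words that are spoken; the example names are taken from the list above.

import re

def normalize_name(name):
    """Rewrite two of the special characters discussed above into words:
    '&' becomes 'och', and a hyphen between two digits becomes 'till'.
    (The digits themselves would also have to be spelled out; not shown here.)"""
    name = name.replace("&", " och ")
    name = re.sub(r"(?<=\d)-(?=\d)", " till ", name)
    return " ".join(name.split())

print(normalize_name("Placeringsfonden Seligson & Co Europa 50-indexfond"))
# Placeringsfonden Seligson och Co Europa 50-indexfond
print(normalize_name("D&G aktiefond"))
# D och G aktiefond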


Grade 1 LA. 1. 1. 1. 1. Subject Grade Strand Standard Benchmark. Florida K-12 Reading and Language Arts Standards 27 Grade 1 LA. 1. 1. 1. 1 Subject Grade Strand Standard Benchmark Florida K-12 Reading and Language Arts Standards 27 Grade 1: Reading Process Concepts of Print Standard: The student demonstrates knowledge

More information

Stamford Green Primary School Spanish Curriculum Map. September 2014

Stamford Green Primary School Spanish Curriculum Map. September 2014 Stamford Green Primary School Spanish Curriculum Map September 2014 Contents Page Essential characteristics of linguists Page 3 Aims of the National Curriculum Page 4 Early Years Page 5 Year 1 Expectation

More information

CCSS English/Language Arts Standards Reading: Foundational Skills Kindergarten

CCSS English/Language Arts Standards Reading: Foundational Skills Kindergarten Reading: Foundational Skills Print Concepts CCSS.ELA-Literacy.RF.K.1 Demonstrate understanding of the organization and basic features of print. CCSS.ELA-Literacy.RF.K.1.A Follow words from left to right,

More information

have more skill and perform more complex

have more skill and perform more complex Speech Recognition Smartphone UI Speech Recognition Technology and Applications for Improving Terminal Functionality and Service Usability User interfaces that utilize voice input on compact devices such

More information

Robustness of a Spoken Dialogue Interface for a Personal Assistant

Robustness of a Spoken Dialogue Interface for a Personal Assistant Robustness of a Spoken Dialogue Interface for a Personal Assistant Anna Wong, Anh Nguyen and Wayne Wobcke School of Computer Science and Engineering University of New South Wales Sydney NSW 22, Australia

More information

What the National Curriculum requires in reading at Y5 and Y6

What the National Curriculum requires in reading at Y5 and Y6 What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the

More information

Efficient diphone database creation for MBROLA, a multilingual speech synthesiser

Efficient diphone database creation for MBROLA, a multilingual speech synthesiser Efficient diphone database creation for, a multilingual speech synthesiser Institute of Linguistics Adam Mickiewicz University Poznań OWD 2010 Wisła-Kopydło, Poland Why? useful for testing speech models

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,

More information

Speech and Data Analytics for Trading Floors: Technologies, Reliability, Accuracy and Readiness

Speech and Data Analytics for Trading Floors: Technologies, Reliability, Accuracy and Readiness Speech and Data Analytics for Trading Floors: Technologies, Reliability, Accuracy and Readiness Worse than not knowing is having information that you didn t know you had. Let the data tell me my inherent

More information

TEACHER CERTIFICATION STUDY GUIDE LANGUAGE COMPETENCY AND LANGUAGE ACQUISITION

TEACHER CERTIFICATION STUDY GUIDE LANGUAGE COMPETENCY AND LANGUAGE ACQUISITION DOMAIN I LANGUAGE COMPETENCY AND LANGUAGE ACQUISITION COMPETENCY 1 THE ESL TEACHER UNDERSTANDS FUNDAMENTAL CONCEPTS AND KNOWS THE STRUCTURE AND CONVENTIONS OF THE ENGLISH LANGUAGE Skill 1.1 Understand

More information

A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts

A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts [Mechanical Translation, vol.5, no.1, July 1958; pp. 25-41] A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts A notational

More information

MODELING OF USER STATE ESPECIALLY OF EMOTIONS. Elmar Nöth. University of Erlangen Nuremberg, Chair for Pattern Recognition, Erlangen, F.R.G.

MODELING OF USER STATE ESPECIALLY OF EMOTIONS. Elmar Nöth. University of Erlangen Nuremberg, Chair for Pattern Recognition, Erlangen, F.R.G. MODELING OF USER STATE ESPECIALLY OF EMOTIONS Elmar Nöth University of Erlangen Nuremberg, Chair for Pattern Recognition, Erlangen, F.R.G. email: noeth@informatik.uni-erlangen.de Dagstuhl, October 2001

More information

Speech-Enabled Interactive Voice Response Systems

Speech-Enabled Interactive Voice Response Systems Speech-Enabled Interactive Voice Response Systems Definition Serving as a bridge between people and computer databases, interactive voice response systems (IVRs) connect telephone users with the information

More information

Syntactic Theory on Swedish

Syntactic Theory on Swedish Syntactic Theory on Swedish Mats Uddenfeldt Pernilla Näsfors June 13, 2003 Report for Introductory course in NLP Department of Linguistics Uppsala University Sweden Abstract Using the grammar presented

More information

Functional Auditory Performance Indicators (FAPI)

Functional Auditory Performance Indicators (FAPI) Functional Performance Indicators (FAPI) An Integrated Approach to Skill FAPI Overview The Functional (FAPI) assesses the functional auditory skills of children with hearing loss. It can be used by parents,

More information

Dynamic Learning Maps Essential Elements English Language Arts. Version 2 Comparison Document

Dynamic Learning Maps Essential Elements English Language Arts. Version 2 Comparison Document Dynamic Learning Maps Essential Elements English Language Arts Version 2 Comparison Document COMMON CORE ESSENTIAL ELEMENTS AND ACHIEVEMENT DESCRIPTORS FOR KINDERGARTEN Kindergarten English Language Arts

More information

Real-World Experience Adding Speech to IVR Solutions with MRCP

Real-World Experience Adding Speech to IVR Solutions with MRCP Real-World Experience Adding Speech to IVR Solutions with MRCP A webinar by NMS, ScanSoft and CapitalOne Agenda Introduction to speech technology Dr. Rob Kassel, Senior Product Manager, ScanSoft, Inc.

More information

From Portuguese to Mirandese: Fast Porting of a Letter-to-Sound Module Using FSTs

From Portuguese to Mirandese: Fast Porting of a Letter-to-Sound Module Using FSTs From Portuguese to Mirandese: Fast Porting of a Letter-to-Sound Module Using FSTs Isabel Trancoso 1,Céu Viana 2, Manuela Barros 2, Diamantino Caseiro 1, and Sérgio Paulo 1 1 L 2 F - Spoken Language Systems

More information

A Comparative Analysis of Speech Recognition Platforms

A Comparative Analysis of Speech Recognition Platforms Communications of the IIMA Volume 9 Issue 3 Article 2 2009 A Comparative Analysis of Speech Recognition Platforms Ore A. Iona College Follow this and additional works at: http://scholarworks.lib.csusb.edu/ciima

More information

Dragon Solutions Enterprise Profile Management

Dragon Solutions Enterprise Profile Management Dragon Solutions Enterprise Profile Management summary Simplifying System Administration and Profile Management for Enterprise Dragon Deployments In a distributed enterprise, IT professionals are responsible

More information

set in Options). Returns the cursor to its position prior to the Correct command.

set in Options). Returns the cursor to its position prior to the Correct command. Dragon NaturallySpeaking Commands Summary Dragon Productivity Commands Relative to Dragon NaturallySpeaking v11-12 or higher Dragon Medical Practice Edition and Practice Edition 2 or higher Dictation success

More information

COMPUTER TECHNOLOGY IN TEACHING READING

COMPUTER TECHNOLOGY IN TEACHING READING Лю Пэн COMPUTER TECHNOLOGY IN TEACHING READING Effective Elementary Reading Program Effective approach must contain the following five components: 1. Phonemic awareness instruction to help children learn

More information

CHARTES D'ANGLAIS SOMMAIRE. CHARTE NIVEAU A1 Pages 2-4. CHARTE NIVEAU A2 Pages 5-7. CHARTE NIVEAU B1 Pages 8-10. CHARTE NIVEAU B2 Pages 11-14

CHARTES D'ANGLAIS SOMMAIRE. CHARTE NIVEAU A1 Pages 2-4. CHARTE NIVEAU A2 Pages 5-7. CHARTE NIVEAU B1 Pages 8-10. CHARTE NIVEAU B2 Pages 11-14 CHARTES D'ANGLAIS SOMMAIRE CHARTE NIVEAU A1 Pages 2-4 CHARTE NIVEAU A2 Pages 5-7 CHARTE NIVEAU B1 Pages 8-10 CHARTE NIVEAU B2 Pages 11-14 CHARTE NIVEAU C1 Pages 15-17 MAJ, le 11 juin 2014 A1 Skills-based

More information

A System for Labeling Self-Repairs in Speech 1

A System for Labeling Self-Repairs in Speech 1 A System for Labeling Self-Repairs in Speech 1 John Bear, John Dowding, Elizabeth Shriberg, Patti Price 1. Introduction This document outlines a system for labeling self-repairs in spontaneous speech.

More information

Enabling Speech Based Access to Information Management Systems over Wireless Network

Enabling Speech Based Access to Information Management Systems over Wireless Network Enabling Speech Based Access to Information Management Systems over Wireless Network M. Bagein, O. Pietquin, C. Ris and G. Wilfart 1 Faculté Polytechnique de Mons - TCTS Lab. Parc Initialis - Av. Copernic,

More information

NATURAL SOUNDING TEXT-TO-SPEECH SYNTHESIS BASED ON SYLLABLE-LIKE UNITS SAMUEL THOMAS MASTER OF SCIENCE

NATURAL SOUNDING TEXT-TO-SPEECH SYNTHESIS BASED ON SYLLABLE-LIKE UNITS SAMUEL THOMAS MASTER OF SCIENCE NATURAL SOUNDING TEXT-TO-SPEECH SYNTHESIS BASED ON SYLLABLE-LIKE UNITS A THESIS submitted by SAMUEL THOMAS for the award of the degree of MASTER OF SCIENCE (by Research) DEPARTMENT OF COMPUTER SCIENCE

More information

RADIOLOGICAL REPORTING BY SPEECH RECOGNITION: THE A.Re.S. SYSTEM

RADIOLOGICAL REPORTING BY SPEECH RECOGNITION: THE A.Re.S. SYSTEM RADIOLOGICAL REPORTING BY SPEECH RECOGNITION: THE A.Re.S. SYSTEM B. Angelini, G. Antoniol, F. Brugnara, M. Cettolo, M. Federico, R. Fiutem and G. Lazzari IRST-Istituto per la Ricerca Scientifica e Tecnologica

More information

Hindi & Telugu Text-to-Speech Synthesis (TTS) and inter-language text Conversion

Hindi & Telugu Text-to-Speech Synthesis (TTS) and inter-language text Conversion International Journal of Scientific and Research Publications, Volume 2, Issue 4, April 2012 1 Hindi & Telugu Text-to-Speech Synthesis (TTS) and inter-language text Conversion Lakshmi Sahu and Avinash

More information

Using ELAN for transcription and annotation

Using ELAN for transcription and annotation Using ELAN for transcription and annotation Anthony Jukes What is ELAN? ELAN (EUDICO Linguistic Annotator) is an annotation tool that allows you to create, edit, visualize and search annotations for video

More information