Mobile Speech Processing
1 Mobile Speech Processing
David Huggins-Daines
Language Technologies Institute, Carnegie Mellon University
September 19, 2008
2 Outline
- Mobile Devices
  - What are they?
  - What would we like to do with them?
- Mobile Speech Applications
- Mobile Speech Technologies
- Current Research
3 Mobile Devices
- What is a mobile device?
- A hammer is a device, and you can carry it around with you!
- But no, that's not what we mean here.
4 Mobile Devices
- What is a mobile device?
- A device that goes everywhere with you...
  - which provides some or all of the functions of a computer...
  - and some things a computer doesn't, such as a cell phone or GPS.
5 Speech on Mobile Devices
- Why do we care about speech processing on these devices?
  - Because they are the future of computers
  - Because speech is actually a useful way to interact with them, unlike with full-sized computers
- What kind of speech processing do we care about?
  - Speech coding to improve voice quality for cellular and VoIP
  - Speech recognition for hands-free input to apps
  - Speech synthesis for eyes-free output from apps
- In some cases, speech is a natural and convenient modality
- In other cases, it is a necessity (e.g. in-car navigation)
6 Speech on Mobiles vs. Mobile Speech
- None of this necessarily implies doing actual speech processing (aside from coding) on the device itself
- Telephone dialog systems are mobile by any definition
  - Let's Go: bus scheduling information
  - HealthLine: medical information for rural health workers
  - But all synthesis and recognition is done on a server
- This can be a good thing, especially in the latter case
  - You can't run a speech recognizer on a Motofone or a Nokia 1010
- Speech processing on the device is useful for:
  - Multimodal applications
  - Disconnected applications
  - Access to local data
7 Some Mobile Speech Applications
- GPS navigation
  - Older systems used a small number of recorded prompts ("turn left", "100 metres", etc.)
  - More recently, TTS has been used to speak street names
  - Even more recently, ASR is used for input
- Voice dialing
  - Old systems used DTW and required training
  - Newer ones build models from your address book
  - Cactus for iPhone: uses CMU Flite and Sphinx
- Voice-driven search (local, web, etc.)
  - Nuance, Vlingo, TellMe, and Microsoft are all doing this
- Voice-to-text
  - Typically server-based; requires a data connection
  - On-line, ASR-based: Vlingo, Nuance
  - Off-line, human-assisted: SpinVox, Jott, ReQall
- Speech-to-speech translation
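The template matching behind older voice dialers can be sketched with dynamic time warping (DTW): each enrolled name is a recorded template, and an utterance is assigned to the template with the lowest warped distance. This is a minimal sketch, assuming frames are already feature vectors, a Euclidean local distance, and the basic symmetric step pattern; the function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Local distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], template: List[Frame]) -> float:
    """Classic DTW with the symmetric (skip / skip / match) step pattern."""
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(query[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch: repeat template frame
                                 cost[i][j - 1],      # compress: repeat query frame
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(query: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the name whose trained template is closest under DTW."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))


if __name__ == "__main__":
    # Toy 1-D "features"; a real system would use e.g. cepstral vectors.
    templates = {
        "alice": [[0.0], [1.0], [2.0], [1.0]],
        "bob": [[3.0], [3.0], [0.0]],
    }
    spoken = [[0.0], [1.0], [1.0], [2.0], [2.0], [1.0]]  # a slower "alice"
    print(recognize(spoken, templates))  # prints: alice
```

The need for one recorded template per name per user is exactly the "required training" drawback the slide mentions.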
8 Mobile Speech Technologies
- Speech coding
  - Efficient digital representation of speech signals
  - Fundamental for 2G and 3G cell networks and VoIP
- Speech synthesis
  - Speech output for commands, directions
  - Text-to-speech for messages, books, other content
- Speech recognition
  - Command and control ("voice control")
  - Dictation (speech-to-text for email, SMS)
  - Search input (questions, keywords)
  - Dialogue
9 Speech Coding
- A fairly mature technology (started in the 1960s)
  - Early versions were mostly for military applications
  - Digital cell phone networks changed this dramatically
- Almost universally based on linear prediction and the source-filter model
  - Each sample is predicted as a weighted sum of the P previous samples
  - The weights are the linear prediction coefficients (LPCs), chosen to minimize the mean squared prediction error
  - Conveniently enough, this is actually a good model of the frequency response of the vocal tract (given enough LPCs)
  - An excitation function models the glottal source
- Everything else is just tweaking:
  - Better excitation functions (CELP)
  - Variable bit rates (AMR)
  - Compression tricks (VAD + comfort noise)
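In symbols, the predictor is $\hat{s}[n] = \sum_{k=1}^{P} a_k\, s[n-k]$, and minimizing the squared error $\sum_n (s[n]-\hat{s}[n])^2$ over a frame (autocorrelation method) gives a Toeplitz system that the Levinson-Durbin recursion solves in $O(P^2)$ operations. The sketch below assumes an already-windowed frame and omits windowing, coefficient quantization and excitation coding; it is an illustration of the technique, not code from any particular codec.

```python
from typing import List, Tuple


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n-k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err): a[k-1] is the weight on s[n-k] in the prediction
    s_hat[n] = sum_{k=1..P} a[k-1] * s[n-k], and err is the residual energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # signal already predicted perfectly; higher orders add nothing
        # Reflection coefficient for model order i + 1
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def residual(x: List[float], a: List[float]) -> List[float]:
    """Prediction error e[n] = x[n] - s_hat[n]; in a source-filter coder this
    is the excitation signal that remains to be modelled or transmitted."""
    return [
        x[n] - sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        for n in range(len(x))
    ]
```

For a decaying signal $x[n] = 0.9^n$, an order-2 analysis recovers $a_1 \approx 0.9$ and $a_2 \approx 0$, and the order-1 residual is zero after the first sample: the predictor captures all of the "filter", leaving only the excitation.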
10 Mobile Speech Synthesis
- Two traditional categories, one new one:
  - Synthesis by rule, e.g. formant synthesis
  - Concatenative synthesis, e.g. diphone, unit selection
  - Statistical-parametric synthesis ("HMM synthesis")
- We have had very efficient (often hardware-based) implementations of TTS for decades
  - They sound terrible (but are often quite intelligible)
- The challenges for mobile devices are:
  - Achieving natural-sounding speech
  - Dealing with very large, irregular vocabularies
  - Dealing with raw and diverse input text
11 Mobile Speech Synthesis
- Unit selection currently gives the most natural output
- But it is very ill-suited to mobile implementations
  - The best systems use gigabytes of speech data
  - But, you say... I have an 8GB microSD card in my phone!
  - Storage space is not the only problem:
    - Search time: finding the right units of speech
    - Access time: loading them from the storage medium
    - Signal generation can also be time-consuming if not efficiently implemented
- Some ways to improve efficiency:
  - Compress the speech database
  - Prune the speech database by discarding units that are infrequently or never used
  - Approximate search algorithms (much like ASR)
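To make "approximate search algorithms (much like ASR)" concrete, here is a sketch of unit selection as a Viterbi search over candidate units, with an ASR-style relative beam and an absolute cap on active paths. The `target_cost` and `join_cost` callbacks and the unit identifiers are hypothetical placeholders for a real system's cost functions, and each hypothesis carries a copy of its path for clarity where a real system would keep back-pointers.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Unit = str  # hypothetical unit identifier, e.g. "a_017"


def select_units(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
    beam: float = float("inf"),
    max_active: int = 0,
) -> Tuple[List[Unit], float]:
    """Viterbi search over candidate units, with optional pruning.

    candidates[t] lists the database units that could realize target t.
    Any pruning makes the search approximate: the best path may be lost.
    """
    # best[u] = (cumulative cost, path so far) for paths ending in unit u
    best: Dict[Unit, Tuple[float, List[Unit]]] = {
        u: (target_cost(0, u), [u]) for u in candidates[0]
    }
    for t in range(1, len(candidates)):
        new_best: Dict[Unit, Tuple[float, List[Unit]]] = {}
        for u in candidates[t]:
            # Cheapest way to reach u from any surviving predecessor
            prev_cost, prev_path = min(
                ((c + join_cost(p[-1], u), p) for c, p in best.values()),
                key=lambda cp: cp[0],
            )
            new_best[u] = (prev_cost + target_cost(t, u), prev_path + [u])
        # Relative (beam) pruning: drop paths much worse than the current best
        floor = min(c for c, _ in new_best.values())
        survivors = {u: cp for u, cp in new_best.items() if cp[0] <= floor + beam}
        # Absolute pruning: keep at most max_active paths (0 means no limit)
        if max_active > 0:
            ranked = sorted(survivors.items(), key=lambda item: item[1][0])
            survivors = dict(ranked[:max_active])
        best = survivors
    cost, path = min(best.values(), key=lambda cp: cp[0])
    return path, cost
```

With pruning disabled the search is exact; with a tight `max_active` it can discard a path that only turns out to be best later, which is the speed/accuracy trade-off that approximate search accepts.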
12 Mobile Speech Synthesis
- Statistical-parametric synthesis is quite promising
  - Models are quite small (1-2 MB)
  - The search problem is nonexistent
- Parameter and waveform generation are currently the most time-consuming parts
  - They require higher-dimensional parameterizations than concatenative synthesis
  - Output parameters are smoothed using an iterative algorithm (similar to EM)
  - Waveform generation from mel-cepstra (mcep) is much slower than from LPC
- Dictionary compression and text normalization
  - The dictionary can be compressed by building letter-to-sound models and listing only the exceptions
  - Efficient finite-state transducer representations can be created for pronunciation and text-processing rules
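The dictionary-compression idea, sketched: store only the words that a letter-to-sound (LTS) model mispronounces, and fall back to the model for everything else. `toy_lts` is a hypothetical one-letter-one-phone rule table standing in for a trained LTS model (for example a decision tree), and the phone symbols are illustrative.

```python
from typing import Callable, Dict, List

Pron = List[str]

# Hypothetical rule table; a real LTS model is trained from the full lexicon.
TOY_LTS_RULES = {"b": "B", "a": "AE", "t": "T", "c": "K", "o": "AA", "g": "G", "d": "D"}


def toy_lts(word: str) -> Pron:
    """Predict a pronunciation letter by letter (illustrative only)."""
    return [TOY_LTS_RULES[ch] for ch in word if ch in TOY_LTS_RULES]


def compress_dictionary(lexicon: Dict[str, Pron], lts: Callable[[str], Pron]) -> Dict[str, Pron]:
    """Keep only the entries whose pronunciation the LTS model gets wrong."""
    return {word: pron for word, pron in lexicon.items() if lts(word) != pron}


def lookup(word: str, exceptions: Dict[str, Pron], lts: Callable[[str], Pron]) -> Pron:
    """Consult the exception list first, then the letter-to-sound model."""
    if word in exceptions:
        return exceptions[word]
    return lts(word)
```

Lookups return the same pronunciations as the full lexicon, while only the irregular entries take up storage; the better the LTS model, the shorter the exception list.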
13 Mobile Speech Recognition
- Challenges for mobile devices are:
  - Variable and noisy acoustic environments
  - Large vocabularies
  - Open-domain dictation input
- As with speech synthesis, simple ASR is not very resource-intensive, although it has not been as widely implemented
- Even with large vocabularies, ASR can be done efficiently
  - The most important factor is the complexity of the grammar
  - Commercial systems achieve impressive performance based on very constrained grammars
  - Systems tend to be extensively tuned for a given application
14 Mobile Speech Recognition: Acoustic Issues
- How do you talk to a device?
  - This depends on the application, user, and environment
  - Acoustic feature vectors can look very different
  - Microphones may not be optimized for all positions
- Noisy environments
  - Mobile devices are more likely to be used in noisy environments
  - Worse, they are more likely to be used in difficult ones: non-stationary noise, crosstalk, human babble
  - Array processing is not well suited to handheld devices
- On the bright side:
  - Usually a mobile device has only one user
  - Speaker adaptation can improve acoustic modeling
  - Speaker identification can be used to filter out babble and crosstalk
15 Mobile Speech Recognition: Computational Issues
- Acoustic feature extraction
  - Efficient, as long as it is implemented properly
  - Fixed-point arithmetic, data-parallel processing
- Most processing time is consumed, in roughly equal amounts, by:
  - Acoustic model evaluation
  - Search (hypothesis generation and evaluation)
- These can be made computationally efficient, but they must also be made memory-efficient, search in particular
  - This necessarily involves tuning heuristics, because a complete solution is intractable
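As an illustration of fixed-point feature extraction, here is a pre-emphasis filter $y[n] = x[n] - \alpha\, x[n-1]$ computed with integer multiplies and shifts in Q15 format. Pre-emphasis is used only as an example of a typical front-end step (the slide does not name it), and the coefficient 0.97 and the Q15 format are assumptions. Python integers never overflow, so the int16 output saturation of a DSP is emulated explicitly; the intermediate product is assumed to fit a 32-bit accumulator.

```python
from typing import List

Q15_ONE = 1 << 15  # 1.0 in Q15 fixed point


def to_q15(value: float) -> int:
    """Convert a coefficient in [-1, 1) to Q15, clamping at the edges."""
    return max(-Q15_ONE, min(Q15_ONE - 1, round(value * Q15_ONE)))


def saturate16(v: int) -> int:
    """Clamp to the int16 range, as saturating DSP arithmetic would."""
    return max(-32768, min(32767, v))


def preemphasize_fixed(samples: List[int], alpha: float = 0.97) -> List[int]:
    """y[n] = x[n] - alpha * x[n-1] on int16 samples, integer ops only."""
    a = to_q15(alpha)
    out = []
    prev = 0
    for x in samples:
        # Q15 * Q0 product, rounded (add half an LSB) and shifted back to Q0
        prod = (a * prev + (1 << 14)) >> 15
        out.append(saturate16(x - prod))
        prev = x
    return out
```

On integer-only or slow-FPU hardware, this replaces every floating-point multiply with an integer multiply and a shift, at the cost of a small, bounded rounding error.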
16 Mobile Speech Recognition: Acoustic Modeling
- Exact acoustic model evaluation is prohibitively expensive:
  $$P(o \mid s_i, \lambda) = \sum_{k=1}^{K} w_{ik} \frac{1}{\sqrt{(2\pi)^D |\Sigma_{ik}|}} \exp\left(-\sum_{d=1}^{D} \frac{(o_d - \mu_{ikd})^2}{2\sigma_{ikd}^2}\right)$$
- Typical continuous-density acoustic model: 5000 tied states, each with 32 Gaussian densities, of 39 dimensions
- Complete evaluation of all log-likelihoods for one 10 ms frame requires roughly 25 million log-additions, subtractions, and multiplications in total
  - That's 2500 million operations per second!
  - Your new MacBook Pro can do that, but just barely (yes, its video card can do it easily)
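A direct, unoptimized sketch of the per-state computation in the log domain, for diagonal-covariance Gaussians. The inner loop costs one subtraction, two multiplications and one addition per dimension; that breakdown is an assumption made here, but with it the exhaustive count for 5000 states x 32 densities x 39 dimensions at 100 frames per second (one per 10 ms) comes to about 2.5 billion operations per second, consistent with the slide's figure. The function names are illustrative.

```python
import math
from typing import Sequence


def log_add(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without leaving the log domain."""
    if a < b:
        a, b = b, a
    if b == -math.inf:
        return a
    return a + math.log1p(math.exp(b - a))


def gmm_log_likelihood(
    o: Sequence[float],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> float:
    """log P(o | s_i, lambda) for one state: a mixture of K Gaussians with
    diagonal covariance, as in the formula on the slide."""
    D = len(o)
    total = -math.inf
    for w, mu, var in zip(weights, means, variances):
        # log(w / sqrt((2 pi)^D |Sigma|)); a real decoder precomputes this
        log_norm = math.log(w) - 0.5 * (D * math.log(2.0 * math.pi) + sum(math.log(v) for v in var))
        # 1 / (2 sigma^2) per dimension; also precomputed in practice
        scale = [0.5 / v for v in var]
        dist = 0.0
        for x, m, s in zip(o, mu, scale):
            diff = x - m             # one subtraction
            dist += diff * diff * s  # two multiplications, one addition
        total = log_add(total, log_norm - dist)  # one log-addition per density
    return total


def exhaustive_ops_per_second(states=5000, densities=32, dims=39, frames_per_second=100):
    """Operations per second to score every density in every frame, counting
    4 operations per dimension (the inner loop above) and ignoring the
    comparatively few log-additions."""
    return states * densities * dims * 4 * frames_per_second
```

Almost all of the cost sits in the inner per-dimension loop, which is why the speed-ups on the next slide aim to evaluate fewer densities, fewer frames, or cheaper (quantized) versions of each one.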
17 Mobile Speech Recognition: Acoustic Modeling
- How do we make this fast enough?
  - Only evaluate densities for phones that are active in the search
  - Predict which densities will score highly using a smaller, approximate model set, and only evaluate those
  - Use fewer densities, and share them between all HMM states (semi-continuous HMM) or between all the states for some phonetic class (phonetically-tied HMM)
  - Make density computation faster by quantizing acoustic features and parameters
  - Skip some frames in the input, either by:
    - Blindly computing only every Nth frame (usually N = 2 or 3)
    - Detecting interesting regions in the input and only computing densities there (landmark detection)
- Every ASR system in existence uses some combination of these
  - However, too many approximations can make the system slower, e.g. because less accurate scores let more hypotheses survive pruning
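A sketch of two of these tricks combined: scoring only the states the search marks as active, and computing fresh scores only on every `skip`-th frame. `active_states` and `score` are hypothetical callbacks standing in for a decoder's search and its density evaluation. Scoring newly activated states on demand during a skipped frame is a design choice made here, not something the slide specifies.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# Hypothetical scorer: log-likelihood of one frame for one state.
Scorer = Callable[[Sequence[float], int], float]


def score_frames(
    frames: Sequence[Sequence[float]],
    active_states: Callable[[int], Iterable[int]],
    score: Scorer,
    skip: int = 2,
) -> List[Dict[int, float]]:
    """Per-frame state scores, computed only for active states and only on
    every `skip`-th frame; skipped frames reuse the last computed scores."""
    results: List[Dict[int, float]] = []
    cache: Dict[int, float] = {}
    for t, frame in enumerate(frames):
        states = list(active_states(t))
        if t % skip == 0:
            # Computed frame: score only what the search currently needs
            cache = {s: score(frame, s) for s in states}
        else:
            # Skipped frame: reuse cached scores; a state that became active
            # since the last computed frame has none, so score it on demand
            cache = {s: cache[s] if s in cache else score(frame, s) for s in states}
        results.append(dict(cache))
    return results
```

With `skip=2` and a stable active set, the number of density evaluations is roughly halved; the active-state restriction saves more whenever the search keeps only a small fraction of states alive.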
18 Mobile Speech Recognition: Search
- Search is not arithmetically intensive
  - It largely consists of adding up scores and comparing them to other scores
- However, it is very memory-intensive
- The search module in an ASR system touches:
  - Acoustic scores
  - Language model scores
  - Dictionary entries
  - Viterbi path scores and backpointers
  - Backpointer table entries
- In other words, pretty much every piece of memory except the acoustic model parameters
- Worse yet, there are sequential dependencies between all these memory accesses
19 Mobile Speech Recognition: Search
- Fundamentally, the speed of the recognizer is proportional to the number of different hypotheses it considers at once
- Optimizing search is entirely devoted to reducing this number without significantly affecting accuracy. This includes:
  - Careful tuning of various thresholds (beams) for word transitions, phone transitions, etc.
  - Absolute pruning: hard limits on words per frame
  - Phonetic lookahead
  - Language model lookahead (factorization / weight pushing)
- Finite-state transducer systems can be very fast
  - Dictionary, grammar, and (part of) the acoustic model are composed into a single decoding network
  - Determinization: allows exact language model search
  - Minimization: merges common subpaths
  - Weight pushing: a more general kind of LM lookahead
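Beam and absolute pruning can be sketched over a dictionary of hypothesis scores (log domain, higher is better). The hypothesis keys, the `beam` width and the `max_active` cap are illustrative; as the slide notes, a real decoder tunes separate beams for states, phone transitions and word exits, and it may use a cheaper approximate cap than the exact top-N selection shown here.

```python
import heapq
from typing import Dict, Hashable


def prune(scores: Dict[Hashable, float], beam: float, max_active: int) -> Dict[Hashable, float]:
    """Keep hypotheses within `beam` of the best log score, then at most
    `max_active` of them (0 disables the cap)."""
    if not scores:
        return {}
    best = max(scores.values())
    # Relative (beam) pruning: drop anything too far below the best score
    survivors = {h: s for h, s in scores.items() if s >= best - beam}
    # Absolute pruning: a hard cap, like a words-per-frame limit
    if max_active > 0 and len(survivors) > max_active:
        top = heapq.nlargest(max_active, survivors.items(), key=lambda item: item[1])
        survivors = dict(top)
    return survivors
```

The beam adapts to how confident the search is at each frame, while the absolute cap bounds worst-case memory and time, which matters most on a memory-constrained device.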
20 Common Problems for Mobile Speech Processing
- Moore's Law works differently for mobile devices
  - Instead of getting faster, they get smaller and cheaper
  - Storage gets bigger, RAM doesn't
  - Memory doesn't get much faster
- Memory bandwidth is a major bottleneck
  - Making things smaller almost always makes them faster
  - Memory allocations can be very expensive (depending on the operating system)
- Audio input quality is often much lower
  - Typically 8 kHz or 11 kHz maximum sampling rate
  - Dubious microphones
21 Current and Future Research
- Incorporating user feedback in multimodal (speech + touch) applications
- Presenting information efficiently using speech synthesis
- Very low bitrate speech coding using ASR and TTS
- Distributed processing for mobile speech recognition
- Acoustic robustness for handheld mobile devices
- Voice and multimodal user interface design
Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist (lgr@msi.vxu.se) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background
More informationDovid Coplon, Product Management Director http://gipscorp.com
Dovid Coplon, Product Management Director http://gipscorp.com VoIP Quality Mobile VoIP & Technology Trends Greater accessibility and affordability Network operators are introducing new pricing models Handset
More informationName: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.
Name: Class: Date: Exam #1 - Prep True/False Indicate whether the statement is true or false. 1. Programming is the process of writing a computer program in a language that the computer can respond to
More informationExhibit n.2: The layers of a hierarchical network
3. Advanced Secure Network Design 3.1 Introduction You already know that routers are probably the most critical equipment piece in today s networking. Without routers, internetwork communication would
More informationIVR Primer Introduction
IVR Primer Introduction Speech-enabled applications are quickly becoming very popular. Why? Because using voice to navigate is more natural for users than punching telephone keypads. Speech as a navigation
More informationVideo-Conferencing System
Video-Conferencing System Evan Broder and C. Christoher Post Introductory Digital Systems Laboratory November 2, 2007 Abstract The goal of this project is to create a video/audio conferencing system. Video
More information2014/02/13 Sphinx Lunch
2014/02/13 Sphinx Lunch Best Student Paper Award @ 2013 IEEE Workshop on Automatic Speech Recognition and Understanding Dec. 9-12, 2013 Unsupervised Induction and Filling of Semantic Slot for Spoken Dialogue
More informationAbstract. Cycle Domain Simulator for Phase-Locked Loops
Abstract Cycle Domain Simulator for Phase-Locked Loops Norman James December 1999 As computers become faster and more complex, clock synthesis becomes critical. Due to the relatively slower bus clocks
More informationInvestigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition
, Lisbon Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology
More informationIntroduction to acoustic imaging
Introduction to acoustic imaging Contents 1 Propagation of acoustic waves 3 1.1 Wave types.......................................... 3 1.2 Mathematical formulation.................................. 4 1.3
More informationTutorial about the VQR (Voice Quality Restoration) technology
Tutorial about the VQR (Voice Quality Restoration) technology Ing Oscar Bonello, Solidyne Fellow Audio Engineering Society, USA INTRODUCTION Telephone communications are the most widespread form of transport
More informationSolid State Drive Architecture
Solid State Drive Architecture A comparison and evaluation of data storage mediums Tyler Thierolf Justin Uriarte Outline Introduction Storage Device as Limiting Factor Terminology Internals Interface Architecture
More informationSo today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
More informationThe Problem with Faxing over VoIP Channels
The Problem with Faxing over VoIP Channels Lower your phone bill! is one of many slogans used today by popular Voice over IP (VoIP) providers. Indeed, you may certainly save money by leveraging an existing
More informationABSTRACT 2. SYSTEM OVERVIEW 1. INTRODUCTION. 2.1 Speech Recognition
The CU Communicator: An Architecture for Dialogue Systems 1 Bryan Pellom, Wayne Ward, Sameer Pradhan Center for Spoken Language Research University of Colorado, Boulder Boulder, Colorado 80309-0594, USA
More informationORGANIZER QUICK REFERENCE GUIDE. Install GoToMeeting. Schedule a Meeting. Start a Scheduled Meeting. Start an Instant Meeting.
GoToMeeting organizers on both personal and corporate plans can hold meetings with up to 25 attendees. Organizers must first create a GoToMeeting account and then download the GoToMeeting desktop application
More informationTechnology Finds Its Voice. February 2010
Technology Finds Its Voice February 2010 Technology Finds Its Voice Overview Voice recognition technology has been around since the early 1970s, but until recently the promise of new advances has always
More information