Practical Applications of Speech Signal Processing

Vishu R Viswanathan
TI Fellow, Director, Speech Technologies Lab
DSP Solutions R&D Center, Texas Instruments, Dallas, Texas
v-viswanathan@ti.com
March 2004

Lecture Outline
- Goals of the Lecture
- Speech Coding
- Speech Synthesis
- Speech Recognition & Understanding
- Speaker Recognition
- Speech Enhancement
- Speech Modification

Goals of the Lecture
- Introduce and discuss each of a number of speech signal processing areas
- List examples of practical applications
- Discuss some selected topics in each area
- High-level presentation only

Speech Coding
- Goal
  - Reduce speech signal data rate
  - Maintain high speech quality
- General principle: take advantage of
  - Redundancies in the speech signal
  - Properties of speech production and perception
- Applications
  - Digital cellular telephony, voice over IP, IP phone, audio/video conferencing, PSTN trunking, secure voice communication, digital answering machines, voice mail, voice response systems, talking products

Components of a Speech Coding System
[Figure: block diagram. Sampled speech s(n) -> Analyzer -> x(n) -> Encoder -> y(n) -> Channel or Medium -> y'(n) -> Decoder -> x'(n) -> Synthesizer -> s'(n)]
Goal: Minimize the data rate of y(n) while maximizing the speech quality of s'(n)

Types of Speech Coders
- Waveform coders
  - Goal: reproduce speech on a sample-by-sample basis
  - High data rates, high speech quality
  - Examples: 64 kb/s PCM (G.711), 32 kb/s ADPCM (G.726)
- Parametric coders
  - Speech production characterized by parametric models
  - Low data rates, good speech intelligibility, communications/synthetic speech quality
  - Examples: 2.4 kb/s LPC (FS 1015), 2.4 kb/s MELP (recent NATO standard)
- Analysis-by-synthesis coders
  - Hybrid between waveform and parametric coders, with medium data rates
  - Parametric models used, with the excitation signal computed by minimizing the error between synthesized speech and input speech
  - Examples: 16 kb/s G.728, 8 kb/s G.729
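As an illustration of the waveform-coder idea behind 64 kb/s PCM, the sketch below applies the continuous mu-law companding formula before an 8-bit uniform quantizer. It is a minimal sketch, not the G.711 algorithm itself: G.711 uses a segmented, piecewise-linear approximation of this curve, and the function names here are hypothetical.

```python
import math

MU = 255.0  # companding constant of the continuous mu-law curve


def mulaw_compress(x, mu=MU):
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize(value, bits=8):
    """Uniform quantizer on [-1, 1] with 2**bits levels (returns the cell midpoint)."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, int((value + 1.0) / step))
    return -1.0 + (index + 0.5) * step


def code_sample(x, bits=8):
    """Compress, quantize, and expand one sample (illustrative codec round trip)."""
    return mulaw_expand(quantize(mulaw_compress(x), bits))


if __name__ == "__main__":
    # Compare companded and plain uniform quantization at several amplitudes.
    for x in (0.001, 0.01, 0.1, 0.9):
        print(x, code_sample(x), quantize(x))
```

Companding spends more quantizer levels on low amplitudes, so quiet samples are reproduced with a smaller error than plain uniform quantization at the same 8 bits per sample.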

Speech Quality
- Terms used
  - Toll quality: high-grade wireline telephone
  - High quality
  - Good quality
  - Communications quality
  - Transparent quality
- Formal subjective testing methods
  - Expensive, time consuming
  - Mean opinion score (MOS): used in all industry standards bodies
  - Diagnostic acceptability measure (DAM): used by the US Dept. of Defense
- Informal and semi-formal subjective tests
  - Pairwise or A/B comparisons
  - Rating tests
- Objective methods
  - Signal-to-noise ratio, ITU-T P.862 (PESQ)
  - Automatic, repeatable, useful in coder development and optimization
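Two of the measures above reduce to simple arithmetic. The sketch below computes a MOS as the mean of listener ratings and a signal-to-noise ratio between a reference signal and its coded version. Function names are hypothetical, and PESQ itself is far more elaborate and is not reproduced here.

```python
import math


def mean_opinion_score(ratings):
    """Average of listener ratings, conventionally on a 1 (bad) to 5 (excellent) scale."""
    return sum(ratings) / len(ratings)


def snr_db(reference, degraded):
    """Signal-to-noise ratio in dB, treating (reference - degraded) as the noise."""
    signal_energy = sum(s * s for s in reference)
    noise_energy = sum((s - d) ** 2 for s, d in zip(reference, degraded))
    if noise_energy == 0:
        return math.inf  # degraded signal is identical to the reference
    if signal_energy == 0:
        return -math.inf  # silent reference, nothing to compare against
    return 10 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    print(mean_opinion_score([4, 5, 3, 4]))
    print(snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]))
```

SNR is easy to automate but only tracks waveform fidelity, which is why it suits waveform coders better than parametric coders, whose output may sound acceptable without matching the input sample by sample.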

Speech Coder Attributes
[Figure: attribute scales, each running between two extremes]
- Bit rate: low to high (1200, 2400, 4800, 8000, 16000, 32000, 64000 bits/second)
- Quality: low to high (mean opinion score 2.5, 3.0, 3.5, 4.0)
- Input condition: clean speech to noisy speech (handheld, hands-free)
- Delay: low to high (10, 50, 100, 200 milliseconds)
- Complexity: low to high (MIPS, memory)
- Signal type: human speech, sound effects, music

Speech Coding Standards: ITU Standards

Coder     Rate (kb/s)   Approach
G.711     64            Mu/A-law
G.726     16-40         ADPCM
G.728     16            LD-CELP
G.729     8             CS-ACELP
G.723.1   5.3/6.3       MP/ACELP

- ITU standards are targeted for telephone network applications
- Also used in Voice over IP applications
- All produce toll quality speech
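The rates in the table can be related to each other with a little arithmetic. The sketch below assumes the standard narrowband telephony sampling of 8000 samples per second at 8 bits per sample for G.711, and uses 1 kilobyte = 1000 bytes. The helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # narrowband telephone sampling rate
BITS_PER_SAMPLE = 8    # log-PCM sample size in G.711

# Rates taken from the ITU standards table (kb/s).
ITU_CODER_RATES_KBPS = {
    "G.711": 64.0,
    "G.728": 16.0,
    "G.729": 8.0,
    "G.723.1 (low rate)": 5.3,
}


def pcm_rate_kbps(sample_rate_hz, bits_per_sample):
    """Bit rate of uncompressed PCM in kb/s."""
    return sample_rate_hz * bits_per_sample / 1000


def kilobytes_per_minute(rate_kbps):
    """Storage needed for one minute of coded speech (1 kB = 1000 bytes)."""
    return rate_kbps * 60 / 8


def compression_ratio(rate_kbps, reference_kbps=64.0):
    """How many times smaller the coded stream is than 64 kb/s PCM."""
    return reference_kbps / rate_kbps


if __name__ == "__main__":
    print("PCM rate:", pcm_rate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE), "kb/s")
    for name, rate in ITU_CODER_RATES_KBPS.items():
        print(name, kilobytes_per_minute(rate), "kB/min",
              round(compression_ratio(rate), 1), "x")
```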

Speech Coding Standards: Digital Cellular Standards

Region          Coder      Rate (kb/s)   Chan rate (kb/s)   Approach   Date
Europe          GSM FR     13            22.8               RPE-LTP    1987
Europe          GSM HR     5.6           11.4               VSELP      1994
Europe          GSM EFR    12.2          22.8               ACELP      1995
Europe          GSM AMR    4.75-12.2     11.4-22.8          ACELP      1998
North America   TIA IS54   7.95          13                 VSELP      1989
North America   TIA IS95   0.8-8.55                         QCELP      1993
North America   TIA Q13    0.8-13.3                         QCELP      1995
North America   TIA IS641  7.4           13                 ACELP      1996
North America   TIA EVRC   0.8-8.55                         R-ACELP    1996
North America   TIA SMV    0.8-8.5                          R-ACELP    2001
Japan           PDC FR     6.7           11.2               VSELP      1990
Japan           PDC HR     3.45          5.6                PSI-CELP   1993
Japan           PDC EFR    8             11.2               ACELP      1999
Japan           PDC EFR    6.7           11.2               ACELP      2000

Speech Coding Standards: Wideband Standards

Coder     Rate (kb/s)   Approach
G.722     48, 56, 64    SB-ADPCM
G.722.1   24, 32        Transform
ITU WB    16, 24        ACELP
AMR WB    6.60-23.85    ACELP
VMR WB    1.0-13.3      ACELP

Wideband: 50 Hz to 7 kHz (versus narrowband telephone, 300-3200 Hz)

Speech Synthesis
- Human speech based systems
  - Suitable for known material
  - Speech coding based
    - Talking toys, talking books, voice prompts, voice response systems
  - Concatenation of pre-recorded voice data
    - Information retrieval (stock quotes, airline schedules, banking)
- Text-to-speech systems
  - Suitable for unknown or arbitrary text
  - Applications: e-mail/fax reading, phone access to web-based services, spoken telephone directory, car navigation, location-based services, customer service, help desk, reading machines for the blind

Components of a TTS System
[Figure: block diagram, courtesy of Larry Rabiner. Text -> Text Analysis -> Letter-to-Sound -> Synthesizer -> Speech, supported by Dictionary and Rules]
- Text analysis
  - Numerical expansion (dates, times, money)
  - Abbreviations, acronyms
  - Proper name identification
  - Example: "Dr. Smith lives at 23 Lakeshore Dr."
- Letter-to-sound
  - Phonemes
  - Pitch
  - Duration
  - Pauses
  - Loudness/amplitude
- Synthesizer
  - Choice of units: words, phones, diphones, dyads, syllables
  - Choice of parameters: LPC, formants, waveform templates, articulatory parameters, sinusoidal parameters
  - Method of computation: rules, concatenation
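The text-analysis stage can be illustrated with the slide's own example sentence, in which "Dr." must be read once as a title and once as a street suffix. The sketch below is a toy rule-based normalizer written for this illustration only. Its two context rules and the 0 to 99 number expansion are assumptions, not the rules of any real TTS system, and they would fail on many other sentences.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..99."""
    if not 0 <= n <= 99:
        raise ValueError("this toy expander only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def normalize(text):
    """Expand 'Dr.' and one- or two-digit numbers into speakable words."""
    # Rule 1 (assumed): "Dr." followed by a capitalized word is a title.
    text = re.sub(r"\bDr\.(?=\s+[A-Z])", "Doctor", text)
    # Rule 2 (assumed): any remaining "Dr." after a word is a street suffix.
    text = re.sub(r"(?<=[a-z])\s+Dr\.", " Drive", text)
    # Numerical expansion for standalone one- or two-digit numbers.
    return re.sub(r"\b\d{1,2}\b", lambda m: number_to_words(int(m.group())), text)


if __name__ == "__main__":
    print(normalize("Dr. Smith lives at 23 Lakeshore Dr."))
```

The output of this stage is plain words, which the letter-to-sound stage can then convert to phonemes and prosody.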

Speech Recognition & Understanding
- Problem
  - Recognition: automatic recognition of human speech by machine
  - Understanding: interpret the meaning of the recognized speech and map it to actions to be taken
- Applications
  - Voice dialing (name or number dialing) in telephone, cellphone, PDA, smartphone (safety laws against handheld cellphone use while driving)
  - Voice command & control in telematics, cellphone, PDA, smartphone, PC, toys
  - Voice-enabled web browsing, information retrieval (stock quotes, weather forecast, airline flight information, banking), navigation, e-mail, SMS, dictation
  - Automated customer service and help desks
- Benefits: hands-free, eyes-free use; no need to use the keypad; faster task completion; ease of use; part of a multi-modal interface; cost savings

Components of a Speech Recognizer
[Figure: block diagram. Speech signal -> Feature Extraction (front end) -> Acoustic Scoring and Decoding (back end) -> word string; the back end draws on Acoustic Models and Language Models]

Speech Recognizer Attributes
[Figure: attribute scales, each running between two extremes]
- Speaker: speaker dependent, speaker adaptive, speaker independent
- Vocabulary: small to large (10, 100, 1000, 10000 words)
- Speech type: isolated words, continuous speech, conversational speech
- Task: recognition, syntax, semantics, understanding
- Input condition: clean speech to noisy speech (handheld, hands-free)
- Complexity: low to high (MIPS, memory)
- Architecture: server based, distributed, client based

Performance & Robustness
- Performance
  - Recognition accuracy: word error rate (WER) or task completion rate
  - High enough performance required for user acceptance
- Robustness issues
  - Training versus operational condition differences
  - Background noise: extent of noise, its variability (usually additive)
  - Channel variability: different microphones, different telephone circuits, handheld, hands-free, handheld versus hands-free (usually convolutive)
  - Recognizer must have means to compensate for noise and channel variabilities
  - Out-of-vocabulary rejection capability
  - Speaker dialect and accent variability (handled by speaker adaptation)
- User interface: very important for the success of an application
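Word error rate is commonly computed as the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below implements that definition with a standard edit-distance recursion. The function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("john" -> "jon") and one deletion ("at"): 2 / 5.
    print(word_error_rate("call john smith at home", "call jon smith home"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.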

Recognition in Multiple Languages
- Speaker-dependent recognition
  - Language independent (user can enroll names for voice dialing in multiple languages)
- Some observations for speaker-independent recognition
  - Same recognition engine but different data (models, dictionary) needed
  - Recognition grammar to handle language-specific usage differences (e.g., French speakers say telephone numbers in pairs; natural number dialing needed)
  - Training requires speech databases and a dictionary in the new language
  - Automatic training tools to minimize the time to develop recognition in a new language

Speaker Recognition
- Speaker verification / authentication
  - Problem: use voice input to verify the user's claimed identity
  - Applications: secure access to premises, information (banking), services (voice dialing), etc.
  - Issues
    - True user acceptance traded off with impostor acceptance
    - Total voice verification
    - Fixed text versus free text
- Speaker identification
  - Problem: use voice to identify a speaker from a closed or open set of speakers
  - Applications: legal and forensic use, intelligence, security
  - Issues: uncooperative user, often relatively short-duration speech, noisy and/or distorted speech
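The trade-off between true user acceptance and impostor acceptance can be made concrete with a decision threshold on a verification score. The sketch below assumes that a higher score means a better match. The scores are invented purely for illustration and do not come from any real system.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_accept_rate, false_reject_rate) for an accept-if->=threshold rule."""
    false_rejects = sum(score < threshold for score in genuine_scores)
    false_accepts = sum(score >= threshold for score in impostor_scores)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


if __name__ == "__main__":
    # Made-up match scores for true users and impostors (illustrative only).
    genuine = [0.9, 0.8, 0.75, 0.6]
    impostor = [0.7, 0.4, 0.3, 0.2]
    for threshold in (0.5, 0.65, 0.72):
        far, frr = error_rates(genuine, impostor, threshold)
        print(f"threshold={threshold}: false accept={far}, false reject={frr}")
```

Raising the threshold lowers impostor acceptance at the cost of rejecting more true users, so the operating point depends on whether the application values security or convenience more.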

Speech Enhancement
- Noise Suppression
- Playback Enhancement
- Acoustic Echo Cancellation

Noise Suppression
- Problem
  - Remove acoustic noise from a noisy speech signal for better listenability or for improved performance of speech processing devices
  - Requirements: no speech signal distortion, no loss of speech intelligibility, no artifacts like musical noise, natural-sounding residual noise
- Methods
  - Single-microphone approach: spectral subtraction family of methods
  - Multi-microphone approach: adaptive noise cancellation, microphone-array-based fixed or adaptive beamforming, blind signal separation
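A minimal sketch of basic magnitude spectral subtraction on a single frame follows. It assumes a noise magnitude spectrum estimated from noise-only frames, reuses the noisy phase, and keeps a small spectral floor, which is one common way to limit musical-noise artifacts. The naive DFT and all function names are choices made for this illustration. A practical system would use an FFT with windowed, overlapping frames and a noise estimate updated during speech pauses.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(n^2), fine for short frames)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each time-domain sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def estimate_noise_magnitude(noise_frames):
    """Average magnitude spectrum over frames known to contain only noise."""
    spectra = [[abs(value) for value in dft(frame)] for frame in noise_frames]
    return [sum(s[k] for s in spectra) / len(spectra)
            for k in range(len(spectra[0]))]


def spectral_subtract(noisy_frame, noise_magnitude, floor=0.01):
    """Subtract the noise magnitude per bin, keeping the noisy phase."""
    cleaned = []
    for value, noise_mag in zip(dft(noisy_frame), noise_magnitude):
        magnitude = abs(value)
        # Never go below a small fraction of the original magnitude.
        new_magnitude = max(magnitude - noise_mag, floor * magnitude)
        cleaned.append(cmath.rect(new_magnitude, cmath.phase(value)))
    return idft(cleaned)


if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
    print(spectral_subtract(frame, [0.5] * 32)[:4])
```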

Playback Enhancement
- Problem
  - Enhanced playback of speech to the listener
- Methods
  - Spectrally shape the speech signal prior to playback, for improved intelligibility when the listener is in a noisy environment (PA system in aircraft, airports, sports arenas)
  - Active noise cancellation to cancel noise acoustically in the listener's ears (ANC headsets)
  - Narrowband to wideband speech extension to provide wideband speech perception

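Acoustic Echo Cancellation
[Figure: block diagram of an acoustic echo canceller (AEC). The downlink (far-end) signal drives the loudspeaker; the echo path H(z) (channel) couples it into the microphone; the AEC filter Ĥ(z) produces the echo estimate ŷ(n), which is subtracted from the microphone signal to give the error signal e(n) on the uplink. Signal labels in the figure: r(n), s(n), x(n), e(n), ŷ(n), v(n).]
Microphone signal: v(n) = u(n) + y(n) + n_0(n)
Goal: Cancel feedback from the loudspeaker into the microphone using an adaptive linear filter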
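One widely used adaptive linear filter for this task is the normalized LMS (NLMS) algorithm. The slide does not say which adaptation rule is meant, so NLMS is an assumption here, as are the function names, tap count, step size, and the made-up echo path in the demo. The sketch forms the error as e(n) = v(n) - ŷ(n) and simulates a microphone signal that contains only echo, with no near-end speech or noise.

```python
import random


def simulate_echo(far_end, echo_path):
    """Convolve the far-end signal with an echo path to get the echo y(n)."""
    return [sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
            for n in range(len(far_end))]


def nlms_echo_canceller(far_end, microphone, num_taps=8, step=0.5, eps=1e-8):
    """Return (error signal e(n), final filter weights) using NLMS adaptation."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end sample first
    errors = []
    for x, v in zip(far_end, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))  # y_hat(n)
        error = v - echo_estimate                                     # e(n)
        power = sum(h * h for h in history) + eps  # normalization term
        weights = [w + step * error * h / power
                   for w, h in zip(weights, history)]
        errors.append(error)
    return errors, weights


if __name__ == "__main__":
    rng = random.Random(0)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    echo_path = [0.5, -0.3, 0.2]  # hypothetical loudspeaker-to-microphone path
    microphone = simulate_echo(far_end, echo_path)
    errors, weights = nlms_echo_canceller(far_end, microphone)
    print([round(w, 4) for w in weights])
    print(max(abs(e) for e in errors[-100:]))
```

In this noise-free simulation the weights converge to the echo path and the residual echo decays toward zero. With near-end speech present, a real canceller would also need to slow or freeze adaptation during double-talk.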

Speech Modification
- Voice conversion
  - Convert one voice to sound like another
  - A female voice converted to sound like a low-pitched male voice (security)
- Time-scale or rate modification
  - Speed up or slow down speech, while preserving naturalness
  - Applications: talking books, pre-recorded lectures, language learning