Tutorial about the VQR (Voice Quality Restoration) technology




Ing. Oscar Bonello, Solidyne
Fellow, Audio Engineering Society, USA

INTRODUCTION

Telephone communications are the most widespread means of transporting audio signals. Their frequency range, however, is very restricted. In the early days of the technology this was due to the technical limitations of carbon microphones and metal-diaphragm earphones. As the technology improved and transducers reached good audio quality, telephone systems still did not significantly widen their frequency response. The reason was economic: to reduce the required bandwidth. Although it has been shown that 100% intelligibility needs an audio band of 5,000 Hz (French and Steinberg, 1947), the telephone industry accepts a slight loss of intelligibility in order to limit the band to 3,400 Hz and so carry more channels in the same bandwidth. Low frequencies (bass) are also cut below 300 Hz, both to reduce the cost of transducers and to eliminate interference from ambient noise such as traffic or machinery.

How does this affect the transmission of news for broadcasting? Clearly the telephone was not designed as a high-quality audio system. Voices sent over it (even when fed directly into the telephone line, bypassing the transducers) are doubly limited in frequency, in both bass and treble. Restricting transmission to 300-3,400 Hz is a very important limitation. Modern cellular phones are even worse because they use a reduced bandwidth. The human voice contains frequencies from 80 Hz to 10,000 Hz, so the telephone loses almost two octaves of bass and about an octave and a half of treble. The result is a decidedly metallic voice (lack of bass). This forces many radio stations to invest in UHF links with large antennas and truck-mounted facilities. That, however, takes away their mobility and prevents journalists from working quickly, because they must travel with an expensive team of technicians.
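As a quick check of how much of the voice band the telephone channel discards, the band edges quoted above (80 Hz, 300 Hz, 3,400 Hz and 10,000 Hz) can be converted to octaves. This is only an illustrative sketch; the function name `octaves_between` is hypothetical and not part of any VQR tooling.

```python
import math

def octaves_between(f_low_hz, f_high_hz):
    """Number of octaves spanned by two frequencies (one octave = factor of 2)."""
    return math.log2(f_high_hz / f_low_hz)

# Band edges taken from the text: the voice spans 80 Hz to 10,000 Hz,
# the telephone channel passes 300 Hz to 3,400 Hz.
bass_lost = octaves_between(80.0, 300.0)         # about 1.91 octaves
treble_lost = octaves_between(3400.0, 10000.0)   # about 1.56 octaves
```

In round numbers, close to two octaves are missing at the bottom and about one and a half at the top.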
But limited frequency response is not the only problem: telephone communications also have a reduced dynamic range. Recall that dynamic range is the ratio between the loudest sound and the background noise. The human ear works in everyday life with a dynamic range of 90 dB, and 60 to 70 dB is acceptable for FM broadcasting. Telephone transmissions handle dynamic ranges of 40 to 50 dB at best.

EXISTING SOLUTIONS TODAY

There have been many attempts in the past to solve the problem of transmitting audio over a telephone line. None of them was fully satisfactory. A brief analysis follows.

a) Multiple frequency shifters. An encoder divides the audio band into three sub-bands, sends each one within the 300-3,400 Hz telephone band, and uses three fixed telephone lines to transmit them. At the other end (the studio) the decoder shifts them back to the original band. The sound is acceptable (50-7,500 Hz), but the high cost, the difficulty of keeping three simultaneous lines for each event, and the impossibility of use with mobile phones have made the system fall into disuse.

b) Frequency extenders. The principle is similar to the above but uses a single line. An encoder at the transmitting end shifts all frequencies up by 250 Hz (or a similar value), so a voice component at 100 Hz is transmitted at 250 + 100 = 350 Hz. At the studio, the decoder does the reverse, shifting everything down by 250 Hz to recover the original band. Note that 250 Hz is lost at the treble end of the band.

Disadvantages: requires two units (encoder + decoder) / not usable for telephone interviews (where the interviewee speaks from his or her own telephone set) / does not correct the treble / no improvement in dynamic range.

c) ISDN transmission. A telephone service for high-speed data, available in Europe. The quality is excellent, which is its main advantage.

Disadvantages: only usable in Europe and in very few other places in the world, in major cities / not 100% portable because it requires a physical line / high cost of equipment and of transmission time / requires two units / not usable for telephone interviews (where the interviewee speaks from his or her own telephone set).

d) Digital codec systems. This category includes the Musicam, Patriot Tieline, TELOS, etc. An encoder converts the audio signal to MPEG or another compression format and transmits it through a telephone modem, using a land line, or through a GSM modem over the cellular network. The frequency response and sound quality are good, but the system has latency (sound delay) that in some cases, such as remote interviews conducted from the studio, can be annoying.

Disadvantages: high-cost equipment / delay, which prevents interviews from the studio / needs special decoders at the studio side.

e) Dual transmission systems. This category includes the Solidyne MB2400, which simultaneously transmits a digital MP3 stream and an analog cellular call for VQR processing, eliminating the delay and the drawbacks of conventional digital codecs.

Comparison of the previous systems with the new VQR system

The VQR encoder is a low-cost system that does not require a computer. In fact, it improves the quality of interviews in which the remote person uses his or her own phone line or cell phone. VQR has no delay. Restoring frequencies between 50 Hz and 10,000 Hz improves the bass and treble. It increases the dynamic range up to 70 dBA (similar to a good studio microphone). 95% of listeners believe that the remote caller is in the radio studio.
Disadvantages: the sound quality is good but not perfect, as it is with digital codecs. We recommend listening to the audio demos on the Solidyne web site.

VQR TECHNOLOGY (Voice Quality Restoration)

The comparison just made between VQR and other solutions might suggest that VQR is a kind of magic solution. Of course it is not: it is based on scientific principles. Solidyne has worked on the problem of telephone transmission for 35 years. We analyzed several mathematical techniques that could solve it, built several prototypes and conducted numerous tests. But our solutions had the same problems as those mentioned above, so we never commercialized them. From 2005, however, we approached the problem from a radically different point of view: psychoacoustics and the analysis of the mechanisms by which the human voice is generated. We were encouraged by the success of the same idea in 1988, when we achieved the world's first system for recording and reproducing music on hard disk, based on the invention of data compression (bit compression) using the psychoacoustic theory of critical masking bands. This allowed us to build the world's first system for data compression, a precursor of MP3, ATRAC, Dolby and all the other systems used today by millions of people around the world.

But in 1988 it seemed magic. The VQR system, invented by Solidyne in 2005, operates exclusively on the human voice. Let us see, then, what we know about speech generation. Fig. 1 shows the phonatory system.

[Figure 1]

The vocal cords are at the beginning of the whole generation chain, producing a fundamental sound, or pitch, that is the basis of the speaker's voice. The sound then passes through three cavities (pharyngeal, oral and nasal) that behave like three resonators, altering the harmonic components of the sound from the vocal cords. The pitch is rich in harmonics because it is an asymmetrical triangular wave, with both even and odd harmonics. The sequence of movements of the vocal cords is shown in Fig. 2.

[Figure 2: Vocal cord movement (after M. Guirao, 1980)]

The acoustic action of the vocal cords has been simulated in the laboratory with two masses connected by a spring (Fant 1970, Veldhuis 1995). It is evident from Fig. 1 that engineers can simulate the cavities with tubular resonators, thus imitating the vocal tract (O'Shaughnessy 1987). See Fig. 3.

[Figure 3: Modeling the vocal tract with 2 or 3 resonator cavities]

The glottis system (generation of pitch) and the three cavities behave as a series equivalent circuit. This hypothesis is now fully accepted (McAulay 1983, Macon 1996, Klijn 1998).

[Figure 4: Equivalent circuit of vocal cords and resonators]

Each of the filters (second-order band-pass) is changed rapidly by the movement of the tongue, producing different resonances that are called the formants of the voice. This is described by the classic equation of Harvey Fletcher (Allen, 1995), which gives the sound pressure of a harmonic Pk relative to the fundamental P1:

[Equation not reproduced in this transcription]

where delta is the damping factor.

Now let us see what the audio spectrum produced by the human phonation mechanism looks like (Fant 1960, Flanagan 1965, Fry 1979, Lieberman & Blumstein 1988). Consider Fig. 5.

[Figure 5: Audio spectrum for 100 & 200 Hz pitch]

The first graph shows the sound of the vocal cords for a fundamental of 100 Hz (male voice) and of 200 Hz (female voice). It is interesting to note that there are many harmonics, extending beyond 3 kHz. The second graph shows the three filters for the three cavities, or formants. Finally, the third graph shows the final result emitted by the human voice. Contrary to what one might think, this is a discrete spectrum, not a continuous one: it consists of individual, evenly spaced frequencies. We emphasize this detail because it is the technological basis of VQR bass restoration. Let us look at this phenomenon in greater detail. Plotting each harmonic as a vertical bar gives Fig. 6 and Fig. 7.

[Figure 6: Harmonics of the vowel A, male voice (pitch = 100 Hz)]

[Figure 7: Harmonics of the vowel A, female voice (pitch = 250 Hz)]

Here the discrete nature of the audio spectrum can be seen in great detail. It means that even when the bottom of the spectrum, below 300 Hz, is cut off, the original bass frequencies can in theory be reconstructed. Imagine an observer who examines the two spectra between 1,000 and 2,000 Hz and knows nothing about what lies below 300 Hz. Can our observer tell what the original pitch frequency was? Clearly yes, by measuring the frequency separation between two consecutive harmonics (in our example, 100 and 250 Hz). This is the foundation of the VQR restoration of bass frequencies.

RECONSTRUCTION OF TREBLE & DYNAMIC RANGE

As we have seen, the human voice can be restored from the discrete components that pass through the telephone channel. This is a remarkable discovery that changes our perspective on the quality of transmission by telephone or cell phone. Reconstruction of the treble is also possible, but it is less accurate than that of the bass, because there are two mechanisms that produce high-frequency sounds. One is the high harmonics of the voiced sounds produced by the vocal cords.
But another very important group is produced by unvoiced sounds, or fricatives. These are generated when the vocal tract is partially closed at some point and the air is forced through the constriction, producing turbulence.
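Returning briefly to the bass-restoration argument, the observer's reasoning (recover the pitch from the spacing of the harmonics that survive the telephone channel) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the VQR algorithm itself: it assumes the harmonic peak frequencies have already been measured, the function names are hypothetical, and the median is simply one convenient way to tolerate an occasional missed peak.

```python
from statistics import median

def estimate_pitch_hz(harmonic_peaks_hz):
    """Estimate the fundamental from the spacing of consecutive harmonics."""
    peaks = sorted(harmonic_peaks_hz)
    if len(peaks) < 2:
        raise ValueError("at least two harmonic peaks are needed")
    spacings = [hi - lo for lo, hi in zip(peaks, peaks[1:])]
    return median(spacings)

def harmonics_below_cutoff(pitch_hz, cutoff_hz=300.0):
    """Harmonics of the pitch that lie below the telephone low-frequency cutoff."""
    missing = []
    k = 1
    while k * pitch_hz < cutoff_hz:
        missing.append(k * pitch_hz)
        k += 1
    return missing

# Peaks visible between 1,000 and 2,000 Hz for the two voices in the text.
male_peaks = [1000.0 + 100.0 * k for k in range(11)]    # 1000, 1100, ..., 2000
female_peaks = [1000.0 + 250.0 * k for k in range(5)]   # 1000, 1250, ..., 2000

male_pitch = estimate_pitch_hz(male_peaks)        # 100.0
female_pitch = estimate_pitch_hz(female_peaks)    # 250.0
male_missing = harmonics_below_cutoff(male_pitch) # [100.0, 200.0]
```

Once the pitch is known, the list returned by `harmonics_below_cutoff` gives the frequencies that a restoration stage would have to regenerate; how their amplitudes are chosen is not described in the text and is left out here.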

Examples of fricative sounds are the consonants f, s and j. Fortunately, these sounds have their fundamental between 2,000 and 3,000 Hz, inside the band of telephone systems. But their harmonics can reach up to 10,000 Hz, so the telephone loses the brightness characteristic of high-quality recordings. However, a careful study of the harmonic components of the human voice indicates that if a harmonic generator based on asymmetric triangular waves (similar to those generated by the vocal cords) is excited with the components in the 2-3 kHz band, it yields a harmonic spectrum quite similar to the actual sound. This gives a high band, complementary to the original sound, that can be added to reconstruct the high frequencies of the voice.

To bring the quality of the transmitted telephone voice as close as possible to what we would get in the radio studio, the dynamic range must be increased from the 40-50 dBA of telephone calls to the 60-70 dBA of a good studio. We use a well-known system found in modern recording studios. The solution is (again!) a psychoacoustic device, based on the temporal masking that a strong signal produces in the human ear. The device that makes this possible is called an audio expander. Figure 8 shows the transfer curve of an expander.

[Figure 8: Input/output graph of an expander]

This is a 1:2 slope expander. The device is absolutely linear; its gain depends on the signal level. To be effective and meet the psychoacoustic conditions, the gain change when the input signal level rises must be very fast, typically less than 5 milliseconds. When the signal level falls, the response time should be about 50-100 ms. Suppose the signal at the expander input changes in level by 20 dB. At the output the same signal shows a variation of 30 dB, i.e. the dynamic range is increased.
The same principle is applied in VQR technology to reduce the background noise of the telephone, giving a very clean audio signal with very low perceived noise.
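The expander behaviour described above can be sketched as a static input/output curve plus a level detector with separate attack and release times. This is a generic downward expander, not Solidyne's implementation. The text gives the 1:2 slope, the attack time (under 5 ms), the release time (50-100 ms) and the 20 dB to 30 dB example. The threshold is an assumption added here: it is placed in the middle of the 20 dB input swing, because with a 1:2 slope applied over the whole swing the output would vary by 40 dB rather than 30 dB. The sample rate and all names are likewise hypothetical.

```python
import math

def expander_output_db(level_in_db, threshold_db=-10.0, ratio=2.0):
    """Static curve of a downward expander (levels in dB).

    Above the threshold the level passes unchanged; below it, every dB of
    input change becomes `ratio` dB of output change (1:2 slope by default).
    The threshold value is an assumption, not taken from the text.
    """
    if level_in_db >= threshold_db:
        return level_in_db
    return threshold_db + ratio * (level_in_db - threshold_db)

def smoothing_coeff(time_ms, sample_rate_hz):
    """One-pole smoothing coefficient for a given time constant."""
    return math.exp(-1.0 / (time_ms * 1e-3 * sample_rate_hz))

def follow_envelope(samples, sample_rate_hz=8000.0, attack_ms=5.0, release_ms=75.0):
    """Level detector: fast when the signal rises, slow when it falls."""
    attack = smoothing_coeff(attack_ms, sample_rate_hz)
    release = smoothing_coeff(release_ms, sample_rate_hz)
    envelope = 0.0
    result = []
    for x in samples:
        magnitude = abs(x)
        coeff = attack if magnitude > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * magnitude
        result.append(envelope)
    return result

# Input swinging 20 dB (0 dB down to -20 dB) around the assumed -10 dB threshold.
loud_out = expander_output_db(0.0)      # 0.0 dB, unchanged
quiet_out = expander_output_db(-20.0)   # -30.0 dB, pushed down by 10 dB
output_swing_db = loud_out - quiet_out  # 30.0 dB

# Step response of the detector: 40 samples at 8 kHz is one 5 ms attack time.
step_envelope = follow_envelope([1.0] * 40)
```

With these assumed settings the 20 dB input swing becomes a 30 dB output swing, matching the example in the text, and the detector reaches about 63% of a sudden level rise within the 5 ms attack time. In a complete expander the detected envelope, converted to dB, would drive the gain given by the static curve.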