Emotion in Speech: towards an integration of linguistic, paralinguistic and psychological analysis

S-E. Fotinea(1), S. Bakamidis(1), T. Athanaselis(1), I. Dologlou(1), G. Carayannis(1), R. Cowie(2), E. Douglas-Cowie(2), N. Fragopanagos(3), J.G. Taylor(3)

(1) Institute for Language and Speech Processing (ILSP), evita@ilsp.gr
(2) Department of Psychology, Queen's University, Belfast, UK, r.cowie@qub.ac.uk
(3) Department of Mathematics, King's College, London, UK, john.g.taylor@kcl.ac.uk

Abstract. If speech analysis is to detect a speaker's emotional state, it needs to derive information from both linguistic information, i.e. the qualitative targets that the speaker has attained (or approximated) in conformance with the rules of the language, and paralinguistic information, i.e. the permitted variation in the way those qualitative linguistic targets are realised. It also needs an appropriate representation of emotional states. The ERMIS project addresses the integration problem that these requirements pose. It comprises mainly a paralinguistic analysis module and a robust speech recognition module. Descriptions of emotionality are derived from these modules following psychological and linguistic research that indicates the information likely to be available. We argue that progress in registering emotional states depends on establishing an overall framework of at least this level of complexity.

1 Introduction

Speech recognition is a technically sophisticated field, with numerous commercial systems already available for transforming speech to text. However, these systems ignore a large part of the information that humans extract from speech signals, namely information about the emotional state of the speaker. There are various specific applications for the detection of emotional and emotion-related states [1], but probably the most important reason for addressing the issue is completely generic. In this paper we describe progress towards a system capable of recovering the emotional content of speech signals.

Our general case is that understanding the emotional dimension of speech communication is a thoroughly interdisciplinary problem. Learning algorithms in general, and neural networks in particular, have an indispensable part to play. However, they need to be applied within a framework that makes use of other computational techniques, and of knowledge derived from several traditions within linguistics and psychology.

In humans, there are at least two separate systems involved in the processing of information about emotion from speech.

One derives information from the words that are spoken; the other derives information from the way they are spoken, particularly from the patterns of rise and fall in pitch and intensity known as prosody and from the changes in fine structure known as voice quality. There are indications that these distinctions may be associated with different cortical processing streams [2].

Following this bipartite division of emotion processing in humans, our work distinguishes two basic components of the emotional speech analysis system. The first is a linguistic analysis system, which derives information from a word string extracted as text from the signal; a post-processor stage then provides an interpretation of the emotion associated with the speaker. The second component is a paralinguistic analysis system, which uses different components of the raw acoustic signal to infer the underlying emotional state of the speaker.

The structure of the emotion recognition process depends critically on the definition of emotion-related states. There is a large body of psychological research in that area, but it is not well known in the IT communities that have expertise in the basic extraction processes. We highlight a well-established parameterisation of emotional states (into activation and valence levels) that is soft in its state delineation. Using that representation makes it possible to avoid some of the problems of binary state representation (with too much dependence on a linguistic definition of emotional states). Ideas that are less well established, but much more useful than uninformed intuitions, are relevant to the extraction of information from specifically verbal sources.

In the next section we describe the system that we have developed for prosodic analysis. Various emotionally important components, such as the F0 and intensity plots, are extracted and then used to give a separate indication of the speaker's emotional state. Section 3 describes the linguistic analyser, with subsections devoted to the explicit text recognition process and to the post-processing emotional state look-up. We conclude the paper with a discussion of the issues facing research in the immediate future.

2 Paralinguistic Analysis of Speech

The paralinguistic module extracts information about emotion that resides in the way words are spoken. The first target in this module is the extraction of phonetic structures, such as pitch and intensity contours, spectral profiles, and feature boundaries. From these are derived measures such as average pitch and energy, and parameters of timing. These are measured across sections of an utterance marked by natural endpoints. The module derives from a system called ASSESS (standing for Automatic Statistical Summary and Extraction of Speech Segments), which we have shown captures information relevant to speakers' emotional states [3]. Hence, we call the new system ASSESS MU (for modular unit).

2.1 Overall organization

For several reasons, it is desirable to apply paralinguistic processing to units of speech which correspond roughly to sentences or phrases, lasting on the order of a second or more and bounded by substantial breaks in speech. Some of the features that are most often associated with emotion are only defined relative to that kind of unit. An example is declination, i.e. a pattern in which pitch shows an overall tendency to fall from the beginning of a phrase to the end. Hence a good deal of processing must be held back until such a break occurs. The linguistic analyser needs to work continuously, and so it will provide the signal that a break has occurred; at that point, ASSESS MU will be triggered and will analyse the file, operating in three main stages, described below.
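The notion of pause-bounded units can be pictured with a small sketch. In ERMIS the break signal comes from the linguistic analyser, so the energy-threshold segmenter below is purely illustrative; the function name, frame size and thresholds are assumptions rather than part of the system, and only the 150 ms minimum pause echoes the pause definition given later in Section 2.3.

    # Illustrative sketch only: ERMIS lets the linguistic analyser signal the break,
    # but a simple energy-based segmenter conveys the idea of pause-bounded units.
    # All names and thresholds here are hypothetical, not taken from the paper.
    import numpy as np

    def pause_bounded_units(signal, sr, frame_ms=20, pause_db=-40.0, min_pause_s=0.15):
        """Split a waveform into units separated by sustained low-energy stretches."""
        frame = int(sr * frame_ms / 1000)
        n = len(signal) // frame
        frames = signal[:n * frame].reshape(n, frame)
        rms = np.sqrt(np.mean(frames ** 2, axis=1) + 1e-12)
        level_db = 20 * np.log10(rms / (np.max(rms) + 1e-12) + 1e-12)
        silent = level_db < pause_db                      # frames judged to be pause
        units, start, run = [], None, 0
        for i, s in enumerate(silent):
            if not s:
                if start is None:
                    start = i
                run = 0
            elif start is not None:
                run += 1
                if run * frame / sr >= min_pause_s:       # pause long enough to close a unit
                    units.append((start * frame, (i - run + 1) * frame))
                    start, run = None, 0
        if start is not None:
            units.append((start * frame, n * frame))
        return units                                      # list of (begin_sample, end_sample)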

2.2 Stage 1

Stage 1 will take the plot of voltage against time specified by a pause-defined file, and output descriptions of three basic types: overall signal energy, signal spectrum, and vocal cord openings. Voltage is sampled at 22.5 kHz. Overall energy and spectral properties will be described in terms of slices, that is, portions of the signal which span 512 points in the voltage plot. The overall energy measure will be the RMS of the voltage measurements within a slice, and the basic spectral description of each slice will give the signal intensity within each of 18 bands, which are generally 1/3 octave but wider (for practical reasons) at the top and bottom of the range. From these will be derived descriptions of the energy in four broad bands: three associated with measures used in [5] to capture qualities of voice such as breathiness and tension, plus one lower band which other work (including our own) has shown to be emotion-sensitive. The bands are: #1 0-500 Hz, #2 0-2 kHz, #3 2-5 kHz, #4 5-8 kHz.

Vocal cord openings form the basis on which the pitch contour (F0) will be estimated. They will be identified using an algorithm which picks up rapid upswings in the voltage/time curve. In the context of emotion detection, that approach is more appropriate than standard cepstral techniques, because it has the potential to detect the local irregularities which underlie emotionally significant qualities of vocalization, such as creak. Detecting vocal cord openings reliably is a non-trivial problem. There are standard algorithms which give rough solutions, but we believe that neural net techniques may give more precise identification.

2.3 Stage 2

The core of Stage 2 will be the description of two contours, one representing the rise and fall of intensity and the other the rise and fall of pitch (or, strictly speaking, F0).

Two main operations are applied to the intensity contour. It is smoothed to filter out events that last much less than a syllable. A more complex problem is setting a reference constant for the dB scale. A histogram-based technique is currently used to give reasonable estimates of intensity given a calibration sample of normal speech. A more sophisticated approach is to use evidence indicative of vocal effort (the energy in our third spectral band is reported to correlate with perceived vocal effort). Finding appropriate functions is another task where neural net techniques are probably appropriate.
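The slice-based energy description of Stage 1 and the referenced intensity contour of Stage 2 can be sketched as follows. The 512-point slices follow the text; the smoothing width and the particular percentile used as a histogram-based dB reference are assumptions made for illustration, not the ASSESS MU settings.

    # Sketch of the Stage 1 slice energies and the Stage 2 intensity contour.
    import numpy as np

    SLICE = 512

    def slice_rms(signal):
        """RMS of each 512-point slice of the voltage plot."""
        n = len(signal) // SLICE
        slices = signal[:n * SLICE].reshape(n, SLICE)
        return np.sqrt(np.mean(slices ** 2, axis=1) + 1e-12)

    def intensity_contour(signal, calibration_signal, smooth_slices=5):
        """Smoothed intensity contour in dB, referenced to a calibration sample."""
        rms = slice_rms(signal)
        # Smooth away events much shorter than a syllable (moving average over slices).
        kernel = np.ones(smooth_slices) / smooth_slices
        rms = np.convolve(rms, kernel, mode="same")
        # Histogram-style reference: a fixed percentile of the calibration sample's RMS
        # (the percentile itself is an assumption, not the published method).
        ref = np.percentile(slice_rms(calibration_signal), 75)
        return 20 * np.log10(rms / ref)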

Constructing a pitch contour is complex because (a) samples usually contain time periods where there is no pitch contour, most obviously pauses; and (b) Stage 1 outputs may lack direct information about pitch during time periods where there is a pitch contour, or contain misleading information about pitch during time periods when there is none. Our response to these problems is based on a flexible string that is (so to speak) stretched across the sample from the first slice that contains good pitch information to the last. Each point in the string is pulled towards the data points on one hand, and towards its neighbours on the other. An iterative process finds a balance between the two, giving a robust estimate of the pitch contour.

After contour extraction, the speech signal is divided into significant units before quantitative descriptions are formed. The main units to be considered are tunes, roughly phrase-like units, and pauses, i.e. silences which form the outer boundary of a tune (these must last for more than 150 ms). Shorter intervals when no speech is detected are called silences.

2.4 Stage 3

Stage 3 takes the general descriptions provided by Stage 2 and recovers parameters that are expected to correlate with emotional states. In general, the relevant parameters come from straightforward statistical summary of the data derived in Stage 2. That strategy yields both parameters that are generally regarded as basic (for instance, mean, range, and standard deviation of intensity or pitch) and others that are at a higher level, for instance parameters related to the durations of chunks, tunes and silences. A few key descriptors involve more specific operations. These involve particular properties of tunes, which we have considered under the heading of tune shape, and some spectral properties. In the spectral domain, various measures which have been correlated with perceptual qualities will also be generated from the basic Stage 2 outputs, notably:

- Energy in the 0-500 Hz region relative to total energy (see [4]).
- Measures from [5] based on peak energy in selected spectral bands:
  peak energy in band 2 relative to band 3 (correlates with perceived coarseness of voice);
  peak energy in band 3 relative to band 4 (correlates with perceived stability of voice);
  peak energy in band 2 relative to band 4 (correlates with perceived use of head register vs chest register).

The approach described up to this point defines a wide range of parameters that could in principle be passed to the emotion recognition subsystem. We have reported elsewhere on the relationships between these parameters and speakers' emotional states, using a range of learning algorithms to identify the parameters with the most predictive value.
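To make the last two stages concrete, the sketch below shows one way the flexible-string pitch smoothing of Stage 2 and a Stage 3 style per-tune summary might be realised. The relaxation weights, iteration count and the particular statistics returned are illustrative assumptions; they are not the settings used in ASSESS MU.

    # Sketch of the "flexible string" pitch smoothing followed by a per-tune summary.
    import numpy as np

    def flexible_string_f0(raw_f0, voiced, data_weight=0.5, iters=200):
        """Iteratively balance a pull towards observed F0 against a pull towards neighbours."""
        f0 = np.where(voiced, raw_f0, np.nan)
        idx = np.arange(len(f0))
        good = ~np.isnan(f0)
        # Initialise the string by interpolating between slices with good pitch information.
        est = np.interp(idx, idx[good], f0[good])
        for _ in range(iters):
            neighbours = (np.roll(est, 1) + np.roll(est, -1)) / 2.0
            neighbours[0], neighbours[-1] = est[1], est[-2]
            # Pull towards the data where it exists, and towards neighbours everywhere.
            pulled = np.where(good, data_weight * f0 + (1 - data_weight) * neighbours, neighbours)
            est = 0.5 * est + 0.5 * pulled
        return est

    def tune_summary(f0_contour, intensity_db):
        """Basic Stage 3 style statistics for one tune."""
        return {
            "f0_mean": float(np.mean(f0_contour)),
            "f0_range": float(np.ptp(f0_contour)),
            "f0_sd": float(np.std(f0_contour)),
            "intensity_mean": float(np.mean(intensity_db)),
            "intensity_range": float(np.ptp(intensity_db)),
        }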

3 Linguistic Analysis of Speech

The Linguistic Analyser processes the speech signal and provides the linguistic parameters used to deduce the user's emotion from the speech signal. It consists of a Signal Enhancement/Adaptation module, which produces an enhanced speech signal from the original speech input, and a robust Speech Recognition module, which outputs a text string representing what the speaker has uttered. This text serves as input to the Text Post-Processing module, which converts text to emotion.

3.1 Recognising Speech

To guarantee the best possible quality of speech recognition for emotionally coloured speech, the Linguistic Analyser should use uncompressed speech signals before any enhancement or recognition algorithm is applied. The modules that need to be combined in order to recognise speech are presented briefly below.

3.1.1 The Signal Enhancement/Adaptation Module

Signal enhancement: The uncompressed speech signal is fed to the Signal Enhancement/Adaptation module and processed in order to enhance the signal and remove noise prior to recognition. Two methods are currently implemented. The first is the well-known non-linear spectral subtraction [6]; the second is a noise reduction technique presented in [7], based on the Singular Value Decomposition (SVD) approach. Validation tests are being conducted to evaluate the speech enhancement algorithms with respect to word error rate, and initial comparative results are reported in [8].

Speaker adaptation: An important source of variability in speech is the difference between speakers, e.g. male/female or adult/child. Performance may be improved considerably if the input speech is normalised against speaker variability. The selected strategy adapts the features extracted for the current speaker to the acoustic models, instead of adapting the models to the input.

3.1.2 The Speech Recognition Module

This module processes the speech signal and extracts features by converting each speech frame into a set of cepstral coefficients. Acoustic phoneme models then provide estimates of the probability of the features given a sequence of words, and language modelling provides a mechanism for estimating the probability of a word in an utterance given its preceding words. The output of this process is a text representing what the speaker has uttered. The Speech Recognition module we have developed was inspired by the work presented in [9].

Parameter extraction: The prime function of the parameter extraction module is to divide the input speech into blocks and then, for each block, to derive a smoothed spectral estimate. (The spacing between blocks is typically 10 ms, and blocks are normally overlapped to give a longer analysis window, typically 25 ms.) In almost all such processing it is usual to apply a tapered window function (e.g. Hamming) to each block. Mel-Frequency Cepstral Coefficients (MFCCs) are used to model the spectral characteristics of each block.
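A minimal sketch of this parameter-extraction step, using the 25 ms Hamming-windowed blocks spaced every 10 ms described above, is given below. The use of the librosa library is an assumption for illustration; the paper does not prescribe a particular implementation.

    # Sketch of the parameter-extraction step: 25 ms Hamming-windowed blocks every
    # 10 ms, each represented by MFCCs. Library choice is an assumption.
    import librosa

    def extract_mfccs(path, n_mfcc=13):
        signal, sr = librosa.load(path, sr=None)          # keep the native sampling rate
        win = int(0.025 * sr)                             # 25 ms analysis window
        hop = int(0.010 * sr)                             # 10 ms spacing between blocks
        mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc,
                                     n_fft=win, hop_length=hop, window="hamming")
        return mfccs.T                                    # one row of coefficients per block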

Acoustic modelling: The purpose of the acoustic models is to provide a method of calculating the likelihood of any vector sequence Y given a word w. In principle, the required probability distribution could be found by collecting many examples of each w and gathering the statistics of the corresponding vector sequences. However, this is impractical for LVR systems; instead, word sequences are decomposed into basic sounds called phones, and each individual phone is represented by a hidden Markov model (HMM). Contextual effects cause large variations in the way that different sounds are produced. Hence, to achieve good phonetic discrimination, different HMMs have to be trained for each different context, instead of one HMM per phone. Our approach uses triphones, where every phone has a distinct HMM model for every unique pair of left and right neighbours. Moreover, state-tying techniques with continuous-density HMMs are used.

Language modelling: An effective way of estimating the probability of a word given its preceding words is to use N-grams, which simultaneously encode syntax, semantics and pragmatics. They concentrate on local dependencies, which makes them very effective for languages where word order is important and the strongest contextual effects tend to come from near neighbours. We have also chosen N-grams because N-gram probability distributions can be computed directly from text data, so there is no requirement for explicit linguistic rules (e.g. formal grammars).
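As the paragraph above notes, N-gram distributions can be estimated directly from text. The bigram sketch below illustrates this; the add-one smoothing is an assumption made only to keep the example self-contained.

    # Sketch of estimating bigram probabilities directly from text data.
    from collections import Counter

    def train_bigram(sentences):
        unigrams, bigrams = Counter(), Counter()
        for sentence in sentences:
            words = ["<s>"] + sentence.lower().split() + ["</s>"]
            unigrams.update(words)
            bigrams.update(zip(words[:-1], words[1:]))
        vocab = len(unigrams)
        def prob(word, previous):
            """P(word | previous) with add-one smoothing (an assumption)."""
            return (bigrams[(previous, word)] + 1) / (unigrams[previous] + vocab)
        return prob

    p = train_bigram(["i am very angry", "i am delighted"])
    print(p("angry", "very"))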
Search engine: The basic recognition problem is to find the most probable sequence of words given the observed acoustic signal (based on Bayes' rule for decomposition). In our system we use a breadth-first approach, specifically beam search with Viterbi decoding (which exploits Bellman's optimality principle). This dynamic search engine is capable of exploiting complex language models and HMM phone models that depend on both the previous and the succeeding acoustic context, capturing effects such as coarticulation. Moreover, it can do this in a single pass, in contrast to most other Viterbi-based systems, which use multiple passes.

3.2 Emotion-related information from text

Converting speech to text is the outcome of the Speech Recognition procedure described above. Extracting the speaker's emotional state, however, requires a further conversion from text to emotion. This module of the Linguistic Analyser, being the last in terms of sequential execution, is called the Text Post-Processing Module. The simplest way to proceed is to assume that Text Post-Processing comprises text retrieval techniques, such as word spotting, in order to classify the user's emotion from the linguistic characteristics of the user's utterance. The possible use of emotional lexicons is being investigated. Such lexicons exist for English, and appropriate adaptation to Greek seems necessary if we foresee emotion recognition for Greek as well.

The basic process we start with is to use a look-up table to describe the speaker's emotional state from the recognised words. We base this on the original dictionary of Whissell, extended more recently to 8700 words, with a 90% matching rate for most documents [10]. The dictionary uses two dimensions, activation and evaluation. The first is the degree of arousal associated with emotion-relevant words, with, for example, a low value of 2.2 for "bashful" and a value of over 6 for "surprised". The second dimension is the degree of pleasantness, with a low value of 1.1 for "guilty" and a high value of 6.4 for "delighted". We use the two-dimensional look-up table to produce an activation-evaluation coding for each word in a text. As a string of words is successively processed, this transformation yields a dynamic trajectory through the two-dimensional emotion space. This trajectory is to be related to the FEELTRACE-style trajectory obtained by using the ASSESS system to analyse the prosodic components of the speech input.

An important question is how these two trajectories are to be fused to produce a suitable coding of the emotional content of the speech. This is discussed in Section 4.
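Before turning to fusion, the look-up step itself can be sketched very simply. In the toy dictionary below, only the single coordinates quoted above (activation 2.2 for "bashful", evaluation 1.1 for "guilty" and 6.4 for "delighted") come from the text; every other number, and the choice to skip unknown words, is a placeholder assumption.

    # Sketch of the text post-processing look-up: each recognised word is mapped to
    # (activation, evaluation) coordinates and the sequence forms a trajectory.
    AFFECT_DICT = {
        # Values quoted in the text are used where available; the rest are placeholders.
        "bashful":   (2.2, 3.0),
        "surprised": (6.0, 4.0),
        "guilty":    (3.5, 1.1),
        "delighted": (5.5, 6.4),
    }

    def emotion_trajectory(text):
        """Map a recognised word string to a trajectory in activation-evaluation space."""
        trajectory = []
        for word in text.lower().split():
            if word in AFFECT_DICT:                # words absent from the dictionary are skipped
                trajectory.append(AFFECT_DICT[word])
        return trajectory

    print(emotion_trajectory("he was guilty but now he is delighted"))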

4 Fusion of the two streams

A plethora of issues arises from the effort to fuse the emotional features extracted by the linguistic analysis with those extracted by the paralinguistic (prosodic) analysis. These issues concern both technical intricacies and the generic complexity of the fusion task. On the technical level, there must be an effective method of synchronising and combining the two streams of data that represent the emotional state as detected by the two types of analysis. To this end, we need to specify what the unit of analysis is and which characteristic(s) of the speech stream should trigger the different modules.

On a more generic level, there is a question of harmonising the two emotion inference procedures by balancing the authority of the linguistic and paralinguistic analyses with respect to which is apposite for deducing the emotional state at each instant. This is particularly important in cases where the two methods report incompatible emotional states. For instance, we know that the same phrase spoken with a different tone (different prosodic features) can have quite a different emotional effect. Thus prosody can often be more informative than the actual words spoken, as when a speaker uses sarcasm or when the semantic content of the phrase is neutral but the tone is highly emotional (e.g. in frustration).

The aptness of the two individual analyses matters before a merge occurs, as both linguistic and paralinguistic analysis are susceptible to emotion detection errors. In the case of the text post-processing, we have to improve our approach by removing emotion assignments that are contradicted by further context indicating that the speaker is not themselves experiencing the emotional state spotted. Thus, in the example "He said to me that he was very angry", it is clear that the speaker is not angry. Items of reported speech containing emotion words should therefore be treated as a separate category. They may carry implications of an emotional state that needs to be taken notice of, but that needs to be treated differently from recognising and responding appropriately to the emotional state of the speaker. Thus any presence of reported-speech markers ("that p", "X felt", or equivalents) must be treated separately. In the case of emotion detection by prosody, it has been reported that different emotional states can correspond to the same or similar prosodic patterns. Special care should therefore be taken when classifying emotion based on these patterns, using cross-referencing between the two feature streams to resolve the ambiguity.

One solution to this fusion problem is a neural network, trained on a suitable training set of speech streams with known emotional state tagging. Such a set of data is being developed as part of ERMIS, with the associated emotional activation-evaluation trajectories forming part of the developing FEELTRACE database. An initial version of this was already used in an earlier project (PHYSTA), using a variety of techniques; our present approach is more principled, as well as involving a larger and better-defined FEELTRACE database. We will also take seriously the suggestion of the accompanying paper to take lessons from the human brain. More specifically, we propose to build a feedback system, essentially mimicking the ventral route (amygdala and prefrontal cortices) to emotion recognition in the human brain.
This will allow attention to be directed to subsets of the overall speech features being analysed; in that way we will obtain speed-up as well as improved accuracy.
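For orientation, the sketch below shows the simplest form such a trained fusion network could take: a feed-forward mapping from concatenated paralinguistic parameters and text-derived features to an activation-evaluation estimate. The layer sizes and the plain feed-forward architecture are illustrative assumptions only; they are not the ERMIS design, which, as just described, envisages a feedback system with attention directed to feature subsets.

    # Minimal sketch of neural-network fusion of the two feature streams into an
    # activation-evaluation estimate (illustrative; not the ERMIS architecture).
    import torch
    import torch.nn as nn

    class FusionNet(nn.Module):
        def __init__(self, n_prosodic=32, n_linguistic=2, hidden=16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_prosodic + n_linguistic, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 2),          # outputs: activation, evaluation
            )

        def forward(self, prosodic, linguistic):
            return self.net(torch.cat([prosodic, linguistic], dim=-1))

    # Training would use utterances tagged with FEELTRACE-style activation-evaluation
    # targets, e.g. minimising the error between predictions and the rated trajectories.
    model = FusionNet()
    prediction = model(torch.randn(4, 32), torch.randn(4, 2))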

5 Conclusions

We have presented a description of ongoing work in the ERMIS project to marry prosodic and linguistic analyses of speech so as to create an emotion recognition system. The problems of doing so are not trivial, as has been noted elsewhere in some detail [11, 12]. However, we consider that we have built up expertise in both the fundamentals of emotion recognition and word recognition. The addition of feedback may help give the edge needed to obviate the difficulties noted in [11, 12].

Acknowledgements: This work has been partially supported by the European Commission under the ERMIS Project Grant (IST ).

References

1. Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.: Emotion Recognition in Human-Computer Interaction. IEEE Signal Processing Magazine 18(1) (2001)
2. Taylor, J.G. et al.: The Emotional Recognition Architecture in the Human Brain. Submitted to ICONIP/ICANN 2003, Istanbul, Turkey (2003)
3. McGilloway, S., Cowie, R., Douglas-Cowie, E., Gielen, S., Westerdijk and Stroeve, S.: Automatic recognition of emotion from speech: a rough benchmark. In: Proceedings of the ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research. Belfast: Textflow (2000)
4. Klasmeyer, G.: An automatic description tool for time-contours and long-term average voice features in large emotional speech databases. In: Proceedings of the ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research. Belfast: Textflow (2000)
5. Hammarberg, B., Fritzell, B., Gauffin, J., Sundberg, J., Wedin, L.: Perceptual and acoustic correlates of voice qualities. Acta Otolaryngologica 90 (1980)
6. Pellom, B.L., Hansen, J.H.L.: Voice Analysis in Adverse Conditions: The Centennial Olympic Park Bombing 911 Call. In: Proceedings of the IEEE Midwest Symposium on Circuits & Systems, August (1997)
7. Doclo, S., Dologlou, I., Moonen, M.: A novel iterative signal enhancement algorithm for noise reduction in speech. In: Proceedings of ICSLP, Sydney, Australia (1998)
8. Athanaselis, T., Fotinea, S-E., Bakamidis, S., Dologlou, I., Giannopoulos, G.: Signal Enhancement for Continuous Speech Recognition. Submitted to ICONIP/ICANN 2003, Istanbul, Turkey (2003)
9. Young, S.J.: Large Vocabulary Continuous Speech Recognition. IEEE Signal Processing Magazine 13(5) (1996)
10. Whissell, C.M.: The dictionary of affect in language. In: Plutchik, R., Kellerman, H. (eds.): Emotion: Theory, Research and Experience, vol. 4: The Measurement of Emotions. New York: Academic Press (1989)
11. Russell, J.A. et al.: Facial and Vocal Expressions of Emotion. Annual Review of Psychology 54 (2003)
12. McNeely, H.E., Parlow, S.E.: Complementarity of Linguistic and Prosodic Processes in the Intact Brain. Brain & Language 79 (2001)
