Audio Scene Analysis as a Control System for Hearing Aids

Marie Roch (marie.roch@sdsu.edu), Tong Huang (hty2000tony@yahoo.com), Jing Liu (jliu 76@hotmail.com)
San Diego State University, 5500 Campanile Dr, San Diego, CA, USA

Richard R. Hurtig (richard-hurtig@uiowa.edu)
The University of Iowa, 119 SHC, Iowa City, IA, USA

Abstract

It is well known that simple amplification cannot help many hearing-impaired listeners. As a consequence, numerous signal enhancement algorithms have been proposed for digital hearing aids. Many of these algorithms are only effective in certain environments. The ability to quickly and correctly detect elements of the auditory scene permits the selection and parameterization of enhancement algorithms from a library of available routines. In this work, the authors examine the real-time parameterization of a frequency-domain compression algorithm which preserves formant ratios and thus enhances speech understanding for some individuals with severe sensorineural hearing loss in the 2-3 kHz range. The optimal compression ratio depends upon qualities of the acoustical signal. We briefly review the frequency-compression technology and describe a Gaussian mixture model classifier which can dynamically set the frequency compression ratio according to broad acoustic categories which we call cohorts. We discuss the results of a prototype simulator which has been implemented on a general purpose computer.

1 Introduction

Sensorineural hearing loss, a hearing deficit due to physiological problems in the cochlea, is estimated to affect over 20 million individuals in the United States [9]. Hearing-impaired individuals typically have more problems with speech understanding in situations with a lower signal-to-noise ratio (SNR), and it is common for a hearing aid wearer to be satisfied with their device in quiet environments but not in the presence of noise.
Audio scene analysis is the process of automatically extracting information about the environment from properties of an observed signal, and has numerous applications in multimedia processing. Applied to hearing aids, audio scene analysis permits the selection of signal enhancement algorithms (or parameterizations) which are appropriate to specific situations. Several researchers have used audio scene analysis to classify the background noise of a hearing aid wearer's environment; typical classes include speech in traffic, clean speech, speech in babble, and so on. Kates [8] proposed exploiting information about the audio scene to permit the selection of signal processing algorithms. Kates measured the Mahalanobis distance between features representative of envelope modulation, supplemented with linear fits of the spectrum above and below the mean. More recently, Nordqvist and Leijon [14] introduced a discrete observation hidden Markov model (HMM) classifier for hearing aids using a vector quantization codebook derived from a small number of delta features from cepstral coefficients. They elected to use only the delta features, which characterize the change in the cepstrum, as the delta features tend to be reasonably invariant. A second stage HMM used an ad-hoc metric in place of state distributions which merged the results of class-specific HMMs, with the class decision based upon the current state in the forward decode. Büchler et al. [2] implemented a variety of machine learning techniques (k-means, histogram driven Bayes classifiers, multilayer perceptrons, and HMMs) as well as a post-processing step of voting on the class based on the last N classification decisions. The ergodic HMMs were shown to be the best classifier. A number of features derived from 1 s intervals
were explored, the best of which included tonality, width, spectral center of gravity and its fluctuation, pitch variance, and measurements of offset time.

This work focuses on a novel application of audio scene analysis to hearing aids. Rather than classifying the background noise, we are interested in categorizing attributes of the foreground speaker for the purpose of enhancing his or her speech. We describe and implement a control system for a frequency-domain compression algorithm which maps the frequency information across a specified bandwidth into a lower range where the listener's auditory deficit is less pronounced. The control system permits the dynamic assignment of the compression ratio based upon characteristics of the speech signal. Unlike previous work, we do not focus on constructing a system suitable for implementation on today's hearing instruments. As noted by Armstrong [1], Moore's law is applicable to the computational power in hearing aids, and we believe that it is reasonable to target research towards future generations of hearing aids rather than current ones. The remainder of the article is organized as follows: the frequency compression algorithm is reviewed in section 2, and the classifier is introduced in section 3. The databases and experiments are described in section 4, and we summarize the results and future directions in section 5.

2 Frequency Compression

For some listeners with sufficient high-frequency hearing loss, amplification of the signal across the bandwidth where the hearing deficit occurs is insufficient. Even with amplification, these listeners are unable to perceive the formant patterns necessary to understand speech. It is well known that normal listeners are adept at handling frequency compression. Normal listeners have little trouble understanding the speech of children, adult females, or adult males even though they typically cover different bandwidths.
Peterson and Barney's [17] study of vowels suggests that it is the ratio of the formants which is important for perceiving speech. This observation has led some researchers to examine methods of reducing the bandwidth of a signal presented to a hearing-impaired listener. Studies by several groups have shown that most listeners can understand speech that is compressed to 70% of its original bandwidth [22]. Prior approaches to frequency modification used in hearing aids have not compressed proportionally across the spectrum, and the change in the proportionality of the formants may counteract the advantage gained by shifting unusable frequency information into a range accessible by the wearer. Parent et al. [15] describe a frequency transposition system where higher frequencies are shifted to lower ones, but this shift does not preserve the formant ratios. Turner and Hurtig [22] conducted a study of listeners to determine whether a frequency-domain compression algorithm could provide better results than simple amplification when listening to adult male and female talkers. They hypothesized that users with severe hearing loss (> 60 dB HL) across the 2 to 3 kHz range and less severe loss in the lower frequencies would be most likely to benefit. The study included 15 hearing-impaired listeners who were close matches to the hypothesis criteria (50-60 dB HL above 2 kHz) as well as 4 normal-hearing control listeners. Their results showed that 45% of the listeners had statistically significant improvement for female speakers and 20% of the listeners had improvement for male speakers. Although there were no clear indicators as to how the population that could be helped by frequency compression could be identified, there was a trend for listeners who achieved higher recognition scores on unamplified speech to benefit less from the compressed speech. The algorithm (U.S. patent) operates on consecutive, non-overlapping frames of speech.
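The patented algorithm itself is not reproduced here, but the kind of proportional bin mapping it describes can be illustrated with a short sketch; the accumulation of source bins into target bins is our assumption, not the patented implementation:

```python
import numpy as np

def compress_frame(frame, ratio):
    """Illustrative proportional frequency compression of one frame.

    Each positive-frequency bin k is remapped to bin round(k * ratio)
    (ratio < 1 compresses the spectrum toward DC); the DC bin is copied
    unchanged, and the inverse real FFT yields the output frame.
    """
    spec = np.fft.rfft(frame)
    out = np.zeros_like(spec)
    out[0] = spec[0]                      # preserve the DC portion
    for k in range(1, len(spec)):
        j = int(round(k * ratio))         # proportional bin mapping
        if 0 < j < len(spec):
            out[j] += spec[k]             # accumulate energy at target bin
    return np.fft.irfft(out, len(frame))
```

Applied to consecutive, non-overlapping frames, a component at 1 kHz emerges at 700 Hz when the ratio is 0.7, and every other component is scaled by the same factor, so formant ratios are preserved.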
Each frame is transformed to the frequency domain and a proportional mapping of frequency bins is performed. Care is taken to preserve the DC portion of the signal. An inverse Fourier transform is applied and the output signal is presented to the hearing aid wearer. The compressed frequencies may optionally be transposed as well.

3 Cohort Detection

In Turner and Hurtig's study [22], subjects were tested at varying compression ratios to determine whether compression was useful to the listener and what level of compression optimized performance. In addition to being listener dependent, the optimal compression ratio depends upon the speech being analyzed and upon several variables. The physical characteristics of the speaker play a major role: speakers with shorter vocal tracts tend to have higher formants in their voiced speech and thus require greater compression than their longer vocal tract counterparts. Alternatively, one can consider spectral differences in classes of articulatory productions. As an example, fricatives tend to have high-frequency energy which is believed to be important for their correct recognition. We define a cohort to be a set of related classes which might benefit from a common compression ratio setting. Each class is composed of some broadly defined acoustic quality, such as manner or place of articulation, or characteristics of a speaker group. Consequently, the real-time identification of cohorts permits a dynamic setting of the compression ratio. Several factors have motivated our choice of cohort group. We hypothesize that, from a human factors point of view, abrupt and frequent changes in compression ratio may be distracting to a listener. In addition, for probabilistic classifiers, it has been shown [18] that the average log-likelihood projections (the difference of log probability in a two class problem) produced by multiple observations from the same class have an F-ratio whose lower bound is the F-ratio of any single observation. This implies that, in general, the averaged log-likelihood projections from the same class will be more separable. In the current context, this means that if each cohort is active for a long enough period of time and we average the log-likelihood projections, there should be a reduction in the classification error rate. Consequently, we have decided to select cohorts based upon source and vocal-tract characteristics of the speaker. In particular, we are interested in vocal qualities affected by vocal-tract length and vocal-fold thickness. For convenience, we will call these groups male and female, but it is important to note that classification of a high-pitched male as belonging to a female cohort, and vice versa, is not inappropriate. The male-female separation problem is one at which human listeners are reasonably adept [21, 20], and automatic systems typically perform quite well when given phrases of several seconds. Vergin et al. [23] proposed an approximate formant detector for F1 and F2 and compared the detected values to known means for each gender. Parris and Carey [16] detected gender by linearly combining the output of gender-dependent sub-word hidden Markov models with the output of an F0 tracker. In both cases, the systems performed with low error rates, but relied on segments of several seconds. By using a sliding window, it is possible to achieve frame-by-frame classification in real time, but windows of several seconds are inappropriate for conversational speech where turns are likely to occur on a frequent basis. We have trained a classifier using one Gaussian mixture model (GMM) per cohort.
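A minimal sketch of such a per-cohort GMM classifier follows, using scikit-learn's `GaussianMixture` rather than the authors' own implementation; the synthetic training data, mixture count, and feature dimension are placeholders:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder cohort training data: rows stand in for MFCC feature vectors.
rng = np.random.default_rng(0)
train = {
    "male":   rng.normal(loc=-1.0, scale=1.0, size=(2000, 26)),
    "female": rng.normal(loc=+1.0, scale=1.0, size=(2000, 26)),
}

# One diagonal-covariance GMM per cohort, k-means initialization followed
# by EM training (a small mixture count keeps the sketch fast).
models = {
    cohort: GaussianMixture(n_components=8, covariance_type="diag",
                            init_params="kmeans", max_iter=10,
                            random_state=0).fit(data)
    for cohort, data in train.items()
}

def classify(frames):
    """Plug-in MAP rule with equal priors: sum the per-frame
    log-likelihoods (frames treated as independent) and pick the
    cohort whose model assigns the largest total."""
    totals = {c: m.score_samples(frames).sum() for c, m in models.items()}
    return max(totals, key=totals.get)
```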
GMMs are well known for their ability to model arbitrarily complex distributions with multiple modes and are effective classifiers for many tasks. GMMs consist of N normal distributions, or mixtures. The number of mixtures is typically chosen based upon the empirical performance on training and development data sets. The mixtures are scaled by a set of weights which sum to 1. Thus the scaled sum of the N integrals of the Gaussians is 1, and the model represents a probability distribution. Training is accomplished using the expectation maximization (EM) algorithm, an iterative algorithm which is guaranteed to find a local optimum [13]. The EM algorithm requires an initial model, which we create based upon the partitions induced by vector quantization (VQ) [10]. To reduce computational cost, and due to the asymptotic independence of cepstral feature vectors [12], it is assumed that the covariance matrices are diagonal. Details on the algorithms for both VQ and GMMs can be found in standard texts such as [6]. The feature vectors are the Mel-filtered cepstral coefficients (MFCC) [6], which are the dominant feature set in the speech, speaker, and language recognition communities. They are created as follows: successive frames are formed by multiplying the input with a Hamming window which is shifted between frames. The short-time spectrum of each window is computed with a discrete Fourier transform. The squared magnitude spectrum is filtered in the frequency domain using a set of triangular filters whose center frequencies are regularly spaced on the Mel scale. Finally, a discrete cosine transform is applied to the log magnitude-squared spectrum, resulting in the energy and a set of Mel-filtered cepstral coefficients. The MFCCs are frequently supplemented by their derivatives, which are appended to the feature vector.
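The pipeline just described can be sketched as follows; the frame length, filter count, and coefficient count are illustrative choices, not the exact parameters of the study:

```python
import numpy as np
from scipy.fftpack import dct

def mel(f):
    """Hz to Mel."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):
    """Mel to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs=16000, frame_len=320, n_filters=35, n_ceps=13):
    """Hamming window, DFT, triangular Mel filterbank, then a DCT of the
    log filter energies. Coefficient 0 acts as an energy-like term."""
    n_fft = frame_len
    # Triangular filters whose centers are equally spaced on the Mel scale.
    edges = mel_inv(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)  # rising edge
        fbank[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)  # falling edge
    win = np.hamming(frame_len)
    feats = []
    for t in range(len(signal) // frame_len):       # non-overlapping frames
        frame = signal[t * frame_len:(t + 1) * frame_len] * win
        power = np.abs(np.fft.rfft(frame)) ** 2     # squared magnitude spectrum
        logmel = np.log(fbank @ power + 1e-10)      # Mel-filtered log spectrum
        feats.append(dct(logmel, norm="ortho")[:n_ceps])
    return np.array(feats)
```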
The lower MFCC components are indicative of the overall slope and shape of the frame's Mel-filtered spectrum, while the higher order components represent the finer detail. It is typical to retain only the lower order components, as fine variations in the spectrum are typically too variable to be of significant use for most classification tasks. Feature vectors are extracted in real time from the input speech. In the recognition phase, the plug-in maximum a posteriori (MAP) rule is used to decide the class. As the real class distributions are unknown, the estimated ones are used (they are "plugged in"), and it can be shown [6] that a decision rule which selects the largest likelihood will minimize the risk for a 0-1 loss rule. For simplicity, we make the common assumption that observations are independent of one another.

4 Methodology and Experiments

Selection of databases for evaluation of the system was motivated by both suitability and availability of prerecorded corpora. The ideal corpus would consist of a large body of labeled microphone speech, sampled at 16 kHz or faster from a hearing aid, collected in conditions similar to those encountered by users on a daily basis. The authors are unaware of any publicly available corpora which meet these criteria. Consequently, we have selected three separate corpora with different strengths and weaknesses. The SPIDRE [11] corpus contains 322 speakers (157 female and 165 male) who participated in varying numbers of approximately 5 m. sessions of unplanned conversational speech in a variety of environments (homes, dormitories, and so on). The combination of word-aligned transcriptions, background speech, and environmental noise makes the content of this corpus an excellent match for the problem domain. Unfortunately, the data was collected over the public telephone network (8 kHz sampling, 8 bit mu-law quantization), resulting in low bandwidth as well as channel effects both from transmission equipment and multiple telephone handsets. Different microphone responses from telephone handsets, particularly from carbon button versus electret microphones, are a well known source of error for speech classification tasks. With the exception of the additive noise sources, these limitations represent additional and unnecessary constraints for hearing aids under most circumstances. In contrast, the TIMIT [5] corpus is a recording-booth quality corpus which consists of 630 speakers (192 female and 438 male) who each recorded 10 short shibboleth sentences with an average duration of about 6.1 s. Transcriptions are provided at both the phoneme and word levels, permitting detailed analysis of classification results. The TIMIT transcriptions use 58 classes of narrow-transcription phonemes. For analysis, we grouped these into vowels, diphthongs, and the overlapping classes associated with place and manner of articulation. The final database is the NTIMIT [7] corpus. NTIMIT is a version of TIMIT that has been transmitted through the public telephone network and resampled at the terminating end. The labels provided for all three corpora are known to contain some transcription and alignment errors, but can be considered reliable in general. Experiments were designed to illustrate the influence of various factors on recognition performance. To provide insight into the contributions of the sampling rate reduction and transmission channel effects present in the SPIDRE corpus, section 4.1 reports the classification error rate of individual frames with TIMIT, a downsampled 8 kHz version of TIMIT, as well as NTIMIT. Section 4.2 reports the error rates for the TIMIT and SPIDRE corpora using the aforementioned averaged log-likelihood projections, which in many cases provide a better separation between the cohorts. Spurious class changes, which may prove to be a human factors issue for the wearer of a hearing aid, are also investigated.
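The downsampled 8 kHz version of TIMIT mentioned above can be produced with standard polyphase resampling; scipy's `resample_poly` is used in this sketch (the downsampling method actually used in the study is not specified, so this is an assumption):

```python
import numpy as np
from scipy.signal import resample_poly

def downsample_16k_to_8k(x):
    """Halve the sample rate of 16 kHz audio by polyphase resampling;
    resample_poly applies an anti-aliasing filter before decimation."""
    return resample_poly(x, up=1, down=2)
```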
For all experiments, feature vectors contained 12 Mel-filtered cepstral coefficients plus energy and their deltas, extracted from frames created with a 20 ms Hamming window which advanced without overlap. The 8 kHz speech was filtered with 24 Mel filters, while the 16 kHz speech was filtered with 35 filters; the increased number of filters was selected in order to provide similar filter widths across the common bandwidth. The first derivatives are appended, resulting in a 26 dimensional feature vector. For each corpus, 25 female and 25 male speakers were selected to serve as training data. From each of the speakers an average of 12 s. was used to construct training sets of 5 m. per gender. (TIMIT, NTIMIT, and SPIDRE are available through the Linguistic Data Consortium.) For the SPIDRE corpus, Raj and Singh's endpointer [18] was used to detect speech activity. In addition, the speech was taken from single-gender phone calls to prevent any channel cross talk from contaminating the data. No speech activity detection was used for the TIMIT corpus, which has a very brief silence at the beginning and end of each utterance. Five iterations of the k-means algorithm were sufficient to create the codebooks used to initialize the GMMs. The EM algorithm executes for 10 iterations or until convergence is reached. Typically, all ten iterations are executed and the system is close to convergence; during the final iteration, the log likelihood of any given model configuration shows no more than about a 1% improvement over the previous iteration. Male and female models were created for each corpus. Earlier experiments [19] showed that 128 mixtures provided a good trade-off between computation and accuracy, and 128 mixtures are used in all experiments. The TIMIT/NTIMIT development set consisted of the 580 (167 female and 413 male) speakers not used in training.
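Appending the first derivatives to the static features can be sketched with a simple frame-to-frame difference (regression-based deltas over several frames are also common; the exact delta definition used in the study is not specified here):

```python
import numpy as np

def add_deltas(feats):
    """Append first-order differences to each feature vector, turning
    13 static features (12 MFCCs + energy) into 26-dimensional vectors.
    The first frame is repeated so its delta is zero."""
    padded = np.vstack([feats[:1], feats])   # repeat the first frame
    deltas = padded[1:] - padded[:-1]        # frame-to-frame change
    return np.hstack([feats, deltas])
```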
For the SPIDRE development set we selected all cross-gender phone calls such that neither speaker was one of the 50 training speakers. This resulted in 87 5 m. calls between 66 female and 62 male speakers. The test set was not split into development and evaluation data due to the smaller population of female speakers for the TIMIT corpus and the overall small number of cross-gender calls in SPIDRE where one of the speakers was not in the training set. The system has been implemented on a general purpose computer running Windows XP, and is capable of running in real time subject to the constraints of the operating system scheduler. There are latency issues associated with the operating system's multimedia system, many of which will be addressed either by the use of ASIO low latency drivers or a port to a digital signal processing board.

4.1 Effects of Sampling Rate and Telephone Transmission

The first set of experiments was designed to illustrate the influence of some of the factors in the SPIDRE corpus that are not applicable for hearing aids except in the case of processing telephone speech. We examined the error rate of the classifier on individual frames of the TIMIT (TIMIT16) data. These were then compared to the error rates when the speech was downsampled to 8 kHz (TIMIT8), and finally passed through the public telephone network (NTIMIT). Classification was performed on single frames, which resulted in a high error rate but permitted the ability to observe the effects of different environments on different classes of phonemes. The TIMIT16 data had an error rate of .313, which rose to .361 (15% increase) on TIMIT8 and to .411 (31% increase) on the telephone speech of NTIMIT. When broken down by phoneme class, differences in error rate vary significantly. The largest TIMIT16 error rates were for plosives, affricates, and labiodentals (which of course include some plosives). While the error rates of all categories rose when the speech was degraded, the increase of error rate across phoneme classes was far from uniform. After downsampling to 8 kHz, the vowels, diphthongs, and nasals demonstrated the greatest degradation. Figure 1 shows spectrograms for the diphthong [ei] in the word make for both the original and downsampled TIMIT.

Figure 1. Wide band spectrograms of a female speaker saying [m ei k^] (make). Diphthongs showed a marked increase in error rate between the 8 and 16 kHz corpora. The two spectrograms show that additional relevant information (the region of F4) is lost in the upper spectrogram.

The region of the fourth formant, which can be clearly detected in 16 kHz speech, is completely removed once the Mel filters are applied. When considering the telephone NTIMIT corpus, performance degradation across a wide variety of categories was also observed, with nasals and diphthongs showing relative increases in error of over 40% as compared to the baseline, closely followed by alveolars, vowels, and glides. As the same handset was used for transmission of all calls, the degradation can be primarily attributed to channel effects and quantization noise during transmission across the public telephone network and the additional bandwidth filtering between .2 and 3.5 kHz by the public telephone network. In summary, it can be seen that reducing the sample rate and transmitting across a telephone channel have a significant impact on the performance of the classifier.

4.2 Increasing Class Separation and Human Factors Concerns

The individual frame results from the previous section can be significantly improved by basing the MAP decision on a short moving average (MA) of the log likelihood projections. As the current frame to be analyzed transitions from one class to another, the MA window will cover both classes and the assumption of a single class in the window will no longer be true, which makes the technique sensitive to speaker change. Conversational speech has a tendency towards shorter speech segments, with many turns being less than 2 s. in length. Consequently, we only consider relatively short windows. The experiments with TIMIT can be considered an optimistic view of the classifier performance, while SPIDRE may be thought of as a pessimistic one. Figure 2 shows the results as the moving average length varies. As can be seen, the error rate for both corpora decreases exponentially as the window length increases, with an elbow in the .5 to .8 s. range. The TIMIT error rate decreases to about 5% in the elbow. SPIDRE has a significantly higher error rate of approximately 24% in the elbow region. The large difference in error can be partially explained by the differing sample rates, quantization, and telephone transmission discussed in section 4.1. Other differences not explored in section 4.1 are the ambient noise and the multiple speakers and microphones present in SPIDRE. The first of these is representative of the operating conditions for a hearing aid, but microphone mismatch between training and testing conditions is an avoidable problem for hearing aid applications. While we cannot reliably attribute what percentage of the error is due to each cause, it is well known in the speaker recognition community that microphone mismatch is a significant cause of error.
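The moving-average decision can be sketched as follows, assuming a per-frame log-likelihood projection (log-likelihood under one cohort model minus the other); the sign convention and window length are our choices:

```python
import numpy as np

def ma_decisions(proj, win):
    """Classify each frame from a short moving average of the
    log-likelihood projection. `win` is the window length in frames
    (e.g. 25 frames = 0.5 s at 20 ms per frame); a positive average
    is taken here to indicate the 'female' cohort."""
    avg = np.convolve(proj, np.ones(win) / win, mode="same")
    return np.where(avg > 0, "female", "male")
```

Isolated frames that contradict their neighbors are absorbed by the average, at the cost of some lag when the active cohort actually changes.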
No attempt was made to normalize the speech from the different microphones in this study. In addition, the classification error rate is reported with respect to biological gender rather than our definition of cohort, which is based upon acoustic properties of a signal such as the range of the fundamental frequency or the poor harmonics-to-noise ratio characteristic of breathy speech. Such characteristics are typical of, but do not always coincide with, a speaker's gender. A precise composition of the cohorts is dependent upon the performance of hearing-impaired listeners at different compression ratios and is beyond the scope of this study, which demonstrates the feasibility of the control system.

Figure 2. Error rates for 128 mixture models on the telephone-bandwidth SPIDRE and TIMIT corpora.

Finally, from a human factors perspective, we must consider how often decisions change from one category to another. Excessive switching may be distracting to the user. In figure 3, we show a portion of the cumulative distribution function of a reasonably typical SPIDRE conversation. Nearly 38% of the contiguous frames labeled as the same gender have a duration of under .25 seconds. When a short (less than .2 s.) median filter is applied, only about 17% of the labeled frame sets are shorter than a quarter second. The disadvantage of the median filter is that it effectively moves the decision away from the optimal Bayes decision boundary. Assuming that the models are representative of the underlying distributions, this would predict an increased error rate, which has been observed in our experiments. In practice, the increase in error rate is small, and varies with the length of the likelihood averaging window. Using a constant length median filter, the increase in error rate was insignificant for short likelihood averaging windows and increased to about 4% as the likelihood averaging window approached 0.8 s.

Figure 3. Length of identified sequences on a typical SPIDRE conversation. A portion of the cumulative distribution function indicating the percentage of identified segments whose lengths are N seconds. The results with and without a per-frame post-classification step of applying a 150 ms median filter are shown.

5 Summary and Conclusions

We have discussed a method to enhance speech for listeners with high-frequency hearing loss and to dynamically adapt the system in response to varying qualities of human speech.
We have shown that it is possible to make decisions about how a frame of speech should be compressed using approximately 0.5 s. of previous history, making the classification decision suitable for use in conversation. Furthermore, we have considered human factors issues to prevent excessive switching between cohort classes, which may negatively impact the user's experience. When tested on a clean speech corpus, the system achieves an error rate of less than 5%. On telephone speech, the error is approximately 24%, but a portion of the error rate is due to microphone mismatch, a situation that is unlikely to characterize the majority of a hearing aid wearer's day. Future work will use an F0 detector to decide cohort membership as opposed to physical sex. The Gaussian mixture models used in this study are equivalent to continuous-observation ergodic hidden
Markov models and thus similar to the classifier used by Büchler et al. [2], although it is used with a different feature set and for a different purpose. Further endeavors to improve the error rate are possible both with respect to the classifier and the feature set. Other classifier organizations such as structural GMMs combined with neural nets [24] and support vector machines [3] have been known to provide good results in other domains. Other feature sets, particularly those which are known to be associated with gender (e.g. one of the breathiness measures reviewed or proposed by Fröhlich et al. [4]), are also areas for further investigation. Finally, a clinical trial should be conducted to determine the effectiveness of the system and to guide further research. Although overcompression contributes to degrading the naturalness of the speech, it has not been shown that overcompression results in a reduction of speech intelligibility. Should this prove to be true, the classifier could be biased towards deciding in favor of cohorts with higher pitches by assuming a non-uniform prior.

6 Acknowledgments

The authors would like to thank Rita Singh for making available the GMM and VQ source code used in [18], and the anonymous reviewers for their thoughtful comments.

References

[1] S. Armstrong. Integrated circuit technology in hearing aids. J. Acoust. Soc. of Am., 116(4, Pt. 2):2536, October. Abstract only.
[2] M. Büchler, S. Allegro, S. Launer, and N. Dillier. Sound classification in hearing aids inspired by auditory scene analysis. EURASIP Journal on Applied Signal Processing, in press.
[3] N. Cristianini and J. Shawe-Taylor. Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge, UK.
[4] M. Fröhlich, D. Michaelis, and H. W. Strube. Acoustic breathiness measures in the description of pathologic voices. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, volume 2, Seattle, WA, May.
[5] J. S.
Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue. TIMIT acoustic-phonetic continuous speech corpus. Technical Report LDC93S1, Linguistic Data Consortium, Philadelphia, PA.
[6] X. Huang, A. Acero, and H.-W. Hon. Spoken Language Processing. Prentice Hall PTR, Upper Saddle River, NJ.
[7] C. Jankowski, A. Kalyanswamy, S. Basson, and J. Spitz. NTIMIT: A phonetically balanced, continuous speech, telephone bandwidth speech database. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, volume 1, Albuquerque, NM, April. IEEE.
[8] J. M. Kates. Classification of background noises for hearing-aid applications. J. Acoust. Soc. of Am., 97(1), January.
[9] V. D. Larson, D. W. Williams, W. G. Henderson, and L. E. Luethke. Efficacy of 3 commonly used hearing aid circuits: A crossover trial. Journal of the American Medical Association, 284(14), October.
[10] Y. Linde, A. Buzo, and R. M. Gray. An algorithm for vector quantizer design. IEEE Trans. Commun., COM-28:84-95, January.
[11] A. Martin, J. Godfrey, E. Holliman, and M. Przybocki. SPIDRE corpus. Technical Report LDC94S15 CD-ROM, Linguistic Data Consortium, Philadelphia, PA.
[12] N. Merhav and C.-H. Lee. On the asymptotic statistical behavior of empirical cepstral coefficients. IEEE Trans. Signal Processing, 41(5), May.
[13] T. K. Moon. The expectation-maximization algorithm. IEEE Signal Processing Mag., 13(6):47-60, November.
[14] P. Nordqvist and A. Leijon. An efficient robust sound classification algorithm for hearing aids. J. Acoust. Soc. of Am., 115(6), June.
[15] T. C. Parent, R. Chmiel, and J. Jerger. Comparison of performance with frequency transposition hearing aids and conventional hearing aids. J. American Academy of Audiology, 8(5), October.
[16] E. S. Parris and M. J. Carey. Language independent gender identification. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, volume 2, Atlanta, GA, May.
[17] G. E. Peterson and H. L. Barney.
Control methods used in a study of the vowels. J. Acoust. Soc. of Am., 24(2): , March [18] B. Raj and R. Singh. Classifier-based non-linear projection for adaptive endpointing of continuous speech. Computer Speech and Language, 17(1):5 26, January [19] M. Roch, R. R. Hurtig, J. Liu, and T. Huang. Towards a cohort-selective frequency-compression hearing aid. In Proc. of the The Intl. Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, pages , Las Vegas, NV, June [20] M. F. Schwartz. Identification of speaker sex from isolated, voiceless fricatives. J. Acoust. Soc. of Am., 43(5): , [21] S. Singh and T. Murry. Multidimensional classification of normal voice qualities. J. Acoust. Soc. of Am., 64(1):81 87, July [22] C. W. Turner and R. R. Hurtig. Proportional frequency compression of speech for listeners with sensorineural hearing loss. J. Acoust. Soc. of Am., 106(2): , August [23] R. Vergin, A. Farhat, and D. O Shaughnessy. Robust gender-dependent acoustic-phonetic modelling in continuous speech recognition based on a new automatic male/female classification. In Int. Conf. on Spoken Language Processing, pages , Philadelphia, PA, October [24] B. Xiang and T. Berger. Efficient text-independent speaker verification with structural gaussian mixture models and neural network. IEEE Trans. Speech Audio Processing, 11(5): , September 2003.


More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

ANALYZER BASICS WHAT IS AN FFT SPECTRUM ANALYZER? 2-1

ANALYZER BASICS WHAT IS AN FFT SPECTRUM ANALYZER? 2-1 WHAT IS AN FFT SPECTRUM ANALYZER? ANALYZER BASICS The SR760 FFT Spectrum Analyzer takes a time varying input signal, like you would see on an oscilloscope trace, and computes its frequency spectrum. Fourier's

More information

A Microphone Array for Hearing Aids

A Microphone Array for Hearing Aids A Microphone Array for Hearing Aids by Bernard Widrow 1531-636X/06/$10.00 2001IEEE 0.00 26 Abstract A directional acoustic receiving system is constructed in the form of a necklace including an array of

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Acoustical Design of Rooms for Speech

Acoustical Design of Rooms for Speech Construction Technology Update No. 51 Acoustical Design of Rooms for Speech by J.S. Bradley This Update explains the acoustical requirements conducive to relaxed and accurate speech communication in rooms

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems

More information

Voice Communication Package v7.0 of front-end voice processing software technologies General description and technical specification

Voice Communication Package v7.0 of front-end voice processing software technologies General description and technical specification Voice Communication Package v7.0 of front-end voice processing software technologies General description and technical specification (Revision 1.0, May 2012) General VCP information Voice Communication

More information

Carla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software

Carla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software Carla Simões, t-carlas@microsoft.com Speech Analysis and Transcription Software 1 Overview Methods for Speech Acoustic Analysis Why Speech Acoustic Analysis? Annotation Segmentation Alignment Speech Analysis

More information