Audio Scene Analysis as a Control System for Hearing Aids
Marie Roch (marie.roch@sdsu.edu), Tong Huang (hty2000tony@yahoo.com), Jing Liu (jliu 76@hotmail.com)
San Diego State University, 5500 Campanile Dr, San Diego, CA, USA

Richard R. Hurtig (richard-hurtig@uiowa.edu)
The University of Iowa, 119 SHC, Iowa City, IA, USA

Abstract

It is well known that simple amplification cannot help many hearing-impaired listeners. As a consequence, numerous signal enhancement algorithms have been proposed for digital hearing aids, many of which are effective only in certain environments. The ability to quickly and correctly detect elements of the auditory scene can permit the selection and parameterization of enhancement algorithms from a library of available routines. In this work, the authors examine the real-time parameterization of a frequency-domain compression algorithm which preserves formant ratios and thus enhances speech understanding for some individuals with severe sensorineural hearing loss in the 2-3 kHz range. The optimal compression ratio depends upon qualities of the acoustical signal. We briefly review the frequency-compression technology and describe a Gaussian mixture model classifier which can dynamically set the frequency compression ratio according to broad acoustic categories which we call cohorts. We discuss the results of a prototype simulator implemented on a general purpose computer.

1 Introduction

Sensorineural hearing loss, a hearing deficit due to physiological problems in the cochlea, is estimated to affect over 20 million individuals in the United States [9]. Hearing-impaired individuals typically have more difficulty understanding speech in situations with a low signal-to-noise ratio (SNR), and it is common for a hearing aid wearer to be satisfied with their device in quiet environments but not in the presence of noise.
Audio scene analysis is the process of automatically extracting information about the environment from properties of an observed signal, and it has numerous applications in multimedia processing. Applied to hearing aids, audio scene analysis permits the selection of signal enhancement algorithms (or parameterizations) appropriate to specific situations. Several researchers have used audio scene analysis to classify the background noise of a hearing aid wearer's environment, with typical classes including speech in traffic, clean speech, speech in babble, and so on. Kates [8] proposed exploiting information about the audio scene to permit the selection of signal processing algorithms. Kates measured the Mahalanobis distance between features representative of envelope modulation, supplemented with linear fits of the spectrum above and below the mean. More recently, Nordqvist and Leijon [14] introduced a discrete-observation hidden Markov model (HMM) classifier for hearing aids using a vector quantization codebook derived from a small number of delta features of cepstral coefficients. They elected to use only the delta features, which characterize the change in the cepstrum, as the delta features tend to be reasonably invariant. A second-stage HMM used an ad-hoc metric in place of state distributions to merge the results of class-specific HMMs, with the class decision based upon the current state in the forward decode. Büchler et al. [2] implemented a variety of machine learning techniques (k-means, histogram-driven Bayes classifiers, multilayer perceptrons, and HMMs) as well as a post-processing step of voting on the class based on the last N classification decisions. The ergodic HMMs were shown to be the best classifier. A number of features derived from 1 s intervals were explored, the best of which included tonality, width, spectral center of gravity and its fluctuation, pitch variance, and measurements of offset time.

This work focuses on a novel application of audio scene analysis to hearing aids. Rather than classifying the background noise, we are interested in categorizing attributes of the foreground speaker for the purpose of enhancing his or her speech. We describe and implement a control system for a frequency-domain compression algorithm which maps the frequency information across a specified bandwidth into a lower range where the listener's auditory deficit is less pronounced. The control system permits the dynamic assignment of a compression ratio based upon characteristics of the speech signal. Unlike previous work, we do not focus on constructing a system suitable for implementation on today's hearing instruments. As noted by Armstrong [1], Moore's law is applicable to the computational power in hearing aids, and we believe it is reasonable to target research towards future generations of hearing aids rather than current ones. The remainder of the article is organized as follows: the frequency compression algorithm is reviewed in section 2, and the classifier is introduced in section 3. The databases and experiments are described in section 4, and we summarize the results and future directions in section 5.

2 Frequency Compression

For some listeners with sufficient high-frequency hearing loss, amplification of the signal across the bandwidth where the hearing deficit occurs is insufficient. Even with amplification, these listeners are unable to perceive the formant patterns necessary to understand speech. It is well known that normal listeners are adept at handling frequency compression: they have little trouble understanding the speech of children, adult females, or adult males even though these voices typically cover different bandwidths.
Peterson and Barney's [17] study of vowels suggests that it is the ratio of the formants which is important for perceiving speech. This observation has led some researchers to examine methods of reducing the bandwidth of the signal presented to a hearing-impaired listener. Studies by several groups have shown that most listeners can understand speech that is compressed to 70% of its original bandwidth [22]. Prior approaches to frequency modification used in hearing aids have not compressed proportionally across the spectrum, and the change in the proportionality of the formants may counteract the advantage gained by shifting unusable frequency information into a range accessible to the wearer. Parent et al. [15] describe a frequency transposition system where higher frequencies are shifted to lower ones, but this shift does not preserve the formant ratios. Turner and Hurtig [22] conducted a study of listeners to determine whether a frequency-domain compression algorithm could provide better results than simple amplification when listening to adult male and female talkers. They hypothesized that users with severe hearing loss (> 60 dB HL) across the 2 to 3 kHz range and less severe loss in the lower frequencies would be most likely to benefit. The study included 15 hearing-impaired listeners who were close matches to the hypothesis criteria (50-60 dB HL above 2 kHz) as well as 4 normal-hearing control listeners. Their results showed that 45% of the hearing-impaired listeners had statistically significant improvement for female speakers and 20% had improvement for male speakers. Although there were no clear indicators as to how the population that could be helped by frequency compression might be identified, there was a trend for listeners who achieved higher recognition scores on unamplified speech to benefit less from the compressed speech. The algorithm (U.S. patent) operates on consecutive, non-overlapping frames of speech.
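As a concrete illustration, the proportional frequency-domain compression can be sketched as below. This is a minimal sketch only: the nearest-bin remapping, frame length, and function names are assumptions for illustration, not the patented implementation.

```python
import numpy as np

def compress_frame(frame, ratio):
    """Proportionally compress the spectrum of one frame.

    ratio < 1.0 maps frequency f to ratio * f, shrinking the bandwidth
    while preserving the ratios between formant frequencies.
    """
    spec = np.fft.rfft(frame)
    out = np.zeros_like(spec)
    out[0] = spec[0]                      # preserve the DC component
    n_bins = len(spec)
    for k in range(1, n_bins):
        src = int(round(k / ratio))       # output bin k draws from input bin k/ratio
        if src < n_bins:
            out[k] = spec[src]
    return np.fft.irfft(out, n=len(frame))

def compress(signal, frame_len=320, ratio=0.7):
    """Process consecutive, non-overlapping frames (20 ms at 16 kHz)."""
    out = np.zeros_like(signal, dtype=float)
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        out[start:start + frame_len] = compress_frame(
            signal[start:start + frame_len].astype(float), ratio)
    return out
```

With ratio = 0.7, a tone at FFT bin 70 of a 320-sample frame moves to bin 49, i.e. 70% of its original frequency, consistent with the 70%-bandwidth intelligibility results cited above.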
Each frame is transformed to the frequency domain and a proportional mapping of frequency bins is performed. Care is taken to preserve the DC portion of the signal. An inverse Fourier transform is applied and the output signal is presented to the hearing aid wearer. The compressed frequencies may optionally be transposed as well.

3 Cohort Detection

In Turner and Hurtig's study [22], subjects were tested at varying compression ratios to determine whether compression was useful to the listener and what level of compression optimized performance. In addition to being listener dependent, the optimal compression ratio depends upon the speech being analyzed and is affected by several variables. The physical characteristics of the speaker play a major role: speakers with shorter vocal tracts will tend to have higher formants in their voiced speech and thus require greater compression than their longer-vocal-tract counterparts. Alternatively, one can consider spectral differences between classes of articulatory productions. As an example, fricatives tend to have high-frequency energy which is believed to be important for their correct recognition. We define a cohort to be a set of related classes which might benefit from a common compression ratio setting. Each class is composed of some broadly defined acoustic quality, such as manner or place of articulation, or characteristics of a speaker group. Consequently, the real-time identification of cohorts permits a dynamic setting of the compression ratio.

Several factors have motivated our choice of cohort group. We hypothesize that, from a human factors point of view, abrupt and frequent changes in compression ratio may be distracting to a listener. In addition, for probabilistic classifiers, it has been shown [18] that the average log-likelihood projections (the difference of log probabilities in a two-class problem) produced by multiple observations from the same class have an F-ratio whose lower bound is the F-ratio of any single observation. This implies that, in general, the averaged log-likelihood projections from the same class will be more separable. In the current context, this means that if each cohort is active for a long enough period of time and we average the log-likelihood projections, there should be a reduction in the classification error rate. Consequently, we have decided to select cohorts based upon source and vocal-tract characteristics of the speaker. In particular, we are interested in vocal qualities affected by vocal-tract length and vocal-fold thickness. For convenience, we will call these groups male and female, but it is important to note that classification of a high-pitched male as belonging to a female cohort, and vice versa, is not inappropriate. The male-female separation problem is one at which human listeners are reasonably adept [21, 20], and automatic systems typically perform quite well when given phrases of several seconds. Vergin et al. [23] proposed an approximate formant detector for F1 and F2 and compared the detected values to known means for each gender. Parris and Carey [16] detected gender by linearly combining the output of gender-dependent sub-word hidden Markov models with the output of an F0 tracker. In both cases, the systems performed with low error rates but relied on segments of several seconds. By using a sliding window, it is possible to achieve frame-by-frame classification in real time, but windows of several seconds are inappropriate for conversational speech, where turns are likely to occur on a frequent basis. We have trained a classifier using one Gaussian mixture model (GMM) per cohort.
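To make the one-GMM-per-cohort decision concrete, here is a minimal sketch of per-frame log-likelihood scoring under two diagonal-covariance GMMs and the resulting projection-averaged decision. The parameter layout and function names are illustrative assumptions, not the trained 128-mixture models described in section 4.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    X: (frames, dims); weights: (mix,); means, variances: (mix, dims).
    """
    # log N(x; mu, diag(var)) for every frame/mixture pair
    diff = X[:, None, :] - means[None, :, :]            # (frames, mix, dims)
    log_comp = -0.5 * (np.sum(diff**2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_weighted = np.log(weights) + log_comp           # (frames, mix)
    m = log_weighted.max(axis=1, keepdims=True)         # log-sum-exp over mixtures
    return (m + np.log(np.exp(log_weighted - m).sum(axis=1,
                                                    keepdims=True))).ravel()

def classify(X, model_a, model_b):
    """MAP decision from the averaged log-likelihood projection."""
    proj = gmm_loglik(X, *model_a) - gmm_loglik(X, *model_b)
    return 'a' if proj.mean() > 0 else 'b'
```

Averaging the per-frame projection before thresholding is exactly the separability argument above: multiple observations from one cohort yield a more reliable decision than any single frame.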
GMMs are well known for their ability to model arbitrarily complex distributions with multiple modes and are effective classifiers for many tasks. A GMM consists of N normal distributions, or mixtures; the number of mixtures is typically chosen based upon empirical performance on training and development data sets. The mixtures are scaled by a set of weights which sum to 1. Thus the weighted sum of the N Gaussian integrals is 1, and the model represents a probability distribution. Training is accomplished using the expectation-maximization (EM) algorithm, an iterative algorithm which is guaranteed to find a local optimum [13]. The EM algorithm requires an initial model, which we create from the partitions induced by vector quantization (VQ) [10]. To reduce computational cost, and due to the asymptotic independence of cepstral feature vectors [12], the covariance matrices are assumed to be diagonal. Details of the algorithms for both VQ and HMMs can be found in standard texts such as [6].

The feature vectors are the Mel-filtered cepstral coefficients (MFCCs) [6], which are the dominant feature set in the speech, speaker, and language recognition communities. They are created as follows: successive frames are formed by multiplying the input with a Hamming window which is shifted between frames. The short-time spectrum of each window is computed with a discrete Fourier transform. The squared magnitude spectrum is filtered in the frequency domain using a set of triangular filters whose center frequencies are regularly spaced on the Mel scale. Finally, a discrete cosine transform is applied to the log of the filtered magnitude-squared spectrum, resulting in the energy and a set of cepstral coefficients. The MFCCs are frequently supplemented by their derivatives, which are appended to the feature vector.
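The MFCC pipeline just described might be sketched as follows for a single frame. The filter count (35 at 16 kHz) and coefficient count (energy plus 12 coefficients) follow section 4, while implementation details such as the mel-scale constants and the DCT-II basis are common conventions rather than the authors' exact code.

```python
import numpy as np

def mel(f):      return 2595 * np.log10(1 + f / 700.0)
def inv_mel(m):  return 700.0 * (10**(m / 2595.0) - 1)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centers regularly spaced on the mel scale."""
    pts = inv_mel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)   # rising edge
        fb[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)   # falling edge
    return fb

def mfcc(frame, sr=16000, n_filters=35, n_ceps=13):
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2          # short-time power spectrum
    fbank = mel_filterbank(n_filters, len(frame), sr) @ power
    log_e = np.log(np.maximum(fbank, 1e-10))            # floor avoids log(0)
    # DCT-II of the log filterbank energies -> cepstral coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                    (2 * n + 1) / (2 * n_filters)))
    return basis @ log_e
```

The zeroth output coefficient reflects overall frame energy, matching the "energy plus 12 MFCCs" layout used in the experiments.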
The lower MFCC components are indicative of the overall slope and shape of the frame's Mel-filtered spectrum, while the higher-order components represent finer detail. It is typical to retain only the lower-order components, as fine variations in the spectrum are usually too variable to be of significant use for most classification tasks. Feature vectors are extracted in real time from the input speech. In the recognition phase, the plug-in maximum a posteriori (MAP) rule is used to decide the class. As the true class distributions are unknown, the estimated ones are used (they are "plugged in"), and it can be shown [6] that a decision rule which selects the largest likelihood minimizes the risk for a 0-1 loss function. For simplicity, we make the common assumption that observations are independent of one another.

4 Methodology and Experiments

Selection of databases for evaluation of the system was motivated by both suitability and availability of prerecorded corpora. The ideal corpus would consist of a large body of labeled microphone speech sampled at 16 kHz or faster from a hearing aid, collected in conditions similar to those encountered by users on a daily basis. The authors are unaware of any publicly available corpora which meet these criteria. Consequently, we have selected three separate corpora with different strengths and weaknesses.

The SPIDRE [11] corpus contains 322 speakers (157 female and 165 male) who participated in varying numbers of approximately 5-minute sessions of unplanned conversational speech in a variety of environments (homes, dormitories, and so on). The combination of word-aligned transcriptions, background speech, and environmental noise makes the content of this corpus an excellent match for the problem domain. Unfortunately, the data was collected over the public telephone network (8 kHz sampling, 8-bit mu-law quantization), resulting in low bandwidth as well as channel effects from both transmission equipment and multiple telephone handsets. Differing microphone responses among telephone handsets, particularly between carbon button and electret microphones, are a well-known source of error for speech classification tasks. With the exception of the additive noise sources, these limitations represent additional and unnecessary constraints for hearing aids under most circumstances.

In contrast, the TIMIT [5] corpus is a recording-booth quality corpus consisting of 630 speakers (192 female and 438 male) who each recorded 10 short shibboleth sentences with an average duration of about 6.1 s. Transcriptions are provided at both the phoneme and word levels, permitting detailed analysis of classification results. The TIMIT transcriptions use 58 narrow-transcription phoneme classes; for analysis, we grouped these into vowels, diphthongs, and the overlapping classes associated with place and manner of articulation. The final database is the NTIMIT [7] corpus, a version of TIMIT that has been transmitted through the public telephone network and resampled at the terminating end (footnote 1). The labels provided for all three corpora are known to contain some transcription and alignment errors, but can be considered reliable in general.

Experiments were designed to illustrate the influence of various factors on recognition performance. To provide insight into the contributions of the sampling rate reduction and transmission channel effects present in the SPIDRE corpus, section 4.1 reports the classification error rate of individual frames on TIMIT, a downsampled 8 kHz version of TIMIT, and NTIMIT. Section 4.2 reports the error rates for the TIMIT and SPIDRE corpora using the aforementioned averaged log-likelihood projections, which in many cases provide a better separation between the cohorts. Spurious class changes, which may prove to be a human factors issue for the wearer of a hearing aid, are also investigated.
For all experiments, feature vectors contained 12 Mel-filtered cepstral coefficients plus energy, along with their deltas, extracted from frames created with a 20 ms Hamming window which advanced without overlap. The 8 kHz speech was filtered with 24 Mel filters, while the 16 kHz speech was filtered with 35 filters; the increased number of filters was selected to provide similar filter widths across the common bandwidth. The first derivatives are appended to the feature vector, resulting in a 26-dimensional feature vector.

For each corpus, 25 female and 25 male speakers were selected to serve as training data. From each of the speakers an average of 12 s was used to construct training sets of 5 minutes per gender. (Footnote 1: TIMIT, NTIMIT, and SPIDRE are available through the Linguistic Data Consortium.) For the SPIDRE corpus, Raj and Singh's endpointer [18] was used to detect speech activity. In addition, the speech was taken from single-gender phone calls to prevent any channel crosstalk from contaminating the data. No speech activity detection was used for the TIMIT corpus, which has only a very brief silence at the beginning and end of each utterance.

Five iterations of the k-means algorithm were sufficient to create the codebooks used to initialize the GMMs. The EM algorithm executes for 10 iterations or until convergence is reached. Typically, all ten iterations are executed and the system is close to convergence; during the final iteration, the log likelihood of any given model configuration shows no more than about a 1% improvement over the previous iteration. Male and female models were created for each corpus. Earlier experiments [19] showed that 128 mixtures provided a good trade-off between computation and accuracy, and 128 mixtures are used in all experiments. The TIMIT/NTIMIT development set consisted of the 580 speakers (167 female and 413 male) not used in training.
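Assembling the 26-dimensional vectors from the 13 static features can be sketched as below. A simple one-frame difference stands in for the delta computation, since the text does not specify the regression window used; the function name is illustrative.

```python
import numpy as np

def add_deltas(features):
    """Append first-order differences to each 13-dim frame vector.

    features: (frames, 13) array of energy + 12 MFCCs -> (frames, 26).
    A one-frame backward difference is a stand-in for the regression-based
    delta computation used by many speech toolkits.
    """
    delta = np.empty_like(features)
    delta[1:] = features[1:] - features[:-1]
    delta[0] = 0.0                      # no history for the first frame
    return np.hstack([features, delta])
```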
For the SPIDRE development set, we selected all cross-gender phone calls in which neither speaker was one of the 50 training speakers. This resulted in 87 five-minute calls between 66 female and 62 male speakers. The test set was not split into development and evaluation data due to the smaller population of female speakers in the TIMIT corpus and the overall small number of cross-gender calls in SPIDRE where neither speaker was in the training set. The system has been implemented on a general purpose computer running Windows XP and is capable of running in real time, subject to the constraints of the operating system scheduler. There are latency issues associated with the operating system's multimedia subsystem, many of which will be addressed either by the use of ASIO low-latency drivers or by a port to a digital signal processing board.

4.1 Effects of Sampling Rate and Telephone Transmission

The first set of experiments was designed to illustrate the influence of factors in the SPIDRE corpus that are not applicable to hearing aids except when processing telephone speech. We examined the error rate of the classifier on individual frames of the TIMIT (TIMIT16) data. These results were then compared to the error rates when the speech was downsampled to 8 kHz (TIMIT8), and finally passed through the public telephone network (NTIMIT). Classification was performed on single frames, which resulted in a high error rate but permitted observation of the effects of the different environments on different classes of phonemes. The TIMIT16 data had an error rate of .313, which rose to .361 (15% relative increase) on TIMIT8 and to .411
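Note that the percentage increases quoted are relative to the TIMIT16 baseline rather than absolute differences in error rate:

```python
def rel_increase(base, new):
    """Relative increase in error rate with respect to a baseline."""
    return (new - base) / base

# .313 -> .361 is ~15% relative; .313 -> .411 is ~31% relative
timit8 = rel_increase(0.313, 0.361)   # ~0.15
ntimit = rel_increase(0.313, 0.411)   # ~0.31
```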
(31% relative increase) on the telephone speech of NTIMIT. When broken down by phoneme class, the differences in error rate vary significantly. The largest TIMIT16 error rates were for plosives, affricates, and labiodentals (which of course include some plosives). While the error rates of all categories rose when the speech was degraded, the increase across phoneme classes was far from uniform. After downsampling to 8 kHz, the vowels, diphthongs, and nasals demonstrated the greatest degradation. Figure 1 shows spectrograms for the diphthong [ei] in the word "make" for both the original and downsampled TIMIT. The region of the fourth formant, which can be clearly detected in 16 kHz speech, is completely removed once the Mel filters are applied. For the telephone NTIMIT corpus, performance degradation across a wide variety of categories was also observed, with nasals and diphthongs showing relative increases in error of over 40% compared to the baseline, closely followed by alveolars, vowels, and glides. As the same handset was used for transmission of all calls, the degradation can be primarily attributed to channel effects, quantization noise introduced during transmission across the public telephone network, and the bandwidth filtering to approximately 0.2-3.5 kHz imposed by the network. In summary, reducing the sample rate and transmitting across a telephone channel both have a significant impact on the performance of the classifier.

Figure 1. Wide-band spectrograms of a female speaker saying [m ei k^] (make). Diphthongs showed a marked increase in error rate between the 8 and 16 kHz corpora. The two spectrograms show that additional relevant information (the region of F4) is lost in the upper spectrogram.

4.2 Increasing Class Separation and Human Factors Concerns

The individual-frame results from the previous section can be significantly improved by basing the MAP decision on a short moving average (MA) of the log-likelihood projections. As the current frame transitions from one class to another, the MA window will cover both classes and the assumption of a single class in the window will no longer hold, which makes the technique sensitive to speaker change. Conversational speech tends towards shorter speech segments, with many turns being less than 2 s in length; consequently, we only consider relatively short windows. The experiments with TIMIT can be considered an optimistic view of classifier performance, while SPIDRE may be thought of as a pessimistic one. Figure 2 shows the results as the moving average length varies. As can be seen, the error rate for both corpora decreases exponentially as the window length increases, with an elbow in the .5 to .8 s range. The TIMIT error rate decreases to about 5% in the elbow, while SPIDRE has a significantly higher error rate of approximately 24% in the elbow region. The large difference in error can be partially explained by the differing sample rates, quantization, and telephone transmission discussed in section 4.1. Other differences not explored in section 4.1 are the ambient noise and the multiple speakers and microphones present in SPIDRE. The first of these is representative of the operating conditions for a hearing aid, but microphone mismatch between training and testing conditions is an avoidable problem for hearing aid applications. While we cannot reliably attribute what percentage of the error is due to each source, it is well known in the speaker recognition community that microphone mismatch is a significant cause of error.
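The two post-processing steps used in this section, averaging the log-likelihood projections over a short window and median-filtering the resulting per-frame decisions, might be sketched as follows. The 20 ms frame rate and the window lengths follow the text; the edge handling and exact filter forms are assumptions.

```python
import numpy as np

def smooth_projections(proj, frame_ms=20, window_s=0.5):
    """Moving average of per-frame log-likelihood projections
    (log p(x|female) - log p(x|male))."""
    w = max(int(window_s * 1000 / frame_ms), 1)      # frames per window
    return np.convolve(proj, np.ones(w) / w, mode='same')

def median_filter(decisions, frame_ms=20, filt_ms=150):
    """Median-filter per-frame class decisions to suppress spurious switches."""
    w = max(int(filt_ms / frame_ms) | 1, 3)          # odd-length window
    padded = np.pad(decisions.astype(int), w // 2, mode='edge')
    return np.array([np.median(padded[i:i + w])      # 0/1 majority in window
                     for i in range(len(decisions))]) > 0.5

def classify_stream(proj):
    """MAP decision per frame after both smoothing steps: True -> one cohort."""
    return median_filter(smooth_projections(proj) > 0)
```

The moving average trades latency for separability, while the median filter removes isolated decision flips shorter than roughly half its length.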
No attempt was made to normalize the speech from the different microphones in this study. In addition, the classification error rate is reported with respect to biological gender rather than our definition of cohort, which is based upon acoustic properties of the signal such as the range of the fundamental frequency or the poor harmonics-to-noise ratio characteristic of breathy speech. Such characteristics are typical of, but do not always coincide with, a speaker's gender. A precise composition of the cohorts depends upon the performance of hearing-impaired listeners at different compression ratios and is beyond the scope of this study, which demonstrates the feasibility of the control system.

Figure 2. Error rates for 128-mixture models on the telephone-bandwidth SPIDRE corpus (a) and the 16 kHz TIMIT corpus (b) as the averaging window length (in seconds) varies.

Finally, from a human factors perspective, we must consider how often decisions change from one category to another, as excessive switching may be distracting to the user. In figure 3, we show a portion of the cumulative distribution function of segment lengths for a reasonably typical SPIDRE conversation. Nearly 38% of the contiguous runs of frames labeled as the same gender have a duration of under .25 s. When a short (less than .2 s) median filter is applied, only about 17% of the labeled frame sets are shorter than a quarter second. The disadvantage of the median filter is that it effectively moves the decision away from the optimal Bayes decision boundary. Assuming that the models are representative of the underlying distributions, this would predict an increased error rate, which has been observed in our experiments. In practice, the increase in error rate is small and varies with the length of the likelihood averaging window. Using a constant-length median filter, the increase in error rate was insignificant for short likelihood averaging windows and grew to about 4% as the likelihood averaging window approached 0.8 s.

Figure 3. Length of identified sequences in a typical SPIDRE conversation: a portion of the cumulative distribution function indicating the percentage of identified segments whose lengths are at most N seconds, with and without a per-frame post-classification step of applying a 150 ms median filter.

5 Summary and Conclusions

We have discussed a method to enhance speech for listeners with high-frequency hearing loss and to dynamically adapt the system in response to varying qualities of human speech.
We have shown that it is possible to make decisions about how a frame of speech should be compressed using approximately 0.5 s of previous history, making the classification decision suitable for use in conversation. Furthermore, we have considered human factors issues to prevent excessive switching between cohort classes, which may negatively impact the user's experience. When tested on a clean speech corpus, the system achieves an error rate of less than 5%. On telephone speech, the error is approximately 24%, but a portion of the error is due to microphone mismatch, a situation unlikely to characterize the majority of a hearing aid wearer's day. Future work will use an F0 detector to decide cohort membership as opposed to physical sex. The Gaussian mixture models used in this study are equivalent to continuous-observation ergodic hidden
Markov models and thus similar to the classifier used by Büchler et al. [2], although it is used with a different feature set and for a different purpose. Further endeavors to improve the error rate are possible with respect to both the classifier and the feature set. Other classifier organizations, such as structural GMMs combined with neural nets [24] and support vector machines [3], have been known to provide good results in other domains. Other feature sets, particularly those known to be associated with gender (e.g. one of the breathiness measures reviewed or proposed by Fröhlich et al. [4]), are also areas for further investigation. Finally, a clinical trial should be conducted to determine the effectiveness of the system and to guide further research. Although overcompression contributes to degrading the naturalness of the speech, it has not been shown that overcompression results in a reduction of speech intelligibility. Should this prove to be true, the classifier could be biased towards deciding in favor of cohorts with higher pitches by assuming a non-uniform prior.

6 Acknowledgments

The authors would like to thank Rita Singh for making available the GMM and VQ source code used in [18], and the anonymous reviewers for their thoughtful comments.

References

[1] S. Armstrong. Integrated circuit technology in hearing aids. J. Acoust. Soc. of Am., 116(4, Pt. 2):2536, October (abstract only).
[2] M. Büchler, S. Allegro, S. Launer, and N. Dillier. Sound classification in hearing aids inspired by auditory scene analysis. EURASIP Journal on Applied Signal Processing, in press.
[3] N. Cristianini and J. Shawe-Taylor. Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge, UK.
[4] M. Fröhlich, D. Michaelis, and H. W. Strube. Acoustic breathiness measures in the description of pathologic voices. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, volume 2, Seattle, WA, May.
[5] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue. TIMIT acoustic-phonetic continuous speech corpus. Technical Report LDC93S1, Linguistic Data Consortium, Philadelphia, PA.
[6] X. Huang, A. Acero, and H.-W. Hon. Spoken Language Processing. Prentice Hall PTR, Upper Saddle River, NJ.
[7] C. Jankowski, A. Kalyanswamy, S. Basson, and J. Spitz. NTIMIT: A phonetically balanced, continuous speech, telephone bandwidth speech database. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, volume 1, Albuquerque, NM, April. IEEE.
[8] J. M. Kates. Classification of background noises for hearing-aid applications. J. Acoust. Soc. of Am., 97(1), January.
[9] V. D. Larson, D. W. Williams, W. G. Henderson, and L. E. Luethke. Efficacy of 3 commonly used hearing aid circuits: A crossover trial. Journal of the American Medical Association, 284(14), October.
[10] Y. Linde, A. Buzo, and R. M. Gray. An algorithm for vector quantizer design. IEEE Trans. Commun., COM-28:84-95, January.
[11] A. Martin, J. Godfrey, E. Holliman, and M. Przybocki. SPIDRE corpus. Technical Report LDC94S15 CD-ROM, Linguistic Data Consortium, Philadelphia, PA.
[12] N. Merhav and C.-H. Lee. On the asymptotic statistical behavior of empirical cepstral coefficients. IEEE Trans. Signal Processing, 41(5), May.
[13] T. K. Moon. The expectation-maximization algorithm. IEEE Signal Processing Mag., 13(6):47-60, November.
[14] P. Nordqvist and A. Leijon. An efficient robust sound classification algorithm for hearing aids. J. Acoust. Soc. of Am., 115(6), June.
[15] T. C. Parent, R. Chmiel, and J. Jerger. Comparison of performance with frequency transposition hearing aids and conventional hearing aids. J. American Academy of Audiology, 8(5), October.
[16] E. S. Parris and M. J. Carey. Language independent gender identification. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, volume 2, Atlanta, GA, May.
[17] G. E. Peterson and H. L. Barney. Control methods used in a study of the vowels. J. Acoust. Soc. of Am., 24(2), March.
[18] B. Raj and R. Singh. Classifier-based non-linear projection for adaptive endpointing of continuous speech. Computer Speech and Language, 17(1):5-26, January.
[19] M. Roch, R. R. Hurtig, J. Liu, and T. Huang. Towards a cohort-selective frequency-compression hearing aid. In Proc. of the Intl. Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, Las Vegas, NV, June.
[20] M. F. Schwartz. Identification of speaker sex from isolated, voiceless fricatives. J. Acoust. Soc. of Am., 43(5).
[21] S. Singh and T. Murry. Multidimensional classification of normal voice qualities. J. Acoust. Soc. of Am., 64(1):81-87, July.
[22] C. W. Turner and R. R. Hurtig. Proportional frequency compression of speech for listeners with sensorineural hearing loss. J. Acoust. Soc. of Am., 106(2), August.
[23] R. Vergin, A. Farhat, and D. O'Shaughnessy. Robust gender-dependent acoustic-phonetic modelling in continuous speech recognition based on a new automatic male/female classification. In Int. Conf. on Spoken Language Processing, Philadelphia, PA, October.
[24] B. Xiang and T. Berger. Efficient text-independent speaker verification with structural Gaussian mixture models and neural network. IEEE Trans. Speech Audio Processing, 11(5), September 2003.