Compensation Approaches for Far-field Speaker Identification
Qin Jin, Kshitiz Kumar, Tanja Schultz, and Richard Stern
Carnegie Mellon University, USA

Abstract

While speaker identification performance has improved dramatically over the past years, the presence of interfering noise and the variety of channel conditions remain a major obstacle. In particular, the mismatch between training and test conditions leads to severe performance degradation. In this paper we explore several approaches to compensating for the effects of reverberation, including compensation using linear post-filtering and frame-based score competition.

1. Introduction

Over the years automatic Speaker IDentification (SID) has developed into a rather mature technology that is crucial to a large variety of spoken language applications. However, SID systems still lack robustness, i.e. their performance degrades dramatically when the acoustic training data do not match the given test conditions [1][2]. Robustness is currently the major challenge for real-world applications of speaker recognition.

We proposed a reverberation compensation approach using cepstral post-filtering [8]. The cepstral features that emerge from the feature extraction process are passed through a linear filter, which is designed to minimize the mean square distortion between the cepstra in the training and testing conditions. We also proposed the Frame-based Score Competition (FSC) approach in [5] to improve speaker recognition in far-field situations. In this paper we further elaborate this approach by adding simulated data for a large variety of noise conditions, i.e. we artificially create additional data by applying a filter approach and extend the number and variety of models for the competition approach.
The paper is organized as follows. Section 2 describes the reverberation compensation approach using cepstral post-filtering and experimental results on the YOHO database. Section 3 describes the Frame-based Score Competition approach and shows experimental results on a live FarSID database, and Section 4 concludes the paper.

2. Reverberation Compensation using Cepstral Post-Filtering

2.1. Mathematical Representation of Reverberation

This section provides a representation of reverberated speech in terms of the corresponding clean speech. The SID system works in conventional fashion, by extracting features from the signal and determining which of a set of trained models provides the best match to an ensemble of incoming features. In order to maintain high SID accuracy, it is desirable that the features derived from reverberated speech closely match features derived from the corresponding clean speech.

Let $x[n]$ represent the cepstral features of a speech waveform (and not the original waveform itself), and let $y_u[n]$ represent the corresponding cepstral features after undergoing room reverberation. Because reverberation can be thought of as the convolution of the input speech with the effective impulse response of a room, there would be a constant difference between $x[n]$ and $y_u[n]$ if these features represented long-term cepstra of the entire waveform. When $x[n]$ and $y_u[n]$ are cepstral coefficients of brief segments of speech (as in short-time Fourier analysis), there is an interaction between the speech and the analysis window, and the difference between $x[n]$ and $y_u[n]$ is no longer constant. For simplicity, we propose that the reverberated cepstral features $y_u[n]$ can be represented as the convolution of the input cepstra $x[n]$ with cepstral coefficients $h[n]$ representing the effects of the room:

$$y_u[n] = x[n] + \sum_{i=1}^{N_h} h[i]\, x[n-i] \qquad (1)$$

Referring to Figure 1, $x = \{x[n]\}_{n=1}^{M}$ represents clean speech features and $y_u = \{y_u[n]\}_{n=1}^{M}$ represents the corresponding reverberated features. The subscript $u$ in $y_u$ indicates uncompensated speech in the testing environment. Thus, the assumption in (1) implies a linear filter in the cepstral feature domain with filter taps $h = [h[0], \ldots, h[N_h]]$. Note that in this representation $h[0] = 1$. As a result of reverberation, system training will be performed on the features of clean speech $x$, but testing will use reverberated features $y_u$.

We define the instantaneous uncompensated distortion $d_u[n]$ to be

$$d_u[n] = x[n] - y_u[n] = x[n] - h[n] * x[n] = -\sum_{i=1}^{N_h} h[i]\, x[n-i] \qquad (2)$$

(The last equality is valid because $h[0] = 1$.)

Figure 1: Block diagram representing the model of reverberation and compensation

We compensate for the effects of reverberation by imposing a finite-impulse-response LTI filter on the observed features ($p$ in Figure 1). We refer to the outputs of this filter as compensated, and we use the notations $x_c$ and $y_c$ to indicate
features representing compensated clean speech and reverberated speech, respectively. We define the instantaneous compensated distortion $d_c[n]$ to be the difference between $x_c[n]$ and $y_c[n]$:

$$d_c[n] = x_c[n] - y_c[n] = p[n] * x[n] - p[n] * h[n] * x[n] = p[n] * d_u[n] = \sum_{j=0}^{N_p-1} p[j]\, d_u[n-j] \qquad (3)$$

where $N_p$ is the number of taps in the $p$ filter and $d_u[n] = x[n] - y_u[n]$ is the uncompensated distortion. We seek the optimal $p$ filter which, when applied to both $x$ and $y_u$, minimizes the mean square of the compensated distortion $d_c[n]$ defined above.

2.2. Solution to the Minimization Problem

In this section, we determine the optimal $p$ filter as defined in Sec. 2.1. We define the objective for optimization to be the minimum expected distortion between the compensated training and testing features, and we find $p$ to minimize $E[d_c^2[n]]$. Using (3), we obtain

$$\bar{d}_c = E\big[d_c^2[n]\big] = \sum_{i,j=0}^{N_p-1} p[i]\, p[j]\, E\big[d_u[n-i]\, d_u[n-j]\big] \qquad (4)$$

To evaluate $\bar{d}_c$ in (4), the terms $E[d_u[m]\, d_u[n]]$ can be obtained using (2):

$$E\big[d_u[m]\, d_u[n]\big] = E\Big[\sum_{i,j=1}^{N_h} h[i]\, h[j]\, x[m-i]\, x[n-j]\Big] \qquad (5)$$

Further, assuming that

$$E\big[h[i]\, h[j]\big] = \sigma^2\, \delta[i-j], \qquad E\big[h[i]\, h[j]\, x[m]\, x[n]\big] = E\big[h[i]\, h[j]\big]\, E\big[x[m]\, x[n]\big] \quad \forall\, i, j, m, n \qquad (6)$$

with $\delta$ being the Kronecker delta, we can obtain $E[d_u[m]\, d_u[n]]$ in (5) as

$$E\big[d_u[m]\, d_u[n]\big] = N_h\, \sigma^2\, R_x[n-m] \qquad (7)$$

where $R_x$ is the autocorrelation sequence of $x$. Substituting (7) into (4), we obtain

$$\bar{d}_c = N_h\, \sigma^2 \sum_{i,j=0}^{N_p-1} p[i]\, p[j]\, R_x[i-j] \qquad (8)$$

We can differentiate (8) with respect to $p$ to find the optimal $p$, but this results in the optimal $p$ being 0: if all the elements of $p$ are equal to 0, all features in $x$ and $y_u$ are mapped to 0, and the mean square distortion $\bar{d}_c$ is always zero as well. While this is clearly the optimal solution in the mathematical sense, it is not a useful solution. To avoid the degenerate solution $p = 0$ we further constrain $p$:

$$\sum_{j=0}^{N_p-1} p[j] \neq 0 \qquad (9)$$

The constraint in (9) means that the $p$ filter must have non-zero DC gain.
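Two points from this derivation, the identity in (3) and the degenerate all-zero solution that motivates (9), can be checked numerically. The following is only a minimal sketch under stated assumptions: the feature trajectory, the cepstral-domain room taps `h` (with `h[0] = 1`), and the 3-tap post-filter `p` are made-up values, and `post_filter` is a hypothetical helper name, not something defined in the paper.

```python
import numpy as np

def post_filter(features, taps):
    """Apply a causal FIR filter along time to a 1-D cepstral trajectory."""
    # Full convolution truncated to the input length gives
    # out[n] = sum_j taps[j] * features[n - j] with zero initial conditions.
    return np.convolve(features, taps)[: len(features)]

rng = np.random.default_rng(0)
x = rng.standard_normal(200)            # toy clean cepstral trajectory (assumed data)
h = np.array([1.0, 0.4, -0.2, 0.1])     # assumed room taps, h[0] = 1 as in (1)
p = np.array([0.25, 0.5, 0.25])         # assumed post-filter with non-zero DC gain

y_u = post_filter(x, h)                 # reverberated features, model (1)
d_u = x - y_u                           # uncompensated distortion, (2)

x_c = post_filter(x, p)                 # compensated clean features
y_c = post_filter(y_u, p)               # compensated reverberated features
d_c = x_c - y_c                         # compensated distortion, (3)

# By linearity, filtering the distortion equals differencing the filtered features.
print(np.allclose(d_c, post_filter(d_u, p)))

# The all-zero filter drives the distortion to zero but also erases the features.
p_zero = np.zeros(3)
print(np.allclose(post_filter(x, p_zero), 0.0))
```

The second check shows why an unconstrained minimization is not useful: zero distortion is reached only by discarding all speaker information.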
Next, the fact that the $p$ filter is applied to both training and testing implies that scaling the features by the same factor in both training and testing leaves the SID accuracy unchanged. We therefore lose no generality by using the more specific constraint

$$\sum_{j=0}^{N_p-1} p[j] = 1 \qquad (10)$$

To minimize $\bar{d}_c$ in (8) under (10), we construct the Lagrangian optimization criterion

$$\Lambda(p, \lambda) = N_h\, \sigma^2 \sum_{i,j=0}^{N_p-1} p[i]\, p[j]\, R_x[i-j] + \lambda \Big( \sum_{j=0}^{N_p-1} p[j] - 1 \Big) \qquad (11)$$

Differentiating (11) with respect to $[p, \lambda]$ and equating the derivatives to zero, we obtain the optimal $p$ as the solution of

$$\begin{bmatrix} R_x[0] & R_x[1] & \cdots & R_x[N_p-1] & 1 \\ R_x[1] & R_x[0] & \cdots & R_x[N_p-2] & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ R_x[N_p-1] & R_x[N_p-2] & \cdots & R_x[0] & 1 \\ 1 & 1 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} p[0] \\ p[1] \\ \vdots \\ p[N_p-1] \\ \lambda' \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \qquad (12)$$

where $\lambda' = \lambda / (2 N_h \sigma^2)$. Note that the unknown $N_h \sigma^2$ due to the reverberation filter $h$ has been incorporated into $\lambda'$. For later reference and compactness, we write (12) equivalently as

$$\Phi \begin{bmatrix} p \\ \lambda' \end{bmatrix} = b \qquad (13)$$

We note that $R_x$, the autocorrelation sequence of the clean features underlying the reverberated features, is the only unknown required to find $p$ in (13), and specifically that the optimal solution does not depend on the reverberation filter $h$. We have thus designed an optimal post-filter $p$ which is invariant to the reverberation due to $h$ and thus far depends only on $R_x$. Of course, an SID system operating in a reverberant environment can directly observe the reverberated features $y_u$, but the autocorrelation of the clean speech $R_x$ is not directly observable. Nevertheless, we can approximate $R_x$ by the autocorrelation sequence obtained from clean features extracted from training data:

$$R_x[m] \approx R_T[m], \quad m = 0, \ldots, N_p - 1 \qquad (14)$$

where $R_T[m]$ is the autocorrelation sequence obtained from clean features in the training data. Combining (14) and (12), we can solve for $p$, so that $p$ is also invariant to the underlying clean features in $x$. The $\Phi$ matrix in (13) is Hermitian but not Toeplitz; its invertibility guarantees a solution for $p$ that is both optimal and unique.
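As a rough numerical illustration of the constrained minimization, the sketch below estimates an autocorrelation sequence from a toy training trajectory and solves the bordered linear system for the filter taps. The data, the helper names `autocorrelation` and `optimal_post_filter`, the biased autocorrelation estimate, and the choice of five taps are all assumptions made for this example rather than details of the reported experiments.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Biased sample autocorrelation R[m], m = 0..max_lag, of a 1-D sequence."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - m], x[m:]) / n for m in range(max_lag + 1)])

def optimal_post_filter(r, num_taps):
    """Minimize sum_ij p[i] p[j] R[i-j] subject to sum_j p[j] = 1."""
    idx = np.arange(num_taps)
    R = r[np.abs(idx[:, None] - idx[None, :])]      # symmetric Toeplitz block
    phi = np.zeros((num_taps + 1, num_taps + 1))
    phi[:num_taps, :num_taps] = R
    phi[:num_taps, num_taps] = 1.0                  # column multiplying lambda'
    phi[num_taps, :num_taps] = 1.0                  # row enforcing sum(p) = 1
    b = np.zeros(num_taps + 1)
    b[num_taps] = 1.0
    solution = np.linalg.solve(phi, b)
    return solution[:num_taps], R

rng = np.random.default_rng(1)
# Toy "training" trajectory: white noise smoothed so that it is correlated (assumption).
train = np.convolve(rng.standard_normal(5000), [1.0, 0.8, 0.5], mode="valid")
train -= train.mean()

num_taps = 5
r_T = autocorrelation(train, num_taps - 1)
p, R = optimal_post_filter(r_T, num_taps)

print("taps:", np.round(p, 3), "sum:", p.sum())
print("p^T R p:", p @ R @ p, "identity filter:", R[0, 0])
```

In this setup the solved taps sum to one and are symmetric about the center tap, and the quadratic form for the solved filter cannot exceed `R[0, 0]`, the value obtained with the identity filter `p = [1, 0, ..., 0]`, because that filter also satisfies the constraint.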
$\Phi$ was found to be invertible in our SID evaluations, so combining (1), (6)-(9), and (14), we claim that we have developed an optimal and unique solution for $p$ which is not only invariant to the reverberation due to $h$ but also invariant to the underlying clean features in $x$. This invariance greatly simplifies our SID system. We can design $p$ using training features alone and use $p$ to generate compensated training features $x_c$, as in Figure 1. The processed features $x_c$ are used for training speaker models. During testing we apply the same filter $p$ to the observed testing features $y_u$ and generate compensated testing features $y_c$, again as in Figure 1. Because the same $p$ is applied across all reverberation conditions, no
modification of the filter design is needed for any particular reverberant environment.

2.3. EXPERIMENTAL VERIFICATION

2.3.1. Experimental Procedures

We developed the simulated FarSID database for this study by degrading clean speech from the English portion of the Verbmobil I (VMI) database. The utterances consist of high-quality recordings of spontaneously spoken face-to-face dialogs between two speakers who are discussing appointment scheduling and travel arrangements [12]. The data were recorded at a sampling rate of 16 kHz, with 16-bit resolution and PCM encoding. For our study we selected a subset of male speakers from the VMI database based on the durations of their utterances. The speakers' contributions are segmented by turns, where each file contains one turn. The turns are annotated for the dialog sessions to accommodate within- and between-session testing.

In a second step we distorted the clean speech data by convolving it with simulated impulse responses that represent rooms of different size and reverberation time, with differing distances between the sound source and the simulated microphones. The choice of the far-field conditions and the degradation process itself are described below.

The far-field conditions used to compile this database were based on the results of a series of pilot studies. In these studies, we passed utterances from the YOHO database [7] through simulated room impulse responses of various types. The results of the pilot studies were used to determine ranges of the key parameters (room dimensions, reverberation time, and distance between source and microphone) that were both compact and meaningful in terms of the type of reverberation introduced. We ultimately selected three different room sizes, four different reverberation times, and five different distances between source and microphone, which were observed to have a significant impact on speaker identification performance under simulated distorted conditions.
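The degradation step, convolving clean speech with a room impulse response, can be sketched as follows. This is not the shoebox room simulator used for the database: as a stand-in it assumes a toy impulse response made of white noise under an exponential envelope whose power falls by 60 dB over the chosen reverberation time. The function names, the 16 kHz rate, the 0.5 s reverberation time, and the noise signal standing in for speech are all assumptions for illustration.

```python
import numpy as np

def synthetic_rir(rt60_s, sample_rate, rng):
    """Toy room impulse response: white noise under an exponential envelope
    whose power falls by 60 dB after rt60_s seconds."""
    n = int(rt60_s * sample_rate)
    t = np.arange(n) / sample_rate
    # An amplitude factor of 10^(-3 t / RT60) corresponds to -60 dB at t = RT60.
    envelope = 10.0 ** (-3.0 * t / rt60_s)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0                          # direct path
    return rir, envelope

def reverberate(clean, rir):
    """Convolve the clean signal with the impulse response, keeping the input length."""
    return np.convolve(clean, rir)[: len(clean)]

rng = np.random.default_rng(2)
sample_rate = 16000                        # assumed sampling rate in Hz
clean = rng.standard_normal(sample_rate // 4)   # 0.25 s of noise standing in for speech
rir, envelope = synthetic_rir(0.5, sample_rate, rng)
far_field = reverberate(clean, rir)

decay_db = 20.0 * np.log10(envelope[-1] / envelope[0])
print(f"envelope decay over the impulse response length: {decay_db:.1f} dB")
```

A physically motivated simulator would additionally model room size and source-microphone distance, which this sketch leaves out.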
The resulting far-field conditions are listed in Table 1. In total, the FarSID database contains 45 far-field conditions plus the original clean speech condition. We name each of the far-field conditions in the format <room><reverberation><distance>. For example, SR5D refers to a medium conference room with reverberation time 0.5 seconds and a source-microphone distance of [...] centimeters.

We distorted the clean speech by convolving it with a simulated Room Impulse Response (RIR) generated using [6]. While the RIR program incorporates the dependence of the room impulse response on many physical characteristics of the room and environment, we focused on three major attributes (room size, reverberation time, and distance between source and microphone), using the parameter values in Table 1. We used the standard definition of reverberation time as the time required for the acoustic signal power in the room to decrease by 60 dB after a sound source is turned off.

2.3.2. Experimental Results

Figure 2: SID performance in a medium-sized room with post-filtering reverberation compensation (uncompensated vs. compensated, by far-field condition)

Figure 2 presents SID performance with the reverberation compensation algorithm applied across conditions in the medium room. The compensation algorithm attempts to find an optimal causal linear filter which minimizes the squared distortion due to mismatch between training and testing conditions. Under certain assumptions, the optimal filter is invariant to the RIR as well as to the test speech under consideration. Compensation is applied individually to the different cepstral features. The number of taps in the optimal filter was taken as 5. The filter has to be applied during testing as well as training, and thus new speaker models need to be built using filtered training speech. Comparing the results in Figure 2, we see that applying the compensation algorithm provides substantial improvement in SID accuracy.
We see that, on average across the different conditions, applying compensation provides a relative reduction in error of [...]%, with the highest relative reduction being [...]%.

2.4. DISCUSSION

In this section we discuss some of the assumptions made in this approach. First, we assumed a representation for reverberated features in (1). As features are generated by windowing overlapping segments of the speech signal, an exact relationship between reverberated and clean features is hard to find. We therefore needed approximations, and borrowed (1) from the representation of reverberation as an LTI filter in the time domain. $h[0] = 1$ was chosen to keep the problem analytically tractable.

The assumption of (6) essentially means that the frequency response of the $h$ filter is flat. This would occur if the RTs were constant over all frequencies. This assumption was made in simulating the speech data that was used for the results in Figure 2, but it is not empirically valid, as noted above. Nevertheless, the algorithm was also successful for the environmental conditions summarized in Figure [...]. These data indirectly validate the proposition that the assumption of (4), although physically invalid, can still produce useful compensation results in practice.

We can obtain an estimate of the number of taps in the optimal $p$ as the knee in the curve describing the dependence of the mean square compensated distortion $\bar{d}_c$ on $N_p$.
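The dependence of the minimum distortion on $N_p$ can be explored with a small sketch. It assumes a first-order autocorrelation model $R_x[m] = 0.9^{|m|}$ for the clean features, which is an arbitrary choice, and uses the standard closed-form minimum of a quadratic form under a unit-sum constraint, $1 / (\mathbf{1}^T R^{-1} \mathbf{1})$. The function name is hypothetical, and the common factor $N_h \sigma^2$ is omitted because it does not change the shape of the curve.

```python
import numpy as np

def min_distortion_factor(r, num_taps):
    """Minimum of p^T R p subject to sum(p) = 1, which equals 1 / (1^T R^-1 1)."""
    idx = np.arange(num_taps)
    R = r[np.abs(idx[:, None] - idx[None, :])]   # symmetric Toeplitz matrix
    ones = np.ones(num_taps)
    return 1.0 / (ones @ np.linalg.solve(R, ones))

# Assumed autocorrelation of the clean features: R_x[m] = 0.9^|m|.
max_taps = 12
r_x = 0.9 ** np.arange(max_taps)

factors = np.array([min_distortion_factor(r_x, n) for n in range(1, max_taps + 1)])
for n, f in enumerate(factors, start=1):
    # N_p = 1 forces p = [1], i.e. the uncompensated case.
    print(f"N_p = {n:2d}  distortion relative to uncompensated = {f / factors[0]:.3f}")
```

Because a filter with fewer taps is a special case of a longer one with trailing zeros, the printed values cannot increase with $N_p$; where the curve flattens is one way to read off a knee.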
Although $\bar{d}_c$ decreases with a larger number of taps, the dissimilarity among compensated features for different speakers also decreases. As $N_p$ becomes very large, the $p$ filter converges to a uniform moving average filter that smooths out all the data and reduces every feature to its mean value, which is zero for the MFCC features under consideration. This reduces the SID decisions to a random guess. For these reasons, we expect a local maximum in performance as a function of $N_p$.

Next, we note that the optimization construction in Section 2.2 guarantees that, for the optimal $p$, the mean squared distortion for the compensated case, $E[d_c^2[n]]$, is never greater than that for the uncompensated case, $E[d_u^2[n]]$. Further, the optimal $p$ is a linear phase filter.

The post-filtering algorithm was also applied directly to the speech signal in the time domain, but in this case the uncompensated case outperformed the compensated case. This indicates that the modeling and assumptions in Sections 2.1 and 2.2 are not easily generalizable to the time domain.

Our approach to dereverberation is somewhat similar to Wiener filtering at a conceptual level. The major departure from the Wiener filter is that $p$ is applied to both training and testing speech, which leads to different requirements and solutions to the problem. While our approach is invariant to the detailed nature of the reverberation, this is not the case for Wiener filtering. We will consider generalizations of our approach in later studies. We will also apply the algorithm to speaker verification tasks.

3. Frame-based Score Competition (FSC)

In this section we first quickly review the decision process of speaker identification systems based on GMM likelihood scores and then summarize our FSC approach, which is described in more detail in [5]. Let $S$ be the total number of enrolled speakers and $LL(X|\Theta_k)$ the log likelihood score that the test feature sequence $X$ was generated by the GMM $\Theta_k$ of speaker $k$, which consists of $M$ mixtures of Gaussian distributions.
Then the recognized speaker identity $S^*$ is given by:

$$S^* = \arg\max_{k} \{ LL(X|\Theta_k) \}, \quad k = 1, 2, \ldots, S \qquad (15)$$

Since the vectors of the sequence $X = (x_1, x_2, \ldots, x_N)$ are assumed to be independent and identically distributed, the likelihood score for speaker $k$ and model $\Theta_k$ is computed as

$$LL(X|\Theta_k) = \sum_{n=1}^{N} LL(x_n|\Theta_k).$$

Our multiple microphone setup allows us to build multiple GMM models $\Theta_k^{CH_i}$ for each speaker $k$ and each channel $CH_i$, resulting in a set $\Theta_k = \{\Theta_k^{CH_1}, \ldots, \Theta_k^{CH_C}\}$ for each speaker $k$, covering $C$ channels. The key idea of our FSC approach is to use this set of multiple GMM models rather than a single GMM model for the speaker identity decision. In each frame we compare the feature vector $x_n$ provided by channel $CH_h$ to the multiple GMMs of speaker $k$. The highest log likelihood score is chosen to be the frame score. In this case the likelihood score of the observed features given speaker $k$ is computed as

$$LL(X|\Theta_k) = \sum_{n=1}^{N} LL(x_n|\Theta_k) = \sum_{n=1}^{N} \max_{j=1,\ldots,C} \{ LL(x_n|\Theta_k^{CH_j}) \} \qquad (16)$$

Note that this process does not rely on models for the test channel. Also, this competition process differs from the mono-channel scoring process in that per-frame log likelihood scores for different speakers are not necessarily derived from the same channel.

3.1. Data and Setup

A Far-Field Speaker Identification (FarSID) database has recently been collected at Carnegie Mellon University to study the performance of speaker identification algorithms in adverse conditions, including a far-field microphone setup, various interfering noise sources, and reverberant room characteristics. Similar to the database described in [5], the FarSID database consists of speech recordings from multiple distant microphones, as depicted in Figure 3. In addition, to make the FarSID database more challenging than its predecessor, we recorded under various noise and reverberation conditions. The FarSID database consists of conversational speech recorded in face-to-face dialog sessions under two different reverberation conditions (small vs.
medium-sized room) and six noise conditions per room. The noise conditions were created by playing interfering noises (music, white noise, speech) at different Signal to Noise Ratio (SNR) levels while recording the respective sessions. Each condition lasted for about 3 minutes per session per speaker. The noise conditions and their respective SNRs are:

- No noise
- Music noise, -5 dB SNR
- Music noise, [...] dB SNR
- White noise, -5 dB SNR
- White noise, [...] dB SNR
- Interfering speaker noise, -5 dB

In total we recorded [...] native speakers of American English, where each speaker is engaged by an interviewer in a conversation about various topics. Each speaker participated in 4 recording sessions, two in a small and two in a medium-sized room, totaling [...] hours per speaker.

Figure 3: Microphone setup in the FarSID database

Figure 3 shows the distant microphone setup in the FarSID database. The right-hand side illustrates the microphone positioning in 3D space. Five microphones (labeled 9 to 12 and 14) hang from the ceiling or are mounted on high microphone stands. Seven microphones (labeled 1 to 7) form a microphone array, with a distance of 5 cm between adjacent microphones. The microphone array and two other microphones (labeled 13 and 15) are set up on the table which is
arranged between the interviewer and interviewee. In addition, we use two lapel microphones, number 8 worn by the interviewer and number 16 worn by the interviewee, i.e. the speaker whose identity is to be recognized. The left-hand side of Figure 3 illustrates the distance between the speaker (marked by a red X) and these 16 microphones. One grid unit roughly corresponds to 0.5 meters.

3.2. Experimental Setup and Results

The experiments reported in this section are based on the FarSID database. The aim of the investigation is to demonstrate the robustness of our FSC approach for far-field scenarios and to show how it addresses the challenges posed by mismatched conditions, i.e. the fact that speaker models are applied to acoustic channel conditions which have not been observed during training. For this purpose we train and test our approach under four scenarios, where each scenario is designed to be more challenging than the previous one and closer to the challenges of real-world applications. The four scenarios are described next, followed by the experimental results in the respective subsections.

[Match] Matched condition: train a speaker model with data recorded under the same conditions as in the test case. This serves as the gold standard, as it reflects the best-case scenario.
[Mis-MM] Mismatched condition with Multiple Microphone data: train speaker models with data simultaneously recorded by multiple microphones that cover a variety of conditions but not the test condition.
[Mis-SM-] Mismatched condition with Single Microphone data and knowledge about the test condition: train speaker models with data recorded with one microphone under one condition and test on a different condition, using some prior knowledge about the test condition.
[Mis-SM-n] Like [Mis-SM-], but this time we varied the number of microphone positions involved in model training.

The experiments were carried out on data with the first noise condition of the FarSID database (see Section 3.1), i.e.
distant speech with common background noise such as air conditioning and computer fans, plus reverberation, recorded by the 16 microphones (as described above) in a medium-sized room. We selected 6 seconds of these data per speaker to train the speaker model. For testing we selected 3 seconds per speaker from the same noise condition and the same room. The major mismatch results from the selection of the microphone positions for training and test, as described above. In total we had 6 test trials. All described experiments are conducted as closed-set speaker identification. Performance is measured in terms of identification accuracy, i.e. the percentage of correctly identified test trials. The applied speech features X are Mel Frequency Cepstral Coefficients (MFCC); the speaker models consist of Gaussian Mixture Models (GMM) with 64 Gaussians per model.

Experiment on [Match] Scenario

To get the upper bound performance, i.e. the best-case scenario, we trained and tested under the matched scenario. The second column of Table 2 shows the breakdown of speaker identification performance for each microphone position. To get the performance for microphone position y we trained all speaker models on the training trials recorded by microphone y and tested on the test trials recorded by microphone y. On average we get a 98.4% identification rate on [...] speakers for far-field recordings if we assume that we know the test condition and that we have recordings in this condition available for each speaker.

Experiment on [Mis-MM] Scenario

The results of the second experiment are given in column 3 of Table 2. Here we assume that we have simultaneous recordings from microphone positions 9-15. To calculate the performance on position 9 we train 6 speaker models per speaker, one on each of the remaining positions 10, 11, 12, 13, 14, and 15. For the final number given in the table we average over the identification rates for each of these mismatched conditions. We achieve 9[...]% accuracy over all positions, i.e.
a drastic drop from the matched condition. The fourth column in Table 2 compares this brute-force approach to our FSC technique. The same 6 models per speaker are now combined using competition at the scoring stage. As can be seen, FSC significantly improves over the brute-force approach and even gets close to the [Match] performance. This result indicates that FSC compensates well in scenarios in which recordings from multiple microphones other than the matching one are available for a speaker. Our earlier results also showed that the SID performance further improves if the matching condition is available [5].

Table 2. Performance for Multi-Microphone Setup (columns: Test Microphone, [Match], [Mis-MM], [Mis-MM] FSC; rows: seven microphone positions and Average)

Experiment on [Mis-SM-] Scenario

In this next set of experiments we investigate the performance of our FSC approach in the more realistic case where only single-microphone recordings are available per speaker. We assume here, without loss of generality, that the training data come from microphone position 14. As column 3 of Table 3 (labeled [...]) shows, the performance drops drastically compared to the matched and the multi-microphone performance. The gap is more substantial for microphone positions 9-12, which is intuitively clear as these are further away from microphone 14 than microphone positions 13-15. The key idea for making use of our FSC approach in the single-microphone case is to simulate multiple microphone recordings from the single-microphone data. In order to apply FSC, we simulated different channels from the microphone 14 speech. This simulation targets microphone positions 9-15 by convolving the source speech with the simulated Room Impulse Response (RIR) generated using [6]. While the RIR program incorporates the dependence of the room impulse response on many physical characteristics of the room and environment, we focused here on three major attributes: the room size,
reverberation time, and distance between the speaker and the microphone. Column 4 in Table 3 (labeled [...]-FSC) shows the performance when simulating the 7 microphone positions. FSC on simulated channels significantly outperforms the baseline under mismatched conditions, although it cannot beat the performance of FSC on real multi-microphone data ([Mis-MM] FSC in Table 2). Please note that for both FSC on real multi-microphone data and FSC on simulated multi-microphone data, we purposely exclude the data from the matching channel, i.e. we assume that we do not know the microphone position of the test condition. However, we do assume in the simulation that we have some knowledge about possible microphone positions, i.e. the RIR filters know the actual room size and reverberation time and create realistic microphone distances.

Table 3. Performance for Single-Microphone Setup (columns: Test Microphone, [Match], [...], [...]-FSC; rows: seven microphone positions and Average)

Figure 4: SID performance comparison under all setups (Matched, [Mis-MM] FSC, [...]-FSC, Mismatched; by test channel, mic 9 to mic 15, and average)

Figure 4 summarizes our findings for the four cases, i.e. trained and tested on microphone 14 [Match], real multi-microphone conditions with FSC [Mis-MM]-FSC, single-microphone conditions with simulation [...]-FSC, and the mismatched case [Mismatched], in which the models are trained on single-microphone data at position 14 and applied to the microphones at positions 9-15.

Experiment on [Mis-SM-n] Scenario

In this final set of experiments, we examined the impact of the number of microphone positions on performance. We tested this by repeating the [...]-FSC experiments, this time applying FSC to different numbers of simulated multi-microphone data streams. FSC6 refers to the case in which we used all 6 mismatched microphone data streams (positions 9-15 except the position matching the test condition) to train 6 models. This corresponds to experiment [...]-FSC. FSC5 refers to the case where we used only 5 out of the 6 simulated multi-microphone data streams.
Since there are $\binom{6}{5} = 6$ different choices, we averaged the performance over all choices. In addition, we calculated the best and worst performance depending on the choice. FSC4, FSC3, and FSC2 repeat the same experiments with fewer microphones, giving us $\binom{6}{4} = 15$, $\binom{6}{3} = 20$, and $\binom{6}{2} = 15$ choices, respectively. Figures 5 and 6 show the best and worst performance for all selections.

Figure 5: Best performance of [...]-FSC (FSC6 to FSC2, by test channel, mic 9 to mic 15)

Figure 6: Worst performance of [...]-FSC (FSC6 to FSC2, by test channel, mic 9 to mic 15)

Figure 7: Performance summary of FSC with different number of channels (best, average, and worst for FSC6 to FSC2, compared with noFSC)
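The frame-based competition of Section 3 and the channel-subset counts used for FSC5 through FSC2 can be sketched as follows. Everything concrete here is an assumption for illustration: the diagonal-covariance GMM scoring helper, the toy one-component models in two dimensions, the speaker names, and the synthetic test frames. The experiments in the paper use MFCC features and 64-component GMMs instead.

```python
import numpy as np
from itertools import combinations
from math import comb

def frame_log_likelihoods(frames, means, variances, weights):
    """Per-frame log-likelihood under a diagonal-covariance GMM.
    frames: (N, D); means, variances: (M, D); weights: (M,)."""
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff**2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_weighted = log_gauss + np.log(weights)
    peak = log_weighted.max(axis=1, keepdims=True)   # stable log-sum-exp over mixtures
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True)))[:, 0]

def fsc_score(frames, channel_models):
    """Sum over frames of the best per-frame score among one speaker's channel models."""
    per_channel = np.stack([frame_log_likelihoods(frames, *m) for m in channel_models])
    return per_channel.max(axis=0).sum()

def identify(frames, speakers):
    """Return the speaker whose set of channel models gives the highest FSC score."""
    return max(speakers, key=lambda name: fsc_score(frames, speakers[name]))

def single_gaussian(mean):
    """Toy one-component 'GMM' in 2-D with unit variances (assumed model)."""
    return (np.array([mean], dtype=float), np.ones((1, 2)), np.array([1.0]))

# Two hypothetical speakers, each with models for two training channels.
speakers = {
    "A": [single_gaussian([0.0, 0.0]), single_gaussian([1.0, 0.0])],
    "B": [single_gaussian([5.0, 5.0]), single_gaussian([6.0, 5.0])],
}
rng = np.random.default_rng(3)
# Test frames from a channel of speaker A that matches neither trained model.
test_frames = rng.normal(loc=[0.5, 0.0], scale=0.3, size=(50, 2))
print(identify(test_frames, speakers))

# Number of ways to pick n of the 6 simulated channels (FSC5 down to FSC2).
print({n: comb(6, n) for n in (5, 4, 3, 2)})
subsets_of_five = list(combinations(range(6), 5))
```

Because the per-frame maximum is at least as large as the score from any single channel model, the FSC score of a speaker is never lower than the score obtained from any one of that speaker's channel models alone.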
As can be seen from Figures 5 and 6, even the worst selection of microphone positions for the data simulation does not have a significant impact on system performance compared to the best choice. In other words, the success of the FSC approach does not depend on a proper selection of microphone positions for the simulation. Figure 7 compares the worst, average, and best cases with the mismatched condition. Even in the worst case, FSC still significantly improves performance compared to the baseline performance under the mismatched condition (noFSC in Figure 7).

4. Discussion and Conclusions

In this paper, we presented an algorithm for reverberation compensation using cepstral post-filtering. The compensation procedure consists of a relatively simple FIR filter that is applied to sequences of cepstral coefficients, with the coefficients of the filter optimized to minimize the mean square difference between the compensated coefficients for speech in the training and testing environments. The optimal filter obtained is unique and invariant to the environmental conditions of a particular test trial. This approach provided significant improvement in SID accuracy across different reverberation conditions, encompassing simulated as well as actual RIRs, and also across different speech databases.

In this paper we also reported far-field speaker recognition performance under mismatched conditions. The aim of the investigation was to demonstrate the robustness of our frame-based score competition (FSC) approach for far-field scenarios and to show how it addresses the challenges posed by mismatched conditions, i.e. the fact that speaker models are usually applied to acoustic channel conditions which have not been observed during training. For this purpose we trained and tested our approach under four scenarios, where each scenario is designed to be more challenging than the previous one and closer to the challenges of real-world applications. The first scenario assumes that the test condition is known, i.e.
the best but most unlikely case. The second case assumes that the test condition is not known but that training samples recorded from multiple microphone positions are available for the speaker in question. Here, FSC significantly improves over the mismatched case, i.e. multi-microphone data give more robustness since they cover multiple microphone positions and thus better prepare for the unknown. In the third scenario we assume that only single-microphone data are available and compensate for this lack by simulating multi-microphone data using room impulse response filters. FSC still manages to significantly outperform the mismatched scenario. In other words, even when only single-microphone recordings from a speaker are available, the simulation of multiple microphone recordings combined with our FSC approach improves the overall performance significantly. In the last scenario we vary the selection of microphone positions for the simulated data and show that even if we make the worst choice of microphone positions, we still see significant improvements over the mismatched case.

5. Acknowledgements

The work was funded in part by the National Geospatial-Intelligence Agency (NGA) under National Technology Alliance (NTA) Agreement Number NMA[...]. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NGA, the United States Government, or Rosettex. We would also like to thank Fred Goodman for useful discussions and comments.

6. References

[1] Chin-Hui Lee, Frank K. Soong, Kuldip K. Paliwal, Automatic Speech and Speaker Recognition: Advanced Topics, Springer, 1996.
[2] S. Furui, Towards Robust Speech Recognition Under Adverse Conditions, ESCA Workshop on Speech Processing in Adverse Conditions, pp. 31-42, 1992.
[3] F. Bimbot, J.-F. Bonastre, C. Fredouille, G. Gravier, I. Magrin-Chagnolleau, S. Meignier, T. Merlin, J. Ortega-Garcia, D. Petrovska-Delacretaz and D. A. Reynolds, A Tutorial on Text-Independent Speaker Verification, Journal on Applied Signal Processing, 2004.
[4] D. Reynolds, Speaker Identification and Verification Using Gaussian Mixture Speaker Models, Speech Communication, Vol. 17, No. 1-2, pp. 91-108, August 1995.
[5] Q. Jin, T. Schultz, and A. Waibel, Far-field Speaker Recognition, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, No. 7, pp. 2023-2032, 2007.
[6] D. Campbell, K. Palomäki, and G. Brown, A MATLAB Simulation of Shoebox Room Acoustics for use in Research and Teaching. /V9/V9N3/campbell.doc
[7] J.P. Campbell and D.A. Reynolds, Corpora for the evaluation of speaker recognition systems, IEEE ICASSP, 1999.
[8] K. Kumar and R. M. Stern, Environment-Invariant Compensation for Reverberation using Linear Post-Filtering for Minimum Distortion, IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2008, Las Vegas, Nevada.
[9] S. G. McGovern, A model for room acoustics.
[10] CMU Sphinx Open Source Speech Recognition Engines.
[11] RWCP Sound Scene Database in Real Acoustical Environments.
[12] S. Burger, K. Weilhammer, F. Schiel, H.G. Tillmann, Verbmobil Data Collection and Annotation, in: Wolfgang Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation, Springer-Verlag Berlin, Heidelberg, New York, 2000.
