A Study of the Cosine Distance-Based Mean Shift for Telephone Speech Diarization


T-ASL

Mohammed Senoussaoui, Patrick Kenny, Themos Stafylakis and Pierre Dumouchel

Abstract: Speaker clustering is a crucial step for speaker diarization. The short duration of speech segments in telephone speech dialogue and the absence of prior information on the number of clusters dramatically increase the difficulty of this problem in diarizing spontaneous telephone speech conversations. We propose a simple iterative Mean Shift algorithm based on the cosine distance to perform speaker clustering under these conditions. Two variants of the cosine distance Mean Shift are compared in an exhaustive practical study. We report state-of-the-art results as measured by the Diarization Error Rate and the Number of Detected Speakers on the LDC CallHome telephone corpus.

Index Terms: Speaker diarization, clustering, Mean Shift, cosine distance.

I. INTRODUCTION

Speaker diarization consists in splitting an audio stream into homogeneous regions corresponding to the speech of participating speakers. As the problem is usually formulated, diarization requires performing two principal steps, namely segmentation and speaker clustering. The aim of segmentation is to find speaker change points in order to form segments (known as speaker turns) that contain speech of a given speaker. The aim of speaker clustering is to link unlabeled segments according to a given metric in order to determine the intrinsic grouping in data. The challenge of speaker clustering increases by virtue of the absence of any prior knowledge about the constituent number of speakers in the stream.

Model selection based on the Bayesian information criterion (BIC) is the most popular method for speaker segmentation [1][]. BIC can also be used to estimate the number of speakers in a recording and other Bayesian methods have recently been proposed for this purpose [3][4]. Hierarchical Agglomerative Clustering (HAC) is by far the most widespread approach to the speaker clustering problem. Other methods, including hybrid approaches, continue to be developed [4] [5].
Manuscript received June 1, 2013, revised August 31, 2013 and accepted September 4, 2013. This work was supported by the Natural Science and Engineering Research Council of Canada. Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request. M. Senoussaoui is with Centre de recherche informatique de Montréal (CRIM), Montréal, QC, H3A 1B9, Canada and with École de technologie supérieure (ÉTS), Montréal, QC, Canada. P. Kenny and T. Stafylakis are with Centre de recherche informatique de Montréal (CRIM), Montréal, QC, H3A 1B9, Canada. P. Dumouchel is with École de technologie supérieure (ÉTS), Montréal, QC, Canada.

In this work, we focus on the speaker clustering task rather than speaker segmentation. We propose a clustering method which is capable of estimating the number of speakers participating in a telephone conversation, a challenging problem considering that speaker turns are generally of very short duration [4]. The method in question is the so-called Mean Shift (MS) algorithm. This approach is borrowed from the field of computer vision, where it is widely used to detect the number of colors and for image segmentation purposes. The MS algorithm is a nonparametric iterative mode-seeking algorithm introduced by Fukunaga [6]. Despite its first appearance in 1975, MS remained in oblivion except for works such as [7] that aimed to generalize the original version. The MS algorithm reappeared in 2002 with the work of Comaniciu [8] in image processing. Recently, Stafylakis et al. [9][10] have shown how to generalize the basic Euclidean space MS algorithm to non-Euclidean manifolds so that objects other than points in Euclidean space can be clustered. This generalized method was applied to the problem of speaker clustering in a context where speaker turns were characterized by multivariate Gaussian distributions.

Our choice of the MS algorithm is mainly motivated by its nonparametric nature.
This characteristic offers the major advantage of not having to make assumptions about the shape of the data distribution, in contrast to conventional probabilistic diarization methods. Recently [11], we presented a new extension of the Euclidean Mean Shift that works with a cosine distance metric. This new algorithm was shown to be very effective for speaker clustering in large populations where each speaker was represented by a whole side of a telephone conversation. This work was motivated by the success of cosine similarity matching in the speaker verification field [1][13][14][15]. Cosine distance has also been successfully tested in speaker diarization of the CallHome telephone corpus [16][17].

In this work, firstly, we propose to test the cosine-based MS algorithm on the diarization of multi-speaker 2-wire telephone recordings. We do not assume that the number of participating speakers is given. Secondly, we compare two clustering mechanisms that exploit the cosine-based MS algorithm with respect to diarization performance (as measured by the number of speakers detected and the standard diarization error metric) and with respect to execution times.

Although diarization on telephone conversations is an important and difficult task, there are not, to our knowledge, any published studies on the use of the MS algorithm to solve this problem. Unlike broadcast news speech, the shortness of the speaker turn duration in telephone speech (typically one second) makes the task of properly representing these segments in a feature space more difficult. In order to deal with this problem, we represent each speaker turn by an i-vector, a representation of speech segments by vectors of fixed dimension, independent of segment durations [1]. I-vector features have been used successfully not only in speaker recognition [1][13][14][15] and speaker diarization and clustering [16][17][1][11] but also in language recognition [][3]. Although probabilistic classifiers such as Probabilistic Linear Discriminant Analysis have become predominant in applying i-vector methods to speaker recognition, simple cosine distance based classifiers remain competitive [1][13] and we will use this approach in developing the speaker diarization algorithms presented here. Note that in [4], the authors show that cosine distance provides a better metric than Euclidean distance in GMM-supervector space.

In [16], the authors introduced a diarization system where i-vectors were used to represent speaker turns and cosine distance based k-means clustering was used to associate speaker turns with individual speakers. Tested on two-speaker conversations, this approach outperformed a BIC-based hierarchical agglomerative clustering system by a wide margin. But, in order for k-means clustering to work, the number of speakers in a given conversation needs to be known in advance, so it is not straightforward to extend this approach to the general diarization problem where the number of speakers participating in the conversation needs to be determined (a simple heuristic is presented in [17]). The main contribution of this paper is to show how using the Mean Shift algorithm in place of k-means enables this problem to be dealt with very effectively.

As our test bed we use the CallHome telephone speech corpus (development/test) provided by NIST in the year 2000 speaker recognition evaluation (SRE). This consists of spontaneous telephone conversations involving varying numbers of speakers.
The CallHome dataset has been the subject of several studies [17][18][19][][1].

The rest of this paper is organized as follows. In Section II we first provide some background material on the i-vector feature space that will be used in this work. In Section III, we give some preliminaries on the original version of the Mean Shift algorithm and explain how we include the cosine distance by introducing a simple modification. Two ways of exploiting the MS algorithm for clustering purposes will also be given. In Section IV, we present different methods of normalizing i-vectors for diarization (such normalizations turn out to be very important for our approach). Thereafter, we perform a detailed experimental study and analysis in Sections V and VI before concluding this work in Section VII.

II. I-VECTORS FOR SPEAKER DIARIZATION

The supervector representation has been applied with great success to the field of speaker recognition, especially when it was exploited in the well-known generative model named Joint Factor Analysis (JFA) [5]. In high-dimensional supervector space, JFA attempts to jointly model speaker and channel variabilities using a large amount of background data. When a relatively small amount of speaker data is available (i.e. during enrolment and test stages), JFA enables effective speaker modeling by suppressing channel variability from the speech signal.

[Fig. 1. Skeleton of the Mean Shift i-vector diarization system: segmentation of the speech signal is followed by extracting i-vectors for each segment, and then the i-vectors are clustered (using Mean Shift in our case). Blocks: input audio signal; feature extraction / initial segmentation; i-vector extraction; i-vector clustering (Mean Shift); output segmentation.]

A major advance in the area of speaker recognition was the introduction of the low-dimensional feature vectors known as i-vectors [1].
We can define an i-vector as the mapping (using a Factor Analysis or a Probabilistic Principal Component Analysis) of a high-dimensional supervector to a low-dimensional space called the total variability space (here the word "total" is used to refer to both speaker and channel variabilities). Unlike JFA, which proposes to distinguish between speaker and channel effects in the supervector space, i-vector methods seek to model these effects in a low-dimensional space where many standard pattern recognition methods can be brought to bear on the problem.

Mathematically, the mapping of a supervector $X$ to an i-vector $x$ is expressed by the following formula:

$$X = X_{UBM} + T x \quad (1)$$

where $X_{UBM}$ is the supervector of the Universal Background Model (UBM) and the rectangular matrix $T$ is the so-called Total Variability matrix. More mathematical details of i-vectors and their estimation can be found in [1][5][6].

I-vectors have successfully been deployed in many fields other than speaker recognition [16][17][1][][3]. Methods successful in one field can often be translated to other fields by identifying the sources of useful and nuisance variability. Thus in speaker recognition, speaker variability is useful but it counts as nuisance variability in language recognition.

In the diarization problem, the speaker turn (represented by an i-vector in our case) is the fundamental representation unit, or what we usually call a sample in Pattern Recognition terminology. Moreover, an aggregation of homogeneous i-vectors within one conversation represents a cluster (a speaker in our case), or what is commonly known as a class. Thus, the diarization problem becomes one of clustering i-vectors [11][16][17][1].
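As a purely illustrative aid, formula (1) can be evaluated in the generative direction (from an i-vector to a supervector) with a few lines of Python. This is only a sketch under stated assumptions: the function name `supervector_from_ivector` and the toy dimensions are hypothetical, and the code does not show how an i-vector is actually estimated from speech, which requires the procedures cited above.

```python
def supervector_from_ivector(x_ubm, T, x):
    """Evaluate X = X_UBM + T x (formula (1)) on plain Python lists.

    x_ubm: UBM supervector (length D)
    T:     total variability matrix, given as D rows of length d
    x:     i-vector (length d)
    """
    if len(T) != len(x_ubm) or any(len(row) != len(x) for row in T):
        raise ValueError("shape mismatch between X_UBM, T and x")
    # Each supervector component is the UBM component plus one row of T times x.
    return [m + sum(t * xi for t, xi in zip(row, x)) for m, row in zip(x_ubm, T)]


# Toy sizes (D = 3, d = 2), chosen only for illustration.
x_ubm = [1.0, 2.0, 3.0]
T = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [0.5, -1.0]
X = supervector_from_ivector(x_ubm, T, x)
```

In a real system the supervector dimension is far larger than the i-vector dimension, which is the point of the low-dimensional representation.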

III. THE MEAN SHIFT ALGORITHM

The Mean Shift algorithm can be viewed as a clustering algorithm or as a way of finding the modes in a nonparametric distribution. In this section we will present the intuitive idea behind the Mean Shift mode-seeking process as well as the mathematical derivations of this algorithm. Additionally, we present two variants of this algorithm which can be applied for clustering purposes. Finally, the extension of the traditional MS to the cosine-based MS is presented.

A. The intuitive idea behind Mean Shift

The intuitive idea of Mean Shift is quite natural and simple. Starting from a given vector $x$ in a set $S = \{x_1, x_2, ..., x_n\}$ of unlabeled data (which are i-vectors in our case) we can reach a stationary point called a density mode through the iterative process depicted in Algorithm 1. Note that Algorithm 1 refers to the original Mean Shift process. The mathematical convergence proof of the sequence of successive positions $\{y_i\}_{i=1,2,...}$ is found in [6][8].

Algorithm 1: Mean Shift (intuitive idea)
    i = 1, y_i = x                        // initialization
    Center a window around y_i
    repeat
        compute µ_h(y_i)                  // estimate the sample mean of data falling within the window
                                          // (i.e. the neighborhood of y_i in terms of Euclidean distance)
        y_{i+1} = µ_h(y_i)
        Move the window from y_i to y_{i+1}
        i = i + 1
    until stabilization                   // a mode has been found

B. Mathematical development

Mean Shift is a member of the Kernel Density Estimation (KDE) family of algorithms (also known as Parzen windowing). Estimating the probability density function of a distribution using a limited sample of data is a fundamental problem in pattern recognition. The standard form of the estimated kernel density function $\hat{f}(x)$ at a randomly selected point $x$ is given by the following formula (see footnote 1):

$$\hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} k\left(\left\| \frac{x - x_i}{h} \right\|^2\right) \quad (2)$$

where $k(x)$ is a kernel function and $h$ is its radial width, referred to as the kernel bandwidth. Ignoring the selection of kernel type, $h$ is the only tunable parameter in the Mean Shift algorithm; its role is to smooth the estimated density function.
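The estimator in formula (2) can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, normalization constants are ignored as in the text, and the truncated linear profile used in the example is simply one convenient choice of kernel.

```python
def kde(x, samples, h, profile):
    """Unnormalized kernel density estimate at x, following formula (2).

    x and every element of samples are lists of floats of the same length d;
    profile is a function of the squared scaled distance ||(x - x_i)/h||^2.
    """
    n = len(samples)
    d = len(x)
    total = 0.0
    for xi in samples:
        u = sum((a - b) ** 2 for a, b in zip(x, xi)) / (h * h)
        total += profile(u)
    return total / (n * h ** d)


def truncated_linear_profile(u):
    """Assumed example profile: 1 - u on [0, 1], zero elsewhere."""
    return 1.0 - u if 0.0 <= u <= 1.0 else 0.0


# One-dimensional toy data: three points near 0 and one isolated point at 2.
samples = [[0.0], [0.1], [0.2], [2.0]]
dense = kde([0.1], samples, h=0.5, profile=truncated_linear_profile)
sparse = kde([1.0], samples, h=0.5, profile=truncated_linear_profile)
```

With this toy data the estimate is larger inside the group of three points than at `x = 1.0`, where no sample falls within the bandwidth. Increasing `h` widens the window and smooths the estimate.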
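In order to ensure certain properties such as asymptotic unbiasedness and consistency, the kernel and the bandwidth $h$ should satisfy some conditions that are discussed in detail in [6]. In general, the purpose of KDE is to estimate the density function, but the Mean Shift procedure is only concerned with locating the modes of the density function $f(x)$ and not the values of the density function at these points.

Footnote 1: Note that for simplicity we ignore some constants in the mathematical derivations.

To find the modes, the Mean Shift algorithm derivation requires calculating the gradient of the density function $f(x)$. The estimate of the gradient of the density function $f(x)$ is given by the gradient of the estimate of the density function $\hat{f}(x)$ as follows [6][8][7][8][9]:

$$\hat{\nabla} f(x) \equiv \nabla \hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} \nabla k\left(\left\| \frac{x - x_i}{h} \right\|^2\right) = \frac{2}{n h^{d+2}} \sum_{i=1}^{n} (x - x_i)\, k'\left(\left\| \frac{x - x_i}{h} \right\|^2\right). \quad (3)$$

A simple type of kernel is the Epanechnikov kernel given by the following formula:

$$k(x) = \begin{cases} 1 - x & 0 \le x \le 1 \\ 0 & x > 1 \end{cases} \quad (4)$$

Let $g(x)$ be the uniform kernel:

$$g(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & x > 1 \end{cases} \quad (5)$$

Note that it satisfies:

$$k'(x) = -c\, g(x) \quad (6)$$

where $c$ is a constant and the prime is the derivation operator. Then we can write $\nabla \hat{f}(x)$ as:

$$\nabla \hat{f}(x) = \frac{2c}{n h^{d+2}} \sum_{i=1}^{n} (x_i - x)\, g\left(\left\| \frac{x - x_i}{h} \right\|^2\right) = \frac{2c}{n h^{d+2}} \left[ \sum_{i=1}^{n} g\left(\left\| \frac{x - x_i}{h} \right\|^2\right) \right] \left[ \frac{\sum_{i=1}^{n} x_i\, g\left(\left\| \frac{x - x_i}{h} \right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\| \frac{x - x_i}{h} \right\|^2\right)} - x \right]. \quad (7)$$

The expression:

$$m_h(x) = \frac{\sum_{i=1}^{n} x_i\, g\left(\left\| \frac{x - x_i}{h} \right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\| \frac{x - x_i}{h} \right\|^2\right)} - x \quad (8)$$

is what we refer to as the Mean Shift vector $m_h(x)$. Note that the Mean Shift vector $m_h(x)$ is just the difference between the next position, given by the weighted sample mean vector of all data, and the current position (the instance vector $x$). Indeed, the weights in the mean formula are given by the binary outputs (i.e. 0 or 1) of the flat kernel $g(x)$.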
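The iteration of Algorithm 1 with a flat kernel can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `window_mean` and `seek_mode`, the stopping tolerance `tol` and the iteration cap `max_iter` are all hypothetical choices made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length lists of floats."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def window_mean(y, samples, h, dist):
    """Sample mean of the samples lying within distance h of y."""
    inside = [s for s in samples if dist(y, s) <= h]
    if not inside:  # empty window (only possible when y is not a sample)
        return list(y)
    return [sum(s[k] for s in inside) / len(inside) for k in range(len(y))]


def seek_mode(x, samples, h, dist=euclidean, tol=1e-9, max_iter=100):
    """Shift the window to the sample mean until the shift is negligible."""
    y = list(x)
    for _ in range(max_iter):
        y_next = window_mean(y, samples, h, dist)
        if euclidean(y, y_next) <= tol:  # the Mean Shift vector has vanished
            return y_next
        y = y_next
    return y


# Two well-separated toy groups in the plane.
samples = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [5.0, 5.0], [5.2, 5.0]]
mode_a = seek_mode(samples[0], samples, h=1.0)
mode_b = seek_mode(samples[3], samples, h=1.0)
```

Starting from a point of the first group, the process stops at the mean of that group; starting from the second group, it stops at the mean of the second. The bandwidth `h` decides how far apart samples may lie and still share a window.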

For simplicity, let us denote the uniform kernel with bandwidth $h$ by $g(x, x_i, h)$ so that:

$$g(x, x_i, h) = \begin{cases} 1 & \| x - x_i \| \le h \\ 0 & \| x - x_i \| > h \end{cases} \quad (9)$$

In other words, $g(x, x_i, h)$ selects a subset $S_h(x)$ of $n_x$ samples (by analogy with Parzen windows we refer to this subset as a window) in which the Euclidean pairwise distances with $x$ are less than or equal to the threshold (bandwidth) $h$:

$$S_h(x) \equiv \{ x_i : \| x - x_i \| \le h \}. \quad (10)$$

Therefore, we can rewrite the Mean Shift vector as:

$$m_h(x) = \mu_h(x) - x \quad (11)$$

where $\mu_h(x)$ is the sample mean of the $n_x$ samples of $S_h(x)$:

$$\mu_h(x) = \frac{1}{n_x} \sum_{x_i \in S_h(x)} x_i. \quad (12)$$

The iterative process of calculating the sample mean followed by data shifting (which produces the sequence $\{y_i\}_{i=1,2,...}$ referred to in Algorithm 1) converges to a mode of the data distribution.

C. Mean Shift for speaker clustering

The Mean Shift algorithm can be exploited to deal with the problem of speaker clustering in the case where the number of clusters (speakers in our case) is unknown, as well as with other problems such as the segmentation steps involved in image processing and object tracking [8]. In the following subsections, we present two clustering mechanisms based on the MS algorithm, namely the Full and the Selective clustering strategies.

Full strategy

One may apply the iterative Mean Shift procedure at each data instance. In general, some of the MS processes will converge to the same density mode. The number of density modes (after pruning) represents the number of detected clusters, and instances that converge to the same mode are deemed to belong to the same cluster (we call these points the basin of attraction of the mode). In this work we refer to this approach as the Full strategy.

Selective strategy

Unlike the Full Mean Shift clustering strategy, we can adapt this strategy to run the MS process on a subset of data only. The idea is to keep track of the number of visits to each data point that occur during the evolution of a Mean Shift process. After the convergence of the first Mean Shift process, the samples that have been visited are assigned to the first cluster.
We then run a second process starting from one of the unvisited samples and create a second cluster. We continue to run MS processes one after another until we have no unvisited data samples. Some of the samples may be allocated to more than one cluster by this procedure; majority voting is then needed to reconcile these conflicts.

Note that the computational complexity depends on the number of samples in the Full strategy and only on the number of clusters in the case of the Selective strategy. A MATLAB implementation of the Selective strategy can be found online. In this work the experimental results of the Full and Selective clustering strategies are compared in Section VI.

D. Mean Shift based on cosine distance

The success of the cosine distance in speaker recognition is well known [1][13][14][15]. A rationale for using cosine distance instead of Euclidean distance can be supplied by postulating a normal distribution for the speaker population (as in PLDA [30]). Suppose we are given a pair of i-vectors and we wish to test the hypothesis that they belong to the same speaker cluster against the hypothesis that they belong to different clusters. Because most of the population mass is concentrated in the neighborhood of the origin, speakers in this region are in danger of being confused with each other. In the case of a pair of i-vectors which are close to the origin, the same-speaker hypothesis will only be accepted if the i-vectors are relatively close together. On the other hand, if the i-vectors are far from the origin, they can be relatively far apart from each other without invalidating the same-speaker hypothesis. Hence, in order to incorporate this prior knowledge regarding the distribution of the speaker means into the MS algorithm, we may either use (a) the Euclidean distance and a variable bandwidth that increases with the distance from the origin or (b) a fixed bandwidth and the cosine similarity. The latter approach is evidently preferable.

The cosine distance between two vectors $x$ and $y$ is given by:

$$D(x, y) = 1 - \frac{x \cdot y}{\| x \| \, \| y \|}. \quad (13)$$
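A minimal Python sketch may help make the cosine distance of formula (13) and its use inside a mean shift window concrete. It combines a cosine window with the Full strategy of Section III-C. Everything beyond formula (13) is an assumption made for illustration: the function names, the tolerances `tol` and `merge_tol`, the iteration cap, and the rule that two modes closer than `merge_tol` in cosine distance are treated as one. Vectors are assumed to be non-zero.

```python
import math


def cosine_distance(x, y):
    """D(x, y) = 1 - x.y / (||x|| ||y||), as in formula (13); x, y non-zero."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)


def cosine_mode(x, samples, h, tol=1e-12, max_iter=100):
    """Mean shift from x with a flat kernel over a cosine-distance window."""
    y = list(x)
    for _ in range(max_iter):
        inside = [s for s in samples if cosine_distance(y, s) <= h]
        if not inside:
            return y
        y_next = [sum(s[k] for s in inside) / len(inside) for k in range(len(y))]
        if cosine_distance(y, y_next) <= tol:  # direction no longer changes
            return y_next
        y = y_next
    return y


def full_strategy(samples, h, merge_tol=1e-6):
    """One mean shift process per sample; samples sharing a mode share a label."""
    modes, labels = [], []
    for x in samples:
        m = cosine_mode(x, samples, h)
        for idx, known in enumerate(modes):
            if cosine_distance(m, known) <= merge_tol:
                labels.append(idx)
                break
        else:  # no existing mode matched: open a new cluster
            modes.append(m)
            labels.append(len(modes) - 1)
    return labels


# Toy vectors: three roughly along the first axis, two roughly along the second.
samples = [[1.0, 0.0], [2.0, 0.2], [0.9, 0.1], [0.0, 1.0], [0.1, 3.0]]
labels = full_strategy(samples, h=0.1)
```

Because the cosine distance ignores vector length, `[1.0, 0.0]` and `[2.0, 0.2]` fall in the same window even though their Euclidean distance is larger than 1. The number of distinct labels is the number of detected clusters, so no cluster count has to be supplied in advance.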
The original Mean Shift algorithm based on a flat kernel relies on the Euclidean distance to find points falling within the window, as shown in (10). In [11] we proposed the use of the cosine metric instead of the Euclidean one to build a new version of the Mean Shift algorithm. Only one modification needs to be introduced in (10); we set

$$S_h(x) \equiv \{ x_i : D(x, x_i) \le h \} \quad (14)$$

where $D(x, x_i)$ is the cosine distance between $x$ and $x_i$ given by formula (13). This corresponds to redefining the uniform kernel as:

$$g(x, x_i, h) = \begin{cases} 1 & D(x, x_i) \le h \\ 0 & D(x, x_i) > h \end{cases} \quad (15)$$

E. Conversation-dependent bandwidth

It is known from the literature [31] that one of the practical limitations of the Mean Shift algorithm is the need to fix the bandwidth $h$. Using a fixed bandwidth is not generally appropriate, as the local structure of the samples can vary with the data that need to be clustered. We have found that varying the bandwidth from one conversation to another turns out to be useful in diarization based on the Mean Shift algorithm. In order to deal with the disparity caused by the variable duration of conversations, we adopt a version of the variable bandwidth scheme proposed in [10]. This is designed to smooth the density estimator in the case of short conversations where the number of segments to be clustered is small. The variable bandwidth is controlled by two parameters, $\lambda$ and the fixed bandwidth $h$. For a conversation $c$, the conversation-dependent bandwidth $h_\lambda(c)$ is given by

"!h c =1! nc! 1! h 16 # n c! + 1! h

where $n_c$ is the number of segments in the conversation. Note that $h_\lambda(c) \ge h$, with equality if $n_c$ is very large.

F. Cluster pruning

An artifact of the Mean Shift algorithm is that there is nothing to prevent it from producing clusters with very small numbers of segments. To counter this tendency, we simply prune clusters containing a small number of samples (less than or equal to a constant $p$) by merging them with their nearest neighbors.

IV. I-VECTOR NORMALIZATION FOR DIARIZATION

By design, i-vectors are intended to represent a wide range of speech variabilities. Hence, raw i-vectors need to be normalized in ways which vary from one application to another. Based on the above definitions of class and sample in relation to our problem (see Section II), we will present in the following sections some methods to normalize i-vectors which are suitable for speaker diarization.

A. Principal components analysis (PCA)

In [16] it was shown that projecting i-vectors onto the conversation-dependent PCA axes with high variance helps to compensate for intra-session variability.
A further weighting with the square root of the corresponding eigenvalues was also applied to these axes in order to emphasize their importance. The authors of [16] recommend choosing the PCA dimensionality so as to retain 50% of the data variance. We will denote this quantity by $r$. Ideally each retained PCA axis represents the variability due to a single speaker in the conversation. Note that this type of PCA is local in the sense that the analysis is done on a file-by-file basis. This has the advantage that no background data is required to implement it.

B. Within Class Covariance Normalization (WCCN)

Normalizing data variances using a Within Class Covariance matrix has become common practice in the Speaker Recognition field [1][13][15]. The idea behind this normalization is to penalize axes with high intra-class variance by rotating data using a decomposition of the inverse of the Within Class Covariance matrix.

C. Between Class Covariance Normalization (BCCN)

By analogy with the WCCN approach, we propose a new normalization method based on the maximization of the directions of between class variance by normalizing the i-vectors with the decomposition of the between class covariance matrix $B$. The between class covariance matrix is given by the following formula:

$$B = \frac{1}{n} \sum_{i=1}^{I} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^t \quad (17)$$

where the sum ranges over $I$ conversation sides in a background training set, $\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_j^i$ is the sample mean of the $n_i$ speaker turns within conversation side $i$, and $\bar{x}$ is the sample mean of all i-vectors.

V. IMPLEMENTATION DETAILS

A. CallHome data

We use the CallHome dataset distributed by NIST during the year 2000 speaker recognition evaluation [18]. CallHome is a multi-lingual (6 languages) dataset of multi-speaker telephone recordings of 1 to 10 minutes duration. Fig. 2 depicts the development part of the dataset, which contains 38 conversations, broken down by the number of speakers (2 to 4 speakers). The CallHome test set contains 500 conversations, broken down by the number of speakers in Fig. 3.

[Fig. 2. CallHome development data set broken down by categories representing the number of participating speakers in conversations.]

[Fig. 3. CallHome test set broken down by categories representing the number of participating speakers in conversations.]

Note that the number of speakers ranges from 2 to 4 in the development set and from 2 to 7 in the test set, so that there is a danger of over-tuning on the development set. For our purposes the development set serves to decide which types of i-vector normalization to use, to fix the bandwidth parameter $h$ in (15), (16) and to determine a strategy for pruning sparsely populated clusters. Because there is essentially only one scalar parameter to be tuned, our approach is not at risk of over-tuning on the development set.

B. Feature extraction

1) Speech parameterization
Every 10 ms, Mel Frequency Cepstral Coefficients (MFCC) are extracted from a 25 ms Hamming window (19 MFC coefficients + energy). As is traditional in diarization, no feature normalization is applied.

2) Universal background model
We use a gender-independent UBM containing 512 Gaussians. This UBM is trained with the LDC releases of Switchboard II, Phases 2 and 3; Switchboard Cellular, Parts 1 and 2; and NIST SRE (telephone speech only).

3) I-vector extractor
We use a gender-independent i-vector extractor of dimension 100, trained on the same data as the UBM together with data from the Fisher corpus.

C. I-vector normalization

Among the normalization methods presented in Section IV, only the within and the between class covariance matrices need background data to be estimated. In order to estimate them we used telephone speech (whole conversation sides) from the 2004 and 2005 NIST speaker recognition evaluations.

D. Initial segmentation

The focus in this work is speaker clustering rather than segmentation. Following the authors of [4] [16], we uniformly segmented speech intervals found by a voice activity detector into segments of about one second of duration. This naïve approach to speaker turn segmentation is traditional in diarizing telephone speech, where speaker turns tend to be very short and Viterbi re-segmentation is generally applied in subsequent processing.
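The uniform segmentation step described above can be sketched as follows. This is only an illustration under assumptions: the text does not say how intervals whose length is not a multiple of one second are handled, so the sketch splits each speech interval into equal pieces whose length is as close as possible to the target, and the function name `uniform_segments` is hypothetical.

```python
def uniform_segments(speech_intervals, target=1.0):
    """Split (start, end) speech intervals, in seconds, into equal pieces.

    Each interval is cut into round(duration / target) pieces (at least one),
    so every piece lasts about `target` seconds. This equal-split rule is an
    assumption made for this sketch.
    """
    segments = []
    for start, end in speech_intervals:
        duration = end - start
        if duration <= 0:
            continue  # skip empty or malformed intervals
        pieces = max(1, round(duration / target))
        step = duration / pieces
        for k in range(pieces):
            segments.append((start + k * step, start + (k + 1) * step))
    return segments


# Two speech intervals, as a voice activity detector might return them.
segs = uniform_segments([(0.0, 2.0), (5.0, 5.5)])
```

Each resulting segment would then be represented by one i-vector before clustering.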
Note that the results presented in [16] show that using a reference silence detector offers no significant improvement in comparison to their own speech detector.

E. Evaluation protocol

In order to evaluate the performance of the different systems we use the NIST Diarization Error Rate (DER) as the principal measure of system performance. Using the NIST scoring script md-eval-v21.pl we evaluate the DER of the concatenated .rttm files produced for all conversations in the development and test sets. As is traditional in speaker diarization of telephone speech, we ignore overlapping speech segments and we tolerate errors of less than 250 ms in locating segment boundaries.

In addition to DER, the Number of Detected Speakers (NDS) and its average calculated over all files (ANDS) are also useful performance evaluation metrics in the context of clustering with unknown numbers of speakers. We adopt a graphical illustration of DER vs. NDS to represent the behavior of the systems (Figs. 4 and 5). These graphs are obtained by sweeping out the bandwidth parameter $h$. On these graphs, the actual number of speakers is given by the vertical solid line and the estimated number is given by the dashed line.

VI. RESULTS AND DISCUSSIONS

In this section we provide a detailed study of the effect of the i-vector normalization methods described in Section IV.

A. Parameter tuning on the development set

[Fig. 4. Results on the development set obtained with PCA i-vector normalization: Full Mean Shift performances (DER / number of estimated speakers). The minimum of DER, the corresponding bandwidth (h) and the number of detected speakers (#Spk) are also given for each PCA reduction factor r = 80, 60, 50 and 30.]

[Fig. 5. Results on the development set obtained with PCA i-vector normalization: Selective Mean Shift performances (DER / number of estimated speakers). The minimum of DER, the corresponding bandwidth (h) and the number of detected speakers (#Spk) are also given for each PCA reduction factor r = 80, 60, 50 and 30.]

In order to establish a benchmark we first ran the two versions of Mean Shift with PCA normalization of i-vectors. Each graph in Figs. 4 and 5 corresponds to a percentage of retained eigenvalues (r = 80, 60, 50 and 30 respectively). In Figs. 4 and 5 we observe that although the results for the two strategies with r = 30 are slightly better than those with r = 50, the graphs are irregular in the former case, so that taking r = 50 as in [16] seems to be the better course. Note that the optimal DER for all configurations is reached with an overestimation of the number of speakers. Fortunately overestimation is preferable to underestimation, as it can be remedied by pruning sparsely populated clusters.

Impact of length normalization

We began by testing the effect of length normalization of raw i-vectors before applying PCA. Surprisingly, this simple operation improves the DER in absolute terms (row 3, Len. n., in Table 1). With length normalization and r = 50, DER decreases from 11.9 (see Fig. 4) to 10 (Full strategy) and from 1. (see Fig. 5) to 10. (Selective strategy). Furthermore, the number of detected speakers (NDS) in the case of the Selective strategy decreases from 33 to 81, thus approaching the actual value of 103. However, in the case of the Full strategy the detected NDS increases from 177 to 316.

TABLE 1. Results on the development set illustrating the effect of different normalization methods. DER is the diarization error rate, NDS the number of detected speakers, h the bandwidth and p the pruning parameter. The actual number of speakers is 103.
Columns: Norm method; then DER, NDS, h, p for Full MS; then DER, NDS, h, p for Selective MS.
Rows: Len. n.; WCC; BCC; Var. h; Prun.

Impact of within class covariance normalization

In this experiment we first normalize i-vectors using the Cholesky decomposition of the inverse of the WCC matrix, and follow this with length normalization and PCA projection. As we see in row 4 of Table 1, WCC normalization causes performance degradation. The DER increases from 10 to 11.7 in the Full case and from 10. to 11.7 in the Selective case compared to both previous normalization methods, namely PCA and length normalization.
These results were not in line with our expectations derived from our experience in speaker recognition; they may be due to an interaction between the PCA and WCC normalizations.

Impact of between class covariance normalization

We proceeded in a similar way to WCC normalization. We project data using the Cholesky decomposition of the BCC matrix followed by length normalization and PCA projection. In row 5 of Table 1 we notice a remarkable twofold improvement compared with the length-normalized system. On the one hand, we obtain a DER decrease from 10 to 7.6 for the Full strategy and from 10. to 7.7 for the Selective case. On the other hand, we detect a number of speakers much nearer to the actual value of 103, particularly in the Selective case (189 speakers).

Conversation-dependent bandwidth Mean Shift

We applied the variable bandwidth scheme given in formula (16) to the previous BCC normalization system. In row 6 of Table 1, we observe a slight improvement in DER for both strategies.

Cluster pruning

Although we succeed in reducing the DER from ~1 to ~7 for both strategies, the estimated number of speakers corresponding to the minimum of DER is still higher than the actual value. As discussed in Section III-F, we prune clusters containing a small number of samples (less than or equal to a constant p) in order to counter this tendency. The corresponding results appear in the last row of Table 1. We observe that for the Full strategy, merging clusters having one instance (p = 1) reduces the estimated number of speakers from 300 to 109 while the DER slightly increases from 7.5 to 8.3. For the Selective strategy, with p = 3 we get a nice improvement regarding the Number of Detected Speakers (111 speakers instead of 3) while the DER is essentially unaffected, decreasing from 7.6 to 7.5.

B. Results on the test set

As we explained when discussing the evaluation protocol (Section V-E), we now present the results obtained on the test set by using parameters (the bandwidth and the pruning factor p) tuned on the development set. Table 2 presents the most important results. The term Fix. h in row 3 of Table 2 refers to the best system using fixed bandwidth, presented in row 5 of Table 1 (BCC). In this system we used BCC normalization followed by length normalization and PCA projection with r = 50, as optimized on the development set. In row 4 of Table 2 (Var. h), the system is exactly the same as the previous one (the Fix. h system) but with a variable bandwidth. Finally, the last row of Table 2 (Prun.) shows the impact of cluster pruning on the variable bandwidth system (Var. h).

TABLE 2. Results on the test data set using optimal parameters estimated on the development set. The total actual number of speakers is 183.
Columns: Norm method; then DER, NDS, h, p for Full MS; then DER, NDS, h, p for Selective MS.
Rows: Fix. h; Var. h; Prun.

From the results in Table 2 we observe the usefulness of the variable bandwidth in reducing the DER from 14.3 to 1.7 in the Full MS case and from 13.9 to 1.6 in the Selective case. Observe also that the number of detected speakers (NDS) is reduced from 3456 to 550 in the Full MS strategy and from 3089 to 310 in the Selective case. Finally, in the test set, cluster pruning leads to a degradation of the DER from 1.6 to 14.3 using the Selective strategy, in contrast to what is observed on the development set, where the DER showed a slight improvement. However, cluster pruning on the test set for the Full case is surprisingly helpful, to the point that the DER (1.4) coincides perfectly with the one optimized on the development set (see the bandwidth h in row 7 of Table 1). Given that the above results are obtained using the parameters tuned on an independent dataset, this confirms the generalization capability of the cosine-based Mean Shift for both clustering strategies.

Among the publications reporting results on the CallHome dataset [17][18][19][][1], only Vaquero's thesis presents results based on the total DER calculated over all files [1]. He uses speaker factors rather than i-vectors to represent speech segments and he used a multistage system based principally on Hierarchical Agglomerative Clustering (HAC), k-means and Viterbi segmentation. He also estimated tunable parameters on an independent development set consisting solely of two-speaker recordings. However, he was constrained to provide the actual number of speakers as a stopping criterion for HAC in order to achieve a total DER of 13.7 on the CallHome test set. Without this constraint, the performance was worse. Compared to his results, we were able to achieve a 37% relative improvement in the total DER (see Table 2).

In summary, we presented some results on development and test sets from which we can draw the following conclusions:
- Length normalization of the raw i-vectors before PCA projection helps in reducing DER.
- PCA with r = 50 offers the best configuration.
- WCC normalization degrades performance.
- BCC normalization, followed by length normalization and PCA, helps to decrease both DER and NDS.
- Variable bandwidth combined with cluster pruning (p = 1), applied after length normalization, PCA projection and BCC normalization, helps in reducing DER and NDS in the Full case.
- Both strategies, namely Full and Selective, perform equivalently well on development and test sets.

In Fig. 6 we depict the best i-vector normalization protocol that we adopt in this study.

[Fig. 6. The best protocol of i-vector normalization for the MS diarization systems. Blocks: raw i-vectors; BCC normalization; length normalization; PCA projection; diarization system.]

C. Results broken down by the number of speakers

In order to compare our results with those of [17][18][19][] we need to adopt the same convention for presenting diarization results. As mentioned in Section V-E, which describes the evaluation protocol, these works present results broken down by the number of participating speakers. Indeed, the official development set consisted of conversations with just 2 to 4 speakers, so it is hard to avoid tuning on the test set if one wishes to optimize performance on conversations with large numbers of speakers.

TABLE 3. Full Mean Shift results on the test set depicted as a function of the number of participating speakers.
Columns: number of speakers; h / p.
Rows, under development-set parameters (Dev. param.) and again under test-set parameters (Test param.): DER and ANDS for Fix. h; DER and ANDS for Var. h. The pruning factor is p = 0 in every row.

In Tables 3 and 4 we present results broken down by the number of speakers on the test set for the Full and Selective Mean Shift algorithms, with two tunings, one on the development set (rows 2-5) and the other on the test set (rows 6-9). Recall that the tunable parameters are the nature of the bandwidth (i.e. fixed or variable), its value (i.e. h) and the pruning factor p. It is apparent from the tables that all of the Mean Shift implementations generalize well from the development set to the test set.

From Table 3 we observe firstly that the Full MS implementation does not need any final cluster pruning (i.e. p = 0) when we optimize taking account of the number of participating speakers (see the last column of Table 3). Second, estimating the number of speakers works better with a fixed bandwidth (see rows 3 and 7 of Table 3) and the DERs are almost comparable to those obtained with a conversation-dependent bandwidth (i.e. Var. h).
Generally speaking, variable bandwidth helps in reducing DER for recordings having a small number of speakers (2, 3 or 4 speakers). Finally, the most important observation from Table 3 is the high generalization capability of the Full MS, especially in the fixed bandwidth case. Comparing rows 2 and 3 with rows 6 and 7 of Table 3, we see that the optimal parameters for the test set are the same as those for the development set.

TABLE 4
SELECTIVE MEAN SHIFT RESULTS ON TEST DATA SET DEPICTED AS A FUNCTION OF THE NUMBER OF PARTICIPATING SPEAKERS.
Row 1: number of speakers. Rows 2-5: development-set parameters (Fix. h: DER, ANDS; Var. h: DER, ANDS). Rows 6-9: test-set parameters (Fix. h: DER, ANDS; Var. h: DER, ANDS). Last column: h / p, with p = 3 in every row.

From the results depicted in Table 4 we observe that the final cluster pruning is necessary in the Selective MS case. Compared to the Full MS results in Table 3, the DERs are similar, but the Selective strategy outperforms the Full one regarding the average number of detected speakers (ANDS). The combination of the variable bandwidth with the final cluster pruning (p = 3) enables us to get the best results, both for DER and for ANDS (see rows 4 and 5 and rows 8 and 9 in Table 4). The ANDS values are in fact very close to the actual numbers (row 6 vs. row 1) with a slight overestimation, except in the case of the 6-speaker files, where there is a slight underestimation (5.8). Finally, we observe that the Full strategy generalizes better than the Selective one, in the sense that we were able to reach the best performance on the test set using the parameters tuned on the development set.

Viterbi re-segmentation

Refining segment boundaries between speaker turns using Viterbi re-segmentation is a standard procedure for improving diarization system performance. Results reported in Table 5 show its effectiveness when combined with the Mean Shift algorithms. Note that the results without Viterbi re-segmentation (gray entries in Table 5) are the results presented in the 6th and 8th rows of Tables 3 and 4.

TABLE 5
IMPACT OF VITERBI RE-SEGMENTATION ON THE TEST-SET RESULTS USING PARAMETERS ESTIMATED ON TEST DATA, DEPICTED AS A FUNCTION OF THE NUMBER OF PARTICIPATING SPEAKERS AND MEASURED WITH DER.
Rows 1-4: Full MS (Fix. h without and with Viterbi; Var. h without and with Viterbi). Rows 5-8: Selective MS (Fix. h without and with Viterbi; Var. h without and with Viterbi).

D. Time complexity

Time complexity is not a major concern in this study, but Fig. 8 illustrates the difference between the Full and the Selective strategies in this regard. The average time for the Full case is [...] seconds per file vs. [...] seconds for the Selective case.

Fig. 8 Time complexities of the Full and Selective strategies calculated in seconds on each conversation of the development set. The horizontal lines indicate the processing times averaged over all files. (Axes: time in seconds vs. recordings of the CallHome development set.)

Comparison with existing state-of-the-art results

We conclude this section with a comparison between our results and those obtained by other authors on the CallHome data, although there are several factors which make back-to-back comparisons difficult. Contrary to [21] and our work, the authors of [17][19][20] did not use a development set independent of the test set for parameter tuning. Furthermore, in [19] and [20] the authors assumed prior hypotheses about the maximum number of speakers within a slice of speech, and in [19] estimating the number of speakers was done separately from speaker clustering.

We compare graphically in Fig. 7 the results, as measured by DER, of our best configurations with Viterbi re-segmentation for the Full and Selective strategies (i.e. the Full and Selective systems presented in the 4th and 8th rows of Table 5, respectively) with those in [19][20][17]. It is evident that our results as measured by DER are in line with the state of the art. To be clear, since our results were taken from Tables 3 and 4, there was some tuning on the test set, as in Dalmasso et al. [19], Castaldo et al. [20] and Shum et al. [17].

Fig. 7 Comparison of Full and Selective Mean Shift clustering algorithms with state-of-the-art results based on DER for each category of CallHome test set recordings having the same number of speakers.

Furthermore, a comparison based on the average number of detected speakers is not possible except in the case of Dalmasso et al. [19]. In Table 6 we compare our best results from Table 4 using this criterion with those of [19]. The results are similar, although the Mean Shift algorithm tends to overestimate the number of speakers.

TABLE 6
COMPARISON WITH DALMASSO RESULTS BASED ON THE AVERAGE NUMBER OF DETECTED SPEAKERS (ANDS).
Rows: actual number of speakers; Dalmasso et al. [19]; Selective MS.
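The two summary figures used throughout this section, the average number of detected speakers per speaker-count category (ANDS) and the relative improvement in DER, are simple to compute. The following is a minimal sketch and not the authors' evaluation code: the function names and the toy values are hypothetical and chosen only for illustration, and the relative improvement is assumed to follow the usual definition (baseline minus new, divided by baseline).

```python
from collections import defaultdict


def ands_by_speaker_count(recordings):
    """Average number of detected speakers, grouped by actual speaker count.

    `recordings` is an iterable of (actual_speakers, detected_speakers) pairs,
    one pair per recording.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for actual, detected in recordings:
        sums[actual] += detected
        counts[actual] += 1
    return {n: sums[n] / counts[n] for n in sorted(sums)}


def relative_improvement(baseline_der, new_der):
    """Relative DER improvement in percent (assumed standard definition)."""
    return 100.0 * (baseline_der - new_der) / baseline_der


# Toy values, made up for illustration only (not results from the paper).
toy = [(2, 2), (2, 3), (3, 3), (3, 4), (4, 4)]
print(ands_by_speaker_count(toy))        # {2: 2.5, 3: 3.5, 4: 4.0}
print(relative_improvement(20.0, 15.0))  # 25.0
```

An ANDS value above the category's actual speaker count indicates overestimation of the number of speakers, and a value below it indicates underestimation.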

VII. CONCLUSION

This paper provides a detailed study of the application of the non-parametric Mean Shift algorithm to the problem of speaker clustering in diarizing telephone speech conversations, using two variants of the basic clustering algorithm (the Full and Selective versions). We have supplied in the Appendix a convergence proof which justifies our extension of the Mean Shift algorithm from the Euclidean distance metric to the cosine distance metric. We have shown how, together with an i-vector representation of speaker turns, this simple approach to the speaker clustering problem can handle several difficult problems: short speaker turns, varying numbers of speakers and varying conversation durations. With a single-pass clustering strategy (that is, without Viterbi re-segmentation) we were able to achieve a 37% relative improvement as measured by global diarization error rate on the CallHome data, using as a benchmark [21], the only other study that evaluates performance in this way. We have seen how our results using other metrics are similar to the state of the art as reported by other authors [16][17][19][20].

We have seen that refining speaker boundaries with Viterbi re-segmentation is also helpful. Using segment boundaries obtained in this way could serve as a good initialization for a second pass of Mean Shift clustering. An interesting complication that would arise in exploring this avenue is that speaker turns would be of much more variable duration than in the first pass, which is based on the uniform segmentation described in Section V.D. Since the uncertainty entailed in estimating an i-vector is greater in the case of short speaker turns than in the case of long speaker turns, this suggests that taking account of this uncertainty as in [32] would be helpful.

APPENDIX

In this appendix we present the mathematical convergence proof of the cosine distance-based Mean Shift. This proof is very similar to that of Theorem 1 presented in [8].

Theorem 1 [8]: If the kernel k has a convex and monotonically decreasing profile, the sequence $\{\hat f(i)\}_{i=1,2,\ldots}$ converges and is monotonically increasing.
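The monotonicity claimed in Theorem 1 can also be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the exponential profile k(t) = exp(-t), which is convex and monotonically decreasing (so g(t) = -k'(t) = exp(-t)), a density estimate proportional to the sum of k((1 - y·x_j)/h) over the data, and an update that projects the weighted mean back onto the unit sphere. The function name, the bandwidth value and the synthetic data are hypothetical.

```python
import numpy as np


def cosine_mean_shift(X, y0, h, n_iter=100, tol=1e-12):
    """Run one cosine Mean Shift trajectory starting from y0.

    Returns the final position and the trace of density estimates
    (up to the constant c) evaluated at each visited position.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # data on the unit sphere
    y = y0 / np.linalg.norm(y0)
    trace = []
    for _ in range(n_iter):
        t = (1.0 - X @ y) / h           # cosine distances scaled by the bandwidth
        w = np.exp(-t)                  # assumed profile: k(t) = g(t) = exp(-t)
        trace.append(w.sum())           # density estimate at the current position
        m = w @ X                       # weighted sum of the data points
        y_new = m / np.linalg.norm(m)   # weighted mean projected onto the sphere
        shift = 1.0 - y_new @ y         # cosine distance between successive positions
        y = y_new
        if shift < tol:
            break
    return y, np.array(trace)


# Synthetic data around two directions (illustrative only).
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
X = np.vstack([c + 0.1 * rng.normal(size=(30, 3)) for c in centers])

mode, trace = cosine_mean_shift(X, X[0], h=0.1)
# The density trace should never decrease, up to floating-point rounding.
print(np.all(np.diff(trace) >= -1e-9))
```

With this update the increase in the density estimate at each step is bounded below by a non-negative multiple of 1 - y_{i+1}·y_i, which is the quantity the appendix shows to be non-negative.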
Let us suppose that all vectors in our dataset are constrained to lie on the unit sphere by normalizing their Euclidean norm during the MS convergence process. Then

\[
\hat f(i+1) - \hat f(i) = c \sum_{j=1}^{n} \left[ k\!\left(\frac{1 - y_{i+1} \cdot x_j}{h}\right) - k\!\left(\frac{1 - y_i \cdot x_j}{h}\right) \right].
\]

Due to the convexity of the profile,

\[
k(x_2) \ge k(x_1) + k'(x_1)\,(x_2 - x_1),
\]

and since $g(x) = -k'(x)$ from (6),

\[
k(x_2) - k(x_1) \ge g(x_1)\,(x_1 - x_2).
\]

We obtain

\[
\hat f(i+1) - \hat f(i) \ge c \sum_{j=1}^{n} g\!\left(\frac{1 - y_i \cdot x_j}{h}\right) \left[ \frac{1 - y_i \cdot x_j}{h} - \frac{1 - y_{i+1} \cdot x_j}{h} \right]
= \frac{c}{h} \sum_{j=1}^{n} g\!\left(\frac{1 - y_i \cdot x_j}{h}\right) (y_{i+1} - y_i)^{\top} x_j .
\]

We know from (8) and (11) that the (i+1)th position $y_{i+1}$ is equal to the weighted mean vector, so

\[
y_{i+1} \sum_{j=1}^{n} g\!\left(\frac{1 - y_i \cdot x_j}{h}\right) = \sum_{j=1}^{n} g\!\left(\frac{1 - y_i \cdot x_j}{h}\right) x_j .
\]

Thus

\[
\hat f(i+1) - \hat f(i) \ge \frac{c}{h} \sum_{j=1}^{n} g\!\left(\frac{1 - y_i \cdot x_j}{h}\right) \left( y_{i+1}^{\top} y_{i+1} - y_i^{\top} y_{i+1} \right)
= \frac{c}{h} \sum_{j=1}^{n} g\!\left(\frac{1 - y_i \cdot x_j}{h}\right) \left( 1 - y_{i+1} \cdot y_i \right) \ge 0,
\]

with equality iff $y_{i+1} = y_i$. The sequence $\{\hat f(i)\}_{i=1,2,\ldots}$ is bounded and monotonically increasing, and so is convergent. This argument does not show that $\{y_i\}_{i=1,2,\ldots}$ is convergent (it may be possible to construct pathological examples in which $\{\hat f(i)\}_{i=1,2,\ldots}$ converges but $\{y_i\}_{i=1,2,\ldots}$ does not), but it establishes convergence of the Mean Shift algorithm in the same sense as convergence of the EM algorithm is demonstrated in [33].

ACKNOWLEDGMENT

First we would like to thank the editor as well as the reviewers for their helpful comments. We would also like to thank Stephen Shum and Najim Dehak from MIT for their useful discussions and feedback, and for sharing their initial segmentation with us. Finally, we would like to thank our colleague Vishwa Gupta for his help with the Viterbi re-segmentation software and our colleague Pierre Ouellet for his help with other software.

REFERENCES

[1] G. Schwarz, "Estimating the dimension of a model," Ann. Statist., vol. 6, 1978.
[2] S. S. Chen and P. Gopalakrishnan, "Clustering via the Bayesian information criterion with applications in speech recognition," in ICASSP'98, vol. 2, Seattle, USA, 1998.
[3] F. Valente, "Variational Bayesian methods for audio indexing," Ph.D. dissertation, Eurecom, Sep. 2005.
[4] P. Kenny, D. Reynolds and F. Castaldo, "Diarization of Telephone Conversations using Factor Analysis," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 6, Dec. 2010.
[5] M. Kotti, V. Moschou, and C. Kotropoulos, "Speaker segmentation and clustering," Signal Processing, vol. 88, no. 5, May 2008.
[6] K. Fukunaga and L. Hostetler, "The estimation of the gradient of a density function, with applications in pattern recognition," IEEE Trans. on Information Theory, vol. 21, no. 1, pp. 32-40, January 1975.

[7] Y. Cheng, "Mean Shift, Mode Seeking, and Clustering," IEEE Trans. PAMI, vol. 17, no. 8, 1995.
[8] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 5, May 2002.
[9] T. Stafylakis, V. Katsouros, and G. Carayannis, "Speaker clustering via the mean shift algorithm," in Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 2010.
[10] T. Stafylakis, V. Katsouros, P. Kenny, and P. Dumouchel, "Mean Shift Algorithm for Exponential Families with Applications to Speaker Clustering," Proc. Odyssey Speaker and Language Recognition Workshop, Singapore, June 2012.
[11] M. Senoussaoui, P. Kenny, P. Dumouchel and T. Stafylakis, "Efficient Iterative Mean Shift based Cosine Dissimilarity for Multi-Recording Speaker Clustering," in Proceedings of ICASSP, 2013.
[12] N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, May 2011.
[13] N. Dehak, R. Dehak, J. Glass, D. Reynolds, and P. Kenny, "Cosine Similarity Scoring without Score Normalization Techniques," Proc. IEEE Odyssey Workshop, Brno, Czech Republic, June 2010.
[14] N. Dehak, Z. Karam, D. Reynolds, R. Dehak, W. Campbell, and J. Glass, "A Channel-Blind System for Speaker Verification," Proc. ICASSP, Prague, Czech Republic, May 2011.
[15] M. Senoussaoui, P. Kenny, N. Dehak and P. Dumouchel, "An i-vector Extractor Suitable for Speaker Recognition with both Microphone and Telephone Speech," in Proc. Odyssey Speaker and Language Recognition Workshop, Brno, Czech Republic, June 2010.
[16] S. Shum, N. Dehak, E. Chuangsuwanich, D. Reynolds, and J. Glass, "Exploiting Intra-Conversation Variability for Speaker Diarization," Proc. Interspeech, Florence, Italy, August 2011.
[17] S. Shum, N. Dehak, and J. Glass, "On the Use of Spectral and Iterative Methods for Speaker Diarization," Proc. Interspeech, Portland, Oregon, September 2012.
[18] A. Martin and M. Przybocki, "Speaker recognition in a multi-speaker environment," in Proceedings of Eurospeech, 2001.
[19] E. Dalmasso, P. Laface, D. Colibro, and C. Vair, "Unsupervised Segmentation and Verification of Multi-Speaker Conversational Speech," Proc. Interspeech, 2005.
[20] F. Castaldo, D. Colibro, E. Dalmasso, P. Laface, and C. Vair, "Stream-based speaker segmentation using speaker factors and eigenvoices," in Proceedings of ICASSP, 2008.
[21] C. Vaquero Avilés-Casco, "Robust Diarization For Speaker Characterization (Diarizacion Robusta Para Caracterizacion De Locutores)," Ph.D. dissertation, Zaragoza University, 2011.
[22] N. Dehak, P. Torres-Carrasquillo, D. Reynolds, and R. Dehak, "Language Recognition via Ivectors and Dimensionality Reduction," Proc. Interspeech, Florence, Italy, August 2011.
[23] D. Martinez, O. Plchot, L. Burget, O. Glembek and P. Matejka, "Language Recognition in iVectors Space," Proceedings of Interspeech, Florence, Italy, August 2011.
[24] H. Tang, S. M. Chu, M. Hasegawa-Johnson and T. S. Huang, "Partially Supervised Speaker Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 959-971, May 2012.
[25] P. Kenny, "Joint factor analysis of speaker and session variability: theory and algorithms," Technical report CRIM-06/08-14, 2006.
[26] P. Kenny, G. Boulianne, and P. Dumouchel, "Eigenvoice modeling with sparse training data," IEEE Transactions on Speech and Audio Processing, May 2005.
[27] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, 2003.
[28] B. Georgescu, I. Shimshoni, and P. Meer, "Mean shift based clustering in high dimensions: A texture classification example," in Proceedings of International Conference on Computer Vision.
[29] U. Ozertem, D. Erdogmus, and R. Jenssen, "Mean shift spectral clustering," Pattern Recognition, vol. 41, no. 6, June 2008.
[30] D. Garcia-Romero, "Analysis of i-vector length normalization in Gaussian-PLDA speaker recognition systems," in Proceedings of Interspeech, Florence, Italy, Aug. 2011.
[31] D. Comaniciu, V. Ramesh, and P. Meer, "The Variable Bandwidth Mean Shift and Data-Driven Scale Selection," Proc. Eighth Int'l Conf. Computer Vision, vol. I, July 2001.
[32] P. Kenny, T. Stafylakis, P. Ouellet, J. Alam, and P. Dumouchel, "PLDA for Speaker Verification with Utterances of Arbitrary Duration," in Proceedings of ICASSP, Vancouver, Canada, May 2013.
[33] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1-38, 1977.

M. Senoussaoui received the Engineer degree in Artificial Intelligence in 2005 and the Magister (Masters) degree in 2007 from Université des Sciences et de la Technologie d'Oran, Algeria. He is currently a PhD student at the École de technologie supérieure (ÉTS) of Université du Québec, Canada, and is also with the Centre de recherche informatique de Montréal (CRIM), Canada. His research interests are concentrated on the application of Pattern Recognition and Machine Learning methods to the speaker verification and diarization problems.

P. Kenny received the BA degree in Mathematics from Trinity College, Dublin and the MSc and PhD degrees, also in Mathematics, from McGill University. He was a professor of Electrical Engineering at INRS-Télécommunications in Montreal from 1990 to 1995, when he started up a company (Spoken Word Technologies) to spin off INRS's speech recognition technology. He joined CRIM in 1998, where he now holds the position of principal research scientist. His current research interests are in text-dependent and text-independent speaker recognition, with particular emphasis on Bayesian methods such as Joint Factor Analysis and Probabilistic Linear Discriminant Analysis.

T. Stafylakis received the Diploma degree in electrical and computer engineering from the National Technical University of Athens (NTUA), Athens, Greece, and the M.Sc. degree in communication and signal processing from Imperial College London, London, U.K., in 2004 and 2005, respectively. He received his Ph.D. from NTUA on speaker diarization, while working for the Institute for Language and Speech Processing, Athens, as a research assistant. Since 2011, he has been a post-doc researcher at CRIM and ETS, under the supervision of Patrick Kenny and Pierre Dumouchel, respectively. His current interests are speaker recognition and diarization, Bayesian modeling and multimedia signal analysis.

P. Dumouchel received the B.Eng. (McGill University), M.Sc. (INRS-Télécommunications) and PhD (INRS-Télécommunications) degrees, and has over 5 years of experience in the field of speech recognition, speaker recognition and emotion detection. Pierre is Chairman and Professor at the Software Engineering and IT Department at École de technologie supérieure (ETS) of Université du Québec, Canada.


More information

A Probabilistic Theory of Coherence

A Probabilistic Theory of Coherence A Probablstc Theory of Coherence BRANDEN FITELSON. The Coherence Measure C Let E be a set of n propostons E,..., E n. We seek a probablstc measure C(E) of the degree of coherence of E. Intutvely, we want

More information

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement An Enhanced Super-Resoluton System wth Improved Image Regstraton, Automatc Image Selecton, and Image Enhancement Yu-Chuan Kuo ( ), Chen-Yu Chen ( ), and Chou-Shann Fuh ( ) Department of Computer Scence

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

1 Approximation Algorithms

1 Approximation Algorithms CME 305: Dscrete Mathematcs and Algorthms 1 Approxmaton Algorthms In lght of the apparent ntractablty of the problems we beleve not to le n P, t makes sense to pursue deas other than complete solutons

More information

Georey E. Hinton. University oftoronto. Email: zoubin@cs.toronto.edu. Technical Report CRG-TR-96-1. May 21, 1996 (revised Feb 27, 1997) Abstract

Georey E. Hinton. University oftoronto. Email: zoubin@cs.toronto.edu. Technical Report CRG-TR-96-1. May 21, 1996 (revised Feb 27, 1997) Abstract The EM Algorthm for Mxtures of Factor Analyzers Zoubn Ghahraman Georey E. Hnton Department of Computer Scence Unversty oftoronto 6 Kng's College Road Toronto, Canada M5S A4 Emal: zoubn@cs.toronto.edu Techncal

More information

IMPACT ANALYSIS OF A CELLULAR PHONE

IMPACT ANALYSIS OF A CELLULAR PHONE 4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng

More information

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION

More information
