Speeding up k-means Clustering by Bootstrap Averaging


Ian Davidson and Ashwin Satyanarayana
Computer Science Dept, SUNY Albany, NY, USA

Abstract

K-means clustering is one of the most popular clustering algorithms used in data mining. However, clustering is a time consuming task, particularly with the large data sets found in data mining. In this paper we show how bootstrap averaging with k-means can produce results comparable to clustering all of the data but in much less time. The approach of bootstrap (sampling with replacement) averaging consists of running k-means clustering to convergence on small bootstrap samples of the training data and averaging similar cluster centroids to obtain a single model. We show why our approach should take less computation time and empirically illustrate its benefits. We show that the performance of our approach is a monotonic function of the size of the bootstrap sample. However, knowing the size of the bootstrap sample that yields as good results as clustering the entire data set remains an open and important question.

1. Introduction and Motivation

Clustering is a popular data mining task [1] with k-means clustering being a common algorithm. However, since the algorithm is known to converge to local optima of its loss/objective function and is sensitive to initial starting positions [8], it is typically restarted from many initial starting positions. This results in a very time consuming process, and many techniques are available to speed up the k-means clustering algorithm including preprocessing the data [2], parallelization [3] and intelligently setting the initial cluster positions [8]. In this paper we propose an alternative approach to speeding up k-means clustering known as bootstrap averaging. This approach is complementary to other speed-up techniques such as parallelization. Our approach builds multiple models by creating small bootstrap samples of the training set and building a model from each, but rather than aggregating like bagging [4], we average similar cluster centers to produce a single model that contains k clusters. In this paper we shall focus on bootstrap samples that are smaller than the training data size.
This produces results that are comparable with multiple random restarts of k-means clustering using all of the training data, but takes far less computation time. For example, when we take T bootstrap samples of size 25% of the training data set, the technique takes at least four times less computation time but yields results as good as if we had randomly restarted k-means T times using all of the training data. To test the effectiveness of bootstrap averaging, we apply clustering in two popular settings: finding representative clusters of the population and prediction.

Our approach yields a speedup for two reasons. Firstly, we are clustering less data, and secondly, the k-means algorithm converges (using standard tests) more quickly for smaller data sets than larger data sets from the same source/population. It is important to note that we do not need to re-start our algorithm many times for each bootstrap sample. Our approach is superficially similar to Bradley and Fayyad's initial point refinement (IPR) [8] approach that: 1) sub-samples the training data, 2) clusters each sub-sample, 3) clusters the resultant cluster centers many times to generate refined initial starting positions for k-means. However, we shall show there are key differences to their clever alternative to randomly choosing starting positions.

We begin this paper by introducing the k-means algorithm and explore its computational behavior. In particular we show and empirically demonstrate why clustering smaller sets of data leads to faster convergence than clustering larger sets of data from the same data source/population. We then introduce our bootstrap averaging algorithm, after which we discuss our experimental methodology and results. We show that for bootstrap samples smaller than the original training data set our approach performs as well as standard techniques but in far less time. We then discuss the related Bradley and Fayyad technique IPR and discuss differences to our own work. Finally, we conclude and discuss future work.

2. Background to k-means Clustering

Consider a set of data containing n instances/observations each described by m attributes.

The k-means clustering problem is to divide the n instances into k clusters with the clusters partitioning the instances ($x_1 \ldots x_n$) into the subsets $Q_1 \ldots Q_k$. The subsets can be summarized as points ($C_1 \ldots C_k$) in the m dimensional space, commonly known as centroids or cluster centers, whose co-ordinates are the average of all points belonging to the subset. We shall refer to this collection of centroids obtained from an application of the clustering algorithm as a model. K-means clustering can also be thought of as vector quantization with the aim being to minimize the vector quantization error (also known as the distortion) shown in equation (1). The mathematically trivial solution is to have a cluster for each instance but typically k << n.

$$VQ = \sum_{i=1}^{n} D\left(x_i, C_{f(x_i)}\right) \qquad (1)$$

where D is a distance function and f(x) returns the closest cluster index to instance x.

What is known today as the k-means clustering algorithm was postulated in a number of papers in the nineteen sixties [5][6]. The algorithm is extremely popular, appearing in leading commercial data mining suites offered by SAS, SPSS, SGI, ANGOSS [7]. Typically the initial centroid locations are determined by assigning instances to a randomly chosen cluster, though alternatives exist [8]. After initial cluster centroid placement the algorithm consists of the two following steps that are repeated until convergence. As the solution converged to is sensitive to the starting position, the algorithm is typically restarted many times.

1) The assignment step: instances are placed in the closest cluster as defined by the distance function.

$$f(x_i) = \arg\min_{j} D(x_i, C_j)$$

2) The re-estimation step: the cluster centroids are recalculated from the instances assigned to the cluster.

$$C_j = \frac{1}{|Q_j|} \sum_{i:\, f(x_i) = j} x_i$$

These two steps repeat until the re-estimation step leads to minimal changes in centroid values.

Throughout this paper we use the version of the k-means clustering algorithm commonly found in data mining applications. We use the Euclidean distance, randomly assign instances to clusters to obtain initial cluster locations and normalize each attribute's value to be between 0 and 1 by dividing by the attribute's range. The k-means algorithm performs gradient descent on the distortion error surface and hence converges to a local optimum of its loss function (the distortion).
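The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`normalize`, `re_estimate`, `kmeans`) are hypothetical, the normalization subtracts the attribute minimum before dividing by the range, an empty cluster simply keeps its previous centroid, and the tolerance `eps` defaults to 1e-5, the threshold this paper uses for its convergence test.

```python
import random


def normalize(data):
    """Rescale each attribute to [0, 1] using the attribute's range."""
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [[(v - lo) / sp if sp > 0 else 0.0
             for v, lo, sp in zip(row, lows, spans)] for row in data]


def sq_dist(x, c):
    """Squared Euclidean distance (same arg min as the Euclidean distance)."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def re_estimate(data, labels, fallback):
    """Re-estimation step: each centroid becomes the mean of its instances."""
    k, m = len(fallback), len(data[0])
    sums = [[0.0] * m for _ in range(k)]
    counts = [0] * k
    for x, j in zip(data, labels):
        counts[j] += 1
        for a in range(m):
            sums[j][a] += x[a]
    # Assumption: an empty cluster keeps its fallback centroid.
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(fallback[j])
            for j in range(k)]


def kmeans(data, k, eps=1e-5, max_iter=1000, rng=None):
    """Return (centroids, labels, iterations) for one k-means run."""
    rng = rng or random.Random()
    labels = [rng.randrange(k) for _ in data]      # random initial assignment
    seeds = [rng.choice(data) for _ in range(k)]   # used only for empty clusters
    centroids = re_estimate(data, labels, seeds)
    for iteration in range(1, max_iter + 1):
        # Assignment step: closest centroid for every instance.
        labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j]))
                  for x in data]
        new = re_estimate(data, labels, centroids)
        # Convergence test: centroid change summed over clusters and attributes.
        change = sum(abs(a - b) for c, d in zip(new, centroids)
                     for a, b in zip(c, d))
        centroids = new
        if change < eps:
            break
    return centroids, labels, iteration
```

Returning the iteration count alongside the centroids makes it easy to study how the number of iterations until convergence varies with the amount of data.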
Convergence can be measured in a number of ways, such as requiring that the sum of the changes in the cluster centroids between adjacent iterations does not exceed some very small number epsilon (in this paper $10^{-5}$).

3. Computational Complexity of k-means Clustering

As before, let n be the number of instances to cluster, m be the number of attributes/columns for each instance and k the number of clusters. Then it is well known that if the algorithm performs i iterations then the algorithm complexity is O(kmni) [9]. While k, m and n are known before the algorithm begins execution, as stated earlier the algorithm is typically run until convergence occurs, meaning that i is, at invocation, unknown. Of course i can be predetermined if the test for convergence is abandoned, but this is computationally inefficient. For the rest of this section we derive results showing that i (the number of iterations) is directly proportional to n (the number of instances) for a given data source and standard tests of convergence. That is, smaller data sets will converge in fewer iterations than larger data sets drawn from the same population.

Without loss of generality, we assume a univariate data set and Euclidean distance. We first state the distortion which k-means tries to minimize, take the derivative of this expression with respect to the cluster centroid value, set it to 0 and solve to derive the expression that minimizes the distortion. We find that this is, as expected, the cluster centroid update expression, as shown below. Here $f_t(x_i)$ is the cluster assigned to $x_i$ at iteration t, $Q_j^t$ is the set of instances assigned to cluster j at iteration t, and $\delta(a, b) = 1$ when a = b and zero otherwise.

$$VQ = \sum_{j=1}^{k} \sum_{i=1}^{n} \delta\left(f_t(x_i), j\right) \left(C_j - x_i\right)^2$$

Take the first order derivative, set to zero and solve:

$$\frac{\partial VQ}{\partial C_j} = \sum_{i:\, f_t(x_i) = j} 2 \left(C_j - x_i\right) = 0$$

$$C_j^{t} = \frac{1}{|Q_j^t|} \sum_{i:\, f_t(x_i) = j} x_i \qquad (2)$$
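The derivation above can be checked numerically for a single cluster. The following is a small sketch with hypothetical instance values; `distortion` and `minimizing_centroid` are illustrative names, not functions from the paper.

```python
def distortion(points, centroid):
    """Univariate distortion of one cluster: sum of squared differences."""
    return sum((centroid - x) ** 2 for x in points)


def minimizing_centroid(points):
    """Closed-form minimizer from setting the derivative to zero: the mean."""
    return sum(points) / len(points)


points = [1.0, 2.0, 6.0]   # hypothetical instances assigned to one cluster
best = minimizing_centroid(points)
print(best, distortion(points, best))   # prints 3.0 14.0
```

Moving the centroid away from the mean in either direction increases the distortion, which is what the zero-derivative condition states.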

We now derive an expression to calculate the change in a cluster centroid between time t-1 and t:

$$\Delta C_j = C_j^{t} - C_j^{t-1} = \frac{1}{|Q_j^t|} \sum_{i:\, f_t(x_i) = j} x_i \; - \; C_j^{t-1} = \frac{1}{|Q_j^t|} \sum_{i:\, f_t(x_i) = j} \left(x_i - C_j^{t-1}\right) \qquad (3)$$

as the summation over $(x_i - C_j^{t-1})$ occurs $|Q_j^t|$ times.

A similar analysis to our own has been performed while illustrating that k-means is performing Newton's gradient descent with a learning rate inversely related to the cluster size [10], as illustrated in equation (3).

If $f_t(x_i) = f_{t-1}(x_i)$ for every instance then $\Delta C_j = 0$. More generally, because $C_j^{t-1}$ is the mean of the instances assigned to cluster j at time t-1, the terms of equation (3) for instances that stay in cluster j cancel against those of the instances that left, and only instances whose assignment changed contribute. Therefore:

$$\Delta C_j = \frac{1}{|Q_j^t|} \left[ \sum_{i:\, f_t(x_i) \neq f_{t-1}(x_i)} \delta\left(f_t(x_i), j\right) \left(x_i - C_j^{t-1}\right) \; - \sum_{i:\, f_t(x_i) \neq f_{t-1}(x_i)} \delta\left(f_{t-1}(x_i), j\right) \left(x_i - C_j^{t-1}\right) \right] \qquad (4)$$

where $\delta(a, b) = 1$ when a = b, zero otherwise.

Furthermore, from equation (4) we see that the size of the cluster centroid change is dependent on the change in the number of instances assigned to a cluster in adjacent iterations, that is, the number of instances assigned to cluster j at time t but not at time t-1 plus those assigned at time t-1 but not at time t. For a given data source, the larger the data set the more likely the condition associated with the first summation in equation (4) will occur towards the end of the k-means run, as our experiments will illustrate later in this section. Whether improved tests of convergence that consider the data set size may remove this phenomenon remains an open question.

We now empirically illustrate our earlier claim that the number of iterations until convergence is related to data set size on a number of real world data sets as shown in Table 1, Table 2 and Table 3. These tables measure the average number of iterations until convergence occurs against increasing data set sizes. Convergence occurs if the change (between adjacent iterations) in the clusters' centroid positions, when summed across all attributes and clusters, is less than $10^{-5}$. Other convergence tests were applied, such as: the change in cluster centers is less than a percentage of the smallest distance between two clusters, that the change of any one centroid is below epsilon, or that no changes in cluster center locations occurred, that is, $Q^{t} = Q^{t-1}$, but our findings did not differ significantly. We report average results for 100 experiments (random restarts).
It is important to note that for a particular data set the 100 unique random starting positions are identical regardless of data set size. That is, we start the algorithm from the same 100 starting positions regardless of data set size. Each smaller data set is a subset of the larger data sets.

Table 1. The average number of iterations for k-means to converge for a variety of data sets (Digit, Pima, Image) at data set sizes of 10%, 25%, 50%, 75% and 100%. Note that k=2 and results are averaged over 100 experiments (random restarts).

Table 2. The average number of iterations for k-means to converge for a variety of data sets (Digit, Pima, Image) at data set sizes of 10%, 25%, 50%, 75% and 100%. Note that k=4 and results are averaged over 100 experiments (random restarts).

Table 3. The average number of iterations for k-means to converge for a variety of data sets (Digit, Pima, Image) at data set sizes of 10%, 25%, 50%, 75% and 100%. Note that k=6 and results are averaged over 100 experiments (random restarts).

Figure 1 diagrammatically illustrates our experiments, showing that for larger values of k the number of iterations monotonically increases as the data set size increases.
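The design of this experiment (nested subsets of one data set, identical starting positions for every subset size, iterations averaged over the restarts) can be sketched as follows. This is an illustrative harness only: the function names are hypothetical, the starting positions are drawn uniformly from the unit hypercube as an assumption, and any data passed in is the caller's own, so the sketch makes no claim about reproducing the reported numbers.

```python
import random


def sq_dist(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))


def kmeans_iterations(data, start, eps=1e-5, max_iter=500):
    """Run k-means from the given starting centroids; return iterations used."""
    centroids = [list(c) for c in start]
    k, m = len(centroids), len(data[0])
    for iteration in range(1, max_iter + 1):
        sums = [[0.0] * m for _ in range(k)]
        counts = [0] * k
        for x in data:
            j = min(range(k), key=lambda c: sq_dist(x, centroids[c]))
            counts[j] += 1
            for a in range(m):
                sums[j][a] += x[a]
        # Assumption: an empty cluster keeps its previous centroid.
        new = [[s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
               for j in range(k)]
        change = sum(abs(a - b) for c, d in zip(new, centroids)
                     for a, b in zip(c, d))
        centroids = new
        if change < eps:
            return iteration
    return max_iter


def average_iterations(data, k, fractions, restarts, seed=0):
    """Average iterations per subset size, reusing the same starting positions."""
    rng = random.Random(seed)
    m = len(data[0])
    # The same `restarts` starting positions are used for every subset size.
    starts = [[[rng.random() for _ in range(m)] for _ in range(k)]
              for _ in range(restarts)]
    result = {}
    for frac in fractions:
        subset = data[:max(k, int(len(data) * frac))]   # nested subsets
        total = sum(kmeans_iterations(subset, s) for s in starts)
        result[frac] = total / restarts
    return result
```

Calling `average_iterations(data, k=4, fractions=[0.1, 0.25, 0.5, 0.75, 1.0], restarts=100)` on a normalized data set yields one average per subset size, which is the quantity the tables above report.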

Figure 1. The average number of iterations for k-means to converge for a variety of data sets (Digit, Pima, Image) for k=2, k=4 and k=6. Each panel, titled "Bag Size Against Number of Iterations Until Convergence", plots the average number of iterations against the size of the bag (% of training data set). Results are averaged over 100 experiments (random restarts).

To illustrate the point that, for a given source of data, clustering smaller data sets leads to faster convergence, we map the trajectory the clustering algorithm takes through the instance space for different size data sets starting from the same starting position. To visualize these trajectories we will reduce the digit data set to two dimensions. This corresponds to the data set representing the starting position where the pen writing the digit was placed onto the tablet (an excellent predictor for the digit type). Some example trajectories are shown in Figure 2 and Figure 3. In the 100 trials the average number of iterations for the 500 instance data set is 4.33 and for 000 instances 9.. Over the trials, in only 6 trials was the number of iterations for the larger data set less than for the smaller data set. The figures illustrate that for both data sets the centroids quickly move to approximately the same location but the larger data set takes longer to converge. This is so because for larger data sets the condition associated with the first summation in equation (4) occurs more often towards the end of the k-means run (as seen in the right-hand figures, it is more crowded near the final cluster centroid positions).
This result is consistent with the results of Meek et al [12], which found that when calculating a learning curve for mixture models, allowing the algorithm to reach convergence was not required; the results obtained after a few iterations were sufficient to determine the shape of the learning curve.

Figure 2. The trajectory of four cluster centroids (Cluster 1 to Cluster 4) through the instance space. The top figure is for 500 instances (7 iterations), the bottom for 000 instances (3 iterations). Starting positions are circled.

Figure 3. The trajectory of four cluster centroids (Cluster 1 to Cluster 4) through the instance space. The top figure is for 500 instances (5 iterations), the bottom for 000 instances (8 iterations). Starting positions are circled.

4. Description of Our Approach

We now discuss our algorithm. Note only a single model is built from each bootstrap sample as k-means is only run once on each sample.

Algorithm: Bootstrap Averaging
Input: D: Training Data, T: Number of bags, K: Number of clusters
Output: A: The averaged centroids

    // Generate and cluster each bag
    For i = 1 to T
        X_i = BootStrap(D)
        C_i = k-means-cluster(X_i, K)
        // Note C_i is the set of K cluster centroids, C_i = {c_i1, c_i2, ..., c_iK}
    EndFor

    // Group similar clusters into bins with the bin averages stored in B_1 ... B_K;
    // their sizes are S_1 ... S_K
    For i = 1 to T
        For j = 1 to K
            Index = AssignToBin(c_ij)   // See section on signature based comparison
            B_Index += c_ij
            S_Index += 1
        EndFor
    EndFor

    For i = 1 to K
        B_i /= S_i
        A_i = B_i
    EndFor

Our approach involves averaging similar cluster centroids. We propose the following general-purpose method that is used throughout the paper; however, in practice problem specific approaches may be better.

4.1 Signature Based Cluster Grouping

For each cluster centroid we create a signature that can be generated quickly and group clusters according to the signature. We use the positions of the attributes and their values to create a signature of the form:

$$\mathrm{Signature}(c_{ij}) = \sum_{l} c_{ijl} \cdot 2^{l}$$

where $c_{ijl}$ is the l-th attribute for the j-th cluster of the i-th model. As each attribute is scaled to be between zero and one this creates a signature with the range 0 till $2^{m+1}$ as there are m attributes. After the signature from each cluster is derived we sort them in ascending order and divide them into k equally sized intervals to form the groups. Throughout this paper we use this method. In the future we plan to explore the feasibility of using the approaches proposed by Strehl and Ghosh [11] that involve combining multiple partitions/clusterings.

5. How Much Our Approach Speeds up Clustering

If we average T bootstrap samples of size 1/s of the training data then we expect our technique to obtain as good results as performing T random restarts but in less computation time. As described earlier we expect a speedup for two reasons. The speedup due to clustering less data will be of magnitude s since the relative complexity of the clustering process will be O(TkmnI) versus O(Tkm(n/s)I), assuming the same number of iterations until convergence. However, as discussed earlier, for different sized data sets from the same population/source the time to convergence is not the same, typically $I_{1/s} < I$; this provides an even greater speed up, which the next set of experiments quantify.

6. Experimental Results

The first use of our approach is its ability to find more accurate estimates of the generating mechanism. To test this ability we need to artificially create data so that we know the actual generating mechanism. We created an artificial data set of six clusters with six attributes. All attributes are Gaussians with a mean of zero and a standard deviation of 0.5 except the i-th attribute of the i-th cluster, which has a mean of one and a standard deviation of 0.5. Formally, $C_{Gen} = \{c_1 \ldots c_6\}$, $c_i = \{c_{ij} = 1 \text{ if } i = j, \text{ otherwise } 0,\; j = 1 \ldots 6\}$. We can measure the goodness of a clustering solution by its Euclidean distance to these generating mechanisms. We generated 3000 instances from this distribution fifty times. For each sample we took 0 bootstrap samples, built a single model from each and averaged the cluster centroids to produce a single model. We group similar cluster centroids using the method described earlier. The bootstrap averaging approach takes at least five times less computation time than clustering all of the data after performing 0 random restarts. If the number of random restarts is increased to 50 and even 100, the best model (minimum distortion) from the random restart approach still does not yield better results than bootstrap averaging. The distance to the true cluster centers is shown in Figure 4. This figure shows that for 10% bootstrap sample sizes the averaged model is further away than the best model from random restarts. However, for 20% bootstrap sample size the averaged model is closer to the generating mechanism. Determining the precise size of the bootstrap sample at which performance is as good as clustering the entire data set remains an open question.

Figure 4. Training data size against mean distance (over 50 samples) to the generating mechanism for the averaged model (0 bags) and the best single model found from 0 random restarts for the entire data set. The top graph is for bootstrap samples of size 10% and the bottom graph 20% of the entire data set. Both panels, titled "Distance to Generating Mechanism Versus Training Set Size", plot KL distance to generating mechanism against training set size for the averaged model and the best model. The decrease in computation time by using the bootstrap averaging approach is never less than five times.

We now show results for the size of the bootstrap sample against predictive accuracy. Though prediction is not a common application of clustering, it allows us to show the performance of our approach on real world problems where we do not know the generating mechanism or true model.
We find as before that averaging smaller bootstraps yields as accurate results as random restarts on all of the data but in less computation time. In all experiments we drew 50 random samples from the data and divided this data into a training set (70%) and test set (30%). For each data set, we randomly restarted k-means 50 times and selected the model that minimized the distortion. We compare this model against the model obtained by averaging over 50 bootstrap samples.

Digit Data Set: The accuracy of the averaged model and computation time for the digit data set (predicting 3 or 8, the most difficult digit prediction decision in this data set) for different sized bootstrap samples are shown in Figure 5. The best model found from all of the data after the random restarts has a mean accuracy of 3.70% and the total computation time is CPU minutes compared to the averaged model's accuracy of 30.8% found in approximately 5 CPU minutes when using a bootstrap sample of 25% of the data size. Note that in all our results we report the user time reported by the Linux time command. This corresponds to the amount of CPU time that was dedicated to the process apart from system kernel calls, which was negligible in all cases (under 0.0 seconds). We find that similar speedups hold for other data sets. However, the size of the reduced bags at which the accuracy of the averaged model is the same as that obtained from random restarts on all of the data varies between data sets.

Image and Letter Datasets: We see in Figure 6 and Figure 7 that for the Image and Letter data sets the computation speedup is approximately a factor of 3.98 and 3.54 respectively. This is so as the bootstrap sample size must increase to 40% to obtain an acceptable accuracy. Determining the correct size of the reduced bag remains an open question and we hope to explore literature from the learning curve area [12] to address this question in our future work. Our results indicate that the averaged model's distance to the generating mechanism (true model) is a monotonic function of the bootstrap sample size.
This is an advantage over the work by Bradley and Fayyad where the size of the sub-samples leads to different results [section 3., 8].
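The bootstrap averaging procedure of Section 4, together with the signature based grouping of Section 4.1, can be sketched as below. This is a minimal sketch under stated assumptions rather than the authors' code: `bootstrap_average`, `signature` and `kmeans` are hypothetical names, an empty cluster keeps its previous centroid, and "equally sized intervals" is read here as K consecutive groups of T centroids each in signature order.

```python
import random


def sq_dist(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))


def kmeans(data, k, rng, eps=1e-5, max_iter=500):
    """One k-means run from a random assignment; returns the k centroids."""
    m = len(data[0])
    labels = [rng.randrange(k) for _ in data]
    centroids = [list(rng.choice(data)) for _ in range(k)]  # empty-cluster fallback
    for step in range(max_iter):
        # Re-estimation step.
        sums = [[0.0] * m for _ in range(k)]
        counts = [0] * k
        for x, j in zip(data, labels):
            counts[j] += 1
            for a in range(m):
                sums[j][a] += x[a]
        new = [[s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
               for j in range(k)]
        change = sum(abs(a - b) for c, d in zip(new, centroids)
                     for a, b in zip(c, d))
        centroids = new
        if step > 0 and change < eps:
            break
        # Assignment step.
        labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j]))
                  for x in data]
    return centroids


def signature(centroid):
    """Signature of a centroid: sum over attributes l of value * 2**l."""
    return sum(v * 2 ** l for l, v in enumerate(centroid, start=1))


def bootstrap_average(data, k, T, bag_fraction, seed=0):
    """Cluster T bootstrap samples once each and average similar centroids."""
    rng = random.Random(seed)
    bag_size = max(k, int(len(data) * bag_fraction))
    all_centroids = []
    for _ in range(T):
        bag = [rng.choice(data) for _ in range(bag_size)]  # with replacement
        all_centroids.extend(kmeans(bag, k, rng))
    # Group by signature: sort, then cut into k groups of T centroids each.
    all_centroids.sort(key=signature)
    averaged = []
    for b in range(k):
        group = all_centroids[b * T:(b + 1) * T]
        averaged.append([sum(col) / len(group) for col in zip(*group)])
    return averaged
```

Note that k-means is run exactly once per bag, so the cost is T runs on `bag_size` instances instead of T runs on the full data set.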

Figure 5. Digit Data Set. Training data size against predictive error (over 50 samples) (left graph, "Error Versus Bag Size For Averaged Model") and computation time (right graph, "CPU Minutes Versus Bag Size for Averaged Model") for the averaged model. Note: Using the entire data set the computation time is CPU minutes and accuracy is 3.7% as compared to 30.8% in approximately 5 CPU minutes when using 25% of data size bootstrap sample.

Figure 6. Image Data Set. Training data size against predictive error (over 50 samples) (left graph, "Error Versus Bag Size For Averaged Model") and computation time (right graph, "CPU Minutes Versus Bag Size for Averaged Model") for the averaged model. Note: Using the entire data set the computation time is 3.5 CPU minutes and accuracy is 3.7%, as compared to accuracy of around 3.9% and 5 CPU minutes when using 40% of data size bootstrap sample.

Figure 7. Letter Data Set. Training data size against predictive error (over 50 samples) (left graph, "Error Versus Bag Size For Averaged Model") and computation time (right graph, "CPU Minutes Versus Bag Size for Averaged Model") for the averaged model. Note: Using the entire data set the computation time is 8. CPU minutes and accuracy is 0.6%, as compared to accuracy of around .0% and 5. CPU minutes when using 40% of data size bootstrap sample.

Table 4 shows a summary of the situation when the bootstrap averaged accuracy equals the accuracy of clustering if all the data is used. This table shows the expected speed up ([1 / column 2] * [column 3 / column 4], see section 5 for details) and the actual speedup. The two numbers differ as the expected speed up does not consider the extra overhead such as the time required to generate the bootstrap samples. We did not attempt to optimize our code and hence expect the real difference between the expected and actual figures to be closer.
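The expected speed up formula above amounts to a two-factor product, sketched here. The function name and the iteration counts in the example are hypothetical and are not values from Table 4.

```python
def expected_speedup(bag_fraction, iters_full, iters_bag):
    """[1 / bag fraction] * [iterations on all data / iterations on a bag]."""
    if not 0 < bag_fraction <= 1:
        raise ValueError("bag_fraction must be in (0, 1]")
    if iters_full <= 0 or iters_bag <= 0:
        raise ValueError("iteration counts must be positive")
    return (1.0 / bag_fraction) * (iters_full / iters_bag)


# Hypothetical example: bags of 25% of the data that converge in half as
# many iterations as a run on the full data set.
print(expected_speedup(0.25, 20.0, 10.0))   # prints 8.0
```

The number of bags T does not appear because it multiplies both the cost of T restarts on all of the data and the cost of clustering T bags.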

Data set   Bootstrap sample   Random restarts:        Bootstrap averaging:    Expected (actual)
           size required      ave. no. of iterations  ave. no. of iterations  speed up
Digit      25%                                                                (4.63)
Image      40%                                                                (3.98)
Letter     40%                                                                (3.54)

Table 4. A summary of the statistics where the bootstrap averaged accuracy approximately equals the accuracy if all the data is clustered.

7. Discussion

We have shown that:

1. Bootstrap averaging T subsets of the data set will typically be more computationally efficient than randomly restarting the algorithm T times on the entire data set (Section 3, Table 1, Table 2 and Table 3).

2. Bootstrap averaging on a proportion of the data set can yield as accurate results as clustering the entire data set (see Figure 4): this produces results that are comparable with random restarting of k-means clustering using all of the training data, but takes far less computation time.

3. The results of bootstrap averaging monotonically improve as a function of the bootstrap sample size (see Figure 5, Figure 6 and Figure 7): as we increase the size of the bootstrap sample, the accuracy improves, until at some point the accuracy is comparable to that when the entire data set is used. The size of the bags at which the averaged model performs as well as clustering the entire data set varies from data set to data set, and determining it hence remains an open question.

A valid question is: how is our approach related to the IPR approach? For the rest of this section we empirically show that the two approaches obtain quite different results.

7.1 Is Bootstrap Averaging a Generalization of IPR

We begin by showing that the bootstrap averaging approach inherently does not produce a model that minimizes the distortion. That is, we are not compensating for some search inefficiency of the k-means algorithm as the IPR approach is effectively doing; instead we are minimizing another loss function, as Table 5 indicates. The IPR approach attempts to find a good set of initial positions and then applies the k-means clustering algorithm to further minimize the distortion. The second column in this table is the result of applying bootstrap averaging.
The first column was generated by initializing the clustering algorithm to the averaged model and clustering the entire training data. We find that the k-means algorithm, which performs a gradient descent on the distortion error surface, finds another model that further minimizes the distortion but yields worse performance. The averaged models are statistically significantly better at the 95% confidence level.

Data set   Starting from Averaged Model   Averaged Model
Digit      7.9% (5.6)                     3.6% (.3)
Image      3.3% (4.6)                     4.3% (4.)
Pima       3.3% (3.)                      7.9% (3.3)
Letter     3.7% (4.)                      7.5% (.3)
Abalone    5.6% (5.)                      9.9% (3.)
Adult      33.4% (7.)                     5.% (4.5)

Table 5. Collection of data sets. The average and (in parentheses) standard deviation of the test set error for the predictive ability of models found by starting k-means from the averaged model and of the averaged model, for 50 random divisions into training (70%) and test (30%) sets.

In our next set of experiments we show that bootstrap averaging performs quite differently to IPR. To illustrate this point clearly, we show that for bootstrap samples of equal size to the training data set the result that k-means with IPR converges to is quite different from the bootstrap averaged model. In Table 6 the averaged model is significantly better (at the 95% confidence level) than k-means with IPR. The first column refers to the prediction ability of the model minimizing the distortion from 100 random restarts of the algorithm on the original training data. The second column refers to the prediction ability of the model minimizing distortion from 100 IPR selected starting solutions on the original training data. The final column (our approach) refers to the single model that is the average of all 0 bags.

Note: for the Abalone data set the task is predicting the sex of the abalone.

Data set   Random Restart   IPR Restart   Averaged Model
Digit      30.6% (6.7)      9.5% (4.4)    3.6% (.3)
Image      35.5% (9.8)      9.3% (5.)     4.3% (4.)
Pima       33.5% (4.5)      3.% (3.)      7.9% (3.3)
Letter     .0% (6.4)        .3% (3.)      7.5% (.3)
Abalone    5.3% (7.9)       .4% (4.5)     9.9% (3.)
Adult      34.3% (0.3)      8.9% (5.4)    5.% (4.5)

Table 6. Collection of data sets. The average and (in parentheses) standard deviation of the test set error for the predictive ability of models found by applying k-means in a variety of situations over 50 random divisions into training (70%) and test (30%) sets.

8. Conclusion

K-means clustering is a popular but time consuming algorithm used in data mining. It is time consuming as it converges to a local optimum of its loss function (the distortion) and the solution converged to is particularly sensitive to the initial starting positions. As a result its typical use in practice involves applying the algorithm from many randomly chosen initial starting positions. In this paper we explore an approach we term bootstrap averaging. Bootstrap averaging builds multiple models by creating small bootstrap samples of the training set and building a single model from each; similar cluster centers are then averaged to produce a single model that contains k clusters. If we average T bags of size 1/s of the entire data set then our approach takes less time than randomly restarting the algorithm T times by a factor of at least s. The value of s at which the averaged model performs as well as clustering the entire data set varies between data sets. The speedup arises because the computational complexity of k-means is linear with respect to the number of data points. In practice we find that the speedup our approach provides is in fact greater than s, since the standard tests for convergence (the change in cluster centroids is less than some small number, epsilon) do not consider the size of the training data set. Our results indicate that the number of iterations of the algorithm until convergence is proportional to the size of the data set. In future work we will explore developing tests of convergence that factor in the data set size.
Our empirical results show that bootstrap sampling can achieve results comparable to clustering all of the data but in less computation time. We perform experiments to measure a clustering model's results in two ways: 1) distance to the generating mechanism and 2) predictive ability. However, knowing the size of the sample that performs as well as clustering the entire data set remains an open question. We hope to explore using the learning curve literature to determine potential ways to address this question. Our research empirically shows that clustering bigger data sets is not always desirable. No doubt clustering large data sets offers additional benefits, namely producing better results, but our experiments indicate that bootstrapping smaller portions of the data set can produce this benefit as well but at a reduction in computation time.

9. References

[1] Han, J., and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
[2] Pelleg, D., and Moore, A., "Accelerating Exact k-means Algorithms with Geometric Reasoning", KDD-99, Proc. of the Fifth ACM SIGKDD Intern. Conf. on Knowledge Discovery and Data Mining.
[3] Dhillon, I. S., and Modha, D. M., A Data Clustering Algorithm on Distributed Memory Multiprocessors, Large-Scale Parallel Data Mining, Lecture Notes in Artificial Intelligence, Volume 1759, pages 245-260, 2000.
[4] Breiman, L., Bagging predictors. Machine Learning, 24(2):123-140, 1996.
[5] MacQueen, J., Some methods for classification and analysis of multivariate observations, Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967.
[6] Max, J., Quantizing for Minimum Distortion, IEEE Transactions on Information Theory, 6, pages 7-12, 1960.
[7] Edelstein, H., The Two Crows Report: 1999.
[8] Bradley, P., and Fayyad, U., Refining Initial Points for k-means Clustering. ICML 1998.
[9] Hartigan, J., Clustering Algorithms, Wiley Publishing, 1975.
[10] Bottou, L., and Bengio, Y., Convergence properties of the k-means algorithm. In G. Tesauro and D. Touretzky, editors, Adv. in Neural Info. Proc. Systems, volume 7. MIT Press, Cambridge MA, 1995.
[11] Strehl, A., and Ghosh, J., Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research (JMLR), 3:583-617, December 2002.
[12] Meek, C., Thiesson, B., and Heckerman, D., The learning-curve method applied to model-based clustering. Journal of Machine Learning Research, 2, 2002.


More information

Chapter 3 0.06 = 3000 ( 1.015 ( 1 ) Present Value of an Annuity. Section 4 Present Value of an Annuity; Amortization

Chapter 3 0.06 = 3000 ( 1.015 ( 1 ) Present Value of an Annuity. Section 4 Present Value of an Annuity; Amortization Chapter 3 Mathematcs of Face Secto 4 Preset Value of a Auty; Amortzato Preset Value of a Auty I ths secto, we wll address the problem of determg the amout that should be deposted to a accout ow at a gve

More information

Settlement Prediction by Spatial-temporal Random Process

Settlement Prediction by Spatial-temporal Random Process Safety, Relablty ad Rs of Structures, Ifrastructures ad Egeerg Systems Furuta, Fragopol & Shozua (eds Taylor & Fracs Group, Lodo, ISBN 978---77- Settlemet Predcto by Spatal-temporal Radom Process P. Rugbaapha

More information

Abraham Zaks. Technion I.I.T. Haifa ISRAEL. and. University of Haifa, Haifa ISRAEL. Abstract

Abraham Zaks. Technion I.I.T. Haifa ISRAEL. and. University of Haifa, Haifa ISRAEL. Abstract Preset Value of Autes Uder Radom Rates of Iterest By Abraham Zas Techo I.I.T. Hafa ISRAEL ad Uversty of Hafa, Hafa ISRAEL Abstract Some attempts were made to evaluate the future value (FV) of the expected

More information

Dynamic Two-phase Truncated Rayleigh Model for Release Date Prediction of Software

Dynamic Two-phase Truncated Rayleigh Model for Release Date Prediction of Software J. Software Egeerg & Applcatos 3 63-69 do:.436/jsea..367 Publshed Ole Jue (http://www.scrp.org/joural/jsea) Dyamc Two-phase Trucated Raylegh Model for Release Date Predcto of Software Lafe Qa Qgchua Yao

More information

Finito: A Faster, Permutable Incremental Gradient Method for Big Data Problems

Finito: A Faster, Permutable Incremental Gradient Method for Big Data Problems Fto: A Faster, Permutable Icremetal Gradet Method for Bg Data Problems Aaro J Defazo Tbéro S Caetao Just Domke NICTA ad Australa Natoal Uversty AARONDEFAZIO@ANUEDUAU TIBERIOCAETANO@NICTACOMAU JUSTINDOMKE@NICTACOMAU

More information

An Effectiveness of Integrated Portfolio in Bancassurance

An Effectiveness of Integrated Portfolio in Bancassurance A Effectveess of Itegrated Portfolo Bacassurace Taea Karya Research Ceter for Facal Egeerg Isttute of Ecoomc Research Kyoto versty Sayouu Kyoto 606-850 Japa arya@eryoto-uacp Itroducto As s well ow the

More information

The simple linear Regression Model

The simple linear Regression Model The smple lear Regresso Model Correlato coeffcet s o-parametrc ad just dcates that two varables are assocated wth oe aother, but t does ot gve a deas of the kd of relatoshp. Regresso models help vestgatg

More information

Models for Selecting an ERP System with Intuitionistic Trapezoidal Fuzzy Information

Models for Selecting an ERP System with Intuitionistic Trapezoidal Fuzzy Information JOURNAL OF SOFWARE, VOL 5, NO 3, MARCH 00 75 Models for Selectg a ERP System wth Itutostc rapezodal Fuzzy Iformato Guwu We, Ru L Departmet of Ecoomcs ad Maagemet, Chogqg Uversty of Arts ad Sceces, Yogchua,

More information

Chapter Eight. f : R R

Chapter Eight. f : R R Chapter Eght f : R R 8. Itroducto We shall ow tur our atteto to the very mportat specal case of fuctos that are real, or scalar, valued. These are sometmes called scalar felds. I the very, but mportat,

More information

Reinsurance and the distribution of term insurance claims

Reinsurance and the distribution of term insurance claims Resurace ad the dstrbuto of term surace clams By Rchard Bruyel FIAA, FNZSA Preseted to the NZ Socety of Actuares Coferece Queestow - November 006 1 1 Itroducto Ths paper vestgates the effect of resurace

More information

Applications of Support Vector Machine Based on Boolean Kernel to Spam Filtering

Applications of Support Vector Machine Based on Boolean Kernel to Spam Filtering Moder Appled Scece October, 2009 Applcatos of Support Vector Mache Based o Boolea Kerel to Spam Flterg Shugag Lu & Keb Cu School of Computer scece ad techology, North Cha Electrc Power Uversty Hebe 071003,

More information

Regression Analysis. 1. Introduction

Regression Analysis. 1. Introduction . Itroducto Regresso aalyss s a statstcal methodology that utlzes the relato betwee two or more quattatve varables so that oe varable ca be predcted from the other, or others. Ths methodology s wdely used

More information

ADAPTATION OF SHAPIRO-WILK TEST TO THE CASE OF KNOWN MEAN

ADAPTATION OF SHAPIRO-WILK TEST TO THE CASE OF KNOWN MEAN Colloquum Bometrcum 4 ADAPTATION OF SHAPIRO-WILK TEST TO THE CASE OF KNOWN MEAN Zofa Hausz, Joaa Tarasńska Departmet of Appled Mathematcs ad Computer Scece Uversty of Lfe Sceces Lubl Akademcka 3, -95 Lubl

More information

A New Bayesian Network Method for Computing Bottom Event's Structural Importance Degree using Jointree

A New Bayesian Network Method for Computing Bottom Event's Structural Importance Degree using Jointree , pp.277-288 http://dx.do.org/10.14257/juesst.2015.8.1.25 A New Bayesa Network Method for Computg Bottom Evet's Structural Importace Degree usg Jotree Wag Yao ad Su Q School of Aeroautcs, Northwester Polytechcal

More information

Security Analysis of RAPP: An RFID Authentication Protocol based on Permutation

Security Analysis of RAPP: An RFID Authentication Protocol based on Permutation Securty Aalyss of RAPP: A RFID Authetcato Protocol based o Permutato Wag Shao-hu,,, Ha Zhje,, Lu Sujua,, Che Da-we, {College of Computer, Najg Uversty of Posts ad Telecommucatos, Najg 004, Cha Jagsu Hgh

More information

Integrating Production Scheduling and Maintenance: Practical Implications

Integrating Production Scheduling and Maintenance: Practical Implications Proceedgs of the 2012 Iteratoal Coferece o Idustral Egeerg ad Operatos Maagemet Istabul, Turkey, uly 3 6, 2012 Itegratg Producto Schedulg ad Mateace: Practcal Implcatos Lath A. Hadd ad Umar M. Al-Turk

More information

Statistical Intrusion Detector with Instance-Based Learning

Statistical Intrusion Detector with Instance-Based Learning Iformatca 5 (00) xxx yyy Statstcal Itruso Detector wth Istace-Based Learg Iva Verdo, Boja Nova Faulteta za eletroteho raualštvo Uverza v Marboru Smetaova 7, 000 Marbor, Sloveja va.verdo@sol.et eywords:

More information

The Digital Signature Scheme MQQ-SIG

The Digital Signature Scheme MQQ-SIG The Dgtal Sgature Scheme MQQ-SIG Itellectual Property Statemet ad Techcal Descrpto Frst publshed: 10 October 2010, Last update: 20 December 2010 Dalo Glgorosk 1 ad Rue Stesmo Ødegård 2 ad Rue Erled Jese

More information

of the relationship between time and the value of money.

of the relationship between time and the value of money. TIME AND THE VALUE OF MONEY Most agrbusess maagers are famlar wth the terms compoudg, dscoutg, auty, ad captalzato. That s, most agrbusess maagers have a tutve uderstadg that each term mples some relatoshp

More information

T = 1/freq, T = 2/freq, T = i/freq, T = n (number of cash flows = freq n) are :

T = 1/freq, T = 2/freq, T = i/freq, T = n (number of cash flows = freq n) are : Bullets bods Let s descrbe frst a fxed rate bod wthout amortzg a more geeral way : Let s ote : C the aual fxed rate t s a percetage N the otoal freq ( 2 4 ) the umber of coupo per year R the redempto of

More information

An IG-RS-SVM classifier for analyzing reviews of E-commerce product

An IG-RS-SVM classifier for analyzing reviews of E-commerce product Iteratoal Coferece o Iformato Techology ad Maagemet Iovato (ICITMI 205) A IG-RS-SVM classfer for aalyzg revews of E-commerce product Jaju Ye a, Hua Re b ad Hagxa Zhou c * College of Iformato Egeerg, Cha

More information

ECONOMIC CHOICE OF OPTIMUM FEEDER CABLE CONSIDERING RISK ANALYSIS. University of Brasilia (UnB) and The Brazilian Regulatory Agency (ANEEL), Brazil

ECONOMIC CHOICE OF OPTIMUM FEEDER CABLE CONSIDERING RISK ANALYSIS. University of Brasilia (UnB) and The Brazilian Regulatory Agency (ANEEL), Brazil ECONOMIC CHOICE OF OPTIMUM FEEDER CABE CONSIDERING RISK ANAYSIS I Camargo, F Fgueredo, M De Olvera Uversty of Brasla (UB) ad The Brazla Regulatory Agecy (ANEE), Brazl The choce of the approprate cable

More information

Towards Network-Aware Composition of Big Data Services in the Cloud

Towards Network-Aware Composition of Big Data Services in the Cloud (IJACSA) Iteratoal Joural of Advaced Computer Scece ad Applcatos, Towards Network-Aware Composto of Bg Data Servces the Cloud Umar SHEHU Departmet of Computer Scece ad Techology Uversty of Bedfordshre

More information

Fractal-Structured Karatsuba`s Algorithm for Binary Field Multiplication: FK

Fractal-Structured Karatsuba`s Algorithm for Binary Field Multiplication: FK Fractal-Structured Karatsuba`s Algorthm for Bary Feld Multplcato: FK *The authors are worg at the Isttute of Mathematcs The Academy of Sceces of DPR Korea. **Address : U Jog dstrct Kwahadog Number Pyogyag

More information

Maintenance Scheduling of Distribution System with Optimal Economy and Reliability

Maintenance Scheduling of Distribution System with Optimal Economy and Reliability Egeerg, 203, 5, 4-8 http://dx.do.org/0.4236/eg.203.59b003 Publshed Ole September 203 (http://www.scrp.org/joural/eg) Mateace Schedulg of Dstrbuto System wth Optmal Ecoomy ad Relablty Syua Hog, Hafeg L,

More information

Green Master based on MapReduce Cluster

Green Master based on MapReduce Cluster Gree Master based o MapReduce Cluster Mg-Zh Wu, Yu-Chag L, We-Tsog Lee, Yu-Su L, Fog-Hao Lu Dept of Electrcal Egeerg Tamkag Uversty, Tawa, ROC Dept of Electrcal Egeerg Tamkag Uversty, Tawa, ROC Dept of

More information

Proceedings of the 2010 Winter Simulation Conference B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan, eds.

Proceedings of the 2010 Winter Simulation Conference B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan, eds. Proceedgs of the 21 Wter Smulato Coferece B. Johasso, S. Ja, J. Motoya-Torres, J. Huga, ad E. Yücesa, eds. EMPIRICAL METHODS OR TWO-ECHELON INVENTORY MANAGEMENT WITH SERVICE LEVEL CONSTRAINTS BASED ON

More information

Credibility Premium Calculation in Motor Third-Party Liability Insurance

Credibility Premium Calculation in Motor Third-Party Liability Insurance Advaces Mathematcal ad Computatoal Methods Credblty remum Calculato Motor Thrd-arty Lablty Isurace BOHA LIA, JAA KUBAOVÁ epartmet of Mathematcs ad Quattatve Methods Uversty of ardubce Studetská 95, 53

More information

A particle swarm optimization to vehicle routing problem with fuzzy demands

A particle swarm optimization to vehicle routing problem with fuzzy demands A partcle swarm optmzato to vehcle routg problem wth fuzzy demads Yag Peg, Ye-me Qa A partcle swarm optmzato to vehcle routg problem wth fuzzy demads Yag Peg 1,Ye-me Qa 1 School of computer ad formato

More information

A COMPARATIVE STUDY BETWEEN POLYCLASS AND MULTICLASS LANGUAGE MODELS

A COMPARATIVE STUDY BETWEEN POLYCLASS AND MULTICLASS LANGUAGE MODELS A COMPARATIVE STUDY BETWEEN POLYCLASS AND MULTICLASS LANGUAGE MODELS I Ztou, K Smaïl, S Delge, F Bmbot To cte ths verso: I Ztou, K Smaïl, S Delge, F Bmbot. A COMPARATIVE STUDY BETWEEN POLY- CLASS AND MULTICLASS

More information

Performance Attribution. Methodology Overview

Performance Attribution. Methodology Overview erformace Attrbuto Methodology Overvew Faba SUAREZ March 2004 erformace Attrbuto Methodology 1.1 Itroducto erformace Attrbuto s a set of techques that performace aalysts use to expla why a portfolo's performace

More information

A particle Swarm Optimization-based Framework for Agile Software Effort Estimation

A particle Swarm Optimization-based Framework for Agile Software Effort Estimation The Iteratoal Joural Of Egeerg Ad Scece (IJES) olume 3 Issue 6 Pages 30-36 204 ISSN (e): 239 83 ISSN (p): 239 805 A partcle Swarm Optmzato-based Framework for Agle Software Effort Estmato Maga I, & 2 Blamah

More information

A Study of Unrelated Parallel-Machine Scheduling with Deteriorating Maintenance Activities to Minimize the Total Completion Time

A Study of Unrelated Parallel-Machine Scheduling with Deteriorating Maintenance Activities to Minimize the Total Completion Time Joural of Na Ka, Vol. 0, No., pp.5-9 (20) 5 A Study of Urelated Parallel-Mache Schedulg wth Deteroratg Mateace Actvtes to Mze the Total Copleto Te Suh-Jeq Yag, Ja-Yuar Guo, Hs-Tao Lee Departet of Idustral

More information

Curve Fitting and Solution of Equation

Curve Fitting and Solution of Equation UNIT V Curve Fttg ad Soluto of Equato 5. CURVE FITTING I ma braches of appled mathematcs ad egeerg sceces we come across epermets ad problems, whch volve two varables. For eample, t s kow that the speed

More information

An Approach to Evaluating the Computer Network Security with Hesitant Fuzzy Information

An Approach to Evaluating the Computer Network Security with Hesitant Fuzzy Information A Approach to Evaluatg the Computer Network Securty wth Hestat Fuzzy Iformato Jafeg Dog A Approach to Evaluatg the Computer Network Securty wth Hestat Fuzzy Iformato Jafeg Dog, Frst ad Correspodg Author

More information

RUSSIAN ROULETTE AND PARTICLE SPLITTING

RUSSIAN ROULETTE AND PARTICLE SPLITTING RUSSAN ROULETTE AND PARTCLE SPLTTNG M. Ragheb 3/7/203 NTRODUCTON To stuatos are ecoutered partcle trasport smulatos:. a multplyg medum, a partcle such as a eutro a cosmc ray partcle or a photo may geerate

More information

CH. V ME256 STATICS Center of Gravity, Centroid, and Moment of Inertia CENTER OF GRAVITY AND CENTROID

CH. V ME256 STATICS Center of Gravity, Centroid, and Moment of Inertia CENTER OF GRAVITY AND CENTROID CH. ME56 STTICS Ceter of Gravt, Cetrod, ad Momet of Ierta CENTE OF GITY ND CENTOID 5. CENTE OF GITY ND CENTE OF MSS FO SYSTEM OF PTICES Ceter of Gravt. The ceter of gravt G s a pot whch locates the resultat

More information

We present a new approach to pricing American-style derivatives that is applicable to any Markovian setting

We present a new approach to pricing American-style derivatives that is applicable to any Markovian setting MANAGEMENT SCIENCE Vol. 52, No., Jauary 26, pp. 95 ss 25-99 ess 526-55 6 52 95 forms do.287/msc.5.447 26 INFORMS Prcg Amerca-Style Dervatves wth Europea Call Optos Scott B. Laprse BAE Systems, Advaced

More information

n. We know that the sum of squares of p independent standard normal variables has a chi square distribution with p degrees of freedom.

n. We know that the sum of squares of p independent standard normal variables has a chi square distribution with p degrees of freedom. UMEÅ UNIVERSITET Matematsk-statstska sttutoe Multvarat dataaalys för tekologer MSTB0 PA TENTAMEN 004-0-9 LÖSNINGSFÖRSLAG TILL TENTAMEN I MATEMATISK STATISTIK Multvarat dataaalys för tekologer B, 5 poäg.

More information

CHAPTER 13. Simple Linear Regression LEARNING OBJECTIVES. USING STATISTICS @ Sunflowers Apparel

CHAPTER 13. Simple Linear Regression LEARNING OBJECTIVES. USING STATISTICS @ Sunflowers Apparel CHAPTER 3 Smple Lear Regresso USING STATISTICS @ Suflowers Apparel 3 TYPES OF REGRESSION MODELS 3 DETERMINING THE SIMPLE LINEAR REGRESSION EQUATION The Least-Squares Method Vsual Exploratos: Explorg Smple

More information

Constrained Cubic Spline Interpolation for Chemical Engineering Applications

Constrained Cubic Spline Interpolation for Chemical Engineering Applications Costraed Cubc Sple Iterpolato or Chemcal Egeerg Applcatos b CJC Kruger Summar Cubc sple terpolato s a useul techque to terpolate betwee kow data pots due to ts stable ad smooth characterstcs. Uortuatel

More information

M. Salahi, F. Mehrdoust, F. Piri. CVaR Robust Mean-CVaR Portfolio Optimization

M. Salahi, F. Mehrdoust, F. Piri. CVaR Robust Mean-CVaR Portfolio Optimization M. Salah, F. Mehrdoust, F. Pr Uversty of Gula, Rasht, Ira CVaR Robust Mea-CVaR Portfolo Optmzato Abstract: Oe of the most mportat problems faced by every vestor s asset allocato. A vestor durg makg vestmet

More information

Group Nearest Neighbor Queries

Group Nearest Neighbor Queries Group Nearest Neghbor Queres Dmtrs Papadas Qogmao She Yufe Tao Kyrakos Mouratds Departmet of Computer Scece Hog Kog Uversty of Scece ad Techology Clear Water Bay, Hog Kog {dmtrs, qmshe, kyrakos}@cs.ust.hk

More information

Impact of Mobility Prediction on the Temporal Stability of MANET Clustering Algorithms *

Impact of Mobility Prediction on the Temporal Stability of MANET Clustering Algorithms * Impact of Moblty Predcto o the Temporal Stablty of MANET Clusterg Algorthms * Aravdha Vekateswara, Vekatesh Saraga, Nataraa Gautam 1, Ra Acharya Departmet of Comp. Sc. & Egr. Pesylvaa State Uversty Uversty

More information

Optimal Packetization Interval for VoIP Applications Over IEEE 802.16 Networks

Optimal Packetization Interval for VoIP Applications Over IEEE 802.16 Networks Optmal Packetzato Iterval for VoIP Applcatos Over IEEE 802.16 Networks Sheha Perera Harsha Srsea Krzysztof Pawlkowsk Departmet of Electrcal & Computer Egeerg Uversty of Caterbury New Zealad sheha@elec.caterbury.ac.z

More information

Report 52 Fixed Maturity EUR Industrial Bond Funds

Report 52 Fixed Maturity EUR Industrial Bond Funds Rep52, Computed & Prted: 17/06/2015 11:53 Report 52 Fxed Maturty EUR Idustral Bod Fuds From Dec 2008 to Dec 2014 31/12/2008 31 December 1999 31/12/2014 Bechmark Noe Defto of the frm ad geeral formato:

More information

Cyber Journals: Multidisciplinary Journals in Science and Technology, Journal of Selected Areas in Telecommunications (JSAT), January Edition, 2011

Cyber Journals: Multidisciplinary Journals in Science and Technology, Journal of Selected Areas in Telecommunications (JSAT), January Edition, 2011 Cyber Jourals: Multdscplary Jourals cece ad Techology, Joural of elected Areas Telecommucatos (JAT), Jauary dto, 2011 A ovel rtual etwork Mappg Algorthm for Cost Mmzg ZHAG hu-l, QIU Xue-sog tate Key Laboratory

More information

Optimization Model in Human Resource Management for Job Allocation in ICT Project

Optimization Model in Human Resource Management for Job Allocation in ICT Project Optmzato Model Huma Resource Maagemet for Job Allocato ICT Project Optmzato Model Huma Resource Maagemet for Job Allocato ICT Project Saghamtra Mohaty Malaya Kumar Nayak 2 2 Professor ad Head Research

More information

Trend Projection using Predictive Analytics

Trend Projection using Predictive Analytics Iteratoal Joural of Computer Applcatos (0975 8887) Tred Projecto usg Predctve Aalytcs Seema L. Vadure KLS Gogte Isttute of Techology, Udyambag, Belgaum Karataka, Ida Majula Ramaavar KLS Gogte Isttute of

More information

10.5 Future Value and Present Value of a General Annuity Due

10.5 Future Value and Present Value of a General Annuity Due Chapter 10 Autes 371 5. Thomas leases a car worth $4,000 at.99% compouded mothly. He agrees to make 36 lease paymets of $330 each at the begg of every moth. What s the buyout prce (resdual value of the

More information

A DISTRIBUTED REPUTATION BROKER FRAMEWORK FOR WEB SERVICE APPLICATIONS

A DISTRIBUTED REPUTATION BROKER FRAMEWORK FOR WEB SERVICE APPLICATIONS L et al.: A Dstrbuted Reputato Broker Framework for Web Servce Applcatos A DISTRIBUTED REPUTATION BROKER FRAMEWORK FOR WEB SERVICE APPLICATIONS Kwe-Jay L Departmet of Electrcal Egeerg ad Computer Scece

More information

Optimal multi-degree reduction of Bézier curves with constraints of endpoints continuity

Optimal multi-degree reduction of Bézier curves with constraints of endpoints continuity Computer Aded Geometrc Desg 19 (2002 365 377 wwwelsevercom/locate/comad Optmal mult-degree reducto of Bézer curves wth costrats of edpots cotuty Guo-Dog Che, Guo-J Wag State Key Laboratory of CAD&CG, Isttute

More information

1. The Time Value of Money

1. The Time Value of Money Corporate Face [00-0345]. The Tme Value of Moey. Compoudg ad Dscoutg Captalzato (compoudg, fdg future values) s a process of movg a value forward tme. It yelds the future value gve the relevat compoudg

More information

RESEARCH ON PERFORMANCE MODELING OF TRANSACTIONAL CLOUD APPLICATIONS

RESEARCH ON PERFORMANCE MODELING OF TRANSACTIONAL CLOUD APPLICATIONS Joural of Theoretcal ad Appled Iformato Techology 3 st October 22. Vol. 44 No.2 25-22 JATIT & LLS. All rghts reserved. ISSN: 992-8645 www.jatt.org E-ISSN: 87-395 RESEARCH ON PERFORMANCE MODELING OF TRANSACTIONAL

More information

On Error Detection with Block Codes

On Error Detection with Block Codes BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 9, No 3 Sofa 2009 O Error Detecto wth Block Codes Rostza Doduekova Chalmers Uversty of Techology ad the Uversty of Gotheburg,

More information

Research on Cloud Computing and Its Application in Big Data Processing of Railway Passenger Flow

Research on Cloud Computing and Its Application in Big Data Processing of Railway Passenger Flow 325 A publcato of CHEMICAL ENGINEERING TRANSACTIONS VOL. 46, 2015 Guest Edtors: Peyu Re, Yacag L, Hupg Sog Copyrght 2015, AIDIC Servz S.r.l., ISBN 978-88-95608-37-2; ISSN 2283-9216 The Itala Assocato of

More information

Online Appendix: Measured Aggregate Gains from International Trade

Online Appendix: Measured Aggregate Gains from International Trade Ole Appedx: Measured Aggregate Gas from Iteratoal Trade Arel Burste UCLA ad NBER Javer Cravo Uversty of Mchga March 3, 2014 I ths ole appedx we derve addtoal results dscussed the paper. I the frst secto,

More information

Optimizing Software Effort Estimation Models Using Firefly Algorithm

Optimizing Software Effort Estimation Models Using Firefly Algorithm Joural of Software Egeerg ad Applcatos, 205, 8, 33-42 Publshed Ole March 205 ScRes. http://www.scrp.org/joural/jsea http://dx.do.org/0.4236/jsea.205.8304 Optmzg Software Effort Estmato Models Usg Frefly

More information

Near Neighbor Distribution in Sets of Fractal Nature

Near Neighbor Distribution in Sets of Fractal Nature Iteratoal Joural of Computer Iformato Systems ad Idustral Maagemet Applcatos. ISS 250-7988 Volume 5 (202) 3 pp. 59-66 MIR Labs, www.mrlabs.et/jcsm/dex.html ear eghbor Dstrbuto Sets of Fractal ature Marcel

More information

How To Make A Supply Chain System Work

How To Make A Supply Chain System Work Iteratoal Joural of Iformato Techology ad Kowledge Maagemet July-December 200, Volume 2, No. 2, pp. 3-35 LATERAL TRANSHIPMENT-A TECHNIQUE FOR INVENTORY CONTROL IN MULTI RETAILER SUPPLY CHAIN SYSTEM Dharamvr

More information

Lecture 7. Norms and Condition Numbers

Lecture 7. Norms and Condition Numbers Lecture 7 Norms ad Codto Numbers To dscuss the errors umerca probems vovg vectors, t s usefu to empo orms. Vector Norm O a vector space V, a orm s a fucto from V to the set of o-egatve reas that obes three

More information

A Parallel Transmission Remote Backup System

A Parallel Transmission Remote Backup System 2012 2d Iteratoal Coferece o Idustral Techology ad Maagemet (ICITM 2012) IPCSIT vol 49 (2012) (2012) IACSIT Press, Sgapore DOI: 107763/IPCSIT2012V495 2 A Parallel Trasmsso Remote Backup System Che Yu College

More information

CSSE463: Image Recognition Day 27

CSSE463: Image Recognition Day 27 CSSE463: Image Recogto Da 27 Ths week Toda: Alcatos of PCA Suda ght: roject las ad relm work due Questos? Prcal Comoets Aalss weght grth c ( )( ) ( )( ( )( ) ) heght sze Gve a set of samles, fd the drecto(s)

More information

CHAPTER 2. Time Value of Money 6-1

CHAPTER 2. Time Value of Money 6-1 CHAPTER 2 Tme Value of Moey 6- Tme Value of Moey (TVM) Tme Les Future value & Preset value Rates of retur Autes & Perpetutes Ueve cash Flow Streams Amortzato 6-2 Tme les 0 2 3 % CF 0 CF CF 2 CF 3 Show

More information

DECISION MAKING WITH THE OWA OPERATOR IN SPORT MANAGEMENT

DECISION MAKING WITH THE OWA OPERATOR IN SPORT MANAGEMENT ESTYLF08, Cuecas Meras (Meres - Lagreo), 7-9 de Septembre de 2008 DECISION MAKING WITH THE OWA OPERATOR IN SPORT MANAGEMENT José M. Mergó Aa M. Gl-Lafuete Departmet of Busess Admstrato, Uversty of Barceloa

More information

USEFULNESS OF BOOTSTRAPPING IN PORTFOLIO MANAGEMENT

USEFULNESS OF BOOTSTRAPPING IN PORTFOLIO MANAGEMENT USEFULNESS OF BOOTSTRAPPING IN PORTFOLIO MANAGEMENT Radovaov Bors Faculty of Ecoomcs Subotca Segedsk put 9-11 Subotca 24000 E-mal: radovaovb@ef.us.ac.rs Marckć Aleksadra Faculty of Ecoomcs Subotca Segedsk

More information

Numerical Comparisons of Quality Control Charts for Variables

Numerical Comparisons of Quality Control Charts for Variables Global Vrtual Coferece Aprl, 8. - 2. 203 Nuercal Coparsos of Qualty Cotrol Charts for Varables J.F. Muñoz-Rosas, M.N. Pérez-Aróstegu Uversty of Graada Facultad de Cecas Ecoócas y Epresarales Graada, pa

More information

Modeling of Router-based Request Redirection for Content Distribution Network

Modeling of Router-based Request Redirection for Content Distribution Network Iteratoal Joural of Computer Applcatos (0975 8887) Modelg of Router-based Request Redrecto for Cotet Dstrbuto Network Erw Harahap, Jaaka Wjekoo, Rajtha Teekoo, Fumto Yamaguch, Shch Ishda, Hroak Nsh Hroak

More information

RQM: A new rate-based active queue management algorithm

RQM: A new rate-based active queue management algorithm : A ew rate-based actve queue maagemet algorthm Jeff Edmods, Suprakash Datta, Patrck Dymod, Kashf Al Computer Scece ad Egeerg Departmet, York Uversty, Toroto, Caada Abstract I ths paper, we propose a ew

More information

Software Aging Prediction based on Extreme Learning Machine

Software Aging Prediction based on Extreme Learning Machine TELKOMNIKA, Vol.11, No.11, November 2013, pp. 6547~6555 e-issn: 2087-278X 6547 Software Agg Predcto based o Extreme Learg Mache Xaozh Du 1, Hum Lu* 2, Gag Lu 2 1 School of Software Egeerg, X a Jaotog Uversty,

More information

CIS603 - Artificial Intelligence. Logistic regression. (some material adopted from notes by M. Hauskrecht) CIS603 - AI. Supervised learning

CIS603 - Artificial Intelligence. Logistic regression. (some material adopted from notes by M. Hauskrecht) CIS603 - AI. Supervised learning CIS63 - Artfcal Itellgece Logstc regresso Vasleos Megalookoomou some materal adopted from otes b M. Hauskrecht Supervsed learg Data: D { d d.. d} a set of eamples d < > s put vector ad s desred output

More information

Optimal replacement and overhaul decisions with imperfect maintenance and warranty contracts

Optimal replacement and overhaul decisions with imperfect maintenance and warranty contracts Optmal replacemet ad overhaul decsos wth mperfect mateace ad warraty cotracts R. Pascual Departmet of Mechacal Egeerg, Uversdad de Chle, Caslla 2777, Satago, Chle Phoe: +56-2-6784591 Fax:+56-2-689657 rpascual@g.uchle.cl

More information

MDM 4U PRACTICE EXAMINATION

MDM 4U PRACTICE EXAMINATION MDM 4U RCTICE EXMINTION Ths s a ractce eam. It does ot cover all the materal ths course ad should ot be the oly revew that you do rearato for your fal eam. Your eam may cota questos that do ot aear o ths

More information

Banking (Early Repayment of Housing Loans) Order, 5762 2002 1

Banking (Early Repayment of Housing Loans) Order, 5762 2002 1 akg (Early Repaymet of Housg Loas) Order, 5762 2002 y vrtue of the power vested me uder Secto 3 of the akg Ordace 94 (hereafter, the Ordace ), followg cosultato wth the Commttee, ad wth the approval of

More information

We investigate a simple adaptive approach to optimizing seat protection levels in airline

We investigate a simple adaptive approach to optimizing seat protection levels in airline Reveue Maagemet Wthout Forecastg or Optmzato: A Adaptve Algorthm for Determg Arle Seat Protecto Levels Garrett va Ryz Jeff McGll Graduate School of Busess, Columba Uversty, New York, New York 10027 School

More information

Analysis of real underkeel clearance for Świnoujście Szczecin waterway in years 2009 2011

Analysis of real underkeel clearance for Świnoujście Szczecin waterway in years 2009 2011 Scetfc Jourals Martme Uversty of Szczec Zeszyty Naukowe Akadema Morska w Szczece 2012, 32(104) z. 2 pp. 162 166 2012, 32(104) z. 2 s. 162 166 Aalyss of real uderkeel clearace for Śwoujśce Szczec waterway

More information

ROULETTE-TOURNAMENT SELECTION FOR SHRIMP DIET FORMULATION PROBLEM

ROULETTE-TOURNAMENT SELECTION FOR SHRIMP DIET FORMULATION PROBLEM 28-30 August, 2013 Sarawak, Malaysa. Uverst Utara Malaysa (http://www.uum.edu.my ) ROULETTE-TOURNAMENT SELECTION FOR SHRIMP DIET FORMULATION PROBLEM Rosshary Abd. Rahma 1 ad Razam Raml 2 1,2 Uverst Utara

More information

Agent-based modeling and simulation of multiproject

Agent-based modeling and simulation of multiproject Aget-based modelg ad smulato of multproject schedulg José Alberto Araúzo, Javer Pajares, Adolfo Lopez- Paredes Socal Systems Egeerg Cetre (INSISOC) Uversty of Valladold Valladold (Spa) {arauzo,pajares,adolfo}ssoc.es

More information

MODELLING OF STOCK PRICES BY THE MARKOV CHAIN MONTE CARLO METHOD

MODELLING OF STOCK PRICES BY THE MARKOV CHAIN MONTE CARLO METHOD ISSN 8-80 (prt) ISSN 8-8038 (ole) INTELEKTINĖ EKONOMIKA INTELLECTUAL ECONOMICS 0, Vol. 5, No. (0), p. 44 56 MODELLING OF STOCK PRICES BY THE MARKOV CHAIN MONTE CARLO METHOD Matas LANDAUSKAS Kauas Uversty

More information

Incorporating demand shifters in the Almost Ideal demand system

Incorporating demand shifters in the Almost Ideal demand system Ecoomcs Letters 70 (2001) 73 78 www.elsever.com/ locate/ ecobase Icorporatg demad shfters the Almost Ideal demad system Jula M. Alsto, James A. Chalfat *, Ncholas E. Pggott a,1 1 a, b a Departmet of Agrcultural

More information

On formula to compute primes and the n th prime

On formula to compute primes and the n th prime Joural's Ttle, Vol., 00, o., - O formula to compute prmes ad the th prme Issam Kaddoura Lebaese Iteratoal Uversty Faculty of Arts ad ceces, Lebao Emal: ssam.addoura@lu.edu.lb amh Abdul-Nab Lebaese Iteratoal

More information

Suspicious Transaction Detection for Anti-Money Laundering

Suspicious Transaction Detection for Anti-Money Laundering Vol.8, No. (014), pp.157-166 http://dx.do.org/10.1457/jsa.014.8..16 Suspcous Trasacto Detecto for At-Moey Lauderg Xgrog Luo Vocatoal ad techcal college Esh Esh, Hube, Cha es_lxr@16.com Abstract Moey lauderg

More information

Capacitated Production Planning and Inventory Control when Demand is Unpredictable for Most Items: The No B/C Strategy

Capacitated Production Planning and Inventory Control when Demand is Unpredictable for Most Items: The No B/C Strategy SCHOOL OF OPERATIONS RESEARCH AND INDUSTRIAL ENGINEERING COLLEGE OF ENGINEERING CORNELL UNIVERSITY ITHACA, NY 4853-380 TECHNICAL REPORT Jue 200 Capactated Producto Plag ad Ivetory Cotrol whe Demad s Upredctable

More information

The paper presents Constant Rebalanced Portfolio first introduced by Thomas

The paper presents Constant Rebalanced Portfolio first introduced by Thomas Itroducto The paper presets Costat Rebalaced Portfolo frst troduced by Thomas Cover. There are several weakesses of ths approach. Oe s that t s extremely hard to fd the optmal weghts ad the secod weakess

More information

Common p-belief: The General Case

Common p-belief: The General Case GAMES AND ECONOMIC BEHAVIOR 8, 738 997 ARTICLE NO. GA97053 Commo p-belef: The Geeral Case Atsush Kaj* ad Stephe Morrs Departmet of Ecoomcs, Uersty of Pesylaa Receved February, 995 We develop belef operators

More information

ANALYTICAL MODEL FOR TCP FILE TRANSFERS OVER UMTS. Janne Peisa Ericsson Research 02420 Jorvas, Finland. Michael Meyer Ericsson Research, Germany

ANALYTICAL MODEL FOR TCP FILE TRANSFERS OVER UMTS. Janne Peisa Ericsson Research 02420 Jorvas, Finland. Michael Meyer Ericsson Research, Germany ANALYTICAL MODEL FOR TCP FILE TRANSFERS OVER UMTS Jae Pesa Erco Research 4 Jorvas, Flad Mchael Meyer Erco Research, Germay Abstract Ths paper proposes a farly complex model to aalyze the performace of

More information