An Efficient Greedy Method for Unsupervised Feature Selection

This article has been accepted for publication at the 2011 IEEE 11th International Conference on Data Mining.

An Efficient Greedy Method for Unsupervised Feature Selection

Ahmed K. Farahat, Ali Ghodsi, Mohamed S. Kamel
University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
Email: {afarahat, aghodsib, mkamel}@uwaterloo.ca

Abstract—In data mining applications, data instances are typically described by a huge number of features. Most of these features are irrelevant or redundant, which negatively affects the efficiency and effectiveness of different learning algorithms. The selection of relevant features is a crucial task which can be used to allow a better understanding of data or improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels is still a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection which measures the reconstruction error of the data matrix based on the selected subset of features. The paper then presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far. The greedy algorithm is based on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison to the state-of-the-art methods for unsupervised feature selection.

Keywords—Feature Selection; Greedy Algorithms; Unsupervised Learning

I. INTRODUCTION

Data instances are typically described by a huge number of features. Most of these features are either redundant or irrelevant to the data mining task at hand. Having a large number of redundant and irrelevant features negatively affects the performance of the underlying learning algorithms, and makes them more computationally demanding. Therefore, reducing the dimensionality of the data is a fundamental task for machine learning and data mining applications.
Throughout past years, two approaches have been proposed for dimension reduction: feature selection and feature extraction. Feature selection (also known as variable selection or subset selection) searches for a relevant subset of existing features, while feature extraction (also known as feature transformation) learns a new set of features which combines existing features. These methods have been employed with both supervised and unsupervised learning, where in the case of supervised learning class labels are used to guide the selection or extraction of features.

Feature extraction methods produce a set of continuous vectors which represent data instances in the space of the extracted features. Accordingly, most of these methods obtain unique solutions in polynomial time, which makes them more attractive in terms of computational complexity. On the other hand, feature selection is a combinatorial optimization problem which is NP-hard, and most feature selection methods depend on heuristics to obtain a subset of relevant features in a manageable time. Nevertheless, feature extraction methods usually produce features which are difficult to interpret, and accordingly feature selection is more appealing in applications where understanding the meaning of features is crucial for data analysis.

Feature selection methods can be categorized into wrapper and filter methods. Wrapper methods wrap feature selection around the learning process and search for features which enhance the performance of the learning task. Filter methods, on the other hand, analyze the intrinsic properties of the data and select highly-ranked features according to some criterion before the learning task is performed. Wrapper methods are computationally more complex than filter methods, as they depend on deploying the learning models many times until a subset of relevant features is found.

This paper presents an effective filter method for unsupervised feature selection. The method is based on a novel criterion for feature selection which measures the reconstruction error of the data matrix based on the subset of selected features.
The paper presents a novel recursive formula for calculating the criterion function as well as an efficient greedy algorithm to select features. The greedy algorithm selects at each iteration the most representative feature among the remaining features, and then eliminates the effect of the selected feature from the data matrix. This step makes it less likely for the algorithm to select features that are similar to previously selected features, which accordingly reduces the redundancy among the selected features. In addition, the use of the recursive criterion makes the algorithm computationally feasible and memory efficient compared to the state-of-the-art methods for unsupervised feature selection.

The rest of this paper is organized as follows. Section II defines the notations used throughout the paper. Section III discusses previous work on filter methods for unsupervised feature selection. Section IV presents the proposed feature selection criterion. Section V presents a novel recursive formula for the feature selection criterion. Section VI proposes an effective greedy algorithm for feature selection as well as memory- and time-efficient variants of the algorithm. Section VII presents an empirical evaluation of the proposed method. Finally, Section VIII concludes the paper.

II. NOTATIONS

Throughout the paper, scalars, vectors, sets, and matrices are shown in small, small bold italic, script, and capital letters, respectively. In addition, the following notations are used.

For a vector $x \in \mathbb{R}^p$:
- $x_i$: the $i$-th element of $x$.
- $\|x\|$: the Euclidean norm ($\ell_2$-norm) of $x$.

For a matrix $A \in \mathbb{R}^{p \times q}$:
- $A_{ij}$: the $(i, j)$-th entry of $A$.
- $A_{i:}$: the $i$-th row of $A$.
- $A_{:j}$: the $j$-th column of $A$.
- $A_{S:}$: the sub-matrix of $A$ which consists of the set $S$ of rows.
- $A_{:S}$: the sub-matrix of $A$ which consists of the set $S$ of columns.
- $\tilde{A}$: a low-rank approximation of $A$.
- $\tilde{A}_S$: a rank-$k$ approximation of $A$ based on the set $S$ of columns, where $|S| = k$.
- $\|A\|_F$: the Frobenius norm of $A$: $\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}$.

III. PREVIOUS WORK

Many filter methods for unsupervised feature selection depend on the Principal Component Analysis (PCA) method [1] to search for the most representative features. PCA is the best-known method for unsupervised feature extraction; it finds the directions of maximum variance in the feature space (namely, the principal components). The principal components are also the directions that achieve the minimum reconstruction error for the data matrix. Jolliffe [1] suggests different algorithms that use PCA for unsupervised feature selection. In these algorithms, features are first associated with principal components based on the absolute values of their coefficients, and then the features corresponding to the first (or last) principal components are selected (or deleted). This can be done once or recursively (i.e., by first selecting or deleting some features and then recomputing the principal components based on the remaining features). Similarly, sparse PCA [2], a variant of PCA which produces sparse principal components, can also be used for feature selection. This can be done by selecting, for each principal component, the subset of features with non-zero coefficients.
However, Masaeli et al. [3] showed that these sparse coefficients may be distributed across different features and accordingly are not always useful for feature selection. Another iterative approach is suggested by Cui and Dy [4], in which the feature that is most correlated with the first principal component is selected, and then the other features are projected onto the direction orthogonal to that feature. These steps are repeated until the required number of features is selected. Lu et al. [5] suggest a different PCA-based approach which applies k-means clustering to the principal components, and then selects the features that are close to the cluster centroids. Boutsidis et al. [6], [7] propose a feature selection method that randomly samples features based on probabilities calculated using the k leading singular values of the data matrix. In [6], random sampling is used to reduce the number of candidate features, and then the required number of features is selected by applying a complex subset selection algorithm to the reduced matrix. In [7], the authors derive a theoretical guarantee for the error of k-means clustering when features are selected using random sampling. However, theoretical guarantees for other clustering algorithms were not explored in that work. Recently, Masaeli et al. [3] proposed an algorithm called Convex Principal Feature Selection (CPFS). CPFS formulates feature selection as a convex continuous optimization problem which minimizes the mean-squared reconstruction error of the data matrix (a PCA-like criterion) with sparsity constraints. This is a quadratic programming problem with linear constraints, which is solved using a projected quasi-Newton method.

Another category of unsupervised feature selection methods is based on selecting features that preserve similarities between data instances. Most of these methods first construct a k-nearest-neighbor graph over the data instances, and then select features that preserve the structure of that graph. Examples of these methods include the Laplacian score (LS) [8] and the spectral feature selection method (SPEC) [9].
The Laplacian score (LS) [8] calculates a score for each feature based on the graph Laplacian and degree matrices. This score quantifies how well each feature preserves the similarity between data instances and their neighbors in the graph. Spectral feature selection (SPEC) [9] extends this idea and presents a general framework for ranking features on a k-nearest-neighbor graph.

Some methods directly select features which preserve the cluster structure of the data. The Q-α algorithm [10] measures the goodness of a subset of features based on the clustering quality (namely, cluster coherence) when the data is represented using only those features. The authors define a feature weight vector and propose an iterative algorithm that alternates between calculating the cluster coherence based on the current weight vector and estimating a new weight vector that maximizes that coherence. This algorithm converges to a local optimum of the cluster coherence and produces a sparse weight vector that indicates which features should be selected. Recently, Cai et al. [11] proposed an algorithm called Multi-Cluster Feature Selection (MCFS) which selects a subset of features such that the multi-cluster structure of the data is preserved. To achieve that, the authors employ a method similar to spectral clustering [12], which first constructs a k-nearest-neighbor graph over the data instances and then solves a generalized eigenproblem over the graph Laplacian and degree matrices. After that, for each

eigenvector, an L1-regularized regression problem is solved to represent the eigenvector using a sparse combination of features. Features are then assigned scores based on these coefficients, and highly scored features are selected. The authors show experimentally that the MCFS algorithm outperforms the Laplacian score (LS) and the Q-α algorithm.

Another well-known approach for unsupervised feature selection is the Feature Selection using Feature Similarity (FSFS) method suggested by Mitra et al. [13]. The method groups features into clusters and then selects a representative feature for each cluster. To group features, the algorithm starts by calculating pairwise similarities between features, and then it constructs a k-nearest-neighbor graph over the features. The algorithm then selects the feature with the most compact neighborhood and removes all its neighbors. This process is repeated on the remaining features until all features are either selected or removed. The authors also suggested a new feature similarity measure, namely maximal information compression, which quantifies the minimum amount of information loss when one feature is represented by the other.

In comparison to previous work, the greedy feature selection method proposed in this paper uses a PCA-like criterion which minimizes the reconstruction error of the data matrix based on the selected subset of features. In contrast to traditional PCA-based methods, the proposed algorithm does not calculate the principal components, which is computationally demanding. Unlike the Laplacian score (LS) [8] and its extension SPEC [9], the greedy feature selection method does not depend on calculating pairwise similarities between instances. It also does not calculate an eigenvalue decomposition of the similarity matrix, as the Q-α algorithm [10] and Multi-Cluster Feature Selection (MCFS) [11] do. The feature selection criterion presented in this paper is similar to that of Convex Principal Feature Selection (CPFS) [3], as both minimize the reconstruction error of the data matrix.
While the method presented here uses a greedy algorithm to minimize a discrete optimization problem, CPFS solves a quadratic programming problem with sparsity constraints. In addition, the number of features selected by CPFS depends on a regularization parameter λ, which is difficult to tune. Similar to the method proposed by Cui and Dy [4], the method presented in this paper removes the effect of each selected feature by projecting the other features onto the direction orthogonal to that selected feature. However, the method proposed by Cui and Dy is computationally very complex, as it requires the calculation of the first principal component of the whole matrix after each iteration. The Feature Selection using Feature Similarity (FSFS) method [13] employs a similar greedy approach which selects the most representative feature and then eliminates its neighbors in the feature similarity graph. The FSFS method, however, depends on a computationally complex measure for calculating the similarity between features. As shown in Section VII, experiments on real data sets show that the proposed algorithm outperforms the Feature Selection using Feature Similarity (FSFS) method [13], the Laplacian score (LS) [8], and Multi-Cluster Feature Selection (MCFS) [11] when applied with different clustering algorithms.

IV. FEATURE SELECTION CRITERION

This section defines a novel criterion for unsupervised feature selection. The criterion measures the reconstruction error of the data matrix based on the selected subset of features. The goal of the proposed feature selection algorithm is to select a subset of features that minimizes this reconstruction error.

Definition 1 (Unsupervised Feature Selection Criterion): Let $A$ be an $m \times n$ data matrix whose rows represent the set of data instances and whose columns represent the set of features. The feature selection criterion is defined as:

$F(S) = \|A - P^{(S)} A\|_F^2$

where $S$ is the set of the indices of the selected features, and $P^{(S)}$ is an $m \times m$ projection matrix which projects the columns of $A$ onto the span of the set $S$ of columns.
The criterion $F(S)$ represents the sum of squared errors between the original data matrix $A$ and its rank-$k$ approximation based on the selected set of features (where $k = |S|$):

$\tilde{A}_S = P^{(S)} A. \quad (1)$

The projection matrix $P^{(S)}$ can be calculated as:

$P^{(S)} = A_{:S} (A_{:S}^T A_{:S})^{-1} A_{:S}^T \quad (2)$

where $A_{:S}$ is the sub-matrix of $A$ which consists of the columns corresponding to $S$. It should be noted that if the subset of features $S$ is known, the projection matrix $P^{(S)}$ gives the closed-form solution of the least-squares problem which minimizes $F(S)$.

The goal of the feature selection algorithm presented in this paper is to select a subset $S$ of features such that $F(S)$ is minimized.

Problem 1 (Unsupervised Feature Selection): Find a subset of features $L$ such that

$L = \arg\min_{S} F(S).$

This is an NP-hard combinatorial optimization problem. In Section V, a recursive formula for the selection criterion is presented. This formula allows the development of an efficient algorithm to greedily minimize $F(S)$. The greedy algorithm is presented in Section VI.

V. RECURSIVE SELECTION CRITERION

In this section, a recursive formula is derived for the feature selection criterion presented in Section IV. This formula is based on a recursive formula for the projection matrix $P^{(S)}$, which can be derived as follows.
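As an illustrative aside (not part of the paper), the criterion of Definition 1 with the projection of Eq. (2) can be evaluated in a few lines of NumPy; the function name below is ours, and the least-squares solve avoids forming $P^{(S)}$ explicitly.

```python
import numpy as np

def selection_criterion(A, S):
    """F(S) = ||A - P(S) A||_F^2, with P(S) the projection onto span(A[:, S])."""
    A_S = A[:, list(S)]
    # P(S) A = A_S (A_S^T A_S)^{-1} A_S^T A, computed via a linear solve
    coef = np.linalg.solve(A_S.T @ A_S, A_S.T @ A)
    return np.linalg.norm(A - A_S @ coef, 'fro') ** 2

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
# Adding features can only reduce the reconstruction error
assert selection_criterion(A, [0, 1]) <= selection_criterion(A, [0]) + 1e-9
# Selecting all columns of a full-rank matrix reconstructs A exactly
assert selection_criterion(A, range(5)) < 1e-12
```

Minimizing this quantity over all subsets $S$ of a given size is exactly Problem 1.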

Lemma 1: Given a set of features $S$. For any $P \subset S$,

$P^{(S)} = P^{(P)} + R^{(R)}$

where $R^{(R)}$ is a projection matrix which projects the columns of $E = A - P^{(P)} A$ onto the span of the subset $R = S \setminus P$ of columns:

$R^{(R)} = E_{:R} (E_{:R}^T E_{:R})^{-1} E_{:R}^T.$

Proof: Define a matrix $B = A_{:S}^T A_{:S}$ which represents the inner-products over the columns of the sub-matrix $A_{:S}$. The projection matrix $P^{(S)}$ can be written as:

$P^{(S)} = A_{:S} B^{-1} A_{:S}^T \quad (3)$

Without loss of generality, the columns and rows of $A_{:S}$ and $B$ in Eq. (3) can be rearranged such that the first sets of rows and columns correspond to $P$:

$A_{:S} = \begin{bmatrix} A_{:P} & A_{:R} \end{bmatrix}, \quad B = \begin{bmatrix} B_{PP} & B_{PR} \\ B_{PR}^T & B_{RR} \end{bmatrix}$

where $B_{PP} = A_{:P}^T A_{:P}$, $B_{PR} = A_{:P}^T A_{:R}$ and $B_{RR} = A_{:R}^T A_{:R}$. Let $S = B_{RR} - B_{PR}^T B_{PP}^{-1} B_{PR}$ be the Schur complement [14] of $B_{PP}$ in $B$. Use the block-wise inversion formula [14] for $B^{-1}$ and substitute $A_{:S}$ and $B^{-1}$ into Eq. (3):

$P^{(S)} = \begin{bmatrix} A_{:P} & A_{:R} \end{bmatrix} \begin{bmatrix} B_{PP}^{-1} + B_{PP}^{-1} B_{PR} S^{-1} B_{PR}^T B_{PP}^{-1} & -B_{PP}^{-1} B_{PR} S^{-1} \\ -S^{-1} B_{PR}^T B_{PP}^{-1} & S^{-1} \end{bmatrix} \begin{bmatrix} A_{:P}^T \\ A_{:R}^T \end{bmatrix}$

The right-hand side can be simplified to:

$P^{(S)} = A_{:P} B_{PP}^{-1} A_{:P}^T + \left(A_{:R} - A_{:P} B_{PP}^{-1} B_{PR}\right) S^{-1} \left(A_{:R}^T - B_{PR}^T B_{PP}^{-1} A_{:P}^T\right) \quad (4)$

The first term of Eq. (4) is the projection matrix which projects the columns of $A$ onto the span of the subset $P$ of columns: $P^{(P)} = A_{:P} B_{PP}^{-1} A_{:P}^T$. The second term can be simplified as follows. Let $E$ be the $m \times n$ residual matrix $E = A - P^{(P)} A$. It can be shown that $E_{:R} = A_{:R} - A_{:P} B_{PP}^{-1} B_{PR}$ and $S = E_{:R}^T E_{:R}$. Hence, the second term of Eq. (4) is the projection matrix which projects the columns of $E$ onto the span of the subset $R$ of columns:

$R^{(R)} = E_{:R} (E_{:R}^T E_{:R})^{-1} E_{:R}^T. \quad (5)$

This proves that $P^{(S)}$ can be written in terms of $P^{(P)}$ and $R^{(R)}$ as $P^{(S)} = P^{(P)} + R^{(R)}$.

This means that the projection matrix $P^{(S)}$ can be constructed in a recursive manner, by first calculating the projection matrix which projects the columns of $A$ onto the span of the subset $P$ of columns, and then calculating the projection matrix which projects the columns of the residual matrix onto the span of the remaining columns.
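The identity in Lemma 1 is easy to check numerically. The sketch below (our illustration, not from the paper) verifies it for a small random matrix.

```python
import numpy as np

def proj(M, idx):
    """Projection matrix onto the span of the columns M[:, idx]."""
    C = M[:, idx]
    return C @ np.linalg.solve(C.T @ C, C.T)

rng = np.random.default_rng(1)
A = rng.standard_normal((7, 6))
S, P, R = [0, 2, 4], [0, 2], [4]
E = A - proj(A, P) @ A          # residual after projecting out the columns in P
lhs = proj(A, S)
rhs = proj(A, P) + proj(E, R)   # Lemma 1: P(S) = P(P) + R(R)
assert np.allclose(lhs, rhs)
```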
Based on this lemma, a recursive formula can be developed for $\tilde{A}_S$.

Corollary 1: Given a matrix $A$ and a subset of columns $S$. For any $P \subset S$,

$\tilde{A}_S = \tilde{A}_P + \tilde{E}_R$

where $E = A - P^{(P)} A$, and $\tilde{E}_R$ is the low-rank approximation of $E$ based on the subset $R = S \setminus P$ of columns.

Proof: Using Lemma 1 and substituting for $P^{(S)}$ in Eq. (1) gives:

$\tilde{A}_S = P^{(P)} A + E_{:R} (E_{:R}^T E_{:R})^{-1} E_{:R}^T A \quad (6)$

The first term is the low-rank approximation of $A$ based on $P$: $\tilde{A}_P = P^{(P)} A$. The second term is equal to $\tilde{E}_R$, as $E_{:R}^T A = E_{:R}^T E$. To prove that, multiplying $E_{:R}^T$ by $E = A - P^{(P)} A$ gives:

$E_{:R}^T E = E_{:R}^T A - E_{:R}^T P^{(P)} A.$

Using $E_{:R} = A_{:R} - P^{(P)} A_{:R}$, the expression $E_{:R}^T P^{(P)}$ can be written as:

$E_{:R}^T P^{(P)} = A_{:R}^T P^{(P)} - A_{:R}^T P^{(P)} P^{(P)}.$

This is equal to $0$, as $P^{(P)} P^{(P)} = P^{(P)}$ (a property of projection matrices). This means that $E_{:R}^T A = E_{:R}^T E$. Substituting $E_{:R}^T A$ with $E_{:R}^T E$ in Eq. (6) proves the corollary.

Based on Corollary 1, a recursive formula for the feature selection criterion can be developed as follows.

Theorem 2: Given a set of features $S$. For any $P \subset S$,

$F(S) = F(P) - \|\tilde{E}_R\|_F^2$

where $E = A - P^{(P)} A$, and $\tilde{E}_R$ is the low-rank approximation of $E$ based on the subset $R = S \setminus P$ of columns.

Proof: Substituting for $\tilde{A}_S$ in the definition of $F(S)$ gives:

$F(S) = \|A - \tilde{A}_S\|_F^2 = \|A - \tilde{A}_P - \tilde{E}_R\|_F^2 = \|E - \tilde{E}_R\|_F^2$

Using the relation between the Frobenius norm and the trace function, $\|M\|_F^2 = \mathrm{trace}(M^T M)$, the right-hand side can be expressed as:

$\|E - \tilde{E}_R\|_F^2 = \mathrm{trace}\!\left((E - \tilde{E}_R)^T (E - \tilde{E}_R)\right) = \mathrm{trace}\!\left(E^T E - 2 E^T \tilde{E}_R + \tilde{E}_R^T \tilde{E}_R\right)$

As $R^{(R)} R^{(R)} = R^{(R)}$, the expression $\tilde{E}_R^T \tilde{E}_R$ can be written as:

$\tilde{E}_R^T \tilde{E}_R = E^T R^{(R)} R^{(R)} E = E^T R^{(R)} E = E^T \tilde{E}_R$

This means that:

$F(S) = \|E - \tilde{E}_R\|_F^2 = \mathrm{trace}\!\left(E^T E - \tilde{E}_R^T \tilde{E}_R\right) = \|E\|_F^2 - \|\tilde{E}_R\|_F^2.$

Replacing $\|E\|_F^2$ with $F(P)$ proves the theorem.

The term $\|\tilde{E}_R\|_F^2$ represents the decrease in reconstruction error achieved by adding the subset $R$ of features to $P$. In
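Theorem 2 can likewise be checked numerically; the snippet below (our illustration) confirms that the criterion decreases by exactly $\|\tilde{E}_R\|_F^2$ when the columns $R$ are added to $P$.

```python
import numpy as np

def proj(M, idx):
    """Projection matrix onto the span of the columns M[:, idx]."""
    C = M[:, idx]
    return C @ np.linalg.solve(C.T @ C, C.T)

def F(A, idx):
    """Reconstruction error F(S) = ||A - P(S) A||_F^2."""
    return np.linalg.norm(A - proj(A, idx) @ A, 'fro') ** 2

rng = np.random.default_rng(2)
A = rng.standard_normal((9, 7))
S, P, R = [1, 3, 5], [1, 3], [5]
E = A - proj(A, P) @ A          # residual after projecting out columns P
E_R_tilde = proj(E, R) @ E      # low-rank approximation of E from columns R
# Theorem 2: F(S) = F(P) - ||E_R_tilde||_F^2
assert np.isclose(F(A, S), F(A, P) - np.linalg.norm(E_R_tilde, 'fro') ** 2)
```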

the following section, a novel greedy heuristic is presented to optimize the feature selection criterion based on this recursive formula.

VI. GREEDY SELECTION ALGORITHM

This section presents an efficient greedy algorithm to optimize the feature selection criterion presented in Section IV. The algorithm selects at each iteration one feature such that the reconstruction error for the new set of features is minimized. This problem can be formulated as follows.

Problem 2 (Greedy Feature Selection): At iteration $t$, find feature $l$ such that

$l = \arg\min_{i} F(S \cup \{i\}) \quad (7)$

where $S$ is the set of features selected during the first $t - 1$ iterations.

A naïve implementation of the greedy algorithm is to calculate the reconstruction error for each candidate feature and then select the feature with the smallest error. This implementation is, however, computationally very complex, as it requires $O(m^2 n^2)$ floating-point operations per iteration. A more efficient approach is to use the recursive formula for calculating the reconstruction error. Using Theorem 2, $F(S \cup \{i\}) = F(S) - \|\tilde{E}_{\{i\}}\|_F^2$, where $E = A - \tilde{A}_S$. Since $F(S)$ is a constant for all candidate features, an equivalent criterion is:

$l = \arg\max_{i} \|\tilde{E}_{\{i\}}\|_F^2 \quad (8)$

This formulation selects the feature which achieves the maximum decrease in reconstruction error. The new objective function $\|\tilde{E}_{\{i\}}\|_F^2$ can be simplified as follows:

$\|\tilde{E}_{\{i\}}\|_F^2 = \mathrm{trace}\!\left(\tilde{E}_{\{i\}}^T \tilde{E}_{\{i\}}\right) = \mathrm{trace}\!\left(E^T E_{:i} \left(E_{:i}^T E_{:i}\right)^{-1} E_{:i}^T E\right) = \frac{1}{E_{:i}^T E_{:i}}\, \mathrm{trace}\!\left(E^T E_{:i} E_{:i}^T E\right) = \frac{\|E^T E_{:i}\|^2}{E_{:i}^T E_{:i}}. \quad (9)$

This defines the following simplified problem.

Problem 3 (Simplified Greedy Feature Selection): At iteration $t$, find feature $l$ such that

$l = \arg\max_{i} \frac{\|E^T E_{:i}\|^2}{E_{:i}^T E_{:i}}$

where $E = A - \tilde{A}_S$, and $S$ is the set of features selected during the first $t - 1$ iterations.

The computational complexity of this selection criterion is $O(n^2 m)$ per iteration, and it requires $O(nm)$ memory to store the residual of the whole matrix, $E$, after each iteration. In the rest of this section, two novel techniques are proposed to reduce the memory and time requirements of this selection criterion.
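Problem 3 translates directly into a short routine. The sketch below (our illustration; names and the numerical tolerance are ours) keeps the residual $E$ explicitly and deflates it against each selected column, which is the $O(n^2 m)$-per-iteration baseline that the following subsections improve on.

```python
import numpy as np

def greedy_select(A, k):
    """Greedy selection per Problem 3: score each candidate feature by
    ||E^T E_:i||^2 / (E_:i^T E_:i), storing the residual E explicitly."""
    E = np.array(A, dtype=float)
    selected = []
    for _ in range(k):
        G = E.T @ E                    # inner-products of residual columns
        d = np.diag(G).copy()
        d[d < 1e-12] = np.inf          # columns already in the span score 0
        scores = (G ** 2).sum(axis=0) / d
        i = int(np.argmax(scores))
        selected.append(i)
        # remove the selected feature's contribution from the residual
        E -= np.outer(E[:, i], E[:, i] @ E) / G[i, i]
    return selected

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 10))
S = greedy_select(A, 4)
assert len(set(S)) == 4                # four distinct features are selected
```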
A. Memory-Efficient Criterion

This section proposes a memory-efficient algorithm to calculate the simplified feature selection criterion without explicitly calculating and storing the residual matrix $E$ at each iteration. The algorithm is based on a recursive formula for calculating the residual matrix $E$.

Let $S^{(t)}$ denote the set of features selected during the first $t - 1$ iterations, $E^{(t)}$ denote the residual matrix at the start of the $t$-th iteration (i.e., $E^{(t)} = A - \tilde{A}_{S^{(t)}}$), and $l^{(t)}$ be the feature selected at iteration $t$. The following lemma gives a recursive formula for the residual matrix at the end of iteration $t$, $E^{(t+1)}$.

Lemma 2: $E^{(t+1)}$ can be calculated recursively as:

$E^{(t+1)} = \left(E - \frac{E_{:l} E_{:l}^T}{E_{:l}^T E_{:l}}\, E\right)^{(t)}.$

Proof: Using Corollary 1, $\tilde{A}_{S \cup \{l\}} = \tilde{A}_S + \tilde{E}_{\{l\}}$. Subtracting both sides from $A$, and substituting $A - \tilde{A}_{S \cup \{l\}}$ and $A - \tilde{A}_S$ with $E^{(t+1)}$ and $E^{(t)}$ respectively, gives:

$E^{(t+1)} = \left(E - \tilde{E}_{\{l\}}\right)^{(t)}$

Using Eqs. (1) and (2), $\tilde{E}_{\{l\}}$ can be expressed as $E_{:l} (E_{:l}^T E_{:l})^{-1} E_{:l}^T E$. Substituting this formula for $\tilde{E}_{\{l\}}$ in the above equation proves the lemma.

Let $G$ be an $n \times n$ matrix which represents the inner-products over the columns of the residual matrix $E$: $G = E^T E$. The following corollary is a direct result of Lemma 2.

Corollary 3: $G^{(t+1)}$ can be calculated recursively as:

$G^{(t+1)} = \left(G - \frac{G_{:l} G_{:l}^T}{G_{ll}}\right)^{(t)}.$

Proof: This corollary can be proved by substituting $E^{(t+1)}$ (Lemma 2) in $G^{(t+1)} = E^{(t+1)T} E^{(t+1)}$, and using the fact that $E^T E_{:l} (E_{:l}^T E_{:l})^{-1} E_{:l}^T E_{:l} (E_{:l}^T E_{:l})^{-1} E_{:l}^T E = E^T E_{:l} (E_{:l}^T E_{:l})^{-1} E_{:l}^T E$.

To simplify the derivation of the memory-efficient algorithm, at iteration $t$ define $\delta = G_{:l}$ and $\omega = G_{:l} / \sqrt{G_{ll}} = \delta / \sqrt{\delta_l}$. This means that $G^{(t+1)}$ can be calculated in terms of $G^{(t)}$ and $\omega^{(t)}$ as:

$G^{(t+1)} = \left(G - \omega \omega^T\right)^{(t)}, \quad (10)$

or in terms of $A$ and previous $\omega$'s as:

$G^{(t+1)} = A^T A - \sum_{r=1}^{t} \left(\omega \omega^T\right)^{(r)}. \quad (11)$

$\delta^{(t)}$ and $\omega^{(t)}$ can be calculated in terms of $A$ and previous $\omega$'s as follows:

$\delta^{(t)} = A^T A_{:l} - \sum_{r=1}^{t-1} \omega_l^{(r)} \omega^{(r)}, \qquad \omega^{(t)} = \delta^{(t)} / \sqrt{\delta_l^{(t)}}.$
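The rank-one update of Corollary 3 (Eq. (10)) can be verified directly; the fragment below (our illustration) deflates $E$ against one column and checks that the updated $G$ matches $E^T E$ recomputed from scratch.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 5))
G = A.T @ A                                  # G = E^T E, with E = A initially
l = 2
omega = G[:, l] / np.sqrt(G[l, l])           # omega = delta / sqrt(delta_l)
G_next = G - np.outer(omega, omega)          # Eq. (10)
# direct computation: deflate E against column l, then recompute E^T E
E = A - np.outer(A[:, l], A[:, l] @ A) / G[l, l]
assert np.allclose(G_next, E.T @ E)
```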

The simplified feature selection criterion can be expressed in terms of $G$ as:

$l = \arg\max_{i} \frac{\|G_{:i}\|^2}{G_{ii}}$

The following theorem gives recursive formulas for calculating the simplified feature selection criterion without explicitly calculating $E$ or $G$.

Theorem 4: Let $f_i = \|G_{:i}\|^2$ and $g_i = G_{ii}$ be the numerator and denominator of the simplified criterion function for a feature $i$ respectively, $f = [f_i]_{i=1..n}$, and $g = [g_i]_{i=1..n}$. Then,

$f^{(t)} = \left(f - 2\, \omega \circ \left(A^T A\, \omega - \sum_{r=1}^{t-2} \omega^{(r)} \left(\omega^{(r)T} \omega\right)\right) + \|\omega\|^2 \left(\omega \circ \omega\right)\right)^{(t-1)},$

$g^{(t)} = \left(g - \omega \circ \omega\right)^{(t-1)},$

where $\circ$ represents the Hadamard product operator.

Proof: Based on Eq. (10), $f_i^{(t)}$ can be calculated as:

$f_i^{(t)} = \left(\|G_{:i}\|^2\right)^{(t)} = \left(\|G_{:i} - \omega\, \omega_i\|^2\right)^{(t-1)} = \left(G_{:i}^T G_{:i} - 2\, \omega_i\, G_{:i}^T \omega + \omega_i^2 \|\omega\|^2\right)^{(t-1)} = \left(f_i - 2\, \omega_i\, G_{:i}^T \omega + \omega_i^2 \|\omega\|^2\right)^{(t-1)}. \quad (12)$

Similarly, $g_i^{(t)}$ can be calculated as:

$g_i^{(t)} = G_{ii}^{(t)} = \left(G_{ii} - \omega_i^2\right)^{(t-1)} = \left(g_i - \omega_i^2\right)^{(t-1)}. \quad (13)$

Let $f = [f_i]_{i=1..n}$ and $g = [g_i]_{i=1..n}$. Then $f^{(t)}$ and $g^{(t)}$ can be expressed as:

$f^{(t)} = \left(f - 2\, \omega \circ (G\omega) + \|\omega\|^2 (\omega \circ \omega)\right)^{(t-1)}, \qquad g^{(t)} = \left(g - \omega \circ \omega\right)^{(t-1)}, \quad (14)$

where $\circ$ represents the Hadamard product operator, and $\|\cdot\|$ is the $\ell_2$-norm. Based on the recursive formula for $G$ (Eq. (11)), the term $G\omega$ at iteration $(t - 1)$ can be expressed as:

$G\omega = \left(A^T A - \sum_{r=1}^{t-2} \left(\omega \omega^T\right)^{(r)}\right) \omega = A^T A\, \omega - \sum_{r=1}^{t-2} \omega^{(r)} \left(\omega^{(r)T} \omega\right) \quad (15)$

Substituting for $G\omega$ in Eq. (14) gives the update formulas for $f$ and $g$.

This means that the greedy criterion can be made memory-efficient by only maintaining two score variables for each feature, $f_i$ and $g_i$, and updating them at each iteration based on their previous values and the features selected so far.

B. Partition-Based Criterion

The simplified feature selection criterion calculates, at each iteration, the inner-products between each candidate feature $E_{:i}$ and all other features. The computational complexity of these inner-products is $O(nm)$ per candidate feature (or $O(n^2 m)$ per iteration). When the memory-efficient update formulas are used, the computational complexity is reduced to $O(nm)$ per iteration (that of calculating $A^T A \omega$). However, the complexity of calculating the initial value of $f$ is still $O(n^2 m)$.
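Before turning to the partition-based variant, the memory-efficient recursions of Theorem 4 can be sketched end-to-end. The routine below is our unofficial illustration of the basic (non-partitioned) greedy algorithm: only the score vectors $f$ and $g$ and the $\omega$ vectors are maintained, and the residual $E$ is never formed. The function name and the small clipping tolerance are ours; the $O(n^2 m)$ one-time initialization of $f$ matches the discussion above.

```python
import numpy as np

def greedy_feature_selection(A, k):
    """Greedy selection via the recursions of Theorem 4 (basic variant,
    no partitioning): maintain f_i = ||G_:i||^2, g_i = G_ii, and the
    omega vectors; the residual matrix E is never formed."""
    A = np.asarray(A, dtype=float)
    f = ((A.T @ A) ** 2).sum(axis=0)          # O(n^2 m) one-time initialization
    g = np.einsum('ij,ij->j', A, A)           # g_i = A_:i^T A_:i
    omegas, selected = [], []
    for _ in range(k):
        scores = f / np.maximum(g, 1e-12)     # a selected feature scores 0
        scores[selected] = -np.inf
        l = int(np.argmax(scores))
        selected.append(l)
        # delta = G_:l in terms of A and the previous omegas (Step b)
        delta = A.T @ A[:, l] - sum(w[l] * w for w in omegas)
        omega = delta / np.sqrt(delta[l])
        # G omega without forming G (Eq. (15))
        Gw = A.T @ (A @ omega) - sum(w * (w @ omega) for w in omegas)
        f = f - 2 * omega * Gw + (omega @ omega) * omega ** 2   # Eq. (14)
        g = g - omega ** 2
        omegas.append(omega)
    return selected

rng = np.random.default_rng(0)
A = rng.standard_normal((15, 8))
sel = greedy_feature_selection(A, 3)
assert len(set(sel)) == 3
```

On random data this reproduces the same selections as the explicit-residual formulation of Problem 3, while storing only $O(nk)$ values beyond $A$.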
In order to reduce this computational complexity, a novel partition-based criterion is proposed, which reduces the number of inner-products to be calculated at each iteration. The criterion partitions the features into $c \ll n$ random groups and selects the feature which best represents the centroids of these groups. Let $P_j$ be the set of features that belong to the $j$-th partition, $P = \{P_1, P_2, \ldots, P_c\}$ be a random partitioning of the features into $c$ groups, and $B$ be an $m \times c$ matrix whose $j$-th column is the sum of the feature vectors that belong to the $j$-th group: $B_{:j} = \sum_{r \in P_j} A_{:r}$. The use of the sum function (instead of the mean) weights each column of $B$ with the size of the corresponding group. This avoids any bias towards larger groups when calculating the sum of inner-products. The simplified selection criterion can be written as follows.

Problem 4 (Simplified Partition-Based Greedy Feature Selection): At iteration $t$, find feature $l$ such that

$l = \arg\max_{i} \frac{\|F^T E_{:i}\|^2}{E_{:i}^T E_{:i}} \quad (16)$

where $E = A - \tilde{A}_S$, $S$ is the set of features selected during the first $t - 1$ iterations, $F_{:j} = \sum_{r \in P_j} E_{:r}$, and $P = \{P_1, P_2, \ldots, P_c\}$ is a random partitioning of the features into $c$ groups.

Similar to $E$ (Lemma 2), $F$ can be calculated in a recursive manner as:

$F^{(t+1)} = \left(F - \frac{E_{:l} E_{:l}^T}{E_{:l}^T E_{:l}}\, F\right)^{(t)}.$

This means that the random partitioning can be done once at the start of the algorithm. After that, $F$ is initialized to $B$ and then updated recursively using the above formula. The computational complexity of calculating $B$ is $O(nm)$ if the data matrix is full. However, this complexity could be considerably reduced if the data matrix is very sparse.

Further, a memory-efficient variant of the partition-based algorithm can be developed as follows. Let $H$ be a $c \times n$ matrix whose element $H_{ji}$ is the inner-product of the centroid of the $j$-th group and the $i$-th feature, weighted by the size of the $j$-th group: $H = F^T E$. Similarly, $H$ can be calculated recursively as:

$H^{(t+1)} = \left(H - \frac{H_{:l} G_{:l}^T}{G_{ll}}\right)^{(t)}.$

Define $\gamma = H_{:l}$ and $\upsilon = H_{:l} / \sqrt{G_{ll}} = \gamma / \sqrt{\delta_l}$. $H^{(t+1)}$ can be calculated in terms of $H^{(t)}$, $\upsilon^{(t)}$ and $\omega^{(t)}$ as:

$H^{(t+1)} = \left(H - \upsilon \omega^T\right)^{(t)}, \quad (17)$

or in terms of $A$ and previous $\omega$'s and $\upsilon$'s as:

$H^{(t+1)} = B^T A - \sum_{r=1}^{t} \left(\upsilon \omega^T\right)^{(r)}. \quad (18)$

$\gamma^{(t)}$ and $\upsilon^{(t)}$ can be calculated in terms of $A$, $B$ and previous $\omega$'s and $\upsilon$'s as follows:

$\gamma^{(t)} = B^T A_{:l} - \sum_{r=1}^{t-1} \omega_l^{(r)} \upsilon^{(r)}, \qquad \upsilon^{(t)} = \gamma^{(t)} / \sqrt{\delta_l^{(t)}}.$

The simplified partition-based selection criterion can be expressed in terms of $H$ and $G$ as:

$l = \arg\max_{i} \frac{\|H_{:i}\|^2}{G_{ii}}$

Similar to Theorem 4, the following theorem derives recursive formulas for the simplified partition-based criterion function.

Theorem 5: Let $f_i = \|H_{:i}\|^2$ and $g_i = G_{ii}$ be the numerator and denominator of the partition-based simplified criterion function for a feature $i$ respectively, $f = [f_i]_{i=1..n}$, and $g = [g_i]_{i=1..n}$. Then,

$f^{(t)} = \left(f - 2\, \omega \circ \left(A^T B\, \upsilon - \sum_{r=1}^{t-2} \omega^{(r)} \left(\upsilon^{(r)T} \upsilon\right)\right) + \|\upsilon\|^2 \left(\omega \circ \omega\right)\right)^{(t-1)},$

$g^{(t)} = \left(g - \omega \circ \omega\right)^{(t-1)},$

where $\circ$ represents the Hadamard product operator.

Proof: The proof is similar to that of Theorem 4. It can be easily derived by using the recursive formula for $H_{:i}$ instead of that for $G_{:i}$.

In these update formulas, $A^T B$ can be calculated once and then used in different iterations. This makes the computational complexity of the new update formulas $O(nc)$ per iteration. Algorithm 1 shows the complete greedy algorithm. The computational complexity of the algorithm is dominated by that of calculating $A^T A_{:l}$ in Step (b), which is $O(mn)$ per iteration. The other complex step is the calculation of the initial $f$, which is $O(mnc)$. However, these steps can be implemented in an efficient way if the data matrix is sparse. The total complexity of the algorithm is $O(\max(mnk, mnc))$, where $k$ is the number of selected features and $c$ is the number of random partitions.
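To make the partitioning step concrete, the following fragment (our illustration; variable names are not from the paper) builds $B$ for a random partitioning and evaluates the initial partition-based scores $f_i = \|B^T A_{:i}\|^2$ and $g_i = A_{:i}^T A_{:i}$; with $n$ singleton groups, $B = A$ and the score reduces to the full simplified criterion.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, c = 12, 9, 3
A = rng.standard_normal((m, n))

# Random partitioning of the n features into c groups; B_{:j} is the SUM of
# the feature vectors in group j (sum, not mean, weights by group size).
groups = rng.integers(0, c, size=n)
B = np.zeros((m, c))
for j in range(c):
    B[:, j] = A[:, groups == j].sum(axis=1)

# Initial partition-based scores (E = A before any feature is selected)
f = ((B.T @ A) ** 2).sum(axis=0)             # f_i = ||B^T A_:i||^2
g = np.einsum('ij,ij->j', A, A)              # g_i = A_:i^T A_:i
first = int(np.argmax(f / g))
assert 0 <= first < n

# Sanity check: with n singleton groups, B equals A and the partition-based
# score coincides with the full criterion ||A^T A_:i||^2 / (A_:i^T A_:i).
B1 = np.zeros((m, n))
for j in range(n):
    B1[:, j] = A[:, np.arange(n) == j].sum(axis=1)
assert np.allclose(((B1.T @ A) ** 2).sum(axis=0), ((A.T @ A) ** 2).sum(axis=0))
```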
Algorithm 1: Greedy Feature Selection
Inputs: Data matrix $A$, number of features $k$
Outputs: Selected features $S$
Steps:
1) Initialize $S = \{\,\}$. Generate a random partitioning $P$, and calculate $B$: $B_{:j} = \sum_{r \in P_j} A_{:r}$
2) Initialize $f^{(0)}_i = \|B^T A_{:i}\|^2$ and $g^{(0)}_i = A_{:i}^T A_{:i}$
3) Repeat $t = 1 \ldots k$:
   a) $l = \arg\max_i f_i^{(t)} / g_i^{(t)}$, $S = S \cup \{l\}$
   b) $\delta^{(t)} = A^T A_{:l} - \sum_{r=1}^{t-1} \omega_l^{(r)} \omega^{(r)}$
   c) $\gamma^{(t)} = B^T A_{:l} - \sum_{r=1}^{t-1} \omega_l^{(r)} \upsilon^{(r)}$
   d) $\omega^{(t)} = \delta^{(t)} / \sqrt{\delta_l^{(t)}}$, $\upsilon^{(t)} = \gamma^{(t)} / \sqrt{\delta_l^{(t)}}$
   e) Update the $f$'s and $g$'s (Theorem 5)

VII. EXPERIMENTS AND RESULTS

Experiments have been conducted on four benchmark data sets, whose properties are summarized in Table I. These data sets were recently used by Cai et al. [11] to evaluate different feature selection methods in comparison to the Multi-Cluster Feature Selection (MCFS) method². In this section, seven methods for unsupervised feature selection are compared³:

1) PCA-LRG: a PCA-based method that selects the features associated with the first $k$ principal components [1]. It has been shown by Masaeli et al. [3] that this method achieves a low reconstruction error of the data matrix compared to other PCA-based methods⁴.
2) FSFS: the Feature Selection using Feature Similarity [13] method, with maximal information compression as the feature similarity measure.
3) LS: the Laplacian Score [8] method.
4) SPEC: the spectral feature selection method [9], using all the eigenvectors of the graph Laplacian.
5) MCFS: the Multi-Cluster Feature Selection [11] method, which has been shown to outperform other methods that preserve the cluster structure of the data.
6) The basic greedy algorithm presented in this paper, using the recursive update formulas for $f$ and $g$ but without random partitioning.
7) The partition-based greedy algorithm (Algorithm 1).

² Data sets are available at:
³ The following implementations were used: pabtra/paper/fsfs.tar.gz; uns spec.zp; p.m
⁴ The CPFS method was not included in the comparison, as its implementation details were not completely specified in [3].

Similar to previous work [8], [11], the feature selection methods were compared based on their performance in clustering tasks. Two clustering algorithms were used to compare the different methods: the well-known k-means algorithm [15] and the state-of-the-art affinity propagation (AP) algorithm [16]. For each feature selection method, the k-means algorithm is applied to the rows of the data matrix whose columns are the subset of selected features. For affinity propagation, a distance matrix is first calculated based on the selected subset of features, and then the algorithm is applied to the negative of this distance matrix. The preference vector, which controls the number of clusters, is set to the median of each column of the similarity matrix, as suggested by Frey and Dueck [16]. After the clustering is performed using the subset of selected features, the cluster labels are compared to ground-truth labels provided by human annotators, and the Normalized Mutual Information (NMI) [17] between the clustering labels and the class labels is calculated. The clustering performance with all features is also calculated and used as a baseline. In addition to clustering performance, the run times of the different feature selection methods are compared. This run time includes the time for selecting features only, not the run time of the clustering algorithm.

Figures 1 and 2 show the clustering performance for the k-means and affinity propagation (AP) algorithms respectively⁵. It can be observed from the results that the proposed greedy feature selection methods outperform the PCA-LRG, FSFS, LS, and SPEC methods for almost all data sets. The basic greedy method outperforms MCFS for many data sets, while its partition-based variant outperforms MCFS for some data sets and shows comparable performance for the others. Figure 3 shows the run times of the different feature selection methods. It can be observed that FSFS is computationally more expensive than the other methods, as it depends on calculating complex similarities between features.
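For reference, the NMI between two labelings can be computed from their contingency table; the sketch below (our illustration; the function name is ours) follows the geometric-mean normalization of Strehl and Ghosh [17].

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information, NMI = I(a; b) / sqrt(H(a) H(b))."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    n = len(a)
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    C = np.zeros((ia.max() + 1, ib.max() + 1))
    np.add.at(C, (ia, ib), 1)                     # contingency counts
    Pxy = C / n
    Px = Pxy.sum(axis=1, keepdims=True)
    Py = Pxy.sum(axis=0, keepdims=True)
    nz = Pxy > 0
    I = (Pxy[nz] * np.log(Pxy[nz] / (Px @ Py)[nz])).sum()   # mutual information
    Hx = -(Px[Px > 0] * np.log(Px[Px > 0])).sum()           # marginal entropies
    Hy = -(Py[Py > 0] * np.log(Py[Py > 0])).sum()
    return I / np.sqrt(Hx * Hy)

# Perfect correspondence of clusters (even with permuted labels) gives NMI = 1;
# a labeling independent of the classes gives NMI = 0.
assert np.isclose(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 1.0)
assert np.isclose(nmi([0, 1, 0, 1], [0, 0, 1, 1]), 0.0)
```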
The MCFS method, while efficient, is more computationally complex than the Laplacian score (LS) and the proposed greedy methods. It can also be observed that for data sets with a large number of instances (like USPS), the MCFS method, the Laplacian score (LS), and SPEC become very computationally demanding, as they depend on calculating pairwise similarities between instances. Figure 4 shows the run times of the PCA-LRG and Laplacian score (LS) methods in comparison to the proposed greedy methods. It can be observed that the complexity of the Laplacian score increases as the size of the data set increases. It can also be observed that the partition-based greedy feature selection is more efficient than the basic greedy feature selection.

⁵ The implementations of some of the compared methods and of AP do not scale to run on the USPS data set on the simulation machine used.

Table I: The properties of the data sets used to evaluate the different feature selection methods [11]. (Columns: Data set, # Instances, # Features, # Classes; rows: ORL, COIL20, ISOLET, USPS.)

Figure 1: The k-means clustering performance (NMI %) of the different feature selection methods on the ORL, COIL20, ISOLET, and USPS data sets.

Figure 2. The affinity propagation (AP) clustering performance of different feature selection methods.

VIII. CONCLUSIONS

This paper presents a novel greedy algorithm for unsupervised feature selection. The algorithm optimizes a feature selection criterion which measures the reconstruction error of the data matrix based on the subset of selected features. The paper proposes a novel recursive formula for calculating the feature selection criterion, which is then employed to develop an efficient greedy algorithm for feature selection. In addition, two memory- and time-efficient variants of the feature selection algorithm are proposed. It has been empirically shown that the proposed algorithm achieves better clustering performance than state-of-the-art methods for feature selection, and is less computationally demanding than the methods that give comparable clustering performance.

REFERENCES

[1] I. Jolliffe, Principal Component Analysis, 2nd ed. Springer, 2002.
[2] H. Zou, T. Hastie, and R. Tibshirani, "Sparse principal component analysis," J. Comput. Graph. Stat., vol. 15, no. 2, 2006.
[3] M. Masaeli, Y. Yan, Y. Cui, G. Fung, and J. Dy, "Convex principal feature selection," in Proceedings of the SIAM International Conference on Data Mining (SDM), 2010.
[4] Y. Cui and J. Dy, "Orthogonal principal feature selection," in the Sparse Optimization and Variable Selection Workshop at the International Conference on Machine Learning (ICML), 2008.
[5] Y. Lu, I. Cohen, X. Zhou, and Q. Tian, "Feature selection using principal feature analysis," in Proceedings of the 15th International Conference on Multimedia. New York, NY, USA: ACM, 2007.
[6] C. Boutsidis, M. W. Mahoney, and P. Drineas, "Unsupervised feature selection for principal components analysis," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08). New York, NY, USA: ACM, 2008.
[7] C. Boutsidis, M. Mahoney, and P. Drineas, "Unsupervised feature selection for the k-means clustering problem," in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, Eds., 2009.
[8] X. He, D. Cai, and P. Niyogi, "Laplacian score for feature selection," in Advances in Neural Information Processing Systems 18, Y. Weiss, B. Schölkopf, and J. Platt, Eds. Cambridge, MA, USA: MIT Press, 2006.
[9] Z. Zhao and H. Liu, "Spectral feature selection for supervised and unsupervised learning," in Proceedings of the 24th International Conference on Machine Learning. New York, NY, USA: ACM, 2007.
[10] L. Wolf and A. Shashua, "Feature selection for unsupervised and supervised inference: the emergence of sparsity in a weight-based approach," J. Mach. Learn. Res., vol. 6, 2005.
[11] D. Cai, C. Zhang, and X. He, "Unsupervised feature selection for multi-cluster data," in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10). New York, NY, USA: ACM, 2010.
[12] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," in Advances in Neural Information Processing Systems 14 (NIPS '01). Cambridge, MA, USA: MIT Press, 2001.
[13] P. Mitra, C. Murthy, and S. Pal, "Unsupervised feature selection using feature similarity," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, 2002.
[14] H. Lütkepohl, Handbook of Matrices. John Wiley & Sons, Inc.
[15] A. Jain and R. Dubes, Algorithms for Clustering Data. Prentice-Hall, Inc.
[16] B. Frey and D. Dueck, "Clustering by passing messages between data points," Science, vol. 315, no. 5814, p. 972, 2007.
[17] A. Strehl and J. Ghosh, "Cluster ensembles: a knowledge reuse framework for combining multiple partitions," J. Mach. Learn. Res., vol. 3, 2003.
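The selection criterion summarized in the conclusions, at each step adding the feature whose inclusion most reduces the reconstruction error of the data matrix, can be illustrated with a deliberately naive numpy sketch. It recomputes every projection from scratch with least squares, so it only demonstrates the criterion; the paper's recursive formula is what makes the actual algorithm efficient. The function name is illustrative:

```python
import numpy as np

def greedy_feature_selection(A, k):
    """Naive illustration of the criterion: pick k columns (features) of A
    greedily so that the Frobenius reconstruction error ||A - P_S A||_F^2
    is minimized, where P_S projects onto the span of the selected columns."""
    n_features = A.shape[1]
    selected = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(n_features):
            if j in selected:
                continue
            S = A[:, selected + [j]]                      # candidate feature subset
            # project all of A onto span(S) via least squares
            coef, *_ = np.linalg.lstsq(S, A, rcond=None)
            err = np.linalg.norm(A - S @ coef, 'fro') ** 2
            if err < best_err:
                best_err, best_j = err, j
        selected.append(best_j)
    return selected
```

On a matrix with linearly dependent columns, two well-chosen features already reconstruct it exactly, so redundant features are skipped; this naive loop, however, costs a full least-squares solve per candidate per step, which is exactly the expense the recursive formula avoids.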

Figure 3. The run times of the different feature selection methods (panels: ORL, COIL, ISOLET, USPS; vertical axis: run time in seconds).

Figure 4. The run times of the PCA-LRG and Laplacian score methods in comparison to the proposed greedy algorithms.


Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy 4.02 Quz Solutons Fall 2004 Multple-Choce Questons (30/00 ponts) Please, crcle the correct answer for each of the followng 0 multple-choce questons. For each queston, only one of the answers s correct.

More information

Realistic Image Synthesis

Realistic Image Synthesis Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random

More information

Method for Production Planning and Inventory Control in Oil

Method for Production Planning and Inventory Control in Oil Memors of the Faculty of Engneerng, Okayama Unversty, Vol.41, pp.20-30, January, 2007 Method for Producton Plannng and Inventory Control n Ol Refnery TakujImamura,MasamKonshandJunIma Dvson of Electronc

More information

Fast Fuzzy Clustering of Web Page Collections

Fast Fuzzy Clustering of Web Page Collections Fast Fuzzy Clusterng of Web Page Collectons Chrstan Borgelt and Andreas Nürnberger Dept. of Knowledge Processng and Language Engneerng Otto-von-Guercke-Unversty of Magdeburg Unverstätsplatz, D-396 Magdeburg,

More information

A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers

A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers Ž. The Journal of Supercomputng, 15, 25 49 2000 2000 Kluwer Academc Publshers. Manufactured n The Netherlands. A Prefx Code Matchng Parallel Load-Balancng Method for Soluton-Adaptve Unstructured Fnte Element

More information

Section 5.3 Annuities, Future Value, and Sinking Funds

Section 5.3 Annuities, Future Value, and Sinking Funds Secton 5.3 Annutes, Future Value, and Snkng Funds Ordnary Annutes A sequence of equal payments made at equal perods of tme s called an annuty. The tme between payments s the payment perod, and the tme

More information

PRACTICE 1: MUTUAL FUNDS EVALUATION USING MATLAB.

PRACTICE 1: MUTUAL FUNDS EVALUATION USING MATLAB. PRACTICE 1: MUTUAL FUNDS EVALUATION USING MATLAB. INDEX 1. Load data usng the Edtor wndow and m-fle 2. Learnng to save results from the Edtor wndow. 3. Computng the Sharpe Rato 4. Obtanng the Treynor Rato

More information

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign PAS: A Packet Accountng System to Lmt the Effects of DoS & DDoS Debsh Fesehaye & Klara Naherstedt Unversty of Illnos-Urbana Champagn DoS and DDoS DDoS attacks are ncreasng threats to our dgtal world. Exstng

More information

Least 1-Norm SVMs: a New SVM Variant between Standard and LS-SVMs

Least 1-Norm SVMs: a New SVM Variant between Standard and LS-SVMs ESANN proceedngs, European Smposum on Artfcal Neural Networks - Computatonal Intellgence and Machne Learnng. Bruges (Belgum), 8-3 Aprl, d-sde publ., ISBN -9337--. Least -Norm SVMs: a New SVM Varant between

More information

SVM Tutorial: Classification, Regression, and Ranking

SVM Tutorial: Classification, Regression, and Ranking SVM Tutoral: Classfcaton, Regresson, and Rankng Hwanjo Yu and Sungchul Km 1 Introducton Support Vector Machnes(SVMs) have been extensvely researched n the data mnng and machne learnng communtes for the

More information

Active Learning for Interactive Visualization

Active Learning for Interactive Visualization Actve Learnng for Interactve Vsualzaton Tomoharu Iwata Nel Houlsby Zoubn Ghahraman Unversty of Cambrdge Unversty of Cambrdge Unversty of Cambrdge Abstract Many automatc vsualzaton methods have been. However,

More information

Implementation of Boolean Functions through Multiplexers with the Help of Shannon Expansion Theorem

Implementation of Boolean Functions through Multiplexers with the Help of Shannon Expansion Theorem Internatonal Journal o Computer pplcatons (975 8887) Volume 62 No.6, January 23 Implementaton o Boolean Functons through Multplexers wth the Help o Shannon Expanson Theorem Saurabh Rawat Graphc Era Unversty.

More information