Distributed Column Subset Selection on MapReduce


Ahmed K. Farahat, Ahmed Elgohary, Ali Ghodsi, Mohamed S. Kamel
University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
Email: {afarahat, aelgohary, aghodsb,

Abstract: Given a very large data set distributed over a cluster of several nodes, this paper addresses the problem of selecting a few data instances that best represent the entire data set. The solution to this problem is of crucial importance in the big data era, as it enables data analysts to gain insight into the data and explore its hidden structure. The selected instances can also be used for data preprocessing tasks such as learning a low-dimensional embedding of the data points or computing a low-rank approximation of the corresponding matrix. The paper first formulates the problem as the selection of a few representative columns from a matrix whose columns are massively distributed, and it then proposes a MapReduce algorithm for selecting those representatives. The algorithm first learns a concise representation of all columns using random projection, and it then solves a generalized column subset selection problem at each machine, in which a subset of columns is selected from the sub-matrix on that machine such that the reconstruction error of the concise representation is minimized. The paper then demonstrates the effectiveness and efficiency of the proposed algorithm through an empirical evaluation on benchmark data sets.

Keywords: Column Subset Selection; Greedy Algorithms; Distributed Computing; Big Data; MapReduce

I. INTRODUCTION

Recent years have witnessed the rise of the big data era in computing and storage systems. With the great advances in information and communication technology, hundreds of petabytes of data are generated, transferred, processed and stored every day. The availability of this overwhelming amount of structured and unstructured data creates an acute need to develop fast and accurate algorithms that discover the useful information hidden in big data.
One of the crucial problems in the big data era is the ability to represent the data and its underlying information in a succinct format. Although different algorithms for clustering and dimension reduction can be used to summarize big data, these algorithms tend to learn representatives whose meanings are difficult to interpret. For instance, traditional clustering algorithms such as k-means [1] tend to produce centroids which encode information about thousands of data instances. The meanings of these centroids are hard to interpret. Even clustering methods that use data instances as prototypes, such as k-medoid [2], learn only one representative for each cluster, which is usually not enough to capture the structure of the data instances in that cluster. In addition, using medoids as representatives implicitly assumes that the data points are distributed as clusters and that the number of those clusters is known ahead of time. This assumption does not hold for many data sets. On the other hand, traditional dimension reduction algorithms such as Latent Semantic Analysis (LSA) [3] tend to learn a few latent concepts in the feature space. Each of these concepts is represented by a dense vector which combines thousands of features with positive and negative weights. This makes it difficult for the data analyst to understand the meaning of these concepts. Even if the goal of representative selection is to learn a low-dimensional embedding of data instances, learning dimensions whose meanings are easy to interpret helps in understanding the results of data mining algorithms, such as the meanings of data clusters in the low-dimensional space.

The acute need to summarize big data in a format that appeals to data analysts motivates the development of different algorithms that directly select a few representative data instances and/or features. This problem can be generally formulated as the selection of a subset of columns from a data matrix, which is formally known as the Column Subset Selection (CSS) problem [4], [5], [6].
Although many algorithms have been proposed for tackling the CSS problem, most of these algorithms focus on randomly selecting a subset of columns with the goal of using these columns to obtain a low-rank approximation of the data matrix. In this case, these algorithms tend to select a relatively large number of columns. When the goal is to select very few columns to be directly presented to a data analyst or indirectly used to interpret the results of other algorithms, the randomized CSS methods do not produce a meaningful subset of columns. On the other hand, deterministic algorithms for CSS, although more accurate, do not scale to big matrices with massively distributed columns.

This paper addresses the aforementioned problem by presenting a fast and accurate algorithm for selecting very few columns from a big data matrix with massively distributed columns. The algorithm starts by learning a concise representation of the data matrix using random projection. Each machine then independently solves a generalized column subset selection problem in which a subset of columns is selected from the current sub-matrix such that the reconstruction error of the concise representation is minimized. A further selection step is then applied to
the columns selected at different machines to select the required number of columns. The proposed algorithm is designed to be executed efficiently over massive amounts of data stored on a cluster of several commodity nodes. In such infrastructure settings, ensuring the scalability and the fault tolerance of data processing jobs is not a trivial task. In order to alleviate these problems, MapReduce [7] was introduced to simplify large-scale data analytics over a distributed environment of commodity machines. Currently, MapReduce (and its open source implementation Hadoop [8]) is considered the most successful and widely used framework for managing big data processing jobs. The approach proposed in this paper considers the different aspects of developing MapReduce-efficient algorithms.

The contributions of the paper can be summarized as follows:

- The paper proposes an algorithm for distributed Column Subset Selection (CSS) which first learns a concise representation of the data matrix and then selects columns from distributed sub-matrices that approximate this concise representation.
- To facilitate CSS from different sub-matrices, a fast and accurate algorithm for generalized CSS is proposed. This algorithm greedily selects a subset of columns from a source matrix which approximates the columns of a target matrix.
- A MapReduce-efficient algorithm is proposed for learning a concise representation using random projection. The paper also presents a MapReduce algorithm for distributed CSS which requires only two passes over the data with a very low communication overhead.
- Large-scale experiments have been conducted on benchmark data sets in which different methods for CSS are compared.

The rest of the paper is organized as follows. Section II describes the notations used throughout the paper. Section III gives a brief background on the CSS problem. Section IV describes a centralized greedy algorithm for CSS, which is the core of the distributed algorithm presented in this paper. Section V gives the necessary background on the MapReduce framework.
The proposed MapReduce algorithm for distributed CSS is described in detail in Section VI. Section VII reviews the state-of-the-art CSS methods and their applicability to distributed data. In Section VIII, an empirical evaluation of the proposed method is described. Finally, Section IX concludes the paper.

II. NOTATIONS

The following notations are used throughout the paper unless otherwise indicated. Scalars are denoted by small letters (e.g., $m$, $n$), sets are denoted in script letters (e.g., $\mathcal{S}$, $\mathcal{R}$), vectors are denoted by small bold italic letters (e.g., $\boldsymbol{f}$, $\boldsymbol{g}$), and matrices are denoted by capital letters (e.g., $A$, $B$). The subscript $(i)$ indicates that the variable corresponds to the $i$-th block of data in the distributed environment. In addition, the following notations are used:

For a set $S$:
- $|S|$: the cardinality of the set.

For a vector $x \in \mathbb{R}^m$:
- $x_i$: the $i$-th element of $x$.
- $\|x\|$: the Euclidean norm ($\ell_2$-norm) of $x$.

For a matrix $A \in \mathbb{R}^{m \times n}$:
- $A_{ij}$: the $(i, j)$-th entry of $A$.
- $A_{i:}$: the $i$-th row of $A$.
- $A_{:j}$: the $j$-th column of $A$.
- $A_{:S}$: the sub-matrix of $A$ which consists of the set $S$ of columns.
- $A^T$: the transpose of $A$.
- $\|A\|_F$: the Frobenius norm of $A$, $\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}$.
- $\tilde{A}$: a low-rank approximation of $A$.
- $\tilde{A}_S$: a rank-$l$ approximation of $A$ based on the set $S$ of columns, where $|S| = l$.

III. COLUMN SUBSET SELECTION (CSS)

The Column Subset Selection (CSS) problem can be generally defined as the selection of the most representative columns of a data matrix [4], [5], [6]. The CSS problem generalizes the problem of selecting representative data instances as well as the unsupervised feature selection problem. Both are crucial tasks that can be directly used for data analysis or as preprocessing steps for developing fast and accurate algorithms in data mining and machine learning. Although different criteria for column subset selection can be defined, a common criterion that has been used in much recent work measures the discrepancy between the original matrix and the approximate matrix reconstructed from the subset of selected columns [9], [10], [11], [12], [13], [4], [5], [6], [14].
Most of the recent work either develops CSS algorithms that directly optimize this criterion or uses this criterion to assess the quality of the proposed CSS algorithms. In the present work, the CSS problem is formally defined as follows.

Problem 1 (Column Subset Selection): Given an $m \times n$ matrix $A$ and an integer $l$, find a subset of columns $L$ such that $|L| = l$ and

$$L = \arg\min_{S} \|A - P^{(S)} A\|_F^2,$$

where $P^{(S)}$ is an $m \times m$ projection matrix which projects the columns of $A$ onto the span of the candidate columns $A_{:S}$.

The criterion $F(S) = \|A - P^{(S)} A\|_F^2$ represents the sum of squared errors between the original data matrix $A$ and its rank-$l$ column-based approximation (where $l = |S|$),

$$\tilde{A}_S = P^{(S)} A. \tag{1}$$
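As an illustration of Problem 1, the criterion $F(S)$ can be evaluated numerically for a small dense matrix. The following is a minimal NumPy sketch, not the authors' implementation; the function name `css_criterion` is hypothetical, and the projection $P^{(S)} A$ is obtained through a least-squares solve rather than by forming $P^{(S)}$ explicitly.

```python
import numpy as np

def css_criterion(A, S):
    """F(S) = ||A - P^(S) A||_F^2, where P^(S) projects onto span(A[:, S])."""
    A_S = A[:, list(S)]
    # T minimizes ||A - A_S T||_F, so A_S @ T equals P^(S) A.
    T, *_ = np.linalg.lstsq(A_S, A, rcond=None)
    residual = A - A_S @ T
    return float(np.sum(residual ** 2))

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
print(css_criterion(A, [0]))       # error with one selected column
print(css_criterion(A, [0, 3]))    # adding a column can only reduce the error
```

Evaluating the criterion this way costs a full least-squares solve per candidate subset, which is why the greedy recursions described later avoid it.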
In other words, the criterion $F(S)$ calculates the Frobenius norm of the residual matrix $E = A - \tilde{A}_S$. Other types of matrix norms can also be used to quantify the reconstruction error. Some of the recent work on the CSS problem [4], [5], [6] derives theoretical bounds for both the Frobenius and spectral norms of the residual matrix. The present work, however, focuses on developing algorithms that minimize the Frobenius norm of the residual matrix.

The projection matrix $P^{(S)}$ can be calculated as

$$P^{(S)} = A_{:S} \left(A_{:S}^T A_{:S}\right)^{-1} A_{:S}^T, \tag{2}$$

where $A_{:S}$ is the sub-matrix of $A$ which consists of the columns corresponding to $S$. It should be noted that if $S$ is known, the term $\left(A_{:S}^T A_{:S}\right)^{-1} A_{:S}^T A$ is the closed-form solution of the least-squares problem $T^* = \arg\min_{T} \|A - A_{:S} T\|_F^2$.

The set of selected columns (i.e., data instances or features) can be directly presented to a data analyst to learn about the data, or it can be used to preprocess the data for further analysis. For instance, the selected columns can be used to obtain a low-dimensional representation of all columns in the subspace of the selected ones. This representation can be obtained by calculating an orthogonal basis $Q$ for the selected columns and then embedding all columns of $A$ into the subspace of $Q$ as $W = Q^T A$. The selected columns can also be used to calculate a column-based low-rank approximation of $A$ [12]. Moreover, the leading singular values and vectors of the low-dimensional embedding $W$ can be used to approximate those of the data matrix.

IV. GREEDY CSS

The column subset selection criterion presented in Section III measures the reconstruction error of a data matrix based on the subset of selected columns. The minimization of this criterion is a combinatorial optimization problem whose optimal solution can be obtained in $O(n^l m n l)$ [5]. This section briefly describes a deterministic greedy algorithm for optimizing this criterion, which extends the greedy method for unsupervised feature selection recently proposed by Farahat et al. [15], [16]. A brief description of this method is included in this section for completeness.
The reader is referred to [16] for the proofs of the different formulas presented in this section.

The greedy CSS [16] is based on the following recursive formula for the CSS criterion.

Theorem 1: Given a set of columns $S$. For any $P \subset S$,

$$F(S) = F(P) - \|\tilde{E}_R\|_F^2,$$

where $E = A - P^{(P)} A$, and $\tilde{E}_R$ is the low-rank approximation of $E$ based on the subset $R = S \setminus P$ of columns.

Proof: See [16, Theorem 2].

The term $\|\tilde{E}_R\|_F^2$ represents the decrease in reconstruction error achieved by adding the subset $R$ of columns to $P$. This recursive formula allows the development of an efficient greedy algorithm that approximates the optimal solution of the column subset selection problem. At iteration $t$, the goal is to find column $p$ such that

$$p = \arg\min_{i} F(S \cup \{i\}), \tag{3}$$

where $S$ is the set of columns selected during the first $t-1$ iterations. Let $G$ be an $n \times n$ matrix which represents the inner-products over the columns of the residual matrix $E$, i.e., $G = E^T E$. The greedy selection problem can be simplified to the following problem (see [16, Section 6]).

Problem 2 (Greedy Column Subset Selection): At iteration $t$, find column $p$ such that

$$p = \arg\max_{i} \frac{\|G_{:i}\|^2}{G_{ii}},$$

where $G = E^T E$, $E = A - \tilde{A}_S$ and $S$ is the set of columns selected during the first $t-1$ iterations.

For iteration $t$, define $\delta = G_{:p}$ and $\omega = G_{:p} / \sqrt{G_{pp}} = \delta / \sqrt{\delta_p}$. The vector $\delta^{(t)}$ can be calculated in terms of $A$ and the previous $\omega$'s as

$$\delta^{(t)} = A^T A_{:p} - \sum_{r=1}^{t-1} \omega_p^{(r)} \omega^{(r)}. \tag{4}$$

The numerator and denominator of the selection criterion at each iteration can be calculated in an efficient manner, without explicitly calculating $E$ or $G$, using the following theorem.

Theorem 2: Let $f_i = \|G_{:i}\|^2$ and $g_i = G_{ii}$ be the numerator and denominator of the criterion function for column $i$ respectively, $f = [f_i]_{i=1..n}$, and $g = [g_i]_{i=1..n}$. Then,

$$f^{(t)} = \left( f - 2 \left( \omega \circ \left( A^T A \omega - \sum_{r=1}^{t-2} \left( \omega^{(r)T} \omega \right) \omega^{(r)} \right) \right) + \|\omega\|^2 \left( \omega \circ \omega \right) \right)^{(t-1)},$$

$$g^{(t)} = \left( g - \left( \omega \circ \omega \right) \right)^{(t-1)},$$

where $\circ$ represents the Hadamard product operator.

Proof: See [16, Theorem 4].

Algorithm 1 shows the complete greedy CSS algorithm.
The distributed CSS algorithm presented in this paper introduces a generalized variant of the greedy CSS algorithm in which a subset of columns is selected from a source matrix such that the reconstruction error of a target matrix is minimized. The distributed CSS method uses the greedy generalized CSS algorithm as the core method for selecting columns at different machines as well as in the final selection step.
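The recursions of Equation (4) and Theorem 2 can be sketched in NumPy as follows. This is a small-scale illustration under stated assumptions rather than the authors' code: the name `greedy_css`, the tolerance `tol` and the explicit masking of already selected columns are choices made here, and the initial $f$ is computed from a dense $A^T A$, which is only reasonable for a small example.

```python
import numpy as np

def greedy_css(A, l, tol=1e-12):
    """Greedy CSS using the delta/omega recursion and the f, g updates."""
    f = np.sum((A.T @ A) ** 2, axis=0)     # f_i = ||A^T A_{:i}||^2
    g = np.sum(A ** 2, axis=0)             # g_i = A_{:i}^T A_{:i}
    S, omegas = [], []
    for _ in range(l):
        score = np.where(g > tol, f / np.maximum(g, tol), -np.inf)
        if S:
            score[S] = -np.inf             # never pick a column twice
        p = int(np.argmax(score))
        S.append(p)
        # delta = p-th column of the current G, built from A and earlier omegas
        delta = A.T @ A[:, p] - sum(w[p] * w for w in omegas)
        omega = delta / np.sqrt(delta[p])
        # G @ omega for the current G, again without forming G
        G_omega = A.T @ (A @ omega) - sum((w @ omega) * w for w in omegas)
        f = f - 2.0 * omega * G_omega + (omega @ omega) * omega ** 2
        g = g - omega ** 2
        omegas.append(omega)
    return S

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 12))
S = greedy_css(A, 4)
print(S)
```

Each iteration costs a few matrix-vector products with $A$ plus $O(nt)$ work for the stored $\omega$ vectors, instead of recomputing the residual matrix.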
Algorithm 1: Greedy Column Subset Selection
Input: Data matrix $A$, number of columns $l$
Output: Selected subset of columns $S$
1: Initialize $S = \{\,\}$
2: Initialize $f_i^{(0)} = \|A^T A_{:i}\|^2$, $g_i^{(0)} = A_{:i}^T A_{:i}$ for $i = 1 \ldots n$
3: Repeat $t = 1 \to l$:
4:   $p = \arg\max_i f_i^{(t)} / g_i^{(t)}$, $S = S \cup \{p\}$
5:   $\delta^{(t)} = A^T A_{:p} - \sum_{r=1}^{t-1} \omega_p^{(r)} \omega^{(r)}$
6:   $\omega^{(t)} = \delta^{(t)} / \sqrt{\delta_p^{(t)}}$
7:   Update $f_i$'s, $g_i$'s (Theorem 2)

V. MAPREDUCE PARADIGM

MapReduce [7] was presented as a programming model to simplify large-scale data analytics over a distributed environment of commodity machines. The rationale behind MapReduce is to impose a set of constraints on data access at each individual machine and on communication between different machines to ensure both the scalability and the fault tolerance of the analytical tasks. Currently, MapReduce is considered the de facto solution for many data analytics tasks over large distributed clusters [17], [18].

A MapReduce job is executed in two phases of user-defined data transformation functions, namely, the map and reduce phases. The input data is split into physical blocks distributed among the nodes. Each block is viewed as a list of key-value pairs. In the first phase, the key-value pairs of each input block $b$ are processed by a single map function running independently on the node where the block $b$ is stored. The key-value pairs are provided one by one to the map function. The output of the map function is another set of intermediate key-value pairs. The values associated with the same key across all nodes are grouped together and provided as an input to the reduce function in the second phase. Different groups of values are processed in parallel on different machines. The output of each reduce function is a third set of key-value pairs, and these sets are collectively considered the output of the job. It is important to note that the set of intermediate key-value pairs is moved across the network between the nodes, which incurs significant additional execution time when much data is to be moved. For complex analytical tasks, multiple jobs are typically chained together [17] and/or many rounds of the same job are executed on the input data set [18].
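The two-phase execution model described above can be mimicked on a single machine. The sketch below is a toy, in-memory simulation and not Hadoop code; `run_mapreduce`, `map_fn` and `reduce_fn` are hypothetical names. The example job takes columns keyed by their index and computes the sum of squared entries of every row.

```python
from collections import defaultdict

def run_mapreduce(blocks, map_fn, reduce_fn):
    """Simulate one MapReduce job over in-memory blocks of (key, value) pairs."""
    intermediate = defaultdict(list)
    for block in blocks:                      # each block would live on its own node
        for key, value in block:              # pairs are fed to map one by one
            for out_key, out_value in map_fn(key, value):
                intermediate[out_key].append(out_value)   # group values by key
    output = {}
    for key, values in intermediate.items():  # groups could be reduced in parallel
        for out_key, out_value in reduce_fn(key, values):
            output[out_key] = out_value
    return output

def map_fn(col_index, column):
    # Emit one pair per row: (row index, squared entry).
    for j, x in enumerate(column):
        yield j, x * x

def reduce_fn(row_index, squares):
    yield row_index, sum(squares)

# Two blocks holding three columns of a 2 x 3 matrix.
blocks = [[(0, [1.0, 2.0])], [(1, [3.0, 4.0]), (2, [0.0, 1.0])]]
result = run_mapreduce(blocks, map_fn, reduce_fn)
print(result)
```

In a real cluster the grouping step is where intermediate pairs cross the network, which is the cost the text warns about.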
In addition to the programming model constraints, Karloff et al. [19] defined a set of computational constraints that ensure the scalability and the efficiency of MapReduce-based analytical tasks. These computational constraints limit the memory used at each machine, the output size of both the map and reduce functions, and the number of rounds used to complete a certain task. The MapReduce algorithms presented in this paper adhere to both the programming model constraints and the computational constraints. The proposed algorithm also aims at minimizing the overall running time of the distributed column subset selection task to facilitate interactive data analytics.

VI. DISTRIBUTED CSS ON MAPREDUCE

This section describes a MapReduce algorithm for the distributed column subset selection problem. Given a big data matrix $A$ whose columns are distributed across different machines, the goal is to select a subset of columns $S$ from $A$ such that the CSS criterion $F(S)$ is minimized.

One naïve approach to distributed column subset selection is to select different subsets of columns from the sub-matrices stored on different machines. The selected subsets are then sent to a centralized machine where an additional selection step is optionally performed to filter out irrelevant or redundant columns. Let $A_{(i)}$ be the sub-matrix stored at machine $i$; the naïve approach optimizes the following function:

$$\sum_{i=1}^{c} \left\| A_{(i)} - P^{(L_{(i)})} A_{(i)} \right\|_F^2, \tag{5}$$

where $L_{(i)}$ is the set of columns selected from $A_{(i)}$ and $c$ is the number of physical blocks of data. The resulting set of columns is the union of the sets selected from different sub-matrices: $L = \bigcup_{i=1}^{c} L_{(i)}$. The set $L$ can further be reduced by invoking another selection process in which a smaller subset of columns is selected from $A_{:L}$.

The naïve approach, however simple, is prone to missing relevant columns. This is because the selection at each machine is based on approximating a local sub-matrix, and accordingly there is no way to determine whether the selected columns are globally relevant or not.
For instance, suppose the extreme case where all the truly representative columns happen to be loaded on a single machine. In this case, the algorithm will select fewer columns than required from that machine and many irrelevant columns from other machines.

In order to alleviate this problem, the different machines have to select columns that best approximate a common representation of the data matrix. To achieve that, the proposed algorithm first learns a concise representation of the span of the big data matrix. This concise representation is relatively small and it can be sent over to all machines. After that, each machine can select columns from its sub-matrix that approximate this concise representation. The proposed algorithm uses random projection to learn this concise representation, and proposes a generalized Column Subset Selection (CSS) method to select columns from different machines. The details of the proposed methods are explained in the rest of this section.
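The overall scheme outlined above (shared concise representation, local selection per block, then a final centralized selection) can be sketched on one machine as follows. This is an illustrative sketch under several assumptions: the function names are hypothetical, the selection routine deflates residual matrices explicitly instead of using the efficient recursions, the concise representation uses a Gaussian $\Omega$, and each block contributes $\lceil l/c \rceil$ columns.

```python
import numpy as np

def generalized_css(A, B, l, tol=1e-12):
    """Greedily pick l columns of source A that best reconstruct target B."""
    E, F, S = A.astype(float).copy(), B.astype(float).copy(), []
    for _ in range(min(l, A.shape[1])):
        g = np.sum(E ** 2, axis=0)              # squared norms of residual columns
        f = np.sum((F.T @ E) ** 2, axis=0)      # ||F^T E_{:i}||^2
        score = np.where(g > tol, f / np.maximum(g, tol), -np.inf)
        p = int(np.argmax(score))
        if score[p] == -np.inf:                 # no independent columns left
            break
        S.append(p)
        q = E[:, p] / np.sqrt(g[p])
        E -= np.outer(q, q @ E)                 # remove the chosen direction
        F -= np.outer(q, q @ F)
    return S

def distributed_css(A, l, c, r, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    B = A @ rng.standard_normal((n, r))         # concise representation B = A Omega
    l_b = -(-l // c)                            # ceil(l / c) columns per block
    candidates = []
    for idx in np.array_split(np.arange(n), c): # phase 1: one "machine" per block
        local = generalized_css(A[:, idx], B, l_b)
        candidates.extend(idx[local])
    candidates = np.array(candidates)
    final = generalized_css(A[:, candidates], B, l)   # phase 2: central selection
    return sorted(int(j) for j in candidates[final])

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 60))
selected = distributed_css(A, l=6, c=4, r=10)
print(selected)
```

Only the small matrix `B` and the candidate columns need to be shared between phases, which is the source of the low communication overhead claimed for the method.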
A. Random Projection

The first step of the proposed algorithm is to learn a concise representation $B$ for a distributed data matrix $A$. In the proposed approach, a random projection method is employed. Random projection [20], [21], [22] is a well-known technique for dealing with the curse-of-dimensionality problem. Let $\Omega$ be a random projection matrix of size $n \times r$; given a data matrix $X$ of size $m \times n$, the random projection can be calculated as $Y = X\Omega$. It has been shown that applying random projection $\Omega$ to $X$ preserves the pairwise distances between vectors in the row space of $X$ with a high probability [20]:

$$(1 - \epsilon) \|X_{i:} - X_{j:}\| \le \|X_{i:}\Omega - X_{j:}\Omega\| \le (1 + \epsilon) \|X_{i:} - X_{j:}\|, \tag{6}$$

where $\epsilon$ is an arbitrarily small factor.

Since the CSS criterion $F(S)$ measures the reconstruction error between the big data matrix $A$ and its low-rank approximation $P^{(S)} A$, it essentially measures the sum of the distances between the original rows and their approximations. This means that when applying random projection to both $A$ and $P^{(S)} A$, the reconstruction error of the original data matrix $A$ will be approximately equal to that of $A\Omega$ when both are approximated using the subset of selected columns:

$$\|A - P^{(S)} A\|_F^2 \approx \|A\Omega - P^{(S)} A\Omega\|_F^2. \tag{7}$$

So, instead of optimizing $\|A - P^{(S)} A\|_F^2$, the distributed CSS can approximately optimize $\|A\Omega - P^{(S)} A\Omega\|_F^2$.

Let $B = A\Omega$; the distributed column subset selection problem can be formally defined as follows.

Problem 3 (Distributed Column Subset Selection): Given an $m \times n_{(i)}$ sub-matrix $A_{(i)}$ which is stored at node $i$ and an integer $l_{(i)}$, find a subset of columns $L_{(i)}$ such that $|L_{(i)}| = l_{(i)}$ and

$$L_{(i)} = \arg\min_{S} \|B - P^{(S)} B\|_F^2,$$

where $B = A\Omega$, $\Omega$ is an $n \times r$ random projection matrix, $S$ is the set of the indices of the candidate columns and $L_{(i)}$ is the set of the indices of the selected columns from $A_{(i)}$.

A key observation here is that random projection matrices whose entries are sampled i.i.d. from some univariate distribution $\Psi$ can be exploited to compute random projection on MapReduce in a very efficient manner. Examples of such matrices are Gaussian random matrices [20], uniform random sign ($\pm 1$) matrices [21], and sparse random sign matrices [22].
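The distance-preservation property in Equation (6) can be checked empirically on synthetic data. The sketch below is illustrative only: it uses a Gaussian $\Omega$ scaled by $1/\sqrt{r}$ (the scaling is an assumption made here so that projected and original distances are directly comparable), and the sizes `m`, `n`, `r` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 10, 2000, 400
X = rng.standard_normal((m, n))
# Gaussian random projection; the 1/sqrt(r) factor is an assumed normalization.
Omega = rng.standard_normal((n, r)) / np.sqrt(r)
Y = X @ Omega

def pairwise_row_distances(M):
    """Euclidean distances between all distinct pairs of rows of M."""
    diffs = M[:, None, :] - M[None, :, :]
    d = np.linalg.norm(diffs, axis=2)
    return d[np.triu_indices(M.shape[0], k=1)]

ratios = pairwise_row_distances(Y) / pairwise_row_distances(X)
print(ratios.min(), ratios.max())   # both should be close to 1
```

Larger values of `r` tighten the spread of the ratios around 1, at the price of a larger concise representation.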
In order to implement random projection on MapReduce, the data matrix $A$ is distributed in a column-wise fashion and viewed as pairs of $\langle i, A_{:i} \rangle$, where $A_{:i}$ is the $i$-th column of $A$. Recall that $B = A\Omega$ can be rewritten as

$$B = \sum_{i=1}^{n} A_{:i} \Omega_{i:}, \tag{8}$$

and since the map function is provided one column of $A$ at a time, one does not need to worry about pre-computing the full matrix $\Omega$. In fact, for each input column $A_{:i}$, a new vector $\Omega_{i:}$ needs to be sampled from $\Psi$. So, each input column generates a matrix of size $m \times r$, which means that $O(nmr)$ data should be moved across the network to sum the generated $n$ matrices at $m$ independent reducers, each summing a row $B_{j:}$ to obtain $B$. To minimize that network cost, an in-memory summation can be carried out over the generated $m \times r$ matrices at each mapper. This can be done incrementally after processing each column of $A$. That optimization reduces the network cost to $O(cmr)$, where $c$ is the number of physical blocks of the matrix (footnote 1). Algorithm 2 outlines the proposed random projection algorithm. The term emit is used to refer to outputting new $\langle key, value \rangle$ pairs from a mapper or a reducer.

Algorithm 2: Fast Random Projection on MapReduce
Input: Data matrix $A$, univariate distribution $\Psi$, number of dimensions $r$
Output: Concise representation $B = A\Omega$, $\Omega_{ij} \sim \Psi \;\forall i, j$
1: map:
2:   $\bar{B} = [0]_{m \times r}$
3:   foreach $\langle i, A_{:i} \rangle$
4:     Generate $v = [v_1, v_2, \ldots, v_r]$, $v_j \sim \Psi$
5:     $\bar{B} = \bar{B} + A_{:i} v$
6:   for $j = 1$ to $m$
7:     emit $\langle j, \bar{B}_{j:} \rangle$
8: reduce:
9:   foreach $\langle j, [\,[\bar{B}_{(1)}]_{j:}, [\bar{B}_{(2)}]_{j:}, \ldots, [\bar{B}_{(c)}]_{j:}\,] \rangle$
10:    $B_{j:} = \sum_{i=1}^{c} [\bar{B}_{(i)}]_{j:}$
11:    emit $\langle j, B_{j:} \rangle$

B. Generalized CSS

This section presents the generalized column subset selection algorithm which will be used to perform the selection of columns at different machines. While Problem 1 is concerned with the selection of a subset of columns from a data matrix which best represent other columns of the same matrix, Problem 3 selects a subset of columns from a source matrix which best represent the columns of a different target matrix.
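The in-mapper summation of Algorithm 2 can be simulated on a single machine as shown below. This is a sketch rather than Hadoop code: all function names are hypothetical, $\Psi$ is assumed to be standard Gaussian, and each row $\Omega_{i:}$ is regenerated from the pair `(seed, i)` purely so that the result can be compared against an explicit $A\Omega$.

```python
import numpy as np

def sample_omega_row(i, r, seed):
    """Row i of Omega, regenerated on demand from (seed, i)."""
    return np.random.default_rng([seed, i]).standard_normal(r)

def map_block(columns, m, r, seed):
    """Mapper: accumulate the block's partial sum of A_{:i} Omega_{i:}."""
    B_bar = np.zeros((m, r))
    for i, col in columns:                       # pairs (i, A_{:i})
        B_bar += np.outer(col, sample_omega_row(i, r, seed))
    return [(j, B_bar[j]) for j in range(m)]     # emit (j, row j of B_bar)

def reduce_rows(emitted, m, r):
    """Reducer: sum the partial rows that share the key j."""
    B = np.zeros((m, r))
    for j, row in emitted:
        B[j] += row
    return B

def random_projection(A, c, r, seed=0):
    m, n = A.shape
    emitted = []
    for idx in np.array_split(np.arange(n), c):  # c physical blocks of columns
        emitted += map_block([(int(i), A[:, i]) for i in idx], m, r, seed)
    return reduce_rows(emitted, m, r)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 30))
B = random_projection(A, c=3, r=4)
print(B.shape)
```

Each simulated mapper emits only `m` rows of length `r`, so the data handed to the reducers scales with `c * m * r` rather than with the number of columns.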
The objective function of Problem 3 represents the reconstruction error of the target matrix $B$ based on the columns selected from the source matrix, and the term $P^{(S)} = A_{:S} \left(A_{:S}^T A_{:S}\right)^{-1} A_{:S}^T$ is the projection matrix which projects the columns of $B$ onto the subspace of the columns selected from $A$.

In order to optimize this new criterion, a greedy algorithm can be introduced.

Footnote 1: The in-memory summation can also be replaced by a MapReduce combiner [7].

Let $F(S) = \|B - P^{(S)} B\|_F^2$ be the
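distributed CSS criterion, the following theorem derives a recursive formula for $F(S)$.

Theorem 3: Given a set of columns $S$. For any $P \subset S$,

$$F(S) = F(P) - \|\tilde{F}_R\|_F^2,$$

where $F = B - P^{(P)} B$, and $\tilde{F}_R$ is the low-rank approximation of $F$ based on the subset $R = S \setminus P$ of columns of $E = A - P^{(P)} A$.

Proof: Using the recursive formula for the low-rank approximation of $A$, $\tilde{A}_S = \tilde{A}_P + \tilde{E}_R$, and multiplying both sides by $\Omega$ gives

$$\tilde{A}_S \Omega = \tilde{A}_P \Omega + \tilde{E}_R \Omega.$$

Low-rank approximations can be written in terms of projection matrices as

$$P^{(S)} A \Omega = P^{(P)} A \Omega + R^{(R)} E \Omega.$$

Using $B = A\Omega$,

$$P^{(S)} B = P^{(P)} B + R^{(R)} E \Omega.$$

Let $F = E\Omega$. The matrix $F$ is the residual after approximating $B$ using the set $P$ of columns:

$$F = E\Omega = \left(A - P^{(P)} A\right) \Omega = A\Omega - P^{(P)} A\Omega = B - P^{(P)} B.$$

This means that

$$P^{(S)} B = P^{(P)} B + R^{(R)} F.$$

Substituting in $F(S) = \|B - P^{(S)} B\|_F^2$ gives

$$F(S) = \|B - P^{(P)} B - R^{(R)} F\|_F^2.$$

Using $F = B - P^{(P)} B$ gives

$$F(S) = \|F - R^{(R)} F\|_F^2.$$

Using the relation between the Frobenius norm and the trace,

$$F(S) = \mathrm{trace}\left( \left(F - R^{(R)} F\right)^T \left(F - R^{(R)} F\right) \right) = \mathrm{trace}\left( F^T F - 2 F^T R^{(R)} F + F^T R^{(R)} R^{(R)} F \right) = \mathrm{trace}\left( F^T F - F^T R^{(R)} F \right) = \|F\|_F^2 - \|R^{(R)} F\|_F^2.$$

Using $F(P) = \|F\|_F^2$ and $\tilde{F}_R = R^{(R)} F$ proves the theorem.

Using the recursive formula for $F(S \cup \{i\})$ allows the development of a greedy algorithm which at iteration $t$ optimizes

$$p = \arg\min_{i} F(S \cup \{i\}) = \arg\max_{i} \|\tilde{F}_{\{i\}}\|_F^2. \tag{9}$$

Let $G = E^T E$ and $H = F^T E$; the objective function of this optimization problem can be simplified as follows:

$$\|\tilde{F}_{\{i\}}\|_F^2 = \left\| E_{:i} \left(E_{:i}^T E_{:i}\right)^{-1} E_{:i}^T F \right\|_F^2 = \mathrm{trace}\left( F^T E_{:i} \left(E_{:i}^T E_{:i}\right)^{-1} E_{:i}^T F \right) = \frac{\|F^T E_{:i}\|^2}{E_{:i}^T E_{:i}} = \frac{\|H_{:i}\|^2}{G_{ii}}. \tag{10}$$

This allows the definition of the following generalized CSS problem.

Problem 4 (Greedy Generalized CSS): At iteration $t$, find column $p$ such that

$$p = \arg\max_{i} \frac{\|H_{:i}\|^2}{G_{ii}},$$

where $H = F^T E$, $G = E^T E$, $F = B - P^{(S)} B$, $E = A - P^{(S)} A$ and $S$ is the set of columns selected during the first $t-1$ iterations.

Algorithm 3: Greedy Generalized Column Subset Selection
Input: Source matrix $A$, target matrix $B$, number of columns $l$
Output: Selected subset of columns $S$
1: Initialize $f_i^{(0)} = \|B^T A_{:i}\|^2$, $g_i^{(0)} = A_{:i}^T A_{:i}$ for $i = 1 \ldots n$
2: Repeat $t = 1 \to l$:
3:   $p = \arg\max_i f_i^{(t)} / g_i^{(t)}$, $S = S \cup \{p\}$
4:   $\delta^{(t)} = A^T A_{:p} - \sum_{r=1}^{t-1} \omega_p^{(r)} \omega^{(r)}$
5:   $\gamma^{(t)} = B^T A_{:p} - \sum_{r=1}^{t-1} \omega_p^{(r)} \upsilon^{(r)}$
6:   $\omega^{(t)} = \delta^{(t)} / \sqrt{\delta_p^{(t)}}$, $\upsilon^{(t)} = \gamma^{(t)} / \sqrt{\delta_p^{(t)}}$
7:   Update $f_i$'s, $g_i$'s (Theorem 4)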
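The identity in Equation (10) and the recursion of Theorem 3 can be verified numerically on a small random example. The following sketch assumes a Gaussian $\Omega$ and arbitrary choices $P = \{0\}$ and $i = 2$; the helper `criterion` is a hypothetical name for a direct evaluation of $F(S)$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((7, 5))
B = A @ rng.standard_normal((5, 3))          # target B = A Omega

# Residuals of A and B after projecting onto the single column P = {0}.
q = A[:, [0]] / np.linalg.norm(A[:, 0])
E = A - q @ (q.T @ A)
F = B - q @ (q.T @ B)
G, H = E.T @ E, F.T @ E

i = 2
e = E[:, [i]]
F_tilde_i = e @ np.linalg.solve(e.T @ e, e.T @ F)   # projection of F onto E_{:i}
lhs = np.sum(F_tilde_i ** 2)                  # ||F~_{i}||_F^2
rhs = np.sum(H[:, i] ** 2) / G[i, i]          # ||H_{:i}||^2 / G_ii

def criterion(S):
    """F(S) = ||B - P^(S) B||_F^2, evaluated by least squares on A[:, S]."""
    T, *_ = np.linalg.lstsq(A[:, S], B, rcond=None)
    return np.sum((B - A[:, S] @ T) ** 2)

print(lhs, rhs)
print(criterion([0, i]), criterion([0]) - lhs)
```

Both printed pairs should agree up to floating-point error, which is what lets the greedy step score a candidate column from $H$ and $G$ alone.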
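For iteration $t$, define $\gamma = H_{:p}$ and $\upsilon = H_{:p} / \sqrt{G_{pp}} = \gamma / \sqrt{\delta_p}$. The vector $\gamma^{(t)}$ can be calculated in terms of $A$, $B$ and the previous $\omega$'s and $\upsilon$'s as

$$\gamma^{(t)} = B^T A_{:p} - \sum_{r=1}^{t-1} \omega_p^{(r)} \upsilon^{(r)}.$$

Similarly, the numerator and denominator of the selection criterion at each iteration can be calculated in an efficient manner using the following theorem.

Theorem 4: Let $f_i = \|H_{:i}\|^2$ and $g_i = G_{ii}$ be the numerator and denominator of the greedy criterion function for column $i$ respectively, $f = [f_i]_{i=1..n}$, and $g = [g_i]_{i=1..n}$. Then,

$$f^{(t)} = \left( f - 2 \left( \omega \circ \left( A^T B \upsilon - \sum_{r=1}^{t-2} \left( \upsilon^{(r)T} \upsilon \right) \omega^{(r)} \right) \right) + \|\upsilon\|^2 \left( \omega \circ \omega \right) \right)^{(t-1)},$$

$$g^{(t)} = \left( g - \left( \omega \circ \omega \right) \right)^{(t-1)},$$

where $\circ$ represents the Hadamard product operator.

As outlined in Section VI-A, the algorithm's distribution strategy is based on sharing the concise representation of the data $B$ among all mappers. Then, independent $l_{(b)}$ columns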
from each mapper are selected using the generalized CSS algorithm. A second phase of selection is run over the $\sum_{b=1}^{c} l_{(b)}$ columns (where $c$ is the number of input blocks) to find the best $l$ columns to represent $B$. Different ways can be used to set $l_{(b)}$ for each input block $b$. In the context of this paper, $l_{(b)}$ is assigned the same value for all blocks (i.e., $l_{(b)} = l/c \;\forall b \in 1, 2, \ldots, c$). Other methods are to be considered in future extensions. Algorithm 4 sketches the MapReduce implementation of the distributed CSS algorithm. It should be emphasized that the proposed MapReduce algorithm requires only two passes over the data set and moves very little data across the network.

Algorithm 4: Distributed CSS on MapReduce
Input: Matrix $A$ of size $m \times n$, concise representation $B$, number of columns $l$
Output: Selected columns $C$
1: map:
2:   $A_{(b)} = [\;]$
3:   foreach $\langle i, A_{:i} \rangle$
4:     $A_{(b)} = [A_{(b)} \; A_{:i}]$
5:   $S$ = GeneralizedCSS($A_{(b)}$, $B$, $l_{(b)}$)
6:   foreach $j$ in $S$
7:     emit $\langle 0, [A_{(b)}]_{:j} \rangle$
8: reduce:
9:   For all values $\{[A_{(1)}]_{:S_{(1)}}, [A_{(2)}]_{:S_{(2)}}, \ldots, [A_{(c)}]_{:S_{(c)}}\}$
10:    $A_{(0)} = [\,[A_{(1)}]_{:S_{(1)}}, [A_{(2)}]_{:S_{(2)}}, \ldots, [A_{(c)}]_{:S_{(c)}}\,]$
11:    $S$ = GeneralizedCSS($A_{(0)}$, $B$, $l$)
12:    foreach $j$ in $S$
13:      emit $\langle 0, [A_{(0)}]_{:j} \rangle$

VII. RELATED WORK

Different approaches have been proposed for selecting a subset of representative columns from a data matrix. This section briefly describes these approaches and their applicability to massively distributed data matrices. The Column Subset Selection (CSS) methods can be generally categorized into randomized, deterministic and hybrid methods.

The randomized methods sample a subset of columns from the original matrix using carefully chosen sampling probabilities. Frieze et al. [9] were the first to suggest the idea of randomly sampling $l$ columns from a matrix and using these columns to calculate a rank-$k$ approximation of the matrix (where $l \ge k$). That work of Frieze et al. was followed by different papers [10], [11] that enhanced the algorithm by proposing different sampling probabilities. Drineas et al.
[12] proposed a subspace sampling method which samples columns using probabilities proportional to the norms of the rows of the top $k$ right singular vectors of $A$. Deshpande et al. [13] proposed an adaptive sampling method which updates the sampling probabilities based on the columns selected so far. Column subset selection with uniform sampling can be easily implemented on MapReduce. For non-uniform sampling, the efficiency of implementing the selection on MapReduce is determined by how easily the sampling probabilities can be calculated. Probabilities that depend on the leading singular values and vectors are time-consuming to calculate on MapReduce. On the other hand, adaptive sampling methods are computationally very complex, as they depend on calculating the residual of the whole data matrix after each iteration.

The second category of methods employs a deterministic algorithm for selecting columns such that some criterion function is minimized. This criterion function usually quantifies the reconstruction error of the data matrix based on the subset of selected columns. The deterministic methods are slower, but more accurate, than the randomized ones. In the area of numerical linear algebra, the column pivoting method exploited by the QR decomposition [23] permutes the columns of the matrix based on their norms to enhance the numerical stability of the QR decomposition algorithm. The first $l$ columns of the permuted matrix can be directly selected as representative columns. Besides methods based on QR decomposition, different recent methods have been proposed for directly selecting a subset of columns from the data matrix. Boutsidis et al. [4] proposed a deterministic column subset selection method which first groups columns into clusters and then selects a subset of columns from each cluster. Çivril and Magdon-Ismail [14] presented a deterministic algorithm which greedily selects columns from the data matrix that best represent the leading right singular vectors of the matrix. Recently, Boutsidis et al.
[6] presented a column subset selection algorithm which first calculates the top-$k$ right singular vectors of the data matrix (where $k$ is the target rank) and then uses deterministic sparsification methods to select $l \ge k$ columns from the data matrix. Besides, other deterministic algorithms have been proposed for selecting columns based on the volume defined by them and the origin [24], [25].

The deterministic algorithms are more complex to implement on MapReduce. For instance, it is time-consuming to calculate the leading singular values and vectors of a massively distributed matrix or to cluster its columns using k-means. It is also computationally complex to calculate QR decomposition with pivoting. Moreover, the recently proposed algorithms for volume sampling are more complex than other CSS algorithms, including the one presented in this paper, and they are infeasible for large data sets.

A third category of CSS techniques is the hybrid methods, which combine the benefits of both the randomized and deterministic methods. In these methods, a large subset of columns is randomly sampled from the columns of the data matrix and then a deterministic step is employed to reduce
the number of selected columns to the desired rank. For instance, Boutsidis et al. [5] proposed a two-stage hybrid CSS algorithm which first samples $O(l \log l)$ columns based on probabilities calculated using the $l$ leading right singular vectors, and then employs a deterministic algorithm to select exactly $l$ columns from the columns sampled in the first stage. However, the algorithm depends on calculating the leading $l$ right singular vectors, which is time-consuming for large data sets. The hybrid algorithms for CSS can be easily implemented on MapReduce if the randomized selection step is MapReduce-efficient and the deterministic selection step can be implemented on a single machine. This is usually true if the number of columns selected by the randomized step is relatively small.

In comparison to other CSS methods, the algorithm proposed in this paper is designed to be MapReduce-efficient. In the distributed selection step, representative columns are selected based on a common representation. The common representation proposed in this work is based on random projection. This is more efficient than the work of Çivril and Magdon-Ismail [14], which selects columns based on the leading singular vectors. In comparison to other deterministic methods, the proposed algorithm is specifically designed to be parallelized, which makes it applicable to big data matrices whose columns are massively distributed. On the other hand, the two-step scheme of distributed and then centralized selection is similar to that of the hybrid CSS methods. The proposed algorithm, however, employs a deterministic algorithm at the distributed selection phase, which is more accurate than the randomized selection employed by hybrid methods in the first phase.

VIII. EXPERIMENTS

Experiments have been conducted on two big data sets to evaluate the efficiency and effectiveness of the proposed distributed CSS algorithm on MapReduce. The properties of the data sets are described in Table I.

Table I: The properties of the data sets used to evaluate the distributed CSS method.

Data set        Type        # Instances   # Features
RCV1-200K       Documents   193,844       47,236
TinyImages-1M   Images      1 million     1,024
The RCV1-200K is a subset of the RCV1 data set [26] which has been prepared and used by Chen et al. [27] to evaluate parallel spectral clustering algorithms. The TinyImages-1M data set contains 1 million images that were sampled from the 80 million tiny images data set [28] and converted to grayscale.

Similar to previous work on CSS, the different methods are evaluated according to their ability to minimize the reconstruction error of the data matrix based on the subset of selected columns. In order to quantify the reconstruction error across different data sets, a relative accuracy measure is defined as

\[ \text{Relative Accuracy} = \frac{\|A - \tilde{A}_U\|_F - \|A - \tilde{A}_S\|_F}{\|A - \tilde{A}_U\|_F - \|A - \tilde{A}_l\|_F} \times 100\%, \]

where Ã_U is the rank-l approximation of the data matrix based on a random subset U of columns, Ã_S is the rank-l approximation of the data matrix based on the subset S of columns, and Ã_l is the best rank-l approximation of the data matrix calculated using the Singular Value Decomposition (SVD). This measure compares different methods relative to uniform sampling as a baseline, with higher values indicating better performance.

The experiments were conducted on Amazon EC2² clusters, which consist of 10 instances for the RCV1-200K data set and 20 instances for the TinyImages-1M data set. Each instance has 7.5 GB of memory and a two-core processor. All instances run Debian and Hadoop. The data sets were converted into a binary format in the form of a sequence of key-value pairs. Each pair consisted of a column index as the key and a vector of the column entries as the value. This is the standard format used in Mahout³ for storing distributed matrices.

The distributed CSS method has been compared with different state-of-the-art methods. It should be noted that most of these methods were not designed with the goal of applying them to massively distributed data, and hence their implementation on MapReduce is not straightforward. However, the designed experiments used the best practices for implementing the different steps of these methods on MapReduce, to the best of the authors' knowledge.
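As an illustration of the relative accuracy measure defined above, the following Python sketch computes it for a small dense matrix held in memory. It is not the evaluation code used in the experiments. The function names are hypothetical, the norm is assumed to be the Frobenius norm, and the approximation based on a column subset is assumed to be the projection of A onto the span of the selected columns, which has rank at most l when exactly l linearly independent columns are selected.

```python
import numpy as np


def projection_error(A, cols):
    """Frobenius error of projecting A onto the span of A[:, cols].

    Assumes the selected columns are linearly independent.
    """
    Q, _ = np.linalg.qr(A[:, cols])      # orthonormal basis of the selected columns
    return np.linalg.norm(A - Q @ (Q.T @ A), "fro")


def best_rank_error(A, l):
    """Frobenius error of the best rank-l approximation (truncated SVD)."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sqrt(np.sum(s[l:] ** 2))   # energy in the discarded singular values


def relative_accuracy(A, selected, uniform, l):
    """Relative accuracy (%) of `selected` against the uniform baseline `uniform`."""
    err_u = projection_error(A, uniform)
    err_s = projection_error(A, selected)
    err_l = best_rank_error(A, l)
    return (err_u - err_s) / (err_u - err_l) * 100.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 50))    # 50 columns (instances), 20 features
    l = 5
    uniform = rng.choice(A.shape[1], size=l, replace=False)
    selected = np.argsort(-np.linalg.norm(A, axis=0))[:l]  # toy rule: largest norms
    print(relative_accuracy(A, selected, uniform, l))
```

Under these assumptions, a subset that does no better than the uniform baseline scores 0% or less, and no subset of l columns can exceed 100%, since the best rank-l approximation lower-bounds the error of any rank-l projection.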
Specifically, the following distributed CSS algorithms were compared.

UniNoRep: is uniform sampling of columns without replacement. This is usually the worst performing method in terms of approximation error and it will be used as a baseline to evaluate methods across different data sets.

HybridUni, HybridCol and HybridSVD: are different distributed variants of the hybrid CSS algorithm which can be implemented efficiently on MapReduce. In the randomized phase, the three methods use probabilities calculated based on uniform sampling, column norms and the norms of the rows of the leading singular vectors, respectively. The number of selected columns in the randomized phase is set to (l log l). In the deterministic phase, the centralized greedy CSS is employed to select exactly l columns from the randomly sampled columns.

DistApproxSVD: is an extension of the centralized algorithm for sparse approximation of Singular Value Decomposition (SVD) [14]. The distributed CSS algorithm presented in this paper (Algorithm 4) is used to select columns that best approximate the leading singular vectors (by setting B = U_k Σ_k). The use of the distributed CSS algorithm extends the original algorithm proposed by Çivril and Magdon-Ismail [14] to work on distributed matrices. In order to allow efficient implementation on MapReduce, the number of leading singular vectors is set to 100.

DistGreedyCSS: is the distributed column subset selection method described in Algorithm 4. For all experiments, the dimension of the random projection matrix is set to 100. This makes the size of the concise representation the same as in the DistApproxSVD method. Two types of random matrices are used for random projection: (1) a dense Gaussian random matrix (rnd), and (2) a sparse random sign matrix (ssgn).

For the methods that require the calculation of the Singular Value Decomposition (SVD), the Stochastic SVD (SSVD) algorithm [29] is used to approximate the leading singular values and vectors of the data matrix. The use of SSVD significantly reduces the run time of the original SVD-based algorithms while achieving comparable accuracy. In the conducted experiments, the SSVD implementation of Mahout was used.

² Amazon Elastic Compute Cloud (EC2).
³ Mahout is an Apache project for implementing machine learning algorithms on Hadoop.

TABLE II: THE RUN TIMES AND RELATIVE ACCURACIES OF DIFFERENT CSS METHODS. THE BEST PERFORMING METHOD FOR EACH l IS HIGHLIGHTED IN BOLD, AND THE SECOND BEST METHOD IS UNDERLINED. NEGATIVE MEASURES INDICATE METHODS THAT PERFORM WORSE THAN UNIFORM SAMPLING.

Columns: Run time (minutes) for l = 10, l = 100, l = 500; Relative accuracy (%) for l = 10, l = 100, l = 500.
Rows for RCV1-200K: Uniform - Baseline; Hybrid (Uniform); Hybrid (Column Norms); Hybrid (SVD-based); Distributed Approx. SVD; Distributed Greedy CSS (rnd); Distributed Greedy CSS (ssgn).
Rows for Tiny Images - 1M: Uniform - Baseline; Hybrid (Uniform); Hybrid (Column Norms); Hybrid (SVD-based); Distributed Approx. SVD; Distributed Greedy CSS (ssgn).

Table II shows the run times and relative accuracies for different CSS methods. It can be observed from the table that for the RCV1-200K data set, the DistGreedyCSS methods (with random Gaussian and sparse random sign matrices) outperform all other methods in terms of relative accuracies.
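The two kinds of random projection matrices used with DistGreedyCSS can be illustrated with a short single-machine Python sketch. Several details are assumptions made for illustration and are not taken from the text above: the function names, the scaling and sparsity of the sign matrix (here the distribution of Achlioptas [21], with entries +√3, 0 and −√3 drawn with probabilities 1/6, 2/3 and 1/6), and the form B = AΩ of the concise representation. The last function mimics the distributed setting by summing partial products over blocks of columns.

```python
import numpy as np


def gaussian_projection(n, c, rng):
    """Dense Gaussian random matrix of size n x c (the 'rnd' variant)."""
    return rng.standard_normal((n, c))


def sparse_sign_projection(n, c, rng):
    """Sparse random sign matrix of size n x c (the 'ssgn' variant).

    Assumption: Achlioptas-style entries sqrt(3) * {+1, 0, -1}
    with probabilities {1/6, 2/3, 1/6}.
    """
    signs = rng.choice([1.0, 0.0, -1.0], size=(n, c), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0) * signs


def concise_representation(A, omega):
    """Concise representation B = A @ omega (assumed form), of size m x c."""
    return A @ omega


def blockwise_representation(A, omega, n_blocks):
    """Same product, accumulated over column blocks as separate machines might do."""
    blocks = np.array_split(np.arange(A.shape[1]), n_blocks)
    # Each block only needs its own columns of A and the matching rows of omega.
    return sum(A[:, idx] @ omega[idx, :] for idx in blocks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 30))            # 30 columns spread over 3 "machines"
    omega = sparse_sign_projection(A.shape[1], 4, rng)
    B_full = concise_representation(A, omega)
    B_blocks = blockwise_representation(A, omega, n_blocks=3)
    print(np.allclose(B_full, B_blocks))
```

About two-thirds of the entries of the sparse sign matrix are zero, so it needs fewer multiplications than the dense Gaussian matrix, which is in line with the shorter run times reported for the ssgn variant.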
In addition, the run times of both DistGreedyCSS variants are relatively small compared to the DistApproxSVD method, which achieves accuracies that are close to those of the DistGreedyCSS method. Both the DistApproxSVD and DistGreedyCSS methods achieve very good approximation accuracies compared to randomized and hybrid methods. It should also be noted that using a sparse random sign matrix for random projection takes much less time than a dense Gaussian matrix, while achieving comparable approximation accuracies. Based on this observation, the sparse random matrix has been used with the TinyImages-1M data set.

For the TinyImages-1M data set, although DistApproxSVD achieves slightly higher approximation accuracies than DistGreedyCSS (with sparse random sign matrix), DistGreedyCSS selects columns in almost one-third of the time. The reason why DistApproxSVD outperforms DistGreedyCSS for this data set is that the rank of the data matrix is relatively small (less than 1024). This means that using the leading 100 singular values to form the concise representation of the data matrix captures most of the information in the matrix and accordingly is more accurate than random projection. DistGreedyCSS, however, still selects a very good subset of columns in a relatively small time.

IX. CONCLUSION

This paper proposes an accurate and efficient MapReduce algorithm for selecting a subset of columns from a massively distributed matrix. The algorithm starts by learning a concise representation of the data matrix using random projection. It then selects columns from each sub-matrix that best approximate this concise representation. A centralized selection step is then performed on the columns selected from different sub-matrices. In order to facilitate the implementation of the proposed method, a novel algorithm for greedy generalized CSS is proposed to perform the selection from different sub-matrices. In addition, the different steps of the algorithms are carefully designed to be MapReduce-efficient.
Experiments on big data sets demonstrate the effectiveness and efficiency of the proposed algorithm in comparison to other CSS methods when implemented on distributed data.

REFERENCES

[1] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1988.
[2] L. Kaufman and P. Rousseeuw, "Clustering by means of medoids," Technische Hogeschool, Delft (Netherlands). Department of Mathematics and Informatics, Tech. Rep.
[3] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman, "Indexing by latent semantic analysis," Journal of the American Society for Information Science and Technology, vol. 41, no. 6.
[4] C. Boutsidis, J. Sun, and N. Anerousis, "Clustered subset selection and its applications on IT service metrics," in Proceedings of the Seventeenth ACM Conference on Information and Knowledge Management (CIKM '08), 2008.
[5] C. Boutsidis, M. W. Mahoney, and P. Drineas, "An improved approximation algorithm for the column subset selection problem," in Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '09), 2009.
[6] C. Boutsidis, P. Drineas, and M. Magdon-Ismail, "Near optimal column-based matrix reconstruction," in Proceedings of the 52nd Annual IEEE Symposium on Foundations of Computer Science (FOCS '11), 2011.
[7] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1.
[8] T. White, Hadoop: The Definitive Guide, 1st ed. O'Reilly Media, Inc.
[9] A. Frieze, R. Kannan, and S. Vempala, "Fast Monte-Carlo algorithms for finding low-rank approximations," in Proceedings of the 39th Annual IEEE Symposium on Foundations of Computer Science (FOCS '98), 1998.
[10] P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay, "Clustering large graphs via the singular value decomposition," Machine Learning, vol. 56, no. 1-3, pp. 9-33.
[11] P. Drineas, R. Kannan, and M. Mahoney, "Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix," SIAM Journal on Computing, vol. 36, no. 1.
[12] P. Drineas, M. Mahoney, and S. Muthukrishnan, "Subspace sampling and relative-error matrix approximation: Column-based methods," in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Springer Berlin / Heidelberg, 2006.
[13] A. Deshpande, L. Rademacher, S. Vempala, and G. Wang, "Matrix approximation and projective clustering via volume sampling," Theory of Computing, vol. 2, no. 1.
[14] A. Çivril and M. Magdon-Ismail, "Column subset selection via sparse approximation of SVD," Theoretical Computer Science, vol. 421, no. 0, pp. 1-14.
[15] A. K. Farahat, A. Ghodsi, and M. S. Kamel, "An efficient greedy method for unsupervised feature selection," in Proceedings of the Eleventh IEEE International Conference on Data Mining (ICDM '11), 2011.
[16] A. K. Farahat, A. Ghodsi, and M. S. Kamel, "Efficient greedy feature selection for unsupervised learning," Knowledge and Information Systems, vol. 35, no. 2.
[17] T. Elsayed, J. Lin, and D. W. Oard, "Pairwise document similarity in large collections with MapReduce," in Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers (HLT '08), 2008.
[18] A. Ene, S. Im, and B. Moseley, "Fast clustering using MapReduce," in Proceedings of the Seventeenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), 2011.
[19] H. Karloff, S. Suri, and S. Vassilvitskii, "A model of computation for MapReduce," in Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '10), 2010.
[20] S. Dasgupta and A. Gupta, "An elementary proof of a theorem of Johnson and Lindenstrauss," Random Structures and Algorithms, vol. 22, no. 1.
[21] D. Achlioptas, "Database-friendly random projections: Johnson-Lindenstrauss with binary coins," Journal of Computer and System Sciences, vol. 66, no. 4.
[22] P. Li, T. J. Hastie, and K. W. Church, "Very sparse random projections," in Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '06), 2006.
[23] G. Golub and C. Van Loan, Matrix Computations, 3rd ed. Johns Hopkins Univ Pr.
[24] A. Deshpande and L. Rademacher, "Efficient volume sampling for row/column subset selection," in Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS '10), 2010.
[25] V. Guruswami and A. K. Sinop, "Optimal column-based low-rank matrix reconstruction," in Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '12), 2012.
[26] D. D. Lewis, Y. Yang, T. G. Rose, and F. Li, "RCV1: A new benchmark collection for text categorization research," The Journal of Machine Learning Research, vol. 5.
[27] W.-Y. Chen, Y. Song, H. Bai, C.-J. Lin, and E. Chang, "Parallel spectral clustering in distributed systems," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33, no. 3.
[28] A. Torralba, R. Fergus, and W. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 11.
[29] N. Halko, P.-G. Martinsson, Y. Shkolnisky, and M. Tygert, "An algorithm for the principal component analysis of large data sets," SIAM Journal on Scientific Computing, vol. 33, no. 5, 2011.