Multi-View Regression via Canonical Correlation Analysis
Sham M. Kakade (1) and Dean P. Foster (2)

(1) Toyota Technological Institute at Chicago, Chicago, IL
(2) University of Pennsylvania, Philadelphia, PA

Abstract. In the multi-view regression problem, we have a regression problem where the input variable (which is a real vector) can be partitioned into two different views, where it is assumed that either view of the input is sufficient to make accurate predictions; this is essentially (a significantly weaker version of) the co-training assumption for the regression problem. We provide a semi-supervised algorithm which first uses unlabeled data to learn a norm (or, equivalently, a kernel) and then uses labeled data in a ridge regression algorithm (with this induced norm) to provide the predictor. The unlabeled data is used via canonical correlation analysis (CCA, which is closely related to PCA for two random variables) to derive an appropriate norm over functions. We are able to characterize the intrinsic dimensionality of the subsequent ridge regression problem (which uses this norm) by the correlation coefficients provided by CCA in a rather simple expression. Interestingly, the norm used by the ridge regression algorithm is derived from CCA, unlike in standard kernel methods where a special a priori norm is assumed (i.e. a Banach space is assumed). We discuss how this result shows that unlabeled data can decrease the sample complexity.

1 Introduction

Extracting information relevant to a task in an unsupervised (or semi-supervised) manner is one of the fundamental challenges in machine learning; the underlying question is how unlabeled data can be used to improve performance. In the multi-view approach to semi-supervised learning [Yarowsky, 1995, Blum and Mitchell, 1998], one assumes that the input variable x can be split into two different views (x^(1), x^(2)), such that good predictors based on each view tend to agree. Roughly speaking, the common underlying multi-view assumption is that the best predictor from either view has a low error; thus the best predictors tend to agree with each other.

There are many applications where this underlying assumption is applicable. For example, consider object recognition with pictures from different camera angles: we expect a predictor based on either angle to have good performance. One can even
consider multi-modal views, e.g. identity recognition where the task might be to identify a person with one view being a video stream and the other an audio stream; each of these views would be sufficient to determine the identity. In NLP, an example would be a paired document corpus, consisting of a document and its translation into another language. The motivating example in Blum and Mitchell [1998] is a web-page classification task, where one view was the text in the page and the other was the hyperlink structure.

A characteristic of many of the multi-view learning algorithms [Yarowsky, 1995, Blum and Mitchell, 1998, Farquhar et al., 2005, Sindhwani et al., 2005, Brefeld et al., 2006] is to force agreement between the predictors based on either view. The idea is to force a predictor h^(1)(·), based on view one, to agree with a predictor h^(2)(·), based on view two, i.e. by constraining h^(1)(x^(1)) to usually equal h^(2)(x^(2)). The intuition is that the complexity of the learning problem should be reduced by eliminating hypotheses from each view that do not agree with each other (which can be done using unlabeled data).

This paper studies the multi-view, linear regression case: the inputs x^(1) and x^(2) are real vectors; the outputs y are real valued; the samples ((x^(1), x^(2)), y) are jointly distributed; and the prediction of y is linear in the input x. Our first contribution is to explicitly formalize a multi-view assumption for regression. The multi-view assumption we use is a regret based one, where we assume that the best linear predictor from each view is roughly as good as the best linear predictor based on both views. Denote the (expected) squared loss of a prediction function g(x) by loss(g). More precisely, the multi-view assumption is that

    loss(f^(1)) - loss(f) ≤ ε
    loss(f^(2)) - loss(f) ≤ ε

where f^(ν) is the best linear predictor based on view ν ∈ {1, 2} and f is the best linear predictor based on both views (so f^(ν) is a linear function of x^(ν) and f is a linear function of x = (x^(1), x^(2))). This assumption implies that (only on average) the predictors must agree (shown in Lemma 1). Clearly, if both optimal predictors f^(1) and f^(2) have small error, then this assumption is satisfied, though this precondition is not necessary. This (average) agreement is explicitly used in the co-regularized least squares algorithms of Sindhwani et al. [2005], Brefeld et al. [2006], which directly constrain such an agreement in a least squares optimization problem.

This assumption is rather weak in comparison to previous assumptions [Blum and Mitchell, 1998, Dasgupta et al., 2001, Abney, 2004]. Our assumption can be viewed as weakening the original co-training assumption (for the classification case). First, our assumption is stated in terms of expected errors only and implies only expected approximate agreement (see Lemma 1). Second, our assumption is only in terms of regret: we do not require that the loss of any predictor be small. Lastly, we make no further distributional assumptions (aside from a bounded second moment on the output variable), such as the commonly used, overly-stringent assumption that the distribution of the views be conditionally
independent given the label [Blum and Mitchell, 1998, Dasgupta et al., 2001, Abney, 2004]. Balcan and Blum [2006] provide a compatibility notion which also relaxes this latter assumption, though it is unclear if this compatibility notion (defined for the classification setting) easily extends to the assumption above.

Our main result provides an algorithm and an analysis under the above multi-view regression assumption. The algorithm used can be thought of as a ridge regression algorithm with regularization based on a norm that is determined by canonical correlation analysis (CCA). Intuitively, CCA [Hotelling, 1935] is an unsupervised method for analyzing jointly distributed random vectors. In our setting, CCA can be performed with the unlabeled data. We characterize the expected regret of our multi-view algorithm, in comparison to the best linear predictor, as a sum of a bias and a variance term: the bias is 4ε, so it is small if the multi-view assumption is good; and the variance is d/n, where n is the sample size and d is the intrinsic dimensionality, which we show to be the sum of the squares of the correlation coefficients provided by CCA. The notion of intrinsic dimensionality we use is related to that of Zhang [2005], which provides a notion of intrinsic dimensionality for kernel methods.

An interesting aspect of our setting is that no a priori assumptions are made about any special norm over the space of linear predictions, unlike in kernel methods, which a priori impose a Banach space over predictors. In fact, our multi-view assumption is coordinate free: the assumption is stated in terms of the best linear predictor for the given linear subspaces, which has no reference to any coordinate system. Furthermore, no a priori assumptions about the dimensionality of our spaces are made, so the approach is applicable to infinite dimensional methods, including kernel methods. In fact, kernel CCA methods have been developed in Hardoon et al. [2004].

The remainder of the paper is organized as follows. Section 2 formalizes our multi-view assumption and reviews CCA. Section 3 presents the main results, where the bias-variance tradeoff and the intrinsic dimensionality are characterized. The Discussion expands on a number of points. The foremost issue addressed is how the multi-view assumption, with unlabeled data, could potentially allow a significant reduction in the sample size. Essentially, in the high (or infinite) dimensional case, the multi-view assumption imposes a norm which could coincide with a much lower intrinsic dimensionality. In the Discussion, we also examine two related multi-view learning algorithms: the SVM-2K algorithm of Farquhar et al. [2005] and the co-regularized least squares regression algorithm of Sindhwani et al. [2005].

2 Preliminaries

The first part of this section presents the multi-view regression setting and formalizes the multi-view assumption. As is standard, we work with a distribution D(x, y) over input-output pairs. To abstract away the difficulties of analyzing the use of a random unlabeled set sampled from D(x), we instead assume that
the second order statistics of x are known. The transductive setting and the fixed design setting (which we discuss later in Section 3) are cases where this assumption is satisfied. The second part of this section reviews CCA.

2.1 Regression with Multiple Views

Assume that the input space X is a subset of a real linear space, which is of either finite dimension (i.e. X ⊂ R^d) or countably infinite dimension. Also assume that each x ∈ X is in ℓ2 (i.e. x is a square summable sequence). In the multi-view framework, assume each x has the form x = (x^(1), x^(2)), where x^(1) and x^(2) are interpreted as the two views of x. Hence, x^(1) is an element of a real linear space X^(1) and x^(2) is in a real linear space X^(2) (and both x^(1) and x^(2) are in ℓ2). Conceptually, we should think of these spaces as being high dimensional (or countably infinite dimensional).

We also have outputs y that are in R, along with a joint distribution D(x, y) over X × R. We assume that the second moment of the output is bounded by 1, i.e. E[y^2 | x] ≤ 1; it is not required that y itself be bounded. No boundedness assumptions on x ∈ X are made, since such assumptions would have no impact on our analysis, as it is only the subspace defined by X that is relevant. We also assume that our algorithm has knowledge of the second order statistics of D(x), i.e. we assume that the covariance matrix of x is known. In both the transductive setting and the fixed design setting, such an assumption holds. This is discussed in more detail in Section 3.

The loss function considered for g : X → R is the average squared error. More formally,

    loss(g) = E[(g(x) - y)^2]

where the expectation is with respect to (x, y) sampled from D. We are also interested in the losses for predictors g^(1) : X^(1) → R and g^(2) : X^(2) → R, based on the different views, which are just loss(g^(ν)) for ν ∈ {1, 2}.

The following assumption is made throughout the paper.

Assumption 1 (Multi-View Assumption) Define L(Z) to be the space of linear mappings from a linear space Z to the reals, and define:

    f^(1) = argmin_{g ∈ L(X^(1))} loss(g)
    f^(2) = argmin_{g ∈ L(X^(2))} loss(g)
    f = argmin_{g ∈ L(X)} loss(g)

which exist since X is a subset of ℓ2. The multi-view assumption is that

    loss(f^(ν)) - loss(f) ≤ ε

for ν ∈ {1, 2}.

Note that this assumption makes no reference to any coordinate system or norm over the linear functions. Also, it is not necessarily assumed that the losses
themselves are small. However, if loss(f^(ν)) is small for ν ∈ {1, 2}, say less than ε, then it is clear that the above assumption is satisfied.

The following lemma shows that the above assumption implies that f^(1) and f^(2) tend to agree on average.

Lemma 1. Assumption 1 implies that:

    E[ (f^(1)(x^(1)) - f^(2)(x^(2)))^2 ] ≤ 4ε

where the expectation is with respect to x sampled from D.

The proof is provided in the Appendix. As mentioned in the Introduction, this agreement is explicitly used in the co-regularized least squares algorithms of Sindhwani et al. [2005], Brefeld et al. [2006].

2.2 CCA and the Canonical Basis

A useful basis is that provided by CCA, which we define as the canonical basis.

Definition 1. Let B^(1) be a basis of X^(1) and B^(2) be a basis of X^(2). Let x^(ν)_1, x^(ν)_2, ... be the coordinates of x^(ν) in B^(ν). The pair of bases B^(1) and B^(2) are the canonical bases if the following holds (where the expectation is with respect to D):

1. Orthogonality Conditions:

    E[x^(ν)_i x^(ν)_j] = 1 if i = j, and 0 otherwise

2. Correlation Conditions:

    E[x^(1)_i x^(2)_j] = λ_i if i = j, and 0 otherwise

where, without loss of generality, it is assumed that 1 ≥ λ_i ≥ 0 and that 1 ≥ λ_1 ≥ λ_2 ≥ ... The i-th canonical correlation coefficient is defined as λ_i.

Roughly speaking, the joint covariance matrix of x = (x^(1), x^(2)) in the canonical basis has a particular structured form: the individual covariance matrices of x^(1) and x^(2) are just identity matrices, and the cross covariance matrix between x^(1) and x^(2) is diagonal. CCA can also be specified as an eigenvalue problem³ (see Hardoon et al. [2004] for a review).

³ CCA finds such a basis as follows. The correlation coefficient between two (jointly distributed) real values is defined as corr(z, z') = E[zz'] / √(E[z^2] E[z'^2]). Let Π_a x be the projection operator, which projects x onto direction a. The first canonical basis vectors b^(1)_1 ∈ B^(1) and b^(2)_1 ∈ B^(2) are the unit length directions a and b which maximize corr(Π_a x^(1), Π_b x^(2)), and the corresponding canonical correlation coefficient λ_1 is this maximal correlation. Inductively, the next pair of directions can be found which maximize the correlation subject to the pair being orthogonal to the previously found pairs.
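To make Definition 1 concrete, the following is a minimal NumPy sketch (our own illustration, not code from the paper) of how a canonical basis and the correlation coefficients can be computed from the second order statistics of the two views, via whitening followed by an SVD; the function name, variable names, and the small regularization term are hypothetical choices added for numerical convenience.

    import numpy as np

    def canonical_basis(C11, C22, C12, reg=1e-8):
        """Canonical bases and correlation coefficients from second order statistics.

        C11, C22 : covariance matrices of view 1 and view 2
        C12      : cross-covariance matrix between the views
        reg      : small ridge term added only for numerical stability
        Returns (A, B, lam): the columns of A and B give canonical coordinates
        z1 = A.T @ x1 and z2 = B.T @ x2 satisfying the orthogonality and
        correlation conditions of Definition 1, with lam = (lambda_1, lambda_2, ...).
        """
        def inv_sqrt(C):
            # Symmetric inverse square root, used as a whitening transform.
            w, V = np.linalg.eigh(C + reg * np.eye(C.shape[0]))
            return V @ np.diag(w ** -0.5) @ V.T

        W1, W2 = inv_sqrt(C11), inv_sqrt(C22)
        # The SVD of the whitened cross-covariance yields the canonical directions,
        # with the canonical correlation coefficients as singular values.
        U, lam, Vt = np.linalg.svd(W1 @ C12 @ W2, full_matrices=False)
        A = W1 @ U
        B = W2 @ Vt.T
        return A, B, np.clip(lam, 0.0, 1.0)

In the transductive or fixed design settings, the covariance matrices above are exactly those of the known set X (or of the unlabeled data), so this computation uses no labels.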
3 Learning

Now let us assume we have observed a training sample T = {(x^(ν)_m, y_m)}_{m=1}^{n} of size n from a view ν, where the samples are drawn independently from D. We also assume that our algorithm has access to the covariance matrix of x, so that the algorithm can construct the canonical basis. Our goal is to construct an estimator f̂^(ν) of f^(ν) (recall f^(ν) is the best linear predictor using only view ν) such that the regret

    loss(f̂^(ν)) - loss(f^(ν))

is small.

Remark 1. (The Transductive and Fixed Design Setting) There are two natural settings where this assumption of knowledge about the second order statistics of x holds: the random transductive case and the fixed design case. In both cases, X is a known finite set. In the random transductive case, the distribution D is assumed to be uniform over X, so each x_m is sampled uniformly from X and each y_m is sampled from D(y | x_m). In the fixed design case, assume that each x ∈ X appears exactly once in T and again each y_m is sampled from D(y | x_m). The fixed design case is commonly studied in statistics and is also referred to as signal reconstruction.⁴ The covariance matrix of x is clearly known in both cases.

3.1 A Shrinkage Estimator (via Ridge Regression)

Let the representation of our estimator f̂^(ν) in the canonical basis B^(ν) be

    f̂^(ν)(x^(ν)) = Σ_i β̂^(ν)_i x^(ν)_i    (1)

where x^(ν)_i is the i-th coordinate in B^(ν). Define the canonical shrinkage estimator of β^(ν)_i as:

    β̂^(ν)_i = λ_i Ê[x_i y] = (λ_i/n) Σ_m x^(ν)_{m,i} y_m    (2)

Intuitively, the shrinkage by λ_i down-weights directions that are less correlated with the other view. In the extreme case, this estimator ignores the uncorrelated coordinates, those where λ_i = 0. The following remark shows how this estimator has a natural interpretation in the fixed design setting: it is the result of ridge regression with a specific norm (induced by CCA) over functions in L(X^(ν)).

⁴ In the fixed design case, one can view each y_m = f(x_m) + η, where η is zero-mean noise, so f(x_m) is the conditional mean. After observing a sample {(x^(ν)_m, y_m)}_{m=1}^{|X|} for all x ∈ X (so n = |X|), the goal is to reconstruct f(·) accurately.
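As an illustration of Equation 2, the following short sketch (again our own code, with hypothetical names) computes the canonical shrinkage estimator from a labeled sample whose inputs have already been expressed in the canonical basis, e.g. Z = X1 @ A with A from the canonical_basis sketch above.

    import numpy as np

    def shrinkage_estimator(Z, y, lam):
        """Canonical shrinkage estimator (Equation 2).

        Z   : (n, d) labeled inputs from one view, in canonical coordinates
        y   : (n,) outputs
        lam : (d,) canonical correlation coefficients
        """
        emp = Z.T @ y / len(y)   # empirical expectations E^[x_i y], one per coordinate
        return lam * emp         # shrink coordinate i by lambda_i

    def predict(Z, beta):
        # Linear prediction in the canonical coordinates (Equation 1).
        return Z @ beta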
Remark 2. (Canonical Ridge Regression) We now specify a ridge regression algorithm for which the shrinkage estimator is the solution. Define the canonical norm for a linear function in L(X^(ν)) as follows: using the representation of f^(ν) in B^(ν) as defined in Equation 1, the (squared) canonical norm of f^(ν) is defined as

    ||f^(ν)||_CCA^2 = Σ_i ((1 - λ_i)/λ_i) (β^(ν)_i)^2    (3)

where we overload notation and write ||f^(ν)||_CCA = ||β^(ν)||_CCA. Hence, functions which have large weights in the less correlated directions (those with small λ_i) have larger norms. Equipped with this norm, the functions in L(X^(ν)) define a Banach space.

In the fixed design setting, the ridge regression algorithm with this norm chooses the β^(ν) which minimizes

    (1/|X|) Σ_{m=1}^{|X|} (y_m - β^(ν) · x^(ν)_m)^2 + ||β^(ν)||_CCA^2

Recall that in the fixed design setting we have a training example for each x ∈ X, so the sum is over all x ∈ X. It is straightforward to show (by using orthogonality) that the estimator which minimizes this loss is the canonical shrinkage estimator defined above. In the more general transductive case, it is not quite this estimator, since the sampled points {x^(ν)_m}_m may not be orthogonal in the training sample (they are only orthogonal when summed over all x ∈ X). However, in this case, we expect that the estimator provided by ridge regression is approximately equal to the shrinkage estimator.
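The equivalence claimed in Remark 2 can be checked numerically. The snippet below is a small synthetic sanity check of our own (not an experiment from the paper): it builds a fixed design whose canonical coordinates are orthonormal over the full design and verifies that ridge regression with the penalty of Equation 3 coincides with the shrinkage estimator of Equation 2.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 5
    lam = np.array([0.95, 0.8, 0.5, 0.2, 0.05])

    # Canonical coordinates of the design, orthonormal over the full design:
    # (1/n) Z.T @ Z = I, as in the fixed design setting.
    Z, _ = np.linalg.qr(rng.standard_normal((n, d)))
    Z *= np.sqrt(n)
    y = Z @ rng.standard_normal(d) + rng.standard_normal(n)

    emp = Z.T @ y / n                  # E^[x_i y]
    beta_shrink = lam * emp            # Equation 2

    # argmin (1/n) ||y - Z b||^2 + sum_i ((1 - lam_i)/lam_i) b_i^2
    penalty = np.diag((1.0 - lam) / lam)
    beta_ridge = np.linalg.solve(Z.T @ Z / n + penalty, emp)

    assert np.allclose(beta_shrink, beta_ridge)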
We now state the first main theorem.

Theorem 1. Assume that E[y^2 | x] ≤ 1 and that Assumption 1 holds. Let f̂^(ν) be the estimator constructed with the canonical shrinkage estimator (Equation 2) on training set T. For ν ∈ {1, 2},

    E_T[loss(f̂^(ν))] - loss(f^(ν)) ≤ 4ε + (Σ_i λ_i^2)/n

where the expectation is with respect to the training set T sampled according to D^n.

We comment on obtaining high probability bounds in the Discussion. The proof (presented in Section 3.3) shows that the 4ε results from the bias in the algorithm and the (Σ_i λ_i^2)/n results from the variance. It is natural to interpret Σ_i λ_i^2 as the intrinsic dimensionality. Note that Assumption 1 implies that:

    E_T[loss(f̂^(ν))] - loss(f) ≤ 5ε + (Σ_i λ_i^2)/n

where the comparison is to the best linear predictor f over both views.

Remark 3. (Intrinsic Dimensionality) Let β̂^(ν) be a linear estimator in the vector of sampled outputs, Y = (y_1, y_2, ..., y_n). Note that the shrinkage estimator above is such a linear estimator (in the fixed design case). We can write β̂^(ν) = P Y, where P is a linear operator. Zhang [2005] defines tr(P^T P) as the intrinsic dimensionality, where tr(·) is the trace operator. This was motivated by the fact that in the fixed design setting the error drops as tr(P^T P)/n, which is bounded by d/n in a finite dimensional space. Zhang [2005] then goes on to analyze the intrinsic dimensionality of kernel methods in the random design setting (obtaining high probability bounds). In our setting, the sum Σ_i λ_i^2 is precisely this trace, as P is a diagonal matrix with entries λ_i.

3.2 A (Possibly) Lower Dimensional Estimator

Consider the thresholded estimator:

    β̂^(ν)_i = Ê[x_i y]  if λ_i ≥ 1 - √ε,  and 0 otherwise    (4)

where again Ê[x_i y] is the empirical expectation (1/n) Σ_m x^(ν)_{m,i} y_m. This estimator uses an unbiased estimate of β^(ν)_i for those i with large λ_i and thresholds to 0 those with small λ_i. Hence, the estimator lives in a finite dimensional space (determined by the number of λ_i which are greater than 1 - √ε).

Theorem 2. Assume that E[y^2 | x] ≤ 1 and that Assumption 1 holds. Let d be the number of λ_i for which λ_i ≥ 1 - √ε. Let f̂^(ν) be the estimator constructed with the thresholded estimator (Equation 4) on training set T. For ν ∈ {1, 2},

    E_T[loss(f̂^(ν))] - loss(f^(ν)) ≤ 4√ε + d/n

where the expectation is with respect to the training set T sampled according to D^n.

Essentially, the above increases the bias to 4√ε and (potentially) decreases the variance. Such a bound may be useful if we desire to explicitly keep β̂^(ν) in a lower dimensional space; in contrast, the explicit dimensionality of the shrinkage estimator could be as large as |X|.
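A corresponding sketch of the thresholded estimator of Equation 4, following the same (hypothetical) conventions as the snippets above:

    import numpy as np

    def thresholded_estimator(Z, y, lam, eps):
        """Thresholded estimator (Equation 4).

        Keeps the unbiased coordinate-wise estimate E^[x_i y] only for the
        directions with lambda_i >= 1 - sqrt(eps); all other coordinates are
        set to zero, so the estimator lives in a d-dimensional space with
        d = number of retained directions.
        """
        emp = Z.T @ y / len(y)
        keep = lam >= 1.0 - np.sqrt(eps)
        return np.where(keep, emp, 0.0)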
3.3 The Bias-Variance Tradeoff

This section provides lemmas for the proofs of the previous theorems. We characterize the bias-variance tradeoff in this error analysis. First, a key technical lemma is useful, for which the proof is provided in the Appendix.

Lemma 2. Let the representation of the best linear predictor f^(ν) (defined in Assumption 1) in the canonical basis B^(ν) be

    f^(ν)(x^(ν)) = Σ_i β^(ν)_i x^(ν)_i    (5)

Assumption 1 implies that

    Σ_i (1 - λ_i)(β^(ν)_i)^2 ≤ 4ε

for ν ∈ {1, 2}.

This lemma shows how the weights (of an optimal linear predictor) cannot be too large in coordinates with small canonical correlation coefficients. This is because for those coordinates with small λ_i, the corresponding β_i must be small enough so that the bound is not violated. This lemma provides the technical motivation for our algorithms.

Now let us review some useful properties of the square loss. Using the representations of f̂^(ν) and f^(ν) defined in Equations 1 and 5, a basic fact for the square loss with linear predictors is that

    loss(f̂^(ν)) - loss(f^(ν)) = ||β̂^(ν) - β^(ν)||_2^2

where ||x||_2^2 = Σ_i x_i^2. The expected regret can be decomposed as follows:

    E_T[ ||β̂^(ν) - β^(ν)||_2^2 ] = ||E_T[β̂^(ν)] - β^(ν)||_2^2 + E_T[ ||β̂^(ν) - E_T[β̂^(ν)]||_2^2 ]    (6)
                                  = ||E_T[β̂^(ν)] - β^(ν)||_2^2 + Var(β̂^(ν))    (7)

where the first term is the bias and the second is the variance. The proofs of Theorems 1 and 2 follow directly from the next two lemmas.

Lemma 3. (Bias-Variance for the Shrinkage Estimator) Under the preconditions of Theorem 1, the bias is bounded as

    ||E_T[β̂^(ν)] - β^(ν)||_2^2 ≤ 4ε

and the variance is bounded as

    Var(β̂^(ν)) ≤ (Σ_i λ_i^2)/n

Proof. It is straightforward to see that

    β^(ν)_i = E[x_i y]

which implies that

    E_T[β̂^(ν)_i] = λ_i β^(ν)_i
Hence, for the bias term, we have:

    ||E_T[β̂^(ν)] - β^(ν)||_2^2 = Σ_i (1 - λ_i)^2 (β^(ν)_i)^2 ≤ Σ_i (1 - λ_i)(β^(ν)_i)^2 ≤ 4ε

For the variance, we have:

    Var(β̂^(ν)_i) = (λ_i^2/n) Var(x^(ν)_i y)
                 ≤ (λ_i^2/n) E[(x^(ν)_i y)^2]
                 = (λ_i^2/n) E[(x^(ν)_i)^2 E[y^2 | x]]
                 ≤ (λ_i^2/n) E[(x^(ν)_i)^2]
                 = λ_i^2/n

The proof is completed by summing over i.

Lemma 4. (Bias-Variance for the Thresholded Estimator) Under the preconditions of Theorem 2, the bias is bounded as

    ||E_T[β̂^(ν)] - β^(ν)||_2^2 ≤ 4√ε

and the variance is bounded as

    Var(β̂^(ν)) ≤ d/n

Proof. For those i such that λ_i ≥ 1 - √ε,

    E_T[β̂^(ν)_i] = β^(ν)_i

Let j be the index at which the thresholding begins to occur, i.e. it is the smallest integer such that λ_j < 1 - √ε. Using that for i ≥ j we have 1 < (1 - λ_j)/√ε ≤
(1 - λ_i)/√ε, the bias can be bounded as follows:

    ||E_T[β̂^(ν)] - β^(ν)||_2^2 = Σ_{i ≥ j} (β^(ν)_i)^2
                                ≤ Σ_{i ≥ j} ((1 - λ_i)/√ε) (β^(ν)_i)^2
                                ≤ (1/√ε) Σ_i (1 - λ_i)(β^(ν)_i)^2
                                ≤ 4√ε

where the last step uses Lemma 2. Analogous to the previous proof, for each i < j we have

    Var(β̂^(ν)_i) ≤ 1/n

and there are d such i.

4 Discussion

Why does unlabeled data help? Theorem 1 shows that the regret drops at a uniform rate (down to 4ε). This rate is the intrinsic dimensionality, Σ_i λ_i^2, divided by the sample size n. Note that this intrinsic dimensionality is only a property of the input distribution. Without the multi-view assumption (or working in the single view case), the rate at which our error drops is governed by the extrinsic dimensionality of x, which could be large (or countably infinite), making this rate very slow without further assumptions. It is straightforward to see that the intrinsic dimensionality is no greater than the extrinsic dimensionality (since each λ_i is bounded by 1), though it could be much less. Knowledge of the covariance matrix of x allows us to compute the CCA basis and construct the shrinkage estimator, which has the improved convergence rate based on the intrinsic dimensionality. Such second order statistical knowledge can be provided by the unlabeled data, such as in the transductive and fixed design settings.

Let us compare to a ridge regression algorithm (in the single view case), where one a priori chooses a norm for regularization (such as an RKHS norm imposed by a kernel). As discussed in Zhang [2005], this regularization governs the bias-variance tradeoff. The regularization can significantly decrease the variance: the variance drops as d/n, where d is a notion of intrinsic dimensionality defined in Zhang [2005]. However, the regularization also biases the algorithm toward predictors with small norm, and there is no a priori reason that there exists a good predictor with a bounded norm (under the pre-specified norm). In order to obtain a reasonable convergence rate, it must also be the case that the best predictor (or a good one) has a small norm under our pre-specified norm.
In contrast, in the multi-view case, the multi-view assumption implies that the bias is bounded: recall that Lemma 3 showed that the bias was bounded by 4ε. Essentially, our proof shows that the bias induced by using the special norm induced by CCA (in Equation 3) is small. Now it may be the case that we have a priori knowledge of what a good norm is. However, learning the norm (or learning the kernel) is an important open question. The multi-view setting provides one solution to this problem.

Can the bias be decreased to 0 asymptotically? Theorem 1 shows that the error drops down to 4ε for large n. It turns out that we cannot drive this bias to 0 asymptotically without further assumptions, as the input space could be infinite dimensional.

On obtaining high probability bounds. Clearly, stronger assumptions are needed than just a bounded second moment to obtain high probability bounds with concentration properties. For the fixed design setting, if y is bounded, then it is straightforward to obtain high probability bounds through standard Chernoff arguments. For the random transductive case, this assumption is not sufficient; this is due to the additional randomness from x. Note that we cannot artificially impose a bound on x, as the algorithm only depends on the subspace spanned by X, so upper bounds have no meaning: the algorithm scales X such that it has an identity covariance matrix (e.g. E[x_i^2] = 1). However, if we have a higher moment bound, say on the ratio E[x_i^4]/E[x_i^2], then the Bennett bound could be used to obtain data dependent high probability bounds, though providing these is beyond the scope of this paper.

Related work. The most closely related multi-view learning algorithms are the SVM-2K algorithm of Farquhar et al. [2005] and the co-regularized least squares regression algorithm of Sindhwani et al. [2005]. Roughly speaking, both of these algorithms try to find two hypotheses, h^(1)(·) based on view one and h^(2)(·) based on view two, which both have low training error and which tend to agree with each other, where the latter condition is enforced by constraining h^(1)(x^(1)) to usually equal h^(2)(x^(2)) on an unlabeled data set. The SVM-2K algorithm considers a classification setting, and the algorithm attempts to force agreement between the two hypotheses with slack variable style constraints, common to SVM algorithms. While this algorithm is motivated by kernel CCA and SVMs, the algorithm does not directly use kernel CCA, in contrast to our algorithm, where CCA naturally provides a coordinate system. The theoretical analysis in Farquhar et al. [2005] argues that the Rademacher complexity of the hypothesis space is reduced due to the agreement constraint between the two views.

The multi-view approach to regression has been previously considered in Sindhwani et al. [2005]. Here, they specify a co-regularized least squares regression algorithm, which is a ridge regression algorithm with an additional penalty
term which forces the two predictions, from both views, to agree. A theoretical analysis of this algorithm is provided in Rosenberg and Bartlett [2007], which shows that the Rademacher complexity of the hypothesis class is reduced by forcing agreement.

Both of these previous analyses do not explicitly state a multi-view assumption, so it is hard to directly compare the results. In our setting, the multi-view regret is explicitly characterized by ε. In a rather straightforward manner (without appealing to Rademacher complexities), we have shown that the rate at which the regret drops to 4ε is determined by the intrinsic dimensionality. Furthermore, both of these previous algorithms use an a priori specified norm over their class of functions (induced by an a priori specified kernel), and the Rademacher complexities (which are used to bound the convergence rates) depend on this norm. In contrast, our framework assumes no norm: the norm over functions is imposed by the correlation structure between the two views.

We should also note that there are close connections to those unsupervised learning algorithms which attempt to maximize relevant information. The Imax framework of Becker and Hinton [1992], Becker [1996] attempts to maximize information between two views x^(1) and x^(2), for which CCA is a special case (in a continuous version). Subsequently, the information bottleneck provided a framework for capturing the mutual information between two signals [Tishby et al., 1999]. Here, the goal is to compress a signal x^(1) such that it captures relevant information about another signal x^(2). The framework here is unsupervised, as there is no specific supervised task at hand. For the case in which the joint distribution of x^(1) and x^(2) is Gaussian, Chechik et al. [2003] completely characterize the compression tradeoffs for capturing the mutual information between these two signals; CCA provides the coordinate system for this compression. In our setting, we do not explicitly care about the mutual information between x^(1) and x^(2): performance is judged only by performance at the task at hand, namely our loss when predicting some other variable y. However, as we show, it turns out that these unsupervised mutual information maximizing algorithms provide appropriate intuition for multi-view regression, as they result in CCA as a basis.

Acknowledgements

We thank the anonymous reviewers for their helpful comments.

References

Steven Abney. Understanding the Yarowsky algorithm. Computational Linguistics, 30(3), 2004.

Maria-Florina Balcan and Avrim Blum. A PAC-style model for learning from labeled and unlabeled data. In Semi-Supervised Learning. MIT Press, 2006.

S. Becker. Mutual information maximization: Models of cortical self-organization. Network: Computation in Neural Systems, 1996.
Suzanna Becker and Geoffrey E. Hinton. Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature, 355(6356), January 1992.

Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In COLT '98: Proceedings of the Eleventh Annual Conference on Computational Learning Theory, New York, NY, USA, 1998. ACM Press.

Ulf Brefeld, Thomas Gartner, Tobias Scheffer, and Stefan Wrobel. Efficient co-regularised least squares regression. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, New York, NY, USA, 2006. ACM Press.

G. Chechik, A. Globerson, N. Tishby, and Y. Weiss. Information bottleneck for Gaussian variables, 2003. URL citeseer.ist.psu.edu/article/chechik03information.html.

Sanjoy Dasgupta, Michael L. Littman, and David McAllester. PAC generalization bounds for co-training, 2001.

Jason D. R. Farquhar, David R. Hardoon, Hongying Meng, John Shawe-Taylor, and Sándor Szedmák. Two view learning: SVM-2K, theory and practice. In NIPS, 2005.

David R. Hardoon, Sandor R. Szedmak, and John R. Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12), 2004.

H. Hotelling. The most predictable criterion. Journal of Educational Psychology, 1935.

D. Rosenberg and P. Bartlett. The Rademacher complexity of co-regularized kernel classes. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, 2007.

V. Sindhwani, P. Niyogi, and M. Belkin. A co-regularized approach to semi-supervised learning with multiple views. In Proceedings of the ICML Workshop on Learning with Multiple Views, 2005.

N. Tishby, F. Pereira, and W. Bialek. The information bottleneck method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, 1999. URL citeseer.ist.psu.edu/tishby99information.html.

David Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, Morristown, NJ, USA, 1995. Association for Computational Linguistics.

Tong Zhang. Learning bounds for kernel regression using effective data dimensionality. Neural Computation, 17(9), 2005.

Appendix

We now provide the proof of Lemma 1.

Proof. (of Lemma 1) Let β^(ν) be the weights for f^(ν) and let β be the weights of f in some basis. Let β^(ν) · x^(ν) and β · x be the representations of f^(ν) and f in this basis. By Assumption 1,

    ε ≥ E(β^(ν) · x^(ν) - y)^2 - E(β · x - y)^2
      = E(β^(ν) · x^(ν) - β · x + β · x - y)^2 - E(β · x - y)^2
      = E(β^(ν) · x^(ν) - β · x)^2 + 2E[(β^(ν) · x^(ν) - β · x)(β · x - y)]
Now the normal equations for β (the first derivative conditions for the optimal linear predictor β) state that for each i:

    E[x_i (β · x - y)] = 0

where x_i is the i-th component of x. This implies that both

    E[(β · x)(β · x - y)] = 0
    E[(β^(ν) · x^(ν))(β · x - y)] = 0

where the last equation follows since x^(ν) has its components in x. Hence,

    E[(β^(ν) · x^(ν) - β · x)(β · x - y)] = 0

and we have shown that:

    E(β^(1) · x^(1) - β · x)^2 ≤ ε
    E(β^(2) · x^(2) - β · x)^2 ≤ ε

The triangle inequality states that:

    E(β^(1) · x^(1) - β^(2) · x^(2))^2 ≤ ( √(E(β^(1) · x^(1) - β · x)^2) + √(E(β^(2) · x^(2) - β · x)^2) )^2 ≤ (2√ε)^2

which completes the proof.

Below is the proof of Lemma 2.

Proof. (of Lemma 2) From Lemma 1, we have:

    4ε ≥ E[(β^(1) · x^(1) - β^(2) · x^(2))^2]
       = Σ_i ( (β^(1)_i)^2 + (β^(2)_i)^2 - 2λ_i β^(1)_i β^(2)_i )
       = Σ_i ( (1 - λ_i)(β^(1)_i)^2 + (1 - λ_i)(β^(2)_i)^2 + λ_i((β^(1)_i)^2 + (β^(2)_i)^2 - 2β^(1)_i β^(2)_i) )
       = Σ_i ( (1 - λ_i)(β^(1)_i)^2 + (1 - λ_i)(β^(2)_i)^2 + λ_i(β^(1)_i - β^(2)_i)^2 )
       ≥ Σ_i ( (1 - λ_i)(β^(1)_i)^2 + (1 - λ_i)(β^(2)_i)^2 )
       ≥ Σ_i (1 - λ_i)(β^(ν)_i)^2

where the last step holds for either ν = 1 or ν = 2.