Factoring Big Tensors in the Cloud: A Tutorial on Big Tensor Data Analytics
1 Factoring Big Tensors in the Cloud: A Tutorial on Big Tensor Data Analytics
Nikos Sidiropoulos, UMN, and Evangelos Papalexakis, CMU
ICASSP Tutorial, 5/5/2014
2 Web, papers, software, credits
- Nikos Sidiropoulos, UMN: Signal and Tensor Analytics Research (STAR) group. Signal processing, communications, optimization; tensor decomposition, factor analysis; big data; spectrum sensing, cognitive radio. Sidekicks: preference measurement, conference review & session assignment.
- Vangelis Papalexakis, Christos Faloutsos, CMU (epapalex/ and christos/): data mining; graph mining; tensor analysis, coupled matrix-tensor analysis; big data, Hadoop.
- Credits: Christos Faloutsos, Tom Mitchell (CMU), Nikos Sidiropoulos, George Karypis (UMN), NSF-NIH/BIGDATA: Big Tensor Mining: Theory, Scalable Algorithms and Applications, NSF IIS
3 Matrices, rank decomposition
- A matrix (or two-way array) is a dataset X indexed by two indices; (i, j)-th entry X(i, j).
- Simple matrix: S(i, j) = a(i)b(j), for all i, j; separable, every row (column) proportional to every other row (column). Can write as S = ab^T.
- rank(X) := smallest number of simple (separable, rank-one) matrices needed to generate X; a measure of complexity:
$$X(i,j) = \sum_{f=1}^{F} a_f(i)\, b_f(j); \quad \text{or} \quad X = \sum_{f=1}^{F} a_f b_f^T = A B^T.$$
- Turns out rank(X) = maximum number of linearly independent rows (or columns) of X.
- Rank decomposition for matrices is not unique (except for matrices of rank = 1), since for any invertible M
$$X = A B^T = (A M)\left(M^{-1} B^T\right) = (A M)\left(B M^{-T}\right)^T = \tilde{A} \tilde{B}^T.$$
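As a quick sanity check on this rotational freedom, a minimal numpy sketch (variable names are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, F = 6, 5, 3
A = rng.standard_normal((I, F))
B = rng.standard_normal((J, F))
X = A @ B.T                         # a rank-F matrix

# Any invertible M yields another valid rank decomposition of the same X:
M = rng.standard_normal((F, F))
A_tilde = A @ M
B_tilde = B @ np.linalg.inv(M).T    # so that A_tilde @ B_tilde.T == A @ B.T
X2 = A_tilde @ B_tilde.T

same_matrix = np.allclose(X, X2)            # decomposition is not unique
rank_is_F = np.linalg.matrix_rank(X) == F   # rank = number of rank-one terms
```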
4 Tensor? What is this?
- CS slang for a three-way array: a dataset X indexed by three indices; (i, j, k)-th entry X(i, j, k). In plain words: a shoebox! Warning: different meaning in Physics!
- For two vectors a (I × 1) and b (J × 1), a ∘ b is an I × J rank-one matrix with (i, j)-th element a(i)b(j); i.e., a ∘ b = ab^T.
- For three vectors a (I × 1), b (J × 1), c (K × 1), a ∘ b ∘ c is an I × J × K rank-one three-way array with (i, j, k)-th element a(i)b(j)c(k).
- The rank of a three-way array X is the smallest number of outer products needed to synthesize X.
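The outer-product definition is easy to verify numerically; a small illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(4)   # I = 4
b = rng.standard_normal(3)   # J = 3
c = rng.standard_normal(2)   # K = 2

# (i,j,k)-th element of the rank-one three-way array a ∘ b ∘ c is a(i)b(j)c(k)
T = np.einsum('i,j,k->ijk', a, b, c)

shape_ok = (T.shape == (4, 3, 2))
elementwise_ok = np.allclose(
    T, a[:, None, None] * b[None, :, None] * c[None, None, :])
```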
5 Motivating example: CMU / Tom Mitchell
- Crawl the web, learn language like children do: encounter new concepts, learn from context.
- NELL triplets of subject-verb-object naturally lead to a 3-mode (subject × object × verb) tensor:
$$\underline{X} \approx a_1 \circ b_1 \circ c_1 + a_2 \circ b_2 \circ c_2 + \cdots + a_F \circ b_F \circ c_F.$$
- Each rank-one factor corresponds to a concept, e.g., "leaders" or "tools".
- E.g., say (a_1, b_1, c_1) corresponds to "leaders": subjects/rows with high score on a_1 will be Obama, Merkel, Steve Jobs; objects/columns with high score on b_1 will be USA, Germany, Apple Inc.; and verbs/fibers with high score on c_1 will be verbs like lead, is-president-of, and is-ceo-of.
6 Kronecker and Khatri-Rao products
- ⊗ stands for the Kronecker product:
$$A \otimes B = \begin{bmatrix} A(1,1)B & A(1,2)B & \cdots \\ A(2,1)B & A(2,2)B & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}.$$
- ⊙ stands for the Khatri-Rao (column-wise Kronecker) product: given A (I × F) and B (J × F), A ⊙ B is the JI × F matrix
$$A \odot B = \left[\, A(:,1) \otimes B(:,1) \;\; \cdots \;\; A(:,F) \otimes B(:,F) \,\right].$$
- vec(ABC) = (C^T ⊗ A) vec(B).
- If D = diag(d), then vec(ADC) = (C^T ⊙ A) d.
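Both vec identities can be checked directly; a small numpy sketch (the `khatri_rao` helper is ours, not from the tutorial's code, and column-major vec is assumed):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column f is A[:, f] ⊗ B[:, f]."""
    F = A.shape[1]
    assert B.shape[1] == F
    return np.einsum('if,jf->ijf', A, B).reshape(-1, F)

rng = np.random.default_rng(2)
I, F, K = 4, 3, 5
A = rng.standard_normal((I, F))
B = rng.standard_normal((F, F))
C = rng.standard_normal((F, K))
d = rng.standard_normal(F)

# vec(ABC) = (C^T ⊗ A) vec(B), with column-major vec
vec_ok = np.allclose((A @ B @ C).flatten(order='F'),
                     np.kron(C.T, A) @ B.flatten(order='F'))

# vec(ADC) = (C^T ⊙ A) d, with D = diag(d)
kr_ok = np.allclose((A @ np.diag(d) @ C).flatten(order='F'),
                    khatri_rao(C.T, A) @ d)
```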
7 Rank decomposition for tensors
- Tensor: $\underline{X} = \sum_{f=1}^{F} a_f \circ b_f \circ c_f$
- Scalar: $X(i,j,k) = \sum_{f=1}^{F} a_{i,f}\, b_{j,f}\, c_{k,f}$, for $i \in \{1,\dots,I\}$, $j \in \{1,\dots,J\}$, $k \in \{1,\dots,K\}$
- Slabs: $X_k = A D_k(C) B^T$, $k = 1,\dots,K$
- Matrix: $X^{(KJ \times I)} = (B \odot C) A^T$
- Tall vector: $x^{(KJI \times 1)} := \mathrm{vec}\!\left(X^{(KJ \times I)}\right) = \left(A \odot (B \odot C)\right) 1_{F \times 1} = (A \odot B \odot C)\, 1_{F \times 1}$
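The slab, matrix, and tall-vector views can all be verified numerically; a sketch under our own (j-slow, k-fast) row-ordering convention for the unfolding:

```python
import numpy as np

def khatri_rao(A, B):
    F = A.shape[1]
    return np.einsum('if,jf->ijf', A, B).reshape(-1, F)

rng = np.random.default_rng(3)
I, J, K, F = 4, 3, 5, 2
A = rng.standard_normal((I, F))
B = rng.standard_normal((J, F))
C = rng.standard_normal((K, F))

# X(i,j,k) = sum_f A(i,f) B(j,f) C(k,f)
X = np.einsum('if,jf,kf->ijk', A, B, C)

# Slab view: X_k = A D_k(C) B^T
slab_ok = all(np.allclose(X[:, :, k], A @ np.diag(C[k]) @ B.T)
              for k in range(K))

# Matrix view: rows indexed by (j, k), columns by i
X_unf = np.transpose(X, (1, 2, 0)).reshape(J * K, I)
unfolding_ok = np.allclose(X_unf, khatri_rao(B, C) @ A.T)

# Tall-vector view: vec of the unfolding equals (A ⊙ B ⊙ C) 1
vec_ok = np.allclose(X_unf.flatten(order='F'),
                     khatri_rao(A, khatri_rao(B, C)) @ np.ones(F))
```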
8 Do I need to worry about higher-way (or, higher-order) tensors?
- Semantic analysis of brain fMRI data: fMRI × semantic category scores; the fMRI mode is vectorized (O( )).
- Spatial coherence: better to treat as three separate spatial modes, giving a 5-way array.
- Temporal dynamics: include time as another dimension, giving a 6-way array.
9 Rank decomposition for higher-order tensors
- Tensor: $\underline{X} = \sum_{f=1}^{F} a^{(1)}_f \circ \cdots \circ a^{(N)}_f$
- Scalar: $X(i_1,\dots,i_N) = \sum_{f=1}^{F} \prod_{n=1}^{N} a^{(n)}_{i_n,f}$
- Matrix: $X^{(I_1 I_2 \cdots I_{N-1} \times I_N)} = \left(A^{(N-1)} \odot A^{(N-2)} \odot \cdots \odot A^{(1)}\right) \left(A^{(N)}\right)^T$
- Tall vector: $x^{(I_1 \cdots I_N \times 1)} := \mathrm{vec}\!\left(X^{(I_1 I_2 \cdots I_{N-1} \times I_N)}\right) = \left(A^{(N)} \odot A^{(N-1)} \odot A^{(N-2)} \odot \cdots \odot A^{(1)}\right) 1_{F \times 1}$
10 Tensor Singular Value Decomposition?
- For matrices, the SVD is instrumental: rank-revealing, Eckart-Young optimality.
- So is there a tensor equivalent to the matrix SVD? Yes... and no! In fact there is no single tensor SVD. Two basic decompositions:
- CANonical DECOMPosition (CANDECOMP), also known as PARAllel FACtor (PARAFAC) analysis, or CANDECOMP-PARAFAC (CP) for short: non-orthogonal, unique under certain conditions.
- Tucker3: orthogonal without loss of generality, non-unique except for very special cases.
- Both are outer product decompositions, but with very different structural properties.
- Rule of thumb: use Tucker3 for subspace estimation and tensor approximation, e.g., compression applications; use PARAFAC for latent parameter estimation, i.e., recovering the hidden rank-one factors.
11 Tucker3
- $\underline{X}$: I × J × K three-way array
- A: I × L, B: J × M, C: K × N mode loading matrices
- $\underline{G}$: L × M × N Tucker3 core
12 Tucker3, continued
- Consider an I × J × K three-way array $\underline{X}$ comprising K matrix slabs $\{X_k\}_{k=1}^{K}$, arranged into the matrix $X := [\mathrm{vec}(X_1), \dots, \mathrm{vec}(X_K)]$.
- The Tucker3 model can be written as $X \approx (B \otimes A) G C^T$, where G is the Tucker3 core tensor $\underline{G}$ recast in matrix form. The non-zero elements of the core tensor determine the interactions between columns of A, B, C.
- The associated model-fitting problem is
$$\min_{A,B,C,G} \left\| X - (B \otimes A) G C^T \right\|_F^2,$$
which is usually solved using an alternating least squares procedure.
- Equivalently, $\mathrm{vec}(X) \approx (C \otimes B \otimes A)\, \mathrm{vec}(\underline{G})$.
- Highly non-unique: e.g., rotate C, counter-rotate G using a unitary matrix. Subspaces can be recovered; Tucker3 is good for tensor approximation, not latent parameter estimation.
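As a concrete illustration of Tucker-type compression, here is a sketch of a truncated HOSVD, an algebraic alternative to the ALS fitting described above (our own construction, not the tutorial's code): orthonormal mode bases come from SVDs of the three unfoldings, and the core is obtained by regressing the data onto those bases. With truncation at the exact mode ranks, the compression is lossless:

```python
import numpy as np

rng = np.random.default_rng(4)
I, J, K = 6, 5, 4
L, M, N = 3, 3, 3

# Build a tensor whose multilinear (mode) ranks are exactly (L, M, N)
G0 = rng.standard_normal((L, M, N))
A0 = rng.standard_normal((I, L))
B0 = rng.standard_normal((J, M))
C0 = rng.standard_normal((K, N))
X = np.einsum('lmn,il,jm,kn->ijk', G0, A0, B0, C0)

# HOSVD: orthonormal mode bases from SVDs of the three mode unfoldings
A = np.linalg.svd(X.reshape(I, J * K))[0][:, :L]
B = np.linalg.svd(np.transpose(X, (1, 0, 2)).reshape(J, I * K))[0][:, :M]
C = np.linalg.svd(np.transpose(X, (2, 0, 1)).reshape(K, I * J))[0][:, :N]

# Core: regress the data onto the fitted mode bases
G = np.einsum('ijk,il,jm,kn->lmn', X, A, B, C)
X_hat = np.einsum('lmn,il,jm,kn->ijk', G, A, B, C)
lossless = np.allclose(X, X_hat)   # exact mode bases => lossless compression
```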
13 PARAFAC
Figure: from Matlab Central / N-way toolbox (Rasmus Bro), mathworks.com/matlabcentral/fileexchange/1088-the-n-way-toolbox
- Low-rank tensor decomposition / approximation: $\underline{X} \approx \sum_{f=1}^{F} a_f \circ b_f \circ c_f$
- PARAFAC [Harshman 70-72], CANDECOMP [Carroll & Chang, 70], now CP; also cf. [Hitchcock, 27].
- $X_k \approx A D_k(C) B^T$, where $D_k(C)$ is a diagonal matrix holding the k-th row of C on its diagonal.
- Combining slabs and using the Khatri-Rao product: $X \approx (B \odot A) C^T$, and $\mathrm{vec}(X) \approx (C \odot B \odot A)\, 1$.
14 Uniqueness
- Under certain conditions, PARAFAC is essentially unique, i.e., (A, B, C) can be identified from $\underline{X}$ up to permutation and scaling of columns; there is no rotational freedom. Cf. [Kruskal 77, Sidiropoulos et al 00-07, de Lathauwer 04-, Stegeman 06-, Chiantini, Ottaviani 11-, ...].
- I × J × K tensor $\underline{X}$ of rank F, vectorized as the IJK × 1 vector $x = (A \odot B \odot C)\, 1$, for some A (I × F), B (J × F), and C (K × F): a PARAFAC model of size I × J × K and order F parameterized by (A, B, C).
- The Kruskal-rank of A, denoted $k_A$, is the maximum k such that any k columns of A are linearly independent ($k_A \le r_A := \mathrm{rank}(A)$).
- Given $\underline{X}$ (equivalently, x): if $k_A + k_B + k_C \ge 2F + 2$, then (A, B, C) are unique up to a common column permutation and scaling/counter-scaling (e.g., multiply the first column of A by 5, divide the first column of B by 5, the outer product stays the same); cf. [Kruskal, 1977].
- N-way case: $\sum_{n=1}^{N} k_{A^{(n)}} \ge 2F + (N-1)$ [Sidiropoulos & Bro, 2000].
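For small matrices, the Kruskal-rank can be computed by brute force, which makes the condition easy to check; a sketch (the search is exponential in F, so this is for illustration only):

```python
import itertools
import numpy as np

def kruskal_rank(A, tol=1e-9):
    """Largest k such that EVERY set of k columns of A is linearly independent."""
    F = A.shape[1]
    for k in range(F, 0, -1):
        if all(np.linalg.matrix_rank(A[:, list(cols)], tol=tol) == k
               for cols in itertools.combinations(range(F), k)):
            return k
    return 0

rng = np.random.default_rng(5)
F = 3
A = rng.standard_normal((4, F))   # generic matrices have k_A = min(I, F) = F
B = rng.standard_normal((5, F))
C = rng.standard_normal((6, F))

# Kruskal's condition k_A + k_B + k_C >= 2F + 2 for essential uniqueness
condition_holds = (kruskal_rank(A) + kruskal_rank(B) + kruskal_rank(C)
                   >= 2 * F + 2)

# A repeated column drops the Kruskal-rank to 1: any pair containing
# the duplicate is linearly dependent, but each single column is nonzero.
A_bad = np.column_stack([A[:, 0], A[:, 0], A[:, 1]])
k_bad = kruskal_rank(A_bad)
```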
15 Special case: Two slabs
- Assume square/tall, full column rank A (I × F) and B (J × F), and distinct nonzero elements along the diagonal of D:
$$X_1 = A B^T, \quad X_2 = A D B^T.$$
- Introduce the compact F-component SVD
$$\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} A \\ AD \end{bmatrix} B^T = U \Sigma V^H.$$
- rank(B) = F implies that $\mathrm{span}(U) = \mathrm{span}\!\left(\begin{bmatrix} A \\ AD \end{bmatrix}\right)$; hence there exists a nonsingular matrix P such that
$$U = \begin{bmatrix} U_1 \\ U_2 \end{bmatrix} = \begin{bmatrix} A \\ AD \end{bmatrix} P.$$
16 Special case: Two slabs, continued
- Next, construct the auto- and cross-product matrices
$$R_1 = U_1^H U_1 = P^H A^H A P =: Q P, \qquad R_2 = U_1^H U_2 = P^H A^H A D P = Q D P.$$
- All matrices on the right-hand side are square and full rank. Solving the first equation for Q and substituting into the second yields the eigenvalue problem
$$\left(R_1^{-1} R_2\right) P^{-1} = P^{-1} D.$$
- $P^{-1}$ (and P) recovered up to permutation and scaling.
- From the bottom of the previous page, A is likewise recovered ($A = U_1 P^{-1}$), then B.
- Starts making sense...
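The whole two-slab procedure is short enough to sketch end-to-end in numpy (variable names are ours; real eigenvalues are expected since D has distinct real diagonal entries):

```python
import numpy as np

rng = np.random.default_rng(6)
I, J, F = 8, 7, 3
A = rng.standard_normal((I, F))
B = rng.standard_normal((J, F))
d = np.array([1.0, 2.0, 3.0])          # distinct nonzero diagonal of D

X1 = A @ B.T
X2 = A @ np.diag(d) @ B.T

# Compact F-component SVD of the stacked slabs [X1; X2] = [A; AD] B^T
U = np.linalg.svd(np.vstack([X1, X2]), full_matrices=False)[0][:, :F]
U1, U2 = U[:I], U[I:]

# R1 = U1^H U1 = QP and R2 = U1^H U2 = QDP  =>  (R1^{-1} R2) P^{-1} = P^{-1} D
R1 = U1.T @ U1
R2 = U1.T @ U2
evals, Pinv = np.linalg.eig(np.linalg.solve(R1, R2))

# Eigenvalues recover the diagonal of D up to permutation
d_recovered = np.allclose(np.sort(evals.real), np.sort(d))

# A recovered up to column permutation/scaling as A_hat = U1 P^{-1}
A_hat = U1 @ Pinv.real
corr = np.abs((A / np.linalg.norm(A, axis=0)).T
              @ (A_hat / np.linalg.norm(A_hat, axis=0)))
A_recovered = np.allclose(corr.max(axis=1), 1.0)  # each column matched
```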
17 Alternating Least Squares (ALS)
- Based on the matrix view: $X^{(KJ \times I)} = (B \odot C) A^T$.
- Multilinear LS problem:
$$\min_{A,B,C} \left\| X^{(KJ \times I)} - (B \odot C) A^T \right\|_F^2$$
- NP-hard, even for a single component, i.e., vectors a, b, c. See [Hillar and Lim, "Most tensor problems are NP-hard," 2013].
- But... given interim estimates of B, C, can easily solve for the conditional LS update of A:
$$A_{\mathrm{CLS}} = \left( (B \odot C)^\dagger\, X^{(KJ \times I)} \right)^T$$
- Similarly for the CLS updates of B, C (symmetry); alternate until the cost function converges (monotonically).
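A bare-bones ALS loop with the three conditional LS updates can be sketched as follows (our own unfolding conventions and toy sizes; no line search or acceleration):

```python
import numpy as np

def khatri_rao(A, B):
    F = A.shape[1]
    return np.einsum('if,jf->ijf', A, B).reshape(-1, F)

def als_parafac(X, F, n_iter=100, seed=0):
    """Plain ALS for the CP model X(i,j,k) = sum_f A(i,f) B(j,f) C(k,f)."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((J, F))
    C = rng.standard_normal((K, F))
    X_I = np.transpose(X, (1, 2, 0)).reshape(J * K, I)   # rows (j,k), cols i
    X_J = np.transpose(X, (0, 2, 1)).reshape(I * K, J)   # rows (i,k), cols j
    X_K = X.reshape(I * J, K)                            # rows (i,j), cols k
    errs = []
    for _ in range(n_iter):
        # Each update is a conditional linear LS problem given the other two
        A = np.linalg.lstsq(khatri_rao(B, C), X_I, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), X_J, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X_K, rcond=None)[0].T
        errs.append(np.linalg.norm(X - np.einsum('if,jf,kf->ijk', A, B, C)))
    return A, B, C, errs

rng = np.random.default_rng(7)
I, J, K, F = 6, 5, 4, 2
X = np.einsum('if,jf,kf->ijk',
              rng.standard_normal((I, F)),
              rng.standard_normal((J, F)),
              rng.standard_normal((K, F)))   # noiseless rank-2 tensor

A, B, C, errs = als_parafac(X, F)
rel_err = errs[-1] / np.linalg.norm(X)
monotone = all(e2 <= e1 + 1e-8 for e1, e2 in zip(errs, errs[1:]))
```

Each conditional update solves an ordinary linear LS problem exactly, which is why the cost sequence is monotone non-increasing.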
18 Other algorithms?
- Many! First-order (gradient-based) and second-order (Hessian-based): Gauss-Newton, line search, Levenberg-Marquardt, weighted least squares, majorization.
- Algebraic initialization (matters).
- See Tomasi and Bro, 2006, for a good overview.
- Second-order methods have an advantage when close to the optimum, but can (and do) diverge.
- First-order methods are often prone to local minima, slow to converge.
- Stochastic gradient descent (CS community): simple, parallel, but very slow; difficult to incorporate additional constraints like sparsity, non-negativity, unimodality, etc.
19 ALS
- No parameters to tune! Easy to program, uses standard linear LS.
- Monotone convergence of the cost function.
- Does not require any conditions beyond model identifiability.
- Easy to incorporate additional constraints, due to multilinearity; e.g., replace linear LS with linear NNLS for non-negativity.
- Even non-convex (e.g., FA) constraints can be handled with column-wise updates (optimal scaling lemma).
- Cons: sequential algorithm, convergence can be slow.
- Still the workhorse after all these years.
20 Outliers, sparse residuals
- Instead of LS:
$$\min_{A,B,C} \left\| X^{(KJ \times I)} - (B \odot C) A^T \right\|_1$$
- Conditional update: LP. Almost as good: coordinate-wise, using weighted median filtering (very cheap!) [Vorobyov, Rong, Sidiropoulos, Gershman, 2005].
- PARAFAC CRLB: [Liu & Sidiropoulos, 2001] (Gaussian); [Vorobyov, Rong, Sidiropoulos, Gershman, 2005] (Laplacian, etc.; the bounds differ only in a pdf-dependent scale factor).
- Alternating optimization algorithms approach the CRLB when the problem is well-determined (meaning: not barely identifiable).
21 Big (tensor) data
- Tensors can easily become really big! Size is exponential in the number of dimensions ("ways", or "modes").
- Datasets with millions of items per mode; examples in the second part of this tutorial.
- Cannot load in main memory; may reside in cloud storage.
- Sometimes very sparse: can store and process as an (i, j, k, value) list, nonzero column indices for each row, run-length coding, etc.
- (Sparse) Tensor Toolbox for Matlab [Kolda et al]: avoids explicitly computing dense intermediate results.
22 Tensor partitioning?
- Parallel algorithms for matrix algebra use data partitioning. Can we reuse some of these ideas? Low-hanging fruit?
- First considered in [Phan, Cichocki, Neurocomputing, 2011].
- Later revisited in [Almeida, Kibangou, CAMSAP 2013, ICASSP 2014].
23 Tensor partitioning
In [Phan & Cichocki, Neurocomputing, 2011], each sub-block is independently decomposed, then the low-rank factors of the big tensor are stitched together:
- The decomposition of each sub-block must be identifiable, which imposes far more stringent identifiability conditions;
- Each sub-block has a different permutation and scaling of the factors; all of these must be reconciled before stitching (not discussed in the paper);
- Noise affects sub-block decomposition more than full-tensor decomposition.
More recently, [Almeida & Kibangou, CAMSAP 2013, ICASSP 2014]:
- Also rely on partitioning, so sub-block identification conditions are required;
- But more robust, based on collaborative consensus-averaging, and the matching of different permutations and scales is explicitly taken into account;
- Requires considerable communication overhead across clusters and nodes; i.e., the parallel threads are not independent.
24 Tensor compression
- Commonly used compression method for moderate-size tensors: fit an orthogonal Tucker3 model, then regress the data onto the fitted mode-bases.
- Implemented in COMFAC (nikos/comfac.m).
- Lossless if exact mode bases are used [CANDELINC]; but Tucker3 fitting is itself cumbersome for big tensors (big matrix SVDs), and cannot compress below the mode ranks without introducing errors.
25 Tensor compression
- Consider compressing x = vec($\underline{X}$) into y = Sx, where S is d × IJK, with d ≪ IJK. In particular, consider a specially structured compression matrix
$$S = U^T \otimes V^T \otimes W^T.$$
- This corresponds to multiplying (every slab of) $\underline{X}$ from the I-mode with $U^T$, from the J-mode with $V^T$, and from the K-mode with $W^T$, where U is I × L, V is J × M, and W is K × N, with L ≤ I, M ≤ J, N ≤ K and LMN ≪ IJK.
[Figure: the I × J × K tensor $\underline{X}$ is multi-way compressed into an L × M × N tensor $\underline{Y}$.]
26 Key
Due to a property of the Kronecker product,
$$\left(U^T \otimes V^T \otimes W^T\right)(A \odot B \odot C) = (U^T A) \odot (V^T B) \odot (W^T C),$$
from which it follows that
$$y = \left( (U^T A) \odot (V^T B) \odot (W^T C) \right) 1 = \left( \tilde{A} \odot \tilde{B} \odot \tilde{C} \right) 1,$$
i.e., the compressed data follow a PARAFAC model of size L × M × N and order F parameterized by $(\tilde{A}, \tilde{B}, \tilde{C})$, with $\tilde{A} := U^T A$, $\tilde{B} := V^T B$, $\tilde{C} := W^T C$.
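This mixed-product identity, and the resulting fact that the compressed data follow a small PARAFAC model of the same order, can be verified numerically (a sketch; the `khatri_rao` helper is ours):

```python
import numpy as np

def khatri_rao(A, B):
    F = A.shape[1]
    return np.einsum('if,jf->ijf', A, B).reshape(-1, F)

rng = np.random.default_rng(8)
I, J, K, F = 5, 4, 3, 2
L, M, N = 3, 3, 2
A, U = rng.standard_normal((I, F)), rng.standard_normal((I, L))
B, V = rng.standard_normal((J, F)), rng.standard_normal((J, M))
C, W = rng.standard_normal((K, F)), rng.standard_normal((K, N))

# (U^T ⊗ V^T ⊗ W^T)(A ⊙ B ⊙ C) = (U^T A) ⊙ (V^T B) ⊙ (W^T C)
S = np.kron(np.kron(U.T, V.T), W.T)          # structured compression matrix
lhs = S @ khatri_rao(khatri_rao(A, B), C)
rhs = khatri_rao(khatri_rao(U.T @ A, V.T @ B), W.T @ C)
mixed_product_ok = np.allclose(lhs, rhs)

# Hence y = S x follows an L x M x N PARAFAC model of the same order F
x = khatri_rao(khatri_rao(A, B), C) @ np.ones(F)    # vec of the big tensor
y_model = rhs @ np.ones(F)
compressed_model_ok = np.allclose(S @ x, y_model)
```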
27 Random multi-way compression can be better!
Sidiropoulos & Kyrillidis, IEEE SPL Oct.
- Assume that the columns of A, B, C are sparse, and let $n_a$ ($n_b$, $n_c$) be an upper bound on the number of nonzero elements per column of A (respectively B, C).
- Let the mode-compression matrices U (I × L, L ≤ I), V (J × M, M ≤ J), and W (K × N, N ≤ K) be randomly drawn from an absolutely continuous distribution with respect to the Lebesgue measure in $\mathbb{R}^{IL}$, $\mathbb{R}^{JM}$, and $\mathbb{R}^{KN}$, respectively.
- If
$$\min(L, k_A) + \min(M, k_B) + \min(N, k_C) \ge 2F + 2,$$
and $L \ge 2 n_a$, $M \ge 2 n_b$, $N \ge 2 n_c$, then the original factor loadings A, B, C are almost surely identifiable from the compressed data.
28 Proof rests on two lemmas + Kruskal
- Lemma 1: Consider $\tilde{A} := U^T A$, where A is I × F, and let the I × L matrix U be randomly drawn from an absolutely continuous distribution with respect to the Lebesgue measure in $\mathbb{R}^{IL}$ (e.g., multivariate Gaussian with a non-singular covariance matrix). Then $k_{\tilde{A}} = \min(L, k_A)$ almost surely (with probability 1) [Sidiropoulos & Kyrillidis, 2012].
- Lemma 2: Consider $\tilde{A} := U^T A$, where $\tilde{A}$ and U are given and A is sought. Suppose that every column of A has at most $n_a$ nonzero elements, and that $k_{U^T} \ge 2 n_a$. (The latter holds with probability 1 if the I × L matrix U is randomly drawn from an absolutely continuous distribution with respect to the Lebesgue measure in $\mathbb{R}^{IL}$, and $\min(I, L) \ge 2 n_a$.) Then A is the unique solution with at most $n_a$ nonzero elements per column [Donoho & Elad, 03].
29 Complexity
- First fitting PARAFAC in the compressed space and then recovering the sparse A, B, C from the fitted compressed factors entails complexity $O\!\left(LMNF + (I^{3.5} + J^{3.5} + K^{3.5})F\right)$ in practice.
- Using sparsity first and then fitting PARAFAC in the raw space entails complexity $O\!\left(IJKF + (IJK)^{3.5}\right)$; the difference is huge.
- Also note that the proposed approach does not require computations in the uncompressed data domain, which is important for big data that do not fit in memory for processing.
30 Further compression: down to O(√F) in 2/3 modes
Sidiropoulos & Kyrillidis, IEEE SPL Oct.
- Assume that the columns of A, B, C are sparse, and let $n_a$ ($n_b$, $n_c$) be an upper bound on the number of nonzero elements per column of A (respectively B, C).
- Let the mode-compression matrices U (I × L, L ≤ I), V (J × M, M ≤ J), and W (K × N, N ≤ K) be randomly drawn from an absolutely continuous distribution with respect to the Lebesgue measure in $\mathbb{R}^{IL}$, $\mathbb{R}^{JM}$, and $\mathbb{R}^{KN}$, respectively.
- If $r_A = r_B = r_C = F$,
$$L(L-1)\, M(M-1) \ge 2 F(F-1), \quad N \ge F,$$
and $L \ge 2 n_a$, $M \ge 2 n_b$, $N \ge 2 n_c$, then the original factor loadings A, B, C are almost surely identifiable from the compressed data up to a common column permutation and scaling.
31 Proof: Lemma 3 + results on a.s. ID of PARAFAC
- Lemma 3: Consider $\tilde{A} = U^T A$, where A (I × F) is deterministic, tall/square (I ≥ F) and full column rank $r_A = F$, and the elements of U (I × L) are i.i.d. Gaussian zero-mean, unit-variance random variables. Then the distribution of $\tilde{A}$ is nonsingular multivariate Gaussian.
- From [Stegeman, ten Berge, de Lathauwer 2006] (see also [Jiang, Sidiropoulos 2004]), we know that PARAFAC is almost surely identifiable if the loading matrices $\tilde{A}$, $\tilde{B}$ are randomly drawn from an absolutely continuous distribution with respect to the Lebesgue measure in $\mathbb{R}^{(L+M)F}$, $\tilde{C}$ is full column rank, and $L(L-1)\, M(M-1) \ge 2 F(F-1)$.
32 Generalization to higher-way arrays
Theorem 3: Let $x = (A_1 \odot \cdots \odot A_\delta)\, 1 \in \mathbb{R}^{\prod_{d=1}^{\delta} I_d}$, where $A_d$ is $I_d \times F$, and consider compressing it to
$$y = \left(U_1^T \otimes \cdots \otimes U_\delta^T\right) x = \left( (U_1^T A_1) \odot \cdots \odot (U_\delta^T A_\delta) \right) 1 = \left( \tilde{A}_1 \odot \cdots \odot \tilde{A}_\delta \right) 1 \in \mathbb{R}^{\prod_{d=1}^{\delta} L_d},$$
where the mode-compression matrices $U_d$ ($I_d \times L_d$, $L_d \le I_d$) are randomly drawn from an absolutely continuous distribution with respect to the Lebesgue measure in $\mathbb{R}^{I_d L_d}$. Assume that the columns of $A_d$ are sparse, and let $n_d$ be an upper bound on the number of nonzero elements per column of $A_d$, for each $d \in \{1,\dots,\delta\}$. If
$$\sum_{d=1}^{\delta} \min(L_d, k_{A_d}) \ge 2F + \delta - 1, \quad \text{and} \quad L_d \ge 2 n_d,\ \forall d \in \{1,\dots,\delta\},$$
then the original factor loadings $\{A_d\}_{d=1}^{\delta}$ are almost surely identifiable from the compressed data y up to a common column permutation and scaling.
Various additional results are possible.
33 Latest on PARAFAC uniqueness
Luca Chiantini and Giorgio Ottaviani, "On Generic Identifiability of 3-Tensors of Small Rank," SIAM J. Matrix Anal. & Appl., 33(3), 2012:
- Consider an I × J × K tensor $\underline{X}$ of rank F, and order the dimensions so that I ≥ J ≥ K.
- Let i be maximal such that $2^i \le I$, and likewise j maximal such that $2^j \le J$.
- If $F \le 2^{i+j-2}$, then $\underline{X}$ has a unique decomposition almost surely.
- For I, J powers of 2, the condition simplifies to $F \le \frac{IJ}{4}$.
- More generally, the condition implies: if $F \le \frac{(I+1)(J+1)}{16}$, then $\underline{X}$ has a unique decomposition almost surely.
34 Even further compression
- Assume that the columns of A, B, C are sparse, and let $n_a$ ($n_b$, $n_c$) be an upper bound on the number of nonzero elements per column of A (respectively B, C). Let the mode-compression matrices U (I × L, L ≤ I), V (J × M, M ≤ J), and W (K × N, N ≤ K) be randomly drawn from an absolutely continuous distribution with respect to the Lebesgue measure in $\mathbb{R}^{IL}$, $\mathbb{R}^{JM}$, and $\mathbb{R}^{KN}$, respectively.
- Assume L ≥ M ≥ N, and L, M are powers of 2, for simplicity.
- If $r_A = r_B = r_C = F$, $LM \ge 4F$, $N \le M \le L$, and $L \ge 2 n_a$, $M \ge 2 n_b$, $N \ge 2 n_c$, then the original factor loadings A, B, C are almost surely identifiable from the compressed data up to a common column permutation and scaling.
- Allows compression down to order of √F in all three modes.
35 What if A, B, C are not sparse?
- If A, B, C are sparse with respect to known bases, i.e., $A = R\check{A}$, $B = S\check{B}$, and $C = T\check{C}$, with R, S, T the respective sparsifying bases and $\check{A}$, $\check{B}$, $\check{C}$ sparse, then the previous results carry over under appropriate conditions, e.g., when R, S, T are non-singular.
- OK, but what if such bases cannot be found?
36 PARACOMP: PArallel RAndomly COMPressed Cubes
[Figure: fork-join architecture. The I × J × K tensor $\underline{X}$ is compressed in parallel ("fork") by P random multi-way compressors $(U_p, V_p, W_p)$, each multiplying $\underline{X}$ by $U_p^T$, $V_p^T$, $W_p^T$ in the three modes to form a small $L_p \times M_p \times N_p$ tensor $\underline{Y}_p$. Each $\underline{Y}_p$ is factored independently into $(\tilde{A}_p, \tilde{B}_p, \tilde{C}_p)$, and a least-squares "join" step recovers (A, B, C).]
37 PARACOMP
- Assume $(\tilde{A}_p, \tilde{B}_p, \tilde{C}_p)$ identifiable from $\underline{Y}_p$ (up to permutation and scaling of columns). Upon factoring $\underline{Y}_p$ into F rank-one components, we obtain
$$\tilde{A}_p = U_p^T A \Pi_p \Lambda_p. \quad (1)$$
- Assume the first 2 columns of each $U_p$ are common; let $\bar{U}$ denote this common part, and $\bar{A}_p$ := the first two rows of $\tilde{A}_p$. Then
$$\bar{A}_p = \bar{U}^T A \Pi_p \Lambda_p.$$
- Dividing each column of $\bar{A}_p$ by the element of maximum modulus in that column, and denoting the resulting 2 × F matrix $\hat{A}_p$,
$$\hat{A}_p = \bar{U}^T A \Lambda \Pi_p.$$
- Λ does not affect the ratio of elements in each 2 × 1 column. If the ratios are distinct, then the permutations can be matched by sorting the ratios of the two coordinates of each 2 × 1 column of $\hat{A}_p$.
38 PARACOMP
- In practice, using a few more anchor rows will improve permutation matching. When S anchor rows are used, the optimal permutation matching is cast as
$$\min_{\Pi} \left\| \hat{A}_1 - \hat{A}_p \Pi \right\|_F^2.$$
- Optimization over the set of permutation matrices: hard? Note that
$$\left\| \hat{A}_1 - \hat{A}_p \Pi \right\|_F^2 = \mathrm{Tr}\!\left( (\hat{A}_1 - \hat{A}_p \Pi)^T (\hat{A}_1 - \hat{A}_p \Pi) \right) = \left\| \hat{A}_1 \right\|_F^2 + \left\| \hat{A}_p \Pi \right\|_F^2 - 2\, \mathrm{Tr}\!\left( \hat{A}_1^T \hat{A}_p \Pi \right) = \left\| \hat{A}_1 \right\|_F^2 + \left\| \hat{A}_p \right\|_F^2 - 2\, \mathrm{Tr}\!\left( \hat{A}_1^T \hat{A}_p \Pi \right).$$
- So the problem is equivalent to
$$\max_{\Pi} \mathrm{Tr}\!\left( \hat{A}_1^T \hat{A}_p \Pi \right),$$
a Linear Assignment Problem (LAP): efficient solution via the Hungarian Algorithm.
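The LAP structure is easy to see in code. The sketch below matches permutations by exhaustively maximizing $\mathrm{Tr}(\hat{A}_1^T \hat{A}_p \Pi)$, which is fine for small F; the Hungarian algorithm replaces the exhaustive search at scale (helper names are ours):

```python
import itertools
import numpy as np

def match_permutation(A1, Ap):
    """Permutation perm maximizing Tr(A1^T Ap Pi), i.e. minimizing
    ||A1 - Ap Pi||_F^2; brute force over all F! permutations."""
    F = A1.shape[1]
    G = A1.T @ Ap                       # G[f, g] = <A1[:, f], Ap[:, g]>
    best_score, best_perm = -np.inf, None
    for perm in itertools.permutations(range(F)):
        score = sum(G[f, perm[f]] for f in range(F))
        if score > best_score:
            best_score, best_perm = score, perm
    return list(best_perm)

rng = np.random.default_rng(9)
A1 = rng.standard_normal((4, 3))
Ap = A1[:, [2, 0, 1]]                   # same factors, permuted columns

perm = match_permutation(A1, Ap)        # Ap[:, perm] realigns with A1
aligned = np.allclose(Ap[:, perm], A1)
```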
39 PARACOMP
- After permutation matching, go back to (1) and permute columns to obtain $\breve{A}_p$ satisfying
$$\breve{A}_p = U_p^T A \Pi \Lambda_p.$$
- It remains to get rid of $\Lambda_p$. For this, we can again resort to the first two common rows, and divide each column of $\breve{A}_p$ by its top element:
$$\check{A}_p = U_p^T A \Pi \Lambda.$$
- For recovery of A up to permutation and scaling of columns, we then require that the stacked compression matrix in
$$\begin{bmatrix} \check{A}_1 \\ \vdots \\ \check{A}_P \end{bmatrix} = \begin{bmatrix} U_1^T \\ \vdots \\ U_P^T \end{bmatrix} A \Pi \Lambda \quad (2)$$
be full column rank.
40 PARACOMP
- This implies that
$$2 + \sum_{p=1}^{P} (L_p - 2) \ge I.$$
- Every sub-matrix contains the two common anchor rows, and duplicate rows do not increase the rank. Once the dimensionality requirement is met, the stack is full rank w.p. 1, since the non-redundant entries are drawn from a jointly continuous distribution (by design).
- Assuming $L_p = L$, $\forall p \in \{1,\dots,P\}$:
$$P \ge \frac{I - 2}{L - 2}.$$
- Since the column permutation of $\tilde{A}_p$, $\tilde{B}_p$, $\tilde{C}_p$ is common,
$$P \ge \max\left( \frac{I-2}{L-2},\ \frac{J}{M},\ \frac{K}{N} \right).$$
41 PARACOMP
- If compression ratios in different modes are similar, it makes sense to use the longest mode for anchoring; if this is the last mode, then
$$P \ge \max\left( \frac{I}{L},\ \frac{J}{M},\ \frac{K-2}{N-2} \right).$$
- Theorem: Assume that $F \le I \le J \le K$, and A, B, C are full column rank (F). Further assume that $L_p = L$, $M_p = M$, $N_p = N$, $\forall p \in \{1,\dots,P\}$, $L \le M \le N$, $(L+1)(M+1) \ge 16F$, random $\{U_p\}_{p=1}^{P}$, $\{V_p\}_{p=1}^{P}$; each $W_p$ contains two common anchor columns, otherwise random $\{W_p\}_{p=1}^{P}$. Then $(\tilde{A}_p, \tilde{B}_p, \tilde{C}_p)$ is unique up to column permutation and scaling. If, in addition, $P \ge \max\left( \frac{I}{L}, \frac{J}{M}, \frac{K-2}{N-2} \right)$, then (A, B, C) are almost surely identifiable from $\{(\tilde{A}_p, \tilde{B}_p, \tilde{C}_p)\}_{p=1}^{P}$ up to a common column permutation and scaling.
42 PARACOMP - Significance
- Indicative of a family of results that can be derived. The theorem shows that fully parallel computation of the big tensor decomposition is possible: the first result that guarantees identifiability of the big tensor decomposition from the small tensor decompositions, without stringent additional constraints.
- Corollary: If $\frac{K-2}{N-2} = \max\left( \frac{I}{L}, \frac{J}{M}, \frac{K-2}{N-2} \right)$, then the memory/storage and computational complexity savings afforded by PARACOMP relative to brute-force computation are of order $\frac{IJ}{F}$.
- Note on the complexity of solving the master "join" equation: after removing redundant rows, the system matrix in (2) will have approximately orthogonal columns for large I, so its left pseudo-inverse is approximately its transpose; complexity $O(I^2 F)$.
43 PARACOMP
- Cases where $F > \min(I, J, K)$ can be treated using Kruskal's condition; total storage and complexity gains still of order $\frac{IJ}{F^2}$.
- Latent sparsity: Assume that every column of A (B, C) has at most $n_a$ (resp. $n_b$, $n_c$) nonzero elements. A column of A can be uniquely recovered from only $2 n_a$ incoherent linear equations. Therefore, we may replace the condition
$$P \ge \max\left( \frac{I}{L},\ \frac{J}{M},\ \frac{K-2}{N-2} \right)$$
with
$$P \ge \max\left( \frac{2 n_a}{L},\ \frac{2 n_b}{M},\ \frac{2 n_c - 2}{N - 2} \right).$$
44 Color of compressed noise
- $\underline{Y} = \underline{X} + \underline{Z}$, where $\underline{Z}$ is zero-mean additive white noise; $y = x + z$, with $y := \mathrm{vec}(\underline{Y})$, $x := \mathrm{vec}(\underline{X})$, $z := \mathrm{vec}(\underline{Z})$.
- Multi-way compression $\underline{Y} \to \underline{Y}_c$:
$$y_c := \mathrm{vec}(\underline{Y}_c) = \left( U^T \otimes V^T \otimes W^T \right) y = \left( U^T \otimes V^T \otimes W^T \right) x + \left( U^T \otimes V^T \otimes W^T \right) z.$$
- Let $z_c := \left( U^T \otimes V^T \otimes W^T \right) z$. Clearly, $E[z_c] = 0$; it can be shown that
$$E\!\left[ z_c z_c^T \right] = \sigma^2 \left( (U^T U) \otimes (V^T V) \otimes (W^T W) \right).$$
- If U, V, W are orthonormal, then the noise in the compressed domain is white.
45 Universal LS
- $E[z_c] = 0$, and $E\!\left[ z_c z_c^T \right] = \sigma^2 \left( (U^T U) \otimes (V^T V) \otimes (W^T W) \right)$.
- For large I and U drawn from a zero-mean, unit-variance, uncorrelated distribution, $\frac{1}{I} U^T U \approx \mathbf{I}$ by the law of large numbers, so the compressed noise is approximately white.
- Furthermore, even if z is not Gaussian, $z_c$ will be approximately Gaussian for large IJK, by the Central Limit Theorem.
- It follows that least-squares fitting is approximately optimal in the compressed domain, even if it is not so in the uncompressed domain. Compression thus makes least-squares fitting universal!
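The law-of-large-numbers step is easy to see empirically; a quick Monte Carlo sketch (our own toy sizes):

```python
import numpy as np

rng = np.random.default_rng(10)
I, L = 100_000, 5
U = rng.standard_normal((I, L))   # i.i.d. zero-mean, unit-variance entries

# (1/I) U^T U -> identity, so sigma^2 (U^T U) ⊗ (V^T V) ⊗ (W^T W) is
# approximately a scaled identity: the compressed noise stays (nearly) white.
G = (U.T @ U) / I
max_dev = np.max(np.abs(G - np.eye(L)))
nearly_white = max_dev < 0.05     # deviations shrink like 1/sqrt(I)
```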
46 Component energy preserved after compression
- Consider randomly compressing a rank-one tensor $\underline{X} = a \circ b \circ c$, written in vectorized form as $x = a \otimes b \otimes c$. The compressed tensor is $\underline{\tilde{X}}$, in vectorized form
$$\tilde{x} = \left( U^T \otimes V^T \otimes W^T \right)(a \otimes b \otimes c) = (U^T a) \otimes (V^T b) \otimes (W^T c).$$
- It can be shown that, for moderate L, M, N and beyond, the Frobenius norm of a compressed rank-one tensor is approximately proportional to the Frobenius norm of the corresponding uncompressed rank-one component of the original tensor. In other words: compression approximately preserves component energy order.
- A low-rank least-squares approximation of the compressed tensor thus corresponds, approximately, to a low-rank least-squares approximation of the big tensor.
- Match component permutations across replicas by sorting component energies.
47 PARACOMP: Numerical results
- Nominal setup: I = J = K = 500; F = 5; A, B, C ~ randn(500, 5); L = M = N = 50 (each replica = 0.1% of the big tensor); P = 12 replicas (overall cloud storage = 1.2% of the big tensor); S = 3 (vs. $S_{\min}$ = 2) anchor rows. Satisfies identifiability without much slack.
- Additive WGN with standard deviation σ = 0.01.
- COMFAC (nikos/comfac.m) used for all factorizations, big and small.
48 PARACOMP: MSE as a function of L = M = N
Fix P = 12, vary L = M = N.
[Plot: $\|A - \hat{A}\|_F$ vs. L = M = N, for PARACOMP and for direct fitting without compression; I = J = K = 500, F = 5, σ = 0.01, P = 12, S = 3. Annotations mark the fraction of the full data held: at L = M = N = 50, 1.2% of the full data overall (P = 12 processors, each with 0.1% of the full data).]
49 PARACOMP: MSE as a function of P
Fix L = M = N = 50, vary P.
[Plot: $\|A - \hat{A}\|_F$ vs. P, for PARACOMP and for direct fitting without compression; I = J = K = 500, L = M = N = 50, F = 5, σ = 0.01, S = 3. Annotations mark overall storage: 1.2% of the full data for P = 12 processors with 0.1% each, up to 12% for P = 120 processors with 0.1% each.]
50 PARACOMP: MSE vs. AWGN variance σ²
Fix L = M = N = 50, P = 12, vary σ.
[Plot: $\|A - \hat{A}\|_F$ vs. σ², for PARACOMP and for direct fitting without compression; I = J = K = 500, F = 5, L = M = N = 50, P = 12, S = 3.]
51 Missing elements
- Recommender systems, NELL, many other datasets: over 90% of the values are missing!
- PARACOMP to the rescue: a fortuitous fringe benefit of compression (rather: of taking linear combinations)!
- Let T denote the set of all elements, and Ψ the set of available elements. Consider one element of the compressed tensor, as it would have been computed had all elements been available, and as it can be computed from the available elements (notice the normalization; important!):
$$Y_\nu(l, m, n) = \frac{1}{|T|} \sum_{(i,j,k) \in T} u_l(i)\, v_m(j)\, w_n(k)\, X(i,j,k)$$
$$\tilde{Y}_\nu(l, m, n) = \frac{1}{E[|\Psi|]} \sum_{(i,j,k) \in \Psi} u_l(i)\, v_m(j)\, w_n(k)\, X(i,j,k)$$
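The normalization makes the miss-compensated element an unbiased estimate of the fully observed one, since $E[|\Psi|] = \rho |T|$ under a Bernoulli(ρ) miss model; a Monte Carlo sketch (our own toy sizes and distributions):

```python
import numpy as np

rng = np.random.default_rng(11)
I, J, K = 30, 30, 30
rho = 0.5

# Rank-one X with nonzero-mean factors (nonzero means are assumed by the theory)
a, b, c = (rng.standard_normal(s) + 2.0 for s in (I, J, K))
X = np.einsum('i,j,k->ijk', a, b, c)

# One row each of U, V, W, also with nonzero mean
u, v, w = (rng.standard_normal(s) + 2.0 for s in (I, J, K))
weights = np.einsum('i,j,k->ijk', u, v, w)

mask = rng.random((I, J, K)) < rho   # Bernoulli(rho) availability pattern

y_full = (weights * X).sum() / X.size                  # all |T| elements
y_miss = (weights * X * mask).sum() / (rho * X.size)   # available, normalized

close = abs(y_miss - y_full) / abs(y_full) < 0.2
```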
52 Missing elements
Theorem [Marcos & Sidiropoulos, IEEE ISCCSP 2014]: Assume a Bernoulli i.i.d. miss model, with parameter $\rho = \mathrm{Prob}[(i,j,k) \in \Psi]$, and let $X(i,j,k) = \sum_{f=1}^{F} a_f(i)\, b_f(j)\, c_f(k)$, where the elements of $a_f$, $b_f$ and $c_f$ are all i.i.d. random variables drawn from $a_f(i) \sim P_a(\mu_a, \sigma_a)$, $b_f(j) \sim P_b(\mu_b, \sigma_b)$, and $c_f(k) \sim P_c(\mu_c, \sigma_c)$, with $p_a := \mu_a^2 + \sigma_a^2$, $p_b := \mu_b^2 + \sigma_b^2$, $p_c := \mu_c^2 + \sigma_c^2$, and $\bar{F} := F(F-1)$. Then, for $\mu_a, \mu_b, \mu_c$ all nonzero,
$$\frac{E\!\left[ \| E_\nu \|_F^2 \right]}{E\!\left[ \| Y_\nu \|_F^2 \right]} \le \frac{1 - \rho}{\rho\, |T|} \left(1 + \frac{\sigma_U^2}{\mu_u^2}\right)\left(1 + \frac{\sigma_V^2}{\mu_v^2}\right)\left(1 + \frac{\sigma_W^2}{\mu_w^2}\right)\left(F + \frac{p_a p_b p_c\, \bar{F}}{\mu_a^2 \mu_b^2 \mu_c^2}\right),$$
where $E_\nu := \tilde{Y}_\nu - Y_\nu$ is the imputation error. Additional results in the paper.
53 Missing elements
[Figure: SNR of the compressed tensor $\underline{Y}$ (dB) vs. the Bernoulli density parameter ρ, for rank($\underline{X}$) = 1 and $(\mu_X, \sigma_X) = (1, 1)$, at sizes I = J = K = 200; I = 400, J = K = 100; and I = J = K = 50. Curves: actual SNR using the expected number of available elements, actual SNR using the actual number of available elements, theoretical SNR lower bound, and exact theoretical SNR.]
54 Missing elements
[Figure: SNR of the recovered loadings A (dB) vs. the Bernoulli density parameter ρ, for rank($\underline{X}$) = 1 and $(\mu_X, \sigma_X) = (1, 1)$, at sizes (I, J, K) = (200, 200, 200), (100, 100, 100), (50, 50, 50), (200, 100, 50), and (400, 50, 50).]
55 Missing elements

Three-way (emission, excitation, sample) fluorescence spectroscopy data, with a missing slab imputed via random (randn-based) fill.

[Figure: measured and imputed data; recovered latent spectra.]

Works even with systematically missing data!