Randomized Robust Linear Regression for big data applications
Yannis Kopsinis, Dept. of Informatics & Telecommunications, UoA
Thursday, Apr 16, 2015
In collaboration with S. Chouvardas, Harris Georgiou, Sergios Theodoridis
Outline
1. Big Data era
2. Randomized Methods
3. Randomized Linear Regression
4. Robust Randomized Linear Regression
5. Iterative Randomized Robust Regression
6. Randomized Low Rank matrix approximation
Big Data era: Why all the fuss?
- Massive data volumes are not a new thing: about $65 \times 10^{18}$ bytes flowed through telecommunication networks in 2007.
- First new thing: established data analysis and machine learning techniques face big challenges.
- Second new thing: novel approaches for data capturing, handling and processing have emerged.
- Third new thing: new modalities and increased complexity (internet of things, cyber-physical systems, smart homes, smart cars, etc.).
Big Data era: Why all the fuss?
- Marketing policies and new emerging applications: from Big Data to insights, and from insights to big profits.
- 4.4 million data scientists needed by 2015 (IBM).
- Many challenging open problems / a paradigm shift.
Big Data era: What characterizes big data
- Volume (scale of data)
- Variety (different forms of data)
- Velocity (streaming data)
- Veracity (presence of outliers / corruptions)
Big Data era: How to deal with big data
Distributed Processing
- Centralized approach, e.g. MapReduce/Hadoop
- Decentralized approach, e.g. ad-hoc in-network processing
- Shared processing power and storage requirements
- Privacy protection
Online Learning
- Process data on the fly
- Limited storage demands
- Reduced computational complexity (stochastic gradient descent)
- Dealing with time-varying situations
Randomized Methods
Randomized Methods
Major principle that governs randomized methods: instead of working with the original large-scale data matrices, operate on compressed versions of them. The compression is realized via computationally efficient dimensionality reduction, which is performed in a randomized rather than in a deterministic way.
Some facts!
- It is a very appealing idea!
- Data are highly compressible.
- Low-speed memory units are the major bottleneck.
- It is applicable to ubiquitous data analysis and ML tasks, even to basic matrix operations:
  - Matrix multiplication
  - Linear regression
  - Low-rank matrix approximation (Singular Value Decomposition)
Randomized Methods
Some facts! (cont.)
What is the price to pay?
- Approximate rather than exact solutions are provided.
- There is a probability of failure.
Randomized Linear Regression
Linear LS Regression
$b = Ax + \eta$, with $b \in \mathbb{R}^{N}$, $A \in \mathbb{R}^{N \times l}$, $x \in \mathbb{R}^{l}$, $\eta \in \mathbb{R}^{N}$, $N \gg l$, and at least $N$ very large.
$\hat{x}_{LS} = \arg\min_{x \in \mathbb{R}^l} \|b - Ax\|_2^2$
$\hat{x}_{LS} = (A^T A)^{-1} A^T b$
Computational complexity: $O(Nl^2)$ via QR decomposition.
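As a quick illustration (not part of the slides), a minimal NumPy sketch of the closed-form LS solution on such a tall system; all sizes are illustrative.

import numpy as np

rng = np.random.default_rng(0)
N, l = 10_000, 50                                  # tall system, N >> l
A = rng.standard_normal((N, l))
x_true = rng.standard_normal(l)
b = A @ x_true + 0.01 * rng.standard_normal(N)

# Normal equations: x = (A^T A)^{-1} A^T b, cost O(N l^2)
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# Library least-squares solver (orthogonal-factorization based, also O(N l^2))
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_ne, x_ls))                     # both give (numerically) the same solution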
Randomized Least Squares
Randomized Linear LS Regression
$b = Ax + \eta$, with $A \in \mathbb{R}^{N \times l}$; compress with $R \in \mathbb{R}^{d \times N}$: $\tilde{b} = Rb \in \mathbb{R}^{d}$, $\tilde{A} = RA \in \mathbb{R}^{d \times l}$, where $d \ll N$.
$\hat{x}_{R} = \arg\min_{x \in \mathbb{R}^l} \|\tilde{b} - \tilde{A}x\|_2^2$
Computational complexity: $O(dl^2) + C(R) + T(RA)$ (cost of generating $R$ plus cost of forming $RA$).
Some theoretic results [Drineas 2011]
If $d = O\!\left(l(\ln l)(\ln N) + \frac{l \ln N}{\epsilon}\right)$, then with probability at least 0.8:
- $\|b - A\hat{x}_{R}\|_2 \le (1 + \epsilon)\, \|b - A\hat{x}_{LS}\|_2$
- $\|\hat{x}_{LS} - \hat{x}_{R}\|_2 \le \sqrt{\epsilon}\, \kappa(A) \sqrt{\gamma^{-2} - 1}\, \|\hat{x}_{LS}\|_2$, provided $N \le e^{l}$ and $\|U_A U_A^T b\|_2 \ge \gamma \|b\|_2$, for some $\gamma \in (0, 1]$.
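A minimal sketch of this sketch-and-solve idea; a dense Gaussian $R$ stands in for the fast JL transforms discussed later, and the value of $d$ is chosen arbitrarily rather than via the theory.

import numpy as np

rng = np.random.default_rng(1)
N, l, d = 10_000, 20, 500                          # d << N
A = rng.standard_normal((N, l))
b = A @ rng.standard_normal(l) + 0.1 * rng.standard_normal(N)

# Compress with a (dense, Gaussian) R; the slides use a fast JL / Hadamard-based R instead
R = rng.standard_normal((d, N)) / np.sqrt(d)
x_R, *_ = np.linalg.lstsq(R @ A, R @ b, rcond=None)   # solve the small d x l problem
x_LS, *_ = np.linalg.lstsq(A, b, rcond=None)          # exact solution, for reference

# Residual ratio should be close to 1, i.e. within the (1 + eps) guarantee
print(np.linalg.norm(b - A @ x_R) / np.linalg.norm(b - A @ x_LS))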
Johnson Lindenstrauss (JL) seminal work (1984)
Lemma: For any set $S$ of $l$ points $u_1, u_2, \ldots$ in $\mathbb{R}^N$ there exists a linear mapping $R: \mathbb{R}^N \rightarrow \mathbb{R}^d$, with $d = O(\epsilon^{-2} \log l)$, such that all the pairwise distances are approximately preserved:
$\forall i, j:\ (1 - \epsilon)\|u_i - u_j\|_2^2 \le \|Ru_i - Ru_j\|_2^2 \le (1 + \epsilon)\|u_i - u_j\|_2^2$
W.B. Johnson and J. Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space," Contemporary Mathematics, 1984.
JL Transforms (the matrix $R$)
- Johnson and Lindenstrauss (1984): choose $R$ uniformly at random from the space of projection matrices.
- Frankl and Maehara (1988): random orthogonal matrix.
- Indyk and Motwani (1998), DasGupta and Gupta (1999): entries drawn i.i.d. from $\mathcal{N}(0, \frac{1}{N})$.
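A small numerical illustration of the lemma, assuming the Gaussian construction above; the constant in $d = O(\epsilon^{-2}\log(\#\text{points}))$ is illustrative.

import numpy as np

rng = np.random.default_rng(2)
N, n_pts, eps = 5_000, 50, 0.2
d = int(np.ceil(8 * np.log(n_pts) / eps**2))       # d = O(eps^-2 log(#points)); the constant 8 is illustrative

U = rng.standard_normal((n_pts, N))                # the points u_1, ..., u_{n_pts} in R^N
R = rng.standard_normal((d, N)) / np.sqrt(d)       # Gaussian JL map R^N -> R^d
V = U @ R.T                                        # projected points in R^d

orig = np.linalg.norm(U[0] - U[1]) ** 2
proj = np.linalg.norm(V[0] - V[1]) ** 2
print(proj / orig)                                 # should lie in [1 - eps, 1 + eps] with high probability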
Johnson Lindenstrauss (JL) seminal work
JL geometry in the Linear Regression case
$b = Ax + \eta$, with $b \in \mathbb{R}^{N}$, $A \in \mathbb{R}^{N \times l}$, $x \in \mathbb{R}^{l}$.
$\hat{x}_{R} = \arg\min_{x \in \mathbb{R}^l} \|Rb - RAx\|_2^2$
Accelerating Johnson Lindenstrauss (JL) Transforms
Achlioptas (2003)
$a_{i,j} = \begin{cases} +\sqrt{3/d}, & \text{with probability } 1/6,\\ 0, & \text{with probability } 2/3,\\ -\sqrt{3/d}, & \text{with probability } 1/6. \end{cases}$
Then, if $d \ge \frac{4 + 2\beta}{\epsilon^2/2 - \epsilon^3/3} \log l$, each pairwise distance is preserved with probability at least $1 - l^{-\beta}$.
Fast JL Transforms (e.g. Sarlos 2006, Drineas et al. 2011)
$R = P H D$
- $D \in \mathbb{R}^{N \times N}$: diagonal matrix with random $\pm 1$ entries
- $H \in \mathbb{R}^{N \times N}$: (normalized) Hadamard matrix
- $P \in \mathbb{R}^{d \times N}$: a sparse matrix (or simply a sampling matrix)
Computational Complexity / Facts
- It is called the Randomized Hadamard Transform.
- Multiplication with $D$ is just selective sign changes.
- $Ha$ costs $O(N \log k)$, where $k \le N$ is the number of Hadamard components needed.
- Overall, $RA$ takes $O(lN \log k)$.
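A minimal sketch of applying $R = PHD$ to a single vector, assuming $N$ is a power of two; the fwht helper, the uniform sampling used for $P$, and the rescaling constant are illustrative choices.

import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform (returns a new array), O(N log N), N a power of 2
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(len(x))                     # normalized (orthonormal) Hadamard transform

rng = np.random.default_rng(3)
N, d = 2 ** 12, 256
a = rng.standard_normal(N)

D = rng.choice([-1.0, 1.0], size=N)                # the random-sign diagonal D
keep = rng.choice(N, size=d, replace=False)        # uniformly sampled rows (the matrix P)
Ra = np.sqrt(N / d) * fwht(D * a)[keep]            # R a = P H D a, rescaled so E||Ra||^2 = ||a||^2

print(np.linalg.norm(a), np.linalg.norm(Ra))       # the two norms should be comparable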
Fast LS approximation: Example
Randomized Hadamard Transform; recall $d = O\!\left(l(\ln l)(\ln N) + \frac{l \ln N}{\epsilon}\right)$.
- Example 1: $N = 10^6$, $l = 200$, $\epsilon = 0.1$. Speed-up $\frac{Nl^2}{dl^2 + lN\log(d)} = 10$; compression ratio $\frac{N}{d} = 23$.
- Example 2: $N = 10^8$, $l = 1000$, $\epsilon = 0.1$. Speed-up $\frac{Nl^2}{dl^2 + lN\log(d)} = 63$; compression ratio $\frac{N}{d} = 321$.
Randomized projections vs Randomized sampling (figures)
Statistical Leverage
Hat Matrix / Statistical Leverage Scores
$b = Ax + \eta$, $\hat{x}_{LS} = (A^T A)^{-1} A^T b$
$\hat{b} = A\hat{x}_{LS} = A(A^T A)^{-1} A^T b$, i.e. $\hat{b} = Hb$
- $H_{ij}$ measures the influence exerted on the prediction $\hat{b}_i$ by the observation $b_j$.
- $l_i = H_{ii}$ measures the importance of $b_i$ in determining the best LS fit.
- $l_i$, $i = 1, \ldots, N$, are referred to as statistical leverage scores.
- $H = P_A = UU^T$, for any orthonormal matrix $U$ spanning the column space of $A$; hence $l_i = \|U_{i,\cdot}\|_2^2$.
- Very large $H_{ii}$ are indicators of outliers in $A$.
Randomized Sampling
Sampling Strategy
- Construct an importance sampling distribution $\{p_i\}_{i=1}^{N}$, with $p_i = \frac{l_i}{l}$. Intuitively, the larger $p_i$ is, the higher the probability of selecting the $i$th data sample $(b_i, A_{i,\cdot})$.
- Start with a zero matrix $R \in \mathbb{R}^{d \times N}$. Then successively fill a single entry of each row, say the $i$th, as follows:
  - Sample a random value $\rho \in \{1, \ldots, N\}$ from the importance sampling distribution.
  - Set $R_{i,\rho} = \frac{1}{\sqrt{d\, p_\rho}}$.
- Via $\tilde{A} = RA$, $\tilde{A}$ comprises rescaled rows of $A$, randomly sampled with replacement.
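A minimal sketch of this sampling strategy, assuming exact leverage scores computed via the SVD; the helper name and all sizes are illustrative.

import numpy as np

def leverage_sampling_matrix(p, d, rng):
    # Build the d x N sampling/rescaling matrix: each row has a single
    # nonzero entry 1/sqrt(d * p_rho) in a column rho drawn from p.
    N = len(p)
    R = np.zeros((d, N))
    rows = rng.choice(N, size=d, replace=True, p=p)     # sampling with replacement
    R[np.arange(d), rows] = 1.0 / np.sqrt(d * p[rows])
    return R

rng = np.random.default_rng(4)
N, l, d = 4_096, 10, 400
A = rng.standard_normal((N, l))
U, _, _ = np.linalg.svd(A, full_matrices=False)
lev = np.sum(U ** 2, axis=1)                            # exact leverage scores (they sum to l)
p = lev / lev.sum()                                     # importance sampling distribution p_i = l_i / l

R = leverage_sampling_matrix(p, d, rng)
A_tilde = R @ A                                         # d x l: rescaled, resampled rows of A
print(A_tilde.shape)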
Computation of the Statistical Leverage
Naive way
- $A = U\Sigma V^T$; then $U$ is an orthonormal basis spanning the column space of $A$. Alas, the complexity is $O(Nl^2)$.
Fast approximations (Drineas 2012)
- Exploit the fact that $l_i = \|(UU^T)_{i,\cdot}\|_2^2 = \|(AA^{\dagger})_{i,\cdot}\|_2^2$.
- Construct two fast JL transform matrices (e.g. randomized Hadamard transforms), $\Pi_1 \in \mathbb{R}^{r_1 \times N}$ and $\Pi_2 \in \mathbb{R}^{r_1 \times r_2}$.
- Estimate the leverage scores as $\hat{l}_i = \|(A(\Pi_1 A)^{\dagger}\Pi_2)_{i,\cdot}\|_2^2$.
- It is proved that $|l_i - \hat{l}_i| \le \epsilon\, l_i$, $\forall i$.
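A sketch contrasting the naive and the approximate computation; Gaussian matrices stand in for the fast JL transforms $\Pi_1, \Pi_2$, and $r_1$, $r_2$ are illustrative choices (the approximation improves as they grow).

import numpy as np

rng = np.random.default_rng(5)
N, l = 8_192, 12
A = rng.standard_normal((N, l))

# Naive way: full (thin) SVD of A, O(N l^2)
U = np.linalg.svd(A, full_matrices=False)[0]
lev_exact = np.sum(U ** 2, axis=1)

# Fast approximation in the spirit of [Drineas 2012]; Gaussian maps replace the fast JL transforms
r1, r2 = 40 * l, 20 * int(np.log(N))
Pi1 = rng.standard_normal((r1, N)) / np.sqrt(r1)
Pi2 = rng.standard_normal((r1, r2)) / np.sqrt(r2)
X = A @ (np.linalg.pinv(Pi1 @ A) @ Pi2)                 # N x r2; never forms an N x N matrix
lev_approx = np.sum(X ** 2, axis=1)

# Typical relative error is a few percent; it shrinks as r1 and r2 grow
print(np.median(np.abs(lev_approx - lev_exact) / lev_exact))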
Randomized projections vs Randomized sampling: common ground!
- Random projections uniformize the leverage scores (so simple uniform random sampling is adequate).
- Without random projection-based preprocessing, more advanced sampling is needed (and leverage score-based importance sampling does the job!).
Robust Randomized Linear Regression
Robust Linear Regression -- recall Veracity!
$b = Ax + \eta$, $\eta = n + o$ (noise plus sparse outliers)
$\hat{x}_{LAD} = \arg\min_{x \in \mathbb{R}^l} \|b - Ax\|_1$
- Least Absolute Deviations (LAD) does not admit a closed-form solution.
- Linear programming using, e.g., interior-point methods: $O(\mathrm{poly}(N))$.
- Use approximate, iterative solutions, e.g. ADMM [Boyd 2011] (see the sketch below).
A hard time for fast JL transforms
$Rb = RAx + Rn + Ro$
- The sparsity property is missing from $Ro$.
- The energy of the nonzero values of $o$ is spread across all $d$ dimensions.
- LAD is not appropriate anymore.
Randomized sampling is still OK
- $Ro$ is still sparse.
- LAD can be applied.
- Harder (at least in theory) to compute the approximate leverage scores.
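Since LAD has no closed form, a minimal ADMM sketch in the spirit of [Boyd 2011] (splitting $\min \|z\|_1$ s.t. $Ax - z = b$); the penalty rho, the fixed iteration count and the synthetic outliers are illustrative choices, not part of the slides.

import numpy as np

def lad_admm(A, b, rho=1.0, n_iter=200):
    # Least Absolute Deviations, min_x ||Ax - b||_1, via ADMM; no stopping criterion, just a sketch
    N, l = A.shape
    x, z, u = np.zeros(l), np.zeros(N), np.zeros(N)
    Q, Rf = np.linalg.qr(A)                              # cache a factorization for the repeated x-updates
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    for _ in range(n_iter):
        x = np.linalg.solve(Rf, Q.T @ (b + z - u))       # x-update: a least-squares solve
        Ax = A @ x
        z = soft(Ax - b + u, 1.0 / rho)                  # z-update: soft thresholding
        u = u + Ax - b - z                               # dual update
    return x

rng = np.random.default_rng(6)
N, l = 2_000, 8
A = rng.standard_normal((N, l))
x_true = rng.standard_normal(l)
b = A @ x_true + 0.05 * rng.standard_normal(N)
b[:50] += 20.0                                           # gross outliers in 50 observations
print(np.linalg.norm(lad_admm(A, b) - x_true))           # remains small despite the outliers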
Randomized Sampling for LAD
$\ell_1$ leverage scores
- Recall the LS case: $l_i^{(2)} = \|U_{i,\cdot}\|_2^2$, where $U$ can be any orthonormal basis spanning the column space of $A$.
- LAD regression case: leverage scores $l_i^{(1)} = \|U_{i,\cdot}\|_1$.
- $\|U_{i,\cdot}\|_1$ is not invariant under rotation, so a well-conditioned $U$ needs to be used.
- Cauchy-distributed variables / submatrices are needed.
- In practice, benefits over the $l^{(2)}$ construction are observed when $N$ is much larger than $l$.
Reasoning behind our approach
Proposed Approach
- Apply a fast JL transform: $\tilde{b} = Rb$, $\tilde{A} = RA$.
- Progressively clean the data from outliers in the reduced-dimensional space.
- Obtain the final solution with ordinary LS.
In an ideal world... (I)
- Let $\Lambda \subset \{1, \ldots, N\}$ be the index set indicating the corrupted data, and assume $\Lambda$ is known. Then $A_{\Lambda^c,\cdot}$, $b_{\Lambda^c}$ are the outlier-free data.
- Ideal solution: $\hat{x}_{LS} = \arg\min_{x \in \mathbb{R}^l} \|R_{\cdot,\Lambda^c}\, b_{\Lambda^c} - R_{\cdot,\Lambda^c}\, A_{\Lambda^c,\cdot}\, x\|_2$.
- Cleaning the compressed data directly in the low-dimensional domain:
  $R_{\cdot,\Lambda^c}\, b_{\Lambda^c} = \tilde{b} - R_{\cdot,\Lambda}\, b_{\Lambda}$, and $R_{\cdot,\Lambda^c}\, A_{\Lambda^c,\cdot} = \tilde{A} - R_{\cdot,\Lambda}\, A_{\Lambda,\cdot}$.
Reasoning behind our approach
In an ideal world... (II)
- Let the randomized Hadamard transform be applied to the full data set: $\tilde{b} = \tilde{A}x + \tilde{n} + Ro$, where $\tilde{n} = Rn$.
- Assume that $x$ can be estimated exactly. Then
  $z = Ro + \tilde{n}$,   (1)
  where $z$ is computed as $\tilde{b} - \tilde{A}x$.
- Request: is it possible to estimate the support of $o$, in the reduced-dimensional space, based on (1)? Indeed, this is a typical compressed sensing scenario:
  $\hat{o} = \arg\min_{o} \|z - Ro\|_2 \ \ \text{s.t.} \ \ \|o\|_0 \le K_o$
- We only need to estimate the support (or a subset of it).
Reasoning behind our approach
Back in reality...
- Let the randomized Hadamard transform be applied to the full data set: $\tilde{b} = \tilde{A}x + \tilde{n} + Ro$, where $\tilde{b} = Rb$, $\tilde{A} = RA$, $\tilde{n} = Rn$.
- $x$ is not known, but $Ro$ is likely to be (approximately) normally distributed, so
  $\hat{x} = \arg\min_{x \in \mathbb{R}^l} \|\tilde{b} - \tilde{A}x\|_2$, where $\hat{x} = x + x_e$.
- Then $z = \tilde{b} - \tilde{A}\hat{x} \approx Ro + \tilde{n}$.
- Request: estimate any part of the support of $o$.
- Suggestion: just use the CoSaMP proxy, $\psi = R^T z$, $\Lambda = \mathrm{Supp}(\psi, K)$.
The full picture
Iterative Randomized Robust LS: Concept
- Compress the data: $\tilde{b} = Rb$, $\tilde{A} = RA$.
- Start iterations:
  - Get a tentative estimate $\hat{x}$ via $\arg\min_{x \in \mathbb{R}^l} \|\tilde{b} - \tilde{A}x\|_2$.
  - Compute $\psi = R^T(\tilde{b} - \tilde{A}\hat{x})$.
  - Define $\Lambda$ as the set of indices of the $K$ largest (in magnitude) components of $\psi$.
    Key remark: we are happy if $\Lambda$ contains some, not necessarily $K$, outlier indices.
  - Exclude / clear the data indexed by $\Lambda$ from the compressed data set:
    $R_{\cdot,\Lambda^c}\, b_{\Lambda^c} = \tilde{b} - R_{\cdot,\Lambda}\, b_{\Lambda}$, and $R_{\cdot,\Lambda^c}\, A_{\Lambda^c,\cdot} = \tilde{A} - R_{\cdot,\Lambda}\, A_{\Lambda,\cdot}$.
    Key remark: note that some healthy data might be omitted as well.
  - Return, to hopefully get an improved $\hat{x}$, or stop.
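A minimal end-to-end sketch of the concept above; a Gaussian $R$ replaces the randomized Hadamard transform for brevity, and $d$, $K$, the iteration count and the synthetic outliers are illustrative choices.

import numpy as np

def iterative_randomized_robust_ls(A, b, d, K, n_iter=5, rng=None):
    # Sketch of the iterative concept above (Gaussian R instead of a randomized Hadamard transform)
    if rng is None:
        rng = np.random.default_rng()
    N, l = A.shape
    R = rng.standard_normal((d, N)) / np.sqrt(d)
    b_t, A_t = R @ b, R @ A                                 # compressed data
    removed = np.zeros(N, dtype=bool)
    for _ in range(n_iter):
        x_hat, *_ = np.linalg.lstsq(A_t, b_t, rcond=None)   # tentative LS estimate
        psi = R.T @ (b_t - A_t @ x_hat)                     # CoSaMP-style proxy
        psi[removed] = 0.0                                  # ignore data already cleared
        Lam = np.argsort(np.abs(psi))[-K:]                  # K largest-magnitude entries
        b_t = b_t - R[:, Lam] @ b[Lam]                      # clear the flagged data directly
        A_t = A_t - R[:, Lam] @ A[Lam, :]                   #   in the compressed domain
        removed[Lam] = True
    x_final, *_ = np.linalg.lstsq(A_t, b_t, rcond=None)
    return x_final

rng = np.random.default_rng(7)
N, l = 8_192, 10
A = rng.standard_normal((N, l))
x_true = rng.standard_normal(l)
o = np.zeros(N)
o[rng.choice(N, 80, replace=False)] = 100.0                 # sparse gross outliers
b = A @ x_true + 0.05 * rng.standard_normal(N) + o

x_naive, *_ = np.linalg.lstsq(A, b, rcond=None)             # plain LS on the corrupted data
x_hat = iterative_randomized_robust_ls(A, b, d=1_024, K=60, rng=rng)
print(np.linalg.norm(x_naive - x_true), np.linalg.norm(x_hat - x_true))   # cleaning should help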
Computational complexity analysis
Proposed
- Once: $O((l+1)N\log d)$.
- Per iteration: $O(dl^2) + d(l+1) + O(Nd) + O(N) + O(dKl)$.
Random Sampling
- For the leverage scores: $O((l+1)N\log r_1 + lNr_2 + r_1 l^2 + r_2 l^2)$, with $r_1 = d$ and $r_2 = O(\log l)$.
- For LAD: $\mathrm{poly}(d)$.
Some Results (figures)
Randomized Methods for Low Rank approximation
Sampling the column space is the key...
Task: given $A \in \mathbb{R}^{n \times m}$, solve $\min_{X:\ \mathrm{rank}(X) = k} \|A - X\|_F$.
Randomized projection-based range finder
- Generate a matrix $R \in \mathbb{R}^{m \times d}$ and compress: $Y = AR$.
- Some housekeeping: replace $Y$ with $Q$, whose columns form an orthonormal basis for the range of $Y$.
SVD estimation in 3 steps
- $B = Q^T A$
- Compute the low-dimensional SVD: $B = \tilde{U}\Sigma V^T$
- $U = Q\tilde{U}$
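A minimal sketch of the range finder and the 3-step SVD above; the oversampling parameter and all sizes are illustrative.

import numpy as np

def randomized_svd(A, k, oversample=10, rng=None):
    # Randomized range finder followed by the 3-step SVD estimation described above
    if rng is None:
        rng = np.random.default_rng()
    n, m = A.shape
    d = k + oversample
    R = rng.standard_normal((m, d))                 # random test matrix
    Y = A @ R                                       # sample the column space of A
    Q, _ = np.linalg.qr(Y)                          # orthonormal basis for range(Y)
    B = Q.T @ A                                     # small d x m matrix
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_tilde                                 # lift back to the original space
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(8)
A = rng.standard_normal((2_000, 30)) @ rng.standard_normal((30, 1_000))   # an (exactly) rank-30 matrix
U, s, Vt = randomized_svd(A, k=30, rng=rng)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))               # ~0 for an exactly rank-k matrix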