Randomized Robust Linear Regression for big data applications

Randomized Robust Linear Regression for big data applications
Yannis Kopsinis, Dept. of Informatics & Telecommunications, UoA
Thursday, Apr 16, 2015
In collaboration with S. Chouvardas, Harris Georgiou, Sergios Theodoridis
Y. Kopsinis, Dept. of Informatics & Telecommunications, UoA. Randomized methods for big data applications, 1/31

Outline
1. Big Data era
2. Randomized Methods
3. Randomized Linear Regression
4. Robust Randomized Linear Regression
5. Iterative Randomized Robust Regression
6. Randomized Low Rank matrix approximation

Big Data era: Why all the fuss?
- Massive data volumes are not a new thing: [...] bytes flowed through telecommunication networks in 2007.
- First new thing: established data analysis and machine learning techniques face big challenges.
- Second new thing: novel approaches for data capturing, handling and processing have emerged.
- Third new thing: new modalities and increased complexity (internet of things, cyber-physical systems, smart homes, smart cars, etc.).

Big Data era: Why all the fuss?
- From Big Data to Insights to Big Profits: marketing policies, new emerging applications.
- 4.4 million data scientists needed by 2015 (IBM).
- Many challenging open problems / paradigm shift.

Big Data era: What characterizes big data
- Volume (scale of data)
- Variety (different forms of data)
- Velocity (streaming data)
- Veracity (presence of outliers / corruptions)

Big Data era: How to deal with big data
- Distributed Processing
  - Centralized approach, e.g. MapReduce/Hadoop
  - Decentralized approach, e.g. ad-hoc in-network processing
  - Share processing power and storage requirements
  - Privacy protection
- Online Learning
  - Process data on the fly
  - Limited storage demands
  - Reduced computational complexity (stochastic gradient descent)
  - Dealing with time-varying situations
- Randomized Methods

Randomized Methods
Major principle that governs randomized methods: instead of working with the original large-scale data matrices, operate on compressed versions of them. The compression is realized via computationally efficient dimensionality reduction, which is performed in a randomized rather than in a deterministic way.
Some facts:
- It is a very appealing idea:
  - Data are highly compressible
  - Low-speed memory units are the major bottleneck
- It is applicable to ubiquitous data analysis and ML tasks, even to basic matrix operations:
  - Matrix multiplication
  - Linear regression
  - Low-rank matrix approximation (Singular Value Decomposition)

Randomized Methods: Some facts (cont.)
What is the price to pay?
- They provide approximate rather than exact solutions.
- There is a probability of failure.

Randomized Linear Regression: Linear LS Regression
$b = Ax + \eta$, with $b \in \mathbb{R}^{N \times 1}$, $A \in \mathbb{R}^{N \times l}$, $x \in \mathbb{R}^{l \times 1}$, $\eta \in \mathbb{R}^{N \times 1}$, where $N \gg l$ and at least $N$ is very large.
$\hat{x}_{LS} = \arg\min_{x \in \mathbb{R}^l} \|b - Ax\|_2^2$
$\hat{x}_{LS} = (A^T A)^{-1} A^T b$
Computational complexity: $O(N l^2)$ via QR decomposition.
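
The closed-form solution and the QR route mentioned on this slide can be compared numerically. The following is a minimal NumPy sketch; the problem sizes, the noise level and the helper name `ls_via_qr` are assumptions made only for illustration.

```python
import numpy as np

def ls_via_qr(A, b):
    """Solve min_x ||b - A x||_2 through the thin QR decomposition A = Q T."""
    Q, T = np.linalg.qr(A)              # Q: N x l, orthonormal columns; T: l x l, upper triangular
    return np.linalg.solve(T, Q.T @ b)  # solve T x = Q^T b

rng = np.random.default_rng(0)
N, l = 2000, 5                          # hypothetical sizes with N >> l
A = rng.standard_normal((N, l))
x_true = rng.standard_normal(l)
b = A @ x_true + 0.01 * rng.standard_normal(N)

x_qr = ls_via_qr(A, b)
x_ne = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations: (A^T A)^{-1} A^T b
```

Both routes return the same estimate on a well-conditioned problem; the QR route is usually preferred numerically because it avoids forming $A^T A$.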

Randomized Least Squares: Randomized Linear LS Regression
Starting from $b = Ax + \eta$, with $b \in \mathbb{R}^{N \times 1}$ and $A \in \mathbb{R}^{N \times l}$, compress the data with $R \in \mathbb{R}^{d \times N}$, where $d \ll N$:
$\tilde{b} = Rb \in \mathbb{R}^{d \times 1}$, $\tilde{A} = RA \in \mathbb{R}^{d \times l}$
$\hat{x}_R = \arg\min_{x \in \mathbb{R}^l} \|\tilde{b} - \tilde{A}x\|_2^2$
Computational complexity: $O(d l^2) + C(R) + T(RA)$
Some theoretical results [Drineas 2011]: if $d = O\left(l(\ln l)(\ln N) + \frac{l \ln N}{\epsilon}\right)$, then with probability at least 0.8
- $\|b - A\hat{x}_R\|_2 \le (1 + \epsilon)\,\|b - A\hat{x}_{LS}\|_2$
- $\|\hat{x}_{LS} - \hat{x}_R\|_2 \le \sqrt{\epsilon}\,\left(\kappa(A)\sqrt{\gamma^{-2} - 1}\right)\|\hat{x}_{LS}\|_2$, if $N \le e^l$ and $\|U_A U_A^T b\|_2 \ge \gamma \|b\|_2$, $\gamma \in (0, 1]$
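
The residual guarantee above can be checked on a small synthetic problem. The sketch below uses a dense Gaussian matrix for R, which is a simplifying assumption (the slides rely on fast transforms); all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, l, d = 5000, 10, 500                  # hypothetical sizes with l < d << N
A = rng.standard_normal((N, l))
b = A @ rng.standard_normal(l) + rng.standard_normal(N)

# Dense Gaussian compression matrix (assumption, chosen for brevity).
R = rng.standard_normal((d, N)) / np.sqrt(d)
A_tilde, b_tilde = R @ A, R @ b

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]             # full-size LS
x_r = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)[0]  # LS on the compressed data

res_ls = np.linalg.norm(b - A @ x_ls)
res_r = np.linalg.norm(b - A @ x_r)      # residual of the sketched solution on the full data
ratio = res_r / res_ls                   # plays the role of (1 + epsilon)
```

Since $\hat{x}_{LS}$ minimizes the full residual, `ratio` cannot fall below 1; how close it stays to 1 depends on d.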

Johnson Lindenstrauss (JL) seminal work (1984)
Lemma: for any set $S$ of $k$ points $u_1, u_2, \ldots, u_k$ in $\mathbb{R}^N$ there exists a linear mapping $R: \mathbb{R}^N \to \mathbb{R}^d$, with $d = O(\epsilon^{-2} \log k)$, such that all the pairwise distances are approximately preserved:
$\forall i, j: \quad (1 - \epsilon)\,\|u_i - u_j\|_2^2 \le \|Ru_i - Ru_j\|_2^2 \le (1 + \epsilon)\,\|u_i - u_j\|_2^2$
W.B. Johnson and J. Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space", Contemporary Mathematics, 1984.
JL Transforms (R matrix):
- Johnson and Lindenstrauss (1984): choose R uniformly at random from the space of projection matrices.
- Frankl and Maehara (1988): random orthogonal matrix.
- Indyk and Motwani (1998), Dasgupta and Gupta (1999): entries drawn independently at random from $\mathcal{N}(0, \frac{1}{N})$.
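
A quick numerical illustration of the lemma, as a sketch under stated assumptions: the points are random, the map is a dense Gaussian matrix, and its entries are scaled to variance 1/d so that squared norms are preserved in expectation (a different scaling only changes all distances by a common constant factor).

```python
import numpy as np

rng = np.random.default_rng(2)
k, N, d = 10, 5000, 1000                 # hypothetical: k points in R^N mapped to R^d
U = rng.standard_normal((k, N))          # rows are the points u_i

# Gaussian JL map with variance 1/d (assumption), so that E||R u||^2 = ||u||^2.
R = rng.standard_normal((d, N)) / np.sqrt(d)
V = U @ R.T                              # rows are the mapped points R u_i

def pairwise_sq_dists(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    G = X @ X.T
    sq = np.diag(G)
    return sq[:, None] + sq[None, :] - 2.0 * G

iu = np.triu_indices(k, 1)               # all pairs i < j
ratios = pairwise_sq_dists(V)[iu] / pairwise_sq_dists(U)[iu]
```

Every entry of `ratios` plays the role of a factor between $1 - \epsilon$ and $1 + \epsilon$ in the lemma.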

Johnson Lindenstrauss (JL) seminal work: JL geometry in the Linear Regression case
$b = Ax + \eta$, with $b \in \mathbb{R}^{N \times 1}$, $A \in \mathbb{R}^{N \times l}$, $x \in \mathbb{R}^{l \times 1}$, $\eta \in \mathbb{R}^{N \times 1}$
$\hat{x}_R = \arg\min_{x \in \mathbb{R}^l} \|Rb - RAx\|_2^2$

Accelerating Johnson Lindenstrauss (JL) Transforms
Achlioptas (2003): the entries of $R$ are drawn independently as
- $a_{i,j} = +\sqrt{3/d}$, with probability $1/6$
- $a_{i,j} = 0$, with probability $2/3$
- $a_{i,j} = -\sqrt{3/d}$, with probability $1/6$
Then, if $d \ge \frac{4 + 2\beta}{\epsilon^2/2 - \epsilon^3/3} \log(l)$, each pairwise distance is preserved with probability at least $1 - l^{-\beta}$.
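
The Achlioptas construction is easy to generate directly. The following is a minimal sketch; the helper name `achlioptas_matrix` and the sizes are illustrative assumptions.

```python
import numpy as np

def achlioptas_matrix(d, N, rng):
    """d x N matrix with entries +sqrt(3/d), 0, -sqrt(3/d) drawn w.p. 1/6, 2/3, 1/6."""
    signs = rng.choice([1.0, 0.0, -1.0], size=(d, N), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0 / d) * signs

rng = np.random.default_rng(3)
d, N = 1000, 4000                        # hypothetical sizes
R = achlioptas_matrix(d, N, rng)

u = rng.standard_normal(N)
norm_ratio = np.linalg.norm(R @ u) ** 2 / np.linalg.norm(u) ** 2   # should be close to 1
sparsity = np.mean(R == 0.0)             # about two thirds of the entries are zero
```

The practical gain is that two thirds of the entries vanish and the rest share one magnitude, so applying R needs only additions, subtractions and a single final scaling.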

Accelerating Johnson Lindenstrauss (JL) Transforms
Fast JL Transforms (e.g. Sarlos 2006, Drineas et al. 2011): $R = PHD$
- $D \in \mathbb{R}^{N \times N}$: diagonal matrix with random $\pm 1$ entries
- $H \in \mathbb{R}^{N \times N}$: Hadamard matrix (normalized)
- $P \in \mathbb{R}^{d \times N}$: a sparse matrix (or simply a sampling matrix)
Computational complexity / facts:
- It is called the Randomized Hadamard Transform.
- Multiplication with $D$ is just selective sign changes.
- Computing $Ha$ costs $O(N \log k)$, where $k \le N$ is the number of Hadamard components needed.
- Overall, computing $RA$ takes $O(lN \log k)$.
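
A compact sketch of the randomized Hadamard transform $R = PHD$ follows. It assumes that N is a power of two, that P samples rows uniformly without replacement with a $\sqrt{N/d}$ rescaling, and it computes the full transform in $O(N \log N)$ per column rather than the pruned $O(N \log k)$ variant mentioned on the slide. The names `fwht` and `srht` are illustrative.

```python
import numpy as np

def fwht(X):
    """Normalized fast Walsh-Hadamard transform along axis 0 (length must be a power of 2)."""
    X = np.array(X, dtype=float)         # work on a copy
    shape = X.shape
    n = shape[0]
    h = 1
    while h < n:
        X = X.reshape(n // (2 * h), 2, h, -1)                        # blocks of two halves of length h
        X = np.stack((X[:, 0] + X[:, 1], X[:, 0] - X[:, 1]), axis=1)  # butterfly step
        X = X.reshape(n, -1)
        h *= 2
    return X.reshape(shape) / np.sqrt(n)

def srht(A, d, rng):
    """Return R A with R = P H D, for a 2-D array A with N rows."""
    N = A.shape[0]
    signs = rng.choice([-1.0, 1.0], size=N)        # diagonal of D
    HDA = fwht(signs[:, None] * A)                 # H D A
    rows = rng.choice(N, size=d, replace=False)    # P: keep d rows chosen uniformly
    return np.sqrt(N / d) * HDA[rows]

rng = np.random.default_rng(4)
N, l, d = 1024, 3, 256                   # hypothetical sizes
A = rng.standard_normal((N, l))
A_tilde = srht(A, d, rng)

x = np.ones(l)
norm_ratio = np.linalg.norm(A_tilde @ x) ** 2 / np.linalg.norm(A @ x) ** 2
```

Applying `fwht` twice returns the input, since the normalized Hadamard matrix is symmetric and orthogonal.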

Fast LS approximation: Example (Randomized Hadamard Transform)
Recall: $d = O\left(l(\ln l)(\ln N) + \frac{l \ln N}{\epsilon}\right)$
- Example 1: $N = 10^6$, $l = 200$, $\epsilon = 0.1$. Then $\frac{N l^2}{d l^2 + lN \log(d)} \approx 10$, with compression ratio $N/d = 23$.
- Example 2: $N = 10^8$, $l = 1000$, $\epsilon = 0.1$. Then $\frac{N l^2}{d l^2 + lN \log(d)} \approx 63$, with compression ratio $N/d = 321$.
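
The figures in this example can be reproduced with a few lines of arithmetic. The sketch below assumes that the constant hidden in the $O(\cdot)$ expression for d equals 1 and that the logarithm in the cost expression is the natural one; under these two assumptions the numbers match the slide.

```python
import math

def sketch_size(N, l, eps):
    # d = l ln(l) ln(N) + l ln(N) / eps, with the hidden O(.) constant taken as 1 (assumption)
    return l * math.log(l) * math.log(N) + l * math.log(N) / eps

def cost_ratio(N, l, d):
    # N l^2 / (d l^2 + l N ln d): full LS cost over compressed LS plus transform cost
    return N * l**2 / (d * l**2 + l * N * math.log(d))

results = []
for N, l in [(10**6, 200), (10**8, 1000)]:
    d = sketch_size(N, l, eps=0.1)
    results.append((int(N / d), round(cost_ratio(N, l, d))))

print(results)   # [(compression ratio, cost ratio), ...] for the two examples
```

The first pair corresponds to Example 1 (compression ratio 23, cost ratio about 10) and the second to Example 2 (compression ratio 321, cost ratio about 63).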

Randomized projections vs Randomized sampling
[Figures: randomized projections vs randomized sampling]

Statistical Leverage: Hat Matrix and Statistical Leverage Scores
$b = Ax + \eta$, $\hat{x}_{LS} = (A^T A)^{-1} A^T b$
$\hat{b} = A\hat{x}_{LS} = A(A^T A)^{-1} A^T b$, i.e. $\hat{b} = Hb$ with the hat matrix $H = A(A^T A)^{-1} A^T$
- $H_{ij}$ measures the influence exerted on the prediction $\hat{b}_i$ by observation $b_j$.
- $l_i = H_{ii}$ measures the importance of $b_i$ in determining the best LS fit.
- $l_i$, $i = 1, \ldots, N$, are referred to as statistical leverage scores.
- $H = P_A = UU^T$ for any matrix $U$ with orthonormal columns spanning the column space of $A$, hence $l_i = \|U_{i,\cdot}\|_2^2$.
- Very large $H_{ii}$ are indicators of outliers in $A$.
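
The two characterizations of the leverage scores (diagonal of the hat matrix and squared row norms of an orthonormal basis) can be verified numerically. In this sketch the data are synthetic and one row is inflated on purpose so that it receives a large leverage score.

```python
import numpy as np

rng = np.random.default_rng(5)
N, l = 200, 4                            # hypothetical sizes
A = rng.standard_normal((N, l))
A[0] *= 20.0                             # one row far from the rest of the data

H = A @ np.linalg.solve(A.T @ A, A.T)    # hat matrix A (A^T A)^{-1} A^T
Q, _ = np.linalg.qr(A)                   # orthonormal basis of the column space of A
lev = np.sum(Q**2, axis=1)               # l_i = ||Q_{i,.}||_2^2
```

The scores lie in [0, 1] and sum to the number of columns l, which is why $l_i / l$ is a valid probability distribution over the rows.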

Randomized Sampling: Sampling Strategy
- Construct an importance sampling distribution $\{p_i\}_{i=1}^N$, with $p_i = l_i / l$. Intuitively, the larger $p_i$ is, the higher the probability of selecting the $i$th data sample $(b_i, A_{i,\cdot})$.
- Start with a zero matrix $R \in \mathbb{R}^{d \times N}$. Then successively fill a single entry of each row, say the $i$th, as follows:
  - Sample a random index, say $\rho \in \{1, \ldots, N\}$, from the importance sampling distribution.
  - Set $R_{i,\rho} = \frac{1}{\sqrt{d\,p_\rho}}$.
- Via $\tilde{A} = RA$, $\tilde{A}$ comprises rescaled rows of $A$ randomly sampled with replacement.
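
The sampling strategy can be sketched as follows. The dense matrix R is built only to mirror the description; in practice one would index the rows directly. The function name `leverage_sampling_matrix` and the sizes are assumptions for illustration, and the exact leverage scores are computed through a QR decomposition.

```python
import numpy as np

def leverage_sampling_matrix(A, d, rng):
    """Build the d x N sampling-and-rescaling matrix R from exact leverage scores."""
    N = A.shape[0]
    Q, _ = np.linalg.qr(A)
    lev = np.sum(Q**2, axis=1)           # leverage scores l_i
    p = lev / lev.sum()                  # p_i = l_i / l for a full column rank A
    idx = rng.choice(N, size=d, replace=True, p=p)      # one sampled index per row of R
    R = np.zeros((d, N))
    R[np.arange(d), idx] = 1.0 / np.sqrt(d * p[idx])    # R_{i,rho} = 1 / sqrt(d p_rho)
    return R, idx, p

rng = np.random.default_rng(6)
N, l, d = 1000, 5, 100                   # hypothetical sizes
A = rng.standard_normal((N, l))
R, idx, p = leverage_sampling_matrix(A, d, rng)
A_tilde = R @ A                          # rescaled rows of A, sampled with replacement
```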

Computation of the Statistical Leverage
Naive way: compute the SVD $A = U\Sigma V^T$; then $U$ has orthonormal columns spanning the column space of $A$. Alas, the complexity is $O(Nl^2)$.
Fast approximations (Drineas 2012):
- Exploit the fact that $l_i = \|(UU^T)_{i,\cdot}\|_2^2 = \|(AA^\dagger)_{i,\cdot}\|_2^2$.
- Construct two fast JL transform matrices (e.g. randomized Hadamard transforms), $\Pi_1 \in \mathbb{R}^{r_1 \times N}$ and $\Pi_2 \in \mathbb{R}^{r_1 \times r_2}$.
- Estimate the leverage scores as $\hat{l}_i = \|(A(\Pi_1 A)^\dagger \Pi_2)_{i,\cdot}\|_2^2$.
- It is proved that $|l_i - \hat{l}_i| \le \epsilon\, l_i$, $\forall i$.
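
The estimator $\hat{l}_i = \|(A(\Pi_1 A)^\dagger \Pi_2)_{i,\cdot}\|_2^2$ can be tried out as below. This is only a sketch: dense Gaussian matrices stand in for the two fast JL transforms (so it shows the accuracy, not the speed), $\Pi_2$ is taken with $r_1$ rows so that the product is well defined, and the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
N, l = 2000, 5
r1, r2 = 500, 300                        # hypothetical sketch sizes
# Rows with different scales, so that the leverage scores are far from uniform.
A = rng.standard_normal((N, l)) * rng.uniform(0.1, 3.0, size=(N, 1))

# Exact scores from an orthonormal basis of the column space.
Q, _ = np.linalg.qr(A)
lev = np.sum(Q**2, axis=1)

# Gaussian stand-ins for the two fast JL transforms (assumption).
Pi1 = rng.standard_normal((r1, N)) / np.sqrt(r1)
Pi2 = rng.standard_normal((r1, r2)) / np.sqrt(r2)

X = A @ (np.linalg.pinv(Pi1 @ A) @ Pi2)  # N x r2, multiplied right to left
lev_hat = np.sum(X**2, axis=1)           # approximate leverage scores
rel_err = np.abs(lev_hat - lev) / lev
```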

Randomized projections vs Randomized sampling: common ground!
- Random projections uniformize the leverage scores (so simple random sampling is adequate).
- Without random projection-based preprocessing, advanced sampling is needed (and leverage score-based importance sampling does the job!).

Robust Randomized Linear Regression: Robust Linear Regression
Recall Veracity! $b = Ax + \eta$, $\eta = n + o$ (noise $n$ plus a sparse outlier vector $o$)
$\hat{x}_{LAD} = \arg\min_{x \in \mathbb{R}^l} \|b - Ax\|_1$
- Least Absolute Deviations (LAD) does not admit a closed-form solution.
- Linear programming using, e.g., interior-point methods: $O(\mathrm{poly}(N))$.
- Use approximate, iterative solutions, e.g. ADMM [Boyd 2011].
A hard time for Fast JL transforms:
- $Rb = RAx + Rn + Ro$
- The sparsity property is missing from $Ro$.
- The energy of the nonzero values of $o$ is spread across all $d$ dimensions.
- LAD is not appropriate anymore.
Randomized sampling is still OK:
- $Ro$ is still sparse.
- LAD can be applied.
- It is harder (at least in theory) to compute the approximate leverage scores.
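
For reference, an iterative LAD solver of the ADMM type can be sketched in a few lines. The updates below follow the standard splitting $z = Ax - b$ with a soft-thresholding step; the penalty parameter, the iteration count and the synthetic data with 10% gross outliers are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise shrinkage: sign(v) * max(|v| - kappa, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lad_admm(A, b, rho=1.0, n_iter=1000):
    """Approximate arg min_x ||b - A x||_1 with ADMM (constraint A x - z = b)."""
    N = A.shape[0]
    z = np.zeros(N)
    u = np.zeros(N)                              # scaled dual variable
    pinv_A = np.linalg.solve(A.T @ A, A.T)       # (A^T A)^{-1} A^T, reused in every iteration
    for _ in range(n_iter):
        x = pinv_A @ (b + z - u)                 # LS step
        Ax = A @ x
        z = soft_threshold(Ax - b + u, 1.0 / rho)
        u = u + Ax - z - b                       # dual update
    return x

rng = np.random.default_rng(8)
N, l = 500, 5                                    # hypothetical sizes
A = rng.standard_normal((N, l))
x_true = rng.standard_normal(l)
o = np.zeros(N)
out_idx = rng.choice(N, size=50, replace=False)
o[out_idx] = 50.0 * rng.choice([-1.0, 1.0], size=50)    # sparse gross outliers
b = A @ x_true + 0.1 * rng.standard_normal(N) + o

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
x_lad = lad_admm(A, b)
err_ls = np.linalg.norm(x_ls - x_true)
err_lad = np.linalg.norm(x_lad - x_true)
```

On data of this kind the LAD estimate is far less affected by the outliers than the ordinary LS estimate.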

Randomized Sampling for LAD: $l^{(1)}$ leverage scores
- Recall the LS case: $l^{(2)}_i = \|U_{i,\cdot}\|_2^2$, where $U$ could be any orthonormal basis spanning the column space of $A$.
- LAD regression case: the leverage scores are $l^{(1)}_i = \|U_{i,\cdot}\|_1$.
- $\|U_{i,\cdot}\|_1$ is not invariant under rotation, so a well-conditioned $U$ needs to be used.
- Cauchy distributed variables / submatrices are needed.
- In practice, benefits over the $l^{(2)}$ construction are observed when $N$ is much larger than $l$.

Reasoning behind our approach
Proposed approach:
- Apply a fast JL transform: $\tilde{b} = Rb$, $\tilde{A} = RA$.
- Progressively clean the data from outliers in the reduced dimensional space.
- Obtain the final solution with ordinary LS.
In an ideal world... (i)
- Let $\Lambda \subset \{1, \ldots, N\}$ be the index set indicating the corrupted data. Assume $\Lambda$ is known. Then $A_{\Lambda^c,\cdot}$, $b_{\Lambda^c}$ are the outlier-free data.
- Ideal solution: $\hat{x}_{LS} = \arg\min_{x \in \mathbb{R}^l} \|R b_{\Lambda^c} - R A_{\Lambda^c,\cdot}\,x\|_2$, where $R b_{\Lambda^c}$ stands for $R_{\cdot,\Lambda^c} b_{\Lambda^c}$.
- Cleaning the compressed data directly in the low-dimensional domain:
  $R b_{\Lambda^c} = \tilde{b} - R_{\cdot,\Lambda} b_\Lambda$, $\quad R A_{\Lambda^c,\cdot} = \tilde{A} - R_{\cdot,\Lambda} A_{\Lambda,\cdot}$
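
The cleaning identities hold for any linear R, because Rb is a sum of one column of R per data sample. A small numerical check, with a Gaussian R and an arbitrary index set chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(9)
N, l, d = 300, 4, 60                             # hypothetical sizes
A = rng.standard_normal((N, l))
b = rng.standard_normal(N)
R = rng.standard_normal((d, N)) / np.sqrt(d)     # any linear R satisfies the identity

Lam = np.array([3, 17, 42, 250])                 # hypothetical (known) corrupted indices
keep = np.setdiff1d(np.arange(N), Lam)           # complement of Lambda

b_tilde, A_tilde = R @ b, R @ A                  # compressed data, computed once

# Remove the contribution of the samples in Lambda directly in the compressed domain.
b_clean = b_tilde - R[:, Lam] @ b[Lam]
A_clean = A_tilde - R[:, Lam] @ A[Lam, :]
```

The downdate touches only the columns of R indexed by $\Lambda$, so the full data set does not need to be compressed again.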

Reasoning behind our approach: In an ideal world... (ii)
- Let the randomized Hadamard transform be applied to the full data set: $\tilde{b} = \tilde{A}x + \tilde{n} + Ro$, where $\tilde{b} = Rb$, $\tilde{A} = RA$, $\tilde{n} = Rn$.
- Assume that $x$ can be estimated exactly. Then $z = Ro + \tilde{n}$ (1), where $z$ is computed as $\tilde{b} - \tilde{A}x$.
- Request: is it possible to estimate the support of $o$, in the reduced dimensional space, based on (1)?
- Indeed, this is a typical compressed sensing scenario: $\hat{o} = \arg\min_{o} \|z - Ro\|_2$ s.t. $\|o\|_0 \le K_o$.
- We only need to estimate the support (or a subset of it).

Reasoning behind our approach: Back in reality...
- Let the randomized Hadamard transform be applied to the full data set: $\tilde{b} = \tilde{A}x + \tilde{n} + Ro$, where $\tilde{b} = Rb$, $\tilde{A} = RA$, $\tilde{n} = Rn$.
- $x$ is not known, but $Ro$ is likely to be normally distributed, so use $\hat{x} = \arg\min_{x \in \mathbb{R}^l} \|\tilde{b} - \tilde{A}x\|_2$, where $\hat{x} = x + x_e$.
- Then $z = \tilde{b} - \tilde{A}\hat{x}$, so that $z \approx Ro + \tilde{n}$ (up to the term $\tilde{A}x_e$ caused by the estimation error).
- Request: estimate any part of the support of $o$.
- Suggestion: just use the CoSaMP proxy, $\psi = R^T z$, $\Lambda = \mathrm{Supp}(\psi, K)$.

The full picture: Iterative Randomized Robust LS (concept)
- Compress the data: $\tilde{b} = Rb$, $\tilde{A} = RA$.
- Start iterations:
  - Get a tentative estimate $\hat{x}$ via $\arg\min_{x \in \mathbb{R}^l} \|\tilde{b} - \tilde{A}x\|_2$.
  - Compute $\psi = R^T(\tilde{b} - \tilde{A}\hat{x})$.
  - Define $\Lambda$ as the set of indices of the $K$ largest (in magnitude) components of $\psi$. Key remark: we are happy if $\Lambda$ contains some, not necessarily $K$, outlier indices.
  - Exclude / clear the data indexed in $\Lambda$ from the compressed data set: $R b_{\Lambda^c} = \tilde{b} - R_{\cdot,\Lambda} b_\Lambda$, $R A_{\Lambda^c,\cdot} = \tilde{A} - R_{\cdot,\Lambda} A_{\Lambda,\cdot}$. Key remark: note that some healthy data might be omitted as well.
  - Return to the first step of the iteration to hopefully get an improved $\hat{x}$, or stop.
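
The loop described on this slide can be sketched as follows. This is an illustrative reading of the concept, not the authors' implementation: R is a dense Gaussian matrix instead of a randomized Hadamard transform, the stopping rule is a fixed number of iterations, already cleared indices are masked so that they are not selected twice, and all sizes and outlier magnitudes are hypothetical.

```python
import numpy as np

def iterative_randomized_robust_ls(A, b, R, K, n_iter):
    """LS fit on compressed data, proxy psi = R^T z, then clear the K most suspicious samples."""
    A_t, b_t = R @ A, R @ b                          # compress once
    removed = np.zeros(A.shape[0], dtype=bool)
    for _ in range(n_iter):
        x_hat = np.linalg.lstsq(A_t, b_t, rcond=None)[0]   # tentative estimate
        psi = R.T @ (b_t - A_t @ x_hat)              # CoSaMP-style proxy
        psi[removed] = 0.0                           # do not select a cleared index again
        Lam = np.argsort(np.abs(psi))[-K:]           # K largest magnitudes
        b_t = b_t - R[:, Lam] @ b[Lam]               # clear the selected samples
        A_t = A_t - R[:, Lam] @ A[Lam, :]
        removed[Lam] = True
    x_hat = np.linalg.lstsq(A_t, b_t, rcond=None)[0] # final ordinary LS
    return x_hat, removed

rng = np.random.default_rng(10)
N, l, d = 2000, 5, 500
A = rng.standard_normal((N, l))
x_true = rng.standard_normal(l)
o = np.zeros(N)
out_idx = rng.choice(N, size=20, replace=False)
o[out_idx] = rng.uniform(40.0, 60.0, size=20) * rng.choice([-1.0, 1.0], size=20)
b = A @ x_true + 0.01 * rng.standard_normal(N) + o

R = rng.standard_normal((d, N)) / np.sqrt(d)         # Gaussian stand-in for the fast transform
x_plain = np.linalg.lstsq(R @ A, R @ b, rcond=None)[0]
x_rob, removed = iterative_randomized_robust_ls(A, b, R, K=10, n_iter=8)

err_plain = np.linalg.norm(x_plain - x_true)
err_rob = np.linalg.norm(x_rob - x_true)
```

Clearing a few healthy samples along with the outliers, as the key remark notes, costs little when N is much larger than the number of cleared indices.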

Computational complexity analysis
Proposed:
- Once: $O((l + 1) N \log d)$
- Per iteration: $O(d l^2) + d(l + 1) + O(Nd) + O(N) + (dKl)$
Random sampling:
- For the leverage scores: $O((l + 1) N \log r_1 + l N r_2 + r_1 l^2 + r_2 l^2)$, with $r_1 = d$ and $r_2 = O(\log l)$
- For LAD: $\mathrm{poly}(d)$

Some Results
[Figures: results]

Randomized Methods for Low Rank approximation
Sampling the column space is the key...
Task: let $A \in \mathbb{R}^{n \times m}$; solve $\min_{X:\, \mathrm{rank}(X) = k} \|A - X\|_F$.
Randomized projection based range finder:
- Generate a matrix $R \in \mathbb{R}^{m \times d}$ and compress: $Y = AR$.
- Some housekeeping: replace $Y$ with $Q$, whose columns form an orthonormal basis for the range of $Y$.
SVD estimation in 3 steps:
- $B = Q^T A$
- Compute the low dimensional SVD: $B = \tilde{U}\Sigma V^T$
- $U = Q\tilde{U}$
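
The range finder and the three SVD steps translate almost line by line into NumPy. In this sketch R is Gaussian and the test matrix has exact rank k, both of which are assumptions made so that the approximation can be checked against the input; the function name `randomized_svd` is illustrative.

```python
import numpy as np

def randomized_svd(A, d, rng):
    """Approximate SVD of A from a d-dimensional random sample of its column space."""
    m = A.shape[1]
    R = rng.standard_normal((m, d))      # random test matrix (Gaussian, an assumption)
    Y = A @ R                            # compress: Y = A R
    Q, _ = np.linalg.qr(Y)               # orthonormal basis for the range of Y
    B = Q.T @ A                          # step 1: small d x m matrix
    U_tilde, S, Vt = np.linalg.svd(B, full_matrices=False)   # step 2: low dimensional SVD
    return Q @ U_tilde, S, Vt            # step 3: U = Q U_tilde

rng = np.random.default_rng(11)
n, m, k = 300, 200, 5                    # hypothetical sizes
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))   # exactly rank k
U, S, Vt = randomized_svd(A, d=k + 5, rng=rng)

A_hat = (U * S) @ Vt                     # U diag(S) V^T
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
```

Taking d slightly larger than the target rank k (oversampling) is a common practical choice; for a matrix that is only approximately low rank the error depends on the singular values beyond the sampled range.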

The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression

The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression The SVD is the most generally applicable of the orthogonal-diagonal-orthogonal type matrix decompositions Every

More information

Linear Algebra Review. Vectors

Linear Algebra Review. Vectors Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka [email protected] http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large

More information

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #18: Dimensionality Reduc7on Dimensionality Reduc=on Assump=on: Data lies on or near a low d- dimensional subspace Axes of this subspace

More information

Bilinear Prediction Using Low-Rank Models

Bilinear Prediction Using Low-Rank Models Bilinear Prediction Using Low-Rank Models Inderjit S. Dhillon Dept of Computer Science UT Austin 26th International Conference on Algorithmic Learning Theory Banff, Canada Oct 6, 2015 Joint work with C-J.

More information

Sublinear Algorithms for Big Data. Part 4: Random Topics

Sublinear Algorithms for Big Data. Part 4: Random Topics Sublinear Algorithms for Big Data Part 4: Random Topics Qin Zhang 1-1 2-1 Topic 1: Compressive sensing Compressive sensing The model (Candes-Romberg-Tao 04; Donoho 04) Applicaitons Medical imaging reconstruction

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang, Qihang Lin, Rong Jin Tutorial@SIGKDD 2015 Sydney, Australia Department of Computer Science, The University of Iowa, IA, USA Department of

More information

Sketch As a Tool for Numerical Linear Algebra

Sketch As a Tool for Numerical Linear Algebra Sketching as a Tool for Numerical Linear Algebra (Graph Sparsification) David P. Woodruff presented by Sepehr Assadi o(n) Big Data Reading Group University of Pennsylvania April, 2015 Sepehr Assadi (Penn)

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Orthogonal Projections

Orthogonal Projections Orthogonal Projections and Reflections (with exercises) by D. Klain Version.. Corrections and comments are welcome! Orthogonal Projections Let X,..., X k be a family of linearly independent (column) vectors

More information

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear

More information

Greedy Column Subset Selection for Large-scale Data Sets

Greedy Column Subset Selection for Large-scale Data Sets Knowledge and Information Systems manuscript No. will be inserted by the editor) Greedy Column Subset Selection for Large-scale Data Sets Ahmed K. Farahat Ahmed Elgohary Ali Ghodsi Mohamed S. Kamel Received:

More information

1 Introduction to Matrices

1 Introduction to Matrices 1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns

More information

LINEAR ALGEBRA. September 23, 2010

LINEAR ALGEBRA. September 23, 2010 LINEAR ALGEBRA September 3, 00 Contents 0. LU-decomposition.................................... 0. Inverses and Transposes................................. 0.3 Column Spaces and NullSpaces.............................

More information

Lecture 5: Singular Value Decomposition SVD (1)

Lecture 5: Singular Value Decomposition SVD (1) EEM3L1: Numerical and Analytical Techniques Lecture 5: Singular Value Decomposition SVD (1) EE3L1, slide 1, Version 4: 25-Sep-02 Motivation for SVD (1) SVD = Singular Value Decomposition Consider the system

More information

MATH1231 Algebra, 2015 Chapter 7: Linear maps

MATH1231 Algebra, 2015 Chapter 7: Linear maps MATH1231 Algebra, 2015 Chapter 7: Linear maps A/Prof. Daniel Chan School of Mathematics and Statistics University of New South Wales [email protected] Daniel Chan (UNSW) MATH1231 Algebra 1 / 43 Chapter

More information

Learning Tools for Big Data Analytics

Learning Tools for Big Data Analytics Learning Tools for Big Data Analytics Georgios B. Giannakis Acknowledgments: Profs. G. Mateos and K. Slavakis NSF 1343860, 1442686, and MURI-FA9550-10-1-0567 Center for Advanced Signal and Image Sciences

More information

Lecture 4: Partitioned Matrices and Determinants

Lecture 4: Partitioned Matrices and Determinants Lecture 4: Partitioned Matrices and Determinants 1 Elementary row operations Recall the elementary operations on the rows of a matrix, equivalent to premultiplying by an elementary matrix E: (1) multiplying

More information

Similarity and Diagonalization. Similar Matrices

Similarity and Diagonalization. Similar Matrices MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

More information

Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

CS3220 Lecture Notes: QR factorization and orthogonal transformations

CS3220 Lecture Notes: QR factorization and orthogonal transformations CS3220 Lecture Notes: QR factorization and orthogonal transformations Steve Marschner Cornell University 11 March 2009 In this lecture I ll talk about orthogonal matrices and their properties, discuss

More information

NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing

NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing Alex Cloninger Norbert Wiener Center Department of Mathematics University of Maryland, College Park http://www.norbertwiener.umd.edu

More information

17. Inner product spaces Definition 17.1. Let V be a real vector space. An inner product on V is a function

17. Inner product spaces Definition 17.1. Let V be a real vector space. An inner product on V is a function 17. Inner product spaces Definition 17.1. Let V be a real vector space. An inner product on V is a function, : V V R, which is symmetric, that is u, v = v, u. bilinear, that is linear (in both factors):

More information

ISOMETRIES OF R n KEITH CONRAD

ISOMETRIES OF R n KEITH CONRAD ISOMETRIES OF R n KEITH CONRAD 1. Introduction An isometry of R n is a function h: R n R n that preserves the distance between vectors: h(v) h(w) = v w for all v and w in R n, where (x 1,..., x n ) = x

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information

6. Cholesky factorization

6. Cholesky factorization 6. Cholesky factorization EE103 (Fall 2011-12) triangular matrices forward and backward substitution the Cholesky factorization solving Ax = b with A positive definite inverse of a positive definite matrix

More information

When is missing data recoverable?

When is missing data recoverable? When is missing data recoverable? Yin Zhang CAAM Technical Report TR06-15 Department of Computational and Applied Mathematics Rice University, Houston, TX 77005 October, 2006 Abstract Suppose a non-random

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Apr 3, 2014 Text Analytics (Text Mining) LSI (uses SVD), Visualization Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey

More information

Inner Product Spaces and Orthogonality

Inner Product Spaces and Orthogonality Inner Product Spaces and Orthogonality week 3-4 Fall 2006 Dot product of R n The inner product or dot product of R n is a function, defined by u, v a b + a 2 b 2 + + a n b n for u a, a 2,, a n T, v b,

More information

A Negative Result Concerning Explicit Matrices With The Restricted Isometry Property

A Negative Result Concerning Explicit Matrices With The Restricted Isometry Property A Negative Result Concerning Explicit Matrices With The Restricted Isometry Property Venkat Chandar March 1, 2008 Abstract In this note, we prove that matrices whose entries are all 0 or 1 cannot achieve

More information

Learning, Sparsity and Big Data

Learning, Sparsity and Big Data Learning, Sparsity and Big Data M. Magdon-Ismail (Joint Work) January 22, 2014. Out-of-Sample is What Counts NO YES A pattern exists We don t know it We have data to learn it Tested on new cases? Teaching

More information

MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix.

MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix. MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix. Nullspace Let A = (a ij ) be an m n matrix. Definition. The nullspace of the matrix A, denoted N(A), is the set of all n-dimensional column

More information

SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI. (B.Sc.(Hons.), BUAA)

SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI. (B.Sc.(Hons.), BUAA) SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI (B.Sc.(Hons.), BUAA) A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE DEPARTMENT OF MATHEMATICS NATIONAL

More information

1 Sets and Set Notation.

1 Sets and Set Notation. LINEAR ALGEBRA MATH 27.6 SPRING 23 (COHEN) LECTURE NOTES Sets and Set Notation. Definition (Naive Definition of a Set). A set is any collection of objects, called the elements of that set. We will most

More information

Chapter 6. Orthogonality

Chapter 6. Orthogonality 6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be

More information

Federated Optimization: Distributed Optimization Beyond the Datacenter

Federated Optimization: Distributed Optimization Beyond the Datacenter Federated Optimization: Distributed Optimization Beyond the Datacenter Jakub Konečný School of Mathematics University of Edinburgh [email protected] H. Brendan McMahan Google, Inc. Seattle, WA 98103

More information

Lectures notes on orthogonal matrices (with exercises) 92.222 - Linear Algebra II - Spring 2004 by D. Klain

Lectures notes on orthogonal matrices (with exercises) 92.222 - Linear Algebra II - Spring 2004 by D. Klain Lectures notes on orthogonal matrices (with exercises) 92.222 - Linear Algebra II - Spring 2004 by D. Klain 1. Orthogonal matrices and orthonormal sets An n n real-valued matrix A is said to be an orthogonal

More information

Chapter 19. General Matrices

An $n \times m$ matrix is an array $A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} = [a_{ij}]$. The matrix $A$ has $n$ row vectors and $m$ column vectors: $\mathrm{row}_i(A) = [a_{i1}, a_{i2}, \ldots, a_{im}] \in \mathbb{R}^m$, $\mathrm{col}_j(A) = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{bmatrix}$

Dynamic data processing

Dynamic data processing recursive least-squares P.J.G. Teunissen Series on Mathematical Geodesy and Positioning

Chapter 7. Lyapunov Exponents. 7.1 Maps

Chapter 7 Lyapunov Exponents Lyapunov exponents tell us the rate of divergence of nearby trajectories a key component of chaotic dynamics. For one dimensional maps the exponent is simply the average

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree

Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems

Aleksandar Donev Courant Institute, NYU Course G63.2010.001 / G22.2420-001,

Parallel & Distributed Optimization. Based on Mark Schmidt's slides

Motivation behind using parallel & Distributed optimization Performance Computational throughput have increased exponentially in linear

Collaborative Filtering. Radek Pelánek

Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

Operation Count; Numerical Linear Algebra

10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point

Manifold Learning Examples PCA, LLE and ISOMAP

Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition

8. Linear least-squares

EE13 (Fall 211-12) definition examples and applications solution of a least-squares problem, normal equations 8-1 Definition overdetermined linear equations if b range(a), cannot

The Scientific Data Mining Process

Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

NOTES ON LINEAR TRANSFORMATIONS

Definition 1. Let $V$ and $W$ be vector spaces. A function $T : V \to W$ is a linear transformation from $V$ to $W$ if the following two properties hold. (i) $T(v_1 + v_2) = T(v_1) + T(v_2)$ for all

ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING

BY OMID ROUHANI-KALLEH THESIS Submitted as partial fulfillment of the requirements for the degree of

Au = 3u, Aw = 2w, so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.

Chapter 7 Eigenvalues and Eigenvectors In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigenvectors of matrices, concepts that were already introduced in Geometry

3. INNER PRODUCT SPACES

3.1 Definition. So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces $\mathbb{R}^2$ and $\mathbb{R}^3$. But these have more structure than just that of a vector space.

Machine learning challenges for big data

Francis Bach SIERRA Project-team, INRIA - Ecole Normale Supérieure Joint work with R. Jenatton, J. Mairal, G. Obozinski, N. Le Roux, M. Schmidt - December 2012

5. Orthogonal matrices

L Vandenberghe EE133A (Spring 2016) 5 Orthogonal matrices matrices with orthonormal columns orthogonal matrices tall matrices with orthonormal columns complex matrices with orthonormal columns 5-1 Orthonormal

18.06 Problem Set 4 Solution Due Wednesday, 11 March 2009 at 4 pm in 2-106. Total: 175 points.

Problem 1: $A$ is an $m \times n$ matrix of rank $r$. Suppose there are right-hand-sides $b$ for which $Ax = b$ has no solution. (a) What

Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers

Variance Reduction The statistical efficiency of Monte Carlo simulation can be measured by the variance of its output If this variance can be lowered without changing the expected value, fewer replications

Orthogonal Diagonalization of Symmetric Matrices

MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let $u_1, \ldots, u_k$ be a basis of a subspace $V$ of $\mathbb{R}^n$. We begin the process of finding

Advanced In-Database Analytics

Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

Factor Analysis. Chapter 420. Introduction

Chapter 420 Introduction. Factor Analysis (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.

Compact Representations and Approximations for Compuation in Games

Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions

The degrees of freedom of the Lasso in underdetermined linear regression models

C. Dossal (1), M. Kachour (2), J. Fadili (2), G. Peyré (3), C. Chesneau (4) (1) IMB, Université Bordeaux 1 (2) GREYC, ENSICAEN

Lecture 5 Least-squares

EE263 Autumn 2007-08 Stephen Boyd Lecture 5 Least-squares least-squares (approximate) solution of overdetermined equations projection and orthogonality principle least-squares estimation BLUE property

Math 550 Notes. Chapter 7. Jesse Crawford. Department of Mathematics Tarleton State University. Fall 2010

Math 550 Notes Chapter 7 Jesse Crawford Department of Mathematics Tarleton State University Fall 2010 (Tarleton State University) Math 550 Chapter 7 Fall 2010 1 / 34 Outline 1 Self-Adjoint and Normal Operators

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of $m$ equations in $n$ unknowns can be written
$$\begin{aligned} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\ a_{31} x_1 + a_{32} x_2 + \cdots + a_{3n} x_n &= b_3 \\ &\;\;\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m \end{aligned}$$

The p-norm generalization of the LMS algorithm for adaptive filtering

Jyrki Kivinen University of Helsinki Manfred Warmuth University of California, Santa Cruz Babak Hassibi California Institute of Technology

Nonlinear Iterative Partial Least Squares Method

Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for

Recall the basic property of the transpose (for any A): $Av \cdot w = v \cdot A^t w$ for all $v, w \in \mathbb{R}^n$.

ORTHOGONAL MATRICES Informally, an orthogonal $n \times n$ matrix is the $n$-dimensional analogue of the rotation matrices $R_\theta$ in $\mathbb{R}^2$. When does a linear transformation of $\mathbb{R}^3$ (or $\mathbb{R}^n$) deserve to be called a rotation?

Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions

Peter Richtárik School of Mathematics The University of Edinburgh Joint work with Martin Takáč (Edinburgh)

Lecture 3: Finding integer solutions to systems of linear equations

Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture

α = u v. In other words, Orthogonal Projection

Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v

Linear Codes. Chapter 3. 3.1 Basics

Chapter 3 Linear Codes In order to define codes that we can encode and decode efficiently, we add more structure to the codespace. We shall be mainly interested in linear codes. A linear code of length

State of Stress at Point

Einstein Notation The basic idea of Einstein notation is that a covector and a vector can form a scalar: This is typically written as an explicit sum: According to this convention,

Big learning: challenges and opportunities

Francis Bach SIERRA Project-team, INRIA - Ecole Normale Supérieure December 2013 Omnipresent digital media Scientific context Big data Multimedia, sensors, indicators,

Section 6.1 - Inner Products and Norms

Definition. Let $V$ be a vector space over $F \in \{\mathbb{R}, \mathbb{C}\}$. An inner product on $V$ is a function that assigns, to every ordered pair of vectors $x$ and $y$ in $V$, a scalar in $F$,

Modélisation et résolutions numérique et symbolique

Modélisation et résolutions numérique et symbolique via les logiciels Maple et Matlab Jeremy Berthomieu Mohab Safey El Din Stef Graillat Outline Previous course: partial review of what

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

Towards running complex models on big data

Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models

Factor Analysis Principal components factor analysis Use of extracted factors in multivariate dependency models 2 KEY CONCEPTS ***** Factor Analysis Interdependency technique Assumptions of factor analysis

Sketching as a Tool for Numerical Linear Algebra

Foundations and Trends® in Theoretical Computer Science, Vol. 10, No. 1-2 (2014) 1-157. © 2014 D. P. Woodruff. DOI: 10.1561/0400000060. Sketching as a Tool for Numerical Linear Algebra. David P. Woodruff, IBM

We shall turn our attention to solving linear systems of equations. Ax = b

59 Linear Algebra We shall turn our attention to solving linear systems of equations $Ax = b$ where $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, and $b \in \mathbb{R}^m$. We already saw examples of methods that required the solution of a linear system

Lecture 1: Schur's Unitary Triangularization Theorem

This lecture introduces the notion of unitary equivalence and presents Schur's theorem and some of its consequences. It roughly corresponds to Sections

Factor analysis. Angela Montanari

Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number

Machine Learning Final Project Spam Email Filtering

March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

Factor Analysis

Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces. In the factor model, we

1 VECTOR SPACES AND SUBSPACES

What is a vector? Many are familiar with the concept of a vector as: Something which has magnitude and direction. an ordered pair or triple. a description for quantities such

Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems

Volker Markl dima.tu-berlin.de dfki.de/web/research/iam/ bbdc.berlin Based on my 2014 Vision Paper On

Scalable Machine Learning - or what to do with all that Big Data infrastructure

TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Click-through prediction Personalized Spam Detection

ANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL

Kardi Teknomo ANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL Revoledu.com Table of Contents Analytic Hierarchy Process (AHP) Tutorial... 1 Multi Criteria Decision Making... 1 Cross Tabulation... 2 Evaluation

Understanding and Applying Kalman Filtering

Lindsay Kleeman Department of Electrical and Computer Systems Engineering Monash University, Clayton 1 Introduction Objectives: 1. Provide a basic understanding

Chapter 17. Orthogonal Matrices and Symmetries of Space

Take a random matrix, say $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$, and compare the lengths of $e_1$ and $Ae_1$. The vector $e_1$ has length 1, while $Ae_1 = (1, 4, 7)$ has length

3 Orthogonal Vectors and Matrices

The linear algebra portion of this course focuses on three matrix factorizations: QR factorization, singular value decomposition (SVD), and LU factorization. The first

x1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0.

Cross product 1 Chapter 7 Cross product We are getting ready to study integration in several variables. Until now we have been doing only differential calculus. One outcome of this study will be our ability

Section 5.3

$v_j = [\, u_1 \ \cdots \ u_j \ \cdots \ u_m \,] \begin{bmatrix} \vdots \\ l_{jj} \\ \vdots \\ l_{mj} \end{bmatrix} = l_{jj} u_j + \cdots + l_{mj} u_m$. Not orthogonal, the column vectors fail to be perpendicular to each other. This matrix is orthogonal. Check that