Statistical machine learning, high dimension and big data

1 Statistical machine learning, high dimension and big data. S. Gaïffas, March 14, CMAP - École Polytechnique

2 Agenda for today
- Divide and Conquer principle for collaborative filtering
- Graphical modelling, Gaussian Graphical Model

3 Divide and Conquer Principle for Matrix Completion

4 SVD-based matrix completion
Unknown matrix $M$ of size $n_1 \times n_2$. Prior: $\mathrm{rank}(M) \ll n_1 \wedge n_2$.
We observe $P_\Omega(M) = \{M_{j,k} : (j,k) \in \Omega\} \in \mathbb{R}^m$, with $m = |\Omega|$.
The basic iteration of a proximal gradient algorithm writes
$$X^{k+1} \leftarrow S_\lambda\big(X^k - \eta_k (P_\Omega(X^k) - P_\Omega(M))\big)$$
where $S_\lambda$ is the spectral soft-thresholding operator
$$S_\lambda(X) = U \,\mathrm{diag}\big[(\sigma_1(X) - \lambda)_+, \ldots, (\sigma_{n_1 \wedge n_2}(X) - \lambda)_+\big] V^\top$$
with $X = U \Sigma V^\top$ the SVD of $X$.
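To make the iteration concrete, here is a minimal numpy sketch of this proximal gradient scheme, assuming a boolean mask encodes $\Omega$ and unobserved entries of $M$ hold finite placeholders (e.g. zeros). It computes a full dense SVD at each step rather than the truncated SVD discussed next; the function and argument names are illustrative:

```python
import numpy as np

def complete_matrix(M, mask, lam, eta=1.0, n_iter=200):
    """Proximal gradient with spectral soft-thresholding:
    X <- S_lam(X - eta * (P_Omega(X) - P_Omega(M)))."""
    X = np.zeros_like(M)
    for _ in range(n_iter):
        grad = mask * (X - M)                       # P_Omega(X) - P_Omega(M)
        U, s, Vt = np.linalg.svd(X - eta * grad, full_matrices=False)
        X = (U * np.maximum(s - lam, 0.0)) @ Vt     # S_lam: shrink singular values
    return X
```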

5 SVD-based matrix completion
Bottleneck: a truncated SVD is necessary at each iteration. Best-case complexity is $O(n_1 n_2 k)$ [Lanczos algorithms].
Such algorithms for matrix completion with theoretical guarantees rely on this expensive truncated SVD computation. This does not scale!
Idea: Divide and Conquer principle
- Divide $M$ into submatrices
- Solve the subproblems: matrix completion of each submatrix (done in parallel)
- Combine the reconstructed submatrices into a reconstruction of the entire matrix $M$
[Mackey, Talwalkar and Jordan (2011)]

6 Divide and Conquer Matrix Completion
1. Randomly partition $M$ into $T$ column submatrices $M = [C_1\; C_2\; \cdots\; C_T]$ with $C_t \in \mathbb{R}^{n_1 \times p}$, $p = n_2 / T$.
2. Complete each submatrix into $\hat C_t$ using trace-norm penalization on this subproblem, in parallel [if fully done in parallel, this is $T$ times faster than the single completion of $M$], leading to $[\hat C_1\; \hat C_2\; \cdots\; \hat C_T]$.
3. Combine them: project this matrix onto the column space of each $\hat C_t$, and average. If $\hat C_t = \hat U_t \hat\Sigma_t \hat V_t^\top$ is the SVD of $\hat C_t$, compute
$$\hat M = \frac{1}{T} \sum_{t=1}^{T} \hat U_t \hat U_t^\top \,[\hat C_1\; \hat C_2\; \cdots\; \hat C_T]$$
[Note that $\hat U_t \hat U_t^\top$ is the projection matrix onto the space spanned by the columns of $\hat C_t$.]
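A minimal numpy sketch of the combine step (step 3), assuming the completed blocks $\hat C_t$ are already available; names are illustrative. Note the right-to-left parenthesization, which the complexity warning on the next slide is about:

```python
import numpy as np

def dfc_combine(C_hats):
    """Average the projections of [C_1 ... C_T] onto each block's column space."""
    C = np.hstack(C_hats)                           # [C_1 C_2 ... C_T]
    M_hat = np.zeros_like(C)
    for C_t in C_hats:
        U_t, _, _ = np.linalg.svd(C_t, full_matrices=False)
        M_hat += U_t @ (U_t.T @ C)                  # right-to-left: O(n1 * k * n2)
    return M_hat / len(C_hats)
```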

7 Divide and Conquer Matrix Completion
Full matrix completion: complexity $O(n_1 n_2 k)$ per iteration (truncated SVD on the full matrix).
DC matrix completion: at most $O(n_1 p \max_t k_t)$ complexity per iteration (for the truncated SVDs of the subcompletion problems, done in parallel).
For the averaging $\frac{1}{T} \sum_{t=1}^T \hat U_t \hat U_t^\top [\hat C_1\; \cdots\; \hat C_T]$: $O(n_1 k p)$ for each multiplication $\hat U_t \hat U_t^\top \hat C_j$, done in parallel, hence $O(n_1 k p T) = O(n_1 n_2 k)$ in total (but done only once).
Warning: $\hat U_t \hat U_t^\top \hat C_j = \hat U_t (\hat U_t^\top \hat C_j)$ is $O(n_1 k p)$ while $\hat U_t \hat U_t^\top \hat C_j = (\hat U_t \hat U_t^\top) \hat C_j$ is $O(n_1^2 p)$.

8 Divide and Conquer Matrix Completion
Numerical results: see [Mackey et al. (2011)]. And almost the same theoretical guarantees as matrix completion on the full matrix.

9 Divide and Conquer Matrix Completion
What is behind this? Getting a low-rank approximation by projecting onto a random column subsample.
Let $M$ be an $n_1 \times n_2$ matrix and $L$ a rank-$r$ approximation of $M$. Fix $x > 0$ and $\varepsilon > 0$.
Construct a matrix $C$ of size $n_1 \times p$ that contains columns of $M$ picked at random without replacement, and compute the SVD $C = U_C \Sigma_C V_C^\top$. Then
$$\|M - U_C U_C^\top M\|_F \le (1 + \varepsilon)\, \|M - L\|_F$$
with probability $\ge 1 - e^{-x}$ whenever $p \ge c\, r\, \mu_0(V_L) \log(n_1 \vee n_2)\, x / \varepsilon^2$, where
$$\mu_0(V) = \frac{n_2}{r} \max_{1 \le i \le n_2} \|V_{i,\cdot}\|_2^2 = \frac{n_2}{r} \|V\|_{2,\infty}^2$$
with $L = U_L \Sigma_L V_L^\top$ the SVD of $L$.
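A small numpy sketch of this column-projection approximation (illustrative names; a dense SVD of the sampled block stands in for whatever truncated solver one would use at scale):

```python
import numpy as np

def column_projection(M, p, seed=0):
    """Project M onto the column space of p randomly sampled columns."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(M.shape[1], size=p, replace=False)  # without replacement
    U_C, _, _ = np.linalg.svd(M[:, idx], full_matrices=False)
    return U_C @ (U_C.T @ M)                             # U_C U_C^T M
```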

10 Graphical modelling, Graphical Gaussian Model

11 Graphs [figure]

12 Graphs [figure]

13 Graphs: co-occurrence of words

14 Graphs: relations between artists in the last.fm database

15 Graphs: evolution of co-voters in the US Senate [Show video]

16 Graphs
Graph: a graph $G$ consists of a set of vertices $V$ and a set of edges $E$. We often write $G = (V, E)$.
$E$ is a subset of $V \times V$ containing ordered pairs of distinct vertices. An edge is directed from $j$ to $k$ if $(j, k) \in E$.
Undirected graphs, directed graphs.

17 Graphical Models
Graphical Model: the set $V$ corresponds to a collection of random variables. Denote $V = \{1, \ldots, p\}$ with $|V| = p$ and $X = (X_1, \ldots, X_p) \sim P$.
The pair $(G, P)$ is a graphical model.

18 Graphs, Graphical Models
Consider an undirected graph $G$ and a graphical model $(G, P)$.
We say that $P$ satisfies the pairwise Markov property with respect to $G = (V, E)$ iff
$$X_j \perp X_k \mid X_{V \setminus \{j,k\}} \quad \text{for any } (j,k) \notin E,\; j \ne k,$$
namely $X_j$ and $X_k$ are conditionally independent given all the other variables.
A graphical model satisfying this property is called a conditional independence graph (CIG).

19 Gaussian Graphical Models
A Gaussian Graphical Model is a CIG with the assumption $X = (X_1, \ldots, X_p) \sim N(0, \Sigma)$ for a positive definite covariance matrix $\Sigma$. The mean is taken to be zero to simplify notation.
A well-known result (Lauritzen (1996)): $(j,k) \notin E$ and $(k,j) \notin E$ iff $X_j \perp X_k \mid X_{V \setminus \{j,k\}}$ iff $(\Sigma^{-1})_{j,k} = 0$ [exercise].
The edges can be read off the precision matrix $K = \Sigma^{-1}$: $(j,k) \in E$ and $(k,j) \in E$ iff $K_{j,k} \ne 0$.
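A small numpy sketch of reading the edge set off the precision matrix (thresholding at a numerical tolerance, since a computed inverse is never exactly zero; names are illustrative):

```python
import numpy as np

def edges_from_precision(Sigma, tol=1e-10):
    """Edge set of the GGM: (j, k) in E iff K[j, k] != 0, with K = inv(Sigma)."""
    K = np.linalg.inv(Sigma)
    p = K.shape[0]
    return [(j, k) for j in range(p) for k in range(j + 1, p)
            if abs(K[j, k]) > tol]
```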

20 Gaussian Graphical Models
The partial correlation $\rho_{j,k \mid V \setminus \{j,k\}}$ between $X_j$ and $X_k$ conditional on $X_{V \setminus \{j,k\}}$ is given by
$$\rho_{j,k \mid V \setminus \{j,k\}} = -\frac{K_{j,k}}{\sqrt{K_{j,j} K_{k,k}}}.$$
The partial correlation coefficients are regression coefficients: we can write
$$X_j = \beta_{j,k} X_k + \sum_{l \in V \setminus \{j,k\}} \beta_{j,l} X_l + \varepsilon_j$$
where $E[\varepsilon_j] = 0$ and $\varepsilon_j \perp X_{V \setminus \{j\}}$, with
$$\beta_{j,k} = -\frac{K_{j,k}}{K_{j,j}} \quad \text{and} \quad \beta_{k,j} = -\frac{K_{j,k}}{K_{k,k}} \quad \text{[exercise]}.$$
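Equivalently, in a short numpy sketch (illustrative name), the whole matrix of partial correlations comes from normalizing the precision matrix:

```python
import numpy as np

def partial_correlations(Sigma):
    """rho[j, k] = -K[j, k] / sqrt(K[j, j] * K[k, k]) with K = inv(Sigma)."""
    K = np.linalg.inv(Sigma)
    d = np.sqrt(np.diag(K))
    rho = -K / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)   # convention: unit diagonal
    return rho
```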

21 Sparse Gaussian Graphical Model
Suppose that we observe $X_1, \ldots, X_n$ i.i.d. $N(0, \Sigma)$.
Put $\mathbf{X}$ the $n \times p$ observation matrix with rows $X_i^\top = [X_{i,1} \cdots X_{i,p}]$.
Estimation of $K = \Sigma^{-1}$ is achieved by maximum likelihood:
$$L(\Sigma; \mathbf{X}) = \prod_{i=1}^n \frac{1}{(2\pi)^{p/2} \det(\Sigma)^{1/2}} \exp\Big(-\frac{1}{2} X_i^\top \Sigma^{-1} X_i\Big)$$
or equivalently
$$L(K; \mathbf{X}) = \prod_{i=1}^n \frac{\det(K)^{1/2}}{(2\pi)^{p/2}} \exp\Big(-\frac{1}{2} X_i^\top K X_i\Big).$$

22 Gaussian Graphical Models
The minus log-likelihood (up to constants, after normalizing by $n$) is
$$\ell(K; \mathbf{X}) = -\log \det K + \langle \hat\Sigma, K \rangle + c$$
where $c$ does not depend on $K$, $\langle A, B \rangle = \mathrm{tr}(A^\top B)$, and $\hat\Sigma = n^{-1} \sum_{i=1}^n X_i X_i^\top$ is the empirical covariance.
Prior assumption: each vertex is not connected to all the others, i.e. there are only a few edges in the graph.
Use $\ell_1$-penalization on $K$ to obtain a sparse solution: the Graphical Lasso [Friedman et al. (2007), Banerjee et al. (2008)]
$$\hat K \in \operatorname*{argmin}_{K \succ 0} \Big\{ -\log \det K + \langle \hat\Sigma, K \rangle + \lambda \sum_{1 \le j < k \le p} |K_{j,k}| \Big\}$$
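In practice one rarely codes this from scratch: scikit-learn's GraphicalLasso estimator solves this $\ell_1$-penalized likelihood (up to details of diagonal penalization). A small usage sketch on synthetic data drawn from a chain-graph precision matrix; the alpha value and data sizes are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
# Sparse ground-truth precision: a chain graph on p = 5 variables
p = 5
K = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K), size=1000)

model = GraphicalLasso(alpha=0.05).fit(X)  # alpha is the l1 penalty level lambda
K_hat = model.precision_                   # sparse estimate of Sigma^{-1}
print(np.round(K_hat, 2))
```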

23 Sparse Gaussian Graphical Model [figure]

24 Sparse Gaussian Graphical Model
How to solve
$$\hat K \in \operatorname*{argmin}_{K \succ 0} \big\{ -\log \det K + \langle \hat\Sigma, K \rangle + \lambda \|K\|_1 \big\}?$$
It is a convex minimization problem: $-\log \det$ is convex, and $\log \det$ is differentiable with $\nabla \log \det(X) = X^{-1}$.
Recall that $\max_{\|X\|_\infty \le 1} \langle X, K \rangle = \|K\|_1$.
The dual problem is
$$\max_{\|X\|_\infty \le \lambda} \big\{ \log \det(\hat\Sigma + X) + p \big\}$$
and the primal and dual variables are related by $K = (\hat\Sigma + X)^{-1}$.
The duality gap is [exercise]
$$\langle K, \hat\Sigma \rangle - p + \lambda \|K\|_1.$$

25 Sparse Gaussian Graphical Model
Rewrite the dual problem as
$$\min_{\|X\|_\infty \le \lambda} \big\{ -\log \det(\hat\Sigma + X) - p \big\} \iff \min_{\|X - \hat\Sigma\|_\infty \le \lambda} -\log \det(X).$$
This will be optimized recursively by updating a single row and column of $K$ at a time.

26 Sparse Gaussian Graphical Model
Let $X_{\setminus j, \setminus k}$ denote $X$ with its $j$-th row and $k$-th column removed, and $x_j$ the $j$-th column of $X$ with its $j$-th entry removed.
Recall the Schur complement formula
$$\det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(A) \det(D - C A^{-1} B),$$
namely
$$\det \begin{bmatrix} K_{\setminus p, \setminus p} & k_p \\ k_p^\top & k_{p,p} \end{bmatrix} = \det(K_{\setminus p, \setminus p})\, \det\big(k_{p,p} - k_p^\top K_{\setminus p, \setminus p}^{-1} k_p\big).$$
If we are at iteration $k$, update the $j$-th row and column by $k_j^{(k)}$, the solution of
$$\min_{\|y - \hat\Sigma_j\|_\infty \le \lambda} y^\top \big(K^{(k-1)}_{\setminus j, \setminus j}\big)^{-1} y.$$

27 Sparse Gaussian Graphical Model
The dual problem
$$\min_{\|y - \hat\Sigma_j\|_\infty \le \lambda} y^\top \big(K^{(k-1)}_{\setminus j, \setminus j}\big)^{-1} y$$
is a box-constrained quadratic program. Its dual is
$$\min_x\; x^\top K^{(k-1)}_{\setminus j, \setminus j} x - \langle \hat\Sigma_j, x \rangle + \lambda \|x\|_1 = \min_x \|Ax - b\|_2^2 + \lambda \|x\|_1$$
with $A = \big(K^{(k-1)}_{\setminus j, \setminus j}\big)^{1/2}$ and $b = \frac{1}{2} \big(K^{(k-1)}_{\setminus j, \setminus j}\big)^{-1/2} \hat\Sigma_j$.
Hence several Lasso problems are solved at each iteration.

28 Sparse Gaussian Graphical Model
Algorithm for the graphical Lasso [block coordinate descent]:
- Initialize $\hat K^{(0)} = K^{(0)} = \hat\Sigma + \lambda I$.
- For $k \ge 0$, repeat: for $j = 1, \ldots, p$ solve
$$\hat x \in \operatorname*{argmin}_x \Big\| \big(K^{(j-1)}_{\setminus j, \setminus j}\big)^{1/2} x - \tfrac{1}{2} \big(K^{(j-1)}_{\setminus j, \setminus j}\big)^{-1/2} \hat\Sigma_j \Big\|_2^2 + \lambda \|x\|_1$$
and obtain $K^{(j)}$ by replacing the $j$-th row and column of $K^{(j-1)}$ by the corresponding box-QP solution (recovered from $\hat x$).
- Put $\hat K^{(k)} = K^{(p)}$ and $K^{(0)} \leftarrow \hat K^{(k)}$.
- If $\langle \hat K^{(k)}, \hat\Sigma \rangle - p + \lambda \|\hat K^{(k)}\|_1 \le \varepsilon$, stop and return $\hat K^{(k)}$.
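For concreteness, a self-contained numpy sketch of this block coordinate descent. It uses the standard glasso scaling of Friedman et al. (2008), where the Lasso subproblem is $\min_\beta \frac{1}{2}\beta^\top W_{\setminus j,\setminus j}\beta - \hat\Sigma_j^\top \beta + \lambda\|\beta\|_1$ and the column update is $W_{\setminus j,\setminus j}\hat\beta$, rather than the slide's exact constants; the function names and the inner coordinate-descent solver are illustrative:

```python
import numpy as np

def lasso_cd(Q, s, lam, n_iter=200):
    """Coordinate descent for min_b 0.5 * b'Qb - s'b + lam * ||b||_1, Q pos. def."""
    b = np.zeros(len(s))
    for _ in range(n_iter):
        for i in range(len(s)):
            r = s[i] - Q[i] @ b + Q[i, i] * b[i]   # s_i minus off-diagonal part
            b[i] = np.sign(r) * max(abs(r) - lam, 0.0) / Q[i, i]
    return b

def graphical_lasso(S, lam, n_sweeps=50, tol=1e-4):
    """Block coordinate descent on W = empirical covariance + dual variable."""
    p = S.shape[0]
    W = S + lam * np.eye(p)                        # initialization of slide 28
    for _ in range(n_sweeps):
        for j in range(p):
            idx = np.arange(p) != j
            Q = W[np.ix_(idx, idx)]                # W with j-th row/column removed
            b_hat = lasso_cd(Q, S[idx, j], lam)    # Lasso subproblem (slide 27)
            y = Q @ b_hat                          # box-QP solution for the column
            W[idx, j] = y
            W[j, idx] = y
        K = np.linalg.inv(W)                       # precision estimate K = W^{-1}
        if np.sum(K * S) - p + lam * np.abs(K).sum() < tol:  # duality gap (slide 24)
            return K
    return np.linalg.inv(W)
```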

29 Conclusion

30 What I didn't talk about
A plethora of other penalizations, optimization algorithms and settings for machine learning.
The Lasso is not consistent for variable selection. Use the Adaptive Lasso, namely an $\ell_1$-penalization weighted by a previous solution:
$$\sum_{j=1}^d \frac{|\theta_j|}{|\tilde\theta_j| + \varepsilon}$$
where $\tilde\theta$ is a previous estimator [Zou (2006)].
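As a sketch (not from the slides), the two-step procedure can be implemented with two plain Lasso fits, folding the weights $1/(|\tilde\theta_j| + \varepsilon)$ into a column rescaling of the design; the function `adaptive_lasso` and its defaults are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso(X, y, alpha, eps=1e-3):
    """Two-step adaptive Lasso: reweight the l1 penalty by a pilot Lasso fit."""
    theta_pilot = Lasso(alpha=alpha).fit(X, y).coef_
    w = 1.0 / (np.abs(theta_pilot) + eps)    # penalty weights 1 / (|theta_j| + eps)
    theta = Lasso(alpha=alpha).fit(X / w, y).coef_ / w  # rescaling absorbs weights
    return theta
```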

31 What I didn't talk about
Fused Lasso for finding change points: use a penalization based on
$$\lambda_1 \|\theta\|_1 + \lambda_{\mathrm{tv}} \sum_{j=2}^d |\theta_j - \theta_{j-1}|$$
[decomposition of the proximal operator].
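The bracketed remark refers to a known decomposition (Friedman et al. (2007)): the proximal operator of the combined penalty splits into the total-variation prox followed by entrywise soft-thresholding,
$$\mathrm{prox}_{\lambda_1 \|\cdot\|_1 + \lambda_{\mathrm{tv}} \mathrm{TV}}(v) = \mathrm{prox}_{\lambda_1 \|\cdot\|_1}\big(\mathrm{prox}_{\lambda_{\mathrm{tv}} \mathrm{TV}}(v)\big), \qquad \mathrm{TV}(\theta) = \sum_{j=2}^d |\theta_j - \theta_{j-1}|,$$
so the fused Lasso can be solved by any proximal gradient method once the TV prox is available.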

32 What I didn't talk about
- Support Vector Machines: non-linear classification using the kernel trick
- Classification trees, CART, Random Forests
- Multiple testing
- Feature screening
- Bayesian networks
- Deep learning
- Multi-task learning, dictionary learning
- Non-negative matrix factorization
- Spectral clustering
- Latent Dirichlet Allocation
among many, many other things...

33 This evening. Don't forget!
