Statistical machine learning, high dimension and big data


1 Statistical machine learning, high dimension and big data. S. Gaïffas, CMAP, Ecole Polytechnique. 14 March
2 Agenda for today: Divide and Conquer principle for collaborative filtering; Graphical modelling, Graphical Gaussian Model
3 Divide and Conquer Principle for Matrix Completion
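4 SVD-based matrix completion
Unknown matrix $M$ of size $n_1 \times n_2$. Prior: $\operatorname{rank}(M) \ll n_1 \wedge n_2$.
We observe $P_\Omega(M) = \{M_{j,k} : (j,k) \in \Omega\} \in \mathbb{R}^m$.
The basic iteration of a proximal gradient algorithm writes
$X^{k+1} \leftarrow S_\lambda\big(X^k - \eta_k (P_\Omega(X^k) - P_\Omega(M))\big)$,
where $S_\lambda$ is the spectral soft-thresholding operator
$S_\lambda(X) = U \operatorname{diag}\big[(\sigma_1(X) - \lambda)_+, \ldots, (\sigma_{n_1 \wedge n_2}(X) - \lambda)_+\big] V^\top$,
with $X = U \Sigma V^\top$ the SVD of $X$.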
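The iteration on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the lecturer's code: the function names, the boolean `mask` encoding of $\Omega$ and the array `M_obs` holding the observed entries are assumptions made for the example, and a full SVD is used where a truncated one would be preferred at scale.

```python
import numpy as np


def spectral_soft_threshold(X, lam):
    """Spectral soft-thresholding S_lam(X): shrink every singular value by lam."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    # Scale the columns of U by the shrunk singular values, then recombine.
    return (U * s_shrunk) @ Vt


def proximal_gradient_step(X, M_obs, mask, lam, eta):
    """One iteration X <- S_lam(X - eta * (P_Omega(X) - P_Omega(M))).

    mask is a boolean array marking the observed entries Omega;
    M_obs holds the observed values of M (entries outside Omega are ignored).
    """
    grad = np.where(mask, X - M_obs, 0.0)
    return spectral_soft_threshold(X - eta * grad, lam)
```

Singular values below `lam` are set to zero, which is what makes the iterates low rank.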
5 SVD-based matrix completion
Bottleneck: a truncated SVD is necessary at each iteration. Best-case complexity is $O(n_1 n_2 k)$ [Lanczos algorithms].
Such algorithms for matrix completion with theoretical guarantees rely on expensive truncated SVD computations. This does not scale!
Idea: Divide and Conquer principle. Divide $M$ into submatrices. Solve the subproblems: matrix completion of each submatrix (done in parallel). Combine the reconstructed submatrices for the entire reconstruction of $M$ [Mackey, Talwalkar and Jordan (2011)]
6 Divide and Conquer Matrix Completion
1. Randomly partition $M$ into $T$ column submatrices $M = [C_1 \; C_2 \; \cdots \; C_T]$ with $C_t \in \mathbb{R}^{n_1 \times p}$, $p = n_2 / T$.
2. Complete each submatrix, giving $\hat C_t$, using trace norm penalization on this subproblem, in parallel [if fully done in parallel, this is $T$ times faster than the single completion of $M$], leading to $[\hat C_1 \; \hat C_2 \; \cdots \; \hat C_T]$.
3. Combine them: project this matrix onto the column space of each $\hat C_t$, and average. If $\hat C_t = \hat U_t \hat\Sigma_t \hat V_t^\top$ is the SVD of $\hat C_t$, compute
$\hat M = \frac{1}{T} \sum_{t=1}^{T} \hat U_t \hat U_t^\top [\hat C_1 \; \hat C_2 \; \cdots \; \hat C_T]$
[Note that $\hat U_t \hat U_t^\top$ is the projection matrix onto the space spanned by the columns of $\hat C_t$]
7 Divide and Conquer Matrix Completion
Full matrix completion: complexity $O(n_1 n_2 k)$ per iteration (truncated SVD on the full matrix).
DC matrix completion: at most $O(n_1 p \max_t k_t)$ complexity per iteration (for truncated SVDs on the subcompletion problems, done in parallel); $O(n_1 k p)$ for the multiplication $\hat U_t \hat U_t^\top [\hat C_1 \; \hat C_2 \; \cdots \; \hat C_T]$ done in parallel, hence $O(n_1 k p T) = O(n_1 n_2 k)$ for the averaging $\frac{1}{T} \sum_{t=1}^{T}$ (but done only once).
Warning: $\hat U_t \hat U_t^\top \hat C_j = \hat U_t (\hat U_t^\top \hat C_j)$ is $O(n_1 k p)$, while $\hat U_t \hat U_t^\top \hat C_j = (\hat U_t \hat U_t^\top) \hat C_j$ is $O(n_1^2 p)$.
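The combine step and the multiplication order from the warning can be sketched as below. This is an illustrative sketch only: `dc_combine`, `C_hats` and `rank` are hypothetical names, the completed blocks are assumed to be given, and the loop is sequential where the slides run it in parallel.

```python
import numpy as np


def dc_combine(C_hats, rank):
    """Combine step: average the projections of [C_1 ... C_T] onto each column space."""
    C_full = np.hstack(C_hats)              # n1 x n2 matrix [C_1 ... C_T]
    M_hat = np.zeros_like(C_full, dtype=float)
    for C_t in C_hats:
        # Leading left singular vectors of the t-th completed block (n1 x rank).
        U_t = np.linalg.svd(C_t, full_matrices=False)[0][:, :rank]
        # Multiply right to left: U_t @ (U_t.T @ C_full) never forms the
        # n1 x n1 projector U_t @ U_t.T.
        M_hat += U_t @ (U_t.T @ C_full)
    return M_hat / len(C_hats)
```

When every block already spans the column space of an exactly low-rank matrix, the average of the projections returns that matrix unchanged.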
8 Divide and Conquer Matrix Completion
Numerical results [Mackey et al. (2011)]. And almost the same theoretical guarantees as matrix completion on the full matrix.
9 Divide and Conquer Matrix Completion
What is behind this? Getting a low-rank approximation using projection onto a random column subsample.
Let $M$ be an $n_1 \times n_2$ matrix and $L$ a rank $r$ approximation of $M$. Fix $x > 0$ and $\varepsilon > 0$.
Construct a matrix $C$ of size $n_1 \times p$ that contains columns of $M$ picked at random without replacement.
Compute $C = U_C \Sigma_C V_C^\top$, the SVD of $C$. Then
$\|M - U_C U_C^\top M\|_F \le (1 + \varepsilon) \|M - L\|_F$
with probability $\ge 1 - e^{-x}$ whenever $p \ge c \, r \, \mu_0(V_L) \log(n_1 \wedge n_2) \, x / \varepsilon^2$, where
$\mu_0(V) = \frac{n_2}{r} \max_{1 \le i \le n_2} \|V_{i,\cdot}\|_2^2 = \frac{n_2}{r} \|V\|_{2,\infty}^2$,
with $L = U_L \Sigma_L V_L^\top$ the SVD of $L$.
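A small NumPy sketch of the column-projection approximation $U_C U_C^\top M$ described on this slide. The function name, the relative cutoff `1e-10` used to discard numerically zero singular directions, and the use of `numpy.random.Generator.choice` for sampling are choices made for the illustration.

```python
import numpy as np


def column_projection(M, p, rng):
    """Project M onto the span of p columns drawn uniformly without replacement."""
    cols = rng.choice(M.shape[1], size=p, replace=False)
    C = M[:, cols]
    U_C, s, _ = np.linalg.svd(C, full_matrices=False)
    # Keep only directions with a numerically nonzero singular value.
    U_C = U_C[:, s > 1e-10 * s[0]]
    return U_C @ (U_C.T @ M)
```

For an exactly rank-$r$ matrix with generic columns, any $p \ge r$ sampled columns span the column space, so the projection error vanishes; this is the case $M = L$ of the bound above.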
10 Graphical modelling, Graphical Gaussian Model
11 Graphs
12 Graphs
13 Graphs: co-occurrence of words
14 Graphs: relations between artists in the last.fm database
15 Graphs: evolution of co-voters in the US Senate [Show video]
16 Graphs
Graph: a graph $G$ consists of a set of vertices $V$ and a set of edges $E$. We often write $G = (V, E)$. $E$ is a subset of $V \times V$ containing ordered pairs of distinct vertices. An edge is directed from $j$ to $k$ if $(j, k) \in E$. Undirected graphs, directed graphs.
17 Graphical Models
Graphical model: the set $V$ corresponds to a collection of random variables. Denote $V = \{1, \ldots, p\}$ with $|V| = p$, and $X = (X_1, \ldots, X_p) \sim P$. The pair $(G, P)$ is a graphical model.
18 Graphs, Graphical Models
Consider an undirected graph $G$ and a graphical model $(G, P)$. We say that $P$ satisfies the pairwise Markov property with respect to $G = (V, E)$ iff
$X_j \perp X_k \mid X_{V \setminus \{j,k\}}$ for any $(j, k) \notin E$, $j \ne k$,
namely $X_j$ and $X_k$ are conditionally independent given all the other vertices. A graphical model satisfying this property is called a conditional independence graph (CIG).
19 Gaussian Graphical Models
A Gaussian Graphical Model is a CIG with the assumption $X = (X_1, \ldots, X_p) \sim N(0, \Sigma)$ for a positive definite covariance matrix $\Sigma$. The mean is zero to simplify notation.
A well-known result (Lauritzen (1996)): $(j, k)$ and $(k, j) \notin E$ iff $X_j \perp X_k \mid X_{V \setminus \{j,k\}}$ iff $(\Sigma^{-1})_{j,k} = 0$ [exercise].
The edges can be read off the precision matrix $K = \Sigma^{-1}$: $(j, k) \in E$ and $(k, j) \in E$ iff $K_{j,k} \ne 0$.
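20 Gaussian Graphical Models
The partial correlation $\rho_{j,k \mid V \setminus \{j,k\}}$ between $X_j$ and $X_k$ conditional on $X_{V \setminus \{j,k\}}$ is given by
$\rho_{j,k \mid V \setminus \{j,k\}} = - \frac{K_{j,k}}{\sqrt{K_{j,j} K_{k,k}}}$.
The partial correlation coefficients are closely tied to regression coefficients: we can write
$X_j = \beta_{j,k} X_k + \sum_{l \in V \setminus \{j,k\}} \beta_{j,l} X_l + \varepsilon_j$,
where $E[\varepsilon_j] = 0$ and $\varepsilon_j \perp X_{V \setminus \{j\}}$, with $\beta_{j,k} = - \frac{K_{j,k}}{K_{j,j}}$ and $\beta_{k,j} = - \frac{K_{j,k}}{K_{k,k}}$ [exercise].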
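Both formulas on this slide (with the minus signs restored, $\rho_{j,k \mid \cdot} = -K_{j,k}/\sqrt{K_{j,j}K_{k,k}}$ and $\beta_{j,l} = -K_{j,l}/K_{j,j}$) can be checked numerically. The sketch below uses illustrative function names and assumes the precision matrix `K` is given as a NumPy array.

```python
import numpy as np


def partial_correlations(K):
    """Partial correlation matrix rho_{j,k | rest} = -K_jk / sqrt(K_jj K_kk)."""
    d = np.sqrt(np.diag(K))
    rho = -K / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)   # convention: a variable has correlation 1 with itself
    return rho


def regression_coefficients(K, j):
    """Coefficients beta_{j,l} = -K_jl / K_jj of X_j regressed on all other X_l."""
    beta = -K[j] / K[j, j]
    return np.delete(beta, j)    # drop the meaningless self-coefficient
```

For a population covariance $\Sigma$, `regression_coefficients(np.linalg.inv(Sigma), j)` should agree with the usual least-squares coefficients $\Sigma_{-j,-j}^{-1} \Sigma_{-j,j}$.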
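21 Sparse Gaussian Graphical Model
Suppose that we observe $X_1, \ldots, X_n$ i.i.d. $N(0, \Sigma)$. Let $\mathbf{X}$ be the $n \times p$ observation matrix with rows $X_i^\top = [X_{i,1} \; \cdots \; X_{i,p}]$.
Estimation of $K = \Sigma^{-1}$ is achieved by maximum likelihood estimation:
$L(\Sigma; \mathbf{X}) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{p/2} \sqrt{\det \Sigma}} \exp\big(-\tfrac{1}{2} X_i^\top \Sigma^{-1} X_i\big)$
or
$L(K; \mathbf{X}) = \prod_{i=1}^{n} \frac{\sqrt{\det K}}{(2\pi)^{p/2}} \exp\big(-\tfrac{1}{2} X_i^\top K X_i\big)$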
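22 Gaussian Graphical Models
Up to the multiplicative factor $n/2$, the minus log-likelihood is
$\ell(K; \mathbf{X}) = -\log\det K + \langle \hat\Sigma, K \rangle + c$,
where $\hat\Sigma = \frac{1}{n} \sum_{i=1}^{n} X_i X_i^\top$ is the empirical covariance, $c$ does not depend on $K$, and $\langle A, B \rangle = \operatorname{tr}(A^\top B)$.
Prior assumption: each vertex isn't connected to all others: there are only a few edges in the graph. Use $\ell_1$ penalization on $K$ to obtain a sparse solution.
Graphical Lasso [Friedman et al (2007), Banerjee et al (2008)]:
$\hat K \in \operatorname{argmin}_{K : K \succ 0} \Big\{ -\log\det K + \langle \hat\Sigma, K \rangle + \lambda \sum_{1 \le j < k \le p} |K_{j,k}| \Big\}$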
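To make the graphical Lasso criterion concrete, here is a sketch that only evaluates the objective $-\log\det K + \langle \hat\Sigma, K \rangle + \lambda \sum_{j<k} |K_{j,k}|$; it does not solve the problem. The function names are illustrative, and returning `inf` outside the positive definite cone is a convention chosen for this example.

```python
import numpy as np


def empirical_covariance(X):
    """Sigma_hat = X^T X / n for centred data (the model assumes zero mean)."""
    return X.T @ X / X.shape[0]


def glasso_objective(K, Sigma_hat, lam):
    """-log det K + <Sigma_hat, K> + lam * sum_{j<k} |K_jk|; inf if K is not positive definite."""
    if np.any(np.linalg.eigvalsh(K) <= 0):
        return np.inf
    _, logdet = np.linalg.slogdet(K)
    off_diag_l1 = np.abs(np.triu(K, k=1)).sum()
    # <A, B> = tr(A^T B) equals the sum of entrywise products.
    return -logdet + np.sum(Sigma_hat * K) + lam * off_diag_l1
```

With `lam = 0` the objective is minimised by the unpenalised maximum likelihood estimate $\hat\Sigma^{-1}$ whenever $\hat\Sigma$ is invertible.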
23 Sparse Gaussian Graphical Model
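24 Sparse Gaussian Graphical Model
How to solve $\hat K \in \operatorname{argmin}_{K \succ 0} \big\{ -\log\det K + \langle \hat\Sigma, K \rangle + \lambda \|K\|_1 \big\}$?
It is a convex minimization problem: $-\log\det$ is convex, and $\log\det$ is differentiable, with $\nabla \log\det(X) = X^{-1}$.
Recall that $\max_{\|X\|_\infty \le 1} \langle X, K \rangle = \|K\|_1$.
The dual problem is
$\max_{\|X\|_\infty \le \lambda} \big\{ \log\det(\hat\Sigma + X) + p \big\}$
and the primal and dual variables are related by $K = (\hat\Sigma + X)^{-1}$.
The duality gap is [exercise] $\langle K, \hat\Sigma \rangle - p + \lambda \|K\|_1$.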
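The duality-gap formula left as an exercise can be checked numerically: for a dual-feasible $X$ and $K = (\hat\Sigma + X)^{-1}$, the primal value minus the dual value should equal $\langle K, \hat\Sigma \rangle - p + \lambda \|K\|_1$. The sketch below uses the entrywise $\ell_1$ norm of the whole matrix, as on this slide; the function names are illustrative.

```python
import numpy as np


def primal_value(K, Sigma_hat, lam):
    """-log det K + <Sigma_hat, K> + lam * ||K||_1 (entrywise l1 norm)."""
    return -np.linalg.slogdet(K)[1] + np.sum(Sigma_hat * K) + lam * np.abs(K).sum()


def dual_value(X, Sigma_hat):
    """log det(Sigma_hat + X) + p, for a dual variable with max |X_jk| <= lam."""
    return np.linalg.slogdet(Sigma_hat + X)[1] + Sigma_hat.shape[0]


def duality_gap(K, Sigma_hat, lam):
    """<K, Sigma_hat> - p + lam * ||K||_1, valid when K = (Sigma_hat + X)^{-1}."""
    return np.sum(K * Sigma_hat) - Sigma_hat.shape[0] + lam * np.abs(K).sum()
```

The gap is nonnegative for every feasible pair and can serve as a stopping criterion.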
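25 Sparse Gaussian Graphical Model
Rewrite the dual problem as
$\min_{\|X\|_\infty \le \lambda} \big\{ -\log\det(\hat\Sigma + X) - p \big\} \;\Leftrightarrow\; \min_{\|X - \hat\Sigma\|_\infty \le \lambda} -\log\det(X)$
This will be optimized recursively by updating over a single row and column of $K$ at a time.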
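26 Sparse Gaussian Graphical Model
Let $X_{-j,-k}$ be the matrix with the $j$th row and $k$th column removed, and $X_{-j}$ the $j$th column with its $j$th entry removed.
Recall the Schur complement formula
$\det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(A) \det(D - C A^{-1} B)$
Namely
$\det \begin{bmatrix} K_{-p,-p} & k_{-p} \\ k_{-p}^\top & k_{p,p} \end{bmatrix} = \det(K_{-p,-p}) \det\big(k_{p,p} - k_{-p}^\top K_{-p,-p}^{-1} k_{-p}\big)$
If we are at iteration $k$, update the $j$th row and column by $k_{-j}^{(k)}$, solution of
$\min_{\|y - \hat\Sigma_{-j}\|_\infty \le \lambda} y^\top \big(K^{(k-1)}_{-j,-j}\big)^{-1} y$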
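The Schur complement identity used here is easy to verify numerically. The helper below is a sketch with an illustrative name; it assumes a symmetric matrix whose leading block $K_{-p,-p}$ is invertible.

```python
import numpy as np


def det_via_schur(K):
    """det(K) from the block split that isolates the last row and column."""
    K_mm = K[:-1, :-1]          # K_{-p,-p}
    k_m = K[:-1, -1]            # k_{-p}: last column without its last entry
    k_pp = K[-1, -1]            # k_{p,p}
    schur = k_pp - k_m @ np.linalg.solve(K_mm, k_m)
    return np.linalg.det(K_mm) * schur
```

Since $\det(K_{-p,-p})$ does not depend on the last row and column, maximising $\log\det$ over that row and column amounts to maximising the scalar Schur complement, which is why a quadratic problem in $y$ appears.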
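27 Sparse Gaussian Graphical Model
The dual problem
$\min_{\|y - \hat\Sigma_{-j}\|_\infty \le \lambda} y^\top \big(K^{(k-1)}_{-j,-j}\big)^{-1} y$
is a box-constrained quadratic program. Its dual is
$\min_x \; x^\top K^{(k-1)}_{-j,-j} x - \langle \hat\Sigma_{-j}, x \rangle + \lambda \|x\|_1 = \min_x \; \|Ax - b\|_2^2 + \lambda \|x\|_1$ (up to an additive constant),
with $A = \big(K^{(k-1)}_{-j,-j}\big)^{1/2}$ and $b = \tfrac{1}{2} \big(K^{(k-1)}_{-j,-j}\big)^{-1/2} \hat\Sigma_{-j}$.
Several Lasso problems at each iteration.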
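Each inner problem has the form $\min_x x^\top Q x - \langle s, x \rangle + \lambda \|x\|_1$ with $Q = K^{(k-1)}_{-j,-j}$ and $s = \hat\Sigma_{-j}$. One common way to solve such a Lasso problem is cyclic coordinate descent with soft-thresholding; the slides do not say which solver is used, so the sketch below is an assumption, with illustrative names and a fixed number of sweeps instead of a convergence test.

```python
import numpy as np


def soft_threshold(c, lam):
    """Scalar soft-thresholding: sign(c) * max(|c| - lam, 0)."""
    return np.sign(c) * max(abs(c) - lam, 0.0)


def quadratic_lasso_cd(Q, s, lam, n_sweeps=200):
    """Minimise x^T Q x - <s, x> + lam * ||x||_1 by cyclic coordinate descent.

    Q is assumed symmetric positive definite.
    """
    x = np.zeros(len(s))
    for _ in range(n_sweeps):
        for i in range(len(s)):
            # Contribution of the other coordinates to the linear term in x_i.
            r = Q[i] @ x - Q[i, i] * x[i]
            # Exact minimiser of Q_ii x_i^2 - (s_i - 2 r) x_i + lam |x_i|.
            x[i] = soft_threshold(s[i] - 2.0 * r, lam) / (2.0 * Q[i, i])
    return x
```

Working with $Q$ and $s$ directly avoids forming the matrix square roots $A$ and $b$.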
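28 Sparse Gaussian Graphical Model
Algorithm for graphical Lasso [Block coordinate descent]
Initialize $\hat K^{(0)} = K^{(0)} = \hat\Sigma + \lambda I$.
For $k \ge 0$ repeat:
for $j = 1, \ldots, p$, solve
$\hat x \in \operatorname{argmin}_x \Big\| \big(K^{(j-1)}_{-j,-j}\big)^{1/2} x - \tfrac{1}{2} \big(K^{(j-1)}_{-j,-j}\big)^{-1/2} \hat\Sigma_{-j} \Big\|_2^2 + \lambda \|x\|_1$
and obtain $K^{(j)}$ by replacing the $j$th row and column of $K^{(j-1)}$ by $\hat x$.
Put $\hat K^{(k)} = K^{(p)}$ and $K^{(0)} = \hat K^{(k)}$.
If $\langle \hat K^{(k)}, \hat\Sigma \rangle - p + \lambda \|\hat K^{(k)}\|_1 \le \varepsilon$, stop and return $\hat K^{(k)}$.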
29 Conclusion
30 What I didn't talk about
A plethora of other penalizations, optimization algorithms, settings for machine learning.
Lasso is not consistent for variable selection in general. Use Adaptive Lasso, namely an $\ell_1$ penalization weighted by a previous solution:
$\sum_{j=1}^{d} \frac{|\theta_j|}{|\tilde\theta_j| + \varepsilon}$
where $\tilde\theta$ is a previous estimator [Zou (2006)]
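The adaptive Lasso penalty is a weighted $\ell_1$ norm whose weights come from a previous estimate. A minimal sketch, with illustrative names (`theta_prev` stands for the previous estimator) and an arbitrary default for `eps`:

```python
import numpy as np


def adaptive_lasso_penalty(theta, theta_prev, eps=1e-6):
    """sum_j |theta_j| / (|theta_prev_j| + eps): a weighted l1 penalty."""
    weights = 1.0 / (np.abs(theta_prev) + eps)
    return np.sum(weights * np.abs(theta))
```

Coordinates that were close to zero in the previous estimate get a large weight and are penalised more heavily.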
31 What I didn't talk about
Fused Lasso for finding change points: use a penalization based on
$\lambda_1 \|\theta\|_1 + \lambda_{tv} \sum_{j=2}^{d} |\theta_j - \theta_{j-1}|$
[decomposition of the proximal operator]
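The fused Lasso penalty combines an $\ell_1$ term with a total-variation term on consecutive differences. The sketch below only evaluates the penalty (it is not the proximal operator mentioned on the slide); the function and argument names are illustrative.

```python
import numpy as np


def fused_lasso_penalty(theta, lam_1, lam_tv):
    """lam_1 * ||theta||_1 + lam_tv * sum_{j>=2} |theta_j - theta_{j-1}|."""
    theta = np.asarray(theta, dtype=float)
    return lam_1 * np.abs(theta).sum() + lam_tv * np.abs(np.diff(theta)).sum()
```

A piecewise-constant signal pays the total-variation term only at its jumps, which is why this penalty favours solutions with few change points.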
32 What I didn't talk about
Support Vector Machine: nonlinear classification using the kernel trick; classification trees, CART, Random Forest; multiple testing; feature screening; Bayesian networks; deep learning; multitask learning, dictionary learning; nonnegative matrix factorization; spectral clustering; Latent Dirichlet Allocation; among many, many other things...
33 This evening: don't forget!
More informationNotes on Determinant
ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 918/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without
More informationNotes on Symmetric Matrices
CPSC 536N: Randomized Algorithms 201112 Term 2 Notes on Symmetric Matrices Prof. Nick Harvey University of British Columbia 1 Symmetric Matrices We review some basic results concerning symmetric matrices.
More informationGeneralized Inverse Computation Based on an Orthogonal Decomposition Methodology.
International Conference on Mathematical and Statistical Modeling in Honor of Enrique Castillo. June 2830, 2006 Generalized Inverse Computation Based on an Orthogonal Decomposition Methodology. Patricia
More informationMA 242 LINEAR ALGEBRA C1, Solutions to Second Midterm Exam
MA 4 LINEAR ALGEBRA C, Solutions to Second Midterm Exam Prof. Nikola Popovic, November 9, 6, 9:3am  :5am Problem (5 points). Let the matrix A be given by 5 6 5 4 5 (a) Find the inverse A of A, if it exists.
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationUniversity of Lille I PC first year list of exercises n 7. Review
University of Lille I PC first year list of exercises n 7 Review Exercise Solve the following systems in 4 different ways (by substitution, by the Gauss method, by inverting the matrix of coefficients
More informationVariational approach to restore pointlike and curvelike singularities in imaging
Variational approach to restore pointlike and curvelike singularities in imaging Daniele Graziani joint work with Gilles Aubert and Laure BlancFéraud Roma 12/06/2012 Daniele Graziani (Roma) 12/06/2012
More information