Sparse Matrices for Large Data Modeling

Martin Maechler, ETH Zurich, Switzerland -- maechler@r-project.org
useR!2007, Ames, Iowa, USA, Aug 9, 2007

Acknowledgements: Doug Bates has introduced me to sparse matrices and the Matrix package, and the last two years have been a lot of fun in collaboration with him.

Overview

Large Data: not just sparse matrices. (Sparse) matrices for large data -- applications:
- LMER: talk by Doug Bates
- * Quantile smoothing with constraints; regression splines for total-variation penalized densities, notably in 2D (triograms)
- * Sparse least squares regression (and generalized, robust, ...)
- Sparse matrix <-> graph incidence (in network analysis)
- Sparse covariance or correlation matrices, and conditional variance via sparse arithmetic
- * Teaser of a case study -- ETH professor evaluation by students: Who's the best in teaching?

Also: overview of sparse matrices and sparse matrix storage; sparse matrices in R's Matrix package (arithmetic, indexing, solve() methods, possibly even for sparse right-hand sides); sparse matrix factorizations: chol(), qr(), Schur, LU, Bunch-Kaufman.

Large Data Analysis

This session, "Large Data", will focus on one important kind of large data analysis, namely sparse matrix modelling. Further considerations a useR should know:

1. Think first, then read the data. Read the docs! See the "R Data Import/Export" manual (part of the R manuals that come with R and are online in PDF and HTML), notably section 2.1, "Variations on read.table"; and read help(read.table), notably about the colClasses and as.is arguments. How large is "large"? Do I only need some of the variables? Should I use a database system from R (SQLite, MySQL)? Or rather work with simple random samples of rows?! (A minimal sketch follows after this list.)

2. Think again: "First plot, then model!" (T-shirt); or, slightly generalized, "First explore, then model!", i.e., first use exploratory data analysis (EDA).

3. ...
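As a minimal sketch of point 1, pre-specifying column types with colClasses and then sampling rows (illustration, not from the talk; the file name "big.csv" and its three columns are hypothetical):

> ## hypothetical file "big.csv" with an integer id, a numeric value,
> ## and a categorical group column; colClasses avoids type re-guessing:
> dat <- read.table("big.csv", header = TRUE, sep = ",",
+                   colClasses = c("integer", "numeric", "factor"))
> ## ... or rather work with a simple random sample of its rows:
> dat.sub <- dat[sample(nrow(dat), 1000), ]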

Constrained Quantile Smoothing Splines

Pin Ng (1996) defined quantile smoothing splines as the solution of, e.g.,

  $\min_g \; \sum_{i=1}^{n} \rho_\tau\bigl(y_i - g(x_i)\bigr) \;+\; \lambda \max_x |g''(x)|$    (1)

as a nonparametric estimator for $g_\tau(x)$, $\tau \in (0,1)$, where for $\tau = \frac{1}{2}$, $\sum_i \rho_\tau(r_i) = \sum_i |r_i|$ is least absolute values ($L_1$) regression.

Solving (1) amounts to linear optimization with linear constraints, and hence can easily be extended by further (linear) constraints such as monotonicity, convexity, upper bounds, exact-fit constraints, etc.

The matrix $X$ corresponding to the linear optimization problem for the constrained smoothing splines is of dimension $(f \cdot n) \times n$ but has only $f_2 \cdot n$ ($f_2 \le 3$) non-zero entries.

Constrained B-spline fit: fit a constrained (B-) smoothing spline to n = 100 data points, constrained to be monotone increasing and to fulfill the 3 pointwise constraints below:

> library(cobs)
> Phi.cnstr <- rbind(c( 1, -3, 0),   ## g(-3) >= 0
+                    c(-1,  3, 1),   ## g(+3) <= 1
+                    c( 0,  0, 0.5)) ## g( 0) == 0.5
> msp <- cobs(x, y, nknots = length(x) - 1,
+             constraint = "increase", pointwise = Phi.cnstr,
+             lambda = 0.1)

Example: the constraints on g(.) are: g increasing, i.e., g'(x) > 0, and g(-3) >= 0, g(0) = 0.5, g(3) <= 1; with n = 50 observations, the matrices X and X'X are sparse.

> ## msp <- cobs(...)
> plot(msp, main = "cobs(x, y, constraint = \"increase\", pointwise = ...)")
> abline(h = c(0, 1), lty = 2, col = "olivedrab", lwd = 2)
> points(0, 0.5, cex = 2, col = "olivedrab")
> lines(xx, pnorm(2 * xx), col = "light gray") # true function
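The vectors x and y and the plotting grid xx are not defined on the slides; a hypothetical data setup consistent with the "true function" pnorm(2*x) drawn above would be:

> ## hypothetical data generation (not from the talk), matching the
> ## true curve pnorm(2*x) drawn in the plot above:
> set.seed(1)
> n  <- 100
> x  <- sort(runif(n, -3, 3))
> y  <- pnorm(2 * x) + rnorm(n, sd = 0.1)   # noisy observations of the truth
> xx <- seq(-3, 3, length.out = 201)        # fine grid for drawing the truth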

Sparse Least Squares

Koenker and Ng (2003) were the first to provide a sparse matrix package for R (SparseM), including sparse least squares via slm.fit(x, y, ...). They provide the following nice example of a model matrix (probably from a quantile smoothing context):

> library(Matrix)
> data(KNex)  # Koenker-Ng ex{ample}
> dim(KNex$mm)
[1] 1850  712
> print(image(KNex$mm, aspect = "iso", colorkey = FALSE,
+             main = "Koenker-Ng model matrix"))

Cholesky for Sparse L.S.

For sparse matrices, the Cholesky decomposition has been the most researched factorization and hence is the one used for least squares regression modelling. Estimating $\beta$ in the model $y = X\beta + \epsilon$ by solving the normal equations

  $X'X\,\beta = X'y$    (2)

uses the Cholesky decomposition of the (symmetric, positive semi-definite) $X'X$, namely $X'X = LL' = R'R$ with lower-left triangular $L$ or upper-right triangular $R = L'$, respectively. The system is then solved via two triangular (back- and forward-) solves:

  $\hat\beta = (X'X)^{-1} X'y = (LL')^{-1} X'y = L'^{-1} L^{-1} X'y$    (3)

CHOLMOD (Tim Davis, 2006) provides efficient sparse algorithms for this.

Cholesky Fill-in

The usual Cholesky decomposition works, ...

> X.X <- crossprod(KNex$mm)
> c1 <- chol(X.X)
> image(X.X, main = "X'X", aspect = "iso", colorkey = FALSE)
> image(c1, main = "chol(X'X)", ...)

... but the resulting Cholesky factor has suffered from so-called fill-in, i.e., its sparsity is quite reduced compared to that of X'X.
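A small self-contained sketch (not from the slides) of the two triangular solves in (3), using base R's chol(), forwardsolve() and backsolve() on a dense toy problem:

> ## toy dense example of eq. (3): X'X = R'R with R upper triangular
> set.seed(2)
> X <- matrix(rnorm(50 * 3), 50, 3)
> y <- rnorm(50)
> R <- chol(crossprod(X))                 # upper triangular Cholesky factor
> ## forward solve R'z = X'y, then back solve R beta = z :
> beta.hat <- backsolve(R, forwardsolve(t(R), crossprod(X, y)))
> stopifnot(all.equal(c(beta.hat),
+                     c(solve(crossprod(X), crossprod(X, y)))))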

Fill-reducing Permutation

So chol(X'X) suffered from fill-in (sparsity decreased considerably). Solution: fill-reducing techniques, which permute rows and columns of X'X, i.e., use $P\,X'X\,P'$ for a permutation matrix $P$, or in R syntax X.X[pvec, pvec], where pvec is a permutation of 1:n. The permutation $P$ is chosen such that the Cholesky factor of $P\,X'X\,P'$ is as sparse as possible:

> image(t(c1), main = "t( chol(X'X) )", ...)  # transposed, for screen display
> c2 <- Cholesky(X.X, perm = TRUE)
> image(c2, main = "Cholesky(X'X, perm = TRUE)", ...)

(Note that such permutations are also done by the dense chol() when pivot = TRUE, but there the goal is dealing with rank deficiency.)

Timing Least Squares Solving

> y <- KNex$y
> m. <- as(KNex$mm, "matrix")  # traditional (dense) matrix
> system.time(cpod.sol <- solve(crossprod(m.), crossprod(m., y)))
   user  system elapsed
  2.111   0.001   2.119
> ## Using sparse matrices is so fast, we have to bump the time by looping:
> system.time(for(i in 1:10)  ## sparse solution without permutation
+     sp1.sol <- solve(c1, solve(t(c1), crossprod(KNex$mm, y))))
   user  system elapsed
  0.049   0.000   0.048
> system.time(for(i in 1:10)  ## sparse Cholesky WITH fill-reducing permutation
+     sp2.sol <- solve(c2, crossprod(KNex$mm, y)))
   user  system elapsed
  0.009   0.000   0.010
> stopifnot(all.equal(sp1.sol, sp2.sol),
+           all.equal(as.vector(sp2.sol), c(cpod.sol)))

Fill-reducing Permutation -- Result: [image comparing the sparsity patterns of the unpermuted and permuted Cholesky factors]
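To quantify the fill-in and its reduction, one can count non-zero entries with Matrix's nnzero(); a sketch (extracting the factor L from the CHMfactor c2 via the CHMfactor -> sparseMatrix coercion is an assumption about the Matrix version at hand):

> nnzero(X.X)  # non-zeros in X'X itself
> nnzero(c1)   # non-zeros in chol(X.X): considerably more -- the fill-in
> ## assumed coercion: extract the permuted factor L from the CHMfactor c2
> nnzero(as(c2, "sparseMatrix"))  # permuted factor: much sparser than c1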

Teaser case study: Who's the best prof?

Modelling the ETH teacher ratings:
- A private donation funds prizes encouraging excellent teaching at ETH.
- The student union of ETH Zurich organizes a survey to award the prizes: best lecturer of ETH, and of each of the 14 departments.
- Smart web interface for the survey: each student sees the names of his/her professors from the last 4 semesters and all the lectures that applied; ratings are in {1, 2, 3, 4, 5}; high response rate.

Model: the rating depends on
- the student (s) (rating subjectively),
- the teacher (d) -- the main interest,
- the department (dept),
- service lecture or own-department lecture (service: 0/1),
- the semester of the student at the time of rating (studage in {2, 4, 6, 8}),
- how many semesters back the lecture was (lectage).

Main question: Who's the best prof? Hence, for political reasons, we want d as a fixed effect.

Who's the best prof -- the data:

> ## read the data; several factor assignments, such as
> md$d <- factor(md$d)  # Lecturer_ID ("d"ozentin)
> str(md)
'data.frame':   73421 obs. of  7 variables:
 $ s      : Factor w/ 2972 levels "1","2","3","4",..: 1 1 1 1 2 2 3 3 3 3 ...
 $ d      : Factor w/ 1128 levels "1","6","7","8",..: 525 560 832 1068 62 406 3 6 19 75 ...
 $ studage: Ord.factor w/ 4 levels "2"<"4"<"6"<"8": 1 1 1 1 1 1 1 1 1 1 ...
 $ lectage: Ord.factor w/ 6 levels "1"<"2"<"3"<"4"<..: 2 1 2 2 1 1 1 1 1 1 ...
 $ service: Factor w/ 2 levels "0","1": 1 2 1 2 1 1 2 1 1 1 ...
 $ dept   : Factor w/ 15 levels "1","2","3","4",..: 15 5 15 12 2 2 14 3 3 3 ...
 $ y      : int  5 2 5 3 2 4 4 5 5 4 ...

Model for the ETH teacher ratings: we want d (teacher ID, ~1000 levels) as a fixed effect. Consequently, in $y = X\beta + Zb + \epsilon$ we have $X$ as roughly $n \times 1000$ and $Z$ as $n \times 5000$, with $n \approx 70\,000$:

> fm0 <- lmer(y ~ d*dept + dept*service + studage + lectage + (1|s),
+             data = md)
Error in model.matrix.default(mt, mf, contrasts) :
  cannot allocate vector of length 1243972003
> 1243972003 / 2^20  ## number of Mega bytes
[1] 1186.344

We want sparse matrices for X and Z, their cross products, etc.
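A sketch of how the huge fixed-effects matrix X could be built sparsely: later versions of Matrix provide sparse.model.matrix() as a sparse counterpart to the dense model.matrix() (an illustration under that assumption, not the solution shown in the talk):

> ## sketch (assumes a Matrix version providing sparse.model.matrix()):
> X <- sparse.model.matrix(~ d*dept + dept*service + studage + lectage,
+                          data = md)
> dim(X); class(X)  # a "dgCMatrix", needing far less memory than a dense X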

Intro to Sparse Matrices in R package Matrix

The R package Matrix contains dozens of matrix classes and hundreds of method definitions, with sub-hierarchies of denseMatrix and sparseMatrix. A very basic intro to some of the sparse matrices:

Simple example -- triplet form

The most obvious way to store a sparse matrix is the so-called triplet form (virtual class TsparseMatrix in Matrix):

> A <- spMatrix(10, 20, i = c(1, 3:8),
+                       j = c(2, 9, 6:10),
+                       x = 7 * (1:7))
> A  # a "dgTMatrix"
10 x 20 sparse Matrix of class "dgTMatrix"
 [1,] . 7 . . .  .  .  .  .  . . . . . . . . . . .
 [2,] . . . . .  .  .  .  .  . . . . . . . . . . .
 [3,] . . . . .  .  .  . 14  . . . . . . . . . . .
 [4,] . . . . . 21  .  .  .  . . . . . . . . . . .
 [5,] . . . . .  . 28  .  .  . . . . . . . . . . .
 [6,] . . . . .  .  . 35  .  . . . . . . . . . . .
 [7,] . . . . .  .  .  . 42  . . . . . . . . . . .
 [8,] . . . . .  .  .  .  . 49 . . . . . . . . . .
 [9,] . . . . .  .  .  .  .  . . . . . . . . . . .
[10,] . . . . .  .  .  .  .  . . . . . . . . . . .

Simple example 2

> str(A)  # note that *internally* 0-based indices (i, j) are used
Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:7] 0 2 3 4 5 6 7
  ..@ j       : int [1:7] 1 8 5 6 7 8 9
  ..@ Dim     : int [1:2] 10 20
  ..@ Dimnames:List of 2
  ..@ x       : num [1:7] 7 14 21 28 35 42 49
  ..@ factors : list()
> A[2:7, 12:20] <- rep(c(0, 0, 0, (3:1)*30, 0), length = 6*9)

Simple example 3

> A >= 20  # -> logical sparse; nice show() method
10 x 20 sparse Matrix of class "lgTMatrix"
 [1,] . . . . . . . . . . . . . . . . . . . .
 [2,] . . . . . . . . . . . . . | | | . . . .
 [3,] . . . . . . . . . . . . . . | | | . . .
 [4,] . . . . . | . . . . . . . . . | | | . .
 [5,] . . . . . . | . . . . | . . . . | | | .
 [6,] . . . . . . . | . . . | | . . . . | | |
 [7,] . . . . . . . . | . . | | | . . . . | |
 [8,] . . . . . . . . . | . . . . . . . . . .
 [9,] . . . . . . . . . . . . . . . . . . . .
[10,] . . . . . . . . . . . . . . . . . . . .
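For comparison, the same matrix can also be built with the higher-level sparseMatrix() constructor (a sketch, assuming a Matrix version providing it; note it returns the column-compressed form discussed next rather than a triplet):

> ## equivalent construction via the higher-level constructor:
> A2 <- sparseMatrix(i = c(1, 3:8), j = c(2, 9, 6:10), x = 7 * (1:7),
+                    dims = c(10, 20))
> class(A2)  # "dgCMatrix" -- column-compressed by default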

Sparse compressed form

The triplet representation is easy for us humble humans, but it can be made both smaller and more efficient for (column-access-heavy) operations: the column-compressed sparse representation (virtual class CsparseMatrix in Matrix):

> Ac <- as(t(A), "CsparseMatrix")
> str(Ac)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:30] 1 13 14 15 8 14 15 16 5 15 ...
  ..@ p       : int [1:11] 0 1 4 8 12 17 23 29 30 30 ...
  ..@ Dim     : int [1:2] 20 10
  ..@ Dimnames:List of 2
  ..@ x       : num [1:30] 7 30 60 90 14 30 60 90 21 30 ...
  ..@ factors : list()

The column index slot j is replaced by a column "pointer" slot p: it has length ncol + 1, and diff(p) gives the number of non-zero entries in each column.

Conclusions

- Sparse matrices are crucial for several important data modelling situations.
- There's the R package Matrix ...
- Many more conclusions at the end of Doug Bates' talk :-)

Other R packages for large matrices

- biglm: updating QR decompositions, storing O(p^2) instead of O(n*p).
- R.huge: uses class FileMatrix to store matrices on disk instead of in RAM.
- SQLiteDF by Miguel Manese ("our" Google Summer of Code project 2006). Description: transparently stores data frames & matrices in SQLite tables.
- sqldf by Gabor Grothendieck: learn SQL ...
- ff (memory-mapped arrays) by Adler, Nenadic, Zucchini and Glaser: poster today, and ... winner of the useR!2007 programming competition.