Computing a Nearest Correlation Matrix with Factor Structure

Similar documents

SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI. (B.Sc.(Hons.), BUAA)

BIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION

(Quasi-)Newton methods

Nonlinear Programming Methods.S2 Quadratic Programming

Perron vector Optimization applied to search engines

Notes on Symmetric Matrices

Multigrid preconditioning for nonlinear (degenerate) parabolic equations with application to monument degradation

An Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM

2.3 Convex Constrained Optimization Problems

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

DATA ANALYSIS II. Matrix Algorithms

Solving polynomial least squares problems via semidefinite programming relaxations

IRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL. 1. Introduction

Nonlinear Optimization: Algorithms 3: Interior-point methods

On using numerical algebraic geometry to find Lyapunov functions of polynomial dynamical systems

Solutions Of Some Non-Linear Programming Problems BIJAN KUMAR PATEL. Master of Science in Mathematics. Prof. ANIL KUMAR

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

An Introduction on SemiDefinite Program

More than you wanted to know about quadratic forms

A note on companion matrices

Stationarity Results for Generating Set Search for Linearly Constrained Optimization

Examination paper for TMA4205 Numerical Linear Algebra

Big Data - Lecture 1 Optimization reminders

Introduction to Algebraic Geometry. Bézout s Theorem and Inflection Points

Duality in General Programs. Ryan Tibshirani Convex Optimization /36-725

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method

A Distributed Line Search for Network Optimization

Lecture 5 Principal Minors and the Hessian

Numerical Methods for Solving Systems of Nonlinear Equations

Least-Squares Intersection of Lines

Parameter Estimation for Bingham Models

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

Lecture 13 Linear quadratic Lyapunov theory

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

AN INTRODUCTION TO NUMERICAL METHODS AND ANALYSIS

GI01/M055 Supervised Learning Proximal Methods

Duality of linear conic problems

The Characteristic Polynomial

Portfolio Optimization Part 1 Unconstrained Portfolios

10. Proximal point method

Interior-Point Algorithms for Quadratic Programming

SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH

Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013

Continuity of the Perron Root

Similarity and Diagonalization. Similar Matrices

The NEWUOA software for unconstrained optimization without derivatives 1. M.J.D. Powell

Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Statistical Machine Learning

11 Projected Newton-type Methods in Machine Learning

Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems

Completely Positive Cone and its Dual

Summer course on Convex Optimization. Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.

Inner Product Spaces and Orthogonality

Quadratic forms Cochran s theorem, degrees of freedom, and all that

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8

Observability Index Selection for Robot Calibration

P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

Nonlinear Algebraic Equations. Lectures INF2320 p. 1/88

Numerical Methods I Eigenvalue Problems

Computational Optical Imaging - Optique Numerique. -- Deconvolution --

Lecture 3: Linear methods for classification

Interior Point Methods and Linear Programming

Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Mathematical finance and linear programming (optimization)

Applied Linear Algebra I Review page 1

Massive Data Classification via Unconstrained Support Vector Machines

CSCI567 Machine Learning (Fall 2014)

Lecture 2: The SVM classifier

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 10

Chapter 6. Orthogonality

Numerical methods for American options

Inner Product Spaces

Cost Minimization and the Cost Function

Lecture 7: Finding Lyapunov Functions 1

Vector and Matrix Norms

Notes V General Equilibrium: Positive Theory. 1 Walrasian Equilibrium and Excess Demand

Solving Linear Systems, Continued and The Inverse of a Matrix

CS 295 (UVM) The LMS Algorithm Fall / 23

Sparse recovery and compressed sensing in inverse problems

4.6 Linear Programming duality

LINEAR ALGEBRA. September 23, 2010

Nonlinear Algebraic Equations Example

Linear Programming. Widget Factory Example. Linear Programming: Standard Form. Widget Factory Example: Continued.

TWO L 1 BASED NONCONVEX METHODS FOR CONSTRUCTING SPARSE MEAN REVERTING PORTFOLIOS

1 Solving LPs: The Simplex Algorithm of George Dantzig

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu

Transcription:

Computing a Nearest Correlation Matrix with Factor Structure Nick Higham School of Mathematics The University of Manchester higham@ma.man.ac.uk http://www.ma.man.ac.uk/~higham/ Joint work with Rüdiger Borsdorf, Marcos Raydan NAG Quant Event, London, October 2009

Outline Properties of structured correlation matrices. Nearness problem for factor structured correlation matrices. Selection of optimization method. Numerical analysis issues. MIMS Nick Higham Nearest Correlation Matrix 2 / 33

Correlation Matrix n n symmetric positive semidefinite matrix A with a ii 1. symmetric, 1s on the diagonal, eigenvalues nonnegative or all principal minors nonnegative. Properties: off-diagonal elements between 1 and 1, convex set. MIMS Nick Higham Nearest Correlation Matrix 3 / 33

Quiz Is this a correlation matrix? 1 1 0 1 1 1. 0 1 1 MIMS Nick Higham Nearest Correlation Matrix 4 / 33

Quiz Is this a correlation matrix? 1 1 0 1 1 1. Spectrum: 0.4142, 1.0000, 2.4142. 0 1 1 MIMS Nick Higham Nearest Correlation Matrix 4 / 33

Quiz Is this a correlation matrix? 1 1 0 1 1 1. Spectrum: 0.4142, 1.0000, 2.4142. 0 1 1 For what w is this a correlation matrix? 1 w w w 1 w. w w 1 MIMS Nick Higham Nearest Correlation Matrix 4 / 33

Quiz Is this a correlation matrix? 1 1 0 1 1 1. Spectrum: 0.4142, 1.0000, 2.4142. 0 1 1 For what w is this a correlation matrix? 1 w w 1 w 1 w. n 1 w 1. w w 1 MIMS Nick Higham Nearest Correlation Matrix 4 / 33

Structured Correlation Matrices Nonnegative: 1 1 1 2 3 1 1 1 2 4 1 1 1 3 4. MIMS Nick Higham Nearest Correlation Matrix 5 / 33

Structured Correlation Matrices Nonnegative: Low rank: 1 1 1 2 3 1 1 1 2 4 1 1 1 3 4. 1 1 1 1 1 1. 1 1 1 MIMS Nick Higham Nearest Correlation Matrix 5 / 33

Structured Correlation Matrices Nonnegative: Low rank: 1 1 1 2 3 1 1 1 2 4 1 1 1 3 4. 1 1 1 1 1 1. 1 1 1 Factor structure: 1 x 1x 2 x 1 x 3 x 1 x 2 1 x 2 x 3. x 1 x 3 x 2 x 3 1 MIMS Nick Higham Nearest Correlation Matrix 5 / 33

Approximate Correlation Matrices Empirical correlation matrices often not true correlation matrices, due to http://www.movielens.org asynchronous data missing data limited precision stress testing MIMS Nick Higham Nearest Correlation Matrix 6 / 33

Nearest Correlation Matrix Find X achieving min{ A X F : X is a correlation matrix }, where A 2 F = i,j a2 ij. Constraint set is a closed, convex set, so unique minimizer. Nonlinear optimization problem. MIMS Nick Higham Nearest Correlation Matrix 7 / 33

Optimization Problems Issues Existence and uniqueness of solution. Convexity? Explicit, closed-form solution? Choice of algorithm. Availability of derivatives. Starting matrix and convergence criterion. Practical behaviour. MIMS Nick Higham Nearest Correlation Matrix 8 / 33

Quick and Dirty Differentiation Analytic function f : R R, i = 1. Complex step approximation: E.g., h = 10 100. f (x) Im f(x + ih). h MIMS Nick Higham Nearest Correlation Matrix 9 / 33

Quick and Dirty Differentiation Analytic function f : R R, i = 1. Complex step approximation: E.g., h = 10 100. f (x) Im f(x + ih). h f f(x + ih) (x) = Im + O(h 2 ), h f(x) = Re f(x + ih) + O(h 2 ). See MIMS EPrint 2009.31, April 2009. MIMS Nick Higham Nearest Correlation Matrix 9 / 33

Alternating Projections Method H (2002): repeatedly project onto the positive semidefinite matrices then the unit diagonal matrices. S 1 S 2 Easy to implement. Guaranteed convergence, at a linear rate. Can add further constraints/projections, e.g., fixed elements (Lucas, 2001). MIMS Nick Higham Nearest Correlation Matrix 10 / 33

Newton Method Qi & Sun (2006): Newton method based on theory of strongly semismooth matrix functions. Applies Newton to dual (unconstrained) of min 1 2 A X 2 F problem. Globally and quadratically convergent. H & Borsdorf (2007) improve efficiency and reliability by using appropriate iterative method and eigensolver, preconditioning the Newton direction solve. The basis of G02AAF (nearest correlation matrix) in NAG Library Mark 22. MIMS Nick Higham Nearest Correlation Matrix 11 / 33

Factor Model ξ = X }{{} n k Since E(ξ) = 0, η }{{} k 1 + diag(f i ) } {{ } n n }{{} ε n 1, η,ε N(0, 1). cov(ξ) = E(ξξ T ) = XX T + F 2. Assume var(ξ i ) 1. Then k j=1 x 2 ij + f 2 ii = 1, so k j=1 x 2 ij 1, i = 1: n. Collateralized debt obligations (CDOs), multivariate time series. MIMS Nick Higham Nearest Correlation Matrix 12 / 33

Structured Correlation Matrix Yields correlation matrix of form C(X) = D + XX T = D + k j=1 x j x T j, D = diag(i XX T ), X = [x 1,...,x k ]. C(X) has k factor correlation matrix structure. MIMS Nick Higham Nearest Correlation Matrix 13 / 33

Structured Correlation Matrix Yields correlation matrix of form C(X) = D + XX T = D + k j=1 x j x T j, D = diag(i XX T ), X = [x 1,...,x k ]. C(X) has k factor correlation matrix structure. 1 y1 Ty 2... y1 Ty n y C(X) = 1 T y 2 1........ yn 1 T y, y i R k. n y1 Ty n... yn 1 T y n 1 MIMS Nick Higham Nearest Correlation Matrix 13 / 33

Aims For k factor correlation matrices, investigate mathematical properties, nearness problem. MIMS Nick Higham Nearest Correlation Matrix 14 / 33

1-Parameter Correlation Matrix X(w) = 1 w w w 1 w w w 1, w R. Theorem min{ A X(w) F : X(w) a corr. matrix } has unique solution the projection of onto [ 1/(n 1), 1]. w = et Ae trace(a), n 2 n MIMS Nick Higham Nearest Correlation Matrix 15 / 33

Block Structured Correlation Matrix 1 γ 11 γ 12 γ 12 γ 11 1 γ 12 γ 12 γ 12 γ 12 1 γ 22 γ 12 γ 12 γ 22 1, C ij = { C(γii ) R n i n i, i = j, γ ij ee T R n i n j, i j. Objective function: f(γ) = A C(Γ) 2 F = m A ii C(γ ii ) 2 F + i j i=1 A ij γ ij ee T 2 F. Convex constraint set unique minimizer. Alternating projections converges. MIMS Nick Higham Nearest Correlation Matrix 16 / 33

1-Factor Correlation Matrix i.e., c ij = x i x j, i j. C(x) = diag(1 x 2 i ) + xx T, x R n Lemma det(c(x)) = n (1 xi 2 ) + i=1 n i=1 x 2 i n j=1 j i (1 x 2 j ). Corollary If x e with x i = 1 for at most one i then C(x) is nonsingular. C(x) is singular if x i = x j = 1 for some i j. MIMS Nick Higham Nearest Correlation Matrix 17 / 33

Rank Result Theorem C(x) = diag(1 x 2 i ) + xx T, Let x R n with x e. Then rank(c(x)) = min(p + 1, n), where p is the number of x i for which x i < 1. 1 1 1 x 4 x 5 1 1 1 x 4 x 5 x = [1 1 1 x 4 x 5 ] C(x) = 1 1 1 x 4 x 5 x 4 x 4 x 4 1 x 4 x 5 x 5 x 5 x 5 x 4 x 5 1. MIMS Nick Higham Nearest Correlation Matrix 18 / 33

One-Factor Problem min x R n f(x) := A C(x) 2 F subject to e x e. Objective function is nonconvex. The constraint implies C(x) is a correlation matrix. MIMS Nick Higham Nearest Correlation Matrix 19 / 33

One-Factor Problem: Derivatives Objective: f(x) = A I, A I F 2x T (A I)x + (x T x) 2 n i=1 x 4 i. Gradient: f(x) = 4((x T x)x (A I)x diag(x 2 i )x). Hessian: 2 f(x) = 4(2xx T + (x T x + 1)I A 3diag(x 2 i )). f(x), 2 f(x) cheap. f(x) has a saddle point at x = 0. MIMS Nick Higham Nearest Correlation Matrix 20 / 33

Case 1: f(x ) = 0 If f(x ) = 0 then 2 f(x ) has the form H n (x) = (x T x)i + xx T 2D 2, x R n. For example, for n = 4: x2 2 + x 3 2 + x 4 2 x 1 x 2 x 1 x 3 x 1 x 4 x 2 x 1 x1 2 + x 3 2 + x 4 2 x 2 x 3 x 2 x 4 x 3 x 1 x 3 x 2 x1 2 + x 2 2 + x 4 2 x 3 x 4. x 4 x 1 x 4 x 2 x 4 x 3 x1 2 + x 2 2 + x 3 2 MIMS Nick Higham Nearest Correlation Matrix 21 / 33

Case 1: f(x ) = 0 If f(x ) = 0 then 2 f(x ) has the form H n (x) = (x T x)i + xx T 2D 2, x R n. For example, for n = 4: x2 2 + x 3 2 + x 4 2 x 1 x 2 x 1 x 3 x 1 x 4 x 2 x 1 x1 2 + x 3 2 + x 4 2 x 2 x 3 x 2 x 4 x 3 x 1 x 3 x 2 x1 2 + x 2 2 + x 4 2 x 3 x 4. x 4 x 1 x 4 x 2 x 4 x 3 x1 2 + x 2 2 + x 3 2 Theorem H n is positive semidefinite. Moreover, H n is nonsingular iff at least three of x 1, x 2,...,x n are nonzero. MIMS Nick Higham Nearest Correlation Matrix 21 / 33

Case 2: f(x ) > 0 Can write 1 4 2 f(x) = H n (x) + E n (x) where E n has the form 0 x 1 x 2 a 12 x 1 x 3 a 13 x 1 x 4 a 14 E 4 = x 2 x 1 a 21 0 x 2 x 3 a 23 x 2 x 4 a 24 x 3 x 1 a 31 x 3 x 2 a 32 0 x 3 x 4 a 34. x 4 x 1 a 41 x 4 x 2 a 42 x 4 x 3 a 43 0 Hence, if x i x j a ij is sufficiently small and H n positive definite a stationary point will be a local minimizer. MIMS Nick Higham Nearest Correlation Matrix 22 / 33

k Factor Problem C(X) := I diag(xx T ) + XX T with X R n k. Representation not unique! k j=1 x 2 ij 1 = C(X) is a correlation matrix. The k factor problem is min X R n k f(x) := A C(X) 2 F subject to k j=1 x 2 ij 1. MIMS Nick Higham Nearest Correlation Matrix 23 / 33

k Factor Problem: Derivatives Gradient f(x) = 4(X(X T X) AX + X diag(xx T )X) Hessian given implicitly, can be viewed as a matrix representation of the Fréchet derivative of f(x). MIMS Nick Higham Nearest Correlation Matrix 24 / 33

Choice of Optimization Method Derivatives available. Ignore the constraints? Starting matrix, convergence test? MIMS Nick Higham Nearest Correlation Matrix 25 / 33

Choice of Optimization Method Derivatives available. Ignore the constraints? Starting matrix, convergence test? Rich set of solvers in NAG Library, Mark 22: E04 - Minimizing or Maximizing a Function E05 - Global Optimization of a Function MATLAB Optimization toolbox. MIMS Nick Higham Nearest Correlation Matrix 25 / 33

Alternating Directions Hence f (x ij ) = 0 if f(x ij ) = const. + 2 q i x ij = Project x ij onto [ 1, 1]. Convergence guaranteed. ( a iq k ) 2 x is x qs. s=1 q i x qj (a iq ) s j x isx qs. q i x 2 qj Limit may not be feasible for k > 1. MIMS Nick Higham Nearest Correlation Matrix 26 / 33

Principal Factors Method Anderson, Sidenius & Basu (2003): with F(X) = I diag(xx T ), X i = argmin X R n k A F(X i 1 ) XX T F. Min obtained by eigendecomposition of A F(X i 1 ). Equivalent to alternating projections method for U := {W R n n : w ij = a ij for i j} convex, S := {W R n n : W = XX T for X R n k } nonconvex! Alt proj theory says no guarantee of convergence! Constraints ignored, so project final iterate onto them. MIMS Nick Higham Nearest Correlation Matrix 27 / 33

Spectral Projected Gradient Method Birgin, Martínez & Raydan (2000). To minimize f : R n R over convex set Ω: x k+1 = x k + α k d k. d k = P Ω (x k t k f(x k )) x k is descent direction, α k [ 1, 1] chosen through nonmonotone line search strategy. Promising since P Ω ( ) is cheap to compute. MIMS Nick Higham Nearest Correlation Matrix 28 / 33

Code in ACM TOMS (Alg 813).

Test Examples corr: gallery( randcorr,n) nrand: 1 2 (B + BT ) + diag(i B) with B [ 1, 1] n n such that λ min (B) < 0. Results averaged over 10 instances. AD: alternating directions. PFM: principal factors method. Nwt: e04lb of NAG Toolbox for MATLAB (modified Newton), bound constraints. SPGM: spectral projected gradient method. MIMS Nick Higham Nearest Correlation Matrix 30 / 33

Comparison: k = 1, n = 2000 tol= 10 3 tol= 10 6 t(sec.) #its f(x ) t(sec.) #its f(x ) corr, f(x 0 ) = 26.0 AD 3.3 5.2 26.0 3938 7282 26.0 PFM 68 1.1 26.0 827 18 26.0 Nwt 23 1.8 26.0 36 5.0 26.0 SPGM 9.8 5.2 26.0 638 760 26.0 nrand f(x 0 ) = 825.13 AD 3.8 7.2 815.79 3.4 10.0 815.79 PFM 22 3.0 815.81 19.0 4.0 815.81 Nwt 4167 1222 815.79 4312 1229 815.79 SPGM 9.4 7.2 815.79 11 9.6 815.79 MIMS Nick Higham Nearest Correlation Matrix 31 / 33

Comparison: k = 6, n = 1000 tol= 10 3 tol= 10 6 t(sec.) #its f(x ) t(sec.) #its f(x ) corr, f(x 0 ) = 18.5 AD 704 836 18.38 5060 5955 18.38 PFM 10 4.1 18.38 95 28.1 18.38 Nwt 167 52 18.38 280 68.2 18.38 SPGM 24 235 18.38 108 892 18.38 nrand, f(x 0 ) = 415 AD 8694 9816 421 1.13e4 1.28e4 414 PFM 10.1 6.0 421 9.8 10 420 Nwt 146 40.8 421 109 56 420 SPGM 122 1263 407 276 2925 407 MIMS Nick Higham Nearest Correlation Matrix 32 / 33

Conclusions Performance of methods depends on the problem type, the required tolerance, the problem size. Alternating directions good for k = 1, low accuracy. Principal factors method generally fast, but may not converge to feasible point. Spectral projected gradient method wins overall. Incorporating the constraints need not hurt performance. Exploit convexity when present. MIMS Nick Higham Nearest Correlation Matrix 33 / 33

References I L. Anderson, J. Sidenius, and S. Basu. All your hedges in one basket. Risk, pages 67 72, Nov. 2003. www.risk.net. J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMA J. Numer. Anal., 8:141 148, 1988. E. G. Birgin, J. M. Martínez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim., 10(4):1196 1211, 2000. MIMS Nick Higham Nearest Correlation Matrix 29 / 33

References II E. G. Birgin, J. M. Martínez, and M. Raydan. Algorithm 813: SPG Software for convex-constrained optimization. ACM Trans. Math. Software, 27(3):340 349, 2001. E. G. Birgin, J. M. Martínez, and M. Raydan. Spectral projected gradient methods. In C. A. Floudas and P. M. Pardalos, editors, Encyclopedia of Optimization, pages 3652 3659. Springer-Verlag, Berlin, second edition, 2009. P. Glasserman and S. Suchintabandid. Correlation expansions for CDO pricing. Journal of Banking & Finance, 31:1375 1398, 2007. MIMS Nick Higham Nearest Correlation Matrix 30 / 33

References III N. J. Higham. Computing the nearest correlation matrix A problem from finance. IMA J. Numer. Anal., 22(3):329 343, 2002. C. Lucas. Computing nearest covariance and correlation matrices. M.Sc. Thesis, University of Manchester, Manchester, England, Oct. 2001. MIMS Nick Higham Nearest Correlation Matrix 31 / 33

References IV H.-D. Qi and D. Sun. A quadratically convergent Newton method for computing the nearest correlation matrix. SIAM J. Matrix Anal. Appl., 28(2):360 385, 2006. P. Sonneveld, J. J. I. M. van Kan, X. Huang, and C. W. Oosterlee. Nonnegative matrix factorization of a correlation matrix. Linear Algebra Appl., 431:334 349, 2009. MIMS Nick Higham Nearest Correlation Matrix 32 / 33

References V A. Vandendorpe, N.-D. Ho, S. Vanduffel, and P. Van Dooren. On the parameterization of the CreditRisk + model for estimating credit portfolio risk. Insurance: Mathematics and Economics, 42(2): 736 745, 2008. MIMS Nick Higham Nearest Correlation Matrix 33 / 33