COMPSCI 369
THE UNIVERSITY OF AUCKLAND
FIRST SEMESTER, 2011
MID-SEMESTER TEST
Campus: City

COMPUTER SCIENCE
Computational Science
(Time allowed: 50 minutes)

NOTE: Attempt all questions. Use of calculators is NOT permitted. Put your answers in the answer boxes provided below each question. You may use the blank page at the end of the test script for scratch work, which will not be marked.

Section:         A    B    C    D    Total
Possible marks:  15   10   12   13   50
Awarded marks:

SURNAME:
FIRSTNAME:

Section A: Linear Systems, SVD, PCA (10 marks)

1. Which types of matrices appear in the LU decomposition, A = LU, and the QR decomposition, A = QR, of a square m × m matrix A? [2 marks]

   L — (m × m) lower-triangular matrix
   U — (m × m) upper-triangular matrix
   Q — (m × m) orthogonal (column-orthonormal) matrix
   R — (m × m) upper-triangular matrix

2. What decompositions are done by Gauss elimination and by Gram-Schmidt orthogonalisation? [2 marks]

   Gauss elimination: LU decomposition
   Gram-Schmidt orthogonalisation: QR decomposition

3. The Singular Value Decomposition (SVD) of an arbitrary m × n matrix A, m ≥ n, is represented by a product A = U D V^T of three matrices with special properties. In particular, U is an m × n column-orthogonal matrix: its n columns are the n top-rank eigenvectors u_1, ..., u_n of the m × m symmetric matrix A A^T, i.e. the only n eigenvectors that can have non-zero eigenvalues. Specify the properties of the matrices D and V. Hint: consider the eigenvectors and eigenvalues {(v_i, λ_i) : i = 1, ..., n} of the n × n matrix A^T A. [3 marks]

   V — n × n column-orthonormal matrix; its columns are the eigenvectors v_1, ..., v_n of the n × n symmetric matrix A^T A.
   D — n × n diagonal matrix with the singular values σ_i = √λ_i on the main diagonal.

4. Let A = [a_1, ..., a_n], m ≥ n, be an m × n matrix of n centred m × 1 measurement vectors a_i, i = 1, ..., n. Let u_1 and u_2 be the two principal components, i.e. the eigenvectors of the m × m covariance matrix of the measurements, Σ_n = (1/(n−1)) A A^T, with the largest eigenvalues. Write down the projection matrix P_2 which projects the measurement vectors (the columns of A) onto the principal subspace spanned by u_1 and u_2. [3 marks]

   P_2 = [u_1 u_2] [u_1 u_2]^T = u_1 u_1^T + u_2 u_2^T
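The following short numpy sketch numerically checks the SVD properties from Question 3 and the projector P_2 from Question 4; the matrix A here is an arbitrary random example, not data from the paper.

```python
# Hypothetical check of the Section A answers; A is an arbitrary random example.
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))                     # m x n with m >= n

# SVD: A = U D V^T, U (m x n) column-orthonormal, D diagonal, V (n x n) orthonormal.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U.T @ U, np.eye(n))              # columns of U are orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(n))            # columns of V are orthonormal
assert np.allclose(A, U @ np.diag(s) @ Vt)          # A = U D V^T
# Singular values are the square roots of the eigenvalues of A^T A.
assert np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A)))

# PCA projector onto the span of the two top eigenvectors of the covariance matrix.
Ac = A - A.mean(axis=1, keepdims=True)              # centre the measurement vectors
cov = Ac @ Ac.T / (n - 1)                           # Sigma_n = (1/(n-1)) A A^T
evals, evecs = np.linalg.eigh(cov)
U2 = evecs[:, np.argsort(evals)[::-1][:2]]          # u1, u2 (largest eigenvalues)
P2 = U2 @ U2.T                                      # P2 = u1 u1^T + u2 u2^T
assert np.allclose(P2 @ P2, P2)                     # P2 is idempotent (a projector)
```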

Section B: Root Finders, Optimisation, Least Squares (15 marks)

5. Explain how the root of the non-linear function g(x) = x^4 + 2x^2 − 25 is searched for with the modified Newton method starting from the first approximation x^[0] = 2, and derive the approximate root value x^[1] at the first step of the search. Hint: in this particular case the Jacobian J(x) of the function g(x) is the first derivative J(x) = dg(x)/dx at the point x. The non-modified Newton method chooses at each iteration t the root x^[t+1] of the linear approximation g_app:t(x) = g(x^[t]) + J(x^[t])(x − x^[t]), i.e. g_app:t(x^[t+1]) = 0. Recall what is different in the modified method.

   In the modified Newton method the Jacobian is computed only once, at the starting point, and reused at every iteration: J^[0] = 4(x^[0])^3 + 4x^[0] = 40. Each iteration chooses x^[t+1] such that g(x^[t]) + J^[0](x^[t+1] − x^[t]) = 0, i.e. x^[t+1] = x^[t] − (1/40) g(x^[t]). At the first step, g(x^[0]) = 2^4 + 2·2^2 − 25 = −1, so x^[1] = 2 − (1/40)·(−1) = 2.025. (A short numerical sketch of this iteration is given after Question 6.)

6. At each iteration of unconstrained gradient maximisation of a scalar bivariate function f(x, y), the gradient ∇f(x, y) = (∂f/∂x, ∂f/∂y) is computed at the current point (x_i, y_i), and the next point (x_{i+1}, y_{i+1}) is selected along the line L_i(t) = (x_i + t·u_i, y_i + t·v_i) through (x_i, y_i) in the gradient direction (u_i = ∂f/∂x and v_i = ∂f/∂y, both evaluated at (x_i, y_i)). Write down in general form the condition for selecting the next steepest-ascent point (x_{i+1}, y_{i+1}). [3 marks]

   t* = arg max_t f(x_i + t·u_i, y_i + t·v_i)
   x_{i+1} = x_i + t*·u_i
   y_{i+1} = y_i + t*·v_i
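A minimal Python sketch of the modified Newton iteration from Question 5, with the derivative frozen at the starting point; the function and starting value come from the question, while the number of iterations shown is arbitrary.

```python
# Modified Newton iteration for g(x) = x^4 + 2x^2 - 25, starting at x[0] = 2.
def g(x):
    return x**4 + 2 * x**2 - 25

def dg(x):
    return 4 * x**3 + 4 * x

x = 2.0
J0 = dg(x)                      # frozen Jacobian J[0] = 40
for t in range(5):              # a few iterations, chosen arbitrarily
    x = x - g(x) / J0           # x[t+1] = x[t] - g(x[t]) / J[0]
    print(t + 1, x)             # the first line prints 1 2.025, as derived above
```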

7. You have to maximise the function f(x, y) = x^3 − x^2·y + 4x·y^2 + y^3 subject to the constraint x^2 + y^2 = 4. Write down the Lagrangian L(x, y, λ) and derive the system of equations that gives all the stationary (i.e. maximum, minimum, and saddle) points of the Lagrangian. You need not solve this system.

   Lagrangian: L(x, y, λ) = x^3 − x^2·y + 4x·y^2 + y^3 − λ(x^2 + y^2 − 4)

   System of equations:
   ∂L/∂x = 3x^2 − 2xy + 4y^2 − 2λx = 0
   ∂L/∂y = −x^2 + 8xy + 3y^2 − 2λy = 0
   ∂L/∂λ = −(x^2 + y^2 − 4) = 0

8. You have to solve an overdetermined system Ax = a of m linear equations with an n × 1 vector x of unknowns, n ≤ m, provided that the m × n matrix A and the m × 1 vector a are known and the matrix A has n linearly independent columns. Derive the normal equation and the least-squares solution for this system. Hint: consider the residual error vector e_x = Ax − a and recall that the least-squares solution x* minimises the total squared error E(x) = ||e_x||^2 ≡ e_x^T e_x.

   The total squared error to be minimised is
   E(x) = (Ax − a)^T (Ax − a) = (x^T A^T − a^T)(Ax − a) = x^T A^T A x − 2 x^T A^T a + a^T a.
   The minimiser x* follows from the zero gradient, ∇_x E(x) = 0; that is, the normal least-squares equation is (A^T A) x = A^T a and the least-squares solution is x* = (A^T A)^{−1} A^T a.
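A short numpy check of the normal-equation derivation in Question 8; the matrix A and vector a below are arbitrary random examples, not data from the paper.

```python
# Hypothetical numerical check of the normal-equation least-squares solution.
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3
A = rng.standard_normal((m, n))                    # m x n, m > n, full column rank
a = rng.standard_normal(m)

x_star = np.linalg.solve(A.T @ A, A.T @ a)         # solve (A^T A) x = A^T a
x_ref, *_ = np.linalg.lstsq(A, a, rcond=None)      # library least-squares solution
assert np.allclose(x_star, x_ref)

# At the minimiser the residual e = A x* - a is orthogonal to the columns of A.
assert np.allclose(A.T @ (A @ x_star - a), 0)
```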

Section C: Probability Models, Finite Differences (12 marks)

9. You have two probability models for the scalar random variables x_1, x_2, x_3, x_4, x_5, x_6:
   (a) Pr(x_1, ..., x_6) = p_1(x_1) p_2(x_2|x_1) p_3(x_3|x_2) p_4(x_4|x_3) p_5(x_5|x_4) p_6(x_6|x_5) and
   (b) Pr(x_1, ..., x_6) = p_1(x_1) p_2(x_2) p_3(x_3) p_4(x_4|x_1, x_2) p_5(x_5|x_1, x_2, x_3) p_6(x_6|x_4, x_5)
   in terms of the known unconditional and conditional probability distributions p_j(x_j), j = 1, ..., 6; p_j(x_j|x_{j−1}), j = 2, ..., 6; p_4(x_4|x_1, x_2); p_6(x_6|x_4, x_5); and p_5(x_5|x_1, x_2, x_3). Name each model and draw a digraph showing the causal relationships between its variables. [5 marks]

   (a) Pr(x_1, ..., x_6) = p_1(x_1) p_2(x_2|x_1) p_3(x_3|x_2) p_4(x_4|x_3) p_5(x_5|x_4) p_6(x_6|x_5)
       Name of the model: Markov chain
       Digraph: x_1 → x_2 → x_3 → x_4 → x_5 → x_6

   (b) Pr(x_1, ..., x_6) = p_1(x_1) p_2(x_2) p_3(x_3) p_4(x_4|x_1, x_2) p_5(x_5|x_1, x_2, x_3) p_6(x_6|x_4, x_5)
       Name(s) of the model: Bayesian network, or belief network
       Digraph: edges x_1 → x_4, x_2 → x_4, x_1 → x_5, x_2 → x_5, x_3 → x_5, x_4 → x_6, x_5 → x_6

10. Which nodes in the digraph showing the causal relationships in the above probability model (9b) form the Markov blanket of the node for the variable x_4? Hint: the Markov blanket of a node is the set of its parents, children, and the co-parents of its children. [3 marks]

    The parents x_1 and x_2, the child x_6, and the co-parent x_5.
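A small Python sketch that recovers the Markov blanket of x_4 in model (9b) directly from the parent lists of the network; the dictionary below simply encodes the factorisation given above.

```python
# Markov blanket = parents + children + co-parents of children (model 9b).
parents = {
    "x1": [], "x2": [], "x3": [],
    "x4": ["x1", "x2"],
    "x5": ["x1", "x2", "x3"],
    "x6": ["x4", "x5"],
}

def markov_blanket(node):
    children = [v for v, ps in parents.items() if node in ps]
    coparents = {p for c in children for p in parents[c] if p != node}
    return set(parents[node]) | set(children) | coparents

print(sorted(markov_blanket("x4")))    # ['x1', 'x2', 'x5', 'x6']
```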

11. Assuming only integer values of x, derive the centred finite-difference approximation to the first derivative of the function u(x) = x^4 and compare it with the exact derivative du/dx at the point x = 2. Hint: recall that (a ± b)^4 = a^4 ± 4a^3·b + 6a^2·b^2 ± 4a·b^3 + b^4 and d(x^n)/dx = n·x^{n−1}.

    The centred difference approximation is (1/2)[(x + 1)^4 − (x − 1)^4] = 4x^3 + 4x. At the point x = 2 the exact first derivative 4x^3 equals 4·2^3 = 32, whereas the finite-difference approximation equals 32 + 8 = 40. (A one-line numerical check is given after Question 12.)

Section D: Dynamic Programming, Search Methods (13 marks)

12. Consider a collection C = {001, 011, 111, 110} of 4 strings of length 3 over the binary alphabet {0, 1}, which are the leaves of the tree T below. Using the leaf 110 as the root, perform the forward and backward passes of Fitch's dynamic programming algorithm to find the sets of best candidate strings and to pick the strings, respectively, for each of the two internal (parent) nodes; calculate the Hamming distances between the nodes; and find the optimum parsimony score for T. Hint: the forward pass computes a set of best candidate strings for each internal (non-leaf) node and the backward pass picks a label from the set for each internal node while computing an optimum parsimony score. Use the notation α[β:γ][δ:ε] for the set of the strings αβδ and αγε.

    [The tree T and the worked labelling are not reproduced in this transcription.]

    Optimum parsimony score = 3
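A one-line numerical check of the centred-difference result from Question 11, for u(x) = x^4 at x = 2 with a unit step.

```python
# Centred finite difference with unit step vs. the exact derivative of u(x) = x^4.
def u(x):
    return x**4

x = 2
centred = (u(x + 1) - u(x - 1)) / 2     # = 4x^3 + 4x
exact = 4 * x**3                        # du/dx
print(centred, exact)                   # 40.0 32
```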

13. Draw a weighted digraph for computing the edit distance D(C_1, C_2) between two strings, C_1 = fat and C_2 = part, assuming the unit weight δ = 1 for inserting or deleting a character; the unit weight α_{i,j} = 1 for substituting the j-th character c_{2:j} of C_2 for a different i-th character c_{1:i} of C_1, such that c_{2:j} ≠ c_{1:i}; and the zero weight α_{i,j} = 0 for substituting the same character, c_{2:j} = c_{1:i}. Show the unit and zero weights of the graph edges with solid and dashed lines, respectively. Specify the recurrent computation of the edit distance d_{i,j} = D(C_{1:[i]}, C_{2:[j]}) for transforming a substring C_{1:[i]} = c_{1:1} ... c_{1:i} into a substring C_{2:[j]} = c_{2:1} ... c_{2:j}. The indices i = 0 and j = 0 relate to the empty substrings, and i = 1, 2, 3 and j = 1, 2, 3, 4 correspond to the substrings ending on the characters c_{1:i} and c_{2:j}, respectively, e.g. C_{1:1} = f or C_{2:3} = par.

    [The weighted digraph is not reproduced in this transcription.]

    d_{i,j} = 0                                                             if i = 0 and j = 0
    d_{i,j} = d_{i−1,j} + 1                                                 if i > 0 and j = 0
    d_{i,j} = d_{i,j−1} + 1                                                 if i = 0 and j > 0
    d_{i,j} = min{ d_{i−1,j} + 1,  d_{i,j−1} + 1,  d_{i−1,j−1} + α_{i,j} }  if i > 0 and j > 0

    (A dynamic-programming sketch of this recurrence is given after Question 15.)

14. Which types of search through an implicit graph of possible solutions are performed by the backtracking and branch-and-bound methods? [2 marks]

    Backtracking: depth-first search (DFS)
    Branch-and-bound: breadth-first search (BFS)

15. You have to minimise a function f(x_1, x_2, ..., x_n) of discrete variables x_i ∈ {0, 1, ..., K}, which has, for any fixed subsequence (x_1, ..., x_j), j = 1, ..., n − 1, easily computable lower and upper bounds
    L(x_1, ..., x_j) ≤ f(x_1, ..., x_j, x_{j+1}, ..., x_n) ≤ U(x_1, ..., x_j)
    that do not depend on the remaining variables (x_{j+1}, ..., x_n). Explain briefly how these bounds can be used in the branch-and-bound search to prune the tree of all the K^n candidate solutions, i.e. to discard large subsets of fruitless candidates. [3 marks]

    If the lower bound for some tree node (set of candidates) n_j is greater than the upper bound for some other node n_k, then the node n_j and all of its descendants may be discarded.
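A minimal dynamic-programming sketch of the edit-distance recurrence from Question 13, applied to the strings from the question; this is a plain table-filling implementation, not the digraph form asked for in the exam.

```python
# Edit distance via the recurrence d[i][j] from Question 13.
def edit_distance(c1, c2):
    m, n = len(c1), len(c2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i                                 # delete i characters
    for j in range(1, n + 1):
        d[0][j] = j                                 # insert j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            alpha = 0 if c1[i - 1] == c2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + alpha)  # (mis)match
    return d[m][n]

print(edit_distance("fat", "part"))                 # 2
```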

(Blank page: will not be marked.)