10 Operation Count; Numerical Linear Algebra

10.1 Introduction

Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point operations are the dominant cost, then the computation time is proportional to the number of mathematical operations. Therefore, we should practice counting. For example, $a_0 + a_1 x + a_2 x^2$ involves two additions and three multiplications, because the square also requires a multiplication, but the equivalent formula $a_0 + (a_1 + a_2 x)x$ involves only two multiplications and two additions. More generally, $a_N x^N + \ldots + a_1 x + a_0$ involves $N$ additions and $N + (N-1) + \ldots + 1 = N(N+1)/2$ multiplications, but $(a_N x + \ldots + a_1)x + a_0$ only $N$ multiplications and $N$ additions. Both require $O(N)$ operations, but for large $N$ the first takes about $N^2/2$ flops, the latter $2N$. (Although the second form of polynomial evaluation is superior to the first in terms of the number of floating-point operations, in terms of roundoff it may be the other way round.)

Multiplying two $N \times N$ matrices obviously requires $N$ multiplications and $N-1$ additions for each element. Since there are $N^2$ elements in the matrix, this yields a total of $N^2(2N-1)$ floating-point operations, or about $2N^3$ for large $N$, that is, $O(N^3)$.

A precise definition of the order symbol $O$ (big-O notation) is in order here. A function is of order $x^p$ if there is a constant $c$ such that the absolute value of the function is no larger than $c x^p$ for sufficiently large $x$. For example, $2N^2 + 4N + \log(N) + 7 - 1/N$ is $O(N^2)$.
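To make the counting concrete, here is a minimal Python sketch (not from the text) that evaluates a polynomial both ways and tallies the operations; the function names and coefficient values are arbitrary illustrative choices.

```python
# A minimal sketch: evaluate a_0 + a_1 x + ... + a_N x^N the slow way,
# recomputing each power, and the fast nested (Horner) way.

def eval_naive(coeffs, x):
    """Term-by-term evaluation; about N(N+1)/2 multiplications and N additions."""
    total, mults, adds = coeffs[0], 0, 0
    for k in range(1, len(coeffs)):
        power = x
        for _ in range(k - 1):        # k-1 multiplications to form x^k
            power *= x
            mults += 1
        total += coeffs[k] * power    # one more multiplication and one addition
        mults += 1
        adds += 1
    return total, mults, adds

def eval_nested(coeffs, x):
    """Nested evaluation (...(a_N x + a_{N-1}) x + ...) x + a_0; N mults, N adds."""
    total, mults, adds = coeffs[-1], 0, 0
    for a in reversed(coeffs[:-1]):
        total = total * x + a         # one multiplication and one addition
        mults += 1
        adds += 1
    return total, mults, adds

coeffs = [7.0, -3.0, 2.0, 1.0, 0.5]   # a_0 ... a_4, arbitrary test values
print(eval_naive(coeffs, 1.1))        # same value, N(N+1)/2 = 10 multiplications
print(eval_nested(coeffs, 1.1))       # same value, N = 4 multiplications
```

Both functions return the same value; only the operation count differs.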

With this definition, a function that is $O(N^6)$ is also $O(N^7)$, but it is usually implied that the power is the lowest possible. The analogous definition is also applicable for small numbers, as in chapter 7. More generally, a function $f$ is of order $g(x)$ if $|f(x)| \le c\, g(x)$ for $x > x_0$.

An actual comparison of the relative speed of floating-point operations is given in table 8-I. According to that table, we do not need to distinguish between addition, subtraction, and multiplication, but divisions take somewhat longer.

10.2 Slow and fast method for the determinant of a matrix

It is easy to exceed the computational ability of even the most powerful computer. Hence methods are needed that solve a problem quickly. As a demonstration we calculate the determinant of a matrix. Doing these calculations by hand gives us a feel for the problem. Although we are ultimately interested in the $N \times N$ case, the following $3 \times 3$ matrix serves as an example:

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 2 & 1 & -1 \\ -1 & 2 & 5 \end{pmatrix}$$

One way to evaluate a determinant is the expansion into subdeterminants (cofactor expansion), according to which the determinant can be calculated in terms of the determinants of submatrices. Expansion along the first row yields $\det(A) = 1\cdot(5+2) - 1\cdot(10-1) + 0\cdot(4+1) = 7 - 9 = -2$. For a matrix of size $N$ this requires calculating $N$ subdeterminants, each of which in turn requires $N-1$ subdeterminants, and so on. Hence the number of necessary operations is $O(N!)$.
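As an illustration, here is a minimal Python sketch (not from the text) of this recursive expansion, applied to the example matrix; the function name and list-of-rows storage are arbitrary choices.

```python
# A minimal sketch of the recursive expansion into subdeterminants.
# The matrix is stored as a list of rows.

def det_expansion(M):
    """Determinant by expansion along the first row; cost grows like N!."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        # Submatrix with row 0 and column j removed.
        sub = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det_expansion(sub)
    return total

A = [[1, 1, 0],
     [2, 1, -1],
     [-1, 2, 5]]
print(det_expansion(A))   # prints -2.0
```

For each added row and column, the number of recursive calls grows by another factor, which is where the factorial comes from.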

A faster way of evaluating the determinant of a large matrix is to bring the matrix to upper triangular or lower triangular form by linear transformations. Appropriate linear transformations preserve the value of the determinant. The determinant is then the product of the diagonal elements, as is clear from the previous definition. For our example, the transformations (row2 − 2·row1) → row2 and (row3 + row1) → row3 yield zeros in the first column below the first matrix element. Then the transformation (row3 + 3·row2) → row3 yields zeros below the second element on the diagonal:

$$\begin{pmatrix} 1 & 1 & 0 \\ 2 & 1 & -1 \\ -1 & 2 & 5 \end{pmatrix} \to \begin{pmatrix} 1 & 1 & 0 \\ 0 & -1 & -1 \\ 0 & 3 & 5 \end{pmatrix} \to \begin{pmatrix} 1 & 1 & 0 \\ 0 & -1 & -1 \\ 0 & 0 & 2 \end{pmatrix}$$

Now the matrix is in triangular form and $\det(A) = 1 \cdot (-1) \cdot 2 = -2$. An $N \times N$ matrix requires $N$ such steps; each linear transformation involves adding a multiple of one row to another row, that is, $N$ or fewer additions and $N$ or fewer multiplications. Hence this is an $O(N^3)$ procedure. Therefore calculation of the determinant by bringing the matrix to upper triangular form is far more efficient than either of the previous two methods. For, say, $N = 10$, the change from $N!$ to $N^3$ means a speedup of very roughly a thousand. This enormous speedup is achieved through a better choice of numerical method.
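The same example, done the fast way: a small Python sketch (not from the text) that reduces the matrix to triangular form and multiplies the diagonal elements. It assumes no zero ever appears on the diagonal, which holds for this example; a general routine would add pivoting.

```python
# A minimal sketch of the determinant via reduction to upper triangular form.
# About 2N^3/3 flops; no pivoting, so it assumes nonzero diagonal elements.

def det_triangular(M):
    A = [row[:] for row in M]   # work on a copy
    n = len(A)
    det = 1.0
    for k in range(n):
        det *= A[k][k]
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]   # row_i - factor * row_k
    return det

A = [[1, 1, 0],
     [2, 1, -1],
     [-1, 2, 5]]
print(det_triangular(A))   # prints -2.0
```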

10.3 Non-intuitive operation counts in linear algebra

We all know how to solve a linear system of equations by hand, by extracting one variable at a time and repeatedly substituting it in all remaining equations, a method called Gauss elimination. This is essentially the same as what we did above in eliminating columns. The following symbolizes the procedure again on a 4×4 matrix:

    * * * *      * * * *      * * * *      * * * *
    * * * *        * * *        * * *        * * *
    * * * *        * * *          * *          * *
    * * * *        * * *          * *            *

Stars indicate nonzero elements and blank elements are zero. Eliminating the first column takes about $2N^2$ floating-point operations, the second column $2(N-1)^2$, the third column $2(N-2)^2$, and so on. This yields a total of about $2N^3/3$ floating-point operations. (One way to see that is to approximate the sum by an integral, and the integral of $N^2$ is $N^3/3$.)

Once triangular form is reached, the value of one variable is known and can be substituted into all other equations, and so on. These substitutions require only $O(N^2)$ operations. A count of $2N^3/3$ is less than the approximately $2N^3$ operations for matrix multiplication. Solving a linear system is faster than multiplying two matrices!

During Gauss elimination the right-hand side of a system of linear equations is transformed along with the matrix. Many right-hand sides can be transformed simultaneously, but they need to be known in advance. Inversion of a square matrix can be achieved by solving a system with $N$ different right-hand sides. Since the right-hand side(s) can be carried along in the transformation process, this is still $O(N^3)$. Given $Ax = b$, the solution $x = A^{-1}b$ can be obtained by multiplying the inverse of $A$ with $b$, but it is not necessary to invert a matrix to solve a linear system. Solving a linear system is faster than inverting a matrix, and inverting a matrix is faster than multiplying two matrices.

We have only considered efficiency. It is also desirable to avoid dividing by numbers that are too small, to optimize roundoff behavior, and to achieve parallel efficiency. Since the solution of linear systems is an important and ubiquitous application, all these issues have received detailed attention, and elaborate routines are available.

[Figure 10-1: Execution time of a program solving a system of N linear equations in N variables. For comparison, the dashed line shows an increase proportional to N^3. Both axes (time in seconds, problem size N) are logarithmic.]

Figure 10-1 shows the actual time of execution for a program that solves a linear system of $N$ equations in $N$ variables. First of all, note the tremendous computational power of computers: solving a linear system in 1000 variables, requiring about 600 million floating-point operations, takes less than one second. (Using my workstation bought around 2010 and Matlab, I can solve a linear system of about 3500 equations in one second. Matlab automatically took advantage of all 4 CPU cores.)
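As a rough illustration of these counts, the following NumPy sketch (not from the text) solves a random system directly and by explicit inversion; the problem size and, of course, the timings are machine-dependent and only indicative.

```python
# A small sketch of the point that inverting A is not needed to solve Ax = b.
# Actual timings depend on the machine and the underlying LAPACK library.

import time
import numpy as np

rng = np.random.default_rng(0)
N = 1000
A = rng.standard_normal((N, N))
b = rng.standard_normal(N)

t0 = time.perf_counter()
x_solve = np.linalg.solve(A, b)   # factorize and substitute, about 2N^3/3 flops
t1 = time.perf_counter()
x_inv = np.linalg.inv(A) @ b      # explicit inversion costs roughly three times as many flops
t2 = time.perf_counter()

print("solve:", t1 - t0, "s   inverse:", t2 - t1, "s")
print("max difference between the two solutions:", np.max(np.abs(x_solve - x_inv)))
```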

The increase in computation time with the number of variables is not as ideal as expected from the operation count, because time is required not only for arithmetic operations but also for data movement. For this particular implementation the execution time is larger when $N$ is a power of two. Other wiggles in the graph arise because the execution time is not exactly the same every time the program is run.

Table 10-I shows the operation count for several important problems. Operation count also goes under the name of computational cost or computational complexity (of an algorithm).

    problem                                                  operation count   memory count
    Solving a system of N linear equations in N variables    O(N^3)            O(N^2)
    Inversion of an N × N matrix                             O(N^3)            O(N^2)
    Inversion of a tridiagonal N × N matrix                  O(N)              O(N)
    Sorting N real numbers                                   O(N log N)        O(N)
    Fourier transform of N equidistant points                O(N log N)        O(N)

Table 10-I: Order counts and required memory for several important problems solved by standard algorithms.

A matrix where most elements are zero is called sparse. An example is a tridiagonal matrix, which has the following shape:

    * *
    * * *
      * * *
        * * *
            . . .
              * * *
                * *

A sparse system can be solved with appropriate algorithms much faster and with less storage than a full matrix. For example, a tridiagonal system of equations can be solved with $O(N)$ effort, not $O(N^2)$ but $O(N)$, as compared to $O(N^3)$ for a full matrix.
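The $O(N)$ count for tridiagonal systems can be realized with a forward sweep followed by back substitution (often called the Thomas algorithm). The Python sketch below is not from the text; it assumes a well-behaved (for example, diagonally dominant) system so that no pivoting is needed, and the test system is an arbitrary choice.

```python
# A minimal sketch of an O(N) solver for a tridiagonal system (Thomas algorithm).
# a, b, c are the sub-, main, and super-diagonals; no pivoting.

def solve_tridiagonal(a, b, c, d):
    """Solve the system with diagonals a (below), b (main), c (above) and rhs d."""
    n = len(b)
    cp = [0.0] * n   # modified super-diagonal
    dp = [0.0] * n   # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                       # forward sweep: O(N)
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution: O(N)
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Example: a 5x5 system with 2 on the diagonal and -1 on the off-diagonals.
n = 5
a = [0.0] + [-1.0] * (n - 1)      # a[0] is unused
b = [2.0] * n
c = [-1.0] * (n - 1) + [0.0]      # c[n-1] is unused
d = [1.0] * n
print(solve_tridiagonal(a, b, c, d))   # [2.5, 4.0, 4.5, 4.0, 2.5]
```

Only the three diagonals are stored, so the memory requirement is also $O(N)$, as in Table 10-I.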

The ratio of floating-point operations to bytes of main memory accessed is called the arithmetic intensity. Large problems with low arithmetic intensity are limited by memory bandwidth, while large problems with high arithmetic intensity are floating-point limited. For example, a matrix transpose has an arithmetic intensity of zero, matrix addition of double-precision numbers $1/(3 \times 8)$ (very low), and naive matrix multiplication $2N/24$ (high).

10.4 Data locality

Calculations may not be limited by FLOPs/second but by data movement; hence we may also want to keep track of and minimize access to main memory. For multiplication of two $N \times N$ matrices, simple row-times-column multiplication involves $2N$ memory pulls for each of the $N^2$ output elements. If the variables are 8-byte double-precision numbers, then the arithmetic intensity is $(2N-1)/(8(2N+1)) \approx 1/8$ FLOPs/byte. Hence the bottleneck may be memory bandwidth.
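For reference, the row-times-column variant whose memory traffic is estimated above looks like this in plain Python (purely illustrative; in practice one calls an optimized library routine):

```python
# Simple row-times-column matrix multiplication.

def matmul_naive(A, B):
    """C = A B for square matrices stored as lists of rows."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):            # reads row i of A and column j of B:
                s += A[i][k] * B[k][j]    # 2N memory pulls per output element
            C[i][j] = s
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_naive(A, B))   # [[19.0, 22.0], [43.0, 50.0]]
```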

Tiled matrix multiplication: Data traffic from main memory can be reduced by tiling the matrix. The idea is to divide the matrix into $m \times m$ tiles small enough that they fit into a fast (usually on-chip) memory. For each pair of input tiles, there are $2m^2$ movements out of and $m^2$ into the (slow) main memory. In addition, the partial sums, which have to be stored temporarily in main memory, have to be brought in again for all but the first block; that is about another $m^2$ of traffic per tile. For each output tile, we have to go through $N/m$ pairs of input tiles. In total, this amounts to $(N/m)^3 \times 4m^2 = 4N^3/m$ memory movements, or $4N/m$ for each matrix element, versus $2N$ for the naive method. This reduces data access to main memory and turns matrix multiplication into a truly floating-point limited operation.

There are enough highly optimized matrix multiplication routines around that we never have to write one on our own, but the same concept, grouping data to reduce traffic to main memory, can be applied to various problems with low arithmetic intensity.

Recommended Reading: Golub & Van Loan, Matrix Computations is a standard work on numerical linear algebra.
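A sketch of the tiled loop structure, written with NumPy for brevity (not from the text): the tile size m is an assumed parameter, and a genuine speedup requires a compiled implementation in which the m × m tiles actually fit into cache.

```python
# A sketch of tiled (blocked) matrix multiplication, to show the loop structure only.

import numpy as np

def matmul_tiled(A, B, m=64):
    """C = A B computed tile by tile; A, B are square with size divisible by m."""
    N = A.shape[0]
    C = np.zeros((N, N))
    for i in range(0, N, m):
        for j in range(0, N, m):
            for k in range(0, N, m):
                # One pair of input tiles plus the partial sum of the output tile.
                C[i:i+m, j:j+m] += A[i:i+m, k:k+m] @ B[k:k+m, j:j+m]
    return C

rng = np.random.default_rng(1)
A = rng.standard_normal((256, 256))
B = rng.standard_normal((256, 256))
print(np.allclose(matmul_tiled(A, B), A @ B))   # True
```

The innermost tile product touches only three m × m blocks at a time, which is exactly the grouping of data that reduces traffic to main memory.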