# S = {1, 2,..., n}. P (1, 1) P (1, 2)... P (1, n) P (2, 1) P (2, 2)... P (2, n) P = . P (n, 1) P (n, 2)... P (n, n)

Save this PDF as:

Size: px
Start display at page:

Download "S = {1, 2,..., n}. P (1, 1) P (1, 2)... P (1, n) P (2, 1) P (2, 2)... P (2, n) P = . P (n, 1) P (n, 2)... P (n, n)"

## Transcription

1 last revised: 26 January Markov Chains A Markov chain process is a simple type of stochastic process with many social science applications. We ll start with an abstract description before moving to analysis of short-run and long-run dynamics. This chapter also introduces one sociological application social mobility that will be pursued further in Chapter Description Consider a process that occupies one of n possible states each period. If the process is currently in state i, then it will occupy state j in the next period with probability P (i, j). Crucially, transition probabilities are determined entirely by the current state no further history dependence is permitted and these probabilities remain fixed over time. Under these conditions, this process is a Markov chain process, and the sequence of states generated over time is a Markov chain. To restate this somewhat more formally, suppose that the possible states of the process are given by the finite set S = {1, 2,..., n}. Let s t S denote the state occupied by the process in period t {0, 1, 2,...}. Further suppose that P rob(s t+1 = j s t = i) = P (i, j) where P (i, j) is parameter fixed for each pair of states (i, j) S S. As this notation makes explicit, the probability of moving to each state j in period t + 1 depends only on the state i occupied in period t. Given these assumptions, the process is a Markov chain process, and the sequence (s 0, s 1, s 2,...) is a Markov chain. The parameters of a Markov chain process can thus be summarized by a transition matrix, written as P (1, 1) P (1, 2)... P (1, n) P (2, 1) P (2, 2)... P (2, n) P = P (n, 1) P (n, 2)... P (n, n) Of course, because the elements of this matrix are interpreted as probabilities, they must be non-negative. That is, P (i, j) 0 for all i, j S. 1

2 Further, each row of P must be a probability vector, which requires that P (i, j) = 1 for all i S. j S Consequently, while the transition matrix has n 2 elements, the Markov chain process has only n(n 1) free parameters. To make this description more concrete, consider an example (drawn from Kemeny et al, 1966, p 195). A Markov process has 3 states, with the transition matrix P = /2 1/2 1/3 0 2/3 In words, if the process is currently in state 1, it always transitions to state 2 in the next period. If the process is in state 2, it remains in state 2 with probability 1/2, and transitions to state 3 with probability 1/2. Finally, if the process is in state 3, it remains in state 3 with probability 2/3, and moves to state 1 with probability 1/3. Beyond the matrix specification of the transition probabilities, it may also be helpful to visualize a Markov chain process using a transition diagram. Below is the transition diagram for the 3 3 transition matrix given above.. 1 1/ /3 1/2 3 2/3 The nodes of the transition diagram correspond to the possible states of the process (labeled in boldface); the directed edges indicate possible transitions between states. If a transition is possible from state i to state j, the directed edge from node i to node j is labeled with the probability P (i, j). A loop (an edge from some node to itself) indicates the possibility that the process continues to occupy the same state next period. By convention, no edge is drawn from node i to node j if P (i, j) = Analysis using a probability tree Having specified a Markov chain process, we might now tackle the following sorts of questions: If the process begins in state i in period 0, what is the probability that 2

3 the process will be in state j in period 1? in period 2? in period t? in the long run, as t becomes very large? We ll soon learn simple matrix methods for answering these questions. But it might be instructive first to approach these questions by drawing a probability tree diagram. The initial node of a probability tree (aptly called the root ) is given by the state i occupied in period 0. The first set of branches are given by the edges indicating possible transitions from state i. These edges are labeled with the probabilities P (i, j) just as in the transition diagram. From each new node each state j that the process may occupy in period 1 we may then draw another set of branches indicating possible transitions from that state. This creates another set of nodes every state j that the process may occupy in period 2 which would each sprout another set of branches. Continuing in this fashion, we could (at least in principle) construct a probability tree to describe the future of the Markov chain over the first t periods (for any number of periods t). A particular Markov chain generated by the process corresponds to a single path through the tree, starting at the initial (period 0) node and ending at one of the terminal (period t) nodes. To illustrate, let s return to the 3-state example above, and suppose the process starts in state 2 in period 0. The probability tree for the next 3 periods is shown below. 2 1/2 1/2 1/2 2 1/2 1/3 1/2 2 1/2 1/3 3 2/ /3 2/3 3 2/ period 0 period 1 period 2 period 3 3

4 Having assumed (arbitrarily) that the process started in state 2 in period 0, we could also construct similar diagrams with state 1 or state 3 as the initial node. (See Kemeny et al, 1966, p 196 for the diagram starting from state 1. I ll leave the diagram for state 3 as an exercise.) Using the probability tree diagram, we can now begin to answer the questions posed above. By assumption, the process starts in state 2 in period 0. Following one of the two branches from this initial node, the process then occupies either state 2 (with probability 1/2) or state 3 (with probability 1/2) in period 1. Of course, we could have obtained those probabilities directly from the second row of the transition matrix without the aid of the tree diagram. But for period 2, the diagram becomes more useful, showing that the process has reached state 1 with probability has reached state 2 with probability and has reached state 3 with probability (1/2)(1/3) = 1/6, (1/2)(1/2) = 1/4, (1/2)(1/2) + (1/2)(2/3) = 7/12. Importantly, as reflected in this last computation, there are two different paths that the process could have taken from the initial state to the destination state. In particular, the process could have transitioned from state 2 to state 3 to state 3 (with probability (1/2)(1/2) = 1/4) or it could have transitioned from state 2 to state 2 to state 3 (with probability (1/2)(2/3) = 1/3). To obtain the probability that the process has transitioned from state 2 to state 3 through any path, we add these two probabilities together (hence 1/4 + 1/3 = 7/12). It is also important to note that 1/6 + 1/4 + 7/12 = 1, which reflects the fact that the process must occupy some state in period 2. More formally, we say that the probability vector [ 1/6 1/4 7/12 ] constitutes a probability distribution over states of the process in period 2. In a similar way, we can use the tree diagram to obtain the probability distribution over states in period 3. The process has reached state 1 with probability (1/2)(1/2)(1/3) + (1/2)(2/3)(1/3) = 7/36, 4

5 has reached state 2 with probability and has reached state 3 with probability (1/2)(1/2)(1/2) + (1/2)(1/3)(1) = 7/24, (1/2)(1/2)(1/2) + (1/2)(1/2)(2/3) + (1/2)(2/3)(2/3) = 37/72. The computations reflect 2 possible paths (of length 3) from state 2 to state 1, 2 possible paths (of length 3) from state 2 to state 2, and 3 possible paths (of length 3) from state 2 to state Analysis using matrix algebra While probability trees offer an intuitive first approach for analyzing Markov chains, it is obvious that this method would quickly become impractical as the number of states (n) or time periods (t) becomes large. Fortunately, there is a simple matrix approach involving iterated multiplication of the transition matrix. Raising the matrix P to the power t, we obtain the matrix P t. Let element (i, j) of this matrix be denoted as P t (i, j). This element has the following interpretation: P t (i, j) is the probability that a process which begin in state i in period 0 will occupy state j in period t. Consequently, given initial state i, the probability distribution over states after t periods is given by the ith row of P t matrix. Having so quickly summarized the matrix approach, perhaps an illustration would be useful. Returning again to the 3-state example, we can use Matlab to determine the (probabilistic) future of the chain after 2, 3, 4, 5, 10, or 100 periods. >> P = [0 1 0; 0 1/2 1/2; 1/3 0 2/3] % transition matrix P = >> P^2 % period >> P^3 % period

6 >> P^4 % period >> P^5 % period >> P^10 % period >> P^100 % period To reconcile these computations with our probability tree analysis, which assumed that the process was initially in state 2, consider the second row of the P 2 and P 3 matrices. Consistent with our previous analysis of period 2, we find P 2 (2, 1) =.1667, P 2 (2, 2) =.25, and P 2 (2, 3) = And consistent with our analysis of period 3, we find P 3 (2, 1) =.1944, P 3 (2, 2) =.2917, and P 3 (2, 3) = But moving beyond the results of our probability tree analysis, we can learn much more from the preceding matrix computations. Suppose that the process was initially in state 1. To use the probability tree approach, we would need to draw a new tree (with state 1 as the root). But given these computations, we can simply inspect the first row of the matrix P t to determine the probability distribution over states in period t. Similarly, without drawing a new probability tree with state 3 as the root, we can inspect the third row of P t to determine the probability distribution over states in period t. Furthermore, while the probability tree method is impractical when the number of periods is large, it is trivial (using matrix algebra software) to compute probability distributions for any number of periods t. 6

7 Having seen how to use matrix algebra, we should also consider why this approach works. The argument is already implicit in our reconciliation of the matrix computations with the probability tree diagram. But restated somewhat more formally, the argument proceeds by induction. Consider element (i, j) of the P 2 matrix. Because this element is found by multiplying row i of P by column j of P, we obtain P 2 (i, j) = k S P (i, k)p (k, j). The kth term of this sum gives the probability that the process transitions from state i to state k to state j over the course of two periods. Summing over all k, we obtain the probability that the process moves from state i in period 0 to state j in period 2 through any intermediate state in period 1. Now consider element (i, j) of the P 3 matrix. Because P 3 = P 2 P, this element can be found by multiplying row i of P 2 by column j of P, and we thus obtain P 3 (i, j) = k S P 2 (i, k)p (k, j). The kth term in this sum gives the probability that the process transitions from state i in period 0 to state k in period 2 through any intermediate state, and then transitions from state k to state j. Summing over all k, we obtain the probability that the process moves from state i in period 0 to state j in period 3 through any intermediate states in periods 1 and 2. Because this argument could be extended to cover every subsequent period, we have established that P t (i, j) is the probability that the process moves from state i in period 0 to state j in period t through any intermediate states in periods 1 through t 1. Returning to the matrix computations, it is interesting (perhaps even surprising) to find that the rows of the P t matrix become more similar as t becomes large. We ll return to this convergence result below. But first, let s consider another example which introduces a first sociological application of Markov chains. 1.4 Social mobility Sociologists have long been interested in social mobility the transition of individuals between social classes defined on the basis of income or occupation. Some research has focused on intergenerational mobility from parent s class to child s class, while other research has examined intragenerational mobility over an individual s life course. Chapter 2 will explore this several aspects of this topic in greater depth. But here, we ll develop a simple hypothetical example. Consider a society with three social classes. Each individual may belong to the lower class (state 1), the middle class (state 2), or the upper class (state 3). Thus, the social class occupied by an individual in generation t may be denoted by s t {1, 2, 3}. Further suppose that each individual in generation t has exactly one child 7

8 in generation t + 1, who has exactly one child in generation t + 2, and so on. 1 Finally, suppose that intergenerational mobility is characterized by a (3 3) transition matrix which does not change over time. Under these conditions, a single family history the sequence of social classes (s 0, s 1, s 2,...) is a Markov chain. To offer a numerical example, suppose that intergenerational mobility is described by the transition matrix P = which may, in turn, be represented by the transition diagram below Thus, a child with a lower-class parent has a 60% chance of remaining in the lower class, has a 40% chance to rise to the middle class, and has no chance to reach the upper class. A child with a middle-class parent has a 30% chance of falling to the lower class, a 40% chance of remaining middle class, and a 40% chance of rising to the upper class. Finally, a child with an upper-class parent have no chance of falling to the lower class, has a 70% chance of falling to the middle class, and has a 30% chance of remaining in the upper class. While mobility across one generation can thus be read directly from the transition matrix, how can we compute the life chances of subsequent generations? That is, how does the parent s class affect the class attained by the grandchild, greatgrandchild, etc? Having assumed that social mobility is a Markov chain process, we can again obtain the answers through matrix algebra. >> P = [.6.4 0;.3.4.3; 0.7.3] P = Two assumptions are implicit here. First, we are assuming that fertility rates do not vary across social classes. Second, we are ignoring complications that arise because individuals actually have two parents (who might themselves have come from different social classes). To use demographers terminology, we are considering a one-sex (rather than two-sex ) model. While the present model offers a useful starting point, we will begin to address differential reproduction and two-sex models in Chapters 3 and xx. 8

9 >> P^ >> P^ >> P^ >> P^ >> P^ >> P^ Given the assumed transition matrix, it is impossible for the children of lower-class parents to transition directly to the upper class. But these computations show that grandchildren of these parents have a 12% of reaching the upper class, that greatgrandchildren have greater than 15% chance, and that in the long-run (after many generations) there s a nearly 20% chance. 9

10 1.5 The limiting distribution In both of the numerical examples we ve considered, the rows of the P t matrix become increasingly similar as t rises, and become identical as t becomes very large. This convergence result may be described in several ways. We may say that, after a Markov chain process has run for many periods, the current state of the chain does not depend on the initial state of the chain. Alternatively, we may say that the limiting distribution the probability distribution over states in the long run is independent of the initial state. To define the limiting distribution more precisely, suppose that the initial state of the chain is characterized by the row vector x 0. For instance, given a Markov process with 3 states, and assuming that the process begins in state 2, this vector is x 0 = [ ], which indicates that the process is initially in state 2 with probability 1 (and the other states with probability 0). Having specified the initial condition in this fashion, the probability distribution over states in period 1 is given by x 1 = x 0 P, the probability distribution in period 2 is given by x 2 = x 1 P = (x 0 P )P = x 0 P 2, and the probability distribution in period 3 is given by x 3 = x 2 P = (x 1 P )P = ((x 0 P )P )P = x 0 P 3. By induction, the probability distribution for period t is given by x t = x 0 P t. By definition, the limiting distribution is the probability distribution x t as t becomes very large (i.e., as t approaches the limit of infinity). In the examples we have seen, the limiting distribution does not depend on the initial condition x 0. Of course, because we have considered only two examples, we should not assume that this result will hold for every Markov chain process. Indeed, it turns out that this result is guaranteed only if the transition matrix satisifies the condition described in the following Definition. 2 A square, non-negative matrix A is primitive if and only if there exists some t 1 such that every element of A t is positive (i.e., A t (i, j) > 0 for all i, j). 2 Some authors notably Kemeny and Snell (1960) refer to a primitive matrix as a regular matrix. In any case, this condition should not be mistaken for irreducibility, a weaker condition that will be discussed in Chapter 7. 10

11 Given our matrix computations, it is easy to see that both of the transition matrices we ve considered are primitive. In the first example, all elements of the P t matrix are positive for every t 3. In the second (social mobility) example, all elements of the P t matrix are positive for every t 2. Having offered this definition, the convergence result can now be stated precisely as the following Theorem. 3 Consider a Markov chain process for which the transition matrix P is primitive, and the initial state of the chain is given by x 0 so that the probability distribution in period t is given by the vector x t = x 0 P t. As t becomes very large, x t converges to the unique probability vector x such that x = xp. Thus, if the transition matrix is primitive, the probability distribution x t converges to the limiting distribution x as t becomes large. Moreover, the limiting distribution x does not depend on the initial condition x Solving algebraically for the limiting distribution We have already seen how to solve numerically for the limiting distribution. Raising the (primitive) transition matrix P to some sufficiently high power t, every row of P t is equal to x. But it is useful to recognize that the condition x = xp is a simultaneous equation system that can also be solved algebraically. To illustrate, consider again the social mobility example. The limiting distribution is determined by the condition [ x(1) x(2) x(3) ] = [ x(1) x(2) x(3) ] along with the requirement that x is a probability vector. Reverting to high-school algebra, the matrix condition can be rewritten as x(1) = 0.6 x(1) x(2) x(2) = 0.4 x(1) x(2) x(3) x(3) = 0.3 x(2) x(3) and the probability vector requirement is x(1) + x(2) + x(3) = 1. 3 See Kemeny and Snell (1960, Chapter 4) for a restatement and proof of this theorem. This theorem can also be viewed as a special case of the Perron-Frobenius Theorem, which is the central result necessary for understanding the long-run dynamics of linear systems. See Chapter 3 for more discussion. 11

12 Using the first and third equations, we obtain x(1) = (3/4) x(2) x(3) = (3/7) x(2) and substitution into the probability vector condition yields We thus obtain (3/4) x(2) + x(2) + (3/7) x(2) = 1. x(1) = 21/ x(2) = 28/ x(3) = 12/ which is consistent with our numerical computations. This algebraic approach becomes essential when the elements of the transition matrix are specified symbolically (rather than numerically). For instance, consider the generic 2-state Markov chain process, with transition matrix [ ] 1 a a P =. b 1 b Without specifying the parameters a and b we cannot solve numerically for the limiting distribution. But proceeding algebraically, it is easy to verify that the limiting distribution is given by x = [ b/(a + b) a/(a + b) ]. Intuitively, an increase in the parameter a (holding b constant) implies that the system transitions more often from state 1 to state 2, and thus increases the probability of occupying state 2 (and decreases the probability of occupying state 1) in the long run. Conversely, an increase in b (holding a constant) increases the probability that the process occupies state 1 in the long run. 1.7 A macro-level interpretation Throughout this chapter, we have so far maintained a micro-level interpretation of Markov chain processes. For instance, in the social mobility example, we adopted the perspective of a particular individual, and considered how that individual s social class affected the life chances of his child and grandchild and subsequent descendents. Thus, the initial condition x 0 reflected the individual s class, and the probability vector x t reflected the probability distribution over classes for a particular descendent in generation t. But it is also possible to adopt a macro-level interpretation of this process. Given a large population in which each individual belongs to one of the social classes, 12

13 we may interpret x 0 (i) as the share of the population in class i in generation 0. For instance, setting x 0 = [ ], we are assuming that 20% of the population belongs to the lower class, that 30% belong to the middle class, and that 50% belong to the upper class. Assuming that each individual has one child, population dynamics are determined by the equation x t = x 0 P t and we know (from the preceding theorem) that the population distribution x t will converge to the limiting distribution x as t becomes large. To illustrate, consider population dynamics over the next 15 generations (computed using a for loop in Matlab). >> x = [.2.3.5]; P = [.6.4 0;.3.4.3; 0.7.3]; >> for t = 0:15; disp(x*p^t); end

14 Thus, for this example, the population distribution has converged to the limiting distribution in less than 15 generations. Given the micro-level interpretation, Markov chain processes are stochastic. Our knowledge of an individual s current class i in period 0 allows us to derive the probability that a descendent is in class i in period t. But obviously we cannot know the realization of the process the particular social class that will occupied in period t. In contrast, adopting the macro-level interpretation, Markov chain processes generate population dynamics that are deterministic. Intuitively, each individual in the population is associated with a different Markov chain, and we can ignore sampling variation because we are averaging across the realizations of many chains Further reading Bradley and Meek (1986, Chap 6) provide an informal introduction to Markov chains. The classic undergraduate text by Kemeny, Snell, and Thompson (1966) offers a somewhat more formal (but still very readable) introduction to Markov chains (see especially Chapters 4.13 and 5.7) as well as probability theory, matrix algebra, and other useful topics in discrete mathematics. See Kemeny and Snell (1960) for a more rigorous treatment of Markov chains. 4 More formally, by assuming that the size of the population approaches infinity, we can invoke the Law of Large Numbers (see, e.g., Kemeny and Snell, 1960, p 73). 14

### 7 Communication Classes

this version: 26 February 2009 7 Communication Classes Perhaps surprisingly, we can learn much about the long-run behavior of a Markov chain merely from the zero pattern of its transition matrix. In the

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

### MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

### Lecture 6. Inverse of Matrix

Lecture 6 Inverse of Matrix Recall that any linear system can be written as a matrix equation In one dimension case, ie, A is 1 1, then can be easily solved as A x b Ax b x b A 1 A b A 1 b provided that

### 6.3 Conditional Probability and Independence

222 CHAPTER 6. PROBABILITY 6.3 Conditional Probability and Independence Conditional Probability Two cubical dice each have a triangle painted on one side, a circle painted on two sides and a square painted

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

### Solution to Homework 2

Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if

### Formal Languages and Automata Theory - Regular Expressions and Finite Automata -

Formal Languages and Automata Theory - Regular Expressions and Finite Automata - Samarjit Chakraborty Computer Engineering and Networks Laboratory Swiss Federal Institute of Technology (ETH) Zürich March

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

### Solving Systems of Linear Equations

LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how

### Linear Dependence Tests

Linear Dependence Tests The book omits a few key tests for checking the linear dependence of vectors. These short notes discuss these tests, as well as the reasoning behind them. Our first test checks

### A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form

Section 1.3 Matrix Products A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form (scalar #1)(quantity #1) + (scalar #2)(quantity #2) +...

### DETERMINANTS. b 2. x 2

DETERMINANTS 1 Systems of two equations in two unknowns A system of two equations in two unknowns has the form a 11 x 1 + a 12 x 2 = b 1 a 21 x 1 + a 22 x 2 = b 2 This can be written more concisely in

### 2. Methods of Proof Types of Proofs. Suppose we wish to prove an implication p q. Here are some strategies we have available to try.

2. METHODS OF PROOF 69 2. Methods of Proof 2.1. Types of Proofs. Suppose we wish to prove an implication p q. Here are some strategies we have available to try. Trivial Proof: If we know q is true then

### Diagonal, Symmetric and Triangular Matrices

Contents 1 Diagonal, Symmetric Triangular Matrices 2 Diagonal Matrices 2.1 Products, Powers Inverses of Diagonal Matrices 2.1.1 Theorem (Powers of Matrices) 2.2 Multiplying Matrices on the Left Right by

### LS.6 Solution Matrices

LS.6 Solution Matrices In the literature, solutions to linear systems often are expressed using square matrices rather than vectors. You need to get used to the terminology. As before, we state the definitions

### Solving a System of Equations

11 Solving a System of Equations 11-1 Introduction The previous chapter has shown how to solve an algebraic equation with one variable. However, sometimes there is more than one unknown that must be determined

### 4.5 Linear Dependence and Linear Independence

4.5 Linear Dependence and Linear Independence 267 32. {v 1, v 2 }, where v 1, v 2 are collinear vectors in R 3. 33. Prove that if S and S are subsets of a vector space V such that S is a subset of S, then

### Notes on Determinant

ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without

### SYSTEMS OF EQUATIONS AND MATRICES WITH THE TI-89. by Joseph Collison

SYSTEMS OF EQUATIONS AND MATRICES WITH THE TI-89 by Joseph Collison Copyright 2000 by Joseph Collison All rights reserved Reproduction or translation of any part of this work beyond that permitted by Sections

### 3. Mathematical Induction

3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)

### Just the Factors, Ma am

1 Introduction Just the Factors, Ma am The purpose of this note is to find and study a method for determining and counting all the positive integer divisors of a positive integer Let N be a given positive

### Induction. Margaret M. Fleck. 10 October These notes cover mathematical induction and recursive definition

Induction Margaret M. Fleck 10 October 011 These notes cover mathematical induction and recursive definition 1 Introduction to induction At the start of the term, we saw the following formula for computing

### Elementary Number Theory We begin with a bit of elementary number theory, which is concerned

CONSTRUCTION OF THE FINITE FIELDS Z p S. R. DOTY Elementary Number Theory We begin with a bit of elementary number theory, which is concerned solely with questions about the set of integers Z = {0, ±1,

### The basic unit in matrix algebra is a matrix, generally expressed as: a 11 a 12. a 13 A = a 21 a 22 a 23

(copyright by Scott M Lynch, February 2003) Brief Matrix Algebra Review (Soc 504) Matrix algebra is a form of mathematics that allows compact notation for, and mathematical manipulation of, high-dimensional

### WRITING PROOFS. Christopher Heil Georgia Institute of Technology

WRITING PROOFS Christopher Heil Georgia Institute of Technology A theorem is just a statement of fact A proof of the theorem is a logical explanation of why the theorem is true Many theorems have this

### 11 Dynamic Programming

11 Dynamic Programming EXAMPLE 1 Dynamic programming is a useful mathematical technique for making a sequence of interrelated decisions. It provides a systematic procedure for determining the optimal combination

### Pythagorean Triples. Chapter 2. a 2 + b 2 = c 2

Chapter Pythagorean Triples The Pythagorean Theorem, that beloved formula of all high school geometry students, says that the sum of the squares of the sides of a right triangle equals the square of the

### . 0 1 10 2 100 11 1000 3 20 1 2 3 4 5 6 7 8 9

Introduction The purpose of this note is to find and study a method for determining and counting all the positive integer divisors of a positive integer Let N be a given positive integer We say d is a

### Introduction to Matrix Algebra I

Appendix A Introduction to Matrix Algebra I Today we will begin the course with a discussion of matrix algebra. Why are we studying this? We will use matrix algebra to derive the linear regression model

### Helpsheet. Giblin Eunson Library MATRIX ALGEBRA. library.unimelb.edu.au/libraries/bee. Use this sheet to help you:

Helpsheet Giblin Eunson Library ATRIX ALGEBRA Use this sheet to help you: Understand the basic concepts and definitions of matrix algebra Express a set of linear equations in matrix notation Evaluate determinants

### 4.6 Null Space, Column Space, Row Space

NULL SPACE, COLUMN SPACE, ROW SPACE Null Space, Column Space, Row Space In applications of linear algebra, subspaces of R n typically arise in one of two situations: ) as the set of solutions of a linear

### Circuits 1 M H Miller

Introduction to Graph Theory Introduction These notes are primarily a digression to provide general background remarks. The subject is an efficient procedure for the determination of voltages and currents

### MODULAR ARITHMETIC. a smallest member. It is equivalent to the Principle of Mathematical Induction.

MODULAR ARITHMETIC 1 Working With Integers The usual arithmetic operations of addition, subtraction and multiplication can be performed on integers, and the result is always another integer Division, on

### Math 313 Lecture #10 2.2: The Inverse of a Matrix

Math 1 Lecture #10 2.2: The Inverse of a Matrix Matrix algebra provides tools for creating many useful formulas just like real number algebra does. For example, a real number a is invertible if there is

### Homework 5 Solutions

Homework 5 Solutions 4.2: 2: a. 321 = 256 + 64 + 1 = (01000001) 2 b. 1023 = 512 + 256 + 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = (1111111111) 2. Note that this is 1 less than the next power of 2, 1024, which

### The Graphical Method: An Example

The Graphical Method: An Example Consider the following linear program: Maximize 4x 1 +3x 2 Subject to: 2x 1 +3x 2 6 (1) 3x 1 +2x 2 3 (2) 2x 2 5 (3) 2x 1 +x 2 4 (4) x 1, x 2 0, where, for ease of reference,

### DIFFERENTIABILITY OF COMPLEX FUNCTIONS. Contents

DIFFERENTIABILITY OF COMPLEX FUNCTIONS Contents 1. Limit definition of a derivative 1 2. Holomorphic functions, the Cauchy-Riemann equations 3 3. Differentiability of real functions 5 4. A sufficient condition

### 8 Square matrices continued: Determinants

8 Square matrices continued: Determinants 8. Introduction Determinants give us important information about square matrices, and, as we ll soon see, are essential for the computation of eigenvalues. You

### MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

### NODAL ANALYSIS. Circuits Nodal Analysis 1 M H Miller

NODAL ANALYSIS A branch of an electric circuit is a connection between two points in the circuit. In general a simple wire connection, i.e., a 'short-circuit', is not considered a branch since it is known

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According

### Sec 4.1 Vector Spaces and Subspaces

Sec 4. Vector Spaces and Subspaces Motivation Let S be the set of all solutions to the differential equation y + y =. Let T be the set of all 2 3 matrices with real entries. These two sets share many common

### CHAPTER 3. Methods of Proofs. 1. Logical Arguments and Formal Proofs

CHAPTER 3 Methods of Proofs 1. Logical Arguments and Formal Proofs 1.1. Basic Terminology. An axiom is a statement that is given to be true. A rule of inference is a logical rule that is used to deduce

### Chapter 3. Distribution Problems. 3.1 The idea of a distribution. 3.1.1 The twenty-fold way

Chapter 3 Distribution Problems 3.1 The idea of a distribution Many of the problems we solved in Chapter 1 may be thought of as problems of distributing objects (such as pieces of fruit or ping-pong balls)

### x if x 0, x if x < 0.

Chapter 3 Sequences In this chapter, we discuss sequences. We say what it means for a sequence to converge, and define the limit of a convergent sequence. We begin with some preliminary results about the

### Intro. to the Divide-and-Conquer Strategy via Merge Sort CMPSC 465 CLRS Sections 2.3, Intro. to and various parts of Chapter 4

Intro. to the Divide-and-Conquer Strategy via Merge Sort CMPSC 465 CLRS Sections 2.3, Intro. to and various parts of Chapter 4 I. Algorithm Design and Divide-and-Conquer There are various strategies we

### SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH

31 Kragujevac J. Math. 25 (2003) 31 49. SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH Kinkar Ch. Das Department of Mathematics, Indian Institute of Technology, Kharagpur 721302, W.B.,

### Student Outcomes. Lesson Notes. Classwork. Discussion (10 minutes)

NYS COMMON CORE MATHEMATICS CURRICULUM Lesson 5 8 Student Outcomes Students know the definition of a number raised to a negative exponent. Students simplify and write equivalent expressions that contain

### Markov random fields and Gibbs measures

Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).

### Mathematical Induction

Mathematical Induction In logic, we often want to prove that every member of an infinite set has some feature. E.g., we would like to show: N 1 : is a number 1 : has the feature Φ ( x)(n 1 x! 1 x) How

### Section 1.3 P 1 = 1 2. = 1 4 2 8. P n = 1 P 3 = Continuing in this fashion, it should seem reasonable that, for any n = 1, 2, 3,..., = 1 2 4.

Difference Equations to Differential Equations Section. The Sum of a Sequence This section considers the problem of adding together the terms of a sequence. Of course, this is a problem only if more than

### Row Echelon Form and Reduced Row Echelon Form

These notes closely follow the presentation of the material given in David C Lay s textbook Linear Algebra and its Applications (3rd edition) These notes are intended primarily for in-class presentation

### Full and Complete Binary Trees

Full and Complete Binary Trees Binary Tree Theorems 1 Here are two important types of binary trees. Note that the definitions, while similar, are logically independent. Definition: a binary tree T is full

### We seek a factorization of a square matrix A into the product of two matrices which yields an

LU Decompositions We seek a factorization of a square matrix A into the product of two matrices which yields an efficient method for solving the system where A is the coefficient matrix, x is our variable

### Sensitivity Analysis 3.1 AN EXAMPLE FOR ANALYSIS

Sensitivity Analysis 3 We have already been introduced to sensitivity analysis in Chapter via the geometry of a simple example. We saw that the values of the decision variables and those of the slack and

### MATH 4330/5330, Fourier Analysis Section 11, The Discrete Fourier Transform

MATH 433/533, Fourier Analysis Section 11, The Discrete Fourier Transform Now, instead of considering functions defined on a continuous domain, like the interval [, 1) or the whole real line R, we wish

### Lecture Notes: Matrix Inverse. 1 Inverse Definition

Lecture Notes: Matrix Inverse Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk Inverse Definition We use I to represent identity matrices,

### Operation Count; Numerical Linear Algebra

10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point

### Mathematical Induction

Mathematical Induction (Handout March 8, 01) The Principle of Mathematical Induction provides a means to prove infinitely many statements all at once The principle is logical rather than strictly mathematical,

### Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.

Chapter 7 Eigenvalues and Eigenvectors In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigenvectors of matrices, concepts that were already introduced in Geometry

### a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.

Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given

### SYSTEMS OF EQUATIONS

SYSTEMS OF EQUATIONS 1. Examples of systems of equations Here are some examples of systems of equations. Each system has a number of equations and a number (not necessarily the same) of variables for which

### Block Diagram Reduction

Appendix W Block Diagram Reduction W.3 4Mason s Rule and the Signal-Flow Graph A compact alternative notation to the block diagram is given by the signal- ow graph introduced Signal- ow by S. J. Mason

### LINEAR SYSTEMS. Consider the following example of a linear system:

LINEAR SYSTEMS Consider the following example of a linear system: Its unique solution is x +2x 2 +3x 3 = 5 x + x 3 = 3 3x + x 2 +3x 3 = 3 x =, x 2 =0, x 3 = 2 In general we want to solve n equations in

### Math 4310 Handout - Quotient Vector Spaces

Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable

### 1 Solving LPs: The Simplex Algorithm of George Dantzig

Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.

### Math212a1010 Lebesgue measure.

Math212a1010 Lebesgue measure. October 19, 2010 Today s lecture will be devoted to Lebesgue measure, a creation of Henri Lebesgue, in his thesis, one of the most famous theses in the history of mathematics.

### B such that AB = I and BA = I. (We say B is an inverse of A.) Definition A square matrix A is invertible (or nonsingular) if matrix

Matrix inverses Recall... Definition A square matrix A is invertible (or nonsingular) if matrix B such that AB = and BA =. (We say B is an inverse of A.) Remark Not all square matrices are invertible.

### Chapter 11 Number Theory

Chapter 11 Number Theory Number theory is one of the oldest branches of mathematics. For many years people who studied number theory delighted in its pure nature because there were few practical applications

### The Characteristic Polynomial

Physics 116A Winter 2011 The Characteristic Polynomial 1 Coefficients of the characteristic polynomial Consider the eigenvalue problem for an n n matrix A, A v = λ v, v 0 (1) The solution to this problem

### 3 Does the Simplex Algorithm Work?

Does the Simplex Algorithm Work? In this section we carefully examine the simplex algorithm introduced in the previous chapter. Our goal is to either prove that it works, or to determine those circumstances

### Lectures notes on orthogonal matrices (with exercises) 92.222 - Linear Algebra II - Spring 2004 by D. Klain

Lectures notes on orthogonal matrices (with exercises) 92.222 - Linear Algebra II - Spring 2004 by D. Klain 1. Orthogonal matrices and orthonormal sets An n n real-valued matrix A is said to be an orthogonal

### Mathematics Course 111: Algebra I Part IV: Vector Spaces

Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are

### 10.3 POWER METHOD FOR APPROXIMATING EIGENVALUES

58 CHAPTER NUMERICAL METHODS. POWER METHOD FOR APPROXIMATING EIGENVALUES In Chapter 7 you saw that the eigenvalues of an n n matrix A are obtained by solving its characteristic equation n c nn c nn...

### Recursive Algorithms. Recursion. Motivating Example Factorial Recall the factorial function. { 1 if n = 1 n! = n (n 1)! if n > 1

Recursion Slides by Christopher M Bourke Instructor: Berthe Y Choueiry Fall 007 Computer Science & Engineering 35 Introduction to Discrete Mathematics Sections 71-7 of Rosen cse35@cseunledu Recursive Algorithms

### Linear Programming Notes V Problem Transformations

Linear Programming Notes V Problem Transformations 1 Introduction Any linear programming problem can be rewritten in either of two standard forms. In the first form, the objective is to maximize, the material

### UNCOUPLING THE PERRON EIGENVECTOR PROBLEM

UNCOUPLING THE PERRON EIGENVECTOR PROBLEM Carl D Meyer INTRODUCTION Foranonnegative irreducible matrix m m with spectral radius ρ,afundamental problem concerns the determination of the unique normalized

### 5.3 Determinants and Cramer s Rule

290 5.3 Determinants and Cramer s Rule Unique Solution of a 2 2 System The 2 2 system (1) ax + by = e, cx + dy = f, has a unique solution provided = ad bc is nonzero, in which case the solution is given

### Continuity of the Perron Root

Linear and Multilinear Algebra http://dx.doi.org/10.1080/03081087.2014.934233 ArXiv: 1407.7564 (http://arxiv.org/abs/1407.7564) Continuity of the Perron Root Carl D. Meyer Department of Mathematics, North

### 4. MATRICES Matrices

4. MATRICES 170 4. Matrices 4.1. Definitions. Definition 4.1.1. A matrix is a rectangular array of numbers. A matrix with m rows and n columns is said to have dimension m n and may be represented as follows:

### An Interesting Way to Combine Numbers

An Interesting Way to Combine Numbers Joshua Zucker and Tom Davis November 28, 2007 Abstract This exercise can be used for middle school students and older. The original problem seems almost impossibly

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2 Proofs Intuitively, the concept of proof should already be familiar We all like to assert things, and few of us

### 3.4 Complex Zeros and the Fundamental Theorem of Algebra

86 Polynomial Functions.4 Complex Zeros and the Fundamental Theorem of Algebra In Section., we were focused on finding the real zeros of a polynomial function. In this section, we expand our horizons and

### Real Roots of Univariate Polynomials with Real Coefficients

Real Roots of Univariate Polynomials with Real Coefficients mostly written by Christina Hewitt March 22, 2012 1 Introduction Polynomial equations are used throughout mathematics. When solving polynomials

### Introduction to Matrix Algebra

Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

### 1. LINEAR EQUATIONS. A linear equation in n unknowns x 1, x 2,, x n is an equation of the form

1. LINEAR EQUATIONS A linear equation in n unknowns x 1, x 2,, x n is an equation of the form a 1 x 1 + a 2 x 2 + + a n x n = b, where a 1, a 2,..., a n, b are given real numbers. For example, with x and

### LEARNING OBJECTIVES FOR THIS CHAPTER

CHAPTER 2 American mathematician Paul Halmos (1916 2006), who in 1942 published the first modern linear algebra book. The title of Halmos s book was the same as the title of this chapter. Finite-Dimensional

### Lecture 1: Course overview, circuits, and formulas

Lecture 1: Course overview, circuits, and formulas Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: John Kim, Ben Lund 1 Course Information Swastik

### Lecture L3 - Vectors, Matrices and Coordinate Transformations

S. Widnall 16.07 Dynamics Fall 2009 Lecture notes based on J. Peraire Version 2.0 Lecture L3 - Vectors, Matrices and Coordinate Transformations By using vectors and defining appropriate operations between

### 1 Review of Least Squares Solutions to Overdetermined Systems

cs4: introduction to numerical analysis /9/0 Lecture 7: Rectangular Systems and Numerical Integration Instructor: Professor Amos Ron Scribes: Mark Cowlishaw, Nathanael Fillmore Review of Least Squares

### Duality in Linear Programming

Duality in Linear Programming 4 In the preceding chapter on sensitivity analysis, we saw that the shadow-price interpretation of the optimal simplex multipliers is a very useful concept. First, these shadow

### Spring 2007 Math 510 Hints for practice problems

Spring 2007 Math 510 Hints for practice problems Section 1 Imagine a prison consisting of 4 cells arranged like the squares of an -chessboard There are doors between all adjacent cells A prisoner in one

### The Determinant: a Means to Calculate Volume

The Determinant: a Means to Calculate Volume Bo Peng August 20, 2007 Abstract This paper gives a definition of the determinant and lists many of its well-known properties Volumes of parallelepipeds are

### Vector Spaces II: Finite Dimensional Linear Algebra 1

John Nachbar September 2, 2014 Vector Spaces II: Finite Dimensional Linear Algebra 1 1 Definitions and Basic Theorems. For basic properties and notation for R N, see the notes Vector Spaces I. Definition

### Basic Proof Techniques

Basic Proof Techniques David Ferry dsf43@truman.edu September 13, 010 1 Four Fundamental Proof Techniques When one wishes to prove the statement P Q there are four fundamental approaches. This document

### Factorizations: Searching for Factor Strings

" 1 Factorizations: Searching for Factor Strings Some numbers can be written as the product of several different pairs of factors. For example, can be written as 1, 0,, 0, and. It is also possible to write