Markov chains (cont'd)
6.867 Machine learning, lecture 19 (Jaakkola)

Lecture topics:
- Markov chains (cont'd)
- Hidden Markov Models

Markov chains (cont'd)

In the context of spectral clustering (last lecture) we discussed a random walk over the nodes induced by a weighted graph. Let $W_{ij} \ge 0$ be symmetric weights associated with the edges in the graph; $W_{ij} = 0$ whenever the edge doesn't exist. We also assumed that $W_{ii} = 0$ for all $i$. The graph defines a random walk where the probability of transitioning from node (state) $i$ to node $j$ is given by

\[ P(X(t+1) = j \mid X(t) = i) = P_{ij} = \frac{W_{ij}}{\sum_{j'} W_{ij'}} \tag{1} \]

Note that self-transitions (going from $i$ to $i$) are disallowed because $W_{ii} = 0$ for all $i$. We can understand the random walk as a homogeneous Markov chain: the probability of transitioning from $i$ to $j$ only depends on $i$, not on the path that took the process to $i$. In other words, the current state summarizes the past as far as future transitions are concerned. This is a Markov (conditional independence) property:

\[ P(X(t+1) = j \mid X(t) = i, X(t-1) = i_{t-1}, \ldots, X(1) = i_1) = P(X(t+1) = j \mid X(t) = i) \tag{2} \]

The term homogeneous specifies that the transition probabilities are independent of time (the same probabilities are used whenever the random walk returns to $i$). We also defined ergodicity as follows: a Markov chain is ergodic if there exists a finite $m$ such that

\[ P(X(t+m) = j \mid X(t) = i) > 0 \ \text{for all } i \text{ and } j \tag{3} \]

Simple weighted graphs need not define ergodic chains. Consider, for example, a weighted graph between two nodes 1 and 2 where $W_{12} > 0$. The resulting random walk is necessarily periodic: every other state would necessarily be 2, so the walk alternates between the two states. A Markov chain is ergodic only when all the states are communicating and the chain is aperiodic, which is clearly not the case here. Similarly, even a graph with positive weights on all the edges would not define an ergodic Markov chain. The reason here is
that $W_{ii} = 0$. By adding a positive self-transition, we can remove the periodicity (the random walk would stay in the same state a variable number of steps). Any connected weighted graph with positive weights and positive self-transitions gives rise to an ergodic Markov chain.

Our definition of the random walk so far is a bit incomplete: we did not specify how the process started, i.e., we didn't specify the initial state distribution. Let $q(i)$ be the probability that the random walk is started from state $i$. We will use $q$ as a vector of probabilities across states (reserving $n$ for the number of training examples as usual).

There are two ways of describing Markov chains: through state transition diagrams or as simple graphical models. The descriptions are complementary. A transition diagram is a directed graph over the possible states where the arcs between states specify all allowed transitions (those occurring with non-zero probability). See Figure 1 for examples. We could also add the initial state distribution as transitions from a dedicated initial (null) state (not shown in the figure).

Figure 1: Examples of transition diagrams defining non-ergodic Markov chains.

In graphical models, on the other hand, we focus on explicating variables and their dependencies. At each time point the random walk is in a particular state $X(t)$. This is a random variable. Its value is only affected by the random variable $X(t-1)$ specifying the state of the random walk at the previous time point. Graphically, we can therefore write a sequence of random variables where arcs specify how the values of the variables are influenced by others (dependent on others). More precisely, $X(t-1) \to X(t)$ means that the value of $X(t)$ depends on $X(t-1)$. Put another way, in simulating the random walk, we would have to know the value of $X(t-1)$ in order to sample a value for $X(t)$. The graphical model is shown in Figure 2.
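These constructions are easy to check numerically. The sketch below is a minimal illustration, not part of the notes: it builds the row-normalized transition matrix of equation (1) from a hypothetical 3-node weight matrix (the numbers are made up), and tests the ergodicity condition (3) by searching for a finite power of $P$ with all entries positive. It also reproduces the two-node example: the alternating chain is periodic, and adding positive self-transitions fixes it.

```python
import numpy as np

# Hypothetical symmetric weight matrix for a 3-node graph (zero diagonal, as in the notes).
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])

# Equation (1): P_ij = W_ij / sum_j' W_ij' (row normalization).
P = W / W.sum(axis=1, keepdims=True)
assert np.allclose(P.sum(axis=1), 1.0)  # each row is a distribution over next states

def is_ergodic(P, m_max=50):
    """Condition (3): some finite power of P has all entries strictly positive."""
    Pm = np.eye(len(P))
    for _ in range(m_max):
        Pm = Pm @ P
        if np.all(Pm > 0):
            return True
    return False

# Two-node chain with W_12 > 0: the walk alternates forever, so no power of P
# is all-positive -- periodic, hence not ergodic.
P_two = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
print(is_ergodic(P_two))    # False

# Adding positive self-transitions removes the periodicity.
P_lazy = 0.5 * np.eye(2) + 0.5 * P_two
print(is_ergodic(P_lazy))   # True
```

The `m_max` cutoff is only a practical bound for the sketch; for a finite chain a modest number of powers suffices to detect regularity.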
State prediction

We will cast the problem of calculating the predictive probabilities over states in a form
that will be useful for Hidden Markov Models later on.

Figure 2: Markov chain as a graphical model ($\cdots \to X(t-1) \to X(t) \to X(t+1) \to \cdots$).

Since

\[ P(X(t+m) = j \mid X(t) = i) = [P^m]_{ij} \tag{4} \]

we can also write for any $n$

\[ P(X(n) = j) = \sum_i q(i)\, P(X(n) = j \mid X(1) = i) = \sum_i q(i)\, [P^{n-1}]_{ij} \tag{5} \]

In vector form, $q^T P^{n-1}$ is a row vector whose $j$-th component is $P(X(n) = j)$. Note that the matrix products involve summing over all the intermediate states until $X(n) = j$. More explicitly, let's evaluate the sum over all the states $x_1, \ldots, x_n$ in matrix form:

\[ \sum_{x_1,\ldots,x_n} P(X(1) = x_1) \prod_{t=1}^{n-1} P(X(t+1) = x_{t+1} \mid X(t) = x_t) = q^T \overbrace{P P \cdots P}^{n-1 \text{ times}} \mathbf{1} = 1 \tag{6} \]

This is a sum over $k^n$ possible state configurations (settings of $x_1, \ldots, x_n$, where $k$ is the number of states) but can be easily performed in terms of matrix products. We can understand this in terms of recursive evaluation of $t$-step probabilities $\alpha_t(i) = P(X(t) = i)$. We will write $\alpha_t$ for the corresponding column vector so that

\[ q^T \overbrace{P P \cdots P}^{t-1 \text{ times}} = \alpha_t^T \tag{7} \]

Clearly,

\[ q^T = \alpha_1^T \tag{8} \]

\[ \alpha_{t-1}^T P = \alpha_t^T, \quad t > 1 \tag{9} \]

or, component-wise,

\[ \sum_i \alpha_{t-1}(i)\, P_{ij} = \alpha_t(j) \tag{10} \]
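The recursion (8)-(10) can be checked numerically. The sketch below uses a hypothetical 3-state transition matrix and initial distribution (the numbers are invented for illustration, not from the notes) and verifies that propagating $\alpha_t$ one step at a time agrees with the direct matrix power $q^T P^{t-1}$ of equation (7).

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to one) and initial distribution q.
P = np.array([[0.1, 0.6,  0.3],
              [0.4, 0.2,  0.4],
              [0.5, 0.25, 0.25]])
q = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty

# Equations (8)-(9): alpha_1 = q, then alpha_t^T = alpha_{t-1}^T P.
alpha = q.copy()
for t in range(2, 6):
    alpha = alpha @ P           # one step of the recursion

# Equation (7): alpha_5^T equals q^T P^4, and it remains a probability distribution.
assert np.allclose(alpha, q @ np.linalg.matrix_power(P, 4))
assert np.isclose(alpha.sum(), 1.0)
print(alpha)
```

The step-by-step recursion costs $O(k^2)$ per step rather than summing over the $k^n$ explicit state configurations of equation (6).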
Estimation

Markov models can be estimated easily from observed sequences of states. Given an observed sequence $x_1, \ldots, x_n$, the log-likelihood of the observed sequence is given by

\[ \log P(x_1,\ldots,x_n) = \log \left[ P(X(1) = x_1) \prod_{t=1}^{n-1} P(X(t+1) = x_{t+1} \mid X(t) = x_t) \right] \tag{11} \]

\[ = \log q(x_1) + \sum_{t=1}^{n-1} \log P_{x_t, x_{t+1}} \tag{12} \]

\[ = \log q(x_1) + \sum_{i,j} \hat n(i,j) \log P_{ij} \tag{13} \]

where $\hat n(i,j)$ is the number of observed transitions from $i$ to $j$ in the sequence $x_1, \ldots, x_n$. The resulting maximum likelihood setting of $P_{ij}$ is obtained as an empirical fraction

\[ \hat P_{ij} = \frac{\hat n(i,j)}{\sum_{j'} \hat n(i,j')} \tag{14} \]

Note that $q(i)$ can only be reliably estimated from multiple observed sequences. For example, based on a single sequence $x_1,\ldots,x_n$, we would simply set $\hat q(i) = \delta(i, x_1)$, which is hardly accurate (sample size one). Regularization is useful here, as before.

Hidden Markov Models

Hidden Markov Models (HMMs) extend Markov models by assuming that the states of the Markov chain are not observed directly, i.e., the Markov chain itself remains hidden. We therefore also model how the states relate to the actual observations. This assumption of a simple Markov model underlying a sequence of observations is very useful in many practical contexts and has made HMMs very popular models of sequence data, from speech recognition to bio-sequences. For example, to a first approximation, we may view words in speech as Markov sequences of phonemes. Phonemes are not observed directly, however, but have to be related to acoustic observations. Similarly, in modeling protein sequences (sequences of amino acid residues), we may, again approximately, describe a protein molecule as a Markov sequence of structural characteristics. The structural features are typically not observable, only the actual residues.

We can understand HMMs by combining mixture models and Markov models. Consider the simple example in Figure 3 over four discrete time points $t = 1, 2, 3, 4$. The figure summarizes multiple sequences of observations $y_1, \ldots, y_4$, where each observation sequence
corresponds to a single value $y_t$ per time point. Let's begin by ignoring the time information and instead collapse the observations across the time points. The observations form two clusters and are now well modeled by a two-component mixture:

\[ P(y) = \sum_{j=1}^{2} P(j)\, P(y \mid j) \tag{15} \]

where, e.g., $P(y \mid j)$ could be a Gaussian $N(y; \mu_j, \sigma_j^2)$. By collapsing the observations we are effectively modeling the data at each time point with the same mixture model. If we generate data from the resulting mixture model, we select a mixture component at random at each time step and generate the observation from the corresponding component (cluster). There's nothing that ties the selection of mixture components in time, so samples from the mixture yield phantom clusters at successive time points (we select the wrong component/cluster with equal probability). By omitting the time information, we therefore place half of the probability mass in locations with no data. Figure 4 illustrates the mixture model as a graphical model.

Figure 3: a) Example data over four time points, b) actual data and ranges of samples generated from a mixture model (red ovals) estimated without time information.

The solution is to model the selection of the mixture components as a Markov model, i.e., the component at $t = 2$ is selected on the basis of the component used at $t = 1$. Put another way, each state in the Markov model now uses one of the components in the mixture model to generate the corresponding observation. As a graphical model, the resulting model is a combination of the two, as shown in Figure 5.

Probability model

One advantage of representing the HMM as a graphical model is that we can easily write down the joint probability distribution over all the variables. The graph explicates how the
variables depend on each other (who influences whom) and thus highlights which conditional probabilities we need to write down:

\[ P(x_1,\ldots,x_n, y_1,\ldots,y_n) = P(x_1)\, P(y_1 \mid x_1)\, P(x_2 \mid x_1)\, P(y_2 \mid x_2) \cdots \tag{16} \]

\[ = P(x_1)\, P(y_1 \mid x_1) \prod_{t=1}^{n-1} \left[ P(x_{t+1} \mid x_t)\, P(y_{t+1} \mid x_{t+1}) \right] \tag{17} \]

\[ = q(x_1)\, P(y_1 \mid x_1) \prod_{t=1}^{n-1} \left[ P_{x_t, x_{t+1}}\, P(y_{t+1} \mid x_{t+1}) \right] \tag{18} \]

where we have used the same notation as before for the Markov chains.

Figure 4: A graphical model view of the mixture model over the four time points. The variables are indexed by time (different samples would be drawn at each time point) but the parameters are shared across the four time points. $X(t)$ refers to the selection of the mixture component while $Y(t)$ refers to the observations.

Figure 5: HMM as a graphical model. It is a Markov model where each state is associated with a distribution over observations. Alternatively, we can view it as a mixture model where the mixture components are selected in a time-dependent manner.

Three problems to solve

We typically have to be able to solve the following three problems in order to use these models effectively:
1. Evaluate the probability of observed data,
\[ P(y_1,\ldots,y_n) = \sum_{x_1,\ldots,x_n} P(x_1,\ldots,x_n, y_1,\ldots,y_n) \tag{19} \]

2. Find the most likely hidden state sequence $x_1^*, \ldots, x_n^*$ given observations $y_1, \ldots, y_n$, i.e.,
\[ \{x_1^*,\ldots,x_n^*\} = \arg\max_{x_1,\ldots,x_n} P(x_1,\ldots,x_n, y_1,\ldots,y_n) \tag{20} \]

3. Estimate the parameters of the model from multiple observation sequences $y_1^{(l)}, \ldots, y_n^{(l)}$, $l = 1, \ldots, L$.

Problem 1

As in the context of Markov chains, we can efficiently sum over the possible hidden state sequences. Here the summation means evaluating $P(y_1,\ldots,y_n)$. We will perform this in two ways depending on whether the recursion moves forward in time, computing $\alpha_t(j)$, or backward in time, evaluating $\beta_t(i)$. The only change from before is the fact that whatever state we happen to visit at time $t$, we will also have to generate the observation $y_t$ from that state. This additional requirement of generating the observations can be included via diagonal matrices

\[ D_y = \begin{bmatrix} P(y \mid 1) & & 0 \\ & \ddots & \\ 0 & & P(y \mid k) \end{bmatrix} \tag{21} \]

So, for example,

\[ q^T D_{y_1} \mathbf{1} = \sum_i q(i)\, P(y_1 \mid i) = P(y_1) \tag{22} \]

Similarly,

\[ q^T D_{y_1} P D_{y_2} \mathbf{1} = \sum_i q(i)\, P(y_1 \mid i) \sum_j P_{ij}\, P(y_2 \mid j) = P(y_1, y_2) \tag{23} \]

We can therefore write the forward and backward algorithms as methods that perform the matrix multiplications in

\[ q^T D_{y_1} P D_{y_2} P \cdots P D_{y_n} \mathbf{1} = P(y_1,\ldots,y_n) \tag{24} \]
either in the forward or the backward direction. In terms of the forward pass algorithm:

\[ q^T D_{y_1} = \alpha_1^T \tag{25} \]

\[ \alpha_{t-1}^T P D_{y_t} = \alpha_t^T, \quad \text{or equivalently} \tag{26} \]

\[ \left( \sum_i \alpha_{t-1}(i)\, P_{ij} \right) P(y_t \mid j) = \alpha_t(j) \tag{27} \]

These values are exactly $\alpha_t(j) = P(y_1,\ldots,y_t, X(t) = j)$ since we have generated all the observations up to and including $y_t$ and have summed over all the states except for the last one, $X(t)$. The backward pass algorithm is similarly defined:

\[ \beta_n = \mathbf{1} \tag{28} \]

\[ \beta_t = P D_{y_{t+1}} \beta_{t+1}, \quad \text{or equivalently} \tag{29} \]

\[ \beta_t(i) = \sum_j P_{ij}\, P(y_{t+1} \mid j)\, \beta_{t+1}(j) \tag{30} \]

In this case $\beta_t(i) = P(y_{t+1},\ldots,y_n \mid X(t) = i)$ since we have summed over all the possible values of the state variables $X(t+1), \ldots, X(n)$, starting from a fixed $X(t) = i$, and the first observation we have generated in the recursion is $y_{t+1}$. By combining the two recursions we can finally evaluate

\[ P(y_1,\ldots,y_n) = \alpha_t^T \beta_t = \sum_i \alpha_t(i)\, \beta_t(i) \tag{31} \]

which holds for any $t = 1,\ldots,n$. You can understand this result in two ways: either in terms of performing the remaining matrix multiplication corresponding to the two parts

\[ P(y_1,\ldots,y_n) = \underbrace{\left( q^T D_{y_1} P \cdots P D_{y_t} \right)}_{\alpha_t^T} \underbrace{\left( P D_{y_{t+1}} \cdots P D_{y_n} \mathbf{1} \right)}_{\beta_t} \tag{32} \]

or as an illustration of the Markov property:

\[ P(y_1,\ldots,y_n) = \sum_i \underbrace{P(y_1,\ldots,y_t, X(t) = i)}_{\alpha_t(i)} \underbrace{P(y_{t+1},\ldots,y_n \mid X(t) = i)}_{\beta_t(i)} \tag{33} \]
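The two recursions can be sketched directly in code. Below is a minimal illustration with a hypothetical two-state HMM (all numbers invented for illustration; `E[i, y]` plays the role of $P(y \mid i)$, so multiplying elementwise by `E[:, y]` is the same as multiplying by the diagonal matrix $D_y$). The check at the end confirms the key property (31): $\sum_i \alpha_t(i)\beta_t(i)$ gives the same likelihood for every $t$.

```python
import numpy as np

# Hypothetical 2-state HMM; the numbers are made up for illustration.
q = np.array([0.6, 0.4])            # initial distribution q(i)
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])          # transition matrix P_ij
E = np.array([[0.9, 0.1],
              [0.3, 0.7]])          # E[i, y] = P(y | state i)
ys = [0, 0, 1]                      # observed sequence y_1, ..., y_n

def forward(ys):
    """Equations (25)-(26): alpha_1^T = q^T D_{y1}; alpha_t^T = alpha_{t-1}^T P D_{yt}."""
    alphas = [q * E[:, ys[0]]]
    for y in ys[1:]:
        alphas.append((alphas[-1] @ P) * E[:, y])
    return alphas

def backward(ys):
    """Equations (28)-(29): beta_n = 1; beta_t = P D_{y_{t+1}} beta_{t+1}."""
    betas = [np.ones(len(q))]
    for y in reversed(ys[1:]):
        betas.insert(0, P @ (E[:, y] * betas[0]))
    return betas

alphas, betas = forward(ys), backward(ys)

# Equation (31): sum_i alpha_t(i) beta_t(i) is the same likelihood at every t.
likelihoods = [float(a @ b) for a, b in zip(alphas, betas)]
assert np.allclose(likelihoods, likelihoods[0])
print(likelihoods[0])
```

In practice the alphas are usually rescaled at each step to avoid numerical underflow on long sequences; the sketch omits that for clarity.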
Also, since $\beta_n(i) = 1$ for all $i$, clearly

\[ P(y_1,\ldots,y_n) = \sum_i \alpha_n(i) = \sum_i P(y_1,\ldots,y_n, X(n) = i) \tag{34} \]
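Finally, equation (24) can be evaluated literally as a chain of matrix products. The sketch below builds the diagonal matrices $D_y$ of equation (21) explicitly for a hypothetical two-state HMM (the parameter values are made up for illustration) and sums the final vector, which is exactly the $\sum_i \alpha_n(i)$ of equation (34).

```python
import numpy as np

# Hypothetical 2-state HMM; illustrative numbers only.
q = np.array([0.6, 0.4])            # initial distribution q(i)
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])          # transition matrix P_ij
E = np.array([[0.9, 0.1],
              [0.3, 0.7]])          # E[i, y] = P(y | state i)
ys = [0, 0, 1]                      # observation sequence y_1, ..., y_n

def D(y):
    """Diagonal matrix D_y of equation (21), with P(y | i) on the diagonal."""
    return np.diag(E[:, y])

# Equation (24): P(y_1..y_n) = q^T D_{y1} P D_{y2} ... P D_{yn} 1.
v = q @ D(ys[0])
for y in ys[1:]:
    v = v @ P @ D(y)
likelihood = v.sum()    # multiplying by the all-ones vector sums alpha_n(i), eq. (34)
print(likelihood)
```

Forming the diagonal matrices explicitly costs more than the elementwise forward recursion, but it makes the correspondence with the matrix-product notation of the notes transparent.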
More informationRecursive Algorithms. Recursion. Motivating Example Factorial Recall the factorial function. { 1 if n = 1 n! = n (n 1)! if n > 1
Recursion Slides by Christopher M Bourke Instructor: Berthe Y Choueiry Fall 007 Computer Science & Engineering 35 Introduction to Discrete Mathematics Sections 71-7 of Rosen cse35@cseunledu Recursive Algorithms
More informationAdvanced Software Test Design Techniques Use Cases
Advanced Software Test Design Techniques Use Cases Introduction The following is an excerpt from my recently-published book, Advanced Software Testing: Volume 1. This is a book for test analysts and test
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Clustering Algorithms K-means and its variants Hierarchical clustering
More informationCost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:
CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm
More informationUNCOUPLING THE PERRON EIGENVECTOR PROBLEM
UNCOUPLING THE PERRON EIGENVECTOR PROBLEM Carl D Meyer INTRODUCTION Foranonnegative irreducible matrix m m with spectral radius ρ,afundamental problem concerns the determination of the unique normalized
More informationNote on growth and growth accounting
CHAPTER 0 Note on growth and growth accounting 1. Growth and the growth rate In this section aspects of the mathematical concept of the rate of growth used in growth models and in the empirical analysis
More informationReading 13 : Finite State Automata and Regular Expressions
CS/Math 24: Introduction to Discrete Mathematics Fall 25 Reading 3 : Finite State Automata and Regular Expressions Instructors: Beck Hasti, Gautam Prakriya In this reading we study a mathematical model
More informationWhat is Linear Programming?
Chapter 1 What is Linear Programming? An optimization problem usually has three essential ingredients: a variable vector x consisting of a set of unknowns to be determined, an objective function of x to
More informationInner Product Spaces
Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and
More informationMatrix Multiplication
Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2016 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2016 1 / 32 Outline 1 Matrix operations Importance Dense and sparse
More informationC3: Functions. Learning objectives
CHAPTER C3: Functions Learning objectives After studing this chapter ou should: be familiar with the terms one-one and man-one mappings understand the terms domain and range for a mapping understand the
More information5 Directed acyclic graphs
5 Directed acyclic graphs (5.1) Introduction In many statistical studies we have prior knowledge about a temporal or causal ordering of the variables. In this chapter we will use directed graphs to incorporate
More information3. The Junction Tree Algorithms
A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )
More informationLecture L3 - Vectors, Matrices and Coordinate Transformations
S. Widnall 16.07 Dynamics Fall 2009 Lecture notes based on J. Peraire Version 2.0 Lecture L3 - Vectors, Matrices and Coordinate Transformations By using vectors and defining appropriate operations between
More informationCS 598CSC: Combinatorial Optimization Lecture date: 2/4/2010
CS 598CSC: Combinatorial Optimization Lecture date: /4/010 Instructor: Chandra Chekuri Scribe: David Morrison Gomory-Hu Trees (The work in this section closely follows [3]) Let G = (V, E) be an undirected
More informationSystems of Linear Equations
Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and
More information5 Homogeneous systems
5 Homogeneous systems Definition: A homogeneous (ho-mo-jeen -i-us) system of linear algebraic equations is one in which all the numbers on the right hand side are equal to : a x +... + a n x n =.. a m
More informationChapter 5 Discrete Probability Distribution. Learning objectives
Chapter 5 Discrete Probability Distribution Slide 1 Learning objectives 1. Understand random variables and probability distributions. 1.1. Distinguish discrete and continuous random variables. 2. Able
More information1 The Brownian bridge construction
The Brownian bridge construction The Brownian bridge construction is a way to build a Brownian motion path by successively adding finer scale detail. This construction leads to a relatively easy proof
More informationModeling and Performance Evaluation of Computer Systems Security Operation 1
Modeling and Performance Evaluation of Computer Systems Security Operation 1 D. Guster 2 St.Cloud State University 3 N.K. Krivulin 4 St.Petersburg State University 5 Abstract A model of computer system
More information(Refer Slide Time: 00:01:16 min)
Digital Computer Organization Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture No. # 04 CPU Design: Tirning & Control
More informationDynamic Programming 11.1 AN ELEMENTARY EXAMPLE
Dynamic Programming Dynamic programming is an optimization approach that transforms a complex problem into a sequence of simpler problems; its essential characteristic is the multistage nature of the optimization
More information1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
More information