Hidden Markov Models

A Tutorial for the Course Computational Intelligence

Barbara Resch (modified by Erhard and Carline Rank)
Signal Processing and Speech Communication Laboratory
Inffeldgasse 16c

Abstract

This tutorial gives a gentle introduction to Markov models and hidden Markov models (HMMs) and relates them to their use in automatic speech recognition. Additionally, the Viterbi algorithm is considered, relating the most likely state sequence of a HMM to a given sequence of observations.

Usage

To make full use of this tutorial you should

1. Download the file HMM.zip, which contains this tutorial and the accompanying Matlab programs.
2. Unzip HMM.zip, which will generate a subdirectory named HMM/matlab where you can find all the Matlab programs.
3. Add the folder HMM/matlab and its subfolders to the Matlab search path with a command like addpath('C:\Work\HMM\matlab') if you are using a Windows machine, or addpath('/home/jack/hmm/matlab') if you are using a Unix/Linux machine.

Sources

This tutorial is based on "Markov Models and Hidden Markov Models - A Brief Tutorial" (International Computer Science Institute Technical Report) by Eric Fosler-Lussier, on the EPFL lab notes "Introduction to Hidden Markov Models" by Hervé Bourlard, Sacha Krstulović, and Mathew Magimai-Doss, and on the HMM-Toolbox (also included in the BayesNet Toolbox) for Matlab by Kevin Murphy.

1 Markov Models

Let's talk about the weather. Let's say in Graz there are three types of weather: sunny, rainy, and foggy. Let's assume for the moment that the weather lasts all day, i.e., it does not change from rainy to sunny in the middle of the day.

Weather prediction is about trying to guess what the weather will be like tomorrow based on the observations of the weather in the past (the history). Let's set up a statistical model for weather prediction:
We collect statistics on what the weather q_n is like today (on day n), depending on what the weather was like yesterday (q_{n-1}), the day before (q_{n-2}), and so forth. We want to find the following conditional probabilities:

P(q_n | q_{n-1}, q_{n-2}, ..., q_1),    (1)

meaning the probability of the unknown weather on day n, q_n ∈ {sunny, rainy, foggy}, depending on the (known) weather q_{n-1}, q_{n-2}, ... of the past days. Using the probability in eq. 1, we can make probabilistic predictions of the type of weather for tomorrow and the next days using the observations of the weather history. For example, if we knew that the weather for the past three days was {sunny, sunny, foggy} in chronological order, the probability that tomorrow would be rainy is given by:

P(q_4 = rainy | q_3 = foggy, q_2 = sunny, q_1 = sunny).    (2)

This probability could be inferred from the relative frequency (the statistics) of past observations of weather sequences {sunny, sunny, foggy, rainy}.

Here is one problem: the larger n is, the more observations we must collect. Suppose that n = 6; then we have to collect statistics for 3^(6-1) = 243 past histories. Therefore, we will make a simplifying assumption, called the Markov assumption: for a sequence {q_1, q_2, ..., q_n},

P(q_n | q_{n-1}, q_{n-2}, ..., q_1) = P(q_n | q_{n-1}).    (3)

This is called a first-order Markov assumption: we say that the probability of a certain observation at time n only depends on the observation q_{n-1} at time n-1. (A second-order Markov assumption would have the probability of an observation at time n depend on q_{n-1} and q_{n-2}. In general, when people talk about a Markov assumption, they usually mean the first-order Markov assumption.) A system for which eq. 3 is true is a (first-order) Markov model, and an output sequence {q_i} of such a system is a (first-order) Markov chain.

We can also express the probability of a certain sequence {q_1, q_2, ..., q_n} (the joint probability of certain past and current observations) using the Markov assumption:

P(q_1, ..., q_n) = ∏_{i=1}^{n} P(q_i | q_{i-1}).    (4)

(One question that comes to mind is: what is q_0? In general, one can think of q_0 as a start symbol, so P(q_1 | q_0) is the probability that a sequence starts with q_1. We can also just multiply a prior probability of q_1 with the product ∏_{i=2}^{n} P(q_i | q_{i-1}); it is just a matter of definitions.)

The Markov assumption has a profound effect on the number of histories that we have to find statistics for: we now only need 3 · 3 = 9 numbers (P(q_n | q_{n-1}) for every possible combination of q_n, q_{n-1} ∈ {sunny, rainy, foggy}) to characterize the probabilities of all possible sequences. The Markov assumption may or may not be a valid assumption depending on the situation (in the case of weather, it is probably not valid), but it is often used to simplify modeling.

So let's arbitrarily pick some numbers for P(q_tomorrow | q_today), as given in table 1 (note that, whatever the weather is today, there certainly is some kind of weather tomorrow, so the probabilities in every row of table 1 sum up to one).

                     Tomorrow's weather
Today's weather      sunny    rainy    foggy
sunny                0.8      0.05     0.15
rainy                0.2      0.6      0.2
foggy                0.2      0.3      0.5

Table 1: Probabilities P(q_{n+1} | q_n) of tomorrow's weather based on today's weather

For first-order Markov models, we can use these probabilities to draw a probabilistic finite state automaton. For the weather domain, we would have three states, S = {sunny, rainy, foggy}, and every day there would be a possibility P(q_n | q_{n-1}) of a transition to a (possibly different) state according to the probabilities in table 1. Such an automaton is shown in figure 1.

Figure 1: Markov model for the Graz weather, with the states sunny, rainy, and foggy and state transition probabilities according to table 1

Examples

1. Given that today the weather is sunny, what is the probability that tomorrow will be sunny and the day after rainy?
Using the Markov assumption and the probabilities in table 1, this translates into:

P(q_2 = sunny, q_3 = rainy | q_1 = sunny)
  = P(q_3 = rainy | q_2 = sunny, q_1 = sunny) · P(q_2 = sunny | q_1 = sunny)
  = P(q_3 = rainy | q_2 = sunny) · P(q_2 = sunny | q_1 = sunny)    (Markov assumption)
  = 0.05 · 0.8 = 0.04

You can also think about this as moving through the automaton (figure 1), multiplying the probabilities along the path you go.

2. Assume the weather yesterday was q_1 = rainy and today it is q_2 = foggy. What is the probability that tomorrow it will be q_3 = sunny?

P(q_3 = sunny | q_2 = foggy, q_1 = rainy) = P(q_3 = sunny | q_2 = foggy)    (Markov assumption)
  = 0.2

3. Given that the weather today is q_1 = sunny, what is the probability that it will be, say, rainy two days from now, q_3 = rainy? (Hint: there are several ways to get from today to two days from now. You have to sum over these paths.)
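These hand calculations are easy to check in Matlab. The following minimal sketch (plain Matlab, independent of the accompanying toolbox) stores the transition probabilities of table 1 in a matrix, with rows corresponding to today's weather and columns to tomorrow's weather in the order sunny, rainy, foggy, and reproduces examples 1 and 2; the product A*A, whose entries are the two-step transition probabilities, is exactly the sum over paths needed for example 3.

% Transition probabilities of table 1 (state order: 1 = sunny, 2 = rainy, 3 = foggy)
A = [0.8 0.05 0.15;
     0.2 0.60 0.20;
     0.2 0.30 0.50];

% Example 1: P(q2 = sunny, q3 = rainy | q1 = sunny) = P(sunny|sunny) * P(rainy|sunny)
p_ex1 = A(1,1) * A(1,2)      % 0.8 * 0.05 = 0.04

% Example 2: P(q3 = sunny | q2 = foggy) -- yesterday's weather is irrelevant
p_ex2 = A(3,1)               % 0.2

% Example 3 (hint): (A*A)(i,j) = P(q3 = s_j | q1 = s_i), summed over all intermediate states
A2 = A * A;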
2 Hidden Markov Models (HMMs)

So far we have heard of the Markov assumption and Markov models. So, what is a hidden Markov model? Well, suppose you were locked in a room for several days, and you were asked about the weather outside. The only piece of evidence you have is whether the person who comes into the room bringing your daily meal is carrying an umbrella or not.

Let's suppose the probabilities shown in table 2: the probability that your caretaker carries an umbrella is 0.1 if the weather is sunny, 0.8 if it is actually raining, and 0.3 if it is foggy.

Weather   Probability of umbrella
Sunny     0.1
Rainy     0.8
Foggy     0.3

Table 2: Probability P(x_i | q_i) of carrying an umbrella (x_i = true) based on the weather q_i on some day i

The equation for the weather Markov process before you were locked in the room was (eq. 4):

P(q_1, ..., q_n) = ∏_{i=1}^{n} P(q_i | q_{i-1}).

However, the actual weather is hidden from you. Finding the probability of a certain weather q_i ∈ {sunny, rainy, foggy} can only be based on the observation x_i, with x_i = umbrella if your caretaker brought an umbrella on day i, and x_i = no umbrella if the caretaker did not bring an umbrella. This conditional probability P(q_i | x_i) can be rewritten according to Bayes' rule:

P(q_i | x_i) = P(x_i | q_i) P(q_i) / P(x_i),

or, for n days, with weather sequence Q = {q_1, ..., q_n} and umbrella sequence X = {x_1, ..., x_n},

P(q_1, ..., q_n | x_1, ..., x_n) = P(x_1, ..., x_n | q_1, ..., q_n) P(q_1, ..., q_n) / P(x_1, ..., x_n),

using the probability P(q_1, ..., q_n) of a Markov weather sequence from above, and the probability P(x_1, ..., x_n) of seeing a particular sequence of umbrella events (e.g., {umbrella, umbrella, no umbrella}). The probability P(x_1, ..., x_n | q_1, ..., q_n) can be estimated as ∏_{i=1}^{n} P(x_i | q_i), if you assume that, for all i, the pair q_i, x_i is independent of all x_j and q_j for j ≠ i.

We want to draw conclusions from our observations (whether the person carries an umbrella or not) about the weather outside. We can therefore omit the probability of seeing an umbrella, P(x_1, ..., x_n), as it does not depend on the weather that we would like to predict. We obtain a measure that is proportional to the probability, and which we will refer to as the likelihood L:

P(q_1, ..., q_n | x_1, ..., x_n) ∝ L(q_1, ..., q_n | x_1, ..., x_n) = P(x_1, ..., x_n | q_1, ..., q_n) · P(q_1, ..., q_n)    (5)

With our (first-order) Markov assumption this turns into:

P(q_1, ..., q_n | x_1, ..., x_n) ∝ L(q_1, ..., q_n | x_1, ..., x_n) = ∏_{i=1}^{n} P(x_i | q_i) · ∏_{i=1}^{n} P(q_i | q_{i-1})    (6)

Examples

1. Suppose the day you were locked in it was sunny. The next day, the caretaker carried an umbrella into the room. You would like to know what the weather was like on this second day.
First we calculate the likelihood for the second day to be sunny:

L(q_2 = sunny | q_1 = sunny, x_2 = umbrella) = P(x_2 = umbrella | q_2 = sunny) · P(q_2 = sunny | q_1 = sunny)
  = 0.1 · 0.8 = 0.08,

then for the second day to be rainy:

L(q_2 = rainy | q_1 = sunny, x_2 = umbrella) = P(x_2 = umbrella | q_2 = rainy) · P(q_2 = rainy | q_1 = sunny)
  = 0.8 · 0.05 = 0.04,

and finally for the second day to be foggy:

L(q_2 = foggy | q_1 = sunny, x_2 = umbrella) = P(x_2 = umbrella | q_2 = foggy) · P(q_2 = foggy | q_1 = sunny)
  = 0.3 · 0.15 = 0.045.

Thus, although the caretaker did carry an umbrella, it is most likely that on the second day the weather was sunny.

2. Suppose you do not know what the weather was like when you were locked in. On the following three days the caretaker always comes without an umbrella. Calculate the likelihood for the weather on these three days to have been {q_1 = sunny, q_2 = foggy, q_3 = sunny}. As you do not know what the weather is like on the first day, you assume the 3 weather situations to be equi-probable on this day (cf. the remark on q_0 after eq. (4)), and the prior probability for sun on day one is therefore P(q_1 = sunny | q_0) = P(q_1 = sunny) = 1/3.

L(q_1 = sunny, q_2 = foggy, q_3 = sunny | x_1 = no umbrella, x_2 = no umbrella, x_3 = no umbrella)
  = P(x_1 = no umbrella | q_1 = sunny) · P(x_2 = no umbrella | q_2 = foggy) · P(x_3 = no umbrella | q_3 = sunny)
    · P(q_1 = sunny) · P(q_2 = foggy | q_1 = sunny) · P(q_3 = sunny | q_2 = foggy)
  = 0.9 · 0.7 · 0.9 · 1/3 · 0.15 · 0.2 ≈ 0.0057    (7)
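Again, these numbers can be reproduced with a few lines of plain Matlab (a minimal sketch, not using the toolbox): the likelihood of eq. (6) is just a product of entries of the transition matrix of table 1 and of an observation matrix built from table 2.

% State order: 1 = sunny, 2 = rainy, 3 = foggy; observation columns: 1 = umbrella, 2 = no umbrella
A = [0.8 0.05 0.15; 0.2 0.6 0.2; 0.2 0.3 0.5];   % transition probabilities (table 1)
B = [0.1 0.9; 0.8 0.2; 0.3 0.7];                 % observation probabilities (from table 2)

% Example 1: first day sunny, umbrella seen on day 2 -- likelihood of each weather on day 2
L_sunny = B(1,1) * A(1,1)    % 0.1 * 0.8  = 0.08
L_rainy = B(2,1) * A(1,2)    % 0.8 * 0.05 = 0.04
L_foggy = B(3,1) * A(1,3)    % 0.3 * 0.15 = 0.045

% Example 2: equi-probable first day, no umbrella on three days,
% candidate weather sequence {sunny, foggy, sunny}
L_ex2 = 1/3 * B(1,2) * B(3,2) * B(1,2) * A(1,3) * A(3,1)    % approx. 0.0057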
2.1 HMM Terminology

A HMM is specified by:

- The set of states S = {s_1, s_2, ..., s_{N_s}} (corresponding to the three possible weather conditions above), and a set of parameters Θ = {π, A, B}:

- The prior probabilities π_i = P(q_1 = s_i) are the probabilities of s_i being the first state of a state sequence. They are collected in a vector π. (The prior probabilities were assumed equi-probable in the last example, π_i = 1/N_s.)

- The transition probabilities are the probabilities to go from state i to state j: a_{i,j} = P(q_{n+1} = s_j | q_n = s_i). They are collected in the matrix A.

- The emission probabilities characterize the likelihood of a certain observation x if the model is in state s_i. Depending on the kind of observation x we have:
  - for discrete observations, x_n ∈ {v_1, ..., v_K}: b_{i,k} = P(x_n = v_k | q_n = s_i), the probabilities to observe v_k if the current state is q_n = s_i. The numbers b_{i,k} can be collected in a matrix B. (This would be the case for the weather model, with K = 2 possible observations v_1 = umbrella and v_2 = no umbrella.)
  - for continuous valued observations, e.g., x_n ∈ R^D: a set of functions b_i(x_n) = p(x_n | q_n = s_i) describing the probability densities (probability density functions, pdfs) over the observation space for the system being in state s_i, collected in the vector B(x) of functions. Emission pdfs are often parametrized, e.g., by mixtures of Gaussians.

The operation of a HMM is characterized by

- the (hidden) state sequence Q = {q_1, q_2, ..., q_N}, q_n ∈ S (the sequence of the weather conditions from day 1 to N), and
- the observation sequence X = {x_1, x_2, ..., x_N}.

A HMM allowing for transitions from any emitting state to any other emitting state is called an ergodic HMM. The other extreme, a HMM where the transitions only go from one state to itself or to a unique follower, is called a left-right HMM.

Useful formulas:

- Probability of a state sequence: the probability of a state sequence Q = {q_1, q_2, ..., q_N} coming from a HMM with parameters Θ corresponds to the product of the transition probabilities from one state to the following:

P(Q | Θ) = π_{q_1} ∏_{n=1}^{N-1} a_{q_n, q_{n+1}} = π_{q_1} a_{q_1,q_2} a_{q_2,q_3} ... a_{q_{N-1},q_N}    (8)

- Likelihood of an observation sequence given a state sequence, or likelihood of an observation sequence along a single path: given an observation sequence X = {x_1, x_2, ..., x_N} and a state sequence Q = {q_1, q_2, ..., q_N} (of the same length) determined from a HMM with parameters Θ, the likelihood of X along the path Q is equal to:

P(X | Q, Θ) = ∏_{n=1}^{N} P(x_n | q_n, Θ) = b_{q_1,x_1} b_{q_2,x_2} ... b_{q_N,x_N}    (9)

i.e., it is the product of the emission probabilities computed along the considered path.

- Joint likelihood of an observation sequence X and a path Q: it is the probability that X and Q occur simultaneously, P(X, Q | Θ), and it decomposes into a product of the two quantities defined previously:

P(X, Q | Θ) = P(X | Q, Θ) · P(Q | Θ)    (Bayes)    (10)

- Likelihood of a sequence with respect to a HMM: the likelihood of an observation sequence X = {x_1, x_2, ..., x_N} with respect to a hidden Markov model with parameters Θ expands as follows:

P(X | Θ) = Σ_{all Q} P(X, Q | Θ)

i.e., it is the sum of the joint likelihoods of the sequence over all possible state sequences Q allowed by the model.

2.2 Trellis Diagram

A trellis diagram can be used to visualize likelihood calculations of HMMs. Figure 2 shows such a diagram for a HMM with 3 states. Each column in the trellis shows the possible states of the weather at a certain time n. Each state in one column is connected to each state in the adjacent columns by the transition likelihood given by the elements a_{i,j} of the transition matrix A (shown for state 1 at time 1 in figure 2). At the bottom is the observation sequence X = {x_1, ..., x_N}; b_{i,k} is the likelihood of the observation x_n = v_k in state q_n = s_i at time n.

Figure 2: Trellis diagram (states s_1, ..., s_3 over times n = 1, ..., N, with transition probabilities a_{i,j} between adjacent columns and emission likelihoods b_{i,k} at each node)

Figure 3 shows the trellis diagram for Example 2 in section 2. The likelihood of the state sequence given the observation sequence can be found by simply following the path in the trellis diagram, multiplying the observation and transition likelihoods:

L = π_sunny · b_sunny,no umbrella · a_sunny,foggy · b_foggy,no umbrella · a_foggy,sunny · b_sunny,no umbrella    (11)
  = 1/3 · 0.9 · 0.15 · 0.7 · 0.2 · 0.9 ≈ 0.0057    (12)

Figure 3: Trellis diagram for Example 2 in section 2, showing the path {sunny, foggy, sunny} with b_sunny,no umbrella = 0.9, b_foggy,no umbrella = 0.7, a_sunny,foggy = 0.15, and a_foggy,sunny = 0.2
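These formulas translate almost line by line into code. The sketch below (a generic helper of our own; the name path_likelihood is not a toolbox function) computes P(Q|Θ), P(X|Q,Θ), and P(X,Q|Θ) of eqs. (8)-(10) for a discrete HMM given as a prior vector Pi, a transition matrix A, and an emission matrix B. Called with the path [1 3 1] ({sunny, foggy, sunny}), the observations [2 2 2] (no umbrella on three days), uniform priors, and the matrices built from tables 1 and 2, it returns the joint likelihood of eq. (12).

function [PQ, PXgivenQ, PXQ] = path_likelihood(Q, X, Pi, A, B)
% Probability of the state path Q, likelihood of the observations X along Q,
% and their joint likelihood, for a discrete HMM (eqs. (8)-(10)).
N        = length(Q);
PQ       = Pi(Q(1));               % pi_{q_1}
PXgivenQ = B(Q(1), X(1));          % b_{q_1, x_1}
for n = 2:N
    PQ       = PQ * A(Q(n-1), Q(n));      % eq. (8): product of transition probabilities
    PXgivenQ = PXgivenQ * B(Q(n), X(n));  % eq. (9): product of emission probabilities
end
PXQ = PXgivenQ * PQ;               % eq. (10): joint likelihood of X and Q
end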
2.3 Generating samples from Hidden Markov Models

Question

Find the parameter set Θ = {π, A, B} that describes the weather HMM. Let's suppose that the prior probabilities are the same for each state.

Experiment

Let's generate a sample sequence X coming from our weather HMM. First you have to specify the parameter set Θ = {π, A, B} that describes the model.

>> % Choose the prior probabilities as you like, e.g., first day sunny:
>> Pi_w = [1 0 0]

The transition matrix is given in table 1:

>> A_w = [0.8 0.05 0.15; 0.2 0.6 0.2; 0.2 0.3 0.5]

As the observation probabilities in this case are discrete probabilities, they can be saved in a matrix:

>> B_w = [0.1 0.9; 0.8 0.2; 0.3 0.7]

In this observation matrix B, using the numbers from table 2, the rows correspond to the states and the columns to our 2 discrete observations, i.e., b_{1,1} = P(umbrella = true) and b_{1,2} = P(umbrella = false) are the probabilities of seeing, resp. not seeing, an umbrella if the weather is in state s_1: q_n = sunny.

Use the function sample_dhmm to do several draws with the weather model. View the resulting samples and state sequences with the help of the function plotseq_w. In the matrix containing the data, 1 stands for umbrella = true and 2 for umbrella = false.

>> % drawing 1 sample of length 4 (4 days)
>> [data,hidden] = sample_dhmm(Pi_w,A_w,B_w,1,4)
>> plotseq_w(data,hidden)

Task

Write a Matlab function prob_path() that computes the joint likelihood for a given observation sequence and an assumed path:

>> prob = prob_path(data,path,Pi_w,A_w,B_w)

Input arguments are the observed sequence data, the assumed path path, and the weather HMM parameters (prior probabilities Pi_w, transition matrix A_w, and emission matrix B_w). Use the function

obslik = mk_dhmm_obs_lik(data,obsmat)

to produce an observation likelihood matrix whose entries correspond to the b_{i,j} in a trellis diagram (see figures 2 and 3). Type help mk_dhmm_obs_lik for more detailed information.

Produce some random sequences, plot them, and calculate their likelihood. You can test your function on the likelihood calculation for Example 2 from section 2 described above (input parameters: data=[2 2 2]; path=[1 3 1]; prior=[1/3 1/3 1/3]):

>> prob = prob_path([2 2 2],[1 3 1],[1/3 1/3 1/3],A_w,B_w)
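To see the mechanics behind a draw like the one produced by sample_dhmm above, here is a stand-in sampler written in plain Matlab (our own sketch with a simplified interface, not the toolbox implementation): each state and each observation is drawn from the appropriate row of the parameter matrices by inverting the cumulative distribution with rand.

function [obs, states] = draw_dhmm(Pi, A, B, T)
% Draw one state sequence and one observation sequence of length T from a
% discrete HMM with prior Pi, transition matrix A, and emission matrix B.
states = zeros(1, T);
obs    = zeros(1, T);
states(1) = find(rand <= cumsum(Pi), 1);                     % first state ~ Pi
obs(1)    = find(rand <= cumsum(B(states(1), :)), 1);        % observation ~ B(state, :)
for t = 2:T
    states(t) = find(rand <= cumsum(A(states(t-1), :)), 1);  % next state ~ A(previous state, :)
    obs(t)    = find(rand <= cumsum(B(states(t), :)), 1);
end
end

For instance, [data,hidden] = draw_dhmm(Pi_w,A_w,B_w,4) yields the same kind of data and state vectors as the sample_dhmm call above (up to the random draws).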
2.4 HMMs for speech recognition

Let's forget about the weather and umbrellas for a moment and talk about speech. In automatic speech recognition, the task is to find the most likely sequence of words Ŵ given some acoustic input, or:

Ŵ = arg max_{W ∈ 𝒲} P(W | X).    (13)

Here, X = {x_1, x_2, ..., x_N} is the sequence of acoustic vectors or feature vectors that are extracted from the speech signal, and we want to find Ŵ as the sequence of words W (out of all possible word sequences 𝒲) that maximizes P(W | X). To compare this to the weather example, the acoustic feature vectors are our observations, corresponding to the umbrella observations, and the word sequence corresponds to the weather on successive days (a day corresponds to about 10 ms), i.e., to the hidden state sequence of a HMM for speech production.

Words are made of ordered sequences of phonemes: /h/ is followed by /e/ and then by /l/ and /O/ in the word hello. This structure can be adequately modeled by a left-right HMM, where each state corresponds to a phone. Each phoneme can be considered to produce typical feature values according to a particular probability density (possibly Gaussian). (Note that the observed feature values x_i now are d-dimensional, continuous valued vectors, x_i ∈ R^d, and no longer discrete values as for the weather model, where x_i could only be true or false: x_i ∈ {umbrella, no umbrella}!) In real-world speech recognition, the phonemes themselves are often modeled as left-right HMMs (e.g., to model separately the transition part at the beginning of the phoneme, then the stationary part, and finally the transition at the end). Words are then represented by large HMMs made of concatenations of smaller phonetic HMMs.

Values used throughout the following experiments:

In the following experiments we will work with HMMs where the observations are drawn from a Gaussian probability distribution. Instead of an observation matrix (as in the weather example), the observations of each state are described by the mean value and the variance of a Gaussian density. The following 2-dimensional Gaussian pdfs will be used to model simulated observations of the vowels /a/, /i/, and /y/ (the phonetic symbol for the usual pronunciation of ü). The observed features are the first two formants (maxima of the spectral envelope), which are characteristic for the vowel identity; e.g., for the vowel /a/ the formants typically occur around frequencies 730 Hz and 1090 Hz.

State s_1 (for /a/): the Gaussian density N_/a/ with mean vector µ_/a/ and covariance matrix Σ_/a/
State s_2 (for /i/): the Gaussian density N_/i/ with mean vector µ_/i/ and covariance matrix Σ_/i/
State s_3 (for /y/): the Gaussian density N_/y/ with mean vector µ_/y/ and covariance matrix Σ_/y/

(These densities have been used in the previous lab session.) The Gaussian pdfs now take the role of the emission matrix B in the hidden Markov models for recognition of the three vowels /a/, /i/, and /y/.

The parameters of the densities and of the Markov models are stored in the file data.mat (use: load data). A Markov model named, e.g., hmm1 is stored as an object with fields hmm1.means, hmm1.vars, hmm1.trans, and hmm1.pi. The means field contains a matrix of mean vectors, where each column of the matrix corresponds to one state s_i of the HMM (e.g., to access the means of the second state of hmm1 use hmm1.means(:,2)); the vars field contains a 3-dimensional array of covariance matrices, where the third dimension corresponds to the state (e.g., to access Σ_/a/ for state 1 use hmm1.vars(:,:,1)); the trans field contains the transition matrix, and the pi field the prior probabilities.

Task

Load the HMMs and make a sketch of each of the models with the transition probabilities.

Experiment

Generate samples using the HMMs and plot them with plotseq and plotseq2. Use the functions plotseq and plotseq2 to plot the obtained 2-dimensional data. In the resulting views, the obtained sequences are represented by a yellow line where each point is overlaid with a colored dot. The different colors indicate the state from which any particular point has been drawn.

>> % Example: generate sample from HMM1 of length N
>> [X,ST] = sample_ghmm(hmm1,N)
>> plotseq(X,ST)        % View of both dimensions as separate sequences
>> plotseq2(X,ST,hmm1)  % 2D view with location of Gaussian states

Draw several samples with the same parameters and compare. Compare the Matlab figures with your sketch of the model. What is the effect of the different transition matrices of the HMMs on the sequences obtained during the current experiment? Hence, what is the role of the transition probabilities in the HMM?
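In these models each observation is a 2-dimensional vector drawn from the Gaussian density of the state the model is currently in. As a rough illustration of that emission step alone (a sketch of the idea, not of what sample_ghmm does internally), one observation for a chosen state k of a loaded model such as hmm1 can be drawn from its mean and covariance using only base Matlab:

% Draw a single observation from the Gaussian emission density of state k of hmm1.
% (Assumes data.mat has been loaded, so that hmm1 is available.)
k     = 2;                        % chosen state, e.g., s_2 (for /i/)
mu    = hmm1.means(:, k);         % mean vector of state k
Sigma = hmm1.vars(:, :, k);       % covariance matrix of state k
x     = mu + chol(Sigma, 'lower') * randn(size(mu));    % x ~ N(mu, Sigma)

A full sampler would additionally draw the state sequence from hmm1.pi and hmm1.trans, exactly as in the discrete case, and then emit one such Gaussian observation per time step.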
3 Pattern Recognition with HMM

In equation 10 we expressed the joint likelihood P(X, Q | Θ) of an observation sequence X and a path Q given a model with parameters Θ. The likelihood of a sequence with respect to a HMM (the likelihood of an observation sequence X = {x_1, x_2, ..., x_N} for a given hidden Markov model with parameters Θ) expands as follows:

P(X | Θ) = Σ_{every possible Q} P(X, Q | Θ),    (14)

i.e., it is the sum of the joint likelihoods of the sequence over all possible state sequences allowed by the model (see the trellis diagram in figure 3).

Figure 4: Trellis diagram over all states s_1, ..., s_3 and all times n = 1, ..., N

Calculating the likelihood in this manner is computationally expensive, particularly for large models or long sequences. It can instead be done with a recursive algorithm (the forward-backward algorithm), which reduces the complexity of the problem; for more information about this algorithm see [1]. It is very common to use log-likelihoods and log-probabilities, computing log p(X | Θ) instead of p(X | Θ).

Experiment

Classify the sequences X1, X2, X3, X4, given in the file Xdata.mat, in a maximum-likelihood sense with respect to the four Markov models. Use the function loglik_ghmm to compute the likelihood of a sequence with respect to a HMM. Store the results in a matrix (they will be used in the next section).

>> load Xdata
>> % Example:
>> logprob(1,1) = loglik_ghmm(X1,hmm1)
>> logprob(1,2) = loglik_ghmm(X1,hmm2)
etc.
>> logprob(3,2) = loglik_ghmm(X3,hmm2)
etc.

Instead of typing these commands for every combination of the four sequences and models, filling the logprob matrix can be done automatically with the help of loops, using a command string composed of fixed strings and strings containing the number of the sequence/model:

>> for i=1:4, for j=1:4,
     stri = num2str(i); strj = num2str(j);
     cmd = ['logprob(',stri,',',strj,') = loglik_ghmm(X',stri,',hmm',strj,');']
     eval(cmd);
   end; end;

You can find the maximum value of each row i of the matrix, giving the index of the most likely model for sequence X_i, with the Matlab function max:

for i=1:4;
  [v,index] = max(logprob(i,:));
  disp(['X',num2str(i),' -> HMM',num2str(index)]);
end

i   Sequence   log p(X_i|Θ_1)   log p(X_i|Θ_2)   log p(X_i|Θ_3)   log p(X_i|Θ_4)   Most likely model
1   X_1
2   X_2
3   X_3
4   X_4
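The quantity returned by loglik_ghmm is log p(X | Θ) from eq. (14). Likelihoods of this kind are normally obtained with the forward recursion mentioned above rather than by enumerating all paths; as an illustration of the idea (a sketch for the discrete case, not the code behind loglik_ghmm), a scaled forward pass looks like this:

function loglik = forward_loglik(X, Pi, A, B)
% log p(X | Theta) for a discrete HMM, computed with the scaled forward recursion.
% X: observation indices, Pi: prior probabilities, A: transitions, B: emission matrix.
N     = length(X);
alpha = Pi(:) .* B(:, X(1));           % alpha_1(i) = pi_i * b_{i,x_1}
c     = sum(alpha);                    % scaling factor, avoids numerical underflow
alpha = alpha / c;
loglik = log(c);
for n = 2:N
    alpha  = (A' * alpha) .* B(:, X(n));   % alpha_n(j) = sum_i alpha_{n-1}(i) a_{i,j} b_{j,x_n}
    c      = sum(alpha);
    alpha  = alpha / c;
    loglik = loglik + log(c);
end
end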
12 i Sequence log p(x i Θ 1 ) log p(x i Θ 2 ) log p(x i Θ 3 ) log p(x i Θ 4 ) Most likely model 1 X 1 2 X 2 3 X 3 4 X 4 4 Optimal state sequence In speech recognition and several other pattern recognition applications, it is useful to associate an optimal sequence of states to a sequence of observations, given the parameters of a model. For instance, in the case of speech recognition, knowing which frames of features belong to which state allows to locate the word boundaries across time. This is called alignment of acoustic feature sequences. A reasonable optimality criterion consists of choosing the state sequence (or path) that has the maximum likelihood with respect to a given model. This sequence can be determined recursively via the Viterbi algorithm. This algorithm makes use of two variables: δ n (i) is the highest likelihood of a single path among all the paths ending in state s i at time n: δ n (i) = max p(q 1, q 2,..., q n 1, q n = s i, x 1, x 2,... x n Θ) (15) q 1,q 2,...,q n 1 a variable ψ n (i) which allows to keep track of the best path ending in state s i at time n: ψ n (i) = arg max p(q 1, q 2,..., q n 1, q n = s i, x 1, x 2,... x n Θ) (16) q 1,q 2,...,q n 1 The idea of the Viterbi algorithm is to find the most probable path for each intermediate and finally for the terminating state in the trellis. At each time n only the most likely path leading to each state s i survives. The Viterbi Algorithm for a HMM with N s states. 1. Initialization δ 1 (i) = π i b i,x1, ψ 1 (i) = 0 i = 1,..., N s (17) where π i is the prior probability of being in state s i at time n = Recursion δ n (j) = max (δ n 1 (i) a ij ) b j,xn, 1 i N s ψ n (j) = arg max 1 i N s (δ n 1 (i) a ij ), 2 n N 1 j N s 2 n N 1 j N s (18) Optimal policy is composed of optimal sub-policies : find the path that leads to a maximum likelihood considering the best likelihood at the previous step and the transitions from it; then multiply by the current likelihood given the current state. Hence, the best path is found by induction. 12
3. Termination

p*(X | Θ) = max_{1≤i≤N_s} δ_N(i)
q_N* = arg max_{1≤i≤N_s} δ_N(i)    (19)

Find the best likelihood when the end of the observation sequence (n = N) is reached.

4. Backtracking

Q* = {q_1*, ..., q_N*}  so that  q_n* = ψ_{n+1}(q_{n+1}*),    n = N-1, N-2, ..., 1    (20)

Read (decode) the best sequence of states from the ψ_n vectors.

Example

Let's get back to our weather HMM. You don't know how the weather was when you were locked in. On the first 3 days your umbrella observations are: {no umbrella, umbrella, umbrella}. Find the most probable weather sequence using the Viterbi algorithm (assume the 3 weather situations to be equi-probable on day 1).

1. Initialization, n = 1:

δ_1(sunny) = π_sunny · b_sunny,no umbrella = 1/3 · 0.9 = 0.3        ψ_1(sunny) = 0
δ_1(rainy) = π_rainy · b_rainy,no umbrella = 1/3 · 0.2 = 0.0667     ψ_1(rainy) = 0
δ_1(foggy) = π_foggy · b_foggy,no umbrella = 1/3 · 0.7 = 0.2333     ψ_1(foggy) = 0

2. Recursion, n = 2:

We calculate the likelihood of getting to state sunny from all 3 possible predecessor states, and choose the most likely one to go on with:

δ_2(sunny) = max(δ_1(sunny) · a_sunny,sunny, δ_1(rainy) · a_rainy,sunny, δ_1(foggy) · a_foggy,sunny) · b_sunny,umbrella
           = max(0.3 · 0.8, 0.0667 · 0.2, 0.2333 · 0.2) · 0.1 = 0.024
ψ_2(sunny) = sunny

The likelihood is stored in δ, the most likely predecessor in ψ. See figure 5. The same procedure is executed with states rainy and foggy:

δ_2(rainy) = max(δ_1(sunny) · a_sunny,rainy, δ_1(rainy) · a_rainy,rainy, δ_1(foggy) · a_foggy,rainy) · b_rainy,umbrella
           = max(0.3 · 0.05, 0.0667 · 0.6, 0.2333 · 0.3) · 0.8 = 0.056
ψ_2(rainy) = foggy

δ_2(foggy) = max(δ_1(sunny) · a_sunny,foggy, δ_1(rainy) · a_rainy,foggy, δ_1(foggy) · a_foggy,foggy) · b_foggy,umbrella
           = max(0.3 · 0.15, 0.0667 · 0.2, 0.2333 · 0.5) · 0.3 = 0.035
ψ_2(foggy) = foggy
Figure 5: Viterbi algorithm to find the most likely weather sequence: finding the most likely path to state sunny at n = 2

Figure 6: Viterbi algorithm to find the most likely weather sequence at n = 3

n = 3:

δ_3(sunny) = max(δ_2(sunny) · a_sunny,sunny, δ_2(rainy) · a_rainy,sunny, δ_2(foggy) · a_foggy,sunny) · b_sunny,umbrella
           = max(0.024 · 0.8, 0.056 · 0.2, 0.035 · 0.2) · 0.1 = 0.0019
ψ_3(sunny) = sunny

δ_3(rainy) = max(δ_2(sunny) · a_sunny,rainy, δ_2(rainy) · a_rainy,rainy, δ_2(foggy) · a_foggy,rainy) · b_rainy,umbrella
           = max(0.024 · 0.05, 0.056 · 0.6, 0.035 · 0.3) · 0.8 = 0.0269
ψ_3(rainy) = rainy

δ_3(foggy) = max(δ_2(sunny) · a_sunny,foggy, δ_2(rainy) · a_rainy,foggy, δ_2(foggy) · a_foggy,foggy) · b_foggy,umbrella
           = max(0.024 · 0.15, 0.056 · 0.2, 0.035 · 0.5) · 0.3 = 0.0053
ψ_3(foggy) = foggy

Finally, we get one most likely path ending in each state of the model. See figure 6.

3. Termination

The globally most likely path is determined, starting by looking for the last state of the most likely sequence:

P*(X | Θ) = max_i(δ_3(i)) = δ_3(rainy) = 0.0269
q_3* = arg max_i(δ_3(i)) = rainy
Figure 7: Viterbi algorithm to find the most likely weather sequence: backtracking

4. Backtracking

The best sequence of states can be read from the ψ vectors. See figure 7.

n = N - 1 = 2:    q_2* = ψ_3(q_3*) = ψ_3(rainy) = rainy
n = N - 2 = 1:    q_1* = ψ_2(q_2*) = ψ_2(rainy) = foggy

Thus the most likely weather sequence is: Q* = {q_1*, q_2*, q_3*} = {foggy, rainy, rainy}
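Before turning to the Gaussian-emission models in the task below, the hand calculation above can be verified with a short script. The following sketch is a direct transcription of eqs. (17)-(20) for the discrete weather HMM (our own illustration, separate from the vit_ghmm function you are asked to write); it returns the path [3 2 2], i.e., {foggy, rainy, rainy}.

% Viterbi decoding for the weather HMM (states: 1 = sunny, 2 = rainy, 3 = foggy;
% observations: 1 = umbrella, 2 = no umbrella), following eqs. (17)-(20).
Pi = [1/3 1/3 1/3];
A  = [0.8 0.05 0.15; 0.2 0.6 0.2; 0.2 0.3 0.5];
B  = [0.1 0.9; 0.8 0.2; 0.3 0.7];
X  = [2 1 1];                          % {no umbrella, umbrella, umbrella}

N = length(X);  Ns = length(Pi);
delta = zeros(Ns, N);  psi = zeros(Ns, N);
delta(:, 1) = Pi(:) .* B(:, X(1));     % initialization, eq. (17)
for n = 2:N
    for j = 1:Ns                       % recursion, eq. (18)
        [delta(j, n), psi(j, n)] = max(delta(:, n-1) .* A(:, j));
        delta(j, n) = delta(j, n) * B(j, X(n));
    end
end
[p_star, q_N] = max(delta(:, N));      % termination, eq. (19)
path = zeros(1, N);  path(N) = q_N;
for n = N-1:-1:1                       % backtracking, eq. (20)
    path(n) = psi(path(n+1), n+1);
end
path                                   % -> [3 2 2], i.e., {foggy, rainy, rainy}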
Task

1. Write a Matlab function [loglik,path] = vit_ghmm(data,hmm) to implement the Viterbi algorithm for HMMs with Gaussian emissions. Use the function mk_ghmm_obs_lik to calculate the observation likelihood matrix for a given sequence. Store the δ and ψ vectors in a matrix in the format of the observation likelihood matrix. You can either write the function with the algorithm exactly as given before, or switch to making the calculations in the log-likelihood domain, where the multiplications in the computation of δ turn into additions. What are the advantages or disadvantages of the second method? (Think about how to implement it on a real system with limited computational accuracy, and about a HMM with a large number of states where the probabilities a_{i,j} and b_{i,k} might be small.)

2. Use the function vit_ghmm to determine the most likely path for the sequences X1, X2, X3 and X4. Compare with the state sequences ST1, ..., ST4 originally used to generate X1, ..., X4. (Use the function compseq, which provides a view of the first dimension of the observations as a time series and allows you to compare the original alignment to the Viterbi solution.)

>> [loglik,path] = vit_ghmm(X1,hmm1); compseq(X1,ST1,path);
>> [loglik,path] = vit_ghmm(X2,hmm1); compseq(X2,ST2,path);

Repeat for the remaining sequences.

3. Use the function vit_ghmm to compute the probabilities of the sequences X1, ..., X4 along the best paths with respect to each model Θ_1, ..., Θ_4. Note down your results below. Compare with the log-likelihoods obtained in section 3 using the forward procedure.

>> diffl = logprob - logprobviterbi

Likelihoods along the best path:

i   Sequence   log p*(X_i|Θ_1)   log p*(X_i|Θ_2)   log p*(X_i|Θ_3)   log p*(X_i|Θ_4)   Most likely model
1   X_1
2   X_2
3   X_3
4   X_4

Difference between log-likelihoods and likelihoods along the best path:

Sequence   HMM1   HMM2   HMM3   HMM4
X_1
X_2
X_3
X_4

Question: Is the likelihood along the best path a good approximation of the real likelihood of a sequence given a model?

References

[1] Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257-286.