Lecture 10: ASR: Sequence Recognition


1 EE E6820: Speech & Audio Processing & Recognition
Lecture 10: ASR: Sequence Recognition
- Deterministic sequence recognition: DTW
- Statistical sequence recognition: HMM
- HMM training
- Discriminant models
Dan Ellis <dpwe@ee.columbia.edu>

2 1 Sequence recognition
After feature calculation, the problem is to classify the sequences of frames.
[Block diagram: sound → Feature calculation → feature vectors → Acoustic classifier (network weights) → phone probabilities → HMM decoder (word models, language model) → phone & word labeling]
Questions:
- how to represent model sequences
- how to score matches

3 Acoustic template matching
Framewise comparison of unknown word and stored templates:
[Figure: distance matrix of test frames (time/frames) against reference templates ONE, TWO, THREE, FOUR, FIVE]
- distance metric?
- comparison between templates?
- constraints?

4 Dynamic Time Warp (DTW)
Find lowest-cost constrained path:
- matrix d(i,j) of distances between input frame f_i and reference frame r_j
- allowable predecessors & transition costs T_xy
Lowest cost to (i,j):
D(i,j) = d(i,j) + min{ D(i-1,j) + T_10, D(i,j-1) + T_01, D(i-1,j-1) + T_11 }
i.e. the local match cost plus the best predecessor (including transition cost).
[Figure: grid of reference frames r_j against input frames f_i, showing the three allowed predecessors of (i,j)]
Best path via traceback from final state
- have to store predecessors for (almost) every (i,j)
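A minimal NumPy sketch of this recurrence (the Euclidean local distance and the transition-cost defaults are illustrative assumptions, not values from the lecture):

```python
import numpy as np

def dtw(f, r, T10=1.0, T01=1.0, T11=0.0):
    """Lowest-cost alignment of input frames f (I x d) to reference frames r (J x d)."""
    I, J = len(f), len(r)
    d = np.linalg.norm(f[:, None, :] - r[None, :, :], axis=-1)  # local distances d(i,j)
    D = np.full((I, J), np.inf)
    pred = np.zeros((I, J, 2), dtype=int)            # store predecessors for traceback
    D[0, 0] = d[0, 0]
    for i in range(I):
        for j in range(J):
            if i == 0 and j == 0:
                continue
            # allowable predecessors with their transition costs
            cands = [(D[i-1, j] + T10, (i-1, j)) if i > 0 else (np.inf, None),
                     (D[i, j-1] + T01, (i, j-1)) if j > 0 else (np.inf, None),
                     (D[i-1, j-1] + T11, (i-1, j-1)) if i > 0 and j > 0 else (np.inf, None)]
            best_cost, best_pred = min(cands, key=lambda c: c[0])
            D[i, j] = d[i, j] + best_cost
            pred[i, j] = best_pred
    # traceback from the final state
    path, (i, j) = [], (I - 1, J - 1)
    while (i, j) != (0, 0):
        path.append((i, j))
        i, j = pred[i, j]
    path.append((0, 0))
    return D[-1, -1], path[::-1]
```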

5 DTW-based recognition
Reference templates for each possible word
Isolated word:
- mark endpoints of input word
- calculate scores through each template (+ prune)
- choose best
Continuous speech:
- one matrix of template slices; special-case constraints at word ends
[Figure: stacked reference templates ONE, TWO, THREE, FOUR against input frames]
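Using the `dtw` sketch above, isolated-word recognition reduces to scoring the endpointed input against every stored template and picking the minimum; here with random stand-in feature matrices rather than real templates:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stored templates and an endpointed test word (random stand-ins).
templates = {w: rng.normal(size=(20, 13)) for w in ('ONE', 'TWO', 'THREE', 'FOUR')}
test = rng.normal(size=(25, 13))

scores = {w: dtw(test, r)[0] for w, r in templates.items()}  # dtw() from the sketch above
print(min(scores, key=scores.get))                           # choose best-scoring word
```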

6 DTW-based recognition (2)
+ Successfully handles timing variation
+ Able to recognize speech at reasonable cost
- Distance metric?
  - pseudo-Euclidean space?
- Warp penalties?
- How to choose templates?
  - several templates per word?
  - choose most representative?
  - align and average?
→ need a rigorous foundation...

7 Outline
- Dynamic Time Warp
- Hidden Markov Models
  - Statistical formulation
  - Markov Models
  - Hidden states
- HMM training
- Discriminant models

8 2 Statistical sequence recognition
DTW is limited because it's hard to optimize
- interpretation of distance, transition costs?
Need a theoretical foundation: Probability
Formulate as MAP choice among models:
M* = argmax_{M_j} p(M_j | X, Θ)
- X = observed features
- M_j = word-sequence models
- Θ = all current parameters

9 Statistical formulation (2)
Can rearrange via Bayes' rule (& drop p(X)):
M* = argmax_{M_j} p(M_j | X, Θ)
   = argmax_{M_j} p(X | M_j, Θ_A) · p(M_j | Θ_L)
- p(X | M_j, Θ_A) = likelihood of observations under model
- p(M_j | Θ_L) = prior probability of model
- Θ_A = acoustics-related model parameters
- Θ_L = language-related model parameters
Questions:
- what form of model to use for p(X | M_j, Θ_A)?
- how to find Θ_A (training)?
- how to solve for M_j (decoding)?

10 State-based modeling
Assume a discrete-state model M_j for the speech:
- observations are divided up into time frames
- model states ↔ observations:
[Figure: state sequence Q_k: q_1 q_2 q_3 q_4 q_5 q_6 ... against observed feature vectors X_1^N: x_1 x_2 x_3 x_4 x_5 x_6 ... over time]
Probability of observations given model is:
p(X_1^N | M_j) = Σ_{all Q_k} p(X_1^N | Q_k, M_j) · p(Q_k | M_j)
- sum over all possible state sequences Q_k
How do observations depend on states?
How do state sequences depend on model?

11 Generative Markov models
A (first-order) Markov model is a finite-state system whose behavior depends only on the current state.
[Figure: state diagram with start state S, states A, B, C (self-loop probabilities of 0.8 on A and B, transitions of 0.1) and end state E; alongside, the transition matrix p(q_{n+1} | q_n) indexed by q_n, q_{n+1} ∈ {S, A, B, C, E}]
Defined by:
- state set {q^i} (including start & end states)
- transition matrix p(q_{n+1} | q_n): depends only on the current state
Example generated sequence: S A A A A A A A A B B B B B B B B B C C C C B B B B B B C E
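The generative behavior can be made concrete with a short sampler; the transition matrix below is an assumed reconstruction echoing the 0.8 self-loops in the figure, not the lecture's exact values:

```python
import numpy as np

# States S (start), A, B, C, E (end); rows = current state, cols = next state.
# Probabilities are illustrative, chosen to mimic the figure's self-loops.
states = ['S', 'A', 'B', 'C', 'E']
P = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],   # S -> A
              [0.0, 0.8, 0.2, 0.0, 0.0],   # A: self-loop 0.8
              [0.0, 0.0, 0.8, 0.1, 0.1],   # B: self-loop 0.8
              [0.0, 0.0, 0.2, 0.7, 0.1],   # C
              [0.0, 0.0, 0.0, 0.0, 1.0]])  # E absorbing

rng = np.random.default_rng(0)
q, seq = 0, []                              # start in S
while states[q] != 'E':
    q = rng.choice(len(states), p=P[q])     # next state depends only on current state
    seq.append(states[q])
print('S' + ''.join(seq))                   # e.g. SAAAABBBBBBCCBE
```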

12 Hidden Markov models
Markov models where the state sequence Q = {q_n} is not directly observable (= hidden)
But, observations X do depend on Q:
- x_n is a random variable that depends on the current state: emission distributions p(x | q)
[Figure: emission distributions p(x | q) for q = A, B, C; below, a state sequence AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC and the corresponding observation sequence x_n against time step n]
- can still tell something about the state sequence from the observations

13 Hidden Markov models (2)
An HMM is specified by:
- emission distributions p(x | q^i) ≡ b_i(x)
- transition probabilities p(q_n^j | q_{n-1}^i) ≡ a_ij
- initial state probabilities p(q_1^i) ≡ π_i
Speech models M_j:
- typically left-to-right HMMs (sequence constraint)
- observation & evolution are conditionally independent of the rest given the (hidden) state q_n
[Figure: left-to-right HMM S → ae_1 → ae_2 → ae_3 → E with self-loops; state sequence q_1 ... q_5 emitting x_1 ... x_5]
- self-loops for time dilation

14 Markov models for sequence recognition
Independence of observations:
- observation x_n depends only on the current state q_n
p(X | Q) = p(x_1, x_2, ..., x_N | q_1, q_2, ..., q_N)
         = p(x_1 | q_1) · p(x_2 | q_2) ··· p(x_N | q_N)
         = ∏_{n=1}^N p(x_n | q_n) = ∏_{n=1}^N b_{q_n}(x_n)
Markov transitions:
- transition to the next state q_{n+1} depends only on q_n
p(Q | M) = p(q_1, q_2, ..., q_N | M)
         = p(q_N | q_{N-1}) · p(q_{N-1} | q_{N-2}) ··· p(q_2 | q_1) · p(q_1)
         = p(q_1) ∏_{n=2}^N p(q_n | q_{n-1}) = π_{q_1} ∏_{n=2}^N a_{q_{n-1} q_n}
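Together these give the joint probability of one particular path: p(X, Q | M) = π_{q_1} b_{q_1}(x_1) ∏_{n=2}^N a_{q_{n-1} q_n} b_{q_n}(x_n). A minimal log-domain sketch (frame log-emission scores are assumed precomputed):

```python
import numpy as np

def log_joint(logB, states, log_pi, log_A):
    """log p(X, Q | M) for a given state path.

    logB[n, i] = log b_i(x_n) (assumed precomputed); states[n] = index of q_n.
    """
    lp = log_pi[states[0]] + logB[0, states[0]]
    for n in range(1, len(states)):
        lp += log_A[states[n-1], states[n]] + logB[n, states[n]]
    return lp
```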

15 Model fit calculation
From state-based modeling:
p(X_1^N | M_j) = Σ_{all Q_k} p(X_1^N | Q_k, M_j) · p(Q_k | M_j)
For HMMs:
p(X | Q) = ∏_{n=1}^N b_{q_n}(x_n)
p(Q | M) = π_{q_1} ∏_{n=2}^N a_{q_{n-1} q_n}
Hence, solve for M*:
- calculate p(X | M_j) for each available model, scale by the prior p(M_j) to get p(M_j | X)
Sum over all Q_k???

16 The forward recursion
Dynamic-programming-like technique to calculate the sum over all Q_k.
Define α_n(i) as the probability of getting to state q^i at time step n (by any path):
α_n(i) = p(x_1, x_2, ..., x_n, q_n = q^i) = p(X_1^n, q_n^i)
Then α_{n+1}(j) can be calculated recursively:
α_{n+1}(j) = [Σ_{i=1}^S α_n(i) · a_ij] · b_j(x_{n+1})
[Figure: trellis of model states q^i against time steps n, showing the α_n(i) combined via a_ij and b_j(x_{n+1})]

17 Forward recursion (2)
Initialize: α_1(i) = π_i · b_i(x_1)
Then total probability:
p(X_1^N | M) = Σ_{i=1}^S α_N(i)
Practical way to solve for p(X | M_j) and hence perform recognition.
[Figure: observations X scored by each model in parallel → p(X | M_1) p(M_1), p(X | M_2) p(M_2), ... → choose best]
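A sketch of the recursion in the log domain (the literal products underflow for long utterances; frame log-emission scores logB[n, i] = log b_i(x_n) are assumed precomputed):

```python
import numpy as np
from scipy.special import logsumexp

def forward_loglik(logB, log_pi, log_A):
    """log p(X_1^N | M) via the forward recursion."""
    N, S = logB.shape
    log_alpha = log_pi + logB[0]                      # alpha_1(i) = pi_i b_i(x_1)
    for n in range(1, N):
        # alpha_{n+1}(j) = [sum_i alpha_n(i) a_ij] b_j(x_{n+1})
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + logB[n]
    return logsumexp(log_alpha)                       # p(X_1^N | M) = sum_i alpha_N(i)
```

Recognition then amounts to running this for each model and comparing log p(X | M_j) + log p(M_j).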

18 Optimal path
May be interested in the actual q_n assignments:
- which state was active at each time frame
- e.g. phone labelling (for training?)
Total probability is over all paths, but can also solve for the single best path = Viterbi state sequence.
Probability along the best path to state q^j at time n+1:
α*_{n+1}(j) = max_i { α*_n(i) · a_ij } · b_j(x_{n+1})
- backtrack from final state to get the best path
- final probability is a product only (no sum)
→ log-domain calculation: just summation
Total probability is often dominated by the best path:
p(X, Q* | M) ≈ p(X | M)
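A matching Viterbi sketch; in the log domain the products become sums, as the slide notes (logB as before):

```python
import numpy as np

def viterbi(logB, log_pi, log_A):
    """Best state path: alpha*_{n+1}(j) = max_i { alpha*_n(i) a_ij } b_j(x_{n+1})."""
    N, S = logB.shape
    log_a = log_pi + logB[0]
    back = np.zeros((N, S), dtype=int)
    for n in range(1, N):
        scores = log_a[:, None] + log_A          # scores[i, j] = alpha*_n(i) a_ij
        back[n] = scores.argmax(axis=0)          # best predecessor for each j
        log_a = scores.max(axis=0) + logB[n]
    # backtrack from the final best state
    path = [int(log_a.argmax())]
    for n in range(N - 1, 0, -1):
        path.append(int(back[n, path[-1]]))
    return path[::-1], log_a.max()
```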

19 Interpreting the Viterbi path
The Viterbi path assigns each x_n to a state q^i:
- performing classification based on b_i(x)
- ... at the same time as applying transition constraints a_ij
[Figure: observations x_n with inferred classification; Viterbi labels: AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC]
Can be used for segmentation:
- train an HMM with garbage and target states
- decode on new data to find targets, boundaries
Can use for (heuristic) training:
- e.g. train classifiers based on labels...

20 HMM summary
An HMM M_j is a hypothesized generator of the observations X.
During generation, the behavior of the model depends only on the current state q_n:
- transition probabilities p(q_{n+1} | q_n) = a_ij
- emission distributions p(x_n | q_n) = b_i(x)
Given just the observed emissions X, we can still calculate the probability that they came from M_j:
p(X | M) = Σ_{all Q} p(X | Q, M) · p(Q | M)
Can also infer the most likely state sequence:
Q* = argmax_Q p(X, Q | M)
[Figure: M (assumed) generates hidden Q = {q_1, q_2, ... q_n} and observed X; Q* is inferred]

21 Validity of HMM assumptions
Key assumption is conditional independence:
Given q^i, future evolution & observation distribution are independent of previous events.
- duration behavior: self-loops (probability γ) imply an exponential (geometric) duration distribution:
p(n) = γ^{n-1} · (1 - γ)
- independence of successive x_n's:
p(X) = ∏_n p(x_n | q^i)?
[Figure: geometric duration distribution p(n) against n; emission distribution p(x_n | q^i) against an actual, correlated observation trajectory x_n]
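A quick numerical check of the duration claim (γ = 0.8 is an assumed example value): for self-loop probability γ, p(n) = γ^{n-1}(1-γ) sums to 1 and has mean duration 1/(1-γ):

```python
import numpy as np

gamma = 0.8                                   # assumed self-loop probability
n = np.arange(1, 50)
p = gamma ** (n - 1) * (1 - gamma)            # p(n) = gamma^{n-1} (1 - gamma)
print(p.sum())                                # ~1: a valid distribution
print((n * p).sum(), 1 / (1 - gamma))         # mean duration ~ 1/(1-gamma) = 5
```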

22 Outline
- Dynamic Time Warp
- Hidden Markov Models
- HMM training
  - EM for HMM parameters
  - State occupancy probabilities
  - Acoustic parameters
- Discriminant models

23 3 Training models
We know how to use an HMM, but where does it come from?
In fact, the HMM assumption of stationarity within states seems a worse model than DTW templates.
The objection to DTW was the lack of a principled optimization procedure.
The probabilistic foundation of HMMs supports optimization:
- maximum likelihood: adjust Θ to maximize the likelihood of the training data under the model
- (really want minimum error rate, or...)

24 EM for HMMs
Expectation-Maximization (EM) was introduced for Gaussian mixture models (GMMs):
- fit GMMs to arbitrary data
- EM finds locally-optimal model parameters Θ that maximize the data likelihood p(x_train | Θ)
- makes (some) sense for decision rules like p(X | M_j) · p(M_j)
Principle: adjust Θ to maximize the expected log likelihood of known x & unknown u:
E[log p(x, u | Θ)] = Σ_u p(u | x, Θ_old) · log[ p(x | u, Θ) · p(u | Θ) ]
- for GMMs, unknowns = mixture assignments k
- for HMMs, unknowns = hidden states q_n
(take Θ to include M_j)

25 What EM does
Maximize the data likelihood by repeatedly estimating the unknowns and re-maximizing the expected log likelihood:
- estimate unknowns p(q_n | X, Θ)
- adjust model params Θ to maximize the expected log likelihood
[Figure: data log likelihood log p(X | Θ) climbing toward a local optimum over successive parameter estimates Θ, alternating "re-estimate unknowns" and "adjust parameters" steps]

26 EM for HMMs (2)
Expected log likelihood for an HMM:
Σ_{all Q_k} p(Q_k | X, Θ_old) · log[ p(X | Q_k, Θ) · p(Q_k | Θ) ]
= Σ_{all Q_k} p(Q_k | X, Θ_old) · [ Σ_{n=1}^N log p(x_n | q_n) + log p(q_1) + Σ_{n=2}^N log p(q_n | q_{n-1}) ]
= Σ_{n=1}^N Σ_{i=1}^S p(q_n^i | X, Θ_old) · log p(x_n | q_n^i, Θ)
  + Σ_{i=1}^S p(q_1^i | X, Θ_old) · log p(q_1^i | Θ)
  + Σ_{n=2}^N Σ_{i=1}^S Σ_{j=1}^S p(q_{n-1}^i, q_n^j | X, Θ_old) · log p(q_n^j | q_{n-1}^i, Θ)
- closed-form maximization by differentiation etc.

27 EM update equations
For the acoustic model (e.g. a 1-D Gaussian):
μ_i = Σ_n p(q_n^i | X, Θ_old) · x_n / Σ_n p(q_n^i | X, Θ_old)
For the transition probabilities:
p(q_n^j | q_{n-1}^i) = a_ij = Σ_n p(q_{n-1}^i, q_n^j | X, Θ_old) / Σ_n p(q_{n-1}^i | X, Θ_old)
Normalized expectations, as for GMMs.
Require the state occupancy probabilities p(q_n^i | X_1^N, Θ_old).
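A sketch of these two updates, assuming the occupancy arrays are already available: gamma[n, i] = p(q_n^i | X, Θ_old) and xi[n, i, j] = p(q_{n-1}^i, q_n^j | X, Θ_old) (computed on the next two slides):

```python
import numpy as np

def em_updates(x, gamma, xi):
    """M-step for a 1-D Gaussian mean per state and the transition matrix.

    x:     (N,) observations;  gamma: (N, S) state occupancies;
    xi:    (N, S, S) pairwise occupancies, xi[n] for the (n-1 -> n) transition.
    """
    occ = gamma.sum(axis=0)                           # sum_n p(q_n^i | X)
    mu = (gamma * x[:, None]).sum(axis=0) / occ       # normalized expectation
    A = xi[1:].sum(axis=0)                            # sum_n p(q_{n-1}^i, q_n^j | X)
    A /= A.sum(axis=1, keepdims=True)                 # normalize rows -> a_ij
    return mu, A
```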

28 The forward-backward algorithm
We need p(q_n^i | X_1^N) for the EM updates (Θ implied).
The forward algorithm gave us α_n(i) = p(X_1^n, q_n^i)
- excludes the influence of the remaining data X_{n+1}^N
Hence, define β_n(i) = p(X_{n+1}^N | q_n^i) so that
p(q_n^i | X_1^N) = α_n(i) · β_n(i) / Σ_j α_n(j) · β_n(j)
Recursive definition for β:
β_n(i) = Σ_j β_{n+1}(j) · a_ij · b_j(x_{n+1})
- recurses backwards from the final state
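A log-domain sketch of the combined forward-backward pass, returning the state occupancies p(q_n^i | X_1^N) (frame log-emissions logB assumed precomputed, as before):

```python
import numpy as np
from scipy.special import logsumexp

def occupancy(logB, log_pi, log_A):
    """State occupancies p(q_n^i | X_1^N) via forward-backward (log domain)."""
    N, S = logB.shape
    la = np.zeros((N, S)); lb = np.zeros((N, S))      # lb[N-1] = 0, i.e. beta_N(i) = 1
    la[0] = log_pi + logB[0]
    for n in range(1, N):                             # forward pass
        la[n] = logsumexp(la[n-1][:, None] + log_A, axis=0) + logB[n]
    for n in range(N - 2, -1, -1):                    # backward pass
        # beta_n(i) = sum_j beta_{n+1}(j) a_ij b_j(x_{n+1})
        lb[n] = logsumexp(log_A + lb[n+1] + logB[n+1], axis=1)
    lg = la + lb                                      # alpha_n(i) beta_n(i)
    # per-frame normalization = dividing by sum_j alpha_n(j) beta_n(j) = p(X)
    return np.exp(lg - logsumexp(lg, axis=1, keepdims=True))
```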

29 Estimating a_ij from α & β
From the EM equations:
a_ij^new = Σ_n p(q_{n-1}^i, q_n^j | X, Θ_old) / Σ_n p(q_{n-1}^i | X, Θ_old)
- probability of the transition, normalized by the probability of being in the start state
Obtain the numerator terms from:
p(q_{n-1}^i, q_n^j, X | Θ_old) = α_{n-1}(i) · a_ij · b_j(x_n) · β_n(j)
[Figure: trellis segment showing α_{n-1}(i) at state q^i_{n-1}, the transition a_ij with emission b_j(x_n), and β_n(j) at state q^j_n]
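The pairwise occupancies, as a sketch building on the la/lb arrays from the previous block; dividing the joint by p(X) = Σ_i α_N(i) gives p(q_{n-1}^i, q_n^j | X):

```python
import numpy as np
from scipy.special import logsumexp

def pairwise_occupancy(la, lb, log_A, logB):
    """xi[n, i, j] = p(q_{n-1}^i, q_n^j | X) = alpha_{n-1}(i) a_ij b_j(x_n) beta_n(j) / p(X).

    la, lb are the forward/backward log arrays from the occupancy() sketch above.
    """
    N, S = logB.shape
    log_px = logsumexp(la[-1])                        # log p(X | M)
    xi = np.zeros((N, S, S))                          # xi[0] unused (no transition into n=0)
    for n in range(1, N):
        xi[n] = np.exp(la[n-1][:, None] + log_A + logB[n][None, :] + lb[n][None, :] - log_px)
    return xi
```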

30 GMM-HMMs in practice
GMMs as acoustic models: train by including the mixture indices as unknowns
- just more complicated equations...
Practical GMMs:
- 9 to 39 feature dimensions
- 2 to 64 Gaussians per mixture, depending on the number of training examples
Training constraint: define the state sequence
- decide words & pronunciations for the training data
- EM will then figure out the boundaries
Lots of data → can model more classes
- e.g. context-independent (CI): q^i = ae aa ax ...
  context-dependent (CD): q^i = b-ae-b b-ae-k ...

31 HMM training in practice
EM only finds a local optimum → critically dependent on initialization
- approximate parameters / rough alignment
[Figure: model inventory (ae_1 ae_2 ae_3 dh_1 dh_2 ...) plus labelled training data ("dh ax k ae t s ae t aa n") give uniform initialization alignments and initialization parameters Θ_init; then repeat until convergence:
- E-step: probabilities of unknowns p(q_n^i | X_1^N, Θ_old)
- M-step: maximize via parameters Θ: max E[log p(X, Q | Θ)]]
Applicable for more than just words...

32 Outline
- Dynamic Time Warp
- Hidden Markov Models
- HMM training
- Discriminant models
  - Being discriminant
  - Neural-net recognizers

33 4 Discriminant models
EM training of HMMs is maximum likelihood:
- i.e. locally maximize p(X_trn | Θ)
- converges to the Bayes optimum... in the limit
Decision rule is max p(X | M) · p(M):
- training will increase p(X | M_correct)
- may also increase p(X | M_wrong) (more!)
Discriminant training tries directly to increase the discrimination between right & wrong models
- e.g. Maximum Mutual Information (MMI):
I(M_j; X | Θ) = log [ p(M_j, X | Θ) / (p(M_j | Θ) · p(X | Θ)) ]
             = log [ p(X | M_j, Θ) / Σ_k p(X | M_k, Θ) · p(M_k | Θ) ]

34 Neural Network Acoustic Models
A single model generates posteriors directly for all classes at once = frame-discriminant.
Use a regular HMM decoder for recognition:
- set b_i(x_n) = p(x_n | q^i) ∝ p(q^i | x_n) / p(q^i)
Nets are less sensitive to the input representation:
- skewed feature distributions
- correlated features
Can use a temporal context window to let the net see feature dynamics:
[Figure: feature calculation producing C_0, C_1, C_2 ... C_k over a context window t_n ... t_{n+w}, feeding a net that outputs posteriors p(q^i | X) for classes h#, pcl, bcl, tcl, dcl ...]
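A sketch of this posterior-to-"scaled likelihood" conversion used in such hybrid systems (the class priors are typically taken as relative frequencies of the training labels; assumed given here):

```python
import numpy as np

def scaled_log_likelihoods(log_post, log_prior):
    """Hybrid ANN/HMM trick: use b_i(x_n) proportional to p(q^i | x_n) / p(q^i).

    log_post:  (N, S) frame log-posteriors from the net;
    log_prior: (S,)   log class priors, e.g. relative frequencies of training labels.
    """
    return log_post - log_prior[None, :]   # p(x_n) is constant per frame, so it drops out
```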

35 Forced alignment
Best (Viterbi) labeling given an existing classifier, constrained by the known word sequence.
[Figure: feature vectors over time → existing classifier → phone posterior probabilities; known word sequence + dictionary → phone string (e.g. ow th r iy) → constrained alignment → training targets → classifier training]

36 Neural nets: Practicalities
Typical net sizes:
- input layer: 9 frames × 9-40 features ≈ 300 units
- hidden layer: size depends on training data
- output layer: one unit per context-independent phone
Hard to make context dependent:
- problems training many classes that are similar?
Representation is opaque:
[Figure: input→hidden weights for hidden unit #187 plotted over feature index × time frame, and hidden→output weights to the output layer (phones)]

37 Recap: Recognizer Structure
[Block diagram: sound → Feature calculation → feature vectors → Acoustic classifier (network weights) → phone probabilities → HMM decoder (word models, language model) → phone & word labeling]
Know how to execute each stage.
Know how to train each stage...
... except the language/word models.

38 Summary
Speech is modeled as a sequence of features:
- need a temporal aspect to recognition
- best time-alignment of templates = DTW
Hidden Markov models are a rigorous solution:
- self-loops allow temporal dilation
- exact, efficient likelihood calculations
HMMs can be trained via EM:
- guaranteed to maximize data likelihood (locally)
- acoustic models (GMMs) can be trained too
Discriminant training considers wrong models:
- recognition requires improved margin
- frame-discriminant neural nets give better model discrimination?
