Lecture 10: ASR: Sequence Recognition
1 EE E6820: Speech & Audio Processing & Recognition
Lecture 10: ASR: Sequence Recognition
- Deterministic sequence recognition: DTW
- Statistical sequence recognition: HMM
- HMM training
- Discriminant models
Dan Ellis <dpwe@ee.columbia.edu>
2 1 Sequence recognition
After feature calculation, the problem is to classify the sequences of frames:
[Block diagram: sound → Feature calculation → feature vectors → Acoustic classifier (Network weights) → phone probabilities → HMM decoder (Word models, Language model) → phone & word labeling]
Questions:
- how to represent model sequences
- how to score matches
3 Acoustic template matching
Framewise comparison of unknown word and stored templates:
[Figure: distance matrix of test frames vs. reference templates ONE, TWO, THREE, FOUR, FIVE, along time/frames]
- distance metric?
- comparison between templates?
- constraints?
4 Dynamic Time Warp (DTW)
Find the lowest-cost constrained path:
- matrix d(i,j) of distances between input frame f_i and reference frame r_j
- allowable predecessors & transition costs T_xy
Lowest cost to (i,j):
    D(i,j) = d(i,j) + min{ D(i-1,j) + T_10, D(i,j-1) + T_01, D(i-1,j-1) + T_11 }
i.e. local match cost plus best predecessor cost, including the transition cost
[Figure: grid of reference frames r_j vs. input frames f_i, with predecessors D(i-1,j), D(i,j-1), D(i-1,j-1) feeding D(i,j)]
Best path via traceback from the final state
- have to store predecessors for (almost) every (i,j)
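To make the recursion concrete, here is a minimal DTW sketch in Python/NumPy (not from the lecture). It assumes Euclidean local distances and sets all transition costs T_10 = T_01 = T_11 = 0; real systems weight these to penalize warping.

```python
import numpy as np

def dtw(test, ref):
    """Align test (I x d) against ref (J x d) frames; return cost and path.
    Minimal sketch: Euclidean local distance, all transition costs T_xy = 0."""
    I, J = len(test), len(ref)
    d = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=-1)  # d(i,j)
    D = np.full((I, J), np.inf)
    pred = np.zeros((I, J, 2), dtype=int)          # best predecessor of (i,j)
    D[0, 0] = d[0, 0]
    for i in range(I):
        for j in range(J):
            if i == j == 0:
                continue
            # allowable predecessors: (i-1,j), (i,j-1), (i-1,j-1)
            cands = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
            cands = [(pi, pj) for pi, pj in cands if pi >= 0 and pj >= 0]
            pi, pj = min(cands, key=lambda c: D[c])
            D[i, j] = d[i, j] + D[pi, pj]          # local match + best predecessor
            pred[i, j] = (pi, pj)
    # traceback of the best path from the final state
    path, (i, j) = [], (I - 1, J - 1)
    while (i, j) != (0, 0):
        path.append((i, j))
        i, j = pred[i, j]
    path.append((0, 0))
    return D[-1, -1], path[::-1]

rng = np.random.default_rng(0)
cost, path = dtw(rng.normal(size=(6, 2)), rng.normal(size=(5, 2)))
```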
5 DTW-based recognition
Reference templates for each possible word
Isolated word:
- mark endpoints of input word
- calculate scores through each template (+ prune)
- choose best
Continuous speech:
- one matrix of template slices; special-case constraints at word ends
[Figure: stacked reference templates ONE, TWO, THREE, FOUR vs. input frames]
6 DTW-based recognition (2)
+ Successfully handles timing variation
+ Able to recognize speech at reasonable cost
- Distance metric? pseudo-Euclidean space?
- Warp penalties?
- How to choose templates?
  - several templates per word?
  - choose most representative?
  - align and average?
→ need a rigorous foundation...
7 Outline
- Dynamic Time Warp
- Hidden Markov Models
  - Statistical formulation
  - Markov Models
  - Hidden states
- HMM training
- Discriminant models
8 2 Statistical sequence recognition
DTW is limited because it's hard to optimize:
- interpretation of distance, transition costs?
Need a theoretical foundation: Probability
Formulate as a MAP choice among models:
    M* = argmax_{M_j} p(M_j | X, Θ)
- X = observed features
- M_j = word-sequence models
- Θ = all current parameters
9 Statistical formulation (2)
Can rearrange via Bayes' rule (& drop p(X)):
    M* = argmax_{M_j} p(M_j | X, Θ)
       = argmax_{M_j} p(X | M_j, Θ_A) · p(M_j | Θ_L)
- p(X | M_j, Θ_A) = likelihood of observations under model
- p(M_j | Θ_L) = prior probability of model
- Θ_A = acoustics-related model parameters
- Θ_L = language-related model parameters
Questions:
- what form of model to use for p(X | M_j, Θ_A)?
- how to find Θ_A (training)?
- how to solve for M_j (decoding)?
10 State-based modeling
Assume a discrete-state model for the speech:
- observations are divided up into time frames
- model states ↔ observations:
    states Q_k:  q_1 q_2 q_3 q_4 q_5 q_6 ...
    observed feature vectors X_1^N:  x_1 x_2 x_3 x_4 x_5 x_6 ...
Probability of observations given model M_j is:
    p(X_1^N | M_j) = Σ_{all Q_k} p(X_1^N | Q_k, M_j) · p(Q_k | M_j)
- sum over all possible state sequences Q_k
How do observations depend on states?
How do state sequences depend on the model?
11 Generative Markov models
A (first-order) Markov model is a finite-state system whose behavior depends only on the current state
[Figure: state diagram over {S, A, B, C, E} with self-loop probabilities of 0.8 and escape probabilities of 0.1, plus the transition matrix p(q_{n+1} | q_n); example generated sequence: S A A A A A A A A B B B B B B B B B C C C C B B B B B B C E]
Defined by:
- state set {q_i} (including start & end states S, E)
- transition matrix p(q_{n+1} | q_n): depends only on the current state
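A toy generative sketch of such a model in Python/NumPy; the transition probabilities are assumed for illustration (the slide's figure only shows a few of them), with S and E as non-emitting start/end states.

```python
import numpy as np

states = ['S', 'A', 'B', 'C', 'E']
P = np.array([                    # P[i, j] = p(q_{n+1} = j | q_n = i)
    [0.0, 1.0, 0.0, 0.0, 0.0],    # S always enters A
    [0.0, 0.8, 0.1, 0.1, 0.0],    # A mostly self-loops
    [0.0, 0.0, 0.8, 0.1, 0.1],    # B can reach the end state E
    [0.0, 0.1, 0.1, 0.8, 0.0],    # C
    [0.0, 0.0, 0.0, 0.0, 1.0],    # E is absorbing
])

def sample(rng):
    """Generate one state sequence by following p(q_{n+1} | q_n) from S to E."""
    q, seq = 0, []
    while states[q] != 'E':
        q = rng.choice(len(states), p=P[q])
        if states[q] != 'E':
            seq.append(states[q])
    return ''.join(seq)

print(sample(np.random.default_rng(0)))   # e.g. 'AAAAABBBBBBCC...'
```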
12 Hidden Markov models
Markov models where the state sequence Q = {q_n} is not directly observable (= hidden)
But the observations X do depend on Q:
- x_n is a random variable whose emission distribution p(x | q) depends on the current state
[Figure: emission distributions p(x | q) for q = A, B, C; a state sequence AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC and the corresponding observation sequence x_n vs. time step n]
- can still tell something about the state sequence
13 Hidden Markov models (2)
An HMM is specified by:
- emission distributions p(x | q^i) = b_i(x)
- transition probabilities p(q_n = j | q_{n-1} = i) = a_ij
- initial state probabilities p(q_1 = i) = π_i
Speech models M_j:
- typically left-to-right HMMs (sequence constraint)
- self-loops for time dilation
- observation & evolution are conditionally independent of the rest given the (hidden) state q_n
[Figure: left-to-right model S → ae_1 → ae_2 → ae_3 → E with self-loops; states q_1 ... q_5 emitting x_1 ... x_5]
14 Markov models for sequence recognition
Independence of observations:
- observation x_n depends only on the current state q_n
    p(X | Q) = p(x_1, x_2, ..., x_N | q_1, q_2, ..., q_N)
             = p(x_1 | q_1) · p(x_2 | q_2) ··· p(x_N | q_N)
             = Π_{n=1}^{N} p(x_n | q_n) = Π_{n=1}^{N} b_{q_n}(x_n)
Markov transitions:
- transition to the next state depends only on the current state
    p(Q | M) = p(q_1, q_2, ..., q_N | M)
             = p(q_N | q_{N-1}) · p(q_{N-1} | q_{N-2}) ··· p(q_2 | q_1) · p(q_1)
             = p(q_1) · Π_{n=2}^{N} p(q_n | q_{n-1}) = π_{q_1} · Π_{n=2}^{N} a_{q_{n-1} q_n}
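These two factorizations are enough to score any single state sequence. A minimal sketch (toy numbers, assumed for illustration only) computing log p(X, Q | M) = log p(X | Q) + log p(Q | M) in the log domain:

```python
import numpy as np

# pi, A and the per-frame log emission likelihoods logb[n, i] = log b_i(x_n)
# are assumed toy values (N=3 frames, S=2 states).
pi = np.array([1.0, 0.0])                      # pi_i = p(q_1 = i)
A = np.array([[0.7, 0.3],                      # a_ij = p(q_n = j | q_{n-1} = i)
              [0.0, 1.0]])
logb = np.log(np.array([[0.9, 0.1],
                        [0.4, 0.6],
                        [0.2, 0.8]]))

def log_joint(Q):
    """log p(X, Q) = log pi_{q1} + sum_n log a_{q_{n-1} q_n} + sum_n log b_{q_n}(x_n)."""
    lp = np.log(pi[Q[0]]) + logb[0, Q[0]]
    for n in range(1, len(Q)):
        lp += np.log(A[Q[n - 1], Q[n]]) + logb[n, Q[n]]
    return lp

print(log_joint([0, 0, 1]))   # one particular state sequence
```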
15 Model fit calculation
From state-based modeling:
    p(X_1^N | M_j) = Σ_{all Q_k} p(X_1^N | Q_k, M_j) · p(Q_k | M_j)
For HMMs:
    p(X | Q) = Π_{n=1}^{N} b_{q_n}(x_n)
    p(Q | M) = π_{q_1} · Π_{n=2}^{N} a_{q_{n-1} q_n}
Hence, solve for M*:
- calculate p(X | M_j) · p(M_j) ∝ p(M_j | X) for each available model, scale by prior
Sum over all Q_k???
16 The forward recursion
Dynamic-programming-like technique to calculate the sum over all Q_k
Define α_n(i) as the probability of getting to state q^i at time step n (by any path):
    α_n(i) = p(x_1, x_2, ..., x_n, q_n = q^i) = p(X_1^n, q_n^i)
Then α_{n+1}(j) can be calculated recursively:
    α_{n+1}(j) = [Σ_{i=1}^{S} α_n(i) · a_ij] · b_j(x_{n+1})
[Figure: trellis of model states q^i vs. time steps n, with the α_n(i) combining through a_ij and b_j(x_{n+1}) into α_{n+1}(j)]
17 Forward recursion (2)
Initialize:
    α_1(i) = π_i · b_i(x_1)
Then the total probability is:
    p(X_1^N | M) = Σ_{i=1}^{S} α_N(i)
Practical way to solve for p(X | M_j) and hence perform recognition
[Figure: observations X scored against each model: p(X | M_1) · p(M_1), p(X | M_2) · p(M_2), ... → choose best]
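A sketch of the full forward pass, with pi, A, and the per-frame emission likelihoods b as assumed toy values (not from the lecture). Real implementations run this in the log domain, or rescale α at each frame, to avoid underflow:

```python
import numpy as np

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
b = np.array([[0.9, 0.1],      # b[n, i] = b_i(x_n), N=3 frames, S=2 states
              [0.4, 0.6],
              [0.2, 0.8]])

def forward(pi, A, b):
    N, S = b.shape
    alpha = np.zeros((N, S))
    alpha[0] = pi * b[0]                      # alpha_1(i) = pi_i b_i(x_1)
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * b[n]  # sum over predecessors, then emit
    return alpha

alpha = forward(pi, A, b)
print(alpha[-1].sum())    # p(X | M) = sum_i alpha_N(i)
```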
18 Optimal path
May be interested in the actual q_n assignments:
- which state was active at each time frame
- e.g. phone labelling (for training?)
The total probability is over all paths, but can also solve for the single best path = Viterbi state sequence
Probability along the best path to state q^j at time n+1:
    α*_{n+1}(j) = max_i { α*_n(i) · a_ij } · b_j(x_{n+1})
- backtrack from the final state to get the best path
- the final probability is a product only (no sum) → log-domain calculation is just summation
The total probability is often dominated by the best path:
    p(X, Q* | M) ≈ p(X | M)
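The Viterbi recursion is the forward recursion with the sum replaced by a max, plus backpointers for the traceback; here is a log-domain sketch with the same assumed toy parameters:

```python
import numpy as np

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
b = np.array([[0.9, 0.1],
              [0.4, 0.6],
              [0.2, 0.8]])

def viterbi(pi, A, b):
    N, S = b.shape
    logA = np.log(A + 1e-300)                 # guard against log(0)
    delta = np.log(pi + 1e-300) + np.log(b[0])
    back = np.zeros((N, S), dtype=int)
    for n in range(1, N):
        scores = delta[:, None] + logA        # scores[i, j]: come from i, go to j
        back[n] = scores.argmax(axis=0)       # best predecessor of each state j
        delta = scores.max(axis=0) + np.log(b[n])
    # traceback from the best final state
    q = [int(delta.argmax())]
    for n in range(N - 1, 0, -1):
        q.append(int(back[n, q[-1]]))
    return q[::-1], delta.max()               # best path, log p(X, Q* | M)

print(viterbi(pi, A, b))
```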
19 Interpreting the Viterbi path
The Viterbi path assigns each x_n to a state q^i:
- performing classification based on b_i(x)
- ... at the same time as applying the transition constraints a_ij
[Figure: inferred classification of observations x_n; Viterbi labels: AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC]
Can be used for segmentation:
- train an HMM with garbage and target states
- decode on new data to find targets, boundaries
Can be used for (heuristic) training:
- e.g. train classifiers based on the labels...
20 HMM summary
HMM M_j is a hypothesized generator of the observations X
During generation, the behavior of the model depends only on the current state q_n:
- transition probabilities p(q_{n+1} | q_n) = a_ij
- emission distributions p(x_n | q_n) = b_i(x)
Given just the observed emissions X, we can still calculate the probability that they came from M_j:
    p(X | M) = Σ_{all Q} p(X | Q, M) · p(Q | M)
Can also infer the most likely state sequence Q = {q_1, q_2, ..., q_N}:
    Q* = argmax_Q p(X, Q | M = M*)
[Figure: M assumed, Q hidden, X observed, Q* inferred]
21 Validity of HMM assumptions
The key assumption is conditional independence:
Given q_i, future evolution & observation distribution are independent of previous events
- duration behavior: self-loops (probability γ) imply an exponential (geometric) duration distribution,
    p(N = n) = γ^{n-1} · (1 - γ)
- independence of successive x_n's:
    p(X | Q) = Π_n p(x_n | q_i) ?
[Figure: exponentially decaying duration distribution; emission distribution p(x_n | q_i) vs. successive observations x_n]
22 Outline
- Dynamic Time Warp
- Hidden Markov Models
- HMM training
  - EM for HMM parameters
  - State occupancy probabilities
  - Acoustic parameters
- Discriminant models
23 3 Training models
We know how to use an HMM, but where does it come from?
In fact, the HMM's assumption of stationarity within states seems a worse model than DTW templates
The objection to DTW was the lack of a principled optimization procedure
The probabilistic foundation of HMMs supports optimization:
- maximum likelihood: adjust Θ to maximize the likelihood of the training data under the model
- (really want minimum error rate or ...)
24 EM for HMMs
Expectation-Maximization (EM) was introduced for Gaussian mixture models (GMMs):
- fit GMMs to arbitrary data
- EM finds locally-optimal model parameters Θ that maximize the data likelihood p(x_train | Θ)
- makes (some) sense for decision rules like p(X | M_j) · p(M_j)
Principle: adjust Θ to maximize the expected log likelihood of the known x & unknown u:
    E[log p(x, u | Θ)] = Σ_u p(u | x, Θ_old) · log[p(x | u, Θ) · p(u | Θ)]
- for GMMs, the unknowns are the mixture assignments k
- for HMMs, the unknowns are the hidden states q_n (take Θ to include M_j)
25 What EM does
Maximize the data likelihood by repeatedly estimating the unknowns and re-maximizing the expected log likelihood:
[Figure: data log likelihood log p(X | Θ) vs. successive parameter estimates Θ, climbing to a local optimum by alternating "estimate unknowns p(q_n | X, Θ)" and "adjust model params Θ to maximize expected log likelihood"]
26 EM for HMMs (2)
Expected log likelihood for an HMM:
    Σ_{all Q_k} p(Q_k | X, Θ_old) · log[p(X | Q_k, Θ) · p(Q_k | Θ)]
    = Σ_{i=1}^{S} p(q_1^i | X, Θ_old) · log p(q_1^i | Θ)
      + Σ_{n=1}^{N} Σ_{i=1}^{S} p(q_n^i | X, Θ_old) · log p(x_n | q_n^i, Θ)
      + Σ_{n=2}^{N} Σ_{i=1}^{S} Σ_{j=1}^{S} p(q_{n-1}^i, q_n^j | X, Θ_old) · log p(q_n^j | q_{n-1}^i, Θ)
- closed-form maximization by differentiation etc.
27 EM update equations
For the acoustic model (e.g. a 1-D Gaussian):
    µ_i = Σ_n p(q_n^i | X, Θ_old) · x_n / Σ_n p(q_n^i | X, Θ_old)
For the transition probabilities:
    p(q_n^j | q_{n-1}^i) = Σ_n p(q_{n-1}^i, q_n^j | X, Θ_old) / Σ_n p(q_{n-1}^i | X, Θ_old)
Normalized expectations, as for GMMs
Require the state occupancy probabilities p(q_n^i | X_1^N, Θ_old)
28 The forward-backward algorithm
We need p(q_n^i | X_1^N) for the EM updates (Θ implied)
The forward algorithm gave us α_n(i) = p(q_n^i, X_1^n)
- excludes the influence of the remaining data X_{n+1}^N
Hence, define
    β_n(i) = p(X_{n+1}^N | q_n^i)
so that
    α_n(i) · β_n(i) = p(q_n^i, X_1^N)
and then
    p(q_n^i | X_1^N) = α_n(i) · β_n(i) / Σ_j α_n(j) · β_n(j)
Recursive definition for β:
    β_n(i) = Σ_j β_{n+1}(j) · a_ij · b_j(x_{n+1})
- recurses backwards from the final state
29 Estimating a_ij from α & β
From the EM equations:
    a_ij^new = p(q_n^j | q_{n-1}^i) = Σ_n p(q_{n-1}^i, q_n^j | X, Θ_old) / Σ_n p(q_{n-1}^i | X, Θ_old)
- probability of the transition, normalized by the probability of being in the start state
Obtain the pairwise terms from:
    p(q_{n-1}^i, q_n^j | X, Θ_old) ∝ α_{n-1}(i) · a_ij · b_j(x_n) · β_n(j)
[Figure: trellis fragment q^i_{n-1} → q^j_n, showing α_{n-1}(i), the arc a_ij · b_j(x_n), and β_n(j)]
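Putting slides 27-29 together, here is a single Baum-Welch-style sketch (toy parameters assumed, as in the earlier sketches): it runs the forward and backward recursions, forms the occupancies γ_n(i) and pairwise posteriors ξ_n(i,j), and re-estimates a_ij:

```python
import numpy as np

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
b = np.array([[0.9, 0.1],      # b[n, i] = b_i(x_n)
              [0.4, 0.6],
              [0.2, 0.8]])
N, S = b.shape

alpha = np.zeros((N, S)); beta = np.ones((N, S))
alpha[0] = pi * b[0]
for n in range(1, N):
    alpha[n] = (alpha[n - 1] @ A) * b[n]           # forward
for n in range(N - 2, -1, -1):
    beta[n] = A @ (b[n + 1] * beta[n + 1])         # backward

px = alpha[-1].sum()                               # p(X | M)
gamma = alpha * beta / px                          # gamma[n, i] = p(q_n^i | X_1^N)

# xi[n, i, j] = p(q_n^i, q_{n+1}^j | X) = alpha_n(i) a_ij b_j(x_{n+1}) beta_{n+1}(j) / p(X)
xi = alpha[:-1, :, None] * A[None] * (b[1:] * beta[1:])[:, None, :] / px
A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # EM update of a_ij
print(A_new)
```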
30 GMM-HMMs in practice
GMMs as acoustic models: train by including the mixture indices as additional unknowns
- just more complicated equations...
Practical GMMs:
- 9 to 39 feature dimensions
- 2 to 64 Gaussians per mixture, depending on the number of training examples
Training constraint: define the state sequence
- decide words & pronunciations for the training data
- EM will then figure out the boundaries
Lots of data → can model more classes
- e.g. context-independent (CI): q_i = ae aa ax ...
  vs. context-dependent (CD): q_i = b-ae-b b-ae-k ...
31 HMM training in practice
EM only finds a local optimum → critically dependent on initialization
- approximate parameters / rough alignment
[Figure: model inventory (ae_1, ae_2, ae_3, dh_1, dh_2, ...) plus labelled training data ("dh ax k ae t s ae t aa n") → uniform initialization alignments → initialization parameters Θ_init]
Repeat until convergence:
- E-step: probabilities of the unknowns, p(q_n^i | X_1^N, Θ_old)
- M-step: maximize via the parameters, Θ: max E[log p(X, Q | Θ)]
Applicable to more than just words...
32 Outline
- Dynamic Time Warp
- Hidden Markov Models
- HMM training
- Discriminant models
  - Being discriminant
  - Neural-net recognizers
33 4 Discriminant models
EM training of HMMs is maximum likelihood
- i.e. locally maximizes p(X_trn | Θ)
- converges to the Bayes optimum ... in the limit
The decision rule is max p(X | M) · p(M):
- training will increase p(X | M_correct)
- may also increase p(X | M_wrong) (more!)
Discriminant training tries directly to increase the discrimination between right & wrong models
- e.g. Maximum Mutual Information (MMI):
    I(M_j, X | Θ) = log [ p(M_j, X | Θ) / (p(M_j | Θ) · p(X | Θ)) ]
                  = log [ p(X | M_j, Θ) / Σ_k p(X | M_k, Θ) · p(M_k | Θ) ]
34 Neural Network Acoustic Models
A single model generates posteriors directly for all classes at once = frame-discriminant
Use a regular HMM decoder for recognition:
- set b_i(x_n) = p(x_n | q_i) ∝ p(q_i | x_n) / p(q_i)
Nets are less sensitive to the input representation:
- skewed feature distributions
- correlated features
Can use a temporal context window to let the net see feature dynamics:
[Figure: feature calculation giving C_0, C_1, C_2, ..., C_k over frames t_n ... t_{n+w}, fed to a net producing posteriors p(q_i | X) for phone classes h#, pcl, bcl, tcl, dcl, ...]
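The posterior-to-scaled-likelihood step is a one-liner; a sketch with assumed toy posteriors and priors follows. The per-frame p(x_n) term is dropped since it is constant across states at each frame and so cannot change the decoded path:

```python
import numpy as np

# Hybrid trick: turn net posteriors p(q_i | x_n) into scaled likelihoods
# b_i(x_n) ∝ p(q_i | x_n) / p(q_i) for the HMM decoder.  The values below
# are assumed toy numbers; in practice the priors are the relative
# frequencies of each phone class in the training labels.
posteriors = np.array([[0.7, 0.2, 0.1],    # one row of p(q_i | x_n) per frame
                       [0.1, 0.8, 0.1]])
priors = np.array([0.5, 0.3, 0.2])         # p(q_i)
log_scaled_lik = np.log(posteriors) - np.log(priors)   # log b_i(x_n) + const
print(log_scaled_lik)
```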
35 Forced alignment
Best (Viterbi) labeling given an existing classifier, constrained by the known word sequence
[Figure: feature vectors → existing classifier → phone posterior probabilities; known word sequence + dictionary ("ow th r iy ... ow th r iy n s") → constrained alignment → training targets ("ow th r iy") → classifier training]
36 Neural nets: Practicalities
Typical net sizes:
- input layer: 9 frames × 9-40 features ≈ 300 units
- hidden layer: number of units depends on the training data
- output layer: context-independent phones
Hard to make context dependent
- problems training many classes that are similar?
The representation is opaque:
[Figure: Input → Hidden weights for hidden unit #187 (feature index vs. time frame), and Hidden → Output weights into the output layer (phones)]
37 Recap: Recognizer Structure
[Block diagram: sound → Feature calculation → feature vectors → Acoustic classifier (Network weights) → phone probabilities → HMM decoder (Word models, Language model) → phone & word labeling]
- Know how to execute each stage
- Know how to train each stage ... except the language/word models
38 Summary
Speech is modeled as a sequence of features:
- need a temporal aspect to recognition
- best time-alignment of templates = DTW
Hidden Markov models are the rigorous solution:
- self-loops allow temporal dilation
- exact, efficient likelihood calculations
HMMs can be trained via EM:
- guaranteed to maximize the data likelihood (locally)
- acoustic models (GMMs) can be trained too
Discriminant training considers the wrong models:
- recognition requires improved margin
- frame-discriminant neural nets give better model discrimination?