Techniques for Large-Scale Parsing


1 Techniques for Large-Scale Parsing: Discriminative Models
Joakim Nivre
Uppsala University, Department of Linguistics and Philology

2 Generative Models
A generative statistical model defines the joint probability $P(x, y)$ of input $x$ and output $y$.
Pros:
- Learning problems have closed-form solutions
- Related probabilities can be derived:
  - Conditionalization: $P(y \mid x) = \frac{P(x, y)}{P(x)}$
  - Marginalization: $P(x) = \sum_{y} P(x, y)$
Cons:
- Rigid independence assumptions (or intractable parsing)
- Indirect modeling of the parsing problem

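The two derived quantities listed above can be illustrated on a tiny joint table. This is only a sketch: the sentence identifiers, tree names, and probabilities are invented for illustration and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical joint distribution P(x, y) over (sentence, analysis) pairs.
joint = {
    ("s1", "treeA"): 0.3,
    ("s1", "treeB"): 0.1,
    ("s2", "treeC"): 0.6,
}

def marginal_x(joint):
    """Marginalization: P(x) = sum over y of P(x, y)."""
    p_x = defaultdict(float)
    for (x, _y), p in joint.items():
        p_x[x] += p
    return dict(p_x)

def conditional(joint):
    """Conditionalization: P(y | x) = P(x, y) / P(x)."""
    p_x = marginal_x(joint)
    return {(x, y): p / p_x[x] for (x, y), p in joint.items()}

p_x = marginal_x(joint)
p_y_given_x = conditional(joint)
```

Going the other way is not possible: from `p_y_given_x` alone there is no way to recover `joint` or `p_x`, which is the limitation of discriminative models noted on the next slide.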
3 Discriminative Models
A discriminative statistical model defines the conditional probability $P(y \mid x)$ of output $y$ given input $x$.
Pros:
- No rigid independence assumptions
- More direct modeling of the parsing problem
Cons:
- Learning problems require numerical approximation
- Related probabilities cannot be derived:
  - No way to compute $P(x, y)$ from $P(y \mid x)$
  - No way to compute $P(x)$ or $P(y)$ from $P(y \mid x)$

4 Conditional and Discriminative Models
Subdivision of discriminative models:
Conditional model:
- Explicitly model the conditional probability $P(y \mid x)$
- Use the model in the mapping $X \to Y$: $\operatorname{argmax}_{y} P(y \mid x)$
Purely discriminative model:
- Directly optimize the mapping $X \to Y$
- No explicit model of the conditional probability $P(y \mid x)$

5 Local and Global Models
Local discriminative models:
- Maximize probability (or accuracy) of local decisions in the derivation of analysis $y$ given input $x$
- Try to find a globally optimal solution by making a sequence of locally optimal decisions
Global discriminative models:
- Maximize probability (or accuracy) of the complete analysis $y$ given input $x$
Examples:
- Local: Transition-based dependency parsing
- Global: Graph-based dependency parsing

6 Local Discriminative Models
Conditional history-based model:
$P(y \mid x) = \prod_{i=1}^{m} P(d_i \mid \Phi(d_1, \ldots, d_{i-1}, x))$
Probabilities can be conditioned on properties of the input.
For example: lookahead in left-to-right derivations.
Compare the generative model:
$P(x, y) = \prod_{i=1}^{m} P(d_i \mid \Phi(d_1, \ldots, d_{i-1}))$

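The conditional history-based factorization above can be sketched as a sum of local log-probabilities. The feature function `phi`, the toy table in `local_prob`, and the action names SHIFT and REDUCE are assumptions made for this illustration, not part of the slides.

```python
import math

def phi(history, x):
    """Hypothetical feature function: previous decision plus one token of lookahead."""
    prev = history[-1] if history else "<start>"
    i = len(history)
    lookahead = x[i] if i < len(x) else "<end>"
    return (prev, lookahead)

def local_prob(decision, context):
    """Toy local model P(d_i | context); unseen contexts fall back to uniform."""
    table = {("<start>", "the"): {"SHIFT": 0.9, "REDUCE": 0.1}}
    return table.get(context, {"SHIFT": 0.5, "REDUCE": 0.5})[decision]

def sequence_log_prob(decisions, x):
    """log P(y | x) = sum_i log P(d_i | phi(d_1..d_{i-1}, x))."""
    total = 0.0
    for i, d in enumerate(decisions):
        total += math.log(local_prob(d, phi(decisions[:i], x)))
    return total

log_p = sequence_log_prob(["SHIFT", "REDUCE"], ["the", "dog"])
```

Because `phi` receives `x`, each decision can look ahead in the input, which is what the generative variant (conditioning on the history only) cannot do.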
7 Parsing Model
GEN(x):
- Defined by a derivational process (for example, a transition system)
EVAL(y):
- Score local decisions, conditioned on input and history
- Combine local scores into global scores

8 Inference
Local discriminative models typically use greedy inference:
- Deterministic best-first search
- Beam search with an agenda of the $k$ best hypotheses
Properties:
- Very efficient
- Reasonably accurate thanks to lookahead
- No guarantee that the globally best solution is found

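The difference between deterministic search and beam search can be sketched with one function, where a beam width of k = 1 corresponds to greedy decoding. The action set and the toy probabilities are invented so that the greedy choice is not the globally best sequence.

```python
import math

def toy_log_prob(action, history, x):
    """Hypothetical local model: 'A' looks best first, but 'B' leads to a better path."""
    if not history:
        p = {"A": 0.6, "B": 0.4}[action]
    elif history[-1] == "A":
        p = 0.5
    else:
        p = {"A": 0.9, "B": 0.1}[action]
    return math.log(p)

def beam_search(x, actions, local_log_prob, n_steps, k=1):
    """Keep the k best partial derivations after each step; k=1 is greedy search."""
    beam = [(0.0, [])]
    for _ in range(n_steps):
        candidates = []
        for score, history in beam:
            for a in actions:
                candidates.append((score + local_log_prob(a, history, x), history + [a]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:k]
    return beam[0]

greedy_score, greedy_seq = beam_search(None, ["A", "B"], toy_log_prob, n_steps=2, k=1)
beam_score, beam_seq = beam_search(None, ["A", "B"], toy_log_prob, n_steps=2, k=2)
```

In this toy setting the greedy search commits to A at the first step and ends with probability 0.3, while the beam of width 2 recovers the sequence B, A with probability 0.36. Neither setting guarantees the global optimum in general.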
9 Learning
Learning problem: local decision, conditioned on input and history
Conditional:
- Estimate $P(d_i \mid \Phi(d_1, \ldots, d_{i-1}, x))$
- Training: Conditional MLE (more later)
Purely discriminative:
- Optimize the mapping $\Phi(d_1, \ldots, d_{i-1}, x) \to d_i$
- Training: Any classifier (SVM, Perceptron, CMLE, ...)

10 Evaluation Criteria
- Robustness: Yes (same as generative history-based models)
- Disambiguation: Yes, thanks to probability model or classifier
- Accuracy: Sometimes state of the art
- Efficiency: Very good (often linear complexity)

11 Global Discriminative Models
Global models:
- No specific factorization of $P(y \mid x)$
- Features can be defined over arbitrary substructures
- Training optimizes probability/accuracy of global structures
In practice, some shortcuts are always necessary:
- Restricted scope of features
- Approximate inference

12 Parsing Model
GEN(x):
- Formal grammar:
  - HPSG [Toutanova et al. 2002, Miyao et al. 2003]
  - LFG [Riezler et al. 2002, Kaplan et al. 2004]
  - CCG [Clark and Curran 2004]
- Generative statistical parser (reranking) [Charniak and Johnson 2005]
- All possible trees over some alphabets [Taskar et al. 2004, McDonald et al. 2005]
EVAL(y):
- Score related to $P(y \mid x)$

13 Inference
Exact parsing with a global model is intractable.
Strategy 1: Use dynamic programming (chart parsing)
- Restrict feature scope
- Example: Graph-based dependency parsing (projective)
Strategy 2: Use an independent generative component
- Restrict GEN(x) to make exact inference feasible
- Examples:
  - Grammar-driven parsers for HPSG, LFG, CCG
  - Reranking parsers using a k-best list from a statistical parser

14 Learning 1: Linear Classifiers
Score $S(x, y)$ defined as the inner product of two vectors:
$S(x, y) = f(x, y) \cdot w = \sum_{i=1}^{k} w_i f_i(x, y)$
- Feature vector: $f(x, y) = \langle f_1(x, y), \ldots, f_k(x, y) \rangle$
- Weight vector: $w = \langle w_1, \ldots, w_k \rangle$
Note:
- Each $f_i(x, y)$ is a numerical feature of $x$ and $y$
- Each $w_i$ is a real-valued weight for $f_i(x, y)$:
  - Positive if $f_i(x, y)$ tends to occur in good trees
  - Negative if $f_i(x, y)$ tends to occur in bad trees
- $S(x, y)$ summarizes the evidence of all (non-zero) features

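The inner-product score above is usually computed over sparse features, since most features are zero for any given analysis. A minimal sketch, with invented feature names and weights:

```python
def score(features, weights):
    """S(x, y) = sum_i w_i * f_i(x, y), iterating only over non-zero features."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Hypothetical non-zero features f_i(x, y) of one candidate analysis.
features = {"head=saw,dep=dog": 1.0, "dist=2": 1.0}

# Hypothetical weight vector; "unused" does not fire and contributes nothing.
weights = {"head=saw,dep=dog": 1.5, "dist=2": -0.5, "unused": 3.0}

s = score(features, weights)
```

Storing both vectors as dictionaries keyed by feature name keeps the cost proportional to the number of features that actually fire, not to k.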
15 Learning 2: Discriminative Training
Find weights that maximize accuracy on the training set.
Training criterion (for all $y$ in the training set):
$y = \operatorname{argmax}_{y' \in GEN(x)} f(x, y') \cdot w$
Examples:
- Perceptron learning
- Support vector machines

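As one instance of this training criterion, a structured perceptron can be sketched as follows. The candidate generator `gen`, the feature function `feats`, and the two-example training set are hypothetical stand-ins; a real parser would enumerate or search over trees instead.

```python
def score(features, weights):
    """Inner product of a sparse feature vector with the weight vector."""
    return sum(weights.get(n, 0.0) * v for n, v in features.items())

def gen(x):
    """Hypothetical GEN(x): every input has the same two candidate analyses."""
    return ["left", "right"]

def feats(x, y):
    """Hypothetical f(x, y): one indicator feature per (input, analysis) pair."""
    return {f"{x}:{y}": 1.0}

def predict(x, w):
    """argmax over GEN(x) of f(x, y') . w"""
    return max(gen(x), key=lambda y: score(feats(x, y), w))

def perceptron_train(data, epochs=5):
    """On each error, add the gold features and subtract the predicted ones."""
    w = {}
    for _ in range(epochs):
        for x, gold in data:
            pred = predict(x, w)
            if pred != gold:
                for n, v in feats(x, gold).items():
                    w[n] = w.get(n, 0.0) + v
                for n, v in feats(x, pred).items():
                    w[n] = w.get(n, 0.0) - v
    return w

data = [("a", "left"), ("b", "right")]
w = perceptron_train(data)
```

The update only fires when the current argmax differs from the gold analysis, so training stops changing the weights once the criterion holds on the whole training set.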
16 Learning 3: Log-Linear Models
Transform the score to a conditional probability:
$P(y \mid x) = \frac{\exp[f(x, y) \cdot w]}{\sum_{y' \in GEN(x)} \exp[f(x, y') \cdot w]}$
Note:
- $\exp[f(x, y) \cdot w] > 0$
- $\exp[f(x, y) \cdot w] \le \sum_{y' \in GEN(x)} \exp[f(x, y') \cdot w]$
- If $\exp a = b$, then $a = \log b$
- $\log ab = \log a + \log b$
- The linear sum of products corresponds to the log of the probability

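The transformation from scores to conditional probabilities can be sketched as a normalized exponential over the candidates in GEN(x). The scores below are invented; subtracting the maximum score before exponentiating is a common numerical safeguard and does not change the result.

```python
import math

def log_linear_probs(scores):
    """P(y | x) = exp(S(x, y)) / sum over y' of exp(S(x, y'))."""
    m = max(scores.values())  # shift for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

# Hypothetical scores S(x, y) = f(x, y) . w for three candidate analyses.
scores = {"y1": 2.0, "y2": 1.0, "y3": 0.0}
probs = log_linear_probs(scores)
```

Each probability is positive and the values sum to one, matching the two inequalities noted above; the ranking of candidates is the same as under the raw scores.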
17 Learning 4: Conditional MLE
Joint MLE:
- Find the estimate that maximizes $P(x, y)$ for the training set
- Easy: Relative frequencies (analytical, closed-form solution)
Conditional MLE:
- Find the estimate that maximizes $P(y \mid x)$ for the training set
- Hard: No closed-form solution
Numerical optimization:
- Many methods: Iterative scaling, gradient descent, ...
- Computationally intensive
- Guaranteed to converge to the global maximum

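One of the numerical methods mentioned above, plain gradient ascent on the conditional log-likelihood of a log-linear model, can be sketched as follows. The gradient for each training pair is the gold feature vector minus the expected feature vector under the current model. The data, `gen`, `feats`, learning rate, and iteration count are assumptions chosen for illustration.

```python
import math

def gen(x):
    """Hypothetical GEN(x): two candidate analyses for every input."""
    return ["left", "right"]

def feats(x, y):
    """Hypothetical f(x, y): one indicator feature per (input, analysis) pair."""
    return {f"{x}:{y}": 1.0}

def cond_probs(x, w):
    """Log-linear P(y | x) over GEN(x)."""
    s = {y: sum(w.get(n, 0.0) * v for n, v in feats(x, y).items()) for y in gen(x)}
    m = max(s.values())
    e = {y: math.exp(v - m) for y, v in s.items()}
    z = sum(e.values())
    return {y: v / z for y, v in e.items()}

def cmle_train(data, lr=0.5, iters=200):
    """Batch gradient ascent: gradient = observed features - expected features."""
    w = {}
    for _ in range(iters):
        grad = {}
        for x, gold in data:
            for n, v in feats(x, gold).items():
                grad[n] = grad.get(n, 0.0) + v
            for y, p in cond_probs(x, w).items():
                for n, v in feats(x, y).items():
                    grad[n] = grad.get(n, 0.0) - p * v
        for n, g in grad.items():
            w[n] = w.get(n, 0.0) + lr * g
    return w

# Input "a" is analysed as "left" twice and "right" once; "b" is always "right".
data = [("a", "left"), ("a", "left"), ("a", "right"), ("b", "right")]
w = cmle_train(data)
p_a = cond_probs("a", w)
p_b = cond_probs("b", w)
```

With these indicator features the conditional estimate for input "a" approaches the relative frequency 2/3, while the weights for "b" keep growing because its gold analysis is never contradicted; in practice a regularization term is often added to keep the weights bounded.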
18 Evaluation Criteria
- Robustness: Depends on GEN(x)
- Disambiguation: Yes, same as other statistical models
- Efficiency: Not so good, especially during training
- Accuracy: Currently the state of the art

19 Summary
Discriminative models focus on the conditional distribution $P(y \mid x)$.
Pros:
- No rigid independence assumptions, hence more global features
- Easy to combine with different base parsers
Cons:
- Learning requires computationally intensive numerical methods
- Inference is often intractable and requires approximation
Log-linear models are most widely used.

20 References and Further Reading
Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL).
Stephen Clark and James R. Curran. 2004. Parsing the WSJ using CCG and log-linear models. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL).
Ronald M. Kaplan, Stefan Riezler, Tracy Holloway King, John T. Maxwell III, Alexander Vasserman, and Richard Crouch. 2004. Speed and accuracy in shallow and deep stochastic parsing. In Proceedings of Human Language Technology and the Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL).
Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL).
Yusuke Miyao, T. Ninomiya, and Jun'ichi Tsujii. 2003. Probabilistic modeling of argument structures including non-local dependencies. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP).
Stefan Riezler, Tracy H. King, Ronald M. Kaplan, Richard Crouch, John T. Maxwell III, and Mark Johnson. 2002. Parsing the Wall Street Journal using a Lexical-Functional Grammar and discriminative estimation techniques. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL).
Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning. 2004. Max-margin parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1-8.
Kristina Toutanova, Christopher D. Manning, Stuart M. Shieber, Dan Flickinger, and Stephan Oepen. 2002. Parse disambiguation for a rich HPSG grammar. In Proceedings of the 1st Workshop on Treebanks and Linguistic Theories (TLT).
A* CCG Parsing with a Supertagfactored Model Mike Lewis School of Informatics University of Edinburgh Edinburgh, EH8 9AB, UK mike.lewis@ed.ac.uk Mark Steedman School of Informatics University of Edinburgh
More informationNatural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression
Natural Language Processing Lecture 13 10/6/2015 Jim Martin Today Multinomial Logistic Regression Aka loglinear models or maximum entropy (maxent) Components of the model Learning the parameters 10/1/15
More informationNbest. Nbest Reranking Using Optimal Phrase Alignment for Statistical Machine Translation
Vol. 51 No. 8 1443 1451 (Aug. 2010) Nbest 1 2 3 4 1 Nbest Reranking Using Optimal Phrase Alignment for Statistical Machine Translation Mitsuru Koshikawa, 1 Masao Utiyama, 2 Shunji Umetani, 3 Tomomi Matsui
More informationMingWei Chang. Machine learning and its applications to natural language processing, information retrieval and data mining.
MingWei Chang 201 N Goodwin Ave, Department of Computer Science University of Illinois at UrbanaChampaign, Urbana, IL 61801 +1 (917) 3456125 mchang21@uiuc.edu http://flake.cs.uiuc.edu/~mchang21 Research
More informationAutomatic Detection and Correction of Errors in Dependency Treebanks
Automatic Detection and Correction of Errors in Dependency Treebanks Alexander Volokh DFKI Stuhlsatzenhausweg 3 66123 Saarbrücken, Germany alexander.volokh@dfki.de Günter Neumann DFKI Stuhlsatzenhausweg
More informationA CCG Parsing with a Supertagfactored Model
A CCG Parsing with a Supertagfactored Model Mike Lewis School of Informatics University of Edinburgh Edinburgh, EH8 9AB, UK mike.lewis@ed.ac.uk Mark Steedman School of Informatics University of Edinburgh
More informationLecture 6: Logistic Regression
Lecture 6: CS 19410, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,
More informationWhy language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles
Why language is hard And what Linguistics has to say about it Natalia Silveira Participation code: eagles Christopher Natalia Silveira Manning Language processing is so easy for humans that it is like
More informationRuntime Hardware Reconfiguration using Machine Learning
Runtime Hardware Reconfiguration using Machine Learning Tanmay Gangwani University of Illinois, UrbanaChampaign gangwan2@illinois.edu Abstract Tailoring the machine hardware to varying needs of the software
More informationLarge Scale Learning to Rank
Large Scale Learning to Rank D. Sculley Google, Inc. dsculley@google.com Abstract Pairwise learning to rank methods such as RankSVM give good performance, but suffer from the computational burden of optimizing
More informationLinear Models for Classification
Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci
More information