# Techniques for Large-Scale Parsing (Tekniker för storskalig parsning)

## Transcription

Techniques for Large-Scale Parsing (Tekniker för storskalig parsning): Discriminative Models

Joakim Nivre, Uppsala University, Department of Linguistics and Philology

### Generative Models

A generative statistical model defines the joint probability $P(x, y)$ of input $x$ and output $y$.

Pros:

- Learning problems have closed-form solutions
- Related probabilities can be derived:
  - Conditionalization: $P(y \mid x) = \frac{P(x, y)}{P(x)}$
  - Marginalization: $P(x) = \sum_{y} P(x, y)$

Cons:

- Rigid independence assumptions (or intractable parsing)
- Indirect modeling of the parsing problem
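As a quick numerical check of conditionalization and marginalization, here is a minimal sketch over a made-up joint table. The inputs `s1`, `s2`, the analyses `t1`, `t2`, and all probabilities are hypothetical and not taken from the slides.

```python
from collections import defaultdict

# Hypothetical joint distribution P(x, y) over two inputs and two analyses.
joint = {
    ("s1", "t1"): 0.3,
    ("s1", "t2"): 0.1,
    ("s2", "t1"): 0.2,
    ("s2", "t2"): 0.4,
}


def marginal_x(joint):
    """Marginalization: P(x) = sum over y of P(x, y)."""
    p_x = defaultdict(float)
    for (x, _y), p in joint.items():
        p_x[x] += p
    return dict(p_x)


def conditional(joint, x, y):
    """Conditionalization: P(y | x) = P(x, y) / P(x)."""
    return joint[(x, y)] / marginal_x(joint)[x]


p_x = marginal_x(joint)
p_t1_given_s1 = conditional(joint, "s1", "t1")
```

Going the other way is not possible: from `p_t1_given_s1` alone there is no way to recover the entries of `joint`.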

### Discriminative Models

A discriminative statistical model defines the conditional probability $P(y \mid x)$ of output $y$ given input $x$.

Pros:

- No rigid independence assumptions
- More direct modeling of the parsing problem

Cons:

- Learning problems require numerical approximation
- Related probabilities cannot be derived:
  - No way to compute $P(x, y)$ from $P(y \mid x)$
  - No way to compute $P(x)$ or $P(y)$ from $P(y \mid x)$

### Conditional and Discriminative Models

Subdivision of discriminative models:

- Conditional model:
  - Explicitly model the conditional probability $P(y \mid x)$
  - Use the model in the mapping $X \to Y$: $\arg\max_{y} P(y \mid x)$
- Purely discriminative model:
  - Directly optimize the mapping $X \to Y$
  - No explicit model of the conditional probability $P(y \mid x)$

### Local and Global Models

Local discriminative models:

- Maximize the probability (or accuracy) of local decisions in the derivation of analysis $y$ given input $x$
- Try to find the globally optimal solution by making a sequence of locally optimal decisions

Global discriminative models:

- Maximize the probability (or accuracy) of the complete analysis $y$ given input $x$

Examples:

- Local: transition-based dependency parsing
- Global: graph-based dependency parsing

### Local Discriminative Models

Conditional history-based model:

$$P(y \mid x) = \prod_{i=1}^{m} P(d_i \mid \Phi(d_1, \ldots, d_{i-1}, x))$$

Probabilities can be conditioned on properties of the input, for example lookahead in left-to-right derivations.

Compare the generative model:

$$P(x, y) = \prod_{i=1}^{m} P(d_i \mid \Phi(d_1, \ldots, d_{i-1}))$$
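The conditional history-based product can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the two decisions `SHIFT` and `REDUCE`, the function `phi` (a stand-in for $\Phi$ that looks at the last decision and one token of lookahead), and the probabilities in `toy_local_prob`.

```python
import math


def phi(history, x):
    """Hypothetical Phi: last decision plus the next input token (lookahead)."""
    last = history[-1] if history else "<start>"
    i = len(history)
    lookahead = x[i] if i < len(x) else "<end>"
    return (last, lookahead)


def toy_local_prob(d, state):
    """Made-up local model P(d | state) over the decisions SHIFT and REDUCE."""
    _last, lookahead = state
    p_shift = 0.8 if lookahead != "<end>" else 0.1
    return p_shift if d == "SHIFT" else 1.0 - p_shift


def derivation_log_prob(decisions, x, local_prob):
    """log P(y | x) = sum_i log P(d_i | Phi(d_1, ..., d_{i-1}, x))."""
    total = 0.0
    for i, d in enumerate(decisions):
        total += math.log(local_prob(d, phi(decisions[:i], x)))
    return total


x = ["a", "b"]
log_p = derivation_log_prob(["SHIFT", "SHIFT", "REDUCE"], x, toy_local_prob)
```

Because `phi` receives `x`, each local decision can inspect upcoming input; in the generative counterpart the conditioning context would contain the history only.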

### Parsing Model

- $\mathrm{GEN}(x)$: defined by a derivational process (for example, a transition system)
- $\mathrm{EVAL}(y)$:
  - Score local decisions, conditioned on input and history
  - Combine local scores into global scores

### Inference

Local discriminative models typically use greedy inference:

- Deterministic best-first search
- Beam search with an agenda of the $k$ best hypotheses

Properties:

- Very efficient
- Reasonably accurate thanks to lookahead
- No guarantee that the globally best solution is found
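The difference between deterministic greedy search and beam search can be illustrated with a small sketch. The function `beam_search`, the two actions `A` and `B`, and the probabilities in `toy_log_prob` are all hypothetical; they are chosen so that the locally best first decision leads to a worse complete derivation.

```python
import math


def toy_log_prob(action, history, x):
    """Made-up local model; x is accepted but unused in this toy example."""
    if not history:
        table = {"A": 0.6, "B": 0.4}
    elif history[-1] == "A":
        table = {"A": 0.55, "B": 0.45}
    else:
        table = {"A": 0.9, "B": 0.1}
    return math.log(table[action])


def beam_search(x, actions, local_log_prob, n_steps, k=1):
    """Keep the k best partial derivations after each step (k=1 is greedy)."""
    beam = [(0.0, [])]  # (log-probability so far, decision history)
    for _ in range(n_steps):
        candidates = []
        for score, history in beam:
            for a in actions:
                new_score = score + local_log_prob(a, history, x)
                candidates.append((new_score, history + [a]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:k]
    return beam[0]


x = ["w1", "w2"]
greedy_score, greedy_path = beam_search(x, ["A", "B"], toy_log_prob, 2, k=1)
beam_score, beam_path = beam_search(x, ["A", "B"], toy_log_prob, 2, k=2)
```

With `k=1` the search commits to `A` first and ends at probability 0.33; with `k=2` the hypothesis starting with `B` survives and wins with 0.36. A larger beam reduces search errors but still gives no optimality guarantee in general.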

### Learning

Learning problem: local decision, conditioned on input and history.

- Conditional: estimate $P(d_i \mid \Phi(d_1, \ldots, d_{i-1}, x))$
  - Training: conditional MLE (more later)
- Purely discriminative: optimize the mapping $\Phi(d_1, \ldots, d_{i-1}, x) \to d_i$
  - Training: any classifier (SVM, perceptron, CMLE, ...)

### Evaluation Criteria

- Robustness: yes (same as generative history-based models)
- Disambiguation: yes, thanks to the probability model or classifier
- Accuracy: sometimes state of the art
- Efficiency: very good (often linear complexity)

### Global Discriminative Models

Global models:

- No specific factorization of $P(y \mid x)$
- Features can be defined over arbitrary substructures
- Training optimizes the probability/accuracy of global structures

In practice, some shortcuts are always necessary:

- Restricted scope of features
- Approximate inference

### Parsing Model

- $\mathrm{GEN}(x)$:
  - Formal grammar:
    - HPSG [Toutanova et al. 2002, Miyao et al. 2003]
    - LFG [Riezler et al. 2002, Kaplan et al. 2004]
    - CCG [Clark and Curran 2004]
  - Generative statistical parser (reranking) [Charniak and Johnson 2005]
  - All possible trees over some alphabets [Taskar et al. 2004, McDonald et al. 2005]
- $\mathrm{EVAL}(y)$:
  - Score related to $P(y \mid x)$

### Inference

Exact parsing with a global model is intractable.

- Strategy 1: Use dynamic programming (chart parsing)
  - Restrict feature scope
  - Example: graph-based dependency parsing (projective)
- Strategy 2: Use an independent generative component
  - Restrict $\mathrm{GEN}(x)$ to make exact inference feasible
  - Examples:
    - Grammar-driven parsers for HPSG, LFG, CCG
    - Reranking parsers using a $k$-best list from a statistical parser

### Learning 1: Linear Classifiers

The score $S(x, y)$ is defined as the inner product of two vectors:

$$S(x, y) = \mathbf{f}(x, y) \cdot \mathbf{w} = \sum_{i=1}^{k} w_i f_i(x, y)$$

- Feature vector: $\mathbf{f}(x, y) = \langle f_1(x, y), \ldots, f_k(x, y) \rangle$
- Weight vector: $\mathbf{w} = \langle w_1, \ldots, w_k \rangle$

Note:

- Each $f_i(x, y)$ is a numerical feature of $x$ and $y$
- Each $w_i$ is a real-valued weight for $f_i(x, y)$:
  - Positive if $f_i(x, y)$ tends to occur in good trees
  - Negative if $f_i(x, y)$ tends to occur in bad trees
- $S(x, y)$ summarizes the evidence of all (non-zero) features
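Since only non-zero features contribute to $S(x, y)$, the inner product is usually computed over a sparse representation. The following is a minimal sketch under that assumption; the feature names and weight values are invented for illustration.

```python
def score(features, weights):
    """S(x, y) = f(x, y) . w, summing only over the non-zero features."""
    return sum(value * weights.get(name, 0.0) for name, value in features.items())


# Hypothetical sparse feature vector f(x, y) for one candidate tree.
features = {"head=saw,dep=dog": 1.0, "distance=2": 1.0}

# Hypothetical weights: positive for evidence of good trees, negative for bad.
weights = {"head=saw,dep=dog": 1.5, "distance=2": -0.5, "head=dog,dep=saw": -2.0}

s = score(features, weights)
```

Features absent from `features` are treated as zero, and features without a learned weight contribute nothing.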

### Learning 2: Discriminative Training

Find weights that maximize accuracy on the training set.

Training criterion (for all $y$ in the training set):

$$y = \arg\max_{y' \in \mathrm{GEN}(x)} \mathbf{f}(x, y') \cdot \mathbf{w}$$

Examples:

- Perceptron learning
- Support vector machines
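Perceptron learning pursues this criterion with simple additive updates: whenever the highest-scoring candidate differs from the gold analysis, the weights move toward the gold features and away from the predicted ones. The sketch below is one common formulation, not necessarily the variant used in the lecture; `gen`, `feats`, the labels `N` and `V`, and the two training pairs are hypothetical stand-ins for a real candidate generator and feature function.

```python
def score(features, weights):
    """Inner product f(x, y) . w over sparse features."""
    return sum(v * weights.get(name, 0.0) for name, v in features.items())


def gen(x):
    """Hypothetical GEN(x): every input has the same two candidate analyses."""
    return ["N", "V"]


def feats(x, y):
    """Hypothetical feature function: one indicator per (input, analysis) pair."""
    return {f"{x}|{y}": 1.0}


def predict(x, weights):
    """argmax over GEN(x) of f(x, y') . w (ties go to the first candidate)."""
    return max(gen(x), key=lambda y: score(feats(x, y), weights))


def perceptron_train(data, epochs=5):
    weights = {}
    for _ in range(epochs):
        for x, gold in data:
            pred = predict(x, weights)
            if pred != gold:
                # Reward the gold analysis, penalize the wrong prediction.
                for name, v in feats(x, gold).items():
                    weights[name] = weights.get(name, 0.0) + v
                for name, v in feats(x, pred).items():
                    weights[name] = weights.get(name, 0.0) - v
    return weights


data = [("dog", "N"), ("runs", "V")]
w = perceptron_train(data)
```

After training, `predict` returns the gold analysis for both training inputs, which is exactly the criterion stated above.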

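### Learning 3: Log-Linear Models

Transform the score to a conditional probability:

$$P(y \mid x) = \frac{\exp[\mathbf{f}(x, y) \cdot \mathbf{w}]}{\sum_{y' \in \mathrm{GEN}(x)} \exp[\mathbf{f}(x, y') \cdot \mathbf{w}]}$$

Note:

- $\exp[\mathbf{f}(x, y) \cdot \mathbf{w}] > 0$
- $\exp[\mathbf{f}(x, y) \cdot \mathbf{w}] \le \sum_{y' \in \mathrm{GEN}(x)} \exp[\mathbf{f}(x, y') \cdot \mathbf{w}]$
- If $\exp a = b$, then $a = \log b$
- $\log ab = \log a + \log b$
- Linear sum of products corresponds to log of probability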
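The transformation from scores to conditional probabilities can be sketched as follows. The candidate names and scores are made up, and subtracting the maximum score before exponentiating is a standard numerical-stability trick that is not part of the slide.

```python
import math


def conditional_probs(scores):
    """Map each candidate y in GEN(x) to P(y | x) = exp(S(x, y)) / Z(x)."""
    m = max(scores.values())  # shifting by a constant leaves the ratios unchanged
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


# Hypothetical linear scores S(x, y) for three candidate analyses of one input.
scores = {"t1": 2.0, "t2": 0.0, "t3": -1.0}
probs = conditional_probs(scores)
best = max(probs, key=probs.get)
```

Each value lies strictly between 0 and 1, the values sum to 1, and the ranking of candidates is the same as under the raw scores.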

### Learning 4: Conditional MLE

- Joint MLE: find the estimate that maximizes $P(x, y)$ for the training set
  - Easy: relative frequencies (analytical, closed-form solution)
- Conditional MLE: find the estimate that maximizes $P(y \mid x)$ for the training set
  - Hard: no closed-form solution
  - Numerical optimization:
    - Many methods: iterative scaling, gradient descent, ...
    - Computationally intensive
    - Guaranteed to converge to the global maximum
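For a log-linear model, the gradient of the conditional log-likelihood is the observed feature vector minus the feature vector expected under the current model, which makes a gradient-based sketch short. The code below uses plain batch gradient ascent with a fixed step size on a toy problem; `gen`, `feats`, the data, the step size, and the number of steps are all assumptions for illustration, and practical systems typically use more sophisticated optimizers and regularization.

```python
import math


def gen(x):
    """Hypothetical GEN(x): two candidate analyses for every input."""
    return ["N", "V"]


def feats(x, y):
    """Hypothetical feature function: one indicator per (input, analysis) pair."""
    return {f"{x}|{y}": 1.0}


def probs(x, w):
    """Log-linear P(y | x) for every y in GEN(x)."""
    scores = {y: sum(v * w.get(n, 0.0) for n, v in feats(x, y).items())
              for y in gen(x)}
    m = max(scores.values())
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def log_likelihood(data, w):
    """Conditional log-likelihood: sum of log P(y | x) over the training set."""
    return sum(math.log(probs(x, w)[y]) for x, y in data)


def train(data, steps=50, lr=0.5):
    w = {}
    for _ in range(steps):
        grad = {}
        for x, gold in data:
            p = probs(x, w)
            for n, v in feats(x, gold).items():      # observed features
                grad[n] = grad.get(n, 0.0) + v
            for y in gen(x):                         # expected features
                for n, v in feats(x, y).items():
                    grad[n] = grad.get(n, 0.0) - p[y] * v
        for n, g in grad.items():
            w[n] = w.get(n, 0.0) + lr * g            # ascent step
    return w


data = [("dog", "N"), ("runs", "V")]
w = train(data)
```

Each iteration requires the probabilities of all candidates in `gen(x)`, which is where the computational cost comes from when $\mathrm{GEN}(x)$ is large.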

### Evaluation Criteria

- Robustness: depends on $\mathrm{GEN}(x)$
- Disambiguation: yes, same as other statistical models
- Efficiency: not so good, especially during training
- Accuracy: currently the state of the art

### Summary

Discriminative models focus on the conditional distribution $P(y \mid x)$.

Pros:

- No rigid independence assumptions, which allows more global features
- Easy to combine with different base parsers

Cons:

- Learning requires computationally intensive numerical methods
- Inference is often intractable and requires approximation

Log-linear models are the most widely used.

### References and Further Reading

- Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL).
- Stephen Clark and James R. Curran. 2004. Parsing the WSJ using CCG and log-linear models. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL).
- Ronald M. Kaplan, Stefan Riezler, Tracy Holloway King, John T. Maxwell III, Alexander Vasserman, and Richard Crouch. 2004. Speed and accuracy in shallow and deep stochastic parsing. In Proceedings of Human Language Technology and the Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL).
- Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL).
- Yusuke Miyao, T. Ninomiya, and Jun'ichi Tsujii. 2003. Probabilistic modeling of argument structures including non-local dependencies. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP).
- Stefan Riezler, Tracy H. King, Ronald M. Kaplan, Richard Crouch, John T. Maxwell III, and Mark Johnson. 2002. Parsing the Wall Street Journal using a Lexical-Functional Grammar and discriminative estimation techniques. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL).
- Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning. 2004. Max-margin parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1–8.
- Kristina Toutanova, Christopher D. Manning, Stuart M. Shieber, Dan Flickinger, and Stephan Oepen. 2002. Parse disambiguation for a rich HPSG grammar. In Proceedings of the 1st Workshop on Treebanks and Linguistic Theories (TLT).
