Techniques for Large-Scale Parsing (Tekniker för storskalig parsning)
- Claribel Hancock
- 8 years ago
Transcription
1 Techniques for Large-Scale Parsing: Discriminative Models
Joakim Nivre, Uppsala University, Department of Linguistics and Philology
Techniques for Large-Scale Parsing 1(19)
2 Generative Models
A generative statistical model defines the joint probability $P(x, y)$ of input $x$ and output $y$.
Pros:
- Learning problems have closed-form solutions
- Related probabilities can be derived:
  - Conditionalization: $P(y \mid x) = \frac{P(x, y)}{P(x)}$
  - Marginalization: $P(x) = \sum_{y} P(x, y)$
Cons:
- Rigid independence assumptions (or intractable parsing)
- Indirect modeling of the parsing problem
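As an illustration of the two derived probabilities on this slide, here is a minimal Python sketch. The joint table `joint` and the labels `s1`, `s2`, `t1`, `t2`, `t3` are made-up toy values, not taken from the lecture.

```python
from collections import defaultdict

# Hypothetical toy joint distribution P(x, y) over inputs x and analyses y.
joint = {
    ("s1", "t1"): 0.3,
    ("s1", "t2"): 0.1,
    ("s2", "t3"): 0.6,
}

def marginal_x(joint):
    """Marginalization: P(x) = sum over y of P(x, y)."""
    p_x = defaultdict(float)
    for (x, _y), p in joint.items():
        p_x[x] += p
    return dict(p_x)

def conditional(joint):
    """Conditionalization: P(y | x) = P(x, y) / P(x)."""
    p_x = marginal_x(joint)
    return {(x, y): p / p_x[x] for (x, y), p in joint.items()}

p_x = marginal_x(joint)
p_y_given_x = conditional(joint)
```

Going the other way is not possible: from `p_y_given_x` alone, neither `joint` nor `p_x` can be recovered, which is the limitation noted for discriminative models.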
3 Discriminative Models
A discriminative statistical model defines the conditional probability $P(y \mid x)$ of output $y$ given input $x$.
Pros:
- No rigid independence assumptions
- More direct modeling of the parsing problem
Cons:
- Learning problems require numerical approximation
- Related probabilities cannot be derived:
  - No way to compute $P(x, y)$ from $P(y \mid x)$
  - No way to compute $P(x)$ or $P(y)$ from $P(y \mid x)$
4 Conditional and Discriminative Models
Subdivision of discriminative models:
- Conditional model:
  - Explicitly model the conditional probability $P(y \mid x)$
  - Use the model in the mapping $X \to Y$: $\operatorname{argmax}_{y} P(y \mid x)$
- Purely discriminative model:
  - Directly optimize the mapping $X \to Y$
  - No explicit model of the conditional probability $P(y \mid x)$
5 Local and Global Models
Local discriminative models:
- Maximize probability (or accuracy) of local decisions in the derivation of analysis $y$ given input $x$
- Try to find a globally optimal solution by making a sequence of locally optimal decisions
Global discriminative models:
- Maximize probability (or accuracy) of complete analysis $y$ given input $x$
Examples:
- Local: Transition-based dependency parsing
- Global: Graph-based dependency parsing
6 Local Discriminative Models
Conditional history-based model:
$P(y \mid x) = \prod_{i=1}^{m} P(d_i \mid \Phi(d_1, \ldots, d_{i-1}, x))$
- Probabilities can be conditioned on properties of the input
- For example: lookahead in left-to-right derivations
Compare the generative model:
$P(x, y) = \prod_{i=1}^{m} P(d_i \mid \Phi(d_1, \ldots, d_{i-1}))$
7 Parsing Model
GEN(x):
- Defined by a derivational process (for example, a transition system)
EVAL(y):
- Score local decisions, conditioned on input and history
- Combine local scores into global scores
8 Inference
Local discriminative models typically use greedy inference:
- Deterministic best-first search
- Beam search with an agenda of the $k$ best hypotheses
Properties:
- Very efficient
- Reasonably accurate thanks to lookahead
- No guarantee that the globally best solution is found
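The difference between deterministic best-first search and beam search can be sketched as follows. This is a toy example, not the lecture's parser: the action set `ACTIONS`, the probability table inside `local_log_prob`, and the fixed derivation length are all hypothetical, chosen so that the locally best first decision leads to a worse complete derivation.

```python
import math

ACTIONS = ("A", "B")  # hypothetical transition inventory

def local_log_prob(action, history, x):
    """Hypothetical local model log P(d_i | history, x).
    The toy table ignores x; a real model would condition on it."""
    table = {
        (): {"A": 0.6, "B": 0.4},
        ("A",): {"A": 0.55, "B": 0.45},
        ("B",): {"A": 0.9, "B": 0.1},
    }
    return math.log(table[tuple(history)][action])

def beam_search(x, steps, k):
    """Keep the k best partial derivations after each step; k=1 is greedy."""
    beam = [(0.0, [])]  # (log probability so far, decision history)
    for _ in range(steps):
        candidates = [
            (score + local_log_prob(a, hist, x), hist + [a])
            for score, hist in beam
            for a in ACTIONS
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:k]
    return beam[0]

greedy = beam_search(x=None, steps=2, k=1)
wider = beam_search(x=None, steps=2, k=2)
```

With `k=1` the search commits to `A` first and ends at `["A", "A"]` with probability 0.33, while `k=2` keeps `B` alive and finds `["B", "A"]` with probability 0.36. Neither setting guarantees the global optimum in general.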
9 Learning
Learning problem:
- Local decision, conditioned on input and history
Conditional:
- Estimate $P(d_i \mid \Phi(d_1, \ldots, d_{i-1}, x))$
- Training: Conditional MLE (more later)
Purely discriminative:
- Optimize the mapping $\Phi(d_1, \ldots, d_{i-1}, x) \to d_i$
- Training: Any classifier (SVM, perceptron, CMLE, ...)
10 Evaluation Criteria
- Robustness: Yes (same as generative history-based models)
- Disambiguation: Yes, thanks to probability model or classifier
- Accuracy: Sometimes state of the art
- Efficiency: Very good (often linear complexity)
11 Global Discriminative Models
Global models:
- No specific factorization of $P(y \mid x)$
- Features can be defined over arbitrary substructures
- Training optimizes probability/accuracy of global structures
In practice, some shortcuts are always necessary:
- Restricted scope of features
- Approximate inference
12 Parsing Model
GEN(x):
- Formal grammar:
  - HPSG [Toutanova et al. 2002, Miyao et al. 2003]
  - LFG [Riezler et al. 2002, Kaplan et al. 2004]
  - CCG [Clark and Curran 2004]
- Generative statistical parser (reranking) [Charniak and Johnson 2005]
- All possible trees over some alphabets [Taskar et al. 2004, McDonald et al. 2005]
EVAL(y):
- Score related to $P(y \mid x)$
13 Inference
Exact parsing with a global model is intractable.
Strategy 1: Use dynamic programming (chart parsing)
- Restrict feature scope
- Example: Graph-based dependency parsing (projective)
Strategy 2: Use an independent generative component
- Restrict GEN(x) to make exact inference feasible
- Examples:
  - Grammar-driven parsers for HPSG, LFG, CCG
  - Reranking parsers using a k-best list from a statistical parser
14 Learning 1: Linear Classifiers
Score $S(x, y)$ defined as the inner product of two vectors:
$S(x, y) = f(x, y) \cdot w = \sum_{i=1}^{k} w_i f_i(x, y)$
- Feature vector: $f(x, y) = \langle f_1(x, y), \ldots, f_k(x, y) \rangle$
- Weight vector: $w = \langle w_1, \ldots, w_k \rangle$
Note:
- Each $f_i(x, y)$ is a numerical feature of $x$ and $y$
- Each $w_i$ is a real-valued weight for $f_i(x, y)$:
  - Positive if $f_i(x, y)$ tends to occur in good trees
  - Negative if $f_i(x, y)$ tends to occur in bad trees
- $S(x, y)$ summarizes the evidence of all (non-zero) features
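The inner-product score can be sketched in a few lines of Python. Feature vectors are stored sparsely as dictionaries so that only non-zero features contribute. The feature names and weight values below are invented for illustration.

```python
def score(features, weights):
    """S(x, y) = f(x, y) . w, with f(x, y) stored sparsely as {name: value}."""
    return sum(value * weights.get(name, 0.0) for name, value in features.items())

# Hypothetical weights: positive for features of good trees, negative for bad ones.
weights = {
    "head=verb,dep=noun": 1.5,
    "head=noun,dep=verb": -2.0,
    "arc_length>5": -0.5,
}

# Hypothetical feature vectors f(x, y) for two candidate analyses of one input.
f_good = {"head=verb,dep=noun": 2.0}
f_bad = {"head=noun,dep=verb": 1.0, "arc_length>5": 1.0}

s_good = score(f_good, weights)
s_bad = score(f_bad, weights)
```

Here `s_good` is 3.0 and `s_bad` is -2.5, so the first analysis would be preferred.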
15 Learning 2: Discriminative Training
Find weights that maximize accuracy on the training set.
Training criterion (for every pair $(x, y)$ in the training set):
$y = \operatorname{argmax}_{y' \in \mathrm{GEN}(x)} f(x, y') \cdot w$
Examples:
- Perceptron learning
- Support vector machines
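Perceptron learning for this criterion can be sketched as below. This is the standard structured perceptron update (add the gold features, subtract the predicted ones on a mistake), applied to a toy problem. The functions `gen` and `feats`, the labels, and the two training pairs are hypothetical stand-ins for GEN(x), f(x, y), and a treebank.

```python
from collections import Counter

def score(features, weights):
    """S(x, y) = f(x, y) . w over sparse feature dictionaries."""
    return sum(v * weights.get(name, 0.0) for name, v in features.items())

def predict(x, gen, feats, weights):
    """argmax over y' in GEN(x) of f(x, y') . w (first candidate wins ties)."""
    return max(gen(x), key=lambda y: score(feats(x, y), weights))

def perceptron_train(data, gen, feats, epochs=5):
    weights = Counter()
    for _ in range(epochs):
        for x, gold in data:
            pred = predict(x, gen, feats, weights)
            if pred != gold:
                # Move weights toward the gold analysis, away from the prediction.
                for name, v in feats(x, gold).items():
                    weights[name] += v
                for name, v in feats(x, pred).items():
                    weights[name] -= v
    return weights

def gen(x):
    """Hypothetical GEN(x): two candidate analyses for every input."""
    return ["NP", "S"]

def feats(x, y):
    """Hypothetical f(x, y): one indicator feature per (word, analysis) pair."""
    return {f"{word}|{y}": 1.0 for word in x}

data = [(("dogs", "bark"), "S"), (("the", "dog"), "NP")]
weights = perceptron_train(data, gen, feats)
```

On this separable toy data a single update is enough, after which `predict` returns the gold analysis for both training inputs.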
16 Learning 3: Log-Linear Models
Transform score to conditional probability:
$P(y \mid x) = \frac{\exp[f(x, y) \cdot w]}{\sum_{y' \in \mathrm{GEN}(x)} \exp[f(x, y') \cdot w]}$
Note:
- $\exp[f(x, y) \cdot w] > 0$
- $\exp[f(x, y) \cdot w] \le \sum_{y' \in \mathrm{GEN}(x)} \exp[f(x, y') \cdot w]$
- If $\exp a = b$, then $a = \log b$
- $\log ab = \log a + \log b$
- Linear sum of products corresponds to log of probability
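The transformation from scores to conditional probabilities can be sketched as follows. The input is assumed to be a dictionary of scores $S(x, y')$ for every $y'$ in GEN(x). The candidate names and score values are invented, and subtracting the maximum score is a common numerical-stability step that does not change the result.

```python
import math

def log_linear_probs(scores):
    """Map scores S(x, y') for all y' in GEN(x) to probabilities P(y' | x)."""
    m = max(scores.values())  # shift by the max to avoid overflow in exp
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())  # normalization term: sum over GEN(x)
    return {y: e / z for y, e in exps.items()}

# Hypothetical scores for three candidate analyses of one input.
probs = log_linear_probs({"t1": 2.0, "t2": 1.0, "t3": -1.0})
```

Every probability is positive and they sum to one. Differences in score become differences in log probability, so `log(probs["t1"] / probs["t2"])` equals the score difference 1.0.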
17 Learning 4: Conditional MLE
Joint MLE:
- Find the estimate that maximizes $P(x, y)$ for the training set
- Easy: Relative frequencies (analytical, closed-form solution)
Conditional MLE:
- Find the estimate that maximizes $P(y \mid x)$ for the training set
- Hard: No closed-form solution
- Numerical optimization:
  - Many methods: iterative scaling, gradient descent, ...
  - Computationally intensive
  - Guaranteed to converge to the global maximum
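One of the numerical methods mentioned, plain gradient ascent on the conditional log-likelihood of a log-linear model, can be sketched as below. The gradient for each training pair is the gold feature vector minus the expected feature vector under the current model. The toy `gen`, `feats`, and `data`, as well as the step count and learning rate, are assumptions for illustration, and no regularization is applied.

```python
import math
from collections import defaultdict

def probs(x, gen, feats, w):
    """P(y' | x) for all y' in GEN(x) under a log-linear model with weights w."""
    scores = {y: sum(v * w[n] for n, v in feats(x, y).items()) for y in gen(x)}
    m = max(scores.values())
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def cond_log_likelihood(data, gen, feats, w):
    """Sum over the training set of log P(y | x)."""
    return sum(math.log(probs(x, gen, feats, w)[y]) for x, y in data)

def train_cmle(data, gen, feats, steps=100, rate=0.5):
    w = defaultdict(float)
    for _ in range(steps):
        grad = defaultdict(float)
        for x, gold in data:
            for n, v in feats(x, gold).items():
                grad[n] += v  # observed feature values
            for y, p in probs(x, gen, feats, w).items():
                for n, v in feats(x, y).items():
                    grad[n] -= p * v  # expected feature values under the model
        for n, g in grad.items():
            w[n] += rate * g
    return w

def gen(x):
    """Hypothetical GEN(x): two candidate analyses for every input."""
    return ["NP", "S"]

def feats(x, y):
    """Hypothetical f(x, y): one indicator feature per (word, analysis) pair."""
    return {f"{word}|{y}": 1.0 for word in x}

data = [(("dogs", "bark"), "S"), (("the", "dog"), "NP")]
ll_before = cond_log_likelihood(data, gen, feats, defaultdict(float))
w = train_cmle(data, gen, feats)
ll_after = cond_log_likelihood(data, gen, feats, w)
```

Each step needs the full distribution over GEN(x) for every training input, which is why this kind of training is computationally intensive when GEN(x) is large.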
18 Evaluation Criteria
- Robustness: Depends on GEN(x)
- Disambiguation: Yes, same as other statistical models
- Efficiency: Not so good, especially during training
- Accuracy: Currently the state of the art
19 Summary
Discriminative models focus on the conditional distribution $P(y \mid x)$.
Pros:
- No rigid independence assumptions, allowing more global features
- Easy to combine with different base parsers
Cons:
- Learning requires computationally intensive numerical methods
- Inference is often intractable and requires approximation
Log-linear models are most widely used.
20 References and Further Reading
- Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL).
- Stephen Clark and James R. Curran. 2004. Parsing the WSJ using CCG and log-linear models. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL).
- Ronald M. Kaplan, Stefan Riezler, Tracy Holloway King, John T. Maxwell III, Alexander Vasserman, and Richard Crouch. 2004. Speed and accuracy in shallow and deep stochastic parsing. In Proceedings of Human Language Technology and the Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL).
- Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL).
- Yusuke Miyao, T. Ninomiya, and Jun'ichi Tsujii. 2003. Probabilistic modeling of argument structures including non-local dependencies. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP).
- Stefan Riezler, Tracy H. King, Ronald M. Kaplan, Richard Crouch, John T. Maxwell III, and Mark Johnson. 2002. Parsing the Wall Street Journal using a Lexical-Functional Grammar and discriminative estimation techniques. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL).
- Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning. 2004. Max-margin parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1-8.
- Kristina Toutanova, Christopher D. Manning, Stuart M. Shieber, Dan Flickinger, and Stephan Oepen. 2002. Parse disambiguation for a rich HPSG grammar. In Proceedings of the 1st Workshop on Treebanks and Linguistic Theories (TLT).
Introduction to Machine Learning CMU-10701 Deep Learning Barnabás Póczos & Aarti Singh Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio Geoffrey
More informationThe Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures
More informationPrediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not
More informationIntroduction to Online Learning Theory
Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent
More informationPrinciples of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n
Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,
More informationOptimization Strategies for Online Large-Margin Learning in Machine Translation
Optimization Strategies for Online Large-Margin Learning in Machine Translation Vladimir Eidelman UMIACS Laboratory for Computational Linguistics and Information Processing Department of Computer Science
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationClassification Problems
Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems
More informationDesign Patterns in Parsing
Abstract Axel T. Schreiner Department of Computer Science Rochester Institute of Technology 102 Lomb Memorial Drive Rochester NY 14623-5608 USA ats@cs.rit.edu Design Patterns in Parsing James E. Heliotis
More informationTraining a Log-Linear Parser with Loss Functions via Softmax-Margin
Training a Log-Linear Parser with Loss Functions via Softmax-Margin Michael Auli School of Informatics University of Edinburgh m.auli@sms.ed.ac.uk Adam Lopez HLTCOE Johns Hopkins University alopez@cs.jhu.edu
More information