POS Tagging 1. POS Tagging. Rule-based taggers Statistical taggers Hybrid approaches

Similar documents
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic

PoS-tagging Italian texts with CORISTagger

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Part-of-Speech Tagging for Bengali

Brill s rule-based PoS tagger

UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE

Chapter 8. Final Results on Dutch Senseval-2 Test Data

Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction

Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University

Overview of the EVALITA 2009 PoS Tagging Task

Learning Morphological Disambiguation Rules for Turkish

Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles

A Mixed Trigrams Approach for Context Sensitive Spell Checking

Improving statistical POS tagging using Linguistic feature for Hindi and Telugu

POS Tagsets and POS Tagging. Definition. Tokenization. Tagset Design. Automatic POS Tagging Bigram tagging. Maximum Likelihood Estimation 1 / 23

Training and evaluation of POS taggers on the French MULTITAG corpus

Applying Co-Training Methods to Statistical Parsing. Anoop Sarkar anoop/

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Effective Self-Training for Parsing

Turkish Radiology Dictation System

ETL Ensembles for Chunking, NER and SRL

Machine Learning for natural language processing

Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing

Log-Linear Models a.k.a. Logistic Regression, Maximum Entropy Models

NLP Programming Tutorial 5 - Part of Speech Tagging with Hidden Markov Models

Reliable and Cost-Effective PoS-Tagging

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Improving Word-Based Predictive Text Entry with Transformation-Based Learning

Word Completion and Prediction in Hebrew

Context Grammar and POS Tagging

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models. Alessandro Vinciarelli, Samy Bengio and Horst Bunke

Semantic parsing with Structured SVM Ensemble Classification Models

Building A Vocabulary Self-Learning Speech Recognition System

Terminology Extraction from Log Files

Applications of Named Entity Recognition in Customer Relationship Management Systems

A Systematic Comparison of Various Statistical Alignment Models

Speech and Language Processing

A POS-based Word Prediction System for the Persian Language

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Log-Linear Models. Michael Collins

CS 533: Natural Language. Word Prediction

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Clustering Connectionist and Statistical Language Processing

Automatic slide assignation for language model adaptation

Terminology Extraction from Log Files

Identifying Focus, Techniques and Domain of Scientific Papers

Parsing Swedish. Atro Voutilainen Conexor oy CG and FDG

Customizing an English-Korean Machine Translation System for Patent Translation *

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow

An online semi automated POS tagger for. Presented By: Pallav Kumar Dutta Indian Institute of Technology Guwahati

ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH

Translating the Penn Treebank with an Interactive-Predictive MT System

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Language Modeling. Chapter Introduction

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Supervised Learning (Big Data Analytics)

SENTIMENT ANALYSIS: A STUDY ON PRODUCT FEATURES

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski

TagMiner: A Semisupervised Associative POS Tagger E ective for Resource Poor Languages

Tagging with Hidden Markov Models

Filtering Junk Mail with A Maximum Entropy Model

Stock Market Prediction Using Data Mining

NewsX. Diploma Thesis. Event Extraction from News Articles. Dresden University of Technology

Linguistic Knowledge-driven Approach to Chinese Comparative Elements Extraction

Using Segmentation Constraints in an Implicit Segmentation Scheme for Online Word Recognition

HMM : Viterbi algorithm - a toy example

The PALAVRAS parser and its Linguateca applications - a mutually productive relationship

Automated Extraction of Vulnerability Information for Home Computer Security

CS570 Data Mining Classification: Ensemble Methods

Natural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression

Course: Model, Learning, and Inference: Lecture 5

Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

Tree based ensemble models regularization by convex optimization

Hybrid Strategies. for better products and shorter time-to-market

BILINGUAL TRANSLATION SYSTEM

Factored Translation Models

Natural Language Processing

Projektgruppe. Categorization of text documents via classification

The Design of a Proofreading Software Service

Transcription:

POS Tagging 1 POS Tagging Rule-based taggers Statistical taggers Hybrid approaches POS Tagging 1

POS Tagging 2 Words taken isolatedly are ambiguous regarding its POS Yo bajo con el hombre bajo a PP AQ TD tocar el bajo bajo la escalera. TD AQ AQ POS Tagging 2 TD PP AQ FP

POS Tagging 3 Most of words have a unique POS within a context Yo bajo con el hombre bajo a PP AQ TD tocar el bajo bajo la escalera. TD AQ AQ POS Tagging 3 TD PP AQ FP

POS Tagging 4 Pos taggers The goal of a POS tagger is to assign each word the most likely within a context Rule-based Statistical Hybrid POS Tagging 4

POS Tagging 5 W T = w 1 w 2 w n sequence of words = t 1 t 2 t n sequence of POS tags For each word w i only some of the tags can be assigned (except the unknown words). We can get them from a lexicon or a morphological analyzer. Tagset. Open vs closed categories f : W T = f(w) POS Tagging 5

POS Tagging 6 Rule-based taggers Knowledge-driven taggers Usually rules built manually Limited amount of rules ( 1000) LM and smoothing explicitely defined. POS Tagging 6

POS Tagging 7 Rule-based taggers + Linguistically motivated rules + High precision + ej. EngCG 99.5% High development cost Not transportable Time cost of tagging TAGGIT, Green,Rubin,1971 TOSCA, Oosdijk,1991 Constraint Grammars, EngCG, Voutilainen,1994, Karlsson et al, 1995 AMBILIC, de Yzaguirre et al, 2000 POS Tagging 7

POS Tagging 8 A CG consists of a sequence of subgrammars each one consisting of a set of restrictions (constraints) which set context conditions ej. (@w =0 VFIN (-1 TO)) ENGCG Constraint Grammars CG Discards POS VFIN when the previous word is to ENGTWOL Reductionist POS tagging 1,100 constraints 93-97% of the words are correctily disambiguated 99.7% accuracy Heuristic rules can be applied over the rest 2-3% residual ambiguity with 99.6% precision CG syntactic POS Tagging 8

POS Tagging 8 Statistical POS taggers LM and smoothing automatically learned from tagged corpora (supervised learning). Data-driven taggers Statistical inference Techniques coming from speech processing POS Tagging 9

POS Tagging 9 Statistical POS taggers + Well founded theoretical framework + Simple models. + Acceptable precision + > 97% + Language independent Learning the model CLAWS, Garside et al, 1987 De Rose, 1988 Church, 1988 Cutting et al,1992 Merialdo, 1994 Sparseness less precision POS Tagging 10

POS Tagging 10 N-gram Statistical POS taggers smoothing interpolation HMM ML Supervised learning MLE Semi-supervised Forward-Backward, Baum-Welch (EM Expectation Maximization) Charniak, 1993 Jelinek, 1998 Manning, Schütze, 1999 POS Tagging 11

POS Tagging 12 POS Tagging 11 = n k k k k k k n n t t t w P t t t P w w t t P n 1 1 2 1 1 ) ( ), ( ),,,, ( arg max 1 Contextual probability (trigrams) Lexical Probability 3-gram tagger

POS Tagging 12 HMM tagger Hidden States associated to n-grams Transition probabilities restricted to valid transitions: *BC -> BC* Emision probabilities restricted by lexicons POS Tagging 13

POS Tagging 13 Hybrid systems Transformation-based, error-driven Based on rules automatically acquired Maximum Entropy Combination of several knowledge sources No independence is assumed A high number of parameters is allowed (e.g. lexical features) Brill, 1995 Roche,Schabes, 1995 Ratnaparkhi, 1998, Rosenfeld, 1994 Ristad, 1997 POS Tagging 14

POS Tagging 14 Based on transformation rules that corerect errors produced by an initial HMM tagger rule Brill s system change label A into label B when... Each rule corresponds to the instantiation of a templete templetes The previous (following) word is tagged with Z One of the two previous (following) words is tagged with Z The previous word is tagged with Z and the following with W... Learning of the variables A,B,Z,W through an iterative process That choose at each iteration the rule (the instanciation) correcting more errors. POS Tagging 15

Other complex systems POS Tagging 15 Decision trees Supervised learning ej. TreeTagger Case-based, Memory-based Learning IGTree Relaxation labelling Statistical and linguistic constraints ej. RELAX Black,Magerman, 1992 Magerman 1996 Màrquez, 1999 Màrquez, Rodríguez, 1997 TiMBL Daelemans et al, 1996 Padrò, 1997 POS Tagging 16

POS Tagging 16 Màrquez s Tree-tagger root IN/RB ambiguity IN (preposition) RB (adverb) statístical interpretation : ^ other P(IN)=0.81 P(RB)=0.19 Word Form... P( RB word= A/as & tag(+1)=rb & tag(+2)=in) = 0.987 other As, as P(IN)=0.83 P(RB)=0.17 tag(+1)... IN RB P(IN)=0.13 P(RB)=0.87 tag(+2) ^ P( IN word= A/as & tag(+1)=rb & tag(+2)=in) = 0.013 P(IN)=0.013 P(RB)=0.987 leaf POS Tagging 17

POS Tagging 17 Combining taggers Combination of LM in a tagger STT+ RELAX Combination of taggers through votation bootstrapping Combinación of classifiers bagging (Breiman, 1996) boosting (Freund, Schapire, 1996) Màrquez, Rodríguez, 1998 Màrquez, 1999 Padrò, 1997 Màrquez et al, 1998 Brill, Wu, 1998 Màrquez et al, 1999 Abney et al, 1999 POS Tagging 18

POS Tagging 18 Màrquez s STT+ Language Model Lexical probs. + + N-grams Contextual probs. Raw text Morphological analysis Viterbi algorithm Tagged text Disambiguation POS Tagging 19

POS Tagging 19 Padró s Relax Language Model N-grams + + Linguistic rules Set of constraints Raw text Morphological analysis Relaxation Labelling (Padró, 1996) Tagged text Disambiguation POS Tagging 20