Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014

Similar documents
Machine Learning for natural language processing

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems

CS 6740 / INFO Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage

INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning jtl@ifi.uio.no

Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles

POS Tagsets and POS Tagging. Definition. Tokenization. Tagset Design. Automatic POS Tagging Bigram tagging. Maximum Likelihood Estimation 1 / 23

Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing

Basic Parsing Algorithms Chart Parsing

CS4025: Pragmatics. Resolving referring Expressions Interpreting intention in dialogue Conversational Implicature

Introduction. Philipp Koehn. 28 January 2016

Special Topics in Computer Science

Hidden Markov Models Chapter 15

Effective Self-Training for Parsing

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Statistical Machine Translation

NLP Programming Tutorial 5 - Part of Speech Tagging with Hidden Markov Models

Outline of today s lecture

Voice User Interfaces (CS4390/5390)

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Big Data in Education

Artificial Intelligence Exam DT2001 / DT2006 Ordinarie tentamen

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang

BILINGUAL TRANSLATION SYSTEM

Machine Translation and the Translator

A Mixed Trigrams Approach for Context Sensitive Spell Checking

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Multi-Engine Machine Translation by Recursive Sentence Decomposition

Clustering Connectionist and Statistical Language Processing

Learning Translation Rules from Bilingual English Filipino Corpus

Semantic analysis of text and speech

CS 40 Computing for the Web

The Specific Text Analysis Tasks at the Beginning of MDA Life Cycle

Evalita 09 Parsing Task: constituency parsers and the Penn format for Italian

Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1

UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE

Classification of Natural Language Interfaces to Databases based on the Architectures

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Natural Language Processing

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures

MONITORING URBAN TRAFFIC STATUS USING TWITTER MESSAGES

English Language (first language, first year)

Ling 201 Syntax 1. Jirka Hana April 10, 2006

Language and Computation

Empirical Machine Translation and its Evaluation

Chunk Parsing. Steven Bird Ewan Klein Edward Loper. University of Melbourne, AUSTRALIA. University of Edinburgh, UK. University of Pennsylvania, USA

PP-Attachment. Chunk/Shallow Parsing. Chunk Parsing. PP-Attachment. Recall the PP-Attachment Problem (demonstrated with XLE):

Hybrid Strategies. for better products and shorter time-to-market

Simple Type-Level Unsupervised POS Tagging

Spanish 003 Syllabus Spring 2016

A Graph-Based Approach to Commonsense Concept Extraction and Semantic Similarity Detection

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Natural Language to Relational Query by Using Parsing Compiler

Phrase-Based MT. Machine Translation Lecture 7. Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu. Website: mt-class.

Syntactic Transfer Using a Bilingual Lexicon

31 Case Studies: Java Natural Language Tools Available on the Web

Longman English Interactive

COURSE DESCRIPTION Introduction to Greek grammar, vocabulary, and pronunciation for the beginning student.

Course Manual Automata & Complexity 2015

Activities. but I will require that groups present research papers

A Chart Parsing implementation in Answer Set Programming

L130: Chapter 5d. Dr. Shannon Bischoff. Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25

Surface Realisation using Tree Adjoining Grammar. Application to Computer Aided Language Learning

Developing Academic Language Skills to Support Reading and Writing. Kenna Rodgers February, 2015 IVC Series

Course: Model, Learning, and Inference: Lecture 5

CSCI 3136 Principles of Programming Languages

Constituency. The basic units of sentence structure

Natural Language Database Interface for the Community Based Monitoring System *

Text Mining - Scope and Applications

MASTER THESIS. Mining Product Opinions and Reviews on the Web. Felipe Jordão Almeida Prado Mattosinho Born: , Cruzeiro Brazil

CHARTES D'ANGLAIS SOMMAIRE. CHARTE NIVEAU A1 Pages 2-4. CHARTE NIVEAU A2 Pages 5-7. CHARTE NIVEAU B1 Pages CHARTE NIVEAU B2 Pages 11-14

2014/02/13 Sphinx Lunch

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test

Automatic Knowledge Base Construction Systems. Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014

Timeline (1) Text Mining Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?

The Seven Practice Areas of Text Analytics

: Introduction to Machine Learning Dr. Rita Osadchy

Rule based Sentence Simplification for English to Tamil Machine Translation System

Presented by Steve Cawthon, WIDA Certified Trainer

Log-Linear Models a.k.a. Logistic Regression, Maximum Entropy Models

Comprendium Translator System Overview

Metafrastes: A News Ontology-Based Information Querying Using Natural Language Processing

Paraphrasing controlled English texts

Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia

Extracting translation relations for humanreadable dictionaries from bilingual text

Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic

A Statistical Parser for Czech*

Natural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression

DEGREE IN TOURISM AND HOSPITALITY MANAGEMENT. Program Guide

Customizing an English-Korean Machine Translation System for Patent Translation *

Transcription:

Introduction! BM1 Advanced Natural Language Processing Alexander Koller! 17 October 2014

Outline What is computational linguistics? Topics of this course Organizational issues

Siri

Text prediction

Facebook Graph Search

Similarity in Google Search

Navigation Systems Turn right in fifty meters.

Information access European Media Monitor, http://emm.newsbrief.eu

Sentiment Analysis Crimson Hexagon, http://www.crimsonhexagon.com/

Sentiment Analysis Crimson Hexagon, http://www.crimsonhexagon.com/

Google Translate elpais.com, May 2014

Google Translate elpais.com, May 2014

Google Translate elpais.com, May 2014

Google Translate The medium, who has enjoyed always, the coach s trust, and has recovered from the rupture of the cruciate ligament and from the inside of the right knee, which elpais.com, May 2014

Lexical Ambiguity El medio, que siempre ha contado Medium (medium) Mittelfeldspieler (midfield player)

Word order El medio, que siempre ha contado con la confianza del seleccionador, Translation choose words in the other language and bring them in the correct order

Word order El medio, que siempre ha contado con la confianza del seleccionador, Der Translation choose words in the other language and bring them in the correct order

Word order El medio, que siempre ha contado con la confianza del seleccionador, Der Mittelfeldspieler Translation choose words in the other language and bring them in the correct order

Word order El medio, que siempre ha contado con la confianza del seleccionador, Der Mittelfeldspieler der Translation choose words in the other language and bring them in the correct order

Word order El medio, que siempre ha contado con la confianza del seleccionador, Der Mittelfeldspieler der immer Translation choose words in the other language and bring them in the correct order

Word order El medio, que siempre ha contado con la confianza del seleccionador, Der Mittelfeldspieler der immer hat Translation choose words in the other language and bring them in the correct order

Word order El medio, que siempre ha contado con la confianza del seleccionador, Der Mittelfeldspieler der immer hat gezählt auf Translation choose words in the other language and bring them in the correct order

Word order El medio, que siempre ha contado con la confianza del seleccionador, Der Mittelfeldspieler der immer hat gezählt auf das Translation choose words in the other language and bring them in the correct order

Word order El medio, que siempre ha contado con la confianza del seleccionador, Der Mittelfeldspieler der immer hat gezählt auf das Vertrauen Translation choose words in the other language and bring them in the correct order

Word order El medio, que siempre ha contado con la confianza del seleccionador, Der Mittelfeldspieler der immer hat gezählt auf das Vertrauen des Translation choose words in the other language and bring them in the correct order

Word order El medio, que siempre ha contado con la confianza del seleccionador, Der Mittelfeldspieler der immer hat gezählt auf das Vertrauen des Trainers Translation choose words in the other language and bring them in the correct order

Word order El medio, que siempre ha contado con la confianza del seleccionador, Der Mittelfeldspieler der immer auf das Vertrauen des Trainers gezählt hat Translation choose words in the other language and bring them in the correct order

Structural Ambiguity se ha recuperado de la rotura del ligamento cruzado y el interior de la rodilla derecha" has himself recovered of the rupture of the ligament cruciate and the interior of the knee right der Kreuzbandriss und der Innenseite des rechten Knies the cruciate ligament and the inside of the right knee el ligamento cruzado y el interior de la rodilla derecha el ligamento cruzado y el interior de la rodilla derecha cruciate and lateral the cruciate and the lateral ligament of the right knee

Content of this class Fundamental techniques of CL. Common themes: uncovering hidden linguistic structure dealing with ambiguity statistical methods efficient algorithms

Recovering parts of speech NNP NNS NN NNS CD NN Fed raises interest rates 0.5 percent (POS tags from Penn Treebank)

Recovering syntactic structure S NP VP I VP PP IV NP P NP shot Det N in PRP$ N an elephant my pyjamas John

Ambiguity: parts of speech VBD VBN NNP VBZ NNS VB VBP NN VBZ NNS CD NN Fed raises interest rates 0.5 percent (POS tags from Penn Treebank)

NP VP Ambiguity: Syntax I VP PP IV NP P NP shot Det N in PRP$ N an elephant my pyjamas S S NP VP NP VP I IV NP I VP PP shot Det N IV NP P NP an N PP shot Det N in PRP$ N elephant P NP an elephant my pyjamas in PRP$ N my pyjamas John shot in John How it got there, I have no idea. an elephant his pyjamas shot in

Other types of ambiguity A central problem: NL expressions are frequently highly ambiguous. lexical ambiguities: interest (noun) vs. interest (verb) vs. interest (the other noun) structural semantic ambiguities: every student did not pass the exam referential ambiguities: John beat Peter up. That really hurt him. Individual analyses are called readings.

The ambiguity challenge Number of readings grows exponentially with the sources of ambiguity. How do we identify the correct one? e.g. statistical models In practice, infeasible to enumerate all readings and choose the right one. How can we compute the correct reading efficiently? development of good algorithms

The knowledge challenge Uncovering hidden structure requires knowledge about language. Where do we get it? Classical approach: hand-written rules. Can be effective, but is very expensive. Modern approach: statistical models. dominant paradigm since the late 1990s Tradeoff between depth of modeling and quality of available models.

Siri of the Future? User: Book a table at Da Giovanni after I finish work, and tell John and Mary to meet me there. System: User: System: User: System: Sorry, Giovanni has no free tables until 9pm. Should I find a different Italian restaurant for 6:30pm? Can you find a table in a restaurant with a good wine list? Ristorante Biscotti still has an opening. It s in the Financial District, but has a similar travel time. Okay, sounds good. (makes the reservation online) (Example by Ron Kaplan, Nuance)

Topics in this class Elementary statistical models of language Tagging: Hidden Markov Models Parsing: esp. probabilistic context-free grammars Further topics: a bit of speech recognition and analysis semantics grammar induction machine translation

Lectures We will assign you some reading for each lecture. Please read it beforehand. There will be a quick, anonymous quiz for each reading assignment. Please participate, so we know what people are having problems with. Lecture will summarize reading, add some extra information, give you a chance to discuss.

Standard Textbooks Dan Jurafsky and James Martin, Speech and Language Processing Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing

Assignments There will be six programming assignments. Start early and plan enough time. Grading: You need to turn in at least five assignments. We will add up your best two scores from A1-3 and your best two scores from A4-6. In total, you must get at least 250 (of 400) points out of these best four assignments.

Final project The grade for the class is determined by a final project, which you work on in the term break. submit code plus documentation You should propose a topic for the project. size of project = roughly one assignment Grade will be based on difficulty of task quality of solution clarity of presentation

Resources Course website: http://www.ling.uni-potsdam.de/tcl/anlp14 Piazza (please sign up!): https://piazza.com/class/i0jvmbacgjd5kv Weekly tutorials with Zeljko