Natural Language Processing

Similar documents
Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014

CS 6740 / INFO Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage

POS Tagsets and POS Tagging. Definition. Tokenization. Tagset Design. Automatic POS Tagging Bigram tagging. Maximum Likelihood Estimation 1 / 23

Ling 201 Syntax 1. Jirka Hana April 10, 2006

Semantic analysis of text and speech

Fry Phrases Set 1. TeacherHelpForParents.com help for all areas of your child s education

Year 3 Grammar Guide. For Children and Parents MARCHWOOD JUNIOR SCHOOL

NATURAL LANGUAGE TO SQL CONVERSION SYSTEM

The Book of Grammar Lesson Six. Mr. McBride AP Language and Composition

Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles

According to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Nouns are naming words - they are used to name a person, place or thing.

Lesson 4 Parts of Speech: Verbs

California Treasures High-Frequency Words Scope and Sequence K-3

INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning jtl@ifi.uio.no

PUSD High Frequency Word List

Paraphrasing controlled English texts

Introduction. Philipp Koehn. 28 January 2016

Chapter 3 Growing with Verbs 77

Parts of Speech. Skills Team, University of Hull

Outline of today s lecture

A Beginner s Guide To English Grammar

Syntax: Phrases. 1. The phrase

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.

Grammar Boot Camp. Building Muscle: Phrases and Clauses. (click mouse to proceed)

7.5 Emphatic Verb Tense

L130: Chapter 5d. Dr. Shannon Bischoff. Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25

CS4025: Pragmatics. Resolving referring Expressions Interpreting intention in dialogue Conversational Implicature

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Special Topics in Computer Science

Phonics. High Frequency Words P.008. Objective The student will read high frequency words.

Syntactic Theory. Background and Transformational Grammar. Dr. Dan Flickinger & PD Dr. Valia Kordoni

Timeline (1) Text Mining Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Early Morphological Development

Making Friends at College

Pupil SPAG Card 1. Terminology for pupils. I Can Date Word

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Natural Language Processing. What s this story about?

Chunk Parsing. Steven Bird Ewan Klein Edward Loper. University of Melbourne, AUSTRALIA. University of Edinburgh, UK. University of Pennsylvania, USA

REPORTED SPEECH. Reported speech is used to retell or report what other person has actually said. It is a very usual function in everyday language.

A test based on the grammar-grade one

I m sorry, Dave, I m afraid I can t do that :

English Appendix 2: Vocabulary, grammar and punctuation

SAMPLE. Grammar, punctuation and spelling. Paper 2: short answer questions. English tests KEY STAGE LEVEL. Downloaded from satspapers.org.

The parts of speech: the basic labels

Rethinking the relationship between transitive and intransitive verbs

Sample only Oxford University Press ANZ

Compound Sentences and Coordination

This handout will help you understand what relative clauses are and how they work, and will especially help you decide when to use that or which.

First Grade Spelling 3-1. First Grade Spelling. 1. an 2. at 3. can 4. cat 5. had 6. man 7. I 8. and 9. the 10. a. Dictation Sentences:

SYNTAX: THE ANALYSIS OF SENTENCE STRUCTURE

Basic Parsing Algorithms Chart Parsing

What Is Linguistics? December 1992 Center for Applied Linguistics

Glossary of literacy terms

Building a Question Classifier for a TREC-Style Question Answering System

Overview of the TACITUS Project

Grammar and Mechanics Test 3

Morphology. Morphology is the study of word formation, of the structure of words. 1. some words can be divided into parts which still have meaning

Classification of Natural Language Interfaces to Databases based on the Architectures

Machine Learning for natural language processing

Comprendium Translator System Overview

Checklist for Recognizing Complete Verbs

NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR

RELATIVE CLAUSES PRACTICE

Natural Language Processing

Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing

An Introduction to Language Processing with Perl and Prolog

Student s Worksheet. Writing útvary, procvičování

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems

The Specific Text Analysis Tasks at the Beginning of MDA Life Cycle

Online Tutoring System For Essay Writing

Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software

Past Simple & Past Continuous. Exercises

PARALLEL STRUCTURE S-10

CHARTES D'ANGLAIS SOMMAIRE. CHARTE NIVEAU A1 Pages 2-4. CHARTE NIVEAU A2 Pages 5-7. CHARTE NIVEAU B1 Pages CHARTE NIVEAU B2 Pages 11-14

Prepositions. off. down. beneath. around. above. during

How To Understand A Sentence In A Syntactic Analysis

Fiction: Poetry. Classic Poems. Contemporary Poems. Example. Key Point. Example

HAVERHILL PUBLIC SCHOOLS Spanish I College Prep Curriculum Map. AR Verbs Present Regular Tense

Fry s Sight Word Phrases

Language at work To be Possessives

Parent Help Booklet. Level 3

Introduction to Semantics. A Case Study in Semantic Fieldwork: Modality in Tlingit

BILINGUAL TRANSLATION SYSTEM

Year 7. Grammar booklet 3 and tasks Sentences, phrases and clauses

Chapter 2 Phrases and Clauses

Strategies for Technical Writing

Teaching Vocabulary to Young Learners (Linse, 2005, pp )

Discourse Markers in English Writing

Clustering Connectionist and Statistical Language Processing

Phonics. P.041 High Frequency Words. Objective The student will read high frequency words.

Movers Reading & Writing

Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology

Specialty Answering Service. All rights reserved.

Introduction to formal semantics -

Guided Reading Level J

2. SEMANTIC RELATIONS

Multisensory Grammar Online

Transcription:

Natural Language Processing Huong LeThanh huong@wv.inf.tu dresden.de 1

Why study natural language processing (NLP)? Applications Line breakers, hyphenators, spell checkers, grammar & style checkers information retrieval question answering automatic speech recognition intelligent Web searching automatic text summarization and classification pseudo understanding and generation of natural language; multi lingual systems including machine translation 2

Some information Time and Place Lectures and Tutorials are held on Wednesday, room GRU 350, 09.20am 10.50am References Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. The MIT Press. Dan Jurafsky and James Martin. 2000. Speech and Language Processing. PrenticeHall. Recommended Reading James Allen. 1994. Natural Language Understanding. The Benajmins/Cummings Publishing Company Inc. 3

Course Goals Learn the basic principles and theoretical approaches underlying NLP Learn techniques and tools which can be used to develop practical, robust systems that can (partly) understand text or communicate with users in one or more languages Gain insight into many of the open research problems in natural language 4

Topics in NLP Levels of Analysis: syntax, semantics, discourse, pragmatics, world knowledge... Subproblems: part of speech tagging, syntactic parsing, word sense disambiguation, discourse processing... Algorithms and Methodologies: corpus based methods, knowledge based techniques,... Applications: information extraction, information retrieval, machine translation, question answering, natural language understanding... 5

Levels of Analysis and Knowledge Used in NLP Morphology: how words are constructed; prefixes & suffixes Syntax: structural relationships between words Semantics: meanings of words, phrases, and expressions Discourse: relationships across different sentences or thoughts; contextual effects Pragmatic: the purpose of a statement; how we use language to communicate World Knowledge: facts about the world at large; common sense 6

Morphology kick, kicks, kicked, kicking sit, sits, sat, sitting murder, murders But it s not just as simple as adding and deleting endings... gorge, gorgeous glass, glasses arm, army 7

Syntax: part of speech tagging The boy threw a ball to the brown dog. The/DT boy/nn threw/vbd a/dt ball/nn to/in the/dt brown/jj dog/nn./. DT determiner VBD verb, past tense JJ adjective NN noun, single or mass IN preposition, sub conj. sentence final punc 8

Syntax: structural ambiguity (part of speech) Time flies like an arrow. Time // flies like an arrow. VBZ comparative proposition (IN) Time flies // like an arrow. NNS VBP 9

Syntax: structural ambiguity (attachment) S VP NP NP V NP PP PP I saw the man on the hill with a telescope. 10

Syntax: structural ambiguity (attachment) S VP NP NP V NP PP PP I saw the man on the hill with a telescope. 11

Syntax: structural ambiguity (attachment) S VP NP V NP PP PP I saw the man on the hill with a telescope. 12

But syntax doesn t tell us much about meaning Colorless green ideas sleep furiously. [Chomsky] fire match arson hotel plastic cat food can cover 13

Semantics: lexical ambiguity I walked to the bank... The bug in the room... of the river. to get money. was planted by spies. flew out the window. I work for John Hancock... and he is a good boss. which is a good company. 14

Discourse: coreference President John F. Kennedy was assassinated. The president was shot yesterday. Relatives said that John was a good father. JFK was the youngest president in history. His family will bury him tomorrow. Friends of the Massachusetts native will hold a candlelight service in Mr. Kennedy s home town. 15

Pragmatics What should you conclude from the fact that I said something? How should you react? Rules of Conversation Can you tell me what time it is? Could I please have the salt? Speech Acts I bet you $50 that the Jazz will win. 16

World Knowledge John went to the diner. He ordered a steak. He left a tip and went home. What did John eat for dinner? Who brought John his food? Who cooked the steak? Did John pay his bill? 17

Why NLP is difficult Complex phenomenon arising out of the interaction of many distinct kinds of knowledge What is this knowledge? (data structures linguistics) How is it put to use? (algorithms) Example: the dogs ate ice cream 18

Knowledge of language: What do we know about this sequence? Words must appear in a certain order: *Dogs icecream ate Parts and divisions: dogs = Subject; ate icecream = Predicate Who did what to whom: agent(dogs), action(ate), object(ice cream) 19

Anything else? The two sentences John claimed the dogs ate icecream and John denied the dogs ate ice cream are logically incompatible Sentence & the world: know whether the sentence is true or not perhaps whether in some particular situation (possible world) the dogs did indeed eat icecream I had espresso this morning, but John is intelligent looks odd. 20

What is the character of this knowledge? Some of it must be memorized: Singing Sing+ing; Bringing bring+ing Duckling?? Duckl +ing So, must know duckl is not a word But it can t all be memorized because there is too much to know 21

Besides memory, what else do we need? English plural: Toy+s > toyz ; add z Book+s > books ; add s Church+s > churchiz ; add iz Box+s > boxiz ; add iz must be a rule system to generate/process infinite # of examples 22

Parsing = mapping from surface to underlying representation What makes NLP hard: there is not a 1 1 mapping between any of these representations! We have to know the data structures and the algorithms to make this efficient, despite exponential complexity at every point 23

LSAT / (former) GRE Analytic Section Questions Six sculptures C, D, E, F, G, H are to be exhibited in rooms 1, 2, and 3 of an art gallery. Sculptures C and E may not be exhibited in the same room. Sculptures D and G must be exhibited in the same room. If sculptures E and F are exhibited in the same room, no other sculpture may be exhibited in that room. At least one sculpture must be exhibited in each room, and no more than three sculptures may be exhibited in any room. If sculpture D is exhibited in room 3 and sculptures E and F are exhibited in room 1, which of the following may be true? A. Sculpture C is exhibited in room 1 B. Sculpture H is exhibited in room 1 C. Sculpture G is exhibited in room 2 D. Sculptures C and H are exhibited in the same room E. Sculptures G and F are exhibited in the same room 24

Reference Resolution U: Where is A Bug s Life playing in Mountain View? S: A Bug s Life is playing at the Summit theater. U: When is it playing there? S: It s playing at 2pm, 5pm, and 8pm. U: I d like 1 adult and 2 children for the first show. How much would that cost? Knowledge sources: Domain knowledge Discourse knowledge World knowledge 25

Why is natural language computing hard? Natural language is: highly ambiguous at all levels complex and fuzzy involves reasoning about the world 26

Making progress on this problem The task is difficult! What tools do we need? Knowledge about language Knowledge about the world A way to combine knowledge sources A potential solution: probabilistic models built from language data P( maison house ) high P( L avocat general the general avocado ) low 27