Module 13. Natural Language Processing. Version 2 CSE IIT, Kharagpur

Module 13 Natural Language Processing

13.1 Instructional Objective

- The students should understand the necessity of natural language processing in building an intelligent system.
- Students should understand the difference between natural and formal language and the difficulty in processing the former.
- Students should understand the ambiguities that arise in natural language processing.
- Students should understand the language information required, such as:
  o Phonology
  o Morphology
  o Syntax
  o Semantics
  o Discourse
  o World knowledge
- Students should understand the steps involved in natural language understanding and generation.
- The student should be familiar with basic language processing operations such as:
  o Morphological analysis
  o Parts-of-speech tagging
  o Lexical processing
  o Semantic processing
  o Knowledge representation

At the end of this lesson the student should be able to do the following:
- Design the processing steps required for an NLP task.
- Implement the processing techniques.

Lesson 40 Issues in NLP

13.1 Natural Language Processing

Natural Language Processing (NLP) is the computer analysis of input provided in a human (natural) language, and the conversion of this input into a useful form of representation. The field of NLP is primarily concerned with getting computers to perform useful and interesting tasks with human languages. It is secondarily concerned with helping us come to a better understanding of human language.

The input/output of an NLP system can be:
- written text
- speech

We will mostly be concerned with written text (not speech). To process written text, we need:
- lexical, syntactic, and semantic knowledge about the language
- discourse information and real-world knowledge

To process spoken language, we need everything required to process written text, and must also meet the challenges of speech recognition and speech synthesis.

There are two components of NLP.

Natural Language Understanding
- Mapping the given input in the natural language into a useful representation.
- Different levels of analysis are required: morphological analysis, syntactic analysis, semantic analysis, discourse analysis.

Natural Language Generation
- Producing output in the natural language from some internal representation.
- Different levels of synthesis are required: deep planning (what to say), syntactic generation.

NL Understanding is much harder than NL Generation, but both of them are hard.

The difficulty in NL understanding arises from the following facts:
- Natural language is extremely rich in form and structure, and very ambiguous.
  o How should meaning be represented?
  o Which structures map to which meaning structures?
- One input can mean many different things. Ambiguity can be at different levels:

  o Lexical (word-level) ambiguity -- different meanings of words
  o Syntactic ambiguity -- different ways to parse the sentence
  o Interpreting partial information -- how to interpret pronouns
  o Contextual information -- the context of a sentence may affect its meaning
- Many inputs can mean the same thing.
- Interaction among components of the input is not clear.

The following language-related information is useful in NLP:
- Phonology concerns how words are related to the sounds that realize them.
- Morphology concerns how words are constructed from more basic meaning units called morphemes. A morpheme is the primitive unit of meaning in a language.
- Syntax concerns how words can be put together to form correct sentences, and determines what structural role each word plays in the sentence and what phrases are subparts of other phrases.
- Semantics concerns what words mean and how these meanings combine in sentences to form sentence meaning. It is the study of context-independent meaning.
- Pragmatics concerns how sentences are used in different situations and how use affects the interpretation of the sentence.
- Discourse concerns how the immediately preceding sentences affect the interpretation of the next sentence, for example in interpreting pronouns and the temporal aspects of the information.
- World knowledge includes general knowledge about the world, and what each language user must know about the other's beliefs and goals.

13.1.1 Ambiguity

I made her duck.

- How many different interpretations does this sentence have?
- What are the reasons for the ambiguity?
- The categories of knowledge of language can be thought of as ambiguity-resolving components.
- How can each ambiguous piece be resolved?
- Does speech input make the sentence even more ambiguous? Yes: word boundaries must also be decided.

Some interpretations of "I made her duck":

1. I cooked duck for her.
2. I cooked duck belonging to her.
3. I created a toy duck which she owns.
4. I caused her to quickly lower her head or body.
5. I used magic and turned her into a duck.

- duck is morphologically and syntactically ambiguous: noun or verb.
- her is syntactically ambiguous: dative or possessive.
- make is semantically ambiguous: cook or create.
- make is syntactically ambiguous:
  o Transitive: takes a direct object. => 2
  o Di-transitive: takes two objects. => 5
  o Takes a direct object and a verb. => 4

Models and algorithms are introduced to resolve ambiguities at different levels:
- part-of-speech tagging -- deciding whether duck is a verb or a noun.
- word-sense disambiguation -- deciding whether make means create or cook.
- lexical disambiguation -- resolving part-of-speech and word-sense ambiguities are two important kinds of lexical disambiguation.
- syntactic ambiguity -- "her duck" is an example of syntactic ambiguity, and can be addressed by probabilistic parsing.

13.1.2 Models to represent Linguistic Knowledge

We will use certain formalisms (models) to represent the required linguistic knowledge.
- State Machines -- FSAs, FSTs, HMMs, ATNs, RTNs
- Formal Rule Systems -- Context Free Grammars, Unification Grammars, Probabilistic CFGs
- Logic-based Formalisms -- first order predicate logic, some higher order logic
- Models of Uncertainty -- Bayesian probability theory

13.1.3 Algorithms to Manipulate Linguistic Knowledge

We will use algorithms to manipulate the models of linguistic knowledge to produce the desired behavior. Most of the algorithms we will study are transducers and parsers. These algorithms construct some structure based on their input. Since language is ambiguous at all levels, these algorithms are never simple processes. Most of the algorithms that will be used fall into the following categories:
- state space search
- dynamic programming
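To make the link between syntactic ambiguity, formal rule systems and dynamic programming concrete, the following is a minimal sketch of a CKY-style chart that counts the parses of "I made her duck". The grammar, the lexicon and the category names (such as VDat and SC) are hypothetical and were invented for this illustration; they are not taken from the lesson. The sketch only captures structural ambiguity, so it finds three structures (possessive "her duck", dative "her" with two objects, and "her" followed by the verb "duck") and cannot separate readings that differ only in word sense, such as cook versus create.

```python
from collections import defaultdict

# Hypothetical toy lexicon: each word maps to its possible categories.
LEXICON = {
    "I": {"NP"},
    "made": {"V"},
    "her": {"Det", "NP"},        # possessive determiner or object pronoun
    "duck": {"N", "NP", "VB"},   # noun, bare noun phrase, or bare verb
}

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),      # "her duck" as a possessive noun phrase
    ("VP", "V", "NP"),       # transitive: made [her duck]
    ("VP", "V", "SC"),       # made [her duck], with duck as a verb
    ("SC", "NP", "VB"),      # small clause: her + duck (verb)
    ("VP", "VDat", "NP"),    # di-transitive: [made her] [duck]
    ("VDat", "V", "NP"),
]


def count_parses(words, start="S"):
    """Count the distinct parse trees for `words` by filling a CKY chart."""
    n = len(words)
    # chart[(i, j)][symbol] = number of ways symbol can span words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON.get(word, ()):
            chart[(i, i + 1)][category] += 1
    # Dynamic programming: build longer spans from already solved shorter ones.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    left_count = chart[(i, k)].get(left, 0)
                    right_count = chart[(k, j)].get(right, 0)
                    if left_count and right_count:
                        chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)].get(start, 0)


print(count_parses("I made her duck".split()))  # prints 3
```

Each chart cell is computed once and reused by every larger span that contains it, which is what makes this dynamic programming rather than a blind state space search over all trees. A probabilistic parser would store the best probability per cell instead of a count.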

13.2 Natural Language Understanding

The steps in natural language understanding are as follows:

Words
  -> Morphological Analysis -> Morphologically analyzed words (another step: POS tagging)
  -> Syntactic Analysis -> Syntactic structure
  -> Semantic Analysis -> Context-independent meaning representation
  -> Discourse Processing -> Final meaning representation
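The first stages of this sequence can be sketched as a chain of functions, each consuming the output of the previous one. The suffix rules, the tag lexicon and all function names below are hypothetical and deliberately tiny; they illustrate the staging only and are not a usable analyzer of English. Syntactic analysis, semantic analysis and discourse processing are left out and would take the tagger output as their input.

```python
# Hypothetical suffix rules: (suffix, replacement, feature). Toy coverage only.
SUFFIX_RULES = [
    ("ies", "y", "PLURAL_OR_3SG"),
    ("ing", "", "PROGRESSIVE"),
    ("ed", "", "PAST"),
    ("s", "", "PLURAL_OR_3SG"),
]

# Hypothetical lexicon of possible parts of speech per stem.
TOY_TAGS = {
    "she": ["PRON"],
    "cook": ["VERB", "NOUN"],
    "duck": ["NOUN", "VERB"],
}


def tokenize(text):
    """Split raw text into words, dropping sentence-final punctuation."""
    return text.strip().rstrip(".!?").split()


def morphological_analysis(words):
    """Map each word to a (stem, feature) pair using the toy suffix rules."""
    analysed = []
    for word in words:
        lower = word.lower()
        for suffix, replacement, feature in SUFFIX_RULES:
            # Require a stem of at least three letters to avoid stripping "is".
            if lower.endswith(suffix) and len(lower) > len(suffix) + 2:
                analysed.append((lower[: -len(suffix)] + replacement, feature))
                break
        else:
            analysed.append((lower, "BASE"))
    return analysed


def pos_candidates(analysed):
    """List possible tags per stem, narrowed by verbal morphology if present."""
    tagged = []
    for stem, feature in analysed:
        tags = TOY_TAGS.get(stem, ["UNK"])
        if feature in ("PAST", "PROGRESSIVE"):
            # A tense or aspect suffix rules out non-verb readings.
            tags = [tag for tag in tags if tag == "VERB"] or tags
        tagged.append((stem, tags))
    return tagged


def understand(text):
    """Run the implemented stages in order and keep every intermediate result."""
    words = tokenize(text)
    morphology = morphological_analysis(words)
    candidates = pos_candidates(morphology)
    return {"words": words, "morphology": morphology, "pos_candidates": candidates}


result = understand("She cooked ducks.")
print(result["pos_candidates"])
```

For "She cooked ducks." the past-tense suffix narrows "cook" to a verb, while "duck" keeps both its noun and verb readings; that remaining ambiguity is what the later stages would have to resolve.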