CS 6740 / INFO 6300. Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage



Similar documents
I m sorry, Dave, I m afraid I can t do that :

Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Semantic analysis of text and speech

Machine Learning for natural language processing

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Natural Language Processing. What s this story about?

Special Topics in Computer Science

Processing: current projects and research at the IXA Group

CS4025: Pragmatics. Resolving referring Expressions Interpreting intention in dialogue Conversational Implicature

Empirical Machine Translation and its Evaluation

Search and Information Retrieval

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems

A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow

Text Mining - Scope and Applications

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Introduction. Philipp Koehn. 28 January 2016

31 Case Studies: Java Natural Language Tools Available on the Web

INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning jtl@ifi.uio.no

2014/02/13 Sphinx Lunch

Survey Results: Requirements and Use Cases for Linguistic Linked Data

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR

Semantic annotation of requirements for automatic UML class diagram generation

Classification of Natural Language Interfaces to Databases based on the Architectures

NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR

How To Complete The Danish Masters Program In Lct

USABILITY OF A FILIPINO LANGUAGE TOOLS WEBSITE

Log-Linear Models a.k.a. Logistic Regression, Maximum Entropy Models

English Grammar Checker

Outline of today s lecture

Artificial Intelligence Exam DT2001 / DT2006 Ordinarie tentamen

Timeline (1) Text Mining Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

The Seven Practice Areas of Text Analytics

Paraphrasing controlled English texts

Identifying Focus, Techniques and Domain of Scientific Papers

Statistical Machine Translation

A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation*

Build Vs. Buy For Text Mining

Language and Computation

Customizing an English-Korean Machine Translation System for Patent Translation *

A. Schedule: Reading, problem set #2, midterm. B. Problem set #1: Aim to have this for you by Thursday (but it could be Tuesday)

An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System

Natural Language to Relational Query by Using Parsing Compiler

Speculating on the Future for Automatic Speech Recognition

Programming Languages

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. CSC467 Compilers and Interpreters Fall Semester, 2005

ISA OR NOT ISA: THE INTERLINGUAL DILEMMA FOR MACHINE TRANSLATION

Natural Language Interfaces to Databases: simple tips towards usability

ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS

DEVELOPMENT OF NATURAL LANGUAGE INTERFACE TO RELATIONAL DATABASES

Automatic Text Analysis Using Drupal

Contents The College of Information Science and Technology Undergraduate Course Descriptions

Natural Language Dialogue in a Virtual Assistant Interface

Sentiment Analysis: a case study. Giuseppe Castellucci castellucci@ing.uniroma2.it

Syntactic Theory. Background and Transformational Grammar. Dr. Dan Flickinger & PD Dr. Valia Kordoni

Basic Parsing Algorithms Chart Parsing

Skills for Effective Business Communication: Efficiency, Collaboration, and Success

BBC LEARNING ENGLISH 6 Minute Grammar So, such, enough, too

Master of Arts in Linguistics Syllabus

TREC 2003 Question Answering Track at CAS-ICT

Finding Advertising Keywords on Web Pages. Contextual Ads 101

SakkamMT White Paper

Overview of MT techniques. Malek Boualem (FT)

Specialty Answering Service. All rights reserved.

University of Massachusetts Boston Applied Linguistics Graduate Program. APLING 601 Introduction to Linguistics. Syllabus

Information extraction from texts. Technical and business challenges

OPTIMIZING CONTENT FOR TRANSLATION ACROLINX AND VISTATEC

The Cambridge English Scale explained A guide to converting practice test scores to Cambridge English Scale scores

Learning Translation Rules from Bilingual English Filipino Corpus

Ask your Database: Natural Language Processing using In-Memory Technology

PP-Attachment. Chunk/Shallow Parsing. Chunk Parsing. PP-Attachment. Recall the PP-Attachment Problem (demonstrated with XLE):

TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments

Gallito 2.0: a Natural Language Processing tool to support Research on Discourse

European Masters Program in Language and Communication Technologies (LCT) Module Handbook for Prospective Students

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Left-Brain/Right-Brain Multi-Document Summarization

Course Syllabus My TOEFL ibt Preparation Course Online sessions: M, W, F 15:00-16:30 PST

Language Processing Systems

Role of NLP in Production and Comprehension to Communicate and Understand Human Language

M LTO Multilingual On-Line Translation

University of North Dakota Department of Electrical Engineering Graduate Program Assessment Plan

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Natural Language Interfaces (NLI s)

School of Computer Science

WORKSHOP ON THE EVALUATION OF NATURAL LANGUAGE PROCESSING SYSTEMS. Martha Palmer Unisys Center for Advanced Information Technology Paoli, PA 19301

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari

Why are Organizations Interested?

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

An Overview of Applied Linguistics

Text Mining and its Applications to Intelligence, CRM and Knowledge Management

F. Aiolli - Sistemi Informativi 2007/2008

CIKM 2015 Melbourne Australia Oct. 22, 2015 Building a Better Connected World with Data Mining and Artificial Intelligence Technologies

Word Completion and Prediction in Hebrew

Endowing a virtual assistant with intelligence: a multi-paradigm approach

Leveraging ASEAN Economic Community through Language Translation Services

PROMT Technologies for Translation and Big Data

An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System

CHARTES D'ANGLAIS SOMMAIRE. CHARTE NIVEAU A1 Pages 2-4. CHARTE NIVEAU A2 Pages 5-7. CHARTE NIVEAU B1 Pages CHARTE NIVEAU B2 Pages 11-14

Activities. but I will require that groups present research papers

Transcription:

CS 6740 / INFO 6300 Advanced d Language Technologies Graduate-level introduction to technologies for the computational treatment of information in humanlanguage form, covering natural-language processing (NLP) and/or information retrieval (IR). Possible topics include text categorization and clustering, information extraction, latent semantic analysis (LSI), click-through data for web search, language modeling, computational syntactic and semantic formalisms, grammar induction, and machine translation. Natural Language Processing (NLP) Natural language Languages that people use to communicate Ultimate goal To build computer systems that perform as well at using natural language as humans do Immediate goal To build computer systems that can process text and speech more intelligently NL input computer NL output understanding generation Information retrieval (IR) Information retrieval Ad-hoc IR Topic: Advantages and disadvantages of using potassium hydroxide in any aspect of organic farming, especially doc 1 score doc 2 score doc 3 score doc n score information need text collection relevant documents (ranked) IR system

Text categorization Sentiment categorization Document Is the document about politics? fashion Document Is the document positive? sports? negative? economics? fashion? Summarization Question answering (QA) Task» How many calories are there in a Big Mac?» Who is the voice of Miss Piggy?» Who was the first American in space? Retrieve not just relevant documents, but return the answer? answer + supporting text [White et al., 2002] text collection

Machine translation MT systems would clearly facilitate human-human communication Certainly see a need for it u The extension of the coverage of the health services to the underserved or not served population of the countries of the region was the central goal of the Ten-Year Plan and probably that of greater scope and transcendence. Dialogue systems Require both understanding and generation Dave: Open the pod bay doors, HAL. HAL: I'm sorry Dave, I'm afraid I can't do that. Dave: What's the problem? HAL: I think you know what the problem is just as well as I do. u Welcome to Chinese Restaurant. Please try your Nice chinese Food With chopsticks. the traditional and typical of Chinese glorious history and cultual. PRODUCT OF CHINA Bill Gates, 1997 now we re betting the company on these natural interface technologies Umm there WILL be complications NLP AI-complete» To solve NLP, you d need to solve all of the problems in AI Turing test» Posits that engaging effectively in linguistic behavior is a sufficient condition for having achieved intelligence. But little kids can do NLP Why is understanding language hard? Syntax Concerns sentence structure Different syntactic structure implies different interpretation» Squad helps dog bite victim. u [ np squad] [ vp helps [ np dog bite victim]] u [ np squad] [ vp helps [ np dog] [ inf-clause bite victim]]» Visiting relatives can be trying.

Semantics Concerns what words mean and how these meanings combine to form sentence meanings.» We steered the sailboat away from the bank. u River bank vs. $$$ bank?» Visiting relatives can be trying.» Visiting museums s can be trying. u Same set of possible syntactic structures for this sentence u But the meaning of museums makes only one of them plausible Discourse Concerns how the immediately preceding sentences affect the interpretation of the next sentence It will be called Prodome Ltd. It will own 50% of the new company to be called Prodome Ltd. It had previously teamed up with Merck in two unsuccessful pharmaceutical ventures. Pragmatics Concerns how sentences are used in different situations and how use affects the interpretation of the sentence. I just came from New York.» Would you like to go to New York today?» Would you like to go to Boston today?» Why do you seem so out of it?» Boy, you look tired. What topics will we cover? Language modeling Lexical semantics and word-sense disambiguation Part-of-speech tagging and HMMs Parsing Semantic analysis Discourse processing Coreference analysis NL Generation Machine translation Information extraction Information retrieval models Sentiment analysis Text categorization Question answering Summarization GOAL: learn to evaluate and execute research in NLP/IR.

Reference Material Required text book: Jurafsky and Martin, Speech and Language Processing, Prentice-Hall, 2 nd edition. Other useful references: Manning and Schutze. Foundations of Statistical NLP,, MIT Press, 1999. Frederick Jelinek. Statistical Methods for Speech Recognition, MIT Press, 1998. Others listed on course web page Prereqs, Coursework, & Grading Prerequisites Permission of instructor. Neither INFO/CS4300 nor CS4740 are prerequisites. Grading 25%: participation You'll be expected to participate in class discussion and class exercises. 40%: semester project 33%: presentation of readings and research papers 2%: course evaluation completion