I m sorry, Dave, I m afraid I can t do that :

Similar documents
CS 6740 / INFO Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage

CS 533: Natural Language. Word Prediction

Guidelines for Effective Business Writing: Concise, Persuasive, Correct in Tone and Inviting to Read

Ling 201 Syntax 1. Jirka Hana April 10, 2006

Virtual Personal Assistant

Language Modeling. Chapter Introduction

An Overview of Computational Advertising

Introduction to Encryption

NATURAL LANGUAGE TO SQL CONVERSION SYSTEM

What is Organizational Communication?

CSC384 Intro to Artificial Intelligence

What is Artificial Intelligence?

USABILITY OF A FILIPINO LANGUAGE TOOLS WEBSITE

Big Data and Scripting

Ummmm! Definitely interested. She took the pen and pad out of my hand and constructed a third one for herself:

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR

lot of Chinese people who can speak English right there so I think they can help you.

Natural Language Interfaces (NLI s)

Introduction. Philipp Koehn. 28 January 2016

Apprenticeship. Have you ever thought how. Teaching Adults to Read. with Reading CTE AND LITERACY BY MICHELE BENJAMIN LESMEISTER

ICOM 5018 Network Security and Cryptography

The Development of Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization (STREDEO PROJECT)

TO WRITING AND GIVING A GREAT SPEECH. A Reference Guide for Teachers by Elaine C. Shook Leon County 4-H

Chapter 5 Conceptualization, Operationalization, and Measurement

Exploring the use of Big Data techniques for simulating Algorithmic Trading Strategies

Teaching Formal Methods for Computational Linguistics at Uppsala University

Moving As A Child Part 2 Mini-Story Lesson

Collecting Polish German Parallel Corpora in the Internet

It is 1969 and three Apollo 11

תילגנאב תורגבה תניחב ןורתפ

Mayflower Compact Text-Dependent Questions

Practice file answer key

Eliminating Passive Voice

CHAPTER 15: IS ARTIFICIAL INTELLIGENCE REAL?

COMP 590: Artificial Intelligence

Preparing for the GED Essay

Who needs humans to run computers? Role of Big Data and Analytics in running Tomorrow s Computers illustrated with Today s Examples

Socratic Questioning

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak

Hyper-connectivity and Artificial Intelligence

CoolaData Predictive Analytics

WHITEPAPER. Text Analytics Beginner s Guide

HOMEWORK PROJECT: An Inspector Calls

Writing College Admissions Essays/ UC Personal Statements. Information, Strategies, & Tips

EXECUTIVE SUPPORT SYSTEMS (ESS) STRATEGIC INFORMATION SYSTEM DESIGNED FOR UNSTRUCTURED DECISION MAKING THROUGH ADVANCED GRAPHICS AND COMMUNICATIONS *

Data Mining Yelp Data - Predicting rating stars from review text

According to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided

WEB FORM E HELPING SKILLS SYSTEM

Language and Computation

Acoustical Surfaces, Inc.

Options on Beans For People Who Don t Know Beans About Options

A Hurwitz white paper. Inventing the Future. Judith Hurwitz President and CEO. Sponsored by Hitachi

Reading Questions THE STRANGER PART ONE

Persuasive Speech Graphic Organizer

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Supplement to the Common Application Class of 2019

Watson. An analytical computing system that specializes in natural human language and provides specific answers to complex questions at rapid speeds

Of Mice and Men. John Steinbeck. Study Guide. Name:

Fry Phrases Set 1. TeacherHelpForParents.com help for all areas of your child s education

Plagiarism and Cheating: Definitions, Examples and Consequences. Woodgrove High School

Copyright 2014 Edmentum - All rights reserved. Inside the suitcase.

Activities. but I will require that groups present research papers

The Old Man and The Sea

BUSINESS LETTER WRITING: PLANNING

Making requests and asking for permission.

NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR

Introduction to Psychology Psych 100 Online Syllabus Fall 2014

TEST-TAKING STRATEGIES FOR READING

TeachingEnglish Lesson plans

The Ontology of Cyberspace: Law, Philosophy, and the Future of Intellectual Property by

INTRUSION PREVENTION AND EXPERT SYSTEMS

How to write an Outline for a Paper

Inductive Reasoning Page 1 of 7. Inductive Reasoning

Carlsberg involved in football for 30 years

Loss Prevention Data Mining Using big data, predictive and prescriptive analytics to enpower loss prevention

SPIN Selling SITUATION PROBLEM IMPLICATION NEED-PAYOFF By Neil Rackham

HOW TO WRITE A CRITICAL ARGUMENTATIVE ESSAY. John Hubert School of Health Sciences Dalhousie University

Activity 10 - Universal Time

think customer service in the U.S. is the worst it s ever been. And, because in

Cars R and S travel the same distance, so the answer is D.

Linguistics & Cognitive Science

My Favorite Futures Setups. By John F. Carter

Single voice command recognition by finite element analysis

Key Insight 5 Marketing Goals and Objectives You Should Set. By WalkMe

Jesus at the Temple (at age 12)

Radio Montage Steps / by Ulrike Werner / Illustration and Layout: sandruschka


The Impact of Disruptive Behavior on Nursing Care and Patient Safety

Adapted from Stone Girl Bone Girl by Laurence Anholt, Francis Lincoln Children s Book

Identifying Focus, Techniques and Domain of Scientific Papers

1. The Fly In The Ointment

Applying machine learning techniques to achieve resilient, accurate, high-speed malware detection

Kotter and Bridges handouts for participants who did not attend Workshop 1.

Transcription:

I m sorry, Dave, I m afraid I can t do that : Linguistics, Statistics, and Natural-Language Processing in the Big Data Era Lillian Lee Professor, Computer Science http://www.cs.cornell.edu/home/llee

the dream

Why is this man smiling? http://www.nature.com/nature/journal/v482/n7386/full/482440a.html

The Turing test: Intelligence è human-level language use http://bittersweetsage.blogspot.com/2010/01/comic-converse-turing-test.html ]http://ghostradio.files.wordpress.com/2011/03/blade_runner_fondo.jpg Turing predicted we d be close in about 50 years.

Why is this man not smiling? http://4.bp.blogspot.com/_qm9cekv5jj4/s8iq3emehgi/aaaaaaaaaru/obz6ih5j4fi/s200/2001-a-space-odyssey.jpg Open the pod bay doors, Hal. I m sorry, Dave, I m afraid I can t do that. http://www.netbrawl.com/matchup.php?mid=11131&bracketid=497

from sci-fi to science and engineering

Natural-language processing (NLP) Goal: create systems that use human language as input/output speech-based interfaces information retrieval / question answering automatic summarization of news, emails, postings, etc. automatic translation and much more! Interdisciplinary: computer science; linguistics, psychology, communication; probability & statistics, information theory

Recently deployed: Siri http://www.apple.com/iphone/features/siri.html

State of the art: Watson Credit: AP Photo/Jeopardy Productions Inc. The Watson system beat human Jeopardy! champions (and didn t have internet access; it learned by reading before the match)

Why are these men smiling? The session co-organizers http://www.computerworld.ch/fileadmin/images/artikelbilder/14529.jpg, http://theory.stanford.edu/~sergei/portrait.jpg

But we re not all the way there yet

Real-life error (1) http://randomhandprints.blogspot.com/2011_01_01_archive.html A bunch of grapes. Hey bunch of grapes istock blankaboskov

Real-life error (2) We can email you when we're back. http://catandgirl.com/?p=2678 istock blankaboskov We can email you when you're fat.

Real-life error (3) http://jeopardy.edogo.com/wp-content/uploads/2009/01/program-jeopardy1.jpg [This U.S. city s] largest airport What is Toronto???

why is understanding language so hard?

Challenge: ambiguity List all flights on Tuesday List all flights on Tuesday = List all the flights leaving on Tuesday. List all flights on Tuesday = Wait til Tuesday, then list all flights.

More realistic example Retrieve all the local patient files

Baroque example [http://casablancapa.blogspot.com/2010/05/fore.htm]l [http://www.supercoloring.com/pages/duck-outline/] I saw her duck with a telescope.

Baroque example [http://www.supercoloring.com/pages/duck-outline/] I saw her duck with a telescope. http://www.clipartmojo.com/plugins/clipart/clipartstock1/star%20gazing.png http://pokerfoldingtable.com/wp-content/uploads/2009/02/three-men-gambling-sitting-at-poker-table-playing-cards-betting-party-pen-ink-drawing-300x234.png http://www.geocities.ws/looneyebay/dell/bb040.jpg

Conversation complications Q: Do you know when the train to Boston leaves? A: Yes. Q: I want to know when the train to Boston leaves. A: I understand. [Grishman 1986] Images: http://3.bp.blogspot.com/_o4kq5tnl0z4/tux0j6e5bli/aaaaaaaaa5k/j7xjhvrcnlu/s1600/trillian-hitchhikers-guide-to-the-galaxy-the-2005.jpg, http://www.tvacres.com/images/robots_androids_marvin_movie.jpg

I m sorry, Dave, I m afraid I can t do that. [http://browse.deviantart.com/?qh=&section=&global=1&q=muscleduck#/d14nst5] [http://selcouth.com/2011/03] I m afraid you might be right.

Meeting these challenges: a brief history

1940s 50s: From language to probability The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point... [For] the engineering problem, the significant aspect is that the actual message is one selected from a set of possible messages. --C. Shannon, 1948

Language, statistics, cryptography WWII: Turing helps break the German Enigma code

Why is this man smiling? I can see Alaska from my house! http://artofrevolution.co.uk/new/index.php?main_page=product_info&cpath=1_3&products_id=239&zenid=vid94spfpa9vtr18sbtgug1h64 Encryption process [W. Weaver memo on translation, 1949]

Two probabilities to infer I can see Alaska from my house! Prob. of generating this original message? Encryption process Prob. of doing this encryption of the original? [Russian]

Another use of message probs: speech recognition (1) It s hard to recognize speech (2) It s hard to wreck a nice beach Both messages have almost the same acoustics, but different likelihoods.

1950s-1980s: Breaking with statistics N. Chomsky (1957): (a) Colorless green ideas sleep furiously (b) Furiously sleep ideas green colorless The argument: Neither sentence has ever occurred in the history of English. So any statistical model would given them the same probability (zero). The field moved to sophisticated non-probabilistic models of language.

1990s: The empiricists strike back Huge amounts of data start coming online Advances in algorithms, models, and horsepower 2000s and beyond: integrating language insights and statistical techniques Every time I fire a linguist, my [system s] performance goes up -- F. Jelinek (apocryphal)

Why is this man smiling? We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision... I do not know what the right answer is, but I think [different] approaches should be tried. We can only see a short distance ahead, but we can see plenty there that needs to be done.

computers can also (help us) understand us

Why is this man smiling? Beyond situational effects, phrasing also affects memorability: http://www.schwimmerlegal.com/2006/11/evidence-of-secondary-meaning-in-tv-catchphrases.html memorable movie quotes (in aggregate) are unusual word choices built on a scaffolding of common part-of-speech patterns shown via language models carries over to ad slogans C. Danescu-Niculescu-Mizil et al. ACL 2012

Social interaction: who has the lead? Communicative behaviors are patterned and coordinated, like a dance [Niederhoffer and Pennebaker, 02] adah ja ad to the adajkj the adah ja ad at a adajkj the http://minimalmovieposters.tumblr.com/post/16082323317/pulp-fiction-by-ana-balderramas adah ja ad of adah to ja ad an adajkj adajkj the gh adah ja ad of adajkj the adah ja ad the adajkj forhgh Those with less power tend to immediately match the function-word choices of those with more power. [C. Danescu-Niculescu-Mizil et al. WWW 2012]

Why is this man smiling? We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision... I do not know what the right answer is, but I think [different] approaches should be tried. We can only see a short distance ahead, but we can see plenty there that needs to be done.