A large annotated corpus of entailments and contradictions

A large annotated corpus of entailments and contradictions
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning
Stanford University

Premise: A man speaking or singing into a microphone while playing the piano.
Hypothesis: A man is performing surgery on a giraffe while singing.
Label: contradiction

Corpus Semantics and Entailments

Corpora for semantics? Answering questions about what sentences people are willing or able to say: almost any corpus of naturally occurring text, like COCA, the BNC, or the Web.

Corpora for semantics? Answering questions about how people interpret particular structures: collecting new human judgments, or corpora annotated with meanings: (sentence, context, meaning) triples.

Semantic annotations: how should the meaning be represented?
- Metalanguage sentences, i.e., logical forms: a catch-22.
- Nonlinguistic data:
  - Images or videos: difficult to study statistically.
  - States of affairs in a game (cf. Potts, 2012): very limited domain.
- Object-language sentences: natural language inference corpora!

Natural language inference (NLI), also known as recognizing textual entailment (RTE):
Premise: James Byron Dean refused to move without blue jeans
Hypothesis: James Dean didn't dance without pants
Label: {entails, contradicts, neither}
Example: MacCartney's thesis, 2009

NLI in Semantics. Logical forms make predictions about entailments. If
blue_jeans = λx. pants(x) ∧ denim(x) ∧ blue(x)
and
pants = λx. pants(x),
then The dog is wearing blue jeans. entails The dog is wearing pants. ...assuming reasonable logical forms for the rest of the words in the sentence. Cf. research on Natural Logic: a logic that isolates entailment predictions without using explicit logical forms for sentences.
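Spelled out, the prediction is conjunction elimination lifted into the sentence. A sketch: the existential logical form for the full sentence and the treatment of the subject are my assumptions, not the slide's:

```latex
% Lexical level: blue_jeans is a subconcept of pants, by conjunction elimination.
\forall x\,\bigl[\mathit{pants}(x) \wedge \mathit{denim}(x) \wedge \mathit{blue}(x)
  \rightarrow \mathit{pants}(x)\bigr]
% Sentence level, with an assumed existential logical form for the object:
\exists x\,\bigl[\mathit{wear}(\mathit{the\_dog}, x) \wedge \mathit{blue\_jeans}(x)\bigr]
\;\models\;
\exists x\,\bigl[\mathit{wear}(\mathit{the\_dog}, x) \wedge \mathit{pants}(x)\bigr]
```

The entailment goes through because blue_jeans sits in an upward-monotone position; Natural Logic systems track exactly this kind of monotonicity information.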

NLI in Semantics. Correctly predicting entailments requires capturing every aspect of semantics except grounding:
- Lexical entailment: does jeans entail pants?
- Quantification: does most N V entail some N V? (worked below)
- Factivity and implicativity: does N managed to V entail N Ved?
- Object and event coreference
- Lexical ambiguity and scope ambiguity
- Propositional attitudes
- Modality
- ...
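For the quantification question, the standard generalized-quantifier semantics gives a positive answer; a sketch, assuming the usual cardinality-based meaning for most:

```latex
\mathit{most}(N)(V) \;:=\; |N \cap V| > \tfrac{1}{2}|N|
\;\;\Longrightarrow\;\;
|N \cap V| > 0
\;\;\Longleftrightarrow\;\;
\mathit{some}(N)(V)
```

So most N V entails some N V, but not every N V, since |N ∩ V| > ½|N| leaves room for exceptions.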

Existing NLI Corpora

Natural language inference data. Existing corpora, compared on size and on whether each uses full sentences, was expert curated, and is model biased:

Corpus   Size
------   --------
FraCaS   0.3k
RTE      7k
SICK     10k
SNLI     570k
DG       728k
Levy     1,500k
PPDB     100,000k

The Stanford Natural Language Inference Corpus (SNLI):
- 570k pairs of sentences
- Collected on Amazon Mechanical Turk
- Simple annotation guidelines
- No model necessary

Coreference issues for NLI data collection

One event or two? Premise: A boat sank in the Pacific Ocean. Hypothesis: A boat sank in the Atlantic Ocean.

One event or two? One. Premise: A boat sank in the Pacific Ocean. Hypothesis: A boat sank in the Atlantic Ocean. Label: Contradiction

One event or two? One. Premise: Ruth Bader Ginsburg was appointed to the US Supreme Court. Hypothesis: I had a sandwich for lunch today. Label: Contradiction

One event or two? Two. Premise: Ruth Bader Ginsburg was appointed to the US Supreme Court. Hypothesis: I had a sandwich for lunch today. Label: Neutral

One event or two? Two. Premise: A boat sank in the Pacific Ocean. Hypothesis: A boat sank in the Atlantic Ocean. Label: Neutral (two different events). But on that reading there are no contradictions at all between typical natural sentences.

Our data collection prompt

[Prompt slides: annotators see a caption and write Entailment, Neutral, and Contradiction hypotheses for it. Source captions from Flickr30k: Young, Lai, Hodosh, and Hockenmaier, TACL '14]

One photo or two? One. Premise: Ruth Bader Ginsburg being appointed to the US Supreme Court. Hypothesis: A man eating a sandwich for lunch. Label: Different photo (→ contradiction)

Flickr30k descriptive captions: the source of the premise in almost every* SNLI pair.
Premise: A white dog with long hair jumps to catch a red and green toy.
Hypothesis: An animal is jumping to catch an object.
Label: Entailment
Flickr30k: Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier, TACL '14; collected over Mechanical Turk.
*A small subset of the data draws on visualgenome.org data from a pilot study.

What we got

Some sample results Premise: Two women are embracing while holding to go packages. Entailment: Two woman are holding packages.

Some sample results Premise: A man in a blue shirt standing in front of a garage-like structure painted with geometric designs. Neutral: A man is repainting a garage

Some sample results Premise: A man selling donuts to a customer during a world exhibition event held in the city of Angeles Contradiction: A woman drinks her coffee in a small cafe.

General observations:
- Entailments aren't logically strict, and incorporate commonsense background assumptions.
- We only see semantic phenomena that can occur in scene descriptions. (Hard to avoid.)
- Premise and hypothesis sentences can be syntactically (and stylistically) very different.
- Spelling and grammar errors are rare. (NB: the corpus is not cleaned.)
- Contradiction hypotheses are often (but not always) somehow related to their premises.

Sentence length

Bare NPs vs. full sentences

Data source             % full sentences (Stanford Parser root: S)
Premises (Flickr30k)    74.0
Hypotheses (our work)   88.9
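For reference, a minimal sketch of reproducing the "root category is S" test, assuming a Stanford CoreNLP server running locally on port 9000 (the helper name and example sentences are illustrative):

```python
import json
import urllib.parse
import urllib.request

def is_full_sentence(text: str, url: str = "http://localhost:9000") -> bool:
    """The table's heuristic: count `text` as a full sentence iff the
    Stanford Parser places an S node directly under ROOT."""
    props = urllib.parse.quote(json.dumps(
        {"annotators": "parse", "outputFormat": "json"}))
    req = urllib.request.Request(f"{url}/?properties={props}",
                                 data=text.encode("utf-8"))
    with urllib.request.urlopen(req) as resp:
        ann = json.loads(resp.read().decode("utf-8"))
    parse = ann["sentences"][0]["parse"]         # e.g. "(ROOT\n  (S ..."
    root_child = parse.split("(")[2].split()[0]  # category right under ROOT
    return root_child == "S"

print(is_full_sentence("An animal is jumping to catch an object."))  # True
print(is_full_sentence("A white dog with long hair."))               # False (NP)
```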

Data validation

Data validation
Premise: Two women are embracing while holding to go packages.
Entailment: Two woman are holding packages.
Labels: entailment, entailment, entailment, neutral, entailment
Gold label: entailment
We relabeled 10% of SNLI this way.

The results

Condition                                                   % of pairs
Unanimous agreement (5 votes)                               58.3
3-4 vote majority for one label, including the author       32.9
3-4 vote majority for one label, not including the author    6.8
No majority for any one label                                2.0
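The gold-label rule behind this table is a simple majority over the five validation votes; a minimal sketch (the function name is mine; the "-" placeholder for no-majority pairs matches the released corpus):

```python
from collections import Counter

def gold_label(votes):
    """Return the label chosen by at least 3 of the 5 validators,
    or '-' when no label reaches a majority."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 3 else "-"

# The example from the validation slides:
gold_label(["entailment", "entailment", "entailment",
            "neutral", "entailment"])  # -> "entailment"
```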

Give it a try! Available now (with an accompanying paper): nlp.stanford.edu/projects/snli Standard distribution includes JSON and tab-separated text. Sentences are included both raw and tokenized+parsed with Stanford CoreNLP.
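A minimal loading sketch, assuming the JSON-lines layout of the standard distribution with its gold_label, sentence1, and sentence2 fields (the file name is the dev split's):

```python
import json

def load_snli(path):
    """Yield (premise, hypothesis, label) triples, skipping pairs
    whose validation produced no majority gold label ('-')."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)
            if ex["gold_label"] != "-":
                yield ex["sentence1"], ex["sentence2"], ex["gold_label"]

for premise, hypothesis, label in load_snli("snli_1.0_dev.jsonl"):
    print(label, "|", premise, "=>", hypothesis)
```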

Thanks! Download the corpus: nlp.stanford.edu/projects/snli More questions? sbowman@stanford.edu We gratefully acknowledge support from a Google Faculty Research Award, a gift from Bloomberg L.P., the Defense Advanced Research Projects Agency (DARPA) Deep Exploration and Filtering of Text (DEFT) Program under Air Force Research Laboratory (AFRL) contract no. FA8750-13-2-0040, the National Science Foundation under grant no. IIS 1159679, and the Department of the Navy, Office of Naval Research, under grant no. N00014-10-1-0109. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of Google, Bloomberg L.P., DARPA, AFRL, NSF, ONR, or the US government. Our MTurk workers were amazing, and we thank them too.

Validation: other metrics

% of total labels matching the gold label       89.0%
% of total labels matching the author's label   85.8%

Fleiss' κ:
  entailment     0.72
  contradiction  0.77
  neutral        0.60
  overall        0.70
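For reference, the Fleiss' κ values above use the standard chance-corrected agreement statistic, here with n = 5 raters per item:

```latex
\kappa \;=\; \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e},
\qquad
\bar{P} \;=\; \frac{1}{N}\sum_{i=1}^{N} \frac{1}{n(n-1)} \sum_{j} n_{ij}\,(n_{ij} - 1),
\qquad
\bar{P}_e \;=\; \sum_{j} \Bigl(\frac{1}{Nn}\sum_{i=1}^{N} n_{ij}\Bigr)^{2}
```

where N is the number of items and n_ij is the number of the n raters who assigned label j to item i.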

Realities of Mechanical Turk
- Employed ~2,500 workers.
- Used several worker qualification strategies, and bonuses in validation.
- Reviewed and discarded ~100 pairs that were flagged as problematic (either in collection or validation):
  - Data entry errors (early submission)
  - Bad source captions from Flickr30k ("The image didn't load")
  - Uninterpretable English
- Updated the FAQ ~10 times to discourage overly regular data.
- Caught ~20 cases of fraud, mostly random guessing on the validation task.
- Reddit (HWTF) reviews were extremely positive: many found the task fun.