Modeling coherence in ESOL learner texts




University of Cambridge Computer Laboratory
Workshop on Building Educational Applications, NAACL 2012


The Task: Automated Text Scoring (ATS)
Automated Text Scoring: automatically analyse the quality of writing competence and assign a score to a text.
Goal: evaluate writing as reliably as human readers.
Challenge: imitate the value judgements that human readers make when they mark a text.

ATS systems
Marking criteria: identify textual features that correlate with intrinsic features of human judgements.
Multiple factors influence the linguistic quality of texts: grammar, style, vocabulary usage, topic similarity, discourse coherence and cohesion, etc.

Discourse coherence & cohesion
Mechanisms:
Cohesion: the use of cohesive devices that primarily signal suprasentential discourse relations between textual units (Halliday and Hasan, 1976), e.g. anaphora, discourse markers.
Local coherence: transitions between textual units.
Global coherence: the sequence of topics.

Related work
Coherence analysis on:
News texts: e.g., Lin et al. (2011), Elsner and Charniak (2008), Soricut and Marcu (2006).
Extractive summaries: Pitler et al. (2010).
Learner data: Miltsakaki and Kukich (2004), Higgins and Burstein (2007), Burstein et al. (2010).

First Certificate in English (FCE) exam
FCE Writing Component: an upper-intermediate level assessment.
Two tasks eliciting free-text answers, each between 120 and 180 words, e.g. write a short story commencing...
Answers are annotated with a mark (in the range 1-40), fitted to a Rasch model (Fischer and Molenaar, 1995), and manually error-coded using a taxonomy of 80 error types (Nicholls, 2003).

Baseline system
The ATS system described in Yannakoudakis et al. (2011):
Features focus on lexical and grammatical properties, as well as errors.
Discourse coherence is ignored, leaving the system vulnerable to subversion.
We extend it with discourse coherence features.

Machine learning
We address ATS as a rank learning problem (Joachims, 2002):
Learn an optimal ranking function that explicitly models the grade relationships between scripts.
Model the fact that some scripts are better than others.
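The pairwise reduction behind this setup can be sketched as follows. This is a minimal illustration of Joachims-style rank learning, not the SVM ranker used in the paper: a perceptron-style learner stands in for the SVM, and the toy feature vectors and grades are invented for the example.

```python
# Sketch of rank learning over graded scripts (Joachims, 2002):
# the ranking problem is reduced to binary classification on
# pairwise difference vectors x_hi - x_lo.

def pairwise_differences(scripts):
    """scripts: list of (feature_vector, grade); yield differences
    for every pair where the first script outranks the second."""
    diffs = []
    for xa, ga in scripts:
        for xb, gb in scripts:
            if ga > gb:
                diffs.append([a - b for a, b in zip(xa, xb)])
    return diffs

def train_ranker(scripts, epochs=50, lr=0.1):
    """Perceptron-style stand-in for the SVM ranker:
    learn w such that w . (x_hi - x_lo) > 0 for all pairs."""
    w = [0.0] * len(scripts[0][0])
    for _ in range(epochs):
        for diff in pairwise_differences(scripts):
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

# Toy data: the first feature correlates with the grade.
scripts = [([1.0, 0.2], 3), ([0.5, 0.1], 2), ([0.1, 0.9], 1)]
w = train_ranker(scripts)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
assert score([1.0, 0.2]) > score([0.1, 0.9])
```

The learned scoring function orders unseen scripts; only relative order, not the absolute score, is modelled.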

Superficial proxies
Number of pronouns
Number of discourse connectives:
Addition (e.g., additionally)
Comparison (e.g., likewise)
Contrast (e.g., whereas)
Conclusion (e.g., therefore)
Word length
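These proxy features are cheap surface counts. A minimal sketch is below; the word lists are small illustrative samples, not the full inventories a real system would use.

```python
# Superficial proxy features: pronoun count, connective count, word length.
# PRONOUNS and CONNECTIVES are illustrative samples only.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them",
            "his", "its", "their"}
CONNECTIVES = {
    "additionally": "addition", "likewise": "comparison",
    "whereas": "contrast", "therefore": "conclusion",
}

def proxy_features(text):
    tokens = [t.strip(".,;!?").lower() for t in text.split()]
    return {
        "n_pronouns": sum(t in PRONOUNS for t in tokens),
        "n_connectives": sum(t in CONNECTIVES for t in tokens),
        "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
    }

feats = proxy_features("He was late. Therefore, they left without him.")
assert feats["n_pronouns"] == 3 and feats["n_connectives"] == 1
```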

Lemma/PoS cosine similarity
Incorporates (syntactic) aspects of text coherence.
Sentences are represented as vectors of lemma/PoS-tag counts.
Cosine similarity between adjacent sentences:

cos(θ) = (s_i · s_{i+1}) / (||s_i|| ||s_{i+1}||)    (1)

Coherence of a text T:

coherence(T) = ( Σ_{i=1}^{n-1} sim(s_i, s_{i+1}) ) / (n - 1)    (2)
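Equations (1)-(2) can be sketched directly. In this toy version, whitespace tokens stand in for lemmas (a real system would use a lemmatiser or PoS tagger):

```python
from collections import Counter
from math import sqrt

# Sketch of Eqs. (1)-(2): adjacent-sentence cosine similarity over
# count vectors, averaged over the n-1 sentence transitions.

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * \
           sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def coherence(sentences):
    vecs = [Counter(s.lower().split()) for s in sentences]
    sims = [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    return sum(sims) / len(sims)

text = ["the cat sat on the mat", "the cat slept", "dogs bark loudly"]
print(round(coherence(text), 3))  # higher values = more cohesive transitions
```

Swapping the token vectors for PoS-tag count vectors yields the syntactic variant of the same feature.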

Incremental Semantic Analysis (ISA)
Word space model (Baroni et al., 2007); a fully incremental variation of Random Indexing (Sahlgren, 2005).
Similarity among words is measured by comparing their context vectors.
Coherence of a text T:

coherence(T) = ( Σ_{i=1}^{n-1} max_{k,j} sim(s_i^k, s_{i+1}^j) ) / (n - 1)    (3)

Underlying idea: the degree of semantic relatedness between adjoining sentences serves as a proxy for local discourse coherence.
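Equation (3) takes, for each pair of adjacent sentences, the best word-to-word similarity. The sketch below uses a hand-made toy vector table (VEC) in place of vectors learned incrementally by ISA:

```python
from math import sqrt

# Sketch of Eq. (3). VEC is a stand-in for ISA context vectors,
# which a real system would learn incrementally from a corpus.
VEC = {
    "cat": [1.0, 0.1], "feline": [0.9, 0.2],
    "dog": [0.2, 1.0], "economy": [-0.5, 0.4],
}

def sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def coherence(sentences):
    """Mean over adjacent sentence pairs of the maximum
    word-to-word context-vector similarity."""
    total = 0.0
    for s, t in zip(sentences, sentences[1:]):
        total += max(sim(VEC[w], VEC[v]) for w in s for v in t)
    return total / (len(sentences) - 1)

related = [["cat"], ["feline"], ["cat"]]
unrelated = [["cat"], ["economy"], ["dog"]]
assert coherence(related) > coherence(unrelated)
```

Unlike the cosine feature above, this rewards semantic relatedness even when adjacent sentences share no surface words.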

IBM model 1
In machine translation, the use of certain words in a source language is likely to trigger the use of certain words in a target language. Analogously, within a text, the use of certain words in a sentence tends to trigger the use of certain words in an adjoining sentence (Soricut and Marcu, 2006).
Identifies word co-occurrence patterns across adjacent sentences.
Probability of a text T:

P_IBMdir(T) = Π_{i=1}^{n-1} Π_{j=1}^{|s_{i+1}|} [ ε / (|s_i| + 1) · Σ_{k=0}^{|s_i|} t(s_{i+1}^j | s_i^k) ]    (4)

Extension: identification of PoS co-occurrence patterns.
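Equation (4) can be sketched with a tiny hand-specified trigger table. The table T, the smoothing constant, and the example sentences are all illustrative; a real system estimates t(· | ·) from adjacent-sentence pairs in a corpus:

```python
# Sketch of Eq. (4): text "probability" from trigger probabilities
# t(w_next | w_prev) over adjacent sentences. T is illustrative only;
# "NULL" plays the role of the empty-alignment token (k = 0).
T = {
    ("rain", "cloud"): 0.6, ("rain", "NULL"): 0.1,
    ("wet", "rain"): 0.5, ("wet", "NULL"): 0.1,
}
EPS = 1.0       # the epsilon constant in Eq. (4)
UNSEEN = 1e-4   # smoothing for unseen word pairs (an assumption)

def ibm1_score(sentences):
    p = 1.0
    for s, nxt in zip(sentences, sentences[1:]):
        for w_next in nxt:
            total = sum(T.get((w_next, w_prev), UNSEEN)
                        for w_prev in s + ["NULL"])
            p *= EPS / (len(s) + 1) * total
    return p

coherent = [["cloud"], ["rain"], ["wet"]]
incoherent = [["wet"], ["rain"], ["cloud"]]
assert ibm1_score(coherent) > ibm1_score(incoherent)
```

Replacing words with PoS tags in both the table and the sentences gives the PoS extension mentioned above.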

Entity-based coherence model
Measures local coherence on the basis of sequences of entity mentions (Barzilay and Lapata, 2008).
Learns coherence properties similar to those employed by Centering Theory (Grosz et al., 1995).
Each text is represented by an entity grid that captures the distribution of discourse entities across sentences.
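A heavily simplified entity-grid sketch is below. The full model marks grammatical roles (subject/object/other) obtained from a parser; here only presence ("X") versus absence ("-") is tracked, and normalised transition bigrams form the feature vector:

```python
from collections import Counter

# Simplified entity grid (Barzilay and Lapata, 2008): one column per
# entity, one row per sentence; features are normalised counts of
# transition bigrams such as "XX" (entity persists) or "X-" (dropped).

def entity_grid(sentences, entities):
    """sentences: list of sets of entity mentions."""
    return {e: ["X" if e in s else "-" for s in sentences]
            for e in entities}

def transition_features(grid):
    feats = Counter()
    for column in grid.values():
        for a, b in zip(column, column[1:]):
            feats[a + b] += 1
    total = sum(feats.values())
    return {k: v / total for k, v in feats.items()}

sents = [{"microsoft", "market"}, {"microsoft"}, {"profit"}]
grid = entity_grid(sents, {"microsoft", "market", "profit"})
feats = transition_features(grid)
assert abs(sum(feats.values()) - 1.0) < 1e-9
```

A high proportion of "XX" transitions indicates entities that persist across sentences, the pattern Centering Theory associates with locally coherent text.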

Pronoun coreference model
Unsupervised generative model (Charniak and Elsner, 2009).
Each pronoun is modelled as generated by an antecedent somewhere in the previous two sentences.
Probability of a text: the probability of the resulting sequence of pronoun assignments.

Discourse-new model
Discourse-new classifier (Elsner and Charniak, 2008): distinguishes NPs whose referents have not been previously mentioned in the discourse from those that have.
Probability of a text:

Π_{np ∈ NPs} P(L_np | np)

where L_np is the discourse-new/old label assigned to noun phrase np.

Bag-of-Words (BOW)
Represents a text as a vector d ∈ R^|V|, where each word v_i is associated with a single vector dimension: a histogram of word occurrences.
Unable to maintain any sequential information, and hence to capture the semantic transitions between different parts of the document.
Partial solution: use n-grams.
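The order-blindness of BOW is easy to demonstrate: two texts with the same words in different orders receive identical vectors. A minimal sketch:

```python
from collections import Counter

# Minimal BOW: a text becomes a histogram over a fixed vocabulary.
# All ordering information is discarded.

def bow_vector(text, vocabulary):
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]

vocab = ["the", "cat", "mat", "sat"]
v1 = bow_vector("The cat sat on the mat", vocab)
v2 = bow_vector("The mat sat on the cat", vocab)  # reordered words
assert v1 == v2  # identical histograms despite different meanings
```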

Locally-Weighted Bag-of-Words (LoWBOW)
A sequentially sensitive alternative to BOW (Lebanon et al., 2007).
A text is represented by a set of local histograms, computed across the whole text but centred at different locations.
Preserves local contextual information by modelling the sequential structure of the text.
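A sketch of the idea, assuming a Gaussian smoothing kernel over normalised word positions (the kernel choice and its width are assumptions of this illustration; Lebanon et al. parameterise the smoothing differently):

```python
from collections import Counter
from math import exp

# LoWBOW sketch: one local histogram per anchor point; every word
# contributes to each histogram with a weight that decays with the
# distance between the word's position and the anchor.

def lowbow(tokens, vocabulary, n_anchors=3, sigma=0.2):
    n = len(tokens)
    histograms = []
    for a in range(n_anchors):
        mu = a / (n_anchors - 1)  # anchor location in [0, 1]
        hist = Counter()
        for pos, tok in enumerate(tokens):
            x = pos / (n - 1)     # normalised word position
            hist[tok] += exp(-((x - mu) ** 2) / (2 * sigma ** 2))
        total = sum(hist.values())
        histograms.append([hist[w] / total for w in vocabulary])
    return histograms

tokens = "intro intro body body body conclusion conclusion".split()
vocab = ["intro", "body", "conclusion"]
h = lowbow(tokens, vocab)
assert h[0][0] > h[2][0]  # "intro" mass concentrates at the first anchor
```

Unlike a single BOW histogram, the sequence of local histograms records where in the text each word mass occurs, which is what makes the representation sensitive to discourse structure.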


Examination year 2000

                          r      ρ
 0  Baseline            0.651  0.670
 1  POS distr.          0.653  0.670
 2  Disc. connectives   0.648  0.668
 3  Word length         0.667  0.676
 4  ISA                 0.675  0.678
 5  EGrid               0.650  0.668
 6  Pronoun             0.650  0.668
 7  Disc-new            0.646  0.662
 8  LoWBOW lex          0.663  0.677
 9  LoWBOW POS          0.659  0.674
10  IBM model lexf      0.649  0.668
11  IBM model lexb      0.649  0.667
12  IBM model POSf      0.661  0.672
13  IBM model POSb      0.658  0.669
14  Lemma cosine        0.651  0.667
15  POS cosine          0.650  0.665
16  5+6+7+10+11         0.648  0.665
17  All                 0.677  0.671

Table: 5-fold cross-validation performance on texts from year 2000 when adding different coherence features on top of the baseline AA system.

Examination year 2001 & outlier scripts

               r      ρ
Baseline     0.741  0.773
ISA          0.749  0.790*
Upper bound  0.796  0.792

Table: performance on the exam scripts drawn from the examination year 2001. * indicates a significant improvement at α = 0.05.

               r      ρ
Baseline     0.08   0.163
ISA          0.400  0.626

Table: performance of the ISA AA model on outliers.

Conclusions & Future Work
First systematic analysis of different models for assessing discourse coherence on learner data.
Significant improvement over Yannakoudakis et al. (2011).
ISA, LoWBOW, the PoS IBM model and word length are the best individual features.
Local histograms are useful, though this may be specific to ESOL FCE texts.
Future work: investigate a wider range of (learner) texts and further coherence models (e.g., Elsner and Charniak (2011a); Lin et al. (2011)).

Thank you! Acknowledgements: we are grateful to Cambridge ESOL for supporting this research. We would like to thank Marek Rei, Øistein Andersen, and the anonymous reviewers for their valuable comments and suggestions.