Judging grammaticality, detecting preposition errors and generating language test items using a mixed-bag of NLP techniques
Jennifer Foster, National Centre for Language Technology, Dublin City University
Natural Language Processing and Language Learning Workshop, Nancy, 18th June 2010
Joint work with: Joachim Wagner, Josef van Genabith, Monica Ward (DCU); Øistein Andersen (Cambridge); Joel Tetreault (ETS); Martin Chodorow (Hunter, CUNY); Montse Maritxalar (University of the Basque Country); Elaine Ui Dhonnchadha (Trinity College, Dublin)
Outline
1. Classifying Sentences as Grammatical or Ungrammatical
2. GenERRate
3. Preposition Error Detection
4. Test Generation
Classifying a Sentence as Grammatical or Ungrammatical
Applications:
- Automatic essay grading
- First step before targeted error detection/correction
- (Limited) feedback to advanced learners
- Mining bilingual sentences from the web
- Evaluating MT output
- Robust parsing
Approaches to Grammaticality Judging
Three methods:
1. POS-n-gram-based classifier
2. Precision-grammar-based classifier
3. Statistical-parsing-based classifier
N-gram-based Classifier
- Classifies a sentence as ungrammatical if it contains an unusual part-of-speech sequence.
- Features: the frequencies of the least frequent bigram, trigram, 4-gram, 5-gram, 6-gram and 7-gram in the sentence.
- Frequencies are obtained from a reference corpus of grammatical sentences.
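A minimal sketch of this feature computation, assuming sentences are given as lists of POS-tag strings (function names here are illustrative, not the system's own):

```python
from collections import Counter

def ngrams(tags, n):
    """All contiguous n-grams of a POS tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def build_reference_counts(tagged_corpus, max_n=7):
    """Count POS n-grams (n = 2..max_n) over a corpus of grammatical sentences."""
    counts = {n: Counter() for n in range(2, max_n + 1)}
    for tags in tagged_corpus:
        for n in counts:
            counts[n].update(ngrams(tags, n))
    return counts

def rarest_ngram_features(tags, counts):
    """For each n, the reference-corpus frequency of the sentence's least
    frequent n-gram (0 if any n-gram is unseen)."""
    return {f"min_{n}gram": min((counts[n][g] for g in ngrams(tags, n)), default=0)
            for n in counts}

reference = [["DT", "NN", "VBZ", "JJ"], ["DT", "NNS", "VBP", "RB", "JJ"]]
counts = build_reference_counts(reference)
print(rarest_ngram_features(["DT", "NN", "VBP", "JJ"], counts))
```

A sentence containing a tag sequence never seen in the reference corpus gets a minimum frequency of 0, which is the signal the classifier exploits.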
Precision-grammar-based Classifier
- Classifies a sentence using a parser and a broad-coverage hand-written grammar: the ParGram English LFG and the XLE engine.
- Features:
  1. whether or not the sentence can be parsed without resorting to special robustness mechanisms
  2. parsing time
  3. number of parses found
  4. number of optimal/unoptimal constraints used during parsing
  5. number of words
Statistical-parsing-based Classifier
- Charniak and Johnson reranking parser, trained three ways:
  1. on grammatical Wall Street Journal sentences
  2. on ungrammatical versions of the Wall Street Journal sentences
  3. on a mixture of the above
- Classifier features:
  1. parse probabilities assigned by the three grammars
  2. structural differences between the three trees for a sentence
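The slides do not specify how the structural difference between two trees is measured; one plausible instantiation is one minus the labeled-bracket F1 between the trees, sketched below (tree encoding and function names are assumptions for illustration):

```python
def bracket_spans(tree):
    """Labeled (label, start, end) spans of a constituency tree, where a
    tree is a nested tuple (label, child, ...) with string leaves."""
    spans = []
    def walk(node, start):
        if isinstance(node, str):        # a leaf token spans one position
            return 1
        width = 0
        for child in node[1:]:
            width += walk(child, start + width)
        spans.append((node[0], start, start + width))
        return width
    walk(tree, 0)
    return set(spans)

def tree_difference(t1, t2):
    """1 - F1 over labeled brackets: 0.0 for identical trees."""
    s1, s2 = bracket_spans(t1), bracket_spans(t2)
    return 1.0 - 2 * len(s1 & s2) / (len(s1) + len(s2))

t1 = ("S", ("NP", "the", "cats"), ("VP", "sleep"))
t2 = ("S", ("NP", "the"), ("VP", "cats", "sleep"))
print(tree_difference(t1, t1))   # 0.0
print(tree_difference(t1, t2))
```

A sentence for which the "grammatical" and "ungrammatical" grammars produce very different trees is itself informative, independently of the parse probabilities.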
Classifier Experiments
- Classifiers trained on sentences from the British National Corpus: 50% original grammatical sentences, 50% artificially created ungrammatical sentences
- 10-fold cross-validation
- J48 decision tree machine learning algorithm
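The setup can be sketched with scikit-learn. J48 is Weka's C4.5 decision tree; `DecisionTreeClassifier` serves here as a stand-in, and the random feature vectors are placeholders for the real n-gram, XLE and parser-derived features:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Placeholder features and labels: 50% "grammatical" (0), 50% "ungrammatical" (1).
X = rng.normal(size=(200, 6))
y = np.array([0, 1] * 100)
X[y == 1] += 1.0                      # shift one class so the toy task is learnable

scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0))
print(f"10-fold mean accuracy: {scores.mean():.2f}")
```

Stratified folds preserve the 50/50 class balance in each of the ten train/test splits.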
Artificial Data
Two types of evidence are used in error detection:
1. Positive: compare to some model of normal language.
2. Negative: compare to some model of deviant language.
Combining both types of evidence is likely to be useful.
Why Artificial Data?
- Positive data is easy enough to obtain.
- Negative data is less straightforward: you must find the right type of text and annotate the errors.
- A possible solution: create negative data automatically.
Why Artificial Data?
- Cheap to create (though knowledge of the errors is still necessary)
- Errors appear in varied contexts, which is useful for training
- The number and type of errors can be controlled
Artificial Data: Background
- Targeted error detection: Sjöbergh & Knutsson (2005); Brockett et al. (2006); Lee & Seneff (2008); Rozovskaya & Roth (2010)
- Grammaticality rating: Wagner et al. (2007); Okanohara & Tsujii (2007)
- Robustness evaluation: Bigert et al. (2005); Foster (2007)
- Unsupervised learning: contrastive estimation (Smith & Eisner, 2005)
Artificial Data
Authentic error corpus (Foster, 2005):
- 923 ungrammatical sentences
- Ungrammatical sentences encountered when reading were noted and corrected
- Errors annotated in terms of correction operations
- Sources: newspapers, e-mails, academic papers
- Produced by both native and non-native English speakers
Artificial Data
Properties of the error corpus (Foster, 2005):
- Most ungrammatical sentences contain only one error
- Substitution (48%) > insertion (24%) > deletion (17%) > combination (11%)
- Most common substitution errors: real-word spelling errors, agreement errors, wrong verb form
Artificial Data
The five most common error types are replicated:
- Agreement: "She steered Melissa around a corners."
- Real-word spelling: "She could no comprehend."
- Extra word: "Was that in the summer in?"
- Missing word: "What the subject?"
- Verb form: "I didn't wanted to delete it."
The error-creation procedure expects a part-of-speech-tagged corpus as input.
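A sketch of how two of these error types could be introduced automatically, assuming a POS-tagged input sentence; the naive add-/strip-"s" morphology is an illustrative simplification, not the actual procedure:

```python
import random

def agreement_error(tagged, rng):
    """Toggle number on one noun (NN <-> NNS) with naive +s/-s morphology."""
    nouns = [i for i, (_, t) in enumerate(tagged) if t in ("NN", "NNS")]
    if not nouns:
        return None
    i = rng.choice(nouns)
    word, tag = tagged[i]
    words = [w for w, _ in tagged]
    words[i] = word[:-1] if tag == "NNS" else word + "s"
    return " ".join(words)

def missing_word_error(tagged, rng):
    """Delete one random token."""
    words = [w for w, _ in tagged]
    del words[rng.randrange(len(words))]
    return " ".join(words)

rng = random.Random(0)
sent = [("She", "PRP"), ("steered", "VBD"), ("Melissa", "NNP"),
        ("around", "IN"), ("a", "DT"), ("corner", "NN")]
print(agreement_error(sent, rng))   # "She steered Melissa around a corners"
```

Because the corruption is applied to a known-grammatical sentence, each output comes error-tagged for free.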
Evaluation Metrics
Accuracy on ungrammatical data:
    acc_ungram = (# correctly classified as ungrammatical) / (# ungrammatical sentences)
Accuracy on grammatical data:
    acc_gram = (# correctly classified as grammatical) / (# grammatical sentences)
Classifier Results
[Figure: accuracy graph comparing the classifiers, with regions of improvement, regions of degradation, and undecided regions marked.]
Comparing classifier results on artificial data
Accuracy Tradeoff with a Voting Scheme
- Train multiple classifiers on overlapping subsets of the data
- Each classifier votes on whether the sentence is grammatical
- Parameter: the number of votes required for the final decision
- Plot accuracy for all possible parameter values
[Figure: accuracy on grammatical data vs. accuracy on ungrammatical data for the all3, prob, comb, ngram and xle classifiers.]
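The tradeoff can be sketched directly: each of k classifiers votes "ungrammatical" or not, and a sentence is flagged when at least t votes agree. Raising t increases accuracy on grammatical data at the cost of accuracy on ungrammatical data (the toy votes below are illustrative, not real classifier output):

```python
def flag(votes, t):
    """Flag a sentence as ungrammatical when at least t classifiers vote so."""
    return sum(votes) >= t

def accuracies(samples, t):
    """samples: (votes, is_ungrammatical) pairs -> (acc_ungram, acc_gram)."""
    u = [flag(v, t) for v, y in samples if y]
    g = [not flag(v, t) for v, y in samples if not y]
    return sum(u) / len(u), sum(g) / len(g)

samples = [([1, 1, 0], True), ([1, 1, 1], True), ([1, 0, 0], True),
           ([0, 1, 0], False), ([0, 0, 0], False), ([1, 1, 0], False)]
for t in (1, 2, 3):
    print(t, accuracies(samples, t))
```

Sweeping t from 1 to k traces out the accuracy curve plotted on the slide.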
Testing on Data from Learner Corpora
What happens when we apply our best classifier to sentences from learner corpora?
- International Corpus of Learner English (Granger, 1993)
- Transcribed spoken utterances (learners of English of various levels and L1s)
- Microsoft mass noun corpus (Brockett et al., 2006)
Performance drops!
[Figure: accuracy on the grammatical and ungrammatical parts of the Foster, Spoken, Essays and Mass Noun test sets.]
GenERRate
GenERRate
A tool for introducing grammatical errors into text.
Available from
GenERRate: Supported Error Types
- Deletion: DeletionPOS, DeletionPOSWhere
- Insertion: InsertionFromFileOrSentence, InsertionPOS, InsertionPOSWhere
- Move: MovePOS, MovePOSWhere
- Subst: SubPos, Form, NewPOS, Specific
GenERRate
Input:
1. A corpus of well-formed language
2. An error analysis file
Output: an error-tagged corpus
GenERRate
Error analysis file:
    subst,word,an,a
    subst,nns,nn
    subst,vbg,to
    delete,dt
    move,rb,left,1
Input corpus (POS-tagged):
    The/DT cats/NNS are/VBP also/RB sitting/VBG on/IN the/DT mat/NN ./.
Output corpus:
    The cat are also sitting on the mat.
    The cats are also to sit on the mat.
    The cats are also sitting on mat.
    The cats also are sitting on the mat.
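Two of the rule types can be sketched as follows; this is an illustrative re-implementation of the rule semantics, not GenERRate's actual code:

```python
def apply_rule(rule, tagged):
    """Apply one GenERRate-style rule to a POS-tagged sentence.
    tagged: list of (word, POS) pairs; returns the corrupted sentence or None."""
    parts = rule.split(",")
    words = [w for w, _ in tagged]
    match = [i for i, (_, t) in enumerate(tagged) if t.lower() == parts[1]]
    if not match:
        return None
    i = match[0]
    if parts[0] == "delete":                       # delete,POS
        del words[i]
    elif parts[0] == "move":                       # move,POS,left|right,N
        j = i - int(parts[3]) if parts[2] == "left" else i + int(parts[3])
        words.insert(j, words.pop(i))
    else:
        return None
    return " ".join(words)

sent = [("The", "DT"), ("cats", "NNS"), ("are", "VBP"), ("also", "RB"),
        ("sitting", "VBG"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(apply_rule("delete,dt", sent))        # "cats are also sitting on the mat"
print(apply_rule("move,rb,left,1", sent))   # "The cats also are sitting on the mat"
```

This sketch corrupts the first matching token; the tool itself controls which occurrence is targeted and records the error type alongside each output sentence.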
GenERRate: Spoken Learner Corpus Experiment
- The existing classifier uses artificial training data.
- Can we improve the classifier by using more realistic training data?
The existing classifier (Wagner et al., 2007):
- n-gram frequency counts
- Training data: BNC sentences and distorted versions of the BNC sentences
- Test data: sentences from a spoken language learner corpus
The Spoken Learner Corpus:
- 4,295 utterances produced by ESL learners in a classroom setting
- Various levels and L1s
- Transcribed by the teacher
- Approx. 500 of the utterances have been corrected
The new classifier:
1. Take out 200 sentences from the test data
2. Perform a manual error analysis
3. Produce a GenERRate error analysis file
4. Use GenERRate to generate new ungrammatical training data
Results
- Old classifier: 37.0% of the ungrammatical sentences are flagged, and 95.5% of the flagged sentences are ungrammatical.
- New classifier: 51.6% of the ungrammatical sentences are flagged, and 94.9% of the flagged sentences are ungrammatical.
In other words, recall rises from 37.0% to 51.6% at almost no cost in precision.
Next Version of GenERRate
- Integration with WordNet
- Spelling errors
- Different ways of specifying contextual information, e.g. parsed input
- Morphological errors
Preposition Error Detection
Targeted Error Detection
- Specific errors: articles, prepositions
- Preposition error detection system: Chodorow et al. (2007); Tetreault and Chodorow (2008)
Selection and Error Detection: Two Tasks
1. Preposition selection in well-formed text: "There are many local groups ___ the country."
2. Preposition error detection in learner text: "I had a trip for Italy."
Both tasks are trained on well-formed text.
Baseline Features
Example: "There are many local groups ___ the country."
- Contextual features (token and POS)
- Preceding noun (PN) (token and POS)
- Preceding verb (PV) (token and POS)
- Following noun (FN) (token and POS)
- PN-PV-FN combinations
- PN, PV and FN are determined using POS-based heuristics
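The POS-based heuristics can be sketched as nearest-match searches over the tag sequence; the tag sets and the nearest-match rule below are plausible simplifications, not the system's exact heuristics:

```python
# Penn Treebank tag sets used by the nearest-match heuristic.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def baseline_features(tagged, i):
    """tagged: list of (word, POS); i: index of the preposition slot.
    Returns the PN, PV and FN features (token and POS)."""
    def nearest(indices, tags):
        for j in indices:
            if tagged[j][1] in tags:
                return tagged[j]
        return ("<none>", "<none>")
    left = range(i - 1, -1, -1)          # scan leftward from the slot
    right = range(i + 1, len(tagged))    # scan rightward from the slot
    pn = nearest(left, NOUN_TAGS)
    pv = nearest(left, VERB_TAGS)
    fn = nearest(right, NOUN_TAGS)
    return {"PN": pn, "PV": pv, "FN": fn, "PN-PV-FN": (pn[0], pv[0], fn[0])}

sent = [("There", "EX"), ("are", "VBP"), ("many", "JJ"), ("local", "JJ"),
        ("groups", "NNS"), ("around", "IN"), ("the", "DT"), ("country", "NN")]
print(baseline_features(sent, 5))
```

For the example sentence this yields PN = groups, PV = are, FN = country, matching the slots in the slide's example.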
134 Preposition Error Detection Research Questions Can we get more informative features by carrying out full syntactic parsing? And how will the parser behave on ESL data? Stanford Parser Phrase Structure Parser (Klein and Manning, 2003) Typed Dependency Generator (de Marneffe et al., 2006)
139 Preposition Error Detection Parse Features (NP (NP (DT many) (JJ local) (NNS groups)) (PP (IN around) (NP (DT the) (NN country)))) amod(groups-3, many-1) amod(groups-3, local-2) prep(groups-3, around-4) det(country-6, the-5) pobj(around-4, country-6)
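The typed dependencies on this slide can be turned into parse features mechanically. A minimal sketch (my own illustration, assuming the `rel(gov-i, dep-j)` string format shown above, not the system's code) that recovers a (head, preposition, object) triple from the prep/pobj relations:

```python
import re

# Parse lines like "prep(groups-3, around-4)" into (relation, governor, dependent)
DEP = re.compile(r"(\w+)\(([^-]+)-(\d+), ([^-]+)-(\d+)\)")

def prep_attachment(dep_lines):
    """Return {prep_index: (head, preposition, object)} from typed dependencies."""
    preps, pobjs = {}, {}
    for line in dep_lines:
        rel, gov, gi, dep, di = DEP.match(line).groups()
        if rel == "prep":          # the head that governs the preposition
            preps[int(di)] = (gov, dep)
        elif rel == "pobj":        # the preposition governs its object
            pobjs[int(gi)] = dep
    return {i: (head, prep, pobjs.get(i)) for i, (head, prep) in preps.items()}

deps = ["amod(groups-3, many-1)", "amod(groups-3, local-2)",
        "prep(groups-3, around-4)", "det(country-6, the-5)",
        "pobj(around-4, country-6)"]
print(prep_attachment(deps))   # {4: ('groups', 'around', 'country')}
```

The resulting triple (groups, around, country) is exactly the head-preposition-object information that the shallow PN/PV/FN heuristics only approximate.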
152 Preposition Error Detection Preposition Selection Results Parsing Helps Model Accuracy T&C Phrase Structure Only Dependency Only Parse 68.5
154 Preposition Error Detection Preposition Error Detection Results Parsing Helps Somewhat! Method Precision Recall T&C Parse
155 Preposition Error Detection Parser Accuracy on ESL Data Manual Inspection of 210 Parse Trees Example Parser finds it easier to determine the complement Parser is quite robust to preposition errors (S (NP A scientist) (VP devotes (NP (NP his prime part) (PP of (NP his life))) (PP in (NP research))))
159 Test Generation Outline 1 2 GenERRate 3 Preposition Error Detection 4 Test Generation
160 Test Generation Automatic Test Generation H₂O is a chemical compound consisting of ___ and oxygen. Helium (distractor) Potassium (distractor) Hydrogen (key) Carbon (distractor)
163 Test Generation Automatic Test Generation Developed at the University of the Basque Country Generate distractors automatically using semantic similarity measures New Project Apply this to Irish
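Distractor selection by semantic similarity can be sketched as follows. This is a toy illustration with invented 2-d stand-in embeddings (the actual system's similarity measures and vectors are not shown here): rank a candidate pool by cosine similarity to the key and take the nearest neighbours as distractors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def distractors(key, vectors, n=3):
    """Return the n candidates most similar to the key (excluding the key)."""
    ranked = sorted((w for w in vectors if w != key),
                    key=lambda w: cosine(vectors[key], vectors[w]),
                    reverse=True)
    return ranked[:n]

# Invented 2-d embeddings: the chemical elements cluster together,
# the unrelated word lies far away and is never chosen as a distractor.
vectors = {
    "hydrogen":  (0.9, 0.1), "helium": (0.85, 0.2),
    "potassium": (0.8, 0.3), "carbon": (0.75, 0.25),
    "bicycle":   (0.1, 0.9),
}
print(distractors("hydrogen", vectors))
```

The point of the similarity ranking is that distractors should be plausible: semantically close to the key (other elements), not arbitrary vocabulary.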
165 Test Generation Automatic Test Generation Scientific knowledge through Irish Irish language Literature Grammar Ceist Agam Ort The End
170 Test Generation The Problem of Covert Errors When to avoid them The cats are worth seeing. The cats are worth to see. The cats are also sitting on the mat. The cats are also to sit on the mat.
175 Test Generation The Problem of Covert Errors When not to avoid them What time did you go to bed in high school? I went to bed at one. What time did you go to bed in high school? I go to bed at one. When I was a high school student I went to bed at one in the morning When I was a high school student I go to bed at one in the morning
180 Test Generation Classifier Results WSJ Test Data
181 Test Generation Spoken Learner Corpus Experiment Training data examples ORIGINAL: Biogas production is growing rapidly OLD: Biogas production production is growing rapidly NEW: Biogas productions is growing rapidly
184 Test Generation Spoken Learner Corpus Experiment Training data examples ORIGINAL: Emil was courteous and helpful OLD: Emil as courteous and helpful NEW: Emil courteous and was helpful
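Error-insertion operations like the NEW examples above can be sketched as simple token-level edits. This is my own illustration of two such operations, not GenERRate itself: a word-move error ("Emil courteous and was helpful") and a crude agreement error that pluralises a noun ("Biogas productions is growing rapidly").

```python
def move_word(tokens, src, dst):
    """Move the token at src to position dst, producing a word-order error."""
    out = tokens[:]
    w = out.pop(src)
    out.insert(dst, w)
    return out

def pluralise(tokens, i):
    """Crudely pluralise token i, producing a number-agreement error."""
    out = tokens[:]
    if not out[i].endswith("s"):
        out[i] = out[i] + "s"
    return out

orig = "Emil was courteous and helpful".split()
print(" ".join(move_word(orig, 1, 3)))   # Emil courteous and was helpful
print(" ".join(pluralise("Biogas production is growing rapidly".split(), 1)))
```

Such targeted edits give more realistic negative training data than the OLD-style corruptions (e.g. blindly duplicating or truncating a word), which rarely resemble genuine learner errors.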
More informationTagging with Hidden Markov Models
Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Part-of-speech (POS) tagging is perhaps the earliest, and most famous,
More informationMachine Translation. Why Evaluation? Evaluation. Ten Translations of a Chinese Sentence. Evaluation Metrics. But MT evaluation is a di cult problem!
Why Evaluation? How good is a given system? Which one is the best system for our purpose? How much did we improve our system? How can we tune our system to become better? But MT evaluation is a di cult
More informationProcessing: current projects and research at the IXA Group
Natural Language Processing: current projects and research at the IXA Group IXA Research Group on NLP University of the Basque Country Xabier Artola Zubillaga Motivation A language that seeks to survive
More informationParsing Software Requirements with an Ontology-based Semantic Role Labeler
Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh mroth@inf.ed.ac.uk Ewan Klein University of Edinburgh ewan@inf.ed.ac.uk Abstract Software
More informationNatural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
More informationUsing classes has the potential of reducing the problem of sparseness of data by allowing generalizations
POS Tags and Decision Trees for Language Modeling Peter A. Heeman Department of Computer Science and Engineering Oregon Graduate Institute PO Box 91000, Portland OR 97291 heeman@cse.ogi.edu Abstract Language
More informationAutomatic Text Analysis Using Drupal
Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing
More informationThe Oxford Learner s Dictionary of Academic English
ISEJ Advertorial The Oxford Learner s Dictionary of Academic English Oxford University Press The Oxford Learner s Dictionary of Academic English (OLDAE) is a brand new learner s dictionary aimed at students
More informationComputer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia
Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Outline I What is CALL? (scott) II Popular language learning sites (stella) Livemocha.com (stacia) III IV Specific sites
More informationNATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati
More informationNatural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
More informationBILINGUAL TRANSLATION SYSTEM
BILINGUAL TRANSLATION SYSTEM (FOR ENGLISH AND TAMIL) Dr. S. Saraswathi Associate Professor M. Anusiya P. Kanivadhana S. Sathiya Abstract--- The project aims in developing Bilingual Translation System for
More informationAutomatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
More informationComma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University
Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University 1. Introduction This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect
More informationPhase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
More informationSyntactic Theory on Swedish
Syntactic Theory on Swedish Mats Uddenfeldt Pernilla Näsfors June 13, 2003 Report for Introductory course in NLP Department of Linguistics Uppsala University Sweden Abstract Using the grammar presented
More informationThe new portfolio will not be assessed by examiners but will be used as a tool for students to develop their writing skills at each level.
A Teachers guide to the Trinity portfolio toolkit What is a portfolio? It s a file or folder that contains a collection of your students work. Each portfolio should include at least one example of each
More informationReliable and Cost-Effective PoS-Tagging
Reliable and Cost-Effective PoS-Tagging Yu-Fang Tsai Keh-Jiann Chen Institute of Information Science, Academia Sinica Nanang, Taipei, Taiwan 5 eddie,chen@iis.sinica.edu.tw Abstract In order to achieve
More informationOutline of today s lecture
Outline of today s lecture Generative grammar Simple context free grammars Probabilistic CFGs Formalism power requirements Parsing Modelling syntactic structure of phrases and sentences. Why is it useful?
More informationSearch and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
More informationNEW SOFTWARE TO HELP EFL STUDENTS SELF-CORRECT THEIR WRITING
Language Learning & Technology http://llt.msu.edu/issues/february2015/action1.pdf February 2015, Volume 19, Number 1 pp. 23 33 NEW SOFTWARE TO HELP EFL STUDENTS SELF-CORRECT THEIR WRITING Jim Lawley, Universidad
More informationWriting learning objectives
Writing learning objectives This material was excerpted and adapted from the following web site: http://www.utexas.edu/academic/diia/assessment/iar/students/plan/objectives/ What is a learning objective?
More informationAlignment of the National Standards for Learning Languages with the Common Core State Standards
Alignment of the National with the Common Core State Standards Performance Expectations The Common Core State Standards for English Language Arts (ELA) and Literacy in History/Social Studies, Science,
More information