Judging grammaticality, detecting preposition errors and generating language test items using a mixed-bag of NLP techniques
Jennifer Foster, National Centre for Language Technology, Dublin City University
Natural Language Processing and Language Learning Workshop, Nancy, 18th June 2010
Joint work with: Joachim Wagner, Josef van Genabith, Monica Ward (DCU); Øistein Andersen (Cambridge); Joel Tetreault (ETS); Martin Chodorow (Hunter, CUNY); Montse Maritxalar (University of the Basque Country); Elaine Ui Dhonnchadha (Trinity College, Dublin)
Outline
1. Classifying Sentences as Grammatical or Ungrammatical
2. GenERRate
3. Preposition Error Detection
4. Test Generation
Classifying a Sentence as Grammatical or Ungrammatical
Applications:
- Automatic essay grading
- First step before targeted error detection/correction
- (Limited) feedback to advanced learners
- Mining bilingual sentences from the web
- Evaluating MT output
- Robust parsing
Approaches to Grammaticality Judging
Three methods:
1. POS-n-gram-based classifier
2. Precision-grammar-based classifier
3. Statistical-parsing-based classifier
N-gram-based Classifier
- Classifies a sentence as ungrammatical if it contains an unusual part-of-speech sequence.
- Features: the frequencies of the least frequent bigram, trigram, 4-gram, 5-gram, 6-gram and 7-gram in the sentence.
- Frequencies are obtained from a reference corpus of grammatical sentences.
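A minimal sketch of this feature computation, assuming sentences are given as lists of POS-tag strings (function names here are illustrative, not the system's own):

```python
from collections import Counter

def ngrams(tags, n):
    """All contiguous n-grams of a POS tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def build_reference_counts(tagged_corpus, max_n=7):
    """Count POS n-grams (n = 2..max_n) over a corpus of grammatical sentences."""
    counts = {n: Counter() for n in range(2, max_n + 1)}
    for tags in tagged_corpus:
        for n in counts:
            counts[n].update(ngrams(tags, n))
    return counts

def rarest_ngram_features(tags, counts):
    """For each n, the reference-corpus frequency of the sentence's least
    frequent n-gram (0 if any n-gram is unseen)."""
    return {f"min_{n}gram": min((counts[n][g] for g in ngrams(tags, n)), default=0)
            for n in counts}

reference = [["DT", "NN", "VBZ", "JJ"], ["DT", "NNS", "VBP", "RB", "JJ"]]
counts = build_reference_counts(reference)
print(rarest_ngram_features(["DT", "NN", "VBP", "JJ"], counts))
```

A sentence containing a tag sequence never seen in the reference corpus gets a minimum frequency of 0, which is the signal the classifier exploits.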
Precision-grammar-based Classifier
- Classifies a sentence using a parser and a broad-coverage hand-written grammar: the ParGram English LFG and the XLE engine.
- Features:
  1. whether or not the sentence can be parsed without resorting to special robustness mechanisms
  2. parsing time
  3. number of parses found
  4. number of optimal/unoptimal constraints used during parsing
  5. number of words
Statistical-parsing-based Classifier
- Charniak and Johnson reranking parser, trained three ways:
  1. on grammatical Wall Street Journal sentences
  2. on ungrammatical versions of the Wall Street Journal sentences
  3. on a mixture of the above
- Classifier features:
  1. parse probabilities assigned by the three grammars
  2. structural differences between the three trees for a sentence
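The slides do not specify how the structural difference between two trees is measured; one plausible instantiation is one minus the labeled-bracket F1 between the trees, sketched below (tree encoding and function names are assumptions for illustration):

```python
def bracket_spans(tree):
    """Labeled (label, start, end) spans of a constituency tree, where a
    tree is a nested tuple (label, child, ...) with string leaves."""
    spans = []
    def walk(node, start):
        if isinstance(node, str):        # a leaf token spans one position
            return 1
        width = 0
        for child in node[1:]:
            width += walk(child, start + width)
        spans.append((node[0], start, start + width))
        return width
    walk(tree, 0)
    return set(spans)

def tree_difference(t1, t2):
    """1 - F1 over labeled brackets: 0.0 for identical trees."""
    s1, s2 = bracket_spans(t1), bracket_spans(t2)
    return 1.0 - 2 * len(s1 & s2) / (len(s1) + len(s2))

t1 = ("S", ("NP", "the", "cats"), ("VP", "sleep"))
t2 = ("S", ("NP", "the"), ("VP", "cats", "sleep"))
print(tree_difference(t1, t1))   # 0.0
print(tree_difference(t1, t2))
```

A sentence for which the "grammatical" and "ungrammatical" grammars produce very different trees is itself informative, independently of the parse probabilities.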
Classifier Experiments
- Classifiers trained on sentences from the British National Corpus: 50% original grammatical sentences, 50% artificially created ungrammatical sentences
- 10-fold cross-validation
- J48 decision tree machine learning algorithm
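The setup can be sketched with scikit-learn. J48 is Weka's C4.5 decision tree; `DecisionTreeClassifier` serves here as a stand-in, and the random feature vectors are placeholders for the real n-gram, XLE and parser-derived features:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Placeholder features and labels: 50% "grammatical" (0), 50% "ungrammatical" (1).
X = rng.normal(size=(200, 6))
y = np.array([0, 1] * 100)
X[y == 1] += 1.0                      # shift one class so the toy task is learnable

scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0))
print(f"10-fold mean accuracy: {scores.mean():.2f}")
```

Stratified folds preserve the 50/50 class balance in each of the ten train/test splits.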
Artificial Data
Two types of evidence are used in error detection:
1. Positive: compare to some model of normal language.
2. Negative: compare to some model of deviant language.
Combining both types of evidence is likely to be useful.
Why Artificial Data?
- Positive data is easy enough to obtain.
- Negative data is less straightforward: you must find the right type of text and annotate the errors.
- A possible solution: create negative data automatically.
Why Artificial Data?
- Cheap to create (though knowledge of the errors is still necessary)
- Errors appear in varied contexts, which is useful for training
- The number and type of errors can be controlled
Artificial Data: Background
- Targeted error detection: Sjöbergh & Knutsson (2005); Brockett et al. (2006); Lee & Seneff (2008); Rozovskaya & Roth (2010)
- Grammaticality rating: Wagner et al. (2007); Okanohara & Tsujii (2007)
- Robustness evaluation: Bigert et al. (2005); Foster (2007)
- Unsupervised learning: contrastive estimation (Smith & Eisner, 2005)
Artificial Data
Authentic error corpus (Foster, 2005):
- 923 ungrammatical sentences
- Ungrammatical sentences encountered when reading were noted and corrected
- Errors annotated in terms of correction operations
- Sources: newspapers, e-mails, academic papers
- Produced by both native and non-native English speakers
Artificial Data
Properties of the error corpus (Foster, 2005):
- Most ungrammatical sentences contain only one error
- Substitution (48%) > insertion (24%) > deletion (17%) > combination (11%)
- Most common substitution errors: real-word spelling errors, agreement errors, wrong verb form
Artificial Data
The five most common error types are replicated:
- Agreement: "She steered Melissa around a corners."
- Real-word spelling: "She could no comprehend."
- Extra word: "Was that in the summer in?"
- Missing word: "What the subject?"
- Verb form: "I didn't wanted to delete it."
The error-creation procedure expects a part-of-speech-tagged corpus as input.
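A sketch of how two of these error types could be introduced automatically, assuming a POS-tagged input sentence; the naive add-/strip-"s" morphology is an illustrative simplification, not the actual procedure:

```python
import random

def agreement_error(tagged, rng):
    """Toggle number on one noun (NN <-> NNS) with naive +s/-s morphology."""
    nouns = [i for i, (_, t) in enumerate(tagged) if t in ("NN", "NNS")]
    if not nouns:
        return None
    i = rng.choice(nouns)
    word, tag = tagged[i]
    words = [w for w, _ in tagged]
    words[i] = word[:-1] if tag == "NNS" else word + "s"
    return " ".join(words)

def missing_word_error(tagged, rng):
    """Delete one random token."""
    words = [w for w, _ in tagged]
    del words[rng.randrange(len(words))]
    return " ".join(words)

rng = random.Random(0)
sent = [("She", "PRP"), ("steered", "VBD"), ("Melissa", "NNP"),
        ("around", "IN"), ("a", "DT"), ("corner", "NN")]
print(agreement_error(sent, rng))   # "She steered Melissa around a corners"
```

Because the corruption is applied to a known-grammatical sentence, each output comes error-tagged for free.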
Evaluation Metrics
Accuracy on ungrammatical data:
    acc_ungram = (# correctly classified as ungrammatical) / (# ungrammatical sentences)
Accuracy on grammatical data:
    acc_gram = (# correctly classified as grammatical) / (# grammatical sentences)
Classifier Results
[Figure: accuracy graph comparing the classifiers, with regions of improvement, regions of degradation, and undecided regions marked.]
Comparing classifier results on artificial data
Accuracy Tradeoff with a Voting Scheme
- Train multiple classifiers on overlapping subsets of the data
- Each classifier votes on whether the sentence is grammatical
- Parameter: the number of votes required for the final decision
- Plot accuracy for all possible parameter values
[Figure: accuracy on grammatical data vs. accuracy on ungrammatical data for the all3, prob, comb, ngram and xle classifiers.]
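The tradeoff can be sketched directly: each of k classifiers votes "ungrammatical" or not, and a sentence is flagged when at least t votes agree. Raising t increases accuracy on grammatical data at the cost of accuracy on ungrammatical data (the toy votes below are illustrative, not real classifier output):

```python
def flag(votes, t):
    """Flag a sentence as ungrammatical when at least t classifiers vote so."""
    return sum(votes) >= t

def accuracies(samples, t):
    """samples: (votes, is_ungrammatical) pairs -> (acc_ungram, acc_gram)."""
    u = [flag(v, t) for v, y in samples if y]
    g = [not flag(v, t) for v, y in samples if not y]
    return sum(u) / len(u), sum(g) / len(g)

samples = [([1, 1, 0], True), ([1, 1, 1], True), ([1, 0, 0], True),
           ([0, 1, 0], False), ([0, 0, 0], False), ([1, 1, 0], False)]
for t in (1, 2, 3):
    print(t, accuracies(samples, t))
```

Sweeping t from 1 to k traces out the accuracy curve plotted on the slide.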
Testing on Data from Learner Corpora
What happens when we apply our best classifier to sentences from learner corpora?
- International Corpus of Learner English (Granger, 1993)
- Transcribed spoken utterances (learners of English of various levels and L1s)
- Microsoft mass noun corpus (Brockett et al., 2006)
Performance drops!
[Figure: accuracy on the grammatical and ungrammatical parts of the Foster, Spoken, Essays and Mass Noun test sets.]
GenERRate
GenERRate
A tool for introducing grammatical errors into text.
Available from
GenERRate: Supported Error Types
- Deletion: DeletionPOS, DeletionPOSWhere
- Insertion: InsertionFromFileOrSentence, InsertionPOS, InsertionPOSWhere
- Move: MovePOS, MovePOSWhere
- Subst: SubPos, Form, NewPOS, Specific
GenERRate
Input:
1. A corpus of well-formed language
2. An error analysis file
Output: an error-tagged corpus
GenERRate
Error analysis file:
    subst,word,an,a
    subst,nns,nn
    subst,vbg,to
    delete,dt
    move,rb,left,1
Input corpus (POS-tagged):
    The/DT cats/NNS are/VBP also/RB sitting/VBG on/IN the/DT mat/NN ./.
Output corpus:
    The cat are also sitting on the mat.
    The cats are also to sit on the mat.
    The cats are also sitting on mat.
    The cats also are sitting on the mat.
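Two of the rule types can be sketched as follows; this is an illustrative re-implementation of the rule semantics, not GenERRate's actual code:

```python
def apply_rule(rule, tagged):
    """Apply one GenERRate-style rule to a POS-tagged sentence.
    tagged: list of (word, POS) pairs; returns the corrupted sentence or None."""
    parts = rule.split(",")
    words = [w for w, _ in tagged]
    match = [i for i, (_, t) in enumerate(tagged) if t.lower() == parts[1]]
    if not match:
        return None
    i = match[0]
    if parts[0] == "delete":                       # delete,POS
        del words[i]
    elif parts[0] == "move":                       # move,POS,left|right,N
        j = i - int(parts[3]) if parts[2] == "left" else i + int(parts[3])
        words.insert(j, words.pop(i))
    else:
        return None
    return " ".join(words)

sent = [("The", "DT"), ("cats", "NNS"), ("are", "VBP"), ("also", "RB"),
        ("sitting", "VBG"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(apply_rule("delete,dt", sent))        # "cats are also sitting on the mat"
print(apply_rule("move,rb,left,1", sent))   # "The cats also are sitting on the mat"
```

This sketch corrupts the first matching token; the tool itself controls which occurrence is targeted and records the error type alongside each output sentence.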
GenERRate: Spoken Learner Corpus Experiment
- The existing classifier uses artificial training data.
- Can we improve the classifier by using more realistic training data?
The existing classifier (Wagner et al., 2007):
- n-gram frequency counts
- Training data: BNC sentences and distorted versions of the BNC sentences
- Test data: sentences from a spoken language learner corpus
The Spoken Learner Corpus:
- 4,295 utterances produced by ESL learners in a classroom setting
- Various levels and L1s
- Transcribed by the teacher
- Approx. 500 of the utterances have been corrected
The new classifier:
1. Take out 200 sentences from the test data
2. Perform a manual error analysis
3. Produce a GenERRate error analysis file
4. Use GenERRate to generate new ungrammatical training data
Results
- Old classifier: 37.0% of the ungrammatical sentences are flagged, and 95.5% of the flagged sentences are ungrammatical.
- New classifier: 51.6% of the ungrammatical sentences are flagged, and 94.9% of the flagged sentences are ungrammatical.
In other words, recall rises from 37.0% to 51.6% at almost no cost in precision.
Next Version of GenERRate
- Integration with WordNet
- Spelling errors
- Different ways of specifying contextual information, e.g. parsed input
- Morphological errors
Preposition Error Detection
Targeted Error Detection
- Specific errors: articles, prepositions
- Preposition error detection system: Chodorow et al. (2007); Tetreault and Chodorow (2008)
Selection and Error Detection: Two Tasks
1. Preposition selection in well-formed text: "There are many local groups ___ the country."
2. Preposition error detection in learner text: "I had a trip for Italy."
Both tasks are trained on well-formed text.
Baseline Features
Example: "There are many local groups ___ the country."
- Contextual features (token and POS)
- Preceding noun (PN) (token and POS)
- Preceding verb (PV) (token and POS)
- Following noun (FN) (token and POS)
- PN-PV-FN combinations
- PN, PV and FN are determined using POS-based heuristics
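The POS-based heuristics can be sketched as nearest-match searches over the tag sequence; the tag sets and the nearest-match rule below are plausible simplifications, not the system's exact heuristics:

```python
# Penn Treebank tag sets used by the nearest-match heuristic.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def baseline_features(tagged, i):
    """tagged: list of (word, POS); i: index of the preposition slot.
    Returns the PN, PV and FN features (token and POS)."""
    def nearest(indices, tags):
        for j in indices:
            if tagged[j][1] in tags:
                return tagged[j]
        return ("<none>", "<none>")
    left = range(i - 1, -1, -1)          # scan leftward from the slot
    right = range(i + 1, len(tagged))    # scan rightward from the slot
    pn = nearest(left, NOUN_TAGS)
    pv = nearest(left, VERB_TAGS)
    fn = nearest(right, NOUN_TAGS)
    return {"PN": pn, "PV": pv, "FN": fn, "PN-PV-FN": (pn[0], pv[0], fn[0])}

sent = [("There", "EX"), ("are", "VBP"), ("many", "JJ"), ("local", "JJ"),
        ("groups", "NNS"), ("around", "IN"), ("the", "DT"), ("country", "NN")]
print(baseline_features(sent, 5))
```

For the example sentence this yields PN = groups, PV = are, FN = country, matching the slots in the slide's example.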
134 Preposition Error Detection Research Questions Can we get more informative features by carrying out full syntactic parsing? And how will the parser behave on ESL data? Stanford Parser Phrase Structure Parser (Klein and Manning, 2003) Typed Dependency Generator (de Marneffe et al., 2006)
139 Preposition Error Detection Parse Features (NP (NP (DT many) (JJ local) (NNS groups)) (PP (IN around) (NP (DT the) (NN country)))) amod(groups-3, many-1) amod(groups-3, local-2) prep(groups-3, around-4) det(country-6, the-5) pobj(around-4, country-6)
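The typed dependencies on this slide can be turned into parse features mechanically. A minimal sketch (my own illustration, assuming the `rel(gov-i, dep-j)` string format shown above, not the system's code) that recovers a (head, preposition, object) triple from the prep/pobj relations:

```python
import re

# Parse lines like "prep(groups-3, around-4)" into (relation, governor, dependent)
DEP = re.compile(r"(\w+)\(([^-]+)-(\d+), ([^-]+)-(\d+)\)")

def prep_attachment(dep_lines):
    """Return {prep_index: (head, preposition, object)} from typed dependencies."""
    preps, pobjs = {}, {}
    for line in dep_lines:
        rel, gov, gi, dep, di = DEP.match(line).groups()
        if rel == "prep":          # the head that governs the preposition
            preps[int(di)] = (gov, dep)
        elif rel == "pobj":        # the preposition governs its object
            pobjs[int(gi)] = dep
    return {i: (head, prep, pobjs.get(i)) for i, (head, prep) in preps.items()}

deps = ["amod(groups-3, many-1)", "amod(groups-3, local-2)",
        "prep(groups-3, around-4)", "det(country-6, the-5)",
        "pobj(around-4, country-6)"]
print(prep_attachment(deps))   # {4: ('groups', 'around', 'country')}
```

The resulting triple (groups, around, country) is exactly the head-preposition-object information that the shallow PN/PV/FN heuristics only approximate.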
152 Preposition Error Detection Preposition Selection Results Parsing Helps Model Accuracy T&C Phrase Structure Only Dependency Only Parse 68.5
154 Preposition Error Detection Preposition Error Detection Results Parsing Helps Somewhat! Method Precision Recall T&C Parse
155 Preposition Error Detection Parser Accuracy on ESL Data Manual Inspection of 210 Parse Trees Example Parser finds it easier to determine the complement Parser is quite robust to preposition errors (S (NP A scientist) (VP devotes (NP (NP his prime part) (PP of (NP his life))) (PP in (NP research))))
159 Test Generation Outline 1 2 GenERRate 3 Preposition Error Detection 4 Test Generation
160 Test Generation Automatic Test Generation H₂O is a chemical compound consisting of ___ and oxygen. Helium (distractor) Potassium (distractor) Hydrogen (key) Carbon (distractor)
163 Test Generation Automatic Test Generation Developed at the University of the Basque Country Generate distractors automatically using semantic similarity measures New Project Apply this to Irish
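Distractor selection by semantic similarity can be sketched as follows. This is a toy illustration with invented 2-d stand-in embeddings (the actual system's similarity measures and vectors are not shown here): rank a candidate pool by cosine similarity to the key and take the nearest neighbours as distractors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def distractors(key, vectors, n=3):
    """Return the n candidates most similar to the key (excluding the key)."""
    ranked = sorted((w for w in vectors if w != key),
                    key=lambda w: cosine(vectors[key], vectors[w]),
                    reverse=True)
    return ranked[:n]

# Invented 2-d embeddings: the chemical elements cluster together,
# the unrelated word lies far away and is never chosen as a distractor.
vectors = {
    "hydrogen":  (0.9, 0.1), "helium": (0.85, 0.2),
    "potassium": (0.8, 0.3), "carbon": (0.75, 0.25),
    "bicycle":   (0.1, 0.9),
}
print(distractors("hydrogen", vectors))
```

The point of the similarity ranking is that distractors should be plausible: semantically close to the key (other elements), not arbitrary vocabulary.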
165 Test Generation Automatic Test Generation Scientific knowledge through Irish Irish language Literature Grammar Ceist Agam Ort The End
170 Test Generation The Problem of Covert Errors When to avoid them The cats are worth seeing. The cats are worth to see. The cats are also sitting on the mat. The cats are also to sit on the mat.
175 Test Generation The Problem of Covert Errors When not to avoid them What time did you go to bed in high school? I went to bed at one. What time did you go to bed in high school? I go to bed at one. When I was a high school student I went to bed at one in the morning When I was a high school student I go to bed at one in the morning
180 Test Generation Classifier Results WSJ Test Data
181 Test Generation Spoken Learner Corpus Experiment Training data examples ORIGINAL: Biogas production is growing rapidly OLD: Biogas production production is growing rapidly NEW: Biogas productions is growing rapidly
184 Test Generation Spoken Learner Corpus Experiment Training data examples ORIGINAL: Emil was courteous and helpful OLD: Emil as courteous and helpful NEW: Emil courteous and was helpful
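Error-insertion operations like the NEW examples above can be sketched as simple token-level edits. This is my own illustration of two such operations, not GenERRate itself: a word-move error ("Emil courteous and was helpful") and a crude agreement error that pluralises a noun ("Biogas productions is growing rapidly").

```python
def move_word(tokens, src, dst):
    """Move the token at src to position dst, producing a word-order error."""
    out = tokens[:]
    w = out.pop(src)
    out.insert(dst, w)
    return out

def pluralise(tokens, i):
    """Crudely pluralise token i, producing a number-agreement error."""
    out = tokens[:]
    if not out[i].endswith("s"):
        out[i] = out[i] + "s"
    return out

orig = "Emil was courteous and helpful".split()
print(" ".join(move_word(orig, 1, 3)))   # Emil courteous and was helpful
print(" ".join(pluralise("Biogas production is growing rapidly".split(), 1)))
```

Such targeted edits give more realistic negative training data than the OLD-style corruptions (e.g. blindly duplicating or truncating a word), which rarely resemble genuine learner errors.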
More informationTagging with Hidden Markov Models
Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Part-of-speech (POS) tagging is perhaps the earliest, and most famous,
More informationMachine Translation. Why Evaluation? Evaluation. Ten Translations of a Chinese Sentence. Evaluation Metrics. But MT evaluation is a di cult problem!
Why Evaluation? How good is a given system? Which one is the best system for our purpose? How much did we improve our system? How can we tune our system to become better? But MT evaluation is a di cult
More informationProcessing: current projects and research at the IXA Group
Natural Language Processing: current projects and research at the IXA Group IXA Research Group on NLP University of the Basque Country Xabier Artola Zubillaga Motivation A language that seeks to survive
More informationParsing Software Requirements with an Ontology-based Semantic Role Labeler
Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh mroth@inf.ed.ac.uk Ewan Klein University of Edinburgh ewan@inf.ed.ac.uk Abstract Software
More informationNatural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
More informationUsing classes has the potential of reducing the problem of sparseness of data by allowing generalizations
POS Tags and Decision Trees for Language Modeling Peter A. Heeman Department of Computer Science and Engineering Oregon Graduate Institute PO Box 91000, Portland OR 97291 heeman@cse.ogi.edu Abstract Language
More informationAutomatic Text Analysis Using Drupal
Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing
More informationThe Oxford Learner s Dictionary of Academic English
ISEJ Advertorial The Oxford Learner s Dictionary of Academic English Oxford University Press The Oxford Learner s Dictionary of Academic English (OLDAE) is a brand new learner s dictionary aimed at students
More informationComputer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia
Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Outline I What is CALL? (scott) II Popular language learning sites (stella) Livemocha.com (stacia) III IV Specific sites
More informationNATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati
More informationNatural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
More informationBILINGUAL TRANSLATION SYSTEM
BILINGUAL TRANSLATION SYSTEM (FOR ENGLISH AND TAMIL) Dr. S. Saraswathi Associate Professor M. Anusiya P. Kanivadhana S. Sathiya Abstract--- The project aims in developing Bilingual Translation System for
More informationAutomatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
More informationComma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University
Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University 1. Introduction This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect
More informationPhase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
More informationSyntactic Theory on Swedish
Syntactic Theory on Swedish Mats Uddenfeldt Pernilla Näsfors June 13, 2003 Report for Introductory course in NLP Department of Linguistics Uppsala University Sweden Abstract Using the grammar presented
More informationThe new portfolio will not be assessed by examiners but will be used as a tool for students to develop their writing skills at each level.
A Teachers guide to the Trinity portfolio toolkit What is a portfolio? It s a file or folder that contains a collection of your students work. Each portfolio should include at least one example of each
More informationReliable and Cost-Effective PoS-Tagging
Reliable and Cost-Effective PoS-Tagging Yu-Fang Tsai Keh-Jiann Chen Institute of Information Science, Academia Sinica Nanang, Taipei, Taiwan 5 eddie,chen@iis.sinica.edu.tw Abstract In order to achieve
More informationOutline of today s lecture
Outline of today s lecture Generative grammar Simple context free grammars Probabilistic CFGs Formalism power requirements Parsing Modelling syntactic structure of phrases and sentences. Why is it useful?
More informationSearch and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
More informationNEW SOFTWARE TO HELP EFL STUDENTS SELF-CORRECT THEIR WRITING
Language Learning & Technology http://llt.msu.edu/issues/february2015/action1.pdf February 2015, Volume 19, Number 1 pp. 23 33 NEW SOFTWARE TO HELP EFL STUDENTS SELF-CORRECT THEIR WRITING Jim Lawley, Universidad
More informationWriting learning objectives
Writing learning objectives This material was excerpted and adapted from the following web site: http://www.utexas.edu/academic/diia/assessment/iar/students/plan/objectives/ What is a learning objective?
More informationAlignment of the National Standards for Learning Languages with the Common Core State Standards
Alignment of the National with the Common Core State Standards Performance Expectations The Common Core State Standards for English Language Arts (ELA) and Literacy in History/Social Studies, Science,
More information