Syntactic Transfer Using a Bilingual Lexicon
|
|
|
- Madeleine Chase
- 9 years ago
- Views:
Transcription
1 Syntactic Transfer Using a Bilingual Lexicon Greg Durrett, Adam Pauls, and Dan Klein UC Berkeley
2 Parsing a New Language
3 Parsing a New Language Mozambique hope on trade with other members
4 Parsing a New Language NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members [Berg-Kirkpatrick and Klein (2010), Petrov et al. (2011)]
5 Parsing a New Language NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members [Berg-Kirkpatrick and Klein (2010), Petrov et al. (2011)] [McDonald et al. (2011), Cohen et al. (2011)]
6 Parsing a New Language NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members [Berg-Kirkpatrick and Klein (2010), Petrov et al. (2011)] [McDonald et al. (2011), Cohen et al. (2011)]
7 Parsing a New Language NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members [Berg-Kirkpatrick and Klein (2010), Petrov et al. (2011)] [McDonald et al. (2011), Cohen et al. (2011)]
8 Parsing a New Language NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members [Berg-Kirkpatrick and Klein (2010), Petrov et al. (2011)] [McDonald et al. (2011), Cohen et al. (2011)]
9 Parsing a New Language? NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members [Berg-Kirkpatrick and Klein (2010), Petrov et al. (2011)] [McDonald et al. (2011), Cohen et al. (2011)]
10 Parsing a New Language? NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members [Berg-Kirkpatrick and Klein (2010), Petrov et al. (2011)] [McDonald et al. (2011), Cohen et al. (2011)]
11 Parsing a New Language? NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members [Berg-Kirkpatrick and Klein (2010), Petrov et al. (2011)] [McDonald et al. (2011), Cohen et al. (2011)]
12 Token-level Projection NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hopes for trade with other members [Hwa et al. (2005), Ganchev et al. (2009), McDonald et al. (2011)]
13 Token-level Projection NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hopes for trade with other members Assumes access to parallel data [Hwa et al. (2005), Ganchev et al. (2009), McDonald et al. (2011)]
14 Token-level Projection NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hopes for trade with other members Assumes access to parallel data Asymmetric train and test conditions [Hwa et al. (2005), Ganchev et al. (2009), McDonald et al. (2011)]
15 Type-level Projection NOUN VERB PREP NOUN PREP ADJ NOUN
16 Type-level Projection NOUN VERB PREP NOUN PREP ADJ NOUN DE EN dict Handel trade, deal hofft hope mit with
17 Type-level Projection NOUN VERB PREP NOUN PREP ADJ NOUN trade, deal with DE EN dict Handel trade, deal hofft hope mit with
18 Type-level Projection NOUN VERB PREP NOUN PREP ADJ NOUN trade, deal with DE EN dict Handel trade, deal hofft hope mit with EN trees... trade with those regions... NOUN ADP DET NOUN... a deal with China... DET NOUN ADP NOUN
19 Type-level Projection Feature: Goodness in English Value: high NOUN VERB PREP NOUN PREP ADJ NOUN trade, deal with DE EN dict Handel trade, deal hofft hope mit with EN trees... trade with those regions... NOUN ADP DET NOUN... a deal with China... DET NOUN ADP NOUN
20 Type-level Projection NOUN VERB PREP NOUN PREP ADJ NOUN DE EN dict Handel trade, deal hofft hope mit with
21 Type-level Projection NOUN VERB PREP NOUN PREP ADJ NOUN hope with DE EN dict Handel trade, deal hofft hope mit with
22 Type-level Projection NOUN VERB PREP NOUN PREP ADJ NOUN hope with DE EN dict Handel trade, deal hofft hope mit with EN trees no examples
23 Type-level Projection Feature: Goodness in English Value: low NOUN VERB PREP NOUN PREP ADJ NOUN hope with DE EN dict Handel trade, deal hofft hope mit with EN trees no examples
24 Model Edge-factored discriminative parser, based on MSTParser (McDonald et al. 2005) DELEX features: universal POS (McDonald et al. 2011) PROJ features: Goodness in English
25 Model: DELEX Features NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
26 Model: DELEX Features ATTACH NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
27 Model: DELEX Features ATTACH NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
28 Model: DELEX Features ATTACH Feature Value NOUN PREP 1 NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
29 Model: DELEX Features ATTACH Feature Value NOUN PREP 1 + indicators including direction and distance NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
30 Model: PROJ Features ATTACH NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
31 Model: PROJ Features ATTACH Feature Value Handel mit 1 NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
32 Model: PROJ Features ATTACH Insufficient target-language supervised data to use X Feature Value Handel mit 1 NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
33 Model: PROJ Features ATTACH Feature Value Handel mit NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
34 Model: PROJ Features ATTACH Feature Value Score(Handel mit) NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
35 Model: PROJ Features ATTACH Feature Value Score(Handel mit) Query NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
36 Model: PROJ Features ATTACH Feature Goodness in English Value Score(Handel mit) Query NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
37 DELEX+PROJ MULTI-PROJ Query Score Computation Score(Handel PREP) = log p(prep Handel) p(w s Handel)
38 DELEX+PROJ DELEX+PROJ MULTIPROJ MULTI-PROJ Query Score Computation Score(Handel Score(Handel PREP) = log p(prep Handel) PREP) = log p(prep Handel) = log X p(prep w s ) p(w s Handel) p(w s Handel) w s
39 DELEX+PROJ DELEX+PROJ MULTIPROJ MULTI-PROJ our approach our produces approach gains produces across gains a range across a ra of target languages, of target languages, using two using different twolow- resource training resource methodologies training methodologies (one weakly (one wea different lo supervised supervised and one indirectly and one supervised) indirectly supervised) and two different twodictionary different dictionary sources (one sources manually constructed ally constructed and one automatically and one automatically con- c (one ma structed). structed). Query Score Computation Score(Handel Score(Handel PREP) = log p(prep Handel) PREP) = log p(prep Handel) = log X p(prep w s ) p(w s Handel) p(w s Handel) w s Score(Handel Score(Handel PREP) =p(prep Handel) PREP) p(w s Handel) p(w s Handel) trade 0.5 deal p(x trade) p(x trade) 0.5
40 DELEX+PROJ DELEX+PROJ MULTIPROJ MULTI-PROJ = log X w s E(PREP w s ) p(w s Handel) Query Score Computation EN trees... trade with those regions... NOUN PREP DET NOUN... a good trade... DET ADJ NOUN our approach our produces approach gains produces across gains a range across a ra of target languages, of target languages, using two using different twolow- resource training resource methodologies training methodologies (one weakly (one wea different lo supervised supervised and one indirectly and one supervised) indirectly supervised) and two different twodictionary different dictionary sources (one sources manually constructed ally constructed and one automatically and one automatically con- c (one ma structed). structed). Score(Handel p(w s Handel) PREP) = log p(prep Handel) Score(Handel PREP) = log p(prep Handel) p(x trade) = log X p(prep w s ) p(w s Handel) p(w s Handel) w s w s = trade Score(Handel Score(Handel PREP) =p(prep Handel) PREP) p(w s Handel) p(w s Handel) trade 0.5 deal p(x trade) p(x trade) 0.5
41 DELEX+PROJ our approach our produces demand approach gains takes produces across nouns gains a on range its left and different supervised DELEX+PROJ and one indirectly supervised) and across a ra = log X dictionary MULTIPROJ E(PREP w sources (one manuconstructed demand takes nouns on it two different dictionary s ) p(w sources of target s Handel) languages, German of target languages, using verlangen two using different should twolow- resource training resource methodologies training methodologies (one weakly (one wea do the s different lo MULTI-PROJ and one automatically concted). s constructed and one automatically con- (one manually w German verlangen should ing attachments to Verzicht and G ing attachments to Verzic structed). Query Score supervised Computation supervised and one indirectly and one supervised) indirectly supervised) and two different twodictionary different dictionary sources (one sources manually constructed (one ma Score(Handel p(w s Handel) PREP) = log p(prep Handel) ally constructed and one automatically and one automatically constructed). = structed). log c Score(Handel PREP) p(prep Handel) (Handel Score(Handel PREP) =p(prep Handel) PREP) =p(prep Handel) p(x trade) = log X p(prep w s ) p(w s Handel) w s AUTOMATIC AUTOMA p(w Score(Handel s Handel) Voc OCC Voc O p(w s Handel) p(w s Handel) Score(Handel PREP) =p(prep Handel) PREP) DA 324K DA 324K w s = trade DE 320K DE 320K p(x trade) p(x trade) p(w s Handel) p(w EL s Handel) 196K EL 196K ES PREP 0.5 trade ES K 165K IT IT 158K 0 ADJ 0.5 deal p(x trade) p(x trade) 158K NL 251K NL 251K PT 165K PT 165K SV 307K SV 307K Ta Table 14:
42 Feature Specialization Feature: Goodness in English Value: Score(hofft PREP) Feature: Goodness in English Value: Score(Handel PREP) NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
43 Feature Specialization Feature: Goodness (VERB link) in English Value: Score(hofft PREP) Feature: Goodness in English Value: Score(Handel PREP) NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
44 Feature Specialization Feature: Goodness (VERB link) in English Value: Score(hofft PREP) Feature: Goodness (NOUN link) in English Value: Score(Handel PREP) NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members
45 Feature Specialization Feature: Goodness (VERB link) in English Value: Score(hofft PREP) Feature: Goodness (NOUN link) in English Value: Score(Handel PREP) NOUN VERB PREP NOUN PREP ADJ NOUN Mozambique hope on trade with other members Similar to selective sharing of Naseem et al. (2012)
46 Experiments Sources of bilingual dictionaries Training Results
47 MANUAL Lexicon
48 MANUAL Lexicon
49 DELEX+PROJ MULTI-DIR MULTI-PROJ MANUAL Lexicon DELEX+PROJ structed). MULTI-PROJ resource training methodologies (one we supervised and one indirectly supervised two different dictionary sources (one m ally constructed and one automatically Score(Handel Score(Handel PREP) =p(prep Handel) PREP) =p(prep H p(w s Handel) p(w s Handel) deal trade p(x trade) trading 0.333
50 AUTOMATIC Lexicon Da ist zum Beispiel der Handel mit Drogen Trafficking in drugs is one example Unkontrollierter Handel mit leichten Waffen Uncontrolled trade in light weapons Ich will mit dem Handel anfangen I will start with trade
51 structed). structed). AUTOMATIC Lexicon Score(Handel Score(Handel PREP) =p(prep Handel) PREP) =p(prep Da ist zum Beispiel der Handel mit Drogen Trafficking in drugs is one example p(w s Handel) p(w s Handel) trade 0.66 trafficking p(x trade) p(x trade) 0.33 Unkontrollierter Handel mit leichten Waffen Uncontrolled trade in light weapons Ich will mit dem Handel anfangen I will start with trade
52 structed). structed). AUTOMATIC Lexicon Score(Handel Score(Handel PREP) =p(prep Handel) PREP) =p(prep Da ist zum Beispiel der Handel mit Drogen Trafficking in drugs is one example Unkontrollierter Handel mit leichten Waffen Uncontrolled trade in light weapons Ich will mit dem Handel anfangen I will start with trade p(w s Handel) p(w s Handel) trade trafficking p(x trade) p(x trade) commerce 0.05 trading 0.03 market 0.02 marketing 0.02 business traffic sales deal
53 Small Target Training Set 8 target languages, CoNLL shared task treebanks Train on 400 trees from CoNLL training set Test on CoNLL test set
54 Small Target Training Set 8 target languages, CoNLL shared task treebanks Train on 400 trees from CoNLL training set Test on CoNLL test set Universal POS tags generated from gold tags using mapping of Petrov et al. (2011)
55 Small Target Training Set 8 target languages, CoNLL shared task treebanks Train on 400 trees from CoNLL training set Test on CoNLL test set Universal POS tags generated from gold tags using mapping of Petrov et al. (2011) Source corpus: parsed English Gigaword
56 UAS (8 languages averaged) behavior of words at a type level. In a discriminative dependency parsing framework, our approach produces gains across a range of target languages, using two different lowresource training methodologies (one weakly supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed). Small Target Training Set DELEX DELEX+PROJ(MANUAL) DELEX+PROJ(AUTOMATIC) MULTI-DIR MULTI-PROJ Score(Handel train trees PREP) =p(prep Handel) p(w s Handel) p(x trade) Figure 1: Sentences in English and Germ ing words that mean demand. The fact demand takes nouns on its left and right German verlangen should do the same, c ing attachments to Verzicht and Gewerks AUTOMATIC Voc OCC Voc MAN DA 324K K DE 320K K EL 196K K ES 165K K IT 158K K NL 251K K PT 165K K SV 307K K Table 14:
57 UAS (8 languages averaged) behavior of words at a type level. In a discriminative dependency parsing framework, our approach produces gains across a range of target languages, using two different lowresource training methodologies (one weakly supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed). Small Target Training Set DELEX DELEX+PROJ(MANUAL) DELEX+PROJ(AUTOMATIC) MULTI-DIR MULTI-PROJ Score(Handel 400 train trees PREP) =p(prep Handel) p(w s Handel) p(x trade) Figure 1: Sentences in English and Germ ing words that mean demand. The fact demand takes nouns on its left and right German verlangen should do the same, c ing attachments to Verzicht and Gewerks AUTOMATIC Voc OCC Voc MAN DA 324K K DE 320K K EL 196K K ES 165K K IT 158K K NL 251K K PT 165K K SV 307K K Table 14:
58 UAS (8 languages averaged) behavior of words at a type level. In a discriminative dependency parsing framework, our approach produces gains across a range of target languages, using two different lowresource training methodologies (one weakly supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed). Small Target Training Set DELEX DELEX+PROJ(MANUAL) DELEX+PROJ(AUTOMATIC) MULTI-DIR MULTI-PROJ Score(Handel 400 train trees PREP) =p(prep Handel) p(w s Handel) p(x trade) Figure 1: Sentences in English and Germ ing words that mean demand. The fact demand takes nouns on its left and right German verlangen should do the same, c ing attachments to Verzicht and Gewerks AUTOMATIC Voc OCC Voc MAN DA 324K K DE 320K K EL 196K K ES 165K K IT 158K K NL 251K K PT 165K K SV 307K K Table 14:
59 UAS (8 languages averaged) behavior of words at a type level. In a discriminative dependency parsing framework, resource training methodologies (one weakly our approach produces gains across a range supervised and one indirectly supervised) and of target languages, using two different lowresource training two different Figure dictionary 1: Sentences sources in English (one manually constructed ing words and Germ Small Target methodologies (one Training weakly Set andthat onemean automatically demand. The con-facstructed). demand takes nouns on its left and right supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically German verlangen should do the same, c DELEX constructed). ing attachments to Verzicht and Gewerks DELEX+PROJ(MANUAL) 100 DELEX+PROJ(AUTOMATIC) DELEX MANUAL DELEX+PROJ(MANUAL) AUTOMATIC DELEX+PROJ(AUTOMATIC) 89.9 DELEX MULTI-DIR AUTOMATIC MAN 75 MULTI-PROJ DELEX+PROJ 77.4 Voc OCC Voc MULTI-DIR DA 324K K MULTI-PROJ DE Score(Handel +1.6 PREP) =p(prep Handel) K K 50 EL 196K K 75.1 ES 165K K Score(Handel PREP) =p(prep Handel) IT 158K K p(w s Handel) NL 251K K 25 PT 165K K p(w s Handel) p(x trade) SV 307K K 400 train trees p(x trade) 0 Open-class token coverage Table 14:
60 two different dictionary sources ( ally constructed and one automat structed). UAS (8 languages averaged) Small Target Training Set DELEX DELEX+PROJ(MANUAL) DELEX+PROJ(AUTOMATIC) MANUAL AUTOMATIC DELEX DELEX+PROJ MULTI-DIR MULTI-PROJ Score(Handel train trees PREP) =p(pre p(w s Handel) p(x trade)
61 two different dictionary sources ( ally constructed and one automat structed). UAS (8 languages averaged) Small Target Training Set DELEX DELEX+PROJ(MANUAL) DELEX+PROJ(AUTOMATIC) MANUAL AUTOMATIC DELEX DELEX+PROJ MULTI-DIR MULTI-PROJ Score(Handel train trees 200 target trees 400 train trees PREP) =p(pre p(w s Handel) p(x trade)
62 No Target Training Set Train ES trees Test DE CoNLL test set PT trees SV trees DET NOUN VERB ADJ El amarillismo es imparcial NOUN VERB NOUN PREP DET NOUN Bruxelas adia fiscalidade de os combustíveis PRON VERB PRON ADV Jag tycker det inte... [McDonald et al. (2011)]
63 No Target Training Set Train ES trees Test DE CoNLL test set DET NOUN VERB ADJ PT trees NOUN VERB NOUN PREP DET NOUN SV trees PRON VERB PRON ADV... [McDonald et al. (2011)]
64 No Target Training Set Train ES trees Test DE CoNLL test set PT trees SV trees DET NOUN VERB ADJ El amarillismo es imparcial NOUN VERB NOUN PREP DET NOUN Bruxelas adia fiscalidade de os combustíveis PRON VERB PRON ADV Jag tycker det inte... [McDonald et al. (2011)]
65 UAS (8 languages averaged) supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed). No Target Training Set DELEX DELEX+PROJ(MANUAL) DELEX+PROJ(AUTOMATIC) MANUAL AUTOMATIC DELEX DELEX+PROJ MULTI-DIR 62.6 MULTI-PROJ Score(Handel PREP) =p(prep Handel) p(w s Handel) p(x trade) demand takes nouns on its l German verlangen should d ing attachments to Verzicht AUTOMAT Voc OC DA 324K 0.9 DE 320K 0.8 EL 196K 0.9 ES 165K 0.8 IT 158K 0.9 NL 251K 0.8 PT 165K 0.8 SV 307K 0.9 Tabl
66 UAS (8 languages averaged) supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed). No Target Training Set DELEX DELEX+PROJ(MANUAL) DELEX+PROJ(AUTOMATIC) Type-level* (this work) MANUAL AUTOMATIC DELEX DELEX+PROJ MULTI-DIR 62.6 MULTI-PROJ Score(Handel PREP) =p(prep Handel) p(w s Handel) p(x trade) demand takes nouns on its l German verlangen should d ing attachments to Verzicht AUTOMAT Voc OC DA 324K 0.9 DE 320K 0.8 EL 196K 0.9 ES 165K 0.8 IT 158K 0.9 NL 251K 0.8 PT 165K 0.8 SV 307K 0.9 Tabl
67 UAS (8 languages averaged) supervised and one indirectly supervised) and structed). two different dictionary sources (one manually constructed and one automatically constructed). DELEX No Target Training Set DELEX DELEX+PROJ(MANUAL) DELEX+PROJ(AUTOMATIC) Type-level* (this work) MANUAL AUTOMATIC DELEX DELEX+PROJ MULTI-DIR 62.6 MULTI-PROJ Score(Handel Token-level (McDonald et al. 2011) 61.1 PREP) =p(prep Handel) Score(Handel p(w s Handel) p(x trade) DELEX+PROJ(MANUAL) DELEX+PROJ(AUTOMATIC) MANUAL AUTOMATIC DELEX DELEX+PROJ MULTIDIR MULTIPROJ DELEX+PROJ MULTIPROJ +2.7 demand takes nouns on its l German verlangen should d ing attachments to Verzicht AUTOMAT Voc OC DA 324K DE 320K 0.8 EL 196K 0.9 ES 165K 0.8 IT 158K 0.9 NL 251K 0.8 PREP) = log p(prep Hande PT 165K 0.8 SV 307K 0.9 p(w s Handel) Tabl
68 Gain over delexicalized baseline DELEX DELEX+PROJ(MANUAL) DELEX+PROJ(AUTOMATIC) MANUAL AUTOMATIC DELEX DELEX+PROJ MULTIDIR MULTIPROJ DELEX+PROJ MULTIPROJ Score(Handel Gains PREP) = log p(prep Handel) p(w s Handel) DA DE EL ES IT NL PT SV
69 Related Work Täckström et al. (2012): type-level transfer using word clusters
70 Related Work Täckström et al. (2012): type-level transfer using word clusters Naseem et al. (2012): adaptation of universal POS transfer to the target language
71 Conclusion Bilingual dictionaries allow transfer of linguistic knowledge at a type level
72 Conclusion Bilingual dictionaries allow transfer of linguistic knowledge at a type level Type-level transfer: Comparable to token-level transfer Simpler to implement Does not require bitext
73 Conclusion Bilingual dictionaries allow transfer of linguistic knowledge at a type level Type-level transfer: Comparable to token-level transfer Simpler to implement Does not require bitext Thank you!
Simple Type-Level Unsupervised POS Tagging
Simple Type-Level Unsupervised POS Tagging Yoong Keok Lee Aria Haghighi Regina Barzilay Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology {yklee, aria42, regina}@csail.mit.edu
Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles
Why language is hard And what Linguistics has to say about it Natalia Silveira Participation code: eagles Christopher Natalia Silveira Manning Language processing is so easy for humans that it is like
Universal Dependencies
Universal Dependencies Joakim Nivre! Uppsala University Department of Linguistics and Philology Based on collaborative work with Jinho Choi, Timothy Dozat, Filip Ginter, Yoav Goldberg, Jan Hajič, Chris
Learning Translation Rules from Bilingual English Filipino Corpus
Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,
Wiki-ly Supervised Part-of-Speech Tagging
Wiki-ly Supervised Part-of-Speech Tagging Shen Li Computer & Information Science University of Pennsylvania [email protected] João V. Graça L 2 F INESC-ID Lisboa, Portugal [email protected] Ben
Transition-Based Dependency Parsing with Long Distance Collocations
Transition-Based Dependency Parsing with Long Distance Collocations Chenxi Zhu, Xipeng Qiu (B), and Xuanjing Huang Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science,
Automatic Detection and Correction of Errors in Dependency Treebanks
Automatic Detection and Correction of Errors in Dependency Treebanks Alexander Volokh DFKI Stuhlsatzenhausweg 3 66123 Saarbrücken, Germany [email protected] Günter Neumann DFKI Stuhlsatzenhausweg
Machine Learning for natural language processing
Machine Learning for natural language processing Introduction Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 13 Introduction Goal of machine learning: Automatically learn how to
A Mixed Trigrams Approach for Context Sensitive Spell Checking
A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA [email protected], [email protected]
A chart generator for the Dutch Alpino grammar
June 10, 2009 Introduction Parsing: determining the grammatical structure of a sentence. Semantics: a parser can build a representation of meaning (semantics) as a side-effect of parsing a sentence. Generation:
CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014
Introduction! BM1 Advanced Natural Language Processing Alexander Koller! 17 October 2014 Outline What is computational linguistics? Topics of this course Organizational issues Siri Text prediction Facebook
Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
Applying Co-Training Methods to Statistical Parsing. Anoop Sarkar http://www.cis.upenn.edu/ anoop/ [email protected]
Applying Co-Training Methods to Statistical Parsing Anoop Sarkar http://www.cis.upenn.edu/ anoop/ [email protected] 1 Statistical Parsing: the company s clinical trials of both its animal and human-based
Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction
Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction Uwe D. Reichel Department of Phonetics and Speech Communication University of Munich [email protected] Abstract
Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang
Sense-Tagging Verbs in English and Chinese Hoa Trang Dang Department of Computer and Information Sciences University of Pennsylvania [email protected] October 30, 2003 Outline English sense-tagging
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering
POS Tagsets and POS Tagging. Definition. Tokenization. Tagset Design. Automatic POS Tagging Bigram tagging. Maximum Likelihood Estimation 1 / 23
POS Def. Part of Speech POS POS L645 POS = Assigning word class information to words Dept. of Linguistics, Indiana University Fall 2009 ex: the man bought a book determiner noun verb determiner noun 1
C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER
INTRODUCTION TO SAS TEXT MINER TODAY S AGENDA INTRODUCTION TO SAS TEXT MINER Define data mining Overview of SAS Enterprise Miner Describe text analytics and define text data mining Text Mining Process
Identifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 [email protected] Christopher D. Manning Department of
Clustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing
1 Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing Lourdes Araujo Dpto. Sistemas Informáticos y Programación, Univ. Complutense, Madrid 28040, SPAIN (email: [email protected])
Extraction and Visualization of Protein-Protein Interactions from PubMed
Extraction and Visualization of Protein-Protein Interactions from PubMed Ulf Leser Knowledge Management in Bioinformatics Humboldt-Universität Berlin Finding Relevant Knowledge Find information about Much
Maskinöversättning 2008. F2 Översättningssvårigheter + Översättningsstrategier
Maskinöversättning 2008 F2 Översättningssvårigheter + Översättningsstrategier Flertydighet i källspråket poäng point, points, credit, credits, var verb ->was, were pron -> each adv -> where adj -> every
Ling 201 Syntax 1. Jirka Hana April 10, 2006
Overview of topics What is Syntax? Word Classes What to remember and understand: Ling 201 Syntax 1 Jirka Hana April 10, 2006 Syntax, difference between syntax and semantics, open/closed class words, all
Chunk Parsing. Steven Bird Ewan Klein Edward Loper. University of Melbourne, AUSTRALIA. University of Edinburgh, UK. University of Pennsylvania, USA
Chunk Parsing Steven Bird Ewan Klein Edward Loper University of Melbourne, AUSTRALIA University of Edinburgh, UK University of Pennsylvania, USA March 1, 2012 chunk parsing: efficient and robust approach
Simple maths for keywords
Simple maths for keywords Adam Kilgarriff Lexical Computing Ltd [email protected] Abstract We present a simple method for identifying keywords of one corpus vs. another. There is no one-sizefits-all
The Role of Sentence Structure in Recognizing Textual Entailment
Blake,C. (In Press) The Role of Sentence Structure in Recognizing Textual Entailment. ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, Czech Republic. The Role of Sentence Structure
31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
CALICO Journal, Volume 9 Number 1 9
PARSING, ERROR DIAGNOSTICS AND INSTRUCTION IN A FRENCH TUTOR GILLES LABRIE AND L.P.S. SINGH Abstract: This paper describes the strategy used in Miniprof, a program designed to provide "intelligent' instruction
Context Grammar and POS Tagging
Context Grammar and POS Tagging Shian-jung Dick Chen Don Loritz New Technology and Research New Technology and Research LexisNexis LexisNexis Ohio, 45342 Ohio, 45342 [email protected] [email protected]
Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1
Korpus-Abfrage: Werkzeuge und Sprachen Gastreferat zur Vorlesung Korpuslinguistik mit und für Computerlinguistik Charlotte Merz 3. Dezember 2002 Motivation Lizentiatsarbeit: A Corpus Query Tool for Automatically
Shallow Parsing with Apache UIMA
Shallow Parsing with Apache UIMA Graham Wilcock University of Helsinki Finland [email protected] Abstract Apache UIMA (Unstructured Information Management Architecture) is a framework for linguistic
How To Translate English To Yoruba Language To Yoranuva
International Journal of Language and Linguistics 2015; 3(3): 154-159 Published online May 11, 2015 (http://www.sciencepublishinggroup.com/j/ijll) doi: 10.11648/j.ijll.20150303.17 ISSN: 2330-0205 (Print);
Search and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
Coffee Break German Lesson 06
LESSON NOTES WIE VIEL KOSTET DAS? In this episode of Coffee Break German we ll start by learning the numbers from zero to ten and then learn to deal with transactional situations involving paying for things
TS3: an Improved Version of the Bilingual Concordancer TransSearch
TS3: an Improved Version of the Bilingual Concordancer TransSearch Stéphane HUET, Julien BOURDAILLET and Philippe LANGLAIS EAMT 2009 - Barcelona June 14, 2009 Computer assisted translation Preferred by
Semantic parsing with Structured SVM Ensemble Classification Models
Semantic parsing with Structured SVM Ensemble Classification Models Le-Minh Nguyen, Akira Shimazu, and Xuan-Hieu Phan Japan Advanced Institute of Science and Technology (JAIST) Asahidai 1-1, Nomi, Ishikawa,
Search Engines Chapter 2 Architecture. 14.4.2011 Felix Naumann
Search Engines Chapter 2 Architecture 14.4.2011 Felix Naumann Overview 2 Basic Building Blocks Indexing Text Acquisition Text Transformation Index Creation Querying User Interaction Ranking Evaluation
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Hybrid Strategies. for better products and shorter time-to-market
Hybrid Strategies for better products and shorter time-to-market Background Manufacturer of language technology software & services Spin-off of the research center of Germany/Heidelberg Founded in 1999,
Special Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR 1 Gauri Rao, 2 Chanchal Agarwal, 3 Snehal Chaudhry, 4 Nikita Kulkarni,, 5 Dr. S.H. Patil 1 Lecturer department o f Computer Engineering BVUCOE,
INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning [email protected]
INF5820 Natural Language Processing - NLP H2009 Jan Tore Lønning [email protected] Semantic Role Labeling INF5830 Lecture 13 Nov 4, 2009 Today Some words about semantics Thematic/semantic roles PropBank &
BILINGUAL TRANSLATION SYSTEM
BILINGUAL TRANSLATION SYSTEM (FOR ENGLISH AND TAMIL) Dr. S. Saraswathi Associate Professor M. Anusiya P. Kanivadhana S. Sathiya Abstract--- The project aims in developing Bilingual Translation System for
Effective Self-Training for Parsing
Effective Self-Training for Parsing David McClosky [email protected] Brown Laboratory for Linguistic Information Processing (BLLIP) Joint work with Eugene Charniak and Mark Johnson David McClosky - [email protected]
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
DEPENDENCY PARSING JOAKIM NIVRE
DEPENDENCY PARSING JOAKIM NIVRE Contents 1. Dependency Trees 1 2. Arc-Factored Models 3 3. Online Learning 3 4. Eisner s Algorithm 4 5. Spanning Tree Parsing 6 References 7 A dependency parser analyzes
SERVICES PRICE LIST - COMMERCIAL Sysorex Government Services, Inc.
SERVICES - COMMERCIAL Sysorex Government Services, Inc. ITEM NUMBER LABOR TYPE DESCRIPTION PT00201 PT00202 Junior System Staff System equivalent working knowledge of System ing $ 109.63 experience or equivalent,
Comprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University [email protected] Kapil Dalwani Computer Science Department
A Chart Parsing implementation in Answer Set Programming
A Chart Parsing implementation in Answer Set Programming Ismael Sandoval Cervantes Ingenieria en Sistemas Computacionales ITESM, Campus Guadalajara [email protected] Rogelio Dávila Pérez Departamento de
Activities. but I will require that groups present research papers
CS-498 Signals AI Themes Much of AI occurs at the signal level Processing data and making inferences rather than logical reasoning Areas such as vision, speech, NLP, robotics methods bleed into other areas
Parsing Software Requirements with an Ontology-based Semantic Role Labeler
Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh [email protected] Ewan Klein University of Edinburgh [email protected] Abstract Software
Example-Based Treebank Querying. Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde
Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde LREC 2012, Istanbul May 25, 2012 NEDERBOOMS Exploitation of Dutch treebanks for research in linguistics September
Tanja Wissik. COTSOES Terminology and Documentation Working Group Meeting, 13 May 2013, Stockholm
Tanja Wissik COTSOES Terminology and Documentation Working Group Meeting, 13 May 2013, Stockholm What is LISE? Main pillars of the LISE Project LISE Service Version and IATE use case Workflow Research
The PALAVRAS parser and its Linguateca applications - a mutually productive relationship
The PALAVRAS parser and its Linguateca applications - a mutually productive relationship Eckhard Bick University of Southern Denmark [email protected] Outline Flow chart Linguateca Palavras History
Domain Adaptation for Dependency Parsing via Self-training
Domain Adaptation for Dependency Parsing via Self-training Juntao Yu 1, Mohab Elkaref 1, Bernd Bohnet 2 1 University of Birmingham, Birmingham, UK 2 Google, London, UK 1 {j.yu.1, m.e.a.r.elkaref}@cs.bham.ac.uk,
Surface Realisation using Tree Adjoining Grammar. Application to Computer Aided Language Learning
Surface Realisation using Tree Adjoining Grammar. Application to Computer Aided Language Learning Claire Gardent CNRS / LORIA, Nancy, France (Joint work with Eric Kow and Laura Perez-Beltrachini) March
How to make Ontologies self-building from Wiki-Texts
How to make Ontologies self-building from Wiki-Texts Bastian HAARMANN, Frederike GOTTSMANN, and Ulrich SCHADE Fraunhofer Institute for Communication, Information Processing & Ergonomics Neuenahrer Str.
Syntactic Theory on Swedish
Syntactic Theory on Swedish Mats Uddenfeldt Pernilla Näsfors June 13, 2003 Report for Introductory course in NLP Department of Linguistics Uppsala University Sweden Abstract Using the grammar presented
INPUTLOG 6.0 a research tool for logging and analyzing writing process data. Linguistic analysis. Linguistic analysis
INPUTLOG 6.0 a research tool for logging and analyzing writing process data Linguistic analysis From character level analyses to word level analyses Linguistic analysis 2 Linguistic Analyses The concept
Using Knowledge Extraction and Maintenance Techniques To Enhance Analytical Performance
Using Knowledge Extraction and Maintenance Techniques To Enhance Analytical Performance David Bixler, Dan Moldovan and Abraham Fowler Language Computer Corporation 1701 N. Collins Blvd #2000 Richardson,
Circle the right answer for each question. 1) These words all have the same root word. Which is the root word?
Quiz Root words Level A Circle the right answer for each question. 1) These words all have the same root word. Which is the root word? A) takes B) taken C) mistaken D) take 2) These words all have the
Semantic analysis of text and speech
Semantic analysis of text and speech SGN-9206 Signal processing graduate seminar II, Fall 2007 Anssi Klapuri Institute of Signal Processing, Tampere University of Technology, Finland Outline What is semantic
A Survey on Product Aspect Ranking
A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,
Semantic Analysis of Natural Language Queries Using Domain Ontology for Information Access from Database
I.J. Intelligent Systems and Applications, 2013, 12, 81-90 Published Online November 2013 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2013.12.07 Semantic Analysis of Natural Language Queries
Artificial Intelligence Exam DT2001 / DT2006 Ordinarie tentamen
Artificial Intelligence Exam DT2001 / DT2006 Ordinarie tentamen Date: 2010-01-11 Time: 08:15-11:15 Teacher: Mathias Broxvall Phone: 301438 Aids: Calculator and/or a Swedish-English dictionary Points: The
PyCantonese: Cantonese linguistic research in the age of big data
PyCantonese: Cantonese linguistic research in the age of big data Jackson L. Lee University of Chicago http://jacksonllee.com Childhood Bilingualism Research Center, CUHK September 15, 2015 Grammar versus
Plantronics Explorer 50. User Guide
Plantronics Explorer 50 User Guide Contents Welcome 3 What's in the box 4 Headset Overview 5 Be safe 5 Pair and Charge 6 Get Paired 6 Activate pair mode 6 Use two phones 6 Reconnect 6 Charge 6 Fit 7 The
HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN
HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN Yu Chen, Andreas Eisele DFKI GmbH, Saarbrücken, Germany May 28, 2010 OUTLINE INTRODUCTION ARCHITECTURE EXPERIMENTS CONCLUSION SMT VS. RBMT [K.
Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems
Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural
Parsing Swedish. Atro Voutilainen Conexor oy [email protected]. CG and FDG
Parsing Swedish Atro Voutilainen Conexor oy [email protected] This paper presents two new systems for analysing Swedish texts: a light parser and a functional dependency grammar parser. Their
Topological Field Chunking in German
Topological Field Chunking in German Jorn Veenstra, Frank H. Müller, Tylman Ule [veenstra,fhm,ule]@sfs.uni-tuebingen.de ESSLLI Summerschool Workshop on Machine Learning Approaches in Computational Linguistics
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
Multilingual and mixed-lingual TTS applications
Multilingual and mixed-lingual TTS applications LangTech 2003 November 24, 2003 Simona Fina, Manager Linguistics Real-life texts need mixed-lingual analysis Agenda Short presentation of SVOX Challenges
