Syntactic Transfer Using a Bilingual Lexicon Greg Durrett, Adam Pauls, and Dan Klein UC Berkeley
Parsing a New Language
NOUN VERB PREP NOUN PREP ADJ NOUN
Mozambique hope on trade with other members
[Berg-Kirkpatrick and Klein (2010), Petrov et al. (2011)]
[McDonald et al. (2011), Cohen et al. (2011)]
Parsing a New Language?
NOUN VERB PREP NOUN PREP ADJ NOUN
Mozambique hope on trade with other members
[Berg-Kirkpatrick and Klein (2010), Petrov et al. (2011)]
[McDonald et al. (2011), Cohen et al. (2011)]
Token-level Projection
NOUN VERB PREP NOUN PREP ADJ NOUN
Mozambique hopes for trade with other members
Assumes access to parallel data
Asymmetric train and test conditions
[Hwa et al. (2005), Ganchev et al. (2009), McDonald et al. (2011)]
Type-level Projection
NOUN VERB PREP NOUN PREP ADJ NOUN
Dictionary translations of the candidate attachment: trade, deal / with
DE-EN dict:
  Handel: trade, deal
  hofft: hope
  mit: with
EN trees:
  ... trade with those regions ... (NOUN ADP DET NOUN)
  ... a deal with China ... (DET NOUN ADP NOUN)
Feature: Goodness in English. Value: high
Type-level Projection
NOUN VERB PREP NOUN PREP ADJ NOUN
Dictionary translations of the candidate attachment: hope / with
DE-EN dict:
  Handel: trade, deal
  hofft: hope
  mit: with
EN trees: no examples
Feature: Goodness in English. Value: low
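The lookup on these two slides can be sketched in Python. The dictionary entries and the two English fragments are taken from the slides. The function name `english_support` and the shortcut of counting adjacent word pairs instead of real dependency arcs are assumptions made only for illustration.

```python
from itertools import product

# German-to-English dictionary entries shown on the slide.
DE_EN = {"Handel": ["trade", "deal"], "hofft": ["hope"], "mit": ["with"]}

# English fragments from the slide, standing in for a parsed English corpus.
# Assumption: a word immediately followed by a preposition counts as one
# attachment of that preposition to the word.
EN_FRAGMENTS = [
    ["trade", "with", "those", "regions"],
    ["a", "deal", "with", "China"],
]


def english_support(head_de, dep_de):
    """Count English (head, dependent) pairs among the dictionary translations."""
    count = 0
    for head_en, dep_en in product(DE_EN[head_de], DE_EN[dep_de]):
        for sent in EN_FRAGMENTS:
            count += sum(
                1 for a, b in zip(sent, sent[1:]) if (a, b) == (head_en, dep_en)
            )
    return count


print(english_support("Handel", "mit"))  # 2: supported in English, "high"
print(english_support("hofft", "mit"))   # 0: no examples, "low"
```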
Model
Edge-factored discriminative parser, based on MSTParser (McDonald et al. 2005)
DELEX features: universal POS (McDonald et al. 2011)
PROJ features: Goodness in English
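As a rough illustration of what "edge-factored" means, the sketch below scores a tree as the sum of independent edge scores, each a dot product of weights and features. The feature names, weights, and values are hypothetical and are not taken from the talk or from MSTParser.

```python
def edge_score(weights, features):
    """Dot product of a weight dict and a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def tree_score(weights, edges):
    """An edge-factored model scores a tree as the sum of its edge scores."""
    return sum(edge_score(weights, feats) for feats in edges)


# Hypothetical weights: one DELEX indicator and one real-valued PROJ feature.
weights = {"DELEX:NOUN->PREP": 1.0, "PROJ:goodness": 0.5}

# Hypothetical feature dicts, one per edge of a candidate tree.
edges = [
    {"DELEX:NOUN->PREP": 1.0, "PROJ:goodness": -0.5},
    {"DELEX:VERB->NOUN": 1.0},  # no learned weight, contributes 0
]

print(tree_score(weights, edges))  # 0.75
```

Because the score decomposes over edges, any new feature only has to describe a single head-dependent pair.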
Model: DELEX Features
NOUN VERB PREP NOUN PREP ADJ NOUN
Mozambique hope on trade with other members
ATTACH
Feature: NOUN → PREP. Value: 1
+ indicators including direction and distance
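A minimal sketch of delexicalized indicator features for one candidate attachment follows. The slide only says that the POS pair is conjoined with direction and distance; the exact feature templates and the string format used here are assumptions.

```python
def delex_features(tags, head, dep):
    """Indicator features for attaching token `dep` to token `head`,
    using only universal POS tags (no word identities)."""
    direction = "R" if dep > head else "L"
    distance = abs(dep - head)
    pair = f"{tags[head]}->{tags[dep]}"
    return {
        pair: 1,
        f"{pair}|dir={direction}": 1,
        f"{pair}|dir={direction}|dist={distance}": 1,
    }


tags = ["NOUN", "VERB", "PREP", "NOUN", "PREP", "ADJ", "NOUN"]

# Attach the second preposition (index 4, "with") to the noun at index 3 ("trade").
print(delex_features(tags, head=3, dep=4))
```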
Model: PROJ Features
NOUN VERB PREP NOUN PREP ADJ NOUN
Mozambique hope on trade with other members
ATTACH
Feature: Handel → mit. Value: 1
Not usable: insufficient target-language supervised data to learn a weight for this lexicalized indicator
Model: PROJ Features
NOUN VERB PREP NOUN PREP ADJ NOUN
Mozambique hope on trade with other members
ATTACH
Query: Handel → mit
Feature: Goodness in English. Value: Score(Handel → mit)
Query Score Computation
$\mathrm{Score}(\text{Handel} \to \text{PREP}) = \log p(\text{PREP} \mid \text{Handel}) = \log \sum_{w_s} p(\text{PREP} \mid w_s)\, p(w_s \mid \text{Handel})$
$p(w_s \mid \text{Handel})$: trade 0.5, deal 0.5
EN trees:
  ... trade with those regions ... (NOUN PREP DET NOUN)
  ... a good trade ... (DET ADJ NOUN)
$p(x \mid w_s = \text{trade})$: PREP 0.5, ADJ 0.5
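The query score can be worked through numerically. In the sketch below, the translation distribution for Handel (trade 0.5, deal 0.5) and the attachment distribution for trade (PREP 0.5, ADJ 0.5) come from the slide. The value of p(PREP | deal) is not given there and is assumed to be 0.5, and returning negative infinity for an unsupported query is likewise an assumption.

```python
import math


def query_score(p_trans, p_attach, target):
    """log of sum over w_s of p(target | w_s) * p(w_s | w_t)."""
    total = sum(
        p * p_attach.get(w_s, {}).get(target, 0.0) for w_s, p in p_trans.items()
    )
    return math.log(total) if total > 0 else float("-inf")


# p(w_s | Handel), from the slide.
p_w_given_handel = {"trade": 0.5, "deal": 0.5}

# p(x | w_s): the row for "trade" is from the slide, the row for "deal" is assumed.
p_x_given_w = {
    "trade": {"PREP": 0.5, "ADJ": 0.5},
    "deal": {"PREP": 0.5},
}

score = query_score(p_w_given_handel, p_x_given_w, "PREP")
print(round(score, 3))  # -0.693, i.e. log(0.5 * 0.5 + 0.5 * 0.5) = log 0.5
```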
Feature Specialization
NOUN VERB PREP NOUN PREP ADJ NOUN
Mozambique hope on trade with other members
Feature: Goodness (VERB link) in English. Value: Score(hofft → PREP)
Feature: Goodness (NOUN link) in English. Value: Score(Handel → PREP)
Similar to selective sharing of Naseem et al. (2012)
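One way to read this slide is that the single "Goodness in English" feature is split into one real-valued feature per head POS, so that verb links and noun links get separate weights. The sketch below illustrates that reading; the feature-name format and the example scores are hypothetical.

```python
def proj_feature(head_pos, score):
    """One real-valued PROJ feature per head POS, so each gets its own weight."""
    return {f"PROJ:goodness|head={head_pos}": score}


# Hypothetical query scores for the two competing attachments of "with".
verb_link = proj_feature("VERB", -1.2)  # Score(hofft -> PREP)
noun_link = proj_feature("NOUN", -0.7)  # Score(Handel -> PREP)

print(verb_link)
print(noun_link)
```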
Experiments
Sources of bilingual dictionaries
Training
Results
MANUAL Lexicon
$p(w_s \mid \text{Handel})$: deal 0.333, trade 0.333, trading 0.333
AUTOMATIC Lexicon
Da ist zum Beispiel der Handel mit Drogen / Trafficking in drugs is one example
Unkontrollierter Handel mit leichten Waffen / Uncontrolled trade in light weapons
Ich will mit dem Handel anfangen / I will start with trade
$p(w_s \mid \text{Handel})$ from these three sentence pairs: trade 0.66, trafficking 0.33
A fuller distribution $p(w_s \mid \text{Handel})$: trade 0.768, trafficking 0.057, commerce 0.05, trading 0.03, market 0.02, marketing 0.02, business 0.013, traffic 0.007, sales 0.002, deal 0.002, ...
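The small distribution for Handel can be reproduced by relative-frequency counting over word-aligned pairs. This is only a sketch: the alignments below are written by hand for the three example sentences, and the slides do not say which aligner or estimator was actually used.

```python
from collections import Counter


def translation_probs(aligned_pairs, source_word):
    """Relative-frequency estimate of p(w_s | source_word) from aligned pairs."""
    counts = Counter(en for de, en in aligned_pairs if de == source_word)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {en: c / total for en, c in counts.items()}


# Hand-written (assumed) alignments of "Handel" in the three sentence pairs.
pairs = [
    ("Handel", "trafficking"),
    ("Handel", "trade"),
    ("Handel", "trade"),
]

probs = translation_probs(pairs, "Handel")
print(probs)
```

The result is two thirds for trade and one third for trafficking, which agrees with the 0.66 and 0.33 shown on the slide up to rounding.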
Small Target Training Set
8 target languages, CoNLL shared task treebanks
Train on 400 trees from CoNLL training set
Test on CoNLL test set
Universal POS tags generated from gold tags using mapping of Petrov et al. (2011)
Source corpus: parsed English Gigaword
Small Target Training Set: UAS (8 languages averaged), 400 train trees
DELEX: 75.1
DELEX+PROJ (MANUAL): 76.7 (+1.6)
DELEX+PROJ (AUTOMATIC): 77.4 (+2.3)
Small Target Training Set: open-class token coverage of each lexicon (%)
MANUAL: 60.8
AUTOMATIC: 89.9
Table 14:
Lang  AUTOMATIC Voc  AUTOMATIC OCC  MANUAL Voc
DA    324K           0.91           22K
DE    320K           0.89           58K
EL    196K           0.94           23K
ES    165K           0.89           206K
IT    158K           0.91           78K
NL    251K           0.87           50K
PT    165K           0.85           46K
SV    307K           0.93           28K
Small Target Training Set: UAS (8 languages averaged) by number of target-language training trees
100 trees: DELEX 71.8, DELEX+PROJ 73.5 (+1.7)
200 trees: DELEX 73.6, DELEX+PROJ 75.7 (+2.1)
400 trees: DELEX 75.1, DELEX+PROJ 77.4 (+2.3)
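UAS (unlabeled attachment score) is the percentage of tokens whose predicted head matches the gold head. The sketch below computes it for one sentence. The head indices are hypothetical (1-based, 0 for the root), and details such as punctuation handling in the actual evaluation are not stated on the slides.

```python
def uas(gold_heads, pred_heads):
    """Unlabeled attachment score: percentage of tokens with the correct head."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must have the same length")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)


# Hypothetical heads for "Mozambique hope on trade with other members".
gold = [2, 0, 2, 3, 4, 7, 5]  # "with" attaches to "trade" (token 4)
pred = [2, 0, 2, 3, 2, 7, 5]  # parser attaches "with" to "hope" (token 2)

print(round(uas(gold, pred), 1))  # 85.7 (6 of 7 heads correct)
```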
No Target Training Set
Train: ES trees, PT trees, SV trees, ...
Test: DE CoNLL test set
ES: DET NOUN VERB ADJ / El amarillismo es imparcial
PT: NOUN VERB NOUN PREP DET NOUN / Bruxelas adia fiscalidade de os combustíveis
SV: PRON VERB PRON ADV / Jag tycker det inte
[McDonald et al. (2011)]
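A delexicalized parser in the style of McDonald et al. (2011) sees such training trees with the words removed, leaving only POS tags and heads. A minimal sketch follows. The tuple layout and the head indices for the Spanish example are assumptions; only the words and tags come from the slide.

```python
def delexicalize(tree):
    """Drop word forms, keeping (universal POS tag, head index) for each token."""
    return [(tag, head) for _word, tag, head in tree]


# Spanish example from the slide; heads are assumed (1-based, 0 = root).
es_tree = [
    ("El", "DET", 2),
    ("amarillismo", "NOUN", 3),
    ("es", "VERB", 0),
    ("imparcial", "ADJ", 3),
]

print(delexicalize(es_tree))
# [('DET', 2), ('NOUN', 3), ('VERB', 0), ('ADJ', 3)]
```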
No Target Training Set: UAS (8 languages averaged)
Type-level* (this work): DELEX 60.6, DELEX+PROJ 62.6 (+2.0)
Token-level (McDonald et al. 2011): MULTI-DIR 61.1, MULTI-PROJ 63.8 (+2.7)
Gains
[Figure: gain over delexicalized baseline (axis 0 to 10) for each language: DA, DE, EL, ES, IT, NL, PT, SV]
Related Work
Täckström et al. (2012): type-level transfer using word clusters
Naseem et al. (2012): adaptation of universal POS transfer to the target language
Conclusion
Bilingual dictionaries allow transfer of linguistic knowledge at a type level
Type-level transfer:
  Comparable to token-level transfer
  Simpler to implement
  Does not require bitext
Thank you!