Universal Dependencies

1 Universal Dependencies
Joakim Nivre
Uppsala University, Department of Linguistics and Philology
Based on collaborative work with Jinho Choi, Timothy Dozat, Filip Ginter, Yoav Goldberg, Jan Hajič, Chris Manning, Ryan McDonald, Natalia Silveira, Marie de Marneffe, Slav Petrov, Sampo Pyysalo, Reut Tsarfaty, Daniel Zeman

4 Natural Language Processing
- Linguistic diversity makes our life harder
  - Why 90% parsing accuracy for English but only 80% for Finnish?
  - Can we even compare the numbers?
- Current NLP relies heavily on linguistic annotation
  - Treebank annotation schemes vary across languages
  - How do we avoid comparing apples and oranges?

13 Which languages are most closely related?
[Figure: dependency trees for the same sentence in three languages, each drawn under a different treebank annotation scheme; the cc and conj relations of the coordination are attached differently in each tree. Fractions shown beside the trees: 1/5, 2/5, 2/5.]
Language X (Swedish): En katt jagar råttor och möss
Language Y (Danish): En kat jager rotter og mus
Language Z (English): A cat chases rats and mice

18 Why is this a problem?
- Hard to compare empirical results across languages
- Hard to evaluate cross-lingual learning
- Hard to build and maintain multilingual systems
- Hard to make progress towards a universal parser

22 Universal Dependencies?
[Figure: the three parallel sentences annotated in a single scheme, with cc and conj relations on the coordination.]
En katt jagar råttor och möss
En kat jager rotter og mus
A cat chases rats and mice

Annotation layers, illustrated on a French sentence (relation shown: advmod):

Form       POS    Features
Toutefois  ADV
,          PUNCT
les        DET    Definite=Def, Number=Plur
filles     NOUN   Gender=Fem, Number=Plur
adorent    VERB   Number=Plur, Person=3, Tense=Pres
les        DET    Definite=Def, Number=Plur
desserts   NOUN   Gender=Masc, Number=Plur
.          PUNCT

- Dependency relations
- Part-of-speech tags
- Morphological features

30 Universal Dependencies
[Diagram: earlier projects leading to Universal Dependencies]
- Stanford Dependencies
- CLEAR
- Google UD
- Stanford UD
- HamleDT
- Interset
- Google universal tags

32 Universal Dependencies
Milestones:
- Kick-off meeting at EACL in Gothenburg, April 2014
- Release of annotation guidelines, Version 1, October 2014
- Release of treebanks for 10 languages, January 2015
- Release of treebanks for 18 languages, May 2015
Open community effort: anyone can contribute!

36 Goals and Requirements
- Cross-linguistically consistent grammatical annotation
- Support multilingual research and development in NLP
- Based on common usage and existing de facto standards

40 Design Principles
- Dependency
  - Widely used in practical NLP systems
  - Available in treebanks for many languages
- Lexicalism
  - Basic annotation units are words (syntactic words)
  - Words have morphological properties
  - Words enter into syntactic relations
- Recoverability
  - Transparent mapping from input text to word segmentation

43 Golden Rules
- Maximize parallelism
  - Don't annotate the same thing in different ways
  - Don't make different things look the same
- But don't overdo it
  - Don't annotate things that are not there
  - Languages select from a universal pool of categories
  - Allow language-specific extensions

47 Morphology

Form       Lemma      POS    Features
Toutefois  toutefois  ADV
,          ,          PUNCT
les        le         DET    Definite=Def, Number=Plur
filles     fille      NOUN   Gender=Fem, Number=Plur
adorent    adorer     VERB   Number=Plur, Person=3, Tense=Pres
les        le         DET    Definite=Def, Number=Plur
desserts   dessert    NOUN   Gender=Masc, Number=Plur
.          .          PUNCT

- Lemma: represents the semantic content of the word
- Part-of-speech tag: represents the abstract lexical category associated with the word
- Features: represent lexical and grammatical properties associated with the lemma or the particular word form
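The three morphological layers can be sketched as a small token record. This is only an illustration: the `Token` class and `parse_feats` helper are hypothetical names, and the pipe-separated `Name=Value|Name=Value` feature string is an assumption borrowed from the CoNLL-U file format, which the slides do not mention.

```python
from dataclasses import dataclass, field


def parse_feats(feats: str) -> dict:
    """Parse 'Name=Value|Name=Value' into a dict ('_' or '' means no features)."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


@dataclass
class Token:
    form: str    # word form as it appears in the text
    lemma: str   # lemma (semantic content of the word)
    upos: str    # universal part-of-speech tag
    feats: dict = field(default_factory=dict)  # morphological features


# The French example sentence from the slide.
sentence = [
    Token("Toutefois", "toutefois", "ADV"),
    Token(",", ",", "PUNCT"),
    Token("les", "le", "DET", parse_feats("Definite=Def|Number=Plur")),
    Token("filles", "fille", "NOUN", parse_feats("Gender=Fem|Number=Plur")),
    Token("adorent", "adorer", "VERB", parse_feats("Number=Plur|Person=3|Tense=Pres")),
    Token("les", "le", "DET", parse_feats("Definite=Def|Number=Plur")),
    Token("desserts", "dessert", "NOUN", parse_feats("Gender=Masc|Number=Plur")),
    Token(".", ".", "PUNCT"),
]

# Features are attribute-value pairs, so they can be queried uniformly.
plural_forms = [t.form for t in sentence if t.feats.get("Number") == "Plur"]
```

Keeping features as a dictionary rather than a single fused tag is what lets the same query (here, `Number=Plur`) work regardless of which other features a language uses.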

48 Part-of-Speech Tags

Open    Closed  Other
ADJ     ADP     PUNCT
ADV     AUX     SYM
INTJ    CONJ    X
NOUN    DET
PROPN   NUM
VERB    PART
        PRON
        SCONJ

- Taxonomy of 17 universal part-of-speech tags, based on the Google Universal Tagset (Petrov et al., 2012)
- All languages use the same inventory, but not all tags have to be used by all languages
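As a rough sketch, the tag inventory on this slide can be written down as three sets and used to check a tagged corpus. The function names `tag_class` and `unknown_tags` are hypothetical, and the tag `NN` in the usage note is just an example of a tag from outside the inventory.

```python
# The 17 universal part-of-speech tags, grouped as on the slide.
OPEN_CLASS = {"ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"}
CLOSED_CLASS = {"ADP", "AUX", "CONJ", "DET", "NUM", "PART", "PRON", "SCONJ"}
OTHER = {"PUNCT", "SYM", "X"}
UNIVERSAL_TAGS = OPEN_CLASS | CLOSED_CLASS | OTHER


def tag_class(tag: str) -> str:
    """Return 'open', 'closed' or 'other' for a universal tag."""
    if tag in OPEN_CLASS:
        return "open"
    if tag in CLOSED_CLASS:
        return "closed"
    if tag in OTHER:
        return "other"
    raise ValueError(f"not a universal tag: {tag}")


def unknown_tags(tags) -> list:
    """Return the tags that are not part of the universal inventory, sorted."""
    return sorted(set(tags) - UNIVERSAL_TAGS)
```

A language does not need to use every tag, so a check like `unknown_tags(["DET", "NOUN", "NN"])` only reports tags outside the inventory (here `NN`); it says nothing about tags a treebank leaves unused.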

49 Features

Lexical    Inflectional (Nominal)  Inflectional (Verbal)
PronType   Gender                  VerbForm
NumType    Animacy                 Mood
Poss       Number                  Tense
Reflex     Case                    Aspect
           Definite                Voice
           Degree                  Person
                                   Negative

- Standardized inventory of morphological features, based on the Interset system (Zeman, 2008)
- Languages select relevant features and can add language-specific features or values with documentation
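A minimal sketch of how a treebank's feature names might be checked against this inventory, separating universal features from language-specific additions. The function `split_features` and the feature name `MyLangFeature` are hypothetical, introduced only for illustration.

```python
# Universal feature names, grouped as on the slide.
UNIVERSAL_FEATURES = {
    "lexical": {"PronType", "NumType", "Poss", "Reflex"},
    "nominal": {"Gender", "Animacy", "Number", "Case", "Definite", "Degree"},
    "verbal": {"VerbForm", "Mood", "Tense", "Aspect", "Voice", "Person", "Negative"},
}


def split_features(names):
    """Split feature names into (universal, language_specific), each sorted."""
    universal = set().union(*UNIVERSAL_FEATURES.values())
    known = sorted(n for n in names if n in universal)
    extra = sorted(n for n in names if n not in universal)
    return known, extra


# "MyLangFeature" stands in for a documented language-specific addition.
known, extra = split_features(["Number", "MyLangFeature", "Tense"])
```

The language-specific list is not an error report: the slide allows such additions as long as they are documented.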

53 Syntax
The cat could have chased all the dogs down the street.
DET NOUN AUX AUX VERB DET DET NOUN ADP DET NOUN PUNCT
Relations shown: aux, aux, nmod, case

- Content words are related by dependency relations
- Function words attach to the content word they modify
- Punctuation attaches to the head of the phrase or clause
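The principles on this slide can be checked on a head-indexed encoding of the example sentence. This is a sketch under assumptions: only aux, nmod and case are legible in the slide transcript, so the remaining labels (det, nsubj, dobj, punct, root) and all head indices are filled in here following the UD version 1 guidelines, and the helper names are hypothetical.

```python
# (form, upos, head, deprel); head is a 1-based token index, 0 marks the root.
# Labels other than aux, nmod and case are assumptions (see lead-in).
TREE = [
    ("The", "DET", 2, "det"),
    ("cat", "NOUN", 5, "nsubj"),
    ("could", "AUX", 5, "aux"),
    ("have", "AUX", 5, "aux"),
    ("chased", "VERB", 0, "root"),
    ("all", "DET", 8, "det"),
    ("the", "DET", 8, "det"),
    ("dogs", "NOUN", 5, "dobj"),
    ("down", "ADP", 11, "case"),
    ("the", "DET", 11, "det"),
    ("street", "NOUN", 5, "nmod"),
    (".", "PUNCT", 5, "punct"),
]

CONTENT_TAGS = {"ADJ", "ADV", "NOUN", "PROPN", "VERB"}
FUNCTION_TAGS = {"ADP", "AUX", "DET"}  # the function-word tags used in this sentence


def content_skeleton(tree):
    """Arcs (dependent, relation, head) that link content words to each other."""
    return [
        (form, deprel, tree[head - 1][0])
        for form, upos, head, deprel in tree
        if upos in CONTENT_TAGS and head != 0
    ]


def function_words_attach_to_content(tree):
    """True if every function word has a content word as its head."""
    return all(
        tree[head - 1][1] in CONTENT_TAGS
        for _, upos, head, _ in tree
        if upos in FUNCTION_TAGS
    )
```

Stripping the function words leaves `cat`, `dogs` and `street` all attached directly to `chased`, which is the part of the tree most likely to stay parallel across languages.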

57 [Figure: a passive clause in English and in Swedish]
The dog was chased by the cat.
DET NOUN AUX VERB ADP DET NOUN PUNCT
Relations shown: nsubjpass, auxpass, nmod, case

Hunden jagades av katten.
NOUN VERB ADP NOUN PUNCT
Features: Hunden Definite=Def; jagades Voice=Pass; katten Definite=Def
Relations shown: nsubjpass, nmod, case

60 Dependency Relations
- Taxonomy of 40 universal grammatical relations, broadly attested in language typology (de Marneffe et al., 2014)
- Language-specific subtypes may be added
- Organizing principles
  - Three types of structures: nominals, clauses, modifiers
  - Core arguments vs. other dependents (not complements vs. adjuncts)

61 Dependents of Clausal Predicates

          Nominal                          Clausal                         Other
Core      nsubj, nsubjpass, dobj, iobj     csubj, csubjpass, ccomp, xcomp
Non-Core  nmod, vocative, discourse, expl  advcl                           advmod, neg, aux, auxpass, cop, mark

65 [Figure: three example sentences with part-of-speech tags and dependency relations]
Mary was quietly reading a book in the garden.
PROPN AUX ADV VERB DET NOUN ADP DET NOUN PUNCT
Relations shown: aux, advmod, nmod, case

If you are sick, you should not exercise.
SCONJ PRON AUX ADJ PUNCT PRON AUX ADV VERB PUNCT
Relations shown: mark, cop, advcl, aux, neg

Peter thought that he should stop smoking.
PROPN VERB SCONJ PRON AUX VERB VERB PUNCT
Relations shown: ccomp, mark, aux, xcomp

66 Dependents of Nominals

Nominal  Clausal  Other
nummod   acl      amod
appos             det
nmod              case

Cairo, the lovely capital of Egypt
PROPN PUNCT DET ADJ NOUN ADP PROPN
Relations shown: appos, amod, nmod, case

67 Coordination
Relations: conj, cc, punct

Huey, Dewey and Louie
PROPN PUNCT PROPN CONJ PROPN
Relations shown: conj, cc, conj

- Coordinate structures are headed by the first conjunct
- Subsequent conjuncts depend on it via the conj relation
- Conjunctions depend on it via the cc relation
- Punctuation marks depend on it via the punct relation
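The attachment rules for coordination can be sketched for the simple case where every conjunct is a single word. The function `coordination_arcs` is a hypothetical helper, and the label `punct` for punctuation is taken from the UD version 1 relation inventory rather than read off the slide transcript.

```python
# Relation assigned to non-conjunct tokens inside a coordination.
RELATION_BY_TAG = {"CONJ": "cc", "PUNCT": "punct"}


def coordination_arcs(tokens):
    """Return (dependent, relation, head) arcs over 1-based token indices.

    tokens is a list of (form, upos) pairs covering a flat coordination of
    single-word conjuncts. The first conjunct heads the whole structure.
    """
    if not tokens or tokens[0][1] in RELATION_BY_TAG:
        raise ValueError("a coordination must start with a conjunct")
    head = 1
    arcs = []
    for index, (_, upos) in enumerate(tokens[1:], start=2):
        # Conjunctions and punctuation get cc/punct; anything else is a conjunct.
        relation = RELATION_BY_TAG.get(upos, "conj")
        arcs.append((index, relation, head))
    return arcs


arcs = coordination_arcs([
    ("Huey", "PROPN"), (",", "PUNCT"), ("Dewey", "PROPN"),
    ("and", "CONJ"), ("Louie", "PROPN"),
])
```

Because every arc points at the first conjunct, the result is the same flat structure regardless of how many conjuncts follow; multi-word conjuncts would need their own internal heads, which this sketch does not handle.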

68 Multiword Expressions

Relation  Examples
mwe       in spite of, as well as, ad hoc
name      Roger Bacon, Carl XVI Gustaf, New York
compound  phone book, four thousand, dress up
goeswith  notwith standing, with out

- UD annotation does not permit words with spaces
- Multiword expressions are analysed using special relations
- The mwe, name and goeswith relations are always head-initial
- The compound relation reflects the internal structure

69 Other Relations

Relation    Explanation
parataxis   Loosely linked clauses of same rank
list        Lists without syntactic structure
remnant     Orphans in ellipsis linked to parallel elements
reparandum  Disfluency linked to (speech) repair
foreign     Elements within opaque stretches of code switching
dep         Unspecified dependency
root        Syntactically independent element of clause/phrase

70 Language-Specific Relations
- Language-specific relations are subtypes of universal relations added to capture important phenomena
- Subtyping permits us to back off to universal relations

Relation      Explanation
acl:relcl     Relative clause
compound:prt  Verb particle (dress up)
nmod:poss     Genitive nominal (Mary's book)
nmod:agent    Agent in passive (saved by the bell)
cc:preconj    Preconjunction (both ... and)
det:predet    Predeterminer (all those ...)
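Backing off to universal relations amounts to dropping everything after the colon. The sketch below shows this together with one possible use, comparing relation labels with and without back-off; the function names and the toy label lists are hypothetical and not taken from the slides.

```python
def universal_relation(relation: str) -> str:
    """Back off a subtyped relation such as 'acl:relcl' to its universal type."""
    return relation.split(":", 1)[0]


def label_accuracy(gold, predicted, back_off=False):
    """Fraction of positions where the relation labels agree.

    With back_off=True both sides are reduced to universal relations first,
    so 'nmod:poss' and 'nmod' count as a match.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    if back_off:
        gold = [universal_relation(r) for r in gold]
        predicted = [universal_relation(r) for r in predicted]
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


# Toy example: the prediction misses only the language-specific subtype.
gold = ["nmod:poss", "acl:relcl", "case"]
predicted = ["nmod", "acl:relcl", "case"]
strict = label_accuracy(gold, predicted)
relaxed = label_accuracy(gold, predicted, back_off=True)
```

Comparing at the universal level is what makes labels from treebanks with different language-specific extensions comparable at all.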

71 Papers
- Nivre, J. (2015) Towards a Universal Grammar of Natural Language Processing. In Proceedings of CICLing.
- Naseem, T., Chen, H., Barzilay, R. and Johnson, M. (2010) Using Universal Linguistic Knowledge to Guide Grammar Induction. In Proceedings of EMNLP.
- McDonald, R., Petrov, S. and Hall, K. (2011) Multi-Source Transfer of Delexicalized Dependency Parsers. In Proceedings of EMNLP.
- Zeman, D., Mareček, D., Popel, M., Ramasamy, L., Štěpánek, J., Žabokrtský, Z. and Hajič, J. (2012) HamleDT: To Parse or Not to Parse? In Proceedings of LREC.
- McDonald, R., Nivre, J., Quirmbach-Brundage, Y., Goldberg, Y., Das, D., Ganchev, K., Hall, K., Petrov, S., Zhang, H., Täckström, O., Bedini, C., Bertomeu Castelló, N. and Lee, J. (2013) Universal Dependency Annotation for Multilingual Parsing. In Proceedings of ACL (Short Papers).
- De Marneffe, M.-C., Dozat, T., Silveira, N., Haverinen, K., Ginter, F., Nivre, J. and Manning, C.D. (2014) Universal Stanford Dependencies: A Cross-Linguistic Typology. In Proceedings of LREC.
- Swanson, B. and Charniak, E. (2014) Data Driven Language Transfer Hypotheses. In Proceedings of EACL.
- Östling, R. (2015) Word Order Typology through Multilingual Word Alignment. In Proceedings of ACL (Short Papers).
- Futrell, R., Mahowald, K. and Gibson, E. (2015) Quantifying Word Order Freedom in Dependency Corpora. In Proceedings of Depling.

77 Research Problems
- Annotation
  - Produce guidelines and/or annotation for language X
  - Study the annotation of construction Y across languages
- Parsing
  - Develop and/or evaluate a parser for language X
  - Study cross-lingual transfer learning and/or annotation projection
  - Develop better evaluation schemes for multilingual parsing
- Typology
  - Study word order patterns in language X
  - Compare the realisation of construction Y across languages

78 Questions?


More information

Paraphrasing controlled English texts

Paraphrasing controlled English texts Paraphrasing controlled English texts Kaarel Kaljurand Institute of Computational Linguistics, University of Zurich [email protected] Abstract. We discuss paraphrasing controlled English texts, by defining

More information

Statistical Machine Translation

Statistical Machine Translation Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language

More information

EAP 1161 1660 Grammar Competencies Levels 1 6

EAP 1161 1660 Grammar Competencies Levels 1 6 EAP 1161 1660 Grammar Competencies Levels 1 6 Grammar Committee Representatives: Marcia Captan, Maria Fallon, Ira Fernandez, Myra Redman, Geraldine Walker Developmental Editor: Cynthia M. Schuemann Approved:

More information

Machine Learning for natural language processing

Machine Learning for natural language processing Machine Learning for natural language processing Introduction Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 13 Introduction Goal of machine learning: Automatically learn how to

More information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 [email protected] Christopher D. Manning Department of

More information

Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University

Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University 1. Introduction This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect
