UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu and Hindi
|
|
|
- Nigel James
- 9 years ago
- Views:
Transcription
1 UrduGram: Towards a Deep, Large-Coverage Grammar for Urdu and Hindi Tafseer Ahmed, Tina Bögel, Miriam Butt, Annette Hautli, Ghulam Raza, Sebastian Sulger and Veronika Walther Universität Konstanz FB Kolloquium, May /60
2 Preview 1 Urdu & the UrduGram Project 2 Urdu Transliterator 3 Syntax 4 Semantics 2 /60
3 Urdu & the UrduGram Project Urdu Urdu is 3 /60
4 Urdu & the UrduGram Project Urdu Urdu is a South Asian language spoken primarily in Pakistan and India 3 /60
5 Urdu & the UrduGram Project Urdu Urdu is a South Asian language spoken primarily in Pakistan and India descended from (a version of) Sanskrit (sister language of Latin) 3 /60
6 Urdu & the UrduGram Project Urdu Urdu is a South Asian language spoken primarily in Pakistan and India descended from (a version of) Sanskrit (sister language of Latin) structurally identical to Hindi (spoken mainly in India) 3 /60
7 Urdu & the UrduGram Project Urdu Urdu is a South Asian language spoken primarily in Pakistan and India descended from (a version of) Sanskrit (sister language of Latin) structurally identical to Hindi (spoken mainly in India) together with Hindi the fourth most spoken language in the world ( 250 million native speakers) 3 /60
8 Urdu & the UrduGram Project Urdu and Hindi The two languages are regarded as structurally identical: 4 /60
9 Urdu & the UrduGram Project Urdu and Hindi The two languages are regarded as structurally identical: syntax/morphology are practically identical 4 /60
10 Urdu & the UrduGram Project Urdu and Hindi The two languages are regarded as structurally identical: syntax/morphology are practically identical vocabulary is practically identical (Urdu: borrowed from Persian/Arabic; Hindi: borrowed from Sanskrit) 4 /60
11 Urdu & the UrduGram Project Urdu and Hindi The two languages are regarded as structurally identical: syntax/morphology are practically identical vocabulary is practically identical (Urdu: borrowed from Persian/Arabic; Hindi: borrowed from Sanskrit) main difference is in the script 4 /60
12 Urdu & the UrduGram Project Urdu and Hindi The two languages are regarded as structurally identical: syntax/morphology are practically identical vocabulary is practically identical (Urdu: borrowed from Persian/Arabic; Hindi: borrowed from Sanskrit) main difference is in the script We are developing a single grammar and lexicon for both of the languages! 4 /60
13 Urdu & the UrduGram Project Context of Work Computational LFG grammar in development in Konstanz 5 /60
14 Urdu & the UrduGram Project Context of Work Computational LFG grammar in development in Konstanz Aim: large-scale LFG grammar for parsing Urdu/Hindi 5 /60
15 Urdu & the UrduGram Project Context of Work Computational LFG grammar in development in Konstanz Aim: large-scale LFG grammar for parsing Urdu/Hindi Grammar is part of the ParGram project 5 /60
16 Urdu & the UrduGram Project Context of Work Computational LFG grammar in development in Konstanz Aim: large-scale LFG grammar for parsing Urdu/Hindi Grammar is part of the ParGram project Collaborative, world-wide research project 5 /60
17 Urdu & the UrduGram Project Context of Work Computational LFG grammar in development in Konstanz Aim: large-scale LFG grammar for parsing Urdu/Hindi Grammar is part of the ParGram project Collaborative, world-wide research project Devoted to developing parallel LFG grammars for a variety of languages 5 /60
18 Urdu & the UrduGram Project Context of Work Computational LFG grammar in development in Konstanz Aim: large-scale LFG grammar for parsing Urdu/Hindi Grammar is part of the ParGram project Collaborative, world-wide research project Devoted to developing parallel LFG grammars for a variety of languages Features and analyses are kept parallel for easy transfer between languages 5 /60
19 Urdu & the UrduGram Project Context of Work Computational LFG grammar in development in Konstanz Aim: large-scale LFG grammar for parsing Urdu/Hindi Grammar is part of the ParGram project Collaborative, world-wide research project Devoted to developing parallel LFG grammars for a variety of languages Features and analyses are kept parallel for easy transfer between languages Languages involved: 5 /60
20 Urdu & the UrduGram Project Context of Work Computational LFG grammar in development in Konstanz Aim: large-scale LFG grammar for parsing Urdu/Hindi Grammar is part of the ParGram project Collaborative, world-wide research project Devoted to developing parallel LFG grammars for a variety of languages Features and analyses are kept parallel for easy transfer between languages Languages involved: English, German, French, Japanese, Norwegian, Welsh, Georgian, Hungarian, Turkish, Chinese, Indonesian, Urdu (among many others) 5 /60
21 Urdu & the UrduGram Project The ParGram Grammar Architecture 6 /60
22 Urdu & the UrduGram Project The Parallel in ParGram Analysis for transitive sentence in English ParGram grammar (F-Structure, Functional Structure ): 7 /60
23 Urdu & the UrduGram Project The Parallel in ParGram Analysis for transitive sentence in English ParGram grammar (F-Structure, Functional Structure ): "Nadya saw the book." PRED SUBJ OBJ CHECK 'see<[1:nadya], [113:book]>' PRED 'Nadya' CHECK _LEX-SOURCE morphology, _PROPER known-name NTYPE NSEM PROPER NAME-TYPE first_name, PROPER-TYPE name NSYN proper 1 CASE nom, GEND-SEM female, HUMAN +, NUM sg, PERS 3 PRED 'book' CHECK _LEX-SOURCE countnoun-lex NTYPE NSEM COMMON count NSYN common SPEC PRED 'the' DET DET-TYPE def 113 CASE obl, NUM sg, PERS 3 _SUBCAT-FRAME V-SUBJ-OBJ TNS-ASP MOOD indicative, PERF -_, PROG -_, TENSE past 57 CLAUSE-TYPE decl, PASSIVE -, VTYPE main 7 /60
24 Urdu & the UrduGram Project The Parallel in ParGram (cont.) Analysis for the same transitive sentence in Urdu ParGram grammar (F-Structure, Functional Structure ): 8 /60
25 Urdu & the UrduGram Project The Parallel in ParGram (cont.) Analysis for the same transitive sentence in Urdu ParGram grammar (F-Structure, Functional Structure ): "nadiyah ne kitab dekhi" PRED 'dekh<[1:nadiyah], [20:kitAb]>' PRED 'nadiyah' CHECK _NMORPH obl SUBJ OBJ CHECK NTYPE NSEM PROPER PROPER-TYPE name NSYN proper SEM-PROP SPECIFIC + 1 CASE erg, GEND fem, NUM sg, PERS 3 PRED 'kitab' NTYPE NSEM COMMON count NSYN common 20 CASE nom, GEND fem, NUM sg, PERS 3 _VMORPH _MTYPE infl _RESTRICTED -, _SUBCAT-FRAME V-SUBJ-OBJ, _VFORM perf LEX-SEM AGENTIVE + TNS-ASP ASPECT perf, MOOD indicative 42 CLAUSE-TYPE decl, PASSIVE -, VTYPE main 8 /60
26 Urdu & the UrduGram Project The Parallel in ParGram (cont.) Analysis for the same transitive sentence in Urdu ParGram grammar (F-Structure, Functional Structure ): "nadiyah ne kitab dekhi" PRED 'dekh<[1:nadiyah], [20:kitAb]>' PRED 'nadiyah' CHECK _NMORPH obl SUBJ OBJ CHECK NTYPE NSEM PROPER PROPER-TYPE name NSYN proper SEM-PROP SPECIFIC + 1 CASE erg, GEND fem, NUM sg, PERS 3 PRED 'kitab' NTYPE NSEM COMMON count NSYN common 20 CASE nom, GEND fem, NUM sg, PERS 3 _VMORPH _MTYPE infl _RESTRICTED -, _SUBCAT-FRAME V-SUBJ-OBJ, _VFORM perf LEX-SEM AGENTIVE + TNS-ASP ASPECT perf, MOOD indicative 42 CLAUSE-TYPE decl, PASSIVE -, VTYPE main Analyses are kept parallel where possible 8 /60
27 Urdu & the UrduGram Project The Parallel in ParGram (cont.) Analysis for the same transitive sentence in Urdu ParGram grammar (F-Structure, Functional Structure ): "nadiyah ne kitab dekhi" PRED 'dekh<[1:nadiyah], [20:kitAb]>' PRED 'nadiyah' CHECK _NMORPH obl SUBJ OBJ CHECK NTYPE NSEM PROPER PROPER-TYPE name NSYN proper SEM-PROP SPECIFIC + 1 CASE erg, GEND fem, NUM sg, PERS 3 PRED 'kitab' NTYPE NSEM COMMON count NSYN common 20 CASE nom, GEND fem, NUM sg, PERS 3 _VMORPH _MTYPE infl _RESTRICTED -, _SUBCAT-FRAME V-SUBJ-OBJ, _VFORM perf LEX-SEM AGENTIVE + TNS-ASP ASPECT perf, MOOD indicative 42 CLAUSE-TYPE decl, PASSIVE -, VTYPE main Analyses are kept parallel where possible Features are kept parallel where possible 8 /60
28 Urdu & the UrduGram Project The Parallel in ParGram (cont.) Demo: Large-Scale English ParGram Grammar 9 /60
29 Urdu & the UrduGram Project Computational Grammars - What For? The Motivation behind ParGram 10 /60
30 Urdu & the UrduGram Project Computational Grammars - What For? The Motivation behind ParGram The ParGram project is working on Deep Grammars 10 /60
31 Urdu & the UrduGram Project Computational Grammars - What For? The Motivation behind ParGram The ParGram project is working on Deep Grammars Provide detailed syntactic and semantic analyses 10 /60
32 Urdu & the UrduGram Project Computational Grammars - What For? The Motivation behind ParGram The ParGram project is working on Deep Grammars Provide detailed syntactic and semantic analyses Encode grammatical functions, tense, number etc. 10 /60
33 Urdu & the UrduGram Project Computational Grammars - What For? The Motivation behind ParGram The ParGram project is working on Deep Grammars Provide detailed syntactic and semantic analyses Encode grammatical functions, tense, number etc. Linguistically motivated 10 /60
34 Urdu & the UrduGram Project Computational Grammars - What For? The Motivation behind ParGram The ParGram project is working on Deep Grammars Provide detailed syntactic and semantic analyses Encode grammatical functions, tense, number etc. Linguistically motivated Usually manually constructed ( linguistic intuition) 10 /60
35 Urdu & the UrduGram Project Computational Grammars - What For? Possible Applications 11 /60
36 Urdu & the UrduGram Project Computational Grammars - What For? Possible Applications Large-Coverage, Deep Computational Grammars can be useful for: 11 /60
37 Urdu & the UrduGram Project Computational Grammars - What For? Possible Applications Large-Coverage, Deep Computational Grammars can be useful for: Meaning-Sensitive Applications 11 /60
38 Urdu & the UrduGram Project Computational Grammars - What For? Possible Applications Large-Coverage, Deep Computational Grammars can be useful for: Meaning-Sensitive Applications Web-Search 11 /60
39 Urdu & the UrduGram Project Computational Grammars - What For? Possible Applications Large-Coverage, Deep Computational Grammars can be useful for: Meaning-Sensitive Applications Web-Search Question-Answering 11 /60
40 Urdu & the UrduGram Project Computational Grammars - What For? Possible Applications Large-Coverage, Deep Computational Grammars can be useful for: Meaning-Sensitive Applications Web-Search Question-Answering Knowledge Representation 11 /60
41 Urdu & the UrduGram Project Computational Grammars - What For? Possible Applications Large-Coverage, Deep Computational Grammars can be useful for: Meaning-Sensitive Applications Web-Search Question-Answering Knowledge Representation Text Summarization 11 /60
42 Urdu & the UrduGram Project Computational Grammars - What For? Possible Applications Large-Coverage, Deep Computational Grammars can be useful for: Meaning-Sensitive Applications Web-Search Question-Answering Knowledge Representation Text Summarization Machine Translation 11 /60
43 Urdu & the UrduGram Project Computational Grammars - What For? Possible Applications Large-Coverage, Deep Computational Grammars can be useful for: Meaning-Sensitive Applications Web-Search Question-Answering Knowledge Representation Text Summarization Machine Translation Computer-Assisted Language Learning 11 /60
44 Urdu & the UrduGram Project Computational Grammars - What For? powerset.com 12 /60
45 Urdu & the UrduGram Project Computational Grammars - What For? powerset.com Semantic search engine 12 /60
46 Urdu & the UrduGram Project Computational Grammars - What For? powerset.com Semantic search engine Uses large-scale English LFG 12 /60
47 Urdu & the UrduGram Project Computational Grammars - What For? powerset.com Semantic search engine Uses large-scale English LFG Works on English Wikipedia 12 /60
48 Urdu & the UrduGram Project Computational Grammars - What For? powerset.com Semantic search engine Uses large-scale English LFG Works on English Wikipedia Parses query and matches with parsed corpus 12 /60
49 Urdu & the UrduGram Project Computational Grammars - What For? powerset.com Semantic search engine Uses large-scale English LFG Works on English Wikipedia Parses query and matches with parsed corpus Can give better results than regular search engines 12 /60
50 Urdu & the UrduGram Project Computational Grammars - What For? powerset.com Semantic search engine Uses large-scale English LFG Works on English Wikipedia Parses query and matches with parsed corpus Can give better results than regular search engines (Example: X was bought by Y vs. Y acquired X ) 12 /60
51 Urdu & the UrduGram Project Our Overall Architecture Our parsing architecture currently looks like this: 13 /60
52 Urdu & the UrduGram Project Our Overall Architecture Our parsing architecture currently looks like this: tokenizer 13 /60
53 Urdu & the UrduGram Project Our Overall Architecture Our parsing architecture currently looks like this: tokenizer transliterator (Urdu & Hindi to Roman script) 13 /60
54 Urdu & the UrduGram Project Our Overall Architecture Our parsing architecture currently looks like this: tokenizer transliterator (Urdu & Hindi to Roman script) morphology (fst) 13 /60
55 Urdu & the UrduGram Project Our Overall Architecture Our parsing architecture currently looks like this: tokenizer transliterator (Urdu & Hindi to Roman script) morphology (fst) syntax (c- and f-structure) (xle) 13 /60
56 Urdu & the UrduGram Project Our Overall Architecture Our parsing architecture currently looks like this: tokenizer transliterator (Urdu & Hindi to Roman script) morphology (fst) syntax (c- and f-structure) (xle) semantics (xfr ordered rewriting) 13 /60
57 Urdu & the UrduGram Project Our Overall Architecture Our parsing architecture currently looks like this: tokenizer transliterator (Urdu & Hindi to Roman script) morphology (fst) syntax (c- and f-structure) (xle) semantics (xfr ordered rewriting) xle is the overall development platform, with the other modules (fst and xfr) being plugged into it. 13 /60
58 Urdu & the UrduGram Project Overview Overall Architecture tokenizer transliterator (Urdu & Hindi to Roman script) morphology (fst) syntax (c- and f-structure) (xle) semantics (xfr ordered rewriting) 14 /60
59 Urdu Transliterator Aim of the transliterator Our aim is to build and integrate a transliterator that allows for both, Urdu and Hindi, to be parsed and generated with the same grammar. couplet by the poet Mirza Ghalib Urdu Hindi Romanized Script (the XLE grammar) Right now we are working on the Urdu-Roman transliterator. 15 /60
60 Urdu Transliterator Transliteration scheme An excerpt from our scheme table: Unicode Urdu character Latin letter Phoneme in transliteration scheme b /b/ p /p/ t /t/ T /ú/ ^ j /j/ ^ c / > Ù/ 16 /60
61 Urdu Transliterator Basic idea of the transliterator use finite state transducer to allow for generation and parsing. 17 /60
62 Urdu Transliterator Basic idea of the transliterator use finite state transducer to allow for generation and parsing. Urdu script: parsing generating ASCII: ba 17 /60
63 Urdu Transliterator Basic idea of the transliterator use finite state transducer to allow for generation and parsing. Urdu script: parsing generating ASCII: ba The same concept will be used to create a transliterator for Hindi/Devanagari 17 /60
64 Urdu Transliterator Basic idea of the transliterator use finite state transducer to allow for generation and parsing. Urdu script: parsing generating ASCII: ba The same concept will be used to create a transliterator for Hindi/Devanagari This way we can parse Urdu script and generate Hindi script (and vice versa) 17 /60
65 Urdu Transliterator Position of the transliterator the transliterator is composed with the tokenizer (separates the words within a sentence) 18 /60
66 Urdu Transliterator Position of the transliterator the transliterator is composed with the tokenizer (separates the words within a sentence) tokenizer and transliterator are placed in front of the morphology 18 /60
67 Urdu Transliterator Position of the transliterator the transliterator is composed with the tokenizer (separates the words within a sentence) tokenizer and transliterator are placed in front of the morphology Input Transliterator Output kitab Input kitab Morphology Output kitab+noun+fem+sg+count XLE /60
68 Urdu Transliterator Example The transliterator at this position works quite well: (1) larke boy ki kitab gen book The boy s book Problem: long sentences or highly ambiguous words (when looking at script) need some time to parse. 19 /60
69 Urdu Transliterator Problems of the script - an example The problem of the vowels... Diacritics represent short vowels Urdu script Roman script ba bi bu 20 /60
70 Urdu Transliterator Problems of the script - an example The problem of the vowels... Diacritics represent short vowels Urdu script Roman script ba bi bu (2) nadya ne yasin ko kitab dekhne di Nadya erg Yasin dat see let Nadya let Yassin see the book 20 /60
71 Urdu Transliterator Problems of the script - an example The problem of the vowels... Diacritics represent short vowels Urdu script Roman script ba bi bu (2) nadya ne yasin ko kitab dekhne di Nadya erg Yasin dat see let Nadya let Yassin see the book Unfortunately, these diacritics tend to be left out. 20 /60
72 Urdu Transliterator Consequences If the input is without diacritics, e.g.... Urdu script letter combination representation translation ktab kitab book 21 /60
73 Urdu Transliterator Consequences If the input is without diacritics, e.g.... Urdu script letter combination representation translation ktab kitab book.. then there are all kinds of possible combinations: kitab, kutaab, kitabu, ikataubi, ukitabia, akatabu, aukataib /60
74 Urdu Transliterator Consequences If the input is without diacritics, e.g.... Urdu script letter combination representation translation ktab kitab book.. then there are all kinds of possible combinations: kitab, kutaab, kitabu, ikataubi, ukitabia, akatabu, aukataib... (demo) 21 /60
75 Urdu Transliterator Solution In order to restrict this overgeneration the possible letter combinations need to be constrained: 22 /60
76 Urdu Transliterator Solution In order to restrict this overgeneration the possible letter combinations need to be constrained: which vowels are actually allowed to cooccur? ai, but not ia? 22 /60
77 Urdu Transliterator Solution In order to restrict this overgeneration the possible letter combinations need to be constrained: which vowels are actually allowed to cooccur? ai, but not ia? which consonants are actually allowed to cooccur? initial kr, but not gr? 22 /60
78 Urdu Transliterator Solution In order to restrict this overgeneration the possible letter combinations need to be constrained: which vowels are actually allowed to cooccur? ai, but not ia? which consonants are actually allowed to cooccur? initial kr, but not gr? certain combinations with semi-vowels or consonants are not allowed: a short vowel followed by v may not be followed by u or i 22 /60
79 Urdu Transliterator Solution In order to restrict this overgeneration the possible letter combinations need to be constrained: which vowels are actually allowed to cooccur? ai, but not ia? which consonants are actually allowed to cooccur? initial kr, but not gr? certain combinations with semi-vowels or consonants are not allowed: a short vowel followed by v may not be followed by u or i certain positions are prohibited: a word can never end in a short vowel or begin with a short vowel that is only represented with a diacritic 22 /60
80 Urdu Transliterator Solution write rules and filters out of these constraints and apply them to the transliterator (demo) 23 /60
81 Urdu Transliterator Solution write rules and filters out of these constraints and apply them to the transliterator (demo) Problem: these rules cannot be found in the literature - they are a product of extensive manual labor 23 /60
82 Urdu Transliterator Solution write rules and filters out of these constraints and apply them to the transliterator (demo) Problem: these rules cannot be found in the literature - they are a product of extensive manual labor However, the transliterator works quite well now Some sentences are still a little slow (but I keep looking for possible restrictions) continue with generation of Urdu and the Hindi transliterator 23 /60
83 Urdu Transliterator Overview Overall Architecture tokenizer transliterator (Urdu & Hindi to Roman script) morphology (fst) syntax (c- and f-structure) (xle) semantics (xfr ordered rewriting) 24 /60
84 Syntax Syntax syntax component is at the core of Urdu grammar 25 /60
85 Syntax Syntax syntax component is at the core of Urdu grammar theoretical background: LFG 25 /60
86 Syntax Syntax syntax component is at the core of Urdu grammar theoretical background: LFG well-studied ( 30 years) framework with computational usability 25 /60
87 Syntax Syntax syntax component is at the core of Urdu grammar theoretical background: LFG well-studied ( 30 years) framework with computational usability c- and f-structures used for syntactic representation 25 /60
88 Syntax Syntax syntax component is at the core of Urdu grammar theoretical background: LFG well-studied ( 30 years) framework with computational usability c- and f-structures used for syntactic representation c-structure: basic constituent structure ( tree ) and linear precedence ( what parts belong together) 25 /60
89 Syntax Syntax syntax component is at the core of Urdu grammar theoretical background: LFG well-studied ( 30 years) framework with computational usability c- and f-structures used for syntactic representation c-structure: basic constituent structure ( tree ) and linear precedence ( what parts belong together) f-structure: encodes syntactic functions and properties 25 /60
90 Syntax Syntax CS 1: ROOT S KP VCmain NP V N hansi nadiyah "nadiyah hansi" PRED SUBJ CHECK 'hans<[1:nadiyah]>' PRED 'nadiyah' NTYPE NSEM PROPER PROPER-TYPE name NSYN proper SEM-PROP SPECIFIC + 1 CASE nom, GEND fem, NUM sg, PERS 3 _VMORPH _MTYPE infl _RESTRICTED -, _SUBCAT-FRAME V-SUBJ, _VFORM perf LEX-SEM VERB-CLASS unerg TNS-ASP ASPECT perf, MOOD indicative 18 CLAUSE-TYPE decl, PASSIVE -, VTYPE main 26 /60
91 Syntax Syntax CS 1: ROOT S KP VCmain NP V N hansi nadiyah "nadiyah hansi" PRED SUBJ CHECK 'hans<[1:nadiyah]>' PRED 'nadiyah' NTYPE NSEM PROPER PROPER-TYPE name NSYN proper SEM-PROP SPECIFIC + 1 CASE nom, GEND fem, NUM sg, PERS 3 _VMORPH _MTYPE infl _RESTRICTED -, _SUBCAT-FRAME V-SUBJ, _VFORM perf LEX-SEM VERB-CLASS unerg TNS-ASP ASPECT perf, MOOD indicative 18 CLAUSE-TYPE decl, PASSIVE -, VTYPE main current size: 53 phrase-structure rules, annotated for syntactic function (usual size of large-scale grammars: rules) 26 /60
92 Syntax Syntax CS 1: ROOT S KP VCmain NP V N hansi nadiyah "nadiyah hansi" PRED SUBJ CHECK 'hans<[1:nadiyah]>' PRED 'nadiyah' NTYPE NSEM PROPER PROPER-TYPE name NSYN proper SEM-PROP SPECIFIC + 1 CASE nom, GEND fem, NUM sg, PERS 3 _VMORPH _MTYPE infl _RESTRICTED -, _SUBCAT-FRAME V-SUBJ, _VFORM perf LEX-SEM VERB-CLASS unerg TNS-ASP ASPECT perf, MOOD indicative 18 CLAUSE-TYPE decl, PASSIVE -, VTYPE main current size: 53 phrase-structure rules, annotated for syntactic function (usual size of large-scale grammars: rules) coverage: basic clauses with free word order, NP syntax, tense and aspect, causative verbs, complex predicates, relative clauses, passives, semantically-based case marking 26 /60
93 Syntax Discontinuous NPs in Urdu 1 Well known discontinuities 2 NP-internal discontinuity in Urdu 3 LFG implementation 4 Conclusion 27 /60
94 Syntax Extraction from DP (2) a. Er hat viele Bücher über Logik gekauft. He has many books on logic bought He has bought many books about logic. b. Bücher über Logik hat er viele gekauft. c. Über Logik hat er viele Bücher gekauft. (German) (3) mantiq=par nida=ne Ek kitab logic=loc.on Nida=Erg one book.f.3sg xarid-i he. buy-perf be.pres Nida has purchased a book on logic. (Urdu) 28 /60
95 Syntax Quantifier Float (4) a. They all have bought a car. b. They have all bought a car. (5) Am ali=ne bahut kha-e mango.pl Ali=Erg many eat-perf Ali ate many mangoes. (Urdu) 29 /60
96 Syntax Constituent-level discontinuities in Urdu NP-internal discontinuity Discontinuous NP Discontinuous AP 30 /60
97 Syntax When NP-internal discontinuity occurs in Urdu The NP-internal discontinuity in Urdu can occur when the argument-taking noun is modified by: argument-taking adjectives argument-taking specifier nouns 31 /60
98 Syntax Argument-taking adjectives in Urdu Nr. Type of Argument Example of Adjective Phrase (i) Dative Marked sadr=ko hasil president=dat possessed possessed by the president (ii) Ablative Marked adliyah=se xaif courts=abl afraid afraid of courts (iii) Locative Marked buxar=men mubtala fever=loc.in suffered suffered with fever (iv) Adpositional sihat=ke liye muzir health=gen for harmful harmful for health 32 /60
99 Syntax Simple examples of argument-taking nouns (6) a. istisna immunity b. muqaddamat=se istisna court-case.pl=abl immunity immunity from court-cases c. muqaddamat=se AInI istisna court-case.pl=abl constitutional immunity constitutional immunity from court-cases 33 /60
100 Syntax Simple examples of argument-taking nouns (7) a. barifing briefing b. salamti=par barifing security=loc briefing briefing on security c. salamti=par tafsili barifing security=loc detailed briefing detailed briefing on security 34 /60
101 Syntax Simple examples of argument-taking nouns (8) a. mutalba demand b. ArmI-cIf=sE mutalba army-chief=abl demand demand to the army-chief c. ArmI-cIf=sE qanuni mutalba army-chief=abl legal demand legal demand to the army-chief 35 /60
102 Syntax Examples of discontinuous NPs (9)a1. sadr=ko 1 hasil 1 muqaddamat=se 2 president=dat possessed court-cases=abl AInI istisna 2 constitutional immunity Constitutional Immunity from court-cases possessed by the president a2. [ NP [ AP [ KP sadr=ko] hasil][ KP muqaddamat=se] AInI istisna] b. muqaddamat=se 2 sadr=ko 1 hasil 1 AInI istisna 2 c. sadr=ko 1 muqaddamat=se 2 hasil 1 AInI istisna 2 d. *hasil 1 muqaddamat=se 2 sadr=ko 1 AInI istisna 2 36 /60
103 Syntax Hierarchical structure of AP in NP CS 1: NP KP AP AP N NP K KP A A istis2na N se NP K h2as3il AInI muqaddamat N ko s3adr Figure: Hierarchical structure of AP in NP 37 /60
104 Syntax Examples of discontinuous NPs (10) a1. ArmI-cIf=sE 2 salamti=par 1 barifing 1 =ka mutalba 2 army-chief=abl security=loc.on briefing=gen demand The demand to the army chief for briefing on security a2. [ NP [ KP ArmI-cIf=sE][ KP [ NP [ KP salamti=par] barifing]=ka] mutalba] b. salamti=par 1 ArmI-cIf=sE 2 barifing 1 =ka mutalba 2 38 /60
105 Syntax Examples of discontinuous NPs (11) [ NP [ KP ArmI-cIf=sE] [ KP [ NP [ KP mulki salamti=par] army-chief=abl of-country security=loc.on tafsili barifing]=ka] qanuni mutalba] detailed briefing=gen legal demand The legal demand to the army chief for a detailed briefing on security of the country 39 /60
106 Syntax LFG implementation of NP-internal discontinuity Scrambling of elements in oval possible with some constraints NP KP/PP A+ A N Spec(N)/Arg(N/A) Arg-taking-adj Arg-less-adj Head-noun Figure: Word Order in Noun Phrases of Urdu 40 /60
107 Syntax Implementation Issues Free word order in an NP Relating arguments with corresponding heads Head last constraint 41 /60
108 Syntax LFG instruments used Shuffle operator (, ): To accommodate free word order of different elements in the noun phrases. Non-deterministic operator ( $ ): Relating the corresponding arguments to the corresponding heads. Head Precedence Operator ( <h ): To make it sure that the head must not precede its arguments in the noun phrases. 42 /60
109 Syntax An excerpt from Grammar Rules NP KP*: { (^ ADJUNCT $ OBL)=! (^ ADJUNCT $ OBJ- GO)=! (^ OBL) =! (^ OBJ-GO) =! }, for scrambling AP*:! $ (^ ADJUNCT ) N : ^ =! KP*: { (^ ADJUNCT $ OBL)=! (^ ADJUNCT) <h (^ ADJUNCT $ OBL)... }... Figure: Grammar Rules 43 /60
110 Syntax C-structure for a discontinuous NP CS 1: NP KP KP AP AP N NP K NP K A A istis2na N ko N se h2as3il AInI s3adr muqaddamat Figure: C-structure 44 /60
111 Syntax F-structure for a discontinuous NP "s3adr ko muqaddamat se h2as3il AInI istis2na" 49 PRED OBL ADJUNCT 'istis2na<[34:muqaddamah]>' PRED 'muqaddamah' CHECK _NMORPH obl 34 CASE inst, GEND masc, NUM pl PRED 'h2as2il<[1:s3adr]>' PRED 's3adr' CHECK _NMORPH obl OBJ-GO NSEM COMMON count NTYPE NSYN common 1 CASE dat, GEND masc, NUM sg, PERS 3 CHECK _RESTRICTED - LEX-SEM GOAL + 39 ATYPE attributive PRED 'AInI' ATYPE attributive 44 <s [39:h2As2il] Figure: F-structure 45 /60
112 Syntax Summary Urdu is a typical language in which discontinuous NPs are found both at: Clause-level Constituent-level Constituent-level discontinuity in Urdu can be implemented in LFG framework by making use of: Shuffle operator (, ) Non-deterministic operator ( $ ) Head-precedence operator ( <h ) 46 /60
113 Syntax Overview Overall Architecture tokenizer transliterator (Urdu & Hindi to Roman script) morphology (fst) syntax (c- and f-structure) (xle) semantics (xfr ordered rewriting) 47 /60
114 Semantics Intro Aim: a large-coverage computational semantic analyzer on the basis of a deep syntactic analysis use f-structures as starting point apply xfr semantic rules from f-structure facts to a semantic representation (Crouch and King, 2006) judgment on the semantic well-formedness of a sentence The girl laughs. semantically well-formed #The tree laughs. semantically ill-formed we need lexical information about the words in a sentence 1 lexical resource for Urdu verbs more information on the verb and its arguments 2 general lexical resource for Urdu nouns, adjectives etc. 48 /60
115 Semantics Intro F-structure for nadiyah hansi (Nadya laughed). "nadiyah hansi" PRED SUBJ CHECK 'hans<[1:nadiyah]>' PRED 'nadiyah' NTYPE NSEM PROPER PROPER-TYPE name NSYN proper SEM-PROP SPECIFIC + 1 CASE nom, GEND fem, NUM sg, PERS 3 _VMORPH _MTYPE infl _RESTRICTED -, _VFORM perf LEX-SEM VERB-CLASS unerg TNS-ASP ASPECT perf, MOOD indicative 18 CLAUSE-TYPE decl, PASSIVE -, VTYPE main xfr semantic rule: PRED(%1, hans), SUBJ(%1, %subj), -OBJ(%1, %obj) ==> word(%1, hans, verb), role(agent, %1, %subj). 49 /60
116 Semantics Developing an Urdu VerbNet (1) following the methodology of the English VerbNet (Kipper-Schuler 2006) categorization of English verbs in 250 classes information on event structure and argument structure of verbs provides the general architecture for a VerbNet in any language e.g. parts of the entry for laugh in the English VerbNet 50 /60
117 Semantics Developing an Urdu VerbNet (2) Difficulty: resource sparseness of Urdu Approach 1: translating the entries in the English VerbNet to Urdu figure out problematic cases Approach 2: fully rely on corpus work extend tool for automatic subcategorization extraction (Ghulam, 2010) Can we benefit from a Hindi lexical resource? 51 /60
118 Semantics Hindi WordNet Facts: inspired in methodology and architecture by the English WordNet (Fellbaum 1998) 52 /60
119 Semantics Hindi WordNet developed at the Indian Institute of Technology, Bombay, India separated into four independent semantic nets verbs, nouns, adjectives and adverbs about verbs, nouns, adjectives and adverbs words are grouped according to their meaning similarity ( synsets ) 53 /60
120 Semantics Hindi WordNet Issues far less specific concepts than in the English WordNet Hindi WordNet: TOP Noun Inanimate Object Artifact kitab TOP Noun Inanimate Object Artifact mez English WordNet: entity physical entity object whole unit artifact creation product piece of work publication book entity physical entity object whole unit artifact instrumentatlity furnishing piece of furniture table 54 /60
121 Semantics Benefits for an Urdu VerbNet Preliminary experiments for Urdu/Hindi verbs Resources that we have: the database from Hindi WordNet a list of Urdu verbs out of Hindi verbs, we have found 534 verbs in an Urdu verb list (Humayoun, 2006) complex predicates are included in Hindi WordNet, but not in the Urdu wordlist total of around 700 Urdu verbs more than 2/3 of Urdu verbs are found all found verbs seem to be valid extract verb information from Hindi WordNet for the Urdu VerbNet 55 /60
122 Semantics Urdu Lexical Semantics Polysemy: An extreme case - eat expressions in Hindi/Urdu (Hook and Pardeshi, 2009): employing eat in idiomatic expressions about 160 eat expressions for Hindi/Urdu variety of uses due to loan translations from Persian 56 /60
123 Semantics Urdu Lexical Semantics h2asan=ne kek=ko khaya h2asan.erg cake.acc eat.perf.sg.masc Hasan ate the cake. eat= Agent, Theme inqilabi fikar zang kha jaegi revolutionary thought rust eat go.fut Revolutionary thinking will gather rust. eat (gather rust) = Patient, Theme is sal=ki mandi sheyar-bazar kha gayi this year.gen slowdown.fem stockmarket eat go.fut.fem This year s slowdown wrecked (lit. devoured) the stock market. eat (wreck) = Agent, Theme 57 /60
124 Semantics Urdu Lexical Semantics How do we approach polysemy in the computational semantics? extensive corpus work to find polysemous verbs assign different thematic roles to polysemous verbs? put all combinations in the Urdu VerbNet, but mark the original use? analysis for all sentences, mark idiomatic and semantically ill-formed sentences as such? 58 /60
125 Semantics Wrap up What we have talked about: architecture of the Urdu LFG Grammar ongoing work transliteration discontinuous NPs computational semantics challenges ahead Demo 59 /60
126 Semantics Thank you! 60 /60
Order and Discontinuity within Urdu NPs
Order and Discontinuity within Urdu NPs Ghulam Raza, Tafseer Ahmed and Miriam Butt Department of Linguistics, Universität Konstanz, Germany 19-21 October, 2010 First International Linguistic Conference
What s in a Lexicon. The Lexicon. Lexicon vs. Dictionary. What kind of Information should a Lexicon contain?
What s in a Lexicon What kind of Information should a Lexicon contain? The Lexicon Miriam Butt November 2002 Semantic: information about lexical meaning and relations (thematic roles, selectional restrictions,
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
Word order in Lexical-Functional Grammar Topics M. Kaplan and Mary Dalrymple Ronald Xerox PARC August 1995 Kaplan and Dalrymple, ESSLLI 95, Barcelona 1 phrase structure rules work well Standard congurational
CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
A Machine Translation System Between a Pair of Closely Related Languages
A Machine Translation System Between a Pair of Closely Related Languages Kemal Altintas 1,3 1 Dept. of Computer Engineering Bilkent University Ankara, Turkey email:[email protected] Abstract Machine translation
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
Syntactic Theory. Background and Transformational Grammar. Dr. Dan Flickinger & PD Dr. Valia Kordoni
Syntactic Theory Background and Transformational Grammar Dr. Dan Flickinger & PD Dr. Valia Kordoni Department of Computational Linguistics Saarland University October 28, 2011 Early work on grammar There
Correlation: ELLIS. English language Learning and Instruction System. and the TOEFL. Test Of English as a Foreign Language
Correlation: English language Learning and Instruction System and the TOEFL Test Of English as a Foreign Language Structure (Grammar) A major aspect of the ability to succeed on the TOEFL examination is
INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning [email protected]
INF5820 Natural Language Processing - NLP H2009 Jan Tore Lønning [email protected] Semantic Role Labeling INF5830 Lecture 13 Nov 4, 2009 Today Some words about semantics Thematic/semantic roles PropBank &
Learning Translation Rules from Bilingual English Filipino Corpus
Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,
Ling 201 Syntax 1. Jirka Hana April 10, 2006
Overview of topics What is Syntax? Word Classes What to remember and understand: Ling 201 Syntax 1 Jirka Hana April 10, 2006 Syntax, difference between syntax and semantics, open/closed class words, all
Special Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
Semantic analysis of text and speech
Semantic analysis of text and speech SGN-9206 Signal processing graduate seminar II, Fall 2007 Anssi Klapuri Institute of Signal Processing, Tampere University of Technology, Finland Outline What is semantic
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
Syntax: Phrases. 1. The phrase
Syntax: Phrases Sentences can be divided into phrases. A phrase is a group of words forming a unit and united around a head, the most important part of the phrase. The head can be a noun NP, a verb VP,
Points of Interference in Learning English as a Second Language
Points of Interference in Learning English as a Second Language Tone Spanish: In both English and Spanish there are four tone levels, but Spanish speaker use only the three lower pitch tones, except when
VERB TRANSFER FOR ENGLISH TO URDU MACHINE TRANSLATION (Using Lexical Functional Grammar (LFG))
VERB TRANSFER FOR ENGLISH TO URDU MACHINE TRANSLATION (Using Lexical Functional Grammar (LFG)) MS Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science (Computer
Pupil SPAG Card 1. Terminology for pupils. I Can Date Word
Pupil SPAG Card 1 1 I know about regular plural noun endings s or es and what they mean (for example, dog, dogs; wish, wishes) 2 I know the regular endings that can be added to verbs (e.g. helping, helped,
Albert Pye and Ravensmere Schools Grammar Curriculum
Albert Pye and Ravensmere Schools Grammar Curriculum Introduction The aim of our schools own grammar curriculum is to ensure that all relevant grammar content is introduced within the primary years in
Chapter 1. Introduction. 1.1. Topic of the dissertation
Chapter 1. Introduction 1.1. Topic of the dissertation The topic of the dissertation is the relations between transitive verbs, aspect, and case marking in Estonian. Aspectual particles, verbs, and case
LESSON THIRTEEN STRUCTURAL AMBIGUITY. Structural ambiguity is also referred to as syntactic ambiguity or grammatical ambiguity.
LESSON THIRTEEN STRUCTURAL AMBIGUITY Structural ambiguity is also referred to as syntactic ambiguity or grammatical ambiguity. Structural or syntactic ambiguity, occurs when a phrase, clause or sentence
English Appendix 2: Vocabulary, grammar and punctuation
English Appendix 2: Vocabulary, grammar and punctuation The grammar of our first language is learnt naturally and implicitly through interactions with other speakers and from reading. Explicit knowledge
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering
SYNTAX: THE ANALYSIS OF SENTENCE STRUCTURE
SYNTAX: THE ANALYSIS OF SENTENCE STRUCTURE OBJECTIVES the game is to say something new with old words RALPH WALDO EMERSON, Journals (1849) In this chapter, you will learn: how we categorize words how words
Morphology. Morphology is the study of word formation, of the structure of words. 1. some words can be divided into parts which still have meaning
Morphology Morphology is the study of word formation, of the structure of words. Some observations about words and their structure: 1. some words can be divided into parts which still have meaning 2. many
Paraphrasing controlled English texts
Paraphrasing controlled English texts Kaarel Kaljurand Institute of Computational Linguistics, University of Zurich [email protected] Abstract. We discuss paraphrasing controlled English texts, by defining
A Beautiful Four Days in Berlin Takafumi Maekawa (Ryukoku University) [email protected]
A Beautiful Four Days in Berlin Takafumi Maekawa (Ryukoku University) [email protected] 1. The Data This paper presents an analysis of such noun phrases as in (1) within the framework of Head-driven
Outline of today s lecture
Outline of today s lecture Generative grammar Simple context free grammars Probabilistic CFGs Formalism power requirements Parsing Modelling syntactic structure of phrases and sentences. Why is it useful?
Compiler I: Syntax Analysis Human Thought
Course map Compiler I: Syntax Analysis Human Thought Abstract design Chapters 9, 12 H.L. Language & Operating Sys. Compiler Chapters 10-11 Virtual Machine Software hierarchy Translator Chapters 7-8 Assembly
Parsing Technology and its role in Legacy Modernization. A Metaware White Paper
Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks
Interpreting areading Scaled Scores for Instruction
Interpreting areading Scaled Scores for Instruction Individual scaled scores do not have natural meaning associated to them. The descriptions below provide information for how each scaled score range should
LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM*
LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM* Jonathan Yamron, James Baker, Paul Bamberg, Haakon Chevalier, Taiko Dietzel, John Elder, Frank Kampmann, Mark Mandel, Linda Manganaro, Todd Margolis,
Comparative Analysis on the Armenian and Korean Languages
Comparative Analysis on the Armenian and Korean Languages Syuzanna Mejlumyan Yerevan State Linguistic University Abstract It has been five years since the Korean language has been taught at Yerevan State
Constraints in Phrase Structure Grammar
Constraints in Phrase Structure Grammar Phrase Structure Grammar no movement, no transformations, context-free rules X/Y = X is a category which dominates a missing category Y Let G be the set of basic
University of Massachusetts Boston Applied Linguistics Graduate Program. APLING 601 Introduction to Linguistics. Syllabus
University of Massachusetts Boston Applied Linguistics Graduate Program APLING 601 Introduction to Linguistics Syllabus Course Description: This course examines the nature and origin of language, the history
The Role of Sentence Structure in Recognizing Textual Entailment
Blake,C. (In Press) The Role of Sentence Structure in Recognizing Textual Entailment. ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, Czech Republic. The Role of Sentence Structure
31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
Appendix to Chapter 3 Clitics
Appendix to Chapter 3 Clitics 1 Clitics and the EPP The analysis of LOC as a clitic has two advantages: it makes it natural to assume that LOC bears a D-feature (clitics are Ds), and it provides an independent
Contextual Deletion of Object and Ambiguity in Machine Translation
Revista Alicantina de Estudios Ingleses 7 (1994): 23-36 Contextual Deletion of Object and Ambiguity in Machine Translation Gloria Álvarez Benito Universidad de Sevilla J. Gabriel Amores Carredano Universidad
Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
Overview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
Intro to Linguistics Semantics
Intro to Linguistics Semantics Jarmila Panevová & Jirka Hana January 5, 2011 Overview of topics What is Semantics The Meaning of Words The Meaning of Sentences Other things about semantics What to remember
IP PATTERNS OF MOVEMENTS IN VSO TYPOLOGY: THE CASE OF ARABIC
The Buckingham Journal of Language and Linguistics 2013 Volume 6 pp 15-25 ABSTRACT IP PATTERNS OF MOVEMENTS IN VSO TYPOLOGY: THE CASE OF ARABIC C. Belkacemi Manchester Metropolitan University The aim of
Syntactic Theory on Swedish
Syntactic Theory on Swedish Mats Uddenfeldt Pernilla Näsfors June 13, 2003 Report for Introductory course in NLP Department of Linguistics Uppsala University Sweden Abstract Using the grammar presented
An Efficient Database Design for IndoWordNet Development Using Hybrid Approach
An Efficient Database Design for IndoWordNet Development Using Hybrid Approach Venkatesh P rabhu 2 Shilpa Desai 1 Hanumant Redkar 1 N eha P rabhugaonkar 1 Apur va N agvenkar 1 Ramdas Karmali 1 (1) GOA
Phrase Structure Rules, Tree Rewriting, and other sources of Recursion Structure within the NP
Introduction to Transformational Grammar, LINGUIST 601 September 14, 2006 Phrase Structure Rules, Tree Rewriting, and other sources of Recursion Structure within the 1 Trees (1) a tree for the brown fox
Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang
Sense-Tagging Verbs in English and Chinese Hoa Trang Dang Department of Computer and Information Sciences University of Pennsylvania [email protected] October 30, 2003 Outline English sense-tagging
Linguistics & Cognitive Science
Linguistics & Cognitive Science 07.201* History of Cognitive Science Fall term 2000 formal account of pivotal role of linguistics in the development of cognitive science relation to psychology, AI, & philosophy
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
AUTOMATIC F-STRUCTURE ANNOTATION FROM THE AP TREEBANK
AUTOMATIC F-STRUCTURE ANNOTATION FROM THE AP TREEBANK Louisa Sadler Josef van Genabith Andy Way University of Essex Dublin City University Dublin City University [email protected] [email protected]
MODERN WRITTEN ARABIC. Volume I. Hosted for free on livelingua.com
MODERN WRITTEN ARABIC Volume I Hosted for free on livelingua.com TABLE OF CcmmTs PREFACE. Page iii INTRODUCTICN vi Lesson 1 1 2.6 3 14 4 5 6 22 30.45 7 55 8 61 9 69 10 11 12 13 96 107 118 14 134 15 16
A chart generator for the Dutch Alpino grammar
June 10, 2009 Introduction Parsing: determining the grammatical structure of a sentence. Semantics: a parser can build a representation of meaning (semantics) as a side-effect of parsing a sentence. Generation:
Noam Chomsky: Aspects of the Theory of Syntax notes
Noam Chomsky: Aspects of the Theory of Syntax notes Julia Krysztofiak May 16, 2006 1 Methodological preliminaries 1.1 Generative grammars as theories of linguistic competence The study is concerned with
THE BACHELOR S DEGREE IN SPANISH
Academic regulations for THE BACHELOR S DEGREE IN SPANISH THE FACULTY OF HUMANITIES THE UNIVERSITY OF AARHUS 2007 1 Framework conditions Heading Title Prepared by Effective date Prescribed points Text
CHARTES D'ANGLAIS SOMMAIRE. CHARTE NIVEAU A1 Pages 2-4. CHARTE NIVEAU A2 Pages 5-7. CHARTE NIVEAU B1 Pages 8-10. CHARTE NIVEAU B2 Pages 11-14
CHARTES D'ANGLAIS SOMMAIRE CHARTE NIVEAU A1 Pages 2-4 CHARTE NIVEAU A2 Pages 5-7 CHARTE NIVEAU B1 Pages 8-10 CHARTE NIVEAU B2 Pages 11-14 CHARTE NIVEAU C1 Pages 15-17 MAJ, le 11 juin 2014 A1 Skills-based
Movement and Binding
Movement and Binding Gereon Müller Institut für Linguistik Universität Leipzig SoSe 2008 www.uni-leipzig.de/ muellerg Gereon Müller (Institut für Linguistik) Constraints in Syntax 4 SoSe 2008 1 / 35 Principles
Multi language e Discovery Three Critical Steps for Litigating in a Global Economy
Multi language e Discovery Three Critical Steps for Litigating in a Global Economy 2 3 5 6 7 Introduction e Discovery has become a pressure point in many boardrooms. Companies with international operations
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
Introduction to formal semantics -
Introduction to formal semantics - Introduction to formal semantics 1 / 25 structure Motivation - Philosophy paradox antinomy division in object und Meta language Semiotics syntax semantics Pragmatics
1 Relative clauses. CAS LX 500 Topics: Language Universals Fall 2010, September 23. 4b. The NPAH
CAS LX 500 Topics: Language Universals Fall 2010, September 23 4b. The NPAH 1 Relative clauses The formation of relative clauses A relative clause is something like a wh-question attached to a noun and
Academic Standards for Reading, Writing, Speaking, and Listening
Academic Standards for Reading, Writing, Speaking, and Listening Pre-K - 3 REVISED May 18, 2010 Pennsylvania Department of Education These standards are offered as a voluntary resource for Pennsylvania
Chapter 13, Sections 13.1-13.2. Auxiliary Verbs. 2003 CSLI Publications
Chapter 13, Sections 13.1-13.2 Auxiliary Verbs What Auxiliaries Are Sometimes called helping verbs, auxiliaries are little words that come before the main verb of a sentence, including forms of be, have,
COMPUTATIONAL DATA ANALYSIS FOR SYNTAX
COLING 82, J. Horeck~ (ed.j North-Holland Publishing Compa~y Academia, 1982 COMPUTATIONAL DATA ANALYSIS FOR SYNTAX Ludmila UhliFova - Zva Nebeska - Jan Kralik Czech Language Institute Czechoslovak Academy
Slot Grammars. Michael C. McCord. Computer Science Department University of Kentucky Lexington, Kentucky 40506
Michael C. McCord Computer Science Department University of Kentucky Lexington, Kentucky 40506 This paper presents an approach to natural language grammars and parsing in which slots and rules for filling
According to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided
Categories Categories According to the Argentine writer Jorge Luis Borges, in the Celestial Emporium of Benevolent Knowledge, animals are divided into 1 2 Categories those that belong to the Emperor embalmed
Study Plan. Bachelor s in. Faculty of Foreign Languages University of Jordan
Study Plan Bachelor s in Spanish and English Faculty of Foreign Languages University of Jordan 2009/2010 Department of European Languages Faculty of Foreign Languages University of Jordan Degree: B.A.
Automatic Text Analysis Using Drupal
Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing
Formalism. Abstract This paper describes our work on parsing Turkish using the lexical-functional grammar
Parsing Turkish using the Lexical Functional Grammar Formalism Zelal Gungordu 1 Centre for Cognitive Science University of Edinburgh 2 Buccleuch Place Edinburgh, E8 9LW Scotland, U.K. [email protected]
Online Tutoring System For Essay Writing
Online Tutoring System For Essay Writing 2 Online Tutoring System for Essay Writing Unit 4 Infinitive Phrases Review Units 1 and 2 introduced some of the building blocks of sentences, including noun phrases
Writing Common Core KEY WORDS
Writing Common Core KEY WORDS An educator's guide to words frequently used in the Common Core State Standards, organized by grade level in order to show the progression of writing Common Core vocabulary
Section 8 Foreign Languages. Article 1 OVERALL OBJECTIVE
Section 8 Foreign Languages Article 1 OVERALL OBJECTIVE To develop students communication abilities such as accurately understanding and appropriately conveying information, ideas,, deepening their understanding
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University [email protected] Kapil Dalwani Computer Science Department
FUNCTIONAL SKILLS ENGLISH - WRITING LEVEL 2
FUNCTIONAL SKILLS ENGLISH - WRITING LEVEL 2 MARK SCHEME Instructions to marker There are 30 marks available for each of the three tasks, which should be marked separately, resulting in a total of 90 marks.
PP-Attachment. Chunk/Shallow Parsing. Chunk Parsing. PP-Attachment. Recall the PP-Attachment Problem (demonstrated with XLE):
PP-Attachment Recall the PP-Attachment Problem (demonstrated with XLE): Chunk/Shallow Parsing The girl saw the monkey with the telescope. 2 readings The ambiguity increases exponentially with each PP.
CALICO Journal, Volume 9 Number 1 9
PARSING, ERROR DIAGNOSTICS AND INSTRUCTION IN A FRENCH TUTOR GILLES LABRIE AND L.P.S. SINGH Abstract: This paper describes the strategy used in Miniprof, a program designed to provide "intelligent' instruction
Introduction. Philipp Koehn. 28 January 2016
Introduction Philipp Koehn 28 January 2016 Administrativa 1 Class web site: http://www.mt-class.org/jhu/ Tuesdays and Thursdays, 1:30-2:45, Hodson 313 Instructor: Philipp Koehn (with help from Matt Post)
echd Basic System Requirements website: http://ochre.lib.uchicago.edu/echd/ for the OCHRE database in general see http://ochre.lib.uchicago.
echd Basic System Requirements website: http://ochre.lib.uchicago.edu/echd/ for the OCHRE database in general see http://ochre.lib.uchicago.edu Technical requirements for running the echd: - Java Runtime
Grade Genre Skills Lessons Mentor Texts and Resources 6 Grammar To Be Covered
Grade Genre Skills Lessons Mentor Texts and Resources 6 Grammar To Be Covered 6 Personal Narrative Parts of speech (noun, adj, verb, adv) Complete sentence (subj. and verb) Capitalization Tense (identify)
Sentence Structure/Sentence Types HANDOUT
Sentence Structure/Sentence Types HANDOUT This handout is designed to give you a very brief (and, of necessity, incomplete) overview of the different types of sentence structure and how the elements of
Structure of the talk. The semantics of event nominalisation. Event nominalisations and verbal arguments 2
Structure of the talk Sebastian Bücking 1 and Markus Egg 2 1 Universität Tübingen [email protected] 2 Rijksuniversiteit Groningen [email protected] 12 December 2008 two challenges for a
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
Structural and Semantic Indexing for Supporting Creation of Multilingual Web Pages
Structural and Semantic Indexing for Supporting Creation of Multilingual Web Pages Hiroshi URAE, Taro TEZUKA, Fuminori KIMURA, and Akira MAEDA Abstract Translating webpages by machine translation is the
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati
Comprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4
L130: Chapter 5d. Dr. Shannon Bischoff. Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25
L130: Chapter 5d Dr. Shannon Bischoff Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25 Outline 1 Syntax 2 Clauses 3 Constituents Dr. Shannon Bischoff () L130: Chapter 5d 2 / 25 Outline Last time... Verbs...
Master of Arts in Linguistics Syllabus
Master of Arts in Linguistics Syllabus Applicants shall hold a Bachelor s degree with Honours of this University or another qualification of equivalent standard from this University or from another university
CA4003 - Compiler Construction
CA4003 - Compiler Construction David Sinclair Overview This module will cover the compilation process, reading and parsing a structured language, storing it in an appropriate data structure, analysing
How To Translate English To Yoruba Language To Yoranuva
International Journal of Language and Linguistics 2015; 3(3): 154-159 Published online May 11, 2015 (http://www.sciencepublishinggroup.com/j/ijll) doi: 10.11648/j.ijll.20150303.17 ISSN: 2330-0205 (Print);
The finite verb and the clause: IP
Introduction to General Linguistics WS12/13 page 1 Syntax 6 The finite verb and the clause: Course teacher: Sam Featherston Important things you will learn in this section: The head of the clause The positions
