Frequency Dictionary of Verb Phrase Constructions
An automatic lexical acquisition method and its applications

Theses of the Ph.D. dissertation

Bálint Sass
Supervisor: Gábor Prószéky, D.Sc.

Pázmány Péter Catholic University, Faculty of Information Technology, Multidisciplinary Technical Sciences Doctoral School
Budapest, 2011
Introduction

Részt vesz vmiben. (take part in sg)
Górcső alá vesz vmit. (take sg under górcső = examine sg)

Although verb subcategorization frames and multiword expressions are separate fields of research in both natural language processing and lexicography, several languages have complex constructions that are verb subcategorization frames and collocations at the same time. These constructions consist of (at least) two content units, normally a verb and a nominal (with a case marker, postposition or preposition), and one or more valences are also an inherent part of the construction. Besides the Hungarian examples above, similar constructions can be found in many languages: get rid of (English), få lov til (Danish; get permission to sg), imati pravo na (Serbian; have the right to sg), houden rekening met (Dutch; take sg into consideration), zijn van toepassing op (Dutch; concern sg), avoir effet sur (French; have an effect on sg).

In each of these examples there are two dependents: one is filled by a concrete, fixed word that forms a collocation with the verb, while the other is only a slot whose place is marked by a case marker or a preposition. Dependents are usually connected to the verb by the same linguistic means (case markers, postpositions, prepositions or word order constraints), regardless of whether the dependent word is a fixed collocate or just an arbitrary word filling a valence slot. In the részt vesz vmiben construction the object is a collocate (marked by the -t case marker),
while in górcső alá vesz vmit the object is a valence slot. This alternation also occurs among the constructions of a single verb. Both pillantást vet vkire (cast a glance at sy) and szemére vet vmit (upbraid sy with sg) consist of an object and a dependent with the -ra/-re case marker, but in the first construction the object is the collocate and the -ra/-re dependent is the valence slot, while in the second it is the other way round.

Contrary to what native-speaker intuition suggests, such constructions are often very frequent. They form an important segment of the constructions of a language and cannot be treated as marginal. They often have a non-compositional, idiomatic meaning. Accordingly, they must be included both in dictionaries and in the language resources of automatic natural language processing tools. In most cases it is worth storing their translations as separate units, because these translations often contain unpredictable elements.

There is a need for a data-driven computational method that brings order to the overlapping system of relation markers and separates dependents containing a fixed word from dependents that can be filled freely: a method that discovers which word is an integral part of a given verb phrase construction as a collocate, and which valence slots necessarily belong to the construction as well. In short, a method that is able to extract typical verb phrase constructions from a corpus.

The main results of the dissertation are this method (section 3.3 of the dissertation) and the monolingual Hungarian dictionary of verb phrase constructions (section 4.2 of the dissertation) prepared with it. The dictionary, which is based on the simplest model of verb phrase constructions, makes the usefulness of the extraction method tangible. The real significance of the method, however, is that it can be extended in several directions.
Firstly, because the model is language independent, the extraction method can be applied to other languages without modification once the appropriate language-specific preprocessing is done, so similar dictionaries can be prepared for various languages. Secondly, the method is able to handle more complicated
constructions (for example gyenge lábakon áll (stand on weak legs = be weak), which contains an additional adjectival collocate compared with the constructions above), as well as noun-centered and adjective-centered constructions, and so on. Thirdly, by applying the model in a special way, the same lexical acquisition method can be made to handle parallel verb phrase constructions, that is, verb phrase constructions together with their translations. In this way the method can discover asymmetric parallel construction pairs, in which the two members correspond to each other but are formally completely different.
Methodology

One of the central issues in present-day computational lexicography is how much of the traditional manual work can be taken over by computers, and how far lexicography can get with purely automatic tools. The corpus is the resource from which the material of a dictionary can be collected automatically. In my research I follow a strictly corpus-driven approach. I use the corpus not merely as an aid: taking it to be authentic and representative, I derive all linguistic information about verb phrase constructions solely from corpus data. During corpus-driven collection of the dictionary material, the typical verb phrase constructions are determined automatically, and a subset of them is selected automatically for inclusion in the dictionary on the basis of corpus frequency. Present-day huge corpora provide a solid basis for characterizing rarer phenomena as well.

In recent decades the results of corpus-driven lexicography have revolutionized dictionary making in many ways. One important result is that the relevance of multiword lexical units (collocations, phrasemes, idiomatic or institutionalized expressions) has been recognised, and such expressions are gaining an increasingly prominent presence in new dictionaries. As Sinclair said, "many, if not most, meanings require the presence of more than one word for their normal realisation." In my research I treat formally different constructions in a unified framework, whether they are single-word or multiword units, verbs or verb phrase constructions. During the dictionary creation process I put the multiword verb phrase constructions as full-fledged
lexemes in the focus of my approach, as the examples mentioned in the introduction show. My type-independent approach makes it possible always to represent complete verb phrase constructions, that is, no unit that is relevant to the construction is left out. Completeness of constructions is also an important requirement during evaluation.

Hungarian has free word order, at least in the sense that the verb and its dependents can occur in almost arbitrary order in the sentence, possibly with intervening units. In other words, verb phrase constructions can be continuous or discontinuous, and they can occur in any order variant. For Hungarian, word order variability can be handled efficiently by choosing a linguistic framework that fits the nature of the language well, namely dependency grammar. In dependency grammar analysis the basic units are usually words. In contrast, I have chosen the morpheme as the basic unit, so that bound morphemes expressing the relation between the verb and the dependent (namely case markers) can be interpreted as independent units alongside words. When collecting typical verb phrase constructions, I do not follow the usual approach, which only checks whether two words are adjacent; instead, the elements of a verb phrase construction are always in a particular dependency relationship with each other. These dependency relationships themselves become full-fledged elements of verb phrase constructions, so the unified framework also extends to verb phrase constructions without a collocate, including verb subcategorization frames.
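The order-independent, morpheme-level view described above can be illustrated with a small Python sketch. It assumes that a clause has already been analysed into a verb lemma and its dependents. The names `Dependent`, `Clause` and `make_clause`, and the sample clause (igényt tart a házra, "lay claim to the house"), are hypothetical illustrations and are not taken from the dissertation's software.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Dependent:
    """One dependent of the verb: a relation marker plus a content unit."""
    relation: str                  # e.g. a case marker such as "-t" or "-ra"
    content: Optional[str] = None  # head lemma, or None for an open slot


@dataclass(frozen=True)
class Clause:
    """A verb together with an unordered set of dependents."""
    verb: str
    dependents: FrozenSet[Dependent]


def make_clause(verb: str, *deps: Tuple[str, Optional[str]]) -> Clause:
    # Hypothetical helper: builds a Clause from (relation, content) pairs.
    return Clause(verb, frozenset(Dependent(rel, word) for rel, word in deps))


# Two word-order variants of the same invented clause.
variant_a = make_clause("tart", ("-t", "igény"), ("-ra", "ház"))
variant_b = make_clause("tart", ("-ra", "ház"), ("-t", "igény"))

# The construction itself keeps the -ra slot open (no content unit).
construction = make_clause("tart", ("-t", "igény"), ("-ra", None))

# Dependents are stored as a set, so the two variants compare equal.
print(variant_a == variant_b)
```

Because the case marker is stored as a unit of its own, a dependent with an empty content unit can stand for a valence slot, while a dependent with a content unit can stand for a fixed collocate.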
New Scientific Results

The topic of the dissertation is the extraction of typical verb phrase constructions from a corpus. We focus primarily on constructions that are multiword units and subcategorization frames at the same time, namely complex verbs with valences. Examples of such constructions are hasznot húz vmiből (pull benefit from sg = benefit from sg), igényt tart vmire (lay claim to sg) and lehetővé tesz vmit (make sg possible). These constructions contain both a lexically free dependent (LFD): vmiből (from sg), vmire (to sg), vmit (sg); and a lexically fixed dependent (LXD): hasznot (benefit), igényt (claim), lehetővé (possible).

The first task was to develop a model for Hungarian that can represent all types of verb phrase constructions, including the type mentioned above. The solution is a special graph representation based on dependency analysis. The development of this model is covered in section 2.1 of the dissertation; the new results can be summarized as follows:

Thesis 1. I developed a model for the Hungarian language which is able to uniformly represent clauses and also the formally very different verb phrase constructions inherent in clauses. The basic unit of representation is the clause, that is, a central verb together with its dependents. Dependents are represented by their most important content unit (the head of
the phrase in case of nominal phrase dependents) and the relation marker connecting the dependent and the verb (a case marker or a postposition in case of nominal phrase dependents). To sum up:

clause = verb + set of dependents
dependent = relation marker + content unit

Publications related to the thesis: (Sass, 2009c), (Sass, 2009a), (Sass, 2008), (Sass, 2005)

The model is best depicted graphically as a 1-level deep dependency tree. The root is the verb, the edges are the relation markers, and the vertices are the content units. The general dependency tree corresponding to the model can be seen in Fig. 1, together with the concrete representation of one of the above constructions.

[Figure 1: Visualization of the model by means of a dependency tree. Left: the general dependency tree corresponding to the model, with the verb as root, relation markers as edges and content units as vertices. Right: a concrete construction, igényt tart vmire (lay claim to sg): the root tart has a -t edge leading to igény and a -ra edge with no content unit. The arbitrary content unit which occurs at the -ra LFD is not part of this construction.]

The next question is how the representation of a given corpus according to this model can be produced. Naturally, this representation can be derived from a dependency treebank; the other possibility
is to run a dependency parser on a POS-tagged corpus. No dependency treebank of appropriate size is available for Hungarian, and no dependency parser has been developed yet. My dissertation does not cover the development of a Hungarian dependency parser (it could be the topic of another dissertation), but for my research I need a large corpus equipped with a good-quality representation according to my model. I chose the 187-million-word Hungarian National Corpus as a representative Hungarian corpus, and investigated whether a suitable representation can be produced using a simple rule-based approximate method. It turned out that clause boundary detection and partial shallow syntactic parsing of clauses (essentially the identification of verbs and nominal dependents) can be done with appropriate quality in a rule-based way. The processing of the corpus is discussed in section 2.2 of the dissertation; the conclusion of that section is stated in the following thesis:

Thesis 2. I showed that, starting from a POS-tagged and disambiguated corpus, a reliable model-based representation can be produced using rule-based clause boundary detection and rule-based shallow syntactic parsing with a relatively simple set of rules.

Publications related to the thesis: (Sass, 2006b), (Sass, 2005)

Although the quality of the representation can be improved in the future by using a complete dependency parser, it is good enough in its current state to be the starting point for further research.

The resulting representation is a valuable resource in itself. As a special corpus it opens the door to different queries which are unusual
in corpus query systems: we can abstract away from word order and investigate verb phrase constructions uniformly, independently of their actual word order. Therefore, I developed the Verb Argument Browser, a corpus query system suitable for investigating the typical dependents occurring with verbs, as well as verb-noun collocations. This tool displays the characteristic words occurring as a given dependent, together with the corresponding corpus examples. Basically, the Verb Argument Browser provides two kinds of typical dependents. On the one hand, frequent words with a literal meaning, which often constitute a semantically coherent class, such as the different kinds of food appearing as direct objects of the verb to eat. On the other hand, frequent words which are part of an idiomatic complex verb or locution, such as kása (mush) as the object of eszik (eat), which appears here not because it is a typical food nowadays, but because it forms a saying with the verb: nem eszik olyan forrón a kását (the mush is not eaten that hot = wait a minute!). The Verb Argument Browser is described in section 3.2 of the dissertation; its main features are stated in the following thesis:

Thesis 3. I created the Verb Argument Browser, a special corpus query system. It can be used to map the dependent structure of verbs, or to identify the essential dependents of verbs or verb frames, complex verbs included. The Browser is a useful tool in corpus linguistics research, in the manual building of lexical resources, and when authentic examples are needed for some verb phrase constructions.

Publications related to the thesis: (Sass and Pajzs, 2010b), (Sass, 2009b), (Sass, 2008), (Sass, 2006b)

The system can be applied to any corpus that is equipped with the representation according to the model. The query interface of the original Hungarian version which includes the whole material of the
Hungarian National Corpus is freely available at nytud.hu/vab. It can be tried with the temporary user name vendeg and the temporary password mazsola. Even when searching a corpus on the order of a hundred million words, response times are just a few seconds.

Present-day corpora have reached a magnitude where, besides manual query tools, there is a need for automatic tools that summarize the linguistic information available in corpora. From this viewpoint the Verb Argument Browser is a manual tool: it can present the typical words filling concrete dependent slots. The main result of my dissertation is an automatic method that goes one very important step further: using a corpus, it is able to determine what the typical verb phrase constructions of a verb are in the first place. In effect, it determines what the relevant queries are and also runs them. In this way we can collect all typical verb phrase constructions containing a given verb. A detailed presentation and evaluation of the algorithm can be found in section 3.3 of the dissertation; its essence is summed up in the next thesis:

Thesis 4. I worked out a lexical acquisition method which is based on adding up frequencies of sentence skeletons in a special way. This method is capable of extracting characteristic verb phrase constructions of different kinds from a corpus which is represented according to the model (Thesis 1).

Publications related to the thesis: (Sass, 2010d), (Sass and Pajzs, 2010b), (Sass, 2009c)

The novelty of the method lies in two facts. On the one hand, it adapts to the length (the number of units) of the verb phrase construction
resulting in expressions consisting of two or even more units. On the other hand, it is able to discover whether, for a given verb and a given dependent, only the relation marker is relevant (LFD) or the relation marker together with the concrete content unit (LXD). Accordingly, it provides constructions containing LFDs, LXDs, or both of them mixed. The examples mentioned at Thesis 1 belong to the last group; they are complex verbs with valences: hasznot húz vmiből (pull benefit from sg = benefit from sg), igényt tart vmire (lay claim to sg) and lehetővé tesz vmit (make sg possible).
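To make the idea of adding up skeleton frequencies more concrete, here is a deliberately simplified Python sketch. It is not the algorithm of section 3.3, whose details are not given in this booklet. It is a toy back-off scheme in the same spirit, and the threshold value, the length measure, the tie-breaking rule and the sample clauses are all assumptions made for illustration. Every clause starts as a fully lexicalised skeleton. Skeletons are processed from longest to shortest, and any skeleton rarer than the threshold passes its frequency to the one-unit-shorter skeleton that the most clauses match.

```python
from collections import Counter
from itertools import product

THRESHOLD = 2  # assumed cut-off; a real corpus would need a larger value


def generalisations(verb, deps):
    """Yield every skeleton a clause matches: each dependent is dropped,
    kept as an open slot (relation only) or kept with its content word."""
    deps = sorted(deps)
    for choice in product(("drop", "open", "full"), repeat=len(deps)):
        yield verb, frozenset(
            (rel, word if how == "full" else None)
            for (rel, word), how in zip(deps, choice) if how != "drop")


def length(skeleton):
    """Number of units: one per slot plus one per fixed content word."""
    return sum(1 if word is None else 2 for _, word in skeleton[1])


def shorter_neighbours(skeleton):
    """Skeletons obtained by removing exactly one unit."""
    verb, deps = skeleton
    for rel, word in deps:
        rest = deps - {(rel, word)}
        # Drop an open slot entirely, or free a fixed word into an open slot.
        yield (verb, rest) if word is None else (verb, rest | {(rel, None)})


def sort_key(skeleton):
    return sorted((rel, word or "") for rel, word in skeleton[1])


def extract(clauses, threshold=THRESHOLD):
    support, freq = Counter(), Counter()
    for verb, deps in clauses:
        freq[(verb, frozenset(deps))] += 1           # fully lexicalised form
        support.update(generalisations(verb, deps))  # clauses matching each skeleton
    kept = {}
    for n in range(max(map(length, freq), default=0), -1, -1):
        for sk in [s for s in freq if length(s) == n]:
            if freq[sk] >= threshold:
                kept[sk] = freq[sk]                  # frequent enough: keep
                continue
            neighbours = list(shorter_neighbours(sk))
            if neighbours:                           # too rare: back off
                best = max(neighbours, key=lambda s: (support[s], sort_key(s)))
                freq[best] += freq[sk]
    return kept


# Invented toy clauses for the verb tart (hold, keep).
CLAUSES = [
    ("tart", {("-t", "igény"), ("-ra", "ház")}),
    ("tart", {("-t", "igény"), ("-ra", "föld")}),
    ("tart", {("-t", "igény"), ("-ra", "örökség")}),
    ("tart", {("-t", "kutya")}),
    ("tart", {("-t", "macska")}),
]

for (verb, deps), count in extract(CLAUSES).items():
    print(verb, sorted(deps, key=lambda d: d[0]), count)
```

On this toy input the sketch keeps two skeletons: tart with a fixed object igény and an open -ra slot (an LXD plus an LFD, i.e. igényt tart vmire), and tart with an open -t slot only (tart vmit). The first shows how a content word that recurs stays fixed, while the varying words in the -ra slot are generalised away.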
Applications

The list of verb phrase constructions provided by the algorithm is directly applicable in the creation of a dictionary of verb phrase constructions. Arranging the constructions around verbs, we obtain automatically created raw dictionary entries. To reach the quality of a real dictionary, some manual lexicographic work has to be done. This is not a labour-intensive step: the manual lexicographic work is limited to checking constructions and selecting examples, so the dictionary can be created fast and on a small budget. The dictionary is a valence dictionary, a collocation dictionary and a frequency dictionary at the same time. Its sophisticated indexes allow verb phrase constructions to be compared according to several aspects. The steps of dictionary creation, the dictionary itself and its possible applications are covered in section 4.2 of the dissertation; its significance is stated in the following thesis:

Thesis 5. I created a dictionary of a new kind, whose basic units are not words but expressions: verb phrase constructions. The path from raw text to the raw dictionary entries is covered by purely automatic natural language processing tools. The most important step is the algorithm for extracting typical verb phrase constructions (Thesis 4), which automates the collection of the dictionary material. I showed that this lexical acquisition method is well suited
for dictionary creation: the final dictionary truly contains the valences and verbal expressions that are typical in the Hungarian language. In conclusion, a new kind of learner's dictionary was created in this way, which highlights the most important verbal meanings and allows the language learner to speak idiomatically, not just grammatically correctly.

Publications related to the thesis: (Sass et al., 2010a), (Sass and Pajzs, 2010b), (Pajzs and Sass, 2010), (Sass and Pajzs, 2010c)

How can we use such a dictionary to support language learning, or when we want to say something in Hungarian? With the help of the dictionary, verb-noun collocations can be discovered: the nouns which usually collocate with a given verb, and also the verbs which usually collocate with a given noun, can be determined (using the index of fixed words). Suppose that we are native speakers of English and want to speak Hungarian. If we search for the translation of meet the requirements, knowing that requirement is követelmény in Hungarian, we will find the appropriate verb at követelmény, which is megfelel (this is not the literal translation of meet, which would be találkozik). The dictionary (Sass et al., 2010a) is available; it is published by the Tinta Publishing House.

Language independence greatly increases the significance of an automatic language processing method. The language independence of my approach depends on whether the representation can be created in a language-independent way. The tools and methods based on the representation (the corpus query system, the lexical acquisition algorithm, and the automatic part of the dictionary creation, as described in the previous theses) work automatically if the representation is at hand. As the representation relies only on the fact that there is predicate argument
structure in human languages, it is expected that the representation can be created for many languages. This expectation was supported by experiments with languages whose structure differs from that of Hungarian, namely Danish and Serbian. The language independence of my approach is covered in section 5.1 of the dissertation; the next thesis contains the results of that section:

Thesis 6. I showed that the unified representation (Thesis 1) is language independent: it can be created for several languages. This result essentially depends on the fact that utterances can generally be decomposed into units (clauses) which contain a verb and its dependents, and that the dependency relationship between the verb and a dependent can be specified. Given the representation, the Verb Argument Browser (Thesis 3) can be prepared for a language with little effort. The algorithm for extracting typical verb phrase constructions (Thesis 4) can run on any corpus represented according to the model, so the collection of verb phrase constructions is feasible independently of language. Ultimately, the dictionary (Thesis 5) can also be created based on this algorithm by investing a limited amount of manual lexicographic work.

Publications related to the thesis: (Sass, 2009d)

Using my method, new learner's dictionaries similar to the Hungarian version described in the previous thesis can be prepared for foreign languages that are popular among language learners in Hungary.
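The claim that tools built on the representation do not depend on the language can be illustrated with a small Python sketch of a query in the style of the Verb Argument Browser. This is not the Browser's actual implementation: the function `typical_fillers`, the `OBJ` label for an object marked by word order, and all sample clauses are hypothetical. English is used here instead of Danish or Serbian purely for readability. The point is that the same function runs unchanged whether the relation markers are Hungarian case markers or English prepositions.

```python
from collections import Counter


def typical_fillers(clauses, verb, relation, fixed=()):
    """Count the content units found in the `relation` slot of `verb`,
    looking only at clauses that contain every (relation, word) in `fixed`."""
    fixed = set(fixed)
    counts = Counter()
    for clause_verb, deps in clauses:
        if clause_verb != verb or not fixed <= deps:
            continue
        for rel, word in deps:
            if rel == relation:
                counts[word] += 1
    return counts


# Invented Hungarian clauses: relation markers are case markers.
HUNGARIAN = [
    ("vesz", {("-t", "rész"), ("-ban", "verseny")}),
    ("vesz", {("-t", "rész"), ("-ban", "munka")}),
    ("vesz", {("-t", "rész"), ("-ban", "verseny")}),
    ("vesz", {("-t", "kenyér")}),
]

# Invented English clauses: relation markers are prepositions or a
# word-order based label (OBJ is an assumed name for the direct object).
ENGLISH = [
    ("take", {("OBJ", "part"), ("in", "meeting")}),
    ("take", {("OBJ", "part"), ("in", "project")}),
    ("take", {("OBJ", "part"), ("in", "meeting")}),
    ("take", {("OBJ", "book"), ("from", "shelf")}),
]

# What typically fills the open slot of részt vesz vmiben / take part in sg?
print(typical_fillers(HUNGARIAN, "vesz", "-ban", fixed=[("-t", "rész")]))
print(typical_fillers(ENGLISH, "take", "in", fixed=[("OBJ", "part")]))
```

Only the preprocessing that produces the (verb, dependents) pairs is language specific in this sketch; the query itself never inspects what kind of marker it is comparing.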
The model (Thesis 1) can be extended in several ways; specifically, some complex structures can be traced back to the 1-level deep dependency tree shown in Fig. 1. The most interesting question is whether we can produce a representation which is built from a parallel corpus, and consequently contains parallel clauses and parallel verb phrase constructions (constructions and their translations), but which at the same time has formally the same structure as the original model, so that the lexical acquisition method can take it as input. In this way we could gain a method that extracts parallel constructions by applying the original extraction method: we would obtain translations of verb phrase constructions. Extensions of the model are discussed in sections 5.2 and 5.3 of the dissertation, and I report on the application of the method to parallel constructions in section 5.4. The last thesis sums up this promising direction.

Thesis 7. I showed that the common representation of a parallel clause (two clauses in two different languages corresponding to each other) can be produced as a 1-level deep dependency tree, as the original model requires: the central unit becomes a pair consisting of the two verbs (in the two languages), and the dependents are assigned to this central unit as a combined set. In this way a representation (for parallel corpora) that is formally similar to the original representation (for monolingual corpora) can be obtained. The lexical acquisition method (Thesis 4) can run on this representation directly, extracting bilingual, parallel verb phrase constructions. The method is able to match bilingual constructions that are asymmetric, that is, that have a completely different structure in the two languages.

Publications related to the thesis: (Sass, 2010d)
I conducted the investigations of parallel verb phrase constructions on a Dutch-French parallel corpus. For example, among the results I obtained the asymmetric pair of Dutch nemen deel aan and French participer à (both mean take part in). We can see that what is expressed by a complex verb in Dutch is expressed by a single word, a simple verb, in French. In the future this method can be used in the creation of new bilingual dictionaries which facilitate language learning by matching verb phrase constructions extracted from language use. Working out the details of this kind of bilingual dictionary creation is a task for the future; my work is an important step in this direction.
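The parallel representation of Thesis 7 can be sketched in Python as follows. The sketch assumes that the two clauses are already aligned and analysed. Prefixing each relation marker with a language code, so that the combined dependent set stays unambiguous, is an assumption of this illustration, as are the function names, the `OBJ` label and the invented fillers vergadering and réunion (both "meeting"). The verb pair nemen deel aan / participer à is the one reported above.

```python
def merge_parallel(clause_a, clause_b, lang_a, lang_b):
    """Combine two aligned clauses into one 1-level structure: the central
    unit is the pair of verbs, the dependents form a single combined set."""
    verb_a, deps_a = clause_a
    verb_b, deps_b = clause_b
    combined = frozenset(
        {(f"{lang_a}:{rel}", word) for rel, word in deps_a}
        | {(f"{lang_b}:{rel}", word) for rel, word in deps_b})
    return (verb_a, verb_b), combined


def matches(clause, construction):
    """True if every unit of the construction occurs in the clause;
    an open slot (content None) matches any content word."""
    verb, deps = clause
    c_verb, c_deps = construction
    relations = {rel for rel, _ in deps}
    return verb == c_verb and all(
        rel in relations if word is None else (rel, word) in deps
        for rel, word in c_deps)


# One aligned clause pair with invented fillers.
dutch = ("nemen", {("OBJ", "deel"), ("aan", "vergadering")})
french = ("participer", {("à", "réunion")})

parallel = merge_parallel(dutch, french, "nl", "fr")

# The asymmetric pair: a fixed object plus an open slot on the Dutch side,
# a single open slot on the French side.
PAIR = (("nemen", "participer"),
        frozenset({("nl:OBJ", "deel"), ("nl:aan", None), ("fr:à", None)}))

print(parallel[0], sorted(parallel[1]))
print(matches(parallel, PAIR))
```

The merged clause has the same shape as a monolingual one (a central unit plus a set of relation marker and content unit pairs), which is what would allow an extraction procedure written for the monolingual representation to be reused on parallel data.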
Acknowledgements

I am grateful to my wife, Dóri, to my children, Mici, Csöpi, Lencsi and Jáni, and to my extended family for their constant support and encouragement. I would like to thank my supervisor, Gábor Prószéky; my boss, Tamás Váradi; my closest colleague, Csaba Oravecz; my lexicographer colleague, Júlia Pajzs; and the leaders of the Doctoral School, Tamás Roska and Péter Szolgay, for their professional support and help. I would like to express my gratitude to my friends, colleagues and everybody who helped and supported me with their work, ideas, advice and striking insights during the past years and during the time of thesis writing.
Author's publications

Book

Sass Bálint, Váradi Tamás, Pajzs Júlia, Kiss Margit 2010a. Magyar igei szerkezetek: A leggyakoribb vonzatok és szókapcsolatok szótára [Hungarian Verb Phrase Constructions: A Dictionary of Most Frequent Verb Frames and Collocations]. Tinta, Budapest.

Journal article

Sass Bálint, Pajzs Júlia 2010b. Igei szerkezetek gyakorisági szótára: félautomatikus szótárkészítés nyelvtechnológiai eszközök segítségével [Frequency Dictionary of Verb Phrase Constructions: Semiautomatic Lexicography Using NLP Tools]. Alkalmazott Nyelvtudomány [Applied Linguistics], 2010(1-2): 5-32.

Book chapter

Sass Bálint 2006a. Extracting Idiomatic Hungarian Verb Frames. In Salakoski, Tapio; Ginter, Filip; Pyysalo, Sampo; Pahikkala, Tapio (eds.): Advances in Natural Language Processing, Springer, Berlin, Heidelberg, New York. Lecture Notes in Computer Science, Vol
Sass Bálint 2008. The Verb Argument Browser. In Sojka, Petr; Horák, Aleš; Kopecek, Ivan; Pala, Karel (eds.): Text, Speech and Dialogue, Springer, Berlin, Heidelberg, New York. Lecture Notes in Computer Science, Vol

Sass Bálint 2009a. Korpusznyelvészeti eszköz a magyar igék bővítményszerkezetének vizsgálatára [Corpus Linguistic Tool for Investigating Argument Structure of Hungarian Verbs]. In Sinkovics Balázs (ed.): LingDok 8. Nyelvész-doktoranduszok dolgozatai [Papers of PhD students in Linguistics], JATEPress, Szeged.

Sass Bálint 2009b. Mazsola: eszköz a magyar igék bővítményszerkezetének vizsgálatára [Mazsola: a Tool for Investigating Argument Structure of Hungarian Verbs]. In Váradi Tamás (ed.): Válogatás az I. Alkalmazott Nyelvészeti Doktorandusz Konferencia előadásaiból [A Selection of Papers of Hungarian Student Conference on Applied Linguistics], RIL HAS, Budapest.

Sass Bálint, Pajzs Júlia 2010c. FDVC: Creating a Corpus-driven Frequency Dictionary of Verb Phrase Constructions. In Granger, Sylviane; Paquot, Magali (eds.): eLexicography in the 21st century: New challenges, new applications. Proceedings of eLEX 2009, Cahiers du CENTAL 7. Presses universitaires de Louvain, Louvain-la-Neuve, Belgium.

Proceedings of International Conferences

Pajzs Júlia, Sass Bálint 2010. Towards Semi-automatic Dictionary Making. In Proceedings of the XIV. EURALEX International Congress.

Sass Bálint. First Attempt to Automatically Generate Hungarian Semantic Verb Classes. In Proceedings of the 4th Corpus Linguistics conference, Birmingham.
Sass Bálint 2009c. A Unified Method for Extracting Simple and Multiword Verbs with Valence Information and Application for Hungarian. In Proceedings of RANLP 2009, Borovets, Bulgaria.

Sass Bálint 2009d. Verb Argument Browser for Danish. In Proceedings of the 17th Nordic Conference of Computational Linguistics, NoDaLiDa 2009, Odense, Denmark.

Proceedings of Hungarian Conferences

Sass Bálint 2005. Vonzatkeretek a Magyar Nemzeti Szövegtárban [Verb Frames in the Hungarian National Corpus]. In Alexin Zoltán, Csendes Dóra (eds.): III. Magyar Számítógépes Nyelvészeti Konferencia [3rd Hungarian Conference on Computational Linguistics] (MSZNY2005), Szeged.

Sass Bálint 2006b. Igei vonzatkeretek az MNSZ tagmondataiban [Verb Frames in the Clauses of the Hungarian National Corpus]. In Alexin Zoltán, Csendes Dóra (eds.): IV. Magyar Számítógépes Nyelvészeti Konferencia [4th Hungarian Conference on Computational Linguistics] (MSZNY2006), 15-21, Szeged.

Sass Bálint 2010d. Párhuzamos igei szerkezetek közvetlen kinyerése párhuzamos korpuszból [Extracting Parallel Verb Phrase Constructions from Parallel Corpus]. In Tanács Attila, Vincze Veronika (eds.): VII. Magyar Számítógépes Nyelvészeti Konferencia [7th Hungarian Conference on Computational Linguistics] (MSZNY2010), SZTE, Szeged.
DiCE in the web: An online Spanish collocation dictionary
GRANGER, S.; PAQUOT, M. (EDS.). 2010. ELEXICOGRAPHY IN THE 21ST CENTURY: NEW CHALLENGES, NEW APPLICATIONS. PROCEEDINGS OF ELEX2009, LOUVAIN-LA-NEUVE, 22-24 OCTOBER 2009. CAHIERS DU CENTAL 7. LOUVAIN-LA-NEUVE,
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia
Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Outline I What is CALL? (scott) II Popular language learning sites (stella) Livemocha.com (stacia) III IV Specific sites
Text-To-Speech Technologies for Mobile Telephony Services
Text-To-Speech Technologies for Mobile Telephony Services Paulseph-John Farrugia Department of Computer Science and AI, University of Malta Abstract. Text-To-Speech (TTS) systems aim to transform arbitrary
Hybrid Strategies. for better products and shorter time-to-market
Hybrid Strategies for better products and shorter time-to-market Background Manufacturer of language technology software & services Spin-off of the research center of Germany/Heidelberg Founded in 1999,
Annotation Guidelines for Dutch-English Word Alignment
Annotation Guidelines for Dutch-English Word Alignment version 1.0 LT3 Technical Report LT3 10-01 Lieve Macken LT3 Language and Translation Technology Team Faculty of Translation Studies University College
COMPUTATIONAL DATA ANALYSIS FOR SYNTAX
COLING 82, J. Horeck~ (ed.j North-Holland Publishing Compa~y Academia, 1982 COMPUTATIONAL DATA ANALYSIS FOR SYNTAX Ludmila UhliFova - Zva Nebeska - Jan Kralik Czech Language Institute Czechoslovak Academy
CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
Special Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
MA in English language teaching Pázmány Péter Catholic University *** List of courses and course descriptions ***
MA in English language teaching Pázmány Péter Catholic University *** List of courses and course descriptions *** Code Course title Contact hours per term Number of credits BMNAT10100 Applied linguistics
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
Comprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION 2. WHAT IS MACHINE TRANSLATION? 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY 3.1 THE BEST MT TECHNOLOGY IN THE MARKET
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
Moving Enterprise Applications into VoiceXML. May 2002
Moving Enterprise Applications into VoiceXML May 2002 ViaFone Overview ViaFone connects mobile employees to enterprise systems to improve overall business performance. Enterprise Application Focus;
Why major in linguistics (and what does a linguist do)?
Why major in linguistics (and what does a linguist do)? Written by Monica Macaulay and Kristen Syrett What is linguistics? If you are considering a linguistics major, you probably already know at least
LGPLLR : an open source license for NLP (Natural Language Processing) Sébastien Paumier. Université Paris-Est Marne-la-Vallée
LGPLLR : an open source license for NLP (Natural Language Processing) Sébastien Paumier Université Paris-Est Marne-la-Vallée [email protected] Penguin from http://tux.crystalxp.net/ 1 Linguistic data
Learning Translations of Named-Entity Phrases from Parallel Corpora
Learning Translations of Named-Entity Phrases from Parallel Corpora Robert C. Moore Microsoft Research Redmond, WA 98052, USA [email protected] Abstract We develop a new approach to learning phrase
Multi-language e-Discovery: Three Critical Steps for Litigating in a Global Economy
Multi-language e-Discovery: Three Critical Steps for Litigating in a Global Economy Introduction e-Discovery has become a pressure point in many boardrooms. Companies with international operations
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive
stress, intonation and pauses and pronounce English sounds correctly. (b) To speak accurately to the listener(s) about one's thoughts and feelings,
Section 9 Foreign Languages I. OVERALL OBJECTIVE To develop students' basic communication abilities such as listening, speaking, reading and writing, deepening their understanding of language and culture
Pragmatic analysis of hotel websites in terms of interpersonal relationships. Theses of the PhD dissertation by. Kovács Péterné Dudás Andrea
Pragmatic analysis of hotel websites in terms of interpersonal relationships Theses of the PhD dissertation by Kovács Péterné Dudás Andrea Eötvös Loránd University Faculty of Humanities Doctoral School
Application of Natural Language Interface to a Machine Translation Problem
Application of Natural Language Interface to a Machine Translation Problem Heidi M. Johnson Yukiko Sekine John S. White Martin Marietta Corporation Gil C. Kim Korean Advanced Institute of Science and Technology
The PALAVRAS parser and its Linguateca applications - a mutually productive relationship
The PALAVRAS parser and its Linguateca applications - a mutually productive relationship Eckhard Bick University of Southern Denmark [email protected] Outline Flow chart Linguateca Palavras History
THE BACHELOR'S DEGREE IN SPANISH
Academic regulations for THE BACHELOR'S DEGREE IN SPANISH THE FACULTY OF HUMANITIES THE UNIVERSITY OF AARHUS 2007 1 Framework conditions Heading Title Prepared by Effective date Prescribed points Text
Reading Listening and speaking Writing. Reading Listening and speaking Writing. Grammar in context: present Identifying the relevance of
Acknowledgements Page 3 Introduction Page 8 Academic orientation Page 10 Setting study goals in academic English Focusing on academic study Reading and writing in academic English Attending lectures Studying
Brill's rule-based PoS tagger
Beáta Megyesi Department of Linguistics University of Stockholm Extract from D-level thesis (section 3) Brill's rule-based PoS tagger Beáta Megyesi Eric Brill introduced a PoS tagger in 1992 that was based
Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing
Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing Lourdes Araujo Dpto. Sistemas Informáticos y Programación, Univ. Complutense, Madrid 28040, SPAIN (email: [email protected])
GRASP: Grammar- and Syntax-based Pattern-Finder for Collocation and Phrase Learning
PACLIC 24 Proceedings 357 GRASP: Grammar- and Syntax-based Pattern-Finder for Collocation and Phrase Learning Mei-hua Chen a, Chung-chi Huang a, Shih-ting Huang b, and Jason S. Chang b a Institute of Information
Chapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software
Semantic Research using Natural Language Processing at Scale; A continued look behind the scenes of Semantic Insights Research Assistant and Research Librarian Presented to The Federal Big Data Working
Semantic Analysis of Natural Language Queries Using Domain Ontology for Information Access from Database
I.J. Intelligent Systems and Applications, 2013, 12, 81-90 Published Online November 2013 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2013.12.07 Semantic Analysis of Natural Language Queries
Component visualization methods for large legacy software in C/C++
Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University [email protected]
Master of Arts in Linguistics Syllabus
Master of Arts in Linguistics Syllabus Applicants shall hold a Bachelor's degree with Honours of this University or another qualification of equivalent standard from this University or from another university
ENGLISH CHARTERS CONTENTS. LEVEL A1 CHARTER. LEVEL A2 CHARTER. LEVEL B1 CHARTER. LEVEL B2 CHARTER
ENGLISH CHARTERS CONTENTS LEVEL A1 CHARTER LEVEL A2 CHARTER LEVEL B1 CHARTER LEVEL B2 CHARTER LEVEL C1 CHARTER Updated 11 June 2014 A1 Skills-based
Overview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents a standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
Testing an electronic collocation dictionary interface: Diccionario de Colocaciones del Español
Testing an electronic collocation dictionary interface: Diccionario de Colocaciones del Español Orsolya Vincze, Margarita Alonso Ramos Universidade da Coruña, Campus da Zapateira s/n, A Coruña 15071, Spain
CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING
CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING Mary-Elizabeth ( M-E ) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc. Is there valuable
A chart generator for the Dutch Alpino grammar
June 10, 2009 Introduction Parsing: determining the grammatical structure of a sentence. Semantics: a parser can build a representation of meaning (semantics) as a side-effect of parsing a sentence. Generation:
Overview of the TACITUS Project
Overview of the TACITUS Project Jerry R. Hobbs Artificial Intelligence Center SRI International 1 Aims of the Project The specific aim of the TACITUS project is to develop interpretation processes for
Veronika VINCZE, PhD. PERSONAL DATA Date of birth: 1 July 1981 Nationality: Hungarian
Veronika VINCZE, PhD CONTACT INFORMATION Hungarian Academy of Sciences Research Group on Artificial Intelligence Tisza Lajos krt. 103., 6720 Szeged, Hungary Phone: +36 62 54 41 40 Mobile: +36 70 22 99
ICAME Journal No. 24. Reviews
ICAME Journal No. 24 Reviews Collins COBUILD Grammar Patterns 2: Nouns and Adjectives, edited by Gill Francis, Susan Hunston, and Elizabeth Manning, with John Sinclair as the founding editor-in-chief of
Research Portfolio. Beáta B. Megyesi January 8, 2007
Research Portfolio Beáta B. Megyesi January 8, 2007 Research Activities Research activities focus on mainly four areas: Natural language processing During the last ten years, since I started my academic
Doctoral School of Historical Sciences Dr. Székely Gábor professor Program of Assyriology Dr. Dezső Tamás habilitate docent
Doctoral School of Historical Sciences Dr. Székely Gábor professor Program of Assyriology Dr. Dezső Tamás habilitate docent The theses of the Dissertation Nominal and Verbal Plurality in Sumerian: A Morphosemantic
Reliability on ARMA Examinations: How we do it at Miklós Zrínyi National Defence University
AARMS Vol. 3, No. 2 (2004) 319 325 EDUCATION Reliability on ARMA Examinations: How we do it at Miklós Zrínyi National Defence University ILONA VÁRNAINÉ KIS Miklós Zrínyi National Defence University, Budapest,
Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems
Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang
Sense-Tagging Verbs in English and Chinese Hoa Trang Dang Department of Computer and Information Sciences University of Pennsylvania [email protected] October 30, 2003 Outline English sense-tagging
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering
Domain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql
Domain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql Xiaofeng Meng 1,2, Yong Zhou 1, and Shan Wang 1 1 College of Information, Renmin University of China, Beijing 100872
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
Outline of today's lecture
Outline of today's lecture Generative grammar Simple context free grammars Probabilistic CFGs Formalism power requirements Parsing Modelling syntactic structure of phrases and sentences. Why is it useful?
Simple maths for keywords
Simple maths for keywords Adam Kilgarriff Lexical Computing Ltd [email protected] Abstract We present a simple method for identifying keywords of one corpus vs. another. There is no one-size-fits-all
Transaction-Typed Points TTPoints
Transaction-Typed Points TTPoints version: 1.0 Technical Report RA-8/2011 Mirosław Ochodek Institute of Computing Science Poznan University of Technology Project operated within the Foundation for Polish
Methods for the Extraction of Hungarian Multi-Word Lexemes
Methods for the Extraction of Hungarian Multi-Word Lexemes Balázs Kis*, Begoña Villada Moirón, Tamás Bíró, Gosse Bouma, Gábor Pohl*, Gábor Ugray*, John Nerbonne Rijksuniversiteit Groningen * MorphoLogic,
Data Deduplication in Slovak Corpora
Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, Slovakia Abstract. Our paper describes our experience in deduplication of a Slovak corpus. Two methods of deduplication a plain
Integration of Process Simulation and Data Mining Techniques for the Analysis and Optimization of Process Systems. Balazs Balasko
Theses of the doctoral (PhD) dissertation Integration of Process Simulation and Data Mining Techniques for the Analysis and Optimization of Process Systems Balazs Balasko University of Pannonia PhD School
PAGE(S) WHERE TAUGHT (If submission is not a book, cite appropriate location(s))
Prentice Hall: Sendas Literarias 1, Español Completo Para Hispanohablantes with Guía del maestro 2001 Students will exhibit these skills at the end of a K 12 sequence. Communication: Communicate in Languages
Modern foreign languages
Modern foreign languages Programme of study for key stage 3 and attainment targets (This is an extract from The National Curriculum 2007) Crown copyright 2007 Qualifications and Curriculum Authority 2007
Section 8 Foreign Languages. Article 1 OVERALL OBJECTIVE
Section 8 Foreign Languages Article 1 OVERALL OBJECTIVE To develop students' communication abilities such as accurately understanding and appropriately conveying information, ideas, etc., deepening their understanding
TEACHING INTERCULTURAL COMMUNICATIVE COMPETENCE IN BUSINESS CLASSES
TEACHING INTERCULTURAL COMMUNICATIVE COMPETENCE IN BUSINESS CLASSES Roxana CIOLĂNEANU Abstract Teaching a foreign language goes beyond teaching the language itself. Language is rooted in culture; it
User studies, user behaviour and user involvement evidence and experience from The Danish Dictionary
User studies, user behaviour and user involvement evidence and experience from The Danish Dictionary Henrik Lorentzen, Lars Trap-Jensen Society for Danish Language and Literature, Copenhagen, Denmark E-mail:
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania [email protected]
Master Degree Project Ideas (Fall 2014) Proposed By Faculty Department of Information Systems College of Computer Sciences and Information Technology
Master Degree Project Ideas (Fall 2014) Proposed By Faculty Department of Information Systems College of Computer Sciences and Information Technology Dr. Maruf Hasan MS CIS Program Potential Project
Study Plan for Master of Arts in Applied Linguistics
Study Plan for Master of Arts in Applied Linguistics Master of Arts in Applied Linguistics is awarded by the Faculty of Graduate Studies at Jordan University of Science and Technology (JUST) upon the fulfillment
Listening Student Learning Outcomes
Listening Student Learning Outcomes Goals for Learning Has sufficient vocabulary to comprehend an unsimplified academic lecture Can paraphrase academic discourse effectively in writing and discussion from
DEPENDENCY PARSING JOAKIM NIVRE
DEPENDENCY PARSING JOAKIM NIVRE Contents 1. Dependency Trees 2. Arc-Factored Models 3. Online Learning 4. Eisner's Algorithm 5. Spanning Tree Parsing References A dependency parser analyzes
How To Write A Summary Of A Review
PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,
Identifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 [email protected] Christopher D. Manning Department of
Timeline (1) Text Mining 2004-2005 Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?
Text Mining 2004-2005 Master TKI Antal van den Bosch and Walter Daelemans http://ilk.uvt.nl/~antalb/textmining/ Tuesday, 10.45-12.30, SZ33 Timeline (1) [1 February 2005] Introduction (WD) [15 February 2005]
EFL Learners' Synonymous Errors: A Case Study of Glad and Happy
ISSN 1798-4769 Journal of Language Teaching and Research, Vol. 1, No. 1, pp. 1-7, January 2010 Manufactured in Finland. doi:10.4304/jltr.1.1.1-7 EFL Learners' Synonymous Errors: A Case Study of Glad and
ON GETTING THE MOST OUT OF INTERNET RESOURCES TO RAISE TRANSLATION QUALITY OF PROFESSIONAL DOCUMENTATION
General and Professional Education 3/2013 pp. 21-27 ISSN 2084-1469 ON GETTING THE MOST OUT OF INTERNET RESOURCES TO RAISE TRANSLATION QUALITY OF PROFESSIONAL DOCUMENTATION Svetlana Sheremetyeva Department
Change by Successful Projects Csaba Deák Associate Professor University of Miskolc [email protected]
Change by Successful Projects Csaba Deák Associate Professor University of Miskolc [email protected] Summary The article attempts to answer the management and organizational questions which have arisen
Using Expert System in the Military Technology Research and Development
MIKLÓS ZRÍNYI NATIONAL DEFENSE UNIVERSITY Doctoral Committee MAJOR GÁBOR HANGYA Using Expert System in the Military Technology Research and Development author s review and official critiques of the entitled
Visual Interactive Syntax Learning: A Case of Blended Learning
Visual Interactive Syntax Learning: A Case of Blended Learning Jane Vinther, ph.d., Institut for Sprog og Kommunikation, Syddansk Universitet Jane Vinther is cand.mag. in English (major) and general pedagogy
SignLEF: Sign Languages within the European Framework of Reference for Languages
SignLEF: Sign Languages within the European Framework of Reference for Languages Simone Greiner-Ogris, Franz Dotter Centre for Sign Language and Deaf Communication, Alpen Adria Universität Klagenfurt (Austria)
CS4025: Pragmatics. Resolving referring Expressions Interpreting intention in dialogue Conversational Implicature
CS4025: Pragmatics Resolving referring Expressions Interpreting intention in dialogue Conversational Implicature For more info: J&M, chap 18,19 in 1st ed; 21,24 in 2nd Computing Science, University of
Natural Language Processing: current projects and research at the IXA Group
Natural Language Processing: current projects and research at the IXA Group IXA Research Group on NLP University of the Basque Country Xabier Artola Zubillaga Motivation A language that seeks to survive
The Oxford Learner's Dictionary of Academic English
ISEJ Advertorial The Oxford Learner's Dictionary of Academic English Oxford University Press The Oxford Learner's Dictionary of Academic English (OLDAE) is a brand new learner's dictionary aimed at students
