Topological Field Chunking in German
|
|
|
- Letitia Hodges
- 10 years ago
- Views:
Transcription
1 Topological Field Chunking in German Jorn Veenstra, Frank H. Müller, Tylman Ule ESSLLI Summerschool Workshop on Machine Learning Approaches in Computational Linguistics August 7, 2002 Topological Field Chunking in German p.1/35
2 German English Sentence structure in English lends itself to chunk analysis, followed by grammatical role assignment. In other Germanic languages the sentence structure can be different. NPs structure is different, VPs are less clustered. The strategy proposed the USA was accepted the NATO yesterday. by by Die von den USA vorgeschlagene Strategie wurde gestern von der NATO übernommen. Topological Field Chunking in German p.2/35
3 Topological Field Theory (TopF) Topological Field Theory gives a handle to divide the sentence into labeled verbal and non-verbal parts: Non-Verbal parts: VF: Vorfeld, MF: Mittelfeld, NF: Nachfeld Verbal parts: CF: Complementizer, LK: Linke Klammer, RK: Rechte Klammer Die von den USA vorgeschlagene Strategie wurde gestern von der NATO übernommen. Topological Field Chunking in German p.3/35
4 Overview Sentence Structure in German Topological Field Theory The Data: TüBa-D/Z Three TopFChunkers: FSA, PCFG, MBL Results Future Research Topological Field Chunking in German p.4/35
5 Sentence Structure in German I Verbs can be far apart in German sentences In so-called V2 sentences the finite verb is always in second position. The other verbal elements is further up in the sentence. Ich habe Spaghetti gestern mit meinen Freunden gegessen. I have yesterday with my friends spaghetti eaten. Topological Field Chunking in German p.5/35
6 Sentence Structure in German II The finite verb will stay in second position: Mit meinen Freunden Spaghetti gegessen habe. ich gestern With my friends have I yesterday spaghetti eaten. Mit meinen Freunden is in first position now, forcing the subject Ich after the verb. Topological Field Chunking in German p.6/35
7 Sentence Structure in German III In the Verb-Last constructions, verbs are clustered at the end of the sentence. This construction occurs e.g. in relative clauses: Er glaubt, dass ich gestern mit meinen Freunden Spaghetti gegessen habe He thinks that I yesterday with my friends spaghetti eaten have.. Topological Field Chunking in German p.7/35
8 Sentence Structure in German IV In yes/no-questions the finite verb is in first position, the so-called V1 sentences: Hast Spaghetti du gestern mit deinen Freunden gegessen? Have you yesterday with your friends spaghetti eaten? Topological Field Chunking in German p.8/35
9 Sentence Structure in German V There can be more after the last verbal chunk: Ich habe gestern Spaghetti in einer Pizzeria. gegessen I have yesterday spaghetti eaten in a pizzeria. Topological Field Chunking in German p.9/35
10 Sentence Structure in German VI The finite verb forms its own constituent. However, the non-finite verbal constituent can contain many verbs, the so-called verb cluster. Das sind allerdings Gelder, die vom Schatzmeister hintergezogen worden sein sollen. These are, however, funds which by the treasurer defrauded been have should. Topological Field Chunking in German p.10/35
11 Topological Fields Descriptive model of the constituent order in German(ic), the TopF structure gives the skeleton of the sentence. Topological fields describe sections with respect to the distributional properties of the verbs in German sentences. CF: complementizer, LK: left bracket, RK: right bracket, VF: initial field, MF: middle field, NF: final field The TopFs CF, LK and RK form the verbal frame relative to which the other fields are defined. The VF, MF and NF are defined relative to the Topological Field Chunking in German p.11/35 verbal fields.
12 Sentence Structure in German type compl. field topological fields verb complex (VL) (CF) (MF) (RK) (NF) finite verb verb complex (V1) (LK) (MF) (RK) (NF) finite verb verb complex (V2) (VF) (LK) (MF) (RK) (NF) left part right part Figure 1: VL=Verb-Last; V1=Verb-First; V2= Verb-Second; VF=Initial Field; MF=Middle Field; NF=Final Field Topological Field Chunking in German p.12/35
13 An example of an analysed sentence SIMPX 516 VF ON 515 SIMPX MF OA PRED C MF VC LK ADJX ON OD OA HD HD NX NX NX VXFIN VXFIN NX ADJX ADJX HD HD HD HD HD HD HD HD Wer seinem Publikum eine solche Reaktion prophezeit PWS PPOSAT NN ART PIDAT NN VVFIN $, VVFIN PRF KON ADJD KON ADJD $., macht sich entweder lächerlich Who his audience a such reaction predicts, makes himself or ridiculous or legendary. oder legendär. Topological Field Chunking in German p.13/35
14 Other Languages, e.g. Dutch Ik had willen eten gisteren met mijn vrienden pasta bij de pizzeria. I had yesterday with my friends pasta want eat at the pizzeria. Ik pasta had willen eten gisteren met mijn vrienden bij de pizzeria. Topological Field Chunking in German p.14/35
15 The Corpus: TüBa-D/Z German newspaper text from the Tageszeitung (taz). Annotation style comparable to VERBMOBIL about 7300 sentences annotated and POS-tagged with Topological Field structure Topological Field Chunking in German p.15/35
16 An example of an annotated sentence SIMPX 516 NF MOD 515 VF ON NX Katastrophenstimmung NN cngs:nsfw LK HD VXFIN herrscht VVFIN pnmt:3sis MF MOD ADVX HD HD HD HD HD HD HD erst ADV, $, wenn KOUS HD NX nichts PIS cng:*** 513 ADVX mehr ADV 514 zu PTKZU OV VXINF verheimlichen Catastrophe-mood prevails only, when nothing anymore to C hide is MF ON NX SIMPX VVINF VC HD VXFIN ist VAFIN pnmt:3sis. $. Topological Field Chunking in German p.16/35
17 The Task: TopF chunking Find a task with high accuracy to find the basic structure of German sentences. TopFChunking finds the most common verbal parts of the Topological Field structure: CF (compl), LK (left) and RK (right). This gives us a handle to find the basic structure of the sentence. TopFChunking is clearly defined, and can be expected to be performed with a high accuracy. Topological Field Chunking in German p.17/35
18 TopF chunking: some examples Die von den USA vorgeschlagene Strategie wurde gestern von der NATO übernommen. Ich habe gestern mit meinen Freunden Spaghetti gegessen. Das sind allerdings Gelder, die vom Schatzmeister hintergezogen worden sein sollen. Topological Field Chunking in German p.18/35
19 Three Chunkers Rule-based hand-crafted Finite State Automata (FSA), by Frank H. Müller. Rule-Based corpus trained PCFG: Probabilistic Context Free Grammars, by Tylman Ule Machine Learning Approach: Memory-Based Natural Language Processing, by Jorn Veenstra Topological Field Chunking in German p.19/35
20 Finite-State Automata I topological fields structure can be captured by regular expression grammar, thus, it can be translated into finite state automaton (FSA) FSA = automaton which accepts input symbols according to transition function FSA can act as a transducer POS tags are input of transducer and topological fields and POS tags are output transition function is described by hand-written rules Topological Field Chunking in German p.20/35
21 FSA II, a simplified example simple examples: eine Tatsache, [mit der] man leben kann ein Mensch, [dem] man vertrauen kann eine Frage, die [aufgeworfen werden könnte] recursion can be covered by iteration of FSAs (cascade) Topological Field Chunking in German p.21/35
22 FSA III, some POS tags APPR=preposition, PRELS=relative pronoun, APPO=postposition, VVPP=past participle, VAINF=auxiliary verb infinitive, VAFIN=auxiliary verb finite, VMFIN=modal verb finite Topological Field Chunking in German p.22/35
23 FSA IV, cannot be too simple The seemingly simple and effective FSA: is not correct. e.g. Ich [bin] [gegangen] I have gone or: Mit meinem Bruder [gespielt] [hab] ich nie. with my brother played have I never Topological Field Chunking in German p.23/35
24 Probabilistic Context Free Grammars The FSA approach uses hand-crafted non-probabilistic rules to recognise the TopFChunks. The PCFG approach extracts context-free, probabilistic rules from the corpus. The PCFG chunker uses POS information only. Used the LOPAR (Schmidt, 2000). Topological Field Chunking in German p.24/35
25 PCFG II Design rules over POS tags to predict TopF chunks. Extract CF rules from the corpus and count their frequencies. Assign probabilities to rules according to these frequencies. For parsing new sentences parse bottom up, prune unprobable parses and work out the most probable routes. Topological Field Chunking in German p.25/35
26 PCFG III: an example SIMPX 516 NF 515 MOD VF ON NX Katastrophenstimmung NN cngs:nsfw LK HD VXFIN herrscht VVFIN pnmt:3sis MF MOD ADVX HD HD HD HD HD HD HD erst ADV, $, wenn KOUS HD NX nichts PIS cng:*** 513 ADVX mehr ADV 514 zu PTKZU OV VXINF verheimlichen Catastrophe-mood prevails only, when nothing anymore to C hide is MF ON NX SIMPX VVINF VC HD VXFIN ist VAFIN pnmt:3sis. $. Topological Field Chunking in German p.26/35
27 Topological Field Chunking in German p.27/35 C SfS L PCFG IV: an example
28 Memory-Based Learning MBL is a machine learning approach to NLP Learning Stage is Memory-Based Classification stage is similarity-based We have used TiMBL, an MBL software tool. Approach TopFChunking as a word-by-word classification task, comparable to POS tagging and NP chunking. Topological Field Chunking in German p.28/35
29 Memory-Based Learning Assign to each word a tag: inside or outside one of the TopF chunks, or between two chunks. This gives us three tags (I, O & B) per TopF type (CF, LK & RK). Each word in the corpus data is annotated with one of these nine IOB tags. The MBL chunker is first trained on the corpus annotated with these IOB tags. We have used a context window of two words and POS tags to the left and to the right, and information gain to weight the feature relevance. Topological Field Chunking in German p.29/35
30 Memory-Based Learning C SfS L gelacht gestern hat Der Mann. gestern hat. Der Mann gelacht der DET Mann NN hat V gestern TEMP gelacht V --- I LK Topological Field Chunking in German p.30/35
31 Baseline As a baseline we have computer the most probable TopF IOB tag for each POS tag in the train set. We have used these TopT tags to chunk the test set. Topological Field Chunking in German p.31/35
32 Experimental setup first 5000 sentences as training data last 2300 sentences as test data FSA developer was not allowed to see the test set. Since the FSA and PCFG chunkers predicted too much, their output was converted to the TopF chunk format As the TüBa corpus is still under development, some changes were made during the development of the chunkers, this might have harmed the performance of the chunkers. Topological Field Chunking in German p.32/35
33 Results ALL LK VC C FSA TnT FSA gold PCFG TnT PCFG gold MBL TnT MBL gold baseline TnT baseline gold Topological Field Chunking in German p.33/35
34 Conclusions scores are high for all three approaches Many errors are caused by POS tag errors Since TopFChunking is a well defined task, rule-based and hand-crafted approaches work relatively well. Memory-Based Approaches seem to suffer more from sparse data than FSA and PCFG. Such a high performance gives a solid basis for parsing and other text analysis tasks, such as information extraction and grammatical function assignment. Topological Field Chunking in German p.34/35
35 Future Work Combining chunkers to optimise performance Annotating the full TopF structure of a sentence Sentence type classifier NP chunking with the German idiosyncrasies Grammatical function assignment Topological Field Chunking in German p.35/35
The finite verb and the clause: IP
Introduction to General Linguistics WS12/13 page 1 Syntax 6 The finite verb and the clause: Course teacher: Sam Featherston Important things you will learn in this section: The head of the clause The positions
Machine Learning for natural language processing
Machine Learning for natural language processing Introduction Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 13 Introduction Goal of machine learning: Automatically learn how to
German Language Resource Packet
German has three features of word order than do not exist in English: 1. The main verb must be the second element in the independent clause. This often requires an inversion of subject and verb. For example:
Corpuslinguistics Annis 2 -Corpus query tool searching in Falko
Corpuslinguistics Annis 2 -Corpus query tool searching in Falko Marc Reznicek, Anke Lüdeling, Amir Zeldes, Hagen Hirschmann [email protected]... ánd many more members of the HU corpus team
Search Engines Chapter 2 Architecture. 14.4.2011 Felix Naumann
Search Engines Chapter 2 Architecture 14.4.2011 Felix Naumann Overview 2 Basic Building Blocks Indexing Text Acquisition Text Transformation Index Creation Querying User Interaction Ranking Evaluation
Exemplar for Internal Achievement Standard. German Level 1
Exemplar for Internal Achievement Standard German Level 1 This exemplar supports assessment against: Achievement Standard 90885 Interact using spoken German to communicate personal information, ideas and
Satzstellung. Satzstellung Theorie. learning target. rules
Satzstellung learning target Aim of this topic is to explain how to arrange the different parts of a sentence in the correct order. I must admit it took quite a long time to handle this topic and find
Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov. 2008 [Folie 1]
Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds
German Language Support Package
German Language Support Package August 2014 Dear Parents and Students of Goethe International Charter School, Welcome to a new and exciting school year of great learning experiences and success! The key
German Language Support Package
German Language Support Package September 2014 Dear Parents and Students of Goethe International Charter School, Welcome to a new and exciting school year of great learning experiences and success! The
FOR TEACHERS ONLY The University of the State of New York
FOR TEACHERS ONLY The University of the State of New York REGENTS HIGH SCHOOL EXAMINATION G COMPREHENSIVE EXAMINATION IN GERMAN Friday, June 15, 2007 1:15 to 4:15 p.m., only SCORING KEY Updated information
GCE EXAMINERS' REPORTS
GCE EXAMINERS' REPORTS GERMAN AS/Advanced JANUARY 2014 Grade boundary information for this subject is available on the WJEC public website at: https://www.wjecservices.co.uk/marktoums/default.aspx?l=en
GCE German (2660) Candidate Exemplar Work: Additional Candidate Exemplar Work Unit 1 (Autumn 2008)
hij Teacher Resource Bank GCE German (2660) Candidate Exemplar Work: Additional Candidate Exemplar Work Unit 1 (Autumn 2008) The Assessment and Qualifications Alliance (AQA) is a company limited by guarantee
Chapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
Exemplar for Internal Assessment Resource German Level 1. Resource title: Planning a School Exchange
Exemplar for internal assessment resource German 1.5A for Achievement Standard 90887! Exemplar for Internal Assessment Resource German Level 1 Resource title: Planning a School Exchange This exemplar supports
Elena Chiocchetti & Natascia Ralli (EURAC) Tanja Wissik & Vesna Lušicky (University of Vienna)
Elena Chiocchetti & Natascia Ralli (EURAC) Tanja Wissik & Vesna Lušicky (University of Vienna) VII Conference on Legal Translation, Court Interpreting and Comparative Legilinguistics Poznań, 28-30.06.2013
LASSY: LARGE SCALE SYNTACTIC ANNOTATION OF WRITTEN DUTCH
LASSY: LARGE SCALE SYNTACTIC ANNOTATION OF WRITTEN DUTCH Gertjan van Noord Deliverable 3-4: Report Annotation of Lassy Small 1 1 Background Lassy Small is the Lassy corpus in which the syntactic annotations
LEJ Langenscheidt Berlin München Wien Zürich New York
Langenscheidt Deutsch in 30 Tagen German in 30 days Von Angelika G. Beck LEJ Langenscheidt Berlin München Wien Zürich New York I Contents Introduction Spelling and pronunciation Lesson 1 Im Flugzeug On
Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing
1 Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing Lourdes Araujo Dpto. Sistemas Informáticos y Programación, Univ. Complutense, Madrid 28040, SPAIN (email: [email protected])
An Incrementally Trainable Statistical Approach to Information Extraction Based on Token Classification and Rich Context Models
Dissertation (Ph.D. Thesis) An Incrementally Trainable Statistical Approach to Information Extraction Based on Token Classification and Rich Context Models Christian Siefkes Disputationen: 16th February
bound Pronouns
Bound and referential pronouns *with thanks to Birgit Bärnreuther, Christina Bergmann, Dominique Goltz, Stefan Hinterwimmer, MaikeKleemeyer, Peter König, Florian Krause, Marlene Meyer Peter Bosch Institute
AP WORLD LANGUAGE AND CULTURE EXAMS 2012 SCORING GUIDELINES
AP WORLD LANGUAGE AND CULTURE EXAMS 2012 SCORING GUIDELINES Interpersonal Writing: E-mail Reply 5: STRONG performance in Interpersonal Writing Maintains the exchange with a response that is clearly appropriate
All the English here is taken from students work, both written and spoken. Can you spot the errors and correct them?
Grammar Tasks This is a set of tasks for use in the Basic Grammar Class. Although they only really make sense when used with the other materials from the course, you can use them as a quick test as the
Processing Dialogue-Based Data in the UIMA Framework. Milan Gnjatović, Manuela Kunze, Dietmar Rösner University of Magdeburg
Processing Dialogue-Based Data in the UIMA Framework Milan Gnjatović, Manuela Kunze, Dietmar Rösner University of Magdeburg Overview Background Processing dialogue-based Data Conclusion Gnjatović, Kunze,
WhyshouldevenSMBs havea lookon ITIL andit Service Management and
WhyshouldevenSMBs havea lookon ITIL andit Service Management and howcoulditil beusefulforthem? IT Service Management for SMEs: Challenges and Opportunites 27 October 2011, Debrecen, Hungary ITIL is a Registered
LEHMAN BROTHERS SECURITIES N.V. LEHMAN BROTHERS (LUXEMBOURG) EQUITY FINANCE S.A.
SUPPLEMENTS NO. 2 DATED 6 JUNE 2008 in accordance with 6(2) and 16 of the German Securities Prospectus Act to the two published Base Prospectuses, one per Issuer (together the "Base Prospectus") relating
Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1
Korpus-Abfrage: Werkzeuge und Sprachen Gastreferat zur Vorlesung Korpuslinguistik mit und für Computerlinguistik Charlotte Merz 3. Dezember 2002 Motivation Lizentiatsarbeit: A Corpus Query Tool for Automatically
HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN
HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN Yu Chen, Andreas Eisele DFKI GmbH, Saarbrücken, Germany May 28, 2010 OUTLINE INTRODUCTION ARCHITECTURE EXPERIMENTS CONCLUSION SMT VS. RBMT [K.
Reference Determination for Demonstrative Pronouns
Peter Bosch & Carla Umbach University of Osnabrück, Germany This paper discusses results from a corpus study of German demonstrative and personal pronouns and from a reading time experiment in which we
A Joint Sequence Translation Model with Integrated Reordering
A Joint Sequence Translation Model with Integrated Reordering Nadir Durrani, Helmut Schmid and Alexander Fraser Institute for Natural Language Processing University of Stuttgart Introduction Generation
Chinese Open Relation Extraction for Knowledge Acquisition
Chinese Open Relation Extraction for Knowledge Acquisition Yuen-Hsien Tseng 1, Lung-Hao Lee 1,2, Shu-Yen Lin 1, Bo-Shun Liao 1, Mei-Jun Liu 1, Hsin-Hsi Chen 2, Oren Etzioni 3, Anthony Fader 4 1 Information
0525 GERMAN (FOREIGN LANGUAGE)
CAMBRIDGE INTERNATIONAL EXAMINATIONS International General Certificate of Secondary Education MARK SCHEME for the May/June 2014 series 0525 GERMAN (FOREIGN LANGUAGE) 0525/21 Paper 2 (Reading and Directed
IAC-BOX Network Integration. IAC-BOX Network Integration IACBOX.COM. Version 2.0.1 English 24.07.2014
IAC-BOX Network Integration Version 2.0.1 English 24.07.2014 In this HOWTO the basic network infrastructure of the IAC-BOX is described. IAC-BOX Network Integration TITLE Contents Contents... 1 1. Hints...
How to make Ontologies self-building from Wiki-Texts
How to make Ontologies self-building from Wiki-Texts Bastian HAARMANN, Frederike GOTTSMANN, and Ulrich SCHADE Fraunhofer Institute for Communication, Information Processing & Ergonomics Neuenahrer Str.
Phrase-Based MT. Machine Translation Lecture 7. Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu. Website: mt-class.
Phrase-Based MT Machine Translation Lecture 7 Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu Website: mt-class.org/penn Translational Equivalence Er hat die Prüfung bestanden, jedoch
Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov. 2008 [Folie 1]
Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds
Modalverben Theorie. learning target. rules. Aim of this section is to learn how to use modal verbs.
learning target Aim of this section is to learn how to use modal verbs. German Ich muss nach Hause gehen. Er sollte das Buch lesen. Wir können das Visum bekommen. English I must go home. He should read
Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University
Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University 1. Introduction This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect
A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow
A Framework-based Online Question Answering System Oliver Scheuer, Dan Shen, Dietrich Klakow Outline General Structure for Online QA System Problems in General Structure Framework-based Online QA system
Student Booklet. Name.. Form..
Student Booklet Name.. Form.. 2012 Contents Page Introduction 3 Teaching Staff 3 Expectations 3 Speaking German 3 Organisation 3 Self Study 4 Course Details and Contents 5/6 Bridging the gap and quiz 7/8
Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014
Introduction! BM1 Advanced Natural Language Processing Alexander Koller! 17 October 2014 Outline What is computational linguistics? Topics of this course Organizational issues Siri Text prediction Facebook
Vergleich der Versionen von Kapitel 1 des EU-GMP-Leitfaden (Oktober 2012) 01 July 2008 18 November 2009 31 Januar 2013 Kommentar Maas & Peither
Chapter 1 Quality Management Chapter 1 Quality Management System Chapter 1 Pharmaceutical Quality System Principle The holder of a Manufacturing Authorisation must manufacture medicinal products so as
GM0101S HEADSTART STUDENT GUIDE PREPARED IY THE DEFENSE LANGUAGE INSTITUTE FOREIGN LANGUAGE CENTER
GM0101S HEADSTART PREPARED IY THE DEFENSE LANGUAGE INSTITUTE FOREIGN LANGUAGE CENTER GERMAN PROGRAM Prepared by DEFENSE LANGUAGE INSTITUTE FOREIGN LANGUAGE CENTER SEPTEMBER 1977 CONTENTS Course Description
Annotation Guidelines for Dutch-English Word Alignment
Annotation Guidelines for Dutch-English Word Alignment version 1.0 LT3 Technical Report LT3 10-01 Lieve Macken LT3 Language and Translation Technology Team Faculty of Translation Studies University College
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
Shallow Parsing with Apache UIMA
Shallow Parsing with Apache UIMA Graham Wilcock University of Helsinki Finland [email protected] Abstract Apache UIMA (Unstructured Information Management Architecture) is a framework for linguistic
Chapter 6. Decoding. Statistical Machine Translation
Chapter 6 Decoding Statistical Machine Translation Decoding We have a mathematical model for translation p(e f) Task of decoding: find the translation e best with highest probability Two types of error
31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
Scrambling in German { Extraction into te Mittelfeld Stefan Muller y Humboldt Universitat zu Berlin August, 199 Abstract German is a language wit a relatively free word order. During te last few years
Version 1.0: 0612. General Certificate of Secondary Education June 2012. German. (Specification 4665) Unit 4: Writing. Mark Scheme
Version 1.0: 0612 General Certificate of Secondary Education June 2012 German 46654 (Specification 4665) Unit 4: Writing Mark Scheme Mark schemes are prepared by the Principal Examiner and considered,
A chart generator for the Dutch Alpino grammar
June 10, 2009 Introduction Parsing: determining the grammatical structure of a sentence. Semantics: a parser can build a representation of meaning (semantics) as a side-effect of parsing a sentence. Generation:
Text Generation for Abstractive Summarization
Text Generation for Abstractive Summarization Pierre-Etienne Genest, Guy Lapalme RALI-DIRO Université de Montréal P.O. Box 6128, Succ. Centre-Ville Montréal, Québec Canada, H3C 3J7 {genestpe,lapalme}@iro.umontreal.ca
Example-Based Treebank Querying. Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde
Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde LREC 2012, Istanbul May 25, 2012 NEDERBOOMS Exploitation of Dutch treebanks for research in linguistics September
Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
Lernsituation 9. Giving information on the phone. 62 Lernsituation 9 Giving information on the phone
Fachkunde 1, Lernfeld 2, Useful Office Vocabulary Lernsituation 9 Giving information on the phone Rolf astian, Managing Director of E Partners KG, recently visited the Promo World Fair in Düsseldorf, an
Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
Successful Collaboration in Agile Software Development Teams
Successful Collaboration in Agile Software Development Teams Martin Kropp, Magdalena Mateescu University of Applied Sciences Northwestern Switzerland School of Engineering & School of Applied Psychology
GCE EXAMINERS' REPORTS. GERMAN AS/Advanced
GCE EXAMINERS' REPORTS GERMAN AS/Advanced JANUARY 2012 Statistical Information This booklet contains summary details for each unit: number entered; maximum mark available; mean mark achieved; grade ranges.
Coffee Break German Lesson 06
LESSON NOTES WIE VIEL KOSTET DAS? In this episode of Coffee Break German we ll start by learning the numbers from zero to ten and then learn to deal with transactional situations involving paying for things
Negation and (finite) verb placement in a Polish-German bilingual child. Christine Dimroth, MPI, Nijmegen
Negation and (finite) verb placement in a Polish-German bilingual child Christine Dimroth, MPI, Nijmegen Consensus: Bilingual L1A bilingual children develop two separate linguistic systems from early on
Übungen zur Vorlesung Einführung in die Volkswirtschaftslehre VWL 1
Übungen zur Vorlesung Einführung in die Volkswirtschaftslehre VWL 1 Übungen Kapitel 31/38 Beat Spirig Aufgabe 31.4, UK capital outflow NCO = purchases of foreign assets by domestic residents purchases
Outline of today s lecture
Outline of today s lecture Generative grammar Simple context free grammars Probabilistic CFGs Formalism power requirements Parsing Modelling syntactic structure of phrases and sentences. Why is it useful?
AP GERMAN LANGUAGE AND CULTURE EXAM 2015 SCORING GUIDELINES
AP GERMAN LANGUAGE AND CULTURE EXAM 2015 SCORING GUIDELINES Identical to Scoring Guidelines used for French, Italian, and Spanish Language and Culture Exams Interpersonal Writing: E-mail Reply 5: STRONG
Why language is hard. And what Linguistics has to say about it. Natalia Silveira Participation code: eagles
Why language is hard And what Linguistics has to say about it Natalia Silveira Participation code: eagles Christopher Natalia Silveira Manning Language processing is so easy for humans that it is like
AS Music Bach Chorale Cadence exercises
AS Music Bach Chorale Cadence exercises These ten chorale exercises (with solutions) are intended for use as EdExcel AS tests, but would also be useful as preparatory tests for other Bach chorale tests.
PoS-tagging Italian texts with CORISTagger
PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy [email protected] Abstract. This paper presents an evolution of CORISTagger [1], an high-performance
Special Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati
SYNTAX AND SEMANTICS OF CAUSAL DENN IN GERMAN TATJANA SCHEFFLER
SYNTAX AND SEMANTICS OF CAUSAL DENN IN GERMAN TATJANA SCHEFFLER Department of Linguistics University of Pennsylvania [email protected] This paper presents a new analysis of denn (because) in German.
O CONNOR ebook How do I say that in English?
O CONNOR ebook How do I say that in English? O CONNOR Kommunikation ohne Grenzen Ihr Spezialist für Sprachund Kommunikationstraining O CONNOR ebook How do I say that in English? Dear English Learner, First
CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
The ALeSKo learner corpus: Design annotation quantitative. [short title for running heads: The ALeSKo learner corpus]
1 The ALeSKo learner corpus: Design annotation quantitative analyses [short title for running heads: The ALeSKo learner corpus] Heike Zinsmeister University of Konstanz Linguistics Department Box 185 78457
Comprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4
Complex Predications in Argument Structure Alternations
Complex Predications in Argument Structure Alternations Stefan Engelberg (Institut für Deutsche Sprache & University of Mannheim) Stefan Engelberg (IDS Mannheim), Universitatea din Bucureşti, November
German Beginners. Stage 6 Syllabus. Preliminary and HSC Courses
German Beginners Stage 6 Syllabus Preliminary and HSC Courses Original published version updated: June 2009 Assessment and Reporting information updated 2009 Copyright Board of Studies NSW for and on behalf
(Incorporated as a stock corporation in the Republic of Austria under registered number FN 33209 m)
Prospectus Supplement No. 2 Erste Group Bank AG (Incorporated as a stock corporation in the Republic of Austria under registered number FN 33209 m) EUR 30,000,000,000 Debt Issuance Programme This supplement
Varieties of specification and underspecification: A view from semantics
Varieties of specification and underspecification: A view from semantics Torgrim Solstad D1/B4 SFB meeting on long-term goals June 29th, 2009 The technique of underspecification I Presupposed: in semantics,
Hybrid Strategies. for better products and shorter time-to-market
Hybrid Strategies for better products and shorter time-to-market Background Manufacturer of language technology software & services Spin-off of the research center of Germany/Heidelberg Founded in 1999,
Wie ist das Wetter? (What s the weather like?)
Prior Knowledge: It is helpful if children already know some dates, weather phrases and items of clothing Objectives Present ideas and information orally to a range of audiences. Describe people, places,
I Textarbeit. Text 1. I never leave my horse
BEJ Musterprüfung Englisch (11020) 1 I Textarbeit Text 1 I never leave my horse 1 5 10 15 20 Police officers in Ireland don t carry guns. But they often ride through Dublin on horses. Julie Folan is a
UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE
UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE A.J.P.M.P. Jayaweera #1, N.G.J. Dias *2 # Virtusa Pvt. Ltd. No 752, Dr. Danister De Silva Mawatha, Colombo 09, Sri Lanka * Department of Statistics
