Two Case Studies on Translating Pronouns in a Deep Syntax Framework
|
|
|
- August James
- 9 years ago
- Views:
Transcription
1 Two Case Studies on Translating Pronouns in a Deep Syntax Framework Michal Novák, Zdeněk Žabokrtský and Anna Nedoluzhko Charles University in Prague, Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics Malostranské náměstí 25, CZ {mnovak,zabokrtsky,nedoluzko}@ufal.mff.cuni.cz Abstract We focus on improving the translation of the English pronoun it and English reflexive pronouns in an English-Czech syntaxbased machine translation framework. Our evaluation both from intrinsic and extrinsic perspective shows that adding specialized syntactic and coreference-related features leads to an improvement in translation quality. 1 Introduction Machine Translation (MT) is an extremely broad task and can be decomposed along various directions. One of them lies in using specialized translation models (TMs) for certain types of language expressions. For instance, different types of named entities often receive specialized treatment in real translation systems. This paper deals with introducing specialized TMs for two types of pronouns: the pronoun it and reflexive pronouns. The models are integrated into an English-Czech syntax-based MT framework. Several works have previously focused on translating pronouns. The linguistic study of Morin (2009) investigated the translation of pronouns, proper names and kinship terms from Indonesian into English. Onderková (2010) has conducted a corpus-based research on possesive pronouns in Czech and English, focusing especially on their use with parts of the human body. From the perspective of MT, translating personal pronouns from English to morphologically richer languages, such as French (Le Nagard and Koehn, 2010), German (Hardmeier and Federico, 2010) and Czech (Guillou, 2012) has recently aroused higher interest. In these languages, one usually has to ensure agreement in gender and number between the pronoun and its direct antecedent, which requires a coreference resolver to be involved. In this work, we make use of the English-to- Czech translation implemented within the TectoMT system (Žabokrtský et al., 2008). In contrast to the phrase-based approach (Koehn et al., 2003), TectoMT performs a tree-to-tree machine translation. An input English sentence is first analyzed into its deep-syntactic representation, which is subsequently transferred into Czech. The pipeline ends with generating a surface form of the Czech translation from its deep representation. The deep syntactic representation of a sentence in TectoMT follows the Prague tectogrammatics theory (Sgall, 1967; Sgall et al., 1986). It is a dependency tree whose nodes correspond to content words. Personal pronouns missing on the surface are reconstructed in special nodes. All nodes are assigned semantic roles and coreference relations are annotated. Originally, translation of both it and reflexive pronouns was treated by rules in TectoMT. The English deep representation of it was translated as to and a simple heuristics determined if it is being expressed on the surface. Similarly, reflexives were always translated as se. This paper evaluates the translation quality reached using specialized classifiers for these pronouns. Unlike the related work on pronouns in MT, we focus on improving the lexical choice, not tuning other components that affect generating a particular surface form (e.g. coreference resolution). 2 Linguistic analysis We started with an analysis of how the pronouns under investigation are translated 1 in two Czech- English parallel treebanks Prague Czech-English Dependency Treebank 2.0 (Hajič et al., 2011, PCEDT) and CzEng 1.0 (Bojar et al., 2012). 1 Note that besides the means mentioned below, there are other ways of translating these pronouns. However, in most cases they can be replaced by one of the variants listed with no harm to the quality of the Czech output International Joint Conference on Natural Language Processing, pages , Nagoya, Japan, October 2013.
2 English It coreferential with NP coreferential with VP or a segment pleonastic emphasis on object object affected by its own action Reflexives personal pronoun [PersPron] demonstrative pronoun to[to] nothing, structure changes [Null] adjective samotný [Samotny] pronoun sám[sam] reflexive pronoun se [Se] both sám and se[samse] Figure 1: The mapping of the types of English it (top) and reflexive pronouns (bottom) to their Czech counterparts. 2.1 Translating it from English to Czech Czech In English, three coarse-grained types of it are traditionally distinguished: referential it pointing to a noun phrase in the preceding or following context, anaphoric it referring to a verbal phrase or a larger discourse segment, and non-referential pleonastic it, whose presence is imposed only by the syntactic rules of English. There are three prevailing ways of translating it into Czech, also three different ways prevail. Personal pronouns or zero forms 2, whose gender and number are determined by their antecedent, are the most frequent variant (referred to as the PersPron class in the following). Another way is using the Czech demonstrative pronoun to, which is a neuter singular form of the pronoun ten (To class). The third option results in fact no lexical counterpart in the Czech translation, the English and Czech sentences thus having a different syntactic structure (Null class). The mapping between English and Czech types is shown in Figure 1. The To class is particularly overloaded. Even if a given occurrence of it corefers with a noun phrase, translating it to to does not require identifying the antecedent since the gender and number of to are always fixed (see Example 1). (1) Some investors say Friday s sell-off was a good thing. It was a healthy cleansing, says Michael Holland. Někteří investoři říkají, že páteční výprodej byla dobrá věc. Byla to zdravá očista, říká Michael Holland. 2.2 Translating reflexive pronouns from English to Czech According to the Longman Dictionary of Contemporary English, 3 reflexive pronouns are typically used in two scenarios: to show that the object is affected by its own action and to emphasize that the utterance relates to one particular thing, person etc. (see Example 2). (2) The Gambia s President himself participated in the hunt last year. The most usual Czech counterparts of English reflexives comprise the Czech reflexive pronoun se (Se class), the adjective samotný (Samotny class) and the pronoun sám (Sam class), all in various morphological forms. Moreover, sám often appears with se to emphasize that the action affecting the object is performed by the object itself (SamSe class). Figure 1 illustrates the correspondence between English usages and Czech expressions. 3 Data To train and intrinsically evaluate TMs for it and English reflexives, we have extracted data from the entire PCEDT and 11 sections of CzEng. Both treebanks follow the annotation style based on the Prague tectogrammatics theory (Sgall, 1967; Sgall et al., 1986). While PCEDT consists of 50,000 sentence pairs annotated mostly manually, the annotation of CzEng with 15 million parallel sentences is entirely automatic. Both treebanks have been provided with a fully automatic alignment of Czech and English nodes (Mareček et al., 2008), which is, however, prone to errors for it and its Czech counterparts. Since they are pronouns, they can replace a wide range of content words and their meaning is inferred mainly from the context. The situation is better for verbs as their usual parents in dependency trees: since they carry meaning in a greater extent, their automatic alignment is of a higher quality. We took advantage of this property and the gold annotation of semantic roles in PCEDT, obtaining Czech translations as the argument of the Czech verb aligned with the English parent verb that fills the same semantic role as the given it. Using this approach, we succeeded in reaching the Czech counterpart in more than 60% of instances. The rest had to be done manually. 2 Czech is a pro-drop language
3 It Train Test Reflexives Train Test PCEDT sections CzEng sections PersPron Se 6, To Sam 2, Null SamSe 1, Samotny Total Total 10,741 1,075 Table 1: Distribution of classes in the data sets. Czech counterparts of English reflexive pronouns have been collected directly from the alignment in CzEng, ignoring the cases where the aligned Czech word does not fall in one of the classes mentioned in Section 2.2. The overall statistics of the train and the test set are shown in Table 1. The disproportion of training instances for it results from the manual annotation of classes, which could not be completely finished due to time reasons. In order to maintain the overall distribution, we also had to limit the number of automatically annotated classes. Given the observation (see Section 2), we designed features to differentiate between the ways it and reflexives are translated. 3.1 Features for it The translation mapping in Figure 1 suggests that identifying the English type of it might be informative. We thus constructed a binary coreferencerelated feature based on the output of the system NADA (Bergsma and Yarowsky, 2011) giving an estimate of whether an instance of it is coreferential. Some verbs are more likely to bind with it that refers to a longer utterance. Such it is relatively consistently translated as a demonstrative to. However, PCEDT is too small to be a sufficient sample from a distribution over lexical properties. Hence, we took advantage of CzEng and collected co-occurrence counts between a semantic role that the given it fills concatenated with a lemma of its verbal parent and a Czech counterpart having the same semantic role (denoted as csit). We filtered out all occurrences where csit was neither a personal pronoun nor to. For both possible values of csit a feature is constructed by looking up frequencies for a concrete occurrence in the co-occurrence counts collected on CzEng and quantized into 4-5 bins following the formula: count(semrole : parent csit) bin(log( count(semrole : parent)count(csit) )). Linguistic analysis suggested including syntax- a) b) be it subj it be obj adj:compl v:to+inf subj v:that+fin Figure 2: Examples of syntactic features capturing typical constructions with a verb be. oriented patterns related to the verb to be such as those shown in Figure 2. For instance, nominal predicates 4 tend to be translated as to even if it is coreferential. On the other hand, an adjectival predicate followed by a subordinating clause introduced by the English connectives to or that usually indicates a pleonastic usage of it translated as a null subject. 3.2 Features for reflexive pronouns Here we focused on distinguishing between the two most frequent meanings (see Section 2.2). Ideally, the POS tag of the parent would be a sufficient feature because reflexives in the second meaning should depend on a noun. However, since we deal with automatically parsed trees we had to support the parent POS tag by the POS tag of the immediately preceding word. Moreover, another feature indicates if the preceding word is a noun and agrees with the pronoun in gender and number. Furthermore, we observed that sám rarely appears in other case than nominative. Although this feature exploits the target side, we can use it since the case of the governing Czech noun is already known at the point when reflexives are translated. Last but not least, the morpho-syntactic pattern (including a possible preposition) in which the reflexive pronoun appears is a valuable feature. 4 Experiments and Evaluation To mitigate a possible error caused by a wrong classifier choice, we built several models based on various Machine Learning classification methods including Maximum Entropy inplemented in the AI::MaxEntropy Perl library, 5 logistic regression with one-against-all strategy from Vowpal Wabbit 6 as well as decision trees, k-nn and SVM from Scikit-learn library (Pedregosa et al., 2011). 4 The verb to be has an object. 5 laye/ai-maxentropy vw 1039
4 It Reflexives Train Test Train Test Baseline AI::MaxEntropy VW (passes=20, l2=10e-5) sklearn:decision-trees sklearn:k-nn (k=10) sklearn:svm (kernel=linear) Table 2: Accuracy of both translation models on the training and test data. We compare our results with a majority class baseline (PersPron and Se classes) in Table 2. The results show a 17% gain when our approach is used. The specialized models have been integrated in the TectoMT system and extrinsically evaluated on the English-Czech test set for the WMT 2011 Translation Task (Callison-Burch et al., 2011). 7 This data set contains 3,003 English sentences with one Czech reference translation, out of which 430 contain at least one occurrence of it and 52 contain a reflexive pronoun. The new approach was compared to the original TectoMT rule-based pronoun handling heuristics (see Section 1). The shift from the original settings to the new translation models results in 166 changed sentences with it and 17 changed sentences with English reflexives. In terms of BLEU score, we observe a marginal drop from to using the new approach. However, BLEU may be too coarse for this kind of experiment. In order to give a more realistic view, we carried out a manual evaluation. All 17 modified sentences for reflexives and 50 randomly sampled changed sentences containing it were presented to one annotator who assessed which of the two systems gave a better translation. Table 3 shows that improved sentences dominate in both cases. Overall, the improved sentences account for around 8.5% of all sentences with it and 23% sentences containing a reflexive pronoun. 5 Discussion Looking into the types of improvements and errors in the manually evaluated sentences, we have found that the new model for it opted for a different translation only in cases where the original system decided to express to on the surface. In 13 out of 24 improvements, the new model for it succeeded in correctly resolving the Null class 7 It Reflexives new better than old old better than new 13 0 equal quality 13 5 Table 3: The results of manual evaluation on sentences translated by TectoMT in the original settings and using the new translation models while in the remaining 11 cases, the corrected class was PersPron. It took advantage mostly of the syntax-based features in the former case and the coreference-related feature in the latter. Regarding the reflexive pronouns the pronoun was used in its emphasizing meaning in all but two altered sentences. This accords with the design of features, which are mainly targeted at revealing this usage of reflexives. Moreover, the feature indicating if a Czech noun is in nominative case has proved to be particularly useful, correctly driving the lexical choice between sám and samotný. The majority of errors stem from incorrect activation of syntactic features due to parsing and POS tagging errors. 6 Conclusions In this work, we presented specialized translation models for two types of English pronouns: it and reflexives. Integrating them into an English-Czech syntax-based MT system TectoMT we succeeded in improving the concerned sentences measured by human evaluation. Generally, it is intractable to design a specific feature set for every word. However, this work shows on two examples that the correct translation of some words depends on many linguistic aspects, e.g. syntax and coreference and that is worth taking these aspects into account. Acknowledgments This work has been supported by the Grant Agency of the Czech Republic (grants P406/12/0658 and P406/2010/0875), the grant GAUK 4226/2011 and EU FP7 project Khresmoi (contract no ). This work has been using language resources developed and/or stored and/or distributed by the LINDAT-Clarin project of the Ministry of Education of the Czech Republic (project LM ). 1040
5 References Shane Bergsma and David Yarowsky NADA: A Robust System for Non-Referential Pronoun Detection. In DAARC, pages 12 23, Faro, Portugal, October. Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel, and Aleš Tamchyna The Joy of Parallelism with CzEng 1.0. In Proceedings of LREC 2012, Istanbul, Turkey, May. ELRA, European Language Resources Association. Chris Callison-Burch, Philipp Koehn, Christof Monz, and Omar Zaidan Findings of the 2011 Workshop on Statistical Machine Translation. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 22 64, Edinburgh, Scotland, July. Association for Computational Linguistics. Liane Guillou Improving Pronoun Translation for Statistical Machine Translation. In Proceedings of the Student Research Workshop at the 13th Conference of the EACL, pages 1 10, Stroudsburg, PA, USA. Association for Computational Linguistics. Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, and Zdeněk Žabokrtský Prague Czech-English Dependency Treebank 2.0. Kristýna Onderková Possessive Pronouns in English and Czech Works of Fiction, Their Use with Parts of Human Body and Translation. Master s thesis, Masaryk University, Faculty of Arts. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12: , November. Petr Sgall, Eva Hajičová, and Jarmila Panevová The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. D. Reidel Publishing Company, Dordrecht. Petr Sgall Generativní popis jazyka a česká deklinace. Academia, Prague, Czech Republic. Zdeněk Žabokrtský, Jan Ptáček, and Petr Pajas TectoMT: Highly Modular MT System with Tectogrammatics Used as Transfer Layer. In Proceedings of the Third Workshop on Statistical Machine Translation, pages , Stroudsburg, PA, USA. Association for Computational Linguistics. Christian Hardmeier and Marcello Federico Modelling Pronominal Anaphora in Statistical Machine Translation. In Marcello Federico, Ian Lane, Michael Paul, and François Yvon, editors, Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT), pages Philipp Koehn, Franz Josef Och, and Daniel Marcu Statistical Phrase-based Translation. In Proceedings of the 2003 Conference of the NAACL HLT Volume 1, pages 48 54, Stroudsburg, PA, USA. Association for Computational Linguistics. Ronan Le Nagard and Philipp Koehn Aiding Pronoun Translation with Co-Reference Resolution. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pages , Uppsala, Sweden, July. Association for Computational Linguistics. David Mareček, Zdeněk Žabokrtský, and Václav Novák Automatic Alignment of Czech and English Deep Syntactic Dependency Trees. In Proceedings of the Twelfth EAMT Conference, pages Izak Morin Translating Pronouns, Proper Names and Kinship Terms from Indonesian into English and vice versa. TEFLIN Journal: A publication on the teaching and learning of English, 16(2). 1041
Parmesan: Meteor without Paraphrases with Paraphrased References
Parmesan: Meteor without Paraphrases with Paraphrased References Petra Barančíková Institute of Formal and Applied Linguistics Charles University in Prague, Faculty of Mathematics and Physics Malostranské
Machine translation techniques for presentation of summaries
Grant Agreement Number: 257528 KHRESMOI www.khresmoi.eu Machine translation techniques for presentation of summaries Deliverable number D4.6 Dissemination level Public Delivery date April 2014 Status Author(s)
Paraphrasing controlled English texts
Paraphrasing controlled English texts Kaarel Kaljurand Institute of Computational Linguistics, University of Zurich [email protected] Abstract. We discuss paraphrasing controlled English texts, by defining
Introduction. Philipp Koehn. 28 January 2016
Introduction Philipp Koehn 28 January 2016 Administrativa 1 Class web site: http://www.mt-class.org/jhu/ Tuesdays and Thursdays, 1:30-2:45, Hodson 313 Instructor: Philipp Koehn (with help from Matt Post)
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
Factored Translation Models
Factored Translation s Philipp Koehn and Hieu Hoang [email protected], [email protected] School of Informatics University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW Scotland, United Kingdom
Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output
Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output Christian Federmann Language Technology Lab, German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, D-66123 Saarbrücken,
COMPUTATIONAL DATA ANALYSIS FOR SYNTAX
COLING 82, J. Horeck~ (ed.j North-Holland Publishing Compa~y Academia, 1982 COMPUTATIONAL DATA ANALYSIS FOR SYNTAX Ludmila UhliFova - Zva Nebeska - Jan Kralik Czech Language Institute Czechoslovak Academy
Spotting false translation segments in translation memories
Spotting false translation segments in translation memories Abstract Eduard Barbu Translated.net [email protected] The problem of spotting false translations in the bi-segments of translation memories
Syntactic and Semantic Differences between Nominal Relative Clauses and Dependent wh-interrogative Clauses
Theory and Practice in English Studies 3 (2005): Proceedings from the Eighth Conference of British, American and Canadian Studies. Brno: Masarykova univerzita Syntactic and Semantic Differences between
Turker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1
Overview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
THERE ARE SEVERAL KINDS OF PRONOUNS:
PRONOUNS WHAT IS A PRONOUN? A Pronoun is a word used in place of a noun or of more than one noun. Example: The high school graduate accepted the diploma proudly. She had worked hard for it. The pronoun
Hybrid Strategies. for better products and shorter time-to-market
Hybrid Strategies for better products and shorter time-to-market Background Manufacturer of language technology software & services Spin-off of the research center of Germany/Heidelberg Founded in 1999,
LESSON THIRTEEN STRUCTURAL AMBIGUITY. Structural ambiguity is also referred to as syntactic ambiguity or grammatical ambiguity.
LESSON THIRTEEN STRUCTURAL AMBIGUITY Structural ambiguity is also referred to as syntactic ambiguity or grammatical ambiguity. Structural or syntactic ambiguity, occurs when a phrase, clause or sentence
Hybrid Machine Translation Guided by a Rule Based System
Hybrid Machine Translation Guided by a Rule Based System Cristina España-Bonet, Gorka Labaka, Arantza Díaz de Ilarraza, Lluís Màrquez Kepa Sarasola Universitat Politècnica de Catalunya University of the
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
Selected Topics in Applied Machine Learning: An integrating view on data analysis and learning algorithms
Selected Topics in Applied Machine Learning: An integrating view on data analysis and learning algorithms ESSLLI 2015 Barcelona, Spain http://ufal.mff.cuni.cz/esslli2015 Barbora Hladká [email protected]
Collaborative Machine Translation Service for Scientific texts
Collaborative Machine Translation Service for Scientific texts Patrik Lambert [email protected] Jean Senellart Systran SA [email protected] Laurent Romary Humboldt Universität Berlin
UEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT
UEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT Eva Hasler ILCC, School of Informatics University of Edinburgh [email protected] Abstract We describe our systems for the SemEval
Learning Translation Rules from Bilingual English Filipino Corpus
Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,
The Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems
The Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems Dr. Ananthi Sheshasaayee 1, Angela Deepa. V.R 2 1 Research Supervisior, Department of Computer Science & Application,
Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang
Sense-Tagging Verbs in English and Chinese Hoa Trang Dang Department of Computer and Information Sciences University of Pennsylvania [email protected] October 30, 2003 Outline English sense-tagging
Empirical Machine Translation and its Evaluation
Empirical Machine Translation and its Evaluation EAMT Best Thesis Award 2008 Jesús Giménez (Advisor, Lluís Màrquez) Universitat Politècnica de Catalunya May 28, 2010 Empirical Machine Translation Empirical
INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning [email protected]
INF5820 Natural Language Processing - NLP H2009 Jan Tore Lønning [email protected] Semantic Role Labeling INF5830 Lecture 13 Nov 4, 2009 Today Some words about semantics Thematic/semantic roles PropBank &
Czech Verbs of Communication and the Extraction of their Frames
Czech Verbs of Communication and the Extraction of their Frames Václava Benešová and Ondřej Bojar Institute of Formal and Applied Linguistics ÚFAL MFF UK, Malostranské náměstí 25, 11800 Praha, Czech Republic
Chapter 5. Phrase-based models. Statistical Machine Translation
Chapter 5 Phrase-based models Statistical Machine Translation Motivation Word-Based Models translate words as atomic units Phrase-Based Models translate phrases as atomic units Advantages: many-to-many
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
Statistical Machine Translation Lecture 4. Beyond IBM Model 1 to Phrase-Based Models
p. Statistical Machine Translation Lecture 4 Beyond IBM Model 1 to Phrase-Based Models Stephen Clark based on slides by Philipp Koehn p. Model 2 p Introduces more realistic assumption for the alignment
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
Czech Aspect-Based Sentiment Analysis: A New Dataset and Preliminary Results
Czech Aspect-Based Sentiment Analysis: A New Dataset and Preliminary Results Aleš Tamchyna, Ondřej Fiala, Kateřina Veselovská Charles University in Prague, Faculty of Mathematics and Physics Institute
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives Ramona Enache and Adam Slaski Department of Computer Science and Engineering Chalmers University of Technology and
Machine Translation and the Translator
Machine Translation and the Translator Philipp Koehn 8 April 2015 About me 1 Professor at Johns Hopkins University (US), University of Edinburgh (Scotland) Author of textbook on statistical machine translation
Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing
1 Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing Lourdes Araujo Dpto. Sistemas Informáticos y Programación, Univ. Complutense, Madrid 28040, SPAIN (email: [email protected])
An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System
An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System Thiruumeni P G, Anand Kumar M Computational Engineering & Networking, Amrita Vishwa Vidyapeetham, Coimbatore,
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
Parsing Technology and its role in Legacy Modernization. A Metaware White Paper
Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation Robert C. Moore Chris Quirk Microsoft Research Redmond, WA 98052, USA {bobmoore,chrisq}@microsoft.com
Syntax: Phrases. 1. The phrase
Syntax: Phrases Sentences can be divided into phrases. A phrase is a group of words forming a unit and united around a head, the most important part of the phrase. The head can be a noun NP, a verb VP,
Annotation Guidelines for Dutch-English Word Alignment
Annotation Guidelines for Dutch-English Word Alignment version 1.0 LT3 Technical Report LT3 10-01 Lieve Macken LT3 Language and Translation Technology Team Faculty of Translation Studies University College
Modeling coherence in ESOL learner texts
University of Cambridge Computer Lab Building Educational Applications NAACL 2012 Outline 1 2 3 4 The Task: Automated Text Scoring (ATS) ATS systems Discourse coherence & cohesion The Task: Automated Text
Automatic Pronominal Anaphora Resolution in English Texts
Computational Linguistics and Chinese Language Processing Vol. 9, No.1, February 2004, pp. 21-40 21 The Association for Computational Linguistics and Chinese Language Processing Automatic Pronominal Anaphora
The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58. Ncode: an Open Source Bilingual N-gram SMT Toolkit
The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58 Ncode: an Open Source Bilingual N-gram SMT Toolkit Josep M. Crego a, François Yvon ab, José B. Mariño c c a LIMSI-CNRS, BP 133,
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
Genetic Algorithm-based Multi-Word Automatic Language Translation
Recent Advances in Intelligent Information Systems ISBN 978-83-60434-59-8, pages 751 760 Genetic Algorithm-based Multi-Word Automatic Language Translation Ali Zogheib IT-Universitetet i Goteborg - Department
The Book of Grammar Lesson Six. Mr. McBride AP Language and Composition
The Book of Grammar Lesson Six Mr. McBride AP Language and Composition Table of Contents Lesson One: Prepositions and Prepositional Phrases Lesson Two: The Function of Nouns in a Sentence Lesson Three:
Text-To-Speech Technologies for Mobile Telephony Services
Text-To-Speech Technologies for Mobile Telephony Services Paulseph-John Farrugia Department of Computer Science and AI, University of Malta Abstract. Text-To-Speech (TTS) systems aim to transform arbitrary
The Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish
The Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish Oscar Täckström Swedish Institute of Computer Science SE-16429, Kista, Sweden [email protected]
Building a Web-based parallel corpus and filtering out machinetranslated
Building a Web-based parallel corpus and filtering out machinetranslated text Alexandra Antonova, Alexey Misyurev Yandex 16, Leo Tolstoy St., Moscow, Russia {antonova, misyurev}@yandex-team.ru Abstract
On the practice of error analysis for machine translation evaluation
On the practice of error analysis for machine translation evaluation Sara Stymne, Lars Ahrenberg Linköping University Linköping, Sweden {sara.stymne,lars.ahrenberg}@liu.se Abstract Error analysis is a
Comprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4
SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
Linguistic Universals
Armin W. Buch 1 2012/11/28 1 Relying heavily on material by Gerhard Jäger and David Erschler Linguistic Properties shared by all languages Trivial: all languages have consonants and vowels More interesting:
Classification of Electrocardiogram Anomalies
Classification of Electrocardiogram Anomalies Karthik Ravichandran, David Chiasson, Kunle Oyedele Stanford University Abstract Electrocardiography time series are a popular non-invasive tool used to classify
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering
Report on the embedding and evaluation of the second MT pilot
Report on the embedding and evaluation of the second MT pilot quality translation by deep language engineering approaches DELIVERABLE D3.10 VERSION 1.6 2015-11-02 P2 QTLeap Machine translation is a computational
BEER 1.1: ILLC UvA submission to metrics and tuning task
BEER 1.1: ILLC UvA submission to metrics and tuning task Miloš Stanojević ILLC University of Amsterdam [email protected] Khalil Sima an ILLC University of Amsterdam [email protected] Abstract We describe
Automatic Pronominal Anaphora Resolution. in English Texts
Automatic Pronominal Anaphora Resolution in English Texts Tyne Liang and Dian-Song Wu Department of Computer and Information Science National Chiao Tung University Hsinchu, Taiwan Email: [email protected];
Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation
Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation Joern Wuebker M at thias Huck Stephan Peitz M al te Nuhn M arkus F reitag Jan-Thorsten Peter Saab M ansour Hermann N e
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
The Role of Sentence Structure in Recognizing Textual Entailment
Blake,C. (In Press) The Role of Sentence Structure in Recognizing Textual Entailment. ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, Czech Republic. The Role of Sentence Structure
Understanding Clauses and How to Connect Them to Avoid Fragments, Comma Splices, and Fused Sentences A Grammar Help Handout by Abbie Potter Henry
Independent Clauses An independent clause (IC) contains at least one subject and one verb and can stand by itself as a simple sentence. Here are examples of independent clauses. Because these sentences
Language Meaning and Use
Language Meaning and Use Raymond Hickey, English Linguistics Website: www.uni-due.de/ele Types of meaning There are four recognisable types of meaning: lexical meaning, grammatical meaning, sentence meaning
Overview of the TACITUS Project
Overview of the TACITUS Project Jerry R. Hobbs Artificial Intelligence Center SRI International 1 Aims of the Project The specific aim of the TACITUS project is to develop interpretation processes for
Technical Report. The KNIME Text Processing Feature:
Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold [email protected] [email protected] Copyright 2012 by KNIME.com AG
SYNTAX: THE ANALYSIS OF SENTENCE STRUCTURE
SYNTAX: THE ANALYSIS OF SENTENCE STRUCTURE OBJECTIVES the game is to say something new with old words RALPH WALDO EMERSON, Journals (1849) In this chapter, you will learn: how we categorize words how words
L130: Chapter 5d. Dr. Shannon Bischoff. Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25
L130: Chapter 5d Dr. Shannon Bischoff Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25 Outline 1 Syntax 2 Clauses 3 Constituents Dr. Shannon Bischoff () L130: Chapter 5d 2 / 25 Outline Last time... Verbs...
Chapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation
Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation Rabih Zbib, Gretchen Markiewicz, Spyros Matsoukas, Richard Schwartz, John Makhoul Raytheon BBN Technologies
6-1. Process Modeling
6-1 Process Modeling Key Definitions Process model A formal way of representing how a business system operates Illustrates the activities that are performed and how data moves among them Data flow diagramming
THUTR: A Translation Retrieval System
THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive
Intro to Linguistics Semantics
Intro to Linguistics Semantics Jarmila Panevová & Jirka Hana January 5, 2011 Overview of topics What is Semantics The Meaning of Words The Meaning of Sentences Other things about semantics What to remember
Ling 201 Syntax 1. Jirka Hana April 10, 2006
Overview of topics What is Syntax? Word Classes What to remember and understand: Ling 201 Syntax 1 Jirka Hana April 10, 2006 Syntax, difference between syntax and semantics, open/closed class words, all
e-commerce product classification: our participation at cdiscount 2015 challenge
e-commerce product classification: our participation at cdiscount 2015 challenge Ioannis Partalas Viseo R&D, France [email protected] Georgios Balikas University of Grenoble Alpes, France [email protected]
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN
HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN Yu Chen, Andreas Eisele DFKI GmbH, Saarbrücken, Germany May 28, 2010 OUTLINE INTRODUCTION ARCHITECTURE EXPERIMENTS CONCLUSION SMT VS. RBMT [K.
UNIVERSITÀ DEGLI STUDI DELL AQUILA CENTRO LINGUISTICO DI ATENEO
TESTING DI LINGUA INGLESE: PROGRAMMA DI TUTTI I LIVELLI - a.a. 2010/2011 Collaboratori e Esperti Linguistici di Lingua Inglese: Dott.ssa Fatima Bassi e-mail: [email protected] Dott.ssa Liliana
Improving Credit Card Fraud Detection with Calibrated Probabilities
Improving Credit Card Fraud Detection with Calibrated Probabilities Alejandro Correa Bahnsen, Aleksandar Stojanovic, Djamila Aouada and Björn Ottersten Interdisciplinary Centre for Security, Reliability
