Machine Translation System on the Pair of Arabic / English
Khaireddine Bacha and Mounir Zrigui
Laboratoire UTIC, Équipe de Monastir, Faculté des sciences de Monastir, Monastir, Tunisia
Khairi.bacha@gmail.com, Mounir.Zrigui@fsm.rnu.tn

Keywords: Learning, Arabic-English Machine Translation, Statistical Models.

Abstract: Our work is part of the project entitled "TELA", an environment for computer-assisted learning of the Arabic language that covers many issues related to the use of words in Arabic. This environment contains several subsystems whose purpose is to provide an important educational function by allowing the learner to discover information beyond the scope of the sentence under study. Among these subsystems are semantic analyzers with several features and functions (Arabic-English machine translation, derivation, conjugation, etc.). In this article we therefore focus on the design of a machine translation system for the Arabic / English pair based on statistical models.

1 INTRODUCTION

For Arabic, research began around 1970, even before the problems of editing Arabic text were fully under control. Early work dealt mainly with lexicons. Over the past ten years, the internationalization of the Web and the proliferation of Arabic-language media have given rise to a large number of Arabic NLP applications. Research has begun to address more diverse issues such as syntax, machine translation, automatic indexing of documents, information retrieval, etc.

On the one hand, natural language processing has undergone a major revolution in recent years in machine translation; on the other hand, the need for reliable automatic translators is constantly increasing. We therefore focused on this area to develop an automatic translator based on a statistical model.

2 REFERENCE TRANSLATION SYSTEMS

Statistical translation comes down to finding the target document with the highest probability of being the translation of a source document. The development and maintenance of such a system usually requires significant work by (bilingual) specialists. This section summarizes several approaches that have marked research on machine translation, from the middle of the last century to today. One of the first articles about the statistical approach is that of (Brown, 1993).

Translation can take place in two modules: an analysis module, which maps the input text in the source language to a representation in a postulated language-independent pivot language, and a generation module, which builds from this same representation an output text in the target language. The triangle shown in Fig. 1 is attributed to Vauquois. It summarizes an analysis of the translation process that is still fully relevant and in use today (Vauquois, 1968).

Figure 1: Vauquois triangle: representation of the different linguistic architectures.

To find the best translation of a source sentence (s), the Moses translator seeks the target sentence (t) that maximizes a log-linear combination of feature functions. The feature functions used in this system are:

- The m scores that the translation table assigns to each pair of phrases (t, s).
- The score of a language model: our experiments use a back-off trigram model.
- The score of the distortion model.
KEOD 2012 - International Conference on Knowledge Engineering and Ontology Development

- The exponential of the number of target words generated. This feature function, called the word penalty, simply counterbalances the system's tendency to prefer short sentences.

Finally, Moses maximizes the log-linear combination of these feature functions.

The reference systems are translation systems based on statistical segments, called in English "phrase-based statistical machine translation systems", for both language pairs, built with the free software Moses (Koehn, 2007). Among the measures most commonly used in the machine translation community to assess the quality of a translation against a reference are:

- The BLEU score (Bilingual Evaluation Understudy), proposed by (Papineni, 2002). The main idea is to compare the output of a translation system with a reference translation.
- The NIST score (National Institute of Standards and Technology), which was proposed for the international evaluations of machine translation systems. The conditions and detailed results are available on the NIST website.

The BLEU scores of these reference systems on the 2006 and 2008 evaluations are given in Table 1.

Table 1: BLEU scores of the Arabic / English reference system (Schwenk, 2010).

Bitexts             News + ISI bitexts    + UN data
Size of bitexts     56M                   189M
Dev (NIST06)        42.69                 43.51
Test (NIST08)       42.06                 42.19

3 OVERVIEW OF WORK ON STATISTICAL TRANSLATION SYSTEMS FOR THE ARABIC-ENGLISH PAIR

Translation from Arabic into English, two languages considered structurally distant, requires more effort than translation between two languages of similar structure (such as French and English). The need for Arabization stems mainly from the need to simplify the use of Arabic in today's computer technology. This implies establishing an engineering of the Arabic language in the context of the multilingualism required by the opening of the Arab world to other cultures that carry current scientific and technological knowledge.
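To make the log-linear scoring of Section 2 concrete, the following is a minimal sketch and not the Moses implementation. It assumes one score per feature function (translation table, language model, distortion) plus the word penalty, whose logarithm is simply the number of target words. The candidate sentences, probability values, weights and all function and key names are hypothetical and chosen only for illustration.

```python
import math


def log_linear_score(candidate, weights):
    """Weighted sum of the logarithms of the feature functions.

    The word penalty feature is exp(number of target words), so its
    logarithm is just the word count.
    """
    n_words = len(candidate["text"].split())
    return (
        weights["translation"] * math.log(candidate["p_translation"])
        + weights["language"] * math.log(candidate["p_language"])
        + weights["distortion"] * math.log(candidate["p_distortion"])
        + weights["word_penalty"] * n_words
    )


def best_translation(candidates, weights):
    """Return the candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: log_linear_score(c, weights))


# Hypothetical candidates for one source sentence (made-up probabilities).
candidates = [
    {"text": "we want to have a table near the window",
     "p_translation": 0.02, "p_language": 0.001, "p_distortion": 0.5},
    {"text": "want table window",
     "p_translation": 0.03, "p_language": 0.002, "p_distortion": 0.5},
]

# Hypothetical weights: identical except for the word penalty.
weights_no_penalty = {"translation": 1.0, "language": 1.0,
                      "distortion": 1.0, "word_penalty": 0.0}
weights_with_penalty = dict(weights_no_penalty, word_penalty=0.5)

print(best_translation(candidates, weights_no_penalty)["text"])
print(best_translation(candidates, weights_with_penalty)["text"])
```

With a zero weight on the word penalty the shorter candidate wins, because every additional word multiplies in further probabilities below one. A positive weight rewards each generated word and lets the longer, complete candidate win, which is the counterbalancing effect that Section 2 attributes to this feature.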
The Arabic language is characterized by a rather complex morphology, which presents challenges for machine translation (Diab, 2007). Our investigations build on the main existing work in this area. Lee (Lee, 2004), using an approach that segments words into roots and affixes, and then Habash (Habash, 2006), with a linguistically motivated approach to tokenization, showed that morphological preprocessing can be useful for statistical machine translation.

In addition, Diab and colleagues (Diab, 2004) developed ASVM, a part-of-speech tagger for Arabic written in Perl by the team of Mona Diab at Leland Stanford Junior University. It is an adaptation to Arabic of the English Yamcha system, which is based on support vector machines (SVM). The system is trained on an annotated corpus named Arabic Treebank. Researchers also showed (Diab, 2007) that the use of diacritization (vowelization) brings no improvement when it is partial and even worsens the results when it is complete.

The system of Besacier and colleagues improves on the first system by enriching single words with syntactic information in the form of part-of-speech tags. This system was submitted to IWSLT08 (Besacier, 2008). More recently, Wigdan and colleagues evaluated a morphological analyzer as a preprocessing tool for Arabic-English translation (Wigdan, 2011).

All this work follows in the line of the first complete machine translation system, demonstrated as the "Georgetown Experiment" on January 7. That system, which had a vocabulary of 250 words, managed to translate some sixty carefully chosen sentences from Russian into English. Its core consisted of six grammar rules and, above all, a bilingual dictionary.

4 SYSTEM ARCHITECTURE FOR MACHINE TRANSLATION

4.1 Place of the Arabic / English Machine Translation Module in the TELA Project

The TELA project (Towards Environmental Learning Arabic) (Khaireddine, 2011) aims at the design and implementation of an environment of
computer-assisted learning, covering many issues related to the use of Arabic words. It is a platform that allows teachers to create language learning activities based on NLP technologies, notably including automatic evaluation.

We have designed and built a prototype dictionary (Khaireddine, 2012). This dictionary has two roles. First, it is the source of the data on which the exercises are based. The feedback is also generated automatically by synthesizing the information in the dictionary. It has an important didactic function by allowing the learner to discover information beyond the scope of the sentence under study. Second, it is a one-way trilingual electronic dictionary (Arabic, French and English). Its macrostructure consists of a single volume. It can be seen as a structure composed of linguistic objects. Among these objects we find: the headword, its pronunciation, the grammatical categories that the headword can have (اسم, فعل, صفة, مصدر, etc.), examples, definitions, translations, collocations, an etymology, meanings, glosses, lexical plans, lexical functions, etc.

Figure 2: General architecture of the multifunction semantic analyzer.

We started by completing the Arabic-English machine translation module. Building a statistical translation system for a morphologically rich language such as Arabic requires several preprocessing steps. In this section, we implement and test a machine translation system that handles data enriched with morphosyntactic information for the Arabic / English pair, and we propose some solutions.

4.2 Methodologies: Approach Used to Build Our Own Arabic Tagger

Recall that the approach to statistical machine translation is as follows. Given an Arabic sentence s, we seek the English translation c that maximizes p(c | s), the probability that the sentence c is the translation of s (in what follows, we always translate Arabic s into English c):

$\hat{c} = \arg\max_{c} p(c \mid s) = \arg\max_{c} p(s \mid c)\, p(c)$

Figure 3: Statistical machine translation Arabic / English. (Components shown: the source text s, an Arabic sentence; the decoder, which computes the argmax of p(s | c) p(c); the translation model p(s | c) and the language model p(c), estimated in a training phase from the Arabic corpus and the English corpus; the target text c.)

Figure 3 shows the main components of a probabilistic machine translation system. The decoder takes as input the source text, the translation model and the language model, and outputs the translated text. Note that the language into which we want to translate is called the "target language".

4.3 The Implementation Stages

The initial input of our system, as described in Figure 3, consists of two corpora, one Arabic and one English. These two corpora are structured so that line i of the English corpus is the translation of line i of the Arabic corpus.

To obtain more flexibility in our translation table, we removed the short vowels from the Arabic corpus. The same word is vowelized in different ways in our training corpus: sometimes it is not vowelized at all, and sometimes only the first or last letter is vowelized.

Enriching words with morphosyntactic categories appears to improve the translation quality of these examples: it removes false alignments that existed in the classical model, filters the translation model and generates more correct hypotheses that did not exist in the classical model.

4.4 Results and Evaluation of the Analyzer in Machine Translation

The development and test sets each consist of
pairs of sentences. The measures used for evaluation are BLEU and NIST. The results are given in Table 2.

Table 2: BLEU and NIST scores obtained by the system.

ASVM          Dev    Test06    Test08
BLEU score
NIST score

We can see that tuning is very important for the system: the BLEU and NIST scores increased significantly after the Arabic / English system was adapted with Arabic texts. The adapted system clearly produced good translations for these examples. There are of course a few mistakes in these sentences, but the quality of the translations is largely sufficient to understand their meaning.

To test and validate the overall approach, a first version of the statistical machine translation system was built. It allows the creation of virtually any type of word or phrase, without collecting or exploiting traces of their use by learners. The screenshot below shows the current state of the "visible" system developed so far. The ergonomics and design of the interfaces are provisional and should evolve in the future.

We want to have a table near the window.
نريد مائدة بجانب النافذة.

Figure 4: Translation Arabic / English.

5 CONCLUSIONS AND OUTLOOK

The statistical approach to machine translation is now used to build translation systems quickly for many language pairs. Our system has been integrated into a multifunction semantic analyzer, whose realization is part of "TELA", an environment for computer-assisted learning of the Arabic language.
As future work, in addition to validating our results experimentally on a larger scale, it would be interesting to integrate our complete machine translation system for the Arabic / English pair into the semantic analyzer, and to add a feature that calculates the similarity between a source word and its translation in order to avoid losing the meaning of the text. Even commercial systems such as SYSTRAN produce translations that have nothing to do with the source.

REFERENCES

Besacier, L., Benyoussef, A. and Blanchon, H. (October 2008). The LIG Arabic / English Speech Translation System at IWSLT08. Hawaii.

Brown, P., Pietra, S. D., Pietra, V. D. and Mercer, R. (1993). The mathematics of machine translation: Parameter estimation. In Computational Linguistics.

Diab, M., Hacioglu, K. and Jurafsky, D. (2004). Automatic tagging of Arabic text: From raw text to base phrase chunks. In 5th Meeting of the North American Chapter of the Association for Computational Linguistics/Human Language Technologies Conference (HLT-NAACL04), Boston, MA, USA.

Diab, M., Ghoneim, M. and Habash, N. (2007). Arabic diacritization in the context of statistical machine translation. In Proceedings of MT-Summit.

Habash, N., Dorr, B. and Monz, C. (2006). Challenges in Building an Arabic Generation-heavy Machine Translation System and Extending it with Statistical Components. In Proceedings of the Association for Machine Translation in the Americas (AMTA-2006), Boston.

Khaireddine. B., Mounir. Z., Mohamed Amine. N., Anis. Z. (2011). TELA: Towards Environmental Learning Arabic. The International Conference on Artificial Intelligence (ICAI'11), WORLDCOMP'11, Las Vegas, Nevada, USA, 6 pages. download/worldcomp%2711/2011%20cd%20pa pers/eee4685.pdf

Khaireddine. B. and Mounir. Z. (2012). Design of a Synthesizer and a Semantic Analyzer's Multi Arabic, for use in Computer Assisted Teaching. International Journal of Information Sciences and Application (IJISA). /volume/ijisav4n1.htm

Koehn, P. and Schroeder, J. (2007). Experiments in domain adaptation for statistical machine translation. In Second Workshop on SMT.

Lee, Y. (2004). Morphological analysis for statistical machine translation. In Proceedings of HLT-NAACL 2004. Association for Computational Linguistics.

Papineni, K., Roukos, S., Ward, T. and Zhu, W. (2002). BLEU: a Method for Automatic Evaluation of Machine Translation. Proceedings of COLING-ACL 02, Philadelphia, USA.

Schwenk, H. (2010). Adaptation d'un Système de Traduction Automatique Statistique avec des Ressources monolingues. Montréal.

Vauquois, B. (1968). A Survey of Formal Grammars and Algorithms for Recognition and Translation. IFIP Congress-68, Edinburgh.

Wigdan, M., Gosme, J., Debili, F., Lepage, Y. and Lucas, N. (2011). Évaluation de G-LexAr pour la traduction automatique statistique. Montpellier.
Overview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
More informationThe XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1
More informationHybrid Machine Translation Guided by a Rule Based System
Hybrid Machine Translation Guided by a Rule Based System Cristina España-Bonet, Gorka Labaka, Arantza Díaz de Ilarraza, Lluís Màrquez Kepa Sarasola Universitat Politècnica de Catalunya University of the
More informationInternational Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
More informationDublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,
More informationLIUM s Statistical Machine Translation System for IWSLT 2010
LIUM s Statistical Machine Translation System for IWSLT 2010 Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans,
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More informationLeveraging ASEAN Economic Community through Language Translation Services
Leveraging ASEAN Economic Community through Language Translation Services Hammam Riza Center for Information and Communication Technology Agency for the Assessment and Application of Technology (BPPT)
More informationTibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
More informationAn Interactive Hypertextual Environment for MT Training
An Interactive Hypertextual Environment for MT Training Etienne Blanc GETA, CLIPS-IMAG BP 53, F-38041 Grenoble cedex 09 Etienne.Blanc@imag.fr Abstract An interactive hypertextual environment for MT training
More informationStatistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
More informationSYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
More informationEfficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
More informationTesting Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
More informationAn Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System
An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System Thiruumeni P G, Anand Kumar M Computational Engineering & Networking, Amrita Vishwa Vidyapeetham, Coimbatore,
More informationCustomizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
More informationGenetic Algorithm-based Multi-Word Automatic Language Translation
Recent Advances in Intelligent Information Systems ISBN 978-83-60434-59-8, pages 751 760 Genetic Algorithm-based Multi-Word Automatic Language Translation Ali Zogheib IT-Universitetet i Goteborg - Department
More informationSystematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation
Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation Rabih Zbib, Gretchen Markiewicz, Spyros Matsoukas, Richard Schwartz, John Makhoul Raytheon BBN Technologies
More informationFactored Translation Models
Factored Translation s Philipp Koehn and Hieu Hoang pkoehn@inf.ed.ac.uk, H.Hoang@sms.ed.ac.uk School of Informatics University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW Scotland, United Kingdom
More informationTurker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
More information31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
More informationThe multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
More informationARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH
Journal of Computer Science 9 (7): 922-927, 2013 ISSN: 1549-3636 2013 doi:10.3844/jcssp.2013.922.927 Published Online 9 (7) 2013 (http://www.thescipub.com/jcs.toc) ARABIC PERSON NAMES RECOGNITION BY USING
More informationLearning Translation Rules from Bilingual English Filipino Corpus
Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,
More informationThe KIT Translation system for IWSLT 2010
The KIT Translation system for IWSLT 2010 Jan Niehues 1, Mohammed Mediani 1, Teresa Herrmann 1, Michael Heck 2, Christian Herff 2, Alex Waibel 1 Institute of Anthropomatics KIT - Karlsruhe Institute of
More informationHow the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
More informationStatistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
More informationAdapting General Models to Novel Project Ideas
The KIT Translation Systems for IWSLT 2013 Thanh-Le Ha, Teresa Herrmann, Jan Niehues, Mohammed Mediani, Eunah Cho, Yuqi Zhang, Isabel Slawik and Alex Waibel Institute for Anthropomatics KIT - Karlsruhe
More informationAutomatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
More informationLatin Syllabus S2 - S7
European Schools Office of the Secretary-General Pedagogical Development Unit Ref.: 2014-01-D-35-en-2 Orig.: FR Latin Syllabus S2 - S7 APPROVED BY THE JOINT TEACHING COMMITTEE ON 13 AND 14 FEBRUARY 2014
More informationEnglish Grammar Checker
International l Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 English Grammar Checker Pratik Ghosalkar 1*, Sarvesh Malagi 2, Vatsal Nagda 3,
More informationFree Online Translators:
Free Online Translators: A Comparative Assessment of worldlingo.com, freetranslation.com and translate.google.com Introduction / Structure of paper Design of experiment: choice of ST, SLs, translation
More informationLanguage and Computation
Language and Computation week 13, Thursday, April 24 Tamás Biró Yale University tamas.biro@yale.edu http://www.birot.hu/courses/2014-lc/ Tamás Biró, Yale U., Language and Computation p. 1 Practical matters
More informationThe TCH Machine Translation System for IWSLT 2008
The TCH Machine Translation System for IWSLT 2008 Haifeng Wang, Hua Wu, Xiaoguang Hu, Zhanyi Liu, Jianfeng Li, Dengjun Ren, Zhengyu Niu Toshiba (China) Research and Development Center 5/F., Tower W2, Oriental
More informationThe Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems
The Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems Dr. Ananthi Sheshasaayee 1, Angela Deepa. V.R 2 1 Research Supervisior, Department of Computer Science & Application,
More informationModule Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
More informationAn Overview of Applied Linguistics
An Overview of Applied Linguistics Edited by: Norbert Schmitt Abeer Alharbi What is Linguistics? It is a scientific study of a language It s goal is To describe the varieties of languages and explain the
More informationAdaptation to Hungarian, Swedish, and Spanish
www.kconnect.eu Adaptation to Hungarian, Swedish, and Spanish Deliverable number D1.4 Dissemination level Public Delivery date 31 January 2016 Status Author(s) Final Jindřich Libovický, Aleš Tamchyna,
More informationNATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati
More informationLINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM*
LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM* Jonathan Yamron, James Baker, Paul Bamberg, Haakon Chevalier, Taiko Dietzel, John Elder, Frank Kampmann, Mark Mandel, Linda Manganaro, Todd Margolis,
More informationThe Development of Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization (STREDEO PROJECT)
The Development of Multimedia-Multilingual Storage, Retrieval and Delivery for E-Organization (STREDEO PROJECT) Asanee Kawtrakul, Kajornsak Julavittayanukool, Mukda Suktarachan, Patcharee Varasrai, Nathavit
More informationIdentifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of
More informationAn Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation Robert C. Moore Chris Quirk Microsoft Research Redmond, WA 98052, USA {bobmoore,chrisq}@microsoft.com
More informationMulti-Lingual Display of Business Documents
The Data Center Multi-Lingual Display of Business Documents David L. Brock, Edmund W. Schuster, and Chutima Thumrattranapruk The Data Center, Massachusetts Institute of Technology, Building 35, Room 212,
More informationAppraise: an Open-Source Toolkit for Manual Evaluation of MT Output
Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output Christian Federmann Language Technology Lab, German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, D-66123 Saarbrücken,
More informationAccelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems
Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural
More informationEmpirical Machine Translation and its Evaluation
Empirical Machine Translation and its Evaluation EAMT Best Thesis Award 2008 Jesús Giménez (Advisor, Lluís Màrquez) Universitat Politècnica de Catalunya May 28, 2010 Empirical Machine Translation Empirical
More informationBilingual Education Assessment Urdu (034) NY-SG-FLD034-01
Bilingual Education Assessment Urdu (034) NY-SG-FLD034-01 The State Education Department does not discriminate on the basis of age, color, religion, creed, disability, marital status, veteran status, national
More informationSYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
More informationA Knowledge-Poor Approach to BioCreative V DNER and CID Tasks
A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks Firoj Alam 1, Anna Corazza 2, Alberto Lavelli 3, and Roberto Zanoli 3 1 Dept. of Information Eng. and Computer Science, University of Trento,
More informationWHITE PAPER. Machine Translation of Language for Safety Information Sharing Systems