English to Creole and Creole to English Rule Based Machine Translation System
|
|
|
- Darren Howard
- 9 years ago
- Views:
Transcription
1 English to Creole and Creole to English Rule Based Machine Translation System Sameerchand Pudaruth, Lallesh Sookun, Arvind Kumar Ruchpaul Computer Science and Engineering University of Mauritius Mauritius Abstract Machine translation is the process of translating a text from one language to another, using computer software. A translation system is important to overcome language barriers, and help people communicate between different parts of the world. Most of popular online translation system caters only for the most commonly used languages but no research has been made so far concerning the translation of the Mauritian Creole language. In this paper, we present the first machine translation (MT) system that translates English sentences to Mauritian Creole language and vice-versa. The system uses the rule based machine translation approach to perform translation. It takes as input sentences in the source language either in English or Creole and outputs the translation of the sentences in the target language. The results show that the system can provide translation of acceptable quality. This system can potentially benefit many categories of people, since it allow them to perform their translation quickly and with ease. Keywords rule-based; Mauritian Creole; English I. INTRODUCTION People around the world have access to millions of documents, articles and websites over the Internet. However these electronic documents are often written in different languages and therefore it is necessary to translate them into the relevant target language. Traditionally, human translators have assisted people to understand texts in a foreign language. However the translation process is very time consuming and costly, when it is done by a human translator. Therefore there is an increasing need for a translation tool so that communication can take place between different parts of the world. A translation tool would help to overcome language barriers. In Mauritius, even if English is the official language, the majority of the Mauritian population uses the Creole language as a medium of communication. It is a language which is spoken only in Mauritius and some other small islands in the Indian Ocean. Many Mauritians face difficulties in expressing themselves properly using the English language. At the same time, foreigners and most particularly tourists find it extremely difficult to communicate with the Mauritian people. This is mainly because many Mauritians are at ease only in their native language, which is Creole. The Mauritian Creole language has recently been introduced in the education system in Mauritius [1] as an optional subject and also as a medium of instruction to facilitate teaching and learning in primary schools. The introduction of Creole at school is due to many factors, the most important of which is the high rate of failure at the Certificate of Primary Education (CPE) exams. Mauritian Creole is thus becoming more and more important for the Mauritian population as it has already been formalized as a full-fledged language. Its usage in formal situations has increased dramatically over the last five years. There was a time when it was considered rude to do advertising in Creole. However, now things have changed considerably and Creole has become the preferred medium for advertising for all range and types of organisations as well as for the government. Even the Bible is now available in Creole. In this paper we attempt to solve the identified problems by introducing a machine translation system that will help people translate texts from English to Mauritian Creole and vice-versa. The translation system would greatly assist students in switching to and from Mauritian Creole and English language. It would also be vital to foreigners and tourists as it would enable them understand the meaning of certain words or texts in the Creole language. The tool developed can also be used in conjunction with other translation tools. Thus, A German can use Google translate to translate German texts into English and the use our tool to convert to Creole and vice-versa. Since the Mauritian economy depends heavily on Tourism and tourists from all over the world come to Mauritius, the tool can be of tremendous value to visitors. This paper is organized as follows. Section 2 looks at the related work. The section is divided into 4 parts: an introduction to machine translation, its architectures and briefly describes the main paradigms that have been developed. An overview of the Mauritian Creole grammar will then be discussed. In section 3, we present how the translation system has been implemented. Finally, we evaluate the system in Section 4 and conclude the paper in Section 5. II. RELATED WORKS There are a number of issues that needs to be address for rule-based machine translation systems. For example, Charoenpornsawat et al. [2] address the issue of word-sense disambiguation by using machine learning techniques to automatically extract context information from a training corpus. Their work improves the translation quality of rule based MT systems using this approach. Oliveira et al. [3] shows that by using a systematic approach to break down the length of 25 P a g e
2 the sentences based on patterns, clauses, conjunctions, and punctuation can help to parse, and translate long sentences efficiently. Also, Poornima et al. [4] suggest simplifying complex sentences into simpler sentences in order to improve the translation quality. Their research presented results which showed that this method was able to preserve the meaning of the sentence after translation. A. Machine translation Machine translation (MT) refers to the process of translating a text from one language (source language) into another language (target language), using computers. The field of MT draws ideas and techniques mainly from linguistics, computer science, artificial intelligence (AI), translation theory and statistics [5]. In recent years, there has been a great increase in activity in the machine translation field, resulting in better translation quality being produced by MT systems. There are numerous motivations towards developing a computer based machine translator. First of all, this can help people perform translation faster and at a lower cost compared to human translators. There is also the perceived need to translate large amount of data and this can be done in a relatively short space of time using MT. Many different MT approaches have been developed, such as rule-based, statistical-based and example-based approaches. However there is still no MT approach that is able to produce high quality translations for broader domain systems. In fact most of the successful MT systems are for a restricted domain, such as the Canadian METEO system [6]. B. Machine translation architectures Morphological Word Source Text Composition Interlingua Transfer Transfer Direct Fig.1. The Vauquois Triangle for MT [8] Decomposition Word Target Text Morphological Based on the level of linguistic analysis that is processed, MT system may be categorized as: direct, transfer, and Interlingua. The Vauquois triangle as shown in Figure 1 is used to illustrate these three levels. It shows the increasing depth of analysis needed as the top of the triangle is reached (i.e. from direct approach through transfer approach to Interlingua approach) [7]. Furthermore, the amount of transfer knowledge required to traverse the gap between languages decreases as the top of the triangle is reached. 1) Direct Architecture In direct MT system, morphological analysis is performed on the source text to determine the core of the words to be translated. An example of this is the word searching would be analyzed as coming from the word search. After this stage, each word is translated into a specified language by using large bilingual dictionaries. Then the words are rearranged as according to the target language sentence format. 2) Transfer Architecture The transfer-based MT architecture was developed to produce better translation quality by capturing and using the linguistic information of the source text. It consists of three stages. During the first stage, the source language (SL) text is analyzed to its syntactic structure (i.e. to produce a parse tree) or semantic structure (i.e. to determine the logical form). Then transfer rules are used to map the SL syntactic/semantic structure to a structure in the target language (TL). Finally in the third stage, the target text is generated from the TL structure [8]. 3) Interlingua Architecture In Interlingua MT system, the source text is analyzed and translated into a language-independent representation known as an Interlingua [9]. The target text is then generated from that Interlingua representation. A common choice of Interlingua used by MT systems is Esperanto. C. Machine translation paradigms The main types of MT system are introduced in this section. 1) Rule Based Machine Translation A Rule-based MT (RBMT) system relies on linguistic rules to perform translation between the source and target language. The rules which are defined in such a system are extensively used in the various processes which analyze an input text, such as during morphological, syntactic and semantic analysis [10]. Rule-based approach is one of the oldest machine translation paradigms that have been developed. RBMT includes concepts such as transfer-based MT and interlinguabased MT. During the translation process in RBMT, the source text is analyzed to produce an intermediate representation (i.e. a parse tree or some abstract representation). The target text is then generated from that intermediate representation [10]. 2) Statistical Machine Translation Statistical MT (SMT) is a MT paradigm in which large bilingual corpora (i.e. a set of translated texts in both the source and target language) are analyzed to produce a translation model, and also large amounts of monolingual text in the target language are used to produce a language model [11]. The MT system then uses the two models to perform translation. The statistical approach has gained a growing interest in recent years, since its re-introduction by a group of IBM researchers in the early nineties. The idea was first proposed by Warren Weaver in 1949, but efforts in this 26 P a g e
3 direction were abandoned for various philosophical and theoretical reasons [12]. 3) Example-based Machine Translation Example-Based MT (EBMT) was first suggested by Nagao in 1984 [13]. The basic idea behind EBMT is to generate translation based on previous translation examples. Like statistical MT, example-based MT uses a large bilingual corpus; from which large number of examples are extracted and stored in a database EBMT has three main stages [14]: (i) first of all, the source text is decomposed, and the resulting fragments are matched against a database of real examples, (ii) during the second stage, each fragment is translated and (iii) finally the target text is generated by recombining each fragment. 4) Hybrid Machine Translation Hybrid approaches to MT integrates various MT paradigms, in an effort to make the most of the strengths of the individual paradigms, while compensating for their weaknesses. The basic idea behind hybrid approaches is to combine linguistic paradigms (i.e. RBMT) with non-linguistic paradigms, such as SMT or EBMT, to produce better results. D. Mauritian Creole Grammar The Mauritian Creole (MC) determiner system is quite different from that of French, and it can be considered to be much simpler, as there are no French definite and partitive articles. There is also the exclusion of grammatical gender as well as number in MC. The core of the MC determiner system has the following functional elements: An indefinite singular article enn [15]. A demonstrative sa, which is generally used in conjunction with la [15]. A post-nominal specificity marker la [15]. A plural marker bann [15]. The morpheme li, which is used to represent the pronoun he/she/it/him/her, depending upon the context it is used. Mauritian Creole verbs use TMA (Tense, Modality, and Aspect) markers to indicate the tense [16]. The tense marker ti indicates an action that has already taken place (i.e. past tense). The modality marker pu indicates something will happen (i.e. definite future) whereas the modality marker ava is used to express something that may possibly happen (i.e. indefinite future). The aspect marker pe/ape marks an action that is still going on (i.e. progressive), in contrast to the aspect marker finn/inn which indicates an action that is already over (i.e. perfect) [17]. III. IMPLEMENTATION The proposed system follows the following steps: 1) Split a text into an array of sentences using.,!,? as delimiters 2) Split each sentence into an array of words using \W meta sequence, and the underscore character as delimiters 3) A Greedy algorithm is used to find the longest match for a given fragment, in the database 4) Perform morphological analysis to extract root of word, and check for corresponding translation, in case word has not been translated in step 3. 5) Reorder the words according to the target language sentence format The rule based machine system relies on the use of bilingual dictionary to perform translation. We have used the Diksioner Morisien dictionary to build the bilingual dictionary in the database. A. Greedy Algorithm for Natural Language Processing The greedy algorithm is used to retrieve the target word(s) from the database and it works in following way: it starts at the first character in a sentence, and by traversing from left to right it attempts to find the longest match, based on the words in the database. When a fragment is found, a boundary is marked at the end of the longest match, and then the same searching process continues starting at the next character after the match. If a word is not found, the greedy algorithms remove that character, and then continue the searching process starting at the next character [18]. The steps to perform the greedy search are given below. 1. Initialize startingpos to 0 2. Initialize numelementswordarray to number of elements in wordarray 3. Initialize fragment to 5 4. Create an array translatedwordarray to Store target word 5. Create an array wordcategoryarray to Store word category 6. WHILE starting position <numelementswordarray 7. Initalize flag to 0 8. fragment = fragment Initialize fragmentendpos to startingpos + fragment 10. IF fragmentendpos>numelementswordarray THEN 11. fragmentendpos = numelementswordarray 12. ENDIF 13. Create an empty String greedystring to Store the fragment of words 14. FOR (k= startingpos; k <fragmentendpos; increment k) 15. greedystring = greedystring + + wordarray [k] 16. ENDFOR 17. IF flag is 0, and target word and word category is retrieved from database where source word = greedystring THEN 18. Put target word in translatedwordarray 19. Put word category in wordcategoryarray 20. Set startingpos to fragmentendpos 21. Set fragment to Set flag to ENDIF 24. IF flag is 0, and word not found THEN 25. CALL morphological (greedystring) 26. Store the variables received from the morphological function in a List with parameters (target word, word category) 27. Put target word in translatedwordarray 28. Put word category in wordcategoryarray 29. Set startingpos to fragmentendpos 27 P a g e
4 30. Set fragment to ENDIF 32.ENDWHILE The above greedy algorithm starts the searching process by taking a fragment of maximum of four words starting from the beginning of a sentence. If the fragment is not found in the database, the last word of the fragment is removed, and the searching process continues. If a fragment is found, a boundary is marked at its end, and the searching process continues taking another fragment of a maximum of four words, starting from the boundary. B. Morphological In case a word is not translated after the Greedy search sub-process, it enters the Morphological analysis process, where rules are applied to the word to find its root, which is then searched in the database. An example of a morphological rule is: the suffix ing is removed from the word walking to obtain its root (i.e. walk). The steps for the morphological analysis process are as follows: 1. Get word 2. Set String truncatedword to word 3. Create an empty String savelastchar to Store the last characters of a word 4. Create an empty String translatedwordstring to Store translated word 5. Create an empty String wordcategorystring to Store word category 6. IF length of word > 3 THEN 7. FOR num= 1to 4 8. Set truncatedchar to last character of truncatedword 9. Add truncatedchar to savelastchar 10. truncatedword = truncatedword last character 11. IF num = 1 THEN 12. IF the reverse of String savelastchar = suffix THEN 13. IF target word and word category is retrieved from database where source word = truncatedword THEN 14. Add data to translatedwordstring (optional) 15. Add target word to translatedwordstring 16. Add data to translatedwordstring (optional) 17. Add word category to wordcategorystring 18. Break 19. ENDIF 20. ENDIF 21. ENDIF 22. IF num = 2 THEN 23. Repeat steps ENDIF 25. IF num = 3 THEN 26. Repeat steps ENDIF 28. IF num = 4 THEN 29. Repeat steps Add the reverse of String savelastchar to truncatedword 31. Add the uppercase of String truncatedword to translatedwordstring 32. Add data to wordcategorystring 33. Break 34. ENDIF 35. ENDIF 36. ENDFOR 37.ELSE 38. Add the uppercase of String word to translatedwordstring 39. Add data to wordcategorystring 40.ENDIF 41.RETURN the variables (translatedwordstring, wordcategorystring) in an Array C. Reorder word After the words are translated, they are rearranged according to the target language sentence format. The pseudocode for reordering words is given below: 1. Get translatedwordarray and wordcategoryarray 2. FOR EACH element in wordcategoryarray 3. Initialize key to the key of the array element 4. Initialize value to the value of the array element 5. IF key is not equal to 0 6. IF wordcategoryarray [key -1] is equal to adjective, and value equal to noun 7. CALL swaparrayelement (translatedwordarray, key-1, key) 8. CALL swaparrayelement (wordcategoryarray, key-1, key) 9. ENDIF 10. ENDIF 11.ENDFOR EACH 12.RETURN the variables (translatedwordarray, wordcategoryarray) in an Array IV. EVALUATION Our goal is to provide the most accurate translation; therefore, whenever new rules were added, a series of tests was carried out to make sure that it does not affect the quality of translation. Table 1 presents some sample translations obtained when translating sentences from English to Mauritian Creole. TABLE I. Source text She is a brilliant student I love spicy food I can t tell if he is listening to me or not Either take it or leave it TRANSLATION OF ENGLISH SENTENCES TO MAURITIAN CREOLE Expected Result Target Text Li enn zelev Li ene zelev intelizan Mo kontan intelizan Mo kontan manze epise Mo pa capav dir manze epise Mo pa capav dir si lip ekoute mwa oubien non si li pe ekoute mwa oubien pa Swa pran li oubien les li Swa pran li oubien dekale li From the above table, we have shown that the translation obtained is of acceptable quality. The column Expected result represents the result obtained from a manual translator while Target Text is the text generated from the program. However for some cases, for e.g. in the last sentence, the word leave is being translated to the word dekale in Mauritian Creole, which is being used in the wrong context. The Creole word dekale means put it somewhere else or move it 28 P a g e
5 slightly in English. This will be remedied in our future work where context will be used to deal with polysymy, i.e., words that have different meanings. Table 2 presents some sample translation obtained when translating from Mauritian Creole to English. TABLE II. TRANSLATION OF MAURITIAN CREOLE SENTENCES TO ENGLISH Source text Expected Result Target Text Done to disan, sauve ene lavi! Give your blood, save a life! Give your blood, save Apel mwa kan ou rente lakaz Dan dimans nou mete promotion lor diri Divin bon pou lasante Call me when you get home On Sunday we offer discount on rice Wine is good for health a life! Call me when you enter house On sunday we put promotion on rice Wine good for health As can be seen, most of the generated text is similar to those obtained by the human translator. In the last example, the word is is missing as it is currently quite difficult to know when to add these types of words when translating from English to Creole. The rules are quite complex and there are too many exceptions. It is also challenging to deal with cases of synonymy, i.e. one word in the English language can be translated to several words with completely different meanings in Creole. A. Weaknesses of the current system Word sense disambiguation (WSD) is the process of identifying the appropriate sense of a word in a given context. This has not been addressed. Another weakness is that the system can deal with only short sentences. Thus, in our future work, we intend to improve the translation for longer sentences as well. The tool can also be converted into a web application so that it is easily accessible to everyone. V. CONCLUSION In this paper, we have implemented the first automated translation system that performs translation of English sentences to Mauritian Creole and vice-versa. The translation system uses the rule-based machine translation approach to perform translation. The results obtained show that the implemented system can provide translation of acceptable quality. The speed of translation of the system is also satisfactory. As part of our future work, we plan to investigate how the problem of word sense disambiguation can be solved and how translation can be improved for longer sentences. The translation system would benefit both foreigners and the Mauritian population as it would enable them to swap between their mother tongue and Mauritian Creole with ease and convenience. REFERENCES [1] QUIRIN, S., Kreol Morisien au Primaire. Week End, 15 Jan. p17. [2] CHAROENPORNSAWAT, P. AND SORNLERTLAMVANICH, V. AND CHAROENPORN, T., Improving translation quality of rulebased machine translation. 19th International Conference on Computational Linguistics. [3] OLIVEIRA, F. AND WONG, F. AND HONG, I., Systematic Processing of Long Sentences in Rule Based Portuguese-Chinese Machine Translation. 11th Annual Conference on Computational Linguistics and Intelligent Text Processing, March 2010, Iasi, Romania. Heidelberg: Springer, pp [4] POORNIMA, C. AND DHANALAKSHMI, V. AND ANAND, K.M. AND SOMAN, K.P., Rule based Sentence Simplification for English to Tamil Machine Translation System. International Journal of Computer Applications, 25 (8), [5] HUTCHINS, W.J. AND SOMERS, H.L., An Introduction to Machine Translation. London: Academic Press. [6] CHANDIOUX, J., METEO, an operational system for the translation of public weather forecasts. Seminar on Machine Translation, 8-9 March 1976, Rosslyn, Virginia. Stroudsburg: Association for Computational Linguistics, [7] JURAFSKY, D. AND MARTIN, J.H., Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. 2nd ed. New Jersey: Prentice Hall. [8] NIE, J-Y., Cross-Language Information Retrieval. San Rafael, California: Morgan & Claypool Publishers. [9] MISHRA, R.B., Artificial Intelligence PB. New Delhi: PHI Learning Private Limited. [10] CARL, M. AND WAY, A., Recent Advances in Example-based Machine Translation. Heidelberg: Springer. [11] UEFFING, N. AND HAFFARI, G. AND SARKAR, A., Semisupervised learning for Machine Translation. Machine Translation, 21 (2), [12] ARMSTRONG, A-W. AND ARMSTRONG, S., Using large corpora. Cambridge: MIT Press. [13] NAGAO, M., A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. Artificial and human intelligence. Elsevier Science Publishers, pp [14] Somers, H., An overview of EBMT. In: M. CARL AND A. WAY, ed. Recent Advances in Example-based Machine Translation. Heidelberg: Springer, [15] GUILLEMIN, D., The Syntax and s of a Determiner System: A Case Study of Mauritian Creole. Amsterdam: John Benjamins Publishing Company. [16] ADONE, D., The acquisition of Mauritian Creole. Amsterdam: John Benjamins Publishing Company. [17] DICK, J-Y. AND AH-VEE, A. AND COLLEN, L., A Few Points on Verbs in Mauritian Kreol and in Creole Languages [online]. Port- Louis, Ledikasyon pu Travayer. [18] PALMER, David D., A trainable rule-based algorithm for word segmentation. Proceedings of the 35th annual meeting on Association for Computational Linguistics, 7-12 July 2007, Madrid, Spain. 29 P a g e
Overview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
TRANSLATION OF TELUGU-MARATHI AND VICE- VERSA USING RULE BASED MACHINE TRANSLATION
TRANSLATION OF TELUGU-MARATHI AND VICE- VERSA USING RULE BASED MACHINE TRANSLATION Dr. Siddhartha Ghosh 1, Sujata Thamke 2 and Kalyani U.R.S 3 1 Head of the Department of Computer Science & Engineering,
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
BILINGUAL TRANSLATION SYSTEM
BILINGUAL TRANSLATION SYSTEM (FOR ENGLISH AND TAMIL) Dr. S. Saraswathi Associate Professor M. Anusiya P. Kanivadhana S. Sathiya Abstract--- The project aims in developing Bilingual Translation System for
Comprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4
Language and Computation
Language and Computation week 13, Thursday, April 24 Tamás Biró Yale University [email protected] http://www.birot.hu/courses/2014-lc/ Tamás Biró, Yale U., Language and Computation p. 1 Practical matters
Semantic analysis of text and speech
Semantic analysis of text and speech SGN-9206 Signal processing graduate seminar II, Fall 2007 Anssi Klapuri Institute of Signal Processing, Tampere University of Technology, Finland Outline What is semantic
Hybrid Strategies. for better products and shorter time-to-market
Hybrid Strategies for better products and shorter time-to-market Background Manufacturer of language technology software & services Spin-off of the research center of Germany/Heidelberg Founded in 1999,
How To Translate English To Yoruba Language To Yoranuva
International Journal of Language and Linguistics 2015; 3(3): 154-159 Published online May 11, 2015 (http://www.sciencepublishinggroup.com/j/ijll) doi: 10.11648/j.ijll.20150303.17 ISSN: 2330-0205 (Print);
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
Parsing Technology and its role in Legacy Modernization. A Metaware White Paper
Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
Learning Translation Rules from Bilingual English Filipino Corpus
Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,
A Machine Translation System Between a Pair of Closely Related Languages
A Machine Translation System Between a Pair of Closely Related Languages Kemal Altintas 1,3 1 Dept. of Computer Engineering Bilkent University Ankara, Turkey email:[email protected] Abstract Machine translation
GRAMMAR, SYNTAX, AND ENGLISH LANGUAGE LEARNERS
GRAMMAR, SYNTAX, AND ENGLISH LANGUAGE LEARNERS When it comes to grammar, many writing tutors and instructors are unsure of the most effective way to teach ESL students. And while numerous studies, articles
Paraphrasing controlled English texts
Paraphrasing controlled English texts Kaarel Kaljurand Institute of Computational Linguistics, University of Zurich [email protected] Abstract. We discuss paraphrasing controlled English texts, by defining
Functional Modeling with Data Flow Diagrams
Functional Modeling with Data Flow Diagrams Amasi Elbakush 5771668 Teaching Assistant : Daniel Alami Utrecht University 1 Introduction Data Flow Diagrams (DFDs) are a visual representation of the flow
Maskinöversättning 2008. F2 Översättningssvårigheter + Översättningsstrategier
Maskinöversättning 2008 F2 Översättningssvårigheter + Översättningsstrategier Flertydighet i källspråket poäng point, points, credit, credits, var verb ->was, were pron -> each adv -> where adj -> every
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
Application of Natural Language Interface to a Machine Translation Problem
Application of Natural Language Interface to a Machine Translation Problem Heidi M. Johnson Yukiko Sekine John S. White Martin Marietta Corporation Gil C. Kim Korean Advanced Institute of Science and Technology
Morphology. Morphology is the study of word formation, of the structure of words. 1. some words can be divided into parts which still have meaning
Morphology Morphology is the study of word formation, of the structure of words. Some observations about words and their structure: 1. some words can be divided into parts which still have meaning 2. many
ISA OR NOT ISA: THE INTERLINGUAL DILEMMA FOR MACHINE TRANSLATION
ISA OR NOT ISA: THE INTERLINGUAL DILEMMA FOR MACHINE TRANSLATION FLORENCE REEDER The MITRE Corporation 1 / George Mason University [email protected] ABSTRACT Developing a system that accurately produces
Pseudo code Tutorial and Exercises Teacher s Version
Pseudo code Tutorial and Exercises Teacher s Version Pseudo-code is an informal way to express the design of a computer program or an algorithm in 1.45. The aim is to get the idea quickly and also easy
National Quali cations SPECIMEN ONLY
N5 SQ40/N5/02 FOR OFFICIAL USE National Quali cations SPECIMEN ONLY Mark Urdu Writing Date Not applicable Duration 1 hour and 30 minutes *SQ40N502* Fill in these boxes and read what is printed below. Full
Rule based Sentence Simplification for English to Tamil Machine Translation System
Volume 25 No8, July 2011 Rule based Sentence Simplification for English to Tamil Machine Translation System Poornima C, Dhanalakshmi V Computational Engineering and Networking Amrita Vishwa Vidyapeetham
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System
An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System Thiruumeni P G, Anand Kumar M Computational Engineering & Networking, Amrita Vishwa Vidyapeetham, Coimbatore,
PROMT Technologies for Translation and Big Data
PROMT Technologies for Translation and Big Data Overview and Use Cases Julia Epiphantseva PROMT About PROMT EXPIRIENCED Founded in 1991. One of the world leading machine translation provider DIVERSIFIED
Comparative Analysis on the Armenian and Korean Languages
Comparative Analysis on the Armenian and Korean Languages Syuzanna Mejlumyan Yerevan State Linguistic University Abstract It has been five years since the Korean language has been taught at Yerevan State
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Cohesive writing 1. Conjunction: linking words What is cohesive writing?
Cohesive writing 1. Conjunction: linking words What is cohesive writing? Cohesive writing is writing which holds together well. It is easy to follow because it uses language effectively to guide the reader.
a Chinese-to-Spanish rule-based machine translation
Chinese-to-Spanish rule-based machine translation system Jordi Centelles 1 and Marta R. Costa-jussà 2 1 Centre de Tecnologies i Aplicacions del llenguatge i la Parla (TALP), Universitat Politècnica de
Web-based automatic translation: the Yandex.Translate API
Maarten van Hees [email protected] Web-based automatic translation: the Yandex.Translate API Paulina Kozłowska [email protected] Nana Tian [email protected] ABSTRACT Yandex.Translate
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
Introduction. Philipp Koehn. 28 January 2016
Introduction Philipp Koehn 28 January 2016 Administrativa 1 Class web site: http://www.mt-class.org/jhu/ Tuesdays and Thursdays, 1:30-2:45, Hodson 313 Instructor: Philipp Koehn (with help from Matt Post)
MACHINE TRANSLATION FROM ENGLISH TO ARABIC
2011 International Conference on Biomedical Engineering and Technology IPCBEE vol.11 (2011) (2011) IACSIT Press, Singapore MACHINE TRANSLATION FROM ENGLISH TO ARABIC Mouiad Alawneh, Nazlia Omar and Tengku
Statistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
Classification of Natural Language Interfaces to Databases based on the Architectures
Volume 1, No. 11, ISSN 2278-1080 The International Journal of Computer Science & Applications (TIJCSA) RESEARCH PAPER Available Online at http://www.journalofcomputerscience.com/ Classification of Natural
LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM*
LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM* Jonathan Yamron, James Baker, Paul Bamberg, Haakon Chevalier, Taiko Dietzel, John Elder, Frank Kampmann, Mark Mandel, Linda Manganaro, Todd Margolis,
The Development of Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization (STREDEO PROJECT)
The Development of Multimedia-Multilingual Storage, Retrieval and Delivery for E-Organization (STREDEO PROJECT) Asanee Kawtrakul, Kajornsak Julavittayanukool, Mukda Suktarachan, Patcharee Varasrai, Nathavit
stress, intonation and pauses and pronounce English sounds correctly. (b) To speak accurately to the listener(s) about one s thoughts and feelings,
Section 9 Foreign Languages I. OVERALL OBJECTIVE To develop students basic communication abilities such as listening, speaking, reading and writing, deepening their understanding of language and culture
Identifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 [email protected] Christopher D. Manning Department of
Chapter 10 Paraphrasing and Plagiarism
Source: Wallwork, Adrian. English for Writing Research Papers. New York: Springer, 2011. http://bit.ly/11frtfk Chapter 10 Paraphrasing and Plagiarism Why is this chapter important? Conventions regarding
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives Ramona Enache and Adam Slaski Department of Computer Science and Engineering Chalmers University of Technology and
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
Report Writing: Editing the Writing in the Final Draft
Report Writing: Editing the Writing in the Final Draft 1. Organisation within each section of the report Check that you have used signposting to tell the reader how your text is structured At the beginning
A Survey of Online Tools Used in English-Thai and Thai-English Translation by Thai Students
69 A Survey of Online Tools Used in English-Thai and Thai-English Translation by Thai Students Sarathorn Munpru, Srinakharinwirot University, Thailand Pornpol Wuttikrikunlaya, Srinakharinwirot University,
SOUTH SEATTLE COMMUNITY COLLEGE (General Education) COURSE OUTLINE Revision: (Don Bissonnette and Kris Lysaker) July 2009
SOUTH SEATTLE COMMUNITY COLLEGE (General Education) COURSE OUTLINE Revision: (Don Bissonnette and Kris Lysaker) July 2009 DEPARTMENT: CURRICULLUM: COURSE TITLE: Basic and Transitional Studies English as
COURSE SYLLABUS ESU 561 ASPECTS OF THE ENGLISH LANGUAGE. Fall 2014
COURSE SYLLABUS ESU 561 ASPECTS OF THE ENGLISH LANGUAGE Fall 2014 EDU 561 (85515) Instructor: Bart Weyand Classroom: Online TEL: (207) 985-7140 E-Mail: [email protected] COURSE DESCRIPTION: This is a practical
The history of machine translation in a nutshell
1. Before the computer The history of machine translation in a nutshell 2. The pioneers, 1947-1954 John Hutchins [revised January 2014] It is possible to trace ideas about mechanizing translation processes
Albert Pye and Ravensmere Schools Grammar Curriculum
Albert Pye and Ravensmere Schools Grammar Curriculum Introduction The aim of our schools own grammar curriculum is to ensure that all relevant grammar content is introduced within the primary years in
Points of Interference in Learning English as a Second Language
Points of Interference in Learning English as a Second Language Tone Spanish: In both English and Spanish there are four tone levels, but Spanish speaker use only the three lower pitch tones, except when
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Automated Online English -Arabic Translator
Faculty of Engineering & Architecture Electrical & Computer Engineering Department Final Year Project - Spring 2006 Automated Online English -Arabic Translator Supervisor: Prof. Ali El Hajj Report Grader:
Year 1 reading expectations (New Curriculum) Year 1 writing expectations (New Curriculum)
Year 1 reading expectations Year 1 writing expectations Responds speedily with the correct sound to graphemes (letters or groups of letters) for all 40+ phonemes, including, where applicable, alternative
ONLINE ENGLISH LANGUAGE RESOURCES
ONLINE ENGLISH LANGUAGE RESOURCES Developed and updated by C. Samuel for students taking courses at the English and French Language Centre, Faculty of Arts (Links live as at November 2, 2009) Dictionaries
Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.
Name: Class: Date: Exam #1 - Prep True/False Indicate whether the statement is true or false. 1. Programming is the process of writing a computer program in a language that the computer can respond to
Special Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
Automation of Translation: Past, Presence, and Future Karl Heinz Freigang, Universität des Saarlandes, Saarbrücken
Automation of Translation: Past, Presence, and Future Karl Heinz Freigang, Universität des Saarlandes, Saarbrücken Introduction First attempts in "automating" the process of translation between natural
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering
UNIVERSITÀ DEGLI STUDI DELL AQUILA CENTRO LINGUISTICO DI ATENEO
TESTING DI LINGUA INGLESE: PROGRAMMA DI TUTTI I LIVELLI - a.a. 2010/2011 Collaboratori e Esperti Linguistici di Lingua Inglese: Dott.ssa Fatima Bassi e-mail: [email protected] Dott.ssa Liliana
[Refer Slide Time: 05:10]
Principles of Programming Languages Prof: S. Arun Kumar Department of Computer Science and Engineering Indian Institute of Technology Delhi Lecture no 7 Lecture Title: Syntactic Classes Welcome to lecture
M LTO Multilingual On-Line Translation
O non multa, sed multum M LTO Multilingual On-Line Translation MOLTO Consortium FP7-247914 Project summary MOLTO s goal is to develop a set of tools for translating texts between multiple languages in
GMAT.cz www.gmat.cz [email protected]. GMAT.cz KET (Key English Test) Preparating Course Syllabus
Lesson Overview of Lesson Plan Numbers 1&2 Introduction to Cambridge KET Handing Over of GMAT.cz KET General Preparation Package Introduce Methodology for Vocabulary Log Introduce Methodology for Grammar
Compiler I: Syntax Analysis Human Thought
Course map Compiler I: Syntax Analysis Human Thought Abstract design Chapters 9, 12 H.L. Language & Operating Sys. Compiler Chapters 10-11 Virtual Machine Software hierarchy Translator Chapters 7-8 Assembly
1. Process Modeling. Process Modeling (Cont.) Content. Chapter 7 Structuring System Process Requirements
Content Chapter 7 Structuring System Process Requirements Understand the logical (&physical) process modeling by using data flow diagrams (DFDs) Draw DFDs & Leveling Balance higher-level and lower-level
English-Japanese machine translation
[Information processing: proceedings of the International Conference on Information Processing, Unesco, Paris 1959] 163 English-Japanese machine translation By S. Takahashi, H. Wada, R. Tadenuma and S.
Machine Translation Computer Aided Translation Machine Language Processing
Machine Translation Computer Aided Translation Machine Language Processing Martin Kappus ([email protected]) 1 Machine Translation Computer-Aided Translation Agenda Machine Translation Introduction History
READING THE NEWSPAPER
READING THE NEWSPAPER Outcome (lesson objective) Students will comprehend and critically evaluate text as they read to find the main idea. They will construct meaning as they analyze news articles and
How to become a successful language learner
How to become a successful language learner By Alison Fenner English and German Co-ordinator, Institution Wide Language Programme Introduction Your success in learning a language depends on you! You may
Writing Common Core KEY WORDS
Writing Common Core KEY WORDS An educator's guide to words frequently used in the Common Core State Standards, organized by grade level in order to show the progression of writing Common Core vocabulary
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
Third Grade Language Arts Learning Targets - Common Core
Third Grade Language Arts Learning Targets - Common Core Strand Standard Statement Learning Target Reading: 1 I can ask and answer questions, using the text for support, to show my understanding. RL 1-1
NATURAL LANGUAGE TO SQL CONVERSION SYSTEM
International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR) ISSN 2249-6831 Vol. 3, Issue 2, Jun 2013, 161-166 TJPRC Pvt. Ltd. NATURAL LANGUAGE TO SQL CONVERSION
A Knowledge-based System for Translating FOL Formulas into NL Sentences
A Knowledge-based System for Translating FOL Formulas into NL Sentences Aikaterini Mpagouli, Ioannis Hatzilygeroudis University of Patras, School of Engineering Department of Computer Engineering & Informatics,
The Michigan State University - Certificate of English Language Proficiency (MSU- CELP)
The Michigan State University - Certificate of English Language Proficiency (MSU- CELP) The Certificate of English Language Proficiency Examination from Michigan State University is a four-section test
An English-to-Arabic Prototype Machine Translator for Statistical Sentences
Intelligent Information Management, 2012, 4, 13-22 http://dx.doi.org/10.4236/iim.2012.41003 Published Online January 2012 (http://www.scirp.org/journal/iim) An English-to-Arabic Prototype Machine Translator
NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR 1 Gauri Rao, 2 Chanchal Agarwal, 3 Snehal Chaudhry, 4 Nikita Kulkarni,, 5 Dr. S.H. Patil 1 Lecturer department o f Computer Engineering BVUCOE,
Difficulties that Arab Students Face in Learning English and the Importance of the Writing Skill Acquisition Key Words:
Difficulties that Arab Students Face in Learning English and the Importance of the Writing Skill Acquisition Key Words: Lexical field academic proficiency syntactic repertoire context lexical categories
Visionet IT Modernization Empowering Change
Visionet IT Modernization A Visionet Systems White Paper September 2009 Visionet Systems Inc. 3 Cedar Brook Dr. Cranbury, NJ 08512 Tel: 609 360-0501 Table of Contents 1 Executive Summary... 4 2 Introduction...
English Appendix 2: Vocabulary, grammar and punctuation
English Appendix 2: Vocabulary, grammar and punctuation The grammar of our first language is learnt naturally and implicitly through interactions with other speakers and from reading. Explicit knowledge
Understanding Data Flow Diagrams Donald S. Le Vie, Jr.
Understanding Flow Diagrams Donald S. Le Vie, Jr. flow diagrams (DFDs) reveal relationships among and between the various components in a program or system. DFDs are an important technique for modeling
REALIZATION SORTING ALGORITHM USING PARALLEL TECHNOLOGIES bachelor, Mikhelev Vladimir candidate of Science, prof., Sinyuk Vasily
2. В. Гергель: Современные языки и технологии параллельного программирования. М., Изд-во МГУ, 2012. 3. Синюк, В. Г. Алгоритмы и структуры данных. Белгород: Изд-во БГТУ им. В. Г. Шухова, 2013. REALIZATION
The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge
The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge White Paper October 2002 I. Translation and Localization New Challenges Businesses are beginning to encounter
The Specific Text Analysis Tasks at the Beginning of MDA Life Cycle
SCIENTIFIC PAPERS, UNIVERSITY OF LATVIA, 2010. Vol. 757 COMPUTER SCIENCE AND INFORMATION TECHNOLOGIES 11 22 P. The Specific Text Analysis Tasks at the Beginning of MDA Life Cycle Armands Šlihte Faculty
Using Database Metadata and its Semantics to Generate Automatic and Dynamic Web Entry Forms
Using Database Metadata and its Semantics to Generate Automatic and Dynamic Web Entry Forms Mohammed M. Elsheh and Mick J. Ridley Abstract Automatic and dynamic generation of Web applications is the future
Database Design For Corpus Storage: The ET10-63 Data Model
January 1993 Database Design For Corpus Storage: The ET10-63 Data Model Tony McEnery & Béatrice Daille I. General Presentation Within the ET10-63 project, a French-English bilingual corpus of about 2 million
Discourse Markers in English Writing
Discourse Markers in English Writing Li FENG Abstract Many devices, such as reference, substitution, ellipsis, and discourse marker, contribute to a discourse s cohesion and coherence. This paper focuses
MORPHOLOGY BASED PROTOTYPE STATISTICAL MACHINE TRANSLATION SYSTEM FOR ENGLISH TO TAMIL LANGUAGE
MORPHOLOGY BASED PROTOTYPE STATISTICAL MACHINE TRANSLATION SYSTEM FOR ENGLISH TO TAMIL LANGUAGE A Thesis Submitted for the Degree of Doctor of Philosophy in the School of Engineering by ANAND KUMAR M CENTER
