Improving the Accuracy of English-Arabic Statistical Sentence Alignment
|
|
|
- Delilah Wade
- 9 years ago
- Views:
Transcription
1 The International Arab Journal of Information Technology, Vol. 8, No. 2, April Improving the Accuracy of English-Arabic Statistical Sentence Alignment Mohammad Salameh 1, Rached Zantout 2, and Nashat Mansour 1 1 Department of Computer Science and Mathematics, Lebanese American University, Lebanon 2 College of Computer and Information Sciences, Prince Sultan University, Saudi Arabia Abstract: Multilingual natural language processing systems are increasingly relying on parallel corpus to ameliorate their output. Parallel corpora constitute the basic block for training a statistical natural language processing system and creating translation and language models. Several systems have been devised that automatically align words of a pair of sentences, each in a language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system works poorly given raw unaligned sentence English-Arabic sentence pairs. This prompted the development of a preprocessing step to be applied to the Arabic sentences. The same corpus was then preprocessed and a significant improvement is reported when alignment is attempted using the preprocessed unaligned sentences. Keywords: Word alignment, sentence alignment, parallel corpora, and statistical natural language processing. Received January 15, 2009; accepted August 3, Introduction Computing research and applications for the Arabic language has recently increased. This is due mainly to the increase in the number of Arab internet users who do not master other languages [7]. Another reason for the interest in Arabic is non-arab (usually European and American) interest in Arab countries. Arabic is more difficult to treat compared to European languages mainly because of its rich morphology. There are many difficulties for machine treatment of Arabic text compared to treating English text [5]. First, a single word in Arabic can have many variations depending on the morphological variations that it can undertake. Like English, Arabic adds prefixes and suffixes to a word to form other variants. However, unlike English, Arabic can also add infixes to words. This makes algorithms for English morphological analysis not applicable to Arabic. Second, the prefixes and suffixes of a word may not always be attached to the other letters in the same word. Third, in Arabic a whole English sentence can be represented using one single word (e.g., "Should we then force it upon you" is translated to.("أفنلزمكموها" Fourth, unlike English, the prepositions and pronouns are not separate words. Prepositions and pronouns in Arabic are usually attached to the word (for example, the 2-word English "آتاب ه " word phrase "his book" is translated to a single in Arabic). Fifth, a single English word can have a multiword Arabic translation (e.g., بال سلاح م زود غي ر is translated to unarmed ). Sixth, spaces, in Arabic, might not separate two words from each other. For example, the conjunction و is usually written without a space between it and the next word. Seventh, the word order in an Arabic sentence is usually different from that in the English sentence. In fact, the word order might change in different Arabic translations of the same English sentence. For example, the sentence "The ذهب الولد الى " into boy went to school" can be translated الولد ذهب الى " to " or even الى المدرسة ذهب الولد" or "المدرسة Eighth, Arabic sentences might not contain."المدرس ة any verbs. Recently, Natural Language Processing (NLP) systems in general and Machine Translation (MT) systems in particular have improved their output quality by resorting to statistical techniques. A prerequisite for the use of any statistical technique is the availability of appropriate data. For example, Statistical Machine Translation (SMT) systems require the existence of a Language Model (LM) and a Translation Model (TM). Such models are usually built by automatic processing of multilingual corpora. Multilingual corpora are a large number of texts translated to all languages supported by the corpus. Before such corpora can be used, they have to be aligned so that a word in the text in one language is linked to the corresponding word in the text of the other language. Several systems exist that can do the alignment automatically by processing a large unaligned corpus. One such system is the GIZA++ system [13] that has been proven to work well for corpora of parallel European languages texts. In this paper, we review related literature in section 2. The results of trying to use GIZA++ directly on an English-Arabic corpus are reported in Section 3. These results show that GIZA++ does not work well when
2 172 The International Arab Journal of Information Technology, Vol. 8, No. 2, April 2011 raw Arabic text is used. The complexity of the Arabic language is proven to be an obstacle for statistical automatic sentence aligners like GIZA++. In Section 4, a solution is detailed that improves the performance of GIZA++. In this solution, the Arabic and English texts are preprocessed before being handled by GIZA++. Guidelines for preprocessing the Arabic part of the multilingual corpus are presented and proven to ameliorate the automatic alignment process. In Section 5, the paper is concluded with a summary of the achievements and suggestions for future research. 2. Better Alignment in Literature Alignment of multilingual corpora is usually divided up into sentence alignment and word alignment. Sentence alignment identifies correspondences between sentences in one language and sentences in another language. The basic units for segmenting parallel texts in sentence alignment are paragraphs and sentences. It is usually assumed that both parallel texts have the same number of paragraphs. Sentences inside paragraphs do not necessarily have a one-to-one alignment in a multilingual corpus. A sentence in a source language might be translated into 2 or more sentences in the target language. Similarly 2 or more source language sentences might be translated into a single target language sentences. Several clues in a text can aid in aligning parallel texts such as titles (including chapter, section, table and figure titles), cells inside a table, and items of an enumeration (items usually listed each on a separate line). Sentence alignment techniques vary from length based models to lexical based models or a combination. Length based techniques mainly rely on comparing the length of the sentences and therefore are considered knowledge poor [1]. Such approaches achieve high accuracy for many European languages [3]. Length based approaches that use the numbers of Characters in a sentence were proven to work better than those that use the number of words [2, 3]. Lexical based methods are considered knowledge-rich [10]. Anchor words are chosen from the pair and checked whether they correspond to each other. One of the methods relies on calculating the probability of word pairs by checking the frequency of words in the source sentence and trying to find how many words in the target sentence correspond to their translations. Another method is to find chains of corresponding vectors after mapping the text into a two dimensional space [10]. Combinations of length-based and lexical-based sentence alignment techniques exist [15]. String similarity measures and machine readable dictionaries can be used to enhance the sentence alignment. Word frequencies and occurrences are checked to achieve good results. K-vec is one of the approaches where sentence pairs are split into equal segments and parallel segments are checked if they have words with similar meanings [9]. Word alignment is the process of linking corresponding words and phrases in a parallel text. The aim is to extract the maximum number of corresponding sets from the parallel text that can be used in a desired application. Factors that affect word alignment are Structural, lexical, and grammatical differences between the languages; morphological differences between languages; and spelling errors inside the parallel text. Word alignment techniques are divided into two groups: the Association approaches and the Estimation approaches. Association approaches use heuristics to perform the word alignment [12] while Estimation approaches use statistical rules [9]. Preprocessing to ameliorate SMT quality has been proven to be effective for morphologically rich languages such as German [11], Spanish, Catalan, and Serbian [14], and Czech [4]. As far as Arabic is concerned [8] and [6] showed that morphological preprocessing helps only in the case of small corpora.using GIZA++ without preprocessing. 3. Properties of the Corpus Used The corpus used in this paper was collected from the UN documents and resolutions of the past ten years. It contains 2154 English-Arabic sentence pairs whose properties are shown in Table 1. Most of the used terms are related to the political and geographical context of the countries that the resolutions were addressing (Palestine, Lebanon, Syria, Iraq, Afghanistan, and African countries). The documents contain many political terms and named entities (names of places, political figures, months, and chemicals). In addition, the sentences in the documents contain many abbreviations for names of UN projects, offices and other entities that are translated into meaningful complete Arabic phrases in the Arabic version of the texts. Table 1. Propertries of used corpus. Number of Pairs 2154 Maximum Number of Words in an English Sentence 132 Maximum Number of Words in an Arabic Sentence 175 Maximum Ratio of English to Arabic Sentence 1.5 Minimum Ratio of English to Arabic Words Average Ratio of English to Arabic Words For example: "UNMOVIC" in English is translated to " و التفت يش والتحق ق للرص د المتح دة الام م "لجن ة in Arabic. In the corpus, a small number of the English sentences were translated to more than one Arabic sentence. Those sentences were either split if it was proper to do so or ignored. The sentences in the corpus were prepared for GIZA++ by transforming them into English-Arabic parallel sentences in an XML file. The
3 Improving the Accuracy of English-Arabic Statistical Sentence Alignment 173 XML file as shown in Figure 1 consists of <sentence> nodes. Each <sentence> node has 3 elements: <ID> the serial number of the English-Arabic sentence pair, <EN> the English sentence, <Ar> the Arabic translation of the English sentence. <sentence> <ID>45</ID> <En> Welcomes the continued contribution of UNIFIL to operational demining </En> <Ar> </Ar> رحب بالمساهمة المستمرة لقوة الا مم المتحدة المو قتة في لبنان في عمليات إزالة الا لغام </sentence> Figure 1. Example of a sentence node. The XML files were then processed by Giza++. In many sentence pairs, most of the English words were aligned to NULL, 36 out of 58 in the example in Figure % of the English words were aligned to NULL. Only 22.32% of the English words were aligned correctly. Only 10.72% of the Arabic words were aligned to NULL and 36.04% of the Arabic words were aligned correctly with their English counterparts. Going through the words that were not aligned properly many problems were discovered. The first problem was that many English words have more than one Arabic translation. GIZA++ treated each translation as a separate word even if the meaning of the word was still the same. The second problem was that the same Arabic word appeared in different forms inside the corpus. GIZA++ treated those occurrences as separate words and not multiple occurrences of the same word. The third problem was the existence of large (more than 100 words) sentences in both the English and Arabic texts. GIZA++ sets a limit of 100 words per sentence to process it correctly. The fourth problem was that most of the English stopwords (words that are used to divide long sentences into smaller chunks) have more than one translation in Arabic. In some of the cases, some of the English stopwords did not have a corresponding independent word in the Arabic text. Figure 3 shows that the stopword "which" has no corresponding independent word in the Arabic sentence. 4. Ameliorating GIZA++ Output In order to ameliorate GIZA++'s results, preprocessing was done on the Arabic and English texts. The only preprocessing on the English text was done by transforming all letters into lowercase. The first preprocessing step for Arabic text was to remove diacritics from the Arabic text. This was done by filtering all the kashida's, transforming all forms of the,"ا" to one standard form, the ("إ" and "أ") Arabic Alif و separating the Arabic coordinating conjunction from Arabic words by a space, separating the punctuation marks from the words around them and breaking up compound words in English and Arabic sentences into their constituent words. The second preprocessing step was to manually separate the prefixes and suffixes of each Arabic word from its body by spaces. This meant that now the prefix, the suffix and the body of one Arabic word are now three words instead of one. The previous preprocessing steps can be classified as character-level processing and word-level processing and were done in order to increase the cardinality of the Arabic root words in the corpus. The third preprocessing step can be classified as sentence-level which was to reduce the length of long sentences. This was done because GIZA++ produces better results with shorter sentences. Shorter sentences are produced by breaking up long sentences, in both Arabic and English texts, at the stopwords and punctuation marks. # Sentence pair (340) source length 58 target length 47 alignment score : e-73 وقد قررت المحكمة بما لا یدع مجالا للشك أن إسراي يل ملزمة بوضع حد لانتهاآاتها للقانون الدولي وبوقف تشييد الجدار الذي تبنيه في الا رض الفلسطينية المحتلة بما فيها داخل القدس الشرقية وما حولها وبتفكيك الهياآل المقامة في تلك المنطقة وبا لغاء أو إبطال جميع القوانين التشریعية والتنظيمية المتصلة به NULL ({ }) The ({ 1 }) Court ({ 2 }) has ({ }) determined ({ }) beyond ({ }) any ({ }) doubt ({ }) that ({ 9 }) Israel ({ 10 }) is ({ }) under ({ }) obligation ({ }) to ({ }) terminate ({ }) its ({ }) breaches ({ }) of ({ }) international ({ }) law, ({ }) to ({ }) cease ({ }) the ({ }) construction ({ }) of ({ }) the ({ }) wall ({ }) being ({ 20 }) built ({ 21 }) in ({ }) the ({ }) Occupied ({ 23 }) Palestinian ({ 24 }) Territory, ({ 25 }) including ({ }) in ({ }) and ({ }) around ({ 28 }) East ({ }) Jerusalem, ({ }) to ({ }) dismantle ({ }) the ({ }) structure ({ }) therein ({ }) situated ({ }) and ({ }) to ({ }) repeal ({ }) or ({ 40 }) render ({ }) ineffective ({ 41 }) all ({ 42 }) legislative ({ }) and ({ }) regulatory ({ }) acts ({ }) relating ({ }) thereto ({ 47 }) Figure 2. Example of many words aligned to NULL. <sentence> <ID>1901</ID> <En>The experts produced conflicting reports, however, which further compounded the stalemate</en> <Ar> ازمة ال تعميق الى ت ادى ة متناقض تقاریر وا اعد خبراء ال ان غير </Ar> </sentence> Figure 3. Missing stopwords. The prefixes in Arabic words that were identified in this paper were:,"ت","ي","و","س","ف","ب","ل" Prefixes: 1-letter."ا","ن"."بال","آال" Prefixes: 2-letter."ال","لل" Prefixes: 3-letter
4 174 The International Arab Journal of Information Technology, Vol. 8, No. 2, April 2011 The suffixes in Arabic words that were identified in this paper were separated into two level suffixes: 1st level suffixes:."ك ","ن","ا","ت","ه","ي","ة" Suffixes: 1-letter,"ه ا ","ی ا ","ن ا ","ه ن ","آ م ","ت ن " Suffixes: 2-letter."هم","ما","ني","آن","تم" 2nd level suffixes: if the root of the word is a verb "ا" or "ی ن ","ون","ان" then separate the suffix (either or ("وا" from the verb by a space. If the root of the word is a noun then separate the suffix (either ","ون","ان" "ی ن or ("ات" from the noun by a space. As a special case, some nouns end in Alef Al-Tanween, in this case remove Alef Al-Tanween. If the root of the word is a relative adjective or a "ة" noun in their feminine form then separate the from the Adjective by a space. As an example, after performing the split on the Arabic لكل إنسان الحق على قدم المساواة التامة like: words, a sentence مع الا خرین في أن تنظر قضيته أمام محكمة مستقلة نزیه ة نظ را ع ادلا علنيا للفصل في حقوقه والتزاماته وأیة تهمة جناي ية توجه إليه ل آل إنسان ال حق على قدم ال مساواة ال تام ة becomes مع ال ا خر ین ف ي ان تنظ ر ق ضية ه ام ام محكم ة م ستقل ة نزیه ة نظر عادل علني لل فصل في حقوق ه و التزام ات ه و اي ة تهمة جناي ي ة توجه إلى ه In the sentence level processing phase, three elements are added to each <sentence> node namely, <EnLen>, <ArLen>, <Ratio> as shown in Figure 4. The <EnLen> and <ArLen> fields contain the number of words in the English and Arabic sentences respectively. A "word" in this paper is the set of characters that are preceded and followed by a space. <Ratio> is the <EnLen> divided by <ArLen> and is used to decide whether the split position for a long sentence is good. <sentence> <ID>54</ID> <En>It should be noted that, during the second terrorist attack near the Canal Hotel on 22 September, two UNMOVIC local staff were injured</en> ا اص يب ق د ی ن محل ي ال لجن ة ال ی ن موظ ف م ن اثنين ان ملاحظة ال ب جدیر ال من و <Ar> </Ar> ایلول 22 في قناة ال فندق قرب وقع الذي ثاني ال ارهابي ال هجوم ال خلال <EnLen>25</EnLen> <ArLen>36</ArLen> <Ratio> </Ratio> </sentence> Figure 4. Updated sentence node. The choice of the stop words in English and Arabic languages is important. Three criteria were manually used to decide whether an English-Arabic pair is a stopword. First, a stopword should appear frequently in the text. Second, an English stopword should have an Arabic translation that is also an Arabic stopword. Third, splitting English and Arabic sentence pairs at their stopwords should produce sentence pairs that have a high correspondence between their words. After analyzing the commonly used English stopwords in the literature, several conclusions were attained. First, each English stop word has more than one translation in Arabic. For Example: "in order" is ". ذل ك ل " and "بغي ة " ", حت ى ","من اجل","آي translated to Second, many English stop words have the same translation in Arabic. For example, "within" and "during" are both translated to "."خ لال Third, the same Arabic stopword (especially prepositions) can be the translation of many English stopwords. Fourth, some Arabic stopwords may not have any corresponding words in the English translation. Fifth, it is always easier to search for the stopwords in the English sentence first and then search for the corresponding stopword in the Arabic sentence. Sixth, possessive pronouns always precede the noun in English while they appear as suffixes that are attached to the noun in Arabic. The preceding conclusions led to the exclusion of many English stopwords. Among the excluded stopwords are demonstrative pronouns (such as 'this', 'that', 'these', 'those'), indefinite Pronouns (such as 'anything', 'anybody', 'anyone', 'something', 'somebody', 'someone', 'nothing', 'nobody', 'none', 'no one'), possessive pronouns (such as 'mine', 'yours', 'his', 'hers', 'ours', 'theirs'), simple pronouns (such as 'I', 'you', 'he', 'she', 'it', 'we', 'they', 'me', 'him', 'her', 'us', 'them'), reflexive pronouns (such as 'self', 'myself', 'yourself', 'himself', 'herself', 'itself', 'oneself', 'ourselves', 'yourselves', 'themselves'), verb forms (such as 'am', 'are', 'is', 'was', 'were', 'be', 'being', 'been', 'has', 'have', 'had', 'having', 'can', 'could', 'want', 'wants', 'wanted', 'shall', 'should', 'will', 'would', 'may', 'might', 'must', 'ought', 'do', 'does', 'doing', 'did', 'done', 'make', 'makes', 'making'), coordinating conjunctions (such as 'and', 'but', 'or'), prepositions with very high frequency (such as 'under, 'from', 'in', 'by', 'for', 'upon', 'with', 'on' 'of', 'within', 'to' and at which can be translated to the 'ب','عل ى ','ال ى ','ف ي ','م ن ' prepositions: following arabic.('ل' and The stopwords that were used are the subordinating conjunctions shown in Table 2. Those stopwords achieve a fair split in English and Arabic sentences. A split is considered successful if the ratio of the positions of the English stopword and its Arabic translation (Arabic stopword) is close to the ratio of word counts in both sentences (within a certain threshold value of 0.21 which was determined experimentally as a good value). Aside from using stopwords, punctuation marks (especially the comma) were also used to breakup long sentences. The split according to commas uses a threshold value of 0.12 (that also was determined experimentally).
5 Improving the Accuracy of English-Arabic Statistical Sentence Alignment 175 Particularly Table 2. Stopwords used in the paper. بما في including لا سيما خاصة Between بين in order آي من اجل حتى بغية ذلك ل مما الذي التي which قبل Before Therefore Though If Because Whenever Within بالرغم على الرغم من although لذلك اذن حين فيما when رغم ان و ان في حين بينما في ما لا یزال while حتى لو اذا ما لم unless نتيجة نظر لان لكن but حيثما آلما متى خلال اثناء during خلال داخل تحت under عبر Across According When Especially Rather قبل سابق prior طبق وفق على النحو وفق عمل accordance عندما عند بعدم بيد لا بل اذ but also لا سيما خاصة بدل Table 3 shows the results of successful splits. We can conclude from Table 3 that all English sentences have been transformed into ones that have less than 100 words. Table 4 shows that unsuccessful splits happened only with English sentences that have less than 100 words. This means that all English sentences that have more than 100 words have been split successfully. Table 3. Successful splits. Statistics about Sentences after Successful Split Maximum Number of Words in an English Sentence 91 Maximum Number of Words in an Arabic Sentence 147 Maximum Ratio of English to Arabic Words 2.5 Minimum Ratio of English to Arabic Words 0.26 Average Ratio of English to Arabic Words The average ratio of English to Arabic words did not change drastically before and after the splits which means that the structure of the corpus was not changed by splitting. Around 13% of commas in English sentences either disappeared in Arabic translations or were replaced by conjunctions. After splitting, the 2154 pairs of English-Arabic sentences became GIZA++ was able to align 22.32% of the English words when the original text (without preprocessing) was used. With preprocessing, this percentage rose to about 45.72%. Before preprocessing 44.7% of the English words were not aligned to any Arabic words. This percentage dropped down to 20.9% after reprocessing. The words that were not correctly aligned even after preprocessing were the words that had a low frequency in our corpus. Table 5 gives an example of a sentence and its alignment before and after preprocessing. Before preprocessing, 11 English words were aligned correctly while after preprocessing, 23 English words were aligned correctly. Table 4. Unsucsessful splits. Statistics about Sentences that were not Split Maximum Number of Words in an English Sentence 92 Maximum Number of Words in an Arabic Sentence 155 Maximum Ratio of English to Arabic Words 1.5 Minimum Ratio of English to Arabic Words 0.29 Average Ratio of English to Arabic Words Conclusions and Future Work In this paper, the aim was to prepare the English- Arabic corpus for correct processing by the alignment software. This was done by increasing the frequency of Arabic words through character and word level processing. Another step was to decrease the length of English-Arabic sentence pairs by splitting them into smaller phrases using stopwords and commas. These preprocessing steps resulted in a 100% improvement in alignment and number of unaligned unaligned words. Many aspects of the work presented in this paper can be explored further. First, an automatic way for choosing the best stopwords from English and Arabic text should be devised. It is our belief that the stopwords will be function of the type of text and therefore stopwords used in literary text might be different from stopwords used in newspaper articles. Second, new ways for reducing the sizes of sentences should be developed. Currently, we rely on stopwords and commas. It remains to be seen whether including all punctuation marks and morphological information might ameliorate the splitting of long sentences. Third, Arabic particles and prepositions around the words should be used to improve the alignment rather than considering those particles and prepositions as independent entities. Fourth, dictionary entries and/or morphological information should be used to check the alignment produced by GIZA++ or to suggest alignment to GIZA++. A word pair that was already aligned in previous runs should be used to better the alignment of future runs. Fifth, automating the preprocessing steps described in this paper will enable the use of larger corpora that are commensurate with the corpora used for European languages.
6 176 The International Arab Journal of Information Technology, Vol. 8, No. 2, April 2011 Table 5. Improvements due to preprocessing. Without Preprocessing 1 # Sentence pair (340) source length 58 target length 47 alignment score : e-73 وقد قررت المحكمة بما لا یدع مجالا للشك أن إسراي يل ملزمة بوضع حد لانتهاآاتها للقانون الدولي وبوقف تشييد الجدار الذي تبنيه في الا رض الفلسطينية المحتلة بما فيها داخل القدس الشرقية وما حولها وبتفكيك الهياآل المقامة في تلك المنطقة وبا لغاء أو إبطال جميع القوانين التشریعية والتنظيمية المتصلة به NULL ({ }) The ({ 1 }) Court ({ 2 }) has ({ }) determined ({ }) beyond ({ }) any ({ }) doubt ({ }) that ({ 9 }) Israel ({ 10 }) is ({ }) under ({ }) obligation ({ }) to ({ }) terminate ({ }) its ({ }) breaches ({ }) of ({ }) international ({ }) law, ({ }) to ({ }) cease ({ }) the ({ }) construction ({ }) of ({ }) the ({ }) wall ({ }) being ({ 20 }) built ({ 21 }) in ({ }) the ({ }) Occupied ({ 23 }) Palestinian ({ 24 }) Territory, ({ 25 }) including ({ }) in ({ }) and ({ }) around ({ 28 }) East ({ }) Jerusalem, ({ }) to ({ }) dismantle ({ }) the ({ }) structure ({ }) therein ({ }) situated ({ }) and ({ }) to ({ }) repeal ({ }) or ({ 40 }) render ({ }) ineffective ({ 41 }) all ({ 42 }) legislative ({ }) and ({ }) regulatory ({ }) acts ({ }) relating ({ }) thereto ({ 47 }) # Sentence pair (746) source length 20 target length 27 alignment score : e-42 قد قرر ت ال محكمة بما لا یدع مجالا لل شك ان اسراي يل ملزم ة بوضع حد ل انتهاك ات ها لل قانون ال دولي NULL ({ }) the ({ }) court ({ 3 5 }) has ({ 1 }) determined ({ 2 }) beyond ({ 6 }) any ({ }) doubt ({ }) that ({ 13 }) israel ({ 14 }) is ({ }) under ({ }) obligation ({ 15 }) to ({ }) terminate ({ }) its ({ }) breaches ({ }) of ({ }) international ({ 26 }) law ({ 24 }), ({ 27 }) With Preprocessing # Sentence pair (749) source length 6 target length 12 alignment score : e-18 NULL ({ }) in ({ }) and ({ }) around ({ }) east ({ }) jerusalem ({ 4 6 }), ({ 12 }) ها داخل ال قدس ال شرقي ة و ما حول ها # Sentence pair (747) source length 15 target length 18 alignment score : e-23 و ب وقف تشييد ال جدار الذي تبنيه في ال ارض ال فلسطيني ة ال محتل ة NULL ({ }) to ({ }) cease ({ 3 }) the ({ }) construction ({ 4 6 }) of ({ }) the ({ }) wall ({ }) being ({ }) built ({ 7 8 }) in ({ 9 }) the ({ }) occupied ({ }) palestinian ({ }) territory ({ }), ({ 18 }) # Sentence pair (750) source length 19 target length 30 alignment score : e-38 و بتفكيك ال هياآل ال مقامة في تلك ال منطقة و ب الغاء او ابطال جميع ال قوانين ال تشریعي ة و التنظيمي ة ال متصل ة ب ه NULL ({ }) to ({ }) dismantle ({ }) the ({ }) structure ({ }) therein ({ }) situated ({ }) and ({ 12 }) to ({ }) repeal ({ }) or ({ 15 }) render ({ }) ineffective ({ }) all ({ 17 }) legislative ({ }) and ({ 23 }) regulatory ({ }) acts ({ }) relating ({ }) thereto ({ }) References [1] Aswani N. and Gaizaukas R., A Hybrid Approach to Align Sentence and Words in English Hindi Parallel Corpora, in Proceedings of the ACL Workshop on Building and Using Parallel Texts, pp , [2] Brown P. and Lai J., Aligning Sentences in Parallel Corpora, in the Proceedings of 29 th Annual Meeting for ACL, pp , [3] Gal W. and Church K., A Program for Aligning Sentences in Bilingual Corpus, in the Proceedings of 29 th Annual Meeting of the ACL, pp , [4] Goldwater S. and McClosky D., Improving Statistical MT through Morphological Analysis, in Proceedings of the Conference on Empirical Methods in Natural Language Processing EMNLP, pp , [5] Guessoum A. and Zantout R., Arabic Morphological Generation and its Impact on the Quality of Machine Translation to Arabic, Antal Vanden Bosch and Guenter Neumann, [6] Habash N. and Sadat F. Arabic Preprocessing Schemes for Statistical Machine Translation, in Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL, pp , [7] Hamandi L., Damaj I., Zantout R., and Guessoum A., Parallelizing Arabic Morphological Analysis: Towards Faster Arabic Natural Language Processing Systems, in Proceedings of CIBITIC, Lebanon, pp , [8] Lee Y., Morphological Analysis for Statistical Machine Translation, in Proceedings of of the North American Chapter of ACL, pp , [9] Melamed D., A Geometric Approach to Mapping Bitext Correspondence, Conference of Empirical Methods in NLP, Philadelphia, pp. 1-12, [10] Melamed D., A Portable Algorithm for Mapping Bitext Correspondence, in Proceedings of 35 th Conference of Association for Computing Linguistics, Spain, pp , [11] Nießen S. and Ney H., Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information, Computer Journal of Computational Linguistics, vol. 30, no. 2, pp , [12] Osh F. and Ney H., A Systematic Comparison of Various Statistical Alignment Models, Computer Journal of Computational Linguistics, vol. 29, no. 2, pp , [13] Osh J., GIZA++: Training of Statistical translation models, Last Visited 2008.
7 Improving the Accuracy of English-Arabic Statistical Sentence Alignment 177 [14] Popovi c M. and Ney H. Towards the Use of Word Stems and Suffixes for Statistical Machine Translation, in Proceedings of the Conference on Language Resources and Evaluation, pp , [15] Simard M. and Foster G., Using Cognates to Align Sentences in Bilingual Corpora, in the Proceedings of 4 th International Conference on Theoretical and Methodological Issues in Machine Translation, Canada, pp , Mohamad Salameh received his MS and BS degrees in computer science from the Lebanese American University. He is currently working at exceed it services as a consultant for Microsoft share point server 2007 and project server His research interests are in natural language processing for Arabic language and computer graphics. Nashat Mansour is a professor of computer science at the Lebanese American University. He received BE and M Eng Sc degrees in electrical engineering from the University of New South Wales, and MS in computer engineering and PhD in computer science from Syracuse University. His research interests include software testing, application of evolutionary algorithms to real-world problems, protein structure prediction, and Arabicrelated computing. Rached Zantout received his BE from the American University of Beirut, Lebanon in 1988, his MSc from the University of Florida in 1990, and PhD from the Ohio State University in 1994, all degrees being in electrical engineering. He was a research associate and teaching associate for most of his graduate studies. Directly after finishing his PhD, he joined Scriptel Corporation and worked on several R&D projects to develop a new generation of graphic input devices. Between 1995 and 2000, he was an assistant professor at King Saud University in Riyadh (Saudi Arabia). Then he moved to Lebanon and taught at reputed Lebanese universities like the University of Balamand, American University of Beirut, Lebanese American University, Beirut Arab University, Hariri Canadian University. He is currently an associate professor at the Prince Sultan University. His research interests cover robotics, artificial intelligence, and natural language processing. He currently works on developing components for machine translation and natural language processing with a special focus on tools related to the Arabic Language. He also has active research in the area of autonomous robot navigation, computer vision and digital image processing.
8 The International Arab Journal of Information Technology, Vol. 8, No. 1, January
9 The International Arab Journal of Information Technology, Vol. 8, No. 1, January
Overview. Map of the Arabic Languagge. Some tips. Need to over-learn to reach a stage of unconscious competence. Recap Parts of Speech
Map of the Arabic Languagge Recap Parts of Speech Overview in Understanding the Qur an Start-up Initial growth Rapid growth Continuous growth Some tips Don t take furious notes Videos will be downloadable
ﺮﺋﺎ ﻤﱠﻀﻟا The Arabic Pronouns
ر الض م اي ر The Arabic Pronouns (The Arabic Pronouns) Designed and compiled by the one in need of Allah s pardon Aboo Imraan Abdus-Saboor bin Tomas Maldonado al-mekseekee -may Allah forgive him, his family,
THERE ARE SEVERAL KINDS OF PRONOUNS:
PRONOUNS WHAT IS A PRONOUN? A Pronoun is a word used in place of a noun or of more than one noun. Example: The high school graduate accepted the diploma proudly. She had worked hard for it. The pronoun
TEACHING ARABIC AS A SECOND LANGUAGE
بسم االله الرحمن الرحيم TEACHING ARABIC AS A SECOND LANGUAGE و م ن ا ي ات ه خ ل ق ال سم او ات و ال ا ر ض و اخ ت ل اف أ ل س ن ت ك م و أ ل و ان ك م إ ن ف ي ذ ل ك ل ا ي ات لل ع ال م ين 22 الروم 30 And among
Prayer Book for. Children &
Prayer Book for Muslim Children & New Muslims s (Prayer Book for Muslim Children & New Muslims) Designed and compiled by the one in need of Allah s pardon Aboo Imraan Abdus-Saboor bin Tomas Maldonado al-mekseekee
Why Won t you Think?
Why Won t you Think? www.detailedquran.com إ ن ش ر الد و اب ع ند الل ه الص م ال ب ك م ال ذ ين ل ي ع ق ل ون Truly, the worst of all creatures in the sight of Allah are the deaf, the dumb, those who do not
SIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISON
SIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISON Essam S. Hanandeh, Department of Computer Information System, Zarqa University, Zarqa, Jordan [email protected] ABSTRACT The massive
Writing Common Core KEY WORDS
Writing Common Core KEY WORDS An educator's guide to words frequently used in the Common Core State Standards, organized by grade level in order to show the progression of writing Common Core vocabulary
Islamic Studies. Student Workbook. Level 1. Name. 2 Dudley Street Cheetham Hill Manchester, M89DA. manchestersalafischool.
Islamic Studies Level 1 Student Workbook Name 2 Dudley Street Cheetham Hill Manchester, M89DA manchestersalafischool.com 0161 317 1481 Islamic Studies Level 1 Student Workbook Overview Coming to the Masjid
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
Word Completion and Prediction in Hebrew
Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology
English to Arabic Transliteration for Information Retrieval: A Statistical Approach
English to Arabic Transliteration for Information Retrieval: A Statistical Approach Nasreen AbdulJaleel and Leah S. Larkey Center for Intelligent Information Retrieval Computer Science, University of Massachusetts
ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH
Journal of Computer Science 9 (7): 922-927, 2013 ISSN: 1549-3636 2013 doi:10.3844/jcssp.2013.922.927 Published Online 9 (7) 2013 (http://www.thescipub.com/jcs.toc) ARABIC PERSON NAMES RECOGNITION BY USING
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Determine two or more main ideas of a text and use details from the text to support the answer
Strand: Reading Nonfiction Topic (INCCR): Main Idea 5.RN.2.2 In addition to, in-depth inferences and applications that go beyond 3.5 In addition to score performance, in-depth inferences and applications
. / # 0 %! 1 2 3 ( 3 3 1 3 4 5 # ( )
! # %!& ( )! +,! &. / # 0 %! 1 2 3 ( 3 3 1 3 4 5 # ( ) # 6. 17 ( ( 3 3 1 3 4 10 ( ) 7 5 # ( ) # 6.! 8 7 5 9 A Proposed Model for Quranic Arabic WordNet Manal AlMaayah 1, Majdi Sawalha 2, Mohammad A. M.
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1
Author Gender Identification of English Novels
Author Gender Identification of English Novels Joseph Baena and Catherine Chen December 13, 2013 1 Introduction Machine learning algorithms have long been used in studies of authorship, particularly in
Topics in Computational Linguistics. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
Topics in Computational Linguistics Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment Regina Barzilay and Lillian Lee Presented By: Mohammad Saif Department of Computer
Data Pre-Processing in Spam Detection
IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 11 May 2015 ISSN (online): 2349-784X Data Pre-Processing in Spam Detection Anjali Sharma Dr. Manisha Manisha Dr. Rekha Jain
Simple maths for keywords
Simple maths for keywords Adam Kilgarriff Lexical Computing Ltd [email protected] Abstract We present a simple method for identifying keywords of one corpus vs. another. There is no one-sizefits-all
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Makoto Nakamura, Yasuhiro Ogawa, Katsuhiko Toyama Japan Legal Information Institute, Graduate
Interpreting areading Scaled Scores for Instruction
Interpreting areading Scaled Scores for Instruction Individual scaled scores do not have natural meaning associated to them. The descriptions below provide information for how each scaled score range should
Recite with the Heart
Recite with the Heart We have now completed al-fatihah and move on to the Quran recitation after it. Did you ever notice that any Quran recited in Salah is always recited when we are in the standing position?
Understanding Clauses and How to Connect Them to Avoid Fragments, Comma Splices, and Fused Sentences A Grammar Help Handout by Abbie Potter Henry
Independent Clauses An independent clause (IC) contains at least one subject and one verb and can stand by itself as a simple sentence. Here are examples of independent clauses. Because these sentences
Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER
INTRODUCTION TO SAS TEXT MINER TODAY S AGENDA INTRODUCTION TO SAS TEXT MINER Define data mining Overview of SAS Enterprise Miner Describe text analytics and define text data mining Text Mining Process
Minnesota K-12 Academic Standards in Language Arts Curriculum and Assessment Alignment Form Rewards Intermediate Grades 4-6
Minnesota K-12 Academic Standards in Language Arts Curriculum and Assessment Alignment Form Rewards Intermediate Grades 4-6 4 I. READING AND LITERATURE A. Word Recognition, Analysis, and Fluency The student
Discovering suffixes: A Case Study for Marathi Language
Discovering suffixes: A Case Study for Marathi Language Mudassar M. Majgaonker Comviva Technologies Limited Gurgaon, India Abstract Suffix stripping is a pre-processing step required in a number of natural
Turker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
Grammar Unit: Pronouns
Name: Miss Phillips Period: Grammar Unit: Pronouns Unit Objectives: 1. Students will identify personal, indefinite, and possessive pronouns and recognize antecedents of pronouns. 2. Students will demonstrate
OPTIMIZING CONTENT FOR TRANSLATION ACROLINX AND VISTATEC
OPTIMIZING CONTENT FOR TRANSLATION ACROLINX AND VISTATEC We ll look at these questions. Why does translation cost so much? Why is it hard to keep content consistent? Why is it hard for an organization
How To Write A Dissertation
FORMAT GUIDELINES FOR DOCTORAL DISSERTATIONS Northwestern University The Graduate School Last revised 1/23/2015 Formatting questions not addressed in this document should be directed to Student Services,
ESL 005 Advanced Grammar and Paragraph Writing
ESL 005 Advanced Grammar and Paragraph Writing Professor, Julie Craven M/Th: 7:30-11:15 Phone: (760) 355-5750 Units 5 Email: [email protected] Code: 30023 Office: 2786 Room: 201 Course Description:
24. PARAPHRASING COMPLEX STATEMENTS
Chapter 4: Translations in Sentential Logic 125 24. PARAPHRASING COMPLEX STATEMENTS As noted earlier, compound statements may be built up from statements which are themselves compound statements. There
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
Paraphrasing controlled English texts
Paraphrasing controlled English texts Kaarel Kaljurand Institute of Computational Linguistics, University of Zurich [email protected] Abstract. We discuss paraphrasing controlled English texts, by defining
Automated Multilingual Text Analysis in the Europe Media Monitor (EMM) Ralf Steinberger. European Commission Joint Research Centre (JRC)
Automated Multilingual Text Analysis in the Europe Media Monitor (EMM) Ralf Steinberger European Commission Joint Research Centre (JRC) https://ec.europa.eu/jrc/en/research-topic/internet-surveillance-systems
The Development of Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization (STREDEO PROJECT)
The Development of Multimedia-Multilingual Storage, Retrieval and Delivery for E-Organization (STREDEO PROJECT) Asanee Kawtrakul, Kajornsak Julavittayanukool, Mukda Suktarachan, Patcharee Varasrai, Nathavit
Pronouns. Their different types and roles. Devised by Jo Killmister, Skills Enhancement Program, Newcastle Business School
Pronouns Their different types and roles Definition and role of pronouns Definition of a pronoun: a pronoun is a word that replaces a noun or noun phrase. If we only used nouns to refer to people, animals
Arabic Grammar Rules for Madeenah Book One
ب س م ا ه لل الر ح من الر ح ي م Arabic Grammar Rules for Madeenah Book One The three vowel markings ال ح ر ك ات الث الث ة kasrah - - ض م ة mmmhddh ف ت ح ة - fathah ك س ر ة (i) (u) (a) Sukoon س ك و ن -
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,
TERMS. Parts of Speech
TERMS Parts of Speech Noun: a word that names a person, place, thing, quality, or idea (examples: Maggie, Alabama, clarinet, satisfaction, socialism). Pronoun: a word used in place of a noun (examples:
An Information Retrieval using weighted Index Terms in Natural Language document collections
Internet and Information Technology in Modern Organizations: Challenges & Answers 635 An Information Retrieval using weighted Index Terms in Natural Language document collections Ahmed A. A. Radwan, Minia
Comparative Analysis on the Armenian and Korean Languages
Comparative Analysis on the Armenian and Korean Languages Syuzanna Mejlumyan Yerevan State Linguistic University Abstract It has been five years since the Korean language has been taught at Yerevan State
PUSD High Frequency Word List
PUSD High Frequency Word List For Reading and Spelling Grades K-5 High Frequency or instant words are important because: 1. You can t read a sentence or a paragraph without knowing at least the most common.
Proofreading and Editing:
Proofreading and Editing: How to proofread and edit your way to a perfect paper What is Proofreading? The final step in the revision process The focus is on surface errors: You are looking for errors in
Strand: Reading Literature Topics Standard I can statements Vocabulary Key Ideas and Details
Strand: Reading Literature Key Ideas and Details Craft and Structure RL.3.1 Ask and answer questions to demonstrate understanding of a text, referring explicitly to the text as the basis for the answers.
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
Index. 344 Grammar and Language Workbook, Grade 8
Index Index 343 Index A A, an (usage), 8, 123 A, an, the (articles), 8, 123 diagraming, 205 Abbreviations, correct use of, 18 19, 273 Abstract nouns, defined, 4, 63 Accept, except, 12, 227 Action verbs,
stress, intonation and pauses and pronounce English sounds correctly. (b) To speak accurately to the listener(s) about one s thoughts and feelings,
Section 9 Foreign Languages I. OVERALL OBJECTIVE To develop students basic communication abilities such as listening, speaking, reading and writing, deepening their understanding of language and culture
Annotation Guidelines for Dutch-English Word Alignment
Annotation Guidelines for Dutch-English Word Alignment version 1.0 LT3 Technical Report LT3 10-01 Lieve Macken LT3 Language and Translation Technology Team Faculty of Translation Studies University College
Year 1 reading expectations (New Curriculum) Year 1 writing expectations (New Curriculum)
Year 1 reading expectations Year 1 writing expectations Responds speedily with the correct sound to graphemes (letters or groups of letters) for all 40+ phonemes, including, where applicable, alternative
Chapter 5. Phrase-based models. Statistical Machine Translation
Chapter 5 Phrase-based models Statistical Machine Translation Motivation Word-Based Models translate words as atomic units Phrase-Based Models translate phrases as atomic units Advantages: many-to-many
Albert Pye and Ravensmere Schools Grammar Curriculum
Albert Pye and Ravensmere Schools Grammar Curriculum Introduction The aim of our schools own grammar curriculum is to ensure that all relevant grammar content is introduced within the primary years in
Report Writing: Editing the Writing in the Final Draft
Report Writing: Editing the Writing in the Final Draft 1. Organisation within each section of the report Check that you have used signposting to tell the reader how your text is structured At the beginning
Morphology. Morphology is the study of word formation, of the structure of words. 1. some words can be divided into parts which still have meaning
Morphology Morphology is the study of word formation, of the structure of words. Some observations about words and their structure: 1. some words can be divided into parts which still have meaning 2. many
Mobile Robot FastSLAM with Xbox Kinect
Mobile Robot FastSLAM with Xbox Kinect Design Team Taylor Apgar, Sean Suri, Xiangdong Xi Design Advisor Prof. Greg Kowalski Abstract Mapping is an interesting and difficult problem in robotics. In order
Patent Big Data Analysis by R Data Language for Technology Management
, pp. 69-78 http://dx.doi.org/10.14257/ijseia.2016.10.1.08 Patent Big Data Analysis by R Data Language for Technology Management Sunghae Jun * Department of Statistics, Cheongju University, 360-764, Korea
Texas Success Initiative (TSI) Assessment. Interpreting Your Score
Texas Success Initiative (TSI) Assessment Interpreting Your Score 1 Congratulations on taking the TSI Assessment! The TSI Assessment measures your strengths and weaknesses in mathematics and statistics,
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization
Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization Atika Mustafa, Ali Akbar, and Ahmer Sultan National University of Computer and Emerging
An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System
An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System Thiruumeni P G, Anand Kumar M Computational Engineering & Networking, Amrita Vishwa Vidyapeetham, Coimbatore,
Grade 4 Writing Assessment. Eligible Texas Essential Knowledge and Skills
Grade 4 Writing Assessment Eligible Texas Essential Knowledge and Skills STAAR Grade 4 Writing Assessment Reporting Category 1: Composition The student will demonstrate an ability to compose a variety
Third Grade Language Arts Learning Targets - Common Core
Third Grade Language Arts Learning Targets - Common Core Strand Standard Statement Learning Target Reading: 1 I can ask and answer questions, using the text for support, to show my understanding. RL 1-1
Information extraction from texts. Technical and business challenges
Information extraction from texts Technical and business challenges Overview Mentis Text mining field overview Application: Information Extraction Motivation & Overview Page 2 Mentis - Overview Consulting
POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge
The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge White Paper October 2002 I. Translation and Localization New Challenges Businesses are beginning to encounter
COURSE OBJECTIVES SPAN 100/101 ELEMENTARY SPANISH LISTENING. SPEAKING/FUNCTIONAl KNOWLEDGE
SPAN 100/101 ELEMENTARY SPANISH COURSE OBJECTIVES This Spanish course pays equal attention to developing all four language skills (listening, speaking, reading, and writing), with a special emphasis on
Pupil SPAG Card 1. Terminology for pupils. I Can Date Word
Pupil SPAG Card 1 1 I know about regular plural noun endings s or es and what they mean (for example, dog, dogs; wish, wishes) 2 I know the regular endings that can be added to verbs (e.g. helping, helped,
Language Arts Literacy Areas of Focus: Grade 5
Language Arts Literacy : Grade 5 Mission: Learning to read, write, speak, listen, and view critically, strategically and creatively enables students to discover personal and shared meaning throughout their
3rd Grade - ELA Writing
3rd Grade - ELA Text Types and Purposes College & Career Readiness 1. Opinion Write arguments to support claims in an analysis of substantive topics or texts, using valid reasoning and relevant and sufficient
Braille: Deciphering the Code Adapted from American Foundation for the Blind http://www.afb.org/braillebug/braille_deciphering.asp
Braille: Deciphering the Code Adapted from American Foundation for the Blind http://www.afb.org/braillebug/braille_deciphering.asp People often think that Braille is a language. Actually there is a Braille
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
Hybrid Strategies. for better products and shorter time-to-market
Hybrid Strategies for better products and shorter time-to-market Background Manufacturer of language technology software & services Spin-off of the research center of Germany/Heidelberg Founded in 1999,
Domain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
Genetic Algorithm-based Multi-Word Automatic Language Translation
Recent Advances in Intelligent Information Systems ISBN 978-83-60434-59-8, pages 751 760 Genetic Algorithm-based Multi-Word Automatic Language Translation Ali Zogheib IT-Universitetet i Goteborg - Department
PROMT Technologies for Translation and Big Data
PROMT Technologies for Translation and Big Data Overview and Use Cases Julia Epiphantseva PROMT About PROMT EXPIRIENCED Founded in 1991. One of the world leading machine translation provider DIVERSIFIED
KINDGERGARTEN. Listen to a story for a particular reason
KINDGERGARTEN READING FOUNDATIONAL SKILLS Print Concepts Follow words from left to right in a text Follow words from top to bottom in a text Know when to turn the page in a book Show spaces between words
Fast-Champollion: A Fast and Robust Sentence Alignment Algorithm
Fast-Champollion: A Fast and Robust Sentence Alignment Algorithm Peng Li and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for
Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework
Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 [email protected] 2 [email protected] Abstract A vast amount of assorted
Nouns may show possession or ownership. Use an apostrophe with a noun to show something belongs to someone or to something.
Nouns Section 1.4 Possessive Nouns Nouns may show possession or ownership. Use an apostrophe with a noun to show something belongs to someone or to something. Jane s dress is red. The table s legs were
Computer-aided Document Indexing System
Journal of Computing and Information Technology - CIT 13, 2005, 4, 299-305 299 Computer-aided Document Indexing System Mladen Kolar, Igor Vukmirović, Bojana Dalbelo Bašić and Jan Šnajder,, An enormous
Performance Indicators-Language Arts Reading and Writing 3 rd Grade
Learning Standards 1 st Narrative Performance Indicators 2 nd Informational 3 rd Persuasive 4 th Response to Lit Possible Evidence Fluency, Vocabulary, and Comprehension Reads orally with Applies letter-sound
Glossary of literacy terms
Glossary of literacy terms These terms are used in literacy. You can use them as part of your preparation for the literacy professional skills test. You will not be assessed on definitions of terms during
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University [email protected] Kapil Dalwani Computer Science Department
Grade 6: Module 1: Unit 2: Lesson 19 Peer Critique and Pronoun Mini-Lesson: Revising Draft Literary Analysis
Grade 6: Module 1: Unit 2: Lesson 19 Revising Draft Literary Analysis This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Exempt third-party content
Comparing Methods to Identify Defect Reports in a Change Management Database
Comparing Methods to Identify Defect Reports in a Change Management Database Elaine J. Weyuker, Thomas J. Ostrand AT&T Labs - Research 180 Park Avenue Florham Park, NJ 07932 (weyuker,ostrand)@research.att.com
SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
English Appendix 2: Vocabulary, grammar and punctuation
English Appendix 2: Vocabulary, grammar and punctuation The grammar of our first language is learnt naturally and implicitly through interactions with other speakers and from reading. Explicit knowledge
Identifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 [email protected] Christopher D. Manning Department of
Ask your teacher about any which you aren t sure of, especially any differences.
Punctuation in Academic Writing Academic punctuation presentation/ Defining your terms practice Choose one of the things below and work together to describe its form and uses in as much detail as possible,
Learning Translation Rules from Bilingual English Filipino Corpus
Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,
