Identifying Japanese-Chinese Bilingual Synonymous Technical Terms from Patent Families
|
|
- April Boone
- 7 years ago
- Views:
Transcription
1 Identifying Japanese-Chinese Bilingual Synonymous Technical Terms from Patent Families Zi Long Lijuan Dong Takehito Utsuro Tomoharu Mitsuhashi Mikio Yamamoto Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, , Japan Japan Patent Information Organization, 4-1-7, Toyo, Koto-ku, Tokyo, , Japan Abstract In the task of acquiring Japanese-Chinese technical term translation equivalent pairs from parallel patent documents, this paper considers situations where a technical term is observed in many parallel patent sentences and is translated into many translation equivalents and studies the issue of identifying synonymous translation equivalent pairs. First, we collect candidates of synonymous translation equivalent pairs from parallel patent sentences. Then, we apply the Support Vector Machines (SVMs) to the task of identifying bilingual synonymous technical terms, and achieve the performance of over 85% precision and over 60% F-measure. We further examine two types of segmentation of Chinese sentences, i.e., by characters and by morphemes, and integrate those two types of segmentation in the form of the intersection of SVM judgments, which achieved over 90% precision. Keywords: synonymous technical terms, patent families, technical term translation 1. Introduction For both high quality machine and human translation, a large scale and high quality bilingual lexicon is the most important key resource. Since manual compilation of bilingual lexicon requires plenty of time and huge manual labor, in the research area of knowledge acquisition from natural language text, automatic bilingual lexicon compilation have been studied. Techniques invented so far include translation term pair acquisition based on statistical co-occurrence measure from parallel sentences (Matsumoto and Utsuro, 2000), translation term pair acquisition from comparable corpora (Fung and Yee, 1998), compositional translation generation based on an existing bilingual lexicon for human use (Tonoike et al., 2006), and translation term pair acquisition by collecting partially bilingual texts through the search engine (Huang et al., 2005). Among those efforts of acquiring bilingual lexicon from text, Morishita et al. (2008) studied to acquire Japanese- English technical term translation lexicon from phrase tables, which are trained by a phrase-based SMT model with parallel sentences automatically extracted from parallel patent documents. In more recent studies, they require the acquired technical term translation equivalents to be consistent with word alignment in parallel sentences and achieved 91.9% precision with almost 70% recall. Furthermore, based on the achievement above, Liang et al. (2011a) considered situations where a technical term is observed in many parallel patent sentences and is translated into many translation equivalents. More specifically, in the task of acquiring Japanese-English technical term translation equivalent pairs, Liang et al. (2011a) studied the issue of identifying Japanese-English synonymous translation equivalent pairs. First, they collect candidates of synonymous translation equivalent pairs from parallel patent sentences. Then, they apply the Support Vector Machines (SVMs) (Vapnik, 1998) to the task of identifying bilingual synonymous technical terms. Based on the technique and the results of identifying Japanese-English synonymous translation equivalent pairs in Liang et al. (2011a), we aim at identifying Japanese- Chinese synonymous translation equivalent pairs from Japanese-Chinese patent families. We especially examine two types of segmentation of Chinese sentences, namely, by characters and by morphemes. Although both types of segmentation achieved almost similar performance around 95 97% (in recall / precision / f-measure) in the task of acquiring Japanese-Chinese technical term translation pairs, they have different types of errors. Also in the task of identifying Japanese-Chinese synonymous technical terms, both types of segmentation achieved almost similar performance, while they have different types of errors. Thus, we integrate those two types of segmentation in the form of the intersection of SVM judgments, and show that this achieves over 90% precision. 2. Japanese-Chinese Parallel Patent Documents Japanese-Chinese parallel patent documents are collected from the Japanese patent documents published by the Japanese Patent Office (JPO) in and the Chinese patent documents published by State Intellectual Property Office of the People s Republic of China (SIPO) in From them, we extract 312,492 patent families, and the method of Utiyama and Isahara (2007) is applied 1 to the text of those patent families, and Japanese and Chinese sentences are aligned. In this paper, we use 3.6M parallel patent sentences with the highest scores of sentence alignment. 3. Phrase Table of an SMT Model As a toolkit of a phrase-based SMT model, we use Moses (Koehn et al., 2007) and apply it to the whole 3.6M parallel patent sentences. Before applying Moses, Japanese sentences are segmented into a sequence of morphemes by the Japanese morphological analyzer MeCab 2 with the 1 Here, we used a Japanese-Chinese translation lexicon consisting of about 170,000 Chinese head words. 2
2 Figure 1: Developing a Reference Set of Bilingual Synonymous Technical Terms morpheme lexicon IPAdic 3. For Chinese sentences, we examine two types of segmentation, i.e., segmentation by characters 4 and segmentation by morphemes 5. As the result of applying Moses, we have a phrase table in the direction of Japanese to Chinese translation, and another one in the opposite direction of Chinese to Japanese translation. In the direction of Japanese to Chinese translation, we finally obtain 108M (Chinese sentences segmented by morphemes) / 274M (Chinese sentences segmented by characters) translation pairs with 75M / 197M unique Japanese phrases with Japanese to Chinese phrase translation probabilities P (p C p J ) of translating a Japanese phrase p J into a Chinese phrase p C. For each Japanese phrase, those multiple translation candidates in the phrase table are ranked in descending order of Japanese to Chinese phrase translation probabilities. In the similar way, in the phrase table in the opposite direction of Chinese to Japanese translation, for each Chinese phrase, multiple Japanese translation candidates are ranked in descending order of Chinese to Japanese phrase translation probabilities. Those two phrase tables are then referred to when identifying a bilingual technical term pair, given a parallel sen A consecutive sequence of numbers as well as a consecutive sequence of alphabetical characters are segmented into a token. 5 Chinese sentences are segmented into a sequence of morphemes by the Chinese morphological analyzer Stanford Word Segment (Tseng et al., 2005) trained with Chinese Penn Treebank. tence pair S J,S C and a Japanese technical term t J,or a Chinese technical term t C. In the direction of Japanese to Chinese, given a parallel sentence pair S J,S C containing a Japanese technical term t J, Chinese translation candidates collected from the Japanese to Chinese phrase table are matched against the Chinese sentence S C of the parallel sentence pair. Among those found in S C, ˆt C with the largest translation probability P (t C t J ) is selected and the bilingual technical term pair t J, ˆt C is identified. Similarly, in the opposite direction of Chinese to Japanese, given a parallel sentence pair S J,S C containing a Chinese technical term t C, the Chinese to Japanese phrase table is referred to when identifying a bilingual technical term pair. 4. Developing a Reference Set of Bilingual Synonymous Technical Terms When developing a reference set of bilingual synonymous technical terms (detailed procedure to be found in Liang et al. (2011a)), starting from a seed bilingual term pair s JC = s J,s C, we repeat the translation estimation procedure of the previous section six times and generate the set CBP(s J ) of candidates of bilingual synonymous technical term pairs. Figure 1 illustrates the whole procedure. Then, we manually divide the set CBP(s J ) into SBP(s JC ), those of which are synonymous with s JC, and the remaining NSBP(s JC ). As in Table 1, we collect 114 seeds, where the number of bilingual technical terms included in SBP(s JC ) in total for all of the 114 seed bilin-
3 Table 1: Number of Bilingual Technical Terms: Candidates and Reference of Synonyms Candidates of Synonyms s J CBP(s J ) Reference of Synonyms s JC SBP(s JC ) Candidates of Synonyms s J CBP(s J ) Reference of Synonyms s JC SBP(s JC ) (a) With the Phrase Table based on Chinese Sentences Segmented by Characters # of bilingual technical terms for the total 114 seeds in the set (a) in the set (a) 8, ,563 13, ,496 2, (b) With the Phrase Table based on Chinese Sentences Segmented by Morphemes # of bilingual technical terms for the total 114 seeds in the set (b) in the set (b) 14, ,948 14, ,604 2, average per seed average per seed gual technical term pairs is around 2,500 to 2,600, which amounts to around 22 per seed on average. It can be also seen from Table 1 that although about 90% of reference of synonymous technical terms are shared by the two types of segmentation (by characters and by morphemes), only about 40% to 50% of candidates of synonymous technical terms are shared by the two types of segmentation. 5. Identifying Bilingual Synonymous Technical Terms by Machine Learning In this section, we apply the SVMs to the task of identifying bilingual synonymous technical terms. In this paper, we model the task of identifying bilingual synonymous technical terms by the SVMs as that of judging whether or not the input bilingual term pair t J,t C is synonymous with the seed bilingual technical term pair s JC = s J,s C The Procedure First, let CBP be the union of the sets CBP(s J ) of candidates of bilingual synonymous technical term pairs for all of the 114 seed bilingual technical term pairs. In the training and testing of the classifier for identifying bilingual synonymous technical terms, we first divide the set of 114 seed bilingual technical term pairs into 10 subsets. Here, for each i-th subset (i =1,...,10), we construct the union CBP i of the sets CBP(s J ) of candidates of bilingual synonymous technical term pairs, where CBP 1,...,CBP 10 are 10 disjoint subsets 6 of CBP. 6 Here, we divide the set of 114 seed bilingual technical term pairs into 10 subsets so that the numbers of positive (i.e., syn- As a tool for learning SVMs, we use TinySVM ( chasen.org/ taku/software/tinysvm/). As the kernel function, we use the polynomial (1st order) kernel 7. In the testing of a SVMs classifier, we regard the distance from the separating hyperplane to each test instance as a confidence measure, and return test instances satisfying confidence measures over a certain lower bound only as positive samples (i.e., synonymous with the seed). In the training of SVMs, we use 8 subsets out of the whole 10 subsets CBP 1,...,CBP 10. Then, we tune the lower bound of the confidence measure with one of the remaining two subsets. With this subset, we also tune the parameter of TinySVM for trade-off between training error and margin. Finally, we test the trained classifier against another one of the remaining two subsets. We repeat this procedure of training / tuning / testing 10 times, and average the 10 results of test performance Features Table 2 lists all the features used for training and testing of SVMs for identifying bilingual synonymous technical terms. Features are roughly divided into two types: those of the first type f 1,...,f 6 simply represent various characteristics of the input bilingual technical term t J,t C, while those of the second type f 7,...,f 16 represent relation of the input bilingual technical term t J,t C and the onymous with the seed) / negative (i.e., not synonymous with the seed) samples in each CBP i (i = 1,...,10) are comparative among the 10 subsets. 7 We compare the performance of the 1st order and 2nd order kernels, where we have almost comparative performance.
4 Table 2: Features for Identifying Bilingual Synonymous Technical Terms by Machine Learning class features for bilingual technical terms t J,t C features for the relation of bilingual technical terms t J,t C and the seed s J,s C definition feature ( where X denotes J or C, and s J,s C denotes the seed bilingual technical term pair ) f 1 : frequency log of the frequency of t J,t C within the whole parallel patent sentences f 2 : rank of the Chinese term given t J, log of the rank of t C with respect to the descending order of the conditional translation probability P(t C t J ) f 3 : rank of the Japanese term given t C, log of the rank of t J with respect to the descending order of the conditional translation probability P(t J t C ) f 4 : number of Japanese characters number of characters in t J f 5 : number of Chinese charac- number of characters in t C f 6 : ters number of times generating translation by applying the phrase tables the number of times repeating the procedure of generating translation by applying the phrase tables until generating t C or t J from s J, as in s C t J t C,or,s J t C t J f 7 : identity of Japanese terms returns 1 when t J = s J f 8 : identity of Chinese terms returns 1 when t C = s C ED(tX,sX) f 9 : edit distance similarity of f 9 (t X,s X )=1 max( t X, s X ) (where ED is the edit distance monolingual terms of t X and s X, and t denotes the number of characters of t.) bigram(tx ) bigram(sx ) f 10 : character bigram similarity f 10 (t X,s X )= max( t X, s X ) 1 (where bigram(t) is of monolingual terms the set of character bigrams of the term t.) const(tj ) const(sj ) f 11 : rate of identical morphemes f 11 (t J,s J )= max( const(t J ), const(s J ) ) (where const(t) is the (for Japanese terms) set of morphemes in the Japanese term t.) const(t f 12 : rate of identical characters f 11 (t C,s C ) = C) const(s C ) max( const(t C), const(s C) ) (where const(t) is (for Chinese terms) the set of Characters in the Chinese term t.) f 13 : subsumption relation of returns 1 when the difference of t J and s J is only in their suffixes, strings / variants relation of or only whether or not having the prolonged sound, or only in surface forms (for Japanese their hiragana parts. terms ) f 14 : identical stem (for Chinese returns 1 when the difference of t C and s C is only whether or not terms) haing the word which is not the prefix or suffix. f 15 : f 16 : rate of intersection in translation by the phrase table translation by the phrase table f 15 (t X,s X )= ( where trans(t) is the set of translation of term t from the phrase table.) returns 1 when s J can be generated by translating t E with the phrase table, or, s E can be generated by translating t J with the phrase table. trans(tx) trans(sx ) max( trans(t X), trans(s X) ) seed bilingual technical term pair s JC = s J,s C. Among the features of the first type are the frequency (f 1 ), ranks of terms with respect to the conditional translation probabilities (f 2 and f 3 ), length of terms (f 4 and f 5 ), and the number of times repeating the procedure of generating translation with the phrase tables until generating input terms t J and t C from the Japanese seed term s J (f 6 ). Among the features of the second type are identity of monolingual terms (f 7 and f 8 ), edit distance of monolingual terms (f 9 ), character bigram similarity of monolingual terms (f 10 ), rate of identical morphemes (in Japanese, f 11 ) / characters (in Chinese, f 12 ), string subsumption and variants for Japanese (f 13 ), identical stem for Chinese (f 14 ), rate of intersection in translation by the phrase table (f 15 ), and translation by the phrase tables (f 16 ) Evaluation Results Table 3 shows the evaluation results for a baseline as well as for SVMs. As the baseline, we simply judge the input bilingual term pair t J,t C as synonymous with the seed bilingual technical term pair s JC = s J,s C when t J and s J are identical, or, t C and s C are identical. When training / testing a SVMs classifier, we tune the lower bound of the confidence measure of the distance from the separating hyperplane in two ways: i.e., for maximizing precision and for maximizing F-measure. When maximizing precision, we achieve almost 87% precision where F-measure is over 40%. When maximizing F-measure, we achieve over 60% F-measure with around 71% precision and over 52% recall. As shown in Figure 2, the two types of segmentation of Chinese sentences, namely, by characters and by morphemes, tend to have different types of errors. So, we integrate those two types of segmentation in the form of the intersection of
5 Table 3: Evaluation Results (%) segmented by characters segmented by morphemes intersection precision recall f-measure precision recall f-measure precision recall f-measure baseline (t J and s J are identical, or, t C and s C are identical.) SVM maximum precision maximum f-measure Figure 2: Evaluating Intersection of Judgments by SVM based on Character/Morpheme based Segmentation of Chinese Sentences SVM judgments, where, for both types of segmentation, we tune the lower bound of the confidence measure of the distance from the separating hyperplane. We maximize precision while keeping recall over 25% with held-out data, and this achieves over 90% precision as shown in Table Related Work Among related works on acquiring bilingual lexicon from text, Itagaki et al. (2007) focused on automatic validation of translation pairs available in the phrase table trained by an SMT model. Lu and Tsou (2009) and Yasuda and Sumita (2013) also studied to extract bilingual terms from comparable patents, where, they first extract parallel sentences from comparable patents, and then extract bilingual terms from parallel sentences. Those studies differ from this paper in that those studies did not address the issue of acquiring bilingual synonymous technical terms. Tsunakawa and Tsujii (2008) is mostly related to our study, in that they also proposed to apply machine learning technique to the task of identifying bilingual synonymous technical terms. However, Tsunakawa and Tsujii (2008) studied the issue of identifying bilingual synonymous technical terms only within manually compiled bilingual technical term lexicon and thus are quite limited in its applicability. Our approach, on the other hand, is quite advantageous in that we start from parallel patent documents which continue to be published every year and then, that we can generate candidates of bilingual synonymous technical terms automatically. Our study in this paper is also different from previous works on identifying synonyms based on bilingual and monolingual resources (e.g. Lin and Zhao (2003)) in that we learn bilingual synonymous technical terms from phrase tables of a phrase-based SMT model trained with very large parallel sentences. Also in the context of SMT between Japanese and Chinese, Sun and Lepage (2012) pointed out that character-based segmentation of sentences contributed to improving machine translation performance compared to morpheme-based segmentation of sentences. 7. Conclusion In the task of acquiring Japanese-Chinese technical term translation equivalent pairs from parallel patent documents, this paper considered situations where a technical term is observed in many parallel patent sentences and is translated into many translation equivalents and studied the is-
6 sue of identifying synonymous translation equivalent pairs. We especially examined two types of segmentation of Chinese sentences, i.e., by characters and by morphemes, and integrated those two types of segmentation in the form of the intersection of SVM judgments, which achieved over 90% precision. One of the most important future works is definitely to improve recall. To do this, we plan to apply the semi-automatic framework (Liang et al., 2011b) which have been invented in the task of identifying Japanese- English synonymous translation equivalent pairs and have been proven to be effective in improving recall. We plan to examine whether this semi-automatic framework is also effective in the task of identifying Japanese-Chinese synonymous translation equivalent pairs. 8. References P. Fung and L. Y. Yee An IR approach for translating new words from nonparallel, comparable texts. In Proc. 17th COLING and 36th ACL, pages F. Huang, Y. Zhang, and S. Vogel Mining key phrase translations from Web corpora. In Proc. HLT/EMNLP, pages M. Itagaki, T. Aikawa, and X. He Automatic validation of terminology translation consistency with statistical method. In Proc. MT Summit XI, pages P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst Moses: Open source toolkit for statistical machine translation. In Proc. 45th ACL, Companion Volume, pages B. Liang, T. Utsuro, and M. Yamamoto. 2011a. Identifying bilingual synonymous technical terms from phrase tables and parallel patent sentences. Procedia - Social and Behavioral Sciences, 27: B. Liang, T. Utsuro, and M. Yamamoto. 2011b. Semiautomatic identification of bilingual synonymous technical terms from phrase tables and parallel patent sentences. In Proc. 25th PACLIC, pages D. Lin and S. Zhao Identifying synonyms among distributionally similar words. In Proc. 18th IJCAI, pages B. Lu and B. K. Tsou Towards bilingual term extraction in comparable patents. In Proc. 23rd PACLIC, pages Y. Matsumoto and T. Utsuro Lexical knowledge acquisition. In R. Dale, H. Moisl, and H. Somers, editors, Handbook of Natural Language Processing, chapter 24, pages Marcel Dekker Inc. Y. Morishita, T. Utsuro, and M. Yamamoto Integrating a phrase-based SMT model and a bilingual lexicon for human in semi-automatic acquisition of technical term translation lexicon. In Proc. 8th AMTA, pages J. Sun and Y. Lepage Can word segmentation be considered harmful for statistical machine translation tasks between Japanese and Chinese? In Proc. 26th PACLIC, pages M. Tonoike, M. Kida, T. Takagi, Y. Sasaki, T. Utsuro, and S. Sato A comparative study on compositional translation estimation using a domain/topic-specific corpus collected from the web. In Proc. 2nd Intl. Workshop on Web as Corpus, pages H. Tseng, P. Chang, G. Andrew, D. Jurafsky, and C. Manning A conditional random field word segmenter for Sighan bakeoff In Proc. 4th SIGHAN Workshop on Chinese Language Processing, pages T. Tsunakawa and J. Tsujii Bilingual synonym identification with spelling variations. In Proc. 3rd IJCNLP, pages M. Utiyama and H. Isahara A Japanese-English patent parallel corpus. In Proc. MT Summit XI, pages V. N. Vapnik Statistical Learning Theory. Wiley- Interscience. K. Yasuda and E. Sumita Building a bilingual dictionary from a Japanese-Chinese patent corpus. In Computational Linguistics and Intelligent Text Processing, volume 7817 of LNCS, pages Springer.
Domain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
More informationTHUTR: A Translation Retrieval System
THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for
More informationSYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
More information7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan
7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan We explain field experiments conducted during the 2009 fiscal year in five areas of Japan. We also show the experiments of evaluation
More informationSYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
More informationThe TCH Machine Translation System for IWSLT 2008
The TCH Machine Translation System for IWSLT 2008 Haifeng Wang, Hua Wu, Xiaoguang Hu, Zhanyi Liu, Jianfeng Li, Dengjun Ren, Zhengyu Niu Toshiba (China) Research and Development Center 5/F., Tower W2, Oriental
More informationHybrid Machine Translation Guided by a Rule Based System
Hybrid Machine Translation Guided by a Rule Based System Cristina España-Bonet, Gorka Labaka, Arantza Díaz de Ilarraza, Lluís Màrquez Kepa Sarasola Universitat Politècnica de Catalunya University of the
More informationAutomatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
More informationMachine translation techniques for presentation of summaries
Grant Agreement Number: 257528 KHRESMOI www.khresmoi.eu Machine translation techniques for presentation of summaries Deliverable number D4.6 Dissemination level Public Delivery date April 2014 Status Author(s)
More information2-3 Automatic Construction Technology for Parallel Corpora
2-3 Automatic Construction Technology for Parallel Corpora We have aligned Japanese and English news articles and sentences, extracted from the Yomiuri and the Daily Yomiuri newspapers, to make a large
More informationAdapting General Models to Novel Project Ideas
The KIT Translation Systems for IWSLT 2013 Thanh-Le Ha, Teresa Herrmann, Jan Niehues, Mohammed Mediani, Eunah Cho, Yuqi Zhang, Isabel Slawik and Alex Waibel Institute for Anthropomatics KIT - Karlsruhe
More informationThe KIT Translation system for IWSLT 2010
The KIT Translation system for IWSLT 2010 Jan Niehues 1, Mohammed Mediani 1, Teresa Herrmann 1, Michael Heck 2, Christian Herff 2, Alex Waibel 1 Institute of Anthropomatics KIT - Karlsruhe Institute of
More informationTurker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
More informationTibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
More informationAdaptation to Hungarian, Swedish, and Spanish
www.kconnect.eu Adaptation to Hungarian, Swedish, and Spanish Deliverable number D1.4 Dissemination level Public Delivery date 31 January 2016 Status Author(s) Final Jindřich Libovický, Aleš Tamchyna,
More informationImproving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora
mproving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora Rajdeep Gupta, Santanu Pal, Sivaji Bandyopadhyay Department of Computer Science & Engineering Jadavpur University Kolata
More informationParallel FDA5 for Fast Deployment of Accurate Statistical Machine Translation Systems
Parallel FDA5 for Fast Deployment of Accurate Statistical Machine Translation Systems Ergun Biçici Qun Liu Centre for Next Generation Localisation Centre for Next Generation Localisation School of Computing
More informationOn-line and Off-line Chinese-Portuguese Translation Service for Mobile Applications
On-line and Off-line Chinese-Portuguese Translation Service for Mobile Applications 12 12 2 3 Jordi Centelles ', Marta R. Costa-jussa ', Rafael E. Banchs, and Alexander Gelbukh Universitat Politecnica
More informationFactored Translation Models
Factored Translation s Philipp Koehn and Hieu Hoang pkoehn@inf.ed.ac.uk, H.Hoang@sms.ed.ac.uk School of Informatics University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW Scotland, United Kingdom
More informationPOSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
More informationTRANSREAD LIVRABLE 3.1 QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS. Projet ANR 201 2 CORD 01 5
Projet ANR 201 2 CORD 01 5 TRANSREAD Lecture et interaction bilingues enrichies par les données d'alignement LIVRABLE 3.1 QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS Avril 201 4
More informationUM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation
UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation Liang Tian 1, Derek F. Wong 1, Lidia S. Chao 1, Paulo Quaresma 2,3, Francisco Oliveira 1, Yi Lu 1, Shuo Li 1, Yiming
More informationUEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT
UEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT Eva Hasler ILCC, School of Informatics University of Edinburgh e.hasler@ed.ac.uk Abstract We describe our systems for the SemEval
More informationCollaborative Machine Translation Service for Scientific texts
Collaborative Machine Translation Service for Scientific texts Patrik Lambert patrik.lambert@lium.univ-lemans.fr Jean Senellart Systran SA senellart@systran.fr Laurent Romary Humboldt Universität Berlin
More informationAn Empirical Study on Web Mining of Parallel Data
An Empirical Study on Web Mining of Parallel Data Gumwon Hong 1, Chi-Ho Li 2, Ming Zhou 2 and Hae-Chang Rim 1 1 Department of Computer Science & Engineering, Korea University {gwhong,rim}@nlp.korea.ac.kr
More informationBuilding a Web-based parallel corpus and filtering out machinetranslated
Building a Web-based parallel corpus and filtering out machinetranslated text Alexandra Antonova, Alexey Misyurev Yandex 16, Leo Tolstoy St., Moscow, Russia {antonova, misyurev}@yandex-team.ru Abstract
More informationDublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,
More informationThe XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1
More informationANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering
More informationAdaptive Development Data Selection for Log-linear Model in Statistical Machine Translation
Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation Mu Li Microsoft Research Asia muli@microsoft.com Dongdong Zhang Microsoft Research Asia dozhang@microsoft.com
More informationA New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015) A New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More informationCross-Lingual Concern Analysis from Multilingual Weblog Articles
Cross-Lingual Concern Analysis from Multilingual Weblog Articles Tomohiro Fukuhara RACE (Research into Artifacts), The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba JAPAN http://www.race.u-tokyo.ac.jp/~fukuhara/
More informationHuman in the Loop Machine Translation of Medical Terminology
Human in the Loop Machine Translation of Medical Terminology by John J. Morgan ARL-MR-0743 April 2010 Approved for public release; distribution unlimited. NOTICES Disclaimers The findings in this report
More informationSegmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System
Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System Eunah Cho, Jan Niehues and Alex Waibel International Center for Advanced Communication Technologies
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 37 46. Training Phrase-Based Machine Translation Models on the Cloud
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 37 46 Training Phrase-Based Machine Translation Models on the Cloud Open Source Machine Translation Toolkit Chaski Qin Gao, Stephan
More informationGetting Off to a Good Start: Best Practices for Terminology
Getting Off to a Good Start: Best Practices for Terminology Technologies for term bases, term extraction and term checks Angelika Zerfass, zerfass@zaac.de Tools in the Terminology Life Cycle Extraction
More informationIdentifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of
More informationTranslating the Penn Treebank with an Interactive-Predictive MT System
IJCLA VOL. 2, NO. 1 2, JAN-DEC 2011, PP. 225 237 RECEIVED 31/10/10 ACCEPTED 26/11/10 FINAL 11/02/11 Translating the Penn Treebank with an Interactive-Predictive MT System MARTHA ALICIA ROCHA 1 AND JOAN
More informationEfficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
More informationACCURAT Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation www.accurat-project.eu Project no.
ACCURAT Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation www.accurat-project.eu Project no. 248347 Deliverable D5.4 Report on requirements, implementation
More informationAn Online Service for SUbtitling by MAchine Translation
SUMAT CIP-ICT-PSP-270919 An Online Service for SUbtitling by MAchine Translation Annual Public Report 2012 Editor(s): Contributor(s): Reviewer(s): Status-Version: Arantza del Pozo Mirjam Sepesy Maucec,
More informationCustomizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
More informationMicro blogs Oriented Word Segmentation System
Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,
More informationApplying Statistical Post-Editing to. English-to-Korean Rule-based Machine Translation System
Applying Statistical Post-Editing to English-to-Korean Rule-based Machine Translation System Ki-Young Lee and Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58. Ncode: an Open Source Bilingual N-gram SMT Toolkit
The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58 Ncode: an Open Source Bilingual N-gram SMT Toolkit Josep M. Crego a, François Yvon ab, José B. Mariño c c a LIMSI-CNRS, BP 133,
More informationConvergence of Translation Memory and Statistical Machine Translation
Convergence of Translation Memory and Statistical Machine Translation Philipp Koehn and Jean Senellart 4 November 2010 Progress in Translation Automation 1 Translation Memory (TM) translators store past
More informationReliable and Cost-Effective PoS-Tagging
Reliable and Cost-Effective PoS-Tagging Yu-Fang Tsai Keh-Jiann Chen Institute of Information Science, Academia Sinica Nanang, Taipei, Taiwan 5 eddie,chen@iis.sinica.edu.tw Abstract In order to achieve
More informationPolish - English Statistical Machine Translation of Medical Texts.
Polish - English Statistical Machine Translation of Medical Texts. Krzysztof Wołk, Krzysztof Marasek Department of Multimedia Polish Japanese Institute of Information Technology kwolk@pjwstk.edu.pl Abstract.
More informationTerm extraction for user profiling: evaluation by the user
Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,
More informationChinese-Japanese Machine Translation Exploiting Chinese Characters
Chinese-Japanese Machine Translation Exploiting Chinese Characters CHENHUI CHU, TOSHIAKI NAKAZAWA, DAISUKE KAWAHARA, and SADAO KUROHASHI, Kyoto University The Chinese and Japanese languages share Chinese
More informationIdentifying Prepositional Phrases in Chinese Patent Texts with. Rule-based and CRF Methods
Identifying Prepositional Phrases in Chinese Patent Texts with Rule-based and CRF Methods Hongzheng Li and Yaohong Jin Institute of Chinese Information Processing, Beijing Normal University 19, Xinjiekou
More informationAn Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation Robert C. Moore Chris Quirk Microsoft Research Redmond, WA 98052, USA {bobmoore,chrisq}@microsoft.com
More informationJane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation
Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation Joern Wuebker M at thias Huck Stephan Peitz M al te Nuhn M arkus F reitag Jan-Thorsten Peter Saab M ansour Hermann N e
More informationThe United Nations Parallel Corpus v1.0
The United Nations Parallel Corpus v1.0 Michał Ziemski, Marcin Junczys-Dowmunt, Bruno Pouliquen United Nations, DGACM, New York, United States of America Adam Mickiewicz University, Poznań, Poland World
More informationThe Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems
The Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems Dr. Ananthi Sheshasaayee 1, Angela Deepa. V.R 2 1 Research Supervisior, Department of Computer Science & Application,
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 7 16
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 21 7 16 A Productivity Test of Statistical Machine Translation Post-Editing in a Typical Localisation Context Mirko Plitt, François Masselot
More informationThe Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish
The Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish Oscar Täckström Swedish Institute of Computer Science SE-16429, Kista, Sweden oscar@sics.se
More informationEvaluation of Bayesian Spam Filter and SVM Spam Filter
Evaluation of Bayesian Spam Filter and SVM Spam Filter Ayahiko Niimi, Hirofumi Inomata, Masaki Miyamoto and Osamu Konishi School of Systems Information Science, Future University-Hakodate 116 2 Kamedanakano-cho,
More informationFeature Selection for Electronic Negotiation Texts
Feature Selection for Electronic Negotiation Texts Marina Sokolova, Vivi Nastase, Mohak Shah and Stan Szpakowicz School of Information Technology and Engineering, University of Ottawa, Ottawa ON, K1N 6N5,
More informationEnglish to Arabic Transliteration for Information Retrieval: A Statistical Approach
English to Arabic Transliteration for Information Retrieval: A Statistical Approach Nasreen AbdulJaleel and Leah S. Larkey Center for Intelligent Information Retrieval Computer Science, University of Massachusetts
More informationWikipedia and Web document based Query Translation and Expansion for Cross-language IR
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University
More informationRevisiting Context-based Projection Methods for Term-Translation Spotting in Comparable Corpora
Revisiting Context-based Projection Methods for Term-Translation Spotting in Comparable Corpora Audrey Laroche OLST Dép. de linguistique et de traduction Université de Montréal audrey.laroche@umontreal.ca
More informationT U R K A L A T O R 1
T U R K A L A T O R 1 A Suite of Tools for Augmenting English-to-Turkish Statistical Machine Translation by Gorkem Ozbek [gorkem@stanford.edu] Siddharth Jonathan [jonsid@stanford.edu] CS224N: Natural Language
More informationFast-Champollion: A Fast and Robust Sentence Alignment Algorithm
Fast-Champollion: A Fast and Robust Sentence Alignment Algorithm Peng Li and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for
More informationTRANSREAD: Designing a Bilingual Reading Experience with Machine Translation Technologies
TRANSREAD: Designing a Bilingual Reading Experience with Machine Translation Technologies François Yvon and Yong Xu and Marianna Apidianaki LIMSI, CNRS, Université Paris-Saclay 91 403 Orsay {yvon,yong,marianna}@limsi.fr
More informationCache-based Online Adaptation for Machine Translation Enhanced Computer Assisted Translation
Cache-based Online Adaptation for Machine Translation Enhanced Computer Assisted Translation Nicola Bertoldi Mauro Cettolo Marcello Federico FBK - Fondazione Bruno Kessler via Sommarive 18 38123 Povo,
More informationEnriching Morphologically Poor Languages for Statistical Machine Translation
Enriching Morphologically Poor Languages for Statistical Machine Translation Eleftherios Avramidis e.avramidis@sms.ed.ac.uk Philipp Koehn pkoehn@inf.ed.ac.uk School of Informatics University of Edinburgh
More informationProtein-protein Interaction Passage Extraction Using the Interaction Pattern Kernel Approach for the BioCreative 2015 BioC Track
Protein-protein Interaction Passage Extraction Using the Interaction Pattern Kernel Approach for the BioCreative 2015 BioC Track Yung-Chun Chang 1,2, Yu-Chen Su 3, Chun-Han Chu 1, Chien Chin Chen 2 and
More informationKnowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
More informationA Survey on Product Aspect Ranking Techniques
A Survey on Product Aspect Ranking Techniques Ancy. J. S, Nisha. J.R P.G. Scholar, Dept. of C.S.E., Marian Engineering College, Kerala University, Trivandrum, India. Asst. Professor, Dept. of C.S.E., Marian
More informationEnriching the Crosslingual Link Structure of Wikipedia - A Classification-Based Approach -
Enriching the Crosslingual Link Structure of Wikipedia - A Classification-Based Approach - Philipp Sorg and Philipp Cimiano Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe, Germany {sorg,cimiano}@aifb.uni-karlsruhe.de
More informationExtraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Makoto Nakamura, Yasuhiro Ogawa, Katsuhiko Toyama Japan Legal Information Institute, Graduate
More informationStatistical Machine Translation prototype using UN parallel documents
Proceedings of the 16th EAMT Conference, 28-30 May 2012, Trento, Italy Statistical Machine Translation prototype using UN parallel documents Bruno Pouliquen, Christophe Mazenc World Intellectual Property
More informationThe University of Maryland Statistical Machine Translation System for the Fifth Workshop on Machine Translation
The University of Maryland Statistical Machine Translation System for the Fifth Workshop on Machine Translation Vladimir Eidelman, Chris Dyer, and Philip Resnik UMIACS Laboratory for Computational Linguistics
More informationAn Iterative approach to extract dictionaries from Wikipedia for under-resourced languages
An Iterative approach to extract dictionaries from Wikipedia for under-resourced languages Rohit Bharadwaj G SIEL, LTRC IIIT Hyd bharadwaj@research.iiit.ac.in Niket Tandon Databases and Information Systems
More informationLIUM s Statistical Machine Translation System for IWSLT 2010
LIUM s Statistical Machine Translation System for IWSLT 2010 Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans,
More informationVCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
More informationStatistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
More informationSEARCH ENGINE OPTIMIZATION USING D-DICTIONARY
SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G.Evangelin Jenifer #1, Mrs.J.Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering(Communication and Networking), CSI Institute
More informationA Knowledge-Poor Approach to BioCreative V DNER and CID Tasks
A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks Firoj Alam 1, Anna Corazza 2, Alberto Lavelli 3, and Roberto Zanoli 3 1 Dept. of Information Eng. and Computer Science, University of Trento,
More informationEnd-to-End Sentiment Analysis of Twitter Data
End-to-End Sentiment Analysis of Twitter Data Apoor v Agarwal 1 Jasneet Singh Sabharwal 2 (1) Columbia University, NY, U.S.A. (2) Guru Gobind Singh Indraprastha University, New Delhi, India apoorv@cs.columbia.edu,
More informationFOR IMMEDIATE RELEASE
FOR IMMEDIATE RELEASE Hitachi Developed Basic Artificial Intelligence Technology that Enables Logical Dialogue Analyzes huge volumes of text data on issues under debate, and presents reasons and grounds
More informationHybrid Approach to English-Hindi Name Entity Transliteration
Hybrid Approach to English-Hindi Name Entity Transliteration Shruti Mathur Department of Computer Engineering Government Women Engineering College Ajmer, India shrutimathur19@gmail.com Varun Prakash Saxena
More informationOpen Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
More informationTS3: an Improved Version of the Bilingual Concordancer TransSearch
TS3: an Improved Version of the Bilingual Concordancer TransSearch Stéphane HUET, Julien BOURDAILLET and Philippe LANGLAIS EAMT 2009 - Barcelona June 14, 2009 Computer assisted translation Preferred by
More informationStatistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
More informationChapter 5. Phrase-based models. Statistical Machine Translation
Chapter 5 Phrase-based models Statistical Machine Translation Motivation Word-Based Models translate words as atomic units Phrase-Based Models translate phrases as atomic units Advantages: many-to-many
More informationAn Iterative Link-based Method for Parallel Web Page Mining
An Iterative Link-based Method for Parallel Web Page Mining Le Liu 1, Yu Hong 1, Jun Lu 2, Jun Lang 2, Heng Ji 3, Jianmin Yao 1 1 School of Computer Science & Technology, Soochow University, Suzhou, 215006,
More informationOverview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
More informationStatistical Pattern-Based Machine Translation with Statistical French-English Machine Translation
Statistical Pattern-Based Machine Translation with Statistical French-English Machine Translation Jin'ichi Murakami, Takuya Nishimura, Masato Tokuhisa Tottori University, Japan Problems of Phrase-Based
More informationFactored Markov Translation with Robust Modeling
Factored Markov Translation with Robust Modeling Yang Feng Trevor Cohn Xinkai Du Information Sciences Institue Computing and Information Systems Computer Science Department The University of Melbourne
More informationHarvesting parallel text in multiple languages with limited supervision
Harvesting parallel text in multiple languages with limited supervision Luciano Barbosa 1 V ivek Kumar Rangarajan Sridhar 1 Mahsa Y armohammadi 2 Srinivas Bangalore 1 (1) AT&T Labs - Research, 180 Park
More informationLanguage Modeling. Chapter 1. 1.1 Introduction
Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set
More informationAuthor Gender Identification of English Novels
Author Gender Identification of English Novels Joseph Baena and Catherine Chen December 13, 2013 1 Introduction Machine learning algorithms have long been used in studies of authorship, particularly in
More informationDomain Adaptation of Statistical Machine Translation using Web-Crawled Resources: A Case Study
Proceedings of the 16th EAMT Conference, 28-30 May 2012, Trento, Italy Domain Adaptation of Statistical Machine Translation using Web-Crawled Resources: A Case Study Pavel Pecina 1, Antonio Toral 2, Vassilis
More informationPre-processing of Bilingual Corpora for Mandarin-English EBMT
Pre-processing of Bilingual Corpora for Mandarin-English EBMT Ying Zhang, Ralf Brown, Robert Frederking, Alon Lavie Language Technologies Institute, Carnegie Mellon University NSH, 5000 Forbes Ave. Pittsburgh,
More informationA Systematic Cross-Comparison of Sequence Classifiers
A Systematic Cross-Comparison of Sequence Classifiers Binyamin Rozenfeld, Ronen Feldman, Moshe Fresko Bar-Ilan University, Computer Science Department, Israel grurgrur@gmail.com, feldman@cs.biu.ac.il,
More informationWeb based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection
Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection Jian Qu, Nguyen Le Minh, Akira Shimazu School of Information Science, JAIST Ishikawa, Japan 923-1292
More informationSentiment analysis: towards a tool for analysing real-time students feedback
Sentiment analysis: towards a tool for analysing real-time students feedback Nabeela Altrabsheh Email: nabeela.altrabsheh@port.ac.uk Mihaela Cocea Email: mihaela.cocea@port.ac.uk Sanaz Fallahkhair Email:
More information