Human in the Loop Machine Translation of Medical Terminology
|
|
|
- Kristian Perkins
- 10 years ago
- Views:
Transcription
1 Human in the Loop Machine Translation of Medical Terminology by John J. Morgan ARL-MR-0743 April 2010 Approved for public release; distribution unlimited.
2 NOTICES Disclaimers The findings in this report are not to be construed as an official Department of the Army position unless so designated by other authorized documents. Citation of manufacturer s or trade names does not constitute an official endorsement or approval of the use thereof. Destroy this report when it is no longer needed. Do not return it to the originator.
3 Army Research Laboratory Adelphi, MD ARL-MR-0743 April 2010 Human in the Loop Machine Translation of Medical Terminology John J. Morgan Computational and Information Sciences Directorate, ARL Approved for public release; distribution unlimited.
4 REPORT DOCUMENTATION PAGE Form Approved OMB No Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing the burden, to Department of Defense, Washington Headquarters Services, Directorate for Information Operations and Reports ( ), 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS. 1. REPORT DATE (DD-MM-YYYY) April REPORT TYPE Interim 4. TITLE AND SUBTITLE Human in the Loop Machine Translation of Medical Terminology 3. DATES COVERED (From - To) 1 st quarter FY10 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) John J. Morgan 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) U.S. Army Research Laboratory ATTN: RDRL-CII-T 2800 Powder Mill Road Adelphi, MD PERFORMING ORGANIZATION REPORT NUMBER ARL-MR SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR'S ACRONYM(S) 11. SPONSOR/MONITOR'S REPORT NUMBER(S) 12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release; distribution unlimited. 13. SUPPLEMENTARY NOTES 14. ABSTRACT This memorandum report outlines the Human in the Loop Translation process, which strives to increase translators productivity using statistical machine translation (SMT). In this process, U.S. Army Research Laboratory (ARL) systems trained by open source SMT toolkits were placed in a feedback loop with human translators in Afghanistan to incrementally improve Dari translations of English medical training manuals and to create bilingual glossaries in the medical domain. Anecdotal evidence indicates that the quality of the machine translation drafts produced using this feedback loop process is high enough to assist human translators in translating training manuals in the medical domain. Automatic scores showed large improvements in translation quality as the loop progressed for a very narrow task, providing indirect evidence that SMT can benefit human translators even with small amounts of training data. 15. SUBJECT TERMS Statistical machine translation, medical terminologies, human in the loop 16. SECURITY CLASSIFICATION OF: a. REPORT Unclassified b. ABSTRACT Unclassified c. THIS PAGE Unclassified 17. LIMITATION OF ABSTRACT UU 18. NUMBER OF PAGES 20 19a. NAME OF RESPONSIBLE PERSON John J. Morgan 19b. TELEPHONE NUMBER (Include area code) (301) Standard Form 298 (Rev. 8/98) Prescribed by ANSI Std. Z39.18 ii
5 Contents List of Tables Acknowledgments iv v 1. Introduction 1 2. Background 1 3. Statistical Machine Translation (SMT) Systems 3 4. Human in the Loop Translation Process Medical Language System (UMLS) Tools Training Moses and JosHUa Results and Discussion 6 6. Conclusions 7 7. References 8 List of Symbols, Abbreviations and Acronyms 10 Distribution List 11 iii
6 List of Tables Table 1. BLEU scores for systems trained incrementally on chapters of the FCCS text....7 iv
7 Acknowledgments The Human in the Loop project was conceived and directed by Multilingual Computing Branch (MLCB) member Dr. Stephen A. LaRocca. I would like to thank Dr. LaRocca for his guidance and I would also like to thank Mr. Hazrat Ghulam Jahed for his work translating and editing. v
8 INTENTIONALLY LEFT BLANK. vi
9 1. Introduction The U.S. Army Research Laboratory (ARL) is looking for ways to support the mission of transitioning control of the war in Afghanistan from the United States and the North Atlantic Treaty Organization (NATO) to the Afghan government. In his speech at West Point, NY, on December 1, 2009, President Obama said, We must strengthen the capacity of Afghanistan s security forces and government so that they can take lead responsibility for Afghanistan s future. This transition of control depends on the success of training the Afghan military. Success in training relies on effective communication between forces that speak different languages, which, in turn, relies on high-quality translation services. Although translators can produce high-quality work on their own, state-of-the-art machine translation technologies could provide translators many benefits, among them the ability to coordinate their efforts to archive and reuse translations. This report describes how ARL s Multilingual Computing Branch (MLCB) is supporting the training mission in Afghanistan through the use of language translation technology: 1. The MLCB is working on the translation of a key training text. The techniques used in this translation can be used as a template for future translations of similar texts. 2. Corpora of in-domain texts are being archived and prepared for future processing by the MLCB and other research organizations including Afghan organizations. 3. Terminologies in the Afghan languages, Dari and Pashto, are being extracted and standardized. 4. The natural language processing (NLP) tools being used in this project are open source and freely available for use by Afghans in the future. This report discusses the translation of one specific English document (a medical training manual) into Dari and describes a method for using machine-translated text with the aim of helping translators in the future. 2. Background Currently, the U.S. Army is training soldiers in the Afghan National Army (ANA), including soldiers who aspire to become medical doctors and nurses. The trainers are using the same manuals, originally written in English, which are used to train U.S. Soldiers. The ANA would like to have these manuals available in the trainees native languages, which, in the case of many of them, is Dari or Pashto. In order for the training projects to be sustained by Afghans in the 1
10 future, these training manuals will need to be available in the languages of the Afghan people. Additionally, methods for translating new training documents should also be available. In fall 2008, the MLCB received a request for help from a military medical doctor working with an Embedded Training Team (ETT) as part of the Combined Security Transition Command- Afghanistan (CSTC-A). The physician needed a Dari translation of an English-language medical training manual, Fundamental Critical Care Support (FCCS), published by the Society for Critical Care Medicine (SCCM) and used as a textbook for training doctors and nurses for work in Intensive Care Units (ICUs). Knowing that the MLCB worked on language technology, the military doctor wanted to know if machine translation could help with creating such a translation. Of interest was whether a machine translation of a medical document from English into Dari could assist human translators in translating the same document. Human translators on the ETT can translate ~3 pages per day. Machine translation speed depends on computer hardware, but it is reasonable to expect that machine translation systems on a current desktop computer can translate 1 sentence per second. Thus, machine translation could produce a human translator s daily output in 1 min. However, even the best machine translations are usually not acceptable to human readers. One can only expect a rough draft translation from a computer-generated translation. Current state-of-the-art machine translation systems rely on large parallel corpora (databases of human-translated texts) and NLP tools. Work is just beginning on collecting these corpora and developing the NLP tools for the languages of Afghanistan. Given the physician s access to a team of highly skilled translators who were familiar with medical terminology in both English and Dari, the MLCB proposed a partnership with the ETT s team of translators to develop a new translation process. In this process, termed Human in the Loop Translation, the MLCB provides rough draft machine translations of the FCCS text into Dari; the human translators edit the rough draft and return the edited copy to the MLCB; and then the MLCB updates the machine translation corpus based on the translators changes. Partnering in this way yields potential benefits to both the ETT and our lab. The military medical doctor increases the productivity of his team of translators, while the MLCB collects an English/Dari bilingual parallel corpus in the medical training manual domain, which could then be used to translate future training manuals and also serve as a source of data for scientific research. The Human in the Loop process also addresses another issue that doctors working on ETTs brought up: they felt that they were starting all over again every time a new team arrived. An archive of translated documents and bilingual terminologies would remedy this problem, further enabling the transition of future medical care to Afghan responsibility. 2
11 3. Statistical Machine Translation (SMT) Systems NLP tools and other language resources are very scarce for Dari. Specifically, English/Dari bilingual text corpora were almost nonexistent when the command surgeon contacted the MLCB. We proposed the Human in the Loop process in order to build a machine translation system that would gradually improve as more domain-specific bilingual text became available for training the system. For that task, we considered using open source, statistical machine translation (SMT) systems. These systems have the following advantages: 1. They provide the means to create complete translation systems. These packages include a decoder, which actually performs the translation by taking an English sentence as input and searching for the best Dari translation to output. 2. In most cases, they provide scripts for training the components the decoder needs to perform translations. 3. They only require a parallel corpus of text to produce an effective system. They do not require experts in any field to produce grammars, mathematical models, or any complicated software engineering. 4. The systems can be built and rebuilt rapidly once the training corpus is available. Training one system on our 50,000 segment parallel corpus takes two days on a single processor. Parts of the process of building these systems can be parallelized and run on a cluster of computers to speed up the process as the training corpus grows. We explored two options for training SMT systems: Moses (1), which implements a phrasebased translation method; and the Johns Hopkins open source architecture (JosHUa) (2), a new third-generation SMT system. Broadly speaking, first-generation systems translate words, second-generation systems translate phrases, and third-generation systems translate structures. Specifically, JosHUa implements a system called Hiero (3) that automatically extracts grammar structures called synchronous context free grammars (SCFGs) and decodes using chart parsing. The field of machine translation seems to be moving toward the syntax-based approach of JosHUa and our results show better performance from JosHUa than Moses. 3
12 4. Human in the Loop Translation Process First, the systems were trained on a bilingual corpus that we had been aligning at the MLCB for over a year. The source of the corpus was a newspaper published in Kabul, Afghanistan, in the three languages English, Dari, and Pashto. We also had access to English to Dari translations of Army field manuals that had been done recently. After resolving copyright issues with the SCCM, we received the entire FCCS book in portable document format (pdf). After extracting the text from the pdf file, we ran the first chapter through our systems trained on the newspaper and field manual corpora. The translations were very rough; many technical medical terms were not translated since they did not appear in our training corpus. A two-column spreadsheet was composed, the first column containing each line of the chapter in English and the second column containing the machine-rendered translation in Dari. The spreadsheet was ed to the military physician in Afghanistan, and the translators wrote their version in a third column of the spreadsheet and sent it back to the MLCB. The first and third columns of the spreadsheet were extracted, and the systems were retrained with this new set of ~250 English/Dari segments. We continued this process on subsequent chapters. As of this report, chapters 1 through 6 have been translated and chapter 7 has yet to be sent back from the translators in Afghanistan. At the end of this project, we envision publishing the entire FCCS book in an English/Dari bilingual printed book format, where facing pages contain mirrored translations: English on the left-hand side and Dari on the right-hand side. Although a printed book may seem low tech, this format is what trainers in the field have requested and what they feel will be most helpful in their work with trainees. Trainers and other translation professionals have also expressed an interest in bilingual (English/Dari) glossaries in special domains. The remainder of this report will focus on extracting and translating medical terminology from English into Dari. We have extended our previous work to a Human in the Loop procedure for creating a bilingual English/Dari glossary of medical terms similar to the method used by Deléger, Merkel, and Zweigenbaum (5). According to these researchers, medical terminology is a necessary resource for any kind of health care information task (e.g., coding, free text indexing, information retrieval). Extracting terms from text can be very tedious. If terms only consisted of single words, the task of terminology extraction would be easy: a human would be given a list of words and prompted to decide whether or not the word is a term. But terms are not limited to single words. Thus, a human has to read the text in context in order to determine the technical terms. Deléger, Merkel, and Zweigenbaum worked with the heavily resourced English-French language pair, so their procedure was slightly different. Once they extracted their terms, they displayed a sentence from a parallel corpus in which it occurred in context. 4
13 4.1 Medical Language System (UMLS) Tools In our procedure, we leverage work done by the Lexical Systems Group (LSG) at the National Institutes of Health (NIH) in creating the Unified Medical Language System (UMLS) (12). LSG has developed tools for extracting technical terms from text in the biomedical domain. We used these tools to extract English terms from the FCCS. Specifically, we took the following steps with the UMLS tools: 1. Text was parsed into phrases using the Java class gov.nih.nlm.nls.nlp.parser.parse. 2. Next, noun phrases and prepositional phrases were extracted (with prepositions stripped). 3. Terms were extracted using the LSG tool lexical variant generator (lvg). We experimented with different options and flow components for lvg. The following is one example of a command line for lvg: lvg -f:ct:fa:g:p. In this program, the Ct option retrieves the unique lexical names of the terms, the fa option filters out acronyms, the g option removes genitives, and the p option strips punctuation. 4. The list of terms was fed to the Moses or JosHUa decoders, which wrote out candidate Dari translations. 5. The candidate translations were edited by a native Dari speaker. 6. Finally, the edited translations were appended to the training corpus. So far, the Moses and JosHUa SMT systems have been trained with ~65,000 segments from parallel texts in the newspaper, military training manual, and medical domains. 4.2 Training Moses and JosHUa The training steps for the Moses and JosHUa systems included the following: 1. Sentence level alignment of the parallel training corpus was performed using Robert Moore s Bilingual Sentence Aligner (6). 2. Word level alignments were extracted using the Posterior Constraints Alignment Toolkit (PostCAT) (7) and the Berkeley Aligner (13). 3. The Java class joshua.corpus.suffix_array.compile was used to put the word alignments into suffix array data structures, based on the syntax-based translation method developed by Adam Lopez (11). Note: Moses does not yet have the capability to implement this step. 4. In the JosHUa system, grammar rules were extracted from the suffix arrays and a development set of segments with the Java class joshua.prefix_tree.extractrules. In Moses, a table of phrase translations was extracted from the word alignments obtained in step 2. 5
14 5. The grammar rules in the JosHUa system and the phrase pairs in the Moses system were scored using relative frequency statistics. 6. Both systems used the Stanford Research Institute Language Modeling (SRILM) toolkit to make a 4-gram statistical language model for Dari. 7. A reordering model was trained for Moses. Since reordering is modeled by the grammar rules, JosHUa did not train a reordering model. 8. Both systems trained weights for the log-linear model with a minimum error rate training (mert). JosHUa used Omar Zaidan s zmert via the Java class joshua.zmert.zmert (9) and Moses used the Franz Och original version of mert (10). The log linear model was used to combine the feature functions including those defined by the language model and phrase table (in the case of Moses) or grammar rules (in the case of JosHUa). 5. Results and Discussion The output from the two systems was reviewed by a MLCB native Dari speaker. He found the output from the Moses system to be a useful aid to translators as a rough draft, saying that he would use them himself if he were asked to translate the FCCS. He also noted that the quality of the translations was improving with every cycle in the loop. While this subjective evaluation was encouraging, a more objective look at the effect of our process on the quality of machine-translated text was needed. The FCCS text includes a test that is given to trainees after instruction to assess their learning. This test, which includes terminology from the entire FCCS text, was translated into Dari. We then used the set of 311 segments in the test to evaluate the performance of our systems (table 1). We obtained Bilingual Evaluation Understudy (BLEU) (12) scores using version 12 of the National Institute of Standards and Technology s (NIST) MTeval utility (13) (ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v13.pl) on the output from the Moses and JosHUa systems. BLEU is a standard, albeit controversial, metric for evaluating the performance of machine translation systems. We used the BLEU scores here to confirm the trend we noticed anecdotally. The scores were conveniently obtained via MTeval and we do not make claims concerning accuracy of BLEU as a metric. 6
15 Table 1. BLEU scores for systems trained incrementally on chapters of the FCCS text. Training Moses BLEU JosHUa BLEU baseline baseline + chapter baseline + chapter 1 + mert baseline + chapters 1, baseline + chapters 1,2 + mert baseline + chapters 1,2, baseline + chapters 1,2,3 + mert baseline + chapters 1,2,3, baseline + chapters mert baseline + chapters baseline + chapters mert baseline + medical domain + chapters baseline + medical domain + chapters mert The baseline system was trained on text from the newspaper and field manual domains. Our medical domain data were excluded from the baseline system s training corpus. For the JosHUa system, appending a new chapter to the training corpus clearly resulted in improved BLEU scores on the test set, although in three cases running mert yielded better BLEU scores than appending the next chapter s text to the training corpus. BLEU scores also rose with each new chapter appended to the Moses training corpus, but mert training lowered BLEU scores in four cases. The FCCS contains 15 chapters. Table 1 shows that the Human in the Loop procedure increases the performance of an SMT system based on the BLEU scores in the upper teens to the upper 30s, even before translating half the chapters in the book. 6. Conclusions Trainers who need translations of specific texts and in a very specific domain, like critical care medicine, can benefit from automatically generated machine translations by participating in a Human in the Loop process. The systems reported on in this report were trained on small amounts of data, but by periodically updating their training corpora, the systems eventually produced useful translations. In translating a book, for example, we believe that using this procedure would yield acceptable results by the time half the chapters in the book were translated. 7
16 7. References 1. Koehn, P.; Hoang, H.; Birch, A.; Callison-Burch, C.; Federico, M.; Bertoldi, N.; Cowan, B.; Shen, W.; Moran, C.; Zens, R.; Dyer, C.; Bojar, O.; Constantin, A.; Herbst, E. Moses: Open Source Toolkit for Statistical Machine Translation. Proceedings of the ACL 2007 Demo and Poster Sessions, Prague, June 2007, pages , Association for Computational Linguistics Li, Z.; Callison-Burch, C.; Dyer, C.; Ganitkevitch, J.; Khudanpur, S.; Schwartz, L.; Thornton, W.N.G.; Weese, J.; Zaidan, O. F. Joshua: An Open Source Toolkit for Parsingbased Machine Translation. Proceedings of the Fourth Workshop on Statistical Machine Translation, Athens, Greece, March 30 31, 2009, 3. Chiang, D.; Lopez, A.; Madnani, N.; Monz, C.; Resnik, P.; Subotin, M. The Hiero Machine Translation System: Extensions, Evaluation, and Analysis. Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver, British Columbia, Canada, October 6 8, 2005, , Association for Computational Linguistics: Morristown, NJ. 4. Deléger, L.; Merkel, M.; Zweigenbaum, P. Translating Medical Terminologies Through Word Alignment in Parallel Text Corpora. J. of Biomedical Informatics August 2009, 42 (4), Moore, R. C. Fast and Accurate Sentence Alignment of Bilingual Corpora. In Machine Translation: From Research to Real Users, Richardson, S., ed.; Proceedings, 5th Conference of the Association for Machine Translation in the Americas, Tiburon, CA, , 2002, 6. Graça, J.; Ganchev, K.; Taskar, B. Post-cat - Posterior Constrained Alignment Toolkit. The Third Machine Translation Marathon 2009, 91, Stolcke, A. SRILM an Extensible Language Modeling Toolkit. Intl. Conf. on Spoken Language Processing, 2002, Zaidan, O. F. Z-MERT: A Fully Configurable Open Source Tool for Minimum Error Rate Training of Machine Translation Systems. The Prague Bulletin of Mathematical Linguistics 2009, 91,
17 9. Och, F. Jf. Minimum Error Rate Training for Statistical Machine Translation. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL2003), Sapporo, Japan, July 7 12, 2003, , Lopez, A. D.; Resnik, P. S. Machine Translation by Pattern Matching; University of Maryland at College Park, College Park, MD, 2008, ISBN: Bodenreider, O. The Unified Medical Language System (UMLS): Integrating Biomedical Terminology. Nucleic Acids Research January 2004, 32, Database issue D267 D Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.-J. BLEU: A Method for Automatic Evaluation of Machine Translation; Technical Report RC22176 (W ); IBM Research, 2001, Doddington, G. Automatic Evaluation of Machine Translation Quality Using N-gram Cooccurrence Statistics. Proceedings of the Second International Conference on Human Language Technology Research, Liang, P.; Taskar, B.; Klein, D. Alignment by Agreement. Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, New York, NY, June 4 9, 2006, [doi> / ]. 9
18 List of Symbols, Abbreviations and Acronyms ANA ARL BLEU CSTC-A ETT FCCS ICU JosHUa LSG lvg mert MLCB NATO NIH NIST NLP pdf PostCAT SCCM SCFG SMT SRILM UMLS Afghan National Army U.S. Army Research Laboratory Bilingual Evaluation Understudy Combined Security Transition Command Afghanistan Embedded Training Team Fundamental Critical Care Support Intensive Care Unit Johns Hopkins open source architecture Lexical Systems Group lexical variant generator minimum error rate training Multilingual Computing Branch North Atlantic Treaty Organization National Institutes of Health National Institute of Standards and Technology natural language processing portable document format Posterior Constraints Alignment Toolkit Society for Critical Care Medicine synchronous context free grammar statistical machine translation Stanford Research Institute Language Modeling Unified Medical Language System 10
19 NO. OF COPIES ORGANIZATION 1 PDF ADMNSTR DEFNS TECHL INFO CTR ATTN DTIC OCP 8725 JOHN J KINGMAN RD STE 0944 FT BELVOIR VA HCS US ARMY RSRCH LAB 1 PDF ATTN RDRL CII B BROOME ATTN RDRL CII T J J MORGAN (12 HCs, 1 PDF) ATTN RDRL CII T M HOLLAND ATTN RDRL CIM P TECHL PUB ATTN RDRL CIM L TECHL LIB ATTN IMNE ALC HRR MAIL & RECORDS MGMT ADELPHI MD HC US ARMY RSRCH LAB ATTN RDRL CIM G T LANDFRIED BLDG 4600 ABERDEEN PROVING GROUND MD TOTAL: 20 (18 HCs, 2 PDFs) 11
20 INTENTIONALLY LEFT BLANK. 12
Simulation of Air Flow Through a Test Chamber
Simulation of Air Flow Through a Test Chamber by Gregory K. Ovrebo ARL-MR- 0680 December 2007 Approved for public release; distribution unlimited. NOTICES Disclaimers The findings in this report are not
Hybrid Machine Translation Guided by a Rule Based System
Hybrid Machine Translation Guided by a Rule Based System Cristina España-Bonet, Gorka Labaka, Arantza Díaz de Ilarraza, Lluís Màrquez Kepa Sarasola Universitat Politècnica de Catalunya University of the
Collaborative Machine Translation Service for Scientific texts
Collaborative Machine Translation Service for Scientific texts Patrik Lambert [email protected] Jean Senellart Systran SA [email protected] Laurent Romary Humboldt Universität Berlin
THUTR: A Translation Retrieval System
THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1
Building a Web-based parallel corpus and filtering out machinetranslated
Building a Web-based parallel corpus and filtering out machinetranslated text Alexandra Antonova, Alexey Misyurev Yandex 16, Leo Tolstoy St., Moscow, Russia {antonova, misyurev}@yandex-team.ru Abstract
Detailed Alignment Procedure for the JEOL 2010F Transmission Electron Microscope
Detailed Alignment Procedure for the JEOL 2010F Transmission Electron Microscope by Wendy Sarney ARL-MR-603 December 2004 Approved for public release; distribution unlimited. NOTICES Disclaimers The findings
Overview Presented by: Boyd L. Summers
Overview Presented by: Boyd L. Summers Systems & Software Technology Conference SSTC May 19 th, 2011 1 Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection
Human Body Mesh With Maya
Animating a Human Body Mesh with Maya for Doppler Signature Computer Modeling by Getachew Kirose ARL-TN-0351 June 2009 Approved for public release; distribution unlimited. NOTICES Disclaimers The findings
7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan
7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan We explain field experiments conducted during the 2009 fiscal year in five areas of Japan. We also show the experiments of evaluation
Machine translation techniques for presentation of summaries
Grant Agreement Number: 257528 KHRESMOI www.khresmoi.eu Machine translation techniques for presentation of summaries Deliverable number D4.6 Dissemination level Public Delivery date April 2014 Status Author(s)
Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation
Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation Rabih Zbib, Gretchen Markiewicz, Spyros Matsoukas, Richard Schwartz, John Makhoul Raytheon BBN Technologies
Factored Translation Models
Factored Translation s Philipp Koehn and Hieu Hoang [email protected], [email protected] School of Informatics University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW Scotland, United Kingdom
Report Documentation Page
(c)2002 American Institute Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58. Ncode: an Open Source Bilingual N-gram SMT Toolkit
The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58 Ncode: an Open Source Bilingual N-gram SMT Toolkit Josep M. Crego a, François Yvon ab, José B. Mariño c c a LIMSI-CNRS, BP 133,
Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation
Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation Joern Wuebker M at thias Huck Stephan Peitz M al te Nuhn M arkus F reitag Jan-Thorsten Peter Saab M ansour Hermann N e
EAD Expected Annual Flood Damage Computation
US Army Corps of Engineers Hydrologic Engineering Center Generalized Computer Program EAD Expected Annual Flood Damage Computation User's Manual March 1989 Original: June 1977 Revised: August 1979, February
The Prague Bulletin of Mathematical Linguistics NUMBER 95 APRIL 2011 5 18. A Guide to Jane, an Open Source Hierarchical Translation Toolkit
The Prague Bulletin of Mathematical Linguistics NUMBER 95 APRIL 2011 5 18 A Guide to Jane, an Open Source Hierarchical Translation Toolkit Daniel Stein, David Vilar, Stephan Peitz, Markus Freitag, Matthias
Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output
Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output Christian Federmann Language Technology Lab, German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, D-66123 Saarbrücken,
73rd MORSS CD Cover Page UNCLASSIFIED DISCLOSURE FORM CD Presentation
73rd MORSS CD Cover Page UNCLASSIFIED DISCLOSURE FORM CD Presentation 21-23 June 2005, at US Military Academy, West Point, NY 712CD For office use only 41205 Please complete this form 712CD as your cover
The Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish
The Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish Oscar Täckström Swedish Institute of Computer Science SE-16429, Kista, Sweden [email protected]
DEFENSE CONTRACT AUDIT AGENCY
DEFENSE CONTRACT AUDIT AGENCY Fundamental Building Blocks for an Acceptable Accounting System Presented by Sue Reynaga DCAA Branch Manager San Diego Branch Office August 24, 2011 Report Documentation Page
SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation Robert C. Moore Chris Quirk Microsoft Research Redmond, WA 98052, USA {bobmoore,chrisq}@microsoft.com
DCAA and the Small Business Innovative Research (SBIR) Program
Defense Contract Audit Agency (DCAA) DCAA and the Small Business Innovative Research (SBIR) Program Judice Smith and Chang Ford DCAA/Financial Liaison Advisors NAVAIR 2010 Small Business Aviation Technology
Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System
Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System Eunah Cho, Jan Niehues and Alex Waibel International Center for Advanced Communication Technologies
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 37 46. Training Phrase-Based Machine Translation Models on the Cloud
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 37 46 Training Phrase-Based Machine Translation Models on the Cloud Open Source Machine Translation Toolkit Chaski Qin Gao, Stephan
RT 24 - Architecture, Modeling & Simulation, and Software Design
RT 24 - Architecture, Modeling & Simulation, and Software Design Dennis Barnabe, Department of Defense Michael zur Muehlen & Anne Carrigy, Stevens Institute of Technology Drew Hamilton, Auburn University
John Mathieson US Air Force (WR ALC) Systems & Software Technology Conference Salt Lake City, Utah 19 May 2011
John Mathieson US Air Force (WR ALC) Systems & Software Technology Conference Salt Lake City, Utah 19 May 2011 Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,
UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation
UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation Liang Tian 1, Derek F. Wong 1, Lidia S. Chao 1, Paulo Quaresma 2,3, Francisco Oliveira 1, Yi Lu 1, Shuo Li 1, Yiming
Guide to Using DoD PKI Certificates in Outlook 2000
Report Number: C4-017R-01 Guide to Using DoD PKI Certificates in Outlook 2000 Security Evaluation Group Author: Margaret Salter Updated: April 6, 2001 Version 1.0 National Security Agency 9800 Savage Rd.
Issue Paper. Wargaming Homeland Security and Army Reserve Component Issues. By Professor Michael Pasquarett
Issue Paper Center for Strategic Leadership, U.S. Army War College May 2003 Volume 04-03 Wargaming Homeland Security and Army Reserve Component Issues By Professor Michael Pasquarett Background The President
Asset Management- Acquisitions
Indefinite Delivery, Indefinite Quantity (IDIQ) contracts used for these services: al, Activity & Master Plans; Land Use Analysis; Anti- Terrorism, Circulation & Space Management Studies; Encroachment
Leveraging Open Source Software to Create Technical Animations of Scientific Data
Leveraging Open Source Software to Create Technical Animations of Scientific Data by John M. Vines ARL-TR-3918 September 2006 Approved for public release; distribution is unlimited. NOTICES Disclaimers
An Online Service for SUbtitling by MAchine Translation
SUMAT CIP-ICT-PSP-270919 An Online Service for SUbtitling by MAchine Translation Annual Public Report 2012 Editor(s): Contributor(s): Reviewer(s): Status-Version: Arantza del Pozo Mirjam Sepesy Maucec,
CAPTURE-THE-FLAG: LEARNING COMPUTER SECURITY UNDER FIRE
CAPTURE-THE-FLAG: LEARNING COMPUTER SECURITY UNDER FIRE LCDR Chris Eagle, and John L. Clark Naval Postgraduate School Abstract: Key words: In this paper, we describe the Capture-the-Flag (CTF) activity
Pima Community College Planning Grant For Autonomous Intelligent Network of Systems (AINS) Science, Mathematics & Engineering Education Center
Pima Community College Planning Grant For Autonomous Intelligent Network of Systems (AINS) Science, Mathematics & Engineering Education Center Technical Report - Final Award Number N00014-03-1-0844 Mod.
Improving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora
mproving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora Rajdeep Gupta, Santanu Pal, Sivaji Bandyopadhyay Department of Computer Science & Engineering Jadavpur University Kolata
DEFENSE BUSINESS PRACTICE IMPLEMENTATION BOARD
Defense Business Practice Implementation Board DEFENSE BUSINESS PRACTICE IMPLEMENTATION BOARD Report to the Senior Executive Council, Department of Defense MANAGEMENT INFORMATION TASK GROUP Report FY02-3
UEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT
UEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT Eva Hasler ILCC, School of Informatics University of Edinburgh [email protected] Abstract We describe our systems for the SemEval
Synchronization of Unique Identifiers Across Security Domains
Synchronization of Unique Identifiers Across Security Domains by Mark R. Mittrick and Gary S. Moss ARL-MR-712 February 2009 Approved for public release; distribution is unlimited. NOTICES Disclaimers The
An Application of an Iterative Approach to DoD Software Migration Planning
An Application of an Iterative Approach to DoD Software Migration Planning John Bergey Liam O Brien Dennis Smith September 2002 Product Line Practice Initiative Unlimited distribution subject to the copyright.
Turker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
Translating the Penn Treebank with an Interactive-Predictive MT System
IJCLA VOL. 2, NO. 1 2, JAN-DEC 2011, PP. 225 237 RECEIVED 31/10/10 ACCEPTED 26/11/10 FINAL 11/02/11 Translating the Penn Treebank with an Interactive-Predictive MT System MARTHA ALICIA ROCHA 1 AND JOAN
REPORT DOCUMENTATION PAGE *
REPORT DOCUMENTATION PAGE * Form Approved OMBNo. 07040188 Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
The noisier channel : translation from morphologically complex languages
The noisier channel : translation from morphologically complex languages Christopher J. Dyer Department of Linguistics University of Maryland College Park, MD 20742 [email protected] Abstract This paper
Interagency National Security Knowledge and Skills in the Department of Defense
INSTITUTE FOR DEFENSE ANALYSES Interagency National Security Knowledge and Skills in the Department of Defense June 2014 Approved for public release; distribution is unlimited. IDA Document D-5204 Log:
TITLE: The Impact Of Prostate Cancer Treatment-Related Symptoms On Low-Income Latino Couples
AD Award Number: W81WH-07-1-0069 TITLE: The Impact Of Prostate Cancer Treatment-Related Symptoms On Low-Income Latino Couples PRINCIPAL INVESTIGATOR: Sally L. Maliski, Ph.D., R.N. CONTRACTING ORGANIZATION:
Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation
Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation Mu Li Microsoft Research Asia [email protected] Dongdong Zhang Microsoft Research Asia [email protected]
THE MIMOSA OPEN SOLUTION COLLABORATIVE ENGINEERING AND IT ENVIRONMENTS WORKSHOP
THE MIMOSA OPEN SOLUTION COLLABORATIVE ENGINEERING AND IT ENVIRONMENTS WORKSHOP By Dr. Carl M. Powe, Jr. 2-3 March 2005 Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden
Integrated Force Method Solution to Indeterminate Structural Mechanics Problems
NASA/TP 2004-207430 Integrated Force Method Solution to Indeterminate Structural Mechanics Problems Surya N. Patnaik Ohio Aerospace Institute, Brook Park, Ohio Dale A. Hopkins and Gary R. Halford Glenn
FIRST IMPRESSION EXPERIMENT REPORT (FIER)
THE MNE7 OBJECTIVE 3.4 CYBER SITUATIONAL AWARENESS LOE FIRST IMPRESSION EXPERIMENT REPORT (FIER) 1. Introduction The Finnish Defence Forces Concept Development & Experimentation Centre (FDF CD&E Centre)
Applying Statistical Post-Editing to. English-to-Korean Rule-based Machine Translation System
Applying Statistical Post-Editing to English-to-Korean Rule-based Machine Translation System Ki-Young Lee and Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research
Domain Adaptation for Medical Text Translation Using Web Resources
Domain Adaptation for Medical Text Translation Using Web Resources Yi Lu, Longyue Wang, Derek F. Wong, Lidia S. Chao, Yiming Wang, Francisco Oliveira Natural Language Processing & Portuguese-Chinese Machine
Mr. Steve Mayer, PMP, P.E. McClellan Remediation Program Manger Air Force Real Property Agency. May 11, 2011
Mr. Steve Mayer, PMP, P.E. McClellan Remediation Program Manger Air Force Real Property Agency May 11, 2011 Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection
Advanced Micro Ring Resonator Filter Technology
Advanced Micro Ring Resonator Filter Technology G. Lenz and C. K. Madsen Lucent Technologies, Bell Labs Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection
AFRL-RX-WP-TP-2008-4023
AFRL-RX-WP-TP-2008-4023 HOW KILLDEER MOUNTAIN MANUFACTURING IS OPTIMIZING AEROSPACE SUPPLY CHAIN VISIBILITY USING RFID (Postprint) Jeanne Duckett Killdeer Mountain Manufacturing, Inc. FEBRUARY 2008 Final
2010 2011 Military Health System Conference
2010 2011 Military Health System Conference Population Health Management The Missing Element of PCMH Sharing The Quadruple Knowledge: Aim: Working Achieving Together, Breakthrough Achieving Performance
ELECTRONIC HEALTH RECORDS. Fiscal Year 2013 Expenditure Plan Lacks Key Information Needed to Inform Future Funding Decisions
United States Government Accountability Office Report to Congressional July 2014 ELECTRONIC HEALTH RECORDS Fiscal Year 2013 Expenditure Plan Lacks Key Information Needed to Inform Future Funding Decisions
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
In June 1998 the Joint Military Intelligence. Intelligence Education for Joint Warfighting A. DENIS CLIFT
Defense Intelligence Analysis Center, home of JMIC. Intelligence Education for Joint Warfighting Courtesy Joint Military Intelligence College By A. DENIS CLIFT In June 1998 the Joint Military Intelligence
An Implicit Three-Dimensional Meteorological Message for Artillery Trajectory Calculation
An Implicit Three-Dimensional Meteorological Message for Artillery Trajectory Calculation by Patrick A. Haines, David J. Epler II, James L. Cogan, Sean G. O Brien, Benjamin T. MacCall, and Brian P. Reen
Computer Aided Translation
Computer Aided Translation Philipp Koehn 30 April 2015 Why Machine Translation? 1 Assimilation reader initiates translation, wants to know content user is tolerant of inferior quality focus of majority
Algorithmic Research and Software Development for an Industrial Strength Sparse Matrix Library for Parallel Computers
The Boeing Company P.O.Box3707,MC7L-21 Seattle, WA 98124-2207 Final Technical Report February 1999 Document D6-82405 Copyright 1999 The Boeing Company All Rights Reserved Algorithmic Research and Software
HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN
HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN Yu Chen, Andreas Eisele DFKI GmbH, Saarbrücken, Germany May 28, 2010 OUTLINE INTRODUCTION ARCHITECTURE EXPERIMENTS CONCLUSION SMT VS. RBMT [K.
Microstructural Evaluation of KM4 and SR3 Samples Subjected to Various Heat Treatments
NASA/TM 2004-213140 Microstructural Evaluation of KM4 and SR3 Samples Subjected to Various Heat Treatments David Ellis and Timothy Gabb Glenn Research Center, Cleveland, Ohio Anita Garg University of Toledo,
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
TRANSREAD LIVRABLE 3.1 QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS. Projet ANR 201 2 CORD 01 5
Projet ANR 201 2 CORD 01 5 TRANSREAD Lecture et interaction bilingues enrichies par les données d'alignement LIVRABLE 3.1 QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS Avril 201 4
HEC-DSS Add-In Excel Data Exchange for Excel 2007-2010
HEC-DSS Add-In Excel Data Exchange for Excel 2007-2010 User's Manual Version 3.2.1 January 2010 Approved for Public Release. Distribution Unlimited. CPD-79b REPORT DOCUMENTATION PAGE Form Approved OMB
UMass at TREC 2008 Blog Distillation Task
UMass at TREC 2008 Blog Distillation Task Jangwon Seo and W. Bruce Croft Center for Intelligent Information Retrieval University of Massachusetts, Amherst Abstract This paper presents the work done for
An Oil-Free Thrust Foil Bearing Facility Design, Calibration, and Operation
NASA/TM 2005-213568 An Oil-Free Thrust Foil Bearing Facility Design, Calibration, and Operation Steve Bauman Glenn Research Center, Cleveland, Ohio March 2005 The NASA STI Program Office... in Profile
Addressing the Real-World Challenges in the Development of Propulsion IVHM Technology Experiment (PITEX)
NASA/CR 2005-213422 AIAA 2004 6361 Addressing the Real-World Challenges in the Development of Propulsion IVHM Technology Experiment (PITEX) William A. Maul, Amy Chicatelli, and Christopher E. Fulton Analex
Automatic slide assignation for language model adaptation
Automatic slide assignation for language model adaptation Applications of Computational Linguistics Adrià Agustí Martínez Villaronga May 23, 2013 1 Introduction Online multimedia repositories are rapidly
I N S T I T U T E F O R D E FE N S E A N A L Y S E S NSD-5216
I N S T I T U T E F O R D E FE N S E A N A L Y S E S NSD-5216 A Consistent Approach for Security Risk Assessments of Dams and Related Critical Infrastructure J. Darrell Morgeson Jason A. Dechant Yev Kirpichevsky
