Multilingual Information Retrieval Using English and Chinese Queries
1 Multilingual Information Retrieval Using English and Chinese Queries. Aitao Chen, School of Information Management and Systems, University of California, Berkeley. CLEF 2001 Workshop, 3-4 September 2001, Darmstadt, Germany
2 Outline: Overview of what we did at CLEF-2001; German decompounding; Chinese topics translation; Merging strategies and alternative methods; Conclusions
3 Participation in CLEF-2001: Monolingual task (German and Spanish); Bilingual task (Chinese to English); Multilingual task (English and Chinese)
4 Overview of Multilingual Information Retrieval Using English Queries [Diagram: the English query is translated by SYSTRAN and L&H into French, German, Italian, and Spanish. Each query is run against the documents in its own language (English, French, German, Italian, and Spanish docs), and a merger combines the result lists into one combined ranked list of documents.]
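5 Overview of Multilingual Information Retrieval Using Chinese Queries [Diagram: the Chinese query is translated into an English query using a bilingual dictionary, parallel texts, and a search engine. The English query is then translated by SYSTRAN and L&H into French, German, Italian, and Spanish. Each query is run against the documents in its own language (English, French, German, Italian, and Spanish docs), and a merger combines the result lists into one combined ranked list of documents.]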
6 German Decompounding Procedure
1. Create a German base dictionary consisting of single words only (compounds are excluded).
2. Decompose a compound into component words found in the German base dictionary.
3. Choose the decomposition with the minimum number of component words.
4. If more than one decomposition has the minimum number of component words, choose the one with the highest probability.
7 German Decompounding: Example 1
Compound: filmfestspiele (film festival)
1. Base dictionary: film, fest, fests, festspiele, piele, s, spiele
2. Decompositions:
   1. film fest s piele
   2. film fest spiele
   3. film fests piele
   4. film festspiele
3. Result: filmfestspiele = film festspiele
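The enumeration and the minimum-component rule from the procedure can be sketched in a few lines of Python. This is only an illustration, not the code used for the CLEF runs: the function names are hypothetical, and the word list is the one implied by the four decompositions in Example 1 (so it includes "spiele").

```python
def decompositions(compound, base_dictionary):
    """Return every way to split `compound` into words from `base_dictionary`."""
    if not compound:
        return [[]]  # one way to split the empty string: no words at all
    results = []
    for end in range(1, len(compound) + 1):
        head = compound[:end]
        if head in base_dictionary:
            # Recurse on the remainder and prepend the matched head word.
            for tail in decompositions(compound[end:], base_dictionary):
                results.append([head] + tail)
    return results


def shortest_decompositions(compound, base_dictionary):
    """Keep only the decompositions with the fewest component words."""
    candidates = decompositions(compound, base_dictionary)
    if not candidates:
        return []
    fewest = min(len(candidate) for candidate in candidates)
    return [c for c in candidates if len(c) == fewest]


# Word list assumed from Example 1 (hypothetical reconstruction).
base = {"film", "fest", "fests", "festspiele", "piele", "s", "spiele"}
all_splits = decompositions("filmfestspiele", base)
best = shortest_decompositions("filmfestspiele", base)
print(all_splits)
print(best)
```

The recursion is exponential in the worst case, which is acceptable for single words; a production system would more likely memoize on the remaining suffix.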
8 German Decompounding: Example 2
Compound: hungerstreiks (hunger strike)
1. Base dictionary: erst, hung, hunger, hungers, hungerst, reik, reiks, s, streik, streiks
2. Decompositions, each scored by log p(d):
   1. hung erst reik s
   2. hung erst reiks
   3. hunger streik s
   4. hunger streiks
   5. hungerst reik s
   6. hungerst reiks
3. Result: hungerstreiks = hunger streiks
9 German Decompounding: Probability of Decomposition
For a compound $C = W_1 W_2 W_3 W_4$: $p(C) = p(W_1)\,p(W_2)\,p(W_3)\,p(W_4)$, where $p(w) = \mathrm{tfc}(w) / \sum_{i=1}^{n} \mathrm{tfc}(w_i)$.
tfc(w) is the number of times word w occurs in a corpus. n is the number of unique words (including compounds) in a corpus.
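A minimal sketch of the tie-breaking step follows, assuming word frequencies are available as a dictionary. The counts in `tfc` are invented purely for illustration and are not taken from any CLEF corpus. The helper names are hypothetical, and log probabilities are summed instead of multiplying raw probabilities to avoid underflow.

```python
import math


def word_probability(word, tfc):
    """p(w) = tfc(w) / sum of tfc over all unique words in the corpus."""
    total = sum(tfc.values())
    return tfc[word] / total


def log_probability(decomposition, tfc):
    """log p(C) = sum of log p(W_i) over the component words."""
    return sum(math.log(word_probability(w, tfc)) for w in decomposition)


def pick_most_probable(candidates, tfc):
    """Among equally short decompositions, choose the most probable one."""
    return max(candidates, key=lambda d: log_probability(d, tfc))


# Hypothetical corpus counts (made up for this example).
tfc = {"hunger": 120, "streiks": 40, "hungerst": 1, "reiks": 1, "film": 300}

# The two shortest decompositions of "hungerstreiks" from Example 2.
candidates = [["hungerst", "reiks"], ["hunger", "streiks"]]
chosen = pick_most_probable(candidates, tfc)
print(chosen)
```

With these made-up counts the frequent words "hunger" and "streiks" win over the rare fragments, which is the behaviour the slide's result illustrates.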
10 German Decompounding: Failed Cases
1. erdatmosphäre = erde + atmosphäre (earth atmosphere)
2. mittagessenzeit = mittag essen zeit (noon meal time), instead of mittagessen zeit (lunch time)
3. And others
11 German Decompounding and Monolingual Retrieval Performance
Test collection | -Decompounding -Stemming -Expansion | +Decompounding | Change
CLEF-2001 (49/225K) | .3673 (1877/2130) | .4314 (1949/2130) | %
CLEF-2000 (37/154K) | .3189 (673/821) | .4112 (770/821) | %
TREC-6/7/8 (73/252K) | .2993 (1907/2626) | .3368 (2172/2626) | %
Only component words of compounds are kept in the queries.
12 German Monolingual Retrieval Performance [Precision-recall plot. Runs: BK2GGA1 (.4050), BK2GGA2 (.3551), bk2gga1* (.4436).] Features: +stemming, +decompounding, -expansion
13 Overview of Chinese to English Retrieval [Diagram. Preprocessing: Chinese topics, segmentation, stopwords removal, de-segmentation. Translation resources: LDC bilingual wordlist, bilingual dict (parallel texts), Chinese search engine, each followed by term selection. Term selection & weighting: term merging & weighting, giving English queries (in English words). Monolingual English retrieval system: English docs, retrieval results.]
14 Chinese Topics Preprocessing: De-segmentation
15 Translation Resources: Creation of Bilingual Dictionary From Parallel Texts. Parallel texts: Hong Kong news (4/98-4/2001) and FBIS Chinese collection. Document alignment: + LDC wordlist. Paragraph & sentence alignment: adapted from Gale and Church's length-based model. Association measure: Dunning's maximum likelihood ratio statistic.
16 Term Translation Using Search Engine
17 Term Selection, Merging, and Weighting
(1) Top-3 translations of Chinese word C1 from LDC wordlist: E1 1, E2 1, E3 1. Translations are ranked by occurrence frequency in the LA Times collection.
(2) Top-2 translations of Chinese word C1 from parallel texts: E3 1, E4 1. Translations are ranked by association weight.
Merged translations of C1: E1 1, E2 1, E3 2, E4 1.
Normalized weights: E1 .20, E2 .20, E3 .40, E4 .20.
Original query term frequency of C1: 2.
Final term weights for translations of C1: E1 .40, E2 .40, E3 .80, E4 .40.
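The arithmetic on this slide can be reproduced with a short Python sketch. It assumes, based only on the numbers shown, that the merged counts are divided by their sum and then multiplied by the original query term frequency. The function name `merge_and_weight` is hypothetical.

```python
from collections import Counter


def merge_and_weight(ldc_translations, parallel_translations, query_tf):
    """Merge two translation lists and weight each English term.

    Each selected translation contributes a count of 1; a term proposed by
    both resources therefore gets a count of 2. Counts are normalized to sum
    to 1 (an assumption inferred from the slide) and scaled by the frequency
    of the Chinese word in the original query.
    """
    counts = Counter(ldc_translations) + Counter(parallel_translations)
    total = sum(counts.values())
    return {term: query_tf * count / total for term, count in counts.items()}


# Values from the slide: top-3 from the LDC wordlist, top-2 from parallel texts,
# and C1 occurring twice in the original query.
weights = merge_and_weight(["E1", "E2", "E3"], ["E3", "E4"], query_tf=2)
print(weights)
```

A translation that appears in both resources (E3 here) ends up with twice the weight of the others, which is how agreement between resources is rewarded.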
18 Translation Resources Versus Chinese-to-English Performance [Precision-recall plot. Runs: LDC+HKF+YAHOO (.4112), LDC+HKF (.3599), LDC (.2679), HKF (.2675), Mono (.5553).]
19 Multilingual Information Retrieval: Merging Strategy
Input: one ranked list of 1000 (document, score) pairs per collection: English docs (E1 e1, E2 e2, ..., E1000 e1000), French docs (F1 f1, ..., F1000 f1000), Italian docs (I1 i1, ..., I1000 i1000), German docs (G1 g1, ..., G1000 g1000), Spanish docs (S1 s1, ..., S1000 s1000).
Adjusted weights, English: .8*e1 + 1, .8*e2 + 1, ..., .8*e50 + 1 for the top 50 documents; .8*e51, ..., .8*e1000 for the rest.
Adjusted weights, French: f1 + 1, f2 + 1, ..., f50 + 1 for the top 50 documents; f51, ..., f1000 for the rest. Italian, German, and Spanish scores (i, g, s) are adjusted in the same way.
(1) combine lists; (2) sort by adjusted weight; (3) take top 1000 docs
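The three merging steps can be sketched as follows. This is an illustrative reading of the slide, not the original implementation: the function names, the dictionary layout, and the toy scores are assumptions, and `bonus_depth` is set to 1 in the example only so that the tiny lists show the effect (the slide uses the top 50).

```python
def adjust(ranked, scale=1.0, bonus=1.0, bonus_depth=50):
    """Adjust scores of one ranked list of (doc_id, score) pairs.

    Every score is multiplied by `scale`; the first `bonus_depth` documents
    additionally receive `bonus`, which keeps them ahead of lower-ranked
    documents from any other collection.
    """
    adjusted = []
    for rank, (doc_id, score) in enumerate(ranked, start=1):
        weight = scale * score
        if rank <= bonus_depth:
            weight += bonus
        adjusted.append((doc_id, weight))
    return adjusted


def merge(ranked_lists, scales, limit=1000, bonus_depth=50):
    """(1) combine lists; (2) sort by adjusted weight; (3) take the top `limit`."""
    combined = []
    for language, ranked in ranked_lists.items():
        scale = scales.get(language, 1.0)
        combined.extend(adjust(ranked, scale=scale, bonus_depth=bonus_depth))
    combined.sort(key=lambda pair: pair[1], reverse=True)
    return combined[:limit]


# Toy input with made-up scores; only English is scaled by .8, as on the slide.
runs = {
    "english": [("E1", 0.9), ("E2", 0.5)],
    "french": [("F1", 0.7), ("F2", 0.6)],
}
merged = merge(runs, scales={"english": 0.8}, bonus_depth=1)
print(merged)
```

Adding a constant to the top-ranked documents of every list guarantees that each collection contributes its best documents to the head of the merged list, regardless of how the raw scores compare across languages.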
20 Performance of Multilingual Information Retrieval Using English Long Queries [Diagram: the English query is translated by SYSTRAN and L&H into French, German, Italian, and Spanish, and each query is run against the documents in its own language. Per-collection results: English docs (.5553), French docs (.4776), German docs (.3789), Italian docs (.3934), Spanish docs (.4703). Merger: combined ranked list of documents (.3424).]
21 Performance of Multilingual Information Retrieval Using Chinese Long Queries [Diagram: the original Chinese query is translated into an English query (.4122) using a bilingual dictionary, parallel texts, and a search engine; the English query is then translated by SYSTRAN and L&H into French, German, Italian, and Spanish. Per-collection results: English docs (.4122), French docs (.2874), German docs (.2619), Italian docs (.2509), Spanish docs (.2942). Merger: combined ranked list of documents (.2217).]
22 Multilingual Information Retrieval: Alternative Merging Strategy
Input: one ranked list of 1000 (document, score) pairs per collection: English docs (E1 e1, E2 e2, ..., E1000 e1000), French docs (F1 f1, ..., F1000 f1000), Italian docs (I1 i1, ..., I1000 i1000), German docs (G1 g1, ..., G1000 g1000), Spanish docs (S1 s1, ..., S1000 s1000).
Adjusted weights: each score is divided by the top score of its own list. English: e1/e1, e2/e1, ..., e1000/e1. French: f1/f1, f2/f1, ..., f1000/f1. Italian: i1/i1, i2/i1, ..., i1000/i1. German: g1/g1, g2/g1, ..., g1000/g1. Spanish: s1/s1, s2/s1, ..., s1000/s1.
(1) combine lists; (2) sort by adjusted weight; (3) take top 1000 docs
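A small Python sketch of this normalized merging follows. The function names and toy scores are assumptions made for illustration, and the code presumes each list is already sorted by descending score with a positive top score.

```python
def normalize(ranked):
    """Divide every score in a ranked list by the score of its top document.

    Assumes `ranked` is sorted by descending score and the top score is > 0.
    """
    if not ranked:
        return []
    top_score = ranked[0][1]
    return [(doc_id, score / top_score) for doc_id, score in ranked]


def normalized_merge(ranked_lists, limit=1000):
    """(1) combine lists; (2) sort by normalized score; (3) take the top `limit`."""
    combined = []
    for ranked in ranked_lists:
        combined.extend(normalize(ranked))
    # Python's sort is stable, so ties keep the order in which lists were added.
    combined.sort(key=lambda pair: pair[1], reverse=True)
    return combined[:limit]


# Toy input with made-up scores.
english = [("E1", 0.9), ("E2", 0.45)]
german = [("G1", 0.4), ("G2", 0.3)]
merged_normalized = normalized_merge([english, german])
print(merged_normalized)
```

After normalization every list's top document has weight 1, so a collection with systematically low raw scores (German in this toy input) is no longer pushed to the bottom of the merged list.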
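23 Multilingual Information Retrieval: Alternative Method 1 [Diagram: a translator turns the English query into French, German, Italian, and Spanish; together these form a multilingual query. The search engine runs the multilingual query against a multilingual document collection (English, French, German, Italian, Spanish) and returns a ranked list of docs in multiple languages.]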
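24 Multilingual Information Retrieval: Alternative Method 2 [Diagram: the original French, German, Italian, and Spanish documents each pass through a translator to produce translated documents. The search engine runs the English query against the translated documents and returns a ranked list of docs in English.]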
25 Multilingual Information Retrieval: Alternative Method 3 [Diagram: a translator turns the English query into French, German, Italian, and Spanish, and each query retrieves documents in its own language (English, French, German, Italian, and Spanish docs). The retrieved non-English documents each pass through a translator, and the resulting document sets feed a combined ranked list of documents.]
26 Performance of Different ML Methods [Precision-recall plot. Runs: BK2MUEAA1 (.3424), NormalizedMerging (.3286), ML Alternative 1 (.3126), ML Alternative 3 (.3648).]
27 Conclusions
German decompounding can significantly improve retrieval performance.
Keeping only component words in the query works better than keeping both compounds and component words.
A Chinese search engine is a valuable resource for translating Chinese proper nouns into English.
Merging documents by adjusted probability of relevance works reasonably well.
Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,
More informationQuery Modification through External Sources to Support Clinical Decisions
Query Modification through External Sources to Support Clinical Decisions Raymond Wan 1, Jannifer Hiu-Kwan Man 2, and Ting-Fung Chan 1 1 School of Life Sciences and the State Key Laboratory of Agrobiotechnology,
More informationMaskinöversättning 2008. F2 Översättningssvårigheter + Översättningsstrategier
Maskinöversättning 2008 F2 Översättningssvårigheter + Översättningsstrategier Flertydighet i källspråket poäng point, points, credit, credits, var verb ->was, were pron -> each adv -> where adj -> every
More informationMultilingual Term Extraction as a Service from Acrolinx. Ben Gottesman Michael Klemme Acrolinx CHAT2013
Multilingual Term Extraction as a Service from Acrolinx Ben Gottesman Michael Klemme Acrolinx CHAT2013 Definitions term extraction: automatically identifying potential terms in a document (corpus) multilingual
More informationAutomatic Text Processing: Cross-Lingual. Text Categorization
Automatic Text Processing: Cross-Lingual Text Categorization Dipartimento di Ingegneria dell Informazione Università degli Studi di Siena Dottorato di Ricerca in Ingegneria dell Informazone XVII ciclo
More informationThe Successful Application of Natural Language Processing for Information Retrieval
The Successful Application of Natural Language Processing for Information Retrieval ABSTRACT In this paper, a novel model for monolingual Information Retrieval in English and Spanish language is proposed.
More informationThe XLDB Group at CLEF 2004
The XLDB Group at CLEF 2004 Nuno Cardoso, Mário J. Silva, and Miguel Costa Grupo XLDB - Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa {ncardoso, mjs, mcosta} at xldb.di.fc.ul.pt
More informationEUROPEAN. Geographic Trend Report for GMAT Examinees
2011 EUROPEAN Geographic Trend Report for GMAT Examinees EUROPEAN Geographic Trend Report for GMAT Examinees The European Geographic Trend Report for GMAT Examinees identifies mobility trends among GMAT
More information