Multilingual Information Retrieval Using English and Chinese Queries

1 Multilingual Information Retrieval Using English and Chinese Queries
Aitao Chen, School of Information Management and Systems, University of California, Berkeley
CLEF 2001 Workshop: 3-4 Sept, 2001, Darmstadt, Germany

2 Outline
Overview of what we did at CLEF-2001
German decompounding
Chinese topics translation
Merging strategies and alternative methods
Conclusions

3 Participation in CLEF-2001
Monolingual task (German and Spanish)
Bilingual task (Chinese to English)
Multilingual task (English and Chinese)

4 Overview of Multilingual Information Retrieval Using English Queries
[Diagram: SYSTRAN and L&H translate the English query into French, German, Italian, and Spanish. Each query is run against the documents in its own language (English, French, German, Italian, and Spanish docs), and a merger produces a combined ranked list of documents.]

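5 Overview of Multilingual Information Retrieval Using Chinese Queries
[Diagram: the Chinese query is translated into an English query using a bilingual dictionary, parallel texts, and a search engine. SYSTRAN and L&H then translate the English query into French, German, Italian, and Spanish. Each query is run against the documents in its own language (English, French, German, Italian, and Spanish docs), and a merger produces a combined ranked list of documents.]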

6 German Decompounding Procedure
1. Create a German base dictionary consisting of single words only (compounds are excluded).
2. Decompose a compound into component words found in the German base dictionary.
3. Choose the decomposition with the minimum number of component words.
4. If more than one decomposition has the minimum number of component words, choose the one with the highest probability.

7 German Decompounding: Example 1
Compound: filmfestspiele (film festival)
1. Base dictionary: film, fest, fests, festspiele, piele, s, spiele
2. Decompositions:
   1. film fest s piele
   2. film fest spiele
   3. film fests piele
   4. film festspiele
3. Result: filmfestspiele = film festspiele

8 German Decompounding: Example 2
Compound: hungerstreiks (hunger strike)
1. Base dictionary: erst, hung, hunger, hungers, hungerst, reik, reiks, s, streik, streiks
2. Decompositions, each scored by log p(d):
   1. hung erst reik s
   2. hung erst reiks
   3. hunger streik s
   4. hunger streiks
   5. hungerst reik s
   6. hungerst reiks
3. Result: hungerstreiks = hunger streiks

9 German Decompounding: Probability of Decomposition
For a compound $C$ decomposed as $C = W_1 W_2 W_3 W_4$:
$$p(C) = p(W_1)\,p(W_2)\,p(W_3)\,p(W_4)$$
$$p(w) = \frac{tfc(w)}{\sum_{i=1}^{n} tfc(w_i)}$$
tfc(w) is the number of times word w occurs in a corpus. n is the number of unique words (including compounds) in the corpus.
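
The decompounding procedure and the probability above can be sketched in a few lines of Python. This is a minimal illustration, not the system used in the experiments: the function names and the frequency table `toy_tfc` are hypothetical, the counts are invented for the example, and the denominator is taken over the toy table rather than over a real corpus.

```python
import math

def decompositions(word, base):
    """Return every split of `word` into words found in `base`."""
    if not word:
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in base:
            for rest in decompositions(word[i:], base):
                results.append([head] + rest)
    return results

def log_prob(parts, tfc, total):
    """log p(C) = sum of log p(W_i), with p(w) = tfc(w) / total."""
    return sum(math.log(tfc[p] / total) for p in parts)

def decompound(word, tfc):
    """Pick the split with the fewest parts; break ties by probability."""
    total = sum(tfc.values())
    candidates = decompositions(word, tfc)
    if not candidates:
        return [word]  # no split found: keep the word as it is
    fewest = min(len(c) for c in candidates)
    shortest = [c for c in candidates if len(c) == fewest]
    return max(shortest, key=lambda c: log_prob(c, tfc, total))

# Invented corpus frequencies for the base dictionary of Example 2.
toy_tfc = {"erst": 50, "hung": 5, "hunger": 40, "hungers": 3, "hungerst": 1,
           "reik": 1, "reiks": 1, "s": 100, "streik": 30, "streiks": 20}

print(decompound("hungerstreiks", toy_tfc))  # -> ['hunger', 'streiks']
```

With these counts, two candidate splits have the minimum of two components (hunger streiks and hungerst reiks), and the probability tie-break selects hunger streiks.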

10 German Decompounding: Failed Cases
1. erdatmosphäre = erde + atmosphäre (earth atmosphere)
2. mittagessenzeit = mittag essen zeit (noon meal time); compare mittagessenzeit = mittagessen zeit (lunch time)
3. And others

11 German Decompounding and Monolingual Retrieval Performance
Both runs: -Stemming, -Expansion.
Test collection          -Decompounding        +Decompounding        Change
CLEF-2001 (49/225K)      .3673 (1877/2130)     .4314 (1949/2130)     %
CLEF-2000 (37/154K)      .3189 (673/821)       .4112 (770/821)       %
TREC-6/7/8 (73/252K)     .2993 (1907/2626)     .3368 (2172/2626)     %
Only component words of compounds are kept in the queries.

12 German Monolingual Retrieval Performance
[Precision-recall plot] Runs: BK2GGA1 (.4050), BK2GGA2 (.3551), bk2gga1* (.4436)
Features: +stemming, +decompounding, -expansion

13 Overview of Chinese to English Retrieval
[Flowchart: preprocessing turns the Chinese topics into Chinese words (segmentation, stopwords removal, de-segmentation). Translation resources: the LDC bilingual wordlist, a bilingual dictionary built from parallel texts, and a Chinese search engine, each followed by term selection. Term selection and weighting: term merging and weighting produce English queries (in words), which a monolingual English retrieval system runs against the English docs to give the retrieval results.]

14 Chinese Topics Preprocessing: De-segmentation

15 Translation Resources: Creation of Bilingual Dictionary From Parallel Texts
Parallel texts: Hong Kong news (4/98-4/2001) and FBIS Chinese collection.
Document alignment: + LDC wordlist.
Paragraph & sentence alignment: adapted from Gale and Church's length-based model.
Association measure: Dunning's maximum likelihood ratio statistic.
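
As an illustration of the association measure, the sketch below computes the log-likelihood ratio statistic (G²) for a 2x2 contingency table, which is the usual form of Dunning's statistic. The slide does not say how the counts were collected; the assumption here is that they count aligned sentence pairs, and the function name and argument names are hypothetical.

```python
import math

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    Assumed meaning of the counts (not stated in the slides):
      k11: aligned pairs containing both the Chinese and the English word
      k12: pairs with the Chinese word only
      k21: pairs with the English word only
      k22: pairs with neither word
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g2 = 0.0
    for observed, r, c in cells:
        if observed > 0:  # a zero cell contributes nothing to the sum
            g2 += observed * math.log(observed * n / (rows[r] * cols[c]))
    return 2.0 * g2
```

A table in which the two words occur independently scores 0, and the score grows as the words co-occur more often than chance predicts, so candidate translations can be ranked by this value.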

16 Term Translation Using Search Engine

17 Term Selection, Merging, and Weighting
(1) Top-3 translations of Chinese word C1 from LDC wordlist: E1 1, E2 1, E3 1. Translations are ranked by occurrence frequency in the LA Times collection.
(2) Top-2 translations of Chinese word C1 from parallel texts: E3 1, E4 1. Translations are ranked by association weight.
Merged translations: E1 1, E2 1, E3 2, E4 1
Normalized weights: E1 .20, E2 .20, E3 .40, E4 .20
Original query term frequency of C1: C1 2
Final term weights for translations of C1: E1 .40, E2 .40, E3 .80, E4 .40
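
The numbers on this slide are consistent with a simple merge, normalize, and scale computation, sketched below. The function name `weight_translations` and its arguments are hypothetical, and the sketch assumes that every selected translation contributes a count of 1 from each resource that proposes it.

```python
from collections import Counter

def weight_translations(candidate_lists, query_tf):
    """Merge translation lists, normalize the counts, scale by query tf."""
    counts = Counter()
    for translations in candidate_lists:
        counts.update(translations)  # each resource adds 1 per translation
    total = sum(counts.values())
    return {term: query_tf * count / total for term, count in counts.items()}

ldc_top3 = ["E1", "E2", "E3"]    # from the LDC wordlist
parallel_top2 = ["E3", "E4"]     # from the parallel texts
weights = weight_translations([ldc_top3, parallel_top2], query_tf=2)
```

E3 is proposed by both resources, so it ends up with twice the weight of the other translations.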

18 Translation Resources Versus Chinese-to-English Performance
[Precision-recall plot] Runs: LDC+HKF+YAHOO (.4112), LDC+HKF (.3599), LDC (.2679), HKF (.2675), Mono (.5553)

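19 Multilingual Information Retrieval: Merging Strategy
Input: five ranked lists of 1000 documents each, with their scores.
English docs: E1 e1, E2 e2, ..., E50 e50, E51 e51, ..., E1000 e1000
French docs: F1 f1, F2 f2, ..., F50 f50, F51 f51, ..., F1000 f1000
Italian docs: I1 i1, I2 i2, ..., I50 i50, I51 i51, ..., I1000 i1000
German docs: G1 g1, G2 g2, ..., G50 g50, G51 g51, ..., G1000 g1000
Spanish docs: S1 s1, S2 s2, ..., S50 s50, S51 s51, ..., S1000 s1000
Adjusted weights: English scores are multiplied by .8, and 1 is added to the score of the top 50 documents in every list.
English docs: E1 .8*e1 + 1, E2 .8*e2 + 1, ..., E50 .8*e50 + 1, E51 .8*e51, ..., E1000 .8*e1000
French docs: F1 f1 + 1, F2 f2 + 1, ..., F50 f50 + 1, F51 f51, ..., F1000 f1000
Italian docs: I1 i1 + 1, I2 i2 + 1, ..., I50 i50 + 1, I51 i51, ..., I1000 i1000
German docs: G1 g1 + 1, G2 g2 + 1, ..., G50 g50 + 1, G51 g51, ..., G1000 g1000
Spanish docs: S1 s1 + 1, S2 s2 + 1, ..., S50 s50 + 1, S51 s51, ..., S1000 s1000
(1) combine lists; (2) sort by adjusted weight; (3) take top 1000 docs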
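
The merging steps above can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: the function name, the parameter names, and the `"english"` key are hypothetical, and each run is assumed to be a list of (doc_id, score) pairs already sorted by descending score.

```python
def merge_ranked_lists(runs, english_factor=0.8, boost=1.0,
                       boost_depth=50, keep=1000):
    """Combine per-language runs, sort by adjusted weight, keep the top docs.

    `runs` maps a language name to a list of (doc_id, score) pairs
    sorted by descending score.
    """
    pooled = []
    for language, ranked in runs.items():
        factor = english_factor if language == "english" else 1.0
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            adjusted = factor * score
            if rank <= boost_depth:
                adjusted += boost  # lift the head of every list
            pooled.append((doc_id, adjusted))
    pooled.sort(key=lambda pair: pair[1], reverse=True)
    return pooled[:keep]
```

Because scores are at most 1 when they are probabilities of relevance, adding 1 to the top 50 documents of each list places them ahead of every unboosted document in the combined list.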

20 Performance of Multilingual Information Retrieval Using English Long Queries
[Diagram: SYSTRAN and L&H translate the English query into French, German, Italian, and Spanish; each query is run against the documents in its own language.]
English docs    (.5553)
French docs     (.4776)
German docs     (.3789)
Italian docs    (.3934)
Spanish docs    (.4703)
Merger, combined ranked list of documents    (.3424)

21 Performance of Multilingual Information Retrieval Using Chinese Long Queries
[Diagram: the original Chinese query is translated into an English query using a bilingual dictionary, parallel texts, and a search engine (.4122); SYSTRAN and L&H translate the English query into French, German, Italian, and Spanish; each query is run against the documents in its own language.]
English docs    (.4122)
French docs     (.2874)
German docs     (.2619)
Italian docs    (.2509)
Spanish docs    (.2942)
Merger, combined ranked list of documents    (.2217)

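22 Multilingual Information Retrieval: Alternative Merging Strategy
Input: five ranked lists of 1000 documents each, with their scores.
English docs: E1 e1, E2 e2, ..., E50 e50, E51 e51, ..., E1000 e1000
French docs: F1 f1, F2 f2, ..., F50 f50, F51 f51, ..., F1000 f1000
Italian docs: I1 i1, I2 i2, ..., I50 i50, I51 i51, ..., I1000 i1000
German docs: G1 g1, G2 g2, ..., G50 g50, G51 g51, ..., G1000 g1000
Spanish docs: S1 s1, S2 s2, ..., S50 s50, S51 s51, ..., S1000 s1000
Adjusted weights: every score is divided by the score of the top-ranked document in its own list.
English docs: E1 e1/e1, E2 e2/e1, ..., E50 e50/e1, E51 e51/e1, ..., E1000 e1000/e1
French docs: F1 f1/f1, F2 f2/f1, ..., F50 f50/f1, F51 f51/f1, ..., F1000 f1000/f1
Italian docs: I1 i1/i1, I2 i2/i1, ..., I50 i50/i1, I51 i51/i1, ..., I1000 i1000/i1
German docs: G1 g1/g1, G2 g2/g1, ..., G50 g50/g1, G51 g51/g1, ..., G1000 g1000/g1
Spanish docs: S1 s1/s1, S2 s2/s1, ..., S50 s50/s1, S51 s51/s1, ..., S1000 s1000/s1
(1) combine lists; (2) sort by adjusted weight; (3) take top 1000 docs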
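
A minimal sketch of this normalized merging, with the same caveats as any illustration: the function name `normalized_merge` is hypothetical, each run is assumed to be a list of (doc_id, score) pairs sorted by descending score, and the top score of every non-empty run is assumed to be positive.

```python
def normalized_merge(runs, keep=1000):
    """Divide each score by the top score of its own list, then merge.

    `runs` maps a language name to a list of (doc_id, score) pairs
    sorted by descending score.
    """
    pooled = []
    for ranked in runs.values():
        if not ranked:
            continue  # nothing retrieved for this language
        top_score = ranked[0][1]
        pooled.extend((doc_id, score / top_score) for doc_id, score in ranked)
    pooled.sort(key=lambda pair: pair[1], reverse=True)
    return pooled[:keep]
```

After normalization the top document of every list has weight 1, so lists whose raw scores are systematically low are no longer pushed down the combined ranking.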

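23 Multilingual Information Retrieval: Alternative Method 1
[Diagram: a translator turns the English query into French, German, Italian, and Spanish; the translations are combined with the English query into one multilingual query, which a single engine runs against the multilingual document collection to produce a ranked list of docs in multiple languages.]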

24 Multilingual Information Retrieval: Alternative Method 2
[Diagram: translators convert the original French, German, Italian, and Spanish documents into English; an engine runs the English query against the translated documents to produce a ranked list of docs in English.]

25 Multilingual Information Retrieval: Alternative Method 3
[Diagram: a translator turns the English query into French, German, Italian, and Spanish; each query is run against the documents in its own language (English, French, German, Italian, and Spanish docs); a second set of translators converts the retrieved docs into English docs, which are merged into a combined ranked list of documents.]

26 Performance of Different ML Methods
[Precision-recall plot] Runs: BK2MUEAA1 (.3424), NormalizedMerging (.3286), ML Alternative 1 (.3126), ML Alternative 3 (.3648)

27 Conclusions
German decompounding can significantly improve retrieval performance.
Keeping only component words in the query works better than keeping both compounds and component words.
A Chinese search engine is a valuable resource for translating Chinese proper nouns into English.
Merging documents by adjusted probability of relevance works reasonably well.

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,

More information

Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval

Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information

More information

Collecting Polish German Parallel Corpora in the Internet

Collecting Polish German Parallel Corpora in the Internet Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

More information

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation.

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. Miguel Ruiz, Anne Diekema, Páraic Sheridan MNIS-TextWise Labs Dey Centennial Plaza 401 South Salina Street Syracuse, NY 13202 Abstract:

More information

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features , pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of

More information

Using Wikipedia to Translate OOV Terms on MLIR

Using Wikipedia to Translate OOV Terms on MLIR Using to Translate OOV Terms on MLIR Chen-Yu Su, Tien-Chien Lin and Shih-Hung Wu* Department of Computer Science and Information Engineering Chaoyang University of Technology Taichung County 41349, TAIWAN

More information

EuropeanaConnect Multilinguality Survey

EuropeanaConnect Multilinguality Survey EuropeanaConnect Multilinguality Survey Nicola Ferro & Vivien Petras Workshop at ICSD 2009 Trento, Italy 9 September 2009 Background EuropeanaConnect Task 2.1 User studies & multilingual resources use:

More information

How One Word Can Make all the Difference

How One Word Can Make all the Difference How One Word Can Make all the Difference Using Subject Metadata for Automatic Query Expansion and Reformulation Vivien Petras School of Information Management & Systems UC Berkeley Overview Introduction

More information

Using COTS Search Engines and Custom Query Strategies at CLEF

Using COTS Search Engines and Custom Query Strategies at CLEF Using COTS Search Engines and Custom Query Strategies at CLEF David Nadeau, Mario Jarmasz, Caroline Barrière, George Foster, and Claude St-Jacques Language Technologies Research Centre Interactive Language

More information

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS

More information

GeoCLEF Administration. Content. Initial Aim of GeoCLEF. Interesting Issues

GeoCLEF Administration. Content. Initial Aim of GeoCLEF. Interesting Issues 9 th Workshop of the Cross-Language Evaluation Forum (CLEF) Århus, 18 th Sept. 2008 GeoCLEF Administration Joint effort of Fredric Gey, Ray Larson (U. California at Berkeley) Diana Santos (Linguateca,

More information

The Language Grid The Language Grid combines users language resources and machine translators to produce high-quality translation that is customized

The Language Grid The Language Grid combines users language resources and machine translators to produce high-quality translation that is customized The Language Grid The Language Grid combines users language resources and machine translators to produce high-quality translation that is customized to each field. The Language Grid, a software that provides

More information

Simple maths for keywords

Simple maths for keywords Simple maths for keywords Adam Kilgarriff Lexical Computing Ltd adam@lexmasterclass.com Abstract We present a simple method for identifying keywords of one corpus vs. another. There is no one-sizefits-all

More information

Recent developments in machine translation policy at the European Patent Office

Recent developments in machine translation policy at the European Patent Office Recent developments in machine translation policy at the European Patent Office Dr Georg Artelsmair Director European Co-operation European Patent Office Brussels, 17 November 2010 The European Patent

More information

Improving Non-English Web Searching (inews07)

Improving Non-English Web Searching (inews07) SIGIR 2007 WORKSHOP REPORT Improving Non-English Web Searching (inews07) Fotis Lazarinis Technological Educational Institute Mesolonghi, Greece lazarinf@teimes.gr Jesus Vilares Ferro University of A Coruña

More information

Getting Off to a Good Start: Best Practices for Terminology

Getting Off to a Good Start: Best Practices for Terminology Getting Off to a Good Start: Best Practices for Terminology Technologies for term bases, term extraction and term checks Angelika Zerfass, zerfass@zaac.de Tools in the Terminology Life Cycle Extraction

More information

Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure

Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure Fuminori Kimura Faculty of Culture and Information Science, Doshisha University 1 3 Miyakodani Tatara, Kyoutanabe-shi,

More information

Optimizing Multilingual Search With Solr

Optimizing Multilingual Search With Solr www.basistech.com info@basistech.com 617-386-2090 Optimizing Multilingual Search With Solr Pg. 1 INTRODUCTION Today s search application users expect search engines to just work seamlessly across multiple

More information

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines , 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing

More information

BITS: A Method for Bilingual Text Search over the Web

BITS: A Method for Bilingual Text Search over the Web BITS: A Method for Bilingual Text Search over the Web Xiaoyi Ma, Mark Y. Liberman Linguistic Data Consortium 3615 Market St. Suite 200 Philadelphia, PA 19104, USA {xma,myl}@ldc.upenn.edu Abstract Parallel

More information

Ontology-Based Multilingual Information Retrieval

Ontology-Based Multilingual Information Retrieval Ontology-Based Multilingual Information Retrieval Jacques Guyot * Saïd Radhouani *,** Gilles Falquet * * Centre universitaire d informatique 24, rue Général-Dufour, CH-1211 Genève 4, Switzerland ** Laboratoire

More information

How To Access Multilingual Information On The Web With Google And Clir

How To Access Multilingual Information On The Web With Google And Clir Information Access across Languages on the Web: From Search Engines to Digital Libraries Jiangping Chen, Yu Bao Department of Library and Information Sciences, University of North Texas 1155 Union Circle

More information

Embedding Web-Based Statistical Translation Models in Cross-Language Information Retrieval

Embedding Web-Based Statistical Translation Models in Cross-Language Information Retrieval Embedding Web-Based Statistical Translation Models in Cross-Language Information Retrieval Wessel Kraaij Jian-Yun Nie Michel Simard TNO TPD Université de Montréal Université de Montréal Although more and

More information

The University of Lisbon at CLEF 2006 Ad-Hoc Task

The University of Lisbon at CLEF 2006 Ad-Hoc Task The University of Lisbon at CLEF 2006 Ad-Hoc Task Nuno Cardoso, Mário J. Silva and Bruno Martins Faculty of Sciences, University of Lisbon {ncardoso,mjs,bmartins}@xldb.di.fc.ul.pt Abstract This paper reports

More information

Multi language e Discovery Three Critical Steps for Litigating in a Global Economy

Multi language e Discovery Three Critical Steps for Litigating in a Global Economy Multi language e Discovery Three Critical Steps for Litigating in a Global Economy 2 3 5 6 7 Introduction e Discovery has become a pressure point in many boardrooms. Companies with international operations

More information

Overview of MT techniques. Malek Boualem (FT)

Overview of MT techniques. Malek Boualem (FT) Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,

More information

Dutch Parallel Corpus

Dutch Parallel Corpus Dutch Parallel Corpus Lieve Macken lieve.macken@hogent.be LT 3, Language and Translation Technology Team Faculty of Applied Language Studies University College Ghent November 29th 2011 Lieve Macken (LT

More information

THUTR: A Translation Retrieval System

THUTR: A Translation Retrieval System THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for

More information

An Information Retrieval using weighted Index Terms in Natural Language document collections

An Information Retrieval using weighted Index Terms in Natural Language document collections Internet and Information Technology in Modern Organizations: Challenges & Answers 635 An Information Retrieval using weighted Index Terms in Natural Language document collections Ahmed A. A. Radwan, Minia

More information

Introduction. Philipp Koehn. 28 January 2016

Introduction. Philipp Koehn. 28 January 2016 Introduction Philipp Koehn 28 January 2016 Administrativa 1 Class web site: http://www.mt-class.org/jhu/ Tuesdays and Thursdays, 1:30-2:45, Hodson 313 Instructor: Philipp Koehn (with help from Matt Post)

More information

University of Chicago at NTCIR4 CLIR: Multi-Scale Query Expansion

University of Chicago at NTCIR4 CLIR: Multi-Scale Query Expansion University of Chicago at NTCIR4 CLIR: Multi-Scale Query Expansion Gina-Anne Levow University of Chicago 1100 E. 58th St, Chicago, IL 60637, USA levow@cs.uchicago.edu Abstract Pseudo-relevance feedback,

More information

Glossary of translation tool types

Glossary of translation tool types Glossary of translation tool types Tool type Description French equivalent Active terminology recognition tools Bilingual concordancers Active terminology recognition (ATR) tools automatically analyze

More information

Statistical Machine Translation

Statistical Machine Translation Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language

More information

Interoperability, Standards and Open Advancement

Interoperability, Standards and Open Advancement Interoperability, Standards and Open Eric Nyberg 1 Open Shared resources & annotation schemas Shared component APIs Shared datasets (corpora, test sets) Shared software (open source) Shared configurations

More information

A Study of Using an Out-Of-Box Commercial MT System for Query Translation in CLIR

A Study of Using an Out-Of-Box Commercial MT System for Query Translation in CLIR A Study of Using an Out-Of-Box Commercial MT System for Query Translation in CLIR Dan Wu School of Information Management Wuhan University, Hubei, China woodan@whu.edu.cn Daqing He School of Information

More information

How Effective is Google s Translation Service in Search?

How Effective is Google s Translation Service in Search? ACM, 2009. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Communications of the

More information

Question template for interviews

Question template for interviews Question template for interviews This interview template creates a framework for the interviews. The template should not be considered too restrictive. If an interview reveals information not covered by

More information

Morphological Analysis and Named Entity Recognition for your Lucene / Solr Search Applications

Morphological Analysis and Named Entity Recognition for your Lucene / Solr Search Applications Morphological Analysis and Named Entity Recognition for your Lucene / Solr Search Applications Berlin Berlin Buzzwords 2011, Dr. Christoph Goller, IntraFind AG Outline IntraFind AG Indexing Morphological

More information

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University

More information

How effective is Google s translation service in search?

How effective is Google s translation service in search? How effective is Google s translation service in search? Jacques Savoy, Ljiljana Dolamic Computer Science Dept., University of Neuchatel, Rue Emile Argand 11, 2009 Neuchâtel, Switzerland {Jacques.Savoy,

More information

A Comparative Study of Online Translation Services for Cross Language Information Retrieval

A Comparative Study of Online Translation Services for Cross Language Information Retrieval A Comparative Study of Online Translation Services for Cross Language Information Retrieval Ali Hosseinzadeh Vahid, Piyush Arora, Qun Liu, Gareth J. F. Jones ADAPT Centre / CNGL School of Computing Dublin

More information

SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告

SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告 SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:

More information

HPI in-memory-based database system in Task 2b of BioASQ

HPI in-memory-based database system in Task 2b of BioASQ CLEF 2014 Conference and Labs of the Evaluation Forum BioASQ workshop HPI in-memory-based database system in Task 2b of BioASQ Mariana Neves September 16th, 2014 Outline 2 Overview of participation Architecture

More information

Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries

Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries Patanakul Sathapornrungkij Department of Computer Science Faculty of Science, Mahidol University Rama6 Road, Ratchathewi

More information

Overview of iclef 2008: search log analysis for Multilingual Image Retrieval

Overview of iclef 2008: search log analysis for Multilingual Image Retrieval Overview of iclef 2008: search log analysis for Multilingual Image Retrieval Julio Gonzalo Paul Clough Jussi Karlgren UNED U. Sheffield SICS Spain United Kingdom Sweden julio@lsi.uned.es p.d.clough@sheffield.ac.uk

More information

SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统

SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统 SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande

More information

Customizing an English-Korean Machine Translation System for Patent Translation *

Customizing an English-Korean Machine Translation System for Patent Translation * Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,

More information

Cross-Lingual Concern Analysis from Multilingual Weblog Articles

Cross-Lingual Concern Analysis from Multilingual Weblog Articles Cross-Lingual Concern Analysis from Multilingual Weblog Articles Tomohiro Fukuhara RACE (Research into Artifacts), The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba JAPAN http://www.race.u-tokyo.ac.jp/~fukuhara/

More information

Why are Organizations Interested?

Why are Organizations Interested? SAS Text Analytics Mary-Elizabeth ( M-E ) Eddlestone SAS Customer Loyalty M-E.Eddlestone@sas.com +1 (607) 256-7929 Why are Organizations Interested? Text Analytics 2009: User Perspectives on Solutions

More information

TS3: an Improved Version of the Bilingual Concordancer TransSearch

TS3: an Improved Version of the Bilingual Concordancer TransSearch TS3: an Improved Version of the Bilingual Concordancer TransSearch Stéphane HUET, Julien BOURDAILLET and Philippe LANGLAIS EAMT 2009 - Barcelona June 14, 2009 Computer assisted translation Preferred by

More information

Fast-Champollion: A Fast and Robust Sentence Alignment Algorithm

Fast-Champollion: A Fast and Robust Sentence Alignment Algorithm Fast-Champollion: A Fast and Robust Sentence Alignment Algorithm Peng Li and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for

More information

Cross-Lingual Concern Analysis from Multilingual Weblog Articles

Cross-Lingual Concern Analysis from Multilingual Weblog Articles Cross-Lingual Concern Analysis from Multilingual Weblog Articles Tomohiro Fukuhara RACE (Research into Artifacts), The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba JAPAN http://www.race.u-tokyo.ac.jp/~fukuhara/

More information

Improved implementation for finding text similarities in large collections of data

Improved implementation for finding text similarities in large collections of data Improved implementation for finding text similarities in large collections of data Notebook for PAN at CLEF 2011 Ján Grman and udolf avas SVOP Ltd., Bratislava, Slovak epublic {grman,ravas}@svop.sk Abstract.

More information

CACAO PROJECT AT THE LOGCLEF TRACK

CACAO PROJECT AT THE LOGCLEF TRACK CACAO PROJECT AT THE LOGCLEF TRACK Alessio Bosca, Luca Dini Celi s.r.l. - 10131 Torino - C. Moncalieri, 21 alessio.bosca, dini@celi.it Abstract This paper presents the participation of the CACAO prototype

More information

Integra(on of human and machine transla(on. Marcello Federico Fondazione Bruno Kessler MT Marathon, Prague, Sept 2013

Integra(on of human and machine transla(on. Marcello Federico Fondazione Bruno Kessler MT Marathon, Prague, Sept 2013 Integra(on of human and machine transla(on Marcello Federico Fondazione Bruno Kessler MT Marathon, Prague, Sept 2013 Motivation Human translation (HT) worldwide demand for translation services has accelerated,

More information

The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge

The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge White Paper October 2002 I. Translation and Localization New Challenges Businesses are beginning to encounter

More information

2-3 Automatic Construction Technology for Parallel Corpora

2-3 Automatic Construction Technology for Parallel Corpora 2-3 Automatic Construction Technology for Parallel Corpora We have aligned Japanese and English news articles and sentences, extracted from the Yomiuri and the Daily Yomiuri newspapers, to make a large

More information

The bilingual system MUSCLEF at QA@CLEF 2006

The bilingual system MUSCLEF at QA@CLEF 2006 The bilingual system MUSCLEF at QA@CLEF 2006 Brigitte Grau, Anne-Laure Ligozat, Isabelle Robba, Anne Vilnat, Michael Bagur and Kevin Séjourné LIR group, LIMSI-CNRS, BP 133 91403 Orsay Cedex, France firstname.name@limsi.fr

More information

4. Clause combining 2

4. Clause combining 2 Informática Aplicada a la Traducción Building and Using Translation Memories 4.1 What is a Parall Corpus A Parall Corpus consists of a set of sentences (or other segments of text) in one language, each

More information

Incorporating Window-Based Passage-Level Evidence in Document Retrieval

Incorporating Window-Based Passage-Level Evidence in Document Retrieval Incorporating -Based Passage-Level Evidence in Document Retrieval Wensi Xi, Richard Xu-Rong, Christopher S.G. Khoo Center for Advanced Information Systems School of Applied Science Nanyang Technological

More information

ECTACO Universal Translator ML320

ECTACO Universal Translator ML320 ECTACO Universal Translator ML320 10-Language Dictionary English, Czech, Finnish, French, German, Italian, Polish, Russian, Spanish, Turkish User s Manual Ectaco, Inc. assumes no responsibility for any

More information

The Influence of Topic and Domain Specific Words on WER

The Influence of Topic and Domain Specific Words on WER The Influence of Topic and Domain Specific Words on WER And Can We Get the User in to Correct Them? Sebastian Stüker KIT Universität des Landes Baden-Württemberg und nationales Forschungszentrum in der

More information
