Creating resources for Arabic summarisation. Dr Mahmoud El-Haj School of Computing and Communications

Size: px
Start display at page:

Download "Creating resources for Arabic summarisation. Dr Mahmoud El-Haj School of Computing and Communications"

Transcription

1 Creating resources for Arabic summarisation Dr Mahmoud El-Haj School of Computing and Communications

2 Purpose The purpose of this exercise was two-fold: First it addresses a shortage of relevant data for Arabic natural language processing. Second it demonstrates the application of crowdsourcing, Machine Translation and Human Experts to the problem of creating natural language resources.

3 Creating Resources for Single document Summarisation Contribution Provide a human single-document summaries corpus. Method We used online workforce service (Mechanical Turk). Source Arabic Wikipedia and two Arabic newspapers (Alrai, Alwatan).

4 The Document Collection The sources were chosen for the following reasons. Contain real text written and used by native Arabic speakers. Written by many authors from different backgrounds. Cover a range of topics from different subject areas.

5 Human-Generated Summaries The corpus was generated using Mechanical Turk Human Intelligence Tasks (HITS). The assessors (workers) summarise one article per task. No more than half of the sentences in the article. Five summaries were created for each article. Summaries for a given article were generated by different workers.

6 Quality of the Summaries Each worker was asked to provide up to three keywords. In the case of selecting random sentences, the summary is still considered.

7 Creating Gold-Standards First: We selected all sentences identified by at least three of the five annotators (Level 3). Second: We selected all sentences identified by at least two annotators (Level 2). Finally, all sentences identified by any of the annotators (All).

8 HIT Sample

9 EASC Corpus Statistics Corpus Name Essex Arabic Summaries Corpus (EASC) Number of Documents 153 Number of Sentences 2,360 Number of Words 41,493 Number of Distinct Words 18,264 Number of Human Summaries 765 (five for each document).

10 Creating Resources for Multidocument Summarisation Contribution Provide a human and system multi-document summaries corpus. Method We used Machine Translation and Human Expert (Arabic Native Speakers).

11 Using Machine Translation MT was used to overcome the lack of Arabic multidocument resources (gold-standard summaries). The DUC 2002 English dataset provides articles and multidocument gold standards for extractive summaries. We performed sentence-by-sentence translation of this dataset into Arabic. The translation is to allow us to evaluate Arabic summaries against English gold standards. The translating into Arabic was done using Google Translate.

12 Using Machine Translation Main objective is to compare our Arabic multi-document summariser with the currently available English ones. To do this we translated the sentences in each article into Arabic, using the Java version of the Google Translate API. A total of 17,340 sentences were translated. The process resulted in a parallel sentence-by-sentence Arabic/English version of the DUC-2002 dataset. The summaries were created using the applicable English or Arabic version of our multi-document summariser system. We did not translate the gold-standard summaries themselves into Arabic.

13 Duc-2002 Arabic Corpus Statistics Corpus Name Duc-2002 (Arabic Translation) Number of Documents 567 Number of Sentences 17,340 Number of Words 199,423 Number of Distinct Words 19,307 Number of Reference Sets 59 Documents per Reference Set 10 on average Number of Gold-standard summaries 118 (two for each reference set)

14 Using Human Experts Used human participants to manually create the resources. The manual process included translating, validating and summarising documents written in English. The process of manually creating an Arabic multidocument corpus was part of our organisation for the TAC 2011 MultiLing Summarisation Pilot.

15 Using Human Experts The participants are responsible for translating and summarising an English test collection into six languages including: Arabic, Hindi, French, Czech, Greek and Hebrew. The test collection was based on WikiNews texts. The source documents contained no meta-data or tags and were represented as UTF 8 plain text files. The test collection contained 100 articles divided into ten reference sets, each contained ten related articles discussing the same topic. The original language of the dataset was English.

16 The Participants A total of twelve people participated in: translating the English corpus into Arabic. summarising the set of related Arabic articles. validating the translation. evaluating the summarisation quality. The participants were paid using Amazon vouchers. The amount of the vouchers varied depending on the task performed. The total amount of Amazon vouchers paid to the participants was 250 where three of the participants volunteered to do the tasks.

17 The Participants The participants have been selected based on their proficiency in Arabic and each one of them must be an Arabic native speaker. The participants are studying, or have finished a university degree in an Arabic speaking country. The participants age range between 22 and 64 years old.

18 Creating The Corpus The participants translated the English dataset into Arabic. For each translated article another translator should validate the translation and fix any errors. For each of the translated articles, a number of three manual summaries were created by three different participants (human peers). Amid the summarisation process the participants should evaluate the quality of the generated summary by assigning a score between one (unreadable summary) and five (fluent and readable summary). No self evaluation was allowed.

19 TAC 2011 Arabic Corpus Statistics Corpus Name Duc-2002 (Arabic Translation) Number of Documents 100 Number of Sentences 1,573 Number of Words 30,908 Number of Distinct Words 9,632 Number of Reference Sets 10 Documents per Reference Set 10 Number of Gold-standard summaries 30 (three for each reference set)

20 Summary Resource creation plays an important role in the advance of Arabic single and multidocument summarisation. Three summaries corpora have been created. 1. Essex Arabic Summaries Corpus (EASC), a single-document Arabic extractive summaries. 2. The second corpus was a multidocument summaries created using a machine translation tool. The approach used introduced a costeffective solution to the problem of limited Arabic resources. 3. The third corpus, MultiLing dataset, is a multidocument summaries corpus. The corpus was created by native Arabic speakers.

MultiLing 2015: Multilingual Summarization of Single and Multi-Documents, On-line Fora, and Call-center Conversations

MultiLing 2015: Multilingual Summarization of Single and Multi-Documents, On-line Fora, and Call-center Conversations MultiLing 2015: Multilingual Summarization of Single and Multi-Documents, On-line Fora, and Call-center Conversations George Giannakopoulos NCSR Demokritos Athens, Greece ggianna@iit.demokritos.gr Jeff

More information

Turker-Assisted Paraphrasing for English-Arabic Machine Translation

Turker-Assisted Paraphrasing for English-Arabic Machine Translation Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University

More information

Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation

Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation Rabih Zbib, Gretchen Markiewicz, Spyros Matsoukas, Richard Schwartz, John Makhoul Raytheon BBN Technologies

More information

Multi-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian

Multi-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian Multi-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian Lei Li BUPT, China leili@bupt.edu.cn Corina Forascu RACAI, Romania UAIC, Romania corinfor@info.uaic.ro

More information

Crowdsourcing for Big Data Analytics

Crowdsourcing for Big Data Analytics KYOTO UNIVERSITY Crowdsourcing for Big Data Analytics Hisashi Kashima (Kyoto University) Satoshi Oyama (Hokkaido University) Yukino Baba (Kyoto University) DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY

More information

Build Vs. Buy For Text Mining

Build Vs. Buy For Text Mining Build Vs. Buy For Text Mining Why use hand tools when you can get some rockin power tools? Whitepaper April 2015 INTRODUCTION We, at Lexalytics, see a significant number of people who have the same question

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

Liquid OS X User Guide

Liquid OS X User Guide Basic Use To use Liquid on selected text: Liquid OS X User Guide Select the text CMD-shift-2 Now Liquid appears with your selected text inserted, ready for you to choose a command. Choose a command by

More information

2-3 Automatic Construction Technology for Parallel Corpora

2-3 Automatic Construction Technology for Parallel Corpora 2-3 Automatic Construction Technology for Parallel Corpora We have aligned Japanese and English news articles and sentences, extracted from the Yomiuri and the Daily Yomiuri newspapers, to make a large

More information

Social Media Monitoring Tools enhanced by Semantic Web Technologies. Presentation of the Master Thesis Fabian Gasser

Social Media Monitoring Tools enhanced by Semantic Web Technologies. Presentation of the Master Thesis Fabian Gasser Social Media Monitoring Tools enhanced by Semantic Web Technologies Presentation of the Master Thesis Fabian Gasser Contents 1. 2. 3. 4. 5. 6. 7. 8. Main Concepts Challenges Research Question Social Media

More information

Linguistic Resources for OpenHaRT-13

Linguistic Resources for OpenHaRT-13 Linguistic Resources for OpenHaRT-13 Zhiyi Song 1, Stephanie Strassel 1, David Doermann 2, Amanda Morris 1 1 Linguistic Data Consortium 2 Applied Media Analysis Introduction to LDC Model LDC is an open,

More information

OPTIMIZING CONTENT FOR TRANSLATION ACROLINX AND VISTATEC

OPTIMIZING CONTENT FOR TRANSLATION ACROLINX AND VISTATEC OPTIMIZING CONTENT FOR TRANSLATION ACROLINX AND VISTATEC We ll look at these questions. Why does translation cost so much? Why is it hard to keep content consistent? Why is it hard for an organization

More information

Europass website activity report 2014 (Cyprus, Greek) Visits from Cyprus during 2014

Europass website activity report 2014 (Cyprus, Greek) Visits from Cyprus during 2014 Europass website activity report 2014 (Cyprus, Greek) State of play: 2015 Visits from Cyprus during 2014 Page 1 of 14 Month Visits 2,826 2,996 2,748 2,373 2,283 2,360 3,033 2,461 3,719 4,011 4,124 2,436

More information

Developing a User-based Method of Web Register Classification

Developing a User-based Method of Web Register Classification Developing a User-based Method of Web Register Classification Jesse Egbert Douglas Biber Northern Arizona University Introduction The internet has tremendous potential for linguistic research and NLP applications

More information

A Benchmark to Evaluate Mobile Video Upload to Cloud Infrastructures

A Benchmark to Evaluate Mobile Video Upload to Cloud Infrastructures A Benchmark to Evaluate Mobile Video Upload to Cloud Infrastructures Afsin Akdogan, Hien To, Seon Ho Kim and Cyrus Shahabi Integrated Media Systems Center University of Southern California, Los Angeles,

More information

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak 9.6.2015

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak 9.6.2015 Computer-Based Text- and Data Analysis Technologies and Applications Mark Cieliebak 9.6.2015 Data Scientist analyze Data Library use 2 About Me Mark Cieliebak + Software Engineer & Data Scientist + PhD

More information

An Introduction to the Topics Course

An Introduction to the Topics Course An Introduction to the Topics Course Department of Computing, Imperial College London DOC163: An Introduction to the Topics Course Slide 1 Introduction to the Topics in Computing Course The topics element

More information

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990

More information

Language Independent Evaluation of Translation Style and Consistency: Comparing Human and Machine Translations of Camus Novel The Stranger

Language Independent Evaluation of Translation Style and Consistency: Comparing Human and Machine Translations of Camus Novel The Stranger Language Independent Evaluation of Translation Style and Consistency: Comparing Human and Machine Translations of Camus Novel The Stranger Mahmoud El-Haj 1, Paul Rayson 1, and David Hall 2 1 School of

More information

WHITE PAPER. Machine Translation of Language for Safety Information Sharing Systems

WHITE PAPER. Machine Translation of Language for Safety Information Sharing Systems WHITE PAPER Machine Translation of Language for Safety Information Sharing Systems September 2004 Disclaimers; Non-Endorsement All data and information in this document are provided as is, without any

More information

The. Languages Ladder. Steps to Success. The

The. Languages Ladder. Steps to Success. The The Languages Ladder Steps to Success The What is it? The development of a national recognition scheme for languages the Languages Ladder is one of three overarching aims of the National Languages Strategy.

More information

Adaptation to Hungarian, Swedish, and Spanish

Adaptation to Hungarian, Swedish, and Spanish www.kconnect.eu Adaptation to Hungarian, Swedish, and Spanish Deliverable number D1.4 Dissemination level Public Delivery date 31 January 2016 Status Author(s) Final Jindřich Libovický, Aleš Tamchyna,

More information

The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content

The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content Omar F. Zaidan and Chris Callison-Burch Dept. of Computer Science, Johns Hopkins University Baltimore,

More information

Lecture Overview. Web 2.0, Tagging, Multimedia, Folksonomies, Lecture, Important, Must Attend, Web 2.0 Definition. Web 2.

Lecture Overview. Web 2.0, Tagging, Multimedia, Folksonomies, Lecture, Important, Must Attend, Web 2.0 Definition. Web 2. Lecture Overview Web 2.0, Tagging, Multimedia, Folksonomies, Lecture, Important, Must Attend, Martin Halvey Introduction to Web 2.0 Overview of Tagging Systems Overview of tagging Design and attributes

More information

REQUESTER BEST PRACTICES GUIDE

REQUESTER BEST PRACTICES GUIDE REQUESTER BEST PRACTICES GUIDE Requester Best Practices Guide Amazon Mechanical Turk is a marketplace for work where businesses (aka Requesters) publish tasks (aka HITS), and human providers (aka Workers)

More information

LSI TRANSLATION PLUG-IN FOR RELATIVITY. within

LSI TRANSLATION PLUG-IN FOR RELATIVITY. within within LSI Translation Plug-in (LTP) for Relativity is a free plug-in that allows the Relativity user to access the STS system 201 Broadway, Cambridge, MA 02139 Contact: Mark Ettinger Tel: 800-654-5006

More information

Using the Amazon Mechanical Turk for Transcription of Spoken Language

Using the Amazon Mechanical Turk for Transcription of Spoken Language Research Showcase @ CMU Computer Science Department School of Computer Science 2010 Using the Amazon Mechanical Turk for Transcription of Spoken Language Matthew R. Marge Satanjeev Banerjee Alexander I.

More information

Question template for interviews

Question template for interviews Question template for interviews This interview template creates a framework for the interviews. The template should not be considered too restrictive. If an interview reveals information not covered by

More information

MLLP Transcription and Translation Platform

MLLP Transcription and Translation Platform MLLP Transcription and Translation Platform Alejandro Pérez González de Martos, Joan Albert Silvestre-Cerdà, Juan Daniel Valor Miró, Jorge Civera, and Alfons Juan Universitat Politècnica de València, Camino

More information

Transcription bottleneck of speech corpus exploitation

Transcription bottleneck of speech corpus exploitation Transcription bottleneck of speech corpus exploitation Caren Brinckmann Institut für Deutsche Sprache, Mannheim, Germany Lesser Used Languages and Computer Linguistics (LULCL) II Nov 13/14, 2008 Bozen

More information

Automated Multilingual Text Analysis in the Europe Media Monitor (EMM) Ralf Steinberger. European Commission Joint Research Centre (JRC)

Automated Multilingual Text Analysis in the Europe Media Monitor (EMM) Ralf Steinberger. European Commission Joint Research Centre (JRC) Automated Multilingual Text Analysis in the Europe Media Monitor (EMM) Ralf Steinberger European Commission Joint Research Centre (JRC) https://ec.europa.eu/jrc/en/research-topic/internet-surveillance-systems

More information

Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia

Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Outline I What is CALL? (scott) II Popular language learning sites (stella) Livemocha.com (stacia) III IV Specific sites

More information

IP5 MMT Project. June 2012. Korean Intellectual Property Office

IP5 MMT Project. June 2012. Korean Intellectual Property Office IP5 MMT Project June 2012 Korean Intellectual Property Office 1. Background 2. IP5 Efforts to Improve MMT 3. Issues to be Considered 1. Background Power Shift Convertibility of information based on automatic

More information

Military Service Credit Guidelines 29 March 11

Military Service Credit Guidelines 29 March 11 Articulation Office University Drive Camarillo, CA 930 805-437-847 Military Service Credit Guidelines 9 March. In accordance with the recommendations from the American Council on Education (ACE), Guide

More information

RECENSEO Quick Reference

RECENSEO Quick Reference Your team has the tools to dramatically speed document review. And those tools are as easy as,, Pronunciation. Re cĕn sēō Origination. From the Latin word, Review. Adjective. Powerful, intuitive, secure,

More information

Text Analytics for Competitive Analysis and Market Intelligence Aiaioo Labs - 2011

Text Analytics for Competitive Analysis and Market Intelligence Aiaioo Labs - 2011 Text Analytics for Competitive Analysis and Market Intelligence Aiaioo Labs - 2011 Bangalore, India Title Text Analytics Introduction Entity Person Comparative Analysis Entity or Event Text Analytics Text

More information

PPInterFinder A Web Server for Mining Human Protein Protein Interaction

PPInterFinder A Web Server for Mining Human Protein Protein Interaction PPInterFinder A Web Server for Mining Human Protein Protein Interaction Kalpana Raja, Suresh Subramani, Jeyakumar Natarajan Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar

More information

INDEX. OutIndex Services...2. Collection Assistance...2. ESI Processing & Production Services...2. Computer-Based Language Translation...

INDEX. OutIndex Services...2. Collection Assistance...2. ESI Processing & Production Services...2. Computer-Based Language Translation... SERVICES INDEX OutIndex Services...2 Collection Assistance...2 ESI Processing & Production Services...2 Computer-Based Language Translation...3 OutIndex E-Discovery Deployment & Installation Consulting...3

More information

Collecting Polish German Parallel Corpora in the Internet

Collecting Polish German Parallel Corpora in the Internet Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

More information

GATE Mímir and cloud services. Multi-paradigm indexing and search tool Pay-as-you-go large-scale annotation

GATE Mímir and cloud services. Multi-paradigm indexing and search tool Pay-as-you-go large-scale annotation GATE Mímir and cloud services Multi-paradigm indexing and search tool Pay-as-you-go large-scale annotation GATE Mímir GATE Mímir is an indexing system for GATE documents. Mímir can index: Text: the original

More information

PROMT Technologies for Translation and Big Data

PROMT Technologies for Translation and Big Data PROMT Technologies for Translation and Big Data Overview and Use Cases Julia Epiphantseva PROMT About PROMT EXPIRIENCED Founded in 1991. One of the world leading machine translation provider DIVERSIFIED

More information

Intelligent Personal Assistants: A view from the intersection of Data Mining and Mobile Services

Intelligent Personal Assistants: A view from the intersection of Data Mining and Mobile Services Intelligent Personal Assistants: A view from the intersection of Data Mining and Mobile Services Y. Akinaga S. Mukherjee [speaker] H.-S. Shin P. Subasic R. Sujithan Y. Sun H. Yin DOCOMO Innovations, Inc.,

More information

TAUS 2015. Membership Program (Executive Overview) write to memberservices@taus.net to request the 35 pages detailed service overview. www.taus.

TAUS 2015. Membership Program (Executive Overview) write to memberservices@taus.net to request the 35 pages detailed service overview. www.taus. TAUS 2015 Membership Program (Executive Overview) write to memberservices@taus.net to request the 35 pages detailed service overview www.taus.net Five Reasons to be a TAUS Member 1. Access the collaborative

More information

Dutch Parallel Corpus

Dutch Parallel Corpus Dutch Parallel Corpus Lieve Macken lieve.macken@hogent.be LT 3, Language and Translation Technology Team Faculty of Applied Language Studies University College Ghent November 29th 2011 Lieve Macken (LT

More information

How do I translate...?

How do I translate...? How do I translate...? Professional Translation Hybrid Translation Machine Translation Certified Translation Supported Formats Language Codes Toll Free: (800)790-3680 Professional Translation Hybrid Translation

More information

Approach to providing interpreting and information in translation or alternative format. www.homesforharingey.org 020 8489 5611 Homes for Haringey Ltd

Approach to providing interpreting and information in translation or alternative format. www.homesforharingey.org 020 8489 5611 Homes for Haringey Ltd Approach to providing interpreting and information in translation or alternative format www.homesforharingey.org 020 8489 5611 Homes for Haringey Ltd 1. Purpose of this document This document sets out

More information

Mobile and Cloud computing and SE

Mobile and Cloud computing and SE Mobile and Cloud computing and SE This week normal. Next week is the final week of the course Wed 12-14 Essay presentation and final feedback Kylmämaa Kerkelä Barthas Gratzl Reijonen??? Thu 08-10 Group

More information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

More information

FREE computing using Amazon EC2

FREE computing using Amazon EC2 FREE computing using Amazon EC2 Seong-Hwan Jun 1 1 Department of Statistics Univ of British Columbia Nov 1st, 2012 / Student seminar Outline Basics of servers Amazon EC2 Setup R on an EC2 instance Stat

More information

Language Access Implementation Guide. New York City Department of Homeless Services

Language Access Implementation Guide. New York City Department of Homeless Services Language Access Implementation Guide New York City Department of Homeless Services 1 I. Agency Mission and Background I. Agency Mission and Background THE MISSION OF THE DEPARTMENT OF HOMELESS SERVICES

More information

Are You Paying Too Much for ediscovery Processing?

Are You Paying Too Much for ediscovery Processing? Are You Paying Too Much for ediscovery Processing? How new technologies can accelerate ediscovery and lower your costs Guy MacNeill Product Manager, Lexbe LC ediscovery Webinar Series About our webinars

More information

How to localise your website

How to localise your website How to localise your website Grow your e-commerce website globally 22 October 2015 Levent Yildizgoren Managing Director - TTC wetranslate Ltd About me 22 years industry experience Mentor, speaker, charity

More information

Evaluation Metrics for Persuasive NLP with Google AdWords

Evaluation Metrics for Persuasive NLP with Google AdWords Evaluation Metrics for Persuasive NLP with Google AdWords Marco Guerini, Carlo Strapparava, Oliviero Stock FBK-Irst Via Sommarive 18, Povo, I-38100 Trento {guerini, strappa, stock}@fbk.eu Abstract Evaluating

More information

Localizing Your Mobile App is Good for Business

Localizing Your Mobile App is Good for Business Global Insight Localizing Your Mobile App is Good for Business Simply put, the more people who can find and use your mobile application in their native language, the larger your potential market. But launching

More information

Big Data and Scripting. (lecture, computer science, bachelor/master/phd)

Big Data and Scripting. (lecture, computer science, bachelor/master/phd) Big Data and Scripting (lecture, computer science, bachelor/master/phd) Big Data and Scripting - abstract/organization abstract introduction to Big Data and involved techniques lecture (2+2) practical

More information

Language Translation Services RFP Issued: January 1, 2015

Language Translation Services RFP Issued: January 1, 2015 Language Translation Services RFP Issued: January 1, 2015 The following are answers to questions Brand USA has received to the RFP for Language Translation Services. Thanks to everyone who submitted questions

More information

Interoperability, Standards and Open Advancement

Interoperability, Standards and Open Advancement Interoperability, Standards and Open Eric Nyberg 1 Open Shared resources & annotation schemas Shared component APIs Shared datasets (corpora, test sets) Shared software (open source) Shared configurations

More information

Aligning the World Language Content Standards for California Public Schools with the Common Core Standards. Table of Contents

Aligning the World Language Content Standards for California Public Schools with the Common Core Standards. Table of Contents Aligning the World Language Standards for California Public Schools with the Common Core Standards Table of s Page Introduction to World Language Standards for California Public Schools (WLCS) 2 World

More information

ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking

ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking Anne-Laure Ligozat LIMSI-CNRS/ENSIIE rue John von Neumann 91400 Orsay, France annlor@limsi.fr Cyril Grouin LIMSI-CNRS rue John von Neumann 91400

More information

D3.3.1: Sematic tagging and open data publication tools

D3.3.1: Sematic tagging and open data publication tools COMPETITIVINESS AND INNOVATION FRAMEWORK PROGRAMME CIP-ICT-PSP-2013-7 Pilot Type B WP3 Service platform integration and deployment in cloud infrastructure D3.3.1: Sematic tagging and open data publication

More information

INTERNATIONAL LANGUAGE STUDIES Spanish Option University Transfer Degree

INTERNATIONAL LANGUAGE STUDIES Spanish Option University Transfer Degree INTERNATIONAL LANGUAGE STUDIES Spanish Option University Transfer Degree Degree Awarded: Associate in Arts General Education Requirements Credit Hours: 37 See the General Education Requirements for the

More information

Machine Learning Approach To Augmenting News Headline Generation

Machine Learning Approach To Augmenting News Headline Generation Machine Learning Approach To Augmenting News Headline Generation Ruichao Wang Dept. of Computer Science University College Dublin Ireland rachel@ucd.ie John Dunnion Dept. of Computer Science University

More information

A prototype infrastructure for D Spin Services based on a flexible multilayer architecture

A prototype infrastructure for D Spin Services based on a flexible multilayer architecture A prototype infrastructure for D Spin Services based on a flexible multilayer architecture Volker Boehlke 1,, 1 NLP Group, Department of Computer Science, University of Leipzig, Johanisgasse 26, 04103

More information

Customizing an English-Korean Machine Translation System for Patent Translation *

Customizing an English-Korean Machine Translation System for Patent Translation * Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,

More information

THUTR: A Translation Retrieval System

THUTR: A Translation Retrieval System THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for

More information

Language technologies for Education: recent results by the MLLP group

Language technologies for Education: recent results by the MLLP group Language technologies for Education: recent results by the MLLP group Alfons Juan 2nd Internet of Education Conference 2015 18 September 2015, Sarajevo Contents The MLLP research group 2 translectures

More information

Your Virtual Workforce. On Demand. Worldwide. COMPANY PRESENTATION. clickworker GmbH 2015

Your Virtual Workforce. On Demand. Worldwide. COMPANY PRESENTATION. clickworker GmbH 2015 Your Virtual Workforce. On Demand. Worldwide. COMPANY PRESENTATION 2015 CLICKWORKER AT A GLANCE Segment: Paid Crowdsourcing / Microtasking Services: Text Creation (incl. SEO Texts), Internet Research,

More information

Reference Guide: Approved Vendors for Translation and In-Person Interpretation Services

Reference Guide: Approved Vendors for Translation and In-Person Interpretation Services Reference Guide: Approved Vendors for Translation and In-Person Interpretation Services What you need to know The government of D.C. has identified, vetted, and engaged four vendors in a citywide contract

More information

HEC MBA CV Template. The document attached will provide you with the template and format required for your HEC CV.

HEC MBA CV Template. The document attached will provide you with the template and format required for your HEC CV. CV Template The document attached will provide you with the template and format required for your HEC CV. The template is filled with instructions and other place-holding information. You will need to

More information

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification SIPAC Signals and Data Identification, Processing, Analysis, and Classification Framework for Mass Data Processing with Modules for Data Storage, Production and Configuration SIPAC key features SIPAC is

More information

ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH

ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH Journal of Computer Science 9 (7): 922-927, 2013 ISSN: 1549-3636 2013 doi:10.3844/jcssp.2013.922.927 Published Online 9 (7) 2013 (http://www.thescipub.com/jcs.toc) ARABIC PERSON NAMES RECOGNITION BY USING

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

31 Case Studies: Java Natural Language Tools Available on the Web

31 Case Studies: Java Natural Language Tools Available on the Web 31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software

More information

Deborah W. Robinson, Ph.D., World Languages Consultant, Ohio Department of Education. Debbie.robinson@ode.state.oh.us

Deborah W. Robinson, Ph.D., World Languages Consultant, Ohio Department of Education. Debbie.robinson@ode.state.oh.us Dear Colleagues, Language educators in Ohio now have multiple means to award credit by proficiency. To help you develop local policies around how much credit to award based on a proficiency rating, I have

More information

Developing Microsoft SharePoint Server 2013 Advanced Solutions

Developing Microsoft SharePoint Server 2013 Advanced Solutions Course 20489B: Developing Microsoft SharePoint Server 2013 Advanced Solutions Course Details Course Outline Module 1: Creating Robust and Efficient Apps for SharePoint In this module, you will review key

More information

Higher Education Heritage Language Profile Portland State University, Portland, OR

Higher Education Heritage Language Profile Portland State University, Portland, OR Higher Education Heritage Language Profile Portland State University, Portland, OR Address: Contact: 393 Newberger Hall 724 SW Harrison, Portland, OR 97201 Linda Godson, PhD. Coordinator, Heritage Language

More information

Methods of Social Media Research: Data Collection & Use in Social Media

Methods of Social Media Research: Data Collection & Use in Social Media Methods of Social Media Research: Data Collection & Use in Social Media Florida State University College of Communication and Information Sanghee Oh shoh@cci.fsu.edu Overview Introduction to myself Introduction

More information

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines , 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing

More information

Fine-grained German Sentiment Analysis on Social Media

Fine-grained German Sentiment Analysis on Social Media Fine-grained German Sentiment Analysis on Social Media Saeedeh Momtazi Information Systems Hasso-Plattner-Institut Potsdam University, Germany Saeedeh.momtazi@hpi.uni-potsdam.de Abstract Expressing opinions

More information

Third Party Pre Grant Prior Art Submission

Third Party Pre Grant Prior Art Submission Third Party Pre Grant Prior Art Submission Mark H. Webbink Senior Lecturing Fellow Duke University School of Law Prior Rules for Third Party Submissions Peer To Patent AIA Third Party Submission Changes

More information

Machine Translation Ensuring Meaningful Access for Limited English Proficient Individuals

Machine Translation Ensuring Meaningful Access for Limited English Proficient Individuals Machine Translation Ensuring Meaningful Access for Limited English Proficient Individuals Michael Mulé, DOJ Robin Runge, CRC U.S. Department of Labor Civil Rights Center June 24, 2014 Overview Civil Rights

More information

Extreme Computing. Big Data. Stratis Viglas. School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk. Stratis Viglas Extreme Computing 1

Extreme Computing. Big Data. Stratis Viglas. School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk. Stratis Viglas Extreme Computing 1 Extreme Computing Big Data Stratis Viglas School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk Stratis Viglas Extreme Computing 1 Petabyte Age Big Data Challenges Stratis Viglas Extreme Computing

More information

Fulfilling World Language Requirements through Alternate Means

Fulfilling World Language Requirements through Alternate Means Fulfilling World Language Requirements through Alternate Means OUSD Board Policy 6146.1 allows students to meet graduation requirements through demonstration of proficiency. Both University of California

More information

webcertain Recruitment pack Ceri Wright [Pick the date]

webcertain Recruitment pack Ceri Wright [Pick the date] Recruitment pack Ceri Wright [Pick the date] SEO Executive Have you recently caught the SEO bug and looking to develop your skills and career in a rapidly growing agency? If your answer is YES then Webcertain

More information

Big Data and Scripting

Big Data and Scripting Big Data and Scripting 1, 2, Big Data and Scripting - abstract/organization contents introduction to Big Data and involved techniques schedule 2 lectures (Mon 1:30 pm, M628 and Thu 10 am F420) 2 tutorials

More information

Recent developments in machine translation policy at the European Patent Office

Recent developments in machine translation policy at the European Patent Office Recent developments in machine translation policy at the European Patent Office Dr Georg Artelsmair Director European Co-operation European Patent Office Brussels, 17 November 2010 The European Patent

More information

A Publicly Available Annotated Corpus for Supervised Email Summarization

A Publicly Available Annotated Corpus for Supervised Email Summarization A Publicly Available Annotated Corpus for Supervised Email Summarization Jan Ulrich, Gabriel Murray, and Giuseppe Carenini Department of Computer Science University of British Columbia, Canada {ulrichj,

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

Enhancing Document Review Efficiency with OmniX

Enhancing Document Review Efficiency with OmniX Xerox Litigation Services OmniX Platform Review Technical Brief Enhancing Document Review Efficiency with OmniX Xerox Litigation Services delivers a flexible suite of end-to-end technology-driven services,

More information

COMPSCI 760 S2 C 2014 Machine Learning and Data Mining Computer Science Department

COMPSCI 760 S2 C 2014 Machine Learning and Data Mining Computer Science Department COMPSCI 760 S2 C 2014 Machine Learning and Data Mining Computer Science Department Research Projects 2014 This year students will work in groups of 3 on projects. Each project will be a small research

More information

BYODs & FAIR Data Stewardship

BYODs & FAIR Data Stewardship BYODs & FAIR Data Stewardship Luiz Olavo Bonino luiz.bonino@dtls.nl www.elixir-europe.org Summary FAIR Data stewardship Approach in NL BYOD FAIR Data tooling ecosystem Way of working (FAIR) Data Stewardship

More information

Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization

Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization Luís Marujo 1,2, Anatole Gershman 1, Jaime Carbonell 1, Robert Frederking 1,

More information

A crowdsourcing framework for software localization

A crowdsourcing framework for software localization Universitat Politècnica de Catalunya(UPC) Facultat d Informàtica de Barcelona(FIB) KTH Royal Institute of Technology School of Information and Communication Technology Degree project in European Master

More information

http://conference.ifla.org/ifla77 Date submitted: June 1, 2011

http://conference.ifla.org/ifla77 Date submitted: June 1, 2011 http://conference.ifla.org/ifla77 Date submitted: June 1, 2011 Lost in Translation the challenges of multilingualism Martin Flynn Victoria and Albert Museum London, United Kingdom E-mail: m.flynn@vam.ac.uk

More information

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,

More information

Leveraging Business to Consumer Learning for Marketing, Training, and Support of Customers

Leveraging Business to Consumer Learning for Marketing, Training, and Support of Customers Leveraging Business to Consumer Learning for Marketing, Training, and Support of Customers Developed by rapidld Steve Owens Vice President, Consulting Fall, 2013 2013 Rapid Learning Deployment, LLC B2C

More information

A Primer On Metadata Analysis

A Primer On Metadata Analysis A Primer On Metadata Analysis Jeffrey Lewis Records Management Program Manager SOL Capital Management Co. @Info_Currency What Is Metadata Data About Data Is used to relate informagon to other pieces informagon

More information

How To Help The European Single Market With Data And Information Technology

How To Help The European Single Market With Data And Information Technology Connecting Europe for New Horizon European activities in the area of Big Data Márta Nagy-Rothengass DG CONNECT, Head of Unit "Data Value Chain" META-Forum 2013, 19 September 2013, Berlin OUTLINE 1. Data

More information

Bridging the Online Language Barriers with Machine Translation at the United Nations

Bridging the Online Language Barriers with Machine Translation at the United Nations Bridging the Online Language Barriers with Machine Translation at the United Nations Fernando Serván! Food and Agriculture Organization of the! United Nations (FAO),! Rome, Italy! About FAO - International

More information

Chapter 12: Advanced topic Web 2.0

Chapter 12: Advanced topic Web 2.0 Chapter 12: Advanced topic Web 2.0 Contents Web 2.0 DOM AJAX RIA Web 2.0 "Web 2.0" refers to the second generation of web development and web design that facilities information sharing, interoperability,

More information