Creating resources for Arabic summarisation. Dr Mahmoud El-Haj School of Computing and Communications
|
|
- Scot Lynch
- 7 years ago
- Views:
Transcription
1 Creating resources for Arabic summarisation Dr Mahmoud El-Haj School of Computing and Communications
2 Purpose The purpose of this exercise was two-fold: First it addresses a shortage of relevant data for Arabic natural language processing. Second it demonstrates the application of crowdsourcing, Machine Translation and Human Experts to the problem of creating natural language resources.
3 Creating Resources for Single document Summarisation Contribution Provide a human single-document summaries corpus. Method We used online workforce service (Mechanical Turk). Source Arabic Wikipedia and two Arabic newspapers (Alrai, Alwatan).
4 The Document Collection The sources were chosen for the following reasons. Contain real text written and used by native Arabic speakers. Written by many authors from different backgrounds. Cover a range of topics from different subject areas.
5 Human-Generated Summaries The corpus was generated using Mechanical Turk Human Intelligence Tasks (HITS). The assessors (workers) summarise one article per task. No more than half of the sentences in the article. Five summaries were created for each article. Summaries for a given article were generated by different workers.
6 Quality of the Summaries Each worker was asked to provide up to three keywords. In the case of selecting random sentences, the summary is still considered.
7 Creating Gold-Standards First: We selected all sentences identified by at least three of the five annotators (Level 3). Second: We selected all sentences identified by at least two annotators (Level 2). Finally, all sentences identified by any of the annotators (All).
8 HIT Sample
9 EASC Corpus Statistics Corpus Name Essex Arabic Summaries Corpus (EASC) Number of Documents 153 Number of Sentences 2,360 Number of Words 41,493 Number of Distinct Words 18,264 Number of Human Summaries 765 (five for each document).
10 Creating Resources for Multidocument Summarisation Contribution Provide a human and system multi-document summaries corpus. Method We used Machine Translation and Human Expert (Arabic Native Speakers).
11 Using Machine Translation MT was used to overcome the lack of Arabic multidocument resources (gold-standard summaries). The DUC 2002 English dataset provides articles and multidocument gold standards for extractive summaries. We performed sentence-by-sentence translation of this dataset into Arabic. The translation is to allow us to evaluate Arabic summaries against English gold standards. The translating into Arabic was done using Google Translate.
12 Using Machine Translation Main objective is to compare our Arabic multi-document summariser with the currently available English ones. To do this we translated the sentences in each article into Arabic, using the Java version of the Google Translate API. A total of 17,340 sentences were translated. The process resulted in a parallel sentence-by-sentence Arabic/English version of the DUC-2002 dataset. The summaries were created using the applicable English or Arabic version of our multi-document summariser system. We did not translate the gold-standard summaries themselves into Arabic.
13 Duc-2002 Arabic Corpus Statistics Corpus Name Duc-2002 (Arabic Translation) Number of Documents 567 Number of Sentences 17,340 Number of Words 199,423 Number of Distinct Words 19,307 Number of Reference Sets 59 Documents per Reference Set 10 on average Number of Gold-standard summaries 118 (two for each reference set)
14 Using Human Experts Used human participants to manually create the resources. The manual process included translating, validating and summarising documents written in English. The process of manually creating an Arabic multidocument corpus was part of our organisation for the TAC 2011 MultiLing Summarisation Pilot.
15 Using Human Experts The participants are responsible for translating and summarising an English test collection into six languages including: Arabic, Hindi, French, Czech, Greek and Hebrew. The test collection was based on WikiNews texts. The source documents contained no meta-data or tags and were represented as UTF 8 plain text files. The test collection contained 100 articles divided into ten reference sets, each contained ten related articles discussing the same topic. The original language of the dataset was English.
16 The Participants A total of twelve people participated in: translating the English corpus into Arabic. summarising the set of related Arabic articles. validating the translation. evaluating the summarisation quality. The participants were paid using Amazon vouchers. The amount of the vouchers varied depending on the task performed. The total amount of Amazon vouchers paid to the participants was 250 where three of the participants volunteered to do the tasks.
17 The Participants The participants have been selected based on their proficiency in Arabic and each one of them must be an Arabic native speaker. The participants are studying, or have finished a university degree in an Arabic speaking country. The participants age range between 22 and 64 years old.
18 Creating The Corpus The participants translated the English dataset into Arabic. For each translated article another translator should validate the translation and fix any errors. For each of the translated articles, a number of three manual summaries were created by three different participants (human peers). Amid the summarisation process the participants should evaluate the quality of the generated summary by assigning a score between one (unreadable summary) and five (fluent and readable summary). No self evaluation was allowed.
19 TAC 2011 Arabic Corpus Statistics Corpus Name Duc-2002 (Arabic Translation) Number of Documents 100 Number of Sentences 1,573 Number of Words 30,908 Number of Distinct Words 9,632 Number of Reference Sets 10 Documents per Reference Set 10 Number of Gold-standard summaries 30 (three for each reference set)
20 Summary Resource creation plays an important role in the advance of Arabic single and multidocument summarisation. Three summaries corpora have been created. 1. Essex Arabic Summaries Corpus (EASC), a single-document Arabic extractive summaries. 2. The second corpus was a multidocument summaries created using a machine translation tool. The approach used introduced a costeffective solution to the problem of limited Arabic resources. 3. The third corpus, MultiLing dataset, is a multidocument summaries corpus. The corpus was created by native Arabic speakers.
MultiLing 2015: Multilingual Summarization of Single and Multi-Documents, On-line Fora, and Call-center Conversations
MultiLing 2015: Multilingual Summarization of Single and Multi-Documents, On-line Fora, and Call-center Conversations George Giannakopoulos NCSR Demokritos Athens, Greece ggianna@iit.demokritos.gr Jeff
More informationTurker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
More informationSystematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation
Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation Rabih Zbib, Gretchen Markiewicz, Spyros Matsoukas, Richard Schwartz, John Makhoul Raytheon BBN Technologies
More informationMulti-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian
Multi-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian Lei Li BUPT, China leili@bupt.edu.cn Corina Forascu RACAI, Romania UAIC, Romania corinfor@info.uaic.ro
More informationCrowdsourcing for Big Data Analytics
KYOTO UNIVERSITY Crowdsourcing for Big Data Analytics Hisashi Kashima (Kyoto University) Satoshi Oyama (Hokkaido University) Yukino Baba (Kyoto University) DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY
More informationBuild Vs. Buy For Text Mining
Build Vs. Buy For Text Mining Why use hand tools when you can get some rockin power tools? Whitepaper April 2015 INTRODUCTION We, at Lexalytics, see a significant number of people who have the same question
More informationSafe Harbor Statement
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment
More informationLiquid OS X User Guide
Basic Use To use Liquid on selected text: Liquid OS X User Guide Select the text CMD-shift-2 Now Liquid appears with your selected text inserted, ready for you to choose a command. Choose a command by
More information2-3 Automatic Construction Technology for Parallel Corpora
2-3 Automatic Construction Technology for Parallel Corpora We have aligned Japanese and English news articles and sentences, extracted from the Yomiuri and the Daily Yomiuri newspapers, to make a large
More informationSocial Media Monitoring Tools enhanced by Semantic Web Technologies. Presentation of the Master Thesis Fabian Gasser
Social Media Monitoring Tools enhanced by Semantic Web Technologies Presentation of the Master Thesis Fabian Gasser Contents 1. 2. 3. 4. 5. 6. 7. 8. Main Concepts Challenges Research Question Social Media
More informationLinguistic Resources for OpenHaRT-13
Linguistic Resources for OpenHaRT-13 Zhiyi Song 1, Stephanie Strassel 1, David Doermann 2, Amanda Morris 1 1 Linguistic Data Consortium 2 Applied Media Analysis Introduction to LDC Model LDC is an open,
More informationOPTIMIZING CONTENT FOR TRANSLATION ACROLINX AND VISTATEC
OPTIMIZING CONTENT FOR TRANSLATION ACROLINX AND VISTATEC We ll look at these questions. Why does translation cost so much? Why is it hard to keep content consistent? Why is it hard for an organization
More informationEuropass website activity report 2014 (Cyprus, Greek) Visits from Cyprus during 2014
Europass website activity report 2014 (Cyprus, Greek) State of play: 2015 Visits from Cyprus during 2014 Page 1 of 14 Month Visits 2,826 2,996 2,748 2,373 2,283 2,360 3,033 2,461 3,719 4,011 4,124 2,436
More informationDeveloping a User-based Method of Web Register Classification
Developing a User-based Method of Web Register Classification Jesse Egbert Douglas Biber Northern Arizona University Introduction The internet has tremendous potential for linguistic research and NLP applications
More informationA Benchmark to Evaluate Mobile Video Upload to Cloud Infrastructures
A Benchmark to Evaluate Mobile Video Upload to Cloud Infrastructures Afsin Akdogan, Hien To, Seon Ho Kim and Cyrus Shahabi Integrated Media Systems Center University of Southern California, Los Angeles,
More informationComputer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak 9.6.2015
Computer-Based Text- and Data Analysis Technologies and Applications Mark Cieliebak 9.6.2015 Data Scientist analyze Data Library use 2 About Me Mark Cieliebak + Software Engineer & Data Scientist + PhD
More informationAn Introduction to the Topics Course
An Introduction to the Topics Course Department of Computing, Imperial College London DOC163: An Introduction to the Topics Course Slide 1 Introduction to the Topics in Computing Course The topics element
More informationAutomatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
More informationLanguage Independent Evaluation of Translation Style and Consistency: Comparing Human and Machine Translations of Camus Novel The Stranger
Language Independent Evaluation of Translation Style and Consistency: Comparing Human and Machine Translations of Camus Novel The Stranger Mahmoud El-Haj 1, Paul Rayson 1, and David Hall 2 1 School of
More informationWHITE PAPER. Machine Translation of Language for Safety Information Sharing Systems
WHITE PAPER Machine Translation of Language for Safety Information Sharing Systems September 2004 Disclaimers; Non-Endorsement All data and information in this document are provided as is, without any
More informationThe. Languages Ladder. Steps to Success. The
The Languages Ladder Steps to Success The What is it? The development of a national recognition scheme for languages the Languages Ladder is one of three overarching aims of the National Languages Strategy.
More informationAdaptation to Hungarian, Swedish, and Spanish
www.kconnect.eu Adaptation to Hungarian, Swedish, and Spanish Deliverable number D1.4 Dissemination level Public Delivery date 31 January 2016 Status Author(s) Final Jindřich Libovický, Aleš Tamchyna,
More informationThe Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content
The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content Omar F. Zaidan and Chris Callison-Burch Dept. of Computer Science, Johns Hopkins University Baltimore,
More informationLecture Overview. Web 2.0, Tagging, Multimedia, Folksonomies, Lecture, Important, Must Attend, Web 2.0 Definition. Web 2.
Lecture Overview Web 2.0, Tagging, Multimedia, Folksonomies, Lecture, Important, Must Attend, Martin Halvey Introduction to Web 2.0 Overview of Tagging Systems Overview of tagging Design and attributes
More informationREQUESTER BEST PRACTICES GUIDE
REQUESTER BEST PRACTICES GUIDE Requester Best Practices Guide Amazon Mechanical Turk is a marketplace for work where businesses (aka Requesters) publish tasks (aka HITS), and human providers (aka Workers)
More informationLSI TRANSLATION PLUG-IN FOR RELATIVITY. within
within LSI Translation Plug-in (LTP) for Relativity is a free plug-in that allows the Relativity user to access the STS system 201 Broadway, Cambridge, MA 02139 Contact: Mark Ettinger Tel: 800-654-5006
More informationUsing the Amazon Mechanical Turk for Transcription of Spoken Language
Research Showcase @ CMU Computer Science Department School of Computer Science 2010 Using the Amazon Mechanical Turk for Transcription of Spoken Language Matthew R. Marge Satanjeev Banerjee Alexander I.
More informationQuestion template for interviews
Question template for interviews This interview template creates a framework for the interviews. The template should not be considered too restrictive. If an interview reveals information not covered by
More informationMLLP Transcription and Translation Platform
MLLP Transcription and Translation Platform Alejandro Pérez González de Martos, Joan Albert Silvestre-Cerdà, Juan Daniel Valor Miró, Jorge Civera, and Alfons Juan Universitat Politècnica de València, Camino
More informationTranscription bottleneck of speech corpus exploitation
Transcription bottleneck of speech corpus exploitation Caren Brinckmann Institut für Deutsche Sprache, Mannheim, Germany Lesser Used Languages and Computer Linguistics (LULCL) II Nov 13/14, 2008 Bozen
More informationAutomated Multilingual Text Analysis in the Europe Media Monitor (EMM) Ralf Steinberger. European Commission Joint Research Centre (JRC)
Automated Multilingual Text Analysis in the Europe Media Monitor (EMM) Ralf Steinberger European Commission Joint Research Centre (JRC) https://ec.europa.eu/jrc/en/research-topic/internet-surveillance-systems
More informationComputer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia
Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Outline I What is CALL? (scott) II Popular language learning sites (stella) Livemocha.com (stacia) III IV Specific sites
More informationIP5 MMT Project. June 2012. Korean Intellectual Property Office
IP5 MMT Project June 2012 Korean Intellectual Property Office 1. Background 2. IP5 Efforts to Improve MMT 3. Issues to be Considered 1. Background Power Shift Convertibility of information based on automatic
More informationMilitary Service Credit Guidelines 29 March 11
Articulation Office University Drive Camarillo, CA 930 805-437-847 Military Service Credit Guidelines 9 March. In accordance with the recommendations from the American Council on Education (ACE), Guide
More informationRECENSEO Quick Reference
Your team has the tools to dramatically speed document review. And those tools are as easy as,, Pronunciation. Re cĕn sēō Origination. From the Latin word, Review. Adjective. Powerful, intuitive, secure,
More informationText Analytics for Competitive Analysis and Market Intelligence Aiaioo Labs - 2011
Text Analytics for Competitive Analysis and Market Intelligence Aiaioo Labs - 2011 Bangalore, India Title Text Analytics Introduction Entity Person Comparative Analysis Entity or Event Text Analytics Text
More informationPPInterFinder A Web Server for Mining Human Protein Protein Interaction
PPInterFinder A Web Server for Mining Human Protein Protein Interaction Kalpana Raja, Suresh Subramani, Jeyakumar Natarajan Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar
More informationINDEX. OutIndex Services...2. Collection Assistance...2. ESI Processing & Production Services...2. Computer-Based Language Translation...
SERVICES INDEX OutIndex Services...2 Collection Assistance...2 ESI Processing & Production Services...2 Computer-Based Language Translation...3 OutIndex E-Discovery Deployment & Installation Consulting...3
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More informationGATE Mímir and cloud services. Multi-paradigm indexing and search tool Pay-as-you-go large-scale annotation
GATE Mímir and cloud services Multi-paradigm indexing and search tool Pay-as-you-go large-scale annotation GATE Mímir GATE Mímir is an indexing system for GATE documents. Mímir can index: Text: the original
More informationPROMT Technologies for Translation and Big Data
PROMT Technologies for Translation and Big Data Overview and Use Cases Julia Epiphantseva PROMT About PROMT EXPIRIENCED Founded in 1991. One of the world leading machine translation provider DIVERSIFIED
More informationIntelligent Personal Assistants: A view from the intersection of Data Mining and Mobile Services
Intelligent Personal Assistants: A view from the intersection of Data Mining and Mobile Services Y. Akinaga S. Mukherjee [speaker] H.-S. Shin P. Subasic R. Sujithan Y. Sun H. Yin DOCOMO Innovations, Inc.,
More informationTAUS 2015. Membership Program (Executive Overview) write to memberservices@taus.net to request the 35 pages detailed service overview. www.taus.
TAUS 2015 Membership Program (Executive Overview) write to memberservices@taus.net to request the 35 pages detailed service overview www.taus.net Five Reasons to be a TAUS Member 1. Access the collaborative
More informationDutch Parallel Corpus
Dutch Parallel Corpus Lieve Macken lieve.macken@hogent.be LT 3, Language and Translation Technology Team Faculty of Applied Language Studies University College Ghent November 29th 2011 Lieve Macken (LT
More informationHow do I translate...?
How do I translate...? Professional Translation Hybrid Translation Machine Translation Certified Translation Supported Formats Language Codes Toll Free: (800)790-3680 Professional Translation Hybrid Translation
More informationApproach to providing interpreting and information in translation or alternative format. www.homesforharingey.org 020 8489 5611 Homes for Haringey Ltd
Approach to providing interpreting and information in translation or alternative format www.homesforharingey.org 020 8489 5611 Homes for Haringey Ltd 1. Purpose of this document This document sets out
More informationMobile and Cloud computing and SE
Mobile and Cloud computing and SE This week normal. Next week is the final week of the course Wed 12-14 Essay presentation and final feedback Kylmämaa Kerkelä Barthas Gratzl Reijonen??? Thu 08-10 Group
More informationIdentifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of
More informationFREE computing using Amazon EC2
FREE computing using Amazon EC2 Seong-Hwan Jun 1 1 Department of Statistics Univ of British Columbia Nov 1st, 2012 / Student seminar Outline Basics of servers Amazon EC2 Setup R on an EC2 instance Stat
More informationLanguage Access Implementation Guide. New York City Department of Homeless Services
Language Access Implementation Guide New York City Department of Homeless Services 1 I. Agency Mission and Background I. Agency Mission and Background THE MISSION OF THE DEPARTMENT OF HOMELESS SERVICES
More informationAre You Paying Too Much for ediscovery Processing?
Are You Paying Too Much for ediscovery Processing? How new technologies can accelerate ediscovery and lower your costs Guy MacNeill Product Manager, Lexbe LC ediscovery Webinar Series About our webinars
More informationHow to localise your website
How to localise your website Grow your e-commerce website globally 22 October 2015 Levent Yildizgoren Managing Director - TTC wetranslate Ltd About me 22 years industry experience Mentor, speaker, charity
More informationEvaluation Metrics for Persuasive NLP with Google AdWords
Evaluation Metrics for Persuasive NLP with Google AdWords Marco Guerini, Carlo Strapparava, Oliviero Stock FBK-Irst Via Sommarive 18, Povo, I-38100 Trento {guerini, strappa, stock}@fbk.eu Abstract Evaluating
More informationLocalizing Your Mobile App is Good for Business
Global Insight Localizing Your Mobile App is Good for Business Simply put, the more people who can find and use your mobile application in their native language, the larger your potential market. But launching
More informationBig Data and Scripting. (lecture, computer science, bachelor/master/phd)
Big Data and Scripting (lecture, computer science, bachelor/master/phd) Big Data and Scripting - abstract/organization abstract introduction to Big Data and involved techniques lecture (2+2) practical
More informationLanguage Translation Services RFP Issued: January 1, 2015
Language Translation Services RFP Issued: January 1, 2015 The following are answers to questions Brand USA has received to the RFP for Language Translation Services. Thanks to everyone who submitted questions
More informationInteroperability, Standards and Open Advancement
Interoperability, Standards and Open Eric Nyberg 1 Open Shared resources & annotation schemas Shared component APIs Shared datasets (corpora, test sets) Shared software (open source) Shared configurations
More informationAligning the World Language Content Standards for California Public Schools with the Common Core Standards. Table of Contents
Aligning the World Language Standards for California Public Schools with the Common Core Standards Table of s Page Introduction to World Language Standards for California Public Schools (WLCS) 2 World
More informationANNLOR: A Naïve Notation-system for Lexical Outputs Ranking
ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking Anne-Laure Ligozat LIMSI-CNRS/ENSIIE rue John von Neumann 91400 Orsay, France annlor@limsi.fr Cyril Grouin LIMSI-CNRS rue John von Neumann 91400
More informationD3.3.1: Sematic tagging and open data publication tools
COMPETITIVINESS AND INNOVATION FRAMEWORK PROGRAMME CIP-ICT-PSP-2013-7 Pilot Type B WP3 Service platform integration and deployment in cloud infrastructure D3.3.1: Sematic tagging and open data publication
More informationINTERNATIONAL LANGUAGE STUDIES Spanish Option University Transfer Degree
INTERNATIONAL LANGUAGE STUDIES Spanish Option University Transfer Degree Degree Awarded: Associate in Arts General Education Requirements Credit Hours: 37 See the General Education Requirements for the
More informationMachine Learning Approach To Augmenting News Headline Generation
Machine Learning Approach To Augmenting News Headline Generation Ruichao Wang Dept. of Computer Science University College Dublin Ireland rachel@ucd.ie John Dunnion Dept. of Computer Science University
More informationA prototype infrastructure for D Spin Services based on a flexible multilayer architecture
A prototype infrastructure for D Spin Services based on a flexible multilayer architecture Volker Boehlke 1,, 1 NLP Group, Department of Computer Science, University of Leipzig, Johanisgasse 26, 04103
More informationCustomizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
More informationTHUTR: A Translation Retrieval System
THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for
More informationLanguage technologies for Education: recent results by the MLLP group
Language technologies for Education: recent results by the MLLP group Alfons Juan 2nd Internet of Education Conference 2015 18 September 2015, Sarajevo Contents The MLLP research group 2 translectures
More informationYour Virtual Workforce. On Demand. Worldwide. COMPANY PRESENTATION. clickworker GmbH 2015
Your Virtual Workforce. On Demand. Worldwide. COMPANY PRESENTATION 2015 CLICKWORKER AT A GLANCE Segment: Paid Crowdsourcing / Microtasking Services: Text Creation (incl. SEO Texts), Internet Research,
More informationReference Guide: Approved Vendors for Translation and In-Person Interpretation Services
Reference Guide: Approved Vendors for Translation and In-Person Interpretation Services What you need to know The government of D.C. has identified, vetted, and engaged four vendors in a citywide contract
More informationHEC MBA CV Template. The document attached will provide you with the template and format required for your HEC CV.
CV Template The document attached will provide you with the template and format required for your HEC CV. The template is filled with instructions and other place-holding information. You will need to
More informationSIPAC. Signals and Data Identification, Processing, Analysis, and Classification
SIPAC Signals and Data Identification, Processing, Analysis, and Classification Framework for Mass Data Processing with Modules for Data Storage, Production and Configuration SIPAC key features SIPAC is
More informationARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH
Journal of Computer Science 9 (7): 922-927, 2013 ISSN: 1549-3636 2013 doi:10.3844/jcssp.2013.922.927 Published Online 9 (7) 2013 (http://www.thescipub.com/jcs.toc) ARABIC PERSON NAMES RECOGNITION BY USING
More informationNative Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
More information31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
More informationDeborah W. Robinson, Ph.D., World Languages Consultant, Ohio Department of Education. Debbie.robinson@ode.state.oh.us
Dear Colleagues, Language educators in Ohio now have multiple means to award credit by proficiency. To help you develop local policies around how much credit to award based on a proficiency rating, I have
More informationDeveloping Microsoft SharePoint Server 2013 Advanced Solutions
Course 20489B: Developing Microsoft SharePoint Server 2013 Advanced Solutions Course Details Course Outline Module 1: Creating Robust and Efficient Apps for SharePoint In this module, you will review key
More informationHigher Education Heritage Language Profile Portland State University, Portland, OR
Higher Education Heritage Language Profile Portland State University, Portland, OR Address: Contact: 393 Newberger Hall 724 SW Harrison, Portland, OR 97201 Linda Godson, PhD. Coordinator, Heritage Language
More informationMethods of Social Media Research: Data Collection & Use in Social Media
Methods of Social Media Research: Data Collection & Use in Social Media Florida State University College of Communication and Information Sanghee Oh shoh@cci.fsu.edu Overview Introduction to myself Introduction
More informationAutomatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
More informationFine-grained German Sentiment Analysis on Social Media
Fine-grained German Sentiment Analysis on Social Media Saeedeh Momtazi Information Systems Hasso-Plattner-Institut Potsdam University, Germany Saeedeh.momtazi@hpi.uni-potsdam.de Abstract Expressing opinions
More informationThird Party Pre Grant Prior Art Submission
Third Party Pre Grant Prior Art Submission Mark H. Webbink Senior Lecturing Fellow Duke University School of Law Prior Rules for Third Party Submissions Peer To Patent AIA Third Party Submission Changes
More informationMachine Translation Ensuring Meaningful Access for Limited English Proficient Individuals
Machine Translation Ensuring Meaningful Access for Limited English Proficient Individuals Michael Mulé, DOJ Robin Runge, CRC U.S. Department of Labor Civil Rights Center June 24, 2014 Overview Civil Rights
More informationExtreme Computing. Big Data. Stratis Viglas. School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk. Stratis Viglas Extreme Computing 1
Extreme Computing Big Data Stratis Viglas School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk Stratis Viglas Extreme Computing 1 Petabyte Age Big Data Challenges Stratis Viglas Extreme Computing
More informationFulfilling World Language Requirements through Alternate Means
Fulfilling World Language Requirements through Alternate Means OUSD Board Policy 6146.1 allows students to meet graduation requirements through demonstration of proficiency. Both University of California
More informationwebcertain Recruitment pack Ceri Wright [Pick the date]
Recruitment pack Ceri Wright [Pick the date] SEO Executive Have you recently caught the SEO bug and looking to develop your skills and career in a rapidly growing agency? If your answer is YES then Webcertain
More informationBig Data and Scripting
Big Data and Scripting 1, 2, Big Data and Scripting - abstract/organization contents introduction to Big Data and involved techniques schedule 2 lectures (Mon 1:30 pm, M628 and Thu 10 am F420) 2 tutorials
More informationRecent developments in machine translation policy at the European Patent Office
Recent developments in machine translation policy at the European Patent Office Dr Georg Artelsmair Director European Co-operation European Patent Office Brussels, 17 November 2010 The European Patent
More informationA Publicly Available Annotated Corpus for Supervised Email Summarization
A Publicly Available Annotated Corpus for Supervised Email Summarization Jan Ulrich, Gabriel Murray, and Giuseppe Carenini Department of Computer Science University of British Columbia, Canada {ulrichj,
More informationCSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
More informationEnhancing Document Review Efficiency with OmniX
Xerox Litigation Services OmniX Platform Review Technical Brief Enhancing Document Review Efficiency with OmniX Xerox Litigation Services delivers a flexible suite of end-to-end technology-driven services,
More informationCOMPSCI 760 S2 C 2014 Machine Learning and Data Mining Computer Science Department
COMPSCI 760 S2 C 2014 Machine Learning and Data Mining Computer Science Department Research Projects 2014 This year students will work in groups of 3 on projects. Each project will be a small research
More informationBYODs & FAIR Data Stewardship
BYODs & FAIR Data Stewardship Luiz Olavo Bonino luiz.bonino@dtls.nl www.elixir-europe.org Summary FAIR Data stewardship Approach in NL BYOD FAIR Data tooling ecosystem Way of working (FAIR) Data Stewardship
More informationSupervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization
Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization Luís Marujo 1,2, Anatole Gershman 1, Jaime Carbonell 1, Robert Frederking 1,
More informationA crowdsourcing framework for software localization
Universitat Politècnica de Catalunya(UPC) Facultat d Informàtica de Barcelona(FIB) KTH Royal Institute of Technology School of Information and Communication Technology Degree project in European Master
More informationhttp://conference.ifla.org/ifla77 Date submitted: June 1, 2011
http://conference.ifla.org/ifla77 Date submitted: June 1, 2011 Lost in Translation the challenges of multilingualism Martin Flynn Victoria and Albert Museum London, United Kingdom E-mail: m.flynn@vam.ac.uk
More informationDublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,
More informationLeveraging Business to Consumer Learning for Marketing, Training, and Support of Customers
Leveraging Business to Consumer Learning for Marketing, Training, and Support of Customers Developed by rapidld Steve Owens Vice President, Consulting Fall, 2013 2013 Rapid Learning Deployment, LLC B2C
More informationA Primer On Metadata Analysis
A Primer On Metadata Analysis Jeffrey Lewis Records Management Program Manager SOL Capital Management Co. @Info_Currency What Is Metadata Data About Data Is used to relate informagon to other pieces informagon
More informationHow To Help The European Single Market With Data And Information Technology
Connecting Europe for New Horizon European activities in the area of Big Data Márta Nagy-Rothengass DG CONNECT, Head of Unit "Data Value Chain" META-Forum 2013, 19 September 2013, Berlin OUTLINE 1. Data
More informationBridging the Online Language Barriers with Machine Translation at the United Nations
Bridging the Online Language Barriers with Machine Translation at the United Nations Fernando Serván! Food and Agriculture Organization of the! United Nations (FAO),! Rome, Italy! About FAO - International
More informationChapter 12: Advanced topic Web 2.0
Chapter 12: Advanced topic Web 2.0 Contents Web 2.0 DOM AJAX RIA Web 2.0 "Web 2.0" refers to the second generation of web development and web design that facilities information sharing, interoperability,
More information