Real-Time Identification of MWE Candidates in Databases from the BNC and the Web
- Alban Underwood
Transcription
1 Real-Time Identification of MWE Candidates in Databases from the BNC and the Web. Identifying and Researching Multi-Word Units, British Association for Applied Linguistics Corpus Linguistics SIG, Oxford Text Archive, Oxford, 21 April 2005. William H. Fletcher, United States Naval Academy (Radboud University of Nijmegen)
2 Objectives of Presentation: describe background and biases; define key terms elastically; outline my software applications; sketch range of uses and target audiences envisioned; show and compare MRR and MI; encourage feedback and suggestions for further development.
3 Background and Biases (1 of 4). Multimedia in CALL: user (interface). KWiCFinder, to: identify useful texts; find examples of actual use for teaching and writing; clarify linguistic questions; explore emerging semantic fields; build web-based ad-hoc corpora. Download free from KWiCFinder.com.
4 Background and Biases (2 of 4). kfngram: n-grams and phrase-frames; free, flexible, GUI; fast even on large datasets (20 MW). Phrases in English website: n-grams (n = 1-8); phrase-frames: set of n-gram variants identical in all but one word; PoS-grams: set of n-gram variants with the same sequence of PoS tags; chargrams. Now BNC; sub-corpora, MICASE and ANS to follow.
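The phrase-frame definition above (n-gram variants identical in all but one word) can be sketched in a few lines of Python. This is only an illustration, not the kfngram implementation: the function name `phrase_frames` and the frequency counts are invented for the example.

```python
from collections import defaultdict


def phrase_frames(ngram_counts):
    """Group n-grams into phrase-frames.

    A phrase-frame is an n-gram with one position replaced by the
    wildcard "*"; its variants are all n-grams that match it.
    """
    frames = defaultdict(dict)
    for ngram, freq in ngram_counts.items():
        words = ngram.split()
        for i in range(len(words)):
            frame = " ".join(words[:i] + ["*"] + words[i + 1:])
            frames[frame][ngram] = freq
    return frames


# Invented toy counts, for illustration only.
counts = {
    "united kingdom": 50,
    "united states": 120,
    "united nations": 80,
    "animal kingdom": 7,
}

frames = phrase_frames(counts)
for frame, variants in sorted(frames.items()):
    # Frame, total frequency of its variants, and the variants themselves.
    print(frame, sum(variants.values()), sorted(variants))
```

A frame with only one variant, such as "* states" in this toy data, corresponds to what later slides call a singleton.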
5 Background and Biases (3 of 4). Web as Corpus Search Engine Consortium: initiative of Silvia Bernardini and Marco Baroni, University of Bologna, Forlì. Other WAC enthusiasts: Mr. Collocations Stefan Evert, Sebastian Hoffmann, Adam Kilgarriff and myself. Initial goal: gigaword Web corpus (800M English, 100M each German and Italian).
6 Background and Biases (4 of 4). Emphasis on the practical: reasonable speed, acceptable precision and recall. Motivations: on-the-fly subcorpora for PIE; kfngramdb to overcome kfngram limitations (static lists, straight frequency); managing KWiCFinder ad-hoc Web corpora; better integration of all three tools.
7 Objective: evaluate and compare statistical techniques to identify MWE candidates for: corpus database queries for MWEs with specific lexical items; subsequent screening, either manually or with processing-intensive metrics deemed more effective than those used here.
8 Terms. MWE: cover-term for multi-word units, salient collocations, formulaic expressions. Real-time / on-the-fly: with tolerable delay. Scalable: from kilo- to mega- and gigacorpora. In practice: real time for (sub)corpora 25 MW.
9 Target Audience: kfngramdb. (Corpus) linguists: compare subcorpora in large linguistic databases; identify content domain and text-type for Web as corpus; learn database principles by example on PC. Language professionals: teachers and advanced language learners (readings, instructional materials, examples; identify (MW)Es); writers, L2 / L1 (organize / maintain exemplars for imitatio, personal corpus and reference materials); translators (domain-specific parallel / comparable corpora, possibly compiled ad hoc from Web sources).
10 Relational databases (RDMS): Why? (1 of 3). Organize linguistic data, rapid retrieval; sophisticated queries relating the content of one field or table to others; filter / focus results by relevant criteria; dynamic interactive datasets, not static lists; standard query language SQL: skills transfer to other RDMS; several powerful RDMSs are free; multi-platform (develop on PC, deploy on *nix).
11 Relational databases: Which? (2 of 3). Microsoft Access: + wizards, easy to learn; + produces SQL queries adaptable to other RDMS; + excellent front-end to other RDMSs (e.g. MySQL); - Windows only (MS Office Pro Suite).
12 Relational databases: Which? (3 of 3). MySQL: + free, fast, scalable; + tight integration with PHP for Web interface; ± powerful non-standard SQL extensions; + active development, large, helpful user base; + user-defined C functions callable in queries (e.g. to calculate lexical association metrics); + embeddable in other applications; + multiple platforms.
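The idea of a user-defined function callable inside a query can be illustrated without a MySQL server. The sketch below is a stand-in, not the approach described in the talk: it uses Python's built-in sqlite3 module and registers a Python function (rather than a compiled C function) under the assumed name `mi`; the numbers passed to it are invented.

```python
import math
import sqlite3


def mi(n, f_xy, f_x, f_y):
    """Mutual information of a word pair from corpus size and frequencies."""
    return math.log2(n * f_xy / (f_x * f_y))


conn = sqlite3.connect(":memory:")
# Make the Python function available to SQL as mi(n, f_xy, f_x, f_y).
conn.create_function("mi", 4, mi)

# The metric is now computed inside the query itself.
score = conn.execute("SELECT mi(1048576, 8, 16, 32)").fetchone()[0]
print(score)  # 14.0
```

The point of the design is that ranking and filtering by the metric can then stay in SQL (ORDER BY, WHERE) instead of moving to client code.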
13 Which Lexical Association Metric? Gravity Counts for the boundaries of collocations*: compares Mutual Information, T-score, Dice, Gravity Counts. Gravity Counts take larger context into account; most useful for identifying collocation boundaries, but data-processing intensive. * Daudaravičius, Vidas and Rūta Marcinkevičienė, International Journal of Corpus Linguistics, 9:2 (2004).
14 Mutual Rank Ratio (1 of 4). Paul Deane, Educational Testing Service, "A Nonparametric Method for Extraction of Candidate Phrasal Terms", Association for Computational Linguistics. Lexical association metric for knowledge-free extraction of phrasal terms, identification of MWUs in untagged text. Based on ratio of global to local shared ranks. Performance similar or superior to other metrics in identifying 2- and 3-grams in WordNet when n-grams including the top 160 ranked types are excluded.
15 Mutual Rank Ratio (2 of 4). Shared rank: tied items are assigned the same rank; e.g. if items 10-15 all have frequency 512, the shared rank is (10 + 15) / 2 = 12.5 and the next item is ranked 16 (higher if shared). Global and local rank, e.g. for "United Kingdom": the local rank (LR) is that of the specific n-gram, united kingdom; the global rank (GR) is that of the phrase-frames of which the n-gram is a variant: * kingdom (the k., animal k., his k., united k.) and united * (u. kingdom, u. states, u. nations, u. distillers).
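The shared-rank rule can be sketched as follows: tied items receive the mean of the rank positions they occupy, so six items tied across positions 10 through 15 share rank 12.5 and the next item is ranked 16. The function name `shared_ranks` and the toy frequencies are assumptions made for this illustration.

```python
def shared_ranks(freqs):
    """Rank items by descending frequency; ties share the mean position."""
    ordered = sorted(freqs.items(), key=lambda kv: -kv[1])
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to the last item with the same frequency as item i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2  # mean of first and last position
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared
        i = j + 1
    return ranks


# Toy data: nine distinct high-frequency items, six items tied at 512,
# then one less frequent item.
freqs = {f"w{n}": 1000 - n for n in range(1, 10)}
freqs.update({f"w{n}": 512 for n in range(10, 16)})
freqs["w16"] = 500

ranks = shared_ranks(freqs)
print(ranks["w10"], ranks["w15"], ranks["w16"])  # 12.5 12.5 16.0
```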
16 Mutual Rank Ratio (3 of 4). Formula: $\mathrm{MRR} = \dfrac{\left(GR_{\text{united *}} \times GR_{\text{* kingdom}}\right)^{1/2}}{LR_{\text{united kingdom}}}$, i.e. the nth root of the product of all global (phrase-frame) ranks, divided by the local (n-gram) rank.
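Read literally, the formula on this slide is the geometric mean of the global phrase-frame ranks divided by the local n-gram rank. The sketch below implements only that arithmetic; how the ranks themselves are obtained is left to the caller, and the function name and the rank values are invented for illustration.

```python
import math


def mutual_rank_ratio(global_ranks, local_rank):
    """nth root of the product of the global ranks, over the local rank."""
    n = len(global_ranks)
    return math.prod(global_ranks) ** (1 / n) / local_rank


# Hypothetical ranks for a bigram: GR of "united *", GR of "* kingdom",
# and LR of "united kingdom".
print(mutual_rank_ratio([4.0, 9.0], 3.0))  # 2.0
```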
17 Mutual Rank Ratio: Pros & Cons (4 of 4). + Easy to calculate, especially if n-grams and phrase-frames are already known (PIE, kfngram). + Finds MWUs in untagged text as well as or better than other metrics*. + Weighting reflects Zipfian distribution. - Excludes MWUs with top types (state of the art, matter of principle) and MWUs not in phrase-frames ("singletons"). * If most frequent types are excluded.
18 Mutual Information (1 of 2). Popular metric for finding rare word pairs. Formula (after D & M): $\mathrm{MI}(x,y) = \log_2 \dfrac{N \, f(x,y)}{f(x)\, f(y)}$, where N is the corpus size, f(x,y) the frequency of co-occurrence, and f(x), f(y) the total frequencies in the corpus. Calculated for pairs of words with frequency rank > 150, span 2-4 words; n-grams with these pairs retrieved (could include state of the art, matter of principle).
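As a worked instance of the MI formula, the following sketch evaluates it for two invented word pairs in a hypothetical corpus of 2^20 tokens. The function name and all counts are assumptions chosen so that the arithmetic is easy to follow.

```python
import math


def mutual_information(n, f_xy, f_x, f_y):
    """MI(x, y) = log2(N * f(x, y) / (f(x) * f(y)))."""
    return math.log2(n * f_xy / (f_x * f_y))


N = 2 ** 20  # hypothetical corpus size in tokens

# A rare pair that almost always co-occurs scores high ...
print(mutual_information(N, f_xy=8, f_x=16, f_y=32))  # 14.0

# ... while a very frequent pair can score low, even below zero.
print(mutual_information(N, f_xy=1024, f_x=65536, f_y=65536))  # -2.0
```

The second value shows why MI favours infrequent items: the denominator grows with the product of the individual frequencies.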
19 Mutual Information: Pros & Cons (2 of 2). + Straightforward calculation with parameters needed for some other metrics. + Finds elusive items, including singletons. + Complements MRR. - Strong bias toward the infrequent: "Two co-occurring rare words will show a high score, but two co-occurring frequent words will show a low score." (D & M 325)
20 MRR and MI Compared (1 of 3). Minimal overlap in MWEs (top 500 items: < 20% shared; ranking very different). Complementary: the two metrics identify different sets of interesting MWE candidates. For both, calculation on-the-fly in a series of SQL queries alone is impractical / intractable on a PC for corpora > 5 MW; a hybrid approach with programmatic math is faster and more scalable.
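One way to picture the hybrid approach is to let SQL do the retrieval and joining of counts and to do the arithmetic in program code. The sketch below is an assumption-laden illustration, not the kfngramdb implementation: it uses Python's built-in sqlite3 in place of Access or MySQL, and the table names, column names and counts are all invented.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE unigram (word TEXT PRIMARY KEY, freq INTEGER);
    CREATE TABLE bigram (w1 TEXT, w2 TEXT, freq INTEGER);
""")
# Invented toy counts for a hypothetical corpus of 2**20 tokens.
conn.executemany("INSERT INTO unigram VALUES (?, ?)",
                 [("of", 4096), ("the", 8192), ("ad", 16), ("hoc", 16)])
conn.executemany("INSERT INTO bigram VALUES (?, ?, ?)",
                 [("of", "the", 1024), ("ad", "hoc", 8)])


def mi_candidates(conn, corpus_size, min_freq=2):
    """SQL retrieves and joins the counts; Python computes and sorts MI."""
    rows = conn.execute("""
        SELECT b.w1, b.w2, b.freq, u1.freq, u2.freq
        FROM bigram AS b
        JOIN unigram AS u1 ON u1.word = b.w1
        JOIN unigram AS u2 ON u2.word = b.w2
        WHERE b.freq >= ?
    """, (min_freq,))
    scored = [(w1, w2, math.log2(corpus_size * f_xy / (f_x * f_y)))
              for w1, w2, f_xy, f_x, f_y in rows]
    return sorted(scored, key=lambda row: row[2], reverse=True)


results = mi_candidates(conn, corpus_size=2 ** 20)
for w1, w2, score in results:
    print(w1, w2, score)
# ad hoc 15.0
# of the 5.0
```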
21 MRR and MI Compared (2 of 3). Top-ranked 500 n-grams: by MRR but not by MI; by MI but not by MRR; by both (< 20% of total); in MICASE (MIchigan Corpus of Academic Spoken English, 1.8 MW) and EuroParl (European Parliament transcripts, 500 KW).
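The three-way comparison of top-ranked lists amounts to simple set operations. A minimal sketch, assuming each metric yields a list of n-grams already sorted best-first; the function name and the toy lists are invented and do not reproduce the MICASE or EuroParl results.

```python
def compare_top(mrr_ranked, mi_ranked, k=500):
    """Split the top-k n-grams of two rankings into three groups."""
    top_mrr = set(mrr_ranked[:k])
    top_mi = set(mi_ranked[:k])
    return {
        "mrr_only": top_mrr - top_mi,
        "mi_only": top_mi - top_mrr,
        "both": top_mrr & top_mi,
    }


# Toy rankings, best first (illustrative only).
mrr_ranked = ["you know", "sort of", "in terms of", "ad hoc"]
mi_ranked = ["ad hoc", "vice versa", "et cetera", "you know"]

groups = compare_top(mrr_ranked, mi_ranked, k=3)
share = len(groups["both"]) / 3  # proportion of the top k found by both
print(sorted(groups["both"]), share)
```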
22 MRR and Singletons (3 of 3). In a large tagged corpus (BNC), Mutual Rank Ratio strands many MWE singletons: n-grams lacking a phrase-frame for at least one wildword position. Frequent singletons should be reviewed for potential MWEs. Singletons are less problematic for smaller untagged corpora.
23 Toward Gigacorpora. Today's RDMSs excel at locating and relating millions of records, but do not scale well into the billions. Search engine technology points the way: Doug Cutting's Lucene, an open source text indexer (Java), handles large plain-text collections. Hybrid approach: Lucene to locate documents / passages; RDMS to manage text metadata, markup.
24 Real-Time Identification of MWE Candidates. Reactions and suggestions encouraged.
