DiaCollo: On the trail of diachronic collocations
|
|
|
- Christiana Jacobs
- 10 years ago
- Views:
Transcription
1 DiaCollo: On the trail of diachronic collocations Bryan Jurish AG Elektronisches Publizieren Historische Semantik und Semantic Web Heidelberger Akademie der Wissenschaften 14 th 16 th September, 2015
2 Overview The Situation Diachronic Text Corpora Collocation Profiling Diachronic Collocation Profiling DiaCollo Requests & Parameters Profile, Diffs & Indices Association Score Functions Examples Summary & Outlook
3 The Situation: Diachronic Text Corpora heterogeneous text collections, especially with respect to date of origin other partitionings potentially relevant too, e.g. by author, text class, etc. increasing number available for linguistic & humanities research, e.g. Deutsches Textarchiv (DTA) (Geyken et al. 2011) Referenzkorpus Altdeutsch (DDD) (Richling 2011) Corpus of Historical American English (COHA) (Davies 2012)... but even putatively synchronic corpora have a temporal extension, e.g. DWDS/ZEIT ( Kohl ) ( ) DWDS/Blogs ( Browser ) ( ) DDR Presseportal ( ) should reveal temporal phenomena such as semantic shift problematic for conventional natural language processing tools implicit assumptions of homogeneity
4 The Situation: Collocation Profiling You shall know a word by the company it keeps J. R. Firth Basic Idea (Church & Hanks, 1990; Manning & Schütze 1999; Evert 2005) lookup all candidate collocates (w 2 ) occurring with the target term (w 1 ) rank candidates by association score chance co-occurrences with high-frequency items must be filtered out! statistical methods require large data sample What for? computational lexicography (Kilgarriff & Tugwell 2002; Didakowski & Geyken 2013) neologism detection (Kilgarriff et al. 2015) distributional semantics (Schütze 1992; Sahlgren 2006) text mining / distant reading (Heyer et al. 2006; Moretti 2013)
5 Diachronic Collocation Profiling The Problem: (temporal) heterogeneity conventional collocation extractors assume corpus homogeneity co-occurrence frequencies are computed only for word-pairs (w 1, w 2 ) influence of occurrence date (and other document properties) is irrevocably lost A Solution (sketch) represent terms as n-tuples of independent attributes, including occurrence date partition term vocabulary on-the-fly into user-specified intervals ( date slices ) collect independent slice-wise profiles into final result set Advantages full support for diachronic axis variable query-level granularity flexible attribute selection multiple association scores Drawbacks sparse data requires larger corpora computationally expensive large index size no syntactic relations (yet)
6 DiaCollo: Overview General Background developed to aid CLARIN historians in analyzing discourse topic trends successfully applied to mid-sized and large corpora, e.g. J. G. Dingler s Polytechnisches Journal ( , 19K documents, 35M tokens) Deutsches Textarchiv ( , 2.6K documents, 173M tokens) Implementation DWDS Zeitungen ( , 10M documents, 4.3G tokens) Perl API, command-line, & RESTful DDC/D* web-service plugin + GUI fast native indices over n-tuple inventories, equivalence classes, etc. scalable even in a high-load environment no persistent server process is required native index access via direct file I/O or mmap() system call various output & visualization formats, e.g. TSV, JSON, HTML, d3-cloud
7 DiaCollo: Requests & Parameters request-oriented RESTful service (Fielding 2000) accepts user requests as set of parameter=value pairs Parameter Description parameter passing via URL query string or HTTP POST request common parameters: query date slice groupby score kbest diff global profile format target lemma(ta), regular expression, or DDC query target date(s), interval, or regular expression aggregation granularity or 0 (zero) for a global profile aggregation attributes with optional restrictions score function for collocate ranking maximum number of items to return per date-slice score aggregation function for diff profiles request global profile pruning (vs. default slice-local pruning) profile type to be computed ({native,ddc} {unary,diff}) output format or visualization mode
8 DiaCollo: Profiles, Diffs & Indices Profiles & Diffs simple request unary profile for target term(s) filtered & projected to selected attribute(s) trimmed to k-best collocates for target word(s) aggregated into independent slice-wise sub-intervals (profile, query) (groupby) (score, kbest, global) (date, slice) diff request comparison of two independent targets (profile, bquery,... ) highlights differences or similarities of target queries (diff) can be used to compare different words (query bquery)... or different corpus subsets w.r.t. a given word (e.g. date bdate) Indices & Attributes compile-time filtering of native indices: frequency threshholds, PoS-tags default index attributes: Lemma (l), Pos (p) finer-grained queries possible with DDC back-end
9 DiaCollo: Scoring Functions Supported Score Functions f raw collocation frequency = f 12 lf collocation log-frequency = log 2 (f 12 + ε) mi pointwise MI log-frequency log 2 f 12 N f 1 f 2 log 2 f 12 ld log-dice coefficient (Rychlý 2008) 14 + log 2 2 f 12 f 1 +f 2 Supported Diff Operations diff raw score difference = s a s b adiff absolute score difference = s a s b avg arithmetic average = s a+s b 2 max maximum = max{s a, s b } min minimum = min{s a, s b } havg harmonic average gavg geometric average 2s as b s a +s b s a s b
10 Example 1: Krise ( crisis ) in der ZEIT Berlin blockade aftermath anti-government protests & strikes in France Nixon & Brandt resignations; Iranian revolution Solidarność in Poland; Soviet war in Afghanistan; Schmidt coalition collapses wars in ex-yugoslavia, Kosovo & Chechnya; financial crises in Asia & Mexico global financial crisis 2010 present civil wars in Syria & the Ukraine; Greek bankruptcy
11 Example 1: Selected Word-Clouds : 2010 present:
12 Example 2: Mann vs. Frau in the DTA Disclaimer historical corpus data can reveal persistent cultural biases linked collocation data does not reflect the opinions of this author or the BBAW! Observations fixed & formulaic expressions very prominent gnädige Frau Frau X geborene Y der gemeine Mann pretty much exclusively cultural bias: Mann berühmt, ehrlich, gelehrt, tapfer, weise,... Frau betrübt, lieb, schön, tugendreich, verwitwet,... differences grow less pronounced in late 18 th & 19 th centuries (masculine variant: gnädiger Herr) (birth- vs. married surname) (masculine generic)
13 Example 2: Selected Word-Clouds : :
14 Example 3: 400 Years of Potables query: "(Getränk gn-sub WITH $p=nn)=2 (trinken WITH $p=/vv[ip]/)" #FMIN 1 Remarks uses DDC back-end for fine-grained data acquisition uses GermaNet thesaurus-based lexical expansion for Getränk ( beverage ) considers only those target terms immediately preceding verb trinken ( to drink ) global profile uses shared target-set Observations near-constants: Bier, Milch, Wasser, Wein ( beer, milk, water, wine ) : Tee, Kaffee, Schokolade ( tea, coffee, chocolate ) appear : Schnaps displaces Branntwein; Champagner appears : Alkohol ( alcohol ) as category of beverages : Kognak, Saft, Sekt, Whisky ( cognac, juice, sparkling wine, whisky )
15 Example 3: Selected Word-Clouds : :
16 Summary & Outlook Diachronic Collocation Profiling diachronic text corpora conventional tools diachronic profiling DiaCollo on-the-fly corpus partitioning attribute-wise term indices diff profile mode DDC/D* integration RESTful web service semantic shift, discourse trends implicit assumptions of homogeneity date-dependent lexemes arbitrary query granularity flexible result filtering direct comparison fine-grained queries, corpus KWIC links external API, online visualization Future Work distributional semantic profiles (Berry et al. 1995; Blei et al., 2003) cross-product visualizations (Barnes & Hut 1986)... and more!
17 The End Thank you for listening!
Simple maths for keywords
Simple maths for keywords Adam Kilgarriff Lexical Computing Ltd [email protected] Abstract We present a simple method for identifying keywords of one corpus vs. another. There is no one-sizefits-all
Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services
Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services speakers: Kai Zimmer and Jörg Didakowski Clarin Workshop WP2 February 2009 BBAW/DWDS The BBAW and its 40 longterm projects
Real-Time Identification of MWE Candidates in Databases from the BNC and the Web
Real-Time Identification of MWE Candidates in Databases from the BNC and the Web Identifying and Researching Multi-Word Units British Association for Applied Linguistics Corpus Linguistics SIG Oxford Text
Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1
Korpus-Abfrage: Werkzeuge und Sprachen Gastreferat zur Vorlesung Korpuslinguistik mit und für Computerlinguistik Charlotte Merz 3. Dezember 2002 Motivation Lizentiatsarbeit: A Corpus Query Tool for Automatically
WebLicht: Web-based LRT services for German
WebLicht: Web-based LRT services for German Erhard Hinrichs, Marie Hinrichs, Thomas Zastrow Seminar für Sprachwissenschaft, University of Tübingen [email protected] Abstract This software
Check list for web developers
Check list for web developers Requirement Yes No Remarks 1. Input Validation 1.1) Have you done input validation for all the user inputs using white listing and/or sanitization? 1.2) Does the input validation
From Terminology Extraction to Terminology Validation: An Approach Adapted to Log Files
Journal of Universal Computer Science, vol. 21, no. 4 (2015), 604-635 submitted: 22/11/12, accepted: 26/3/15, appeared: 1/4/15 J.UCS From Terminology Extraction to Terminology Validation: An Approach Adapted
Die Vielfalt vereinen: Die CLARIN-Eingangsformate CMDI und TCF
Die Vielfalt vereinen: Die CLARIN-Eingangsformate CMDI und TCF Susanne Haaf & Bryan Jurish Deutsches Textarchiv 1. The Metadata Format CMDI Metadata? Metadata Format? and more Metadata? Metadata Format?
Real-time Streaming Analysis for Hadoop and Flume. Aaron Kimball odiago, inc. OSCON Data 2011
Real-time Streaming Analysis for Hadoop and Flume Aaron Kimball odiago, inc. OSCON Data 2011 The plan Background: Flume introduction The need for online analytics Introducing FlumeBase Demo! FlumeBase
How To Write A German Reference Corpus Of Computer Mediated Communication
DeRiK: A German Reference Corpus of Computer-Mediated Communication Michael Beißwenger 1, Maria Ermakova 2, Alexander Geyken 2, Lothar Lemnitzer 2, Angelika Storrer 1 1 Department of German Language and
Terminology Extraction from Log Files
Terminology Extraction from Log Files Hassan Saneifar 1,2, Stéphane Bonniol 2, Anne Laurent 1, Pascal Poncelet 1, and Mathieu Roche 1 1 LIRMM - Université Montpellier 2 - CNRS 161 rue Ada, 34392 Montpellier
Folksonomies versus Automatic Keyword Extraction: An Empirical Study
Folksonomies versus Automatic Keyword Extraction: An Empirical Study Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, University of Southampton, Southampton, SO17 1BJ, UK {hsak04r/hcd}@ecs.soton.ac.uk
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
EXMARaLDA and the FOLK tools two toolsets for transcribing and annotating spoken language
EXMARaLDA and the FOLK tools two toolsets for transcribing and annotating spoken language Thomas Schmidt Institut für Deutsche Sprache, Mannheim R 5, 6-13 D-68161 Mannheim [email protected]
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer Reviews Minqing Hu and Bing Liu Department of Computer Science University of Illinois at Chicago 851 South Morgan Street Chicago, IL 60607-7053 {mhu1, liub}@cs.uic.edu
Corpus and Discourse. The Web As Corpus. Theory and Practice MARISTELLA GATTO LONDON NEW DELHI NEW YORK SYDNEY
Corpus and Discourse The Web As Corpus Theory and Practice MARISTELLA GATTO B L O O M S B U R Y LONDON NEW DELHI NEW YORK SYDNEY Contents List of Figures xiii List of Tables xvii Preface xix Acknowledgements
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
Terminology Extraction from Log Files
Terminology Extraction from Log Files Hassan Saneifar, Stéphane Bonniol, Anne Laurent, Pascal Poncelet, Mathieu Roche To cite this version: Hassan Saneifar, Stéphane Bonniol, Anne Laurent, Pascal Poncelet,
Big Data for Satellite Business Intelligence
Big Data for Satellite Business Intelligence GSAW 2015 Loic COULET, Kratos ISE 2015 by Kratos ISE. Published by The Aerospace Corporation with permission. Who s talking? Computer Science Passionate Kratos
Generation of Word Profiles for large German corpora
Generation of Word Profiles for large German corpora Alexander Geyken, Alexander Siebert and Jörg Didakowski 1. Introduction Electronic corpora have been used in lexicography and the domain of language
Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2
Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,
Data Deduplication in Slovak Corpora
Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, Slovakia Abstract. Our paper describes our experience in deduplication of a Slovak corpus. Two methods of deduplication a plain
Information Retrieval Elasticsearch
Information Retrieval Elasticsearch IR Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches
Security Analytics Topology
Security Analytics Topology CEP = Stream Analytics Hadoop = Batch Analytics Months to years LOGS PKTS Correlation with Live in Real Time Meta, logs, select payload Decoder Long-term, intensive analysis
Schema documentation for types1.2.xsd
Generated with oxygen XML Editor Take care of the environment, print only if necessary! 8 february 2011 Table of Contents : ""...........................................................................................................
LDIF - Linked Data Integration Framework
LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany [email protected],
Sisense. Product Highlights. www.sisense.com
Sisense Product Highlights Introduction Sisense is a business intelligence solution that simplifies analytics for complex data by offering an end-to-end platform that lets users easily prepare and analyze
aaps algacom Account Provisioning System
aaps algacom Account Provisioning System Simple web interface, data integrity checks and customizable policies allow account administration without specific skills Account provisioning against Active Directory
Sitecore Health. Christopher Wojciech. netzkern AG. [email protected]. Sitecore User Group Conference 2015
Sitecore Health Christopher Wojciech netzkern AG [email protected] Sitecore User Group Conference 2015 1 Hi, % Increase in Page Abondonment 40% 30% 20% 10% 0% 2 sec to 4 2 sec to 6 2 sec
City Data Pipeline. A System for Making Open Data Useful for Cities. [email protected]
City Data Pipeline A System for Making Open Data Useful for Cities Stefan Bischof 1,2, Axel Polleres 1, and Simon Sperl 1 1 Siemens AG Österreich, Siemensstraße 90, 1211 Vienna, Austria {bischof.stefan,axel.polleres,simon.sperl}@siemens.com
New Features... 1 Installation... 3 Upgrade Changes... 3 Fixed Limitations... 4 Known Limitations... 5 Informatica Global Customer Support...
Informatica Corporation B2B Data Exchange Version 9.5.0 Release Notes June 2012 Copyright (c) 2006-2012 Informatica Corporation. All rights reserved. Contents New Features... 1 Installation... 3 Upgrade
Milk, bread and toothpaste : Adapting Data Mining techniques for the analysis of collocation at varying levels of discourse
Milk, bread and toothpaste : Adapting Data Mining techniques for the analysis of collocation at varying levels of discourse Rob Sanderson, Matthew Brook O Donnell and Clare Llewellyn What happens with
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Expense Management. Configuration and Use of the Expense Management Module of Xpert.NET
Expense Management Configuration and Use of the Expense Management Module of Xpert.NET Table of Contents 1 Introduction 3 1.1 Purpose of the Document.............................. 3 1.2 Addressees of the
TDAQ Analytics Dashboard
14 October 2010 ATL-DAQ-SLIDE-2010-397 TDAQ Analytics Dashboard A real time analytics web application Outline Messages in the ATLAS TDAQ infrastructure Importance of analysis A dashboard approach Architecture
COURSE SYLLABUS COURSE TITLE:
1 COURSE SYLLABUS COURSE TITLE: FORMAT: CERTIFICATION EXAMS: 55043AC Microsoft End to End Business Intelligence Boot Camp Instructor-led None This course syllabus should be used to determine whether the
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University [email protected] Kapil Dalwani Computer Science Department
Service-Oriented Architecture and Software Engineering
-Oriented Architecture and Software Engineering T-86.5165 Seminar on Enterprise Information Systems (2008) 1.4.2008 Characteristics of SOA The software resources in a SOA are represented as services based
How to Schedule Report Execution and Mailing
SAP Business One How-To Guide PUBLIC How to Schedule Report Execution and Mailing Release Family 8.8 Applicable Releases: SAP Business One 8.81 PL10 and PL11 SAP Business One 8.82 PL01 and later All Countries
Distributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
Big Data and Scripting map/reduce in Hadoop
Big Data and Scripting map/reduce in Hadoop 1, 2, parts of a Hadoop map/reduce implementation core framework provides customization via indivudual map and reduce functions e.g. implementation in mongodb
MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context
MapReduce Jeffrey Dean and Sanjay Ghemawat Background context BIG DATA!! o Large-scale services generate huge volumes of data: logs, crawls, user databases, web site content, etc. o Very useful to be able
VMware vcenter Log Insight User's Guide
VMware vcenter Log Insight User's Guide vcenter Log Insight 1.0 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition.
Testing the API behind a mobile app
Testing the API behind a mobile app Marc van t Veer, Polteq www.eurostarconferences.com @esconfs #esconfs Testing the API behind a mobile app System testing in a cloud enabling project Marc van t Veer
ORACLE USER PRODUCTIVITY KIT USAGE TRACKING ADMINISTRATION & REPORTING RELEASE 3.6 PART NO. E17087-01
ORACLE USER PRODUCTIVITY KIT USAGE TRACKING ADMINISTRATION & REPORTING RELEASE 3.6 PART NO. E17087-01 FEBRUARY 2010 COPYRIGHT Copyright 1998, 2009, Oracle and/or its affiliates. All rights reserved. Part
Sybase Unwired Platform 2.0
white paper Sybase Unwired Platform 2.0 Development Paradigm www.sybase.com TABLE OF CONTENTS 1 Sybase Unwired Platform 1 Mobile Application Development 2 Mobile Business Object (MBO) Development 4 Mobile
Relational Database: Additional Operations on Relations; SQL
Relational Database: Additional Operations on Relations; SQL Greg Plaxton Theory in Programming Practice, Fall 2005 Department of Computer Science University of Texas at Austin Overview The course packet
the missing log collector Treasure Data, Inc. Muga Nishizawa
the missing log collector Treasure Data, Inc. Muga Nishizawa Muga Nishizawa (@muga_nishizawa) Chief Software Architect, Treasure Data Treasure Data Overview Founded to deliver big data analytics in days
Testing the API behind a mobile app. Tutorial Marc van t Veer
Testing the API behind a mobile app Tutorial Marc van t Veer Content What is an API Why use an API How to use an API How-to test an API Exercises Group exercises (learning concepts 7) Individual exercises
A Plan for the Continued Development of the DNS Statistics Collector
A Plan for the Continued Development of the DNS Statistics Collector Background The DNS Statistics Collector ( DSC ) software was initially developed under the National Science Foundation grant "Improving
DRAFT! c January 7, 1999 Christopher Manning & Hinrich Schütze. 141. 5 Collocations
DRAFT! c January 7, 1999 Christopher Manning & Hinrich Schütze. 141 5 Collocations COMPOSITIONALITY TERM TECHNICAL TERM TERMINOLOGICAL PHRASE A COLLOCATION is an expression consisting of two or more words
automates system administration for homogeneous and heterogeneous networks
IT SERVICES SOLUTIONS SOFTWARE IT Services CONSULTING Operational Concepts Security Solutions Linux Cluster Computing automates system administration for homogeneous and heterogeneous networks System Management
Component Based Rapid OPC Application Development Platform
Component Based Rapid OPC Application Development Platform Jouni Aro Prosys PMS Ltd, Tekniikantie 21 C, FIN-02150 Espoo, Finland Tel: +358 (0)9 2517 5401, Fax: +358 (0) 9 2517 5402, E-mail: [email protected],
Value-added Services for 3D City Models using Cloud Computing
Value-added Services for 3D City Models using Cloud Computing Javier HERRERUELA, Claus NAGEL, Thomas H. KOLBE (javier.herreruela claus.nagel thomas.kolbe)@tu-berlin.de Institute for Geodesy and Geoinformation
A Performance Analysis of Distributed Indexing using Terrier
A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search
Log Analysis with the ELK Stack (Elasticsearch, Logstash and Kibana) Gary Smith, Pacific Northwest National Laboratory
Log Analysis with the ELK Stack (Elasticsearch, Logstash and Kibana) Gary Smith, Pacific Northwest National Laboratory A Little Context! The Five Golden Principles of Security! Know your system! Principle
Varnish the Drupal way
Varnish the Drupal way About me Boyan Borisov Team Leader @ Propeople [email protected] @boyan_borisov Skype: boian.borisov hap://linkedin.com/in/ boyanborisov What is Varnish? Reverse proxy cache server...
IT Service Level Management 2.1 User s Guide SAS
IT Service Level Management 2.1 User s Guide SAS The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2006. SAS IT Service Level Management 2.1: User s Guide. Cary, NC:
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
Flexible Routing and Load Control on Back-End Servers. Controlling the Request Load and Quality of Service
ORACLE TRAFFIC DIRECTOR KEY FEATURES AND BENEFITS KEY FEATURES AND BENEFITS FAST, RELIABLE, EASY-TO-USE, SECURE, AND SCALABLE LOAD BALANCER [O.SIDEBAR HEAD] KEY FEATURES Easy to install, configure, and
Search and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
ArcGIS Server Security Threats & Best Practices 2014. David Cordes Michael Young
ArcGIS Server Security Threats & Best Practices 2014 David Cordes Michael Young Agenda Introduction Threats Best practice - ArcGIS Server settings - Infrastructure settings - Processes Summary Introduction
Index. AdWords, 182 AJAX Cart, 129 Attribution, 174
Index A AdWords, 182 AJAX Cart, 129 Attribution, 174 B BigQuery, Big Data Analysis create reports, 238 GA-BigQuery integration, 238 GA data, 241 hierarchy structure, 238 query language (see also Data selection,
What Makes a Good Online Dictionary? Empirical Insights from an Interdisciplinary Research Project
Proceedings of elex 2011, pp. 203-208 What Makes a Good Online Dictionary? Empirical Insights from an Interdisciplinary Research Project Carolin Müller-Spitzer, Alexander Koplenig, Antje Töpel Institute
Introduction to Text Mining. Module 2: Information Extraction in GATE
Introduction to Text Mining Module 2: Information Extraction in GATE The University of Sheffield, 1995-2013 This work is licenced under the Creative Commons Attribution-NonCommercial-ShareAlike Licence
... Introduction... 17. ... Acknowledgments... 19
... Introduction... 17... Acknowledgments... 19 PART I... Getting Started... 21 1... Introduction to Mobile App Development... 23 1.1... The Mobile Market and SAP... 23 1.1.1... Growth of Smart Devices...
Clustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
ProxySG TechBrief Implementing a Reverse Proxy
ProxySG TechBrief Implementing a Reverse Proxy What is a reverse proxy? The Blue Coat ProxySG provides the basis for a robust and flexible Web communications solution. In addition to Web policy management,
UniGR Workshop: Big Data «The challenge of visualizing big data»
Dept. ISC Informatics, Systems & Collaboration UniGR Workshop: Big Data «The challenge of visualizing big data» Dr Ir Benoît Otjacques Deputy Scientific Director ISC The Future is Data-based Can we help?
Introducing Apache Pivot. Greg Brown, Todd Volkert 6/10/2010
Introducing Apache Pivot Greg Brown, Todd Volkert 6/10/2010 Speaker Bios Greg Brown Senior Software Architect 15 years experience developing client and server applications in both services and R&D Apache
Version Control Your Jenkins Jobs with Jenkins Job Builder
Version Control Your Jenkins Jobs with Jenkins Job Builder Abstract Wayne Warren [email protected] Puppet Labs uses Jenkins to automate building and testing software. While we do derive benefit from
Running a Workflow on a PowerCenter Grid
Running a Workflow on a PowerCenter Grid 2010-2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)
SQL Server Administrator Introduction - 3 Days Objectives
SQL Server Administrator Introduction - 3 Days INTRODUCTION TO MICROSOFT SQL SERVER Exploring the components of SQL Server Identifying SQL Server administration tasks INSTALLING SQL SERVER Identifying
CRITEO INTERNSHIP PROGRAM 2015/2016
CRITEO INTERNSHIP PROGRAM 2015/2016 A. List of topics PLATFORM Topic 1: Build an API and a web interface on top of it to manage the back-end of our third party demand component. Challenge(s): Working with
A Mixed Trigrams Approach for Context Sensitive Spell Checking
A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA [email protected], [email protected]
Handling of "Dynamically-Exchanged Session Parameters"
Ingenieurbüro David Fischer AG A Company of the Apica Group http://www.proxy-sniffer.com Version 5.0 English Edition 2011 April 1, 2011 Page 1 of 28 Table of Contents 1 Overview... 3 1.1 What are "dynamically-exchanged
Responsive, resilient, elastic and message driven system
Responsive, resilient, elastic and message driven system solving scalability problems of course registrations Janina Mincer-Daszkiewicz, University of Warsaw [email protected] Dundee, 2015-06-14 Agenda
How is it helping? PragmatiQa XOData : Overview with an Example. P a g e 1 12. Doc Version : 1.3
XOData is a light-weight, practical, easily accessible and generic OData API visualizer / data explorer that is useful to developers as well as business users, business-process-experts, Architects etc.
Certified Selenium Professional VS-1083
Certified Selenium Professional VS-1083 Certified Selenium Professional Certified Selenium Professional Certification Code VS-1083 Vskills certification for Selenium Professional assesses the candidate
Dave Haseman, Ross. Hightower. Mobile Development for SAP* ^>. Galileo Press. Bonn. Boston
Dave Haseman, Ross Hightower Mobile Development for SAP* -a ^>. Galileo Press # Bonn Boston Introduction 17 Acknowledgments 19 PART I Getting Started 1.1 The Mobile Market and SAP 23 1.1.1 Growth of Smart
Local Culture in Global English:
Local Culture in Global English: a case study of Kultur in Sprache / Sprachwissenschaft in Kulturwissenschaften Josef Schmied Chair English Language & Linguistics Chemnitz University of Technology www.tu-chemnitz.de/phil/english/linguist
Morphological Analysis and Named Entity Recognition for your Lucene / Solr Search Applications
Morphological Analysis and Named Entity Recognition for your Lucene / Solr Search Applications Berlin Berlin Buzzwords 2011, Dr. Christoph Goller, IntraFind AG Outline IntraFind AG Indexing Morphological
Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang
Sense-Tagging Verbs in English and Chinese Hoa Trang Dang Department of Computer and Information Sciences University of Pennsylvania [email protected] October 30, 2003 Outline English sense-tagging
Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu
Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!
An Overview of SAP BW Powered by HANA. Al Weedman
An Overview of SAP BW Powered by HANA Al Weedman About BICP SAP HANA, BOBJ, and BW Implementations The BICP is a focused SAP Business Intelligence consulting services organization focused specifically
Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov. 2008 [Folie 1]
Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds
Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov. 2008 [Folie 1]
Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds
Big Table A Distributed Storage System For Data
Big Table A Distributed Storage System For Data OSDI 2006 Fay Chang, Jeffrey Dean, Sanjay Ghemawat et.al. Presented by Rahul Malviya Why BigTable? Lots of (semi-)structured data at Google - - URLs: Contents,
Homework 4 Statistics W4240: Data Mining Columbia University Due Tuesday, October 29 in Class
Problem 1. (10 Points) James 6.1 Problem 2. (10 Points) James 6.3 Problem 3. (10 Points) James 6.5 Problem 4. (15 Points) James 6.7 Problem 5. (15 Points) James 6.10 Homework 4 Statistics W4240: Data Mining
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process
K@ A collaborative platform for knowledge management
White Paper K@ A collaborative platform for knowledge management Quinary SpA www.quinary.com via Pietrasanta 14 20141 Milano Italia t +39 02 3090 1500 f +39 02 3090 1501 Copyright 2004 Quinary SpA Index
Yandex: Webmaster Tools Overview and Guidelines
Yandex: Webmaster Tools Overview and Guidelines Agenda Introduction Register Features and Tools 2 Introduction What is Yandex Yandex is the leading search engine in Russia. It has nearly 60% market share
