AIDA-light: High-Throughput Named-Entity Disambiguation
|
|
|
- Osborn Holt
- 9 years ago
- Views:
Transcription
1 AIDA-light: High-Throughput Named-Entity Disambiguation Ba Dat Nguyen Johannes Hoffart Martin Theobald Gerhard Weikum Max-Planck-Institut für Informatik Saarbrücken, Germany 1
2 2 / 25 Overview Named Entity Disambiguation High-performance Accurate Entity Disambiguation Simplifying Expensive Features Categories and Domains Multi-phase Computation Experiments 2
3 Named Entity Disambiguation ` (NED) NED aims to map mentions of ambiguous names in natural language onto a set of known entities (e.g. YAGO or DBpedia). Text & Mentions correct entities Under Fergie, United won the Premier League title 13 times. Fergie_(singer), an American singer, songwriter, fashion designer, television host and actress. Alex_Ferguson, a former Scottish football manager of Manchester United F.C. Sarah, Duchess_of_York, the former wife of Prince Andrew, Duke of York.... United_Airlines, an American major airline. United_Airways, a Bangladeshi airline. Manchester_United_F.C., an English professional football club.... Premier League, the English professional football league.... 3
4 4 / 25 State-of-the-art NED Systems Accurate Systems: AIDA and Illinois Wikifier: use rich contextual features (and joint inference) emphasis on quality. High-performance Systems: DBpedia Spotlight and TagMe: mention-by-mention inference with more lightweight features emphasis on speed. 4
5 5 / 25 AIDA-light Goal: reconcile efficiency and accuracy. Approach: simplify expensive features. add novel features with low footprint. multi-phase computation. 5
6 Joint Inference over Disambiguation ` Graph Construct an undirected weighted graph between mentions and entities. Compute the best joint mapping sub-graph. Mentions Entities 6
7 7 / 25 Simplify Expensive Features Key-phrases (AIDA): link anchor texts including categories, citation titles, and external references. Key-tokens: extracted from all key-phrases except stop words. Example: AIDA key-phrases: U.S. President, President of the U.S. AIDA-light key-tokens: President, U.S. 7
8 8 / 25 Categories and Domains Entity, Categories and Domains For example: Domain Hierarchy Entity:Premier_League Category:Football_Leagues Domain:Football Domain-Entity Coherence A entity belongs to a domain if it belongs to at least one category of the domain recompute the mention-entity edge s weight under the domain. Entity-Entity Coherence connect entities from the same domain give higher weight to same-domain entity-entity coherence edges. 8
9 9 / 25 e Multi-phase Computation Easy mentions: mentions with very few candidates or with skewed distributions. Update the context by chosen entities (with domains). Better understanding of the context. Reduce the complexity of the later stages. Mentions: Fergie, United, Premier League Mention:Premier League Entity:Premier_League Domain:Football Domain:Football Fergie United Alex Ferguson Fergie Singer Sarah Manchester Duchess of York United F.C. 9
10 10 / 25 Experimental Setup Systems under comparison: AIDA-light AIDA DBpedia Spotlight Performance measures: All systems take the same mentions as the input. Each mention is mapped to one entity in DBpedia YAGO. Mapping a mention of in-kb entity to null is a failure. We apply per-mention precision. 10
11 11 / 25 Experimental Corpora CoNLL-YAGO testb: news articles with long-tail entities. WP: short contexts with highly ambiguous mentions and long-tail entities. Wikipedia articles: Wikipedia articles with internal links as mentions. Wiki-links: long documents with a few mentions. 11
12 12 / 25 Results on NED Quality Precision on different corpora, statistically significant improvements over Spotlight are marked with an asterisk. 12
13 13 / 25 Results on Run-time Average per-document run-time results. AIDA uses a SQL database, not considered here. 13
14 14 / 25 Conclusion A high-performance accurate NED system First method to consider domain coherence. Judicious choice of high benefit/cost features. Experiments: AIDA-light as good as rich-feature systems. as efficient as fastest systems. 14
15 AIDA-light source code is available to download at Thanks! 15
Statistical Analyses of Named Entity Disambiguation Benchmarks
Statistical Analyses of Named Entity Disambiguation Benchmarks Nadine Steinmetz, Magnus Knuth, and Harald Sack Hasso Plattner Institute for Software Systems Engineering, Potsdam, Germany, [email protected]
Text Analytics with Ambiverse. Text to Knowledge. www.ambiverse.com
Text Analytics with Ambiverse Text to Knowledge www.ambiverse.com Version 1.0, February 2016 WWW.AMBIVERSE.COM Contents 1 Ambiverse: Text to Knowledge............................... 5 1.1 Text is all Around
ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU
ONTOLOGIES p. 1/40 ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU Unlocking the Secrets of the Past: Text Mining for Historical Documents Blockseminar, 21.2.-11.3.2011 ONTOLOGIES
Exploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data Mining Heiko Paulheim University of Mannheim, Germany Research Group Data and Web Science [email protected] Abstract. Many data
Big Text: from Names and Phrases to Entities and Relations
Big Text: from Names and Phrases to Entities and Relations Gerhard Weikum Max Planck Institute for Informatics Saarbrücken, Germany http://www.mpi-inf.mpg.de/~weikum/ From Natural-Language Text to Knowledge
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMiner Petar Ristoski, Christian Bizer, and Heiko Paulheim University of Mannheim, Germany Data and Web Science Group {petar.ristoski,heiko,chris}@informatik.uni-mannheim.de
Predicting stocks returns correlations based on unstructured data sources
Predicting stocks returns correlations based on unstructured data sources Mateusz Radzimski, José Luis Sánchez-Cervantes, José Luis López Cuadrado, Ángel García-Crespo Departamento de Informática Universidad
Real Madrid brings the stadium closer to 450 million fans around the globe, with the Microsoft Cloud
Customer Solution Story Real Madrid brings the stadium closer to 450 million fans around the globe, with the Microsoft Cloud We can create a one-to-one relationship with fans around the planet with the
Media and Photography
Media and Photography The director calling lights, camera, action. The actors and actresses collecting all the gongs. The influential radio presenter playing the latest hits. They would all be nothing
MLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS 2014. November 7, 2014. Machine Learning Group
Big Data and Its Implication to Research Methodologies and Funding Cornelia Caragea TARDIS 2014 November 7, 2014 UNT Computer Science and Engineering Data Everywhere Lots of data is being collected and
Tech Presentation 2016
Tech Presentation 2016 Our Management Team Marvin Igelman CEO Alex Zivkovic CTO David Berman CFO Matt Burns PM and Growth BreakingSports is the world s first fully automated real-time alerts platform for
SCAN: A Structural Clustering Algorithm for Networks
SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected
SpiderStore: A Native Main Memory Approach for Graph Storage
SpiderStore: A Native Main Memory Approach for Graph Storage Robert Binna, Wolfgang Gassler, Eva Zangerle, Dominic Pacher, Günther Specht Databases and Information Systems, Institute of Computer Science
A Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
Relational model. Relational model - practice. Relational Database Definitions 9/27/11. Relational model. Relational Database: Terminology
COS 597A: Principles of Database and Information Systems elational model elational model A formal (mathematical) model to represent objects (data/information), relationships between objects Constraints
A Test Collection for Email Entity Linking
A Test Collection for Email Entity Linking Ning Gao, Douglas W. Oard College of Information Studies/UMIACS University of Maryland, College Park {ninggao,oard}@umd.edu Mark Dredze Human Language Technology
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,
Social Media Usage In European Clubs Football Industry. Is Digital Reach Better Correlated With Sports Or Financial Performane?
117 Social Media Usage In European Clubs Football Industry. Is Digital Reach Better Correlated With Sports Or Financial Performane? Teodor Dima 1 Social media is likely the marketing and communication
Uncovering Plagiarism, Authorship, and Social Software Misuse
Uncovering Plagiarism, Authorship, and Social Software Misuse Bauhaus-Universität Weimar Universitat Politècnica de València University of Lugano Duquesne University Universitat Politècnica de Catalunya
Keyphrase Extraction for Scholarly Big Data
Keyphrase Extraction for Scholarly Big Data Cornelia Caragea Computer Science and Engineering University of North Texas July 10, 2015 Scholarly Big Data Large number of scholarly documents on the Web PubMed
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University
Big Data Methods for Computational Linguistics
Big Data Methods for Computational Linguistics Gerhard Weikum, Johannes Hoffart, Ndapandula Nakashole, Marc Spaniol, Fabian Suchanek, Mohamed Amir Yosef Max Planck Institute for Informatics Saarbruecken,
Model of a flow in intersecting microchannels. Denis Semyonov
Model of a flow in intersecting microchannels Denis Semyonov LUT 2012 Content Objectives Motivation Model implementation Simulation Results Conclusion Objectives A flow and a reaction model is required
Gain Contextual Awareness for a Smarter Digital Enterprise with SAP HANA Vora
SAP Brief SAP Technology SAP HANA Vora Objectives Gain Contextual Awareness for a Smarter Digital Enterprise with SAP HANA Vora Bridge the divide between enterprise data and Big Data Bridge the divide
Automatic Knowledge Base Construction Systems. Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014
Automatic Knowledge Base Construction Systems Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014 1 Text Contains Knowledge 2 Text Contains Automatically Extractable Knowledge 3
Analyze This! Get Better Insight with Power BI for Office 365
11:15 12:15 Analyze This! Get Better Insight with Power BI for Office 365 Jeff Fenn, BI & Development Practice Manager FMT Consultants Agenda Agile BI Self-Service BI in Excel Power BI for Office 365 Infrastructure
High-Performance Scorecards. Best practices to build a winning formula every time
High-Performance Scorecards Best practices to build a winning formula every time Will your team win or lose? Scorecards drive financial decision making For decades, your organization has used the predictive
parent ROADMAP SUPPORTING YOUR CHILD IN GRADE FIVE ENGLISH LANGUAGE ARTS
TM parent ROADMAP SUPPORTING YOUR CHILD IN GRADE FIVE ENGLISH LANGUAGE ARTS 5 America s schools are working to provide higher quality instruction than ever before. The way we taught students in the past
Greetings from the Iron Bear Club
BAYLOR UNIVERSITY CHAMPIONSHIPS START HERE Inside this issue: Volume 1, Issue 1 Greetings from the Iron Bear Club April 2009 Facility Pictures 2 Kaz Kazadi Anne Tamporello Chris Ruf 3 The Iron Bear Club
Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu
Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!
Chapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
Plato: A Selective Context Model for Entity Resolution
Plato: A Selective Context Model for Entity Resolution Nevena Lazic, Amarnag Subramanya, Michael Ringgaard, Fernando Pereira Google Inc., Mountain View CA 94043, USA {nevena, asubram, ringgaard, pereira}@google.com
Taxonomic Data Integration from Multilingual Wikipedia Editions
Noname manuscript No. (will be inserted by the editor) Taxonomic Data Integration from Multilingual Wikipedia Editions Gerard de Melo Gerhard Weikum Received: date / Accepted: date Abstract Information
Measuring and Monitoring the Quality of Master Data By Thomas Ravn and Martin Høedholt, November 2008
Measuring and Monitoring the Quality of Master Data By Thomas Ravn and Martin Høedholt, November 2008 Introduction We ve all heard about the importance of data quality in our IT-systems and how the data
The Manuscript as Cultural Heritage: Digitisation ++
The Manuscript as Cultural Heritage: Digitisation ++ A Digital Humanities Point of View Not so much a conservation point of view slides on http://www.slideshare.net/gradmans Prof. Dr. Stefan Gradmann Humboldt-Universität
SMART InTeRneT OF ThIngS
Join us at the SMART InTeRneT OF ThIngS SUMMIT 2012 Manchester United FC Conference Centre, Old Trafford, UK Supported by: November 26-28 2012 The European Commission The Institution of Engineering and
Energy Efficient Systems
Energy Efficient Systems Workshop Report (September 2014) Usman Wajid University of Manchester United Kingdom Produced as a result of Workshop on Energy Efficient Systems @ ICT4S conference, Stockholm
BIG DATA AND ANALYTICS
BIG DATA AND ANALYTICS Björn Bjurling, [email protected] Daniel Gillblad, [email protected] Anders Holst, [email protected] Swedish Institute of Computer Science AGENDA What is big data and analytics? and why one must bother
harmon.ie Delivers the Business Value of Office 365 Migrations
harmon.ie Delivers the Business Value of Office 365 Migrations Congratulations on your move to SharePoint Online and the Office 365 cloud. With Office 365 you will reap the benefits of flexibility and
Finding Negative Key Phrases for Internet Advertising Campaigns using Wikipedia
Finding Negative Key Phrases for Internet Advertising Campaigns using Wikipedia Martin Scaiano University of Ottawa [email protected] Diana Inkpen University of Ottawa [email protected] Abstract
Model-Based Requirements Engineering with AutoRAID
Model-Based Requirements Engineering with AutoRAID Bernhard Schätz, Andreas Fleischmann, Eva Geisberger, Markus Pister Fakultät für Informatik, Technische Universität München Boltzmannstr. 3, 85748 Garching,
A Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, [email protected] Abstract Most text data from diverse document databases are unsuitable for analytical
Enhanced Information Access to Social Streams. Enhanced Word Clouds with Entity Grouping
Enhanced Information Access to Social Streams through Word Clouds with Entity Grouping Martin Leginus 1, Leon Derczynski 2 and Peter Dolog 1 1 Department of Computer Science, Aalborg University Selma Lagerlofs
IT based Knowledge Management. Abstract
3rd International Conference on Engineering & Business Education ICEBE, Manila, Philippines,2010 IT based Knowledge Management Prof. Dr. Uwe Lämmel Hochschule Wismar Wismar Business School Philipp-Müller-Straße,
WIKITOLOGY: A NOVEL HYBRID KNOWLEDGE BASE DERIVED FROM WIKIPEDIA. by Zareen Saba Syed
WIKITOLOGY: A NOVEL HYBRID KNOWLEDGE BASE DERIVED FROM WIKIPEDIA by Zareen Saba Syed Thesis submitted to the Faculty of the Graduate School of the University of Maryland in partial fulfillment of the requirements
UNDERGRADUATE PROGRAMME SPECIFICATION
UNDERGRADUATE PROGRAMME SPECIFICATION Programme Title: Awarding Body: Teaching Institution: Final Awards: Intermediate Awards: Mode of Study: UCAS Codes: QAA Benchmarks: Music Business and Production Staffordshire
Personal Browsing History and B-hist Data Storage
B-hist: Entity-Centric Search over Personal Web Browsing History http://mem0r1es.io Michele Catasta 1, Alberto Tonon 2, Vincent Pasquier 3, Gianluca Demartini 2, Philippe Cudré-Mauroux 2, and Karl Aberer
Qlik s Associative Model
White Paper Qlik s Associative Model See the Whole Story that Lives Within Your Data August, 2015 qlik.com Table of Contents Introduction 3 Qlik s associative model 3 Query-based visualization tools only
ONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS
ONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS Hasni Neji and Ridha Bouallegue Innov COM Lab, Higher School of Communications of Tunis, Sup Com University of Carthage, Tunis, Tunisia. Email: [email protected];
Essential Graphics/Design Concepts for Non-Designers
Essential Graphics/Design Concepts for Non-Designers presented by Ana Henke Graphic Designer and Publications Supervisor University Communications and Marketing Services New Mexico State University Discussion
GLOBAL TRANSFER MARKET REPORT 2015
15 GLOBAL TRANSFER MARKET REPORT 2015 2 TABLE OF CONTENTS 1 Foreword 4 Highlights of the 2014 key indicators 6 Executive summary 8 The nature of international transfers 10 1 Market activity and mobility
FOOTBALL INVESTOR. Member s Guide 2014/15
FOOTBALL INVESTOR Member s Guide 2014/15 V1.0 (September 2014) - 1 - Introduction Welcome to the Football Investor member s guide for the 2014/15 season. This will be the sixth season the service has operated
Two new DB2 Web Query options expand Microsoft integration As printed in the September 2009 edition of the IBM Systems Magazine
Answering the Call Two new DB2 Web Query options expand Microsoft integration As printed in the September 2009 edition of the IBM Systems Magazine Written by Robert Andrews [email protected] End-user
Specops Command. Installation Guide
Specops Software. All right reserved. For more information about Specops Command and other Specops products, visit www.specopssoft.com Copyright and Trademarks Specops Command is a trademark owned by Specops
SAP S/4HANA Embedded Analytics
Frequently Asked Questions November 2015, Version 1 EXTERNAL SAP S/4HANA Embedded Analytics The purpose of this document is to provide an external audience with a selection of frequently asked questions
CORE Assessment Module Module Overview
CORE Assessment Module Module Overview Content Area Mathematics Title Speedy Texting Grade Level Grade 7 Problem Type Performance Task Learning Goal Students will solve real-life and mathematical problems
the sporting index group
sporting index group corporate factsheet the sporting index group Sporting Index was founded in 1992 and is well known as the undisputed world leader in sports spread betting, dominating the global market
Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure
Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure Fuminori Kimura Faculty of Culture and Information Science, Doshisha University 1 3 Miyakodani Tatara, Kyoutanabe-shi,
Towards a Sales Assistant using a Product Knowledge Graph
Towards a Sales Assistant using a Product Knowledge Graph Haklae Kim, Jungyeon Yang, and Jeongsoon Lee Samsung Electronics Co., Ltd. Maetan dong 129, Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 443-742,
Advertising Keywords Recommendation for Short-Text Web Pages Using Wikipedia
Advertising Keywords Recommendation for Short-Text Web Pages Using Wikipedia WEINAN ZHANG and DINGQUAN WANG, Shanghai Jiao Tong University GUI-RONG XUE, Aliyun.com HONGYUAN ZHA, Georgia Institute of Technology
LEADS 360 Assessment Price List
LEADS 360 Assessment Price List April 2015 QUANTITY 1 LEVEL 2 LEVELS 3 LEVELS 4 LEVELS 10 2,041 2,124 2,206 2,248 25 4,763 4,886 4,969 5,216 50 9,111 9,276 9,441 9,771 100 17,078 17,408 17,738 18,398 250
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
The Alignment of Common Core and ACT s College and Career Readiness System. June 2010
The Alignment of Common Core and ACT s College and Career Readiness System June 2010 ACT is an independent, not-for-profit organization that provides assessment, research, information, and program management
Yet Another Triple Store Benchmark? Practical Experiences with Real-World Data
Yet Another Triple Store Benchmark? Practical Experiences with Real-World Data Martin Voigt, Annett Mitschick, and Jonas Schulz Dresden University of Technology, Institute for Software and Multimedia Technology,
Extracting Publications and Citations from Scopus
Extracting Publications and Citations from Scopus Contents About Scopus... 1 Opening the Scopus Database... 1 Finding Publications... 4 Author Search... 4 Affiliation or University Search... 7 Citation
SAP BW 7.4 Real-Time Replication using Operational Data Provisioning (ODP)
SAP BW 7.4 Real-Time Replication using Operational Data Provisioning (ODP) Dr. Astrid Tschense-Österle, AGS SLO Product Management Marc Hartz, Senior Specialist SCE Rainer Uhle, BW Product Management May
