Text-Driven Ontology Generation and Extension in the Finance Domain. Mihaela Vela Language Technology Lab DFKI Saarbrücken



Similar documents
Semantic EPC: Enhancing Process Modeling Using Ontologies

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition

Semantic Variability Modeling for Multi-staged Service Composition

Terminology Extraction from Log Files

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Semantic annotation of requirements for automatic UML class diagram generation

Terminology Extraction from Log Files

The CroCo Translation Archive

HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN

Word Completion and Prediction in Hebrew

Named Entity Recognition Experiments on Turkish Texts

A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks

I. INTRODUCTION NOESIS ONTOLOGIES SEMANTICS AND ANNOTATION

An experience with Semantic Web technologies in the news domain

Domain Classification of Technical Terms Using the Web

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database

B A S I C S C I E N C E S

Extraction and Visualization of Protein-Protein Interactions from PubMed

Semantic Lifting of Unstructured Data Based on NLP Inference of Annotations 1

Reverse Engineering a Rule-Based Finnish Named Entity Recognizer

Multilingual Term Extraction as a Service from Acrolinx. Ben Gottesman Michael Klemme Acrolinx CHAT2013

Collecting Polish German Parallel Corpora in the Internet

Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Question Answering and Multilingual CLEF 2008

A prototype infrastructure for D Spin Services based on a flexible multilayer architecture

Special Topics in Computer Science

Estonian Large Vocabulary Speech Recognition System for Radiology

Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1

A Modeling Methodology for Scientific Processes

Trends in corpus specialisation



Introduction to Text Mining. Module 2: Information Extraction in GATE

Social Media Monitoring Tools enhanced by Semantic Web Technologies. Presentation of the Master Thesis Fabian Gasser

Brill s rule-based PoS tagger

Corpus Design for a Unit Selection Database

KNOWLEDGE-BASED IN MEDICAL DECISION SUPPORT SYSTEM BASED ON SUBJECTIVE INTELLIGENCE

Introduction to IE with GATE

Improve Cooperation in R&D. Catalyze Drug Repositioning. Optimize Clinical Trials. Respect Information Governance and Security

Customizing an English-Korean Machine Translation System for Patent Translation *

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Financial Events Recognition in Web News for Algorithmic Trading

Folksonomies versus Automatic Keyword Extraction: An Empirical Study

ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany

Computer Aided Translation

Information Technology for KM

7. Classification. Business value. Structuring (repetition) Automation. Classification (after Leymann/Roller) Automation.

Experiences from Verbmobil. Norbert Reithinger DFKI GmbH Stuhlsatzenhausweg 3 D Saarbrücken bert@dfki.de

TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments

Solutions for energy efficient buildings and cities

Morphological Analysis and Named Entity Recognition for your Lucene / Solr Search Applications

Support verb constructions

Anotaciones semánticas: unidades de busqueda del futuro?

Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia

Design and Implementation of an Automatic Semantic Annotation Service

Semantically enhanced Information Retrieval: an ontology-based approach

Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery

IST World. European RTD Information and Service Portal FP IST Brigitte Jörg, Language Technology Lab, DFKI GmbH

Interactive Dynamic Information Extraction

Projektgruppe. Information Extraction An Incomplete Overview

A stream computing approach towards scalable NLP

Survey Results: Requirements and Use Cases for Linguistic Linked Data

Text Mining derive useful information from textual resources such as Web pages, media articles, document archives, etc.

Text Mining - Scope and Applications

The PALAVRAS parser and its Linguateca applications - a mutually productive relationship

Technical Report. The KNIME Text Processing Feature:

MODELING OF USER STATE ESPECIALLY OF EMOTIONS. Elmar Nöth. University of Erlangen Nuremberg, Chair for Pattern Recognition, Erlangen, F.R.G.

Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval

Leveraging ASEAN Economic Community through Language Translation Services

Data Pre-Processing in Spam Detection

SIMOnt: A Security Information Management Ontology Framework

Natural Language Processing in the EHR Lifecycle

Brauchen die Digital Humanities eine eigene Methodologie?

Adding Semantics to Business Intelligence

UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications

Combining Ontological Knowledge and Wrapper Induction techniques into an e-retail System 1

DanNet Teaching and Research Perspectives at CST

Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint

Information Management

Language and Computation

A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow

Text Analysis for Big Data. Magnus Sahlgren

Natural Language Processing

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study

IT services for analyses of various data samples

Build Vs. Buy For Text Mining

Semantic Search in Portals using Ontologies

it-novum whitepaper August 2011 SAP and SugarCRM

Cyber-Physical Systems, Internet of Things & Industry 4.0 First Technical Prototypes

Ontology-based Information Extraction and Question Answering Coming Together

WIKINGER Wiki Next Generation Enhanced Repositories

Data Management and Standardisation in Distributed Systems Biology Research

A Framework for Change Impact Analysis of Ontology-Driven Content-Based Systems

Transcription:

Text-Driven Ontology Generation and Extension in the Finance Domain Mihaela Vela Language Technology Lab DFKI Saarbrücken

European MUSING project Development of Business Intelligence tools and modules based on semantic knowledge and content systems www.musing.eu

Our Approach Ontology generation and extension on the base of textual patterns and various natural language annotation layers Multi-layer approach for building T- Box elements Corpus: German economical news articles

Processing Layers String processing Morpho-syntactic information Chunking and dependency information Accessing Semantic Resources Diagonal to the other processes

Processing of Text Patterns Extract nominals Candidates for concepts E.g.: konzern, firma, chef Extract compounds Restricting the number of suggested concepts to those occuring in compounds Candidates for relations between suggested concepts E.g.: adressdatenbanken[suggestedconcept + suffix] objectproperty(adress, datenbanken) adressdatenbanken[prefix + SuggestedConcept] subclassof(adressdatenbanken, datenbanken) E.g.: konzernchef[suggestedconcept + suffix] objectproperty(konzern, chef) konzernchef[prefix + SuggestedConcept] subclassof (konzernchef, chef)

Filtering and Consolidation Reformulation of compounds SuggestedConcept + PREP[mit] + Suffix E.g.: Datenbanken mit Adressen subclassof(adressendatenbanken, datenbank) Consolidated subclassof Filter out objectproperty SuggestedConcept + PREP[von von dem vom] + Prefix E.g.: Chef vom Konzern objectproperty(konzern, chef) Consolidated objectproperty = HAS Filter out subclassof More precise determination of textual patterns that can be used for OG SuggestedConcept + PREP[mit] + NOUN_MODIFIER + Suffix E.g.: Datenbanken mit Milliarden Adressen amount of address data SuggestedConcept + PREP(von von dem vom) + ADJ_MODIFIER + Prefix E.g.: Chef des deutschen Konzern subclassification of bank This to come to relevance for OG in next processing steps

Examples of Compounds to be Considered candidates for A-Box needs first NE detection and recognition E.g.: colonia-konzern, nokia-konzern, lornho-chef, iata-generaldirektor To be considered in next processing steps

Statistics Total number of tokens: 168593 Processing stage Concept Compounds Reformulation Concept selection 20109 - - Compound selection 3408 11163 - Compound filtering 291 370 1093 Wirtschaftswoche 1992 Need to extend the volume of the corpus or access other corpora for searching for compound reformulations

Other related points Strategy for semi-automatic validation/rejection/specification including when necessary/possible intervention of domain experts and ontology engineers Issues to be dealt with in the next steps Relation transitivity More structure in the ontology E.g.: iata-generaldirektor + iata-konzern x(konzern, generaldirektor) Morphological variations Fine-grainedness of relations Filling A-Box Consider broader context for OG Domain and range

Morphology and Semantics Find compounds based on COMP (less restrictions) E.g.: <W INFL="[17 18 19]" POS="1" STEM="chef" COMP="firmen chef" TC="22">Firmenchef</W> Class instantiation (A-Box) as a side effect E.g.: bayer-konzern, busse design Deal with morphological variations E.g.: firmenchef, bankenchef POS and semantic resources for compound components E.g.: großkonzern vs. chemiekonzern us-konzern vs. mega-konzern

2rd Layer Conclusions and Results existing ontology is consolidated on T-Box and expanded on A-Box Context is still narrow only word analysis

Dependency Information Broader context E.g.: [NP-Subj Er] [VG soll] [PP im Konzern] [NP-Ind-Obj Finanzchef [NE- Pers Gerhard Liener] ] [VG folgen] Finanzchef vs. Konzernchef

Conclusions Build/expand finance ontology in 3 stages Trackability and versioning between the differnt layers and runs