Structuring Medical Records with Apache Stanbol. Rafa Haro, Senior Software Engineer, Athento Antonio Pérez Morales, Senior Software Engineer, Ixxus

1 Structuring Medical Records with Apache Stanbol Rafa Haro, Senior Software Engineer, Athento Antonio Pérez Morales, Senior Software Engineer, Ixxus

2 Committer, PMC
  - Apache Stanbol, Apache ManifoldCF. Topics: Document Analysis, NLP, Machine Learning, Semantic Technologies, ECM
  - Apache Stanbol, Apache ManifoldCF. Topics: ECM, Semantic Search, ETL, Machine Learning

3 Apache Stanbol provides a set of reusable components for semantic content management. It extends existing CMSs with a number of semantic services. Traditional Semantic CMS

4 Software Architecture for Semantically Enabled CM and ECM systems

5 Apache Stanbol Story
  - Started within the FP7 European Project IKS (Interactive Knowledge Stack)
  - The IKS project brought together an Open Source Community for Defining and Building Platforms in the Semantic CMS Space
  - Incubated in November 2010
  - Successfully promoted within the CMS and ECM industry through the IKS Early Adopters Program
  - Graduated to Top-Level Apache Project in October 2012

6 What is a Semantic CMS?
  Traditional CMS:
  - Atomic Unit: Document
  - Properties as meta-data (key-value schemas)
  - Keyword Search
  - Document Management
  - Document Types
  - Document Workflow
  Semantic CMS:
  - Atomic Unit: Entity
  - Semantic meta-data (RDF)
  - Semantic Search
  - Knowledge Management
  - Entity Management
  - Ontologies
  Source: "What Apache Stanbol Can Do for You?", Fabian Christ, ApacheCon Europe 2012

7 Key Points
  - Designed to bring Semantic Technologies to existing CMS
  - Non-intrusive set of RESTful Semantic Services
  - Extremely Modular: Use only the modules you need
  Main Features:
  - Multilingual Content Enhancement: Structure Content through Semantic Metadata
  - Knowledge Bases Management
  - Knowledge Models and Reasoning
  - Semantic Indexing and Search

8 Stanbol Components
  Stanbol components provide:
  - RESTful API
  - Java APIs and OSGi services
  Stanbol components do NOT depend on each other; however, they can be easily combined.
  Apache Stanbol Component Layer:
  - Apache Stanbol Enhancer
  - Apache Stanbol EntityHub
  - Apache Stanbol Ontology Manager
  - Apache Stanbol Reasoners
  - Stanbol Enhancement Engines
  - Apache Stanbol ContentHub
  - Apache Stanbol FactStore
  - Apache Stanbol CMS Adapter
  - Apache Stanbol Rules

9 Stanbol Components (II)
  - Enhancer: Extracts Knowledge from unstructured parsed content
  - EntityHub: Manages Domain Entities and Topics (Knowledge Bases)
  - ContentHub: Semantic Indexing / Search over your semantically enhanced Content
  - CMS Adapter: Syncs your CMS with Apache Stanbol (JCR/CMIS)
  - Ontology Manager: Manages your formal Domain Knowledge
  - Reasoners & Rules: Apply Domain Knowledge to improve / validate extracted Information. Refactor / refine knowledge to align it to public schemas such as schema.org

10 Built on Top of Apache
  - Apache Felix as OSGi environment
  - Apache Sling launchers and OSGi Tools
  - Apache Maven for building
  - Apache Clerezza as RDF Framework
  - Apache Jena as TripleStore
  - Apache Solr for Knowledge Bases Management
  - Apache Tika for converting input
  - Apache OpenNLP for NLP Processing

11 Integration Scenarios Stand-Alone Server (Stanbol Launchers) Web Application (Servlet-Container) Embedded within an OSGi environment Source: What Apache Stanbol Can Do for You?. Fabian Christ. ApacheCon Europe 2012

12 Project Current Status
  Timeline:
  - Incubation (Nov 2010)
  - Apache Stanbol incubating (Aug 2012)
  - Graduation (October 2012)
  - IKS Project Ending (Dec 2012)
  - Apache Stanbol (March 2014)
  - Apache Stanbol (October 2016)
  [Chart: Contributions (commits) to Trunk Since Incubation]

13 Project Current Status (II)
  - 22 PMC Members (Last Addition Jul 2016)
  - 26 Committers (Last Addition May 2015)
  - 3-5 active committers in the last 2 years
  - dev@stanbol.apache.org: 228 subscribers
  - Activity has been gradually decreasing
  - 3 major releases
  Source: Apache Stanbol Committee Report Helper (

14 Stanbol Enhancer RDF

15 Stanbol Enhancer (II)

16 Stanbol Enhancer (III)

17 Stanbol Enhancement Chains
  Define how Content is processed by the Enhancer through an ExecutionPlan
  Different Implementations:
  - ListChain: enhancement engines run sequentially, in the listed order. Parallel execution of engines is not supported
  - WeightedChain: the ExecutionPlan is calculated from the engines' order metadata. Parallel execution of engines is allowed
  API:
  - /enhancer: executes the default chain
  - /enhancer/chain/{chain-name}: executes a specific named chain
  - /enhancer/engine/{engine-name}: executes a specific named engine
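The three Enhancer endpoints on this slide can be assembled from a base URL. The sketch below is only an illustration: the class name `EnhancerEndpoints`, the base URL used in the usage note and the restriction to URL-safe names are assumptions, not part of the talk or of the Stanbol API.

```java
import java.net.URI;
import java.util.regex.Pattern;

/** Hypothetical helper that builds the three Enhancer endpoint URIs named on the slide. */
final class EnhancerEndpoints {
    // Assumption: chain and engine names only use URL-safe characters, so no escaping is done.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9._-]+");

    private final String baseUrl;

    EnhancerEndpoints(String baseUrl) {
        // Drop a trailing slash so the paths below can be appended uniformly.
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    /** /enhancer: the default chain. */
    URI defaultChain() {
        return URI.create(baseUrl + "/enhancer");
    }

    /** /enhancer/chain/{chain-name}: a named chain. */
    URI chain(String chainName) {
        return URI.create(baseUrl + "/enhancer/chain/" + checked(chainName));
    }

    /** /enhancer/engine/{engine-name}: a single named engine. */
    URI engine(String engineName) {
        return URI.create(baseUrl + "/enhancer/engine/" + checked(engineName));
    }

    private static String checked(String name) {
        if (name == null || !SAFE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Unsafe chain or engine name: " + name);
        }
        return name;
    }
}
```

For example, `new EnhancerEndpoints("http://localhost:8080").chain("hexin")` yields the URI of a chain called `hexin`; both the host and the chain name are placeholders.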

18 Current Enhancement Engines
  Preprocessing:
  - Tika Engine: content type detection; text extraction from several document formats; metadata extraction from several document formats
  Natural Language Processing:
  - Language Detection (different implementations)
  - Sentence Detection (OpenNLP, SmartCN, REST)
  - Tokenizer (OpenNLP, SmartCN, REST)
  - POS Tagging (OpenNLP, REST)
  - Chunking (OpenNLP, REST)
  - NER (OpenNLP, OpenCalais, REST)
  Entity Linking:
  - Named Entity Linking
  - EntityHub Linking Engine
  - FST (Lucene Finite State Transducer) Linking Engine
  - Entity Co-mention
  - Commercial Engines (OpenCalais, Zemanta, CELI)
  Sentiment Analysis
  Disambiguation:
  - DBPedia Spotlight
  - Solr MLT based
  PostProcessing:
  - Dereferencing

19 Stanbol EntityHub

20 Stanbol EntityHub (II)
  - Manages Multiple Entity Sources (Knowledge Bases)
  - Allows Fast Entity Lookup using Apache Solr
  - Referenced Site (Remote LD + Local Caches) vs. Managed Site (Entity CRUD API over manually configured Sites)
  API:
  - Query for Entities (used by Entity Linking Engines)
      curl -X POST -d "name=lyon&limit=10" \
  - CRUD for Managed Sites
  LDPath support for:
  - Graph Path Retrieval (used for dereferencing)
  - Schema Translation
  - Simple Reasoning
  LDPath examples:
      friend-names = foaf:knows/foaf:name;
      schema:name = rdfs:label[@en];
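The entity query on this slide is a form-encoded POST with a `name` and a `limit` field. A minimal Java sketch of the same request follows. The target URL is cut off in the transcript, so the endpoint is left as a parameter the caller must supply; the class and method names are illustrative assumptions.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that mirrors the slide's curl call (POST -d "name=...&limit=..."). */
final class EntityLookup {

    /** Encodes parameters as application/x-www-form-urlencoded, keeping insertion order. */
    static String formBody(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds the body used on the slide: name=<name>&limit=<limit>. */
    static String findBody(String name, int limit) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", name);
        params.put("limit", Integer.toString(limit));
        return formBody(params);
    }

    /** Builds (but does not send) the POST request; findEndpoint is supplied by the caller. */
    static HttpRequest findRequest(URI findEndpoint, String name, int limit) {
        return HttpRequest.newBuilder(findEndpoint)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(findBody(name, limit)))
                .build();
    }
}
```

The request can then be sent with `java.net.http.HttpClient`. Building it separately keeps the encoding logic testable without a running server.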

21 Use Case: Hexin Project - Structuring Medical Records R&D Project for Sergas (Galician Public Health Office) Clinical Data Analysis Platform for supporting: Clinical Assistance Epidemiology studies Medical Research Big Data approach for analyzing both structured historical clinical data and unstructured medical records Medical Records are written in Spanish and Galician

22 Hexin: Architecture
  [Architecture diagram. Labels: Data Source; ETL; URX; Big Data (HDFS + Hive); Event Detection Process; PatientId; Date; Structured Events; Semantic Events; Symptoms: Cough, Unrest; Reference Cases; Detection Process; New Case; BI; Rules; Cassandra; Unrest, Cough, Fever>38; Patient; Validation; Analysis]

23 Hexin: Semantic Tagging

24 Hexin: Objective
  Paciente diabético desde los 5 años y con EPOC moderada grado 2 de la GOLD
  (English: "Diabetic patient since the age of 5, with moderate COPD, GOLD grade 2")

25 Hexin: Solution Design
  Structure Medical Records using Apache Stanbol Enhancer
  Custom Ontology:
  - Symptoms
  - Diseases
  - Diagnosis Tests
  - Family and Personal History
  Custom Enhancement Chain:
  - Language Detection > NLP > Entity Linking > Negation Detection > Fact Extraction

26 Hexin: Ontology

27 Hexin: Ontology Indexing
  To support the Entity Linking process against the Hexin Ontology, an EntityHub site must be created. 2 options:
  - ManagedSite: full CRUD storage <-> DYNAMIC
  - ReferencedSite: READ-ONLY remote site + local index
  Stanbol EntityHub Indexing Tool: RDF > JenaTDB > Solr Index
  - Configure Custom Namespaces, Mappings and Properties (hexin:*; hexin:label > rdfs:label)
  - Generates an OSGi Bundle with the Yard and YardSite default configurations
  - Copy the index to the Stanbol /datafiles folder and install the bundle using the Apache Felix OSGi Web Console

28 Hexin: Enhancement Chain
  Engines in the chain: Lang. Detect., OpenNLP-Sent., OpenNLP-Token, OpenNLP-POS, OpenNLP-Chunker, Hexin Linking, Fact Extract., Negex
  Legend:
  - Custom Hexin Engine: implemented for the project
  - Entity Linking Engine: available in Stanbol, with a custom configuration for this use case
  - NLP Engines: available in Stanbol, default configuration
  - Pre-Processing Engine: available in Stanbol

29 Hexin: Linking

30 Hexin: Linking (II)

31 Hexin:
    public class MyEngine implements EnhancementEngine {

        public void activate(ComponentContext c) {
            // initialize, configure, ...
        }

        public int canEnhance(ContentItem item) {
            if (...item matches our expectations...) {
                return ENHANCE_SYNCHRONOUS;
            } else {
                return CANNOT_ENHANCE;
            }
        }

        public void computeEnhancements(ContentItem item) {
            // run the engine and add results to the item's
            // RDF graph based on the item's InputStream
        }
    }
  Build and deployment:
  - Maven build: maven-bundle-plugin adds OSGi metadata; maven-scr-plugin adds services metadata
  - OSGi bundle: MANIFEST.MF carries the OSGi metadata
  - MyEngine Service is registered by OSGi
  - Install in Stanbol: no restart needed

32 NLP at Apache Stanbol

33 NLP at Apache Stanbol (II)
  Analyzed Text:
  - Browsable Map with Spans
  - Spans sorted by Natural Order
  - Iterator based API that allows concurrent Modifications
  - Span Types: Token, Chunk, Sentence, Text Section, Analyzed Text
  Annotations supported at Span Level:
  - POS Annotation: PosTag with tag (e.g. NE) and lexical category (e.g. Noun)
  - Phrase Annotation (chunks): PhraseTag with tag (e.g. NP) and lexical category (e.g. NounPhrase)
  - Sentiment Annotation: SentimentTag:: Double
  Example text: "Stanbol is an Amazing Tool" (annotated with Token, Chunk and Sentence spans)
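To make the idea of spans kept in natural order concrete, here is a small self-contained sketch. It does not use the Stanbol NLP classes. The `SpanIndex` class, the reduced set of span types and the ordering rule (start offset first, then longer spans before shorter ones, then span type) are assumptions chosen for illustration, and the iterator that tolerates concurrent modification is not modelled.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.TreeSet;

/** Illustrative stand-in for a sorted span container; not the Stanbol AnalysedText API. */
final class SpanIndex {
    enum Type { TEXT, SENTENCE, CHUNK, TOKEN }

    static final class Span {
        final Type type;
        final int start;
        final int end;

        Span(Type type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Assumed natural order: start offset, then longer spans first, then span type.
    static final Comparator<Span> NATURAL_ORDER = Comparator
            .comparingInt((Span s) -> s.start)
            .thenComparing(Comparator.comparingInt((Span s) -> s.end).reversed())
            .thenComparing((Span s) -> s.type);

    private final TreeSet<Span> spans = new TreeSet<>(NATURAL_ORDER);

    void add(Type type, int start, int end) {
        spans.add(new Span(type, start, end));
    }

    /** First span in natural order. */
    Span first() {
        return spans.first();
    }

    /** Returns the first sentence that fully covers the offsets [start, end), if any. */
    Optional<Span> enclosingSentence(int start, int end) {
        return spans.stream()
                .filter(s -> s.type == Type.SENTENCE && s.start <= start && end <= s.end)
                .findFirst();
    }
}
```

With the slide's example text "Stanbol is an Amazing Tool", adding the sentence span (0, 26) and the token "Amazing" (14, 21) in any order still places the sentence first, and `enclosingSentence(14, 21)` returns it.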

34 Hexin Custom Engine: Negex
  Context/Negex: Algorithm for Negation Detection Based on Trigger Terms + Regex
    public abstract class AbstractNegexDetector implements NegexDetector {
        public Set<IRI> detectNegations(String language, Graph metadata, AnalysedText at) throws NegexException {}
        protected abstract boolean isNegated(String language, String concept, String sentence);
    }
  Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. Oct 2001;34(5):

35 Hexin Custom Engine: Negex (II)
  Trigger Types:
  - Pre-condition negation terms (e.g. absence of)
  - Pseudo negation terms (e.g. no increase)
  - Pre-condition possibility phrases (e.g. rule him out)
  - Post-condition negation terms (e.g. unlikely)
  - Termination terms (e.g. but, however)
  Implementation available under Apache License 2.0
  Engine Implementation Challenges:
  - Entity Annotations as Targets
  - The relationships between AnalyzedText and EntityAnnotations are currently obfuscated
  - GLUE CODE is needed to locate the Spans of Entity Annotations using the START and END properties of Text Annotations
  - Once the sentence of an Entity Annotation is located, it is used as context, along with the Entity surface form (mention), for applying the algorithm
  - Negation is returned as a Custom Property of the TextAnnotation (negated = True or False)
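The trigger types above can be illustrated with a deliberately simplified, English-only sketch. It is not the Hexin implementation and covers only part of the published algorithm. The class `SimpleNegex`, its two-argument `isNegated` method and the bare trigger "no" are assumptions; the other trigger terms are the examples given on the slide. Possibility phrases and per-language trigger lists are left out.

```java
import java.util.List;
import java.util.Locale;

/** Simplified trigger-term negation check; an illustration, not the Hexin Negex engine. */
final class SimpleNegex {
    // Slide examples, plus the bare trigger "no" added here as an assumption.
    private static final List<String> PSEUDO = List.of("no increase");
    private static final List<String> PRE = List.of("absence of", "no");
    private static final List<String> POST = List.of("unlikely");
    private static final List<String> TERMINATION = List.of("but", "however");

    static boolean isNegated(String concept, String sentence) {
        String s = pad(sentence);
        // Blank out pseudo negations (same length) so they cannot act as triggers.
        for (String p : PSEUDO) {
            String padded = pad(p);
            s = s.replace(padded, " ".repeat(padded.length()));
        }
        String c = pad(concept);
        int at = s.indexOf(c);
        if (at < 0) {
            return false; // concept not mentioned in this sentence
        }
        String before = s.substring(0, at + 1);
        String after = s.substring(at + c.length() - 1);
        // Termination terms bound the scope on both sides of the concept.
        int from = 0;
        int to = after.length();
        for (String t : TERMINATION) {
            String padded = pad(t);
            int i = before.lastIndexOf(padded);
            if (i >= 0) from = Math.max(from, i + padded.length() - 1);
            int j = after.indexOf(padded);
            if (j >= 0) to = Math.min(to, j + 1);
        }
        return containsAny(before.substring(from), PRE)
                || containsAny(after.substring(0, to), POST);
    }

    private static boolean containsAny(String scope, List<String> triggers) {
        for (String t : triggers) {
            if (scope.contains(pad(t))) return true;
        }
        return false;
    }

    /** Lower-cases, replaces punctuation with spaces and pads with spaces for whole-word matching. */
    private static String pad(String text) {
        return " " + text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+", " ").trim() + " ";
    }
}
```

In "Patient has no fever, but cough persists." the concept "fever" falls within the scope of "no", while "cough" does not because "but" ends that scope.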

36 Hexin Custom Engine: Fact Extraction
  Paciente diabético desde los 5 años y con EPOC moderada grado 2 de la GOLD
  (English: "Diabetic patient since the age of 5, with moderate COPD, GOLD grade 2")

37 Hexin Custom Engine: Fact Extraction (II)
  - In-Context Entity Fact Extraction
  - Facts are returned as Entity RDF Metadata, like the rest of the Entity Properties
  Different Implementations of Context (all extracted from the AnalyzedText structure):
  - Sentence Context (default and usually enough)
  - Window of Text Context
  - Paragraph Context
  Rule Based Approach:
  - Regex over RAW Text or over a POS tag Sequence
  - ENTITY reserved word -> OR expression for all ENTITY labels

38 Hexin Custom Engine: Fact Extraction (III)
  Supported Expressions:
  - (diabetes | diabético | DM) desde los N años
  - (diabetes | diabético | DM) a los N años
  - Debut (diabetes | diabético | DM) a los N años
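The expressions on this slide, together with the ENTITY reserved word from the previous slide, can be sketched as a single regular expression in which ENTITY is replaced by an OR expression over the entity labels. The class `OnsetAgeRule`, the template string and the method names below are illustrative assumptions, not the project's code.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Illustrative regex rule: "[Debut] ENTITY (desde|a) los N a-n-tilde-o-s" captures N as an age. */
final class OnsetAgeRule {
    // ENTITY is the reserved word; the n-tilde of the last word is written as a
    // Unicode escape so that this source stays ASCII-only.
    private static final String TEMPLATE =
            "(?<![\\p{L}\\p{N}])(?:debut\\s+)?ENTITY\\s+(?:desde|a)\\s+los\\s+(\\d+)\\s+a\u00f1os";

    /** Replaces ENTITY with an OR expression over all labels of the entity. */
    static Pattern compile(List<String> labels) {
        if (labels.isEmpty()) {
            throw new IllegalArgumentException("At least one entity label is required");
        }
        String alternatives = labels.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(?:", ")"));
        return Pattern.compile(TEMPLATE.replace("ENTITY", alternatives),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Returns the age captured by the rule within the given context (e.g. one sentence). */
    static OptionalInt onsetAge(Pattern rule, String context) {
        Matcher m = rule.matcher(context);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }
}
```

Compiled with the labels diabetes, diabético and DM, the rule extracts 5 from the sample sentence "Paciente diabético desde los 5 años y con EPOC moderada grado 2 de la GOLD".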

39 Hexin Custom Engine: Fact Extraction (IV)
  POS based Rules:
    Diabetes  diagnosed  when  he   was  5   years  old
    NNS       VB         WRB   PRP  VBD  CD  NNS    JJ
  Rule: ENTITY \s VB * VB[be] (CD) years old
  or simply: ENTITY \s VB * VB[be] (CD)

40 Thanks for your attention!


More information

Triplestore Testing in the Cloud with Clojure. Ryan Senior

Triplestore Testing in the Cloud with Clojure. Ryan Senior Triplestore Testing in the Cloud with Clojure Ryan Senior About Me Senior Engineer at Revelytix Inc Revelytix Info Strange Loop Sponsor Semantic Web Company http://revelytix.com Blog: http://objectcommando.com/blog

More information

Introduction to IE with GATE

Introduction to IE with GATE Introduction to IE with GATE based on Material from Hamish Cunningham, Kalina Bontcheva (University of Sheffield) Melikka Khosh Niat 8. Dezember 2010 1 What is IE? 2 GATE 3 ANNIE 4 Annotation and Evaluation

More information

Things Made Easy: One Click CMS Integration with Solr & Drupal

Things Made Easy: One Click CMS Integration with Solr & Drupal May 10, 2012 Things Made Easy: One Click CMS Integration with Solr & Drupal Peter M. Wolanin, Ph.D. Momentum Specialist (principal engineer), Acquia, Inc. Drupal contributor drupal.org/user/49851 co-maintainer

More information

Big Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India

Big Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India Big Data and Semantic Web in Manufacturing Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India Outline Big data in Manufacturing Big data Analytics Semantic web technologies Case

More information

Semantic Stored Procedures Programming Environment and performance analysis

Semantic Stored Procedures Programming Environment and performance analysis Semantic Stored Procedures Programming Environment and performance analysis Marjan Efremov 1, Vladimir Zdraveski 2, Petar Ristoski 2, Dimitar Trajanov 2 1 Open Mind Solutions Skopje, bul. Kliment Ohridski

More information

Social Media Monitoring Tools enhanced by Semantic Web Technologies. Presentation of the Master Thesis Fabian Gasser

Social Media Monitoring Tools enhanced by Semantic Web Technologies. Presentation of the Master Thesis Fabian Gasser Social Media Monitoring Tools enhanced by Semantic Web Technologies Presentation of the Master Thesis Fabian Gasser Contents 1. 2. 3. 4. 5. 6. 7. 8. Main Concepts Challenges Research Question Social Media

More information

Scope. Cognescent SBI Semantic Business Intelligence

Scope. Cognescent SBI Semantic Business Intelligence Cognescent SBI Semantic Business Intelligence Scope...1 Conceptual Diagram...2 Datasources...3 Core Concepts...3 Resources...3 Occurrence (SPO)...4 Links...4 Statements...4 Rules...4 Types...4 Mappings...5

More information

Nuxeo, an open source platform for content-centric business applications. Stéfane Fermigier, Nuxeo Laurent Doguin, Nuxeo

Nuxeo, an open source platform for content-centric business applications. Stéfane Fermigier, Nuxeo Laurent Doguin, Nuxeo Nuxeo, an open source platform for content-centric business applications Stéfane Fermigier, Nuxeo Laurent Doguin, Nuxeo Nuxeo, the Company Providing an Open Source Content Management Platform for Business

More information

Oracle Big Data Spatial & Graph Social Network Analysis - Case Study

Oracle Big Data Spatial & Graph Social Network Analysis - Case Study Oracle Big Data Spatial & Graph Social Network Analysis - Case Study Mark Rittman, CTO, Rittman Mead OTN EMEA Tour, May 2016 info@rittmanmead.com www.rittmanmead.com @rittmanmead About the Speaker Mark

More information

Effective Web Application Development with Apache Sling. Robert Munteanu ( @rombert ), Adobe Systems Romania

Effective Web Application Development with Apache Sling. Robert Munteanu ( @rombert ), Adobe Systems Romania Effective Web Application Development with Apache Sling Robert Munteanu ( @rombert ), Adobe Systems Romania About the Speaker Apache Sling PMC member Fanboy of the Sling/JCR/OSGi stack Enthusiastic Open-Source

More information

Developing Microsoft SharePoint Server 2013 Advanced Solutions MOC 20489

Developing Microsoft SharePoint Server 2013 Advanced Solutions MOC 20489 Developing Microsoft SharePoint Server 2013 Advanced Solutions MOC 20489 Course Outline Module 1: Creating Robust and Efficient Apps for SharePoint In this module, you will review key aspects of the apps

More information

Big Data and Scripting Systems beyond Hadoop

Big Data and Scripting Systems beyond Hadoop Big Data and Scripting Systems beyond Hadoop 1, 2, ZooKeeper distributed coordination service many problems are shared among distributed systems ZooKeeper provides an implementation that solves these avoid

More information

Automating Attack Analysis Using Audit Data. Dr. Bruce Gabrielson (BAH) CND R&T PMO 28 October 2009

Automating Attack Analysis Using Audit Data. Dr. Bruce Gabrielson (BAH) CND R&T PMO 28 October 2009 Automating Attack Analysis Using Audit Data Dr. Bruce Gabrielson (BAH) CND R&T PMO 28 October 2009 2 Introduction Audit logs are cumbersome and traditionally used after the fact for forensics analysis.

More information

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies Semantic Data Management Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies 1 Enterprise Information Challenge Source: Oracle customer 2 Vision of Semantically Linked Data The Network of Collaborative

More information

Upcoming Announcements

Upcoming Announcements Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within

More information

D5.3.2b Automatic Rigorous Testing Components

D5.3.2b Automatic Rigorous Testing Components ICT Seventh Framework Programme (ICT FP7) Grant Agreement No: 318497 Data Intensive Techniques to Boost the Real Time Performance of Global Agricultural Data Infrastructures D5.3.2b Automatic Rigorous

More information

Clinical Mapping (CMAP) Draft for Public Comment

Clinical Mapping (CMAP) Draft for Public Comment Integrating the Healthcare Enterprise 5 IHE Patient Care Coordination Technical Framework Supplement 10 Clinical Mapping (CMAP) 15 Draft for Public Comment 20 Date: June 1, 2015 Author: PCC Technical Committee

More information

GenomeSpace Architecture

GenomeSpace Architecture GenomeSpace Architecture The primary services, or components, are shown in Figure 1, the high level GenomeSpace architecture. These include (1) an Authorization and Authentication service, (2) an analysis

More information

San Jose State University

San Jose State University San Jose State University Fall 2011 CMPE 272: Enterprise Software Overview Project: Date: 5/9/2011 Under guidance of Professor, Rakesh Ranjan Submitted by, Team Titans Jaydeep Patel (007521007) Zankhana

More information

Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery

Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery Dimitrios Kourtesis, Iraklis Paraskakis SEERC South East European Research Centre, Greece Research centre of the University

More information

StratioDeep. An integration layer between Cassandra and Spark. Álvaro Agea Herradón Antonio Alcocer Falcón

StratioDeep. An integration layer between Cassandra and Spark. Álvaro Agea Herradón Antonio Alcocer Falcón StratioDeep An integration layer between Cassandra and Spark Álvaro Agea Herradón Antonio Alcocer Falcón StratioDeep An integration layer between Cassandra and Spark Álvaro Agea Herradón Antonio Alcocer

More information

LabelTranslator - A Tool to Automatically Localize an Ontology

LabelTranslator - A Tool to Automatically Localize an Ontology LabelTranslator - A Tool to Automatically Localize an Ontology Mauricio Espinoza 1, Asunción Gómez Pérez 1, and Eduardo Mena 2 1 UPM, Laboratorio de Inteligencia Artificial, 28660 Boadilla del Monte, Spain

More information

Cache Configuration Reference

Cache Configuration Reference Sitecore CMS 6.2 Cache Configuration Reference Rev: 2009-11-20 Sitecore CMS 6.2 Cache Configuration Reference Tips and Techniques for Administrators and Developers Table of Contents Chapter 1 Introduction...

More information

The Prolog Interface to the Unstructured Information Management Architecture

The Prolog Interface to the Unstructured Information Management Architecture The Prolog Interface to the Unstructured Information Management Architecture Paul Fodor 1, Adam Lally 2, David Ferrucci 2 1 Stony Brook University, Stony Brook, NY 11794, USA, pfodor@cs.sunysb.edu 2 IBM

More information

Full-text Search in Intermediate Data Storage of FCART

Full-text Search in Intermediate Data Storage of FCART Full-text Search in Intermediate Data Storage of FCART Alexey Neznanov, Andrey Parinov National Research University Higher School of Economics, 20 Myasnitskaya Ulitsa, Moscow, 101000, Russia ANeznanov@hse.ru,

More information

How To Make Sense Of Data With Altilia

How To Make Sense Of Data With Altilia HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to

More information

Hive Interview Questions

Hive Interview Questions HADOOPEXAM LEARNING RESOURCES Hive Interview Questions www.hadoopexam.com Please visit www.hadoopexam.com for various resources for BigData/Hadoop/Cassandra/MongoDB/Node.js/Scala etc. 1 Professional Training

More information

UIMA: Unstructured Information Management Architecture for Data Mining Applications and developing an Annotator Component for Sentiment Analysis

UIMA: Unstructured Information Management Architecture for Data Mining Applications and developing an Annotator Component for Sentiment Analysis UIMA: Unstructured Information Management Architecture for Data Mining Applications and developing an Annotator Component for Sentiment Analysis Jan Hajič, jr. Charles University in Prague Faculty of Mathematics

More information

THE EUROPEAN DATA PORTAL

THE EUROPEAN DATA PORTAL European Public Sector Information Platform Topic Report No. 2016/03 UNDERSTANDING THE EUROPEAN DATA PORTAL Published: February 2016 1 Table of Contents Keywords... 3 Abstract/ Executive Summary... 3 Introduction...

More information

Information as a Service in a Data Analytics Scenario A Case Study

Information as a Service in a Data Analytics Scenario A Case Study 2008 IEEE International Conference on Web Services Information as a Service in a Analytics Scenario A Case Study Vishal Dwivedi, Naveen Kulkarni SETLabs, Infosys Technologies Ltd { Vishal_Dwivedi, Naveen_Kulkarni}@infosys.com

More information

How RAI's Hyper Media News aggregation system keeps staff on top of the news

How RAI's Hyper Media News aggregation system keeps staff on top of the news How RAI's Hyper Media News aggregation system keeps staff on top of the news 13 th Libre Software Meeting Media, Radio, Television and Professional Graphics Geneva - Switzerland, 10 th July 2012 Maurizio

More information

OWB Users, Enter The New ODI World

OWB Users, Enter The New ODI World OWB Users, Enter The New ODI World Kulvinder Hari Oracle Introduction Oracle Data Integrator (ODI) is a best-of-breed data integration platform focused on fast bulk data movement and handling complex data

More information

High-Speed In-Memory Analytics over Hadoop and Hive Data

High-Speed In-Memory Analytics over Hadoop and Hive Data High-Speed In-Memory Analytics over Hadoop and Hive Data Big Data 2015 Apache Spark Not a modified version of Hadoop Separate, fast, MapReduce-like engine In-memory data storage for very fast iterative

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering

More information

DE-20489B Developing Microsoft SharePoint Server 2013 Advanced Solutions

DE-20489B Developing Microsoft SharePoint Server 2013 Advanced Solutions DE-20489B Developing Microsoft SharePoint Server 2013 Advanced Solutions Summary Duration Vendor Audience 5 Days Microsoft Developer Published Level Technology 21 November 2013 300 Microsoft SharePoint

More information

How To Develop An Open Play Context Framework For Android (For Android)

How To Develop An Open Play Context Framework For Android (For Android) Dynamix: An Open Plug-and-Play Context Framework for Android Darren Carlson and Andreas Schrader Ambient Computing Group / Institute of Telematics University of Lübeck, Germany www.ambient.uni-luebeck.de

More information