fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Data fusion, semantic alignment, distributed queries

Similar documents
Disributed Query Processing KGRAM - Search Engine TOP 10

fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Interrogation d'entrepôts distribués et hétérogènes

Fédération et analyse de données distribuées en imagerie biomédicale

Publishing Linked Data Requires More than Just Using a Tool

Scalable End-User Access to Big Data HELLENIC REPUBLIC National and Kapodistrian University of Athens

UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications

Semantic Exploration of Archived Product Lifecycle Metadata under Schema and Instance Evolution

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials

A generic approach for data integration using RDF, OWL and XML

Linked Science as a producer and consumer of big data in the Earth Sciences

LDIF - Linked Data Integration Framework

Data and beyond

Supporting Change-Aware Semantic Web Services

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer

Models and Architecture for Smart Data Management

Performance Analysis, Data Sharing, Tools Integration: New Approach based on Ontology

Databases & Data Infrastructure. Kerstin Lehnert

OntoDBench: Ontology-based Database Benchmark

LinkZoo: A linked data platform for collaborative management of heterogeneous resources

Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Additional mechanisms for rewriting on-the-fly SPARQL queries proxy

Industry 4.0 and Big Data

LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model

ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY

FIPA agent based network distributed control system

SmartLink: a Web-based editor and search environment for Linked Services

secure intelligence collection and assessment system Your business technologists. Powering progress

BYODs & FAIR Data Stewardship

Training Management System for Aircraft Engineering: indexing and retrieval of Corporate Learning Object

SPC BOARD (COMMISSIONE DI COORDINAMENTO SPC) AN OVERVIEW OF THE ITALIAN GUIDELINES FOR SEMANTIC INTEROPERABILITY THROUGH LINKED OPEN DATA

urika! Unlocking the Power of Big Data at PSC

Software Design Document (SDD) Template

Semantic Modeling with RDF. DBTech ExtWorkshop on Database Modeling and Semantic Modeling Lili Aunimo

Data collection architecture for Big Data

Towards Semantics-Enabled Distributed Infrastructure for Knowledge Acquisition

BIG. Big Data Analysis John Domingue (STI International and The Open University) Big Data Public Private Forum

bigdata Managing Scale in Ontological Systems

Fraunhofer FOKUS. Fraunhofer Institute for Open Communication Systems Kaiserin-Augusta-Allee Berlin, Germany.

Graph Database Performance: An Oracle Perspective

Knowledge Management

Core Enterprise Services, SOA, and Semantic Technologies: Supporting Semantic Interoperability

Semantic Search in Portals using Ontologies

Linked Statistical Data Analysis

SOA Success is Not a Matter of Luck

Semantic Web and Multi-Agent Approach to Corporate Memory Management

Amit Sheth & Ajith Ranabahu, Presented by Mohammad Hossein Danesh

Ontology and automatic code generation on modeling and simulation

OAK Database optimizations and architectures for complex large data Ioana MANOLESCU-GOUJOT

«Shanoir : une solu/on pour la ges/on de données distribuées en imagerie in- vivo» Jus/ne Guillaumont Isabelle Corouge

Vertical Integration of Enterprise Industrial Systems Utilizing Web Services

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Open Ontology Repository Initiative

WHITE PAPER TOPIC DATE Enabling MaaS Open Data Agile Design and Deployment with CA ERwin. Nuccio Piscopo. agility made possible

Data Grids. Lidan Wang April 5, 2007

The various steps in the solution approach are presented below.

BUSINESS VALUE OF SEMANTIC TECHNOLOGY

SCIENTIFIC workflows have recently emerged as a new

Building Semantic Content Management Framework

CAREER TRACKS PHASE 1 UCSD Information Technology Family Function and Job Function Summary

Enabling Collaboration Using the Biomedical Informatics Research Network (BIRN):

Data Wrangling: From the Wild to the Lake

Data Quality in Information Integration and Business Intelligence

BIG DATA AGGREGATOR STASINOS KONSTANTOPOULOS NCSR DEMOKRITOS, GREECE. Big Data Europe

Distributed knowledge sharing and production through collaborative e-science platforms

Semantic Information on Electronic Medical Records (EMRs) through Ontologies

ON DEMAND ACCESS TO BIG DATA THROUGH SEMANTIC TECHNOLOGIES. Peter Haase fluid Operations AG

Thomas Usländer Fraunhofer IITB

Enable Location-based Services with a Tracking Framework

In 2014, the Research Data Purdue University

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

The Ontological Approach for SIEM Data Repository

An Ontology-based Architecture for Integration of Clinical Trials Management Applications

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

SWAP: ONTOLOGY-BASED KNOWLEDGE MANAGEMENT WITH PEER-TO-PEER TECHNOLOGY

Application of ontologies for the integration of network monitoring platforms

Data Management in NeuroMat and the Neuroscience Experiments System (NES)

Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management

Leveraging ambient applications interactions with their environment to improve services selection relevancy

CREATING AND APPLYING KNOWLEDGE IN ELECTRONIC HEALTH RECORD SYSTEMS. Prof Brendan Delaney, King s College London

Some Research Challenges for Big Data Analytics of Intelligent Security

Presente e futuro del Web Semantico

IAAA Grupo de Sistemas de Información Avanzados

How To Create A Social Web With A Free And Open Source Web Browser (W3C)

Enabling End User Access to Big Data in the O&G Industry

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

DataBridges: data integration for digital cities

Susanna-Assunta Sansone, PhD. Metadata WG3 chair.

Second EUDAT Conference, October 2013 Workshop: Digital Preservation of Cultural Data Scalability in preservation of cultural heritage data

RFI Summary: Executive Summary

The Development of the Clinical Trial Ontology to standardize dissemination of clinical trial data. Ravi Shankar

OVERVIEW OF JPSEARCH: A STANDARD FOR IMAGE SEARCH AND RETRIEVAL

CRM dig : A generic digital provenance model for scientific observation

Semantic Interoperability

Automatic Timeline Construction For Computer Forensics Purposes

MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Linked Open Data Infrastructure for Public Sector Information: Example from Serbia

Lift your data hands on session

PONTE Presentation CETIC. EU Open Day, Cambridge, 31/01/2012. Philippe Massonet

Transcription:

fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Data fusion, semantic alignment, distributed queries Johan Montagnat CNRS, I3S lab, Modalis team on behalf of the CrEDIBLE consortium CNRS/UNS, laboratoire I3S (UMR7271), équipe MODALIS INRIA/CNRS/UNS, laboratoire I3S (UMR 7271), équipe Wimmics INSERM U1099, laboratoire LTSI, équipe MediCIS U. Picardie, laboratoire MIS, équipe Connaissances icube, journée défis masse de données Illkirch, 7/11/2014 1

Motivations Biomedical data High heterogeneity: images, clinical data, biomarkers, biology... Increasing amount / number of (open) sources Big Data Large-scale medical studies (statistical medical studies, epidemiology...) Data (re)analysis opportunities Multicentric studies, Translational research Centralized approaches encounter limitations Need for cross-factors analysis Linked Data Multiple data source kinds Large data volumes to transfer / archive / search Sensitive patient data / complex access control policies Need to adopt uniform data model & format Data is de facto distributed over acquisition centers icube, journée défis masse de données Illkirch, 7/11/2014 2

Biomedical data mediation & federation Data federation through distributed querying and query rewriting Client Query-based access to data Federator (query decomposition, planning & results federation) Remote sub-queries Site 1 Mediator (ETL or query rewriting) Mediator (ETL or query rewriting) Site 2 Heterogeneous databases schema mediation Medical data & metadata: raw data + models + processing results + models + provenance... icube, journée défis masse de données Illkirch, 7/11/2014 3

Domain ontology-based federation Medical domain ontology (reference model) Data querying Client Query-based access to data Federator Data alignment Remote sub-queries Site 1 Mediator Mediator Site 2 icube, journée défis masse de données Illkirch, 7/11/2014 4

Exploitation example: ANR GINSENG (epidemiology) Partnership with GINSENG consortium and Mnemotix SME Federation of heterogeneous epidemiology repositories Multiple epidemiology data acquisition networks Cross-correlation with external data (e.g. demographic: IGN) Mediation of the EHR (Electronic Health Record) data schema icube, journée défis masse de données Illkirch, 7/11/2014 5

Challenges and expertise Challenges Representation of data semantics for heterogeneous data sources biomedical ontology building Data federation distributed query engine Data mediation RDB2RDF, ontology alignment Partnership Distributed query engine Semantic representation Semantic reference I3S/Modalis INRIA/I3S/Wimmics LTSI/MediCIS MIS/Connaissances Medical applications icube, journée défis masse de données Illkirch, 7/11/2014 6

Scientific networking annual workshop Objectives October 2012 Semantic models usage becomes mainstream Medical systems are mostly centralized but strigent need to support multi-centric studies October 2013 Multidisciplinary workshop gathering international experts in biomedical data representation, semantic, distribution, federation, integration... Many distributed data federation engines Work on data partition schemes and data modeling Query language expressivity limitations October 2014 Towards Web-scale databases (bilion of tuples) R/W access and Semantic reasoning increasingly studied icube, journée défis masse de données Illkirch, 7/11/2014 7

Reference ontology 3-levels structure: foundational (DOLCE), core, domain Particular Domain-specific rules Inference abilities Endurant Languages Dataset Subjects acquisition equipment Centres DataTop ontology Current focus on measurements Derived relational schema Medical Image formats Action Dataset processings Dataset Expression acquisitions Conceptualization Inscription Studies Investigators Perdurant Medical image files Medical Datasets image expressions icube, journée défis masse de données Illkirch, 7/11/2014 Examinations 8

Ontology modules Modularized ontology to improve reuse and lightweightness ONL-MR-DA: MR Dataset Acquisition ONL-DP: Data Processing ONL-MSA: Mental State Assessment OntoVIP: Medical Image Simulation Wide diffusion http://bioportal.bioontology.org/ontologies icube, journée défis masse de données Illkirch, 7/11/2014 9

Data query and federation engine KGRAM (Knowledge Graph Abstract Machine) Semantic query engine: Full support of SPARQL1.1 Generic interface for heterogeneous backends Flexible architecture facilitating different deployment scenarios Mediation interface to access relational data Federated relational schema derived from the ontology icube, journée défis masse de données Illkirch, 7/11/2014 10

Distributed Query Processing Query federator decoupled from data sources Asynchronous querying of multiple data sources Query planning and parallel querying icube, journée défis masse de données Illkirch, 7/11/2014 11

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. FILTER (CONTAINS (?name, 'Bob')) } icube, journée défis masse de données Illkirch, 7/11/2014 12

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. Q1 FILTER (CONTAINS (?name, 'Bob')) } Q2 icube, journée défis masse de données Illkirch, 7/11/2014 13

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. Q1 FILTER (CONTAINS (?name, 'Bob')) } Q2 Asynchronous execution Interface KGRAM core Q #1 #2 icube, journée défis masse de données Illkirch, 7/11/2014 14

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. Q1 FILTER (CONTAINS (?name, 'Bob')) } Q2 Asynchronous execution Interface KGRAM core Q #1 #2 icube, journée défis masse de données Illkirch, 7/11/2014 15

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. Q1 FILTER (CONTAINS (?name, 'Bob')) } Q2 Asynchronous execution Interface KGRAM core Q Q1 #1 #2 icube, journée défis masse de données Illkirch, 7/11/2014 16

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. Q1 FILTER (CONTAINS (?name, 'Bob')) } Q2 Asynchronous execution Interface KGRAM core Q Q1 #1 #2 icube, journée défis masse de données Illkirch, 7/11/2014 17

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. Q1 FILTER (CONTAINS (?name, 'Bob')) } Q2 Asynchronous execution Interface KGRAM core Q Q1 Q2' #1 #2 icube, journée défis masse de données Illkirch, 7/11/2014 18

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. Q1 FILTER (CONTAINS (?name, 'Bob')) } Q2 Asynchronous execution Interface KGRAM core Q Q1 Q2' #1 #2 icube, journée défis masse de données Illkirch, 7/11/2014 19

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. Q1 FILTER (CONTAINS (?name, 'Bob')) } Q2 Asynchronous execution Interface KGRAM core Q Q1 Q2' Q2'' #1 #2 icube, journée défis masse de données Illkirch, 7/11/2014 20

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. Q1 FILTER (CONTAINS (?name, 'Bob')) } Q2 Asynchronous execution Interface KGRAM core Q Q1 Q2' Q2'' #1 #2 icube, journée défis masse de données Illkirch, 7/11/2014 21

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. Q1 FILTER (CONTAINS (?name, 'Bob')) } Q2 Asynchronous execution Interface KGRAM core Q Q1 Q2' Q2'' #1 #2 icube, journée défis masse de données Illkirch, 7/11/2014 22

Distributed Query Processing KGRAM query processing Q SELECT?name?date WHERE {?x foaf:name?name.?x dbpedia:birthdate?date. Q1 FILTER (CONTAINS (?name, 'Bob')) } Q2 Asynchronous execution Interface KGRAM core Q Q1 Q2' Q2'' #1 #2 icube, journée défis masse de données Illkirch, 7/11/2014 23

Data analysis workflows Data-driven compute-intensive workflow engine icube, journée défis masse de données Illkirch, 7/11/2014 24

Ontology Provenance capture Concepts & Rules T1-MRI DataSet T2-MRI fmri... DataSetProcessing Segmentation Registration... A B Annotations Processing icube, journée défis masse de données Illkirch, 7/11/2014 25

Ontology Provenance capture Concepts & Rules T1-MRI DataSet T2-MRI fmri... DataSetProcessing Segmentation Registration... A B Img1 IsA T1-MRI Annotations Processing Img2 IsA T1-MRI icube, journée défis masse de données Illkirch, 7/11/2014 26

Ontology Provenance capture Concepts & Rules T1-MRI DataSet DataSetProcessing T2-MRI fmri... Segmentation Registration... A B Tool1 HasInput T1-MRI Tool1 IsA Registration Annotations Img1 IsA T1-MRI Img2 IsA T1-MRI Tool1 HasOutput Transfo Processing icube, journée défis masse de données Illkirch, 7/11/2014 27

Ontology Provenance capture Concepts & Rules T1-MRI DataSet T2-MRI fmri... DataSetProcessing Segmentation Registration... A B Img1 IsA T1-MRI Img2 IsA T1-MRI Tool1 HasInput T1-MRI Tool1 HasOutput Transfo Tool1 IsA Registration Annotations Img1 IsProcessedBy Tool1 Img2 IsProcessedBy Tool1 Processing Tool1 Produced Transfo1 Transfo1 IsA GlobalTransfo icube, journée défis masse de données Illkirch, 7/11/2014 28

Provenance summarization Fine-grained annotation traces generated at run-time Summary generated by inference rules application Produce relevant and human-tractable experiment summaries icube, journée défis masse de données Illkirch, 7/11/2014 29

Conclusions Query-based data federation grounded on semantic web standards (SPARQL, RDF, RDFS) Ontology-based Emphasis on query language expressivity Support for both horizontal and vertical data partitioning Broad applicability (given that ontologies are available) Reference model for data alignment and query terms Towards support of Multi-Centric studies Heterogeneous databases (data models, database engines) Inherently distibuted data sets Cross-domains (translational research) support through Linked Data icube, journée défis masse de données Illkirch, 7/11/2014 30