Fédération et analyse de données distribuées en imagerie biomédicale

Similar documents
fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Data fusion, semantic alignment, distributed queries

fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Interrogation d'entrepôts distribués et hétérogènes

Disributed Query Processing KGRAM - Search Engine TOP 10

Shanoir: Software as a Service Environment to Manage Population Imaging Research Repositories

Grid-wide neuroimaging data federation in the context of the NeuroLOG project.

, Grégoire Malandain, Johan Montagnat 2, Mé 4 9

«Shanoir : une solu/on pour la ges/on de données distribuées en imagerie in- vivo» Jus/ne Guillaumont Isabelle Corouge

UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Data and beyond

BIRN Update. Carl Kesselman

Distributed knowledge sharing and production through collaborative e-science platforms

Shanoir: So*ware as a Service Environment to Manage Popula9on Imaging Research Repositories

Training Management System for Aircraft Engineering: indexing and retrieval of Corporate Learning Object

FIPA agent based network distributed control system

Publishing Linked Data Requires More than Just Using a Tool

QASM: a Q&A Social Media System Based on Social Semantics

Leveraging ambient applications interactions with their environment to improve services selection relevancy

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Data Grids. Lidan Wang April 5, 2007

BYODs & FAIR Data Stewardship

Management of Imaging Data in Clinical Trials: John Sunderland, Ph.D. Department of Radiology University of Iowa

Linked Science as a producer and consumer of big data in the Earth Sciences

A demonstration of the use of Datagrid testbed and services for the biomedical community

MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

bigdata Managing Scale in Ontological Systems

OAK Database optimizations and architectures for complex large data Ioana MANOLESCU-GOUJOT

Data Management in NeuroMat and the Neuroscience Experiments System (NES)

Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery

Research Traceability using Provenance Services for Biomedical Analysis

A science-gateway workload archive application to the self-healing of workflow incidents

Graph Database Performance: An Oracle Perspective

M2M Communications and Internet of Things for Smart Cities. Soumya Kanti Datta Mobile Communications Dept.

SCIENTIFIC workflows have recently emerged as a new

Web and Big Data at LIG. Marie-Christine Rousset (Pr UJF, déléguée scientifique du LIG)

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid 5000 Testbed

An Experimental Workflow Development Platform for Historical Document Digitisation and Analysis

The Development of the Clinical Trial Ontology to standardize dissemination of clinical trial data. Ravi Shankar

LinkZoo: A linked data platform for collaborative management of heterogeneous resources

Performance Analysis, Data Sharing, Tools Integration: New Approach based on Ontology

BIRN Update. Carl Kesselman

Semantic Web and Multi-Agent Approach to Corporate Memory Management

Additional mechanisms for rewriting on-the-fly SPARQL queries proxy

LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India

CREATING AND APPLYING KNOWLEDGE IN ELECTRONIC HEALTH RECORD SYSTEMS. Prof Brendan Delaney, King s College London

Dendro: collaborative research data management built on linked open data

How To Integrate An Ontology Based Database Into A Cancer Database

Seed4C: A High-security project for Cloud Infrastructure

Enabling Collaboration Using the Biomedical Informatics Research Network (BIRN):

Building Semantic Content Management Framework

Bridging clinical information systems and grid middleware: a Medical Data Manager

IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper

OVERVIEW OF JPSEARCH: A STANDARD FOR IMAGE SEARCH AND RETRIEVAL

Patterns for Business Object Model Integration in Process-Driven and Service-Oriented Architectures

CAMDIT: A Toolkit for Integrating Heterogeneous Medical Data for improved Health Care Service Provisioning

irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire 25th

Extending SOA Infrastructure for Semantic Interoperability

Praseeda Manoj Department of Computer Science Muscat College, Sultanate of Oman

Seed4C: A Cloud Security Infrastructure validated on Grid 5000

LDIF - Linked Data Integration Framework

A Knowledge Base Approach for Genomics Data Analysis

Web. Studio. Visual Studio. iseries. Studio. The universal development platform applied to corporate strategy. Adelia.

Scalable End-User Access to Big Data HELLENIC REPUBLIC National and Kapodistrian University of Athens

Semantic Interoperability

Semantic Exploration of Archived Product Lifecycle Metadata under Schema and Instance Evolution

Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments

Biomedical Informatics Applications, Big Data, & Cloud Computing

PDS4 and Build 5a Update. Dan Crichton, Emily Law November 2014

Case Study: Semantic Integration as the Key Enabler of Interoperability and Modular Architecture for Smart Grid at Long Island Power Authority (LIPA)

Core Enterprise Services, SOA, and Semantic Technologies: Supporting Semantic Interoperability

D5.3.2b Automatic Rigorous Testing Components

BIG DATA AGGREGATOR STASINOS KONSTANTOPOULOS NCSR DEMOKRITOS, GREECE. Big Data Europe

Practical Implementation of a Bridge between Legacy EHR System and a Clinical Research Environment

Intelligent interoperable application for employment exchange system using ontology

Neuroimaging Big Data Challenges and Computational Workflow Solutions

Denis Caromel, CEO Ac.veEon. Orchestrate and Accelerate Applica.ons. Open Source Cloud Solu.ons Hybrid Cloud: Private with Burst Capacity

Basic Scheduling in Grid environment &Grid Scheduling Ontology

DataBridges: data integration for digital cities

GOAL AND STATUS OF THE TLSE PLATFORM

Semantic Web Services for e-learning: Engineering and Technology Domain

Application of ontologies for the integration of network monitoring platforms

An Ontology-based Architecture for Integration of Clinical Trials Management Applications

Digital libraries of the future and the role of libraries

Models and Architecture for Smart Data Management

A generic approach for data integration using RDF, OWL and XML

escidoc: una plataforma de nueva generación para la información y la comunicación científica

Databases & Data Infrastructure. Kerstin Lehnert

> Semantic Web Use Cases and Case Studies

Archiving, Indexing and Accessing Web Materials: Solutions for large amounts of data

Bringing Compute to the Data Alternatives to Moving Data. Part of EUDAT s Training in the Fundamentals of Data Infrastructures

Data collection architecture for Big Data

Introduction to Service Oriented Architectures (SOA)

Total Exploration & Production: Field Monitoring Case Study

Overcoming the Technical and Policy Constraints That Limit Large-Scale Data Integration

Department of Defense. Enterprise Information Warehouse/Web (EIW) Using standards to Federate and Integrate Domains at DOD

EAI OVERVIEW OF ENTERPRISE APPLICATION INTEGRATION CONCEPTS AND ARCHITECTURES. Enterprise Application Integration. Peter R. Egli INDIGOO.

urika! Unlocking the Power of Big Data at PSC

Transcription:

Software technologies for integration of processes and data in neurosciences ConnaissancEs Distribuées en Imagerie BiomédicaLE Fédération et analyse de données distribuées en imagerie biomédicale Johan Montagnat CNRS, I3S lab, Modalis team on behalf of the and the consortiums Lyon, 13 décembre 2012 http://neurolog.i3s.unice.fr http://credible.i3s.unice.fr

Multi-centric studies in neurosciences Sharing computing resources and algorithms Research (large data sets, statistical studies, models design...) Complex analysis (compute-intensive studies, validation procedures...) Data Procedures Seq1 > dcscdssdcsdcdsc bscdsbcbjbfvbfvbvfbvbvbhvb hsvbhdvbhfdbvfd Processing tools Seq2 > bvdfvfdvhbdfvb bhvdsvbhvbhdvrefghefgdscg dfgcsdycgdkcsqkc Seqn > bvdfvfdvhbdfvb bhvdsvbhvbhdvrefghefgdscg dfgcsdycgdkcsqkchdsqhfduh dhdhqedezhhezldhezhfehfle zfzejfv Computing power 2

Motivations Neuroscience data Increasing use of imaging biomarkers for research and diagnosis Increasing number of (multi-centric) large-scale studies Distribution of resources over acquisition sites Need to consider existing site-wide legacy environments Computing resources Heavy computing power required Data analysis pipelines Integrating neurodata analysis codes from different toolkits Centralized approaches encounter limitations Large data volumes to transfer / archive / search Sensitive patient data / complex access control policies Need to adopt uniform data model & format Approach: federate existing resources in a distributed, collaborative platform 3

middleware Heavy client Distributed data federation (files, relational DB, semantic) Distributed computing 4

Rennes IRISA platform Paris ANR TLOG (2006-2010) Multi-centric environment for neurosciences 5 neuroscience centers federated IFR49 (La Pitié Salpétrière) (CHU Rennes) GIN Grenoble (Michalon Hospital) Sophia Antipolis I3S INRIA Sophia (technical) (Centre Antoine Lacassagne) I3S (Sophia Antipolis) core technical site IRISA (INRIA Rennes), collaborating with the University Hospital of Rennes IFR49 (INSERM affiliated neuroscience group in Paris La Pitié Salpétrière Hospital) GIN (INSERM affiliated neurosciences institute of Grenoble, Michalon Hospital) INRIA Sophia Antipolis collaborating with Centre Antoine Lacassagne (Nice) 5

Neuro-imaging data Pathologies MS Multiple Sclerosis Brain strokes Brain tumors Alzheimer's Stroke Data considered Imaging data Various MR modalities (T1, T1 Gado, T2, Flair, Diffusion, PD) Processed images (Registered, Segmented, ) Associated metadata Studies Subjects Data acquisition context and provenance Neurophysiological and Neuroclinical tests Measurements derived from image data 6

data mediation & federation Preserve legacy environment (e.g. relational databases) Cope with heterogenous schemas Use a relational database mediation & federation engine (BusinessObject/SAP DataFederator product) 7

Semantic reference Application ontology Onto Based on a common modeling framework 3-levels structure one Foundational ontology: i.e. DOLCE Several Core ontologies Several Domain ontologies Implemented in OWL-Lite Derived relational schema NeuroBase 2005 Onto OntoNeuroBase 2007 Federated schema Test-bed applications requirements external resources: XNAT, XCEDE, SNOMED, GALEN, FMA, CDISC Grenoble schema Rennes schema Paris schema 8

Onto Variables Instrument Scores Assessment Examination Study MR protocol Dataset Subject Experimental Groups of Subjects 9

data mediation & federation Experimenting both relational and semantics technologies METAMorphoses conversion of (federated) relational databases into a semantic annotations store 10

From relational to semantic datastores multi-disciplinary Antipolis (Oct. 15-17, 2012) workshop in Sophia Semantic models are widely accepted Existing systems in the biomedical community are mostly centralized The need for multi-centric studies support is unambiguous Exploiting / reusing data in a multi-disciplinary context is still preliminary, and ontological resources are not sufficient Approach Semantic reference design RDF triples-based knowledge bases Semantic alignment for heterogeneous data sources Data sources mapping Distributed semantic query engine SPARQL v1.1 compliant 11

Distributed query engine Based on KGRAM (Knowledge Graph Abstract Machine) Full support of SPARQL v1.1 Flexible software architecture adaptable to many use cases Deployment example Meta-producer distributes queries over multiple query endpoints KGRAM endpoint interfaces with heterogeneous data stores 12

Results: distributed query processing Multiple neuroscience data stores querying Relational stores (DF), semantic bases (KGRAM) or both (KGRAM) Performance analysis Q1 : costly evaluation (336 remote invocations) Q2 : selective query (only 5 resulting datasets) Query Relational (SAP DF) Semantic Semantic+Relational Q1 3.03 s ± 0.25 6.13 s ± 0.05 11.76 s ± 0.05 Q2 1.52 s ± 0.62 0.60 s ± 0.03 1.53 s ± 0.14 13

Workflow enactor Workflow language + enactor Abstract description of data analysis pipelines Interface to computing infrastructure Data processing code wrapper component (jigsaw) Interface to grid(s) workload management API MOTEUR workflow manager http://modalis.polytech.unice.fr/moteur2 GWENDIA language Data-driven, implicit-parallelism oriented language Graphic interface to design workflows MOTEUR enactor Transparent exploitation of parallelism Interfaced to multiple submission interface Local resources, EGI, Grid5000... 14

Codes packaging and enactment Enactment data sets Data analysis code bundling Bundles deployment descriptor workflow engine code + dependencies site server 15

Semantic extensions Ontology Concepts & Rules DataSet DataSetProcessing T1-MRI T2-MRI fmri... Segmentation Registration... A B Annotations Processing 16

Semantic extensions Ontology Concepts & Rules DataSet DataSetProcessing T1-MRI T2-MRI fmri... Img1 IsA T1-MRI Segmentation Registration... A B Img2 IsA T1-MRI Annotations Processing 17

Semantic extensions Ontology Concepts & Rules DataSet DataSetProcessing T1-MRI T2-MRI fmri... Segmentation Registration... A B Tool1 HasInput T1-MRI Annotations Tool1 IsA Registration Img1 IsA T1-MRI Processing Tool1 HasOutput Transfo Img2 IsA T1-MRI 18

Semantic extensions Ontology Concepts & Rules DataSet DataSetProcessing T1-MRI T2-MRI fmri... Annotations Segmentation Registration... A B Tool1 HasInput T1-MRI Tool1 HasOutput Transfo Tool1 IsA Registration Img1 IsA T1-MRI Img2 IsA T1-MRI Processing Img1 IsProcessedBy Tool1 Img2 IsProcessedBy Tool1 Tool1 Produced Transfo1 Transfo1 IsA GlobalTransfo 19

Semantic knowledge use in workflows Fine-grained annotation traces generated at run-time Summary generated by inference rules application Produce relevant and human-tractable experiment summaries Example in the context of the VIP (Virtual Imaging Platform) project 20

(Some) lessons learned Data federation feasibility Relational data federation requires semantic reference Dual relational / semantic data view is confusing for end users Mapping to a semantic, well-documented data model Need to cope with site failures Data access control is a tough problem Semantic technologies Powerful semantic query and inference engine Trade-off between query language expressivity and performance Coupling data and processing semantics Semantic querying and inference capability are foreign to users Non-trivial user interface to be defined to query the federation Application to other domains VIP Virtual Imaging Platform http://www.creatis.insa-lyon.fr/vip 21

References Reports & publications available on-line http://credible.i3s.unice.fr & http://neurolog.i3s.unice.fr Publications O. Corby, A. Gaignard, C. Faron-Zucker, J. Montagnat. KGRAM Versatile Inference and Query Engine for the Web of Linked Data IEEE/WIC/ACM International Conference on Web Intelligence, Macao, China, Dec. 2012. A. Gaignard, J. Montagnat, C. Faron-Zucker, O. Corby. Semantic Federation of Distributed Neurodata MICCAI Workshop on Data- and Compute-Intensive Clinical and Translational Imaging Applications, pages 41-50, Nice, France, October 2012. B. Gibaud, G. Kassel, M. Dojat, B. Batrancourt, F. Michel, A. Gaignard, J. Montagnat : sharing neuroimaging data using an ontology-based federated approach AMIA, vol. 2011, pages 472 480, Washington DC, USA, October 2011. F. Michel, A. Gaignard, F. Ahmad, C. Barillot, B. Batrancourt, M. Dojat, B. Gibaud, et al. Grid-wide neuroimaging data federation in the context of the project HealthGrid'10, pages 112-123, IOS Press, Paris, France, 28-30 June 2010. Research reports -12-1-v1: multi-disciplinary workshop report -12-2-v1: distributed semantic query engines -12-3-v1: sémantique des données de l'observation 22