Using Open Source Software and Open Data to Support Clinical Trial Protocol Design. Nikolaos Matskanis, Joseph Roumier, Fabrice Estiévenart, {nikolaos.matskanis, joseph.roumier, fabrice.estievenart}@cetic.be. CETIC, Centre of Excellence in Information and Communication Technologies. Med-e-Tel Conference, Luxembourg, 10th April 2014
Adoption. Free, libre and open source Semantic Web software and data have both commercial and scientific support. Openness is useful for interlinking. The medical domain shows strong adoption, e.g. the Biomedical Ontology Portal. Openness is also useful when trust in the data is needed. Requirements for our project: re-use existing components (libraries, software, ...); ensure that modifications published by third parties are kept open, even when the software is offered as SaaS. License: Affero GPL v3, whose network clause requires modified versions offered as a network service to make their source available.
Supporting Clinical Trial Protocol Design. The project goal is to assist Clinical Trial Protocol (CTP) design using open source software and open linked data. We have developed three open source components: the Clinical Trial Protocol Repository, the Semantic Mapper and the Linked Data Application.
Clinical Trial Protocol Repository
Ontology Integration for Search Engines. The medical domain is split into many fields of expertise, each with its own health-related ontologies, models, coding systems, protocols, etc. PONTE aims to cover all the domains of clinical trial design by integrating many points of view: LOINC, KEGG Compounds & Lipids, NCI Common Terminology Criteria for Adverse Events (CTCAE) v4, ICD-10-CM, the Animals ontology from the GO3R project, etc. The resulting ontology is a hierarchy of 49,500 concepts, developed in the Web Ontology Language (OWL) and translated into OBO format.
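As an illustration of the integration step, the Python sketch below grafts several source hierarchies onto one shared root and counts the resulting concepts. It is a toy under stated assumptions: the concept fragments, the `PONTE:Thing` root and the source-prefixed naming scheme are hypothetical, and the actual ontology was authored in OWL rather than assembled from Python dictionaries.

```python
from collections import defaultdict

# Hypothetical fragments of source hierarchies: concept -> parent (None = top level).
SOURCES = {
    "ICD10CM": {"I21": None, "I21.4": "I21"},
    "CTCAE": {"Cardiac disorders": None, "Myocardial infarction": "Cardiac disorders"},
    "LOINC": {"Chemistry": None},
}

ROOT = "PONTE:Thing"  # hypothetical common root concept


def integrate(sources):
    """Build a parent -> children map with one branch per source vocabulary."""
    children = defaultdict(set)
    for source, hierarchy in sources.items():
        source_root = f"{source}:root"
        children[ROOT].add(source_root)
        for concept, parent in hierarchy.items():
            node = f"{source}:{concept}"  # prefix avoids clashes between vocabularies
            parent_node = f"{source}:{parent}" if parent else source_root
            children[parent_node].add(node)
    return children


def count_concepts(children, root=ROOT):
    """Count every node reachable from `root`, including the root itself."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return len(seen)


merged = integrate(SOURCES)
print(count_concepts(merged))  # 9
```

Prefixing each concept with its source keeps identical local codes from different vocabularies distinct, which is one simple way to keep several points of view side by side in a single hierarchy.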
Design CTP Ontology. The ontology is based on standards for study/trial design (DICOM, ICD-10-CM, ChEBI, ATC, LOINC); its development was driven by input and feedback from medical partners, and it was validated with medical experts in workshops and demo events. It is used as the backbone of the PONTE platform, provides the high- and low-level structure of the CTP document, and is linked with the eligibility criteria ontology. Ontology metrics:
CTP Repository Architecture. The CTPRepository Web Service (built with open source libraries and an open source container) relies on two stores: an RDF repository (OpenRDF Sesame) for querying and reasoning operations, and an XML database (BaseX, custom implementation) for caching XML documents.
Figure: CTP Repository architecture diagram (components: CTP Editing Interface, EHR Communication, Decision Support, CTPRepository, RDF Repository, XML Database; data exchanged: CTP sections, hospitals, criteria, patient information; representations: XML Java model, RDF triples).
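Reading the architecture as keeping each protocol both as an XML document (for caching) and as RDF triples (for querying and reasoning), a minimal Python sketch of that dual-store idea follows. The document layout and predicate names are hypothetical; a dictionary stands in for BaseX and a set of tuples for the Sesame RDF repository, so none of this reflects their actual APIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical CTP document layout.
CTP_XML = """
<protocol id="ctp1">
  <section name="Eligibility">
    <criterion code="I21.4"/>
  </section>
  <hospital name="HospitalA"/>
</protocol>
"""

xml_cache = {}   # stand-in for the XML database: document id -> raw XML
triples = set()  # stand-in for the RDF repository: (subject, predicate, object)


def store(xml_text):
    """Cache the raw XML and derive triples from it (hypothetical predicates)."""
    root = ET.fromstring(xml_text.strip())
    doc_id = root.get("id")
    xml_cache[doc_id] = xml_text
    for section in root.iter("section"):
        section_id = f"{doc_id}/{section.get('name')}"
        triples.add((doc_id, "hasSection", section_id))
        for criterion in section.iter("criterion"):
            triples.add((section_id, "hasCriterion", criterion.get("code")))
    for hospital in root.iter("hospital"):
        triples.add((doc_id, "hasHospital", hospital.get("name")))
    return doc_id


def match(s=None, p=None, o=None):
    """Return the sorted triples matching a pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


store(CTP_XML)
print(match(p="hasCriterion"))  # [('ctp1/Eligibility', 'hasCriterion', 'I21.4')]
```

Keeping the original XML alongside the triples means the full document can be returned without rebuilding it from the graph, while pattern queries run against the triples.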
Semantic Mapper
Semantic Term Code Mapper. A service dedicated to mapping codes between different vocabularies/classification schemes. Example: ICD-9-CM code 41071, "Subendocardial infarction, initial episode of care", maps to ICD-10-CM code I21.4, "Non-ST elevation (NSTEMI) myocardial infarction".
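A minimal in-memory sketch of such a mapper in Python, using the slide's example as its only data. The table layout, the `map_code` function and the normalisation rule are hypothetical; dropping the dot lets both the `41071` and `410.71` notations resolve to the same entry.

```python
from typing import NamedTuple


class Term(NamedTuple):
    vocabulary: str
    code: str
    name: str


def normalise(code):
    """Hypothetical normalisation: compare codes without dots, in upper case."""
    return code.replace(".", "").upper()


# (source vocabulary, normalised source code, target vocabulary) -> target terms.
# A list, because one source code may map to several target codes.
MAPPINGS = {
    ("ICD-9-CM", "41071", "ICD-10-CM"): [
        Term("ICD-10-CM", "I21.4", "Non-ST elevation (NSTEMI) myocardial infarction"),
    ],
}


def map_code(source_vocab, code, target_vocab):
    """Return the mapped terms, or an empty list when no mapping is known."""
    return MAPPINGS.get((source_vocab, normalise(code), target_vocab), [])


print([t.code for t in map_code("ICD-9-CM", "410.71", "ICD-10-CM")])  # ['I21.4']
```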
Vocabulary Subsets. Vocabularies managed by the mapper: disorders: ICD-9-CM, ICD-10-CM; pharmacological substances: IOPR, ChEBI, ATC; genders: CNR, DICOM; marital status: CNR, PONTE; and others.
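Because CNR appears in both the gender and the marital status subsets, the source vocabulary alone does not say which domain a code belongs to. The hypothetical Python sketch below therefore takes the subset as an explicit argument when deciding which target vocabularies are valid; the subset table mirrors the slide's list (abridged), while `allowed_targets` and its error handling are illustrative assumptions.

```python
# Subsets from the slide (abridged); "and others" are omitted.
SUBSETS = {
    "disorders": {"ICD-9-CM", "ICD-10-CM"},
    "pharmacological substances": {"IOPR", "ChEBI", "ATC"},
    "genders": {"CNR", "DICOM"},
    "marital status": {"CNR", "PONTE"},
}


def allowed_targets(source_vocab, subset):
    """Hypothetical check: vocabularies a code may be mapped to within `subset`."""
    members = SUBSETS.get(subset, set())
    if source_vocab not in members:
        raise ValueError(f"{source_vocab} is not part of the {subset!r} subset")
    return sorted(members - {source_vocab})


print(allowed_targets("CNR", "genders"))         # ['DICOM']
print(allowed_targets("CNR", "marital status"))  # ['PONTE']
```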
Architecture. Mappings come from manual work for small vocabularies and from mapping files such as the GEMs (General Equivalence Mappings). Technologies: a MySQL relational database with an abstraction layer and result caching, exposed as a SOAP service with a Java implementation.
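A rough Python sketch of the GEM-based path, under several assumptions: `sqlite3` stands in for MySQL, `functools.lru_cache` stands in for the service's result cache, the `mapping` table schema is hypothetical, and GEM lines are assumed to follow the whitespace-separated layout of the CMS files (source code, target code, five flag digits, codes written without the decimal point).

```python
import sqlite3
from functools import lru_cache

# One illustrative ICD-9-CM -> ICD-10-CM line in the assumed GEM layout.
GEM_LINES = """\
41071 I214 00000
"""

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute("CREATE TABLE mapping (src_vocab TEXT, src_code TEXT, "
             "dst_vocab TEXT, dst_code TEXT, flags TEXT)")


def load_gem(lines, src_vocab, dst_vocab):
    """Parse GEM lines and bulk-insert them; returns the number of rows loaded."""
    rows = []
    for line in lines.splitlines():
        if not line.strip():
            continue
        src, dst, flags = line.split()
        rows.append((src_vocab, src, dst_vocab, dst, flags))
    conn.executemany("INSERT INTO mapping VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


@lru_cache(maxsize=10_000)  # stand-in for the service's result cache
def lookup(src_vocab, src_code, dst_vocab):
    """Cached lookup; returns a tuple so the cached value is immutable."""
    cur = conn.execute(
        "SELECT dst_code FROM mapping WHERE src_vocab = ? AND src_code = ? "
        "AND dst_vocab = ?", (src_vocab, src_code, dst_vocab))
    return tuple(row[0] for row in cur.fetchall())


load_gem(GEM_LINES, "ICD-9-CM", "ICD-10-CM")
print(lookup("ICD-9-CM", "41071", "ICD-10-CM"))  # ('I214',)
print(lookup.cache_info().hits)                   # 0
lookup("ICD-9-CM", "41071", "ICD-10-CM")
print(lookup.cache_info().hits)                   # 1
```

One consequence of caching in front of the database: after reloading a GEM file the cache must be cleared (here with `lookup.cache_clear()`), otherwise stale mappings can be served.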
Linked Data Application
Linked Open Data. Open and freely available data; the value of data increases the more it is interlinked with other data. Linked Data uses RDF to structure the data, HTTP URIs to publish it, and semantic references such as owl:sameAs to associate and link resources. Linked Data benefits: by following the links, humans can browse and search engines can crawl and search. Query traversal: a query can be extended by following links in its results, gathering and aggregating results over distributed data sources.
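Query traversal can be sketched as a breadth-first crawl: dereference a URI, keep its triples, and enqueue any resource reached through an owl:sameAs link. In the Python sketch below the example.org URIs and data are hypothetical, a dictionary replaces real HTTP dereferencing, and only outgoing owl:sameAs links are followed (a real crawler would also treat sameAs as symmetric).

```python
from collections import deque

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical "Web": URI -> triples returned when that URI is dereferenced.
WEB = {
    "http://example.org/diseases/d1": [
        ("http://example.org/diseases/d1", LABEL, "Myocardial infarction"),
        ("http://example.org/diseases/d1", SAME_AS, "http://example.org/other/x9"),
    ],
    "http://example.org/other/x9": [
        ("http://example.org/other/x9", "http://example.org/vocab/possibleDrug",
         "http://example.org/drugs/drug-a"),
    ],
}


def dereference(uri):
    """Stand-in for an HTTP GET that returns RDF describing `uri`."""
    return WEB.get(uri, [])


def traverse(start, follow=(SAME_AS,), max_docs=50):
    """Breadth-first link traversal; returns every triple collected on the way."""
    seen, queue, collected, fetched = {start}, deque([start]), [], 0
    while queue and fetched < max_docs:
        uri = queue.popleft()
        fetched += 1
        for s, p, o in dereference(uri):
            collected.append((s, p, o))
            if p in follow and o not in seen:  # only follow the chosen predicates
                seen.add(o)
                queue.append(o)
    return collected


triples = traverse("http://example.org/diseases/d1")
print(len(triples))  # 3
```

The `max_docs` bound matters on the open Web, where link chains are effectively unbounded.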
The Linked Open Data Cloud. Linked Open Data communities include governments, media and academic institutes. Medical and life sciences datasets cover clinical research (PubMed, clinicaltrials.gov, Gene Ontology), diseases (Diseasome) and drugs (DrugBank, DailyMed, KEGG, SIDER).
Consuming Linked Data: State of the Art. The Bio2RDF project has created a framework that provides data on demand for mash-ups. The FedBench project provides a benchmark framework for analysing the efficiency and performance of Linked Data querying. SQUIN is an engine and model for traversal-based query execution over Linked Data. SPARQLeR is designed for finding semantic associations in RDF bases. PHP and JavaScript libraries for Linked Data mash-ups include ARC2, Graphite, EasyRDF and Moriarty. Services such as Virtuoso publish Linked Data.
Linked Data Application. The LDApp interface lets the clinician enter a question from one of four main perspectives: Disease, Drug, Target and Clinical Trial. The application provides mechanisms to query these Linked Open Data sources, offers query expansion across multiple sources with navigation through them, and aggregates the retrieved information.
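A hypothetical Python sketch of cross-source query expansion: starting from a term in one perspective, every source that accepts that perspective contributes linked terms in another perspective, and the hits are aggregated per perspective together with the source they came from. The source names, link tables and trial identifier are invented for illustration; the real LDApp queried live LOD endpoints.

```python
from collections import defaultdict

# Hypothetical sources: (input perspective, output perspective) -> link table.
SOURCES = {
    "disease_source": {("Disease", "Drug"): {"hypothyroidism": ["liothyronine"]}},
    "drug_source": {("Drug", "Target"): {"liothyronine": ["thyroid hormone receptor"]}},
    "trial_source": {("Drug", "Clinical Trial"): {"liothyronine": ["trial-001"]}},
}


def expand(perspective, term, max_depth=2):
    """Expand a query across sources; return {perspective: [(term, source), ...]}."""
    results = defaultdict(set)
    seen = {(perspective, term)}
    frontier = [(perspective, term)]
    for _ in range(max_depth):  # each round follows one more hop of links
        next_frontier = []
        for current_p, value in frontier:
            for source, links in SOURCES.items():
                for (in_p, out_p), table in links.items():
                    if in_p != current_p:
                        continue
                    for hit in table.get(value, []):
                        results[out_p].add((hit, source))  # keep provenance
                        if (out_p, hit) not in seen:
                            seen.add((out_p, hit))
                            next_frontier.append((out_p, hit))
        frontier = next_frontier
    return {p: sorted(hits) for p, hits in results.items()}


print(expand("Disease", "hypothyroidism"))
```

Keeping the source next to each hit is what allows the aggregated view to show where a linked drug or trial came from, and `max_depth` caps how far the expansion wanders.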
Application User Interface.
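How it works. Example SPARQL query retrieving DrugBank drug interactions that involve liothyronine (prefix declarations for drugbank: and rdfs: omitted):
SELECT DISTINCT ?d1 ?l1 ?i
WHERE {
  ?i a drugbank:drug_interactions .
  ?i drugbank:interactiondrug1 ?d1 .
  ?i drugbank:interactiondrug2 ?d2 .
  ?d1 rdfs:label ?l1 .
  ?d2 rdfs:label ?l2 .
  FILTER (?l1 = "liothyronine" || ?l2 = "liothyronine")
}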
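To show what a query engine does with this query, here is a small Python sketch of basic graph pattern matching: each triple pattern extends the variable bindings from the previous one, the FILTER is applied to the complete bindings, and DISTINCT is a set over the selected variables. The triples are hypothetical (`a` is written out as `rdf:type`), and this illustrates SPARQL semantics, not the engine the application used.

```python
# Hypothetical data; the keyword `a` in SPARQL is shorthand for rdf:type.
TRIPLES = {
    ("ex:i1", "rdf:type", "drugbank:drug_interactions"),
    ("ex:i1", "drugbank:interactiondrug1", "ex:drugA"),
    ("ex:i1", "drugbank:interactiondrug2", "ex:drugB"),
    ("ex:drugA", "rdfs:label", "drug-a"),
    ("ex:drugB", "rdfs:label", "liothyronine"),
}

PATTERN = [
    ("?i", "rdf:type", "drugbank:drug_interactions"),
    ("?i", "drugbank:interactiondrug1", "?d1"),
    ("?i", "drugbank:interactiondrug2", "?d2"),
    ("?d1", "rdfs:label", "?l1"),
    ("?d2", "rdfs:label", "?l2"),
]


def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p_term, value in zip(pattern, triple):
        if p_term.startswith("?"):  # variable: bind it or check the existing binding
            if new.setdefault(p_term, value) != value:
                return None
        elif p_term != value:  # constant that does not match
            return None
    return new


def evaluate(patterns, triples):
    """Join the triple patterns left to right, like a basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = unify(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


def query(label):
    rows = [r for r in evaluate(PATTERN, TRIPLES)
            if r["?l1"] == label or r["?l2"] == label]            # FILTER (... || ...)
    return sorted({(r["?d1"], r["?l1"], r["?i"]) for r in rows})  # SELECT DISTINCT


print(query("liothyronine"))  # [('ex:drugA', 'drug-a', 'ex:i1')]
```

Because the query projects only ?d1 and ?l1, an interaction in which liothyronine is itself drug 1 returns liothyronine rather than its partner; projecting ?d2 and ?l2 as well would make both sides visible.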
Evaluation. An evaluation workshop was held with medical partners. Query expansion on trials returned results from 2 data sources, and most results were judged relevant. Expansion on diseases and drug targets returned results from up to 3 data sources, which were also judged relevant. The aggregation of linked drugs and trials was found very helpful. Response times: queries to DrugBank and LinkedCT took 5 to 10 seconds; expansions took about as long as single-source searches; retrieving linked instances is usually very fast.
Questions