Genomic CDS: an example of a complex ontology for pharmacogenetics and clinical decision support



Similar documents
Integration of genomic data into electronic health records

A Pilot Study of the Incidence of Exposure to Drugs for which Preemptive Pharmacogenomic Testing Is Available

The Future of the Electronic Health Record. Gerry Higgins, Ph.D., Johns Hopkins

How To Use Networked Ontology In E Health

Logical and categorical methods in data transformation (TransLoCaTe)

Ontology and automatic code generation on modeling and simulation

A Primer of Genome Science THIRD

Semantic Stored Procedures Programming Environment and performance analysis

Graph Database Performance: An Oracle Perspective

Amazon EC2 XenApp Scalability Analysis

INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE E15

Cloud-Based Big Data Analytics in Bioinformatics

Implementation of Pharmacogenomics in Clinical Practice: Barriers and Potential Solutions

The Semantic Web Rule Language. Martin O Connor Stanford Center for Biomedical Informatics Research, Stanford University

Integration of Genetic and Familial Data into. Electronic Medical Records and Healthcare Processes

FREE computing using Amazon EC2

Semantic Knowledge Management System. Paripati Lohith Kumar. School of Information Technology

What is Pharmacogenomics? Personalization of Medications for You! Michigan State Medical Assistants Conference May 6, 2006

Multilingual and Localization Support for Ontologies

Web-Based Genomic Information Integration with Gene Ontology

Biomedical Big Data and Precision Medicine

PaRFR : Parallel Random Forest Regression on Hadoop for Multivariate Quantitative Trait Loci Mapping. Version 1.0, Oct 2012

An Ontology-based e-learning System for Network Security

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

Regulatory Issues in Genetic Testing and Targeted Drug Development

Introduction to genetic testing and pharmacogenomics

Clinical Implementation of Psychiatric Pharmacogenomic Testing

Secure Semantic Web Service Using SAML

Enhancing Functionality of EHRs for Genomic Research, Including E- Phenotying, Integrating Genomic Data, Transportable CDS, Privacy Threats

treatments) worked by killing cancerous cells using chemo or radiotherapy. While these techniques can

Integrating Bioinformatics, Medical Sciences and Drug Discovery

School of Nursing. Presented by Yvette Conley, PhD

Completing Description Logic Knowledge Bases using Formal Concept Analysis

SNOMED-CT

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies

CCR Biology - Chapter 7 Practice Test - Summer 2012

The Human Genome Project

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Application of ontologies for the integration of network monitoring platforms

SNP Data Integration and Analysis for Drug- Response Biomarker Discovery

Message Classification user guide

What s New in Pathway Studio Web 11.1

FDA - Adverse Event Reporting System (FAERS)

How To Write A Drupal Rdf Plugin For A Site Administrator To Write An Html Oracle Website In A Blog Post In A Flashdrupal.Org Blog Post

OWL Ontology Translation for the Semantic Web

Global Alliance. Ewan Birney Associate Director EMBL-EBI

Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects

An Efficient and Scalable Management of Ontology

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers

Annotea and Semantic Web Supported Collaboration

Genomics and the EHR. Mark Hoffman, Ph.D. Vice President Research Solutions Cerner Corporation

SNPbrowser Software v3.5

GenBank, Entrez, & FASTA

Optimizing Description Logic Subsumption

Understanding and Supporting Intersubjective Meaning Making in Socio-Technical Systems: A Cognitive Psychology Perspective

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany

Ontology-Based Discovery of Workflow Activity Patterns

> Semantic Web Use Cases and Case Studies

Statistical Analysis for Genetic Epidemiology (S.A.G.E.) Version 6.2 Graphical User Interface (GUI) Manual

14.3 Studying the Human Genome

escience and Post-Genome Biomedical Research

Operations Guide for the System Center Cloud Services Process Pack

Outcome Data, Links to Electronic Medical Records. Dan Roden Vanderbilt University

Core Facility Genomics

DLDB: Extending Relational Databases to Support Semantic Web Queries

Classifying ELH Ontologies in SQL Databases

I. INTRODUCTION NOESIS ONTOLOGIES SEMANTICS AND ANNOTATION

JOURNAL OF COMPUTER SCIENCE AND ENGINEERING

The Protégé OWL Plugin: An Open Development Environment for Semantic Web Applications

Ampersand and the Semantic Web

Reduced Risk & Improved Efficacy through Integrated Pharmacogenetics

Privacy and Security Policies for Healthcare Solutions on the Cloud

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

Reputation Network Analysis for Filtering

Semantic Lifting of Unstructured Data Based on NLP Inference of Annotations 1

Transcription:

Genomic CDS: an example of a complex ontology for pharmacogenetics and clinical decision support Matthias Samwald 1 1 Medical University of Vienna, Vienna, Austria matthias.samwald@meduniwien.ac.at Abstract. Individual genetic data can be used to better predict the efficacy and safety of medications for individual patients. The Genomic Clinical Decision Support (Genomic CDS) ontology aims to utilize advanced Web Ontology Language 2 (OWL 2) reasoning for this task. The important, clear-cut medical use case, the complex axioms in the ontology and the heavy use of qualified cardinality restrictions make the ontology an interesting test object for new OWL 2 reasoners with improved performance. Keywords: OWL, pharmacogenetics, clinical decision support 1 Motivation Different patients can react drastically different to the same type of medication (Fig. 1). The goal of personalized medicine and pharmacogenetics is to predict an individual patient s response by analyzing genetic markers that influence how medications are metabolized or able to bind to their targets. Fig. 1. The efficacy and safety of medications can drastically vary between patients. The goal of pharmacogenetics is to classify patients into subgroup based on genetic markers, to better predict which treatments could help and which could do harm. To produce clinically valid and trustworthy predictions, no errors or ambiguities should arise in the process of inferring a patient s likely response from raw genetic

data. Current formalisms, data infrastructures and software applications leave many opportunities for introducing such errors and ambiguities. Ontologies formalized with the Web Ontology Language 2 (OWL 2) could be an excellent choice for tackling this problem, but the complexity and potentially large scale of ontologies in this domain also pose formidable challenges to currently available OWL 2 reasoners. 2 The Genomic CDS ontology The Genomic Clinical Decision Support (Genomic CDS) ontology is an OWL 2 ontology aimed at representing pharmacogenetic knowledge and providing clinical decision support based on pharmacogenetic data. It is being developed by members of the Clinical Pharmacogenomics Task Force, which is part of the Health Care and Life Science Interest Group of the World Wide Web Consortium (W3C). The OWL files of the ontology, as well as demo files containing example patient data can be downloaded from http://www.genomic-cds.org/ont/snapshot-june-2013 We also created a simplified version of the Genomic CDS ontology, called Genomic CDS light, which does not contain some of the axioms of the full ontology. Both versions of the ontology have ALCQ expressivity. They are characterized by extensive use of qualified cardinality restrictions. The goals of developing the ontology are: Providing a simple and concise formalism for representing pharmacogenetic knowledge, Finding errors and lacking definitions in pharmacogenetic knowledge bases Automatically assigning alleles and phenotypes to patients Matching patients to clinically appropriate pharmacogenetic guidelines and clinical decision support messages In the most common scenario, genetic patient data in OWL format is combined with the axioms of the Genomic CDS ontology, and an OWL reasoner is used to infer matching pharmacogenetic treatment recommendations. Several inference steps are needed to derive matching treatment recommendations from raw data about genetic markers (Fig. 2). The raw data consists of small variants in the genetic code, which in most cases are so-called single nucleotide polymorphisms (SNPs), such as an A instead of a G. Alleles are variants of a gene that are defined by containing sets of such small variants. Phenotypes are referring to the specific effects that certain small variants and alleles can have on the organism, e.g., how quickly a patient metabolizes a specific drug. Clinical guidelines can use small variants, alleles and/or phenotypes to match patients with treatment recommendations The human genome usually contains two copies of each gene (one from the father, one from the mother), with each copy potentially bearing multiple genetic variants. Because of this, the ontologies rely heavily on qualified cardinality restrictions with

cardinalities of two, which seems to cause performance issues with most current OWL reasoners. Fig. 2. : Through a series of inference steps, matching pharmacogenetic treatment guidelines are inferred from raw genetic patient data. A simplified example of a rule for inferring an allele (CYP2C9*3) and its single nucleotide polymorphisms (SNPs) from a so-called tagging SNP (a SNP that is necessary and sufficient for inferring the presence of the allele) looks like this in Manchester syntax: Class: 'human with CYP2C9*3' EquivalentTo: has some rs1057910_c SubClassOf: has some 'CYP2C9 *3', (has some rs1057910_c) and (has some rs1057911_a) and (has some rs1799853_c) and (has some rs2256871_a) and (has some rs72558188_agaaatggaa)

An example of an axiom for inferring an adequate clinical decision support message for the anticoagulant drug warfarin (based on a combination of alleles and SNPs according to an official recommendation in the drug label): Class: 'human triggering CDS rule 7' EquivalentTo: (has some 'CYP2C9*1') and (has some 'CYP2C9*3') and (has exactly 2 rs9923231_c) Annotations: label "human triggering CDS rule 7", CDS_message "3-4 mg warfarin per day should be considered as a starting dose range for a patient with this genotype according to the Warfarin drug label (Bristol-Myers Squibb)." We used two OWL 2 reasoners with our ontology: TrOWL 1 [1] and HermiT 2 [2]. We also evaluated other OWL 2 reasoners (Fact++ 3, Pellet 4 ) in early stages of the project, but excluded them from further tests because they did not terminate or crashed even with small, preliminary versions of the ontology we developed. We compared the performance of the two reasoners on a virtual machine running on the Amazon Elastic Cloud Computing (EC2) cloud 5. The machine was of the High-Memory Extra Large Instance type, running Microsoft Windows Server 2008, with 17.1 GB of memory, a 64-bit platform, and two virtual cores with 3.25 EC2 compute units each. The reasoners were run as plugins in the 64 bit version of the Protégé 4.2 ontology editor. The initial heap size for Protégé was 10 10 bytes (10 GB), and the maximum allowed heap size was 1.5x10 10 bytes (15 GB). The TrOWL reasoner plugins with version 0.6 and 1.1 were each run three times for each ontology, and the mean of the time needed for classification was calculated. The HermiT 1.3.8 plugin was run once for each version of the ontology. These preliminary tests showed TrOWL to be significantly more performant than HermiT for classifying the ontologies ( Table 1). However, HermiT was able to identify biologically meaningful inconsistencies present in genomic-cds-demo.owl (but not present in the light version of the ontology). TrOWL did not recognize these inconsistencies, most likely because it only partially covers the OWL 2 DL ruleset. These results show that only TrOWL is performant enough to be used in realistic settings (e.g. for clinical decision support), but that HermiT could serve to test and validate the results from TrOWL during develop- 1 2 3 4 5 http://trowl.eu http://www.hermit-reasoner.com/ http://code.google.com/p/factplusplus/ http://clarkparsia.com/pellet/ http://aws.amazon.com/en/ec2/

ment (possibly comparing the results of the two reasoners for smaller ontology fragments). Table 1. Reasoning performance: TrOWL is significantly more performant than HermiT in classifying our demo ontology (OWL 2 DL with ALCQ expressivity) HermiT 1.3.8 TrOWL 1.1 TrOWL 0.6 genomic-cds-light-demo.owl (2150 classes, 9500 axioms) genomic-cds-demo.owl (2300 classes, 11000 axioms) 3 hours 48 minutes detected inconsistencies 1.5 seconds 18 seconds 5.8 seconds 54 seconds 3 Conclusions and outlook The Genomic CDS ontology is an example of an OWL 2 ontology for clinical genetics and decision support. Even though it is focused on a relatively small set of the most important pharmacogenetic markers, the ontology poses a significant challenge to currently available OWL 2 reasoners. There is great need for reasoners that are optimized for the kinds of OWL axioms encountered in ontologies dealing with clinical genomics. 4 Acknowledgements The research leading to these results has received funding from the Austrian Science Fund (FWF): [PP 25608-N15]. References 1. Thomas, E., Pan, J.Z., Ren, Y.: TrOWL: Tractable OWL 2 Reasoning Infrastructure. the Proc. of the Extended Semantic Web Conference (ESWC2010) (2010). 2. Motik, B., Shearer, R., Horrocks, I.: Hypertableau Reasoning for Description Logics. J. Artif. Intell. Res. 36, 165 228 (2009).