A Comparison of BEL V1.0 and BioPAX Level 3




Selventa Inc. and Robert Powers, Predictive Medicine Inc.
Joanne Luciano, President, Predictive Medicine Inc.

Selventa™, One Alewife Center, Cambridge, MA 02140
617.547.5421
www.selventa.com


Table of Contents

Introduction
Examples Used in this Document
Distinct Approaches to Curation
Control Description vs. Causal Relation
Common Ground: Enzymatic Catalysis of Reactions
Causality at Multiple Levels of Abstraction
Uncommon Ground: Handling of Contextual Information
Use of Standards and External Vocabularies
Summary
Appendix A: Additional Comparison Examples
    Example 1: Inactivation of FOXO1 by AKT1
    Example 2: Activation of FOXO1 by PTEN
Additional Information
    Obtaining Technical Support
    Email Support
    Phone Support
    Learning More About Selventa's Software and Services


Introduction

The Biological Expression Language (BEL) and the Biological Pathway Exchange language (BioPAX) [1] are both open standards used to capture knowledge about molecular biology and biological processes. While their capabilities overlap considerably, they were designed with different goals and for different uses. BioPAX focuses on enabling integration, exchange, visualization, and analysis of biological pathway data. BEL, in contrast, is designed to represent discrete scientific findings and their relevant contextual information as qualitative causal relationships that can drive knowledge-based analytics. The purpose of this document is to compare and contrast the capabilities of BEL with those of BioPAX Level 3.

The focus of BioPAX on pathway construction leads to its rich vocabulary of pathway component classes that can be instantiated and interconnected to create a wide variety of models. This is consistent with the BioPAX mission as a format for data exchange between pathway data groups. In contrast, BEL is intended as an intuitive language of discourse for biologists, in the spirit of chemical reaction notation, allowing scientists to capture scientific findings as concise, discrete statements in a use-neutral form.

A second fundamental difference is BEL's focus on the representation of qualitative causal relationships, capturing statements of cause and effect that enable biological inference by applications. BEL's design enables the representation of causal relationships across a wide range of mechanistic detail and between the levels of molecular event, cellular process, and organism-scale phenotype. In contrast, BioPAX provides a vocabulary to express control at a precise, biochemical level of description, facilitating the communication of detailed pathway knowledge.

A third distinction is in the treatment of contextual information. BioPAX enables pathways and pathway elements to be documented with specific types of biological and provenance annotations. BEL provides a system of user-defined annotations on individual statements. These annotations capture information that facilitates the selection of knowledge for inclusion in models and that enriches the supporting evidence for relationships within a model.

BEL and BioPAX can both work with standard vocabularies and ontologies, but with different approaches. Objects in a BioPAX model can be optionally linked to vocabularies via a system of external references. In contrast, all BEL terms are functionally composed based on values from external vocabularies. Because BEL and BioPAX can reference the same vocabularies, it is possible to transform knowledge from one language to the other.

Examples Used in this Document

The version of BioPAX referred to in this document is BioPAX Level 3, released July 2010. All BioPAX examples are presented in N3 RDF syntax. The BEL examples and definitions in this document use a draft version of the language and syntax [2]. For brevity and clarity, all BEL examples and all BioPAX examples implicitly use the following header information:

[1] Demir E, et al. (2010) The BioPAX community standard for pathway data sharing. Nat Biotechnology 28(9), p. 935-42.
[2] Selventa, BEL V1.0 Language Overview, Cambridge, MA 02140.

    SET DOCUMENT name = "Examples Document"
    SET DOCUMENT version = "1.0"
    DEFINE NAMESPACE HGNC AS URL "http://www.belscripts.org/ns/hgnc"
    DEFINE NAMESPACE CHEBI AS URL "http://www.belscripts.org/ns/chebi"
    DEFINE NAMESPACE NCI AS URL "http://www.belscripts.org/ns/nci"
    DEFINE ANNOTATION Tissue AS URL "http://www.belscripts.org/anno/tissue"
    DEFINE ANNOTATION Cell AS URL "http://www.belscripts.org/anno/cell"

Figure 1(a) BEL Document Header

    @base <http://www.example.com#> .
    @prefix ex: <http://www.example.com#> .
    @prefix bpax: <http://www.biopax.org/release/biopax-level3.owl#> .

Figure 1(b) BioPAX Level 3 Header

Distinct Approaches to Curation

BEL and BioPAX embody different approaches to the curation of biological knowledge. Curation in BioPAX is achieved by specifying the components of pathways and how they interconnect into networks of biological activity, expressed as RDF triples. BioPAX is extremely flexible in specifying the components of pathways, allowing wide latitude to express what constitutes a pathway and how pathways might interact. This latitude in construction is complemented by a precise system of entities and relationships with well-defined biological meaning.

Curation in BEL is achieved by recording discrete BEL Statements in BEL Documents. While these BEL Documents and BEL Statements can be used as a resource for the assembly of biological models, the intent is for the BEL Documents to be use-neutral and to exist simply as a record of curated facts and as a means for publishing and sharing knowledge that could be used to build models. BEL's mission to capture scientific findings leads to a strategy in which knowledge is expressed as concise, discrete statements with annotations capturing contextual information.

Control Description vs. Causal Relation

BEL is designed to represent qualitative causal relationships, such as cause-and-effect relationships reported as scientific findings. Scientific findings are primarily expressed using class-level relationships such as increases and decreases. These simple relationships are powerful because all terms in BEL statements represent classes of abundances or classes of processes, allowing any two terms to potentially be linked in a causal statement. The semantics of a causal relationship in a BEL statement are simple: a change in one quantity can cause a change in another quantity in the context of the statement.

BioPAX expresses control of Interactions and Pathways via instances of Control and instances of its subclasses such as Catalysis, Modulation, and TemplateReactionRegulation. These structured entities are appropriate for the expression of a broad range of molecular mechanisms.
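The causal semantics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of BEL or the BEL Framework: the names CausalStatement, RELATION_SIGN, and possible_object_change are hypothetical, and treating a decrease in the subject as the mirror image of an increase is a simplifying assumption made for the sketch. The statement used is the corticosteroid example from this document.

```python
from dataclasses import dataclass

# Sign of each causal relationship named in this document.
# The mapping itself is an illustrative assumption, not a BEL API.
RELATION_SIGN = {
    "increases": +1,
    "directlyIncreases": +1,
    "decreases": -1,
    "directlyDecreases": -1,
}


@dataclass(frozen=True)
class CausalStatement:
    """Hypothetical container for one BEL-style statement."""
    subject: str   # a term expression, e.g. a(CHEBI:corticosteroid)
    relation: str  # one of the keys of RELATION_SIGN
    obj: str       # a term expression, e.g. bp(NCI:"Tissue Damage")


def possible_object_change(statement: CausalStatement, subject_change: int) -> int:
    """Return +1 or -1: the direction of change the object *can* show.

    subject_change is +1 for an increase in the subject and -1 for a decrease.
    Mirroring the effect for a decrease is an assumption of this sketch.
    """
    if subject_change not in (1, -1):
        raise ValueError("subject_change must be +1 or -1")
    return RELATION_SIGN[statement.relation] * subject_change


example = CausalStatement(
    subject="a(CHEBI:corticosteroid)",
    relation="decreases",
    obj='bp(NCI:"Tissue Damage")',
)
print(possible_object_change(example, +1))  # prints -1
```

Because every statement reduces to a subject, a signed relationship, and an object, any two terms can be linked in the same way regardless of whether they denote abundances or processes.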

BEL and BioPAX are both designed to avoid the ambiguity that can occur in simplistic causal flow networks in which activation and inhibition arrows connect nodes representing physical entities. In such networks, the meaning of the causal influence is unclear and is typically interpreted from context by a human reader. BioPAX best practices strongly emphasize that the targets of control processes should not be physical entities and instead should be processes that involve physical entities. The resulting BioPAX models are thus more precise but less compact than causal flow networks, including more objects and greater detail.

In contrast, BEL approaches the problem of ambiguity in causal flow by using a structured system of functionally composed entity types to represent modifications, combinations, and activities of entities. The resulting vocabulary enables BEL-derived networks to express specific meanings using a small set of causal relationships. The simplicity of these relationships facilitates causal reasoning in BEL-derived network models, where chains of causal connections can be followed forward to identify potential downstream effects or backward to identify potential upstream controllers.

Common Ground: Enzymatic Catalysis of Reactions

The following example demonstrates the representation of the catalysis of a specific enzymatic reaction in both BioPAX and BEL. Figure 2(a) below illustrates the representation of this relationship as a single BEL Statement in which the subject is an expression representing a class of molecular activity. The object is an expression representing a class of reaction, and the relationship directlyIncreases, abbreviated as =>, describes the form of the interaction. The directlyIncreases relationship is distinguished from increases in that it indicates that the effect of the subject on the object is mediated by direct physical interaction.
The BEL statement is interpreted as "increases in the catalytic activity of the abundance of the protein ALOX5 can directly increase the reaction process in which the abundance of 5(S)-HPETE molecules is the reactant and abundances of leukotriene A4 and water molecules are products." When BEL is paraphrased, it would be more precise to include references to namespaces, as in "the molecular abundance designated by 5(S)-HPETE in the CHEBI namespace," but these are omitted in the interest of clarity.

    cat(p(HGNC:ALOX5)) => reaction(
        reactants(a(CHEBI:"5(S)-HPETE")),
        products(a(CHEBI:"leukotriene A4"), a(CHEBI:"water")))

Figure 2(a) BEL Representation of a Catalytic Reaction

Figure 2(b) illustrates the corresponding reaction in BioPAX format.

    :palox5 a bpax:Protein ;
        bpax:displayName "ALOX5" ;
        bpax:entityReference :pralox5 .
    :pralox5 a bpax:ProteinReference ;
        bpax:xref :xralox5 .
    :xralox5 a bpax:UnificationXref ;
        bpax:db "HGNC" ;
        bpax:id "HGNC:435" .

    # Arachidonic acid 5-hydroperoxide
    :marachidonic a bpax:SmallMolecule ;
        bpax:displayName "Arachidonic acid" ;
        bpax:entityReference :mrarachidonic .
    :mrarachidonic a bpax:SmallMoleculeReference ;
        bpax:xref :xrarachidonic .
    :xrarachidonic a bpax:UnificationXref ;
        bpax:db "CHEBI" ;
        bpax:id "CHEBI:15632" .

    :mleukotriene a bpax:SmallMolecule ;
        bpax:displayName "Leukotriene A4" ;
        bpax:entityReference :mrleukotriene .
    :mrleukotriene a bpax:SmallMoleculeReference ;
        bpax:xref :xrleukotriene .
    :xrleukotriene a bpax:UnificationXref ;
        bpax:db "CHEBI" ;
        bpax:id "CHEBI:15651" .

    :mwater a bpax:SmallMolecule ;
        bpax:displayName "Water" ;
        bpax:entityReference :mrwater .
    :mrwater a bpax:SmallMoleculeReference ;
        bpax:xref :xrwater .
    :xrwater a bpax:UnificationXref ;
        bpax:db "CHEBI" ;
        bpax:id "CHEBI:15377" .

    :reaction a bpax:BiochemicalReaction ;
        bpax:conversionDirection "LEFT-TO-RIGHT" ;
        bpax:left :marachidonic ;
        bpax:right :mleukotriene, :mwater .

    :catalysis a bpax:Catalysis ;
        bpax:controlType "ACTIVATION" ;
        bpax:controller :palox5 ;
        bpax:controlled :reaction .

Figure 2(b) BioPAX Representation of a Catalytic Reaction

In this example the reaction is expressed as a BiochemicalReaction, and the catalytic activity becomes a Catalysis which controls the BiochemicalReaction and in which the protein ALOX5 is the controller.

Causality at Multiple Levels of Abstraction

Scientific findings in molecular biology and the life sciences vary widely in the granularity and completeness of the information reported. BEL's approach enables the representation of causal relationships across a wide range of mechanistic detail and between the levels of molecular event, cellular process, and organism-scale phenotype.

The following example illustrates a case in which a class of chemicals can affect a tissue-level biological process. In BEL, the concept "corticosteroids inhibit tissue damage" fits in the same qualitative causal paradigm as the previous example of enzymatic catalysis. In BioPAX, tissue damage does not fit the constraints on the targets of Control processes, which are required to be either Interactions or Pathways.

Figure 3(a) illustrates the representation of this concept in BEL as a decreases (-|) relationship between a class of molecular abundance and a class of biological process. The example can be interpreted as "an increase in the abundance of corticosteroid molecules can cause a decrease in the frequency/intensity of the biological process of tissue damage."

    a(CHEBI:corticosteroid) -| bp(NCI:"Tissue Damage")

Figure 3(a) BEL Representation of an Inhibition

This statement causally links the abundance of a specific type of molecule to a tissue-level or organism-level phenomenon, but is silent about what mechanisms would contribute to it. BEL can be said to take a phenomenological approach to knowledge representation, describing empirically demonstrated relationships between types of abundance or process that are directly or indirectly measurable.

Figure 3(b) illustrates the corresponding expression of this concept in BioPAX. This relationship can be encoded by instantiating a Control with the type INHIBITION, a SmallMolecule to represent corticosteroid, and an Interaction to represent tissue damage.

    :mcorticosteroid a bpax:SmallMolecule ;
        bpax:displayName "Corticosteroid" ;
        bpax:entityReference :mrcorticosteroid .
    :mrcorticosteroid a bpax:SmallMoleculeReference ;
        bpax:xref :xrcorticosteroid .
    :xrcorticosteroid a bpax:UnificationXref ;
        bpax:db "CHEBI" ;
        bpax:id "CHEBI:16827" .

    :bptissuedamage a bpax:Interaction ;
        bpax:displayName "Tissue Damage" ;
        bpax:entityReference :mrtissuedamage .
    :mrtissuedamage a bpax:SmallMoleculeReference ;
        bpax:xref :xrtissuedamage .
    :xrtissuedamage a bpax:UnificationXref ;
        bpax:db "NCIt" ;
        bpax:id "NCIt:C50773" .

    :control a bpax:Control ;
        bpax:controlType "INHIBITION" ;
        bpax:controller :mcorticosteroid ;
        bpax:controlled :bptissuedamage .

Figure 3(b) BioPAX Representation of an Inhibition

One issue with this approach is that the representation of tissue damage as an Interaction is inconsistent with its definition: "A biological relationship between two or more entities. An interaction is defined by the entities." Neither Interaction nor its subclasses Catalysis, Modulation, and TemplateReactionRegulation are appropriate types for a complex, tissue-level phenomenon like tissue damage.
The essential concept of generic control or qualitative causality is present in both languages, but the vocabulary of BioPAX is primarily appropriate for processes at the molecular level.

Uncommon Ground: Handling of Contextual Information

The two languages vary in their treatment of contextual information. BioPAX provides specific annotations for pathways and pathway elements, while BEL provides an extensible system of annotations at the statement level. The latter provides more flexibility in capturing information that facilitates the selection of knowledge used in model construction and that enriches the supporting evidence for relationships within a model.

BioPAX pathways and pathway elements encode context by the assignment of species, cellular compartment, supporting references, and a BioSource. A BioSource object, for example, encapsulates biological context such as cell and tissue types. This information documents the BioPAX model, allowing indexing and enabling scientists to identify the supporting references.

BEL provides a system in which annotation types and their allowed values can be defined, documented, and then used in BEL Documents. Each BEL Statement in a BEL Document may be annotated with multiple annotation types. Commonly used annotation types include the supporting citation for a statement, evidence text from the citation, tissue, cell type, cellular location, species, and disease. Additional annotation types such as dosage, exposure time, recovery time, and other experimental parameters can be easily defined and used as needed. The content of BEL-derived models can be controlled by using this system of statement annotations to select relevant knowledge, for example by limiting the included statements based on species or cell type. BEL's system of annotations supports representation of contextual information about each scientific finding, such as the biological system in which the finding was demonstrated, the experimental methodology employed, the reference, and the curation process.

Figure 4(a) illustrates a BEL Statement that encodes the direct transcriptional activation of RBL2 by the FOXO1 transcription factor with a cell annotation of fibroblast and a tissue annotation of lung, indicating that the relationship has been demonstrated in the context of lung fibroblasts.
    SET Cell = "Fibroblast"
    SET Tissue = "lung"
    tscript(p(HGNC:FOXO1)) => r(HGNC:RBL2)

Figure 4(a) BEL Encoding of a Direct Transcriptional Activation of RBL2

While the scope of the BEL annotations to a statement or group of statements does not have an exact parallel in BioPAX, Figure 4(b) illustrates an approach in which a BioPAX pathway is instantiated to contain the transcriptional activation of RBL2 that is in turn controlled by a complex containing FOXO1.

    :pfoxo1 a bpax:Protein ;
        bpax:displayName "FOXO1" ;
        bpax:entityReference :prfoxo1 .
    :prfoxo1 a bpax:ProteinReference ;
        bpax:xref :xrfoxo1 .
    :xrfoxo1 a bpax:UnificationXref ;
        bpax:db "HGNC" ;
        bpax:id "HGNC:3819" .

    :prbl2 a bpax:Protein ;
        bpax:displayName "RBL2" ;
        bpax:entityReference :prrbl2 .
    :prrbl2 a bpax:ProteinReference ;
        bpax:xref :xrrbl2 .
    :xrrbl2 a bpax:UnificationXref ;
        bpax:db "HGNC" ;
        bpax:id "HGNC:9894" .

    :complex a bpax:Complex ;
        bpax:component :pfoxo1 .

    :trregulation a bpax:TemplateReactionRegulation ;
        bpax:controlType "ACTIVATION" ;
        bpax:controller :complex ;
        bpax:controlled :templatereaction0 .

    :templatereaction0 a bpax:TemplateReaction ;
        bpax:product :rna .
    :rna a bpax:Rna ;
        bpax:displayName "RBL2" ;
        bpax:entityReference :rrrbl2 .
    :rrrbl2 a bpax:RnaReference ;
        bpax:xref :xrrbl2 .

    :templatereaction1 a bpax:TemplateReaction ;
        bpax:template :rna ;
        bpax:product :prbl2 .

    :pathway a bpax:Pathway ;
        bpax:pathwayComponent :trregulation ;
        bpax:organism :biosource .
    :biosource a bpax:BioSource ;
        bpax:cellType :cellvocabulary ;
        bpax:tissue :tissuevocabulary .
    :cellvocabulary a bpax:CellVocabulary ;
        bpax:xref :xrfibroblast .
    :xrfibroblast a bpax:UnificationXref ;
        bpax:db "NCIt" ;
        bpax:id "NCIt:C12482" .
    :tissuevocabulary a bpax:TissueVocabulary ;
        bpax:xref :xrlung .
    :xrlung a bpax:UnificationXref ;
        bpax:db "NCIt" ;
        bpax:id "NCIt:C12468" .

Figure 4(b) BioPAX Encoding of a Direct Transcriptional Activation of RBL2

In this example, the pathway is annotated via an organism relationship to a BioSource that has cell and tissue assignments, backed with Xrefs to the appropriate external vocabularies.

Use of Standards and External Vocabularies

Both BioPAX and BEL are designed to integrate with external vocabularies. BioPAX starts from the capability to represent interactions and pathways and then provides the means to link those objects to external references. BEL starts from the premise that all terms must be defined based on one or more external references and that equivalence of terms defined by different namespaces depends on the choice of mappings between namespaces.

BEL and BioPAX both support the use of standard vocabularies and ontologies to aid in the consistent representation of biological knowledge. BioPAX enables the modeler to annotate components with references to values in external standard vocabulary sources (Xrefs), whereas BEL uses terms that are defined by values in BEL namespaces, structures which specify allowed identifiers by enumeration. BEL namespaces may be based on standard vocabularies and ontologies provided by the BEL Framework, or they can be created by users and organizations to reflect a working vocabulary specific to their needs.

In BEL, all terms are specified by functional expressions where the inputs to the functions are either BEL namespace values, special forms such as protein modification indicators, or other term expressions. Term functions determine the type of term that an expression specifies, where the possible types comprise an intrinsic BEL ontology. A given expression in BEL is therefore dependent on the namespaces employed but can be a valid expression within the language regardless of the choice of namespaces. Equivalence between terms composed using different identifiers can be explicitly managed within the BEL Framework. The support for the use and reconciliation of different namespaces facilitates dynamic, programmatic integration of findings encoded in BEL by diverse authorities.

In contrast, BioPAX provides an extensive ontology of model components and representational mechanisms to link instances of those components to values in external vocabularies and ontologies. As seen in the examples from previous sections, biological objects in external databases, such as proteins and small molecules, can be referenced by pathway components using instances of the Xref class hierarchy.
External controlled vocabulary terms, such as those defined by the Gene Ontology Consortium or the PSI-MI initiative, are referenced by pathway components using instances of the ControlledVocabulary class hierarchy. These links enable data integration and provide definition to the elements of BioPAX models.

Summary

BEL and BioPAX are languages that embody alternative, complementary approaches to the representation of biological knowledge. While they are both committed to the sharing of represented knowledge and linkage to public vocabularies, they are best suited to the capture of different kinds of relationships and reflect different processes of curation and use.

BEL enables scientific findings to be expressed as qualitative causal or correlative relationships between entities that are modeled as processes and abundances. The language supports the collation and use of reusable facts to dynamically assemble models, ranging from large models that can be considered as causal network knowledgebases to small models that express pathways. Many choices in the design of BEL were driven by the goal of powering knowledge-driven applications that reason about causation in biology.

In contrast, BioPAX is designed to be a data exchange format that can help make data collection and integration easier across the wide variety of existing pathway resources. BioPAX enables precise and detailed descriptions of molecular mechanisms for clear communication. The capabilities that distinguish BEL from BioPAX enable it to address different and important challenges in the life sciences.
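As a rough illustration of how reusable, annotated facts might be collated into a model, the following Python sketch selects statements by their annotations, in the spirit of limiting a model to a given tissue or cell type. It is a minimal sketch under stated assumptions: AnnotatedStatement and select are hypothetical names rather than BEL Framework functions, and the "liver" annotation on the second statement is invented purely so that the filter has something to exclude.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedStatement:
    """Hypothetical pairing of a BEL statement with its statement-level annotations."""
    statement: str
    annotations: dict = field(default_factory=dict)


def select(statements, **required):
    """Keep only statements whose annotations match every required key/value pair."""
    return [
        s for s in statements
        if all(s.annotations.get(key) == value for key, value in required.items())
    ]


knowledge = [
    # Statement and annotations taken from the lung fibroblast example in this document.
    AnnotatedStatement(
        "tscript(p(HGNC:FOXO1)) => r(HGNC:RBL2)",
        {"Cell": "Fibroblast", "Tissue": "lung"},
    ),
    # Statement from Appendix A; the Tissue value here is invented for illustration.
    AnnotatedStatement(
        "kin(p(HGNC:AKT1)) => p(HGNC:FOXO1, pmod(P))",
        {"Tissue": "liver"},
    ),
]

lung_model = select(knowledge, Tissue="lung")
for item in lung_model:
    print(item.statement)
```

Keeping the annotations separate from the statement text reflects the use-neutral intent described in this document: the same collection of statements can be filtered differently for each model that is assembled from it.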

Appendix A: Additional Comparison Examples

The following examples explore cases of concepts expressed in BEL compared with possible renderings in BioPAX. As shown earlier in this document, for a few cases, such as catalysis of reactions, a straightforward correspondence exists between the structures in the two languages. In most cases, the divergence in the intent of the two languages makes it appropriate to compare representational strategies but not to look for exact correspondence. In these examples, the linkage of BioPAX objects to external vocabularies via Xrefs is omitted for brevity and clarity.

Example 1: Inactivation of FOXO1 by AKT1

AKT1 is a kinase that can phosphorylate the FOXO1 transcription factor, causing it to be excluded from the nucleoplasm and thereby preventing its transcriptional activity. Figure 5(a) illustrates an approach to the representation of this relationship in BioPAX in which the phosphorylation itself is expressed as a BiochemicalReaction and a Catalysis is then instantiated to indicate that AKT1 has an activating role.

    @base <http://www.example.com#> .
    @prefix ex: <http://www.example.com#> .
    @prefix bpax: <http://www.biopax.org/release/biopax-level3.owl#> .

    :pakt1 a bpax:Protein .
    :pfoxo1 a bpax:Protein .
    :pfoxo1p a bpax:Protein .
    :matp a bpax:SmallMolecule .
    :madp a bpax:SmallMolecule .

    :kapakt1 a bpax:Catalysis ;
        bpax:controlType "ACTIVATION" ;
        bpax:controller :pakt1 ;
        bpax:controlled _:phosphorylation .

    _:phosphorylation a bpax:BiochemicalReaction ;
        bpax:conversionDirection "LEFT-TO-RIGHT" ;
        bpax:left :matp, :pfoxo1 ;
        bpax:right :madp, :pfoxo1p .

    _:complex a bpax:Complex ;
        bpax:component :pfoxo1 .

    _:transcriptionalactivity a bpax:TemplateReactionRegulation ;
        bpax:controller _:complex .

Figure 5(a) BioPAX Encoding of the Inactivation of FOXO1 by AKT1

The effect on the transcriptional activity of FOXO1 can be expressed by making the protein representing unphosphorylated FOXO1 a component of a complex that controls a TemplateReactionRegulation representing FOXO1 transcriptional activity. The phosphorylation reaction could be inferred to consume the unphosphorylated species, reducing the amount of the complex and hence the dependent reaction.

Figure 5(b) illustrates a BEL representation of the same example. In this case the phosphorylation is expressed by a direct causal increase relationship between the kinase activity of AKT1 and the abundance of FOXO1 phosphorylated at an unspecified residue, indicating that the effect is mediated by the physical interaction of AKT1 molecules with FOXO1 molecules. The abundance of phosphorylated FOXO1 is in turn related to the transcriptional activity of FOXO1 by a direct causal decrease relationship, completing a chain of causation from the activity of AKT1 to the activity of FOXO1.

    kin(p(HGNC:AKT1)) => p(HGNC:FOXO1, pmod(P))
    p(HGNC:FOXO1, pmod(P)) =| tscript(p(HGNC:FOXO1))

Figure 5(b) BEL Encoding of the Inactivation of FOXO1 by AKT1

The BEL approach omits some of the reaction details, but these could also be captured by using a reaction term expression as the object of the BEL statement. An alternative BEL representation is illustrated in Figure 5(c).

    kin(p(HGNC:AKT1)) =| tscript(p(HGNC:FOXO1))

Figure 5(c) Alternative BEL Encoding of the Inactivation of FOXO1 by AKT1

In this example, the detail of the abundance of phosphorylated FOXO1 is omitted and the two activities are linked directly.

Example 2: Activation of FOXO1 by PTEN

The case where the activity of FOXO1 is influenced by a more distant upstream controller has a similar representation in both BEL and BioPAX. In this example, the activity of the phosphatase PTEN leads to increased FOXO1 activity. PTEN inhibits the activity of AKT1 by an indirect mechanism, and this in turn leads to FOXO1 activation.
Figure 6 illustrates the representation of the empirical relationship between PTEN phosphatase activity and FOXO1 transcriptional activity using the causal relationship increases (abbreviated ->), which does not imply that the relationship is mediated by direct physical interaction:

    phos(p(HGNC:PTEN)) -> tscript(p(HGNC:FOXO1))

Figure 6 BEL Encoding of the Activation of FOXO1 by PTEN

To express this concept in BioPAX, the entire chain of dephosphorylations, phosphorylations, translocations, and other events that mediate the mechanism could be modeled. But this level of detail may not be reported, or known at all, when representing a given scientific finding. The indirect relationship is a valid biological fact that is potentially valuable to record because it can inform subsequent understanding by scientists and inference by algorithms.

Following the strategy used in the example above, representations could be constructed for both the phosphatase activity of PTEN and the transcriptional activity of FOXO1. The difference is that there would be no shared element to link the two, as there was in the case of the FOXO1 protein being both a consumed reactant in a phosphorylation and a member of a controlling complex in a transcriptional event. One strategy to link the two would be to associate each activity with a PathwayStep and contain those in sequence in a Pathway. This might be thought of as implying causality by sequence, but it is much weaker than the causation implicit in an instance of Control.

To satisfy the range constraints of the controlled and controller relationships of a Control, the activities could each be encapsulated in a Pathway, and then one Pathway could be controlled by the other. This seems like an awkward solution and one that would scale badly in a large model with many indirect causal relationships. The concept of the BioPAX Control, as it is currently defined, is being used inappropriately in this strategy, and this case illustrates a fundamental difference in the intention of the two languages.
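The chain of causation discussed in these two examples can be followed mechanically. The Python sketch below composes the signs along a connected chain of statements to obtain the net indirect relationship. It rests on assumptions that should be read as such: the function compose_chain and the tuple layout are hypothetical, and the first statement in the chain is an illustrative encoding of the sentence "PTEN inhibits the activity of AKT1 by an indirect mechanism" rather than a statement given in this document. The second statement is the one shown in Figure 5(c).

```python
# Sign of each causal relationship; an illustrative mapping, not a BEL API.
SIGN = {
    "increases": 1,
    "directlyIncreases": 1,
    "decreases": -1,
    "directlyDecreases": -1,
}


def compose_chain(chain):
    """Collapse a connected chain of (subject, relation, object) tuples.

    Returns one (subject, relation, object) tuple whose relation is the
    indirect 'increases' or 'decreases', since a multi-step effect is not
    assumed to involve direct physical interaction.
    """
    if not chain:
        raise ValueError("chain must contain at least one statement")
    sign = 1
    for index, (subject, relation, _obj) in enumerate(chain):
        # Each statement must start where the previous one ended.
        if index > 0 and chain[index - 1][2] != subject:
            raise ValueError("chain is not connected")
        sign *= SIGN[relation]
    net_relation = "increases" if sign > 0 else "decreases"
    return (chain[0][0], net_relation, chain[-1][2])


chain = [
    # Assumed encoding of "PTEN inhibits the activity of AKT1 by an indirect mechanism".
    ("phos(p(HGNC:PTEN))", "decreases", "kin(p(HGNC:AKT1))"),
    # Figure 5(c).
    ("kin(p(HGNC:AKT1))", "directlyDecreases", "tscript(p(HGNC:FOXO1))"),
]

print(compose_chain(chain))
```

Two successive decreases compose to an increase, which agrees with the relationship recorded in Figure 6. The sketch depends only on the small, signed set of BEL relationships; no counterpart is attempted for BioPAX here, where the intermediate events would have to be modeled explicitly.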

Additional Information

This section provides additional information that might be helpful to you.

Obtaining Technical Support

Technical support is available by phone or email during normal business hours (8am to 5pm EST).

Email Support

Send an email to support@selventa.com. Please make sure to include your customer account number, user name, a phone number where you can be reached, and details about the issue.

Phone Support

Please call Selventa's technical support line at (617) 851-5273 during normal support hours.

Learning More About Selventa's Software and Services

For all sales and other inquiries, please contact:

Louis Latino
EVP Sales and Marketing
One Alewife Center, Cambridge, MA 02140
Phone: (617) 547-5421 x237
Email: llatino@selventa.com