The Preparation of Information in Data Science
The Role of Ontologies in Unlocking Big Data
Big Data holds the potential to reveal great insights from large, diverse data sets if properly exploited with the right analytics. To better realize this potential, a shift needs to occur from representations of individual data sets to representations that enable interoperability across all data sets.
The Common Core Development Method
- Rule-governed development of an extensible set of ontologies to which data from sub-domains can be aligned and linked together.
- Combines principles from the Linked Open Data Initiative, the Open Biological and Biomedical Ontologies (OBO) Foundry, and object-oriented programming.
Linked Open Data Initiative
- Began as a means for integrating data on the World Wide Web.
- Based on a simple set of guiding principles*:
  - Use Uniform Resource Identifiers (URIs) as names of things.
  - Use HTTP URIs so that people can look up those names.
  - When someone looks up a URI, provide useful information.
  - Include links to other URIs so they can discover other things.
*Tim Berners-Lee, Linked Open Data, https://www.w3.org/designissues/linkeddata
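The four principles can be illustrated with a small sketch. It is not an implementation of any Linked Data library: the `example.org` URIs, the vocabulary terms, and the helper names `is_http_uri`, `describe`, and `outgoing_links` are all hypothetical, and an in-memory list stands in for what a web server would return on lookup.

```python
from urllib.parse import urlparse

# Hypothetical data: every URI below is illustrative, not a real identifier.
# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    ("http://example.org/id/alice",
     "http://example.org/vocab/worksFor",
     "http://example.org/id/acme"),
    ("http://example.org/id/alice",
     "http://example.org/vocab/name",
     "Alice"),
    ("http://example.org/id/acme",
     "http://example.org/vocab/locatedIn",
     "http://example.org/id/buffalo"),
]


def is_http_uri(value):
    """Principles 1 and 2: things are named with HTTP(S) URIs."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def describe(uri, triples=TRIPLES):
    """Principle 3: a lookup of `uri` returns useful information about it."""
    return [(p, o) for s, p, o in triples if s == uri]


def outgoing_links(uri, triples=TRIPLES):
    """Principle 4: the description links to other URIs that can be followed."""
    return [o for _, o in describe(uri, triples) if is_http_uri(o)]
```

Calling `outgoing_links("http://example.org/id/alice")` yields the URI for the organization, which can in turn be passed to `describe` to discover further data.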
A Linked Open Data Success Story: DBpedia
- Pages accessed from web browsers that link data from Wikipedia.
Linked Open Data Issue - A Profusion of Ontologies
[Figure: Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/]
Effects of Profusion
- Costs increase relative to the amount of duplicative effort, the number of mappings, and the number of vernaculars.
- Effectiveness decreases: searches have low recall and precision, and re-use creates ambiguities.
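A rough way to see how the number of mappings drives cost is to count them. The sketch below rests on an assumption that is not stated on the slide: that without a shared ontology every pair of vocabularies needs its own mapping, whereas with a shared ontology each vocabulary is mapped once to the common one. The function names are hypothetical.

```python
def pairwise_mappings(n_vocabularies):
    """Mappings needed if every pair of vocabularies is mapped directly."""
    return n_vocabularies * (n_vocabularies - 1) // 2


def hub_mappings(n_vocabularies):
    """Mappings needed if each vocabulary is mapped once to a shared ontology."""
    return n_vocabularies


for n in (5, 10, 50):
    print(n, pairwise_mappings(n), hub_mappings(n))
```

Under this assumption the pairwise count grows quadratically while the shared-ontology count grows linearly, which is one reading of why a profusion of ontologies raises cost.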
OBO Foundry
- The Open Biological and Biomedical Ontologies (OBO) Foundry is a collaborative group of organizations devoted to establishing best practices in ontology development.
- Leverages the lessons learned from over $300M of investment in ontology development.
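An OBO Foundry Best Practice - Use a Common Upper Ontology
[Figure: Entity at the top, with Object and Quality beneath it; Organization and Physical Artifact under Object; Quality of Organization and Quality of Physical Artifact under Quality. Relations shown: bearer_of between Object and Quality, and has_quality from Organization and from Physical Artifact to their respective qualities.]
- Produces common patterns within ontologies.
- Reuse of mappings from the sources.
- Easier to include new sources of data.
- Enables reuse of queries and analytics.
- Structure of data stays constant.
- Easier to transition to new domains of interest.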
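The claim that a common upper ontology enables reuse of queries can be sketched as follows. The class hierarchy is one reading of the figure on this slide and should be treated as an assumption; the instance names (`acme`, `truck_7`, and their qualities) and the helpers `is_a` and `qualities_of` are hypothetical.

```python
# Subclass assertions, as read from the slide's figure (an assumption).
SUBCLASS_OF = {
    "Object": "Entity",
    "Quality": "Entity",
    "Organization": "Object",
    "Physical Artifact": "Object",
    "Quality of Organization": "Quality",
    "Quality of Physical Artifact": "Quality",
}

# Hypothetical instance data, as if drawn from two unrelated sources.
INSTANCE_OF = {
    "acme": "Organization",
    "acme_size": "Quality of Organization",
    "truck_7": "Physical Artifact",
    "truck_7_weight": "Quality of Physical Artifact",
}
HAS_QUALITY = [("acme", "acme_size"), ("truck_7", "truck_7_weight")]


def is_a(cls, ancestor):
    """Walk up the subclass chain to test whether cls falls under ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def qualities_of(kind):
    """One query over the shared pattern: bearers of type `kind` and their qualities."""
    return [
        (bearer, quality)
        for bearer, quality in HAS_QUALITY
        if is_a(INSTANCE_OF[bearer], kind) and is_a(INSTANCE_OF[quality], "Quality")
    ]
```

Because both sources follow the same has_quality pattern, `qualities_of("Object")` covers both of them, and `qualities_of("Organization")` narrows the same query without rewriting it.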
Basic Formal Ontology
- An upper ontology with no more than 40 class terms and 20 relationships.
- Provides an extensible structure for the interrelationships between basic entities.
- Used as the upper ontology in hundreds of ontologies, primarily in the biomedical domain.
- Used by at least one hundred different projects.
An OBO Foundry Best Practice - Truth as a Development Guideline
- Strive towards creating a digital copy of the world.
- Adds the constraint that every assertion within an ontology must be true.
- Reduces perspective in the ontology, enabling links to many sources.
- Provides an objective means for settling disputes over terminology.
OBO Foundry Issue - Ontologies with Too Wide a Scope
- Reusing existing terminology is good practice.
- But the Ontology for Biomedical Investigations (OBI) is not a logical choice for where the term Organization is maintained.
Object-Oriented Programming - Modularity as a Development Guideline
- One axis of modularity in the Common Core Ontologies (CCO) is level of generality.
- Upper and mid-level ontologies are stable and of manageable scale.
- Upper ontologies describe the structure of the world.
- Mid-level ontologies add general content to the structure.
- Domain-level ontologies add content relevant to a community.
- Content and structure are inherited from higher levels.
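The inheritance of content and structure across levels can be sketched with a small import model. This is an illustration only, not how the CCO is actually packaged: the `Ontology` class, the three module names, and the placement of the sample terms at each level are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """A module that defines its own terms and imports more general modules."""
    name: str
    terms: set
    imports: list = field(default_factory=list)

    def all_terms(self):
        """Own terms plus everything inherited from higher levels."""
        inherited = set(self.terms)
        for parent in self.imports:
            inherited |= parent.all_terms()
        return inherited


# Hypothetical term placement, for illustration only.
upper = Ontology("upper", {"Object", "Process", "Quality"})
mid_level = Ontology("mid-level", {"Artifact", "Agent"}, [upper])
domain = Ontology("domain", {"Watercraft"}, [mid_level])
```

Here `domain.all_terms()` includes the upper and mid-level terms, while `mid_level.all_terms()` does not include `Watercraft`: content flows down from general to specific, never up, so the higher levels stay small and stable.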
Object-Oriented Programming - Modularity as a Development Guideline
- The second axis of modularity in the CCO is content.
[Figure: classes Process, Physical Object, Temporal Region, Site, and Site Attribute, connected by the relations participates in, occurs on, occurs at, contained in, and has.]
The Common Core Ontologies in Practice
- The Common Core Ontologies (CCO) are intended to serve as a vocabulary that can describe objects and processes that are common to many domains of interest.
- The remaining objects and processes, which are unique to particular domains of interest, are described by ontologies that extend from the CCO in a repeatable, rule-governed process.
The Common Core and Domain Ontologies
[Figure: the ontology stack, by level.]
- Upper: Basic Formal Ontology (BFO).
- Common Core: Extended Relation, Event, Agent, Quality, Artifact, Geospatial, Time, Information Entity, Units of Measure, Currency Unit.
- Domain: Affective State, Ethnicity, Occupation, Citizenship, Curriculum, Sensor, Undersea Warfare, Watercraft, Hydrographic Feature, Agent Information, Physiographic Feature, Space Object.
The Benefits of the Common Core Development Process